public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH][ARM][GCC][2/x]: MVE ACLE intrinsics framework patch.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (2 preceding siblings ...)
  2019-11-14 19:13 ` [PATCH][ARM][GCC][4/x]: MVE ACLE vector interleaving store intrinsics Srinath Parvathaneni
@ 2019-11-14 19:13 ` Srinath Parvathaneni
  2019-12-19 17:24   ` Kyrill Tkachov
  2019-11-14 19:13 ` [PATCH][ARM][GCC][3/1x]: MVE intrinsics with unary operand Srinath Parvathaneni
                   ` (34 subsequent siblings)
  38 siblings, 1 reply; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 15429 bytes --]

Hello,

This patch is part of MVE ACLE intrinsics framework.
This patches add support to update (read/write) the APSR (Application Program Status Register)
register and FPSCR (Floating-point Status and Control Register) register for MVE.
This patch also enables thumb2 mov RTL patterns for MVE.

Please refer to Arm reference manual [1] for more details.
[1] https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf?_ga=2.102521798.659307368.1572453718-1501600630.1548848914

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath

gcc/ChangeLog:

2019-11-11  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/thumb2.md (thumb2_movsfcc_soft_insn): Add check to not allow
	TARGET_HAVE_MVE for this pattern.
	(thumb2_cmse_entry_return): Add TARGET_HAVE_MVE check to update APSR register.
	* config/arm/unspecs.md (UNSPEC_GET_FPSCR): Define.
	(VUNSPEC_GET_FPSCR): Remove.
	* config/arm/vfp.md (thumb2_movhi_vfp): Add TARGET_HAVE_MVE check.
	(thumb2_movhi_fp16): Add TARGET_HAVE_MVE check.
	(thumb2_movsi_vfp): Add TARGET_HAVE_MVE check.
	(movdi_vfp): Add TARGET_HAVE_MVE check.
	(thumb2_movdf_vfp): Add TARGET_HAVE_MVE check.
	(thumb2_movsfcc_vfp): Add TARGET_HAVE_MVE check.
	(thumb2_movdfcc_vfp): Add TARGET_HAVE_MVE check.
	(push_multi_vfp): Add TARGET_HAVE_MVE check.
	(set_fpscr): Add TARGET_HAVE_MVE check.
	(get_fpscr): Add TARGET_HAVE_MVE check.


###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 809461a25da5a8058a8afce972dea0d3131effc0..81afd8fcdc1b0a82493dc0758bce16fa9e5fde20 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -435,10 +435,10 @@
 (define_insn "*cmovsi_insn"
   [(set (match_operand:SI 0 "arm_general_register_operand" "=r,r,r,r,r,r,r")
 	(if_then_else:SI
-	 (match_operator 1 "arm_comparison_operator"
-	  [(match_operand 2 "cc_register" "") (const_int 0)])
-	 (match_operand:SI 3 "arm_reg_or_m1_or_1" "r, r,UM, r,U1,UM,U1")
-	 (match_operand:SI 4 "arm_reg_or_m1_or_1" "r,UM, r,U1, r,UM,U1")))]
+	(match_operator 1 "arm_comparison_operator"
+	 [(match_operand 2 "cc_register" "") (const_int 0)])
+	(match_operand:SI 3 "arm_reg_or_m1_or_1" "r, r,UM, r,U1,UM,U1")
+	(match_operand:SI 4 "arm_reg_or_m1_or_1" "r,UM, r,U1, r,UM,U1")))]
   "TARGET_THUMB2 && TARGET_COND_ARITH
    && (!((operands[3] == const1_rtx && operands[4] == constm1_rtx)
        || (operands[3] == constm1_rtx && operands[4] == const1_rtx)))"
@@ -540,7 +540,7 @@
 			  [(match_operand 4 "cc_register" "") (const_int 0)])
 			 (match_operand:SF 1 "s_register_operand" "0,r")
 			 (match_operand:SF 2 "s_register_operand" "r,0")))]
-  "TARGET_THUMB2 && TARGET_SOFT_FLOAT"
+  "TARGET_THUMB2 && TARGET_SOFT_FLOAT && !TARGET_HAVE_MVE"
   "@
    it\\t%D3\;mov%D3\\t%0, %2
    it\\t%d3\;mov%d3\\t%0, %1"
@@ -1226,7 +1226,7 @@
    ; added to clear the APSR and potentially the FPSCR if VFP is available, so
    ; we adapt the length accordingly.
    (set (attr "length")
-     (if_then_else (match_test "TARGET_HARD_FLOAT")
+     (if_then_else (match_test "TARGET_HARD_FLOAT || TARGET_HAVE_MVE")
       (const_int 34)
       (const_int 8)))
    ; We do not support predicate execution of returns from cmse_nonsecure_entry
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index b3b4f8ee3e2d1bdad968a9dd8ccbc72ded274f48..ac7fe7d0af19f1965356d47d8327e24d410b99bd 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -170,6 +170,7 @@
   UNSPEC_TORC		; Used by the intrinsic form of the iWMMXt TORC instruction.
   UNSPEC_TORVSC		; Used by the intrinsic form of the iWMMXt TORVSC instruction.
   UNSPEC_TEXTRC		; Used by the intrinsic form of the iWMMXt TEXTRC instruction.
+  UNSPEC_GET_FPSCR	; Represent fetch of FPSCR content.
 ])
 
 
@@ -216,7 +217,6 @@
   VUNSPEC_SLX		; Represent a store-register-release-exclusive.
   VUNSPEC_LDA		; Represent a store-register-acquire.
   VUNSPEC_STL		; Represent a store-register-release.
-  VUNSPEC_GET_FPSCR	; Represent fetch of FPSCR content.
   VUNSPEC_SET_FPSCR	; Represent assign of FPSCR content.
   VUNSPEC_PROBE_STACK_RANGE ; Represent stack range probing.
   VUNSPEC_CDP		; Represent the coprocessor cdp instruction.
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 6349c0570540ec25a599166f5d427fcbdbf2af68..461a5d71ca8548cfc61c83f9716249425633ad21 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -74,10 +74,10 @@
 (define_insn "*thumb2_movhi_vfp"
  [(set
    (match_operand:HI 0 "nonimmediate_operand"
-    "=rk, r, l, r, m, r, *t, r, *t")
+    "=rk, r, l, r, m, r, *t, r, *t, Up, r")
    (match_operand:HI 1 "general_operand"
-    "rk, I, Py, n, r, m, r, *t, *t"))]
- "TARGET_THUMB2 && TARGET_HARD_FLOAT
+    "rk, I, Py, n, r, m, r, *t, *t, r, Up"))]
+ "TARGET_THUMB2 && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
   && !TARGET_VFP_FP16INST
   && (register_operand (operands[0], HImode)
        || register_operand (operands[1], HImode))"
@@ -99,20 +99,24 @@
       return "vmov%?\t%0, %1\t%@ int";
     case 8:
       return "vmov%?.f32\t%0, %1\t%@ int";
+    case 9:
+      return "vmsr%?\t P0, %1\t@ movhi";
+    case 10:
+      return "vmrs%?\t %0, P0\t@ movhi";
     default:
       gcc_unreachable ();
     }
 }
  [(set_attr "predicable" "yes")
   (set_attr "predicable_short_it"
-   "yes, no, yes, no, no, no, no, no, no")
+   "yes, no, yes, no, no, no, no, no, no, no, no")
   (set_attr "type"
    "mov_reg, mov_imm, mov_imm, mov_imm, store_4, load_4,\
-    f_mcr, f_mrc, fmov")
-  (set_attr "arch" "*, *, *, v6t2, *, *, *, *, *")
-  (set_attr "pool_range" "*, *, *, *, *, 4094, *, *, *")
-  (set_attr "neg_pool_range" "*, *, *, *, *, 250, *, *, *")
-  (set_attr "length" "2, 4, 2, 4, 4, 4, 4, 4, 4")]
+    f_mcr, f_mrc, fmov, mve_move, mve_move")
+  (set_attr "arch" "*, *, *, v6t2, *, *, *, *, *, *, *")
+  (set_attr "pool_range" "*, *, *, *, *, 4094, *, *, *, *, *")
+  (set_attr "neg_pool_range" "*, *, *, *, *, 250, *, *, *, *, *")
+  (set_attr "length" "2, 4, 2, 4, 4, 4, 4, 4, 4, 4, 4")]
 )
 
 ;; Patterns for HI moves which provide more data transfer instructions when FP16
@@ -170,10 +174,10 @@
 (define_insn "*thumb2_movhi_fp16"
  [(set
    (match_operand:HI 0 "nonimmediate_operand"
-    "=rk, r, l, r, m, r, *t, r, *t")
+    "=rk, r, l, r, m, r, *t, r, *t, Up, r")
    (match_operand:HI 1 "general_operand"
-    "rk, I, Py, n, r, m, r, *t, *t"))]
- "TARGET_THUMB2 && TARGET_VFP_FP16INST
+    "rk, I, Py, n, r, m, r, *t, *t, r, Up"))]
+ "TARGET_THUMB2 && (TARGET_VFP_FP16INST || TARGET_HAVE_MVE)
   && (register_operand (operands[0], HImode)
        || register_operand (operands[1], HImode))"
 {
@@ -194,21 +198,25 @@
       return "vmov.f16\t%0, %1\t%@ int";
     case 8:
       return "vmov%?.f32\t%0, %1\t%@ int";
+    case 9:
+      return "vmsr%?\tP0, %1\t%@ movhi";
+    case 10:
+      return "vmrs%?\t%0, P0\t%@ movhi";
     default:
       gcc_unreachable ();
     }
 }
  [(set_attr "predicable"
-   "yes, yes, yes, yes, yes, yes, no, no, yes")
+   "yes, yes, yes, yes, yes, yes, no, no, yes, yes, yes")
   (set_attr "predicable_short_it"
-   "yes, no, yes, no, no, no, no, no, no")
+   "yes, no, yes, no, no, no, no, no, no, no, no")
   (set_attr "type"
    "mov_reg, mov_imm, mov_imm, mov_imm, store_4, load_4,\
-    f_mcr, f_mrc, fmov")
-  (set_attr "arch" "*, *, *, v6t2, *, *, *, *, *")
-  (set_attr "pool_range" "*, *, *, *, *, 4094, *, *, *")
-  (set_attr "neg_pool_range" "*, *, *, *, *, 250, *, *, *")
-  (set_attr "length" "2, 4, 2, 4, 4, 4, 4, 4, 4")]
+    f_mcr, f_mrc, fmov, mve_move, mve_move")
+  (set_attr "arch" "*, *, *, v6t2, *, *, *, *, *, *, *")
+  (set_attr "pool_range" "*, *, *, *, *, 4094, *, *, *, *, *")
+  (set_attr "neg_pool_range" "*, *, *, *, *, 250, *, *, *, *, *")
+  (set_attr "length" "2, 4, 2, 4, 4, 4, 4, 4, 4, 4, 4")]
 )
 
 ;; SImode moves
@@ -258,9 +266,11 @@
 ;; is chosen with length 2 when the instruction is predicated for
 ;; arm_restrict_it.
 (define_insn "*thumb2_movsi_vfp"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,l,r,r,lk*r,m,*t, r,*t,*t,  *Uv")
-	(match_operand:SI 1 "general_operand"	   "rk,I,Py,K,j,mi,lk*r, r,*t,*t,*UvTu,*t"))]
-  "TARGET_THUMB2 && TARGET_HARD_FLOAT
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,l,r,r,l,*hk,m,*m,*t,\
+			     r,*t,*t,*Uv, Up, r")
+	(match_operand:SI 1 "general_operand"	   "rk,I,Py,K,j,mi,*mi,l,*hk,\
+			     r,*t,*t,*UvTu,*t, r, Up"))]
+  "TARGET_THUMB2 && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
    && (   s_register_operand (operands[0], SImode)
        || s_register_operand (operands[1], SImode))"
   "*
@@ -275,43 +285,53 @@
     case 4:
       return \"movw%?\\t%0, %1\";
     case 5:
+    case 6:
       /* Cannot load it directly, split to load it via MOV / MOVT.  */
       if (!MEM_P (operands[1]) && arm_disable_literal_pool)
 	return \"#\";
       return \"ldr%?\\t%0, %1\";
-    case 6:
-      return \"str%?\\t%1, %0\";
     case 7:
-      return \"vmov%?\\t%0, %1\\t%@ int\";
     case 8:
-      return \"vmov%?\\t%0, %1\\t%@ int\";
+      return \"str%?\\t%1, %0\";
     case 9:
+      return \"vmov%?\\t%0, %1\\t%@ int\";
+    case 10:
+      return \"vmov%?\\t%0, %1\\t%@ int\";
+    case 11:
       return \"vmov%?.f32\\t%0, %1\\t%@ int\";
-    case 10: case 11:
+    case 12: case 13:
       return output_move_vfp (operands);
+    case 14:
+      return \"vmsr\\t P0, %1\";
+    case 15:
+      return \"vmrs\\t %0, P0\";
     default:
       gcc_unreachable ();
     }
   "
   [(set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "yes,no,yes,no,no,no,no,no,no,no,no,no")
-   (set_attr "type" "mov_reg,mov_reg,mov_reg,mvn_reg,mov_imm,load_4,store_4,f_mcr,f_mrc,fmov,f_loads,f_stores")
-   (set_attr "length" "2,4,2,4,4,4,4,4,4,4,4,4")
-   (set_attr "pool_range"     "*,*,*,*,*,1018,*,*,*,*,1018,*")
-   (set_attr "neg_pool_range" "*,*,*,*,*,   0,*,*,*,*,1008,*")]
+   (set_attr "predicable_short_it" "yes,no,yes,no,no,no,no,no,no,no,no,no,no,\
+	     no,no,no")
+   (set_attr "type" "mov_reg,mov_reg,mov_reg,mvn_reg,mov_imm,load_4,load_4,\
+	     store_4,store_4,f_mcr,f_mrc,fmov,f_loads,f_stores,mve_move,\
+	     mve_move")
+   (set_attr "length" "2,4,2,4,4,4,4,4,4,4,4,4,4,4,4,4")
+   (set_attr "pool_range"     "*,*,*,*,*,1018,4094,*,*,*,*,*,1018,*,*,*")
+   (set_attr "neg_pool_range" "*,*,*,*,*,   0,   0,*,*,*,*,*,1008,*,*,*")]
 )
 
 
 ;; DImode moves
 
 (define_insn "*movdi_vfp"
-  [(set (match_operand:DI 0 "nonimmediate_di_operand" "=r,r,r,r,r,r,m,w,!r,w,w, Uv")
-	(match_operand:DI 1 "di_operand"	      "r,rDa,Db,Dc,mi,mi,r,r,w,w,UvTu,w"))]
-  "TARGET_32BIT && TARGET_HARD_FLOAT
-   && (   register_operand (operands[0], DImode)
-       || register_operand (operands[1], DImode))
-   && !(TARGET_NEON && CONST_INT_P (operands[1])
-	&& simd_immediate_valid_for_move (operands[1], DImode, NULL, NULL))"
+  [(set (match_operand:DI 0 "nonimmediate_di_operand" "=r,r,r,r,r,r,m,w,!r,w,\
+			    w, Uv")
+       (match_operand:DI 1 "di_operand" "r,rDa,Db,Dc,mi,mi,r,r,w,w,UvTu,w"))]
+  "TARGET_32BIT && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
+    && (register_operand (operands[0], DImode)
+	|| register_operand (operands[1], DImode))
+   && !((TARGET_NEON || TARGET_HAVE_MVE) && CONST_INT_P (operands[1])
+       && simd_immediate_valid_for_move (operands[1], DImode, NULL, NULL))"
   "*
   switch (which_alternative)
     {
@@ -390,9 +410,15 @@
     case 6: /* S register from immediate.  */
       return \"vmov.f16\\t%0, %1\t%@ __fp16\";
     case 7: /* S register from memory.  */
-      return \"vld1.16\\t{%z0}, %A1\";
+      if (TARGET_HAVE_MVE)
+	return \"vldr.16\\t%0, %A1\";
+      else
+	return \"vld1.16\\t{%z0}, %A1\";
     case 8: /* Memory from S register.  */
-      return \"vst1.16\\t{%z1}, %A0\";
+      if (TARGET_HAVE_MVE)
+	return \"vstr.16\\t%1, %A0\";
+      else
+	return \"vst1.16\\t{%z1}, %A0\";
     case 9: /* ARM register from constant.  */
       {
 	long bits;
@@ -593,7 +619,7 @@
 (define_insn "*thumb2_movsf_vfp"
   [(set (match_operand:SF 0 "nonimmediate_operand" "=t,?r,t, t  ,Uv,r ,m,t,r")
 	(match_operand:SF 1 "hard_sf_operand"	   " ?r,t,Dv,UvHa,t, mHa,r,t,r"))]
-  "TARGET_THUMB2 && TARGET_HARD_FLOAT
+  "TARGET_THUMB2 && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
    && (   s_register_operand (operands[0], SFmode)
        || s_register_operand (operands[1], SFmode))"
   "*
@@ -682,7 +708,7 @@
 (define_insn "*thumb2_movdf_vfp"
   [(set (match_operand:DF 0 "nonimmediate_soft_df_operand" "=w,?r,w ,w,w  ,Uv,r ,m,w,r")
 	(match_operand:DF 1 "hard_df_operand"		   " ?r,w,Dy,G,UvHa,w, mHa,r, w,r"))]
-  "TARGET_THUMB2 && TARGET_HARD_FLOAT
+  "TARGET_THUMB2 && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
    && (   register_operand (operands[0], DFmode)
        || register_operand (operands[1], DFmode))"
   "*
@@ -760,7 +786,7 @@
 	    [(match_operand 4 "cc_register" "") (const_int 0)])
 	  (match_operand:SF 1 "s_register_operand" "0,t,t,0,?r,?r,0,t,t")
 	  (match_operand:SF 2 "s_register_operand" "t,0,t,?r,0,?r,t,0,t")))]
-  "TARGET_THUMB2 && TARGET_HARD_FLOAT && !arm_restrict_it"
+  "TARGET_THUMB2 && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE) && !arm_restrict_it"
   "@
    it\\t%D3\;vmov%D3.f32\\t%0, %2
    it\\t%d3\;vmov%d3.f32\\t%0, %1
@@ -806,7 +832,8 @@
 	    [(match_operand 4 "cc_register" "") (const_int 0)])
 	  (match_operand:DF 1 "s_register_operand" "0,w,w,0,?r,?r,0,w,w")
 	  (match_operand:DF 2 "s_register_operand" "w,0,w,?r,0,?r,w,0,w")))]
-  "TARGET_THUMB2 && TARGET_HARD_FLOAT && TARGET_VFP_DOUBLE && !arm_restrict_it"
+  "TARGET_THUMB2 && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE) && TARGET_VFP_DOUBLE
+   && !arm_restrict_it"
   "@
    it\\t%D3\;vmov%D3.f64\\t%P0, %P2
    it\\t%d3\;vmov%d3.f64\\t%P0, %P1
@@ -1982,7 +2009,7 @@
     [(set (match_operand:BLK 0 "memory_operand" "=m")
 	  (unspec:BLK [(match_operand:DF 1 "vfp_register_operand" "")]
 		      UNSPEC_PUSH_MULT))])]
-  "TARGET_32BIT && TARGET_HARD_FLOAT"
+  "TARGET_32BIT && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)"
   "* return vfp_output_vstmd (operands);"
   [(set_attr "type" "f_stored")]
 )
@@ -2070,16 +2097,18 @@
 
 ;; Write Floating-point Status and Control Register.
 (define_insn "set_fpscr"
-  [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")] VUNSPEC_SET_FPSCR)]
-  "TARGET_HARD_FLOAT"
+  [(set (reg:SI VFPCC_REGNUM)
+	(unspec_volatile:SI
+	 [(match_operand:SI 0 "register_operand" "r")] VUNSPEC_SET_FPSCR))]
+  "TARGET_HARD_FLOAT || TARGET_HAVE_MVE"
   "mcr\\tp10, 7, %0, cr1, cr0, 0\\t @SET_FPSCR"
   [(set_attr "type" "mrs")])
 
 ;; Read Floating-point Status and Control Register.
 (define_insn "get_fpscr"
   [(set (match_operand:SI 0 "register_operand" "=r")
-        (unspec_volatile:SI [(const_int 0)] VUNSPEC_GET_FPSCR))]
-  "TARGET_HARD_FLOAT"
+	(unspec:SI [(reg:SI VFPCC_REGNUM)] UNSPEC_GET_FPSCR))]
+  "TARGET_HARD_FLOAT || TARGET_HAVE_MVE"
   "mrc\\tp10, 7, %0, cr1, cr0, 0\\t @GET_FPSCR"
   [(set_attr "type" "mrs")])
 


[-- Attachment #2: diff01.patch --]
[-- Type: text/plain, Size: 13461 bytes --]

diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 809461a25da5a8058a8afce972dea0d3131effc0..81afd8fcdc1b0a82493dc0758bce16fa9e5fde20 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -435,10 +435,10 @@
 (define_insn "*cmovsi_insn"
   [(set (match_operand:SI 0 "arm_general_register_operand" "=r,r,r,r,r,r,r")
 	(if_then_else:SI
-	 (match_operator 1 "arm_comparison_operator"
-	  [(match_operand 2 "cc_register" "") (const_int 0)])
-	 (match_operand:SI 3 "arm_reg_or_m1_or_1" "r, r,UM, r,U1,UM,U1")
-	 (match_operand:SI 4 "arm_reg_or_m1_or_1" "r,UM, r,U1, r,UM,U1")))]
+	(match_operator 1 "arm_comparison_operator"
+	 [(match_operand 2 "cc_register" "") (const_int 0)])
+	(match_operand:SI 3 "arm_reg_or_m1_or_1" "r, r,UM, r,U1,UM,U1")
+	(match_operand:SI 4 "arm_reg_or_m1_or_1" "r,UM, r,U1, r,UM,U1")))]
   "TARGET_THUMB2 && TARGET_COND_ARITH
    && (!((operands[3] == const1_rtx && operands[4] == constm1_rtx)
        || (operands[3] == constm1_rtx && operands[4] == const1_rtx)))"
@@ -540,7 +540,7 @@
 			  [(match_operand 4 "cc_register" "") (const_int 0)])
 			 (match_operand:SF 1 "s_register_operand" "0,r")
 			 (match_operand:SF 2 "s_register_operand" "r,0")))]
-  "TARGET_THUMB2 && TARGET_SOFT_FLOAT"
+  "TARGET_THUMB2 && TARGET_SOFT_FLOAT && !TARGET_HAVE_MVE"
   "@
    it\\t%D3\;mov%D3\\t%0, %2
    it\\t%d3\;mov%d3\\t%0, %1"
@@ -1226,7 +1226,7 @@
    ; added to clear the APSR and potentially the FPSCR if VFP is available, so
    ; we adapt the length accordingly.
    (set (attr "length")
-     (if_then_else (match_test "TARGET_HARD_FLOAT")
+     (if_then_else (match_test "TARGET_HARD_FLOAT || TARGET_HAVE_MVE")
       (const_int 34)
       (const_int 8)))
    ; We do not support predicate execution of returns from cmse_nonsecure_entry
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index b3b4f8ee3e2d1bdad968a9dd8ccbc72ded274f48..ac7fe7d0af19f1965356d47d8327e24d410b99bd 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -170,6 +170,7 @@
   UNSPEC_TORC		; Used by the intrinsic form of the iWMMXt TORC instruction.
   UNSPEC_TORVSC		; Used by the intrinsic form of the iWMMXt TORVSC instruction.
   UNSPEC_TEXTRC		; Used by the intrinsic form of the iWMMXt TEXTRC instruction.
+  UNSPEC_GET_FPSCR	; Represent fetch of FPSCR content.
 ])
 
 
@@ -216,7 +217,6 @@
   VUNSPEC_SLX		; Represent a store-register-release-exclusive.
   VUNSPEC_LDA		; Represent a store-register-acquire.
   VUNSPEC_STL		; Represent a store-register-release.
-  VUNSPEC_GET_FPSCR	; Represent fetch of FPSCR content.
   VUNSPEC_SET_FPSCR	; Represent assign of FPSCR content.
   VUNSPEC_PROBE_STACK_RANGE ; Represent stack range probing.
   VUNSPEC_CDP		; Represent the coprocessor cdp instruction.
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 6349c0570540ec25a599166f5d427fcbdbf2af68..461a5d71ca8548cfc61c83f9716249425633ad21 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -74,10 +74,10 @@
 (define_insn "*thumb2_movhi_vfp"
  [(set
    (match_operand:HI 0 "nonimmediate_operand"
-    "=rk, r, l, r, m, r, *t, r, *t")
+    "=rk, r, l, r, m, r, *t, r, *t, Up, r")
    (match_operand:HI 1 "general_operand"
-    "rk, I, Py, n, r, m, r, *t, *t"))]
- "TARGET_THUMB2 && TARGET_HARD_FLOAT
+    "rk, I, Py, n, r, m, r, *t, *t, r, Up"))]
+ "TARGET_THUMB2 && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
   && !TARGET_VFP_FP16INST
   && (register_operand (operands[0], HImode)
        || register_operand (operands[1], HImode))"
@@ -99,20 +99,24 @@
       return "vmov%?\t%0, %1\t%@ int";
     case 8:
       return "vmov%?.f32\t%0, %1\t%@ int";
+    case 9:
+      return "vmsr%?\t P0, %1\t@ movhi";
+    case 10:
+      return "vmrs%?\t %0, P0\t@ movhi";
     default:
       gcc_unreachable ();
     }
 }
  [(set_attr "predicable" "yes")
   (set_attr "predicable_short_it"
-   "yes, no, yes, no, no, no, no, no, no")
+   "yes, no, yes, no, no, no, no, no, no, no, no")
   (set_attr "type"
    "mov_reg, mov_imm, mov_imm, mov_imm, store_4, load_4,\
-    f_mcr, f_mrc, fmov")
-  (set_attr "arch" "*, *, *, v6t2, *, *, *, *, *")
-  (set_attr "pool_range" "*, *, *, *, *, 4094, *, *, *")
-  (set_attr "neg_pool_range" "*, *, *, *, *, 250, *, *, *")
-  (set_attr "length" "2, 4, 2, 4, 4, 4, 4, 4, 4")]
+    f_mcr, f_mrc, fmov, mve_move, mve_move")
+  (set_attr "arch" "*, *, *, v6t2, *, *, *, *, *, *, *")
+  (set_attr "pool_range" "*, *, *, *, *, 4094, *, *, *, *, *")
+  (set_attr "neg_pool_range" "*, *, *, *, *, 250, *, *, *, *, *")
+  (set_attr "length" "2, 4, 2, 4, 4, 4, 4, 4, 4, 4, 4")]
 )
 
 ;; Patterns for HI moves which provide more data transfer instructions when FP16
@@ -170,10 +174,10 @@
 (define_insn "*thumb2_movhi_fp16"
  [(set
    (match_operand:HI 0 "nonimmediate_operand"
-    "=rk, r, l, r, m, r, *t, r, *t")
+    "=rk, r, l, r, m, r, *t, r, *t, Up, r")
    (match_operand:HI 1 "general_operand"
-    "rk, I, Py, n, r, m, r, *t, *t"))]
- "TARGET_THUMB2 && TARGET_VFP_FP16INST
+    "rk, I, Py, n, r, m, r, *t, *t, r, Up"))]
+ "TARGET_THUMB2 && (TARGET_VFP_FP16INST || TARGET_HAVE_MVE)
   && (register_operand (operands[0], HImode)
        || register_operand (operands[1], HImode))"
 {
@@ -194,21 +198,25 @@
       return "vmov.f16\t%0, %1\t%@ int";
     case 8:
       return "vmov%?.f32\t%0, %1\t%@ int";
+    case 9:
+      return "vmsr%?\tP0, %1\t%@ movhi";
+    case 10:
+      return "vmrs%?\t%0, P0\t%@ movhi";
     default:
       gcc_unreachable ();
     }
 }
  [(set_attr "predicable"
-   "yes, yes, yes, yes, yes, yes, no, no, yes")
+   "yes, yes, yes, yes, yes, yes, no, no, yes, yes, yes")
   (set_attr "predicable_short_it"
-   "yes, no, yes, no, no, no, no, no, no")
+   "yes, no, yes, no, no, no, no, no, no, no, no")
   (set_attr "type"
    "mov_reg, mov_imm, mov_imm, mov_imm, store_4, load_4,\
-    f_mcr, f_mrc, fmov")
-  (set_attr "arch" "*, *, *, v6t2, *, *, *, *, *")
-  (set_attr "pool_range" "*, *, *, *, *, 4094, *, *, *")
-  (set_attr "neg_pool_range" "*, *, *, *, *, 250, *, *, *")
-  (set_attr "length" "2, 4, 2, 4, 4, 4, 4, 4, 4")]
+    f_mcr, f_mrc, fmov, mve_move, mve_move")
+  (set_attr "arch" "*, *, *, v6t2, *, *, *, *, *, *, *")
+  (set_attr "pool_range" "*, *, *, *, *, 4094, *, *, *, *, *")
+  (set_attr "neg_pool_range" "*, *, *, *, *, 250, *, *, *, *, *")
+  (set_attr "length" "2, 4, 2, 4, 4, 4, 4, 4, 4, 4, 4")]
 )
 
 ;; SImode moves
@@ -258,9 +266,11 @@
 ;; is chosen with length 2 when the instruction is predicated for
 ;; arm_restrict_it.
 (define_insn "*thumb2_movsi_vfp"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,l,r,r,lk*r,m,*t, r,*t,*t,  *Uv")
-	(match_operand:SI 1 "general_operand"	   "rk,I,Py,K,j,mi,lk*r, r,*t,*t,*UvTu,*t"))]
-  "TARGET_THUMB2 && TARGET_HARD_FLOAT
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,l,r,r,l,*hk,m,*m,*t,\
+			     r,*t,*t,*Uv, Up, r")
+	(match_operand:SI 1 "general_operand"	   "rk,I,Py,K,j,mi,*mi,l,*hk,\
+			     r,*t,*t,*UvTu,*t, r, Up"))]
+  "TARGET_THUMB2 && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
    && (   s_register_operand (operands[0], SImode)
        || s_register_operand (operands[1], SImode))"
   "*
@@ -275,43 +285,53 @@
     case 4:
       return \"movw%?\\t%0, %1\";
     case 5:
+    case 6:
       /* Cannot load it directly, split to load it via MOV / MOVT.  */
       if (!MEM_P (operands[1]) && arm_disable_literal_pool)
 	return \"#\";
       return \"ldr%?\\t%0, %1\";
-    case 6:
-      return \"str%?\\t%1, %0\";
     case 7:
-      return \"vmov%?\\t%0, %1\\t%@ int\";
     case 8:
-      return \"vmov%?\\t%0, %1\\t%@ int\";
+      return \"str%?\\t%1, %0\";
     case 9:
+      return \"vmov%?\\t%0, %1\\t%@ int\";
+    case 10:
+      return \"vmov%?\\t%0, %1\\t%@ int\";
+    case 11:
       return \"vmov%?.f32\\t%0, %1\\t%@ int\";
-    case 10: case 11:
+    case 12: case 13:
       return output_move_vfp (operands);
+    case 14:
+      return \"vmsr\\t P0, %1\";
+    case 15:
+      return \"vmrs\\t %0, P0\";
     default:
       gcc_unreachable ();
     }
   "
   [(set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "yes,no,yes,no,no,no,no,no,no,no,no,no")
-   (set_attr "type" "mov_reg,mov_reg,mov_reg,mvn_reg,mov_imm,load_4,store_4,f_mcr,f_mrc,fmov,f_loads,f_stores")
-   (set_attr "length" "2,4,2,4,4,4,4,4,4,4,4,4")
-   (set_attr "pool_range"     "*,*,*,*,*,1018,*,*,*,*,1018,*")
-   (set_attr "neg_pool_range" "*,*,*,*,*,   0,*,*,*,*,1008,*")]
+   (set_attr "predicable_short_it" "yes,no,yes,no,no,no,no,no,no,no,no,no,no,\
+	     no,no,no")
+   (set_attr "type" "mov_reg,mov_reg,mov_reg,mvn_reg,mov_imm,load_4,load_4,\
+	     store_4,store_4,f_mcr,f_mrc,fmov,f_loads,f_stores,mve_move,\
+	     mve_move")
+   (set_attr "length" "2,4,2,4,4,4,4,4,4,4,4,4,4,4,4,4")
+   (set_attr "pool_range"     "*,*,*,*,*,1018,4094,*,*,*,*,*,1018,*,*,*")
+   (set_attr "neg_pool_range" "*,*,*,*,*,   0,   0,*,*,*,*,*,1008,*,*,*")]
 )
 
 
 ;; DImode moves
 
 (define_insn "*movdi_vfp"
-  [(set (match_operand:DI 0 "nonimmediate_di_operand" "=r,r,r,r,r,r,m,w,!r,w,w, Uv")
-	(match_operand:DI 1 "di_operand"	      "r,rDa,Db,Dc,mi,mi,r,r,w,w,UvTu,w"))]
-  "TARGET_32BIT && TARGET_HARD_FLOAT
-   && (   register_operand (operands[0], DImode)
-       || register_operand (operands[1], DImode))
-   && !(TARGET_NEON && CONST_INT_P (operands[1])
-	&& simd_immediate_valid_for_move (operands[1], DImode, NULL, NULL))"
+  [(set (match_operand:DI 0 "nonimmediate_di_operand" "=r,r,r,r,r,r,m,w,!r,w,\
+			    w, Uv")
+       (match_operand:DI 1 "di_operand" "r,rDa,Db,Dc,mi,mi,r,r,w,w,UvTu,w"))]
+  "TARGET_32BIT && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
+    && (register_operand (operands[0], DImode)
+	|| register_operand (operands[1], DImode))
+   && !((TARGET_NEON || TARGET_HAVE_MVE) && CONST_INT_P (operands[1])
+       && simd_immediate_valid_for_move (operands[1], DImode, NULL, NULL))"
   "*
   switch (which_alternative)
     {
@@ -390,9 +410,15 @@
     case 6: /* S register from immediate.  */
       return \"vmov.f16\\t%0, %1\t%@ __fp16\";
     case 7: /* S register from memory.  */
-      return \"vld1.16\\t{%z0}, %A1\";
+      if (TARGET_HAVE_MVE)
+	return \"vldr.16\\t%0, %A1\";
+      else
+	return \"vld1.16\\t{%z0}, %A1\";
     case 8: /* Memory from S register.  */
-      return \"vst1.16\\t{%z1}, %A0\";
+      if (TARGET_HAVE_MVE)
+	return \"vstr.16\\t%1, %A0\";
+      else
+	return \"vst1.16\\t{%z1}, %A0\";
     case 9: /* ARM register from constant.  */
       {
 	long bits;
@@ -593,7 +619,7 @@
 (define_insn "*thumb2_movsf_vfp"
   [(set (match_operand:SF 0 "nonimmediate_operand" "=t,?r,t, t  ,Uv,r ,m,t,r")
 	(match_operand:SF 1 "hard_sf_operand"	   " ?r,t,Dv,UvHa,t, mHa,r,t,r"))]
-  "TARGET_THUMB2 && TARGET_HARD_FLOAT
+  "TARGET_THUMB2 && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
    && (   s_register_operand (operands[0], SFmode)
        || s_register_operand (operands[1], SFmode))"
   "*
@@ -682,7 +708,7 @@
 (define_insn "*thumb2_movdf_vfp"
   [(set (match_operand:DF 0 "nonimmediate_soft_df_operand" "=w,?r,w ,w,w  ,Uv,r ,m,w,r")
 	(match_operand:DF 1 "hard_df_operand"		   " ?r,w,Dy,G,UvHa,w, mHa,r, w,r"))]
-  "TARGET_THUMB2 && TARGET_HARD_FLOAT
+  "TARGET_THUMB2 && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
    && (   register_operand (operands[0], DFmode)
        || register_operand (operands[1], DFmode))"
   "*
@@ -760,7 +786,7 @@
 	    [(match_operand 4 "cc_register" "") (const_int 0)])
 	  (match_operand:SF 1 "s_register_operand" "0,t,t,0,?r,?r,0,t,t")
 	  (match_operand:SF 2 "s_register_operand" "t,0,t,?r,0,?r,t,0,t")))]
-  "TARGET_THUMB2 && TARGET_HARD_FLOAT && !arm_restrict_it"
+  "TARGET_THUMB2 && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE) && !arm_restrict_it"
   "@
    it\\t%D3\;vmov%D3.f32\\t%0, %2
    it\\t%d3\;vmov%d3.f32\\t%0, %1
@@ -806,7 +832,8 @@
 	    [(match_operand 4 "cc_register" "") (const_int 0)])
 	  (match_operand:DF 1 "s_register_operand" "0,w,w,0,?r,?r,0,w,w")
 	  (match_operand:DF 2 "s_register_operand" "w,0,w,?r,0,?r,w,0,w")))]
-  "TARGET_THUMB2 && TARGET_HARD_FLOAT && TARGET_VFP_DOUBLE && !arm_restrict_it"
+  "TARGET_THUMB2 && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE) && TARGET_VFP_DOUBLE
+   && !arm_restrict_it"
   "@
    it\\t%D3\;vmov%D3.f64\\t%P0, %P2
    it\\t%d3\;vmov%d3.f64\\t%P0, %P1
@@ -1982,7 +2009,7 @@
     [(set (match_operand:BLK 0 "memory_operand" "=m")
 	  (unspec:BLK [(match_operand:DF 1 "vfp_register_operand" "")]
 		      UNSPEC_PUSH_MULT))])]
-  "TARGET_32BIT && TARGET_HARD_FLOAT"
+  "TARGET_32BIT && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)"
   "* return vfp_output_vstmd (operands);"
   [(set_attr "type" "f_stored")]
 )
@@ -2070,16 +2097,18 @@
 
 ;; Write Floating-point Status and Control Register.
 (define_insn "set_fpscr"
-  [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")] VUNSPEC_SET_FPSCR)]
-  "TARGET_HARD_FLOAT"
+  [(set (reg:SI VFPCC_REGNUM)
+	(unspec_volatile:SI
+	 [(match_operand:SI 0 "register_operand" "r")] VUNSPEC_SET_FPSCR))]
+  "TARGET_HARD_FLOAT || TARGET_HAVE_MVE"
   "mcr\\tp10, 7, %0, cr1, cr0, 0\\t @SET_FPSCR"
   [(set_attr "type" "mrs")])
 
 ;; Read Floating-point Status and Control Register.
 (define_insn "get_fpscr"
   [(set (match_operand:SI 0 "register_operand" "=r")
-        (unspec_volatile:SI [(const_int 0)] VUNSPEC_GET_FPSCR))]
-  "TARGET_HARD_FLOAT"
+	(unspec:SI [(reg:SI VFPCC_REGNUM)] UNSPEC_GET_FPSCR))]
+  "TARGET_HARD_FLOAT || TARGET_HAVE_MVE"
   "mrc\\tp10, 7, %0, cr1, cr0, 0\\t @GET_FPSCR"
   [(set_attr "type" "mrs")])
 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][3/1x]: MVE intrinsics with unary operand.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (3 preceding siblings ...)
  2019-11-14 19:13 ` [PATCH][ARM][GCC][2/x]: MVE ACLE intrinsics framework patch Srinath Parvathaneni
@ 2019-11-14 19:13 ` Srinath Parvathaneni
  2019-11-14 19:13 ` [PATCH][ARM][GCC][2/1x]: " Srinath Parvathaneni
                   ` (33 subsequent siblings)
  38 siblings, 0 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 11491 bytes --]

Hello,

This patch supports following MVE ACLE intrinsics with unary operand.

vdupq_n_s8, vdupq_n_s16, vdupq_n_s32, vabsq_s8, vabsq_s16, vabsq_s32, vclsq_s8, vclsq_s16, vclsq_s32, vclzq_s8, vclzq_s16, vclzq_s32, vnegq_s8, vnegq_s16, vnegq_s32, vaddlvq_s32, vaddvq_s8, vaddvq_s16, vaddvq_s32, vmovlbq_s8, vmovlbq_s16, vmovltq_s8, vmovltq_s16, vmvnq_s8, vmvnq_s16, vmvnq_s32, vrev16q_s8, vrev32q_s8, vrev32q_s16, vqabsq_s8, vqabsq_s16, vqabsq_s32, vqnegq_s8, vqnegq_s16, vqnegq_s32, vcvtaq_s16_f16, vcvtaq_s32_f32, vcvtnq_s16_f16, vcvtnq_s32_f32, vcvtpq_s16_f16, vcvtpq_s32_f32, vcvtmq_s16_f16, vcvtmq_s32_f32, vmvnq_u8, vmvnq_u16, vmvnq_u32, vdupq_n_u8, vdupq_n_u16, vdupq_n_u32, vclzq_u8, vclzq_u16, vclzq_u32, vaddvq_u8, vaddvq_u16, vaddvq_u32, vrev32q_u8, vrev32q_u16, vmovltq_u8, vmovltq_u16, vmovlbq_u8, vmovlbq_u16, vrev16q_u8, vaddlvq_u32, vcvtpq_u16_f16, vcvtpq_u32_f32, vcvtnq_u16_f16, vcvtmq_u16_f16, vcvtmq_u32_f32, vcvtaq_u16_f16, vcvtaq_u32_f32, vdupq_n, vabsq, vclsq, vclzq, vnegq, vaddlvq, vaddvq, vmovlbq, vmovltq, vmvnq, vrev16q, vrev32q, vqabsq, vqnegq.

A new register class "EVEN_REGS" which allows only even registers is added in this patch.

The new constraint "e" allows only reigsters of EVEN_REGS class.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-10-21  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm.h (enum reg_class): Define new class EVEN_REGS.
	* config/arm/arm_mve.h (vdupq_n_s8): Define macro.
	(vdupq_n_s16): Likewise.
	(vdupq_n_s32): Likewise.
	(vabsq_s8): Likewise.
	(vabsq_s16): Likewise.
	(vabsq_s32): Likewise.
	(vclsq_s8): Likewise.
	(vclsq_s16): Likewise.
	(vclsq_s32): Likewise.
	(vclzq_s8): Likewise.
	(vclzq_s16): Likewise.
	(vclzq_s32): Likewise.
	(vnegq_s8): Likewise.
	(vnegq_s16): Likewise.
	(vnegq_s32): Likewise.
	(vaddlvq_s32): Likewise.
	(vaddvq_s8): Likewise.
	(vaddvq_s16): Likewise.
	(vaddvq_s32): Likewise.
	(vmovlbq_s8): Likewise.
	(vmovlbq_s16): Likewise.
	(vmovltq_s8): Likewise.
	(vmovltq_s16): Likewise.
	(vmvnq_s8): Likewise.
	(vmvnq_s16): Likewise.
	(vmvnq_s32): Likewise.
	(vrev16q_s8): Likewise.
	(vrev32q_s8): Likewise.
	(vrev32q_s16): Likewise.
	(vqabsq_s8): Likewise.
	(vqabsq_s16): Likewise.
	(vqabsq_s32): Likewise.
	(vqnegq_s8): Likewise.
	(vqnegq_s16): Likewise.
	(vqnegq_s32): Likewise.
	(vcvtaq_s16_f16): Likewise.
	(vcvtaq_s32_f32): Likewise.
	(vcvtnq_s16_f16): Likewise.
	(vcvtnq_s32_f32): Likewise.
	(vcvtpq_s16_f16): Likewise.
	(vcvtpq_s32_f32): Likewise.
	(vcvtmq_s16_f16): Likewise.
	(vcvtmq_s32_f32): Likewise.
	(vmvnq_u8): Likewise.
	(vmvnq_u16): Likewise.
	(vmvnq_u32): Likewise.
	(vdupq_n_u8): Likewise.
	(vdupq_n_u16): Likewise.
	(vdupq_n_u32): Likewise.
	(vclzq_u8): Likewise.
	(vclzq_u16): Likewise.
	(vclzq_u32): Likewise.
	(vaddvq_u8): Likewise.
	(vaddvq_u16): Likewise.
	(vaddvq_u32): Likewise.
	(vrev32q_u8): Likewise.
	(vrev32q_u16): Likewise.
	(vmovltq_u8): Likewise.
	(vmovltq_u16): Likewise.
	(vmovlbq_u8): Likewise.
	(vmovlbq_u16): Likewise.
	(vrev16q_u8): Likewise.
	(vaddlvq_u32): Likewise.
	(vcvtpq_u16_f16): Likewise.
	(vcvtpq_u32_f32): Likewise.
	(vcvtnq_u16_f16): Likewise.
	(vcvtmq_u16_f16): Likewise.
	(vcvtmq_u32_f32): Likewise.
	(vcvtaq_u16_f16): Likewise.
	(vcvtaq_u32_f32): Likewise.
	(__arm_vdupq_n_s8): Define intrinsic.
	(__arm_vdupq_n_s16): Likewise.
	(__arm_vdupq_n_s32): Likewise.
	(__arm_vabsq_s8): Likewise.
	(__arm_vabsq_s16): Likewise.
	(__arm_vabsq_s32): Likewise.
	(__arm_vclsq_s8): Likewise.
	(__arm_vclsq_s16): Likewise.
	(__arm_vclsq_s32): Likewise.
	(__arm_vclzq_s8): Likewise.
	(__arm_vclzq_s16): Likewise.
	(__arm_vclzq_s32): Likewise.
	(__arm_vnegq_s8): Likewise.
	(__arm_vnegq_s16): Likewise.
	(__arm_vnegq_s32): Likewise.
	(__arm_vaddlvq_s32): Likewise.
	(__arm_vaddvq_s8): Likewise.
	(__arm_vaddvq_s16): Likewise.
	(__arm_vaddvq_s32): Likewise.
	(__arm_vmovlbq_s8): Likewise.
	(__arm_vmovlbq_s16): Likewise.
	(__arm_vmovltq_s8): Likewise.
	(__arm_vmovltq_s16): Likewise.
	(__arm_vmvnq_s8): Likewise.
	(__arm_vmvnq_s16): Likewise.
	(__arm_vmvnq_s32): Likewise.
	(__arm_vrev16q_s8): Likewise.
	(__arm_vrev32q_s8): Likewise.
	(__arm_vrev32q_s16): Likewise.
	(__arm_vqabsq_s8): Likewise.
	(__arm_vqabsq_s16): Likewise.
	(__arm_vqabsq_s32): Likewise.
	(__arm_vqnegq_s8): Likewise.
	(__arm_vqnegq_s16): Likewise.
	(__arm_vqnegq_s32): Likewise.
	(__arm_vmvnq_u8): Likewise.
	(__arm_vmvnq_u16): Likewise.
	(__arm_vmvnq_u32): Likewise.
	(__arm_vdupq_n_u8): Likewise.
	(__arm_vdupq_n_u16): Likewise.
	(__arm_vdupq_n_u32): Likewise.
	(__arm_vclzq_u8): Likewise.
	(__arm_vclzq_u16): Likewise.
	(__arm_vclzq_u32): Likewise.
	(__arm_vaddvq_u8): Likewise.
	(__arm_vaddvq_u16): Likewise.
	(__arm_vaddvq_u32): Likewise.
	(__arm_vrev32q_u8): Likewise.
	(__arm_vrev32q_u16): Likewise.
	(__arm_vmovltq_u8): Likewise.
	(__arm_vmovltq_u16): Likewise.
	(__arm_vmovlbq_u8): Likewise.
	(__arm_vmovlbq_u16): Likewise.
	(__arm_vrev16q_u8): Likewise.
	(__arm_vaddlvq_u32): Likewise.
	(__arm_vcvtpq_u16_f16): Likewise.
	(__arm_vcvtpq_u32_f32): Likewise.
	(__arm_vcvtnq_u16_f16): Likewise.
	(__arm_vcvtmq_u16_f16): Likewise.
	(__arm_vcvtmq_u32_f32): Likewise.
	(__arm_vcvtaq_u16_f16): Likewise.
	(__arm_vcvtaq_u32_f32): Likewise.
	(__arm_vcvtaq_s16_f16): Likewise.
	(__arm_vcvtaq_s32_f32): Likewise.
	(__arm_vcvtnq_s16_f16): Likewise.
	(__arm_vcvtnq_s32_f32): Likewise.
	(__arm_vcvtpq_s16_f16): Likewise.
	(__arm_vcvtpq_s32_f32): Likewise.
	(__arm_vcvtmq_s16_f16): Likewise.
	(__arm_vcvtmq_s32_f32): Likewise.
	(vdupq_n): Define polymorphic variant.
	(vabsq): Likewise.
	(vclsq): Likewise.
	(vclzq): Likewise.
	(vnegq): Likewise.
	(vaddlvq): Likewise.
	(vaddvq): Likewise.
	(vmovlbq): Likewise.
	(vmovltq): Likewise.
	(vmvnq): Likewise.
	(vrev16q): Likewise.
	(vrev32q): Likewise.
	(vqabsq): Likewise.
	(vqnegq): Likewise.
	* config/arm/arm_mve_builtins.def (UNOP_SNONE_SNONE): Use it.
	(UNOP_SNONE_NONE): Likewise.
	(UNOP_UNONE_UNONE): Likewise.
	(UNOP_UNONE_NONE): Likewise.
	* config/arm/constraints.md (e): Define new constriant to allow only
	even registers.
	* config/arm/mve.md (mve_vqabsq_s<mode>): Define RTL pattern.
	(mve_vnegq_s<mode>): Likewise.
	(mve_vmvnq_<supf><mode>): Likewise.
	(mve_vdupq_n_<supf><mode>): Likewise.
	(mve_vclzq_<supf><mode>): Likewise.
	(mve_vclsq_s<mode>): Likewise.
	(mve_vaddvq_<supf><mode>): Likewise.
	(mve_vabsq_s<mode>): Likewise.
	(mve_vrev32q_<supf><mode>): Likewise.
	(mve_vmovltq_<supf><mode>): Likewise.
	(mve_vmovlbq_<supf><mode>): Likewise.
	(mve_vcvtpq_<supf><mode>): Likewise.
	(mve_vcvtnq_<supf><mode>): Likewise.
	(mve_vcvtmq_<supf><mode>): Likewise.
	(mve_vcvtaq_<supf><mode>): Likewise.
	(mve_vrev16q_<supf>v16qi): Likewise.
	(mve_vaddlvq_<supf>v4si): Likewise.

gcc/testsuite/ChangeLog:

2019-10-21  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vabsq_s16.c: New test.
	* gcc.target/arm/mve/intrinsics/vabsq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabsq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddlvq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddlvq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vclsq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vclsq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vclsq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vclzq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vclzq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vclzq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vclzq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vclzq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vclzq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtaq_s16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtaq_s32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtaq_u16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtaq_u32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtmq_s16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtmq_s32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtmq_u16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtmq_u32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtnq_s16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtnq_s32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtnq_u16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtpq_s16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtpq_s32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtpq_u16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtpq_u32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovlbq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovlbq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovlbq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovlbq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovltq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovltq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovltq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovltq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmvnq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmvnq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmvnq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmvnq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmvnq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmvnq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vnegq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vnegq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vnegq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqabsq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqabsq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqabsq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqnegq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqnegq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqnegq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev16q_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev16q_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev32q_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev32q_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev32q_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev32q_u8.c: Likewise.

[-- Attachment #2: diff06.patch.gz --]
[-- Type: application/gzip, Size: 9574 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][3/3x]: MVE intrinsics with ternary operands.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (5 preceding siblings ...)
  2019-11-14 19:13 ` [PATCH][ARM][GCC][2/1x]: " Srinath Parvathaneni
@ 2019-11-14 19:13 ` Srinath Parvathaneni
  2019-11-14 19:13 ` [PATCH][ARM][GCC][5/2x]: MVE intrinsics with binary operands Srinath Parvathaneni
                   ` (31 subsequent siblings)
  38 siblings, 0 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 36897 bytes --]

Hello,

This patch supports following MVE ACLE intrinsics with ternary operands.

vrmlaldavhaxq_s32, vrmlsldavhaq_s32, vrmlsldavhaxq_s32, vaddlvaq_p_s32, vcvtbq_m_f16_f32,
vcvtbq_m_f32_f16, vcvttq_m_f16_f32, vcvttq_m_f32_f16, vrev16q_m_s8, vrev32q_m_f16,
vrmlaldavhq_p_s32, vrmlaldavhxq_p_s32, vrmlsldavhq_p_s32, vrmlsldavhxq_p_s32, vaddlvaq_p_u32,
vrev16q_m_u8, vrmlaldavhq_p_u32, vmvnq_m_n_s16, vorrq_m_n_s16, vqrshrntq_n_s16, vqshrnbq_n_s16,
vqshrntq_n_s16, vrshrnbq_n_s16, vrshrntq_n_s16, vshrnbq_n_s16, vshrntq_n_s16, vcmlaq_f16,
vcmlaq_rot180_f16, vcmlaq_rot270_f16, vcmlaq_rot90_f16, vfmaq_f16, vfmaq_n_f16, vfmasq_n_f16,
vfmsq_f16, vmlaldavaq_s16, vmlaldavaxq_s16, vmlsldavaq_s16, vmlsldavaxq_s16, vabsq_m_f16,
vcvtmq_m_s16_f16, vcvtnq_m_s16_f16, vcvtpq_m_s16_f16, vcvtq_m_s16_f16, vdupq_m_n_f16,
vmaxnmaq_m_f16, vmaxnmavq_p_f16, vmaxnmvq_p_f16, vminnmaq_m_f16, vminnmavq_p_f16, 
vminnmvq_p_f16, vmlaldavq_p_s16, vmlaldavxq_p_s16, vmlsldavq_p_s16, vmlsldavxq_p_s16, 
vmovlbq_m_s8, vmovltq_m_s8, vmovnbq_m_s16, vmovntq_m_s16, vnegq_m_f16, vpselq_f16,
vqmovnbq_m_s16, vqmovntq_m_s16, vrev32q_m_s8, vrev64q_m_f16, vrndaq_m_f16, vrndmq_m_f16,
vrndnq_m_f16, vrndpq_m_f16, vrndq_m_f16, vrndxq_m_f16, vcmpeqq_m_n_f16, vcmpgeq_m_f16,
vcmpgeq_m_n_f16, vcmpgtq_m_f16, vcmpgtq_m_n_f16, vcmpleq_m_f16, vcmpleq_m_n_f16,
vcmpltq_m_f16, vcmpltq_m_n_f16, vcmpneq_m_f16, vcmpneq_m_n_f16, vmvnq_m_n_u16,
vorrq_m_n_u16, vqrshruntq_n_s16, vqshrunbq_n_s16, vqshruntq_n_s16, vcvtmq_m_u16_f16,
vcvtnq_m_u16_f16, vcvtpq_m_u16_f16, vcvtq_m_u16_f16, vqmovunbq_m_s16, vqmovuntq_m_s16,
vqrshrntq_n_u16, vqshrnbq_n_u16, vqshrntq_n_u16, vrshrnbq_n_u16, vrshrntq_n_u16, 
vshrnbq_n_u16, vshrntq_n_u16, vmlaldavaq_u16, vmlaldavaxq_u16, vmlaldavq_p_u16,
vmlaldavxq_p_u16, vmovlbq_m_u8, vmovltq_m_u8, vmovnbq_m_u16, vmovntq_m_u16, vqmovnbq_m_u16,
vqmovntq_m_u16, vrev32q_m_u8, vmvnq_m_n_s32, vorrq_m_n_s32, vqrshrntq_n_s32, vqshrnbq_n_s32,
vqshrntq_n_s32, vrshrnbq_n_s32, vrshrntq_n_s32, vshrnbq_n_s32, vshrntq_n_s32, vcmlaq_f32,
vcmlaq_rot180_f32, vcmlaq_rot270_f32, vcmlaq_rot90_f32, vfmaq_f32, vfmaq_n_f32, vfmasq_n_f32,
vfmsq_f32, vmlaldavaq_s32, vmlaldavaxq_s32, vmlsldavaq_s32, vmlsldavaxq_s32, vabsq_m_f32,
vcvtmq_m_s32_f32, vcvtnq_m_s32_f32, vcvtpq_m_s32_f32, vcvtq_m_s32_f32, vdupq_m_n_f32,
vmaxnmaq_m_f32, vmaxnmavq_p_f32, vmaxnmvq_p_f32, vminnmaq_m_f32, vminnmavq_p_f32,
vminnmvq_p_f32, vmlaldavq_p_s32, vmlaldavxq_p_s32, vmlsldavq_p_s32, vmlsldavxq_p_s32,
vmovlbq_m_s16, vmovltq_m_s16, vmovnbq_m_s32, vmovntq_m_s32, vnegq_m_f32, vpselq_f32,
vqmovnbq_m_s32, vqmovntq_m_s32, vrev32q_m_s16, vrev64q_m_f32, vrndaq_m_f32, vrndmq_m_f32,
vrndnq_m_f32, vrndpq_m_f32, vrndq_m_f32, vrndxq_m_f32, vcmpeqq_m_n_f32, vcmpgeq_m_f32,
vcmpgeq_m_n_f32, vcmpgtq_m_f32, vcmpgtq_m_n_f32, vcmpleq_m_f32, vcmpleq_m_n_f32,
vcmpltq_m_f32, vcmpltq_m_n_f32, vcmpneq_m_f32, vcmpneq_m_n_f32, vmvnq_m_n_u32, vorrq_m_n_u32,
vqrshruntq_n_s32, vqshrunbq_n_s32, vqshruntq_n_s32, vcvtmq_m_u32_f32, vcvtnq_m_u32_f32,
vcvtpq_m_u32_f32, vcvtq_m_u32_f32, vqmovunbq_m_s32, vqmovuntq_m_s32, vqrshrntq_n_u32,
vqshrnbq_n_u32, vqshrntq_n_u32, vrshrnbq_n_u32, vrshrntq_n_u32, vshrnbq_n_u32, vshrntq_n_u32,
vmlaldavaq_u32, vmlaldavaxq_u32, vmlaldavq_p_u32, vmlaldavxq_p_u32, vmovlbq_m_u16,
vmovltq_m_u16, vmovnbq_m_u32, vmovntq_m_u32, vqmovnbq_m_u32, vqmovntq_m_u32, vrev32q_m_u16.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-10-29  Andre Vieira  <andre.simoesdiasvieira@arm.com>
            Mihail Ionescu  <mihail.ionescu@arm.com>
            Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm_mve.h (vrmlaldavhaxq_s32): Define macro.
	(vrmlsldavhaq_s32): Likewise.
	(vrmlsldavhaxq_s32): Likewise.
	(vaddlvaq_p_s32): Likewise.
	(vcvtbq_m_f16_f32): Likewise.
	(vcvtbq_m_f32_f16): Likewise.
	(vcvttq_m_f16_f32): Likewise.
	(vcvttq_m_f32_f16): Likewise.
	(vrev16q_m_s8): Likewise.
	(vrev32q_m_f16): Likewise.
	(vrmlaldavhq_p_s32): Likewise.
	(vrmlaldavhxq_p_s32): Likewise.
	(vrmlsldavhq_p_s32): Likewise.
	(vrmlsldavhxq_p_s32): Likewise.
	(vaddlvaq_p_u32): Likewise.
	(vrev16q_m_u8): Likewise.
	(vrmlaldavhq_p_u32): Likewise.
	(vmvnq_m_n_s16): Likewise.
	(vorrq_m_n_s16): Likewise.
	(vqrshrntq_n_s16): Likewise.
	(vqshrnbq_n_s16): Likewise.
	(vqshrntq_n_s16): Likewise.
	(vrshrnbq_n_s16): Likewise.
	(vrshrntq_n_s16): Likewise.
	(vshrnbq_n_s16): Likewise.
	(vshrntq_n_s16): Likewise.
	(vcmlaq_f16): Likewise.
	(vcmlaq_rot180_f16): Likewise.
	(vcmlaq_rot270_f16): Likewise.
	(vcmlaq_rot90_f16): Likewise.
	(vfmaq_f16): Likewise.
	(vfmaq_n_f16): Likewise.
	(vfmasq_n_f16): Likewise.
	(vfmsq_f16): Likewise.
	(vmlaldavaq_s16): Likewise.
	(vmlaldavaxq_s16): Likewise.
	(vmlsldavaq_s16): Likewise.
	(vmlsldavaxq_s16): Likewise.
	(vabsq_m_f16): Likewise.
	(vcvtmq_m_s16_f16): Likewise.
	(vcvtnq_m_s16_f16): Likewise.
	(vcvtpq_m_s16_f16): Likewise.
	(vcvtq_m_s16_f16): Likewise.
	(vdupq_m_n_f16): Likewise.
	(vmaxnmaq_m_f16): Likewise.
	(vmaxnmavq_p_f16): Likewise.
	(vmaxnmvq_p_f16): Likewise.
	(vminnmaq_m_f16): Likewise.
	(vminnmavq_p_f16): Likewise.
	(vminnmvq_p_f16): Likewise.
	(vmlaldavq_p_s16): Likewise.
	(vmlaldavxq_p_s16): Likewise.
	(vmlsldavq_p_s16): Likewise.
	(vmlsldavxq_p_s16): Likewise.
	(vmovlbq_m_s8): Likewise.
	(vmovltq_m_s8): Likewise.
	(vmovnbq_m_s16): Likewise.
	(vmovntq_m_s16): Likewise.
	(vnegq_m_f16): Likewise.
	(vpselq_f16): Likewise.
	(vqmovnbq_m_s16): Likewise.
	(vqmovntq_m_s16): Likewise.
	(vrev32q_m_s8): Likewise.
	(vrev64q_m_f16): Likewise.
	(vrndaq_m_f16): Likewise.
	(vrndmq_m_f16): Likewise.
	(vrndnq_m_f16): Likewise.
	(vrndpq_m_f16): Likewise.
	(vrndq_m_f16): Likewise.
	(vrndxq_m_f16): Likewise.
	(vcmpeqq_m_n_f16): Likewise.
	(vcmpgeq_m_f16): Likewise.
	(vcmpgeq_m_n_f16): Likewise.
	(vcmpgtq_m_f16): Likewise.
	(vcmpgtq_m_n_f16): Likewise.
	(vcmpleq_m_f16): Likewise.
	(vcmpleq_m_n_f16): Likewise.
	(vcmpltq_m_f16): Likewise.
	(vcmpltq_m_n_f16): Likewise.
	(vcmpneq_m_f16): Likewise.
	(vcmpneq_m_n_f16): Likewise.
	(vmvnq_m_n_u16): Likewise.
	(vorrq_m_n_u16): Likewise.
	(vqrshruntq_n_s16): Likewise.
	(vqshrunbq_n_s16): Likewise.
	(vqshruntq_n_s16): Likewise.
	(vcvtmq_m_u16_f16): Likewise.
	(vcvtnq_m_u16_f16): Likewise.
	(vcvtpq_m_u16_f16): Likewise.
	(vcvtq_m_u16_f16): Likewise.
	(vqmovunbq_m_s16): Likewise.
	(vqmovuntq_m_s16): Likewise.
	(vqrshrntq_n_u16): Likewise.
	(vqshrnbq_n_u16): Likewise.
	(vqshrntq_n_u16): Likewise.
	(vrshrnbq_n_u16): Likewise.
	(vrshrntq_n_u16): Likewise.
	(vshrnbq_n_u16): Likewise.
	(vshrntq_n_u16): Likewise.
	(vmlaldavaq_u16): Likewise.
	(vmlaldavaxq_u16): Likewise.
	(vmlaldavq_p_u16): Likewise.
	(vmlaldavxq_p_u16): Likewise.
	(vmovlbq_m_u8): Likewise.
	(vmovltq_m_u8): Likewise.
	(vmovnbq_m_u16): Likewise.
	(vmovntq_m_u16): Likewise.
	(vqmovnbq_m_u16): Likewise.
	(vqmovntq_m_u16): Likewise.
	(vrev32q_m_u8): Likewise.
	(vmvnq_m_n_s32): Likewise.
	(vorrq_m_n_s32): Likewise.
	(vqrshrntq_n_s32): Likewise.
	(vqshrnbq_n_s32): Likewise.
	(vqshrntq_n_s32): Likewise.
	(vrshrnbq_n_s32): Likewise.
	(vrshrntq_n_s32): Likewise.
	(vshrnbq_n_s32): Likewise.
	(vshrntq_n_s32): Likewise.
	(vcmlaq_f32): Likewise.
	(vcmlaq_rot180_f32): Likewise.
	(vcmlaq_rot270_f32): Likewise.
	(vcmlaq_rot90_f32): Likewise.
	(vfmaq_f32): Likewise.
	(vfmaq_n_f32): Likewise.
	(vfmasq_n_f32): Likewise.
	(vfmsq_f32): Likewise.
	(vmlaldavaq_s32): Likewise.
	(vmlaldavaxq_s32): Likewise.
	(vmlsldavaq_s32): Likewise.
	(vmlsldavaxq_s32): Likewise.
	(vabsq_m_f32): Likewise.
	(vcvtmq_m_s32_f32): Likewise.
	(vcvtnq_m_s32_f32): Likewise.
	(vcvtpq_m_s32_f32): Likewise.
	(vcvtq_m_s32_f32): Likewise.
	(vdupq_m_n_f32): Likewise.
	(vmaxnmaq_m_f32): Likewise.
	(vmaxnmavq_p_f32): Likewise.
	(vmaxnmvq_p_f32): Likewise.
	(vminnmaq_m_f32): Likewise.
	(vminnmavq_p_f32): Likewise.
	(vminnmvq_p_f32): Likewise.
	(vmlaldavq_p_s32): Likewise.
	(vmlaldavxq_p_s32): Likewise.
	(vmlsldavq_p_s32): Likewise.
	(vmlsldavxq_p_s32): Likewise.
	(vmovlbq_m_s16): Likewise.
	(vmovltq_m_s16): Likewise.
	(vmovnbq_m_s32): Likewise.
	(vmovntq_m_s32): Likewise.
	(vnegq_m_f32): Likewise.
	(vpselq_f32): Likewise.
	(vqmovnbq_m_s32): Likewise.
	(vqmovntq_m_s32): Likewise.
	(vrev32q_m_s16): Likewise.
	(vrev64q_m_f32): Likewise.
	(vrndaq_m_f32): Likewise.
	(vrndmq_m_f32): Likewise.
	(vrndnq_m_f32): Likewise.
	(vrndpq_m_f32): Likewise.
	(vrndq_m_f32): Likewise.
	(vrndxq_m_f32): Likewise.
	(vcmpeqq_m_n_f32): Likewise.
	(vcmpgeq_m_f32): Likewise.
	(vcmpgeq_m_n_f32): Likewise.
	(vcmpgtq_m_f32): Likewise.
	(vcmpgtq_m_n_f32): Likewise.
	(vcmpleq_m_f32): Likewise.
	(vcmpleq_m_n_f32): Likewise.
	(vcmpltq_m_f32): Likewise.
	(vcmpltq_m_n_f32): Likewise.
	(vcmpneq_m_f32): Likewise.
	(vcmpneq_m_n_f32): Likewise.
	(vmvnq_m_n_u32): Likewise.
	(vorrq_m_n_u32): Likewise.
	(vqrshruntq_n_s32): Likewise.
	(vqshrunbq_n_s32): Likewise.
	(vqshruntq_n_s32): Likewise.
	(vcvtmq_m_u32_f32): Likewise.
	(vcvtnq_m_u32_f32): Likewise.
	(vcvtpq_m_u32_f32): Likewise.
	(vcvtq_m_u32_f32): Likewise.
	(vqmovunbq_m_s32): Likewise.
	(vqmovuntq_m_s32): Likewise.
	(vqrshrntq_n_u32): Likewise.
	(vqshrnbq_n_u32): Likewise.
	(vqshrntq_n_u32): Likewise.
	(vrshrnbq_n_u32): Likewise.
	(vrshrntq_n_u32): Likewise.
	(vshrnbq_n_u32): Likewise.
	(vshrntq_n_u32): Likewise.
	(vmlaldavaq_u32): Likewise.
	(vmlaldavaxq_u32): Likewise.
	(vmlaldavq_p_u32): Likewise.
	(vmlaldavxq_p_u32): Likewise.
	(vmovlbq_m_u16): Likewise.
	(vmovltq_m_u16): Likewise.
	(vmovnbq_m_u32): Likewise.
	(vmovntq_m_u32): Likewise.
	(vqmovnbq_m_u32): Likewise.
	(vqmovntq_m_u32): Likewise.
	(vrev32q_m_u16): Likewise.
	(__arm_vrmlaldavhaxq_s32): Define intrinsic.
	(__arm_vrmlsldavhaq_s32): Likewise.
	(__arm_vrmlsldavhaxq_s32): Likewise.
	(__arm_vaddlvaq_p_s32): Likewise.
	(__arm_vrev16q_m_s8): Likewise.
	(__arm_vrmlaldavhq_p_s32): Likewise.
	(__arm_vrmlaldavhxq_p_s32): Likewise.
	(__arm_vrmlsldavhq_p_s32): Likewise.
	(__arm_vrmlsldavhxq_p_s32): Likewise.
	(__arm_vaddlvaq_p_u32): Likewise.
	(__arm_vrev16q_m_u8): Likewise.
	(__arm_vrmlaldavhq_p_u32): Likewise.
	(__arm_vmvnq_m_n_s16): Likewise.
	(__arm_vorrq_m_n_s16): Likewise.
	(__arm_vqrshrntq_n_s16): Likewise.
	(__arm_vqshrnbq_n_s16): Likewise.
	(__arm_vqshrntq_n_s16): Likewise.
	(__arm_vrshrnbq_n_s16): Likewise.
	(__arm_vrshrntq_n_s16): Likewise.
	(__arm_vshrnbq_n_s16): Likewise.
	(__arm_vshrntq_n_s16): Likewise.
	(__arm_vmlaldavaq_s16): Likewise.
	(__arm_vmlaldavaxq_s16): Likewise.
	(__arm_vmlsldavaq_s16): Likewise.
	(__arm_vmlsldavaxq_s16): Likewise.
	(__arm_vmlaldavq_p_s16): Likewise.
	(__arm_vmlaldavxq_p_s16): Likewise.
	(__arm_vmlsldavq_p_s16): Likewise.
	(__arm_vmlsldavxq_p_s16): Likewise.
	(__arm_vmovlbq_m_s8): Likewise.
	(__arm_vmovltq_m_s8): Likewise.
	(__arm_vmovnbq_m_s16): Likewise.
	(__arm_vmovntq_m_s16): Likewise.
	(__arm_vqmovnbq_m_s16): Likewise.
	(__arm_vqmovntq_m_s16): Likewise.
	(__arm_vrev32q_m_s8): Likewise.
	(__arm_vmvnq_m_n_u16): Likewise.
	(__arm_vorrq_m_n_u16): Likewise.
	(__arm_vqrshruntq_n_s16): Likewise.
	(__arm_vqshrunbq_n_s16): Likewise.
	(__arm_vqshruntq_n_s16): Likewise.
	(__arm_vqmovunbq_m_s16): Likewise.
	(__arm_vqmovuntq_m_s16): Likewise.
	(__arm_vqrshrntq_n_u16): Likewise.
	(__arm_vqshrnbq_n_u16): Likewise.
	(__arm_vqshrntq_n_u16): Likewise.
	(__arm_vrshrnbq_n_u16): Likewise.
	(__arm_vrshrntq_n_u16): Likewise.
	(__arm_vshrnbq_n_u16): Likewise.
	(__arm_vshrntq_n_u16): Likewise.
	(__arm_vmlaldavaq_u16): Likewise.
	(__arm_vmlaldavaxq_u16): Likewise.
	(__arm_vmlaldavq_p_u16): Likewise.
	(__arm_vmlaldavxq_p_u16): Likewise.
	(__arm_vmovlbq_m_u8): Likewise.
	(__arm_vmovltq_m_u8): Likewise.
	(__arm_vmovnbq_m_u16): Likewise.
	(__arm_vmovntq_m_u16): Likewise.
	(__arm_vqmovnbq_m_u16): Likewise.
	(__arm_vqmovntq_m_u16): Likewise.
	(__arm_vrev32q_m_u8): Likewise.
	(__arm_vmvnq_m_n_s32): Likewise.
	(__arm_vorrq_m_n_s32): Likewise.
	(__arm_vqrshrntq_n_s32): Likewise.
	(__arm_vqshrnbq_n_s32): Likewise.
	(__arm_vqshrntq_n_s32): Likewise.
	(__arm_vrshrnbq_n_s32): Likewise.
	(__arm_vrshrntq_n_s32): Likewise.
	(__arm_vshrnbq_n_s32): Likewise.
	(__arm_vshrntq_n_s32): Likewise.
	(__arm_vmlaldavaq_s32): Likewise.
	(__arm_vmlaldavaxq_s32): Likewise.
	(__arm_vmlsldavaq_s32): Likewise.
	(__arm_vmlsldavaxq_s32): Likewise.
	(__arm_vmlaldavq_p_s32): Likewise.
	(__arm_vmlaldavxq_p_s32): Likewise.
	(__arm_vmlsldavq_p_s32): Likewise.
	(__arm_vmlsldavxq_p_s32): Likewise.
	(__arm_vmovlbq_m_s16): Likewise.
	(__arm_vmovltq_m_s16): Likewise.
	(__arm_vmovnbq_m_s32): Likewise.
	(__arm_vmovntq_m_s32): Likewise.
	(__arm_vqmovnbq_m_s32): Likewise.
	(__arm_vqmovntq_m_s32): Likewise.
	(__arm_vrev32q_m_s16): Likewise.
	(__arm_vmvnq_m_n_u32): Likewise.
	(__arm_vorrq_m_n_u32): Likewise.
	(__arm_vqrshruntq_n_s32): Likewise.
	(__arm_vqshrunbq_n_s32): Likewise.
	(__arm_vqshruntq_n_s32): Likewise.
	(__arm_vqmovunbq_m_s32): Likewise.
	(__arm_vqmovuntq_m_s32): Likewise.
	(__arm_vqrshrntq_n_u32): Likewise.
	(__arm_vqshrnbq_n_u32): Likewise.
	(__arm_vqshrntq_n_u32): Likewise.
	(__arm_vrshrnbq_n_u32): Likewise.
	(__arm_vrshrntq_n_u32): Likewise.
	(__arm_vshrnbq_n_u32): Likewise.
	(__arm_vshrntq_n_u32): Likewise.
	(__arm_vmlaldavaq_u32): Likewise.
	(__arm_vmlaldavaxq_u32): Likewise.
	(__arm_vmlaldavq_p_u32): Likewise.
	(__arm_vmlaldavxq_p_u32): Likewise.
	(__arm_vmovlbq_m_u16): Likewise.
	(__arm_vmovltq_m_u16): Likewise.
	(__arm_vmovnbq_m_u32): Likewise.
	(__arm_vmovntq_m_u32): Likewise.
	(__arm_vqmovnbq_m_u32): Likewise.
	(__arm_vqmovntq_m_u32): Likewise.
	(__arm_vrev32q_m_u16): Likewise.
	(__arm_vcvtbq_m_f16_f32): Likewise.
	(__arm_vcvtbq_m_f32_f16): Likewise.
	(__arm_vcvttq_m_f16_f32): Likewise.
	(__arm_vcvttq_m_f32_f16): Likewise.
	(__arm_vrev32q_m_f16): Likewise.
	(__arm_vcmlaq_f16): Likewise.
	(__arm_vcmlaq_rot180_f16): Likewise.
	(__arm_vcmlaq_rot270_f16): Likewise.
	(__arm_vcmlaq_rot90_f16): Likewise.
	(__arm_vfmaq_f16): Likewise.
	(__arm_vfmaq_n_f16): Likewise.
	(__arm_vfmasq_n_f16): Likewise.
	(__arm_vfmsq_f16): Likewise.
	(__arm_vabsq_m_f16): Likewise.
	(__arm_vcvtmq_m_s16_f16): Likewise.
	(__arm_vcvtnq_m_s16_f16): Likewise.
	(__arm_vcvtpq_m_s16_f16): Likewise.
	(__arm_vcvtq_m_s16_f16): Likewise.
	(__arm_vdupq_m_n_f16): Likewise.
	(__arm_vmaxnmaq_m_f16): Likewise.
	(__arm_vmaxnmavq_p_f16): Likewise.
	(__arm_vmaxnmvq_p_f16): Likewise.
	(__arm_vminnmaq_m_f16): Likewise.
	(__arm_vminnmavq_p_f16): Likewise.
	(__arm_vminnmvq_p_f16): Likewise.
	(__arm_vnegq_m_f16): Likewise.
	(__arm_vpselq_f16): Likewise.
	(__arm_vrev64q_m_f16): Likewise.
	(__arm_vrndaq_m_f16): Likewise.
	(__arm_vrndmq_m_f16): Likewise.
	(__arm_vrndnq_m_f16): Likewise.
	(__arm_vrndpq_m_f16): Likewise.
	(__arm_vrndq_m_f16): Likewise.
	(__arm_vrndxq_m_f16): Likewise.
	(__arm_vcmpeqq_m_n_f16): Likewise.
	(__arm_vcmpgeq_m_f16): Likewise.
	(__arm_vcmpgeq_m_n_f16): Likewise.
	(__arm_vcmpgtq_m_f16): Likewise.
	(__arm_vcmpgtq_m_n_f16): Likewise.
	(__arm_vcmpleq_m_f16): Likewise.
	(__arm_vcmpleq_m_n_f16): Likewise.
	(__arm_vcmpltq_m_f16): Likewise.
	(__arm_vcmpltq_m_n_f16): Likewise.
	(__arm_vcmpneq_m_f16): Likewise.
	(__arm_vcmpneq_m_n_f16): Likewise.
	(__arm_vcvtmq_m_u16_f16): Likewise.
	(__arm_vcvtnq_m_u16_f16): Likewise.
	(__arm_vcvtpq_m_u16_f16): Likewise.
	(__arm_vcvtq_m_u16_f16): Likewise.
	(__arm_vcmlaq_f32): Likewise.
	(__arm_vcmlaq_rot180_f32): Likewise.
	(__arm_vcmlaq_rot270_f32): Likewise.
	(__arm_vcmlaq_rot90_f32): Likewise.
	(__arm_vfmaq_f32): Likewise.
	(__arm_vfmaq_n_f32): Likewise.
	(__arm_vfmasq_n_f32): Likewise.
	(__arm_vfmsq_f32): Likewise.
	(__arm_vabsq_m_f32): Likewise.
	(__arm_vcvtmq_m_s32_f32): Likewise.
	(__arm_vcvtnq_m_s32_f32): Likewise.
	(__arm_vcvtpq_m_s32_f32): Likewise.
	(__arm_vcvtq_m_s32_f32): Likewise.
	(__arm_vdupq_m_n_f32): Likewise.
	(__arm_vmaxnmaq_m_f32): Likewise.
	(__arm_vmaxnmavq_p_f32): Likewise.
	(__arm_vmaxnmvq_p_f32): Likewise.
	(__arm_vminnmaq_m_f32): Likewise.
	(__arm_vminnmavq_p_f32): Likewise.
	(__arm_vminnmvq_p_f32): Likewise.
	(__arm_vnegq_m_f32): Likewise.
	(__arm_vpselq_f32): Likewise.
	(__arm_vrev64q_m_f32): Likewise.
	(__arm_vrndaq_m_f32): Likewise.
	(__arm_vrndmq_m_f32): Likewise.
	(__arm_vrndnq_m_f32): Likewise.
	(__arm_vrndpq_m_f32): Likewise.
	(__arm_vrndq_m_f32): Likewise.
	(__arm_vrndxq_m_f32): Likewise.
	(__arm_vcmpeqq_m_n_f32): Likewise.
	(__arm_vcmpgeq_m_f32): Likewise.
	(__arm_vcmpgeq_m_n_f32): Likewise.
	(__arm_vcmpgtq_m_f32): Likewise.
	(__arm_vcmpgtq_m_n_f32): Likewise.
	(__arm_vcmpleq_m_f32): Likewise.
	(__arm_vcmpleq_m_n_f32): Likewise.
	(__arm_vcmpltq_m_f32): Likewise.
	(__arm_vcmpltq_m_n_f32): Likewise.
	(__arm_vcmpneq_m_f32): Likewise.
	(__arm_vcmpneq_m_n_f32): Likewise.
	(__arm_vcvtmq_m_u32_f32): Likewise.
	(__arm_vcvtnq_m_u32_f32): Likewise.
	(__arm_vcvtpq_m_u32_f32): Likewise.
	(__arm_vcvtq_m_u32_f32): Likewise.
	(vcvtq_m): Define polymorphic variant.
	(vabsq_m): Likewise.
	(vcmlaq): Likewise.
	(vcmlaq_rot180): Likewise.
	(vcmlaq_rot270): Likewise.
	(vcmlaq_rot90): Likewise.
	(vcmpeqq_m_n): Likewise.
	(vcmpgeq_m_n): Likewise.
	(vrndxq_m): Likewise.
	(vrndq_m): Likewise.
	(vrndpq_m): Likewise.
	(vcmpgtq_m_n): Likewise.
	(vcmpgtq_m): Likewise.
	(vcmpleq_m): Likewise.
	(vcmpleq_m_n): Likewise.
	(vcmpltq_m_n): Likewise.
	(vcmpltq_m): Likewise.
	(vcmpneq_m): Likewise.
	(vcmpneq_m_n): Likewise.
	(vcvtbq_m): Likewise.
	(vcvttq_m): Likewise.
	(vcvtmq_m): Likewise.
	(vcvtnq_m): Likewise.
	(vcvtpq_m): Likewise.
	(vdupq_m_n): Likewise.
	(vfmaq_n): Likewise.
	(vfmaq): Likewise.
	(vfmasq_n): Likewise.
	(vfmsq): Likewise.
	(vmaxnmaq_m): Likewise.
	(vmaxnmavq_m): Likewise.
	(vmaxnmvq_m): Likewise.
	(vmaxnmavq_p): Likewise.
	(vmaxnmvq_p): Likewise.
	(vminnmaq_m): Likewise.
	(vminnmavq_p): Likewise.
	(vminnmvq_p): Likewise.
	(vrndnq_m): Likewise.
	(vrndaq_m): Likewise.
	(vrndmq_m): Likewise.
	(vrev64q_m): Likewise.
	(vrev32q_m): Likewise.
	(vpselq): Likewise.
	(vnegq_m): Likewise.
	(vcmpgeq_m): Likewise.
	(vshrntq_n): Likewise.
	(vrshrntq_n): Likewise.
	(vmovlbq_m): Likewise.
	(vmovnbq_m): Likewise.
	(vmovntq_m): Likewise.
	(vmvnq_m_n): Likewise.
	(vmvnq_m): Likewise.
	(vshrnbq_n): Likewise.
	(vrshrnbq_n): Likewise.
	(vqshruntq_n): Likewise.
	(vrev16q_m): Likewise.
	(vqshrunbq_n): Likewise.
	(vqshrntq_n): Likewise.
	(vqrshruntq_n): Likewise.
	(vqrshrntq_n): Likewise.
	(vqshrnbq_n): Likewise.
	(vqmovuntq_m): Likewise.
	(vqmovntq_m): Likewise.
	(vqmovnbq_m): Likewise.
	(vorrq_m_n): Likewise.
	(vmovltq_m): Likewise.
	(vqmovunbq_m): Likewise.
	(vaddlvaq_p): Likewise.
	(vmlaldavaq): Likewise.
	(vmlaldavaxq): Likewise.
	(vmlaldavq_p): Likewise.
	(vmlaldavxq_p): Likewise.
	(vmlsldavaq): Likewise.
	(vmlsldavaxq): Likewise.
	(vmlsldavq_p): Likewise.
	(vmlsldavxq_p): Likewise.
	(vrmlaldavhaxq): Likewise.
	(vrmlaldavhq_p): Likewise.
	(vrmlaldavhxq_p): Likewise.
	(vrmlsldavhaq): Likewise.
	(vrmlsldavhaxq): Likewise.
	(vrmlsldavhq_p): Likewise.
	(vrmlsldavhxq_p): Likewise.
	* config/arm/arm_mve_builtins.def (TERNOP_NONE_NONE_IMM_UNONE): Use
	builtin qualifier.
	(TERNOP_NONE_NONE_NONE_IMM): Likewise.
	(TERNOP_NONE_NONE_NONE_NONE): Likewise.
	(TERNOP_NONE_NONE_NONE_UNONE): Likewise.
	(TERNOP_UNONE_NONE_NONE_UNONE): Likewise.
	(TERNOP_UNONE_UNONE_IMM_UNONE): Likewise.
	(TERNOP_UNONE_UNONE_NONE_IMM): Likewise.
	(TERNOP_UNONE_UNONE_NONE_UNONE): Likewise.
	(TERNOP_UNONE_UNONE_UNONE_IMM): Likewise.
	(TERNOP_UNONE_UNONE_UNONE_UNONE): Likewise.
	* config/arm/mve.md (MVE_constraint3): Define mode attribute iterator.
	(MVE_pred3): Likewise.
	(MVE_constraint1): Likewise.
	(MVE_pred1): Likewise.
	(VMLALDAVQ_P): Define iterator.
	(VQMOVNBQ_M): Likewise.
	(VMOVLTQ_M): Likewise.
	(VMOVNBQ_M): Likewise.
	(VRSHRNTQ_N): Likewise.
	(VORRQ_M_N): Likewise.
	(VREV32Q_M): Likewise.
	(VREV16Q_M): Likewise.
	(VQRSHRNTQ_N): Likewise.
	(VMOVNTQ_M): Likewise.
	(VMOVLBQ_M): Likewise.
	(VMLALDAVAQ): Likewise.
	(VQSHRNBQ_N): Likewise.
	(VSHRNBQ_N): Likewise.
	(VRSHRNBQ_N): Likewise.
	(VMLALDAVXQ_P): Likewise.
	(VQMOVNTQ_M): Likewise.
	(VMVNQ_M_N): Likewise.
	(VQSHRNTQ_N): Likewise.
	(VMLALDAVAXQ): Likewise.
	(VSHRNTQ_N): Likewise.
	(VCVTMQ_M): Likewise.
	(VCVTNQ_M): Likewise.
	(VCVTPQ_M): Likewise.
	(VCVTQ_M_N_FROM_F): Likewise.
	(VCVTQ_M_FROM_F): Likewise.
	(VRMLALDAVHQ_P): Likewise.
	(VADDLVAQ_P): Likewise.
	(mve_vrndq_m_f<mode>): Define RTL pattern.
	(mve_vabsq_m_f<mode>): Likewise.
	(mve_vaddlvaq_p_<supf>v4si): Likewise.
	(mve_vcmlaq_f<mode>): Likewise.
	(mve_vcmlaq_rot180_f<mode>): Likewise.
	(mve_vcmlaq_rot270_f<mode>): Likewise.
	(mve_vcmlaq_rot90_f<mode>): Likewise.
	(mve_vcmpeqq_m_n_f<mode>): Likewise.
	(mve_vcmpgeq_m_f<mode>): Likewise.
	(mve_vcmpgeq_m_n_f<mode>): Likewise.
	(mve_vcmpgtq_m_f<mode>): Likewise.
	(mve_vcmpgtq_m_n_f<mode>): Likewise.
	(mve_vcmpleq_m_f<mode>): Likewise.
	(mve_vcmpleq_m_n_f<mode>): Likewise.
	(mve_vcmpltq_m_f<mode>): Likewise.
	(mve_vcmpltq_m_n_f<mode>): Likewise.
	(mve_vcmpneq_m_f<mode>): Likewise.
	(mve_vcmpneq_m_n_f<mode>): Likewise.
	(mve_vcvtbq_m_f16_f32v8hf): Likewise.
	(mve_vcvtbq_m_f32_f16v4sf): Likewise.
	(mve_vcvttq_m_f16_f32v8hf): Likewise.
	(mve_vcvttq_m_f32_f16v4sf): Likewise.
	(mve_vdupq_m_n_f<mode>): Likewise.
	(mve_vfmaq_f<mode>): Likewise.
	(mve_vfmaq_n_f<mode>): Likewise.
	(mve_vfmasq_n_f<mode>): Likewise.
	(mve_vfmsq_f<mode>): Likewise.
	(mve_vmaxnmaq_m_f<mode>): Likewise.
	(mve_vmaxnmavq_p_f<mode>): Likewise.
	(mve_vmaxnmvq_p_f<mode>): Likewise.
	(mve_vminnmaq_m_f<mode>): Likewise.
	(mve_vminnmavq_p_f<mode>): Likewise.
	(mve_vminnmvq_p_f<mode>): Likewise.
	(mve_vmlaldavaq_<supf><mode>): Likewise.
	(mve_vmlaldavaxq_<supf><mode>): Likewise.
	(mve_vmlaldavq_p_<supf><mode>): Likewise.
	(mve_vmlaldavxq_p_<supf><mode>): Likewise.
	(mve_vmlsldavaq_s<mode>): Likewise.
	(mve_vmlsldavaxq_s<mode>): Likewise.
	(mve_vmlsldavq_p_s<mode>): Likewise.
	(mve_vmlsldavxq_p_s<mode>): Likewise.
	(mve_vmovlbq_m_<supf><mode>): Likewise.
	(mve_vmovltq_m_<supf><mode>): Likewise.
	(mve_vmovnbq_m_<supf><mode>): Likewise.
	(mve_vmovntq_m_<supf><mode>): Likewise.
	(mve_vmvnq_m_n_<supf><mode>): Likewise.
	(mve_vnegq_m_f<mode>): Likewise.
	(mve_vorrq_m_n_<supf><mode>): Likewise.
	(mve_vpselq_f<mode>): Likewise.
	(mve_vqmovnbq_m_<supf><mode>): Likewise.
	(mve_vqmovntq_m_<supf><mode>): Likewise.
	(mve_vqmovunbq_m_s<mode>): Likewise.
	(mve_vqmovuntq_m_s<mode>): Likewise.
	(mve_vqrshrntq_n_<supf><mode>): Likewise.
	(mve_vqrshruntq_n_s<mode>): Likewise.
	(mve_vqshrnbq_n_<supf><mode>): Likewise.
	(mve_vqshrntq_n_<supf><mode>): Likewise.
	(mve_vqshrunbq_n_s<mode>): Likewise.
	(mve_vqshruntq_n_s<mode>): Likewise.
	(mve_vrev32q_m_fv8hf): Likewise.
	(mve_vrev32q_m_<supf><mode>): Likewise.
	(mve_vrev64q_m_f<mode>): Likewise.
	(mve_vrmlaldavhaxq_sv4si): Likewise.
	(mve_vrmlaldavhxq_p_sv4si): Likewise.
	(mve_vrmlsldavhaxq_sv4si): Likewise.
	(mve_vrmlsldavhq_p_sv4si): Likewise.
	(mve_vrmlsldavhxq_p_sv4si): Likewise.
	(mve_vrndaq_m_f<mode>): Likewise.
	(mve_vrndmq_m_f<mode>): Likewise.
	(mve_vrndnq_m_f<mode>): Likewise.
	(mve_vrndpq_m_f<mode>): Likewise.
	(mve_vrndxq_m_f<mode>): Likewise.
	(mve_vrshrnbq_n_<supf><mode>): Likewise.
	(mve_vrshrntq_n_<supf><mode>): Likewise.
	(mve_vshrnbq_n_<supf><mode>): Likewise.
	(mve_vshrntq_n_<supf><mode>): Likewise.
	(mve_vcvtmq_m_<supf><mode>): Likewise.
	(mve_vcvtpq_m_<supf><mode>): Likewise.
	(mve_vcvtnq_m_<supf><mode>): Likewise.
	(mve_vcvtq_m_n_from_f_<supf><mode>): Likewise.
	(mve_vrev16q_m_<supf>v16qi): Likewise.
	(mve_vcvtq_m_from_f_<supf><mode>): Likewise.
	(mve_vrmlaldavhq_p_<supf>v4si): Likewise.
	(mve_vrmlsldavhaq_sv4si): Likewise.
	
gcc/testsuite/ChangeLog:

2019-10-29  Andre Vieira  <andre.simoesdiasvieira@arm.com>
            Mihail Ionescu  <mihail.ionescu@arm.com>
            Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vabsq_m_f16.c: New test.
	* gcc.target/arm/mve/intrinsics/vabsq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddlvaq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddlvaq_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmlaq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmlaq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmlaq_rot180_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmlaq_rot180_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmlaq_rot270_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmlaq_rot270_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmlaq_rot90_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmlaq_rot90_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_m_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_m_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_m_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_m_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtbq_m_f16_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtbq_m_f32_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtmq_m_s16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtmq_m_s32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtmq_m_u16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtmq_m_u32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtnq_m_s16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtnq_m_s32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtnq_m_u16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtnq_m_u32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtpq_m_s16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtpq_m_s32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtpq_m_u16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtpq_m_u32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_m_s16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_m_s32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_m_u16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_m_u32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvttq_m_f16_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvttq_m_f32_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_m_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_m_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vfmaq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vfmaq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vfmaq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vfmaq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vfmasq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vfmasq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vfmsq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vfmsq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmaq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmaq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmaq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmaq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmavq_p_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmavq_p_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmvq_p_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmvq_p_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavaq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavaq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavaq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavaq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavaxq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavaxq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavaxq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavaxq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavq_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavq_p_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavq_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavxq_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavxq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavxq_p_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavxq_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsldavaq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsldavaq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsldavaxq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsldavaxq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsldavq_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsldavq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsldavxq_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsldavxq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovlbq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovlbq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovlbq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovlbq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovltq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovltq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovltq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovltq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovnbq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovnbq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovnbq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovnbq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovntq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovntq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovntq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovntq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmvnq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmvnq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmvnq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmvnq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vnegq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vnegq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vpselq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vpselq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqmovnbq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqmovnbq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqmovnbq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqmovnbq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqmovntq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqmovntq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqmovntq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqmovntq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqmovunbq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqmovunbq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqmovuntq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqmovuntq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshrntq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshrntq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshrntq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshrntq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshruntq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshruntq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshrnbq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshrnbq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshrnbq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshrnbq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshrntq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshrntq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshrntq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshrntq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshrunbq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshrunbq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshruntq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshruntq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev16q_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev16q_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev32q_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev32q_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev32q_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev32q_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev32q_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev64q_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev64q_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmlaldavhaxq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmlaldavhq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmlaldavhq_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmlaldavhxq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmlsldavhaq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmlsldavhaxq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmlsldavhq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmlsldavhxq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndaq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndaq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndmq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndmq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndnq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndnq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndpq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndpq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndxq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndxq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrnbq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrnbq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrnbq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrnbq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrntq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrntq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrntq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrntq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrnbq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrnbq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrnbq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrnbq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrntq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrntq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrntq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrntq_n_u32.c: Likewise.

[-- Attachment #2: diff15.patch.gz --]
[-- Type: application/gzip, Size: 29303 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][3/x]: MVE ACLE intrinsics framework patch.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
  2019-11-14 19:13 ` [PATCH][ARM][GCC][2/2x]: MVE intrinsics with binary operands Srinath Parvathaneni
@ 2019-11-14 19:13 ` Srinath Parvathaneni
  2019-12-19 17:50   ` Kyrill Tkachov
  2019-11-14 19:13 ` [PATCH][ARM][GCC][4/x]: MVE ACLE vector interleaving store intrinsics Srinath Parvathaneni
                   ` (36 subsequent siblings)
  38 siblings, 1 reply; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 2922 bytes --]

Hello,

This patch is part of MVE ACLE intrinsics framework.

The patch supports the use of emulation for the double-precision arithmetic
operations for MVE. This changes are to support the MVE ACLE intrinsics which
operates on vector floating point arithmetic operations.

Please refer to Arm reference manual [1] for more details.
[1] https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf?_ga=2.102521798.659307368.1572453718-1501600630.1548848914

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-11  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm.c (arm_libcall_uses_aapcs_base): Modify function to add
	emulator calls for dobule precision arithmetic operations for MVE.


###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6faed76206b93c1a9dea048e2f693dc16ee58072..358b2638b65a2007d1c7e8062844b67682597f45 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -5658,9 +5658,25 @@ arm_libcall_uses_aapcs_base (const_rtx libcall)
       /* Values from double-precision helper functions are returned in core
 	 registers if the selected core only supports single-precision
 	 arithmetic, even if we are using the hard-float ABI.  The same is
-	 true for single-precision helpers, but we will never be using the
-	 hard-float ABI on a CPU which doesn't support single-precision
-	 operations in hardware.  */
+	 true for single-precision helpers except in case of MVE, because in
+	 MVE we will be using the hard-float ABI on a CPU which doesn't support
+	 single-precision operations in hardware.  In MVE the following check
+	 enables use of emulation for the double-precision arithmetic
+	 operations.  */
+      if (TARGET_HAVE_MVE)
+	{
+	  add_libcall (libcall_htab, optab_libfunc (add_optab, SFmode));
+	  add_libcall (libcall_htab, optab_libfunc (sdiv_optab, SFmode));
+	  add_libcall (libcall_htab, optab_libfunc (smul_optab, SFmode));
+	  add_libcall (libcall_htab, optab_libfunc (neg_optab, SFmode));
+	  add_libcall (libcall_htab, optab_libfunc (sub_optab, SFmode));
+	  add_libcall (libcall_htab, optab_libfunc (eq_optab, SFmode));
+	  add_libcall (libcall_htab, optab_libfunc (lt_optab, SFmode));
+	  add_libcall (libcall_htab, optab_libfunc (le_optab, SFmode));
+	  add_libcall (libcall_htab, optab_libfunc (ge_optab, SFmode));
+	  add_libcall (libcall_htab, optab_libfunc (gt_optab, SFmode));
+	  add_libcall (libcall_htab, optab_libfunc (unord_optab, SFmode));
+	}
       add_libcall (libcall_htab, optab_libfunc (add_optab, DFmode));
       add_libcall (libcall_htab, optab_libfunc (sdiv_optab, DFmode));
       add_libcall (libcall_htab, optab_libfunc (smul_optab, DFmode));


[-- Attachment #2: diff02.patch --]
[-- Type: text/plain, Size: 1940 bytes --]

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6faed76206b93c1a9dea048e2f693dc16ee58072..358b2638b65a2007d1c7e8062844b67682597f45 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -5658,9 +5658,25 @@ arm_libcall_uses_aapcs_base (const_rtx libcall)
       /* Values from double-precision helper functions are returned in core
 	 registers if the selected core only supports single-precision
 	 arithmetic, even if we are using the hard-float ABI.  The same is
-	 true for single-precision helpers, but we will never be using the
-	 hard-float ABI on a CPU which doesn't support single-precision
-	 operations in hardware.  */
+	 true for single-precision helpers except in case of MVE, because in
+	 MVE we will be using the hard-float ABI on a CPU which doesn't support
+	 single-precision operations in hardware.  In MVE the following check
+	 enables use of emulation for the double-precision arithmetic
+	 operations.  */
+      if (TARGET_HAVE_MVE)
+	{
+	  add_libcall (libcall_htab, optab_libfunc (add_optab, SFmode));
+	  add_libcall (libcall_htab, optab_libfunc (sdiv_optab, SFmode));
+	  add_libcall (libcall_htab, optab_libfunc (smul_optab, SFmode));
+	  add_libcall (libcall_htab, optab_libfunc (neg_optab, SFmode));
+	  add_libcall (libcall_htab, optab_libfunc (sub_optab, SFmode));
+	  add_libcall (libcall_htab, optab_libfunc (eq_optab, SFmode));
+	  add_libcall (libcall_htab, optab_libfunc (lt_optab, SFmode));
+	  add_libcall (libcall_htab, optab_libfunc (le_optab, SFmode));
+	  add_libcall (libcall_htab, optab_libfunc (ge_optab, SFmode));
+	  add_libcall (libcall_htab, optab_libfunc (gt_optab, SFmode));
+	  add_libcall (libcall_htab, optab_libfunc (unord_optab, SFmode));
+	}
       add_libcall (libcall_htab, optab_libfunc (add_optab, DFmode));
       add_libcall (libcall_htab, optab_libfunc (sdiv_optab, DFmode));
       add_libcall (libcall_htab, optab_libfunc (smul_optab, DFmode));


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][3/4x]: MVE intrinsics with quaternary operands.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (7 preceding siblings ...)
  2019-11-14 19:13 ` [PATCH][ARM][GCC][5/2x]: MVE intrinsics with binary operands Srinath Parvathaneni
@ 2019-11-14 19:13 ` Srinath Parvathaneni
  2019-11-14 19:13 ` [PATCH][ARM][GCC][3/2x]: MVE intrinsics with binary operands Srinath Parvathaneni
                   ` (29 subsequent siblings)
  38 siblings, 0 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 15272 bytes --]

Hello,

This patch supports following MVE ACLE intrinsics with quaternary operands.

vmlaldavaq_p_s16, vmlaldavaq_p_s32, vmlaldavaq_p_u16, vmlaldavaq_p_u32, vmlaldavaxq_p_s16,
vmlaldavaxq_p_s32, vmlaldavaxq_p_u16, vmlaldavaxq_p_u32, vmlsldavaq_p_s16, vmlsldavaq_p_s32,
vmlsldavaxq_p_s16, vmlsldavaxq_p_s32, vmullbq_poly_m_p16, vmullbq_poly_m_p8,
vmulltq_poly_m_p16, vmulltq_poly_m_p8, vqdmullbq_m_n_s16, vqdmullbq_m_n_s32, vqdmullbq_m_s16,
vqdmullbq_m_s32, vqdmulltq_m_n_s16, vqdmulltq_m_n_s32, vqdmulltq_m_s16, vqdmulltq_m_s32,
vqrshrnbq_m_n_s16, vqrshrnbq_m_n_s32, vqrshrnbq_m_n_u16, vqrshrnbq_m_n_u32, vqrshrntq_m_n_s16,
vqrshrntq_m_n_s32, vqrshrntq_m_n_u16, vqrshrntq_m_n_u32, vqrshrunbq_m_n_s16, vqrshrunbq_m_n_s32,
vqrshruntq_m_n_s16, vqrshruntq_m_n_s32, vqshrnbq_m_n_s16, vqshrnbq_m_n_s32, vqshrnbq_m_n_u16,
vqshrnbq_m_n_u32, vqshrntq_m_n_s16, vqshrntq_m_n_s32, vqshrntq_m_n_u16, vqshrntq_m_n_u32,
vqshrunbq_m_n_s16, vqshrunbq_m_n_s32, vqshruntq_m_n_s16, vqshruntq_m_n_s32, vrmlaldavhaq_p_s32,
vrmlaldavhaq_p_u32, vrmlaldavhaxq_p_s32, vrmlsldavhaq_p_s32, vrmlsldavhaxq_p_s32,
vrshrnbq_m_n_s16, vrshrnbq_m_n_s32, vrshrnbq_m_n_u16, vrshrnbq_m_n_u32, vrshrntq_m_n_s16,
vrshrntq_m_n_s32, vrshrntq_m_n_u16, vrshrntq_m_n_u32, vshllbq_m_n_s16, vshllbq_m_n_s8,
vshllbq_m_n_u16, vshllbq_m_n_u8, vshlltq_m_n_s16, vshlltq_m_n_s8, vshlltq_m_n_u16, vshlltq_m_n_u8,
vshrnbq_m_n_s16, vshrnbq_m_n_s32, vshrnbq_m_n_u16, vshrnbq_m_n_u32, vshrntq_m_n_s16,
vshrntq_m_n_s32, vshrntq_m_n_u16, vshrntq_m_n_u32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-10-31  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm-protos.h (arm_mve_immediate_check): 
	* config/arm/arm.c (arm_mve_immediate_check): Define fuction to	check
	mode and interger value.
	* config/arm/arm_mve.h (vmlaldavaq_p_s32): Define macro.
	(vmlaldavaq_p_s16): Likewise.
	(vmlaldavaq_p_u32): Likewise.
	(vmlaldavaq_p_u16): Likewise.
	(vmlaldavaxq_p_s32): Likewise.
	(vmlaldavaxq_p_s16): Likewise.
	(vmlaldavaxq_p_u32): Likewise.
	(vmlaldavaxq_p_u16): Likewise.
	(vmlsldavaq_p_s32): Likewise.
	(vmlsldavaq_p_s16): Likewise.
	(vmlsldavaxq_p_s32): Likewise.
	(vmlsldavaxq_p_s16): Likewise.
	(vmullbq_poly_m_p8): Likewise.
	(vmullbq_poly_m_p16): Likewise.
	(vmulltq_poly_m_p8): Likewise.
	(vmulltq_poly_m_p16): Likewise.
	(vqdmullbq_m_n_s32): Likewise.
	(vqdmullbq_m_n_s16): Likewise.
	(vqdmullbq_m_s32): Likewise.
	(vqdmullbq_m_s16): Likewise.
	(vqdmulltq_m_n_s32): Likewise.
	(vqdmulltq_m_n_s16): Likewise.
	(vqdmulltq_m_s32): Likewise.
	(vqdmulltq_m_s16): Likewise.
	(vqrshrnbq_m_n_s32): Likewise.
	(vqrshrnbq_m_n_s16): Likewise.
	(vqrshrnbq_m_n_u32): Likewise.
	(vqrshrnbq_m_n_u16): Likewise.
	(vqrshrntq_m_n_s32): Likewise.
	(vqrshrntq_m_n_s16): Likewise.
	(vqrshrntq_m_n_u32): Likewise.
	(vqrshrntq_m_n_u16): Likewise.
	(vqrshrunbq_m_n_s32): Likewise.
	(vqrshrunbq_m_n_s16): Likewise.
	(vqrshruntq_m_n_s32): Likewise.
	(vqrshruntq_m_n_s16): Likewise.
	(vqshrnbq_m_n_s32): Likewise.
	(vqshrnbq_m_n_s16): Likewise.
	(vqshrnbq_m_n_u32): Likewise.
	(vqshrnbq_m_n_u16): Likewise.
	(vqshrntq_m_n_s32): Likewise.
	(vqshrntq_m_n_s16): Likewise.
	(vqshrntq_m_n_u32): Likewise.
	(vqshrntq_m_n_u16): Likewise.
	(vqshrunbq_m_n_s32): Likewise.
	(vqshrunbq_m_n_s16): Likewise.
	(vqshruntq_m_n_s32): Likewise.
	(vqshruntq_m_n_s16): Likewise.
	(vrmlaldavhaq_p_s32): Likewise.
	(vrmlaldavhaq_p_u32): Likewise.
	(vrmlaldavhaxq_p_s32): Likewise.
	(vrmlsldavhaq_p_s32): Likewise.
	(vrmlsldavhaxq_p_s32): Likewise.
	(vrshrnbq_m_n_s32): Likewise.
	(vrshrnbq_m_n_s16): Likewise.
	(vrshrnbq_m_n_u32): Likewise.
	(vrshrnbq_m_n_u16): Likewise.
	(vrshrntq_m_n_s32): Likewise.
	(vrshrntq_m_n_s16): Likewise.
	(vrshrntq_m_n_u32): Likewise.
	(vrshrntq_m_n_u16): Likewise.
	(vshllbq_m_n_s8): Likewise.
	(vshllbq_m_n_s16): Likewise.
	(vshllbq_m_n_u8): Likewise.
	(vshllbq_m_n_u16): Likewise.
	(vshlltq_m_n_s8): Likewise.
	(vshlltq_m_n_s16): Likewise.
	(vshlltq_m_n_u8): Likewise.
	(vshlltq_m_n_u16): Likewise.
	(vshrnbq_m_n_s32): Likewise.
	(vshrnbq_m_n_s16): Likewise.
	(vshrnbq_m_n_u32): Likewise.
	(vshrnbq_m_n_u16): Likewise.
	(vshrntq_m_n_s32): Likewise.
	(vshrntq_m_n_s16): Likewise.
	(vshrntq_m_n_u32): Likewise.
	(vshrntq_m_n_u16): Likewise.
	(__arm_vmlaldavaq_p_s32): Define intrinsic.
	(__arm_vmlaldavaq_p_s16): Likewise.
	(__arm_vmlaldavaq_p_u32): Likewise.
	(__arm_vmlaldavaq_p_u16): Likewise.
	(__arm_vmlaldavaxq_p_s32): Likewise.
	(__arm_vmlaldavaxq_p_s16): Likewise.
	(__arm_vmlaldavaxq_p_u32): Likewise.
	(__arm_vmlaldavaxq_p_u16): Likewise.
	(__arm_vmlsldavaq_p_s32): Likewise.
	(__arm_vmlsldavaq_p_s16): Likewise.
	(__arm_vmlsldavaxq_p_s32): Likewise.
	(__arm_vmlsldavaxq_p_s16): Likewise.
	(__arm_vmullbq_poly_m_p8): Likewise.
	(__arm_vmullbq_poly_m_p16): Likewise.
	(__arm_vmulltq_poly_m_p8): Likewise.
	(__arm_vmulltq_poly_m_p16): Likewise.
	(__arm_vqdmullbq_m_n_s32): Likewise.
	(__arm_vqdmullbq_m_n_s16): Likewise.
	(__arm_vqdmullbq_m_s32): Likewise.
	(__arm_vqdmullbq_m_s16): Likewise.
	(__arm_vqdmulltq_m_n_s32): Likewise.
	(__arm_vqdmulltq_m_n_s16): Likewise.
	(__arm_vqdmulltq_m_s32): Likewise.
	(__arm_vqdmulltq_m_s16): Likewise.
	(__arm_vqrshrnbq_m_n_s32): Likewise.
	(__arm_vqrshrnbq_m_n_s16): Likewise.
	(__arm_vqrshrnbq_m_n_u32): Likewise.
	(__arm_vqrshrnbq_m_n_u16): Likewise.
	(__arm_vqrshrntq_m_n_s32): Likewise.
	(__arm_vqrshrntq_m_n_s16): Likewise.
	(__arm_vqrshrntq_m_n_u32): Likewise.
	(__arm_vqrshrntq_m_n_u16): Likewise.
	(__arm_vqrshrunbq_m_n_s32): Likewise.
	(__arm_vqrshrunbq_m_n_s16): Likewise.
	(__arm_vqrshruntq_m_n_s32): Likewise.
	(__arm_vqrshruntq_m_n_s16): Likewise.
	(__arm_vqshrnbq_m_n_s32): Likewise.
	(__arm_vqshrnbq_m_n_s16): Likewise.
	(__arm_vqshrnbq_m_n_u32): Likewise.
	(__arm_vqshrnbq_m_n_u16): Likewise.
	(__arm_vqshrntq_m_n_s32): Likewise.
	(__arm_vqshrntq_m_n_s16): Likewise.
	(__arm_vqshrntq_m_n_u32): Likewise.
	(__arm_vqshrntq_m_n_u16): Likewise.
	(__arm_vqshrunbq_m_n_s32): Likewise.
	(__arm_vqshrunbq_m_n_s16): Likewise.
	(__arm_vqshruntq_m_n_s32): Likewise.
	(__arm_vqshruntq_m_n_s16): Likewise.
	(__arm_vrmlaldavhaq_p_s32): Likewise.
	(__arm_vrmlaldavhaq_p_u32): Likewise.
	(__arm_vrmlaldavhaxq_p_s32): Likewise.
	(__arm_vrmlsldavhaq_p_s32): Likewise.
	(__arm_vrmlsldavhaxq_p_s32): Likewise.
	(__arm_vrshrnbq_m_n_s32): Likewise.
	(__arm_vrshrnbq_m_n_s16): Likewise.
	(__arm_vrshrnbq_m_n_u32): Likewise.
	(__arm_vrshrnbq_m_n_u16): Likewise.
	(__arm_vrshrntq_m_n_s32): Likewise.
	(__arm_vrshrntq_m_n_s16): Likewise.
	(__arm_vrshrntq_m_n_u32): Likewise.
	(__arm_vrshrntq_m_n_u16): Likewise.
	(__arm_vshllbq_m_n_s8): Likewise.
	(__arm_vshllbq_m_n_s16): Likewise.
	(__arm_vshllbq_m_n_u8): Likewise.
	(__arm_vshllbq_m_n_u16): Likewise.
	(__arm_vshlltq_m_n_s8): Likewise.
	(__arm_vshlltq_m_n_s16): Likewise.
	(__arm_vshlltq_m_n_u8): Likewise.
	(__arm_vshlltq_m_n_u16): Likewise.
	(__arm_vshrnbq_m_n_s32): Likewise.
	(__arm_vshrnbq_m_n_s16): Likewise.
	(__arm_vshrnbq_m_n_u32): Likewise.
	(__arm_vshrnbq_m_n_u16): Likewise.
	(__arm_vshrntq_m_n_s32): Likewise.
	(__arm_vshrntq_m_n_s16): Likewise.
	(__arm_vshrntq_m_n_u32): Likewise.
	(__arm_vshrntq_m_n_u16): Likewise.
	(vmullbq_poly_m): Define polymorphic variant.
	(vmulltq_poly_m): Likewise.
	(vshllbq_m): Likewise.
	(vshrntq_m_n): Likewise.
	(vshrnbq_m_n): Likewise.
	(vshlltq_m_n): Likewise.
	(vshllbq_m_n): Likewise.
	(vrshrntq_m_n): Likewise.
	(vrshrnbq_m_n): Likewise.
	(vqshruntq_m_n): Likewise.
	(vqshrunbq_m_n): Likewise.
	(vqdmullbq_m_n): Likewise.
	(vqdmullbq_m): Likewise.
	(vqdmulltq_m_n): Likewise.
	(vqdmulltq_m): Likewise.
	(vqrshrnbq_m_n): Likewise.
	(vqrshrntq_m_n): Likewise.
	(vqrshrunbq_m_n): Likewise.
	(vqrshruntq_m_n): Likewise.
	(vqshrnbq_m_n): Likewise.
	(vqshrntq_m_n): Likewise.
	* config/arm/arm_mve_builtins.def (QUADOP_NONE_NONE_NONE_IMM_UNONE): Use
	builtin qualifiers.
	(QUADOP_NONE_NONE_NONE_NONE_UNONE): Likewise.
	(QUADOP_UNONE_UNONE_NONE_IMM_UNONE): Likewise.
	(QUADOP_UNONE_UNONE_UNONE_IMM_UNONE): Likewise.
	(QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE): Likewise.
	* config/arm/mve.md (VMLALDAVAQ_P): Define iterator.
	(VMLALDAVAXQ_P): Likewise.
	(VQRSHRNBQ_M_N): Likewise.
	(VQRSHRNTQ_M_N): Likewise.
	(VQSHRNBQ_M_N): Likewise.
	(VQSHRNTQ_M_N): Likewise.
	(VRSHRNBQ_M_N): Likewise.
	(VRSHRNTQ_M_N): Likewise.
	(VSHLLBQ_M_N): Likewise.
	(VSHLLTQ_M_N): Likewise.
	(VSHRNBQ_M_N): Likewise.
	(VSHRNTQ_M_N): Likewise.
	(mve_vmlaldavaq_p_<supf><mode>): Define RTL pattern.
	(mve_vmlaldavaxq_p_<supf><mode>): Likewise.
	(mve_vqrshrnbq_m_n_<supf><mode>): Likewise.
	(mve_vqrshrntq_m_n_<supf><mode>): Likewise.
	(mve_vqshrnbq_m_n_<supf><mode>): Likewise.
	(mve_vqshrntq_m_n_<supf><mode>): Likewise.
	(mve_vrmlaldavhaq_p_sv4si): Likewise.
	(mve_vrshrnbq_m_n_<supf><mode>): Likewise.
	(mve_vrshrntq_m_n_<supf><mode>): Likewise.
	(mve_vshllbq_m_n_<supf><mode>): Likewise.
	(mve_vshlltq_m_n_<supf><mode>): Likewise.
	(mve_vshrnbq_m_n_<supf><mode>): Likewise.
	(mve_vshrntq_m_n_<supf><mode>): Likewise.
	(mve_vmlsldavaq_p_s<mode>): Likewise.
	(mve_vmlsldavaxq_p_s<mode>): Likewise.
	(mve_vmullbq_poly_m_p<mode>): Likewise.
	(mve_vmulltq_poly_m_p<mode>): Likewise.
	(mve_vqdmullbq_m_n_s<mode>): Likewise.
	(mve_vqdmullbq_m_s<mode>): Likewise.
	(mve_vqdmulltq_m_n_s<mode>): Likewise.
	(mve_vqdmulltq_m_s<mode>): Likewise.
	(mve_vqrshrunbq_m_n_s<mode>): Likewise.
	(mve_vqrshruntq_m_n_s<mode>): Likewise.
	(mve_vqshrunbq_m_n_s<mode>): Likewise.
	(mve_vqshruntq_m_n_s<mode>): Likewise.
	(mve_vrmlaldavhaq_p_uv4si): Likewise.
	(mve_vrmlaldavhaxq_p_sv4si): Likewise.
	(mve_vrmlsldavhaq_p_sv4si): Likewise.
	(mve_vrmlsldavhaxq_p_sv4si): Likewise.

gcc/testsuite/ChangeLog:

2019-10-31  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vmlaldavaq_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavaq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavaq_p_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavaq_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsldavaq_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsldavaq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsldavaxq_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsldavaxq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmullbq_poly_m_p16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmullbq_poly_m_p8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulltq_poly_m_p16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulltq_poly_m_p8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmullbq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmullbq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmullbq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmullbq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulltq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulltq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulltq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulltq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshrnbq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshrnbq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshrnbq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshrnbq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshrntq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshrntq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshrntq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshrntq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshrunbq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshrunbq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshruntq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshruntq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshrnbq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshrnbq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshrnbq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshrnbq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshrntq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshrntq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshrntq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshrntq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshrunbq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshrunbq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshruntq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshruntq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmlaldavhaxq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmlsldavhaq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmlsldavhaxq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrnbq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrnbq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrnbq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrnbq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrntq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrntq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrntq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrntq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshllbq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshllbq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshllbq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshllbq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlltq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlltq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlltq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlltq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrnbq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrnbq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrnbq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrnbq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrntq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrntq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrntq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrntq_m_n_u32.c: Likewise.

[-- Attachment #2: diff18.patch.gz --]
[-- Type: application/gzip, Size: 10979 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][2/2x]: MVE intrinsics with binary operands.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
@ 2019-11-14 19:13 ` Srinath Parvathaneni
  2019-11-14 19:13 ` [PATCH][ARM][GCC][3/x]: MVE ACLE intrinsics framework patch Srinath Parvathaneni
                   ` (37 subsequent siblings)
  38 siblings, 0 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 33680 bytes --]

Hello,

This patch supports following MVE ACLE intrinsics with binary operands.

vcvtq_n_s16_f16, vcvtq_n_s32_f32, vcvtq_n_u16_f16, vcvtq_n_u32_f32, vcreateq_u8,
vcreateq_u16, vcreateq_u32, vcreateq_u64, vcreateq_s8, vcreateq_s16, vcreateq_s32,
vcreateq_s64, vshrq_n_s8, vshrq_n_s16, vshrq_n_s32, vshrq_n_u8, vshrq_n_u16, vshrq_n_u32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

In this patch new constraints "Rb" and "Rf" are added, which checks the constant is with
in the range of 1 to 8 and 1 to 32 respectively.

Also a new predicates "mve_imm_8" and "mve_imm_32" are added, to check the the matching
constraint Rb and Rf respectively.

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-10-21  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm-builtins.c (BINOP_UNONE_UNONE_IMM_QUALIFIERS): Define
	qualifier for binary operands.
	(BINOP_UNONE_UNONE_UNONE_QUALIFIERS): Likewise.
	(BINOP_UNONE_NONE_IMM_QUALIFIERS): Likewise.
	* config/arm/arm_mve.h (vcvtq_n_s16_f16): Define macro.
	(vcvtq_n_s32_f32): Likewise.
	(vcvtq_n_u16_f16): Likewise.
	(vcvtq_n_u32_f32): Likewise.
	(vcreateq_u8): Likewise.
	(vcreateq_u16): Likewise.
	(vcreateq_u32): Likewise.
	(vcreateq_u64): Likewise.
	(vcreateq_s8): Likewise.
	(vcreateq_s16): Likewise.
	(vcreateq_s32): Likewise.
	(vcreateq_s64): Likewise.
	(vshrq_n_s8): Likewise.
	(vshrq_n_s16): Likewise.
	(vshrq_n_s32): Likewise.
	(vshrq_n_u8): Likewise.
	(vshrq_n_u16): Likewise.
	(vshrq_n_u32): Likewise.
	(__arm_vcreateq_u8): Define intrinsic.
	(__arm_vcreateq_u16): Likewise.
	(__arm_vcreateq_u32): Likewise.
	(__arm_vcreateq_u64): Likewise.
	(__arm_vcreateq_s8): Likewise.
	(__arm_vcreateq_s16): Likewise.
	(__arm_vcreateq_s32): Likewise.
	(__arm_vcreateq_s64): Likewise.
	(__arm_vshrq_n_s8): Likewise.
	(__arm_vshrq_n_s16): Likewise.
	(__arm_vshrq_n_s32): Likewise.
	(__arm_vshrq_n_u8): Likewise.
	(__arm_vshrq_n_u16): Likewise.
	(__arm_vshrq_n_u32): Likewise.
	(__arm_vcvtq_n_s16_f16): Likewise.
	(__arm_vcvtq_n_s32_f32): Likewise.
	(__arm_vcvtq_n_u16_f16): Likewise.
	(__arm_vcvtq_n_u32_f32): Likewise.
	(vshrq_n): Define polymorphic variant.
	* config/arm/arm_mve_builtins.def (BINOP_UNONE_UNONE_IMM_QUALIFIERS):
	Use it.
        (BINOP_UNONE_UNONE_UNONE_QUALIFIERS): Likewise.
        (BINOP_UNONE_NONE_IMM_QUALIFIERS): Likewise.
	* config/arm/constraints.md (Rb): Define constraint to check constant is
	in the range of 1 to 8.
	(Rf): Define constraint to check constant is in the range of 1 to 32.
	* config/arm/mve.md (mve_vcreateq_<supf><mode>): Define RTL pattern.
	(mve_vshrq_n_<supf><mode>): Likewise.
	(mve_vcvtq_n_from_f_<supf><mode>): Likewise.
	* config/arm/predicates.md (mve_imm_8): Define predicate to check
	the matching constraint Rb.
	(mve_imm_32): Define predicate to check the matching constraint Rf.

gcc/testsuite/ChangeLog:

2019-10-21  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vcreateq_s16.c: New test.
	* gcc.target/arm/mve/intrinsics/vcreateq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcreateq_s64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcreateq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcreateq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcreateq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcreateq_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcreateq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_n_s16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_n_s32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_n_u16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_n_u32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrq_n_u8.c: Likewise.


###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index c2dad057d1365914477c64d559aa1fd1c32bbf19..ec56dbcdd66662bb1de696142186955d75570772 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -373,6 +373,24 @@ arm_binop_none_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define BINOP_NONE_UNONE_UNONE_QUALIFIERS \
   (arm_binop_none_unone_unone_qualifiers)
 
+static enum arm_type_qualifiers
+arm_binop_unone_unone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate };
+#define BINOP_UNONE_UNONE_IMM_QUALIFIERS \
+  (arm_binop_unone_unone_imm_qualifiers)
+
+static enum arm_type_qualifiers
+arm_binop_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned };
+#define BINOP_UNONE_UNONE_UNONE_QUALIFIERS \
+  (arm_binop_unone_unone_unone_qualifiers)
+
+static enum arm_type_qualifiers
+arm_binop_unone_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_none, qualifier_immediate };
+#define BINOP_UNONE_NONE_IMM_QUALIFIERS \
+  (arm_binop_unone_none_imm_qualifiers)
+
 /* End of Qualifier for MVE builtins.  */
 
    /* void ([T element type] *, T, immediate).  */
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 15b7ada025fe57b682f3873f0f275c43a30d1273..0318a2cd8c3cdea430a32506e75a5aceb4d3a475 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -207,6 +207,24 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vcvtq_n_f32_u32(__a,  __imm6) __arm_vcvtq_n_f32_u32(__a,  __imm6)
 #define vcreateq_f16(__a, __b) __arm_vcreateq_f16(__a, __b)
 #define vcreateq_f32(__a, __b) __arm_vcreateq_f32(__a, __b)
+#define vcvtq_n_s16_f16(__a,  __imm6) __arm_vcvtq_n_s16_f16(__a,  __imm6)
+#define vcvtq_n_s32_f32(__a,  __imm6) __arm_vcvtq_n_s32_f32(__a,  __imm6)
+#define vcvtq_n_u16_f16(__a,  __imm6) __arm_vcvtq_n_u16_f16(__a,  __imm6)
+#define vcvtq_n_u32_f32(__a,  __imm6) __arm_vcvtq_n_u32_f32(__a,  __imm6)
+#define vcreateq_u8(__a, __b) __arm_vcreateq_u8(__a, __b)
+#define vcreateq_u16(__a, __b) __arm_vcreateq_u16(__a, __b)
+#define vcreateq_u32(__a, __b) __arm_vcreateq_u32(__a, __b)
+#define vcreateq_u64(__a, __b) __arm_vcreateq_u64(__a, __b)
+#define vcreateq_s8(__a, __b) __arm_vcreateq_s8(__a, __b)
+#define vcreateq_s16(__a, __b) __arm_vcreateq_s16(__a, __b)
+#define vcreateq_s32(__a, __b) __arm_vcreateq_s32(__a, __b)
+#define vcreateq_s64(__a, __b) __arm_vcreateq_s64(__a, __b)
+#define vshrq_n_s8(__a,  __imm) __arm_vshrq_n_s8(__a,  __imm)
+#define vshrq_n_s16(__a,  __imm) __arm_vshrq_n_s16(__a,  __imm)
+#define vshrq_n_s32(__a,  __imm) __arm_vshrq_n_s32(__a,  __imm)
+#define vshrq_n_u8(__a,  __imm) __arm_vshrq_n_u8(__a,  __imm)
+#define vshrq_n_u16(__a,  __imm) __arm_vshrq_n_u16(__a,  __imm)
+#define vshrq_n_u32(__a,  __imm) __arm_vshrq_n_u32(__a,  __imm)
 #endif
 
 __extension__ extern __inline void
@@ -753,6 +771,104 @@ __arm_vpnot (mve_pred16_t __a)
   return __builtin_mve_vpnothi (__a);
 }
 
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcreateq_u8 (uint64_t __a, uint64_t __b)
+{
+  return __builtin_mve_vcreateq_uv16qi (__a, __b);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcreateq_u16 (uint64_t __a, uint64_t __b)
+{
+  return __builtin_mve_vcreateq_uv8hi (__a, __b);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcreateq_u32 (uint64_t __a, uint64_t __b)
+{
+  return __builtin_mve_vcreateq_uv4si (__a, __b);
+}
+
+__extension__ extern __inline uint64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcreateq_u64 (uint64_t __a, uint64_t __b)
+{
+  return __builtin_mve_vcreateq_uv2di (__a, __b);
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcreateq_s8 (uint64_t __a, uint64_t __b)
+{
+  return __builtin_mve_vcreateq_sv16qi (__a, __b);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcreateq_s16 (uint64_t __a, uint64_t __b)
+{
+  return __builtin_mve_vcreateq_sv8hi (__a, __b);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcreateq_s32 (uint64_t __a, uint64_t __b)
+{
+  return __builtin_mve_vcreateq_sv4si (__a, __b);
+}
+
+__extension__ extern __inline int64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcreateq_s64 (uint64_t __a, uint64_t __b)
+{
+  return __builtin_mve_vcreateq_sv2di (__a, __b);
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshrq_n_s8 (int8x16_t __a, const int __imm)
+{
+  return __builtin_mve_vshrq_n_sv16qi (__a, __imm);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshrq_n_s16 (int16x8_t __a, const int __imm)
+{
+  return __builtin_mve_vshrq_n_sv8hi (__a, __imm);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshrq_n_s32 (int32x4_t __a, const int __imm)
+{
+  return __builtin_mve_vshrq_n_sv4si (__a, __imm);
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshrq_n_u8 (uint8x16_t __a, const int __imm)
+{
+  return __builtin_mve_vshrq_n_uv16qi (__a, __imm);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshrq_n_u16 (uint16x8_t __a, const int __imm)
+{
+  return __builtin_mve_vshrq_n_uv8hi (__a, __imm);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshrq_n_u32 (uint32x4_t __a, const int __imm)
+{
+  return __builtin_mve_vshrq_n_uv4si (__a, __imm);
+}
+
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
@@ -1165,6 +1281,34 @@ __arm_vcreateq_f32 (uint64_t __a, uint64_t __b)
   return __builtin_mve_vcreateq_fv4sf (__a, __b);
 }
 
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_n_s16_f16 (float16x8_t __a, const int __imm6)
+{
+  return __builtin_mve_vcvtq_n_from_f_sv8hi (__a, __imm6);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_n_s32_f32 (float32x4_t __a, const int __imm6)
+{
+  return __builtin_mve_vcvtq_n_from_f_sv4si (__a, __imm6);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_n_u16_f16 (float16x8_t __a, const int __imm6)
+{
+  return __builtin_mve_vcvtq_n_from_f_uv8hi (__a, __imm6);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_n_u32_f32 (float32x4_t __a, const int __imm6)
+{
+  return __builtin_mve_vcvtq_n_from_f_uv4si (__a, __imm6);
+}
+
 #endif
 
 enum {
@@ -1598,6 +1742,16 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int16x8_t]: __arm_vqnegq_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t]: __arm_vqnegq_s32 (__ARM_mve_coerce(__p0, int32x4_t)));})
 
+#define vshrq(p0,p1) __arm_vshrq(p0,p1)
+#define __arm_vshrq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int8x16_t]: __arm_vshrq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), p1), \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vshrq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), p1), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vshrq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), p1), \
+  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vshrq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), p1), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vshrq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), p1), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vshrq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), p1));})
+
 #endif /* MVE Floating point.  */
 
 #ifdef __cplusplus
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index 8d1e4fac3d75e87fbe334e64e1073cb1fef0d96d..eae37ba831361401241ed91950169f06dee29c3e 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -6,7 +6,7 @@
 
     GCC is free software; you can redistribute it and/or modify it
     under the terms of the GNU General Public License as published
-    by the Free Software Foundation; either version 3, or  (at your
+    by the Free Software Foundation; either version 3, or   (at your
     option) any later version.
 
     GCC is distributed in the hope that it will be useful, but WITHOUT
@@ -81,3 +81,9 @@ VAR2 (BINOP_NONE_NONE_NONE, vbrsrq_n_f, v8hf, v4sf)
 VAR2 (BINOP_NONE_NONE_IMM, vcvtq_n_to_f_s, v8hf, v4sf)
 VAR2 (BINOP_NONE_UNONE_IMM, vcvtq_n_to_f_u, v8hf, v4sf)
 VAR2 (BINOP_NONE_UNONE_UNONE, vcreateq_f, v8hf, v4sf)
+VAR2 (BINOP_UNONE_NONE_IMM, vcvtq_n_from_f_u, v8hi, v4si)
+VAR2 (BINOP_NONE_NONE_IMM, vcvtq_n_from_f_s, v8hi, v4si)
+VAR4 (BINOP_UNONE_UNONE_UNONE, vcreateq_u, v16qi, v8hi, v4si, v2di)
+VAR4 (BINOP_NONE_UNONE_UNONE, vcreateq_s, v16qi, v8hi, v4si, v2di)
+VAR3 (BINOP_UNONE_UNONE_IMM, vshrq_n_u, v16qi, v8hi, v4si)
+VAR3 (BINOP_NONE_NONE_IMM, vshrq_n_s, v16qi, v8hi, v4si)
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 86cd82334533acbe58be4a0c547cf38737d2bb84..978674991d4bf0fa2082f38a630f2c9e845a7881 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -35,7 +35,7 @@
 ;; in ARM/Thumb-2 state: Da, Db, Dc, Dd, Dn, DN, Dm, Dl, DL, Do, Dv, Dy, Di,
 ;;			 Dt, Dp, Dz, Tu
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
-;; in Thumb-2 state: Ha, Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py, Pz, Rd
+;; in Thumb-2 state: Ha, Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py, Pz, Rd, Rf, Rb
 ;; in all states: Pf, Pg, UM, U1
 
 ;; The following memory constraints have been used:
@@ -56,6 +56,16 @@
   (and (match_code "const_int")
        (match_test "TARGET_HAVE_MVE && ival >= 1 && ival <= 16")))
 
+(define_constraint "Rb"
+  "@internal In Thumb-2 state a constant in range 1 to 8"
+  (and (match_code "const_int")
+       (match_test "TARGET_HAVE_MVE && ival >= 1 && ival <= 8")))
+
+(define_constraint "Rf"
+  "@internal In Thumb-2 state a constant in range 1 to 32"
+  (and (match_code "const_int")
+       (match_test "TARGET_HAVE_MVE && ival >= 1 && ival <= 32")))
+
 (define_register_constraint "t" "TARGET_32BIT ? VFP_LO_REGS : NO_REGS"
  "The VFP registers @code{s0}-@code{s31}.")
 
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 06701b024dabae85446bb60060a8b331e540cd6d..4ae7646cf20683ec147af11f398d7460bd65f7ba 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -22,6 +22,7 @@
 (define_mode_iterator MVE_types [V16QI V8HI V4SI V2DI TI V8HF V4SF V2DF])
 (define_mode_iterator MVE_VLD_ST [V16QI V8HI V4SI V8HF V4SF])
 (define_mode_iterator MVE_0 [V8HF V4SF])
+(define_mode_iterator MVE_1 [V16QI V8HI V4SI V2DI])
 (define_mode_iterator MVE_3 [V16QI V8HI])
 (define_mode_iterator MVE_2 [V16QI V8HI V4SI])
 (define_mode_iterator MVE_5 [V8HI V4SI])
@@ -38,7 +39,8 @@
 			 VCVTPQ_U VCVTNQ_S VCVTNQ_U VCVTMQ_S VCVTMQ_U
 			 VADDLVQ_U VCTP8Q VCTP16Q VCTP32Q VCTP64Q VPNOT
 			 VCREATEQ_F VCVTQ_N_TO_F_S VCVTQ_N_TO_F_U VBRSRQ_N_F
-			 VSUBQ_N_F])
+			 VSUBQ_N_F VCREATEQ_U VCREATEQ_S VSHRQ_N_S VSHRQ_N_U
+			 VCVTQ_N_FROM_F_S VCVTQ_N_FROM_F_U])
 
 (define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF")
 			    (V8HF "V8HI") (V4SF "V4SI")])
@@ -55,10 +57,16 @@
 		       (VCVTNQ_U "u") (VCVTMQ_S "s") (VCVTMQ_U "u")
 		       (VCLZQ_U "u") (VCLZQ_S "s") (VREV32Q_U "u")
 		       (VREV32Q_S "s") (VADDLVQ_U "u") (VADDLVQ_S "s")
-		       (VCVTQ_N_TO_F_S "s") (VCVTQ_N_TO_F_U "u")])
+		       (VCVTQ_N_TO_F_S "s") (VCVTQ_N_TO_F_U "u")
+		       (VCREATEQ_U "u") (VCREATEQ_S "s") (VSHRQ_N_S "s")
+		       (VSHRQ_N_U "u") (VCVTQ_N_FROM_F_S "s")
+		       (VCVTQ_N_FROM_F_U "u")])
 
 (define_int_attr mode1 [(VCTP8Q "8") (VCTP16Q "16") (VCTP32Q "32")
 			(VCTP64Q "64")])
+(define_mode_attr MVE_pred2 [(V16QI "mve_imm_8") (V8HI "mve_imm_16")
+			     (V4SI "mve_imm_32")])
+(define_mode_attr MVE_constraint2 [(V16QI "Rb") (V8HI "Rd") (V4SI "Rf")])
 
 (define_int_iterator VCVTQ_TO_F [VCVTQ_TO_F_S VCVTQ_TO_F_U])
 (define_int_iterator VMVNQ_N [VMVNQ_N_U VMVNQ_N_S])
@@ -79,6 +87,9 @@
 (define_int_iterator VADDLVQ [VADDLVQ_U VADDLVQ_S])
 (define_int_iterator VCTPQ [VCTP8Q VCTP16Q VCTP32Q VCTP64Q])
 (define_int_iterator VCVTQ_N_TO_F [VCVTQ_N_TO_F_S VCVTQ_N_TO_F_U])
+(define_int_iterator VCREATEQ [VCREATEQ_U VCREATEQ_S])
+(define_int_iterator VSHRQ_N [VSHRQ_N_S VSHRQ_N_U])
+(define_int_iterator VCVTQ_N_FROM_F [VCVTQ_N_FROM_F_S VCVTQ_N_FROM_F_U])
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
@@ -743,3 +754,48 @@
   "vmov %q0[2], %q0[0], %Q2, %Q1\;vmov %q0[3], %q0[1], %R2, %R1"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
+
+;;
+;; [vcreateq_u, vcreateq_s])
+;;
+(define_insn "mve_vcreateq_<supf><mode>"
+  [
+   (set (match_operand:MVE_1 0 "s_register_operand" "=w")
+	(unspec:MVE_1 [(match_operand:DI 1 "s_register_operand" "r")
+		       (match_operand:DI 2 "s_register_operand" "r")]
+	 VCREATEQ))
+  ]
+  "TARGET_HAVE_MVE"
+  "vmov %q0[2], %q0[0], %Q2, %Q1\;vmov %q0[3], %q0[1], %R2, %R1"
+  [(set_attr "type" "mve_move")
+   (set_attr "length""8")])
+
+;;
+;; [vshrq_n_s, vshrq_n_u])
+;;
+(define_insn "mve_vshrq_n_<supf><mode>"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
+		       (match_operand:SI 2 "<MVE_pred2>" "<MVE_constraint2>")]
+	 VSHRQ_N))
+  ]
+  "TARGET_HAVE_MVE"
+  "vshr.<supf>%#<V_sz_elem>\t%q0, %q1, %2"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vcvtq_n_from_f_s, vcvtq_n_from_f_u])
+;;
+(define_insn "mve_vcvtq_n_from_f_<supf><mode>"
+  [
+   (set (match_operand:MVE_5 0 "s_register_operand" "=w")
+	(unspec:MVE_5 [(match_operand:<MVE_CNVT> 1 "s_register_operand" "w")
+		       (match_operand:SI 2 "mve_imm_16" "Rd")]
+	 VCVTQ_N_FROM_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vcvt.<supf>%#<V_sz_elem>.f%#<V_sz_elem>\t%q0, %q1, %2"
+  [(set_attr "type" "mve_move")
+])
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 80a44bb32cbce1ee3f850a44b70a0e4ceed548be..2e038705b3594ef8756bbb56c46d3285261e946e 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -35,6 +35,14 @@
 (define_predicate "mve_imm_16"
   (match_test "satisfies_constraint_Rd (op)"))
 
+;; True for immediates in the range of 1 to 8 for MVE.
+(define_predicate "mve_imm_8"
+  (match_test "satisfies_constraint_Rb (op)"))
+
+;; True for immediates in the range of 1 to 32 for MVE.
+(define_predicate "mve_imm_32"
+  (match_test "satisfies_constraint_Rf (op)"))
+
 ; Predicate for stack protector guard's address in
 ; stack_protect_combined_set_insn and stack_protect_combined_test_insn patterns
 (define_predicate "guard_addr_operand"
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..fad5de2e5d2603abac244a6554ee2351c3b05ddb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (uint64_t a, uint64_t b)
+{
+  return vcreateq_s16 (a, b);
+}
+
+/* { dg-final { scan-assembler "vmov"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..b6013be8481b8d8a800eca2a2e275720283586d9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (uint64_t a, uint64_t b)
+{
+  return vcreateq_s32 (a, b);
+}
+
+/* { dg-final { scan-assembler "vmov"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s64.c
new file mode 100644
index 0000000000000000000000000000000000000000..74372bef4eaa53772765029ce20e51ef3ecb80d6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s64.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int64x2_t
+foo (uint64_t a, uint64_t b)
+{
+  return vcreateq_s64 (a, b);
+}
+
+/* { dg-final { scan-assembler "vmov"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..9884f0435780c7895c0cf1825d4d024f496e5a33
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s8.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo (uint64_t a, uint64_t b)
+{
+  return vcreateq_s8 (a, b);
+}
+
+/* { dg-final { scan-assembler "vmov"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..cc60f6cd4dd7370dae5245a6499383659705af7e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint64_t a, uint64_t b)
+{
+  return vcreateq_u16 (a, b);
+}
+
+/* { dg-final { scan-assembler "vmov"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..43a49bc0a4a86c7a090151e7a5f4b6887496b520
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint64_t a, uint64_t b)
+{
+  return vcreateq_u32 (a, b);
+}
+
+/* { dg-final { scan-assembler "vmov"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u64.c
new file mode 100644
index 0000000000000000000000000000000000000000..d798273dab689fc95746c0788d31c7d56244dd67
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u64.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint64x2_t
+foo (uint64_t a, uint64_t b)
+{
+  return vcreateq_u64 (a, b);
+}
+
+/* { dg-final { scan-assembler "vmov"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..7bb6a75ab8a2d52834798f39e8fff1e43ec0cfd9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u8.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint64_t a, uint64_t b)
+{
+  return vcreateq_u8 (a, b);
+}
+
+/* { dg-final { scan-assembler "vmov"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_s16_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_s16_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..f40561848707ba059506fb532b052489823262cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_s16_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (float16x8_t a)
+{
+  return vcvtq_n_s16_f16 (a, 1);
+}
+
+/* { dg-final { scan-assembler "vcvt.s16.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_s32_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_s32_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..1bb98c254f5ba72af2416b952c5c96b315d148a9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_s32_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (float32x4_t a)
+{
+  return vcvtq_n_s32_f32 (a, 1);
+}
+
+/* { dg-final { scan-assembler "vcvt.s32.f32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_u16_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_u16_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..7bded9d70b55b06a872789118df65f2cced0d7a1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_u16_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (float16x8_t a)
+{
+  return vcvtq_n_u16_f16 (a, 1);
+}
+
+/* { dg-final { scan-assembler "vcvt.u16.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_u32_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_u32_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..cbcf93b3e102d2e47c18792d641f0e8f56fbdbe1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_u32_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (float32x4_t a)
+{
+  return vcvtq_n_u32_f32 (a, 1);
+}
+
+/* { dg-final { scan-assembler "vcvt.u32.f32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..71950e35e9f1a9185f45858bb9f4e5def844644d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int16x8_t a)
+{
+  return vshrq_n_s16 (a, 16);
+}
+
+/* { dg-final { scan-assembler "vshr.s16"  }  } */
+
+int16x8_t
+foo1 (int16x8_t a)
+{
+  return vshrq (a, 16);
+}
+
+/* { dg-final { scan-assembler "vshr.s16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..b31d7a479e778f8ef9bc50c75b07983de7f93fe2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t a)
+{
+  return vshrq_n_s32 (a, 32);
+}
+
+/* { dg-final { scan-assembler "vshr.s32"  }  } */
+
+int32x4_t
+foo1 (int32x4_t a)
+{
+  return vshrq (a, 32);
+}
+
+/* { dg-final { scan-assembler "vshr.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..461a268501f86ad9ead5dbe68b21c438d6257dbe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo (int8x16_t a)
+{
+  return vshrq_n_s8 (a, 8);
+}
+
+/* { dg-final { scan-assembler "vshr.s8"  }  } */
+
+int8x16_t
+foo1 (int8x16_t a)
+{
+  return vshrq (a, 8);
+}
+
+/* { dg-final { scan-assembler "vshr.s8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..97c6323b4c645fb78fb28e4ab8ef83855fb60037
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint16x8_t a)
+{
+  return vshrq_n_u16 (a, 16);
+}
+
+/* { dg-final { scan-assembler "vshr.u16"  }  } */
+
+uint16x8_t
+foo1 (uint16x8_t a)
+{
+  return vshrq (a, 16);
+}
+
+/* { dg-final { scan-assembler "vshr.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..42006566f635b81268e01d3e18426bc294484c5b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t a)
+{
+  return vshrq_n_u32 (a, 32);
+}
+
+/* { dg-final { scan-assembler "vshr.u32"  }  } */
+
+uint32x4_t
+foo1 (uint32x4_t a)
+{
+  return vshrq (a, 32);
+}
+
+/* { dg-final { scan-assembler "vshr.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..25679297b6d5165fabfd77953f48b0af36867dad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint8x16_t a)
+{
+  return vshrq_n_u8 (a, 8);
+}
+
+/* { dg-final { scan-assembler "vshr.u8"  }  } */
+
+uint8x16_t
+foo1 (uint8x16_t a)
+{
+  return vshrq (a, 8);
+}
+
+/* { dg-final { scan-assembler "vshr.u8"  }  } */


[-- Attachment #2: diff09.patch --]
[-- Type: text/plain, Size: 28361 bytes --]

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index c2dad057d1365914477c64d559aa1fd1c32bbf19..ec56dbcdd66662bb1de696142186955d75570772 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -373,6 +373,24 @@ arm_binop_none_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define BINOP_NONE_UNONE_UNONE_QUALIFIERS \
   (arm_binop_none_unone_unone_qualifiers)
 
+static enum arm_type_qualifiers
+arm_binop_unone_unone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate };
+#define BINOP_UNONE_UNONE_IMM_QUALIFIERS \
+  (arm_binop_unone_unone_imm_qualifiers)
+
+static enum arm_type_qualifiers
+arm_binop_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned };
+#define BINOP_UNONE_UNONE_UNONE_QUALIFIERS \
+  (arm_binop_unone_unone_unone_qualifiers)
+
+static enum arm_type_qualifiers
+arm_binop_unone_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_none, qualifier_immediate };
+#define BINOP_UNONE_NONE_IMM_QUALIFIERS \
+  (arm_binop_unone_none_imm_qualifiers)
+
 /* End of Qualifier for MVE builtins.  */
 
    /* void ([T element type] *, T, immediate).  */
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 15b7ada025fe57b682f3873f0f275c43a30d1273..0318a2cd8c3cdea430a32506e75a5aceb4d3a475 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -207,6 +207,24 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vcvtq_n_f32_u32(__a,  __imm6) __arm_vcvtq_n_f32_u32(__a,  __imm6)
 #define vcreateq_f16(__a, __b) __arm_vcreateq_f16(__a, __b)
 #define vcreateq_f32(__a, __b) __arm_vcreateq_f32(__a, __b)
+#define vcvtq_n_s16_f16(__a,  __imm6) __arm_vcvtq_n_s16_f16(__a,  __imm6)
+#define vcvtq_n_s32_f32(__a,  __imm6) __arm_vcvtq_n_s32_f32(__a,  __imm6)
+#define vcvtq_n_u16_f16(__a,  __imm6) __arm_vcvtq_n_u16_f16(__a,  __imm6)
+#define vcvtq_n_u32_f32(__a,  __imm6) __arm_vcvtq_n_u32_f32(__a,  __imm6)
+#define vcreateq_u8(__a, __b) __arm_vcreateq_u8(__a, __b)
+#define vcreateq_u16(__a, __b) __arm_vcreateq_u16(__a, __b)
+#define vcreateq_u32(__a, __b) __arm_vcreateq_u32(__a, __b)
+#define vcreateq_u64(__a, __b) __arm_vcreateq_u64(__a, __b)
+#define vcreateq_s8(__a, __b) __arm_vcreateq_s8(__a, __b)
+#define vcreateq_s16(__a, __b) __arm_vcreateq_s16(__a, __b)
+#define vcreateq_s32(__a, __b) __arm_vcreateq_s32(__a, __b)
+#define vcreateq_s64(__a, __b) __arm_vcreateq_s64(__a, __b)
+#define vshrq_n_s8(__a,  __imm) __arm_vshrq_n_s8(__a,  __imm)
+#define vshrq_n_s16(__a,  __imm) __arm_vshrq_n_s16(__a,  __imm)
+#define vshrq_n_s32(__a,  __imm) __arm_vshrq_n_s32(__a,  __imm)
+#define vshrq_n_u8(__a,  __imm) __arm_vshrq_n_u8(__a,  __imm)
+#define vshrq_n_u16(__a,  __imm) __arm_vshrq_n_u16(__a,  __imm)
+#define vshrq_n_u32(__a,  __imm) __arm_vshrq_n_u32(__a,  __imm)
 #endif
 
 __extension__ extern __inline void
@@ -753,6 +771,104 @@ __arm_vpnot (mve_pred16_t __a)
   return __builtin_mve_vpnothi (__a);
 }
 
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcreateq_u8 (uint64_t __a, uint64_t __b)
+{
+  return __builtin_mve_vcreateq_uv16qi (__a, __b);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcreateq_u16 (uint64_t __a, uint64_t __b)
+{
+  return __builtin_mve_vcreateq_uv8hi (__a, __b);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcreateq_u32 (uint64_t __a, uint64_t __b)
+{
+  return __builtin_mve_vcreateq_uv4si (__a, __b);
+}
+
+__extension__ extern __inline uint64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcreateq_u64 (uint64_t __a, uint64_t __b)
+{
+  return __builtin_mve_vcreateq_uv2di (__a, __b);
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcreateq_s8 (uint64_t __a, uint64_t __b)
+{
+  return __builtin_mve_vcreateq_sv16qi (__a, __b);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcreateq_s16 (uint64_t __a, uint64_t __b)
+{
+  return __builtin_mve_vcreateq_sv8hi (__a, __b);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcreateq_s32 (uint64_t __a, uint64_t __b)
+{
+  return __builtin_mve_vcreateq_sv4si (__a, __b);
+}
+
+__extension__ extern __inline int64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcreateq_s64 (uint64_t __a, uint64_t __b)
+{
+  return __builtin_mve_vcreateq_sv2di (__a, __b);
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshrq_n_s8 (int8x16_t __a, const int __imm)
+{
+  return __builtin_mve_vshrq_n_sv16qi (__a, __imm);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshrq_n_s16 (int16x8_t __a, const int __imm)
+{
+  return __builtin_mve_vshrq_n_sv8hi (__a, __imm);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshrq_n_s32 (int32x4_t __a, const int __imm)
+{
+  return __builtin_mve_vshrq_n_sv4si (__a, __imm);
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshrq_n_u8 (uint8x16_t __a, const int __imm)
+{
+  return __builtin_mve_vshrq_n_uv16qi (__a, __imm);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshrq_n_u16 (uint16x8_t __a, const int __imm)
+{
+  return __builtin_mve_vshrq_n_uv8hi (__a, __imm);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshrq_n_u32 (uint32x4_t __a, const int __imm)
+{
+  return __builtin_mve_vshrq_n_uv4si (__a, __imm);
+}
+
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
@@ -1165,6 +1281,34 @@ __arm_vcreateq_f32 (uint64_t __a, uint64_t __b)
   return __builtin_mve_vcreateq_fv4sf (__a, __b);
 }
 
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_n_s16_f16 (float16x8_t __a, const int __imm6)
+{
+  return __builtin_mve_vcvtq_n_from_f_sv8hi (__a, __imm6);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_n_s32_f32 (float32x4_t __a, const int __imm6)
+{
+  return __builtin_mve_vcvtq_n_from_f_sv4si (__a, __imm6);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_n_u16_f16 (float16x8_t __a, const int __imm6)
+{
+  return __builtin_mve_vcvtq_n_from_f_uv8hi (__a, __imm6);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_n_u32_f32 (float32x4_t __a, const int __imm6)
+{
+  return __builtin_mve_vcvtq_n_from_f_uv4si (__a, __imm6);
+}
+
 #endif
 
 enum {
@@ -1598,6 +1742,16 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int16x8_t]: __arm_vqnegq_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t]: __arm_vqnegq_s32 (__ARM_mve_coerce(__p0, int32x4_t)));})
 
+#define vshrq(p0,p1) __arm_vshrq(p0,p1)
+#define __arm_vshrq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int8x16_t]: __arm_vshrq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), p1), \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vshrq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), p1), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vshrq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), p1), \
+  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vshrq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), p1), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vshrq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), p1), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vshrq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), p1));})
+
 #endif /* MVE Floating point.  */
 
 #ifdef __cplusplus
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index 8d1e4fac3d75e87fbe334e64e1073cb1fef0d96d..eae37ba831361401241ed91950169f06dee29c3e 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -6,7 +6,7 @@
 
     GCC is free software; you can redistribute it and/or modify it
     under the terms of the GNU General Public License as published
-    by the Free Software Foundation; either version 3, or  (at your
+    by the Free Software Foundation; either version 3, or   (at your
     option) any later version.
 
     GCC is distributed in the hope that it will be useful, but WITHOUT
@@ -81,3 +81,9 @@ VAR2 (BINOP_NONE_NONE_NONE, vbrsrq_n_f, v8hf, v4sf)
 VAR2 (BINOP_NONE_NONE_IMM, vcvtq_n_to_f_s, v8hf, v4sf)
 VAR2 (BINOP_NONE_UNONE_IMM, vcvtq_n_to_f_u, v8hf, v4sf)
 VAR2 (BINOP_NONE_UNONE_UNONE, vcreateq_f, v8hf, v4sf)
+VAR2 (BINOP_UNONE_NONE_IMM, vcvtq_n_from_f_u, v8hi, v4si)
+VAR2 (BINOP_NONE_NONE_IMM, vcvtq_n_from_f_s, v8hi, v4si)
+VAR4 (BINOP_UNONE_UNONE_UNONE, vcreateq_u, v16qi, v8hi, v4si, v2di)
+VAR4 (BINOP_NONE_UNONE_UNONE, vcreateq_s, v16qi, v8hi, v4si, v2di)
+VAR3 (BINOP_UNONE_UNONE_IMM, vshrq_n_u, v16qi, v8hi, v4si)
+VAR3 (BINOP_NONE_NONE_IMM, vshrq_n_s, v16qi, v8hi, v4si)
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 86cd82334533acbe58be4a0c547cf38737d2bb84..978674991d4bf0fa2082f38a630f2c9e845a7881 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -35,7 +35,7 @@
 ;; in ARM/Thumb-2 state: Da, Db, Dc, Dd, Dn, DN, Dm, Dl, DL, Do, Dv, Dy, Di,
 ;;			 Dt, Dp, Dz, Tu
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
-;; in Thumb-2 state: Ha, Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py, Pz, Rd
+;; in Thumb-2 state: Ha, Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py, Pz, Rd, Rf, Rb
 ;; in all states: Pf, Pg, UM, U1
 
 ;; The following memory constraints have been used:
@@ -56,6 +56,16 @@
   (and (match_code "const_int")
        (match_test "TARGET_HAVE_MVE && ival >= 1 && ival <= 16")))
 
+(define_constraint "Rb"
+  "@internal In Thumb-2 state a constant in range 1 to 8"
+  (and (match_code "const_int")
+       (match_test "TARGET_HAVE_MVE && ival >= 1 && ival <= 8")))
+
+(define_constraint "Rf"
+  "@internal In Thumb-2 state a constant in range 1 to 32"
+  (and (match_code "const_int")
+       (match_test "TARGET_HAVE_MVE && ival >= 1 && ival <= 32")))
+
 (define_register_constraint "t" "TARGET_32BIT ? VFP_LO_REGS : NO_REGS"
  "The VFP registers @code{s0}-@code{s31}.")
 
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 06701b024dabae85446bb60060a8b331e540cd6d..4ae7646cf20683ec147af11f398d7460bd65f7ba 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -22,6 +22,7 @@
 (define_mode_iterator MVE_types [V16QI V8HI V4SI V2DI TI V8HF V4SF V2DF])
 (define_mode_iterator MVE_VLD_ST [V16QI V8HI V4SI V8HF V4SF])
 (define_mode_iterator MVE_0 [V8HF V4SF])
+(define_mode_iterator MVE_1 [V16QI V8HI V4SI V2DI])
 (define_mode_iterator MVE_3 [V16QI V8HI])
 (define_mode_iterator MVE_2 [V16QI V8HI V4SI])
 (define_mode_iterator MVE_5 [V8HI V4SI])
@@ -38,7 +39,8 @@
 			 VCVTPQ_U VCVTNQ_S VCVTNQ_U VCVTMQ_S VCVTMQ_U
 			 VADDLVQ_U VCTP8Q VCTP16Q VCTP32Q VCTP64Q VPNOT
 			 VCREATEQ_F VCVTQ_N_TO_F_S VCVTQ_N_TO_F_U VBRSRQ_N_F
-			 VSUBQ_N_F])
+			 VSUBQ_N_F VCREATEQ_U VCREATEQ_S VSHRQ_N_S VSHRQ_N_U
+			 VCVTQ_N_FROM_F_S VCVTQ_N_FROM_F_U])
 
 (define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF")
 			    (V8HF "V8HI") (V4SF "V4SI")])
@@ -55,10 +57,16 @@
 		       (VCVTNQ_U "u") (VCVTMQ_S "s") (VCVTMQ_U "u")
 		       (VCLZQ_U "u") (VCLZQ_S "s") (VREV32Q_U "u")
 		       (VREV32Q_S "s") (VADDLVQ_U "u") (VADDLVQ_S "s")
-		       (VCVTQ_N_TO_F_S "s") (VCVTQ_N_TO_F_U "u")])
+		       (VCVTQ_N_TO_F_S "s") (VCVTQ_N_TO_F_U "u")
+		       (VCREATEQ_U "u") (VCREATEQ_S "s") (VSHRQ_N_S "s")
+		       (VSHRQ_N_U "u") (VCVTQ_N_FROM_F_S "s")
+		       (VCVTQ_N_FROM_F_U "u")])
 
 (define_int_attr mode1 [(VCTP8Q "8") (VCTP16Q "16") (VCTP32Q "32")
 			(VCTP64Q "64")])
+(define_mode_attr MVE_pred2 [(V16QI "mve_imm_8") (V8HI "mve_imm_16")
+			     (V4SI "mve_imm_32")])
+(define_mode_attr MVE_constraint2 [(V16QI "Rb") (V8HI "Rd") (V4SI "Rf")])
 
 (define_int_iterator VCVTQ_TO_F [VCVTQ_TO_F_S VCVTQ_TO_F_U])
 (define_int_iterator VMVNQ_N [VMVNQ_N_U VMVNQ_N_S])
@@ -79,6 +87,9 @@
 (define_int_iterator VADDLVQ [VADDLVQ_U VADDLVQ_S])
 (define_int_iterator VCTPQ [VCTP8Q VCTP16Q VCTP32Q VCTP64Q])
 (define_int_iterator VCVTQ_N_TO_F [VCVTQ_N_TO_F_S VCVTQ_N_TO_F_U])
+(define_int_iterator VCREATEQ [VCREATEQ_U VCREATEQ_S])
+(define_int_iterator VSHRQ_N [VSHRQ_N_S VSHRQ_N_U])
+(define_int_iterator VCVTQ_N_FROM_F [VCVTQ_N_FROM_F_S VCVTQ_N_FROM_F_U])
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
@@ -743,3 +754,48 @@
   "vmov %q0[2], %q0[0], %Q2, %Q1\;vmov %q0[3], %q0[1], %R2, %R1"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
+
+;;
+;; [vcreateq_u, vcreateq_s])
+;;
+(define_insn "mve_vcreateq_<supf><mode>"
+  [
+   (set (match_operand:MVE_1 0 "s_register_operand" "=w")
+	(unspec:MVE_1 [(match_operand:DI 1 "s_register_operand" "r")
+		       (match_operand:DI 2 "s_register_operand" "r")]
+	 VCREATEQ))
+  ]
+  "TARGET_HAVE_MVE"
+  "vmov %q0[2], %q0[0], %Q2, %Q1\;vmov %q0[3], %q0[1], %R2, %R1"
+  [(set_attr "type" "mve_move")
+   (set_attr "length""8")])
+
+;;
+;; [vshrq_n_s, vshrq_n_u])
+;;
+(define_insn "mve_vshrq_n_<supf><mode>"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
+		       (match_operand:SI 2 "<MVE_pred2>" "<MVE_constraint2>")]
+	 VSHRQ_N))
+  ]
+  "TARGET_HAVE_MVE"
+  "vshr.<supf>%#<V_sz_elem>\t%q0, %q1, %2"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vcvtq_n_from_f_s, vcvtq_n_from_f_u])
+;;
+(define_insn "mve_vcvtq_n_from_f_<supf><mode>"
+  [
+   (set (match_operand:MVE_5 0 "s_register_operand" "=w")
+	(unspec:MVE_5 [(match_operand:<MVE_CNVT> 1 "s_register_operand" "w")
+		       (match_operand:SI 2 "mve_imm_16" "Rd")]
+	 VCVTQ_N_FROM_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vcvt.<supf>%#<V_sz_elem>.f%#<V_sz_elem>\t%q0, %q1, %2"
+  [(set_attr "type" "mve_move")
+])
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 80a44bb32cbce1ee3f850a44b70a0e4ceed548be..2e038705b3594ef8756bbb56c46d3285261e946e 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -35,6 +35,14 @@
 (define_predicate "mve_imm_16"
   (match_test "satisfies_constraint_Rd (op)"))
 
+;; True for immediates in the range of 1 to 8 for MVE.
+(define_predicate "mve_imm_8"
+  (match_test "satisfies_constraint_Rb (op)"))
+
+;; True for immediates in the range of 1 to 32 for MVE.
+(define_predicate "mve_imm_32"
+  (match_test "satisfies_constraint_Rf (op)"))
+
 ; Predicate for stack protector guard's address in
 ; stack_protect_combined_set_insn and stack_protect_combined_test_insn patterns
 (define_predicate "guard_addr_operand"
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..fad5de2e5d2603abac244a6554ee2351c3b05ddb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (uint64_t a, uint64_t b)
+{
+  return vcreateq_s16 (a, b);
+}
+
+/* { dg-final { scan-assembler "vmov"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..b6013be8481b8d8a800eca2a2e275720283586d9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (uint64_t a, uint64_t b)
+{
+  return vcreateq_s32 (a, b);
+}
+
+/* { dg-final { scan-assembler "vmov"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s64.c
new file mode 100644
index 0000000000000000000000000000000000000000..74372bef4eaa53772765029ce20e51ef3ecb80d6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s64.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int64x2_t
+foo (uint64_t a, uint64_t b)
+{
+  return vcreateq_s64 (a, b);
+}
+
+/* { dg-final { scan-assembler "vmov"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..9884f0435780c7895c0cf1825d4d024f496e5a33
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s8.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo (uint64_t a, uint64_t b)
+{
+  return vcreateq_s8 (a, b);
+}
+
+/* { dg-final { scan-assembler "vmov"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..cc60f6cd4dd7370dae5245a6499383659705af7e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint64_t a, uint64_t b)
+{
+  return vcreateq_u16 (a, b);
+}
+
+/* { dg-final { scan-assembler "vmov"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..43a49bc0a4a86c7a090151e7a5f4b6887496b520
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint64_t a, uint64_t b)
+{
+  return vcreateq_u32 (a, b);
+}
+
+/* { dg-final { scan-assembler "vmov"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u64.c
new file mode 100644
index 0000000000000000000000000000000000000000..d798273dab689fc95746c0788d31c7d56244dd67
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u64.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint64x2_t
+foo (uint64_t a, uint64_t b)
+{
+  return vcreateq_u64 (a, b);
+}
+
+/* { dg-final { scan-assembler "vmov"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..7bb6a75ab8a2d52834798f39e8fff1e43ec0cfd9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_u8.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint64_t a, uint64_t b)
+{
+  return vcreateq_u8 (a, b);
+}
+
+/* { dg-final { scan-assembler "vmov"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_s16_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_s16_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..f40561848707ba059506fb532b052489823262cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_s16_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (float16x8_t a)
+{
+  return vcvtq_n_s16_f16 (a, 1);
+}
+
+/* { dg-final { scan-assembler "vcvt.s16.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_s32_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_s32_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..1bb98c254f5ba72af2416b952c5c96b315d148a9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_s32_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (float32x4_t a)
+{
+  return vcvtq_n_s32_f32 (a, 1);
+}
+
+/* { dg-final { scan-assembler "vcvt.s32.f32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_u16_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_u16_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..7bded9d70b55b06a872789118df65f2cced0d7a1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_u16_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (float16x8_t a)
+{
+  return vcvtq_n_u16_f16 (a, 1);
+}
+
+/* { dg-final { scan-assembler "vcvt.u16.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_u32_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_u32_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..cbcf93b3e102d2e47c18792d641f0e8f56fbdbe1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_u32_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (float32x4_t a)
+{
+  return vcvtq_n_u32_f32 (a, 1);
+}
+
+/* { dg-final { scan-assembler "vcvt.u32.f32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..71950e35e9f1a9185f45858bb9f4e5def844644d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int16x8_t a)
+{
+  return vshrq_n_s16 (a, 16);
+}
+
+/* { dg-final { scan-assembler "vshr.s16"  }  } */
+
+int16x8_t
+foo1 (int16x8_t a)
+{
+  return vshrq (a, 16);
+}
+
+/* { dg-final { scan-assembler "vshr.s16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..b31d7a479e778f8ef9bc50c75b07983de7f93fe2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t a)
+{
+  return vshrq_n_s32 (a, 32);
+}
+
+/* { dg-final { scan-assembler "vshr.s32"  }  } */
+
+int32x4_t
+foo1 (int32x4_t a)
+{
+  return vshrq (a, 32);
+}
+
+/* { dg-final { scan-assembler "vshr.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..461a268501f86ad9ead5dbe68b21c438d6257dbe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo (int8x16_t a)
+{
+  return vshrq_n_s8 (a, 8);
+}
+
+/* { dg-final { scan-assembler "vshr.s8"  }  } */
+
+int8x16_t
+foo1 (int8x16_t a)
+{
+  return vshrq (a, 8);
+}
+
+/* { dg-final { scan-assembler "vshr.s8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..97c6323b4c645fb78fb28e4ab8ef83855fb60037
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint16x8_t a)
+{
+  return vshrq_n_u16 (a, 16);
+}
+
+/* { dg-final { scan-assembler "vshr.u16"  }  } */
+
+uint16x8_t
+foo1 (uint16x8_t a)
+{
+  return vshrq (a, 16);
+}
+
+/* { dg-final { scan-assembler "vshr.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..42006566f635b81268e01d3e18426bc294484c5b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t a)
+{
+  return vshrq_n_u32 (a, 32);
+}
+
+/* { dg-final { scan-assembler "vshr.u32"  }  } */
+
+uint32x4_t
+foo1 (uint32x4_t a)
+{
+  return vshrq (a, 32);
+}
+
+/* { dg-final { scan-assembler "vshr.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..25679297b6d5165fabfd77953f48b0af36867dad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshrq_n_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint8x16_t a)
+{
+  return vshrq_n_u8 (a, 8);
+}
+
+/* { dg-final { scan-assembler "vshr.u8"  }  } */
+
+uint8x16_t
+foo1 (uint8x16_t a)
+{
+  return vshrq (a, 8);
+}
+
+/* { dg-final { scan-assembler "vshr.u8"  }  } */


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][2/1x]: MVE intrinsics with unary operand.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (4 preceding siblings ...)
  2019-11-14 19:13 ` [PATCH][ARM][GCC][3/1x]: MVE intrinsics with unary operand Srinath Parvathaneni
@ 2019-11-14 19:13 ` Srinath Parvathaneni
  2019-12-19 18:05   ` Kyrill Tkachov
  2019-11-14 19:13 ` [PATCH][ARM][GCC][3/3x]: MVE intrinsics with ternary operands Srinath Parvathaneni
                   ` (32 subsequent siblings)
  38 siblings, 1 reply; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 24908 bytes --]

Hello,

This patch supports following MVE ACLE intrinsics with unary operand.

vmvnq_n_s16, vmvnq_n_s32, vrev64q_s8, vrev64q_s16, vrev64q_s32, vcvtq_s16_f16, vcvtq_s32_f32,
vrev64q_u8, vrev64q_u16, vrev64q_u32, vmvnq_n_u16, vmvnq_n_u32, vcvtq_u16_f16, vcvtq_u32_f32,
vrev64q.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-10-21  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm-builtins.c (UNOP_SNONE_SNONE_QUALIFIERS): Define.
	(UNOP_SNONE_NONE_QUALIFIERS): Likewise.
	(UNOP_SNONE_IMM_QUALIFIERS): Likewise.
	(UNOP_UNONE_NONE_QUALIFIERS): Likewise.
	(UNOP_UNONE_UNONE_QUALIFIERS): Likewise.
	(UNOP_UNONE_IMM_QUALIFIERS): Likewise.
	* config/arm/arm_mve.h (vmvnq_n_s16): Define macro.
	(vmvnq_n_s32): Likewise.
	(vrev64q_s8): Likewise.
	(vrev64q_s16): Likewise.
	(vrev64q_s32): Likewise.
	(vcvtq_s16_f16): Likewise.
	(vcvtq_s32_f32): Likewise.
	(vrev64q_u8): Likewise.
	(vrev64q_u16): Likewise.
	(vrev64q_u32): Likewise.
	(vmvnq_n_u16): Likewise.
	(vmvnq_n_u32): Likewise.
	(vcvtq_u16_f16): Likewise.
	(vcvtq_u32_f32): Likewise.
	(__arm_vmvnq_n_s16): Define intrinsic.
	(__arm_vmvnq_n_s32): Likewise.
	(__arm_vrev64q_s8): Likewise.
	(__arm_vrev64q_s16): Likewise.
	(__arm_vrev64q_s32): Likewise.
	(__arm_vrev64q_u8): Likewise.
	(__arm_vrev64q_u16): Likewise.
	(__arm_vrev64q_u32): Likewise.
	(__arm_vmvnq_n_u16): Likewise.
	(__arm_vmvnq_n_u32): Likewise.
	(__arm_vcvtq_s16_f16): Likewise.
	(__arm_vcvtq_s32_f32): Likewise.
	(__arm_vcvtq_u16_f16): Likewise.
	(__arm_vcvtq_u32_f32): Likewise.
	(vrev64q): Define polymorphic variant.
	* config/arm/arm_mve_builtins.def (UNOP_SNONE_SNONE): Use it.
	(UNOP_SNONE_NONE): Likewise.
	(UNOP_SNONE_IMM): Likewise.
	(UNOP_UNONE_UNONE): Likewise.
	(UNOP_UNONE_NONE): Likewise.
	(UNOP_UNONE_IMM): Likewise.
	* config/arm/mve.md (mve_vrev64q_<supf><mode>): Define RTL pattern.
	(mve_vcvtq_from_f_<supf><mode>): Likewise.
	(mve_vmvnq_n_<supf><mode>): Likewise.

gcc/testsuite/ChangeLog:

2019-10-21  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vcvtq_s16_f16.c: New test.
	* gcc.target/arm/mve/intrinsics/vcvtq_s32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_u16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_u32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmvnq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmvnq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmvnq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmvnq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev64q_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev64q_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev64q_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev64q_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev64q_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev64q_u8.c: Likewise.


###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 2fee417fe6585f457edd4cf96655366b1d6bd1a0..21b213d8e1bc99a3946f15e97161e01d73832799 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -313,6 +313,42 @@ arm_unop_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define UNOP_NONE_UNONE_QUALIFIERS \
   (arm_unop_none_unone_qualifiers)
 
+static enum arm_type_qualifiers
+arm_unop_snone_snone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none };
+#define UNOP_SNONE_SNONE_QUALIFIERS \
+  (arm_unop_snone_snone_qualifiers)
+
+static enum arm_type_qualifiers
+arm_unop_snone_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none };
+#define UNOP_SNONE_NONE_QUALIFIERS \
+  (arm_unop_snone_none_qualifiers)
+
+static enum arm_type_qualifiers
+arm_unop_snone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_immediate };
+#define UNOP_SNONE_IMM_QUALIFIERS \
+  (arm_unop_snone_imm_qualifiers)
+
+static enum arm_type_qualifiers
+arm_unop_unone_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_none };
+#define UNOP_UNONE_NONE_QUALIFIERS \
+  (arm_unop_unone_none_qualifiers)
+
+static enum arm_type_qualifiers
+arm_unop_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned };
+#define UNOP_UNONE_UNONE_QUALIFIERS \
+  (arm_unop_unone_unone_qualifiers)
+
+static enum arm_type_qualifiers
+arm_unop_unone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_immediate };
+#define UNOP_UNONE_IMM_QUALIFIERS \
+  (arm_unop_unone_imm_qualifiers)
+
 /* End of Qualifier for MVE builtins.  */
 
    /* void ([T element type] *, T, immediate).  */
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 9bcb04fb99a54b47057bb33cc807d6a5ad16401f..bd5162122b8c8e61ba25ba6ea89c56005f5a79dc 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -108,6 +108,20 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vcvtq_f32_s32(__a) __arm_vcvtq_f32_s32(__a)
 #define vcvtq_f16_u16(__a) __arm_vcvtq_f16_u16(__a)
 #define vcvtq_f32_u32(__a) __arm_vcvtq_f32_u32(__a)
+#define vmvnq_n_s16( __imm) __arm_vmvnq_n_s16( __imm)
+#define vmvnq_n_s32( __imm) __arm_vmvnq_n_s32( __imm)
+#define vrev64q_s8(__a) __arm_vrev64q_s8(__a)
+#define vrev64q_s16(__a) __arm_vrev64q_s16(__a)
+#define vrev64q_s32(__a) __arm_vrev64q_s32(__a)
+#define vcvtq_s16_f16(__a) __arm_vcvtq_s16_f16(__a)
+#define vcvtq_s32_f32(__a) __arm_vcvtq_s32_f32(__a)
+#define vrev64q_u8(__a) __arm_vrev64q_u8(__a)
+#define vrev64q_u16(__a) __arm_vrev64q_u16(__a)
+#define vrev64q_u32(__a) __arm_vrev64q_u32(__a)
+#define vmvnq_n_u16( __imm) __arm_vmvnq_n_u16( __imm)
+#define vmvnq_n_u32( __imm) __arm_vmvnq_n_u32( __imm)
+#define vcvtq_u16_f16(__a) __arm_vcvtq_u16_f16(__a)
+#define vcvtq_u32_f32(__a) __arm_vcvtq_u32_f32(__a)
 #endif
 
 __extension__ extern __inline void
@@ -164,6 +178,76 @@ __arm_vst4q_u32 (uint32_t * __addr, uint32x4x4_t __value)
   __builtin_mve_vst4qv4si ((__builtin_neon_si *) __addr, __rv.__o);
 }
 
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vmvnq_n_s16 (const int __imm)
+{
+  return __builtin_mve_vmvnq_n_sv8hi (__imm);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vmvnq_n_s32 (const int __imm)
+{
+  return __builtin_mve_vmvnq_n_sv4si (__imm);
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrev64q_s8 (int8x16_t __a)
+{
+  return __builtin_mve_vrev64q_sv16qi (__a);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrev64q_s16 (int16x8_t __a)
+{
+  return __builtin_mve_vrev64q_sv8hi (__a);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrev64q_s32 (int32x4_t __a)
+{
+  return __builtin_mve_vrev64q_sv4si (__a);
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrev64q_u8 (uint8x16_t __a)
+{
+  return __builtin_mve_vrev64q_uv16qi (__a);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrev64q_u16 (uint16x8_t __a)
+{
+  return __builtin_mve_vrev64q_uv8hi (__a);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrev64q_u32 (uint32x4_t __a)
+{
+  return __builtin_mve_vrev64q_uv4si (__a);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vmvnq_n_u16 (const int __imm)
+{
+  return __builtin_mve_vmvnq_n_uv8hi (__imm);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vmvnq_n_u32 (const int __imm)
+{
+  return __builtin_mve_vmvnq_n_uv4si (__imm);
+}
+
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
@@ -373,6 +457,34 @@ __arm_vcvtq_f32_u32 (uint32x4_t __a)
   return __builtin_mve_vcvtq_to_f_uv4sf (__a);
 }
 
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_s16_f16 (float16x8_t __a)
+{
+  return __builtin_mve_vcvtq_from_f_sv8hi (__a);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_s32_f32 (float32x4_t __a)
+{
+  return __builtin_mve_vcvtq_from_f_sv4si (__a);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_u16_f16 (float16x8_t __a)
+{
+  return __builtin_mve_vcvtq_from_f_uv8hi (__a);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_u32_f32 (float32x4_t __a)
+{
+  return __builtin_mve_vcvtq_from_f_uv4si (__a);
+}
+
 #endif
 
 enum {
@@ -674,6 +786,16 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8x4_t]: __arm_vst4q_u16 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8x4_t)), \
   int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4x4_t]: __arm_vst4q_u32 (__ARM_mve_coerce(__p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4x4_t)));})
 
+#define vrev64q(p0) __arm_vrev64q(p0)
+#define __arm_vrev64q(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int8x16_t]: __arm_vrev64q_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vrev64q_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vrev64q_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vrev64q_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vrev64q_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vrev64q_u32 (__ARM_mve_coerce(__p0, uint32x4_t)));})
+
 #endif /* MVE Floating point.  */
 
 #ifdef __cplusplus
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index 65dc58c9328525891a0aa0bb97a412ebc8257c18..d205aca28909a224bd4bad103b8a280631661538 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -34,3 +34,9 @@ VAR1 (UNOP_NONE_NONE, vcvttq_f32_f16, v4sf)
 VAR1 (UNOP_NONE_NONE, vcvtbq_f32_f16, v4sf)
 VAR2 (UNOP_NONE_SNONE, vcvtq_to_f_s, v8hf, v4sf)
 VAR2 (UNOP_NONE_UNONE, vcvtq_to_f_u, v8hf, v4sf)
+VAR3 (UNOP_SNONE_SNONE, vrev64q_s, v16qi, v8hi, v4si)
+VAR2 (UNOP_SNONE_NONE, vcvtq_from_f_s, v8hi, v4si)
+VAR2 (UNOP_SNONE_IMM, vmvnq_n_s, v8hi, v4si)
+VAR3 (UNOP_UNONE_UNONE, vrev64q_u, v16qi, v8hi, v4si)
+VAR2 (UNOP_UNONE_NONE, vcvtq_from_f_u, v8hi, v4si)
+VAR2 (UNOP_UNONE_IMM, vmvnq_n_u, v8hi, v4si)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 7a31d0abdfff9a93d79faa1de44d1b224470e2eb..a1dd709a9ffe479cf16a88a5923975f1941531ef 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -22,17 +22,26 @@
 (define_mode_iterator MVE_types [V16QI V8HI V4SI V2DI TI V8HF V4SF V2DF])
 (define_mode_iterator MVE_VLD_ST [V16QI V8HI V4SI V8HF V4SF])
 (define_mode_iterator MVE_0 [V8HF V4SF])
+(define_mode_iterator MVE_2 [V16QI V8HI V4SI])
+(define_mode_iterator MVE_5 [V8HI V4SI])
 
 (define_c_enum "unspec" [VST4Q VRNDXQ_F VRNDQ_F VRNDPQ_F VRNDNQ_F VRNDMQ_F
 			 VRNDAQ_F VREV64Q_F VNEGQ_F VDUPQ_N_F VABSQ_F VREV32Q_F
 			 VCVTTQ_F32_F16 VCVTBQ_F32_F16 VCVTQ_TO_F_S
-			 VCVTQ_TO_F_U])
+			 VCVTQ_TO_F_U VMVNQ_N_S VMVNQ_N_U VREV64Q_S VREV64Q_U
+			 VCVTQ_FROM_F_S VCVTQ_FROM_F_U])
 
 (define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF")
 			    (V8HF "V8HI") (V4SF "V4SI")])
 
-(define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u")])
+(define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VMVNQ_N_S "s")
+		       (VMVNQ_N_U "u") (VREV64Q_U "u") (VREV64Q_S "s")
+		       (VCVTQ_FROM_F_S "s") (VCVTQ_FROM_F_U "u")])
+
 (define_int_iterator VCVTQ_TO_F [VCVTQ_TO_F_S VCVTQ_TO_F_U])
+(define_int_iterator VMVNQ_N [VMVNQ_N_U VMVNQ_N_S])
+(define_int_iterator VREV64Q [VREV64Q_S VREV64Q_U])
+(define_int_iterator VCVTQ_FROM_F [VCVTQ_FROM_F_S VCVTQ_FROM_F_U])
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
@@ -318,3 +327,45 @@
   "vcvt.f%#<V_sz_elem>.<supf>%#<V_sz_elem>       %q0, %q1"
   [(set_attr "type" "mve_move")
 ])
+
+;;
+;; [vrev64q_u, vrev64q_s])
+;;
+(define_insn "mve_vrev64q_<supf><mode>"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")]
+	 VREV64Q))
+  ]
+  "TARGET_HAVE_MVE"
+  "vrev64.%#<V_sz_elem> %q0, %q1"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vcvtq_from_f_s, vcvtq_from_f_u])
+;;
+(define_insn "mve_vcvtq_from_f_<supf><mode>"
+  [
+   (set (match_operand:MVE_5 0 "s_register_operand" "=w")
+	(unspec:MVE_5 [(match_operand:<MVE_CNVT> 1 "s_register_operand" "w")]
+	 VCVTQ_FROM_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vcvt.<supf>%#<V_sz_elem>.f%#<V_sz_elem>       %q0, %q1"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vmvnq_n_u, vmvnq_n_s])
+;;
+(define_insn "mve_vmvnq_n_<supf><mode>"
+  [
+   (set (match_operand:MVE_5 0 "s_register_operand" "=w")
+	(unspec:MVE_5 [(match_operand:SI 1 "immediate_operand" "i")]
+	 VMVNQ_N))
+  ]
+  "TARGET_HAVE_MVE"
+  "vmvn.i%#<V_sz_elem>  %q0, %1"
+  [(set_attr "type" "mve_move")
+])
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_s16_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_s16_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..aa69b11b79a15dace81bb5d8112cc5053a6f8dc2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_s16_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (float16x8_t a)
+{
+  return vcvtq_s16_f16 (a);
+}
+
+/* { dg-final { scan-assembler "vcvt.s16.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_s32_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_s32_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..0bfcba6dcf4e240a1de0cba5d98d85c2a529c09e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_s32_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (float32x4_t a)
+{
+  return vcvtq_s32_f32 (a);
+}
+
+/* { dg-final { scan-assembler "vcvt.s32.f32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_u16_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_u16_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..ed36c8082ee464e8878ae7453e04d26e09a87752
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_u16_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (float16x8_t a)
+{
+    return vcvtq_u16_f16 (a);
+}
+
+/* { dg-final { scan-assembler "vcvt.u16.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_u32_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_u32_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..fbd3989e19c8d832561d6a4265b68d0a87a678b7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_u32_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (float32x4_t a)
+{
+    return vcvtq_u32_f32 (a);
+}
+
+/* { dg-final { scan-assembler "vcvt.u32.f32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..39c31b4bbe743ed765c9a106778d6c4ba31d14eb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_s16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo ()
+{
+  return vmvnq_n_s16 (1);
+}
+
+/* { dg-final { scan-assembler "vmvn.i16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..6754cbf8baf11a702543c86d4c048df6bd9699a8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_s32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo ()
+{
+  return vmvnq_n_s32 (2);
+}
+
+/* { dg-final { scan-assembler "vmvn.i32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..b7b12e7476917631927017d9413ef7226fedbe23
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_u16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo ()
+{
+    return vmvnq_n_u16 (1);
+}
+
+/* { dg-final { scan-assembler "vmvn.i16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..d5fb831b41ca68d6c4d812051878835b074c47e5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_u32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo ()
+{
+    return vmvnq_n_u32 (2);
+}
+
+/* { dg-final { scan-assembler "vmvn.i32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..4eda96fefd369781a7639d7c5a9515d02b4b439e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int16x8_t a)
+{
+  return vrev64q_s16 (a);
+}
+
+/* { dg-final { scan-assembler "vrev64.16"  }  } */
+
+int16x8_t
+foo1 (int16x8_t a)
+{
+  return vrev64q_s16 (a);
+}
+
+/* { dg-final { scan-assembler "vrev64.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..356f162c477e8159da73c9242caff6a545235cc1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t a)
+{
+  return vrev64q_s32 (a);
+}
+
+/* { dg-final { scan-assembler "vrev64.32"  }  } */
+
+int32x4_t
+foo1 (int32x4_t a)
+{
+  return vrev64q_s32 (a);
+}
+
+/* { dg-final { scan-assembler "vrev64.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..5cc4d0750f4d8de85e997247c77d7c076dfb624e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo (int8x16_t a)
+{
+  return vrev64q_s8 (a);
+}
+
+/* { dg-final { scan-assembler "vrev64.8"  }  } */
+
+int8x16_t
+foo1 (int8x16_t a)
+{
+  return vrev64q (a);
+}
+
+/* { dg-final { scan-assembler "vrev64.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..ae7e3665c54b11e2eee8209ade6030875a201b6b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint16x8_t a)
+{
+    return vrev64q_u16 (a);
+}
+
+/* { dg-final { scan-assembler "vrev64.16"  }  } */
+
+uint16x8_t
+foo1 (uint16x8_t a)
+{
+    return vrev64q (a);
+}
+
+/* { dg-final { scan-assembler "vrev64.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..8c87cab925766ea981ce90cc47b3194bbf0913ff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t a)
+{
+    return vrev64q_u32 (a);
+}
+
+/* { dg-final { scan-assembler "vrev64.32"  }  } */
+
+uint32x4_t
+foo1 (uint32x4_t a)
+{
+    return vrev64q (a);
+}
+
+/* { dg-final { scan-assembler "vrev64.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..c4abd160e61517a4fcc2312c8fcdff1119686da6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint8x16_t a)
+{
+    return vrev64q_u8 (a);
+}
+
+/* { dg-final { scan-assembler "vrev64.8"  }  } */
+
+uint8x16_t
+foo1 (uint8x16_t a)
+{
+    return vrev64q (a);
+}
+
+/* { dg-final { scan-assembler "vrev64.8"  }  } */


[-- Attachment #2: diff05.patch --]
[-- Type: text/plain, Size: 20847 bytes --]

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 2fee417fe6585f457edd4cf96655366b1d6bd1a0..21b213d8e1bc99a3946f15e97161e01d73832799 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -313,6 +313,42 @@ arm_unop_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define UNOP_NONE_UNONE_QUALIFIERS \
   (arm_unop_none_unone_qualifiers)
 
+static enum arm_type_qualifiers
+arm_unop_snone_snone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none };
+#define UNOP_SNONE_SNONE_QUALIFIERS \
+  (arm_unop_snone_snone_qualifiers)
+
+static enum arm_type_qualifiers
+arm_unop_snone_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none };
+#define UNOP_SNONE_NONE_QUALIFIERS \
+  (arm_unop_snone_none_qualifiers)
+
+static enum arm_type_qualifiers
+arm_unop_snone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_immediate };
+#define UNOP_SNONE_IMM_QUALIFIERS \
+  (arm_unop_snone_imm_qualifiers)
+
+static enum arm_type_qualifiers
+arm_unop_unone_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_none };
+#define UNOP_UNONE_NONE_QUALIFIERS \
+  (arm_unop_unone_none_qualifiers)
+
+static enum arm_type_qualifiers
+arm_unop_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned };
+#define UNOP_UNONE_UNONE_QUALIFIERS \
+  (arm_unop_unone_unone_qualifiers)
+
+static enum arm_type_qualifiers
+arm_unop_unone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_immediate };
+#define UNOP_UNONE_IMM_QUALIFIERS \
+  (arm_unop_unone_imm_qualifiers)
+
 /* End of Qualifier for MVE builtins.  */
 
    /* void ([T element type] *, T, immediate).  */
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 9bcb04fb99a54b47057bb33cc807d6a5ad16401f..bd5162122b8c8e61ba25ba6ea89c56005f5a79dc 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -108,6 +108,20 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vcvtq_f32_s32(__a) __arm_vcvtq_f32_s32(__a)
 #define vcvtq_f16_u16(__a) __arm_vcvtq_f16_u16(__a)
 #define vcvtq_f32_u32(__a) __arm_vcvtq_f32_u32(__a)
+#define vmvnq_n_s16( __imm) __arm_vmvnq_n_s16( __imm)
+#define vmvnq_n_s32( __imm) __arm_vmvnq_n_s32( __imm)
+#define vrev64q_s8(__a) __arm_vrev64q_s8(__a)
+#define vrev64q_s16(__a) __arm_vrev64q_s16(__a)
+#define vrev64q_s32(__a) __arm_vrev64q_s32(__a)
+#define vcvtq_s16_f16(__a) __arm_vcvtq_s16_f16(__a)
+#define vcvtq_s32_f32(__a) __arm_vcvtq_s32_f32(__a)
+#define vrev64q_u8(__a) __arm_vrev64q_u8(__a)
+#define vrev64q_u16(__a) __arm_vrev64q_u16(__a)
+#define vrev64q_u32(__a) __arm_vrev64q_u32(__a)
+#define vmvnq_n_u16( __imm) __arm_vmvnq_n_u16( __imm)
+#define vmvnq_n_u32( __imm) __arm_vmvnq_n_u32( __imm)
+#define vcvtq_u16_f16(__a) __arm_vcvtq_u16_f16(__a)
+#define vcvtq_u32_f32(__a) __arm_vcvtq_u32_f32(__a)
 #endif
 
 __extension__ extern __inline void
@@ -164,6 +178,76 @@ __arm_vst4q_u32 (uint32_t * __addr, uint32x4x4_t __value)
   __builtin_mve_vst4qv4si ((__builtin_neon_si *) __addr, __rv.__o);
 }
 
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vmvnq_n_s16 (const int __imm)
+{
+  return __builtin_mve_vmvnq_n_sv8hi (__imm);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vmvnq_n_s32 (const int __imm)
+{
+  return __builtin_mve_vmvnq_n_sv4si (__imm);
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrev64q_s8 (int8x16_t __a)
+{
+  return __builtin_mve_vrev64q_sv16qi (__a);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrev64q_s16 (int16x8_t __a)
+{
+  return __builtin_mve_vrev64q_sv8hi (__a);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrev64q_s32 (int32x4_t __a)
+{
+  return __builtin_mve_vrev64q_sv4si (__a);
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrev64q_u8 (uint8x16_t __a)
+{
+  return __builtin_mve_vrev64q_uv16qi (__a);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrev64q_u16 (uint16x8_t __a)
+{
+  return __builtin_mve_vrev64q_uv8hi (__a);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrev64q_u32 (uint32x4_t __a)
+{
+  return __builtin_mve_vrev64q_uv4si (__a);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vmvnq_n_u16 (const int __imm)
+{
+  return __builtin_mve_vmvnq_n_uv8hi (__imm);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vmvnq_n_u32 (const int __imm)
+{
+  return __builtin_mve_vmvnq_n_uv4si (__imm);
+}
+
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
@@ -373,6 +457,34 @@ __arm_vcvtq_f32_u32 (uint32x4_t __a)
   return __builtin_mve_vcvtq_to_f_uv4sf (__a);
 }
 
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_s16_f16 (float16x8_t __a)
+{
+  return __builtin_mve_vcvtq_from_f_sv8hi (__a);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_s32_f32 (float32x4_t __a)
+{
+  return __builtin_mve_vcvtq_from_f_sv4si (__a);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_u16_f16 (float16x8_t __a)
+{
+  return __builtin_mve_vcvtq_from_f_uv8hi (__a);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_u32_f32 (float32x4_t __a)
+{
+  return __builtin_mve_vcvtq_from_f_uv4si (__a);
+}
+
 #endif
 
 enum {
@@ -674,6 +786,16 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8x4_t]: __arm_vst4q_u16 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8x4_t)), \
   int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4x4_t]: __arm_vst4q_u32 (__ARM_mve_coerce(__p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4x4_t)));})
 
+#define vrev64q(p0) __arm_vrev64q(p0)
+#define __arm_vrev64q(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int8x16_t]: __arm_vrev64q_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vrev64q_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vrev64q_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vrev64q_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vrev64q_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vrev64q_u32 (__ARM_mve_coerce(__p0, uint32x4_t)));})
+
 #endif /* MVE Floating point.  */
 
 #ifdef __cplusplus
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index 65dc58c9328525891a0aa0bb97a412ebc8257c18..d205aca28909a224bd4bad103b8a280631661538 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -34,3 +34,9 @@ VAR1 (UNOP_NONE_NONE, vcvttq_f32_f16, v4sf)
 VAR1 (UNOP_NONE_NONE, vcvtbq_f32_f16, v4sf)
 VAR2 (UNOP_NONE_SNONE, vcvtq_to_f_s, v8hf, v4sf)
 VAR2 (UNOP_NONE_UNONE, vcvtq_to_f_u, v8hf, v4sf)
+VAR3 (UNOP_SNONE_SNONE, vrev64q_s, v16qi, v8hi, v4si)
+VAR2 (UNOP_SNONE_NONE, vcvtq_from_f_s, v8hi, v4si)
+VAR2 (UNOP_SNONE_IMM, vmvnq_n_s, v8hi, v4si)
+VAR3 (UNOP_UNONE_UNONE, vrev64q_u, v16qi, v8hi, v4si)
+VAR2 (UNOP_UNONE_NONE, vcvtq_from_f_u, v8hi, v4si)
+VAR2 (UNOP_UNONE_IMM, vmvnq_n_u, v8hi, v4si)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 7a31d0abdfff9a93d79faa1de44d1b224470e2eb..a1dd709a9ffe479cf16a88a5923975f1941531ef 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -22,17 +22,26 @@
 (define_mode_iterator MVE_types [V16QI V8HI V4SI V2DI TI V8HF V4SF V2DF])
 (define_mode_iterator MVE_VLD_ST [V16QI V8HI V4SI V8HF V4SF])
 (define_mode_iterator MVE_0 [V8HF V4SF])
+(define_mode_iterator MVE_2 [V16QI V8HI V4SI])
+(define_mode_iterator MVE_5 [V8HI V4SI])
 
 (define_c_enum "unspec" [VST4Q VRNDXQ_F VRNDQ_F VRNDPQ_F VRNDNQ_F VRNDMQ_F
 			 VRNDAQ_F VREV64Q_F VNEGQ_F VDUPQ_N_F VABSQ_F VREV32Q_F
 			 VCVTTQ_F32_F16 VCVTBQ_F32_F16 VCVTQ_TO_F_S
-			 VCVTQ_TO_F_U])
+			 VCVTQ_TO_F_U VMVNQ_N_S VMVNQ_N_U VREV64Q_S VREV64Q_U
+			 VCVTQ_FROM_F_S VCVTQ_FROM_F_U])
 
 (define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF")
 			    (V8HF "V8HI") (V4SF "V4SI")])
 
-(define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u")])
+(define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VMVNQ_N_S "s")
+		       (VMVNQ_N_U "u") (VREV64Q_U "u") (VREV64Q_S "s")
+		       (VCVTQ_FROM_F_S "s") (VCVTQ_FROM_F_U "u")])
+
 (define_int_iterator VCVTQ_TO_F [VCVTQ_TO_F_S VCVTQ_TO_F_U])
+(define_int_iterator VMVNQ_N [VMVNQ_N_U VMVNQ_N_S])
+(define_int_iterator VREV64Q [VREV64Q_S VREV64Q_U])
+(define_int_iterator VCVTQ_FROM_F [VCVTQ_FROM_F_S VCVTQ_FROM_F_U])
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
@@ -318,3 +327,45 @@
   "vcvt.f%#<V_sz_elem>.<supf>%#<V_sz_elem>       %q0, %q1"
   [(set_attr "type" "mve_move")
 ])
+
+;;
+;; [vrev64q_u, vrev64q_s])
+;;
+(define_insn "mve_vrev64q_<supf><mode>"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")]
+	 VREV64Q))
+  ]
+  "TARGET_HAVE_MVE"
+  "vrev64.%#<V_sz_elem> %q0, %q1"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vcvtq_from_f_s, vcvtq_from_f_u])
+;;
+(define_insn "mve_vcvtq_from_f_<supf><mode>"
+  [
+   (set (match_operand:MVE_5 0 "s_register_operand" "=w")
+	(unspec:MVE_5 [(match_operand:<MVE_CNVT> 1 "s_register_operand" "w")]
+	 VCVTQ_FROM_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vcvt.<supf>%#<V_sz_elem>.f%#<V_sz_elem>       %q0, %q1"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vmvnq_n_u, vmvnq_n_s])
+;;
+(define_insn "mve_vmvnq_n_<supf><mode>"
+  [
+   (set (match_operand:MVE_5 0 "s_register_operand" "=w")
+	(unspec:MVE_5 [(match_operand:SI 1 "immediate_operand" "i")]
+	 VMVNQ_N))
+  ]
+  "TARGET_HAVE_MVE"
+  "vmvn.i%#<V_sz_elem>  %q0, %1"
+  [(set_attr "type" "mve_move")
+])
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_s16_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_s16_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..aa69b11b79a15dace81bb5d8112cc5053a6f8dc2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_s16_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (float16x8_t a)
+{
+  return vcvtq_s16_f16 (a);
+}
+
+/* { dg-final { scan-assembler "vcvt.s16.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_s32_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_s32_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..0bfcba6dcf4e240a1de0cba5d98d85c2a529c09e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_s32_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (float32x4_t a)
+{
+  return vcvtq_s32_f32 (a);
+}
+
+/* { dg-final { scan-assembler "vcvt.s32.f32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_u16_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_u16_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..ed36c8082ee464e8878ae7453e04d26e09a87752
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_u16_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (float16x8_t a)
+{
+    return vcvtq_u16_f16 (a);
+}
+
+/* { dg-final { scan-assembler "vcvt.u16.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_u32_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_u32_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..fbd3989e19c8d832561d6a4265b68d0a87a678b7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_u32_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (float32x4_t a)
+{
+    return vcvtq_u32_f32 (a);
+}
+
+/* { dg-final { scan-assembler "vcvt.u32.f32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..39c31b4bbe743ed765c9a106778d6c4ba31d14eb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_s16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo ()
+{
+  return vmvnq_n_s16 (1);
+}
+
+/* { dg-final { scan-assembler "vmvn.i16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..6754cbf8baf11a702543c86d4c048df6bd9699a8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_s32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo ()
+{
+  return vmvnq_n_s32 (2);
+}
+
+/* { dg-final { scan-assembler "vmvn.i32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..b7b12e7476917631927017d9413ef7226fedbe23
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_u16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo ()
+{
+    return vmvnq_n_u16 (1);
+}
+
+/* { dg-final { scan-assembler "vmvn.i16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..d5fb831b41ca68d6c4d812051878835b074c47e5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_u32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo ()
+{
+    return vmvnq_n_u32 (2);
+}
+
+/* { dg-final { scan-assembler "vmvn.i32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..4eda96fefd369781a7639d7c5a9515d02b4b439e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int16x8_t a)
+{
+  return vrev64q_s16 (a);
+}
+
+/* { dg-final { scan-assembler "vrev64.16"  }  } */
+
+int16x8_t
+foo1 (int16x8_t a)
+{
+  return vrev64q_s16 (a);
+}
+
+/* { dg-final { scan-assembler "vrev64.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..356f162c477e8159da73c9242caff6a545235cc1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t a)
+{
+  return vrev64q_s32 (a);
+}
+
+/* { dg-final { scan-assembler "vrev64.32"  }  } */
+
+int32x4_t
+foo1 (int32x4_t a)
+{
+  return vrev64q_s32 (a);
+}
+
+/* { dg-final { scan-assembler "vrev64.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..5cc4d0750f4d8de85e997247c77d7c076dfb624e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo (int8x16_t a)
+{
+  return vrev64q_s8 (a);
+}
+
+/* { dg-final { scan-assembler "vrev64.8"  }  } */
+
+int8x16_t
+foo1 (int8x16_t a)
+{
+  return vrev64q (a);
+}
+
+/* { dg-final { scan-assembler "vrev64.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..ae7e3665c54b11e2eee8209ade6030875a201b6b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint16x8_t a)
+{
+    return vrev64q_u16 (a);
+}
+
+/* { dg-final { scan-assembler "vrev64.16"  }  } */
+
+uint16x8_t
+foo1 (uint16x8_t a)
+{
+    return vrev64q (a);
+}
+
+/* { dg-final { scan-assembler "vrev64.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..8c87cab925766ea981ce90cc47b3194bbf0913ff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t a)
+{
+    return vrev64q_u32 (a);
+}
+
+/* { dg-final { scan-assembler "vrev64.32"  }  } */
+
+uint32x4_t
+foo1 (uint32x4_t a)
+{
+    return vrev64q (a);
+}
+
+/* { dg-final { scan-assembler "vrev64.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..c4abd160e61517a4fcc2312c8fcdff1119686da6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint8x16_t a)
+{
+    return vrev64q_u8 (a);
+}
+
+/* { dg-final { scan-assembler "vrev64.8"  }  } */
+
+uint8x16_t
+foo1 (uint8x16_t a)
+{
+    return vrev64q (a);
+}
+
+/* { dg-final { scan-assembler "vrev64.8"  }  } */


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][4/x]: MVE ACLE vector interleaving store intrinsics.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
  2019-11-14 19:13 ` [PATCH][ARM][GCC][2/2x]: MVE intrinsics with binary operands Srinath Parvathaneni
  2019-11-14 19:13 ` [PATCH][ARM][GCC][3/x]: MVE ACLE intrinsics framework patch Srinath Parvathaneni
@ 2019-11-14 19:13 ` Srinath Parvathaneni
  2019-12-19 17:39   ` Kyrill Tkachov
  2019-11-14 19:13 ` [PATCH][ARM][GCC][2/x]: MVE ACLE intrinsics framework patch Srinath Parvathaneni
                   ` (35 subsequent siblings)
  38 siblings, 1 reply; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 38511 bytes --]

Hello,

This patch supports MVE ACLE intrinsics vst4q_s8, vst4q_s16, vst4q_s32, vst4q_u8,
vst4q_u16, vst4q_u32, vst4q_f16 and vst4q_f32.

In this patch arm_mve_builtins.def file is added to the source code in which the
builtins for MVE ACLE intrinsics are defined using builtin qualifiers.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-12  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm-builtins.c (CF): Define mve_builtin_data.
	(VAR1): Define.
	(ARM_BUILTIN_MVE_PATTERN_START): Define.
	(arm_init_mve_builtins): Define function.
	(arm_init_builtins): Add TARGET_HAVE_MVE check.
	(arm_expand_builtin_1): Check the range of fcode.
	(arm_expand_mve_builtin): Define function to expand MVE builtins.
	(arm_expand_builtin): Check the range of fcode.
	* config/arm/arm_mve.h (__ARM_FEATURE_MVE): Define MVE floating point
        types.
	(__ARM_MVE_PRESERVE_USER_NAMESPACE): Define to protect user namespace.
	(vst4q_s8): Define macro.
	(vst4q_s16): Likewise.
	(vst4q_s32): Likewise.
	(vst4q_u8): Likewise.
	(vst4q_u16): Likewise.
	(vst4q_u32): Likewise.
	(vst4q_f16): Likewise.
	(vst4q_f32): Likewise.
	(__arm_vst4q_s8): Define inline builtin.
	(__arm_vst4q_s16): Likewise.
	(__arm_vst4q_s32): Likewise.
	(__arm_vst4q_u8): Likewise.
	(__arm_vst4q_u16): Likewise.
	(__arm_vst4q_u32): Likewise.
	(__arm_vst4q_f16): Likewise.
	(__arm_vst4q_f32): Likewise.
	(__ARM_mve_typeid): Define macro with MVE types.
	(__ARM_mve_coerce): Define macro with _Generic feature.
	(vst4q): Define polymorphic variant for different vst4q builtins.
	* config/arm/arm_mve_builtins.def: New file.
	* config/arm/mve.md (MVE_VLD_ST): Define iterator.
	(unspec): Define unspec.
	(mve_vst4q<mode>): Define RTL pattern.
	* config/arm/t-arm (arm.o): Add entry for arm_mve_builtins.def.
	(arm-builtins.o): Likewise.

gcc/testsuite/ChangeLog:

2019-11-12  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vst4q_f16.c: New test.
	* gcc.target/arm/mve/intrinsics/vst4q_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst4q_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst4q_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst4q_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst4q_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst4q_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst4q_u8.c: Likewise.


###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index d4cb0ea3deb49b10266d1620c85e243ed34aee4d..a9f76971ef310118bf7edea6a8dd3de1da46b46b 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -401,6 +401,13 @@ static arm_builtin_datum neon_builtin_data[] =
 };
 
 #undef CF
+#define CF(N,X) CODE_FOR_mve_##N##X
+static arm_builtin_datum mve_builtin_data[] =
+{
+#include "arm_mve_builtins.def"
+};
+
+#undef CF
 #undef VAR1
 #define VAR1(T, N, A) \
   {#N, UP (A), CODE_FOR_arm_##N, 0, T##_QUALIFIERS},
@@ -705,6 +712,13 @@ enum arm_builtins
 
 #include "arm_acle_builtins.def"
 
+  ARM_BUILTIN_MVE_BASE,
+
+#undef VAR1
+#define VAR1(T, N, X) \
+  ARM_BUILTIN_MVE_##N##X,
+#include "arm_mve_builtins.def"
+
   ARM_BUILTIN_MAX
 };
 
@@ -714,6 +728,9 @@ enum arm_builtins
 #define ARM_BUILTIN_NEON_PATTERN_START \
   (ARM_BUILTIN_NEON_BASE + 1)
 
+#define ARM_BUILTIN_MVE_PATTERN_START \
+  (ARM_BUILTIN_MVE_BASE + 1)
+
 #define ARM_BUILTIN_ACLE_PATTERN_START \
   (ARM_BUILTIN_ACLE_BASE + 1)
 
@@ -1219,6 +1236,22 @@ arm_init_acle_builtins (void)
     }
 }
 
+/* Set up all the MVE builtins mentioned in arm_mve_builtins.def file.  */
+static void
+arm_init_mve_builtins (void)
+{
+  volatile unsigned int i, fcode = ARM_BUILTIN_MVE_PATTERN_START;
+
+  arm_init_simd_builtin_scalar_types ();
+  arm_init_simd_builtin_types ();
+
+  for (i = 0; i < ARRAY_SIZE (mve_builtin_data); i++, fcode++)
+    {
+      arm_builtin_datum *d = &mve_builtin_data[i];
+      arm_init_builtin (fcode, d, "__builtin_mve");
+    }
+}
+
 /* Set up all the NEON builtins, even builtins for instructions that are not
    in the current target ISA to allow the user to compile particular modules
    with different target specific options that differ from the command line
@@ -1961,8 +1994,10 @@ arm_init_builtins (void)
       = add_builtin_function ("__builtin_arm_lane_check", lane_check_fpr,
 			      ARM_BUILTIN_SIMD_LANE_CHECK, BUILT_IN_MD,
 			      NULL, NULL_TREE);
-
-      arm_init_neon_builtins ();
+      if (TARGET_HAVE_MVE)
+	arm_init_mve_builtins ();
+      else
+	arm_init_neon_builtins ();
       arm_init_vfp_builtins ();
       arm_init_crypto_builtins ();
     }
@@ -2492,10 +2527,14 @@ arm_expand_builtin_1 (int fcode, tree exp, rtx target,
   int is_void = 0;
   int k;
   bool neon = false;
+  bool mve = false;
 
   if (IN_RANGE (fcode, ARM_BUILTIN_VFP_BASE, ARM_BUILTIN_ACLE_BASE - 1))
     neon = true;
 
+  if (IN_RANGE (fcode, ARM_BUILTIN_MVE_BASE, ARM_BUILTIN_MAX - 1))
+    mve = true;
+
   is_void = !!(d->qualifiers[0] & qualifier_void);
 
   num_args += is_void;
@@ -2535,7 +2574,7 @@ arm_expand_builtin_1 (int fcode, tree exp, rtx target,
 	}
       else if (d->qualifiers[qualifiers_k] & qualifier_pointer)
 	{
-	  if (neon)
+	  if (neon || mve)
 	    args[k] = ARG_BUILTIN_NEON_MEMORY;
 	  else
 	    args[k] = ARG_BUILTIN_MEMORY;
@@ -2585,6 +2624,26 @@ arm_expand_acle_builtin (int fcode, tree exp, rtx target)
   return arm_expand_builtin_1 (fcode, exp, target, d);
 }
 
+/* Expand an MVE builtin, i.e. those registered only if their respective target
+   constraints are met.  This check happens within arm_expand_builtin.  */
+
+static rtx
+arm_expand_mve_builtin (int fcode, tree exp, rtx target)
+{
+  if (fcode >= ARM_BUILTIN_MVE_BASE && !TARGET_HAVE_MVE)
+  {
+    fatal_error (input_location,
+		"You must enable MVE instructions"
+		" to use these intrinsics");
+    return const0_rtx;
+  }
+
+  arm_builtin_datum *d
+    = &mve_builtin_data[fcode - ARM_BUILTIN_MVE_PATTERN_START];
+
+  return arm_expand_builtin_1 (fcode, exp, target, d);
+}
+
 /* Expand a Neon builtin, i.e. those registered only if TARGET_NEON holds.
    Most of these are "special" because they don't have symbolic
    constants defined per-instruction or per instruction-variant.  Instead, the
@@ -2678,6 +2737,8 @@ arm_expand_builtin (tree exp,
       /* Don't generate any RTL.  */
       return const0_rtx;
     }
+  if (fcode >= ARM_BUILTIN_MVE_BASE)
+    return arm_expand_mve_builtin (fcode, exp, target);
 
   if (fcode >= ARM_BUILTIN_ACLE_BASE)
     return arm_expand_acle_builtin (fcode, exp, target);
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 5ffb466596b5d8fc330616a6fcc7ee37d3e28def..39c6a1551a72700292dde8ef6cea44ba0907af8d 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -42,6 +42,13 @@ typedef __simd128_float16_t float16x8_t;
 typedef __simd128_float32_t float32x4_t;
 #endif
 
+#if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
+typedef struct { float16x8_t val[2]; } float16x8x2_t;
+typedef struct { float16x8_t val[4]; } float16x8x4_t;
+typedef struct { float32x4_t val[2]; } float32x4x2_t;
+typedef struct { float32x4_t val[4]; } float32x4x4_t;
+#endif
+
 typedef uint16_t mve_pred16_t;
 typedef __simd128_uint8_t uint8x16_t;
 typedef __simd128_uint16_t uint16x8_t;
@@ -52,6 +59,330 @@ typedef __simd128_int16_t int16x8_t;
 typedef __simd128_int32_t int32x4_t;
 typedef __simd128_int64_t int64x2_t;
 
+typedef struct { int16x8_t val[2]; } int16x8x2_t;
+typedef struct { int16x8_t val[4]; } int16x8x4_t;
+typedef struct { int32x4_t val[2]; } int32x4x2_t;
+typedef struct { int32x4_t val[4]; } int32x4x4_t;
+typedef struct { int8x16_t val[2]; } int8x16x2_t;
+typedef struct { int8x16_t val[4]; } int8x16x4_t;
+typedef struct { uint16x8_t val[2]; } uint16x8x2_t;
+typedef struct { uint16x8_t val[4]; } uint16x8x4_t;
+typedef struct { uint32x4_t val[2]; } uint32x4x2_t;
+typedef struct { uint32x4_t val[4]; } uint32x4x4_t;
+typedef struct { uint8x16_t val[2]; } uint8x16x2_t;
+typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
+
+#ifndef __ARM_MVE_PRESERVE_USER_NAMESPACE
+#define vst4q_s8( __addr, __value) __arm_vst4q_s8( __addr, __value)
+#define vst4q_s16( __addr, __value) __arm_vst4q_s16( __addr, __value)
+#define vst4q_s32( __addr, __value) __arm_vst4q_s32( __addr, __value)
+#define vst4q_u8( __addr, __value) __arm_vst4q_u8( __addr, __value)
+#define vst4q_u16( __addr, __value) __arm_vst4q_u16( __addr, __value)
+#define vst4q_u32( __addr, __value) __arm_vst4q_u32( __addr, __value)
+#define vst4q_f16( __addr, __value) __arm_vst4q_f16( __addr, __value)
+#define vst4q_f32( __addr, __value) __arm_vst4q_f32( __addr, __value)
+#endif
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst4q_s8 (int8_t * __addr, int8x16x4_t __value)
+{
+  union { int8x16x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__i = __value;
+  __builtin_mve_vst4qv16qi ((__builtin_neon_qi *) __addr, __rv.__o);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst4q_s16 (int16_t * __addr, int16x8x4_t __value)
+{
+  union { int16x8x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__i = __value;
+  __builtin_mve_vst4qv8hi ((__builtin_neon_hi *) __addr, __rv.__o);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst4q_s32 (int32_t * __addr, int32x4x4_t __value)
+{
+  union { int32x4x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__i = __value;
+  __builtin_mve_vst4qv4si ((__builtin_neon_si *) __addr, __rv.__o);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst4q_u8 (uint8_t * __addr, uint8x16x4_t __value)
+{
+  union { uint8x16x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__i = __value;
+  __builtin_mve_vst4qv16qi ((__builtin_neon_qi *) __addr, __rv.__o);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst4q_u16 (uint16_t * __addr, uint16x8x4_t __value)
+{
+  union { uint16x8x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__i = __value;
+  __builtin_mve_vst4qv8hi ((__builtin_neon_hi *) __addr, __rv.__o);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst4q_u32 (uint32_t * __addr, uint32x4x4_t __value)
+{
+  union { uint32x4x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__i = __value;
+  __builtin_mve_vst4qv4si ((__builtin_neon_si *) __addr, __rv.__o);
+}
+
+#if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst4q_f16 (float16_t * __addr, float16x8x4_t __value)
+{
+  union { float16x8x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__i = __value;
+  __builtin_mve_vst4qv8hf (__addr, __rv.__o);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst4q_f32 (float32_t * __addr, float32x4x4_t __value)
+{
+  union { float32x4x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__i = __value;
+  __builtin_mve_vst4qv4sf (__addr, __rv.__o);
+}
+
+#endif
+
+enum {
+    __ARM_mve_type_float16_t = 1,
+    __ARM_mve_type_float16_t_ptr,
+    __ARM_mve_type_float16_t_const_ptr,
+    __ARM_mve_type_float16x8_t,
+    __ARM_mve_type_float16x8x2_t,
+    __ARM_mve_type_float16x8x4_t,
+    __ARM_mve_type_float32_t,
+    __ARM_mve_type_float32_t_ptr,
+    __ARM_mve_type_float32_t_const_ptr,
+    __ARM_mve_type_float32x4_t,
+    __ARM_mve_type_float32x4x2_t,
+    __ARM_mve_type_float32x4x4_t,
+    __ARM_mve_type_int16_t,
+    __ARM_mve_type_int16_t_ptr,
+    __ARM_mve_type_int16_t_const_ptr,
+    __ARM_mve_type_int16x8_t,
+    __ARM_mve_type_int16x8x2_t,
+    __ARM_mve_type_int16x8x4_t,
+    __ARM_mve_type_int32_t,
+    __ARM_mve_type_int32_t_ptr,
+    __ARM_mve_type_int32_t_const_ptr,
+    __ARM_mve_type_int32x4_t,
+    __ARM_mve_type_int32x4x2_t,
+    __ARM_mve_type_int32x4x4_t,
+    __ARM_mve_type_int64_t,
+    __ARM_mve_type_int64_t_ptr,
+    __ARM_mve_type_int64_t_const_ptr,
+    __ARM_mve_type_int64x2_t,
+    __ARM_mve_type_int8_t,
+    __ARM_mve_type_int8_t_ptr,
+    __ARM_mve_type_int8_t_const_ptr,
+    __ARM_mve_type_int8x16_t,
+    __ARM_mve_type_int8x16x2_t,
+    __ARM_mve_type_int8x16x4_t,
+    __ARM_mve_type_uint16_t,
+    __ARM_mve_type_uint16_t_ptr,
+    __ARM_mve_type_uint16_t_const_ptr,
+    __ARM_mve_type_uint16x8_t,
+    __ARM_mve_type_uint16x8x2_t,
+    __ARM_mve_type_uint16x8x4_t,
+    __ARM_mve_type_uint32_t,
+    __ARM_mve_type_uint32_t_ptr,
+    __ARM_mve_type_uint32_t_const_ptr,
+    __ARM_mve_type_uint32x4_t,
+    __ARM_mve_type_uint32x4x2_t,
+    __ARM_mve_type_uint32x4x4_t,
+    __ARM_mve_type_uint64_t,
+    __ARM_mve_type_uint64_t_ptr,
+    __ARM_mve_type_uint64_t_const_ptr,
+    __ARM_mve_type_uint64x2_t,
+    __ARM_mve_type_uint8_t,
+    __ARM_mve_type_uint8_t_ptr,
+    __ARM_mve_type_uint8_t_const_ptr,
+    __ARM_mve_type_uint8x16_t,
+    __ARM_mve_type_uint8x16x2_t,
+    __ARM_mve_type_uint8x16x4_t,
+    __ARM_mve_unsupported_type
+};
+
+#if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
+#define __ARM_mve_typeid(x) _Generic(x, \
+    float16_t: __ARM_mve_type_float16_t, \
+    float16_t *: __ARM_mve_type_float16_t_ptr, \
+    float16_t const *: __ARM_mve_type_float16_t_const_ptr, \
+    float16x8_t: __ARM_mve_type_float16x8_t, \
+    float16x8x2_t: __ARM_mve_type_float16x8x2_t, \
+    float16x8x4_t: __ARM_mve_type_float16x8x4_t, \
+    float32_t: __ARM_mve_type_float32_t, \
+    float32_t *: __ARM_mve_type_float32_t_ptr, \
+    float32_t const *: __ARM_mve_type_float32_t_const_ptr, \
+    float32x4_t: __ARM_mve_type_float32x4_t, \
+    float32x4x2_t: __ARM_mve_type_float32x4x2_t, \
+    float32x4x4_t: __ARM_mve_type_float32x4x4_t, \
+    int16_t: __ARM_mve_type_int16_t, \
+    int16_t *: __ARM_mve_type_int16_t_ptr, \
+    int16_t const *: __ARM_mve_type_int16_t_const_ptr, \
+    int16x8_t: __ARM_mve_type_int16x8_t, \
+    int16x8x2_t: __ARM_mve_type_int16x8x2_t, \
+    int16x8x4_t: __ARM_mve_type_int16x8x4_t, \
+    int32_t: __ARM_mve_type_int32_t, \
+    int32_t *: __ARM_mve_type_int32_t_ptr, \
+    int32_t const *: __ARM_mve_type_int32_t_const_ptr, \
+    int32x4_t: __ARM_mve_type_int32x4_t, \
+    int32x4x2_t: __ARM_mve_type_int32x4x2_t, \
+    int32x4x4_t: __ARM_mve_type_int32x4x4_t, \
+    int64_t: __ARM_mve_type_int64_t, \
+    int64_t *: __ARM_mve_type_int64_t_ptr, \
+    int64_t const *: __ARM_mve_type_int64_t_const_ptr, \
+    int64x2_t: __ARM_mve_type_int64x2_t, \
+    int8_t: __ARM_mve_type_int8_t, \
+    int8_t *: __ARM_mve_type_int8_t_ptr, \
+    int8_t const *: __ARM_mve_type_int8_t_const_ptr, \
+    int8x16_t: __ARM_mve_type_int8x16_t, \
+    int8x16x2_t: __ARM_mve_type_int8x16x2_t, \
+    int8x16x4_t: __ARM_mve_type_int8x16x4_t, \
+    uint16_t: __ARM_mve_type_uint16_t, \
+    uint16_t *: __ARM_mve_type_uint16_t_ptr, \
+    uint16_t const *: __ARM_mve_type_uint16_t_const_ptr, \
+    uint16x8_t: __ARM_mve_type_uint16x8_t, \
+    uint16x8x2_t: __ARM_mve_type_uint16x8x2_t, \
+    uint16x8x4_t: __ARM_mve_type_uint16x8x4_t, \
+    uint32_t: __ARM_mve_type_uint32_t, \
+    uint32_t *: __ARM_mve_type_uint32_t_ptr, \
+    uint32_t const *: __ARM_mve_type_uint32_t_const_ptr, \
+    uint32x4_t: __ARM_mve_type_uint32x4_t, \
+    uint32x4x2_t: __ARM_mve_type_uint32x4x2_t, \
+    uint32x4x4_t: __ARM_mve_type_uint32x4x4_t, \
+    uint64_t: __ARM_mve_type_uint64_t, \
+    uint64_t *: __ARM_mve_type_uint64_t_ptr, \
+    uint64_t const *: __ARM_mve_type_uint64_t_const_ptr, \
+    uint64x2_t: __ARM_mve_type_uint64x2_t, \
+    uint8_t: __ARM_mve_type_uint8_t, \
+    uint8_t *: __ARM_mve_type_uint8_t_ptr, \
+    uint8_t const *: __ARM_mve_type_uint8_t_const_ptr, \
+    uint8x16_t: __ARM_mve_type_uint8x16_t, \
+    uint8x16x2_t: __ARM_mve_type_uint8x16x2_t, \
+    uint8x16x4_t: __ARM_mve_type_uint8x16x4_t, \
+    default: _Generic(x, \
+	signed char: __ARM_mve_type_int8_t, \
+	short: __ARM_mve_type_int16_t, \
+	int: __ARM_mve_type_int32_t, \
+	long: __ARM_mve_type_int32_t, \
+	long long: __ARM_mve_type_int64_t, \
+	unsigned char: __ARM_mve_type_uint8_t, \
+	unsigned short: __ARM_mve_type_uint16_t, \
+	unsigned int: __ARM_mve_type_uint32_t, \
+	unsigned long: __ARM_mve_type_uint32_t, \
+	unsigned long long: __ARM_mve_type_uint64_t, \
+	default: __ARM_mve_unsupported_type))
+#else
+#define __ARM_mve_typeid(x) _Generic(x, \
+    int16_t: __ARM_mve_type_int16_t, \
+    int16_t *: __ARM_mve_type_int16_t_ptr, \
+    int16_t const *: __ARM_mve_type_int16_t_const_ptr, \
+    int16x8_t: __ARM_mve_type_int16x8_t, \
+    int16x8x2_t: __ARM_mve_type_int16x8x2_t, \
+    int16x8x4_t: __ARM_mve_type_int16x8x4_t, \
+    int32_t: __ARM_mve_type_int32_t, \
+    int32_t *: __ARM_mve_type_int32_t_ptr, \
+    int32_t const *: __ARM_mve_type_int32_t_const_ptr, \
+    int32x4_t: __ARM_mve_type_int32x4_t, \
+    int32x4x2_t: __ARM_mve_type_int32x4x2_t, \
+    int32x4x4_t: __ARM_mve_type_int32x4x4_t, \
+    int64_t: __ARM_mve_type_int64_t, \
+    int64_t *: __ARM_mve_type_int64_t_ptr, \
+    int64_t const *: __ARM_mve_type_int64_t_const_ptr, \
+    int64x2_t: __ARM_mve_type_int64x2_t, \
+    int8_t: __ARM_mve_type_int8_t, \
+    int8_t *: __ARM_mve_type_int8_t_ptr, \
+    int8_t const *: __ARM_mve_type_int8_t_const_ptr, \
+    int8x16_t: __ARM_mve_type_int8x16_t, \
+    int8x16x2_t: __ARM_mve_type_int8x16x2_t, \
+    int8x16x4_t: __ARM_mve_type_int8x16x4_t, \
+    uint16_t: __ARM_mve_type_uint16_t, \
+    uint16_t *: __ARM_mve_type_uint16_t_ptr, \
+    uint16_t const *: __ARM_mve_type_uint16_t_const_ptr, \
+    uint16x8_t: __ARM_mve_type_uint16x8_t, \
+    uint16x8x2_t: __ARM_mve_type_uint16x8x2_t, \
+    uint16x8x4_t: __ARM_mve_type_uint16x8x4_t, \
+    uint32_t: __ARM_mve_type_uint32_t, \
+    uint32_t *: __ARM_mve_type_uint32_t_ptr, \
+    uint32_t const *: __ARM_mve_type_uint32_t_const_ptr, \
+    uint32x4_t: __ARM_mve_type_uint32x4_t, \
+    uint32x4x2_t: __ARM_mve_type_uint32x4x2_t, \
+    uint32x4x4_t: __ARM_mve_type_uint32x4x4_t, \
+    uint64_t: __ARM_mve_type_uint64_t, \
+    uint64_t *: __ARM_mve_type_uint64_t_ptr, \
+    uint64_t const *: __ARM_mve_type_uint64_t_const_ptr, \
+    uint64x2_t: __ARM_mve_type_uint64x2_t, \
+    uint8_t: __ARM_mve_type_uint8_t, \
+    uint8_t *: __ARM_mve_type_uint8_t_ptr, \
+    uint8_t const *: __ARM_mve_type_uint8_t_const_ptr, \
+    uint8x16_t: __ARM_mve_type_uint8x16_t, \
+    uint8x16x2_t: __ARM_mve_type_uint8x16x2_t, \
+    uint8x16x4_t: __ARM_mve_type_uint8x16x4_t, \
+    default: _Generic(x, \
+	signed char: __ARM_mve_type_int8_t, \
+	short: __ARM_mve_type_int16_t, \
+	int: __ARM_mve_type_int32_t, \
+	long: __ARM_mve_type_int32_t, \
+	long long: __ARM_mve_type_int64_t, \
+	unsigned char: __ARM_mve_type_uint8_t, \
+	unsigned short: __ARM_mve_type_uint16_t, \
+	unsigned int: __ARM_mve_type_uint32_t, \
+	unsigned long: __ARM_mve_type_uint32_t, \
+	unsigned long long: __ARM_mve_type_uint64_t, \
+	default: __ARM_mve_unsupported_type))
+#endif /* MVE Floating point.  */
+
+extern void *__ARM_undef;
+#define __ARM_mve_coerce(param, type) \
+    _Generic(param, type: param, default: *(type *)__ARM_undef)
+
+#if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
+
+#define vst4q(p0,p1) __arm_vst4q(p0,p1)
+#define __arm_vst4q(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16x4_t]: __arm_vst4q_s8 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, int8x16x4_t)), \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8x4_t]: __arm_vst4q_s16 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, int16x8x4_t)), \
+  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4x4_t]: __arm_vst4q_s32 (__ARM_mve_coerce(__p0, int32_t *), __ARM_mve_coerce(__p1, int32x4x4_t)), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16x4_t]: __arm_vst4q_u8 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16x4_t)), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8x4_t]: __arm_vst4q_u16 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8x4_t)), \
+  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4x4_t]: __arm_vst4q_u32 (__ARM_mve_coerce(__p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4x4_t)), \
+  int (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_float16x8x4_t]: __arm_vst4q_f16 (__ARM_mve_coerce(__p0, float16_t *), __ARM_mve_coerce(__p1, float16x8x4_t)), \
+  int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4x4_t]: __arm_vst4q_f32 (__ARM_mve_coerce(__p0, float32_t *), __ARM_mve_coerce(__p1, float32x4x4_t)));})
+
+#else /* MVE Interger.  */
+
+#define vst4q(p0,p1) __arm_vst4q(p0,p1)
+#define __arm_vst4q(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16x4_t]: __arm_vst4q_s8 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, int8x16x4_t)), \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8x4_t]: __arm_vst4q_s16 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, int16x8x4_t)), \
+  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4x4_t]: __arm_vst4q_s32 (__ARM_mve_coerce(__p0, int32_t *), __ARM_mve_coerce(__p1, int32x4x4_t)), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16x4_t]: __arm_vst4q_u8 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16x4_t)), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8x4_t]: __arm_vst4q_u16 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8x4_t)), \
+  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4x4_t]: __arm_vst4q_u32 (__ARM_mve_coerce(__p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4x4_t)));})
+
+#endif /* MVE Floating point.  */
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
new file mode 100644
index 0000000000000000000000000000000000000000..c7f3546c9cb0ec3ae794e181e533a1cbed38121c
--- /dev/null
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -0,0 +1,21 @@
+/*  MVE builtin definitions for Arm.
+    Copyright  (C) 2019 Free Software Foundation, Inc.
+    Contributed by Arm.
+
+    This file is part of GCC.
+
+    GCC is free software; you can redistribute it and/or modify it
+    under the terms of the GNU General Public License as published
+    by the Free Software Foundation; either version 3, or (at your
+    option) any later version.
+
+    GCC is distributed in the hope that it will be useful, but WITHOUT
+    ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+    or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+    License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with GCC; see the file COPYING3.  If not see
+    <http://www.gnu.org/licenses/>.  */
+
+VAR5 (STORE1, vst4q, v16qi, v8hi, v4si, v8hf, v4sf)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 53334c6d329dedd482615b996232e85ded7a34f8..a2f5220eccb6d888b9b0b1bbd1054cc3a6be97f5 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -17,9 +17,12 @@
 ;; along with GCC; see the file COPYING3.  If not see
 ;; <http://www.gnu.org/licenses/>.
 
-(define_mode_iterator MVE_types [V16QI V8HI V4SI V2DI TI V8HF V4SF V2DF])
 (define_mode_attr V_sz_elem2 [(V16QI "s8") (V8HI "u16") (V4SI "u32")
 			      (V2DI "u64")])
+(define_mode_iterator MVE_types [V16QI V8HI V4SI V2DI TI V8HF V4SF V2DF])
+(define_mode_iterator MVE_VLD_ST [V16QI V8HI V4SI V8HF V4SF])
+
+(define_c_enum "unspec" [VST4Q])
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
@@ -76,3 +79,37 @@
   "TARGET_HAVE_MVE"
   "vstrb.<V_sz_elem> %q1, %E0"
   [(set_attr "type" "mve_store")])
+
+;;
+;; [vst4q])
+;;
+(define_insn "mve_vst4q<mode>"
+  [(set (match_operand:XI 0 "neon_struct_operand" "=Um")
+	(unspec:XI [(match_operand:XI 1 "s_register_operand" "w")
+		    (unspec:MVE_VLD_ST [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+	 VST4Q))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[6];
+   int regno = REGNO (operands[1]);
+   ops[0] = gen_rtx_REG (TImode, regno);
+   ops[1] = gen_rtx_REG (TImode, regno+4);
+   ops[2] = gen_rtx_REG (TImode, regno+8);
+   ops[3] = gen_rtx_REG (TImode, regno+12);
+   rtx reg  = operands[0];
+   while (reg && !REG_P (reg))
+    reg = XEXP (reg, 0);
+   gcc_assert (REG_P (reg));
+   ops[4] = reg;
+   ops[5] = operands[0];
+   /* Here in first three instructions data is stored to ops[4]'s location but
+      in the fourth instruction data is stored to operands[0], this is to
+      support the writeback.  */
+   output_asm_insn ("vst40.<V_sz_elem>\t{%q0, %q1, %q2, %q3}, [%4]\n\t"
+		    "vst41.<V_sz_elem>\t{%q0, %q1, %q2, %q3}, [%4]\n\t"
+		    "vst42.<V_sz_elem>\t{%q0, %q1, %q2, %q3}, [%4]\n\t"
+		    "vst43.<V_sz_elem>\t{%q0, %q1, %q2, %q3}, %5", ops);
+   return "";
+}
+  [(set_attr "length" "16")])
diff --git a/gcc/config/arm/t-arm b/gcc/config/arm/t-arm
index fda5e84355b56a20eb9a22919ab1c786120cc8f1..fbd99197b31ee4b69b7a06592123b6facd04f9af 100644
--- a/gcc/config/arm/t-arm
+++ b/gcc/config/arm/t-arm
@@ -135,7 +135,8 @@ arm.o: $(srcdir)/config/arm/arm.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
   arm-cpu-data.h \
   $(srcdir)/config/arm/arm-protos.h \
   $(srcdir)/config/arm/arm_neon_builtins.def \
-  $(srcdir)/config/arm/arm_vfp_builtins.def
+  $(srcdir)/config/arm/arm_vfp_builtins.def \
+  $(srcdir)/config/arm/arm_mve_builtins.def
 
 arm-builtins.o: $(srcdir)/config/arm/arm-builtins.c $(CONFIG_H) \
   $(SYSTEM_H) coretypes.h $(TM_H) \
@@ -145,6 +146,7 @@ arm-builtins.o: $(srcdir)/config/arm/arm-builtins.c $(CONFIG_H) \
   $(srcdir)/config/arm/arm_acle_builtins.def \
   $(srcdir)/config/arm/arm_neon_builtins.def \
   $(srcdir)/config/arm/arm_vfp_builtins.def \
+  $(srcdir)/config/arm/arm_mve_builtins.def \
   $(srcdir)/config/arm/arm-simd-builtin-types.def
 	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
 		$(srcdir)/config/arm/arm-builtins.c
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..6f8ea17345b85983202613b0f481b23e0532fd16
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_f16.c
@@ -0,0 +1,37 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (float16_t * addr, float16x8x4_t value)
+{
+  vst4q_f16 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.16"  }  } */
+/* { dg-final { scan-assembler "vst41.16"  }  } */
+/* { dg-final { scan-assembler "vst42.16"  }  } */
+/* { dg-final { scan-assembler "vst43.16"  }  } */
+
+void
+foo1 (float16_t * addr, float16x8x4_t value)
+{
+  vst4q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.16"  }  } */
+/* { dg-final { scan-assembler "vst41.16"  }  } */
+/* { dg-final { scan-assembler "vst42.16"  }  } */
+/* { dg-final { scan-assembler "vst43.16"  }  } */
+
+void
+foo2 (float16_t * addr, float16x8x4_t value)
+{
+  vst4q_f16 (addr, value);
+  addr += 32;
+  vst4q_f16 (addr, value);
+}
+
+/* { dg-final { scan-assembler {vst43.16\s\{.*\}, \[.*\]!}  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..03b1dbb5b13e2c41013c772d944f29da8c2468f7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_f32.c
@@ -0,0 +1,37 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (float32_t * addr, float32x4x4_t value)
+{
+  vst4q_f32 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.32"  }  } */
+/* { dg-final { scan-assembler "vst41.32"  }  } */
+/* { dg-final { scan-assembler "vst42.32"  }  } */
+/* { dg-final { scan-assembler "vst43.32"  }  } */
+
+void
+foo1 (float32_t * addr, float32x4x4_t value)
+{
+  vst4q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.32"  }  } */
+/* { dg-final { scan-assembler "vst41.32"  }  } */
+/* { dg-final { scan-assembler "vst42.32"  }  } */
+/* { dg-final { scan-assembler "vst43.32"  }  } */
+
+void
+foo2 (float32_t * addr, float32x4x4_t value)
+{
+  vst4q_f32 (addr, value);
+  addr += 16;
+  vst4q_f32 (addr, value);
+}
+
+/* { dg-final { scan-assembler {vst43.32\s\{.*\}, \[.*\]!}  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..05f53fc9175b41e4d16c7a6bc525b57b626a632d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_s16.c
@@ -0,0 +1,37 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int16_t * addr, int16x8x4_t value)
+{
+  vst4q_s16 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.16"  }  } */
+/* { dg-final { scan-assembler "vst41.16"  }  } */
+/* { dg-final { scan-assembler "vst42.16"  }  } */
+/* { dg-final { scan-assembler "vst43.16"  }  } */
+
+void
+foo1 (int16_t * addr, int16x8x4_t value)
+{
+  vst4q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.16"  }  } */
+/* { dg-final { scan-assembler "vst41.16"  }  } */
+/* { dg-final { scan-assembler "vst42.16"  }  } */
+/* { dg-final { scan-assembler "vst43.16"  }  } */
+
+void
+foo2 (int16_t * addr, int16x8x4_t value)
+{
+  vst4q_s16 (addr, value);
+  addr += 32;
+  vst4q_s16 (addr, value);
+}
+
+/* { dg-final { scan-assembler {vst43.16\s\{.*\}, \[.*\]!}  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..9da2addb8c3d9566722b640f66bbee4f2b4b563d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_s32.c
@@ -0,0 +1,37 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int32_t * addr, int32x4x4_t value)
+{
+  vst4q_s32 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.32"  }  } */
+/* { dg-final { scan-assembler "vst41.32"  }  } */
+/* { dg-final { scan-assembler "vst42.32"  }  } */
+/* { dg-final { scan-assembler "vst43.32"  }  } */
+
+void
+foo1 (int32_t * addr, int32x4x4_t value)
+{
+  vst4q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.32"  }  } */
+/* { dg-final { scan-assembler "vst41.32"  }  } */
+/* { dg-final { scan-assembler "vst42.32"  }  } */
+/* { dg-final { scan-assembler "vst43.32"  }  } */
+
+void
+foo2 (int32_t * addr, int32x4x4_t value)
+{
+  vst4q_s32 (addr, value);
+  addr += 16;
+  vst4q_s32 (addr, value); 
+}
+
+/* { dg-final { scan-assembler {vst43.32\s\{.*\}, \[.*\]!}  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..1243d135e1628de14596a58f24a79f45299ce517
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_s8.c
@@ -0,0 +1,37 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int8_t * addr, int8x16x4_t value)
+{
+  vst4q_s8 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.8"  }  } */
+/* { dg-final { scan-assembler "vst41.8"  }  } */
+/* { dg-final { scan-assembler "vst42.8"  }  } */
+/* { dg-final { scan-assembler "vst43.8"  }  } */
+
+void
+foo1 (int8_t * addr, int8x16x4_t value)
+{
+  vst4q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.8"  }  } */
+/* { dg-final { scan-assembler "vst41.8"  }  } */
+/* { dg-final { scan-assembler "vst42.8"  }  } */
+/* { dg-final { scan-assembler "vst43.8"  }  } */
+
+void
+foo2 (int8_t * addr, int8x16x4_t value)
+{
+  vst4q_s8 (addr, value);
+  addr += 16*4;
+  vst4q_s8 (addr, value);
+}
+
+/* { dg-final { scan-assembler {vst43.8\s\{.*\}, \[.*\]!}  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..09097e7914ed08dda5e7616bafe35cdb30486f21
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_u16.c
@@ -0,0 +1,37 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint16_t * addr, uint16x8x4_t value)
+{
+  vst4q_u16 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.16"  }  } */
+/* { dg-final { scan-assembler "vst41.16"  }  } */
+/* { dg-final { scan-assembler "vst42.16"  }  } */
+/* { dg-final { scan-assembler "vst43.16"  }  } */
+
+void
+foo1 (uint16_t * addr, uint16x8x4_t value)
+{
+  vst4q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.16"  }  } */
+/* { dg-final { scan-assembler "vst41.16"  }  } */
+/* { dg-final { scan-assembler "vst42.16"  }  } */
+/* { dg-final { scan-assembler "vst43.16"  }  } */
+
+void
+foo2 (uint16_t * addr, uint16x8x4_t value)
+{
+  vst4q_u16 (addr, value);
+  addr += 32;
+  vst4q_u16 (addr, value);
+}
+
+/* { dg-final { scan-assembler {vst43.16\s\{.*\}, \[.*\]!}  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..dcb0a10f9f21e4adb70d18b590fcc7b663547f16
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_u32.c
@@ -0,0 +1,37 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint32_t * addr, uint32x4x4_t value)
+{
+  vst4q_u32 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.32"  }  } */
+/* { dg-final { scan-assembler "vst41.32"  }  } */
+/* { dg-final { scan-assembler "vst42.32"  }  } */
+/* { dg-final { scan-assembler "vst43.32"  }  } */
+
+void
+foo1 (uint32_t * addr, uint32x4x4_t value)
+{
+  vst4q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.32"  }  } */
+/* { dg-final { scan-assembler "vst41.32"  }  } */
+/* { dg-final { scan-assembler "vst42.32"  }  } */
+/* { dg-final { scan-assembler "vst43.32"  }  } */
+
+void
+foo2 (uint32_t * addr, uint32x4x4_t value)
+{
+  vst4q_u32 (addr, value);
+  addr += 16;
+  vst4q_u32 (addr, value); 
+}
+
+/* { dg-final { scan-assembler {vst43.32\s\{.*\}, \[.*\]!}  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..b18f57dd6356c2fdf39cbf14ca85373c98f1c62c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_u8.c
@@ -0,0 +1,37 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint8_t * addr, uint8x16x4_t value)
+{
+  vst4q_u8 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.8"  }  } */
+/* { dg-final { scan-assembler "vst41.8"  }  } */
+/* { dg-final { scan-assembler "vst42.8"  }  } */
+/* { dg-final { scan-assembler "vst43.8"  }  } */
+
+void
+foo1 (uint8_t * addr, uint8x16x4_t value)
+{
+  vst4q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.8"  }  } */
+/* { dg-final { scan-assembler "vst41.8"  }  } */
+/* { dg-final { scan-assembler "vst42.8"  }  } */
+/* { dg-final { scan-assembler "vst43.8"  }  } */
+
+void
+foo2 (uint8_t * addr, uint8x16x4_t value)
+{
+  vst4q_u8 (addr, value);
+  addr += 16*4;
+  vst4q_u8 (addr, value);
+}
+
+/* { dg-final { scan-assembler {vst43.8\s\{.*\}, \[.*\]!}  }  } */


[-- Attachment #2: diff03.patch --]
[-- Type: text/plain, Size: 34616 bytes --]

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index d4cb0ea3deb49b10266d1620c85e243ed34aee4d..a9f76971ef310118bf7edea6a8dd3de1da46b46b 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -401,6 +401,13 @@ static arm_builtin_datum neon_builtin_data[] =
 };
 
 #undef CF
+#define CF(N,X) CODE_FOR_mve_##N##X
+static arm_builtin_datum mve_builtin_data[] =
+{
+#include "arm_mve_builtins.def"
+};
+
+#undef CF
 #undef VAR1
 #define VAR1(T, N, A) \
   {#N, UP (A), CODE_FOR_arm_##N, 0, T##_QUALIFIERS},
@@ -705,6 +712,13 @@ enum arm_builtins
 
 #include "arm_acle_builtins.def"
 
+  ARM_BUILTIN_MVE_BASE,
+
+#undef VAR1
+#define VAR1(T, N, X) \
+  ARM_BUILTIN_MVE_##N##X,
+#include "arm_mve_builtins.def"
+
   ARM_BUILTIN_MAX
 };
 
@@ -714,6 +728,9 @@ enum arm_builtins
 #define ARM_BUILTIN_NEON_PATTERN_START \
   (ARM_BUILTIN_NEON_BASE + 1)
 
+#define ARM_BUILTIN_MVE_PATTERN_START \
+  (ARM_BUILTIN_MVE_BASE + 1)
+
 #define ARM_BUILTIN_ACLE_PATTERN_START \
   (ARM_BUILTIN_ACLE_BASE + 1)
 
@@ -1219,6 +1236,22 @@ arm_init_acle_builtins (void)
     }
 }
 
+/* Set up all the MVE builtins mentioned in arm_mve_builtins.def file.  */
+static void
+arm_init_mve_builtins (void)
+{
+  volatile unsigned int i, fcode = ARM_BUILTIN_MVE_PATTERN_START;
+
+  arm_init_simd_builtin_scalar_types ();
+  arm_init_simd_builtin_types ();
+
+  for (i = 0; i < ARRAY_SIZE (mve_builtin_data); i++, fcode++)
+    {
+      arm_builtin_datum *d = &mve_builtin_data[i];
+      arm_init_builtin (fcode, d, "__builtin_mve");
+    }
+}
+
 /* Set up all the NEON builtins, even builtins for instructions that are not
    in the current target ISA to allow the user to compile particular modules
    with different target specific options that differ from the command line
@@ -1961,8 +1994,10 @@ arm_init_builtins (void)
       = add_builtin_function ("__builtin_arm_lane_check", lane_check_fpr,
 			      ARM_BUILTIN_SIMD_LANE_CHECK, BUILT_IN_MD,
 			      NULL, NULL_TREE);
-
-      arm_init_neon_builtins ();
+      if (TARGET_HAVE_MVE)
+	arm_init_mve_builtins ();
+      else
+	arm_init_neon_builtins ();
       arm_init_vfp_builtins ();
       arm_init_crypto_builtins ();
     }
@@ -2492,10 +2527,14 @@ arm_expand_builtin_1 (int fcode, tree exp, rtx target,
   int is_void = 0;
   int k;
   bool neon = false;
+  bool mve = false;
 
   if (IN_RANGE (fcode, ARM_BUILTIN_VFP_BASE, ARM_BUILTIN_ACLE_BASE - 1))
     neon = true;
 
+  if (IN_RANGE (fcode, ARM_BUILTIN_MVE_BASE, ARM_BUILTIN_MAX - 1))
+    mve = true;
+
   is_void = !!(d->qualifiers[0] & qualifier_void);
 
   num_args += is_void;
@@ -2535,7 +2574,7 @@ arm_expand_builtin_1 (int fcode, tree exp, rtx target,
 	}
       else if (d->qualifiers[qualifiers_k] & qualifier_pointer)
 	{
-	  if (neon)
+	  if (neon || mve)
 	    args[k] = ARG_BUILTIN_NEON_MEMORY;
 	  else
 	    args[k] = ARG_BUILTIN_MEMORY;
@@ -2585,6 +2624,26 @@ arm_expand_acle_builtin (int fcode, tree exp, rtx target)
   return arm_expand_builtin_1 (fcode, exp, target, d);
 }
 
+/* Expand an MVE builtin, i.e. those registered only if their respective target
+   constraints are met.  This check happens within arm_expand_builtin.  */
+
+static rtx
+arm_expand_mve_builtin (int fcode, tree exp, rtx target)
+{
+  if (fcode >= ARM_BUILTIN_MVE_BASE && !TARGET_HAVE_MVE)
+  {
+    fatal_error (input_location,
+		"You must enable MVE instructions"
+		" to use these intrinsics");
+    return const0_rtx;
+  }
+
+  arm_builtin_datum *d
+    = &mve_builtin_data[fcode - ARM_BUILTIN_MVE_PATTERN_START];
+
+  return arm_expand_builtin_1 (fcode, exp, target, d);
+}
+
 /* Expand a Neon builtin, i.e. those registered only if TARGET_NEON holds.
    Most of these are "special" because they don't have symbolic
    constants defined per-instruction or per instruction-variant.  Instead, the
@@ -2678,6 +2737,8 @@ arm_expand_builtin (tree exp,
       /* Don't generate any RTL.  */
       return const0_rtx;
     }
+  if (fcode >= ARM_BUILTIN_MVE_BASE)
+    return arm_expand_mve_builtin (fcode, exp, target);
 
   if (fcode >= ARM_BUILTIN_ACLE_BASE)
     return arm_expand_acle_builtin (fcode, exp, target);
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 5ffb466596b5d8fc330616a6fcc7ee37d3e28def..39c6a1551a72700292dde8ef6cea44ba0907af8d 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -42,6 +42,13 @@ typedef __simd128_float16_t float16x8_t;
 typedef __simd128_float32_t float32x4_t;
 #endif
 
+#if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
+typedef struct { float16x8_t val[2]; } float16x8x2_t;
+typedef struct { float16x8_t val[4]; } float16x8x4_t;
+typedef struct { float32x4_t val[2]; } float32x4x2_t;
+typedef struct { float32x4_t val[4]; } float32x4x4_t;
+#endif
+
 typedef uint16_t mve_pred16_t;
 typedef __simd128_uint8_t uint8x16_t;
 typedef __simd128_uint16_t uint16x8_t;
@@ -52,6 +59,330 @@ typedef __simd128_int16_t int16x8_t;
 typedef __simd128_int32_t int32x4_t;
 typedef __simd128_int64_t int64x2_t;
 
+typedef struct { int16x8_t val[2]; } int16x8x2_t;
+typedef struct { int16x8_t val[4]; } int16x8x4_t;
+typedef struct { int32x4_t val[2]; } int32x4x2_t;
+typedef struct { int32x4_t val[4]; } int32x4x4_t;
+typedef struct { int8x16_t val[2]; } int8x16x2_t;
+typedef struct { int8x16_t val[4]; } int8x16x4_t;
+typedef struct { uint16x8_t val[2]; } uint16x8x2_t;
+typedef struct { uint16x8_t val[4]; } uint16x8x4_t;
+typedef struct { uint32x4_t val[2]; } uint32x4x2_t;
+typedef struct { uint32x4_t val[4]; } uint32x4x4_t;
+typedef struct { uint8x16_t val[2]; } uint8x16x2_t;
+typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
+
+#ifndef __ARM_MVE_PRESERVE_USER_NAMESPACE
+#define vst4q_s8( __addr, __value) __arm_vst4q_s8( __addr, __value)
+#define vst4q_s16( __addr, __value) __arm_vst4q_s16( __addr, __value)
+#define vst4q_s32( __addr, __value) __arm_vst4q_s32( __addr, __value)
+#define vst4q_u8( __addr, __value) __arm_vst4q_u8( __addr, __value)
+#define vst4q_u16( __addr, __value) __arm_vst4q_u16( __addr, __value)
+#define vst4q_u32( __addr, __value) __arm_vst4q_u32( __addr, __value)
+#define vst4q_f16( __addr, __value) __arm_vst4q_f16( __addr, __value)
+#define vst4q_f32( __addr, __value) __arm_vst4q_f32( __addr, __value)
+#endif
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst4q_s8 (int8_t * __addr, int8x16x4_t __value)
+{
+  union { int8x16x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__i = __value;
+  __builtin_mve_vst4qv16qi ((__builtin_neon_qi *) __addr, __rv.__o);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst4q_s16 (int16_t * __addr, int16x8x4_t __value)
+{
+  union { int16x8x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__i = __value;
+  __builtin_mve_vst4qv8hi ((__builtin_neon_hi *) __addr, __rv.__o);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst4q_s32 (int32_t * __addr, int32x4x4_t __value)
+{
+  union { int32x4x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__i = __value;
+  __builtin_mve_vst4qv4si ((__builtin_neon_si *) __addr, __rv.__o);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst4q_u8 (uint8_t * __addr, uint8x16x4_t __value)
+{
+  union { uint8x16x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__i = __value;
+  __builtin_mve_vst4qv16qi ((__builtin_neon_qi *) __addr, __rv.__o);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst4q_u16 (uint16_t * __addr, uint16x8x4_t __value)
+{
+  union { uint16x8x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__i = __value;
+  __builtin_mve_vst4qv8hi ((__builtin_neon_hi *) __addr, __rv.__o);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst4q_u32 (uint32_t * __addr, uint32x4x4_t __value)
+{
+  union { uint32x4x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__i = __value;
+  __builtin_mve_vst4qv4si ((__builtin_neon_si *) __addr, __rv.__o);
+}
+
+#if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst4q_f16 (float16_t * __addr, float16x8x4_t __value)
+{
+  union { float16x8x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__i = __value;
+  __builtin_mve_vst4qv8hf (__addr, __rv.__o);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst4q_f32 (float32_t * __addr, float32x4x4_t __value)
+{
+  union { float32x4x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__i = __value;
+  __builtin_mve_vst4qv4sf (__addr, __rv.__o);
+}
+
+#endif
+
+enum {
+    __ARM_mve_type_float16_t = 1,
+    __ARM_mve_type_float16_t_ptr,
+    __ARM_mve_type_float16_t_const_ptr,
+    __ARM_mve_type_float16x8_t,
+    __ARM_mve_type_float16x8x2_t,
+    __ARM_mve_type_float16x8x4_t,
+    __ARM_mve_type_float32_t,
+    __ARM_mve_type_float32_t_ptr,
+    __ARM_mve_type_float32_t_const_ptr,
+    __ARM_mve_type_float32x4_t,
+    __ARM_mve_type_float32x4x2_t,
+    __ARM_mve_type_float32x4x4_t,
+    __ARM_mve_type_int16_t,
+    __ARM_mve_type_int16_t_ptr,
+    __ARM_mve_type_int16_t_const_ptr,
+    __ARM_mve_type_int16x8_t,
+    __ARM_mve_type_int16x8x2_t,
+    __ARM_mve_type_int16x8x4_t,
+    __ARM_mve_type_int32_t,
+    __ARM_mve_type_int32_t_ptr,
+    __ARM_mve_type_int32_t_const_ptr,
+    __ARM_mve_type_int32x4_t,
+    __ARM_mve_type_int32x4x2_t,
+    __ARM_mve_type_int32x4x4_t,
+    __ARM_mve_type_int64_t,
+    __ARM_mve_type_int64_t_ptr,
+    __ARM_mve_type_int64_t_const_ptr,
+    __ARM_mve_type_int64x2_t,
+    __ARM_mve_type_int8_t,
+    __ARM_mve_type_int8_t_ptr,
+    __ARM_mve_type_int8_t_const_ptr,
+    __ARM_mve_type_int8x16_t,
+    __ARM_mve_type_int8x16x2_t,
+    __ARM_mve_type_int8x16x4_t,
+    __ARM_mve_type_uint16_t,
+    __ARM_mve_type_uint16_t_ptr,
+    __ARM_mve_type_uint16_t_const_ptr,
+    __ARM_mve_type_uint16x8_t,
+    __ARM_mve_type_uint16x8x2_t,
+    __ARM_mve_type_uint16x8x4_t,
+    __ARM_mve_type_uint32_t,
+    __ARM_mve_type_uint32_t_ptr,
+    __ARM_mve_type_uint32_t_const_ptr,
+    __ARM_mve_type_uint32x4_t,
+    __ARM_mve_type_uint32x4x2_t,
+    __ARM_mve_type_uint32x4x4_t,
+    __ARM_mve_type_uint64_t,
+    __ARM_mve_type_uint64_t_ptr,
+    __ARM_mve_type_uint64_t_const_ptr,
+    __ARM_mve_type_uint64x2_t,
+    __ARM_mve_type_uint8_t,
+    __ARM_mve_type_uint8_t_ptr,
+    __ARM_mve_type_uint8_t_const_ptr,
+    __ARM_mve_type_uint8x16_t,
+    __ARM_mve_type_uint8x16x2_t,
+    __ARM_mve_type_uint8x16x4_t,
+    __ARM_mve_unsupported_type
+};
+
+#if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
+#define __ARM_mve_typeid(x) _Generic(x, \
+    float16_t: __ARM_mve_type_float16_t, \
+    float16_t *: __ARM_mve_type_float16_t_ptr, \
+    float16_t const *: __ARM_mve_type_float16_t_const_ptr, \
+    float16x8_t: __ARM_mve_type_float16x8_t, \
+    float16x8x2_t: __ARM_mve_type_float16x8x2_t, \
+    float16x8x4_t: __ARM_mve_type_float16x8x4_t, \
+    float32_t: __ARM_mve_type_float32_t, \
+    float32_t *: __ARM_mve_type_float32_t_ptr, \
+    float32_t const *: __ARM_mve_type_float32_t_const_ptr, \
+    float32x4_t: __ARM_mve_type_float32x4_t, \
+    float32x4x2_t: __ARM_mve_type_float32x4x2_t, \
+    float32x4x4_t: __ARM_mve_type_float32x4x4_t, \
+    int16_t: __ARM_mve_type_int16_t, \
+    int16_t *: __ARM_mve_type_int16_t_ptr, \
+    int16_t const *: __ARM_mve_type_int16_t_const_ptr, \
+    int16x8_t: __ARM_mve_type_int16x8_t, \
+    int16x8x2_t: __ARM_mve_type_int16x8x2_t, \
+    int16x8x4_t: __ARM_mve_type_int16x8x4_t, \
+    int32_t: __ARM_mve_type_int32_t, \
+    int32_t *: __ARM_mve_type_int32_t_ptr, \
+    int32_t const *: __ARM_mve_type_int32_t_const_ptr, \
+    int32x4_t: __ARM_mve_type_int32x4_t, \
+    int32x4x2_t: __ARM_mve_type_int32x4x2_t, \
+    int32x4x4_t: __ARM_mve_type_int32x4x4_t, \
+    int64_t: __ARM_mve_type_int64_t, \
+    int64_t *: __ARM_mve_type_int64_t_ptr, \
+    int64_t const *: __ARM_mve_type_int64_t_const_ptr, \
+    int64x2_t: __ARM_mve_type_int64x2_t, \
+    int8_t: __ARM_mve_type_int8_t, \
+    int8_t *: __ARM_mve_type_int8_t_ptr, \
+    int8_t const *: __ARM_mve_type_int8_t_const_ptr, \
+    int8x16_t: __ARM_mve_type_int8x16_t, \
+    int8x16x2_t: __ARM_mve_type_int8x16x2_t, \
+    int8x16x4_t: __ARM_mve_type_int8x16x4_t, \
+    uint16_t: __ARM_mve_type_uint16_t, \
+    uint16_t *: __ARM_mve_type_uint16_t_ptr, \
+    uint16_t const *: __ARM_mve_type_uint16_t_const_ptr, \
+    uint16x8_t: __ARM_mve_type_uint16x8_t, \
+    uint16x8x2_t: __ARM_mve_type_uint16x8x2_t, \
+    uint16x8x4_t: __ARM_mve_type_uint16x8x4_t, \
+    uint32_t: __ARM_mve_type_uint32_t, \
+    uint32_t *: __ARM_mve_type_uint32_t_ptr, \
+    uint32_t const *: __ARM_mve_type_uint32_t_const_ptr, \
+    uint32x4_t: __ARM_mve_type_uint32x4_t, \
+    uint32x4x2_t: __ARM_mve_type_uint32x4x2_t, \
+    uint32x4x4_t: __ARM_mve_type_uint32x4x4_t, \
+    uint64_t: __ARM_mve_type_uint64_t, \
+    uint64_t *: __ARM_mve_type_uint64_t_ptr, \
+    uint64_t const *: __ARM_mve_type_uint64_t_const_ptr, \
+    uint64x2_t: __ARM_mve_type_uint64x2_t, \
+    uint8_t: __ARM_mve_type_uint8_t, \
+    uint8_t *: __ARM_mve_type_uint8_t_ptr, \
+    uint8_t const *: __ARM_mve_type_uint8_t_const_ptr, \
+    uint8x16_t: __ARM_mve_type_uint8x16_t, \
+    uint8x16x2_t: __ARM_mve_type_uint8x16x2_t, \
+    uint8x16x4_t: __ARM_mve_type_uint8x16x4_t, \
+    default: _Generic(x, \
+	signed char: __ARM_mve_type_int8_t, \
+	short: __ARM_mve_type_int16_t, \
+	int: __ARM_mve_type_int32_t, \
+	long: __ARM_mve_type_int32_t, \
+	long long: __ARM_mve_type_int64_t, \
+	unsigned char: __ARM_mve_type_uint8_t, \
+	unsigned short: __ARM_mve_type_uint16_t, \
+	unsigned int: __ARM_mve_type_uint32_t, \
+	unsigned long: __ARM_mve_type_uint32_t, \
+	unsigned long long: __ARM_mve_type_uint64_t, \
+	default: __ARM_mve_unsupported_type))
+#else
+#define __ARM_mve_typeid(x) _Generic(x, \
+    int16_t: __ARM_mve_type_int16_t, \
+    int16_t *: __ARM_mve_type_int16_t_ptr, \
+    int16_t const *: __ARM_mve_type_int16_t_const_ptr, \
+    int16x8_t: __ARM_mve_type_int16x8_t, \
+    int16x8x2_t: __ARM_mve_type_int16x8x2_t, \
+    int16x8x4_t: __ARM_mve_type_int16x8x4_t, \
+    int32_t: __ARM_mve_type_int32_t, \
+    int32_t *: __ARM_mve_type_int32_t_ptr, \
+    int32_t const *: __ARM_mve_type_int32_t_const_ptr, \
+    int32x4_t: __ARM_mve_type_int32x4_t, \
+    int32x4x2_t: __ARM_mve_type_int32x4x2_t, \
+    int32x4x4_t: __ARM_mve_type_int32x4x4_t, \
+    int64_t: __ARM_mve_type_int64_t, \
+    int64_t *: __ARM_mve_type_int64_t_ptr, \
+    int64_t const *: __ARM_mve_type_int64_t_const_ptr, \
+    int64x2_t: __ARM_mve_type_int64x2_t, \
+    int8_t: __ARM_mve_type_int8_t, \
+    int8_t *: __ARM_mve_type_int8_t_ptr, \
+    int8_t const *: __ARM_mve_type_int8_t_const_ptr, \
+    int8x16_t: __ARM_mve_type_int8x16_t, \
+    int8x16x2_t: __ARM_mve_type_int8x16x2_t, \
+    int8x16x4_t: __ARM_mve_type_int8x16x4_t, \
+    uint16_t: __ARM_mve_type_uint16_t, \
+    uint16_t *: __ARM_mve_type_uint16_t_ptr, \
+    uint16_t const *: __ARM_mve_type_uint16_t_const_ptr, \
+    uint16x8_t: __ARM_mve_type_uint16x8_t, \
+    uint16x8x2_t: __ARM_mve_type_uint16x8x2_t, \
+    uint16x8x4_t: __ARM_mve_type_uint16x8x4_t, \
+    uint32_t: __ARM_mve_type_uint32_t, \
+    uint32_t *: __ARM_mve_type_uint32_t_ptr, \
+    uint32_t const *: __ARM_mve_type_uint32_t_const_ptr, \
+    uint32x4_t: __ARM_mve_type_uint32x4_t, \
+    uint32x4x2_t: __ARM_mve_type_uint32x4x2_t, \
+    uint32x4x4_t: __ARM_mve_type_uint32x4x4_t, \
+    uint64_t: __ARM_mve_type_uint64_t, \
+    uint64_t *: __ARM_mve_type_uint64_t_ptr, \
+    uint64_t const *: __ARM_mve_type_uint64_t_const_ptr, \
+    uint64x2_t: __ARM_mve_type_uint64x2_t, \
+    uint8_t: __ARM_mve_type_uint8_t, \
+    uint8_t *: __ARM_mve_type_uint8_t_ptr, \
+    uint8_t const *: __ARM_mve_type_uint8_t_const_ptr, \
+    uint8x16_t: __ARM_mve_type_uint8x16_t, \
+    uint8x16x2_t: __ARM_mve_type_uint8x16x2_t, \
+    uint8x16x4_t: __ARM_mve_type_uint8x16x4_t, \
+    default: _Generic(x, \
+	signed char: __ARM_mve_type_int8_t, \
+	short: __ARM_mve_type_int16_t, \
+	int: __ARM_mve_type_int32_t, \
+	long: __ARM_mve_type_int32_t, \
+	long long: __ARM_mve_type_int64_t, \
+	unsigned char: __ARM_mve_type_uint8_t, \
+	unsigned short: __ARM_mve_type_uint16_t, \
+	unsigned int: __ARM_mve_type_uint32_t, \
+	unsigned long: __ARM_mve_type_uint32_t, \
+	unsigned long long: __ARM_mve_type_uint64_t, \
+	default: __ARM_mve_unsupported_type))
+#endif /* MVE Floating point.  */
+
+extern void *__ARM_undef;
+#define __ARM_mve_coerce(param, type) \
+    _Generic(param, type: param, default: *(type *)__ARM_undef)
+
+#if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
+
+#define vst4q(p0,p1) __arm_vst4q(p0,p1)
+#define __arm_vst4q(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16x4_t]: __arm_vst4q_s8 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, int8x16x4_t)), \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8x4_t]: __arm_vst4q_s16 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, int16x8x4_t)), \
+  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4x4_t]: __arm_vst4q_s32 (__ARM_mve_coerce(__p0, int32_t *), __ARM_mve_coerce(__p1, int32x4x4_t)), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16x4_t]: __arm_vst4q_u8 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16x4_t)), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8x4_t]: __arm_vst4q_u16 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8x4_t)), \
+  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4x4_t]: __arm_vst4q_u32 (__ARM_mve_coerce(__p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4x4_t)), \
+  int (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_float16x8x4_t]: __arm_vst4q_f16 (__ARM_mve_coerce(__p0, float16_t *), __ARM_mve_coerce(__p1, float16x8x4_t)), \
+  int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4x4_t]: __arm_vst4q_f32 (__ARM_mve_coerce(__p0, float32_t *), __ARM_mve_coerce(__p1, float32x4x4_t)));})
+
+#else /* MVE Interger.  */
+
+#define vst4q(p0,p1) __arm_vst4q(p0,p1)
+#define __arm_vst4q(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16x4_t]: __arm_vst4q_s8 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, int8x16x4_t)), \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8x4_t]: __arm_vst4q_s16 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, int16x8x4_t)), \
+  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4x4_t]: __arm_vst4q_s32 (__ARM_mve_coerce(__p0, int32_t *), __ARM_mve_coerce(__p1, int32x4x4_t)), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16x4_t]: __arm_vst4q_u8 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16x4_t)), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8x4_t]: __arm_vst4q_u16 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8x4_t)), \
+  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4x4_t]: __arm_vst4q_u32 (__ARM_mve_coerce(__p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4x4_t)));})
+
+#endif /* MVE Floating point.  */
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
new file mode 100644
index 0000000000000000000000000000000000000000..c7f3546c9cb0ec3ae794e181e533a1cbed38121c
--- /dev/null
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -0,0 +1,21 @@
+/*  MVE builtin definitions for Arm.
+    Copyright  (C) 2019 Free Software Foundation, Inc.
+    Contributed by Arm.
+
+    This file is part of GCC.
+
+    GCC is free software; you can redistribute it and/or modify it
+    under the terms of the GNU General Public License as published
+    by the Free Software Foundation; either version 3, or (at your
+    option) any later version.
+
+    GCC is distributed in the hope that it will be useful, but WITHOUT
+    ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+    or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+    License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with GCC; see the file COPYING3.  If not see
+    <http://www.gnu.org/licenses/>.  */
+
+VAR5 (STORE1, vst4q, v16qi, v8hi, v4si, v8hf, v4sf)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 53334c6d329dedd482615b996232e85ded7a34f8..a2f5220eccb6d888b9b0b1bbd1054cc3a6be97f5 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -17,9 +17,12 @@
 ;; along with GCC; see the file COPYING3.  If not see
 ;; <http://www.gnu.org/licenses/>.
 
-(define_mode_iterator MVE_types [V16QI V8HI V4SI V2DI TI V8HF V4SF V2DF])
 (define_mode_attr V_sz_elem2 [(V16QI "s8") (V8HI "u16") (V4SI "u32")
 			      (V2DI "u64")])
+(define_mode_iterator MVE_types [V16QI V8HI V4SI V2DI TI V8HF V4SF V2DF])
+(define_mode_iterator MVE_VLD_ST [V16QI V8HI V4SI V8HF V4SF])
+
+(define_c_enum "unspec" [VST4Q])
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
@@ -76,3 +79,37 @@
   "TARGET_HAVE_MVE"
   "vstrb.<V_sz_elem> %q1, %E0"
   [(set_attr "type" "mve_store")])
+
+;;
+;; [vst4q])
+;;
+(define_insn "mve_vst4q<mode>"
+  [(set (match_operand:XI 0 "neon_struct_operand" "=Um")
+	(unspec:XI [(match_operand:XI 1 "s_register_operand" "w")
+		    (unspec:MVE_VLD_ST [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+	 VST4Q))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[6];
+   int regno = REGNO (operands[1]);
+   ops[0] = gen_rtx_REG (TImode, regno);
+   ops[1] = gen_rtx_REG (TImode, regno+4);
+   ops[2] = gen_rtx_REG (TImode, regno+8);
+   ops[3] = gen_rtx_REG (TImode, regno+12);
+   rtx reg  = operands[0];
+   while (reg && !REG_P (reg))
+    reg = XEXP (reg, 0);
+   gcc_assert (REG_P (reg));
+   ops[4] = reg;
+   ops[5] = operands[0];
+   /* Here in first three instructions data is stored to ops[4]'s location but
+      in the fourth instruction data is stored to operands[0], this is to
+      support the writeback.  */
+   output_asm_insn ("vst40.<V_sz_elem>\t{%q0, %q1, %q2, %q3}, [%4]\n\t"
+		    "vst41.<V_sz_elem>\t{%q0, %q1, %q2, %q3}, [%4]\n\t"
+		    "vst42.<V_sz_elem>\t{%q0, %q1, %q2, %q3}, [%4]\n\t"
+		    "vst43.<V_sz_elem>\t{%q0, %q1, %q2, %q3}, %5", ops);
+   return "";
+}
+  [(set_attr "length" "16")])
diff --git a/gcc/config/arm/t-arm b/gcc/config/arm/t-arm
index fda5e84355b56a20eb9a22919ab1c786120cc8f1..fbd99197b31ee4b69b7a06592123b6facd04f9af 100644
--- a/gcc/config/arm/t-arm
+++ b/gcc/config/arm/t-arm
@@ -135,7 +135,8 @@ arm.o: $(srcdir)/config/arm/arm.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
   arm-cpu-data.h \
   $(srcdir)/config/arm/arm-protos.h \
   $(srcdir)/config/arm/arm_neon_builtins.def \
-  $(srcdir)/config/arm/arm_vfp_builtins.def
+  $(srcdir)/config/arm/arm_vfp_builtins.def \
+  $(srcdir)/config/arm/arm_mve_builtins.def
 
 arm-builtins.o: $(srcdir)/config/arm/arm-builtins.c $(CONFIG_H) \
   $(SYSTEM_H) coretypes.h $(TM_H) \
@@ -145,6 +146,7 @@ arm-builtins.o: $(srcdir)/config/arm/arm-builtins.c $(CONFIG_H) \
   $(srcdir)/config/arm/arm_acle_builtins.def \
   $(srcdir)/config/arm/arm_neon_builtins.def \
   $(srcdir)/config/arm/arm_vfp_builtins.def \
+  $(srcdir)/config/arm/arm_mve_builtins.def \
   $(srcdir)/config/arm/arm-simd-builtin-types.def
 	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
 		$(srcdir)/config/arm/arm-builtins.c
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..6f8ea17345b85983202613b0f481b23e0532fd16
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_f16.c
@@ -0,0 +1,37 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (float16_t * addr, float16x8x4_t value)
+{
+  vst4q_f16 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.16"  }  } */
+/* { dg-final { scan-assembler "vst41.16"  }  } */
+/* { dg-final { scan-assembler "vst42.16"  }  } */
+/* { dg-final { scan-assembler "vst43.16"  }  } */
+
+void
+foo1 (float16_t * addr, float16x8x4_t value)
+{
+  vst4q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.16"  }  } */
+/* { dg-final { scan-assembler "vst41.16"  }  } */
+/* { dg-final { scan-assembler "vst42.16"  }  } */
+/* { dg-final { scan-assembler "vst43.16"  }  } */
+
+void
+foo2 (float16_t * addr, float16x8x4_t value)
+{
+  vst4q_f16 (addr, value);
+  addr += 32;
+  vst4q_f16 (addr, value);
+}
+
+/* { dg-final { scan-assembler {vst43.16\s\{.*\}, \[.*\]!}  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..03b1dbb5b13e2c41013c772d944f29da8c2468f7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_f32.c
@@ -0,0 +1,37 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (float32_t * addr, float32x4x4_t value)
+{
+  vst4q_f32 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.32"  }  } */
+/* { dg-final { scan-assembler "vst41.32"  }  } */
+/* { dg-final { scan-assembler "vst42.32"  }  } */
+/* { dg-final { scan-assembler "vst43.32"  }  } */
+
+void
+foo1 (float32_t * addr, float32x4x4_t value)
+{
+  vst4q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.32"  }  } */
+/* { dg-final { scan-assembler "vst41.32"  }  } */
+/* { dg-final { scan-assembler "vst42.32"  }  } */
+/* { dg-final { scan-assembler "vst43.32"  }  } */
+
+void
+foo2 (float32_t * addr, float32x4x4_t value)
+{
+  vst4q_f32 (addr, value);
+  addr += 16;
+  vst4q_f32 (addr, value);
+}
+
+/* { dg-final { scan-assembler {vst43.32\s\{.*\}, \[.*\]!}  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..05f53fc9175b41e4d16c7a6bc525b57b626a632d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_s16.c
@@ -0,0 +1,37 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int16_t * addr, int16x8x4_t value)
+{
+  vst4q_s16 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.16"  }  } */
+/* { dg-final { scan-assembler "vst41.16"  }  } */
+/* { dg-final { scan-assembler "vst42.16"  }  } */
+/* { dg-final { scan-assembler "vst43.16"  }  } */
+
+void
+foo1 (int16_t * addr, int16x8x4_t value)
+{
+  vst4q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.16"  }  } */
+/* { dg-final { scan-assembler "vst41.16"  }  } */
+/* { dg-final { scan-assembler "vst42.16"  }  } */
+/* { dg-final { scan-assembler "vst43.16"  }  } */
+
+void
+foo2 (int16_t * addr, int16x8x4_t value)
+{
+  vst4q_s16 (addr, value);
+  addr += 32;
+  vst4q_s16 (addr, value);
+}
+
+/* { dg-final { scan-assembler {vst43.16\s\{.*\}, \[.*\]!}  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..9da2addb8c3d9566722b640f66bbee4f2b4b563d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_s32.c
@@ -0,0 +1,37 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int32_t * addr, int32x4x4_t value)
+{
+  vst4q_s32 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.32"  }  } */
+/* { dg-final { scan-assembler "vst41.32"  }  } */
+/* { dg-final { scan-assembler "vst42.32"  }  } */
+/* { dg-final { scan-assembler "vst43.32"  }  } */
+
+void
+foo1 (int32_t * addr, int32x4x4_t value)
+{
+  vst4q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.32"  }  } */
+/* { dg-final { scan-assembler "vst41.32"  }  } */
+/* { dg-final { scan-assembler "vst42.32"  }  } */
+/* { dg-final { scan-assembler "vst43.32"  }  } */
+
+void
+foo2 (int32_t * addr, int32x4x4_t value)
+{
+  vst4q_s32 (addr, value);
+  addr += 16;
+  vst4q_s32 (addr, value); 
+}
+
+/* { dg-final { scan-assembler {vst43.32\s\{.*\}, \[.*\]!}  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..1243d135e1628de14596a58f24a79f45299ce517
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_s8.c
@@ -0,0 +1,37 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int8_t * addr, int8x16x4_t value)
+{
+  vst4q_s8 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.8"  }  } */
+/* { dg-final { scan-assembler "vst41.8"  }  } */
+/* { dg-final { scan-assembler "vst42.8"  }  } */
+/* { dg-final { scan-assembler "vst43.8"  }  } */
+
+void
+foo1 (int8_t * addr, int8x16x4_t value)
+{
+  vst4q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.8"  }  } */
+/* { dg-final { scan-assembler "vst41.8"  }  } */
+/* { dg-final { scan-assembler "vst42.8"  }  } */
+/* { dg-final { scan-assembler "vst43.8"  }  } */
+
+void
+foo2 (int8_t * addr, int8x16x4_t value)
+{
+  vst4q_s8 (addr, value);
+  addr += 16*4;
+  vst4q_s8 (addr, value);
+}
+
+/* { dg-final { scan-assembler {vst43.8\s\{.*\}, \[.*\]!}  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..09097e7914ed08dda5e7616bafe35cdb30486f21
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_u16.c
@@ -0,0 +1,37 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint16_t * addr, uint16x8x4_t value)
+{
+  vst4q_u16 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.16"  }  } */
+/* { dg-final { scan-assembler "vst41.16"  }  } */
+/* { dg-final { scan-assembler "vst42.16"  }  } */
+/* { dg-final { scan-assembler "vst43.16"  }  } */
+
+void
+foo1 (uint16_t * addr, uint16x8x4_t value)
+{
+  vst4q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.16"  }  } */
+/* { dg-final { scan-assembler "vst41.16"  }  } */
+/* { dg-final { scan-assembler "vst42.16"  }  } */
+/* { dg-final { scan-assembler "vst43.16"  }  } */
+
+void
+foo2 (uint16_t * addr, uint16x8x4_t value)
+{
+  vst4q_u16 (addr, value);
+  addr += 32;
+  vst4q_u16 (addr, value);
+}
+
+/* { dg-final { scan-assembler {vst43.16\s\{.*\}, \[.*\]!}  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..dcb0a10f9f21e4adb70d18b590fcc7b663547f16
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_u32.c
@@ -0,0 +1,37 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint32_t * addr, uint32x4x4_t value)
+{
+  vst4q_u32 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.32"  }  } */
+/* { dg-final { scan-assembler "vst41.32"  }  } */
+/* { dg-final { scan-assembler "vst42.32"  }  } */
+/* { dg-final { scan-assembler "vst43.32"  }  } */
+
+void
+foo1 (uint32_t * addr, uint32x4x4_t value)
+{
+  vst4q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.32"  }  } */
+/* { dg-final { scan-assembler "vst41.32"  }  } */
+/* { dg-final { scan-assembler "vst42.32"  }  } */
+/* { dg-final { scan-assembler "vst43.32"  }  } */
+
+void
+foo2 (uint32_t * addr, uint32x4x4_t value)
+{
+  vst4q_u32 (addr, value);
+  addr += 16;
+  vst4q_u32 (addr, value); 
+}
+
+/* { dg-final { scan-assembler {vst43.32\s\{.*\}, \[.*\]!}  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..b18f57dd6356c2fdf39cbf14ca85373c98f1c62c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_u8.c
@@ -0,0 +1,37 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint8_t * addr, uint8x16x4_t value)
+{
+  vst4q_u8 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.8"  }  } */
+/* { dg-final { scan-assembler "vst41.8"  }  } */
+/* { dg-final { scan-assembler "vst42.8"  }  } */
+/* { dg-final { scan-assembler "vst43.8"  }  } */
+
+void
+foo1 (uint8_t * addr, uint8x16x4_t value)
+{
+  vst4q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst40.8"  }  } */
+/* { dg-final { scan-assembler "vst41.8"  }  } */
+/* { dg-final { scan-assembler "vst42.8"  }  } */
+/* { dg-final { scan-assembler "vst43.8"  }  } */
+
+void
+foo2 (uint8_t * addr, uint8x16x4_t value)
+{
+  vst4q_u8 (addr, value);
+  addr += 16*4;
+  vst4q_u8 (addr, value);
+}
+
+/* { dg-final { scan-assembler {vst43.8\s\{.*\}, \[.*\]!}  }  } */


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][3/2x]: MVE intrinsics with binary operands.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (8 preceding siblings ...)
  2019-11-14 19:13 ` [PATCH][ARM][GCC][3/4x]: MVE intrinsics with quaternary operands Srinath Parvathaneni
@ 2019-11-14 19:13 ` Srinath Parvathaneni
  2019-11-14 19:14 ` [PATCH][ARM][GCC][1/2x]: " Srinath Parvathaneni
                   ` (28 subsequent siblings)
  38 siblings, 0 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 28396 bytes --]

Hello,

This patch supports following MVE ACLE intrinsics with binary operands.

vaddlvq_p_s32, vaddlvq_p_u32, vcmpneq_s8, vcmpneq_s16, vcmpneq_s32, vcmpneq_u8,
vcmpneq_u16, vcmpneq_u32, vshlq_s8, vshlq_s16, vshlq_s32, vshlq_u8, vshlq_u16, vshlq_u32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-10-21  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm-builtins.c (BINOP_NONE_NONE_UNONE_QUALIFIERS): Define
	qualifier for binary operands.
	(BINOP_UNONE_NONE_NONE_QUALIFIERS): Likewise.
	(BINOP_UNONE_UNONE_NONE_QUALIFIERS): Likewise.
	* config/arm/arm_mve.h (vaddlvq_p_s32): Define macro.
	(vaddlvq_p_u32): Likewise.
	(vcmpneq_s8): Likewise.
	(vcmpneq_s16): Likewise.
	(vcmpneq_s32): Likewise.
	(vcmpneq_u8): Likewise.
	(vcmpneq_u16): Likewise.
	(vcmpneq_u32): Likewise.
	(vshlq_s8): Likewise.
	(vshlq_s16): Likewise.
	(vshlq_s32): Likewise.
	(vshlq_u8): Likewise.
	(vshlq_u16): Likewise.
	(vshlq_u32): Likewise.
	(__arm_vaddlvq_p_s32): Define intrinsic.
	(__arm_vaddlvq_p_u32): Likewise.
	(__arm_vcmpneq_s8): Likewise.
	(__arm_vcmpneq_s16): Likewise.
	(__arm_vcmpneq_s32): Likewise.
	(__arm_vcmpneq_u8): Likewise.
	(__arm_vcmpneq_u16): Likewise.
	(__arm_vcmpneq_u32): Likewise.
	(__arm_vshlq_s8): Likewise.
	(__arm_vshlq_s16): Likewise.
	(__arm_vshlq_s32): Likewise.
	(__arm_vshlq_u8): Likewise.
	(__arm_vshlq_u16): Likewise.
	(__arm_vshlq_u32): Likewise.
	(vaddlvq_p): Define polymorphic variant.
	(vcmpneq): Likewise.
	(vshlq): Likewise.
	* config/arm/arm_mve_builtins.def (BINOP_NONE_NONE_UNONE_QUALIFIERS):
	Use it.
	(BINOP_UNONE_NONE_NONE_QUALIFIERS): Likewise.
	(BINOP_UNONE_UNONE_NONE_QUALIFIERS): Likewise.
	* config/arm/mve.md (mve_vaddlvq_p_<supf>v4si): Define RTL pattern.
	(mve_vcmpneq_<supf><mode>): Likewise.
	(mve_vshlq_<supf><mode>): Likewise.

gcc/testsuite/ChangeLog:

2019-10-21  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vaddlvq_p_s32.c: New test.
	* gcc.target/arm/mve/intrinsics/vaddlvq_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_u8.c: Likewise.


###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index ec56dbcdd66662bb1de696142186955d75570772..73a0a3070bda2e35f3e400994dbb6ccfb3043f76 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -391,6 +391,24 @@ arm_binop_unone_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define BINOP_UNONE_NONE_IMM_QUALIFIERS \
   (arm_binop_unone_none_imm_qualifiers)
 
+static enum arm_type_qualifiers
+arm_binop_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_unsigned };
+#define BINOP_NONE_NONE_UNONE_QUALIFIERS \
+  (arm_binop_none_none_unone_qualifiers)
+
+static enum arm_type_qualifiers
+arm_binop_unone_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_none, qualifier_none };
+#define BINOP_UNONE_NONE_NONE_QUALIFIERS \
+  (arm_binop_unone_none_none_qualifiers)
+
+static enum arm_type_qualifiers
+arm_binop_unone_unone_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_none };
+#define BINOP_UNONE_UNONE_NONE_QUALIFIERS \
+  (arm_binop_unone_unone_none_qualifiers)
+
 /* End of Qualifier for MVE builtins.  */
 
    /* void ([T element type] *, T, immediate).  */
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 0318a2cd8c3cdea430a32506e75a5aceb4d3a475..0140b0cb38ccee5f9caa7570627bdc010e158e71 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -225,6 +225,20 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vshrq_n_u8(__a,  __imm) __arm_vshrq_n_u8(__a,  __imm)
 #define vshrq_n_u16(__a,  __imm) __arm_vshrq_n_u16(__a,  __imm)
 #define vshrq_n_u32(__a,  __imm) __arm_vshrq_n_u32(__a,  __imm)
+#define vaddlvq_p_s32(__a, __p) __arm_vaddlvq_p_s32(__a, __p)
+#define vaddlvq_p_u32(__a, __p) __arm_vaddlvq_p_u32(__a, __p)
+#define vcmpneq_s8(__a, __b) __arm_vcmpneq_s8(__a, __b)
+#define vcmpneq_s16(__a, __b) __arm_vcmpneq_s16(__a, __b)
+#define vcmpneq_s32(__a, __b) __arm_vcmpneq_s32(__a, __b)
+#define vcmpneq_u8(__a, __b) __arm_vcmpneq_u8(__a, __b)
+#define vcmpneq_u16(__a, __b) __arm_vcmpneq_u16(__a, __b)
+#define vcmpneq_u32(__a, __b) __arm_vcmpneq_u32(__a, __b)
+#define vshlq_s8(__a, __b) __arm_vshlq_s8(__a, __b)
+#define vshlq_s16(__a, __b) __arm_vshlq_s16(__a, __b)
+#define vshlq_s32(__a, __b) __arm_vshlq_s32(__a, __b)
+#define vshlq_u8(__a, __b) __arm_vshlq_u8(__a, __b)
+#define vshlq_u16(__a, __b) __arm_vshlq_u16(__a, __b)
+#define vshlq_u32(__a, __b) __arm_vshlq_u32(__a, __b)
 #endif
 
 __extension__ extern __inline void
@@ -868,6 +882,103 @@ __arm_vshrq_n_u32 (uint32x4_t __a, const int __imm)
 {
   return __builtin_mve_vshrq_n_uv4si (__a, __imm);
 }
+__extension__ extern __inline int64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vaddlvq_p_s32 (int32x4_t __a, mve_pred16_t __p)
+{
+  return __builtin_mve_vaddlvq_p_sv4si (__a, __p);
+}
+
+__extension__ extern __inline uint64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vaddlvq_p_u32 (uint32x4_t __a, mve_pred16_t __p)
+{
+  return __builtin_mve_vaddlvq_p_uv4si (__a, __p);
+}
+
+__extension__ extern __inline int32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcmpneq_s8 (int8x16_t __a, int8x16_t __b)
+{
+  return __builtin_mve_vcmpneq_sv16qi (__a, __b);
+}
+
+__extension__ extern __inline mve_pred16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcmpneq_s16 (int16x8_t __a, int16x8_t __b)
+{
+  return __builtin_mve_vcmpneq_sv8hi (__a, __b);
+}
+
+__extension__ extern __inline mve_pred16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcmpneq_s32 (int32x4_t __a, int32x4_t __b)
+{
+  return __builtin_mve_vcmpneq_sv4si (__a, __b);
+}
+
+__extension__ extern __inline mve_pred16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcmpneq_u8 (uint8x16_t __a, uint8x16_t __b)
+{
+  return __builtin_mve_vcmpneq_uv16qi (__a, __b);
+}
+
+__extension__ extern __inline mve_pred16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcmpneq_u16 (uint16x8_t __a, uint16x8_t __b)
+{
+  return __builtin_mve_vcmpneq_uv8hi (__a, __b);
+}
+
+__extension__ extern __inline mve_pred16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcmpneq_u32 (uint32x4_t __a, uint32x4_t __b)
+{
+  return __builtin_mve_vcmpneq_uv4si (__a, __b);
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlq_s8 (int8x16_t __a, int8x16_t __b)
+{
+  return __builtin_mve_vshlq_sv16qi (__a, __b);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlq_s16 (int16x8_t __a, int16x8_t __b)
+{
+  return __builtin_mve_vshlq_sv8hi (__a, __b);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlq_s32 (int32x4_t __a, int32x4_t __b)
+{
+  return __builtin_mve_vshlq_sv4si (__a, __b);
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlq_u8 (uint8x16_t __a, int8x16_t __b)
+{
+  return __builtin_mve_vshlq_uv16qi (__a, __b);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlq_u16 (uint16x8_t __a, int16x8_t __b)
+{
+  return __builtin_mve_vshlq_uv8hi (__a, __b);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlq_u32 (uint32x4_t __a, int32x4_t __b)
+{
+  return __builtin_mve_vshlq_uv4si (__a, __b);
+}
 
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
@@ -1752,6 +1863,34 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t]: __arm_vshrq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), p1), \
   int (*)[__ARM_mve_type_uint32x4_t]: __arm_vshrq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), p1));})
 
+#define vaddlvq_p(p0,p1) __arm_vaddlvq_p(p0,p1)
+#define __arm_vaddlvq_p(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vaddlvq_p_s32 (__ARM_mve_coerce(__p0, int32x4_t), p1), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vaddlvq_p_u32 (__ARM_mve_coerce(__p0, uint32x4_t), p1));})
+
+#define vcmpneq(p0,p1) __arm_vcmpneq(p0,p1)
+#define __arm_vcmpneq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcmpneq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcmpneq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpneq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vcmpneq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcmpneq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcmpneq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
+
+#define vshlq(p0,p1) __arm_vshlq(p0,p1)
+#define __arm_vshlq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vshlq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vshlq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vshlq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int8x16_t]: __arm_vshlq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vshlq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vshlq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, int32x4_t)));})
+
 #endif /* MVE Floating point.  */
 
 #ifdef __cplusplus
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index eae37ba831361401241ed91950169f06dee29c3e..30342bae823aadeae6d27b9b1317bac80c9e5039 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -6,7 +6,7 @@
 
     GCC is free software; you can redistribute it and/or modify it
     under the terms of the GNU General Public License as published
-    by the Free Software Foundation; either version 3, or   (at your
+    by the Free Software Foundation; either version 3, or (at your
     option) any later version.
 
     GCC is distributed in the hope that it will be useful, but WITHOUT
@@ -87,3 +87,9 @@ VAR4 (BINOP_UNONE_UNONE_UNONE, vcreateq_u, v16qi, v8hi, v4si, v2di)
 VAR4 (BINOP_NONE_UNONE_UNONE, vcreateq_s, v16qi, v8hi, v4si, v2di)
 VAR3 (BINOP_UNONE_UNONE_IMM, vshrq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_IMM, vshrq_n_s, v16qi, v8hi, v4si)
+VAR1 (BINOP_NONE_NONE_UNONE, vaddlvq_p_s, v4si)
+VAR1 (BINOP_UNONE_UNONE_UNONE, vaddlvq_p_u, v4si)
+VAR3 (BINOP_UNONE_NONE_NONE, vcmpneq_s, v16qi, v8hi, v4si)
+VAR3 (BINOP_UNONE_UNONE_UNONE, vcmpneq_u, v16qi, v8hi, v4si)
+VAR3 (BINOP_NONE_NONE_NONE, vshlq_s, v16qi, v8hi, v4si)
+VAR3 (BINOP_UNONE_UNONE_NONE, vshlq_u, v16qi, v8hi, v4si)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 4ae7646cf20683ec147af11f398d7460bd65f7ba..e440671ba0862b51d85d1bfa9114fb585b6b39cd 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -40,7 +40,8 @@
 			 VADDLVQ_U VCTP8Q VCTP16Q VCTP32Q VCTP64Q VPNOT
 			 VCREATEQ_F VCVTQ_N_TO_F_S VCVTQ_N_TO_F_U VBRSRQ_N_F
 			 VSUBQ_N_F VCREATEQ_U VCREATEQ_S VSHRQ_N_S VSHRQ_N_U
-			 VCVTQ_N_FROM_F_S VCVTQ_N_FROM_F_U])
+			 VCVTQ_N_FROM_F_S VCVTQ_N_FROM_F_U VADDLVQ_P_S
+			 VADDLVQ_P_U VCMPNEQ_U VCMPNEQ_S VSHLQ_S VSHLQ_U])
 
 (define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF")
 			    (V8HF "V8HI") (V4SF "V4SI")])
@@ -59,8 +60,9 @@
 		       (VREV32Q_S "s") (VADDLVQ_U "u") (VADDLVQ_S "s")
 		       (VCVTQ_N_TO_F_S "s") (VCVTQ_N_TO_F_U "u")
 		       (VCREATEQ_U "u") (VCREATEQ_S "s") (VSHRQ_N_S "s")
-		       (VSHRQ_N_U "u") (VCVTQ_N_FROM_F_S "s")
-		       (VCVTQ_N_FROM_F_U "u")])
+		       (VSHRQ_N_U "u") (VCVTQ_N_FROM_F_S "s") (VSHLQ_U "u")
+		       (VCVTQ_N_FROM_F_U "u") (VADDLVQ_P_S "s") (VSHLQ_S "s")
+		       (VADDLVQ_P_U "u") (VCMPNEQ_U "u") (VCMPNEQ_S "s")])
 
 (define_int_attr mode1 [(VCTP8Q "8") (VCTP16Q "16") (VCTP32Q "32")
 			(VCTP64Q "64")])
@@ -90,6 +92,9 @@
 (define_int_iterator VCREATEQ [VCREATEQ_U VCREATEQ_S])
 (define_int_iterator VSHRQ_N [VSHRQ_N_S VSHRQ_N_U])
 (define_int_iterator VCVTQ_N_FROM_F [VCVTQ_N_FROM_F_S VCVTQ_N_FROM_F_U])
+(define_int_iterator VADDLVQ_P [VADDLVQ_P_S VADDLVQ_P_U])
+(define_int_iterator VCMPNEQ [VCMPNEQ_U VCMPNEQ_S])
+(define_int_iterator VSHLQ [VSHLQ_S VSHLQ_U])
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
@@ -799,3 +804,48 @@
   "vcvt.<supf>%#<V_sz_elem>.f%#<V_sz_elem>\t%q0, %q1, %2"
   [(set_attr "type" "mve_move")
 ])
+
+;;
+;; [vaddlvq_p_s])
+;;
+(define_insn "mve_vaddlvq_p_<supf>v4si"
+  [
+   (set (match_operand:DI 0 "s_register_operand" "=r")
+	(unspec:DI [(match_operand:V4SI 1 "s_register_operand" "w")
+		    (match_operand:HI 2 "vpr_register_operand" "Up")]
+	 VADDLVQ_P))
+  ]
+  "TARGET_HAVE_MVE"
+  "vpst\;vaddlvt.<supf>32 %Q0, %R0, %q1"
+  [(set_attr "type" "mve_move")
+   (set_attr "length""8")])
+
+;;
+;; [vcmpneq_u, vcmpneq_s])
+;;
+(define_insn "mve_vcmpneq_<supf><mode>"
+  [
+   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
+	(unspec:HI [(match_operand:MVE_2 1 "s_register_operand" "w")
+		    (match_operand:MVE_2 2 "s_register_operand" "w")]
+	 VCMPNEQ))
+  ]
+  "TARGET_HAVE_MVE"
+  "vcmp.i%#<V_sz_elem>  ne, %q1, %q2"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vshlq_s, vshlq_u])
+;;
+(define_insn "mve_vshlq_<supf><mode>"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
+		       (match_operand:MVE_2 2 "s_register_operand" "w")]
+	 VSHLQ))
+  ]
+  "TARGET_HAVE_MVE"
+  "vshl.<supf>%#<V_sz_elem>\t%q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+])
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_p_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..55fdbbf764527ff9a7dc26f2843820f83f7dc9b0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_p_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int64_t
+foo (int32x4_t a, mve_pred16_t p)
+{
+  return vaddlvq_p_s32 (a, p);
+}
+
+/* { dg-final { scan-assembler "vaddlvt.s32"  }  } */
+
+int64_t
+foo1 (int32x4_t a, mve_pred16_t p)
+{
+  return vaddlvq_p (a, p);
+}
+
+/* { dg-final { scan-assembler "vaddlvt.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_p_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_p_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..7b88a2c46366db967e05a27a1ba6636dd0be76d5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_p_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint64_t
+foo (uint32x4_t a, mve_pred16_t p)
+{
+  return vaddlvq_p_u32 (a, p);
+}
+
+/* { dg-final { scan-assembler "vaddlvt.u32"  }  } */
+
+uint64_t
+foo1 (uint32x4_t a, mve_pred16_t p)
+{
+  return vaddlvq_p (a, p);
+}
+
+/* { dg-final { scan-assembler "vaddlvt.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..eef205f20291a62b67927ad15780d93c6a9e61d0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+mve_pred16_t
+foo (int16x8_t a, int16x8_t b)
+{
+  return vcmpneq_s16 (a, b);
+}
+
+/* { dg-final { scan-assembler "vcmp.i16"  }  } */
+
+mve_pred16_t
+foo1 (int16x8_t a, int16x8_t b)
+{
+  return vcmpneq (a, b);
+}
+
+/* { dg-final { scan-assembler "vcmp.i16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..64eb0474610b16a52b7466bc5368acf0f7e83fff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+mve_pred16_t
+foo (int32x4_t a, int32x4_t b)
+{
+  return vcmpneq_s32 (a, b);
+}
+
+/* { dg-final { scan-assembler "vcmp.i32"  }  } */
+
+mve_pred16_t
+foo1 (int32x4_t a, int32x4_t b)
+{
+  return vcmpneq (a, b);
+}
+
+/* { dg-final { scan-assembler "vcmp.i32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..07b04c12170ef7217aa9aa9f6171425f209f38bf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+mve_pred16_t
+foo (int8x16_t a, int8x16_t b)
+{
+  return vcmpneq_s8 (a, b);
+}
+
+/* { dg-final { scan-assembler "vcmp.i8"  }  } */
+
+mve_pred16_t
+foo1 (int8x16_t a, int8x16_t b)
+{
+  return vcmpneq (a, b);
+}
+
+/* { dg-final { scan-assembler "vcmp.i8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..5bf35fd79e2d7f393abea01788fc3fc106a0bf00
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+mve_pred16_t
+foo (uint16x8_t a, uint16x8_t b)
+{
+  return vcmpneq_u16 (a, b);
+}
+
+/* { dg-final { scan-assembler "vcmp.i16"  }  } */
+
+mve_pred16_t
+foo1 (uint16x8_t a, uint16x8_t b)
+{
+  return vcmpneq (a, b);
+}
+
+/* { dg-final { scan-assembler "vcmp.i16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..c80918547492b0854f1475b3b4d81f847324f898
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+mve_pred16_t
+foo (uint32x4_t a, uint32x4_t b)
+{
+  return vcmpneq_u32 (a, b);
+}
+
+/* { dg-final { scan-assembler "vcmp.i32"  }  } */
+
+mve_pred16_t
+foo1 (uint32x4_t a, uint32x4_t b)
+{
+  return vcmpneq (a, b);
+}
+
+/* { dg-final { scan-assembler "vcmp.i32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..c71cc592923502751786c7bcc800a941e4c185d3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+mve_pred16_t
+foo (uint8x16_t a, uint8x16_t b)
+{
+  return vcmpneq_u8 (a, b);
+}
+
+/* { dg-final { scan-assembler "vcmp.i8"  }  } */
+
+mve_pred16_t
+foo1 (uint8x16_t a, uint8x16_t b)
+{
+  return vcmpneq (a, b);
+}
+
+/* { dg-final { scan-assembler "vcmp.i8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..88575fdef3a794358e8cba40bf2df3a4273b8628
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int16x8_t a, int16x8_t b)
+{
+  return vshlq_s16 (a, b);
+}
+
+/* { dg-final { scan-assembler "vshl.s16"  }  } */
+
+int16x8_t
+foo1 (int16x8_t a, int16x8_t b)
+{
+  return vshlq (a, b);
+}
+
+/* { dg-final { scan-assembler "vshl.s16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..7603b6c99c8f7804ada72d830cba9a26aeab5c83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t a, int32x4_t b)
+{
+  return vshlq_s32 (a, b);
+}
+
+/* { dg-final { scan-assembler "vshl.s32"  }  } */
+
+int32x4_t
+foo1 (int32x4_t a, int32x4_t b)
+{
+  return vshlq (a, b);
+}
+
+/* { dg-final { scan-assembler "vshl.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..6f86f84f77b17b444fdde91e02c99f16cdb7bff6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo (int8x16_t a, int8x16_t b)
+{
+  return vshlq_s8 (a, b);
+}
+
+/* { dg-final { scan-assembler "vshl.s8"  }  } */
+
+int8x16_t
+foo1 (int8x16_t a, int8x16_t b)
+{
+  return vshlq (a, b);
+}
+
+/* { dg-final { scan-assembler "vshl.s8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..9b216630310187080c072d6c465fb60fea9b4f5a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint16x8_t a, int16x8_t b)
+{
+  return vshlq_u16 (a, b);
+}
+
+/* { dg-final { scan-assembler "vshl.u16"  }  } */
+
+uint16x8_t
+foo1 (uint16x8_t a, int16x8_t b)
+{
+  return vshlq (a, b);
+}
+
+/* { dg-final { scan-assembler "vshl.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..4da8a98ae5ebab7cdf1eec080f9f3b3db91cc7a8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t a, int32x4_t b)
+{
+  return vshlq_u32 (a, b);
+}
+
+/* { dg-final { scan-assembler "vshl.u32"  }  } */
+
+uint32x4_t
+foo1 (uint32x4_t a, int32x4_t b)
+{
+  return vshlq (a, b);
+}
+
+/* { dg-final { scan-assembler "vshl.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..a5414d68e3b3b38957039678e171a576cbfa9fa0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint8x16_t a, int8x16_t b)
+{
+  return vshlq_u8 (a, b);
+}
+
+/* { dg-final { scan-assembler "vshl.u8"  }  } */
+
+uint8x16_t
+foo1 (uint8x16_t a, int8x16_t b)
+{
+  return vshlq (a, b);
+}
+
+/* { dg-final { scan-assembler "vshl.u8"  }  } */


[-- Attachment #2: diff10.patch --]
[-- Type: text/plain, Size: 24419 bytes --]

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index ec56dbcdd66662bb1de696142186955d75570772..73a0a3070bda2e35f3e400994dbb6ccfb3043f76 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -391,6 +391,24 @@ arm_binop_unone_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define BINOP_UNONE_NONE_IMM_QUALIFIERS \
   (arm_binop_unone_none_imm_qualifiers)
 
+static enum arm_type_qualifiers
+arm_binop_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_unsigned };
+#define BINOP_NONE_NONE_UNONE_QUALIFIERS \
+  (arm_binop_none_none_unone_qualifiers)
+
+static enum arm_type_qualifiers
+arm_binop_unone_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_none, qualifier_none };
+#define BINOP_UNONE_NONE_NONE_QUALIFIERS \
+  (arm_binop_unone_none_none_qualifiers)
+
+static enum arm_type_qualifiers
+arm_binop_unone_unone_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_none };
+#define BINOP_UNONE_UNONE_NONE_QUALIFIERS \
+  (arm_binop_unone_unone_none_qualifiers)
+
 /* End of Qualifier for MVE builtins.  */
 
    /* void ([T element type] *, T, immediate).  */
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 0318a2cd8c3cdea430a32506e75a5aceb4d3a475..0140b0cb38ccee5f9caa7570627bdc010e158e71 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -225,6 +225,20 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vshrq_n_u8(__a,  __imm) __arm_vshrq_n_u8(__a,  __imm)
 #define vshrq_n_u16(__a,  __imm) __arm_vshrq_n_u16(__a,  __imm)
 #define vshrq_n_u32(__a,  __imm) __arm_vshrq_n_u32(__a,  __imm)
+#define vaddlvq_p_s32(__a, __p) __arm_vaddlvq_p_s32(__a, __p)
+#define vaddlvq_p_u32(__a, __p) __arm_vaddlvq_p_u32(__a, __p)
+#define vcmpneq_s8(__a, __b) __arm_vcmpneq_s8(__a, __b)
+#define vcmpneq_s16(__a, __b) __arm_vcmpneq_s16(__a, __b)
+#define vcmpneq_s32(__a, __b) __arm_vcmpneq_s32(__a, __b)
+#define vcmpneq_u8(__a, __b) __arm_vcmpneq_u8(__a, __b)
+#define vcmpneq_u16(__a, __b) __arm_vcmpneq_u16(__a, __b)
+#define vcmpneq_u32(__a, __b) __arm_vcmpneq_u32(__a, __b)
+#define vshlq_s8(__a, __b) __arm_vshlq_s8(__a, __b)
+#define vshlq_s16(__a, __b) __arm_vshlq_s16(__a, __b)
+#define vshlq_s32(__a, __b) __arm_vshlq_s32(__a, __b)
+#define vshlq_u8(__a, __b) __arm_vshlq_u8(__a, __b)
+#define vshlq_u16(__a, __b) __arm_vshlq_u16(__a, __b)
+#define vshlq_u32(__a, __b) __arm_vshlq_u32(__a, __b)
 #endif
 
 __extension__ extern __inline void
@@ -868,6 +882,103 @@ __arm_vshrq_n_u32 (uint32x4_t __a, const int __imm)
 {
   return __builtin_mve_vshrq_n_uv4si (__a, __imm);
 }
+__extension__ extern __inline int64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vaddlvq_p_s32 (int32x4_t __a, mve_pred16_t __p)
+{
+  return __builtin_mve_vaddlvq_p_sv4si (__a, __p);
+}
+
+__extension__ extern __inline uint64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vaddlvq_p_u32 (uint32x4_t __a, mve_pred16_t __p)
+{
+  return __builtin_mve_vaddlvq_p_uv4si (__a, __p);
+}
+
+__extension__ extern __inline int32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcmpneq_s8 (int8x16_t __a, int8x16_t __b)
+{
+  return __builtin_mve_vcmpneq_sv16qi (__a, __b);
+}
+
+__extension__ extern __inline mve_pred16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcmpneq_s16 (int16x8_t __a, int16x8_t __b)
+{
+  return __builtin_mve_vcmpneq_sv8hi (__a, __b);
+}
+
+__extension__ extern __inline mve_pred16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcmpneq_s32 (int32x4_t __a, int32x4_t __b)
+{
+  return __builtin_mve_vcmpneq_sv4si (__a, __b);
+}
+
+__extension__ extern __inline mve_pred16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcmpneq_u8 (uint8x16_t __a, uint8x16_t __b)
+{
+  return __builtin_mve_vcmpneq_uv16qi (__a, __b);
+}
+
+__extension__ extern __inline mve_pred16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcmpneq_u16 (uint16x8_t __a, uint16x8_t __b)
+{
+  return __builtin_mve_vcmpneq_uv8hi (__a, __b);
+}
+
+__extension__ extern __inline mve_pred16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcmpneq_u32 (uint32x4_t __a, uint32x4_t __b)
+{
+  return __builtin_mve_vcmpneq_uv4si (__a, __b);
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlq_s8 (int8x16_t __a, int8x16_t __b)
+{
+  return __builtin_mve_vshlq_sv16qi (__a, __b);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlq_s16 (int16x8_t __a, int16x8_t __b)
+{
+  return __builtin_mve_vshlq_sv8hi (__a, __b);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlq_s32 (int32x4_t __a, int32x4_t __b)
+{
+  return __builtin_mve_vshlq_sv4si (__a, __b);
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlq_u8 (uint8x16_t __a, int8x16_t __b)
+{
+  return __builtin_mve_vshlq_uv16qi (__a, __b);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlq_u16 (uint16x8_t __a, int16x8_t __b)
+{
+  return __builtin_mve_vshlq_uv8hi (__a, __b);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlq_u32 (uint32x4_t __a, int32x4_t __b)
+{
+  return __builtin_mve_vshlq_uv4si (__a, __b);
+}
 
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
@@ -1752,6 +1863,34 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t]: __arm_vshrq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), p1), \
   int (*)[__ARM_mve_type_uint32x4_t]: __arm_vshrq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), p1));})
 
+#define vaddlvq_p(p0,p1) __arm_vaddlvq_p(p0,p1)
+#define __arm_vaddlvq_p(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vaddlvq_p_s32 (__ARM_mve_coerce(__p0, int32x4_t), p1), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vaddlvq_p_u32 (__ARM_mve_coerce(__p0, uint32x4_t), p1));})
+
+#define vcmpneq(p0,p1) __arm_vcmpneq(p0,p1)
+#define __arm_vcmpneq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcmpneq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcmpneq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpneq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vcmpneq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcmpneq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcmpneq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
+
+#define vshlq(p0,p1) __arm_vshlq(p0,p1)
+#define __arm_vshlq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vshlq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vshlq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vshlq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int8x16_t]: __arm_vshlq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vshlq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vshlq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, int32x4_t)));})
+
 #endif /* MVE Floating point.  */
 
 #ifdef __cplusplus
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index eae37ba831361401241ed91950169f06dee29c3e..30342bae823aadeae6d27b9b1317bac80c9e5039 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -6,7 +6,7 @@
 
     GCC is free software; you can redistribute it and/or modify it
     under the terms of the GNU General Public License as published
-    by the Free Software Foundation; either version 3, or   (at your
+    by the Free Software Foundation; either version 3, or (at your
     option) any later version.
 
     GCC is distributed in the hope that it will be useful, but WITHOUT
@@ -87,3 +87,9 @@ VAR4 (BINOP_UNONE_UNONE_UNONE, vcreateq_u, v16qi, v8hi, v4si, v2di)
 VAR4 (BINOP_NONE_UNONE_UNONE, vcreateq_s, v16qi, v8hi, v4si, v2di)
 VAR3 (BINOP_UNONE_UNONE_IMM, vshrq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_IMM, vshrq_n_s, v16qi, v8hi, v4si)
+VAR1 (BINOP_NONE_NONE_UNONE, vaddlvq_p_s, v4si)
+VAR1 (BINOP_UNONE_UNONE_UNONE, vaddlvq_p_u, v4si)
+VAR3 (BINOP_UNONE_NONE_NONE, vcmpneq_s, v16qi, v8hi, v4si)
+VAR3 (BINOP_UNONE_UNONE_UNONE, vcmpneq_u, v16qi, v8hi, v4si)
+VAR3 (BINOP_NONE_NONE_NONE, vshlq_s, v16qi, v8hi, v4si)
+VAR3 (BINOP_UNONE_UNONE_NONE, vshlq_u, v16qi, v8hi, v4si)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 4ae7646cf20683ec147af11f398d7460bd65f7ba..e440671ba0862b51d85d1bfa9114fb585b6b39cd 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -40,7 +40,8 @@
 			 VADDLVQ_U VCTP8Q VCTP16Q VCTP32Q VCTP64Q VPNOT
 			 VCREATEQ_F VCVTQ_N_TO_F_S VCVTQ_N_TO_F_U VBRSRQ_N_F
 			 VSUBQ_N_F VCREATEQ_U VCREATEQ_S VSHRQ_N_S VSHRQ_N_U
-			 VCVTQ_N_FROM_F_S VCVTQ_N_FROM_F_U])
+			 VCVTQ_N_FROM_F_S VCVTQ_N_FROM_F_U VADDLVQ_P_S
+			 VADDLVQ_P_U VCMPNEQ_U VCMPNEQ_S VSHLQ_S VSHLQ_U])
 
 (define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF")
 			    (V8HF "V8HI") (V4SF "V4SI")])
@@ -59,8 +60,9 @@
 		       (VREV32Q_S "s") (VADDLVQ_U "u") (VADDLVQ_S "s")
 		       (VCVTQ_N_TO_F_S "s") (VCVTQ_N_TO_F_U "u")
 		       (VCREATEQ_U "u") (VCREATEQ_S "s") (VSHRQ_N_S "s")
-		       (VSHRQ_N_U "u") (VCVTQ_N_FROM_F_S "s")
-		       (VCVTQ_N_FROM_F_U "u")])
+		       (VSHRQ_N_U "u") (VCVTQ_N_FROM_F_S "s") (VSHLQ_U "u")
+		       (VCVTQ_N_FROM_F_U "u") (VADDLVQ_P_S "s") (VSHLQ_S "s")
+		       (VADDLVQ_P_U "u") (VCMPNEQ_U "u") (VCMPNEQ_S "s")])
 
 (define_int_attr mode1 [(VCTP8Q "8") (VCTP16Q "16") (VCTP32Q "32")
 			(VCTP64Q "64")])
@@ -90,6 +92,9 @@
 (define_int_iterator VCREATEQ [VCREATEQ_U VCREATEQ_S])
 (define_int_iterator VSHRQ_N [VSHRQ_N_S VSHRQ_N_U])
 (define_int_iterator VCVTQ_N_FROM_F [VCVTQ_N_FROM_F_S VCVTQ_N_FROM_F_U])
+(define_int_iterator VADDLVQ_P [VADDLVQ_P_S VADDLVQ_P_U])
+(define_int_iterator VCMPNEQ [VCMPNEQ_U VCMPNEQ_S])
+(define_int_iterator VSHLQ [VSHLQ_S VSHLQ_U])
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
@@ -799,3 +804,48 @@
   "vcvt.<supf>%#<V_sz_elem>.f%#<V_sz_elem>\t%q0, %q1, %2"
   [(set_attr "type" "mve_move")
 ])
+
+;;
+;; [vaddlvq_p_s])
+;;
+(define_insn "mve_vaddlvq_p_<supf>v4si"
+  [
+   (set (match_operand:DI 0 "s_register_operand" "=r")
+	(unspec:DI [(match_operand:V4SI 1 "s_register_operand" "w")
+		    (match_operand:HI 2 "vpr_register_operand" "Up")]
+	 VADDLVQ_P))
+  ]
+  "TARGET_HAVE_MVE"
+  "vpst\;vaddlvt.<supf>32 %Q0, %R0, %q1"
+  [(set_attr "type" "mve_move")
+   (set_attr "length""8")])
+
+;;
+;; [vcmpneq_u, vcmpneq_s])
+;;
+(define_insn "mve_vcmpneq_<supf><mode>"
+  [
+   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
+	(unspec:HI [(match_operand:MVE_2 1 "s_register_operand" "w")
+		    (match_operand:MVE_2 2 "s_register_operand" "w")]
+	 VCMPNEQ))
+  ]
+  "TARGET_HAVE_MVE"
+  "vcmp.i%#<V_sz_elem>  ne, %q1, %q2"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vshlq_s, vshlq_u])
+;;
+(define_insn "mve_vshlq_<supf><mode>"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
+		       (match_operand:MVE_2 2 "s_register_operand" "w")]
+	 VSHLQ))
+  ]
+  "TARGET_HAVE_MVE"
+  "vshl.<supf>%#<V_sz_elem>\t%q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+])
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_p_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..55fdbbf764527ff9a7dc26f2843820f83f7dc9b0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_p_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int64_t
+foo (int32x4_t a, mve_pred16_t p)
+{
+  return vaddlvq_p_s32 (a, p);
+}
+
+/* { dg-final { scan-assembler "vaddlvt.s32"  }  } */
+
+int64_t
+foo1 (int32x4_t a, mve_pred16_t p)
+{
+  return vaddlvq_p (a, p);
+}
+
+/* { dg-final { scan-assembler "vaddlvt.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_p_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_p_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..7b88a2c46366db967e05a27a1ba6636dd0be76d5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddlvq_p_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint64_t
+foo (uint32x4_t a, mve_pred16_t p)
+{
+  return vaddlvq_p_u32 (a, p);
+}
+
+/* { dg-final { scan-assembler "vaddlvt.u32"  }  } */
+
+uint64_t
+foo1 (uint32x4_t a, mve_pred16_t p)
+{
+  return vaddlvq_p (a, p);
+}
+
+/* { dg-final { scan-assembler "vaddlvt.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..eef205f20291a62b67927ad15780d93c6a9e61d0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+mve_pred16_t
+foo (int16x8_t a, int16x8_t b)
+{
+  return vcmpneq_s16 (a, b);
+}
+
+/* { dg-final { scan-assembler "vcmp.i16"  }  } */
+
+mve_pred16_t
+foo1 (int16x8_t a, int16x8_t b)
+{
+  return vcmpneq (a, b);
+}
+
+/* { dg-final { scan-assembler "vcmp.i16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..64eb0474610b16a52b7466bc5368acf0f7e83fff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+mve_pred16_t
+foo (int32x4_t a, int32x4_t b)
+{
+  return vcmpneq_s32 (a, b);
+}
+
+/* { dg-final { scan-assembler "vcmp.i32"  }  } */
+
+mve_pred16_t
+foo1 (int32x4_t a, int32x4_t b)
+{
+  return vcmpneq (a, b);
+}
+
+/* { dg-final { scan-assembler "vcmp.i32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..07b04c12170ef7217aa9aa9f6171425f209f38bf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+mve_pred16_t
+foo (int8x16_t a, int8x16_t b)
+{
+  return vcmpneq_s8 (a, b);
+}
+
+/* { dg-final { scan-assembler "vcmp.i8"  }  } */
+
+mve_pred16_t
+foo1 (int8x16_t a, int8x16_t b)
+{
+  return vcmpneq (a, b);
+}
+
+/* { dg-final { scan-assembler "vcmp.i8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..5bf35fd79e2d7f393abea01788fc3fc106a0bf00
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+mve_pred16_t
+foo (uint16x8_t a, uint16x8_t b)
+{
+  return vcmpneq_u16 (a, b);
+}
+
+/* { dg-final { scan-assembler "vcmp.i16"  }  } */
+
+mve_pred16_t
+foo1 (uint16x8_t a, uint16x8_t b)
+{
+  return vcmpneq (a, b);
+}
+
+/* { dg-final { scan-assembler "vcmp.i16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..c80918547492b0854f1475b3b4d81f847324f898
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+mve_pred16_t
+foo (uint32x4_t a, uint32x4_t b)
+{
+  return vcmpneq_u32 (a, b);
+}
+
+/* { dg-final { scan-assembler "vcmp.i32"  }  } */
+
+mve_pred16_t
+foo1 (uint32x4_t a, uint32x4_t b)
+{
+  return vcmpneq (a, b);
+}
+
+/* { dg-final { scan-assembler "vcmp.i32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..c71cc592923502751786c7bcc800a941e4c185d3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+mve_pred16_t
+foo (uint8x16_t a, uint8x16_t b)
+{
+  return vcmpneq_u8 (a, b);
+}
+
+/* { dg-final { scan-assembler "vcmp.i8"  }  } */
+
+mve_pred16_t
+foo1 (uint8x16_t a, uint8x16_t b)
+{
+  return vcmpneq (a, b);
+}
+
+/* { dg-final { scan-assembler "vcmp.i8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..88575fdef3a794358e8cba40bf2df3a4273b8628
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int16x8_t a, int16x8_t b)
+{
+  return vshlq_s16 (a, b);
+}
+
+/* { dg-final { scan-assembler "vshl.s16"  }  } */
+
+int16x8_t
+foo1 (int16x8_t a, int16x8_t b)
+{
+  return vshlq (a, b);
+}
+
+/* { dg-final { scan-assembler "vshl.s16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..7603b6c99c8f7804ada72d830cba9a26aeab5c83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t a, int32x4_t b)
+{
+  return vshlq_s32 (a, b);
+}
+
+/* { dg-final { scan-assembler "vshl.s32"  }  } */
+
+int32x4_t
+foo1 (int32x4_t a, int32x4_t b)
+{
+  return vshlq (a, b);
+}
+
+/* { dg-final { scan-assembler "vshl.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..6f86f84f77b17b444fdde91e02c99f16cdb7bff6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo (int8x16_t a, int8x16_t b)
+{
+  return vshlq_s8 (a, b);
+}
+
+/* { dg-final { scan-assembler "vshl.s8"  }  } */
+
+int8x16_t
+foo1 (int8x16_t a, int8x16_t b)
+{
+  return vshlq (a, b);
+}
+
+/* { dg-final { scan-assembler "vshl.s8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..9b216630310187080c072d6c465fb60fea9b4f5a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint16x8_t a, int16x8_t b)
+{
+  return vshlq_u16 (a, b);
+}
+
+/* { dg-final { scan-assembler "vshl.u16"  }  } */
+
+uint16x8_t
+foo1 (uint16x8_t a, int16x8_t b)
+{
+  return vshlq (a, b);
+}
+
+/* { dg-final { scan-assembler "vshl.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..4da8a98ae5ebab7cdf1eec080f9f3b3db91cc7a8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t a, int32x4_t b)
+{
+  return vshlq_u32 (a, b);
+}
+
+/* { dg-final { scan-assembler "vshl.u32"  }  } */
+
+uint32x4_t
+foo1 (uint32x4_t a, int32x4_t b)
+{
+  return vshlq (a, b);
+}
+
+/* { dg-final { scan-assembler "vshl.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..a5414d68e3b3b38957039678e171a576cbfa9fa0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint8x16_t a, int8x16_t b)
+{
+  return vshlq_u8 (a, b);
+}
+
+/* { dg-final { scan-assembler "vshl.u8"  }  } */
+
+uint8x16_t
+foo1 (uint8x16_t a, int8x16_t b)
+{
+  return vshlq (a, b);
+}
+
+/* { dg-final { scan-assembler "vshl.u8"  }  } */


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][5/2x]: MVE intrinsics with binary operands.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (6 preceding siblings ...)
  2019-11-14 19:13 ` [PATCH][ARM][GCC][3/3x]: MVE intrinsics with ternary operands Srinath Parvathaneni
@ 2019-11-14 19:13 ` Srinath Parvathaneni
  2019-11-14 19:13 ` [PATCH][ARM][GCC][3/4x]: MVE intrinsics with quaternary operands Srinath Parvathaneni
                   ` (30 subsequent siblings)
  38 siblings, 0 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 26609 bytes --]

Hello,

This patch supports following MVE ACLE intrinsics with binary operands.

vqmovntq_u16, vqmovnbq_u16, vmulltq_poly_p8, vmullbq_poly_p8, vmovntq_u16,
vmovnbq_u16, vmlaldavxq_u16, vmlaldavq_u16, vqmovuntq_s16, vqmovunbq_s16, 
vshlltq_n_u8, vshllbq_n_u8, vorrq_n_u16, vbicq_n_u16, vcmpneq_n_f16, vcmpneq_f16,
vcmpltq_n_f16, vcmpltq_f16, vcmpleq_n_f16, vcmpleq_f16, vcmpgtq_n_f16,
vcmpgtq_f16, vcmpgeq_n_f16, vcmpgeq_f16, vcmpeqq_n_f16, vcmpeqq_f16, vsubq_f16,
vqmovntq_s16, vqmovnbq_s16, vqdmulltq_s16, vqdmulltq_n_s16, vqdmullbq_s16,
vqdmullbq_n_s16, vorrq_f16, vornq_f16, vmulq_n_f16, vmulq_f16, vmovntq_s16,
vmovnbq_s16, vmlsldavxq_s16, vmlsldavq_s16, vmlaldavxq_s16, vmlaldavq_s16,
vminnmvq_f16, vminnmq_f16, vminnmavq_f16, vminnmaq_f16, vmaxnmvq_f16, vmaxnmq_f16,
vmaxnmavq_f16, vmaxnmaq_f16, veorq_f16, vcmulq_rot90_f16, vcmulq_rot270_f16,
vcmulq_rot180_f16, vcmulq_f16, vcaddq_rot90_f16, vcaddq_rot270_f16, vbicq_f16,
vandq_f16, vaddq_n_f16, vabdq_f16, vshlltq_n_s8, vshllbq_n_s8, vorrq_n_s16, 
vbicq_n_s16, vqmovntq_u32, vqmovnbq_u32, vmulltq_poly_p16, vmullbq_poly_p16,
vmovntq_u32, vmovnbq_u32, vmlaldavxq_u32, vmlaldavq_u32, vqmovuntq_s32,
vqmovunbq_s32, vshlltq_n_u16, vshllbq_n_u16, vorrq_n_u32, vbicq_n_u32, 
vcmpneq_n_f32, vcmpneq_f32, vcmpltq_n_f32, vcmpltq_f32, vcmpleq_n_f32, 
vcmpleq_f32, vcmpgtq_n_f32, vcmpgtq_f32, vcmpgeq_n_f32, vcmpgeq_f32, 
vcmpeqq_n_f32, vcmpeqq_f32, vsubq_f32, vqmovntq_s32, vqmovnbq_s32, 
vqdmulltq_s32, vqdmulltq_n_s32, vqdmullbq_s32, vqdmullbq_n_s32, vorrq_f32,
vornq_f32, vmulq_n_f32, vmulq_f32, vmovntq_s32, vmovnbq_s32, vmlsldavxq_s32,
vmlsldavq_s32, vmlaldavxq_s32, vmlaldavq_s32, vminnmvq_f32, vminnmq_f32,
vminnmavq_f32, vminnmaq_f32, vmaxnmvq_f32, vmaxnmq_f32, vmaxnmavq_f32,
vmaxnmaq_f32, veorq_f32, vcmulq_rot90_f32, vcmulq_rot270_f32, vcmulq_rot180_f32,
vcmulq_f32, vcaddq_rot90_f32, vcaddq_rot270_f32, vbicq_f32, vandq_f32, 
vaddq_n_f32, vabdq_f32, vshlltq_n_s16, vshllbq_n_s16, vorrq_n_s32, vbicq_n_s32, 
vrmlaldavhq_u32, vctp8q_m, vctp64q_m, vctp32q_m, vctp16q_m, vaddlvaq_u32, 
vrmlsldavhxq_s32, vrmlsldavhq_s32, vrmlaldavhxq_s32, vrmlaldavhq_s32,
vcvttq_f16_f32, vcvtbq_f16_f32, vaddlvaq_s32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

The above intrinsics are defined using the already defined builtin qualifiers BINOP_NONE_NONE_IMM,
BINOP_NONE_NONE_NONE, BINOP_UNONE_NONE_NONE, BINOP_UNONE_UNONE_IMM, BINOP_UNONE_UNONE_NONE,
BINOP_UNONE_UNONE_UNONE.

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-10-23  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm_mve.h (vqmovntq_u16): Define macro.
	(vqmovnbq_u16): Likewise.
	(vmulltq_poly_p8): Likewise.
	(vmullbq_poly_p8): Likewise.
	(vmovntq_u16): Likewise.
	(vmovnbq_u16): Likewise.
	(vmlaldavxq_u16): Likewise.
	(vmlaldavq_u16): Likewise.
	(vqmovuntq_s16): Likewise.
	(vqmovunbq_s16): Likewise.
	(vshlltq_n_u8): Likewise.
	(vshllbq_n_u8): Likewise.
	(vorrq_n_u16): Likewise.
	(vbicq_n_u16): Likewise.
	(vcmpneq_n_f16): Likewise.
	(vcmpneq_f16): Likewise.
	(vcmpltq_n_f16): Likewise.
	(vcmpltq_f16): Likewise.
	(vcmpleq_n_f16): Likewise.
	(vcmpleq_f16): Likewise.
	(vcmpgtq_n_f16): Likewise.
	(vcmpgtq_f16): Likewise.
	(vcmpgeq_n_f16): Likewise.
	(vcmpgeq_f16): Likewise.
	(vcmpeqq_n_f16): Likewise.
	(vcmpeqq_f16): Likewise.
	(vsubq_f16): Likewise.
	(vqmovntq_s16): Likewise.
	(vqmovnbq_s16): Likewise.
	(vqdmulltq_s16): Likewise.
	(vqdmulltq_n_s16): Likewise.
	(vqdmullbq_s16): Likewise.
	(vqdmullbq_n_s16): Likewise.
	(vorrq_f16): Likewise.
	(vornq_f16): Likewise.
	(vmulq_n_f16): Likewise.
	(vmulq_f16): Likewise.
	(vmovntq_s16): Likewise.
	(vmovnbq_s16): Likewise.
	(vmlsldavxq_s16): Likewise.
	(vmlsldavq_s16): Likewise.
	(vmlaldavxq_s16): Likewise.
	(vmlaldavq_s16): Likewise.
	(vminnmvq_f16): Likewise.
	(vminnmq_f16): Likewise.
	(vminnmavq_f16): Likewise.
	(vminnmaq_f16): Likewise.
	(vmaxnmvq_f16): Likewise.
	(vmaxnmq_f16): Likewise.
	(vmaxnmavq_f16): Likewise.
	(vmaxnmaq_f16): Likewise.
	(veorq_f16): Likewise.
	(vcmulq_rot90_f16): Likewise.
	(vcmulq_rot270_f16): Likewise.
	(vcmulq_rot180_f16): Likewise.
	(vcmulq_f16): Likewise.
	(vcaddq_rot90_f16): Likewise.
	(vcaddq_rot270_f16): Likewise.
	(vbicq_f16): Likewise.
	(vandq_f16): Likewise.
	(vaddq_n_f16): Likewise.
	(vabdq_f16): Likewise.
	(vshlltq_n_s8): Likewise.
	(vshllbq_n_s8): Likewise.
	(vorrq_n_s16): Likewise.
	(vbicq_n_s16): Likewise.
	(vqmovntq_u32): Likewise.
	(vqmovnbq_u32): Likewise.
	(vmulltq_poly_p16): Likewise.
	(vmullbq_poly_p16): Likewise.
	(vmovntq_u32): Likewise.
	(vmovnbq_u32): Likewise.
	(vmlaldavxq_u32): Likewise.
	(vmlaldavq_u32): Likewise.
	(vqmovuntq_s32): Likewise.
	(vqmovunbq_s32): Likewise.
	(vshlltq_n_u16): Likewise.
	(vshllbq_n_u16): Likewise.
	(vorrq_n_u32): Likewise.
	(vbicq_n_u32): Likewise.
	(vcmpneq_n_f32): Likewise.
	(vcmpneq_f32): Likewise.
	(vcmpltq_n_f32): Likewise.
	(vcmpltq_f32): Likewise.
	(vcmpleq_n_f32): Likewise.
	(vcmpleq_f32): Likewise.
	(vcmpgtq_n_f32): Likewise.
	(vcmpgtq_f32): Likewise.
	(vcmpgeq_n_f32): Likewise.
	(vcmpgeq_f32): Likewise.
	(vcmpeqq_n_f32): Likewise.
	(vcmpeqq_f32): Likewise.
	(vsubq_f32): Likewise.
	(vqmovntq_s32): Likewise.
	(vqmovnbq_s32): Likewise.
	(vqdmulltq_s32): Likewise.
	(vqdmulltq_n_s32): Likewise.
	(vqdmullbq_s32): Likewise.
	(vqdmullbq_n_s32): Likewise.
	(vorrq_f32): Likewise.
	(vornq_f32): Likewise.
	(vmulq_n_f32): Likewise.
	(vmulq_f32): Likewise.
	(vmovntq_s32): Likewise.
	(vmovnbq_s32): Likewise.
	(vmlsldavxq_s32): Likewise.
	(vmlsldavq_s32): Likewise.
	(vmlaldavxq_s32): Likewise.
	(vmlaldavq_s32): Likewise.
	(vminnmvq_f32): Likewise.
	(vminnmq_f32): Likewise.
	(vminnmavq_f32): Likewise.
	(vminnmaq_f32): Likewise.
	(vmaxnmvq_f32): Likewise.
	(vmaxnmq_f32): Likewise.
	(vmaxnmavq_f32): Likewise.
	(vmaxnmaq_f32): Likewise.
	(veorq_f32): Likewise.
	(vcmulq_rot90_f32): Likewise.
	(vcmulq_rot270_f32): Likewise.
	(vcmulq_rot180_f32): Likewise.
	(vcmulq_f32): Likewise.
	(vcaddq_rot90_f32): Likewise.
	(vcaddq_rot270_f32): Likewise.
	(vbicq_f32): Likewise.
	(vandq_f32): Likewise.
	(vaddq_n_f32): Likewise.
	(vabdq_f32): Likewise.
	(vshlltq_n_s16): Likewise.
	(vshllbq_n_s16): Likewise.
	(vorrq_n_s32): Likewise.
	(vbicq_n_s32): Likewise.
	(vrmlaldavhq_u32): Likewise.
	(vctp8q_m): Likewise.
	(vctp64q_m): Likewise.
	(vctp32q_m): Likewise.
	(vctp16q_m): Likewise.
	(vaddlvaq_u32): Likewise.
	(vrmlsldavhxq_s32): Likewise.
	(vrmlsldavhq_s32): Likewise.
	(vrmlaldavhxq_s32): Likewise.
	(vrmlaldavhq_s32): Likewise.
	(vcvttq_f16_f32): Likewise.
	(vcvtbq_f16_f32): Likewise.
	(vaddlvaq_s32): Likewise.
	(__arm_vqmovntq_u16): Define intrinsic.
	(__arm_vqmovnbq_u16): Likewise.
	(__arm_vmulltq_poly_p8): Likewise.
	(__arm_vmullbq_poly_p8): Likewise.
	(__arm_vmovntq_u16): Likewise.
	(__arm_vmovnbq_u16): Likewise.
	(__arm_vmlaldavxq_u16): Likewise.
	(__arm_vmlaldavq_u16): Likewise.
	(__arm_vqmovuntq_s16): Likewise.
	(__arm_vqmovunbq_s16): Likewise.
	(__arm_vshlltq_n_u8): Likewise.
	(__arm_vshllbq_n_u8): Likewise.
	(__arm_vorrq_n_u16): Likewise.
	(__arm_vbicq_n_u16): Likewise.
	(__arm_vcmpneq_n_f16): Likewise.
	(__arm_vcmpneq_f16): Likewise.
	(__arm_vcmpltq_n_f16): Likewise.
	(__arm_vcmpltq_f16): Likewise.
	(__arm_vcmpleq_n_f16): Likewise.
	(__arm_vcmpleq_f16): Likewise.
	(__arm_vcmpgtq_n_f16): Likewise.
	(__arm_vcmpgtq_f16): Likewise.
	(__arm_vcmpgeq_n_f16): Likewise.
	(__arm_vcmpgeq_f16): Likewise.
	(__arm_vcmpeqq_n_f16): Likewise.
	(__arm_vcmpeqq_f16): Likewise.
	(__arm_vsubq_f16): Likewise.
	(__arm_vqmovntq_s16): Likewise.
	(__arm_vqmovnbq_s16): Likewise.
	(__arm_vqdmulltq_s16): Likewise.
	(__arm_vqdmulltq_n_s16): Likewise.
	(__arm_vqdmullbq_s16): Likewise.
	(__arm_vqdmullbq_n_s16): Likewise.
	(__arm_vorrq_f16): Likewise.
	(__arm_vornq_f16): Likewise.
	(__arm_vmulq_n_f16): Likewise.
	(__arm_vmulq_f16): Likewise.
	(__arm_vmovntq_s16): Likewise.
	(__arm_vmovnbq_s16): Likewise.
	(__arm_vmlsldavxq_s16): Likewise.
	(__arm_vmlsldavq_s16): Likewise.
	(__arm_vmlaldavxq_s16): Likewise.
	(__arm_vmlaldavq_s16): Likewise.
	(__arm_vminnmvq_f16): Likewise.
	(__arm_vminnmq_f16): Likewise.
	(__arm_vminnmavq_f16): Likewise.
	(__arm_vminnmaq_f16): Likewise.
	(__arm_vmaxnmvq_f16): Likewise.
	(__arm_vmaxnmq_f16): Likewise.
	(__arm_vmaxnmavq_f16): Likewise.
	(__arm_vmaxnmaq_f16): Likewise.
	(__arm_veorq_f16): Likewise.
	(__arm_vcmulq_rot90_f16): Likewise.
	(__arm_vcmulq_rot270_f16): Likewise.
	(__arm_vcmulq_rot180_f16): Likewise.
	(__arm_vcmulq_f16): Likewise.
	(__arm_vcaddq_rot90_f16): Likewise.
	(__arm_vcaddq_rot270_f16): Likewise.
	(__arm_vbicq_f16): Likewise.
	(__arm_vandq_f16): Likewise.
	(__arm_vaddq_n_f16): Likewise.
	(__arm_vabdq_f16): Likewise.
	(__arm_vshlltq_n_s8): Likewise.
	(__arm_vshllbq_n_s8): Likewise.
	(__arm_vorrq_n_s16): Likewise.
	(__arm_vbicq_n_s16): Likewise.
	(__arm_vqmovntq_u32): Likewise.
	(__arm_vqmovnbq_u32): Likewise.
	(__arm_vmulltq_poly_p16): Likewise.
	(__arm_vmullbq_poly_p16): Likewise.
	(__arm_vmovntq_u32): Likewise.
	(__arm_vmovnbq_u32): Likewise.
	(__arm_vmlaldavxq_u32): Likewise.
	(__arm_vmlaldavq_u32): Likewise.
	(__arm_vqmovuntq_s32): Likewise.
	(__arm_vqmovunbq_s32): Likewise.
	(__arm_vshlltq_n_u16): Likewise.
	(__arm_vshllbq_n_u16): Likewise.
	(__arm_vorrq_n_u32): Likewise.
	(__arm_vbicq_n_u32): Likewise.
	(__arm_vcmpneq_n_f32): Likewise.
	(__arm_vcmpneq_f32): Likewise.
	(__arm_vcmpltq_n_f32): Likewise.
	(__arm_vcmpltq_f32): Likewise.
	(__arm_vcmpleq_n_f32): Likewise.
	(__arm_vcmpleq_f32): Likewise.
	(__arm_vcmpgtq_n_f32): Likewise.
	(__arm_vcmpgtq_f32): Likewise.
	(__arm_vcmpgeq_n_f32): Likewise.
	(__arm_vcmpgeq_f32): Likewise.
	(__arm_vcmpeqq_n_f32): Likewise.
	(__arm_vcmpeqq_f32): Likewise.
	(__arm_vsubq_f32): Likewise.
	(__arm_vqmovntq_s32): Likewise.
	(__arm_vqmovnbq_s32): Likewise.
	(__arm_vqdmulltq_s32): Likewise.
	(__arm_vqdmulltq_n_s32): Likewise.
	(__arm_vqdmullbq_s32): Likewise.
	(__arm_vqdmullbq_n_s32): Likewise.
	(__arm_vorrq_f32): Likewise.
	(__arm_vornq_f32): Likewise.
	(__arm_vmulq_n_f32): Likewise.
	(__arm_vmulq_f32): Likewise.
	(__arm_vmovntq_s32): Likewise.
	(__arm_vmovnbq_s32): Likewise.
	(__arm_vmlsldavxq_s32): Likewise.
	(__arm_vmlsldavq_s32): Likewise.
	(__arm_vmlaldavxq_s32): Likewise.
	(__arm_vmlaldavq_s32): Likewise.
	(__arm_vminnmvq_f32): Likewise.
	(__arm_vminnmq_f32): Likewise.
	(__arm_vminnmavq_f32): Likewise.
	(__arm_vminnmaq_f32): Likewise.
	(__arm_vmaxnmvq_f32): Likewise.
	(__arm_vmaxnmq_f32): Likewise.
	(__arm_vmaxnmavq_f32): Likewise.
	(__arm_vmaxnmaq_f32): Likewise.
	(__arm_veorq_f32): Likewise.
	(__arm_vcmulq_rot90_f32): Likewise.
	(__arm_vcmulq_rot270_f32): Likewise.
	(__arm_vcmulq_rot180_f32): Likewise.
	(__arm_vcmulq_f32): Likewise.
	(__arm_vcaddq_rot90_f32): Likewise.
	(__arm_vcaddq_rot270_f32): Likewise.
	(__arm_vbicq_f32): Likewise.
	(__arm_vandq_f32): Likewise.
	(__arm_vaddq_n_f32): Likewise.
	(__arm_vabdq_f32): Likewise.
	(__arm_vshlltq_n_s16): Likewise.
	(__arm_vshllbq_n_s16): Likewise.
	(__arm_vorrq_n_s32): Likewise.
	(__arm_vbicq_n_s32): Likewise.
	(__arm_vrmlaldavhq_u32): Likewise.
	(__arm_vctp8q_m): Likewise.
	(__arm_vctp64q_m): Likewise.
	(__arm_vctp32q_m): Likewise.
	(__arm_vctp16q_m): Likewise.
	(__arm_vaddlvaq_u32): Likewise.
	(__arm_vrmlsldavhxq_s32): Likewise.
	(__arm_vrmlsldavhq_s32): Likewise.
	(__arm_vrmlaldavhxq_s32): Likewise.
	(__arm_vrmlaldavhq_s32): Likewise.
	(__arm_vcvttq_f16_f32): Likewise.
	(__arm_vcvtbq_f16_f32): Likewise.
	(__arm_vaddlvaq_s32): Likewise.
	(vst4q): Define polymorphic variant.
	(vrndxq): Likewise.
	(vrndq): Likewise.
	(vrndpq): Likewise.
	(vrndnq): Likewise.
	(vrndmq): Likewise.
	(vrndaq): Likewise.
	(vrev64q): Likewise.
	(vnegq): Likewise.
	(vdupq_n): Likewise.
	(vabsq): Likewise.
	(vrev32q): Likewise.
	(vcvtbq_f32): Likewise.
	(vcvttq_f32): Likewise.
	(vcvtq): Likewise.
	(vsubq_n): Likewise.
	(vbrsrq_n): Likewise.
	(vcvtq_n): Likewise.
	(vsubq): Likewise.
	(vorrq): Likewise.
	(vabdq): Likewise.
	(vaddq_n): Likewise.
	(vandq): Likewise.
	(vbicq): Likewise.
	(vornq): Likewise.
	(vmulq_n): Likewise.
	(vmulq): Likewise.
	(vcaddq_rot270): Likewise.
	(vcmpeqq_n): Likewise.
	(vcmpeqq): Likewise.
	(vcaddq_rot90): Likewise.
	(vcmpgeq_n): Likewise.
	(vcmpgeq): Likewise.
	(vcmpgtq_n): Likewise.
	(vcmpgtq): Likewise.
	(vcmpgtq): Likewise.
	(vcmpleq_n): Likewise.
	(vcmpleq_n): Likewise.
	(vcmpleq): Likewise.
	(vcmpleq): Likewise.
	(vcmpltq_n): Likewise.
	(vcmpltq_n): Likewise.
	(vcmpltq): Likewise.
	(vcmpltq): Likewise.
	(vcmpneq_n): Likewise.
	(vcmpneq_n): Likewise.
	(vcmpneq): Likewise.
	(vcmpneq): Likewise.
	(vcmulq): Likewise.
	(vcmulq): Likewise.
	(vcmulq_rot180): Likewise.
	(vcmulq_rot180): Likewise.
	(vcmulq_rot270): Likewise.
	(vcmulq_rot270): Likewise.
	(vcmulq_rot90): Likewise.
	(vcmulq_rot90): Likewise.
	(veorq): Likewise.
	(veorq): Likewise.
	(vmaxnmaq): Likewise.
	(vmaxnmaq): Likewise.
	(vmaxnmavq): Likewise.
	(vmaxnmavq): Likewise.
	(vmaxnmq): Likewise.
	(vmaxnmq): Likewise.
	(vmaxnmvq): Likewise.
	(vmaxnmvq): Likewise.
	(vminnmaq): Likewise.
	(vminnmaq): Likewise.
	(vminnmavq): Likewise.
	(vminnmavq): Likewise.
	(vminnmq): Likewise.
	(vminnmq): Likewise.
	(vminnmvq): Likewise.
	(vminnmvq): Likewise.
	(vbicq_n): Likewise.
	(vqmovntq): Likewise.
	(vqmovntq): Likewise.
	(vqmovnbq): Likewise.
	(vqmovnbq): Likewise.
	(vmulltq_poly): Likewise.
	(vmulltq_poly): Likewise.
	(vmullbq_poly): Likewise.
	(vmullbq_poly): Likewise.
	(vmovntq): Likewise.
	(vmovntq): Likewise.
	(vmovnbq): Likewise.
	(vmovnbq): Likewise.
	(vmlaldavxq): Likewise.
	(vmlaldavxq): Likewise.
	(vqmovuntq): Likewise.
	(vqmovuntq): Likewise.
	(vshlltq_n): Likewise.
	(vshlltq_n): Likewise.
	(vshllbq_n): Likewise.
	(vshllbq_n): Likewise.
	(vorrq_n): Likewise.
	(vorrq_n): Likewise.
	(vmlaldavq): Likewise.
	(vmlaldavq): Likewise.
	(vqmovunbq): Likewise.
	(vqmovunbq): Likewise.
	(vqdmulltq_n): Likewise.
	(vqdmulltq_n): Likewise.
	(vqdmulltq): Likewise.
	(vqdmulltq): Likewise.
	(vqdmullbq_n): Likewise.
	(vqdmullbq_n): Likewise.
	(vqdmullbq): Likewise.
	(vqdmullbq): Likewise.
	(vaddlvaq): Likewise.
	(vaddlvaq): Likewise.
	(vrmlaldavhq): Likewise.
	(vrmlaldavhq): Likewise.
	(vrmlaldavhxq): Likewise.
	(vrmlaldavhxq): Likewise.
	(vrmlsldavhq): Likewise.
	(vrmlsldavhq): Likewise.
	(vrmlsldavhxq): Likewise.
	(vrmlsldavhxq): Likewise.
	(vmlsldavxq): Likewise.
	(vmlsldavxq): Likewise.
	(vmlsldavq): Likewise.
	(vmlsldavq): Likewise.
	* config/arm/arm_mve_builtins.def (BINOP_NONE_NONE_IMM): Use it.
	(BINOP_NONE_NONE_NONE): Likewise.
	(BINOP_UNONE_NONE_NONE): Likewise.
	(BINOP_UNONE_UNONE_IMM): Likewise.
	(BINOP_UNONE_UNONE_NONE): Likewise.
	(BINOP_UNONE_UNONE_UNONE): Likewise.
	* config/arm/mve.md (mve_vabdq_f<mode>): Define RTL pattern.
	(mve_vaddlvaq_<supf>v4si): Likewise.
	(mve_vaddq_n_f<mode>): Likewise.
	(mve_vandq_f<mode>): Likewise.
	(mve_vbicq_f<mode>): Likewise.
	(mve_vbicq_n_<supf><mode>): Likewise.
	(mve_vcaddq_rot270_f<mode>): Likewise.
	(mve_vcaddq_rot90_f<mode>): Likewise.
	(mve_vcmpeqq_f<mode>): Likewise.
	(mve_vcmpeqq_n_f<mode>): Likewise.
	(mve_vcmpgeq_f<mode>): Likewise.
	(mve_vcmpgeq_n_f<mode>): Likewise.
	(mve_vcmpgtq_f<mode>): Likewise.
	(mve_vcmpgtq_n_f<mode>): Likewise.
	(mve_vcmpleq_f<mode>): Likewise.
	(mve_vcmpleq_n_f<mode>): Likewise.
	(mve_vcmpltq_f<mode>): Likewise.
	(mve_vcmpltq_n_f<mode>): Likewise.
	(mve_vcmpneq_f<mode>): Likewise.
	(mve_vcmpneq_n_f<mode>): Likewise.
	(mve_vcmulq_f<mode>): Likewise.
	(mve_vcmulq_rot180_f<mode>): Likewise.
	(mve_vcmulq_rot270_f<mode>): Likewise.
	(mve_vcmulq_rot90_f<mode>): Likewise.
	(mve_vctp<mode1>q_mhi): Likewise.
	(mve_vcvtbq_f16_f32v8hf): Likewise.
	(mve_vcvttq_f16_f32v8hf): Likewise.
	(mve_veorq_f<mode>): Likewise.
	(mve_vmaxnmaq_f<mode>): Likewise.
	(mve_vmaxnmavq_f<mode>): Likewise.
	(mve_vmaxnmq_f<mode>): Likewise.
	(mve_vmaxnmvq_f<mode>): Likewise.
	(mve_vminnmaq_f<mode>): Likewise.
	(mve_vminnmavq_f<mode>): Likewise.
	(mve_vminnmq_f<mode>): Likewise.
	(mve_vminnmvq_f<mode>): Likewise.
	(mve_vmlaldavq_<supf><mode>): Likewise.
	(mve_vmlaldavxq_<supf><mode>): Likewise.
	(mve_vmlsldavq_s<mode>): Likewise.
	(mve_vmlsldavxq_s<mode>): Likewise.
	(mve_vmovnbq_<supf><mode>): Likewise.
	(mve_vmovntq_<supf><mode>): Likewise.
	(mve_vmulq_f<mode>): Likewise.
	(mve_vmulq_n_f<mode>): Likewise.
	(mve_vornq_f<mode>): Likewise.
	(mve_vorrq_f<mode>): Likewise.
	(mve_vorrq_n_<supf><mode>): Likewise.
	(mve_vqdmullbq_n_s<mode>): Likewise.
	(mve_vqdmullbq_s<mode>): Likewise.
	(mve_vqdmulltq_n_s<mode>): Likewise.
	(mve_vqdmulltq_s<mode>): Likewise.
	(mve_vqmovnbq_<supf><mode>): Likewise.
	(mve_vqmovntq_<supf><mode>): Likewise.
	(mve_vqmovunbq_s<mode>): Likewise.
	(mve_vqmovuntq_s<mode>): Likewise.
	(mve_vrmlaldavhxq_sv4si): Likewise.
	(mve_vrmlsldavhq_sv4si): Likewise.
	(mve_vrmlsldavhxq_sv4si): Likewise.
	(mve_vshllbq_n_<supf><mode>): Likewise.
	(mve_vshlltq_n_<supf><mode>): Likewise.
	(mve_vsubq_f<mode>): Likewise.
	(mve_vmulltq_poly_p<mode>): Likewise.
	(mve_vmullbq_poly_p<mode>): Likewise.
	(mve_vrmlaldavhq_<supf>v4si): Likewise.

gcc/testsuite/ChangeLog:

2019-10-23  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vabdq_f16.c: New test.
	* gcc.target/arm/mve/intrinsics/vabdq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddlvaq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddlvaq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vandq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vandq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot270_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot270_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot90_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot90_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmulq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmulq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmulq_rot180_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmulq_rot180_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmulq_rot270_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmulq_rot270_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmulq_rot90_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmulq_rot90_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vctp16q_m.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vctp32q_m.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vctp64q_m.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vctp8q_m.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtbq_f16_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvttq_f16_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/veorq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/veorq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmaq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmaq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmavq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmavq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmvq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmvq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmaq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmaq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmavq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmavq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmvq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmvq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavxq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavxq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavxq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaldavxq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsldavq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsldavq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsldavxq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsldavxq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovnbq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovnbq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovnbq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovnbq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovntq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovntq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovntq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovntq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmullbq_poly_p16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmullbq_poly_p8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulltq_poly_p16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulltq_poly_p8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vornq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vornq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmullbq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmullbq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmullbq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmullbq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulltq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulltq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulltq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulltq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqmovnbq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqmovnbq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqmovnbq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqmovnbq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqmovntq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqmovntq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqmovntq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqmovntq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqmovunbq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqmovunbq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqmovuntq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqmovuntq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmlaldavhq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmlaldavhq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmlaldavhxq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmlsldavhq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmlsldavhxq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshllbq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshllbq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshllbq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshllbq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlltq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlltq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlltq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlltq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_f32.c: Likewise.

[-- Attachment #2: diff12.patch.gz --]
[-- Type: application/gzip, Size: 18854 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][1/2x]: MVE intrinsics with binary operands.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (9 preceding siblings ...)
  2019-11-14 19:13 ` [PATCH][ARM][GCC][3/2x]: MVE intrinsics with binary operands Srinath Parvathaneni
@ 2019-11-14 19:14 ` Srinath Parvathaneni
  2019-12-19 19:10   ` Kyrill Tkachov
  2019-11-14 19:14 ` [PATCH][ARM][GCC][1/4x]: MVE intrinsics with quaternary operands Srinath Parvathaneni
                   ` (27 subsequent siblings)
  38 siblings, 1 reply; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:14 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 24720 bytes --]

Hello,

This patch supports following MVE ACLE intrinsics with binary operand.

vsubq_n_f16, vsubq_n_f32, vbrsrq_n_f16, vbrsrq_n_f32, vcvtq_n_f16_s16,
vcvtq_n_f32_s32, vcvtq_n_f16_u16, vcvtq_n_f32_u32, vcreateq_f16, vcreateq_f32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

In this patch new constraint "Rd" is added, which checks the constant is with in the range of 1 to 16.
Also a new predicate "mve_imm_16" is added, to check the the matching constraint Rd.

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-10-21  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm-builtins.c (BINOP_NONE_NONE_NONE_QUALIFIERS): Define
	qualifier for binary operands.
	(BINOP_NONE_NONE_IMM_QUALIFIERS): Likewise.
	(BINOP_NONE_UNONE_IMM_QUALIFIERS): Likewise.
	(BINOP_NONE_UNONE_UNONE_QUALIFIERS): Likewise.
	* config/arm/arm_mve.h (vsubq_n_f16): Define macro.
	(vsubq_n_f32): Likewise.
	(vbrsrq_n_f16): Likewise.
	(vbrsrq_n_f32): Likewise.
	(vcvtq_n_f16_s16): Likewise.
	(vcvtq_n_f32_s32): Likewise.
	(vcvtq_n_f16_u16): Likewise.
	(vcvtq_n_f32_u32): Likewise.
	(vcreateq_f16): Likewise.
	(vcreateq_f32): Likewise.
	(__arm_vsubq_n_f16): Define intrinsic.
	(__arm_vsubq_n_f32): Likewise.
	(__arm_vbrsrq_n_f16): Likewise.
	(__arm_vbrsrq_n_f32): Likewise.
	(__arm_vcvtq_n_f16_s16): Likewise.
	(__arm_vcvtq_n_f32_s32): Likewise.
	(__arm_vcvtq_n_f16_u16): Likewise.
	(__arm_vcvtq_n_f32_u32): Likewise.
	(__arm_vcreateq_f16): Likewise.
	(__arm_vcreateq_f32): Likewise.
	(vsubq): Define polymorphic variant.
	(vbrsrq): Likewise.
	(vcvtq_n): Likewise.
	* config/arm/arm_mve_builtins.def (BINOP_NONE_NONE_NONE_QUALIFIERS): Use
	it.
        (BINOP_NONE_NONE_IMM_QUALIFIERS): Likewise.
        (BINOP_NONE_UNONE_IMM_QUALIFIERS): Likewise.
        (BINOP_NONE_UNONE_UNONE_QUALIFIERS): Likewise.
	* config/arm/constraints.md (Rd): Define constraint to check constant is
	in the range of 1 to 16.
	* config/arm/mve.md (mve_vsubq_n_f<mode>): Define RTL pattern.
	mve_vbrsrq_n_f<mode>: Likewise.
	mve_vcvtq_n_to_f_<supf><mode>: Likewise.
	mve_vcreateq_f<mode>: Likewise.
	* config/arm/predicates.md (mve_imm_16): Define predicate to check
	the matching constraint Rd.

gcc/testsuite/ChangeLog:

2019-10-21  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vbrsrq_n_f16.c: New test.
	* gcc.target/arm/mve/intrinsics/vbrsrq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcreateq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcreateq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_n_f16_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_n_f16_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_n_f32_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_n_f32_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_n_f32.c: Likewise.


###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index cd82aa159089c288607e240de02a85dcbb134a14..c2dad057d1365914477c64d559aa1fd1c32bbf19 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -349,6 +349,30 @@ arm_unop_unone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define UNOP_UNONE_IMM_QUALIFIERS \
   (arm_unop_unone_imm_qualifiers)
 
+static enum arm_type_qualifiers
+arm_binop_none_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none };
+#define BINOP_NONE_NONE_NONE_QUALIFIERS \
+  (arm_binop_none_none_none_qualifiers)
+
+static enum arm_type_qualifiers
+arm_binop_none_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_immediate };
+#define BINOP_NONE_NONE_IMM_QUALIFIERS \
+  (arm_binop_none_none_imm_qualifiers)
+
+static enum arm_type_qualifiers
+arm_binop_none_unone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_unsigned, qualifier_immediate };
+#define BINOP_NONE_UNONE_IMM_QUALIFIERS \
+  (arm_binop_none_unone_imm_qualifiers)
+
+static enum arm_type_qualifiers
+arm_binop_none_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_unsigned, qualifier_unsigned };
+#define BINOP_NONE_UNONE_UNONE_QUALIFIERS \
+  (arm_binop_none_unone_unone_qualifiers)
+
 /* End of Qualifier for MVE builtins.  */
 
    /* void ([T element type] *, T, immediate).  */
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index c8d9b6471634725cea9bab3f9fa145810b506938..15b7ada025fe57b682f3873f0f275c43a30d1273 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -197,6 +197,16 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vctp64q(__a) __arm_vctp64q(__a)
 #define vctp8q(__a) __arm_vctp8q(__a)
 #define vpnot(__a) __arm_vpnot(__a)
+#define vsubq_n_f16(__a, __b) __arm_vsubq_n_f16(__a, __b)
+#define vsubq_n_f32(__a, __b) __arm_vsubq_n_f32(__a, __b)
+#define vbrsrq_n_f16(__a, __b) __arm_vbrsrq_n_f16(__a, __b)
+#define vbrsrq_n_f32(__a, __b) __arm_vbrsrq_n_f32(__a, __b)
+#define vcvtq_n_f16_s16(__a,  __imm6) __arm_vcvtq_n_f16_s16(__a,  __imm6)
+#define vcvtq_n_f32_s32(__a,  __imm6) __arm_vcvtq_n_f32_s32(__a,  __imm6)
+#define vcvtq_n_f16_u16(__a,  __imm6) __arm_vcvtq_n_f16_u16(__a,  __imm6)
+#define vcvtq_n_f32_u32(__a,  __imm6) __arm_vcvtq_n_f32_u32(__a,  __imm6)
+#define vcreateq_f16(__a, __b) __arm_vcreateq_f16(__a, __b)
+#define vcreateq_f32(__a, __b) __arm_vcreateq_f32(__a, __b)
 #endif
 
 __extension__ extern __inline void
@@ -1085,6 +1095,76 @@ __arm_vcvtmq_s32_f32 (float32x4_t __a)
   return __builtin_mve_vcvtmq_sv4si (__a);
 }
 
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsubq_n_f16 (float16x8_t __a, float16_t __b)
+{
+  return __builtin_mve_vsubq_n_fv8hf (__a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsubq_n_f32 (float32x4_t __a, float32_t __b)
+{
+  return __builtin_mve_vsubq_n_fv4sf (__a, __b);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vbrsrq_n_f16 (float16x8_t __a, int32_t __b)
+{
+  return __builtin_mve_vbrsrq_n_fv8hf (__a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vbrsrq_n_f32 (float32x4_t __a, int32_t __b)
+{
+  return __builtin_mve_vbrsrq_n_fv4sf (__a, __b);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_n_f16_s16 (int16x8_t __a, const int __imm6)
+{
+  return __builtin_mve_vcvtq_n_to_f_sv8hf (__a, __imm6);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_n_f32_s32 (int32x4_t __a, const int __imm6)
+{
+  return __builtin_mve_vcvtq_n_to_f_sv4sf (__a, __imm6);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_n_f16_u16 (uint16x8_t __a, const int __imm6)
+{
+  return __builtin_mve_vcvtq_n_to_f_uv8hf (__a, __imm6);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_n_f32_u32 (uint32x4_t __a, const int __imm6)
+{
+  return __builtin_mve_vcvtq_n_to_f_uv4sf (__a, __imm6);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcreateq_f16 (uint64_t __a, uint64_t __b)
+{
+  return __builtin_mve_vcreateq_fv8hf (__a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcreateq_f32 (uint64_t __a, uint64_t __b)
+{
+  return __builtin_mve_vcreateq_fv4sf (__a, __b);
+}
+
 #endif
 
 enum {
@@ -1373,6 +1453,27 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t]: __arm_vcvtq_f16_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
   int (*)[__ARM_mve_type_uint32x4_t]: __arm_vcvtq_f32_u32 (__ARM_mve_coerce(__p0, uint32x4_t)));})
 
+#define vsubq(p0,p1) __arm_vsubq(p0,p1)
+#define __arm_vsubq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16_t]: __arm_vsubq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16_t)), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32_t]: __arm_vsubq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32_t)));})
+
+#define vbrsrq(p0,p1) __arm_vbrsrq(p0,p1)
+#define __arm_vbrsrq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vbrsrq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), p1), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vbrsrq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), p1));})
+
+#define vcvtq_n(p0,p1) __arm_vcvtq_n(p0,p1)
+#define __arm_vcvtq_n(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vcvtq_n_f16_s16 (__ARM_mve_coerce(__p0, int16x8_t), p1), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vcvtq_n_f32_s32 (__ARM_mve_coerce(__p0, int32x4_t), p1), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vcvtq_n_f16_u16 (__ARM_mve_coerce(__p0, uint16x8_t), p1), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vcvtq_n_f32_u32 (__ARM_mve_coerce(__p0, uint32x4_t), p1));})
+
 #else /* MVE Interger.  */
 
 #define vst4q(p0,p1) __arm_vst4q(p0,p1)
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index 816b6dfca7fb221275212ca5f06fc6f679860a38..8d1e4fac3d75e87fbe334e64e1073cb1fef0d96d 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -6,7 +6,7 @@
 
     GCC is free software; you can redistribute it and/or modify it
     under the terms of the GNU General Public License as published
-    by the Free Software Foundation; either version 3, or (at your
+    by the Free Software Foundation; either version 3, or  (at your
     option) any later version.
 
     GCC is distributed in the hope that it will be useful, but WITHOUT
@@ -76,3 +76,8 @@ VAR1 (UNOP_UNONE_UNONE, vctp32q, hi)
 VAR1 (UNOP_UNONE_UNONE, vctp64q, hi)
 VAR1 (UNOP_UNONE_UNONE, vctp8q, hi)
 VAR1 (UNOP_UNONE_UNONE, vpnot, hi)
+VAR2 (BINOP_NONE_NONE_NONE, vsubq_n_f, v8hf, v4sf)
+VAR2 (BINOP_NONE_NONE_NONE, vbrsrq_n_f, v8hf, v4sf)
+VAR2 (BINOP_NONE_NONE_IMM, vcvtq_n_to_f_s, v8hf, v4sf)
+VAR2 (BINOP_NONE_UNONE_IMM, vcvtq_n_to_f_u, v8hf, v4sf)
+VAR2 (BINOP_NONE_UNONE_UNONE, vcreateq_f, v8hf, v4sf)
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 4c193d20a48d5e9ed43bdca76468381fc17682da..86cd82334533acbe58be4a0c547cf38737d2bb84 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -35,7 +35,7 @@
 ;; in ARM/Thumb-2 state: Da, Db, Dc, Dd, Dn, DN, Dm, Dl, DL, Do, Dv, Dy, Di,
 ;;			 Dt, Dp, Dz, Tu
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
-;; in Thumb-2 state: Ha, Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py, Pz
+;; in Thumb-2 state: Ha, Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py, Pz, Rd
 ;; in all states: Pf, Pg, UM, U1
 
 ;; The following memory constraints have been used:
@@ -51,6 +51,11 @@
   "MVE EVEN registers @code{r0}, @code{r2}, @code{r4}, @code{r6}, @code{r8},
    @code{r10}, @code{r12}, @code{r14}")
 
+(define_constraint "Rd"
+  "@internal In Thumb-2 state a constant in range 1 to 16"
+  (and (match_code "const_int")
+       (match_test "TARGET_HAVE_MVE && ival >= 1 && ival <= 16")))
+
 (define_register_constraint "t" "TARGET_32BIT ? VFP_LO_REGS : NO_REGS"
  "The VFP registers @code{s0}-@code{s31}.")
 
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index ee2263e04309e8274ec76ae4478afb7cfba59f8f..06701b024dabae85446bb60060a8b331e540cd6d 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -36,7 +36,9 @@
 			 VREV32Q_U VREV32Q_S VMOVLTQ_U VMOVLTQ_S VMOVLBQ_S
 			 VMOVLBQ_U VCVTQ_FROM_F_S VCVTQ_FROM_F_U VCVTPQ_S
 			 VCVTPQ_U VCVTNQ_S VCVTNQ_U VCVTMQ_S VCVTMQ_U
-			 VADDLVQ_U VCTP8Q VCTP16Q VCTP32Q VCTP64Q VPNOT])
+			 VADDLVQ_U VCTP8Q VCTP16Q VCTP32Q VCTP64Q VPNOT
+			 VCREATEQ_F VCVTQ_N_TO_F_S VCVTQ_N_TO_F_U VBRSRQ_N_F
+			 VSUBQ_N_F])
 
 (define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF")
 			    (V8HF "V8HI") (V4SF "V4SI")])
@@ -52,7 +54,8 @@
 		       (VCVTPQ_S "s") (VCVTPQ_U "u") (VCVTNQ_S "s")
 		       (VCVTNQ_U "u") (VCVTMQ_S "s") (VCVTMQ_U "u")
 		       (VCLZQ_U "u") (VCLZQ_S "s") (VREV32Q_U "u")
-		       (VREV32Q_S "s") (VADDLVQ_U "u") (VADDLVQ_S "s")])
+		       (VREV32Q_S "s") (VADDLVQ_U "u") (VADDLVQ_S "s")
+		       (VCVTQ_N_TO_F_S "s") (VCVTQ_N_TO_F_U "u")])
 
 (define_int_attr mode1 [(VCTP8Q "8") (VCTP16Q "16") (VCTP32Q "32")
 			(VCTP64Q "64")])
@@ -75,6 +78,7 @@
 (define_int_iterator VCVTMQ [VCVTMQ_S VCVTMQ_U])
 (define_int_iterator VADDLVQ [VADDLVQ_U VADDLVQ_S])
 (define_int_iterator VCTPQ [VCTP8Q VCTP16Q VCTP32Q VCTP64Q])
+(define_int_iterator VCVTQ_N_TO_F [VCVTQ_N_TO_F_S VCVTQ_N_TO_F_U])
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
@@ -680,3 +684,62 @@
   "vpnot"
   [(set_attr "type" "mve_move")
 ])
+
+;;
+;; [vsubq_n_f])
+;;
+(define_insn "mve_vsubq_n_f<mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
+		       (match_operand:<V_elem> 2 "s_register_operand" "r")]
+	 VSUBQ_N_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vsub.f%#<V_sz_elem>  %q0, %q1, %2"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vbrsrq_n_f])
+;;
+(define_insn "mve_vbrsrq_n_f<mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
+		       (match_operand:SI 2 "s_register_operand" "r")]
+	 VBRSRQ_N_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vbrsr.%#<V_sz_elem>  %q0, %q1, %2"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vcvtq_n_to_f_s, vcvtq_n_to_f_u])
+;;
+(define_insn "mve_vcvtq_n_to_f_<supf><mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:<MVE_CNVT> 1 "s_register_operand" "w")
+		       (match_operand:SI 2 "mve_imm_16" "Rd")]
+	 VCVTQ_N_TO_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vcvt.f%#<V_sz_elem>.<supf>%#<V_sz_elem>\t%q0, %q1, %2"
+  [(set_attr "type" "mve_move")
+])
+
+;; [vcreateq_f])
+;;
+(define_insn "mve_vcreateq_f<mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:DI 1 "s_register_operand" "r")
+		       (match_operand:DI 2 "s_register_operand" "r")]
+	 VCREATEQ_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vmov %q0[2], %q0[0], %Q2, %Q1\;vmov %q0[3], %q0[1], %R2, %R1"
+  [(set_attr "type" "mve_move")
+   (set_attr "length""8")])
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 9d74165fe065b03c77918fe9e4611967799535f1..80a44bb32cbce1ee3f850a44b70a0e4ceed548be 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -31,6 +31,10 @@
 	      || REGNO_REG_CLASS (REGNO (op)) != NO_REGS));
 })
 
+;; True for immediates in the range of 1 to 16 for MVE.
+(define_predicate "mve_imm_16"
+  (match_test "satisfies_constraint_Rd (op)"))
+
 ; Predicate for stack protector guard's address in
 ; stack_protect_combined_set_insn and stack_protect_combined_test_insn patterns
 (define_predicate "guard_addr_operand"
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vbrsrq_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vbrsrq_n_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..90f52b50e6bd0dbccc40e9da7e2f8badb593727d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vbrsrq_n_f16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16x8_t a, int32_t b)
+{
+  return vbrsrq_n_f16 (a, b);
+}
+
+/* { dg-final { scan-assembler "vbrsr.16"  }  } */
+
+float16x8_t
+foo1 (float16x8_t a, int32_t b)
+{
+  return vbrsrq (a, b);
+}
+
+/* { dg-final { scan-assembler "vbrsr.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vbrsrq_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vbrsrq_n_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..4ccd8dd313319ce1933e85f76955d1ea12952abd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vbrsrq_n_f32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32x4_t a, int32_t b)
+{
+  return vbrsrq_n_f32 (a, b);
+}
+
+/* { dg-final { scan-assembler "vbrsr.32"  }  } */
+
+float32x4_t
+foo1 (float32x4_t a, int32_t b)
+{
+  return vbrsrq (a, b);
+}
+
+/* { dg-final { scan-assembler "vbrsr.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..a83203ad0de1e6bfc6b4b9d469748108a335a391
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (uint64_t a, uint64_t b)
+{
+  return vcreateq_f16 (a, b);
+}
+
+/* { dg-final { scan-assembler "vmov"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..279e922ed45165d0b104b236e4a2c8f782c4a59d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (uint64_t a, uint64_t b)
+{
+  return vcreateq_f32 (a, b);
+}
+
+/* { dg-final { scan-assembler "vmov"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f16_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f16_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..440d4df99b295deb66b91e3996cbca8f35f7d12b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f16_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (int16x8_t a)
+{
+  return vcvtq_n_f16_s16 (a, 1);
+}
+
+/* { dg-final { scan-assembler "vcvt.f16.s16"  }  } */
+
+float16x8_t
+foo1 (int16x8_t a)
+{
+  return vcvtq_n (a, 1);
+}
+
+/* { dg-final { scan-assembler "vcvt.f16.s16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f16_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f16_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..4d648d653ef91436bb87d2e07e15ccee0844309e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f16_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (uint16x8_t a)
+{
+  return vcvtq_n_f16_u16 (a, 1);
+}
+
+/* { dg-final { scan-assembler "vcvt.f16.u16"  }  } */
+
+float16x8_t
+foo1 (uint16x8_t a)
+{
+  return vcvtq_n (a, 1);
+}
+
+/* { dg-final { scan-assembler "vcvt.f16.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f32_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f32_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..836727a9b2d6a2621b34de5b7d4643ee1bebf47c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f32_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (int32x4_t a)
+{
+  return vcvtq_n_f32_s32 (a, 1);
+}
+
+/* { dg-final { scan-assembler "vcvt.f32.s32"  }  } */
+
+float32x4_t
+foo1 (int32x4_t a)
+{
+  return vcvtq_n (a, 1);
+}
+
+/* { dg-final { scan-assembler "vcvt.f32.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f32_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f32_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..fcd66a42fd49c78e1c11ac89393dc38cb06efbf5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f32_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (uint32x4_t a)
+{
+  return vcvtq_n_f32_u32 (a, 1);
+}
+
+/* { dg-final { scan-assembler "vcvt.f32.u32"  }  } */
+
+float32x4_t
+foo1 (uint32x4_t a)
+{
+  return vcvtq_n (a, 1);
+}
+
+/* { dg-final { scan-assembler "vcvt.f32.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..e6ff11edf3933de7acd7bdffd33d8b79c88a7f48
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16x8_t a, float16_t b)
+{
+  return vsubq_n_f16 (a, b);
+}
+
+/* { dg-final { scan-assembler "vsub.f16"  }  } */
+
+float16x8_t
+foo1 (float16x8_t a, float16_t b)
+{
+  return vsubq (a, b);
+}
+
+/* { dg-final { scan-assembler "vsub.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..1092fd0aa6abc2a0ed0320a17d6b563f0396fb58
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32x4_t a, float32_t b)
+{
+  return vsubq_n_f32 (a, b);
+}
+
+/* { dg-final { scan-assembler "vsub.f32"  }  } */
+
+float32x4_t
+foo1 (float32x4_t a, float32_t b)
+{
+  return vsubq (a, b);
+}
+
+/* { dg-final { scan-assembler "vsub.f32"  }  } */


[-- Attachment #2: diff08.patch --]
[-- Type: text/plain, Size: 20723 bytes --]

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index cd82aa159089c288607e240de02a85dcbb134a14..c2dad057d1365914477c64d559aa1fd1c32bbf19 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -349,6 +349,30 @@ arm_unop_unone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define UNOP_UNONE_IMM_QUALIFIERS \
   (arm_unop_unone_imm_qualifiers)
 
+static enum arm_type_qualifiers
+arm_binop_none_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none };
+#define BINOP_NONE_NONE_NONE_QUALIFIERS \
+  (arm_binop_none_none_none_qualifiers)
+
+static enum arm_type_qualifiers
+arm_binop_none_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_immediate };
+#define BINOP_NONE_NONE_IMM_QUALIFIERS \
+  (arm_binop_none_none_imm_qualifiers)
+
+static enum arm_type_qualifiers
+arm_binop_none_unone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_unsigned, qualifier_immediate };
+#define BINOP_NONE_UNONE_IMM_QUALIFIERS \
+  (arm_binop_none_unone_imm_qualifiers)
+
+static enum arm_type_qualifiers
+arm_binop_none_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_unsigned, qualifier_unsigned };
+#define BINOP_NONE_UNONE_UNONE_QUALIFIERS \
+  (arm_binop_none_unone_unone_qualifiers)
+
 /* End of Qualifier for MVE builtins.  */
 
    /* void ([T element type] *, T, immediate).  */
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index c8d9b6471634725cea9bab3f9fa145810b506938..15b7ada025fe57b682f3873f0f275c43a30d1273 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -197,6 +197,16 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vctp64q(__a) __arm_vctp64q(__a)
 #define vctp8q(__a) __arm_vctp8q(__a)
 #define vpnot(__a) __arm_vpnot(__a)
+#define vsubq_n_f16(__a, __b) __arm_vsubq_n_f16(__a, __b)
+#define vsubq_n_f32(__a, __b) __arm_vsubq_n_f32(__a, __b)
+#define vbrsrq_n_f16(__a, __b) __arm_vbrsrq_n_f16(__a, __b)
+#define vbrsrq_n_f32(__a, __b) __arm_vbrsrq_n_f32(__a, __b)
+#define vcvtq_n_f16_s16(__a,  __imm6) __arm_vcvtq_n_f16_s16(__a,  __imm6)
+#define vcvtq_n_f32_s32(__a,  __imm6) __arm_vcvtq_n_f32_s32(__a,  __imm6)
+#define vcvtq_n_f16_u16(__a,  __imm6) __arm_vcvtq_n_f16_u16(__a,  __imm6)
+#define vcvtq_n_f32_u32(__a,  __imm6) __arm_vcvtq_n_f32_u32(__a,  __imm6)
+#define vcreateq_f16(__a, __b) __arm_vcreateq_f16(__a, __b)
+#define vcreateq_f32(__a, __b) __arm_vcreateq_f32(__a, __b)
 #endif
 
 __extension__ extern __inline void
@@ -1085,6 +1095,76 @@ __arm_vcvtmq_s32_f32 (float32x4_t __a)
   return __builtin_mve_vcvtmq_sv4si (__a);
 }
 
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsubq_n_f16 (float16x8_t __a, float16_t __b)
+{
+  return __builtin_mve_vsubq_n_fv8hf (__a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsubq_n_f32 (float32x4_t __a, float32_t __b)
+{
+  return __builtin_mve_vsubq_n_fv4sf (__a, __b);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vbrsrq_n_f16 (float16x8_t __a, int32_t __b)
+{
+  return __builtin_mve_vbrsrq_n_fv8hf (__a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vbrsrq_n_f32 (float32x4_t __a, int32_t __b)
+{
+  return __builtin_mve_vbrsrq_n_fv4sf (__a, __b);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_n_f16_s16 (int16x8_t __a, const int __imm6)
+{
+  return __builtin_mve_vcvtq_n_to_f_sv8hf (__a, __imm6);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_n_f32_s32 (int32x4_t __a, const int __imm6)
+{
+  return __builtin_mve_vcvtq_n_to_f_sv4sf (__a, __imm6);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_n_f16_u16 (uint16x8_t __a, const int __imm6)
+{
+  return __builtin_mve_vcvtq_n_to_f_uv8hf (__a, __imm6);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_n_f32_u32 (uint32x4_t __a, const int __imm6)
+{
+  return __builtin_mve_vcvtq_n_to_f_uv4sf (__a, __imm6);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcreateq_f16 (uint64_t __a, uint64_t __b)
+{
+  return __builtin_mve_vcreateq_fv8hf (__a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcreateq_f32 (uint64_t __a, uint64_t __b)
+{
+  return __builtin_mve_vcreateq_fv4sf (__a, __b);
+}
+
 #endif
 
 enum {
@@ -1373,6 +1453,27 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t]: __arm_vcvtq_f16_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
   int (*)[__ARM_mve_type_uint32x4_t]: __arm_vcvtq_f32_u32 (__ARM_mve_coerce(__p0, uint32x4_t)));})
 
+#define vsubq(p0,p1) __arm_vsubq(p0,p1)
+#define __arm_vsubq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16_t]: __arm_vsubq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16_t)), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32_t]: __arm_vsubq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32_t)));})
+
+#define vbrsrq(p0,p1) __arm_vbrsrq(p0,p1)
+#define __arm_vbrsrq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vbrsrq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), p1), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vbrsrq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), p1));})
+
+#define vcvtq_n(p0,p1) __arm_vcvtq_n(p0,p1)
+#define __arm_vcvtq_n(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vcvtq_n_f16_s16 (__ARM_mve_coerce(__p0, int16x8_t), p1), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vcvtq_n_f32_s32 (__ARM_mve_coerce(__p0, int32x4_t), p1), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vcvtq_n_f16_u16 (__ARM_mve_coerce(__p0, uint16x8_t), p1), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vcvtq_n_f32_u32 (__ARM_mve_coerce(__p0, uint32x4_t), p1));})
+
 #else /* MVE Interger.  */
 
 #define vst4q(p0,p1) __arm_vst4q(p0,p1)
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index 816b6dfca7fb221275212ca5f06fc6f679860a38..8d1e4fac3d75e87fbe334e64e1073cb1fef0d96d 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -6,7 +6,7 @@
 
     GCC is free software; you can redistribute it and/or modify it
     under the terms of the GNU General Public License as published
-    by the Free Software Foundation; either version 3, or (at your
+    by the Free Software Foundation; either version 3, or  (at your
     option) any later version.
 
     GCC is distributed in the hope that it will be useful, but WITHOUT
@@ -76,3 +76,8 @@ VAR1 (UNOP_UNONE_UNONE, vctp32q, hi)
 VAR1 (UNOP_UNONE_UNONE, vctp64q, hi)
 VAR1 (UNOP_UNONE_UNONE, vctp8q, hi)
 VAR1 (UNOP_UNONE_UNONE, vpnot, hi)
+VAR2 (BINOP_NONE_NONE_NONE, vsubq_n_f, v8hf, v4sf)
+VAR2 (BINOP_NONE_NONE_NONE, vbrsrq_n_f, v8hf, v4sf)
+VAR2 (BINOP_NONE_NONE_IMM, vcvtq_n_to_f_s, v8hf, v4sf)
+VAR2 (BINOP_NONE_UNONE_IMM, vcvtq_n_to_f_u, v8hf, v4sf)
+VAR2 (BINOP_NONE_UNONE_UNONE, vcreateq_f, v8hf, v4sf)
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 4c193d20a48d5e9ed43bdca76468381fc17682da..86cd82334533acbe58be4a0c547cf38737d2bb84 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -35,7 +35,7 @@
 ;; in ARM/Thumb-2 state: Da, Db, Dc, Dd, Dn, DN, Dm, Dl, DL, Do, Dv, Dy, Di,
 ;;			 Dt, Dp, Dz, Tu
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
-;; in Thumb-2 state: Ha, Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py, Pz
+;; in Thumb-2 state: Ha, Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py, Pz, Rd
 ;; in all states: Pf, Pg, UM, U1
 
 ;; The following memory constraints have been used:
@@ -51,6 +51,11 @@
   "MVE EVEN registers @code{r0}, @code{r2}, @code{r4}, @code{r6}, @code{r8},
    @code{r10}, @code{r12}, @code{r14}")
 
+(define_constraint "Rd"
+  "@internal In Thumb-2 state a constant in range 1 to 16"
+  (and (match_code "const_int")
+       (match_test "TARGET_HAVE_MVE && ival >= 1 && ival <= 16")))
+
 (define_register_constraint "t" "TARGET_32BIT ? VFP_LO_REGS : NO_REGS"
  "The VFP registers @code{s0}-@code{s31}.")
 
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index ee2263e04309e8274ec76ae4478afb7cfba59f8f..06701b024dabae85446bb60060a8b331e540cd6d 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -36,7 +36,9 @@
 			 VREV32Q_U VREV32Q_S VMOVLTQ_U VMOVLTQ_S VMOVLBQ_S
 			 VMOVLBQ_U VCVTQ_FROM_F_S VCVTQ_FROM_F_U VCVTPQ_S
 			 VCVTPQ_U VCVTNQ_S VCVTNQ_U VCVTMQ_S VCVTMQ_U
-			 VADDLVQ_U VCTP8Q VCTP16Q VCTP32Q VCTP64Q VPNOT])
+			 VADDLVQ_U VCTP8Q VCTP16Q VCTP32Q VCTP64Q VPNOT
+			 VCREATEQ_F VCVTQ_N_TO_F_S VCVTQ_N_TO_F_U VBRSRQ_N_F
+			 VSUBQ_N_F])
 
 (define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF")
 			    (V8HF "V8HI") (V4SF "V4SI")])
@@ -52,7 +54,8 @@
 		       (VCVTPQ_S "s") (VCVTPQ_U "u") (VCVTNQ_S "s")
 		       (VCVTNQ_U "u") (VCVTMQ_S "s") (VCVTMQ_U "u")
 		       (VCLZQ_U "u") (VCLZQ_S "s") (VREV32Q_U "u")
-		       (VREV32Q_S "s") (VADDLVQ_U "u") (VADDLVQ_S "s")])
+		       (VREV32Q_S "s") (VADDLVQ_U "u") (VADDLVQ_S "s")
+		       (VCVTQ_N_TO_F_S "s") (VCVTQ_N_TO_F_U "u")])
 
 (define_int_attr mode1 [(VCTP8Q "8") (VCTP16Q "16") (VCTP32Q "32")
 			(VCTP64Q "64")])
@@ -75,6 +78,7 @@
 (define_int_iterator VCVTMQ [VCVTMQ_S VCVTMQ_U])
 (define_int_iterator VADDLVQ [VADDLVQ_U VADDLVQ_S])
 (define_int_iterator VCTPQ [VCTP8Q VCTP16Q VCTP32Q VCTP64Q])
+(define_int_iterator VCVTQ_N_TO_F [VCVTQ_N_TO_F_S VCVTQ_N_TO_F_U])
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
@@ -680,3 +684,62 @@
   "vpnot"
   [(set_attr "type" "mve_move")
 ])
+
+;;
+;; [vsubq_n_f])
+;;
+(define_insn "mve_vsubq_n_f<mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
+		       (match_operand:<V_elem> 2 "s_register_operand" "r")]
+	 VSUBQ_N_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vsub.f%#<V_sz_elem>  %q0, %q1, %2"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vbrsrq_n_f])
+;;
+(define_insn "mve_vbrsrq_n_f<mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
+		       (match_operand:SI 2 "s_register_operand" "r")]
+	 VBRSRQ_N_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vbrsr.%#<V_sz_elem>  %q0, %q1, %2"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vcvtq_n_to_f_s, vcvtq_n_to_f_u])
+;;
+(define_insn "mve_vcvtq_n_to_f_<supf><mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:<MVE_CNVT> 1 "s_register_operand" "w")
+		       (match_operand:SI 2 "mve_imm_16" "Rd")]
+	 VCVTQ_N_TO_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vcvt.f%#<V_sz_elem>.<supf>%#<V_sz_elem>\t%q0, %q1, %2"
+  [(set_attr "type" "mve_move")
+])
+
+;; [vcreateq_f])
+;;
+(define_insn "mve_vcreateq_f<mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:DI 1 "s_register_operand" "r")
+		       (match_operand:DI 2 "s_register_operand" "r")]
+	 VCREATEQ_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vmov %q0[2], %q0[0], %Q2, %Q1\;vmov %q0[3], %q0[1], %R2, %R1"
+  [(set_attr "type" "mve_move")
+   (set_attr "length""8")])
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 9d74165fe065b03c77918fe9e4611967799535f1..80a44bb32cbce1ee3f850a44b70a0e4ceed548be 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -31,6 +31,10 @@
 	      || REGNO_REG_CLASS (REGNO (op)) != NO_REGS));
 })
 
+;; True for immediates in the range of 1 to 16 for MVE.
+(define_predicate "mve_imm_16"
+  (match_test "satisfies_constraint_Rd (op)"))
+
 ; Predicate for stack protector guard's address in
 ; stack_protect_combined_set_insn and stack_protect_combined_test_insn patterns
 (define_predicate "guard_addr_operand"
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vbrsrq_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vbrsrq_n_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..90f52b50e6bd0dbccc40e9da7e2f8badb593727d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vbrsrq_n_f16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16x8_t a, int32_t b)
+{
+  return vbrsrq_n_f16 (a, b);
+}
+
+/* { dg-final { scan-assembler "vbrsr.16"  }  } */
+
+float16x8_t
+foo1 (float16x8_t a, int32_t b)
+{
+  return vbrsrq (a, b);
+}
+
+/* { dg-final { scan-assembler "vbrsr.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vbrsrq_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vbrsrq_n_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..4ccd8dd313319ce1933e85f76955d1ea12952abd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vbrsrq_n_f32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32x4_t a, int32_t b)
+{
+  return vbrsrq_n_f32 (a, b);
+}
+
+/* { dg-final { scan-assembler "vbrsr.32"  }  } */
+
+float32x4_t
+foo1 (float32x4_t a, int32_t b)
+{
+  return vbrsrq (a, b);
+}
+
+/* { dg-final { scan-assembler "vbrsr.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..a83203ad0de1e6bfc6b4b9d469748108a335a391
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (uint64_t a, uint64_t b)
+{
+  return vcreateq_f16 (a, b);
+}
+
+/* { dg-final { scan-assembler "vmov"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..279e922ed45165d0b104b236e4a2c8f782c4a59d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (uint64_t a, uint64_t b)
+{
+  return vcreateq_f32 (a, b);
+}
+
+/* { dg-final { scan-assembler "vmov"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f16_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f16_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..440d4df99b295deb66b91e3996cbca8f35f7d12b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f16_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (int16x8_t a)
+{
+  return vcvtq_n_f16_s16 (a, 1);
+}
+
+/* { dg-final { scan-assembler "vcvt.f16.s16"  }  } */
+
+float16x8_t
+foo1 (int16x8_t a)
+{
+  return vcvtq_n (a, 1);
+}
+
+/* { dg-final { scan-assembler "vcvt.f16.s16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f16_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f16_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..4d648d653ef91436bb87d2e07e15ccee0844309e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f16_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (uint16x8_t a)
+{
+  return vcvtq_n_f16_u16 (a, 1);
+}
+
+/* { dg-final { scan-assembler "vcvt.f16.u16"  }  } */
+
+float16x8_t
+foo1 (uint16x8_t a)
+{
+  return vcvtq_n (a, 1);
+}
+
+/* { dg-final { scan-assembler "vcvt.f16.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f32_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f32_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..836727a9b2d6a2621b34de5b7d4643ee1bebf47c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f32_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (int32x4_t a)
+{
+  return vcvtq_n_f32_s32 (a, 1);
+}
+
+/* { dg-final { scan-assembler "vcvt.f32.s32"  }  } */
+
+float32x4_t
+foo1 (int32x4_t a)
+{
+  return vcvtq_n (a, 1);
+}
+
+/* { dg-final { scan-assembler "vcvt.f32.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f32_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f32_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..fcd66a42fd49c78e1c11ac89393dc38cb06efbf5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f32_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (uint32x4_t a)
+{
+  return vcvtq_n_f32_u32 (a, 1);
+}
+
+/* { dg-final { scan-assembler "vcvt.f32.u32"  }  } */
+
+float32x4_t
+foo1 (uint32x4_t a)
+{
+  return vcvtq_n (a, 1);
+}
+
+/* { dg-final { scan-assembler "vcvt.f32.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..e6ff11edf3933de7acd7bdffd33d8b79c88a7f48
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16x8_t a, float16_t b)
+{
+  return vsubq_n_f16 (a, b);
+}
+
+/* { dg-final { scan-assembler "vsub.f16"  }  } */
+
+float16x8_t
+foo1 (float16x8_t a, float16_t b)
+{
+  return vsubq (a, b);
+}
+
+/* { dg-final { scan-assembler "vsub.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..1092fd0aa6abc2a0ed0320a17d6b563f0396fb58
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32x4_t a, float32_t b)
+{
+  return vsubq_n_f32 (a, b);
+}
+
+/* { dg-final { scan-assembler "vsub.f32"  }  } */
+
+float32x4_t
+foo1 (float32x4_t a, float32_t b)
+{
+  return vsubq (a, b);
+}
+
+/* { dg-final { scan-assembler "vsub.f32"  }  } */


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][1/4x]: MVE intrinsics with quaternary operands.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (10 preceding siblings ...)
  2019-11-14 19:14 ` [PATCH][ARM][GCC][1/2x]: " Srinath Parvathaneni
@ 2019-11-14 19:14 ` Srinath Parvathaneni
  2019-11-14 19:15 ` [PATCH][ARM][GCC][4/2x]: MVE intrinsics with binary operands Srinath Parvathaneni
                   ` (26 subsequent siblings)
  38 siblings, 0 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:14 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 84299 bytes --]

Hello,

This patch supports following MVE ACLE intrinsics with quaternary operands.

vsriq_m_n_s8, vsubq_m_s8, vsubq_x_s8, vcvtq_m_n_f16_u16, vcvtq_x_n_f16_u16,
vqshluq_m_n_s8, vabavq_p_s8, vsriq_m_n_u8, vshlq_m_u8, vshlq_x_u8, vsubq_m_u8,
vsubq_x_u8, vabavq_p_u8, vshlq_m_s8, vshlq_x_s8, vcvtq_m_n_f16_s16,
vcvtq_x_n_f16_s16, vsriq_m_n_s16, vsubq_m_s16, vsubq_x_s16, vcvtq_m_n_f32_u32,
vcvtq_x_n_f32_u32, vqshluq_m_n_s16, vabavq_p_s16, vsriq_m_n_u16,
vshlq_m_u16, vshlq_x_u16, vsubq_m_u16, vsubq_x_u16, vabavq_p_u16, vshlq_m_s16,
vshlq_x_s16, vcvtq_m_n_f32_s32, vcvtq_x_n_f32_s32, vsriq_m_n_s32, vsubq_m_s32,
vsubq_x_s32, vqshluq_m_n_s32, vabavq_p_s32, vsriq_m_n_u32, vshlq_m_u32,
vshlq_x_u32, vsubq_m_u32, vsubq_x_u32, vabavq_p_u32, vshlq_m_s32, vshlq_x_s32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-10-29  Andre Vieira  <andre.simoesdiasvieira@arm.com>
            Mihail Ionescu  <mihail.ionescu@arm.com>
            Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm-builtins.c (QUADOP_UNONE_UNONE_NONE_NONE_UNONE_QUALIFIERS):
	Define builtin qualifier.
	(QUADOP_NONE_NONE_NONE_NONE_UNONE_QUALIFIERS): Likewise.
	(QUADOP_NONE_NONE_NONE_IMM_UNONE_QUALIFIERS): Likewise.
	(QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE_QUALIFIERS): Likewise.
	(QUADOP_UNONE_UNONE_NONE_IMM_UNONE_QUALIFIERS): Likewise.
	(QUADOP_NONE_NONE_UNONE_IMM_UNONE_QUALIFIERS): Likewise.
	(QUADOP_UNONE_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Likewise.
	(QUADOP_UNONE_UNONE_UNONE_NONE_UNONE_QUALIFIERS): Likewise.
	* config/arm/arm_mve.h (vsriq_m_n_s8): Define macro.
	(vsubq_m_s8): Likewise.
	(vcvtq_m_n_f16_u16): Likewise.
	(vqshluq_m_n_s8): Likewise.
	(vabavq_p_s8): Likewise.
	(vsriq_m_n_u8): Likewise.
	(vshlq_m_u8): Likewise.
	(vsubq_m_u8): Likewise.
	(vabavq_p_u8): Likewise.
	(vshlq_m_s8): Likewise.
	(vcvtq_m_n_f16_s16): Likewise.
	(vsriq_m_n_s16): Likewise.
	(vsubq_m_s16): Likewise.
	(vcvtq_m_n_f32_u32): Likewise.
	(vqshluq_m_n_s16): Likewise.
	(vabavq_p_s16): Likewise.
	(vsriq_m_n_u16): Likewise.
	(vshlq_m_u16): Likewise.
	(vsubq_m_u16): Likewise.
	(vabavq_p_u16): Likewise.
	(vshlq_m_s16): Likewise.
	(vcvtq_m_n_f32_s32): Likewise.
	(vsriq_m_n_s32): Likewise.
	(vsubq_m_s32): Likewise.
	(vqshluq_m_n_s32): Likewise.
	(vabavq_p_s32): Likewise.
	(vsriq_m_n_u32): Likewise.
	(vshlq_m_u32): Likewise.
	(vsubq_m_u32): Likewise.
	(vabavq_p_u32): Likewise.
	(vshlq_m_s32): Likewise.
	(__arm_vsriq_m_n_s8): Define intrinsic.
	(__arm_vsubq_m_s8): Likewise.
	(__arm_vqshluq_m_n_s8): Likewise.
	(__arm_vabavq_p_s8): Likewise.
	(__arm_vsriq_m_n_u8): Likewise.
	(__arm_vshlq_m_u8): Likewise.
	(__arm_vsubq_m_u8): Likewise.
	(__arm_vabavq_p_u8): Likewise.
	(__arm_vshlq_m_s8): Likewise.
	(__arm_vsriq_m_n_s16): Likewise.
	(__arm_vsubq_m_s16): Likewise.
	(__arm_vqshluq_m_n_s16): Likewise.
	(__arm_vabavq_p_s16): Likewise.
	(__arm_vsriq_m_n_u16): Likewise.
	(__arm_vshlq_m_u16): Likewise.
	(__arm_vsubq_m_u16): Likewise.
	(__arm_vabavq_p_u16): Likewise.
	(__arm_vshlq_m_s16): Likewise.
	(__arm_vsriq_m_n_s32): Likewise.
	(__arm_vsubq_m_s32): Likewise.
	(__arm_vqshluq_m_n_s32): Likewise.
	(__arm_vabavq_p_s32): Likewise.
	(__arm_vsriq_m_n_u32): Likewise.
	(__arm_vshlq_m_u32): Likewise.
	(__arm_vsubq_m_u32): Likewise.
	(__arm_vabavq_p_u32): Likewise.
	(__arm_vshlq_m_s32): Likewise.
	(__arm_vcvtq_m_n_f16_u16): Likewise.
	(__arm_vcvtq_m_n_f16_s16): Likewise.
	(__arm_vcvtq_m_n_f32_u32): Likewise.
	(__arm_vcvtq_m_n_f32_s32): Likewise.
	(vcvtq_m_n): Define polymorphic variant.
	(vqshluq_m_n): Likewise.
	(vshlq_m): Likewise.
	(vsriq_m_n): Likewise.
	(vsubq_m): Likewise.
	(vabavq_p): Likewise.
	* config/arm/arm_mve_builtins.def
	(QUADOP_UNONE_UNONE_NONE_NONE_UNONE_QUALIFIERS): Use builtin qualifier.
	(QUADOP_NONE_NONE_NONE_NONE_UNONE_QUALIFIERS): Likewise.
	(QUADOP_NONE_NONE_NONE_IMM_UNONE_QUALIFIERS): Likewise.
	(QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE_QUALIFIERS): Likewise.
	(QUADOP_UNONE_UNONE_NONE_IMM_UNONE_QUALIFIERS): Likewise.
	(QUADOP_NONE_NONE_UNONE_IMM_UNONE_QUALIFIERS): Likewise.
	(QUADOP_UNONE_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Likewise.
	(QUADOP_UNONE_UNONE_UNONE_NONE_UNONE_QUALIFIERS): Likewise.
	* config/arm/mve.md (VABAVQ_P): Define iterator.
	(VSHLQ_M): Likewise.
	(VSRIQ_M_N): Likewise.
	(VSUBQ_M): Likewise.
	(VCVTQ_M_N_TO_F): Likewise.
	(mve_vabavq_p_<supf><mode>): Define RTL pattern.
	(mve_vqshluq_m_n_s<mode>): Likewise.
	(mve_vshlq_m_<supf><mode>): Likewise.
	(mve_vsriq_m_n_<supf><mode>): Likewise.
	(mve_vsubq_m_<supf><mode>): Likewise.
	(mve_vcvtq_m_n_to_f_<supf><mode>): Likewise.

gcc/testsuite/ChangeLog:

2019-10-29  Andre Vieira  <andre.simoesdiasvieira@arm.com>
            Mihail Ionescu  <mihail.ionescu@arm.com>
            Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vabavq_p_s16.c: New test.
	* gcc.target/arm/mve/intrinsics/vabavq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabavq_p_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabavq_p_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabavq_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabavq_p_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_m_n_f16_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_m_n_f16_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_m_n_f32_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_m_n_f32_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshluq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshluq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshluq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsriq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsriq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsriq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsriq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsriq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsriq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_u8.c: Likewise.


###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index b0d2a19684dc798e1ae1a4c0496c6c4fddac891d..6dffb36fe179357c62cb4f35d486513971e3487d 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -499,6 +499,62 @@ arm_ternop_none_none_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define TERNOP_NONE_NONE_NONE_NONE_QUALIFIERS \
   (arm_ternop_none_none_none_none_qualifiers)
 
+static enum arm_type_qualifiers
+arm_quadop_unone_unone_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_none, qualifier_none,
+    qualifier_unsigned };
+#define QUADOP_UNONE_UNONE_NONE_NONE_UNONE_QUALIFIERS \
+  (arm_quadop_unone_unone_none_none_unone_qualifiers)
+
+static enum arm_type_qualifiers
+arm_quadop_none_none_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none, qualifier_none,
+    qualifier_unsigned };
+#define QUADOP_NONE_NONE_NONE_NONE_UNONE_QUALIFIERS \
+  (arm_quadop_none_none_none_none_unone_qualifiers)
+
+static enum arm_type_qualifiers
+arm_quadop_none_none_none_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none, qualifier_immediate,
+    qualifier_unsigned };
+#define QUADOP_NONE_NONE_NONE_IMM_UNONE_QUALIFIERS \
+  (arm_quadop_none_none_none_imm_unone_qualifiers)
+
+static enum arm_type_qualifiers
+arm_quadop_unone_unone_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
+    qualifier_unsigned, qualifier_unsigned };
+#define QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE_QUALIFIERS \
+  (arm_quadop_unone_unone_unone_unone_unone_qualifiers)
+
+static enum arm_type_qualifiers
+arm_quadop_unone_unone_none_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_none,
+    qualifier_immediate, qualifier_unsigned };
+#define QUADOP_UNONE_UNONE_NONE_IMM_UNONE_QUALIFIERS \
+  (arm_quadop_unone_unone_none_imm_unone_qualifiers)
+
+static enum arm_type_qualifiers
+arm_quadop_none_none_unone_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_unsigned, qualifier_immediate,
+    qualifier_unsigned };
+#define QUADOP_NONE_NONE_UNONE_IMM_UNONE_QUALIFIERS \
+  (arm_quadop_none_none_unone_imm_unone_qualifiers)
+
+static enum arm_type_qualifiers
+arm_quadop_unone_unone_unone_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
+    qualifier_immediate, qualifier_unsigned };
+#define QUADOP_UNONE_UNONE_UNONE_IMM_UNONE_QUALIFIERS \
+  (arm_quadop_unone_unone_unone_imm_unone_qualifiers)
+
+static enum arm_type_qualifiers
+arm_quadop_unone_unone_unone_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
+    qualifier_none, qualifier_unsigned };
+#define QUADOP_UNONE_UNONE_UNONE_NONE_UNONE_QUALIFIERS \
+  (arm_quadop_unone_unone_unone_none_unone_qualifiers)
+
 /* End of Qualifier for MVE builtins.  */
 
    /* void ([T element type] *, T, immediate).  */
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index dcb5682ef5c82585889b88ce08c14ddb41e84a3e..8a282966750556f58298a4a7c3a872e2cc7bee51 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -1232,6 +1232,37 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vqmovnbq_m_u32(__a, __b, __p) __arm_vqmovnbq_m_u32(__a, __b, __p)
 #define vqmovntq_m_u32(__a, __b, __p) __arm_vqmovntq_m_u32(__a, __b, __p)
 #define vrev32q_m_u16(__inactive, __a, __p) __arm_vrev32q_m_u16(__inactive, __a, __p)
+#define vsriq_m_n_s8(__a, __b,  __imm, __p) __arm_vsriq_m_n_s8(__a, __b,  __imm, __p)
+#define vsubq_m_s8(__inactive, __a, __b, __p) __arm_vsubq_m_s8(__inactive, __a, __b, __p)
+#define vcvtq_m_n_f16_u16(__inactive, __a,  __imm6, __p) __arm_vcvtq_m_n_f16_u16(__inactive, __a,  __imm6, __p)
+#define vqshluq_m_n_s8(__inactive, __a,  __imm, __p) __arm_vqshluq_m_n_s8(__inactive, __a,  __imm, __p)
+#define vabavq_p_s8(__a, __b, __c, __p) __arm_vabavq_p_s8(__a, __b, __c, __p)
+#define vsriq_m_n_u8(__a, __b,  __imm, __p) __arm_vsriq_m_n_u8(__a, __b,  __imm, __p)
+#define vshlq_m_u8(__inactive, __a, __b, __p) __arm_vshlq_m_u8(__inactive, __a, __b, __p)
+#define vsubq_m_u8(__inactive, __a, __b, __p) __arm_vsubq_m_u8(__inactive, __a, __b, __p)
+#define vabavq_p_u8(__a, __b, __c, __p) __arm_vabavq_p_u8(__a, __b, __c, __p)
+#define vshlq_m_s8(__inactive, __a, __b, __p) __arm_vshlq_m_s8(__inactive, __a, __b, __p)
+#define vcvtq_m_n_f16_s16(__inactive, __a,  __imm6, __p) __arm_vcvtq_m_n_f16_s16(__inactive, __a,  __imm6, __p)
+#define vsriq_m_n_s16(__a, __b,  __imm, __p) __arm_vsriq_m_n_s16(__a, __b,  __imm, __p)
+#define vsubq_m_s16(__inactive, __a, __b, __p) __arm_vsubq_m_s16(__inactive, __a, __b, __p)
+#define vcvtq_m_n_f32_u32(__inactive, __a,  __imm6, __p) __arm_vcvtq_m_n_f32_u32(__inactive, __a,  __imm6, __p)
+#define vqshluq_m_n_s16(__inactive, __a,  __imm, __p) __arm_vqshluq_m_n_s16(__inactive, __a,  __imm, __p)
+#define vabavq_p_s16(__a, __b, __c, __p) __arm_vabavq_p_s16(__a, __b, __c, __p)
+#define vsriq_m_n_u16(__a, __b,  __imm, __p) __arm_vsriq_m_n_u16(__a, __b,  __imm, __p)
+#define vshlq_m_u16(__inactive, __a, __b, __p) __arm_vshlq_m_u16(__inactive, __a, __b, __p)
+#define vsubq_m_u16(__inactive, __a, __b, __p) __arm_vsubq_m_u16(__inactive, __a, __b, __p)
+#define vabavq_p_u16(__a, __b, __c, __p) __arm_vabavq_p_u16(__a, __b, __c, __p)
+#define vshlq_m_s16(__inactive, __a, __b, __p) __arm_vshlq_m_s16(__inactive, __a, __b, __p)
+#define vcvtq_m_n_f32_s32(__inactive, __a,  __imm6, __p) __arm_vcvtq_m_n_f32_s32(__inactive, __a,  __imm6, __p)
+#define vsriq_m_n_s32(__a, __b,  __imm, __p) __arm_vsriq_m_n_s32(__a, __b,  __imm, __p)
+#define vsubq_m_s32(__inactive, __a, __b, __p) __arm_vsubq_m_s32(__inactive, __a, __b, __p)
+#define vqshluq_m_n_s32(__inactive, __a,  __imm, __p) __arm_vqshluq_m_n_s32(__inactive, __a,  __imm, __p)
+#define vabavq_p_s32(__a, __b, __c, __p) __arm_vabavq_p_s32(__a, __b, __c, __p)
+#define vsriq_m_n_u32(__a, __b,  __imm, __p) __arm_vsriq_m_n_u32(__a, __b,  __imm, __p)
+#define vshlq_m_u32(__inactive, __a, __b, __p) __arm_vshlq_m_u32(__inactive, __a, __b, __p)
+#define vsubq_m_u32(__inactive, __a, __b, __p) __arm_vsubq_m_u32(__inactive, __a, __b, __p)
+#define vabavq_p_u32(__a, __b, __c, __p) __arm_vabavq_p_u32(__a, __b, __c, __p)
+#define vshlq_m_s32(__inactive, __a, __b, __p) __arm_vshlq_m_s32(__inactive, __a, __b, __p)
 #endif
 
 __extension__ extern __inline void
@@ -7696,6 +7727,196 @@ __arm_vrev32q_m_u16 (uint16x8_t __inactive, uint16x8_t __a, mve_pred16_t __p)
 {
   return __builtin_mve_vrev32q_m_uv8hi (__inactive, __a, __p);
 }
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsriq_m_n_s8 (int8x16_t __a, int8x16_t __b, const int __imm, mve_pred16_t __p)
+{
+  return __builtin_mve_vsriq_m_n_sv16qi (__a, __b, __imm, __p);
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsubq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
+{
+  return __builtin_mve_vsubq_m_sv16qi (__inactive, __a, __b, __p);
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vqshluq_m_n_s8 (uint8x16_t __inactive, int8x16_t __a, const int __imm, mve_pred16_t __p)
+{
+  return __builtin_mve_vqshluq_m_n_sv16qi (__inactive, __a, __imm, __p);
+}
+
+__extension__ extern __inline uint32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vabavq_p_s8 (uint32_t __a, int8x16_t __b, int8x16_t __c, mve_pred16_t __p)
+{
+  return __builtin_mve_vabavq_p_sv16qi (__a, __b, __c, __p);
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsriq_m_n_u8 (uint8x16_t __a, uint8x16_t __b, const int __imm, mve_pred16_t __p)
+{
+  return __builtin_mve_vsriq_m_n_uv16qi (__a, __b, __imm, __p);
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlq_m_u8 (uint8x16_t __inactive, uint8x16_t __a, int8x16_t __b, mve_pred16_t __p)
+{
+  return __builtin_mve_vshlq_m_uv16qi (__inactive, __a, __b, __p);
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsubq_m_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
+{
+  return __builtin_mve_vsubq_m_uv16qi (__inactive, __a, __b, __p);
+}
+
+__extension__ extern __inline uint32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vabavq_p_u8 (uint32_t __a, uint8x16_t __b, uint8x16_t __c, mve_pred16_t __p)
+{
+  return __builtin_mve_vabavq_p_uv16qi (__a, __b, __c, __p);
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
+{
+  return __builtin_mve_vshlq_m_sv16qi (__inactive, __a, __b, __p);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsriq_m_n_s16 (int16x8_t __a, int16x8_t __b, const int __imm, mve_pred16_t __p)
+{
+  return __builtin_mve_vsriq_m_n_sv8hi (__a, __b, __imm, __p);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsubq_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
+{
+  return __builtin_mve_vsubq_m_sv8hi (__inactive, __a, __b, __p);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vqshluq_m_n_s16 (uint16x8_t __inactive, int16x8_t __a, const int __imm, mve_pred16_t __p)
+{
+  return __builtin_mve_vqshluq_m_n_sv8hi (__inactive, __a, __imm, __p);
+}
+
+__extension__ extern __inline uint32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vabavq_p_s16 (uint32_t __a, int16x8_t __b, int16x8_t __c, mve_pred16_t __p)
+{
+  return __builtin_mve_vabavq_p_sv8hi (__a, __b, __c, __p);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsriq_m_n_u16 (uint16x8_t __a, uint16x8_t __b, const int __imm, mve_pred16_t __p)
+{
+  return __builtin_mve_vsriq_m_n_uv8hi (__a, __b, __imm, __p);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, int16x8_t __b, mve_pred16_t __p)
+{
+  return __builtin_mve_vshlq_m_uv8hi (__inactive, __a, __b, __p);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsubq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
+{
+  return __builtin_mve_vsubq_m_uv8hi (__inactive, __a, __b, __p);
+}
+
+__extension__ extern __inline uint32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vabavq_p_u16 (uint32_t __a, uint16x8_t __b, uint16x8_t __c, mve_pred16_t __p)
+{
+  return __builtin_mve_vabavq_p_uv8hi (__a, __b, __c, __p);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlq_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
+{
+  return __builtin_mve_vshlq_m_sv8hi (__inactive, __a, __b, __p);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsriq_m_n_s32 (int32x4_t __a, int32x4_t __b, const int __imm, mve_pred16_t __p)
+{
+  return __builtin_mve_vsriq_m_n_sv4si (__a, __b, __imm, __p);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsubq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
+{
+  return __builtin_mve_vsubq_m_sv4si (__inactive, __a, __b, __p);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vqshluq_m_n_s32 (uint32x4_t __inactive, int32x4_t __a, const int __imm, mve_pred16_t __p)
+{
+  return __builtin_mve_vqshluq_m_n_sv4si (__inactive, __a, __imm, __p);
+}
+
+__extension__ extern __inline uint32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vabavq_p_s32 (uint32_t __a, int32x4_t __b, int32x4_t __c, mve_pred16_t __p)
+{
+  return __builtin_mve_vabavq_p_sv4si (__a, __b, __c, __p);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsriq_m_n_u32 (uint32x4_t __a, uint32x4_t __b, const int __imm, mve_pred16_t __p)
+{
+  return __builtin_mve_vsriq_m_n_uv4si (__a, __b, __imm, __p);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, int32x4_t __b, mve_pred16_t __p)
+{
+  return __builtin_mve_vshlq_m_uv4si (__inactive, __a, __b, __p);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsubq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
+{
+  return __builtin_mve_vsubq_m_uv4si (__inactive, __a, __b, __p);
+}
+
+__extension__ extern __inline uint32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vabavq_p_u32 (uint32_t __a, uint32x4_t __b, uint32x4_t __c, mve_pred16_t __p)
+{
+  return __builtin_mve_vabavq_p_uv4si (__a, __b, __c, __p);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
+{
+  return __builtin_mve_vshlq_m_sv4si (__inactive, __a, __b, __p);
+}
+
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
@@ -9375,6 +9596,35 @@ __arm_vcvtq_m_u32_f32 (uint32x4_t __inactive, float32x4_t __a, mve_pred16_t __p)
 {
   return __builtin_mve_vcvtq_m_from_f_uv4si (__inactive, __a, __p);
 }
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_m_n_f16_u16 (float16x8_t __inactive, uint16x8_t __a, const int __imm6, mve_pred16_t __p)
+{
+  return __builtin_mve_vcvtq_m_n_to_f_uv8hf (__inactive, __a, __imm6, __p);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_m_n_f16_s16 (float16x8_t __inactive, int16x8_t __a, const int __imm6, mve_pred16_t __p)
+{
+  return __builtin_mve_vcvtq_m_n_to_f_sv8hf (__inactive, __a, __imm6, __p);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_m_n_f32_u32 (float32x4_t __inactive, uint32x4_t __a, const int __imm6, mve_pred16_t __p)
+{
+  return __builtin_mve_vcvtq_m_n_to_f_uv4sf (__inactive, __a, __imm6, __p);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_m_n_f32_s32 (float32x4_t __inactive, int32x4_t __a, const int __imm6, mve_pred16_t __p)
+{
+  return __builtin_mve_vcvtq_m_n_to_f_sv4sf (__inactive, __a, __imm6, __p);
+}
+
 #endif
 
 enum {
@@ -10131,6 +10381,15 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcvtq_m_u16_f16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, float16x8_t), p2), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcvtq_m_u32_f32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, float32x4_t), p2));})
 
+#define vcvtq_m_n(p0,p1,p2,p3) __arm_vcvtq_m_n(p0,p1,p2,p3)
+#define __arm_vcvtq_m_n(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcvtq_m_n_f16_s16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, int16x8_t), p2, p3), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcvtq_m_n_f32_s32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2, p3), \
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcvtq_m_n_f16_u16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), p2, p3), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcvtq_m_n_f32_u32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), p2, p3));})
+
 #define vabsq_m(p0,p1,p2) __arm_vabsq_m(p0,p1,p2)
 #define __arm_vabsq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -10173,19 +10432,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmlaq_rot90_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t)), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmlaq_rot90_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t)));})
 
-#define vcmpeqq_m_n(p0,p1,p2) __arm_vcmpeqq_m_n(p0,p1,p2)
-#define __arm_vcmpeqq_m_n(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8_t]: __arm_vcmpeqq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t), p2), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16_t]: __arm_vcmpeqq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t), p2), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32_t]: __arm_vcmpeqq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t), p2), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8_t]: __arm_vcmpeqq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t), p2), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16_t]: __arm_vcmpeqq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t), p2), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32_t]: __arm_vcmpeqq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t), p2), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16_t]: __arm_vcmpeqq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16_t), p2), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32_t]: __arm_vcmpeqq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32_t), p2));})
-
 #define vrndxq_m(p0,p1,p2) __arm_vrndxq_m(p0,p1,p2)
 #define __arm_vrndxq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -11847,8 +12093,8 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vsliq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vsliq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
 
-#define vsriq_n(p0,p1,p2) __arm_vsriq_n(p0,p1,p2)
-#define __arm_vsriq_n(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+#define vsriq(p0,p1,p2) __arm_vsriq(p0,p1,p2)
+#define __arm_vsriq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vsriq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), p2), \
@@ -12169,28 +12415,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcmpcsq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcmpcsq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
 
-#define vcmpeqq_m_n(p0,p1,p2) __arm_vcmpeqq_m_n(p0,p1,p2)
-#define __arm_vcmpeqq_m_n(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8_t]: __arm_vcmpeqq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t), p2), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16_t]: __arm_vcmpeqq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t), p2), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32_t]: __arm_vcmpeqq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t), p2), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8_t]: __arm_vcmpeqq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t), p2), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16_t]: __arm_vcmpeqq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t), p2), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32_t]: __arm_vcmpeqq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t), p2));})
-
-#define vcmpeqq_m(p0,p1,p2) __arm_vcmpeqq_m(p0,p1,p2)
-#define __arm_vcmpeqq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcmpeqq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), p2), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcmpeqq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpeqq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vcmpeqq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcmpeqq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcmpeqq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
-
 #define vmladavxq_p(p0,p1,p2) __arm_vmladavxq_p(p0,p1,p2)
 #define __arm_vmladavxq_p(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -12607,6 +12831,61 @@ extern void *__ARM_undef;
 #define vrmlsldavhxq_p(p0,p1,p2) __arm_vrmlsldavhxq_p(p0,p1,p2)
 #define __arm_vrmlsldavhxq_p(p0,p1,p2) __arm_vrmlsldavhxq_p_s32(p0,p1,p2)
 
+#define vqshluq_m_n(p0,p1,p2,p3) __arm_vqshluq_m_n(p0,p1,p2,p3)
+#define __arm_vqshluq_m_n(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqshluq_m_n_s8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, int8x16_t), p2, p3), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqshluq_m_n_s16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, int16x8_t), p2, p3), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqshluq_m_n_s32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2, p3));})
+
+#define vshlq_m(p0,p1,p2,p3) __arm_vshlq_m(p0,p1,p2,p3)
+#define __arm_vshlq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vshlq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vshlq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vshlq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int8x16_t]: __arm_vshlq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vshlq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vshlq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3));})
+
+#define vsriq_m(p0,p1,p2,p3) __arm_vsriq_m(p0,p1,p2,p3)
+#define __arm_vsriq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vsriq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), p2, p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vsriq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), p2, p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vsriq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2, p3), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vsriq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), p2, p3), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vsriq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), p2, p3), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vsriq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), p2, p3));})
+
+#define vsubq_m(p0,p1,p2,p3) __arm_vsubq_m(p0,p1,p2,p3)
+#define __arm_vsubq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vsubq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vsubq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vsubq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vsubq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vsubq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vsubq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
+
+#define vabavq_p(p0,p1,p2,p3) __arm_vabavq_p(p0,p1,p2,p3)
+#define __arm_vabavq_p(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vabavq_p_s8(__p0, __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vabavq_p_s16(__p0, __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vabavq_p_s32(__p0, __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vabavq_p_u8(__p0, __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vabavq_p_u16(__p0, __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vabavq_p_u32(__p0, __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
+
 #endif /* MVE Floating point.  */
 
 #ifdef __cplusplus
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index 4169b31646d4cf55a067131f6b516718c8e970c4..22c3de7c75171e239f2e644308ed3e05d2cdf819 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -502,3 +502,14 @@ VAR1 (TERNOP_NONE_NONE_NONE_UNONE, vaddlvaq_p_s, v4si)
 VAR1 (TERNOP_NONE_NONE_NONE_NONE, vrmlsldavhaxq_s, v4si)
 VAR1 (TERNOP_NONE_NONE_NONE_NONE, vrmlsldavhaq_s, v4si)
 VAR1 (TERNOP_NONE_NONE_NONE_NONE, vrmlaldavhaxq_s, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_IMM_UNONE, vsriq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE, vsriq_m_n_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vsubq_m_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vsubq_m_u, v16qi, v8hi, v4si)
+VAR2 (QUADOP_NONE_NONE_UNONE_IMM_UNONE, vcvtq_m_n_to_f_u, v8hf, v4sf)
+VAR2 (QUADOP_NONE_NONE_NONE_IMM_UNONE, vcvtq_m_n_to_f_s, v8hf, v4sf)
+VAR3 (QUADOP_UNONE_UNONE_NONE_IMM_UNONE, vqshluq_m_n_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_NONE_NONE_UNONE, vabavq_p_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vabavq_p_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_NONE_UNONE, vshlq_m_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vshlq_m_s, v16qi, v8hi, v4si)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 27a12dc533c69372328a7410bda2f68895b4e2bd..0fe7de439a3b40a03299bb2cf668e7cd59f2fc46 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -140,7 +140,10 @@
 			 VCVTPQ_M_S VCVTPQ_M_U VCVTQ_M_N_FROM_F_S VCVTNQ_M_U
 			 VREV16Q_M_S VREV16Q_M_U VREV32Q_M VCVTQ_M_FROM_F_U
 			 VCVTQ_M_FROM_F_S VRMLALDAVHQ_P_U VADDLVAQ_P_U
-			 VCVTQ_M_N_FROM_F_U])
+			 VCVTQ_M_N_FROM_F_U VQSHLUQ_M_N_S VABAVQ_P_S
+			 VABAVQ_P_U VSHLQ_M_S VSHLQ_M_U VSRIQ_M_N_S
+			 VSRIQ_M_N_U VSUBQ_M_U VSUBQ_M_S VCVTQ_M_N_TO_F_U
+			 VCVTQ_M_N_TO_F_S])
 
 (define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF")
 			    (V8HF "V8HI") (V4SF "V4SI")])
@@ -244,7 +247,11 @@
 		       (VCVTQ_M_N_FROM_F_U "u") (VCVTQ_M_FROM_F_S "s")
 		       (VCVTQ_M_FROM_F_U "u") (VRMLALDAVHQ_P_U "u")
 		       (VRMLALDAVHQ_P_S "s") (VADDLVAQ_P_U "u")
-		       (VCVTQ_M_N_FROM_F_S "s")])
+		       (VCVTQ_M_N_FROM_F_S "s") (VABAVQ_P_U "u")
+		       (VABAVQ_P_S "s") (VSHLQ_M_S "s") (VSHLQ_M_U "u")
+		       (VSRIQ_M_N_S "s") (VSRIQ_M_N_U "u") (VSUBQ_M_S "s")
+		       (VSUBQ_M_U "u") (VCVTQ_M_N_TO_F_S "s")
+		       (VCVTQ_M_N_TO_F_U "u")])
 
 (define_int_attr mode1 [(VCTP8Q "8") (VCTP16Q "16") (VCTP32Q "32")
 			(VCTP64Q "64") (VCTP8Q_M "8") (VCTP16Q_M "16")
@@ -407,6 +414,11 @@
 (define_int_iterator VCVTQ_M_FROM_F [VCVTQ_M_FROM_F_U VCVTQ_M_FROM_F_S])
 (define_int_iterator VRMLALDAVHQ_P [VRMLALDAVHQ_P_S VRMLALDAVHQ_P_U])
 (define_int_iterator VADDLVAQ_P [VADDLVAQ_P_U VADDLVAQ_P_S])
+(define_int_iterator VABAVQ_P [VABAVQ_P_S VABAVQ_P_U])
+(define_int_iterator VSHLQ_M [VSHLQ_M_S VSHLQ_M_U])
+(define_int_iterator VSRIQ_M_N [VSRIQ_M_N_S VSRIQ_M_N_U])
+(define_int_iterator VSUBQ_M [VSUBQ_M_U VSUBQ_M_S])
+(define_int_iterator VCVTQ_M_N_TO_F [VCVTQ_M_N_TO_F_U VCVTQ_M_N_TO_F_S])
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
@@ -5544,7 +5556,7 @@
 	 VSHRNTQ_N))
   ]
   "TARGET_HAVE_MVE"
-  "vshrnt.i%#<V_sz_elem>	%q0, %q2, %3"
+  "vshrnt.i%#<V_sz_elem>\t%q0, %q2, %3"
   [(set_attr "type" "mve_move")
 ])
 
@@ -5560,7 +5572,7 @@
 	 VCVTMQ_M))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vcvtmt.<supf>%#<V_sz_elem>.f%#<V_sz_elem>       %q0, %q2"
+  "vpst\;vcvtmt.<supf>%#<V_sz_elem>.f%#<V_sz_elem>\t%q0, %q2"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
@@ -5576,7 +5588,7 @@
 	 VCVTPQ_M))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vcvtpt.<supf>%#<V_sz_elem>.f%#<V_sz_elem>       %q0, %q2"
+  "vpst\;vcvtpt.<supf>%#<V_sz_elem>.f%#<V_sz_elem>\t%q0, %q2"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
@@ -5592,7 +5604,7 @@
 	 VCVTNQ_M))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vcvtnt.<supf>%#<V_sz_elem>.f%#<V_sz_elem>       %q0, %q2"
+  "vpst\;vcvtnt.<supf>%#<V_sz_elem>.f%#<V_sz_elem>\t%q0, %q2"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
@@ -5609,7 +5621,7 @@
 	 VCVTQ_M_N_FROM_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vcvtt.<supf>%#<V_sz_elem>.f%#<V_sz_elem>	%q0, %q2, %3"
+  "vpst\;vcvtt.<supf>%#<V_sz_elem>.f%#<V_sz_elem>\t%q0, %q2, %3"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
@@ -5641,7 +5653,7 @@
 	 VCVTQ_M_FROM_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vcvtt.<supf>%#<V_sz_elem>.f%#<V_sz_elem>	%q0, %q2"
+  "vpst\;vcvtt.<supf>%#<V_sz_elem>.f%#<V_sz_elem>\t%q0, %q2"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
@@ -5676,3 +5688,101 @@
   "vrmlsldavha.s32 %Q0, %R0, %q2, %q3"
   [(set_attr "type" "mve_move")
 ])
+
+;;
+;; [vabavq_p_s, vabavq_p_u])
+;;
+(define_insn "mve_vabavq_p_<supf><mode>"
+  [
+   (set (match_operand:SI 0 "s_register_operand" "=r")
+	(unspec:SI [(match_operand:SI 1 "s_register_operand" "0")
+		    (match_operand:MVE_2 2 "s_register_operand" "w")
+		    (match_operand:MVE_2 3 "s_register_operand" "w")
+		    (match_operand:HI 4 "vpr_register_operand" "Up")]
+	 VABAVQ_P))
+  ]
+  "TARGET_HAVE_MVE"
+  "vpst\;vabavt.<supf>%#<V_sz_elem>\t%0, %q2, %q3"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vqshluq_m_n_s])
+;;
+(define_insn "mve_vqshluq_m_n_s<mode>"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
+		       (match_operand:MVE_2 2 "s_register_operand" "w")
+		       (match_operand:SI 3 "mve_imm_7" "Ra")
+		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+	 VQSHLUQ_M_N_S))
+  ]
+  "TARGET_HAVE_MVE"
+  "vpst\n\tvqshlut.s%#<V_sz_elem>\t%q0, %q2, %3"
+  [(set_attr "type" "mve_move")])
+
+;;
+;; [vshlq_m_s, vshlq_m_u])
+;;
+(define_insn "mve_vshlq_m_<supf><mode>"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
+		       (match_operand:MVE_2 2 "s_register_operand" "w")
+		       (match_operand:MVE_2 3 "s_register_operand" "w")
+		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+	 VSHLQ_M))
+  ]
+  "TARGET_HAVE_MVE"
+  "vpst\;vshlt.<supf>%#<V_sz_elem>\t%q0, %q2, %q3"
+  [(set_attr "type" "mve_move")])
+
+;;
+;; [vsriq_m_n_s, vsriq_m_n_u])
+;;
+(define_insn "mve_vsriq_m_n_<supf><mode>"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
+		       (match_operand:MVE_2 2 "s_register_operand" "w")
+		       (match_operand:SI 3 "mve_imm_selective_upto_8" "Rg")
+		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+	 VSRIQ_M_N))
+  ]
+  "TARGET_HAVE_MVE"
+  "vpst\;vsrit.%#<V_sz_elem>\t%q0, %q2, %3"
+  [(set_attr "type" "mve_move")])
+
+;;
+;; [vsubq_m_u, vsubq_m_s])
+;;
+(define_insn "mve_vsubq_m_<supf><mode>"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
+		       (match_operand:MVE_2 2 "s_register_operand" "w")
+		       (match_operand:MVE_2 3 "s_register_operand" "w")
+		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+	 VSUBQ_M))
+  ]
+  "TARGET_HAVE_MVE"
+  "vpst\;vsubt.i%#<V_sz_elem>\t%q0, %q2, %q3"
+  [(set_attr "type" "mve_move")])
+
+;;
+;; [vcvtq_m_n_to_f_u, vcvtq_m_n_to_f_s])
+;;
+(define_insn "mve_vcvtq_m_n_to_f_<supf><mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
+		       (match_operand:<MVE_CNVT> 2 "s_register_operand" "w")
+		       (match_operand:SI 3 "mve_imm_16" "Rd")
+		       (match_operand:HI 4 "vpr_register_operand" "Up")]
+	 VCVTQ_M_N_TO_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vpst\;vcvtt.f%#<V_sz_elem>.<supf>%#<V_sz_elem>\t%q0, %q2, %3"
+  [(set_attr "type" "mve_move")
+   (set_attr "length""8")])
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..0c67c3afabb3780d6dcb4082a822389d868551f3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32_t
+foo (uint32_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
+{
+  return vabavq_p_s16 (a, b, c, p);
+}
+
+/* { dg-final { scan-assembler "vabavt.s16"  }  } */
+
+uint32_t
+foo1 (uint32_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
+{
+  return vabavq_p (a, b, c, p);
+}
+
+/* { dg-final { scan-assembler "vabavt.s16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..4a8c8425208e77295464e7b0a26bfd194170388f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32_t
+foo (uint32_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
+{
+  return vabavq_p_s32 (a, b, c, p);
+}
+
+/* { dg-final { scan-assembler "vabavt.s32"  }  } */
+
+uint32_t
+foo1 (uint32_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
+{
+  return vabavq_p (a, b, c, p);
+}
+
+/* { dg-final { scan-assembler "vabavt.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..80f09cbd4b17ae11ab88c69af579cfe0789c7b9e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32_t
+foo (uint32_t a, int8x16_t b, int8x16_t c, mve_pred16_t p)
+{
+  return vabavq_p_s8 (a, b, c, p);
+}
+
+/* { dg-final { scan-assembler "vabavt.s8"  }  } */
+
+uint32_t
+foo1 (uint32_t a, int8x16_t b, int8x16_t c, mve_pred16_t p)
+{
+  return vabavq_p (a, b, c, p);
+}
+
+/* { dg-final { scan-assembler "vabavt.s8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..959bc7c3eb16e1937be8d88da6d38c4b93d11adb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32_t
+foo (uint32_t a, uint16x8_t b, uint16x8_t c, mve_pred16_t p)
+{
+  return vabavq_p_u16 (a, b, c, p);
+}
+
+/* { dg-final { scan-assembler "vabavt.u16"  }  } */
+
+uint32_t
+foo1 (uint32_t a, uint16x8_t b, uint16x8_t c, mve_pred16_t p)
+{
+  return vabavq_p (a, b, c, p);
+}
+
+/* { dg-final { scan-assembler "vabavt.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..7f32cb521ff1686c7eb68612c90c45f319ef2a81
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32_t
+foo (uint32_t a, uint32x4_t b, uint32x4_t c, mve_pred16_t p)
+{
+  return vabavq_p_u32 (a, b, c, p);
+}
+
+/* { dg-final { scan-assembler "vabavt.u32"  }  } */
+
+uint32_t
+foo1 (uint32_t a, uint32x4_t b, uint32x4_t c, mve_pred16_t p)
+{
+  return vabavq_p (a, b, c, p);
+}
+
+/* { dg-final { scan-assembler "vabavt.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..7654e36f1d17e05018399a77ce2e60efe532dd1e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32_t
+foo (uint32_t a, uint8x16_t b, uint8x16_t c, mve_pred16_t p)
+{
+  return vabavq_p_u8 (a, b, c, p);
+}
+
+/* { dg-final { scan-assembler "vabavt.u8"  }  } */
+
+uint32_t
+foo1 (uint32_t a, uint8x16_t b, uint8x16_t c, mve_pred16_t p)
+{
+  return vabavq_p (a, b, c, p);
+}
+
+/* { dg-final { scan-assembler "vabavt.u8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f16.c
index b40247d3ee40e491ae304b0cd4a8d656af1cc2cc..2db6c1369a8b238ce02c73b397d60e5d713786e3 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f16.c
@@ -16,7 +16,7 @@ foo (float16x8_t a, float16_t b, mve_pred16_t p)
 mve_pred16_t
 foo1 (float16x8_t a, float16_t b, mve_pred16_t p)
 {
-  return vcmpeqq_m_n (a, b, p);
+  return vcmpeqq_m (a, b, p);
 }
 
 /* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f32.c
index 9b9ecf8c5518fdef76def0c75a87a1246974733b..e3a19aa7ae6dee749c7f6d839a068f458771fcaa 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f32.c
@@ -16,7 +16,7 @@ foo (float32x4_t a, float32_t b, mve_pred16_t p)
 mve_pred16_t
 foo1 (float32x4_t a, float32_t b, mve_pred16_t p)
 {
-  return vcmpeqq_m_n (a, b, p);
+  return vcmpeqq_m (a, b, p);
 }
 
 /* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s16.c
index b510d453b05ffc38ee26184c92d6083a39403b36..2c12d25e2c9392280eeaab5f2324e7a969805e1c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s16.c
@@ -16,7 +16,7 @@ foo (int16x8_t a, int16_t b, mve_pred16_t p)
 mve_pred16_t
 foo1 (int16x8_t a, int16_t b, mve_pred16_t p)
 {
-  return vcmpeqq_m_n (a, b, p);
+  return vcmpeqq_m (a, b, p);
 }
 
 /* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s32.c
index 239273dbf080f4f072d8c28ed0a2a72f2610ce43..ae9d2097500e03868051674cf4293069c70fcf96 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s32.c
@@ -16,7 +16,7 @@ foo (int32x4_t a, int32_t b, mve_pred16_t p)
 mve_pred16_t
 foo1 (int32x4_t a, int32_t b, mve_pred16_t p)
 {
-  return vcmpeqq_m_n (a, b, p);
+  return vcmpeqq_m (a, b, p);
 }
 
 /* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s8.c
index 2d811e6f0efdf4a313c26a9e0c6aa1a297b3d3cd..3b55515e604a544d9ad4034f3b4fdec1d6d57bd5 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s8.c
@@ -16,7 +16,7 @@ foo (int8x16_t a, int8_t b, mve_pred16_t p)
 mve_pred16_t
 foo1 (int8x16_t a, int8_t b, mve_pred16_t p)
 {
-  return vcmpeqq_m_n (a, b, p);
+  return vcmpeqq_m (a, b, p);
 }
 
 /* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u16.c
index fd0867387de694933ff487d16d16627d79a0cacc..a56ecb5df0a5363d694e4c2fef44152579642fbd 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u16.c
@@ -16,7 +16,7 @@ foo (uint16x8_t a, uint16_t b, mve_pred16_t p)
 mve_pred16_t
 foo1 (uint16x8_t a, uint16_t b, mve_pred16_t p)
 {
-  return vcmpeqq_m_n (a, b, p);
+  return vcmpeqq_m (a, b, p);
 }
 
 /* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u32.c
index f56f37e3088f286118862024a9dba2aea4a5db2d..c9f90b8a4c1a4540ed05af9690599b4ea8a38904 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u32.c
@@ -16,7 +16,7 @@ foo (uint32x4_t a, uint32_t b, mve_pred16_t p)
 mve_pred16_t
 foo1 (uint32x4_t a, uint32_t b, mve_pred16_t p)
 {
-  return vcmpeqq_m_n (a, b, p);
+  return vcmpeqq_m (a, b, p);
 }
 
 /* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u8.c
index 0178fe39e2b5a2cf77b4818f2c096779ed7bb6a7..6f2792496c6b22be7c114092a3c4a8f27b6de780 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u8.c
@@ -16,7 +16,7 @@ foo (uint8x16_t a, uint8_t b, mve_pred16_t p)
 mve_pred16_t
 foo1 (uint8x16_t a, uint8_t b, mve_pred16_t p)
 {
-  return vcmpeqq_m_n (a, b, p);
+  return vcmpeqq_m (a, b, p);
 }
 
 /* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_m_n_f16_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_m_n_f16_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..df72d22479b69a0db2b4596182fd7b2ca656218c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_m_n_f16_s16.c
@@ -0,0 +1,23 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16x8_t inactive, int16x8_t a, mve_pred16_t p)
+{
+  return vcvtq_m_n_f16_s16 (inactive, a, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vcvtt.f16.s16"  }  } */
+
+float16x8_t
+foo1 (float16x8_t inactive, int16x8_t a, mve_pred16_t p)
+{
+  return vcvtq_m_n (inactive, a, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vcvtt.f16.s16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_m_n_f16_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_m_n_f16_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..ad7cf0725c7c7ee53fb0f5e4e50a29a09c91b57c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_m_n_f16_u16.c
@@ -0,0 +1,23 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16x8_t inactive, uint16x8_t a, mve_pred16_t p)
+{
+  return vcvtq_m_n_f16_u16 (inactive, a, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vcvtt.f16.u16"  }  } */
+
+float16x8_t
+foo1 (float16x8_t inactive, uint16x8_t a, mve_pred16_t p)
+{
+  return vcvtq_m_n (inactive, a, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vcvtt.f16.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_m_n_f32_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_m_n_f32_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..a90bf1a163ec721b13c2af19b1eda27b5c6f7f48
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_m_n_f32_s32.c
@@ -0,0 +1,23 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32x4_t inactive, int32x4_t a, mve_pred16_t p)
+{
+  return vcvtq_m_n_f32_s32 (inactive, a, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vcvtt.f32.s32"  }  } */
+
+float32x4_t
+foo1 (float32x4_t inactive, int32x4_t a, mve_pred16_t p)
+{
+  return vcvtq_m_n (inactive, a, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vcvtt.f32.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_m_n_f32_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_m_n_f32_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..3581481ffd9db52dc027793a32dc2e1dc6125ced
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_m_n_f32_u32.c
@@ -0,0 +1,23 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32x4_t inactive, uint32x4_t a, mve_pred16_t p)
+{
+  return vcvtq_m_n_f32_u32 (inactive, a, 16, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vcvtt.f32.u32"  }  } */
+
+float32x4_t
+foo1 (float32x4_t inactive, uint32x4_t a, mve_pred16_t p)
+{
+  return vcvtq_m_n (inactive, a, 16, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vcvtt.f32.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqshluq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqshluq_m_n_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..71115723d4b051ba977a4c89b086efd376e779a8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqshluq_m_n_s16.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint16x8_t inactive, int16x8_t a, mve_pred16_t p)
+{
+  return vqshluq_m_n_s16 (inactive, a, 7, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vqshlut.s16"  }  } */
+
+uint16x8_t
+foo1 (uint16x8_t inactive, int16x8_t a, mve_pred16_t p)
+{
+  return vqshluq_m_n (inactive, a, 7, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqshluq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqshluq_m_n_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..60684f8d755f60a866b2e49ab3fc0e4fa11220fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqshluq_m_n_s32.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t inactive, int32x4_t a, mve_pred16_t p)
+{
+  return vqshluq_m_n_s32 (inactive, a, 7, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vqshlut.s32"  }  } */
+
+uint32x4_t
+foo1 (uint32x4_t inactive, int32x4_t a, mve_pred16_t p)
+{
+  return vqshluq_m_n (inactive, a, 7, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqshluq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqshluq_m_n_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..53f0c67660d0def94807a3d91ad8fb50fda5eb02
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqshluq_m_n_s8.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint8x16_t inactive, int8x16_t a, mve_pred16_t p)
+{
+  return vqshluq_m_n_s8 (inactive, a, 7, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vqshlut.s8"  }  } */
+
+uint8x16_t
+foo1 (uint8x16_t inactive, int8x16_t a, mve_pred16_t p)
+{
+  return vqshluq_m_n (inactive, a, 7, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_m_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..cb412e4dcb6993cf23b39a9668720bab1b96c357
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_m_s16.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
+{
+  return vshlq_m_s16 (inactive, a, b, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vshlt.s16"  }  } */
+
+int16x8_t
+foo1 (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
+{
+  return vshlq_m (inactive, a, b, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_m_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..a356b02b0741aa0c0801ee219194db99c49322dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_m_s32.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
+{
+  return vshlq_m_s32 (inactive, a, b, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vshlt.s32"  }  } */
+
+int32x4_t
+foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
+{
+  return vshlq_m (inactive, a, b, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_m_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..57b81d2a6ab7f7755c750c8e0e3d7dc126b445ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_m_s8.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
+{
+  return vshlq_m_s8 (inactive, a, b, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vshlt.s8"  }  } */
+
+int8x16_t
+foo1 (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
+{
+  return vshlq_m (inactive, a, b, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_m_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_m_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..31baa5a1b68e149695db130262411e0a7854033e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_m_u16.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint16x8_t inactive, uint16x8_t a, int16x8_t b, mve_pred16_t p)
+{
+  return vshlq_m_u16 (inactive, a, b, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vshlt.u16"  }  } */
+
+uint16x8_t
+foo1 (uint16x8_t inactive, uint16x8_t a, int16x8_t b, mve_pred16_t p)
+{
+  return vshlq_m (inactive, a, b, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_m_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..e4b547c68fc7c676df40416bd0771aef13c6bec2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_m_u32.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t inactive, uint32x4_t a, int32x4_t b, mve_pred16_t p)
+{
+  return vshlq_m_u32 (inactive, a, b, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vshlt.u32"  }  } */
+
+uint32x4_t
+foo1 (uint32x4_t inactive, uint32x4_t a, int32x4_t b, mve_pred16_t p)
+{
+  return vshlq_m (inactive, a, b, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_m_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_m_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..9905fa6fccac2286ae94ff4354dcf014321568c2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlq_m_u8.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint8x16_t inactive, uint8x16_t a, int8x16_t b, mve_pred16_t p)
+{
+  return vshlq_m_u8 (inactive, a, b, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vshlt.u8"  }  } */
+
+uint8x16_t
+foo1 (uint8x16_t inactive, uint8x16_t a, int8x16_t b, mve_pred16_t p)
+{
+  return vshlq_m (inactive, a, b, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_m_n_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..0f707c0d05ffcfab8520f564066915eae3efd9ac
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_m_n_s16.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int16x8_t a, int16x8_t b, mve_pred16_t p)
+{
+  return vsriq_m_n_s16 (a, b, 4, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vsrit.16"  }  } */
+
+int16x8_t
+foo1 (int16x8_t a, int16x8_t b, mve_pred16_t p)
+{
+  return vsriq_m (a, b, 4, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_m_n_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..d0d29363c3786ff7e47650d39dbca65c3e5fb558
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_m_n_s32.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t a, int32x4_t b, mve_pred16_t p)
+{
+  return vsriq_m_n_s32 (a, b, 2, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vsrit.32"  }  } */
+
+int32x4_t
+foo1 (int32x4_t a, int32x4_t b, mve_pred16_t p)
+{
+  return vsriq_m (a, b, 2, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_m_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_m_n_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..cab82024c36c0cc3062daa736084130bc12a8f44
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_m_n_s8.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo (int8x16_t a, int8x16_t b, mve_pred16_t p)
+{
+  return vsriq_m_n_s8 (a, b, 4, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vsrit.8"  }  } */
+
+int8x16_t
+foo1 (int8x16_t a, int8x16_t b, mve_pred16_t p)
+{
+  return vsriq_m (a, b, 4, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_m_n_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..bccc3e9279d850b84e6d20bb31c7a9e98f7a2624
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_m_n_u16.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
+{
+  return vsriq_m_n_u16 (a, b, 4, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vsrit.16"  }  } */
+
+uint16x8_t
+foo1 (uint16x8_t a, uint16x8_t b, mve_pred16_t p)
+{
+  return vsriq_m (a, b, 4, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_m_n_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..d81835ff5db9b679c84ccea9e045a421d85dfd13
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_m_n_u32.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
+{
+  return vsriq_m_n_u32 (a, b, 4, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vsrit.32"  }  } */
+
+uint32x4_t
+foo1 (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
+{
+  return vsriq_m (a, b, 4, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_m_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_m_n_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..4327d6e101ce3e4fa0ccc1f5bf7af2ceb77b8dfe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_m_n_u8.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
+{
+  return vsriq_m_n_u8 (a, b, 4, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vsrit.8"  }  } */
+
+uint8x16_t
+foo1 (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
+{
+  return vsriq_m (a, b, 4, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_n_s16.c
index fca14b89661ffcd133a818070958243985add8df..00828e3ce74ed31f05ce86a21e430a27211dd024 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_n_s16.c
@@ -15,7 +15,7 @@ foo (int16x8_t a, int16x8_t b)
 int16x8_t
 foo1 (int16x8_t a, int16x8_t b)
 {
-  return vsriq_n (a, b, 4);
+  return vsriq (a, b, 4);
 }
 
 /* { dg-final { scan-assembler "vsri.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_n_s32.c
index 55ef218fcc6995b661946dbfc894225260125157..03060c33e663ee6f596270768ce34712493330d6 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_n_s32.c
@@ -15,7 +15,7 @@ foo (int32x4_t a, int32x4_t b)
 int32x4_t
 foo1 (int32x4_t a, int32x4_t b)
 {
-  return vsriq_n (a, b, 4);
+  return vsriq (a, b, 4);
 }
 
 /* { dg-final { scan-assembler "vsri.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_n_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_n_s8.c
index 83d726c2806f2547c886472e4ebd048aa80c61a2..e489cef12817a8a05b041646ad3e6b16a565468a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_n_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_n_s8.c
@@ -15,7 +15,7 @@ foo (int8x16_t a, int8x16_t b)
 int8x16_t
 foo1 (int8x16_t a, int8x16_t b)
 {
-  return vsriq_n (a, b, 4);
+  return vsriq (a, b, 4);
 }
 
 /* { dg-final { scan-assembler "vsri.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_n_u16.c
index 6b49b46f6af78d16a4d281d07b4a00da5fb8a1e0..28f2558ce51d39e17c6b846ad23028839859a9f5 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_n_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_n_u16.c
@@ -15,7 +15,7 @@ foo (uint16x8_t a, uint16x8_t b)
 uint16x8_t
 foo1 (uint16x8_t a, uint16x8_t b)
 {
-  return vsriq_n (a, b, 4);
+  return vsriq (a, b, 4);
 }
 
 /* { dg-final { scan-assembler "vsri.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_n_u32.c
index 9849355f5c32f59589fa69b868550f88f4e2ec2f..54ef642b9d8180589f9f9ad8fe2731f131c99759 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_n_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_n_u32.c
@@ -15,7 +15,7 @@ foo (uint32x4_t a, uint32x4_t b)
 uint32x4_t
 foo1 (uint32x4_t a, uint32x4_t b)
 {
-  return vsriq_n (a, b, 4);
+  return vsriq (a, b, 4);
 }
 
 /* { dg-final { scan-assembler "vsri.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_n_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_n_u8.c
index 271248f7fa48372bf11233fa118a05b663a9b278..ee453ddbfd555a0d342aa1080d1ae0f2457032a5 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_n_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsriq_n_u8.c
@@ -15,7 +15,7 @@ foo (uint8x16_t a, uint8x16_t b)
 uint8x16_t
 foo1 (uint8x16_t a, uint8x16_t b)
 {
-  return vsriq_n (a, b, 4);
+  return vsriq (a, b, 4);
 }
 
 /* { dg-final { scan-assembler "vsri.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..4937ee3187eec987a4c3f0048037325a46e2e3c7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s16.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
+{
+  return vsubq_m_s16 (inactive, a, b, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vsubt.i16"  }  } */
+
+int16x8_t
+foo1 (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
+{
+  return vsubq_m (inactive, a, b, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..52c6144d2a022a84ca6d64fc4628b6da036457c4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s32.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
+{
+  return vsubq_m_s32 (inactive, a, b, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vsubt.i32"  }  } */
+
+int32x4_t
+foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
+{
+  return vsubq_m (inactive, a, b, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..c4e88154369b60f3427d8ca4195d33a73d663e38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_s8.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
+{
+  return vsubq_m_s8 (inactive, a, b, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vsubt.i8"  }  } */
+
+int8x16_t
+foo1 (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
+{
+  return vsubq_m (inactive, a, b, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..43ef332c487fb7a97cc2b1b3a396e5a6e3e77b26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u16.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
+{
+  return vsubq_m_u16 (inactive, a, b, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vsubt.i16"  }  } */
+
+uint16x8_t
+foo1 (uint16x8_t inactive, uint16x8_t a, uint16x8_t b, mve_pred16_t p)
+{
+  return vsubq_m (inactive, a, b, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..ab081b4e71106901b4ea677df7c0a7cb4c6d6444
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u32.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
+{
+  return vsubq_m_u32 (inactive, a, b, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vsubt.i32"  }  } */
+
+uint32x4_t
+foo1 (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
+{
+  return vsubq_m (inactive, a, b, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..bb7ffc1f6bfd65dc05946c8de1632239f9c16aea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_m_u8.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
+{
+  return vsubq_m_u8 (inactive, a, b, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vsubt.i8"  }  } */
+
+uint8x16_t
+foo1 (uint8x16_t inactive, uint8x16_t a, uint8x16_t b, mve_pred16_t p)
+{
+  return vsubq_m (inactive, a, b, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */


[-- Attachment #2: diff16.patch.gz --]
[-- Type: application/gzip, Size: 8206 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][4/5x]: MVE load intrinsics with zero(_z) suffix.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (12 preceding siblings ...)
  2019-11-14 19:15 ` [PATCH][ARM][GCC][4/2x]: MVE intrinsics with binary operands Srinath Parvathaneni
@ 2019-11-14 19:15 ` Srinath Parvathaneni
  2019-11-14 19:15 ` [PATCH][ARM][GCC][2/4x]: MVE intrinsics with quaternary operands Srinath Parvathaneni
                   ` (24 subsequent siblings)
  38 siblings, 0 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 29467 bytes --]

Hello,

This patch supports the following MVE ACLE load intrinsics with zero(_z)
suffix.
* ``_z`` (zero) which indicates false-predicated lanes are filled with zeroes,
these are only used for load instructions.

vldrbq_gather_offset_z_s16, vldrbq_gather_offset_z_u8, vldrbq_gather_offset_z_s32,
vldrbq_gather_offset_z_u16, vldrbq_gather_offset_z_u32, vldrbq_gather_offset_z_s8,
vldrbq_z_s16, vldrbq_z_u8, vldrbq_z_s8, vldrbq_z_s32, vldrbq_z_u16, vldrbq_z_u32,
vldrwq_gather_base_z_u32, vldrwq_gather_base_z_s32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1]  https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-01  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm-builtins.c (LDRGBS_Z_QUALIFIERS): Define builtin
	qualifier.
	(LDRGBU_Z_QUALIFIERS): Likewise.
	(LDRGS_Z_QUALIFIERS): Likewise.
	(LDRGU_Z_QUALIFIERS): Likewise.
	(LDRS_Z_QUALIFIERS): Likewise.
	(LDRU_Z_QUALIFIERS): Likewise.
	* config/arm/arm_mve.h (vldrbq_gather_offset_z_s16): Define macro.
	(vldrbq_gather_offset_z_u8): Likewise.
	(vldrbq_gather_offset_z_s32): Likewise.
	(vldrbq_gather_offset_z_u16): Likewise.
	(vldrbq_gather_offset_z_u32): Likewise.
	(vldrbq_gather_offset_z_s8): Likewise.
	(vldrbq_z_s16): Likewise.
	(vldrbq_z_u8): Likewise.
	(vldrbq_z_s8): Likewise.
	(vldrbq_z_s32): Likewise.
	(vldrbq_z_u16): Likewise.
	(vldrbq_z_u32): Likewise.
	(vldrwq_gather_base_z_u32): Likewise.
	(vldrwq_gather_base_z_s32): Likewise.
	(__arm_vldrbq_gather_offset_z_s8): Define intrinsic.
	(__arm_vldrbq_gather_offset_z_s32): Likewise.
	(__arm_vldrbq_gather_offset_z_s16): Likewise.
	(__arm_vldrbq_gather_offset_z_u8): Likewise.
	(__arm_vldrbq_gather_offset_z_u32): Likewise.
	(__arm_vldrbq_gather_offset_z_u16): Likewise.
	(__arm_vldrbq_z_s8): Likewise.
	(__arm_vldrbq_z_s32): Likewise.
	(__arm_vldrbq_z_s16): Likewise.
	(__arm_vldrbq_z_u8): Likewise.
	(__arm_vldrbq_z_u32): Likewise.
	(__arm_vldrbq_z_u16): Likewise.
	(__arm_vldrwq_gather_base_z_s32): Likewise.
	(__arm_vldrwq_gather_base_z_u32): Likewise.
	(vldrbq_gather_offset_z): Define polymorphic variant.
	* config/arm/arm_mve_builtins.def (LDRGBS_Z_QUALIFIERS): Use builtin
        qualifier.
        (LDRGBU_Z_QUALIFIERS): Likewise.
        (LDRGS_Z_QUALIFIERS): Likewise.
        (LDRGU_Z_QUALIFIERS): Likewise.
        (LDRS_Z_QUALIFIERS): Likewise.
        (LDRU_Z_QUALIFIERS): Likewise.
	* config/arm/mve.md (mve_vldrbq_gather_offset_z_<supf><mode>): Define
	RTL pattern.
	(mve_vldrbq_z_<supf><mode>): Likewise.
	(mve_vldrwq_gather_base_z_<supf>v4si): Likewise.

gcc/testsuite/ChangeLog: Likewise.

2019-11-01  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_s16.c: New test.
	* gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrbq_z_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrbq_z_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrbq_z_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrbq_z_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_z_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_z_u32.c: Likewise.


###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index b5639051bf07785d906ed596e08d670f4de1a67e..c3d12375d2fbc933ad33f7a15a3bbf53079d0639 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -653,6 +653,40 @@ arm_ldrgbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate};
 #define LDRGBU_QUALIFIERS (arm_ldrgbu_qualifiers)
 
+static enum arm_type_qualifiers
+arm_ldrgbs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_unsigned, qualifier_immediate,
+      qualifier_unsigned};
+#define LDRGBS_Z_QUALIFIERS (arm_ldrgbs_z_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ldrgbu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
+      qualifier_unsigned};
+#define LDRGBU_Z_QUALIFIERS (arm_ldrgbu_z_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ldrgs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_pointer, qualifier_unsigned,
+      qualifier_unsigned};
+#define LDRGS_Z_QUALIFIERS (arm_ldrgs_z_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ldrgu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_pointer, qualifier_unsigned,
+      qualifier_unsigned};
+#define LDRGU_Z_QUALIFIERS (arm_ldrgu_z_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ldrs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_pointer, qualifier_unsigned};
+#define LDRS_Z_QUALIFIERS (arm_ldrs_z_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ldru_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_pointer, qualifier_unsigned};
+#define LDRU_Z_QUALIFIERS (arm_ldru_z_qualifiers)
+
 /* End of Qualifier for MVE builtins.  */
 
    /* void ([T element type] *, T, immediate).  */
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index f852acb575b596c534c474d19faa73c03cf85c5e..b40a9c238b7f883a0e91dd6fcc5f41182ea6efe3 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -1744,6 +1744,20 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vstrbq_scatter_offset_p_u16( __base, __offset, __value, __p) __arm_vstrbq_scatter_offset_p_u16( __base, __offset, __value, __p)
 #define vstrwq_scatter_base_p_s32(__addr,  __offset, __value, __p) __arm_vstrwq_scatter_base_p_s32(__addr,  __offset, __value, __p)
 #define vstrwq_scatter_base_p_u32(__addr,  __offset, __value, __p) __arm_vstrwq_scatter_base_p_u32(__addr,  __offset, __value, __p)
+#define vldrbq_gather_offset_z_s16(__base, __offset, __p) __arm_vldrbq_gather_offset_z_s16(__base, __offset, __p)
+#define vldrbq_gather_offset_z_u8(__base, __offset, __p) __arm_vldrbq_gather_offset_z_u8(__base, __offset, __p)
+#define vldrbq_gather_offset_z_s32(__base, __offset, __p) __arm_vldrbq_gather_offset_z_s32(__base, __offset, __p)
+#define vldrbq_gather_offset_z_u16(__base, __offset, __p) __arm_vldrbq_gather_offset_z_u16(__base, __offset, __p)
+#define vldrbq_gather_offset_z_u32(__base, __offset, __p) __arm_vldrbq_gather_offset_z_u32(__base, __offset, __p)
+#define vldrbq_gather_offset_z_s8(__base, __offset, __p) __arm_vldrbq_gather_offset_z_s8(__base, __offset, __p)
+#define vldrbq_z_s16(__base, __p) __arm_vldrbq_z_s16(__base, __p)
+#define vldrbq_z_u8(__base, __p) __arm_vldrbq_z_u8(__base, __p)
+#define vldrbq_z_s8(__base, __p) __arm_vldrbq_z_s8(__base, __p)
+#define vldrbq_z_s32(__base, __p) __arm_vldrbq_z_s32(__base, __p)
+#define vldrbq_z_u16(__base, __p) __arm_vldrbq_z_u16(__base, __p)
+#define vldrbq_z_u32(__base, __p) __arm_vldrbq_z_u32(__base, __p)
+#define vldrwq_gather_base_z_u32(__addr,  __offset, __p) __arm_vldrwq_gather_base_z_u32(__addr,  __offset, __p)
+#define vldrwq_gather_base_z_s32(__addr,  __offset, __p) __arm_vldrwq_gather_base_z_s32(__addr,  __offset, __p)
 #endif
 
 __extension__ extern __inline void
@@ -11330,6 +11344,105 @@ __arm_vstrwq_scatter_base_p_u32 (uint32x4_t __addr, const int __offset, uint32x4
 {
   __builtin_mve_vstrwq_scatter_base_p_uv4si (__addr, __offset, __value, __p);
 }
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_gather_offset_z_s8 (int8_t const * __base, uint8x16_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrbq_gather_offset_z_sv16qi ((__builtin_neon_qi *) __base, __offset, __p);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_gather_offset_z_s32 (int8_t const * __base, uint32x4_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrbq_gather_offset_z_sv4si ((__builtin_neon_qi *) __base, __offset, __p);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_gather_offset_z_s16 (int8_t const * __base, uint16x8_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrbq_gather_offset_z_sv8hi ((__builtin_neon_qi *) __base, __offset, __p);
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_gather_offset_z_u8 (uint8_t const * __base, uint8x16_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrbq_gather_offset_z_uv16qi ((__builtin_neon_qi *) __base, __offset, __p);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_gather_offset_z_u32 (uint8_t const * __base, uint32x4_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrbq_gather_offset_z_uv4si ((__builtin_neon_qi *) __base, __offset, __p);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_gather_offset_z_u16 (uint8_t const * __base, uint16x8_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrbq_gather_offset_z_uv8hi ((__builtin_neon_qi *) __base, __offset, __p);
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_z_s8 (int8_t const * __base, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrbq_z_sv16qi ((__builtin_neon_qi *) __base, __p);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_z_s32 (int8_t const * __base, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrbq_z_sv4si ((__builtin_neon_qi *) __base, __p);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_z_s16 (int8_t const * __base, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrbq_z_sv8hi ((__builtin_neon_qi *) __base, __p);
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_z_u8 (uint8_t const * __base, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrbq_z_uv16qi ((__builtin_neon_qi *) __base, __p);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_z_u32 (uint8_t const * __base, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrbq_z_uv4si ((__builtin_neon_qi *) __base, __p);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_z_u16 (uint8_t const * __base, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrbq_z_uv8hi ((__builtin_neon_qi *) __base, __p);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_gather_base_z_s32 (uint32x4_t __addr, const int __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrwq_gather_base_z_sv4si (__addr, __offset, __p);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_gather_base_z_u32 (uint32x4_t __addr, const int __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrwq_gather_base_z_uv4si (__addr, __offset, __p);
+}
+
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
@@ -13471,6 +13584,7 @@ __arm_vsubq_m_n_f16 (float16x8_t __inactive, float16x8_t __a, float16_t __b, mve
 {
   return __builtin_mve_vsubq_m_n_fv8hf (__inactive, __a, __b, __p);
 }
+
 #endif
 
 enum {
@@ -17946,6 +18060,17 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int32x4_t]: __arm_vstrwq_scatter_base_p_s32 (p0, p1, __ARM_mve_coerce(__p2, int32x4_t), p3), \
   int (*)[__ARM_mve_type_uint32x4_t]: __arm_vstrwq_scatter_base_p_u32 (p0, p1, __ARM_mve_coerce(__p2, uint32x4_t), p3));})
 
+#define vldrbq_gather_offset_z(p0,p1,p2) __arm_vldrbq_gather_offset_z(p0,p1,p2)
+#define __arm_vldrbq_gather_offset_z(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8_t_const_ptr][__ARM_mve_type_uint8x16_t]: __arm_vldrbq_gather_offset_z_s8 (__ARM_mve_coerce(__p0, int8_t const *), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
+  int (*)[__ARM_mve_type_int8_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrbq_gather_offset_z_s16 (__ARM_mve_coerce(__p0, int8_t const *), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
+  int (*)[__ARM_mve_type_int8_t_const_ptr][__ARM_mve_type_uint32x4_t]: __arm_vldrbq_gather_offset_z_s32 (__ARM_mve_coerce(__p0, int8_t const *), __ARM_mve_coerce(__p1, uint32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint8_t_const_ptr][__ARM_mve_type_uint8x16_t]: __arm_vldrbq_gather_offset_z_u8 (__ARM_mve_coerce(__p0, uint8_t const *), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
+  int (*)[__ARM_mve_type_uint8_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrbq_gather_offset_z_u16 (__ARM_mve_coerce(__p0, uint8_t const *), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
+  int (*)[__ARM_mve_type_uint8_t_const_ptr][__ARM_mve_type_uint32x4_t]: __arm_vldrbq_gather_offset_z_u32 (__ARM_mve_coerce(__p0, uint8_t const *), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
+
 #endif /* MVE Floating point.  */
 
 #ifdef __cplusplus
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index dc22c2f599702be5a4c4481033d3f413abf6d1c5..45b00e889cfe716a2c70fb86d72eb9a4c411b70d 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -703,3 +703,9 @@ VAR3 (STRSS_P, vstrbq_scatter_offset_p_s, v16qi, v8hi, v4si)
 VAR3 (STRSU_P, vstrbq_scatter_offset_p_u, v16qi, v8hi, v4si)
 VAR1 (STRSBS_P, vstrwq_scatter_base_p_s, v4si)
 VAR1 (STRSBU_P, vstrwq_scatter_base_p_u, v4si)
+VAR1 (LDRGBS_Z, vldrwq_gather_base_z_s, v4si)
+VAR1 (LDRGBU_Z, vldrwq_gather_base_z_u, v4si)
+VAR3 (LDRGS_Z, vldrbq_gather_offset_z_s, v16qi, v8hi, v4si)
+VAR3 (LDRGU_Z, vldrbq_gather_offset_z_u, v16qi, v8hi, v4si)
+VAR3 (LDRS_Z, vldrbq_z_s, v16qi, v8hi, v4si)
+VAR3 (LDRU_Z, vldrbq_z_u, v16qi, v8hi, v4si)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 266a7f830285521fd225c8b07bc15e412b7bab61..05b94e4ee283da427aee18c7223bf5ff4b0e1e4a 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -8135,3 +8135,69 @@
    return "";
 }
   [(set_attr "length" "8")])
+
+;;
+;; [vldrbq_gather_offset_z_s vldrbq_gather_offset_z_u]
+;;
+(define_insn "mve_vldrbq_gather_offset_z_<supf><mode>"
+  [(set (match_operand:MVE_2 0 "s_register_operand" "=&w")
+	(unspec:MVE_2 [(match_operand:<MVE_B_ELEM> 1 "memory_operand" "Us")
+		       (match_operand:MVE_2 2 "s_register_operand" "w")
+		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+	 VLDRBGOQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[4];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   ops[3] = operands[3];
+   if (!strcmp ("<supf>","s") && <V_sz_elem> == 8)
+     output_asm_insn ("vpst\n\tvldrbt.u8\t%q0, [%m1, %q2]",ops);
+   else
+     output_asm_insn ("vpst\n\tvldrbt.<supf><V_sz_elem>\t%q0, [%m1, %q2]",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+;;
+;; [vldrbq_z_s vldrbq_z_u]
+;;
+(define_insn "mve_vldrbq_z_<supf><mode>"
+  [(set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(unspec:MVE_2 [(match_operand:<MVE_B_ELEM> 1 "memory_operand" "Us")
+		       (match_operand:HI 2 "vpr_register_operand" "Up")]
+	 VLDRBQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[2];
+   int regno = REGNO (operands[0]);
+   ops[0] = gen_rtx_REG (TImode, regno);
+   ops[1]  = operands[1];
+   output_asm_insn ("vpst\n\tvldrbt.<supf><V_sz_elem>\t%q0, %E1",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+;;
+;; [vldrwq_gather_base_z_s vldrwq_gather_base_z_u]
+;;
+(define_insn "mve_vldrwq_gather_base_z_<supf>v4si"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=&w")
+	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
+		      (match_operand:SI 2 "immediate_operand" "i")
+		      (match_operand:HI 3 "vpr_register_operand" "Up")]
+	 VLDRWGBQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   output_asm_insn ("vpst\n\tvldrwt.u32\t%q0, [%q1, %2]",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..ce1b2b241989449e0bfee0f5a378eb9b5a3e250a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int8_t const * base, uint16x8_t offset, mve_pred16_t p)
+{
+  return vldrbq_gather_offset_z_s16 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.s16"  }  } */
+
+int16x8_t
+foo1 (int8_t const * base, uint16x8_t offset, mve_pred16_t p)
+{
+  return vldrbq_gather_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.s16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..b79e1c8961cb52b4e16351064ee8f2e1be1baae5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int8_t const * base, uint32x4_t offset, mve_pred16_t p)
+{
+  return vldrbq_gather_offset_z_s32 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.s32"  }  } */
+
+int32x4_t
+foo1 (int8_t const * base, uint32x4_t offset, mve_pred16_t p)
+{
+  return vldrbq_gather_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..816634a7086ce290048a669144e715de03eed71c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo (int8_t const * base, uint8x16_t offset, mve_pred16_t p)
+{
+  return vldrbq_gather_offset_z_s8 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.u8"  }  } */
+
+int8x16_t
+foo1 (int8_t const * base, uint8x16_t offset, mve_pred16_t p)
+{
+  return vldrbq_gather_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.u8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..67722400bb04be883b89d3a882023ef18b1e1c7d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint8_t const * base, uint16x8_t offset, mve_pred16_t p)
+{
+  return vldrbq_gather_offset_z_u16 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.u16"  }  } */
+
+uint16x8_t
+foo1 (uint8_t const * base, uint16x8_t offset, mve_pred16_t p)
+{
+  return vldrbq_gather_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..ed174404330e13e017edd07883dbdb261300f23b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint8_t const * base, uint32x4_t offset, mve_pred16_t p)
+{
+  return vldrbq_gather_offset_z_u32 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.u32"  }  } */
+
+uint32x4_t
+foo1 (uint8_t const * base, uint32x4_t offset, mve_pred16_t p)
+{
+  return vldrbq_gather_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..1ac9f4ee3669222020b6de6b8e89463128b61acd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint8_t const * base, uint8x16_t offset, mve_pred16_t p)
+{
+  return vldrbq_gather_offset_z_u8 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.u8"  }  } */
+
+uint8x16_t
+foo1 (uint8_t const * base, uint8x16_t offset, mve_pred16_t p)
+{
+  return vldrbq_gather_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.u8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..4fd604ed27da1d686f60b76c6a66a5377b32ed3c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int8_t const * base, mve_pred16_t p)
+{
+  return vldrbq_z_s16 (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.s16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..faae9e17c15cc60d72a582e4f341dd2e132c4f7e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int8_t const * base, mve_pred16_t p)
+{
+  return vldrbq_z_s32 (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..1bb0bed237f936739395d48be9509393a888ae3d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo (int8_t const * base, mve_pred16_t p)
+{
+  return vldrbq_z_s8 (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.s8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..50e18a3cecc98fe83727b9256e14e8dd1088aa61
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint8_t const * base, mve_pred16_t p)
+{
+  return vldrbq_z_u16 (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..78590dda496acb26d8eeff273932541f7f1a3869
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint8_t const * base, mve_pred16_t p)
+{
+  return vldrbq_z_u32 (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..54192ae94af60bbb2525d0f3dc830d603de69fb6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint8_t const * base, mve_pred16_t p)
+{
+  return vldrbq_z_u8 (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.u8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_z_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_z_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..3bb1ba826f309e53810aa675e5442d2aeb15f4b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_z_s32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (uint32x4_t addr, mve_pred16_t p)
+{
+  return vldrwq_gather_base_z_s32 (addr, 4, p);
+}
+
+/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_z_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_z_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..2a92e4d915fb77c42efd9be0ab04f43dc836a4d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_z_u32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t addr, mve_pred16_t p)
+{
+  return vldrwq_gather_base_z_u32 (addr, 4, p);
+}
+
+/* { dg-final { scan-assembler "vldrwt.u32"  }  } */


[-- Attachment #2: diff23.patch --]
[-- Type: text/plain, Size: 24735 bytes --]

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index b5639051bf07785d906ed596e08d670f4de1a67e..c3d12375d2fbc933ad33f7a15a3bbf53079d0639 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -653,6 +653,40 @@ arm_ldrgbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate};
 #define LDRGBU_QUALIFIERS (arm_ldrgbu_qualifiers)
 
+static enum arm_type_qualifiers
+arm_ldrgbs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_unsigned, qualifier_immediate,
+      qualifier_unsigned};
+#define LDRGBS_Z_QUALIFIERS (arm_ldrgbs_z_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ldrgbu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
+      qualifier_unsigned};
+#define LDRGBU_Z_QUALIFIERS (arm_ldrgbu_z_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ldrgs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_pointer, qualifier_unsigned,
+      qualifier_unsigned};
+#define LDRGS_Z_QUALIFIERS (arm_ldrgs_z_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ldrgu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_pointer, qualifier_unsigned,
+      qualifier_unsigned};
+#define LDRGU_Z_QUALIFIERS (arm_ldrgu_z_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ldrs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_pointer, qualifier_unsigned};
+#define LDRS_Z_QUALIFIERS (arm_ldrs_z_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ldru_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_pointer, qualifier_unsigned};
+#define LDRU_Z_QUALIFIERS (arm_ldru_z_qualifiers)
+
 /* End of Qualifier for MVE builtins.  */
 
    /* void ([T element type] *, T, immediate).  */
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index f852acb575b596c534c474d19faa73c03cf85c5e..b40a9c238b7f883a0e91dd6fcc5f41182ea6efe3 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -1744,6 +1744,20 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vstrbq_scatter_offset_p_u16( __base, __offset, __value, __p) __arm_vstrbq_scatter_offset_p_u16( __base, __offset, __value, __p)
 #define vstrwq_scatter_base_p_s32(__addr,  __offset, __value, __p) __arm_vstrwq_scatter_base_p_s32(__addr,  __offset, __value, __p)
 #define vstrwq_scatter_base_p_u32(__addr,  __offset, __value, __p) __arm_vstrwq_scatter_base_p_u32(__addr,  __offset, __value, __p)
+#define vldrbq_gather_offset_z_s16(__base, __offset, __p) __arm_vldrbq_gather_offset_z_s16(__base, __offset, __p)
+#define vldrbq_gather_offset_z_u8(__base, __offset, __p) __arm_vldrbq_gather_offset_z_u8(__base, __offset, __p)
+#define vldrbq_gather_offset_z_s32(__base, __offset, __p) __arm_vldrbq_gather_offset_z_s32(__base, __offset, __p)
+#define vldrbq_gather_offset_z_u16(__base, __offset, __p) __arm_vldrbq_gather_offset_z_u16(__base, __offset, __p)
+#define vldrbq_gather_offset_z_u32(__base, __offset, __p) __arm_vldrbq_gather_offset_z_u32(__base, __offset, __p)
+#define vldrbq_gather_offset_z_s8(__base, __offset, __p) __arm_vldrbq_gather_offset_z_s8(__base, __offset, __p)
+#define vldrbq_z_s16(__base, __p) __arm_vldrbq_z_s16(__base, __p)
+#define vldrbq_z_u8(__base, __p) __arm_vldrbq_z_u8(__base, __p)
+#define vldrbq_z_s8(__base, __p) __arm_vldrbq_z_s8(__base, __p)
+#define vldrbq_z_s32(__base, __p) __arm_vldrbq_z_s32(__base, __p)
+#define vldrbq_z_u16(__base, __p) __arm_vldrbq_z_u16(__base, __p)
+#define vldrbq_z_u32(__base, __p) __arm_vldrbq_z_u32(__base, __p)
+#define vldrwq_gather_base_z_u32(__addr,  __offset, __p) __arm_vldrwq_gather_base_z_u32(__addr,  __offset, __p)
+#define vldrwq_gather_base_z_s32(__addr,  __offset, __p) __arm_vldrwq_gather_base_z_s32(__addr,  __offset, __p)
 #endif
 
 __extension__ extern __inline void
@@ -11330,6 +11344,105 @@ __arm_vstrwq_scatter_base_p_u32 (uint32x4_t __addr, const int __offset, uint32x4
 {
   __builtin_mve_vstrwq_scatter_base_p_uv4si (__addr, __offset, __value, __p);
 }
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_gather_offset_z_s8 (int8_t const * __base, uint8x16_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrbq_gather_offset_z_sv16qi ((__builtin_neon_qi *) __base, __offset, __p);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_gather_offset_z_s32 (int8_t const * __base, uint32x4_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrbq_gather_offset_z_sv4si ((__builtin_neon_qi *) __base, __offset, __p);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_gather_offset_z_s16 (int8_t const * __base, uint16x8_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrbq_gather_offset_z_sv8hi ((__builtin_neon_qi *) __base, __offset, __p);
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_gather_offset_z_u8 (uint8_t const * __base, uint8x16_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrbq_gather_offset_z_uv16qi ((__builtin_neon_qi *) __base, __offset, __p);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_gather_offset_z_u32 (uint8_t const * __base, uint32x4_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrbq_gather_offset_z_uv4si ((__builtin_neon_qi *) __base, __offset, __p);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_gather_offset_z_u16 (uint8_t const * __base, uint16x8_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrbq_gather_offset_z_uv8hi ((__builtin_neon_qi *) __base, __offset, __p);
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_z_s8 (int8_t const * __base, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrbq_z_sv16qi ((__builtin_neon_qi *) __base, __p);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_z_s32 (int8_t const * __base, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrbq_z_sv4si ((__builtin_neon_qi *) __base, __p);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_z_s16 (int8_t const * __base, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrbq_z_sv8hi ((__builtin_neon_qi *) __base, __p);
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_z_u8 (uint8_t const * __base, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrbq_z_uv16qi ((__builtin_neon_qi *) __base, __p);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_z_u32 (uint8_t const * __base, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrbq_z_uv4si ((__builtin_neon_qi *) __base, __p);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_z_u16 (uint8_t const * __base, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrbq_z_uv8hi ((__builtin_neon_qi *) __base, __p);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_gather_base_z_s32 (uint32x4_t __addr, const int __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrwq_gather_base_z_sv4si (__addr, __offset, __p);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_gather_base_z_u32 (uint32x4_t __addr, const int __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrwq_gather_base_z_uv4si (__addr, __offset, __p);
+}
+
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
@@ -13471,6 +13584,7 @@ __arm_vsubq_m_n_f16 (float16x8_t __inactive, float16x8_t __a, float16_t __b, mve
 {
   return __builtin_mve_vsubq_m_n_fv8hf (__inactive, __a, __b, __p);
 }
+
 #endif
 
 enum {
@@ -17946,6 +18060,17 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int32x4_t]: __arm_vstrwq_scatter_base_p_s32 (p0, p1, __ARM_mve_coerce(__p2, int32x4_t), p3), \
   int (*)[__ARM_mve_type_uint32x4_t]: __arm_vstrwq_scatter_base_p_u32 (p0, p1, __ARM_mve_coerce(__p2, uint32x4_t), p3));})
 
+#define vldrbq_gather_offset_z(p0,p1,p2) __arm_vldrbq_gather_offset_z(p0,p1,p2)
+#define __arm_vldrbq_gather_offset_z(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8_t_const_ptr][__ARM_mve_type_uint8x16_t]: __arm_vldrbq_gather_offset_z_s8 (__ARM_mve_coerce(__p0, int8_t const *), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
+  int (*)[__ARM_mve_type_int8_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrbq_gather_offset_z_s16 (__ARM_mve_coerce(__p0, int8_t const *), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
+  int (*)[__ARM_mve_type_int8_t_const_ptr][__ARM_mve_type_uint32x4_t]: __arm_vldrbq_gather_offset_z_s32 (__ARM_mve_coerce(__p0, int8_t const *), __ARM_mve_coerce(__p1, uint32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint8_t_const_ptr][__ARM_mve_type_uint8x16_t]: __arm_vldrbq_gather_offset_z_u8 (__ARM_mve_coerce(__p0, uint8_t const *), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
+  int (*)[__ARM_mve_type_uint8_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrbq_gather_offset_z_u16 (__ARM_mve_coerce(__p0, uint8_t const *), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
+  int (*)[__ARM_mve_type_uint8_t_const_ptr][__ARM_mve_type_uint32x4_t]: __arm_vldrbq_gather_offset_z_u32 (__ARM_mve_coerce(__p0, uint8_t const *), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
+
 #endif /* MVE Floating point.  */
 
 #ifdef __cplusplus
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index dc22c2f599702be5a4c4481033d3f413abf6d1c5..45b00e889cfe716a2c70fb86d72eb9a4c411b70d 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -703,3 +703,9 @@ VAR3 (STRSS_P, vstrbq_scatter_offset_p_s, v16qi, v8hi, v4si)
 VAR3 (STRSU_P, vstrbq_scatter_offset_p_u, v16qi, v8hi, v4si)
 VAR1 (STRSBS_P, vstrwq_scatter_base_p_s, v4si)
 VAR1 (STRSBU_P, vstrwq_scatter_base_p_u, v4si)
+VAR1 (LDRGBS_Z, vldrwq_gather_base_z_s, v4si)
+VAR1 (LDRGBU_Z, vldrwq_gather_base_z_u, v4si)
+VAR3 (LDRGS_Z, vldrbq_gather_offset_z_s, v16qi, v8hi, v4si)
+VAR3 (LDRGU_Z, vldrbq_gather_offset_z_u, v16qi, v8hi, v4si)
+VAR3 (LDRS_Z, vldrbq_z_s, v16qi, v8hi, v4si)
+VAR3 (LDRU_Z, vldrbq_z_u, v16qi, v8hi, v4si)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 266a7f830285521fd225c8b07bc15e412b7bab61..05b94e4ee283da427aee18c7223bf5ff4b0e1e4a 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -8135,3 +8135,69 @@
    return "";
 }
   [(set_attr "length" "8")])
+
+;;
+;; [vldrbq_gather_offset_z_s vldrbq_gather_offset_z_u]
+;;
+(define_insn "mve_vldrbq_gather_offset_z_<supf><mode>"
+  [(set (match_operand:MVE_2 0 "s_register_operand" "=&w")
+	(unspec:MVE_2 [(match_operand:<MVE_B_ELEM> 1 "memory_operand" "Us")
+		       (match_operand:MVE_2 2 "s_register_operand" "w")
+		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+	 VLDRBGOQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[4];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   ops[3] = operands[3];
+   if (!strcmp ("<supf>","s") && <V_sz_elem> == 8)
+     output_asm_insn ("vpst\n\tvldrbt.u8\t%q0, [%m1, %q2]",ops);
+   else
+     output_asm_insn ("vpst\n\tvldrbt.<supf><V_sz_elem>\t%q0, [%m1, %q2]",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+;;
+;; [vldrbq_z_s vldrbq_z_u]
+;;
+(define_insn "mve_vldrbq_z_<supf><mode>"
+  [(set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(unspec:MVE_2 [(match_operand:<MVE_B_ELEM> 1 "memory_operand" "Us")
+		       (match_operand:HI 2 "vpr_register_operand" "Up")]
+	 VLDRBQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[2];
+   int regno = REGNO (operands[0]);
+   ops[0] = gen_rtx_REG (TImode, regno);
+   ops[1]  = operands[1];
+   output_asm_insn ("vpst\n\tvldrbt.<supf><V_sz_elem>\t%q0, %E1",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+;;
+;; [vldrwq_gather_base_z_s vldrwq_gather_base_z_u]
+;;
+(define_insn "mve_vldrwq_gather_base_z_<supf>v4si"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=&w")
+	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
+		      (match_operand:SI 2 "immediate_operand" "i")
+		      (match_operand:HI 3 "vpr_register_operand" "Up")]
+	 VLDRWGBQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   output_asm_insn ("vpst\n\tvldrwt.u32\t%q0, [%q1, %2]",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..ce1b2b241989449e0bfee0f5a378eb9b5a3e250a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int8_t const * base, uint16x8_t offset, mve_pred16_t p)
+{
+  return vldrbq_gather_offset_z_s16 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.s16"  }  } */
+
+int16x8_t
+foo1 (int8_t const * base, uint16x8_t offset, mve_pred16_t p)
+{
+  return vldrbq_gather_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.s16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..b79e1c8961cb52b4e16351064ee8f2e1be1baae5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int8_t const * base, uint32x4_t offset, mve_pred16_t p)
+{
+  return vldrbq_gather_offset_z_s32 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.s32"  }  } */
+
+int32x4_t
+foo1 (int8_t const * base, uint32x4_t offset, mve_pred16_t p)
+{
+  return vldrbq_gather_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..816634a7086ce290048a669144e715de03eed71c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo (int8_t const * base, uint8x16_t offset, mve_pred16_t p)
+{
+  return vldrbq_gather_offset_z_s8 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.u8"  }  } */
+
+int8x16_t
+foo1 (int8_t const * base, uint8x16_t offset, mve_pred16_t p)
+{
+  return vldrbq_gather_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.u8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..67722400bb04be883b89d3a882023ef18b1e1c7d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint8_t const * base, uint16x8_t offset, mve_pred16_t p)
+{
+  return vldrbq_gather_offset_z_u16 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.u16"  }  } */
+
+uint16x8_t
+foo1 (uint8_t const * base, uint16x8_t offset, mve_pred16_t p)
+{
+  return vldrbq_gather_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..ed174404330e13e017edd07883dbdb261300f23b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint8_t const * base, uint32x4_t offset, mve_pred16_t p)
+{
+  return vldrbq_gather_offset_z_u32 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.u32"  }  } */
+
+uint32x4_t
+foo1 (uint8_t const * base, uint32x4_t offset, mve_pred16_t p)
+{
+  return vldrbq_gather_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..1ac9f4ee3669222020b6de6b8e89463128b61acd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_z_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint8_t const * base, uint8x16_t offset, mve_pred16_t p)
+{
+  return vldrbq_gather_offset_z_u8 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.u8"  }  } */
+
+uint8x16_t
+foo1 (uint8_t const * base, uint8x16_t offset, mve_pred16_t p)
+{
+  return vldrbq_gather_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.u8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..4fd604ed27da1d686f60b76c6a66a5377b32ed3c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int8_t const * base, mve_pred16_t p)
+{
+  return vldrbq_z_s16 (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.s16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..faae9e17c15cc60d72a582e4f341dd2e132c4f7e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int8_t const * base, mve_pred16_t p)
+{
+  return vldrbq_z_s32 (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..1bb0bed237f936739395d48be9509393a888ae3d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo (int8_t const * base, mve_pred16_t p)
+{
+  return vldrbq_z_s8 (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.s8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..50e18a3cecc98fe83727b9256e14e8dd1088aa61
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint8_t const * base, mve_pred16_t p)
+{
+  return vldrbq_z_u16 (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..78590dda496acb26d8eeff273932541f7f1a3869
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint8_t const * base, mve_pred16_t p)
+{
+  return vldrbq_z_u32 (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..54192ae94af60bbb2525d0f3dc830d603de69fb6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint8_t const * base, mve_pred16_t p)
+{
+  return vldrbq_z_u8 (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.u8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_z_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_z_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..3bb1ba826f309e53810aa675e5442d2aeb15f4b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_z_s32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (uint32x4_t addr, mve_pred16_t p)
+{
+  return vldrwq_gather_base_z_s32 (addr, 4, p);
+}
+
+/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_z_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_z_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..2a92e4d915fb77c42efd9be0ab04f43dc836a4d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_z_u32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t addr, mve_pred16_t p)
+{
+  return vldrwq_gather_base_z_u32 (addr, 4, p);
+}
+
+/* { dg-final { scan-assembler "vldrwt.u32"  }  } */


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][1/5x]: MVE store intrinsics.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (16 preceding siblings ...)
  2019-11-14 19:15 ` [PATCH][ARM][GCC][8/5x]: Remaining MVE store intrinsics which stores an half word, word and double " Srinath Parvathaneni
@ 2019-11-14 19:15 ` Srinath Parvathaneni
  2019-11-14 19:15 ` [PATCH][ARM][GCC][2/5x]: MVE load intrinsics Srinath Parvathaneni
                   ` (20 subsequent siblings)
  38 siblings, 0 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 33022 bytes --]

Hello,

This patch supports the following MVE ACLE store intrinsics.

vstrbq_scatter_offset_s8, vstrbq_scatter_offset_s32, vstrbq_scatter_offset_s16,
vstrbq_scatter_offset_u8, vstrbq_scatter_offset_u32, vstrbq_scatter_offset_u16,
vstrbq_s8, vstrbq_s32, vstrbq_s16, vstrbq_u8, vstrbq_u32, vstrbq_u16,
vstrwq_scatter_base_s32, vstrwq_scatter_base_u32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1]  https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-01  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm-builtins.c (STRS_QUALIFIERS): Define builtin qualifier.
	(STRU_QUALIFIERS): Likewise.
	(STRSS_QUALIFIERS): Likewise.
	(STRSU_QUALIFIERS): Likewise.
	(STRSBS_QUALIFIERS): Likewise.
	(STRSBU_QUALIFIERS): Likewise.
	* config/arm/arm_mve.h (vstrbq_s8): Define macro.
	(vstrbq_u8): Likewise.
	(vstrbq_u16): Likewise.
	(vstrbq_scatter_offset_s8): Likewise.
	(vstrbq_scatter_offset_u8): Likewise.
	(vstrbq_scatter_offset_u16): Likewise.
	(vstrbq_s16): Likewise.
	(vstrbq_u32): Likewise.
	(vstrbq_scatter_offset_s16): Likewise.
	(vstrbq_scatter_offset_u32): Likewise.
	(vstrbq_s32): Likewise.
	(vstrbq_scatter_offset_s32): Likewise.
	(vstrwq_scatter_base_s32): Likewise.
	(vstrwq_scatter_base_u32): Likewise.
	(__arm_vstrbq_scatter_offset_s8): Define intrinsic.
	(__arm_vstrbq_scatter_offset_s32): Likewise.
	(__arm_vstrbq_scatter_offset_s16): Likewise.
	(__arm_vstrbq_scatter_offset_u8): Likewise.
	(__arm_vstrbq_scatter_offset_u32): Likewise.
	(__arm_vstrbq_scatter_offset_u16): Likewise.
	(__arm_vstrbq_s8): Likewise.
	(__arm_vstrbq_s32): Likewise.
	(__arm_vstrbq_s16): Likewise.
	(__arm_vstrbq_u8): Likewise.
	(__arm_vstrbq_u32): Likewise.
	(__arm_vstrbq_u16): Likewise.
	(__arm_vstrwq_scatter_base_s32): Likewise.
	(__arm_vstrwq_scatter_base_u32): Likewise.
	(vstrbq): Define polymorphic variant.
	(vstrbq_scatter_offset): Likewise.
	(vstrwq_scatter_base): Likewise.
	* config/arm/arm_mve_builtins.def (STRS_QUALIFIERS): Use builtin
	qualifier.
        (STRU_QUALIFIERS): Likewise.
        (STRSS_QUALIFIERS): Likewise.
        (STRSU_QUALIFIERS): Likewise.
        (STRSBS_QUALIFIERS): Likewise.
        (STRSBU_QUALIFIERS): Likewise.
	* config/arm/mve.md (MVE_B_ELEM): Define mode attribute iterator.
	(VSTRWSBQ): Define iterators.
	(VSTRBSOQ): Likewise. 
	(VSTRBQ): Likewise.
	(mve_vstrbq_<supf><mode>): Define RTL pattern.
	(mve_vstrbq_scatter_offset_<supf><mode>): Likewise.
	(mve_vstrwq_scatter_base_<supf>v4si): Likewise.

gcc/testsuite/ChangeLog:

2019-11-01  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vstrbq_s16.c: New test.
	* gcc.target/arm/mve/intrinsics/vstrbq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrbq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrbq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrbq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrbq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_u32.c: Likewise.


###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 6dffb36fe179357c62cb4f35d486513971e3487d..ec88199bb5e7e9c15a346061c70841f3086004ef 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -555,6 +555,39 @@ arm_quadop_unone_unone_unone_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define QUADOP_UNONE_UNONE_UNONE_NONE_UNONE_QUALIFIERS \
   (arm_quadop_unone_unone_unone_none_unone_qualifiers)
 
+static enum arm_type_qualifiers
+arm_strs_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_pointer, qualifier_none };
+#define STRS_QUALIFIERS (arm_strs_qualifiers)
+
+static enum arm_type_qualifiers
+arm_stru_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_pointer, qualifier_unsigned };
+#define STRU_QUALIFIERS (arm_stru_qualifiers)
+
+static enum arm_type_qualifiers
+arm_strss_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_pointer, qualifier_unsigned,
+      qualifier_none};
+#define STRSS_QUALIFIERS (arm_strss_qualifiers)
+
+static enum arm_type_qualifiers
+arm_strsu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_pointer, qualifier_unsigned,
+      qualifier_unsigned};
+#define STRSU_QUALIFIERS (arm_strsu_qualifiers)
+
+static enum arm_type_qualifiers
+arm_strsbs_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_unsigned, qualifier_immediate, qualifier_none};
+#define STRSBS_QUALIFIERS (arm_strsbs_qualifiers)
+
+static enum arm_type_qualifiers
+arm_strsbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_unsigned, qualifier_immediate,
+      qualifier_unsigned};
+#define STRSBU_QUALIFIERS (arm_strsbu_qualifiers)
+
 /* End of Qualifier for MVE builtins.  */
 
    /* void ([T element type] *, T, immediate).  */
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 896410a843716376b576016a07bc9ece41f63e7d..3d64033dd23636b041cbd6ecb3eb18c3b9db3aed 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -1702,6 +1702,20 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vsubq_m_f16(__inactive, __a, __b, __p) __arm_vsubq_m_f16(__inactive, __a, __b, __p)
 #define vsubq_m_n_f32(__inactive, __a, __b, __p) __arm_vsubq_m_n_f32(__inactive, __a, __b, __p)
 #define vsubq_m_n_f16(__inactive, __a, __b, __p) __arm_vsubq_m_n_f16(__inactive, __a, __b, __p)
+#define vstrbq_s8( __addr, __value) __arm_vstrbq_s8( __addr, __value)
+#define vstrbq_u8( __addr, __value) __arm_vstrbq_u8( __addr, __value)
+#define vstrbq_u16( __addr, __value) __arm_vstrbq_u16( __addr, __value)
+#define vstrbq_scatter_offset_s8( __base, __offset, __value) __arm_vstrbq_scatter_offset_s8( __base, __offset, __value)
+#define vstrbq_scatter_offset_u8( __base, __offset, __value) __arm_vstrbq_scatter_offset_u8( __base, __offset, __value)
+#define vstrbq_scatter_offset_u16( __base, __offset, __value) __arm_vstrbq_scatter_offset_u16( __base, __offset, __value)
+#define vstrbq_s16( __addr, __value) __arm_vstrbq_s16( __addr, __value)
+#define vstrbq_u32( __addr, __value) __arm_vstrbq_u32( __addr, __value)
+#define vstrbq_scatter_offset_s16( __base, __offset, __value) __arm_vstrbq_scatter_offset_s16( __base, __offset, __value)
+#define vstrbq_scatter_offset_u32( __base, __offset, __value) __arm_vstrbq_scatter_offset_u32( __base, __offset, __value)
+#define vstrbq_s32( __addr, __value) __arm_vstrbq_s32( __addr, __value)
+#define vstrbq_scatter_offset_s32( __base, __offset, __value) __arm_vstrbq_scatter_offset_s32( __base, __offset, __value)
+#define vstrwq_scatter_base_s32(__addr,  __offset, __value) __arm_vstrwq_scatter_base_s32(__addr,  __offset, __value)
+#define vstrwq_scatter_base_u32(__addr,  __offset, __value) __arm_vstrwq_scatter_base_u32(__addr,  __offset, __value)
 #endif
 
 __extension__ extern __inline void
@@ -10995,6 +11009,103 @@ __arm_vshrntq_m_n_u16 (uint8x16_t __a, uint16x8_t __b, const int __imm, mve_pred
   return __builtin_mve_vshrntq_m_n_uv8hi (__a, __b, __imm, __p);
 }
 
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_scatter_offset_s8 (int8_t * __base, uint8x16_t __offset, int8x16_t __value)
+{
+  __builtin_mve_vstrbq_scatter_offset_sv16qi ((__builtin_neon_qi *) __base, __offset, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_scatter_offset_s32 (int8_t * __base, uint32x4_t __offset, int32x4_t __value)
+{
+  __builtin_mve_vstrbq_scatter_offset_sv4si ((__builtin_neon_qi *) __base, __offset, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_scatter_offset_s16 (int8_t * __base, uint16x8_t __offset, int16x8_t __value)
+{
+  __builtin_mve_vstrbq_scatter_offset_sv8hi ((__builtin_neon_qi *) __base, __offset, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_scatter_offset_u8 (uint8_t * __base, uint8x16_t __offset, uint8x16_t __value)
+{
+  __builtin_mve_vstrbq_scatter_offset_uv16qi ((__builtin_neon_qi *) __base, __offset, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_scatter_offset_u32 (uint8_t * __base, uint32x4_t __offset, uint32x4_t __value)
+{
+  __builtin_mve_vstrbq_scatter_offset_uv4si ((__builtin_neon_qi *) __base, __offset, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_scatter_offset_u16 (uint8_t * __base, uint16x8_t __offset, uint16x8_t __value)
+{
+  __builtin_mve_vstrbq_scatter_offset_uv8hi ((__builtin_neon_qi *) __base, __offset, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_s8 (int8_t * __addr, int8x16_t __value)
+{
+  __builtin_mve_vstrbq_sv16qi ((__builtin_neon_qi *) __addr, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_s32 (int8_t * __addr, int32x4_t __value)
+{
+  __builtin_mve_vstrbq_sv4si ((__builtin_neon_qi *) __addr, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_s16 (int8_t * __addr, int16x8_t __value)
+{
+  __builtin_mve_vstrbq_sv8hi ((__builtin_neon_qi *) __addr, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_u8 (uint8_t * __addr, uint8x16_t __value)
+{
+  __builtin_mve_vstrbq_uv16qi ((__builtin_neon_qi *) __addr, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_u32 (uint8_t * __addr, uint32x4_t __value)
+{
+  __builtin_mve_vstrbq_uv4si ((__builtin_neon_qi *) __addr, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_u16 (uint8_t * __addr, uint16x8_t __value)
+{
+  __builtin_mve_vstrbq_uv8hi ((__builtin_neon_qi *) __addr, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrwq_scatter_base_s32 (uint32x4_t __addr, const int __offset, int32x4_t __value)
+{
+  __builtin_mve_vstrwq_scatter_base_sv4si (__addr, __offset, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrwq_scatter_base_u32 (uint32x4_t __addr, const int __offset, uint32x4_t __value)
+{
+  __builtin_mve_vstrwq_scatter_base_uv4si (__addr, __offset, __value);
+}
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
@@ -17542,6 +17653,35 @@ extern void *__ARM_undef;
 #define vrmlsldavhaxq_p(p0,p1,p2,p3) __arm_vrmlsldavhaxq_p(p0,p1,p2,p3)
 #define __arm_vrmlsldavhaxq_p(p0,p1,p2,p3) __arm_vrmlsldavhaxq_p_s32(p0,p1,p2,p3)
 
+#define vstrbq(p0,p1) __arm_vstrbq(p0,p1)
+#define __arm_vstrbq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16_t]: __arm_vstrbq_s8 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, int8x16_t)), \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int16x8_t]: __arm_vstrbq_s16 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, int16x8_t)), \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vstrbq_s32 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16_t]: __arm_vstrbq_u8 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16_t)), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vstrbq_u16 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vstrbq_u32 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint32x4_t)));})
+
+#define vstrbq_scatter_offset(p0,p1,p2) __arm_vstrbq_scatter_offset(p0,p1,p2)
+#define __arm_vstrbq_scatter_offset(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int8x16_t]: __arm_vstrbq_scatter_offset_s8 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, int8x16_t)), \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vstrbq_scatter_offset_s16 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, int16x8_t)), \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vstrbq_scatter_offset_s32 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vstrbq_scatter_offset_u8 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t)), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vstrbq_scatter_offset_u16 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vstrbq_scatter_offset_u32 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t)));})
+
+#define vstrwq_scatter_base(p0,p1,p2) __arm_vstrwq_scatter_base(p0,p1,p2)
+#define __arm_vstrwq_scatter_base(p0,p1,p2) ({ __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vstrwq_scatter_base_s32(p0, p1, __ARM_mve_coerce(__p2, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vstrwq_scatter_base_u32(p0, p1, __ARM_mve_coerce(__p2, uint32x4_t)));})
+
 #endif /* MVE Floating point.  */
 
 #ifdef __cplusplus
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index a18b9b3f433ead3c5d9e73afa3c2d94ae87765f7..149c4115b6167e3f2f0eb43fbdeaa8708ba5d723 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -685,3 +685,9 @@ VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vandq_m_f, v8hf, v4sf)
 VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vaddq_m_n_f, v8hf, v4sf)
 VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vaddq_m_f, v8hf, v4sf)
 VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vabdq_m_f, v8hf, v4sf)
+VAR3 (STRS, vstrbq_s, v16qi, v8hi, v4si)
+VAR3 (STRU, vstrbq_u, v16qi, v8hi, v4si)
+VAR3 (STRSS, vstrbq_scatter_offset_s, v16qi, v8hi, v4si)
+VAR3 (STRSU, vstrbq_scatter_offset_u, v16qi, v8hi, v4si)
+VAR1 (STRSBS, vstrwq_scatter_base_s, v4si)
+VAR1 (STRSBU, vstrwq_scatter_base_u, v4si)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 741c1e84e7d2379d22cf6239649019453b037bc2..eb365166e934f32f7695e5354258151202f378c0 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -191,7 +191,8 @@
 			 VCMLAQ_ROT90_M_F VCMULQ_M_F VCMULQ_ROT180_M_F
 			 VCMULQ_ROT270_M_F VCMULQ_ROT90_M_F VFMAQ_M_F
 			 VFMAQ_M_N_F VFMASQ_M_N_F VFMSQ_M_F VMAXNMQ_M_F
-			 VMINNMQ_M_F VSUBQ_M_F])
+			 VMINNMQ_M_F VSUBQ_M_F VSTRWQSB_S VSTRWQSB_U
+			 VSTRBQSO_S VSTRBQSO_U VSTRBQ_S VSTRBQ_U])
 
 (define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF")
 			    (V8HF "V8HI") (V4SF "V4SI")])
@@ -342,7 +343,9 @@
 		       (VQRSHRNTQ_M_N_S "s") (VQRSHRNTQ_M_N_U "u")
 		       (VQRSHRNBQ_M_N_S "s") (VQRSHRNBQ_M_N_U "u")
 		       (VMLALDAVAXQ_P_S "s") (VMLALDAVAXQ_P_U "u")
-		       (VMLALDAVAQ_P_S "s") (VMLALDAVAQ_P_U "u")])
+		       (VMLALDAVAQ_P_S "s") (VMLALDAVAQ_P_U "u")
+		       (VSTRWQSB_S "s") (VSTRWQSB_U "u") (VSTRBQSO_S "s")
+		       (VSTRBQSO_U "u") (VSTRBQ_S "s") (VSTRBQ_U "u")])
 
 (define_int_attr mode1 [(VCTP8Q "8") (VCTP16Q "16") (VCTP32Q "32")
 			(VCTP64Q "64") (VCTP8Q_M "8") (VCTP16Q_M "16")
@@ -359,6 +362,7 @@
 
 (define_mode_attr MVE_constraint1 [ (V8HI "Ra") (V4SI "Rc")])
 (define_mode_attr MVE_pred1 [ (V8HI "mve_imm_7") (V4SI "mve_imm_15")])
+(define_mode_attr MVE_B_ELEM [ (V16QI "V16QI") (V8HI "V8QI") (V4SI "V4QI")])
 
 (define_int_iterator VCVTQ_TO_F [VCVTQ_TO_F_S VCVTQ_TO_F_U])
 (define_int_iterator VMVNQ_N [VMVNQ_N_U VMVNQ_N_S])
@@ -562,6 +566,9 @@
 (define_int_iterator VSHLLTQ_M_N [VSHLLTQ_M_N_U VSHLLTQ_M_N_S])
 (define_int_iterator VSHRNBQ_M_N [VSHRNBQ_M_N_S VSHRNBQ_M_N_U])
 (define_int_iterator VSHRNTQ_M_N [VSHRNTQ_M_N_S VSHRNTQ_M_N_U])
+(define_int_iterator VSTRWSBQ [VSTRWQSB_S VSTRWQSB_U])
+(define_int_iterator VSTRBSOQ [VSTRBQSO_S VSTRBQSO_U])
+(define_int_iterator VSTRBQ [VSTRBQ_S VSTRBQ_U])
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
@@ -7933,3 +7940,65 @@
   "vpst\;vsubt.f%#<V_sz_elem>\t%q0, %q2, %3"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
+
+;;
+;; [vstrbq_s vstrbq_u]
+;;
+(define_insn "mve_vstrbq_<supf><mode>"
+  [(set (match_operand:<MVE_B_ELEM> 0 "memory_operand" "=Us")
+	(unspec:<MVE_B_ELEM> [(match_operand:MVE_2 1 "s_register_operand" "w")]
+	 VSTRBQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[2];
+   int regno = REGNO (operands[1]);
+   ops[1] = gen_rtx_REG (TImode, regno);
+   ops[0]  = operands[0];
+   output_asm_insn("vstrb.<V_sz_elem>\t%q1, %E0",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+;;
+;; [vstrbq_scatter_offset_s vstrbq_scatter_offset_u]
+;;
+(define_insn "mve_vstrbq_scatter_offset_<supf><mode>"
+  [(set (match_operand:<MVE_B_ELEM> 0 "memory_operand" "=Us")
+	(unspec:<MVE_B_ELEM>
+		[(match_operand:MVE_2 1 "s_register_operand" "w")
+		 (match_operand:MVE_2 2 "s_register_operand" "w")]
+	 VSTRBSOQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   output_asm_insn("vstrb.<V_sz_elem>\t%q2, [%m0, %q1]",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+;;
+;; [vstrwq_scatter_base_s vstrwq_scatter_base_u]
+;;
+(define_insn "mve_vstrwq_scatter_base_<supf>v4si"
+  [(set (mem:BLK (scratch))
+	(unspec:BLK
+		[(match_operand:V4SI 0 "s_register_operand" "w")
+		 (match_operand:SI 1 "immediate_operand" "i")
+		 (match_operand:V4SI 2 "s_register_operand" "w")]
+	 VSTRWSBQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   output_asm_insn("vstrw.u32\t%q2, [%q0, %1]",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..7c34f7f3b8e22bdc6e64f52fa306833d339c42e4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int8_t * addr, int16x8_t value)
+{
+  vstrbq_s16 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.16"  }  } */
+
+void
+foo1 (int8_t * addr, int16x8_t value)
+{
+  vstrbq (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..a0925529e7c2066d034b20519cea660e159fc4d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int8_t * addr, int32x4_t value)
+{
+  vstrbq_s32 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.32"  }  } */
+
+void
+foo1 (int8_t * addr, int32x4_t value)
+{
+  vstrbq (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..d56e6c208c07eec84353c080f2cd209a17451736
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int8_t * addr, int8x16_t value)
+{
+  vstrbq_s8 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.8"  }  } */
+
+void
+foo1 (int8_t * addr, int8x16_t value)
+{
+  vstrbq (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..aacf7fc87de328013a3597bd554a821738d404a8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int8_t * base, uint16x8_t offset, int16x8_t value)
+{
+  vstrbq_scatter_offset_s16 (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.16"  }  } */
+
+void
+foo1 (int8_t * base, uint16x8_t offset, int16x8_t value)
+{
+  vstrbq_scatter_offset (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..ef8a9b1aa97311a9e9ef3d61f8aa130652e1f354
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int8_t * base, uint32x4_t offset, int32x4_t value)
+{
+  vstrbq_scatter_offset_s32 (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.32"  }  } */
+
+void
+foo1 (int8_t * base, uint32x4_t offset, int32x4_t value)
+{
+  vstrbq_scatter_offset (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..ee5ec04693a7166e5d498fcbca22db88d093485e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int8_t * base, uint8x16_t offset, int8x16_t value)
+{
+  vstrbq_scatter_offset_s8 (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.8"  }  } */
+
+void
+foo1 (int8_t * base, uint8x16_t offset, int8x16_t value)
+{
+  vstrbq_scatter_offset (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..f73cf9ec0590e86da71981c5febff14d13fab6ab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint8_t * base, uint16x8_t offset, uint16x8_t value)
+{
+  vstrbq_scatter_offset_u16 (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.16"  }  } */
+
+void
+foo1 (uint8_t * base, uint16x8_t offset, uint16x8_t value)
+{
+  vstrbq_scatter_offset (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..b305b1c42244c844b15972695ad81da4c992d90d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint8_t * base, uint32x4_t offset, uint32x4_t value)
+{
+  vstrbq_scatter_offset_u32 (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.32"  }  } */
+
+void
+foo1 (uint8_t * base, uint32x4_t offset, uint32x4_t value)
+{
+  vstrbq_scatter_offset (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..cf0c267d8f9939d8f12122ece12166e102898dc5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint8_t * base, uint8x16_t offset, uint8x16_t value)
+{
+  vstrbq_scatter_offset_u8 (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.8"  }  } */
+
+void
+foo1 (uint8_t * base, uint8x16_t offset, uint8x16_t value)
+{
+  vstrbq_scatter_offset (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..4cdf967aa68e035ec428bb2d1b4857d3599a3d4f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint8_t * addr, uint16x8_t value)
+{
+  vstrbq_u16 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.16"  }  } */
+
+void
+foo1 (uint8_t * addr, uint16x8_t value)
+{
+  vstrbq (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..c64ed808289f6a17968014b284909e2e72ece8fe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint8_t * addr, uint32x4_t value)
+{
+  vstrbq_u32 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.32"  }  } */
+
+void
+foo1 (uint8_t * addr, uint32x4_t value)
+{
+  vstrbq (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..94d14a4fb71cec8fdb07438be1a1d1fc29e55198
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint8_t * addr, uint8x16_t value)
+{
+  vstrbq_u8 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.8"  }  } */
+
+void
+foo1 (uint8_t * addr, uint8x16_t value)
+{
+  vstrbq (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..d107819f7dfcec71471d67eabbcf32b425f426dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint32x4_t addr, int32x4_t value)
+{
+  vstrwq_scatter_base_s32 (addr, 8, value);
+}
+
+/* { dg-final { scan-assembler "vstrw.u32"  }  } */
+
+void
+foo1 (uint32x4_t addr, int32x4_t value)
+{
+  vstrwq_scatter_base (addr, 8, value);
+}
+
+/* { dg-final { scan-assembler "vstrw.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..148516f57fb115ea5bd0dbb5c5bee4fe3b8d02e8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint32x4_t addr, uint32x4_t value)
+{
+  vstrwq_scatter_base_u32 (addr, 8, value);
+}
+
+/* { dg-final { scan-assembler "vstrw.u32"  }  } */
+
+void
+foo1 (uint32x4_t addr, uint32x4_t value)
+{
+  vstrwq_scatter_base (addr, 8, value);
+}
+
+/* { dg-final { scan-assembler "vstrw.u32"  }  } */


[-- Attachment #2: diff20.patch --]
[-- Type: text/plain, Size: 28275 bytes --]

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 6dffb36fe179357c62cb4f35d486513971e3487d..ec88199bb5e7e9c15a346061c70841f3086004ef 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -555,6 +555,39 @@ arm_quadop_unone_unone_unone_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define QUADOP_UNONE_UNONE_UNONE_NONE_UNONE_QUALIFIERS \
   (arm_quadop_unone_unone_unone_none_unone_qualifiers)
 
+static enum arm_type_qualifiers
+arm_strs_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_pointer, qualifier_none };
+#define STRS_QUALIFIERS (arm_strs_qualifiers)
+
+static enum arm_type_qualifiers
+arm_stru_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_pointer, qualifier_unsigned };
+#define STRU_QUALIFIERS (arm_stru_qualifiers)
+
+static enum arm_type_qualifiers
+arm_strss_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_pointer, qualifier_unsigned,
+      qualifier_none};
+#define STRSS_QUALIFIERS (arm_strss_qualifiers)
+
+static enum arm_type_qualifiers
+arm_strsu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_pointer, qualifier_unsigned,
+      qualifier_unsigned};
+#define STRSU_QUALIFIERS (arm_strsu_qualifiers)
+
+static enum arm_type_qualifiers
+arm_strsbs_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_unsigned, qualifier_immediate, qualifier_none};
+#define STRSBS_QUALIFIERS (arm_strsbs_qualifiers)
+
+static enum arm_type_qualifiers
+arm_strsbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_unsigned, qualifier_immediate,
+      qualifier_unsigned};
+#define STRSBU_QUALIFIERS (arm_strsbu_qualifiers)
+
 /* End of Qualifier for MVE builtins.  */
 
    /* void ([T element type] *, T, immediate).  */
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 896410a843716376b576016a07bc9ece41f63e7d..3d64033dd23636b041cbd6ecb3eb18c3b9db3aed 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -1702,6 +1702,20 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vsubq_m_f16(__inactive, __a, __b, __p) __arm_vsubq_m_f16(__inactive, __a, __b, __p)
 #define vsubq_m_n_f32(__inactive, __a, __b, __p) __arm_vsubq_m_n_f32(__inactive, __a, __b, __p)
 #define vsubq_m_n_f16(__inactive, __a, __b, __p) __arm_vsubq_m_n_f16(__inactive, __a, __b, __p)
+#define vstrbq_s8( __addr, __value) __arm_vstrbq_s8( __addr, __value)
+#define vstrbq_u8( __addr, __value) __arm_vstrbq_u8( __addr, __value)
+#define vstrbq_u16( __addr, __value) __arm_vstrbq_u16( __addr, __value)
+#define vstrbq_scatter_offset_s8( __base, __offset, __value) __arm_vstrbq_scatter_offset_s8( __base, __offset, __value)
+#define vstrbq_scatter_offset_u8( __base, __offset, __value) __arm_vstrbq_scatter_offset_u8( __base, __offset, __value)
+#define vstrbq_scatter_offset_u16( __base, __offset, __value) __arm_vstrbq_scatter_offset_u16( __base, __offset, __value)
+#define vstrbq_s16( __addr, __value) __arm_vstrbq_s16( __addr, __value)
+#define vstrbq_u32( __addr, __value) __arm_vstrbq_u32( __addr, __value)
+#define vstrbq_scatter_offset_s16( __base, __offset, __value) __arm_vstrbq_scatter_offset_s16( __base, __offset, __value)
+#define vstrbq_scatter_offset_u32( __base, __offset, __value) __arm_vstrbq_scatter_offset_u32( __base, __offset, __value)
+#define vstrbq_s32( __addr, __value) __arm_vstrbq_s32( __addr, __value)
+#define vstrbq_scatter_offset_s32( __base, __offset, __value) __arm_vstrbq_scatter_offset_s32( __base, __offset, __value)
+#define vstrwq_scatter_base_s32(__addr,  __offset, __value) __arm_vstrwq_scatter_base_s32(__addr,  __offset, __value)
+#define vstrwq_scatter_base_u32(__addr,  __offset, __value) __arm_vstrwq_scatter_base_u32(__addr,  __offset, __value)
 #endif
 
 __extension__ extern __inline void
@@ -10995,6 +11009,103 @@ __arm_vshrntq_m_n_u16 (uint8x16_t __a, uint16x8_t __b, const int __imm, mve_pred
   return __builtin_mve_vshrntq_m_n_uv8hi (__a, __b, __imm, __p);
 }
 
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_scatter_offset_s8 (int8_t * __base, uint8x16_t __offset, int8x16_t __value)
+{
+  __builtin_mve_vstrbq_scatter_offset_sv16qi ((__builtin_neon_qi *) __base, __offset, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_scatter_offset_s32 (int8_t * __base, uint32x4_t __offset, int32x4_t __value)
+{
+  __builtin_mve_vstrbq_scatter_offset_sv4si ((__builtin_neon_qi *) __base, __offset, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_scatter_offset_s16 (int8_t * __base, uint16x8_t __offset, int16x8_t __value)
+{
+  __builtin_mve_vstrbq_scatter_offset_sv8hi ((__builtin_neon_qi *) __base, __offset, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_scatter_offset_u8 (uint8_t * __base, uint8x16_t __offset, uint8x16_t __value)
+{
+  __builtin_mve_vstrbq_scatter_offset_uv16qi ((__builtin_neon_qi *) __base, __offset, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_scatter_offset_u32 (uint8_t * __base, uint32x4_t __offset, uint32x4_t __value)
+{
+  __builtin_mve_vstrbq_scatter_offset_uv4si ((__builtin_neon_qi *) __base, __offset, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_scatter_offset_u16 (uint8_t * __base, uint16x8_t __offset, uint16x8_t __value)
+{
+  __builtin_mve_vstrbq_scatter_offset_uv8hi ((__builtin_neon_qi *) __base, __offset, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_s8 (int8_t * __addr, int8x16_t __value)
+{
+  __builtin_mve_vstrbq_sv16qi ((__builtin_neon_qi *) __addr, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_s32 (int8_t * __addr, int32x4_t __value)
+{
+  __builtin_mve_vstrbq_sv4si ((__builtin_neon_qi *) __addr, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_s16 (int8_t * __addr, int16x8_t __value)
+{
+  __builtin_mve_vstrbq_sv8hi ((__builtin_neon_qi *) __addr, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_u8 (uint8_t * __addr, uint8x16_t __value)
+{
+  __builtin_mve_vstrbq_uv16qi ((__builtin_neon_qi *) __addr, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_u32 (uint8_t * __addr, uint32x4_t __value)
+{
+  __builtin_mve_vstrbq_uv4si ((__builtin_neon_qi *) __addr, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_u16 (uint8_t * __addr, uint16x8_t __value)
+{
+  __builtin_mve_vstrbq_uv8hi ((__builtin_neon_qi *) __addr, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrwq_scatter_base_s32 (uint32x4_t __addr, const int __offset, int32x4_t __value)
+{
+  __builtin_mve_vstrwq_scatter_base_sv4si (__addr, __offset, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrwq_scatter_base_u32 (uint32x4_t __addr, const int __offset, uint32x4_t __value)
+{
+  __builtin_mve_vstrwq_scatter_base_uv4si (__addr, __offset, __value);
+}
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
@@ -17542,6 +17653,35 @@ extern void *__ARM_undef;
 #define vrmlsldavhaxq_p(p0,p1,p2,p3) __arm_vrmlsldavhaxq_p(p0,p1,p2,p3)
 #define __arm_vrmlsldavhaxq_p(p0,p1,p2,p3) __arm_vrmlsldavhaxq_p_s32(p0,p1,p2,p3)
 
+#define vstrbq(p0,p1) __arm_vstrbq(p0,p1)
+#define __arm_vstrbq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16_t]: __arm_vstrbq_s8 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, int8x16_t)), \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int16x8_t]: __arm_vstrbq_s16 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, int16x8_t)), \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vstrbq_s32 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16_t]: __arm_vstrbq_u8 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16_t)), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vstrbq_u16 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vstrbq_u32 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint32x4_t)));})
+
+#define vstrbq_scatter_offset(p0,p1,p2) __arm_vstrbq_scatter_offset(p0,p1,p2)
+#define __arm_vstrbq_scatter_offset(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int8x16_t]: __arm_vstrbq_scatter_offset_s8 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, int8x16_t)), \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vstrbq_scatter_offset_s16 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, int16x8_t)), \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vstrbq_scatter_offset_s32 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vstrbq_scatter_offset_u8 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t)), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vstrbq_scatter_offset_u16 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vstrbq_scatter_offset_u32 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t)));})
+
+#define vstrwq_scatter_base(p0,p1,p2) __arm_vstrwq_scatter_base(p0,p1,p2)
+#define __arm_vstrwq_scatter_base(p0,p1,p2) ({ __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vstrwq_scatter_base_s32(p0, p1, __ARM_mve_coerce(__p2, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vstrwq_scatter_base_u32(p0, p1, __ARM_mve_coerce(__p2, uint32x4_t)));})
+
 #endif /* MVE Floating point.  */
 
 #ifdef __cplusplus
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index a18b9b3f433ead3c5d9e73afa3c2d94ae87765f7..149c4115b6167e3f2f0eb43fbdeaa8708ba5d723 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -685,3 +685,9 @@ VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vandq_m_f, v8hf, v4sf)
 VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vaddq_m_n_f, v8hf, v4sf)
 VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vaddq_m_f, v8hf, v4sf)
 VAR2 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vabdq_m_f, v8hf, v4sf)
+VAR3 (STRS, vstrbq_s, v16qi, v8hi, v4si)
+VAR3 (STRU, vstrbq_u, v16qi, v8hi, v4si)
+VAR3 (STRSS, vstrbq_scatter_offset_s, v16qi, v8hi, v4si)
+VAR3 (STRSU, vstrbq_scatter_offset_u, v16qi, v8hi, v4si)
+VAR1 (STRSBS, vstrwq_scatter_base_s, v4si)
+VAR1 (STRSBU, vstrwq_scatter_base_u, v4si)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 741c1e84e7d2379d22cf6239649019453b037bc2..eb365166e934f32f7695e5354258151202f378c0 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -191,7 +191,8 @@
 			 VCMLAQ_ROT90_M_F VCMULQ_M_F VCMULQ_ROT180_M_F
 			 VCMULQ_ROT270_M_F VCMULQ_ROT90_M_F VFMAQ_M_F
 			 VFMAQ_M_N_F VFMASQ_M_N_F VFMSQ_M_F VMAXNMQ_M_F
-			 VMINNMQ_M_F VSUBQ_M_F])
+			 VMINNMQ_M_F VSUBQ_M_F VSTRWQSB_S VSTRWQSB_U
+			 VSTRBQSO_S VSTRBQSO_U VSTRBQ_S VSTRBQ_U])
 
 (define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF")
 			    (V8HF "V8HI") (V4SF "V4SI")])
@@ -342,7 +343,9 @@
 		       (VQRSHRNTQ_M_N_S "s") (VQRSHRNTQ_M_N_U "u")
 		       (VQRSHRNBQ_M_N_S "s") (VQRSHRNBQ_M_N_U "u")
 		       (VMLALDAVAXQ_P_S "s") (VMLALDAVAXQ_P_U "u")
-		       (VMLALDAVAQ_P_S "s") (VMLALDAVAQ_P_U "u")])
+		       (VMLALDAVAQ_P_S "s") (VMLALDAVAQ_P_U "u")
+		       (VSTRWQSB_S "s") (VSTRWQSB_U "u") (VSTRBQSO_S "s")
+		       (VSTRBQSO_U "u") (VSTRBQ_S "s") (VSTRBQ_U "u")])
 
 (define_int_attr mode1 [(VCTP8Q "8") (VCTP16Q "16") (VCTP32Q "32")
 			(VCTP64Q "64") (VCTP8Q_M "8") (VCTP16Q_M "16")
@@ -359,6 +362,7 @@
 
 (define_mode_attr MVE_constraint1 [ (V8HI "Ra") (V4SI "Rc")])
 (define_mode_attr MVE_pred1 [ (V8HI "mve_imm_7") (V4SI "mve_imm_15")])
+(define_mode_attr MVE_B_ELEM [ (V16QI "V16QI") (V8HI "V8QI") (V4SI "V4QI")])
 
 (define_int_iterator VCVTQ_TO_F [VCVTQ_TO_F_S VCVTQ_TO_F_U])
 (define_int_iterator VMVNQ_N [VMVNQ_N_U VMVNQ_N_S])
@@ -562,6 +566,9 @@
 (define_int_iterator VSHLLTQ_M_N [VSHLLTQ_M_N_U VSHLLTQ_M_N_S])
 (define_int_iterator VSHRNBQ_M_N [VSHRNBQ_M_N_S VSHRNBQ_M_N_U])
 (define_int_iterator VSHRNTQ_M_N [VSHRNTQ_M_N_S VSHRNTQ_M_N_U])
+(define_int_iterator VSTRWSBQ [VSTRWQSB_S VSTRWQSB_U])
+(define_int_iterator VSTRBSOQ [VSTRBQSO_S VSTRBQSO_U])
+(define_int_iterator VSTRBQ [VSTRBQ_S VSTRBQ_U])
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
@@ -7933,3 +7940,65 @@
   "vpst\;vsubt.f%#<V_sz_elem>\t%q0, %q2, %3"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
+
+;;
+;; [vstrbq_s vstrbq_u]
+;;
+(define_insn "mve_vstrbq_<supf><mode>"
+  [(set (match_operand:<MVE_B_ELEM> 0 "memory_operand" "=Us")
+	(unspec:<MVE_B_ELEM> [(match_operand:MVE_2 1 "s_register_operand" "w")]
+	 VSTRBQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[2];
+   int regno = REGNO (operands[1]);
+   ops[1] = gen_rtx_REG (TImode, regno);
+   ops[0]  = operands[0];
+   output_asm_insn("vstrb.<V_sz_elem>\t%q1, %E0",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+;;
+;; [vstrbq_scatter_offset_s vstrbq_scatter_offset_u]
+;;
+(define_insn "mve_vstrbq_scatter_offset_<supf><mode>"
+  [(set (match_operand:<MVE_B_ELEM> 0 "memory_operand" "=Us")
+	(unspec:<MVE_B_ELEM>
+		[(match_operand:MVE_2 1 "s_register_operand" "w")
+		 (match_operand:MVE_2 2 "s_register_operand" "w")]
+	 VSTRBSOQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   output_asm_insn("vstrb.<V_sz_elem>\t%q2, [%m0, %q1]",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+;;
+;; [vstrwq_scatter_base_s vstrwq_scatter_base_u]
+;;
+(define_insn "mve_vstrwq_scatter_base_<supf>v4si"
+  [(set (mem:BLK (scratch))
+	(unspec:BLK
+		[(match_operand:V4SI 0 "s_register_operand" "w")
+		 (match_operand:SI 1 "immediate_operand" "i")
+		 (match_operand:V4SI 2 "s_register_operand" "w")]
+	 VSTRWSBQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   output_asm_insn("vstrw.u32\t%q2, [%q0, %1]",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..7c34f7f3b8e22bdc6e64f52fa306833d339c42e4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int8_t * addr, int16x8_t value)
+{
+  vstrbq_s16 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.16"  }  } */
+
+void
+foo1 (int8_t * addr, int16x8_t value)
+{
+  vstrbq (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..a0925529e7c2066d034b20519cea660e159fc4d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int8_t * addr, int32x4_t value)
+{
+  vstrbq_s32 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.32"  }  } */
+
+void
+foo1 (int8_t * addr, int32x4_t value)
+{
+  vstrbq (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..d56e6c208c07eec84353c080f2cd209a17451736
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int8_t * addr, int8x16_t value)
+{
+  vstrbq_s8 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.8"  }  } */
+
+void
+foo1 (int8_t * addr, int8x16_t value)
+{
+  vstrbq (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..aacf7fc87de328013a3597bd554a821738d404a8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int8_t * base, uint16x8_t offset, int16x8_t value)
+{
+  vstrbq_scatter_offset_s16 (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.16"  }  } */
+
+void
+foo1 (int8_t * base, uint16x8_t offset, int16x8_t value)
+{
+  vstrbq_scatter_offset (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..ef8a9b1aa97311a9e9ef3d61f8aa130652e1f354
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int8_t * base, uint32x4_t offset, int32x4_t value)
+{
+  vstrbq_scatter_offset_s32 (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.32"  }  } */
+
+void
+foo1 (int8_t * base, uint32x4_t offset, int32x4_t value)
+{
+  vstrbq_scatter_offset (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..ee5ec04693a7166e5d498fcbca22db88d093485e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int8_t * base, uint8x16_t offset, int8x16_t value)
+{
+  vstrbq_scatter_offset_s8 (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.8"  }  } */
+
+void
+foo1 (int8_t * base, uint8x16_t offset, int8x16_t value)
+{
+  vstrbq_scatter_offset (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..f73cf9ec0590e86da71981c5febff14d13fab6ab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint8_t * base, uint16x8_t offset, uint16x8_t value)
+{
+  vstrbq_scatter_offset_u16 (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.16"  }  } */
+
+void
+foo1 (uint8_t * base, uint16x8_t offset, uint16x8_t value)
+{
+  vstrbq_scatter_offset (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..b305b1c42244c844b15972695ad81da4c992d90d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint8_t * base, uint32x4_t offset, uint32x4_t value)
+{
+  vstrbq_scatter_offset_u32 (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.32"  }  } */
+
+void
+foo1 (uint8_t * base, uint32x4_t offset, uint32x4_t value)
+{
+  vstrbq_scatter_offset (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..cf0c267d8f9939d8f12122ece12166e102898dc5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint8_t * base, uint8x16_t offset, uint8x16_t value)
+{
+  vstrbq_scatter_offset_u8 (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.8"  }  } */
+
+void
+foo1 (uint8_t * base, uint8x16_t offset, uint8x16_t value)
+{
+  vstrbq_scatter_offset (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..4cdf967aa68e035ec428bb2d1b4857d3599a3d4f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint8_t * addr, uint16x8_t value)
+{
+  vstrbq_u16 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.16"  }  } */
+
+void
+foo1 (uint8_t * addr, uint16x8_t value)
+{
+  vstrbq (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..c64ed808289f6a17968014b284909e2e72ece8fe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint8_t * addr, uint32x4_t value)
+{
+  vstrbq_u32 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.32"  }  } */
+
+void
+foo1 (uint8_t * addr, uint32x4_t value)
+{
+  vstrbq (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..94d14a4fb71cec8fdb07438be1a1d1fc29e55198
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint8_t * addr, uint8x16_t value)
+{
+  vstrbq_u8 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.8"  }  } */
+
+void
+foo1 (uint8_t * addr, uint8x16_t value)
+{
+  vstrbq (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..d107819f7dfcec71471d67eabbcf32b425f426dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint32x4_t addr, int32x4_t value)
+{
+  vstrwq_scatter_base_s32 (addr, 8, value);
+}
+
+/* { dg-final { scan-assembler "vstrw.u32"  }  } */
+
+void
+foo1 (uint32x4_t addr, int32x4_t value)
+{
+  vstrwq_scatter_base (addr, 8, value);
+}
+
+/* { dg-final { scan-assembler "vstrw.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..148516f57fb115ea5bd0dbb5c5bee4fe3b8d02e8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint32x4_t addr, uint32x4_t value)
+{
+  vstrwq_scatter_base_u32 (addr, 8, value);
+}
+
+/* { dg-final { scan-assembler "vstrw.u32"  }  } */
+
+void
+foo1 (uint32x4_t addr, uint32x4_t value)
+{
+  vstrwq_scatter_base (addr, 8, value);
+}
+
+/* { dg-final { scan-assembler "vstrw.u32"  }  } */


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][7/5x]: MVE store intrinsics which stores byte,half word or word to memory.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (14 preceding siblings ...)
  2019-11-14 19:15 ` [PATCH][ARM][GCC][2/4x]: MVE intrinsics with quaternary operands Srinath Parvathaneni
@ 2019-11-14 19:15 ` Srinath Parvathaneni
  2019-11-14 19:15 ` [PATCH][ARM][GCC][8/5x]: Remaining MVE store intrinsics which stores an half word, word and double " Srinath Parvathaneni
                   ` (22 subsequent siblings)
  38 siblings, 0 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 94365 bytes --]

Hello,

This patch supports the following MVE ACLE store intrinsics which stores a byte, halfword,
or word to memory.

vst1q_f32, vst1q_f16, vst1q_s8, vst1q_s32, vst1q_s16, vst1q_u8, vst1q_u32, vst1q_u16,
vstrhq_f16, vstrhq_scatter_offset_s32, vstrhq_scatter_offset_s16, vstrhq_scatter_offset_u32,
vstrhq_scatter_offset_u16, vstrhq_scatter_offset_p_s32, vstrhq_scatter_offset_p_s16,
vstrhq_scatter_offset_p_u32, vstrhq_scatter_offset_p_u16, vstrhq_scatter_shifted_offset_s32,
vstrhq_scatter_shifted_offset_s16, vstrhq_scatter_shifted_offset_u32,
vstrhq_scatter_shifted_offset_u16, vstrhq_scatter_shifted_offset_p_s32,
vstrhq_scatter_shifted_offset_p_s16, vstrhq_scatter_shifted_offset_p_u32,
vstrhq_scatter_shifted_offset_p_u16, vstrhq_s32, vstrhq_s16, vstrhq_u32, vstrhq_u16,
vstrhq_p_f16, vstrhq_p_s32, vstrhq_p_s16, vstrhq_p_u32, vstrhq_p_u16, vstrwq_f32,
vstrwq_s32, vstrwq_u32, vstrwq_p_f32, vstrwq_p_s32, vstrwq_p_u32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1]  https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-01  Andre Vieira  <andre.simoesdiasvieira@arm.com>
            Mihail Ionescu  <mihail.ionescu@arm.com>
            Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm_mve.h (vst1q_f32): Define macro.
	(vst1q_f16): Likewise.
	(vst1q_s8): Likewise.
	(vst1q_s32): Likewise.
	(vst1q_s16): Likewise.
	(vst1q_u8): Likewise.
	(vst1q_u32): Likewise.
	(vst1q_u16): Likewise.
	(vstrhq_f16): Likewise.
	(vstrhq_scatter_offset_s32): Likewise.
	(vstrhq_scatter_offset_s16): Likewise.
	(vstrhq_scatter_offset_u32): Likewise.
	(vstrhq_scatter_offset_u16): Likewise.
	(vstrhq_scatter_offset_p_s32): Likewise.
	(vstrhq_scatter_offset_p_s16): Likewise.
	(vstrhq_scatter_offset_p_u32): Likewise.
	(vstrhq_scatter_offset_p_u16): Likewise.
	(vstrhq_scatter_shifted_offset_s32): Likewise.
	(vstrhq_scatter_shifted_offset_s16): Likewise.
	(vstrhq_scatter_shifted_offset_u32): Likewise.
	(vstrhq_scatter_shifted_offset_u16): Likewise.
	(vstrhq_scatter_shifted_offset_p_s32): Likewise.
	(vstrhq_scatter_shifted_offset_p_s16): Likewise.
	(vstrhq_scatter_shifted_offset_p_u32): Likewise.
	(vstrhq_scatter_shifted_offset_p_u16): Likewise.
	(vstrhq_s32): Likewise.
	(vstrhq_s16): Likewise.
	(vstrhq_u32): Likewise.
	(vstrhq_u16): Likewise.
	(vstrhq_p_f16): Likewise.
	(vstrhq_p_s32): Likewise.
	(vstrhq_p_s16): Likewise.
	(vstrhq_p_u32): Likewise.
	(vstrhq_p_u16): Likewise.
	(vstrwq_f32): Likewise.
	(vstrwq_s32): Likewise.
	(vstrwq_u32): Likewise.
	(vstrwq_p_f32): Likewise.
	(vstrwq_p_s32): Likewise.
	(vstrwq_p_u32): Likewise.
	(__arm_vst1q_s8): Define intrinsic.
	(__arm_vst1q_s32): Likewise.
	(__arm_vst1q_s16): Likewise.
	(__arm_vst1q_u8): Likewise.
	(__arm_vst1q_u32): Likewise.
	(__arm_vst1q_u16): Likewise.
	(__arm_vstrhq_scatter_offset_s32): Likewise.
	(__arm_vstrhq_scatter_offset_s16): Likewise.
	(__arm_vstrhq_scatter_offset_u32): Likewise.
	(__arm_vstrhq_scatter_offset_u16): Likewise.
	(__arm_vstrhq_scatter_offset_p_s32): Likewise.
	(__arm_vstrhq_scatter_offset_p_s16): Likewise.
	(__arm_vstrhq_scatter_offset_p_u32): Likewise.
	(__arm_vstrhq_scatter_offset_p_u16): Likewise.
	(__arm_vstrhq_scatter_shifted_offset_s32): Likewise.
	(__arm_vstrhq_scatter_shifted_offset_s16): Likewise.
	(__arm_vstrhq_scatter_shifted_offset_u32): Likewise.
	(__arm_vstrhq_scatter_shifted_offset_u16): Likewise.
	(__arm_vstrhq_scatter_shifted_offset_p_s32): Likewise.
	(__arm_vstrhq_scatter_shifted_offset_p_s16): Likewise.
	(__arm_vstrhq_scatter_shifted_offset_p_u32): Likewise.
	(__arm_vstrhq_scatter_shifted_offset_p_u16): Likewise.
	(__arm_vstrhq_s32): Likewise.
	(__arm_vstrhq_s16): Likewise.
	(__arm_vstrhq_u32): Likewise.
	(__arm_vstrhq_u16): Likewise.
	(__arm_vstrhq_p_s32): Likewise.
	(__arm_vstrhq_p_s16): Likewise.
	(__arm_vstrhq_p_u32): Likewise.
	(__arm_vstrhq_p_u16): Likewise.
	(__arm_vstrwq_s32): Likewise.
	(__arm_vstrwq_u32): Likewise.
	(__arm_vstrwq_p_s32): Likewise.
	(__arm_vstrwq_p_u32): Likewise.
	(__arm_vstrwq_p_f32): Likewise.
	(__arm_vstrwq_f32): Likewise.
	(__arm_vst1q_f32): Likewise.
	(__arm_vst1q_f16): Likewise.
	(__arm_vstrhq_f16): Likewise.
	(__arm_vstrhq_p_f16): Likewise.
	(vst1q): Define polymorphic variant.
	(vstrhq): Likewise.
	(vstrhq_p): Likewise.
	(vstrhq_scatter_offset_p): Likewise.
	(vstrhq_scatter_offset): Likewise.
	(vstrhq_scatter_shifted_offset_p): Likewise.
	(vstrhq_scatter_shifted_offset): Likewise.
	(vstrwq_p): Likewise.
	(vstrwq): Likewise.
	* config/arm/arm_mve_builtins.def (STRS): Use builtin qualifier.
	(STRS_P): Likewise.
	(STRSS): Likewise.
	(STRSS_P): Likewise.
	(STRSU): Likewise.
	(STRSU_P): Likewise.
	(STRU): Likewise.
	(STRU_P): Likewise.
	* config/arm/mve.md (VST1Q): Define iterator.
	(VSTRHSOQ): Likewise.
	(VSTRHSSOQ): Likewise.
	(VSTRHQ): Likewise.
	(VSTRWQ): Likewise.
	(mve_vstrhq_fv8hf): Define RTL pattern.
	(mve_vstrhq_p_fv8hf): Likewise.
	(mve_vstrhq_p_<supf><mode>): Likewise.
	(mve_vstrhq_scatter_offset_p_<supf><mode>): Likewise.
	(mve_vstrhq_scatter_offset_<supf><mode>): Likewise.
	(mve_vstrhq_scatter_shifted_offset_p_<supf><mode>): Likewise.
	(mve_vstrhq_scatter_shifted_offset_<supf><mode>): Likewise.
	(mve_vstrhq_<supf><mode>): Likewise.
	(mve_vstrwq_fv4sf): Likewise.
	(mve_vstrwq_p_fv4sf): Likewise.
	(mve_vstrwq_p_<supf>v4si): Likewise.
	(mve_vstrwq_<supf>v4si): Likewise.
	(mve_vst1q_f<mode>): Define expand.
	(mve_vst1q_<supf><mode>): Likewise.

gcc/testsuite/ChangeLog:

2019-11-01  Andre Vieira  <andre.simoesdiasvieira@arm.com>
            Mihail Ionescu  <mihail.ionescu@arm.com>
            Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vst1q_f16.c: New test.
	* gcc.target/arm/mve/intrinsics/vst1q_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst1q_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst1q_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst1q_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst1q_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst1q_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst1q_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrhq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrhq_p_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrhq_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrhq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrhq_p_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrhq_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrhq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrhq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_p_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_p_s16.c:
	Likewise.
	* gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_p_s32.c:
	Likewise.
	* gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_p_u16.c:
	Likewise.
	* gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_p_u32.c:
	Likewise.
	* gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_s16.c:
	Likewise.
	* gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_s32.c:
	Likewise.
	* gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_u16.c:
	Likewise.
	* gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_u32.c:
	Likewise.
	* gcc.target/arm/mve/intrinsics/vstrhq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrhq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_p_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_u32.c: Likewise.


###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index a4663a04902741af5276a5f20decca3a7321822e..4aea729bc19e459000e173406e3a7bc189614e9b 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -1828,6 +1828,46 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vldrwq_gather_shifted_offset_z_f32(__base, __offset, __p) __arm_vldrwq_gather_shifted_offset_z_f32(__base, __offset, __p)
 #define vldrwq_gather_shifted_offset_z_s32(__base, __offset, __p) __arm_vldrwq_gather_shifted_offset_z_s32(__base, __offset, __p)
 #define vldrwq_gather_shifted_offset_z_u32(__base, __offset, __p) __arm_vldrwq_gather_shifted_offset_z_u32(__base, __offset, __p)
+#define vst1q_f32(__addr, __value) __arm_vst1q_f32(__addr, __value)
+#define vst1q_f16(__addr, __value) __arm_vst1q_f16(__addr, __value)
+#define vst1q_s8(__addr, __value) __arm_vst1q_s8(__addr, __value)
+#define vst1q_s32(__addr, __value) __arm_vst1q_s32(__addr, __value)
+#define vst1q_s16(__addr, __value) __arm_vst1q_s16(__addr, __value)
+#define vst1q_u8(__addr, __value) __arm_vst1q_u8(__addr, __value)
+#define vst1q_u32(__addr, __value) __arm_vst1q_u32(__addr, __value)
+#define vst1q_u16(__addr, __value) __arm_vst1q_u16(__addr, __value)
+#define vstrhq_f16(__addr, __value) __arm_vstrhq_f16(__addr, __value)
+#define vstrhq_scatter_offset_s32( __base, __offset, __value) __arm_vstrhq_scatter_offset_s32( __base, __offset, __value)
+#define vstrhq_scatter_offset_s16( __base, __offset, __value) __arm_vstrhq_scatter_offset_s16( __base, __offset, __value)
+#define vstrhq_scatter_offset_u32( __base, __offset, __value) __arm_vstrhq_scatter_offset_u32( __base, __offset, __value)
+#define vstrhq_scatter_offset_u16( __base, __offset, __value) __arm_vstrhq_scatter_offset_u16( __base, __offset, __value)
+#define vstrhq_scatter_offset_p_s32( __base, __offset, __value, __p) __arm_vstrhq_scatter_offset_p_s32( __base, __offset, __value, __p)
+#define vstrhq_scatter_offset_p_s16( __base, __offset, __value, __p) __arm_vstrhq_scatter_offset_p_s16( __base, __offset, __value, __p)
+#define vstrhq_scatter_offset_p_u32( __base, __offset, __value, __p) __arm_vstrhq_scatter_offset_p_u32( __base, __offset, __value, __p)
+#define vstrhq_scatter_offset_p_u16( __base, __offset, __value, __p) __arm_vstrhq_scatter_offset_p_u16( __base, __offset, __value, __p)
+#define vstrhq_scatter_shifted_offset_s32( __base, __offset, __value) __arm_vstrhq_scatter_shifted_offset_s32( __base, __offset, __value)
+#define vstrhq_scatter_shifted_offset_s16( __base, __offset, __value) __arm_vstrhq_scatter_shifted_offset_s16( __base, __offset, __value)
+#define vstrhq_scatter_shifted_offset_u32( __base, __offset, __value) __arm_vstrhq_scatter_shifted_offset_u32( __base, __offset, __value)
+#define vstrhq_scatter_shifted_offset_u16( __base, __offset, __value) __arm_vstrhq_scatter_shifted_offset_u16( __base, __offset, __value)
+#define vstrhq_scatter_shifted_offset_p_s32( __base, __offset, __value, __p) __arm_vstrhq_scatter_shifted_offset_p_s32( __base, __offset, __value, __p)
+#define vstrhq_scatter_shifted_offset_p_s16( __base, __offset, __value, __p) __arm_vstrhq_scatter_shifted_offset_p_s16( __base, __offset, __value, __p)
+#define vstrhq_scatter_shifted_offset_p_u32( __base, __offset, __value, __p) __arm_vstrhq_scatter_shifted_offset_p_u32( __base, __offset, __value, __p)
+#define vstrhq_scatter_shifted_offset_p_u16( __base, __offset, __value, __p) __arm_vstrhq_scatter_shifted_offset_p_u16( __base, __offset, __value, __p)
+#define vstrhq_s32(__addr, __value) __arm_vstrhq_s32(__addr, __value)
+#define vstrhq_s16(__addr, __value) __arm_vstrhq_s16(__addr, __value)
+#define vstrhq_u32(__addr, __value) __arm_vstrhq_u32(__addr, __value)
+#define vstrhq_u16(__addr, __value) __arm_vstrhq_u16(__addr, __value)
+#define vstrhq_p_f16(__addr, __value, __p) __arm_vstrhq_p_f16(__addr, __value, __p)
+#define vstrhq_p_s32(__addr, __value, __p) __arm_vstrhq_p_s32(__addr, __value, __p)
+#define vstrhq_p_s16(__addr, __value, __p) __arm_vstrhq_p_s16(__addr, __value, __p)
+#define vstrhq_p_u32(__addr, __value, __p) __arm_vstrhq_p_u32(__addr, __value, __p)
+#define vstrhq_p_u16(__addr, __value, __p) __arm_vstrhq_p_u16(__addr, __value, __p)
+#define vstrwq_f32(__addr, __value) __arm_vstrwq_f32(__addr, __value)
+#define vstrwq_s32(__addr, __value) __arm_vstrwq_s32(__addr, __value)
+#define vstrwq_u32(__addr, __value) __arm_vstrwq_u32(__addr, __value)
+#define vstrwq_p_f32(__addr, __value, __p) __arm_vstrwq_p_f32(__addr, __value, __p)
+#define vstrwq_p_s32(__addr, __value, __p) __arm_vstrwq_p_s32(__addr, __value, __p)
+#define vstrwq_p_u32(__addr, __value, __p) __arm_vstrwq_p_u32(__addr, __value, __p)
 #endif
 
 __extension__ extern __inline void
@@ -11893,6 +11933,244 @@ __arm_vldrwq_gather_shifted_offset_z_u32 (uint32_t const * __base, uint32x4_t __
   return __builtin_mve_vldrwq_gather_shifted_offset_z_uv4si ((__builtin_neon_si *) __base, __offset, __p);
 }
 
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst1q_s8 (int8_t * __addr, int8x16_t __value)
+{
+  __builtin_mve_vst1q_sv16qi ((__builtin_neon_qi *) __addr, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst1q_s32 (int32_t * __addr, int32x4_t __value)
+{
+  __builtin_mve_vst1q_sv4si ((__builtin_neon_si *) __addr, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst1q_s16 (int16_t * __addr, int16x8_t __value)
+{
+  __builtin_mve_vst1q_sv8hi ((__builtin_neon_hi *) __addr, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst1q_u8 (uint8_t * __addr, uint8x16_t __value)
+{
+  __builtin_mve_vst1q_uv16qi ((__builtin_neon_qi *) __addr, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst1q_u32 (uint32_t * __addr, uint32x4_t __value)
+{
+  __builtin_mve_vst1q_uv4si ((__builtin_neon_si *) __addr, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst1q_u16 (uint16_t * __addr, uint16x8_t __value)
+{
+  __builtin_mve_vst1q_uv8hi ((__builtin_neon_hi *) __addr, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrhq_scatter_offset_s32 (int16_t * __base, uint32x4_t __offset, int32x4_t __value)
+{
+  __builtin_mve_vstrhq_scatter_offset_sv4si ((__builtin_neon_hi *) __base, __offset, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrhq_scatter_offset_s16 (int16_t * __base, uint16x8_t __offset, int16x8_t __value)
+{
+  __builtin_mve_vstrhq_scatter_offset_sv8hi ((__builtin_neon_hi *) __base, __offset, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrhq_scatter_offset_u32 (uint16_t * __base, uint32x4_t __offset, uint32x4_t __value)
+{
+  __builtin_mve_vstrhq_scatter_offset_uv4si ((__builtin_neon_hi *) __base, __offset, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrhq_scatter_offset_u16 (uint16_t * __base, uint16x8_t __offset, uint16x8_t __value)
+{
+  __builtin_mve_vstrhq_scatter_offset_uv8hi ((__builtin_neon_hi *) __base, __offset, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrhq_scatter_offset_p_s32 (int16_t * __base, uint32x4_t __offset, int32x4_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrhq_scatter_offset_p_sv4si ((__builtin_neon_hi *) __base, __offset, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrhq_scatter_offset_p_s16 (int16_t * __base, uint16x8_t __offset, int16x8_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrhq_scatter_offset_p_sv8hi ((__builtin_neon_hi *) __base, __offset, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrhq_scatter_offset_p_u32 (uint16_t * __base, uint32x4_t __offset, uint32x4_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrhq_scatter_offset_p_uv4si ((__builtin_neon_hi *) __base, __offset, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrhq_scatter_offset_p_u16 (uint16_t * __base, uint16x8_t __offset, uint16x8_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrhq_scatter_offset_p_uv8hi ((__builtin_neon_hi *) __base, __offset, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrhq_scatter_shifted_offset_s32 (int16_t * __base, uint32x4_t __offset, int32x4_t __value)
+{
+  __builtin_mve_vstrhq_scatter_shifted_offset_sv4si ((__builtin_neon_hi *) __base, __offset, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrhq_scatter_shifted_offset_s16 (int16_t * __base, uint16x8_t __offset, int16x8_t __value)
+{
+  __builtin_mve_vstrhq_scatter_shifted_offset_sv8hi ((__builtin_neon_hi *) __base, __offset, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrhq_scatter_shifted_offset_u32 (uint16_t * __base, uint32x4_t __offset, uint32x4_t __value)
+{
+  __builtin_mve_vstrhq_scatter_shifted_offset_uv4si ((__builtin_neon_hi *) __base, __offset, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrhq_scatter_shifted_offset_u16 (uint16_t * __base, uint16x8_t __offset, uint16x8_t __value)
+{
+  __builtin_mve_vstrhq_scatter_shifted_offset_uv8hi ((__builtin_neon_hi *) __base, __offset, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrhq_scatter_shifted_offset_p_s32 (int16_t * __base, uint32x4_t __offset, int32x4_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrhq_scatter_shifted_offset_p_sv4si ((__builtin_neon_hi *) __base, __offset, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrhq_scatter_shifted_offset_p_s16 (int16_t * __base, uint16x8_t __offset, int16x8_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrhq_scatter_shifted_offset_p_sv8hi ((__builtin_neon_hi *) __base, __offset, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrhq_scatter_shifted_offset_p_u32 (uint16_t * __base, uint32x4_t __offset, uint32x4_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrhq_scatter_shifted_offset_p_uv4si ((__builtin_neon_hi *) __base, __offset, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrhq_scatter_shifted_offset_p_u16 (uint16_t * __base, uint16x8_t __offset, uint16x8_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrhq_scatter_shifted_offset_p_uv8hi ((__builtin_neon_hi *) __base, __offset, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrhq_s32 (int16_t * __addr, int32x4_t __value)
+{
+  __builtin_mve_vstrhq_sv4si ((__builtin_neon_hi *) __addr, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrhq_s16 (int16_t * __addr, int16x8_t __value)
+{
+  __builtin_mve_vstrhq_sv8hi ((__builtin_neon_hi *) __addr, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrhq_u32 (uint16_t * __addr, uint32x4_t __value)
+{
+  __builtin_mve_vstrhq_uv4si ((__builtin_neon_hi *) __addr, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrhq_u16 (uint16_t * __addr, uint16x8_t __value)
+{
+  __builtin_mve_vstrhq_uv8hi ((__builtin_neon_hi *) __addr, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrhq_p_s32 (int16_t * __addr, int32x4_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrhq_p_sv4si ((__builtin_neon_hi *) __addr, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrhq_p_s16 (int16_t * __addr, int16x8_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrhq_p_sv8hi ((__builtin_neon_hi *) __addr, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrhq_p_u32 (uint16_t * __addr, uint32x4_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrhq_p_uv4si ((__builtin_neon_hi *) __addr, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrhq_p_u16 (uint16_t * __addr, uint16x8_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrhq_p_uv8hi ((__builtin_neon_hi *) __addr, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrwq_s32 (int32_t * __addr, int32x4_t __value)
+{
+  __builtin_mve_vstrwq_sv4si ((__builtin_neon_si *) __addr, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrwq_u32 (uint32_t * __addr, uint32x4_t __value)
+{
+  __builtin_mve_vstrwq_uv4si ((__builtin_neon_si *) __addr, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrwq_p_s32 (int32_t * __addr, int32x4_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrwq_p_sv4si ((__builtin_neon_si *) __addr, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrwq_p_u32 (uint32_t * __addr, uint32x4_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrwq_p_uv4si ((__builtin_neon_si *) __addr, __value, __p);
+}
+
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
@@ -14147,6 +14425,48 @@ __arm_vldrwq_gather_shifted_offset_z_f32 (float32_t const * __base, uint32x4_t _
   return __builtin_mve_vldrwq_gather_shifted_offset_z_fv4sf (__base, __offset, __p);
 }
 
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrwq_p_f32 (float32_t * __addr, float32x4_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrwq_p_fv4sf (__addr, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrwq_f32 (float32_t * __addr, float32x4_t __value)
+{
+  __builtin_mve_vstrwq_fv4sf (__addr, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst1q_f32 (float32_t * __addr, float32x4_t __value)
+{
+  __builtin_mve_vst1q_fv4sf (__addr, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst1q_f16 (float16_t * __addr, float16x8_t __value)
+{
+  __builtin_mve_vst1q_fv8hf (__addr, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrhq_f16 (float16_t * __addr, float16x8_t __value)
+{
+  __builtin_mve_vstrhq_fv8hf (__addr, __value);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrhq_p_f16 (float16_t * __addr, float16x8_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrhq_p_fv8hf (__addr, __value, __p);
+}
+
 #endif
 
 enum {
@@ -15774,6 +16094,99 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint32_t_const_ptr]: __arm_vldrwq_gather_shifted_offset_z_u32 (__ARM_mve_coerce(__p0, uint32_t const *), p1, p2), \
   int (*)[__ARM_mve_type_float32_t_const_ptr]: __arm_vldrwq_gather_shifted_offset_z_f32 (__ARM_mve_coerce(__p0, float32_t const *), p1, p2));})
 
+#define vst1q(p0,p1) __arm_vst1q(p0,p1)
+#define __arm_vst1q(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16_t]: __arm_vst1q_s8 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, int8x16_t)), \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8_t]: __arm_vst1q_s16 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, int16x8_t)), \
+  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vst1q_s32 (__ARM_mve_coerce(__p0, int32_t *), __ARM_mve_coerce(__p1, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16_t]: __arm_vst1q_u8 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16_t)), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vst1q_u16 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vst1q_u32 (__ARM_mve_coerce(__p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4_t)), \
+  int (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_float16x8_t]: __arm_vst1q_f16 (__ARM_mve_coerce(__p0, float16_t *), __ARM_mve_coerce(__p1, float16x8_t)), \
+  int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4_t]: __arm_vst1q_f32 (__ARM_mve_coerce(__p0, float32_t *), __ARM_mve_coerce(__p1, float32x4_t)));})
+
+#define vstrhq(p0,p1) __arm_vstrhq(p0,p1)
+#define __arm_vstrhq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8_t]: __arm_vstrhq_s16 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, int16x8_t)), \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vstrhq_s32 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vstrhq_u16 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vstrhq_u32 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint32x4_t)), \
+  int (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_float16x8_t]: __arm_vstrhq_f16 (__ARM_mve_coerce(__p0, float16_t *), __ARM_mve_coerce(__p1, float16x8_t)));})
+
+#define vstrhq_p(p0,p1,p2) __arm_vstrhq_p(p0,p1,p2)
+#define __arm_vstrhq_p(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8_t]: __arm_vstrhq_p_s16 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, int16x8_t), p2), \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vstrhq_p_s32 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vstrhq_p_u16 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vstrhq_p_u32 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint32x4_t), p2), \
+  int (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_float16x8_t]: __arm_vstrhq_p_f16 (__ARM_mve_coerce(__p0, float16_t *), __ARM_mve_coerce(__p1, float16x8_t), p2));})
+
+#define vstrhq_scatter_offset_p(p0,p1,p2,p3) __arm_vstrhq_scatter_offset_p(p0,p1,p2,p3)
+#define __arm_vstrhq_scatter_offset_p(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vstrhq_scatter_offset_p_s16 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vstrhq_scatter_offset_p_s32 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vstrhq_scatter_offset_p_u16 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vstrhq_scatter_offset_p_u32 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
+  int (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_float16x8_t]: __arm_vstrhq_scatter_offset_p_f16 (__ARM_mve_coerce(__p0, float16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3));})
+
+#define vstrhq_scatter_offset(p0,p1,p2) __arm_vstrhq_scatter_offset(p0,p1,p2)
+#define __arm_vstrhq_scatter_offset(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vstrhq_scatter_offset_s16 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, int16x8_t)), \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vstrhq_scatter_offset_s32 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vstrhq_scatter_offset_u16 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vstrhq_scatter_offset_u32 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t)), \
+  int (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_float16x8_t]: __arm_vstrhq_scatter_offset_f16 (__ARM_mve_coerce(__p0, float16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, float16x8_t)));})
+
+#define vstrhq_scatter_shifted_offset_p(p0,p1,p2,p3) __arm_vstrhq_scatter_shifted_offset_p(p0,p1,p2,p3)
+#define __arm_vstrhq_scatter_shifted_offset_p(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vstrhq_scatter_shifted_offset_p_s16 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vstrhq_scatter_shifted_offset_p_s32 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vstrhq_scatter_shifted_offset_p_u16 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vstrhq_scatter_shifted_offset_p_u32 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
+  int (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_float16x8_t]: __arm_vstrhq_scatter_shifted_offset_p_f16 (__ARM_mve_coerce(__p0, float16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3));})
+
+#define vstrhq_scatter_shifted_offset(p0,p1,p2) __arm_vstrhq_scatter_shifted_offset(p0,p1,p2)
+#define __arm_vstrhq_scatter_shifted_offset(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vstrhq_scatter_shifted_offset_s16 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, int16x8_t)), \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vstrhq_scatter_shifted_offset_s32 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vstrhq_scatter_shifted_offset_u16 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vstrhq_scatter_shifted_offset_u32 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t)), \
+  int (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_float16x8_t]: __arm_vstrhq_scatter_shifted_offset_f16 (__ARM_mve_coerce(__p0, float16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, float16x8_t)));})
+
+#define vstrwq_p(p0,p1,p2) __arm_vstrwq_p(p0,p1,p2)
+#define __arm_vstrwq_p(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vstrwq_p_s32 (__ARM_mve_coerce(__p0, int32_t *), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vstrwq_p_u32 (__ARM_mve_coerce(__p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4_t), p2), \
+  int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4_t]: __arm_vstrwq_p_f32 (__ARM_mve_coerce(__p0, float32_t *), __ARM_mve_coerce(__p1, float32x4_t), p2));})
+
+#define vstrwq(p0,p1) __arm_vstrwq(p0,p1)
+#define __arm_vstrwq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vstrwq_s32 (__ARM_mve_coerce(__p0, int32_t *), __ARM_mve_coerce(__p1, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vstrwq_u32 (__ARM_mve_coerce(__p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4_t)), \
+  int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4_t]: __arm_vstrwq_f32 (__ARM_mve_coerce(__p0, float32_t *), __ARM_mve_coerce(__p1, float32x4_t)));})
+
 #else /* MVE Interger.  */
 
 #define vst4q(p0,p1) __arm_vst4q(p0,p1)
@@ -18807,6 +19220,90 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int32_t_const_ptr]: __arm_vldrwq_gather_shifted_offset_z_s32 (__ARM_mve_coerce(__p0, int32_t const *), p1, p2), \
   int (*)[__ARM_mve_type_uint32_t_const_ptr]: __arm_vldrwq_gather_shifted_offset_z_u32 (__ARM_mve_coerce(__p0, uint32_t const *), p1, p2));})
 
+#define vst1q(p0,p1) __arm_vst1q(p0,p1)
+#define __arm_vst1q(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16_t]: __arm_vst1q_s8 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, int8x16_t)), \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8_t]: __arm_vst1q_s16 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, int16x8_t)), \
+  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vst1q_s32 (__ARM_mve_coerce(__p0, int32_t *), __ARM_mve_coerce(__p1, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16_t]: __arm_vst1q_u8 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16_t)), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vst1q_u16 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vst1q_u32 (__ARM_mve_coerce(__p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4_t)));})
+
+#define vstrhq(p0,p1) __arm_vstrhq(p0,p1)
+#define __arm_vstrhq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8_t]: __arm_vstrhq_s16 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, int16x8_t)), \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vstrhq_s32 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vstrhq_u16 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vstrhq_u32 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint32x4_t)));})
+
+#define vstrhq_p(p0,p1,p2) __arm_vstrhq_p(p0,p1,p2)
+#define __arm_vstrhq_p(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8_t]: __arm_vstrhq_p_s16 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, int16x8_t), p2), \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vstrhq_p_s32 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vstrhq_p_u16 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vstrhq_p_u32 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
+
+#define vstrhq_scatter_offset_p(p0,p1,p2,p3) __arm_vstrhq_scatter_offset_p(p0,p1,p2,p3)
+#define __arm_vstrhq_scatter_offset_p(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vstrhq_scatter_offset_p_s16 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vstrhq_scatter_offset_p_s32 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vstrhq_scatter_offset_p_u16 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vstrhq_scatter_offset_p_u32 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
+
+#define vstrhq_scatter_offset(p0,p1,p2) __arm_vstrhq_scatter_offset(p0,p1,p2)
+#define __arm_vstrhq_scatter_offset(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vstrhq_scatter_offset_s16 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, int16x8_t)), \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vstrhq_scatter_offset_s32 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vstrhq_scatter_offset_u16 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vstrhq_scatter_offset_u32 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t)));})
+
+#define vstrhq_scatter_shifted_offset_p(p0,p1,p2,p3) __arm_vstrhq_scatter_shifted_offset_p(p0,p1,p2,p3)
+#define __arm_vstrhq_scatter_shifted_offset_p(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vstrhq_scatter_shifted_offset_p_s16 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vstrhq_scatter_shifted_offset_p_s32 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vstrhq_scatter_shifted_offset_p_u16 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vstrhq_scatter_shifted_offset_p_u32 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
+
+#define vstrhq_scatter_shifted_offset(p0,p1,p2) __arm_vstrhq_scatter_shifted_offset(p0,p1,p2)
+#define __arm_vstrhq_scatter_shifted_offset(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vstrhq_scatter_shifted_offset_s16 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, int16x8_t)), \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vstrhq_scatter_shifted_offset_s32 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vstrhq_scatter_shifted_offset_u16 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vstrhq_scatter_shifted_offset_u32 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t)));})
+
+
+#define vstrwq(p0,p1) __arm_vstrwq(p0,p1)
+#define __arm_vstrwq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vstrwq_s32 (__ARM_mve_coerce(__p0, int32_t *), __ARM_mve_coerce(__p1, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vstrwq_u32 (__ARM_mve_coerce(__p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4_t)));})
+
+#define vstrwq_p(p0,p1,p2) __arm_vstrwq_p(p0,p1,p2)
+#define __arm_vstrwq_p(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vstrwq_p_s32 (__ARM_mve_coerce(__p0, int32_t *), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vstrwq_p_u32 (__ARM_mve_coerce(__p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
+
 #endif /* MVE Floating point.  */
 
 #ifdef __cplusplus
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index 848be45ba716a194f475ca7bfd8c57f8fba59bfd..8e85d8def07b63825c764e233a2df9882b72c6a2 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -762,3 +762,26 @@ VAR1 (LDRGU_Z, vldrdq_gather_offset_z_u, v2di)
 VAR1 (LDRGU_Z, vldrdq_gather_shifted_offset_z_u, v2di)
 VAR1 (LDRGU_Z, vldrwq_gather_offset_z_u, v4si)
 VAR1 (LDRGU_Z, vldrwq_gather_shifted_offset_z_u, v4si)
+VAR3 (STRU, vst1q_u, v16qi, v8hi, v4si)
+VAR3 (STRS, vst1q_s, v16qi, v8hi, v4si)
+VAR2 (STRU_P, vstrhq_p_u, v8hi, v4si)
+VAR2 (STRU, vstrhq_u, v8hi, v4si)
+VAR2 (STRS_P, vstrhq_p_s, v8hi, v4si)
+VAR2 (STRS, vstrhq_s, v8hi, v4si)
+VAR2 (STRS, vst1q_f, v8hf, v4sf)
+VAR2 (STRSU_P, vstrhq_scatter_shifted_offset_p_u, v8hi, v4si)
+VAR2 (STRSU_P, vstrhq_scatter_offset_p_u, v8hi, v4si)
+VAR2 (STRSU, vstrhq_scatter_shifted_offset_u, v8hi, v4si)
+VAR2 (STRSU, vstrhq_scatter_offset_u, v8hi, v4si)
+VAR2 (STRSS_P, vstrhq_scatter_shifted_offset_p_s, v8hi, v4si)
+VAR2 (STRSS_P, vstrhq_scatter_offset_p_s, v8hi, v4si)
+VAR2 (STRSS, vstrhq_scatter_shifted_offset_s, v8hi, v4si)
+VAR2 (STRSS, vstrhq_scatter_offset_s, v8hi, v4si)
+VAR1 (STRS, vstrhq_f, v8hf)
+VAR1 (STRS_P, vstrhq_p_f, v8hf)
+VAR1 (STRS, vstrwq_f, v4sf)
+VAR1 (STRS, vstrwq_s, v4si)
+VAR1 (STRU, vstrwq_u, v4si)
+VAR1 (STRS_P, vstrwq_p_f, v4sf)
+VAR1 (STRS_P, vstrwq_p_s, v4si)
+VAR1 (STRU_P, vstrwq_p_u, v4si)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 452d7b5eac44e9d3be03dfc418f51584ab9feb32..2d48eae395630e5a91ab0e06e38103effabfcdbf 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -201,7 +201,9 @@
 			 VLDRDQGO_S VLDRDQGO_U VLDRDQGSO_S VLDRDQGSO_U
 			 VLDRHQGO_F VLDRHQGSO_F VLDRWQGB_F VLDRWQGO_F
 			 VLDRWQGO_S VLDRWQGO_U VLDRWQGSO_F VLDRWQGSO_S
-			 VLDRWQGSO_U])
+			 VLDRWQGSO_U VSTRHQ_F VST1Q_S VST1Q_U VSTRHQSO_S
+			 VSTRHQSO_U VSTRHQSSO_S VSTRHQSSO_U VSTRHQ_S
+			 VSTRHQ_U VSTRWQ_S VSTRWQ_U VSTRWQ_F VST1Q_F])
 
 (define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF") (V8HF "V8HI")
 			    (V4SF "V4SI")])
@@ -363,7 +365,10 @@
 		       (VLDRWQ_U "u") (VLDRDQGB_S "s") (VLDRDQGB_U "u")
 		       (VLDRDQGO_S "s") (VLDRDQGO_U "u") (VLDRDQGSO_S "s")
 		       (VLDRDQGSO_U "u") (VLDRWQGO_S "s") (VLDRWQGO_U "u")
-		       (VLDRWQGSO_S "s") (VLDRWQGSO_U "u")])
+		       (VLDRWQGSO_S "s") (VLDRWQGSO_U "u") (VST1Q_S "s")
+		       (VST1Q_U "u") (VSTRHQSO_S "s") (VSTRHQSO_U "u")
+		       (VSTRHQSSO_S "s") (VSTRHQSSO_U "u") (VSTRHQ_S "s")
+		       (VSTRHQ_U "u") (VSTRWQ_S "s") (VSTRWQ_U "u")])
 
 (define_int_attr mode1 [(VCTP8Q "8") (VCTP16Q "16") (VCTP32Q "32")
 			(VCTP64Q "64") (VCTP8Q_M "8") (VCTP16Q_M "16")
@@ -602,6 +607,11 @@
 (define_int_iterator VLDRDGSOQ [VLDRDQGSO_S VLDRDQGSO_U])
 (define_int_iterator VLDRWGOQ [VLDRWQGO_S VLDRWQGO_U])
 (define_int_iterator VLDRWGSOQ [VLDRWQGSO_S VLDRWQGSO_U])
+(define_int_iterator VST1Q [VST1Q_S VST1Q_U])
+(define_int_iterator VSTRHSOQ [VSTRHQSO_S VSTRHQSO_U])
+(define_int_iterator VSTRHSSOQ [VSTRHQSSO_S VSTRHQSSO_U])
+(define_int_iterator VSTRHQ [VSTRHQ_S VSTRHQ_U])
+(define_int_iterator VSTRWQ [VSTRWQ_S VSTRWQ_U])
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
@@ -8917,3 +8927,265 @@
    return "";
 }
   [(set_attr "length" "8")])
+
+;;
+;; [vstrhq_f]
+;;
+(define_insn "mve_vstrhq_fv8hf"
+  [(set (match_operand:V8HI 0 "memory_operand" "=Us")
+	(unspec:V8HI [(match_operand:V8HF 1 "s_register_operand" "w")]
+	 VSTRHQ_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+   rtx ops[2];
+   int regno = REGNO (operands[1]);
+   ops[1] = gen_rtx_REG (TImode, regno);
+   ops[0]  = operands[0];
+   output_asm_insn ("vstrh.16\t%q1, %E0",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+;;
+;; [vstrhq_p_f]
+;;
+(define_insn "mve_vstrhq_p_fv8hf"
+  [(set (match_operand:V8HI 0 "memory_operand" "=Us")
+	(unspec:V8HI [(match_operand:V8HF 1 "s_register_operand" "w")
+		      (match_operand:HI 2 "vpr_register_operand" "Up")]
+	 VSTRHQ_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+   rtx ops[2];
+   int regno = REGNO (operands[1]);
+   ops[1] = gen_rtx_REG (TImode, regno);
+   ops[0]  = operands[0];
+   output_asm_insn ("vpst\n\tvstrht.16\t%q1, %E0",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+;;
+;; [vstrhq_p_s vstrhq_p_u]
+;;
+(define_insn "mve_vstrhq_p_<supf><mode>"
+  [(set (match_operand:<MVE_H_ELEM> 0 "memory_operand" "=Us")
+	(unspec:<MVE_H_ELEM> [(match_operand:MVE_6 1 "s_register_operand" "w")
+			      (match_operand:HI 2 "vpr_register_operand" "Up")]
+	 VSTRHQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[2];
+   int regno = REGNO (operands[1]);
+   ops[1] = gen_rtx_REG (TImode, regno);
+   ops[0]  = operands[0];
+   output_asm_insn ("vpst\n\tvstrht.<V_sz_elem>\t%q1, %E0",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+;;
+;; [vstrhq_scatter_offset_p_s vstrhq_scatter_offset_p_u]
+;;
+(define_insn "mve_vstrhq_scatter_offset_p_<supf><mode>"
+  [(set (match_operand:<MVE_H_ELEM> 0 "memory_operand" "=Us")
+	(unspec:<MVE_H_ELEM>
+		[(match_operand:MVE_6 1 "s_register_operand" "w")
+		 (match_operand:MVE_6 2 "s_register_operand" "w")
+		 (match_operand:HI 3 "vpr_register_operand" "Up")]
+	 VSTRHSOQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   output_asm_insn ("vpst\n\tvstrht.<V_sz_elem>\t%q2, [%m0, %q1]",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+;;
+;; [vstrhq_scatter_offset_s vstrhq_scatter_offset_u]
+;;
+(define_insn "mve_vstrhq_scatter_offset_<supf><mode>"
+  [(set (match_operand:<MVE_H_ELEM> 0 "memory_operand" "=Us")
+	(unspec:<MVE_H_ELEM>
+		[(match_operand:MVE_6 1 "s_register_operand" "w")
+		 (match_operand:MVE_6 2 "s_register_operand" "w")]
+	 VSTRHSOQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   output_asm_insn ("vstrh.<V_sz_elem>\t%q2, [%m0, %q1]",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+;;
+;; [vstrhq_scatter_shifted_offset_p_s vstrhq_scatter_shifted_offset_p_u]
+;;
+(define_insn "mve_vstrhq_scatter_shifted_offset_p_<supf><mode>"
+  [(set (match_operand:<MVE_H_ELEM> 0 "memory_operand" "=Us")
+	(unspec:<MVE_H_ELEM>
+		[(match_operand:MVE_6 1 "s_register_operand" "w")
+		 (match_operand:MVE_6 2 "s_register_operand" "w")
+		 (match_operand:HI 3 "vpr_register_operand" "Up")]
+	 VSTRHSSOQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   output_asm_insn ("vpst\n\tvstrht.<V_sz_elem>\t%q2, [%m0, %q1, uxtw #1]",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+;;
+;; [vstrhq_scatter_shifted_offset_s vstrhq_scatter_shifted_offset_u]
+;;
+(define_insn "mve_vstrhq_scatter_shifted_offset_<supf><mode>"
+  [(set (match_operand:<MVE_H_ELEM> 0 "memory_operand" "=Us")
+	(unspec:<MVE_H_ELEM>
+		[(match_operand:MVE_6 1 "s_register_operand" "w")
+		 (match_operand:MVE_6 2 "s_register_operand" "w")]
+	 VSTRHSSOQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   output_asm_insn ("vstrh.<V_sz_elem>\t%q2, [%m0, %q1, uxtw #1]",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+;;
+;; [vstrhq_s, vstrhq_u]
+;;
+(define_insn "mve_vstrhq_<supf><mode>"
+  [(set (match_operand:<MVE_H_ELEM> 0 "memory_operand" "=Us")
+	(unspec:<MVE_H_ELEM> [(match_operand:MVE_6 1 "s_register_operand" "w")]
+	 VSTRHQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[2];
+   int regno = REGNO (operands[1]);
+   ops[1] = gen_rtx_REG (TImode, regno);
+   ops[0]  = operands[0];
+   output_asm_insn ("vstrh.<V_sz_elem>\t%q1, %E0",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+;;
+;; [vstrwq_f]
+;;
+(define_insn "mve_vstrwq_fv4sf"
+  [(set (match_operand:V4SI 0 "memory_operand" "=Us")
+	(unspec:V4SI [(match_operand:V4SF 1 "s_register_operand" "w")]
+	 VSTRWQ_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+   rtx ops[2];
+   int regno = REGNO (operands[1]);
+   ops[1] = gen_rtx_REG (TImode, regno);
+   ops[0]  = operands[0];
+   output_asm_insn ("vstrw.32\t%q1, %E0",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+;;
+;; [vstrwq_p_f]
+;;
+(define_insn "mve_vstrwq_p_fv4sf"
+  [(set (match_operand:V4SI 0 "memory_operand" "=Us")
+	(unspec:V4SI [(match_operand:V4SF 1 "s_register_operand" "w")
+		      (match_operand:HI 2 "vpr_register_operand" "Up")]
+	 VSTRWQ_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+   rtx ops[2];
+   int regno = REGNO (operands[1]);
+   ops[1] = gen_rtx_REG (TImode, regno);
+   ops[0]  = operands[0];
+   output_asm_insn ("vpst\n\tvstrwt.32\t%q1, %E0",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+;;
+;; [vstrwq_p_s vstrwq_p_u]
+;;
+(define_insn "mve_vstrwq_p_<supf>v4si"
+  [(set (match_operand:V4SI 0 "memory_operand" "=Us")
+	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
+		      (match_operand:HI 2 "vpr_register_operand" "Up")]
+	 VSTRWQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[2];
+   int regno = REGNO (operands[1]);
+   ops[1] = gen_rtx_REG (TImode, regno);
+   ops[0]  = operands[0];
+   output_asm_insn ("vpst\n\tvstrwt.32\t%q1, %E0",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+;;
+;; [vstrwq_s vstrwq_u]
+;;
+(define_insn "mve_vstrwq_<supf>v4si"
+  [(set (match_operand:V4SI 0 "memory_operand" "=Us")
+	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")]
+	 VSTRWQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[2];
+   int regno = REGNO (operands[1]);
+   ops[1] = gen_rtx_REG (TImode, regno);
+   ops[0]  = operands[0];
+   output_asm_insn ("vstrw.32\t%q1, %E0",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+(define_expand "mve_vst1q_f<mode>"
+  [(match_operand:<MVE_CNVT> 0 "memory_operand")
+   (unspec:<MVE_CNVT> [(match_operand:MVE_0 1 "s_register_operand")] VST1Q_F)
+  ]
+  "TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT"
+{
+  emit_insn (gen_mve_vstr<V_sz_elem1>q_f<mode>(operands[0],operands[1]));
+  DONE;
+})
+
+(define_expand "mve_vst1q_<supf><mode>"
+  [(match_operand:MVE_2 0 "memory_operand")
+   (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand")] VST1Q)
+  ]
+  "TARGET_HAVE_MVE"
+{
+  emit_insn (gen_mve_vstr<V_sz_elem1>q_<supf><mode>(operands[0],operands[1]));
+  DONE;
+})
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..b429d61ab430a15d498cf24f3fd07ceb495c9f65
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (float16_t * addr, float16x8_t value)
+{
+  vst1q_f16 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.16"  }  } */
+
+void
+foo1 (float16_t * addr, float16x8_t value)
+{
+  vst1q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..d5acd953cfe811df9d0051d930d5101492b3b1a0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (float32_t * addr, float32x4_t value)
+{
+  vst1q_f32 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrw.32"  }  } */
+
+void
+foo1 (float32_t * addr, float32x4_t value)
+{
+  vst1q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrw.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..bbb3886d2dbb6ab9dceff7663414c19ad6a37b93
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int16_t * addr, int16x8_t value)
+{
+  vst1q_s16 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.16"  }  } */
+
+void
+foo1 (int16_t * addr, int16x8_t value)
+{
+  vst1q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..995a2769503239171943a6749739f781a7764ad7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int32_t * addr, int32x4_t value)
+{
+  vst1q_s32 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrw.32"  }  } */
+
+void
+foo1 (int32_t * addr, int32x4_t value)
+{
+  vst1q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrw.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..1b87615cf3574523d18aade22931a60921af01bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int8_t * addr, int8x16_t value)
+{
+  vst1q_s8 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.8"  }  } */
+
+void
+foo1 (int8_t * addr, int8x16_t value)
+{
+  vst1q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..de6721a6c95829191a5227afe658607c97f1719e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint16_t * addr, uint16x8_t value)
+{
+  vst1q_u16 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.16"  }  } */
+
+void
+foo1 (uint16_t * addr, uint16x8_t value)
+{
+  vst1q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..9880f200c11d3c7390ffad81ecadee3c39ac8db5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint32_t * addr, uint32x4_t value)
+{
+  vst1q_u32 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrw.32"  }  } */
+
+void
+foo1 (uint32_t * addr, uint32x4_t value)
+{
+  vst1q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrw.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..bdf52a6c67ae9c32bc8a704bd36f9b1e1c7959bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint8_t * addr, uint8x16_t value)
+{
+  vst1q_u8 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.8"  }  } */
+
+void
+foo1 (uint8_t * addr, uint8x16_t value)
+{
+  vst1q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrb.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..f532c1f8bb398ae66e98c7fb41decdc535310a7f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_f16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (float16_t * addr, float16x8_t value)
+{
+  vstrhq_f16 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.16"  }  } */
+
+void
+foo1 (float16_t * addr, float16x8_t value)
+{
+  vstrhq (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_p_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_p_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..1da03679918a0331f7d4a2d36632fae9500833ff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_p_f16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (float16_t * addr, float16x8_t value, mve_pred16_t p)
+{
+  vstrhq_p_f16 (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.16"  }  } */
+
+void
+foo1 (float16_t * addr, float16x8_t value, mve_pred16_t p)
+{
+  vstrhq_p (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_p_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_p_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..d462f5ee321a9650b14693accbf9a46d0abe5c9d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_p_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int16_t * addr, int16x8_t value, mve_pred16_t p)
+{
+  vstrhq_p_s16 (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.16"  }  } */
+
+void
+foo1 (int16_t * addr, int16x8_t value, mve_pred16_t p)
+{
+  vstrhq_p (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_p_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..4f13b7cc5d1bad348cf0af8792fe6b77d437877b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_p_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int16_t * addr, int32x4_t value, mve_pred16_t p)
+{
+  vstrhq_p_s32 (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.32"  }  } */
+
+void
+foo1 (int16_t * addr, int32x4_t value, mve_pred16_t p)
+{
+  vstrhq_p (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_p_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_p_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..78c8ca8176041d7dcab0a0fe0082410a63fb753f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_p_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint16_t * addr, uint16x8_t value, mve_pred16_t p)
+{
+  vstrhq_p_u16 (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.16"  }  } */
+
+void
+foo1 (uint16_t * addr, uint16x8_t value, mve_pred16_t p)
+{
+  vstrhq_p (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_p_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_p_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..6d5f90b3d85425de2aa6d35b8cebe2b3807196cd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_p_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint16_t * addr, uint32x4_t value, mve_pred16_t p)
+{
+  vstrhq_p_u32 (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.32"  }  } */
+
+void
+foo1 (uint16_t * addr, uint32x4_t value, mve_pred16_t p)
+{
+  vstrhq_p (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..e824126e2194669869457a88090cbf9fe16f19e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int16_t * addr, int16x8_t value)
+{
+  vstrhq_s16 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.16"  }  } */
+
+void
+foo1 (int16_t * addr, int16x8_t value)
+{
+  vstrhq (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..619ff380cce6efef65c5cc66dfe9343d957fffbb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int16_t * addr, int32x4_t value)
+{
+  vstrhq_s32 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.32"  }  } */
+
+void
+foo1 (int16_t * addr, int32x4_t value)
+{
+  vstrhq (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_p_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_p_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..0f2f289d24aec0049609fdd9cb69a0a5a1b02ee8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_p_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int16_t * base, uint16x8_t offset, int16x8_t value, mve_pred16_t p)
+{
+  vstrhq_scatter_offset_p_s16 (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.16"  }  } */
+
+void
+foo1 (int16_t * base, uint16x8_t offset, int16x8_t value, mve_pred16_t p)
+{
+  vstrhq_scatter_offset_p (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_p_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..8b2371343089dbc9e420b8f66ffc9a75a0640be7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_p_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int16_t * base, uint32x4_t offset, int32x4_t value, mve_pred16_t p)
+{
+  vstrhq_scatter_offset_p_s32 (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.32"  }  } */
+
+void
+foo1 (int16_t * base, uint32x4_t offset, int32x4_t value, mve_pred16_t p)
+{
+  vstrhq_scatter_offset_p (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_p_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_p_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..11a3c5d35bbd8949c9752edfcc72caba53feed4b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_p_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint16_t * base, uint16x8_t offset, uint16x8_t value, mve_pred16_t p)
+{
+  vstrhq_scatter_offset_p_u16 (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.16"  }  } */
+
+void
+foo1 (uint16_t * base, uint16x8_t offset, uint16x8_t value, mve_pred16_t p)
+{
+  vstrhq_scatter_offset_p (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_p_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_p_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..745da2e3f62bce088a651c4f7ee91a1137d557da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_p_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint16_t * base, uint32x4_t offset, uint32x4_t value, mve_pred16_t p)
+{
+  vstrhq_scatter_offset_p_u32 (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.32"  }  } */
+
+void
+foo1 (uint16_t * base, uint32x4_t offset, uint32x4_t value, mve_pred16_t p)
+{
+  vstrhq_scatter_offset_p (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..6831e4a91501bb83ce5c22f74db9946679351627
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int16_t * base, uint16x8_t offset, int16x8_t value)
+{
+  vstrhq_scatter_offset_s16 (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.16"  }  } */
+
+void
+foo1 (int16_t * base, uint16x8_t offset, int16x8_t value)
+{
+  vstrhq_scatter_offset (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..18ced2469dcd8f3288101dad170b77f5cc23b543
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int16_t * base, uint32x4_t offset, int32x4_t value)
+{
+  vstrhq_scatter_offset_s32 (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.32"  }  } */
+
+void
+foo1 (int16_t * base, uint32x4_t offset, int32x4_t value)
+{
+  vstrhq_scatter_offset (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..b3bae4d149fe3368d4ec871c7a4ace08b865e155
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint16_t * base, uint16x8_t offset, uint16x8_t value)
+{
+  vstrhq_scatter_offset_u16 (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.16"  }  } */
+
+void
+foo1 (uint16_t * base, uint16x8_t offset, uint16x8_t value)
+{
+  vstrhq_scatter_offset (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..22e204ac40917547a45d25730a5f8b3608991dfd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint16_t * base, uint32x4_t offset, uint32x4_t value)
+{
+  vstrhq_scatter_offset_u32 (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.32"  }  } */
+
+void
+foo1 (uint16_t * base, uint32x4_t offset, uint32x4_t value)
+{
+  vstrhq_scatter_offset (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_p_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_p_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..9034cb5dfb804d0aa9238d73cb0156a650f8d707
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_p_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int16_t * base, uint16x8_t offset, int16x8_t value, mve_pred16_t p)
+{
+  vstrhq_scatter_shifted_offset_p_s16 (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.16"  }  } */
+
+void
+foo1 (int16_t * base, uint16x8_t offset, int16x8_t value, mve_pred16_t p)
+{
+  vstrhq_scatter_shifted_offset_p (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_p_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..2c2b053376820f15a6e2f32ca72e2a188487925f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_p_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int16_t * base, uint32x4_t offset, int32x4_t value, mve_pred16_t p)
+{
+  vstrhq_scatter_shifted_offset_p_s32 (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.32"  }  } */
+
+void
+foo1 (int16_t * base, uint32x4_t offset, int32x4_t value, mve_pred16_t p)
+{
+  vstrhq_scatter_shifted_offset_p (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_p_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_p_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..842f4286890e2a0087fc5d861cd4a6585fba8a88
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_p_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint16_t * base, uint16x8_t offset, uint16x8_t value, mve_pred16_t p)
+{
+  vstrhq_scatter_shifted_offset_p_u16 (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.16"  }  } */
+
+void
+foo1 (uint16_t * base, uint16x8_t offset, uint16x8_t value, mve_pred16_t p)
+{
+  vstrhq_scatter_shifted_offset_p (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_p_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_p_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..dcc7d8441f7e7b496df02d9eda713e4210e27af5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_p_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint16_t * base, uint32x4_t offset, uint32x4_t value, mve_pred16_t p)
+{
+  vstrhq_scatter_shifted_offset_p_u32 (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.32"  }  } */
+
+void
+foo1 (uint16_t * base, uint32x4_t offset, uint32x4_t value, mve_pred16_t p)
+{
+  vstrhq_scatter_shifted_offset_p (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..ee728f790f1550e278d240b40de31a957f3f1b6a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int16_t * base, uint16x8_t offset, int16x8_t value)
+{
+  vstrhq_scatter_shifted_offset_s16 (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.16"  }  } */
+
+void
+foo1 (int16_t * base, uint16x8_t offset, int16x8_t value)
+{
+  vstrhq_scatter_shifted_offset (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..6accfd8a68a7291a0864903d6fe57b6f2194f810
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int16_t * base, uint32x4_t offset, int32x4_t value)
+{
+  vstrhq_scatter_shifted_offset_s32 (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.32"  }  } */
+
+void
+foo1 (int16_t * base, uint32x4_t offset, int32x4_t value)
+{
+  vstrhq_scatter_shifted_offset (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..96c87b654fe2da44ac26627ec88b6188855bcabe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint16_t * base, uint16x8_t offset, uint16x8_t value)
+{
+  vstrhq_scatter_shifted_offset_u16 (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.16"  }  } */
+
+void
+foo1 (uint16_t * base, uint16x8_t offset, uint16x8_t value)
+{
+  vstrhq_scatter_shifted_offset (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..57a27f13c02acffbc9e0818716a76604720ff6df
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint16_t * base, uint32x4_t offset, uint32x4_t value)
+{
+  vstrhq_scatter_shifted_offset_u32 (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.32"  }  } */
+
+void
+foo1 (uint16_t * base, uint32x4_t offset, uint32x4_t value)
+{
+  vstrhq_scatter_shifted_offset (base, offset, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..46a368a26cfe8eb51c86459ab5d506f1f5f3acfd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint16_t * addr, uint16x8_t value)
+{
+  vstrhq_u16 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.16"  }  } */
+
+void
+foo1 (uint16_t * addr, uint16x8_t value)
+{
+  vstrhq (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..ba814d37aa99be8ef207496a77d6d85e450f1400
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrhq_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint16_t * addr, uint32x4_t value)
+{
+  vstrhq_u32 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.32"  }  } */
+
+void
+foo1 (uint16_t * addr, uint32x4_t value)
+{
+  vstrhq (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrh.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..b5b8e309c167cfdf4f54701d9e8520e695bcd020
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_f32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (float32_t * addr, float32x4_t value)
+{
+  vstrwq_f32 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrw.32"  }  } */
+
+void
+foo1 (float32_t * addr, float32x4_t value)
+{
+  vstrwq (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrw.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_p_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_p_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..ee034fe053d1a34915d0d09fa78a5ca26bb695ae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_p_f32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (float32_t * addr, float32x4_t value, mve_pred16_t p)
+{
+  vstrwq_p_f32 (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrwt.32"  }  } */
+
+void
+foo1 (float32_t * addr, float32x4_t value, mve_pred16_t p)
+{
+  vstrwq_p (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrwt.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_p_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..daf2fe929d212854ae82f9379c543f9dfa065045
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_p_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int32_t * addr, int32x4_t value, mve_pred16_t p)
+{
+  vstrwq_p_s32 (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrwt.32"  }  } */
+
+void
+foo1 (int32_t * addr, int32x4_t value, mve_pred16_t p)
+{
+  vstrwq_p (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrwt.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_p_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_p_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..8e78c6effe55fa942bed99169e884dbcb2efeb8e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_p_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint32_t * addr, uint32x4_t value, mve_pred16_t p)
+{
+  vstrwq_p_u32 (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrwt.32"  }  } */
+
+void
+foo1 (uint32_t * addr, uint32x4_t value, mve_pred16_t p)
+{
+  vstrwq_p (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrwt.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..e59ad820a7c7c767af300f4c7ac54e7566ca8ed3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int32_t * addr, int32x4_t value)
+{
+  vstrwq_s32 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrw.32"  }  } */
+
+void
+foo1 (int32_t * addr, int32x4_t value)
+{
+  vstrwq (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrw.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..f76faa210526e73dc066410e9e9a745ab5671104
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint32_t * addr, uint32x4_t value)
+{
+  vstrwq_u32 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrw.32"  }  } */
+
+void
+foo1 (uint32_t * addr, uint32x4_t value)
+{
+  vstrwq (addr, value);
+}
+
+/* { dg-final { scan-assembler "vstrw.32"  }  } */


[-- Attachment #2: diff26.patch.gz --]
[-- Type: application/gzip, Size: 6626 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][4/2x]: MVE intrinsics with binary operands.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (11 preceding siblings ...)
  2019-11-14 19:14 ` [PATCH][ARM][GCC][1/4x]: MVE intrinsics with quaternary operands Srinath Parvathaneni
@ 2019-11-14 19:15 ` Srinath Parvathaneni
  2019-11-14 19:15 ` [PATCH][ARM][GCC][4/5x]: MVE load intrinsics with zero(_z) suffix Srinath Parvathaneni
                   ` (25 subsequent siblings)
  38 siblings, 0 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 53531 bytes --]

Hello,

This patch supports following MVE ACLE intrinsics with binary operands.

vsubq_u8, vsubq_n_u8, vrmulhq_u8, vrhaddq_u8, vqsubq_u8, vqsubq_n_u8, vqaddq_u8,
vqaddq_n_u8, vorrq_u8, vornq_u8, vmulq_u8, vmulq_n_u8, vmulltq_int_u8, vmullbq_int_u8,
vmulhq_u8, vmladavq_u8, vminvq_u8, vminq_u8, vmaxvq_u8, vmaxq_u8, vhsubq_u8, vhsubq_n_u8,
vhaddq_u8, vhaddq_n_u8, veorq_u8, vcmpneq_n_u8, vcmphiq_u8, vcmphiq_n_u8, vcmpeqq_u8, 
vcmpeqq_n_u8, vcmpcsq_u8, vcmpcsq_n_u8, vcaddq_rot90_u8, vcaddq_rot270_u8, vbicq_u8,
vandq_u8, vaddvq_p_u8, vaddvaq_u8, vaddq_n_u8, vabdq_u8, vshlq_r_u8, vrshlq_u8,
vrshlq_n_u8, vqshlq_u8, vqshlq_r_u8, vqrshlq_u8, vqrshlq_n_u8, vminavq_s8, vminaq_s8,
vmaxavq_s8, vmaxaq_s8, vbrsrq_n_u8, vshlq_n_u8, vrshrq_n_u8, vqshlq_n_u8, vcmpneq_n_s8,
vcmpltq_s8, vcmpltq_n_s8, vcmpleq_s8, vcmpleq_n_s8, vcmpgtq_s8, vcmpgtq_n_s8, vcmpgeq_s8,
vcmpgeq_n_s8, vcmpeqq_s8, vcmpeqq_n_s8, vqshluq_n_s8, vaddvq_p_s8, vsubq_s8, vsubq_n_s8,
vshlq_r_s8, vrshlq_s8, vrshlq_n_s8, vrmulhq_s8, vrhaddq_s8, vqsubq_s8, vqsubq_n_s8,
vqshlq_s8, vqshlq_r_s8, vqrshlq_s8, vqrshlq_n_s8, vqrdmulhq_s8, vqrdmulhq_n_s8, vqdmulhq_s8,
vqdmulhq_n_s8, vqaddq_s8, vqaddq_n_s8, vorrq_s8, vornq_s8, vmulq_s8, vmulq_n_s8, vmulltq_int_s8,
vmullbq_int_s8, vmulhq_s8, vmlsdavxq_s8, vmlsdavq_s8, vmladavxq_s8, vmladavq_s8, vminvq_s8,
vminq_s8, vmaxvq_s8, vmaxq_s8, vhsubq_s8, vhsubq_n_s8, vhcaddq_rot90_s8, vhcaddq_rot270_s8,
vhaddq_s8, vhaddq_n_s8, veorq_s8, vcaddq_rot90_s8, vcaddq_rot270_s8, vbrsrq_n_s8, vbicq_s8,
vandq_s8, vaddvaq_s8, vaddq_n_s8, vabdq_s8, vshlq_n_s8, vrshrq_n_s8, vqshlq_n_s8, vsubq_u16,
vsubq_n_u16, vrmulhq_u16, vrhaddq_u16, vqsubq_u16, vqsubq_n_u16, vqaddq_u16, vqaddq_n_u16,
vorrq_u16, vornq_u16, vmulq_u16, vmulq_n_u16, vmulltq_int_u16, vmullbq_int_u16, vmulhq_u16,
vmladavq_u16, vminvq_u16, vminq_u16, vmaxvq_u16, vmaxq_u16, vhsubq_u16, vhsubq_n_u16,
vhaddq_u16, vhaddq_n_u16, veorq_u16, vcmpneq_n_u16, vcmphiq_u16, vcmphiq_n_u16, vcmpeqq_u16,
vcmpeqq_n_u16, vcmpcsq_u16, vcmpcsq_n_u16, vcaddq_rot90_u16, vcaddq_rot270_u16, vbicq_u16,
vandq_u16, vaddvq_p_u16, vaddvaq_u16, vaddq_n_u16, vabdq_u16, vshlq_r_u16, vrshlq_u16,
vrshlq_n_u16, vqshlq_u16, vqshlq_r_u16, vqrshlq_u16, vqrshlq_n_u16, vminavq_s16, vminaq_s16,
vmaxavq_s16, vmaxaq_s16, vbrsrq_n_u16, vshlq_n_u16, vrshrq_n_u16, vqshlq_n_u16, vcmpneq_n_s16,
vcmpltq_s16, vcmpltq_n_s16, vcmpleq_s16, vcmpleq_n_s16, vcmpgtq_s16, vcmpgtq_n_s16, 
vcmpgeq_s16, vcmpgeq_n_s16, vcmpeqq_s16, vcmpeqq_n_s16, vqshluq_n_s16, vaddvq_p_s16, vsubq_s16,
vsubq_n_s16, vshlq_r_s16, vrshlq_s16, vrshlq_n_s16, vrmulhq_s16, vrhaddq_s16, vqsubq_s16,
vqsubq_n_s16, vqshlq_s16, vqshlq_r_s16, vqrshlq_s16, vqrshlq_n_s16, vqrdmulhq_s16,
vqrdmulhq_n_s16, vqdmulhq_s16, vqdmulhq_n_s16, vqaddq_s16, vqaddq_n_s16, vorrq_s16, vornq_s16,
vmulq_s16, vmulq_n_s16, vmulltq_int_s16, vmullbq_int_s16, vmulhq_s16, vmlsdavxq_s16, vmlsdavq_s16,
vmladavxq_s16, vmladavq_s16, vminvq_s16, vminq_s16, vmaxvq_s16, vmaxq_s16, vhsubq_s16,
vhsubq_n_s16, vhcaddq_rot90_s16, vhcaddq_rot270_s16, vhaddq_s16, vhaddq_n_s16, veorq_s16,
vcaddq_rot90_s16, vcaddq_rot270_s16, vbrsrq_n_s16, vbicq_s16, vandq_s16, vaddvaq_s16, vaddq_n_s16,
vabdq_s16, vshlq_n_s16, vrshrq_n_s16, vqshlq_n_s16, vsubq_u32, vsubq_n_u32, vrmulhq_u32,
vrhaddq_u32, vqsubq_u32, vqsubq_n_u32, vqaddq_u32, vqaddq_n_u32, vorrq_u32, vornq_u32, vmulq_u32,
vmulq_n_u32, vmulltq_int_u32, vmullbq_int_u32, vmulhq_u32, vmladavq_u32, vminvq_u32, vminq_u32,
vmaxvq_u32, vmaxq_u32, vhsubq_u32, vhsubq_n_u32, vhaddq_u32, vhaddq_n_u32, veorq_u32, vcmpneq_n_u32,
vcmphiq_u32, vcmphiq_n_u32, vcmpeqq_u32, vcmpeqq_n_u32, vcmpcsq_u32, vcmpcsq_n_u32,
vcaddq_rot90_u32, vcaddq_rot270_u32, vbicq_u32, vandq_u32, vaddvq_p_u32, vaddvaq_u32, vaddq_n_u32,
vabdq_u32, vshlq_r_u32, vrshlq_u32, vrshlq_n_u32, vqshlq_u32, vqshlq_r_u32, vqrshlq_u32, vqrshlq_n_u32,
vminavq_s32, vminaq_s32, vmaxavq_s32, vmaxaq_s32, vbrsrq_n_u32, vshlq_n_u32, vrshrq_n_u32,
vqshlq_n_u32, vcmpneq_n_s32, vcmpltq_s32, vcmpltq_n_s32, vcmpleq_s32, vcmpleq_n_s32, vcmpgtq_s32,
vcmpgtq_n_s32, vcmpgeq_s32, vcmpgeq_n_s32, vcmpeqq_s32, vcmpeqq_n_s32, vqshluq_n_s32, vaddvq_p_s32,
vsubq_s32, vsubq_n_s32, vshlq_r_s32, vrshlq_s32, vrshlq_n_s32, vrmulhq_s32, vrhaddq_s32, vqsubq_s32,
vqsubq_n_s32, vqshlq_s32, vqshlq_r_s32, vqrshlq_s32, vqrshlq_n_s32, vqrdmulhq_s32, vqrdmulhq_n_s32,
vqdmulhq_s32, vqdmulhq_n_s32, vqaddq_s32, vqaddq_n_s32, vorrq_s32, vornq_s32, vmulq_s32, vmulq_n_s32,
vmulltq_int_s32, vmullbq_int_s32, vmulhq_s32, vmlsdavxq_s32, vmlsdavq_s32, vmladavxq_s32, vmladavq_s32,
vminvq_s32, vminq_s32, vmaxvq_s32, vmaxq_s32, vhsubq_s32, vhsubq_n_s32, vhcaddq_rot90_s32,
vhcaddq_rot270_s32, vhaddq_s32, vhaddq_n_s32, veorq_s32, vcaddq_rot90_s32, vcaddq_rot270_s32,
vbrsrq_n_s32, vbicq_s32, vandq_s32, vaddvaq_s32, vaddq_n_s32, vabdq_s32, vshlq_n_s32, vrshrq_n_s32,
vqshlq_n_s32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

In this patch new constraints "Ra" and "Rg" are added.
Ra checks the constant is with in the range of 0 to 7 where as Rg checks that the constant is one among
1, 2, 4 and 8.

Also a new predicates "mve_imm_7" and "mve_imm_selective_upto_8" are added, to check the the matching
constraint Ra and Rg respectively.

The above intrinsics are defined using the already defined builtin qualifiers BINOP_NONE_NONE_IMM, BINOP_NONE_NONE_NONE,
BINOP_NONE_NONE_UNONE, BINOP_UNONE_NONE_IMM, BINOP_UNONE_NONE_NONE, BINOP_UNONE_UNONE_IMM, BINOP_UNONE_UNONE_NONE,
BINOP_UNONE_UNONE_UNONE.

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-10-23  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm_mve.h (vsubq_u8): Define macro.
	(vsubq_n_u8): Likewise.
	(vrmulhq_u8): Likewise.
	(vrhaddq_u8): Likewise.
	(vqsubq_u8): Likewise.
	(vqsubq_n_u8): Likewise.
	(vqaddq_u8): Likewise.
	(vqaddq_n_u8): Likewise.
	(vorrq_u8): Likewise.
	(vornq_u8): Likewise.
	(vmulq_u8): Likewise.
	(vmulq_n_u8): Likewise.
	(vmulltq_int_u8): Likewise.
	(vmullbq_int_u8): Likewise.
	(vmulhq_u8): Likewise.
	(vmladavq_u8): Likewise.
	(vminvq_u8): Likewise.
	(vminq_u8): Likewise.
	(vmaxvq_u8): Likewise.
	(vmaxq_u8): Likewise.
	(vhsubq_u8): Likewise.
	(vhsubq_n_u8): Likewise.
	(vhaddq_u8): Likewise.
	(vhaddq_n_u8): Likewise.
	(veorq_u8): Likewise.
	(vcmpneq_n_u8): Likewise.
	(vcmphiq_u8): Likewise.
	(vcmphiq_n_u8): Likewise.
	(vcmpeqq_u8): Likewise.
	(vcmpeqq_n_u8): Likewise.
	(vcmpcsq_u8): Likewise.
	(vcmpcsq_n_u8): Likewise.
	(vcaddq_rot90_u8): Likewise.
	(vcaddq_rot270_u8): Likewise.
	(vbicq_u8): Likewise.
	(vandq_u8): Likewise.
	(vaddvq_p_u8): Likewise.
	(vaddvaq_u8): Likewise.
	(vaddq_n_u8): Likewise.
	(vabdq_u8): Likewise.
	(vshlq_r_u8): Likewise.
	(vrshlq_u8): Likewise.
	(vrshlq_n_u8): Likewise.
	(vqshlq_u8): Likewise.
	(vqshlq_r_u8): Likewise.
	(vqrshlq_u8): Likewise.
	(vqrshlq_n_u8): Likewise.
	(vminavq_s8): Likewise.
	(vminaq_s8): Likewise.
	(vmaxavq_s8): Likewise.
	(vmaxaq_s8): Likewise.
	(vbrsrq_n_u8): Likewise.
	(vshlq_n_u8): Likewise.
	(vrshrq_n_u8): Likewise.
	(vqshlq_n_u8): Likewise.
	(vcmpneq_n_s8): Likewise.
	(vcmpltq_s8): Likewise.
	(vcmpltq_n_s8): Likewise.
	(vcmpleq_s8): Likewise.
	(vcmpleq_n_s8): Likewise.
	(vcmpgtq_s8): Likewise.
	(vcmpgtq_n_s8): Likewise.
	(vcmpgeq_s8): Likewise.
	(vcmpgeq_n_s8): Likewise.
	(vcmpeqq_s8): Likewise.
	(vcmpeqq_n_s8): Likewise.
	(vqshluq_n_s8): Likewise.
	(vaddvq_p_s8): Likewise.
	(vsubq_s8): Likewise.
	(vsubq_n_s8): Likewise.
	(vshlq_r_s8): Likewise.
	(vrshlq_s8): Likewise.
	(vrshlq_n_s8): Likewise.
	(vrmulhq_s8): Likewise.
	(vrhaddq_s8): Likewise.
	(vqsubq_s8): Likewise.
	(vqsubq_n_s8): Likewise.
	(vqshlq_s8): Likewise.
	(vqshlq_r_s8): Likewise.
	(vqrshlq_s8): Likewise.
	(vqrshlq_n_s8): Likewise.
	(vqrdmulhq_s8): Likewise.
	(vqrdmulhq_n_s8): Likewise.
	(vqdmulhq_s8): Likewise.
	(vqdmulhq_n_s8): Likewise.
	(vqaddq_s8): Likewise.
	(vqaddq_n_s8): Likewise.
	(vorrq_s8): Likewise.
	(vornq_s8): Likewise.
	(vmulq_s8): Likewise.
	(vmulq_n_s8): Likewise.
	(vmulltq_int_s8): Likewise.
	(vmullbq_int_s8): Likewise.
	(vmulhq_s8): Likewise.
	(vmlsdavxq_s8): Likewise.
	(vmlsdavq_s8): Likewise.
	(vmladavxq_s8): Likewise.
	(vmladavq_s8): Likewise.
	(vminvq_s8): Likewise.
	(vminq_s8): Likewise.
	(vmaxvq_s8): Likewise.
	(vmaxq_s8): Likewise.
	(vhsubq_s8): Likewise.
	(vhsubq_n_s8): Likewise.
	(vhcaddq_rot90_s8): Likewise.
	(vhcaddq_rot270_s8): Likewise.
	(vhaddq_s8): Likewise.
	(vhaddq_n_s8): Likewise.
	(veorq_s8): Likewise.
	(vcaddq_rot90_s8): Likewise.
	(vcaddq_rot270_s8): Likewise.
	(vbrsrq_n_s8): Likewise.
	(vbicq_s8): Likewise.
	(vandq_s8): Likewise.
	(vaddvaq_s8): Likewise.
	(vaddq_n_s8): Likewise.
	(vabdq_s8): Likewise.
	(vshlq_n_s8): Likewise.
	(vrshrq_n_s8): Likewise.
	(vqshlq_n_s8): Likewise.
	(vsubq_u16): Likewise.
	(vsubq_n_u16): Likewise.
	(vrmulhq_u16): Likewise.
	(vrhaddq_u16): Likewise.
	(vqsubq_u16): Likewise.
	(vqsubq_n_u16): Likewise.
	(vqaddq_u16): Likewise.
	(vqaddq_n_u16): Likewise.
	(vorrq_u16): Likewise.
	(vornq_u16): Likewise.
	(vmulq_u16): Likewise.
	(vmulq_n_u16): Likewise.
	(vmulltq_int_u16): Likewise.
	(vmullbq_int_u16): Likewise.
	(vmulhq_u16): Likewise.
	(vmladavq_u16): Likewise.
	(vminvq_u16): Likewise.
	(vminq_u16): Likewise.
	(vmaxvq_u16): Likewise.
	(vmaxq_u16): Likewise.
	(vhsubq_u16): Likewise.
	(vhsubq_n_u16): Likewise.
	(vhaddq_u16): Likewise.
	(vhaddq_n_u16): Likewise.
	(veorq_u16): Likewise.
	(vcmpneq_n_u16): Likewise.
	(vcmphiq_u16): Likewise.
	(vcmphiq_n_u16): Likewise.
	(vcmpeqq_u16): Likewise.
	(vcmpeqq_n_u16): Likewise.
	(vcmpcsq_u16): Likewise.
	(vcmpcsq_n_u16): Likewise.
	(vcaddq_rot90_u16): Likewise.
	(vcaddq_rot270_u16): Likewise.
	(vbicq_u16): Likewise.
	(vandq_u16): Likewise.
	(vaddvq_p_u16): Likewise.
	(vaddvaq_u16): Likewise.
	(vaddq_n_u16): Likewise.
	(vabdq_u16): Likewise.
	(vshlq_r_u16): Likewise.
	(vrshlq_u16): Likewise.
	(vrshlq_n_u16): Likewise.
	(vqshlq_u16): Likewise.
	(vqshlq_r_u16): Likewise.
	(vqrshlq_u16): Likewise.
	(vqrshlq_n_u16): Likewise.
	(vminavq_s16): Likewise.
	(vminaq_s16): Likewise.
	(vmaxavq_s16): Likewise.
	(vmaxaq_s16): Likewise.
	(vbrsrq_n_u16): Likewise.
	(vshlq_n_u16): Likewise.
	(vrshrq_n_u16): Likewise.
	(vqshlq_n_u16): Likewise.
	(vcmpneq_n_s16): Likewise.
	(vcmpltq_s16): Likewise.
	(vcmpltq_n_s16): Likewise.
	(vcmpleq_s16): Likewise.
	(vcmpleq_n_s16): Likewise.
	(vcmpgtq_s16): Likewise.
	(vcmpgtq_n_s16): Likewise.
	(vcmpgeq_s16): Likewise.
	(vcmpgeq_n_s16): Likewise.
	(vcmpeqq_s16): Likewise.
	(vcmpeqq_n_s16): Likewise.
	(vqshluq_n_s16): Likewise.
	(vaddvq_p_s16): Likewise.
	(vsubq_s16): Likewise.
	(vsubq_n_s16): Likewise.
	(vshlq_r_s16): Likewise.
	(vrshlq_s16): Likewise.
	(vrshlq_n_s16): Likewise.
	(vrmulhq_s16): Likewise.
	(vrhaddq_s16): Likewise.
	(vqsubq_s16): Likewise.
	(vqsubq_n_s16): Likewise.
	(vqshlq_s16): Likewise.
	(vqshlq_r_s16): Likewise.
	(vqrshlq_s16): Likewise.
	(vqrshlq_n_s16): Likewise.
	(vqrdmulhq_s16): Likewise.
	(vqrdmulhq_n_s16): Likewise.
	(vqdmulhq_s16): Likewise.
	(vqdmulhq_n_s16): Likewise.
	(vqaddq_s16): Likewise.
	(vqaddq_n_s16): Likewise.
	(vorrq_s16): Likewise.
	(vornq_s16): Likewise.
	(vmulq_s16): Likewise.
	(vmulq_n_s16): Likewise.
	(vmulltq_int_s16): Likewise.
	(vmullbq_int_s16): Likewise.
	(vmulhq_s16): Likewise.
	(vmlsdavxq_s16): Likewise.
	(vmlsdavq_s16): Likewise.
	(vmladavxq_s16): Likewise.
	(vmladavq_s16): Likewise.
	(vminvq_s16): Likewise.
	(vminq_s16): Likewise.
	(vmaxvq_s16): Likewise.
	(vmaxq_s16): Likewise.
	(vhsubq_s16): Likewise.
	(vhsubq_n_s16): Likewise.
	(vhcaddq_rot90_s16): Likewise.
	(vhcaddq_rot270_s16): Likewise.
	(vhaddq_s16): Likewise.
	(vhaddq_n_s16): Likewise.
	(veorq_s16): Likewise.
	(vcaddq_rot90_s16): Likewise.
	(vcaddq_rot270_s16): Likewise.
	(vbrsrq_n_s16): Likewise.
	(vbicq_s16): Likewise.
	(vandq_s16): Likewise.
	(vaddvaq_s16): Likewise.
	(vaddq_n_s16): Likewise.
	(vabdq_s16): Likewise.
	(vshlq_n_s16): Likewise.
	(vrshrq_n_s16): Likewise.
	(vqshlq_n_s16): Likewise.
	(vsubq_u32): Likewise.
	(vsubq_n_u32): Likewise.
	(vrmulhq_u32): Likewise.
	(vrhaddq_u32): Likewise.
	(vqsubq_u32): Likewise.
	(vqsubq_n_u32): Likewise.
	(vqaddq_u32): Likewise.
	(vqaddq_n_u32): Likewise.
	(vorrq_u32): Likewise.
	(vornq_u32): Likewise.
	(vmulq_u32): Likewise.
	(vmulq_n_u32): Likewise.
	(vmulltq_int_u32): Likewise.
	(vmullbq_int_u32): Likewise.
	(vmulhq_u32): Likewise.
	(vmladavq_u32): Likewise.
	(vminvq_u32): Likewise.
	(vminq_u32): Likewise.
	(vmaxvq_u32): Likewise.
	(vmaxq_u32): Likewise.
	(vhsubq_u32): Likewise.
	(vhsubq_n_u32): Likewise.
	(vhaddq_u32): Likewise.
	(vhaddq_n_u32): Likewise.
	(veorq_u32): Likewise.
	(vcmpneq_n_u32): Likewise.
	(vcmphiq_u32): Likewise.
	(vcmphiq_n_u32): Likewise.
	(vcmpeqq_u32): Likewise.
	(vcmpeqq_n_u32): Likewise.
	(vcmpcsq_u32): Likewise.
	(vcmpcsq_n_u32): Likewise.
	(vcaddq_rot90_u32): Likewise.
	(vcaddq_rot270_u32): Likewise.
	(vbicq_u32): Likewise.
	(vandq_u32): Likewise.
	(vaddvq_p_u32): Likewise.
	(vaddvaq_u32): Likewise.
	(vaddq_n_u32): Likewise.
	(vabdq_u32): Likewise.
	(vshlq_r_u32): Likewise.
	(vrshlq_u32): Likewise.
	(vrshlq_n_u32): Likewise.
	(vqshlq_u32): Likewise.
	(vqshlq_r_u32): Likewise.
	(vqrshlq_u32): Likewise.
	(vqrshlq_n_u32): Likewise.
	(vminavq_s32): Likewise.
	(vminaq_s32): Likewise.
	(vmaxavq_s32): Likewise.
	(vmaxaq_s32): Likewise.
	(vbrsrq_n_u32): Likewise.
	(vshlq_n_u32): Likewise.
	(vrshrq_n_u32): Likewise.
	(vqshlq_n_u32): Likewise.
	(vcmpneq_n_s32): Likewise.
	(vcmpltq_s32): Likewise.
	(vcmpltq_n_s32): Likewise.
	(vcmpleq_s32): Likewise.
	(vcmpleq_n_s32): Likewise.
	(vcmpgtq_s32): Likewise.
	(vcmpgtq_n_s32): Likewise.
	(vcmpgeq_s32): Likewise.
	(vcmpgeq_n_s32): Likewise.
	(vcmpeqq_s32): Likewise.
	(vcmpeqq_n_s32): Likewise.
	(vqshluq_n_s32): Likewise.
	(vaddvq_p_s32): Likewise.
	(vsubq_s32): Likewise.
	(vsubq_n_s32): Likewise.
	(vshlq_r_s32): Likewise.
	(vrshlq_s32): Likewise.
	(vrshlq_n_s32): Likewise.
	(vrmulhq_s32): Likewise.
	(vrhaddq_s32): Likewise.
	(vqsubq_s32): Likewise.
	(vqsubq_n_s32): Likewise.
	(vqshlq_s32): Likewise.
	(vqshlq_r_s32): Likewise.
	(vqrshlq_s32): Likewise.
	(vqrshlq_n_s32): Likewise.
	(vqrdmulhq_s32): Likewise.
	(vqrdmulhq_n_s32): Likewise.
	(vqdmulhq_s32): Likewise.
	(vqdmulhq_n_s32): Likewise.
	(vqaddq_s32): Likewise.
	(vqaddq_n_s32): Likewise.
	(vorrq_s32): Likewise.
	(vornq_s32): Likewise.
	(vmulq_s32): Likewise.
	(vmulq_n_s32): Likewise.
	(vmulltq_int_s32): Likewise.
	(vmullbq_int_s32): Likewise.
	(vmulhq_s32): Likewise.
	(vmlsdavxq_s32): Likewise.
	(vmlsdavq_s32): Likewise.
	(vmladavxq_s32): Likewise.
	(vmladavq_s32): Likewise.
	(vminvq_s32): Likewise.
	(vminq_s32): Likewise.
	(vmaxvq_s32): Likewise.
	(vmaxq_s32): Likewise.
	(vhsubq_s32): Likewise.
	(vhsubq_n_s32): Likewise.
	(vhcaddq_rot90_s32): Likewise.
	(vhcaddq_rot270_s32): Likewise.
	(vhaddq_s32): Likewise.
	(vhaddq_n_s32): Likewise.
	(veorq_s32): Likewise.
	(vcaddq_rot90_s32): Likewise.
	(vcaddq_rot270_s32): Likewise.
	(vbrsrq_n_s32): Likewise.
	(vbicq_s32): Likewise.
	(vandq_s32): Likewise.
	(vaddvaq_s32): Likewise.
	(vaddq_n_s32): Likewise.
	(vabdq_s32): Likewise.
	(vshlq_n_s32): Likewise.
	(vrshrq_n_s32): Likewise.
	(vqshlq_n_s32): Likewise.
	(__arm_vsubq_u8): Define intrinsic.
	(__arm_vsubq_n_u8): Likewise.
	(__arm_vrmulhq_u8): Likewise.
	(__arm_vrhaddq_u8): Likewise.
	(__arm_vqsubq_u8): Likewise.
	(__arm_vqsubq_n_u8): Likewise.
	(__arm_vqaddq_u8): Likewise.
	(__arm_vqaddq_n_u8): Likewise.
	(__arm_vorrq_u8): Likewise.
	(__arm_vornq_u8): Likewise.
	(__arm_vmulq_u8): Likewise.
	(__arm_vmulq_n_u8): Likewise.
	(__arm_vmulltq_int_u8): Likewise.
	(__arm_vmullbq_int_u8): Likewise.
	(__arm_vmulhq_u8): Likewise.
	(__arm_vmladavq_u8): Likewise.
	(__arm_vminvq_u8): Likewise.
	(__arm_vminq_u8): Likewise.
	(__arm_vmaxvq_u8): Likewise.
	(__arm_vmaxq_u8): Likewise.
	(__arm_vhsubq_u8): Likewise.
	(__arm_vhsubq_n_u8): Likewise.
	(__arm_vhaddq_u8): Likewise.
	(__arm_vhaddq_n_u8): Likewise.
	(__arm_veorq_u8): Likewise.
	(__arm_vcmpneq_n_u8): Likewise.
	(__arm_vcmphiq_u8): Likewise.
	(__arm_vcmphiq_n_u8): Likewise.
	(__arm_vcmpeqq_u8): Likewise.
	(__arm_vcmpeqq_n_u8): Likewise.
	(__arm_vcmpcsq_u8): Likewise.
	(__arm_vcmpcsq_n_u8): Likewise.
	(__arm_vcaddq_rot90_u8): Likewise.
	(__arm_vcaddq_rot270_u8): Likewise.
	(__arm_vbicq_u8): Likewise.
	(__arm_vandq_u8): Likewise.
	(__arm_vaddvq_p_u8): Likewise.
	(__arm_vaddvaq_u8): Likewise.
	(__arm_vaddq_n_u8): Likewise.
	(__arm_vabdq_u8): Likewise.
	(__arm_vshlq_r_u8): Likewise.
	(__arm_vrshlq_u8): Likewise.
	(__arm_vrshlq_n_u8): Likewise.
	(__arm_vqshlq_u8): Likewise.
	(__arm_vqshlq_r_u8): Likewise.
	(__arm_vqrshlq_u8): Likewise.
	(__arm_vqrshlq_n_u8): Likewise.
	(__arm_vminavq_s8): Likewise.
	(__arm_vminaq_s8): Likewise.
	(__arm_vmaxavq_s8): Likewise.
	(__arm_vmaxaq_s8): Likewise.
	(__arm_vbrsrq_n_u8): Likewise.
	(__arm_vshlq_n_u8): Likewise.
	(__arm_vrshrq_n_u8): Likewise.
	(__arm_vqshlq_n_u8): Likewise.
	(__arm_vcmpneq_n_s8): Likewise.
	(__arm_vcmpltq_s8): Likewise.
	(__arm_vcmpltq_n_s8): Likewise.
	(__arm_vcmpleq_s8): Likewise.
	(__arm_vcmpleq_n_s8): Likewise.
	(__arm_vcmpgtq_s8): Likewise.
	(__arm_vcmpgtq_n_s8): Likewise.
	(__arm_vcmpgeq_s8): Likewise.
	(__arm_vcmpgeq_n_s8): Likewise.
	(__arm_vcmpeqq_s8): Likewise.
	(__arm_vcmpeqq_n_s8): Likewise.
	(__arm_vqshluq_n_s8): Likewise.
	(__arm_vaddvq_p_s8): Likewise.
	(__arm_vsubq_s8): Likewise.
	(__arm_vsubq_n_s8): Likewise.
	(__arm_vshlq_r_s8): Likewise.
	(__arm_vrshlq_s8): Likewise.
	(__arm_vrshlq_n_s8): Likewise.
	(__arm_vrmulhq_s8): Likewise.
	(__arm_vrhaddq_s8): Likewise.
	(__arm_vqsubq_s8): Likewise.
	(__arm_vqsubq_n_s8): Likewise.
	(__arm_vqshlq_s8): Likewise.
	(__arm_vqshlq_r_s8): Likewise.
	(__arm_vqrshlq_s8): Likewise.
	(__arm_vqrshlq_n_s8): Likewise.
	(__arm_vqrdmulhq_s8): Likewise.
	(__arm_vqrdmulhq_n_s8): Likewise.
	(__arm_vqdmulhq_s8): Likewise.
	(__arm_vqdmulhq_n_s8): Likewise.
	(__arm_vqaddq_s8): Likewise.
	(__arm_vqaddq_n_s8): Likewise.
	(__arm_vorrq_s8): Likewise.
	(__arm_vornq_s8): Likewise.
	(__arm_vmulq_s8): Likewise.
	(__arm_vmulq_n_s8): Likewise.
	(__arm_vmulltq_int_s8): Likewise.
	(__arm_vmullbq_int_s8): Likewise.
	(__arm_vmulhq_s8): Likewise.
	(__arm_vmlsdavxq_s8): Likewise.
	(__arm_vmlsdavq_s8): Likewise.
	(__arm_vmladavxq_s8): Likewise.
	(__arm_vmladavq_s8): Likewise.
	(__arm_vminvq_s8): Likewise.
	(__arm_vminq_s8): Likewise.
	(__arm_vmaxvq_s8): Likewise.
	(__arm_vmaxq_s8): Likewise.
	(__arm_vhsubq_s8): Likewise.
	(__arm_vhsubq_n_s8): Likewise.
	(__arm_vhcaddq_rot90_s8): Likewise.
	(__arm_vhcaddq_rot270_s8): Likewise.
	(__arm_vhaddq_s8): Likewise.
	(__arm_vhaddq_n_s8): Likewise.
	(__arm_veorq_s8): Likewise.
	(__arm_vcaddq_rot90_s8): Likewise.
	(__arm_vcaddq_rot270_s8): Likewise.
	(__arm_vbrsrq_n_s8): Likewise.
	(__arm_vbicq_s8): Likewise.
	(__arm_vandq_s8): Likewise.
	(__arm_vaddvaq_s8): Likewise.
	(__arm_vaddq_n_s8): Likewise.
	(__arm_vabdq_s8): Likewise.
	(__arm_vshlq_n_s8): Likewise.
	(__arm_vrshrq_n_s8): Likewise.
	(__arm_vqshlq_n_s8): Likewise.
	(__arm_vsubq_u16): Likewise.
	(__arm_vsubq_n_u16): Likewise.
	(__arm_vrmulhq_u16): Likewise.
	(__arm_vrhaddq_u16): Likewise.
	(__arm_vqsubq_u16): Likewise.
	(__arm_vqsubq_n_u16): Likewise.
	(__arm_vqaddq_u16): Likewise.
	(__arm_vqaddq_n_u16): Likewise.
	(__arm_vorrq_u16): Likewise.
	(__arm_vornq_u16): Likewise.
	(__arm_vmulq_u16): Likewise.
	(__arm_vmulq_n_u16): Likewise.
	(__arm_vmulltq_int_u16): Likewise.
	(__arm_vmullbq_int_u16): Likewise.
	(__arm_vmulhq_u16): Likewise.
	(__arm_vmladavq_u16): Likewise.
	(__arm_vminvq_u16): Likewise.
	(__arm_vminq_u16): Likewise.
	(__arm_vmaxvq_u16): Likewise.
	(__arm_vmaxq_u16): Likewise.
	(__arm_vhsubq_u16): Likewise.
	(__arm_vhsubq_n_u16): Likewise.
	(__arm_vhaddq_u16): Likewise.
	(__arm_vhaddq_n_u16): Likewise.
	(__arm_veorq_u16): Likewise.
	(__arm_vcmpneq_n_u16): Likewise.
	(__arm_vcmphiq_u16): Likewise.
	(__arm_vcmphiq_n_u16): Likewise.
	(__arm_vcmpeqq_u16): Likewise.
	(__arm_vcmpeqq_n_u16): Likewise.
	(__arm_vcmpcsq_u16): Likewise.
	(__arm_vcmpcsq_n_u16): Likewise.
	(__arm_vcaddq_rot90_u16): Likewise.
	(__arm_vcaddq_rot270_u16): Likewise.
	(__arm_vbicq_u16): Likewise.
	(__arm_vandq_u16): Likewise.
	(__arm_vaddvq_p_u16): Likewise.
	(__arm_vaddvaq_u16): Likewise.
	(__arm_vaddq_n_u16): Likewise.
	(__arm_vabdq_u16): Likewise.
	(__arm_vshlq_r_u16): Likewise.
	(__arm_vrshlq_u16): Likewise.
	(__arm_vrshlq_n_u16): Likewise.
	(__arm_vqshlq_u16): Likewise.
	(__arm_vqshlq_r_u16): Likewise.
	(__arm_vqrshlq_u16): Likewise.
	(__arm_vqrshlq_n_u16): Likewise.
	(__arm_vminavq_s16): Likewise.
	(__arm_vminaq_s16): Likewise.
	(__arm_vmaxavq_s16): Likewise.
	(__arm_vmaxaq_s16): Likewise.
	(__arm_vbrsrq_n_u16): Likewise.
	(__arm_vshlq_n_u16): Likewise.
	(__arm_vrshrq_n_u16): Likewise.
	(__arm_vqshlq_n_u16): Likewise.
	(__arm_vcmpneq_n_s16): Likewise.
	(__arm_vcmpltq_s16): Likewise.
	(__arm_vcmpltq_n_s16): Likewise.
	(__arm_vcmpleq_s16): Likewise.
	(__arm_vcmpleq_n_s16): Likewise.
	(__arm_vcmpgtq_s16): Likewise.
	(__arm_vcmpgtq_n_s16): Likewise.
	(__arm_vcmpgeq_s16): Likewise.
	(__arm_vcmpgeq_n_s16): Likewise.
	(__arm_vcmpeqq_s16): Likewise.
	(__arm_vcmpeqq_n_s16): Likewise.
	(__arm_vqshluq_n_s16): Likewise.
	(__arm_vaddvq_p_s16): Likewise.
	(__arm_vsubq_s16): Likewise.
	(__arm_vsubq_n_s16): Likewise.
	(__arm_vshlq_r_s16): Likewise.
	(__arm_vrshlq_s16): Likewise.
	(__arm_vrshlq_n_s16): Likewise.
	(__arm_vrmulhq_s16): Likewise.
	(__arm_vrhaddq_s16): Likewise.
	(__arm_vqsubq_s16): Likewise.
	(__arm_vqsubq_n_s16): Likewise.
	(__arm_vqshlq_s16): Likewise.
	(__arm_vqshlq_r_s16): Likewise.
	(__arm_vqrshlq_s16): Likewise.
	(__arm_vqrshlq_n_s16): Likewise.
	(__arm_vqrdmulhq_s16): Likewise.
	(__arm_vqrdmulhq_n_s16): Likewise.
	(__arm_vqdmulhq_s16): Likewise.
	(__arm_vqdmulhq_n_s16): Likewise.
	(__arm_vqaddq_s16): Likewise.
	(__arm_vqaddq_n_s16): Likewise.
	(__arm_vorrq_s16): Likewise.
	(__arm_vornq_s16): Likewise.
	(__arm_vmulq_s16): Likewise.
	(__arm_vmulq_n_s16): Likewise.
	(__arm_vmulltq_int_s16): Likewise.
	(__arm_vmullbq_int_s16): Likewise.
	(__arm_vmulhq_s16): Likewise.
	(__arm_vmlsdavxq_s16): Likewise.
	(__arm_vmlsdavq_s16): Likewise.
	(__arm_vmladavxq_s16): Likewise.
	(__arm_vmladavq_s16): Likewise.
	(__arm_vminvq_s16): Likewise.
	(__arm_vminq_s16): Likewise.
	(__arm_vmaxvq_s16): Likewise.
	(__arm_vmaxq_s16): Likewise.
	(__arm_vhsubq_s16): Likewise.
	(__arm_vhsubq_n_s16): Likewise.
	(__arm_vhcaddq_rot90_s16): Likewise.
	(__arm_vhcaddq_rot270_s16): Likewise.
	(__arm_vhaddq_s16): Likewise.
	(__arm_vhaddq_n_s16): Likewise.
	(__arm_veorq_s16): Likewise.
	(__arm_vcaddq_rot90_s16): Likewise.
	(__arm_vcaddq_rot270_s16): Likewise.
	(__arm_vbrsrq_n_s16): Likewise.
	(__arm_vbicq_s16): Likewise.
	(__arm_vandq_s16): Likewise.
	(__arm_vaddvaq_s16): Likewise.
	(__arm_vaddq_n_s16): Likewise.
	(__arm_vabdq_s16): Likewise.
	(__arm_vshlq_n_s16): Likewise.
	(__arm_vrshrq_n_s16): Likewise.
	(__arm_vqshlq_n_s16): Likewise.
	(__arm_vsubq_u32): Likewise.
	(__arm_vsubq_n_u32): Likewise.
	(__arm_vrmulhq_u32): Likewise.
	(__arm_vrhaddq_u32): Likewise.
	(__arm_vqsubq_u32): Likewise.
	(__arm_vqsubq_n_u32): Likewise.
	(__arm_vqaddq_u32): Likewise.
	(__arm_vqaddq_n_u32): Likewise.
	(__arm_vorrq_u32): Likewise.
	(__arm_vornq_u32): Likewise.
	(__arm_vmulq_u32): Likewise.
	(__arm_vmulq_n_u32): Likewise.
	(__arm_vmulltq_int_u32): Likewise.
	(__arm_vmullbq_int_u32): Likewise.
	(__arm_vmulhq_u32): Likewise.
	(__arm_vmladavq_u32): Likewise.
	(__arm_vminvq_u32): Likewise.
	(__arm_vminq_u32): Likewise.
	(__arm_vmaxvq_u32): Likewise.
	(__arm_vmaxq_u32): Likewise.
	(__arm_vhsubq_u32): Likewise.
	(__arm_vhsubq_n_u32): Likewise.
	(__arm_vhaddq_u32): Likewise.
	(__arm_vhaddq_n_u32): Likewise.
	(__arm_veorq_u32): Likewise.
	(__arm_vcmpneq_n_u32): Likewise.
	(__arm_vcmphiq_u32): Likewise.
	(__arm_vcmphiq_n_u32): Likewise.
	(__arm_vcmpeqq_u32): Likewise.
	(__arm_vcmpeqq_n_u32): Likewise.
	(__arm_vcmpcsq_u32): Likewise.
	(__arm_vcmpcsq_n_u32): Likewise.
	(__arm_vcaddq_rot90_u32): Likewise.
	(__arm_vcaddq_rot270_u32): Likewise.
	(__arm_vbicq_u32): Likewise.
	(__arm_vandq_u32): Likewise.
	(__arm_vaddvq_p_u32): Likewise.
	(__arm_vaddvaq_u32): Likewise.
	(__arm_vaddq_n_u32): Likewise.
	(__arm_vabdq_u32): Likewise.
	(__arm_vshlq_r_u32): Likewise.
	(__arm_vrshlq_u32): Likewise.
	(__arm_vrshlq_n_u32): Likewise.
	(__arm_vqshlq_u32): Likewise.
	(__arm_vqshlq_r_u32): Likewise.
	(__arm_vqrshlq_u32): Likewise.
	(__arm_vqrshlq_n_u32): Likewise.
	(__arm_vminavq_s32): Likewise.
	(__arm_vminaq_s32): Likewise.
	(__arm_vmaxavq_s32): Likewise.
	(__arm_vmaxaq_s32): Likewise.
	(__arm_vbrsrq_n_u32): Likewise.
	(__arm_vshlq_n_u32): Likewise.
	(__arm_vrshrq_n_u32): Likewise.
	(__arm_vqshlq_n_u32): Likewise.
	(__arm_vcmpneq_n_s32): Likewise.
	(__arm_vcmpltq_s32): Likewise.
	(__arm_vcmpltq_n_s32): Likewise.
	(__arm_vcmpleq_s32): Likewise.
	(__arm_vcmpleq_n_s32): Likewise.
	(__arm_vcmpgtq_s32): Likewise.
	(__arm_vcmpgtq_n_s32): Likewise.
	(__arm_vcmpgeq_s32): Likewise.
	(__arm_vcmpgeq_n_s32): Likewise.
	(__arm_vcmpeqq_s32): Likewise.
	(__arm_vcmpeqq_n_s32): Likewise.
	(__arm_vqshluq_n_s32): Likewise.
	(__arm_vaddvq_p_s32): Likewise.
	(__arm_vsubq_s32): Likewise.
	(__arm_vsubq_n_s32): Likewise.
	(__arm_vshlq_r_s32): Likewise.
	(__arm_vrshlq_s32): Likewise.
	(__arm_vrshlq_n_s32): Likewise.
	(__arm_vrmulhq_s32): Likewise.
	(__arm_vrhaddq_s32): Likewise.
	(__arm_vqsubq_s32): Likewise.
	(__arm_vqsubq_n_s32): Likewise.
	(__arm_vqshlq_s32): Likewise.
	(__arm_vqshlq_r_s32): Likewise.
	(__arm_vqrshlq_s32): Likewise.
	(__arm_vqrshlq_n_s32): Likewise.
	(__arm_vqrdmulhq_s32): Likewise.
	(__arm_vqrdmulhq_n_s32): Likewise.
	(__arm_vqdmulhq_s32): Likewise.
	(__arm_vqdmulhq_n_s32): Likewise.
	(__arm_vqaddq_s32): Likewise.
	(__arm_vqaddq_n_s32): Likewise.
	(__arm_vorrq_s32): Likewise.
	(__arm_vornq_s32): Likewise.
	(__arm_vmulq_s32): Likewise.
	(__arm_vmulq_n_s32): Likewise.
	(__arm_vmulltq_int_s32): Likewise.
	(__arm_vmullbq_int_s32): Likewise.
	(__arm_vmulhq_s32): Likewise.
	(__arm_vmlsdavxq_s32): Likewise.
	(__arm_vmlsdavq_s32): Likewise.
	(__arm_vmladavxq_s32): Likewise.
	(__arm_vmladavq_s32): Likewise.
	(__arm_vminvq_s32): Likewise.
	(__arm_vminq_s32): Likewise.
	(__arm_vmaxvq_s32): Likewise.
	(__arm_vmaxq_s32): Likewise.
	(__arm_vhsubq_s32): Likewise.
	(__arm_vhsubq_n_s32): Likewise.
	(__arm_vhcaddq_rot90_s32): Likewise.
	(__arm_vhcaddq_rot270_s32): Likewise.
	(__arm_vhaddq_s32): Likewise.
	(__arm_vhaddq_n_s32): Likewise.
	(__arm_veorq_s32): Likewise.
	(__arm_vcaddq_rot90_s32): Likewise.
	(__arm_vcaddq_rot270_s32): Likewise.
	(__arm_vbrsrq_n_s32): Likewise.
	(__arm_vbicq_s32): Likewise.
	(__arm_vandq_s32): Likewise.
	(__arm_vaddvaq_s32): Likewise.
	(__arm_vaddq_n_s32): Likewise.
	(__arm_vabdq_s32): Likewise.
	(__arm_vshlq_n_s32): Likewise.
	(__arm_vrshrq_n_s32): Likewise.
	(__arm_vqshlq_n_s32): Likewise.
	(vsubq): Define polymorphic variant.
	(vsubq_n): Likewise.
	(vshlq_r): Likewise.
	(vrshlq_n): Likewise.
	(vrshlq): Likewise.
	(vrmulhq): Likewise.
	(vrhaddq): Likewise.
	(vqsubq_n): Likewise.
	(vqsubq): Likewise.
	(vqshlq): Likewise.
	(vqshlq_r): Likewise.
	(vqshluq): Likewise.
	(vrshrq_n): Likewise.
	(vshlq_n): Likewise.
	(vqshluq_n): Likewise.
	(vqshlq_n): Likewise.
	(vqrshlq_n): Likewise.
	(vqrshlq): Likewise.
	(vqrdmulhq_n): Likewise.
	(vqrdmulhq): Likewise.
	(vqdmulhq_n): Likewise.
	(vqdmulhq): Likewise.
	(vqaddq_n): Likewise.
	(vqaddq): Likewise.
	(vorrq_n): Likewise.
	(vorrq): Likewise.
	(vornq): Likewise.
	(vmulq_n): Likewise.
	(vmulq): Likewise.
	(vmulltq_int): Likewise.
	(vmullbq_int): Likewise.
	(vmulhq): Likewise.
	(vminq): Likewise.
	(vminaq): Likewise.
	(vmaxq): Likewise.
	(vmaxaq): Likewise.
	(vhsubq_n): Likewise.
	(vhsubq): Likewise.
	(vhcaddq_rot90): Likewise.
	(vhcaddq_rot270): Likewise.
	(vhaddq_n): Likewise.
	(vhaddq): Likewise.
	(veorq): Likewise.
	(vcaddq_rot90): Likewise.
	(vcaddq_rot270): Likewise.
	(vbrsrq_n): Likewise.
	(vbicq_n): Likewise.
	(vbicq): Likewise.
	(vaddq): Likewise.
	(vaddq_n): Likewise.
	(vandq): Likewise.
	(vabdq): Likewise.
	* config/arm/arm_mve_builtins.def (BINOP_NONE_NONE_IMM): Use it.
	(BINOP_NONE_NONE_NONE): Likewise.
	(BINOP_NONE_NONE_UNONE): Likewise.
	(BINOP_UNONE_NONE_IMM): Likewise.
	(BINOP_UNONE_NONE_NONE): Likewise.
	(BINOP_UNONE_UNONE_IMM): Likewise.
	(BINOP_UNONE_UNONE_NONE): Likewise.
	(BINOP_UNONE_UNONE_UNONE): Likewise.
	* config/arm/constraints.md (Ra): Define constraint to check constant is
	in the range of 0 to 7.
	(Rg): Define constriant to check the constant is one among 1, 2, 4
	and 8.
	* config/arm/mve.md (mve_vabdq_<supf>): Define RTL pattern.
	(mve_vaddq_n_<supf>): Likewise.
	(mve_vaddvaq_<supf>): Likewise.
	(mve_vaddvq_p_<supf>): Likewise.
	(mve_vandq_<supf>): Likewise.
	(mve_vbicq_<supf>): Likewise.
	(mve_vbrsrq_n_<supf>): Likewise.
	(mve_vcaddq_rot270_<supf>): Likewise.
	(mve_vcaddq_rot90_<supf>): Likewise.
	(mve_vcmpcsq_n_u): Likewise.
	(mve_vcmpcsq_u): Likewise.
	(mve_vcmpeqq_n_<supf>): Likewise.
	(mve_vcmpeqq_<supf>): Likewise.
	(mve_vcmpgeq_n_s): Likewise.
	(mve_vcmpgeq_s): Likewise.
	(mve_vcmpgtq_n_s): Likewise.
	(mve_vcmpgtq_s): Likewise.
	(mve_vcmphiq_n_u): Likewise.
	(mve_vcmphiq_u): Likewise.
	(mve_vcmpleq_n_s): Likewise.
	(mve_vcmpleq_s): Likewise.
	(mve_vcmpltq_n_s): Likewise.
	(mve_vcmpltq_s): Likewise.
	(mve_vcmpneq_n_<supf>): Likewise.
	(mve_vddupq_n_u): Likewise.
	(mve_veorq_<supf>): Likewise.
	(mve_vhaddq_n_<supf>): Likewise.
	(mve_vhaddq_<supf>): Likewise.
	(mve_vhcaddq_rot270_s): Likewise.
	(mve_vhcaddq_rot90_s): Likewise.
	(mve_vhsubq_n_<supf>): Likewise.
	(mve_vhsubq_<supf>): Likewise.
	(mve_vidupq_n_u): Likewise.
	(mve_vmaxaq_s): Likewise.
	(mve_vmaxavq_s): Likewise.
	(mve_vmaxq_<supf>): Likewise.
	(mve_vmaxvq_<supf>): Likewise.
	(mve_vminaq_s): Likewise.
	(mve_vminavq_s): Likewise.
	(mve_vminq_<supf>): Likewise.
	(mve_vminvq_<supf>): Likewise.
	(mve_vmladavq_<supf>): Likewise.
	(mve_vmladavxq_s): Likewise.
	(mve_vmlsdavq_s): Likewise.
	(mve_vmlsdavxq_s): Likewise.
	(mve_vmulhq_<supf>): Likewise.
	(mve_vmullbq_int_<supf>): Likewise.
	(mve_vmulltq_int_<supf>): Likewise.
	(mve_vmulq_n_<supf>): Likewise.
	(mve_vmulq_<supf>): Likewise.
	(mve_vornq_<supf>): Likewise.
	(mve_vorrq_<supf>): Likewise.
	(mve_vqaddq_n_<supf>): Likewise.
	(mve_vqaddq_<supf>): Likewise.
	(mve_vqdmulhq_n_s): Likewise.
	(mve_vqdmulhq_s): Likewise.
	(mve_vqrdmulhq_n_s): Likewise.
	(mve_vqrdmulhq_s): Likewise.
	(mve_vqrshlq_n_<supf>): Likewise.
	(mve_vqrshlq_<supf>): Likewise.
	(mve_vqshlq_n_<supf>): Likewise.
	(mve_vqshlq_r_<supf>): Likewise.
	(mve_vqshlq_<supf>): Likewise.
	(mve_vqshluq_n_s): Likewise.
	(mve_vqsubq_n_<supf>): Likewise.
	(mve_vqsubq_<supf>): Likewise.
	(mve_vrhaddq_<supf>): Likewise.
	(mve_vrmulhq_<supf>): Likewise.
	(mve_vrshlq_n_<supf>): Likewise.
	(mve_vrshlq_<supf>): Likewise.
	(mve_vrshrq_n_<supf>): Likewise.
	(mve_vshlq_n_<supf>): Likewise.
	(mve_vshlq_r_<supf>): Likewise.
	(mve_vsubq_n_<supf>): Likewise.
	(mve_vsubq_<supf>): Likewise.
	* config/arm/predicates.md (mve_imm_7): Define predicate to check
	the matching constraint Ra.
	(mve_imm_selective_upto_8): Define predicate to check the matching
	constraint Rg.

gcc/testsuite/ChangeLog:

2019-10-23  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vabdq_s16.c: New test.
	* gcc.target/arm/mve/intrinsics/vabdq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvaq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvaq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvaq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvaq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvaq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvaq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvq_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvq_p_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvq_p_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvq_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvq_p_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vandq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vandq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vandq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vandq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vandq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vandq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbrsrq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbrsrq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbrsrq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbrsrq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbrsrq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbrsrq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot270_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot270_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot270_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot270_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot270_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot270_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot90_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot90_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot90_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot90_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot90_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot90_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpcsq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpcsq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpcsq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpcsq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpcsq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpcsq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmphiq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmphiq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmphiq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmphiq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmphiq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmphiq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/veorq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/veorq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/veorq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/veorq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/veorq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/veorq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhcaddq_rot270_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhcaddq_rot270_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhcaddq_rot270_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhcaddq_rot90_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhcaddq_rot90_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhcaddq_rot90_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxaq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxaq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxaq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxavq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxavq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxavq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxvq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxvq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxvq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxvq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxvq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxvq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminaq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminaq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminaq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminavq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminavq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminavq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminvq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminvq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminvq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminvq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminvq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminvq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavxq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavxq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavxq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsdavq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsdavq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsdavq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsdavxq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsdavxq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsdavxq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulhq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulhq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulhq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulhq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulhq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulhq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmullbq_int_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmullbq_int_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmullbq_int_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmullbq_int_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmullbq_int_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmullbq_int_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulltq_int_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulltq_int_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulltq_int_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulltq_int_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulltq_int_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulltq_int_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vornq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vornq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vornq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vornq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vornq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vornq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulhq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulhq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulhq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulhq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulhq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulhq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmulhq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmulhq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmulhq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmulhq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmulhq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmulhq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshlq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshlq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshlq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshlq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshlq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshlq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshlq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshlq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshlq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshlq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshlq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshlq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_r_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_r_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_r_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_r_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_r_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_r_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshluq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshluq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshluq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrhaddq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrhaddq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrhaddq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrhaddq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrhaddq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrhaddq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmulhq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmulhq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmulhq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmulhq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmulhq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmulhq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_r_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_r_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_r_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_r_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_r_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_r_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_u8.c: Likewise.

[-- Attachment #2: diff11.patch.gz --]
[-- Type: application/gzip, Size: 35052 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][3/5x]: MVE store intrinsics with predicated suffix.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (18 preceding siblings ...)
  2019-11-14 19:15 ` [PATCH][ARM][GCC][2/5x]: MVE load intrinsics Srinath Parvathaneni
@ 2019-11-14 19:15 ` Srinath Parvathaneni
  2019-11-14 19:16 ` [PATCH][ARM][GCC][13x]: MVE ACLE scalar shift intrinsics Srinath Parvathaneni
                   ` (18 subsequent siblings)
  38 siblings, 0 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 33114 bytes --]

Hello,

This patch supports the following MVE ACLE store intrinsics with predicated
suffix.

vstrbq_p_s8, vstrbq_p_s32, vstrbq_p_s16, vstrbq_p_u8, vstrbq_p_u32,
vstrbq_p_u16, vstrbq_scatter_offset_p_s8, vstrbq_scatter_offset_p_s32,
vstrbq_scatter_offset_p_s16, vstrbq_scatter_offset_p_u8,
vstrbq_scatter_offset_p_u32, vstrbq_scatter_offset_p_u16,
vstrwq_scatter_base_p_s32, vstrwq_scatter_base_p_u32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1]  https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-01  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm-builtins.c (STRS_P_QUALIFIERS): Define builtin
	qualifier.
	(STRU_P_QUALIFIERS): Likewise.
	(STRSU_P_QUALIFIERS): Likewise.
	(STRSS_P_QUALIFIERS): Likewise.
	(STRSBS_P_QUALIFIERS): Likewise.
	(STRSBU_P_QUALIFIERS): Likewise.
	* config/arm/arm_mve.h (vstrbq_p_s8): Define macro.
	(vstrbq_p_s32): Likewise.
	(vstrbq_p_s16): Likewise.
	(vstrbq_p_u8): Likewise.
	(vstrbq_p_u32): Likewise.
	(vstrbq_p_u16): Likewise.
	(vstrbq_scatter_offset_p_s8): Likewise.
	(vstrbq_scatter_offset_p_s32): Likewise.
	(vstrbq_scatter_offset_p_s16): Likewise.
	(vstrbq_scatter_offset_p_u8): Likewise.
	(vstrbq_scatter_offset_p_u32): Likewise.
	(vstrbq_scatter_offset_p_u16): Likewise.
	(vstrwq_scatter_base_p_s32): Likewise.
	(vstrwq_scatter_base_p_u32): Likewise.
	(__arm_vstrbq_p_s8): Define intrinsic.
	(__arm_vstrbq_p_s32): Likewise.
	(__arm_vstrbq_p_s16): Likewise.
	(__arm_vstrbq_p_u8): Likewise.
	(__arm_vstrbq_p_u32): Likewise.
	(__arm_vstrbq_p_u16): Likewise.
	(__arm_vstrbq_scatter_offset_p_s8): Likewise.
	(__arm_vstrbq_scatter_offset_p_s32): Likewise.
	(__arm_vstrbq_scatter_offset_p_s16): Likewise.
	(__arm_vstrbq_scatter_offset_p_u8): Likewise.
	(__arm_vstrbq_scatter_offset_p_u32): Likewise.
	(__arm_vstrbq_scatter_offset_p_u16): Likewise.
	(__arm_vstrwq_scatter_base_p_s32): Likewise.
	(__arm_vstrwq_scatter_base_p_u32): Likewise.
	(vstrbq_p): Define polymorphic variant.
	(vstrbq_scatter_offset_p): Likewise.
	(vstrwq_scatter_base_p): Likewise.
	* config/arm/arm_mve_builtins.def (STRS_P_QUALIFIERS): Use builtin
	qualifier.
        (STRU_P_QUALIFIERS): Likewise.
        (STRSU_P_QUALIFIERS): Likewise.
        (STRSS_P_QUALIFIERS): Likewise.
        (STRSBS_P_QUALIFIERS): Likewise.
        (STRSBU_P_QUALIFIERS): Likewise.
	* config/arm/mve.md (mve_vstrbq_scatter_offset_p_<supf><mode>): Define
	RTL pattern.
	(mve_vstrwq_scatter_base_p_<supf>v4si): Likewise.
	(mve_vstrbq_p_<supf><mode>): Likewise.

gcc/testsuite/ChangeLog:

2019-11-01  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vstrbq_p_s16.c: New test.
	* gcc.target/arm/mve/intrinsics/vstrbq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrbq_p_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrbq_p_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrbq_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrbq_p_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_u32.c: Likewise.


###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 02ea297937b18099f33a50c808964d1dd7eac1b3..b5639051bf07785d906ed596e08d670f4de1a67e 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -589,6 +589,41 @@ arm_strsbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define STRSBU_QUALIFIERS (arm_strsbu_qualifiers)
 
 static enum arm_type_qualifiers
+arm_strs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_pointer, qualifier_none, qualifier_unsigned};
+#define STRS_P_QUALIFIERS (arm_strs_p_qualifiers)
+
+static enum arm_type_qualifiers
+arm_stru_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_pointer, qualifier_unsigned,
+      qualifier_unsigned};
+#define STRU_P_QUALIFIERS (arm_stru_p_qualifiers)
+
+static enum arm_type_qualifiers
+arm_strsu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_pointer, qualifier_unsigned,
+      qualifier_unsigned, qualifier_unsigned};
+#define STRSU_P_QUALIFIERS (arm_strsu_p_qualifiers)
+
+static enum arm_type_qualifiers
+arm_strss_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_pointer, qualifier_unsigned,
+      qualifier_none, qualifier_unsigned};
+#define STRSS_P_QUALIFIERS (arm_strss_p_qualifiers)
+
+static enum arm_type_qualifiers
+arm_strsbs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_unsigned, qualifier_immediate,
+      qualifier_none, qualifier_unsigned};
+#define STRSBS_P_QUALIFIERS (arm_strsbs_p_qualifiers)
+
+static enum arm_type_qualifiers
+arm_strsbu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_unsigned, qualifier_immediate,
+      qualifier_unsigned, qualifier_unsigned};
+#define STRSBU_P_QUALIFIERS (arm_strsbu_p_qualifiers)
+
+static enum arm_type_qualifiers
 arm_ldrgu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_pointer, qualifier_unsigned};
 #define LDRGU_QUALIFIERS (arm_ldrgu_qualifiers)
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index c123de1f3d37054b779a9ee908525c242b2bc4ab..f852acb575b596c534c474d19faa73c03cf85c5e 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -1730,6 +1730,20 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vldrbq_u32(__base) __arm_vldrbq_u32(__base)
 #define vldrwq_gather_base_s32(__addr,  __offset) __arm_vldrwq_gather_base_s32(__addr,  __offset)
 #define vldrwq_gather_base_u32(__addr,  __offset) __arm_vldrwq_gather_base_u32(__addr,  __offset)
+#define vstrbq_p_s8( __addr, __value, __p) __arm_vstrbq_p_s8( __addr, __value, __p)
+#define vstrbq_p_s32( __addr, __value, __p) __arm_vstrbq_p_s32( __addr, __value, __p)
+#define vstrbq_p_s16( __addr, __value, __p) __arm_vstrbq_p_s16( __addr, __value, __p)
+#define vstrbq_p_u8( __addr, __value, __p) __arm_vstrbq_p_u8( __addr, __value, __p)
+#define vstrbq_p_u32( __addr, __value, __p) __arm_vstrbq_p_u32( __addr, __value, __p)
+#define vstrbq_p_u16( __addr, __value, __p) __arm_vstrbq_p_u16( __addr, __value, __p)
+#define vstrbq_scatter_offset_p_s8( __base, __offset, __value, __p) __arm_vstrbq_scatter_offset_p_s8( __base, __offset, __value, __p)
+#define vstrbq_scatter_offset_p_s32( __base, __offset, __value, __p) __arm_vstrbq_scatter_offset_p_s32( __base, __offset, __value, __p)
+#define vstrbq_scatter_offset_p_s16( __base, __offset, __value, __p) __arm_vstrbq_scatter_offset_p_s16( __base, __offset, __value, __p)
+#define vstrbq_scatter_offset_p_u8( __base, __offset, __value, __p) __arm_vstrbq_scatter_offset_p_u8( __base, __offset, __value, __p)
+#define vstrbq_scatter_offset_p_u32( __base, __offset, __value, __p) __arm_vstrbq_scatter_offset_p_u32( __base, __offset, __value, __p)
+#define vstrbq_scatter_offset_p_u16( __base, __offset, __value, __p) __arm_vstrbq_scatter_offset_p_u16( __base, __offset, __value, __p)
+#define vstrwq_scatter_base_p_s32(__addr,  __offset, __value, __p) __arm_vstrwq_scatter_base_p_s32(__addr,  __offset, __value, __p)
+#define vstrwq_scatter_base_p_u32(__addr,  __offset, __value, __p) __arm_vstrwq_scatter_base_p_u32(__addr,  __offset, __value, __p)
 #endif
 
 __extension__ extern __inline void
@@ -11219,6 +11233,103 @@ __arm_vldrwq_gather_base_u32 (uint32x4_t __addr, const int __offset)
   return __builtin_mve_vldrwq_gather_base_uv4si (__addr, __offset);
 }
 
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_p_s8 (int8_t * __addr, int8x16_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrbq_p_sv16qi ((__builtin_neon_qi *) __addr, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_p_s32 (int8_t * __addr, int32x4_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrbq_p_sv4si ((__builtin_neon_qi *) __addr, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_p_s16 (int8_t * __addr, int16x8_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrbq_p_sv8hi ((__builtin_neon_qi *) __addr, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_p_u8 (uint8_t * __addr, uint8x16_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrbq_p_uv16qi ((__builtin_neon_qi *) __addr, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_p_u32 (uint8_t * __addr, uint32x4_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrbq_p_uv4si ((__builtin_neon_qi *) __addr, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_p_u16 (uint8_t * __addr, uint16x8_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrbq_p_uv8hi ((__builtin_neon_qi *) __addr, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_scatter_offset_p_s8 (int8_t * __base, uint8x16_t __offset, int8x16_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrbq_scatter_offset_p_sv16qi ((__builtin_neon_qi *) __base, __offset, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_scatter_offset_p_s32 (int8_t * __base, uint32x4_t __offset, int32x4_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrbq_scatter_offset_p_sv4si ((__builtin_neon_qi *) __base, __offset, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_scatter_offset_p_s16 (int8_t * __base, uint16x8_t __offset, int16x8_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrbq_scatter_offset_p_sv8hi ((__builtin_neon_qi *) __base, __offset, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_scatter_offset_p_u8 (uint8_t * __base, uint8x16_t __offset, uint8x16_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrbq_scatter_offset_p_uv16qi ((__builtin_neon_qi *) __base, __offset, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_scatter_offset_p_u32 (uint8_t * __base, uint32x4_t __offset, uint32x4_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrbq_scatter_offset_p_uv4si ((__builtin_neon_qi *) __base, __offset, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_scatter_offset_p_u16 (uint8_t * __base, uint16x8_t __offset, uint16x8_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrbq_scatter_offset_p_uv8hi ((__builtin_neon_qi *) __base, __offset, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrwq_scatter_base_p_s32 (uint32x4_t __addr, const int __offset, int32x4_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrwq_scatter_base_p_sv4si (__addr, __offset, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrwq_scatter_base_p_u32 (uint32x4_t __addr, const int __offset, uint32x4_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrwq_scatter_base_p_uv4si (__addr, __offset, __value, __p);
+}
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
@@ -17806,6 +17917,35 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint8_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrbq_gather_offset_u16 (__ARM_mve_coerce(__p0, uint8_t const *), __ARM_mve_coerce(__p1, uint16x8_t)), \
   int (*)[__ARM_mve_type_uint8_t_const_ptr][__ARM_mve_type_uint32x4_t]: __arm_vldrbq_gather_offset_u32 (__ARM_mve_coerce(__p0, uint8_t const *), __ARM_mve_coerce(__p1, uint32x4_t)));})
 
+#define vstrbq_p(p0,p1,p2) __arm_vstrbq_p(p0,p1,p2)
+#define __arm_vstrbq_p(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16_t]: __arm_vstrbq_p_s8 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, int8x16_t), p2), \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int16x8_t]: __arm_vstrbq_p_s16 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, int16x8_t), p2), \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vstrbq_p_s32 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16_t]: __arm_vstrbq_p_u8 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vstrbq_p_u16 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vstrbq_p_u32 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
+
+#define vstrbq_scatter_offset_p(p0,p1,p2,p3) __arm_vstrbq_scatter_offset_p(p0,p1,p2,p3)
+#define __arm_vstrbq_scatter_offset_p(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int8x16_t]: __arm_vstrbq_scatter_offset_p_s8 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vstrbq_scatter_offset_p_s16 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vstrbq_scatter_offset_p_s32 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vstrbq_scatter_offset_p_u8 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vstrbq_scatter_offset_p_u16 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vstrbq_scatter_offset_p_u32 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
+
+#define vstrwq_scatter_base_p(p0,p1,p2,p3) __arm_vstrwq_scatter_base_p(p0,p1,p2,p3)
+#define __arm_vstrwq_scatter_base_p(p0,p1,p2,p3) ({ __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vstrwq_scatter_base_p_s32 (p0, p1, __ARM_mve_coerce(__p2, int32x4_t), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vstrwq_scatter_base_p_u32 (p0, p1, __ARM_mve_coerce(__p2, uint32x4_t), p3));})
+
 #endif /* MVE Floating point.  */
 
 #ifdef __cplusplus
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index 60e50767a2ac80a05303e3df8512f2612c8bc8ef..dc22c2f599702be5a4c4481033d3f413abf6d1c5 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -697,3 +697,9 @@ VAR3 (LDRS, vldrbq_s, v16qi, v8hi, v4si)
 VAR3 (LDRU, vldrbq_u, v16qi, v8hi, v4si)
 VAR1 (LDRGBS, vldrwq_gather_base_s, v4si)
 VAR1 (LDRGBU, vldrwq_gather_base_u, v4si)
+VAR3 (STRS_P, vstrbq_p_s, v16qi, v8hi, v4si)
+VAR3 (STRU_P, vstrbq_p_u, v16qi, v8hi, v4si)
+VAR3 (STRSS_P, vstrbq_scatter_offset_p_s, v16qi, v8hi, v4si)
+VAR3 (STRSU_P, vstrbq_scatter_offset_p_u, v16qi, v8hi, v4si)
+VAR1 (STRSBS_P, vstrwq_scatter_base_p_s, v4si)
+VAR1 (STRSBU_P, vstrwq_scatter_base_p_u, v4si)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 7b416f9a4e04f8c02547953d6019787817688a36..266a7f830285521fd225c8b07bc15e412b7bab61 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -8070,3 +8070,68 @@
    return "";
 }
   [(set_attr "length" "4")])
+
+;;
+;; [vstrbq_scatter_offset_p_s vstrbq_scatter_offset_p_u]
+;;
+(define_insn "mve_vstrbq_scatter_offset_p_<supf><mode>"
+  [(set (match_operand:<MVE_B_ELEM> 0 "memory_operand" "=Us")
+	(unspec:<MVE_B_ELEM>
+		[(match_operand:MVE_2 1 "s_register_operand" "w")
+		 (match_operand:MVE_2 2 "s_register_operand" "w")
+		 (match_operand:HI 3 "vpr_register_operand" "Up")]
+	 VSTRBSOQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   output_asm_insn ("vpst\n\tvstrbt.<V_sz_elem>\t%q2, [%m0, %q1]",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+;;
+;; [vstrwq_scatter_base_p_s vstrwq_scatter_base_p_u]
+;;
+(define_insn "mve_vstrwq_scatter_base_p_<supf>v4si"
+  [(set (mem:BLK (scratch))
+	(unspec:BLK
+		[(match_operand:V4SI 0 "s_register_operand" "w")
+		 (match_operand:SI 1 "immediate_operand" "i")
+		 (match_operand:V4SI 2 "s_register_operand" "w")
+		 (match_operand:HI 3 "vpr_register_operand" "Up")]
+	 VSTRWSBQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   output_asm_insn ("vpst\n\tvstrwt.u32\t%q2, [%q0, %1]",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+;;
+;; [vstrbq_p_s vstrbq_p_u]
+;;
+(define_insn "mve_vstrbq_p_<supf><mode>"
+  [(set (match_operand:<MVE_B_ELEM> 0 "memory_operand" "=Us")
+	(unspec:<MVE_B_ELEM> [(match_operand:MVE_2 1 "s_register_operand" "w")
+			      (match_operand:HI 2 "vpr_register_operand" "Up")]
+	 VSTRBQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[2];
+   int regno = REGNO (operands[1]);
+   ops[1] = gen_rtx_REG (TImode, regno);
+   ops[0]  = operands[0];
+   output_asm_insn ("vpst\n\tvstrbt.<V_sz_elem>\t%q1, %E0",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..917542b0ac049791c9258ca5cbaff824b9b0de93
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int8_t * addr, int16x8_t value, mve_pred16_t p)
+{
+  vstrbq_p_s16 (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.16"  }  } */
+
+void
+foo1 (int8_t * addr, int16x8_t value, mve_pred16_t p)
+{
+  vstrbq_p (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..6948b80d6fdb9ff9ee09cb49c1bf48723a351fa4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int8_t * addr, int32x4_t value, mve_pred16_t p)
+{
+  vstrbq_p_s32 (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.32"  }  } */
+
+void
+foo1 (int8_t * addr, int32x4_t value, mve_pred16_t p)
+{
+  vstrbq_p (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..5bf59d8a5afdaf92d940cad2fa84dc672e536654
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int8_t * addr, int8x16_t value, mve_pred16_t p)
+{
+  vstrbq_p_s8 (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.8"  }  } */
+
+void
+foo1 (int8_t * addr, int8x16_t value, mve_pred16_t p)
+{
+  vstrbq_p (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..3c8b011ddd3ac086841f48d6b8ef88632aeaa809
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint8_t * addr, uint16x8_t value, mve_pred16_t p)
+{
+  vstrbq_p_u16 (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.16"  }  } */
+
+void
+foo1 (uint8_t * addr, uint16x8_t value, mve_pred16_t p)
+{
+  vstrbq_p (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..a195b75250377e7d341aa06ac02af812f2e558b7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint8_t * addr, uint32x4_t value, mve_pred16_t p)
+{
+  vstrbq_p_u32 (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.32"  }  } */
+
+void
+foo1 (uint8_t * addr, uint32x4_t value, mve_pred16_t p)
+{
+  vstrbq_p (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..aecbbee0d0a931aa1ded220566b6c81f0469d2f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint8_t * addr, uint8x16_t value, mve_pred16_t p)
+{
+  vstrbq_p_u8 (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.8"  }  } */
+
+void
+foo1 (uint8_t * addr, uint8x16_t value, mve_pred16_t p)
+{
+  vstrbq_p (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..d791dd0aff0ffc444138187d3a731808cd16d218
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int8_t * base, uint16x8_t offset, int16x8_t value, mve_pred16_t p)
+{
+  vstrbq_scatter_offset_p_s16 (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.16"  }  } */
+
+void
+foo1 (int8_t * base, uint16x8_t offset, int16x8_t value, mve_pred16_t p)
+{
+  vstrbq_scatter_offset_p (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..9cdac1e99d10d0f723b79f410ae16d02c289f932
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int8_t * base, uint32x4_t offset, int32x4_t value, mve_pred16_t p)
+{
+  vstrbq_scatter_offset_p_s32 (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.32"  }  } */
+
+void
+foo1 (int8_t * base, uint32x4_t offset, int32x4_t value, mve_pred16_t p)
+{
+  vstrbq_scatter_offset_p (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..b96bd7bbccdea09e723374ae1913711931bfe568
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int8_t * base, uint8x16_t offset, int8x16_t value, mve_pred16_t p)
+{
+  vstrbq_scatter_offset_p_s8 (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.8"  }  } */
+
+void
+foo1 (int8_t * base, uint8x16_t offset, int8x16_t value, mve_pred16_t p)
+{
+  vstrbq_scatter_offset_p (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..51380f7c58e1798821db8cf5d2e5b9661267104c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint8_t * base, uint16x8_t offset, uint16x8_t value, mve_pred16_t p)
+{
+  vstrbq_scatter_offset_p_u16 (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.16"  }  } */
+
+void
+foo1 (uint8_t * base, uint16x8_t offset, uint16x8_t value, mve_pred16_t p)
+{
+  vstrbq_scatter_offset_p (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..b8f6aafb7d2f4968e6c336446239bd56f337e28b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint8_t * base, uint32x4_t offset, uint32x4_t value, mve_pred16_t p)
+{
+  vstrbq_scatter_offset_p_u32 (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.32"  }  } */
+
+void
+foo1 (uint8_t * base, uint32x4_t offset, uint32x4_t value, mve_pred16_t p)
+{
+  vstrbq_scatter_offset_p (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..492c97bac5062040d6fb0ed3f1255de2d2f6fdb4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint8_t * base, uint8x16_t offset, uint8x16_t value, mve_pred16_t p)
+{
+  vstrbq_scatter_offset_p_u8 (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.8"  }  } */
+
+void
+foo1 (uint8_t * base, uint8x16_t offset, uint8x16_t value, mve_pred16_t p)
+{
+  vstrbq_scatter_offset_p (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..c20551bdde89287eadf04c68001e8a1699c2afb2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint32x4_t addr, int32x4_t value, mve_pred16_t p)
+{
+  vstrwq_scatter_base_p_s32 (addr, 8, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrwt.u32"  }  } */
+
+void
+foo1 (uint32x4_t addr, int32x4_t value, mve_pred16_t p)
+{
+  vstrwq_scatter_base_p (addr, 8, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrwt.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..abd5e13e21ece98654cc136b8d76b2238df6a523
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint32x4_t addr, uint32x4_t value, mve_pred16_t p)
+{
+  vstrwq_scatter_base_p_u32 (addr, 8, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrwt.u32"  }  } */
+
+void
+foo1 (uint32x4_t addr, uint32x4_t value, mve_pred16_t p)
+{
+  vstrwq_scatter_base_p (addr, 8, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrwt.u32"  }  } */


[-- Attachment #2: diff22.patch --]
[-- Type: text/plain, Size: 28351 bytes --]

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 02ea297937b18099f33a50c808964d1dd7eac1b3..b5639051bf07785d906ed596e08d670f4de1a67e 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -589,6 +589,41 @@ arm_strsbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define STRSBU_QUALIFIERS (arm_strsbu_qualifiers)
 
 static enum arm_type_qualifiers
+arm_strs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_pointer, qualifier_none, qualifier_unsigned};
+#define STRS_P_QUALIFIERS (arm_strs_p_qualifiers)
+
+static enum arm_type_qualifiers
+arm_stru_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_pointer, qualifier_unsigned,
+      qualifier_unsigned};
+#define STRU_P_QUALIFIERS (arm_stru_p_qualifiers)
+
+static enum arm_type_qualifiers
+arm_strsu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_pointer, qualifier_unsigned,
+      qualifier_unsigned, qualifier_unsigned};
+#define STRSU_P_QUALIFIERS (arm_strsu_p_qualifiers)
+
+static enum arm_type_qualifiers
+arm_strss_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_pointer, qualifier_unsigned,
+      qualifier_none, qualifier_unsigned};
+#define STRSS_P_QUALIFIERS (arm_strss_p_qualifiers)
+
+static enum arm_type_qualifiers
+arm_strsbs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_unsigned, qualifier_immediate,
+      qualifier_none, qualifier_unsigned};
+#define STRSBS_P_QUALIFIERS (arm_strsbs_p_qualifiers)
+
+static enum arm_type_qualifiers
+arm_strsbu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_unsigned, qualifier_immediate,
+      qualifier_unsigned, qualifier_unsigned};
+#define STRSBU_P_QUALIFIERS (arm_strsbu_p_qualifiers)
+
+static enum arm_type_qualifiers
 arm_ldrgu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_pointer, qualifier_unsigned};
 #define LDRGU_QUALIFIERS (arm_ldrgu_qualifiers)
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index c123de1f3d37054b779a9ee908525c242b2bc4ab..f852acb575b596c534c474d19faa73c03cf85c5e 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -1730,6 +1730,20 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vldrbq_u32(__base) __arm_vldrbq_u32(__base)
 #define vldrwq_gather_base_s32(__addr,  __offset) __arm_vldrwq_gather_base_s32(__addr,  __offset)
 #define vldrwq_gather_base_u32(__addr,  __offset) __arm_vldrwq_gather_base_u32(__addr,  __offset)
+#define vstrbq_p_s8( __addr, __value, __p) __arm_vstrbq_p_s8( __addr, __value, __p)
+#define vstrbq_p_s32( __addr, __value, __p) __arm_vstrbq_p_s32( __addr, __value, __p)
+#define vstrbq_p_s16( __addr, __value, __p) __arm_vstrbq_p_s16( __addr, __value, __p)
+#define vstrbq_p_u8( __addr, __value, __p) __arm_vstrbq_p_u8( __addr, __value, __p)
+#define vstrbq_p_u32( __addr, __value, __p) __arm_vstrbq_p_u32( __addr, __value, __p)
+#define vstrbq_p_u16( __addr, __value, __p) __arm_vstrbq_p_u16( __addr, __value, __p)
+#define vstrbq_scatter_offset_p_s8( __base, __offset, __value, __p) __arm_vstrbq_scatter_offset_p_s8( __base, __offset, __value, __p)
+#define vstrbq_scatter_offset_p_s32( __base, __offset, __value, __p) __arm_vstrbq_scatter_offset_p_s32( __base, __offset, __value, __p)
+#define vstrbq_scatter_offset_p_s16( __base, __offset, __value, __p) __arm_vstrbq_scatter_offset_p_s16( __base, __offset, __value, __p)
+#define vstrbq_scatter_offset_p_u8( __base, __offset, __value, __p) __arm_vstrbq_scatter_offset_p_u8( __base, __offset, __value, __p)
+#define vstrbq_scatter_offset_p_u32( __base, __offset, __value, __p) __arm_vstrbq_scatter_offset_p_u32( __base, __offset, __value, __p)
+#define vstrbq_scatter_offset_p_u16( __base, __offset, __value, __p) __arm_vstrbq_scatter_offset_p_u16( __base, __offset, __value, __p)
+#define vstrwq_scatter_base_p_s32(__addr,  __offset, __value, __p) __arm_vstrwq_scatter_base_p_s32(__addr,  __offset, __value, __p)
+#define vstrwq_scatter_base_p_u32(__addr,  __offset, __value, __p) __arm_vstrwq_scatter_base_p_u32(__addr,  __offset, __value, __p)
 #endif
 
 __extension__ extern __inline void
@@ -11219,6 +11233,103 @@ __arm_vldrwq_gather_base_u32 (uint32x4_t __addr, const int __offset)
   return __builtin_mve_vldrwq_gather_base_uv4si (__addr, __offset);
 }
 
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_p_s8 (int8_t * __addr, int8x16_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrbq_p_sv16qi ((__builtin_neon_qi *) __addr, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_p_s32 (int8_t * __addr, int32x4_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrbq_p_sv4si ((__builtin_neon_qi *) __addr, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_p_s16 (int8_t * __addr, int16x8_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrbq_p_sv8hi ((__builtin_neon_qi *) __addr, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_p_u8 (uint8_t * __addr, uint8x16_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrbq_p_uv16qi ((__builtin_neon_qi *) __addr, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_p_u32 (uint8_t * __addr, uint32x4_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrbq_p_uv4si ((__builtin_neon_qi *) __addr, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_p_u16 (uint8_t * __addr, uint16x8_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrbq_p_uv8hi ((__builtin_neon_qi *) __addr, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_scatter_offset_p_s8 (int8_t * __base, uint8x16_t __offset, int8x16_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrbq_scatter_offset_p_sv16qi ((__builtin_neon_qi *) __base, __offset, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_scatter_offset_p_s32 (int8_t * __base, uint32x4_t __offset, int32x4_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrbq_scatter_offset_p_sv4si ((__builtin_neon_qi *) __base, __offset, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_scatter_offset_p_s16 (int8_t * __base, uint16x8_t __offset, int16x8_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrbq_scatter_offset_p_sv8hi ((__builtin_neon_qi *) __base, __offset, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_scatter_offset_p_u8 (uint8_t * __base, uint8x16_t __offset, uint8x16_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrbq_scatter_offset_p_uv16qi ((__builtin_neon_qi *) __base, __offset, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_scatter_offset_p_u32 (uint8_t * __base, uint32x4_t __offset, uint32x4_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrbq_scatter_offset_p_uv4si ((__builtin_neon_qi *) __base, __offset, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrbq_scatter_offset_p_u16 (uint8_t * __base, uint16x8_t __offset, uint16x8_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrbq_scatter_offset_p_uv8hi ((__builtin_neon_qi *) __base, __offset, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrwq_scatter_base_p_s32 (uint32x4_t __addr, const int __offset, int32x4_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrwq_scatter_base_p_sv4si (__addr, __offset, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrwq_scatter_base_p_u32 (uint32x4_t __addr, const int __offset, uint32x4_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrwq_scatter_base_p_uv4si (__addr, __offset, __value, __p);
+}
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
@@ -17806,6 +17917,35 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint8_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrbq_gather_offset_u16 (__ARM_mve_coerce(__p0, uint8_t const *), __ARM_mve_coerce(__p1, uint16x8_t)), \
   int (*)[__ARM_mve_type_uint8_t_const_ptr][__ARM_mve_type_uint32x4_t]: __arm_vldrbq_gather_offset_u32 (__ARM_mve_coerce(__p0, uint8_t const *), __ARM_mve_coerce(__p1, uint32x4_t)));})
 
+#define vstrbq_p(p0,p1,p2) __arm_vstrbq_p(p0,p1,p2)
+#define __arm_vstrbq_p(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16_t]: __arm_vstrbq_p_s8 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, int8x16_t), p2), \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int16x8_t]: __arm_vstrbq_p_s16 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, int16x8_t), p2), \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vstrbq_p_s32 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16_t]: __arm_vstrbq_p_u8 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vstrbq_p_u16 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vstrbq_p_u32 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
+
+#define vstrbq_scatter_offset_p(p0,p1,p2,p3) __arm_vstrbq_scatter_offset_p(p0,p1,p2,p3)
+#define __arm_vstrbq_scatter_offset_p(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int8x16_t]: __arm_vstrbq_scatter_offset_p_s8 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vstrbq_scatter_offset_p_s16 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vstrbq_scatter_offset_p_s32 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vstrbq_scatter_offset_p_u8 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vstrbq_scatter_offset_p_u16 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vstrbq_scatter_offset_p_u32 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
+
+#define vstrwq_scatter_base_p(p0,p1,p2,p3) __arm_vstrwq_scatter_base_p(p0,p1,p2,p3)
+#define __arm_vstrwq_scatter_base_p(p0,p1,p2,p3) ({ __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vstrwq_scatter_base_p_s32 (p0, p1, __ARM_mve_coerce(__p2, int32x4_t), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vstrwq_scatter_base_p_u32 (p0, p1, __ARM_mve_coerce(__p2, uint32x4_t), p3));})
+
 #endif /* MVE Floating point.  */
 
 #ifdef __cplusplus
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index 60e50767a2ac80a05303e3df8512f2612c8bc8ef..dc22c2f599702be5a4c4481033d3f413abf6d1c5 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -697,3 +697,9 @@ VAR3 (LDRS, vldrbq_s, v16qi, v8hi, v4si)
 VAR3 (LDRU, vldrbq_u, v16qi, v8hi, v4si)
 VAR1 (LDRGBS, vldrwq_gather_base_s, v4si)
 VAR1 (LDRGBU, vldrwq_gather_base_u, v4si)
+VAR3 (STRS_P, vstrbq_p_s, v16qi, v8hi, v4si)
+VAR3 (STRU_P, vstrbq_p_u, v16qi, v8hi, v4si)
+VAR3 (STRSS_P, vstrbq_scatter_offset_p_s, v16qi, v8hi, v4si)
+VAR3 (STRSU_P, vstrbq_scatter_offset_p_u, v16qi, v8hi, v4si)
+VAR1 (STRSBS_P, vstrwq_scatter_base_p_s, v4si)
+VAR1 (STRSBU_P, vstrwq_scatter_base_p_u, v4si)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 7b416f9a4e04f8c02547953d6019787817688a36..266a7f830285521fd225c8b07bc15e412b7bab61 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -8070,3 +8070,68 @@
    return "";
 }
   [(set_attr "length" "4")])
+
+;;
+;; [vstrbq_scatter_offset_p_s vstrbq_scatter_offset_p_u]
+;;
+(define_insn "mve_vstrbq_scatter_offset_p_<supf><mode>"
+  [(set (match_operand:<MVE_B_ELEM> 0 "memory_operand" "=Us")
+	(unspec:<MVE_B_ELEM>
+		[(match_operand:MVE_2 1 "s_register_operand" "w")
+		 (match_operand:MVE_2 2 "s_register_operand" "w")
+		 (match_operand:HI 3 "vpr_register_operand" "Up")]
+	 VSTRBSOQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   output_asm_insn ("vpst\n\tvstrbt.<V_sz_elem>\t%q2, [%m0, %q1]",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+;;
+;; [vstrwq_scatter_base_p_s vstrwq_scatter_base_p_u]
+;;
+(define_insn "mve_vstrwq_scatter_base_p_<supf>v4si"
+  [(set (mem:BLK (scratch))
+	(unspec:BLK
+		[(match_operand:V4SI 0 "s_register_operand" "w")
+		 (match_operand:SI 1 "immediate_operand" "i")
+		 (match_operand:V4SI 2 "s_register_operand" "w")
+		 (match_operand:HI 3 "vpr_register_operand" "Up")]
+	 VSTRWSBQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   output_asm_insn ("vpst\n\tvstrwt.u32\t%q2, [%q0, %1]",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+;;
+;; [vstrbq_p_s vstrbq_p_u]
+;;
+(define_insn "mve_vstrbq_p_<supf><mode>"
+  [(set (match_operand:<MVE_B_ELEM> 0 "memory_operand" "=Us")
+	(unspec:<MVE_B_ELEM> [(match_operand:MVE_2 1 "s_register_operand" "w")
+			      (match_operand:HI 2 "vpr_register_operand" "Up")]
+	 VSTRBQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[2];
+   int regno = REGNO (operands[1]);
+   ops[1] = gen_rtx_REG (TImode, regno);
+   ops[0]  = operands[0];
+   output_asm_insn ("vpst\n\tvstrbt.<V_sz_elem>\t%q1, %E0",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..917542b0ac049791c9258ca5cbaff824b9b0de93
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int8_t * addr, int16x8_t value, mve_pred16_t p)
+{
+  vstrbq_p_s16 (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.16"  }  } */
+
+void
+foo1 (int8_t * addr, int16x8_t value, mve_pred16_t p)
+{
+  vstrbq_p (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..6948b80d6fdb9ff9ee09cb49c1bf48723a351fa4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int8_t * addr, int32x4_t value, mve_pred16_t p)
+{
+  vstrbq_p_s32 (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.32"  }  } */
+
+void
+foo1 (int8_t * addr, int32x4_t value, mve_pred16_t p)
+{
+  vstrbq_p (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..5bf59d8a5afdaf92d940cad2fa84dc672e536654
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int8_t * addr, int8x16_t value, mve_pred16_t p)
+{
+  vstrbq_p_s8 (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.8"  }  } */
+
+void
+foo1 (int8_t * addr, int8x16_t value, mve_pred16_t p)
+{
+  vstrbq_p (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..3c8b011ddd3ac086841f48d6b8ef88632aeaa809
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint8_t * addr, uint16x8_t value, mve_pred16_t p)
+{
+  vstrbq_p_u16 (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.16"  }  } */
+
+void
+foo1 (uint8_t * addr, uint16x8_t value, mve_pred16_t p)
+{
+  vstrbq_p (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..a195b75250377e7d341aa06ac02af812f2e558b7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint8_t * addr, uint32x4_t value, mve_pred16_t p)
+{
+  vstrbq_p_u32 (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.32"  }  } */
+
+void
+foo1 (uint8_t * addr, uint32x4_t value, mve_pred16_t p)
+{
+  vstrbq_p (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..aecbbee0d0a931aa1ded220566b6c81f0469d2f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_p_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint8_t * addr, uint8x16_t value, mve_pred16_t p)
+{
+  vstrbq_p_u8 (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.8"  }  } */
+
+void
+foo1 (uint8_t * addr, uint8x16_t value, mve_pred16_t p)
+{
+  vstrbq_p (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..d791dd0aff0ffc444138187d3a731808cd16d218
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int8_t * base, uint16x8_t offset, int16x8_t value, mve_pred16_t p)
+{
+  vstrbq_scatter_offset_p_s16 (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.16"  }  } */
+
+void
+foo1 (int8_t * base, uint16x8_t offset, int16x8_t value, mve_pred16_t p)
+{
+  vstrbq_scatter_offset_p (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..9cdac1e99d10d0f723b79f410ae16d02c289f932
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int8_t * base, uint32x4_t offset, int32x4_t value, mve_pred16_t p)
+{
+  vstrbq_scatter_offset_p_s32 (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.32"  }  } */
+
+void
+foo1 (int8_t * base, uint32x4_t offset, int32x4_t value, mve_pred16_t p)
+{
+  vstrbq_scatter_offset_p (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..b96bd7bbccdea09e723374ae1913711931bfe568
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int8_t * base, uint8x16_t offset, int8x16_t value, mve_pred16_t p)
+{
+  vstrbq_scatter_offset_p_s8 (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.8"  }  } */
+
+void
+foo1 (int8_t * base, uint8x16_t offset, int8x16_t value, mve_pred16_t p)
+{
+  vstrbq_scatter_offset_p (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..51380f7c58e1798821db8cf5d2e5b9661267104c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint8_t * base, uint16x8_t offset, uint16x8_t value, mve_pred16_t p)
+{
+  vstrbq_scatter_offset_p_u16 (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.16"  }  } */
+
+void
+foo1 (uint8_t * base, uint16x8_t offset, uint16x8_t value, mve_pred16_t p)
+{
+  vstrbq_scatter_offset_p (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..b8f6aafb7d2f4968e6c336446239bd56f337e28b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint8_t * base, uint32x4_t offset, uint32x4_t value, mve_pred16_t p)
+{
+  vstrbq_scatter_offset_p_u32 (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.32"  }  } */
+
+void
+foo1 (uint8_t * base, uint32x4_t offset, uint32x4_t value, mve_pred16_t p)
+{
+  vstrbq_scatter_offset_p (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..492c97bac5062040d6fb0ed3f1255de2d2f6fdb4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrbq_scatter_offset_p_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint8_t * base, uint8x16_t offset, uint8x16_t value, mve_pred16_t p)
+{
+  vstrbq_scatter_offset_p_u8 (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.8"  }  } */
+
+void
+foo1 (uint8_t * base, uint8x16_t offset, uint8x16_t value, mve_pred16_t p)
+{
+  vstrbq_scatter_offset_p (base, offset, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..c20551bdde89287eadf04c68001e8a1699c2afb2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint32x4_t addr, int32x4_t value, mve_pred16_t p)
+{
+  vstrwq_scatter_base_p_s32 (addr, 8, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrwt.u32"  }  } */
+
+void
+foo1 (uint32x4_t addr, int32x4_t value, mve_pred16_t p)
+{
+  vstrwq_scatter_base_p (addr, 8, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrwt.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..abd5e13e21ece98654cc136b8d76b2238df6a523
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint32x4_t addr, uint32x4_t value, mve_pred16_t p)
+{
+  vstrwq_scatter_base_p_u32 (addr, 8, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrwt.u32"  }  } */
+
+void
+foo1 (uint32x4_t addr, uint32x4_t value, mve_pred16_t p)
+{
+  vstrwq_scatter_base_p (addr, 8, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrwt.u32"  }  } */


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][2/5x]: MVE load intrinsics.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (17 preceding siblings ...)
  2019-11-14 19:15 ` [PATCH][ARM][GCC][1/5x]: MVE store intrinsics Srinath Parvathaneni
@ 2019-11-14 19:15 ` Srinath Parvathaneni
  2019-11-14 19:15 ` [PATCH][ARM][GCC][3/5x]: MVE store intrinsics with predicated suffix Srinath Parvathaneni
                   ` (19 subsequent siblings)
  38 siblings, 0 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 28743 bytes --]

Hello,

This patch supports the following MVE ACLE load intrinsics.

vldrbq_gather_offset_u8, vldrbq_gather_offset_s8, vldrbq_s8, vldrbq_u8,
vldrbq_gather_offset_u16, vldrbq_gather_offset_s16, vldrbq_s16, vldrbq_u16,
vldrbq_gather_offset_u32, vldrbq_gather_offset_s32, vldrbq_s32, vldrbq_u32,
vldrwq_gather_base_s32, vldrwq_gather_base_u32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1]  https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-01  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm-builtins.c (LDRGU_QUALIFIERS): Define builtin
	qualifier.
	(LDRGS_QUALIFIERS): Likewise.
	(LDRS_QUALIFIERS): Likewise.
	(LDRU_QUALIFIERS): Likewise.
	(LDRGBS_QUALIFIERS): Likewise.
	(LDRGBU_QUALIFIERS): Likewise.
	* config/arm/arm_mve.h (vldrbq_gather_offset_u8): Define macro.
	(vldrbq_gather_offset_s8): Likewise.
	(vldrbq_s8): Likewise.
	(vldrbq_u8): Likewise.
	(vldrbq_gather_offset_u16): Likewise.
	(vldrbq_gather_offset_s16): Likewise.
	(vldrbq_s16): Likewise.
	(vldrbq_u16): Likewise.
	(vldrbq_gather_offset_u32): Likewise.
	(vldrbq_gather_offset_s32): Likewise.
	(vldrbq_s32): Likewise.
	(vldrbq_u32): Likewise.
	(vldrwq_gather_base_s32): Likewise.
	(vldrwq_gather_base_u32): Likewise.
	(__arm_vldrbq_gather_offset_u8): Define intrinsic.
	(__arm_vldrbq_gather_offset_s8): Likewise.
	(__arm_vldrbq_s8): Likewise.
	(__arm_vldrbq_u8): Likewise.
	(__arm_vldrbq_gather_offset_u16): Likewise.
	(__arm_vldrbq_gather_offset_s16): Likewise.
	(__arm_vldrbq_s16): Likewise.
	(__arm_vldrbq_u16): Likewise.
	(__arm_vldrbq_gather_offset_u32): Likewise.
	(__arm_vldrbq_gather_offset_s32): Likewise.
	(__arm_vldrbq_s32): Likewise.
	(__arm_vldrbq_u32): Likewise.
	(__arm_vldrwq_gather_base_s32): Likewise.
	(__arm_vldrwq_gather_base_u32): Likewise.
	(vldrbq_gather_offset): Define polymorphic variant.
	* config/arm/arm_mve_builtins.def (LDRGU_QUALIFIERS): Use builtin
	qualifier.
        (LDRGS_QUALIFIERS): Likewise.
        (LDRS_QUALIFIERS): Likewise.
        (LDRU_QUALIFIERS): Likewise.
        (LDRGBS_QUALIFIERS): Likewise.
        (LDRGBU_QUALIFIERS): Likewise.
	* config/arm/mve.md (VLDRBGOQ): Define iterator.
	(VLDRBQ): Likewise. 
	(VLDRWGBQ): Likewise.
	(mve_vldrbq_gather_offset_<supf><mode>): Define RTL pattern.
	(mve_vldrbq_<supf><mode>): Likewise.
	(mve_vldrwq_gather_base_<supf>v4si): Likewise.

gcc/testsuite/ChangeLog:

2019-11-01  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_s16.c: New test.
	* gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrbq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrbq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrbq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrbq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrbq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrbq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_u32.c: Likewise.


###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index ec88199bb5e7e9c15a346061c70841f3086004ef..02ea297937b18099f33a50c808964d1dd7eac1b3 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -588,6 +588,36 @@ arm_strsbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
       qualifier_unsigned};
 #define STRSBU_QUALIFIERS (arm_strsbu_qualifiers)
 
+static enum arm_type_qualifiers
+arm_ldrgu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_pointer, qualifier_unsigned};
+#define LDRGU_QUALIFIERS (arm_ldrgu_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ldrgs_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_pointer, qualifier_unsigned};
+#define LDRGS_QUALIFIERS (arm_ldrgs_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ldrs_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_pointer};
+#define LDRS_QUALIFIERS (arm_ldrs_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ldru_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_pointer};
+#define LDRU_QUALIFIERS (arm_ldru_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ldrgbs_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_unsigned, qualifier_immediate};
+#define LDRGBS_QUALIFIERS (arm_ldrgbs_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ldrgbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate};
+#define LDRGBU_QUALIFIERS (arm_ldrgbu_qualifiers)
+
 /* End of Qualifier for MVE builtins.  */
 
    /* void ([T element type] *, T, immediate).  */
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 3d64033dd23636b041cbd6ecb3eb18c3b9db3aed..c123de1f3d37054b779a9ee908525c242b2bc4ab 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -1716,6 +1716,20 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vstrbq_scatter_offset_s32( __base, __offset, __value) __arm_vstrbq_scatter_offset_s32( __base, __offset, __value)
 #define vstrwq_scatter_base_s32(__addr,  __offset, __value) __arm_vstrwq_scatter_base_s32(__addr,  __offset, __value)
 #define vstrwq_scatter_base_u32(__addr,  __offset, __value) __arm_vstrwq_scatter_base_u32(__addr,  __offset, __value)
+#define vldrbq_gather_offset_u8(__base, __offset) __arm_vldrbq_gather_offset_u8(__base, __offset)
+#define vldrbq_gather_offset_s8(__base, __offset) __arm_vldrbq_gather_offset_s8(__base, __offset)
+#define vldrbq_s8(__base) __arm_vldrbq_s8(__base)
+#define vldrbq_u8(__base) __arm_vldrbq_u8(__base)
+#define vldrbq_gather_offset_u16(__base, __offset) __arm_vldrbq_gather_offset_u16(__base, __offset)
+#define vldrbq_gather_offset_s16(__base, __offset) __arm_vldrbq_gather_offset_s16(__base, __offset)
+#define vldrbq_s16(__base) __arm_vldrbq_s16(__base)
+#define vldrbq_u16(__base) __arm_vldrbq_u16(__base)
+#define vldrbq_gather_offset_u32(__base, __offset) __arm_vldrbq_gather_offset_u32(__base, __offset)
+#define vldrbq_gather_offset_s32(__base, __offset) __arm_vldrbq_gather_offset_s32(__base, __offset)
+#define vldrbq_s32(__base) __arm_vldrbq_s32(__base)
+#define vldrbq_u32(__base) __arm_vldrbq_u32(__base)
+#define vldrwq_gather_base_s32(__addr,  __offset) __arm_vldrwq_gather_base_s32(__addr,  __offset)
+#define vldrwq_gather_base_u32(__addr,  __offset) __arm_vldrwq_gather_base_u32(__addr,  __offset)
 #endif
 
 __extension__ extern __inline void
@@ -11106,6 +11120,105 @@ __arm_vstrwq_scatter_base_u32 (uint32x4_t __addr, const int __offset, uint32x4_t
 {
   __builtin_mve_vstrwq_scatter_base_uv4si (__addr, __offset, __value);
 }
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_gather_offset_u8 (uint8_t const * __base, uint8x16_t __offset)
+{
+  return __builtin_mve_vldrbq_gather_offset_uv16qi ((__builtin_neon_qi *) __base, __offset);
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_gather_offset_s8 (int8_t const * __base, uint8x16_t __offset)
+{
+  return __builtin_mve_vldrbq_gather_offset_sv16qi ((__builtin_neon_qi *) __base, __offset);
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_s8 (int8_t const * __base)
+{
+  return __builtin_mve_vldrbq_sv16qi ((__builtin_neon_qi *) __base);
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_u8 (uint8_t const * __base)
+{
+  return __builtin_mve_vldrbq_uv16qi ((__builtin_neon_qi *) __base);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_gather_offset_u16 (uint8_t const * __base, uint16x8_t __offset)
+{
+  return __builtin_mve_vldrbq_gather_offset_uv8hi ((__builtin_neon_qi *) __base, __offset);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_gather_offset_s16 (int8_t const * __base, uint16x8_t __offset)
+{
+  return __builtin_mve_vldrbq_gather_offset_sv8hi ((__builtin_neon_qi *) __base, __offset);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_s16 (int8_t const * __base)
+{
+  return __builtin_mve_vldrbq_sv8hi ((__builtin_neon_qi *) __base);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_u16 (uint8_t const * __base)
+{
+  return __builtin_mve_vldrbq_uv8hi ((__builtin_neon_qi *) __base);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_gather_offset_u32 (uint8_t const * __base, uint32x4_t __offset)
+{
+  return __builtin_mve_vldrbq_gather_offset_uv4si ((__builtin_neon_qi *) __base, __offset);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_gather_offset_s32 (int8_t const * __base, uint32x4_t __offset)
+{
+  return __builtin_mve_vldrbq_gather_offset_sv4si ((__builtin_neon_qi *) __base, __offset);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_s32 (int8_t const * __base)
+{
+  return __builtin_mve_vldrbq_sv4si ((__builtin_neon_qi *) __base);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_u32 (uint8_t const * __base)
+{
+  return __builtin_mve_vldrbq_uv4si ((__builtin_neon_qi *) __base);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_gather_base_s32 (uint32x4_t __addr, const int __offset)
+{
+  return __builtin_mve_vldrwq_gather_base_sv4si (__addr, __offset);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_gather_base_u32 (uint32x4_t __addr, const int __offset)
+{
+  return __builtin_mve_vldrwq_gather_base_uv4si (__addr, __offset);
+}
+
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
@@ -17682,6 +17795,17 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int32x4_t]: __arm_vstrwq_scatter_base_s32(p0, p1, __ARM_mve_coerce(__p2, int32x4_t)), \
   int (*)[__ARM_mve_type_uint32x4_t]: __arm_vstrwq_scatter_base_u32(p0, p1, __ARM_mve_coerce(__p2, uint32x4_t)));})
 
+#define vldrbq_gather_offset(p0,p1) __arm_vldrbq_gather_offset(p0,p1)
+#define __arm_vldrbq_gather_offset(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8_t_const_ptr][__ARM_mve_type_uint8x16_t]: __arm_vldrbq_gather_offset_s8 (__ARM_mve_coerce(__p0, int8_t const *), __ARM_mve_coerce(__p1, uint8x16_t)), \
+  int (*)[__ARM_mve_type_int8_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrbq_gather_offset_s16 (__ARM_mve_coerce(__p0, int8_t const *), __ARM_mve_coerce(__p1, uint16x8_t)), \
+  int (*)[__ARM_mve_type_int8_t_const_ptr][__ARM_mve_type_uint32x4_t]: __arm_vldrbq_gather_offset_s32 (__ARM_mve_coerce(__p0, int8_t const *), __ARM_mve_coerce(__p1, uint32x4_t)), \
+  int (*)[__ARM_mve_type_uint8_t_const_ptr][__ARM_mve_type_uint8x16_t]: __arm_vldrbq_gather_offset_u8 (__ARM_mve_coerce(__p0, uint8_t const *), __ARM_mve_coerce(__p1, uint8x16_t)), \
+  int (*)[__ARM_mve_type_uint8_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrbq_gather_offset_u16 (__ARM_mve_coerce(__p0, uint8_t const *), __ARM_mve_coerce(__p1, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint8_t_const_ptr][__ARM_mve_type_uint32x4_t]: __arm_vldrbq_gather_offset_u32 (__ARM_mve_coerce(__p0, uint8_t const *), __ARM_mve_coerce(__p1, uint32x4_t)));})
+
 #endif /* MVE Floating point.  */
 
 #ifdef __cplusplus
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index 149c4115b6167e3f2f0eb43fbdeaa8708ba5d723..60e50767a2ac80a05303e3df8512f2612c8bc8ef 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -691,3 +691,9 @@ VAR3 (STRSS, vstrbq_scatter_offset_s, v16qi, v8hi, v4si)
 VAR3 (STRSU, vstrbq_scatter_offset_u, v16qi, v8hi, v4si)
 VAR1 (STRSBS, vstrwq_scatter_base_s, v4si)
 VAR1 (STRSBU, vstrwq_scatter_base_u, v4si)
+VAR3 (LDRGU, vldrbq_gather_offset_u, v16qi, v8hi, v4si)
+VAR3 (LDRGS, vldrbq_gather_offset_s, v16qi, v8hi, v4si)
+VAR3 (LDRS, vldrbq_s, v16qi, v8hi, v4si)
+VAR3 (LDRU, vldrbq_u, v16qi, v8hi, v4si)
+VAR1 (LDRGBS, vldrwq_gather_base_s, v4si)
+VAR1 (LDRGBU, vldrwq_gather_base_u, v4si)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index eb365166e934f32f7695e5354258151202f378c0..7b416f9a4e04f8c02547953d6019787817688a36 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -192,7 +192,8 @@
 			 VCMULQ_ROT270_M_F VCMULQ_ROT90_M_F VFMAQ_M_F
 			 VFMAQ_M_N_F VFMASQ_M_N_F VFMSQ_M_F VMAXNMQ_M_F
 			 VMINNMQ_M_F VSUBQ_M_F VSTRWQSB_S VSTRWQSB_U
-			 VSTRBQSO_S VSTRBQSO_U VSTRBQ_S VSTRBQ_U])
+			 VSTRBQSO_S VSTRBQSO_U VSTRBQ_S VSTRBQ_U VLDRBQGO_S
+			 VLDRBQGO_U VLDRBQ_S VLDRBQ_U VLDRWQGB_S VLDRWQGB_U])
 
 (define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF")
 			    (V8HF "V8HI") (V4SF "V4SI")])
@@ -345,7 +346,9 @@
 		       (VMLALDAVAXQ_P_S "s") (VMLALDAVAXQ_P_U "u")
 		       (VMLALDAVAQ_P_S "s") (VMLALDAVAQ_P_U "u")
 		       (VSTRWQSB_S "s") (VSTRWQSB_U "u") (VSTRBQSO_S "s")
-		       (VSTRBQSO_U "u") (VSTRBQ_S "s") (VSTRBQ_U "u")])
+		       (VSTRBQSO_U "u") (VSTRBQ_S "s") (VSTRBQ_U "u")
+		       (VLDRBQGO_S "s") (VLDRBQGO_U "u") (VLDRBQ_S "s")
+		       (VLDRBQ_U "u") (VLDRWQGB_S "s") (VLDRWQGB_U "u")])
 
 (define_int_attr mode1 [(VCTP8Q "8") (VCTP16Q "16") (VCTP32Q "32")
 			(VCTP64Q "64") (VCTP8Q_M "8") (VCTP16Q_M "16")
@@ -569,6 +572,9 @@
 (define_int_iterator VSTRWSBQ [VSTRWQSB_S VSTRWQSB_U])
 (define_int_iterator VSTRBSOQ [VSTRBQSO_S VSTRBQSO_U])
 (define_int_iterator VSTRBQ [VSTRBQ_S VSTRBQ_U])
+(define_int_iterator VLDRBGOQ [VLDRBQGO_S VLDRBQGO_U])
+(define_int_iterator VLDRBQ [VLDRBQ_S VLDRBQ_U])
+(define_int_iterator VLDRWGBQ [VLDRWQGB_S VLDRWQGB_U])
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
@@ -8002,3 +8008,65 @@
    return "";
 }
   [(set_attr "length" "4")])
+
+;;
+;; [vldrbq_gather_offset_s vldrbq_gather_offset_u]
+;;
+(define_insn "mve_vldrbq_gather_offset_<supf><mode>"
+  [(set (match_operand:MVE_2 0 "s_register_operand" "=&w")
+	(unspec:MVE_2 [(match_operand:<MVE_B_ELEM> 1 "memory_operand" "Us")
+		       (match_operand:MVE_2 2 "s_register_operand" "w")]
+	 VLDRBGOQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   if (!strcmp ("<supf>","s") && <V_sz_elem> == 8)
+     output_asm_insn ("vldrb.u8\t%q0, [%m1, %q2]",ops);
+   else
+     output_asm_insn ("vldrb.<supf><V_sz_elem>\t%q0, [%m1, %q2]",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+;;
+;; [vldrbq_s vldrbq_u]
+;;
+(define_insn "mve_vldrbq_<supf><mode>"
+  [(set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(unspec:MVE_2 [(match_operand:<MVE_B_ELEM> 1 "memory_operand" "Us")]
+	 VLDRBQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[2];
+   int regno = REGNO (operands[0]);
+   ops[0] = gen_rtx_REG (TImode, regno);
+   ops[1]  = operands[1];
+   output_asm_insn ("vldrb.<supf><V_sz_elem>\t%q0, %E1",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+;;
+;; [vldrwq_gather_base_s vldrwq_gather_base_u]
+;;
+(define_insn "mve_vldrwq_gather_base_<supf>v4si"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=&w")
+	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
+		      (match_operand:SI 2 "immediate_operand" "i")]
+	 VLDRWGBQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   output_asm_insn ("vldrw.u32\t%q0, [%q1, %2]",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..f7763b8a1f668e25c5b34ed429dd5dc9a5455676
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int8_t const * base, uint16x8_t offset)
+{
+  return vldrbq_gather_offset_s16 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrb.s16"  }  } */
+
+int16x8_t
+foo1 (int8_t const * base, uint16x8_t offset)
+{
+  return vldrbq_gather_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrb.s16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..768a4325beae64496e174f55932554a961cbc886
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int8_t const * base, uint32x4_t offset)
+{
+  return vldrbq_gather_offset_s32 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrb.s32"  }  } */
+
+int32x4_t
+foo1 (int8_t const * base, uint32x4_t offset)
+{
+  return vldrbq_gather_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrb.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..4c52b5a009507bf68b6bd83699cd78e85b291572
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo (int8_t const * base, uint8x16_t offset)
+{
+  return vldrbq_gather_offset_s8 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrb.u8"  }  } */
+
+int8x16_t
+foo1 (int8_t const * base, uint8x16_t offset)
+{
+  return vldrbq_gather_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrb.u8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..44b310add69aa8e7b74cf2c501013b136d5da041
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint8_t const * base, uint16x8_t offset)
+{
+  return vldrbq_gather_offset_u16 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrb.u16"  }  } */
+
+uint16x8_t
+foo1 (uint8_t const * base, uint16x8_t offset)
+{
+  return vldrbq_gather_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrb.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..8415ea701327b2a296e1e28e37537045b69ff6a9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint8_t const * base, uint32x4_t offset)
+{
+  return vldrbq_gather_offset_u32 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrb.u32"  }  } */
+
+uint32x4_t
+foo1 (uint8_t const * base, uint32x4_t offset)
+{
+  return vldrbq_gather_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrb.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..9fbe1aa419d572ffad9b319d3ad24d9ac913b7e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint8_t const * base, uint8x16_t offset)
+{
+  return vldrbq_gather_offset_u8 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrb.u8"  }  } */
+
+uint8x16_t
+foo1 (uint8_t const * base, uint8x16_t offset)
+{
+  return vldrbq_gather_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrb.u8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..a63c7367e9ae24b78f4819aea95e6e5d0ec85891
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int8_t const * base)
+{
+  return vldrbq_s16 (base);
+}
+
+/* { dg-final { scan-assembler "vldrb.s16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..040c58d44c5e6bf09ad61d13c44f53dfd987e8e4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int8_t const * base)
+{
+  return vldrbq_s32 (base);
+}
+
+/* { dg-final { scan-assembler "vldrb.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..e5cc3deb79b909a19cbbcafe9213c91b67ac337a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s8.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo (int8_t const * base)
+{
+  return vldrbq_s8 (base);
+}
+
+/* { dg-final { scan-assembler "vldrb.s8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..7dedbde6eeae066fdc04e9d6fb6fe626af4580db
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint8_t const * base)
+{
+  return vldrbq_u16 (base);
+}
+
+/* { dg-final { scan-assembler "vldrb.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..3c3ec6c4ffe6b014bf58d0b7c2a2d501ddf8301b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint8_t const * base)
+{
+  return vldrbq_u32 (base);
+}
+
+/* { dg-final { scan-assembler "vldrb.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..5328d99bab94d62bd032d4da3509cd2c55a7841a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u8.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint8_t const * base)
+{
+  return vldrbq_u8 (base);
+}
+
+/* { dg-final { scan-assembler "vldrb.u8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..14831a91b65900d040f74b90c592bc195af42874
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_s32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (uint32x4_t addr)
+{
+  return vldrwq_gather_base_s32 (addr, 4);
+}
+
+/* { dg-final { scan-assembler "vldrw.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..7753ea6a7c9ec83b90f3e723683c156b20617e23
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_u32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t addr)
+{
+  return vldrwq_gather_base_u32 (addr, 4);
+}
+
+/* { dg-final { scan-assembler "vldrw.u32"  }  } */


[-- Attachment #2: diff21.patch --]
[-- Type: text/plain, Size: 24227 bytes --]

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index ec88199bb5e7e9c15a346061c70841f3086004ef..02ea297937b18099f33a50c808964d1dd7eac1b3 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -588,6 +588,36 @@ arm_strsbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
       qualifier_unsigned};
 #define STRSBU_QUALIFIERS (arm_strsbu_qualifiers)
 
+static enum arm_type_qualifiers
+arm_ldrgu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_pointer, qualifier_unsigned};
+#define LDRGU_QUALIFIERS (arm_ldrgu_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ldrgs_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_pointer, qualifier_unsigned};
+#define LDRGS_QUALIFIERS (arm_ldrgs_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ldrs_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_pointer};
+#define LDRS_QUALIFIERS (arm_ldrs_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ldru_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_pointer};
+#define LDRU_QUALIFIERS (arm_ldru_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ldrgbs_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_unsigned, qualifier_immediate};
+#define LDRGBS_QUALIFIERS (arm_ldrgbs_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ldrgbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate};
+#define LDRGBU_QUALIFIERS (arm_ldrgbu_qualifiers)
+
 /* End of Qualifier for MVE builtins.  */
 
    /* void ([T element type] *, T, immediate).  */
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 3d64033dd23636b041cbd6ecb3eb18c3b9db3aed..c123de1f3d37054b779a9ee908525c242b2bc4ab 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -1716,6 +1716,20 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vstrbq_scatter_offset_s32( __base, __offset, __value) __arm_vstrbq_scatter_offset_s32( __base, __offset, __value)
 #define vstrwq_scatter_base_s32(__addr,  __offset, __value) __arm_vstrwq_scatter_base_s32(__addr,  __offset, __value)
 #define vstrwq_scatter_base_u32(__addr,  __offset, __value) __arm_vstrwq_scatter_base_u32(__addr,  __offset, __value)
+#define vldrbq_gather_offset_u8(__base, __offset) __arm_vldrbq_gather_offset_u8(__base, __offset)
+#define vldrbq_gather_offset_s8(__base, __offset) __arm_vldrbq_gather_offset_s8(__base, __offset)
+#define vldrbq_s8(__base) __arm_vldrbq_s8(__base)
+#define vldrbq_u8(__base) __arm_vldrbq_u8(__base)
+#define vldrbq_gather_offset_u16(__base, __offset) __arm_vldrbq_gather_offset_u16(__base, __offset)
+#define vldrbq_gather_offset_s16(__base, __offset) __arm_vldrbq_gather_offset_s16(__base, __offset)
+#define vldrbq_s16(__base) __arm_vldrbq_s16(__base)
+#define vldrbq_u16(__base) __arm_vldrbq_u16(__base)
+#define vldrbq_gather_offset_u32(__base, __offset) __arm_vldrbq_gather_offset_u32(__base, __offset)
+#define vldrbq_gather_offset_s32(__base, __offset) __arm_vldrbq_gather_offset_s32(__base, __offset)
+#define vldrbq_s32(__base) __arm_vldrbq_s32(__base)
+#define vldrbq_u32(__base) __arm_vldrbq_u32(__base)
+#define vldrwq_gather_base_s32(__addr,  __offset) __arm_vldrwq_gather_base_s32(__addr,  __offset)
+#define vldrwq_gather_base_u32(__addr,  __offset) __arm_vldrwq_gather_base_u32(__addr,  __offset)
 #endif
 
 __extension__ extern __inline void
@@ -11106,6 +11120,105 @@ __arm_vstrwq_scatter_base_u32 (uint32x4_t __addr, const int __offset, uint32x4_t
 {
   __builtin_mve_vstrwq_scatter_base_uv4si (__addr, __offset, __value);
 }
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_gather_offset_u8 (uint8_t const * __base, uint8x16_t __offset)
+{
+  return __builtin_mve_vldrbq_gather_offset_uv16qi ((__builtin_neon_qi *) __base, __offset);
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_gather_offset_s8 (int8_t const * __base, uint8x16_t __offset)
+{
+  return __builtin_mve_vldrbq_gather_offset_sv16qi ((__builtin_neon_qi *) __base, __offset);
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_s8 (int8_t const * __base)
+{
+  return __builtin_mve_vldrbq_sv16qi ((__builtin_neon_qi *) __base);
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_u8 (uint8_t const * __base)
+{
+  return __builtin_mve_vldrbq_uv16qi ((__builtin_neon_qi *) __base);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_gather_offset_u16 (uint8_t const * __base, uint16x8_t __offset)
+{
+  return __builtin_mve_vldrbq_gather_offset_uv8hi ((__builtin_neon_qi *) __base, __offset);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_gather_offset_s16 (int8_t const * __base, uint16x8_t __offset)
+{
+  return __builtin_mve_vldrbq_gather_offset_sv8hi ((__builtin_neon_qi *) __base, __offset);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_s16 (int8_t const * __base)
+{
+  return __builtin_mve_vldrbq_sv8hi ((__builtin_neon_qi *) __base);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_u16 (uint8_t const * __base)
+{
+  return __builtin_mve_vldrbq_uv8hi ((__builtin_neon_qi *) __base);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_gather_offset_u32 (uint8_t const * __base, uint32x4_t __offset)
+{
+  return __builtin_mve_vldrbq_gather_offset_uv4si ((__builtin_neon_qi *) __base, __offset);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_gather_offset_s32 (int8_t const * __base, uint32x4_t __offset)
+{
+  return __builtin_mve_vldrbq_gather_offset_sv4si ((__builtin_neon_qi *) __base, __offset);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_s32 (int8_t const * __base)
+{
+  return __builtin_mve_vldrbq_sv4si ((__builtin_neon_qi *) __base);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrbq_u32 (uint8_t const * __base)
+{
+  return __builtin_mve_vldrbq_uv4si ((__builtin_neon_qi *) __base);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_gather_base_s32 (uint32x4_t __addr, const int __offset)
+{
+  return __builtin_mve_vldrwq_gather_base_sv4si (__addr, __offset);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_gather_base_u32 (uint32x4_t __addr, const int __offset)
+{
+  return __builtin_mve_vldrwq_gather_base_uv4si (__addr, __offset);
+}
+
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
@@ -17682,6 +17795,17 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int32x4_t]: __arm_vstrwq_scatter_base_s32(p0, p1, __ARM_mve_coerce(__p2, int32x4_t)), \
   int (*)[__ARM_mve_type_uint32x4_t]: __arm_vstrwq_scatter_base_u32(p0, p1, __ARM_mve_coerce(__p2, uint32x4_t)));})
 
+#define vldrbq_gather_offset(p0,p1) __arm_vldrbq_gather_offset(p0,p1)
+#define __arm_vldrbq_gather_offset(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8_t_const_ptr][__ARM_mve_type_uint8x16_t]: __arm_vldrbq_gather_offset_s8 (__ARM_mve_coerce(__p0, int8_t const *), __ARM_mve_coerce(__p1, uint8x16_t)), \
+  int (*)[__ARM_mve_type_int8_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrbq_gather_offset_s16 (__ARM_mve_coerce(__p0, int8_t const *), __ARM_mve_coerce(__p1, uint16x8_t)), \
+  int (*)[__ARM_mve_type_int8_t_const_ptr][__ARM_mve_type_uint32x4_t]: __arm_vldrbq_gather_offset_s32 (__ARM_mve_coerce(__p0, int8_t const *), __ARM_mve_coerce(__p1, uint32x4_t)), \
+  int (*)[__ARM_mve_type_uint8_t_const_ptr][__ARM_mve_type_uint8x16_t]: __arm_vldrbq_gather_offset_u8 (__ARM_mve_coerce(__p0, uint8_t const *), __ARM_mve_coerce(__p1, uint8x16_t)), \
+  int (*)[__ARM_mve_type_uint8_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrbq_gather_offset_u16 (__ARM_mve_coerce(__p0, uint8_t const *), __ARM_mve_coerce(__p1, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint8_t_const_ptr][__ARM_mve_type_uint32x4_t]: __arm_vldrbq_gather_offset_u32 (__ARM_mve_coerce(__p0, uint8_t const *), __ARM_mve_coerce(__p1, uint32x4_t)));})
+
 #endif /* MVE Floating point.  */
 
 #ifdef __cplusplus
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index 149c4115b6167e3f2f0eb43fbdeaa8708ba5d723..60e50767a2ac80a05303e3df8512f2612c8bc8ef 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -691,3 +691,9 @@ VAR3 (STRSS, vstrbq_scatter_offset_s, v16qi, v8hi, v4si)
 VAR3 (STRSU, vstrbq_scatter_offset_u, v16qi, v8hi, v4si)
 VAR1 (STRSBS, vstrwq_scatter_base_s, v4si)
 VAR1 (STRSBU, vstrwq_scatter_base_u, v4si)
+VAR3 (LDRGU, vldrbq_gather_offset_u, v16qi, v8hi, v4si)
+VAR3 (LDRGS, vldrbq_gather_offset_s, v16qi, v8hi, v4si)
+VAR3 (LDRS, vldrbq_s, v16qi, v8hi, v4si)
+VAR3 (LDRU, vldrbq_u, v16qi, v8hi, v4si)
+VAR1 (LDRGBS, vldrwq_gather_base_s, v4si)
+VAR1 (LDRGBU, vldrwq_gather_base_u, v4si)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index eb365166e934f32f7695e5354258151202f378c0..7b416f9a4e04f8c02547953d6019787817688a36 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -192,7 +192,8 @@
 			 VCMULQ_ROT270_M_F VCMULQ_ROT90_M_F VFMAQ_M_F
 			 VFMAQ_M_N_F VFMASQ_M_N_F VFMSQ_M_F VMAXNMQ_M_F
 			 VMINNMQ_M_F VSUBQ_M_F VSTRWQSB_S VSTRWQSB_U
-			 VSTRBQSO_S VSTRBQSO_U VSTRBQ_S VSTRBQ_U])
+			 VSTRBQSO_S VSTRBQSO_U VSTRBQ_S VSTRBQ_U VLDRBQGO_S
+			 VLDRBQGO_U VLDRBQ_S VLDRBQ_U VLDRWQGB_S VLDRWQGB_U])
 
 (define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF")
 			    (V8HF "V8HI") (V4SF "V4SI")])
@@ -345,7 +346,9 @@
 		       (VMLALDAVAXQ_P_S "s") (VMLALDAVAXQ_P_U "u")
 		       (VMLALDAVAQ_P_S "s") (VMLALDAVAQ_P_U "u")
 		       (VSTRWQSB_S "s") (VSTRWQSB_U "u") (VSTRBQSO_S "s")
-		       (VSTRBQSO_U "u") (VSTRBQ_S "s") (VSTRBQ_U "u")])
+		       (VSTRBQSO_U "u") (VSTRBQ_S "s") (VSTRBQ_U "u")
+		       (VLDRBQGO_S "s") (VLDRBQGO_U "u") (VLDRBQ_S "s")
+		       (VLDRBQ_U "u") (VLDRWQGB_S "s") (VLDRWQGB_U "u")])
 
 (define_int_attr mode1 [(VCTP8Q "8") (VCTP16Q "16") (VCTP32Q "32")
 			(VCTP64Q "64") (VCTP8Q_M "8") (VCTP16Q_M "16")
@@ -569,6 +572,9 @@
 (define_int_iterator VSTRWSBQ [VSTRWQSB_S VSTRWQSB_U])
 (define_int_iterator VSTRBSOQ [VSTRBQSO_S VSTRBQSO_U])
 (define_int_iterator VSTRBQ [VSTRBQ_S VSTRBQ_U])
+(define_int_iterator VLDRBGOQ [VLDRBQGO_S VLDRBQGO_U])
+(define_int_iterator VLDRBQ [VLDRBQ_S VLDRBQ_U])
+(define_int_iterator VLDRWGBQ [VLDRWQGB_S VLDRWQGB_U])
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
@@ -8002,3 +8008,65 @@
    return "";
 }
   [(set_attr "length" "4")])
+
+;;
+;; [vldrbq_gather_offset_s vldrbq_gather_offset_u]
+;;
+(define_insn "mve_vldrbq_gather_offset_<supf><mode>"
+  [(set (match_operand:MVE_2 0 "s_register_operand" "=&w")
+	(unspec:MVE_2 [(match_operand:<MVE_B_ELEM> 1 "memory_operand" "Us")
+		       (match_operand:MVE_2 2 "s_register_operand" "w")]
+	 VLDRBGOQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   if (!strcmp ("<supf>","s") && <V_sz_elem> == 8)
+     output_asm_insn ("vldrb.u8\t%q0, [%m1, %q2]",ops);
+   else
+     output_asm_insn ("vldrb.<supf><V_sz_elem>\t%q0, [%m1, %q2]",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+;;
+;; [vldrbq_s vldrbq_u]
+;;
+(define_insn "mve_vldrbq_<supf><mode>"
+  [(set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(unspec:MVE_2 [(match_operand:<MVE_B_ELEM> 1 "memory_operand" "Us")]
+	 VLDRBQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[2];
+   int regno = REGNO (operands[0]);
+   ops[0] = gen_rtx_REG (TImode, regno);
+   ops[1]  = operands[1];
+   output_asm_insn ("vldrb.<supf><V_sz_elem>\t%q0, %E1",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+;;
+;; [vldrwq_gather_base_s vldrwq_gather_base_u]
+;;
+(define_insn "mve_vldrwq_gather_base_<supf>v4si"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=&w")
+	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
+		      (match_operand:SI 2 "immediate_operand" "i")]
+	 VLDRWGBQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   output_asm_insn ("vldrw.u32\t%q0, [%q1, %2]",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..f7763b8a1f668e25c5b34ed429dd5dc9a5455676
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int8_t const * base, uint16x8_t offset)
+{
+  return vldrbq_gather_offset_s16 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrb.s16"  }  } */
+
+int16x8_t
+foo1 (int8_t const * base, uint16x8_t offset)
+{
+  return vldrbq_gather_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrb.s16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..768a4325beae64496e174f55932554a961cbc886
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int8_t const * base, uint32x4_t offset)
+{
+  return vldrbq_gather_offset_s32 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrb.s32"  }  } */
+
+int32x4_t
+foo1 (int8_t const * base, uint32x4_t offset)
+{
+  return vldrbq_gather_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrb.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..4c52b5a009507bf68b6bd83699cd78e85b291572
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo (int8_t const * base, uint8x16_t offset)
+{
+  return vldrbq_gather_offset_s8 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrb.u8"  }  } */
+
+int8x16_t
+foo1 (int8_t const * base, uint8x16_t offset)
+{
+  return vldrbq_gather_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrb.u8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..44b310add69aa8e7b74cf2c501013b136d5da041
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint8_t const * base, uint16x8_t offset)
+{
+  return vldrbq_gather_offset_u16 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrb.u16"  }  } */
+
+uint16x8_t
+foo1 (uint8_t const * base, uint16x8_t offset)
+{
+  return vldrbq_gather_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrb.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..8415ea701327b2a296e1e28e37537045b69ff6a9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint8_t const * base, uint32x4_t offset)
+{
+  return vldrbq_gather_offset_u32 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrb.u32"  }  } */
+
+uint32x4_t
+foo1 (uint8_t const * base, uint32x4_t offset)
+{
+  return vldrbq_gather_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrb.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..9fbe1aa419d572ffad9b319d3ad24d9ac913b7e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_gather_offset_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint8_t const * base, uint8x16_t offset)
+{
+  return vldrbq_gather_offset_u8 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrb.u8"  }  } */
+
+uint8x16_t
+foo1 (uint8_t const * base, uint8x16_t offset)
+{
+  return vldrbq_gather_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrb.u8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..a63c7367e9ae24b78f4819aea95e6e5d0ec85891
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int8_t const * base)
+{
+  return vldrbq_s16 (base);
+}
+
+/* { dg-final { scan-assembler "vldrb.s16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..040c58d44c5e6bf09ad61d13c44f53dfd987e8e4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int8_t const * base)
+{
+  return vldrbq_s32 (base);
+}
+
+/* { dg-final { scan-assembler "vldrb.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..e5cc3deb79b909a19cbbcafe9213c91b67ac337a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_s8.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo (int8_t const * base)
+{
+  return vldrbq_s8 (base);
+}
+
+/* { dg-final { scan-assembler "vldrb.s8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..7dedbde6eeae066fdc04e9d6fb6fe626af4580db
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint8_t const * base)
+{
+  return vldrbq_u16 (base);
+}
+
+/* { dg-final { scan-assembler "vldrb.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..3c3ec6c4ffe6b014bf58d0b7c2a2d501ddf8301b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint8_t const * base)
+{
+  return vldrbq_u32 (base);
+}
+
+/* { dg-final { scan-assembler "vldrb.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..5328d99bab94d62bd032d4da3509cd2c55a7841a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrbq_u8.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint8_t const * base)
+{
+  return vldrbq_u8 (base);
+}
+
+/* { dg-final { scan-assembler "vldrb.u8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..14831a91b65900d040f74b90c592bc195af42874
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_s32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (uint32x4_t addr)
+{
+  return vldrwq_gather_base_s32 (addr, 4);
+}
+
+/* { dg-final { scan-assembler "vldrw.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..7753ea6a7c9ec83b90f3e723683c156b20617e23
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_u32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t addr)
+{
+  return vldrwq_gather_base_u32 (addr, 4);
+}
+
+/* { dg-final { scan-assembler "vldrw.u32"  }  } */


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][2/4x]: MVE intrinsics with quaternary operands.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (13 preceding siblings ...)
  2019-11-14 19:15 ` [PATCH][ARM][GCC][4/5x]: MVE load intrinsics with zero(_z) suffix Srinath Parvathaneni
@ 2019-11-14 19:15 ` Srinath Parvathaneni
  2019-11-14 19:15 ` [PATCH][ARM][GCC][7/5x]: MVE store intrinsics which stores byte,half word or word to memory Srinath Parvathaneni
                   ` (23 subsequent siblings)
  38 siblings, 0 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 50259 bytes --]

Hello,

This patch supports following MVE ACLE intrinsics with quaternary operands.

vabdq_m_s8, vabdq_m_s32, vabdq_m_s16, vabdq_m_u8, vabdq_m_u32, vabdq_m_u16, vaddq_m_n_s8,
vaddq_m_n_s32, vaddq_m_n_s16, vaddq_m_n_u8, vaddq_m_n_u32, vaddq_m_n_u16, vaddq_m_s8,
vaddq_m_s32, vaddq_m_s16, vaddq_m_u8, vaddq_m_u32, vaddq_m_u16, vandq_m_s8, vandq_m_s32,
vandq_m_s16, vandq_m_u8, vandq_m_u32, vandq_m_u16, vbicq_m_s8, vbicq_m_s32, vbicq_m_s16,
vbicq_m_u8, vbicq_m_u32, vbicq_m_u16, vbrsrq_m_n_s8, vbrsrq_m_n_s32, vbrsrq_m_n_s16,
vbrsrq_m_n_u8, vbrsrq_m_n_u32, vbrsrq_m_n_u16, vcaddq_rot270_m_s8, vcaddq_rot270_m_s32,
vcaddq_rot270_m_s16, vcaddq_rot270_m_u8, vcaddq_rot270_m_u32, vcaddq_rot270_m_u16,
vcaddq_rot90_m_s8, vcaddq_rot90_m_s32, vcaddq_rot90_m_s16, vcaddq_rot90_m_u8,
vcaddq_rot90_m_u32, vcaddq_rot90_m_u16, veorq_m_s8, veorq_m_s32, veorq_m_s16, veorq_m_u8,
veorq_m_u32, veorq_m_u16, vhaddq_m_n_s8, vhaddq_m_n_s32, vhaddq_m_n_s16, vhaddq_m_n_u8,
vhaddq_m_n_u32, vhaddq_m_n_u16, vhaddq_m_s8, vhaddq_m_s32, vhaddq_m_s16, vhaddq_m_u8,
vhaddq_m_u32, vhaddq_m_u16, vhcaddq_rot270_m_s8, vhcaddq_rot270_m_s32, vhcaddq_rot270_m_s16,
vhcaddq_rot90_m_s8, vhcaddq_rot90_m_s32, vhcaddq_rot90_m_s16, vhsubq_m_n_s8, vhsubq_m_n_s32,
vhsubq_m_n_s16, vhsubq_m_n_u8, vhsubq_m_n_u32, vhsubq_m_n_u16, vhsubq_m_s8, vhsubq_m_s32,
vhsubq_m_s16, vhsubq_m_u8, vhsubq_m_u32, vhsubq_m_u16, vmaxq_m_s8, vmaxq_m_s32, vmaxq_m_s16,
vmaxq_m_u8, vmaxq_m_u32, vmaxq_m_u16, vminq_m_s8, vminq_m_s32, vminq_m_s16, vminq_m_u8,
vminq_m_u32, vminq_m_u16, vmladavaq_p_s8, vmladavaq_p_s32, vmladavaq_p_s16, vmladavaq_p_u8,
vmladavaq_p_u32, vmladavaq_p_u16, vmladavaxq_p_s8, vmladavaxq_p_s32, vmladavaxq_p_s16,
vmlaq_m_n_s8, vmlaq_m_n_s32, vmlaq_m_n_s16, vmlaq_m_n_u8, vmlaq_m_n_u32, vmlaq_m_n_u16,
vmlasq_m_n_s8, vmlasq_m_n_s32, vmlasq_m_n_s16, vmlasq_m_n_u8, vmlasq_m_n_u32, vmlasq_m_n_u16,
vmlsdavaq_p_s8, vmlsdavaq_p_s32, vmlsdavaq_p_s16, vmlsdavaxq_p_s8, vmlsdavaxq_p_s32,
vmlsdavaxq_p_s16, vmulhq_m_s8, vmulhq_m_s32, vmulhq_m_s16, vmulhq_m_u8, vmulhq_m_u32,
vmulhq_m_u16, vmullbq_int_m_s8, vmullbq_int_m_s32, vmullbq_int_m_s16, vmullbq_int_m_u8,
vmullbq_int_m_u32, vmullbq_int_m_u16, vmulltq_int_m_s8, vmulltq_int_m_s32, vmulltq_int_m_s16,
vmulltq_int_m_u8, vmulltq_int_m_u32, vmulltq_int_m_u16, vmulq_m_n_s8, vmulq_m_n_s32,
vmulq_m_n_s16, vmulq_m_n_u8, vmulq_m_n_u32, vmulq_m_n_u16, vmulq_m_s8, vmulq_m_s32,
vmulq_m_s16, vmulq_m_u8, vmulq_m_u32, vmulq_m_u16, vornq_m_s8, vornq_m_s32, vornq_m_s16,
vornq_m_u8, vornq_m_u32, vornq_m_u16, vorrq_m_s8, vorrq_m_s32, vorrq_m_s16, vorrq_m_u8,
vorrq_m_u32, vorrq_m_u16, vqaddq_m_n_s8, vqaddq_m_n_s32, vqaddq_m_n_s16, vqaddq_m_n_u8,
vqaddq_m_n_u32, vqaddq_m_n_u16, vqaddq_m_s8, vqaddq_m_s32, vqaddq_m_s16, vqaddq_m_u8, 
vqaddq_m_u32, vqaddq_m_u16, vqdmladhq_m_s8, vqdmladhq_m_s32, vqdmladhq_m_s16, vqdmladhxq_m_s8,
vqdmladhxq_m_s32, vqdmladhxq_m_s16, vqdmlahq_m_n_s8, vqdmlahq_m_n_s32, vqdmlahq_m_n_s16,
vqdmlahq_m_n_u8, vqdmlahq_m_n_u32, vqdmlahq_m_n_u16, vqdmlsdhq_m_s8, vqdmlsdhq_m_s32,
vqdmlsdhq_m_s16, vqdmlsdhxq_m_s8, vqdmlsdhxq_m_s32, vqdmlsdhxq_m_s16, vqdmulhq_m_n_s8,
vqdmulhq_m_n_s32, vqdmulhq_m_n_s16, vqdmulhq_m_s8, vqdmulhq_m_s32, vqdmulhq_m_s16,
vqrdmladhq_m_s8, vqrdmladhq_m_s32, vqrdmladhq_m_s16, vqrdmladhxq_m_s8, vqrdmladhxq_m_s32,
vqrdmladhxq_m_s16, vqrdmlahq_m_n_s8, vqrdmlahq_m_n_s32, vqrdmlahq_m_n_s16, vqrdmlahq_m_n_u8,
vqrdmlahq_m_n_u32, vqrdmlahq_m_n_u16, vqrdmlashq_m_n_s8, vqrdmlashq_m_n_s32, vqrdmlashq_m_n_s16,
vqrdmlashq_m_n_u8, vqrdmlashq_m_n_u32, vqrdmlashq_m_n_u16, vqrdmlsdhq_m_s8, vqrdmlsdhq_m_s32,
vqrdmlsdhq_m_s16, vqrdmlsdhxq_m_s8, vqrdmlsdhxq_m_s32, vqrdmlsdhxq_m_s16, vqrdmulhq_m_n_s8,
vqrdmulhq_m_n_s32, vqrdmulhq_m_n_s16, vqrdmulhq_m_s8, vqrdmulhq_m_s32, vqrdmulhq_m_s16,
vqrshlq_m_s8, vqrshlq_m_s32, vqrshlq_m_s16, vqrshlq_m_u8, vqrshlq_m_u32, vqrshlq_m_u16,
vqshlq_m_n_s8, vqshlq_m_n_s32, vqshlq_m_n_s16, vqshlq_m_n_u8, vqshlq_m_n_u32, vqshlq_m_n_u16,
vqshlq_m_s8, vqshlq_m_s32, vqshlq_m_s16, vqshlq_m_u8, vqshlq_m_u32, vqshlq_m_u16, 
vqsubq_m_n_s8, vqsubq_m_n_s32, vqsubq_m_n_s16, vqsubq_m_n_u8, vqsubq_m_n_u32, vqsubq_m_n_u16,
vqsubq_m_s8, vqsubq_m_s32, vqsubq_m_s16, vqsubq_m_u8, vqsubq_m_u32, vqsubq_m_u16,
vrhaddq_m_s8, vrhaddq_m_s32, vrhaddq_m_s16, vrhaddq_m_u8, vrhaddq_m_u32, vrhaddq_m_u16,
vrmulhq_m_s8, vrmulhq_m_s32, vrmulhq_m_s16, vrmulhq_m_u8, vrmulhq_m_u32, vrmulhq_m_u16,
vrshlq_m_s8, vrshlq_m_s32, vrshlq_m_s16, vrshlq_m_u8, vrshlq_m_u32, vrshlq_m_u16, vrshrq_m_n_s8,
vrshrq_m_n_s32, vrshrq_m_n_s16, vrshrq_m_n_u8, vrshrq_m_n_u32, vrshrq_m_n_u16, vshlq_m_n_s8,
vshlq_m_n_s32, vshlq_m_n_s16, vshlq_m_n_u8, vshlq_m_n_u32, vshlq_m_n_u16, vshrq_m_n_s8,
vshrq_m_n_s32, vshrq_m_n_s16, vshrq_m_n_u8, vshrq_m_n_u32, vshrq_m_n_u16, vsliq_m_n_s8,
vsliq_m_n_s32, vsliq_m_n_s16, vsliq_m_n_u8, vsliq_m_n_u32, vsliq_m_n_u16, vsubq_m_n_s8,
vsubq_m_n_s32, vsubq_m_n_s16, vsubq_m_n_u8, vsubq_m_n_u32, vsubq_m_n_u16.


Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-10-30  Andre Vieira  <andre.simoesdiasvieira@arm.com>
            Mihail Ionescu  <mihail.ionescu@arm.com>
            Srinath Parvathaneni  <srinath.parvathaneni@arm.com>
	
	* config/arm/arm_mve.h (vabdq_m_s8): Define macro.
	(vabdq_m_s32): Likewise.
	(vabdq_m_s16): Likewise.
	(vabdq_m_u8): Likewise.
	(vabdq_m_u32): Likewise.
	(vabdq_m_u16): Likewise.
	(vaddq_m_n_s8): Likewise.
	(vaddq_m_n_s32): Likewise.
	(vaddq_m_n_s16): Likewise.
	(vaddq_m_n_u8): Likewise.
	(vaddq_m_n_u32): Likewise.
	(vaddq_m_n_u16): Likewise.
	(vaddq_m_s8): Likewise.
	(vaddq_m_s32): Likewise.
	(vaddq_m_s16): Likewise.
	(vaddq_m_u8): Likewise.
	(vaddq_m_u32): Likewise.
	(vaddq_m_u16): Likewise.
	(vandq_m_s8): Likewise.
	(vandq_m_s32): Likewise.
	(vandq_m_s16): Likewise.
	(vandq_m_u8): Likewise.
	(vandq_m_u32): Likewise.
	(vandq_m_u16): Likewise.
	(vbicq_m_s8): Likewise.
	(vbicq_m_s32): Likewise.
	(vbicq_m_s16): Likewise.
	(vbicq_m_u8): Likewise.
	(vbicq_m_u32): Likewise.
	(vbicq_m_u16): Likewise.
	(vbrsrq_m_n_s8): Likewise.
	(vbrsrq_m_n_s32): Likewise.
	(vbrsrq_m_n_s16): Likewise.
	(vbrsrq_m_n_u8): Likewise.
	(vbrsrq_m_n_u32): Likewise.
	(vbrsrq_m_n_u16): Likewise.
	(vcaddq_rot270_m_s8): Likewise.
	(vcaddq_rot270_m_s32): Likewise.
	(vcaddq_rot270_m_s16): Likewise.
	(vcaddq_rot270_m_u8): Likewise.
	(vcaddq_rot270_m_u32): Likewise.
	(vcaddq_rot270_m_u16): Likewise.
	(vcaddq_rot90_m_s8): Likewise.
	(vcaddq_rot90_m_s32): Likewise.
	(vcaddq_rot90_m_s16): Likewise.
	(vcaddq_rot90_m_u8): Likewise.
	(vcaddq_rot90_m_u32): Likewise.
	(vcaddq_rot90_m_u16): Likewise.
	(veorq_m_s8): Likewise.
	(veorq_m_s32): Likewise.
	(veorq_m_s16): Likewise.
	(veorq_m_u8): Likewise.
	(veorq_m_u32): Likewise.
	(veorq_m_u16): Likewise.
	(vhaddq_m_n_s8): Likewise.
	(vhaddq_m_n_s32): Likewise.
	(vhaddq_m_n_s16): Likewise.
	(vhaddq_m_n_u8): Likewise.
	(vhaddq_m_n_u32): Likewise.
	(vhaddq_m_n_u16): Likewise.
	(vhaddq_m_s8): Likewise.
	(vhaddq_m_s32): Likewise.
	(vhaddq_m_s16): Likewise.
	(vhaddq_m_u8): Likewise.
	(vhaddq_m_u32): Likewise.
	(vhaddq_m_u16): Likewise.
	(vhcaddq_rot270_m_s8): Likewise.
	(vhcaddq_rot270_m_s32): Likewise.
	(vhcaddq_rot270_m_s16): Likewise.
	(vhcaddq_rot90_m_s8): Likewise.
	(vhcaddq_rot90_m_s32): Likewise.
	(vhcaddq_rot90_m_s16): Likewise.
	(vhsubq_m_n_s8): Likewise.
	(vhsubq_m_n_s32): Likewise.
	(vhsubq_m_n_s16): Likewise.
	(vhsubq_m_n_u8): Likewise.
	(vhsubq_m_n_u32): Likewise.
	(vhsubq_m_n_u16): Likewise.
	(vhsubq_m_s8): Likewise.
	(vhsubq_m_s32): Likewise.
	(vhsubq_m_s16): Likewise.
	(vhsubq_m_u8): Likewise.
	(vhsubq_m_u32): Likewise.
	(vhsubq_m_u16): Likewise.
	(vmaxq_m_s8): Likewise.
	(vmaxq_m_s32): Likewise.
	(vmaxq_m_s16): Likewise.
	(vmaxq_m_u8): Likewise.
	(vmaxq_m_u32): Likewise.
	(vmaxq_m_u16): Likewise.
	(vminq_m_s8): Likewise.
	(vminq_m_s32): Likewise.
	(vminq_m_s16): Likewise.
	(vminq_m_u8): Likewise.
	(vminq_m_u32): Likewise.
	(vminq_m_u16): Likewise.
	(vmladavaq_p_s8): Likewise.
	(vmladavaq_p_s32): Likewise.
	(vmladavaq_p_s16): Likewise.
	(vmladavaq_p_u8): Likewise.
	(vmladavaq_p_u32): Likewise.
	(vmladavaq_p_u16): Likewise.
	(vmladavaxq_p_s8): Likewise.
	(vmladavaxq_p_s32): Likewise.
	(vmladavaxq_p_s16): Likewise.
	(vmlaq_m_n_s8): Likewise.
	(vmlaq_m_n_s32): Likewise.
	(vmlaq_m_n_s16): Likewise.
	(vmlaq_m_n_u8): Likewise.
	(vmlaq_m_n_u32): Likewise.
	(vmlaq_m_n_u16): Likewise.
	(vmlasq_m_n_s8): Likewise.
	(vmlasq_m_n_s32): Likewise.
	(vmlasq_m_n_s16): Likewise.
	(vmlasq_m_n_u8): Likewise.
	(vmlasq_m_n_u32): Likewise.
	(vmlasq_m_n_u16): Likewise.
	(vmlsdavaq_p_s8): Likewise.
	(vmlsdavaq_p_s32): Likewise.
	(vmlsdavaq_p_s16): Likewise.
	(vmlsdavaxq_p_s8): Likewise.
	(vmlsdavaxq_p_s32): Likewise.
	(vmlsdavaxq_p_s16): Likewise.
	(vmulhq_m_s8): Likewise.
	(vmulhq_m_s32): Likewise.
	(vmulhq_m_s16): Likewise.
	(vmulhq_m_u8): Likewise.
	(vmulhq_m_u32): Likewise.
	(vmulhq_m_u16): Likewise.
	(vmullbq_int_m_s8): Likewise.
	(vmullbq_int_m_s32): Likewise.
	(vmullbq_int_m_s16): Likewise.
	(vmullbq_int_m_u8): Likewise.
	(vmullbq_int_m_u32): Likewise.
	(vmullbq_int_m_u16): Likewise.
	(vmulltq_int_m_s8): Likewise.
	(vmulltq_int_m_s32): Likewise.
	(vmulltq_int_m_s16): Likewise.
	(vmulltq_int_m_u8): Likewise.
	(vmulltq_int_m_u32): Likewise.
	(vmulltq_int_m_u16): Likewise.
	(vmulq_m_n_s8): Likewise.
	(vmulq_m_n_s32): Likewise.
	(vmulq_m_n_s16): Likewise.
	(vmulq_m_n_u8): Likewise.
	(vmulq_m_n_u32): Likewise.
	(vmulq_m_n_u16): Likewise.
	(vmulq_m_s8): Likewise.
	(vmulq_m_s32): Likewise.
	(vmulq_m_s16): Likewise.
	(vmulq_m_u8): Likewise.
	(vmulq_m_u32): Likewise.
	(vmulq_m_u16): Likewise.
	(vornq_m_s8): Likewise.
	(vornq_m_s32): Likewise.
	(vornq_m_s16): Likewise.
	(vornq_m_u8): Likewise.
	(vornq_m_u32): Likewise.
	(vornq_m_u16): Likewise.
	(vorrq_m_s8): Likewise.
	(vorrq_m_s32): Likewise.
	(vorrq_m_s16): Likewise.
	(vorrq_m_u8): Likewise.
	(vorrq_m_u32): Likewise.
	(vorrq_m_u16): Likewise.
	(vqaddq_m_n_s8): Likewise.
	(vqaddq_m_n_s32): Likewise.
	(vqaddq_m_n_s16): Likewise.
	(vqaddq_m_n_u8): Likewise.
	(vqaddq_m_n_u32): Likewise.
	(vqaddq_m_n_u16): Likewise.
	(vqaddq_m_s8): Likewise.
	(vqaddq_m_s32): Likewise.
	(vqaddq_m_s16): Likewise.
	(vqaddq_m_u8): Likewise.
	(vqaddq_m_u32): Likewise.
	(vqaddq_m_u16): Likewise.
	(vqdmladhq_m_s8): Likewise.
	(vqdmladhq_m_s32): Likewise.
	(vqdmladhq_m_s16): Likewise.
	(vqdmladhxq_m_s8): Likewise.
	(vqdmladhxq_m_s32): Likewise.
	(vqdmladhxq_m_s16): Likewise.
	(vqdmlahq_m_n_s8): Likewise.
	(vqdmlahq_m_n_s32): Likewise.
	(vqdmlahq_m_n_s16): Likewise.
	(vqdmlahq_m_n_u8): Likewise.
	(vqdmlahq_m_n_u32): Likewise.
	(vqdmlahq_m_n_u16): Likewise.
	(vqdmlsdhq_m_s8): Likewise.
	(vqdmlsdhq_m_s32): Likewise.
	(vqdmlsdhq_m_s16): Likewise.
	(vqdmlsdhxq_m_s8): Likewise.
	(vqdmlsdhxq_m_s32): Likewise.
	(vqdmlsdhxq_m_s16): Likewise.
	(vqdmulhq_m_n_s8): Likewise.
	(vqdmulhq_m_n_s32): Likewise.
	(vqdmulhq_m_n_s16): Likewise.
	(vqdmulhq_m_s8): Likewise.
	(vqdmulhq_m_s32): Likewise.
	(vqdmulhq_m_s16): Likewise.
	(vqrdmladhq_m_s8): Likewise.
	(vqrdmladhq_m_s32): Likewise.
	(vqrdmladhq_m_s16): Likewise.
	(vqrdmladhxq_m_s8): Likewise.
	(vqrdmladhxq_m_s32): Likewise.
	(vqrdmladhxq_m_s16): Likewise.
	(vqrdmlahq_m_n_s8): Likewise.
	(vqrdmlahq_m_n_s32): Likewise.
	(vqrdmlahq_m_n_s16): Likewise.
	(vqrdmlahq_m_n_u8): Likewise.
	(vqrdmlahq_m_n_u32): Likewise.
	(vqrdmlahq_m_n_u16): Likewise.
	(vqrdmlashq_m_n_s8): Likewise.
	(vqrdmlashq_m_n_s32): Likewise.
	(vqrdmlashq_m_n_s16): Likewise.
	(vqrdmlashq_m_n_u8): Likewise.
	(vqrdmlashq_m_n_u32): Likewise.
	(vqrdmlashq_m_n_u16): Likewise.
	(vqrdmlsdhq_m_s8): Likewise.
	(vqrdmlsdhq_m_s32): Likewise.
	(vqrdmlsdhq_m_s16): Likewise.
	(vqrdmlsdhxq_m_s8): Likewise.
	(vqrdmlsdhxq_m_s32): Likewise.
	(vqrdmlsdhxq_m_s16): Likewise.
	(vqrdmulhq_m_n_s8): Likewise.
	(vqrdmulhq_m_n_s32): Likewise.
	(vqrdmulhq_m_n_s16): Likewise.
	(vqrdmulhq_m_s8): Likewise.
	(vqrdmulhq_m_s32): Likewise.
	(vqrdmulhq_m_s16): Likewise.
	(vqrshlq_m_s8): Likewise.
	(vqrshlq_m_s32): Likewise.
	(vqrshlq_m_s16): Likewise.
	(vqrshlq_m_u8): Likewise.
	(vqrshlq_m_u32): Likewise.
	(vqrshlq_m_u16): Likewise.
	(vqshlq_m_n_s8): Likewise.
	(vqshlq_m_n_s32): Likewise.
	(vqshlq_m_n_s16): Likewise.
	(vqshlq_m_n_u8): Likewise.
	(vqshlq_m_n_u32): Likewise.
	(vqshlq_m_n_u16): Likewise.
	(vqshlq_m_s8): Likewise.
	(vqshlq_m_s32): Likewise.
	(vqshlq_m_s16): Likewise.
	(vqshlq_m_u8): Likewise.
	(vqshlq_m_u32): Likewise.
	(vqshlq_m_u16): Likewise.
	(vqsubq_m_n_s8): Likewise.
	(vqsubq_m_n_s32): Likewise.
	(vqsubq_m_n_s16): Likewise.
	(vqsubq_m_n_u8): Likewise.
	(vqsubq_m_n_u32): Likewise.
	(vqsubq_m_n_u16): Likewise.
	(vqsubq_m_s8): Likewise.
	(vqsubq_m_s32): Likewise.
	(vqsubq_m_s16): Likewise.
	(vqsubq_m_u8): Likewise.
	(vqsubq_m_u32): Likewise.
	(vqsubq_m_u16): Likewise.
	(vrhaddq_m_s8): Likewise.
	(vrhaddq_m_s32): Likewise.
	(vrhaddq_m_s16): Likewise.
	(vrhaddq_m_u8): Likewise.
	(vrhaddq_m_u32): Likewise.
	(vrhaddq_m_u16): Likewise.
	(vrmulhq_m_s8): Likewise.
	(vrmulhq_m_s32): Likewise.
	(vrmulhq_m_s16): Likewise.
	(vrmulhq_m_u8): Likewise.
	(vrmulhq_m_u32): Likewise.
	(vrmulhq_m_u16): Likewise.
	(vrshlq_m_s8): Likewise.
	(vrshlq_m_s32): Likewise.
	(vrshlq_m_s16): Likewise.
	(vrshlq_m_u8): Likewise.
	(vrshlq_m_u32): Likewise.
	(vrshlq_m_u16): Likewise.
	(vrshrq_m_n_s8): Likewise.
	(vrshrq_m_n_s32): Likewise.
	(vrshrq_m_n_s16): Likewise.
	(vrshrq_m_n_u8): Likewise.
	(vrshrq_m_n_u32): Likewise.
	(vrshrq_m_n_u16): Likewise.
	(vshlq_m_n_s8): Likewise.
	(vshlq_m_n_s32): Likewise.
	(vshlq_m_n_s16): Likewise.
	(vshlq_m_n_u8): Likewise.
	(vshlq_m_n_u32): Likewise.
	(vshlq_m_n_u16): Likewise.
	(vshrq_m_n_s8): Likewise.
	(vshrq_m_n_s32): Likewise.
	(vshrq_m_n_s16): Likewise.
	(vshrq_m_n_u8): Likewise.
	(vshrq_m_n_u32): Likewise.
	(vshrq_m_n_u16): Likewise.
	(vsliq_m_n_s8): Likewise.
	(vsliq_m_n_s32): Likewise.
	(vsliq_m_n_s16): Likewise.
	(vsliq_m_n_u8): Likewise.
	(vsliq_m_n_u32): Likewise.
	(vsliq_m_n_u16): Likewise.
	(vsubq_m_n_s8): Likewise.
	(vsubq_m_n_s32): Likewise.
	(vsubq_m_n_s16): Likewise.
	(vsubq_m_n_u8): Likewise.
	(vsubq_m_n_u32): Likewise.
	(vsubq_m_n_u16): Likewise.
	(__arm_vabdq_m_s8): Define intrinsic.
	(__arm_vabdq_m_s32): Likewise.
	(__arm_vabdq_m_s16): Likewise.
	(__arm_vabdq_m_u8): Likewise.
	(__arm_vabdq_m_u32): Likewise.
	(__arm_vabdq_m_u16): Likewise.
	(__arm_vaddq_m_n_s8): Likewise.
	(__arm_vaddq_m_n_s32): Likewise.
	(__arm_vaddq_m_n_s16): Likewise.
	(__arm_vaddq_m_n_u8): Likewise.
	(__arm_vaddq_m_n_u32): Likewise.
	(__arm_vaddq_m_n_u16): Likewise.
	(__arm_vaddq_m_s8): Likewise.
	(__arm_vaddq_m_s32): Likewise.
	(__arm_vaddq_m_s16): Likewise.
	(__arm_vaddq_m_u8): Likewise.
	(__arm_vaddq_m_u32): Likewise.
	(__arm_vaddq_m_u16): Likewise.
	(__arm_vandq_m_s8): Likewise.
	(__arm_vandq_m_s32): Likewise.
	(__arm_vandq_m_s16): Likewise.
	(__arm_vandq_m_u8): Likewise.
	(__arm_vandq_m_u32): Likewise.
	(__arm_vandq_m_u16): Likewise.
	(__arm_vbicq_m_s8): Likewise.
	(__arm_vbicq_m_s32): Likewise.
	(__arm_vbicq_m_s16): Likewise.
	(__arm_vbicq_m_u8): Likewise.
	(__arm_vbicq_m_u32): Likewise.
	(__arm_vbicq_m_u16): Likewise.
	(__arm_vbrsrq_m_n_s8): Likewise.
	(__arm_vbrsrq_m_n_s32): Likewise.
	(__arm_vbrsrq_m_n_s16): Likewise.
	(__arm_vbrsrq_m_n_u8): Likewise.
	(__arm_vbrsrq_m_n_u32): Likewise.
	(__arm_vbrsrq_m_n_u16): Likewise.
	(__arm_vcaddq_rot270_m_s8): Likewise.
	(__arm_vcaddq_rot270_m_s32): Likewise.
	(__arm_vcaddq_rot270_m_s16): Likewise.
	(__arm_vcaddq_rot270_m_u8): Likewise.
	(__arm_vcaddq_rot270_m_u32): Likewise.
	(__arm_vcaddq_rot270_m_u16): Likewise.
	(__arm_vcaddq_rot90_m_s8): Likewise.
	(__arm_vcaddq_rot90_m_s32): Likewise.
	(__arm_vcaddq_rot90_m_s16): Likewise.
	(__arm_vcaddq_rot90_m_u8): Likewise.
	(__arm_vcaddq_rot90_m_u32): Likewise.
	(__arm_vcaddq_rot90_m_u16): Likewise.
	(__arm_veorq_m_s8): Likewise.
	(__arm_veorq_m_s32): Likewise.
	(__arm_veorq_m_s16): Likewise.
	(__arm_veorq_m_u8): Likewise.
	(__arm_veorq_m_u32): Likewise.
	(__arm_veorq_m_u16): Likewise.
	(__arm_vhaddq_m_n_s8): Likewise.
	(__arm_vhaddq_m_n_s32): Likewise.
	(__arm_vhaddq_m_n_s16): Likewise.
	(__arm_vhaddq_m_n_u8): Likewise.
	(__arm_vhaddq_m_n_u32): Likewise.
	(__arm_vhaddq_m_n_u16): Likewise.
	(__arm_vhaddq_m_s8): Likewise.
	(__arm_vhaddq_m_s32): Likewise.
	(__arm_vhaddq_m_s16): Likewise.
	(__arm_vhaddq_m_u8): Likewise.
	(__arm_vhaddq_m_u32): Likewise.
	(__arm_vhaddq_m_u16): Likewise.
	(__arm_vhcaddq_rot270_m_s8): Likewise.
	(__arm_vhcaddq_rot270_m_s32): Likewise.
	(__arm_vhcaddq_rot270_m_s16): Likewise.
	(__arm_vhcaddq_rot90_m_s8): Likewise.
	(__arm_vhcaddq_rot90_m_s32): Likewise.
	(__arm_vhcaddq_rot90_m_s16): Likewise.
	(__arm_vhsubq_m_n_s8): Likewise.
	(__arm_vhsubq_m_n_s32): Likewise.
	(__arm_vhsubq_m_n_s16): Likewise.
	(__arm_vhsubq_m_n_u8): Likewise.
	(__arm_vhsubq_m_n_u32): Likewise.
	(__arm_vhsubq_m_n_u16): Likewise.
	(__arm_vhsubq_m_s8): Likewise.
	(__arm_vhsubq_m_s32): Likewise.
	(__arm_vhsubq_m_s16): Likewise.
	(__arm_vhsubq_m_u8): Likewise.
	(__arm_vhsubq_m_u32): Likewise.
	(__arm_vhsubq_m_u16): Likewise.
	(__arm_vmaxq_m_s8): Likewise.
	(__arm_vmaxq_m_s32): Likewise.
	(__arm_vmaxq_m_s16): Likewise.
	(__arm_vmaxq_m_u8): Likewise.
	(__arm_vmaxq_m_u32): Likewise.
	(__arm_vmaxq_m_u16): Likewise.
	(__arm_vminq_m_s8): Likewise.
	(__arm_vminq_m_s32): Likewise.
	(__arm_vminq_m_s16): Likewise.
	(__arm_vminq_m_u8): Likewise.
	(__arm_vminq_m_u32): Likewise.
	(__arm_vminq_m_u16): Likewise.
	(__arm_vmladavaq_p_s8): Likewise.
	(__arm_vmladavaq_p_s32): Likewise.
	(__arm_vmladavaq_p_s16): Likewise.
	(__arm_vmladavaq_p_u8): Likewise.
	(__arm_vmladavaq_p_u32): Likewise.
	(__arm_vmladavaq_p_u16): Likewise.
	(__arm_vmladavaxq_p_s8): Likewise.
	(__arm_vmladavaxq_p_s32): Likewise.
	(__arm_vmladavaxq_p_s16): Likewise.
	(__arm_vmlaq_m_n_s8): Likewise.
	(__arm_vmlaq_m_n_s32): Likewise.
	(__arm_vmlaq_m_n_s16): Likewise.
	(__arm_vmlaq_m_n_u8): Likewise.
	(__arm_vmlaq_m_n_u32): Likewise.
	(__arm_vmlaq_m_n_u16): Likewise.
	(__arm_vmlasq_m_n_s8): Likewise.
	(__arm_vmlasq_m_n_s32): Likewise.
	(__arm_vmlasq_m_n_s16): Likewise.
	(__arm_vmlasq_m_n_u8): Likewise.
	(__arm_vmlasq_m_n_u32): Likewise.
	(__arm_vmlasq_m_n_u16): Likewise.
	(__arm_vmlsdavaq_p_s8): Likewise.
	(__arm_vmlsdavaq_p_s32): Likewise.
	(__arm_vmlsdavaq_p_s16): Likewise.
	(__arm_vmlsdavaxq_p_s8): Likewise.
	(__arm_vmlsdavaxq_p_s32): Likewise.
	(__arm_vmlsdavaxq_p_s16): Likewise.
	(__arm_vmulhq_m_s8): Likewise.
	(__arm_vmulhq_m_s32): Likewise.
	(__arm_vmulhq_m_s16): Likewise.
	(__arm_vmulhq_m_u8): Likewise.
	(__arm_vmulhq_m_u32): Likewise.
	(__arm_vmulhq_m_u16): Likewise.
	(__arm_vmullbq_int_m_s8): Likewise.
	(__arm_vmullbq_int_m_s32): Likewise.
	(__arm_vmullbq_int_m_s16): Likewise.
	(__arm_vmullbq_int_m_u8): Likewise.
	(__arm_vmullbq_int_m_u32): Likewise.
	(__arm_vmullbq_int_m_u16): Likewise.
	(__arm_vmulltq_int_m_s8): Likewise.
	(__arm_vmulltq_int_m_s32): Likewise.
	(__arm_vmulltq_int_m_s16): Likewise.
	(__arm_vmulltq_int_m_u8): Likewise.
	(__arm_vmulltq_int_m_u32): Likewise.
	(__arm_vmulltq_int_m_u16): Likewise.
	(__arm_vmulq_m_n_s8): Likewise.
	(__arm_vmulq_m_n_s32): Likewise.
	(__arm_vmulq_m_n_s16): Likewise.
	(__arm_vmulq_m_n_u8): Likewise.
	(__arm_vmulq_m_n_u32): Likewise.
	(__arm_vmulq_m_n_u16): Likewise.
	(__arm_vmulq_m_s8): Likewise.
	(__arm_vmulq_m_s32): Likewise.
	(__arm_vmulq_m_s16): Likewise.
	(__arm_vmulq_m_u8): Likewise.
	(__arm_vmulq_m_u32): Likewise.
	(__arm_vmulq_m_u16): Likewise.
	(__arm_vornq_m_s8): Likewise.
	(__arm_vornq_m_s32): Likewise.
	(__arm_vornq_m_s16): Likewise.
	(__arm_vornq_m_u8): Likewise.
	(__arm_vornq_m_u32): Likewise.
	(__arm_vornq_m_u16): Likewise.
	(__arm_vorrq_m_s8): Likewise.
	(__arm_vorrq_m_s32): Likewise.
	(__arm_vorrq_m_s16): Likewise.
	(__arm_vorrq_m_u8): Likewise.
	(__arm_vorrq_m_u32): Likewise.
	(__arm_vorrq_m_u16): Likewise.
	(__arm_vqaddq_m_n_s8): Likewise.
	(__arm_vqaddq_m_n_s32): Likewise.
	(__arm_vqaddq_m_n_s16): Likewise.
	(__arm_vqaddq_m_n_u8): Likewise.
	(__arm_vqaddq_m_n_u32): Likewise.
	(__arm_vqaddq_m_n_u16): Likewise.
	(__arm_vqaddq_m_s8): Likewise.
	(__arm_vqaddq_m_s32): Likewise.
	(__arm_vqaddq_m_s16): Likewise.
	(__arm_vqaddq_m_u8): Likewise.
	(__arm_vqaddq_m_u32): Likewise.
	(__arm_vqaddq_m_u16): Likewise.
	(__arm_vqdmladhq_m_s8): Likewise.
	(__arm_vqdmladhq_m_s32): Likewise.
	(__arm_vqdmladhq_m_s16): Likewise.
	(__arm_vqdmladhxq_m_s8): Likewise.
	(__arm_vqdmladhxq_m_s32): Likewise.
	(__arm_vqdmladhxq_m_s16): Likewise.
	(__arm_vqdmlahq_m_n_s8): Likewise.
	(__arm_vqdmlahq_m_n_s32): Likewise.
	(__arm_vqdmlahq_m_n_s16): Likewise.
	(__arm_vqdmlahq_m_n_u8): Likewise.
	(__arm_vqdmlahq_m_n_u32): Likewise.
	(__arm_vqdmlahq_m_n_u16): Likewise.
	(__arm_vqdmlsdhq_m_s8): Likewise.
	(__arm_vqdmlsdhq_m_s32): Likewise.
	(__arm_vqdmlsdhq_m_s16): Likewise.
	(__arm_vqdmlsdhxq_m_s8): Likewise.
	(__arm_vqdmlsdhxq_m_s32): Likewise.
	(__arm_vqdmlsdhxq_m_s16): Likewise.
	(__arm_vqdmulhq_m_n_s8): Likewise.
	(__arm_vqdmulhq_m_n_s32): Likewise.
	(__arm_vqdmulhq_m_n_s16): Likewise.
	(__arm_vqdmulhq_m_s8): Likewise.
	(__arm_vqdmulhq_m_s32): Likewise.
	(__arm_vqdmulhq_m_s16): Likewise.
	(__arm_vqrdmladhq_m_s8): Likewise.
	(__arm_vqrdmladhq_m_s32): Likewise.
	(__arm_vqrdmladhq_m_s16): Likewise.
	(__arm_vqrdmladhxq_m_s8): Likewise.
	(__arm_vqrdmladhxq_m_s32): Likewise.
	(__arm_vqrdmladhxq_m_s16): Likewise.
	(__arm_vqrdmlahq_m_n_s8): Likewise.
	(__arm_vqrdmlahq_m_n_s32): Likewise.
	(__arm_vqrdmlahq_m_n_s16): Likewise.
	(__arm_vqrdmlahq_m_n_u8): Likewise.
	(__arm_vqrdmlahq_m_n_u32): Likewise.
	(__arm_vqrdmlahq_m_n_u16): Likewise.
	(__arm_vqrdmlashq_m_n_s8): Likewise.
	(__arm_vqrdmlashq_m_n_s32): Likewise.
	(__arm_vqrdmlashq_m_n_s16): Likewise.
	(__arm_vqrdmlashq_m_n_u8): Likewise.
	(__arm_vqrdmlashq_m_n_u32): Likewise.
	(__arm_vqrdmlashq_m_n_u16): Likewise.
	(__arm_vqrdmlsdhq_m_s8): Likewise.
	(__arm_vqrdmlsdhq_m_s32): Likewise.
	(__arm_vqrdmlsdhq_m_s16): Likewise.
	(__arm_vqrdmlsdhxq_m_s8): Likewise.
	(__arm_vqrdmlsdhxq_m_s32): Likewise.
	(__arm_vqrdmlsdhxq_m_s16): Likewise.
	(__arm_vqrdmulhq_m_n_s8): Likewise.
	(__arm_vqrdmulhq_m_n_s32): Likewise.
	(__arm_vqrdmulhq_m_n_s16): Likewise.
	(__arm_vqrdmulhq_m_s8): Likewise.
	(__arm_vqrdmulhq_m_s32): Likewise.
	(__arm_vqrdmulhq_m_s16): Likewise.
	(__arm_vqrshlq_m_s8): Likewise.
	(__arm_vqrshlq_m_s32): Likewise.
	(__arm_vqrshlq_m_s16): Likewise.
	(__arm_vqrshlq_m_u8): Likewise.
	(__arm_vqrshlq_m_u32): Likewise.
	(__arm_vqrshlq_m_u16): Likewise.
	(__arm_vqshlq_m_n_s8): Likewise.
	(__arm_vqshlq_m_n_s32): Likewise.
	(__arm_vqshlq_m_n_s16): Likewise.
	(__arm_vqshlq_m_n_u8): Likewise.
	(__arm_vqshlq_m_n_u32): Likewise.
	(__arm_vqshlq_m_n_u16): Likewise.
	(__arm_vqshlq_m_s8): Likewise.
	(__arm_vqshlq_m_s32): Likewise.
	(__arm_vqshlq_m_s16): Likewise.
	(__arm_vqshlq_m_u8): Likewise.
	(__arm_vqshlq_m_u32): Likewise.
	(__arm_vqshlq_m_u16): Likewise.
	(__arm_vqsubq_m_n_s8): Likewise.
	(__arm_vqsubq_m_n_s32): Likewise.
	(__arm_vqsubq_m_n_s16): Likewise.
	(__arm_vqsubq_m_n_u8): Likewise.
	(__arm_vqsubq_m_n_u32): Likewise.
	(__arm_vqsubq_m_n_u16): Likewise.
	(__arm_vqsubq_m_s8): Likewise.
	(__arm_vqsubq_m_s32): Likewise.
	(__arm_vqsubq_m_s16): Likewise.
	(__arm_vqsubq_m_u8): Likewise.
	(__arm_vqsubq_m_u32): Likewise.
	(__arm_vqsubq_m_u16): Likewise.
	(__arm_vrhaddq_m_s8): Likewise.
	(__arm_vrhaddq_m_s32): Likewise.
	(__arm_vrhaddq_m_s16): Likewise.
	(__arm_vrhaddq_m_u8): Likewise.
	(__arm_vrhaddq_m_u32): Likewise.
	(__arm_vrhaddq_m_u16): Likewise.
	(__arm_vrmulhq_m_s8): Likewise.
	(__arm_vrmulhq_m_s32): Likewise.
	(__arm_vrmulhq_m_s16): Likewise.
	(__arm_vrmulhq_m_u8): Likewise.
	(__arm_vrmulhq_m_u32): Likewise.
	(__arm_vrmulhq_m_u16): Likewise.
	(__arm_vrshlq_m_s8): Likewise.
	(__arm_vrshlq_m_s32): Likewise.
	(__arm_vrshlq_m_s16): Likewise.
	(__arm_vrshlq_m_u8): Likewise.
	(__arm_vrshlq_m_u32): Likewise.
	(__arm_vrshlq_m_u16): Likewise.
	(__arm_vrshrq_m_n_s8): Likewise.
	(__arm_vrshrq_m_n_s32): Likewise.
	(__arm_vrshrq_m_n_s16): Likewise.
	(__arm_vrshrq_m_n_u8): Likewise.
	(__arm_vrshrq_m_n_u32): Likewise.
	(__arm_vrshrq_m_n_u16): Likewise.
	(__arm_vshlq_m_n_s8): Likewise.
	(__arm_vshlq_m_n_s32): Likewise.
	(__arm_vshlq_m_n_s16): Likewise.
	(__arm_vshlq_m_n_u8): Likewise.
	(__arm_vshlq_m_n_u32): Likewise.
	(__arm_vshlq_m_n_u16): Likewise.
	(__arm_vshrq_m_n_s8): Likewise.
	(__arm_vshrq_m_n_s32): Likewise.
	(__arm_vshrq_m_n_s16): Likewise.
	(__arm_vshrq_m_n_u8): Likewise.
	(__arm_vshrq_m_n_u32): Likewise.
	(__arm_vshrq_m_n_u16): Likewise.
	(__arm_vsliq_m_n_s8): Likewise.
	(__arm_vsliq_m_n_s32): Likewise.
	(__arm_vsliq_m_n_s16): Likewise.
	(__arm_vsliq_m_n_u8): Likewise.
	(__arm_vsliq_m_n_u32): Likewise.
	(__arm_vsliq_m_n_u16): Likewise.
	(__arm_vsubq_m_n_s8): Likewise.
	(__arm_vsubq_m_n_s32): Likewise.
	(__arm_vsubq_m_n_s16): Likewise.
	(__arm_vsubq_m_n_u8): Likewise.
	(__arm_vsubq_m_n_u32): Likewise.
	(__arm_vsubq_m_n_u16): Likewise.
	(vqdmladhq_m): Define polymorphic variant.
	(vqdmladhxq_m): Likewise.
	(vqdmlsdhq_m): Likewise.
	(vqdmlsdhxq_m): Likewise.
	(vabdq_m): Likewise.
	(vandq_m): Likewise.
	(vbicq_m): Likewise.
	(vbrsrq_m_n): Likewise.
	(vcaddq_rot270_m): Likewise.
	(vcaddq_rot90_m): Likewise.
	(veorq_m): Likewise.
	(vmaxq_m): Likewise.
	(vminq_m): Likewise.
	(vmladavaq_p): Likewise.
	(vmlaq_m_n): Likewise.
	(vmlasq_m_n): Likewise.
	(vmulhq_m): Likewise.
	(vmullbq_int_m): Likewise.
	(vmulltq_int_m): Likewise.
	(vornq_m): Likewise.
	(vorrq_m): Likewise.
	(vqdmlahq_m_n): Likewise.
	(vqrdmlahq_m_n): Likewise.
	(vqrdmlashq_m_n): Likewise.
	(vqrshlq_m): Likewise.
	(vqshlq_m_n): Likewise.
	(vqshlq_m): Likewise.
	(vrhaddq_m): Likewise.
	(vrmulhq_m): Likewise.
	(vrshlq_m): Likewise.
	(vrshrq_m_n): Likewise.
	(vshlq_m_n): Likewise.
	(vshrq_m_n): Likewise.
	(vsliq_m): Likewise.
	(vaddq_m_n): Likewise.
	(vaddq_m): Likewise.
	(vhaddq_m_n): Likewise.
	(vhaddq_m): Likewise.
	(vhcaddq_rot270_m): Likewise.
	(vhcaddq_rot90_m): Likewise.
	(vhsubq_m): Likewise.
	(vhsubq_m_n): Likewise.
	(vmulq_m_n): Likewise.
	(vmulq_m): Likewise.
	(vqaddq_m_n): Likewise.
	(vqaddq_m): Likewise.
	(vqdmulhq_m_n): Likewise.
	(vqdmulhq_m): Likewise.
	(vsubq_m_n): Likewise.
	(vsliq_m_n): Likewise.
	(vqsubq_m_n): Likewise.
	(vqsubq_m): Likewise.
	(vqrdmulhq_m): Likewise.
	(vqrdmulhq_m_n): Likewise.
	(vqrdmlsdhxq_m): Likewise.
	(vqrdmlsdhq_m): Likewise.
	(vqrdmladhq_m): Likewise.
	(vqrdmladhxq_m): Likewise.
	(vmlsdavaxq_p): Likewise.
	(vmlsdavaq_p): Likewise.
	(vmladavaxq_p): Likewise.
	* config/arm/arm_mve_builtins.def (QUADOP_NONE_NONE_NONE_IMM_UNONE): Use
	builtin qualifier.
	(QUADOP_NONE_NONE_NONE_NONE_UNONE): Likewise.
	(QUADOP_UNONE_UNONE_UNONE_IMM_UNONE): Likewise.
	(QUADOP_UNONE_UNONE_UNONE_NONE_UNONE): Likewise.
	(QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE): Likewise.
	* config/arm/mve.md (VHSUBQ_M): Define iterators.
	(VSLIQ_M_N): Likewise.
	(VQRDMLAHQ_M_N): Likewise.
	(VRSHLQ_M): Likewise.
	(VMINQ_M): Likewise.
	(VMULLBQ_INT_M): Likewise.
	(VMULHQ_M): Likewise.
	(VMULQ_M): Likewise.
	(VHSUBQ_M_N): Likewise.
	(VHADDQ_M_N): Likewise.
	(VORRQ_M): Likewise.
	(VRMULHQ_M): Likewise.
	(VQADDQ_M): Likewise.
	(VRSHRQ_M_N): Likewise.
	(VQSUBQ_M_N): Likewise.
	(VADDQ_M): Likewise.
	(VORNQ_M): Likewise.
	(VQDMLAHQ_M_N): Likewise.
	(VRHADDQ_M): Likewise.
	(VQSHLQ_M): Likewise.
	(VANDQ_M): Likewise.
	(VBICQ_M): Likewise.
	(VSHLQ_M_N): Likewise.
	(VCADDQ_ROT270_M): Likewise.
	(VQRSHLQ_M): Likewise.
	(VQADDQ_M_N): Likewise.
	(VADDQ_M_N): Likewise.
	(VMAXQ_M): Likewise.
	(VQSUBQ_M): Likewise.
	(VMLASQ_M_N): Likewise.
	(VMLADAVAQ_P): Likewise.
	(VBRSRQ_M_N): Likewise.
	(VMULQ_M_N): Likewise.
	(VCADDQ_ROT90_M): Likewise.
	(VMULLTQ_INT_M): Likewise.
	(VEORQ_M): Likewise.
	(VSHRQ_M_N): Likewise.
	(VSUBQ_M_N): Likewise.
	(VHADDQ_M): Likewise.
	(VABDQ_M): Likewise.
	(VQRDMLASHQ_M_N): Likewise.
	(VMLAQ_M_N): Likewise.
	(VQSHLQ_M_N): Likewise.
	(mve_vabdq_m_<supf><mode>): Define RTL pattern.
	(mve_vaddq_m_n_<supf><mode>): Likewise.
	(mve_vaddq_m_<supf><mode>): Likewise.
	(mve_vandq_m_<supf><mode>): Likewise.
	(mve_vbicq_m_<supf><mode>): Likewise.
	(mve_vbrsrq_m_n_<supf><mode>): Likewise.
	(mve_vcaddq_rot270_m_<supf><mode>): Likewise.
	(mve_vcaddq_rot90_m_<supf><mode>): Likewise.
	(mve_veorq_m_<supf><mode>): Likewise.
	(mve_vhaddq_m_n_<supf><mode>): Likewise.
	(mve_vhaddq_m_<supf><mode>): Likewise.
	(mve_vhsubq_m_n_<supf><mode>): Likewise.
	(mve_vhsubq_m_<supf><mode>): Likewise.
	(mve_vmaxq_m_<supf><mode>): Likewise.
	(mve_vminq_m_<supf><mode>): Likewise.
	(mve_vmladavaq_p_<supf><mode>): Likewise.
	(mve_vmlaq_m_n_<supf><mode>): Likewise.
	(mve_vmlasq_m_n_<supf><mode>): Likewise.
	(mve_vmulhq_m_<supf><mode>): Likewise.
	(mve_vmullbq_int_m_<supf><mode>): Likewise.
	(mve_vmulltq_int_m_<supf><mode>): Likewise.
	(mve_vmulq_m_n_<supf><mode>): Likewise.
	(mve_vmulq_m_<supf><mode>): Likewise.
	(mve_vornq_m_<supf><mode>): Likewise.
	(mve_vorrq_m_<supf><mode>): Likewise.
	(mve_vqaddq_m_n_<supf><mode>): Likewise.
	(mve_vqaddq_m_<supf><mode>): Likewise.
	(mve_vqdmlahq_m_n_<supf><mode>): Likewise.
	(mve_vqrdmlahq_m_n_<supf><mode>): Likewise.
	(mve_vqrdmlashq_m_n_<supf><mode>): Likewise.
	(mve_vqrshlq_m_<supf><mode>): Likewise.
	(mve_vqshlq_m_n_<supf><mode>): Likewise.
	(mve_vqshlq_m_<supf><mode>): Likewise.
	(mve_vqsubq_m_n_<supf><mode>): Likewise.
	(mve_vqsubq_m_<supf><mode>): Likewise.
	(mve_vrhaddq_m_<supf><mode>): Likewise.
	(mve_vrmulhq_m_<supf><mode>): Likewise.
	(mve_vrshlq_m_<supf><mode>): Likewise.
	(mve_vrshrq_m_n_<supf><mode>): Likewise.
	(mve_vshlq_m_n_<supf><mode>): Likewise.
	(mve_vshrq_m_n_<supf><mode>): Likewise.
	(mve_vsliq_m_n_<supf><mode>): Likewise.
	(mve_vsubq_m_n_<supf><mode>): Likewise.
	(mve_vhcaddq_rot270_m_s<mode>): Likewise.
	(mve_vhcaddq_rot90_m_s<mode>): Likewise.
	(mve_vmladavaxq_p_s<mode>): Likewise.
	(mve_vmlsdavaq_p_s<mode>): Likewise.
	(mve_vmlsdavaxq_p_s<mode>): Likewise.
	(mve_vqdmladhq_m_s<mode>): Likewise.
	(mve_vqdmladhxq_m_s<mode>): Likewise.
	(mve_vqdmlsdhq_m_s<mode>): Likewise.
	(mve_vqdmlsdhxq_m_s<mode>): Likewise.
	(mve_vqdmulhq_m_n_s<mode>): Likewise.
	(mve_vqdmulhq_m_s<mode>): Likewise.
	(mve_vqrdmladhq_m_s<mode>): Likewise.
	(mve_vqrdmladhxq_m_s<mode>): Likewise.
	(mve_vqrdmlsdhq_m_s<mode>): Likewise.
	(mve_vqrdmlsdhxq_m_s<mode>): Likewise.
	(mve_vqrdmulhq_m_n_s<mode>): Likewise.
	(mve_vqrdmulhq_m_s<mode>): Likewise.

gcc/testsuite/ChangeLog:

2019-10-30  Andre Vieira  <andre.simoesdiasvieira@arm.com>
            Mihail Ionescu  <mihail.ionescu@arm.com>
            Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vabdq_m_s16.c: New test.
	* gcc.target/arm/mve/intrinsics/vabdq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vandq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vandq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vandq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vandq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vandq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vandq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbrsrq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbrsrq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbrsrq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbrsrq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbrsrq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbrsrq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot270_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot270_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot270_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot270_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot270_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot270_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot90_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot90_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot90_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot90_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot90_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot90_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/veorq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/veorq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/veorq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/veorq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/veorq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/veorq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhcaddq_rot270_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhcaddq_rot270_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhcaddq_rot270_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhcaddq_rot90_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhcaddq_rot90_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhcaddq_rot90_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavaq_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavaq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavaq_p_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavaq_p_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavaq_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavaq_p_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavaxq_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavaxq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavaxq_p_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlasq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlasq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlasq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlasq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlasq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlasq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsdavaq_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsdavaq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsdavaq_p_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsdavaxq_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsdavaxq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsdavaxq_p_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulhq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulhq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulhq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulhq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulhq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulhq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmullbq_int_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmullbq_int_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmullbq_int_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmullbq_int_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmullbq_int_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmullbq_int_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulltq_int_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulltq_int_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulltq_int_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulltq_int_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulltq_int_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulltq_int_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vornq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vornq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vornq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vornq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vornq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vornq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqaddq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmladhq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmladhq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmladhq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmladhxq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmladhxq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmladhxq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlsdhq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlsdhq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlsdhq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlsdhxq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlsdhxq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlsdhxq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulhq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulhq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmulhq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmladhq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmladhq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmladhq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmladhxq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmladhxq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmladhxq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlsdhq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlsdhq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlsdhq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlsdhxq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlsdhxq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlsdhxq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmulhq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmulhq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmulhq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmulhq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmulhq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmulhq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshlq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshlq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshlq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshlq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshlq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshlq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqsubq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrhaddq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrhaddq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrhaddq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrhaddq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrhaddq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrhaddq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmulhq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmulhq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmulhq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmulhq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmulhq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmulhq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsliq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsliq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsliq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsliq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsliq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsliq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_n_u8.c: Likewise.

[-- Attachment #2: diff17.patch.gz --]
[-- Type: application/gzip, Size: 30144 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][8/5x]: Remaining MVE store intrinsics which stores an half word, word and double word to memory.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (15 preceding siblings ...)
  2019-11-14 19:15 ` [PATCH][ARM][GCC][7/5x]: MVE store intrinsics which stores byte,half word or word to memory Srinath Parvathaneni
@ 2019-11-14 19:15 ` Srinath Parvathaneni
  2019-11-14 19:15 ` [PATCH][ARM][GCC][1/5x]: MVE store intrinsics Srinath Parvathaneni
                   ` (21 subsequent siblings)
  38 siblings, 0 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 9394 bytes --]

Hello,

This patch supports the following MVE ACLE store intrinsics which stores an halfword,
word or double word to memory.

vstrdq_scatter_base_p_s64, vstrdq_scatter_base_p_u64, vstrdq_scatter_base_s64,
vstrdq_scatter_base_u64, vstrdq_scatter_offset_p_s64, vstrdq_scatter_offset_p_u64,
vstrdq_scatter_offset_s64, vstrdq_scatter_offset_u64, vstrdq_scatter_shifted_offset_p_s64,
vstrdq_scatter_shifted_offset_p_u64, vstrdq_scatter_shifted_offset_s64,
vstrdq_scatter_shifted_offset_u64, vstrhq_scatter_offset_f16, vstrhq_scatter_offset_p_f16,
vstrhq_scatter_shifted_offset_f16, vstrhq_scatter_shifted_offset_p_f16,
vstrwq_scatter_base_f32, vstrwq_scatter_base_p_f32, vstrwq_scatter_offset_f32,
vstrwq_scatter_offset_p_f32, vstrwq_scatter_offset_p_s32, vstrwq_scatter_offset_p_u32,
vstrwq_scatter_offset_s32, vstrwq_scatter_offset_u32, vstrwq_scatter_shifted_offset_f32,
vstrwq_scatter_shifted_offset_p_f32, vstrwq_scatter_shifted_offset_p_s32,
vstrwq_scatter_shifted_offset_p_u32, vstrwq_scatter_shifted_offset_s32,
vstrwq_scatter_shifted_offset_u32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1]  https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

In this patch a new predicate "Ri" is defined to check the immediate is in the
range of +/-1016 and multiple of 8.

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-05  Andre Vieira  <andre.simoesdiasvieira@arm.com>
            Mihail Ionescu  <mihail.ionescu@arm.com>
            Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm_mve.h (vstrdq_scatter_base_p_s64): Define macro.
	(vstrdq_scatter_base_p_u64): Likewise.
	(vstrdq_scatter_base_s64): Likewise.
	(vstrdq_scatter_base_u64): Likewise.
	(vstrdq_scatter_offset_p_s64): Likewise.
	(vstrdq_scatter_offset_p_u64): Likewise.
	(vstrdq_scatter_offset_s64): Likewise.
	(vstrdq_scatter_offset_u64): Likewise.
	(vstrdq_scatter_shifted_offset_p_s64): Likewise.
	(vstrdq_scatter_shifted_offset_p_u64): Likewise.
	(vstrdq_scatter_shifted_offset_s64): Likewise.
	(vstrdq_scatter_shifted_offset_u64): Likewise.
	(vstrhq_scatter_offset_f16): Likewise.
	(vstrhq_scatter_offset_p_f16): Likewise.
	(vstrhq_scatter_shifted_offset_f16): Likewise.
	(vstrhq_scatter_shifted_offset_p_f16): Likewise.
	(vstrwq_scatter_base_f32): Likewise.
	(vstrwq_scatter_base_p_f32): Likewise.
	(vstrwq_scatter_offset_f32): Likewise.
	(vstrwq_scatter_offset_p_f32): Likewise.
	(vstrwq_scatter_offset_p_s32): Likewise.
	(vstrwq_scatter_offset_p_u32): Likewise.
	(vstrwq_scatter_offset_s32): Likewise.
	(vstrwq_scatter_offset_u32): Likewise.
	(vstrwq_scatter_shifted_offset_f32): Likewise.
	(vstrwq_scatter_shifted_offset_p_f32): Likewise.
	(vstrwq_scatter_shifted_offset_p_s32): Likewise.
	(vstrwq_scatter_shifted_offset_p_u32): Likewise.
	(vstrwq_scatter_shifted_offset_s32): Likewise.
	(vstrwq_scatter_shifted_offset_u32): Likewise.
	(__arm_vstrdq_scatter_base_p_s64): Define intrinsic.
	(__arm_vstrdq_scatter_base_p_u64): Likewise.
	(__arm_vstrdq_scatter_base_s64): Likewise.
	(__arm_vstrdq_scatter_base_u64): Likewise.
	(__arm_vstrdq_scatter_offset_p_s64): Likewise.
	(__arm_vstrdq_scatter_offset_p_u64): Likewise.
	(__arm_vstrdq_scatter_offset_s64): Likewise.
	(__arm_vstrdq_scatter_offset_u64): Likewise.
	(__arm_vstrdq_scatter_shifted_offset_p_s64): Likewise.
	(__arm_vstrdq_scatter_shifted_offset_p_u64): Likewise.
	(__arm_vstrdq_scatter_shifted_offset_s64): Likewise.
	(__arm_vstrdq_scatter_shifted_offset_u64): Likewise.
	(__arm_vstrwq_scatter_offset_p_s32): Likewise.
	(__arm_vstrwq_scatter_offset_p_u32): Likewise.
	(__arm_vstrwq_scatter_offset_s32): Likewise.
	(__arm_vstrwq_scatter_offset_u32): Likewise.
	(__arm_vstrwq_scatter_shifted_offset_p_s32): Likewise.
	(__arm_vstrwq_scatter_shifted_offset_p_u32): Likewise.
	(__arm_vstrwq_scatter_shifted_offset_s32): Likewise.
	(__arm_vstrwq_scatter_shifted_offset_u32): Likewise.
	(__arm_vstrhq_scatter_offset_f16): Likewise.
	(__arm_vstrhq_scatter_offset_p_f16): Likewise.
	(__arm_vstrhq_scatter_shifted_offset_f16): Likewise.
	(__arm_vstrhq_scatter_shifted_offset_p_f16): Likewise.
	(__arm_vstrwq_scatter_base_f32): Likewise.
	(__arm_vstrwq_scatter_base_p_f32): Likewise.
	(__arm_vstrwq_scatter_offset_f32): Likewise.
	(__arm_vstrwq_scatter_offset_p_f32): Likewise.
	(__arm_vstrwq_scatter_shifted_offset_f32): Likewise.
	(__arm_vstrwq_scatter_shifted_offset_p_f32): Likewise.
	(vstrhq_scatter_offset): Define polymorphic variant.
	(vstrhq_scatter_offset_p): Likewise.
	(vstrhq_scatter_shifted_offset): Likewise.
	(vstrhq_scatter_shifted_offset_p): Likewise.
	(vstrwq_scatter_base): Likewise.
	(vstrwq_scatter_base_p): Likewise.
	(vstrwq_scatter_offset): Likewise.
	(vstrwq_scatter_offset_p): Likewise.
	(vstrwq_scatter_shifted_offset): Likewise.
	(vstrwq_scatter_shifted_offset_p): Likewise.
	(vstrdq_scatter_base_p): Likewise.
	(vstrdq_scatter_base): Likewise.
	(vstrdq_scatter_offset_p): Likewise.
	(vstrdq_scatter_offset): Likewise.
	(vstrdq_scatter_shifted_offset_p): Likewise.
	(vstrdq_scatter_shifted_offset): Likewise.
	* config/arm/arm_mve_builtins.def (STRSBS): Use builtin qualifier.
	(STRSBS_P): Likewise.
	(STRSBU): Likewise.
	(STRSBU_P): Likewise.
	(STRSS): Likewise.
	(STRSS_P): Likewise.
	(STRSU): Likewise.
	(STRSU_P): Likewise.
	* config/arm/constraints.md (Ri): Define.
	* config/arm/mve.md (VSTRDSBQ): Define iterator.
	(VSTRDSOQ): Likewise.
	(VSTRDSSOQ): Likewise.
	(VSTRWSOQ): Likewise.
	(VSTRWSSOQ): Likewise.
	(mve_vstrdq_scatter_base_p_<supf>v2di): Define RTL pattern.
	(mve_vstrdq_scatter_base_<supf>v2di): Likewise.
	(mve_vstrdq_scatter_offset_p_<supf>v2di): Likewise.
	(mve_vstrdq_scatter_offset_<supf>v2di): Likewise.
	(mve_vstrdq_scatter_shifted_offset_p_<supf>v2di): Likewise.
	(mve_vstrdq_scatter_shifted_offset_<supf>v2di): Likewise.
	(mve_vstrhq_scatter_offset_fv8hf): Likewise.
	(mve_vstrhq_scatter_offset_p_fv8hf): Likewise.
	(mve_vstrhq_scatter_shifted_offset_fv8hf): Likewise.
	(mve_vstrhq_scatter_shifted_offset_p_fv8hf): Likewise.
	(mve_vstrwq_scatter_base_fv4sf): Likewise.
	(mve_vstrwq_scatter_base_p_fv4sf): Likewise.
	(mve_vstrwq_scatter_offset_fv4sf): Likewise.
	(mve_vstrwq_scatter_offset_p_fv4sf): Likewise.
	(mve_vstrwq_scatter_offset_p_<supf>v4si): Likewise.
	(mve_vstrwq_scatter_offset_<supf>v4si): Likewise.
	(mve_vstrwq_scatter_shifted_offset_fv4sf): Likewise.
	(mve_vstrwq_scatter_shifted_offset_p_fv4sf): Likewise.
	(mve_vstrwq_scatter_shifted_offset_p_<supf>v4si): Likewise.
	(mve_vstrwq_scatter_shifted_offset_<supf>v4si): Likewise.
	* config/arm/predicates.md (Ri): Define predicate to check immediate
        is the range +/-1016 and multiple of 8.

gcc/testsuite/ChangeLog:

2019-11-05  Andre Vieira  <andre.simoesdiasvieira@arm.com>
            Mihail Ionescu  <mihail.ionescu@arm.com>
            Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_p_s64.c: New test.
	* gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_p_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_s64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrdq_scatter_offset_p_s64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrdq_scatter_offset_p_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrdq_scatter_offset_s64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrdq_scatter_offset_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrdq_scatter_shifted_offset_p_s64.c:
	Likewise.
	* gcc.target/arm/mve/intrinsics/vstrdq_scatter_shifted_offset_p_u64.c:
	Likewise.
	* gcc.target/arm/mve/intrinsics/vstrdq_scatter_shifted_offset_s64.c:
	Likewise.
	* gcc.target/arm/mve/intrinsics/vstrdq_scatter_shifted_offset_u64.c:
	Likewise.
	* gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrhq_scatter_offset_p_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_f16.c:
	Likewise.
	* gcc.target/arm/mve/intrinsics/vstrhq_scatter_shifted_offset_p_f16.c:
	Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_p_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_f32.c:
	Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_p_f32.c:
	Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_p_s32.c:
	Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_p_u32.c:
	Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_s32.c:
	Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_u32.c:
	Likewise.

[-- Attachment #2: diff27.patch.gz --]
[-- Type: application/gzip, Size: 7127 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][1/8x]: MVE ACLE vidup, vddup, viwdup and vdwdup intrinsics with writeback.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (30 preceding siblings ...)
  2019-11-14 19:16 ` [PATCH][ARM][GCC][4/1x]: MVE intrinsics with unary operand Srinath Parvathaneni
@ 2019-11-14 19:16 ` Srinath Parvathaneni
  2019-11-14 19:16 ` [PATCH][ARM][GCC][10x]: MVE ACLE intrinsics "add with carry across beats" and "beat-wise substract" Srinath Parvathaneni
                   ` (6 subsequent siblings)
  38 siblings, 0 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 9137 bytes --]

Hello,

This patch supports following MVE ACLE intrinsics with writeback.

vddupq_m_n_u8, vddupq_m_n_u32, vddupq_m_n_u16, vddupq_m_wb_u8, vddupq_m_wb_u16,
vddupq_m_wb_u32, vddupq_n_u8, vddupq_n_u32, vddupq_n_u16, vddupq_wb_u8,
vddupq_wb_u16, vddupq_wb_u32, vdwdupq_m_n_u8, vdwdupq_m_n_u32, vdwdupq_m_n_u16,
vdwdupq_m_wb_u8, vdwdupq_m_wb_u32, vdwdupq_m_wb_u16, vdwdupq_n_u8, vdwdupq_n_u32,
vdwdupq_n_u16, vdwdupq_wb_u8, vdwdupq_wb_u32, vdwdupq_wb_u16, vidupq_m_n_u8,
vidupq_m_n_u32, vidupq_m_n_u16, vidupq_m_wb_u8, vidupq_m_wb_u16, vidupq_m_wb_u32,
vidupq_n_u8, vidupq_n_u32, vidupq_n_u16, vidupq_wb_u8, vidupq_wb_u16,
vidupq_wb_u32, viwdupq_m_n_u8, viwdupq_m_n_u32, viwdupq_m_n_u16, viwdupq_m_wb_u8,
viwdupq_m_wb_u32, viwdupq_m_wb_u16, viwdupq_n_u8, viwdupq_n_u32, viwdupq_n_u16,
viwdupq_wb_u8, viwdupq_wb_u32, viwdupq_wb_u16.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-07  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm-builtins.c
	(QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Define quinary
	builtin qualifier.
	* config/arm/arm_mve.h (vddupq_m_n_u8): Define macro.
	(vddupq_m_n_u32): Likewise.
	(vddupq_m_n_u16): Likewise.
	(vddupq_m_wb_u8): Likewise.
	(vddupq_m_wb_u16): Likewise.
	(vddupq_m_wb_u32): Likewise.
	(vddupq_n_u8): Likewise.
	(vddupq_n_u32): Likewise.
	(vddupq_n_u16): Likewise.
	(vddupq_wb_u8): Likewise.
	(vddupq_wb_u16): Likewise.
	(vddupq_wb_u32): Likewise.
	(vdwdupq_m_n_u8): Likewise.
	(vdwdupq_m_n_u32): Likewise.
	(vdwdupq_m_n_u16): Likewise.
	(vdwdupq_m_wb_u8): Likewise.
	(vdwdupq_m_wb_u32): Likewise.
	(vdwdupq_m_wb_u16): Likewise.
	(vdwdupq_n_u8): Likewise.
	(vdwdupq_n_u32): Likewise.
	(vdwdupq_n_u16): Likewise.
	(vdwdupq_wb_u8): Likewise.
	(vdwdupq_wb_u32): Likewise.
	(vdwdupq_wb_u16): Likewise.
	(vidupq_m_n_u8): Likewise.
	(vidupq_m_n_u32): Likewise.
	(vidupq_m_n_u16): Likewise.
	(vidupq_m_wb_u8): Likewise.
	(vidupq_m_wb_u16): Likewise.
	(vidupq_m_wb_u32): Likewise.
	(vidupq_n_u8): Likewise.
	(vidupq_n_u32): Likewise.
	(vidupq_n_u16): Likewise.
	(vidupq_wb_u8): Likewise.
	(vidupq_wb_u16): Likewise.
	(vidupq_wb_u32): Likewise.
	(viwdupq_m_n_u8): Likewise.
	(viwdupq_m_n_u32): Likewise.
	(viwdupq_m_n_u16): Likewise.
	(viwdupq_m_wb_u8): Likewise.
	(viwdupq_m_wb_u32): Likewise.
	(viwdupq_m_wb_u16): Likewise.
	(viwdupq_n_u8): Likewise.
	(viwdupq_n_u32): Likewise.
	(viwdupq_n_u16): Likewise.
	(viwdupq_wb_u8): Likewise.
	(viwdupq_wb_u32): Likewise.
	(viwdupq_wb_u16): Likewise.
	(__arm_vddupq_m_n_u8): Define intrinsic.
	(__arm_vddupq_m_n_u32): Likewise.
	(__arm_vddupq_m_n_u16): Likewise.
	(__arm_vddupq_m_wb_u8): Likewise.
	(__arm_vddupq_m_wb_u16): Likewise.
	(__arm_vddupq_m_wb_u32): Likewise.
	(__arm_vddupq_n_u8): Likewise.
	(__arm_vddupq_n_u32): Likewise.
	(__arm_vddupq_n_u16): Likewise.
	(__arm_vdwdupq_m_n_u8): Likewise.
	(__arm_vdwdupq_m_n_u32): Likewise.
	(__arm_vdwdupq_m_n_u16): Likewise.
	(__arm_vdwdupq_m_wb_u8): Likewise.
	(__arm_vdwdupq_m_wb_u32): Likewise.
	(__arm_vdwdupq_m_wb_u16): Likewise.
	(__arm_vdwdupq_n_u8): Likewise.
	(__arm_vdwdupq_n_u32): Likewise.
	(__arm_vdwdupq_n_u16): Likewise.
	(__arm_vdwdupq_wb_u8): Likewise.
	(__arm_vdwdupq_wb_u32): Likewise.
	(__arm_vdwdupq_wb_u16): Likewise.
	(__arm_vidupq_m_n_u8): Likewise.
	(__arm_vidupq_m_n_u32): Likewise.
	(__arm_vidupq_m_n_u16): Likewise.
	(__arm_vidupq_n_u8): Likewise.
	(__arm_vidupq_m_wb_u8): Likewise.
	(__arm_vidupq_m_wb_u16): Likewise.
	(__arm_vidupq_m_wb_u32): Likewise.
	(__arm_vidupq_n_u32): Likewise.
	(__arm_vidupq_n_u16): Likewise.
	(__arm_vidupq_wb_u8): Likewise.
	(__arm_vidupq_wb_u16): Likewise.
	(__arm_vidupq_wb_u32): Likewise.
	(__arm_vddupq_wb_u8): Likewise.
	(__arm_vddupq_wb_u16): Likewise.
	(__arm_vddupq_wb_u32): Likewise.
	(__arm_viwdupq_m_n_u8): Likewise.
	(__arm_viwdupq_m_n_u32): Likewise.
	(__arm_viwdupq_m_n_u16): Likewise.
	(__arm_viwdupq_m_wb_u8): Likewise.
	(__arm_viwdupq_m_wb_u32): Likewise.
	(__arm_viwdupq_m_wb_u16): Likewise.
	(__arm_viwdupq_n_u8): Likewise.
	(__arm_viwdupq_n_u32): Likewise.
	(__arm_viwdupq_n_u16): Likewise.
	(__arm_viwdupq_wb_u8): Likewise.
	(__arm_viwdupq_wb_u32): Likewise.
	(__arm_viwdupq_wb_u16): Likewise.
	(vidupq_m): Define polymorphic variant.
	(vddupq_m): Likewise.
	(vidupq_u16): Likewise.
	(vidupq_u32): Likewise.
	(vidupq_u8): Likewise.
	(vddupq_u16): Likewise.
	(vddupq_u32): Likewise.
	(vddupq_u8): Likewise.
	(viwdupq_m): Likewise.
	(viwdupq_u16): Likewise.
	(viwdupq_u32): Likewise.
	(viwdupq_u8): Likewise.
	(vdwdupq_m): Likewise.
	(vdwdupq_u16): Likewise.
	(vdwdupq_u32): Likewise.
	(vdwdupq_u8): Likewise.
	* config/arm/arm_mve_builtins.def
	(QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Use builtin
	qualifier.
	* config/arm/mve.md (mve_vidupq_n_u<mode>): Define RTL pattern.
	(mve_vidupq_u<mode>_insn): Likewise.
	(mve_vidupq_m_n_u<mode>): Likewise.
	(mve_vidupq_m_wb_u<mode>_insn): Likewise.
	(mve_vddupq_n_u<mode>): Likewise.
	(mve_vddupq_u<mode>_insn): Likewise.
	(mve_vddupq_m_n_u<mode>): Likewise.
	(mve_vddupq_m_wb_u<mode>_insn): Likewise.
	(mve_vdwdupq_n_u<mode>): Likewise.
	(mve_vdwdupq_wb_u<mode>): Likewise.
	(mve_vdwdupq_wb_u<mode>_insn): Likewise.
	(mve_vdwdupq_m_n_u<mode>): Likewise.
	(mve_vdwdupq_m_wb_u<mode>): Likewise.
	(mve_vdwdupq_m_wb_u<mode>_insn): Likewise.
	(mve_viwdupq_n_u<mode>): Likewise.
	(mve_viwdupq_wb_u<mode>): Likewise.
	(mve_viwdupq_wb_u<mode>_insn): Likewise.
	(mve_viwdupq_m_n_u<mode>): Likewise.
	(mve_viwdupq_m_wb_u<mode>): Likewise.
	(mve_viwdupq_m_wb_u<mode>_insn): Likewise.

gcc/testsuite/ChangeLog:

2019-11-07  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vddupq_m_n_u16.c: New test.
	* gcc.target/arm/mve/intrinsics/vddupq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_m_wb_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_m_wb_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_m_wb_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_wb_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_wb_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_wb_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_wb_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_wb_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_wb_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_m_wb_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_m_wb_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_m_wb_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_wb_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_wb_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_wb_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_wb_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_wb_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_wb_u8.c: Likewise.

[-- Attachment #2: diff30.patch.gz --]
[-- Type: application/gzip, Size: 7362 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][4/1x]: MVE intrinsics with unary operand.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (29 preceding siblings ...)
  2019-11-14 19:16 ` [PATCH][ARM][GCC][9x]: MVE ACLE predicated intrinsics with (dont-care) variant Srinath Parvathaneni
@ 2019-11-14 19:16 ` Srinath Parvathaneni
  2019-12-19 18:16   ` Kyrill Tkachov
  2019-11-14 19:16 ` [PATCH][ARM][GCC][1/8x]: MVE ACLE vidup, vddup, viwdup and vdwdup intrinsics with writeback Srinath Parvathaneni
                   ` (7 subsequent siblings)
  38 siblings, 1 reply; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 13504 bytes --]

Hello,

This patch supports following MVE ACLE intrinsics with unary operand.

vctp16q, vctp32q, vctp64q, vctp8q, vpnot.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

There are few conflicts in defining the machine registers, resolved by re-ordering
VPR_REGNUM, APSRQ_REGNUM and APSRGE_REGNUM.

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-12  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm-builtins.c (hi_UP): Define mode.
	* config/arm/arm.h (IS_VPR_REGNUM): Move.
	* config/arm/arm.md (VPR_REGNUM): Define before APSRQ_REGNUM.
	(APSRQ_REGNUM): Modify.
	(APSRGE_REGNUM): Modify.
	* config/arm/arm_mve.h (vctp16q): Define macro.
	(vctp32q): Likewise.
	(vctp64q): Likewise.
	(vctp8q): Likewise.
	(vpnot): Likewise.
	(__arm_vctp16q): Define intrinsic.
	(__arm_vctp32q): Likewise.
	(__arm_vctp64q): Likewise.
	(__arm_vctp8q): Likewise.
	(__arm_vpnot): Likewise.
	* config/arm/arm_mve_builtins.def (UNOP_UNONE_UNONE): Use builtin
	qualifier.
	* config/arm/mve.md (mve_vctp<mode1>qhi): Define RTL pattern.
	(mve_vpnothi): Likewise.

gcc/testsuite/ChangeLog:

2019-11-12  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vctp16q.c: New test.
	* gcc.target/arm/mve/intrinsics/vctp32q.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vctp64q.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vctp8q.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vpnot.c: Likewise.


###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 21b213d8e1bc99a3946f15e97161e01d73832799..cd82aa159089c288607e240de02a85dcbb134a14 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -387,6 +387,7 @@ arm_set_sat_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define oi_UP	 E_OImode
 #define hf_UP	 E_HFmode
 #define si_UP	 E_SImode
+#define hi_UP    E_HImode
 #define void_UP	 E_VOIDmode
 
 #define UP(X) X##_UP
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 485db72f05f16ca389227289a35c232dc982bf9d..95ec7963a57a1a5652a0a9dc30391a0ce6348242 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -955,6 +955,9 @@ extern int arm_arch_cmse;
 #define IS_IWMMXT_GR_REGNUM(REGNUM) \
   (((REGNUM) >= FIRST_IWMMXT_GR_REGNUM) && ((REGNUM) <= LAST_IWMMXT_GR_REGNUM))
 
+#define IS_VPR_REGNUM(REGNUM) \
+  ((REGNUM) == VPR_REGNUM)
+
 /* Base register for access to local variables of the function.  */
 #define FRAME_POINTER_REGNUM	102
 
@@ -999,7 +1002,7 @@ extern int arm_arch_cmse;
    && (LAST_VFP_REGNUM - (REGNUM) >= 2 * (N) - 1))
 
 /* The number of hard registers is 16 ARM + 1 CC + 1 SFP + 1 AFP
-   + 1 APSRQ + 1 APSRGE + 1 VPR.  */
+   +1 VPR + 1 APSRQ + 1 APSRGE.  */
 /* Intel Wireless MMX Technology registers add 16 + 4 more.  */
 /* VFP (VFP3) adds 32 (64) + 1 VFPCC.  */
 #define FIRST_PSEUDO_REGISTER   107
@@ -1101,13 +1104,10 @@ extern int arm_regs_in_sequence[];
   /* Registers not for general use.  */		\
   CC_REGNUM, VFPCC_REGNUM,			\
   FRAME_POINTER_REGNUM, ARG_POINTER_REGNUM,	\
-  SP_REGNUM, PC_REGNUM, APSRQ_REGNUM, APSRGE_REGNUM,	\
-  VPR_REGNUM					\
+  SP_REGNUM, PC_REGNUM, VPR_REGNUM, APSRQ_REGNUM,\
+  APSRGE_REGNUM					\
 }
 
-#define IS_VPR_REGNUM(REGNUM) \
-  ((REGNUM) == VPR_REGNUM)
-
 /* Use different register alloc ordering for Thumb.  */
 #define ADJUST_REG_ALLOC_ORDER arm_order_regs_for_local_alloc ()
 
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 689baa0b0ff63ef90f47d2fd844cb98c9a1457a0..2a90482a873f8250a3b2b1dec141669f55e0c58b 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -39,9 +39,9 @@
    (LAST_ARM_REGNUM  15)	;
    (CC_REGNUM       100)	; Condition code pseudo register
    (VFPCC_REGNUM    101)	; VFP Condition code pseudo register
-   (APSRQ_REGNUM    104)	; Q bit pseudo register
-   (APSRGE_REGNUM   105)	; GE bits pseudo register
-   (VPR_REGNUM      106)	; Vector Predication Register - MVE register.
+   (VPR_REGNUM      104)	; Vector Predication Register - MVE register.
+   (APSRQ_REGNUM    105)	; Q bit pseudo register
+   (APSRGE_REGNUM   106)	; GE bits pseudo register
   ]
 )
 ;; 3rd operand to select_dominance_cc_mode
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 1d357180ba9ddb26347b55cde625903bdb09eef6..c8d9b6471634725cea9bab3f9fa145810b506938 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -192,6 +192,11 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vcvtmq_u32_f32(__a) __arm_vcvtmq_u32_f32(__a)
 #define vcvtaq_u16_f16(__a) __arm_vcvtaq_u16_f16(__a)
 #define vcvtaq_u32_f32(__a) __arm_vcvtaq_u32_f32(__a)
+#define vctp16q(__a) __arm_vctp16q(__a)
+#define vctp32q(__a) __arm_vctp32q(__a)
+#define vctp64q(__a) __arm_vctp64q(__a)
+#define vctp8q(__a) __arm_vctp8q(__a)
+#define vpnot(__a) __arm_vpnot(__a)
 #endif
 
 __extension__ extern __inline void
@@ -703,6 +708,41 @@ __arm_vaddlvq_u32 (uint32x4_t __a)
   return __builtin_mve_vaddlvq_uv4si (__a);
 }
 
+__extension__ extern __inline int64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vctp16q (uint32_t __a)
+{
+  return __builtin_mve_vctp16qhi (__a);
+}
+
+__extension__ extern __inline mve_pred16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vctp32q (uint32_t __a)
+{
+  return __builtin_mve_vctp32qhi (__a);
+}
+
+__extension__ extern __inline mve_pred16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vctp64q (uint32_t __a)
+{
+  return __builtin_mve_vctp64qhi (__a);
+}
+
+__extension__ extern __inline mve_pred16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vctp8q (uint32_t __a)
+{
+  return __builtin_mve_vctp8qhi (__a);
+}
+
+__extension__ extern __inline mve_pred16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vpnot (mve_pred16_t __a)
+{
+  return __builtin_mve_vpnothi (__a);
+}
+
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index aafcc2a4bede4e2f9f7845e60f30d4482b7f10bc..816b6dfca7fb221275212ca5f06fc6f679860a38 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -1,5 +1,5 @@
 /*  MVE builtin definitions for Arm.
-    Copyright  (C) 2019 Free Software Foundation, Inc.
+    Copyright (C) 2019 Free Software Foundation, Inc.
     Contributed by Arm.
 
     This file is part of GCC.
@@ -71,3 +71,8 @@ VAR2 (UNOP_UNONE_NONE, vcvtaq_u, v8hi, v4si)
 VAR2 (UNOP_UNONE_IMM, vmvnq_n_u, v8hi, v4si)
 VAR1 (UNOP_UNONE_UNONE, vrev16q_u, v16qi)
 VAR1 (UNOP_UNONE_UNONE, vaddlvq_u, v4si)
+VAR1 (UNOP_UNONE_UNONE, vctp16q, hi)
+VAR1 (UNOP_UNONE_UNONE, vctp32q, hi)
+VAR1 (UNOP_UNONE_UNONE, vctp64q, hi)
+VAR1 (UNOP_UNONE_UNONE, vctp8q, hi)
+VAR1 (UNOP_UNONE_UNONE, vpnot, hi)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 9ad53e5655b01c26143a5f46c675658a450e0adf..ee2263e04309e8274ec76ae4478afb7cfba59f8f 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -36,7 +36,7 @@
 			 VREV32Q_U VREV32Q_S VMOVLTQ_U VMOVLTQ_S VMOVLBQ_S
 			 VMOVLBQ_U VCVTQ_FROM_F_S VCVTQ_FROM_F_U VCVTPQ_S
 			 VCVTPQ_U VCVTNQ_S VCVTNQ_U VCVTMQ_S VCVTMQ_U
-			 VADDLVQ_U])
+			 VADDLVQ_U VCTP8Q VCTP16Q VCTP32Q VCTP64Q VPNOT])
 
 (define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF")
 			    (V8HF "V8HI") (V4SF "V4SI")])
@@ -54,6 +54,9 @@
 		       (VCLZQ_U "u") (VCLZQ_S "s") (VREV32Q_U "u")
 		       (VREV32Q_S "s") (VADDLVQ_U "u") (VADDLVQ_S "s")])
 
+(define_int_attr mode1 [(VCTP8Q "8") (VCTP16Q "16") (VCTP32Q "32")
+			(VCTP64Q "64")])
+
 (define_int_iterator VCVTQ_TO_F [VCVTQ_TO_F_S VCVTQ_TO_F_U])
 (define_int_iterator VMVNQ_N [VMVNQ_N_U VMVNQ_N_S])
 (define_int_iterator VREV64Q [VREV64Q_S VREV64Q_U])
@@ -71,6 +74,7 @@
 (define_int_iterator VCVTNQ [VCVTNQ_S VCVTNQ_U])
 (define_int_iterator VCVTMQ [VCVTMQ_S VCVTMQ_U])
 (define_int_iterator VADDLVQ [VADDLVQ_U VADDLVQ_S])
+(define_int_iterator VCTPQ [VCTP8Q VCTP16Q VCTP32Q VCTP64Q])
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
@@ -648,3 +652,31 @@
   "vaddlv.<supf>32 %Q0, %R0, %q1"
   [(set_attr "type" "mve_move")
 ])
+
+;;
+;; [vctp8q vctp16q vctp32q vctp64q])
+;;
+(define_insn "mve_vctp<mode1>qhi"
+  [
+   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
+	(unspec:HI [(match_operand:SI 1 "s_register_operand" "r")]
+	VCTPQ))
+  ]
+  "TARGET_HAVE_MVE"
+  "vctp.<mode1> %1"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vpnot])
+;;
+(define_insn "mve_vpnothi"
+  [
+   (set (match_operand:HI 0 "s_register_operand" "=r")
+	(unspec:HI [(match_operand:HI 1 "vpr_register_operand" "Up")]
+	 VPNOT))
+  ]
+  "TARGET_HAVE_MVE"
+  "vpnot"
+  [(set_attr "type" "mve_move")
+])
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp16q.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp16q.c
new file mode 100644
index 0000000000000000000000000000000000000000..0530d23d6cb6283f19f9122bb85c22714eb71843
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp16q.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+mve_pred16_t
+foo (uint32_t a)
+{
+  return vctp16q (a);
+}
+
+/* { dg-final { scan-assembler "vctp.16"  }  } */
+
+mve_pred16_t
+foo1 (uint32_t a)
+{
+  return vctp16q (a);
+}
+
+/* { dg-final { scan-assembler "vctp.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp32q.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp32q.c
new file mode 100644
index 0000000000000000000000000000000000000000..6e59362345a7b4e77028262c69bee42d230a6f38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp32q.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+mve_pred16_t
+foo (uint32_t a)
+{
+  return vctp32q (a);
+}
+
+/* { dg-final { scan-assembler "vctp.32"  }  } */
+
+mve_pred16_t
+foo1 (uint32_t a)
+{
+  return vctp32q (a);
+}
+
+/* { dg-final { scan-assembler "vctp.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp64q.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp64q.c
new file mode 100644
index 0000000000000000000000000000000000000000..9a9224ef88de71012373b0f0cebedbade045fccd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp64q.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+mve_pred16_t
+foo (uint32_t a)
+{
+  return vctp64q (a);
+}
+
+/* { dg-final { scan-assembler "vctp.64"  }  } */
+
+mve_pred16_t
+foo1 (uint32_t a)
+{
+  return vctp64q (a);
+}
+
+/* { dg-final { scan-assembler "vctp.64"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp8q.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp8q.c
new file mode 100644
index 0000000000000000000000000000000000000000..3118d9b60a2eb8c72eb1273445a0b3370b603402
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp8q.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+mve_pred16_t
+foo (uint32_t a)
+{
+  return vctp8q (a);
+}
+
+/* { dg-final { scan-assembler "vctp.8"  }  } */
+
+mve_pred16_t
+foo1 (uint32_t a)
+{
+  return vctp8q (a);
+}
+
+/* { dg-final { scan-assembler "vctp.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vpnot.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vpnot.c
new file mode 100644
index 0000000000000000000000000000000000000000..9422d342b9d8bf024c2a95877c61e8dd0bd4ffb1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vpnot.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+mve_pred16_t
+foo (mve_pred16_t a)
+{
+  return vpnot (a);
+}
+
+/* { dg-final { scan-assembler "vpnot"  }  } */
+
+mve_pred16_t
+foo1 (mve_pred16_t a)
+{
+  return vpnot (a);
+}
+
+/* { dg-final { scan-assembler "vpnot"  }  } */


[-- Attachment #2: diff07.patch --]
[-- Type: text/plain, Size: 11198 bytes --]

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 21b213d8e1bc99a3946f15e97161e01d73832799..cd82aa159089c288607e240de02a85dcbb134a14 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -387,6 +387,7 @@ arm_set_sat_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define oi_UP	 E_OImode
 #define hf_UP	 E_HFmode
 #define si_UP	 E_SImode
+#define hi_UP    E_HImode
 #define void_UP	 E_VOIDmode
 
 #define UP(X) X##_UP
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 485db72f05f16ca389227289a35c232dc982bf9d..95ec7963a57a1a5652a0a9dc30391a0ce6348242 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -955,6 +955,9 @@ extern int arm_arch_cmse;
 #define IS_IWMMXT_GR_REGNUM(REGNUM) \
   (((REGNUM) >= FIRST_IWMMXT_GR_REGNUM) && ((REGNUM) <= LAST_IWMMXT_GR_REGNUM))
 
+#define IS_VPR_REGNUM(REGNUM) \
+  ((REGNUM) == VPR_REGNUM)
+
 /* Base register for access to local variables of the function.  */
 #define FRAME_POINTER_REGNUM	102
 
@@ -999,7 +1002,7 @@ extern int arm_arch_cmse;
    && (LAST_VFP_REGNUM - (REGNUM) >= 2 * (N) - 1))
 
 /* The number of hard registers is 16 ARM + 1 CC + 1 SFP + 1 AFP
-   + 1 APSRQ + 1 APSRGE + 1 VPR.  */
+   +1 VPR + 1 APSRQ + 1 APSRGE.  */
 /* Intel Wireless MMX Technology registers add 16 + 4 more.  */
 /* VFP (VFP3) adds 32 (64) + 1 VFPCC.  */
 #define FIRST_PSEUDO_REGISTER   107
@@ -1101,13 +1104,10 @@ extern int arm_regs_in_sequence[];
   /* Registers not for general use.  */		\
   CC_REGNUM, VFPCC_REGNUM,			\
   FRAME_POINTER_REGNUM, ARG_POINTER_REGNUM,	\
-  SP_REGNUM, PC_REGNUM, APSRQ_REGNUM, APSRGE_REGNUM,	\
-  VPR_REGNUM					\
+  SP_REGNUM, PC_REGNUM, VPR_REGNUM, APSRQ_REGNUM,\
+  APSRGE_REGNUM					\
 }
 
-#define IS_VPR_REGNUM(REGNUM) \
-  ((REGNUM) == VPR_REGNUM)
-
 /* Use different register alloc ordering for Thumb.  */
 #define ADJUST_REG_ALLOC_ORDER arm_order_regs_for_local_alloc ()
 
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 689baa0b0ff63ef90f47d2fd844cb98c9a1457a0..2a90482a873f8250a3b2b1dec141669f55e0c58b 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -39,9 +39,9 @@
    (LAST_ARM_REGNUM  15)	;
    (CC_REGNUM       100)	; Condition code pseudo register
    (VFPCC_REGNUM    101)	; VFP Condition code pseudo register
-   (APSRQ_REGNUM    104)	; Q bit pseudo register
-   (APSRGE_REGNUM   105)	; GE bits pseudo register
-   (VPR_REGNUM      106)	; Vector Predication Register - MVE register.
+   (VPR_REGNUM      104)	; Vector Predication Register - MVE register.
+   (APSRQ_REGNUM    105)	; Q bit pseudo register
+   (APSRGE_REGNUM   106)	; GE bits pseudo register
   ]
 )
 ;; 3rd operand to select_dominance_cc_mode
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 1d357180ba9ddb26347b55cde625903bdb09eef6..c8d9b6471634725cea9bab3f9fa145810b506938 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -192,6 +192,11 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vcvtmq_u32_f32(__a) __arm_vcvtmq_u32_f32(__a)
 #define vcvtaq_u16_f16(__a) __arm_vcvtaq_u16_f16(__a)
 #define vcvtaq_u32_f32(__a) __arm_vcvtaq_u32_f32(__a)
+#define vctp16q(__a) __arm_vctp16q(__a)
+#define vctp32q(__a) __arm_vctp32q(__a)
+#define vctp64q(__a) __arm_vctp64q(__a)
+#define vctp8q(__a) __arm_vctp8q(__a)
+#define vpnot(__a) __arm_vpnot(__a)
 #endif
 
 __extension__ extern __inline void
@@ -703,6 +708,41 @@ __arm_vaddlvq_u32 (uint32x4_t __a)
   return __builtin_mve_vaddlvq_uv4si (__a);
 }
 
+__extension__ extern __inline int64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vctp16q (uint32_t __a)
+{
+  return __builtin_mve_vctp16qhi (__a);
+}
+
+__extension__ extern __inline mve_pred16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vctp32q (uint32_t __a)
+{
+  return __builtin_mve_vctp32qhi (__a);
+}
+
+__extension__ extern __inline mve_pred16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vctp64q (uint32_t __a)
+{
+  return __builtin_mve_vctp64qhi (__a);
+}
+
+__extension__ extern __inline mve_pred16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vctp8q (uint32_t __a)
+{
+  return __builtin_mve_vctp8qhi (__a);
+}
+
+__extension__ extern __inline mve_pred16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vpnot (mve_pred16_t __a)
+{
+  return __builtin_mve_vpnothi (__a);
+}
+
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index aafcc2a4bede4e2f9f7845e60f30d4482b7f10bc..816b6dfca7fb221275212ca5f06fc6f679860a38 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -1,5 +1,5 @@
 /*  MVE builtin definitions for Arm.
-    Copyright  (C) 2019 Free Software Foundation, Inc.
+    Copyright (C) 2019 Free Software Foundation, Inc.
     Contributed by Arm.
 
     This file is part of GCC.
@@ -71,3 +71,8 @@ VAR2 (UNOP_UNONE_NONE, vcvtaq_u, v8hi, v4si)
 VAR2 (UNOP_UNONE_IMM, vmvnq_n_u, v8hi, v4si)
 VAR1 (UNOP_UNONE_UNONE, vrev16q_u, v16qi)
 VAR1 (UNOP_UNONE_UNONE, vaddlvq_u, v4si)
+VAR1 (UNOP_UNONE_UNONE, vctp16q, hi)
+VAR1 (UNOP_UNONE_UNONE, vctp32q, hi)
+VAR1 (UNOP_UNONE_UNONE, vctp64q, hi)
+VAR1 (UNOP_UNONE_UNONE, vctp8q, hi)
+VAR1 (UNOP_UNONE_UNONE, vpnot, hi)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 9ad53e5655b01c26143a5f46c675658a450e0adf..ee2263e04309e8274ec76ae4478afb7cfba59f8f 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -36,7 +36,7 @@
 			 VREV32Q_U VREV32Q_S VMOVLTQ_U VMOVLTQ_S VMOVLBQ_S
 			 VMOVLBQ_U VCVTQ_FROM_F_S VCVTQ_FROM_F_U VCVTPQ_S
 			 VCVTPQ_U VCVTNQ_S VCVTNQ_U VCVTMQ_S VCVTMQ_U
-			 VADDLVQ_U])
+			 VADDLVQ_U VCTP8Q VCTP16Q VCTP32Q VCTP64Q VPNOT])
 
 (define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF")
 			    (V8HF "V8HI") (V4SF "V4SI")])
@@ -54,6 +54,9 @@
 		       (VCLZQ_U "u") (VCLZQ_S "s") (VREV32Q_U "u")
 		       (VREV32Q_S "s") (VADDLVQ_U "u") (VADDLVQ_S "s")])
 
+(define_int_attr mode1 [(VCTP8Q "8") (VCTP16Q "16") (VCTP32Q "32")
+			(VCTP64Q "64")])
+
 (define_int_iterator VCVTQ_TO_F [VCVTQ_TO_F_S VCVTQ_TO_F_U])
 (define_int_iterator VMVNQ_N [VMVNQ_N_U VMVNQ_N_S])
 (define_int_iterator VREV64Q [VREV64Q_S VREV64Q_U])
@@ -71,6 +74,7 @@
 (define_int_iterator VCVTNQ [VCVTNQ_S VCVTNQ_U])
 (define_int_iterator VCVTMQ [VCVTMQ_S VCVTMQ_U])
 (define_int_iterator VADDLVQ [VADDLVQ_U VADDLVQ_S])
+(define_int_iterator VCTPQ [VCTP8Q VCTP16Q VCTP32Q VCTP64Q])
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
@@ -648,3 +652,31 @@
   "vaddlv.<supf>32 %Q0, %R0, %q1"
   [(set_attr "type" "mve_move")
 ])
+
+;;
+;; [vctp8q vctp16q vctp32q vctp64q])
+;;
+(define_insn "mve_vctp<mode1>qhi"
+  [
+   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
+	(unspec:HI [(match_operand:SI 1 "s_register_operand" "r")]
+	VCTPQ))
+  ]
+  "TARGET_HAVE_MVE"
+  "vctp.<mode1> %1"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vpnot])
+;;
+(define_insn "mve_vpnothi"
+  [
+   (set (match_operand:HI 0 "s_register_operand" "=r")
+	(unspec:HI [(match_operand:HI 1 "vpr_register_operand" "Up")]
+	 VPNOT))
+  ]
+  "TARGET_HAVE_MVE"
+  "vpnot"
+  [(set_attr "type" "mve_move")
+])
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp16q.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp16q.c
new file mode 100644
index 0000000000000000000000000000000000000000..0530d23d6cb6283f19f9122bb85c22714eb71843
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp16q.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+mve_pred16_t
+foo (uint32_t a)
+{
+  return vctp16q (a);
+}
+
+/* { dg-final { scan-assembler "vctp.16"  }  } */
+
+mve_pred16_t
+foo1 (uint32_t a)
+{
+  return vctp16q (a);
+}
+
+/* { dg-final { scan-assembler "vctp.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp32q.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp32q.c
new file mode 100644
index 0000000000000000000000000000000000000000..6e59362345a7b4e77028262c69bee42d230a6f38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp32q.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+mve_pred16_t
+foo (uint32_t a)
+{
+  return vctp32q (a);
+}
+
+/* { dg-final { scan-assembler "vctp.32"  }  } */
+
+mve_pred16_t
+foo1 (uint32_t a)
+{
+  return vctp32q (a);
+}
+
+/* { dg-final { scan-assembler "vctp.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp64q.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp64q.c
new file mode 100644
index 0000000000000000000000000000000000000000..9a9224ef88de71012373b0f0cebedbade045fccd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp64q.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+mve_pred16_t
+foo (uint32_t a)
+{
+  return vctp64q (a);
+}
+
+/* { dg-final { scan-assembler "vctp.64"  }  } */
+
+mve_pred16_t
+foo1 (uint32_t a)
+{
+  return vctp64q (a);
+}
+
+/* { dg-final { scan-assembler "vctp.64"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp8q.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp8q.c
new file mode 100644
index 0000000000000000000000000000000000000000..3118d9b60a2eb8c72eb1273445a0b3370b603402
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp8q.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+mve_pred16_t
+foo (uint32_t a)
+{
+  return vctp8q (a);
+}
+
+/* { dg-final { scan-assembler "vctp.8"  }  } */
+
+mve_pred16_t
+foo1 (uint32_t a)
+{
+  return vctp8q (a);
+}
+
+/* { dg-final { scan-assembler "vctp.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vpnot.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vpnot.c
new file mode 100644
index 0000000000000000000000000000000000000000..9422d342b9d8bf024c2a95877c61e8dd0bd4ffb1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vpnot.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+mve_pred16_t
+foo (mve_pred16_t a)
+{
+  return vpnot (a);
+}
+
+/* { dg-final { scan-assembler "vpnot"  }  } */
+
+mve_pred16_t
+foo1 (mve_pred16_t a)
+{
+  return vpnot (a);
+}
+
+/* { dg-final { scan-assembler "vpnot"  }  } */


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][1/1x]: Patch to support MVE ACLE intrinsics with unary operand.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (27 preceding siblings ...)
  2019-11-14 19:16 ` [PATCH][ARM][GCC][1/x]: MVE ACLE intrinsics framework patch Srinath Parvathaneni
@ 2019-11-14 19:16 ` Srinath Parvathaneni
  2019-12-19 17:57   ` Kyrill Tkachov
  2019-11-14 19:16 ` [PATCH][ARM][GCC][9x]: MVE ACLE predicated intrinsics with (dont-care) variant Srinath Parvathaneni
                   ` (9 subsequent siblings)
  38 siblings, 1 reply; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 45087 bytes --]

Hello,

This patch supports MVE ACLE intrinsics vcvtq_f16_s16, vcvtq_f32_s32, vcvtq_f16_u16, vcvtq_f32_u32n
vrndxq_f16, vrndxq_f32, vrndq_f16, vrndq_f32, vrndpq_f16, vrndpq_f32, vrndnq_f16, vrndnq_f32,
vrndmq_f16, vrndmq_f32, vrndaq_f16, vrndaq_f32, vrev64q_f16, vrev64q_f32, vnegq_f16, vnegq_f32,
vdupq_n_f16, vdupq_n_f32, vabsq_f16, vabsq_f32, vrev32q_f16, vcvttq_f32_f16, vcvtbq_f32_f16.


Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-10-17  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm-builtins.c (UNOP_NONE_NONE_QUALIFIERS): Define macro.
	(UNOP_NONE_SNONE_QUALIFIERS): Likewise.
	(UNOP_NONE_UNONE_QUALIFIERS): Likewise.
	* config/arm/arm_mve.h (vrndxq_f16): Define macro.
	(vrndxq_f32): Likewise.
	(vrndq_f16) Likewise.:
	(vrndq_f32): Likewise.
	(vrndpq_f16): Likewise. 
	(vrndpq_f32): Likewise.
	(vrndnq_f16): Likewise.
	(vrndnq_f32): Likewise.
	(vrndmq_f16): Likewise.
	(vrndmq_f32): Likewise. 
	(vrndaq_f16): Likewise.
	(vrndaq_f32): Likewise.
	(vrev64q_f16): Likewise.
	(vrev64q_f32): Likewise.
	(vnegq_f16): Likewise.
	(vnegq_f32): Likewise.
	(vdupq_n_f16): Likewise.
	(vdupq_n_f32): Likewise.
	(vabsq_f16): Likewise. 
	(vabsq_f32): Likewise.
	(vrev32q_f16): Likewise.
	(vcvttq_f32_f16): Likewise.
	(vcvtbq_f32_f16): Likewise.
	(vcvtq_f16_s16): Likewise. 
	(vcvtq_f32_s32): Likewise.
	(vcvtq_f16_u16): Likewise.
	(vcvtq_f32_u32): Likewise.
	(__arm_vrndxq_f16): Define intrinsic.
	(__arm_vrndxq_f32): Likewise.
	(__arm_vrndq_f16): Likewise.
	(__arm_vrndq_f32): Likewise.
	(__arm_vrndpq_f16): Likewise.
	(__arm_vrndpq_f32): Likewise.
	(__arm_vrndnq_f16): Likewise.
	(__arm_vrndnq_f32): Likewise.
	(__arm_vrndmq_f16): Likewise.
	(__arm_vrndmq_f32): Likewise.
	(__arm_vrndaq_f16): Likewise.
	(__arm_vrndaq_f32): Likewise.
	(__arm_vrev64q_f16): Likewise.
	(__arm_vrev64q_f32): Likewise.
	(__arm_vnegq_f16): Likewise.
	(__arm_vnegq_f32): Likewise.
	(__arm_vdupq_n_f16): Likewise.
	(__arm_vdupq_n_f32): Likewise.
	(__arm_vabsq_f16): Likewise.
	(__arm_vabsq_f32): Likewise.
	(__arm_vrev32q_f16): Likewise.
	(__arm_vcvttq_f32_f16): Likewise.
	(__arm_vcvtbq_f32_f16): Likewise.
	(__arm_vcvtq_f16_s16): Likewise.
	(__arm_vcvtq_f32_s32): Likewise.
	(__arm_vcvtq_f16_u16): Likewise.
	(__arm_vcvtq_f32_u32): Likewise.
	(vrndxq): Define polymorphic variants.
	(vrndq): Likewise.
	(vrndpq): Likewise.
	(vrndnq): Likewise.
	(vrndmq): Likewise.
	(vrndaq): Likewise.
	(vrev64q): Likewise.
	(vnegq): Likewise.
	(vabsq): Likewise.
	(vrev32q): Likewise.
	(vcvtbq_f32): Likewise.
	(vcvttq_f32): Likewise.
	(vcvtq): Likewise.
	* config/arm/arm_mve_builtins.def (VAR2): Define.
	(VAR1): Define.
	* config/arm/mve.md (mve_vrndxq_f<mode>): Add RTL pattern.
	(mve_vrndq_f<mode>): Likewise.
	(mve_vrndpq_f<mode>): Likewise.
	(mve_vrndnq_f<mode>): Likewise.
	(mve_vrndmq_f<mode>): Likewise.
	(mve_vrndaq_f<mode>): Likewise.
	(mve_vrev64q_f<mode>): Likewise.
	(mve_vnegq_f<mode>): Likewise.
	(mve_vdupq_n_f<mode>): Likewise.
	(mve_vabsq_f<mode>): Likewise.
	(mve_vrev32q_fv8hf): Likewise.
	(mve_vcvttq_f32_f16v4sf): Likewise.
	(mve_vcvtbq_f32_f16v4sf): Likewise.
	(mve_vcvtq_to_f_<supf><mode>): Likewise.

gcc/testsuite/ChangeLog:

2019-10-17  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vabsq_f16.c: New test.
	* gcc.target/arm/mve/intrinsics/vabsq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtbq_f32_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_f16_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_f16_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_f32_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_f32_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvttq_f32_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vnegq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vnegq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev32q_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev64q_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev64q_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndaq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndaq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndmq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndmq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndnq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndnq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndpq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndpq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndxq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndxq_f32.c: Likewise.


###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index a9f76971ef310118bf7edea6a8dd3de1da46b46b..2fee417fe6585f457edd4cf96655366b1d6bd1a0 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -293,6 +293,28 @@ arm_store1_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_void, qualifier_pointer_map_mode, qualifier_none };
 #define STORE1_QUALIFIERS (arm_store1_qualifiers)
 
+/* Qualifiers for MVE builtins.  */
+
+static enum arm_type_qualifiers
+arm_unop_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none };
+#define UNOP_NONE_NONE_QUALIFIERS \
+  (arm_unop_none_none_qualifiers)
+
+static enum arm_type_qualifiers
+arm_unop_none_snone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none };
+#define UNOP_NONE_SNONE_QUALIFIERS \
+  (arm_unop_none_snone_qualifiers)
+
+static enum arm_type_qualifiers
+arm_unop_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_unsigned };
+#define UNOP_NONE_UNONE_QUALIFIERS \
+  (arm_unop_none_unone_qualifiers)
+
+/* End of Qualifier for MVE builtins.  */
+
    /* void ([T element type] *, T, immediate).  */
 static enum arm_type_qualifiers
 arm_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 39c6a1551a72700292dde8ef6cea44ba0907af8d..9bcb04fb99a54b47057bb33cc807d6a5ad16401f 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -81,6 +81,33 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vst4q_u32( __addr, __value) __arm_vst4q_u32( __addr, __value)
 #define vst4q_f16( __addr, __value) __arm_vst4q_f16( __addr, __value)
 #define vst4q_f32( __addr, __value) __arm_vst4q_f32( __addr, __value)
+#define vrndxq_f16(__a) __arm_vrndxq_f16(__a)
+#define vrndxq_f32(__a) __arm_vrndxq_f32(__a)
+#define vrndq_f16(__a) __arm_vrndq_f16(__a)
+#define vrndq_f32(__a) __arm_vrndq_f32(__a)
+#define vrndpq_f16(__a) __arm_vrndpq_f16(__a)
+#define vrndpq_f32(__a) __arm_vrndpq_f32(__a)
+#define vrndnq_f16(__a) __arm_vrndnq_f16(__a)
+#define vrndnq_f32(__a) __arm_vrndnq_f32(__a)
+#define vrndmq_f16(__a) __arm_vrndmq_f16(__a)
+#define vrndmq_f32(__a) __arm_vrndmq_f32(__a)
+#define vrndaq_f16(__a) __arm_vrndaq_f16(__a)
+#define vrndaq_f32(__a) __arm_vrndaq_f32(__a)
+#define vrev64q_f16(__a) __arm_vrev64q_f16(__a)
+#define vrev64q_f32(__a) __arm_vrev64q_f32(__a)
+#define vnegq_f16(__a) __arm_vnegq_f16(__a)
+#define vnegq_f32(__a) __arm_vnegq_f32(__a)
+#define vdupq_n_f16(__a) __arm_vdupq_n_f16(__a)
+#define vdupq_n_f32(__a) __arm_vdupq_n_f32(__a)
+#define vabsq_f16(__a) __arm_vabsq_f16(__a)
+#define vabsq_f32(__a) __arm_vabsq_f32(__a)
+#define vrev32q_f16(__a) __arm_vrev32q_f16(__a)
+#define vcvttq_f32_f16(__a) __arm_vcvttq_f32_f16(__a)
+#define vcvtbq_f32_f16(__a) __arm_vcvtbq_f32_f16(__a)
+#define vcvtq_f16_s16(__a) __arm_vcvtq_f16_s16(__a)
+#define vcvtq_f32_s32(__a) __arm_vcvtq_f32_s32(__a)
+#define vcvtq_f16_u16(__a) __arm_vcvtq_f16_u16(__a)
+#define vcvtq_f32_u32(__a) __arm_vcvtq_f32_u32(__a)
 #endif
 
 __extension__ extern __inline void
@@ -157,6 +184,195 @@ __arm_vst4q_f32 (float32_t * __addr, float32x4x4_t __value)
   __builtin_mve_vst4qv4sf (__addr, __rv.__o);
 }
 
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrndxq_f16 (float16x8_t __a)
+{
+  return __builtin_mve_vrndxq_fv8hf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrndxq_f32 (float32x4_t __a)
+{
+  return __builtin_mve_vrndxq_fv4sf (__a);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrndq_f16 (float16x8_t __a)
+{
+  return __builtin_mve_vrndq_fv8hf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrndq_f32 (float32x4_t __a)
+{
+  return __builtin_mve_vrndq_fv4sf (__a);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrndpq_f16 (float16x8_t __a)
+{
+  return __builtin_mve_vrndpq_fv8hf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrndpq_f32 (float32x4_t __a)
+{
+  return __builtin_mve_vrndpq_fv4sf (__a);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrndnq_f16 (float16x8_t __a)
+{
+  return __builtin_mve_vrndnq_fv8hf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrndnq_f32 (float32x4_t __a)
+{
+  return __builtin_mve_vrndnq_fv4sf (__a);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrndmq_f16 (float16x8_t __a)
+{
+  return __builtin_mve_vrndmq_fv8hf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrndmq_f32 (float32x4_t __a)
+{
+  return __builtin_mve_vrndmq_fv4sf (__a);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrndaq_f16 (float16x8_t __a)
+{
+  return __builtin_mve_vrndaq_fv8hf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrndaq_f32 (float32x4_t __a)
+{
+  return __builtin_mve_vrndaq_fv4sf (__a);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrev64q_f16 (float16x8_t __a)
+{
+  return __builtin_mve_vrev64q_fv8hf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrev64q_f32 (float32x4_t __a)
+{
+  return __builtin_mve_vrev64q_fv4sf (__a);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vnegq_f16 (float16x8_t __a)
+{
+  return __builtin_mve_vnegq_fv8hf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vnegq_f32 (float32x4_t __a)
+{
+  return __builtin_mve_vnegq_fv4sf (__a);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vdupq_n_f16 (float16_t __a)
+{
+  return __builtin_mve_vdupq_n_fv8hf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vdupq_n_f32 (float32_t __a)
+{
+  return __builtin_mve_vdupq_n_fv4sf (__a);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vabsq_f16 (float16x8_t __a)
+{
+  return __builtin_mve_vabsq_fv8hf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vabsq_f32 (float32x4_t __a)
+{
+  return __builtin_mve_vabsq_fv4sf (__a);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrev32q_f16 (float16x8_t __a)
+{
+  return __builtin_mve_vrev32q_fv8hf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvttq_f32_f16 (float16x8_t __a)
+{
+  return __builtin_mve_vcvttq_f32_f16v4sf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtbq_f32_f16 (float16x8_t __a)
+{
+  return __builtin_mve_vcvtbq_f32_f16v4sf (__a);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_f16_s16 (int16x8_t __a)
+{
+  return __builtin_mve_vcvtq_to_f_sv8hf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_f32_s32 (int32x4_t __a)
+{
+  return __builtin_mve_vcvtq_to_f_sv4sf (__a);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_f16_u16 (uint16x8_t __a)
+{
+  return __builtin_mve_vcvtq_to_f_uv8hf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_f32_u32 (uint32x4_t __a)
+{
+  return __builtin_mve_vcvtq_to_f_uv4sf (__a);
+}
+
 #endif
 
 enum {
@@ -368,6 +584,83 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_float16x8x4_t]: __arm_vst4q_f16 (__ARM_mve_coerce(__p0, float16_t *), __ARM_mve_coerce(__p1, float16x8x4_t)), \
   int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4x4_t]: __arm_vst4q_f32 (__ARM_mve_coerce(__p0, float32_t *), __ARM_mve_coerce(__p1, float32x4x4_t)));})
 
+#define vrndxq(p0) __arm_vrndxq(p0)
+#define __arm_vrndxq(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vrndxq_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vrndxq_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
+
+#define vrndq(p0) __arm_vrndq(p0)
+#define __arm_vrndq(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vrndq_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vrndq_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
+
+#define vrndpq(p0) __arm_vrndpq(p0)
+#define __arm_vrndpq(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vrndpq_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vrndpq_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
+
+#define vrndnq(p0) __arm_vrndnq(p0)
+#define __arm_vrndnq(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vrndnq_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vrndnq_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
+
+#define vrndmq(p0) __arm_vrndmq(p0)
+#define __arm_vrndmq(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vrndmq_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vrndmq_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
+
+#define vrndaq(p0) __arm_vrndaq(p0)
+#define __arm_vrndaq(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vrndaq_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vrndaq_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
+
+#define vrev64q(p0) __arm_vrev64q(p0)
+#define __arm_vrev64q(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vrev64q_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vrev64q_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
+
+#define vnegq(p0) __arm_vnegq(p0)
+#define __arm_vnegq(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vnegq_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vnegq_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
+
+#define vabsq(p0) __arm_vabsq(p0)
+#define __arm_vabsq(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vabsq_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vabsq_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
+
+#define vrev32q(p0) __arm_vrev32q(p0)
+#define __arm_vrev32q(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vrev32q_f16 (__ARM_mve_coerce(__p0, float16x8_t)));})
+
+#define vcvtbq_f32(p0) __arm_vcvtbq_f32(p0)
+#define __arm_vcvtbq_f32(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vcvtbq_f32_f16 (__ARM_mve_coerce(__p0, float16x8_t)));})
+
+#define vcvttq_f32(p0) __arm_vcvttq_f32(p0)
+#define __arm_vcvttq_f32(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vcvttq_f32_f16 (__ARM_mve_coerce(__p0, float16x8_t)));})
+
+#define vcvtq(p0) __arm_vcvtq(p0)
+#define __arm_vcvtq(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vcvtq_f16_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vcvtq_f32_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vcvtq_f16_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vcvtq_f32_u32 (__ARM_mve_coerce(__p0, uint32x4_t)));})
+
 #else /* MVE Interger.  */
 
 #define vst4q(p0,p1) __arm_vst4q(p0,p1)
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index c7f3546c9cb0ec3ae794e181e533a1cbed38121c..65dc58c9328525891a0aa0bb97a412ebc8257c18 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -19,3 +19,18 @@
     <http://www.gnu.org/licenses/>.  */
 
 VAR5 (STORE1, vst4q, v16qi, v8hi, v4si, v8hf, v4sf)
+VAR2 (UNOP_NONE_NONE, vrndxq_f, v8hf, v4sf)
+VAR2 (UNOP_NONE_NONE, vrndq_f, v8hf, v4sf)
+VAR2 (UNOP_NONE_NONE, vrndpq_f, v8hf, v4sf)
+VAR2 (UNOP_NONE_NONE, vrndnq_f, v8hf, v4sf)
+VAR2 (UNOP_NONE_NONE, vrndmq_f, v8hf, v4sf)
+VAR2 (UNOP_NONE_NONE, vrndaq_f, v8hf, v4sf)
+VAR2 (UNOP_NONE_NONE, vrev64q_f, v8hf, v4sf)
+VAR2 (UNOP_NONE_NONE, vnegq_f, v8hf, v4sf)
+VAR2 (UNOP_NONE_NONE, vdupq_n_f, v8hf, v4sf)
+VAR2 (UNOP_NONE_NONE, vabsq_f, v8hf, v4sf)
+VAR1 (UNOP_NONE_NONE, vrev32q_f, v8hf)
+VAR1 (UNOP_NONE_NONE, vcvttq_f32_f16, v4sf)
+VAR1 (UNOP_NONE_NONE, vcvtbq_f32_f16, v4sf)
+VAR2 (UNOP_NONE_SNONE, vcvtq_to_f_s, v8hf, v4sf)
+VAR2 (UNOP_NONE_UNONE, vcvtq_to_f_u, v8hf, v4sf)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index a2f5220eccb6d888b9b0b1bbd1054cc3a6be97f5..7a31d0abdfff9a93d79faa1de44d1b224470e2eb 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -21,8 +21,18 @@
 			      (V2DI "u64")])
 (define_mode_iterator MVE_types [V16QI V8HI V4SI V2DI TI V8HF V4SF V2DF])
 (define_mode_iterator MVE_VLD_ST [V16QI V8HI V4SI V8HF V4SF])
+(define_mode_iterator MVE_0 [V8HF V4SF])
 
-(define_c_enum "unspec" [VST4Q])
+(define_c_enum "unspec" [VST4Q VRNDXQ_F VRNDQ_F VRNDPQ_F VRNDNQ_F VRNDMQ_F
+			 VRNDAQ_F VREV64Q_F VNEGQ_F VDUPQ_N_F VABSQ_F VREV32Q_F
+			 VCVTTQ_F32_F16 VCVTBQ_F32_F16 VCVTQ_TO_F_S
+			 VCVTQ_TO_F_U])
+
+(define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF")
+			    (V8HF "V8HI") (V4SF "V4SI")])
+
+(define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u")])
+(define_int_iterator VCVTQ_TO_F [VCVTQ_TO_F_S VCVTQ_TO_F_U])
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
@@ -113,3 +123,198 @@
    return "";
 }
   [(set_attr "length" "16")])
+
+;;
+;; [vrndxq_f])
+;;
+(define_insn "mve_vrndxq_f<mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")]
+	 VRNDXQ_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vrintx.f%#<V_sz_elem>	%q0, %q1"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vrndq_f])
+;;
+(define_insn "mve_vrndq_f<mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")]
+	 VRNDQ_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vrintz.f%#<V_sz_elem>	%q0, %q1"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vrndpq_f])
+;;
+(define_insn "mve_vrndpq_f<mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")]
+	 VRNDPQ_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vrintp.f%#<V_sz_elem>	%q0, %q1"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vrndnq_f])
+;;
+(define_insn "mve_vrndnq_f<mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")]
+	 VRNDNQ_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vrintn.f%#<V_sz_elem>	%q0, %q1"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vrndmq_f])
+;;
+(define_insn "mve_vrndmq_f<mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")]
+	 VRNDMQ_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vrintm.f%#<V_sz_elem>	%q0, %q1"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vrndaq_f])
+;;
+(define_insn "mve_vrndaq_f<mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")]
+	 VRNDAQ_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vrinta.f%#<V_sz_elem>	%q0, %q1"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vrev64q_f])
+;;
+(define_insn "mve_vrev64q_f<mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")]
+	 VREV64Q_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vrev64.%#<V_sz_elem> %q0, %q1"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vnegq_f])
+;;
+(define_insn "mve_vnegq_f<mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")]
+	 VNEGQ_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vneg.f%#<V_sz_elem>  %q0, %q1"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vdupq_n_f])
+;;
+(define_insn "mve_vdupq_n_f<mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:<V_elem> 1 "s_register_operand" "r")]
+	 VDUPQ_N_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vdup.%#<V_sz_elem>   %q0, %1"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vabsq_f])
+;;
+(define_insn "mve_vabsq_f<mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")]
+	 VABSQ_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vabs.f%#<V_sz_elem>  %q0, %q1"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vrev32q_f])
+;;
+(define_insn "mve_vrev32q_fv8hf"
+  [
+   (set (match_operand:V8HF 0 "s_register_operand" "=w")
+	(unspec:V8HF [(match_operand:V8HF 1 "s_register_operand" "w")]
+	 VREV32Q_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vrev32.16 %q0, %q1"
+  [(set_attr "type" "mve_move")
+])
+;;
+;; [vcvttq_f32_f16])
+;;
+(define_insn "mve_vcvttq_f32_f16v4sf"
+  [
+   (set (match_operand:V4SF 0 "s_register_operand" "=w")
+	(unspec:V4SF [(match_operand:V8HF 1 "s_register_operand" "w")]
+	 VCVTTQ_F32_F16))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vcvtt.f32.f16 %q0, %q1"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vcvtbq_f32_f16])
+;;
+(define_insn "mve_vcvtbq_f32_f16v4sf"
+  [
+   (set (match_operand:V4SF 0 "s_register_operand" "=w")
+	(unspec:V4SF [(match_operand:V8HF 1 "s_register_operand" "w")]
+	 VCVTBQ_F32_F16))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vcvtb.f32.f16 %q0, %q1"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vcvtq_to_f_s, vcvtq_to_f_u])
+;;
+(define_insn "mve_vcvtq_to_f_<supf><mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:<MVE_CNVT> 1 "s_register_operand" "w")]
+	 VCVTQ_TO_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vcvt.f%#<V_sz_elem>.<supf>%#<V_sz_elem>       %q0, %q1"
+  [(set_attr "type" "mve_move")
+])
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..c0b19dc2acf36a89705ab7a371925f1f35fe60ba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16x8_t a)
+{
+  return vabsq_f16 (a);
+}
+
+/* { dg-final { scan-assembler "vabs.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..59c3b5ab2a4a4269f2fc94408d45a44e60c2db17
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32x4_t a)
+{
+  return vabsq_f32 (a);
+}
+
+/* { dg-final { scan-assembler "vabs.f32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtbq_f32_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtbq_f32_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..0f6c0b56a364515c7804acbb38e2402833006474
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtbq_f32_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float16x8_t a)
+{
+  return vcvtbq_f32_f16 (a);
+}
+
+/* { dg-final { scan-assembler "vcvtb.f32.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f16_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f16_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..45be0573781bebd1b0321eb63ae9717d25b2e95f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f16_s16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (int16x8_t a)
+{
+  return vcvtq_f16_s16 (a);
+}
+
+/* { dg-final { scan-assembler "vcvt.f16.s16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f16_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f16_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..17b42c5771e5aab6a8c2dfef7c71c8616c5116f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f16_u16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (uint16x8_t a)
+{
+  return vcvtq_f16_u16 (a);
+}
+
+/* { dg-final { scan-assembler "vcvt.f16.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f32_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f32_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..0e7dbd22effb90a47b53b7a82cbcea4d92e587a2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f32_s32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (int32x4_t a)
+{
+  return vcvtq_f32_s32 (a);
+}
+
+/* { dg-final { scan-assembler "vcvt.f32.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f32_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f32_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..2641d71558cd810a0dabc14a0fc038038fa6e2f4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f32_u32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (uint32x4_t a)
+{
+  return vcvtq_f32_u32 (a);
+}
+
+/* { dg-final { scan-assembler "vcvt.f32.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvttq_f32_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvttq_f32_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..f3983cf3aa4ec9a7489df84f895f10b365c6acd4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvttq_f32_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float16x8_t a)
+{
+  return vcvttq_f32_f16 (a);
+}
+
+/* { dg-final { scan-assembler "vcvtt.f32.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..a2b7bebddeb0ed92e4b674405d8a23a397234b15
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16_t a)
+{
+  return vdupq_n_f16 (a);
+}
+
+/* { dg-final { scan-assembler "vdup.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..05ffc960217f8781d87870a3d100a1008c41ed86
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32_t a)
+{
+  return vdupq_n_f32 (a);
+}
+
+/* { dg-final { scan-assembler "vdup.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vnegq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vnegq_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..8a6e9a793c9ea12a74a08cb905a6f7cb6a4d6350
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vnegq_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16x8_t a)
+{
+  return vnegq_f16 (a);
+}
+
+/* { dg-final { scan-assembler "vneg.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vnegq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vnegq_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..63e94ca3eac1dda65aaf2d06309654560d0756f3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vnegq_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32x4_t a)
+{
+  return vnegq_f32 (a);
+}
+
+/* { dg-final { scan-assembler "vneg.f32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev32q_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev32q_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..21c666844179e723484678af5d8123a418177090
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev32q_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16x8_t a)
+{
+  return vrev32q_f16 (a);
+}
+
+/* { dg-final { scan-assembler "vrev32.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..70db03e0fe4573961778a5290e9b585044faa393
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16x8_t a)
+{
+  return vrev64q_f16 (a);
+}
+
+/* { dg-final { scan-assembler "vrev64.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..e9686c63d162a8919bd24d0227f9e6acb6278142
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32x4_t a)
+{
+  return vrev64q_f32 (a);
+}
+
+/* { dg-final { scan-assembler "vrev64.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndaq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndaq_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..9c51833855e772346b2d1df3760de511aa58d1ca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndaq_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16x8_t a)
+{
+  return vrndaq_f16 (a);
+}
+
+/* { dg-final { scan-assembler "vrinta.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndaq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndaq_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..0f2c76f38ea950a36a6e243ce656f334c46e5fa5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndaq_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32x4_t a)
+{
+  return vrndaq_f32 (a);
+}
+
+/* { dg-final { scan-assembler "vrinta.f32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndmq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndmq_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..936b20cf9eda0869f828880dc4ca6cf8d22d05dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndmq_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16x8_t a)
+{
+  return vrndmq_f16 (a);
+}
+
+/* { dg-final { scan-assembler "vrintm.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndmq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndmq_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..ba77c73ebb1aebc5608597739e0be57436f8aebd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndmq_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32x4_t a)
+{
+  return vrndmq_f32 (a);
+}
+
+/* { dg-final { scan-assembler "vrintm.f32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndnq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndnq_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..43dd340e8c529e71cbcd6168adb14f5a29d141b2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndnq_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16x8_t a)
+{
+  return vrndnq_f16 (a);
+}
+
+/* { dg-final { scan-assembler "vrintn.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndnq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndnq_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..a665250c35b1fecb6bd017df8f336c86b6cae179
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndnq_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32x4_t a)
+{
+  return vrndnq_f32 (a);
+}
+
+/* { dg-final { scan-assembler "vrintn.f32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndpq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndpq_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..effa934137adf3e2d5535b1abdd7812d8fa7cc41
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndpq_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16x8_t a)
+{
+  return vrndpq_f16 (a);
+}
+
+/* { dg-final { scan-assembler "vrintp.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndpq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndpq_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..0e1714f89b3a7f5b59a40cfbb9ab3fd72399e988
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndpq_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32x4_t a)
+{
+  return vrndpq_f32 (a);
+}
+
+/* { dg-final { scan-assembler "vrintp.f32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndq_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..0cd74d4849fbb73a0dc7cb64d0f5cbde5ee86927
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndq_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16x8_t a)
+{
+  return vrndq_f16 (a);
+}
+
+/* { dg-final { scan-assembler "vrintz.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndq_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..eacd1d1f2e1c86169f0739ad43b3faf6da764f75
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndq_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32x4_t a)
+{
+  return vrndq_f32 (a);
+}
+
+/* { dg-final { scan-assembler "vrintz.f32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndxq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndxq_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..0836f2233f814e0d53c100f6af42197fa88476a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndxq_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16x8_t a)
+{
+  return vrndxq_f16 (a);
+}
+
+/* { dg-final { scan-assembler "vrintx.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndxq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndxq_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..012a50b6020cd78d5ac1b53d44de0eebedc6b587
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndxq_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32x4_t a)
+{
+  return vrndxq_f32 (a);
+}
+
+/* { dg-final { scan-assembler "vrintx.f32"  }  } */


[-- Attachment #2: diff04.patch --]
[-- Type: text/plain, Size: 38535 bytes --]

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index a9f76971ef310118bf7edea6a8dd3de1da46b46b..2fee417fe6585f457edd4cf96655366b1d6bd1a0 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -293,6 +293,28 @@ arm_store1_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_void, qualifier_pointer_map_mode, qualifier_none };
 #define STORE1_QUALIFIERS (arm_store1_qualifiers)
 
+/* Qualifiers for MVE builtins.  */
+
+static enum arm_type_qualifiers
+arm_unop_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none };
+#define UNOP_NONE_NONE_QUALIFIERS \
+  (arm_unop_none_none_qualifiers)
+
+static enum arm_type_qualifiers
+arm_unop_none_snone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none };
+#define UNOP_NONE_SNONE_QUALIFIERS \
+  (arm_unop_none_snone_qualifiers)
+
+static enum arm_type_qualifiers
+arm_unop_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_unsigned };
+#define UNOP_NONE_UNONE_QUALIFIERS \
+  (arm_unop_none_unone_qualifiers)
+
+/* End of Qualifier for MVE builtins.  */
+
    /* void ([T element type] *, T, immediate).  */
 static enum arm_type_qualifiers
 arm_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 39c6a1551a72700292dde8ef6cea44ba0907af8d..9bcb04fb99a54b47057bb33cc807d6a5ad16401f 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -81,6 +81,33 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vst4q_u32( __addr, __value) __arm_vst4q_u32( __addr, __value)
 #define vst4q_f16( __addr, __value) __arm_vst4q_f16( __addr, __value)
 #define vst4q_f32( __addr, __value) __arm_vst4q_f32( __addr, __value)
+#define vrndxq_f16(__a) __arm_vrndxq_f16(__a)
+#define vrndxq_f32(__a) __arm_vrndxq_f32(__a)
+#define vrndq_f16(__a) __arm_vrndq_f16(__a)
+#define vrndq_f32(__a) __arm_vrndq_f32(__a)
+#define vrndpq_f16(__a) __arm_vrndpq_f16(__a)
+#define vrndpq_f32(__a) __arm_vrndpq_f32(__a)
+#define vrndnq_f16(__a) __arm_vrndnq_f16(__a)
+#define vrndnq_f32(__a) __arm_vrndnq_f32(__a)
+#define vrndmq_f16(__a) __arm_vrndmq_f16(__a)
+#define vrndmq_f32(__a) __arm_vrndmq_f32(__a)
+#define vrndaq_f16(__a) __arm_vrndaq_f16(__a)
+#define vrndaq_f32(__a) __arm_vrndaq_f32(__a)
+#define vrev64q_f16(__a) __arm_vrev64q_f16(__a)
+#define vrev64q_f32(__a) __arm_vrev64q_f32(__a)
+#define vnegq_f16(__a) __arm_vnegq_f16(__a)
+#define vnegq_f32(__a) __arm_vnegq_f32(__a)
+#define vdupq_n_f16(__a) __arm_vdupq_n_f16(__a)
+#define vdupq_n_f32(__a) __arm_vdupq_n_f32(__a)
+#define vabsq_f16(__a) __arm_vabsq_f16(__a)
+#define vabsq_f32(__a) __arm_vabsq_f32(__a)
+#define vrev32q_f16(__a) __arm_vrev32q_f16(__a)
+#define vcvttq_f32_f16(__a) __arm_vcvttq_f32_f16(__a)
+#define vcvtbq_f32_f16(__a) __arm_vcvtbq_f32_f16(__a)
+#define vcvtq_f16_s16(__a) __arm_vcvtq_f16_s16(__a)
+#define vcvtq_f32_s32(__a) __arm_vcvtq_f32_s32(__a)
+#define vcvtq_f16_u16(__a) __arm_vcvtq_f16_u16(__a)
+#define vcvtq_f32_u32(__a) __arm_vcvtq_f32_u32(__a)
 #endif
 
 __extension__ extern __inline void
@@ -157,6 +184,195 @@ __arm_vst4q_f32 (float32_t * __addr, float32x4x4_t __value)
   __builtin_mve_vst4qv4sf (__addr, __rv.__o);
 }
 
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrndxq_f16 (float16x8_t __a)
+{
+  return __builtin_mve_vrndxq_fv8hf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrndxq_f32 (float32x4_t __a)
+{
+  return __builtin_mve_vrndxq_fv4sf (__a);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrndq_f16 (float16x8_t __a)
+{
+  return __builtin_mve_vrndq_fv8hf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrndq_f32 (float32x4_t __a)
+{
+  return __builtin_mve_vrndq_fv4sf (__a);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrndpq_f16 (float16x8_t __a)
+{
+  return __builtin_mve_vrndpq_fv8hf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrndpq_f32 (float32x4_t __a)
+{
+  return __builtin_mve_vrndpq_fv4sf (__a);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrndnq_f16 (float16x8_t __a)
+{
+  return __builtin_mve_vrndnq_fv8hf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrndnq_f32 (float32x4_t __a)
+{
+  return __builtin_mve_vrndnq_fv4sf (__a);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrndmq_f16 (float16x8_t __a)
+{
+  return __builtin_mve_vrndmq_fv8hf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrndmq_f32 (float32x4_t __a)
+{
+  return __builtin_mve_vrndmq_fv4sf (__a);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrndaq_f16 (float16x8_t __a)
+{
+  return __builtin_mve_vrndaq_fv8hf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrndaq_f32 (float32x4_t __a)
+{
+  return __builtin_mve_vrndaq_fv4sf (__a);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrev64q_f16 (float16x8_t __a)
+{
+  return __builtin_mve_vrev64q_fv8hf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrev64q_f32 (float32x4_t __a)
+{
+  return __builtin_mve_vrev64q_fv4sf (__a);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vnegq_f16 (float16x8_t __a)
+{
+  return __builtin_mve_vnegq_fv8hf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vnegq_f32 (float32x4_t __a)
+{
+  return __builtin_mve_vnegq_fv4sf (__a);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vdupq_n_f16 (float16_t __a)
+{
+  return __builtin_mve_vdupq_n_fv8hf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vdupq_n_f32 (float32_t __a)
+{
+  return __builtin_mve_vdupq_n_fv4sf (__a);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vabsq_f16 (float16x8_t __a)
+{
+  return __builtin_mve_vabsq_fv8hf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vabsq_f32 (float32x4_t __a)
+{
+  return __builtin_mve_vabsq_fv4sf (__a);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrev32q_f16 (float16x8_t __a)
+{
+  return __builtin_mve_vrev32q_fv8hf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvttq_f32_f16 (float16x8_t __a)
+{
+  return __builtin_mve_vcvttq_f32_f16v4sf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtbq_f32_f16 (float16x8_t __a)
+{
+  return __builtin_mve_vcvtbq_f32_f16v4sf (__a);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_f16_s16 (int16x8_t __a)
+{
+  return __builtin_mve_vcvtq_to_f_sv8hf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_f32_s32 (int32x4_t __a)
+{
+  return __builtin_mve_vcvtq_to_f_sv4sf (__a);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_f16_u16 (uint16x8_t __a)
+{
+  return __builtin_mve_vcvtq_to_f_uv8hf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_f32_u32 (uint32x4_t __a)
+{
+  return __builtin_mve_vcvtq_to_f_uv4sf (__a);
+}
+
 #endif
 
 enum {
@@ -368,6 +584,83 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_float16x8x4_t]: __arm_vst4q_f16 (__ARM_mve_coerce(__p0, float16_t *), __ARM_mve_coerce(__p1, float16x8x4_t)), \
   int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4x4_t]: __arm_vst4q_f32 (__ARM_mve_coerce(__p0, float32_t *), __ARM_mve_coerce(__p1, float32x4x4_t)));})
 
+#define vrndxq(p0) __arm_vrndxq(p0)
+#define __arm_vrndxq(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vrndxq_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vrndxq_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
+
+#define vrndq(p0) __arm_vrndq(p0)
+#define __arm_vrndq(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vrndq_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vrndq_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
+
+#define vrndpq(p0) __arm_vrndpq(p0)
+#define __arm_vrndpq(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vrndpq_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vrndpq_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
+
+#define vrndnq(p0) __arm_vrndnq(p0)
+#define __arm_vrndnq(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vrndnq_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vrndnq_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
+
+#define vrndmq(p0) __arm_vrndmq(p0)
+#define __arm_vrndmq(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vrndmq_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vrndmq_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
+
+#define vrndaq(p0) __arm_vrndaq(p0)
+#define __arm_vrndaq(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vrndaq_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vrndaq_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
+
+#define vrev64q(p0) __arm_vrev64q(p0)
+#define __arm_vrev64q(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vrev64q_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vrev64q_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
+
+#define vnegq(p0) __arm_vnegq(p0)
+#define __arm_vnegq(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vnegq_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vnegq_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
+
+#define vabsq(p0) __arm_vabsq(p0)
+#define __arm_vabsq(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vabsq_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vabsq_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
+
+#define vrev32q(p0) __arm_vrev32q(p0)
+#define __arm_vrev32q(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vrev32q_f16 (__ARM_mve_coerce(__p0, float16x8_t)));})
+
+#define vcvtbq_f32(p0) __arm_vcvtbq_f32(p0)
+#define __arm_vcvtbq_f32(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vcvtbq_f32_f16 (__ARM_mve_coerce(__p0, float16x8_t)));})
+
+#define vcvttq_f32(p0) __arm_vcvttq_f32(p0)
+#define __arm_vcvttq_f32(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vcvttq_f32_f16 (__ARM_mve_coerce(__p0, float16x8_t)));})
+
+#define vcvtq(p0) __arm_vcvtq(p0)
+#define __arm_vcvtq(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vcvtq_f16_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vcvtq_f32_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vcvtq_f16_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vcvtq_f32_u32 (__ARM_mve_coerce(__p0, uint32x4_t)));})
+
 #else /* MVE Interger.  */
 
 #define vst4q(p0,p1) __arm_vst4q(p0,p1)
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index c7f3546c9cb0ec3ae794e181e533a1cbed38121c..65dc58c9328525891a0aa0bb97a412ebc8257c18 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -19,3 +19,18 @@
     <http://www.gnu.org/licenses/>.  */
 
 VAR5 (STORE1, vst4q, v16qi, v8hi, v4si, v8hf, v4sf)
+VAR2 (UNOP_NONE_NONE, vrndxq_f, v8hf, v4sf)
+VAR2 (UNOP_NONE_NONE, vrndq_f, v8hf, v4sf)
+VAR2 (UNOP_NONE_NONE, vrndpq_f, v8hf, v4sf)
+VAR2 (UNOP_NONE_NONE, vrndnq_f, v8hf, v4sf)
+VAR2 (UNOP_NONE_NONE, vrndmq_f, v8hf, v4sf)
+VAR2 (UNOP_NONE_NONE, vrndaq_f, v8hf, v4sf)
+VAR2 (UNOP_NONE_NONE, vrev64q_f, v8hf, v4sf)
+VAR2 (UNOP_NONE_NONE, vnegq_f, v8hf, v4sf)
+VAR2 (UNOP_NONE_NONE, vdupq_n_f, v8hf, v4sf)
+VAR2 (UNOP_NONE_NONE, vabsq_f, v8hf, v4sf)
+VAR1 (UNOP_NONE_NONE, vrev32q_f, v8hf)
+VAR1 (UNOP_NONE_NONE, vcvttq_f32_f16, v4sf)
+VAR1 (UNOP_NONE_NONE, vcvtbq_f32_f16, v4sf)
+VAR2 (UNOP_NONE_SNONE, vcvtq_to_f_s, v8hf, v4sf)
+VAR2 (UNOP_NONE_UNONE, vcvtq_to_f_u, v8hf, v4sf)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index a2f5220eccb6d888b9b0b1bbd1054cc3a6be97f5..7a31d0abdfff9a93d79faa1de44d1b224470e2eb 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -21,8 +21,18 @@
 			      (V2DI "u64")])
 (define_mode_iterator MVE_types [V16QI V8HI V4SI V2DI TI V8HF V4SF V2DF])
 (define_mode_iterator MVE_VLD_ST [V16QI V8HI V4SI V8HF V4SF])
+(define_mode_iterator MVE_0 [V8HF V4SF])
 
-(define_c_enum "unspec" [VST4Q])
+(define_c_enum "unspec" [VST4Q VRNDXQ_F VRNDQ_F VRNDPQ_F VRNDNQ_F VRNDMQ_F
+			 VRNDAQ_F VREV64Q_F VNEGQ_F VDUPQ_N_F VABSQ_F VREV32Q_F
+			 VCVTTQ_F32_F16 VCVTBQ_F32_F16 VCVTQ_TO_F_S
+			 VCVTQ_TO_F_U])
+
+(define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF")
+			    (V8HF "V8HI") (V4SF "V4SI")])
+
+(define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u")])
+(define_int_iterator VCVTQ_TO_F [VCVTQ_TO_F_S VCVTQ_TO_F_U])
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
@@ -113,3 +123,198 @@
    return "";
 }
   [(set_attr "length" "16")])
+
+;;
+;; [vrndxq_f])
+;;
+(define_insn "mve_vrndxq_f<mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")]
+	 VRNDXQ_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vrintx.f%#<V_sz_elem>	%q0, %q1"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vrndq_f])
+;;
+(define_insn "mve_vrndq_f<mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")]
+	 VRNDQ_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vrintz.f%#<V_sz_elem>	%q0, %q1"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vrndpq_f])
+;;
+(define_insn "mve_vrndpq_f<mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")]
+	 VRNDPQ_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vrintp.f%#<V_sz_elem>	%q0, %q1"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vrndnq_f])
+;;
+(define_insn "mve_vrndnq_f<mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")]
+	 VRNDNQ_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vrintn.f%#<V_sz_elem>	%q0, %q1"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vrndmq_f])
+;;
+(define_insn "mve_vrndmq_f<mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")]
+	 VRNDMQ_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vrintm.f%#<V_sz_elem>	%q0, %q1"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vrndaq_f])
+;;
+(define_insn "mve_vrndaq_f<mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")]
+	 VRNDAQ_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vrinta.f%#<V_sz_elem>	%q0, %q1"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vrev64q_f])
+;;
+(define_insn "mve_vrev64q_f<mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")]
+	 VREV64Q_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vrev64.%#<V_sz_elem> %q0, %q1"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vnegq_f])
+;;
+(define_insn "mve_vnegq_f<mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")]
+	 VNEGQ_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vneg.f%#<V_sz_elem>  %q0, %q1"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vdupq_n_f])
+;;
+(define_insn "mve_vdupq_n_f<mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:<V_elem> 1 "s_register_operand" "r")]
+	 VDUPQ_N_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vdup.%#<V_sz_elem>   %q0, %1"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vabsq_f])
+;;
+(define_insn "mve_vabsq_f<mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")]
+	 VABSQ_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vabs.f%#<V_sz_elem>  %q0, %q1"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vrev32q_f])
+;;
+(define_insn "mve_vrev32q_fv8hf"
+  [
+   (set (match_operand:V8HF 0 "s_register_operand" "=w")
+	(unspec:V8HF [(match_operand:V8HF 1 "s_register_operand" "w")]
+	 VREV32Q_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vrev32.16 %q0, %q1"
+  [(set_attr "type" "mve_move")
+])
+;;
+;; [vcvttq_f32_f16])
+;;
+(define_insn "mve_vcvttq_f32_f16v4sf"
+  [
+   (set (match_operand:V4SF 0 "s_register_operand" "=w")
+	(unspec:V4SF [(match_operand:V8HF 1 "s_register_operand" "w")]
+	 VCVTTQ_F32_F16))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vcvtt.f32.f16 %q0, %q1"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vcvtbq_f32_f16])
+;;
+(define_insn "mve_vcvtbq_f32_f16v4sf"
+  [
+   (set (match_operand:V4SF 0 "s_register_operand" "=w")
+	(unspec:V4SF [(match_operand:V8HF 1 "s_register_operand" "w")]
+	 VCVTBQ_F32_F16))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vcvtb.f32.f16 %q0, %q1"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vcvtq_to_f_s, vcvtq_to_f_u])
+;;
+(define_insn "mve_vcvtq_to_f_<supf><mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:<MVE_CNVT> 1 "s_register_operand" "w")]
+	 VCVTQ_TO_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vcvt.f%#<V_sz_elem>.<supf>%#<V_sz_elem>       %q0, %q1"
+  [(set_attr "type" "mve_move")
+])
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..c0b19dc2acf36a89705ab7a371925f1f35fe60ba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16x8_t a)
+{
+  return vabsq_f16 (a);
+}
+
+/* { dg-final { scan-assembler "vabs.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..59c3b5ab2a4a4269f2fc94408d45a44e60c2db17
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32x4_t a)
+{
+  return vabsq_f32 (a);
+}
+
+/* { dg-final { scan-assembler "vabs.f32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtbq_f32_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtbq_f32_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..0f6c0b56a364515c7804acbb38e2402833006474
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtbq_f32_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float16x8_t a)
+{
+  return vcvtbq_f32_f16 (a);
+}
+
+/* { dg-final { scan-assembler "vcvtb.f32.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f16_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f16_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..45be0573781bebd1b0321eb63ae9717d25b2e95f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f16_s16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (int16x8_t a)
+{
+  return vcvtq_f16_s16 (a);
+}
+
+/* { dg-final { scan-assembler "vcvt.f16.s16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f16_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f16_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..17b42c5771e5aab6a8c2dfef7c71c8616c5116f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f16_u16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (uint16x8_t a)
+{
+  return vcvtq_f16_u16 (a);
+}
+
+/* { dg-final { scan-assembler "vcvt.f16.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f32_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f32_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..0e7dbd22effb90a47b53b7a82cbcea4d92e587a2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f32_s32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (int32x4_t a)
+{
+  return vcvtq_f32_s32 (a);
+}
+
+/* { dg-final { scan-assembler "vcvt.f32.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f32_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f32_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..2641d71558cd810a0dabc14a0fc038038fa6e2f4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f32_u32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (uint32x4_t a)
+{
+  return vcvtq_f32_u32 (a);
+}
+
+/* { dg-final { scan-assembler "vcvt.f32.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvttq_f32_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvttq_f32_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..f3983cf3aa4ec9a7489df84f895f10b365c6acd4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvttq_f32_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float16x8_t a)
+{
+  return vcvttq_f32_f16 (a);
+}
+
+/* { dg-final { scan-assembler "vcvtt.f32.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..a2b7bebddeb0ed92e4b674405d8a23a397234b15
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16_t a)
+{
+  return vdupq_n_f16 (a);
+}
+
+/* { dg-final { scan-assembler "vdup.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..05ffc960217f8781d87870a3d100a1008c41ed86
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32_t a)
+{
+  return vdupq_n_f32 (a);
+}
+
+/* { dg-final { scan-assembler "vdup.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vnegq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vnegq_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..8a6e9a793c9ea12a74a08cb905a6f7cb6a4d6350
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vnegq_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16x8_t a)
+{
+  return vnegq_f16 (a);
+}
+
+/* { dg-final { scan-assembler "vneg.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vnegq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vnegq_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..63e94ca3eac1dda65aaf2d06309654560d0756f3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vnegq_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32x4_t a)
+{
+  return vnegq_f32 (a);
+}
+
+/* { dg-final { scan-assembler "vneg.f32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev32q_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev32q_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..21c666844179e723484678af5d8123a418177090
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev32q_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16x8_t a)
+{
+  return vrev32q_f16 (a);
+}
+
+/* { dg-final { scan-assembler "vrev32.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..70db03e0fe4573961778a5290e9b585044faa393
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16x8_t a)
+{
+  return vrev64q_f16 (a);
+}
+
+/* { dg-final { scan-assembler "vrev64.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..e9686c63d162a8919bd24d0227f9e6acb6278142
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32x4_t a)
+{
+  return vrev64q_f32 (a);
+}
+
+/* { dg-final { scan-assembler "vrev64.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndaq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndaq_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..9c51833855e772346b2d1df3760de511aa58d1ca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndaq_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16x8_t a)
+{
+  return vrndaq_f16 (a);
+}
+
+/* { dg-final { scan-assembler "vrinta.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndaq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndaq_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..0f2c76f38ea950a36a6e243ce656f334c46e5fa5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndaq_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32x4_t a)
+{
+  return vrndaq_f32 (a);
+}
+
+/* { dg-final { scan-assembler "vrinta.f32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndmq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndmq_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..936b20cf9eda0869f828880dc4ca6cf8d22d05dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndmq_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16x8_t a)
+{
+  return vrndmq_f16 (a);
+}
+
+/* { dg-final { scan-assembler "vrintm.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndmq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndmq_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..ba77c73ebb1aebc5608597739e0be57436f8aebd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndmq_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32x4_t a)
+{
+  return vrndmq_f32 (a);
+}
+
+/* { dg-final { scan-assembler "vrintm.f32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndnq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndnq_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..43dd340e8c529e71cbcd6168adb14f5a29d141b2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndnq_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16x8_t a)
+{
+  return vrndnq_f16 (a);
+}
+
+/* { dg-final { scan-assembler "vrintn.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndnq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndnq_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..a665250c35b1fecb6bd017df8f336c86b6cae179
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndnq_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32x4_t a)
+{
+  return vrndnq_f32 (a);
+}
+
+/* { dg-final { scan-assembler "vrintn.f32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndpq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndpq_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..effa934137adf3e2d5535b1abdd7812d8fa7cc41
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndpq_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16x8_t a)
+{
+  return vrndpq_f16 (a);
+}
+
+/* { dg-final { scan-assembler "vrintp.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndpq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndpq_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..0e1714f89b3a7f5b59a40cfbb9ab3fd72399e988
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndpq_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32x4_t a)
+{
+  return vrndpq_f32 (a);
+}
+
+/* { dg-final { scan-assembler "vrintp.f32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndq_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..0cd74d4849fbb73a0dc7cb64d0f5cbde5ee86927
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndq_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16x8_t a)
+{
+  return vrndq_f16 (a);
+}
+
+/* { dg-final { scan-assembler "vrintz.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndq_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..eacd1d1f2e1c86169f0739ad43b3faf6da764f75
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndq_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32x4_t a)
+{
+  return vrndq_f32 (a);
+}
+
+/* { dg-final { scan-assembler "vrintz.f32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndxq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndxq_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..0836f2233f814e0d53c100f6af42197fa88476a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndxq_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16x8_t a)
+{
+  return vrndxq_f16 (a);
+}
+
+/* { dg-final { scan-assembler "vrintx.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndxq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndxq_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..012a50b6020cd78d5ac1b53d44de0eebedc6b587
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndxq_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32x4_t a)
+{
+  return vrndxq_f32 (a);
+}
+
+/* { dg-final { scan-assembler "vrintx.f32"  }  } */


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][14x]: MVE ACLE whole vector left shift with carry intrinsics.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (22 preceding siblings ...)
  2019-11-14 19:16 ` [PATCH][ARM][GCC][7x]: MVE vreinterpretq and vuninitializedq intrinsics Srinath Parvathaneni
@ 2019-11-14 19:16 ` Srinath Parvathaneni
  2019-11-14 19:16 ` [PATCH][ARM][GCC][5/5x]: MVE ACLE load intrinsics which load a byte, halfword, or word from memory Srinath Parvathaneni
                   ` (14 subsequent siblings)
  38 siblings, 0 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 16988 bytes --]

Hello,

This patch supports following MVE ACLE whole vector left shift with carry intrinsics.

vshlcq_m_s8, vshlcq_m_s16, vshlcq_m_s32, vshlcq_m_u8, vshlcq_m_u16, vshlcq_m_u32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-09  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm_mve.h (vshlcq_m_s8): Define macro.
	(vshlcq_m_u8): Likewise.
	(vshlcq_m_s16): Likewise.
	(vshlcq_m_u16): Likewise.
	(vshlcq_m_s32): Likewise.
	(vshlcq_m_u32): Likewise.
	(__arm_vshlcq_m_s8): Define intrinsic.
	(__arm_vshlcq_m_u8): Likewise.
	(__arm_vshlcq_m_s16): Likewise.
	(__arm_vshlcq_m_u16): Likewise.
	(__arm_vshlcq_m_s32): Likewise.
	(__arm_vshlcq_m_u32): Likewise.
	(vshlcq_m): Define polymorphic variant.
	* config/arm/arm_mve_builtins.def (QUADOP_NONE_NONE_UNONE_IMM_UNONE):
	Use builtin qualifier.
	(QUADOP_UNONE_UNONE_UNONE_IMM_UNONE): Likewise.
	* config/arm/mve.md (mve_vshlcq_m_vec_<supf><mode>): Define RTL pattern.
	(mve_vshlcq_m_carry_<supf><mode>): Likewise.
	(mve_vshlcq_m_<supf><mode>): Likewise.

gcc/testsuite/ChangeLog:

2019-11-09  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vshlcq_m_s16.c: New test.
	* gcc.target/arm/mve/intrinsics/vshlcq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlcq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlcq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlcq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlcq_m_u8.c: Likewise.


###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 2adae7f8b21f44aa3b80231b89bd68bcd0812611..5d385081a0affacc4dd21d010b01bceb38a9b699 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -2542,6 +2542,12 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define urshrl(__p0, __p1) __arm_urshrl(__p0, __p1)
 #define lsll(__p0, __p1) __arm_lsll(__p0, __p1)
 #define asrl(__p0, __p1) __arm_asrl(__p0, __p1)
+#define vshlcq_m_s8(__a,  __b,  __imm, __p) __arm_vshlcq_m_s8(__a,  __b,  __imm, __p)
+#define vshlcq_m_u8(__a,  __b,  __imm, __p) __arm_vshlcq_m_u8(__a,  __b,  __imm, __p)
+#define vshlcq_m_s16(__a,  __b,  __imm, __p) __arm_vshlcq_m_s16(__a,  __b,  __imm, __p)
+#define vshlcq_m_u16(__a,  __b,  __imm, __p) __arm_vshlcq_m_u16(__a,  __b,  __imm, __p)
+#define vshlcq_m_s32(__a,  __b,  __imm, __p) __arm_vshlcq_m_s32(__a,  __b,  __imm, __p)
+#define vshlcq_m_u32(__a,  __b,  __imm, __p) __arm_vshlcq_m_u32(__a,  __b,  __imm, __p)
 #endif
 
 /* For big-endian, GCC's vector indices are reversed within each 64 bits
@@ -16667,6 +16673,60 @@ __arm_srshr (int32_t value, const int shift)
   return __builtin_mve_srshr_si (value, shift);
 }
 
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlcq_m_s8 (int8x16_t __a, uint32_t * __b, const int __imm, mve_pred16_t __p)
+{
+  int8x16_t __res = __builtin_mve_vshlcq_m_vec_sv16qi (__a, *__b, __imm, __p);
+  *__b = __builtin_mve_vshlcq_m_carry_sv16qi (__a, *__b, __imm, __p);
+  return __res;
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlcq_m_u8 (uint8x16_t __a, uint32_t * __b, const int __imm, mve_pred16_t __p)
+{
+  uint8x16_t __res = __builtin_mve_vshlcq_m_vec_uv16qi (__a, *__b, __imm, __p);
+  *__b = __builtin_mve_vshlcq_m_carry_uv16qi (__a, *__b, __imm, __p);
+  return __res;
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlcq_m_s16 (int16x8_t __a, uint32_t * __b, const int __imm, mve_pred16_t __p)
+{
+  int16x8_t __res = __builtin_mve_vshlcq_m_vec_sv8hi (__a, *__b, __imm, __p);
+  *__b = __builtin_mve_vshlcq_m_carry_sv8hi (__a, *__b, __imm, __p);
+  return __res;
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlcq_m_u16 (uint16x8_t __a, uint32_t * __b, const int __imm, mve_pred16_t __p)
+{
+  uint16x8_t __res = __builtin_mve_vshlcq_m_vec_uv8hi (__a, *__b, __imm, __p);
+  *__b = __builtin_mve_vshlcq_m_carry_uv8hi (__a, *__b, __imm, __p);
+  return __res;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlcq_m_s32 (int32x4_t __a, uint32_t * __b, const int __imm, mve_pred16_t __p)
+{
+  int32x4_t __res = __builtin_mve_vshlcq_m_vec_sv4si (__a, *__b, __imm, __p);
+  *__b = __builtin_mve_vshlcq_m_carry_sv4si (__a, *__b, __imm, __p);
+  return __res;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlcq_m_u32 (uint32x4_t __a, uint32_t * __b, const int __imm, mve_pred16_t __p)
+{
+  uint32x4_t __res = __builtin_mve_vshlcq_m_vec_uv4si (__a, *__b, __imm, __p);
+  *__b = __builtin_mve_vshlcq_m_carry_uv4si (__a, *__b, __imm, __p);
+  return __res;
+}
+
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
@@ -26645,6 +26705,16 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint32_t][__ARM_mve_type_uint32x4_t]: __arm_vsetq_lane_u32 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint32x4_t), p2), \
   int (*)[__ARM_mve_type_uint64_t][__ARM_mve_type_uint64x2_t]: __arm_vsetq_lane_u64 (__ARM_mve_coerce(__p0, uint64_t), __ARM_mve_coerce(__p1, uint64x2_t), p2));})
 
+#define vshlcq_m(p0,p1,p2,p3) __arm_vshlcq_m(p0,p1,p2,p3)
+#define __arm_vshlcq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int8x16_t]: __arm_vshlcq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), p1, p2, p3), \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vshlcq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), p1, p2, p3), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vshlcq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), p1, p2, p3), \
+  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vshlcq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), p1, p2, p3), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vshlcq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), p1, p2, p3), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vshlcq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), p1, p2, p3));})
+
 #endif /* MVE Floating point.  */
 
 #ifdef __cplusplus
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index c23fe88ad05f8ee6c03127066fdf5afa593df944..08f2a72df4b26980c473dac91993eb4583c60727 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -890,3 +890,7 @@ VAR1 (UQSHL, urshr_, si)
 VAR1 (UQSHL, urshrl_, di)
 VAR1 (UQSHL, uqshl_, si)
 VAR1 (UQSHL, uqshll_, di)
+VAR3 (QUADOP_NONE_NONE_UNONE_IMM_UNONE, vshlcq_m_vec_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_UNONE_IMM_UNONE, vshlcq_m_carry_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE, vshlcq_m_vec_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE, vshlcq_m_carry_u, v16qi, v8hi, v4si)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index bbd54ad71d0b85fd4711007dd93af7a2788b2cf4..8a51c25ce26405d65aac08044993c92cfb557661 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -215,8 +215,8 @@
 			 VADCQ_M_S VSBCIQ_U VSBCIQ_S VSBCIQ_M_U VSBCIQ_M_S
 			 VSBCQ_U VSBCQ_S VSBCQ_M_U VSBCQ_M_S VADCIQ_U VADCIQ_M_U
 			 VADCIQ_S VADCIQ_M_S VLD2Q VLD4Q VST2Q SRSHRL SRSHR
-			 URSHR URSHRL SQRSHR UQRSHL UQRSHLL_64
-			 UQRSHLL_48 SQRSHRL_64 SQRSHRL_48])
+			 URSHR URSHRL SQRSHR UQRSHL UQRSHLL_64 VSHLCQ_M_U
+			 UQRSHLL_48 SQRSHRL_64 SQRSHRL_48 VSHLCQ_M_S])
 
 (define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF") (V8HF "V8HI")
 			    (V4SF "V4SI")])
@@ -394,7 +394,8 @@
 		       (VADCQ_U "u")  (VADCQ_M_U "u") (VADCQ_S "s")
 		       (VADCIQ_U "u") (VADCIQ_M_U "u") (VADCIQ_S "s")
 		       (VADCIQ_M_S "s") (SQRSHRL_64 "64") (SQRSHRL_48 "48")
-		       (UQRSHLL_64 "64") (UQRSHLL_48 "48")])
+		       (UQRSHLL_64 "64") (UQRSHLL_48 "48") (VSHLCQ_M_S "s")
+		       (VSHLCQ_M_U "u")])
 
 (define_int_attr mode1 [(VCTP8Q "8") (VCTP16Q "16") (VCTP32Q "32")
 			(VCTP64Q "64") (VCTP8Q_M "8") (VCTP16Q_M "16")
@@ -659,6 +660,7 @@
 (define_int_iterator VADCQ_M [VADCQ_M_U VADCQ_M_S])
 (define_int_iterator UQRSHLLQ [UQRSHLL_64 UQRSHLL_48])
 (define_int_iterator SQRSHRLQ [SQRSHRL_64 SQRSHRL_48])
+(define_int_iterator VSHLCQ_M [VSHLCQ_M_S VSHLCQ_M_U])
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
@@ -11127,3 +11129,57 @@
   "TARGET_HAVE_MVE"
   "sqshll%?\\t%Q1, %R1, %2"
   [(set_attr "predicable" "yes")])
+
+;;
+;; [vshlcq_m_u vshlcq_m_s]
+;;
+(define_expand "mve_vshlcq_m_vec_<supf><mode>"
+ [(match_operand:MVE_2 0 "s_register_operand")
+  (match_operand:MVE_2 1 "s_register_operand")
+  (match_operand:SI 2 "s_register_operand")
+  (match_operand:SI 3 "mve_imm_32")
+  (match_operand:HI 4 "vpr_register_operand")
+  (unspec:MVE_2 [(const_int 0)] VSHLCQ_M)]
+ "TARGET_HAVE_MVE"
+{
+  rtx ignore_wb = gen_reg_rtx (SImode);
+  emit_insn (gen_mve_vshlcq_m_<supf><mode> (operands[0], ignore_wb, operands[1],
+					    operands[2], operands[3],
+					    operands[4]));
+  DONE;
+})
+
+(define_expand "mve_vshlcq_m_carry_<supf><mode>"
+ [(match_operand:SI 0 "s_register_operand")
+  (match_operand:MVE_2 1 "s_register_operand")
+  (match_operand:SI 2 "s_register_operand")
+  (match_operand:SI 3 "mve_imm_32")
+  (match_operand:HI 4 "vpr_register_operand")
+  (unspec:MVE_2 [(const_int 0)] VSHLCQ_M)]
+ "TARGET_HAVE_MVE"
+{
+  rtx ignore_vec = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_mve_vshlcq_m_<supf><mode> (ignore_vec, operands[0],
+					    operands[1], operands[2],
+					    operands[3], operands[4]));
+  DONE;
+})
+
+(define_insn "mve_vshlcq_m_<supf><mode>"
+ [(set (match_operand:MVE_2 0 "s_register_operand" "=w")
+       (unspec:MVE_2 [(match_operand:MVE_2 2 "s_register_operand" "0")
+		      (match_operand:SI 3 "s_register_operand" "1")
+		      (match_operand:SI 4 "mve_imm_32" "Rf")
+		      (match_operand:HI 5 "vpr_register_operand" "Up")]
+	VSHLCQ_M))
+  (set (match_operand:SI  1 "s_register_operand" "=r")
+       (unspec:SI [(match_dup 2)
+		   (match_dup 3)
+		   (match_dup 4)
+		   (match_dup 5)]
+	VSHLCQ_M))
+ ]
+ "TARGET_HAVE_MVE"
+ "vpst\;vshlct\t%q0, %1, %4"
+ [(set_attr "type" "mve_move")
+  (set_attr "length" "8")])
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..279c229c3423748f67164bc981f4484bf73172d0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_s16.c
@@ -0,0 +1,23 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int16x8_t a, uint32_t * b, mve_pred16_t p)
+{
+  return vshlcq_m_s16 (a, b, 32, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vshlct"  }  } */
+
+int16x8_t
+foo1 (int16x8_t a, uint32_t * b, mve_pred16_t p)
+{
+  return vshlcq_m (a, b, 32, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vshlct"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..29bc2cb1bcb45bea7f0d6a7a8eb9ab88dbde6eda
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_s32.c
@@ -0,0 +1,23 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t a, uint32_t * b, mve_pred16_t p)
+{
+  return vshlcq_m_s32 (a, b, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vshlct"  }  } */
+
+int32x4_t
+foo1 (int32x4_t a, uint32_t * b, mve_pred16_t p)
+{
+  return vshlcq_m (a, b, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vshlct"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..6253ad84ab74c709b63c17f849ded956c8a78c36
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_s8.c
@@ -0,0 +1,23 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo (int8x16_t a, uint32_t * b, mve_pred16_t p)
+{
+  return vshlcq_m_s8 (a, b, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vshlct"  }  } */
+
+int8x16_t
+foo1 (int8x16_t a, uint32_t * b, mve_pred16_t p)
+{
+  return vshlcq_m (a, b, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vshlct"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..5db18b1ef87cd60cd79dc0915eded03c50a7ab38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_u16.c
@@ -0,0 +1,23 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint16x8_t a, uint32_t * b, mve_pred16_t p)
+{
+  return vshlcq_m_u16 (a, b, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vshlct"  }  } */
+
+uint16x8_t
+foo1 (uint16x8_t a, uint32_t * b, mve_pred16_t p)
+{
+  return vshlcq_m (a, b, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vshlct"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..cab75f2d73b5f4519640b265b7a8c6016788b11b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_u32.c
@@ -0,0 +1,23 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t a, uint32_t * b, mve_pred16_t p)
+{
+  return vshlcq_m_u32 (a, b, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vshlct"  }  } */
+
+uint32x4_t
+foo1 (uint32x4_t a, uint32_t * b, mve_pred16_t p)
+{
+  return vshlcq_m (a, b, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vshlct"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..98a1470aee4240b585113369e63a2c0129d7dc26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_u8.c
@@ -0,0 +1,23 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint8x16_t a, uint32_t * b, mve_pred16_t p)
+{
+  return vshlcq_m_u8 (a, b, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vshlct"  }  } */
+
+uint8x16_t
+foo1 (uint8x16_t a, uint32_t * b, mve_pred16_t p)
+{
+  return vshlcq_m (a, b, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vshlct"  }  } */


[-- Attachment #2: diff37.patch --]
[-- Type: text/plain, Size: 14562 bytes --]

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 2adae7f8b21f44aa3b80231b89bd68bcd0812611..5d385081a0affacc4dd21d010b01bceb38a9b699 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -2542,6 +2542,12 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define urshrl(__p0, __p1) __arm_urshrl(__p0, __p1)
 #define lsll(__p0, __p1) __arm_lsll(__p0, __p1)
 #define asrl(__p0, __p1) __arm_asrl(__p0, __p1)
+#define vshlcq_m_s8(__a,  __b,  __imm, __p) __arm_vshlcq_m_s8(__a,  __b,  __imm, __p)
+#define vshlcq_m_u8(__a,  __b,  __imm, __p) __arm_vshlcq_m_u8(__a,  __b,  __imm, __p)
+#define vshlcq_m_s16(__a,  __b,  __imm, __p) __arm_vshlcq_m_s16(__a,  __b,  __imm, __p)
+#define vshlcq_m_u16(__a,  __b,  __imm, __p) __arm_vshlcq_m_u16(__a,  __b,  __imm, __p)
+#define vshlcq_m_s32(__a,  __b,  __imm, __p) __arm_vshlcq_m_s32(__a,  __b,  __imm, __p)
+#define vshlcq_m_u32(__a,  __b,  __imm, __p) __arm_vshlcq_m_u32(__a,  __b,  __imm, __p)
 #endif
 
 /* For big-endian, GCC's vector indices are reversed within each 64 bits
@@ -16667,6 +16673,60 @@ __arm_srshr (int32_t value, const int shift)
   return __builtin_mve_srshr_si (value, shift);
 }
 
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlcq_m_s8 (int8x16_t __a, uint32_t * __b, const int __imm, mve_pred16_t __p)
+{
+  int8x16_t __res = __builtin_mve_vshlcq_m_vec_sv16qi (__a, *__b, __imm, __p);
+  *__b = __builtin_mve_vshlcq_m_carry_sv16qi (__a, *__b, __imm, __p);
+  return __res;
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlcq_m_u8 (uint8x16_t __a, uint32_t * __b, const int __imm, mve_pred16_t __p)
+{
+  uint8x16_t __res = __builtin_mve_vshlcq_m_vec_uv16qi (__a, *__b, __imm, __p);
+  *__b = __builtin_mve_vshlcq_m_carry_uv16qi (__a, *__b, __imm, __p);
+  return __res;
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlcq_m_s16 (int16x8_t __a, uint32_t * __b, const int __imm, mve_pred16_t __p)
+{
+  int16x8_t __res = __builtin_mve_vshlcq_m_vec_sv8hi (__a, *__b, __imm, __p);
+  *__b = __builtin_mve_vshlcq_m_carry_sv8hi (__a, *__b, __imm, __p);
+  return __res;
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlcq_m_u16 (uint16x8_t __a, uint32_t * __b, const int __imm, mve_pred16_t __p)
+{
+  uint16x8_t __res = __builtin_mve_vshlcq_m_vec_uv8hi (__a, *__b, __imm, __p);
+  *__b = __builtin_mve_vshlcq_m_carry_uv8hi (__a, *__b, __imm, __p);
+  return __res;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlcq_m_s32 (int32x4_t __a, uint32_t * __b, const int __imm, mve_pred16_t __p)
+{
+  int32x4_t __res = __builtin_mve_vshlcq_m_vec_sv4si (__a, *__b, __imm, __p);
+  *__b = __builtin_mve_vshlcq_m_carry_sv4si (__a, *__b, __imm, __p);
+  return __res;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlcq_m_u32 (uint32x4_t __a, uint32_t * __b, const int __imm, mve_pred16_t __p)
+{
+  uint32x4_t __res = __builtin_mve_vshlcq_m_vec_uv4si (__a, *__b, __imm, __p);
+  *__b = __builtin_mve_vshlcq_m_carry_uv4si (__a, *__b, __imm, __p);
+  return __res;
+}
+
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
@@ -26645,6 +26705,16 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint32_t][__ARM_mve_type_uint32x4_t]: __arm_vsetq_lane_u32 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint32x4_t), p2), \
   int (*)[__ARM_mve_type_uint64_t][__ARM_mve_type_uint64x2_t]: __arm_vsetq_lane_u64 (__ARM_mve_coerce(__p0, uint64_t), __ARM_mve_coerce(__p1, uint64x2_t), p2));})
 
+#define vshlcq_m(p0,p1,p2,p3) __arm_vshlcq_m(p0,p1,p2,p3)
+#define __arm_vshlcq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int8x16_t]: __arm_vshlcq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), p1, p2, p3), \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vshlcq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), p1, p2, p3), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vshlcq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), p1, p2, p3), \
+  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vshlcq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), p1, p2, p3), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vshlcq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), p1, p2, p3), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vshlcq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), p1, p2, p3));})
+
 #endif /* MVE Floating point.  */
 
 #ifdef __cplusplus
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index c23fe88ad05f8ee6c03127066fdf5afa593df944..08f2a72df4b26980c473dac91993eb4583c60727 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -890,3 +890,7 @@ VAR1 (UQSHL, urshr_, si)
 VAR1 (UQSHL, urshrl_, di)
 VAR1 (UQSHL, uqshl_, si)
 VAR1 (UQSHL, uqshll_, di)
+VAR3 (QUADOP_NONE_NONE_UNONE_IMM_UNONE, vshlcq_m_vec_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_NONE_NONE_UNONE_IMM_UNONE, vshlcq_m_carry_s, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE, vshlcq_m_vec_u, v16qi, v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE, vshlcq_m_carry_u, v16qi, v8hi, v4si)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index bbd54ad71d0b85fd4711007dd93af7a2788b2cf4..8a51c25ce26405d65aac08044993c92cfb557661 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -215,8 +215,8 @@
 			 VADCQ_M_S VSBCIQ_U VSBCIQ_S VSBCIQ_M_U VSBCIQ_M_S
 			 VSBCQ_U VSBCQ_S VSBCQ_M_U VSBCQ_M_S VADCIQ_U VADCIQ_M_U
 			 VADCIQ_S VADCIQ_M_S VLD2Q VLD4Q VST2Q SRSHRL SRSHR
-			 URSHR URSHRL SQRSHR UQRSHL UQRSHLL_64
-			 UQRSHLL_48 SQRSHRL_64 SQRSHRL_48])
+			 URSHR URSHRL SQRSHR UQRSHL UQRSHLL_64 VSHLCQ_M_U
+			 UQRSHLL_48 SQRSHRL_64 SQRSHRL_48 VSHLCQ_M_S])
 
 (define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF") (V8HF "V8HI")
 			    (V4SF "V4SI")])
@@ -394,7 +394,8 @@
 		       (VADCQ_U "u")  (VADCQ_M_U "u") (VADCQ_S "s")
 		       (VADCIQ_U "u") (VADCIQ_M_U "u") (VADCIQ_S "s")
 		       (VADCIQ_M_S "s") (SQRSHRL_64 "64") (SQRSHRL_48 "48")
-		       (UQRSHLL_64 "64") (UQRSHLL_48 "48")])
+		       (UQRSHLL_64 "64") (UQRSHLL_48 "48") (VSHLCQ_M_S "s")
+		       (VSHLCQ_M_U "u")])
 
 (define_int_attr mode1 [(VCTP8Q "8") (VCTP16Q "16") (VCTP32Q "32")
 			(VCTP64Q "64") (VCTP8Q_M "8") (VCTP16Q_M "16")
@@ -659,6 +660,7 @@
 (define_int_iterator VADCQ_M [VADCQ_M_U VADCQ_M_S])
 (define_int_iterator UQRSHLLQ [UQRSHLL_64 UQRSHLL_48])
 (define_int_iterator SQRSHRLQ [SQRSHRL_64 SQRSHRL_48])
+(define_int_iterator VSHLCQ_M [VSHLCQ_M_S VSHLCQ_M_U])
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
@@ -11127,3 +11129,57 @@
   "TARGET_HAVE_MVE"
   "sqshll%?\\t%Q1, %R1, %2"
   [(set_attr "predicable" "yes")])
+
+;;
+;; [vshlcq_m_u vshlcq_m_s]
+;;
+(define_expand "mve_vshlcq_m_vec_<supf><mode>"
+ [(match_operand:MVE_2 0 "s_register_operand")
+  (match_operand:MVE_2 1 "s_register_operand")
+  (match_operand:SI 2 "s_register_operand")
+  (match_operand:SI 3 "mve_imm_32")
+  (match_operand:HI 4 "vpr_register_operand")
+  (unspec:MVE_2 [(const_int 0)] VSHLCQ_M)]
+ "TARGET_HAVE_MVE"
+{
+  rtx ignore_wb = gen_reg_rtx (SImode);
+  emit_insn (gen_mve_vshlcq_m_<supf><mode> (operands[0], ignore_wb, operands[1],
+					    operands[2], operands[3],
+					    operands[4]));
+  DONE;
+})
+
+(define_expand "mve_vshlcq_m_carry_<supf><mode>"
+ [(match_operand:SI 0 "s_register_operand")
+  (match_operand:MVE_2 1 "s_register_operand")
+  (match_operand:SI 2 "s_register_operand")
+  (match_operand:SI 3 "mve_imm_32")
+  (match_operand:HI 4 "vpr_register_operand")
+  (unspec:MVE_2 [(const_int 0)] VSHLCQ_M)]
+ "TARGET_HAVE_MVE"
+{
+  rtx ignore_vec = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_mve_vshlcq_m_<supf><mode> (ignore_vec, operands[0],
+					    operands[1], operands[2],
+					    operands[3], operands[4]));
+  DONE;
+})
+
+(define_insn "mve_vshlcq_m_<supf><mode>"
+ [(set (match_operand:MVE_2 0 "s_register_operand" "=w")
+       (unspec:MVE_2 [(match_operand:MVE_2 2 "s_register_operand" "0")
+		      (match_operand:SI 3 "s_register_operand" "1")
+		      (match_operand:SI 4 "mve_imm_32" "Rf")
+		      (match_operand:HI 5 "vpr_register_operand" "Up")]
+	VSHLCQ_M))
+  (set (match_operand:SI  1 "s_register_operand" "=r")
+       (unspec:SI [(match_dup 2)
+		   (match_dup 3)
+		   (match_dup 4)
+		   (match_dup 5)]
+	VSHLCQ_M))
+ ]
+ "TARGET_HAVE_MVE"
+ "vpst\;vshlct\t%q0, %1, %4"
+ [(set_attr "type" "mve_move")
+  (set_attr "length" "8")])
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..279c229c3423748f67164bc981f4484bf73172d0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_s16.c
@@ -0,0 +1,23 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int16x8_t a, uint32_t * b, mve_pred16_t p)
+{
+  return vshlcq_m_s16 (a, b, 32, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vshlct"  }  } */
+
+int16x8_t
+foo1 (int16x8_t a, uint32_t * b, mve_pred16_t p)
+{
+  return vshlcq_m (a, b, 32, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vshlct"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..29bc2cb1bcb45bea7f0d6a7a8eb9ab88dbde6eda
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_s32.c
@@ -0,0 +1,23 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t a, uint32_t * b, mve_pred16_t p)
+{
+  return vshlcq_m_s32 (a, b, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vshlct"  }  } */
+
+int32x4_t
+foo1 (int32x4_t a, uint32_t * b, mve_pred16_t p)
+{
+  return vshlcq_m (a, b, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vshlct"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..6253ad84ab74c709b63c17f849ded956c8a78c36
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_s8.c
@@ -0,0 +1,23 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo (int8x16_t a, uint32_t * b, mve_pred16_t p)
+{
+  return vshlcq_m_s8 (a, b, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vshlct"  }  } */
+
+int8x16_t
+foo1 (int8x16_t a, uint32_t * b, mve_pred16_t p)
+{
+  return vshlcq_m (a, b, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vshlct"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..5db18b1ef87cd60cd79dc0915eded03c50a7ab38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_u16.c
@@ -0,0 +1,23 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint16x8_t a, uint32_t * b, mve_pred16_t p)
+{
+  return vshlcq_m_u16 (a, b, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vshlct"  }  } */
+
+uint16x8_t
+foo1 (uint16x8_t a, uint32_t * b, mve_pred16_t p)
+{
+  return vshlcq_m (a, b, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vshlct"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..cab75f2d73b5f4519640b265b7a8c6016788b11b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_u32.c
@@ -0,0 +1,23 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t a, uint32_t * b, mve_pred16_t p)
+{
+  return vshlcq_m_u32 (a, b, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vshlct"  }  } */
+
+uint32x4_t
+foo1 (uint32x4_t a, uint32_t * b, mve_pred16_t p)
+{
+  return vshlcq_m (a, b, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vshlct"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..98a1470aee4240b585113369e63a2c0129d7dc26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_m_u8.c
@@ -0,0 +1,23 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint8x16_t a, uint32_t * b, mve_pred16_t p)
+{
+  return vshlcq_m_u8 (a, b, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vshlct"  }  } */
+
+uint8x16_t
+foo1 (uint8x16_t a, uint32_t * b, mve_pred16_t p)
+{
+  return vshlcq_m (a, b, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vshlct"  }  } */


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][4/4x]: MVE intrinsics with quaternary operands.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (34 preceding siblings ...)
  2019-11-14 19:16 ` [PATCH][ARM][GCC][11x]: MVE ACLE vector interleaving store and deinterleaving load intrinsics and also aliases to vstr and vldr intrinsics Srinath Parvathaneni
@ 2019-11-14 19:16 ` Srinath Parvathaneni
  2019-11-14 19:16 ` [PATCH][ARM][GCC][2/3x]: MVE intrinsics with ternary operands Srinath Parvathaneni
                   ` (2 subsequent siblings)
  38 siblings, 0 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 11874 bytes --]

Hello,

This patch supports following MVE ACLE intrinsics with quaternary operands.

vabdq_m_f32, vabdq_m_f16, vaddq_m_f32, vaddq_m_f16, vaddq_m_n_f32, vaddq_m_n_f16,
vandq_m_f32, vandq_m_f16, vbicq_m_f32, vbicq_m_f16, vbrsrq_m_n_f32, vbrsrq_m_n_f16,
vcaddq_rot270_m_f32, vcaddq_rot270_m_f16, vcaddq_rot90_m_f32, vcaddq_rot90_m_f16,
vcmlaq_m_f32, vcmlaq_m_f16, vcmlaq_rot180_m_f32, vcmlaq_rot180_m_f16,
vcmlaq_rot270_m_f32, vcmlaq_rot270_m_f16, vcmlaq_rot90_m_f32, vcmlaq_rot90_m_f16,
vcmulq_m_f32, vcmulq_m_f16, vcmulq_rot180_m_f32, vcmulq_rot180_m_f16,
vcmulq_rot270_m_f32, vcmulq_rot270_m_f16, vcmulq_rot90_m_f32, vcmulq_rot90_m_f16,
vcvtq_m_n_s32_f32, vcvtq_m_n_s16_f16, vcvtq_m_n_u32_f32, vcvtq_m_n_u16_f16,
veorq_m_f32, veorq_m_f16, vfmaq_m_f32, vfmaq_m_f16, vfmaq_m_n_f32, vfmaq_m_n_f16,
vfmasq_m_n_f32, vfmasq_m_n_f16, vfmsq_m_f32, vfmsq_m_f16, vmaxnmq_m_f32,
vmaxnmq_m_f16, vminnmq_m_f32, vminnmq_m_f16, vmulq_m_f32, vmulq_m_f16,
vmulq_m_n_f32, vmulq_m_n_f16, vornq_m_f32, vornq_m_f16, vorrq_m_f32, vorrq_m_f16,
vsubq_m_f32, vsubq_m_f16, vsubq_m_n_f32, vsubq_m_n_f16.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1]  https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-10-31  Andre Vieira  <andre.simoesdiasvieira@arm.com>
            Mihail Ionescu  <mihail.ionescu@arm.com>
            Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm_mve.h (vabdq_m_f32): Define macro.
	(vabdq_m_f16): Likewise.
	(vaddq_m_f32): Likewise.
	(vaddq_m_f16): Likewise.
	(vaddq_m_n_f32): Likewise.
	(vaddq_m_n_f16): Likewise.
	(vandq_m_f32): Likewise.
	(vandq_m_f16): Likewise.
	(vbicq_m_f32): Likewise.
	(vbicq_m_f16): Likewise.
	(vbrsrq_m_n_f32): Likewise.
	(vbrsrq_m_n_f16): Likewise.
	(vcaddq_rot270_m_f32): Likewise.
	(vcaddq_rot270_m_f16): Likewise.
	(vcaddq_rot90_m_f32): Likewise.
	(vcaddq_rot90_m_f16): Likewise.
	(vcmlaq_m_f32): Likewise.
	(vcmlaq_m_f16): Likewise.
	(vcmlaq_rot180_m_f32): Likewise.
	(vcmlaq_rot180_m_f16): Likewise.
	(vcmlaq_rot270_m_f32): Likewise.
	(vcmlaq_rot270_m_f16): Likewise.
	(vcmlaq_rot90_m_f32): Likewise.
	(vcmlaq_rot90_m_f16): Likewise.
	(vcmulq_m_f32): Likewise.
	(vcmulq_m_f16): Likewise.
	(vcmulq_rot180_m_f32): Likewise.
	(vcmulq_rot180_m_f16): Likewise.
	(vcmulq_rot270_m_f32): Likewise.
	(vcmulq_rot270_m_f16): Likewise.
	(vcmulq_rot90_m_f32): Likewise.
	(vcmulq_rot90_m_f16): Likewise.
	(vcvtq_m_n_s32_f32): Likewise.
	(vcvtq_m_n_s16_f16): Likewise.
	(vcvtq_m_n_u32_f32): Likewise.
	(vcvtq_m_n_u16_f16): Likewise.
	(veorq_m_f32): Likewise.
	(veorq_m_f16): Likewise.
	(vfmaq_m_f32): Likewise.
	(vfmaq_m_f16): Likewise.
	(vfmaq_m_n_f32): Likewise.
	(vfmaq_m_n_f16): Likewise.
	(vfmasq_m_n_f32): Likewise.
	(vfmasq_m_n_f16): Likewise.
	(vfmsq_m_f32): Likewise.
	(vfmsq_m_f16): Likewise.
	(vmaxnmq_m_f32): Likewise.
	(vmaxnmq_m_f16): Likewise.
	(vminnmq_m_f32): Likewise.
	(vminnmq_m_f16): Likewise.
	(vmulq_m_f32): Likewise.
	(vmulq_m_f16): Likewise.
	(vmulq_m_n_f32): Likewise.
	(vmulq_m_n_f16): Likewise.
	(vornq_m_f32): Likewise.
	(vornq_m_f16): Likewise.
	(vorrq_m_f32): Likewise.
	(vorrq_m_f16): Likewise.
	(vsubq_m_f32): Likewise.
	(vsubq_m_f16): Likewise.
	(vsubq_m_n_f32): Likewise.
	(vsubq_m_n_f16): Likewise.
	(__attribute__): Likewise.
	(__arm_vabdq_m_f32): Likewise.
	(__arm_vabdq_m_f16): Likewise.
	(__arm_vaddq_m_f32): Likewise.
	(__arm_vaddq_m_f16): Likewise.
	(__arm_vaddq_m_n_f32): Likewise.
	(__arm_vaddq_m_n_f16): Likewise.
	(__arm_vandq_m_f32): Likewise.
	(__arm_vandq_m_f16): Likewise.
	(__arm_vbicq_m_f32): Likewise.
	(__arm_vbicq_m_f16): Likewise.
	(__arm_vbrsrq_m_n_f32): Likewise.
	(__arm_vbrsrq_m_n_f16): Likewise.
	(__arm_vcaddq_rot270_m_f32): Likewise.
	(__arm_vcaddq_rot270_m_f16): Likewise.
	(__arm_vcaddq_rot90_m_f32): Likewise.
	(__arm_vcaddq_rot90_m_f16): Likewise.
	(__arm_vcmlaq_m_f32): Likewise.
	(__arm_vcmlaq_m_f16): Likewise.
	(__arm_vcmlaq_rot180_m_f32): Likewise.
	(__arm_vcmlaq_rot180_m_f16): Likewise.
	(__arm_vcmlaq_rot270_m_f32): Likewise.
	(__arm_vcmlaq_rot270_m_f16): Likewise.
	(__arm_vcmlaq_rot90_m_f32): Likewise.
	(__arm_vcmlaq_rot90_m_f16): Likewise.
	(__arm_vcmulq_m_f32): Likewise.
	(__arm_vcmulq_m_f16): Likewise.
	(__arm_vcmulq_rot180_m_f32): Define intrinsic.
	(__arm_vcmulq_rot180_m_f16): Likewise.
	(__arm_vcmulq_rot270_m_f32): Likewise.
	(__arm_vcmulq_rot270_m_f16): Likewise.
	(__arm_vcmulq_rot90_m_f32): Likewise.
	(__arm_vcmulq_rot90_m_f16): Likewise.
	(__arm_vcvtq_m_n_s32_f32): Likewise.
	(__arm_vcvtq_m_n_s16_f16): Likewise.
	(__arm_vcvtq_m_n_u32_f32): Likewise.
	(__arm_vcvtq_m_n_u16_f16): Likewise.
	(__arm_veorq_m_f32): Likewise.
	(__arm_veorq_m_f16): Likewise.
	(__arm_vfmaq_m_f32): Likewise.
	(__arm_vfmaq_m_f16): Likewise.
	(__arm_vfmaq_m_n_f32): Likewise.
	(__arm_vfmaq_m_n_f16): Likewise.
	(__arm_vfmasq_m_n_f32): Likewise.
	(__arm_vfmasq_m_n_f16): Likewise.
	(__arm_vfmsq_m_f32): Likewise.
	(__arm_vfmsq_m_f16): Likewise.
	(__arm_vmaxnmq_m_f32): Likewise.
	(__arm_vmaxnmq_m_f16): Likewise.
	(__arm_vminnmq_m_f32): Likewise.
	(__arm_vminnmq_m_f16): Likewise.
	(__arm_vmulq_m_f32): Likewise.
	(__arm_vmulq_m_f16): Likewise.
	(__arm_vmulq_m_n_f32): Likewise.
	(__arm_vmulq_m_n_f16): Likewise.
	(__arm_vornq_m_f32): Likewise.
	(__arm_vornq_m_f16): Likewise.
	(__arm_vorrq_m_f32): Likewise.
	(__arm_vorrq_m_f16): Likewise.
	(__arm_vsubq_m_f32): Likewise.
	(__arm_vsubq_m_f16): Likewise.
	(__arm_vsubq_m_n_f32): Likewise.
	(__arm_vsubq_m_n_f16): Likewise.
	(vabdq_m): Define polymorphic variant.
	(vaddq_m): Likewise.
	(vaddq_m_n): Likewise.
	(vandq_m): Likewise.
	(vbicq_m): Likewise.
	(vbrsrq_m_n): Likewise.
	(vcaddq_rot270_m): Likewise.
	(vcaddq_rot90_m): Likewise.
	(vcmlaq_m): Likewise.
	(vcmlaq_rot180_m): Likewise.
	(vcmlaq_rot270_m): Likewise.
	(vcmlaq_rot90_m): Likewise.
	(vcmulq_m): Likewise.
	(vcmulq_rot180_m): Likewise.
	(vcmulq_rot270_m): Likewise.
	(vcmulq_rot90_m): Likewise.
	(veorq_m): Likewise.
	(vfmaq_m): Likewise.
	(vfmaq_m_n): Likewise.
	(vfmasq_m_n): Likewise.
	(vfmsq_m): Likewise.
	(vmaxnmq_m): Likewise.
	(vminnmq_m): Likewise.
	(vmulq_m): Likewise.
	(vmulq_m_n): Likewise.
	(vornq_m): Likewise.
	(vsubq_m): Likewise.
	(vsubq_m_n): Likewise.
	(vorrq_m): Likewise.
	* config/arm/arm_mve_builtins.def (QUADOP_NONE_NONE_NONE_IMM_UNONE): Use
	builtin qualifier.
	(QUADOP_NONE_NONE_NONE_NONE_UNONE): Likewise.
	(QUADOP_UNONE_UNONE_NONE_IMM_UNONE): Likewise.
	* config/arm/mve.md (mve_vabdq_m_f<mode>): Define RTL pattern.
	(mve_vaddq_m_f<mode>): Likewise.
	(mve_vaddq_m_n_f<mode>): Likewise.
	(mve_vandq_m_f<mode>): Likewise.
	(mve_vbicq_m_f<mode>): Likewise.
	(mve_vbrsrq_m_n_f<mode>): Likewise.
	(mve_vcaddq_rot270_m_f<mode>): Likewise.
	(mve_vcaddq_rot90_m_f<mode>): Likewise.
	(mve_vcmlaq_m_f<mode>): Likewise.
	(mve_vcmlaq_rot180_m_f<mode>): Likewise.
	(mve_vcmlaq_rot270_m_f<mode>): Likewise.
	(mve_vcmlaq_rot90_m_f<mode>): Likewise.
	(mve_vcmulq_m_f<mode>): Likewise.
	(mve_vcmulq_rot180_m_f<mode>): Likewise.
	(mve_vcmulq_rot270_m_f<mode>): Likewise.
	(mve_vcmulq_rot90_m_f<mode>): Likewise.
	(mve_veorq_m_f<mode>): Likewise.
	(mve_vfmaq_m_f<mode>): Likewise.
	(mve_vfmaq_m_n_f<mode>): Likewise.
	(mve_vfmasq_m_n_f<mode>): Likewise.
	(mve_vfmsq_m_f<mode>): Likewise.
	(mve_vmaxnmq_m_f<mode>): Likewise.
	(mve_vminnmq_m_f<mode>): Likewise.
	(mve_vmulq_m_f<mode>): Likewise.
	(mve_vmulq_m_n_f<mode>): Likewise.
	(mve_vornq_m_f<mode>): Likewise.
	(mve_vorrq_m_f<mode>): Likewise.
	(mve_vsubq_m_f<mode>): Likewise.
	(mve_vsubq_m_n_f<mode>): Likewise.

gcc/testsuite/ChangeLog:

2019-10-31  Andre Vieira  <andre.simoesdiasvieira@arm.com>
            Mihail Ionescu  <mihail.ionescu@arm.com>
            Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vabdq_m_f16.c: New test.
	* gcc.target/arm/mve/intrinsics/vabdq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_m_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vandq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vandq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbrsrq_m_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbrsrq_m_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot270_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot270_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot90_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot90_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmlaq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmlaq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmlaq_rot180_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmlaq_rot180_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmlaq_rot270_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmlaq_rot270_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmlaq_rot90_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmlaq_rot90_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmulq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmulq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmulq_rot180_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmulq_rot180_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmulq_rot270_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmulq_rot270_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmulq_rot90_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmulq_rot90_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_m_n_s16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_m_n_s32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_m_n_u16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_m_n_u32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/veorq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/veorq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vfmaq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vfmaq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vfmaq_m_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vfmaq_m_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vfmasq_m_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vfmasq_m_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vfmsq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vfmsq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_m_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vornq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vornq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_m_n_f32.c: Likewise.

[-- Attachment #2: diff19.patch.gz --]
[-- Type: application/gzip, Size: 8955 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][12x]: MVE ACLE intrinsics to set and get vector lane.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (24 preceding siblings ...)
  2019-11-14 19:16 ` [PATCH][ARM][GCC][5/5x]: MVE ACLE load intrinsics which load a byte, halfword, or word from memory Srinath Parvathaneni
@ 2019-11-14 19:16 ` Srinath Parvathaneni
  2019-11-14 19:16 ` [PATCH][ARM][GCC][6/5x]: Remaining MVE load intrinsics which loads half word and word or double word from memory Srinath Parvathaneni
                   ` (12 subsequent siblings)
  38 siblings, 0 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 43257 bytes --]

Hello,

This patch supports following MVE ACLE intrinsics to get and set vector lane.

vsetq_lane_f16, vsetq_lane_f32, vsetq_lane_s16, vsetq_lane_s32, vsetq_lane_s8,
vsetq_lane_s64, vsetq_lane_u8, vsetq_lane_u16, vsetq_lane_u32, vsetq_lane_u64,
vgetq_lane_f16, vgetq_lane_f32, vgetq_lane_s16, vgetq_lane_s32, vgetq_lane_s8,
vgetq_lane_s64, vgetq_lane_u8, vgetq_lane_u16, vgetq_lane_u32, vgetq_lane_u64.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-08  Andre Vieira  <andre.simoesdiasvieira@arm.com>
            Mihail Ionescu  <mihail.ionescu@arm.com>
            Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm_mve.h (vsetq_lane_f16): Define macro.
	(vsetq_lane_f32): Likewise.
	(vsetq_lane_s16): Likewise.
	(vsetq_lane_s32): Likewise.
	(vsetq_lane_s8): Likewise.
	(vsetq_lane_s64): Likewise.
	(vsetq_lane_u8): Likewise.
	(vsetq_lane_u16): Likewise.
	(vsetq_lane_u32): Likewise.
	(vsetq_lane_u64): Likewise.
	(vgetq_lane_f16): Likewise.
	(vgetq_lane_f32): Likewise.
	(vgetq_lane_s16): Likewise.
	(vgetq_lane_s32): Likewise.
	(vgetq_lane_s8): Likewise.
	(vgetq_lane_s64): Likewise.
	(vgetq_lane_u8): Likewise.
	(vgetq_lane_u16): Likewise.
	(vgetq_lane_u32): Likewise.
	(vgetq_lane_u64): Likewise.
	(__ARM_NUM_LANES): Likewise.
	(__ARM_LANEQ): Likewise.
	(__ARM_CHECK_LANEQ): Likewise.
	(__arm_vsetq_lane_s16): Define intrinsic.
	(__arm_vsetq_lane_s32): Likewise.
	(__arm_vsetq_lane_s8): Likewise.
	(__arm_vsetq_lane_s64): Likewise.
	(__arm_vsetq_lane_u8): Likewise.
	(__arm_vsetq_lane_u16): Likewise.
	(__arm_vsetq_lane_u32): Likewise.
	(__arm_vsetq_lane_u64): Likewise.
	(__arm_vgetq_lane_s16): Likewise.
	(__arm_vgetq_lane_s32): Likewise.
	(__arm_vgetq_lane_s8): Likewise.
	(__arm_vgetq_lane_s64): Likewise.
	(__arm_vgetq_lane_u8): Likewise.
	(__arm_vgetq_lane_u16): Likewise.
	(__arm_vgetq_lane_u32): Likewise.
	(__arm_vgetq_lane_u64): Likewise.
	(__arm_vsetq_lane_f16): Likewise.
	(__arm_vsetq_lane_f32): Likewise.
	(__arm_vgetq_lane_f16): Likewise.
	(__arm_vgetq_lane_f32): Likewise.
	(vgetq_lane): Define polymorphic variant.
	(vsetq_lane): Likewise.
	* config/arm/mve.md (mve_vec_extract<mode><V_elem_l>): Define RTL
	pattern.
	(mve_vec_extractv2didi): Likewise.
	(mve_vec_extract_sext_internal<mode>): Likewise.
	(mve_vec_extract_zext_internal<mode>): Likewise.
	(mve_vec_set<mode>_internal): Likewise.
	(mve_vec_setv2di_internal): Likewise.
	* config/arm/neon.md (vec_set<mode>): Move RTL pattern to vec-common.md
	file.
	(vec_extract<mode><V_elem_l>): Rename to
	"neon_vec_extract<mode><V_elem_l>".
	(vec_extractv2didi): Rename to "neon_vec_extractv2didi".
	* config/arm/vec-common.md (vec_extract<mode><V_elem_l>): Define RTL
	pattern common for MVE and NEON.
	(vec_set<mode>): Move RTL pattern from neon.md and modify to accept both
	MVE and NEON.

gcc/testsuite/ChangeLog:

2019-11-08  Andre Vieira  <andre.simoesdiasvieira@arm.com>
            Mihail Ionescu  <mihail.ionescu@arm.com>
            Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c: New test.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_s64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c: Likewise.


###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index d0259d7bd96c565d901b7634e9f735e0e14bc9dc..9dcf8d692670cd8552fade9868bc051683553b91 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -2506,8 +2506,40 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vld1q_z_f32(__base, __p) __arm_vld1q_z_f32(__base, __p)
 #define vst2q_f32(__addr, __value) __arm_vst2q_f32(__addr, __value)
 #define vst1q_p_f32(__addr, __value, __p) __arm_vst1q_p_f32(__addr, __value, __p)
+#define vsetq_lane_f16(__a, __b,  __idx) __arm_vsetq_lane_f16(__a, __b,  __idx)
+#define vsetq_lane_f32(__a, __b,  __idx) __arm_vsetq_lane_f32(__a, __b,  __idx)
+#define vsetq_lane_s16(__a, __b,  __idx) __arm_vsetq_lane_s16(__a, __b,  __idx)
+#define vsetq_lane_s32(__a, __b,  __idx) __arm_vsetq_lane_s32(__a, __b,  __idx)
+#define vsetq_lane_s8(__a, __b,  __idx) __arm_vsetq_lane_s8(__a, __b,  __idx)
+#define vsetq_lane_s64(__a, __b,  __idx) __arm_vsetq_lane_s64(__a, __b,  __idx)
+#define vsetq_lane_u8(__a, __b,  __idx) __arm_vsetq_lane_u8(__a, __b,  __idx)
+#define vsetq_lane_u16(__a, __b,  __idx) __arm_vsetq_lane_u16(__a, __b,  __idx)
+#define vsetq_lane_u32(__a, __b,  __idx) __arm_vsetq_lane_u32(__a, __b,  __idx)
+#define vsetq_lane_u64(__a, __b,  __idx) __arm_vsetq_lane_u64(__a, __b,  __idx)
+#define vgetq_lane_f16(__a,  __idx) __arm_vgetq_lane_f16(__a,  __idx)
+#define vgetq_lane_f32(__a,  __idx) __arm_vgetq_lane_f32(__a,  __idx)
+#define vgetq_lane_s16(__a,  __idx) __arm_vgetq_lane_s16(__a,  __idx)
+#define vgetq_lane_s32(__a,  __idx) __arm_vgetq_lane_s32(__a,  __idx)
+#define vgetq_lane_s8(__a,  __idx) __arm_vgetq_lane_s8(__a,  __idx)
+#define vgetq_lane_s64(__a,  __idx) __arm_vgetq_lane_s64(__a,  __idx)
+#define vgetq_lane_u8(__a,  __idx) __arm_vgetq_lane_u8(__a,  __idx)
+#define vgetq_lane_u16(__a,  __idx) __arm_vgetq_lane_u16(__a,  __idx)
+#define vgetq_lane_u32(__a,  __idx) __arm_vgetq_lane_u32(__a,  __idx)
+#define vgetq_lane_u64(__a,  __idx) __arm_vgetq_lane_u64(__a,  __idx)
 #endif
 
+/* For big-endian, GCC's vector indices are reversed within each 64 bits
+   compared to the architectural lane indices used by MVE intrinsics.  */
+#define __ARM_NUM_LANES(__v) (sizeof (__v) / sizeof (__v[0]))
+#ifdef __ARM_BIG_ENDIAN
+#define __ARM_LANEQ(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec)/2 - 1))
+#else
+#define __ARM_LANEQ(__vec, __idx) __idx
+#endif
+#define __ARM_CHECK_LANEQ(__vec, __idx)		 \
+  __builtin_arm_lane_check (__ARM_NUM_LANES(__vec),     \
+			    __ARM_LANEQ(__vec, __idx))
+
 __extension__ extern __inline void
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vst4q_s8 (int8_t * __addr, int8x16x4_t __value)
@@ -16371,6 +16403,142 @@ __arm_vld4q_u32 (uint32_t const * __addr)
   return __rv.__i;
 }
 
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsetq_lane_s16 (int16_t __a, int16x8_t __b, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__b, __idx);
+  __b[__ARM_LANEQ(__b,__idx)] = __a;
+  return __b;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsetq_lane_s32 (int32_t __a, int32x4_t __b, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__b, __idx);
+  __b[__ARM_LANEQ(__b,__idx)] = __a;
+  return __b;
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsetq_lane_s8 (int8_t __a, int8x16_t __b, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__b, __idx);
+  __b[__ARM_LANEQ(__b,__idx)] = __a;
+  return __b;
+}
+
+__extension__ extern __inline int64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsetq_lane_s64 (int64_t __a, int64x2_t __b, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__b, __idx);
+  __b[__ARM_LANEQ(__b,__idx)] = __a;
+  return __b;
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsetq_lane_u8 (uint8_t __a, uint8x16_t __b, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__b, __idx);
+  __b[__ARM_LANEQ(__b,__idx)] = __a;
+  return __b;
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsetq_lane_u16 (uint16_t __a, uint16x8_t __b, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__b, __idx);
+  __b[__ARM_LANEQ(__b,__idx)] = __a;
+  return __b;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsetq_lane_u32 (uint32_t __a, uint32x4_t __b, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__b, __idx);
+  __b[__ARM_LANEQ(__b,__idx)] = __a;
+  return __b;
+}
+
+__extension__ extern __inline uint64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsetq_lane_u64 (uint64_t __a, uint64x2_t __b, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__b, __idx);
+  __b[__ARM_LANEQ(__b,__idx)] = __a;
+  return __b;
+}
+
+__extension__ extern __inline int16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vgetq_lane_s16 (int16x8_t __a, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__a, __idx);
+  return __a[__ARM_LANEQ(__a,__idx)];
+}
+
+__extension__ extern __inline int32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vgetq_lane_s32 (int32x4_t __a, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__a, __idx);
+  return __a[__ARM_LANEQ(__a,__idx)];
+}
+
+__extension__ extern __inline int8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vgetq_lane_s8 (int8x16_t __a, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__a, __idx);
+  return __a[__ARM_LANEQ(__a,__idx)];
+}
+
+__extension__ extern __inline int64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vgetq_lane_s64 (int64x2_t __a, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__a, __idx);
+  return __a[__ARM_LANEQ(__a,__idx)];
+}
+
+__extension__ extern __inline uint8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vgetq_lane_u8 (uint8x16_t __a, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__a, __idx);
+  return __a[__ARM_LANEQ(__a,__idx)];
+}
+
+__extension__ extern __inline uint16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vgetq_lane_u16 (uint16x8_t __a, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__a, __idx);
+  return __a[__ARM_LANEQ(__a,__idx)];
+}
+
+__extension__ extern __inline uint32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vgetq_lane_u32 (uint32x4_t __a, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__a, __idx);
+  return __a[__ARM_LANEQ(__a,__idx)];
+}
+
+__extension__ extern __inline uint64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vgetq_lane_u64 (uint64x2_t __a, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__a, __idx);
+  return __a[__ARM_LANEQ(__a,__idx)];
+}
+
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
@@ -19804,6 +19972,39 @@ __arm_vst1q_p_f32 (float32_t * __addr, float32x4_t __value, mve_pred16_t __p)
   return vstrwq_p_f32 (__addr, __value, __p);
 }
 
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsetq_lane_f16 (float16_t __a, float16x8_t __b, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__b, __idx);
+  __b[__ARM_LANEQ(__b,__idx)] = __a;
+  return __b;
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsetq_lane_f32 (float32_t __a, float32x4_t __b, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__b, __idx);
+  __b[__ARM_LANEQ(__b,__idx)] = __a;
+  return __b;
+}
+
+__extension__ extern __inline float16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vgetq_lane_f16 (float16x8_t __a, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__a, __idx);
+  return __a[__ARM_LANEQ(__a,__idx)];
+}
+
+__extension__ extern __inline float32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vgetq_lane_f32 (float32x4_t __a, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__a, __idx);
+  return __a[__ARM_LANEQ(__a,__idx)];
+}
 #endif
 
 enum {
@@ -22165,6 +22366,35 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmulq_rot90_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmulq_rot90_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
 
+#define vgetq_lane(p0,p1) __arm_vgetq_lane(p0,p1)
+#define __arm_vgetq_lane(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int8x16_t]: __arm_vgetq_lane_s8 (__ARM_mve_coerce(__p0, int8x16_t), p1), \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vgetq_lane_s16 (__ARM_mve_coerce(__p0, int16x8_t), p1), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vgetq_lane_s32 (__ARM_mve_coerce(__p0, int32x4_t), p1), \
+  int (*)[__ARM_mve_type_int64x2_t]: __arm_vgetq_lane_s64 (__ARM_mve_coerce(__p0, int64x2_t), p1), \
+  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vgetq_lane_u8 (__ARM_mve_coerce(__p0, uint8x16_t), p1), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vgetq_lane_u16 (__ARM_mve_coerce(__p0, uint16x8_t), p1), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vgetq_lane_u32 (__ARM_mve_coerce(__p0, uint32x4_t), p1), \
+  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vgetq_lane_u64 (__ARM_mve_coerce(__p0, uint64x2_t), p1), \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vgetq_lane_f16 (__ARM_mve_coerce(__p0, float16x8_t), p1), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vgetq_lane_f32 (__ARM_mve_coerce(__p0, float32x4_t), p1));})
+
+#define vsetq_lane(p0,p1,p2) __arm_vsetq_lane(p0,p1,p2)
+#define __arm_vsetq_lane(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8_t][__ARM_mve_type_int8x16_t]: __arm_vsetq_lane_s8 (__ARM_mve_coerce(__p0, int8_t), __ARM_mve_coerce(__p1, int8x16_t), p2), \
+  int (*)[__ARM_mve_type_int16_t][__ARM_mve_type_int16x8_t]: __arm_vsetq_lane_s16 (__ARM_mve_coerce(__p0, int16_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \
+  int (*)[__ARM_mve_type_int32_t][__ARM_mve_type_int32x4_t]: __arm_vsetq_lane_s32 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_int64_t][__ARM_mve_type_int64x2_t]: __arm_vsetq_lane_s64 (__ARM_mve_coerce(__p0, int64_t), __ARM_mve_coerce(__p1, int64x2_t), p2), \
+  int (*)[__ARM_mve_type_uint8_t][__ARM_mve_type_uint8x16_t]: __arm_vsetq_lane_u8 (__ARM_mve_coerce(__p0, uint8_t), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
+  int (*)[__ARM_mve_type_uint16_t][__ARM_mve_type_uint16x8_t]: __arm_vsetq_lane_u16 (__ARM_mve_coerce(__p0, uint16_t), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
+  int (*)[__ARM_mve_type_uint32_t][__ARM_mve_type_uint32x4_t]: __arm_vsetq_lane_u32 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint64_t][__ARM_mve_type_uint64x2_t]: __arm_vsetq_lane_u64 (__ARM_mve_coerce(__p0, uint64_t), __ARM_mve_coerce(__p1, uint64x2_t), p2), \
+  int (*)[__ARM_mve_type_float16_t][__ARM_mve_type_float16x8_t]: __arm_vsetq_lane_f16 (__ARM_mve_coerce(__p0, float16_t), __ARM_mve_coerce(__p1, float16x8_t), p2), \
+  int (*)[__ARM_mve_type_float32_t][__ARM_mve_type_float32x4_t]: __arm_vsetq_lane_f32 (__ARM_mve_coerce(__p0, float32_t), __ARM_mve_coerce(__p1, float32x4_t), p2));})
+
 #else /* MVE Interger.  */
 
 #define vst4q(p0,p1) __arm_vst4q(p0,p1)
@@ -26262,6 +26492,31 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16_t_const_ptr]: __arm_vld4q_u16 (__ARM_mve_coerce(__p0, uint16_t const *)), \
   int (*)[__ARM_mve_type_uint32_t_const_ptr]: __arm_vld4q_u32 (__ARM_mve_coerce(__p0, uint32_t const *)));})
 
+#define vgetq_lane(p0,p1) __arm_vgetq_lane(p0,p1)
+#define __arm_vgetq_lane(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int8x16_t]: __arm_vgetq_lane_s8 (__ARM_mve_coerce(__p0, int8x16_t), p1), \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vgetq_lane_s16 (__ARM_mve_coerce(__p0, int16x8_t), p1), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vgetq_lane_s32 (__ARM_mve_coerce(__p0, int32x4_t), p1), \
+  int (*)[__ARM_mve_type_int64x2_t]: __arm_vgetq_lane_s64 (__ARM_mve_coerce(__p0, int64x2_t), p1), \
+  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vgetq_lane_u8 (__ARM_mve_coerce(__p0, uint8x16_t), p1), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vgetq_lane_u16 (__ARM_mve_coerce(__p0, uint16x8_t), p1), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vgetq_lane_u32 (__ARM_mve_coerce(__p0, uint32x4_t), p1), \
+  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vgetq_lane_u64 (__ARM_mve_coerce(__p0, uint64x2_t), p1));})
+
+#define vsetq_lane(p0,p1,p2) __arm_vsetq_lane(p0,p1,p2)
+#define __arm_vsetq_lane(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8_t][__ARM_mve_type_int8x16_t]: __arm_vsetq_lane_s8 (__ARM_mve_coerce(__p0, int8_t), __ARM_mve_coerce(__p1, int8x16_t), p2), \
+  int (*)[__ARM_mve_type_int16_t][__ARM_mve_type_int16x8_t]: __arm_vsetq_lane_s16 (__ARM_mve_coerce(__p0, int16_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \
+  int (*)[__ARM_mve_type_int32_t][__ARM_mve_type_int32x4_t]: __arm_vsetq_lane_s32 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_int64_t][__ARM_mve_type_int64x2_t]: __arm_vsetq_lane_s64 (__ARM_mve_coerce(__p0, int64_t), __ARM_mve_coerce(__p1, int64x2_t), p2), \
+  int (*)[__ARM_mve_type_uint8_t][__ARM_mve_type_uint8x16_t]: __arm_vsetq_lane_u8 (__ARM_mve_coerce(__p0, uint8_t), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
+  int (*)[__ARM_mve_type_uint16_t][__ARM_mve_type_uint16x8_t]: __arm_vsetq_lane_u16 (__ARM_mve_coerce(__p0, uint16_t), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
+  int (*)[__ARM_mve_type_uint32_t][__ARM_mve_type_uint32x4_t]: __arm_vsetq_lane_u32 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint64_t][__ARM_mve_type_uint64x2_t]: __arm_vsetq_lane_u64 (__ARM_mve_coerce(__p0, uint64_t), __ARM_mve_coerce(__p1, uint64x2_t), p2));})
+
 #endif /* MVE Floating point.  */
 
 #ifdef __cplusplus
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 62735e5ddab1125e6f38fbf5a5cb5c04936a7717..b679511e42ce909cc9ef19e1cb790e8a5254d538 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -411,6 +411,8 @@
 (define_mode_attr MVE_H_ELEM [ (V8HI "V8HI") (V4SI "V4HI")])
 (define_mode_attr V_sz_elem1 [(V16QI "b") (V8HI  "h") (V4SI "w") (V8HF "h")
 			      (V4SF "w")])
+(define_mode_attr V_extr_elem [(V16QI "u8") (V8HI "u16") (V4SI "32")
+			       (V8HF "u16") (V4SF "32")])
 
 (define_int_iterator VCVTQ_TO_F [VCVTQ_TO_F_S VCVTQ_TO_F_U])
 (define_int_iterator VMVNQ_N [VMVNQ_N_U VMVNQ_N_S])
@@ -10863,3 +10865,121 @@
    return "";
 }
   [(set_attr "length" "16")])
+;;
+;; [vgetq_lane_u, vgetq_lane_s, vgetq_lane_f])
+;;
+(define_insn "mve_vec_extract<mode><V_elem_l>"
+ [(set (match_operand:<V_elem> 0 "s_register_operand" "=r")
+   (vec_select:<V_elem>
+    (match_operand:MVE_VLD_ST 1 "s_register_operand" "w")
+    (parallel [(match_operand:SI 2 "immediate_operand" "i")])))]
+  "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (<MODE>mode))
+   || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (<MODE>mode))"
+{
+  if (BYTES_BIG_ENDIAN)
+    {
+      int elt = INTVAL (operands[2]);
+      elt = GET_MODE_NUNITS (<MODE>mode) - 1 - elt;
+      operands[2] = GEN_INT (elt);
+    }
+  return "vmov.<V_extr_elem>\t%0, %q1[%c2]";
+}
+ [(set_attr "type" "mve_move")])
+
+(define_insn "mve_vec_extractv2didi"
+ [(set (match_operand:DI 0 "s_register_operand" "=r")
+   (vec_select:DI
+    (match_operand:V2DI 1 "s_register_operand" "w")
+    (parallel [(match_operand:SI 2 "immediate_operand" "i")])))]
+  "TARGET_HAVE_MVE"
+{
+  int elt = INTVAL (operands[2]);
+  if (BYTES_BIG_ENDIAN)
+    elt = 1 - elt;
+
+  if (elt == 0)
+   return "vmov\t%Q0, %R0, %e1";
+  else
+   return "vmov\t%J0, %K0, %f1";
+}
+ [(set_attr "type" "mve_move")])
+
+(define_insn "*mve_vec_extract_sext_internal<mode>"
+ [(set (match_operand:SI 0 "s_register_operand" "=r")
+   (sign_extend:SI
+    (vec_select:<V_elem>
+     (match_operand:MVE_2 1 "s_register_operand" "w")
+     (parallel [(match_operand:SI 2 "immediate_operand" "i")]))))]
+  "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (<MODE>mode))
+   || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (<MODE>mode))"
+{
+  if (BYTES_BIG_ENDIAN)
+    {
+      int elt = INTVAL (operands[2]);
+      elt = GET_MODE_NUNITS (<MODE>mode) - 1 - elt;
+      operands[2] = GEN_INT (elt);
+    }
+  return "vmov.s<V_sz_elem>\t%0, %q1[%c2]";
+}
+ [(set_attr "type" "mve_move")])
+
+(define_insn "*mve_vec_extract_zext_internal<mode>"
+ [(set (match_operand:SI 0 "s_register_operand" "=r")
+   (zero_extend:SI
+    (vec_select:<V_elem>
+     (match_operand:MVE_2 1 "s_register_operand" "w")
+     (parallel [(match_operand:SI 2 "immediate_operand" "i")]))))]
+  "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (<MODE>mode))
+   || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (<MODE>mode))"
+{
+  if (BYTES_BIG_ENDIAN)
+    {
+      int elt = INTVAL (operands[2]);
+      elt = GET_MODE_NUNITS (<MODE>mode) - 1 - elt;
+      operands[2] = GEN_INT (elt);
+    }
+  return "vmov.u<V_sz_elem>\t%0, %q1[%c2]";
+}
+ [(set_attr "type" "mve_move")])
+
+;;
+;; [vsetq_lane_u, vsetq_lane_s, vsetq_lane_f])
+;;
+(define_insn "mve_vec_set<mode>_internal"
+ [(set (match_operand:VQ2 0 "s_register_operand" "=w")
+       (vec_merge:VQ2
+	(vec_duplicate:VQ2
+	  (match_operand:<V_elem> 1 "nonimmediate_operand" "r"))
+	(match_operand:VQ2 3 "s_register_operand" "0")
+	(match_operand:SI 2 "immediate_operand" "i")))]
+  "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (<MODE>mode))
+   || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (<MODE>mode))"
+{
+  int elt = ffs ((int) INTVAL (operands[2])) - 1;
+  if (BYTES_BIG_ENDIAN)
+    elt = GET_MODE_NUNITS (<MODE>mode) - 1 - elt;
+  operands[2] = GEN_INT (elt);
+
+  return "vmov.<V_sz_elem>\t%q0[%c2], %1";
+}
+ [(set_attr "type" "mve_move")])
+
+(define_insn "mve_vec_setv2di_internal"
+ [(set (match_operand:V2DI 0 "s_register_operand" "=w")
+       (vec_merge:V2DI
+	(vec_duplicate:V2DI
+	  (match_operand:DI 1 "nonimmediate_operand" "r"))
+	(match_operand:V2DI 3 "s_register_operand" "0")
+	(match_operand:SI 2 "immediate_operand" "i")))]
+ "TARGET_HAVE_MVE"
+{
+  int elt = ffs ((int) INTVAL (operands[2])) - 1;
+  if (BYTES_BIG_ENDIAN)
+    elt = 1 - elt;
+
+  if (elt == 0)
+   return "vmov\t%e0, %Q1, %R1";
+  else
+   return "vmov\t%f0, %J1, %K1";
+}
+ [(set_attr "type" "mve_move")])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index df8f4fd4166b0d9cd08f01f3ac7ac3958f20b9db..72bdf557dc9fd16a6d97286d7232f9a9071a6e5f 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -411,18 +411,6 @@
   [(set_attr "type" "neon_load1_all_lanes_q,neon_from_gp_q")]
 )
 
-(define_expand "vec_set<mode>"
-  [(match_operand:VDQ 0 "s_register_operand")
-   (match_operand:<V_elem> 1 "s_register_operand")
-   (match_operand:SI 2 "immediate_operand")]
-  "TARGET_NEON"
-{
-  HOST_WIDE_INT elem = HOST_WIDE_INT_1 << INTVAL (operands[2]);
-  emit_insn (gen_vec_set<mode>_internal (operands[0], operands[1],
-					 GEN_INT (elem), operands[0]));
-  DONE;
-})
-
 (define_insn "vec_extract<mode><V_elem_l>"
   [(set (match_operand:<V_elem> 0 "nonimmediate_operand" "=Um,r")
         (vec_select:<V_elem>
@@ -445,7 +433,10 @@
   [(set_attr "type" "neon_store1_one_lane<q>,neon_to_gp<q>")]
 )
 
-(define_insn "vec_extract<mode><V_elem_l>"
+;; This pattern is renamed from "vec_extract<mode><V_elem_l>" to
+;; "neon_vec_extract<mode><V_elem_l>" and this pattern is called
+;; by define_expand in vec-common.md file.
+(define_insn "neon_vec_extract<mode><V_elem_l>"
   [(set (match_operand:<V_elem> 0 "nonimmediate_operand" "=Um,r")
 	(vec_select:<V_elem>
           (match_operand:VQ2 1 "s_register_operand" "w,w")
@@ -471,7 +462,9 @@
   [(set_attr "type" "neon_store1_one_lane<q>,neon_to_gp<q>")]
 )
 
-(define_insn "vec_extractv2didi"
+;; This pattern is renamed from "vec_extractv2didi" to "neon_vec_extractv2didi"
+;; and this pattern is called by define_expand in vec-common.md file.
+(define_insn "neon_vec_extractv2didi"
   [(set (match_operand:DI 0 "nonimmediate_operand" "=Um,r")
 	(vec_select:DI
           (match_operand:V2DI 1 "s_register_operand" "w,w")
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index 82a1a6bd7698fa25db88a5cdb5b3e762dc80a589..e052047f8b18e4b251ea0b322448c414b53ea422 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -190,3 +190,40 @@
   arm_expand_vec_perm (operands[0], operands[1], operands[2], operands[3]);
   DONE;
 })
+
+;; The following expand patterns are moved from neon.md to here and
+;; modifications are made to support both NEON and MVE.  This are needed for
+;; intrinsics vgetq_lane and vsetq_lane intrinsics in MVE.
+
+(define_expand "vec_extract<mode><V_elem_l>"
+ [(match_operand:<V_elem> 0 "nonimmediate_operand")
+  (match_operand:VQX 1 "s_register_operand")
+  (match_operand:SI 2 "immediate_operand")]
+ "TARGET_NEON || TARGET_HAVE_MVE"
+{
+  if (TARGET_NEON)
+    emit_insn (gen_neon_vec_extract<mode><V_elem_l> (operands[0], operands[1],
+						     operands[2]));
+  else if (TARGET_HAVE_MVE)
+    emit_insn (gen_mve_vec_extract<mode><V_elem_l> (operands[0], operands[1],
+						     operands[2]));
+  else
+    gcc_unreachable ();
+  DONE;
+})
+
+(define_expand "vec_set<mode>"
+  [(match_operand:VQX 0 "s_register_operand" "")
+   (match_operand:<V_elem> 1 "s_register_operand" "")
+   (match_operand:SI 2 "immediate_operand" "")]
+  "TARGET_NEON || TARGET_HAVE_MVE"
+{
+  HOST_WIDE_INT elem = HOST_WIDE_INT_1 << INTVAL (operands[2]);
+  if (TARGET_NEON)
+    emit_insn (gen_vec_set<mode>_internal (operands[0], operands[1],
+					   GEN_INT (elem), operands[0]));
+  else
+    emit_insn (gen_mve_vec_set<mode>_internal (operands[0], operands[1],
+					       GEN_INT (elem), operands[0]));
+  DONE;
+})
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..ba5cf8187080919ff8df24b6a747ebe40a701a17
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16_t
+foo (float16x8_t a)
+{
+  return vgetq_lane_f16 (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.u16"  }  } */
+
+float16_t
+foo1 (float16x8_t a)
+{
+  return vgetq_lane (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..e660e71bfb44b56f9e3f7c6ea55fe33753961641
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32_t
+foo (float32x4_t a)
+{
+  return vgetq_lane_f32 (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.32"  }  } */
+
+float32_t
+foo1 (float32x4_t a)
+{
+  return vgetq_lane (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..51609c3c4e40240204df1645717ed9eee38d897b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16_t
+foo (int16x8_t a)
+{
+  return vgetq_lane_s16 (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.s16"  }  } */
+
+int16_t
+foo1 (int16x8_t a)
+{
+  return vgetq_lane (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.s16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..47f092faeaef9abbcb947eb94a2a29ff011216c3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32_t
+foo (int32x4_t a)
+{
+  return vgetq_lane_s32 (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.32"  }  } */
+
+int32_t
+foo1 (int32x4_t a)
+{
+  return vgetq_lane (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s64.c
new file mode 100644
index 0000000000000000000000000000000000000000..4e048dbf3f64548891232790e5da21718b4d33ad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s64.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int64_t
+foo (int64x2_t a)
+{
+  return vgetq_lane_s64 (a, 0);
+}
+
+/* { dg-final { scan-assembler {vmov\tr0, r1, d0}  }  } */
+
+int64_t
+foo1 (int64x2_t a)
+{
+  return vgetq_lane (a, 0);
+}
+
+/* { dg-final { scan-assembler {vmov\tr0, r1, d0}  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..ccf26912915411952eef220c3cc689df3f7504bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8_t
+foo (int8x16_t a)
+{
+  return vgetq_lane_s8 (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.s8"  }  } */
+
+int8_t
+foo1 (int8x16_t a)
+{
+  return vgetq_lane (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.s8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..ea463a6cc782cc04ea4607a20751d57f7d84808f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16_t
+foo (uint16x8_t a)
+{
+  return vgetq_lane_u16 (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.u16"  }  } */
+
+uint16_t
+foo1 (uint16x8_t a)
+{
+  return vgetq_lane (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..e136d2e8ab2b99aff08e51eb3d8b32602d5a9990
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32_t
+foo (uint32x4_t a)
+{
+  return vgetq_lane_u32 (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.32"  }  } */
+
+uint32_t
+foo1 (uint32x4_t a)
+{
+  return vgetq_lane (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u64.c
new file mode 100644
index 0000000000000000000000000000000000000000..427687fbf404940fc5a8334bac5034e2aeb9112e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u64.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint64_t
+foo (uint64x2_t a)
+{
+  return vgetq_lane_u64 (a, 0);
+}
+
+/* { dg-final { scan-assembler {vmov\tr0, r1, d0}  }  } */
+
+uint64_t
+foo1 (uint64x2_t a)
+{
+  return vgetq_lane (a, 0);
+}
+
+/* { dg-final { scan-assembler {vmov\tr0, r1, d0}  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..4db14fce783eead4c0f602b02f7e1353adc0c671
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8_t
+foo (uint8x16_t a)
+{
+  return vgetq_lane_u8 (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.u8"  }  } */
+
+uint8_t
+foo1 (uint8x16_t a)
+{
+  return vgetq_lane (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.u8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..a2e4b765b61c612c580e2177a57255786e5529e2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
@@ -0,0 +1,14 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16_t a, float16x8_t b)
+{
+    return vsetq_lane_f16 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.16"  }  } */
+
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..c538d62c41145434a74770b75c762da4c3fc5898
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c
@@ -0,0 +1,14 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32_t a, float32x4_t b)
+{
+    return vsetq_lane_f32 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.32"  }  } */
+
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..648789dc8df3f7bdc640255c6fb3f458a554c8e5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c
@@ -0,0 +1,14 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int16_t a, int16x8_t b)
+{
+    return vsetq_lane_s16 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.16"  }  } */
+
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..18ca25b637ca5335d8068aee41f2a220416f660d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c
@@ -0,0 +1,14 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32_t a, int32x4_t b)
+{
+    return vsetq_lane_s32 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.32"  }  } */
+
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c
new file mode 100644
index 0000000000000000000000000000000000000000..0c82eeb081cb2733078a07a7e13f3854a51afd27
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c
@@ -0,0 +1,14 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int64x2_t
+foo (int64_t a, int64x2_t b)
+{
+    return vsetq_lane_s64 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler {vmov\td0, r[1-9]*[0-9], r[1-9]*[0-9]}  }  } */
+
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..87c06cd013da9a6b8e5a866bc946f2e10cd81522
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c
@@ -0,0 +1,14 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo (int8_t a, int8x16_t b)
+{
+    return vsetq_lane_s8 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.8"  }  } */
+
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..92984297ad1d83bab29d1cfd12d72932eda50fe4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c
@@ -0,0 +1,14 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint16_t a, uint16x8_t b)
+{
+    return vsetq_lane_u16 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.16"  }  } */
+
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..755e7c9a978cd28dfcef108f6e624413fea35dff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c
@@ -0,0 +1,14 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32_t a, uint32x4_t b)
+{
+    return vsetq_lane_u32 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.32"  }  } */
+
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c
new file mode 100644
index 0000000000000000000000000000000000000000..d7d593b2193cec787fefe547a47b1261def00f40
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c
@@ -0,0 +1,14 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint64x2_t
+foo (uint64_t a, uint64x2_t b)
+{
+    return vsetq_lane_u64 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler {vmov\td0, r[1-9]*[0-9], r[1-9]*[0-9]}  }  } */
+
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..b263360fc1a23124e91a705232cc6eb77012f803
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c
@@ -0,0 +1,14 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint8_t a, uint8x16_t b)
+{
+    return vsetq_lane_u8 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.8"  }  } */
+


[-- Attachment #2: diff35.patch --]
[-- Type: text/plain, Size: 37604 bytes --]

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index d0259d7bd96c565d901b7634e9f735e0e14bc9dc..9dcf8d692670cd8552fade9868bc051683553b91 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -2506,8 +2506,40 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vld1q_z_f32(__base, __p) __arm_vld1q_z_f32(__base, __p)
 #define vst2q_f32(__addr, __value) __arm_vst2q_f32(__addr, __value)
 #define vst1q_p_f32(__addr, __value, __p) __arm_vst1q_p_f32(__addr, __value, __p)
+#define vsetq_lane_f16(__a, __b,  __idx) __arm_vsetq_lane_f16(__a, __b,  __idx)
+#define vsetq_lane_f32(__a, __b,  __idx) __arm_vsetq_lane_f32(__a, __b,  __idx)
+#define vsetq_lane_s16(__a, __b,  __idx) __arm_vsetq_lane_s16(__a, __b,  __idx)
+#define vsetq_lane_s32(__a, __b,  __idx) __arm_vsetq_lane_s32(__a, __b,  __idx)
+#define vsetq_lane_s8(__a, __b,  __idx) __arm_vsetq_lane_s8(__a, __b,  __idx)
+#define vsetq_lane_s64(__a, __b,  __idx) __arm_vsetq_lane_s64(__a, __b,  __idx)
+#define vsetq_lane_u8(__a, __b,  __idx) __arm_vsetq_lane_u8(__a, __b,  __idx)
+#define vsetq_lane_u16(__a, __b,  __idx) __arm_vsetq_lane_u16(__a, __b,  __idx)
+#define vsetq_lane_u32(__a, __b,  __idx) __arm_vsetq_lane_u32(__a, __b,  __idx)
+#define vsetq_lane_u64(__a, __b,  __idx) __arm_vsetq_lane_u64(__a, __b,  __idx)
+#define vgetq_lane_f16(__a,  __idx) __arm_vgetq_lane_f16(__a,  __idx)
+#define vgetq_lane_f32(__a,  __idx) __arm_vgetq_lane_f32(__a,  __idx)
+#define vgetq_lane_s16(__a,  __idx) __arm_vgetq_lane_s16(__a,  __idx)
+#define vgetq_lane_s32(__a,  __idx) __arm_vgetq_lane_s32(__a,  __idx)
+#define vgetq_lane_s8(__a,  __idx) __arm_vgetq_lane_s8(__a,  __idx)
+#define vgetq_lane_s64(__a,  __idx) __arm_vgetq_lane_s64(__a,  __idx)
+#define vgetq_lane_u8(__a,  __idx) __arm_vgetq_lane_u8(__a,  __idx)
+#define vgetq_lane_u16(__a,  __idx) __arm_vgetq_lane_u16(__a,  __idx)
+#define vgetq_lane_u32(__a,  __idx) __arm_vgetq_lane_u32(__a,  __idx)
+#define vgetq_lane_u64(__a,  __idx) __arm_vgetq_lane_u64(__a,  __idx)
 #endif
 
+/* For big-endian, GCC's vector indices are reversed within each 64 bits
+   compared to the architectural lane indices used by MVE intrinsics.  */
+#define __ARM_NUM_LANES(__v) (sizeof (__v) / sizeof (__v[0]))
+#ifdef __ARM_BIG_ENDIAN
+#define __ARM_LANEQ(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec)/2 - 1))
+#else
+#define __ARM_LANEQ(__vec, __idx) __idx
+#endif
+#define __ARM_CHECK_LANEQ(__vec, __idx)		 \
+  __builtin_arm_lane_check (__ARM_NUM_LANES(__vec),     \
+			    __ARM_LANEQ(__vec, __idx))
+
 __extension__ extern __inline void
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vst4q_s8 (int8_t * __addr, int8x16x4_t __value)
@@ -16371,6 +16403,142 @@ __arm_vld4q_u32 (uint32_t const * __addr)
   return __rv.__i;
 }
 
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsetq_lane_s16 (int16_t __a, int16x8_t __b, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__b, __idx);
+  __b[__ARM_LANEQ(__b,__idx)] = __a;
+  return __b;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsetq_lane_s32 (int32_t __a, int32x4_t __b, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__b, __idx);
+  __b[__ARM_LANEQ(__b,__idx)] = __a;
+  return __b;
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsetq_lane_s8 (int8_t __a, int8x16_t __b, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__b, __idx);
+  __b[__ARM_LANEQ(__b,__idx)] = __a;
+  return __b;
+}
+
+__extension__ extern __inline int64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsetq_lane_s64 (int64_t __a, int64x2_t __b, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__b, __idx);
+  __b[__ARM_LANEQ(__b,__idx)] = __a;
+  return __b;
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsetq_lane_u8 (uint8_t __a, uint8x16_t __b, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__b, __idx);
+  __b[__ARM_LANEQ(__b,__idx)] = __a;
+  return __b;
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsetq_lane_u16 (uint16_t __a, uint16x8_t __b, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__b, __idx);
+  __b[__ARM_LANEQ(__b,__idx)] = __a;
+  return __b;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsetq_lane_u32 (uint32_t __a, uint32x4_t __b, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__b, __idx);
+  __b[__ARM_LANEQ(__b,__idx)] = __a;
+  return __b;
+}
+
+__extension__ extern __inline uint64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsetq_lane_u64 (uint64_t __a, uint64x2_t __b, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__b, __idx);
+  __b[__ARM_LANEQ(__b,__idx)] = __a;
+  return __b;
+}
+
+__extension__ extern __inline int16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vgetq_lane_s16 (int16x8_t __a, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__a, __idx);
+  return __a[__ARM_LANEQ(__a,__idx)];
+}
+
+__extension__ extern __inline int32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vgetq_lane_s32 (int32x4_t __a, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__a, __idx);
+  return __a[__ARM_LANEQ(__a,__idx)];
+}
+
+__extension__ extern __inline int8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vgetq_lane_s8 (int8x16_t __a, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__a, __idx);
+  return __a[__ARM_LANEQ(__a,__idx)];
+}
+
+__extension__ extern __inline int64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vgetq_lane_s64 (int64x2_t __a, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__a, __idx);
+  return __a[__ARM_LANEQ(__a,__idx)];
+}
+
+__extension__ extern __inline uint8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vgetq_lane_u8 (uint8x16_t __a, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__a, __idx);
+  return __a[__ARM_LANEQ(__a,__idx)];
+}
+
+__extension__ extern __inline uint16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vgetq_lane_u16 (uint16x8_t __a, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__a, __idx);
+  return __a[__ARM_LANEQ(__a,__idx)];
+}
+
+__extension__ extern __inline uint32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vgetq_lane_u32 (uint32x4_t __a, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__a, __idx);
+  return __a[__ARM_LANEQ(__a,__idx)];
+}
+
+__extension__ extern __inline uint64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vgetq_lane_u64 (uint64x2_t __a, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__a, __idx);
+  return __a[__ARM_LANEQ(__a,__idx)];
+}
+
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
@@ -19804,6 +19972,39 @@ __arm_vst1q_p_f32 (float32_t * __addr, float32x4_t __value, mve_pred16_t __p)
   return vstrwq_p_f32 (__addr, __value, __p);
 }
 
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsetq_lane_f16 (float16_t __a, float16x8_t __b, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__b, __idx);
+  __b[__ARM_LANEQ(__b,__idx)] = __a;
+  return __b;
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsetq_lane_f32 (float32_t __a, float32x4_t __b, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__b, __idx);
+  __b[__ARM_LANEQ(__b,__idx)] = __a;
+  return __b;
+}
+
+__extension__ extern __inline float16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vgetq_lane_f16 (float16x8_t __a, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__a, __idx);
+  return __a[__ARM_LANEQ(__a,__idx)];
+}
+
+__extension__ extern __inline float32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vgetq_lane_f32 (float32x4_t __a, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__a, __idx);
+  return __a[__ARM_LANEQ(__a,__idx)];
+}
 #endif
 
 enum {
@@ -22165,6 +22366,35 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmulq_rot90_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmulq_rot90_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
 
+#define vgetq_lane(p0,p1) __arm_vgetq_lane(p0,p1)
+#define __arm_vgetq_lane(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int8x16_t]: __arm_vgetq_lane_s8 (__ARM_mve_coerce(__p0, int8x16_t), p1), \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vgetq_lane_s16 (__ARM_mve_coerce(__p0, int16x8_t), p1), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vgetq_lane_s32 (__ARM_mve_coerce(__p0, int32x4_t), p1), \
+  int (*)[__ARM_mve_type_int64x2_t]: __arm_vgetq_lane_s64 (__ARM_mve_coerce(__p0, int64x2_t), p1), \
+  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vgetq_lane_u8 (__ARM_mve_coerce(__p0, uint8x16_t), p1), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vgetq_lane_u16 (__ARM_mve_coerce(__p0, uint16x8_t), p1), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vgetq_lane_u32 (__ARM_mve_coerce(__p0, uint32x4_t), p1), \
+  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vgetq_lane_u64 (__ARM_mve_coerce(__p0, uint64x2_t), p1), \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vgetq_lane_f16 (__ARM_mve_coerce(__p0, float16x8_t), p1), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vgetq_lane_f32 (__ARM_mve_coerce(__p0, float32x4_t), p1));})
+
+#define vsetq_lane(p0,p1,p2) __arm_vsetq_lane(p0,p1,p2)
+#define __arm_vsetq_lane(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8_t][__ARM_mve_type_int8x16_t]: __arm_vsetq_lane_s8 (__ARM_mve_coerce(__p0, int8_t), __ARM_mve_coerce(__p1, int8x16_t), p2), \
+  int (*)[__ARM_mve_type_int16_t][__ARM_mve_type_int16x8_t]: __arm_vsetq_lane_s16 (__ARM_mve_coerce(__p0, int16_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \
+  int (*)[__ARM_mve_type_int32_t][__ARM_mve_type_int32x4_t]: __arm_vsetq_lane_s32 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_int64_t][__ARM_mve_type_int64x2_t]: __arm_vsetq_lane_s64 (__ARM_mve_coerce(__p0, int64_t), __ARM_mve_coerce(__p1, int64x2_t), p2), \
+  int (*)[__ARM_mve_type_uint8_t][__ARM_mve_type_uint8x16_t]: __arm_vsetq_lane_u8 (__ARM_mve_coerce(__p0, uint8_t), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
+  int (*)[__ARM_mve_type_uint16_t][__ARM_mve_type_uint16x8_t]: __arm_vsetq_lane_u16 (__ARM_mve_coerce(__p0, uint16_t), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
+  int (*)[__ARM_mve_type_uint32_t][__ARM_mve_type_uint32x4_t]: __arm_vsetq_lane_u32 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint64_t][__ARM_mve_type_uint64x2_t]: __arm_vsetq_lane_u64 (__ARM_mve_coerce(__p0, uint64_t), __ARM_mve_coerce(__p1, uint64x2_t), p2), \
+  int (*)[__ARM_mve_type_float16_t][__ARM_mve_type_float16x8_t]: __arm_vsetq_lane_f16 (__ARM_mve_coerce(__p0, float16_t), __ARM_mve_coerce(__p1, float16x8_t), p2), \
+  int (*)[__ARM_mve_type_float32_t][__ARM_mve_type_float32x4_t]: __arm_vsetq_lane_f32 (__ARM_mve_coerce(__p0, float32_t), __ARM_mve_coerce(__p1, float32x4_t), p2));})
+
 #else /* MVE Interger.  */
 
 #define vst4q(p0,p1) __arm_vst4q(p0,p1)
@@ -26262,6 +26492,31 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16_t_const_ptr]: __arm_vld4q_u16 (__ARM_mve_coerce(__p0, uint16_t const *)), \
   int (*)[__ARM_mve_type_uint32_t_const_ptr]: __arm_vld4q_u32 (__ARM_mve_coerce(__p0, uint32_t const *)));})
 
+#define vgetq_lane(p0,p1) __arm_vgetq_lane(p0,p1)
+#define __arm_vgetq_lane(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int8x16_t]: __arm_vgetq_lane_s8 (__ARM_mve_coerce(__p0, int8x16_t), p1), \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vgetq_lane_s16 (__ARM_mve_coerce(__p0, int16x8_t), p1), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vgetq_lane_s32 (__ARM_mve_coerce(__p0, int32x4_t), p1), \
+  int (*)[__ARM_mve_type_int64x2_t]: __arm_vgetq_lane_s64 (__ARM_mve_coerce(__p0, int64x2_t), p1), \
+  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vgetq_lane_u8 (__ARM_mve_coerce(__p0, uint8x16_t), p1), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vgetq_lane_u16 (__ARM_mve_coerce(__p0, uint16x8_t), p1), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vgetq_lane_u32 (__ARM_mve_coerce(__p0, uint32x4_t), p1), \
+  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vgetq_lane_u64 (__ARM_mve_coerce(__p0, uint64x2_t), p1));})
+
+#define vsetq_lane(p0,p1,p2) __arm_vsetq_lane(p0,p1,p2)
+#define __arm_vsetq_lane(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8_t][__ARM_mve_type_int8x16_t]: __arm_vsetq_lane_s8 (__ARM_mve_coerce(__p0, int8_t), __ARM_mve_coerce(__p1, int8x16_t), p2), \
+  int (*)[__ARM_mve_type_int16_t][__ARM_mve_type_int16x8_t]: __arm_vsetq_lane_s16 (__ARM_mve_coerce(__p0, int16_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \
+  int (*)[__ARM_mve_type_int32_t][__ARM_mve_type_int32x4_t]: __arm_vsetq_lane_s32 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_int64_t][__ARM_mve_type_int64x2_t]: __arm_vsetq_lane_s64 (__ARM_mve_coerce(__p0, int64_t), __ARM_mve_coerce(__p1, int64x2_t), p2), \
+  int (*)[__ARM_mve_type_uint8_t][__ARM_mve_type_uint8x16_t]: __arm_vsetq_lane_u8 (__ARM_mve_coerce(__p0, uint8_t), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
+  int (*)[__ARM_mve_type_uint16_t][__ARM_mve_type_uint16x8_t]: __arm_vsetq_lane_u16 (__ARM_mve_coerce(__p0, uint16_t), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
+  int (*)[__ARM_mve_type_uint32_t][__ARM_mve_type_uint32x4_t]: __arm_vsetq_lane_u32 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint64_t][__ARM_mve_type_uint64x2_t]: __arm_vsetq_lane_u64 (__ARM_mve_coerce(__p0, uint64_t), __ARM_mve_coerce(__p1, uint64x2_t), p2));})
+
 #endif /* MVE Floating point.  */
 
 #ifdef __cplusplus
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 62735e5ddab1125e6f38fbf5a5cb5c04936a7717..b679511e42ce909cc9ef19e1cb790e8a5254d538 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -411,6 +411,8 @@
 (define_mode_attr MVE_H_ELEM [ (V8HI "V8HI") (V4SI "V4HI")])
 (define_mode_attr V_sz_elem1 [(V16QI "b") (V8HI  "h") (V4SI "w") (V8HF "h")
 			      (V4SF "w")])
+(define_mode_attr V_extr_elem [(V16QI "u8") (V8HI "u16") (V4SI "32")
+			       (V8HF "u16") (V4SF "32")])
 
 (define_int_iterator VCVTQ_TO_F [VCVTQ_TO_F_S VCVTQ_TO_F_U])
 (define_int_iterator VMVNQ_N [VMVNQ_N_U VMVNQ_N_S])
@@ -10863,3 +10865,121 @@
    return "";
 }
   [(set_attr "length" "16")])
+;;
+;; [vgetq_lane_u, vgetq_lane_s, vgetq_lane_f])
+;;
+(define_insn "mve_vec_extract<mode><V_elem_l>"
+ [(set (match_operand:<V_elem> 0 "s_register_operand" "=r")
+   (vec_select:<V_elem>
+    (match_operand:MVE_VLD_ST 1 "s_register_operand" "w")
+    (parallel [(match_operand:SI 2 "immediate_operand" "i")])))]
+  "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (<MODE>mode))
+   || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (<MODE>mode))"
+{
+  if (BYTES_BIG_ENDIAN)
+    {
+      int elt = INTVAL (operands[2]);
+      elt = GET_MODE_NUNITS (<MODE>mode) - 1 - elt;
+      operands[2] = GEN_INT (elt);
+    }
+  return "vmov.<V_extr_elem>\t%0, %q1[%c2]";
+}
+ [(set_attr "type" "mve_move")])
+
+(define_insn "mve_vec_extractv2didi"
+ [(set (match_operand:DI 0 "s_register_operand" "=r")
+   (vec_select:DI
+    (match_operand:V2DI 1 "s_register_operand" "w")
+    (parallel [(match_operand:SI 2 "immediate_operand" "i")])))]
+  "TARGET_HAVE_MVE"
+{
+  int elt = INTVAL (operands[2]);
+  if (BYTES_BIG_ENDIAN)
+    elt = 1 - elt;
+
+  if (elt == 0)
+   return "vmov\t%Q0, %R0, %e1";
+  else
+   return "vmov\t%J0, %K0, %f1";
+}
+ [(set_attr "type" "mve_move")])
+
+(define_insn "*mve_vec_extract_sext_internal<mode>"
+ [(set (match_operand:SI 0 "s_register_operand" "=r")
+   (sign_extend:SI
+    (vec_select:<V_elem>
+     (match_operand:MVE_2 1 "s_register_operand" "w")
+     (parallel [(match_operand:SI 2 "immediate_operand" "i")]))))]
+  "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (<MODE>mode))
+   || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (<MODE>mode))"
+{
+  if (BYTES_BIG_ENDIAN)
+    {
+      int elt = INTVAL (operands[2]);
+      elt = GET_MODE_NUNITS (<MODE>mode) - 1 - elt;
+      operands[2] = GEN_INT (elt);
+    }
+  return "vmov.s<V_sz_elem>\t%0, %q1[%c2]";
+}
+ [(set_attr "type" "mve_move")])
+
+(define_insn "*mve_vec_extract_zext_internal<mode>"
+ [(set (match_operand:SI 0 "s_register_operand" "=r")
+   (zero_extend:SI
+    (vec_select:<V_elem>
+     (match_operand:MVE_2 1 "s_register_operand" "w")
+     (parallel [(match_operand:SI 2 "immediate_operand" "i")]))))]
+  "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (<MODE>mode))
+   || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (<MODE>mode))"
+{
+  if (BYTES_BIG_ENDIAN)
+    {
+      int elt = INTVAL (operands[2]);
+      elt = GET_MODE_NUNITS (<MODE>mode) - 1 - elt;
+      operands[2] = GEN_INT (elt);
+    }
+  return "vmov.u<V_sz_elem>\t%0, %q1[%c2]";
+}
+ [(set_attr "type" "mve_move")])
+
+;;
+;; [vsetq_lane_u, vsetq_lane_s, vsetq_lane_f])
+;;
+(define_insn "mve_vec_set<mode>_internal"
+ [(set (match_operand:VQ2 0 "s_register_operand" "=w")
+       (vec_merge:VQ2
+	(vec_duplicate:VQ2
+	  (match_operand:<V_elem> 1 "nonimmediate_operand" "r"))
+	(match_operand:VQ2 3 "s_register_operand" "0")
+	(match_operand:SI 2 "immediate_operand" "i")))]
+  "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (<MODE>mode))
+   || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (<MODE>mode))"
+{
+  int elt = ffs ((int) INTVAL (operands[2])) - 1;
+  if (BYTES_BIG_ENDIAN)
+    elt = GET_MODE_NUNITS (<MODE>mode) - 1 - elt;
+  operands[2] = GEN_INT (elt);
+
+  return "vmov.<V_sz_elem>\t%q0[%c2], %1";
+}
+ [(set_attr "type" "mve_move")])
+
+(define_insn "mve_vec_setv2di_internal"
+ [(set (match_operand:V2DI 0 "s_register_operand" "=w")
+       (vec_merge:V2DI
+	(vec_duplicate:V2DI
+	  (match_operand:DI 1 "nonimmediate_operand" "r"))
+	(match_operand:V2DI 3 "s_register_operand" "0")
+	(match_operand:SI 2 "immediate_operand" "i")))]
+ "TARGET_HAVE_MVE"
+{
+  int elt = ffs ((int) INTVAL (operands[2])) - 1;
+  if (BYTES_BIG_ENDIAN)
+    elt = 1 - elt;
+
+  if (elt == 0)
+   return "vmov\t%e0, %Q1, %R1";
+  else
+   return "vmov\t%f0, %J1, %K1";
+}
+ [(set_attr "type" "mve_move")])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index df8f4fd4166b0d9cd08f01f3ac7ac3958f20b9db..72bdf557dc9fd16a6d97286d7232f9a9071a6e5f 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -411,18 +411,6 @@
   [(set_attr "type" "neon_load1_all_lanes_q,neon_from_gp_q")]
 )
 
-(define_expand "vec_set<mode>"
-  [(match_operand:VDQ 0 "s_register_operand")
-   (match_operand:<V_elem> 1 "s_register_operand")
-   (match_operand:SI 2 "immediate_operand")]
-  "TARGET_NEON"
-{
-  HOST_WIDE_INT elem = HOST_WIDE_INT_1 << INTVAL (operands[2]);
-  emit_insn (gen_vec_set<mode>_internal (operands[0], operands[1],
-					 GEN_INT (elem), operands[0]));
-  DONE;
-})
-
 (define_insn "vec_extract<mode><V_elem_l>"
   [(set (match_operand:<V_elem> 0 "nonimmediate_operand" "=Um,r")
         (vec_select:<V_elem>
@@ -445,7 +433,10 @@
   [(set_attr "type" "neon_store1_one_lane<q>,neon_to_gp<q>")]
 )
 
-(define_insn "vec_extract<mode><V_elem_l>"
+;; This pattern is renamed from "vec_extract<mode><V_elem_l>" to
+;; "neon_vec_extract<mode><V_elem_l>" and this pattern is called
+;; by define_expand in vec-common.md file.
+(define_insn "neon_vec_extract<mode><V_elem_l>"
   [(set (match_operand:<V_elem> 0 "nonimmediate_operand" "=Um,r")
 	(vec_select:<V_elem>
           (match_operand:VQ2 1 "s_register_operand" "w,w")
@@ -471,7 +462,9 @@
   [(set_attr "type" "neon_store1_one_lane<q>,neon_to_gp<q>")]
 )
 
-(define_insn "vec_extractv2didi"
+;; This pattern is renamed from "vec_extractv2didi" to "neon_vec_extractv2didi"
+;; and this pattern is called by define_expand in vec-common.md file.
+(define_insn "neon_vec_extractv2didi"
   [(set (match_operand:DI 0 "nonimmediate_operand" "=Um,r")
 	(vec_select:DI
           (match_operand:V2DI 1 "s_register_operand" "w,w")
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index 82a1a6bd7698fa25db88a5cdb5b3e762dc80a589..e052047f8b18e4b251ea0b322448c414b53ea422 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -190,3 +190,40 @@
   arm_expand_vec_perm (operands[0], operands[1], operands[2], operands[3]);
   DONE;
 })
+
+;; The following expand patterns are moved from neon.md to here and
+;; modifications are made to support both NEON and MVE.  This are needed for
+;; intrinsics vgetq_lane and vsetq_lane intrinsics in MVE.
+
+(define_expand "vec_extract<mode><V_elem_l>"
+ [(match_operand:<V_elem> 0 "nonimmediate_operand")
+  (match_operand:VQX 1 "s_register_operand")
+  (match_operand:SI 2 "immediate_operand")]
+ "TARGET_NEON || TARGET_HAVE_MVE"
+{
+  if (TARGET_NEON)
+    emit_insn (gen_neon_vec_extract<mode><V_elem_l> (operands[0], operands[1],
+						     operands[2]));
+  else if (TARGET_HAVE_MVE)
+    emit_insn (gen_mve_vec_extract<mode><V_elem_l> (operands[0], operands[1],
+						     operands[2]));
+  else
+    gcc_unreachable ();
+  DONE;
+})
+
+(define_expand "vec_set<mode>"
+  [(match_operand:VQX 0 "s_register_operand" "")
+   (match_operand:<V_elem> 1 "s_register_operand" "")
+   (match_operand:SI 2 "immediate_operand" "")]
+  "TARGET_NEON || TARGET_HAVE_MVE"
+{
+  HOST_WIDE_INT elem = HOST_WIDE_INT_1 << INTVAL (operands[2]);
+  if (TARGET_NEON)
+    emit_insn (gen_vec_set<mode>_internal (operands[0], operands[1],
+					   GEN_INT (elem), operands[0]));
+  else
+    emit_insn (gen_mve_vec_set<mode>_internal (operands[0], operands[1],
+					       GEN_INT (elem), operands[0]));
+  DONE;
+})
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..ba5cf8187080919ff8df24b6a747ebe40a701a17
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16_t
+foo (float16x8_t a)
+{
+  return vgetq_lane_f16 (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.u16"  }  } */
+
+float16_t
+foo1 (float16x8_t a)
+{
+  return vgetq_lane (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..e660e71bfb44b56f9e3f7c6ea55fe33753961641
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32_t
+foo (float32x4_t a)
+{
+  return vgetq_lane_f32 (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.32"  }  } */
+
+float32_t
+foo1 (float32x4_t a)
+{
+  return vgetq_lane (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..51609c3c4e40240204df1645717ed9eee38d897b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16_t
+foo (int16x8_t a)
+{
+  return vgetq_lane_s16 (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.s16"  }  } */
+
+int16_t
+foo1 (int16x8_t a)
+{
+  return vgetq_lane (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.s16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..47f092faeaef9abbcb947eb94a2a29ff011216c3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32_t
+foo (int32x4_t a)
+{
+  return vgetq_lane_s32 (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.32"  }  } */
+
+int32_t
+foo1 (int32x4_t a)
+{
+  return vgetq_lane (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s64.c
new file mode 100644
index 0000000000000000000000000000000000000000..4e048dbf3f64548891232790e5da21718b4d33ad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s64.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int64_t
+foo (int64x2_t a)
+{
+  return vgetq_lane_s64 (a, 0);
+}
+
+/* { dg-final { scan-assembler {vmov\tr0, r1, d0}  }  } */
+
+int64_t
+foo1 (int64x2_t a)
+{
+  return vgetq_lane (a, 0);
+}
+
+/* { dg-final { scan-assembler {vmov\tr0, r1, d0}  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..ccf26912915411952eef220c3cc689df3f7504bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8_t
+foo (int8x16_t a)
+{
+  return vgetq_lane_s8 (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.s8"  }  } */
+
+int8_t
+foo1 (int8x16_t a)
+{
+  return vgetq_lane (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.s8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..ea463a6cc782cc04ea4607a20751d57f7d84808f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16_t
+foo (uint16x8_t a)
+{
+  return vgetq_lane_u16 (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.u16"  }  } */
+
+uint16_t
+foo1 (uint16x8_t a)
+{
+  return vgetq_lane (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..e136d2e8ab2b99aff08e51eb3d8b32602d5a9990
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32_t
+foo (uint32x4_t a)
+{
+  return vgetq_lane_u32 (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.32"  }  } */
+
+uint32_t
+foo1 (uint32x4_t a)
+{
+  return vgetq_lane (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u64.c
new file mode 100644
index 0000000000000000000000000000000000000000..427687fbf404940fc5a8334bac5034e2aeb9112e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u64.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint64_t
+foo (uint64x2_t a)
+{
+  return vgetq_lane_u64 (a, 0);
+}
+
+/* { dg-final { scan-assembler {vmov\tr0, r1, d0}  }  } */
+
+uint64_t
+foo1 (uint64x2_t a)
+{
+  return vgetq_lane (a, 0);
+}
+
+/* { dg-final { scan-assembler {vmov\tr0, r1, d0}  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..4db14fce783eead4c0f602b02f7e1353adc0c671
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8_t
+foo (uint8x16_t a)
+{
+  return vgetq_lane_u8 (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.u8"  }  } */
+
+uint8_t
+foo1 (uint8x16_t a)
+{
+  return vgetq_lane (a, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.u8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..a2e4b765b61c612c580e2177a57255786e5529e2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
@@ -0,0 +1,14 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16_t a, float16x8_t b)
+{
+    return vsetq_lane_f16 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.16"  }  } */
+
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..c538d62c41145434a74770b75c762da4c3fc5898
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c
@@ -0,0 +1,14 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32_t a, float32x4_t b)
+{
+    return vsetq_lane_f32 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.32"  }  } */
+
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..648789dc8df3f7bdc640255c6fb3f458a554c8e5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c
@@ -0,0 +1,14 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int16_t a, int16x8_t b)
+{
+    return vsetq_lane_s16 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.16"  }  } */
+
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..18ca25b637ca5335d8068aee41f2a220416f660d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c
@@ -0,0 +1,14 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32_t a, int32x4_t b)
+{
+    return vsetq_lane_s32 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.32"  }  } */
+
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c
new file mode 100644
index 0000000000000000000000000000000000000000..0c82eeb081cb2733078a07a7e13f3854a51afd27
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c
@@ -0,0 +1,14 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int64x2_t
+foo (int64_t a, int64x2_t b)
+{
+    return vsetq_lane_s64 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler {vmov\td0, r[1-9]*[0-9], r[1-9]*[0-9]}  }  } */
+
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..87c06cd013da9a6b8e5a866bc946f2e10cd81522
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c
@@ -0,0 +1,14 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo (int8_t a, int8x16_t b)
+{
+    return vsetq_lane_s8 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.8"  }  } */
+
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..92984297ad1d83bab29d1cfd12d72932eda50fe4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c
@@ -0,0 +1,14 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint16_t a, uint16x8_t b)
+{
+    return vsetq_lane_u16 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.16"  }  } */
+
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..755e7c9a978cd28dfcef108f6e624413fea35dff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c
@@ -0,0 +1,14 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32_t a, uint32x4_t b)
+{
+    return vsetq_lane_u32 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.32"  }  } */
+
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c
new file mode 100644
index 0000000000000000000000000000000000000000..d7d593b2193cec787fefe547a47b1261def00f40
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c
@@ -0,0 +1,14 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint64x2_t
+foo (uint64_t a, uint64x2_t b)
+{
+    return vsetq_lane_u64 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler {vmov\td0, r[1-9]*[0-9], r[1-9]*[0-9]}  }  } */
+
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..b263360fc1a23124e91a705232cc6eb77012f803
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c
@@ -0,0 +1,14 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint8_t a, uint8x16_t b)
+{
+    return vsetq_lane_u8 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler "vmov.8"  }  } */
+


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][11x]: MVE ACLE vector interleaving store and deinterleaving load intrinsics and also aliases to vstr and vldr intrinsics.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (33 preceding siblings ...)
  2019-11-14 19:16 ` [PATCH][ARM][GCC][6x]:MVE ACLE vaddq intrinsics using arithmetic plus operator Srinath Parvathaneni
@ 2019-11-14 19:16 ` Srinath Parvathaneni
  2019-11-14 19:16 ` [PATCH][ARM][GCC][4/4x]: MVE intrinsics with quaternary operands Srinath Parvathaneni
                   ` (3 subsequent siblings)
  38 siblings, 0 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 74882 bytes --]

Hello,

This patch supports following MVE ACLE intrinsics which are aliases of vstr and
vldr intrinsics.

vst1q_p_u8, vst1q_p_s8, vld1q_z_u8, vld1q_z_s8, vst1q_p_u16, vst1q_p_s16,
vld1q_z_u16, vld1q_z_s16, vst1q_p_u32, vst1q_p_s32, vld1q_z_u32, vld1q_z_s32,
vld1q_z_f16, vst1q_p_f16, vld1q_z_f32, vst1q_p_f32.

This patch also supports following MVE ACLE vector deinterleaving loads and vector
interleaving stores.

vst2q_s8, vst2q_u8, vld2q_s8, vld2q_u8, vld4q_s8, vld4q_u8, vst2q_s16, vst2q_u16,
vld2q_s16, vld2q_u16, vld4q_s16, vld4q_u16, vst2q_s32, vst2q_u32, vld2q_s32,
vld2q_u32, vld4q_s32, vld4q_u32, vld4q_f16, vld2q_f16, vst2q_f16, vld4q_f32,
vld2q_f32, vst2q_f32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-08  Andre Vieira  <andre.simoesdiasvieira@arm.com>
            Mihail Ionescu  <mihail.ionescu@arm.com>
            Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm_mve.h (vst1q_p_u8): Define macro.
	(vst1q_p_s8): Likewise.
	(vst2q_s8): Likewise.
	(vst2q_u8): Likewise.
	(vld1q_z_u8): Likewise.
	(vld1q_z_s8): Likewise.
	(vld2q_s8): Likewise.
	(vld2q_u8): Likewise.
	(vld4q_s8): Likewise.
	(vld4q_u8): Likewise.
	(vst1q_p_u16): Likewise.
	(vst1q_p_s16): Likewise.
	(vst2q_s16): Likewise.
	(vst2q_u16): Likewise.
	(vld1q_z_u16): Likewise.
	(vld1q_z_s16): Likewise.
	(vld2q_s16): Likewise.
	(vld2q_u16): Likewise.
	(vld4q_s16): Likewise.
	(vld4q_u16): Likewise.
	(vst1q_p_u32): Likewise.
	(vst1q_p_s32): Likewise.
	(vst2q_s32): Likewise.
	(vst2q_u32): Likewise.
	(vld1q_z_u32): Likewise.
	(vld1q_z_s32): Likewise.
	(vld2q_s32): Likewise.
	(vld2q_u32): Likewise.
	(vld4q_s32): Likewise.
	(vld4q_u32): Likewise.
	(vld4q_f16): Likewise.
	(vld2q_f16): Likewise.
	(vld1q_z_f16): Likewise.
	(vst2q_f16): Likewise.
	(vst1q_p_f16): Likewise.
	(vld4q_f32): Likewise.
	(vld2q_f32): Likewise.
	(vld1q_z_f32): Likewise.
	(vst2q_f32): Likewise.
	(vst1q_p_f32): Likewise.
	(__arm_vst1q_p_u8): Define intrinsic.
	(__arm_vst1q_p_s8): Likewise.
	(__arm_vst2q_s8): Likewise.
	(__arm_vst2q_u8): Likewise.
	(__arm_vld1q_z_u8): Likewise.
	(__arm_vld1q_z_s8): Likewise.
	(__arm_vld2q_s8): Likewise.
	(__arm_vld2q_u8): Likewise.
	(__arm_vld4q_s8): Likewise.
	(__arm_vld4q_u8): Likewise.
	(__arm_vst1q_p_u16): Likewise.
	(__arm_vst1q_p_s16): Likewise.
	(__arm_vst2q_s16): Likewise.
	(__arm_vst2q_u16): Likewise.
	(__arm_vld1q_z_u16): Likewise.
	(__arm_vld1q_z_s16): Likewise.
	(__arm_vld2q_s16): Likewise.
	(__arm_vld2q_u16): Likewise.
	(__arm_vld4q_s16): Likewise.
	(__arm_vld4q_u16): Likewise.
	(__arm_vst1q_p_u32): Likewise.
	(__arm_vst1q_p_s32): Likewise.
	(__arm_vst2q_s32): Likewise.
	(__arm_vst2q_u32): Likewise.
	(__arm_vld1q_z_u32): Likewise.
	(__arm_vld1q_z_s32): Likewise.
	(__arm_vld2q_s32): Likewise.
	(__arm_vld2q_u32): Likewise.
	(__arm_vld4q_s32): Likewise.
	(__arm_vld4q_u32): Likewise.
	(__arm_vld4q_f16): Likewise.
	(__arm_vld2q_f16): Likewise.
	(__arm_vld1q_z_f16): Likewise.
	(__arm_vst2q_f16): Likewise.
	(__arm_vst1q_p_f16): Likewise.
	(__arm_vld4q_f32): Likewise.
	(__arm_vld2q_f32): Likewise.
	(__arm_vld1q_z_f32): Likewise.
	(__arm_vst2q_f32): Likewise.
	(__arm_vst1q_p_f32): Likewise.
	(vld1q_z): Define polymorphic variant.
	(vld2q): Likewise.
	(vld4q): Likewise.
	(vst1q_p): Likewise.
	(vst2q): Likewise.
	* config/arm/arm_mve_builtins.def (STORE1): Use builtin qualifier.
	(LOAD1): Likewise.
	* config/arm/mve.md (mve_vst2q<mode>): Define RTL pattern.
	(mve_vld2q<mode>): Likewise.
	(mve_vld4q<mode>): Likewise.

gcc/testsuite/ChangeLog:

2019-11-08  Andre Vieira  <andre.simoesdiasvieira@arm.com>
            Mihail Ionescu  <mihail.ionescu@arm.com>
            Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vld1q_z_f16.c: New test.
	* gcc.target/arm/mve/intrinsics/vld1q_z_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld1q_z_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld1q_z_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld1q_z_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld1q_z_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld1q_z_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld1q_z_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld2q_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld2q_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld2q_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld2q_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld2q_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld2q_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld2q_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld2q_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld4q_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld4q_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld4q_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld4q_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld4q_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld4q_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld4q_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld4q_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst1q_p_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst1q_p_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst1q_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst1q_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst1q_p_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst1q_p_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst1q_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst1q_p_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst2q_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst2q_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst2q_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst2q_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst2q_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst2q_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst2q_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vst2q_u8.c: Likewise.


###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 1704b622c5d6e0abcf814ae1d439bb732f0bd76e..d0259d7bd96c565d901b7634e9f735e0e14bc9dc 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -2466,6 +2466,46 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vsbcq_u32(__a, __b,  __carry) __arm_vsbcq_u32(__a, __b,  __carry)
 #define vsbcq_m_s32(__inactive, __a, __b,  __carry, __p) __arm_vsbcq_m_s32(__inactive, __a, __b,  __carry, __p)
 #define vsbcq_m_u32(__inactive, __a, __b,  __carry, __p) __arm_vsbcq_m_u32(__inactive, __a, __b,  __carry, __p)
+#define vst1q_p_u8(__addr, __value, __p) __arm_vst1q_p_u8(__addr, __value, __p)
+#define vst1q_p_s8(__addr, __value, __p) __arm_vst1q_p_s8(__addr, __value, __p)
+#define vst2q_s8(__addr, __value) __arm_vst2q_s8(__addr, __value)
+#define vst2q_u8(__addr, __value) __arm_vst2q_u8(__addr, __value)
+#define vld1q_z_u8(__base, __p) __arm_vld1q_z_u8(__base, __p)
+#define vld1q_z_s8(__base, __p) __arm_vld1q_z_s8(__base, __p)
+#define vld2q_s8(__addr) __arm_vld2q_s8(__addr)
+#define vld2q_u8(__addr) __arm_vld2q_u8(__addr)
+#define vld4q_s8(__addr) __arm_vld4q_s8(__addr)
+#define vld4q_u8(__addr) __arm_vld4q_u8(__addr)
+#define vst1q_p_u16(__addr, __value, __p) __arm_vst1q_p_u16(__addr, __value, __p)
+#define vst1q_p_s16(__addr, __value, __p) __arm_vst1q_p_s16(__addr, __value, __p)
+#define vst2q_s16(__addr, __value) __arm_vst2q_s16(__addr, __value)
+#define vst2q_u16(__addr, __value) __arm_vst2q_u16(__addr, __value)
+#define vld1q_z_u16(__base, __p) __arm_vld1q_z_u16(__base, __p)
+#define vld1q_z_s16(__base, __p) __arm_vld1q_z_s16(__base, __p)
+#define vld2q_s16(__addr) __arm_vld2q_s16(__addr)
+#define vld2q_u16(__addr) __arm_vld2q_u16(__addr)
+#define vld4q_s16(__addr) __arm_vld4q_s16(__addr)
+#define vld4q_u16(__addr) __arm_vld4q_u16(__addr)
+#define vst1q_p_u32(__addr, __value, __p) __arm_vst1q_p_u32(__addr, __value, __p)
+#define vst1q_p_s32(__addr, __value, __p) __arm_vst1q_p_s32(__addr, __value, __p)
+#define vst2q_s32(__addr, __value) __arm_vst2q_s32(__addr, __value)
+#define vst2q_u32(__addr, __value) __arm_vst2q_u32(__addr, __value)
+#define vld1q_z_u32(__base, __p) __arm_vld1q_z_u32(__base, __p)
+#define vld1q_z_s32(__base, __p) __arm_vld1q_z_s32(__base, __p)
+#define vld2q_s32(__addr) __arm_vld2q_s32(__addr)
+#define vld2q_u32(__addr) __arm_vld2q_u32(__addr)
+#define vld4q_s32(__addr) __arm_vld4q_s32(__addr)
+#define vld4q_u32(__addr) __arm_vld4q_u32(__addr)
+#define vld4q_f16(__addr) __arm_vld4q_f16(__addr)
+#define vld2q_f16(__addr) __arm_vld2q_f16(__addr)
+#define vld1q_z_f16(__base, __p) __arm_vld1q_z_f16(__base, __p)
+#define vst2q_f16(__addr, __value) __arm_vst2q_f16(__addr, __value)
+#define vst1q_p_f16(__addr, __value, __p) __arm_vst1q_p_f16(__addr, __value, __p)
+#define vld4q_f32(__addr) __arm_vld4q_f32(__addr)
+#define vld2q_f32(__addr) __arm_vld2q_f32(__addr)
+#define vld1q_z_f32(__base, __p) __arm_vld1q_z_f32(__base, __p)
+#define vst2q_f32(__addr, __value) __arm_vst2q_f32(__addr, __value)
+#define vst1q_p_f32(__addr, __value, __p) __arm_vst1q_p_f32(__addr, __value, __p)
 #endif
 
 __extension__ extern __inline void
@@ -16085,6 +16125,252 @@ __arm_vsbcq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, unsign
   return __res;
 }
 
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst1q_p_u8 (uint8_t * __addr, uint8x16_t __value, mve_pred16_t __p)
+{
+  return vstrbq_p_u8 (__addr, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst1q_p_s8 (int8_t * __addr, int8x16_t __value, mve_pred16_t __p)
+{
+  return vstrbq_p_s8 (__addr, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst2q_s8 (int8_t * __addr, int8x16x2_t __value)
+{
+  union { int8x16x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__i = __value;
+  __builtin_mve_vst2qv16qi ((__builtin_neon_qi *) __addr, __rv.__o);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst2q_u8 (uint8_t * __addr, uint8x16x2_t __value)
+{
+  union { uint8x16x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__i = __value;
+  __builtin_mve_vst2qv16qi ((__builtin_neon_qi *) __addr, __rv.__o);
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld1q_z_u8 (uint8_t const *__base, mve_pred16_t __p)
+{
+  return vldrbq_z_u8 ( __base, __p);
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld1q_z_s8 (int8_t const *__base, mve_pred16_t __p)
+{
+  return vldrbq_z_s8 ( __base, __p);
+}
+
+__extension__ extern __inline int8x16x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld2q_s8 (int8_t const * __addr)
+{
+  union { int8x16x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_mve_vld2qv16qi ((__builtin_neon_qi *) __addr);
+  return __rv.__i;
+}
+
+__extension__ extern __inline uint8x16x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld2q_u8 (uint8_t const * __addr)
+{
+  union { uint8x16x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_mve_vld2qv16qi ((__builtin_neon_qi *) __addr);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int8x16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld4q_s8 (int8_t const * __addr)
+{
+  union { int8x16x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__o = __builtin_mve_vld4qv16qi ((__builtin_neon_qi *) __addr);
+  return __rv.__i;
+}
+
+__extension__ extern __inline uint8x16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld4q_u8 (uint8_t const * __addr)
+{
+  union { uint8x16x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__o = __builtin_mve_vld4qv16qi ((__builtin_neon_qi *) __addr);
+  return __rv.__i;
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst1q_p_u16 (uint16_t * __addr, uint16x8_t __value, mve_pred16_t __p)
+{
+  return vstrhq_p_u16 (__addr, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst1q_p_s16 (int16_t * __addr, int16x8_t __value, mve_pred16_t __p)
+{
+  return vstrhq_p_s16 (__addr, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst2q_s16 (int16_t * __addr, int16x8x2_t __value)
+{
+  union { int16x8x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__i = __value;
+  __builtin_mve_vst2qv8hi ((__builtin_neon_hi *) __addr, __rv.__o);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst2q_u16 (uint16_t * __addr, uint16x8x2_t __value)
+{
+  union { uint16x8x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__i = __value;
+  __builtin_mve_vst2qv8hi ((__builtin_neon_hi *) __addr, __rv.__o);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld1q_z_u16 (uint16_t const *__base, mve_pred16_t __p)
+{
+  return vldrhq_z_u16 ( __base, __p);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld1q_z_s16 (int16_t const *__base, mve_pred16_t __p)
+{
+  return vldrhq_z_s16 ( __base, __p);
+}
+
+__extension__ extern __inline int16x8x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld2q_s16 (int16_t const * __addr)
+{
+  union { int16x8x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_mve_vld2qv8hi ((__builtin_neon_hi *) __addr);
+  return __rv.__i;
+}
+
+__extension__ extern __inline uint16x8x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld2q_u16 (uint16_t const * __addr)
+{
+  union { uint16x8x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_mve_vld2qv8hi ((__builtin_neon_hi *) __addr);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int16x8x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld4q_s16 (int16_t const * __addr)
+{
+  union { int16x8x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__o = __builtin_mve_vld4qv8hi ((__builtin_neon_hi *) __addr);
+  return __rv.__i;
+}
+
+__extension__ extern __inline uint16x8x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld4q_u16 (uint16_t const * __addr)
+{
+  union { uint16x8x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__o = __builtin_mve_vld4qv8hi ((__builtin_neon_hi *) __addr);
+  return __rv.__i;
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst1q_p_u32 (uint32_t * __addr, uint32x4_t __value, mve_pred16_t __p)
+{
+  return vstrwq_p_u32 (__addr, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst1q_p_s32 (int32_t * __addr, int32x4_t __value, mve_pred16_t __p)
+{
+  return vstrwq_p_s32 (__addr, __value, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst2q_s32 (int32_t * __addr, int32x4x2_t __value)
+{
+  union { int32x4x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__i = __value;
+  __builtin_mve_vst2qv4si ((__builtin_neon_si *) __addr, __rv.__o);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst2q_u32 (uint32_t * __addr, uint32x4x2_t __value)
+{
+  union { uint32x4x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__i = __value;
+  __builtin_mve_vst2qv4si ((__builtin_neon_si *) __addr, __rv.__o);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld1q_z_u32 (uint32_t const *__base, mve_pred16_t __p)
+{
+  return vldrwq_z_u32 ( __base, __p);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld1q_z_s32 (int32_t const *__base, mve_pred16_t __p)
+{
+  return vldrwq_z_s32 ( __base, __p);
+}
+
+__extension__ extern __inline int32x4x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld2q_s32 (int32_t const * __addr)
+{
+  union { int32x4x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_mve_vld2qv4si ((__builtin_neon_si *) __addr);
+  return __rv.__i;
+}
+
+__extension__ extern __inline uint32x4x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld2q_u32 (uint32_t const * __addr)
+{
+  union { uint32x4x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_mve_vld2qv4si ((__builtin_neon_si *) __addr);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int32x4x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld4q_s32 (int32_t const * __addr)
+{
+  union { int32x4x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__o = __builtin_mve_vld4qv4si ((__builtin_neon_si *) __addr);
+  return __rv.__i;
+}
+
+__extension__ extern __inline uint32x4x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld4q_u32 (uint32_t const * __addr)
+{
+  union { uint32x4x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__o = __builtin_mve_vld4qv4si ((__builtin_neon_si *) __addr);
+  return __rv.__i;
+}
+
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
@@ -19436,6 +19722,88 @@ __arm_vrev64q_x_f32 (float32x4_t __a, mve_pred16_t __p)
   return __builtin_mve_vrev64q_m_fv4sf (vuninitializedq_f32 (), __a, __p);
 }
 
+__extension__ extern __inline float16x8x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld4q_f16 (float16_t const * __addr)
+{
+  union { float16x8x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__o = __builtin_mve_vld4qv8hf (__addr);
+  return __rv.__i;
+}
+
+__extension__ extern __inline float16x8x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld2q_f16 (float16_t const * __addr)
+{
+  union { float16x8x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_mve_vld2qv8hf (__addr);
+  return __rv.__i;
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld1q_z_f16 (float16_t const *__base, mve_pred16_t __p)
+{
+  return vldrhq_z_f16 ( __base, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst2q_f16 (float16_t * __addr, float16x8x2_t __value)
+{
+  union { float16x8x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__i = __value;
+  __builtin_mve_vst2qv8hf (__addr, __rv.__o);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst1q_p_f16 (float16_t * __addr, float16x8_t __value, mve_pred16_t __p)
+{
+  return vstrhq_p_f16 (__addr, __value, __p);
+}
+
+__extension__ extern __inline float32x4x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld4q_f32 (float32_t const * __addr)
+{
+  union { float32x4x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__o = __builtin_mve_vld4qv4sf (__addr);
+  return __rv.__i;
+}
+
+__extension__ extern __inline float32x4x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld2q_f32 (float32_t const * __addr)
+{
+  union { float32x4x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_mve_vld2qv4sf (__addr);
+  return __rv.__i;
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld1q_z_f32 (float32_t const *__base, mve_pred16_t __p)
+{
+  return vldrwq_z_f32 ( __base, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst2q_f32 (float32_t * __addr, float32x4x2_t __value)
+{
+  union { float32x4x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__i = __value;
+  __builtin_mve_vst2qv4sf (__addr, __rv.__o);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vst1q_p_f32 (float32_t * __addr, float32x4_t __value, mve_pred16_t __p)
+{
+  return vstrwq_p_f32 (__addr, __value, __p);
+}
+
 #endif
 
 enum {
@@ -20995,6 +21363,42 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16_t_const_ptr]: __arm_vld1q_f16 (__ARM_mve_coerce(__p0, float16_t const *)), \
   int (*)[__ARM_mve_type_float32_t_const_ptr]: __arm_vld1q_f32 (__ARM_mve_coerce(__p0, float32_t const *)));})
 
+#define vld1q_z(p0,p1) __arm_vld1q_z(p0, p1)
+#define __arm_vld1q_z(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int8_t_const_ptr]: __arm_vld1q_z_s8 (__ARM_mve_coerce(__p0, int8_t const *), p1), \
+  int (*)[__ARM_mve_type_int16_t_const_ptr]: __arm_vld1q_z_s16 (__ARM_mve_coerce(__p0, int16_t const *), p1), \
+  int (*)[__ARM_mve_type_int32_t_const_ptr]: __arm_vld1q_z_s32 (__ARM_mve_coerce(__p0, int32_t const *), p1), \
+  int (*)[__ARM_mve_type_uint8_t_const_ptr]: __arm_vld1q_z_u8 (__ARM_mve_coerce(__p0, uint8_t const *), p1), \
+  int (*)[__ARM_mve_type_uint16_t_const_ptr]: __arm_vld1q_z_u16 (__ARM_mve_coerce(__p0, uint16_t const *), p1), \
+  int (*)[__ARM_mve_type_uint32_t_const_ptr]: __arm_vld1q_z_u32 (__ARM_mve_coerce(__p0, uint32_t const *), p1), \
+  int (*)[__ARM_mve_type_float16_t_const_ptr]: __arm_vld1q_z_f16 (__ARM_mve_coerce(__p0, float16_t const *), p1), \
+  int (*)[__ARM_mve_type_float32_t_const_ptr]: __arm_vld1q_z_f32 (__ARM_mve_coerce(__p0, float32_t const *), p1));})
+
+#define vld2q(p0) __arm_vld2q(p0)
+#define __arm_vld2q(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int8_t_const_ptr]: __arm_vld2q_s8 (__ARM_mve_coerce(__p0, int8_t const *)), \
+  int (*)[__ARM_mve_type_int16_t_const_ptr]: __arm_vld2q_s16 (__ARM_mve_coerce(__p0, int16_t const *)), \
+  int (*)[__ARM_mve_type_int32_t_const_ptr]: __arm_vld2q_s32 (__ARM_mve_coerce(__p0, int32_t const *)), \
+  int (*)[__ARM_mve_type_uint8_t_const_ptr]: __arm_vld2q_u8 (__ARM_mve_coerce(__p0, uint8_t const *)), \
+  int (*)[__ARM_mve_type_uint16_t_const_ptr]: __arm_vld2q_u16 (__ARM_mve_coerce(__p0, uint16_t const *)), \
+  int (*)[__ARM_mve_type_uint32_t_const_ptr]: __arm_vld2q_u32 (__ARM_mve_coerce(__p0, uint32_t const *)), \
+  int (*)[__ARM_mve_type_float16_t_const_ptr]: __arm_vld2q_f16 (__ARM_mve_coerce(__p0, float16_t const *)), \
+  int (*)[__ARM_mve_type_float32_t_const_ptr]: __arm_vld2q_f32 (__ARM_mve_coerce(__p0, float32_t const *)));})
+
+#define vld4q(p0) __arm_vld4q(p0)
+#define __arm_vld4q(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int8_t_const_ptr]: __arm_vld4q_s8 (__ARM_mve_coerce(__p0, int8_t const *)), \
+  int (*)[__ARM_mve_type_int16_t_const_ptr]: __arm_vld4q_s16 (__ARM_mve_coerce(__p0, int16_t const *)), \
+  int (*)[__ARM_mve_type_int32_t_const_ptr]: __arm_vld4q_s32 (__ARM_mve_coerce(__p0, int32_t const *)), \
+  int (*)[__ARM_mve_type_uint8_t_const_ptr]: __arm_vld4q_u8 (__ARM_mve_coerce(__p0, uint8_t const *)), \
+  int (*)[__ARM_mve_type_uint16_t_const_ptr]: __arm_vld4q_u16 (__ARM_mve_coerce(__p0, uint16_t const *)), \
+  int (*)[__ARM_mve_type_uint32_t_const_ptr]: __arm_vld4q_u32 (__ARM_mve_coerce(__p0, uint32_t const *)), \
+  int (*)[__ARM_mve_type_float16_t_const_ptr]: __arm_vld4q_f16 (__ARM_mve_coerce(__p0, float16_t const *)), \
+  int (*)[__ARM_mve_type_float32_t_const_ptr]: __arm_vld4q_f32 (__ARM_mve_coerce(__p0, float32_t const *)));})
+
 #define vldrhq_gather_offset(p0,p1) __arm_vldrhq_gather_offset(p0,p1)
 #define __arm_vldrhq_gather_offset(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -21063,6 +21467,32 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint32_t_const_ptr]: __arm_vldrwq_gather_shifted_offset_z_u32 (__ARM_mve_coerce(__p0, uint32_t const *), p1, p2), \
   int (*)[__ARM_mve_type_float32_t_const_ptr]: __arm_vldrwq_gather_shifted_offset_z_f32 (__ARM_mve_coerce(__p0, float32_t const *), p1, p2));})
 
+#define vst1q_p(p0,p1,p2) __arm_vst1q_p(p0,p1,p2)
+#define __arm_vst1q_p(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16_t]: __arm_vst1q_p_s8 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, int8x16_t), p2), \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8_t]: __arm_vst1q_p_s16 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, int16x8_t), p2), \
+  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vst1q_p_s32 (__ARM_mve_coerce(__p0, int32_t *), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16_t]: __arm_vst1q_p_u8 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vst1q_p_u16 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
+  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vst1q_p_u32 (__ARM_mve_coerce(__p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4_t), p2), \
+  int (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_float16x8_t]: __arm_vst1q_p_f16 (__ARM_mve_coerce(__p0, float16_t *), __ARM_mve_coerce(__p1, float16x8_t), p2), \
+  int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4_t]: __arm_vst1q_p_f32 (__ARM_mve_coerce(__p0, float32_t *), __ARM_mve_coerce(__p1, float32x4_t), p2));})
+
+#define vst2q(p0,p1) __arm_vst2q(p0,p1)
+#define __arm_vst2q(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16x2_t]: __arm_vst2q_s8 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, int8x16x2_t)), \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8x2_t]: __arm_vst2q_s16 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, int16x8x2_t)), \
+  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4x2_t]: __arm_vst2q_s32 (__ARM_mve_coerce(__p0, int32_t *), __ARM_mve_coerce(__p1, int32x4x2_t)), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16x2_t]: __arm_vst2q_u8 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16x2_t)), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8x2_t]: __arm_vst2q_u16 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8x2_t)), \
+  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4x2_t]: __arm_vst2q_u32 (__ARM_mve_coerce(__p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4x2_t)), \
+  int (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_float16x8x2_t]: __arm_vst2q_f16 (__ARM_mve_coerce(__p0, float16_t *), __ARM_mve_coerce(__p1, float16x8x2_t)), \
+  int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4x2_t]: __arm_vst2q_f32 (__ARM_mve_coerce(__p0, float32_t *), __ARM_mve_coerce(__p1, float32x4x2_t)));})
+
 #define vst1q(p0,p1) __arm_vst1q(p0,p1)
 #define __arm_vst1q(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -24774,6 +25204,28 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vst1q_u16 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t)), \
   int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vst1q_u32 (__ARM_mve_coerce(__p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4_t)));})
 
+#define vst1q_p(p0,p1,p2) __arm_vst1q_p(p0,p1,p2)
+#define __arm_vst1q_p(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16_t]: __arm_vst1q_p_s8 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, int8x16_t), p2), \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8_t]: __arm_vst1q_p_s16 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, int16x8_t), p2), \
+  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vst1q_p_s32 (__ARM_mve_coerce(__p0, int32_t *), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16_t]: __arm_vst1q_p_u8 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vst1q_p_u16 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
+  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vst1q_p_u32 (__ARM_mve_coerce(__p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
+
+#define vst2q(p0,p1) __arm_vst2q(p0,p1)
+#define __arm_vst2q(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16x2_t]: __arm_vst2q_s8 (__ARM_mve_coerce(__p0, int8_t *), __ARM_mve_coerce(__p1, int8x16x2_t)), \
+  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8x2_t]: __arm_vst2q_s16 (__ARM_mve_coerce(__p0, int16_t *), __ARM_mve_coerce(__p1, int16x8x2_t)), \
+  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4x2_t]: __arm_vst2q_s32 (__ARM_mve_coerce(__p0, int32_t *), __ARM_mve_coerce(__p1, int32x4x2_t)), \
+  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16x2_t]: __arm_vst2q_u8 (__ARM_mve_coerce(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16x2_t)), \
+  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8x2_t]: __arm_vst2q_u16 (__ARM_mve_coerce(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8x2_t)), \
+  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4x2_t]: __arm_vst2q_u32 (__ARM_mve_coerce(__p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4x2_t)));})
+
 #define vstrhq(p0,p1) __arm_vstrhq(p0,p1)
 #define __arm_vstrhq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -25780,6 +26232,36 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vsbcq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vsbcq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
 
+#define vld1q_z(p0,p1) __arm_vld1q_z(p0, p1)
+#define __arm_vld1q_z(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int8_t_const_ptr]: __arm_vld1q_z_s8 (__ARM_mve_coerce(__p0, int8_t const *), p1), \
+  int (*)[__ARM_mve_type_int16_t_const_ptr]: __arm_vld1q_z_s16 (__ARM_mve_coerce(__p0, int16_t const *), p1), \
+  int (*)[__ARM_mve_type_int32_t_const_ptr]: __arm_vld1q_z_s32 (__ARM_mve_coerce(__p0, int32_t const *), p1), \
+  int (*)[__ARM_mve_type_uint8_t_const_ptr]: __arm_vld1q_z_u8 (__ARM_mve_coerce(__p0, uint8_t const *), p1), \
+  int (*)[__ARM_mve_type_uint16_t_const_ptr]: __arm_vld1q_z_u16 (__ARM_mve_coerce(__p0, uint16_t const *), p1), \
+  int (*)[__ARM_mve_type_uint32_t_const_ptr]: __arm_vld1q_z_u32 (__ARM_mve_coerce(__p0, uint32_t const *), p1));})
+
+#define vld2q(p0) __arm_vld2q(p0)
+#define __arm_vld2q(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int8_t_const_ptr]: __arm_vld2q_s8 (__ARM_mve_coerce(__p0, int8_t const *)), \
+  int (*)[__ARM_mve_type_int16_t_const_ptr]: __arm_vld2q_s16 (__ARM_mve_coerce(__p0, int16_t const *)), \
+  int (*)[__ARM_mve_type_int32_t_const_ptr]: __arm_vld2q_s32 (__ARM_mve_coerce(__p0, int32_t const *)), \
+  int (*)[__ARM_mve_type_uint8_t_const_ptr]: __arm_vld2q_u8 (__ARM_mve_coerce(__p0, uint8_t const *)), \
+  int (*)[__ARM_mve_type_uint16_t_const_ptr]: __arm_vld2q_u16 (__ARM_mve_coerce(__p0, uint16_t const *)), \
+  int (*)[__ARM_mve_type_uint32_t_const_ptr]: __arm_vld2q_u32 (__ARM_mve_coerce(__p0, uint32_t const *)));})
+
+#define vld4q(p0) __arm_vld4q(p0)
+#define __arm_vld4q(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int8_t_const_ptr]: __arm_vld4q_s8 (__ARM_mve_coerce(__p0, int8_t const *)), \
+  int (*)[__ARM_mve_type_int16_t_const_ptr]: __arm_vld4q_s16 (__ARM_mve_coerce(__p0, int16_t const *)), \
+  int (*)[__ARM_mve_type_int32_t_const_ptr]: __arm_vld4q_s32 (__ARM_mve_coerce(__p0, int32_t const *)), \
+  int (*)[__ARM_mve_type_uint8_t_const_ptr]: __arm_vld4q_u8 (__ARM_mve_coerce(__p0, uint8_t const *)), \
+  int (*)[__ARM_mve_type_uint16_t_const_ptr]: __arm_vld4q_u16 (__ARM_mve_coerce(__p0, uint16_t const *)), \
+  int (*)[__ARM_mve_type_uint32_t_const_ptr]: __arm_vld4q_u32 (__ARM_mve_coerce(__p0, uint32_t const *)));})
+
 #endif /* MVE Floating point.  */
 
 #ifdef __cplusplus
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index a413b38676f2f102c16fdf2147f3b8a4d8ec47b4..638dcbc819034bf2c8428ff40f0e4d811763d80e 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -873,3 +873,6 @@ VAR1 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vsbciq_m_s, v4si)
 VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vsbciq_m_u, v4si)
 VAR1 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vsbcq_m_s, v4si)
 VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vsbcq_m_u, v4si)
+VAR5 (STORE1, vst2q, v16qi, v8hi, v4si, v8hf, v4sf)
+VAR5 (LOAD1, vld4q, v16qi, v8hi, v4si, v8hf, v4sf)
+VAR5 (LOAD1, vld2q, v16qi, v8hi, v4si, v8hf, v4sf)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 8ff69094378396830ef31d9e2ca9db71c58aefab..62735e5ddab1125e6f38fbf5a5cb5c04936a7717 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -214,7 +214,7 @@
 			 VLDRDQGBWB_S VLDRDQGBWB_U VADCQ_U VADCQ_M_U VADCQ_S
 			 VADCQ_M_S VSBCIQ_U VSBCIQ_S VSBCIQ_M_U VSBCIQ_M_S
 			 VSBCQ_U VSBCQ_S VSBCQ_M_U VSBCQ_M_S VADCIQ_U VADCIQ_M_U
-			 VADCIQ_S VADCIQ_M_S])
+			 VADCIQ_S VADCIQ_M_S VLD2Q VLD4Q VST2Q])
 
 (define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF") (V8HF "V8HI")
 			    (V4SF "V4SI")])
@@ -10775,3 +10775,91 @@
   "vsbc.i32\t%q0, %q1, %q2"
   [(set_attr "type" "mve_move")
    (set_attr "length" "4")])
+
+;;
+;; [vst2q])
+;;
+(define_insn "mve_vst2q<mode>"
+  [(set (match_operand:OI 0 "neon_struct_operand" "=Um")
+	(unspec:OI [(match_operand:OI 1 "s_register_operand" "w")
+		    (unspec:MVE_VLD_ST [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+	 VST2Q))
+  ]
+  "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (<MODE>mode))
+   || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (<MODE>mode))"
+{
+   rtx ops[4];
+   int regno = REGNO (operands[1]);
+   ops[0] = gen_rtx_REG (TImode, regno);
+   ops[1] = gen_rtx_REG (TImode, regno + 4);
+   rtx reg  = operands[0];
+   while (reg && !REG_P (reg))
+    reg = XEXP (reg, 0);
+   gcc_assert (REG_P (reg));
+   ops[2] = reg;
+   ops[3] = operands[0];
+   output_asm_insn ("vst20.<V_sz_elem>\t{%q0, %q1}, [%2]\n\t"
+		    "vst21.<V_sz_elem>\t{%q0, %q1}, %3", ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+;;
+;; [vld2q])
+;;
+(define_insn "mve_vld2q<mode>"
+  [(set (match_operand:OI 0 "s_register_operand" "=w")
+	(unspec:OI [(match_operand:OI 1 "neon_struct_operand" "Um")
+		    (unspec:MVE_VLD_ST [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+	 VLD2Q))
+  ]
+  "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (<MODE>mode))
+   || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (<MODE>mode))"
+{
+   rtx ops[4];
+   int regno = REGNO (operands[0]);
+   ops[0] = gen_rtx_REG (TImode, regno);
+   ops[1] = gen_rtx_REG (TImode, regno + 4);
+   rtx reg  = operands[1];
+   while (reg && !REG_P (reg))
+    reg = XEXP (reg, 0);
+   gcc_assert (REG_P (reg));
+   ops[2] = reg;
+   ops[3] = operands[1];
+   output_asm_insn ("vld20.<V_sz_elem>\t{%q0, %q1}, [%2]\n\t"
+		    "vld21.<V_sz_elem>\t{%q0, %q1}, %3", ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+;;
+;; [vld4q])
+;;
+(define_insn "mve_vld4q<mode>"
+  [(set (match_operand:XI 0 "s_register_operand" "=w")
+	(unspec:XI [(match_operand:XI 1 "neon_struct_operand" "Um")
+		    (unspec:MVE_VLD_ST [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+	 VLD4Q))
+  ]
+  "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (<MODE>mode))
+   || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (<MODE>mode))"
+{
+   rtx ops[6];
+   int regno = REGNO (operands[0]);
+   ops[0] = gen_rtx_REG (TImode, regno);
+   ops[1] = gen_rtx_REG (TImode, regno+4);
+   ops[2] = gen_rtx_REG (TImode, regno+8);
+   ops[3] = gen_rtx_REG (TImode, regno + 12);
+   rtx reg  = operands[1];
+   while (reg && !REG_P (reg))
+    reg = XEXP (reg, 0);
+   gcc_assert (REG_P (reg));
+   ops[4] = reg;
+   ops[5] = operands[1];
+   output_asm_insn ("vld40.<V_sz_elem>\t{%q0, %q1, %q2, %q3}, [%4]\n\t"
+		    "vld41.<V_sz_elem>\t{%q0, %q1, %q2, %q3}, [%4]\n\t"
+		    "vld42.<V_sz_elem>\t{%q0, %q1, %q2, %q3}, [%4]\n\t"
+		    "vld43.<V_sz_elem>\t{%q0, %q1, %q2, %q3}, %5", ops);
+   return "";
+}
+  [(set_attr "length" "16")])
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..26cc62ad29870209d51fb6133e35a55bbc254d57
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16_t const * base, mve_pred16_t p)
+{
+  return vld1q_z_f16 (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.f16"  }  } */
+
+float16x8_t
+foo1 (float16_t const * base, mve_pred16_t p)
+{
+  return vld1q_z (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..9184987dd5ae2d5b0de75e08e49d5149559a7aef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_f32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32_t const * base, mve_pred16_t p)
+{
+  return vld1q_z_f32 (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrwt.f32"  }  } */
+
+float32x4_t
+foo1 (float32_t const * base, mve_pred16_t p)
+{
+  return vld1q_z (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrwt.f32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..c993f00fea3cfcc19531e20d161b120570794997
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int16_t const * base, mve_pred16_t p)
+{
+  return vld1q_z_s16 (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.s16"  }  } */
+
+int16x8_t
+foo1 (int16_t const * base, mve_pred16_t p)
+{
+  return vld1q_z (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.s16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..801d6c08f16339075ecccaca505c5b55360b5c84
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32_t const * base, mve_pred16_t p)
+{
+  return vld1q_z_s32 (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrwt.s32"  }  } */
+
+int32x4_t
+foo1 (int32_t const * base, mve_pred16_t p)
+{
+  return vld1q_z (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrwt.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..edbffd1804da364ce2bf44cb587441a7e774a898
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo (int8_t const * base, mve_pred16_t p)
+{
+  return vld1q_z_s8 (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.s8"  }  } */
+
+int8x16_t
+foo1 (int8_t const * base, mve_pred16_t p)
+{
+  return vld1q_z (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.s8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..c9c0edb2e4dfc098b67b9148980897b42d199388
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint16_t const * base, mve_pred16_t p)
+{
+  return vld1q_z_u16 (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.u16"  }  } */
+
+uint16x8_t
+foo1 (uint16_t const * base, mve_pred16_t p)
+{
+  return vld1q_z (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..34b84cddef49798dabd167eddad1c784708aab55
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32_t const * base, mve_pred16_t p)
+{
+  return vld1q_z_u32 (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
+
+uint32x4_t
+foo1 (uint32_t const * base, mve_pred16_t p)
+{
+  return vld1q_z (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..9338a98d3c0eba57a65f908c70d8801c57b87a12
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_z_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint8_t const * base, mve_pred16_t p)
+{
+  return vld1q_z_u8 (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.u8"  }  } */
+
+uint8x16_t
+foo1 (uint8_t const * base, mve_pred16_t p)
+{
+  return vld1q_z (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrbt.u8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld2q_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld2q_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..9ac3ae203e9fdfa77ed281a74a989a34fd1c8949
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld2q_f16.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8x2_t
+foo (float16_t const * addr)
+{
+  return vld2q_f16 (addr);
+}
+
+/* { dg-final { scan-assembler "vld20.16"  }  } */
+/* { dg-final { scan-assembler "vld21.16"  }  } */
+
+float16x8x2_t
+foo1 (float16_t const * addr)
+{
+  return vld2q (addr);
+}
+
+/* { dg-final { scan-assembler "vld20.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld2q_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld2q_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..f2ef313af6cec813db59a772ceee87dab090be3e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld2q_f32.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4x2_t
+foo (float32_t const * addr)
+{
+  return vld2q_f32 (addr);
+}
+
+/* { dg-final { scan-assembler "vld20.32"  }  } */
+/* { dg-final { scan-assembler "vld21.32"  }  } */
+
+float32x4x2_t
+foo1 (float32_t const * addr)
+{
+  return vld2q (addr);
+}
+
+/* { dg-final { scan-assembler "vld20.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld2q_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld2q_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..9d1a8bfebb0501063833666c7f3ea5a3f1e40de0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld2q_s16.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8x2_t
+foo (int16_t const * addr)
+{
+  return vld2q_s16 (addr);
+}
+
+/* { dg-final { scan-assembler "vld20.16"  }  } */
+/* { dg-final { scan-assembler "vld21.16"  }  } */
+
+int16x8x2_t
+foo1 (int16_t const * addr)
+{
+  return vld2q (addr);
+}
+
+/* { dg-final { scan-assembler "vld20.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld2q_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld2q_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..2fcc3bbb214f1923bbd0354042854bde2ec6a984
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld2q_s32.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4x2_t
+foo (int32_t const * addr)
+{
+  return vld2q_s32 (addr);
+}
+
+/* { dg-final { scan-assembler "vld20.32"  }  } */
+/* { dg-final { scan-assembler "vld21.32"  }  } */
+
+int32x4x2_t
+foo1 (int32_t const * addr)
+{
+  return vld2q (addr);
+}
+
+/* { dg-final { scan-assembler "vld20.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld2q_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld2q_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..e4ba850fbd10507e55e2a8b4f2d576c6ee23a465
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld2q_s8.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16x2_t
+foo (int8_t const * addr)
+{
+  return vld2q_s8 (addr);
+}
+
+/* { dg-final { scan-assembler "vld20.8"  }  } */
+/* { dg-final { scan-assembler "vld21.8"  }  } */
+
+int8x16x2_t
+foo1 (int8_t const * addr)
+{
+  return vld2q (addr);
+}
+
+/* { dg-final { scan-assembler "vld20.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld2q_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld2q_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..e16bd47b4347fec79bb8dc16331a8927075d843b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld2q_u16.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8x2_t
+foo (uint16_t const * addr)
+{
+  return vld2q_u16 (addr);
+}
+
+/* { dg-final { scan-assembler "vld20.16"  }  } */
+/* { dg-final { scan-assembler "vld21.16"  }  } */
+
+uint16x8x2_t
+foo1 (uint16_t const * addr)
+{
+  return vld2q (addr);
+}
+
+/* { dg-final { scan-assembler "vld20.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld2q_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld2q_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..9ddad5f7dcff826342a4a07e7c4f0bd5a278ebf1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld2q_u32.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4x2_t
+foo (uint32_t const * addr)
+{
+  return vld2q_u32 (addr);
+}
+
+/* { dg-final { scan-assembler "vld20.32"  }  } */
+/* { dg-final { scan-assembler "vld21.32"  }  } */
+
+uint32x4x2_t
+foo1 (uint32_t const * addr)
+{
+  return vld2q (addr);
+}
+
+/* { dg-final { scan-assembler "vld20.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld2q_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld2q_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..76ae541b27ce6bd36c41f2325bb4ff1f932e8eb8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld2q_u8.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16x2_t
+foo (uint8_t const * addr)
+{
+  return vld2q_u8 (addr);
+}
+
+/* { dg-final { scan-assembler "vld20.8"  }  } */
+/* { dg-final { scan-assembler "vld21.8"  }  } */
+
+uint8x16x2_t
+foo1 (uint8_t const * addr)
+{
+  return vld2q (addr);
+}
+
+/* { dg-final { scan-assembler "vld20.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld4q_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld4q_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..6dcfe7ea454e975744854340c5f13ef0cc238afb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld4q_f16.c
@@ -0,0 +1,24 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8x4_t
+foo (float16_t const * addr)
+{
+  return vld4q_f16 (addr);
+}
+
+/* { dg-final { scan-assembler "vld40.16"  }  } */
+/* { dg-final { scan-assembler "vld41.16"  }  } */
+/* { dg-final { scan-assembler "vld42.16"  }  } */
+/* { dg-final { scan-assembler "vld43.16"  }  } */
+
+float16x8x4_t
+foo1 (float16_t const * addr)
+{
+  return vld4q (addr);
+}
+
+/* { dg-final { scan-assembler "vld40.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld4q_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld4q_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..a7c1b1608d69148285dbc1e801ef8f69b64019d6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld4q_f32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4x4_t
+foo (float32_t const * addr)
+{
+  return vld4q_f32 (addr);
+}
+
+/* { dg-final { scan-assembler "vld40.32"  }  } */
+/* { dg-final { scan-assembler "vld41.32"  }  } */
+/* { dg-final { scan-assembler "vld42.32"  }  } */
+/* { dg-final { scan-assembler "vld43.32"  }  } */
+
+float32x4x4_t
+foo1 (float32_t const * addr)
+{
+  return vld4q (addr);
+}
+
+/* { dg-final { scan-assembler "vld40.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld4q_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld4q_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..6f79d92911ec771a6493510fd08c93d30103d9b2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld4q_s16.c
@@ -0,0 +1,24 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8x4_t
+foo (int16_t const * addr)
+{
+  return vld4q_s16 (addr);
+}
+
+/* { dg-final { scan-assembler "vld40.16"  }  } */
+/* { dg-final { scan-assembler "vld41.16"  }  } */
+/* { dg-final { scan-assembler "vld42.16"  }  } */
+/* { dg-final { scan-assembler "vld43.16"  }  } */
+
+int16x8x4_t
+foo1 (int16_t const * addr)
+{
+  return vld4q (addr);
+}
+
+/* { dg-final { scan-assembler "vld40.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld4q_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld4q_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..d7bc46ea9512aa197327584aef3235c0ee51ca52
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld4q_s32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4x4_t
+foo (int32_t const * addr)
+{
+  return vld4q_s32 (addr);
+}
+
+/* { dg-final { scan-assembler "vld40.32"  }  } */
+/* { dg-final { scan-assembler "vld41.32"  }  } */
+/* { dg-final { scan-assembler "vld42.32"  }  } */
+/* { dg-final { scan-assembler "vld43.32"  }  } */
+
+int32x4x4_t
+foo1 (int32_t const * addr)
+{
+  return vld4q (addr);
+}
+
+/* { dg-final { scan-assembler "vld40.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld4q_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld4q_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..3708b59f6ec77cd3ecda9e06f213d1f02a482151
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld4q_s8.c
@@ -0,0 +1,24 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16x4_t
+foo (int8_t const * addr)
+{
+  return vld4q_s8 (addr);
+}
+
+/* { dg-final { scan-assembler "vld40.8"  }  } */
+/* { dg-final { scan-assembler "vld41.8"  }  } */
+/* { dg-final { scan-assembler "vld42.8"  }  } */
+/* { dg-final { scan-assembler "vld43.8"  }  } */
+
+int8x16x4_t
+foo1 (int8_t const * addr)
+{
+  return vld4q (addr);
+}
+
+/* { dg-final { scan-assembler "vld40.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld4q_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld4q_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..2b708bd1cbd167d8a909bc9c554e1b9269a81240
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld4q_u16.c
@@ -0,0 +1,24 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8x4_t
+foo (uint16_t const * addr)
+{
+  return vld4q_u16 (addr);
+}
+
+/* { dg-final { scan-assembler "vld40.16"  }  } */
+/* { dg-final { scan-assembler "vld41.16"  }  } */
+/* { dg-final { scan-assembler "vld42.16"  }  } */
+/* { dg-final { scan-assembler "vld43.16"  }  } */
+
+uint16x8x4_t
+foo1 (uint16_t const * addr)
+{
+  return vld4q (addr);
+}
+
+/* { dg-final { scan-assembler "vld40.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld4q_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld4q_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..9b3c3922f1bb76b52de86a59dd4777431e793cb4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld4q_u32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4x4_t
+foo (uint32_t const * addr)
+{
+  return vld4q_u32 (addr);
+}
+
+/* { dg-final { scan-assembler "vld40.32"  }  } */
+/* { dg-final { scan-assembler "vld41.32"  }  } */
+/* { dg-final { scan-assembler "vld42.32"  }  } */
+/* { dg-final { scan-assembler "vld43.32"  }  } */
+
+uint32x4x4_t
+foo1 (uint32_t const * addr)
+{
+  return vld4q (addr);
+}
+
+/* { dg-final { scan-assembler "vld40.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld4q_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld4q_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..e950326e0ef4a388cbe1d58098b49d8862c0419e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld4q_u8.c
@@ -0,0 +1,24 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16x4_t
+foo (uint8_t const * addr)
+{
+  return vld4q_u8 (addr);
+}
+
+/* { dg-final { scan-assembler "vld40.8"  }  } */
+/* { dg-final { scan-assembler "vld41.8"  }  } */
+/* { dg-final { scan-assembler "vld42.8"  }  } */
+/* { dg-final { scan-assembler "vld43.8"  }  } */
+
+uint8x16x4_t
+foo1 (uint8_t const * addr)
+{
+  return vld4q (addr);
+}
+
+/* { dg-final { scan-assembler "vld40.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_p_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_p_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..9983da6ba3f67e26be7db6ccbed7a684dbc2943e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_p_f16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (float16_t * addr, float16x8_t value, mve_pred16_t p)
+{
+  vst1q_p_f16 (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.16"  }  } */
+
+void
+foo1 (float16_t * addr, float16x8_t value, mve_pred16_t p)
+{
+  vst1q_p (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_p_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_p_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..870d2e257ab26a2978e00a4c5f77b7154592e671
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_p_f32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (float32_t * addr, float32x4_t value, mve_pred16_t p)
+{
+  vst1q_p_f32 (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrwt.32"  }  } */
+
+void
+foo1 (float32_t * addr, float32x4_t value, mve_pred16_t p)
+{
+  vst1q_p (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrwt.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_p_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_p_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..71dc898480f15ed6586eeff4bbb67307a5e77794
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_p_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int16_t * addr, int16x8_t value, mve_pred16_t p)
+{
+  vst1q_p_s16 (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.16"  }  } */
+
+void
+foo1 (int16_t * addr, int16x8_t value, mve_pred16_t p)
+{
+  vst1q_p (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_p_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..8643c5068d629dfce628f6d08b581bdd56f117ee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_p_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int32_t * addr, int32x4_t value, mve_pred16_t p)
+{
+  vst1q_p_s32 (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrwt.32"  }  } */
+
+void
+foo1 (int32_t * addr, int32x4_t value, mve_pred16_t p)
+{
+  vst1q_p (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrwt.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_p_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_p_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..570574fc2b995fab882d424a4bcb6d997d155f58
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_p_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int8_t * addr, int8x16_t value, mve_pred16_t p)
+{
+  vst1q_p_s8 (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.8"  }  } */
+
+void
+foo1 (int8_t * addr, int8x16_t value, mve_pred16_t p)
+{
+  vst1q_p (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_p_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_p_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..43fc3ae023d942598df1c43066c79285d5a59c66
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_p_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint16_t * addr, uint16x8_t value, mve_pred16_t p)
+{
+  vst1q_p_u16 (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.16"  }  } */
+
+void
+foo1 (uint16_t * addr, uint16x8_t value, mve_pred16_t p)
+{
+  vst1q_p (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrht.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_p_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_p_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..bcaf6f23597c45977661eede690786890c051e5d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_p_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint32_t * addr, uint32x4_t value, mve_pred16_t p)
+{
+  vst1q_p_u32 (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrwt.32"  }  } */
+
+void
+foo1 (uint32_t * addr, uint32x4_t value, mve_pred16_t p)
+{
+  vst1q_p (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrwt.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_p_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_p_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..898d2eb10835a50a689d4db6dfe50425b6c40769
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_p_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint8_t * addr, uint8x16_t value, mve_pred16_t p)
+{
+  vst1q_p_u8 (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.8"  }  } */
+
+void
+foo1 (uint8_t * addr, uint8x16_t value, mve_pred16_t p)
+{
+  vst1q_p (addr, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrbt.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst2q_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst2q_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..8c3e621320e6aa3900fafeaeab3db9f9fd1dae06
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst2q_f16.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (float16_t * addr, float16x8x2_t value)
+{
+  vst2q_f16 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst20.16"  }  } */
+/* { dg-final { scan-assembler "vst21.16"  }  } */
+
+void
+foo1 (float16_t * addr, float16x8x2_t value)
+{
+  vst2q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst20.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst2q_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst2q_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..99648c3db93a241088efb08df5299d651002d288
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst2q_f32.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (float32_t * addr, float32x4x2_t value)
+{
+  vst2q_f32 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst20.32"  }  } */
+/* { dg-final { scan-assembler "vst21.32"  }  } */
+
+void
+foo1 (float32_t * addr, float32x4x2_t value)
+{
+  vst2q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst20.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst2q_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst2q_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..8b44d7eab4eefcf242ef794a6781a67836638099
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst2q_s16.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int16_t * addr, int16x8x2_t value)
+{
+  vst2q_s16 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst20.16"  }  } */
+/* { dg-final { scan-assembler "vst21.16"  }  } */
+
+void
+foo1 (int16_t * addr, int16x8x2_t value)
+{
+  vst2q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst20.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst2q_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst2q_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..7102edbb0e19081d43bf5810429cab9c79fa330d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst2q_s32.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int32_t * addr, int32x4x2_t value)
+{
+  vst2q_s32 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst20.32"  }  } */
+/* { dg-final { scan-assembler "vst21.32"  }  } */
+
+void
+foo1 (int32_t * addr, int32x4x2_t value)
+{
+  vst2q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst20.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst2q_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst2q_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..c16a1042f323eaa87793047931258d683517f6a2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst2q_s8.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (int8_t * addr, int8x16x2_t value)
+{
+  vst2q_s8 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst20.8"  }  } */
+/* { dg-final { scan-assembler "vst21.8"  }  } */
+
+void
+foo1 (int8_t * addr, int8x16x2_t value)
+{
+  vst2q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst20.8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst2q_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst2q_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..69c97be5df9a876ee77150255878f5765f31a963
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst2q_u16.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint16_t * addr, uint16x8x2_t value)
+{
+  vst2q_u16 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst20.16"  }  } */
+/* { dg-final { scan-assembler "vst21.16"  }  } */
+
+void
+foo1 (uint16_t * addr, uint16x8x2_t value)
+{
+  vst2q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst20.16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst2q_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst2q_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..aba5a20abc0b239cca5352aa81f17d8a56069164
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst2q_u32.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint32_t * addr, uint32x4x2_t value)
+{
+  vst2q_u32 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst20.32"  }  } */
+/* { dg-final { scan-assembler "vst21.32"  }  } */
+
+void
+foo1 (uint32_t * addr, uint32x4x2_t value)
+{
+  vst2q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst20.32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst2q_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst2q_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..4dc1f82380473243f2e1b1adee4af97289f59261
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst2q_u8.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint8_t * addr, uint8x16x2_t value)
+{
+  vst2q_u8 (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst20.8"  }  } */
+/* { dg-final { scan-assembler "vst21.8"  }  } */
+
+void
+foo1 (uint8_t * addr, uint8x16x2_t value)
+{
+  vst2q (addr, value);
+}
+
+/* { dg-final { scan-assembler "vst20.8"  }  } */


[-- Attachment #2: diff34.patch.gz --]
[-- Type: application/gzip, Size: 6434 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][10x]: MVE ACLE intrinsics "add with carry across beats" and "beat-wise substract".
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (31 preceding siblings ...)
  2019-11-14 19:16 ` [PATCH][ARM][GCC][1/8x]: MVE ACLE vidup, vddup, viwdup and vdwdup intrinsics with writeback Srinath Parvathaneni
@ 2019-11-14 19:16 ` Srinath Parvathaneni
  2019-11-14 19:16 ` [PATCH][ARM][GCC][6x]:MVE ACLE vaddq intrinsics using arithmetic plus operator Srinath Parvathaneni
                   ` (5 subsequent siblings)
  38 siblings, 0 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 44296 bytes --]

Hello,

This patch supports following MVE ACLE "add with carry across beats" intrinsics
and "beat-wise substract" intrinsics.

vadciq_s32, vadciq_u32, vadciq_m_s32, vadciq_m_u32, vadcq_s32, vadcq_u32,
vadcq_m_s32, vadcq_m_u32, vsbciq_s32, vsbciq_u32, vsbciq_m_s32, vsbciq_m_u32,
vsbcq_s32, vsbcq_u32, vsbcq_m_s32, vsbcq_m_u32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-08  Andre Vieira  <andre.simoesdiasvieira@arm.com>
            Mihail Ionescu  <mihail.ionescu@arm.com>
            Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm_mve.h (vadciq_s32): Define macro.
	(vadciq_u32): Likewise.
	(vadciq_m_s32): Likewise.
	(vadciq_m_u32): Likewise.
	(vadcq_s32): Likewise.
	(vadcq_u32): Likewise.
	(vadcq_m_s32): Likewise.
	(vadcq_m_u32): Likewise.
	(vsbciq_s32): Likewise.
	(vsbciq_u32): Likewise.
	(vsbciq_m_s32): Likewise.
	(vsbciq_m_u32): Likewise.
	(vsbcq_s32): Likewise.
	(vsbcq_u32): Likewise.
	(vsbcq_m_s32): Likewise.
	(vsbcq_m_u32): Likewise.
	(__arm_vadciq_s32): Define intrinsic.
	(__arm_vadciq_u32): Likewise.
	(__arm_vadciq_m_s32): Likewise.
	(__arm_vadciq_m_u32): Likewise.
	(__arm_vadcq_s32): Likewise.
	(__arm_vadcq_u32): Likewise.
	(__arm_vadcq_m_s32): Likewise.
	(__arm_vadcq_m_u32): Likewise.
	(__arm_vsbciq_s32): Likewise.
	(__arm_vsbciq_u32): Likewise.
	(__arm_vsbciq_m_s32): Likewise.
	(__arm_vsbciq_m_u32): Likewise.
	(__arm_vsbcq_s32): Likewise.
	(__arm_vsbcq_u32): Likewise.
	(__arm_vsbcq_m_s32): Likewise.
	(__arm_vsbcq_m_u32): Likewise.
	(vadciq_m): Define polymorphic variant.
	(vadciq): Likewise.
	(vadcq_m): Likewise.
	(vadcq): Likewise.
	(vsbciq_m): Likewise.
	(vsbciq): Likewise.
	(vsbcq_m): Likewise.
	(vsbcq): Likewise.
	* config/arm/arm_mve_builtins.def (BINOP_NONE_NONE_NONE): Use builtin
	qualifier.
	(BINOP_UNONE_UNONE_UNONE): Likewise.
	(QUADOP_NONE_NONE_NONE_NONE_UNONE): Likewise.
	(QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE): Likewise.
	* config/arm/mve.md (VADCIQ): Define iterator.
	(VADCIQ_M): Likewise.
	(VSBCQ): Likewise.
	(VSBCQ_M): Likewise.
	(VSBCIQ): Likewise.
	(VSBCIQ_M): Likewise.
	(VADCQ): Likewise.
	(VADCQ_M): Likewise.
	(mve_vadciq_m_<supf>v4si): Define RTL pattern.
	(mve_vadciq_<supf>v4si): Likewise.
	(mve_vadcq_m_<supf>v4si): Likewise.
	(mve_vadcq_<supf>v4si): Likewise.
	(mve_vsbciq_m_<supf>v4si): Likewise.
	(mve_vsbciq_<supf>v4si): Likewise.
	(mve_vsbcq_m_<supf>v4si): Likewise.
	(mve_vsbcq_<supf>v4si): Likewise.

gcc/testsuite/ChangeLog:

2019-11-08  Andre Vieira  <andre.simoesdiasvieira@arm.com>
            Mihail Ionescu  <mihail.ionescu@arm.com>
            Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vadciq_m_s32.c: New test.
	* gcc.target/arm/mve/intrinsics/vadciq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vadciq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vadciq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vadcq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vadcq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vadcq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vadcq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsbciq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsbciq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsbciq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsbciq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsbcq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsbcq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsbcq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsbcq_u32.c: Likewise.


###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 31ad3fc5cddfedede02b10e194a426a98bd13024..1704b622c5d6e0abcf814ae1d439bb732f0bd76e 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -2450,6 +2450,22 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vrev32q_x_f16(__a, __p) __arm_vrev32q_x_f16(__a, __p)
 #define vrev64q_x_f16(__a, __p) __arm_vrev64q_x_f16(__a, __p)
 #define vrev64q_x_f32(__a, __p) __arm_vrev64q_x_f32(__a, __p)
+#define vadciq_s32(__a, __b,  __carry_out) __arm_vadciq_s32(__a, __b,  __carry_out)
+#define vadciq_u32(__a, __b,  __carry_out) __arm_vadciq_u32(__a, __b,  __carry_out)
+#define vadciq_m_s32(__inactive, __a, __b,  __carry_out, __p) __arm_vadciq_m_s32(__inactive, __a, __b,  __carry_out, __p)
+#define vadciq_m_u32(__inactive, __a, __b,  __carry_out, __p) __arm_vadciq_m_u32(__inactive, __a, __b,  __carry_out, __p)
+#define vadcq_s32(__a, __b,  __carry) __arm_vadcq_s32(__a, __b,  __carry)
+#define vadcq_u32(__a, __b,  __carry) __arm_vadcq_u32(__a, __b,  __carry)
+#define vadcq_m_s32(__inactive, __a, __b,  __carry, __p) __arm_vadcq_m_s32(__inactive, __a, __b,  __carry, __p)
+#define vadcq_m_u32(__inactive, __a, __b,  __carry, __p) __arm_vadcq_m_u32(__inactive, __a, __b,  __carry, __p)
+#define vsbciq_s32(__a, __b,  __carry_out) __arm_vsbciq_s32(__a, __b,  __carry_out)
+#define vsbciq_u32(__a, __b,  __carry_out) __arm_vsbciq_u32(__a, __b,  __carry_out)
+#define vsbciq_m_s32(__inactive, __a, __b,  __carry_out, __p) __arm_vsbciq_m_s32(__inactive, __a, __b,  __carry_out, __p)
+#define vsbciq_m_u32(__inactive, __a, __b,  __carry_out, __p) __arm_vsbciq_m_u32(__inactive, __a, __b,  __carry_out, __p)
+#define vsbcq_s32(__a, __b,  __carry) __arm_vsbcq_s32(__a, __b,  __carry)
+#define vsbcq_u32(__a, __b,  __carry) __arm_vsbcq_u32(__a, __b,  __carry)
+#define vsbcq_m_s32(__inactive, __a, __b,  __carry, __p) __arm_vsbcq_m_s32(__inactive, __a, __b,  __carry, __p)
+#define vsbcq_m_u32(__inactive, __a, __b,  __carry, __p) __arm_vsbcq_m_u32(__inactive, __a, __b,  __carry, __p)
 #endif
 
 __extension__ extern __inline void
@@ -15917,6 +15933,158 @@ __arm_vshrq_x_n_u32 (uint32x4_t __a, const int __imm, mve_pred16_t __p)
   return __builtin_mve_vshrq_m_n_uv4si (vuninitializedq_u32 (), __a, __imm, __p);
 }
 
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vadciq_s32 (int32x4_t __a, int32x4_t __b, unsigned * __carry_out)
+{
+  int32x4_t __res = __builtin_mve_vadciq_sv4si (__a, __b);
+  *__carry_out = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vadciq_u32 (uint32x4_t __a, uint32x4_t __b, unsigned * __carry_out)
+{
+  uint32x4_t __res = __builtin_mve_vadciq_uv4si (__a, __b);
+  *__carry_out = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vadciq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, unsigned * __carry_out, mve_pred16_t __p)
+{
+  int32x4_t __res =  __builtin_mve_vadciq_m_sv4si (__inactive, __a, __b, __p);
+  *__carry_out = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vadciq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, unsigned * __carry_out, mve_pred16_t __p)
+{
+  uint32x4_t __res = __builtin_mve_vadciq_m_uv4si (__inactive, __a, __b, __p);
+  *__carry_out = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vadcq_s32 (int32x4_t __a, int32x4_t __b, unsigned * __carry)
+{
+  __builtin_arm_set_fpscr((__builtin_arm_get_fpscr () & ~0x20000000u) | (*__carry << 29));
+  int32x4_t __res = __builtin_mve_vadcq_sv4si (__a, __b);
+  *__carry = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vadcq_u32 (uint32x4_t __a, uint32x4_t __b, unsigned * __carry)
+{
+  __builtin_arm_set_fpscr((__builtin_arm_get_fpscr () & ~0x20000000u) | (*__carry << 29));
+  uint32x4_t __res = __builtin_mve_vadcq_uv4si (__a, __b);
+  *__carry = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vadcq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, unsigned * __carry, mve_pred16_t __p)
+{
+  __builtin_arm_set_fpscr((__builtin_arm_get_fpscr () & ~0x20000000u) | (*__carry << 29));
+  int32x4_t __res = __builtin_mve_vadcq_m_sv4si (__inactive, __a, __b, __p);
+  *__carry = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vadcq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, unsigned * __carry, mve_pred16_t __p)
+{
+  __builtin_arm_set_fpscr((__builtin_arm_get_fpscr () & ~0x20000000u) | (*__carry << 29));
+  uint32x4_t __res =  __builtin_mve_vadcq_m_uv4si (__inactive, __a, __b, __p);
+  *__carry = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsbciq_s32 (int32x4_t __a, int32x4_t __b, unsigned * __carry_out)
+{
+  int32x4_t __res = __builtin_mve_vsbciq_sv4si (__a, __b);
+  *__carry_out = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsbciq_u32 (uint32x4_t __a, uint32x4_t __b, unsigned * __carry_out)
+{
+  uint32x4_t __res = __builtin_mve_vsbciq_uv4si (__a, __b);
+  *__carry_out = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsbciq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, unsigned * __carry_out, mve_pred16_t __p)
+{
+  int32x4_t __res = __builtin_mve_vsbciq_m_sv4si (__inactive, __a, __b, __p);
+  *__carry_out = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsbciq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, unsigned * __carry_out, mve_pred16_t __p)
+{
+  uint32x4_t __res = __builtin_mve_vsbciq_m_uv4si (__inactive, __a, __b, __p);
+  *__carry_out = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsbcq_s32 (int32x4_t __a, int32x4_t __b, unsigned * __carry)
+{
+  __builtin_arm_set_fpscr((__builtin_arm_get_fpscr () & ~0x20000000u) | (*__carry << 29));
+  int32x4_t __res = __builtin_mve_vsbcq_sv4si (__a, __b);
+  *__carry = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsbcq_u32 (uint32x4_t __a, uint32x4_t __b, unsigned * __carry)
+{
+  __builtin_arm_set_fpscr((__builtin_arm_get_fpscr () & ~0x20000000u) | (*__carry << 29));
+  uint32x4_t __res =  __builtin_mve_vsbcq_uv4si (__a, __b);
+  *__carry = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsbcq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, unsigned * __carry, mve_pred16_t __p)
+{
+  __builtin_arm_set_fpscr((__builtin_arm_get_fpscr () & ~0x20000000u) | (*__carry << 29));
+  int32x4_t __res = __builtin_mve_vsbcq_m_sv4si (__inactive, __a, __b, __p);
+  *__carry = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsbcq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, unsigned * __carry, mve_pred16_t __p)
+{
+  __builtin_arm_set_fpscr((__builtin_arm_get_fpscr () & ~0x20000000u) | (*__carry << 29));
+  uint32x4_t __res = __builtin_mve_vsbcq_m_uv4si (__inactive, __a, __b, __p);
+  *__carry = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
@@ -25552,6 +25720,65 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t]: __arm_vshrq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t), p2, p3), \
   int (*)[__ARM_mve_type_uint32x4_t]: __arm_vshrq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t), p2, p3));})
 
+#define vadciq_m(p0,p1,p2,p3,p4) __arm_vadciq_m(p0,p1,p2,p3,p4)
+#define __arm_vadciq_m(p0,p1,p2,p3,p4) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vadciq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3, p4), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vadciq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3, p4));})
+
+#define vadciq(p0,p1,p2) __arm_vadciq(p0,p1,p2)
+#define __arm_vadciq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vadciq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vadciq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
+
+#define vadcq_m(p0,p1,p2,p3,p4) __arm_vadcq_m(p0,p1,p2,p3,p4)
+#define __arm_vadcq_m(p0,p1,p2,p3,p4) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vadcq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3, p4), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vadcq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3, p4));})
+
+#define vadcq(p0,p1,p2) __arm_vadcq(p0,p1,p2)
+#define __arm_vadcq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vadcq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vadcq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
+
+#define vsbciq_m(p0,p1,p2,p3,p4) __arm_vsbciq_m(p0,p1,p2,p3,p4)
+#define __arm_vsbciq_m(p0,p1,p2,p3,p4) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vsbciq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3, p4), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vsbciq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3, p4));})
+
+#define vsbciq(p0,p1,p2) __arm_vsbciq(p0,p1,p2)
+#define __arm_vsbciq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vsbciq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vsbciq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
+
+#define vsbcq_m(p0,p1,p2,p3,p4) __arm_vsbcq_m(p0,p1,p2,p3,p4)
+#define __arm_vsbcq_m(p0,p1,p2,p3,p4) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vsbcq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3, p4), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vsbcq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3, p4));})
+
+#define vsbcq(p0,p1,p2) __arm_vsbcq(p0,p1,p2)
+#define __arm_vsbcq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vsbcq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vsbcq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
 
 #endif /* MVE Floating point.  */
 
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index b77335cff133872558b48b5574dccc0f17df9ed1..a413b38676f2f102c16fdf2147f3b8a4d8ec47b4 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -857,3 +857,19 @@ VAR1 (LDRGBWBS_Z, vldrdq_gather_base_wb_z_s, v2di)
 VAR1 (LDRGBWBS, vldrwq_gather_base_wb_s, v4si)
 VAR1 (LDRGBWBS, vldrwq_gather_base_wb_f, v4sf)
 VAR1 (LDRGBWBS, vldrdq_gather_base_wb_s, v2di)
+VAR1 (BINOP_NONE_NONE_NONE, vadciq_s, v4si)
+VAR1 (BINOP_UNONE_UNONE_UNONE, vadciq_u, v4si)
+VAR1 (BINOP_NONE_NONE_NONE, vadcq_s, v4si)
+VAR1 (BINOP_UNONE_UNONE_UNONE, vadcq_u, v4si)
+VAR1 (BINOP_NONE_NONE_NONE, vsbciq_s, v4si)
+VAR1 (BINOP_UNONE_UNONE_UNONE, vsbciq_u, v4si)
+VAR1 (BINOP_NONE_NONE_NONE, vsbcq_s, v4si)
+VAR1 (BINOP_UNONE_UNONE_UNONE, vsbcq_u, v4si)
+VAR1 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vadciq_m_s, v4si)
+VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vadciq_m_u, v4si)
+VAR1 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vadcq_m_s, v4si)
+VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vadcq_m_u, v4si)
+VAR1 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vsbciq_m_s, v4si)
+VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vsbciq_m_u, v4si)
+VAR1 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vsbcq_m_s, v4si)
+VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vsbcq_m_u, v4si)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index a938e0922f8dc6749dc7192961ae2091d666c6e7..8ff69094378396830ef31d9e2ca9db71c58aefab 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -211,7 +211,10 @@
 			 VDWDUPQ_M VIDUPQ VIDUPQ_M VIWDUPQ VIWDUPQ_M
 			 VSTRWQSBWB_S VSTRWQSBWB_U VLDRWQGBWB_S VLDRWQGBWB_U
 			 VSTRWQSBWB_F VLDRWQGBWB_F VSTRDQSBWB_S VSTRDQSBWB_U
-			 VLDRDQGBWB_S VLDRDQGBWB_U])
+			 VLDRDQGBWB_S VLDRDQGBWB_U VADCQ_U VADCQ_M_U VADCQ_S
+			 VADCQ_M_S VSBCIQ_U VSBCIQ_S VSBCIQ_M_U VSBCIQ_M_S
+			 VSBCQ_U VSBCQ_S VSBCQ_M_U VSBCQ_M_S VADCIQ_U VADCIQ_M_U
+			 VADCIQ_S VADCIQ_M_S])
 
 (define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF") (V8HF "V8HI")
 			    (V4SF "V4SI")])
@@ -382,8 +385,13 @@
 		       (VSTRWQSO_U "u") (VSTRWQSO_S "s") (VSTRWQSSO_U "u")
 		       (VSTRWQSSO_S "s") (VSTRWQSBWB_S "s") (VSTRWQSBWB_U "u")
 		       (VLDRWQGBWB_S "s") (VLDRWQGBWB_U "u") (VLDRDQGBWB_S "s")
-		       (VLDRDQGBWB_U "u") (VSTRDQSBWB_S "s")
-		       (VSTRDQSBWB_U "u")])
+		       (VLDRDQGBWB_U "u") (VSTRDQSBWB_S "s") (VADCQ_M_S "s")
+		       (VSTRDQSBWB_U "u") (VSBCQ_U "u")  (VSBCQ_M_U "u")
+		       (VSBCQ_S "s")  (VSBCQ_M_S "s") (VSBCIQ_U "u")
+		       (VSBCIQ_M_U "u") (VSBCIQ_S "s") (VSBCIQ_M_S "s")
+		       (VADCQ_U "u")  (VADCQ_M_U "u") (VADCQ_S "s")
+		       (VADCIQ_U "u") (VADCIQ_M_U "u") (VADCIQ_S "s")
+		       (VADCIQ_M_S "s")])
 
 (define_int_attr mode1 [(VCTP8Q "8") (VCTP16Q "16") (VCTP32Q "32")
 			(VCTP64Q "64") (VCTP8Q_M "8") (VCTP16Q_M "16")
@@ -636,6 +644,15 @@
 (define_int_iterator VLDRWGBWBQ [VLDRWQGBWB_S VLDRWQGBWB_U])
 (define_int_iterator VSTRDSBWBQ [VSTRDQSBWB_S VSTRDQSBWB_U])
 (define_int_iterator VLDRDGBWBQ [VLDRDQGBWB_S VLDRDQGBWB_U])
+(define_int_iterator VADCIQ [VADCIQ_U VADCIQ_S])
+(define_int_iterator VADCIQ_M [VADCIQ_M_U VADCIQ_M_S])
+(define_int_iterator VSBCQ [VSBCQ_U VSBCQ_S])
+(define_int_iterator VSBCQ_M [VSBCQ_M_U VSBCQ_M_S])
+(define_int_iterator VSBCIQ [VSBCIQ_U VSBCIQ_S])
+(define_int_iterator VSBCIQ_M [VSBCIQ_M_U VSBCIQ_M_S])
+(define_int_iterator VADCQ [VADCQ_U VADCQ_S])
+(define_int_iterator VADCQ_M [VADCQ_M_U VADCQ_M_S])
+
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
@@ -10614,3 +10631,147 @@
    return "";
 }
   [(set_attr "length" "8")])
+;;
+;; [vadciq_m_s, vadciq_m_u])
+;;
+(define_insn "mve_vadciq_m_<supf>v4si"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=w")
+	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "0")
+		      (match_operand:V4SI 2 "s_register_operand" "w")
+		      (match_operand:V4SI 3 "s_register_operand" "w")
+		      (match_operand:HI 4 "vpr_register_operand" "Up")]
+	 VADCIQ_M))
+   (set (reg:SI VFPCC_REGNUM)
+	(unspec:SI [(const_int 0)]
+	 VADCIQ_M))
+  ]
+  "TARGET_HAVE_MVE"
+  "vpst\;vadcit.i32\t%q0, %q2, %q3"
+  [(set_attr "type" "mve_move")
+   (set_attr "length" "8")])
+
+;;
+;; [vadciq_u, vadciq_s])
+;;
+(define_insn "mve_vadciq_<supf>v4si"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=w")
+	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
+		      (match_operand:V4SI 2 "s_register_operand" "w")]
+	 VADCIQ))
+   (set (reg:SI VFPCC_REGNUM)
+	(unspec:SI [(const_int 0)]
+	 VADCIQ))
+  ]
+  "TARGET_HAVE_MVE"
+  "vadci.i32\t%q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+   (set_attr "length" "4")])
+
+;;
+;; [vadcq_m_s, vadcq_m_u])
+;;
+(define_insn "mve_vadcq_m_<supf>v4si"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=w")
+	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "0")
+		      (match_operand:V4SI 2 "s_register_operand" "w")
+		      (match_operand:V4SI 3 "s_register_operand" "w")
+		      (match_operand:HI 4 "vpr_register_operand" "Up")]
+	 VADCQ_M))
+   (set (reg:SI VFPCC_REGNUM)
+	(unspec:SI [(reg:SI VFPCC_REGNUM)]
+	 VADCQ_M))
+  ]
+  "TARGET_HAVE_MVE"
+  "vpst\;vadct.i32\t%q0, %q2, %q3"
+  [(set_attr "type" "mve_move")
+   (set_attr "length" "8")])
+
+;;
+;; [vadcq_u, vadcq_s])
+;;
+(define_insn "mve_vadcq_<supf>v4si"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=w")
+	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
+		       (match_operand:V4SI 2 "s_register_operand" "w")]
+	 VADCQ))
+   (set (reg:SI VFPCC_REGNUM)
+	(unspec:SI [(reg:SI VFPCC_REGNUM)]
+	 VADCQ))
+  ]
+  "TARGET_HAVE_MVE"
+  "vadc.i32\t%q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+   (set_attr "length" "4")
+   (set_attr "conds" "set")])
+
+;;
+;; [vsbciq_m_u, vsbciq_m_s])
+;;
+(define_insn "mve_vsbciq_m_<supf>v4si"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=w")
+	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
+		      (match_operand:V4SI 2 "s_register_operand" "w")
+		      (match_operand:V4SI 3 "s_register_operand" "w")
+		      (match_operand:HI 4 "vpr_register_operand" "Up")]
+	 VSBCIQ_M))
+   (set (reg:SI VFPCC_REGNUM)
+	(unspec:SI [(const_int 0)]
+	 VSBCIQ_M))
+  ]
+  "TARGET_HAVE_MVE"
+  "vpst\;vsbcit.i32\t%q0, %q2, %q3"
+  [(set_attr "type" "mve_move")
+   (set_attr "length" "8")])
+
+;;
+;; [vsbciq_s, vsbciq_u])
+;;
+(define_insn "mve_vsbciq_<supf>v4si"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=w")
+	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
+		      (match_operand:V4SI 2 "s_register_operand" "w")]
+	 VSBCIQ))
+   (set (reg:SI VFPCC_REGNUM)
+	(unspec:SI [(const_int 0)]
+	 VSBCIQ))
+  ]
+  "TARGET_HAVE_MVE"
+  "vsbci.i32\t%q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+   (set_attr "length" "4")])
+
+;;
+;; [vsbcq_m_u, vsbcq_m_s])
+;;
+(define_insn "mve_vsbcq_m_<supf>v4si"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=w")
+	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
+		      (match_operand:V4SI 2 "s_register_operand" "w")
+		      (match_operand:V4SI 3 "s_register_operand" "w")
+		      (match_operand:HI 4 "vpr_register_operand" "Up")]
+	 VSBCQ_M))
+   (set (reg:SI VFPCC_REGNUM)
+	(unspec:SI [(reg:SI VFPCC_REGNUM)]
+	 VSBCQ_M))
+  ]
+  "TARGET_HAVE_MVE"
+  "vpst\;vsbct.i32\t%q0, %q2, %q3"
+  [(set_attr "type" "mve_move")
+   (set_attr "length" "8")])
+
+;;
+;; [vsbcq_s, vsbcq_u])
+;;
+(define_insn "mve_vsbcq_<supf>v4si"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=w")
+	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
+		      (match_operand:V4SI 2 "s_register_operand" "w")]
+	 VSBCQ))
+   (set (reg:SI VFPCC_REGNUM)
+	(unspec:SI [(reg:SI VFPCC_REGNUM)]
+	 VSBCQ))
+  ]
+  "TARGET_HAVE_MVE"
+  "vsbc.i32\t%q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+   (set_attr "length" "4")])
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadciq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadciq_m_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..51cedf3c5421241b0a2a1d8473e07172669a2043
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadciq_m_s32.c
@@ -0,0 +1,26 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t inactive, int32x4_t a, int32x4_t b, unsigned * carry_out, mve_pred16_t p)
+{
+  return vadciq_m_s32 (inactive, a, b, carry_out, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vadcit.i32"  }  } */
+/* { dg-final { scan-assembler "vpst" } } */
+
+int32x4_t
+foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, unsigned * carry_out, mve_pred16_t p)
+{
+  return vadciq_m (inactive, a, b, carry_out, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vadcit.i32"  }  } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-times "mrc" 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadciq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadciq_m_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..b46f0a7d44d8bf77105ffcfc59cbb24aa4eaa658
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadciq_m_u32.c
@@ -0,0 +1,26 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, unsigned * carry_out, mve_pred16_t p)
+{
+  return vadciq_m_u32 (inactive, a, b, carry_out, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vadcit.i32"  }  } */
+/* { dg-final { scan-assembler "vpst" } } */
+
+uint32x4_t
+foo1 (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, unsigned * carry_out, mve_pred16_t p)
+{
+  return vadciq_m (inactive, a, b, carry_out, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vadcit.i32"  }  } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-times "mrc" 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadciq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadciq_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..124bff6dbb225edd156390411db2b4e18643f256
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadciq_s32.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t a, int32x4_t b, unsigned * carry_out)
+{
+  return vadciq_s32 (a, b, carry_out);
+}
+
+/* { dg-final { scan-assembler "vadci.i32"  }  } */
+
+int32x4_t
+foo1 (int32x4_t a, int32x4_t b, unsigned * carry_out)
+{
+  return vadciq (a, b, carry_out);
+}
+
+/* { dg-final { scan-assembler "vadci.i32"  }  } */
+/* { dg-final { scan-assembler-times "mrc" 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadciq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadciq_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..0718570130caa4cb88085f8a11ab92a72522cec7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadciq_u32.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t a, uint32x4_t b, unsigned * carry_out)
+{
+  return vadciq_u32 (a, b, carry_out);
+}
+
+/* { dg-final { scan-assembler "vadci.i32"  }  } */
+
+uint32x4_t
+foo1 (uint32x4_t a, uint32x4_t b, unsigned * carry_out)
+{
+  return vadciq (a, b, carry_out);
+}
+
+/* { dg-final { scan-assembler "vadci.i32"  }  } */
+/* { dg-final { scan-assembler-times "mrc" 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_m_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..1c2a928d52ff04c7341f043c15123386453c66d5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_m_s32.c
@@ -0,0 +1,28 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t inactive, int32x4_t a, int32x4_t b, unsigned * carry, mve_pred16_t p)
+{
+  return vadcq_m_s32 (inactive, a, b, carry, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vadct.i32"  }  } */
+/* { dg-final { scan-assembler "vpst" } } */
+
+int32x4_t
+foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, unsigned * carry, mve_pred16_t p)
+{
+  return vadcq_m (inactive, a, b, carry, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vadct.i32"  }  } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-times "mrc" 4 } } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-times "mcr" 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_m_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..af38e01b7c8fd180311afbb6ee05c758ec2e636e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_m_u32.c
@@ -0,0 +1,29 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, unsigned * carry, mve_pred16_t p)
+{
+  return vadcq_m_u32 (inactive, a, b, carry, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vadct.i32"  }  } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vpst" } } */
+
+uint32x4_t
+foo1 (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, unsigned * carry, mve_pred16_t p)
+{
+  return vadcq_m (inactive, a, b, carry, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vadct.i32"  }  } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-times "mrc" 4 } } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-times "mcr" 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..35be2d6aa2e31628a162bd9456be0844c861bb6a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_s32.c
@@ -0,0 +1,23 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t a, int32x4_t b, unsigned * carry)
+{
+  return vadcq_s32 (a, b, carry);
+}
+
+/* { dg-final { scan-assembler "vadc.i32"  }  } */
+
+int32x4_t
+foo1 (int32x4_t a, int32x4_t b, unsigned * carry)
+{
+  return vadcq (a, b, carry);
+}
+
+/* { dg-final { scan-assembler "vadc.i32"  }  } */
+/* { dg-final { scan-assembler-times "mrc" 4 } } */
+/* { dg-final { scan-assembler-times "mcr" 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..9a8246318b6fad157cbb9b74379e2f760fe71ff6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_u32.c
@@ -0,0 +1,23 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t a, uint32x4_t b, unsigned * carry)
+{
+  return vadcq_u32 (a, b, carry);
+}
+
+/* { dg-final { scan-assembler "vadc.i32"  }  } */
+
+uint32x4_t
+foo1 (uint32x4_t a, uint32x4_t b, unsigned * carry)
+{
+  return vadcq (a, b, carry);
+}
+
+/* { dg-final { scan-assembler "vadc.i32"  }  } */
+/* { dg-final { scan-assembler-times "mrc" 4 } } */
+/* { dg-final { scan-assembler-times "mcr" 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbciq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbciq_m_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..c353a51080d9a15f033562636ffde3cb81e39b36
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbciq_m_s32.c
@@ -0,0 +1,26 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t inactive, int32x4_t a, int32x4_t b, unsigned * carry_out, mve_pred16_t p)
+{
+  return vsbciq_m_s32 (inactive, a, b, carry_out, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vsbcit.i32"  } } */
+/* { dg-final { scan-assembler "vpst" } } */
+
+int32x4_t
+foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, unsigned * carry_out, mve_pred16_t p)
+{
+  return vsbciq_m (inactive, a, b, carry_out, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vsbcit.i32"  } } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-times "mrc" 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbciq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbciq_m_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..e2bddb737c866c86a0ba0ac6fb5de1eeb7206c69
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbciq_m_u32.c
@@ -0,0 +1,27 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, unsigned * carry_out, mve_pred16_t p)
+{
+  return vsbciq_m_u32 (inactive, a, b, carry_out, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vsbcit.i32"  }  } */
+/* { dg-final { scan-assembler "vpst" } } */
+
+uint32x4_t
+foo1 (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, unsigned * carry_out, mve_pred16_t p)
+{
+  return vsbciq_m (inactive, a, b, carry_out, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vsbcit.i32"  }  } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-times "mrc" 2 } } */
+
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbciq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbciq_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..db32b9cef100c1ec8b8fa75d946e9423e7dc6669
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbciq_s32.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t a, int32x4_t b, unsigned * carry_out)
+{
+  return vsbciq_s32 (a, b, carry_out);
+}
+
+/* { dg-final { scan-assembler "vsbci.i32"  }  } */
+
+int32x4_t
+foo1 (int32x4_t a, int32x4_t b, unsigned * carry_out)
+{
+  return vsbciq_s32 (a, b, carry_out);
+}
+
+/* { dg-final { scan-assembler "vsbci.i32"  }  } */
+/* { dg-final { scan-assembler-times "mrc" 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbciq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbciq_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..60f213d9a631d7d12f8ab7dfe58e3c251b9e8854
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbciq_u32.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t a, uint32x4_t b, unsigned * carry_out)
+{
+  return vsbciq_u32 (a, b, carry_out);
+}
+
+/* { dg-final { scan-assembler "vsbci.i32"  }  } */
+
+uint32x4_t
+foo1 (uint32x4_t a, uint32x4_t b, unsigned * carry_out)
+{
+  return vsbciq_u32 (a, b, carry_out);
+}
+
+/* { dg-final { scan-assembler "vsbci.i32"  }  } */
+/* { dg-final { scan-assembler-times "mrc" 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_m_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..4ab7e2f1f69ade2189c44bc9bcd6243dc3cd9d68
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_m_s32.c
@@ -0,0 +1,19 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t inactive, int32x4_t a, int32x4_t b, unsigned * carry, mve_pred16_t p)
+{
+    return vsbcq_m_s32 (inactive, a, b, carry, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vsbct.i32"  }  } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-times "mrc" 2 } } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-times "mcr" 1 } } */
+
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_m_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..da2edac3d95d5405226bd0b85a9dce42ad62fd9b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_m_u32.c
@@ -0,0 +1,19 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, unsigned * carry, mve_pred16_t p)
+{
+    return vsbcq_m_u32 (inactive, a, b, carry, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vsbct.i32"  }  } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-times "mrc" 2 } } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-times "mcr" 1 } } */
+
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..9c3e6e99f5938d523d5dee127300031ddc09ee33
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_s32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t a, int32x4_t b, unsigned * carry)
+{
+  return vsbcq_s32 (a, b, carry);
+}
+
+/* { dg-final { scan-assembler "vsbc.i32"  }  } */
+
+int32x4_t
+foo1 (int32x4_t a, int32x4_t b, unsigned * carry)
+{
+  return vsbcq (a, b, carry);
+}
+
+/* { dg-final { scan-assembler "vsbc.i32"  }  } */
+/* { dg-final { scan-assembler-times "mrc" 4 } } */
+/* { dg-final { scan-assembler-times "mcr" 2 } } */
+
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..122b20c41b34c906c00b9914fe152195a3739118
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_u32.c
@@ -0,0 +1,23 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t a, uint32x4_t b, unsigned * carry)
+{
+  return vsbcq_u32 (a, b, carry);
+}
+
+/* { dg-final { scan-assembler "vsbc.i32"  }  } */
+
+uint32x4_t
+foo1 (uint32x4_t a, uint32x4_t b, unsigned * carry)
+{
+  return vsbcq (a, b, carry);
+}
+
+/* { dg-final { scan-assembler "vsbc.i32"  }  } */
+/* { dg-final { scan-assembler-times "mrc" 4 } } */
+/* { dg-final { scan-assembler-times "mcr" 2 } } */


[-- Attachment #2: diff33.patch --]
[-- Type: text/plain, Size: 39371 bytes --]

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 31ad3fc5cddfedede02b10e194a426a98bd13024..1704b622c5d6e0abcf814ae1d439bb732f0bd76e 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -2450,6 +2450,22 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vrev32q_x_f16(__a, __p) __arm_vrev32q_x_f16(__a, __p)
 #define vrev64q_x_f16(__a, __p) __arm_vrev64q_x_f16(__a, __p)
 #define vrev64q_x_f32(__a, __p) __arm_vrev64q_x_f32(__a, __p)
+#define vadciq_s32(__a, __b,  __carry_out) __arm_vadciq_s32(__a, __b,  __carry_out)
+#define vadciq_u32(__a, __b,  __carry_out) __arm_vadciq_u32(__a, __b,  __carry_out)
+#define vadciq_m_s32(__inactive, __a, __b,  __carry_out, __p) __arm_vadciq_m_s32(__inactive, __a, __b,  __carry_out, __p)
+#define vadciq_m_u32(__inactive, __a, __b,  __carry_out, __p) __arm_vadciq_m_u32(__inactive, __a, __b,  __carry_out, __p)
+#define vadcq_s32(__a, __b,  __carry) __arm_vadcq_s32(__a, __b,  __carry)
+#define vadcq_u32(__a, __b,  __carry) __arm_vadcq_u32(__a, __b,  __carry)
+#define vadcq_m_s32(__inactive, __a, __b,  __carry, __p) __arm_vadcq_m_s32(__inactive, __a, __b,  __carry, __p)
+#define vadcq_m_u32(__inactive, __a, __b,  __carry, __p) __arm_vadcq_m_u32(__inactive, __a, __b,  __carry, __p)
+#define vsbciq_s32(__a, __b,  __carry_out) __arm_vsbciq_s32(__a, __b,  __carry_out)
+#define vsbciq_u32(__a, __b,  __carry_out) __arm_vsbciq_u32(__a, __b,  __carry_out)
+#define vsbciq_m_s32(__inactive, __a, __b,  __carry_out, __p) __arm_vsbciq_m_s32(__inactive, __a, __b,  __carry_out, __p)
+#define vsbciq_m_u32(__inactive, __a, __b,  __carry_out, __p) __arm_vsbciq_m_u32(__inactive, __a, __b,  __carry_out, __p)
+#define vsbcq_s32(__a, __b,  __carry) __arm_vsbcq_s32(__a, __b,  __carry)
+#define vsbcq_u32(__a, __b,  __carry) __arm_vsbcq_u32(__a, __b,  __carry)
+#define vsbcq_m_s32(__inactive, __a, __b,  __carry, __p) __arm_vsbcq_m_s32(__inactive, __a, __b,  __carry, __p)
+#define vsbcq_m_u32(__inactive, __a, __b,  __carry, __p) __arm_vsbcq_m_u32(__inactive, __a, __b,  __carry, __p)
 #endif
 
 __extension__ extern __inline void
@@ -15917,6 +15933,158 @@ __arm_vshrq_x_n_u32 (uint32x4_t __a, const int __imm, mve_pred16_t __p)
   return __builtin_mve_vshrq_m_n_uv4si (vuninitializedq_u32 (), __a, __imm, __p);
 }
 
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vadciq_s32 (int32x4_t __a, int32x4_t __b, unsigned * __carry_out)
+{
+  int32x4_t __res = __builtin_mve_vadciq_sv4si (__a, __b);
+  *__carry_out = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vadciq_u32 (uint32x4_t __a, uint32x4_t __b, unsigned * __carry_out)
+{
+  uint32x4_t __res = __builtin_mve_vadciq_uv4si (__a, __b);
+  *__carry_out = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vadciq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, unsigned * __carry_out, mve_pred16_t __p)
+{
+  int32x4_t __res =  __builtin_mve_vadciq_m_sv4si (__inactive, __a, __b, __p);
+  *__carry_out = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vadciq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, unsigned * __carry_out, mve_pred16_t __p)
+{
+  uint32x4_t __res = __builtin_mve_vadciq_m_uv4si (__inactive, __a, __b, __p);
+  *__carry_out = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vadcq_s32 (int32x4_t __a, int32x4_t __b, unsigned * __carry)
+{
+  __builtin_arm_set_fpscr((__builtin_arm_get_fpscr () & ~0x20000000u) | (*__carry << 29));
+  int32x4_t __res = __builtin_mve_vadcq_sv4si (__a, __b);
+  *__carry = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vadcq_u32 (uint32x4_t __a, uint32x4_t __b, unsigned * __carry)
+{
+  __builtin_arm_set_fpscr((__builtin_arm_get_fpscr () & ~0x20000000u) | (*__carry << 29));
+  uint32x4_t __res = __builtin_mve_vadcq_uv4si (__a, __b);
+  *__carry = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vadcq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, unsigned * __carry, mve_pred16_t __p)
+{
+  __builtin_arm_set_fpscr((__builtin_arm_get_fpscr () & ~0x20000000u) | (*__carry << 29));
+  int32x4_t __res = __builtin_mve_vadcq_m_sv4si (__inactive, __a, __b, __p);
+  *__carry = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vadcq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, unsigned * __carry, mve_pred16_t __p)
+{
+  __builtin_arm_set_fpscr((__builtin_arm_get_fpscr () & ~0x20000000u) | (*__carry << 29));
+  uint32x4_t __res =  __builtin_mve_vadcq_m_uv4si (__inactive, __a, __b, __p);
+  *__carry = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsbciq_s32 (int32x4_t __a, int32x4_t __b, unsigned * __carry_out)
+{
+  int32x4_t __res = __builtin_mve_vsbciq_sv4si (__a, __b);
+  *__carry_out = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsbciq_u32 (uint32x4_t __a, uint32x4_t __b, unsigned * __carry_out)
+{
+  uint32x4_t __res = __builtin_mve_vsbciq_uv4si (__a, __b);
+  *__carry_out = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsbciq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, unsigned * __carry_out, mve_pred16_t __p)
+{
+  int32x4_t __res = __builtin_mve_vsbciq_m_sv4si (__inactive, __a, __b, __p);
+  *__carry_out = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsbciq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, unsigned * __carry_out, mve_pred16_t __p)
+{
+  uint32x4_t __res = __builtin_mve_vsbciq_m_uv4si (__inactive, __a, __b, __p);
+  *__carry_out = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsbcq_s32 (int32x4_t __a, int32x4_t __b, unsigned * __carry)
+{
+  __builtin_arm_set_fpscr((__builtin_arm_get_fpscr () & ~0x20000000u) | (*__carry << 29));
+  int32x4_t __res = __builtin_mve_vsbcq_sv4si (__a, __b);
+  *__carry = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsbcq_u32 (uint32x4_t __a, uint32x4_t __b, unsigned * __carry)
+{
+  __builtin_arm_set_fpscr((__builtin_arm_get_fpscr () & ~0x20000000u) | (*__carry << 29));
+  uint32x4_t __res =  __builtin_mve_vsbcq_uv4si (__a, __b);
+  *__carry = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsbcq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, unsigned * __carry, mve_pred16_t __p)
+{
+  __builtin_arm_set_fpscr((__builtin_arm_get_fpscr () & ~0x20000000u) | (*__carry << 29));
+  int32x4_t __res = __builtin_mve_vsbcq_m_sv4si (__inactive, __a, __b, __p);
+  *__carry = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsbcq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, unsigned * __carry, mve_pred16_t __p)
+{
+  __builtin_arm_set_fpscr((__builtin_arm_get_fpscr () & ~0x20000000u) | (*__carry << 29));
+  uint32x4_t __res = __builtin_mve_vsbcq_m_uv4si (__inactive, __a, __b, __p);
+  *__carry = (__builtin_arm_get_fpscr () >> 29) & 0x1u;
+  return __res;
+}
+
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
@@ -25552,6 +25720,65 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t]: __arm_vshrq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t), p2, p3), \
   int (*)[__ARM_mve_type_uint32x4_t]: __arm_vshrq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t), p2, p3));})
 
+#define vadciq_m(p0,p1,p2,p3,p4) __arm_vadciq_m(p0,p1,p2,p3,p4)
+#define __arm_vadciq_m(p0,p1,p2,p3,p4) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vadciq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3, p4), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vadciq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3, p4));})
+
+#define vadciq(p0,p1,p2) __arm_vadciq(p0,p1,p2)
+#define __arm_vadciq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vadciq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vadciq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
+
+#define vadcq_m(p0,p1,p2,p3,p4) __arm_vadcq_m(p0,p1,p2,p3,p4)
+#define __arm_vadcq_m(p0,p1,p2,p3,p4) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vadcq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3, p4), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vadcq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3, p4));})
+
+#define vadcq(p0,p1,p2) __arm_vadcq(p0,p1,p2)
+#define __arm_vadcq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vadcq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vadcq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
+
+#define vsbciq_m(p0,p1,p2,p3,p4) __arm_vsbciq_m(p0,p1,p2,p3,p4)
+#define __arm_vsbciq_m(p0,p1,p2,p3,p4) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vsbciq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3, p4), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vsbciq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3, p4));})
+
+#define vsbciq(p0,p1,p2) __arm_vsbciq(p0,p1,p2)
+#define __arm_vsbciq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vsbciq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vsbciq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
+
+#define vsbcq_m(p0,p1,p2,p3,p4) __arm_vsbcq_m(p0,p1,p2,p3,p4)
+#define __arm_vsbcq_m(p0,p1,p2,p3,p4) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vsbcq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3, p4), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vsbcq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3, p4));})
+
+#define vsbcq(p0,p1,p2) __arm_vsbcq(p0,p1,p2)
+#define __arm_vsbcq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vsbcq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vsbcq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
 
 #endif /* MVE Floating point.  */
 
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index b77335cff133872558b48b5574dccc0f17df9ed1..a413b38676f2f102c16fdf2147f3b8a4d8ec47b4 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -857,3 +857,19 @@ VAR1 (LDRGBWBS_Z, vldrdq_gather_base_wb_z_s, v2di)
 VAR1 (LDRGBWBS, vldrwq_gather_base_wb_s, v4si)
 VAR1 (LDRGBWBS, vldrwq_gather_base_wb_f, v4sf)
 VAR1 (LDRGBWBS, vldrdq_gather_base_wb_s, v2di)
+VAR1 (BINOP_NONE_NONE_NONE, vadciq_s, v4si)
+VAR1 (BINOP_UNONE_UNONE_UNONE, vadciq_u, v4si)
+VAR1 (BINOP_NONE_NONE_NONE, vadcq_s, v4si)
+VAR1 (BINOP_UNONE_UNONE_UNONE, vadcq_u, v4si)
+VAR1 (BINOP_NONE_NONE_NONE, vsbciq_s, v4si)
+VAR1 (BINOP_UNONE_UNONE_UNONE, vsbciq_u, v4si)
+VAR1 (BINOP_NONE_NONE_NONE, vsbcq_s, v4si)
+VAR1 (BINOP_UNONE_UNONE_UNONE, vsbcq_u, v4si)
+VAR1 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vadciq_m_s, v4si)
+VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vadciq_m_u, v4si)
+VAR1 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vadcq_m_s, v4si)
+VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vadcq_m_u, v4si)
+VAR1 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vsbciq_m_s, v4si)
+VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vsbciq_m_u, v4si)
+VAR1 (QUADOP_NONE_NONE_NONE_NONE_UNONE, vsbcq_m_s, v4si)
+VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vsbcq_m_u, v4si)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index a938e0922f8dc6749dc7192961ae2091d666c6e7..8ff69094378396830ef31d9e2ca9db71c58aefab 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -211,7 +211,10 @@
 			 VDWDUPQ_M VIDUPQ VIDUPQ_M VIWDUPQ VIWDUPQ_M
 			 VSTRWQSBWB_S VSTRWQSBWB_U VLDRWQGBWB_S VLDRWQGBWB_U
 			 VSTRWQSBWB_F VLDRWQGBWB_F VSTRDQSBWB_S VSTRDQSBWB_U
-			 VLDRDQGBWB_S VLDRDQGBWB_U])
+			 VLDRDQGBWB_S VLDRDQGBWB_U VADCQ_U VADCQ_M_U VADCQ_S
+			 VADCQ_M_S VSBCIQ_U VSBCIQ_S VSBCIQ_M_U VSBCIQ_M_S
+			 VSBCQ_U VSBCQ_S VSBCQ_M_U VSBCQ_M_S VADCIQ_U VADCIQ_M_U
+			 VADCIQ_S VADCIQ_M_S])
 
 (define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF") (V8HF "V8HI")
 			    (V4SF "V4SI")])
@@ -382,8 +385,13 @@
 		       (VSTRWQSO_U "u") (VSTRWQSO_S "s") (VSTRWQSSO_U "u")
 		       (VSTRWQSSO_S "s") (VSTRWQSBWB_S "s") (VSTRWQSBWB_U "u")
 		       (VLDRWQGBWB_S "s") (VLDRWQGBWB_U "u") (VLDRDQGBWB_S "s")
-		       (VLDRDQGBWB_U "u") (VSTRDQSBWB_S "s")
-		       (VSTRDQSBWB_U "u")])
+		       (VLDRDQGBWB_U "u") (VSTRDQSBWB_S "s") (VADCQ_M_S "s")
+		       (VSTRDQSBWB_U "u") (VSBCQ_U "u")  (VSBCQ_M_U "u")
+		       (VSBCQ_S "s")  (VSBCQ_M_S "s") (VSBCIQ_U "u")
+		       (VSBCIQ_M_U "u") (VSBCIQ_S "s") (VSBCIQ_M_S "s")
+		       (VADCQ_U "u")  (VADCQ_M_U "u") (VADCQ_S "s")
+		       (VADCIQ_U "u") (VADCIQ_M_U "u") (VADCIQ_S "s")
+		       (VADCIQ_M_S "s")])
 
 (define_int_attr mode1 [(VCTP8Q "8") (VCTP16Q "16") (VCTP32Q "32")
 			(VCTP64Q "64") (VCTP8Q_M "8") (VCTP16Q_M "16")
@@ -636,6 +644,15 @@
 (define_int_iterator VLDRWGBWBQ [VLDRWQGBWB_S VLDRWQGBWB_U])
 (define_int_iterator VSTRDSBWBQ [VSTRDQSBWB_S VSTRDQSBWB_U])
 (define_int_iterator VLDRDGBWBQ [VLDRDQGBWB_S VLDRDQGBWB_U])
+(define_int_iterator VADCIQ [VADCIQ_U VADCIQ_S])
+(define_int_iterator VADCIQ_M [VADCIQ_M_U VADCIQ_M_S])
+(define_int_iterator VSBCQ [VSBCQ_U VSBCQ_S])
+(define_int_iterator VSBCQ_M [VSBCQ_M_U VSBCQ_M_S])
+(define_int_iterator VSBCIQ [VSBCIQ_U VSBCIQ_S])
+(define_int_iterator VSBCIQ_M [VSBCIQ_M_U VSBCIQ_M_S])
+(define_int_iterator VADCQ [VADCQ_U VADCQ_S])
+(define_int_iterator VADCQ_M [VADCQ_M_U VADCQ_M_S])
+
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
@@ -10614,3 +10631,147 @@
    return "";
 }
   [(set_attr "length" "8")])
+;;
+;; [vadciq_m_s, vadciq_m_u])
+;;
+(define_insn "mve_vadciq_m_<supf>v4si"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=w")
+	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "0")
+		      (match_operand:V4SI 2 "s_register_operand" "w")
+		      (match_operand:V4SI 3 "s_register_operand" "w")
+		      (match_operand:HI 4 "vpr_register_operand" "Up")]
+	 VADCIQ_M))
+   (set (reg:SI VFPCC_REGNUM)
+	(unspec:SI [(const_int 0)]
+	 VADCIQ_M))
+  ]
+  "TARGET_HAVE_MVE"
+  "vpst\;vadcit.i32\t%q0, %q2, %q3"
+  [(set_attr "type" "mve_move")
+   (set_attr "length" "8")])
+
+;;
+;; [vadciq_u, vadciq_s])
+;;
+(define_insn "mve_vadciq_<supf>v4si"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=w")
+	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
+		      (match_operand:V4SI 2 "s_register_operand" "w")]
+	 VADCIQ))
+   (set (reg:SI VFPCC_REGNUM)
+	(unspec:SI [(const_int 0)]
+	 VADCIQ))
+  ]
+  "TARGET_HAVE_MVE"
+  "vadci.i32\t%q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+   (set_attr "length" "4")])
+
+;;
+;; [vadcq_m_s, vadcq_m_u])
+;;
+(define_insn "mve_vadcq_m_<supf>v4si"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=w")
+	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "0")
+		      (match_operand:V4SI 2 "s_register_operand" "w")
+		      (match_operand:V4SI 3 "s_register_operand" "w")
+		      (match_operand:HI 4 "vpr_register_operand" "Up")]
+	 VADCQ_M))
+   (set (reg:SI VFPCC_REGNUM)
+	(unspec:SI [(reg:SI VFPCC_REGNUM)]
+	 VADCQ_M))
+  ]
+  "TARGET_HAVE_MVE"
+  "vpst\;vadct.i32\t%q0, %q2, %q3"
+  [(set_attr "type" "mve_move")
+   (set_attr "length" "8")])
+
+;;
+;; [vadcq_u, vadcq_s])
+;;
+(define_insn "mve_vadcq_<supf>v4si"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=w")
+	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
+		       (match_operand:V4SI 2 "s_register_operand" "w")]
+	 VADCQ))
+   (set (reg:SI VFPCC_REGNUM)
+	(unspec:SI [(reg:SI VFPCC_REGNUM)]
+	 VADCQ))
+  ]
+  "TARGET_HAVE_MVE"
+  "vadc.i32\t%q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+   (set_attr "length" "4")
+   (set_attr "conds" "set")])
+
+;;
+;; [vsbciq_m_u, vsbciq_m_s])
+;;
+(define_insn "mve_vsbciq_m_<supf>v4si"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=w")
+	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
+		      (match_operand:V4SI 2 "s_register_operand" "w")
+		      (match_operand:V4SI 3 "s_register_operand" "w")
+		      (match_operand:HI 4 "vpr_register_operand" "Up")]
+	 VSBCIQ_M))
+   (set (reg:SI VFPCC_REGNUM)
+	(unspec:SI [(const_int 0)]
+	 VSBCIQ_M))
+  ]
+  "TARGET_HAVE_MVE"
+  "vpst\;vsbcit.i32\t%q0, %q2, %q3"
+  [(set_attr "type" "mve_move")
+   (set_attr "length" "8")])
+
+;;
+;; [vsbciq_s, vsbciq_u])
+;;
+(define_insn "mve_vsbciq_<supf>v4si"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=w")
+	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
+		      (match_operand:V4SI 2 "s_register_operand" "w")]
+	 VSBCIQ))
+   (set (reg:SI VFPCC_REGNUM)
+	(unspec:SI [(const_int 0)]
+	 VSBCIQ))
+  ]
+  "TARGET_HAVE_MVE"
+  "vsbci.i32\t%q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+   (set_attr "length" "4")])
+
+;;
+;; [vsbcq_m_u, vsbcq_m_s])
+;;
+(define_insn "mve_vsbcq_m_<supf>v4si"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=w")
+	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
+		      (match_operand:V4SI 2 "s_register_operand" "w")
+		      (match_operand:V4SI 3 "s_register_operand" "w")
+		      (match_operand:HI 4 "vpr_register_operand" "Up")]
+	 VSBCQ_M))
+   (set (reg:SI VFPCC_REGNUM)
+	(unspec:SI [(reg:SI VFPCC_REGNUM)]
+	 VSBCQ_M))
+  ]
+  "TARGET_HAVE_MVE"
+  "vpst\;vsbct.i32\t%q0, %q2, %q3"
+  [(set_attr "type" "mve_move")
+   (set_attr "length" "8")])
+
+;;
+;; [vsbcq_s, vsbcq_u])
+;;
+(define_insn "mve_vsbcq_<supf>v4si"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=w")
+	(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
+		      (match_operand:V4SI 2 "s_register_operand" "w")]
+	 VSBCQ))
+   (set (reg:SI VFPCC_REGNUM)
+	(unspec:SI [(reg:SI VFPCC_REGNUM)]
+	 VSBCQ))
+  ]
+  "TARGET_HAVE_MVE"
+  "vsbc.i32\t%q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+   (set_attr "length" "4")])
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadciq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadciq_m_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..51cedf3c5421241b0a2a1d8473e07172669a2043
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadciq_m_s32.c
@@ -0,0 +1,26 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t inactive, int32x4_t a, int32x4_t b, unsigned * carry_out, mve_pred16_t p)
+{
+  return vadciq_m_s32 (inactive, a, b, carry_out, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vadcit.i32"  }  } */
+/* { dg-final { scan-assembler "vpst" } } */
+
+int32x4_t
+foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, unsigned * carry_out, mve_pred16_t p)
+{
+  return vadciq_m (inactive, a, b, carry_out, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vadcit.i32"  }  } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-times "mrc" 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadciq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadciq_m_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..b46f0a7d44d8bf77105ffcfc59cbb24aa4eaa658
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadciq_m_u32.c
@@ -0,0 +1,26 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, unsigned * carry_out, mve_pred16_t p)
+{
+  return vadciq_m_u32 (inactive, a, b, carry_out, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vadcit.i32"  }  } */
+/* { dg-final { scan-assembler "vpst" } } */
+
+uint32x4_t
+foo1 (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, unsigned * carry_out, mve_pred16_t p)
+{
+  return vadciq_m (inactive, a, b, carry_out, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vadcit.i32"  }  } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-times "mrc" 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadciq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadciq_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..124bff6dbb225edd156390411db2b4e18643f256
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadciq_s32.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t a, int32x4_t b, unsigned * carry_out)
+{
+  return vadciq_s32 (a, b, carry_out);
+}
+
+/* { dg-final { scan-assembler "vadci.i32"  }  } */
+
+int32x4_t
+foo1 (int32x4_t a, int32x4_t b, unsigned * carry_out)
+{
+  return vadciq (a, b, carry_out);
+}
+
+/* { dg-final { scan-assembler "vadci.i32"  }  } */
+/* { dg-final { scan-assembler-times "mrc" 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadciq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadciq_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..0718570130caa4cb88085f8a11ab92a72522cec7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadciq_u32.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t a, uint32x4_t b, unsigned * carry_out)
+{
+  return vadciq_u32 (a, b, carry_out);
+}
+
+/* { dg-final { scan-assembler "vadci.i32"  }  } */
+
+uint32x4_t
+foo1 (uint32x4_t a, uint32x4_t b, unsigned * carry_out)
+{
+  return vadciq (a, b, carry_out);
+}
+
+/* { dg-final { scan-assembler "vadci.i32"  }  } */
+/* { dg-final { scan-assembler-times "mrc" 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_m_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..1c2a928d52ff04c7341f043c15123386453c66d5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_m_s32.c
@@ -0,0 +1,28 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t inactive, int32x4_t a, int32x4_t b, unsigned * carry, mve_pred16_t p)
+{
+  return vadcq_m_s32 (inactive, a, b, carry, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vadct.i32"  }  } */
+/* { dg-final { scan-assembler "vpst" } } */
+
+int32x4_t
+foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, unsigned * carry, mve_pred16_t p)
+{
+  return vadcq_m (inactive, a, b, carry, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vadct.i32"  }  } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-times "mrc" 4 } } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-times "mcr" 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_m_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..af38e01b7c8fd180311afbb6ee05c758ec2e636e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_m_u32.c
@@ -0,0 +1,29 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, unsigned * carry, mve_pred16_t p)
+{
+  return vadcq_m_u32 (inactive, a, b, carry, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vadct.i32"  }  } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vpst" } } */
+
+uint32x4_t
+foo1 (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, unsigned * carry, mve_pred16_t p)
+{
+  return vadcq_m (inactive, a, b, carry, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vadct.i32"  }  } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-times "mrc" 4 } } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-times "mcr" 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..35be2d6aa2e31628a162bd9456be0844c861bb6a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_s32.c
@@ -0,0 +1,23 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t a, int32x4_t b, unsigned * carry)
+{
+  return vadcq_s32 (a, b, carry);
+}
+
+/* { dg-final { scan-assembler "vadc.i32"  }  } */
+
+int32x4_t
+foo1 (int32x4_t a, int32x4_t b, unsigned * carry)
+{
+  return vadcq (a, b, carry);
+}
+
+/* { dg-final { scan-assembler "vadc.i32"  }  } */
+/* { dg-final { scan-assembler-times "mrc" 4 } } */
+/* { dg-final { scan-assembler-times "mcr" 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..9a8246318b6fad157cbb9b74379e2f760fe71ff6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vadcq_u32.c
@@ -0,0 +1,23 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t a, uint32x4_t b, unsigned * carry)
+{
+  return vadcq_u32 (a, b, carry);
+}
+
+/* { dg-final { scan-assembler "vadc.i32"  }  } */
+
+uint32x4_t
+foo1 (uint32x4_t a, uint32x4_t b, unsigned * carry)
+{
+  return vadcq (a, b, carry);
+}
+
+/* { dg-final { scan-assembler "vadc.i32"  }  } */
+/* { dg-final { scan-assembler-times "mrc" 4 } } */
+/* { dg-final { scan-assembler-times "mcr" 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbciq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbciq_m_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..c353a51080d9a15f033562636ffde3cb81e39b36
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbciq_m_s32.c
@@ -0,0 +1,26 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t inactive, int32x4_t a, int32x4_t b, unsigned * carry_out, mve_pred16_t p)
+{
+  return vsbciq_m_s32 (inactive, a, b, carry_out, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vsbcit.i32"  } } */
+/* { dg-final { scan-assembler "vpst" } } */
+
+int32x4_t
+foo1 (int32x4_t inactive, int32x4_t a, int32x4_t b, unsigned * carry_out, mve_pred16_t p)
+{
+  return vsbciq_m (inactive, a, b, carry_out, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vsbcit.i32"  } } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-times "mrc" 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbciq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbciq_m_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..e2bddb737c866c86a0ba0ac6fb5de1eeb7206c69
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbciq_m_u32.c
@@ -0,0 +1,27 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, unsigned * carry_out, mve_pred16_t p)
+{
+  return vsbciq_m_u32 (inactive, a, b, carry_out, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vsbcit.i32"  }  } */
+/* { dg-final { scan-assembler "vpst" } } */
+
+uint32x4_t
+foo1 (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, unsigned * carry_out, mve_pred16_t p)
+{
+  return vsbciq_m (inactive, a, b, carry_out, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vsbcit.i32"  }  } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-times "mrc" 2 } } */
+
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbciq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbciq_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..db32b9cef100c1ec8b8fa75d946e9423e7dc6669
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbciq_s32.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t a, int32x4_t b, unsigned * carry_out)
+{
+  return vsbciq_s32 (a, b, carry_out);
+}
+
+/* { dg-final { scan-assembler "vsbci.i32"  }  } */
+
+int32x4_t
+foo1 (int32x4_t a, int32x4_t b, unsigned * carry_out)
+{
+  return vsbciq_s32 (a, b, carry_out);
+}
+
+/* { dg-final { scan-assembler "vsbci.i32"  }  } */
+/* { dg-final { scan-assembler-times "mrc" 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbciq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbciq_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..60f213d9a631d7d12f8ab7dfe58e3c251b9e8854
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbciq_u32.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t a, uint32x4_t b, unsigned * carry_out)
+{
+  return vsbciq_u32 (a, b, carry_out);
+}
+
+/* { dg-final { scan-assembler "vsbci.i32"  }  } */
+
+uint32x4_t
+foo1 (uint32x4_t a, uint32x4_t b, unsigned * carry_out)
+{
+  return vsbciq_u32 (a, b, carry_out);
+}
+
+/* { dg-final { scan-assembler "vsbci.i32"  }  } */
+/* { dg-final { scan-assembler-times "mrc" 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_m_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_m_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..4ab7e2f1f69ade2189c44bc9bcd6243dc3cd9d68
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_m_s32.c
@@ -0,0 +1,19 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t inactive, int32x4_t a, int32x4_t b, unsigned * carry, mve_pred16_t p)
+{
+    return vsbcq_m_s32 (inactive, a, b, carry, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vsbct.i32"  }  } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-times "mrc" 2 } } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-times "mcr" 1 } } */
+
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_m_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_m_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..da2edac3d95d5405226bd0b85a9dce42ad62fd9b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_m_u32.c
@@ -0,0 +1,19 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, unsigned * carry, mve_pred16_t p)
+{
+    return vsbcq_m_u32 (inactive, a, b, carry, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vsbct.i32"  }  } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-times "mrc" 2 } } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler-times "mcr" 1 } } */
+
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..9c3e6e99f5938d523d5dee127300031ddc09ee33
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_s32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t a, int32x4_t b, unsigned * carry)
+{
+  return vsbcq_s32 (a, b, carry);
+}
+
+/* { dg-final { scan-assembler "vsbc.i32"  }  } */
+
+int32x4_t
+foo1 (int32x4_t a, int32x4_t b, unsigned * carry)
+{
+  return vsbcq (a, b, carry);
+}
+
+/* { dg-final { scan-assembler "vsbc.i32"  }  } */
+/* { dg-final { scan-assembler-times "mrc" 4 } } */
+/* { dg-final { scan-assembler-times "mcr" 2 } } */
+
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..122b20c41b34c906c00b9914fe152195a3739118
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsbcq_u32.c
@@ -0,0 +1,23 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t a, uint32x4_t b, unsigned * carry)
+{
+  return vsbcq_u32 (a, b, carry);
+}
+
+/* { dg-final { scan-assembler "vsbc.i32"  }  } */
+
+uint32x4_t
+foo1 (uint32x4_t a, uint32x4_t b, unsigned * carry)
+{
+  return vsbcq (a, b, carry);
+}
+
+/* { dg-final { scan-assembler "vsbc.i32"  }  } */
+/* { dg-final { scan-assembler-times "mrc" 4 } } */
+/* { dg-final { scan-assembler-times "mcr" 2 } } */


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][6x]:MVE ACLE vaddq intrinsics using arithmetic plus operator.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (32 preceding siblings ...)
  2019-11-14 19:16 ` [PATCH][ARM][GCC][10x]: MVE ACLE intrinsics "add with carry across beats" and "beat-wise substract" Srinath Parvathaneni
@ 2019-11-14 19:16 ` Srinath Parvathaneni
  2019-11-14 19:16 ` [PATCH][ARM][GCC][11x]: MVE ACLE vector interleaving store and deinterleaving load intrinsics and also aliases to vstr and vldr intrinsics Srinath Parvathaneni
                   ` (4 subsequent siblings)
  38 siblings, 0 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 23480 bytes --]

Hello,

This patch supports following MVE ACLE vaddq intrinsics. The RTL patterns for this intrinsics
are added using arithmetic "plus" operator.

vaddq_s8, vaddq_s16, vaddq_s32, vaddq_u8, vaddq_u16, vaddq_u32, vaddq_f16, vaddq_f32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1]  https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-05  Andre Vieira  <andre.simoesdiasvieira@arm.com>
            Mihail Ionescu  <mihail.ionescu@arm.com>
            Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm_mve.h (vaddq_s8): Define macro.
	(vaddq_s16): Likewise.
	(vaddq_s32): Likewise.
	(vaddq_u8): Likewise.
	(vaddq_u16): Likewise.
	(vaddq_u32): Likewise.
	(vaddq_f16): Likewise.
	(vaddq_f32): Likewise.
	(__arm_vaddq_s8): Define intrinsic.
	(__arm_vaddq_s16): Likewise.
	(__arm_vaddq_s32): Likewise.
	(__arm_vaddq_u8): Likewise.
	(__arm_vaddq_u16): Likewise.
	(__arm_vaddq_u32): Likewise.
	(__arm_vaddq_f16): Likewise.
	(__arm_vaddq_f32): Likewise.
	(vaddq): Define polymorphic variant.
	* config/arm/iterators.md (VNIM): Define mode iterator for common types
	Neon, IWMMXT and MVE.
	(VNINOTM): Likewise.
	* config/arm/mve.md (mve_vaddq<mode>): Define RTL pattern.
	(mve_vaddq_f<mode>): Define RTL pattern.
	* config/arm/neon.md (add<mode>3): Rename to addv4hf3 RTL pattern.
	(addv8hf3_neon): Define RTL pattern.
	* config/arm/vec-common.md (add<mode>3): Modify standard add RTL pattern
	to support MVE.
	(addv8hf3): Define standard RTL pattern for MVE and Neon.
	(add<mode>3): Modify existing standard add RTL pattern for Neon and IWMMXT.

gcc/testsuite/ChangeLog:

2019-11-05  Andre Vieira  <andre.simoesdiasvieira@arm.com>
            Mihail Ionescu  <mihail.ionescu@arm.com>
            Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vaddq_f16.c: New test.
	* gcc.target/arm/mve/intrinsics/vaddq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_u8.c: Likewise.


###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 42e98f9ad1e357fe974e58378a49bcaaf36c302a..89456589c9dcdff5b56e8707dd720fb151466661 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -1898,6 +1898,14 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vstrwq_scatter_shifted_offset_p_u32(__base, __offset, __value, __p) __arm_vstrwq_scatter_shifted_offset_p_u32(__base, __offset, __value, __p)
 #define vstrwq_scatter_shifted_offset_s32(__base, __offset, __value) __arm_vstrwq_scatter_shifted_offset_s32(__base, __offset, __value)
 #define vstrwq_scatter_shifted_offset_u32(__base, __offset, __value) __arm_vstrwq_scatter_shifted_offset_u32(__base, __offset, __value)
+#define vaddq_s8(__a, __b) __arm_vaddq_s8(__a, __b)
+#define vaddq_s16(__a, __b) __arm_vaddq_s16(__a, __b)
+#define vaddq_s32(__a, __b) __arm_vaddq_s32(__a, __b)
+#define vaddq_u8(__a, __b) __arm_vaddq_u8(__a, __b)
+#define vaddq_u16(__a, __b) __arm_vaddq_u16(__a, __b)
+#define vaddq_u32(__a, __b) __arm_vaddq_u32(__a, __b)
+#define vaddq_f16(__a, __b) __arm_vaddq_f16(__a, __b)
+#define vaddq_f32(__a, __b) __arm_vaddq_f32(__a, __b)
 #endif
 
 __extension__ extern __inline void
@@ -12341,6 +12349,48 @@ __arm_vstrwq_scatter_shifted_offset_u32 (uint32_t * __base, uint32x4_t __offset,
   __builtin_mve_vstrwq_scatter_shifted_offset_uv4si ((__builtin_neon_si *) __base, __offset, __value);
 }
 
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vaddq_s8 (int8x16_t __a, int8x16_t __b)
+{
+  return __a + __b;
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vaddq_s16 (int16x8_t __a, int16x8_t __b)
+{
+  return __a + __b;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vaddq_s32 (int32x4_t __a, int32x4_t __b)
+{
+  return __a + __b;
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vaddq_u8 (uint8x16_t __a, uint8x16_t __b)
+{
+  return __a + __b;
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vaddq_u16 (uint16x8_t __a, uint16x8_t __b)
+{
+  return __a + __b;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vaddq_u32 (uint32x4_t __a, uint32x4_t __b)
+{
+  return __a + __b;
+}
+
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
@@ -14707,6 +14757,20 @@ __arm_vstrwq_scatter_shifted_offset_p_f32 (float32_t * __base, uint32x4_t __offs
   __builtin_mve_vstrwq_scatter_shifted_offset_p_fv4sf (__base, __offset, __value, __p);
 }
 
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vaddq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __a + __b;
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vaddq_f32 (float32x4_t __a, float32x4_t __b)
+{
+  return __a + __b;
+}
+
 #endif
 
 enum {
@@ -16557,6 +16621,25 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vstrwq_scatter_shifted_offset_u32 (__ARM_mve_coerce(__p0, uint32_t *), __p1, __ARM_mve_coerce(__p2, uint32x4_t)), \
   int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4_t]: __arm_vstrwq_scatter_shifted_offset_f32 (__ARM_mve_coerce(__p0, float32_t *), __p1, __ARM_mve_coerce(__p2, float32x4_t)));})
 
+#define vaddq(p0,p1) __arm_vaddq(p0,p1)
+#define __arm_vaddq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vaddq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vaddq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vaddq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vaddq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vaddq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vaddq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8_t]: __arm_vaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16_t]: __arm_vaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32_t]: __arm_vaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t)), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8_t]: __arm_vaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16_t]: __arm_vaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32_t]: __arm_vaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vaddq_f16 (__ARM_mve_coerce(p0, float16x8_t), __ARM_mve_coerce(p1, float16x8_t)), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vaddq_f32 (__ARM_mve_coerce(p0, float32x4_t), __ARM_mve_coerce(p1, float32x4_t)));})
+
 #else /* MVE Interger.  */
 
 #define vst4q(p0,p1) __arm_vst4q(p0,p1)
@@ -17213,12 +17296,7 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vaddq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vaddq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vaddq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vaddq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
-
-#define vaddq(p0,p1) __arm_vaddq(p0,p1)
-#define __arm_vaddq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vaddq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)), \
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8_t]: __arm_vaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t)), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16_t]: __arm_vaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t)), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32_t]: __arm_vaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t)), \
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index e30325bc1652d378be2544fa32269c5c4294d7e9..a0cd35eadc8e9fa6d1a8330778c0b5f83a5e8a37 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -62,6 +62,14 @@
 ;; Integer and float modes supported by Neon and IWMMXT.
 (define_mode_iterator VALL [V2DI V2SI V4HI V8QI V2SF V4SI V8HI V16QI V4SF])
 
+;; Integer and float modes supported by Neon, IWMMXT and MVE, used by
+;; arithmetic epxand patterns.
+(define_mode_iterator VNIM [V16QI V8HI V4SI V4SF])
+
+;; Integer and float modes supported by Neon and IWMMXT but not MVE, used by
+;; arithmetic epxand patterns.
+(define_mode_iterator VNINOTM [V2SI V4HI V8QI V2SF V2DI])
+
 ;; Integer and float modes supported by Neon, IWMMXT and MVE.
 (define_mode_iterator VNIM1 [V16QI V8HI V4SI V4SF V2DI])
 
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index c3fdc8b60843332ee0e59e8ec537d00c41407622..a3f5de300a4a83d916ebbed44ebce30b7d143b22 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -9636,3 +9636,31 @@
    return "";
 }
   [(set_attr "length" "4")])
+
+;;
+;; [vaddq_s, vaddq_u])
+;;
+(define_insn "mve_vaddq<mode>"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(plus:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		    (match_operand:MVE_2 2 "s_register_operand" "w")))
+  ]
+  "TARGET_HAVE_MVE"
+  "vadd.i%#<V_sz_elem>  %q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vaddq_f])
+;;
+(define_insn "mve_vaddq_f<mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(plus:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
+		    (match_operand:MVE_0 2 "s_register_operand" "w")))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vadd.f%#<V_sz_elem> %q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index c23783e0ed914ec21a92828388ada58ada3c6132..df8f4fd4166b0d9cd08f01f3ac7ac3958f20b9db 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -519,18 +519,30 @@
 ;; As with SFmode, full support for HFmode vector arithmetic is only available
 ;; when flag-unsafe-math-optimizations is enabled.
 
-(define_insn "add<mode>3"
+;; Add pattern with modes V8HF and V4HF is split into separate patterns to add
+;; support for standard pattern addv8hf3 in MVE.  Following pattern is called
+;; from "addv8hf3" standard pattern inside vec-common.md file.
+
+(define_insn "addv8hf3_neon"
   [(set
-    (match_operand:VH 0 "s_register_operand" "=w")
-    (plus:VH
-     (match_operand:VH 1 "s_register_operand" "w")
-     (match_operand:VH 2 "s_register_operand" "w")))]
+    (match_operand:V8HF 0 "s_register_operand" "=w")
+    (plus:V8HF
+     (match_operand:V8HF 1 "s_register_operand" "w")
+     (match_operand:V8HF 2 "s_register_operand" "w")))]
  "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
- "vadd.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
- [(set (attr "type")
-   (if_then_else (match_test "<Is_float_mode>")
-    (const_string "neon_fp_addsub_s<q>")
-    (const_string "neon_add<q>")))]
+ "vadd.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_addsub_s_q")]
+)
+
+(define_insn "addv4hf3"
+  [(set
+    (match_operand:V4HF 0 "s_register_operand" "=w")
+    (plus:V4HF
+     (match_operand:V4HF 1 "s_register_operand" "w")
+     (match_operand:V4HF 2 "s_register_operand" "w")))]
+ "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
+ "vadd.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_addsub_s_q")]
 )
 
 (define_insn "add<mode>3_fp16"
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index 5f5c113cf95afafbb733e1bfd2a7c7b8a55651a2..82a1a6bd7698fa25db88a5cdb5b3e762dc80a589 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -77,19 +77,51 @@
      }
 })
 
+;; Vector arithmetic.  Expanders are blank, then unnamed insns implement
+;; patterns separately for Neon, IWMMXT and MVE.
+
+(define_expand "add<mode>3"
+  [(set (match_operand:VNIM 0 "s_register_operand")
+	(plus:VNIM (match_operand:VNIM 1 "s_register_operand")
+		   (match_operand:VNIM 2 "s_register_operand")))]
+  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
+		    || flag_unsafe_math_optimizations))
+   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))
+   || (TARGET_HAVE_MVE && VALID_MVE_SI_MODE(<MODE>mode))
+   || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE(<MODE>mode))"
+{
+})
+
+;; Vector arithmetic.  Expanders are blank, then unnamed insns implement
+;; patterns separately for Neon and MVE.
+
+(define_expand "addv8hf3"
+  [(set (match_operand:V8HF 0 "s_register_operand")
+	(plus:V8HF (match_operand:V8HF 1 "s_register_operand")
+		   (match_operand:V8HF 2 "s_register_operand")))]
+  "(TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE(V8HFmode))
+   || (TARGET_NEON_FP16INST && flag_unsafe_math_optimizations)"
+{
+  if (TARGET_NEON_FP16INST && flag_unsafe_math_optimizations)
+    emit_insn (gen_addv8hf3_neon (operands[0], operands[1], operands[2]));
+})
+
+;; Vector arithmetic.  Expanders are blank, then unnamed insns implement
+;; patterns separately for Neon and IWMMXT.
+
+(define_expand "add<mode>3"
+  [(set (match_operand:VNINOTM 0 "s_register_operand")
+	(plus:VNINOTM (match_operand:VNINOTM 1 "s_register_operand")
+		      (match_operand:VNINOTM 2 "s_register_operand")))]
+  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
+		    || flag_unsafe_math_optimizations))
+   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
+{
+})
+
 ;; Vector arithmetic. Expanders are blank, then unnamed insns implement
 ;; patterns separately for IWMMXT and Neon.
 
-(define_expand "add<mode>3"
-  [(set (match_operand:VALL 0 "s_register_operand")
-        (plus:VALL (match_operand:VALL 1 "s_register_operand")
-                   (match_operand:VALL 2 "s_register_operand")))]
-  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
-		    || flag_unsafe_math_optimizations))
-   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
-{
-})
-
 (define_expand "sub<mode>3"
   [(set (match_operand:VALL 0 "s_register_operand")
         (minus:VALL (match_operand:VALL 1 "s_register_operand")
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..4b6fa72adfdf2b68d766c6a4ce1226a84d357f36
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_f16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16x8_t a, float16x8_t b)
+{
+  return vaddq_f16 (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.f16"  }  } */
+
+float16x8_t
+foo1 (float16x8_t a, float16x8_t b)
+{
+  return vaddq (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..0aa7d4b7d9b8f4cb0c6611fe47d494762297c24c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_f32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32x4_t a, float32x4_t b)
+{
+  return vaddq_f32 (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.f32"  }  } */
+
+float32x4_t
+foo1 (float32x4_t a, float32x4_t b)
+{
+  return vaddq (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.f32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..34f1e0b1d0e11b899505a314418f0f2e1acb7b51
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int16x8_t a, int16x8_t b)
+{
+  return vaddq_s16 (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.i16"  }  } */
+
+int16x8_t
+foo1 (int16x8_t a, int16x8_t b)
+{
+  return vaddq (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.i16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..ae35932b3dcf49eddd9e537e8833daac7050ad31
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t a, int32x4_t b)
+{
+  return vaddq_s32 (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.i32"  }  } */
+
+int32x4_t
+foo1 (int32x4_t a, int32x4_t b)
+{
+  return vaddq (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.i32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..277c9f7f7c1a1054392e0dbd55800e0686e9e8fb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo (int8x16_t a, int8x16_t b)
+{
+  return vaddq_s8 (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.i8"  }  } */
+
+int8x16_t
+foo1 (int8x16_t a, int8x16_t b)
+{
+  return vaddq (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.i8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..de2f299ab46181c696db6daa726744bca17f1588
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint16x8_t a, uint16x8_t b)
+{
+  return vaddq_u16 (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.i16"  }  } */
+
+uint16x8_t
+foo1 (uint16x8_t a, uint16x8_t b)
+{
+  return vaddq (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.i16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..81a96d427a5783a99e4962df99c0f64b6b16e6a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t a, uint32x4_t b)
+{
+  return vaddq_u32 (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.i32"  }  } */
+
+uint32x4_t
+foo1 (uint32x4_t a, uint32x4_t b)
+{
+  return vaddq (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.i32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..2a754210e139074728ff5af00787aed411d3597c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint8x16_t a, uint8x16_t b)
+{
+  return vaddq_u8 (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.i8"  }  } */
+
+uint8x16_t
+foo1 (uint8x16_t a, uint8x16_t b)
+{
+  return vaddq (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.i8"  }  } */


[-- Attachment #2: diff28.patch --]
[-- Type: text/plain, Size: 20406 bytes --]

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 42e98f9ad1e357fe974e58378a49bcaaf36c302a..89456589c9dcdff5b56e8707dd720fb151466661 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -1898,6 +1898,14 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vstrwq_scatter_shifted_offset_p_u32(__base, __offset, __value, __p) __arm_vstrwq_scatter_shifted_offset_p_u32(__base, __offset, __value, __p)
 #define vstrwq_scatter_shifted_offset_s32(__base, __offset, __value) __arm_vstrwq_scatter_shifted_offset_s32(__base, __offset, __value)
 #define vstrwq_scatter_shifted_offset_u32(__base, __offset, __value) __arm_vstrwq_scatter_shifted_offset_u32(__base, __offset, __value)
+#define vaddq_s8(__a, __b) __arm_vaddq_s8(__a, __b)
+#define vaddq_s16(__a, __b) __arm_vaddq_s16(__a, __b)
+#define vaddq_s32(__a, __b) __arm_vaddq_s32(__a, __b)
+#define vaddq_u8(__a, __b) __arm_vaddq_u8(__a, __b)
+#define vaddq_u16(__a, __b) __arm_vaddq_u16(__a, __b)
+#define vaddq_u32(__a, __b) __arm_vaddq_u32(__a, __b)
+#define vaddq_f16(__a, __b) __arm_vaddq_f16(__a, __b)
+#define vaddq_f32(__a, __b) __arm_vaddq_f32(__a, __b)
 #endif
 
 __extension__ extern __inline void
@@ -12341,6 +12349,48 @@ __arm_vstrwq_scatter_shifted_offset_u32 (uint32_t * __base, uint32x4_t __offset,
   __builtin_mve_vstrwq_scatter_shifted_offset_uv4si ((__builtin_neon_si *) __base, __offset, __value);
 }
 
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vaddq_s8 (int8x16_t __a, int8x16_t __b)
+{
+  return __a + __b;
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vaddq_s16 (int16x8_t __a, int16x8_t __b)
+{
+  return __a + __b;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vaddq_s32 (int32x4_t __a, int32x4_t __b)
+{
+  return __a + __b;
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vaddq_u8 (uint8x16_t __a, uint8x16_t __b)
+{
+  return __a + __b;
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vaddq_u16 (uint16x8_t __a, uint16x8_t __b)
+{
+  return __a + __b;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vaddq_u32 (uint32x4_t __a, uint32x4_t __b)
+{
+  return __a + __b;
+}
+
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
@@ -14707,6 +14757,20 @@ __arm_vstrwq_scatter_shifted_offset_p_f32 (float32_t * __base, uint32x4_t __offs
   __builtin_mve_vstrwq_scatter_shifted_offset_p_fv4sf (__base, __offset, __value, __p);
 }
 
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vaddq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __a + __b;
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vaddq_f32 (float32x4_t __a, float32x4_t __b)
+{
+  return __a + __b;
+}
+
 #endif
 
 enum {
@@ -16557,6 +16621,25 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vstrwq_scatter_shifted_offset_u32 (__ARM_mve_coerce(__p0, uint32_t *), __p1, __ARM_mve_coerce(__p2, uint32x4_t)), \
   int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4_t]: __arm_vstrwq_scatter_shifted_offset_f32 (__ARM_mve_coerce(__p0, float32_t *), __p1, __ARM_mve_coerce(__p2, float32x4_t)));})
 
+#define vaddq(p0,p1) __arm_vaddq(p0,p1)
+#define __arm_vaddq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vaddq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vaddq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vaddq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vaddq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vaddq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vaddq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8_t]: __arm_vaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16_t]: __arm_vaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32_t]: __arm_vaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t)), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8_t]: __arm_vaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16_t]: __arm_vaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32_t]: __arm_vaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t)), \
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vaddq_f16 (__ARM_mve_coerce(p0, float16x8_t), __ARM_mve_coerce(p1, float16x8_t)), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vaddq_f32 (__ARM_mve_coerce(p0, float32x4_t), __ARM_mve_coerce(p1, float32x4_t)));})
+
 #else /* MVE Interger.  */
 
 #define vst4q(p0,p1) __arm_vst4q(p0,p1)
@@ -17213,12 +17296,7 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vaddq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vaddq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vaddq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vaddq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
-
-#define vaddq(p0,p1) __arm_vaddq(p0,p1)
-#define __arm_vaddq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vaddq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)), \
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8_t]: __arm_vaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t)), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16_t]: __arm_vaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t)), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32_t]: __arm_vaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t)), \
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index e30325bc1652d378be2544fa32269c5c4294d7e9..a0cd35eadc8e9fa6d1a8330778c0b5f83a5e8a37 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -62,6 +62,14 @@
 ;; Integer and float modes supported by Neon and IWMMXT.
 (define_mode_iterator VALL [V2DI V2SI V4HI V8QI V2SF V4SI V8HI V16QI V4SF])
 
+;; Integer and float modes supported by Neon, IWMMXT and MVE, used by
+;; arithmetic epxand patterns.
+(define_mode_iterator VNIM [V16QI V8HI V4SI V4SF])
+
+;; Integer and float modes supported by Neon and IWMMXT but not MVE, used by
+;; arithmetic epxand patterns.
+(define_mode_iterator VNINOTM [V2SI V4HI V8QI V2SF V2DI])
+
 ;; Integer and float modes supported by Neon, IWMMXT and MVE.
 (define_mode_iterator VNIM1 [V16QI V8HI V4SI V4SF V2DI])
 
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index c3fdc8b60843332ee0e59e8ec537d00c41407622..a3f5de300a4a83d916ebbed44ebce30b7d143b22 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -9636,3 +9636,31 @@
    return "";
 }
   [(set_attr "length" "4")])
+
+;;
+;; [vaddq_s, vaddq_u])
+;;
+(define_insn "mve_vaddq<mode>"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(plus:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		    (match_operand:MVE_2 2 "s_register_operand" "w")))
+  ]
+  "TARGET_HAVE_MVE"
+  "vadd.i%#<V_sz_elem>  %q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vaddq_f])
+;;
+(define_insn "mve_vaddq_f<mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(plus:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
+		    (match_operand:MVE_0 2 "s_register_operand" "w")))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vadd.f%#<V_sz_elem> %q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index c23783e0ed914ec21a92828388ada58ada3c6132..df8f4fd4166b0d9cd08f01f3ac7ac3958f20b9db 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -519,18 +519,30 @@
 ;; As with SFmode, full support for HFmode vector arithmetic is only available
 ;; when flag-unsafe-math-optimizations is enabled.
 
-(define_insn "add<mode>3"
+;; Add pattern with modes V8HF and V4HF is split into separate patterns to add
+;; support for standard pattern addv8hf3 in MVE.  Following pattern is called
+;; from "addv8hf3" standard pattern inside vec-common.md file.
+
+(define_insn "addv8hf3_neon"
   [(set
-    (match_operand:VH 0 "s_register_operand" "=w")
-    (plus:VH
-     (match_operand:VH 1 "s_register_operand" "w")
-     (match_operand:VH 2 "s_register_operand" "w")))]
+    (match_operand:V8HF 0 "s_register_operand" "=w")
+    (plus:V8HF
+     (match_operand:V8HF 1 "s_register_operand" "w")
+     (match_operand:V8HF 2 "s_register_operand" "w")))]
  "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
- "vadd.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
- [(set (attr "type")
-   (if_then_else (match_test "<Is_float_mode>")
-    (const_string "neon_fp_addsub_s<q>")
-    (const_string "neon_add<q>")))]
+ "vadd.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_addsub_s_q")]
+)
+
+(define_insn "addv4hf3"
+  [(set
+    (match_operand:V4HF 0 "s_register_operand" "=w")
+    (plus:V4HF
+     (match_operand:V4HF 1 "s_register_operand" "w")
+     (match_operand:V4HF 2 "s_register_operand" "w")))]
+ "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
+ "vadd.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_addsub_s_q")]
 )
 
 (define_insn "add<mode>3_fp16"
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index 5f5c113cf95afafbb733e1bfd2a7c7b8a55651a2..82a1a6bd7698fa25db88a5cdb5b3e762dc80a589 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -77,19 +77,51 @@
      }
 })
 
+;; Vector arithmetic.  Expanders are blank, then unnamed insns implement
+;; patterns separately for Neon, IWMMXT and MVE.
+
+(define_expand "add<mode>3"
+  [(set (match_operand:VNIM 0 "s_register_operand")
+	(plus:VNIM (match_operand:VNIM 1 "s_register_operand")
+		   (match_operand:VNIM 2 "s_register_operand")))]
+  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
+		    || flag_unsafe_math_optimizations))
+   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))
+   || (TARGET_HAVE_MVE && VALID_MVE_SI_MODE(<MODE>mode))
+   || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE(<MODE>mode))"
+{
+})
+
+;; Vector arithmetic.  Expanders are blank, then unnamed insns implement
+;; patterns separately for Neon and MVE.
+
+(define_expand "addv8hf3"
+  [(set (match_operand:V8HF 0 "s_register_operand")
+	(plus:V8HF (match_operand:V8HF 1 "s_register_operand")
+		   (match_operand:V8HF 2 "s_register_operand")))]
+  "(TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE(V8HFmode))
+   || (TARGET_NEON_FP16INST && flag_unsafe_math_optimizations)"
+{
+  if (TARGET_NEON_FP16INST && flag_unsafe_math_optimizations)
+    emit_insn (gen_addv8hf3_neon (operands[0], operands[1], operands[2]));
+})
+
+;; Vector arithmetic.  Expanders are blank, then unnamed insns implement
+;; patterns separately for Neon and IWMMXT.
+
+(define_expand "add<mode>3"
+  [(set (match_operand:VNINOTM 0 "s_register_operand")
+	(plus:VNINOTM (match_operand:VNINOTM 1 "s_register_operand")
+		      (match_operand:VNINOTM 2 "s_register_operand")))]
+  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
+		    || flag_unsafe_math_optimizations))
+   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
+{
+})
+
 ;; Vector arithmetic. Expanders are blank, then unnamed insns implement
 ;; patterns separately for IWMMXT and Neon.
 
-(define_expand "add<mode>3"
-  [(set (match_operand:VALL 0 "s_register_operand")
-        (plus:VALL (match_operand:VALL 1 "s_register_operand")
-                   (match_operand:VALL 2 "s_register_operand")))]
-  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
-		    || flag_unsafe_math_optimizations))
-   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
-{
-})
-
 (define_expand "sub<mode>3"
   [(set (match_operand:VALL 0 "s_register_operand")
         (minus:VALL (match_operand:VALL 1 "s_register_operand")
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..4b6fa72adfdf2b68d766c6a4ce1226a84d357f36
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_f16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16x8_t a, float16x8_t b)
+{
+  return vaddq_f16 (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.f16"  }  } */
+
+float16x8_t
+foo1 (float16x8_t a, float16x8_t b)
+{
+  return vaddq (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..0aa7d4b7d9b8f4cb0c6611fe47d494762297c24c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_f32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32x4_t a, float32x4_t b)
+{
+  return vaddq_f32 (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.f32"  }  } */
+
+float32x4_t
+foo1 (float32x4_t a, float32x4_t b)
+{
+  return vaddq (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.f32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..34f1e0b1d0e11b899505a314418f0f2e1acb7b51
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int16x8_t a, int16x8_t b)
+{
+  return vaddq_s16 (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.i16"  }  } */
+
+int16x8_t
+foo1 (int16x8_t a, int16x8_t b)
+{
+  return vaddq (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.i16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..ae35932b3dcf49eddd9e537e8833daac7050ad31
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t a, int32x4_t b)
+{
+  return vaddq_s32 (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.i32"  }  } */
+
+int32x4_t
+foo1 (int32x4_t a, int32x4_t b)
+{
+  return vaddq (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.i32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..277c9f7f7c1a1054392e0dbd55800e0686e9e8fb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo (int8x16_t a, int8x16_t b)
+{
+  return vaddq_s8 (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.i8"  }  } */
+
+int8x16_t
+foo1 (int8x16_t a, int8x16_t b)
+{
+  return vaddq (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.i8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..de2f299ab46181c696db6daa726744bca17f1588
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint16x8_t a, uint16x8_t b)
+{
+  return vaddq_u16 (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.i16"  }  } */
+
+uint16x8_t
+foo1 (uint16x8_t a, uint16x8_t b)
+{
+  return vaddq (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.i16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..81a96d427a5783a99e4962df99c0f64b6b16e6a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t a, uint32x4_t b)
+{
+  return vaddq_u32 (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.i32"  }  } */
+
+uint32x4_t
+foo1 (uint32x4_t a, uint32x4_t b)
+{
+  return vaddq (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.i32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..2a754210e139074728ff5af00787aed411d3597c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vaddq_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint8x16_t a, uint8x16_t b)
+{
+  return vaddq_u8 (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.i8"  }  } */
+
+uint8x16_t
+foo1 (uint8x16_t a, uint8x16_t b)
+{
+  return vaddq (a, b);
+}
+
+/* { dg-final { scan-assembler "vadd.i8"  }  } */


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][13x]: MVE ACLE scalar shift intrinsics.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (19 preceding siblings ...)
  2019-11-14 19:15 ` [PATCH][ARM][GCC][3/5x]: MVE store intrinsics with predicated suffix Srinath Parvathaneni
@ 2019-11-14 19:16 ` Srinath Parvathaneni
  2019-11-14 19:16 ` [PATCH][ARM][GCC][2/8x]: MVE ACLE gather load and scatter store intrinsics with writeback Srinath Parvathaneni
                   ` (17 subsequent siblings)
  38 siblings, 0 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 28173 bytes --]

Hello,

This patch supports following MVE ACLE scalar shift intrinsics.

sqrshr, sqrshrl, sqrshrl_sat48, sqshl, sqshll, srshr, srshrl, uqrshl,
uqrshll, uqrshll_sat48, uqshl, uqshll, urshr, urshrl, lsll, asrl.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-08  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm-builtins.c (LSLL_QUALIFIERS): Define builtin qualifier.
	(UQSHL_QUALIFIERS): Likewise.
	(ASRL_QUALIFIERS): Likewise.
	(SQSHL_QUALIFIERS): Likewise.
	* config/arm/arm_mve.h (sqrshr): Define macro.
	(sqrshrl): Likewise.
	(sqrshrl_sat48): Likewise.
	(sqshl): Likewise.
	(sqshll): Likewise.
	(srshr): Likewise.
	(srshrl): Likewise.
	(uqrshl): Likewise.
	(uqrshll): Likewise.
	(uqrshll_sat48): Likewise.
	(uqshl): Likewise.
	(uqshll): Likewise.
	(urshr): Likewise.
	(urshrl): Likewise.
	(lsll): Likewise.
	(asrl): Likewise.
	(__arm_lsll): Define intrinsic.
	(__arm_asrl): Likewise.
	(__arm_uqrshll): Likewise.
	(__arm_uqrshll_sat48): Likewise.
	(__arm_sqrshrl): Likewise.
	(__arm_sqrshrl_sat48): Likewise.
	(__arm_uqshll): Likewise.
	(__arm_urshrl): Likewise.
	(__arm_srshrl): Likewise.
	(__arm_sqshll): Likewise.
	(__arm_uqrshl): Likewise.
	(__arm_sqrshr): Likewise.
	(__arm_uqshl): Likewise.
	(__arm_urshr): Likewise.
	(__arm_sqshl): Likewise.
	(__arm_srshr): Likewise.
	* config/arm/arm_mve_builtins.def (LSLL_QUALIFIERS): Use builtin
	qualifier.
        (UQSHL_QUALIFIERS): Likewise.
        (ASRL_QUALIFIERS): Likewise.
        (SQSHL_QUALIFIERS): Likewise.
	* config/arm/mve.md (mve_uqrshll_sat<supf>_di): Define RTL pattern.
	(mve_sqrshrl_sat<supf>_di): Likewise
	(mve_uqrshl_si): Likewise
	(mve_sqrshr_si): Likewise
	(mve_uqshll_di): Likewise
	(mve_urshrl_di): Likewise
	(mve_uqshl_si): Likewise
	(mve_urshr_si): Likewise
	(mve_sqshl_si): Likewise
	(mve_srshr_si): Likewise
	(mve_srshrl_di): Likewise
	(mve_sqshll_di): Likewise

gcc/testsuite/ChangeLog:

2019-11-08  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/asrl.c: New test.
	* gcc.target/arm/mve/intrinsics/lsll.c: Likewise.
	* gcc.target/arm/mve/intrinsics/sqrshr.c: Likewise.
	* gcc.target/arm/mve/intrinsics/sqrshrl_sat48.c: Likewise.
	* gcc.target/arm/mve/intrinsics/sqrshrl_sat64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/sqshl.c: Likewise.
	* gcc.target/arm/mve/intrinsics/sqshll.c: Likewise.
	* gcc.target/arm/mve/intrinsics/srshr.c: Likewise.
	* gcc.target/arm/mve/intrinsics/srshrl.c: Likewise.
	* gcc.target/arm/mve/intrinsics/uqrshl.c: Likewise.
	* gcc.target/arm/mve/intrinsics/uqrshll_sat48.c: Likewise.
	* gcc.target/arm/mve/intrinsics/uqrshll_sat64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/uqshl.c: Likewise.
	* gcc.target/arm/mve/intrinsics/uqshll.c: Likewise.
	* gcc.target/arm/mve/intrinsics/urshr.c: Likewise.
	* gcc.target/arm/mve/intrinsics/urshrl.c: Likewise.


###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 9e87dff264d6b535f64407f669c6e83b0ed639a6..31bff8511e368f8e789297818e9b0b9f885463ae 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -738,6 +738,26 @@ arm_strsbwbu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
       qualifier_unsigned, qualifier_unsigned};
 #define STRSBWBU_P_QUALIFIERS (arm_strsbwbu_p_qualifiers)
 
+static enum arm_type_qualifiers
+arm_lsll_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_none};
+#define LSLL_QUALIFIERS (arm_lsll_qualifiers)
+
+static enum arm_type_qualifiers
+arm_uqshl_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_const};
+#define UQSHL_QUALIFIERS (arm_uqshl_qualifiers)
+
+static enum arm_type_qualifiers
+arm_asrl_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none};
+#define ASRL_QUALIFIERS (arm_asrl_qualifiers)
+
+static enum arm_type_qualifiers
+arm_sqshl_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_const};
+#define SQSHL_QUALIFIERS (arm_sqshl_qualifiers)
+
 /* End of Qualifier for MVE builtins.  */
 
    /* void ([T element type] *, T, immediate).  */
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 9dcf8d692670cd8552fade9868bc051683553b91..2adae7f8b21f44aa3b80231b89bd68bcd0812611 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -2526,6 +2526,22 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vgetq_lane_u16(__a,  __idx) __arm_vgetq_lane_u16(__a,  __idx)
 #define vgetq_lane_u32(__a,  __idx) __arm_vgetq_lane_u32(__a,  __idx)
 #define vgetq_lane_u64(__a,  __idx) __arm_vgetq_lane_u64(__a,  __idx)
+#define sqrshr(__p0, __p1) __arm_sqrshr(__p0, __p1)
+#define sqrshrl(__p0, __p1) __arm_sqrshrl(__p0, __p1)
+#define sqrshrl_sat48(__p0, __p1) __arm_sqrshrl_sat48(__p0, __p1)
+#define sqshl(__p0, __p1) __arm_sqshl(__p0, __p1)
+#define sqshll(__p0, __p1) __arm_sqshll(__p0, __p1)
+#define srshr(__p0, __p1) __arm_srshr(__p0, __p1)
+#define srshrl(__p0, __p1) __arm_srshrl(__p0, __p1)
+#define uqrshl(__p0, __p1) __arm_uqrshl(__p0, __p1)
+#define uqrshll(__p0, __p1) __arm_uqrshll(__p0, __p1)
+#define uqrshll_sat48(__p0, __p1) __arm_uqrshll_sat48(__p0, __p1)
+#define uqshl(__p0, __p1) __arm_uqshl(__p0, __p1)
+#define uqshll(__p0, __p1) __arm_uqshll(__p0, __p1)
+#define urshr(__p0, __p1) __arm_urshr(__p0, __p1)
+#define urshrl(__p0, __p1) __arm_urshrl(__p0, __p1)
+#define lsll(__p0, __p1) __arm_lsll(__p0, __p1)
+#define asrl(__p0, __p1) __arm_asrl(__p0, __p1)
 #endif
 
 /* For big-endian, GCC's vector indices are reversed within each 64 bits
@@ -16539,6 +16555,118 @@ __arm_vgetq_lane_u64 (uint64x2_t __a, const int __idx)
   return __a[__ARM_LANEQ(__a,__idx)];
 }
 
+__extension__ extern __inline  uint64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_lsll (uint64_t value, int32_t shift)
+{
+  return (value << shift);
+}
+
+__extension__ extern __inline int64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_asrl (int64_t value, int32_t shift)
+{
+  return (value >> shift);
+}
+
+__extension__ extern __inline uint64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_uqrshll (uint64_t value, int32_t shift)
+{
+  return __builtin_mve_uqrshll_sat64_di (value, shift);
+}
+
+__extension__ extern __inline uint64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_uqrshll_sat48 (uint64_t value, int32_t shift)
+{
+  return __builtin_mve_uqrshll_sat48_di (value, shift);
+}
+
+__extension__ extern __inline int64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_sqrshrl (int64_t value, int32_t shift)
+{
+  return __builtin_mve_sqrshrl_sat64_di (value, shift);
+}
+
+__extension__ extern __inline int64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_sqrshrl_sat48 (int64_t value, int32_t shift)
+{
+  return __builtin_mve_sqrshrl_sat48_di (value, shift);
+}
+
+__extension__ extern __inline uint64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_uqshll (uint64_t value, const int shift)
+{
+  return __builtin_mve_uqshll_di (value, shift);
+}
+
+__extension__ extern __inline uint64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_urshrl (uint64_t value, const int shift)
+{
+  return __builtin_mve_urshrl_di (value, shift);
+}
+
+__extension__ extern __inline int64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_srshrl (int64_t value, const int shift)
+{
+  return __builtin_mve_srshrl_di (value, shift);
+}
+
+__extension__ extern __inline int64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_sqshll (int64_t value, const int shift)
+{
+  return __builtin_mve_sqshll_di (value, shift);
+}
+
+__extension__ extern __inline uint32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_uqrshl (uint32_t value, int32_t shift)
+{
+  return __builtin_mve_uqrshl_si (value, shift);
+}
+
+__extension__ extern __inline int32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_sqrshr (int32_t value, int32_t shift)
+{
+  return __builtin_mve_sqrshr_si (value, shift);
+}
+
+__extension__ extern __inline uint32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_uqshl (uint32_t value, const int shift)
+{
+  return  __builtin_mve_uqshl_si (value, shift);
+}
+
+__extension__ extern __inline uint32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_urshr (uint32_t value, const int shift)
+{
+  return __builtin_mve_urshr_si (value, shift);
+}
+
+__extension__ extern __inline int32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_sqshl (int32_t value, const int shift)
+{
+  return __builtin_mve_sqshl_si (value, shift);
+}
+
+__extension__ extern __inline int32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_srshr (int32_t value, const int shift)
+{
+  return __builtin_mve_srshr_si (value, shift);
+}
+
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index 638dcbc819034bf2c8428ff40f0e4d811763d80e..c23fe88ad05f8ee6c03127066fdf5afa593df944 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -876,3 +876,17 @@ VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vsbcq_m_u, v4si)
 VAR5 (STORE1, vst2q, v16qi, v8hi, v4si, v8hf, v4sf)
 VAR5 (LOAD1, vld4q, v16qi, v8hi, v4si, v8hf, v4sf)
 VAR5 (LOAD1, vld2q, v16qi, v8hi, v4si, v8hf, v4sf)
+VAR1 (ASRL, sqrshr_,si)
+VAR1 (ASRL, sqrshrl_sat64_,di)
+VAR1 (ASRL, sqrshrl_sat48_,di)
+VAR1 (LSLL, uqrshl_, si)
+VAR1 (LSLL, uqrshll_sat64_, di)
+VAR1 (LSLL, uqrshll_sat48_, di)
+VAR1 (SQSHL,srshr_,si)
+VAR1 (SQSHL,srshrl_,di)
+VAR1 (SQSHL,sqshl_,si)
+VAR1 (SQSHL,sqshll_,di)
+VAR1 (UQSHL, urshr_, si)
+VAR1 (UQSHL, urshrl_, di)
+VAR1 (UQSHL, uqshl_, si)
+VAR1 (UQSHL, uqshll_, di)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index b679511e42ce909cc9ef19e1cb790e8a5254d538..bbd54ad71d0b85fd4711007dd93af7a2788b2cf4 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -214,7 +214,9 @@
 			 VLDRDQGBWB_S VLDRDQGBWB_U VADCQ_U VADCQ_M_U VADCQ_S
 			 VADCQ_M_S VSBCIQ_U VSBCIQ_S VSBCIQ_M_U VSBCIQ_M_S
 			 VSBCQ_U VSBCQ_S VSBCQ_M_U VSBCQ_M_S VADCIQ_U VADCIQ_M_U
-			 VADCIQ_S VADCIQ_M_S VLD2Q VLD4Q VST2Q])
+			 VADCIQ_S VADCIQ_M_S VLD2Q VLD4Q VST2Q SRSHRL SRSHR
+			 URSHR URSHRL SQRSHR UQRSHL UQRSHLL_64
+			 UQRSHLL_48 SQRSHRL_64 SQRSHRL_48])
 
 (define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF") (V8HF "V8HI")
 			    (V4SF "V4SI")])
@@ -391,7 +393,8 @@
 		       (VSBCIQ_M_U "u") (VSBCIQ_S "s") (VSBCIQ_M_S "s")
 		       (VADCQ_U "u")  (VADCQ_M_U "u") (VADCQ_S "s")
 		       (VADCIQ_U "u") (VADCIQ_M_U "u") (VADCIQ_S "s")
-		       (VADCIQ_M_S "s")])
+		       (VADCIQ_M_S "s") (SQRSHRL_64 "64") (SQRSHRL_48 "48")
+		       (UQRSHLL_64 "64") (UQRSHLL_48 "48")])
 
 (define_int_attr mode1 [(VCTP8Q "8") (VCTP16Q "16") (VCTP32Q "32")
 			(VCTP64Q "64") (VCTP8Q_M "8") (VCTP16Q_M "16")
@@ -654,7 +657,8 @@
 (define_int_iterator VSBCIQ_M [VSBCIQ_M_U VSBCIQ_M_S])
 (define_int_iterator VADCQ [VADCQ_U VADCQ_S])
 (define_int_iterator VADCQ_M [VADCQ_M_U VADCQ_M_S])
-
+(define_int_iterator UQRSHLLQ [UQRSHLL_64 UQRSHLL_48])
+(define_int_iterator SQRSHRLQ [SQRSHRL_64 SQRSHRL_48])
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
@@ -10983,3 +10987,143 @@
    return "vmov\t%f0, %J1, %K1";
 }
  [(set_attr "type" "mve_move")])
+
+;;
+;; [uqrshll_di]
+;;
+(define_insn "mve_uqrshll_sat<supf>_di"
+  [(set (match_operand:DI 0 "arm_general_register_operand" "+r")
+	(unspec:DI [(match_operand:DI 1 "arm_general_register_operand" "r")
+		    (match_operand:SI 2 "s_register_operand" "r")]
+	 UQRSHLLQ))]
+  "TARGET_HAVE_MVE"
+  "uqrshll%?\\t%Q1, %R1, #<supf>, %2"
+  [(set_attr "predicable" "yes")])
+
+;;
+;; [sqrshrl_di]
+;;
+(define_insn "mve_sqrshrl_sat<supf>_di"
+  [(set (match_operand:DI 0 "arm_general_register_operand" "+r")
+	(unspec:DI [(match_operand:DI 1 "arm_general_register_operand" "r")
+		    (match_operand:SI 2 "s_register_operand" "r")]
+	 SQRSHRLQ))]
+  "TARGET_HAVE_MVE"
+  "sqrshrl%?\\t%Q1, %R1, #<supf>, %2"
+  [(set_attr "predicable" "yes")])
+
+;;
+;; [uqrshl_si]
+;;
+(define_insn "mve_uqrshl_si"
+  [(set (match_operand:SI 0 "arm_general_register_operand" "+r")
+	(unspec:SI [(match_operand:SI 1 "arm_general_register_operand" "r")
+		    (match_operand:SI 2 "s_register_operand" "r")]
+	 UQRSHL))]
+  "TARGET_HAVE_MVE"
+  "uqrshl%?\\t%1, %2"
+  [(set_attr "predicable" "yes")])
+
+;;
+;; [sqrshr_si]
+;;
+(define_insn "mve_sqrshr_si"
+  [(set (match_operand:SI 0 "arm_general_register_operand" "+r")
+	(unspec:SI [(match_operand:SI 1 "arm_general_register_operand" "r")
+		    (match_operand:SI 2 "s_register_operand" "r")]
+	 SQRSHR))]
+  "TARGET_HAVE_MVE"
+  "sqrshr%?\\t%1, %2"
+  [(set_attr "predicable" "yes")])
+
+;;
+;; [uqshll_di]
+;;
+(define_insn "mve_uqshll_di"
+  [(set (match_operand:DI 0 "arm_general_register_operand" "+r")
+	(us_ashift:DI (match_operand:DI 1 "arm_general_register_operand" "r")
+		      (match_operand:SI 2 "arm_reg_or_long_shift_imm" "rPg")))]
+  "TARGET_HAVE_MVE"
+  "uqshll%?\\t%Q1, %R1, %2"
+  [(set_attr "predicable" "yes")])
+
+;;
+;; [urshrl_di]
+;;
+(define_insn "mve_urshrl_di"
+  [(set (match_operand:DI 0 "arm_general_register_operand" "+r")
+	(unspec:DI [(match_operand:DI 1 "arm_general_register_operand" "r")
+		    (match_operand:SI 2 "arm_reg_or_long_shift_imm" "rPg")]
+	 URSHRL))]
+  "TARGET_HAVE_MVE"
+  "urshrl%?\\t%Q1, %R1, %2"
+  [(set_attr "predicable" "yes")])
+
+;;
+;; [uqshl_si]
+;;
+(define_insn "mve_uqshl_si"
+  [(set (match_operand:SI 0 "arm_general_register_operand" "+r")
+	(us_ashift:SI (match_operand:SI 1 "arm_general_register_operand" "r")
+		      (match_operand:SI 2 "arm_reg_or_long_shift_imm" "rPg")))]
+  "TARGET_HAVE_MVE"
+  "uqshl%?\\t%1, %2"
+  [(set_attr "predicable" "yes")])
+
+;;
+;; [urshr_si]
+;;
+(define_insn "mve_urshr_si"
+  [(set (match_operand:SI 0 "arm_general_register_operand" "+r")
+	(unspec:SI [(match_operand:SI 1 "arm_general_register_operand" "r")
+		    (match_operand:SI 2 "arm_reg_or_long_shift_imm" "rPg")]
+	 URSHR))]
+  "TARGET_HAVE_MVE"
+  "urshr%?\\t%1, %2"
+  [(set_attr "predicable" "yes")])
+
+;;
+;; [sqshl_si]
+;;
+(define_insn "mve_sqshl_si"
+  [(set (match_operand:SI 0 "arm_general_register_operand" "+r")
+	(ss_ashift:SI (match_operand:DI 1 "arm_general_register_operand" "r")
+		      (match_operand:SI 2 "arm_reg_or_long_shift_imm" "rPg")))]
+  "TARGET_HAVE_MVE"
+  "sqshl%?\\t%1, %2"
+  [(set_attr "predicable" "yes")])
+
+;;
+;; [srshr_si]
+;;
+(define_insn "mve_srshr_si"
+  [(set (match_operand:SI 0 "arm_general_register_operand" "+r")
+	(unspec:SI [(match_operand:DI 1 "arm_general_register_operand" "r")
+		    (match_operand:SI 2 "arm_reg_or_long_shift_imm" "rPg")]
+	 SRSHR))]
+  "TARGET_HAVE_MVE"
+  "srshr%?\\t%1, %2"
+  [(set_attr "predicable" "yes")])
+
+;;
+;; [srshrl_di]
+;;
+(define_insn "mve_srshrl_di"
+  [(set (match_operand:DI 0 "arm_general_register_operand" "+r")
+	(unspec:DI [(match_operand:DI 1 "arm_general_register_operand" "r")
+		    (match_operand:SI 2 "arm_reg_or_long_shift_imm" "rPg")]
+	 SRSHRL))]
+  "TARGET_HAVE_MVE"
+  "srshrl%?\\t%Q1, %R1, %2"
+  [(set_attr "predicable" "yes")])
+
+;;
+;; [sqshll_di]
+;;
+(define_insn "mve_sqshll_di"
+  [(set (match_operand:DI 0 "arm_general_register_operand" "+r")
+	(ss_ashift:DI (match_operand:DI 1 "arm_general_register_operand" "r")
+		      (match_operand:SI 2 "arm_reg_or_long_shift_imm" "rPg")))]
+  "TARGET_HAVE_MVE"
+  "sqshll%?\\t%Q1, %R1, %2"
+  [(set_attr "predicable" "yes")])
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/asrl.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/asrl.c
new file mode 100644
index 0000000000000000000000000000000000000000..ead9c51003f1a309fe2afb619a582bd42ee61ecb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/asrl.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int64_t
+asrl_reg (int64_t longval3, int32_t x)
+{
+  return asrl (longval3, x);
+}
+
+/* { dg-final { scan-assembler "asrl\\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/lsll.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/lsll.c
new file mode 100644
index 0000000000000000000000000000000000000000..fac4a41f4f779e15f50b84f29e0c87558a27d30c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/lsll.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint64_t
+lsll_reg (uint64_t longval3, int32_t x)
+{
+  return lsll (longval3, x);
+}
+
+/* { dg-final { scan-assembler "lsll\\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/sqrshr.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/sqrshr.c
new file mode 100644
index 0000000000000000000000000000000000000000..def379d59c9ad99add67ea5976eb285588a49f18
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/sqrshr.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32_t
+sqrshr_reg (int32_t longval3, int32_t x)
+{
+  return sqrshr (longval3, x);
+}
+
+/* { dg-final { scan-assembler "sqrshr\\tr\[0-9\]+, r\[0-9\]+" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/sqrshrl_sat48.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/sqrshrl_sat48.c
new file mode 100644
index 0000000000000000000000000000000000000000..0a8606747c45aec190dbde705b5909dc2c4a1205
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/sqrshrl_sat48.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int64_t
+sqrshrl_reg (int64_t longval3, int32_t x)
+{
+  return sqrshrl_sat48 (longval3, x);
+}
+
+/* { dg-final { scan-assembler "sqrshrl\\tr\[0-9\]+, r\[0-9\]+, #48, r\[0-9\]+" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/sqrshrl_sat64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/sqrshrl_sat64.c
new file mode 100644
index 0000000000000000000000000000000000000000..32d52496343311ac1ec6036cbef81c9c6266d4a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/sqrshrl_sat64.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int64_t
+sqrshrl_reg (int64_t longval3, int32_t x)
+{
+  return sqrshrl (longval3, x);
+}
+
+/* { dg-final { scan-assembler "sqrshrl\\tr\[0-9\]+, r\[0-9\]+, #64, r\[0-9\]+" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/sqshl.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/sqshl.c
new file mode 100644
index 0000000000000000000000000000000000000000..a546ff51681e5615f7961db3f1ee59b526cd7688
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/sqshl.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32_t
+sqshl_imm (int32_t longval3)
+{
+  return sqshl (longval3, 25);
+}
+
+/* { dg-final { scan-assembler "sqshl\\tr\[0-9\]+, #25" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/sqshll.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/sqshll.c
new file mode 100644
index 0000000000000000000000000000000000000000..8784b70155ae6193be8a9700dd6a0b4a34bf233b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/sqshll.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int64_t
+sqshll_imm(int64_t value)
+{
+  return sqshll (value, 21);
+}
+
+/* { dg-final { scan-assembler "sqshll\\tr\[0-9\]+, r\[0-9\]+, #21" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c
new file mode 100644
index 0000000000000000000000000000000000000000..d48d65a1f05a3ea679129bac92ea8cf55f44bff5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32_t
+srshr_imm (int32_t longval3)
+{
+  return srshr (longval3, 25);
+}
+
+/* { dg-final { scan-assembler "srshr\\tr\[0-9\]+, #25" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c
new file mode 100644
index 0000000000000000000000000000000000000000..260285b28865337eb7e589a25a00898f8b636956
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int64_t
+srshrl_imm(int64_t value)
+{
+  return srshrl (value, 21);
+}
+
+/* { dg-final { scan-assembler "srshrl\\tr\[0-9\]+, r\[0-9\]+, #21" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqrshl.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqrshl.c
new file mode 100644
index 0000000000000000000000000000000000000000..e23e644ec948a7af3341c0743569a1db8572e652
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqrshl.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32_t
+uqrshl_reg (uint32_t longval3, int32_t x)
+{
+  return uqrshl (longval3, x);
+}
+
+/* { dg-final { scan-assembler "uqrshl\\tr\[0-9\]+, r\[0-9\]+" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqrshll_sat48.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqrshll_sat48.c
new file mode 100644
index 0000000000000000000000000000000000000000..ed7a550b92bff6097d831654e7601a6d8aee6153
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqrshll_sat48.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint64_t
+uqrshll_reg (uint64_t longval3, int32_t x)
+{
+  return uqrshll_sat48 (longval3, x);
+}
+
+/* { dg-final { scan-assembler "uqrshll\\tr\[0-9\]+, r\[0-9\]+, #48, r\[0-9\]+" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqrshll_sat64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqrshll_sat64.c
new file mode 100644
index 0000000000000000000000000000000000000000..1d75ea1f878db12e8b5e92c886c7524ceebb0284
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqrshll_sat64.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint64_t
+uqrshll_reg (uint64_t longval3, int32_t x)
+{
+  return uqrshll (longval3, x);
+}
+
+/* { dg-final { scan-assembler "uqrshll\\tr\[0-9\]+, r\[0-9\]+, #64, r\[0-9\]+" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c
new file mode 100644
index 0000000000000000000000000000000000000000..92e8748e5ec5d3426523b89c57e6b698aa4b351c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32_t
+uqshl_imm (uint32_t longval3)
+{
+  return uqshl (longval3, 21);
+}
+
+/* { dg-final { scan-assembler "uqshl\\tr\[0-9\]+, #21" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshll.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshll.c
new file mode 100644
index 0000000000000000000000000000000000000000..e4416e15144cc8939e5c57773049d85293073c07
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshll.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint64_t
+uqshll_imm(uint64_t value)
+{
+  return uqshll (value, 21);
+}
+
+/* { dg-final { scan-assembler "uqshll\\tr\[0-9\]+, r\[0-9\]+, #21" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/urshr.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/urshr.c
new file mode 100644
index 0000000000000000000000000000000000000000..1526270e778a7757e4fcfeec7d45b738806adbc8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/urshr.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint64_t
+urshr_imm (uint32_t longval3)
+{
+  return urshr (longval3, 21);
+}
+
+/* { dg-final { scan-assembler "urshr\\tr\[0-9\]+, #21" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/urshrl.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/urshrl.c
new file mode 100644
index 0000000000000000000000000000000000000000..032e6e31670c19ab2f14915dc0af0a63459dd156
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/urshrl.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint64_t
+urshrl_imm(uint64_t value)
+{
+  return urshrl (value, 21);
+}
+
+/* { dg-final { scan-assembler "urshrl\\tr\[0-9\]+, r\[0-9\]+, #21" } } */


[-- Attachment #2: diff36.patch --]
[-- Type: text/plain, Size: 24231 bytes --]

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 9e87dff264d6b535f64407f669c6e83b0ed639a6..31bff8511e368f8e789297818e9b0b9f885463ae 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -738,6 +738,26 @@ arm_strsbwbu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
       qualifier_unsigned, qualifier_unsigned};
 #define STRSBWBU_P_QUALIFIERS (arm_strsbwbu_p_qualifiers)
 
+static enum arm_type_qualifiers
+arm_lsll_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_none};
+#define LSLL_QUALIFIERS (arm_lsll_qualifiers)
+
+static enum arm_type_qualifiers
+arm_uqshl_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_const};
+#define UQSHL_QUALIFIERS (arm_uqshl_qualifiers)
+
+static enum arm_type_qualifiers
+arm_asrl_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none};
+#define ASRL_QUALIFIERS (arm_asrl_qualifiers)
+
+static enum arm_type_qualifiers
+arm_sqshl_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_const};
+#define SQSHL_QUALIFIERS (arm_sqshl_qualifiers)
+
 /* End of Qualifier for MVE builtins.  */
 
    /* void ([T element type] *, T, immediate).  */
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 9dcf8d692670cd8552fade9868bc051683553b91..2adae7f8b21f44aa3b80231b89bd68bcd0812611 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -2526,6 +2526,22 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vgetq_lane_u16(__a,  __idx) __arm_vgetq_lane_u16(__a,  __idx)
 #define vgetq_lane_u32(__a,  __idx) __arm_vgetq_lane_u32(__a,  __idx)
 #define vgetq_lane_u64(__a,  __idx) __arm_vgetq_lane_u64(__a,  __idx)
+#define sqrshr(__p0, __p1) __arm_sqrshr(__p0, __p1)
+#define sqrshrl(__p0, __p1) __arm_sqrshrl(__p0, __p1)
+#define sqrshrl_sat48(__p0, __p1) __arm_sqrshrl_sat48(__p0, __p1)
+#define sqshl(__p0, __p1) __arm_sqshl(__p0, __p1)
+#define sqshll(__p0, __p1) __arm_sqshll(__p0, __p1)
+#define srshr(__p0, __p1) __arm_srshr(__p0, __p1)
+#define srshrl(__p0, __p1) __arm_srshrl(__p0, __p1)
+#define uqrshl(__p0, __p1) __arm_uqrshl(__p0, __p1)
+#define uqrshll(__p0, __p1) __arm_uqrshll(__p0, __p1)
+#define uqrshll_sat48(__p0, __p1) __arm_uqrshll_sat48(__p0, __p1)
+#define uqshl(__p0, __p1) __arm_uqshl(__p0, __p1)
+#define uqshll(__p0, __p1) __arm_uqshll(__p0, __p1)
+#define urshr(__p0, __p1) __arm_urshr(__p0, __p1)
+#define urshrl(__p0, __p1) __arm_urshrl(__p0, __p1)
+#define lsll(__p0, __p1) __arm_lsll(__p0, __p1)
+#define asrl(__p0, __p1) __arm_asrl(__p0, __p1)
 #endif
 
 /* For big-endian, GCC's vector indices are reversed within each 64 bits
@@ -16539,6 +16555,118 @@ __arm_vgetq_lane_u64 (uint64x2_t __a, const int __idx)
   return __a[__ARM_LANEQ(__a,__idx)];
 }
 
+__extension__ extern __inline  uint64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_lsll (uint64_t value, int32_t shift)
+{
+  return (value << shift);
+}
+
+__extension__ extern __inline int64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_asrl (int64_t value, int32_t shift)
+{
+  return (value >> shift);
+}
+
+__extension__ extern __inline uint64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_uqrshll (uint64_t value, int32_t shift)
+{
+  return __builtin_mve_uqrshll_sat64_di (value, shift);
+}
+
+__extension__ extern __inline uint64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_uqrshll_sat48 (uint64_t value, int32_t shift)
+{
+  return __builtin_mve_uqrshll_sat48_di (value, shift);
+}
+
+__extension__ extern __inline int64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_sqrshrl (int64_t value, int32_t shift)
+{
+  return __builtin_mve_sqrshrl_sat64_di (value, shift);
+}
+
+__extension__ extern __inline int64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_sqrshrl_sat48 (int64_t value, int32_t shift)
+{
+  return __builtin_mve_sqrshrl_sat48_di (value, shift);
+}
+
+__extension__ extern __inline uint64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_uqshll (uint64_t value, const int shift)
+{
+  return __builtin_mve_uqshll_di (value, shift);
+}
+
+__extension__ extern __inline uint64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_urshrl (uint64_t value, const int shift)
+{
+  return __builtin_mve_urshrl_di (value, shift);
+}
+
+__extension__ extern __inline int64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_srshrl (int64_t value, const int shift)
+{
+  return __builtin_mve_srshrl_di (value, shift);
+}
+
+__extension__ extern __inline int64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_sqshll (int64_t value, const int shift)
+{
+  return __builtin_mve_sqshll_di (value, shift);
+}
+
+__extension__ extern __inline uint32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_uqrshl (uint32_t value, int32_t shift)
+{
+  return __builtin_mve_uqrshl_si (value, shift);
+}
+
+__extension__ extern __inline int32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_sqrshr (int32_t value, int32_t shift)
+{
+  return __builtin_mve_sqrshr_si (value, shift);
+}
+
+__extension__ extern __inline uint32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_uqshl (uint32_t value, const int shift)
+{
+  return  __builtin_mve_uqshl_si (value, shift);
+}
+
+__extension__ extern __inline uint32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_urshr (uint32_t value, const int shift)
+{
+  return __builtin_mve_urshr_si (value, shift);
+}
+
+__extension__ extern __inline int32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_sqshl (int32_t value, const int shift)
+{
+  return __builtin_mve_sqshl_si (value, shift);
+}
+
+__extension__ extern __inline int32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_srshr (int32_t value, const int shift)
+{
+  return __builtin_mve_srshr_si (value, shift);
+}
+
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index 638dcbc819034bf2c8428ff40f0e4d811763d80e..c23fe88ad05f8ee6c03127066fdf5afa593df944 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -876,3 +876,17 @@ VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE, vsbcq_m_u, v4si)
 VAR5 (STORE1, vst2q, v16qi, v8hi, v4si, v8hf, v4sf)
 VAR5 (LOAD1, vld4q, v16qi, v8hi, v4si, v8hf, v4sf)
 VAR5 (LOAD1, vld2q, v16qi, v8hi, v4si, v8hf, v4sf)
+VAR1 (ASRL, sqrshr_,si)
+VAR1 (ASRL, sqrshrl_sat64_,di)
+VAR1 (ASRL, sqrshrl_sat48_,di)
+VAR1 (LSLL, uqrshl_, si)
+VAR1 (LSLL, uqrshll_sat64_, di)
+VAR1 (LSLL, uqrshll_sat48_, di)
+VAR1 (SQSHL,srshr_,si)
+VAR1 (SQSHL,srshrl_,di)
+VAR1 (SQSHL,sqshl_,si)
+VAR1 (SQSHL,sqshll_,di)
+VAR1 (UQSHL, urshr_, si)
+VAR1 (UQSHL, urshrl_, di)
+VAR1 (UQSHL, uqshl_, si)
+VAR1 (UQSHL, uqshll_, di)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index b679511e42ce909cc9ef19e1cb790e8a5254d538..bbd54ad71d0b85fd4711007dd93af7a2788b2cf4 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -214,7 +214,9 @@
 			 VLDRDQGBWB_S VLDRDQGBWB_U VADCQ_U VADCQ_M_U VADCQ_S
 			 VADCQ_M_S VSBCIQ_U VSBCIQ_S VSBCIQ_M_U VSBCIQ_M_S
 			 VSBCQ_U VSBCQ_S VSBCQ_M_U VSBCQ_M_S VADCIQ_U VADCIQ_M_U
-			 VADCIQ_S VADCIQ_M_S VLD2Q VLD4Q VST2Q])
+			 VADCIQ_S VADCIQ_M_S VLD2Q VLD4Q VST2Q SRSHRL SRSHR
+			 URSHR URSHRL SQRSHR UQRSHL UQRSHLL_64
+			 UQRSHLL_48 SQRSHRL_64 SQRSHRL_48])
 
 (define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF") (V8HF "V8HI")
 			    (V4SF "V4SI")])
@@ -391,7 +393,8 @@
 		       (VSBCIQ_M_U "u") (VSBCIQ_S "s") (VSBCIQ_M_S "s")
 		       (VADCQ_U "u")  (VADCQ_M_U "u") (VADCQ_S "s")
 		       (VADCIQ_U "u") (VADCIQ_M_U "u") (VADCIQ_S "s")
-		       (VADCIQ_M_S "s")])
+		       (VADCIQ_M_S "s") (SQRSHRL_64 "64") (SQRSHRL_48 "48")
+		       (UQRSHLL_64 "64") (UQRSHLL_48 "48")])
 
 (define_int_attr mode1 [(VCTP8Q "8") (VCTP16Q "16") (VCTP32Q "32")
 			(VCTP64Q "64") (VCTP8Q_M "8") (VCTP16Q_M "16")
@@ -654,7 +657,8 @@
 (define_int_iterator VSBCIQ_M [VSBCIQ_M_U VSBCIQ_M_S])
 (define_int_iterator VADCQ [VADCQ_U VADCQ_S])
 (define_int_iterator VADCQ_M [VADCQ_M_U VADCQ_M_S])
-
+(define_int_iterator UQRSHLLQ [UQRSHLL_64 UQRSHLL_48])
+(define_int_iterator SQRSHRLQ [SQRSHRL_64 SQRSHRL_48])
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
@@ -10983,3 +10987,143 @@
    return "vmov\t%f0, %J1, %K1";
 }
  [(set_attr "type" "mve_move")])
+
+;;
+;; [uqrshll_di]
+;;
+(define_insn "mve_uqrshll_sat<supf>_di"
+  [(set (match_operand:DI 0 "arm_general_register_operand" "+r")
+	(unspec:DI [(match_operand:DI 1 "arm_general_register_operand" "r")
+		    (match_operand:SI 2 "s_register_operand" "r")]
+	 UQRSHLLQ))]
+  "TARGET_HAVE_MVE"
+  "uqrshll%?\\t%Q1, %R1, #<supf>, %2"
+  [(set_attr "predicable" "yes")])
+
+;;
+;; [sqrshrl_di]
+;;
+(define_insn "mve_sqrshrl_sat<supf>_di"
+  [(set (match_operand:DI 0 "arm_general_register_operand" "+r")
+	(unspec:DI [(match_operand:DI 1 "arm_general_register_operand" "r")
+		    (match_operand:SI 2 "s_register_operand" "r")]
+	 SQRSHRLQ))]
+  "TARGET_HAVE_MVE"
+  "sqrshrl%?\\t%Q1, %R1, #<supf>, %2"
+  [(set_attr "predicable" "yes")])
+
+;;
+;; [uqrshl_si]
+;;
+(define_insn "mve_uqrshl_si"
+  [(set (match_operand:SI 0 "arm_general_register_operand" "+r")
+	(unspec:SI [(match_operand:SI 1 "arm_general_register_operand" "r")
+		    (match_operand:SI 2 "s_register_operand" "r")]
+	 UQRSHL))]
+  "TARGET_HAVE_MVE"
+  "uqrshl%?\\t%1, %2"
+  [(set_attr "predicable" "yes")])
+
+;;
+;; [sqrshr_si]
+;;
+(define_insn "mve_sqrshr_si"
+  [(set (match_operand:SI 0 "arm_general_register_operand" "+r")
+	(unspec:SI [(match_operand:SI 1 "arm_general_register_operand" "r")
+		    (match_operand:SI 2 "s_register_operand" "r")]
+	 SQRSHR))]
+  "TARGET_HAVE_MVE"
+  "sqrshr%?\\t%1, %2"
+  [(set_attr "predicable" "yes")])
+
+;;
+;; [uqshll_di]
+;;
+(define_insn "mve_uqshll_di"
+  [(set (match_operand:DI 0 "arm_general_register_operand" "+r")
+	(us_ashift:DI (match_operand:DI 1 "arm_general_register_operand" "r")
+		      (match_operand:SI 2 "arm_reg_or_long_shift_imm" "rPg")))]
+  "TARGET_HAVE_MVE"
+  "uqshll%?\\t%Q1, %R1, %2"
+  [(set_attr "predicable" "yes")])
+
+;;
+;; [urshrl_di]
+;;
+(define_insn "mve_urshrl_di"
+  [(set (match_operand:DI 0 "arm_general_register_operand" "+r")
+	(unspec:DI [(match_operand:DI 1 "arm_general_register_operand" "r")
+		    (match_operand:SI 2 "arm_reg_or_long_shift_imm" "rPg")]
+	 URSHRL))]
+  "TARGET_HAVE_MVE"
+  "urshrl%?\\t%Q1, %R1, %2"
+  [(set_attr "predicable" "yes")])
+
+;;
+;; [uqshl_si]
+;;
+(define_insn "mve_uqshl_si"
+  [(set (match_operand:SI 0 "arm_general_register_operand" "+r")
+	(us_ashift:SI (match_operand:SI 1 "arm_general_register_operand" "r")
+		      (match_operand:SI 2 "arm_reg_or_long_shift_imm" "rPg")))]
+  "TARGET_HAVE_MVE"
+  "uqshl%?\\t%1, %2"
+  [(set_attr "predicable" "yes")])
+
+;;
+;; [urshr_si]
+;;
+(define_insn "mve_urshr_si"
+  [(set (match_operand:SI 0 "arm_general_register_operand" "+r")
+	(unspec:SI [(match_operand:SI 1 "arm_general_register_operand" "r")
+		    (match_operand:SI 2 "arm_reg_or_long_shift_imm" "rPg")]
+	 URSHR))]
+  "TARGET_HAVE_MVE"
+  "urshr%?\\t%1, %2"
+  [(set_attr "predicable" "yes")])
+
+;;
+;; [sqshl_si]
+;;
+(define_insn "mve_sqshl_si"
+  [(set (match_operand:SI 0 "arm_general_register_operand" "+r")
+	(ss_ashift:SI (match_operand:DI 1 "arm_general_register_operand" "r")
+		      (match_operand:SI 2 "arm_reg_or_long_shift_imm" "rPg")))]
+  "TARGET_HAVE_MVE"
+  "sqshl%?\\t%1, %2"
+  [(set_attr "predicable" "yes")])
+
+;;
+;; [srshr_si]
+;;
+(define_insn "mve_srshr_si"
+  [(set (match_operand:SI 0 "arm_general_register_operand" "+r")
+	(unspec:SI [(match_operand:DI 1 "arm_general_register_operand" "r")
+		    (match_operand:SI 2 "arm_reg_or_long_shift_imm" "rPg")]
+	 SRSHR))]
+  "TARGET_HAVE_MVE"
+  "srshr%?\\t%1, %2"
+  [(set_attr "predicable" "yes")])
+
+;;
+;; [srshrl_di]
+;;
+(define_insn "mve_srshrl_di"
+  [(set (match_operand:DI 0 "arm_general_register_operand" "+r")
+	(unspec:DI [(match_operand:DI 1 "arm_general_register_operand" "r")
+		    (match_operand:SI 2 "arm_reg_or_long_shift_imm" "rPg")]
+	 SRSHRL))]
+  "TARGET_HAVE_MVE"
+  "srshrl%?\\t%Q1, %R1, %2"
+  [(set_attr "predicable" "yes")])
+
+;;
+;; [sqshll_di]
+;;
+(define_insn "mve_sqshll_di"
+  [(set (match_operand:DI 0 "arm_general_register_operand" "+r")
+	(ss_ashift:DI (match_operand:DI 1 "arm_general_register_operand" "r")
+		      (match_operand:SI 2 "arm_reg_or_long_shift_imm" "rPg")))]
+  "TARGET_HAVE_MVE"
+  "sqshll%?\\t%Q1, %R1, %2"
+  [(set_attr "predicable" "yes")])
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/asrl.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/asrl.c
new file mode 100644
index 0000000000000000000000000000000000000000..ead9c51003f1a309fe2afb619a582bd42ee61ecb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/asrl.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int64_t
+asrl_reg (int64_t longval3, int32_t x)
+{
+  return asrl (longval3, x);
+}
+
+/* { dg-final { scan-assembler "asrl\\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/lsll.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/lsll.c
new file mode 100644
index 0000000000000000000000000000000000000000..fac4a41f4f779e15f50b84f29e0c87558a27d30c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/lsll.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint64_t
+lsll_reg (uint64_t longval3, int32_t x)
+{
+  return lsll (longval3, x);
+}
+
+/* { dg-final { scan-assembler "lsll\\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/sqrshr.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/sqrshr.c
new file mode 100644
index 0000000000000000000000000000000000000000..def379d59c9ad99add67ea5976eb285588a49f18
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/sqrshr.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32_t
+sqrshr_reg (int32_t longval3, int32_t x)
+{
+  return sqrshr (longval3, x);
+}
+
+/* { dg-final { scan-assembler "sqrshr\\tr\[0-9\]+, r\[0-9\]+" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/sqrshrl_sat48.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/sqrshrl_sat48.c
new file mode 100644
index 0000000000000000000000000000000000000000..0a8606747c45aec190dbde705b5909dc2c4a1205
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/sqrshrl_sat48.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int64_t
+sqrshrl_reg (int64_t longval3, int32_t x)
+{
+  return sqrshrl_sat48 (longval3, x);
+}
+
+/* { dg-final { scan-assembler "sqrshrl\\tr\[0-9\]+, r\[0-9\]+, #48, r\[0-9\]+" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/sqrshrl_sat64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/sqrshrl_sat64.c
new file mode 100644
index 0000000000000000000000000000000000000000..32d52496343311ac1ec6036cbef81c9c6266d4a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/sqrshrl_sat64.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int64_t
+sqrshrl_reg (int64_t longval3, int32_t x)
+{
+  return sqrshrl (longval3, x);
+}
+
+/* { dg-final { scan-assembler "sqrshrl\\tr\[0-9\]+, r\[0-9\]+, #64, r\[0-9\]+" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/sqshl.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/sqshl.c
new file mode 100644
index 0000000000000000000000000000000000000000..a546ff51681e5615f7961db3f1ee59b526cd7688
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/sqshl.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32_t
+sqshl_imm (int32_t longval3)
+{
+  return sqshl (longval3, 25);
+}
+
+/* { dg-final { scan-assembler "sqshl\\tr\[0-9\]+, #25" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/sqshll.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/sqshll.c
new file mode 100644
index 0000000000000000000000000000000000000000..8784b70155ae6193be8a9700dd6a0b4a34bf233b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/sqshll.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int64_t
+sqshll_imm(int64_t value)
+{
+  return sqshll (value, 21);
+}
+
+/* { dg-final { scan-assembler "sqshll\\tr\[0-9\]+, r\[0-9\]+, #21" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c
new file mode 100644
index 0000000000000000000000000000000000000000..d48d65a1f05a3ea679129bac92ea8cf55f44bff5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32_t
+srshr_imm (int32_t longval3)
+{
+  return srshr (longval3, 25);
+}
+
+/* { dg-final { scan-assembler "srshr\\tr\[0-9\]+, #25" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c
new file mode 100644
index 0000000000000000000000000000000000000000..260285b28865337eb7e589a25a00898f8b636956
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int64_t
+srshrl_imm(int64_t value)
+{
+  return srshrl (value, 21);
+}
+
+/* { dg-final { scan-assembler "srshrl\\tr\[0-9\]+, r\[0-9\]+, #21" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqrshl.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqrshl.c
new file mode 100644
index 0000000000000000000000000000000000000000..e23e644ec948a7af3341c0743569a1db8572e652
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqrshl.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32_t
+uqrshl_reg (uint32_t longval3, int32_t x)
+{
+  return uqrshl (longval3, x);
+}
+
+/* { dg-final { scan-assembler "uqrshl\\tr\[0-9\]+, r\[0-9\]+" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqrshll_sat48.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqrshll_sat48.c
new file mode 100644
index 0000000000000000000000000000000000000000..ed7a550b92bff6097d831654e7601a6d8aee6153
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqrshll_sat48.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint64_t
+uqrshll_reg (uint64_t longval3, int32_t x)
+{
+  return uqrshll_sat48 (longval3, x);
+}
+
+/* { dg-final { scan-assembler "uqrshll\\tr\[0-9\]+, r\[0-9\]+, #48, r\[0-9\]+" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqrshll_sat64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqrshll_sat64.c
new file mode 100644
index 0000000000000000000000000000000000000000..1d75ea1f878db12e8b5e92c886c7524ceebb0284
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqrshll_sat64.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint64_t
+uqrshll_reg (uint64_t longval3, int32_t x)
+{
+  return uqrshll (longval3, x);
+}
+
+/* { dg-final { scan-assembler "uqrshll\\tr\[0-9\]+, r\[0-9\]+, #64, r\[0-9\]+" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c
new file mode 100644
index 0000000000000000000000000000000000000000..92e8748e5ec5d3426523b89c57e6b698aa4b351c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32_t
+uqshl_imm (uint32_t longval3)
+{
+  return uqshl (longval3, 21);
+}
+
+/* { dg-final { scan-assembler "uqshl\\tr\[0-9\]+, #21" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshll.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshll.c
new file mode 100644
index 0000000000000000000000000000000000000000..e4416e15144cc8939e5c57773049d85293073c07
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshll.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint64_t
+uqshll_imm(uint64_t value)
+{
+  return uqshll (value, 21);
+}
+
+/* { dg-final { scan-assembler "uqshll\\tr\[0-9\]+, r\[0-9\]+, #21" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/urshr.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/urshr.c
new file mode 100644
index 0000000000000000000000000000000000000000..1526270e778a7757e4fcfeec7d45b738806adbc8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/urshr.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint64_t
+urshr_imm (uint32_t longval3)
+{
+  return urshr (longval3, 21);
+}
+
+/* { dg-final { scan-assembler "urshr\\tr\[0-9\]+, #21" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/urshrl.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/urshrl.c
new file mode 100644
index 0000000000000000000000000000000000000000..032e6e31670c19ab2f14915dc0af0a63459dd156
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/urshrl.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint64_t
+urshrl_imm(uint64_t value)
+{
+  return urshrl (value, 21);
+}
+
+/* { dg-final { scan-assembler "urshrl\\tr\[0-9\]+, r\[0-9\]+, #21" } } */


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][7x]: MVE vreinterpretq and vuninitializedq intrinsics.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (21 preceding siblings ...)
  2019-11-14 19:16 ` [PATCH][ARM][GCC][2/8x]: MVE ACLE gather load and scatter store intrinsics with writeback Srinath Parvathaneni
@ 2019-11-14 19:16 ` Srinath Parvathaneni
  2019-11-14 19:16 ` [PATCH][ARM][GCC][14x]: MVE ACLE whole vector left shift with carry intrinsics Srinath Parvathaneni
                   ` (15 subsequent siblings)
  38 siblings, 0 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 85476 bytes --]

Hello,

This patch supports following MVE ACLE intrinsics.

vreinterpretq_s16_s32, vreinterpretq_s16_s64, vreinterpretq_s16_s8, vreinterpretq_s16_u16,
vreinterpretq_s16_u32, vreinterpretq_s16_u64, vreinterpretq_s16_u8, vreinterpretq_s32_s16,
vreinterpretq_s32_s64, vreinterpretq_s32_s8, vreinterpretq_s32_u16, vreinterpretq_s32_u32,
vreinterpretq_s32_u64, vreinterpretq_s32_u8, vreinterpretq_s64_s16, vreinterpretq_s64_s32,
vreinterpretq_s64_s8, vreinterpretq_s64_u16, vreinterpretq_s64_u32, vreinterpretq_s64_u64,
vreinterpretq_s64_u8, vreinterpretq_s8_s16, vreinterpretq_s8_s32, vreinterpretq_s8_s64,
vreinterpretq_s8_u16, vreinterpretq_s8_u32, vreinterpretq_s8_u64, vreinterpretq_s8_u8,
vreinterpretq_u16_s16, vreinterpretq_u16_s32, vreinterpretq_u16_s64, vreinterpretq_u16_s8,
vreinterpretq_u16_u32, vreinterpretq_u16_u64, vreinterpretq_u16_u8, vreinterpretq_u32_s16,
vreinterpretq_u32_s32, vreinterpretq_u32_s64, vreinterpretq_u32_s8, vreinterpretq_u32_u16,
vreinterpretq_u32_u64, vreinterpretq_u32_u8, vreinterpretq_u64_s16, vreinterpretq_u64_s32,
vreinterpretq_u64_s64, vreinterpretq_u64_s8, vreinterpretq_u64_u16, vreinterpretq_u64_u32,
vreinterpretq_u64_u8, vreinterpretq_u8_s16, vreinterpretq_u8_s32, vreinterpretq_u8_s64,
vreinterpretq_u8_s8, vreinterpretq_u8_u16, vreinterpretq_u8_u32, vreinterpretq_u8_u64,
vreinterpretq_s32_f16, vreinterpretq_s32_f32, vreinterpretq_u16_f16, vreinterpretq_u16_f32,
vreinterpretq_u32_f16, vreinterpretq_u32_f32, vreinterpretq_u64_f16, vreinterpretq_u64_f32,
vreinterpretq_u8_f16, vreinterpretq_u8_f32, vreinterpretq_f16_f32, vreinterpretq_f16_s16,
vreinterpretq_f16_s32, vreinterpretq_f16_s64, vreinterpretq_f16_s8, vreinterpretq_f16_u16,
vreinterpretq_f16_u32, vreinterpretq_f16_u64, vreinterpretq_f16_u8, vreinterpretq_f32_f16,
vreinterpretq_f32_s16, vreinterpretq_f32_s32, vreinterpretq_f32_s64, vreinterpretq_f32_s8,
vreinterpretq_f32_u16, vreinterpretq_f32_u32, vreinterpretq_f32_u64, vreinterpretq_f32_u8,
vreinterpretq_s16_f16, vreinterpretq_s16_f32, vreinterpretq_s64_f16, vreinterpretq_s64_f32,
vreinterpretq_s8_f16, vreinterpretq_s8_f32, vuninitializedq_u8, vuninitializedq_u16,
vuninitializedq_u32, vuninitializedq_u64, vuninitializedq_s8, vuninitializedq_s16,
vuninitializedq_s32, vuninitializedq_s64, vuninitializedq_f16, vuninitializedq_f32 and
vuninitializedq.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-14  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm_mve.h (vreinterpretq_s16_s32): Define macro.
	(vreinterpretq_s16_s64): Likewise.
	(vreinterpretq_s16_s8): Likewise.
	(vreinterpretq_s16_u16): Likewise.
	(vreinterpretq_s16_u32): Likewise.
	(vreinterpretq_s16_u64): Likewise.
	(vreinterpretq_s16_u8): Likewise.
	(vreinterpretq_s32_s16): Likewise.
	(vreinterpretq_s32_s64): Likewise.
	(vreinterpretq_s32_s8): Likewise.
	(vreinterpretq_s32_u16): Likewise.
	(vreinterpretq_s32_u32): Likewise.
	(vreinterpretq_s32_u64): Likewise.
	(vreinterpretq_s32_u8): Likewise.
	(vreinterpretq_s64_s16): Likewise.
	(vreinterpretq_s64_s32): Likewise.
	(vreinterpretq_s64_s8): Likewise.
	(vreinterpretq_s64_u16): Likewise.
	(vreinterpretq_s64_u32): Likewise.
	(vreinterpretq_s64_u64): Likewise.
	(vreinterpretq_s64_u8): Likewise.
	(vreinterpretq_s8_s16): Likewise.
	(vreinterpretq_s8_s32): Likewise.
	(vreinterpretq_s8_s64): Likewise.
	(vreinterpretq_s8_u16): Likewise.
	(vreinterpretq_s8_u32): Likewise.
	(vreinterpretq_s8_u64): Likewise.
	(vreinterpretq_s8_u8): Likewise.
	(vreinterpretq_u16_s16): Likewise.
	(vreinterpretq_u16_s32): Likewise.
	(vreinterpretq_u16_s64): Likewise.
	(vreinterpretq_u16_s8): Likewise.
	(vreinterpretq_u16_u32): Likewise.
	(vreinterpretq_u16_u64): Likewise.
	(vreinterpretq_u16_u8): Likewise.
	(vreinterpretq_u32_s16): Likewise.
	(vreinterpretq_u32_s32): Likewise.
	(vreinterpretq_u32_s64): Likewise.
	(vreinterpretq_u32_s8): Likewise.
	(vreinterpretq_u32_u16): Likewise.
	(vreinterpretq_u32_u64): Likewise.
	(vreinterpretq_u32_u8): Likewise.
	(vreinterpretq_u64_s16): Likewise.
	(vreinterpretq_u64_s32): Likewise.
	(vreinterpretq_u64_s64): Likewise.
	(vreinterpretq_u64_s8): Likewise.
	(vreinterpretq_u64_u16): Likewise.
	(vreinterpretq_u64_u32): Likewise.
	(vreinterpretq_u64_u8): Likewise.
	(vreinterpretq_u8_s16): Likewise.
	(vreinterpretq_u8_s32): Likewise.
	(vreinterpretq_u8_s64): Likewise.
	(vreinterpretq_u8_s8): Likewise.
	(vreinterpretq_u8_u16): Likewise.
	(vreinterpretq_u8_u32): Likewise.
	(vreinterpretq_u8_u64): Likewise.
	(vreinterpretq_s32_f16): Likewise.
	(vreinterpretq_s32_f32): Likewise.
	(vreinterpretq_u16_f16): Likewise.
	(vreinterpretq_u16_f32): Likewise.
	(vreinterpretq_u32_f16): Likewise.
	(vreinterpretq_u32_f32): Likewise.
	(vreinterpretq_u64_f16): Likewise.
	(vreinterpretq_u64_f32): Likewise.
	(vreinterpretq_u8_f16): Likewise.
	(vreinterpretq_u8_f32): Likewise.
	(vreinterpretq_f16_f32): Likewise.
	(vreinterpretq_f16_s16): Likewise.
	(vreinterpretq_f16_s32): Likewise.
	(vreinterpretq_f16_s64): Likewise.
	(vreinterpretq_f16_s8): Likewise.
	(vreinterpretq_f16_u16): Likewise.
	(vreinterpretq_f16_u32): Likewise.
	(vreinterpretq_f16_u64): Likewise.
	(vreinterpretq_f16_u8): Likewise.
	(vreinterpretq_f32_f16): Likewise.
	(vreinterpretq_f32_s16): Likewise.
	(vreinterpretq_f32_s32): Likewise.
	(vreinterpretq_f32_s64): Likewise.
	(vreinterpretq_f32_s8): Likewise.
	(vreinterpretq_f32_u16): Likewise.
	(vreinterpretq_f32_u32): Likewise.
	(vreinterpretq_f32_u64): Likewise.
	(vreinterpretq_f32_u8): Likewise.
	(vreinterpretq_s16_f16): Likewise.
	(vreinterpretq_s16_f32): Likewise.
	(vreinterpretq_s64_f16): Likewise.
	(vreinterpretq_s64_f32): Likewise.
	(vreinterpretq_s8_f16): Likewise.
	(vreinterpretq_s8_f32): Likewise.
	(vuninitializedq_u8): Likewise.
	(vuninitializedq_u16): Likewise.
	(vuninitializedq_u32): Likewise.
	(vuninitializedq_u64): Likewise.
	(vuninitializedq_s8): Likewise.
	(vuninitializedq_s16): Likewise.
	(vuninitializedq_s32): Likewise.
	(vuninitializedq_s64): Likewise.
	(vuninitializedq_f16): Likewise.
	(vuninitializedq_f32): Likewise.
	(__arm_vuninitializedq_u8): Define intrinsic.
	(__arm_vuninitializedq_u16): Likewise.
	(__arm_vuninitializedq_u32): Likewise.
	(__arm_vuninitializedq_u64): Likewise.
	(__arm_vuninitializedq_s8): Likewise.
	(__arm_vuninitializedq_s16): Likewise.
	(__arm_vuninitializedq_s32): Likewise.
	(__arm_vuninitializedq_s64): Likewise.
	(__arm_vreinterpretq_s16_s32): Likewise.
	(__arm_vreinterpretq_s16_s64): Likewise.
	(__arm_vreinterpretq_s16_s8): Likewise.
	(__arm_vreinterpretq_s16_u16): Likewise.
	(__arm_vreinterpretq_s16_u32): Likewise.
	(__arm_vreinterpretq_s16_u64): Likewise.
	(__arm_vreinterpretq_s16_u8): Likewise.
	(__arm_vreinterpretq_s32_s16): Likewise.
	(__arm_vreinterpretq_s32_s64): Likewise.
	(__arm_vreinterpretq_s32_s8): Likewise.
	(__arm_vreinterpretq_s32_u16): Likewise.
	(__arm_vreinterpretq_s32_u32): Likewise.
	(__arm_vreinterpretq_s32_u64): Likewise.
	(__arm_vreinterpretq_s32_u8): Likewise.
	(__arm_vreinterpretq_s64_s16): Likewise.
	(__arm_vreinterpretq_s64_s32): Likewise.
	(__arm_vreinterpretq_s64_s8): Likewise.
	(__arm_vreinterpretq_s64_u16): Likewise.
	(__arm_vreinterpretq_s64_u32): Likewise.
	(__arm_vreinterpretq_s64_u64): Likewise.
	(__arm_vreinterpretq_s64_u8): Likewise.
	(__arm_vreinterpretq_s8_s16): Likewise.
	(__arm_vreinterpretq_s8_s32): Likewise.
	(__arm_vreinterpretq_s8_s64): Likewise.
	(__arm_vreinterpretq_s8_u16): Likewise.
	(__arm_vreinterpretq_s8_u32): Likewise.
	(__arm_vreinterpretq_s8_u64): Likewise.
	(__arm_vreinterpretq_s8_u8): Likewise.
	(__arm_vreinterpretq_u16_s16): Likewise.
	(__arm_vreinterpretq_u16_s32): Likewise.
	(__arm_vreinterpretq_u16_s64): Likewise.
	(__arm_vreinterpretq_u16_s8): Likewise.
	(__arm_vreinterpretq_u16_u32): Likewise.
	(__arm_vreinterpretq_u16_u64): Likewise.
	(__arm_vreinterpretq_u16_u8): Likewise.
	(__arm_vreinterpretq_u32_s16): Likewise.
	(__arm_vreinterpretq_u32_s32): Likewise.
	(__arm_vreinterpretq_u32_s64): Likewise.
	(__arm_vreinterpretq_u32_s8): Likewise.
	(__arm_vreinterpretq_u32_u16): Likewise.
	(__arm_vreinterpretq_u32_u64): Likewise.
	(__arm_vreinterpretq_u32_u8): Likewise.
	(__arm_vreinterpretq_u64_s16): Likewise.
	(__arm_vreinterpretq_u64_s32): Likewise.
	(__arm_vreinterpretq_u64_s64): Likewise.
	(__arm_vreinterpretq_u64_s8): Likewise.
	(__arm_vreinterpretq_u64_u16): Likewise.
	(__arm_vreinterpretq_u64_u32): Likewise.
	(__arm_vreinterpretq_u64_u8): Likewise.
	(__arm_vreinterpretq_u8_s16): Likewise.
	(__arm_vreinterpretq_u8_s32): Likewise.
	(__arm_vreinterpretq_u8_s64): Likewise.
	(__arm_vreinterpretq_u8_s8): Likewise.
	(__arm_vreinterpretq_u8_u16): Likewise.
	(__arm_vreinterpretq_u8_u32): Likewise.
	(__arm_vreinterpretq_u8_u64): Likewise.
	(__arm_vuninitializedq_f16): Likewise.
	(__arm_vuninitializedq_f32): Likewise.
	(__arm_vreinterpretq_s32_f16): Likewise.
	(__arm_vreinterpretq_s32_f32): Likewise.
	(__arm_vreinterpretq_s16_f16): Likewise.
	(__arm_vreinterpretq_s16_f32): Likewise.
	(__arm_vreinterpretq_s64_f16): Likewise.
	(__arm_vreinterpretq_s64_f32): Likewise.
	(__arm_vreinterpretq_s8_f16): Likewise.
	(__arm_vreinterpretq_s8_f32): Likewise.
	(__arm_vreinterpretq_u16_f16): Likewise.
	(__arm_vreinterpretq_u16_f32): Likewise.
	(__arm_vreinterpretq_u32_f16): Likewise.
	(__arm_vreinterpretq_u32_f32): Likewise.
	(__arm_vreinterpretq_u64_f16): Likewise.
	(__arm_vreinterpretq_u64_f32): Likewise.
	(__arm_vreinterpretq_u8_f16): Likewise.
	(__arm_vreinterpretq_u8_f32): Likewise.
	(__arm_vreinterpretq_f16_f32): Likewise.
	(__arm_vreinterpretq_f16_s16): Likewise.
	(__arm_vreinterpretq_f16_s32): Likewise.
	(__arm_vreinterpretq_f16_s64): Likewise.
	(__arm_vreinterpretq_f16_s8): Likewise.
	(__arm_vreinterpretq_f16_u16): Likewise.
	(__arm_vreinterpretq_f16_u32): Likewise.
	(__arm_vreinterpretq_f16_u64): Likewise.
	(__arm_vreinterpretq_f16_u8): Likewise.
	(__arm_vreinterpretq_f32_f16): Likewise.
	(__arm_vreinterpretq_f32_s16): Likewise.
	(__arm_vreinterpretq_f32_s32): Likewise.
	(__arm_vreinterpretq_f32_s64): Likewise.
	(__arm_vreinterpretq_f32_s8): Likewise.
	(__arm_vreinterpretq_f32_u16): Likewise.
	(__arm_vreinterpretq_f32_u32): Likewise.
	(__arm_vreinterpretq_f32_u64): Likewise.
	(__arm_vreinterpretq_f32_u8): Likewise.
	(vuninitializedq): Define polymorphic variant.
	(vreinterpretq_f16): Likewise.
	(vreinterpretq_f32): Likewise.
	(vreinterpretq_s16): Likewise.
	(vreinterpretq_s32): Likewise.
	(vreinterpretq_s64): Likewise.
	(vreinterpretq_s8): Likewise.
	(vreinterpretq_u16): Likewise.
	(vreinterpretq_u32): Likewise.
	(vreinterpretq_u64): Likewise.
	(vreinterpretq_u8): Likewise.

gcc/testsuite/ChangeLog:

2019-11-14  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vuninitializedq_float.c: New test.
	* gcc.target/arm/mve/intrinsics/vuninitializedq_float1.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vuninitializedq_int.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vuninitializedq_float.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vuninitializedq_float1.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vuninitializedq_int.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c: Likewise.



###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 89456589c9dcdff5b56e8707dd720fb151466661..810a99c83c07528beb0f9650708209e20e49df01 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -1906,6 +1906,106 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vaddq_u32(__a, __b) __arm_vaddq_u32(__a, __b)
 #define vaddq_f16(__a, __b) __arm_vaddq_f16(__a, __b)
 #define vaddq_f32(__a, __b) __arm_vaddq_f32(__a, __b)
+#define vreinterpretq_s16_s32(__a) __arm_vreinterpretq_s16_s32(__a)
+#define vreinterpretq_s16_s64(__a) __arm_vreinterpretq_s16_s64(__a)
+#define vreinterpretq_s16_s8(__a) __arm_vreinterpretq_s16_s8(__a)
+#define vreinterpretq_s16_u16(__a) __arm_vreinterpretq_s16_u16(__a)
+#define vreinterpretq_s16_u32(__a) __arm_vreinterpretq_s16_u32(__a)
+#define vreinterpretq_s16_u64(__a) __arm_vreinterpretq_s16_u64(__a)
+#define vreinterpretq_s16_u8(__a) __arm_vreinterpretq_s16_u8(__a)
+#define vreinterpretq_s32_s16(__a) __arm_vreinterpretq_s32_s16(__a)
+#define vreinterpretq_s32_s64(__a) __arm_vreinterpretq_s32_s64(__a)
+#define vreinterpretq_s32_s8(__a) __arm_vreinterpretq_s32_s8(__a)
+#define vreinterpretq_s32_u16(__a) __arm_vreinterpretq_s32_u16(__a)
+#define vreinterpretq_s32_u32(__a) __arm_vreinterpretq_s32_u32(__a)
+#define vreinterpretq_s32_u64(__a) __arm_vreinterpretq_s32_u64(__a)
+#define vreinterpretq_s32_u8(__a) __arm_vreinterpretq_s32_u8(__a)
+#define vreinterpretq_s64_s16(__a) __arm_vreinterpretq_s64_s16(__a)
+#define vreinterpretq_s64_s32(__a) __arm_vreinterpretq_s64_s32(__a)
+#define vreinterpretq_s64_s8(__a) __arm_vreinterpretq_s64_s8(__a)
+#define vreinterpretq_s64_u16(__a) __arm_vreinterpretq_s64_u16(__a)
+#define vreinterpretq_s64_u32(__a) __arm_vreinterpretq_s64_u32(__a)
+#define vreinterpretq_s64_u64(__a) __arm_vreinterpretq_s64_u64(__a)
+#define vreinterpretq_s64_u8(__a) __arm_vreinterpretq_s64_u8(__a)
+#define vreinterpretq_s8_s16(__a) __arm_vreinterpretq_s8_s16(__a)
+#define vreinterpretq_s8_s32(__a) __arm_vreinterpretq_s8_s32(__a)
+#define vreinterpretq_s8_s64(__a) __arm_vreinterpretq_s8_s64(__a)
+#define vreinterpretq_s8_u16(__a) __arm_vreinterpretq_s8_u16(__a)
+#define vreinterpretq_s8_u32(__a) __arm_vreinterpretq_s8_u32(__a)
+#define vreinterpretq_s8_u64(__a) __arm_vreinterpretq_s8_u64(__a)
+#define vreinterpretq_s8_u8(__a) __arm_vreinterpretq_s8_u8(__a)
+#define vreinterpretq_u16_s16(__a) __arm_vreinterpretq_u16_s16(__a)
+#define vreinterpretq_u16_s32(__a) __arm_vreinterpretq_u16_s32(__a)
+#define vreinterpretq_u16_s64(__a) __arm_vreinterpretq_u16_s64(__a)
+#define vreinterpretq_u16_s8(__a) __arm_vreinterpretq_u16_s8(__a)
+#define vreinterpretq_u16_u32(__a) __arm_vreinterpretq_u16_u32(__a)
+#define vreinterpretq_u16_u64(__a) __arm_vreinterpretq_u16_u64(__a)
+#define vreinterpretq_u16_u8(__a) __arm_vreinterpretq_u16_u8(__a)
+#define vreinterpretq_u32_s16(__a) __arm_vreinterpretq_u32_s16(__a)
+#define vreinterpretq_u32_s32(__a) __arm_vreinterpretq_u32_s32(__a)
+#define vreinterpretq_u32_s64(__a) __arm_vreinterpretq_u32_s64(__a)
+#define vreinterpretq_u32_s8(__a) __arm_vreinterpretq_u32_s8(__a)
+#define vreinterpretq_u32_u16(__a) __arm_vreinterpretq_u32_u16(__a)
+#define vreinterpretq_u32_u64(__a) __arm_vreinterpretq_u32_u64(__a)
+#define vreinterpretq_u32_u8(__a) __arm_vreinterpretq_u32_u8(__a)
+#define vreinterpretq_u64_s16(__a) __arm_vreinterpretq_u64_s16(__a)
+#define vreinterpretq_u64_s32(__a) __arm_vreinterpretq_u64_s32(__a)
+#define vreinterpretq_u64_s64(__a) __arm_vreinterpretq_u64_s64(__a)
+#define vreinterpretq_u64_s8(__a) __arm_vreinterpretq_u64_s8(__a)
+#define vreinterpretq_u64_u16(__a) __arm_vreinterpretq_u64_u16(__a)
+#define vreinterpretq_u64_u32(__a) __arm_vreinterpretq_u64_u32(__a)
+#define vreinterpretq_u64_u8(__a) __arm_vreinterpretq_u64_u8(__a)
+#define vreinterpretq_u8_s16(__a) __arm_vreinterpretq_u8_s16(__a)
+#define vreinterpretq_u8_s32(__a) __arm_vreinterpretq_u8_s32(__a)
+#define vreinterpretq_u8_s64(__a) __arm_vreinterpretq_u8_s64(__a)
+#define vreinterpretq_u8_s8(__a) __arm_vreinterpretq_u8_s8(__a)
+#define vreinterpretq_u8_u16(__a) __arm_vreinterpretq_u8_u16(__a)
+#define vreinterpretq_u8_u32(__a) __arm_vreinterpretq_u8_u32(__a)
+#define vreinterpretq_u8_u64(__a) __arm_vreinterpretq_u8_u64(__a)
+#define vreinterpretq_s32_f16(__a) __arm_vreinterpretq_s32_f16(__a)
+#define vreinterpretq_s32_f32(__a) __arm_vreinterpretq_s32_f32(__a)
+#define vreinterpretq_u16_f16(__a) __arm_vreinterpretq_u16_f16(__a)
+#define vreinterpretq_u16_f32(__a) __arm_vreinterpretq_u16_f32(__a)
+#define vreinterpretq_u32_f16(__a) __arm_vreinterpretq_u32_f16(__a)
+#define vreinterpretq_u32_f32(__a) __arm_vreinterpretq_u32_f32(__a)
+#define vreinterpretq_u64_f16(__a) __arm_vreinterpretq_u64_f16(__a)
+#define vreinterpretq_u64_f32(__a) __arm_vreinterpretq_u64_f32(__a)
+#define vreinterpretq_u8_f16(__a) __arm_vreinterpretq_u8_f16(__a)
+#define vreinterpretq_u8_f32(__a) __arm_vreinterpretq_u8_f32(__a)
+#define vreinterpretq_f16_f32(__a) __arm_vreinterpretq_f16_f32(__a)
+#define vreinterpretq_f16_s16(__a) __arm_vreinterpretq_f16_s16(__a)
+#define vreinterpretq_f16_s32(__a) __arm_vreinterpretq_f16_s32(__a)
+#define vreinterpretq_f16_s64(__a) __arm_vreinterpretq_f16_s64(__a)
+#define vreinterpretq_f16_s8(__a) __arm_vreinterpretq_f16_s8(__a)
+#define vreinterpretq_f16_u16(__a) __arm_vreinterpretq_f16_u16(__a)
+#define vreinterpretq_f16_u32(__a) __arm_vreinterpretq_f16_u32(__a)
+#define vreinterpretq_f16_u64(__a) __arm_vreinterpretq_f16_u64(__a)
+#define vreinterpretq_f16_u8(__a) __arm_vreinterpretq_f16_u8(__a)
+#define vreinterpretq_f32_f16(__a) __arm_vreinterpretq_f32_f16(__a)
+#define vreinterpretq_f32_s16(__a) __arm_vreinterpretq_f32_s16(__a)
+#define vreinterpretq_f32_s32(__a) __arm_vreinterpretq_f32_s32(__a)
+#define vreinterpretq_f32_s64(__a) __arm_vreinterpretq_f32_s64(__a)
+#define vreinterpretq_f32_s8(__a) __arm_vreinterpretq_f32_s8(__a)
+#define vreinterpretq_f32_u16(__a) __arm_vreinterpretq_f32_u16(__a)
+#define vreinterpretq_f32_u32(__a) __arm_vreinterpretq_f32_u32(__a)
+#define vreinterpretq_f32_u64(__a) __arm_vreinterpretq_f32_u64(__a)
+#define vreinterpretq_f32_u8(__a) __arm_vreinterpretq_f32_u8(__a)
+#define vreinterpretq_s16_f16(__a) __arm_vreinterpretq_s16_f16(__a)
+#define vreinterpretq_s16_f32(__a) __arm_vreinterpretq_s16_f32(__a)
+#define vreinterpretq_s64_f16(__a) __arm_vreinterpretq_s64_f16(__a)
+#define vreinterpretq_s64_f32(__a) __arm_vreinterpretq_s64_f32(__a)
+#define vreinterpretq_s8_f16(__a) __arm_vreinterpretq_s8_f16(__a)
+#define vreinterpretq_s8_f32(__a) __arm_vreinterpretq_s8_f32(__a)
+#define vuninitializedq_u8(void) __arm_vuninitializedq_u8(void)
+#define vuninitializedq_u16(void) __arm_vuninitializedq_u16(void)
+#define vuninitializedq_u32(void) __arm_vuninitializedq_u32(void)
+#define vuninitializedq_u64(void) __arm_vuninitializedq_u64(void)
+#define vuninitializedq_s8(void) __arm_vuninitializedq_s8(void)
+#define vuninitializedq_s16(void) __arm_vuninitializedq_s16(void)
+#define vuninitializedq_s32(void) __arm_vuninitializedq_s32(void)
+#define vuninitializedq_s64(void) __arm_vuninitializedq_s64(void)
+#define vuninitializedq_f16(void) __arm_vuninitializedq_f16(void)
+#define vuninitializedq_f32(void) __arm_vuninitializedq_f32(void)
 #endif
 
 __extension__ extern __inline void
@@ -12391,6 +12491,471 @@ __arm_vaddq_u32 (uint32x4_t __a, uint32x4_t __b)
   return __a + __b;
 }
 
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vuninitializedq_u8 (void)
+{
+  uint8x16_t __uninit;
+  __asm__ ("": "=w"(__uninit));
+  return __uninit;
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vuninitializedq_u16 (void)
+{
+  uint16x8_t __uninit;
+  __asm__ ("": "=w"(__uninit));
+  return __uninit;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vuninitializedq_u32 (void)
+{
+  uint32x4_t __uninit;
+  __asm__ ("": "=w"(__uninit));
+  return __uninit;
+}
+
+__extension__ extern __inline uint64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vuninitializedq_u64 (void)
+{
+  uint64x2_t __uninit;
+  __asm__ ("": "=w"(__uninit));
+  return __uninit;
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vuninitializedq_s8 (void)
+{
+  int8x16_t __uninit;
+  __asm__ ("": "=w"(__uninit));
+  return __uninit;
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vuninitializedq_s16 (void)
+{
+  int16x8_t __uninit;
+  __asm__ ("": "=w"(__uninit));
+  return __uninit;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vuninitializedq_s32 (void)
+{
+  int32x4_t __uninit;
+  __asm__ ("": "=w"(__uninit));
+  return __uninit;
+}
+
+__extension__ extern __inline int64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vuninitializedq_s64 (void)
+{
+  int64x2_t __uninit;
+  __asm__ ("": "=w"(__uninit));
+  return __uninit;
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s16_s32 (int32x4_t __a)
+{
+  return (int16x8_t)  __a;
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s16_s64 (int64x2_t __a)
+{
+  return (int16x8_t)  __a;
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s16_s8 (int8x16_t __a)
+{
+  return (int16x8_t)  __a;
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s16_u16 (uint16x8_t __a)
+{
+  return (int16x8_t)  __a;
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s16_u32 (uint32x4_t __a)
+{
+  return (int16x8_t)  __a;
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s16_u64 (uint64x2_t __a)
+{
+  return (int16x8_t)  __a;
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s16_u8 (uint8x16_t __a)
+{
+  return (int16x8_t)  __a;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s32_s16 (int16x8_t __a)
+{
+  return (int32x4_t)  __a;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s32_s64 (int64x2_t __a)
+{
+  return (int32x4_t)  __a;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s32_s8 (int8x16_t __a)
+{
+  return (int32x4_t)  __a;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s32_u16 (uint16x8_t __a)
+{
+  return (int32x4_t)  __a;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s32_u32 (uint32x4_t __a)
+{
+  return (int32x4_t)  __a;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s32_u64 (uint64x2_t __a)
+{
+  return (int32x4_t)  __a;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s32_u8 (uint8x16_t __a)
+{
+  return (int32x4_t)  __a;
+}
+
+__extension__ extern __inline int64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s64_s16 (int16x8_t __a)
+{
+  return (int64x2_t)  __a;
+}
+
+__extension__ extern __inline int64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s64_s32 (int32x4_t __a)
+{
+  return (int64x2_t)  __a;
+}
+
+__extension__ extern __inline int64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s64_s8 (int8x16_t __a)
+{
+  return (int64x2_t)  __a;
+}
+
+__extension__ extern __inline int64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s64_u16 (uint16x8_t __a)
+{
+  return (int64x2_t)  __a;
+}
+
+__extension__ extern __inline int64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s64_u32 (uint32x4_t __a)
+{
+  return (int64x2_t)  __a;
+}
+
+__extension__ extern __inline int64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s64_u64 (uint64x2_t __a)
+{
+  return (int64x2_t)  __a;
+}
+
+__extension__ extern __inline int64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s64_u8 (uint8x16_t __a)
+{
+  return (int64x2_t)  __a;
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s8_s16 (int16x8_t __a)
+{
+  return (int8x16_t)  __a;
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s8_s32 (int32x4_t __a)
+{
+  return (int8x16_t)  __a;
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s8_s64 (int64x2_t __a)
+{
+  return (int8x16_t)  __a;
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s8_u16 (uint16x8_t __a)
+{
+  return (int8x16_t)  __a;
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s8_u32 (uint32x4_t __a)
+{
+  return (int8x16_t)  __a;
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s8_u64 (uint64x2_t __a)
+{
+  return (int8x16_t)  __a;
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s8_u8 (uint8x16_t __a)
+{
+  return (int8x16_t)  __a;
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u16_s16 (int16x8_t __a)
+{
+  return (uint16x8_t)  __a;
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u16_s32 (int32x4_t __a)
+{
+  return (uint16x8_t)  __a;
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u16_s64 (int64x2_t __a)
+{
+  return (uint16x8_t)  __a;
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u16_s8 (int8x16_t __a)
+{
+  return (uint16x8_t)  __a;
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u16_u32 (uint32x4_t __a)
+{
+  return (uint16x8_t)  __a;
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u16_u64 (uint64x2_t __a)
+{
+  return (uint16x8_t)  __a;
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u16_u8 (uint8x16_t __a)
+{
+  return (uint16x8_t)  __a;
+}
+
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u32_s16 (int16x8_t __a)
+{
+  return (uint32x4_t)  __a;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u32_s32 (int32x4_t __a)
+{
+  return (uint32x4_t)  __a;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u32_s64 (int64x2_t __a)
+{
+  return (uint32x4_t)  __a;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u32_s8 (int8x16_t __a)
+{
+  return (uint32x4_t)  __a;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u32_u16 (uint16x8_t __a)
+{
+  return (uint32x4_t)  __a;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u32_u64 (uint64x2_t __a)
+{
+  return (uint32x4_t)  __a;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u32_u8 (uint8x16_t __a)
+{
+  return (uint32x4_t)  __a;
+}
+
+__extension__ extern __inline uint64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u64_s16 (int16x8_t __a)
+{
+  return (uint64x2_t)  __a;
+}
+
+__extension__ extern __inline uint64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u64_s32 (int32x4_t __a)
+{
+  return (uint64x2_t)  __a;
+}
+
+__extension__ extern __inline uint64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u64_s64 (int64x2_t __a)
+{
+  return (uint64x2_t)  __a;
+}
+
+__extension__ extern __inline uint64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u64_s8 (int8x16_t __a)
+{
+  return (uint64x2_t)  __a;
+}
+
+__extension__ extern __inline uint64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u64_u16 (uint16x8_t __a)
+{
+  return (uint64x2_t)  __a;
+}
+
+__extension__ extern __inline uint64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u64_u32 (uint32x4_t __a)
+{
+  return (uint64x2_t)  __a;
+}
+
+__extension__ extern __inline uint64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u64_u8 (uint8x16_t __a)
+{
+  return (uint64x2_t)  __a;
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u8_s16 (int16x8_t __a)
+{
+  return (uint8x16_t)  __a;
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u8_s32 (int32x4_t __a)
+{
+  return (uint8x16_t)  __a;
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u8_s64 (int64x2_t __a)
+{
+  return (uint8x16_t)  __a;
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u8_s8 (int8x16_t __a)
+{
+  return (uint8x16_t)  __a;
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u8_u16 (uint16x8_t __a)
+{
+  return (uint8x16_t)  __a;
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u8_u32 (uint32x4_t __a)
+{
+  return (uint8x16_t)  __a;
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u8_u64 (uint64x2_t __a)
+{
+  return (uint8x16_t)  __a;
+}
+
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
@@ -14771,6 +15336,262 @@ __arm_vaddq_f32 (float32x4_t __a, float32x4_t __b)
   return __a + __b;
 }
 
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vuninitializedq_f16 (void)
+{
+  float16x8_t __uninit;
+  __asm__ ("": "=w" (__uninit));
+  return __uninit;
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vuninitializedq_f32 (void)
+{
+  float32x4_t __uninit;
+  __asm__ ("": "=w" (__uninit));
+  return __uninit;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s32_f16 (float16x8_t __a)
+{
+  return (int32x4_t)  __a;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s32_f32 (float32x4_t __a)
+{
+  return (int32x4_t)  __a;
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s16_f16 (float16x8_t __a)
+{
+  return (int16x8_t)  __a;
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s16_f32 (float32x4_t __a)
+{
+  return (int16x8_t)  __a;
+}
+
+__extension__ extern __inline int64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s64_f16 (float16x8_t __a)
+{
+  return (int64x2_t)  __a;
+}
+
+__extension__ extern __inline int64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s64_f32 (float32x4_t __a)
+{
+  return (int64x2_t)  __a;
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s8_f16 (float16x8_t __a)
+{
+  return (int8x16_t)  __a;
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_s8_f32 (float32x4_t __a)
+{
+  return (int8x16_t)  __a;
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u16_f16 (float16x8_t __a)
+{
+  return (uint16x8_t)  __a;
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u16_f32 (float32x4_t __a)
+{
+  return (uint16x8_t)  __a;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u32_f16 (float16x8_t __a)
+{
+  return (uint32x4_t)  __a;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u32_f32 (float32x4_t __a)
+{
+  return (uint32x4_t)  __a;
+}
+
+__extension__ extern __inline uint64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u64_f16 (float16x8_t __a)
+{
+  return (uint64x2_t)  __a;
+}
+
+__extension__ extern __inline uint64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u64_f32 (float32x4_t __a)
+{
+  return (uint64x2_t)  __a;
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u8_f16 (float16x8_t __a)
+{
+  return (uint8x16_t)  __a;
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_u8_f32 (float32x4_t __a)
+{
+  return (uint8x16_t)  __a;
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_f16_f32 (float32x4_t __a)
+{
+  return (float16x8_t)  __a;
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_f16_s16 (int16x8_t __a)
+{
+  return (float16x8_t)  __a;
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_f16_s32 (int32x4_t __a)
+{
+  return (float16x8_t)  __a;
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_f16_s64 (int64x2_t __a)
+{
+  return (float16x8_t)  __a;
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_f16_s8 (int8x16_t __a)
+{
+  return (float16x8_t)  __a;
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_f16_u16 (uint16x8_t __a)
+{
+  return (float16x8_t)  __a;
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_f16_u32 (uint32x4_t __a)
+{
+  return (float16x8_t)  __a;
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_f16_u64 (uint64x2_t __a)
+{
+  return (float16x8_t)  __a;
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_f16_u8 (uint8x16_t __a)
+{
+  return (float16x8_t)  __a;
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_f32_f16 (float16x8_t __a)
+{
+  return (float32x4_t)  __a;
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_f32_s16 (int16x8_t __a)
+{
+  return (float32x4_t)  __a;
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_f32_s32 (int32x4_t __a)
+{
+  return (float32x4_t)  __a;
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_f32_s64 (int64x2_t __a)
+{
+  return (float32x4_t)  __a;
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_f32_s8 (int8x16_t __a)
+{
+  return (float32x4_t)  __a;
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_f32_u16 (uint16x8_t __a)
+{
+  return (float32x4_t)  __a;
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_f32_u32 (uint32x4_t __a)
+{
+  return (float32x4_t)  __a;
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_f32_u64 (uint64x2_t __a)
+{
+  return (float32x4_t)  __a;
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vreinterpretq_f32_u8 (uint8x16_t __a)
+{
+  return (float32x4_t)  __a;
+}
+
 #endif
 
 enum {
@@ -16640,6 +17461,150 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vaddq_f16 (__ARM_mve_coerce(p0, float16x8_t), __ARM_mve_coerce(p1, float16x8_t)), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vaddq_f32 (__ARM_mve_coerce(p0, float32x4_t), __ARM_mve_coerce(p1, float32x4_t)));})
 
+#define vuninitializedq(p0) __arm_vuninitializedq(p0)
+#define __arm_vuninitializedq(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int8x16_t]: __arm_vuninitializedq_s8 (), \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vuninitializedq_s16 (), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vuninitializedq_s32 (), \
+  int (*)[__ARM_mve_type_int64x2_t]: __arm_vuninitializedq_s64 (), \
+  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vuninitializedq_u8 (), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vuninitializedq_u16 (), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vuninitializedq_u32 (), \
+  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vuninitializedq_u64 (), \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vuninitializedq_f16 (), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vuninitializedq_f32 ());})
+
+#define vreinterpretq_f16(p0) __arm_vreinterpretq_f16(p0)
+#define __arm_vreinterpretq_f16(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_f16_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_f16_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_f16_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
+  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_f16_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
+  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_f16_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_f16_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_f16_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
+  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_f16_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_f16_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
+
+#define vreinterpretq_f32(p0) __arm_vreinterpretq_f32(p0)
+#define __arm_vreinterpretq_f32(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_f32_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_f32_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_f32_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
+  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_f32_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
+  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_f32_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_f32_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_f32_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
+  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_f32_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_f32_f16 (__ARM_mve_coerce(__p0, float16x8_t)));})
+
+#define vreinterpretq_s16(p0) __arm_vreinterpretq_s16(p0)
+#define __arm_vreinterpretq_s16(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_s16_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
+  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_s16_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_s16_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
+  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_s16_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
+  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s16_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s16_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s16_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
+  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s16_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_s16_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
+
+#define vreinterpretq_s32(p0) __arm_vreinterpretq_s32(p0)
+#define __arm_vreinterpretq_s32(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_s32_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_s32_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
+  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_s32_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
+  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_s32_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
+  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s32_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s32_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s32_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
+  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s32_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_s32_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
+
+#define vreinterpretq_s64(p0) __arm_vreinterpretq_s64(p0)
+#define __arm_vreinterpretq_s64(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_s64_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_s64_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_s64_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
+  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_s64_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
+  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s64_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s64_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s64_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
+  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s64_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_s64_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
+
+#define vreinterpretq_s8(p0) __arm_vreinterpretq_s8(p0)
+#define __arm_vreinterpretq_s8(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_s8_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_s8_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_s8_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
+  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_s8_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
+  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s8_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s8_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s8_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
+  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s8_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_s8_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
+
+#define vreinterpretq_u16(p0) __arm_vreinterpretq_u16(p0)
+#define __arm_vreinterpretq_u16(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_u16_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
+  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u16_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u16_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
+  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u16_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
+  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_u16_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u16_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_u16_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
+  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_u16_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_u16_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
+
+#define vreinterpretq_u32(p0) __arm_vreinterpretq_u32(p0)
+#define __arm_vreinterpretq_u32(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_u32_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u32_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
+  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u32_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
+  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u32_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
+  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_u32_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_u32_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u32_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_u32_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_u32_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
+
+#define vreinterpretq_u64(p0) __arm_vreinterpretq_u64(p0)
+#define __arm_vreinterpretq_u64(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_u64_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u64_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u64_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
+  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u64_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
+  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_u64_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_u64_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_u64_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
+  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u64_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_u64_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
+
+#define vreinterpretq_u8(p0) __arm_vreinterpretq_u8(p0)
+#define __arm_vreinterpretq_u8(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_u8_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u8_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u8_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
+  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u8_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
+  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u8_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_u8_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_u8_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
+  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_u8_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_u8_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
+
 #else /* MVE Interger.  */
 
 #define vst4q(p0,p1) __arm_vst4q(p0,p1)
@@ -19884,6 +20849,106 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vstrwq_scatter_shifted_offset_p_s32 (__ARM_mve_coerce(__p0, int32_t *), __p1, __ARM_mve_coerce(__p2, int32x4_t), p3), \
   int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vstrwq_scatter_shifted_offset_p_u32 (__ARM_mve_coerce(__p0, uint32_t *), __p1, __ARM_mve_coerce(__p2, uint32x4_t), p3));})
 
+#define vuninitializedq(p0) __arm_vuninitializedq(p0)
+#define __arm_vuninitializedq(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int8x16_t]: __arm_vuninitializedq_s8 (), \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vuninitializedq_s16 (), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vuninitializedq_s32 (), \
+  int (*)[__ARM_mve_type_int64x2_t]: __arm_vuninitializedq_s64 (), \
+  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vuninitializedq_u8 (), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vuninitializedq_u16 (), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vuninitializedq_u32 (), \
+  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vuninitializedq_u64 ());})
+
+#define vreinterpretq_s16(p0) __arm_vreinterpretq_s16(p0)
+#define __arm_vreinterpretq_s16(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_s16_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_s16_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
+  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_s16_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
+  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s16_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s16_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s16_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
+  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s16_u64 (__ARM_mve_coerce(__p0, uint64x2_t)));})
+
+#define vreinterpretq_s32(p0) __arm_vreinterpretq_s32(p0)
+#define __arm_vreinterpretq_s32(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_s32_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
+  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_s32_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
+  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_s32_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
+  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s32_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s32_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s32_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
+  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s32_u64 (__ARM_mve_coerce(__p0, uint64x2_t)));})
+
+#define vreinterpretq_s64(p0) __arm_vreinterpretq_s64(p0)
+#define __arm_vreinterpretq_s64(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_s64_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_s64_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
+  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_s64_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
+  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s64_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s64_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s64_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
+  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s64_u64 (__ARM_mve_coerce(__p0, uint64x2_t)));})
+
+#define vreinterpretq_s8(p0) __arm_vreinterpretq_s8(p0)
+#define __arm_vreinterpretq_s8(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_s8_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_s8_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
+  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_s8_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
+  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s8_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s8_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s8_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
+  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s8_u64 (__ARM_mve_coerce(__p0, uint64x2_t)));})
+
+#define vreinterpretq_u16(p0) __arm_vreinterpretq_u16(p0)
+#define __arm_vreinterpretq_u16(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u16_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u16_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
+  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u16_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
+  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_u16_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u16_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_u16_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
+  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_u16_u64 (__ARM_mve_coerce(__p0, uint64x2_t)));})
+
+#define vreinterpretq_u32(p0) __arm_vreinterpretq_u32(p0)
+#define __arm_vreinterpretq_u32(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u32_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
+  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u32_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
+  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u32_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
+  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_u32_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_u32_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u32_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_u32_u64 (__ARM_mve_coerce(__p0, uint64x2_t)));})
+
+#define vreinterpretq_u64(p0) __arm_vreinterpretq_u64(p0)
+#define __arm_vreinterpretq_u64(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u64_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u64_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
+  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u64_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
+  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_u64_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_u64_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_u64_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
+  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u64_s64 (__ARM_mve_coerce(__p0, int64x2_t)));})
+
+#define vreinterpretq_u8(p0) __arm_vreinterpretq_u8(p0)
+#define __arm_vreinterpretq_u8(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u8_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u8_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
+  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u8_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
+  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u8_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_u8_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_u8_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
+  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_u8_u64 (__ARM_mve_coerce(__p0, uint64x2_t)));})
+
 #endif /* MVE Floating point.  */
 
 #ifdef __cplusplus
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..74c373d9fba4722de662850423f4a314df02a97b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c
@@ -0,0 +1,44 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+int8x16_t value1;
+int64x2_t value2;
+int32x4_t value3;
+uint8x16_t value4;
+uint16x8_t value5;
+uint64x2_t value6;
+uint32x4_t value7;
+int16x8_t value8;
+float32x4_t value9;
+
+float16x8_t
+foo ()
+{
+  float16x8_t r1,r2,r3,r4,r5,r6,r7;
+  r1 = vaddq_f16 (vreinterpretq_f16_s8 (value1), vreinterpretq_f16_s64 (value2));
+  r2 = vaddq_f16 (r1, vreinterpretq_f16_s32 (value3));
+  r3 = vaddq_f16 (r2, vreinterpretq_f16_u8 (value4));
+  r4 = vaddq_f16 (r3, vreinterpretq_f16_u16 (value5));
+  r5 = vaddq_f16 (r4, vreinterpretq_f16_u64 (value6));
+  r6 = vaddq_f16 (r5, vreinterpretq_f16_u32 (value7));
+  r7 = vaddq_f16 (r6, vreinterpretq_f16_s16 (value8));
+  return vaddq_f16 (r7, vreinterpretq_f16_f32 (value9));
+}
+
+float16x8_t
+foo1 ()
+{
+  float16x8_t r1,r2,r3,r4,r5,r6,r7;
+  r1 = vaddq_f16 (vreinterpretq_f16 (value1), vreinterpretq_f16 (value2));
+  r2 = vaddq_f16 (r1, vreinterpretq_f16 (value3));
+  r3 = vaddq_f16 (r2, vreinterpretq_f16 (value4));
+  r4 = vaddq_f16 (r3, vreinterpretq_f16 (value5));
+  r5 = vaddq_f16 (r4, vreinterpretq_f16 (value6));
+  r6 = vaddq_f16 (r5, vreinterpretq_f16 (value7));
+  r7 = vaddq_f16 (r6, vreinterpretq_f16 (value8));
+  return vaddq_f16 (r7, vreinterpretq_f16 (value9));
+}
+
+/* { dg-final { scan-assembler-times "vadd.f16" 8 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..89e3b7a2e554d3da8bd110a739c021fc9d0964ca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c
@@ -0,0 +1,44 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+int16x8_t value1;
+int64x2_t value2;
+int8x16_t value3;
+uint8x16_t value4;
+uint16x8_t value5;
+uint64x2_t value6;
+uint32x4_t value7;
+float16x8_t value8;
+int32x4_t value9;
+
+float32x4_t
+foo ()
+{
+  float32x4_t r1,r2,r3,r4,r5,r6,r7;
+  r1 = vaddq_f32 (vreinterpretq_f32_s16 (value1), vreinterpretq_f32_s64 (value2));
+  r2 = vaddq_f32 (r1, vreinterpretq_f32_s8 (value3));
+  r3 = vaddq_f32 (r2, vreinterpretq_f32_u8 (value4));
+  r4 = vaddq_f32 (r3, vreinterpretq_f32_u16 (value5));
+  r5 = vaddq_f32 (r4, vreinterpretq_f32_u64 (value6));
+  r6 = vaddq_f32 (r5, vreinterpretq_f32_u32 (value7));
+  r7 = vaddq_f32 (r6, vreinterpretq_f32_f16 (value8));
+  return vaddq_f32 (r7, vreinterpretq_f32_s32 (value9));
+}
+
+float32x4_t
+foo1 ()
+{
+  float32x4_t r1,r2,r3,r4,r5,r6,r7;
+  r1 = vaddq_f32 (vreinterpretq_f32 (value1), vreinterpretq_f32 (value2));
+  r2 = vaddq_f32 (r1, vreinterpretq_f32 (value3));
+  r3 = vaddq_f32 (r2, vreinterpretq_f32 (value4));
+  r4 = vaddq_f32 (r3, vreinterpretq_f32 (value5));
+  r5 = vaddq_f32 (r4, vreinterpretq_f32 (value6));
+  r6 = vaddq_f32 (r5, vreinterpretq_f32 (value7));
+  r7 = vaddq_f32 (r6, vreinterpretq_f32 (value8));
+  return vaddq_f32 (r7, vreinterpretq_f32 (value9));
+}
+
+/* { dg-final { scan-assembler-times "vadd.f32" 8 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..f3dccb6386eaeb49e6f6c8d10dc59a73a85a04f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c
@@ -0,0 +1,44 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+int8x16_t value1;
+int64x2_t value2;
+int32x4_t value3;
+uint8x16_t value4;
+uint16x8_t value5;
+uint64x2_t value6;
+uint32x4_t value7;
+float16x8_t value8;
+float32x4_t value9;
+
+int16x8_t
+foo ()
+{
+  int16x8_t r1,r2,r3,r4,r5,r6,r7;
+  r1 = vaddq_s16 (vreinterpretq_s16_s8 (value1), vreinterpretq_s16_s64 (value2));
+  r2 = vaddq_s16 (r1, vreinterpretq_s16_s32 (value3));
+  r3 = vaddq_s16 (r2, vreinterpretq_s16_u8 (value4));
+  r4 = vaddq_s16 (r3, vreinterpretq_s16_u16 (value5));
+  r5 = vaddq_s16 (r4, vreinterpretq_s16_u64 (value6));
+  r6 = vaddq_s16 (r5, vreinterpretq_s16_u32 (value7));
+  r7 = vaddq_s16 (r6, vreinterpretq_s16_f16 (value8));
+  return vaddq_s16 (r7, vreinterpretq_s16_f32 (value9));
+}
+
+int16x8_t
+foo1 ()
+{
+  int16x8_t r1,r2,r3,r4,r5,r6,r7;
+  r1 = vaddq_s16 (vreinterpretq_s16 (value1), vreinterpretq_s16 (value2));
+  r2 = vaddq_s16 (r1, vreinterpretq_s16 (value3));
+  r3 = vaddq_s16 (r2, vreinterpretq_s16 (value4));
+  r4 = vaddq_s16 (r3, vreinterpretq_s16 (value5));
+  r5 = vaddq_s16 (r4, vreinterpretq_s16 (value6));
+  r6 = vaddq_s16 (r5, vreinterpretq_s16 (value7));
+  r7 = vaddq_s16 (r6, vreinterpretq_s16 (value8));
+  return vaddq_s16 (r7, vreinterpretq_s16 (value9));
+}
+
+/* { dg-final { scan-assembler-times "vadd.i16" 8 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..efaca09a3801fcadf0e7a3254bfa79709b1c7b55
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c
@@ -0,0 +1,44 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+int16x8_t value1;
+int64x2_t value2;
+int8x16_t value3;
+uint8x16_t value4;
+uint16x8_t value5;
+uint64x2_t value6;
+uint32x4_t value7;
+float16x8_t value8;
+float32x4_t value9;
+
+int32x4_t
+foo ()
+{
+  int32x4_t r1,r2,r3,r4,r5,r6,r7;
+  r1 = vaddq_s32 (vreinterpretq_s32_s16 (value1), vreinterpretq_s32_s64 (value2));
+  r2 = vaddq_s32 (r1, vreinterpretq_s32_s8 (value3));
+  r3 = vaddq_s32 (r2, vreinterpretq_s32_u8 (value4));
+  r4 = vaddq_s32 (r3, vreinterpretq_s32_u16 (value5));
+  r5 = vaddq_s32 (r4, vreinterpretq_s32_u64 (value6));
+  r6 = vaddq_s32 (r5, vreinterpretq_s32_u32 (value7));
+  r7 = vaddq_s32 (r6, vreinterpretq_s32_f16 (value8));
+  return vaddq_s32 (r7, vreinterpretq_s32_f32 (value9));
+}
+
+int32x4_t
+foo1 ()
+{
+  int32x4_t r1,r2,r3,r4,r5,r6,r7;
+  r1 = vaddq_s32 (vreinterpretq_s32 (value1), vreinterpretq_s32 (value2));
+  r2 = vaddq_s32 (r1, vreinterpretq_s32 (value3));
+  r3 = vaddq_s32 (r2, vreinterpretq_s32 (value4));
+  r4 = vaddq_s32 (r3, vreinterpretq_s32 (value5));
+  r5 = vaddq_s32 (r4, vreinterpretq_s32 (value6));
+  r6 = vaddq_s32 (r5, vreinterpretq_s32 (value7));
+  r7 = vaddq_s32 (r6, vreinterpretq_s32 (value8));
+  return vaddq_s32 (r7, vreinterpretq_s32 (value9));
+}
+
+/* { dg-final { scan-assembler-times "vadd.i32" 8 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c
new file mode 100644
index 0000000000000000000000000000000000000000..de51e414b536a1dfcaa8f4236ab4ae97c05689e2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c
@@ -0,0 +1,45 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+int16x8_t value1;
+int8x16_t value2;
+int32x4_t value3;
+uint8x16_t value4;
+uint16x8_t value5;
+uint64x2_t value6;
+uint32x4_t value7;
+float16x8_t value8;
+float32x4_t value9;
+
+int64x2_t
+foo (mve_pred16_t __p)
+{
+  int64x2_t r1,r2,r3,r4,r5,r6,r7;
+  r1 = vpselq_s64 (vreinterpretq_s64_s16 (value1), vreinterpretq_s64_s8 (value2),
+		   __p);
+  r2 = vpselq_s64 (r1, vreinterpretq_s64_s32 (value3), __p);
+  r3 = vpselq_s64 (r2, vreinterpretq_s64_u8 (value4), __p);
+  r4 = vpselq_s64 (r3, vreinterpretq_s64_u16 (value5), __p);
+  r5 = vpselq_s64 (r4, vreinterpretq_s64_u64 (value6), __p);
+  r6 = vpselq_s64 (r5, vreinterpretq_s64_u32 (value7), __p);
+  r7 = vpselq_s64 (r6, vreinterpretq_s64_f16 (value8), __p);
+  return vpselq_s64 (r7, vreinterpretq_s64_f32 (value9), __p);
+}
+
+int64x2_t
+foo1 (mve_pred16_t __p)
+{
+  int64x2_t r1,r2,r3,r4,r5,r6,r7;
+  r1 = vpselq_s64 (vreinterpretq_s64 (value1), vreinterpretq_s64 (value2), __p);
+  r2 = vpselq_s64 (r1, vreinterpretq_s64 (value3), __p);
+  r3 = vpselq_s64 (r2, vreinterpretq_s64 (value4), __p);
+  r4 = vpselq_s64 (r3, vreinterpretq_s64 (value5), __p);
+  r5 = vpselq_s64 (r4, vreinterpretq_s64 (value6), __p);
+  r6 = vpselq_s64 (r5, vreinterpretq_s64 (value7), __p);
+  r7 = vpselq_s64 (r6, vreinterpretq_s64 (value8), __p);
+  return vpselq_s64 (r7, vreinterpretq_s64 (value9), __p);
+}
+
+/* { dg-final { scan-assembler-times "vpsel" 8 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..0ef158320274df3b5335d0568519002e70093c01
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c
@@ -0,0 +1,44 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+int16x8_t value1;
+int64x2_t value2;
+int32x4_t value3;
+uint8x16_t value4;
+uint16x8_t value5;
+uint64x2_t value6;
+uint32x4_t value7;
+float16x8_t value8;
+float32x4_t value9;
+
+int8x16_t
+foo ()
+{
+  int8x16_t r1,r2,r3,r4,r5,r6,r7;
+  r1 = vaddq_s8 (vreinterpretq_s8_s16 (value1), vreinterpretq_s8_s64 (value2));
+  r2 = vaddq_s8 (r1, vreinterpretq_s8_s32 (value3));
+  r3 = vaddq_s8 (r2, vreinterpretq_s8_u8 (value4));
+  r4 = vaddq_s8 (r3, vreinterpretq_s8_u16 (value5));
+  r5 = vaddq_s8 (r4, vreinterpretq_s8_u64 (value6));
+  r6 = vaddq_s8 (r5, vreinterpretq_s8_u32 (value7));
+  r7 = vaddq_s8 (r6, vreinterpretq_s8_f16 (value8));
+  return vaddq_s8 (r7, vreinterpretq_s8_f32 (value9));
+}
+
+int8x16_t
+foo1 ()
+{
+  int8x16_t r1,r2,r3,r4,r5,r6,r7;
+  r1 = vaddq_s8 (vreinterpretq_s8 (value1), vreinterpretq_s8 (value2));
+  r2 = vaddq_s8 (r1, vreinterpretq_s8 (value3));
+  r3 = vaddq_s8 (r2, vreinterpretq_s8 (value4));
+  r4 = vaddq_s8 (r3, vreinterpretq_s8 (value5));
+  r5 = vaddq_s8 (r4, vreinterpretq_s8 (value6));
+  r6 = vaddq_s8 (r5, vreinterpretq_s8 (value7));
+  r7 = vaddq_s8 (r6, vreinterpretq_s8 (value8));
+  return vaddq_s8 (r7, vreinterpretq_s8 (value9));
+}
+
+/* { dg-final { scan-assembler-times "vadd.i8" 8 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..63d5528db284f63816c1dad2638c7587755bf069
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c
@@ -0,0 +1,44 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+int8x16_t value1;
+int64x2_t value2;
+int32x4_t value3;
+uint8x16_t value4;
+int16x8_t value5;
+uint64x2_t value6;
+uint32x4_t value7;
+float16x8_t value8;
+float32x4_t value9;
+
+uint16x8_t
+foo ()
+{
+  uint16x8_t r1,r2,r3,r4,r5,r6,r7;
+  r1 = vaddq_u16 (vreinterpretq_u16_s8 (value1), vreinterpretq_u16_s64 (value2));
+  r2 = vaddq_u16 (r1, vreinterpretq_u16_s32 (value3));
+  r3 = vaddq_u16 (r2, vreinterpretq_u16_u8 (value4));
+  r4 = vaddq_u16 (r3, vreinterpretq_u16_s16 (value5));
+  r5 = vaddq_u16 (r4, vreinterpretq_u16_u64 (value6));
+  r6 = vaddq_u16 (r5, vreinterpretq_u16_u32 (value7));
+  r7 = vaddq_u16 (r6, vreinterpretq_u16_f16 (value8));
+  return vaddq_u16 (r7, vreinterpretq_u16_f32 (value9));
+}
+
+uint16x8_t
+foo1 ()
+{
+  uint16x8_t r1,r2,r3,r4,r5,r6,r7;
+  r1 = vaddq_u16 (vreinterpretq_u16 (value1), vreinterpretq_u16 (value2));
+  r2 = vaddq_u16 (r1, vreinterpretq_u16 (value3));
+  r3 = vaddq_u16 (r2, vreinterpretq_u16 (value4));
+  r4 = vaddq_u16 (r3, vreinterpretq_u16 (value5));
+  r5 = vaddq_u16 (r4, vreinterpretq_u16 (value6));
+  r6 = vaddq_u16 (r5, vreinterpretq_u16 (value7));
+  r7 = vaddq_u16 (r6, vreinterpretq_u16 (value8));
+  return vaddq_u16 (r7, vreinterpretq_u16 (value9));
+}
+
+/* { dg-final { scan-assembler-times "vadd.i16" 8 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..1a4202a62d4548a153f223fde7036a04828c133f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c
@@ -0,0 +1,44 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+int16x8_t value1;
+int64x2_t value2;
+int8x16_t value3;
+uint8x16_t value4;
+uint16x8_t value5;
+uint64x2_t value6;
+int32x4_t value7;
+float16x8_t value8;
+float32x4_t value9;
+
+uint32x4_t
+foo ()
+{
+  uint32x4_t r1,r2,r3,r4,r5,r6,r7;
+  r1 = vaddq_u32 (vreinterpretq_u32_s16 (value1), vreinterpretq_u32_s64 (value2));
+  r2 = vaddq_u32 (r1, vreinterpretq_u32_s8 (value3));
+  r3 = vaddq_u32 (r2, vreinterpretq_u32_u8 (value4));
+  r4 = vaddq_u32 (r3, vreinterpretq_u32_u16 (value5));
+  r5 = vaddq_u32 (r4, vreinterpretq_u32_u64 (value6));
+  r6 = vaddq_u32 (r5, vreinterpretq_u32_s32 (value7));
+  r7 = vaddq_u32 (r6, vreinterpretq_u32_f16 (value8));
+  return vaddq_u32 (r7, vreinterpretq_u32_f32 (value9));
+}
+
+uint32x4_t
+foo1 ()
+{
+  uint32x4_t r1,r2,r3,r4,r5,r6,r7;
+  r1 = vaddq_u32 (vreinterpretq_u32 (value1), vreinterpretq_u32 (value2));
+  r2 = vaddq_u32 (r1, vreinterpretq_u32 (value3));
+  r3 = vaddq_u32 (r2, vreinterpretq_u32 (value4));
+  r4 = vaddq_u32 (r3, vreinterpretq_u32 (value5));
+  r5 = vaddq_u32 (r4, vreinterpretq_u32 (value6));
+  r6 = vaddq_u32 (r5, vreinterpretq_u32 (value7));
+  r7 = vaddq_u32 (r6, vreinterpretq_u32 (value8));
+  return vaddq_u32 (r7, vreinterpretq_u32 (value9));
+}
+
+/* { dg-final { scan-assembler-times "vadd.i32" 8 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c
new file mode 100644
index 0000000000000000000000000000000000000000..db7be807510bdbfcccc65738a168566d5bcb0485
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c
@@ -0,0 +1,45 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+int16x8_t value1;
+int8x16_t value2;
+int32x4_t value3;
+uint8x16_t value4;
+uint16x8_t value5;
+int64x2_t value6;
+uint32x4_t value7;
+float16x8_t value8;
+float32x4_t value9;
+
+uint64x2_t
+foo (mve_pred16_t __p)
+{
+  uint64x2_t r1,r2,r3,r4,r5,r6,r7;
+  r1 = vpselq_u64 (vreinterpretq_u64_s16 (value1), vreinterpretq_u64_s8 (value2),
+		   __p);
+  r2 = vpselq_u64 (r1, vreinterpretq_u64_s32 (value3), __p);
+  r3 = vpselq_u64 (r2, vreinterpretq_u64_u8 (value4), __p);
+  r4 = vpselq_u64 (r3, vreinterpretq_u64_u16 (value5), __p);
+  r5 = vpselq_u64 (r4, vreinterpretq_u64_s64 (value6), __p);
+  r6 = vpselq_u64 (r5, vreinterpretq_u64_u32 (value7), __p);
+  r7 = vpselq_u64 (r6, vreinterpretq_u64_f16 (value8), __p);
+  return vpselq_u64 (r7, vreinterpretq_u64_f32 (value9), __p);
+}
+
+uint64x2_t
+foo1 (mve_pred16_t __p)
+{
+  uint64x2_t r1,r2,r3,r4,r5,r6,r7;
+  r1 = vpselq_u64 (vreinterpretq_u64 (value1), vreinterpretq_u64 (value2), __p);
+  r2 = vpselq_u64 (r1, vreinterpretq_u64 (value3), __p);
+  r3 = vpselq_u64 (r2, vreinterpretq_u64 (value4), __p);
+  r4 = vpselq_u64 (r3, vreinterpretq_u64 (value5), __p);
+  r5 = vpselq_u64 (r4, vreinterpretq_u64 (value6), __p);
+  r6 = vpselq_u64 (r5, vreinterpretq_u64 (value7), __p);
+  r7 = vpselq_u64 (r6, vreinterpretq_u64 (value8), __p);
+  return vpselq_u64 (r7, vreinterpretq_u64 (value9), __p);
+}
+
+/* { dg-final { scan-assembler-times "vpsel" 8 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..3d61d98dda88eeb05982374340d46e2e66b8e4dd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c
@@ -0,0 +1,44 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+int16x8_t value1;
+int64x2_t value2;
+int32x4_t value3;
+int8x16_t value4;
+uint16x8_t value5;
+uint64x2_t value6;
+uint32x4_t value7;
+float16x8_t value8;
+float32x4_t value9;
+
+uint8x16_t
+foo ()
+{
+  uint8x16_t r1,r2,r3,r4,r5,r6,r7;
+  r1 = vaddq_u8 (vreinterpretq_u8_s16 (value1), vreinterpretq_u8_s64 (value2));
+  r2 = vaddq_u8 (r1, vreinterpretq_u8_s32 (value3));
+  r3 = vaddq_u8 (r2, vreinterpretq_u8_s8 (value4));
+  r4 = vaddq_u8 (r3, vreinterpretq_u8_u16 (value5));
+  r5 = vaddq_u8 (r4, vreinterpretq_u8_u64 (value6));
+  r6 = vaddq_u8 (r5, vreinterpretq_u8_u32 (value7));
+  r7 = vaddq_u8 (r6, vreinterpretq_u8_f16 (value8));
+  return vaddq_u8 (r7, vreinterpretq_u8_f32 (value9));
+}
+
+uint8x16_t
+foo1 ()
+{
+  uint8x16_t r1,r2,r3,r4,r5,r6,r7;
+  r1 = vaddq_u8 (vreinterpretq_u8 (value1), vreinterpretq_u8 (value2));
+  r2 = vaddq_u8 (r1, vreinterpretq_u8 (value3));
+  r3 = vaddq_u8 (r2, vreinterpretq_u8 (value4));
+  r4 = vaddq_u8 (r3, vreinterpretq_u8 (value5));
+  r5 = vaddq_u8 (r4, vreinterpretq_u8 (value6));
+  r6 = vaddq_u8 (r5, vreinterpretq_u8 (value7));
+  r7 = vaddq_u8 (r6, vreinterpretq_u8 (value8));
+  return vaddq_u8 (r7, vreinterpretq_u8 (value9));
+}
+
+/* { dg-final { scan-assembler-times "vadd.i8" 8 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float.c
new file mode 100644
index 0000000000000000000000000000000000000000..ffdcf1c5b88266004a75c32cc7dd96c2315345a2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float.c
@@ -0,0 +1,17 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O0"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo ()
+{
+  float16x8_t fa;
+  float32x4_t fb;
+  fa = vuninitializedq_f16 ();
+  fb = vuninitializedq_f32 ();
+}
+
+/* { dg-final { scan-assembler "vstrb.16" } } */
+/* { dg-final { scan-assembler "vstrb.32" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float1.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float1.c
new file mode 100644
index 0000000000000000000000000000000000000000..e5a415ea1236dc060b7f790bb8ac5590e89f682c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_float1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O0"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo ()
+{
+  float16x8_t fa, faa;
+  float32x4_t fb, fbb;
+  fa = vuninitializedq (faa);
+  fb = vuninitializedq (fbb);
+}
+
+/* { dg-final { scan-assembler "vstrb.16" } } */
+/* { dg-final { scan-assembler "vstrb.32" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int.c
new file mode 100644
index 0000000000000000000000000000000000000000..503c108d860a1a41b8d88cf6b5a2dbab44e28b0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int.c
@@ -0,0 +1,31 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O0"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo ()
+{
+  int8x16_t a;
+  int16x8_t b;
+  int32x4_t c;
+  int64x2_t d;
+  uint8x16_t ua;
+  uint16x8_t ub;
+  uint32x4_t uc;
+  uint64x2_t ud;
+  a = vuninitializedq_s8 ();
+  b = vuninitializedq_s16 ();
+  c = vuninitializedq_s32 ();
+  d = vuninitializedq_s64 ();
+  ua = vuninitializedq_u8 ();
+  ub = vuninitializedq_u16 ();
+  uc = vuninitializedq_u32 ();
+  ud = vuninitializedq_u64 ();
+}
+
+/* { dg-final { scan-assembler-times "vstrb.8" 4 } } */
+/* { dg-final { scan-assembler-times "vstrb.16" 4 } } */
+/* { dg-final { scan-assembler-times "vstrb.32" 4 } } */
+/* { dg-final { scan-assembler-times "vstrb.64" 4 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c
new file mode 100644
index 0000000000000000000000000000000000000000..56b475cd6e0c936654f8aa13d2e4bf76d01e2f5a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c
@@ -0,0 +1,31 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O0"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo ()
+{
+  int8x16_t a, aa;
+  int16x8_t b, bb;
+  int32x4_t c, cc;
+  int64x2_t d, dd;
+  uint8x16_t ua, uaa;
+  uint16x8_t ub, ubb;
+  uint32x4_t uc, ucc;
+  uint64x2_t ud, udd;
+  a = vuninitializedq (aa);
+  b = vuninitializedq (bb);
+  c = vuninitializedq (cc);
+  d = vuninitializedq (dd);
+  ua = vuninitializedq (uaa);
+  ub = vuninitializedq (ubb);
+  uc = vuninitializedq (ucc);
+  ud = vuninitializedq (udd);
+}
+
+/* { dg-final { scan-assembler-times "vstrb.8" 6 } } */
+/* { dg-final { scan-assembler-times "vstrb.16" 6 } } */
+/* { dg-final { scan-assembler-times "vstrb.32" 6 } } */
+/* { dg-final { scan-assembler-times "vstrb.64" 6 } } */


[-- Attachment #2: diff29.patch.gz --]
[-- Type: application/gzip, Size: 5449 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][5/5x]: MVE ACLE load intrinsics which load a byte, halfword, or word from memory.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (23 preceding siblings ...)
  2019-11-14 19:16 ` [PATCH][ARM][GCC][14x]: MVE ACLE whole vector left shift with carry intrinsics Srinath Parvathaneni
@ 2019-11-14 19:16 ` Srinath Parvathaneni
  2019-11-14 19:16 ` [PATCH][ARM][GCC][12x]: MVE ACLE intrinsics to set and get vector lane Srinath Parvathaneni
                   ` (13 subsequent siblings)
  38 siblings, 0 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 77737 bytes --]

Hello,

This patch supports the following MVE ACLE load intrinsics which load a byte, halfword,
or word from memory.
vld1q_s8, vld1q_s32, vld1q_s16, vld1q_u8, vld1q_u32, vld1q_u16, vldrhq_gather_offset_s32,
vldrhq_gather_offset_s16, vldrhq_gather_offset_u32, vldrhq_gather_offset_u16,
vldrhq_gather_offset_z_s32, vldrhq_gather_offset_z_s16, vldrhq_gather_offset_z_u32,
vldrhq_gather_offset_z_u16, vldrhq_gather_shifted_offset_s32,vldrwq_f32, vldrwq_z_f32,
vldrhq_gather_shifted_offset_s16, vldrhq_gather_shifted_offset_u32,
vldrhq_gather_shifted_offset_u16, vldrhq_gather_shifted_offset_z_s32,
vldrhq_gather_shifted_offset_z_s16, vldrhq_gather_shifted_offset_z_u32,
vldrhq_gather_shifted_offset_z_u16, vldrhq_s32, vldrhq_s16, vldrhq_u32, vldrhq_u16,
vldrhq_z_s32, vldrhq_z_s16, vldrhq_z_u32, vldrhq_z_u16, vldrwq_s32, vldrwq_u32,
vldrwq_z_s32, vldrwq_z_u32, vld1q_f32, vld1q_f16, vldrhq_f16, vldrhq_z_f16.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1]  https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-01  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm_mve.h (vld1q_s8): Define macro.
	(vld1q_s32): Likewise.
	(vld1q_s16): Likewise.
	(vld1q_u8): Likewise.
	(vld1q_u32): Likewise.
	(vld1q_u16): Likewise.
	(vldrhq_gather_offset_s32): Likewise.
	(vldrhq_gather_offset_s16): Likewise.
	(vldrhq_gather_offset_u32): Likewise.
	(vldrhq_gather_offset_u16): Likewise.
	(vldrhq_gather_offset_z_s32): Likewise.
	(vldrhq_gather_offset_z_s16): Likewise.
	(vldrhq_gather_offset_z_u32): Likewise.
	(vldrhq_gather_offset_z_u16): Likewise.
	(vldrhq_gather_shifted_offset_s32): Likewise.
	(vldrhq_gather_shifted_offset_s16): Likewise.
	(vldrhq_gather_shifted_offset_u32): Likewise.
	(vldrhq_gather_shifted_offset_u16): Likewise.
	(vldrhq_gather_shifted_offset_z_s32): Likewise.
	(vldrhq_gather_shifted_offset_z_s16): Likewise.
	(vldrhq_gather_shifted_offset_z_u32): Likewise.
	(vldrhq_gather_shifted_offset_z_u16): Likewise.
	(vldrhq_s32): Likewise.
	(vldrhq_s16): Likewise.
	(vldrhq_u32): Likewise.
	(vldrhq_u16): Likewise.
	(vldrhq_z_s32): Likewise.
	(vldrhq_z_s16): Likewise.
	(vldrhq_z_u32): Likewise.
	(vldrhq_z_u16): Likewise.
	(vldrwq_s32): Likewise.
	(vldrwq_u32): Likewise.
	(vldrwq_z_s32): Likewise.
	(vldrwq_z_u32): Likewise.
	(vld1q_f32): Likewise.
	(vld1q_f16): Likewise.
	(vldrhq_f16): Likewise.
	(vldrhq_z_f16): Likewise.
	(vldrwq_f32): Likewise.
	(vldrwq_z_f32): Likewise.
	(__arm_vld1q_s8): Define intrinsic.
	(__arm_vld1q_s32): Likewise.
	(__arm_vld1q_s16): Likewise.
	(__arm_vld1q_u8): Likewise.
	(__arm_vld1q_u32): Likewise.
	(__arm_vld1q_u16): Likewise.
	(__arm_vldrhq_gather_offset_s32): Likewise.
	(__arm_vldrhq_gather_offset_s16): Likewise.
	(__arm_vldrhq_gather_offset_u32): Likewise.
	(__arm_vldrhq_gather_offset_u16): Likewise.
	(__arm_vldrhq_gather_offset_z_s32): Likewise.
	(__arm_vldrhq_gather_offset_z_s16): Likewise.
	(__arm_vldrhq_gather_offset_z_u32): Likewise.
	(__arm_vldrhq_gather_offset_z_u16): Likewise.
	(__arm_vldrhq_gather_shifted_offset_s32): Likewise.
	(__arm_vldrhq_gather_shifted_offset_s16): Likewise.
	(__arm_vldrhq_gather_shifted_offset_u32): Likewise.
	(__arm_vldrhq_gather_shifted_offset_u16): Likewise.
	(__arm_vldrhq_gather_shifted_offset_z_s32): Likewise.
	(__arm_vldrhq_gather_shifted_offset_z_s16): Likewise.
	(__arm_vldrhq_gather_shifted_offset_z_u32): Likewise.
	(__arm_vldrhq_gather_shifted_offset_z_u16): Likewise.
	(__arm_vldrhq_s32): Likewise.
	(__arm_vldrhq_s16): Likewise.
	(__arm_vldrhq_u32): Likewise.
	(__arm_vldrhq_u16): Likewise.
	(__arm_vldrhq_z_s32): Likewise.
	(__arm_vldrhq_z_s16): Likewise.
	(__arm_vldrhq_z_u32): Likewise.
	(__arm_vldrhq_z_u16): Likewise.
	(__arm_vldrwq_s32): Likewise.
	(__arm_vldrwq_u32): Likewise.
	(__arm_vldrwq_z_s32): Likewise.
	(__arm_vldrwq_z_u32): Likewise.
	(__arm_vld1q_f32): Likewise.
	(__arm_vld1q_f16): Likewise.
	(__arm_vldrwq_f32): Likewise.
	(__arm_vldrwq_z_f32): Likewise.
	(__arm_vldrhq_z_f16): Likewise.
	(__arm_vldrhq_f16): Likewise.
	(vld1q): Define polymorphic variant.
	(vldrhq_gather_offset): Likewise.
	(vldrhq_gather_offset_z): Likewise.
	(vldrhq_gather_shifted_offset): Likewise.
	(vldrhq_gather_shifted_offset_z): Likewise.
	* config/arm/arm_mve_builtins.def (LDRU): Use builtin qualifier.
	(LDRS): Likewise.
	(LDRU_Z): Likewise.
	(LDRS_Z): Likewise.
	(LDRGU_Z): Likewise.
	(LDRGU): Likewise.
	(LDRGS_Z): Likewise.
	(LDRGS): Likewise.
	* config/arm/mve.md (MVE_H_ELEM): Define mode iterator.
	(V_sz_elem1): Likewise.
	(VLD1Q): Define iterator.
	(VLDRHGOQ): Likewise.
	(VLDRHGSOQ): Likewise.
	(VLDRHQ): Likewise.
	(VLDRWQ): Likewise.
	(mve_vldrhq_fv8hf): Define RTL pattern.
	(mve_vldrhq_gather_offset_<supf><mode>): Likewise
	(mve_vldrhq_gather_offset_z_<supf><mode>): Likewise
	(mve_vldrhq_gather_shifted_offset_<supf><mode>): Likewise
	(mve_vldrhq_gather_shifted_offset_z_<supf><mode>): Likewise
	(mve_vldrhq_<supf><mode>): Likewise
	(mve_vldrhq_z_fv8hf): Likewise
	(mve_vldrhq_z_<supf><mode>): Likewise
	(mve_vldrwq_fv4sf): Likewise
	(mve_vldrwq_<supf>v4si): Likewise
	(mve_vldrwq_z_fv4sf): Likewise
	(mve_vldrwq_z_<supf>v4si): Likewise
	(mve_vld1q_f<mode>): Define RTL expand pattern.
	(mve_vld1q_<supf><mode>): Likewise

gcc/testsuite/ChangeLog:

2019-11-01  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vld1q_f16.c: New test.
	* gcc.target/arm/mve/intrinsics/vld1q_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld1q_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld1q_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld1q_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld1q_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld1q_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vld1q_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_z_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_z_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_z_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_z_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_z_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_z_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_z_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_z_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_z_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_z_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_z_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_z_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_z_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_z_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_z_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_z_u32.c: Likewise.


###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index b40a9c238b7f883a0e91dd6fcc5f41182ea6efe3..e85d36051ef748709da3b9fcdf522e39deb12c08 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -1758,6 +1758,46 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vldrbq_z_u32(__base, __p) __arm_vldrbq_z_u32(__base, __p)
 #define vldrwq_gather_base_z_u32(__addr,  __offset, __p) __arm_vldrwq_gather_base_z_u32(__addr,  __offset, __p)
 #define vldrwq_gather_base_z_s32(__addr,  __offset, __p) __arm_vldrwq_gather_base_z_s32(__addr,  __offset, __p)
+#define vld1q_s8(__base) __arm_vld1q_s8(__base)
+#define vld1q_s32(__base) __arm_vld1q_s32(__base)
+#define vld1q_s16(__base) __arm_vld1q_s16(__base)
+#define vld1q_u8(__base) __arm_vld1q_u8(__base)
+#define vld1q_u32(__base) __arm_vld1q_u32(__base)
+#define vld1q_u16(__base) __arm_vld1q_u16(__base)
+#define vldrhq_gather_offset_s32(__base, __offset) __arm_vldrhq_gather_offset_s32(__base, __offset)
+#define vldrhq_gather_offset_s16(__base, __offset) __arm_vldrhq_gather_offset_s16(__base, __offset)
+#define vldrhq_gather_offset_u32(__base, __offset) __arm_vldrhq_gather_offset_u32(__base, __offset)
+#define vldrhq_gather_offset_u16(__base, __offset) __arm_vldrhq_gather_offset_u16(__base, __offset)
+#define vldrhq_gather_offset_z_s32(__base, __offset, __p) __arm_vldrhq_gather_offset_z_s32(__base, __offset, __p)
+#define vldrhq_gather_offset_z_s16(__base, __offset, __p) __arm_vldrhq_gather_offset_z_s16(__base, __offset, __p)
+#define vldrhq_gather_offset_z_u32(__base, __offset, __p) __arm_vldrhq_gather_offset_z_u32(__base, __offset, __p)
+#define vldrhq_gather_offset_z_u16(__base, __offset, __p) __arm_vldrhq_gather_offset_z_u16(__base, __offset, __p)
+#define vldrhq_gather_shifted_offset_s32(__base, __offset) __arm_vldrhq_gather_shifted_offset_s32(__base, __offset)
+#define vldrhq_gather_shifted_offset_s16(__base, __offset) __arm_vldrhq_gather_shifted_offset_s16(__base, __offset)
+#define vldrhq_gather_shifted_offset_u32(__base, __offset) __arm_vldrhq_gather_shifted_offset_u32(__base, __offset)
+#define vldrhq_gather_shifted_offset_u16(__base, __offset) __arm_vldrhq_gather_shifted_offset_u16(__base, __offset)
+#define vldrhq_gather_shifted_offset_z_s32(__base, __offset, __p) __arm_vldrhq_gather_shifted_offset_z_s32(__base, __offset, __p)
+#define vldrhq_gather_shifted_offset_z_s16(__base, __offset, __p) __arm_vldrhq_gather_shifted_offset_z_s16(__base, __offset, __p)
+#define vldrhq_gather_shifted_offset_z_u32(__base, __offset, __p) __arm_vldrhq_gather_shifted_offset_z_u32(__base, __offset, __p)
+#define vldrhq_gather_shifted_offset_z_u16(__base, __offset, __p) __arm_vldrhq_gather_shifted_offset_z_u16(__base, __offset, __p)
+#define vldrhq_s32(__base) __arm_vldrhq_s32(__base)
+#define vldrhq_s16(__base) __arm_vldrhq_s16(__base)
+#define vldrhq_u32(__base) __arm_vldrhq_u32(__base)
+#define vldrhq_u16(__base) __arm_vldrhq_u16(__base)
+#define vldrhq_z_s32(__base, __p) __arm_vldrhq_z_s32(__base, __p)
+#define vldrhq_z_s16(__base, __p) __arm_vldrhq_z_s16(__base, __p)
+#define vldrhq_z_u32(__base, __p) __arm_vldrhq_z_u32(__base, __p)
+#define vldrhq_z_u16(__base, __p) __arm_vldrhq_z_u16(__base, __p)
+#define vldrwq_s32(__base) __arm_vldrwq_s32(__base)
+#define vldrwq_u32(__base) __arm_vldrwq_u32(__base)
+#define vldrwq_z_s32(__base, __p) __arm_vldrwq_z_s32(__base, __p)
+#define vldrwq_z_u32(__base, __p) __arm_vldrwq_z_u32(__base, __p)
+#define vld1q_f32(__base) __arm_vld1q_f32(__base)
+#define vld1q_f16(__base) __arm_vld1q_f16(__base)
+#define vldrhq_f16(__base) __arm_vldrhq_f16(__base)
+#define vldrhq_z_f16(__base, __p) __arm_vldrhq_z_f16(__base, __p)
+#define vldrwq_f32(__base) __arm_vldrwq_f32(__base)
+#define vldrwq_z_f32(__base, __p) __arm_vldrwq_z_f32(__base, __p)
 #endif
 
 __extension__ extern __inline void
@@ -11443,6 +11483,245 @@ __arm_vldrwq_gather_base_z_u32 (uint32x4_t __addr, const int __offset, mve_pred1
   return __builtin_mve_vldrwq_gather_base_z_uv4si (__addr, __offset, __p);
 }
 
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld1q_s8 (int8_t const * __base)
+{
+  return __builtin_mve_vld1q_sv16qi ((__builtin_neon_qi *) __base);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld1q_s32 (int32_t const * __base)
+{
+  return __builtin_mve_vld1q_sv4si ((__builtin_neon_si *) __base);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld1q_s16 (int16_t const * __base)
+{
+  return __builtin_mve_vld1q_sv8hi ((__builtin_neon_hi *) __base);
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld1q_u8 (uint8_t const * __base)
+{
+  return __builtin_mve_vld1q_uv16qi ((__builtin_neon_qi *) __base);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld1q_u32 (uint32_t const * __base)
+{
+  return __builtin_mve_vld1q_uv4si ((__builtin_neon_si *) __base);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld1q_u16 (uint16_t const * __base)
+{
+  return __builtin_mve_vld1q_uv8hi ((__builtin_neon_hi *) __base);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrhq_gather_offset_s32 (int16_t const * __base, uint32x4_t __offset)
+{
+  return __builtin_mve_vldrhq_gather_offset_sv4si ((__builtin_neon_hi *) __base, __offset);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrhq_gather_offset_s16 (int16_t const * __base, uint16x8_t __offset)
+{
+  return __builtin_mve_vldrhq_gather_offset_sv8hi ((__builtin_neon_hi *) __base, __offset);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrhq_gather_offset_u32 (uint16_t const * __base, uint32x4_t __offset)
+{
+  return __builtin_mve_vldrhq_gather_offset_uv4si ((__builtin_neon_hi *) __base, __offset);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrhq_gather_offset_u16 (uint16_t const * __base, uint16x8_t __offset)
+{
+  return __builtin_mve_vldrhq_gather_offset_uv8hi ((__builtin_neon_hi *) __base, __offset);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrhq_gather_offset_z_s32 (int16_t const * __base, uint32x4_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrhq_gather_offset_z_sv4si ((__builtin_neon_hi *) __base, __offset, __p);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrhq_gather_offset_z_s16 (int16_t const * __base, uint16x8_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrhq_gather_offset_z_sv8hi ((__builtin_neon_hi *) __base, __offset, __p);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrhq_gather_offset_z_u32 (uint16_t const * __base, uint32x4_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrhq_gather_offset_z_uv4si ((__builtin_neon_hi *) __base, __offset, __p);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrhq_gather_offset_z_u16 (uint16_t const * __base, uint16x8_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrhq_gather_offset_z_uv8hi ((__builtin_neon_hi *) __base, __offset, __p);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrhq_gather_shifted_offset_s32 (int16_t const * __base, uint32x4_t __offset)
+{
+  return __builtin_mve_vldrhq_gather_shifted_offset_sv4si ((__builtin_neon_hi *) __base, __offset);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrhq_gather_shifted_offset_s16 (int16_t const * __base, uint16x8_t __offset)
+{
+  return __builtin_mve_vldrhq_gather_shifted_offset_sv8hi ((__builtin_neon_hi *) __base, __offset);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrhq_gather_shifted_offset_u32 (uint16_t const * __base, uint32x4_t __offset)
+{
+  return __builtin_mve_vldrhq_gather_shifted_offset_uv4si ((__builtin_neon_hi *) __base, __offset);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrhq_gather_shifted_offset_u16 (uint16_t const * __base, uint16x8_t __offset)
+{
+  return __builtin_mve_vldrhq_gather_shifted_offset_uv8hi ((__builtin_neon_hi *) __base, __offset);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrhq_gather_shifted_offset_z_s32 (int16_t const * __base, uint32x4_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrhq_gather_shifted_offset_z_sv4si ((__builtin_neon_hi *) __base, __offset, __p);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrhq_gather_shifted_offset_z_s16 (int16_t const * __base, uint16x8_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrhq_gather_shifted_offset_z_sv8hi ((__builtin_neon_hi *) __base, __offset, __p);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrhq_gather_shifted_offset_z_u32 (uint16_t const * __base, uint32x4_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrhq_gather_shifted_offset_z_uv4si ((__builtin_neon_hi *) __base, __offset, __p);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrhq_gather_shifted_offset_z_u16 (uint16_t const * __base, uint16x8_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrhq_gather_shifted_offset_z_uv8hi ((__builtin_neon_hi *) __base, __offset, __p);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrhq_s32 (int16_t const * __base)
+{
+  return __builtin_mve_vldrhq_sv4si ((__builtin_neon_hi *) __base);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrhq_s16 (int16_t const * __base)
+{
+  return __builtin_mve_vldrhq_sv8hi ((__builtin_neon_hi *) __base);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrhq_u32 (uint16_t const * __base)
+{
+  return __builtin_mve_vldrhq_uv4si ((__builtin_neon_hi *) __base);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrhq_u16 (uint16_t const * __base)
+{
+  return __builtin_mve_vldrhq_uv8hi ((__builtin_neon_hi *) __base);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrhq_z_s32 (int16_t const * __base, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrhq_z_sv4si ((__builtin_neon_hi *) __base, __p);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrhq_z_s16 (int16_t const * __base, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrhq_z_sv8hi ((__builtin_neon_hi *) __base, __p);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrhq_z_u32 (uint16_t const * __base, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrhq_z_uv4si ((__builtin_neon_hi *) __base, __p);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrhq_z_u16 (uint16_t const * __base, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrhq_z_uv8hi ((__builtin_neon_hi *) __base, __p);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_s32 (int32_t const * __base)
+{
+  return __builtin_mve_vldrwq_sv4si ((__builtin_neon_si *) __base);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_u32 (uint32_t const * __base)
+{
+  return __builtin_mve_vldrwq_uv4si ((__builtin_neon_si *) __base);
+}
+
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_z_s32 (int32_t const * __base, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrwq_z_sv4si ((__builtin_neon_si *) __base, __p);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_z_u32 (uint32_t const * __base, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrwq_z_uv4si ((__builtin_neon_si *) __base, __p);
+}
+
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
@@ -13585,6 +13864,47 @@ __arm_vsubq_m_n_f16 (float16x8_t __inactive, float16x8_t __a, float16_t __b, mve
   return __builtin_mve_vsubq_m_n_fv8hf (__inactive, __a, __b, __p);
 }
 
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld1q_f32 (float32_t const * __base)
+{
+  return __builtin_mve_vld1q_fv4sf((__builtin_neon_si *) __base);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vld1q_f16 (float16_t const * __base)
+{
+  return __builtin_mve_vld1q_fv8hf((__builtin_neon_hi *) __base);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_f32 (float32_t const * __base)
+{
+  return __builtin_mve_vldrwq_fv4sf((__builtin_neon_si *) __base);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_z_f32 (float32_t const * __base, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrwq_z_fv4sf((__builtin_neon_si *) __base, __p);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrhq_z_f16 (float16_t const * __base, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrhq_z_fv8hf((__builtin_neon_hi *) __base, __p);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrhq_f16 (float16_t const * __base)
+{
+  return __builtin_mve_vldrhq_fv8hf((__builtin_neon_hi *) __base);
+}
 #endif
 
 enum {
@@ -15132,6 +15452,18 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vorrq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vorrq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
 
+#define vld1q(p0) __arm_vld1q(p0)
+#define __arm_vld1q(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int8_t_const_ptr]: __arm_vld1q_s8 (__ARM_mve_coerce(__p0, int8_t const *)), \
+  int (*)[__ARM_mve_type_int16_t_const_ptr]: __arm_vld1q_s16 (__ARM_mve_coerce(__p0, int16_t const *)), \
+  int (*)[__ARM_mve_type_int32_t_const_ptr]: __arm_vld1q_s32 (__ARM_mve_coerce(__p0, int32_t const *)), \
+  int (*)[__ARM_mve_type_uint8_t_const_ptr]: __arm_vld1q_u8 (__ARM_mve_coerce(__p0, uint8_t const *)), \
+  int (*)[__ARM_mve_type_uint16_t_const_ptr]: __arm_vld1q_u16 (__ARM_mve_coerce(__p0, uint16_t const *)), \
+  int (*)[__ARM_mve_type_uint32_t_const_ptr]: __arm_vld1q_u32 (__ARM_mve_coerce(__p0, uint32_t const *)), \
+  int (*)[__ARM_mve_type_float16_t_const_ptr]: __arm_vld1q_f16 (__ARM_mve_coerce(__p0, float16_t const *)), \
+  int (*)[__ARM_mve_type_float32_t_const_ptr]: __arm_vld1q_f32 (__ARM_mve_coerce(__p0, float32_t const *)));})
+
 #else /* MVE Interger.  */
 
 #define vst4q(p0,p1) __arm_vst4q(p0,p1)
@@ -18071,6 +18403,52 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint8_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrbq_gather_offset_z_u16 (__ARM_mve_coerce(__p0, uint8_t const *), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
   int (*)[__ARM_mve_type_uint8_t_const_ptr][__ARM_mve_type_uint32x4_t]: __arm_vldrbq_gather_offset_z_u32 (__ARM_mve_coerce(__p0, uint8_t const *), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
 
+#define vld1q(p0) __arm_vld1q(p0)
+#define __arm_vld1q(p0) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int8_t_const_ptr]: __arm_vld1q_s8 (__ARM_mve_coerce(__p0, int8_t const *)), \
+  int (*)[__ARM_mve_type_int16_t_const_ptr]: __arm_vld1q_s16 (__ARM_mve_coerce(__p0, int16_t const *)), \
+  int (*)[__ARM_mve_type_int32_t_const_ptr]: __arm_vld1q_s32 (__ARM_mve_coerce(__p0, int32_t const *)), \
+  int (*)[__ARM_mve_type_uint8_t_const_ptr]: __arm_vld1q_u8 (__ARM_mve_coerce(__p0, uint8_t const *)), \
+  int (*)[__ARM_mve_type_uint16_t_const_ptr]: __arm_vld1q_u16 (__ARM_mve_coerce(__p0, uint16_t const *)), \
+  int (*)[__ARM_mve_type_uint32_t_const_ptr]: __arm_vld1q_u32 (__ARM_mve_coerce(__p0, uint32_t const *)));})
+
+#define vldrhq_gather_offset(p0,p1) __arm_vldrhq_gather_offset(p0,p1)
+#define __arm_vldrhq_gather_offset(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int16_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrhq_gather_offset_s16 (__ARM_mve_coerce(__p0, int16_t const *), __ARM_mve_coerce(__p1, uint16x8_t)), \
+  int (*)[__ARM_mve_type_int16_t_const_ptr][__ARM_mve_type_uint32x4_t]: __arm_vldrhq_gather_offset_s32 (__ARM_mve_coerce(__p0, int16_t const *), __ARM_mve_coerce(__p1, uint32x4_t)), \
+  int (*)[__ARM_mve_type_uint16_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrhq_gather_offset_u16 (__ARM_mve_coerce(__p0, uint16_t const *), __ARM_mve_coerce(__p1, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint16_t_const_ptr][__ARM_mve_type_uint32x4_t]: __arm_vldrhq_gather_offset_u32 (__ARM_mve_coerce(__p0, uint16_t const *), __ARM_mve_coerce(__p1, uint32x4_t)));})
+
+#define vldrhq_gather_offset_z(p0,p1,p2) __arm_vldrhq_gather_offset_z(p0,p1,p2)
+#define __arm_vldrhq_gather_offset_z(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int16_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrhq_gather_offset_z_s16 (__ARM_mve_coerce(__p0, int16_t const *), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
+  int (*)[__ARM_mve_type_int16_t_const_ptr][__ARM_mve_type_uint32x4_t]: __arm_vldrhq_gather_offset_z_s32 (__ARM_mve_coerce(__p0, int16_t const *), __ARM_mve_coerce(__p1, uint32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint16_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrhq_gather_offset_z_u16 (__ARM_mve_coerce(__p0, uint16_t const *), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
+  int (*)[__ARM_mve_type_uint16_t_const_ptr][__ARM_mve_type_uint32x4_t]: __arm_vldrhq_gather_offset_z_u32 (__ARM_mve_coerce(__p0, uint16_t const *), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
+
+#define vldrhq_gather_shifted_offset(p0,p1) __arm_vldrhq_gather_shifted_offset(p0,p1)
+#define __arm_vldrhq_gather_shifted_offset(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int16_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrhq_gather_shifted_offset_s16 (__ARM_mve_coerce(__p0, int16_t const *), __ARM_mve_coerce(__p1, uint16x8_t)), \
+  int (*)[__ARM_mve_type_int16_t_const_ptr][__ARM_mve_type_uint32x4_t]: __arm_vldrhq_gather_shifted_offset_s32 (__ARM_mve_coerce(__p0, int16_t const *), __ARM_mve_coerce(__p1, uint32x4_t)), \
+  int (*)[__ARM_mve_type_uint16_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrhq_gather_shifted_offset_u16 (__ARM_mve_coerce(__p0, uint16_t const *), __ARM_mve_coerce(__p1, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint16_t_const_ptr][__ARM_mve_type_uint32x4_t]: __arm_vldrhq_gather_shifted_offset_u32 (__ARM_mve_coerce(__p0, uint16_t const *), __ARM_mve_coerce(__p1, uint32x4_t)));})
+
+#define vldrhq_gather_shifted_offset_z(p0,p1,p2) __arm_vldrhq_gather_shifted_offset_z(p0,p1,p2)
+#define __arm_vldrhq_gather_shifted_offset_z(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int16_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrhq_gather_shifted_offset_z_s16 (__ARM_mve_coerce(__p0, int16_t const *), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
+  int (*)[__ARM_mve_type_int16_t_const_ptr][__ARM_mve_type_uint32x4_t]: __arm_vldrhq_gather_shifted_offset_z_s32 (__ARM_mve_coerce(__p0, int16_t const *), __ARM_mve_coerce(__p1, uint32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint16_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrhq_gather_shifted_offset_z_u16 (__ARM_mve_coerce(__p0, uint16_t const *), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
+  int (*)[__ARM_mve_type_uint16_t_const_ptr][__ARM_mve_type_uint32x4_t]: __arm_vldrhq_gather_shifted_offset_z_u32 (__ARM_mve_coerce(__p0, uint16_t const *), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
+
 #endif /* MVE Floating point.  */
 
 #ifdef __cplusplus
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index 45b00e889cfe716a2c70fb86d72eb9a4c411b70d..be407f4d690cadadbdbdab30aed5b0339178dda9 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -709,3 +709,26 @@ VAR3 (LDRGS_Z, vldrbq_gather_offset_z_s, v16qi, v8hi, v4si)
 VAR3 (LDRGU_Z, vldrbq_gather_offset_z_u, v16qi, v8hi, v4si)
 VAR3 (LDRS_Z, vldrbq_z_s, v16qi, v8hi, v4si)
 VAR3 (LDRU_Z, vldrbq_z_u, v16qi, v8hi, v4si)
+VAR3 (LDRU, vld1q_u, v16qi, v8hi, v4si)
+VAR3 (LDRS, vld1q_s, v16qi, v8hi, v4si)
+VAR2 (LDRU_Z, vldrhq_z_u, v8hi, v4si)
+VAR2 (LDRU, vldrhq_u, v8hi, v4si)
+VAR2 (LDRS_Z, vldrhq_z_s, v8hi, v4si)
+VAR2 (LDRS, vldrhq_s, v8hi, v4si)
+VAR2 (LDRS, vld1q_f, v8hf, v4sf)
+VAR2 (LDRGU_Z, vldrhq_gather_shifted_offset_z_u, v8hi, v4si)
+VAR2 (LDRGU_Z, vldrhq_gather_offset_z_u, v8hi, v4si)
+VAR2 (LDRGU, vldrhq_gather_shifted_offset_u, v8hi, v4si)
+VAR2 (LDRGU, vldrhq_gather_offset_u, v8hi, v4si)
+VAR2 (LDRGS_Z, vldrhq_gather_shifted_offset_z_s, v8hi, v4si)
+VAR2 (LDRGS_Z, vldrhq_gather_offset_z_s, v8hi, v4si)
+VAR2 (LDRGS, vldrhq_gather_shifted_offset_s, v8hi, v4si)
+VAR2 (LDRGS, vldrhq_gather_offset_s, v8hi, v4si)
+VAR1 (LDRS, vldrhq_f, v8hf)
+VAR1 (LDRS_Z, vldrhq_z_f, v8hf)
+VAR1 (LDRS, vldrwq_f, v4sf)
+VAR1 (LDRS, vldrwq_s, v4si)
+VAR1 (LDRU, vldrwq_u, v4si)
+VAR1 (LDRS_Z, vldrwq_z_f, v4sf)
+VAR1 (LDRS_Z, vldrwq_z_s, v4si)
+VAR1 (LDRU_Z, vldrwq_z_u, v4si)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 05b94e4ee283da427aee18c7223bf5ff4b0e1e4a..31ccb7b1608713e89ae9866b0b124efe8dc88ae3 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -26,6 +26,7 @@
 (define_mode_iterator MVE_3 [V16QI V8HI])
 (define_mode_iterator MVE_2 [V16QI V8HI V4SI])
 (define_mode_iterator MVE_5 [V8HI V4SI])
+(define_mode_iterator MVE_6 [V8HI V4SI])
 
 (define_c_enum "unspec" [VST4Q VRNDXQ_F VRNDQ_F VRNDPQ_F VRNDNQ_F VRNDMQ_F
 			 VRNDAQ_F VREV64Q_F VNEGQ_F VDUPQ_N_F VABSQ_F VREV32Q_F
@@ -193,10 +194,13 @@
 			 VFMAQ_M_N_F VFMASQ_M_N_F VFMSQ_M_F VMAXNMQ_M_F
 			 VMINNMQ_M_F VSUBQ_M_F VSTRWQSB_S VSTRWQSB_U
 			 VSTRBQSO_S VSTRBQSO_U VSTRBQ_S VSTRBQ_U VLDRBQGO_S
-			 VLDRBQGO_U VLDRBQ_S VLDRBQ_U VLDRWQGB_S VLDRWQGB_U])
+			 VLDRBQGO_U VLDRBQ_S VLDRBQ_U VLDRWQGB_S VLDRWQGB_U
+			 VLD1Q_F VLD1Q_S VLD1Q_U VLDRHQ_F VLDRHQGO_S
+			 VLDRHQGO_U VLDRHQGSO_S VLDRHQGSO_U VLDRHQ_S VLDRHQ_U
+			 VLDRWQ_F VLDRWQ_S VLDRWQ_U])
 
-(define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF")
-			    (V8HF "V8HI") (V4SF "V4SI")])
+(define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF") (V8HF "V8HI")
+			    (V4SF "V4SI")])
 
 (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
 		       (VREV16Q_U "u") (VMVNQ_N_S "s") (VMVNQ_N_U "u")
@@ -348,7 +352,11 @@
 		       (VSTRWQSB_S "s") (VSTRWQSB_U "u") (VSTRBQSO_S "s")
 		       (VSTRBQSO_U "u") (VSTRBQ_S "s") (VSTRBQ_U "u")
 		       (VLDRBQGO_S "s") (VLDRBQGO_U "u") (VLDRBQ_S "s")
-		       (VLDRBQ_U "u") (VLDRWQGB_S "s") (VLDRWQGB_U "u")])
+		       (VLDRBQ_U "u") (VLDRWQGB_S "s") (VLDRWQGB_U "u")
+		       (VLD1Q_S "s") (VLD1Q_U "u") (VLDRHQGO_S "s")
+		       (VLDRHQGO_U "u") (VLDRHQGSO_S "s") (VLDRHQGSO_U "u")
+		       (VLDRHQ_S "s") (VLDRHQ_U "u") (VLDRWQ_S "s")
+		       (VLDRWQ_U "u")])
 
 (define_int_attr mode1 [(VCTP8Q "8") (VCTP16Q "16") (VCTP32Q "32")
 			(VCTP64Q "64") (VCTP8Q_M "8") (VCTP16Q_M "16")
@@ -362,10 +370,12 @@
 				   (V4SI "mve_imm_31")])
 (define_mode_attr MVE_constraint3 [ (V8HI "Rb") (V4SI "Rd")])
 (define_mode_attr MVE_pred3 [ (V8HI "mve_imm_8") (V4SI "mve_imm_16")])
-
 (define_mode_attr MVE_constraint1 [ (V8HI "Ra") (V4SI "Rc")])
 (define_mode_attr MVE_pred1 [ (V8HI "mve_imm_7") (V4SI "mve_imm_15")])
 (define_mode_attr MVE_B_ELEM [ (V16QI "V16QI") (V8HI "V8QI") (V4SI "V4QI")])
+(define_mode_attr MVE_H_ELEM [ (V8HI "V8HI") (V4SI "V4HI")])
+(define_mode_attr V_sz_elem1 [(V16QI "b") (V8HI  "h") (V4SI "w") (V8HF "h")
+			      (V4SF "w")])
 
 (define_int_iterator VCVTQ_TO_F [VCVTQ_TO_F_S VCVTQ_TO_F_U])
 (define_int_iterator VMVNQ_N [VMVNQ_N_U VMVNQ_N_S])
@@ -575,6 +585,11 @@
 (define_int_iterator VLDRBGOQ [VLDRBQGO_S VLDRBQGO_U])
 (define_int_iterator VLDRBQ [VLDRBQ_S VLDRBQ_U])
 (define_int_iterator VLDRWGBQ [VLDRWQGB_S VLDRWQGB_U])
+(define_int_iterator VLD1Q [VLD1Q_S VLD1Q_U])
+(define_int_iterator VLDRHGOQ [VLDRHQGO_S VLDRHQGO_U])
+(define_int_iterator VLDRHGSOQ [VLDRHQGSO_S VLDRHQGSO_U])
+(define_int_iterator VLDRHQ [VLDRHQ_S VLDRHQ_U])
+(define_int_iterator VLDRWQ [VLDRWQ_S VLDRWQ_U])
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
@@ -8201,3 +8216,276 @@
    return "";
 }
   [(set_attr "length" "8")])
+
+;;
+;; [vldrhq_f]
+;;
+(define_insn "mve_vldrhq_fv8hf"
+  [(set (match_operand:V8HF 0 "s_register_operand" "=w")
+	(unspec:V8HF [(match_operand:V8HI 1 "memory_operand" "Us")]
+	 VLDRHQ_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+   rtx ops[2];
+   int regno = REGNO (operands[0]);
+   ops[0] = gen_rtx_REG (TImode, regno);
+   ops[1]  = operands[1];
+   output_asm_insn ("vldrh.f16\t%q0, %E1",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+;;
+;; [vldrhq_gather_offset_s vldrhq_gather_offset_u]
+;;
+(define_insn "mve_vldrhq_gather_offset_<supf><mode>"
+  [(set (match_operand:MVE_6 0 "s_register_operand" "=&w")
+	(unspec:MVE_6 [(match_operand:<MVE_H_ELEM> 1 "memory_operand" "Us")
+		       (match_operand:MVE_6 2 "s_register_operand" "w")]
+	VLDRHGOQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   if (!strcmp ("<supf>","s") && <V_sz_elem> == 16)
+     output_asm_insn ("vldrh.u16\t%q0, [%m1, %q2]",ops);
+   else
+     output_asm_insn ("vldrh.<supf><V_sz_elem>\t%q0, [%m1, %q2]",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+;;
+;; [vldrhq_gather_offset_z_s vldrhq_gather_offset_z_u]
+;;
+(define_insn "mve_vldrhq_gather_offset_z_<supf><mode>"
+  [(set (match_operand:MVE_6 0 "s_register_operand" "=&w")
+	(unspec:MVE_6 [(match_operand:<MVE_H_ELEM> 1 "memory_operand" "Us")
+		       (match_operand:MVE_6 2 "s_register_operand" "w")
+		       (match_operand:HI 3 "vpr_register_operand" "Up")
+	]VLDRHGOQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[4];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   ops[3] = operands[3];
+   if (!strcmp ("<supf>","s") && <V_sz_elem> == 16)
+     output_asm_insn ("vpst\n\tvldrht.u16\t%q0, [%m1, %q2]",ops);
+   else
+     output_asm_insn ("vpst\n\tvldrht.<supf><V_sz_elem>\t%q0, [%m1, %q2]",ops);
+   return "";
+}
+ [(set_attr "length" "8")])
+
+;;
+;; [vldrhq_gather_shifted_offset_s vldrhq_gather_shifted_offset_u]
+;;
+(define_insn "mve_vldrhq_gather_shifted_offset_<supf><mode>"
+  [(set (match_operand:MVE_6 0 "s_register_operand" "=&w")
+	(unspec:MVE_6 [(match_operand:<MVE_H_ELEM> 1 "memory_operand" "Us")
+		       (match_operand:MVE_6 2 "s_register_operand" "w")]
+	VLDRHGSOQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+      if (!strcmp ("<supf>","s") && <V_sz_elem> == 16)
+     output_asm_insn ("vldrh.u16\t%q0, [%m1, %q2, uxtw #1]",ops);
+   else
+     output_asm_insn ("vldrh.<supf><V_sz_elem>\t%q0, [%m1, %q2, uxtw #1]",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+;;
+;; [vldrhq_gather_shifted_offset_z_s vldrhq_gather_shited_offset_z_u]
+;;
+(define_insn "mve_vldrhq_gather_shifted_offset_z_<supf><mode>"
+  [(set (match_operand:MVE_6 0 "s_register_operand" "=&w")
+	(unspec:MVE_6 [(match_operand:<MVE_H_ELEM> 1 "memory_operand" "Us")
+		       (match_operand:MVE_6 2 "s_register_operand" "w")
+		       (match_operand:HI 3 "vpr_register_operand" "Up")
+	]VLDRHGSOQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[4];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   ops[3] = operands[3];
+   if (!strcmp ("<supf>","s") && <V_sz_elem> == 16)
+     output_asm_insn ("vpst\n\tvldrht.u16\t%q0, [%m1, %q2, uxtw #1]",ops);
+   else
+     output_asm_insn ("vpst\n\tvldrht.<supf><V_sz_elem>\t%q0, [%m1, %q2, uxtw #1]",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+;;
+;;
+;; [vldrhq_s, vldrhq_u]
+;;
+(define_insn "mve_vldrhq_<supf><mode>"
+  [(set (match_operand:MVE_6 0 "s_register_operand" "=w")
+	(unspec:MVE_6 [(match_operand:<MVE_H_ELEM> 1 "memory_operand" "Us")]
+	 VLDRHQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[2];
+   int regno = REGNO (operands[0]);
+   ops[0] = gen_rtx_REG (TImode, regno);
+   ops[1]  = operands[1];
+   output_asm_insn ("vldrh.<supf><V_sz_elem>\t%q0, %E1",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+;;
+;; [vldrhq_z_f]
+;;
+(define_insn "mve_vldrhq_z_fv8hf"
+  [(set (match_operand:V8HF 0 "s_register_operand" "=w")
+	(unspec:V8HF [(match_operand:V8HI 1 "memory_operand" "Us")
+	(match_operand:HI 2 "vpr_register_operand" "Up")]
+	 VLDRHQ_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+   rtx ops[2];
+   int regno = REGNO (operands[0]);
+   ops[0] = gen_rtx_REG (TImode, regno);
+   ops[1]  = operands[1];
+   output_asm_insn ("vpst\n\tvldrht.f16\t%q0, %E1",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+;;
+;; [vldrhq_z_s vldrhq_z_u]
+;;
+(define_insn "mve_vldrhq_z_<supf><mode>"
+  [(set (match_operand:MVE_6 0 "s_register_operand" "=w")
+	(unspec:MVE_6 [(match_operand:<MVE_H_ELEM> 1 "memory_operand" "Us")
+	(match_operand:HI 2 "vpr_register_operand" "Up")]
+	 VLDRHQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[2];
+   int regno = REGNO (operands[0]);
+   ops[0] = gen_rtx_REG (TImode, regno);
+   ops[1]  = operands[1];
+   output_asm_insn ("vpst\n\tvldrht.<supf><V_sz_elem>\t%q0, %E1",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+;;
+;; [vldrwq_f]
+;;
+(define_insn "mve_vldrwq_fv4sf"
+  [(set (match_operand:V4SF 0 "s_register_operand" "=w")
+	(unspec:V4SF [(match_operand:V4SI 1 "memory_operand" "Us")]
+	 VLDRWQ_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+   rtx ops[2];
+   int regno = REGNO (operands[0]);
+   ops[0] = gen_rtx_REG (TImode, regno);
+   ops[1]  = operands[1];
+   output_asm_insn ("vldrw.f32\t%q0, %E1",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+;;
+;; [vldrwq_s vldrwq_u]
+;;
+(define_insn "mve_vldrwq_<supf>v4si"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=w")
+	(unspec:V4SI [(match_operand:V4SI 1 "memory_operand" "Us")]
+	 VLDRWQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[2];
+   int regno = REGNO (operands[0]);
+   ops[0] = gen_rtx_REG (TImode, regno);
+   ops[1]  = operands[1];
+   output_asm_insn ("vldrw.<supf>32\t%q0, %E1",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+;;
+;; [vldrwq_z_f]
+;;
+(define_insn "mve_vldrwq_z_fv4sf"
+  [(set (match_operand:V4SF 0 "s_register_operand" "=w")
+	(unspec:V4SF [(match_operand:V4SI 1 "memory_operand" "Us")
+	(match_operand:HI 2 "vpr_register_operand" "Up")]
+	 VLDRWQ_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+   rtx ops[2];
+   int regno = REGNO (operands[0]);
+   ops[0] = gen_rtx_REG (TImode, regno);
+   ops[1]  = operands[1];
+   output_asm_insn ("vpst\n\tvldrwt.f32\t%q0, %E1",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+;;
+;; [vldrwq_z_s vldrwq_z_u]
+;;
+(define_insn "mve_vldrwq_z_<supf>v4si"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=w")
+	(unspec:V4SI [(match_operand:V4SI 1 "memory_operand" "Us")
+	(match_operand:HI 2 "vpr_register_operand" "Up")]
+	 VLDRWQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[2];
+   int regno = REGNO (operands[0]);
+   ops[0] = gen_rtx_REG (TImode, regno);
+   ops[1]  = operands[1];
+   output_asm_insn ("vpst\n\tvldrwt.<supf>32\t%q0, %E1",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+(define_expand "mve_vld1q_f<mode>"
+  [(match_operand:MVE_0 0 "s_register_operand")
+   (unspec:MVE_0 [(match_operand:<MVE_CNVT> 1 "memory_operand")] VLD1Q_F)
+  ]
+  "TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT"
+{
+  emit_insn (gen_mve_vldr<V_sz_elem1>q_f<mode>(operands[0],operands[1]));
+  DONE;
+})
+
+(define_expand "mve_vld1q_<supf><mode>"
+  [(match_operand:MVE_2 0 "s_register_operand")
+   (unspec:MVE_2 [(match_operand:MVE_2 1 "memory_operand")] VLD1Q)
+  ]
+  "TARGET_HAVE_MVE"
+{
+  emit_insn (gen_mve_vldr<V_sz_elem1>q_<supf><mode>(operands[0],operands[1]));
+  DONE;
+})
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..9665c7d6721016bd7cccf3d6df5bb09746216e53
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16_t const * base)
+{
+  return vld1q_f16 (base);
+}
+
+/* { dg-final { scan-assembler "vldrh.f16"  }  } */
+
+float16x8_t
+foo1 (float16_t const * base)
+{
+  return vld1q (base);
+}
+
+/* { dg-final { scan-assembler "vldrh.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..8f720bd28a293858575ae9a7ed25aeb7cde827f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_f32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32_t const * base)
+{
+  return vld1q_f32 (base);
+}
+
+/* { dg-final { scan-assembler "vldrw.f32"  }  } */
+
+float32x4_t
+foo1 (float32_t const * base)
+{
+  return vld1q (base);
+}
+
+/* { dg-final { scan-assembler "vldrw.f32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..9fa37f82fd604b225ba038673ef937bf3778f5ba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int16_t const * base)
+{
+  return vld1q_s16 (base);
+}
+
+/* { dg-final { scan-assembler "vldrh.s16"  }  } */
+
+int16x8_t
+foo1 (int16_t const * base)
+{
+  return vld1q (base);
+}
+
+/* { dg-final { scan-assembler "vldrh.s16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..38af9e66c2b19133a21f87c9482f9370d8f9b69d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32_t const * base)
+{
+  return vld1q_s32 (base);
+}
+
+/* { dg-final { scan-assembler "vldrw.s32"  }  } */
+
+int32x4_t
+foo1 (int32_t const * base)
+{
+  return vld1q (base);
+}
+
+/* { dg-final { scan-assembler "vldrw.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..a8b4c1fe3addec2ced922d66bf5d9f554fb2392c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo (int8_t const * base)
+{
+  return vld1q_s8 (base);
+}
+
+/* { dg-final { scan-assembler "vldrb.s8"  }  } */
+
+int8x16_t
+foo1 (int8_t const * base)
+{
+  return vld1q (base);
+}
+
+/* { dg-final { scan-assembler "vldrb.s8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..b69a0642e2ebaf62a4f7c164b23085dab7d1237d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint16_t const * base)
+{
+  return vld1q_u16 (base);
+}
+
+/* { dg-final { scan-assembler "vldrh.u16"  }  } */
+
+uint16x8_t
+foo1 (uint16_t const * base)
+{
+  return vld1q (base);
+}
+
+/* { dg-final { scan-assembler "vldrh.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..cbe87e31750d71dd1f5deaf8166dd3ec05c81d16
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32_t const * base)
+{
+  return vld1q_u32 (base);
+}
+
+/* { dg-final { scan-assembler "vldrw.u32"  }  } */
+
+uint32x4_t
+foo1 (uint32_t const * base)
+{
+  return vld1q (base);
+}
+
+/* { dg-final { scan-assembler "vldrw.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..75637eb88d20ce3eebd3030062ed8c23c8b1931e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vld1q_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint8_t const * base)
+{
+  return vld1q_u8 (base);
+}
+
+/* { dg-final { scan-assembler "vldrb.u8"  }  } */
+
+uint8x16_t
+foo1 (uint8_t const * base)
+{
+  return vld1q (base);
+}
+
+/* { dg-final { scan-assembler "vldrb.u8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..80fcdfc3bf66f73d6f2e91e5be9aba8ccb5a0af7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16_t const * base)
+{
+  return vldrhq_f16 (base);
+}
+
+/* { dg-final { scan-assembler "vldrh.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..cf429f323b6c59717e2d073355a7b6a6189bce71
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int16_t const * base, uint16x8_t offset)
+{
+  return vldrhq_gather_offset_s16 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrh.u16"  }  } */
+
+int16x8_t
+foo1 (int16_t const * base, uint16x8_t offset)
+{
+  return vldrhq_gather_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrh.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..264090080cb4435d5fa0512e16648f111dc79ad5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int16_t const * base, uint32x4_t offset)
+{
+  return vldrhq_gather_offset_s32 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrh.s32"  }  } */
+
+int32x4_t
+foo1 (int16_t const * base, uint32x4_t offset)
+{
+  return vldrhq_gather_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrh.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..9c46661a2a94cfb7f34658ee5f90ec8de626dee4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint16_t const * base, uint16x8_t offset)
+{
+  return vldrhq_gather_offset_u16 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrh.u16"  }  } */
+
+uint16x8_t
+foo1 (uint16_t const * base, uint16x8_t offset)
+{
+  return vldrhq_gather_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrh.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..93efe5d1e38a7be3bb00009fc631528b76c4998e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint16_t const * base, uint32x4_t offset)
+{
+  return vldrhq_gather_offset_u32 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrh.u32"  }  } */
+
+uint32x4_t
+foo1 (uint16_t const * base, uint32x4_t offset)
+{
+  return vldrhq_gather_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrh.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_z_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_z_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..a1beb003049d57f036ddd8ea14260985d0befc89
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_z_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int16_t const * base, uint16x8_t offset, mve_pred16_t p)
+{
+  return vldrhq_gather_offset_z_s16 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.u16"  }  } */
+
+int16x8_t
+foo1 (int16_t const * base, uint16x8_t offset, mve_pred16_t p)
+{
+  return vldrhq_gather_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_z_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_z_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..3f5173ba44a8759106623f2cc392425a32aaf184
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_z_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int16_t const * base, uint32x4_t offset, mve_pred16_t p)
+{
+  return vldrhq_gather_offset_z_s32 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.s32"  }  } */
+
+int32x4_t
+foo1 (int16_t const * base, uint32x4_t offset, mve_pred16_t p)
+{
+  return vldrhq_gather_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_z_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_z_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..4059307f77a1070fa621dc6dab99f340012a9540
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_z_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint16_t const * base, uint16x8_t offset, mve_pred16_t p)
+{
+  return vldrhq_gather_offset_z_u16 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.u16"  }  } */
+
+uint16x8_t
+foo1 (uint16_t const * base, uint16x8_t offset, mve_pred16_t p)
+{
+  return vldrhq_gather_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_z_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_z_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..4e14c5ad4cdaaf5b12fdb9c807c5ddc5f9206173
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_z_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint16_t const * base, uint32x4_t offset, mve_pred16_t p)
+{
+  return vldrhq_gather_offset_z_u32 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.u32"  }  } */
+
+uint32x4_t
+foo1 (uint16_t const * base, uint32x4_t offset, mve_pred16_t p)
+{
+  return vldrhq_gather_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..c4c520344eaadeb250aab0833d5d93552d4004cf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int16_t const * base, uint16x8_t offset)
+{
+  return vldrhq_gather_shifted_offset_s16 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrh.u16"  }  } */
+
+int16x8_t
+foo1 (int16_t const * base, uint16x8_t offset)
+{
+  return vldrhq_gather_shifted_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrh.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..b4ef042f21d3965b370991269c76e18d652ad13c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int16_t const * base, uint32x4_t offset)
+{
+  return vldrhq_gather_shifted_offset_s32 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrh.s32"  }  } */
+
+int32x4_t
+foo1 (int16_t const * base, uint32x4_t offset)
+{
+  return vldrhq_gather_shifted_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrh.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..972b6d1035035e432167bdb72a07d141b256dd16
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint16_t const * base, uint16x8_t offset)
+{
+  return vldrhq_gather_shifted_offset_u16 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrh.u16"  }  } */
+
+uint16x8_t
+foo1 (uint16_t const * base, uint16x8_t offset)
+{
+  return vldrhq_gather_shifted_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrh.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..6617efff82855734af0ffa69ef70cfd0a5883a9d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint16_t const * base, uint32x4_t offset)
+{
+  return vldrhq_gather_shifted_offset_u32 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrh.u32"  }  } */
+
+uint32x4_t
+foo1 (uint16_t const * base, uint32x4_t offset)
+{
+  return vldrhq_gather_shifted_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrh.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_z_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_z_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..e067faa24e2d0b913943c55311230633ad4f0a9e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_z_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int16_t const * base, uint16x8_t offset, mve_pred16_t p)
+{
+  return vldrhq_gather_shifted_offset_z_s16 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.u16"  }  } */
+
+int16x8_t
+foo1 (int16_t const * base, uint16x8_t offset, mve_pred16_t p)
+{
+  return vldrhq_gather_shifted_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_z_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_z_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..01d9ad1996db17f2b64a7cb8a7885ec2aaad1f36
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_z_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int16_t const * base, uint32x4_t offset, mve_pred16_t p)
+{
+  return vldrhq_gather_shifted_offset_z_s32 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.s32"  }  } */
+
+int32x4_t
+foo1 (int16_t const * base, uint32x4_t offset, mve_pred16_t p)
+{
+  return vldrhq_gather_shifted_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_z_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_z_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..133ff330a92bc6874519acb573e8fbc32a6c409f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_z_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint16_t const * base, uint16x8_t offset, mve_pred16_t p)
+{
+  return vldrhq_gather_shifted_offset_z_u16 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.u16"  }  } */
+
+uint16x8_t
+foo1 (uint16_t const * base, uint16x8_t offset, mve_pred16_t p)
+{
+  return vldrhq_gather_shifted_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_z_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_z_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..f370ddd51dfef82eda57c8811230250d429bbd7f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_z_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint16_t const * base, uint32x4_t offset, mve_pred16_t p)
+{
+  return vldrhq_gather_shifted_offset_z_u32 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.u32"  }  } */
+
+uint32x4_t
+foo1 (uint16_t const * base, uint32x4_t offset, mve_pred16_t p)
+{
+  return vldrhq_gather_shifted_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..07c563a2cd0e8d3bdbf34c0e2ad9939fd3bfa1c7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int16_t const * base)
+{
+  return vldrhq_s16 (base);
+}
+
+/* { dg-final { scan-assembler "vldrh.s16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..6f56a2cc88eb3a9439ac602e0cedea676ac93419
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_s32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int16_t const * base)
+{
+  return vldrhq_s32 (base);
+}
+
+/* { dg-final { scan-assembler "vldrh.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..e18d7585b7ff66a1e038586069ee09a27579da22
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint16_t const * base)
+{
+  return vldrhq_u16 (base);
+}
+
+/* { dg-final { scan-assembler "vldrh.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..4a89ed61f7aa2a580012093456dcff6d9eedb15f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_u32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint16_t const * base)
+{
+  return vldrhq_u32 (base);
+}
+
+/* { dg-final { scan-assembler "vldrh.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..00b68f3f4b607e522321c4158ca8f3eef672920c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_f16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16_t const * base, mve_pred16_t p)
+{
+  return vldrhq_z_f16 (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..a62b15c18051a42758a1b153028bf0a9b49761f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int16_t const * base, mve_pred16_t p)
+{
+  return vldrhq_z_s16 (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.s16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..e43b696c91b7a10c1f822f4f7e8fdd460830fe31
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_s32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int16_t const * base, mve_pred16_t p)
+{
+  return vldrhq_z_s32 (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..9911442882cc9cc707f9a7034cb62516a06bc235
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u16.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint16_t const * base, mve_pred16_t p)
+{
+  return vldrhq_z_u16 (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..b28d52c9fd53029b13cbbaa10a51c5f6b1c38e20
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_z_u32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint16_t const * base, mve_pred16_t p)
+{
+  return vldrhq_z_u32 (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..c86b1ea7b8f19c231505a84030a3f9babaa6829d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32_t const * base)
+{
+  return vldrwq_f32 (base);
+}
+
+/* { dg-final { scan-assembler "vldrw.f32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..d0cab13dcea72037c5e11be76fe67b6906221571
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_s32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32_t const * base)
+{
+  return vldrwq_s32 (base);
+}
+
+/* { dg-final { scan-assembler "vldrw.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..de9b3d9df3bbc9e6b655626636da9a97102b141a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_u32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32_t const * base)
+{
+  return vldrwq_u32 (base);
+}
+
+/* { dg-final { scan-assembler "vldrw.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..1f62417ecaf597dca883e8b8b3360eafacc48209
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32_t const * base, mve_pred16_t p)
+{
+  return vldrwq_z_f32 (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrwt.f32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..1922df6268a46dab4bdb5ac89c6f28610592e2a0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_s32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32_t const * base, mve_pred16_t p)
+{
+  return vldrwq_z_s32 (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrwt.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..ac49091ce2e3330b4d5b92f8f373752d9f9b8ba6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_z_u32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32_t const * base, mve_pred16_t p)
+{
+  return vldrwq_z_u32 (base, p);
+}
+
+/* { dg-final { scan-assembler "vldrwt.u32"  }  } */


[-- Attachment #2: diff24.patch.gz --]
[-- Type: application/gzip, Size: 6707 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][9x]: MVE ACLE predicated intrinsics with (dont-care) variant.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (28 preceding siblings ...)
  2019-11-14 19:16 ` [PATCH][ARM][GCC][1/1x]: Patch to support MVE ACLE intrinsics with unary operand Srinath Parvathaneni
@ 2019-11-14 19:16 ` Srinath Parvathaneni
  2019-11-14 19:16 ` [PATCH][ARM][GCC][4/1x]: MVE intrinsics with unary operand Srinath Parvathaneni
                   ` (8 subsequent siblings)
  38 siblings, 0 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 55455 bytes --]

Hello,

This patch supports following MVE ACLE predicated intrinsic with `_x` (dont-care) variant.
* ``_x`` (dont-care) which indicates that the false-predicated lanes have undefined values.
These are syntactic sugar for merge intrinsics with a ``vuninitializedq`` inactive parameter.

vabdq_x_f16, vabdq_x_f32, vabdq_x_s16, vabdq_x_s32, vabdq_x_s8, vabdq_x_u16, vabdq_x_u32, vabdq_x_u8,
vabsq_x_f16, vabsq_x_f32, vabsq_x_s16, vabsq_x_s32, vabsq_x_s8, vaddq_x_f16, vaddq_x_f32, vaddq_x_n_f16,
vaddq_x_n_f32, vaddq_x_n_s16, vaddq_x_n_s32, vaddq_x_n_s8, vaddq_x_n_u16, vaddq_x_n_u32, vaddq_x_n_u8,
vaddq_x_s16, vaddq_x_s32, vaddq_x_s8, vaddq_x_u16, vaddq_x_u32, vaddq_x_u8, vandq_x_f16, vandq_x_f32,
vandq_x_s16, vandq_x_s32, vandq_x_s8, vandq_x_u16, vandq_x_u32, vandq_x_u8, vbicq_x_f16, vbicq_x_f32,
vbicq_x_s16, vbicq_x_s32, vbicq_x_s8, vbicq_x_u16, vbicq_x_u32, vbicq_x_u8, vbrsrq_x_n_f16,
vbrsrq_x_n_f32, vbrsrq_x_n_s16, vbrsrq_x_n_s32, vbrsrq_x_n_s8, vbrsrq_x_n_u16, vbrsrq_x_n_u32,
vbrsrq_x_n_u8, vcaddq_rot270_x_f16, vcaddq_rot270_x_f32, vcaddq_rot270_x_s16, vcaddq_rot270_x_s32,
vcaddq_rot270_x_s8, vcaddq_rot270_x_u16, vcaddq_rot270_x_u32, vcaddq_rot270_x_u8, vcaddq_rot90_x_f16,
vcaddq_rot90_x_f32, vcaddq_rot90_x_s16, vcaddq_rot90_x_s32, vcaddq_rot90_x_s8, vcaddq_rot90_x_u16,
vcaddq_rot90_x_u32, vcaddq_rot90_x_u8, vclsq_x_s16, vclsq_x_s32, vclsq_x_s8, vclzq_x_s16, vclzq_x_s32,
vclzq_x_s8, vclzq_x_u16, vclzq_x_u32, vclzq_x_u8, vcmulq_rot180_x_f16, vcmulq_rot180_x_f32,
vcmulq_rot270_x_f16, vcmulq_rot270_x_f32, vcmulq_rot90_x_f16, vcmulq_rot90_x_f32, vcmulq_x_f16,
vcmulq_x_f32, vcvtaq_x_s16_f16, vcvtaq_x_s32_f32, vcvtaq_x_u16_f16, vcvtaq_x_u32_f32, vcvtbq_x_f32_f16,
vcvtmq_x_s16_f16, vcvtmq_x_s32_f32, vcvtmq_x_u16_f16, vcvtmq_x_u32_f32, vcvtnq_x_s16_f16,
vcvtnq_x_s32_f32, vcvtnq_x_u16_f16, vcvtnq_x_u32_f32, vcvtpq_x_s16_f16, vcvtpq_x_s32_f32,
vcvtpq_x_u16_f16, vcvtpq_x_u32_f32, vcvtq_x_f16_s16, vcvtq_x_f16_u16, vcvtq_x_f32_s32, vcvtq_x_f32_u32,
vcvtq_x_n_f16_s16, vcvtq_x_n_f16_u16, vcvtq_x_n_f32_s32, vcvtq_x_n_f32_u32, vcvtq_x_n_s16_f16,
vcvtq_x_n_s32_f32, vcvtq_x_n_u16_f16, vcvtq_x_n_u32_f32, vcvtq_x_s16_f16, vcvtq_x_s32_f32,
vcvtq_x_u16_f16, vcvtq_x_u32_f32, vcvttq_x_f32_f16, vddupq_x_n_u16, vddupq_x_n_u32, vddupq_x_n_u8,
vddupq_x_wb_u16, vddupq_x_wb_u32, vddupq_x_wb_u8, vdupq_x_n_f16, vdupq_x_n_f32, vdupq_x_n_s16,
vdupq_x_n_s32, vdupq_x_n_s8, vdupq_x_n_u16, vdupq_x_n_u32, vdupq_x_n_u8, vdwdupq_x_n_u16, vdwdupq_x_n_u32,
vdwdupq_x_n_u8, vdwdupq_x_wb_u16, vdwdupq_x_wb_u32, vdwdupq_x_wb_u8, veorq_x_f16, veorq_x_f32, veorq_x_s16,
veorq_x_s32, veorq_x_s8, veorq_x_u16, veorq_x_u32, veorq_x_u8, vhaddq_x_n_s16, vhaddq_x_n_s32,
vhaddq_x_n_s8, vhaddq_x_n_u16, vhaddq_x_n_u32, vhaddq_x_n_u8, vhaddq_x_s16, vhaddq_x_s32, vhaddq_x_s8,
vhaddq_x_u16, vhaddq_x_u32, vhaddq_x_u8, vhcaddq_rot270_x_s16, vhcaddq_rot270_x_s32, vhcaddq_rot270_x_s8,
vhcaddq_rot90_x_s16, vhcaddq_rot90_x_s32, vhcaddq_rot90_x_s8, vhsubq_x_n_s16, vhsubq_x_n_s32,
vhsubq_x_n_s8, vhsubq_x_n_u16, vhsubq_x_n_u32, vhsubq_x_n_u8, vhsubq_x_s16, vhsubq_x_s32, vhsubq_x_s8,
vhsubq_x_u16, vhsubq_x_u32, vhsubq_x_u8, vidupq_x_n_u16, vidupq_x_n_u32, vidupq_x_n_u8, vidupq_x_wb_u16,
vidupq_x_wb_u32, vidupq_x_wb_u8, viwdupq_x_n_u16, viwdupq_x_n_u32, viwdupq_x_n_u8, viwdupq_x_wb_u16,
viwdupq_x_wb_u32, viwdupq_x_wb_u8, vmaxnmq_x_f16, vmaxnmq_x_f32, vmaxq_x_s16, vmaxq_x_s32, vmaxq_x_s8,
vmaxq_x_u16, vmaxq_x_u32, vmaxq_x_u8, vminnmq_x_f16, vminnmq_x_f32, vminq_x_s16, vminq_x_s32, vminq_x_s8,
vminq_x_u16, vminq_x_u32, vminq_x_u8, vmovlbq_x_s16, vmovlbq_x_s8, vmovlbq_x_u16, vmovlbq_x_u8,
vmovltq_x_s16, vmovltq_x_s8, vmovltq_x_u16, vmovltq_x_u8, vmulhq_x_s16, vmulhq_x_s32, vmulhq_x_s8,
vmulhq_x_u16, vmulhq_x_u32, vmulhq_x_u8, vmullbq_int_x_s16, vmullbq_int_x_s32, vmullbq_int_x_s8,
vmullbq_int_x_u16, vmullbq_int_x_u32, vmullbq_int_x_u8, vmullbq_poly_x_p16, vmullbq_poly_x_p8,
vmulltq_int_x_s16, vmulltq_int_x_s32, vmulltq_int_x_s8, vmulltq_int_x_u16, vmulltq_int_x_u32,
vmulltq_int_x_u8, vmulltq_poly_x_p16, vmulltq_poly_x_p8, vmulq_x_f16, vmulq_x_f32, vmulq_x_n_f16,
vmulq_x_n_f32, vmulq_x_n_s16, vmulq_x_n_s32, vmulq_x_n_s8, vmulq_x_n_u16, vmulq_x_n_u32, vmulq_x_n_u8,
vmulq_x_s16, vmulq_x_s32, vmulq_x_s8, vmulq_x_u16, vmulq_x_u32, vmulq_x_u8, vmvnq_x_n_s16, vmvnq_x_n_s32,
vmvnq_x_n_u16, vmvnq_x_n_u32, vmvnq_x_s16, vmvnq_x_s32, vmvnq_x_s8, vmvnq_x_u16, vmvnq_x_u32, vmvnq_x_u8,
vnegq_x_f16, vnegq_x_f32, vnegq_x_s16, vnegq_x_s32, vnegq_x_s8, vornq_x_f16, vornq_x_f32, vornq_x_s16,
vornq_x_s32, vornq_x_s8, vornq_x_u16, vornq_x_u32, vornq_x_u8, vorrq_x_f16, vorrq_x_f32, vorrq_x_s16,
vorrq_x_s32, vorrq_x_s8, vorrq_x_u16, vorrq_x_u32, vorrq_x_u8, vrev16q_x_s8, vrev16q_x_u8, vrev32q_x_f16,
vrev32q_x_s16, vrev32q_x_s8, vrev32q_x_u16, vrev32q_x_u8, vrev64q_x_f16, vrev64q_x_f32, vrev64q_x_s16,
vrev64q_x_s32, vrev64q_x_s8, vrev64q_x_u16, vrev64q_x_u32, vrev64q_x_u8, vrhaddq_x_s16, vrhaddq_x_s32,
vrhaddq_x_s8, vrhaddq_x_u16, vrhaddq_x_u32, vrhaddq_x_u8, vrmulhq_x_s16, vrmulhq_x_s32, vrmulhq_x_s8,
vrmulhq_x_u16, vrmulhq_x_u32, vrmulhq_x_u8, vrndaq_x_f16, vrndaq_x_f32, vrndmq_x_f16, vrndmq_x_f32,
vrndnq_x_f16, vrndnq_x_f32, vrndpq_x_f16, vrndpq_x_f32, vrndq_x_f16, vrndq_x_f32, vrndxq_x_f16,
vrndxq_x_f32, vrshlq_x_s16, vrshlq_x_s32, vrshlq_x_s8, vrshlq_x_u16, vrshlq_x_u32, vrshlq_x_u8,
vrshrq_x_n_s16, vrshrq_x_n_s32, vrshrq_x_n_s8, vrshrq_x_n_u16, vrshrq_x_n_u32, vrshrq_x_n_u8,
vshllbq_x_n_s16, vshllbq_x_n_s8, vshllbq_x_n_u16, vshllbq_x_n_u8, vshlltq_x_n_s16, vshlltq_x_n_s8,
vshlltq_x_n_u16, vshlltq_x_n_u8, vshlq_x_n_s16, vshlq_x_n_s32, vshlq_x_n_s8, vshlq_x_n_u16, vshlq_x_n_u32,
vshlq_x_n_u8, vshlq_x_s16, vshlq_x_s32, vshlq_x_s8, vshlq_x_u16, vshlq_x_u32, vshlq_x_u8, vshrq_x_n_s16,
vshrq_x_n_s32, vshrq_x_n_s8, vshrq_x_n_u16, vshrq_x_n_u32, vshrq_x_n_u8, vsubq_x_f16, vsubq_x_f32,
vsubq_x_n_f16, vsubq_x_n_f32, vsubq_x_n_s16, vsubq_x_n_s32, vsubq_x_n_s8, vsubq_x_n_u16, vsubq_x_n_u32,
vsubq_x_n_u8, vsubq_x_s16, vsubq_x_s32, vsubq_x_s8, vsubq_x_u16, vsubq_x_u32, vsubq_x_u8.

Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-14  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm_mve.h (vddupq_x_n_u8): Define macro.
	(vddupq_x_n_u16): Likewise.
	(vddupq_x_n_u32): Likewise.
	(vddupq_x_wb_u8): Likewise.
	(vddupq_x_wb_u16): Likewise.
	(vddupq_x_wb_u32): Likewise.
	(vdwdupq_x_n_u8): Likewise.
	(vdwdupq_x_n_u16): Likewise.
	(vdwdupq_x_n_u32): Likewise.
	(vdwdupq_x_wb_u8): Likewise.
	(vdwdupq_x_wb_u16): Likewise.
	(vdwdupq_x_wb_u32): Likewise.
	(vidupq_x_n_u8): Likewise.
	(vidupq_x_n_u16): Likewise.
	(vidupq_x_n_u32): Likewise.
	(vidupq_x_wb_u8): Likewise.
	(vidupq_x_wb_u16): Likewise.
	(vidupq_x_wb_u32): Likewise.
	(viwdupq_x_n_u8): Likewise.
	(viwdupq_x_n_u16): Likewise.
	(viwdupq_x_n_u32): Likewise.
	(viwdupq_x_wb_u8): Likewise.
	(viwdupq_x_wb_u16): Likewise.
	(viwdupq_x_wb_u32): Likewise.
	(vdupq_x_n_s8): Likewise.
	(vdupq_x_n_s16): Likewise.
	(vdupq_x_n_s32): Likewise.
	(vdupq_x_n_u8): Likewise.
	(vdupq_x_n_u16): Likewise.
	(vdupq_x_n_u32): Likewise.
	(vminq_x_s8): Likewise.
	(vminq_x_s16): Likewise.
	(vminq_x_s32): Likewise.
	(vminq_x_u8): Likewise.
	(vminq_x_u16): Likewise.
	(vminq_x_u32): Likewise.
	(vmaxq_x_s8): Likewise.
	(vmaxq_x_s16): Likewise.
	(vmaxq_x_s32): Likewise.
	(vmaxq_x_u8): Likewise.
	(vmaxq_x_u16): Likewise.
	(vmaxq_x_u32): Likewise.
	(vabdq_x_s8): Likewise.
	(vabdq_x_s16): Likewise.
	(vabdq_x_s32): Likewise.
	(vabdq_x_u8): Likewise.
	(vabdq_x_u16): Likewise.
	(vabdq_x_u32): Likewise.
	(vabsq_x_s8): Likewise.
	(vabsq_x_s16): Likewise.
	(vabsq_x_s32): Likewise.
	(vaddq_x_s8): Likewise.
	(vaddq_x_s16): Likewise.
	(vaddq_x_s32): Likewise.
	(vaddq_x_n_s8): Likewise.
	(vaddq_x_n_s16): Likewise.
	(vaddq_x_n_s32): Likewise.
	(vaddq_x_u8): Likewise.
	(vaddq_x_u16): Likewise.
	(vaddq_x_u32): Likewise.
	(vaddq_x_n_u8): Likewise.
	(vaddq_x_n_u16): Likewise.
	(vaddq_x_n_u32): Likewise.
	(vclsq_x_s8): Likewise.
	(vclsq_x_s16): Likewise.
	(vclsq_x_s32): Likewise.
	(vclzq_x_s8): Likewise.
	(vclzq_x_s16): Likewise.
	(vclzq_x_s32): Likewise.
	(vclzq_x_u8): Likewise.
	(vclzq_x_u16): Likewise.
	(vclzq_x_u32): Likewise.
	(vnegq_x_s8): Likewise.
	(vnegq_x_s16): Likewise.
	(vnegq_x_s32): Likewise.
	(vmulhq_x_s8): Likewise.
	(vmulhq_x_s16): Likewise.
	(vmulhq_x_s32): Likewise.
	(vmulhq_x_u8): Likewise.
	(vmulhq_x_u16): Likewise.
	(vmulhq_x_u32): Likewise.
	(vmullbq_poly_x_p8): Likewise.
	(vmullbq_poly_x_p16): Likewise.
	(vmullbq_int_x_s8): Likewise.
	(vmullbq_int_x_s16): Likewise.
	(vmullbq_int_x_s32): Likewise.
	(vmullbq_int_x_u8): Likewise.
	(vmullbq_int_x_u16): Likewise.
	(vmullbq_int_x_u32): Likewise.
	(vmulltq_poly_x_p8): Likewise.
	(vmulltq_poly_x_p16): Likewise.
	(vmulltq_int_x_s8): Likewise.
	(vmulltq_int_x_s16): Likewise.
	(vmulltq_int_x_s32): Likewise.
	(vmulltq_int_x_u8): Likewise.
	(vmulltq_int_x_u16): Likewise.
	(vmulltq_int_x_u32): Likewise.
	(vmulq_x_s8): Likewise.
	(vmulq_x_s16): Likewise.
	(vmulq_x_s32): Likewise.
	(vmulq_x_n_s8): Likewise.
	(vmulq_x_n_s16): Likewise.
	(vmulq_x_n_s32): Likewise.
	(vmulq_x_u8): Likewise.
	(vmulq_x_u16): Likewise.
	(vmulq_x_u32): Likewise.
	(vmulq_x_n_u8): Likewise.
	(vmulq_x_n_u16): Likewise.
	(vmulq_x_n_u32): Likewise.
	(vsubq_x_s8): Likewise.
	(vsubq_x_s16): Likewise.
	(vsubq_x_s32): Likewise.
	(vsubq_x_n_s8): Likewise.
	(vsubq_x_n_s16): Likewise.
	(vsubq_x_n_s32): Likewise.
	(vsubq_x_u8): Likewise.
	(vsubq_x_u16): Likewise.
	(vsubq_x_u32): Likewise.
	(vsubq_x_n_u8): Likewise.
	(vsubq_x_n_u16): Likewise.
	(vsubq_x_n_u32): Likewise.
	(vcaddq_rot90_x_s8): Likewise.
	(vcaddq_rot90_x_s16): Likewise.
	(vcaddq_rot90_x_s32): Likewise.
	(vcaddq_rot90_x_u8): Likewise.
	(vcaddq_rot90_x_u16): Likewise.
	(vcaddq_rot90_x_u32): Likewise.
	(vcaddq_rot270_x_s8): Likewise.
	(vcaddq_rot270_x_s16): Likewise.
	(vcaddq_rot270_x_s32): Likewise.
	(vcaddq_rot270_x_u8): Likewise.
	(vcaddq_rot270_x_u16): Likewise.
	(vcaddq_rot270_x_u32): Likewise.
	(vhaddq_x_n_s8): Likewise.
	(vhaddq_x_n_s16): Likewise.
	(vhaddq_x_n_s32): Likewise.
	(vhaddq_x_n_u8): Likewise.
	(vhaddq_x_n_u16): Likewise.
	(vhaddq_x_n_u32): Likewise.
	(vhaddq_x_s8): Likewise.
	(vhaddq_x_s16): Likewise.
	(vhaddq_x_s32): Likewise.
	(vhaddq_x_u8): Likewise.
	(vhaddq_x_u16): Likewise.
	(vhaddq_x_u32): Likewise.
	(vhcaddq_rot90_x_s8): Likewise.
	(vhcaddq_rot90_x_s16): Likewise.
	(vhcaddq_rot90_x_s32): Likewise.
	(vhcaddq_rot270_x_s8): Likewise.
	(vhcaddq_rot270_x_s16): Likewise.
	(vhcaddq_rot270_x_s32): Likewise.
	(vhsubq_x_n_s8): Likewise.
	(vhsubq_x_n_s16): Likewise.
	(vhsubq_x_n_s32): Likewise.
	(vhsubq_x_n_u8): Likewise.
	(vhsubq_x_n_u16): Likewise.
	(vhsubq_x_n_u32): Likewise.
	(vhsubq_x_s8): Likewise.
	(vhsubq_x_s16): Likewise.
	(vhsubq_x_s32): Likewise.
	(vhsubq_x_u8): Likewise.
	(vhsubq_x_u16): Likewise.
	(vhsubq_x_u32): Likewise.
	(vrhaddq_x_s8): Likewise.
	(vrhaddq_x_s16): Likewise.
	(vrhaddq_x_s32): Likewise.
	(vrhaddq_x_u8): Likewise.
	(vrhaddq_x_u16): Likewise.
	(vrhaddq_x_u32): Likewise.
	(vrmulhq_x_s8): Likewise.
	(vrmulhq_x_s16): Likewise.
	(vrmulhq_x_s32): Likewise.
	(vrmulhq_x_u8): Likewise.
	(vrmulhq_x_u16): Likewise.
	(vrmulhq_x_u32): Likewise.
	(vandq_x_s8): Likewise.
	(vandq_x_s16): Likewise.
	(vandq_x_s32): Likewise.
	(vandq_x_u8): Likewise.
	(vandq_x_u16): Likewise.
	(vandq_x_u32): Likewise.
	(vbicq_x_s8): Likewise.
	(vbicq_x_s16): Likewise.
	(vbicq_x_s32): Likewise.
	(vbicq_x_u8): Likewise.
	(vbicq_x_u16): Likewise.
	(vbicq_x_u32): Likewise.
	(vbrsrq_x_n_s8): Likewise.
	(vbrsrq_x_n_s16): Likewise.
	(vbrsrq_x_n_s32): Likewise.
	(vbrsrq_x_n_u8): Likewise.
	(vbrsrq_x_n_u16): Likewise.
	(vbrsrq_x_n_u32): Likewise.
	(veorq_x_s8): Likewise.
	(veorq_x_s16): Likewise.
	(veorq_x_s32): Likewise.
	(veorq_x_u8): Likewise.
	(veorq_x_u16): Likewise.
	(veorq_x_u32): Likewise.
	(vmovlbq_x_s8): Likewise.
	(vmovlbq_x_s16): Likewise.
	(vmovlbq_x_u8): Likewise.
	(vmovlbq_x_u16): Likewise.
	(vmovltq_x_s8): Likewise.
	(vmovltq_x_s16): Likewise.
	(vmovltq_x_u8): Likewise.
	(vmovltq_x_u16): Likewise.
	(vmvnq_x_s8): Likewise.
	(vmvnq_x_s16): Likewise.
	(vmvnq_x_s32): Likewise.
	(vmvnq_x_u8): Likewise.
	(vmvnq_x_u16): Likewise.
	(vmvnq_x_u32): Likewise.
	(vmvnq_x_n_s16): Likewise.
	(vmvnq_x_n_s32): Likewise.
	(vmvnq_x_n_u16): Likewise.
	(vmvnq_x_n_u32): Likewise.
	(vornq_x_s8): Likewise.
	(vornq_x_s16): Likewise.
	(vornq_x_s32): Likewise.
	(vornq_x_u8): Likewise.
	(vornq_x_u16): Likewise.
	(vornq_x_u32): Likewise.
	(vorrq_x_s8): Likewise.
	(vorrq_x_s16): Likewise.
	(vorrq_x_s32): Likewise.
	(vorrq_x_u8): Likewise.
	(vorrq_x_u16): Likewise.
	(vorrq_x_u32): Likewise.
	(vrev16q_x_s8): Likewise.
	(vrev16q_x_u8): Likewise.
	(vrev32q_x_s8): Likewise.
	(vrev32q_x_s16): Likewise.
	(vrev32q_x_u8): Likewise.
	(vrev32q_x_u16): Likewise.
	(vrev64q_x_s8): Likewise.
	(vrev64q_x_s16): Likewise.
	(vrev64q_x_s32): Likewise.
	(vrev64q_x_u8): Likewise.
	(vrev64q_x_u16): Likewise.
	(vrev64q_x_u32): Likewise.
	(vrshlq_x_s8): Likewise.
	(vrshlq_x_s16): Likewise.
	(vrshlq_x_s32): Likewise.
	(vrshlq_x_u8): Likewise.
	(vrshlq_x_u16): Likewise.
	(vrshlq_x_u32): Likewise.
	(vshllbq_x_n_s8): Likewise.
	(vshllbq_x_n_s16): Likewise.
	(vshllbq_x_n_u8): Likewise.
	(vshllbq_x_n_u16): Likewise.
	(vshlltq_x_n_s8): Likewise.
	(vshlltq_x_n_s16): Likewise.
	(vshlltq_x_n_u8): Likewise.
	(vshlltq_x_n_u16): Likewise.
	(vshlq_x_s8): Likewise.
	(vshlq_x_s16): Likewise.
	(vshlq_x_s32): Likewise.
	(vshlq_x_u8): Likewise.
	(vshlq_x_u16): Likewise.
	(vshlq_x_u32): Likewise.
	(vshlq_x_n_s8): Likewise.
	(vshlq_x_n_s16): Likewise.
	(vshlq_x_n_s32): Likewise.
	(vshlq_x_n_u8): Likewise.
	(vshlq_x_n_u16): Likewise.
	(vshlq_x_n_u32): Likewise.
	(vrshrq_x_n_s8): Likewise.
	(vrshrq_x_n_s16): Likewise.
	(vrshrq_x_n_s32): Likewise.
	(vrshrq_x_n_u8): Likewise.
	(vrshrq_x_n_u16): Likewise.
	(vrshrq_x_n_u32): Likewise.
	(vshrq_x_n_s8): Likewise.
	(vshrq_x_n_s16): Likewise.
	(vshrq_x_n_s32): Likewise.
	(vshrq_x_n_u8): Likewise.
	(vshrq_x_n_u16): Likewise.
	(vshrq_x_n_u32): Likewise.
	(vdupq_x_n_f16): Likewise.
	(vdupq_x_n_f32): Likewise.
	(vminnmq_x_f16): Likewise.
	(vminnmq_x_f32): Likewise.
	(vmaxnmq_x_f16): Likewise.
	(vmaxnmq_x_f32): Likewise.
	(vabdq_x_f16): Likewise.
	(vabdq_x_f32): Likewise.
	(vabsq_x_f16): Likewise.
	(vabsq_x_f32): Likewise.
	(vaddq_x_f16): Likewise.
	(vaddq_x_f32): Likewise.
	(vaddq_x_n_f16): Likewise.
	(vaddq_x_n_f32): Likewise.
	(vnegq_x_f16): Likewise.
	(vnegq_x_f32): Likewise.
	(vmulq_x_f16): Likewise.
	(vmulq_x_f32): Likewise.
	(vmulq_x_n_f16): Likewise.
	(vmulq_x_n_f32): Likewise.
	(vsubq_x_f16): Likewise.
	(vsubq_x_f32): Likewise.
	(vsubq_x_n_f16): Likewise.
	(vsubq_x_n_f32): Likewise.
	(vcaddq_rot90_x_f16): Likewise.
	(vcaddq_rot90_x_f32): Likewise.
	(vcaddq_rot270_x_f16): Likewise.
	(vcaddq_rot270_x_f32): Likewise.
	(vcmulq_x_f16): Likewise.
	(vcmulq_x_f32): Likewise.
	(vcmulq_rot90_x_f16): Likewise.
	(vcmulq_rot90_x_f32): Likewise.
	(vcmulq_rot180_x_f16): Likewise.
	(vcmulq_rot180_x_f32): Likewise.
	(vcmulq_rot270_x_f16): Likewise.
	(vcmulq_rot270_x_f32): Likewise.
	(vcvtaq_x_s16_f16): Likewise.
	(vcvtaq_x_s32_f32): Likewise.
	(vcvtaq_x_u16_f16): Likewise.
	(vcvtaq_x_u32_f32): Likewise.
	(vcvtnq_x_s16_f16): Likewise.
	(vcvtnq_x_s32_f32): Likewise.
	(vcvtnq_x_u16_f16): Likewise.
	(vcvtnq_x_u32_f32): Likewise.
	(vcvtpq_x_s16_f16): Likewise.
	(vcvtpq_x_s32_f32): Likewise.
	(vcvtpq_x_u16_f16): Likewise.
	(vcvtpq_x_u32_f32): Likewise.
	(vcvtmq_x_s16_f16): Likewise.
	(vcvtmq_x_s32_f32): Likewise.
	(vcvtmq_x_u16_f16): Likewise.
	(vcvtmq_x_u32_f32): Likewise.
	(vcvtbq_x_f32_f16): Likewise.
	(vcvttq_x_f32_f16): Likewise.
	(vcvtq_x_f16_u16): Likewise.
	(vcvtq_x_f16_s16): Likewise.
	(vcvtq_x_f32_s32): Likewise.
	(vcvtq_x_f32_u32): Likewise.
	(vcvtq_x_n_f16_s16): Likewise.
	(vcvtq_x_n_f16_u16): Likewise.
	(vcvtq_x_n_f32_s32): Likewise.
	(vcvtq_x_n_f32_u32): Likewise.
	(vcvtq_x_s16_f16): Likewise.
	(vcvtq_x_s32_f32): Likewise.
	(vcvtq_x_u16_f16): Likewise.
	(vcvtq_x_u32_f32): Likewise.
	(vcvtq_x_n_s16_f16): Likewise.
	(vcvtq_x_n_s32_f32): Likewise.
	(vcvtq_x_n_u16_f16): Likewise.
	(vcvtq_x_n_u32_f32): Likewise.
	(vrndq_x_f16): Likewise.
	(vrndq_x_f32): Likewise.
	(vrndnq_x_f16): Likewise.
	(vrndnq_x_f32): Likewise.
	(vrndmq_x_f16): Likewise.
	(vrndmq_x_f32): Likewise.
	(vrndpq_x_f16): Likewise.
	(vrndpq_x_f32): Likewise.
	(vrndaq_x_f16): Likewise.
	(vrndaq_x_f32): Likewise.
	(vrndxq_x_f16): Likewise.
	(vrndxq_x_f32): Likewise.
	(vandq_x_f16): Likewise.
	(vandq_x_f32): Likewise.
	(vbicq_x_f16): Likewise.
	(vbicq_x_f32): Likewise.
	(vbrsrq_x_n_f16): Likewise.
	(vbrsrq_x_n_f32): Likewise.
	(veorq_x_f16): Likewise.
	(veorq_x_f32): Likewise.
	(vornq_x_f16): Likewise.
	(vornq_x_f32): Likewise.
	(vorrq_x_f16): Likewise.
	(vorrq_x_f32): Likewise.
	(vrev32q_x_f16): Likewise.
	(vrev64q_x_f16): Likewise.
	(vrev64q_x_f32): Likewise.
	(__arm_vddupq_x_n_u8): Define intrinsic.
	(__arm_vddupq_x_n_u16): Likewise.
	(__arm_vddupq_x_n_u32): Likewise.
	(__arm_vddupq_x_wb_u8): Likewise.
	(__arm_vddupq_x_wb_u16): Likewise.
	(__arm_vddupq_x_wb_u32): Likewise.
	(__arm_vdwdupq_x_n_u8): Likewise.
	(__arm_vdwdupq_x_n_u16): Likewise.
	(__arm_vdwdupq_x_n_u32): Likewise.
	(__arm_vdwdupq_x_wb_u8): Likewise.
	(__arm_vdwdupq_x_wb_u16): Likewise.
	(__arm_vdwdupq_x_wb_u32): Likewise.
	(__arm_vidupq_x_n_u8): Likewise.
	(__arm_vidupq_x_n_u16): Likewise.
	(__arm_vidupq_x_n_u32): Likewise.
	(__arm_vidupq_x_wb_u8): Likewise.
	(__arm_vidupq_x_wb_u16): Likewise.
	(__arm_vidupq_x_wb_u32): Likewise.
	(__arm_viwdupq_x_n_u8): Likewise.
	(__arm_viwdupq_x_n_u16): Likewise.
	(__arm_viwdupq_x_n_u32): Likewise.
	(__arm_viwdupq_x_wb_u8): Likewise.
	(__arm_viwdupq_x_wb_u16): Likewise.
	(__arm_viwdupq_x_wb_u32): Likewise.
	(__arm_vdupq_x_n_s8): Likewise.
	(__arm_vdupq_x_n_s16): Likewise.
	(__arm_vdupq_x_n_s32): Likewise.
	(__arm_vdupq_x_n_u8): Likewise.
	(__arm_vdupq_x_n_u16): Likewise.
	(__arm_vdupq_x_n_u32): Likewise.
	(__arm_vminq_x_s8): Likewise.
	(__arm_vminq_x_s16): Likewise.
	(__arm_vminq_x_s32): Likewise.
	(__arm_vminq_x_u8): Likewise.
	(__arm_vminq_x_u16): Likewise.
	(__arm_vminq_x_u32): Likewise.
	(__arm_vmaxq_x_s8): Likewise.
	(__arm_vmaxq_x_s16): Likewise.
	(__arm_vmaxq_x_s32): Likewise.
	(__arm_vmaxq_x_u8): Likewise.
	(__arm_vmaxq_x_u16): Likewise.
	(__arm_vmaxq_x_u32): Likewise.
	(__arm_vabdq_x_s8): Likewise.
	(__arm_vabdq_x_s16): Likewise.
	(__arm_vabdq_x_s32): Likewise.
	(__arm_vabdq_x_u8): Likewise.
	(__arm_vabdq_x_u16): Likewise.
	(__arm_vabdq_x_u32): Likewise.
	(__arm_vabsq_x_s8): Likewise.
	(__arm_vabsq_x_s16): Likewise.
	(__arm_vabsq_x_s32): Likewise.
	(__arm_vaddq_x_s8): Likewise.
	(__arm_vaddq_x_s16): Likewise.
	(__arm_vaddq_x_s32): Likewise.
	(__arm_vaddq_x_n_s8): Likewise.
	(__arm_vaddq_x_n_s16): Likewise.
	(__arm_vaddq_x_n_s32): Likewise.
	(__arm_vaddq_x_u8): Likewise.
	(__arm_vaddq_x_u16): Likewise.
	(__arm_vaddq_x_u32): Likewise.
	(__arm_vaddq_x_n_u8): Likewise.
	(__arm_vaddq_x_n_u16): Likewise.
	(__arm_vaddq_x_n_u32): Likewise.
	(__arm_vclsq_x_s8): Likewise.
	(__arm_vclsq_x_s16): Likewise.
	(__arm_vclsq_x_s32): Likewise.
	(__arm_vclzq_x_s8): Likewise.
	(__arm_vclzq_x_s16): Likewise.
	(__arm_vclzq_x_s32): Likewise.
	(__arm_vclzq_x_u8): Likewise.
	(__arm_vclzq_x_u16): Likewise.
	(__arm_vclzq_x_u32): Likewise.
	(__arm_vnegq_x_s8): Likewise.
	(__arm_vnegq_x_s16): Likewise.
	(__arm_vnegq_x_s32): Likewise.
	(__arm_vmulhq_x_s8): Likewise.
	(__arm_vmulhq_x_s16): Likewise.
	(__arm_vmulhq_x_s32): Likewise.
	(__arm_vmulhq_x_u8): Likewise.
	(__arm_vmulhq_x_u16): Likewise.
	(__arm_vmulhq_x_u32): Likewise.
	(__arm_vmullbq_poly_x_p8): Likewise.
	(__arm_vmullbq_poly_x_p16): Likewise.
	(__arm_vmullbq_int_x_s8): Likewise.
	(__arm_vmullbq_int_x_s16): Likewise.
	(__arm_vmullbq_int_x_s32): Likewise.
	(__arm_vmullbq_int_x_u8): Likewise.
	(__arm_vmullbq_int_x_u16): Likewise.
	(__arm_vmullbq_int_x_u32): Likewise.
	(__arm_vmulltq_poly_x_p8): Likewise.
	(__arm_vmulltq_poly_x_p16): Likewise.
	(__arm_vmulltq_int_x_s8): Likewise.
	(__arm_vmulltq_int_x_s16): Likewise.
	(__arm_vmulltq_int_x_s32): Likewise.
	(__arm_vmulltq_int_x_u8): Likewise.
	(__arm_vmulltq_int_x_u16): Likewise.
	(__arm_vmulltq_int_x_u32): Likewise.
	(__arm_vmulq_x_s8): Likewise.
	(__arm_vmulq_x_s16): Likewise.
	(__arm_vmulq_x_s32): Likewise.
	(__arm_vmulq_x_n_s8): Likewise.
	(__arm_vmulq_x_n_s16): Likewise.
	(__arm_vmulq_x_n_s32): Likewise.
	(__arm_vmulq_x_u8): Likewise.
	(__arm_vmulq_x_u16): Likewise.
	(__arm_vmulq_x_u32): Likewise.
	(__arm_vmulq_x_n_u8): Likewise.
	(__arm_vmulq_x_n_u16): Likewise.
	(__arm_vmulq_x_n_u32): Likewise.
	(__arm_vsubq_x_s8): Likewise.
	(__arm_vsubq_x_s16): Likewise.
	(__arm_vsubq_x_s32): Likewise.
	(__arm_vsubq_x_n_s8): Likewise.
	(__arm_vsubq_x_n_s16): Likewise.
	(__arm_vsubq_x_n_s32): Likewise.
	(__arm_vsubq_x_u8): Likewise.
	(__arm_vsubq_x_u16): Likewise.
	(__arm_vsubq_x_u32): Likewise.
	(__arm_vsubq_x_n_u8): Likewise.
	(__arm_vsubq_x_n_u16): Likewise.
	(__arm_vsubq_x_n_u32): Likewise.
	(__arm_vcaddq_rot90_x_s8): Likewise.
	(__arm_vcaddq_rot90_x_s16): Likewise.
	(__arm_vcaddq_rot90_x_s32): Likewise.
	(__arm_vcaddq_rot90_x_u8): Likewise.
	(__arm_vcaddq_rot90_x_u16): Likewise.
	(__arm_vcaddq_rot90_x_u32): Likewise.
	(__arm_vcaddq_rot270_x_s8): Likewise.
	(__arm_vcaddq_rot270_x_s16): Likewise.
	(__arm_vcaddq_rot270_x_s32): Likewise.
	(__arm_vcaddq_rot270_x_u8): Likewise.
	(__arm_vcaddq_rot270_x_u16): Likewise.
	(__arm_vcaddq_rot270_x_u32): Likewise.
	(__arm_vhaddq_x_n_s8): Likewise.
	(__arm_vhaddq_x_n_s16): Likewise.
	(__arm_vhaddq_x_n_s32): Likewise.
	(__arm_vhaddq_x_n_u8): Likewise.
	(__arm_vhaddq_x_n_u16): Likewise.
	(__arm_vhaddq_x_n_u32): Likewise.
	(__arm_vhaddq_x_s8): Likewise.
	(__arm_vhaddq_x_s16): Likewise.
	(__arm_vhaddq_x_s32): Likewise.
	(__arm_vhaddq_x_u8): Likewise.
	(__arm_vhaddq_x_u16): Likewise.
	(__arm_vhaddq_x_u32): Likewise.
	(__arm_vhcaddq_rot90_x_s8): Likewise.
	(__arm_vhcaddq_rot90_x_s16): Likewise.
	(__arm_vhcaddq_rot90_x_s32): Likewise.
	(__arm_vhcaddq_rot270_x_s8): Likewise.
	(__arm_vhcaddq_rot270_x_s16): Likewise.
	(__arm_vhcaddq_rot270_x_s32): Likewise.
	(__arm_vhsubq_x_n_s8): Likewise.
	(__arm_vhsubq_x_n_s16): Likewise.
	(__arm_vhsubq_x_n_s32): Likewise.
	(__arm_vhsubq_x_n_u8): Likewise.
	(__arm_vhsubq_x_n_u16): Likewise.
	(__arm_vhsubq_x_n_u32): Likewise.
	(__arm_vhsubq_x_s8): Likewise.
	(__arm_vhsubq_x_s16): Likewise.
	(__arm_vhsubq_x_s32): Likewise.
	(__arm_vhsubq_x_u8): Likewise.
	(__arm_vhsubq_x_u16): Likewise.
	(__arm_vhsubq_x_u32): Likewise.
	(__arm_vrhaddq_x_s8): Likewise.
	(__arm_vrhaddq_x_s16): Likewise.
	(__arm_vrhaddq_x_s32): Likewise.
	(__arm_vrhaddq_x_u8): Likewise.
	(__arm_vrhaddq_x_u16): Likewise.
	(__arm_vrhaddq_x_u32): Likewise.
	(__arm_vrmulhq_x_s8): Likewise.
	(__arm_vrmulhq_x_s16): Likewise.
	(__arm_vrmulhq_x_s32): Likewise.
	(__arm_vrmulhq_x_u8): Likewise.
	(__arm_vrmulhq_x_u16): Likewise.
	(__arm_vrmulhq_x_u32): Likewise.
	(__arm_vandq_x_s8): Likewise.
	(__arm_vandq_x_s16): Likewise.
	(__arm_vandq_x_s32): Likewise.
	(__arm_vandq_x_u8): Likewise.
	(__arm_vandq_x_u16): Likewise.
	(__arm_vandq_x_u32): Likewise.
	(__arm_vbicq_x_s8): Likewise.
	(__arm_vbicq_x_s16): Likewise.
	(__arm_vbicq_x_s32): Likewise.
	(__arm_vbicq_x_u8): Likewise.
	(__arm_vbicq_x_u16): Likewise.
	(__arm_vbicq_x_u32): Likewise.
	(__arm_vbrsrq_x_n_s8): Likewise.
	(__arm_vbrsrq_x_n_s16): Likewise.
	(__arm_vbrsrq_x_n_s32): Likewise.
	(__arm_vbrsrq_x_n_u8): Likewise.
	(__arm_vbrsrq_x_n_u16): Likewise.
	(__arm_vbrsrq_x_n_u32): Likewise.
	(__arm_veorq_x_s8): Likewise.
	(__arm_veorq_x_s16): Likewise.
	(__arm_veorq_x_s32): Likewise.
	(__arm_veorq_x_u8): Likewise.
	(__arm_veorq_x_u16): Likewise.
	(__arm_veorq_x_u32): Likewise.
	(__arm_vmovlbq_x_s8): Likewise.
	(__arm_vmovlbq_x_s16): Likewise.
	(__arm_vmovlbq_x_u8): Likewise.
	(__arm_vmovlbq_x_u16): Likewise.
	(__arm_vmovltq_x_s8): Likewise.
	(__arm_vmovltq_x_s16): Likewise.
	(__arm_vmovltq_x_u8): Likewise.
	(__arm_vmovltq_x_u16): Likewise.
	(__arm_vmvnq_x_s8): Likewise.
	(__arm_vmvnq_x_s16): Likewise.
	(__arm_vmvnq_x_s32): Likewise.
	(__arm_vmvnq_x_u8): Likewise.
	(__arm_vmvnq_x_u16): Likewise.
	(__arm_vmvnq_x_u32): Likewise.
	(__arm_vmvnq_x_n_s16): Likewise.
	(__arm_vmvnq_x_n_s32): Likewise.
	(__arm_vmvnq_x_n_u16): Likewise.
	(__arm_vmvnq_x_n_u32): Likewise.
	(__arm_vornq_x_s8): Likewise.
	(__arm_vornq_x_s16): Likewise.
	(__arm_vornq_x_s32): Likewise.
	(__arm_vornq_x_u8): Likewise.
	(__arm_vornq_x_u16): Likewise.
	(__arm_vornq_x_u32): Likewise.
	(__arm_vorrq_x_s8): Likewise.
	(__arm_vorrq_x_s16): Likewise.
	(__arm_vorrq_x_s32): Likewise.
	(__arm_vorrq_x_u8): Likewise.
	(__arm_vorrq_x_u16): Likewise.
	(__arm_vorrq_x_u32): Likewise.
	(__arm_vrev16q_x_s8): Likewise.
	(__arm_vrev16q_x_u8): Likewise.
	(__arm_vrev32q_x_s8): Likewise.
	(__arm_vrev32q_x_s16): Likewise.
	(__arm_vrev32q_x_u8): Likewise.
	(__arm_vrev32q_x_u16): Likewise.
	(__arm_vrev64q_x_s8): Likewise.
	(__arm_vrev64q_x_s16): Likewise.
	(__arm_vrev64q_x_s32): Likewise.
	(__arm_vrev64q_x_u8): Likewise.
	(__arm_vrev64q_x_u16): Likewise.
	(__arm_vrev64q_x_u32): Likewise.
	(__arm_vrshlq_x_s8): Likewise.
	(__arm_vrshlq_x_s16): Likewise.
	(__arm_vrshlq_x_s32): Likewise.
	(__arm_vrshlq_x_u8): Likewise.
	(__arm_vrshlq_x_u16): Likewise.
	(__arm_vrshlq_x_u32): Likewise.
	(__arm_vshllbq_x_n_s8): Likewise.
	(__arm_vshllbq_x_n_s16): Likewise.
	(__arm_vshllbq_x_n_u8): Likewise.
	(__arm_vshllbq_x_n_u16): Likewise.
	(__arm_vshlltq_x_n_s8): Likewise.
	(__arm_vshlltq_x_n_s16): Likewise.
	(__arm_vshlltq_x_n_u8): Likewise.
	(__arm_vshlltq_x_n_u16): Likewise.
	(__arm_vshlq_x_s8): Likewise.
	(__arm_vshlq_x_s16): Likewise.
	(__arm_vshlq_x_s32): Likewise.
	(__arm_vshlq_x_u8): Likewise.
	(__arm_vshlq_x_u16): Likewise.
	(__arm_vshlq_x_u32): Likewise.
	(__arm_vshlq_x_n_s8): Likewise.
	(__arm_vshlq_x_n_s16): Likewise.
	(__arm_vshlq_x_n_s32): Likewise.
	(__arm_vshlq_x_n_u8): Likewise.
	(__arm_vshlq_x_n_u16): Likewise.
	(__arm_vshlq_x_n_u32): Likewise.
	(__arm_vrshrq_x_n_s8): Likewise.
	(__arm_vrshrq_x_n_s16): Likewise.
	(__arm_vrshrq_x_n_s32): Likewise.
	(__arm_vrshrq_x_n_u8): Likewise.
	(__arm_vrshrq_x_n_u16): Likewise.
	(__arm_vrshrq_x_n_u32): Likewise.
	(__arm_vshrq_x_n_s8): Likewise.
	(__arm_vshrq_x_n_s16): Likewise.
	(__arm_vshrq_x_n_s32): Likewise.
	(__arm_vshrq_x_n_u8): Likewise.
	(__arm_vshrq_x_n_u16): Likewise.
	(__arm_vshrq_x_n_u32): Likewise.
	(__arm_vdupq_x_n_f16): Likewise.
	(__arm_vdupq_x_n_f32): Likewise.
	(__arm_vminnmq_x_f16): Likewise.
	(__arm_vminnmq_x_f32): Likewise.
	(__arm_vmaxnmq_x_f16): Likewise.
	(__arm_vmaxnmq_x_f32): Likewise.
	(__arm_vabdq_x_f16): Likewise.
	(__arm_vabdq_x_f32): Likewise.
	(__arm_vabsq_x_f16): Likewise.
	(__arm_vabsq_x_f32): Likewise.
	(__arm_vaddq_x_f16): Likewise.
	(__arm_vaddq_x_f32): Likewise.
	(__arm_vaddq_x_n_f16): Likewise.
	(__arm_vaddq_x_n_f32): Likewise.
	(__arm_vnegq_x_f16): Likewise.
	(__arm_vnegq_x_f32): Likewise.
	(__arm_vmulq_x_f16): Likewise.
	(__arm_vmulq_x_f32): Likewise.
	(__arm_vmulq_x_n_f16): Likewise.
	(__arm_vmulq_x_n_f32): Likewise.
	(__arm_vsubq_x_f16): Likewise.
	(__arm_vsubq_x_f32): Likewise.
	(__arm_vsubq_x_n_f16): Likewise.
	(__arm_vsubq_x_n_f32): Likewise.
	(__arm_vcaddq_rot90_x_f16): Likewise.
	(__arm_vcaddq_rot90_x_f32): Likewise.
	(__arm_vcaddq_rot270_x_f16): Likewise.
	(__arm_vcaddq_rot270_x_f32): Likewise.
	(__arm_vcmulq_x_f16): Likewise.
	(__arm_vcmulq_x_f32): Likewise.
	(__arm_vcmulq_rot90_x_f16): Likewise.
	(__arm_vcmulq_rot90_x_f32): Likewise.
	(__arm_vcmulq_rot180_x_f16): Likewise.
	(__arm_vcmulq_rot180_x_f32): Likewise.
	(__arm_vcmulq_rot270_x_f16): Likewise.
	(__arm_vcmulq_rot270_x_f32): Likewise.
	(__arm_vcvtaq_x_s16_f16): Likewise.
	(__arm_vcvtaq_x_s32_f32): Likewise.
	(__arm_vcvtaq_x_u16_f16): Likewise.
	(__arm_vcvtaq_x_u32_f32): Likewise.
	(__arm_vcvtnq_x_s16_f16): Likewise.
	(__arm_vcvtnq_x_s32_f32): Likewise.
	(__arm_vcvtnq_x_u16_f16): Likewise.
	(__arm_vcvtnq_x_u32_f32): Likewise.
	(__arm_vcvtpq_x_s16_f16): Likewise.
	(__arm_vcvtpq_x_s32_f32): Likewise.
	(__arm_vcvtpq_x_u16_f16): Likewise.
	(__arm_vcvtpq_x_u32_f32): Likewise.
	(__arm_vcvtmq_x_s16_f16): Likewise.
	(__arm_vcvtmq_x_s32_f32): Likewise.
	(__arm_vcvtmq_x_u16_f16): Likewise.
	(__arm_vcvtmq_x_u32_f32): Likewise.
	(__arm_vcvtbq_x_f32_f16): Likewise.
	(__arm_vcvttq_x_f32_f16): Likewise.
	(__arm_vcvtq_x_f16_u16): Likewise.
	(__arm_vcvtq_x_f16_s16): Likewise.
	(__arm_vcvtq_x_f32_s32): Likewise.
	(__arm_vcvtq_x_f32_u32): Likewise.
	(__arm_vcvtq_x_n_f16_s16): Likewise.
	(__arm_vcvtq_x_n_f16_u16): Likewise.
	(__arm_vcvtq_x_n_f32_s32): Likewise.
	(__arm_vcvtq_x_n_f32_u32): Likewise.
	(__arm_vcvtq_x_s16_f16): Likewise.
	(__arm_vcvtq_x_s32_f32): Likewise.
	(__arm_vcvtq_x_u16_f16): Likewise.
	(__arm_vcvtq_x_u32_f32): Likewise.
	(__arm_vcvtq_x_n_s16_f16): Likewise.
	(__arm_vcvtq_x_n_s32_f32): Likewise.
	(__arm_vcvtq_x_n_u16_f16): Likewise.
	(__arm_vcvtq_x_n_u32_f32): Likewise.
	(__arm_vrndq_x_f16): Likewise.
	(__arm_vrndq_x_f32): Likewise.
	(__arm_vrndnq_x_f16): Likewise.
	(__arm_vrndnq_x_f32): Likewise.
	(__arm_vrndmq_x_f16): Likewise.
	(__arm_vrndmq_x_f32): Likewise.
	(__arm_vrndpq_x_f16): Likewise.
	(__arm_vrndpq_x_f32): Likewise.
	(__arm_vrndaq_x_f16): Likewise.
	(__arm_vrndaq_x_f32): Likewise.
	(__arm_vrndxq_x_f16): Likewise.
	(__arm_vrndxq_x_f32): Likewise.
	(__arm_vandq_x_f16): Likewise.
	(__arm_vandq_x_f32): Likewise.
	(__arm_vbicq_x_f16): Likewise.
	(__arm_vbicq_x_f32): Likewise.
	(__arm_vbrsrq_x_n_f16): Likewise.
	(__arm_vbrsrq_x_n_f32): Likewise.
	(__arm_veorq_x_f16): Likewise.
	(__arm_veorq_x_f32): Likewise.
	(__arm_vornq_x_f16): Likewise.
	(__arm_vornq_x_f32): Likewise.
	(__arm_vorrq_x_f16): Likewise.
	(__arm_vorrq_x_f32): Likewise.
	(__arm_vrev32q_x_f16): Likewise.
	(__arm_vrev64q_x_f16): Likewise.
	(__arm_vrev64q_x_f32): Likewise.
	(vabdq_x): Define polymorphic variant.
	(vabsq_x): Likewise.
	(vaddq_x): Likewise.
	(vandq_x): Likewise.
	(vbicq_x): Likewise.
	(vbrsrq_x): Likewise.
	(vcaddq_rot270_x): Likewise.
	(vcaddq_rot90_x): Likewise.
	(vcmulq_rot180_x): Likewise.
	(vcmulq_rot270_x): Likewise.
	(vcmulq_x): Likewise.
	(vcvtq_x): Likewise.
	(vcvtq_x_n): Likewise.
	(vcvtnq_m): Likewise.
	(veorq_x): Likewise.
	(vmaxnmq_x): Likewise.
	(vminnmq_x): Likewise.
	(vmulq_x): Likewise.
	(vnegq_x): Likewise.
	(vornq_x): Likewise.
	(vorrq_x): Likewise.
	(vrev32q_x): Likewise.
	(vrev64q_x): Likewise.
	(vrndaq_x): Likewise.
	(vrndmq_x): Likewise.
	(vrndnq_x): Likewise.
	(vrndpq_x): Likewise.
	(vrndq_x): Likewise.
	(vrndxq_x): Likewise.
	(vsubq_x): Likewise.
	(vcmulq_rot90_x): Likewise.
	(vadciq): Likewise.
	(vclsq_x): Likewise.
	(vclzq_x): Likewise.
	(vhaddq_x): Likewise.
	(vhcaddq_rot270_x): Likewise.
	(vhcaddq_rot90_x): Likewise.
	(vhsubq_x): Likewise.
	(vmaxq_x): Likewise.
	(vminq_x): Likewise.
	(vmovlbq_x): Likewise.
	(vmovltq_x): Likewise.
	(vmulhq_x): Likewise.
	(vmullbq_int_x): Likewise.
	(vmullbq_poly_x): Likewise.
	(vmulltq_int_x): Likewise.
	(vmulltq_poly_x): Likewise.
	(vmvnq_x): Likewise.
	(vrev16q_x): Likewise.
	(vrhaddq_x): Likewise.
	(vrmulhq_x): Likewise.
	(vrshlq_x): Likewise.
	(vrshrq_x): Likewise.
	(vshllbq_x): Likewise.
	(vshlltq_x): Likewise.
	(vshlq_x_n): Likewise.
	(vshlq_x): Likewise.
	(vdwdupq_x_u8): Likewise.
	(vdwdupq_x_u16): Likewise.
	(vdwdupq_x_u32): Likewise.
	(viwdupq_x_u8): Likewise.
	(viwdupq_x_u16): Likewise.
	(viwdupq_x_u32): Likewise.
	(vidupq_x_u8): Likewise.
	(vddupq_x_u8): Likewise.
	(vidupq_x_u16): Likewise.
	(vddupq_x_u16): Likewise.
	(vidupq_x_u32): Likewise.
	(vddupq_x_u32): Likewise.
	(vshrq_x): Likewise.

gcc/testsuite/ChangeLog:

2019-11-14  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vabdq_x_f16.c: New test.
	* gcc.target/arm/mve/intrinsics/vabdq_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabdq_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabsq_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabsq_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabsq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabsq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabsq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddq_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vandq_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vandq_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vandq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vandq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vandq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vandq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vandq_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vandq_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbrsrq_x_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbrsrq_x_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbrsrq_x_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbrsrq_x_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbrsrq_x_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbrsrq_x_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbrsrq_x_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbrsrq_x_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot270_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot270_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot270_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot270_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot270_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot270_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot270_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot270_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot90_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot90_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot90_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot90_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot90_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot90_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot90_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcaddq_rot90_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vclsq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vclsq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vclsq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vclzq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vclzq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vclzq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vclzq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vclzq_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vclzq_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmulq_rot180_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmulq_rot180_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmulq_rot270_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmulq_rot270_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmulq_rot90_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmulq_rot90_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmulq_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmulq_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtaq_x_s16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtaq_x_s32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtaq_x_u16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtaq_x_u32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtbq_x_f32_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtmq_x_s16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtmq_x_s32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtmq_x_u16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtmq_x_u32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtnq_x_s16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtnq_x_s32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtnq_x_u16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtnq_x_u32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtpq_x_s16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtpq_x_s32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtpq_x_u16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtpq_x_u32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_x_f16_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_x_f16_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_x_f32_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_x_f32_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_x_n_f16_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_x_n_f16_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_x_n_f32_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_x_n_f32_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_x_n_s16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_x_n_s32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_x_n_u16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_x_n_u32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_x_s16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_x_s32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_x_u16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_x_u32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvttq_x_f32_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_x_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_x_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_x_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_x_wb_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_x_wb_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vddupq_x_wb_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_x_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_x_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_x_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_x_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_x_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_x_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_x_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_x_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/veorq_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/veorq_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/veorq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/veorq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/veorq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/veorq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/veorq_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/veorq_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_x_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_x_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_x_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_x_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_x_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_x_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhaddq_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhcaddq_rot270_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhcaddq_rot270_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhcaddq_rot270_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhcaddq_rot90_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhcaddq_rot90_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhcaddq_rot90_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_x_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_x_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_x_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_x_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_x_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_x_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vhsubq_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_x_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_x_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_x_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_x_wb_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_x_wb_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vidupq_x_wb_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_x_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_x_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_x_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmq_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxnmq_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxq_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmq_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminnmq_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminq_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovlbq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovlbq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovlbq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovlbq_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovltq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovltq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovltq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmovltq_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulhq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulhq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulhq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulhq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulhq_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulhq_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmullbq_int_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmullbq_int_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmullbq_int_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmullbq_int_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmullbq_int_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmullbq_int_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmullbq_poly_x_p16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmullbq_poly_x_p8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulltq_int_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulltq_int_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulltq_int_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulltq_int_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulltq_int_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulltq_int_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulltq_poly_x_p16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulltq_poly_x_p8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmulq_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmvnq_x_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmvnq_x_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmvnq_x_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmvnq_x_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmvnq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmvnq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmvnq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmvnq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmvnq_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmvnq_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vnegq_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vnegq_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vnegq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vnegq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vnegq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vornq_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vornq_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vornq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vornq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vornq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vornq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vornq_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vornq_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vorrq_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev16q_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev16q_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev32q_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev32q_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev32q_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev32q_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev32q_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev64q_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev64q_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev64q_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev64q_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev64q_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev64q_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev64q_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev64q_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrhaddq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrhaddq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrhaddq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrhaddq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrhaddq_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrhaddq_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmulhq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmulhq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmulhq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmulhq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmulhq_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmulhq_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndaq_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndaq_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndmq_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndmq_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndnq_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndnq_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndpq_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndpq_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndq_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndq_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndxq_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrndxq_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrq_x_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrq_x_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrq_x_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrq_x_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrq_x_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshrq_x_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshllbq_x_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshllbq_x_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshllbq_x_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshllbq_x_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlltq_x_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlltq_x_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlltq_x_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlltq_x_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_x_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_x_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_x_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_x_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_x_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_x_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_x_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrq_x_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrq_x_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrq_x_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrq_x_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshrq_x_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_n_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_n_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsubq_x_u8.c: Likewise.

[-- Attachment #2: diff32.patch.gz --]
[-- Type: application/gzip, Size: 31605 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][6/5x]: Remaining MVE load intrinsics which loads half word and word or double word from memory.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (25 preceding siblings ...)
  2019-11-14 19:16 ` [PATCH][ARM][GCC][12x]: MVE ACLE intrinsics to set and get vector lane Srinath Parvathaneni
@ 2019-11-14 19:16 ` Srinath Parvathaneni
  2019-11-14 19:16 ` [PATCH][ARM][GCC][1/x]: MVE ACLE intrinsics framework patch Srinath Parvathaneni
                   ` (11 subsequent siblings)
  38 siblings, 0 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 79534 bytes --]

Hello,

This patch supports the following Remaining MVE ACLE load intrinsics which load an halfword,
word or double word from memory.

vldrdq_gather_base_s64, vldrdq_gather_base_u64, vldrdq_gather_base_z_s64,
vldrdq_gather_base_z_u64, vldrdq_gather_offset_s64, vldrdq_gather_offset_u64,
vldrdq_gather_offset_z_s64, vldrdq_gather_offset_z_u64, vldrdq_gather_shifted_offset_s64,
vldrdq_gather_shifted_offset_u64, vldrdq_gather_shifted_offset_z_s64,
vldrdq_gather_shifted_offset_z_u64, vldrhq_gather_offset_f16, vldrhq_gather_offset_z_f16,
vldrhq_gather_shifted_offset_f16, vldrhq_gather_shifted_offset_z_f16, vldrwq_gather_base_f32,
vldrwq_gather_base_z_f32, vldrwq_gather_offset_f32, vldrwq_gather_offset_s32,
vldrwq_gather_offset_u32, vldrwq_gather_offset_z_f32, vldrwq_gather_offset_z_s32,
vldrwq_gather_offset_z_u32, vldrwq_gather_shifted_offset_f32, vldrwq_gather_shifted_offset_s32,
vldrwq_gather_shifted_offset_u32, vldrwq_gather_shifted_offset_z_f32,
vldrwq_gather_shifted_offset_z_s32, vldrwq_gather_shifted_offset_z_u32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1]  https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-01  Andre Vieira  <andre.simoesdiasvieira@arm.com>
            Mihail Ionescu  <mihail.ionescu@arm.com>
            Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm_mve.h (vldrdq_gather_base_s64): Define macro.
	(vldrdq_gather_base_u64): Likewise.
	(vldrdq_gather_base_z_s64): Likewise.
	(vldrdq_gather_base_z_u64): Likewise.
	(vldrdq_gather_offset_s64): Likewise.
	(vldrdq_gather_offset_u64): Likewise.
	(vldrdq_gather_offset_z_s64): Likewise.
	(vldrdq_gather_offset_z_u64): Likewise.
	(vldrdq_gather_shifted_offset_s64): Likewise.
	(vldrdq_gather_shifted_offset_u64): Likewise.
	(vldrdq_gather_shifted_offset_z_s64): Likewise.
	(vldrdq_gather_shifted_offset_z_u64): Likewise.
	(vldrhq_gather_offset_f16): Likewise.
	(vldrhq_gather_offset_z_f16): Likewise.
	(vldrhq_gather_shifted_offset_f16): Likewise.
	(vldrhq_gather_shifted_offset_z_f16): Likewise.
	(vldrwq_gather_base_f32): Likewise.
	(vldrwq_gather_base_z_f32): Likewise.
	(vldrwq_gather_offset_f32): Likewise.
	(vldrwq_gather_offset_s32): Likewise.
	(vldrwq_gather_offset_u32): Likewise.
	(vldrwq_gather_offset_z_f32): Likewise.
	(vldrwq_gather_offset_z_s32): Likewise.
	(vldrwq_gather_offset_z_u32): Likewise.
	(vldrwq_gather_shifted_offset_f32): Likewise.
	(vldrwq_gather_shifted_offset_s32): Likewise.
	(vldrwq_gather_shifted_offset_u32): Likewise.
	(vldrwq_gather_shifted_offset_z_f32): Likewise.
	(vldrwq_gather_shifted_offset_z_s32): Likewise.
	(vldrwq_gather_shifted_offset_z_u32): Likewise.
	(__arm_vldrdq_gather_base_s64): Define intrinsic.
	(__arm_vldrdq_gather_base_u64): Likewise.
	(__arm_vldrdq_gather_base_z_s64): Likewise.
	(__arm_vldrdq_gather_base_z_u64): Likewise.
	(__arm_vldrdq_gather_offset_s64): Likewise.
	(__arm_vldrdq_gather_offset_u64): Likewise.
	(__arm_vldrdq_gather_offset_z_s64): Likewise.
	(__arm_vldrdq_gather_offset_z_u64): Likewise.
	(__arm_vldrdq_gather_shifted_offset_s64): Likewise.
	(__arm_vldrdq_gather_shifted_offset_u64): Likewise.
	(__arm_vldrdq_gather_shifted_offset_z_s64): Likewise.
	(__arm_vldrdq_gather_shifted_offset_z_u64): Likewise.
	(__arm_vldrwq_gather_offset_s32): Likewise.
	(__arm_vldrwq_gather_offset_u32): Likewise.
	(__arm_vldrwq_gather_offset_z_s32): Likewise.
	(__arm_vldrwq_gather_offset_z_u32): Likewise.
	(__arm_vldrwq_gather_shifted_offset_s32): Likewise.
	(__arm_vldrwq_gather_shifted_offset_u32): Likewise.
	(__arm_vldrwq_gather_shifted_offset_z_s32): Likewise.
	(__arm_vldrwq_gather_shifted_offset_z_u32): Likewise.
	(__arm_vldrhq_gather_offset_f16): Likewise.
	(__arm_vldrhq_gather_offset_z_f16): Likewise.
	(__arm_vldrhq_gather_shifted_offset_f16): Likewise.
	(__arm_vldrhq_gather_shifted_offset_z_f16): Likewise.
	(__arm_vldrwq_gather_base_f32): Likewise.
	(__arm_vldrwq_gather_base_z_f32): Likewise.
	(__arm_vldrwq_gather_offset_f32): Likewise.
	(__arm_vldrwq_gather_offset_z_f32): Likewise.
	(__arm_vldrwq_gather_shifted_offset_f32): Likewise.
	(__arm_vldrwq_gather_shifted_offset_z_f32): Likewise.
	(vldrhq_gather_offset): Define polymorphic variant.
	(vldrhq_gather_offset_z): Likewise.
	(vldrhq_gather_shifted_offset): Likewise.
	(vldrhq_gather_shifted_offset_z): Likewise.
	(vldrwq_gather_offset): Likewise.
	(vldrwq_gather_offset_z): Likewise.
	(vldrwq_gather_shifted_offset): Likewise.
	(vldrwq_gather_shifted_offset_z): Likewise.
	(vldrdq_gather_offset): Likewise.
	(vldrdq_gather_offset_z): Likewise.
	(vldrdq_gather_shifted_offset): Likewise.
	(vldrdq_gather_shifted_offset_z): Likewise.
	* config/arm/arm_mve_builtins.def (LDRGBS): Use builtin qualifier.
	(LDRGBU): Likewise.
	(LDRGBS_Z): Likewise.
	(LDRGBU_Z): Likewise.
	(LDRGS): Likewise.
	(LDRGU): Likewise.
	(LDRGS_Z): Likewise.
	(LDRGU_Z): Likewise.
	* config/arm/mve.md (VLDRDGBQ): Define iterator.
	(VLDRDGOQ): Likewise.
	(VLDRDGSOQ): Likewise.
	(VLDRWGOQ): Likewise.
	(VLDRWGSOQ): Likewise.
	(mve_vldrdq_gather_base_<supf>v2di): Define RTL pattern.
	(mve_vldrdq_gather_base_z_<supf>v2di): Likewise.
	(mve_vldrdq_gather_offset_<supf>v2di): Likewise.
	(mve_vldrdq_gather_offset_z_<supf>v2di): Likewise.
	(mve_vldrdq_gather_shifted_offset_<supf>v2di): Likewise.
	(mve_vldrdq_gather_shifted_offset_z_<supf>v2di): Likewise.
	(mve_vldrhq_gather_offset_fv8hf): Likewise.
	(mve_vldrhq_gather_offset_z_fv8hf): Likewise.
	(mve_vldrhq_gather_shifted_offset_fv8hf): Likewise.
	(mve_vldrhq_gather_shifted_offset_z_fv8hf): Likewise.
	(mve_vldrwq_gather_base_fv4sf): Likewise.
	(mve_vldrwq_gather_base_z_fv4sf): Likewise.
	(mve_vldrwq_gather_offset_fv4sf): Likewise.
	(mve_vldrwq_gather_offset_<supf>v4si): Likewise.
	(mve_vldrwq_gather_offset_z_fv4sf): Likewise.
	(mve_vldrwq_gather_offset_z_<supf>v4si): Likewise.
	(mve_vldrwq_gather_shifted_offset_fv4sf): Likewise.
	(mve_vldrwq_gather_shifted_offset_<supf>v4si): Likewise.
	(mve_vldrwq_gather_shifted_offset_z_fv4sf): Likewise.
	(mve_vldrwq_gather_shifted_offset_z_<supf>v4si): Likewise.

gcc/testsuite/ChangeLog:

2019-11-04  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_s64.c: New test.
	* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_z_s64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_z_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrdq_gather_offset_s64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrdq_gather_offset_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrdq_gather_offset_z_s64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrdq_gather_offset_z_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrdq_gather_shifted_offset_s64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrdq_gather_shifted_offset_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrdq_gather_shifted_offset_z_s64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrdq_gather_shifted_offset_z_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_z_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_z_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_z_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_offset_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_offset_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_offset_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_offset_z_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_offset_z_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_offset_z_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_shifted_offset_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_shifted_offset_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_shifted_offset_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_shifted_offset_z_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_shifted_offset_z_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_shifted_offset_z_u32.c: Likewise.


###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index e85d36051ef748709da3b9fcdf522e39deb12c08..a4663a04902741af5276a5f20decca3a7321822e 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -1798,6 +1798,36 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vldrhq_z_f16(__base, __p) __arm_vldrhq_z_f16(__base, __p)
 #define vldrwq_f32(__base) __arm_vldrwq_f32(__base)
 #define vldrwq_z_f32(__base, __p) __arm_vldrwq_z_f32(__base, __p)
+#define vldrdq_gather_base_s64(__addr,  __offset) __arm_vldrdq_gather_base_s64(__addr,  __offset)
+#define vldrdq_gather_base_u64(__addr,  __offset) __arm_vldrdq_gather_base_u64(__addr,  __offset)
+#define vldrdq_gather_base_z_s64(__addr,  __offset, __p) __arm_vldrdq_gather_base_z_s64(__addr,  __offset, __p)
+#define vldrdq_gather_base_z_u64(__addr,  __offset, __p) __arm_vldrdq_gather_base_z_u64(__addr,  __offset, __p)
+#define vldrdq_gather_offset_s64(__base, __offset) __arm_vldrdq_gather_offset_s64(__base, __offset)
+#define vldrdq_gather_offset_u64(__base, __offset) __arm_vldrdq_gather_offset_u64(__base, __offset)
+#define vldrdq_gather_offset_z_s64(__base, __offset, __p) __arm_vldrdq_gather_offset_z_s64(__base, __offset, __p)
+#define vldrdq_gather_offset_z_u64(__base, __offset, __p) __arm_vldrdq_gather_offset_z_u64(__base, __offset, __p)
+#define vldrdq_gather_shifted_offset_s64(__base, __offset) __arm_vldrdq_gather_shifted_offset_s64(__base, __offset)
+#define vldrdq_gather_shifted_offset_u64(__base, __offset) __arm_vldrdq_gather_shifted_offset_u64(__base, __offset)
+#define vldrdq_gather_shifted_offset_z_s64(__base, __offset, __p) __arm_vldrdq_gather_shifted_offset_z_s64(__base, __offset, __p)
+#define vldrdq_gather_shifted_offset_z_u64(__base, __offset, __p) __arm_vldrdq_gather_shifted_offset_z_u64(__base, __offset, __p)
+#define vldrhq_gather_offset_f16(__base, __offset) __arm_vldrhq_gather_offset_f16(__base, __offset)
+#define vldrhq_gather_offset_z_f16(__base, __offset, __p) __arm_vldrhq_gather_offset_z_f16(__base, __offset, __p)
+#define vldrhq_gather_shifted_offset_f16(__base, __offset) __arm_vldrhq_gather_shifted_offset_f16(__base, __offset)
+#define vldrhq_gather_shifted_offset_z_f16(__base, __offset, __p) __arm_vldrhq_gather_shifted_offset_z_f16(__base, __offset, __p)
+#define vldrwq_gather_base_f32(__addr,  __offset) __arm_vldrwq_gather_base_f32(__addr,  __offset)
+#define vldrwq_gather_base_z_f32(__addr,  __offset, __p) __arm_vldrwq_gather_base_z_f32(__addr,  __offset, __p)
+#define vldrwq_gather_offset_f32(__base, __offset) __arm_vldrwq_gather_offset_f32(__base, __offset)
+#define vldrwq_gather_offset_s32(__base, __offset) __arm_vldrwq_gather_offset_s32(__base, __offset)
+#define vldrwq_gather_offset_u32(__base, __offset) __arm_vldrwq_gather_offset_u32(__base, __offset)
+#define vldrwq_gather_offset_z_f32(__base, __offset, __p) __arm_vldrwq_gather_offset_z_f32(__base, __offset, __p)
+#define vldrwq_gather_offset_z_s32(__base, __offset, __p) __arm_vldrwq_gather_offset_z_s32(__base, __offset, __p)
+#define vldrwq_gather_offset_z_u32(__base, __offset, __p) __arm_vldrwq_gather_offset_z_u32(__base, __offset, __p)
+#define vldrwq_gather_shifted_offset_f32(__base, __offset) __arm_vldrwq_gather_shifted_offset_f32(__base, __offset)
+#define vldrwq_gather_shifted_offset_s32(__base, __offset) __arm_vldrwq_gather_shifted_offset_s32(__base, __offset)
+#define vldrwq_gather_shifted_offset_u32(__base, __offset) __arm_vldrwq_gather_shifted_offset_u32(__base, __offset)
+#define vldrwq_gather_shifted_offset_z_f32(__base, __offset, __p) __arm_vldrwq_gather_shifted_offset_z_f32(__base, __offset, __p)
+#define vldrwq_gather_shifted_offset_z_s32(__base, __offset, __p) __arm_vldrwq_gather_shifted_offset_z_s32(__base, __offset, __p)
+#define vldrwq_gather_shifted_offset_z_u32(__base, __offset, __p) __arm_vldrwq_gather_shifted_offset_z_u32(__base, __offset, __p)
 #endif
 
 __extension__ extern __inline void
@@ -11722,6 +11752,147 @@ __arm_vldrwq_z_u32 (uint32_t const * __base, mve_pred16_t __p)
   return __builtin_mve_vldrwq_z_uv4si ((__builtin_neon_si *) __base, __p);
 }
 
+__extension__ extern __inline int64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrdq_gather_base_s64 (uint64x2_t __addr, const int __offset)
+{
+  return __builtin_mve_vldrdq_gather_base_sv2di (__addr, __offset);
+}
+
+__extension__ extern __inline uint64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrdq_gather_base_u64 (uint64x2_t __addr, const int __offset)
+{
+  return __builtin_mve_vldrdq_gather_base_uv2di (__addr, __offset);
+}
+
+__extension__ extern __inline int64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrdq_gather_base_z_s64 (uint64x2_t __addr, const int __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrdq_gather_base_z_sv2di (__addr, __offset, __p);
+}
+
+__extension__ extern __inline uint64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrdq_gather_base_z_u64 (uint64x2_t __addr, const int __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrdq_gather_base_z_uv2di (__addr, __offset, __p);
+}
+
+__extension__ extern __inline int64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrdq_gather_offset_s64 (int64_t const * __base, uint64x2_t __offset)
+{
+  return __builtin_mve_vldrdq_gather_offset_sv2di ((__builtin_neon_di *) __base, __offset);
+}
+
+__extension__ extern __inline uint64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrdq_gather_offset_u64 (uint64_t const * __base, uint64x2_t __offset)
+{
+  return __builtin_mve_vldrdq_gather_offset_uv2di ((__builtin_neon_di *) __base, __offset);
+}
+
+__extension__ extern __inline int64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrdq_gather_offset_z_s64 (int64_t const * __base, uint64x2_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrdq_gather_offset_z_sv2di ((__builtin_neon_di *) __base, __offset, __p);
+}
+
+
+__extension__ extern __inline uint64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrdq_gather_offset_z_u64 (uint64_t const * __base, uint64x2_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrdq_gather_offset_z_uv2di ((__builtin_neon_di *) __base, __offset, __p);
+}
+
+__extension__ extern __inline int64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrdq_gather_shifted_offset_s64 (int64_t const * __base, uint64x2_t __offset)
+{
+  return __builtin_mve_vldrdq_gather_shifted_offset_sv2di ((__builtin_neon_di *) __base, __offset);
+}
+
+__extension__ extern __inline uint64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrdq_gather_shifted_offset_u64 (uint64_t const * __base, uint64x2_t __offset)
+{
+  return __builtin_mve_vldrdq_gather_shifted_offset_uv2di ((__builtin_neon_di *) __base, __offset);
+}
+
+__extension__ extern __inline int64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrdq_gather_shifted_offset_z_s64 (int64_t const * __base, uint64x2_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrdq_gather_shifted_offset_z_sv2di ((__builtin_neon_di *) __base, __offset, __p);
+}
+
+__extension__ extern __inline uint64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrdq_gather_shifted_offset_z_u64 (uint64_t const * __base, uint64x2_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrdq_gather_shifted_offset_z_uv2di ((__builtin_neon_di *) __base, __offset, __p);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_gather_offset_s32 (int32_t const * __base, uint32x4_t __offset)
+{
+  return __builtin_mve_vldrwq_gather_offset_sv4si ((__builtin_neon_si *) __base, __offset);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_gather_offset_u32 (uint32_t const * __base, uint32x4_t __offset)
+{
+  return __builtin_mve_vldrwq_gather_offset_uv4si ((__builtin_neon_si *) __base, __offset);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_gather_offset_z_s32 (int32_t const * __base, uint32x4_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrwq_gather_offset_z_sv4si ((__builtin_neon_si *) __base, __offset, __p);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_gather_offset_z_u32 (uint32_t const * __base, uint32x4_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrwq_gather_offset_z_uv4si ((__builtin_neon_si *) __base, __offset, __p);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_gather_shifted_offset_s32 (int32_t const * __base, uint32x4_t __offset)
+{
+  return __builtin_mve_vldrwq_gather_shifted_offset_sv4si ((__builtin_neon_si *) __base, __offset);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_gather_shifted_offset_u32 (uint32_t const * __base, uint32x4_t __offset)
+{
+  return __builtin_mve_vldrwq_gather_shifted_offset_uv4si ((__builtin_neon_si *) __base, __offset);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_gather_shifted_offset_z_s32 (int32_t const * __base, uint32x4_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrwq_gather_shifted_offset_z_sv4si ((__builtin_neon_si *) __base, __offset, __p);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_gather_shifted_offset_z_u32 (uint32_t const * __base, uint32x4_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrwq_gather_shifted_offset_z_uv4si ((__builtin_neon_si *) __base, __offset, __p);
+}
+
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
@@ -13905,6 +14076,77 @@ __arm_vldrhq_f16 (float16_t const * __base)
 {
   return __builtin_mve_vldrhq_fv8hf((__builtin_neon_hi *) __base);
 }
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrhq_gather_offset_f16 (float16_t const * __base, uint16x8_t __offset)
+{
+  return __builtin_mve_vldrhq_gather_offset_fv8hf((__builtin_neon_hi *) __base, __offset);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrhq_gather_offset_z_f16 (float16_t const * __base, uint16x8_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrhq_gather_offset_z_fv8hf((__builtin_neon_hi *) __base, __offset, __p);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrhq_gather_shifted_offset_f16 (float16_t const * __base, uint16x8_t __offset)
+{
+  return __builtin_mve_vldrhq_gather_shifted_offset_fv8hf (__base, __offset);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrhq_gather_shifted_offset_z_f16 (float16_t const * __base, uint16x8_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrhq_gather_shifted_offset_z_fv8hf (__base, __offset, __p);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_gather_base_f32 (uint32x4_t __addr, const int __offset)
+{
+  return __builtin_mve_vldrwq_gather_base_fv4sf (__addr, __offset);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_gather_base_z_f32 (uint32x4_t __addr, const int __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrwq_gather_base_z_fv4sf (__addr, __offset, __p);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_gather_offset_f32 (float32_t const * __base, uint32x4_t __offset)
+{
+  return __builtin_mve_vldrwq_gather_offset_fv4sf((__builtin_neon_si *) __base, __offset);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_gather_offset_z_f32 (float32_t const * __base, uint32x4_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrwq_gather_offset_z_fv4sf((__builtin_neon_si *) __base, __offset, __p);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_gather_shifted_offset_f32 (float32_t const * __base, uint32x4_t __offset)
+{
+  return __builtin_mve_vldrwq_gather_shifted_offset_fv4sf (__base, __offset);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_gather_shifted_offset_z_f32 (float32_t const * __base, uint32x4_t __offset, mve_pred16_t __p)
+{
+  return __builtin_mve_vldrwq_gather_shifted_offset_z_fv4sf (__base, __offset, __p);
+}
+
 #endif
 
 enum {
@@ -15464,6 +15706,74 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16_t_const_ptr]: __arm_vld1q_f16 (__ARM_mve_coerce(__p0, float16_t const *)), \
   int (*)[__ARM_mve_type_float32_t_const_ptr]: __arm_vld1q_f32 (__ARM_mve_coerce(__p0, float32_t const *)));})
 
+#define vldrhq_gather_offset(p0,p1) __arm_vldrhq_gather_offset(p0,p1)
+#define __arm_vldrhq_gather_offset(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int16_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrhq_gather_offset_s16 (__ARM_mve_coerce(__p0, int16_t const *), __ARM_mve_coerce(__p1, uint16x8_t)), \
+  int (*)[__ARM_mve_type_int16_t_const_ptr][__ARM_mve_type_uint32x4_t]: __arm_vldrhq_gather_offset_s32 (__ARM_mve_coerce(__p0, int16_t const *), __ARM_mve_coerce(__p1, uint32x4_t)), \
+  int (*)[__ARM_mve_type_uint16_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrhq_gather_offset_u16 (__ARM_mve_coerce(__p0, uint16_t const *), __ARM_mve_coerce(__p1, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint16_t_const_ptr][__ARM_mve_type_uint32x4_t]: __arm_vldrhq_gather_offset_u32 (__ARM_mve_coerce(__p0, uint16_t const *), __ARM_mve_coerce(__p1, uint32x4_t)), \
+  int (*)[__ARM_mve_type_float16_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrhq_gather_offset_f16 (__ARM_mve_coerce(__p0, float16_t const *), __ARM_mve_coerce(__p1, uint16x8_t)));})
+
+#define vldrhq_gather_offset_z(p0,p1,p2) __arm_vldrhq_gather_offset_z(p0,p1,p2)
+#define __arm_vldrhq_gather_offset_z(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int16_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrhq_gather_offset_z_s16 (__ARM_mve_coerce(__p0, int16_t const *), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
+  int (*)[__ARM_mve_type_int16_t_const_ptr][__ARM_mve_type_uint32x4_t]: __arm_vldrhq_gather_offset_z_s32 (__ARM_mve_coerce(__p0, int16_t const *), __ARM_mve_coerce(__p1, uint32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint16_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrhq_gather_offset_z_u16 (__ARM_mve_coerce(__p0, uint16_t const *), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
+  int (*)[__ARM_mve_type_uint16_t_const_ptr][__ARM_mve_type_uint32x4_t]: __arm_vldrhq_gather_offset_z_u32 (__ARM_mve_coerce(__p0, uint16_t const *), __ARM_mve_coerce(__p1, uint32x4_t), p2), \
+  int (*)[__ARM_mve_type_float16_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrhq_gather_offset_z_f16 (__ARM_mve_coerce(__p0, float16_t const *), __ARM_mve_coerce(__p1, uint16x8_t), p2));})
+
+#define vldrhq_gather_shifted_offset(p0,p1) __arm_vldrhq_gather_shifted_offset(p0,p1)
+#define __arm_vldrhq_gather_shifted_offset(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int16_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrhq_gather_shifted_offset_s16 (__ARM_mve_coerce(__p0, int16_t const *), __ARM_mve_coerce(__p1, uint16x8_t)), \
+  int (*)[__ARM_mve_type_int16_t_const_ptr][__ARM_mve_type_uint32x4_t]: __arm_vldrhq_gather_shifted_offset_s32 (__ARM_mve_coerce(__p0, int16_t const *), __ARM_mve_coerce(__p1, uint32x4_t)), \
+  int (*)[__ARM_mve_type_uint16_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrhq_gather_shifted_offset_u16 (__ARM_mve_coerce(__p0, uint16_t const *), __ARM_mve_coerce(__p1, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint16_t_const_ptr][__ARM_mve_type_uint32x4_t]: __arm_vldrhq_gather_shifted_offset_u32 (__ARM_mve_coerce(__p0, uint16_t const *), __ARM_mve_coerce(__p1, uint32x4_t)), \
+  int (*)[__ARM_mve_type_float16_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrhq_gather_shifted_offset_f16 (__ARM_mve_coerce(__p0, float16_t const *), __ARM_mve_coerce(__p1, uint16x8_t)));})
+
+#define vldrhq_gather_shifted_offset_z(p0,p1,p2) __arm_vldrhq_gather_shifted_offset_z(p0,p1,p2)
+#define __arm_vldrhq_gather_shifted_offset_z(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int16_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrhq_gather_shifted_offset_z_s16 (__ARM_mve_coerce(__p0, int16_t const *), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
+  int (*)[__ARM_mve_type_int16_t_const_ptr][__ARM_mve_type_uint32x4_t]: __arm_vldrhq_gather_shifted_offset_z_s32 (__ARM_mve_coerce(__p0, int16_t const *), __ARM_mve_coerce(__p1, uint32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint16_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrhq_gather_shifted_offset_z_u16 (__ARM_mve_coerce(__p0, uint16_t const *), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
+  int (*)[__ARM_mve_type_uint16_t_const_ptr][__ARM_mve_type_uint32x4_t]: __arm_vldrhq_gather_shifted_offset_z_u32 (__ARM_mve_coerce(__p0, uint16_t const *), __ARM_mve_coerce(__p1, uint32x4_t), p2), \
+  int (*)[__ARM_mve_type_float16_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrhq_gather_shifted_offset_z_f16 (__ARM_mve_coerce(__p0, float16_t const *), __ARM_mve_coerce(__p1, uint16x8_t), p2));})
+
+#define vldrwq_gather_offset(p0,p1) __arm_vldrwq_gather_offset(p0,p1)
+#define __arm_vldrwq_gather_offset(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int32_t_const_ptr]: __arm_vldrwq_gather_offset_s32 (__ARM_mve_coerce(__p0, int32_t const *), p1), \
+  int (*)[__ARM_mve_type_uint32_t_const_ptr]: __arm_vldrwq_gather_offset_u32 (__ARM_mve_coerce(__p0, uint32_t const *), p1), \
+  int (*)[__ARM_mve_type_float32_t_const_ptr]: __arm_vldrwq_gather_offset_f32 (__ARM_mve_coerce(__p0, float32_t const *), p1));})
+
+#define vldrwq_gather_offset_z(p0,p1,p2) __arm_vldrwq_gather_offset_z(p0,p1,p2)
+#define __arm_vldrwq_gather_offset_z(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int32_t_const_ptr]: __arm_vldrwq_gather_offset_z_s32 (__ARM_mve_coerce(__p0, int32_t const *), p1, p2), \
+  int (*)[__ARM_mve_type_uint32_t_const_ptr]: __arm_vldrwq_gather_offset_z_u32 (__ARM_mve_coerce(__p0, uint32_t const *), p1, p2), \
+  int (*)[__ARM_mve_type_float32_t_const_ptr]: __arm_vldrwq_gather_offset_z_f32 (__ARM_mve_coerce(__p0, float32_t const *), p1, p2));})
+
+#define vldrwq_gather_shifted_offset(p0,p1) __arm_vldrwq_gather_shifted_offset(p0,p1)
+#define __arm_vldrwq_gather_shifted_offset(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int32_t_const_ptr]: __arm_vldrwq_gather_shifted_offset_s32 (__ARM_mve_coerce(__p0, int32_t const *), p1), \
+  int (*)[__ARM_mve_type_uint32_t_const_ptr]: __arm_vldrwq_gather_shifted_offset_u32 (__ARM_mve_coerce(__p0, uint32_t const *), p1), \
+  int (*)[__ARM_mve_type_float32_t_const_ptr]: __arm_vldrwq_gather_shifted_offset_f32 (__ARM_mve_coerce(__p0, float32_t const *), p1));})
+
+#define vldrwq_gather_shifted_offset_z(p0,p1,p2) __arm_vldrwq_gather_shifted_offset_z(p0,p1,p2)
+#define __arm_vldrwq_gather_shifted_offset_z(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int32_t_const_ptr]: __arm_vldrwq_gather_shifted_offset_z_s32 (__ARM_mve_coerce(__p0, int32_t const *), p1, p2), \
+  int (*)[__ARM_mve_type_uint32_t_const_ptr]: __arm_vldrwq_gather_shifted_offset_z_u32 (__ARM_mve_coerce(__p0, uint32_t const *), p1, p2), \
+  int (*)[__ARM_mve_type_float32_t_const_ptr]: __arm_vldrwq_gather_shifted_offset_z_f32 (__ARM_mve_coerce(__p0, float32_t const *), p1, p2));})
+
 #else /* MVE Interger.  */
 
 #define vst4q(p0,p1) __arm_vst4q(p0,p1)
@@ -18449,6 +18759,54 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16_t_const_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrhq_gather_shifted_offset_z_u16 (__ARM_mve_coerce(__p0, uint16_t const *), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
   int (*)[__ARM_mve_type_uint16_t_const_ptr][__ARM_mve_type_uint32x4_t]: __arm_vldrhq_gather_shifted_offset_z_u32 (__ARM_mve_coerce(__p0, uint16_t const *), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
 
+#define vldrdq_gather_offset(p0,p1) __arm_vldrdq_gather_offset(p0,p1)
+#define __arm_vldrdq_gather_offset(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int64_t_const_ptr]: __arm_vldrdq_gather_offset_s64 (__ARM_mve_coerce(__p0, int64_t const *), p1), \
+  int (*)[__ARM_mve_type_uint64_t_const_ptr]: __arm_vldrdq_gather_offset_u64 (__ARM_mve_coerce(__p0, uint64_t const *), p1));})
+
+#define vldrdq_gather_offset_z(p0,p1,p2) __arm_vldrdq_gather_offset_z(p0,p1,p2)
+#define __arm_vldrdq_gather_offset_z(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int64_t_const_ptr]: __arm_vldrdq_gather_offset_z_s64 (__ARM_mve_coerce(__p0, int64_t const *), p1, p2), \
+  int (*)[__ARM_mve_type_uint64_t_const_ptr]: __arm_vldrdq_gather_offset_z_u64 (__ARM_mve_coerce(__p0, uint64_t const *), p1, p2));})
+
+#define vldrdq_gather_shifted_offset(p0,p1) __arm_vldrdq_gather_shifted_offset(p0,p1)
+#define __arm_vldrdq_gather_shifted_offset(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int64_t_const_ptr]: __arm_vldrdq_gather_shifted_offset_s64 (__ARM_mve_coerce(__p0, int64_t const *), p1), \
+  int (*)[__ARM_mve_type_uint64_t_const_ptr]: __arm_vldrdq_gather_shifted_offset_u64 (__ARM_mve_coerce(__p0, uint64_t const *), p1));})
+
+#define vldrdq_gather_shifted_offset_z(p0,p1,p2) __arm_vldrdq_gather_shifted_offset_z(p0,p1,p2)
+#define __arm_vldrdq_gather_shifted_offset_z(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int64_t_const_ptr]: __arm_vldrdq_gather_shifted_offset_z_s64 (__ARM_mve_coerce(__p0, int64_t const *), p1, p2), \
+  int (*)[__ARM_mve_type_uint64_t_const_ptr]: __arm_vldrdq_gather_shifted_offset_z_u64 (__ARM_mve_coerce(__p0, uint64_t const *), p1, p2));})
+
+#define vldrwq_gather_offset(p0,p1) __arm_vldrwq_gather_offset(p0,p1)
+#define __arm_vldrwq_gather_offset(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int32_t_const_ptr]: __arm_vldrwq_gather_offset_s32 (__ARM_mve_coerce(__p0, int32_t const *), p1), \
+  int (*)[__ARM_mve_type_uint32_t_const_ptr]: __arm_vldrwq_gather_offset_u32 (__ARM_mve_coerce(__p0, uint32_t const *), p1));})
+
+#define vldrwq_gather_offset_z(p0,p1,p2) __arm_vldrwq_gather_offset_z(p0,p1,p2)
+#define __arm_vldrwq_gather_offset_z(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int32_t_const_ptr]: __arm_vldrwq_gather_offset_z_s32 (__ARM_mve_coerce(__p0, int32_t const *), p1, p2), \
+  int (*)[__ARM_mve_type_uint32_t_const_ptr]: __arm_vldrwq_gather_offset_z_u32 (__ARM_mve_coerce(__p0, uint32_t const *), p1, p2));})
+
+#define vldrwq_gather_shifted_offset(p0,p1) __arm_vldrwq_gather_shifted_offset(p0,p1)
+#define __arm_vldrwq_gather_shifted_offset(p0,p1) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int32_t_const_ptr]: __arm_vldrwq_gather_shifted_offset_s32 (__ARM_mve_coerce(__p0, int32_t const *), p1), \
+  int (*)[__ARM_mve_type_uint32_t_const_ptr]: __arm_vldrwq_gather_shifted_offset_u32 (__ARM_mve_coerce(__p0, uint32_t const *), p1));})
+
+#define vldrwq_gather_shifted_offset_z(p0,p1,p2) __arm_vldrwq_gather_shifted_offset_z(p0,p1,p2)
+#define __arm_vldrwq_gather_shifted_offset_z(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int32_t_const_ptr]: __arm_vldrwq_gather_shifted_offset_z_s32 (__ARM_mve_coerce(__p0, int32_t const *), p1, p2), \
+  int (*)[__ARM_mve_type_uint32_t_const_ptr]: __arm_vldrwq_gather_shifted_offset_z_u32 (__ARM_mve_coerce(__p0, uint32_t const *), p1, p2));})
+
 #endif /* MVE Floating point.  */
 
 #ifdef __cplusplus
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index be407f4d690cadadbdbdab30aed5b0339178dda9..848be45ba716a194f475ca7bfd8c57f8fba59bfd 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -732,3 +732,33 @@ VAR1 (LDRU, vldrwq_u, v4si)
 VAR1 (LDRS_Z, vldrwq_z_f, v4sf)
 VAR1 (LDRS_Z, vldrwq_z_s, v4si)
 VAR1 (LDRU_Z, vldrwq_z_u, v4si)
+VAR1 (LDRGBS, vldrdq_gather_base_s, v2di)
+VAR1 (LDRGBS, vldrwq_gather_base_f, v4sf)
+VAR1 (LDRGBS_Z, vldrdq_gather_base_z_s, v2di)
+VAR1 (LDRGBS_Z, vldrwq_gather_base_z_f, v4sf)
+VAR1 (LDRGBU, vldrdq_gather_base_u, v2di)
+VAR1 (LDRGBU_Z, vldrdq_gather_base_z_u, v2di)
+VAR1 (LDRGS, vldrdq_gather_offset_s, v2di)
+VAR1 (LDRGS, vldrdq_gather_shifted_offset_s, v2di)
+VAR1 (LDRGS, vldrhq_gather_offset_f, v8hf)
+VAR1 (LDRGS, vldrhq_gather_shifted_offset_f, v8hf)
+VAR1 (LDRGS, vldrwq_gather_offset_f, v4sf)
+VAR1 (LDRGS, vldrwq_gather_offset_s, v4si)
+VAR1 (LDRGS, vldrwq_gather_shifted_offset_f, v4sf)
+VAR1 (LDRGS, vldrwq_gather_shifted_offset_s, v4si)
+VAR1 (LDRGS_Z, vldrdq_gather_offset_z_s, v2di)
+VAR1 (LDRGS_Z, vldrdq_gather_shifted_offset_z_s, v2di)
+VAR1 (LDRGS_Z, vldrhq_gather_offset_z_f, v8hf)
+VAR1 (LDRGS_Z, vldrhq_gather_shifted_offset_z_f, v8hf)
+VAR1 (LDRGS_Z, vldrwq_gather_offset_z_f, v4sf)
+VAR1 (LDRGS_Z, vldrwq_gather_offset_z_s, v4si)
+VAR1 (LDRGS_Z, vldrwq_gather_shifted_offset_z_f, v4sf)
+VAR1 (LDRGS_Z, vldrwq_gather_shifted_offset_z_s, v4si)
+VAR1 (LDRGU, vldrdq_gather_offset_u, v2di)
+VAR1 (LDRGU, vldrdq_gather_shifted_offset_u, v2di)
+VAR1 (LDRGU, vldrwq_gather_offset_u, v4si)
+VAR1 (LDRGU, vldrwq_gather_shifted_offset_u, v4si)
+VAR1 (LDRGU_Z, vldrdq_gather_offset_z_u, v2di)
+VAR1 (LDRGU_Z, vldrdq_gather_shifted_offset_z_u, v2di)
+VAR1 (LDRGU_Z, vldrwq_gather_offset_z_u, v4si)
+VAR1 (LDRGU_Z, vldrwq_gather_shifted_offset_z_u, v4si)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 31ccb7b1608713e89ae9866b0b124efe8dc88ae3..452d7b5eac44e9d3be03dfc418f51584ab9feb32 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -197,7 +197,11 @@
 			 VLDRBQGO_U VLDRBQ_S VLDRBQ_U VLDRWQGB_S VLDRWQGB_U
 			 VLD1Q_F VLD1Q_S VLD1Q_U VLDRHQ_F VLDRHQGO_S
 			 VLDRHQGO_U VLDRHQGSO_S VLDRHQGSO_U VLDRHQ_S VLDRHQ_U
-			 VLDRWQ_F VLDRWQ_S VLDRWQ_U])
+			 VLDRWQ_F VLDRWQ_S VLDRWQ_U VLDRDQGB_S VLDRDQGB_U
+			 VLDRDQGO_S VLDRDQGO_U VLDRDQGSO_S VLDRDQGSO_U
+			 VLDRHQGO_F VLDRHQGSO_F VLDRWQGB_F VLDRWQGO_F
+			 VLDRWQGO_S VLDRWQGO_U VLDRWQGSO_F VLDRWQGSO_S
+			 VLDRWQGSO_U])
 
 (define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF") (V8HF "V8HI")
 			    (V4SF "V4SI")])
@@ -356,7 +360,10 @@
 		       (VLD1Q_S "s") (VLD1Q_U "u") (VLDRHQGO_S "s")
 		       (VLDRHQGO_U "u") (VLDRHQGSO_S "s") (VLDRHQGSO_U "u")
 		       (VLDRHQ_S "s") (VLDRHQ_U "u") (VLDRWQ_S "s")
-		       (VLDRWQ_U "u")])
+		       (VLDRWQ_U "u") (VLDRDQGB_S "s") (VLDRDQGB_U "u")
+		       (VLDRDQGO_S "s") (VLDRDQGO_U "u") (VLDRDQGSO_S "s")
+		       (VLDRDQGSO_U "u") (VLDRWQGO_S "s") (VLDRWQGO_U "u")
+		       (VLDRWQGSO_S "s") (VLDRWQGSO_U "u")])
 
 (define_int_attr mode1 [(VCTP8Q "8") (VCTP16Q "16") (VCTP32Q "32")
 			(VCTP64Q "64") (VCTP8Q_M "8") (VCTP16Q_M "16")
@@ -590,6 +597,11 @@
 (define_int_iterator VLDRHGSOQ [VLDRHQGSO_S VLDRHQGSO_U])
 (define_int_iterator VLDRHQ [VLDRHQ_S VLDRHQ_U])
 (define_int_iterator VLDRWQ [VLDRWQ_S VLDRWQ_U])
+(define_int_iterator VLDRDGBQ [VLDRDQGB_S VLDRDQGB_U])
+(define_int_iterator VLDRDGOQ [VLDRDQGO_S VLDRDQGO_U])
+(define_int_iterator VLDRDGSOQ [VLDRDQGSO_S VLDRDQGSO_U])
+(define_int_iterator VLDRWGOQ [VLDRWQGO_S VLDRWQGO_U])
+(define_int_iterator VLDRWGSOQ [VLDRWQGSO_S VLDRWQGSO_U])
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
@@ -8489,3 +8501,419 @@
   emit_insn (gen_mve_vldr<V_sz_elem1>q_<supf><mode>(operands[0],operands[1]));
   DONE;
 })
+
+;;
+;; [vldrdq_gather_base_s vldrdq_gather_base_u]
+;;
+(define_insn "mve_vldrdq_gather_base_<supf>v2di"
+  [(set (match_operand:V2DI 0 "s_register_operand" "=&w")
+	(unspec:V2DI [(match_operand:V2DI 1 "s_register_operand" "w")
+		      (match_operand:SI 2 "immediate_operand" "i")]
+	 VLDRDGBQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   output_asm_insn ("vldrd.64\t%q0, [%q1, %2]",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+;;
+;; [vldrdq_gather_base_z_s vldrdq_gather_base_z_u]
+;;
+(define_insn "mve_vldrdq_gather_base_z_<supf>v2di"
+  [(set (match_operand:V2DI 0 "s_register_operand" "=&w")
+	(unspec:V2DI [(match_operand:V2DI 1 "s_register_operand" "w")
+		      (match_operand:SI 2 "immediate_operand" "i")
+		      (match_operand:HI 3 "vpr_register_operand" "Up")]
+	 VLDRDGBQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   output_asm_insn ("vpst\n\tvldrdt.u64\t%q0, [%q1, %2]",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+;;
+;; [vldrdq_gather_offset_s vldrdq_gather_offset_u]
+;;
+(define_insn "mve_vldrdq_gather_offset_<supf>v2di"
+ [(set (match_operand:V2DI 0 "s_register_operand" "=&w")
+       (unspec:V2DI [(match_operand:V2DI 1 "memory_operand" "Us")
+		     (match_operand:V2DI 2 "s_register_operand" "w")]
+	VLDRDGOQ))
+ ]
+ "TARGET_HAVE_MVE"
+{
+  rtx ops[3];
+  ops[0] = operands[0];
+  ops[1] = operands[1];
+  ops[2] = operands[2];
+  output_asm_insn ("vldrd.u64\t%q0, [%m1, %q2]",ops);
+  return "";
+}
+ [(set_attr "length" "4")])
+
+;;
+;; [vldrdq_gather_offset_z_s vldrdq_gather_offset_z_u]
+;;
+(define_insn "mve_vldrdq_gather_offset_z_<supf>v2di"
+ [(set (match_operand:V2DI 0 "s_register_operand" "=&w")
+       (unspec:V2DI [(match_operand:V2DI 1 "memory_operand" "Us")
+		     (match_operand:V2DI 2 "s_register_operand" "w")
+		     (match_operand:HI 3 "vpr_register_operand" "Up")]
+	VLDRDGOQ))
+ ]
+ "TARGET_HAVE_MVE"
+{
+  rtx ops[3];
+  ops[0] = operands[0];
+  ops[1] = operands[1];
+  ops[2] = operands[2];
+  output_asm_insn ("vpst\n\tvldrdt.u64\t%q0, [%m1, %q2]",ops);
+  return "";
+}
+ [(set_attr "length" "8")])
+
+;;
+;; [vldrdq_gather_shifted_offset_s vldrdq_gather_shifted_offset_u]
+;;
+(define_insn "mve_vldrdq_gather_shifted_offset_<supf>v2di"
+  [(set (match_operand:V2DI 0 "s_register_operand" "=&w")
+	(unspec:V2DI [(match_operand:V2DI 1 "memory_operand" "Us")
+		      (match_operand:V2DI 2 "s_register_operand" "w")]
+	 VLDRDGSOQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   output_asm_insn ("vldrd.u64\t%q0, [%m1, %q2, uxtw #3]",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+;;
+;; [vldrdq_gather_shifted_offset_z_s vldrdq_gather_shifted_offset_z_u]
+;;
+(define_insn "mve_vldrdq_gather_shifted_offset_z_<supf>v2di"
+  [(set (match_operand:V2DI 0 "s_register_operand" "=&w")
+	(unspec:V2DI [(match_operand:V2DI 1 "memory_operand" "Us")
+		      (match_operand:V2DI 2 "s_register_operand" "w")
+		      (match_operand:HI 3 "vpr_register_operand" "Up")]
+	 VLDRDGSOQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   output_asm_insn ("vpst\n\tvldrdt.u64\t%q0, [%m1, %q2, uxtw #3]",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+;;
+;; [vldrhq_gather_offset_f]
+;;
+(define_insn "mve_vldrhq_gather_offset_fv8hf"
+  [(set (match_operand:V8HF 0 "s_register_operand" "=&w")
+	(unspec:V8HF [(match_operand:V8HI 1 "memory_operand" "Us")
+		      (match_operand:V8HI 2 "s_register_operand" "w")]
+	 VLDRHQGO_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   output_asm_insn ("vldrh.f16\t%q0, [%m1, %q2]",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+;;
+;; [vldrhq_gather_offset_z_f]
+;;
+(define_insn "mve_vldrhq_gather_offset_z_fv8hf"
+  [(set (match_operand:V8HF 0 "s_register_operand" "=&w")
+	(unspec:V8HF [(match_operand:V8HI 1 "memory_operand" "Us")
+		      (match_operand:V8HI 2 "s_register_operand" "w")
+		      (match_operand:HI 3 "vpr_register_operand" "Up")]
+	 VLDRHQGO_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+   rtx ops[4];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   ops[3] = operands[3];
+   output_asm_insn ("vpst\n\tvldrht.f16\t%q0, [%m1, %q2]",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+;;
+;; [vldrhq_gather_shifted_offset_f]
+;;
+(define_insn "mve_vldrhq_gather_shifted_offset_fv8hf"
+  [(set (match_operand:V8HF 0 "s_register_operand" "=&w")
+	(unspec:V8HF [(match_operand:V8HI 1 "memory_operand" "Us")
+		      (match_operand:V8HI 2 "s_register_operand" "w")]
+	 VLDRHQGSO_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   output_asm_insn ("vldrh.f16\t%q0, [%m1, %q2, uxtw #1]",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+;;
+;; [vldrhq_gather_shifted_offset_z_f]
+;;
+(define_insn "mve_vldrhq_gather_shifted_offset_z_fv8hf"
+  [(set (match_operand:V8HF 0 "s_register_operand" "=&w")
+	(unspec:V8HF [(match_operand:V8HI 1 "memory_operand" "Us")
+		      (match_operand:V8HI 2 "s_register_operand" "w")
+		      (match_operand:HI 3 "vpr_register_operand" "Up")]
+	 VLDRHQGSO_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+   rtx ops[4];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   ops[3] = operands[3];
+   output_asm_insn ("vpst\n\tvldrht.f16\t%q0, [%m1, %q2, uxtw #1]",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+;;
+;; [vldrwq_gather_base_f]
+;;
+(define_insn "mve_vldrwq_gather_base_fv4sf"
+  [(set (match_operand:V4SF 0 "s_register_operand" "=&w")
+	(unspec:V4SF [(match_operand:V4SI 1 "s_register_operand" "w")
+		      (match_operand:SI 2 "immediate_operand" "i")]
+	 VLDRWQGB_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   output_asm_insn ("vldrw.u32\t%q0, [%q1, %2]",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+;;
+;; [vldrwq_gather_base_z_f]
+;;
+(define_insn "mve_vldrwq_gather_base_z_fv4sf"
+  [(set (match_operand:V4SF 0 "s_register_operand" "=&w")
+	(unspec:V4SF [(match_operand:V4SI 1 "s_register_operand" "w")
+		      (match_operand:SI 2 "immediate_operand" "i")
+		      (match_operand:HI 3 "vpr_register_operand" "Up")]
+	 VLDRWQGB_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   output_asm_insn ("vpst\n\tvldrwt.u32\t%q0, [%q1, %2]",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+;;
+;; [vldrwq_gather_offset_f]
+;;
+(define_insn "mve_vldrwq_gather_offset_fv4sf"
+  [(set (match_operand:V4SF 0 "s_register_operand" "=&w")
+	(unspec:V4SF [(match_operand:V4SI 1 "memory_operand" "Us")
+		       (match_operand:V4SI 2 "s_register_operand" "w")]
+	 VLDRWQGO_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   output_asm_insn ("vldrw.u32\t%q0, [%m1, %q2]",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+;;
+;; [vldrwq_gather_offset_s vldrwq_gather_offset_u]
+;;
+(define_insn "mve_vldrwq_gather_offset_<supf>v4si"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=&w")
+	(unspec:V4SI [(match_operand:V4SI 1 "memory_operand" "Us")
+		       (match_operand:V4SI 2 "s_register_operand" "w")]
+	 VLDRWGOQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   output_asm_insn ("vldrw.u32\t%q0, [%m1, %q2]",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+;;
+;; [vldrwq_gather_offset_z_f]
+;;
+(define_insn "mve_vldrwq_gather_offset_z_fv4sf"
+  [(set (match_operand:V4SF 0 "s_register_operand" "=&w")
+	(unspec:V4SF [(match_operand:V4SI 1 "memory_operand" "Us")
+		      (match_operand:V4SI 2 "s_register_operand" "w")
+		      (match_operand:HI 3 "vpr_register_operand" "Up")]
+	 VLDRWQGO_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+   rtx ops[4];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   ops[3] = operands[3];
+   output_asm_insn ("vpst\n\tvldrwt.u32\t%q0, [%m1, %q2]",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+;;
+;; [vldrwq_gather_offset_z_s vldrwq_gather_offset_z_u]
+;;
+(define_insn "mve_vldrwq_gather_offset_z_<supf>v4si"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=&w")
+	(unspec:V4SI [(match_operand:V4SI 1 "memory_operand" "Us")
+		      (match_operand:V4SI 2 "s_register_operand" "w")
+		      (match_operand:HI 3 "vpr_register_operand" "Up")]
+	 VLDRWGOQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[4];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   ops[3] = operands[3];
+   output_asm_insn ("vpst\n\tvldrwt.u32\t%q0, [%m1, %q2]",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+;;
+;; [vldrwq_gather_shifted_offset_f]
+;;
+(define_insn "mve_vldrwq_gather_shifted_offset_fv4sf"
+  [(set (match_operand:V4SF 0 "s_register_operand" "=&w")
+	(unspec:V4SF [(match_operand:V4SI 1 "memory_operand" "Us")
+		      (match_operand:V4SI 2 "s_register_operand" "w")]
+	 VLDRWQGSO_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   output_asm_insn ("vldrw.u32\t%q0, [%m1, %q2, uxtw #2]",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+;;
+;; [vldrwq_gather_shifted_offset_s vldrwq_gather_shifted_offset_u]
+;;
+(define_insn "mve_vldrwq_gather_shifted_offset_<supf>v4si"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=&w")
+	(unspec:V4SI [(match_operand:V4SI 1 "memory_operand" "Us")
+		      (match_operand:V4SI 2 "s_register_operand" "w")]
+	 VLDRWGSOQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   output_asm_insn ("vldrw.u32\t%q0, [%m1, %q2, uxtw #2]",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+;;
+;; [vldrwq_gather_shifted_offset_z_f]
+;;
+(define_insn "mve_vldrwq_gather_shifted_offset_z_fv4sf"
+  [(set (match_operand:V4SF 0 "s_register_operand" "=&w")
+	(unspec:V4SF [(match_operand:V4SI 1 "memory_operand" "Us")
+		      (match_operand:V4SI 2 "s_register_operand" "w")
+		      (match_operand:HI 3 "vpr_register_operand" "Up")]
+	 VLDRWQGSO_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+   rtx ops[4];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   ops[3] = operands[3];
+   output_asm_insn ("vpst\n\tvldrwt.u32\t%q0, [%m1, %q2, uxtw #2]",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+;;
+;; [vldrwq_gather_shifted_offset_z_s vldrwq_gather_shifted_offset_z_u]
+;;
+(define_insn "mve_vldrwq_gather_shifted_offset_z_<supf>v4si"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=&w")
+	(unspec:V4SI [(match_operand:V4SI 1 "memory_operand" "Us")
+		      (match_operand:V4SI 2 "s_register_operand" "w")
+		      (match_operand:HI 3 "vpr_register_operand" "Up")]
+	 VLDRWGSOQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[4];
+   ops[0] = operands[0];
+   ops[1] = operands[1];
+   ops[2] = operands[2];
+   ops[3] = operands[3];
+   output_asm_insn ("vpst\n\tvldrwt.u32\t%q0, [%m1, %q2, uxtw #2]",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_s64.c
new file mode 100644
index 0000000000000000000000000000000000000000..0116d35828deaf638de211f87821a40112b297a2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_s64.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int64x2_t
+foo (uint64x2_t addr)
+{
+  return vldrdq_gather_base_s64 (addr, 8);
+}
+
+/* { dg-final { scan-assembler "vldrd.64"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_u64.c
new file mode 100644
index 0000000000000000000000000000000000000000..191e5deb4cf2de98dd1d37f4018542b38fa6d95a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_u64.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint64x2_t
+foo (uint64x2_t addr)
+{
+  return vldrdq_gather_base_u64 (addr, 8);
+}
+
+/* { dg-final { scan-assembler "vldrd.64"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_z_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_z_s64.c
new file mode 100644
index 0000000000000000000000000000000000000000..9193b419b4ede4d3c0445b0ac96a96a26f787196
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_z_s64.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int64x2_t
+foo (uint64x2_t addr, mve_pred16_t p)
+{
+  return vldrdq_gather_base_z_s64 (addr, 8, p);
+}
+
+/* { dg-final { scan-assembler "vldrdt.u64"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_z_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_z_u64.c
new file mode 100644
index 0000000000000000000000000000000000000000..9f156751f5597c9f634d04dca636ee9465266a3a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_z_u64.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint64x2_t
+foo (uint64x2_t addr, mve_pred16_t p)
+{
+  return vldrdq_gather_base_z_u64 (addr, 8, p);
+}
+
+/* { dg-final { scan-assembler "vldrdt.u64"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_offset_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_offset_s64.c
new file mode 100644
index 0000000000000000000000000000000000000000..00547a4b7c9ca6da28c35489ef83b2357e8c5b26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_offset_s64.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int64x2_t
+foo (int64_t const * base, uint64x2_t offset)
+{
+  return vldrdq_gather_offset_s64 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrd.u64"  }  } */
+
+int64x2_t
+foo1 (int64_t const * base, uint64x2_t offset)
+{
+  return vldrdq_gather_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrd.u64"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_offset_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_offset_u64.c
new file mode 100644
index 0000000000000000000000000000000000000000..af59f95094736d4247640c84d5faa86671e1a289
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_offset_u64.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint64x2_t
+foo (uint64_t const * base, uint64x2_t offset)
+{
+  return vldrdq_gather_offset_u64 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrd.u64"  }  } */
+
+uint64x2_t
+foo1 (uint64_t const * base, uint64x2_t offset)
+{
+  return vldrdq_gather_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrd.u64"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_offset_z_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_offset_z_s64.c
new file mode 100644
index 0000000000000000000000000000000000000000..7818470d5684f7d93ebe8547fd4701d46202d4aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_offset_z_s64.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int64x2_t
+foo (int64_t const * base, uint64x2_t offset, mve_pred16_t p)
+{
+  return vldrdq_gather_offset_z_s64 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrdt.u64"  }  } */
+
+int64x2_t
+foo1 (int64_t const * base, uint64x2_t offset, mve_pred16_t p)
+{
+  return vldrdq_gather_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrdt.u64"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_offset_z_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_offset_z_u64.c
new file mode 100644
index 0000000000000000000000000000000000000000..440941026efac37355affd32a06875c64da7c29f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_offset_z_u64.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint64x2_t
+foo (uint64_t const * base, uint64x2_t offset, mve_pred16_t p)
+{
+  return vldrdq_gather_offset_z_u64 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrdt.u64"  }  } */
+
+uint64x2_t
+foo1 (uint64_t const * base, uint64x2_t offset, mve_pred16_t p)
+{
+  return vldrdq_gather_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrdt.u64"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_shifted_offset_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_shifted_offset_s64.c
new file mode 100644
index 0000000000000000000000000000000000000000..6dac7c2f89ea63bb3837bffcae81fc5bf559467f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_shifted_offset_s64.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int64x2_t
+foo (int64_t const * base, uint64x2_t offset)
+{
+  return vldrdq_gather_shifted_offset_s64 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrd.u64"  }  } */
+
+int64x2_t
+foo1 (int64_t const * base, uint64x2_t offset)
+{
+  return vldrdq_gather_shifted_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrd.u64"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_shifted_offset_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_shifted_offset_u64.c
new file mode 100644
index 0000000000000000000000000000000000000000..b33efc296268b1e3c6f904e54b1567deea76a5ba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_shifted_offset_u64.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint64x2_t
+foo (uint64_t const * base, uint64x2_t offset)
+{
+  return vldrdq_gather_shifted_offset_u64 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrd.u64"  }  } */
+
+uint64x2_t
+foo1 (uint64_t const * base, uint64x2_t offset)
+{
+  return vldrdq_gather_shifted_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrd.u64"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_shifted_offset_z_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_shifted_offset_z_s64.c
new file mode 100644
index 0000000000000000000000000000000000000000..9a0572e402dbd1aac67a37e1f91dfa463e954d7f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_shifted_offset_z_s64.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int64x2_t
+foo (int64_t const * base, uint64x2_t offset, mve_pred16_t p)
+{
+  return vldrdq_gather_shifted_offset_z_s64 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrdt.u64"  }  } */
+
+int64x2_t
+foo1 (int64_t const * base, uint64x2_t offset, mve_pred16_t p)
+{
+  return vldrdq_gather_shifted_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrdt.u64"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_shifted_offset_z_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_shifted_offset_z_u64.c
new file mode 100644
index 0000000000000000000000000000000000000000..50a2cd161964cd0ad4bf0c39786184b480d7a700
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_shifted_offset_z_u64.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint64x2_t
+foo (uint64_t const * base, uint64x2_t offset, mve_pred16_t p)
+{
+  return vldrdq_gather_shifted_offset_z_u64 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrdt.u64"  }  } */
+
+uint64x2_t
+foo1 (uint64_t const * base, uint64x2_t offset, mve_pred16_t p)
+{
+  return vldrdq_gather_shifted_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrdt.u64"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..a915959d0160e3574a5a593e618b696167c95039
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_f16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16_t const * base, uint16x8_t offset)
+{
+  return vldrhq_gather_offset_f16 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrh.f16"  }  } */
+
+float16x8_t
+foo1 (float16_t const * base, uint16x8_t offset)
+{
+  return vldrhq_gather_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrh.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_z_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_z_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..fdc6762c082b1eace0704aabe237fa43a6bd842a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_offset_z_f16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16_t const * base, uint16x8_t offset, mve_pred16_t p)
+{
+  return vldrhq_gather_offset_z_f16 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.f16"  }  } */
+
+float16x8_t
+foo1 (float16_t const * base, uint16x8_t offset, mve_pred16_t p)
+{
+  return vldrhq_gather_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..ba9d0f2279a8e3b773ab86466eb053653a27a1a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_f16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16_t const * base, uint16x8_t offset)
+{
+  return vldrhq_gather_shifted_offset_f16 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrh.f16"  }  } */
+
+float16x8_t
+foo1 (float16_t const * base, uint16x8_t offset)
+{
+  return vldrhq_gather_shifted_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrh.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_z_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_z_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..561669f0a2aa4a25961ee7492ff9cc9b02553687
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrhq_gather_shifted_offset_z_f16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16_t const * base, uint16x8_t offset, mve_pred16_t p)
+{
+  return vldrhq_gather_shifted_offset_z_f16 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.f16"  }  } */
+
+float16x8_t
+foo1 (float16_t const * base, uint16x8_t offset, mve_pred16_t p)
+{
+  return vldrhq_gather_shifted_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrht.f16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..b398bab5e232f1d1f7bf55b9b06b3929b505702f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (uint32x4_t addr)
+{
+  return vldrwq_gather_base_f32 (addr, 4);
+}
+
+/* { dg-final { scan-assembler "vldrw.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_z_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_z_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..bc219c7f0b51c394f381cdb47dc5e6917561a019
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_z_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (uint32x4_t addr, mve_pred16_t p)
+{
+  return vldrwq_gather_base_z_f32 (addr, 4, p);
+}
+
+/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_offset_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_offset_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..2e3b94fbb0e95c97a1dbbfa3fd6d177e4bd16611
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_offset_f32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32_t const * base, uint32x4_t offset)
+{
+  return vldrwq_gather_offset_f32 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrw.u32"  }  } */
+
+float32x4_t
+foo1 (float32_t const * base, uint32x4_t offset)
+{
+  return vldrwq_gather_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrw.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_offset_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_offset_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..fe5d51ccfa51031ad4c4d2884b9b7a44a51b10ca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_offset_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32_t const * base, uint32x4_t offset)
+{
+  return vldrwq_gather_offset_s32 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrw.u32"  }  } */
+
+int32x4_t
+foo1 (int32_t const * base, uint32x4_t offset)
+{
+  return vldrwq_gather_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrw.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_offset_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_offset_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..89ec39813002ad68c0c7aac8aa0c90224852bfb5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_offset_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32_t const * base, uint32x4_t offset)
+{
+  return vldrwq_gather_offset_u32 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrw.u32"  }  } */
+
+uint32x4_t
+foo1 (uint32_t const * base, uint32x4_t offset)
+{
+  return vldrwq_gather_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrw.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_offset_z_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_offset_z_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..c85a0c2c7f89633fb216b05c55983232592878ad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_offset_z_f32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32_t const * base, uint32x4_t offset, mve_pred16_t p)
+{
+  return vldrwq_gather_offset_z_f32 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
+
+float32x4_t
+foo1 (float32_t const * base, uint32x4_t offset, mve_pred16_t p)
+{
+  return vldrwq_gather_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_offset_z_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_offset_z_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..e128b434bd6c7b7583a1881bd6bff32df2ea035a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_offset_z_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32_t const * base, uint32x4_t offset, mve_pred16_t p)
+{
+  return vldrwq_gather_offset_z_s32 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
+
+int32x4_t
+foo1 (int32_t const * base, uint32x4_t offset, mve_pred16_t p)
+{
+  return vldrwq_gather_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_offset_z_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_offset_z_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..b183b9afefdc0aeb49f637d9cc9dc936622adea5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_offset_z_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32_t const * base, uint32x4_t offset, mve_pred16_t p)
+{
+  return vldrwq_gather_offset_z_u32 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
+
+uint32x4_t
+foo1 (uint32_t const * base, uint32x4_t offset, mve_pred16_t p)
+{
+  return vldrwq_gather_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_shifted_offset_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_shifted_offset_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..67a42f72dd28bf747ddf16ae5a0ef7dc95792444
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_shifted_offset_f32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32_t const * base, uint32x4_t offset)
+{
+  return vldrwq_gather_shifted_offset_f32 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrw.u32"  }  } */
+
+float32x4_t
+foo1 (float32_t const * base, uint32x4_t offset)
+{
+  return vldrwq_gather_shifted_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrw.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_shifted_offset_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_shifted_offset_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..283d0a512e5d9627f433ba326b877254902dd885
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_shifted_offset_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32_t const * base, uint32x4_t offset)
+{
+  return vldrwq_gather_shifted_offset_s32 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrw.u32"  }  } */
+
+int32x4_t
+foo1 (int32_t const * base, uint32x4_t offset)
+{
+  return vldrwq_gather_shifted_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrw.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_shifted_offset_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_shifted_offset_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..4783fae28e6286b39f28c79c9cdcb48f6b00beb7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_shifted_offset_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32_t const * base, uint32x4_t offset)
+{
+  return vldrwq_gather_shifted_offset_u32 (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrw.u32"  }  } */
+
+uint32x4_t
+foo1 (uint32_t const * base, uint32x4_t offset)
+{
+  return vldrwq_gather_shifted_offset (base, offset);
+}
+
+/* { dg-final { scan-assembler "vldrw.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_shifted_offset_z_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_shifted_offset_z_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..c1443854aee7d99a8f961d544bd412f0e23e3ec3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_shifted_offset_z_f32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32_t const * base, uint32x4_t offset, mve_pred16_t p)
+{
+  return vldrwq_gather_shifted_offset_z_f32 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
+
+float32x4_t
+foo1 (float32_t const * base, uint32x4_t offset, mve_pred16_t p)
+{
+  return vldrwq_gather_shifted_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_shifted_offset_z_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_shifted_offset_z_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..b537998da597ac52774e1459c7663739b1c5b52b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_shifted_offset_z_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32_t const * base, uint32x4_t offset, mve_pred16_t p)
+{
+  return vldrwq_gather_shifted_offset_z_s32 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
+
+int32x4_t
+foo1 (int32_t const * base, uint32x4_t offset, mve_pred16_t p)
+{
+  return vldrwq_gather_shifted_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_shifted_offset_z_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_shifted_offset_z_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..a3d4fde827c85b3027e853aed963183699e1d7e5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_shifted_offset_z_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32_t const * base, uint32x4_t offset, mve_pred16_t p)
+{
+  return vldrwq_gather_shifted_offset_z_u32 (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
+
+uint32x4_t
+foo1 (uint32_t const * base, uint32x4_t offset, mve_pred16_t p)
+{
+  return vldrwq_gather_shifted_offset_z (base, offset, p);
+}
+
+/* { dg-final { scan-assembler "vldrwt.u32"  }  } */


[-- Attachment #2: diff25.patch.gz --]
[-- Type: application/gzip, Size: 5961 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][2/3x]: MVE intrinsics with ternary operands.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (35 preceding siblings ...)
  2019-11-14 19:16 ` [PATCH][ARM][GCC][4/4x]: MVE intrinsics with quaternary operands Srinath Parvathaneni
@ 2019-11-14 19:16 ` Srinath Parvathaneni
  2019-11-14 19:27 ` [PATCH][ARM][GCC][1/3x]: " Srinath Parvathaneni
  2019-12-12 16:09 ` [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Kyrill Tkachov
  38 siblings, 0 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 42237 bytes --]

Hello,

This patch supports following MVE ACLE intrinsics with ternary operands.

vpselq_u8, vpselq_s8, vrev64q_m_u8, vqrdmlashq_n_u8, vqrdmlahq_n_u8,
vqdmlahq_n_u8, vmvnq_m_u8, vmlasq_n_u8, vmlaq_n_u8, vmladavq_p_u8,
vmladavaq_u8, vminvq_p_u8, vmaxvq_p_u8, vdupq_m_n_u8, vcmpneq_m_u8,
vcmpneq_m_n_u8, vcmphiq_m_u8, vcmphiq_m_n_u8, vcmpeqq_m_u8,
vcmpeqq_m_n_u8, vcmpcsq_m_u8, vcmpcsq_m_n_u8, vclzq_m_u8, vaddvaq_p_u8,
vsriq_n_u8, vsliq_n_u8, vshlq_m_r_u8, vrshlq_m_n_u8, vqshlq_m_r_u8,
vqrshlq_m_n_u8, vminavq_p_s8, vminaq_m_s8, vmaxavq_p_s8, vmaxaq_m_s8,
vcmpneq_m_s8, vcmpneq_m_n_s8, vcmpltq_m_s8, vcmpltq_m_n_s8, vcmpleq_m_s8,
vcmpleq_m_n_s8, vcmpgtq_m_s8, vcmpgtq_m_n_s8, vcmpgeq_m_s8, vcmpgeq_m_n_s8,
vcmpeqq_m_s8, vcmpeqq_m_n_s8, vshlq_m_r_s8, vrshlq_m_n_s8, vrev64q_m_s8,
vqshlq_m_r_s8, vqrshlq_m_n_s8, vqnegq_m_s8, vqabsq_m_s8, vnegq_m_s8,
vmvnq_m_s8, vmlsdavxq_p_s8, vmlsdavq_p_s8, vmladavxq_p_s8, vmladavq_p_s8,
vminvq_p_s8, vmaxvq_p_s8, vdupq_m_n_s8, vclzq_m_s8, vclsq_m_s8, vaddvaq_p_s8,
vabsq_m_s8, vqrdmlsdhxq_s8, vqrdmlsdhq_s8, vqrdmlashq_n_s8, vqrdmlahq_n_s8,
vqrdmladhxq_s8, vqrdmladhq_s8, vqdmlsdhxq_s8, vqdmlsdhq_s8, vqdmlahq_n_s8,
vqdmladhxq_s8, vqdmladhq_s8, vmlsdavaxq_s8, vmlsdavaq_s8, vmlasq_n_s8,
vmlaq_n_s8, vmladavaxq_s8, vmladavaq_s8, vsriq_n_s8, vsliq_n_s8, vpselq_u16,
vpselq_s16, vrev64q_m_u16, vqrdmlashq_n_u16, vqrdmlahq_n_u16, vqdmlahq_n_u16,
vmvnq_m_u16, vmlasq_n_u16, vmlaq_n_u16, vmladavq_p_u16, vmladavaq_u16,
vminvq_p_u16, vmaxvq_p_u16, vdupq_m_n_u16, vcmpneq_m_u16, vcmpneq_m_n_u16,
vcmphiq_m_u16, vcmphiq_m_n_u16, vcmpeqq_m_u16, vcmpeqq_m_n_u16, vcmpcsq_m_u16,
vcmpcsq_m_n_u16, vclzq_m_u16, vaddvaq_p_u16, vsriq_n_u16, vsliq_n_u16,
vshlq_m_r_u16, vrshlq_m_n_u16, vqshlq_m_r_u16, vqrshlq_m_n_u16, vminavq_p_s16,
vminaq_m_s16, vmaxavq_p_s16, vmaxaq_m_s16, vcmpneq_m_s16, vcmpneq_m_n_s16,
vcmpltq_m_s16, vcmpltq_m_n_s16, vcmpleq_m_s16, vcmpleq_m_n_s16, vcmpgtq_m_s16,
vcmpgtq_m_n_s16, vcmpgeq_m_s16, vcmpgeq_m_n_s16, vcmpeqq_m_s16, vcmpeqq_m_n_s16,
vshlq_m_r_s16, vrshlq_m_n_s16, vrev64q_m_s16, vqshlq_m_r_s16, vqrshlq_m_n_s16,
vqnegq_m_s16, vqabsq_m_s16, vnegq_m_s16, vmvnq_m_s16, vmlsdavxq_p_s16,
vmlsdavq_p_s16, vmladavxq_p_s16, vmladavq_p_s16, vminvq_p_s16, vmaxvq_p_s16,
vdupq_m_n_s16, vclzq_m_s16, vclsq_m_s16, vaddvaq_p_s16, vabsq_m_s16,
vqrdmlsdhxq_s16, vqrdmlsdhq_s16, vqrdmlashq_n_s16, vqrdmlahq_n_s16,
vqrdmladhxq_s16, vqrdmladhq_s16, vqdmlsdhxq_s16, vqdmlsdhq_s16, vqdmlahq_n_s16,
vqdmladhxq_s16, vqdmladhq_s16, vmlsdavaxq_s16, vmlsdavaq_s16, vmlasq_n_s16,
vmlaq_n_s16, vmladavaxq_s16, vmladavaq_s16, vsriq_n_s16, vsliq_n_s16, vpselq_u32,
vpselq_s32, vrev64q_m_u32, vqrdmlashq_n_u32, vqrdmlahq_n_u32, vqdmlahq_n_u32,
vmvnq_m_u32, vmlasq_n_u32, vmlaq_n_u32, vmladavq_p_u32, vmladavaq_u32,
vminvq_p_u32, vmaxvq_p_u32, vdupq_m_n_u32, vcmpneq_m_u32, vcmpneq_m_n_u32,
vcmphiq_m_u32, vcmphiq_m_n_u32, vcmpeqq_m_u32, vcmpeqq_m_n_u32, vcmpcsq_m_u32,
vcmpcsq_m_n_u32, vclzq_m_u32, vaddvaq_p_u32, vsriq_n_u32, vsliq_n_u32,
vshlq_m_r_u32, vrshlq_m_n_u32, vqshlq_m_r_u32, vqrshlq_m_n_u32, vminavq_p_s32,
vminaq_m_s32, vmaxavq_p_s32, vmaxaq_m_s32, vcmpneq_m_s32, vcmpneq_m_n_s32,
vcmpltq_m_s32, vcmpltq_m_n_s32, vcmpleq_m_s32, vcmpleq_m_n_s32, vcmpgtq_m_s32,
vcmpgtq_m_n_s32, vcmpgeq_m_s32, vcmpgeq_m_n_s32, vcmpeqq_m_s32, vcmpeqq_m_n_s32,
vshlq_m_r_s32, vrshlq_m_n_s32, vrev64q_m_s32, vqshlq_m_r_s32, vqrshlq_m_n_s32,
vqnegq_m_s32, vqabsq_m_s32, vnegq_m_s32, vmvnq_m_s32, vmlsdavxq_p_s32,
vmlsdavq_p_s32, vmladavxq_p_s32, vmladavq_p_s32, vminvq_p_s32, vmaxvq_p_s32,
vdupq_m_n_s32, vclzq_m_s32, vclsq_m_s32, vaddvaq_p_s32, vabsq_m_s32,
vqrdmlsdhxq_s32, vqrdmlsdhq_s32, vqrdmlashq_n_s32, vqrdmlahq_n_s32,
vqrdmladhxq_s32, vqrdmladhq_s32, vqdmlsdhxq_s32, vqdmlsdhq_s32, vqdmlahq_n_s32,
vqdmladhxq_s32, vqdmladhq_s32, vmlsdavaxq_s32, vmlsdavaq_s32, vmlasq_n_s32,
vmlaq_n_s32, vmladavaxq_s32, vmladavaq_s32, vsriq_n_s32, vsliq_n_s32,
vpselq_u64, vpselq_s64.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

In this patch new constraints "Rc" and "Re" are added, which checks the constant is with
in the range of 0 to 15 and 0 to 31 respectively.

Also a new predicates "mve_imm_15" and "mve_imm_31" are added, to check the the matching
constraint Rc and Re respectively.

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-10-25  Andre Vieira  <andre.simoesdiasvieira@arm.com>
            Mihail Ionescu  <mihail.ionescu@arm.com>
            Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm_mve.h (vpselq_u8): Define macro.
	(vpselq_s8): Likewise.
	(vrev64q_m_u8): Likewise.
	(vqrdmlashq_n_u8): Likewise.
	(vqrdmlahq_n_u8): Likewise.
	(vqdmlahq_n_u8): Likewise.
	(vmvnq_m_u8): Likewise.
	(vmlasq_n_u8): Likewise.
	(vmlaq_n_u8): Likewise.
	(vmladavq_p_u8): Likewise.
	(vmladavaq_u8): Likewise.
	(vminvq_p_u8): Likewise.
	(vmaxvq_p_u8): Likewise.
	(vdupq_m_n_u8): Likewise.
	(vcmpneq_m_u8): Likewise.
	(vcmpneq_m_n_u8): Likewise.
	(vcmphiq_m_u8): Likewise.
	(vcmphiq_m_n_u8): Likewise.
	(vcmpeqq_m_u8): Likewise.
	(vcmpeqq_m_n_u8): Likewise.
	(vcmpcsq_m_u8): Likewise.
	(vcmpcsq_m_n_u8): Likewise.
	(vclzq_m_u8): Likewise.
	(vaddvaq_p_u8): Likewise.
	(vsriq_n_u8): Likewise.
	(vsliq_n_u8): Likewise.
	(vshlq_m_r_u8): Likewise.
	(vrshlq_m_n_u8): Likewise.
	(vqshlq_m_r_u8): Likewise.
	(vqrshlq_m_n_u8): Likewise.
	(vminavq_p_s8): Likewise.
	(vminaq_m_s8): Likewise.
	(vmaxavq_p_s8): Likewise.
	(vmaxaq_m_s8): Likewise.
	(vcmpneq_m_s8): Likewise.
	(vcmpneq_m_n_s8): Likewise.
	(vcmpltq_m_s8): Likewise.
	(vcmpltq_m_n_s8): Likewise.
	(vcmpleq_m_s8): Likewise.
	(vcmpleq_m_n_s8): Likewise.
	(vcmpgtq_m_s8): Likewise.
	(vcmpgtq_m_n_s8): Likewise.
	(vcmpgeq_m_s8): Likewise.
	(vcmpgeq_m_n_s8): Likewise.
	(vcmpeqq_m_s8): Likewise.
	(vcmpeqq_m_n_s8): Likewise.
	(vshlq_m_r_s8): Likewise.
	(vrshlq_m_n_s8): Likewise.
	(vrev64q_m_s8): Likewise.
	(vqshlq_m_r_s8): Likewise.
	(vqrshlq_m_n_s8): Likewise.
	(vqnegq_m_s8): Likewise.
	(vqabsq_m_s8): Likewise.
	(vnegq_m_s8): Likewise.
	(vmvnq_m_s8): Likewise.
	(vmlsdavxq_p_s8): Likewise.
	(vmlsdavq_p_s8): Likewise.
	(vmladavxq_p_s8): Likewise.
	(vmladavq_p_s8): Likewise.
	(vminvq_p_s8): Likewise.
	(vmaxvq_p_s8): Likewise.
	(vdupq_m_n_s8): Likewise.
	(vclzq_m_s8): Likewise.
	(vclsq_m_s8): Likewise.
	(vaddvaq_p_s8): Likewise.
	(vabsq_m_s8): Likewise.
	(vqrdmlsdhxq_s8): Likewise.
	(vqrdmlsdhq_s8): Likewise.
	(vqrdmlashq_n_s8): Likewise.
	(vqrdmlahq_n_s8): Likewise.
	(vqrdmladhxq_s8): Likewise.
	(vqrdmladhq_s8): Likewise.
	(vqdmlsdhxq_s8): Likewise.
	(vqdmlsdhq_s8): Likewise.
	(vqdmlahq_n_s8): Likewise.
	(vqdmladhxq_s8): Likewise.
	(vqdmladhq_s8): Likewise.
	(vmlsdavaxq_s8): Likewise.
	(vmlsdavaq_s8): Likewise.
	(vmlasq_n_s8): Likewise.
	(vmlaq_n_s8): Likewise.
	(vmladavaxq_s8): Likewise.
	(vmladavaq_s8): Likewise.
	(vsriq_n_s8): Likewise.
	(vsliq_n_s8): Likewise.
	(vpselq_u16): Likewise.
	(vpselq_s16): Likewise.
	(vrev64q_m_u16): Likewise.
	(vqrdmlashq_n_u16): Likewise.
	(vqrdmlahq_n_u16): Likewise.
	(vqdmlahq_n_u16): Likewise.
	(vmvnq_m_u16): Likewise.
	(vmlasq_n_u16): Likewise.
	(vmlaq_n_u16): Likewise.
	(vmladavq_p_u16): Likewise.
	(vmladavaq_u16): Likewise.
	(vminvq_p_u16): Likewise.
	(vmaxvq_p_u16): Likewise.
	(vdupq_m_n_u16): Likewise.
	(vcmpneq_m_u16): Likewise.
	(vcmpneq_m_n_u16): Likewise.
	(vcmphiq_m_u16): Likewise.
	(vcmphiq_m_n_u16): Likewise.
	(vcmpeqq_m_u16): Likewise.
	(vcmpeqq_m_n_u16): Likewise.
	(vcmpcsq_m_u16): Likewise.
	(vcmpcsq_m_n_u16): Likewise.
	(vclzq_m_u16): Likewise.
	(vaddvaq_p_u16): Likewise.
	(vsriq_n_u16): Likewise.
	(vsliq_n_u16): Likewise.
	(vshlq_m_r_u16): Likewise.
	(vrshlq_m_n_u16): Likewise.
	(vqshlq_m_r_u16): Likewise.
	(vqrshlq_m_n_u16): Likewise.
	(vminavq_p_s16): Likewise.
	(vminaq_m_s16): Likewise.
	(vmaxavq_p_s16): Likewise.
	(vmaxaq_m_s16): Likewise.
	(vcmpneq_m_s16): Likewise.
	(vcmpneq_m_n_s16): Likewise.
	(vcmpltq_m_s16): Likewise.
	(vcmpltq_m_n_s16): Likewise.
	(vcmpleq_m_s16): Likewise.
	(vcmpleq_m_n_s16): Likewise.
	(vcmpgtq_m_s16): Likewise.
	(vcmpgtq_m_n_s16): Likewise.
	(vcmpgeq_m_s16): Likewise.
	(vcmpgeq_m_n_s16): Likewise.
	(vcmpeqq_m_s16): Likewise.
	(vcmpeqq_m_n_s16): Likewise.
	(vshlq_m_r_s16): Likewise.
	(vrshlq_m_n_s16): Likewise.
	(vrev64q_m_s16): Likewise.
	(vqshlq_m_r_s16): Likewise.
	(vqrshlq_m_n_s16): Likewise.
	(vqnegq_m_s16): Likewise.
	(vqabsq_m_s16): Likewise.
	(vnegq_m_s16): Likewise.
	(vmvnq_m_s16): Likewise.
	(vmlsdavxq_p_s16): Likewise.
	(vmlsdavq_p_s16): Likewise.
	(vmladavxq_p_s16): Likewise.
	(vmladavq_p_s16): Likewise.
	(vminvq_p_s16): Likewise.
	(vmaxvq_p_s16): Likewise.
	(vdupq_m_n_s16): Likewise.
	(vclzq_m_s16): Likewise.
	(vclsq_m_s16): Likewise.
	(vaddvaq_p_s16): Likewise.
	(vabsq_m_s16): Likewise.
	(vqrdmlsdhxq_s16): Likewise.
	(vqrdmlsdhq_s16): Likewise.
	(vqrdmlashq_n_s16): Likewise.
	(vqrdmlahq_n_s16): Likewise.
	(vqrdmladhxq_s16): Likewise.
	(vqrdmladhq_s16): Likewise.
	(vqdmlsdhxq_s16): Likewise.
	(vqdmlsdhq_s16): Likewise.
	(vqdmlahq_n_s16): Likewise.
	(vqdmladhxq_s16): Likewise.
	(vqdmladhq_s16): Likewise.
	(vmlsdavaxq_s16): Likewise.
	(vmlsdavaq_s16): Likewise.
	(vmlasq_n_s16): Likewise.
	(vmlaq_n_s16): Likewise.
	(vmladavaxq_s16): Likewise.
	(vmladavaq_s16): Likewise.
	(vsriq_n_s16): Likewise.
	(vsliq_n_s16): Likewise.
	(vpselq_u32): Likewise.
	(vpselq_s32): Likewise.
	(vrev64q_m_u32): Likewise.
	(vqrdmlashq_n_u32): Likewise.
	(vqrdmlahq_n_u32): Likewise.
	(vqdmlahq_n_u32): Likewise.
	(vmvnq_m_u32): Likewise.
	(vmlasq_n_u32): Likewise.
	(vmlaq_n_u32): Likewise.
	(vmladavq_p_u32): Likewise.
	(vmladavaq_u32): Likewise.
	(vminvq_p_u32): Likewise.
	(vmaxvq_p_u32): Likewise.
	(vdupq_m_n_u32): Likewise.
	(vcmpneq_m_u32): Likewise.
	(vcmpneq_m_n_u32): Likewise.
	(vcmphiq_m_u32): Likewise.
	(vcmphiq_m_n_u32): Likewise.
	(vcmpeqq_m_u32): Likewise.
	(vcmpeqq_m_n_u32): Likewise.
	(vcmpcsq_m_u32): Likewise.
	(vcmpcsq_m_n_u32): Likewise.
	(vclzq_m_u32): Likewise.
	(vaddvaq_p_u32): Likewise.
	(vsriq_n_u32): Likewise.
	(vsliq_n_u32): Likewise.
	(vshlq_m_r_u32): Likewise.
	(vrshlq_m_n_u32): Likewise.
	(vqshlq_m_r_u32): Likewise.
	(vqrshlq_m_n_u32): Likewise.
	(vminavq_p_s32): Likewise.
	(vminaq_m_s32): Likewise.
	(vmaxavq_p_s32): Likewise.
	(vmaxaq_m_s32): Likewise.
	(vcmpneq_m_s32): Likewise.
	(vcmpneq_m_n_s32): Likewise.
	(vcmpltq_m_s32): Likewise.
	(vcmpltq_m_n_s32): Likewise.
	(vcmpleq_m_s32): Likewise.
	(vcmpleq_m_n_s32): Likewise.
	(vcmpgtq_m_s32): Likewise.
	(vcmpgtq_m_n_s32): Likewise.
	(vcmpgeq_m_s32): Likewise.
	(vcmpgeq_m_n_s32): Likewise.
	(vcmpeqq_m_s32): Likewise.
	(vcmpeqq_m_n_s32): Likewise.
	(vshlq_m_r_s32): Likewise.
	(vrshlq_m_n_s32): Likewise.
	(vrev64q_m_s32): Likewise.
	(vqshlq_m_r_s32): Likewise.
	(vqrshlq_m_n_s32): Likewise.
	(vqnegq_m_s32): Likewise.
	(vqabsq_m_s32): Likewise.
	(vnegq_m_s32): Likewise.
	(vmvnq_m_s32): Likewise.
	(vmlsdavxq_p_s32): Likewise.
	(vmlsdavq_p_s32): Likewise.
	(vmladavxq_p_s32): Likewise.
	(vmladavq_p_s32): Likewise.
	(vminvq_p_s32): Likewise.
	(vmaxvq_p_s32): Likewise.
	(vdupq_m_n_s32): Likewise.
	(vclzq_m_s32): Likewise.
	(vclsq_m_s32): Likewise.
	(vaddvaq_p_s32): Likewise.
	(vabsq_m_s32): Likewise.
	(vqrdmlsdhxq_s32): Likewise.
	(vqrdmlsdhq_s32): Likewise.
	(vqrdmlashq_n_s32): Likewise.
	(vqrdmlahq_n_s32): Likewise.
	(vqrdmladhxq_s32): Likewise.
	(vqrdmladhq_s32): Likewise.
	(vqdmlsdhxq_s32): Likewise.
	(vqdmlsdhq_s32): Likewise.
	(vqdmlahq_n_s32): Likewise.
	(vqdmladhxq_s32): Likewise.
	(vqdmladhq_s32): Likewise.
	(vmlsdavaxq_s32): Likewise.
	(vmlsdavaq_s32): Likewise.
	(vmlasq_n_s32): Likewise.
	(vmlaq_n_s32): Likewise.
	(vmladavaxq_s32): Likewise.
	(vmladavaq_s32): Likewise.
	(vsriq_n_s32): Likewise.
	(vsliq_n_s32): Likewise.
	(vpselq_u64): Likewise.
	(vpselq_s64): Likewise.
	(__arm_vpselq_u8): Define intrinsic.
	(__arm_vpselq_s8): Likewise.
	(__arm_vrev64q_m_u8): Likewise.
	(__arm_vqrdmlashq_n_u8): Likewise.
	(__arm_vqrdmlahq_n_u8): Likewise.
	(__arm_vqdmlahq_n_u8): Likewise.
	(__arm_vmvnq_m_u8): Likewise.
	(__arm_vmlasq_n_u8): Likewise.
	(__arm_vmlaq_n_u8): Likewise.
	(__arm_vmladavq_p_u8): Likewise.
	(__arm_vmladavaq_u8): Likewise.
	(__arm_vminvq_p_u8): Likewise.
	(__arm_vmaxvq_p_u8): Likewise.
	(__arm_vdupq_m_n_u8): Likewise.
	(__arm_vcmpneq_m_u8): Likewise.
	(__arm_vcmpneq_m_n_u8): Likewise.
	(__arm_vcmphiq_m_u8): Likewise.
	(__arm_vcmphiq_m_n_u8): Likewise.
	(__arm_vcmpeqq_m_u8): Likewise.
	(__arm_vcmpeqq_m_n_u8): Likewise.
	(__arm_vcmpcsq_m_u8): Likewise.
	(__arm_vcmpcsq_m_n_u8): Likewise.
	(__arm_vclzq_m_u8): Likewise.
	(__arm_vaddvaq_p_u8): Likewise.
	(__arm_vsriq_n_u8): Likewise.
	(__arm_vsliq_n_u8): Likewise.
	(__arm_vshlq_m_r_u8): Likewise.
	(__arm_vrshlq_m_n_u8): Likewise.
	(__arm_vqshlq_m_r_u8): Likewise.
	(__arm_vqrshlq_m_n_u8): Likewise.
	(__arm_vminavq_p_s8): Likewise.
	(__arm_vminaq_m_s8): Likewise.
	(__arm_vmaxavq_p_s8): Likewise.
	(__arm_vmaxaq_m_s8): Likewise.
	(__arm_vcmpneq_m_s8): Likewise.
	(__arm_vcmpneq_m_n_s8): Likewise.
	(__arm_vcmpltq_m_s8): Likewise.
	(__arm_vcmpltq_m_n_s8): Likewise.
	(__arm_vcmpleq_m_s8): Likewise.
	(__arm_vcmpleq_m_n_s8): Likewise.
	(__arm_vcmpgtq_m_s8): Likewise.
	(__arm_vcmpgtq_m_n_s8): Likewise.
	(__arm_vcmpgeq_m_s8): Likewise.
	(__arm_vcmpgeq_m_n_s8): Likewise.
	(__arm_vcmpeqq_m_s8): Likewise.
	(__arm_vcmpeqq_m_n_s8): Likewise.
	(__arm_vshlq_m_r_s8): Likewise.
	(__arm_vrshlq_m_n_s8): Likewise.
	(__arm_vrev64q_m_s8): Likewise.
	(__arm_vqshlq_m_r_s8): Likewise.
	(__arm_vqrshlq_m_n_s8): Likewise.
	(__arm_vqnegq_m_s8): Likewise.
	(__arm_vqabsq_m_s8): Likewise.
	(__arm_vnegq_m_s8): Likewise.
	(__arm_vmvnq_m_s8): Likewise.
	(__arm_vmlsdavxq_p_s8): Likewise.
	(__arm_vmlsdavq_p_s8): Likewise.
	(__arm_vmladavxq_p_s8): Likewise.
	(__arm_vmladavq_p_s8): Likewise.
	(__arm_vminvq_p_s8): Likewise.
	(__arm_vmaxvq_p_s8): Likewise.
	(__arm_vdupq_m_n_s8): Likewise.
	(__arm_vclzq_m_s8): Likewise.
	(__arm_vclsq_m_s8): Likewise.
	(__arm_vaddvaq_p_s8): Likewise.
	(__arm_vabsq_m_s8): Likewise.
	(__arm_vqrdmlsdhxq_s8): Likewise.
	(__arm_vqrdmlsdhq_s8): Likewise.
	(__arm_vqrdmlashq_n_s8): Likewise.
	(__arm_vqrdmlahq_n_s8): Likewise.
	(__arm_vqrdmladhxq_s8): Likewise.
	(__arm_vqrdmladhq_s8): Likewise.
	(__arm_vqdmlsdhxq_s8): Likewise.
	(__arm_vqdmlsdhq_s8): Likewise.
	(__arm_vqdmlahq_n_s8): Likewise.
	(__arm_vqdmladhxq_s8): Likewise.
	(__arm_vqdmladhq_s8): Likewise.
	(__arm_vmlsdavaxq_s8): Likewise.
	(__arm_vmlsdavaq_s8): Likewise.
	(__arm_vmlasq_n_s8): Likewise.
	(__arm_vmlaq_n_s8): Likewise.
	(__arm_vmladavaxq_s8): Likewise.
	(__arm_vmladavaq_s8): Likewise.
	(__arm_vsriq_n_s8): Likewise.
	(__arm_vsliq_n_s8): Likewise.
	(__arm_vpselq_u16): Likewise.
	(__arm_vpselq_s16): Likewise.
	(__arm_vrev64q_m_u16): Likewise.
	(__arm_vqrdmlashq_n_u16): Likewise.
	(__arm_vqrdmlahq_n_u16): Likewise.
	(__arm_vqdmlahq_n_u16): Likewise.
	(__arm_vmvnq_m_u16): Likewise.
	(__arm_vmlasq_n_u16): Likewise.
	(__arm_vmlaq_n_u16): Likewise.
	(__arm_vmladavq_p_u16): Likewise.
	(__arm_vmladavaq_u16): Likewise.
	(__arm_vminvq_p_u16): Likewise.
	(__arm_vmaxvq_p_u16): Likewise.
	(__arm_vdupq_m_n_u16): Likewise.
	(__arm_vcmpneq_m_u16): Likewise.
	(__arm_vcmpneq_m_n_u16): Likewise.
	(__arm_vcmphiq_m_u16): Likewise.
	(__arm_vcmphiq_m_n_u16): Likewise.
	(__arm_vcmpeqq_m_u16): Likewise.
	(__arm_vcmpeqq_m_n_u16): Likewise.
	(__arm_vcmpcsq_m_u16): Likewise.
	(__arm_vcmpcsq_m_n_u16): Likewise.
	(__arm_vclzq_m_u16): Likewise.
	(__arm_vaddvaq_p_u16): Likewise.
	(__arm_vsriq_n_u16): Likewise.
	(__arm_vsliq_n_u16): Likewise.
	(__arm_vshlq_m_r_u16): Likewise.
	(__arm_vrshlq_m_n_u16): Likewise.
	(__arm_vqshlq_m_r_u16): Likewise.
	(__arm_vqrshlq_m_n_u16): Likewise.
	(__arm_vminavq_p_s16): Likewise.
	(__arm_vminaq_m_s16): Likewise.
	(__arm_vmaxavq_p_s16): Likewise.
	(__arm_vmaxaq_m_s16): Likewise.
	(__arm_vcmpneq_m_s16): Likewise.
	(__arm_vcmpneq_m_n_s16): Likewise.
	(__arm_vcmpltq_m_s16): Likewise.
	(__arm_vcmpltq_m_n_s16): Likewise.
	(__arm_vcmpleq_m_s16): Likewise.
	(__arm_vcmpleq_m_n_s16): Likewise.
	(__arm_vcmpgtq_m_s16): Likewise.
	(__arm_vcmpgtq_m_n_s16): Likewise.
	(__arm_vcmpgeq_m_s16): Likewise.
	(__arm_vcmpgeq_m_n_s16): Likewise.
	(__arm_vcmpeqq_m_s16): Likewise.
	(__arm_vcmpeqq_m_n_s16): Likewise.
	(__arm_vshlq_m_r_s16): Likewise.
	(__arm_vrshlq_m_n_s16): Likewise.
	(__arm_vrev64q_m_s16): Likewise.
	(__arm_vqshlq_m_r_s16): Likewise.
	(__arm_vqrshlq_m_n_s16): Likewise.
	(__arm_vqnegq_m_s16): Likewise.
	(__arm_vqabsq_m_s16): Likewise.
	(__arm_vnegq_m_s16): Likewise.
	(__arm_vmvnq_m_s16): Likewise.
	(__arm_vmlsdavxq_p_s16): Likewise.
	(__arm_vmlsdavq_p_s16): Likewise.
	(__arm_vmladavxq_p_s16): Likewise.
	(__arm_vmladavq_p_s16): Likewise.
	(__arm_vminvq_p_s16): Likewise.
	(__arm_vmaxvq_p_s16): Likewise.
	(__arm_vdupq_m_n_s16): Likewise.
	(__arm_vclzq_m_s16): Likewise.
	(__arm_vclsq_m_s16): Likewise.
	(__arm_vaddvaq_p_s16): Likewise.
	(__arm_vabsq_m_s16): Likewise.
	(__arm_vqrdmlsdhxq_s16): Likewise.
	(__arm_vqrdmlsdhq_s16): Likewise.
	(__arm_vqrdmlashq_n_s16): Likewise.
	(__arm_vqrdmlahq_n_s16): Likewise.
	(__arm_vqrdmladhxq_s16): Likewise.
	(__arm_vqrdmladhq_s16): Likewise.
	(__arm_vqdmlsdhxq_s16): Likewise.
	(__arm_vqdmlsdhq_s16): Likewise.
	(__arm_vqdmlahq_n_s16): Likewise.
	(__arm_vqdmladhxq_s16): Likewise.
	(__arm_vqdmladhq_s16): Likewise.
	(__arm_vmlsdavaxq_s16): Likewise.
	(__arm_vmlsdavaq_s16): Likewise.
	(__arm_vmlasq_n_s16): Likewise.
	(__arm_vmlaq_n_s16): Likewise.
	(__arm_vmladavaxq_s16): Likewise.
	(__arm_vmladavaq_s16): Likewise.
	(__arm_vsriq_n_s16): Likewise.
	(__arm_vsliq_n_s16): Likewise.
	(__arm_vpselq_u32): Likewise.
	(__arm_vpselq_s32): Likewise.
	(__arm_vrev64q_m_u32): Likewise.
	(__arm_vqrdmlashq_n_u32): Likewise.
	(__arm_vqrdmlahq_n_u32): Likewise.
	(__arm_vqdmlahq_n_u32): Likewise.
	(__arm_vmvnq_m_u32): Likewise.
	(__arm_vmlasq_n_u32): Likewise.
	(__arm_vmlaq_n_u32): Likewise.
	(__arm_vmladavq_p_u32): Likewise.
	(__arm_vmladavaq_u32): Likewise.
	(__arm_vminvq_p_u32): Likewise.
	(__arm_vmaxvq_p_u32): Likewise.
	(__arm_vdupq_m_n_u32): Likewise.
	(__arm_vcmpneq_m_u32): Likewise.
	(__arm_vcmpneq_m_n_u32): Likewise.
	(__arm_vcmphiq_m_u32): Likewise.
	(__arm_vcmphiq_m_n_u32): Likewise.
	(__arm_vcmpeqq_m_u32): Likewise.
	(__arm_vcmpeqq_m_n_u32): Likewise.
	(__arm_vcmpcsq_m_u32): Likewise.
	(__arm_vcmpcsq_m_n_u32): Likewise.
	(__arm_vclzq_m_u32): Likewise.
	(__arm_vaddvaq_p_u32): Likewise.
	(__arm_vsriq_n_u32): Likewise.
	(__arm_vsliq_n_u32): Likewise.
	(__arm_vshlq_m_r_u32): Likewise.
	(__arm_vrshlq_m_n_u32): Likewise.
	(__arm_vqshlq_m_r_u32): Likewise.
	(__arm_vqrshlq_m_n_u32): Likewise.
	(__arm_vminavq_p_s32): Likewise.
	(__arm_vminaq_m_s32): Likewise.
	(__arm_vmaxavq_p_s32): Likewise.
	(__arm_vmaxaq_m_s32): Likewise.
	(__arm_vcmpneq_m_s32): Likewise.
	(__arm_vcmpneq_m_n_s32): Likewise.
	(__arm_vcmpltq_m_s32): Likewise.
	(__arm_vcmpltq_m_n_s32): Likewise.
	(__arm_vcmpleq_m_s32): Likewise.
	(__arm_vcmpleq_m_n_s32): Likewise.
	(__arm_vcmpgtq_m_s32): Likewise.
	(__arm_vcmpgtq_m_n_s32): Likewise.
	(__arm_vcmpgeq_m_s32): Likewise.
	(__arm_vcmpgeq_m_n_s32): Likewise.
	(__arm_vcmpeqq_m_s32): Likewise.
	(__arm_vcmpeqq_m_n_s32): Likewise.
	(__arm_vshlq_m_r_s32): Likewise.
	(__arm_vrshlq_m_n_s32): Likewise.
	(__arm_vrev64q_m_s32): Likewise.
	(__arm_vqshlq_m_r_s32): Likewise.
	(__arm_vqrshlq_m_n_s32): Likewise.
	(__arm_vqnegq_m_s32): Likewise.
	(__arm_vqabsq_m_s32): Likewise.
	(__arm_vnegq_m_s32): Likewise.
	(__arm_vmvnq_m_s32): Likewise.
	(__arm_vmlsdavxq_p_s32): Likewise.
	(__arm_vmlsdavq_p_s32): Likewise.
	(__arm_vmladavxq_p_s32): Likewise.
	(__arm_vmladavq_p_s32): Likewise.
	(__arm_vminvq_p_s32): Likewise.
	(__arm_vmaxvq_p_s32): Likewise.
	(__arm_vdupq_m_n_s32): Likewise.
	(__arm_vclzq_m_s32): Likewise.
	(__arm_vclsq_m_s32): Likewise.
	(__arm_vaddvaq_p_s32): Likewise.
	(__arm_vabsq_m_s32): Likewise.
	(__arm_vqrdmlsdhxq_s32): Likewise.
	(__arm_vqrdmlsdhq_s32): Likewise.
	(__arm_vqrdmlashq_n_s32): Likewise.
	(__arm_vqrdmlahq_n_s32): Likewise.
	(__arm_vqrdmladhxq_s32): Likewise.
	(__arm_vqrdmladhq_s32): Likewise.
	(__arm_vqdmlsdhxq_s32): Likewise.
	(__arm_vqdmlsdhq_s32): Likewise.
	(__arm_vqdmlahq_n_s32): Likewise.
	(__arm_vqdmladhxq_s32): Likewise.
	(__arm_vqdmladhq_s32): Likewise.
	(__arm_vmlsdavaxq_s32): Likewise.
	(__arm_vmlsdavaq_s32): Likewise.
	(__arm_vmlasq_n_s32): Likewise.
	(__arm_vmlaq_n_s32): Likewise.
	(__arm_vmladavaxq_s32): Likewise.
	(__arm_vmladavaq_s32): Likewise.
	(__arm_vsriq_n_s32): Likewise.
	(__arm_vsliq_n_s32): Likewise.
	(__arm_vpselq_u64): Likewise.
	(__arm_vpselq_s64): Likewise.
	(vcmpneq_m_n): Define polymorphic variant.
	(vcmpneq_m): Likewise.
	(vqrdmlsdhq): Likewise.
	(vqrdmlsdhxq): Likewise.
	(vqrshlq_m_n): Likewise.
	(vqshlq_m_r): Likewise.
	(vrev64q_m): Likewise.
	(vrshlq_m_n): Likewise.
	(vshlq_m_r): Likewise.
	(vsliq_n): Likewise.
	(vsriq_n): Likewise.
	(vqrdmlashq_n): Likewise.
	(vqrdmlahq): Likewise.
	(vqrdmladhxq): Likewise.
	(vqrdmladhq): Likewise.
	(vqnegq_m): Likewise.
	(vqdmlsdhxq): Likewise.
	(vabsq_m): Likewise.
	(vclsq_m): Likewise.
	(vclzq_m): Likewise.
	(vcmpgeq_m): Likewise.
	(vcmpgeq_m_n): Likewise.
	(vdupq_m_n): Likewise.
	(vmaxaq_m): Likewise.
	(vmlaq_n): Likewise.
	(vmlasq_n): Likewise.
	(vmvnq_m): Likewise.
	(vnegq_m): Likewise.
	(vpselq): Likewise.
	(vqdmlahq_n): Likewise.
	(vqrdmlahq_n): Likewise.
	(vqdmlsdhq): Likewise.
	(vqdmladhq): Likewise.
	(vqabsq_m): Likewise.
	(vminaq_m): Likewise.
	(vrmlaldavhaq): Likewise.
	(vmlsdavxq_p): Likewise.
	(vmlsdavq_p): Likewise.	
	(vmlsdavaxq): Likewise.	
	(vmlsdavaq): Likewise.	
	(vaddvaq_p): Likewise.	
	(vcmpcsq_m_n): Likewise.	
	(vcmpcsq_m): Likewise.	
	(vcmpeqq_m_n): Likewise.	
	(vcmpeqq_m): Likewise.	
	(vmladavxq_p): Likewise.	
	(vmladavq_p): Likewise.	
	(vmladavaxq): Likewise.	
	(vmladavaq): Likewise.	
	(vminvq_p): Likewise.	
	(vminavq_p): Likewise.	
	(vmaxvq_p): Likewise.	
	(vmaxavq_p): Likewise.	
	(vcmpltq_m_n): Likewise.	
	(vcmpltq_m): Likewise.	
	(vcmpleq_m): Likewise.	
	(vcmpleq_m_n): Likewise.	
	(vcmphiq_m_n): Likewise.	
	(vcmphiq_m): Likewise.	
	(vcmpgtq_m_n): Likewise.	
	(vcmpgtq_m): Likewise.	
	* config/arm/arm_mve_builtins.def (TERNOP_NONE_NONE_NONE_IMM): Use
	builtin qualifier.
	(TERNOP_NONE_NONE_NONE_NONE): Likewise.
	(TERNOP_NONE_NONE_NONE_UNONE): Likewise.
	(TERNOP_UNONE_NONE_NONE_UNONE): Likewise.
	(TERNOP_UNONE_UNONE_NONE_UNONE): Likewise.
	(TERNOP_UNONE_UNONE_UNONE_IMM): Likewise.
	(TERNOP_UNONE_UNONE_UNONE_UNONE): Likewise.
	* config/arm/constraints.md (Rc): Define constraint to check constant is
	in the range of 0 to 15.
	(Re): Define constraint to check constant is in the range of 0 to 31.
	* config/arm/mve.md (VADDVAQ_P): Define iterator.
	(VCLZQ_M): Likewise.
	(VCMPEQQ_M_N): Likewise.
	(VCMPEQQ_M): Likewise.
	(VCMPNEQ_M_N): Likewise.
	(VCMPNEQ_M): Likewise.
	(VDUPQ_M_N): Likewise.
	(VMAXVQ_P): Likewise.
	(VMINVQ_P): Likewise.
	(VMLADAVAQ): Likewise.
	(VMLADAVQ_P): Likewise.
	(VMLAQ_N): Likewise.
	(VMLASQ_N): Likewise.
	(VMVNQ_M): Likewise.
	(VPSELQ): Likewise.
	(VQDMLAHQ_N): Likewise.
	(VQRDMLAHQ_N): Likewise.
	(VQRDMLASHQ_N): Likewise.
	(VQRSHLQ_M_N): Likewise.
	(VQSHLQ_M_R): Likewise.
	(VREV64Q_M): Likewise.
	(VRSHLQ_M_N): Likewise.
	(VSHLQ_M_R): Likewise.
	(VSLIQ_N): Likewise.
	(VSRIQ_N): Likewise.
	(mve_vabsq_m_s<mode>): Define RTL pattern.
	(mve_vaddvaq_p_<supf><mode>): Likewise.
	(mve_vclsq_m_s<mode>): Likewise.
	(mve_vclzq_m_<supf><mode>): Likewise.
	(mve_vcmpcsq_m_n_u<mode>): Likewise.
	(mve_vcmpcsq_m_u<mode>): Likewise.
	(mve_vcmpeqq_m_n_<supf><mode>): Likewise.
	(mve_vcmpeqq_m_<supf><mode>): Likewise.
	(mve_vcmpgeq_m_n_s<mode>): Likewise.
	(mve_vcmpgeq_m_s<mode>): Likewise.
	(mve_vcmpgtq_m_n_s<mode>): Likewise.
	(mve_vcmpgtq_m_s<mode>): Likewise.
	(mve_vcmphiq_m_n_u<mode>): Likewise.
	(mve_vcmphiq_m_u<mode>): Likewise.
	(mve_vcmpleq_m_n_s<mode>): Likewise.
	(mve_vcmpleq_m_s<mode>): Likewise.
	(mve_vcmpltq_m_n_s<mode>): Likewise.
	(mve_vcmpltq_m_s<mode>): Likewise.
	(mve_vcmpneq_m_n_<supf><mode>): Likewise.
	(mve_vcmpneq_m_<supf><mode>): Likewise.
	(mve_vdupq_m_n_<supf><mode>): Likewise.
	(mve_vmaxaq_m_s<mode>): Likewise.
	(mve_vmaxavq_p_s<mode>): Likewise.
	(mve_vmaxvq_p_<supf><mode>): Likewise.
	(mve_vminaq_m_s<mode>): Likewise.
	(mve_vminavq_p_s<mode>): Likewise.
	(mve_vminvq_p_<supf><mode>): Likewise.
	(mve_vmladavaq_<supf><mode>): Likewise.
	(mve_vmladavq_p_<supf><mode>): Likewise.
	(mve_vmladavxq_p_s<mode>): Likewise.
	(mve_vmlaq_n_<supf><mode>): Likewise.
	(mve_vmlasq_n_<supf><mode>): Likewise.
	(mve_vmlsdavq_p_s<mode>): Likewise.
	(mve_vmlsdavxq_p_s<mode>): Likewise.
	(mve_vmvnq_m_<supf><mode>): Likewise.
	(mve_vnegq_m_s<mode>): Likewise.
	(mve_vpselq_<supf><mode>): Likewise.
	(mve_vqabsq_m_s<mode>): Likewise.
	(mve_vqdmlahq_n_<supf><mode>): Likewise.
	(mve_vqnegq_m_s<mode>): Likewise.
	(mve_vqrdmladhq_s<mode>): Likewise.
	(mve_vqrdmladhxq_s<mode>): Likewise.
	(mve_vqrdmlahq_n_<supf><mode>): Likewise.
	(mve_vqrdmlashq_n_<supf><mode>): Likewise.
	(mve_vqrdmlsdhq_s<mode>): Likewise.
	(mve_vqrdmlsdhxq_s<mode>): Likewise.
	(mve_vqrshlq_m_n_<supf><mode>): Likewise.
	(mve_vqshlq_m_r_<supf><mode>): Likewise.
	(mve_vrev64q_m_<supf><mode>): Likewise.
	(mve_vrshlq_m_n_<supf><mode>): Likewise.
	(mve_vshlq_m_r_<supf><mode>): Likewise.
	(mve_vsliq_n_<supf><mode>): Likewise.
	(mve_vsriq_n_<supf><mode>): Likewise.
	(mve_vqdmlsdhxq_s<mode>): Likewise.
	(mve_vqdmlsdhq_s<mode>): Likewise.
	(mve_vqdmladhxq_s<mode>): Likewise.
	(mve_vqdmladhq_s<mode>): Likewise.
	(mve_vmlsdavaxq_s<mode>): Likewise.
	(mve_vmlsdavaq_s<mode>): Likewise.
	(mve_vmladavaxq_s<mode>): Likewise.
	* config/arm/predicates.md (mve_imm_15):Define predicate to check the
	matching constraint Rc.
	(mve_imm_31): Define predicate to check	the matching constraint Re.

gcc/testsuite/ChangeLog:

2019-10-25  Andre Vieira  <andre.simoesdiasvieira@arm.com>
            Mihail Ionescu  <mihail.ionescu@arm.com>
            Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vabsq_m_s16.c: New test.
	* gcc.target/arm/mve/intrinsics/vabsq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabsq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvaq_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvaq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvaq_p_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvaq_p_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvaq_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vaddvaq_p_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vclsq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vclsq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vclsq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vclzq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vclzq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vclzq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vclzq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vclzq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vclzq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpcsq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpcsq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpcsq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpcsq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgeq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpgtq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmphiq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmphiq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmphiq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmphiq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpleq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpltq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpneq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vdupq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxaq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxaq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxaq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxavq_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxavq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxavq_p_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxvq_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxvq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxvq_p_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxvq_p_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxvq_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmaxvq_p_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminaq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminaq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminaq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminavq_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminavq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminavq_p_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminvq_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminvq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminvq_p_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminvq_p_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminvq_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vminvq_p_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavaq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavaq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavaq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavaq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavaq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavaq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavaxq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavaxq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavaxq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavq_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavq_p_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavq_p_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavq_p_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavq_p_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavxq_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavxq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmladavxq_p_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlaq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlasq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlasq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlasq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlasq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlasq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlasq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsdavaq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsdavaq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsdavaq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsdavaxq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsdavaxq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsdavaxq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsdavq_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsdavq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsdavq_p_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsdavxq_p_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsdavxq_p_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmlsdavxq_p_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmvnq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmvnq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmvnq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmvnq_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmvnq_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vmvnq_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vnegq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vnegq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vnegq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vpselq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vpselq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vpselq_s64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vpselq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vpselq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vpselq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vpselq_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vpselq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqabsq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqabsq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqabsq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmladhq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmladhq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmladhq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmladhxq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmladhxq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmladhxq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlahq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlahq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlahq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlahq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlahq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlahq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlsdhq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlsdhq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlsdhq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlsdhxq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlsdhxq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqdmlsdhxq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqnegq_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqnegq_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqnegq_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmladhq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmladhq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmladhq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmladhxq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmladhxq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmladhxq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlashq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlashq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlashq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlsdhq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlsdhq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlsdhq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlsdhxq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlsdhxq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrdmlsdhxq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshlq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshlq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshlq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshlq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshlq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshlq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_m_r_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_m_r_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_m_r_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_m_r_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_m_r_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqshlq_m_r_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev64q_m_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev64q_m_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev64q_m_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev64q_m_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev64q_m_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrev64q_m_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_m_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrshlq_m_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_m_r_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_m_r_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_m_r_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_m_r_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_m_r_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlq_m_r_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsliq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsliq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsliq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsliq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsliq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsliq_n_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsriq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsriq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsriq_n_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsriq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsriq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsriq_n_u8.c: Likewise.

[-- Attachment #2: diff14.patch.gz --]
[-- Type: application/gzip, Size: 27548 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][1/x]: MVE ACLE intrinsics framework patch.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (26 preceding siblings ...)
  2019-11-14 19:16 ` [PATCH][ARM][GCC][6/5x]: Remaining MVE load intrinsics which loads half word and word or double word from memory Srinath Parvathaneni
@ 2019-11-14 19:16 ` Srinath Parvathaneni
  2019-12-18 17:18   ` Kyrill Tkachov
  2019-11-14 19:16 ` [PATCH][ARM][GCC][1/1x]: Patch to support MVE ACLE intrinsics with unary operand Srinath Parvathaneni
                   ` (10 subsequent siblings)
  38 siblings, 1 reply; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 74443 bytes --]

Hello,

This patch creates the required framework for MVE ACLE intrinsics.

The following changes are done in this patch to support MVE ACLE intrinsics.

Header file arm_mve.h is added to source code, which contains the definitions of MVE ACLE intrinsics
and different data types used in MVE. Machine description file mve.md is also added which contains the
RTL patterns defined for MVE.

A new reigster "p0" is added which is used in by MVE predicated patterns. A new register class "VPR_REG"
is added and its contents are defined in REG_CLASS_CONTENTS.

The vec-common.md file is modified to support the standard move patterns. The prefix of neon functions
which are also used by MVE is changed from "neon_" to "simd_".
eg: neon_immediate_valid_for_move changed to simd_immediate_valid_for_move.

In the patch standard patterns mve_move, mve_store and move_load for MVE are added and neon.md and vfp.md
files are modified to support this common patterns.

Please refer to Arm reference manual [1] for more details.

[1] https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf?_ga=2.102521798.659307368.1572453718-1501600630.1548848914

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath

gcc/ChangeLog:

2019-11-11  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config.gcc (arm_mve.h): Add header file.
	* config/arm/aout.h (p0): Add new register name.
	* config/arm-builtins.c (ARM_BUILTIN_SIMD_LANE_CHECK): Define.
	(ARM_BUILTIN_NEON_LANE_CHECK): Remove.
	(arm_init_simd_builtin_types): Add TARGET_HAVE_MVE check.
	(arm_init_neon_builtins): Move a check to arm_init_builtins function.
	(arm_init_builtins): Move a check from arm_init_neon_builtins function.
	(mve_dereference_pointer): Add new function.
	(arm_expand_builtin_args): Add TARGET_HAVE_MVE check.
	(arm_expand_neon_builtin): Move a check to arm_expand_builtin function.
	(arm_expand_builtin): Move a check from arm_expand_neon_builtin function.
	* config/arm/arm-c.c (arm_cpu_builtins): Define macros for MVE.
	* config/arm/arm-modes.def (INT_MODE): Add three new integer modes.
	* config/arm/arm-protos.h (neon_immediate_valid_for_move): Rename function.
	(simd_immediate_valid_for_move): Rename neon_immediate_valid_for_move function.
	* config/arm/arm.c (arm_options_perform_arch_sanity_checks):Enable mve isa bit.
	(use_return_insn): Add TARGET_HAVE_MVE check.
	(aapcs_vfp_allocate): Add TARGET_HAVE_MVE check.
	(aapcs_vfp_allocate_return_reg): Add TARGET_HAVE_MVE check.
	(thumb2_legitimate_address_p): Add TARGET_HAVE_MVE check.
	(arm_rtx_costs_internal): Add TARGET_HAVE_MVE check.
	(neon_valid_immediate): Rename to simd_valid_immediate.
	(simd_valid_immediate): Rename from neon_valid_immediate.
	(neon_immediate_valid_for_move): Rename to simd_immediate_valid_for_move.
	(simd_immediate_valid_for_move): Rename from neon_immediate_valid_for_move.
	(neon_immediate_valid_for_logic): Modify call to neon_valid_immediate function.
	(neon_make_constant): Modify call to neon_valid_immediate function.
	(neon_vector_mem_operand): Add TARGET_HAVE_MVE check.
	(output_move_neon): Add TARGET_HAVE_MVE check.
	(arm_compute_frame_layout): Add TARGET_HAVE_MVE check.
	(arm_save_coproc_regs): Add TARGET_HAVE_MVE check.
	(arm_print_operand): Add case 'E' to print memory operands.
	(arm_print_operand_address): Add TARGET_HAVE_MVE check.
	(arm_hard_regno_mode_ok): Add TARGET_HAVE_MVE check.
	(arm_modes_tieable_p): Add TARGET_HAVE_MVE check.
	(arm_regno_class): Add VPR_REGNUM check.
	(arm_expand_epilogue_apcs_frame): Add TARGET_HAVE_MVE check.
	(arm_expand_epilogue): Add TARGET_HAVE_MVE check.
	(arm_vector_mode_supported_p): Add TARGET_HAVE_MVE check for MVE vector modes.
	(arm_array_mode_supported_p): Add TARGET_HAVE_MVE check.
	(arm_conditional_register_usage): For TARGET_HAVE_MVE enable VPR register.
	* config/arm/arm.h (IS_VPR_REGNUM): Macro to check for VPR register.
	(FIRST_PSEUDO_REGISTER): Modify.
	(VALID_MVE_MODE): Define.
	(VALID_MVE_SI_MODE): Define.
	(VALID_MVE_SF_MODE): Define.
	(VALID_MVE_STRUCT_MODE): Define.
	(REG_ALLOC_ORDER): Add VPR_REGNUM entry.
	(enum reg_class): Add VPR_REG entry.
	(REG_CLASS_NAMES): Add VPR_REG entry.
	* config/arm/arm.md (VPR_REGNUM): Define.
	(arm_movsf_soft_insn): Add TARGET_HAVE_MVE check to not allow MVE.
	(vfp_pop_multiple_with_writeback): Add TARGET_HAVE_MVE check to allow writeback.
	(include "mve.md"): Include mve.md file.
	* config/arm/arm_mve.h: New file.
	* config/arm/constraints.md (Up): Define.
	* config/arm/iterators.md (VNIM1): Define.
	(VNINOTM1): Define.
	(VSTRUCT): Modify.
	* config/arm/mve.md: New file.
	* config/arm/neon.md:
	(mov<mode>): Add TARGET_HAVE_MVE check.
	(movv4hf): Define.
	(neon_mov<mode>): Add TARGET_HAVE_MVE check.
	(define_split): Add TARGET_HAVE_MVE check.
	(vec_init<mode><V_elem_l>): Add TARGET_HAVE_MVE check.
	* config/arm/predicates.md (vpr_register_operand): Define.
	* config/arm/t-arm: Add mve.md file.
	* config/arm/types.md: Add MVE instructions mve_move, mve_load, mve_store.
	* config/arm/vec-common.md (mov<mode>): Add TARGET_HAVE_MVE check.
	(mov<mode>): Modify iterator.
	(movv8hf): Define

gcc/testsuite/ChangeLog:

2019-11-11  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/mve_vector_float.c: New test.
	* gcc.target/arm/mve/intrinsics/mve_vector_float1.c: Likewise.
	* gcc.target/arm/mve/intrinsics/mve_vector_float2.c: Likewise.
	* gcc.target/arm/mve/intrinsics/mve_vector_int.c: Likewise.
	* gcc.target/arm/mve/intrinsics/mve_vector_int1.c: Likewise.
	* gcc.target/arm/mve/intrinsics/mve_vector_int2.c: Likewise.
	* gcc.target/arm/mve/intrinsics/mve_vector_uint.c: Likewise.
	* gcc.target/arm/mve/intrinsics/mve_vector_uint1.c: Likewise.
	* gcc.target/arm/mve/intrinsics/mve_vector_uint2.c: Likewise.
	* gcc.target/arm/mve/mve.exp: New file.


###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config.gcc b/gcc/config.gcc
index 72f656408f11802c669c3de953bf3020020ca312..c4a7d984936c531d7dfcce347d56b5931913e68b 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -344,7 +344,7 @@ arc*-*-*)
 arm*-*-*)
 	cpu_type=arm
 	extra_objs="arm-builtins.o aarch-common.o"
-	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h"
+	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h arm_mve.h"
 	target_type_format_char='%'
 	c_target_objs="arm-c.o"
 	cxx_target_objs="arm-c.o"
diff --git a/gcc/config/arm/aout.h b/gcc/config/arm/aout.h
index 72782758853a869bcb9a9d69f3fa0da979cd711f..28cde153f704748f35c84d072b59e9695a61e661 100644
--- a/gcc/config/arm/aout.h
+++ b/gcc/config/arm/aout.h
@@ -53,7 +53,9 @@
 /* The assembler's names for the registers.  Note that the ?xx registers are
    there so that VFPv3/NEON registers D16-D31 have the same spacing as D0-D15
    (each of which is overlaid on two S registers), although there are no
-   actual single-precision registers which correspond to D16-D31.  */
+   actual single-precision registers which correspond to D16-D31.  New register
+   p0 is added which is used for MVE predicated cases.  */
+
 #ifndef REGISTER_NAMES
 #define REGISTER_NAMES						\
 {								\
@@ -72,7 +74,7 @@
   "wr8",   "wr9",   "wr10",  "wr11",				\
   "wr12",  "wr13",  "wr14",  "wr15",				\
   "wcgr0", "wcgr1", "wcgr2", "wcgr3",				\
-  "cc", "vfpcc", "sfp", "afp", "apsrq", "apsrge"		\
+  "cc", "vfpcc", "sfp", "afp", "apsrq", "apsrge", "p0"		\
 }
 #endif
 
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 650b22c7ad916d9abd587981e9ed5809755ee035..d4cb0ea3deb49b10266d1620c85e243ed34aee4d 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -667,6 +667,7 @@ enum arm_builtins
   ARM_BUILTIN_SET_FPSCR,
 
   ARM_BUILTIN_CMSE_NONSECURE_CALLER,
+  ARM_BUILTIN_SIMD_LANE_CHECK,
 
 #undef CRYPTO1
 #undef CRYPTO2
@@ -692,7 +693,6 @@ enum arm_builtins
 #include "arm_vfp_builtins.def"
 
   ARM_BUILTIN_NEON_BASE,
-  ARM_BUILTIN_NEON_LANE_CHECK = ARM_BUILTIN_NEON_BASE,
 
 #include "arm_neon_builtins.def"
 
@@ -948,26 +948,35 @@ arm_init_simd_builtin_types (void)
      an entry in our mangling table, consequently, they get default
      mangling.  As a further gotcha, poly8_t and poly16_t are signed
      types, poly64_t and poly128_t are unsigned types.  */
-  arm_simd_polyQI_type_node
-    = build_distinct_type_copy (intQI_type_node);
-  (*lang_hooks.types.register_builtin_type) (arm_simd_polyQI_type_node,
-					     "__builtin_neon_poly8");
-  arm_simd_polyHI_type_node
-    = build_distinct_type_copy (intHI_type_node);
-  (*lang_hooks.types.register_builtin_type) (arm_simd_polyHI_type_node,
-					     "__builtin_neon_poly16");
-  arm_simd_polyDI_type_node
-    = build_distinct_type_copy (unsigned_intDI_type_node);
-  (*lang_hooks.types.register_builtin_type) (arm_simd_polyDI_type_node,
-					     "__builtin_neon_poly64");
-  arm_simd_polyTI_type_node
-    = build_distinct_type_copy (unsigned_intTI_type_node);
-  (*lang_hooks.types.register_builtin_type) (arm_simd_polyTI_type_node,
-					     "__builtin_neon_poly128");
-  /* Prevent front-ends from transforming poly vectors into string
-     literals.  */
-  TYPE_STRING_FLAG (arm_simd_polyQI_type_node) = false;
-  TYPE_STRING_FLAG (arm_simd_polyHI_type_node) = false;
+  if (!TARGET_HAVE_MVE)
+    {
+      arm_simd_polyQI_type_node
+	= build_distinct_type_copy (intQI_type_node);
+      (*lang_hooks.types.register_builtin_type) (arm_simd_polyQI_type_node,
+						 "__builtin_neon_poly8");
+      arm_simd_polyHI_type_node
+	= build_distinct_type_copy (intHI_type_node);
+      (*lang_hooks.types.register_builtin_type) (arm_simd_polyHI_type_node,
+						 "__builtin_neon_poly16");
+      arm_simd_polyDI_type_node
+	= build_distinct_type_copy (unsigned_intDI_type_node);
+      (*lang_hooks.types.register_builtin_type) (arm_simd_polyDI_type_node,
+						 "__builtin_neon_poly64");
+      arm_simd_polyTI_type_node
+	= build_distinct_type_copy (unsigned_intTI_type_node);
+      (*lang_hooks.types.register_builtin_type) (arm_simd_polyTI_type_node,
+						 "__builtin_neon_poly128");
+      /* Init poly vector element types with scalar poly types.  */
+      arm_simd_types[Poly8x8_t].eltype = arm_simd_polyQI_type_node;
+      arm_simd_types[Poly8x16_t].eltype = arm_simd_polyQI_type_node;
+      arm_simd_types[Poly16x4_t].eltype = arm_simd_polyHI_type_node;
+      arm_simd_types[Poly16x8_t].eltype = arm_simd_polyHI_type_node;
+
+      /* Prevent front-ends from transforming poly vectors into string
+	 literals.  */
+      TYPE_STRING_FLAG (arm_simd_polyQI_type_node) = false;
+      TYPE_STRING_FLAG (arm_simd_polyHI_type_node) = false;
+    }
 
   /* Init all the element types built by the front-end.  */
   arm_simd_types[Int8x8_t].eltype = intQI_type_node;
@@ -985,11 +994,6 @@ arm_init_simd_builtin_types (void)
   arm_simd_types[Uint32x4_t].eltype = unsigned_intSI_type_node;
   arm_simd_types[Uint64x2_t].eltype = unsigned_intDI_type_node;
 
-  /* Init poly vector element types with scalar poly types.  */
-  arm_simd_types[Poly8x8_t].eltype = arm_simd_polyQI_type_node;
-  arm_simd_types[Poly8x16_t].eltype = arm_simd_polyQI_type_node;
-  arm_simd_types[Poly16x4_t].eltype = arm_simd_polyHI_type_node;
-  arm_simd_types[Poly16x8_t].eltype = arm_simd_polyHI_type_node;
   /* Note: poly64x2_t is defined in arm_neon.h, to ensure it gets default
      mangling.  */
 
@@ -1006,6 +1010,8 @@ arm_init_simd_builtin_types (void)
       tree eltype = arm_simd_types[i].eltype;
       machine_mode mode = arm_simd_types[i].mode;
 
+      if (eltype == NULL)
+	continue;
       if (arm_simd_types[i].itype == NULL)
 	arm_simd_types[i].itype =
 	  build_distinct_type_copy
@@ -1231,15 +1237,6 @@ arm_init_neon_builtins (void)
      system.  */
   arm_init_simd_builtin_scalar_types ();
 
-  tree lane_check_fpr = build_function_type_list (void_type_node,
-						  intSI_type_node,
-						  intSI_type_node,
-						  NULL);
-  arm_builtin_decls[ARM_BUILTIN_NEON_LANE_CHECK] =
-      add_builtin_function ("__builtin_arm_lane_check", lane_check_fpr,
-			    ARM_BUILTIN_NEON_LANE_CHECK, BUILT_IN_MD,
-			    NULL, NULL_TREE);
-
   for (i = 0; i < ARRAY_SIZE (neon_builtin_data); i++, fcode++)
     {
       arm_builtin_datum *d = &neon_builtin_data[i];
@@ -1956,6 +1953,15 @@ arm_init_builtins (void)
 
   if (TARGET_MAYBE_HARD_FLOAT)
     {
+      tree lane_check_fpr = build_function_type_list (void_type_node,
+						      intSI_type_node,
+						      intSI_type_node,
+						      NULL);
+      arm_builtin_decls[ARM_BUILTIN_SIMD_LANE_CHECK]
+      = add_builtin_function ("__builtin_arm_lane_check", lane_check_fpr,
+			      ARM_BUILTIN_SIMD_LANE_CHECK, BUILT_IN_MD,
+			      NULL, NULL_TREE);
+
       arm_init_neon_builtins ();
       arm_init_vfp_builtins ();
       arm_init_crypto_builtins ();
@@ -2201,6 +2207,47 @@ neon_dereference_pointer (tree exp, tree type, machine_mode mem_mode,
 		      build_int_cst (build_pointer_type (array_type), 0));
 }
 
+/* EXP is a pointer argument to a vector scatter store intrinsics.
+
+   Consider the following example:
+	VSTRW<v>.<dt> Qd, [Qm{, #+/-<imm>}]!
+   When <Qm> used as the base register for the target address,
+   this function is used to derive and return an expression for the
+   accessed memory.
+
+   The intrinsic function operates on a block of registers that has mode
+   REG_MODE.  This block contains vectors of type TYPE_MODE.  The function
+   references the memory at EXP of type TYPE and in mode MEM_MODE.  This
+   mode may be BLKmode if no more suitable mode is available.  */
+
+static tree
+mve_dereference_pointer (tree exp, tree type, machine_mode reg_mode,
+			 machine_mode vector_mode)
+{
+  HOST_WIDE_INT reg_size, vector_size, nelems;
+  tree elem_type, upper_bound, array_type;
+
+  /* Work out the size of each vector in bytes.  */
+  vector_size = GET_MODE_SIZE (vector_mode);
+
+  /* Work out the size of the register block in bytes.  */
+  reg_size = GET_MODE_SIZE (reg_mode);
+
+  /* Work out the type of each element.  */
+  gcc_assert (POINTER_TYPE_P (type));
+  elem_type = TREE_TYPE (type);
+
+  nelems = reg_size / vector_size;
+
+  /* Create a type that describes the full access.  */
+  upper_bound = build_int_cst (size_type_node, nelems - 1);
+  array_type = build_array_type (elem_type, build_index_type (upper_bound));
+
+  /* Dereference EXP using that type.  */
+  return fold_build2 (MEM_REF, array_type, exp,
+		      build_int_cst (build_pointer_type (array_type), 0));
+}
+
 /* Expand a builtin.  */
 static rtx
 arm_expand_builtin_args (rtx target, machine_mode map_mode, int fcode,
@@ -2239,10 +2286,17 @@ arm_expand_builtin_args (rtx target, machine_mode map_mode, int fcode,
             {
               machine_mode other_mode
 		= insn_data[icode].operand[1 - opno].mode;
-              arg[argc] = neon_dereference_pointer (arg[argc],
+	      if (TARGET_HAVE_MVE && mode[argc] != other_mode)
+		{
+		  arg[argc] = mve_dereference_pointer (arg[argc],
 						    TREE_VALUE (formals),
-						    mode[argc], other_mode,
-						    map_mode);
+						    other_mode, map_mode);
+		}
+	      else
+		arg[argc] = neon_dereference_pointer (arg[argc],
+						      TREE_VALUE (formals),
+						      mode[argc], other_mode,
+						      map_mode);
             }
 
 	  /* Use EXPAND_MEMORY for ARG_BUILTIN_MEMORY and
@@ -2548,22 +2602,6 @@ arm_expand_neon_builtin (int fcode, tree exp, rtx target)
       return const0_rtx;
     }
 
-  if (fcode == ARM_BUILTIN_NEON_LANE_CHECK)
-    {
-      /* Builtin is only to check bounds of the lane passed to some intrinsics
-	 that are implemented with gcc vector extensions in arm_neon.h.  */
-
-      tree nlanes = CALL_EXPR_ARG (exp, 0);
-      gcc_assert (TREE_CODE (nlanes) == INTEGER_CST);
-      rtx lane_idx = expand_normal (CALL_EXPR_ARG (exp, 1));
-      if (CONST_INT_P (lane_idx))
-	neon_lane_bounds (lane_idx, 0, TREE_INT_CST_LOW (nlanes), exp);
-      else
-	error ("%Klane index must be a constant immediate", exp);
-      /* Don't generate any RTL.  */
-      return const0_rtx;
-    }
-
   arm_builtin_datum *d
     = &neon_builtin_data[fcode - ARM_BUILTIN_NEON_PATTERN_START];
 
@@ -2625,6 +2663,22 @@ arm_expand_builtin (tree exp,
   int mask;
   int imm;
 
+  if (fcode == ARM_BUILTIN_SIMD_LANE_CHECK)
+    {
+      /* Builtin is only to check bounds of the lane passed to some intrinsics
+	 that are implemented with gcc vector extensions in arm_neon.h.  */
+
+      tree nlanes = CALL_EXPR_ARG (exp, 0);
+      gcc_assert (TREE_CODE (nlanes) == INTEGER_CST);
+      rtx lane_idx = expand_normal (CALL_EXPR_ARG (exp, 1));
+      if (CONST_INT_P (lane_idx))
+	neon_lane_bounds (lane_idx, 0, TREE_INT_CST_LOW (nlanes), exp);
+      else
+	error ("%Klane index must be a constant immediate", exp);
+      /* Don't generate any RTL.  */
+      return const0_rtx;
+    }
+
   if (fcode >= ARM_BUILTIN_ACLE_BASE)
     return arm_expand_acle_builtin (fcode, exp, target);
 
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 34695fa0112e90e4bdf317da0b9fd1d3194bf0a2..0fe7d371c348818f25901c5d84be94589523c9a6 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -79,6 +79,16 @@ arm_cpu_builtins (struct cpp_reader* pfile)
   def_or_undef_macro (pfile, "__ARM_FEATURE_COMPLEX", TARGET_COMPLEX);
   def_or_undef_macro (pfile, "__ARM_32BIT_STATE", TARGET_32BIT);
 
+  cpp_undef (pfile, "__ARM_FEATURE_MVE");
+  if (TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT)
+    {
+      builtin_define_with_int_value ("__ARM_FEATURE_MVE", 3);
+    }
+  else if (TARGET_HAVE_MVE)
+    {
+      builtin_define_with_int_value ("__ARM_FEATURE_MVE", 1);
+    }
+
   cpp_undef (pfile, "__ARM_FEATURE_CMSE");
   if (arm_arch8 && !arm_arch_notm)
     {
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 5b49049cc45c0bccfa9d67eac0940250fc5dd95a..d4612ae4553697989611d772f7bb0061a04b98b6 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -85,7 +85,7 @@ extern bool ldm_stm_operation_p (rtx, bool, machine_mode mode,
 extern bool clear_operation_p (rtx, bool);
 extern int arm_const_double_rtx (rtx);
 extern int vfp3_const_double_rtx (rtx);
-extern int neon_immediate_valid_for_move (rtx, machine_mode, rtx *, int *);
+extern int simd_immediate_valid_for_move (rtx, machine_mode, rtx *, int *);
 extern int neon_immediate_valid_for_logic (rtx, machine_mode, int, rtx *,
 					   int *);
 extern int neon_immediate_valid_for_shift (rtx, machine_mode, rtx *,
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 8b07c423fb6b071642fccc48424fe244d97dcbc2..c755df420b52798773ee99f54faf6689d4a16215 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -751,7 +751,8 @@ extern int arm_arch_cmse;
 /*	s0-s15		VFP scratch (aka d0-d7).
 	s16-s31	      S	VFP variable (aka d8-d15).
 	vfpcc		Not a real register.  Represents the VFP condition
-			code flags.  */
+			code flags.
+	vpr		Used to represent MVE VPR predication.  */
 
 /* The stack backtrace structure is as follows:
   fp points to here:  |  save code pointer  |      [fp]
@@ -792,7 +793,7 @@ extern int arm_arch_cmse;
   1,1,1,1,1,1,1,1,		\
   1,1,1,1,			\
   /* Specials.  */		\
-  1,1,1,1,1,1			\
+  1,1,1,1,1,1,1			\
 }
 
 /* 1 for registers not available across function calls.
@@ -822,7 +823,7 @@ extern int arm_arch_cmse;
   1,1,1,1,1,1,1,1,		\
   1,1,1,1,			\
   /* Specials.  */		\
-  1,1,1,1,1,1			\
+  1,1,1,1,1,1,1			\
 }
 
 #ifndef SUBTARGET_CONDITIONAL_REGISTER_USAGE
@@ -998,10 +999,10 @@ extern int arm_arch_cmse;
    && (LAST_VFP_REGNUM - (REGNUM) >= 2 * (N) - 1))
 
 /* The number of hard registers is 16 ARM + 1 CC + 1 SFP + 1 AFP
-   + 1 APSRQ + 1 APSRGE.  */
+   + 1 APSRQ + 1 APSRGE + 1 VPR.  */
 /* Intel Wireless MMX Technology registers add 16 + 4 more.  */
 /* VFP (VFP3) adds 32 (64) + 1 VFPCC.  */
-#define FIRST_PSEUDO_REGISTER   106
+#define FIRST_PSEUDO_REGISTER   107
 
 #define DBX_REGISTER_NUMBER(REGNO) arm_dbx_register_number (REGNO)
 
@@ -1029,11 +1030,26 @@ extern int arm_arch_cmse;
   ((MODE) == V4SImode || (MODE) == V8HImode || (MODE) == V16QImode \
    || (MODE) == V8HFmode || (MODE) == V4SFmode || (MODE) == V2DImode)
 
+#define VALID_MVE_MODE(MODE) \
+  ((MODE) == V2DImode ||(MODE) == V4SImode || (MODE) == V8HImode \
+   || (MODE) == V16QImode || (MODE) == V8HFmode || (MODE) == V4SFmode \
+   || (MODE) == V2DFmode)
+
+#define VALID_MVE_SI_MODE(MODE) \
+  ((MODE) == V2DImode ||(MODE) == V4SImode || (MODE) == V8HImode \
+   || (MODE) == V16QImode)
+
+#define VALID_MVE_SF_MODE(MODE) \
+  ((MODE) == V8HFmode || (MODE) == V4SFmode || (MODE) == V2DFmode)
+
 /* Structure modes valid for Neon registers.  */
 #define VALID_NEON_STRUCT_MODE(MODE) \
   ((MODE) == TImode || (MODE) == EImode || (MODE) == OImode \
    || (MODE) == CImode || (MODE) == XImode)
 
+#define VALID_MVE_STRUCT_MODE(MODE) \
+  ((MODE) == TImode || (MODE) == OImode || (MODE) == XImode)
+
 /* The register numbers in sequence, for passing to arm_gen_load_multiple.  */
 extern int arm_regs_in_sequence[];
 
@@ -1085,9 +1101,13 @@ extern int arm_regs_in_sequence[];
   /* Registers not for general use.  */		\
   CC_REGNUM, VFPCC_REGNUM,			\
   FRAME_POINTER_REGNUM, ARG_POINTER_REGNUM,	\
-  SP_REGNUM, PC_REGNUM, APSRQ_REGNUM, APSRGE_REGNUM	\
+  SP_REGNUM, PC_REGNUM, APSRQ_REGNUM, APSRGE_REGNUM,	\
+  VPR_REGNUM					\
 }
 
+#define IS_VPR_REGNUM(REGNUM) \
+  ((REGNUM) == VPR_REGNUM)
+
 /* Use different register alloc ordering for Thumb.  */
 #define ADJUST_REG_ALLOC_ORDER arm_order_regs_for_local_alloc ()
 
@@ -1124,6 +1144,7 @@ enum reg_class
   VFPCC_REG,
   SFP_REG,
   AFP_REG,
+  VPR_REG,
   ALL_REGS,
   LIM_REG_CLASSES
 };
@@ -1131,7 +1152,7 @@ enum reg_class
 #define N_REG_CLASSES  (int) LIM_REG_CLASSES
 
 /* Give names of register classes as strings for dump file.  */
-#define REG_CLASS_NAMES  \
+#define REG_CLASS_NAMES \
 {			\
   "NO_REGS",		\
   "LO_REGS",		\
@@ -1151,6 +1172,7 @@ enum reg_class
   "VFPCC_REG",		\
   "SFP_REG",		\
   "AFP_REG",		\
+  "VPR_REG",		\
   "ALL_REGS"		\
 }
 
@@ -1177,7 +1199,8 @@ enum reg_class
   { 0x00000000, 0x00000000, 0x00000000, 0x00000020 }, /* VFPCC_REG */	\
   { 0x00000000, 0x00000000, 0x00000000, 0x00000040 }, /* SFP_REG */	\
   { 0x00000000, 0x00000000, 0x00000000, 0x00000080 }, /* AFP_REG */	\
-  { 0xFFFF7FFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x0000000F }  /* ALL_REGS */	\
+  { 0x00000000, 0x00000000, 0x00000000, 0x00000100 }, /* VPR_REG.  */	\
+  { 0xFFFF7FFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x0000010F }  /* ALL_REGS.  */	\
 }
 
 #define FP_SYSREGS \
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 883c2a9179d7e6d69225f8d104228d15702ecef7..6faed76206b93c1a9dea048e2f693dc16ee58072 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3759,7 +3759,8 @@ arm_options_perform_arch_sanity_checks (void)
       else if (TARGET_HARD_FLOAT_ABI)
 	{
 	  arm_pcs_default = ARM_PCS_AAPCS_VFP;
-	  if (!bitmap_bit_p (arm_active_target.isa, isa_bit_vfpv2))
+	  if (!bitmap_bit_p (arm_active_target.isa, isa_bit_vfpv2)
+	      && !bitmap_bit_p (arm_active_target.isa, isa_bit_mve))
 	    error ("%<-mfloat-abi=hard%>: selected processor lacks an FPU");
 	}
       else
@@ -4230,7 +4231,7 @@ use_return_insn (int iscond, rtx sibling)
 
   /* Can't be done if any of the VFP regs are pushed,
      since this also requires an insn.  */
-  if (TARGET_HARD_FLOAT)
+  if (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
     for (regno = FIRST_VFP_REGNUM; regno <= LAST_VFP_REGNUM; regno++)
       if (df_regs_ever_live_p (regno) && !call_used_or_fixed_reg_p (regno))
 	return 0;
@@ -6289,7 +6290,7 @@ aapcs_vfp_allocate (CUMULATIVE_ARGS *pcum, machine_mode mode,
       {
 	pcum->aapcs_vfp_reg_alloc = mask << regno;
 	if (mode == BLKmode
-	    || (mode == TImode && ! TARGET_NEON)
+	    || (mode == TImode && ! (TARGET_NEON || TARGET_HAVE_MVE))
 	    || ! arm_hard_regno_mode_ok (FIRST_VFP_REGNUM + regno, mode))
 	  {
 	    int i;
@@ -6297,7 +6298,7 @@ aapcs_vfp_allocate (CUMULATIVE_ARGS *pcum, machine_mode mode,
 	    int rshift = shift;
 	    machine_mode rmode = pcum->aapcs_vfp_rmode;
 	    rtx par;
-	    if (!TARGET_NEON)
+	    if (!(TARGET_NEON || TARGET_HAVE_MVE))
 	      {
 		/* Avoid using unsupported vector modes.  */
 		if (rmode == V2SImode)
@@ -6343,7 +6344,7 @@ aapcs_vfp_allocate_return_reg (enum arm_pcs pcs_variant ATTRIBUTE_UNUSED,
   if (mode == BLKmode
       || (GET_MODE_CLASS (mode) == MODE_INT
 	  && GET_MODE_SIZE (mode) >= GET_MODE_SIZE (TImode)
-	  && !TARGET_NEON))
+	  && !(TARGET_NEON || TARGET_HAVE_MVE)))
     {
       int count;
       machine_mode ag_mode;
@@ -6354,7 +6355,7 @@ aapcs_vfp_allocate_return_reg (enum arm_pcs pcs_variant ATTRIBUTE_UNUSED,
       aapcs_vfp_is_call_or_return_candidate (pcs_variant, mode, type,
 					     &ag_mode, &count);
 
-      if (!TARGET_NEON)
+      if (!(TARGET_NEON || TARGET_HAVE_MVE))
 	{
 	  if (ag_mode == V2SImode)
 	    ag_mode = DImode;
@@ -8253,7 +8254,9 @@ thumb2_legitimate_address_p (machine_mode mode, rtx x, int strict_p)
 		   && CONST_INT_P (XEXP (XEXP (x, 0), 1)))))
     return 1;
 
-  else if (mode == TImode || (TARGET_NEON && VALID_NEON_STRUCT_MODE (mode)))
+  else if (mode == TImode
+	   || (TARGET_NEON && VALID_NEON_STRUCT_MODE (mode))
+	   || (TARGET_HAVE_MVE && VALID_MVE_STRUCT_MODE (mode)))
     return 0;
 
   else if (code == PLUS)
@@ -9800,7 +9803,7 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
 	  /* Assume that most copies can be done with a single insn,
 	     unless we don't have HW FP, in which case everything
 	     larger than word mode will require two insns.  */
-	  *cost = COSTS_N_INSNS (((!TARGET_HARD_FLOAT
+	  *cost = COSTS_N_INSNS (((!(TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
 				   && GET_MODE_SIZE (mode) > 4)
 				  || mode == DImode)
 				 ? 2 : 1);
@@ -11281,10 +11284,10 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
 
     case CONST_VECTOR:
       /* Fixme.  */
-      if (TARGET_NEON
-	  && TARGET_HARD_FLOAT
-	  && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode))
-	  && neon_immediate_valid_for_move (x, mode, NULL, NULL))
+      if (((TARGET_NEON && TARGET_HARD_FLOAT
+	    && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode)))
+	   || TARGET_HAVE_MVE)
+	  && simd_immediate_valid_for_move (x, mode, NULL, NULL))
 	*cost = COSTS_N_INSNS (1);
       else
 	*cost = COSTS_N_INSNS (4);
@@ -12328,8 +12331,8 @@ vfp3_const_double_rtx (rtx x)
   return vfp3_const_double_index (x) != -1;
 }
 
-/* Recognize immediates which can be used in various Neon instructions. Legal
-   immediates are described by the following table (for VMVN variants, the
+/* Recognize immediates which can be used in various Neon and MVE instructions.
+   Legal immediates are described by the following table (for VMVN variants, the
    bitwise inverse of the constant shown is recognized. In either case, VMOV
    is output and the correct instruction to use for a given constant is chosen
    by the assembler). The constant shown is replicated across all elements of
@@ -12380,7 +12383,7 @@ vfp3_const_double_rtx (rtx x)
    -1 if the given value doesn't match any of the listed patterns.
 */
 static int
-neon_valid_immediate (rtx op, machine_mode mode, int inverse,
+simd_valid_immediate (rtx op, machine_mode mode, int inverse,
 		      rtx *modconst, int *elementwidth)
 {
 #define CHECK(STRIDE, ELSIZE, CLASS, TEST)	\
@@ -12412,6 +12415,10 @@ neon_valid_immediate (rtx op, machine_mode mode, int inverse,
 
   innersize = GET_MODE_UNIT_SIZE (mode);
 
+  /* Only support 128-bit vectors for MVE.  */
+  if (TARGET_HAVE_MVE && (!vector || n_elts * innersize != 16))
+    return -1;
+
   /* Vectors of float constants.  */
   if (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)
     {
@@ -12560,18 +12567,19 @@ neon_valid_immediate (rtx op, machine_mode mode, int inverse,
 #undef CHECK
 }
 
-/* Return TRUE if rtx X is legal for use as either a Neon VMOV (or, implicitly,
-   VMVN) immediate. Write back width per element to *ELEMENTWIDTH (or zero for
-   float elements), and a modified constant (whatever should be output for a
-   VMOV) in *MODCONST.  */
-
+/* Return TRUE if rtx X is legal for use as either a Neon or MVE VMOV (or,
+   implicitly, VMVN) immediate.  Write back width per element to *ELEMENTWIDTH
+   (or zero for float elements), and a modified constant (whatever should be
+   output for a VMOV) in *MODCONST.  "neon_immediate_valid_for_move" function is
+   modified to "simd_immediate_valid_for_move" as this function will be used
+   both by neon and mve.  */
 int
-neon_immediate_valid_for_move (rtx op, machine_mode mode,
+simd_immediate_valid_for_move (rtx op, machine_mode mode,
 			       rtx *modconst, int *elementwidth)
 {
   rtx tmpconst;
   int tmpwidth;
-  int retval = neon_valid_immediate (op, mode, 0, &tmpconst, &tmpwidth);
+  int retval = simd_valid_immediate (op, mode, 0, &tmpconst, &tmpwidth);
 
   if (retval == -1)
     return 0;
@@ -12588,7 +12596,7 @@ neon_immediate_valid_for_move (rtx op, machine_mode mode,
 /* Return TRUE if rtx X is legal for use in a VORR or VBIC instruction.  If
    the immediate is valid, write a constant suitable for using as an operand
    to VORR/VBIC/VAND/VORN to *MODCONST and the corresponding element width to
-   *ELEMENTWIDTH. See neon_valid_immediate for description of INVERSE.  */
+   *ELEMENTWIDTH.  See simd_valid_immediate for description of INVERSE.  */
 
 int
 neon_immediate_valid_for_logic (rtx op, machine_mode mode, int inverse,
@@ -12596,7 +12604,7 @@ neon_immediate_valid_for_logic (rtx op, machine_mode mode, int inverse,
 {
   rtx tmpconst;
   int tmpwidth;
-  int retval = neon_valid_immediate (op, mode, inverse, &tmpconst, &tmpwidth);
+  int retval = simd_valid_immediate (op, mode, inverse, &tmpconst, &tmpwidth);
 
   if (retval < 0 || retval > 5)
     return 0;
@@ -12803,7 +12811,7 @@ neon_make_constant (rtx vals)
     gcc_unreachable ();
 
   if (const_vec != NULL
-      && neon_immediate_valid_for_move (const_vec, mode, NULL, NULL))
+      && simd_immediate_valid_for_move (const_vec, mode, NULL, NULL))
     /* Load using VMOV.  On Cortex-A8 this takes one cycle.  */
     return const_vec;
   else if ((target = neon_vdup_constant (vals)) != NULL_RTX)
@@ -13080,6 +13088,15 @@ neon_vector_mem_operand (rtx op, int type, bool strict)
       && (INTVAL (XEXP (ind, 1)) & 3) == 0)
     return TRUE;
 
+  if (type == 1 && TARGET_HAVE_MVE
+      && (GET_CODE (ind) == POST_INC || GET_CODE (ind) == PRE_DEC))
+    {
+      rtx ind1 = XEXP (ind, 0);
+      if (!REG_P (ind1))
+	return 0;
+      return NEON_REGNO_OK_FOR_QUAD (REGNO (ind1));
+    }
+
   return FALSE;
 }
 
@@ -19936,7 +19953,7 @@ output_move_neon (rtx *operands)
     {
     case POST_INC:
       /* We have to use vldm / vstm for too-large modes.  */
-      if (nregs > 4)
+      if (nregs > 4 || (TARGET_HAVE_MVE && nregs >= 2))
 	{
 	  templ = "v%smia%%?\t%%0!, %%h1";
 	  ops[0] = XEXP (addr, 0);
@@ -19965,7 +19982,7 @@ output_move_neon (rtx *operands)
       /* We have to use vldm / vstm for too-large modes.  */
       if (nregs > 1)
 	{
-	  if (nregs > 4)
+	  if (nregs > 4 || (TARGET_HAVE_MVE && nregs >= 2))
 	    templ = "v%smia%%?\t%%m0, %%h1";
 	  else
 	    templ = "v%s1.64\t%%h1, %%A0";
@@ -19980,29 +19997,40 @@ output_move_neon (rtx *operands)
       {
 	int i;
 	int overlap = -1;
-	for (i = 0; i < nregs; i++)
+	if (TARGET_HAVE_MVE && !BYTES_BIG_ENDIAN)
 	  {
-	    /* We're only using DImode here because it's a convenient size.  */
-	    ops[0] = gen_rtx_REG (DImode, REGNO (reg) + 2 * i);
-	    ops[1] = adjust_address (mem, DImode, 8 * i);
-	    if (reg_overlap_mentioned_p (ops[0], mem))
+	    sprintf (buff, "v%srw.32\t%%q0, %%1", load ? "ld" : "st");
+	    ops[0] = reg;
+	    ops[1] = mem;
+	    output_asm_insn (buff, ops);
+	  }
+	else
+	  {
+	    for (i = 0; i < nregs; i++)
 	      {
-		gcc_assert (overlap == -1);
-		overlap = i;
+		/* We're only using DImode here because it's a convenient
+		   size.  */
+		ops[0] = gen_rtx_REG (DImode, REGNO (reg) + 2 * i);
+		ops[1] = adjust_address (mem, DImode, 8 * i);
+		if (reg_overlap_mentioned_p (ops[0], mem))
+		  {
+		    gcc_assert (overlap == -1);
+		    overlap = i;
+		  }
+		else
+		  {
+		    sprintf (buff, "v%sr%%?\t%%P0, %%1", load ? "ld" : "st");
+		    output_asm_insn (buff, ops);
+		  }
 	      }
-	    else
+	    if (overlap != -1)
 	      {
+		ops[0] = gen_rtx_REG (DImode, REGNO (reg) + 2 * overlap);
+		ops[1] = adjust_address (mem, SImode, 8 * overlap);
 		sprintf (buff, "v%sr%%?\t%%P0, %%1", load ? "ld" : "st");
 		output_asm_insn (buff, ops);
 	      }
 	  }
-	if (overlap != -1)
-	  {
-	    ops[0] = gen_rtx_REG (DImode, REGNO (reg) + 2 * overlap);
-	    ops[1] = adjust_address (mem, SImode, 8 * overlap);
-	    sprintf (buff, "v%sr%%?\t%%P0, %%1", load ? "ld" : "st");
-	    output_asm_insn (buff, ops);
-	  }
 
         return "";
       }
@@ -22223,7 +22251,7 @@ arm_compute_frame_layout (void)
       func_type = arm_current_func_type ();
       /* Space for saved VFP registers.  */
       if (! IS_VOLATILE (func_type)
-	  && TARGET_HARD_FLOAT)
+	  && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE))
 	saved += arm_get_vfp_saved_size ();
 
       /* Allocate space for saving/restoring FPCXTNS in Armv8.1-M Mainline
@@ -22447,7 +22475,7 @@ arm_save_coproc_regs(void)
 	saved_size += 8;
       }
 
-  if (TARGET_HARD_FLOAT)
+  if (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
     {
       start_reg = FIRST_VFP_REGNUM;
 
@@ -23749,6 +23777,53 @@ arm_print_operand (FILE *stream, rtx x, int code)
       }
       return;
 
+    /* To print the memory operand with "Us" constraint.  Based on the rtx_code
+       the memory operands output looks like following.
+       1. [Rn], #+/-<imm>
+       2. [Rn, #+/-<imm>]!
+       3. [Rn].  */
+    case 'E':
+      {
+	rtx addr;
+	rtx postinc_reg = NULL;
+	unsigned inc_val = 0;
+	enum rtx_code code;
+
+	gcc_assert (MEM_P (x));
+	addr = XEXP (x, 0);
+	code = GET_CODE (addr);
+	if (code == POST_INC || code == POST_DEC || code == PRE_INC
+	    || code  == PRE_DEC)
+	  {
+	    asm_fprintf (stream, "[%r", REGNO (XEXP (addr, 0)));
+	    inc_val = GET_MODE_SIZE (GET_MODE (x));
+	    if (code == POST_INC || code == POST_DEC)
+	      asm_fprintf (stream, "], #%s%d",(code == POST_INC)
+					      ? "": "-", inc_val);
+	    else
+	      asm_fprintf (stream, ", #%s%d]!",(code == PRE_INC)
+					       ? "": "-", inc_val);
+	  }
+	else if (code == POST_MODIFY || code == PRE_MODIFY)
+	  {
+	    asm_fprintf (stream, "[%r", REGNO (XEXP (addr, 0)));
+	    postinc_reg = XEXP ( XEXP (x, 1), 1);
+	    if (postinc_reg && CONST_INT_P (postinc_reg))
+	      {
+		if (code == POST_MODIFY)
+		  asm_fprintf (stream, "], #%wd",INTVAL (postinc_reg));
+		else
+		  asm_fprintf (stream, ", #%wd]!",INTVAL (postinc_reg));
+	      }
+	  }
+	else
+	  {
+	    gcc_assert (REG_P (addr));
+	    asm_fprintf (stream, "[%r]",REGNO (addr));
+	  }
+      }
+      return;
+
     case 'C':
       {
 	rtx addr;
@@ -23926,9 +24001,10 @@ arm_print_operand_address (FILE *stream, machine_mode mode, rtx x)
 			 REGNO (XEXP (x, 0)),
 			 GET_CODE (x) == PRE_DEC ? "-" : "",
 			 GET_MODE_SIZE (mode));
+	  else if (TARGET_HAVE_MVE && (mode == OImode || mode == XImode))
+	    asm_fprintf (stream, "[%r]!", REGNO (XEXP (x,0)));
 	  else
-	    asm_fprintf (stream, "[%r], #%s%d",
-			 REGNO (XEXP (x, 0)),
+	    asm_fprintf (stream, "[%r], #%s%d", REGNO (XEXP (x, 0)),
 			 GET_CODE (x) == POST_DEC ? "-" : "",
 			 GET_MODE_SIZE (mode));
 	}
@@ -24773,12 +24849,15 @@ arm_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
 {
   if (GET_MODE_CLASS (mode) == MODE_CC)
     return (regno == CC_REGNUM
-	    || (TARGET_HARD_FLOAT
+	    || ((TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
 		&& regno == VFPCC_REGNUM));
 
   if (regno == CC_REGNUM && GET_MODE_CLASS (mode) != MODE_CC)
     return false;
 
+  if (IS_VPR_REGNUM (regno))
+    return true;
+
   if (TARGET_THUMB1)
     /* For the Thumb we only allow values bigger than SImode in
        registers 0 - 6, so that there is always a second low
@@ -24787,7 +24866,7 @@ arm_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
        start of an even numbered register pair.  */
     return (ARM_NUM_REGS (mode) < 2) || (regno < LAST_LO_REGNUM);
 
-  if (TARGET_HARD_FLOAT && IS_VFP_REGNUM (regno))
+  if ((TARGET_HARD_FLOAT || TARGET_HAVE_MVE) && IS_VFP_REGNUM (regno))
     {
       if (mode == SFmode || mode == SImode)
 	return VFP_REGNO_OK_FOR_SINGLE (regno);
@@ -24811,6 +24890,10 @@ arm_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
 	       || (mode == OImode && NEON_REGNO_OK_FOR_NREGS (regno, 4))
 	       || (mode == CImode && NEON_REGNO_OK_FOR_NREGS (regno, 6))
 	       || (mode == XImode && NEON_REGNO_OK_FOR_NREGS (regno, 8));
+     if (TARGET_HAVE_MVE)
+       return ((VALID_MVE_MODE (mode) && NEON_REGNO_OK_FOR_QUAD (regno))
+	       || (mode == OImode && NEON_REGNO_OK_FOR_NREGS (regno, 4))
+	       || (mode == XImode && NEON_REGNO_OK_FOR_NREGS (regno, 8)));
 
       return false;
     }
@@ -24859,13 +24942,18 @@ arm_modes_tieable_p (machine_mode mode1, machine_mode mode2)
   /* We specifically want to allow elements of "structure" modes to
      be tieable to the structure.  This more general condition allows
      other rarer situations too.  */
-  if (TARGET_NEON
-      && (VALID_NEON_DREG_MODE (mode1)
-	  || VALID_NEON_QREG_MODE (mode1)
-	  || VALID_NEON_STRUCT_MODE (mode1))
-      && (VALID_NEON_DREG_MODE (mode2)
-	  || VALID_NEON_QREG_MODE (mode2)
-	  || VALID_NEON_STRUCT_MODE (mode2)))
+  if ((TARGET_NEON
+       && (VALID_NEON_DREG_MODE (mode1)
+	   || VALID_NEON_QREG_MODE (mode1)
+	   || VALID_NEON_STRUCT_MODE (mode1))
+       && (VALID_NEON_DREG_MODE (mode2)
+	   || VALID_NEON_QREG_MODE (mode2)
+	   || VALID_NEON_STRUCT_MODE (mode2)))
+      || (TARGET_HAVE_MVE
+	  && (VALID_MVE_MODE (mode1)
+	      || VALID_MVE_STRUCT_MODE (mode1))
+	  && (VALID_MVE_MODE (mode2)
+	      || VALID_MVE_STRUCT_MODE (mode2))))
     return true;
 
   return false;
@@ -24880,6 +24968,9 @@ arm_regno_class (int regno)
   if (regno == PC_REGNUM)
     return NO_REGS;
 
+  if (IS_VPR_REGNUM (regno))
+    return VPR_REG;
+
   if (TARGET_THUMB1)
     {
       if (regno == STACK_POINTER_REGNUM)
@@ -26731,7 +26822,7 @@ arm_expand_epilogue_apcs_frame (bool really_return)
         floats_from_frame += 4;
       }
 
-  if (TARGET_HARD_FLOAT)
+  if (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
     {
       int start_reg;
       rtx ip_rtx = gen_rtx_REG (SImode, IP_REGNUM);
@@ -26977,7 +27068,7 @@ arm_expand_epilogue (bool really_return)
         }
     }
 
-  if (TARGET_HARD_FLOAT)
+  if (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
     {
       /* Generate VFP register multi-pop.  */
       int end_reg = LAST_VFP_REGNUM + 1;
@@ -27148,7 +27239,7 @@ arm_expand_epilogue (bool really_return)
 						   GEN_INT (FPCXTNS_ENUM)));
 	  RTX_FRAME_RELATED_P (insn) = 1;
 	}
-      }
+    }
 
   if (!really_return)
     return;
@@ -28370,6 +28461,15 @@ arm_vector_mode_supported_p (machine_mode mode)
       || mode == V2HAmode))
     return true;
 
+  if (TARGET_HAVE_MVE
+      && (mode == V2DImode || mode == V4SImode || mode == V8HImode
+	  || mode == V16QImode))
+      return true;
+
+  if (TARGET_HAVE_MVE_FLOAT
+      && (mode == V2DFmode || mode == V4SFmode || mode == V8HFmode))
+      return true;
+
   return false;
 }
 
@@ -28387,6 +28487,10 @@ arm_array_mode_supported_p (machine_mode mode,
       && (nelems >= 2 && nelems <= 4))
     return true;
 
+  if (TARGET_HAVE_MVE && !BYTES_BIG_ENDIAN
+      && VALID_MVE_MODE (mode) && (nelems == 2 || nelems == 4))
+    return true;
+
   return false;
 }
 
@@ -29435,7 +29539,7 @@ arm_conditional_register_usage (void)
   if (TARGET_THUMB1)
     fixed_regs[LR_REGNUM] = call_used_regs[LR_REGNUM] = 1;
 
-  if (TARGET_32BIT && TARGET_HARD_FLOAT)
+  if (TARGET_32BIT && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE))
     {
       /* VFPv3 registers are disabled when earlier VFP
 	 versions are selected due to the definition of
@@ -29447,6 +29551,8 @@ arm_conditional_register_usage (void)
 	  call_used_regs[regno] = regno < FIRST_VFP_REGNUM + 16
 	    || regno >= FIRST_VFP_REGNUM + 32;
 	}
+      if (TARGET_HAVE_MVE)
+	fixed_regs[VPR_REGNUM] = 0;
     }
 
   if (TARGET_REALLY_IWMMXT && !TARGET_GENERAL_REGS_ONLY)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index c62ad1b360ebecd5368e90ea5634488eef22f2fc..689baa0b0ff63ef90f47d2fd844cb98c9a1457a0 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -41,6 +41,7 @@
    (VFPCC_REGNUM    101)	; VFP Condition code pseudo register
    (APSRQ_REGNUM    104)	; Q bit pseudo register
    (APSRGE_REGNUM   105)	; GE bits pseudo register
+   (VPR_REGNUM      106)	; Vector Predication Register - MVE register.
   ]
 )
 ;; 3rd operand to select_dominance_cc_mode
@@ -7293,7 +7294,7 @@
   [(set (match_operand:SF 0 "nonimmediate_operand" "=r,r,m")
 	(match_operand:SF 1 "general_operand"  "r,mE,r"))]
   "TARGET_32BIT
-   && TARGET_SOFT_FLOAT
+   && TARGET_SOFT_FLOAT && !TARGET_HAVE_MVE
    && (!MEM_P (operands[0])
        || register_operand (operands[1], SFmode))"
 {
@@ -7416,8 +7417,8 @@
 
 (define_insn "*movdf_soft_insn"
   [(set (match_operand:DF 0 "nonimmediate_soft_df_operand" "=r,r,r,r,m")
-	(match_operand:DF 1 "soft_df_operand" "rDa,Db,Dc,mF,r"))]
-  "TARGET_32BIT && TARGET_SOFT_FLOAT
+       (match_operand:DF 1 "soft_df_operand" "rDa,Db,Dc,mF,r"))]
+  "TARGET_32BIT && TARGET_SOFT_FLOAT && !TARGET_HAVE_MVE
    && (   register_operand (operands[0], DFmode)
        || register_operand (operands[1], DFmode))"
   "*
@@ -11681,7 +11682,7 @@
                    (match_operand:SI 2 "const_int_I_operand" "I")))
      (set (match_operand:DF 3 "vfp_hard_register_operand" "")
           (mem:DF (match_dup 1)))])]
-  "TARGET_32BIT && TARGET_HARD_FLOAT"
+  "TARGET_32BIT && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)"
   "*
   {
     int num_regs = XVECLEN (operands[0], 0);
@@ -12624,7 +12625,7 @@
    (set_attr "length" "8")]
 )
 
-;; Vector bits common to IWMMXT and Neon
+;; Vector bits common to IWMMXT, Neon and MVE
 (include "vec-common.md")
 ;; Load the Intel Wireless Multimedia Extension patterns
 (include "iwmmxt.md")
@@ -12642,3 +12643,5 @@
 (include "sync.md")
 ;; Fixed-point patterns
 (include "arm-fixed.md")
+;; M-profile Vector Extensions
+(include "mve.md")
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
new file mode 100644
index 0000000000000000000000000000000000000000..5ffb466596b5d8fc330616a6fcc7ee37d3e28def
--- /dev/null
+++ b/gcc/config/arm/arm_mve.h
@@ -0,0 +1,59 @@
+/* Arm MVE intrinsics include file.
+
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   Contributed by Arm.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _GCC_ARM_MVE_H
+#define _GCC_ARM_MVE_H
+
+#if !__ARM_FEATURE_MVE
+#error "MVE feature not supported"
+#endif
+
+#include <stdint.h>
+#ifndef  __cplusplus
+#include <stdbool.h>
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
+typedef __fp16 float16_t;
+typedef float float32_t;
+typedef __simd128_float16_t float16x8_t;
+typedef __simd128_float32_t float32x4_t;
+#endif
+
+typedef uint16_t mve_pred16_t;
+typedef __simd128_uint8_t uint8x16_t;
+typedef __simd128_uint16_t uint16x8_t;
+typedef __simd128_uint32_t uint32x4_t;
+typedef __simd128_uint64_t uint64x2_t;
+typedef __simd128_int8_t int8x16_t;
+typedef __simd128_int16_t int16x8_t;
+typedef __simd128_int32_t int32x4_t;
+typedef __simd128_int64_t int64x2_t;
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _GCC_ARM_MVE_H.  */
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 6f309b95cc1874ac7bc69e435781070e0c9cb70a..f77084a0efd489491372bb1dafbc0cd585f0f518 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -44,6 +44,8 @@
 ;; in Thumb state: Uu, Uw
 ;; in all states: Q
 
+(define_register_constraint "Up" "TARGET_HAVE_MVE ? VPR_REG : NO_REGS"
+  "MVE VPR register")
 
 (define_register_constraint "t" "TARGET_32BIT ? VFP_LO_REGS : NO_REGS"
  "The VFP registers @code{s0}-@code{s31}.")
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index c412851843f4468c2c18bce264288705e076ac50..e30325bc1652d378be2544fa32269c5c4294d7e9 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -62,6 +62,12 @@
 ;; Integer and float modes supported by Neon and IWMMXT.
 (define_mode_iterator VALL [V2DI V2SI V4HI V8QI V2SF V4SI V8HI V16QI V4SF])
 
+;; Integer and float modes supported by Neon, IWMMXT and MVE.
+(define_mode_iterator VNIM1 [V16QI V8HI V4SI V4SF V2DI])
+
+;; Integer and float modes supported by Neon and IWMMXT but not MVE.
+(define_mode_iterator VNINOTM1 [V2SI V4HI V8QI V2SF])
+
 ;; Integer and float modes supported by Neon and IWMMXT, except V2DI.
 (define_mode_iterator VALLW [V2SI V4HI V8QI V2SF V4SI V8HI V16QI V4SF])
 
@@ -105,7 +111,8 @@
 (define_mode_iterator VQXMOV [V16QI V8HI V8HF V4SI V4SF V2DI TI])
 
 ;; Opaque structure types wider than TImode.
-(define_mode_iterator VSTRUCT [EI OI CI XI])
+(define_mode_iterator VSTRUCT [(EI "!TARGET_HAVE_MVE") OI
+			       (CI "!TARGET_HAVE_MVE") XI])
 
 ;; Opaque structure types used in table lookups (except vtbl1/vtbx1).
 (define_mode_iterator VTAB [TI EI OI])
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
new file mode 100644
index 0000000000000000000000000000000000000000..53334c6d329dedd482615b996232e85ded7a34f8
--- /dev/null
+++ b/gcc/config/arm/mve.md
@@ -0,0 +1,78 @@
+;; Arm M-profile Vector Extension Machine Description
+;; Copyright (C) 2019 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_mode_iterator MVE_types [V16QI V8HI V4SI V2DI TI V8HF V4SF V2DF])
+(define_mode_attr V_sz_elem2 [(V16QI "s8") (V8HI "u16") (V4SI "u32")
+			      (V2DI "u64")])
+
+(define_insn "*mve_mov<mode>"
+  [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
+	(match_operand:MVE_types 1 "general_operand" "w,r,w,Dn,Usi,r,Dm"))]
+  "TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT"
+{
+  if (which_alternative == 3 || which_alternative == 6)
+    {
+      int width, is_valid;
+      static char templ[40];
+
+      is_valid = simd_immediate_valid_for_move (operands[1], <MODE>mode,
+	&operands[1], &width);
+
+      gcc_assert (is_valid != 0);
+
+      if (width == 0)
+	return "vmov.f32\t%q0, %1  @ <mode>";
+      else
+	sprintf (templ, "vmov.i%d\t%%q0, %%x1  @ <mode>", width);
+      return templ;
+    }
+  switch (which_alternative)
+    {
+    case 0:
+      return "vmov\t%q0, %q1";
+    case 1:
+      return "vmov\t%e0, %Q1, %R1  @ <mode>\;vmov\t%f0, %J1, %K1";
+    case 2:
+      return "vmov\t%Q0, %R0, %e1  @ <mode>\;vmov\t%J0, %K0, %f1";
+    case 4:
+      if ((TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (<MODE>mode))
+	  || (MEM_P (operands[1])
+	      && GET_CODE (XEXP (operands[1], 0)) == LABEL_REF))
+	return output_move_neon (operands);
+      else
+	return "vldrb.<V_sz_elem2> %q0, %E1";
+    case 5:
+      return output_move_neon (operands);
+    case 6:
+    default:
+      gcc_unreachable ();
+      return "";
+    }
+}
+  [(set_attr "type" "mve_move,mve_move,mve_move,mve_move,mve_load,mve_move,mve_move")
+   (set_attr "length" "4,8,8,4,8,8,4")
+   (set_attr "thumb2_pool_range" "*,*,*,*,1018,*,*")
+   (set_attr "neg_pool_range" "*,*,*,*,996,*,*")])
+
+(define_insn "*mve_vstr<mode>"
+  [(set (match_operand:MVE_types 0 "memory_operand" "=Us")
+	(match_operand:MVE_types 1 "s_register_operand" "w"))]
+  "TARGET_HAVE_MVE"
+  "vstrb.<V_sz_elem> %q1, %E0"
+  [(set_attr "type" "mve_store")])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 6a0ee28efc9aa9f1fba7b5ae031564f40aa095fe..c23783e0ed914ec21a92828388ada58ada3c6132 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -35,9 +35,9 @@
 
 (define_insn "*neon_mov<mode>"
   [(set (match_operand:VDX 0 "nonimmediate_operand"
-	  "=w,Un,w, w, w,  ?r,?w,?r, ?Us,*r")
+	"=w,Un,w, w, w,  ?r,?w,?r, ?Us,*r")
 	(match_operand:VDX 1 "general_operand"
-	  " w,w, Dm,Dn,Uni, w, r, Usi,r,*r"))]
+	" w,w, Dm,Dn,Uni, w, r, Usi,r,*r"))]
   "TARGET_NEON
    && (register_operand (operands[0], <MODE>mode)
        || register_operand (operands[1], <MODE>mode))"
@@ -47,7 +47,7 @@
       int width, is_valid;
       static char templ[40];
 
-      is_valid = neon_immediate_valid_for_move (operands[1], <MODE>mode,
+      is_valid = simd_immediate_valid_for_move (operands[1], <MODE>mode,
         &operands[1], &width);
 
       gcc_assert (is_valid != 0);
@@ -94,7 +94,7 @@
       int width, is_valid;
       static char templ[40];
 
-      is_valid = neon_immediate_valid_for_move (operands[1], <MODE>mode,
+      is_valid = simd_immediate_valid_for_move (operands[1], <MODE>mode,
         &operands[1], &width);
 
       gcc_assert (is_valid != 0);
@@ -147,9 +147,9 @@
 })
 
 (define_expand "mov<mode>"
-  [(set (match_operand:VSTRUCT 0 "nonimmediate_operand")
-	(match_operand:VSTRUCT 1 "general_operand"))]
-  "TARGET_NEON"
+  [(set (match_operand:VSTRUCT 0 "nonimmediate_operand" "")
+	(match_operand:VSTRUCT 1 "general_operand" ""))]
+  "TARGET_NEON || TARGET_HAVE_MVE"
 {
   gcc_checking_assert (aligned_operand (operands[0], <MODE>mode));
   gcc_checking_assert (aligned_operand (operands[1], <MODE>mode));
@@ -160,24 +160,28 @@
     }
 })
 
-(define_expand "mov<mode>"
-  [(set (match_operand:VH 0 "s_register_operand")
-	(match_operand:VH 1 "s_register_operand"))]
+;; The pattern mov<mode> where mode is v4hf and v8hf is split into
+;; movv4hf and movv8hf.  The pattern movv8hf is common for MVE and
+;; NEON, so it is moved into vec-common.md file.
+(define_expand "movv4hf"
+  [(set (match_operand:V4HF 0 "s_register_operand")
+	(match_operand:V4HF 1 "s_register_operand"))]
   "TARGET_NEON"
 {
-  gcc_checking_assert (aligned_operand (operands[0], <MODE>mode));
-  gcc_checking_assert (aligned_operand (operands[1], <MODE>mode));
+  gcc_checking_assert (aligned_operand (operands[0], E_V4HFmode));
+  gcc_checking_assert (aligned_operand (operands[1], E_V4HFmode));
   if (can_create_pseudo_p ())
     {
       if (!REG_P (operands[0]))
-	operands[1] = force_reg (<MODE>mode, operands[1]);
+	operands[1] = force_reg (E_V4HFmode, operands[1]);
     }
 })
 
+
 (define_insn "*neon_mov<mode>"
   [(set (match_operand:VSTRUCT 0 "nonimmediate_operand"	"=w,Ut,w")
 	(match_operand:VSTRUCT 1 "general_operand"	" w,w, Ut"))]
-  "TARGET_NEON
+  "(TARGET_NEON || TARGET_HAVE_MVE)
    && (register_operand (operands[0], <MODE>mode)
        || register_operand (operands[1], <MODE>mode))"
 {
@@ -213,7 +217,7 @@
 (define_split
   [(set (match_operand:OI 0 "s_register_operand" "")
 	(match_operand:OI 1 "s_register_operand" ""))]
-  "TARGET_NEON && reload_completed"
+  "(TARGET_NEON || TARGET_HAVE_MVE) && reload_completed"
   [(set (match_dup 0) (match_dup 1))
    (set (match_dup 2) (match_dup 3))]
 {
@@ -254,7 +258,7 @@
 (define_split
   [(set (match_operand:XI 0 "s_register_operand" "")
 	(match_operand:XI 1 "s_register_operand" ""))]
-  "TARGET_NEON && reload_completed"
+  "(TARGET_NEON || TARGET_HAVE_MVE) && reload_completed"
   [(set (match_dup 0) (match_dup 1))
    (set (match_dup 2) (match_dup 3))
    (set (match_dup 4) (match_dup 5))
@@ -489,7 +493,7 @@
 (define_expand "vec_init<mode><V_elem_l>"
   [(match_operand:VDQ 0 "s_register_operand")
    (match_operand 1 "" "")]
-  "TARGET_NEON"
+  "TARGET_NEON || TARGET_HAVE_MVE"
 {
   neon_expand_vector_init (operands[0], operands[1]);
   DONE;
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 2f0f532edf40d475e4199aa41bd7803fac8d6143..9d74165fe065b03c77918fe9e4611967799535f1 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -48,6 +48,16 @@
   return guard_addr_operand (XEXP (op, 0), mode);
 })
 
+(define_predicate "vpr_register_operand"
+  (match_code "reg,subreg")
+{
+  if (GET_CODE (op) == SUBREG)
+    op = SUBREG_REG (op);
+  return REG_P (op)
+	  && (REGNO (op) >= FIRST_PSEUDO_REGISTER
+	      || IS_VPR_REGNUM (REGNO (op)));
+})
+
 (define_predicate "imm_for_neon_inv_logic_operand"
   (match_code "const_vector")
 {
@@ -706,7 +716,7 @@
 (define_predicate "imm_for_neon_mov_operand"
   (match_code "const_vector,const_int")
 {
-  return neon_immediate_valid_for_move (op, mode, NULL, NULL);
+  return simd_immediate_valid_for_move (op, mode, NULL, NULL);
 })
 
 (define_predicate "imm_for_neon_lshift_operand"
diff --git a/gcc/config/arm/t-arm b/gcc/config/arm/t-arm
index af60c8fc285bb536afeb9ec5c21771a4379755fc..fda5e84355b56a20eb9a22919ab1c786120cc8f1 100644
--- a/gcc/config/arm/t-arm
+++ b/gcc/config/arm/t-arm
@@ -55,6 +55,7 @@ MD_INCLUDES=	$(srcdir)/config/arm/arm1020e.md \
 		$(srcdir)/config/arm/ldmstm.md \
 		$(srcdir)/config/arm/ldrdstrd.md \
 		$(srcdir)/config/arm/marvell-f-iwmmxt.md \
+		$(srcdir)/config/arm/mve.md \
 		$(srcdir)/config/arm/neon.md \
 		$(srcdir)/config/arm/predicates.md \
 		$(srcdir)/config/arm/sync.md \
diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md
index 60faad6597935607ed3c5593f941a04bbc924252..c99b846ab387bac633be8b1631f0e40b3c827850 100644
--- a/gcc/config/arm/types.md
+++ b/gcc/config/arm/types.md
@@ -550,6 +550,11 @@
 ; The classification below is for TME instructions
 ;
 ; tme
+; The classification below is for M-profile Vector Extension instructions
+;
+; mve_move
+; mve_store
+; mve_load
 
 (define_attr "type"
  "adc_imm,\
@@ -1096,7 +1101,11 @@
   crypto_sm3,\
   crypto_sm4,\
   coproc,\
-  tme"
+  tme,\
+\
+  mve_move,\
+  mve_store,\
+  mve_load"
    (const_string "untyped"))
 
 ; Is this an (integer side) multiply with a 32-bit (or smaller) result?
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index 33ff5627284d7cc898074b562179938982ecc420..5f5c113cf95afafbb733e1bfd2a7c7b8a55651a2 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -21,8 +21,31 @@
 ;; Vector Moves
 
 (define_expand "mov<mode>"
-  [(set (match_operand:VALL 0 "nonimmediate_operand")
-	(match_operand:VALL 1 "general_operand"))]
+  [(set (match_operand:VNIM1 0 "nonimmediate_operand")
+	(match_operand:VNIM1 1 "general_operand"))]
+  "TARGET_NEON
+   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))
+   || (TARGET_HAVE_MVE && VALID_MVE_SI_MODE (<MODE>mode))
+   || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (<MODE>mode))"
+   {
+  gcc_checking_assert (aligned_operand (operands[0], <MODE>mode));
+  gcc_checking_assert (aligned_operand (operands[1], <MODE>mode));
+  if (can_create_pseudo_p ())
+    {
+      if (!REG_P (operands[0]))
+	operands[1] = force_reg (<MODE>mode, operands[1]);
+      else if ((TARGET_NEON || TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT)
+	       && (CONSTANT_P (operands[1])))
+	{
+	  operands[1] = neon_make_constant (operands[1]);
+	  gcc_assert (operands[1] != NULL_RTX);
+	}
+    }
+})
+
+(define_expand "mov<mode>"
+  [(set (match_operand:VNINOTM1 0 "nonimmediate_operand")
+	(match_operand:VNINOTM1 1 "general_operand"))]
   "TARGET_NEON
    || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
 {
@@ -40,6 +63,20 @@
     }
 })
 
+(define_expand "movv8hf"
+  [(set (match_operand:V8HF 0 "s_register_operand")
+       (match_operand:V8HF 1 "s_register_operand"))]
+   "TARGET_NEON || TARGET_HAVE_MVE_FLOAT"
+{
+  gcc_checking_assert (aligned_operand (operands[0], E_V8HFmode));
+  gcc_checking_assert (aligned_operand (operands[1], E_V8HFmode));
+   if (can_create_pseudo_p ())
+     {
+       if (!REG_P (operands[0]))
+	 operands[1] = force_reg (E_V8HFmode, operands[1]);
+     }
+})
+
 ;; Vector arithmetic. Expanders are blank, then unnamed insns implement
 ;; patterns separately for IWMMXT and Neon.
 
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 573db164f01b4ac9ee4a9ee7414872fb93c9e2ca..6349c0570540ec25a599166f5d427fcbdbf2af68 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -311,7 +311,7 @@
    && (   register_operand (operands[0], DImode)
        || register_operand (operands[1], DImode))
    && !(TARGET_NEON && CONST_INT_P (operands[1])
-        && neon_immediate_valid_for_move (operands[1], DImode, NULL, NULL))"
+	&& simd_immediate_valid_for_move (operands[1], DImode, NULL, NULL))"
   "*
   switch (which_alternative)
     {
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float.c
new file mode 100644
index 0000000000000000000000000000000000000000..c3f81546c9f14f2491c6fb134170f17bcba16069
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float.c
@@ -0,0 +1,27 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo32 (float32x4_t value)
+{
+  float32x4_t b = value;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldmia.*" }  } */
+
+float16x8_t
+foo16 (float16x8_t value)
+{
+  float16x8_t b = value;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldmia.*" }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float1.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float1.c
new file mode 100644
index 0000000000000000000000000000000000000000..ebee0d2f1ad03b66d044d93bf901e0ce78eccba9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float1.c
@@ -0,0 +1,31 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t value;
+
+float32x4_t
+foo32 ()
+{
+  float32x4_t b = value;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldmia.*" }  } */
+
+float16x8_t value1;
+
+float16x8_t
+foo16 ()
+{
+  float16x8_t b = value1;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldmia.*" }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float2.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float2.c
new file mode 100644
index 0000000000000000000000000000000000000000..9b9c84d66ef8fd585a42be1ac7585d8bc6c529bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float2.c
@@ -0,0 +1,27 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo32 ()
+{
+  float32x4_t b = {10.0, 12.0, 14.0, 16.0};
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrw.32*" }  } */
+
+float16x8_t
+foo16 ()
+{
+  float16x8_t b = {32.01};
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrw.32.*" }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int.c
new file mode 100644
index 0000000000000000000000000000000000000000..6b54c3c61f32cf8e0af30272df63f261def0b8c5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int.c
@@ -0,0 +1,49 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo8 (int8x16_t value)
+{
+  int8x16_t b = value;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.s8*" }  } */
+
+int16x8_t
+foo16 (int16x8_t value)
+{
+  int16x8_t b = value;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.u16*" }  } */
+
+int32x4_t
+foo32 (int32x4_t value)
+{
+  int32x4_t b = value;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.u32*" }  } */
+
+int64x2_t
+foo64 (int64x2_t value)
+{
+  int64x2_t b = value;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.u64*" }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int1.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int1.c
new file mode 100644
index 0000000000000000000000000000000000000000..748ddecbd4011bb24058c27cd6a09d66f71ce581
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int1.c
@@ -0,0 +1,54 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t value1;
+int16x8_t value2;
+int32x4_t value3;
+int64x2_t value4;
+
+int8x16_t
+foo8 ()
+{
+  int8x16_t b = value1;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.u8*" }  } */
+
+int16x8_t
+foo16 ()
+{
+  int16x8_t b = value2;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.u16*" }  } */
+
+int32x4_t
+foo32 ()
+{
+  int32x4_t b = value3;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.u32" }  } */
+
+int64x2_t
+foo64 ()
+{
+  int64x2_t b = value4;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.u64" }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int2.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int2.c
new file mode 100644
index 0000000000000000000000000000000000000000..376ec9ee7fc04ddde98719d2605319a378f9a6bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int2.c
@@ -0,0 +1,49 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo8 ()
+{
+  int8x16_t b = {1, 2, 3, 4};
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrw.32.*" }  } */
+
+int16x8_t
+foo16 (int16x8_t value)
+{
+  int16x8_t b = {1, 2, 3};
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrw.32.*" }  } */
+
+int32x4_t
+foo32 (int32x4_t value)
+{
+  int32x4_t b = {1, 2};
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrw.32.*" }  } */
+
+int64x2_t
+foo64 (int64x2_t value)
+{
+  int64x2_t b = {1};
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrw.32.*" }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint.c
new file mode 100644
index 0000000000000000000000000000000000000000..f001d14f9ca4c851ed4b3371ae9599d23d2b62ce
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint.c
@@ -0,0 +1,49 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo8 (uint8x16_t value)
+{
+  uint8x16_t b = value;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.s8*" }  } */
+
+uint16x8_t
+foo16 (uint16x8_t value)
+{
+  uint16x8_t b = value;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.u16*" }  } */
+
+uint32x4_t
+foo32 (uint32x4_t value)
+{
+  uint32x4_t b = value;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.u32*" }  } */
+
+uint64x2_t
+foo64 (uint64x2_t value)
+{
+  uint64x2_t b = value;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.u64*" }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint1.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint1.c
new file mode 100644
index 0000000000000000000000000000000000000000..56d40668d63ba0b24c08944981c415054494c37d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint1.c
@@ -0,0 +1,54 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t value1;
+uint16x8_t value2;
+uint32x4_t value3;
+uint64x2_t value4;
+
+uint8x16_t
+foo8 ()
+{
+  uint8x16_t b = value1;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.s8*" }  } */
+
+uint16x8_t
+foo16 ()
+{
+  uint16x8_t b = value2;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.u16*" }  } */
+
+uint32x4_t
+foo32 ()
+{
+  uint32x4_t b = value3;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.u32*" }  } */
+
+uint64x2_t
+foo64 ()
+{
+  uint64x2_t b = value4;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.u64*" }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint2.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint2.c
new file mode 100644
index 0000000000000000000000000000000000000000..9ff9b67993ac83cf398880cb65510604a37de6a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint2.c
@@ -0,0 +1,49 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo8 (uint8x16_t value)
+{
+  uint8x16_t b = {1, 2, 3, 4};
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrw.32.*" }  } */
+
+uint16x8_t
+foo16 (uint16x8_t value)
+{
+  uint16x8_t b = {1, 2, 3};
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrw.32.*" }  } */
+
+uint32x4_t
+foo32 (uint32x4_t value)
+{
+  uint32x4_t b = {1, 2};
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrw.32.*" }  } */
+
+uint64x2_t
+foo64 (uint64x2_t value)
+{
+  uint64x2_t b = {1};
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrw.32.*" }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/mve.exp b/gcc/testsuite/gcc.target/arm/mve/mve.exp
new file mode 100644
index 0000000000000000000000000000000000000000..77ae3fa292b2892fb22c2f89223ca19dc16ccc99
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/mve.exp
@@ -0,0 +1,47 @@
+# Copyright (C) 2019 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Exit immediately if this isn't an ARM target.
+if ![istarget arm*-*-*] then {
+  return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# If a testcase doesn't have special options, use these.
+global DEFAULT_CFLAGS
+if ![info exists DEFAULT_CFLAGS] then {
+    set DEFAULT_CFLAGS " -ansi -pedantic-errors"
+}
+
+# This variable should only apply to tests called in this exp file.
+global dg_runtest_extra_prunes
+set dg_runtest_extra_prunes ""
+lappend dg_runtest_extra_prunes "warning: switch -m(cpu|arch)=.* conflicts with -m(cpu|arch)=.* switch"
+
+# Initialize `dg'.
+dg-init
+
+# Main loop.
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/intrinsics/*.\[cCS\]]] \
+	"" $DEFAULT_CFLAGS
+
+# All done.
+set dg_runtest_extra_prunes ""
+dg-finish


[-- Attachment #2: diff00.patch.gz --]
[-- Type: application/gzip, Size: 16528 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][2/8x]: MVE ACLE gather load and scatter store intrinsics with writeback.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (20 preceding siblings ...)
  2019-11-14 19:16 ` [PATCH][ARM][GCC][13x]: MVE ACLE scalar shift intrinsics Srinath Parvathaneni
@ 2019-11-14 19:16 ` Srinath Parvathaneni
  2019-11-14 19:16 ` [PATCH][ARM][GCC][7x]: MVE vreinterpretq and vuninitializedq intrinsics Srinath Parvathaneni
                   ` (16 subsequent siblings)
  38 siblings, 0 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 64095 bytes --]

Hello,

This patch supports following MVE ACLE intrinsics with writeback.

vldrdq_gather_base_wb_s64, vldrdq_gather_base_wb_u64, vldrdq_gather_base_wb_z_s64,
vldrdq_gather_base_wb_z_u64, vldrwq_gather_base_wb_f32, vldrwq_gather_base_wb_s32,
vldrwq_gather_base_wb_u32, vldrwq_gather_base_wb_z_f32, vldrwq_gather_base_wb_z_s32,
vldrwq_gather_base_wb_z_u32, vstrdq_scatter_base_wb_p_s64, vstrdq_scatter_base_wb_p_u64,
vstrdq_scatter_base_wb_s64, vstrdq_scatter_base_wb_u64, vstrwq_scatter_base_wb_p_s32,
vstrwq_scatter_base_wb_p_f32, vstrwq_scatter_base_wb_p_u32, vstrwq_scatter_base_wb_s32,
vstrwq_scatter_base_wb_u32, vstrwq_scatter_base_wb_f32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-07  Andre Vieira  <andre.simoesdiasvieira@arm.com>
            Mihail Ionescu  <mihail.ionescu@arm.com>
            Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm-builtins.c (LDRGBWBS_QUALIFIERS): Define builtin
	qualifier.
	(LDRGBWBU_QUALIFIERS): Likewise.
	(LDRGBWBS_Z_QUALIFIERS): Likewise.
	(LDRGBWBU_Z_QUALIFIERS): Likewise.
	(STRSBWBS_QUALIFIERS): Likewise.
	(STRSBWBU_QUALIFIERS): Likewise.
	(STRSBWBS_P_QUALIFIERS): Likewise.
	(STRSBWBU_P_QUALIFIERS): Likewise.
	* config/arm/arm_mve.h (vldrdq_gather_base_wb_s64): Define macro.
	(vldrdq_gather_base_wb_u64): Likewise.
	(vldrdq_gather_base_wb_z_s64): Likewise.
	(vldrdq_gather_base_wb_z_u64): Likewise.
	(vldrwq_gather_base_wb_f32): Likewise.
	(vldrwq_gather_base_wb_s32): Likewise.
	(vldrwq_gather_base_wb_u32): Likewise.
	(vldrwq_gather_base_wb_z_f32): Likewise.
	(vldrwq_gather_base_wb_z_s32): Likewise.
	(vldrwq_gather_base_wb_z_u32): Likewise.
	(vstrdq_scatter_base_wb_p_s64): Likewise.
	(vstrdq_scatter_base_wb_p_u64): Likewise.
	(vstrdq_scatter_base_wb_s64): Likewise.
	(vstrdq_scatter_base_wb_u64): Likewise.
	(vstrwq_scatter_base_wb_p_s32): Likewise.
	(vstrwq_scatter_base_wb_p_f32): Likewise.
	(vstrwq_scatter_base_wb_p_u32): Likewise.
	(vstrwq_scatter_base_wb_s32): Likewise.
	(vstrwq_scatter_base_wb_u32): Likewise.
	(vstrwq_scatter_base_wb_f32): Likewise.
	(__arm_vldrdq_gather_base_wb_s64): Define intrinsic.
	(__arm_vldrdq_gather_base_wb_u64): Likewise.
	(__arm_vldrdq_gather_base_wb_z_s64): Likewise.
	(__arm_vldrdq_gather_base_wb_z_u64): Likewise.
	(__arm_vldrwq_gather_base_wb_s32): Likewise.
	(__arm_vldrwq_gather_base_wb_u32): Likewise.
	(__arm_vldrwq_gather_base_wb_z_s32): Likewise.
	(__arm_vldrwq_gather_base_wb_z_u32): Likewise.
	(__arm_vstrdq_scatter_base_wb_s64): Likewise.
	(__arm_vstrdq_scatter_base_wb_u64): Likewise.
	(__arm_vstrdq_scatter_base_wb_p_s64): Likewise.
	(__arm_vstrdq_scatter_base_wb_p_u64): Likewise.
	(__arm_vstrwq_scatter_base_wb_p_s32): Likewise.
	(__arm_vstrwq_scatter_base_wb_p_u32): Likewise.
	(__arm_vstrwq_scatter_base_wb_s32): Likewise.
	(__arm_vstrwq_scatter_base_wb_u32): Likewise.
	(__arm_vldrwq_gather_base_wb_f32): Likewise.
	(__arm_vldrwq_gather_base_wb_z_f32): Likewise.
	(__arm_vstrwq_scatter_base_wb_f32): Likewise.
	(__arm_vstrwq_scatter_base_wb_p_f32): Likewise.
	(vstrwq_scatter_base_wb): Define polymorphic variant.
	(vstrwq_scatter_base_wb_p): Likewise.
	(vstrdq_scatter_base_wb_p): Likewise.
	(vstrdq_scatter_base_wb): Likewise.
	* config/arm/arm_mve_builtins.def (LDRGBWBS_QUALIFIERS): Use builtin
	qualifier.
	* config/arm/mve.md (mve_vstrwq_scatter_base_wb_<supf>v4si): Define RTL
	pattern.
	(mve_vstrwq_scatter_base_wb_add_<supf>v4si): Likewise.
	(mve_vstrwq_scatter_base_wb_<supf>v4si_insn): Likewise.
	(mve_vstrwq_scatter_base_wb_p_<supf>v4si): Likewise.
	(mve_vstrwq_scatter_base_wb_p_add_<supf>v4si): Likewise.
	(mve_vstrwq_scatter_base_wb_p_<supf>v4si_insn): Likewise.
	(mve_vstrwq_scatter_base_wb_fv4sf): Likewise.
	(mve_vstrwq_scatter_base_wb_add_fv4sf): Likewise.
	(mve_vstrwq_scatter_base_wb_fv4sf_insn): Likewise.
	(mve_vstrwq_scatter_base_wb_p_fv4sf): Likewise.
	(mve_vstrwq_scatter_base_wb_p_add_fv4sf): Likewise.
	(mve_vstrwq_scatter_base_wb_p_fv4sf_insn): Likewise.
	(mve_vstrdq_scatter_base_wb_<supf>v2di): Likewise.
	(mve_vstrdq_scatter_base_wb_add_<supf>v2di): Likewise.
	(mve_vstrdq_scatter_base_wb_<supf>v2di_insn): Likewise.
	(mve_vstrdq_scatter_base_wb_p_<supf>v2di): Likewise.
	(mve_vstrdq_scatter_base_wb_p_add_<supf>v2di): Likewise.
	(mve_vstrdq_scatter_base_wb_p_<supf>v2di_insn): Likewise.
	(mve_vldrwq_gather_base_wb_<supf>v4si): Likewise.
	(mve_vldrwq_gather_base_wb_<supf>v4si_insn): Likewise.
	(mve_vldrwq_gather_base_wb_z_<supf>v4si): Likewise.
	(mve_vldrwq_gather_base_wb_z_<supf>v4si_insn): Likewise.
	(mve_vldrwq_gather_base_wb_fv4sf): Likewise.
	(mve_vldrwq_gather_base_wb_fv4sf_insn): Likewise.
	(mve_vldrwq_gather_base_wb_z_fv4sf): Likewise.
	(mve_vldrwq_gather_base_wb_z_fv4sf_insn): Likewise.
	(mve_vldrdq_gather_base_wb_<supf>v2di): Likewise.
	(mve_vldrdq_gather_base_wb_<supf>v2di_insn): Likewise.
	(mve_vldrdq_gather_base_wb_z_<supf>v2di): Likewise.
	(mve_vldrdq_gather_base_wb_z_<supf>v2di_insn): Likewise.

gcc/testsuite/ChangeLog:

2019-11-07  Andre Vieira  <andre.simoesdiasvieira@arm.com>
            Mihail Ionescu  <mihail.ionescu@arm.com>
            Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c: New test.
	* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_wb_p_s64.c:
	Likewise.
	* gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_wb_p_u64.c:
	Likewise.
	* gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_wb_s64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_wb_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_f32.c:
	Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_s32.c:
	Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_u32.c:
	Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_u32.c: Likewise.


###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 283e22897be16accb8ec9f00af8021161f8cd90c..9e87dff264d6b535f64407f669c6e83b0ed639a6 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -694,6 +694,50 @@ arm_quinop_unone_unone_unone_unone_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_UNONE_QUALIFIERS \
   (arm_quinop_unone_unone_unone_unone_imm_unone_qualifiers)
 
+static enum arm_type_qualifiers
+arm_ldrgbwbs_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_unsigned, qualifier_immediate};
+#define LDRGBWBS_QUALIFIERS (arm_ldrgbwbs_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ldrgbwbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate};
+#define LDRGBWBU_QUALIFIERS (arm_ldrgbwbu_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ldrgbwbs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_unsigned, qualifier_immediate,
+      qualifier_unsigned};
+#define LDRGBWBS_Z_QUALIFIERS (arm_ldrgbwbs_z_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ldrgbwbu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
+      qualifier_unsigned};
+#define LDRGBWBU_Z_QUALIFIERS (arm_ldrgbwbu_z_qualifiers)
+
+static enum arm_type_qualifiers
+arm_strsbwbs_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_unsigned, qualifier_const, qualifier_none};
+#define STRSBWBS_QUALIFIERS (arm_strsbwbs_qualifiers)
+
+static enum arm_type_qualifiers
+arm_strsbwbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_unsigned, qualifier_const, qualifier_unsigned};
+#define STRSBWBU_QUALIFIERS (arm_strsbwbu_qualifiers)
+
+static enum arm_type_qualifiers
+arm_strsbwbs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_unsigned, qualifier_const,
+      qualifier_none, qualifier_unsigned};
+#define STRSBWBS_P_QUALIFIERS (arm_strsbwbs_p_qualifiers)
+
+static enum arm_type_qualifiers
+arm_strsbwbu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_unsigned, qualifier_const,
+      qualifier_unsigned, qualifier_unsigned};
+#define STRSBWBU_P_QUALIFIERS (arm_strsbwbu_p_qualifiers)
+
 /* End of Qualifier for MVE builtins.  */
 
    /* void ([T element type] *, T, immediate).  */
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 4ae40c247b96ad1bf6c7bc38e5be1e5a489d1ef0..842d47367b54d826acea7fe358cc58c78f6e0256 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -2054,6 +2054,26 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define viwdupq_wb_u8( __a, __b,  __imm) __arm_viwdupq_wb_u8( __a, __b,  __imm)
 #define viwdupq_wb_u32( __a, __b,  __imm) __arm_viwdupq_wb_u32( __a, __b,  __imm)
 #define viwdupq_wb_u16( __a, __b,  __imm) __arm_viwdupq_wb_u16( __a, __b,  __imm)
+#define vldrdq_gather_base_wb_s64(__addr, __offset) __arm_vldrdq_gather_base_wb_s64(__addr, __offset)
+#define vldrdq_gather_base_wb_u64(__addr, __offset) __arm_vldrdq_gather_base_wb_u64(__addr, __offset)
+#define vldrdq_gather_base_wb_z_s64(__addr, __offset, __p) __arm_vldrdq_gather_base_wb_z_s64(__addr, __offset, __p)
+#define vldrdq_gather_base_wb_z_u64(__addr, __offset, __p) __arm_vldrdq_gather_base_wb_z_u64(__addr, __offset, __p)
+#define vldrwq_gather_base_wb_f32(__addr, __offset) __arm_vldrwq_gather_base_wb_f32(__addr, __offset)
+#define vldrwq_gather_base_wb_s32(__addr, __offset) __arm_vldrwq_gather_base_wb_s32(__addr, __offset)
+#define vldrwq_gather_base_wb_u32(__addr, __offset) __arm_vldrwq_gather_base_wb_u32(__addr, __offset)
+#define vldrwq_gather_base_wb_z_f32(__addr, __offset, __p) __arm_vldrwq_gather_base_wb_z_f32(__addr, __offset, __p)
+#define vldrwq_gather_base_wb_z_s32(__addr, __offset, __p) __arm_vldrwq_gather_base_wb_z_s32(__addr, __offset, __p)
+#define vldrwq_gather_base_wb_z_u32(__addr, __offset, __p) __arm_vldrwq_gather_base_wb_z_u32(__addr, __offset, __p)
+#define vstrdq_scatter_base_wb_p_s64(__addr, __offset, __value, __p) __arm_vstrdq_scatter_base_wb_p_s64(__addr, __offset, __value, __p)
+#define vstrdq_scatter_base_wb_p_u64(__addr, __offset, __value, __p) __arm_vstrdq_scatter_base_wb_p_u64(__addr, __offset, __value, __p)
+#define vstrdq_scatter_base_wb_s64(__addr, __offset, __value) __arm_vstrdq_scatter_base_wb_s64(__addr, __offset, __value)
+#define vstrdq_scatter_base_wb_u64(__addr, __offset, __value) __arm_vstrdq_scatter_base_wb_u64(__addr, __offset, __value)
+#define vstrwq_scatter_base_wb_p_s32(__addr, __offset, __value, __p) __arm_vstrwq_scatter_base_wb_p_s32(__addr, __offset, __value, __p)
+#define vstrwq_scatter_base_wb_p_f32(__addr, __offset, __value, __p) __arm_vstrwq_scatter_base_wb_p_f32(__addr, __offset, __value, __p)
+#define vstrwq_scatter_base_wb_p_u32(__addr, __offset, __value, __p) __arm_vstrwq_scatter_base_wb_p_u32(__addr, __offset, __value, __p)
+#define vstrwq_scatter_base_wb_s32(__addr, __offset, __value) __arm_vstrwq_scatter_base_wb_s32(__addr, __offset, __value)
+#define vstrwq_scatter_base_wb_u32(__addr, __offset, __value) __arm_vstrwq_scatter_base_wb_u32(__addr, __offset, __value)
+#define vstrwq_scatter_base_wb_f32(__addr, __offset, __value) __arm_vstrwq_scatter_base_wb_f32(__addr, __offset, __value)
 #endif
 
 __extension__ extern __inline void
@@ -13388,6 +13408,150 @@ __arm_viwdupq_wb_u16 (uint32_t * __a, uint32_t __b, const int __imm)
   return __res;
 }
 
+__extension__ extern __inline int64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrdq_gather_base_wb_s64 (uint64x2_t * __addr, const int __offset)
+{
+  int64x2_t
+  result = __builtin_mve_vldrdq_gather_base_wb_sv2di (*__addr, __offset);
+  __addr += __offset;
+  return result;
+}
+
+__extension__ extern __inline uint64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrdq_gather_base_wb_u64 (uint64x2_t * __addr, const int __offset)
+{
+  uint64x2_t
+  result = __builtin_mve_vldrdq_gather_base_wb_uv2di (*__addr, __offset);
+  __addr += __offset;
+  return result;
+}
+
+__extension__ extern __inline int64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrdq_gather_base_wb_z_s64 (uint64x2_t * __addr, const int __offset, mve_pred16_t __p)
+{
+  int64x2_t
+  result = __builtin_mve_vldrdq_gather_base_wb_z_sv2di (*__addr, __offset, __p);
+  __addr += __offset;
+  return result;
+}
+
+__extension__ extern __inline uint64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrdq_gather_base_wb_z_u64 (uint64x2_t * __addr, const int __offset, mve_pred16_t __p)
+{
+  uint64x2_t
+  result = __builtin_mve_vldrdq_gather_base_wb_z_uv2di (*__addr, __offset, __p);
+  __addr += __offset;
+  return result;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_gather_base_wb_s32 (uint32x4_t * __addr, const int __offset)
+{
+  int32x4_t
+  result = __builtin_mve_vldrwq_gather_base_wb_sv4si (*__addr, __offset);
+  __addr += __offset;
+  return result;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_gather_base_wb_u32 (uint32x4_t * __addr, const int __offset)
+{
+  uint32x4_t
+  result = __builtin_mve_vldrwq_gather_base_wb_uv4si (*__addr, __offset);
+  __addr += __offset;
+  return result;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_gather_base_wb_z_s32 (uint32x4_t * __addr, const int __offset, mve_pred16_t __p)
+{
+  int32x4_t
+  result = __builtin_mve_vldrwq_gather_base_wb_z_sv4si (*__addr, __offset, __p);
+  __addr += __offset;
+  return result;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_gather_base_wb_z_u32 (uint32x4_t * __addr, const int __offset, mve_pred16_t __p)
+{
+  uint32x4_t
+  result = __builtin_mve_vldrwq_gather_base_wb_z_uv4si (*__addr, __offset, __p);
+  __addr += __offset;
+  return result;
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrdq_scatter_base_wb_s64 (uint64x2_t * __addr, const int __offset, int64x2_t __value)
+{
+  __builtin_mve_vstrdq_scatter_base_wb_sv2di (*__addr, __offset, __value);
+  __builtin_mve_vstrdq_scatter_base_wb_add_sv2di (*__addr, __offset, *__addr);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrdq_scatter_base_wb_u64 (uint64x2_t * __addr, const int __offset, uint64x2_t __value)
+{
+  __builtin_mve_vstrdq_scatter_base_wb_uv2di (*__addr, __offset, __value);
+  __builtin_mve_vstrdq_scatter_base_wb_add_uv2di (*__addr, __offset, *__addr);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrdq_scatter_base_wb_p_s64 (uint64x2_t * __addr, const int __offset, int64x2_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrdq_scatter_base_wb_p_sv2di (*__addr, __offset, __value, __p);
+  __builtin_mve_vstrdq_scatter_base_wb_p_add_sv2di (*__addr, __offset, *__addr, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrdq_scatter_base_wb_p_u64 (uint64x2_t * __addr, const int __offset, uint64x2_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrdq_scatter_base_wb_p_uv2di (*__addr, __offset, __value, __p);
+  __builtin_mve_vstrdq_scatter_base_wb_p_add_uv2di (*__addr, __offset, *__addr, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrwq_scatter_base_wb_p_s32 (uint32x4_t * __addr, const int __offset, int32x4_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrwq_scatter_base_wb_p_sv4si (*__addr, __offset, __value, __p);
+  __builtin_mve_vstrwq_scatter_base_wb_p_add_sv4si (*__addr, __offset, *__addr, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrwq_scatter_base_wb_p_u32 (uint32x4_t * __addr, const int __offset, uint32x4_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrwq_scatter_base_wb_p_uv4si (*__addr, __offset, __value, __p);
+  __builtin_mve_vstrwq_scatter_base_wb_p_add_uv4si (*__addr, __offset, *__addr, __p);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrwq_scatter_base_wb_s32 (uint32x4_t * __addr, const int __offset, int32x4_t __value)
+{
+  __builtin_mve_vstrwq_scatter_base_wb_sv4si (*__addr, __offset, __value);
+  __builtin_mve_vstrwq_scatter_base_wb_add_sv4si (*__addr, __offset, *__addr);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrwq_scatter_base_wb_u32 (uint32x4_t * __addr, const int __offset, uint32x4_t __value)
+{
+  __builtin_mve_vstrwq_scatter_base_wb_uv4si (*__addr, __offset, __value);
+  __builtin_mve_vstrwq_scatter_base_wb_add_uv4si (*__addr, __offset, *__addr);
+}
+
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
@@ -16024,6 +16188,42 @@ __arm_vreinterpretq_f32_u8 (uint8x16_t __a)
   return (float32x4_t)  __a;
 }
 
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_gather_base_wb_f32 (uint32x4_t * __addr, const int __offset)
+{
+  float32x4_t
+  result = __builtin_mve_vldrwq_gather_base_wb_fv4sf (*__addr, __offset);
+  __addr += __offset;
+  return result;
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vldrwq_gather_base_wb_z_f32 (uint32x4_t * __addr, const int __offset, mve_pred16_t __p)
+{
+  float32x4_t
+  result = __builtin_mve_vldrwq_gather_base_wb_z_fv4sf (*__addr, __offset, __p);
+  __addr += __offset;
+  return result;
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrwq_scatter_base_wb_f32 (uint32x4_t * __addr, const int __offset, float32x4_t __value)
+{
+  __builtin_mve_vstrwq_scatter_base_wb_fv4sf (*__addr, __offset, __value);
+  __builtin_mve_vstrwq_scatter_base_wb_add_fv4sf (*__addr, __offset, *__addr);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vstrwq_scatter_base_wb_p_f32 (uint32x4_t * __addr, const int __offset, float32x4_t __value, mve_pred16_t __p)
+{
+  __builtin_mve_vstrwq_scatter_base_wb_p_fv4sf (*__addr, __offset, __value, __p);
+  __builtin_mve_vstrwq_scatter_base_wb_p_add_fv4sf (*__addr, __offset, *__addr, __p);
+}
+
 #endif
 
 enum {
@@ -18037,6 +18237,20 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_u8_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
   int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_u8_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
 
+#define vstrwq_scatter_base_wb(p0,p1,p2) __arm_vstrwq_scatter_base_wb(p0,p1,p2)
+#define __arm_vstrwq_scatter_base_wb(p0,p1,p2) ({ __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vstrwq_scatter_base_wb_s32 (p0, p1, __ARM_mve_coerce(__p2, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vstrwq_scatter_base_wb_u32 (p0, p1, __ARM_mve_coerce(__p2, uint32x4_t)), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vstrwq_scatter_base_wb_f32 (p0, p1, __ARM_mve_coerce(__p2, float32x4_t)));})
+
+#define vstrwq_scatter_base_wb_p(p0,p1,p2,p3) __arm_vstrwq_scatter_base_wb_p(p0,p1,p2,p3)
+#define __arm_vstrwq_scatter_base_wb_p(p0,p1,p2,p3) ({ __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vstrwq_scatter_base_wb_p_s32 (p0, p1, __ARM_mve_coerce(__p2, int32x4_t), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vstrwq_scatter_base_wb_p_u32 (p0, p1, __ARM_mve_coerce(__p2, uint32x4_t), p3), \
+  int (*)[__ARM_mve_type_float32x4_t]: __arm_vstrwq_scatter_base_wb_p_f32 (p0, p1, __ARM_mve_coerce(__p2, float32x4_t), p3));})
+
 #else /* MVE Interger.  */
 
 #define vst4q(p0,p1) __arm_vst4q(p0,p1)
@@ -21497,6 +21711,30 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint32_t]: __arm_vdwdupq_n_u8 (__ARM_mve_coerce(__p0, uint32_t), p1, p2), \
   int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vdwdupq_wb_u8 (__ARM_mve_coerce(__p0, uint32_t *), p1, p2));})
 
+#define vstrdq_scatter_base_wb_p(p0,p1,p2,p3) __arm_vstrdq_scatter_base_wb_p(p0,p1,p2,p3)
+#define __arm_vstrdq_scatter_base_wb_p(p0,p1,p2,p3) ({ __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int64x2_t]: __arm_vstrdq_scatter_base_wb_p_s64 (p0, p1, __ARM_mve_coerce(__p2, int64x2_t), p3), \
+  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vstrdq_scatter_base_wb_p_u64 (p0, p1, __ARM_mve_coerce(__p2, uint64x2_t), p3));})
+
+#define vstrdq_scatter_base_wb(p0,p1,p2) __arm_vstrdq_scatter_base_wb(p0,p1,p2)
+#define __arm_vstrdq_scatter_base_wb(p0,p1,p2) ({ __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int64x2_t]: __arm_vstrdq_scatter_base_wb_s64 (p0, p1, __ARM_mve_coerce(__p2, int64x2_t)), \
+  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vstrdq_scatter_base_wb_u64 (p0, p1, __ARM_mve_coerce(__p2, uint64x2_t)));})
+
+#define vstrwq_scatter_base_wb(p0,p1,p2) __arm_vstrwq_scatter_base_wb(p0,p1,p2)
+#define __arm_vstrwq_scatter_base_wb(p0,p1,p2) ({ __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vstrwq_scatter_base_wb_s32 (p0, p1, __ARM_mve_coerce(__p2, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vstrwq_scatter_base_wb_u32 (p0, p1, __ARM_mve_coerce(__p2, uint32x4_t)));})
+
+#define vstrwq_scatter_base_wb_p(p0,p1,p2,p3) __arm_vstrwq_scatter_base_wb_p(p0,p1,p2,p3)
+#define __arm_vstrwq_scatter_base_wb_p(p0,p1,p2,p3) ({ __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vstrwq_scatter_base_wb_p_s32 (p0, p1, __ARM_mve_coerce(__p2, int32x4_t), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vstrwq_scatter_base_wb_p_u32 (p0, p1, __ARM_mve_coerce(__p2, uint32x4_t), p3));})
+
 #endif /* MVE Floating point.  */
 
 #ifdef __cplusplus
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index ac920d327e98a2db0c5e6aa8680dd40f295b579a..b77335cff133872558b48b5574dccc0f17df9ed1 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -827,3 +827,33 @@ VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE, vddupq_m_n_u, v16qi, v8hi, v4si)
 VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE, vidupq_m_n_u, v16qi, v8hi, v4si)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_IMM, vdwdupq_n_u, v16qi, v4si, v8hi)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_IMM, viwdupq_n_u, v16qi, v4si, v8hi)
+VAR1 (STRSBWBU, vstrwq_scatter_base_wb_u, v4si)
+VAR1 (STRSBWBU, vstrwq_scatter_base_wb_add_u, v4si)
+VAR1 (STRSBWBU, vstrwq_scatter_base_wb_add_s, v4si)
+VAR1 (STRSBWBU, vstrwq_scatter_base_wb_add_f, v4sf)
+VAR1 (STRSBWBU, vstrdq_scatter_base_wb_u, v2di)
+VAR1 (STRSBWBU, vstrdq_scatter_base_wb_add_u, v2di)
+VAR1 (STRSBWBU, vstrdq_scatter_base_wb_add_s, v2di)
+VAR1 (STRSBWBU_P, vstrwq_scatter_base_wb_p_u, v4si)
+VAR1 (STRSBWBU_P, vstrwq_scatter_base_wb_p_add_u, v4si)
+VAR1 (STRSBWBU_P, vstrwq_scatter_base_wb_p_add_s, v4si)
+VAR1 (STRSBWBU_P, vstrwq_scatter_base_wb_p_add_f, v4sf)
+VAR1 (STRSBWBU_P, vstrdq_scatter_base_wb_p_u, v2di)
+VAR1 (STRSBWBU_P, vstrdq_scatter_base_wb_p_add_u, v2di)
+VAR1 (STRSBWBU_P, vstrdq_scatter_base_wb_p_add_s, v2di)
+VAR1 (STRSBWBS, vstrwq_scatter_base_wb_s, v4si)
+VAR1 (STRSBWBS, vstrwq_scatter_base_wb_f, v4sf)
+VAR1 (STRSBWBS, vstrdq_scatter_base_wb_s, v2di)
+VAR1 (STRSBWBS_P, vstrwq_scatter_base_wb_p_s, v4si)
+VAR1 (STRSBWBS_P, vstrwq_scatter_base_wb_p_f, v4sf)
+VAR1 (STRSBWBS_P, vstrdq_scatter_base_wb_p_s, v2di)
+VAR1 (LDRGBWBU_Z, vldrwq_gather_base_wb_z_u, v4si)
+VAR1 (LDRGBWBU_Z, vldrdq_gather_base_wb_z_u, v2di)
+VAR1 (LDRGBWBU, vldrwq_gather_base_wb_u, v4si)
+VAR1 (LDRGBWBU, vldrdq_gather_base_wb_u, v2di)
+VAR1 (LDRGBWBS_Z, vldrwq_gather_base_wb_z_s, v4si)
+VAR1 (LDRGBWBS_Z, vldrwq_gather_base_wb_z_f, v4sf)
+VAR1 (LDRGBWBS_Z, vldrdq_gather_base_wb_z_s, v2di)
+VAR1 (LDRGBWBS, vldrwq_gather_base_wb_s, v4si)
+VAR1 (LDRGBWBS, vldrwq_gather_base_wb_f, v4sf)
+VAR1 (LDRGBWBS, vldrdq_gather_base_wb_s, v2di)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index b1b35ab23e677800c049d7c7873ab98d76b52538..a938e0922f8dc6749dc7192961ae2091d666c6e7 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -208,7 +208,10 @@
 			 VSTRDQSSO_U VSTRWQSO_S VSTRWQSO_U VSTRWQSSO_S
 			 VSTRWQSSO_U VSTRHQSO_F VSTRHQSSO_F VSTRWQSB_F
 			 VSTRWQSO_F VSTRWQSSO_F VDDUPQ VDDUPQ_M VDWDUPQ
-			 VDWDUPQ_M VIDUPQ VIDUPQ_M VIWDUPQ VIWDUPQ_M])
+			 VDWDUPQ_M VIDUPQ VIDUPQ_M VIWDUPQ VIWDUPQ_M
+			 VSTRWQSBWB_S VSTRWQSBWB_U VLDRWQGBWB_S VLDRWQGBWB_U
+			 VSTRWQSBWB_F VLDRWQGBWB_F VSTRDQSBWB_S VSTRDQSBWB_U
+			 VLDRDQGBWB_S VLDRDQGBWB_U])
 
 (define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF") (V8HF "V8HI")
 			    (V4SF "V4SI")])
@@ -377,7 +380,10 @@
 		       (VSTRDQSB_S "s") (VSTRDQSB_U "u") (VSTRDQSO_S "s")
 		       (VSTRDQSO_U "u") (VSTRDQSSO_S "s") (VSTRDQSSO_U "u")
 		       (VSTRWQSO_U "u") (VSTRWQSO_S "s") (VSTRWQSSO_U "u")
-		       (VSTRWQSSO_S "s")])
+		       (VSTRWQSSO_S "s") (VSTRWQSBWB_S "s") (VSTRWQSBWB_U "u")
+		       (VLDRWQGBWB_S "s") (VLDRWQGBWB_U "u") (VLDRDQGBWB_S "s")
+		       (VLDRDQGBWB_U "u") (VSTRDQSBWB_S "s")
+		       (VSTRDQSBWB_U "u")])
 
 (define_int_attr mode1 [(VCTP8Q "8") (VCTP16Q "16") (VCTP32Q "32")
 			(VCTP64Q "64") (VCTP8Q_M "8") (VCTP16Q_M "16")
@@ -626,6 +632,10 @@
 (define_int_iterator VSTRDSSOQ [VSTRDQSSO_S VSTRDQSSO_U])
 (define_int_iterator VSTRWSOQ [VSTRWQSO_S VSTRWQSO_U])
 (define_int_iterator VSTRWSSOQ [VSTRWQSSO_S VSTRWQSSO_U])
+(define_int_iterator VSTRWSBWBQ [VSTRWQSBWB_S VSTRWQSBWB_U])
+(define_int_iterator VLDRWGBWBQ [VLDRWQGBWB_S VLDRWQGBWB_U])
+(define_int_iterator VSTRDSBWBQ [VSTRDQSBWB_S VSTRDQSBWB_U])
+(define_int_iterator VLDRDGBWBQ [VLDRDQGBWB_S VLDRDQGBWB_U])
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
@@ -10035,3 +10045,572 @@
   "vpst\;\tviwdupt.u%#<V_sz_elem>\t%q2, %3, %4, %5"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
+(define_expand "mve_vstrwq_scatter_base_wb_<supf>v4si"
+  [(match_operand:V4SI 0 "s_register_operand" "=w")
+   (match_operand:SI 1 "mve_vldrd_immediate" "Ri")
+   (match_operand:V4SI 2 "s_register_operand" "w")
+   (unspec:V4SI [(const_int 0)] VSTRWSBWBQ)]
+  "TARGET_HAVE_MVE"
+{
+  rtx ignore_wb = gen_reg_rtx (V4SImode);
+  emit_insn (
+  gen_mve_vstrwq_scatter_base_wb_<supf>v4si_insn (ignore_wb, operands[0],
+						  operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "mve_vstrwq_scatter_base_wb_add_<supf>v4si"
+  [(match_operand:V4SI 0 "s_register_operand" "=w")
+   (match_operand:SI 1 "mve_vldrd_immediate" "Ri")
+   (match_operand:V4SI 2 "s_register_operand" "0")
+   (unspec:V4SI [(const_int 0)] VSTRWSBWBQ)]
+  "TARGET_HAVE_MVE"
+{
+  rtx ignore_vec = gen_reg_rtx (V4SImode);
+  emit_insn (
+  gen_mve_vstrwq_scatter_base_wb_<supf>v4si_insn (operands[0], operands[2],
+						  operands[1], ignore_vec));
+  DONE;
+})
+
+;;
+;; [vstrwq_scatter_base_wb_s vstrdq_scatter_base_wb_u]
+;;
+(define_insn "mve_vstrwq_scatter_base_wb_<supf>v4si_insn"
+  [(set (mem:BLK (scratch))
+	(unspec:BLK
+		[(match_operand:V4SI 1 "s_register_operand" "0")
+		 (match_operand:SI 2 "mve_vldrd_immediate" "Ri")
+		 (match_operand:V4SI 3 "s_register_operand" "w")]
+	 VSTRWSBWBQ))
+   (set (match_operand:V4SI 0 "s_register_operand" "=w")
+	(unspec:V4SI [(match_dup 1) (match_dup 2)]
+	 VSTRWSBWBQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[1];
+   ops[1] = operands[2];
+   ops[2] = operands[3];
+   output_asm_insn ("vstrw.u32\t%q2, [%q0, %1]!",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+(define_expand "mve_vstrwq_scatter_base_wb_p_<supf>v4si"
+  [(match_operand:V4SI 0 "s_register_operand" "=w")
+   (match_operand:SI 1 "mve_vldrd_immediate" "Ri")
+   (match_operand:V4SI 2 "s_register_operand" "w")
+   (match_operand:HI 3 "vpr_register_operand")
+   (unspec:V4SI [(const_int 0)] VSTRWSBWBQ)]
+  "TARGET_HAVE_MVE"
+{
+  rtx ignore_wb = gen_reg_rtx (V4SImode);
+  emit_insn (
+  gen_mve_vstrwq_scatter_base_wb_p_<supf>v4si_insn (ignore_wb, operands[0],
+						    operands[1], operands[2],
+						    operands[3]));
+  DONE;
+})
+
+(define_expand "mve_vstrwq_scatter_base_wb_p_add_<supf>v4si"
+  [(match_operand:V4SI 0 "s_register_operand" "=w")
+   (match_operand:SI 1 "mve_vldrd_immediate" "Ri")
+   (match_operand:V4SI 2 "s_register_operand" "0")
+   (match_operand:HI 3 "vpr_register_operand")
+   (unspec:V4SI [(const_int 0)] VSTRWSBWBQ)]
+  "TARGET_HAVE_MVE"
+{
+  rtx ignore_vec = gen_reg_rtx (V4SImode);
+  emit_insn (
+  gen_mve_vstrwq_scatter_base_wb_p_<supf>v4si_insn (operands[0], operands[2],
+						    operands[1], ignore_vec,
+						    operands[3]));
+  DONE;
+})
+
+;;
+;; [vstrwq_scatter_base_wb_p_s vstrwq_scatter_base_wb_p_u]
+;;
+(define_insn "mve_vstrwq_scatter_base_wb_p_<supf>v4si_insn"
+ [(set (mem:BLK (scratch))
+       (unspec:BLK
+		[(match_operand:V4SI 1 "s_register_operand" "0")
+		 (match_operand:SI 2 "mve_vldrd_immediate" "Ri")
+		 (match_operand:V4SI 3 "s_register_operand" "w")
+		 (match_operand:HI 4 "vpr_register_operand")]
+	VSTRWSBWBQ))
+   (set (match_operand:V4SI 0 "s_register_operand" "=w")
+	(unspec:V4SI [(match_dup 1) (match_dup 2)]
+	 VSTRWSBWBQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[1];
+   ops[1] = operands[2];
+   ops[2] = operands[3];
+   output_asm_insn ("vpst\;\tvstrwt.u32\t%q2, [%q0, %1]!",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+(define_expand "mve_vstrwq_scatter_base_wb_fv4sf"
+  [(match_operand:V4SI 0 "s_register_operand" "=w")
+   (match_operand:SI 1 "mve_vldrd_immediate" "Ri")
+   (match_operand:V4SF 2 "s_register_operand" "w")
+   (unspec:V4SI [(const_int 0)] VSTRWQSBWB_F)]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+  rtx ignore_wb = gen_reg_rtx (V4SImode);
+  emit_insn (
+  gen_mve_vstrwq_scatter_base_wb_fv4sf_insn (ignore_wb,operands[0],
+					     operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "mve_vstrwq_scatter_base_wb_add_fv4sf"
+  [(match_operand:V4SI 0 "s_register_operand" "=w")
+   (match_operand:SI 1 "mve_vldrd_immediate" "Ri")
+   (match_operand:V4SI 2 "s_register_operand" "0")
+   (unspec:V4SI [(const_int 0)] VSTRWQSBWB_F)]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+  rtx ignore_vec = gen_reg_rtx (V4SFmode);
+  emit_insn (
+  gen_mve_vstrwq_scatter_base_wb_fv4sf_insn (operands[0], operands[2],
+					     operands[1], ignore_vec));
+  DONE;
+})
+
+;;
+;; [vstrwq_scatter_base_wb_f]
+;;
+(define_insn "mve_vstrwq_scatter_base_wb_fv4sf_insn"
+ [(set (mem:BLK (scratch))
+       (unspec:BLK
+		[(match_operand:V4SI 1 "s_register_operand" "0")
+		 (match_operand:SI 2 "mve_vldrd_immediate" "Ri")
+		 (match_operand:V4SF 3 "s_register_operand" "w")]
+	 VSTRWQSBWB_F))
+   (set (match_operand:V4SI 0 "s_register_operand" "=w")
+	(unspec:V4SI [(match_dup 1) (match_dup 2)]
+	 VSTRWQSBWB_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+   rtx ops[3];
+   ops[0] = operands[1];
+   ops[1] = operands[2];
+   ops[2] = operands[3];
+   output_asm_insn ("vstrw.u32\t%q2, [%q0, %1]!",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+(define_expand "mve_vstrwq_scatter_base_wb_p_fv4sf"
+  [(match_operand:V4SI 0 "s_register_operand" "=w")
+   (match_operand:SI 1 "mve_vldrd_immediate" "Ri")
+   (match_operand:V4SF 2 "s_register_operand" "w")
+   (match_operand:HI 3 "vpr_register_operand")
+   (unspec:V4SI [(const_int 0)] VSTRWQSBWB_F)]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+  rtx ignore_wb = gen_reg_rtx (V4SImode);
+  emit_insn (
+  gen_mve_vstrwq_scatter_base_wb_p_fv4sf_insn (ignore_wb, operands[0],
+					       operands[1], operands[2],
+					       operands[3]));
+  DONE;
+})
+
+(define_expand "mve_vstrwq_scatter_base_wb_p_add_fv4sf"
+  [(match_operand:V4SI 0 "s_register_operand" "=w")
+   (match_operand:SI 1 "mve_vldrd_immediate" "Ri")
+   (match_operand:V4SI 2 "s_register_operand" "0")
+   (match_operand:HI 3 "vpr_register_operand")
+   (unspec:V4SI [(const_int 0)] VSTRWQSBWB_F)]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+  rtx ignore_vec = gen_reg_rtx (V4SFmode);
+  emit_insn (
+  gen_mve_vstrwq_scatter_base_wb_p_fv4sf_insn (operands[0], operands[2],
+					       operands[1], ignore_vec,
+					       operands[3]));
+  DONE;
+})
+
+;;
+;; [vstrwq_scatter_base_wb_p_f]
+;;
+(define_insn "mve_vstrwq_scatter_base_wb_p_fv4sf_insn"
+ [(set (mem:BLK (scratch))
+       (unspec:BLK
+		[(match_operand:V4SI 1 "s_register_operand" "0")
+		 (match_operand:SI 2 "mve_vldrd_immediate" "Ri")
+		 (match_operand:V4SF 3 "s_register_operand" "w")
+		 (match_operand:HI 4 "vpr_register_operand")]
+	VSTRWQSBWB_F))
+   (set (match_operand:V4SI 0 "s_register_operand" "=w")
+	(unspec:V4SI [(match_dup 1) (match_dup 2)]
+	 VSTRWQSBWB_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+   rtx ops[3];
+   ops[0] = operands[1];
+   ops[1] = operands[2];
+   ops[2] = operands[3];
+   output_asm_insn ("vpst\;\tvstrwt.u32\t%q2, [%q0, %1]!",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+(define_expand "mve_vstrdq_scatter_base_wb_<supf>v2di"
+  [(match_operand:V2DI 0 "s_register_operand" "=w")
+   (match_operand:SI 1 "mve_vldrd_immediate" "Ri")
+   (match_operand:V2DI 2 "s_register_operand" "w")
+   (unspec:V2DI [(const_int 0)] VSTRDSBWBQ)]
+  "TARGET_HAVE_MVE"
+{
+  rtx ignore_wb = gen_reg_rtx (V2DImode);
+  emit_insn (
+  gen_mve_vstrdq_scatter_base_wb_<supf>v2di_insn (ignore_wb, operands[0],
+						  operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "mve_vstrdq_scatter_base_wb_add_<supf>v2di"
+  [(match_operand:V2DI 0 "s_register_operand" "=w")
+   (match_operand:SI 1 "mve_vldrd_immediate" "Ri")
+   (match_operand:V2DI 2 "s_register_operand" "0")
+   (unspec:V2DI [(const_int 0)] VSTRDSBWBQ)]
+  "TARGET_HAVE_MVE"
+{
+  rtx ignore_vec = gen_reg_rtx (V2DImode);
+  emit_insn (
+  gen_mve_vstrdq_scatter_base_wb_<supf>v2di_insn (operands[0], operands[2],
+						  operands[1], ignore_vec));
+  DONE;
+})
+
+;;
+;; [vstrdq_scatter_base_wb_s vstrdq_scatter_base_wb_u]
+;;
+(define_insn "mve_vstrdq_scatter_base_wb_<supf>v2di_insn"
+  [(set (mem:BLK (scratch))
+	(unspec:BLK
+		[(match_operand:V2DI 1 "s_register_operand" "0")
+		 (match_operand:SI 2 "mve_vldrd_immediate" "Ri")
+		 (match_operand:V2DI 3 "s_register_operand" "w")]
+	 VSTRDSBWBQ))
+   (set (match_operand:V2DI 0 "s_register_operand" "=&w")
+	(unspec:V2DI [(match_dup 1) (match_dup 2)]
+	 VSTRDSBWBQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[1];
+   ops[1] = operands[2];
+   ops[2] = operands[3];
+   output_asm_insn ("vstrd.u64\t%q2, [%q0, %1]!",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+(define_expand "mve_vstrdq_scatter_base_wb_p_<supf>v2di"
+  [(match_operand:V2DI 0 "s_register_operand" "=w")
+   (match_operand:SI 1 "mve_vldrd_immediate" "Ri")
+   (match_operand:V2DI 2 "s_register_operand" "w")
+   (match_operand:HI 3 "vpr_register_operand")
+   (unspec:V2DI [(const_int 0)] VSTRDSBWBQ)]
+  "TARGET_HAVE_MVE"
+{
+  rtx ignore_wb = gen_reg_rtx (V2DImode);
+  emit_insn (
+  gen_mve_vstrdq_scatter_base_wb_p_<supf>v2di_insn (ignore_wb, operands[0],
+						    operands[1], operands[2],
+						    operands[3]));
+  DONE;
+})
+
+(define_expand "mve_vstrdq_scatter_base_wb_p_add_<supf>v2di"
+  [(match_operand:V2DI 0 "s_register_operand" "=w")
+   (match_operand:SI 1 "mve_vldrd_immediate" "Ri")
+   (match_operand:V2DI 2 "s_register_operand" "0")
+   (match_operand:HI 3 "vpr_register_operand")
+   (unspec:V2DI [(const_int 0)] VSTRDSBWBQ)]
+  "TARGET_HAVE_MVE"
+{
+  rtx ignore_vec = gen_reg_rtx (V2DImode);
+  emit_insn (
+  gen_mve_vstrdq_scatter_base_wb_p_<supf>v2di_insn (operands[0], operands[2],
+						    operands[1], ignore_vec,
+						    operands[3]));
+  DONE;
+})
+
+;;
+;; [vstrdq_scatter_base_wb_p_s vstrdq_scatter_base_wb_p_u]
+;;
+(define_insn "mve_vstrdq_scatter_base_wb_p_<supf>v2di_insn"
+  [(set (mem:BLK (scratch))
+	(unspec:BLK
+		[(match_operand:V2DI 1 "s_register_operand" "0")
+		 (match_operand:SI 2 "mve_vldrd_immediate" "Ri")
+		 (match_operand:V2DI 3 "s_register_operand" "w")
+		 (match_operand:HI 4 "vpr_register_operand")]
+	 VSTRDSBWBQ))
+   (set (match_operand:V2DI 0 "s_register_operand" "=w")
+	(unspec:V2DI [(match_dup 1) (match_dup 2)]
+	 VSTRDSBWBQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[1];
+   ops[1] = operands[2];
+   ops[2] = operands[3];
+   output_asm_insn ("vpst\;\tvstrdt.u64\t%q2, [%q0, %1]!",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+(define_expand "mve_vldrwq_gather_base_wb_<supf>v4si"
+  [(match_operand:V4SI 0 "s_register_operand")
+   (match_operand:V4SI 1 "s_register_operand")
+   (match_operand:SI 2 "mve_vldrd_immediate")
+   (unspec:V4SI [(const_int 0)] VLDRWGBWBQ)]
+  "TARGET_HAVE_MVE"
+{
+  rtx ignore_wb = gen_reg_rtx (V4SImode);
+  emit_insn (
+  gen_mve_vldrwq_gather_base_wb_<supf>v4si_insn (operands[0], ignore_wb,
+						 operands[1], operands[2]));
+  DONE;
+})
+
+;;
+;; [vldrwq_gather_base_wb_s vldrwq_gather_base_wb_u]
+;;
+(define_insn "mve_vldrwq_gather_base_wb_<supf>v4si_insn"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=&w")
+	(unspec:V4SI [(match_operand:V4SI 2 "s_register_operand" "1")
+		      (match_operand:SI 3 "mve_vldrd_immediate" "Ri")
+		      (mem:BLK (scratch))]
+	 VLDRWGBWBQ))
+   (set (match_operand:V4SI 1 "s_register_operand" "=&w")
+	(unspec:V4SI [(match_dup 2) (match_dup 3)]
+	 VLDRWGBWBQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[2];
+   ops[2] = operands[3];
+   output_asm_insn ("vldrw.u32\t%q0, [%q1, %2]!",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+(define_expand "mve_vldrwq_gather_base_wb_z_<supf>v4si"
+  [(match_operand:V4SI 0 "s_register_operand")
+   (match_operand:V4SI 1 "s_register_operand")
+   (match_operand:SI 2 "mve_vldrd_immediate")
+   (match_operand:HI 3 "vpr_register_operand")
+   (unspec:V4SI [(const_int 0)] VLDRWGBWBQ)]
+  "TARGET_HAVE_MVE"
+{
+  rtx ignore_wb = gen_reg_rtx (V4SImode);
+  emit_insn (
+  gen_mve_vldrwq_gather_base_wb_z_<supf>v4si_insn (operands[0], ignore_wb,
+						   operands[1], operands[2],
+						   operands[3]));
+  DONE;
+})
+
+;;
+;; [vldrwq_gather_base_wb_z_s vldrwq_gather_base_wb_z_u]
+;;
+(define_insn "mve_vldrwq_gather_base_wb_z_<supf>v4si_insn"
+  [(set (match_operand:V4SI 0 "s_register_operand" "=&w")
+	(unspec:V4SI [(match_operand:V4SI 2 "s_register_operand" "1")
+		      (match_operand:SI 3 "mve_vldrd_immediate" "Ri")
+		      (match_operand:HI 4 "vpr_register_operand" "Up")
+		      (mem:BLK (scratch))]
+	 VLDRWGBWBQ))
+   (set (match_operand:V4SI 1 "s_register_operand" "=&w")
+	(unspec:V4SI [(match_dup 2) (match_dup 3)]
+	 VLDRWGBWBQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[2];
+   ops[2] = operands[3];
+   output_asm_insn ("vpst\;\tvldrwt.u32\t%q0, [%q1, %2]!",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+(define_expand "mve_vldrwq_gather_base_wb_fv4sf"
+  [(match_operand:V4SF 0 "s_register_operand")
+   (match_operand:V4SI 1 "s_register_operand")
+   (match_operand:SI 2 "mve_vldrd_immediate")
+   (unspec:V4SI [(const_int 0)] VLDRWQGBWB_F)]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+  rtx ignore_wb = gen_reg_rtx (V4SImode);
+  emit_insn (
+  gen_mve_vldrwq_gather_base_wb_fv4sf_insn (operands[0], ignore_wb,
+					    operands[1], operands[2]));
+  DONE;
+})
+
+;;
+;; [vldrwq_gather_base_wb_f]
+;;
+(define_insn "mve_vldrwq_gather_base_wb_fv4sf_insn"
+  [(set (match_operand:V4SF 0 "s_register_operand" "=&w")
+	(unspec:V4SF [(match_operand:V4SI 2 "s_register_operand" "1")
+		      (match_operand:SI 3 "mve_vldrd_immediate" "Ri")
+		      (mem:BLK (scratch))]
+	 VLDRWQGBWB_F))
+   (set (match_operand:V4SI 1 "s_register_operand" "=&w")
+	(unspec:V4SI [(match_dup 2) (match_dup 3)]
+	 VLDRWQGBWB_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[2];
+   ops[2] = operands[3];
+   output_asm_insn ("vldrw.u32\t%q0, [%q1, %2]!",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+(define_expand "mve_vldrwq_gather_base_wb_z_fv4sf"
+  [(match_operand:V4SF 0 "s_register_operand")
+   (match_operand:V4SI 1 "s_register_operand")
+   (match_operand:SI 2 "mve_vldrd_immediate")
+   (match_operand:HI 3 "vpr_register_operand")
+   (unspec:V4SI [(const_int 0)] VLDRWQGBWB_F)]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+  rtx ignore_wb = gen_reg_rtx (V4SImode);
+  emit_insn (
+  gen_mve_vldrwq_gather_base_wb_z_fv4sf_insn (operands[0], ignore_wb,
+					      operands[1], operands[2],
+					      operands[3]));
+  DONE;
+})
+
+;;
+;; [vldrwq_gather_base_wb_z_f]
+;;
+(define_insn "mve_vldrwq_gather_base_wb_z_fv4sf_insn"
+  [(set (match_operand:V4SF 0 "s_register_operand" "=&w")
+	(unspec:V4SF [(match_operand:V4SI 2 "s_register_operand" "1")
+		      (match_operand:SI 3 "mve_vldrd_immediate" "Ri")
+		      (match_operand:HI 4 "vpr_register_operand" "Up")
+		      (mem:BLK (scratch))]
+	 VLDRWQGBWB_F))
+   (set (match_operand:V4SI 1 "s_register_operand" "=&w")
+	(unspec:V4SI [(match_dup 2) (match_dup 3)]
+	 VLDRWQGBWB_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[2];
+   ops[2] = operands[3];
+   output_asm_insn ("vpst\;\tvldrwt.u32\t%q0, [%q1, %2]!",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
+
+(define_expand "mve_vldrdq_gather_base_wb_<supf>v2di"
+  [(match_operand:V2DI 0 "s_register_operand")
+   (match_operand:V2DI 1 "s_register_operand")
+   (match_operand:SI 2 "mve_vldrd_immediate")
+   (unspec:V2DI [(const_int 0)] VLDRDGBWBQ)]
+  "TARGET_HAVE_MVE"
+{
+  rtx ignore_wb = gen_reg_rtx (V2DImode);
+  emit_insn (
+  gen_mve_vldrdq_gather_base_wb_<supf>v2di_insn (operands[0], ignore_wb,
+						 operands[1], operands[2]));
+  DONE;
+})
+
+;;
+;; [vldrdq_gather_base_wb_s vldrdq_gather_base_wb_u]
+;;
+(define_insn "mve_vldrdq_gather_base_wb_<supf>v2di_insn"
+  [(set (match_operand:V2DI 0 "s_register_operand" "=&w")
+	(unspec:V2DI [(match_operand:V2DI 2 "s_register_operand" "1")
+		      (match_operand:SI 3 "mve_vldrd_immediate" "Ri")
+		      (mem:BLK (scratch))]
+	 VLDRDGBWBQ))
+   (set (match_operand:V2DI 1 "s_register_operand" "=&w")
+	(unspec:V2DI [(match_dup 2) (match_dup 3)]
+	 VLDRDGBWBQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[2];
+   ops[2] = operands[3];
+   output_asm_insn ("vldrd.64\t%q0, [%q1, %2]!",ops);
+   return "";
+}
+  [(set_attr "length" "4")])
+
+(define_expand "mve_vldrdq_gather_base_wb_z_<supf>v2di"
+  [(match_operand:V2DI 0 "s_register_operand")
+   (match_operand:V2DI 1 "s_register_operand")
+   (match_operand:SI 2 "mve_vldrd_immediate")
+   (match_operand:HI 3 "vpr_register_operand")
+   (unspec:V2DI [(const_int 0)] VLDRDGBWBQ)]
+  "TARGET_HAVE_MVE"
+{
+  rtx ignore_wb = gen_reg_rtx (V2DImode);
+  emit_insn (
+  gen_mve_vldrdq_gather_base_wb_z_<supf>v2di_insn (operands[0], ignore_wb,
+						   operands[1], operands[2],
+						   operands[3]));
+  DONE;
+})
+
+;;
+;; [vldrdq_gather_base_wb_z_s vldrdq_gather_base_wb_z_u]
+;;
+(define_insn "mve_vldrdq_gather_base_wb_z_<supf>v2di_insn"
+  [(set (match_operand:V2DI 0 "s_register_operand" "=&w")
+	(unspec:V2DI [(match_operand:V2DI 2 "s_register_operand" "1")
+		      (match_operand:SI 3 "mve_vldrd_immediate" "Ri")
+		      (match_operand:HI 4 "vpr_register_operand" "Up")
+		      (mem:BLK (scratch))]
+	 VLDRDGBWBQ))
+   (set (match_operand:V2DI 1 "s_register_operand" "=&w")
+	(unspec:V2DI [(match_dup 2) (match_dup 3)]
+	 VLDRDGBWBQ))
+  ]
+  "TARGET_HAVE_MVE"
+{
+   rtx ops[3];
+   ops[0] = operands[0];
+   ops[1] = operands[2];
+   ops[2] = operands[3];
+   output_asm_insn ("vpst\;\tvldrdt.u64\t%q0, [%q1, %2]!",ops);
+   return "";
+}
+  [(set_attr "length" "8")])
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c
new file mode 100644
index 0000000000000000000000000000000000000000..244f67e71fb51e8332c80c52aca2c962885d2f13
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int64x2_t
+foo (uint64x2_t * addr)
+{
+  return vldrdq_gather_base_wb_s64 (addr, 8);
+}
+
+/* { dg-final { scan-assembler "vldrd.64"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c
new file mode 100644
index 0000000000000000000000000000000000000000..f0ef919ab0ceee7318340a121d2307fbc51ceed9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint64x2_t
+foo (uint64x2_t * addr)
+{
+  return vldrdq_gather_base_wb_u64 (addr, 8);
+}
+
+/* { dg-final { scan-assembler "vldrd.64"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c
new file mode 100644
index 0000000000000000000000000000000000000000..55f618bdb51a1c1ef06d83a8d5caab37415eeca0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c
@@ -0,0 +1,11 @@
+/* { dg-do compile  } */
+/* { dg-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+
+#include "arm_mve.h"
+
+int64x2_t foo (uint64x2_t * addr, mve_pred16_t p)
+{
+    return vldrdq_gather_base_wb_z_s64 (addr, 1016, p);
+}
+
+/* { dg-final { scan-assembler "vldrdt.u64"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c
new file mode 100644
index 0000000000000000000000000000000000000000..f165840344bfdc41613b4602d3acca246e637ca5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c
@@ -0,0 +1,11 @@
+/* { dg-do compile  } */
+/* { dg-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+
+#include "arm_mve.h"
+
+uint64x2_t foo (uint64x2_t * addr, mve_pred16_t p)
+{
+    return vldrdq_gather_base_wb_z_u64 (addr, 8, p);
+}
+
+/* { dg-final { scan-assembler "vldrdt.u64"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..c14829ee0b53df02fc27ab2507f0940cc665e500
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (uint32x4_t * addr)
+{
+  return vldrwq_gather_base_wb_f32 (addr, 8);
+}
+
+/* { dg-final { scan-assembler "vldrw.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..91e91a88b65c6ee319ad8e00c1ea6879b6ec9d30
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (uint32x4_t * addr)
+{
+  return vldrwq_gather_base_wb_s32 (addr, 8);
+}
+
+/* { dg-final { scan-assembler "vldrw.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..4dbee68d1e29f179ee0213bb5aaac21e716d429a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t * addr)
+{
+  return vldrwq_gather_base_wb_u32 (addr, 8);
+}
+
+/* { dg-final { scan-assembler "vldrw.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..457368a712f677c8a7278a1914af5979c7a8ce13
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (uint32x4_t * addr, mve_pred16_t p)
+{
+  return vldrwq_gather_base_wb_z_f32 (addr, 8, p);
+}
+
+/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..5ce458edeb28b1c2d3c28d7751458398db77af68
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (uint32x4_t * addr, mve_pred16_t p)
+{
+  return vldrwq_gather_base_wb_z_s32 (addr, 8, p);
+}
+
+/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..1c24ad852dd766934f38c8ce9725c5f6cefae938
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c
@@ -0,0 +1,13 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t * addr, mve_pred16_t p)
+{
+  return vldrwq_gather_base_wb_z_u32 (addr, 8, p);
+}
+
+/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_wb_p_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_wb_p_s64.c
new file mode 100644
index 0000000000000000000000000000000000000000..756a3a154181e9835e87b0830d70fa17eb5f8c6c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_wb_p_s64.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint64x2_t * addr, const int offset, int64x2_t value, mve_pred16_t p)
+{
+  vstrdq_scatter_base_wb_p_s64 (addr, 8, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrdt.u64"  }  } */
+
+void
+foo1 (uint64x2_t * addr, const int offset, int64x2_t value, mve_pred16_t p)
+{
+  vstrdq_scatter_base_wb_p (addr, 8, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrdt.u64"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_wb_p_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_wb_p_u64.c
new file mode 100644
index 0000000000000000000000000000000000000000..0e97dfadddec32b2c945417c8d8fb8620ec76cce
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_wb_p_u64.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint64x2_t * addr, const int offset, uint64x2_t value, mve_pred16_t p)
+{
+  vstrdq_scatter_base_wb_p_u64 (addr, 8, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrdt.u64"  }  } */
+
+void
+foo1 (uint64x2_t * addr, const int offset, uint64x2_t value, mve_pred16_t p)
+{
+  vstrdq_scatter_base_wb_p (addr, 8, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrdt.u64"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_wb_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_wb_s64.c
new file mode 100644
index 0000000000000000000000000000000000000000..b46c5b12c3d732cdcb19fcf2256a2b89f0f9a695
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_wb_s64.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint64x2_t * addr, const int offset, int64x2_t value)
+{
+  vstrdq_scatter_base_wb_s64 (addr, 8, value);
+}
+
+/* { dg-final { scan-assembler "vstrd.u64"  }  } */
+
+void
+foo1 (uint64x2_t * addr, const int offset, int64x2_t value)
+{
+  vstrdq_scatter_base_wb (addr, 8, value);
+}
+
+/* { dg-final { scan-assembler "vstrd.u64"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_wb_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_wb_u64.c
new file mode 100644
index 0000000000000000000000000000000000000000..62b0742cfb2f7ec7a3519969cad19f85cc272614
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_wb_u64.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint64x2_t * addr, const int offset, uint64x2_t value)
+{
+  vstrdq_scatter_base_wb_u64 (addr, 8, value);
+}
+
+/* { dg-final { scan-assembler "vstrd.u64"  }  } */
+
+void
+foo1 (uint64x2_t * addr, const int offset, uint64x2_t value)
+{
+  vstrdq_scatter_base_wb (addr, 8, value);
+}
+
+/* { dg-final { scan-assembler "vstrd.u64"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..45eaf31206f4addb33ad32b52cb45ce815876d24
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_f32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint32x4_t * addr, const int offset, float32x4_t value)
+{
+  vstrwq_scatter_base_wb_f32 (addr, 8, value);
+}
+
+/* { dg-final { scan-assembler "vstrw.u32"  }  } */
+
+void
+foo1 (uint32x4_t * addr, const int offset, float32x4_t value)
+{
+  vstrwq_scatter_base_wb (addr, 8, value);
+}
+
+/* { dg-final { scan-assembler "vstrw.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..9592d386ebb95a1ae94201554448b75f6086ab78
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_f32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint32x4_t * addr, const int offset, float32x4_t value, mve_pred16_t p)
+{
+  vstrwq_scatter_base_wb_p_f32 (addr, 8, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrwt.u32"  }  } */
+
+void
+foo1 (uint32x4_t * addr, const int offset, float32x4_t value, mve_pred16_t p)
+{
+  vstrwq_scatter_base_wb_p (addr, 8, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrwt.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..6b2b03dc26467d65b4c617c0d39bb6e062791bca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint32x4_t * addr, const int offset, int32x4_t value, mve_pred16_t p)
+{
+  vstrwq_scatter_base_wb_p_s32 (addr, 8, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrwt.u32"  }  } */
+
+void
+foo1 (uint32x4_t * addr, const int offset, int32x4_t value, mve_pred16_t p)
+{
+  vstrwq_scatter_base_wb_p (addr, 8, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrwt.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..afc5ff88287e6d39bf0209d92923769073b4a80d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint32x4_t * addr, const int offset, uint32x4_t value, mve_pred16_t p)
+{
+  vstrwq_scatter_base_wb_p_u32 (addr, 8, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrwt.u32"  }  } */
+
+void
+foo1 (uint32x4_t * addr, const int offset, uint32x4_t value, mve_pred16_t p)
+{
+  vstrwq_scatter_base_wb_p (addr, 8, value, p);
+}
+
+/* { dg-final { scan-assembler "vstrwt.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..214cd3ee79067eb7f0887bc8018793647ad02e1f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint32x4_t * addr, const int offset, int32x4_t value)
+{
+  vstrwq_scatter_base_wb_s32 (addr, 8, value);
+}
+
+/* { dg-final { scan-assembler "vstrw.u32"  }  } */
+
+void
+foo1 (uint32x4_t * addr, const int offset, int32x4_t value)
+{
+  vstrwq_scatter_base_wb (addr, 8, value);
+}
+
+/* { dg-final { scan-assembler "vstrw.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..a0afb7ab448598536aaec568f55ddb22e12c3f99
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+void
+foo (uint32x4_t * addr, uint32x4_t value)
+{
+  vstrwq_scatter_base_wb_u32 (addr, 8, value);
+}
+
+/* { dg-final { scan-assembler "vstrw.u32"  }  } */
+
+void
+foo1 (uint32x4_t * addr, uint32x4_t value)
+{
+  vstrwq_scatter_base_wb (addr, 8, value);
+}
+
+/* { dg-final { scan-assembler "vstrw.u32"  }  } */


[-- Attachment #2: diff31.patch.gz --]
[-- Type: application/gzip, Size: 5708 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][1/3x]: MVE intrinsics with ternary operands.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (36 preceding siblings ...)
  2019-11-14 19:16 ` [PATCH][ARM][GCC][2/3x]: MVE intrinsics with ternary operands Srinath Parvathaneni
@ 2019-11-14 19:27 ` Srinath Parvathaneni
  2019-12-12 16:09 ` [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Kyrill Tkachov
  38 siblings, 0 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:27 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 79154 bytes --]

Hello,

This patch supports following MVE ACLE intrinsics with ternary operands.

vabavq_s8, vabavq_s16, vabavq_s32, vbicq_m_n_s16, vbicq_m_n_s32,
vbicq_m_n_u16, vbicq_m_n_u32, vcmpeqq_m_f16, vcmpeqq_m_f32,
vcvtaq_m_s16_f16, vcvtaq_m_u16_f16, vcvtaq_m_s32_f32, vcvtaq_m_u32_f32,
vcvtq_m_f16_s16, vcvtq_m_f16_u16, vcvtq_m_f32_s32, vcvtq_m_f32_u32,
vqrshrnbq_n_s16, vqrshrnbq_n_u16, vqrshrnbq_n_s32, vqrshrnbq_n_u32,
vqrshrunbq_n_s16, vqrshrunbq_n_s32, vrmlaldavhaq_s32, vrmlaldavhaq_u32,
vshlcq_s8, vshlcq_u8, vshlcq_s16, vshlcq_u16, vshlcq_s32, vshlcq_u32,
vabavq_s8, vabavq_s16, vabavq_s32.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-10-23  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm-builtins.c (TERNOP_UNONE_UNONE_UNONE_IMM_QUALIFIERS):
	Define qualifier for ternary operands.
	(TERNOP_UNONE_UNONE_NONE_NONE_QUALIFIERS): Likewise.
	(TERNOP_UNONE_NONE_UNONE_IMM_QUALIFIERS): Likewise.
	(TERNOP_NONE_NONE_UNONE_IMM_QUALIFIERS): Likewise.
	(TERNOP_UNONE_UNONE_NONE_IMM_QUALIFIERS): Likewise.
	(TERNOP_UNONE_UNONE_NONE_UNONE_QUALIFIERS): Likewise.
	(TERNOP_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Likewise.
	(TERNOP_UNONE_NONE_NONE_UNONE_QUALIFIERS): Likewise.
	(TERNOP_NONE_NONE_NONE_IMM_QUALIFIERS): Likewise.
	(TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS): Likewise.
	(TERNOP_NONE_NONE_IMM_UNONE_QUALIFIERS): Likewise.
	(TERNOP_NONE_NONE_UNONE_UNONE_QUALIFIERS): Likewise.
	(TERNOP_UNONE_UNONE_UNONE_UNONE_QUALIFIERS): Likewise.
	(TERNOP_NONE_NONE_NONE_NONE_QUALIFIERS): Likewise.
	* config/arm/arm_mve.h (vabavq_s8): Define macro.
	(vabavq_s16): Likewise.
	(vabavq_s32): Likewise.
	(vbicq_m_n_s16): Likewise.
	(vbicq_m_n_s32): Likewise.
	(vbicq_m_n_u16): Likewise.
	(vbicq_m_n_u32): Likewise.
	(vcmpeqq_m_f16): Likewise.
	(vcmpeqq_m_f32): Likewise.
	(vcvtaq_m_s16_f16): Likewise.
	(vcvtaq_m_u16_f16): Likewise.
	(vcvtaq_m_s32_f32): Likewise.
	(vcvtaq_m_u32_f32): Likewise.
	(vcvtq_m_f16_s16): Likewise.
	(vcvtq_m_f16_u16): Likewise.
	(vcvtq_m_f32_s32): Likewise.
	(vcvtq_m_f32_u32): Likewise.
	(vqrshrnbq_n_s16): Likewise.
	(vqrshrnbq_n_u16): Likewise.
	(vqrshrnbq_n_s32): Likewise.
	(vqrshrnbq_n_u32): Likewise.
	(vqrshrunbq_n_s16): Likewise.
	(vqrshrunbq_n_s32): Likewise.
	(vrmlaldavhaq_s32): Likewise.
	(vrmlaldavhaq_u32): Likewise.
	(vshlcq_s8): Likewise.
	(vshlcq_u8): Likewise.
	(vshlcq_s16): Likewise.
	(vshlcq_u16): Likewise.
	(vshlcq_s32): Likewise.
	(vshlcq_u32): Likewise.
	(vabavq_u8): Likewise.
	(vabavq_u16): Likewise.
	(vabavq_u32): Likewise.
	(__arm_vabavq_s8): Define intrinsic.
	(__arm_vabavq_s16): Likewise.
	(__arm_vabavq_s32): Likewise.
	(__arm_vabavq_u8): Likewise.
	(__arm_vabavq_u16): Likewise.
	(__arm_vabavq_u32): Likewise.
	(__arm_vbicq_m_n_s16): Likewise.
	(__arm_vbicq_m_n_s32): Likewise.
	(__arm_vbicq_m_n_u16): Likewise.
	(__arm_vbicq_m_n_u32): Likewise.
	(__arm_vqrshrnbq_n_s16): Likewise.
	(__arm_vqrshrnbq_n_u16): Likewise.
	(__arm_vqrshrnbq_n_s32): Likewise.
	(__arm_vqrshrnbq_n_u32): Likewise.
	(__arm_vqrshrunbq_n_s16): Likewise.
	(__arm_vqrshrunbq_n_s32): Likewise.
	(__arm_vrmlaldavhaq_s32): Likewise.
	(__arm_vrmlaldavhaq_u32): Likewise.
	(__arm_vshlcq_s8): Likewise.
	(__arm_vshlcq_u8): Likewise.
	(__arm_vshlcq_s16): Likewise.
	(__arm_vshlcq_u16): Likewise.
	(__arm_vshlcq_s32): Likewise.
	(__arm_vshlcq_u32): Likewise.
	(__arm_vcmpeqq_m_f16): Likewise.
	(__arm_vcmpeqq_m_f32): Likewise.
	(__arm_vcvtaq_m_s16_f16): Likewise.
	(__arm_vcvtaq_m_u16_f16): Likewise.
	(__arm_vcvtaq_m_s32_f32): Likewise.
	(__arm_vcvtaq_m_u32_f32): Likewise.
	(__arm_vcvtq_m_f16_s16): Likewise.
	(__arm_vcvtq_m_f16_u16): Likewise.
	(__arm_vcvtq_m_f32_s32): Likewise.
	(__arm_vcvtq_m_f32_u32): Likewise.
	(vcvtaq_m): Define polymorphic variant.
	(vcvtq_m): Likewise.
	(vabavq): Likewise.
	(vshlcq): Likewise.
	(vbicq_m_n): Likewise.
	(vqrshrnbq_n): Likewise.
	(vqrshrunbq_n): Likewise.
	* config/arm/arm_mve_builtins.def
	(TERNOP_UNONE_UNONE_UNONE_IMM_QUALIFIERS): Use the builtin qualifer.
	(TERNOP_UNONE_UNONE_NONE_NONE_QUALIFIERS): Likewise.
	(TERNOP_UNONE_NONE_UNONE_IMM_QUALIFIERS): Likewise.
	(TERNOP_NONE_NONE_UNONE_IMM_QUALIFIERS): Likewise.
	(TERNOP_UNONE_UNONE_NONE_IMM_QUALIFIERS): Likewise.
	(TERNOP_UNONE_UNONE_NONE_UNONE_QUALIFIERS): Likewise.
	(TERNOP_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Likewise.
	(TERNOP_UNONE_NONE_NONE_UNONE_QUALIFIERS): Likewise.
	(TERNOP_NONE_NONE_NONE_IMM_QUALIFIERS): Likewise.
	(TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS): Likewise.
	(TERNOP_NONE_NONE_IMM_UNONE_QUALIFIERS): Likewise.
	(TERNOP_NONE_NONE_UNONE_UNONE_QUALIFIERS): Likewise.
	(TERNOP_UNONE_UNONE_UNONE_UNONE_QUALIFIERS): Likewise.
	(TERNOP_NONE_NONE_NONE_NONE_QUALIFIERS): Likewise.
	* config/arm/mve.md (VBICQ_M_N): Define iterator.
	(VCVTAQ_M): Likewise.
	(VCVTQ_M_TO_F): Likewise.
	(VQRSHRNBQ_N): Likewise.
	(VABAVQ): Likewise.
	(VSHLCQ): Likewise.
	(VRMLALDAVHAQ): Likewise.
	(mve_vbicq_m_n_<supf><mode>): Define RTL pattern.
	(mve_vcmpeqq_m_f<mode>): Likewise.
	(mve_vcvtaq_m_<supf><mode>): Likewise.
	(mve_vcvtq_m_to_f_<supf><mode>): Likewise.
	(mve_vqrshrnbq_n_<supf><mode>): Likewise.
	(mve_vqrshrunbq_n_s<mode>): Likewise.
	(mve_vrmlaldavhaq_<supf>v4si): Likewise.
	(mve_vabavq_<supf><mode>): Likewise.
	(mve_vshlcq_<supf><mode>): Likewise.
	(mve_vshlcq_<supf><mode>): Likewise.
	(mve_vshlcq_vec_<supf><mode>): Define RTL expand.
	(mve_vshlcq_carry_<supf><mode>): Likewise.

gcc/testsuite/ChangeLog:

2019-10-23  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/vabavq_s16.c: New test.
	* gcc.target/arm/mve/intrinsics/vabavq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabavq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabavq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabavq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vabavq_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_m_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_m_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_m_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vbicq_m_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcmpeqq_m_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtaq_m_s16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtaq_m_s32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtaq_m_u16_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtaq_m_u32_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_m_f16_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_m_f16_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_m_f32_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vcvtq_m_f32_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshrnbq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshrnbq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshrnbq_n_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshrnbq_n_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshrunbq_n_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vqrshrunbq_n_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmlaldavhaq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vrmlaldavhaq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlcq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlcq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlcq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlcq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlcq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vshlcq_u8.c: Likewise.


###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 73a0a3070bda2e35f3e400994dbb6ccfb3043f76..b0d2a19684dc798e1ae1a4c0496c6c4fddac891d 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -409,6 +409,96 @@ arm_binop_unone_unone_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define BINOP_UNONE_UNONE_NONE_QUALIFIERS \
   (arm_binop_unone_unone_none_qualifiers)
 
+static enum arm_type_qualifiers
+arm_ternop_unone_unone_unone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
+    qualifier_immediate };
+#define TERNOP_UNONE_UNONE_UNONE_IMM_QUALIFIERS \
+  (arm_ternop_unone_unone_unone_imm_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ternop_unone_unone_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_none, qualifier_none };
+#define TERNOP_UNONE_UNONE_NONE_NONE_QUALIFIERS \
+  (arm_ternop_unone_unone_none_none_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ternop_unone_none_unone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_none, qualifier_unsigned,
+      qualifier_immediate };
+#define TERNOP_UNONE_NONE_UNONE_IMM_QUALIFIERS \
+  (arm_ternop_unone_none_unone_imm_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ternop_none_none_unone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_unsigned, qualifier_immediate };
+#define TERNOP_NONE_NONE_UNONE_IMM_QUALIFIERS \
+  (arm_ternop_none_none_unone_imm_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ternop_unone_unone_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_none,
+    qualifier_immediate };
+#define TERNOP_UNONE_UNONE_NONE_IMM_QUALIFIERS \
+  (arm_ternop_unone_unone_none_imm_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ternop_unone_unone_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_none,
+      qualifier_unsigned };
+#define TERNOP_UNONE_UNONE_NONE_UNONE_QUALIFIERS \
+  (arm_ternop_unone_unone_none_unone_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ternop_unone_unone_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
+    qualifier_unsigned };
+#define TERNOP_UNONE_UNONE_IMM_UNONE_QUALIFIERS \
+  (arm_ternop_unone_unone_imm_unone_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ternop_unone_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_none, qualifier_none, qualifier_unsigned };
+#define TERNOP_UNONE_NONE_NONE_UNONE_QUALIFIERS \
+  (arm_ternop_unone_none_none_unone_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ternop_none_none_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none, qualifier_immediate };
+#define TERNOP_NONE_NONE_NONE_IMM_QUALIFIERS \
+  (arm_ternop_none_none_none_imm_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ternop_none_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none, qualifier_unsigned };
+#define TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS \
+  (arm_ternop_none_none_none_unone_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ternop_none_none_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_immediate, qualifier_unsigned };
+#define TERNOP_NONE_NONE_IMM_UNONE_QUALIFIERS \
+  (arm_ternop_none_none_imm_unone_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ternop_none_none_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_unsigned, qualifier_unsigned };
+#define TERNOP_NONE_NONE_UNONE_UNONE_QUALIFIERS \
+  (arm_ternop_none_none_unone_unone_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ternop_unone_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
+    qualifier_unsigned };
+#define TERNOP_UNONE_UNONE_UNONE_UNONE_QUALIFIERS \
+  (arm_ternop_unone_unone_unone_unone_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ternop_none_none_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none, qualifier_none };
+#define TERNOP_NONE_NONE_NONE_NONE_QUALIFIERS \
+  (arm_ternop_none_none_none_none_qualifiers)
+
 /* End of Qualifier for MVE builtins.  */
 
    /* void ([T element type] *, T, immediate).  */
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 63c1089b9d5b27767439edf9f38f617879ef4dda..8e4030f4e19a51e7ebd7a0208d3e6af5e101f2ca 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -742,6 +742,40 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vcvttq_f16_f32(__a, __b) __arm_vcvttq_f16_f32(__a, __b)
 #define vcvtbq_f16_f32(__a, __b) __arm_vcvtbq_f16_f32(__a, __b)
 #define vaddlvaq_s32(__a, __b) __arm_vaddlvaq_s32(__a, __b)
+#define vabavq_s8(__a, __b, __c) __arm_vabavq_s8(__a, __b, __c)
+#define vabavq_s16(__a, __b, __c) __arm_vabavq_s16(__a, __b, __c)
+#define vabavq_s32(__a, __b, __c) __arm_vabavq_s32(__a, __b, __c)
+#define vbicq_m_n_s16(__a,  __imm, __p) __arm_vbicq_m_n_s16(__a,  __imm, __p)
+#define vbicq_m_n_s32(__a,  __imm, __p) __arm_vbicq_m_n_s32(__a,  __imm, __p)
+#define vbicq_m_n_u16(__a,  __imm, __p) __arm_vbicq_m_n_u16(__a,  __imm, __p)
+#define vbicq_m_n_u32(__a,  __imm, __p) __arm_vbicq_m_n_u32(__a,  __imm, __p)
+#define vcmpeqq_m_f16(__a, __b, __p) __arm_vcmpeqq_m_f16(__a, __b, __p)
+#define vcmpeqq_m_f32(__a, __b, __p) __arm_vcmpeqq_m_f32(__a, __b, __p)
+#define vcvtaq_m_s16_f16(__inactive, __a, __p) __arm_vcvtaq_m_s16_f16(__inactive, __a, __p)
+#define vcvtaq_m_u16_f16(__inactive, __a, __p) __arm_vcvtaq_m_u16_f16(__inactive, __a, __p)
+#define vcvtaq_m_s32_f32(__inactive, __a, __p) __arm_vcvtaq_m_s32_f32(__inactive, __a, __p)
+#define vcvtaq_m_u32_f32(__inactive, __a, __p) __arm_vcvtaq_m_u32_f32(__inactive, __a, __p)
+#define vcvtq_m_f16_s16(__inactive, __a, __p) __arm_vcvtq_m_f16_s16(__inactive, __a, __p)
+#define vcvtq_m_f16_u16(__inactive, __a, __p) __arm_vcvtq_m_f16_u16(__inactive, __a, __p)
+#define vcvtq_m_f32_s32(__inactive, __a, __p) __arm_vcvtq_m_f32_s32(__inactive, __a, __p)
+#define vcvtq_m_f32_u32(__inactive, __a, __p) __arm_vcvtq_m_f32_u32(__inactive, __a, __p)
+#define vqrshrnbq_n_s16(__a, __b,  __imm) __arm_vqrshrnbq_n_s16(__a, __b,  __imm)
+#define vqrshrnbq_n_u16(__a, __b,  __imm) __arm_vqrshrnbq_n_u16(__a, __b,  __imm)
+#define vqrshrnbq_n_s32(__a, __b,  __imm) __arm_vqrshrnbq_n_s32(__a, __b,  __imm)
+#define vqrshrnbq_n_u32(__a, __b,  __imm) __arm_vqrshrnbq_n_u32(__a, __b,  __imm)
+#define vqrshrunbq_n_s16(__a, __b,  __imm) __arm_vqrshrunbq_n_s16(__a, __b,  __imm)
+#define vqrshrunbq_n_s32(__a, __b,  __imm) __arm_vqrshrunbq_n_s32(__a, __b,  __imm)
+#define vrmlaldavhaq_s32(__a, __b, __c) __arm_vrmlaldavhaq_s32(__a, __b, __c)
+#define vrmlaldavhaq_u32(__a, __b, __c) __arm_vrmlaldavhaq_u32(__a, __b, __c)
+#define vshlcq_s8(__a,  __b,  __imm) __arm_vshlcq_s8(__a,  __b,  __imm)
+#define vshlcq_u8(__a,  __b,  __imm) __arm_vshlcq_u8(__a,  __b,  __imm)
+#define vshlcq_s16(__a,  __b,  __imm) __arm_vshlcq_s16(__a,  __b,  __imm)
+#define vshlcq_u16(__a,  __b,  __imm) __arm_vshlcq_u16(__a,  __b,  __imm)
+#define vshlcq_s32(__a,  __b,  __imm) __arm_vshlcq_s32(__a,  __b,  __imm)
+#define vshlcq_u32(__a,  __b,  __imm) __arm_vshlcq_u32(__a,  __b,  __imm)
+#define vabavq_u8(__a, __b, __c) __arm_vabavq_u8(__a, __b, __c)
+#define vabavq_u16(__a, __b, __c) __arm_vabavq_u16(__a, __b, __c)
+#define vabavq_u32(__a, __b, __c) __arm_vabavq_u32(__a, __b, __c)
 #endif
 
 __extension__ extern __inline void
@@ -4485,6 +4519,186 @@ __arm_vaddlvaq_s32 (int64_t __a, int32x4_t __b)
   return __builtin_mve_vaddlvaq_sv4si (__a, __b);
 }
 
+__extension__ extern __inline uint32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vabavq_s8 (uint32_t __a, int8x16_t __b, int8x16_t __c)
+{
+  return __builtin_mve_vabavq_sv16qi (__a, __b, __c);
+}
+
+__extension__ extern __inline uint32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vabavq_s16 (uint32_t __a, int16x8_t __b, int16x8_t __c)
+{
+  return __builtin_mve_vabavq_sv8hi (__a, __b, __c);
+}
+
+__extension__ extern __inline uint32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vabavq_s32 (uint32_t __a, int32x4_t __b, int32x4_t __c)
+{
+  return __builtin_mve_vabavq_sv4si (__a, __b, __c);
+}
+
+__extension__ extern __inline uint32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vabavq_u8 (uint32_t __a, uint8x16_t __b, uint8x16_t __c)
+{
+  return __builtin_mve_vabavq_uv16qi(__a, __b, __c);
+}
+
+__extension__ extern __inline uint32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vabavq_u16 (uint32_t __a, uint16x8_t __b, uint16x8_t __c)
+{
+  return __builtin_mve_vabavq_uv8hi(__a, __b, __c);
+}
+
+__extension__ extern __inline uint32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vabavq_u32 (uint32_t __a, uint32x4_t __b, uint32x4_t __c)
+{
+  return __builtin_mve_vabavq_uv4si(__a, __b, __c);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vbicq_m_n_s16 (int16x8_t __a, const int __imm, mve_pred16_t __p)
+{
+  return __builtin_mve_vbicq_m_n_sv8hi (__a, __imm, __p);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vbicq_m_n_s32 (int32x4_t __a, const int __imm, mve_pred16_t __p)
+{
+  return __builtin_mve_vbicq_m_n_sv4si (__a, __imm, __p);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vbicq_m_n_u16 (uint16x8_t __a, const int __imm, mve_pred16_t __p)
+{
+  return __builtin_mve_vbicq_m_n_uv8hi (__a, __imm, __p);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vbicq_m_n_u32 (uint32x4_t __a, const int __imm, mve_pred16_t __p)
+{
+  return __builtin_mve_vbicq_m_n_uv4si (__a, __imm, __p);
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vqrshrnbq_n_s16 (int8x16_t __a, int16x8_t __b, const int __imm)
+{
+  return __builtin_mve_vqrshrnbq_n_sv8hi (__a, __b, __imm);
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vqrshrnbq_n_u16 (uint8x16_t __a, uint16x8_t __b, const int __imm)
+{
+  return __builtin_mve_vqrshrnbq_n_uv8hi (__a, __b, __imm);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vqrshrnbq_n_s32 (int16x8_t __a, int32x4_t __b, const int __imm)
+{
+  return __builtin_mve_vqrshrnbq_n_sv4si (__a, __b, __imm);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vqrshrnbq_n_u32 (uint16x8_t __a, uint32x4_t __b, const int __imm)
+{
+  return __builtin_mve_vqrshrnbq_n_uv4si (__a, __b, __imm);
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vqrshrunbq_n_s16 (uint8x16_t __a, int16x8_t __b, const int __imm)
+{
+  return __builtin_mve_vqrshrunbq_n_sv8hi (__a, __b, __imm);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vqrshrunbq_n_s32 (uint16x8_t __a, int32x4_t __b, const int __imm)
+{
+  return __builtin_mve_vqrshrunbq_n_sv4si (__a, __b, __imm);
+}
+
+__extension__ extern __inline int64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrmlaldavhaq_s32 (int64_t __a, int32x4_t __b, int32x4_t __c)
+{
+  return __builtin_mve_vrmlaldavhaq_sv4si (__a, __b, __c);
+}
+
+__extension__ extern __inline uint64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vrmlaldavhaq_u32 (uint64_t __a, uint32x4_t __b, uint32x4_t __c)
+{
+  return __builtin_mve_vrmlaldavhaq_uv4si (__a, __b, __c);
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlcq_s8 (int8x16_t __a, uint32_t * __b, const int __imm)
+{
+  int8x16_t __res = __builtin_mve_vshlcq_vec_sv16qi (__a, *__b, __imm);
+  *__b = __builtin_mve_vshlcq_carry_sv16qi (__a, *__b, __imm);
+  return __res;
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlcq_u8 (uint8x16_t __a, uint32_t * __b, const int __imm)
+{
+  uint8x16_t __res = __builtin_mve_vshlcq_vec_uv16qi (__a, *__b, __imm);
+  *__b = __builtin_mve_vshlcq_carry_uv16qi (__a, *__b, __imm);
+  return __res;
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlcq_s16 (int16x8_t __a, uint32_t * __b, const int __imm)
+{
+  int16x8_t __res = __builtin_mve_vshlcq_vec_sv8hi (__a, *__b, __imm);
+  *__b = __builtin_mve_vshlcq_carry_sv8hi (__a, *__b, __imm);
+  return __res;
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlcq_u16 (uint16x8_t __a, uint32_t * __b, const int __imm)
+{
+  uint16x8_t __res = __builtin_mve_vshlcq_vec_uv8hi (__a, *__b, __imm);
+  *__b = __builtin_mve_vshlcq_carry_uv8hi (__a, *__b, __imm);
+  return __res;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlcq_s32 (int32x4_t __a, uint32_t * __b, const int __imm)
+{
+  int32x4_t __res = __builtin_mve_vshlcq_vec_sv4si (__a, *__b, __imm);
+  *__b = __builtin_mve_vshlcq_carry_sv4si (__a, *__b, __imm);
+  return __res;
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vshlcq_u32 (uint32x4_t __a, uint32_t * __b, const int __imm)
+{
+  uint32x4_t __res = __builtin_mve_vshlcq_vec_uv4si (__a, *__b, __imm);
+  *__b = __builtin_mve_vshlcq_carry_uv4si (__a, *__b, __imm);
+  return __res;
+}
+
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
 __extension__ extern __inline void
@@ -5443,6 +5657,76 @@ __arm_vcvtbq_f16_f32 (float16x8_t __a, float32x4_t __b)
   return __builtin_mve_vcvtbq_f16_f32v8hf (__a, __b);
 }
 
+__extension__ extern __inline mve_pred16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcmpeqq_m_f16 (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
+{
+  return __builtin_mve_vcmpeqq_m_fv8hf (__a, __b, __p);
+}
+
+__extension__ extern __inline mve_pred16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcmpeqq_m_f32 (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
+{
+  return __builtin_mve_vcmpeqq_m_fv4sf (__a, __b, __p);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtaq_m_s16_f16 (int16x8_t __inactive, float16x8_t __a, mve_pred16_t __p)
+{
+  return __builtin_mve_vcvtaq_m_sv8hi (__inactive, __a, __p);
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtaq_m_u16_f16 (uint16x8_t __inactive, float16x8_t __a, mve_pred16_t __p)
+{
+  return __builtin_mve_vcvtaq_m_uv8hi (__inactive, __a, __p);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtaq_m_s32_f32 (int32x4_t __inactive, float32x4_t __a, mve_pred16_t __p)
+{
+  return __builtin_mve_vcvtaq_m_sv4si (__inactive, __a, __p);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtaq_m_u32_f32 (uint32x4_t __inactive, float32x4_t __a, mve_pred16_t __p)
+{
+  return __builtin_mve_vcvtaq_m_uv4si (__inactive, __a, __p);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_m_f16_s16 (float16x8_t __inactive, int16x8_t __a, mve_pred16_t __p)
+{
+  return __builtin_mve_vcvtq_m_to_f_sv8hf (__inactive, __a, __p);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_m_f16_u16 (float16x8_t __inactive, uint16x8_t __a, mve_pred16_t __p)
+{
+  return __builtin_mve_vcvtq_m_to_f_uv8hf (__inactive, __a, __p);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_m_f32_s32 (float32x4_t __inactive, int32x4_t __a, mve_pred16_t __p)
+{
+  return __builtin_mve_vcvtq_m_to_f_sv4sf (__inactive, __a, __p);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtq_m_f32_u32 (float32x4_t __inactive, uint32x4_t __a, mve_pred16_t __p)
+{
+  return __builtin_mve_vcvtq_m_to_f_uv4sf (__inactive, __a, __p);
+}
+
 #endif
 
 enum {
@@ -6165,6 +6449,27 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16_t]: __arm_vcmpgeq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16_t)), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32_t]: __arm_vcmpgeq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32_t)));})
 
+#define vcmpeqq_m(p0,p1,p2) __arm_vcmpeqq_m(p0,p1,p2)
+#define __arm_vcmpeqq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcmpeqq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), p2), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcmpeqq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpeqq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vcmpeqq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcmpeqq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcmpeqq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), p2), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8_t]: __arm_vcmpeqq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t), p2), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16_t]: __arm_vcmpeqq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t), p2), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32_t]: __arm_vcmpeqq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t), p2), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8_t]: __arm_vcmpeqq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t), p2), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16_t]: __arm_vcmpeqq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t), p2), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32_t]: __arm_vcmpeqq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t), p2), \
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmpeqq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), p2), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmpeqq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), p2), \
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16_t]: __arm_vcmpeqq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16_t), p2), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32_t]: __arm_vcmpeqq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32_t), p2));})
+
 #define vcmpgtq(p0,p1) __arm_vcmpgtq(p0,p1)
 #define __arm_vcmpgtq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
@@ -6180,6 +6485,25 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16_t]: __arm_vcmpgtq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16_t)), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32_t]: __arm_vcmpgtq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32_t)));})
 
+#define vcvtaq_m(p0,p1,p2) __arm_vcvtaq_m(p0,p1,p2)
+#define __arm_vcvtaq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcvtaq_m_s16_f16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, float16x8_t), p2), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcvtaq_m_s32_f32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, float32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcvtaq_m_u16_f16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, float16x8_t), p2), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcvtaq_m_u32_f32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, float32x4_t), p2));})
+
+#define vcvtq_m(p0,p1,p2) __arm_vcvtq_m(p0,p1,p2)
+#define __arm_vcvtq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcvtq_m_f16_s16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcvtq_m_f32_s32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcvtq_m_f16_u16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
+  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcvtq_m_f32_u32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
+
 #else /* MVE Interger.  */
 
 #define vst4q(p0,p1) __arm_vst4q(p0,p1)
@@ -7240,6 +7564,77 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmlsldavq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmlsldavq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)));})
 
+#define vabavq(p0,p1,p2) __arm_vabavq(p0,p1,p2)
+#define __arm_vabavq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vabavq_s8 (__p0, __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vabavq_s16 (__p0, __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vabavq_s32 (__p0, __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vabavq_u8 (__p0, __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vabavq_u16 (__p0, __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vabavq_u32 (__p0, __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t)));})
+
+#define vshlcq(p0,p1,p2) __arm_vshlcq(p0,p1,p2)
+#define __arm_vshlcq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int8x16_t]: __arm_vshlcq_s8 (__ARM_mve_coerce(__p0, int8x16_t), p1, p2), \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vshlcq_s16 (__ARM_mve_coerce(__p0, int16x8_t), p1, p2), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vshlcq_s32 (__ARM_mve_coerce(__p0, int32x4_t), p1, p2), \
+  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vshlcq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), p1, p2), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vshlcq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), p1, p2), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vshlcq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), p1, p2));})
+
+#define vrmlaldavhaq(p0,p1,p2) __arm_vrmlaldavhaq(p0,p1,p2)
+#define __arm_vrmlaldavhaq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int64_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vrmlaldavhaq_s32 (__ARM_mve_coerce(__p0, int64_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t)), \
+  int (*)[__ARM_mve_type_uint64_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vrmlaldavhaq_u32 (__ARM_mve_coerce(__p0, uint64_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t)));})
+
+#define vcmpeqq_m(p0,p1,p2) __arm_vcmpeqq_m(p0,p1,p2)
+#define __arm_vcmpeqq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcmpeqq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), p2), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcmpeqq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcmpeqq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vcmpeqq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), p2), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcmpeqq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcmpeqq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), p2), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8_t]: __arm_vcmpeqq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8_t), p2), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16_t]: __arm_vcmpeqq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16_t), p2), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32_t]: __arm_vcmpeqq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32_t), p2), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8_t]: __arm_vcmpeqq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8_t), p2), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16_t]: __arm_vcmpeqq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16_t), p2), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32_t]: __arm_vcmpeqq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32_t), p2));})
+
+#define vbicq_m_n(p0,p1,p2) __arm_vbicq_m_n(p0,p1,p2)
+#define __arm_vbicq_m_n(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
+  int (*)[__ARM_mve_type_int16x8_t]: __arm_vbicq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), p1, p2), \
+  int (*)[__ARM_mve_type_int32x4_t]: __arm_vbicq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), p1, p2), \
+  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vbicq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), p1, p2), \
+  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vbicq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), p1, p2));})
+
+#define vqrshrnbq_n(p0,p1,p2) __arm_vqrshrnbq_n(p0,p1,p2)
+#define __arm_vqrshrnbq_n(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int16x8_t]: __arm_vqrshrnbq_n_s16 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int32x4_t]: __arm_vqrshrnbq_n_s32 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint16x8_t]: __arm_vqrshrnbq_n_u16 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint32x4_t]: __arm_vqrshrnbq_n_u32 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
+
+#define vqrshrunbq_n(p0,p1,p2) __arm_vqrshrunbq_n(p0,p1,p2)
+#define __arm_vqrshrunbq_n(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
+  __typeof(p1) __p1 = (p1); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int16x8_t]: __arm_vqrshrunbq_n_s16 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int32x4_t]: __arm_vqrshrunbq_n_s32 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, int32x4_t), p2));})
+
 #endif /* MVE Floating point.  */
 
 #ifdef __cplusplus
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index c890a575c7b865554f9645214e94977910825109..f11829a1a5ffd15f77c5ce58e1add9a336df76b8 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -291,3 +291,21 @@ VAR1 (BINOP_NONE_NONE_NONE, vrmlaldavhq_s, v4si)
 VAR1 (BINOP_NONE_NONE_NONE, vcvttq_f16_f32, v8hf)
 VAR1 (BINOP_NONE_NONE_NONE, vcvtbq_f16_f32, v8hf)
 VAR1 (BINOP_NONE_NONE_NONE, vaddlvaq_s, v4si)
+VAR2 (TERNOP_NONE_NONE_IMM_UNONE, vbicq_m_n_s, v8hi, v4si)
+VAR2 (TERNOP_UNONE_UNONE_IMM_UNONE, vbicq_m_n_u, v8hi, v4si)
+VAR2 (TERNOP_NONE_NONE_NONE_IMM, vqrshrnbq_n_s, v8hi, v4si)
+VAR2 (TERNOP_UNONE_UNONE_UNONE_IMM, vqrshrnbq_n_u, v8hi, v4si)
+VAR1 (TERNOP_NONE_NONE_NONE_NONE, vrmlaldavhaq_s, v4si)
+VAR1 (TERNOP_UNONE_UNONE_UNONE_UNONE, vrmlaldavhaq_u, v4si)
+VAR2 (TERNOP_NONE_NONE_UNONE_UNONE, vcvtq_m_to_f_u, v8hf, v4sf)
+VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vcvtq_m_to_f_s, v8hf, v4sf)
+VAR2 (TERNOP_UNONE_NONE_NONE_UNONE, vcmpeqq_m_f, v8hf, v4sf)
+VAR3 (TERNOP_UNONE_NONE_UNONE_IMM, vshlcq_carry_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_UNONE_UNONE_UNONE_IMM, vshlcq_carry_u, v16qi, v8hi, v4si)
+VAR2 (TERNOP_UNONE_UNONE_NONE_IMM, vqrshrunbq_n_s, v8hi, v4si)
+VAR3 (TERNOP_UNONE_UNONE_NONE_NONE, vabavq_s, v16qi, v8hi, v4si)
+VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vabavq_u, v16qi, v8hi, v4si)
+VAR2 (TERNOP_UNONE_UNONE_NONE_UNONE, vcvtaq_m_u, v8hi, v4si)
+VAR2 (TERNOP_NONE_NONE_NONE_UNONE, vcvtaq_m_s, v8hi, v4si)
+VAR3 (TERNOP_UNONE_UNONE_UNONE_IMM, vshlcq_vec_u, v16qi, v8hi, v4si)
+VAR3 (TERNOP_NONE_NONE_UNONE_IMM, vshlcq_vec_s, v16qi, v8hi, v4si)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index c932b1bbfcea5d32d4f24df891bfe3e778e641fa..54d2314cb01c7761c115cf7cb516a3d0a3589182 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -85,7 +85,11 @@
 			 VSHLLBQ_U VSHLLTQ_U VSHLLTQ_S VQMOVNTQ_U VQMOVNTQ_S
 			 VSHLLBQ_N_S VSHLLBQ_N_U VSHLLTQ_N_U VSHLLTQ_N_S
 			 VRMLALDAVHQ_U VRMLALDAVHQ_S VMULLTQ_POLY_P
-			 VMULLBQ_POLY_P])
+			 VMULLBQ_POLY_P VBICQ_M_N_S VBICQ_M_N_U VCMPEQQ_M_F
+			 VCVTAQ_M_S VCVTAQ_M_U VCVTQ_M_TO_F_S VCVTQ_M_TO_F_U
+			 VQRSHRNBQ_N_U VQRSHRNBQ_N_S VQRSHRUNBQ_N_S
+			 VRMLALDAVHAQ_S VABAVQ_S VABAVQ_U VSHLCQ_S VSHLCQ_U
+			 VRMLALDAVHAQ_U])
 
 (define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF")
 			    (V8HF "V8HI") (V4SF "V4SI")])
@@ -146,7 +150,12 @@
 		       (VQMOVNBQ_U "u") (VQMOVNBQ_S "s") (VQMOVNTQ_S "s")
 		       (VQMOVNTQ_U "u") (VSHLLBQ_N_U "u") (VSHLLBQ_N_S "s")
 		       (VSHLLTQ_N_U "u") (VSHLLTQ_N_S "s") (VRMLALDAVHQ_U "u")
-		       (VRMLALDAVHQ_S "s")])
+		       (VRMLALDAVHQ_S "s") (VBICQ_M_N_S "s") (VBICQ_M_N_U "u")
+		       (VCVTAQ_M_S "s") (VCVTAQ_M_U "u") (VCVTQ_M_TO_F_S "s")
+		       (VCVTQ_M_TO_F_U "u") (VQRSHRNBQ_N_S "s")
+		       (VQRSHRNBQ_N_U "u") (VABAVQ_S "s") (VABAVQ_U "u")
+		       (VRMLALDAVHAQ_U "u") (VRMLALDAVHAQ_S "s") (VSHLCQ_S "s")
+		       (VSHLCQ_U "u")])
 
 (define_int_attr mode1 [(VCTP8Q "8") (VCTP16Q "16") (VCTP32Q "32")
 			(VCTP64Q "64") (VCTP8Q_M "8") (VCTP16Q_M "16")
@@ -241,6 +250,13 @@
 (define_int_iterator VSHLLBQ_N [VSHLLBQ_N_S VSHLLBQ_N_U])
 (define_int_iterator VSHLLTQ_N [VSHLLTQ_N_U VSHLLTQ_N_S])
 (define_int_iterator VRMLALDAVHQ [VRMLALDAVHQ_U VRMLALDAVHQ_S])
+(define_int_iterator VBICQ_M_N [VBICQ_M_N_S VBICQ_M_N_U])
+(define_int_iterator VCVTAQ_M [VCVTAQ_M_S VCVTAQ_M_U])
+(define_int_iterator VCVTQ_M_TO_F [VCVTQ_M_TO_F_S VCVTQ_M_TO_F_U])
+(define_int_iterator VQRSHRNBQ_N [VQRSHRNBQ_N_U VQRSHRNBQ_N_S])
+(define_int_iterator VABAVQ [VABAVQ_S VABAVQ_U])
+(define_int_iterator VSHLCQ [VSHLCQ_S VSHLCQ_U])
+(define_int_iterator VRMLALDAVHAQ [VRMLALDAVHAQ_S VRMLALDAVHAQ_U])
 
 (define_insn "*mve_mov<mode>"
   [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
@@ -3050,3 +3066,170 @@
   "vrmlaldavh.<supf>32 %Q0, %R0, %q1, %q2"
   [(set_attr "type" "mve_move")
 ])
+
+;;
+;; [vbicq_m_n_s, vbicq_m_n_u])
+;;
+(define_insn "mve_vbicq_m_n_<supf><mode>"
+  [
+   (set (match_operand:MVE_5 0 "s_register_operand" "=w")
+	(unspec:MVE_5 [(match_operand:MVE_5 1 "s_register_operand" "0")
+		       (match_operand:SI 2 "immediate_operand" "i")
+		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+	 VBICQ_M_N))
+  ]
+  "TARGET_HAVE_MVE"
+  "vpst\;vbict.i%#<V_sz_elem>	%q0, %2"
+  [(set_attr "type" "mve_move")
+   (set_attr "length""8")])
+;;
+;; [vcmpeqq_m_f])
+;;
+(define_insn "mve_vcmpeqq_m_f<mode>"
+  [
+   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
+	(unspec:HI [(match_operand:MVE_0 1 "s_register_operand" "w")
+		    (match_operand:MVE_0 2 "s_register_operand" "w")
+		    (match_operand:HI 3 "vpr_register_operand" "Up")]
+	 VCMPEQQ_M_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vpst\;vcmpt.f%#<V_sz_elem>	eq, %q1, %q2"
+  [(set_attr "type" "mve_move")
+   (set_attr "length""8")])
+;;
+;; [vcvtaq_m_u, vcvtaq_m_s])
+;;
+(define_insn "mve_vcvtaq_m_<supf><mode>"
+  [
+   (set (match_operand:MVE_5 0 "s_register_operand" "=w")
+	(unspec:MVE_5 [(match_operand:MVE_5 1 "s_register_operand" "0")
+		       (match_operand:<MVE_CNVT> 2 "s_register_operand" "w")
+		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+	 VCVTAQ_M))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vpst\;vcvtat.<supf>%#<V_sz_elem>.f%#<V_sz_elem>\t%q0, %q2"
+  [(set_attr "type" "mve_move")
+   (set_attr "length""8")])
+;;
+;; [vcvtq_m_to_f_s, vcvtq_m_to_f_u])
+;;
+(define_insn "mve_vcvtq_m_to_f_<supf><mode>"
+  [
+   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
+	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
+		       (match_operand:<MVE_CNVT> 2 "s_register_operand" "w")
+		       (match_operand:HI 3 "vpr_register_operand" "Up")]
+	 VCVTQ_M_TO_F))
+  ]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+  "vpst\;vcvtt.f%#<V_sz_elem>.<supf>%#<V_sz_elem>	 %q0, %q2"
+  [(set_attr "type" "mve_move")
+   (set_attr "length""8")])
+;;
+;; [vqrshrnbq_n_u, vqrshrnbq_n_s])
+;;
+(define_insn "mve_vqrshrnbq_n_<supf><mode>"
+  [
+   (set (match_operand:<V_narrow_pack> 0 "s_register_operand" "=w")
+	(unspec:<V_narrow_pack> [(match_operand:<V_narrow_pack> 1 "s_register_operand" "0")
+				 (match_operand:MVE_5 2 "s_register_operand" "w")
+				 (match_operand:SI 3 "mve_imm_8" "Rb")]
+	 VQRSHRNBQ_N))
+  ]
+  "TARGET_HAVE_MVE"
+  "vqrshrnb.<supf>%#<V_sz_elem>	%q0, %q2, %3"
+  [(set_attr "type" "mve_move")
+])
+;;
+;; [vqrshrunbq_n_s])
+;;
+(define_insn "mve_vqrshrunbq_n_s<mode>"
+  [
+   (set (match_operand:<V_narrow_pack> 0 "s_register_operand" "=w")
+	(unspec:<V_narrow_pack> [(match_operand:<V_narrow_pack> 1 "s_register_operand" "0")
+				 (match_operand:MVE_5 2 "s_register_operand" "w")
+				 (match_operand:SI 3 "mve_imm_8" "Rb")]
+	 VQRSHRUNBQ_N_S))
+  ]
+  "TARGET_HAVE_MVE"
+  "vqrshrunb.s%#<V_sz_elem>\t%q0, %q2, %3"
+  [(set_attr "type" "mve_move")
+])
+;;
+;; [vrmlaldavhaq_s vrmlaldavhaq_u])
+;;
+(define_insn "mve_vrmlaldavhaq_<supf>v4si"
+  [
+   (set (match_operand:DI 0 "s_register_operand" "=r")
+	(unspec:DI [(match_operand:DI 1 "s_register_operand" "0")
+		    (match_operand:V4SI 2 "s_register_operand" "w")
+		    (match_operand:V4SI 3 "s_register_operand" "w")]
+	 VRMLALDAVHAQ))
+  ]
+  "TARGET_HAVE_MVE"
+  "vrmlaldavha.<supf>32 %Q0, %R0, %q2, %q3"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vabavq_s, vabavq_u])
+;;
+(define_insn "mve_vabavq_<supf><mode>"
+  [
+   (set (match_operand:SI 0 "s_register_operand" "=r")
+	(unspec:SI [(match_operand:SI 1 "s_register_operand" "0")
+		    (match_operand:MVE_2 2 "s_register_operand" "w")
+		    (match_operand:MVE_2 3 "s_register_operand" "w")]
+	 VABAVQ))
+  ]
+  "TARGET_HAVE_MVE"
+  "vabav.<supf>%#<V_sz_elem>\t%0, %q2, %q3"
+  [(set_attr "type" "mve_move")
+])
+
+;;
+;; [vshlcq_u vshlcq_s]
+;;
+(define_expand "mve_vshlcq_vec_<supf><mode>"
+ [(match_operand:MVE_2 0 "s_register_operand")
+  (match_operand:MVE_2 1 "s_register_operand")
+  (match_operand:SI 2 "s_register_operand")
+  (match_operand:SI 3 "mve_imm_32")
+  (unspec:MVE_2 [(const_int 0)] VSHLCQ)]
+ "TARGET_HAVE_MVE"
+{
+  rtx ignore_wb = gen_reg_rtx (SImode);
+  emit_insn(gen_mve_vshlcq_<supf><mode>(operands[0], ignore_wb, operands[1],
+                                      operands[2], operands[3]));
+  DONE;
+})
+
+(define_expand "mve_vshlcq_carry_<supf><mode>"
+ [(match_operand:SI 0 "s_register_operand")
+  (match_operand:MVE_2 1 "s_register_operand")
+  (match_operand:SI 2 "s_register_operand")
+  (match_operand:SI 3 "mve_imm_32")
+  (unspec:MVE_2 [(const_int 0)] VSHLCQ)]
+ "TARGET_HAVE_MVE"
+{
+  rtx ignore_vec = gen_reg_rtx (<MODE>mode);
+  emit_insn(gen_mve_vshlcq_<supf><mode>(ignore_vec, operands[0], operands[1],
+				      operands[2], operands[3]));
+  DONE;
+})
+
+(define_insn "mve_vshlcq_<supf><mode>"
+ [(set (match_operand:MVE_2 0 "s_register_operand" "=w")
+       (unspec:MVE_2 [(match_operand:MVE_2 2 "s_register_operand" "0")
+		      (match_operand:SI 3 "s_register_operand" "1")
+		      (match_operand:SI 4 "mve_imm_32" "Rf")]
+	VSHLCQ))
+  (set (match_operand:SI  1 "s_register_operand" "=r")
+       (unspec:SI [(match_dup 2)
+		   (match_dup 3)
+		   (match_dup 4)]
+	VSHLCQ))]
+ "TARGET_HAVE_MVE"
+ "vshlc %q0, %1, %4")
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..c1dd93149b8aeae25955bbc5d437ee0d6ec619b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32_t
+foo (uint32_t a, int16x8_t b, int16x8_t c)
+{
+  return vabavq_s16 (a, b, c);
+}
+
+/* { dg-final { scan-assembler "vabav.s16"  }  } */
+
+uint32_t
+foo1 (uint32_t a, int16x8_t b, int16x8_t c)
+{
+  return vabavq (a, b, c);
+}
+
+/* { dg-final { scan-assembler "vabav.s16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..4bad47dae4896140e82aad1d6a09d2813da97820
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32_t
+foo (uint32_t a, int32x4_t b, int32x4_t c)
+{
+  return vabavq_s32 (a, b, c);
+}
+
+/* { dg-final { scan-assembler "vabav.s32"  }  } */
+
+uint32_t
+foo1 (uint32_t a, int32x4_t b, int32x4_t c)
+{
+  return vabavq (a, b, c);
+}
+
+/* { dg-final { scan-assembler "vabav.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..750bdc428a2503b25ebf0ada5f2c200510c8e15c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32_t
+foo (uint32_t a, int8x16_t b, int8x16_t c)
+{
+  return vabavq_s8 (a, b, c);
+}
+
+/* { dg-final { scan-assembler "vabav.s8"  }  } */
+
+uint32_t
+foo1 (uint32_t a, int8x16_t b, int8x16_t c)
+{
+  return vabavq (a, b, c);
+}
+
+/* { dg-final { scan-assembler "vabav.s8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..35b0ba3f0e1c4ce2dbe2dd6d1e6c51d8d4636493
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32_t
+foo (uint32_t a, uint16x8_t b, uint16x8_t c)
+{
+  return vabavq_u16 (a, b, c);
+}
+
+/* { dg-final { scan-assembler "vabav.u16"  }  } */
+
+uint32_t
+foo1 (uint32_t a, uint16x8_t b, uint16x8_t c)
+{
+  return vabavq (a, b, c);
+}
+
+/* { dg-final { scan-assembler "vabav.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..03dbe596b4f61ab472feceac9eea16cd81676e47
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32_t
+foo (uint32_t a, uint32x4_t b, uint32x4_t c)
+{
+  return vabavq_u32 (a, b, c);
+}
+
+/* { dg-final { scan-assembler "vabav.u32"  }  } */
+
+uint32_t
+foo1 (uint32_t a, uint32x4_t b, uint32x4_t c)
+{
+  return vabavq (a, b, c);
+}
+
+/* { dg-final { scan-assembler "vabav.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..708d862d5e6662cf79f39a368c8f921ef14758c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32_t
+foo (uint32_t a, uint8x16_t b, uint8x16_t c)
+{
+  return vabavq_u8 (a, b, c);
+}
+
+/* { dg-final { scan-assembler "vabav.u8"  }  } */
+
+uint32_t
+foo1 (uint32_t a, uint8x16_t b, uint8x16_t c)
+{
+  return vabavq (a, b, c);
+}
+
+/* { dg-final { scan-assembler "vabav.u8"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vbicq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vbicq_m_n_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..084998ce2a2b57794bbbf8276fd139bbb07649dd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vbicq_m_n_s16.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int16x8_t a, mve_pred16_t p)
+{
+  return vbicq_m_n_s16 (a, 16, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vbict.i16"  }  } */
+
+int16x8_t
+foo1 (int16x8_t a, mve_pred16_t p)
+{
+  return vbicq_m_n (a, 16, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vbicq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vbicq_m_n_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..96b4ebdefc935c9ec467f20f26b4a3cf8f189444
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vbicq_m_n_s32.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t a, mve_pred16_t p)
+{
+  return vbicq_m_n_s32 (a, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vbict.i32"  }  } */
+
+int32x4_t
+foo1 (int32x4_t a, mve_pred16_t p)
+{
+  return vbicq_m_n (a, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vbicq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vbicq_m_n_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..f360477192eb59a39a56903882ba7d1352d495bc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vbicq_m_n_u16.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint16x8_t a, mve_pred16_t p)
+{
+  return vbicq_m_n_u16 (a, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vbict.i16"  }  } */
+
+uint16x8_t
+foo1 (uint16x8_t a, mve_pred16_t p)
+{
+  return vbicq_m_n (a, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vbicq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vbicq_m_n_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..1302ff7d52a4d13cbd09958c18ac33b5cb093b38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vbicq_m_n_u32.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t a, mve_pred16_t p)
+{
+  return vbicq_m_n_u32 (a, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vbict.i32"  }  } */
+
+uint32x4_t
+foo1 (uint32x4_t a, mve_pred16_t p)
+{
+  return vbicq_m_n (a, 1, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..5d32d556a5451cd2ccbc99c5dfcbf8d9baa33889
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_f16.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+mve_pred16_t
+foo (float16x8_t a, float16x8_t b, mve_pred16_t p)
+{
+  return vcmpeqq_m_f16 (a, b, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vcmpt.f16"  }  } */
+
+mve_pred16_t
+foo1 (float16x8_t a, float16x8_t b, mve_pred16_t p)
+{
+  return vcmpeqq_m (a, b, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..0cd7c809b5b800db4b543884bbe01a27e29c1cfa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_m_f32.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+mve_pred16_t
+foo (float32x4_t a, float32x4_t b, mve_pred16_t p)
+{
+  return vcmpeqq_m_f32 (a, b, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vcmpt.f32"  }  } */
+
+mve_pred16_t
+foo1 (float32x4_t a, float32x4_t b, mve_pred16_t p)
+{
+  return vcmpeqq_m (a, b, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtaq_m_s16_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtaq_m_s16_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..98f93a245d114565e11cdee3d12735aeec1049c8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtaq_m_s16_f16.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int16x8_t inactive, float16x8_t a, mve_pred16_t p)
+{
+  return vcvtaq_m_s16_f16 (inactive, a, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vcvtat.s16.f16"  }  } */
+
+int16x8_t
+foo1 (int16x8_t inactive, float16x8_t a, mve_pred16_t p)
+{
+  return vcvtaq_m (inactive, a, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtaq_m_s32_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtaq_m_s32_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..5a62e0edbdb9398bcc643ac4d4a06387818f2d7b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtaq_m_s32_f32.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t inactive, float32x4_t a, mve_pred16_t p)
+{
+  return vcvtaq_m_s32_f32 (inactive, a, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vcvtat.s32.f32"  }  } */
+
+int32x4_t
+foo1 (int32x4_t inactive, float32x4_t a, mve_pred16_t p)
+{
+  return vcvtaq_m (inactive, a, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtaq_m_u16_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtaq_m_u16_f16.c
new file mode 100644
index 0000000000000000000000000000000000000000..2f61151218b82351e5763ddfd620380b2cde32e5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtaq_m_u16_f16.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint16x8_t inactive, float16x8_t a, mve_pred16_t p)
+{
+  return vcvtaq_m_u16_f16 (inactive, a, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vcvtat.u16.f16"  }  } */
+
+uint16x8_t
+foo1 (uint16x8_t inactive, float16x8_t a, mve_pred16_t p)
+{
+  return vcvtaq_m (inactive, a, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtaq_m_u32_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtaq_m_u32_f32.c
new file mode 100644
index 0000000000000000000000000000000000000000..f2a1e2e23b2c14ca25f839469c7790b763ffeb01
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtaq_m_u32_f32.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t inactive, float32x4_t a, mve_pred16_t p)
+{
+  return vcvtaq_m_u32_f32 (inactive, a, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vcvtat.u32.f32"  }  } */
+
+uint32x4_t
+foo1 (uint32x4_t inactive, float32x4_t a, mve_pred16_t p)
+{
+  return vcvtaq_m (inactive, a, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_m_f16_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_m_f16_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..ba8bdcba36c8058f6135af4d14aa801cfc9cc974
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_m_f16_s16.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16x8_t inactive, int16x8_t a, mve_pred16_t p)
+{
+  return vcvtq_m_f16_s16 (inactive, a, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vcvtt.f16.s16"  }  } */
+
+float16x8_t
+foo1 (float16x8_t inactive, int16x8_t a, mve_pred16_t p)
+{
+  return vcvtq_m (inactive, a, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_m_f16_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_m_f16_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..1c893655492d72c6279699c68e0a7413c5fa6daa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_m_f16_u16.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float16x8_t
+foo (float16x8_t inactive, uint16x8_t a, mve_pred16_t p)
+{
+  return vcvtq_m_f16_u16 (inactive, a, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vcvtt.f16.u16"  }  } */
+
+float16x8_t
+foo1 (float16x8_t inactive, uint16x8_t a, mve_pred16_t p)
+{
+  return vcvtq_m (inactive, a, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_m_f32_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_m_f32_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..a51b91f761c7e199e022cd08f5ca6a205f53d96a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_m_f32_s32.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32x4_t inactive, int32x4_t a, mve_pred16_t p)
+{
+  return vcvtq_m_f32_s32 (inactive, a, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vcvtt.f32.s32"  }  } */
+
+float32x4_t
+foo1 (float32x4_t inactive, int32x4_t a, mve_pred16_t p)
+{
+  return vcvtq_m (inactive, a, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_m_f32_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_m_f32_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..fc2626ff0fddd1dc41dcef9acf0ba0f2a74aaa74
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_m_f32_u32.c
@@ -0,0 +1,22 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo (float32x4_t inactive, uint32x4_t a, mve_pred16_t p)
+{
+  return vcvtq_m_f32_u32 (inactive, a, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vcvtt.f32.u32"  }  } */
+
+float32x4_t
+foo1 (float32x4_t inactive, uint32x4_t a, mve_pred16_t p)
+{
+  return vcvtq_m (inactive, a, p);
+}
+
+/* { dg-final { scan-assembler "vpst" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrshrnbq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrshrnbq_n_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..59dfe69c421a6d0e851f7db822766a1a495eca43
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrshrnbq_n_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo (int8x16_t a, int16x8_t b)
+{
+  return vqrshrnbq_n_s16 (a, b, 1);
+}
+
+/* { dg-final { scan-assembler "vqrshrnb.s16"  }  } */
+
+int8x16_t
+foo1 (int8x16_t a, int16x8_t b)
+{
+  return vqrshrnbq_n (a, b, 1);
+}
+
+/* { dg-final { scan-assembler "vqrshrnb.s16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrshrnbq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrshrnbq_n_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..a416a6a7d9c1f80997d5656df5efce8b9d5bc5bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrshrnbq_n_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int16x8_t a, int32x4_t b)
+{
+  return vqrshrnbq_n_s32 (a, b, 1);
+}
+
+/* { dg-final { scan-assembler "vqrshrnb.s32"  }  } */
+
+int16x8_t
+foo1 (int16x8_t a, int32x4_t b)
+{
+  return vqrshrnbq_n (a, b, 1);
+}
+
+/* { dg-final { scan-assembler "vqrshrnb.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrshrnbq_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrshrnbq_n_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..da90940674eb1a6cbf40019dcfc6bd8969a6db97
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrshrnbq_n_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint8x16_t a, uint16x8_t b)
+{
+  return vqrshrnbq_n_u16 (a, b, 1);
+}
+
+/* { dg-final { scan-assembler "vqrshrnb.u16"  }  } */
+
+uint8x16_t
+foo1 (uint8x16_t a, uint16x8_t b)
+{
+  return vqrshrnbq_n (a, b, 1);
+}
+
+/* { dg-final { scan-assembler "vqrshrnb.u16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrshrnbq_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrshrnbq_n_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..1c1b8bee72711b863c2f1b4d231c98d262cd94d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrshrnbq_n_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint16x8_t a, uint32x4_t b)
+{
+  return vqrshrnbq_n_u32 (a, b, 1);
+}
+
+/* { dg-final { scan-assembler "vqrshrnb.u32"  }  } */
+
+uint16x8_t
+foo1 (uint16x8_t a, uint32x4_t b)
+{
+  return vqrshrnbq_n (a, b, 1);
+}
+
+/* { dg-final { scan-assembler "vqrshrnb.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrshrunbq_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrshrunbq_n_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..8248a043cd9c72bc3514eda993a2016909d955e7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrshrunbq_n_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint8x16_t a, int16x8_t b)
+{
+  return vqrshrunbq_n_s16 (a, b, 1);
+}
+
+/* { dg-final { scan-assembler "vqrshrunb.s16"  }  } */
+
+uint8x16_t
+foo1 (uint8x16_t a, int16x8_t b)
+{
+  return vqrshrunbq_n (a, b, 1);
+}
+
+/* { dg-final { scan-assembler "vqrshrunb.s16"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrshrunbq_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrshrunbq_n_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..6b6ee4047fc7812478475e65fa8748bb8144e288
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrshrunbq_n_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint16x8_t a, int32x4_t b)
+{
+  return vqrshrunbq_n_s32 (a, b, 1);
+}
+
+/* { dg-final { scan-assembler "vqrshrunb.s32"  }  } */
+
+uint16x8_t
+foo1 (uint16x8_t a, int32x4_t b)
+{
+  return vqrshrunbq_n (a, b, 1);
+}
+
+/* { dg-final { scan-assembler "vqrshrunb.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..a3b35b626dcf13c27fd0e862b7a4bd2945de23b3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int64_t
+foo (int64_t a, int32x4_t b, int32x4_t c)
+{
+  return vrmlaldavhaq_s32 (a, b, c);
+}
+
+/* { dg-final { scan-assembler "vrmlaldavha.s32"  }  } */
+
+int64_t
+foo1 (int64_t a, int32x4_t b, int32x4_t c)
+{
+  return vrmlaldavhaq (a, b, c);
+}
+
+/* { dg-final { scan-assembler "vrmlaldavha.s32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..f3e9da88adab10e358a94bcb762fb14f04a67ce0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint64_t
+foo (uint64_t a, uint32x4_t b, uint32x4_t c)
+{
+  return vrmlaldavhaq_u32 (a, b, c);
+}
+
+/* { dg-final { scan-assembler "vrmlaldavha.u32"  }  } */
+
+uint64_t
+foo1 (uint64_t a, uint32x4_t b, uint32x4_t c)
+{
+  return vrmlaldavhaq (a, b, c);
+}
+
+/* { dg-final { scan-assembler "vrmlaldavha.u32"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_s16.c
new file mode 100644
index 0000000000000000000000000000000000000000..d38685471963a7ad6bd20de25afb91766e7967a6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_s16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int16x8_t
+foo (int16x8_t a, uint32_t * b)
+{
+  return vshlcq_s16 (a, b, 1);
+}
+
+/* { dg-final { scan-assembler "vshlc"  }  } */
+
+int16x8_t
+foo1 (int16x8_t a, uint32_t * b)
+{
+  return vshlcq (a, b, 1);
+}
+
+/* { dg-final { scan-assembler "vshlc"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_s32.c
new file mode 100644
index 0000000000000000000000000000000000000000..ce4d5d1af1771d93883baba9a078b77aaffc3fea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_s32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int32x4_t
+foo (int32x4_t a, uint32_t * b)
+{
+  return vshlcq_s32 (a, b, 1);
+}
+
+/* { dg-final { scan-assembler "vshlc"  }  } */
+
+int32x4_t
+foo1 (int32x4_t a, uint32_t * b)
+{
+  return vshlcq (a, b, 1);
+}
+
+/* { dg-final { scan-assembler "vshlc"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..87e1af024fa9a87cb2b3701b4dcc908fe8136842
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_s8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo (int8x16_t a, uint32_t * b)
+{
+  return vshlcq_s8 (a, b, 1);
+}
+
+/* { dg-final { scan-assembler "vshlc"  }  } */
+
+int8x16_t
+foo1 (int8x16_t a, uint32_t * b)
+{
+  return vshlcq (a, b, 1);
+}
+
+/* { dg-final { scan-assembler "vshlc"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_u16.c
new file mode 100644
index 0000000000000000000000000000000000000000..0f148402d990586729eb14dc48bd0cc4f94d8f62
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint16x8_t
+foo (uint16x8_t a, uint32_t * b)
+{
+  return vshlcq_u16 (a, b, 1);
+}
+
+/* { dg-final { scan-assembler "vshlc"  }  } */
+
+uint16x8_t
+foo1 (uint16x8_t a, uint32_t * b)
+{
+  return vshlcq (a, b, 1);
+}
+
+/* { dg-final { scan-assembler "vshlc"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_u32.c
new file mode 100644
index 0000000000000000000000000000000000000000..c3ba2b27947667820b26bc30e0128a9ca4103f44
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (uint32x4_t a, uint32_t * b)
+{
+  return vshlcq_u32 (a, b, 1);
+}
+
+/* { dg-final { scan-assembler "vshlc"  }  } */
+
+uint32x4_t
+foo1 (uint32x4_t a, uint32_t * b)
+{
+  return vshlcq (a, b, 1);
+}
+
+/* { dg-final { scan-assembler "vshlc"  }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..23284af36de66f4595d48e427b4b76cda9ffd8db
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vshlcq_u8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo (uint8x16_t a, uint32_t * b)
+{
+  return vshlcq_u8 (a, b, 1);
+}
+
+/* { dg-final { scan-assembler "vshlc"  }  } */
+
+uint8x16_t
+foo1 (uint8x16_t a, uint32_t * b)
+{
+  return vshlcq (a, b, 1);
+}
+
+/* { dg-final { scan-assembler "vshlc"  }  } */


[-- Attachment #2: diff13.patch.gz --]
[-- Type: application/gzip, Size: 7548 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics.
@ 2019-11-14 19:34 Srinath Parvathaneni
  2019-11-14 19:13 ` [PATCH][ARM][GCC][2/2x]: MVE intrinsics with binary operands Srinath Parvathaneni
                   ` (38 more replies)
  0 siblings, 39 replies; 48+ messages in thread
From: Srinath Parvathaneni @ 2019-11-14 19:34 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, Kyrylo Tkachov

Hello,

This patches series is to support Arm MVE ACLE intrinsics.

Please refer to Arm reference manual [1] and MVE intrinsics [2] for more details.
Please refer to Chapter 13 MVE ACLE [3] for MVE intrinsics concepts.

This patch series depends on upstream patches "Armv8.1-M Mainline Security Extension" [4],
"CLI and multilib support for Armv8.1-M Mainline MVE extensions" [5] and "support for Armv8.1-M
Mainline scalar shifts" [6].

[1] https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf?_ga=2.102521798.659307368.1572453718-1501600630.1548848914
[2] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics
[3] https://static.docs.arm.com/101028/0009/Q3-ACLE_2019Q3_release-0009.pdf?_ga=2.239684871.588348166.1573726994-1501600630.1548848914
[4] https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01654.html
[5] https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00641.html
[6] https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01194.html

Srinath Parvathaneni(38):
[PATCH][ARM][GCC][1/x]: MVE ACLE intrinsics framework patch.
[PATCH][ARM][GCC][2/x]: MVE ACLE intrinsics framework patch.
[PATCH][ARM][GCC][3/x]: MVE ACLE intrinsics framework patch.
[PATCH][ARM][GCC][4/x]: MVE ACLE vector interleaving store intrinsics.
[PATCH][ARM][GCC][1/1x]: Patch to support MVE ACLE intrinsics with unary operand.
[PATCH][ARM][GCC][2/1x]: MVE intrinsics with unary operand.
[PATCH][ARM][GCC][3/1x]: MVE intrinsics with unary operand.
[PATCH][ARM][GCC][4/1x]: MVE intrinsics with unary operand.
[PATCH][ARM][GCC][1/2x]: MVE intrinsics with binary operands.
[PATCH][ARM][GCC][2/2x]: MVE intrinsics with binary operands.
[PATCH][ARM][GCC][3/2x]: MVE intrinsics with binary operands.
[PATCH][ARM][GCC][4/2x]: MVE intrinsics with binary operands.
[PATCH][ARM][GCC][5/2x]: MVE intrinsics with binary operands.
[PATCH][ARM][GCC][1/3x]: MVE intrinsics with ternary operands.
[PATCH][ARM][GCC][2/3x]: MVE intrinsics with ternary operands.
[PATCH][ARM][GCC][3/3x]: MVE intrinsics with ternary operands.
[PATCH][ARM][GCC][1/4x]: MVE intrinsics with quaternary operands.
[PATCH][ARM][GCC][2/4x]: MVE intrinsics with quaternary operands.
[PATCH][ARM][GCC][3/4x]: MVE intrinsics with quaternary operands.
[PATCH][ARM][GCC][4/4x]: MVE intrinsics with quaternary operands.
[PATCH][ARM][GCC][1/5x]: MVE store intrinsics.
[PATCH][ARM][GCC][2/5x]: MVE load intrinsics.
[PATCH][ARM][GCC][3/5x]: MVE store intrinsics with predicated suffix.
[PATCH][ARM][GCC][4/5x]: MVE load intrinsics with zero(_z) suffix.
[PATCH][ARM][GCC][5/5x]: MVE ACLE load intrinsics which load a byte, halfword, or word from memory.
[PATCH][ARM][GCC][6/5x]: Remaining MVE load intrinsics which loads half word and word or double word from memory.
[PATCH][ARM][GCC][7/5x]: MVE store intrinsics which stores byte,half word or word to memory.
[PATCH][ARM][GCC][8/5x]: Remaining MVE store intrinsics which stores an half word, word and double word to memory.
[PATCH][ARM][GCC][6x]:MVE ACLE vaddq intrinsics using arithmetic plus operator.
[PATCH][ARM][GCC][7x]: MVE vreinterpretq and vuninitializedq intrinsics.
[PATCH][ARM][GCC][1/8x]: MVE ACLE vidup, vddup, viwdup and vdwdup intrinsics with writeback.
[PATCH][ARM][GCC][2/8x]: MVE ACLE gather load and scatter store intrinsics with writeback.
[PATCH][ARM][GCC][9x]: MVE ACLE predicated intrinsics with (dont-care) variant.
[PATCH][ARM][GCC][10x]: MVE ACLE intrinsics "add with carry across beats" and "beat-wise substract".
[PATCH][ARM][GCC][11x]: MVE ACLE vector interleaving store and deinterleaving load intrinsics and also aliases to vstr and vldr intrinsics.
[PATCH][ARM][GCC][12x]: MVE ACLE intrinsics to set and get vector lane.
[PATCH][ARM][GCC][13x]: MVE ACLE scalar shift intrinsics.
[PATCH][ARM][GCC][14x]: MVE ACLE whole vector left shift with carry intrinsics.

Regards,
Srinath.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics.
  2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
                   ` (37 preceding siblings ...)
  2019-11-14 19:27 ` [PATCH][ARM][GCC][1/3x]: " Srinath Parvathaneni
@ 2019-12-12 16:09 ` Kyrill Tkachov
  38 siblings, 0 replies; 48+ messages in thread
From: Kyrill Tkachov @ 2019-12-12 16:09 UTC (permalink / raw)
  To: Srinath Parvathaneni, gcc-patches; +Cc: Richard Earnshaw

Hi Srinath,

On 11/14/19 7:12 PM, Srinath Parvathaneni wrote:
> Hello,
>
> This patches series is to support Arm MVE ACLE intrinsics.
>
> Please refer to Arm reference manual [1] and MVE intrinsics [2] for 
> more details.
> Please refer to Chapter 13 MVE ACLE [3] for MVE intrinsics concepts.
>
> This patch series depends on upstream patches "Armv8.1-M Mainline 
> Security Extension" [4],
> "CLI and multilib support for Armv8.1-M Mainline MVE extensions" [5] 
> and "support for Armv8.1-M
> Mainline scalar shifts" [6].
>
> [1] 
> https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf?_ga=2.102521798.659307368.1572453718-1501600630.1548848914
> [2] 
> https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics
> [3] 
> https://static.docs.arm.com/101028/0009/Q3-ACLE_2019Q3_release-0009.pdf?_ga=2.239684871.588348166.1573726994-1501600630.1548848914
> [4] https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01654.html
> [5] https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00641.html
> [6] https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01194.html
>
> Srinath Parvathaneni(38):
> [PATCH][ARM][GCC][1/x]: MVE ACLE intrinsics framework patch.
> [PATCH][ARM][GCC][2/x]: MVE ACLE intrinsics framework patch.
> [PATCH][ARM][GCC][3/x]: MVE ACLE intrinsics framework patch.
> [PATCH][ARM][GCC][4/x]: MVE ACLE vector interleaving store intrinsics.
> [PATCH][ARM][GCC][1/1x]: Patch to support MVE ACLE intrinsics with 
> unary operand.
> [PATCH][ARM][GCC][2/1x]: MVE intrinsics with unary operand.
> [PATCH][ARM][GCC][3/1x]: MVE intrinsics with unary operand.
> [PATCH][ARM][GCC][4/1x]: MVE intrinsics with unary operand.
> [PATCH][ARM][GCC][1/2x]: MVE intrinsics with binary operands.
> [PATCH][ARM][GCC][2/2x]: MVE intrinsics with binary operands.
> [PATCH][ARM][GCC][3/2x]: MVE intrinsics with binary operands.
> [PATCH][ARM][GCC][4/2x]: MVE intrinsics with binary operands.
> [PATCH][ARM][GCC][5/2x]: MVE intrinsics with binary operands.
> [PATCH][ARM][GCC][1/3x]: MVE intrinsics with ternary operands.
> [PATCH][ARM][GCC][2/3x]: MVE intrinsics with ternary operands.
> [PATCH][ARM][GCC][3/3x]: MVE intrinsics with ternary operands.
> [PATCH][ARM][GCC][1/4x]: MVE intrinsics with quaternary operands.
> [PATCH][ARM][GCC][2/4x]: MVE intrinsics with quaternary operands.
> [PATCH][ARM][GCC][3/4x]: MVE intrinsics with quaternary operands.
> [PATCH][ARM][GCC][4/4x]: MVE intrinsics with quaternary operands.
> [PATCH][ARM][GCC][1/5x]: MVE store intrinsics.
> [PATCH][ARM][GCC][2/5x]: MVE load intrinsics.
> [PATCH][ARM][GCC][3/5x]: MVE store intrinsics with predicated suffix.
> [PATCH][ARM][GCC][4/5x]: MVE load intrinsics with zero(_z) suffix.
> [PATCH][ARM][GCC][5/5x]: MVE ACLE load intrinsics which load a byte, 
> halfword, or word from memory.
> [PATCH][ARM][GCC][6/5x]: Remaining MVE load intrinsics which loads 
> half word and word or double word from memory.
> [PATCH][ARM][GCC][7/5x]: MVE store intrinsics which stores byte,half 
> word or word to memory.
> [PATCH][ARM][GCC][8/5x]: Remaining MVE store intrinsics which stores 
> an half word, word and double word to memory.
> [PATCH][ARM][GCC][6x]:MVE ACLE vaddq intrinsics using arithmetic plus 
> operator.
> [PATCH][ARM][GCC][7x]: MVE vreinterpretq and vuninitializedq intrinsics.
> [PATCH][ARM][GCC][1/8x]: MVE ACLE vidup, vddup, viwdup and vdwdup 
> intrinsics with writeback.
> [PATCH][ARM][GCC][2/8x]: MVE ACLE gather load and scatter store 
> intrinsics with writeback.
> [PATCH][ARM][GCC][9x]: MVE ACLE predicated intrinsics with (dont-care) 
> variant.
> [PATCH][ARM][GCC][10x]: MVE ACLE intrinsics "add with carry across 
> beats" and "beat-wise substract".
> [PATCH][ARM][GCC][11x]: MVE ACLE vector interleaving store and 
> deinterleaving load intrinsics and also aliases to vstr and vldr 
> intrinsics.
> [PATCH][ARM][GCC][12x]: MVE ACLE intrinsics to set and get vector lane.
> [PATCH][ARM][GCC][13x]: MVE ACLE scalar shift intrinsics.
> [PATCH][ARM][GCC][14x]: MVE ACLE whole vector left shift with carry 
> intrinsics.
>
Thank you for working on these.

I will reply to individual patches with more targeted comments.

As this is a fairly large amount of code, here's my high-level view:

The MVE intrinsics spec has more complexities than the Neon intrinsics one:

* It needs support for both the user-namespace versions, and the __arm_* 
ones.

* There are also overloaded forms that in C are implemented using _Generic.

The above two facts make for a rather bulky and messy arm_mve.h 
implementation.

In the case of the _Generic usage we hit the performance problems 
reported in PR c/91937.

Ideally, I'd like to see the frontend parts of these intrinsics 
implemented in a similar way to the SVE ACLE 
(https://gcc.gnu.org/ml/gcc-patches/2019-10/msg00413.html)

i.e. have the compiler inject the right functions into the language and 
do overload resolution through the appropriate hooks, thus keeping the 
(unavoidable) complexity in the backend rather than arm_mve.h

That being said, this is a major feature that I would very much like to 
see in GCC 10 and the current implementation, outside of the new .md 
file and arm_mve.h file, doesn't disturb anything in the backend it 
shouldn't.

That is, implementing the SVE ACLE approach shouldn't require any risky 
backend surgery, just ripping out a chunk of code and replacing it with 
another chunk of code.

So I'll accept the current approach for GCC 10 as long as we improve the 
frontend parts for the  GCC 11 timeframe.

My reviews on the individual patches will be in this context.

Thanks,

Kyrill

> Regards,
> Srinath.
>
> Entire patch series attached to cover letter.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH][ARM][GCC][1/x]: MVE ACLE intrinsics framework patch.
  2019-11-14 19:16 ` [PATCH][ARM][GCC][1/x]: MVE ACLE intrinsics framework patch Srinath Parvathaneni
@ 2019-12-18 17:18   ` Kyrill Tkachov
  0 siblings, 0 replies; 48+ messages in thread
From: Kyrill Tkachov @ 2019-12-18 17:18 UTC (permalink / raw)
  To: Srinath Parvathaneni, gcc-patches; +Cc: Richard Earnshaw


On 11/14/19 7:12 PM, Srinath Parvathaneni wrote:
> Hello,
>
> This patch creates the required framework for MVE ACLE intrinsics.
>
> The following changes are done in this patch to support MVE ACLE 
> intrinsics.
>
> Header file arm_mve.h is added to source code, which contains the 
> definitions of MVE ACLE intrinsics
> and different data types used in MVE. Machine description file mve.md 
> is also added which contains the
> RTL patterns defined for MVE.
>
> A new reigster "p0" is added which is used in by MVE predicated 
> patterns. A new register class "VPR_REG"
> is added and its contents are defined in REG_CLASS_CONTENTS.
>
> The vec-common.md file is modified to support the standard move 
> patterns. The prefix of neon functions
> which are also used by MVE is changed from "neon_" to "simd_".
> eg: neon_immediate_valid_for_move changed to 
> simd_immediate_valid_for_move.
>
> In the patch standard patterns mve_move, mve_store and move_load for 
> MVE are added and neon.md and vfp.md
> files are modified to support this common patterns.
>
> Please refer to Arm reference manual [1] for more details.
>
> [1] 
> https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf?_ga=2.102521798.659307368.1572453718-1501600630.1548848914
>
> Regression tested on arm-none-eabi and found no regressions.
>
> Ok for trunk?

Ok.

Thanks,

Kyrill

>
> Thanks,
> Srinath
>
> gcc/ChangeLog:
>
> 2019-11-11  Andre Vieira <andre.simoesdiasvieira@arm.com>
>             Mihail Ionescu  <mihail.ionescu@arm.com>
>             Srinath Parvathaneni <srinath.parvathaneni@arm.com>
>
>         * config.gcc (arm_mve.h): Add header file.
>         * config/arm/aout.h (p0): Add new register name.
>         * config/arm-builtins.c (ARM_BUILTIN_SIMD_LANE_CHECK): Define.
>         (ARM_BUILTIN_NEON_LANE_CHECK): Remove.
>         (arm_init_simd_builtin_types): Add TARGET_HAVE_MVE check.
>         (arm_init_neon_builtins): Move a check to arm_init_builtins 
> function.
>         (arm_init_builtins): Move a check from arm_init_neon_builtins 
> function.
>         (mve_dereference_pointer): Add new function.
>         (arm_expand_builtin_args): Add TARGET_HAVE_MVE check.
>         (arm_expand_neon_builtin): Move a check to arm_expand_builtin 
> function.
>         (arm_expand_builtin): Move a check from 
> arm_expand_neon_builtin function.
>         * config/arm/arm-c.c (arm_cpu_builtins): Define macros for MVE.
>         * config/arm/arm-modes.def (INT_MODE): Add three new integer 
> modes.
>         * config/arm/arm-protos.h (neon_immediate_valid_for_move): 
> Rename function.
>         (simd_immediate_valid_for_move): Rename 
> neon_immediate_valid_for_move function.
>         * config/arm/arm.c 
> (arm_options_perform_arch_sanity_checks):Enable mve isa bit.
>         (use_return_insn): Add TARGET_HAVE_MVE check.
>         (aapcs_vfp_allocate): Add TARGET_HAVE_MVE check.
>         (aapcs_vfp_allocate_return_reg): Add TARGET_HAVE_MVE check.
>         (thumb2_legitimate_address_p): Add TARGET_HAVE_MVE check.
>         (arm_rtx_costs_internal): Add TARGET_HAVE_MVE check.
>         (neon_valid_immediate): Rename to simd_valid_immediate.
>         (simd_valid_immediate): Rename from neon_valid_immediate.
>         (neon_immediate_valid_for_move): Rename to 
> simd_immediate_valid_for_move.
>         (simd_immediate_valid_for_move): Rename from 
> neon_immediate_valid_for_move.
>         (neon_immediate_valid_for_logic): Modify call to 
> neon_valid_immediate function.
>         (neon_make_constant): Modify call to neon_valid_immediate 
> function.
>         (neon_vector_mem_operand): Add TARGET_HAVE_MVE check.
>         (output_move_neon): Add TARGET_HAVE_MVE check.
>         (arm_compute_frame_layout): Add TARGET_HAVE_MVE check.
>         (arm_save_coproc_regs): Add TARGET_HAVE_MVE check.
>         (arm_print_operand): Add case 'E' to print memory operands.
>         (arm_print_operand_address): Add TARGET_HAVE_MVE check.
>         (arm_hard_regno_mode_ok): Add TARGET_HAVE_MVE check.
>         (arm_modes_tieable_p): Add TARGET_HAVE_MVE check.
>         (arm_regno_class): Add VPR_REGNUM check.
>         (arm_expand_epilogue_apcs_frame): Add TARGET_HAVE_MVE check.
>         (arm_expand_epilogue): Add TARGET_HAVE_MVE check.
>         (arm_vector_mode_supported_p): Add TARGET_HAVE_MVE check for 
> MVE vector modes.
>         (arm_array_mode_supported_p): Add TARGET_HAVE_MVE check.
>         (arm_conditional_register_usage): For TARGET_HAVE_MVE enable 
> VPR register.
>         * config/arm/arm.h (IS_VPR_REGNUM): Macro to check for VPR 
> register.
>         (FIRST_PSEUDO_REGISTER): Modify.
>         (VALID_MVE_MODE): Define.
>         (VALID_MVE_SI_MODE): Define.
>         (VALID_MVE_SF_MODE): Define.
>         (VALID_MVE_STRUCT_MODE): Define.
>         (REG_ALLOC_ORDER): Add VPR_REGNUM entry.
>         (enum reg_class): Add VPR_REG entry.
>         (REG_CLASS_NAMES): Add VPR_REG entry.
>         * config/arm/arm.md (VPR_REGNUM): Define.
>         (arm_movsf_soft_insn): Add TARGET_HAVE_MVE check to not allow MVE.
>         (vfp_pop_multiple_with_writeback): Add TARGET_HAVE_MVE check 
> to allow writeback.
>         (include "mve.md"): Include mve.md file.
>         * config/arm/arm_mve.h: New file.
>         * config/arm/constraints.md (Up): Define.
>         * config/arm/iterators.md (VNIM1): Define.
>         (VNINOTM1): Define.
>         (VSTRUCT): Modify.
>         * config/arm/mve.md: New file.
>         * config/arm/neon.md:
>         (mov<mode>): Add TARGET_HAVE_MVE check.
>         (movv4hf): Define.
>         (neon_mov<mode>): Add TARGET_HAVE_MVE check.
>         (define_split): Add TARGET_HAVE_MVE check.
>         (vec_init<mode><V_elem_l>): Add TARGET_HAVE_MVE check.
>         * config/arm/predicates.md (vpr_register_operand): Define.
>         * config/arm/t-arm: Add mve.md file.
>         * config/arm/types.md: Add MVE instructions mve_move, 
> mve_load, mve_store.
>         * config/arm/vec-common.md (mov<mode>): Add TARGET_HAVE_MVE check.
>         (mov<mode>): Modify iterator.
>         (movv8hf): Define
>
> gcc/testsuite/ChangeLog:
>
> 2019-11-11  Andre Vieira <andre.simoesdiasvieira@arm.com>
>             Mihail Ionescu  <mihail.ionescu@arm.com>
>             Srinath Parvathaneni <srinath.parvathaneni@arm.com>
>
>         * gcc.target/arm/mve/intrinsics/mve_vector_float.c: New test.
>         * gcc.target/arm/mve/intrinsics/mve_vector_float1.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/mve_vector_float2.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/mve_vector_int.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/mve_vector_int1.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/mve_vector_int2.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/mve_vector_uint.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/mve_vector_uint1.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/mve_vector_uint2.c: Likewise.
>         * gcc.target/arm/mve/mve.exp: New file.
>
>
> ###############     Attachment also inlined for ease of reply    
> ###############
>
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 
> 72f656408f11802c669c3de953bf3020020ca312..c4a7d984936c531d7dfcce347d56b5931913e68b 
> 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -344,7 +344,7 @@ arc*-*-*)
>  arm*-*-*)
>          cpu_type=arm
>          extra_objs="arm-builtins.o aarch-common.o"
> -       extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h 
> arm_cmse.h"
> +       extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h 
> arm_cmse.h arm_mve.h"
>          target_type_format_char='%'
>          c_target_objs="arm-c.o"
>          cxx_target_objs="arm-c.o"
> diff --git a/gcc/config/arm/aout.h b/gcc/config/arm/aout.h
> index 
> 72782758853a869bcb9a9d69f3fa0da979cd711f..28cde153f704748f35c84d072b59e9695a61e661 
> 100644
> --- a/gcc/config/arm/aout.h
> +++ b/gcc/config/arm/aout.h
> @@ -53,7 +53,9 @@
>  /* The assembler's names for the registers.  Note that the ?xx 
> registers are
>     there so that VFPv3/NEON registers D16-D31 have the same spacing 
> as D0-D15
>     (each of which is overlaid on two S registers), although there are no
> -   actual single-precision registers which correspond to D16-D31.  */
> +   actual single-precision registers which correspond to D16-D31.  
> New register
> +   p0 is added which is used for MVE predicated cases.  */
> +
>  #ifndef REGISTER_NAMES
>  #define REGISTER_NAMES                                          \
>  {                                                               \
> @@ -72,7 +74,7 @@
>    "wr8",   "wr9",   "wr10", "wr11",                           \
>    "wr12",  "wr13",  "wr14", "wr15",                           \
>    "wcgr0", "wcgr1", "wcgr2", "wcgr3",                          \
> -  "cc", "vfpcc", "sfp", "afp", "apsrq", "apsrge"               \
> +  "cc", "vfpcc", "sfp", "afp", "apsrq", "apsrge", "p0"         \
>  }
>  #endif
>
> diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
> index 
> 650b22c7ad916d9abd587981e9ed5809755ee035..d4cb0ea3deb49b10266d1620c85e243ed34aee4d 
> 100644
> --- a/gcc/config/arm/arm-builtins.c
> +++ b/gcc/config/arm/arm-builtins.c
> @@ -667,6 +667,7 @@ enum arm_builtins
>    ARM_BUILTIN_SET_FPSCR,
>
>    ARM_BUILTIN_CMSE_NONSECURE_CALLER,
> +  ARM_BUILTIN_SIMD_LANE_CHECK,
>
>  #undef CRYPTO1
>  #undef CRYPTO2
> @@ -692,7 +693,6 @@ enum arm_builtins
>  #include "arm_vfp_builtins.def"
>
>    ARM_BUILTIN_NEON_BASE,
> -  ARM_BUILTIN_NEON_LANE_CHECK = ARM_BUILTIN_NEON_BASE,
>
>  #include "arm_neon_builtins.def"
>
> @@ -948,26 +948,35 @@ arm_init_simd_builtin_types (void)
>       an entry in our mangling table, consequently, they get default
>       mangling.  As a further gotcha, poly8_t and poly16_t are signed
>       types, poly64_t and poly128_t are unsigned types.  */
> -  arm_simd_polyQI_type_node
> -    = build_distinct_type_copy (intQI_type_node);
> -  (*lang_hooks.types.register_builtin_type) (arm_simd_polyQI_type_node,
> - "__builtin_neon_poly8");
> -  arm_simd_polyHI_type_node
> -    = build_distinct_type_copy (intHI_type_node);
> -  (*lang_hooks.types.register_builtin_type) (arm_simd_polyHI_type_node,
> - "__builtin_neon_poly16");
> -  arm_simd_polyDI_type_node
> -    = build_distinct_type_copy (unsigned_intDI_type_node);
> -  (*lang_hooks.types.register_builtin_type) (arm_simd_polyDI_type_node,
> - "__builtin_neon_poly64");
> -  arm_simd_polyTI_type_node
> -    = build_distinct_type_copy (unsigned_intTI_type_node);
> -  (*lang_hooks.types.register_builtin_type) (arm_simd_polyTI_type_node,
> - "__builtin_neon_poly128");
> -  /* Prevent front-ends from transforming poly vectors into string
> -     literals.  */
> -  TYPE_STRING_FLAG (arm_simd_polyQI_type_node) = false;
> -  TYPE_STRING_FLAG (arm_simd_polyHI_type_node) = false;
> +  if (!TARGET_HAVE_MVE)
> +    {
> +      arm_simd_polyQI_type_node
> +       = build_distinct_type_copy (intQI_type_node);
> +      (*lang_hooks.types.register_builtin_type) 
> (arm_simd_polyQI_type_node,
> + "__builtin_neon_poly8");
> +      arm_simd_polyHI_type_node
> +       = build_distinct_type_copy (intHI_type_node);
> +      (*lang_hooks.types.register_builtin_type) 
> (arm_simd_polyHI_type_node,
> + "__builtin_neon_poly16");
> +      arm_simd_polyDI_type_node
> +       = build_distinct_type_copy (unsigned_intDI_type_node);
> +      (*lang_hooks.types.register_builtin_type) 
> (arm_simd_polyDI_type_node,
> + "__builtin_neon_poly64");
> +      arm_simd_polyTI_type_node
> +       = build_distinct_type_copy (unsigned_intTI_type_node);
> +      (*lang_hooks.types.register_builtin_type) 
> (arm_simd_polyTI_type_node,
> + "__builtin_neon_poly128");
> +      /* Init poly vector element types with scalar poly types.  */
> +      arm_simd_types[Poly8x8_t].eltype = arm_simd_polyQI_type_node;
> +      arm_simd_types[Poly8x16_t].eltype = arm_simd_polyQI_type_node;
> +      arm_simd_types[Poly16x4_t].eltype = arm_simd_polyHI_type_node;
> +      arm_simd_types[Poly16x8_t].eltype = arm_simd_polyHI_type_node;
> +
> +      /* Prevent front-ends from transforming poly vectors into string
> +        literals.  */
> +      TYPE_STRING_FLAG (arm_simd_polyQI_type_node) = false;
> +      TYPE_STRING_FLAG (arm_simd_polyHI_type_node) = false;
> +    }
>
>    /* Init all the element types built by the front-end.  */
>    arm_simd_types[Int8x8_t].eltype = intQI_type_node;
> @@ -985,11 +994,6 @@ arm_init_simd_builtin_types (void)
>    arm_simd_types[Uint32x4_t].eltype = unsigned_intSI_type_node;
>    arm_simd_types[Uint64x2_t].eltype = unsigned_intDI_type_node;
>
> -  /* Init poly vector element types with scalar poly types.  */
> -  arm_simd_types[Poly8x8_t].eltype = arm_simd_polyQI_type_node;
> -  arm_simd_types[Poly8x16_t].eltype = arm_simd_polyQI_type_node;
> -  arm_simd_types[Poly16x4_t].eltype = arm_simd_polyHI_type_node;
> -  arm_simd_types[Poly16x8_t].eltype = arm_simd_polyHI_type_node;
>    /* Note: poly64x2_t is defined in arm_neon.h, to ensure it gets default
>       mangling.  */
>
> @@ -1006,6 +1010,8 @@ arm_init_simd_builtin_types (void)
>        tree eltype = arm_simd_types[i].eltype;
>        machine_mode mode = arm_simd_types[i].mode;
>
> +      if (eltype == NULL)
> +       continue;
>        if (arm_simd_types[i].itype == NULL)
>          arm_simd_types[i].itype =
>            build_distinct_type_copy
> @@ -1231,15 +1237,6 @@ arm_init_neon_builtins (void)
>       system.  */
>    arm_init_simd_builtin_scalar_types ();
>
> -  tree lane_check_fpr = build_function_type_list (void_type_node,
> - intSI_type_node,
> - intSI_type_node,
> -                                                 NULL);
> -  arm_builtin_decls[ARM_BUILTIN_NEON_LANE_CHECK] =
> -      add_builtin_function ("__builtin_arm_lane_check", lane_check_fpr,
> -                           ARM_BUILTIN_NEON_LANE_CHECK, BUILT_IN_MD,
> -                           NULL, NULL_TREE);
> -
>    for (i = 0; i < ARRAY_SIZE (neon_builtin_data); i++, fcode++)
>      {
>        arm_builtin_datum *d = &neon_builtin_data[i];
> @@ -1956,6 +1953,15 @@ arm_init_builtins (void)
>
>    if (TARGET_MAYBE_HARD_FLOAT)
>      {
> +      tree lane_check_fpr = build_function_type_list (void_type_node,
> + intSI_type_node,
> + intSI_type_node,
> +                                                     NULL);
> +      arm_builtin_decls[ARM_BUILTIN_SIMD_LANE_CHECK]
> +      = add_builtin_function ("__builtin_arm_lane_check", lane_check_fpr,
> +                             ARM_BUILTIN_SIMD_LANE_CHECK, BUILT_IN_MD,
> +                             NULL, NULL_TREE);
> +
>        arm_init_neon_builtins ();
>        arm_init_vfp_builtins ();
>        arm_init_crypto_builtins ();
> @@ -2201,6 +2207,47 @@ neon_dereference_pointer (tree exp, tree type, 
> machine_mode mem_mode,
>                        build_int_cst (build_pointer_type (array_type), 
> 0));
>  }
>
> +/* EXP is a pointer argument to a vector scatter store intrinsics.
> +
> +   Consider the following example:
> +       VSTRW<v>.<dt> Qd, [Qm{, #+/-<imm>}]!
> +   When <Qm> used as the base register for the target address,
> +   this function is used to derive and return an expression for the
> +   accessed memory.
> +
> +   The intrinsic function operates on a block of registers that has mode
> +   REG_MODE.  This block contains vectors of type TYPE_MODE.  The 
> function
> +   references the memory at EXP of type TYPE and in mode MEM_MODE.  This
> +   mode may be BLKmode if no more suitable mode is available.  */
> +
> +static tree
> +mve_dereference_pointer (tree exp, tree type, machine_mode reg_mode,
> +                        machine_mode vector_mode)
> +{
> +  HOST_WIDE_INT reg_size, vector_size, nelems;
> +  tree elem_type, upper_bound, array_type;
> +
> +  /* Work out the size of each vector in bytes.  */
> +  vector_size = GET_MODE_SIZE (vector_mode);
> +
> +  /* Work out the size of the register block in bytes.  */
> +  reg_size = GET_MODE_SIZE (reg_mode);
> +
> +  /* Work out the type of each element.  */
> +  gcc_assert (POINTER_TYPE_P (type));
> +  elem_type = TREE_TYPE (type);
> +
> +  nelems = reg_size / vector_size;
> +
> +  /* Create a type that describes the full access.  */
> +  upper_bound = build_int_cst (size_type_node, nelems - 1);
> +  array_type = build_array_type (elem_type, build_index_type 
> (upper_bound));
> +
> +  /* Dereference EXP using that type.  */
> +  return fold_build2 (MEM_REF, array_type, exp,
> +                     build_int_cst (build_pointer_type (array_type), 0));
> +}
> +
>  /* Expand a builtin.  */
>  static rtx
>  arm_expand_builtin_args (rtx target, machine_mode map_mode, int fcode,
> @@ -2239,10 +2286,17 @@ arm_expand_builtin_args (rtx target, 
> machine_mode map_mode, int fcode,
>              {
>                machine_mode other_mode
>                  = insn_data[icode].operand[1 - opno].mode;
> -              arg[argc] = neon_dereference_pointer (arg[argc],
> +             if (TARGET_HAVE_MVE && mode[argc] != other_mode)
> +               {
> +                 arg[argc] = mve_dereference_pointer (arg[argc],
> TREE_VALUE (formals),
> - mode[argc], other_mode,
> - map_mode);
> + other_mode, map_mode);
> +               }
> +             else
> +               arg[argc] = neon_dereference_pointer (arg[argc],
> + TREE_VALUE (formals),
> + mode[argc], other_mode,
> + map_mode);
>              }
>
>            /* Use EXPAND_MEMORY for ARG_BUILTIN_MEMORY and
> @@ -2548,22 +2602,6 @@ arm_expand_neon_builtin (int fcode, tree exp, 
> rtx target)
>        return const0_rtx;
>      }
>
> -  if (fcode == ARM_BUILTIN_NEON_LANE_CHECK)
> -    {
> -      /* Builtin is only to check bounds of the lane passed to some 
> intrinsics
> -        that are implemented with gcc vector extensions in 
> arm_neon.h.  */
> -
> -      tree nlanes = CALL_EXPR_ARG (exp, 0);
> -      gcc_assert (TREE_CODE (nlanes) == INTEGER_CST);
> -      rtx lane_idx = expand_normal (CALL_EXPR_ARG (exp, 1));
> -      if (CONST_INT_P (lane_idx))
> -       neon_lane_bounds (lane_idx, 0, TREE_INT_CST_LOW (nlanes), exp);
> -      else
> -       error ("%Klane index must be a constant immediate", exp);
> -      /* Don't generate any RTL.  */
> -      return const0_rtx;
> -    }
> -
>    arm_builtin_datum *d
>      = &neon_builtin_data[fcode - ARM_BUILTIN_NEON_PATTERN_START];
>
> @@ -2625,6 +2663,22 @@ arm_expand_builtin (tree exp,
>    int mask;
>    int imm;
>
> +  if (fcode == ARM_BUILTIN_SIMD_LANE_CHECK)
> +    {
> +      /* Builtin is only to check bounds of the lane passed to some 
> intrinsics
> +        that are implemented with gcc vector extensions in 
> arm_neon.h.  */
> +
> +      tree nlanes = CALL_EXPR_ARG (exp, 0);
> +      gcc_assert (TREE_CODE (nlanes) == INTEGER_CST);
> +      rtx lane_idx = expand_normal (CALL_EXPR_ARG (exp, 1));
> +      if (CONST_INT_P (lane_idx))
> +       neon_lane_bounds (lane_idx, 0, TREE_INT_CST_LOW (nlanes), exp);
> +      else
> +       error ("%Klane index must be a constant immediate", exp);
> +      /* Don't generate any RTL.  */
> +      return const0_rtx;
> +    }
> +
>    if (fcode >= ARM_BUILTIN_ACLE_BASE)
>      return arm_expand_acle_builtin (fcode, exp, target);
>
> diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
> index 
> 34695fa0112e90e4bdf317da0b9fd1d3194bf0a2..0fe7d371c348818f25901c5d84be94589523c9a6 
> 100644
> --- a/gcc/config/arm/arm-c.c
> +++ b/gcc/config/arm/arm-c.c
> @@ -79,6 +79,16 @@ arm_cpu_builtins (struct cpp_reader* pfile)
>    def_or_undef_macro (pfile, "__ARM_FEATURE_COMPLEX", TARGET_COMPLEX);
>    def_or_undef_macro (pfile, "__ARM_32BIT_STATE", TARGET_32BIT);
>
> +  cpp_undef (pfile, "__ARM_FEATURE_MVE");
> +  if (TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT)
> +    {
> +      builtin_define_with_int_value ("__ARM_FEATURE_MVE", 3);
> +    }
> +  else if (TARGET_HAVE_MVE)
> +    {
> +      builtin_define_with_int_value ("__ARM_FEATURE_MVE", 1);
> +    }
> +
>    cpp_undef (pfile, "__ARM_FEATURE_CMSE");
>    if (arm_arch8 && !arm_arch_notm)
>      {
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index 
> 5b49049cc45c0bccfa9d67eac0940250fc5dd95a..d4612ae4553697989611d772f7bb0061a04b98b6 
> 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -85,7 +85,7 @@ extern bool ldm_stm_operation_p (rtx, bool, 
> machine_mode mode,
>  extern bool clear_operation_p (rtx, bool);
>  extern int arm_const_double_rtx (rtx);
>  extern int vfp3_const_double_rtx (rtx);
> -extern int neon_immediate_valid_for_move (rtx, machine_mode, rtx *, 
> int *);
> +extern int simd_immediate_valid_for_move (rtx, machine_mode, rtx *, 
> int *);
>  extern int neon_immediate_valid_for_logic (rtx, machine_mode, int, rtx *,
>                                             int *);
>  extern int neon_immediate_valid_for_shift (rtx, machine_mode, rtx *,
> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> index 
> 8b07c423fb6b071642fccc48424fe244d97dcbc2..c755df420b52798773ee99f54faf6689d4a16215 
> 100644
> --- a/gcc/config/arm/arm.h
> +++ b/gcc/config/arm/arm.h
> @@ -751,7 +751,8 @@ extern int arm_arch_cmse;
>  /*      s0-s15          VFP scratch (aka d0-d7).
>          s16-s31       S  VFP variable (aka d8-d15).
>          vfpcc           Not a real register.  Represents the VFP 
> condition
> -                       code flags.  */
> +                       code flags.
> +       vpr             Used to represent MVE VPR predication.  */
>
>  /* The stack backtrace structure is as follows:
>    fp points to here:  |  save code pointer  |      [fp]
> @@ -792,7 +793,7 @@ extern int arm_arch_cmse;
>    1,1,1,1,1,1,1,1,             \
>    1,1,1,1,                     \
>    /* Specials.  */             \
> -  1,1,1,1,1,1                  \
> +  1,1,1,1,1,1,1                        \
>  }
>
>  /* 1 for registers not available across function calls.
> @@ -822,7 +823,7 @@ extern int arm_arch_cmse;
>    1,1,1,1,1,1,1,1,             \
>    1,1,1,1,                     \
>    /* Specials.  */             \
> -  1,1,1,1,1,1                  \
> +  1,1,1,1,1,1,1                        \
>  }
>
>  #ifndef SUBTARGET_CONDITIONAL_REGISTER_USAGE
> @@ -998,10 +999,10 @@ extern int arm_arch_cmse;
>     && (LAST_VFP_REGNUM - (REGNUM) >= 2 * (N) - 1))
>
>  /* The number of hard registers is 16 ARM + 1 CC + 1 SFP + 1 AFP
> -   + 1 APSRQ + 1 APSRGE.  */
> +   + 1 APSRQ + 1 APSRGE + 1 VPR.  */
>  /* Intel Wireless MMX Technology registers add 16 + 4 more.  */
>  /* VFP (VFP3) adds 32 (64) + 1 VFPCC.  */
> -#define FIRST_PSEUDO_REGISTER   106
> +#define FIRST_PSEUDO_REGISTER   107
>
>  #define DBX_REGISTER_NUMBER(REGNO) arm_dbx_register_number (REGNO)
>
> @@ -1029,11 +1030,26 @@ extern int arm_arch_cmse;
>    ((MODE) == V4SImode || (MODE) == V8HImode || (MODE) == V16QImode \
>     || (MODE) == V8HFmode || (MODE) == V4SFmode || (MODE) == V2DImode)
>
> +#define VALID_MVE_MODE(MODE) \
> +  ((MODE) == V2DImode ||(MODE) == V4SImode || (MODE) == V8HImode \
> +   || (MODE) == V16QImode || (MODE) == V8HFmode || (MODE) == V4SFmode \
> +   || (MODE) == V2DFmode)
> +
> +#define VALID_MVE_SI_MODE(MODE) \
> +  ((MODE) == V2DImode ||(MODE) == V4SImode || (MODE) == V8HImode \
> +   || (MODE) == V16QImode)
> +
> +#define VALID_MVE_SF_MODE(MODE) \
> +  ((MODE) == V8HFmode || (MODE) == V4SFmode || (MODE) == V2DFmode)
> +
>  /* Structure modes valid for Neon registers.  */
>  #define VALID_NEON_STRUCT_MODE(MODE) \
>    ((MODE) == TImode || (MODE) == EImode || (MODE) == OImode \
>     || (MODE) == CImode || (MODE) == XImode)
>
> +#define VALID_MVE_STRUCT_MODE(MODE) \
> +  ((MODE) == TImode || (MODE) == OImode || (MODE) == XImode)
> +
>  /* The register numbers in sequence, for passing to 
> arm_gen_load_multiple.  */
>  extern int arm_regs_in_sequence[];
>
> @@ -1085,9 +1101,13 @@ extern int arm_regs_in_sequence[];
>    /* Registers not for general use.  */                \
>    CC_REGNUM, VFPCC_REGNUM,                     \
>    FRAME_POINTER_REGNUM, ARG_POINTER_REGNUM,    \
> -  SP_REGNUM, PC_REGNUM, APSRQ_REGNUM, APSRGE_REGNUM    \
> +  SP_REGNUM, PC_REGNUM, APSRQ_REGNUM, APSRGE_REGNUM,   \
> +  VPR_REGNUM                                   \
>  }
>
> +#define IS_VPR_REGNUM(REGNUM) \
> +  ((REGNUM) == VPR_REGNUM)
> +
>  /* Use different register alloc ordering for Thumb.  */
>  #define ADJUST_REG_ALLOC_ORDER arm_order_regs_for_local_alloc ()
>
> @@ -1124,6 +1144,7 @@ enum reg_class
>    VFPCC_REG,
>    SFP_REG,
>    AFP_REG,
> +  VPR_REG,
>    ALL_REGS,
>    LIM_REG_CLASSES
>  };
> @@ -1131,7 +1152,7 @@ enum reg_class
>  #define N_REG_CLASSES  (int) LIM_REG_CLASSES
>
>  /* Give names of register classes as strings for dump file.  */
> -#define REG_CLASS_NAMES  \
> +#define REG_CLASS_NAMES \
>  {                       \
>    "NO_REGS",           \
>    "LO_REGS",           \
> @@ -1151,6 +1172,7 @@ enum reg_class
>    "VFPCC_REG",         \
>    "SFP_REG",           \
>    "AFP_REG",           \
> +  "VPR_REG",           \
>    "ALL_REGS"           \
>  }
>
> @@ -1177,7 +1199,8 @@ enum reg_class
>    { 0x00000000, 0x00000000, 0x00000000, 0x00000020 }, /* VFPCC_REG */  \
>    { 0x00000000, 0x00000000, 0x00000000, 0x00000040 }, /* SFP_REG */    \
>    { 0x00000000, 0x00000000, 0x00000000, 0x00000080 }, /* AFP_REG */    \
> -  { 0xFFFF7FFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x0000000F }  /* ALL_REGS */   \
> +  { 0x00000000, 0x00000000, 0x00000000, 0x00000100 }, /* VPR_REG.  */  \
> +  { 0xFFFF7FFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x0000010F }  /* ALL_REGS.  */ \
>  }
>
>  #define FP_SYSREGS \
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 
> 883c2a9179d7e6d69225f8d104228d15702ecef7..6faed76206b93c1a9dea048e2f693dc16ee58072 
> 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -3759,7 +3759,8 @@ arm_options_perform_arch_sanity_checks (void)
>        else if (TARGET_HARD_FLOAT_ABI)
>          {
>            arm_pcs_default = ARM_PCS_AAPCS_VFP;
> -         if (!bitmap_bit_p (arm_active_target.isa, isa_bit_vfpv2))
> +         if (!bitmap_bit_p (arm_active_target.isa, isa_bit_vfpv2)
> +             && !bitmap_bit_p (arm_active_target.isa, isa_bit_mve))
>              error ("%<-mfloat-abi=hard%>: selected processor lacks an 
> FPU");
>          }
>        else
> @@ -4230,7 +4231,7 @@ use_return_insn (int iscond, rtx sibling)
>
>    /* Can't be done if any of the VFP regs are pushed,
>       since this also requires an insn.  */
> -  if (TARGET_HARD_FLOAT)
> +  if (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
>      for (regno = FIRST_VFP_REGNUM; regno <= LAST_VFP_REGNUM; regno++)
>   ��    if (df_regs_ever_live_p (regno) && !call_used_or_fixed_reg_p 
> (regno))
>          return 0;
> @@ -6289,7 +6290,7 @@ aapcs_vfp_allocate (CUMULATIVE_ARGS *pcum, 
> machine_mode mode,
>        {
>          pcum->aapcs_vfp_reg_alloc = mask << regno;
>          if (mode == BLKmode
> -           || (mode == TImode && ! TARGET_NEON)
> +           || (mode == TImode && ! (TARGET_NEON || TARGET_HAVE_MVE))
>              || ! arm_hard_regno_mode_ok (FIRST_VFP_REGNUM + regno, mode))
>            {
>              int i;
> @@ -6297,7 +6298,7 @@ aapcs_vfp_allocate (CUMULATIVE_ARGS *pcum, 
> machine_mode mode,
>              int rshift = shift;
>              machine_mode rmode = pcum->aapcs_vfp_rmode;
>              rtx par;
> -           if (!TARGET_NEON)
> +           if (!(TARGET_NEON || TARGET_HAVE_MVE))
>                {
>                  /* Avoid using unsupported vector modes. */
>                  if (rmode == V2SImode)
> @@ -6343,7 +6344,7 @@ aapcs_vfp_allocate_return_reg (enum arm_pcs 
> pcs_variant ATTRIBUTE_UNUSED,
>    if (mode == BLKmode
>        || (GET_MODE_CLASS (mode) == MODE_INT
>            && GET_MODE_SIZE (mode) >= GET_MODE_SIZE (TImode)
> -         && !TARGET_NEON))
> +         && !(TARGET_NEON || TARGET_HAVE_MVE)))
>      {
>        int count;
>        machine_mode ag_mode;
> @@ -6354,7 +6355,7 @@ aapcs_vfp_allocate_return_reg (enum arm_pcs 
> pcs_variant ATTRIBUTE_UNUSED,
>        aapcs_vfp_is_call_or_return_candidate (pcs_variant, mode, type,
>                                               &ag_mode, &count);
>
> -      if (!TARGET_NEON)
> +      if (!(TARGET_NEON || TARGET_HAVE_MVE))
>          {
>            if (ag_mode == V2SImode)
>              ag_mode = DImode;
> @@ -8253,7 +8254,9 @@ thumb2_legitimate_address_p (machine_mode mode, 
> rtx x, int strict_p)
>                     && CONST_INT_P (XEXP (XEXP (x, 0), 1)))))
>      return 1;
>
> -  else if (mode == TImode || (TARGET_NEON && VALID_NEON_STRUCT_MODE 
> (mode)))
> +  else if (mode == TImode
> +          || (TARGET_NEON && VALID_NEON_STRUCT_MODE (mode))
> +          || (TARGET_HAVE_MVE && VALID_MVE_STRUCT_MODE (mode)))
>      return 0;
>
>    else if (code == PLUS)
> @@ -9800,7 +9803,7 @@ arm_rtx_costs_internal (rtx x, enum rtx_code 
> code, enum rtx_code outer_code,
>            /* Assume that most copies can be done with a single insn,
>               unless we don't have HW FP, in which case everything
>               larger than word mode will require two insns. */
> -         *cost = COSTS_N_INSNS (((!TARGET_HARD_FLOAT
> +         *cost = COSTS_N_INSNS (((!(TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
>                                     && GET_MODE_SIZE (mode) > 4)
>                                    || mode == DImode)
>                                   ? 2 : 1);
> @@ -11281,10 +11284,10 @@ arm_rtx_costs_internal (rtx x, enum rtx_code 
> code, enum rtx_code outer_code,
>
>      case CONST_VECTOR:
>        /* Fixme.  */
> -      if (TARGET_NEON
> -         && TARGET_HARD_FLOAT
> -         && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode))
> -         && neon_immediate_valid_for_move (x, mode, NULL, NULL))
> +      if (((TARGET_NEON && TARGET_HARD_FLOAT
> +           && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE 
> (mode)))
> +          || TARGET_HAVE_MVE)
> +         && simd_immediate_valid_for_move (x, mode, NULL, NULL))
>          *cost = COSTS_N_INSNS (1);
>        else
>          *cost = COSTS_N_INSNS (4);
> @@ -12328,8 +12331,8 @@ vfp3_const_double_rtx (rtx x)
>    return vfp3_const_double_index (x) != -1;
>  }
>
> -/* Recognize immediates which can be used in various Neon 
> instructions. Legal
> -   immediates are described by the following table (for VMVN 
> variants, the
> +/* Recognize immediates which can be used in various Neon and MVE 
> instructions.
> +   Legal immediates are described by the following table (for VMVN 
> variants, the
>     bitwise inverse of the constant shown is recognized. In either 
> case, VMOV
>     is output and the correct instruction to use for a given constant 
> is chosen
>     by the assembler). The constant shown is replicated across all 
> elements of
> @@ -12380,7 +12383,7 @@ vfp3_const_double_rtx (rtx x)
>     -1 if the given value doesn't match any of the listed patterns.
>  */
>  static int
> -neon_valid_immediate (rtx op, machine_mode mode, int inverse,
> +simd_valid_immediate (rtx op, machine_mode mode, int inverse,
>                        rtx *modconst, int *elementwidth)
>  {
>  #define CHECK(STRIDE, ELSIZE, CLASS, TEST)      \
> @@ -12412,6 +12415,10 @@ neon_valid_immediate (rtx op, machine_mode 
> mode, int inverse,
>
>    innersize = GET_MODE_UNIT_SIZE (mode);
>
> +  /* Only support 128-bit vectors for MVE.  */
> +  if (TARGET_HAVE_MVE && (!vector || n_elts * innersize != 16))
> +    return -1;
> +
>    /* Vectors of float constants.  */
>    if (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)
>      {
> @@ -12560,18 +12567,19 @@ neon_valid_immediate (rtx op, machine_mode 
> mode, int inverse,
>  #undef CHECK
>  }
>
> -/* Return TRUE if rtx X is legal for use as either a Neon VMOV (or, 
> implicitly,
> -   VMVN) immediate. Write back width per element to *ELEMENTWIDTH (or 
> zero for
> -   float elements), and a modified constant (whatever should be 
> output for a
> -   VMOV) in *MODCONST.  */
> -
> +/* Return TRUE if rtx X is legal for use as either a Neon or MVE VMOV 
> (or,
> +   implicitly, VMVN) immediate.  Write back width per element to 
> *ELEMENTWIDTH
> +   (or zero for float elements), and a modified constant (whatever 
> should be
> +   output for a VMOV) in *MODCONST. "neon_immediate_valid_for_move" 
> function is
> +   modified to "simd_immediate_valid_for_move" as this function will 
> be used
> +   both by neon and mve.  */
>  int
> -neon_immediate_valid_for_move (rtx op, machine_mode mode,
> +simd_immediate_valid_for_move (rtx op, machine_mode mode,
>                                 rtx *modconst, int *elementwidth)
>  {
>    rtx tmpconst;
>    int tmpwidth;
> -  int retval = neon_valid_immediate (op, mode, 0, &tmpconst, &tmpwidth);
> +  int retval = simd_valid_immediate (op, mode, 0, &tmpconst, &tmpwidth);
>
>    if (retval == -1)
>      return 0;
> @@ -12588,7 +12596,7 @@ neon_immediate_valid_for_move (rtx op, 
> machine_mode mode,
>  /* Return TRUE if rtx X is legal for use in a VORR or VBIC 
> instruction.  If
>     the immediate is valid, write a constant suitable for using as an 
> operand
>     to VORR/VBIC/VAND/VORN to *MODCONST and the corresponding element 
> width to
> -   *ELEMENTWIDTH. See neon_valid_immediate for description of 
> INVERSE.  */
> +   *ELEMENTWIDTH.  See simd_valid_immediate for description of 
> INVERSE.  */
>
>  int
>  neon_immediate_valid_for_logic (rtx op, machine_mode mode, int inverse,
> @@ -12596,7 +12604,7 @@ neon_immediate_valid_for_logic (rtx op, 
> machine_mode mode, int inverse,
>  {
>    rtx tmpconst;
>    int tmpwidth;
> -  int retval = neon_valid_immediate (op, mode, inverse, &tmpconst, 
> &tmpwidth);
> +  int retval = simd_valid_immediate (op, mode, inverse, &tmpconst, 
> &tmpwidth);
>
>    if (retval < 0 || retval > 5)
>      return 0;
> @@ -12803,7 +12811,7 @@ neon_make_constant (rtx vals)
>      gcc_unreachable ();
>
>    if (const_vec != NULL
> -      && neon_immediate_valid_for_move (const_vec, mode, NULL, NULL))
> +      && simd_immediate_valid_for_move (const_vec, mode, NULL, NULL))
>      /* Load using VMOV.  On Cortex-A8 this takes one cycle.  */
>      return const_vec;
>    else if ((target = neon_vdup_constant (vals)) != NULL_RTX)
> @@ -13080,6 +13088,15 @@ neon_vector_mem_operand (rtx op, int type, 
> bool strict)
>        && (INTVAL (XEXP (ind, 1)) & 3) == 0)
>      return TRUE;
>
> +  if (type == 1 && TARGET_HAVE_MVE
> +      && (GET_CODE (ind) == POST_INC || GET_CODE (ind) == PRE_DEC))
> +    {
> +      rtx ind1 = XEXP (ind, 0);
> +      if (!REG_P (ind1))
> +       return 0;
> +      return NEON_REGNO_OK_FOR_QUAD (REGNO (ind1));
> +    }
> +
>    return FALSE;
>  }
>
> @@ -19936,7 +19953,7 @@ output_move_neon (rtx *operands)
>      {
>      case POST_INC:
>        /* We have to use vldm / vstm for too-large modes. */
> -      if (nregs > 4)
> +      if (nregs > 4 || (TARGET_HAVE_MVE && nregs >= 2))
>          {
>            templ = "v%smia%%?\t%%0!, %%h1";
>            ops[0] = XEXP (addr, 0);
> @@ -19965,7 +19982,7 @@ output_move_neon (rtx *operands)
>        /* We have to use vldm / vstm for too-large modes. */
>        if (nregs > 1)
>          {
> -         if (nregs > 4)
> +         if (nregs > 4 || (TARGET_HAVE_MVE && nregs >= 2))
>              templ = "v%smia%%?\t%%m0, %%h1";
>            else
>              templ = "v%s1.64\t%%h1, %%A0";
> @@ -19980,29 +19997,40 @@ output_move_neon (rtx *operands)
>        {
>          int i;
>          int overlap = -1;
> -       for (i = 0; i < nregs; i++)
> +       if (TARGET_HAVE_MVE && !BYTES_BIG_ENDIAN)
>            {
> -           /* We're only using DImode here because it's a convenient 
> size.  */
> -           ops[0] = gen_rtx_REG (DImode, REGNO (reg) + 2 * i);
> -           ops[1] = adjust_address (mem, DImode, 8 * i);
> -           if (reg_overlap_mentioned_p (ops[0], mem))
> +           sprintf (buff, "v%srw.32\t%%q0, %%1", load ? "ld" : "st");
> +           ops[0] = reg;
> +           ops[1] = mem;
> +           output_asm_insn (buff, ops);
> +         }
> +       else
> +         {
> +           for (i = 0; i < nregs; i++)
>                {
> -               gcc_assert (overlap == -1);
> -               overlap = i;
> +               /* We're only using DImode here because it's a convenient
> +                  size.  */
> +               ops[0] = gen_rtx_REG (DImode, REGNO (reg) + 2 * i);
> +               ops[1] = adjust_address (mem, DImode, 8 * i);
> +               if (reg_overlap_mentioned_p (ops[0], mem))
> +                 {
> +                   gcc_assert (overlap == -1);
> +                   overlap = i;
> +                 }
> +               else
> +                 {
> +                   sprintf (buff, "v%sr%%?\t%%P0, %%1", load ? "ld" : 
> "st");
> +                   output_asm_insn (buff, ops);
> +                 }
>                }
> -           else
> +           if (overlap != -1)
>                {
> +               ops[0] = gen_rtx_REG (DImode, REGNO (reg) + 2 * overlap);
> +               ops[1] = adjust_address (mem, SImode, 8 * overlap);
>                  sprintf (buff, "v%sr%%?\t%%P0, %%1", load ? "ld" : "st");
>                  output_asm_insn (buff, ops);
>                }
>            }
> -       if (overlap != -1)
> -         {
> -           ops[0] = gen_rtx_REG (DImode, REGNO (reg) + 2 * overlap);
> -           ops[1] = adjust_address (mem, SImode, 8 * overlap);
> -           sprintf (buff, "v%sr%%?\t%%P0, %%1", load ? "ld" : "st");
> -           output_asm_insn (buff, ops);
> -         }
>
>          return "";
>        }
> @@ -22223,7 +22251,7 @@ arm_compute_frame_layout (void)
>        func_type = arm_current_func_type ();
>        /* Space for saved VFP registers.  */
>        if (! IS_VOLATILE (func_type)
> -         && TARGET_HARD_FLOAT)
> +         && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE))
>          saved += arm_get_vfp_saved_size ();
>
>        /* Allocate space for saving/restoring FPCXTNS in Armv8.1-M 
> Mainline
> @@ -22447,7 +22475,7 @@ arm_save_coproc_regs(void)
>          saved_size += 8;
>        }
>
> -  if (TARGET_HARD_FLOAT)
> +  if (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
>      {
>        start_reg = FIRST_VFP_REGNUM;
>
> @@ -23749,6 +23777,53 @@ arm_print_operand (FILE *stream, rtx x, int code)
>        }
>        return;
>
> +    /* To print the memory operand with "Us" constraint. Based on the 
> rtx_code
> +       the memory operands output looks like following.
> +       1. [Rn], #+/-<imm>
> +       2. [Rn, #+/-<imm>]!
> +       3. [Rn].  */
> +    case 'E':
> +      {
> +       rtx addr;
> +       rtx postinc_reg = NULL;
> +       unsigned inc_val = 0;
> +       enum rtx_code code;
> +
> +       gcc_assert (MEM_P (x));
> +       addr = XEXP (x, 0);
> +       code = GET_CODE (addr);
> +       if (code == POST_INC || code == POST_DEC || code == PRE_INC
> +           || code  == PRE_DEC)
> +         {
> +           asm_fprintf (stream, "[%r", REGNO (XEXP (addr, 0)));
> +           inc_val = GET_MODE_SIZE (GET_MODE (x));
> +           if (code == POST_INC || code == POST_DEC)
> +             asm_fprintf (stream, "], #%s%d",(code == POST_INC)
> +                                             ? "": "-", inc_val);
> +           else
> +             asm_fprintf (stream, ", #%s%d]!",(code == PRE_INC)
> +                                              ? "": "-", inc_val);
> +         }
> +       else if (code == POST_MODIFY || code == PRE_MODIFY)
> +         {
> +           asm_fprintf (stream, "[%r", REGNO (XEXP (addr, 0)));
> +           postinc_reg = XEXP ( XEXP (x, 1), 1);
> +           if (postinc_reg && CONST_INT_P (postinc_reg))
> +             {
> +               if (code == POST_MODIFY)
> +                 asm_fprintf (stream, "], #%wd",INTVAL (postinc_reg));
> +               else
> +                 asm_fprintf (stream, ", #%wd]!",INTVAL (postinc_reg));
> +             }
> +         }
> +       else
> +         {
> +           gcc_assert (REG_P (addr));
> +           asm_fprintf (stream, "[%r]",REGNO (addr));
> +         }
> +      }
> +      return;
> +
>      case 'C':
>        {
>          rtx addr;
> @@ -23926,9 +24001,10 @@ arm_print_operand_address (FILE *stream, 
> machine_mode mode, rtx x)
>                           REGNO (XEXP (x, 0)),
>                           GET_CODE (x) == PRE_DEC ? "-" : "",
>                           GET_MODE_SIZE (mode));
> +         else if (TARGET_HAVE_MVE && (mode == OImode || mode == XImode))
> +           asm_fprintf (stream, "[%r]!", REGNO (XEXP (x,0)));
>            else
> -           asm_fprintf (stream, "[%r], #%s%d",
> -                        REGNO (XEXP (x, 0)),
> +           asm_fprintf (stream, "[%r], #%s%d", REGNO (XEXP (x, 0)),
>                           GET_CODE (x) == POST_DEC ? "-" : "",
>                           GET_MODE_SIZE (mode));
>          }
> @@ -24773,12 +24849,15 @@ arm_hard_regno_mode_ok (unsigned int regno, 
> machine_mode mode)
>  {
>    if (GET_MODE_CLASS (mode) == MODE_CC)
>      return (regno == CC_REGNUM
> -           || (TARGET_HARD_FLOAT
> +           || ((TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
>                  && regno == VFPCC_REGNUM));
>
>    if (regno == CC_REGNUM && GET_MODE_CLASS (mode) != MODE_CC)
>      return false;
>
> +  if (IS_VPR_REGNUM (regno))
> +    return true;
> +
>    if (TARGET_THUMB1)
>      /* For the Thumb we only allow values bigger than SImode in
>         registers 0 - 6, so that there is always a second low
> @@ -24787,7 +24866,7 @@ arm_hard_regno_mode_ok (unsigned int regno, 
> machine_mode mode)
>         start of an even numbered register pair.  */
>      return (ARM_NUM_REGS (mode) < 2) || (regno < LAST_LO_REGNUM);
>
> -  if (TARGET_HARD_FLOAT && IS_VFP_REGNUM (regno))
> +  if ((TARGET_HARD_FLOAT || TARGET_HAVE_MVE) && IS_VFP_REGNUM (regno))
>      {
>        if (mode == SFmode || mode == SImode)
>          return VFP_REGNO_OK_FOR_SINGLE (regno);
> @@ -24811,6 +24890,10 @@ arm_hard_regno_mode_ok (unsigned int regno, 
> machine_mode mode)
>                 || (mode == OImode && NEON_REGNO_OK_FOR_NREGS (regno, 4))
>                 || (mode == CImode && NEON_REGNO_OK_FOR_NREGS (regno, 6))
>                 || (mode == XImode && NEON_REGNO_OK_FOR_NREGS (regno, 8));
> +     if (TARGET_HAVE_MVE)
> +       return ((VALID_MVE_MODE (mode) && NEON_REGNO_OK_FOR_QUAD (regno))
> +              || (mode == OImode && NEON_REGNO_OK_FOR_NREGS (regno, 4))
> +              || (mode == XImode && NEON_REGNO_OK_FOR_NREGS (regno, 8)));
>
>        return false;
>      }
> @@ -24859,13 +24942,18 @@ arm_modes_tieable_p (machine_mode mode1, 
> machine_mode mode2)
>    /* We specifically want to allow elements of "structure" modes to
>       be tieable to the structure.  This more general condition allows
>       other rarer situations too.  */
> -  if (TARGET_NEON
> -      && (VALID_NEON_DREG_MODE (mode1)
> -         || VALID_NEON_QREG_MODE (mode1)
> -         || VALID_NEON_STRUCT_MODE (mode1))
> -      && (VALID_NEON_DREG_MODE (mode2)
> -         || VALID_NEON_QREG_MODE (mode2)
> -         || VALID_NEON_STRUCT_MODE (mode2)))
> +  if ((TARGET_NEON
> +       && (VALID_NEON_DREG_MODE (mode1)
> +          || VALID_NEON_QREG_MODE (mode1)
> +          || VALID_NEON_STRUCT_MODE (mode1))
> +       && (VALID_NEON_DREG_MODE (mode2)
> +          || VALID_NEON_QREG_MODE (mode2)
> +          || VALID_NEON_STRUCT_MODE (mode2)))
> +      || (TARGET_HAVE_MVE
> +         && (VALID_MVE_MODE (mode1)
> +             || VALID_MVE_STRUCT_MODE (mode1))
> +         && (VALID_MVE_MODE (mode2)
> +             || VALID_MVE_STRUCT_MODE (mode2))))
>      return true;
>
>    return false;
> @@ -24880,6 +24968,9 @@ arm_regno_class (int regno)
>    if (regno == PC_REGNUM)
>      return NO_REGS;
>
> +  if (IS_VPR_REGNUM (regno))
> +    return VPR_REG;
> +
>    if (TARGET_THUMB1)
>      {
>        if (regno == STACK_POINTER_REGNUM)
> @@ -26731,7 +26822,7 @@ arm_expand_epilogue_apcs_frame (bool 
> really_return)
>          floats_from_frame += 4;
>        }
>
> -  if (TARGET_HARD_FLOAT)
> +  if (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
>      {
>        int start_reg;
>        rtx ip_rtx = gen_rtx_REG (SImode, IP_REGNUM);
> @@ -26977,7 +27068,7 @@ arm_expand_epilogue (bool really_return)
>          }
>      }
>
> -  if (TARGET_HARD_FLOAT)
> +  if (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
>      {
>        /* Generate VFP register multi-pop.  */
>        int end_reg = LAST_VFP_REGNUM + 1;
> @@ -27148,7 +27239,7 @@ arm_expand_epilogue (bool really_return)
>                                                     GEN_INT 
> (FPCXTNS_ENUM)));
>            RTX_FRAME_RELATED_P (insn) = 1;
>          }
> -      }
> +    }
>
>    if (!really_return)
>      return;
> @@ -28370,6 +28461,15 @@ arm_vector_mode_supported_p (machine_mode mode)
>        || mode == V2HAmode))
>      return true;
>
> +  if (TARGET_HAVE_MVE
> +      && (mode == V2DImode || mode == V4SImode || mode == V8HImode
> +         || mode == V16QImode))
> +      return true;
> +
> +  if (TARGET_HAVE_MVE_FLOAT
> +      && (mode == V2DFmode || mode == V4SFmode || mode == V8HFmode))
> +      return true;
> +
>    return false;
>  }
>
> @@ -28387,6 +28487,10 @@ arm_array_mode_supported_p (machine_mode mode,
>        && (nelems >= 2 && nelems <= 4))
>      return true;
>
> +  if (TARGET_HAVE_MVE && !BYTES_BIG_ENDIAN
> +      && VALID_MVE_MODE (mode) && (nelems == 2 || nelems == 4))
> +    return true;
> +
>    return false;
>  }
>
> @@ -29435,7 +29539,7 @@ arm_conditional_register_usage (void)
>    if (TARGET_THUMB1)
>      fixed_regs[LR_REGNUM] = call_used_regs[LR_REGNUM] = 1;
>
> -  if (TARGET_32BIT && TARGET_HARD_FLOAT)
> +  if (TARGET_32BIT && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE))
>      {
>        /* VFPv3 registers are disabled when earlier VFP
>           versions are selected due to the definition of
> @@ -29447,6 +29551,8 @@ arm_conditional_register_usage (void)
>            call_used_regs[regno] = regno < FIRST_VFP_REGNUM + 16
>              || regno >= FIRST_VFP_REGNUM + 32;
>          }
> +      if (TARGET_HAVE_MVE)
> +       fixed_regs[VPR_REGNUM] = 0;
>      }
>
>    if (TARGET_REALLY_IWMMXT && !TARGET_GENERAL_REGS_ONLY)
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 
> c62ad1b360ebecd5368e90ea5634488eef22f2fc..689baa0b0ff63ef90f47d2fd844cb98c9a1457a0 
> 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -41,6 +41,7 @@
>     (VFPCC_REGNUM    101)       ; VFP Condition code pseudo register
>     (APSRQ_REGNUM    104)       ; Q bit pseudo register
>     (APSRGE_REGNUM   105)       ; GE bits pseudo register
> +   (VPR_REGNUM      106)       ; Vector Predication Register - MVE 
> register.
>    ]
>  )
>  ;; 3rd operand to select_dominance_cc_mode
> @@ -7293,7 +7294,7 @@
>    [(set (match_operand:SF 0 "nonimmediate_operand" "=r,r,m")
>          (match_operand:SF 1 "general_operand"  "r,mE,r"))]
>    "TARGET_32BIT
> -   && TARGET_SOFT_FLOAT
> +   && TARGET_SOFT_FLOAT && !TARGET_HAVE_MVE
>     && (!MEM_P (operands[0])
>         || register_operand (operands[1], SFmode))"
>  {
> @@ -7416,8 +7417,8 @@
>
>  (define_insn "*movdf_soft_insn"
>    [(set (match_operand:DF 0 "nonimmediate_soft_df_operand" "=r,r,r,r,m")
> -       (match_operand:DF 1 "soft_df_operand" "rDa,Db,Dc,mF,r"))]
> -  "TARGET_32BIT && TARGET_SOFT_FLOAT
> +       (match_operand:DF 1 "soft_df_operand" "rDa,Db,Dc,mF,r"))]
> +  "TARGET_32BIT && TARGET_SOFT_FLOAT && !TARGET_HAVE_MVE
>     && (   register_operand (operands[0], DFmode)
>         || register_operand (operands[1], DFmode))"
>    "*
> @@ -11681,7 +11682,7 @@
>                     (match_operand:SI 2 "const_int_I_operand" "I")))
>       (set (match_operand:DF 3 "vfp_hard_register_operand" "")
>            (mem:DF (match_dup 1)))])]
> -  "TARGET_32BIT && TARGET_HARD_FLOAT"
> +  "TARGET_32BIT && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)"
>    "*
>    {
>      int num_regs = XVECLEN (operands[0], 0);
> @@ -12624,7 +12625,7 @@
>     (set_attr "length" "8")]
>  )
>
> -;; Vector bits common to IWMMXT and Neon
> +;; Vector bits common to IWMMXT, Neon and MVE
>  (include "vec-common.md")
>  ;; Load the Intel Wireless Multimedia Extension patterns
>  (include "iwmmxt.md")
> @@ -12642,3 +12643,5 @@
>  (include "sync.md")
>  ;; Fixed-point patterns
>  (include "arm-fixed.md")
> +;; M-profile Vector Extensions
> +(include "mve.md")
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..5ffb466596b5d8fc330616a6fcc7ee37d3e28def
> --- /dev/null
> +++ b/gcc/config/arm/arm_mve.h
> @@ -0,0 +1,59 @@
> +/* Arm MVE intrinsics include file.
> +
> +   Copyright (C) 2019 Free Software Foundation, Inc.
> +   Contributed by Arm.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published
> +   by the Free Software Foundation; either version 3, or (at your
> +   option) any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but WITHOUT
> +   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> +   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> +   License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>. */
> +
> +#ifndef _GCC_ARM_MVE_H
> +#define _GCC_ARM_MVE_H
> +
> +#if !__ARM_FEATURE_MVE
> +#error "MVE feature not supported"
> +#endif
> +
> +#include <stdint.h>
> +#ifndef  __cplusplus
> +#include <stdbool.h>
> +#endif
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
> +typedef __fp16 float16_t;
> +typedef float float32_t;
> +typedef __simd128_float16_t float16x8_t;
> +typedef __simd128_float32_t float32x4_t;
> +#endif
> +
> +typedef uint16_t mve_pred16_t;
> +typedef __simd128_uint8_t uint8x16_t;
> +typedef __simd128_uint16_t uint16x8_t;
> +typedef __simd128_uint32_t uint32x4_t;
> +typedef __simd128_uint64_t uint64x2_t;
> +typedef __simd128_int8_t int8x16_t;
> +typedef __simd128_int16_t int16x8_t;
> +typedef __simd128_int32_t int32x4_t;
> +typedef __simd128_int64_t int64x2_t;
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _GCC_ARM_MVE_H.  */
> diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
> index 
> 6f309b95cc1874ac7bc69e435781070e0c9cb70a..f77084a0efd489491372bb1dafbc0cd585f0f518 
> 100644
> --- a/gcc/config/arm/constraints.md
> +++ b/gcc/config/arm/constraints.md
> @@ -44,6 +44,8 @@
>  ;; in Thumb state: Uu, Uw
>  ;; in all states: Q
>
> +(define_register_constraint "Up" "TARGET_HAVE_MVE ? VPR_REG : NO_REGS"
> +  "MVE VPR register")
>
>  (define_register_constraint "t" "TARGET_32BIT ? VFP_LO_REGS : NO_REGS"
>   "The VFP registers @code{s0}-@code{s31}.")
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 
> c412851843f4468c2c18bce264288705e076ac50..e30325bc1652d378be2544fa32269c5c4294d7e9 
> 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -62,6 +62,12 @@
>  ;; Integer and float modes supported by Neon and IWMMXT.
>  (define_mode_iterator VALL [V2DI V2SI V4HI V8QI V2SF V4SI V8HI V16QI 
> V4SF])
>
> +;; Integer and float modes supported by Neon, IWMMXT and MVE.
> +(define_mode_iterator VNIM1 [V16QI V8HI V4SI V4SF V2DI])
> +
> +;; Integer and float modes supported by Neon and IWMMXT but not MVE.
> +(define_mode_iterator VNINOTM1 [V2SI V4HI V8QI V2SF])
> +
>  ;; Integer and float modes supported by Neon and IWMMXT, except V2DI.
>  (define_mode_iterator VALLW [V2SI V4HI V8QI V2SF V4SI V8HI V16QI V4SF])
>
> @@ -105,7 +111,8 @@
>  (define_mode_iterator VQXMOV [V16QI V8HI V8HF V4SI V4SF V2DI TI])
>
>  ;; Opaque structure types wider than TImode.
> -(define_mode_iterator VSTRUCT [EI OI CI XI])
> +(define_mode_iterator VSTRUCT [(EI "!TARGET_HAVE_MVE") OI
> +                              (CI "!TARGET_HAVE_MVE") XI])
>
>  ;; Opaque structure types used in table lookups (except vtbl1/vtbx1).
>  (define_mode_iterator VTAB [TI EI OI])
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..53334c6d329dedd482615b996232e85ded7a34f8
> --- /dev/null
> +++ b/gcc/config/arm/mve.md
> @@ -0,0 +1,78 @@
> +;; Arm M-profile Vector Extension Machine Description
> +;; Copyright (C) 2019 Free Software Foundation, Inc.
> +;;
> +;; This file is part of GCC.
> +;;
> +;; GCC is free software; you can redistribute it and/or modify it
> +;; under the terms of the GNU General Public License as published by
> +;; the Free Software Foundation; either version 3, or (at your option)
> +;; any later version.
> +;;
> +;; GCC is distributed in the hope that it will be useful, but
> +;; WITHOUT ANY WARRANTY; without even the implied warranty of
> +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +;; General Public License for more details.
> +;;
> +;; You should have received a copy of the GNU General Public License
> +;; along with GCC; see the file COPYING3.  If not see
> +;; <http://www.gnu.org/licenses/>.
> +
> +(define_mode_iterator MVE_types [V16QI V8HI V4SI V2DI TI V8HF V4SF V2DF])
> +(define_mode_attr V_sz_elem2 [(V16QI "s8") (V8HI "u16") (V4SI "u32")
> +                             (V2DI "u64")])
> +
> +(define_insn "*mve_mov<mode>"
> +  [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
> +       (match_operand:MVE_types 1 "general_operand" 
> "w,r,w,Dn,Usi,r,Dm"))]
> +  "TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT"
> +{
> +  if (which_alternative == 3 || which_alternative == 6)
> +    {
> +      int width, is_valid;
> +      static char templ[40];
> +
> +      is_valid = simd_immediate_valid_for_move (operands[1], <MODE>mode,
> +       &operands[1], &width);
> +
> +      gcc_assert (is_valid != 0);
> +
> +      if (width == 0)
> +       return "vmov.f32\t%q0, %1  @ <mode>";
> +      else
> +       sprintf (templ, "vmov.i%d\t%%q0, %%x1  @ <mode>", width);
> +      return templ;
> +    }
> +  switch (which_alternative)
> +    {
> +    case 0:
> +      return "vmov\t%q0, %q1";
> +    case 1:
> +      return "vmov\t%e0, %Q1, %R1  @ <mode>\;vmov\t%f0, %J1, %K1";
> +    case 2:
> +      return "vmov\t%Q0, %R0, %e1  @ <mode>\;vmov\t%J0, %K0, %f1";
> +    case 4:
> +      if ((TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (<MODE>mode))
> +         || (MEM_P (operands[1])
> +             && GET_CODE (XEXP (operands[1], 0)) == LABEL_REF))
> +       return output_move_neon (operands);
> +      else
> +       return "vldrb.<V_sz_elem2> %q0, %E1";
> +    case 5:
> +      return output_move_neon (operands);
> +    case 6:
> +    default:
> +      gcc_unreachable ();
> +      return "";
> +    }
> +}
> +  [(set_attr "type" 
> "mve_move,mve_move,mve_move,mve_move,mve_load,mve_move,mve_move")
> +   (set_attr "length" "4,8,8,4,8,8,4")
> +   (set_attr "thumb2_pool_range" "*,*,*,*,1018,*,*")
> +   (set_attr "neg_pool_range" "*,*,*,*,996,*,*")])
> +
> +(define_insn "*mve_vstr<mode>"
> +  [(set (match_operand:MVE_types 0 "memory_operand" "=Us")
> +       (match_operand:MVE_types 1 "s_register_operand" "w"))]
> +  "TARGET_HAVE_MVE"
> +  "vstrb.<V_sz_elem> %q1, %E0"
> +  [(set_attr "type" "mve_store")])
> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> index 
> 6a0ee28efc9aa9f1fba7b5ae031564f40aa095fe..c23783e0ed914ec21a92828388ada58ada3c6132 
> 100644
> --- a/gcc/config/arm/neon.md
> +++ b/gcc/config/arm/neon.md
> @@ -35,9 +35,9 @@
>
>  (define_insn "*neon_mov<mode>"
>    [(set (match_operand:VDX 0 "nonimmediate_operand"
> -         "=w,Un,w, w, w,  ?r,?w,?r, ?Us,*r")
> +       "=w,Un,w, w, w,  ?r,?w,?r, ?Us,*r")
>          (match_operand:VDX 1 "general_operand"
> -         " w,w, Dm,Dn,Uni, w, r, Usi,r,*r"))]
> +       " w,w, Dm,Dn,Uni, w, r, Usi,r,*r"))]
>    "TARGET_NEON
>     && (register_operand (operands[0], <MODE>mode)
>         || register_operand (operands[1], <MODE>mode))"
> @@ -47,7 +47,7 @@
>        int width, is_valid;
>        static char templ[40];
>
> -      is_valid = neon_immediate_valid_for_move (operands[1], <MODE>mode,
> +      is_valid = simd_immediate_valid_for_move (operands[1], <MODE>mode,
>          &operands[1], &width);
>
>        gcc_assert (is_valid != 0);
> @@ -94,7 +94,7 @@
>        int width, is_valid;
>        static char templ[40];
>
> -      is_valid = neon_immediate_valid_for_move (operands[1], <MODE>mode,
> +      is_valid = simd_immediate_valid_for_move (operands[1], <MODE>mode,
>          &operands[1], &width);
>
>        gcc_assert (is_valid != 0);
> @@ -147,9 +147,9 @@
>  })
>
>  (define_expand "mov<mode>"
> -  [(set (match_operand:VSTRUCT 0 "nonimmediate_operand")
> -       (match_operand:VSTRUCT 1 "general_operand"))]
> -  "TARGET_NEON"
> +  [(set (match_operand:VSTRUCT 0 "nonimmediate_operand" "")
> +       (match_operand:VSTRUCT 1 "general_operand" ""))]
> +  "TARGET_NEON || TARGET_HAVE_MVE"
>  {
>    gcc_checking_assert (aligned_operand (operands[0], <MODE>mode));
>    gcc_checking_assert (aligned_operand (operands[1], <MODE>mode));
> @@ -160,24 +160,28 @@
>      }
>  })
>
> -(define_expand "mov<mode>"
> -  [(set (match_operand:VH 0 "s_register_operand")
> -       (match_operand:VH 1 "s_register_operand"))]
> +;; The pattern mov<mode> where mode is v4hf and v8hf is split into
> +;; movv4hf and movv8hf.  The pattern movv8hf is common for MVE and
> +;; NEON, so it is moved into vec-common.md file.
> +(define_expand "movv4hf"
> +  [(set (match_operand:V4HF 0 "s_register_operand")
> +       (match_operand:V4HF 1 "s_register_operand"))]
>    "TARGET_NEON"
>  {
> -  gcc_checking_assert (aligned_operand (operands[0], <MODE>mode));
> -  gcc_checking_assert (aligned_operand (operands[1], <MODE>mode));
> +  gcc_checking_assert (aligned_operand (operands[0], E_V4HFmode));
> +  gcc_checking_assert (aligned_operand (operands[1], E_V4HFmode));
>    if (can_create_pseudo_p ())
>      {
>        if (!REG_P (operands[0]))
> -       operands[1] = force_reg (<MODE>mode, operands[1]);
> +       operands[1] = force_reg (E_V4HFmode, operands[1]);
>      }
>  })
>
> +
>  (define_insn "*neon_mov<mode>"
>    [(set (match_operand:VSTRUCT 0 "nonimmediate_operand"        "=w,Ut,w")
>          (match_operand:VSTRUCT 1 "general_operand"      " w,w, Ut"))]
> -  "TARGET_NEON
> +  "(TARGET_NEON || TARGET_HAVE_MVE)
>     && (register_operand (operands[0], <MODE>mode)
>         || register_operand (operands[1], <MODE>mode))"
>  {
> @@ -213,7 +217,7 @@
>  (define_split
>    [(set (match_operand:OI 0 "s_register_operand" "")
>          (match_operand:OI 1 "s_register_operand" ""))]
> -  "TARGET_NEON && reload_completed"
> +  "(TARGET_NEON || TARGET_HAVE_MVE) && reload_completed"
>    [(set (match_dup 0) (match_dup 1))
>     (set (match_dup 2) (match_dup 3))]
>  {
> @@ -254,7 +258,7 @@
>  (define_split
>    [(set (match_operand:XI 0 "s_register_operand" "")
>          (match_operand:XI 1 "s_register_operand" ""))]
> -  "TARGET_NEON && reload_completed"
> +  "(TARGET_NEON || TARGET_HAVE_MVE) && reload_completed"
>    [(set (match_dup 0) (match_dup 1))
>     (set (match_dup 2) (match_dup 3))
>     (set (match_dup 4) (match_dup 5))
> @@ -489,7 +493,7 @@
>  (define_expand "vec_init<mode><V_elem_l>"
>    [(match_operand:VDQ 0 "s_register_operand")
>     (match_operand 1 "" "")]
> -  "TARGET_NEON"
> +  "TARGET_NEON || TARGET_HAVE_MVE"
>  {
>    neon_expand_vector_init (operands[0], operands[1]);
>    DONE;
> diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
> index 
> 2f0f532edf40d475e4199aa41bd7803fac8d6143..9d74165fe065b03c77918fe9e4611967799535f1 
> 100644
> --- a/gcc/config/arm/predicates.md
> +++ b/gcc/config/arm/predicates.md
> @@ -48,6 +48,16 @@
>    return guard_addr_operand (XEXP (op, 0), mode);
>  })
>
> +(define_predicate "vpr_register_operand"
> +  (match_code "reg,subreg")
> +{
> +  if (GET_CODE (op) == SUBREG)
> +    op = SUBREG_REG (op);
> +  return REG_P (op)
> +         && (REGNO (op) >= FIRST_PSEUDO_REGISTER
> +             || IS_VPR_REGNUM (REGNO (op)));
> +})
> +
>  (define_predicate "imm_for_neon_inv_logic_operand"
>    (match_code "const_vector")
>  {
> @@ -706,7 +716,7 @@
>  (define_predicate "imm_for_neon_mov_operand"
>    (match_code "const_vector,const_int")
>  {
> -  return neon_immediate_valid_for_move (op, mode, NULL, NULL);
> +  return simd_immediate_valid_for_move (op, mode, NULL, NULL);
>  })
>
>  (define_predicate "imm_for_neon_lshift_operand"
> diff --git a/gcc/config/arm/t-arm b/gcc/config/arm/t-arm
> index 
> af60c8fc285bb536afeb9ec5c21771a4379755fc..fda5e84355b56a20eb9a22919ab1c786120cc8f1 
> 100644
> --- a/gcc/config/arm/t-arm
> +++ b/gcc/config/arm/t-arm
> @@ -55,6 +55,7 @@ MD_INCLUDES= $(srcdir)/config/arm/arm1020e.md \
>                  $(srcdir)/config/arm/ldmstm.md \
>                  $(srcdir)/config/arm/ldrdstrd.md \
>                  $(srcdir)/config/arm/marvell-f-iwmmxt.md \
> +               $(srcdir)/config/arm/mve.md \
>                  $(srcdir)/config/arm/neon.md \
>                  $(srcdir)/config/arm/predicates.md \
>                  $(srcdir)/config/arm/sync.md \
> diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md
> index 
> 60faad6597935607ed3c5593f941a04bbc924252..c99b846ab387bac633be8b1631f0e40b3c827850 
> 100644
> --- a/gcc/config/arm/types.md
> +++ b/gcc/config/arm/types.md
> @@ -550,6 +550,11 @@
>  ; The classification below is for TME instructions
>  ;
>  ; tme
> +; The classification below is for M-profile Vector Extension instructions
> +;
> +; mve_move
> +; mve_store
> +; mve_load
>
>  (define_attr "type"
>   "adc_imm,\
> @@ -1096,7 +1101,11 @@
>    crypto_sm3,\
>    crypto_sm4,\
>    coproc,\
> -  tme"
> +  tme,\
> +\
> +  mve_move,\
> +  mve_store,\
> +  mve_load"
>     (const_string "untyped"))
>
>  ; Is this an (integer side) multiply with a 32-bit (or smaller) result?
> diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
> index 
> 33ff5627284d7cc898074b562179938982ecc420..5f5c113cf95afafbb733e1bfd2a7c7b8a55651a2 
> 100644
> --- a/gcc/config/arm/vec-common.md
> +++ b/gcc/config/arm/vec-common.md
> @@ -21,8 +21,31 @@
>  ;; Vector Moves
>
>  (define_expand "mov<mode>"
> -  [(set (match_operand:VALL 0 "nonimmediate_operand")
> -       (match_operand:VALL 1 "general_operand"))]
> +  [(set (match_operand:VNIM1 0 "nonimmediate_operand")
> +       (match_operand:VNIM1 1 "general_operand"))]
> +  "TARGET_NEON
> +   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))
> +   || (TARGET_HAVE_MVE && VALID_MVE_SI_MODE (<MODE>mode))
> +   || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (<MODE>mode))"
> +   {
> +  gcc_checking_assert (aligned_operand (operands[0], <MODE>mode));
> +  gcc_checking_assert (aligned_operand (operands[1], <MODE>mode));
> +  if (can_create_pseudo_p ())
> +    {
> +      if (!REG_P (operands[0]))
> +       operands[1] = force_reg (<MODE>mode, operands[1]);
> +      else if ((TARGET_NEON || TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT)
> +              && (CONSTANT_P (operands[1])))
> +       {
> +         operands[1] = neon_make_constant (operands[1]);
> +         gcc_assert (operands[1] != NULL_RTX);
> +       }
> +    }
> +})
> +
> +(define_expand "mov<mode>"
> +  [(set (match_operand:VNINOTM1 0 "nonimmediate_operand")
> +       (match_operand:VNINOTM1 1 "general_operand"))]
>    "TARGET_NEON
>     || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
>  {
> @@ -40,6 +63,20 @@
>      }
>  })
>
> +(define_expand "movv8hf"
> +  [(set (match_operand:V8HF 0 "s_register_operand")
> +       (match_operand:V8HF 1 "s_register_operand"))]
> +   "TARGET_NEON || TARGET_HAVE_MVE_FLOAT"
> +{
> +  gcc_checking_assert (aligned_operand (operands[0], E_V8HFmode));
> +  gcc_checking_assert (aligned_operand (operands[1], E_V8HFmode));
> +   if (can_create_pseudo_p ())
> +     {
> +       if (!REG_P (operands[0]))
> +        operands[1] = force_reg (E_V8HFmode, operands[1]);
> +     }
> +})
> +
>  ;; Vector arithmetic. Expanders are blank, then unnamed insns implement
>  ;; patterns separately for IWMMXT and Neon.
>
> diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
> index 
> 573db164f01b4ac9ee4a9ee7414872fb93c9e2ca..6349c0570540ec25a599166f5d427fcbdbf2af68 
> 100644
> --- a/gcc/config/arm/vfp.md
> +++ b/gcc/config/arm/vfp.md
> @@ -311,7 +311,7 @@
>     && (   register_operand (operands[0], DImode)
>         || register_operand (operands[1], DImode))
>     && !(TARGET_NEON && CONST_INT_P (operands[1])
> -        && neon_immediate_valid_for_move (operands[1], DImode, NULL, 
> NULL))"
> +       && simd_immediate_valid_for_move (operands[1], DImode, NULL, 
> NULL))"
>    "*
>    switch (which_alternative)
>      {
> diff --git 
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..c3f81546c9f14f2491c6fb134170f17bcba16069
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float32x4_t
> +foo32 (float32x4_t value)
> +{
> +  float32x4_t b = value;
> +  return b;
> +}
> +
> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */
> +/* { dg-final { scan-assembler "vstrb.*" }  } */
> +/* { dg-final { scan-assembler "vldmia.*" }  } */
> +
> +float16x8_t
> +foo16 (float16x8_t value)
> +{
> +  float16x8_t b = value;
> +  return b;
> +}
> +
> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */
> +/* { dg-final { scan-assembler "vstrb.*" }  } */
> +/* { dg-final { scan-assembler "vldmia.*" }  } */
> diff --git 
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float1.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float1.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..ebee0d2f1ad03b66d044d93bf901e0ce78eccba9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float1.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float32x4_t value;
> +
> +float32x4_t
> +foo32 ()
> +{
> +  float32x4_t b = value;
> +  return b;
> +}
> +
> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */
> +/* { dg-final { scan-assembler "vstrb.*" }  } */
> +/* { dg-final { scan-assembler "vldmia.*" }  } */
> +
> +float16x8_t value1;
> +
> +float16x8_t
> +foo16 ()
> +{
> +  float16x8_t b = value1;
> +  return b;
> +}
> +
> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */
> +/* { dg-final { scan-assembler "vstrb.*" }  } */
> +/* { dg-final { scan-assembler "vldmia.*" }  } */
> diff --git 
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float2.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float2.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..9b9c84d66ef8fd585a42be1ac7585d8bc6c529bb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float2.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float32x4_t
> +foo32 ()
> +{
> +  float32x4_t b = {10.0, 12.0, 14.0, 16.0};
> +  return b;
> +}
> +
> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */
> +/* { dg-final { scan-assembler "vstrb.*" }  } */
> +/* { dg-final { scan-assembler "vldrw.32*" }  } */
> +
> +float16x8_t
> +foo16 ()
> +{
> +  float16x8_t b = {32.01};
> +  return b;
> +}
> +
> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */
> +/* { dg-final { scan-assembler "vstrb.*" }  } */
> +/* { dg-final { scan-assembler "vldrw.32.*" }  } */
> diff --git 
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..6b54c3c61f32cf8e0af30272df63f261def0b8c5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int.c
> @@ -0,0 +1,49 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve 
> -mfloat-abi=hard"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +int8x16_t
> +foo8 (int8x16_t value)
> +{
> +  int8x16_t b = value;
> +  return b;
> +}
> +
> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */
> +/* { dg-final { scan-assembler "vstrb.*" }  } */
> +/* { dg-final { scan-assembler "vldrb.s8*" }  } */
> +
> +int16x8_t
> +foo16 (int16x8_t value)
> +{
> +  int16x8_t b = value;
> +  return b;
> +}
> +
> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */
> +/* { dg-final { scan-assembler "vstrb.*" }  } */
> +/* { dg-final { scan-assembler "vldrb.u16*" }  } */
> +
> +int32x4_t
> +foo32 (int32x4_t value)
> +{
> +  int32x4_t b = value;
> +  return b;
> +}
> +
> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */
> +/* { dg-final { scan-assembler "vstrb.*" }  } */
> +/* { dg-final { scan-assembler "vldrb.u32*" }  } */
> +
> +int64x2_t
> +foo64 (int64x2_t value)
> +{
> +  int64x2_t b = value;
> +  return b;
> +}
> +
> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */
> +/* { dg-final { scan-assembler "vstrb.*" }  } */
> +/* { dg-final { scan-assembler "vldrb.u64*" }  } */
> diff --git 
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int1.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int1.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..748ddecbd4011bb24058c27cd6a09d66f71ce581
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int1.c
> @@ -0,0 +1,54 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve 
> -mfloat-abi=hard"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +int8x16_t value1;
> +int16x8_t value2;
> +int32x4_t value3;
> +int64x2_t value4;
> +
> +int8x16_t
> +foo8 ()
> +{
> +  int8x16_t b = value1;
> +  return b;
> +}
> +
> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */
> +/* { dg-final { scan-assembler "vstrb.*" }  } */
> +/* { dg-final { scan-assembler "vldrb.u8*" }  } */
> +
> +int16x8_t
> +foo16 ()
> +{
> +  int16x8_t b = value2;
> +  return b;
> +}
> +
> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */
> +/* { dg-final { scan-assembler "vstrb.*" }  } */
> +/* { dg-final { scan-assembler "vldrb.u16*" }  } */
> +
> +int32x4_t
> +foo32 ()
> +{
> +  int32x4_t b = value3;
> +  return b;
> +}
> +
> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */
> +/* { dg-final { scan-assembler "vstrb.*" }  } */
> +/* { dg-final { scan-assembler "vldrb.u32" }  } */
> +
> +int64x2_t
> +foo64 ()
> +{
> +  int64x2_t b = value4;
> +  return b;
> +}
> +
> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */
> +/* { dg-final { scan-assembler "vstrb.*" }  } */
> +/* { dg-final { scan-assembler "vldrb.u64" }  } */
> diff --git 
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int2.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int2.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..376ec9ee7fc04ddde98719d2605319a378f9a6bb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int2.c
> @@ -0,0 +1,49 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve 
> -mfloat-abi=hard"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +int8x16_t
> +foo8 ()
> +{
> +  int8x16_t b = {1, 2, 3, 4};
> +  return b;
> +}
> +
> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */
> +/* { dg-final { scan-assembler "vstrb.*" }  } */
> +/* { dg-final { scan-assembler "vldrw.32.*" }  } */
> +
> +int16x8_t
> +foo16 (int16x8_t value)
> +{
> +  int16x8_t b = {1, 2, 3};
> +  return b;
> +}
> +
> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */
> +/* { dg-final { scan-assembler "vstrb.*" }  } */
> +/* { dg-final { scan-assembler "vldrw.32.*" }  } */
> +
> +int32x4_t
> +foo32 (int32x4_t value)
> +{
> +  int32x4_t b = {1, 2};
> +  return b;
> +}
> +
> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */
> +/* { dg-final { scan-assembler "vstrb.*" }  } */
> +/* { dg-final { scan-assembler "vldrw.32.*" }  } */
> +
> +int64x2_t
> +foo64 (int64x2_t value)
> +{
> +  int64x2_t b = {1};
> +  return b;
> +}
> +
> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */
> +/* { dg-final { scan-assembler "vstrb.*" }  } */
> +/* { dg-final { scan-assembler "vldrw.32.*" }  } */
> diff --git 
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..f001d14f9ca4c851ed4b3371ae9599d23d2b62ce
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint.c
> @@ -0,0 +1,49 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve 
> -mfloat-abi=hard"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +uint8x16_t
> +foo8 (uint8x16_t value)
> +{
> +  uint8x16_t b = value;
> +  return b;
> +}
> +
> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */
> +/* { dg-final { scan-assembler "vstrb.*" }  } */
> +/* { dg-final { scan-assembler "vldrb.s8*" }  } */
> +
> +uint16x8_t
> +foo16 (uint16x8_t value)
> +{
> +  uint16x8_t b = value;
> +  return b;
> +}
> +
> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */
> +/* { dg-final { scan-assembler "vstrb.*" }  } */
> +/* { dg-final { scan-assembler "vldrb.u16*" }  } */
> +
> +uint32x4_t
> +foo32 (uint32x4_t value)
> +{
> +  uint32x4_t b = value;
> +  return b;
> +}
> +
> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */
> +/* { dg-final { scan-assembler "vstrb.*" }  } */
> +/* { dg-final { scan-assembler "vldrb.u32*" }  } */
> +
> +uint64x2_t
> +foo64 (uint64x2_t value)
> +{
> +  uint64x2_t b = value;
> +  return b;
> +}
> +
> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */
> +/* { dg-final { scan-assembler "vstrb.*" }  } */
> +/* { dg-final { scan-assembler "vldrb.u64*" }  } */
> diff --git 
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint1.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint1.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..56d40668d63ba0b24c08944981c415054494c37d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint1.c
> @@ -0,0 +1,54 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve 
> -mfloat-abi=hard"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +uint8x16_t value1;
> +uint16x8_t value2;
> +uint32x4_t value3;
> +uint64x2_t value4;
> +
> +uint8x16_t
> +foo8 ()
> +{
> +  uint8x16_t b = value1;
> +  return b;
> +}
> +
> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */
> +/* { dg-final { scan-assembler "vstrb.*" }  } */
> +/* { dg-final { scan-assembler "vldrb.s8*" }  } */
> +
> +uint16x8_t
> +foo16 ()
> +{
> +  uint16x8_t b = value2;
> +  return b;
> +}
> +
> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */
> +/* { dg-final { scan-assembler "vstrb.*" }  } */
> +/* { dg-final { scan-assembler "vldrb.u16*" }  } */
> +
> +uint32x4_t
> +foo32 ()
> +{
> +  uint32x4_t b = value3;
> +  return b;
> +}
> +
> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */
> +/* { dg-final { scan-assembler "vstrb.*" }  } */
> +/* { dg-final { scan-assembler "vldrb.u32*" }  } */
> +
> +uint64x2_t
> +foo64 ()
> +{
> +  uint64x2_t b = value4;
> +  return b;
> +}
> +
> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */
> +/* { dg-final { scan-assembler "vstrb.*" }  } */
> +/* { dg-final { scan-assembler "vldrb.u64*" }  } */
> diff --git 
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint2.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint2.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..9ff9b67993ac83cf398880cb65510604a37de6a4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint2.c
> @@ -0,0 +1,49 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve 
> -mfloat-abi=hard"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +uint8x16_t
> +foo8 (uint8x16_t value)
> +{
> +  uint8x16_t b = {1, 2, 3, 4};
> +  return b;
> +}
> +
> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */
> +/* { dg-final { scan-assembler "vstrb.*" }  } */
> +/* { dg-final { scan-assembler "vldrw.32.*" }  } */
> +
> +uint16x8_t
> +foo16 (uint16x8_t value)
> +{
> +  uint16x8_t b = {1, 2, 3};
> +  return b;
> +}
> +
> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */
> +/* { dg-final { scan-assembler "vstrb.*" }  } */
> +/* { dg-final { scan-assembler "vldrw.32.*" }  } */
> +
> +uint32x4_t
> +foo32 (uint32x4_t value)
> +{
> +  uint32x4_t b = {1, 2};
> +  return b;
> +}
> +
> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */
> +/* { dg-final { scan-assembler "vstrb.*" }  } */
> +/* { dg-final { scan-assembler "vldrw.32.*" }  } */
> +
> +uint64x2_t
> +foo64 (uint64x2_t value)
> +{
> +  uint64x2_t b = {1};
> +  return b;
> +}
> +
> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */
> +/* { dg-final { scan-assembler "vstrb.*" }  } */
> +/* { dg-final { scan-assembler "vldrw.32.*" }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/mve.exp 
> b/gcc/testsuite/gcc.target/arm/mve/mve.exp
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..77ae3fa292b2892fb22c2f89223ca19dc16ccc99
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/mve.exp
> @@ -0,0 +1,47 @@
> +# Copyright (C) 2019 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with GCC; see the file COPYING3.  If not see
> +# <http://www.gnu.org/licenses/>.
> +
> +# GCC testsuite that uses the `dg.exp' driver.
> +
> +# Exit immediately if this isn't an ARM target.
> +if ![istarget arm*-*-*] then {
> +  return
> +}
> +
> +# Load support procs.
> +load_lib gcc-dg.exp
> +
> +# If a testcase doesn't have special options, use these.
> +global DEFAULT_CFLAGS
> +if ![info exists DEFAULT_CFLAGS] then {
> +    set DEFAULT_CFLAGS " -ansi -pedantic-errors"
> +}
> +
> +# This variable should only apply to tests called in this exp file.
> +global dg_runtest_extra_prunes
> +set dg_runtest_extra_prunes ""
> +lappend dg_runtest_extra_prunes "warning: switch -m(cpu|arch)=.* 
> conflicts with -m(cpu|arch)=.* switch"
> +
> +# Initialize `dg'.
> +dg-init
> +
> +# Main loop.
> +dg-runtest [lsort [glob -nocomplain 
> $srcdir/$subdir/intrinsics/*.\[cCS\]]] \
> +       "" $DEFAULT_CFLAGS
> +
> +# All done.
> +set dg_runtest_extra_prunes ""
> +dg-finish
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH][ARM][GCC][2/x]: MVE ACLE intrinsics framework patch.
  2019-11-14 19:13 ` [PATCH][ARM][GCC][2/x]: MVE ACLE intrinsics framework patch Srinath Parvathaneni
@ 2019-12-19 17:24   ` Kyrill Tkachov
  0 siblings, 0 replies; 48+ messages in thread
From: Kyrill Tkachov @ 2019-12-19 17:24 UTC (permalink / raw)
  To: Srinath Parvathaneni, gcc-patches; +Cc: Richard Earnshaw

Hi Srinath,

On 11/14/19 7:12 PM, Srinath Parvathaneni wrote:
> Hello,
>
> This patch is part of MVE ACLE intrinsics framework.
> This patches add support to update (read/write) the APSR (Application 
> Program Status Register)
> register and FPSCR (Floating-point Status and Control Register) 
> register for MVE.
> This patch also enables thumb2 mov RTL patterns for MVE.
>
> Please refer to Arm reference manual [1] for more details.
> [1] 
> https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf?_ga=2.102521798.659307368.1572453718-1501600630.1548848914
>
> Regression tested on arm-none-eabi and found no regressions.
>
> Ok for trunk?
>
> Thanks,
> Srinath
>
> gcc/ChangeLog:
>
> 2019-11-11  Andre Vieira <andre.simoesdiasvieira@arm.com>
>             Mihail Ionescu  <mihail.ionescu@arm.com>
>             Srinath Parvathaneni <srinath.parvathaneni@arm.com>
>
>         * config/arm/thumb2.md (thumb2_movsfcc_soft_insn): Add check 
> to not allow
>         TARGET_HAVE_MVE for this pattern.
>         (thumb2_cmse_entry_return): Add TARGET_HAVE_MVE check to 
> update APSR register.
>         * config/arm/unspecs.md (UNSPEC_GET_FPSCR): Define.
>         (VUNSPEC_GET_FPSCR): Remove.
>         * config/arm/vfp.md (thumb2_movhi_vfp): Add TARGET_HAVE_MVE check.
>         (thumb2_movhi_fp16): Add TARGET_HAVE_MVE check.
>         (thumb2_movsi_vfp): Add TARGET_HAVE_MVE check.
>         (movdi_vfp): Add TARGET_HAVE_MVE check.
>         (thumb2_movdf_vfp): Add TARGET_HAVE_MVE check.
>         (thumb2_movsfcc_vfp): Add TARGET_HAVE_MVE check.
>         (thumb2_movdfcc_vfp): Add TARGET_HAVE_MVE check.
>         (push_multi_vfp): Add TARGET_HAVE_MVE check.
>         (set_fpscr): Add TARGET_HAVE_MVE check.
>         (get_fpscr): Add TARGET_HAVE_MVE check.


These pattern changes do more that add a TARGET_HAVE_MVE check. Some add 
new alternatives, some even change the RTL pattern.

I'd like to see them reflected in the ChangeLog so that I know they're 
deliberate.


>
>
> ###############     Attachment also inlined for ease of reply    
> ###############
>
>
> diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
> index 
> 809461a25da5a8058a8afce972dea0d3131effc0..81afd8fcdc1b0a82493dc0758bce16fa9e5fde20 
> 100644
> --- a/gcc/config/arm/thumb2.md
> +++ b/gcc/config/arm/thumb2.md
> @@ -435,10 +435,10 @@
>  (define_insn "*cmovsi_insn"
>    [(set (match_operand:SI 0 "arm_general_register_operand" 
> "=r,r,r,r,r,r,r")
>          (if_then_else:SI
> -        (match_operator 1 "arm_comparison_operator"
> -         [(match_operand 2 "cc_register" "") (const_int 0)])
> -        (match_operand:SI 3 "arm_reg_or_m1_or_1" "r, r,UM, r,U1,UM,U1")
> -        (match_operand:SI 4 "arm_reg_or_m1_or_1" "r,UM, r,U1, 
> r,UM,U1")))]
> +       (match_operator 1 "arm_comparison_operator"
> +        [(match_operand 2 "cc_register" "") (const_int 0)])
> +       (match_operand:SI 3 "arm_reg_or_m1_or_1" "r, r,UM, r,U1,UM,U1")
> +       (match_operand:SI 4 "arm_reg_or_m1_or_1" "r,UM, r,U1, r,UM,U1")))]
>    "TARGET_THUMB2 && TARGET_COND_ARITH
>     && (!((operands[3] == const1_rtx && operands[4] == constm1_rtx)
>         || (operands[3] == constm1_rtx && operands[4] == const1_rtx)))"
> @@ -540,7 +540,7 @@
>                            [(match_operand 4 "cc_register" "") 
> (const_int 0)])
>                           (match_operand:SF 1 "s_register_operand" "0,r")
>                           (match_operand:SF 2 "s_register_operand" 
> "r,0")))]
> -  "TARGET_THUMB2 && TARGET_SOFT_FLOAT"
> +  "TARGET_THUMB2 && TARGET_SOFT_FLOAT && !TARGET_HAVE_MVE"
>    "@
>     it\\t%D3\;mov%D3\\t%0, %2
>     it\\t%d3\;mov%d3\\t%0, %1"
> @@ -1226,7 +1226,7 @@
>     ; added to clear the APSR and potentially the FPSCR if VFP is 
> available, so
>     ; we adapt the length accordingly.
>     (set (attr "length")
> -     (if_then_else (match_test "TARGET_HARD_FLOAT")
> +     (if_then_else (match_test "TARGET_HARD_FLOAT || TARGET_HAVE_MVE")
>        (const_int 34)
>        (const_int 8)))
>     ; We do not support predicate execution of returns from 
> cmse_nonsecure_entry
> diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
> index 
> b3b4f8ee3e2d1bdad968a9dd8ccbc72ded274f48..ac7fe7d0af19f1965356d47d8327e24d410b99bd 
> 100644
> --- a/gcc/config/arm/unspecs.md
> +++ b/gcc/config/arm/unspecs.md
> @@ -170,6 +170,7 @@
>    UNSPEC_TORC          ; Used by the intrinsic form of the iWMMXt 
> TORC instruction.
>    UNSPEC_TORVSC                ; Used by the intrinsic form of the 
> iWMMXt TORVSC instruction.
>    UNSPEC_TEXTRC                ; Used by the intrinsic form of the 
> iWMMXt TEXTRC instruction.
> +  UNSPEC_GET_FPSCR     ; Represent fetch of FPSCR content.
>  ])
>
>
> @@ -216,7 +217,6 @@
>    VUNSPEC_SLX          ; Represent a store-register-release-exclusive.
>    VUNSPEC_LDA          ; Represent a store-register-acquire.
>    VUNSPEC_STL          ; Represent a store-register-release.
> -  VUNSPEC_GET_FPSCR    ; Represent fetch of FPSCR content.
>    VUNSPEC_SET_FPSCR    ; Represent assign of FPSCR content.
>    VUNSPEC_PROBE_STACK_RANGE ; Represent stack range probing.
>    VUNSPEC_CDP          ; Represent the coprocessor cdp instruction.
> diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
> index 
> 6349c0570540ec25a599166f5d427fcbdbf2af68..461a5d71ca8548cfc61c83f9716249425633ad21 
> 100644
> --- a/gcc/config/arm/vfp.md
> +++ b/gcc/config/arm/vfp.md
> @@ -74,10 +74,10 @@
>  (define_insn "*thumb2_movhi_vfp"
>   [(set
>     (match_operand:HI 0 "nonimmediate_operand"
> -    "=rk, r, l, r, m, r, *t, r, *t")
> +    "=rk, r, l, r, m, r, *t, r, *t, Up, r")
>     (match_operand:HI 1 "general_operand"
> -    "rk, I, Py, n, r, m, r, *t, *t"))]
> - "TARGET_THUMB2 && TARGET_HARD_FLOAT
> +    "rk, I, Py, n, r, m, r, *t, *t, r, Up"))]
> + "TARGET_THUMB2 && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
>    && !TARGET_VFP_FP16INST
>    && (register_operand (operands[0], HImode)
>         || register_operand (operands[1], HImode))"
> @@ -99,20 +99,24 @@
>        return "vmov%?\t%0, %1\t%@ int";
>      case 8:
>        return "vmov%?.f32\t%0, %1\t%@ int";
> +    case 9:
> +      return "vmsr%?\t P0, %1\t@ movhi";
> +    case 10:
> +      return "vmrs%?\t %0, P0\t@ movhi";
>      default:
>        gcc_unreachable ();
>      }
>  }
>   [(set_attr "predicable" "yes")
>    (set_attr "predicable_short_it"
> -   "yes, no, yes, no, no, no, no, no, no")
> +   "yes, no, yes, no, no, no, no, no, no, no, no")
>    (set_attr "type"
>     "mov_reg, mov_imm, mov_imm, mov_imm, store_4, load_4,\
> -    f_mcr, f_mrc, fmov")
> -  (set_attr "arch" "*, *, *, v6t2, *, *, *, *, *")
> -  (set_attr "pool_range" "*, *, *, *, *, 4094, *, *, *")
> -  (set_attr "neg_pool_range" "*, *, *, *, *, 250, *, *, *")
> -  (set_attr "length" "2, 4, 2, 4, 4, 4, 4, 4, 4")]
> +    f_mcr, f_mrc, fmov, mve_move, mve_move")
> +  (set_attr "arch" "*, *, *, v6t2, *, *, *, *, *, *, *")
> +  (set_attr "pool_range" "*, *, *, *, *, 4094, *, *, *, *, *")
> +  (set_attr "neg_pool_range" "*, *, *, *, *, 250, *, *, *, *, *")
> +  (set_attr "length" "2, 4, 2, 4, 4, 4, 4, 4, 4, 4, 4")]
>  )


Hmm, I think the alternatives that touch P0 and other MVE-specific 
registers should be disabled for non-MVE targets through the "arch" 
attribute.


>
>  ;; Patterns for HI moves which provide more data transfer 
> instructions when FP16
> @@ -170,10 +174,10 @@
>  (define_insn "*thumb2_movhi_fp16"
>   [(set
>     (match_operand:HI 0 "nonimmediate_operand"
> -    "=rk, r, l, r, m, r, *t, r, *t")
> +    "=rk, r, l, r, m, r, *t, r, *t, Up, r")
>     (match_operand:HI 1 "general_operand"
> -    "rk, I, Py, n, r, m, r, *t, *t"))]
> - "TARGET_THUMB2 && TARGET_VFP_FP16INST
> +    "rk, I, Py, n, r, m, r, *t, *t, r, Up"))]
> + "TARGET_THUMB2 && (TARGET_VFP_FP16INST || TARGET_HAVE_MVE)
>    && (register_operand (operands[0], HImode)
>         || register_operand (operands[1], HImode))"
>  {
> @@ -194,21 +198,25 @@
>        return "vmov.f16\t%0, %1\t%@ int";
>      case 8:
>        return "vmov%?.f32\t%0, %1\t%@ int";
> +    case 9:
> +      return "vmsr%?\tP0, %1\t%@ movhi";
> +    case 10:
> +      return "vmrs%?\t%0, P0\t%@ movhi";
>      default:
>        gcc_unreachable ();
>      }
>  }
>   [(set_attr "predicable"
> -   "yes, yes, yes, yes, yes, yes, no, no, yes")
> +   "yes, yes, yes, yes, yes, yes, no, no, yes, yes, yes")
>    (set_attr "predicable_short_it"
> -   "yes, no, yes, no, no, no, no, no, no")
> +   "yes, no, yes, no, no, no, no, no, no, no, no")
>    (set_attr "type"
>     "mov_reg, mov_imm, mov_imm, mov_imm, store_4, load_4,\
> -    f_mcr, f_mrc, fmov")
> -  (set_attr "arch" "*, *, *, v6t2, *, *, *, *, *")
> -  (set_attr "pool_range" "*, *, *, *, *, 4094, *, *, *")
> -  (set_attr "neg_pool_range" "*, *, *, *, *, 250, *, *, *")
> -  (set_attr "length" "2, 4, 2, 4, 4, 4, 4, 4, 4")]
> +    f_mcr, f_mrc, fmov, mve_move, mve_move")
> +  (set_attr "arch" "*, *, *, v6t2, *, *, *, *, *, *, *")
> +  (set_attr "pool_range" "*, *, *, *, *, 4094, *, *, *, *, *")
> +  (set_attr "neg_pool_range" "*, *, *, *, *, 250, *, *, *, *, *")
> +  (set_attr "length" "2, 4, 2, 4, 4, 4, 4, 4, 4, 4, 4")]
>  )
>
>  ;; SImode moves
> @@ -258,9 +266,11 @@
>  ;; is chosen with length 2 when the instruction is predicated for
>  ;; arm_restrict_it.
>  (define_insn "*thumb2_movsi_vfp"
> -  [(set (match_operand:SI 0 "nonimmediate_operand" 
> "=rk,r,l,r,r,lk*r,m,*t, r,*t,*t,  *Uv")
> -       (match_operand:SI 1 "general_operand" "rk,I,Py,K,j,mi,lk*r, 
> r,*t,*t,*UvTu,*t"))]
> -  "TARGET_THUMB2 && TARGET_HARD_FLOAT
> +  [(set (match_operand:SI 0 "nonimmediate_operand" 
> "=rk,r,l,r,r,l,*hk,m,*m,*t,\
> +                            r,*t,*t,*Uv, Up, r")
> +       (match_operand:SI 1 "general_operand" "rk,I,Py,K,j,mi,*mi,l,*hk,\
> +                            r,*t,*t,*UvTu,*t, r, Up"))]
> +  "TARGET_THUMB2 && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
>     && (   s_register_operand (operands[0], SImode)
>         || s_register_operand (operands[1], SImode))"
>    "*
> @@ -275,43 +285,53 @@
>      case 4:
>        return \"movw%?\\t%0, %1\";
>      case 5:
> +    case 6:
>        /* Cannot load it directly, split to load it via MOV / MOVT.  */
>        if (!MEM_P (operands[1]) && arm_disable_literal_pool)
>          return \"#\";
>        return \"ldr%?\\t%0, %1\";
> -    case 6:
> -      return \"str%?\\t%1, %0\";
>      case 7:
> -      return \"vmov%?\\t%0, %1\\t%@ int\";
>      case 8:
> -      return \"vmov%?\\t%0, %1\\t%@ int\";
> +      return \"str%?\\t%1, %0\";
>      case 9:
> +      return \"vmov%?\\t%0, %1\\t%@ int\";
> +    case 10:
> +      return \"vmov%?\\t%0, %1\\t%@ int\";
> +    case 11:
>        return \"vmov%?.f32\\t%0, %1\\t%@ int\";
> -    case 10: case 11:
> +    case 12: case 13:
>        return output_move_vfp (operands);
> +    case 14:
> +      return \"vmsr\\t P0, %1\";
> +    case 15:
> +      return \"vmrs\\t %0, P0\";
>      default:
>        gcc_unreachable ();
>      }
>    "
>    [(set_attr "predicable" "yes")
> -   (set_attr "predicable_short_it" 
> "yes,no,yes,no,no,no,no,no,no,no,no,no")
> -   (set_attr "type" 
> "mov_reg,mov_reg,mov_reg,mvn_reg,mov_imm,load_4,store_4,f_mcr,f_mrc,fmov,f_loads,f_stores")
> -   (set_attr "length" "2,4,2,4,4,4,4,4,4,4,4,4")
> -   (set_attr "pool_range" "*,*,*,*,*,1018,*,*,*,*,1018,*")
> -   (set_attr "neg_pool_range" "*,*,*,*,*, 0,*,*,*,*,1008,*")]
> +   (set_attr "predicable_short_it" 
> "yes,no,yes,no,no,no,no,no,no,no,no,no,no,\
> +            no,no,no")
> +   (set_attr "type" 
> "mov_reg,mov_reg,mov_reg,mvn_reg,mov_imm,load_4,load_4,\
> + store_4,store_4,f_mcr,f_mrc,fmov,f_loads,f_stores,mve_move,\
> +            mve_move")
> +   (set_attr "length" "2,4,2,4,4,4,4,4,4,4,4,4,4,4,4,4")
> +   (set_attr "pool_range" "*,*,*,*,*,1018,4094,*,*,*,*,*,1018,*,*,*")
> +   (set_attr "neg_pool_range" "*,*,*,*,*,   0, 0,*,*,*,*,*,1008,*,*,*")]
>  )
>
>
>  ;; DImode moves
>
>  (define_insn "*movdi_vfp"
> -  [(set (match_operand:DI 0 "nonimmediate_di_operand" 
> "=r,r,r,r,r,r,m,w,!r,w,w, Uv")
> -       (match_operand:DI 1 "di_operand" 
> "r,rDa,Db,Dc,mi,mi,r,r,w,w,UvTu,w"))]
> -  "TARGET_32BIT && TARGET_HARD_FLOAT
> -   && (   register_operand (operands[0], DImode)
> -       || register_operand (operands[1], DImode))
> -   && !(TARGET_NEON && CONST_INT_P (operands[1])
> -       && simd_immediate_valid_for_move (operands[1], DImode, NULL, 
> NULL))"
> +  [(set (match_operand:DI 0 "nonimmediate_di_operand" 
> "=r,r,r,r,r,r,m,w,!r,w,\
> +                           w, Uv")
> +       (match_operand:DI 1 "di_operand" 
> "r,rDa,Db,Dc,mi,mi,r,r,w,w,UvTu,w"))]
> +  "TARGET_32BIT && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
> +    && (register_operand (operands[0], DImode)
> +       || register_operand (operands[1], DImode))
> +   && !((TARGET_NEON || TARGET_HAVE_MVE) && CONST_INT_P (operands[1])
> +       && simd_immediate_valid_for_move (operands[1], DImode, NULL, 
> NULL))"
>    "*
>    switch (which_alternative)
>      {
> @@ -390,9 +410,15 @@
>      case 6: /* S register from immediate.  */
>        return \"vmov.f16\\t%0, %1\t%@ __fp16\";
>      case 7: /* S register from memory.  */
> -      return \"vld1.16\\t{%z0}, %A1\";
> +      if (TARGET_HAVE_MVE)
> +       return \"vldr.16\\t%0, %A1\";
> +      else
> +       return \"vld1.16\\t{%z0}, %A1\";
>      case 8: /* Memory from S register.  */
> -      return \"vst1.16\\t{%z1}, %A0\";
> +      if (TARGET_HAVE_MVE)
> +       return \"vstr.16\\t%1, %A0\";
> +      else
> +       return \"vst1.16\\t{%z1}, %A0\";
>      case 9: /* ARM register from constant.  */
>        {
>          long bits;
> @@ -593,7 +619,7 @@
>  (define_insn "*thumb2_movsf_vfp"
>    [(set (match_operand:SF 0 "nonimmediate_operand" "=t,?r,t, t  ,Uv,r 
> ,m,t,r")
>          (match_operand:SF 1 "hard_sf_operand"      " ?r,t,Dv,UvHa,t, 
> mHa,r,t,r"))]
> -  "TARGET_THUMB2 && TARGET_HARD_FLOAT
> +  "TARGET_THUMB2 && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
>     && (   s_register_operand (operands[0], SFmode)
>         || s_register_operand (operands[1], SFmode))"
>    "*
> @@ -682,7 +708,7 @@
>  (define_insn "*thumb2_movdf_vfp"
>    [(set (match_operand:DF 0 "nonimmediate_soft_df_operand" "=w,?r,w 
> ,w,w  ,Uv,r ,m,w,r")
>          (match_operand:DF 1 "hard_df_operand" " ?r,w,Dy,G,UvHa,w, 
> mHa,r, w,r"))]
> -  "TARGET_THUMB2 && TARGET_HARD_FLOAT
> +  "TARGET_THUMB2 && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
>     && (   register_operand (operands[0], DFmode)
>         || register_operand (operands[1], DFmode))"
>    "*
> @@ -760,7 +786,7 @@
>              [(match_operand 4 "cc_register" "") (const_int 0)])
>            (match_operand:SF 1 "s_register_operand" "0,t,t,0,?r,?r,0,t,t")
>            (match_operand:SF 2 "s_register_operand" 
> "t,0,t,?r,0,?r,t,0,t")))]
> -  "TARGET_THUMB2 && TARGET_HARD_FLOAT && !arm_restrict_it"
> +  "TARGET_THUMB2 && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE) && 
> !arm_restrict_it"
>    "@
>     it\\t%D3\;vmov%D3.f32\\t%0, %2
>     it\\t%d3\;vmov%d3.f32\\t%0, %1
> @@ -806,7 +832,8 @@
>              [(match_operand 4 "cc_register" "") (const_int 0)])
>            (match_operand:DF 1 "s_register_operand" "0,w,w,0,?r,?r,0,w,w")
>            (match_operand:DF 2 "s_register_operand" 
> "w,0,w,?r,0,?r,w,0,w")))]
> -  "TARGET_THUMB2 && TARGET_HARD_FLOAT && TARGET_VFP_DOUBLE && 
> !arm_restrict_it"
> +  "TARGET_THUMB2 && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE) && 
> TARGET_VFP_DOUBLE
> +   && !arm_restrict_it"
>    "@
>     it\\t%D3\;vmov%D3.f64\\t%P0, %P2
>     it\\t%d3\;vmov%d3.f64\\t%P0, %P1
> @@ -1982,7 +2009,7 @@
>      [(set (match_operand:BLK 0 "memory_operand" "=m")
>            (unspec:BLK [(match_operand:DF 1 "vfp_register_operand" "")]
>                        UNSPEC_PUSH_MULT))])]
> -  "TARGET_32BIT && TARGET_HARD_FLOAT"
> +  "TARGET_32BIT && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)"
>    "* return vfp_output_vstmd (operands);"
>    [(set_attr "type" "f_stored")]
>  )
> @@ -2070,16 +2097,18 @@
>
>  ;; Write Floating-point Status and Control Register.
>  (define_insn "set_fpscr"
> -  [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")] 
> VUNSPEC_SET_FPSCR)]
> -  "TARGET_HARD_FLOAT"
> +  [(set (reg:SI VFPCC_REGNUM)
> +       (unspec_volatile:SI
> +        [(match_operand:SI 0 "register_operand" "r")] 
> VUNSPEC_SET_FPSCR))]
> +  "TARGET_HARD_FLOAT || TARGET_HAVE_MVE"
>    "mcr\\tp10, 7, %0, cr1, cr0, 0\\t @SET_FPSCR"
>    [(set_attr "type" "mrs")])


Why is the RTL pattern being changed here? I'm not, in principle, 
opposed, but it looks like an orthogonal change that can be done 
indendently if necessary and I'd rather keep the churn to the codebase 
minimal at this point.

Thanks,

Kyrill


>
>  ;; Read Floating-point Status and Control Register.
>  (define_insn "get_fpscr"
>    [(set (match_operand:SI 0 "register_operand" "=r")
> -        (unspec_volatile:SI [(const_int 0)] VUNSPEC_GET_FPSCR))]
> -  "TARGET_HARD_FLOAT"
> +       (unspec:SI [(reg:SI VFPCC_REGNUM)] UNSPEC_GET_FPSCR))]
> +  "TARGET_HARD_FLOAT || TARGET_HAVE_MVE"
>    "mrc\\tp10, 7, %0, cr1, cr0, 0\\t @GET_FPSCR"
>    [(set_attr "type" "mrs")])
>
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH][ARM][GCC][4/x]: MVE ACLE vector interleaving store intrinsics.
  2019-11-14 19:13 ` [PATCH][ARM][GCC][4/x]: MVE ACLE vector interleaving store intrinsics Srinath Parvathaneni
@ 2019-12-19 17:39   ` Kyrill Tkachov
  0 siblings, 0 replies; 48+ messages in thread
From: Kyrill Tkachov @ 2019-12-19 17:39 UTC (permalink / raw)
  To: Srinath Parvathaneni, gcc-patches; +Cc: Richard Earnshaw


On 11/14/19 7:12 PM, Srinath Parvathaneni wrote:
> Hello,
>
> This patch supports MVE ACLE intrinsics vst4q_s8, vst4q_s16, 
> vst4q_s32, vst4q_u8,
> vst4q_u16, vst4q_u32, vst4q_f16 and vst4q_f32.
>
> In this patch arm_mve_builtins.def file is added to the source code in 
> which the
> builtins for MVE ACLE intrinsics are defined using builtin qualifiers.
>
> Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for 
> more details.
> [1] 
> https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics
>
> Regression tested on arm-none-eabi and found no regressions.
>
> Ok for trunk?


Ok.

Thanks,

Kyrill


>
> Thanks,
> Srinath.
>
> gcc/ChangeLog:
>
> 2019-11-12  Andre Vieira <andre.simoesdiasvieira@arm.com>
>             Mihail Ionescu  <mihail.ionescu@arm.com>
>             Srinath Parvathaneni <srinath.parvathaneni@arm.com>
>
>         * config/arm/arm-builtins.c (CF): Define mve_builtin_data.
>         (VAR1): Define.
>         (ARM_BUILTIN_MVE_PATTERN_START): Define.
>         (arm_init_mve_builtins): Define function.
>         (arm_init_builtins): Add TARGET_HAVE_MVE check.
>         (arm_expand_builtin_1): Check the range of fcode.
>         (arm_expand_mve_builtin): Define function to expand MVE builtins.
>         (arm_expand_builtin): Check the range of fcode.
>         * config/arm/arm_mve.h (__ARM_FEATURE_MVE): Define MVE 
> floating point
>         types.
>         (__ARM_MVE_PRESERVE_USER_NAMESPACE): Define to protect user 
> namespace.
>         (vst4q_s8): Define macro.
>         (vst4q_s16): Likewise.
>         (vst4q_s32): Likewise.
>         (vst4q_u8): Likewise.
>         (vst4q_u16): Likewise.
>         (vst4q_u32): Likewise.
>         (vst4q_f16): Likewise.
>         (vst4q_f32): Likewise.
>         (__arm_vst4q_s8): Define inline builtin.
>         (__arm_vst4q_s16): Likewise.
>         (__arm_vst4q_s32): Likewise.
>         (__arm_vst4q_u8): Likewise.
>         (__arm_vst4q_u16): Likewise.
>         (__arm_vst4q_u32): Likewise.
>         (__arm_vst4q_f16): Likewise.
>         (__arm_vst4q_f32): Likewise.
>         (__ARM_mve_typeid): Define macro with MVE types.
>         (__ARM_mve_coerce): Define macro with _Generic feature.
>         (vst4q): Define polymorphic variant for different vst4q builtins.
>         * config/arm/arm_mve_builtins.def: New file.
>         * config/arm/mve.md (MVE_VLD_ST): Define iterator.
>         (unspec): Define unspec.
>         (mve_vst4q<mode>): Define RTL pattern.
>         * config/arm/t-arm (arm.o): Add entry for arm_mve_builtins.def.
>         (arm-builtins.o): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 2019-11-12  Andre Vieira <andre.simoesdiasvieira@arm.com>
>             Mihail Ionescu  <mihail.ionescu@arm.com>
>             Srinath Parvathaneni <srinath.parvathaneni@arm.com>
>
>         * gcc.target/arm/mve/intrinsics/vst4q_f16.c: New test.
>         * gcc.target/arm/mve/intrinsics/vst4q_f32.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vst4q_s16.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vst4q_s32.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vst4q_s8.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vst4q_u16.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vst4q_u32.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vst4q_u8.c: Likewise.
>
>
> ###############     Attachment also inlined for ease of reply    
> ###############
>
>
> diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
> index 
> d4cb0ea3deb49b10266d1620c85e243ed34aee4d..a9f76971ef310118bf7edea6a8dd3de1da46b46b 
> 100644
> --- a/gcc/config/arm/arm-builtins.c
> +++ b/gcc/config/arm/arm-builtins.c
> @@ -401,6 +401,13 @@ static arm_builtin_datum neon_builtin_data[] =
>  };
>
>  #undef CF
> +#define CF(N,X) CODE_FOR_mve_##N##X
> +static arm_builtin_datum mve_builtin_data[] =
> +{
> +#include "arm_mve_builtins.def"
> +};
> +
> +#undef CF
>  #undef VAR1
>  #define VAR1(T, N, A) \
>    {#N, UP (A), CODE_FOR_arm_##N, 0, T##_QUALIFIERS},
> @@ -705,6 +712,13 @@ enum arm_builtins
>
>  #include "arm_acle_builtins.def"
>
> +  ARM_BUILTIN_MVE_BASE,
> +
> +#undef VAR1
> +#define VAR1(T, N, X) \
> +  ARM_BUILTIN_MVE_##N##X,
> +#include "arm_mve_builtins.def"
> +
>    ARM_BUILTIN_MAX
>  };
>
> @@ -714,6 +728,9 @@ enum arm_builtins
>  #define ARM_BUILTIN_NEON_PATTERN_START \
>    (ARM_BUILTIN_NEON_BASE + 1)
>
> +#define ARM_BUILTIN_MVE_PATTERN_START \
> +  (ARM_BUILTIN_MVE_BASE + 1)
> +
>  #define ARM_BUILTIN_ACLE_PATTERN_START \
>    (ARM_BUILTIN_ACLE_BASE + 1)
>
> @@ -1219,6 +1236,22 @@ arm_init_acle_builtins (void)
>      }
>  }
>
> +/* Set up all the MVE builtins mentioned in arm_mve_builtins.def 
> file.  */
> +static void
> +arm_init_mve_builtins (void)
> +{
> +  volatile unsigned int i, fcode = ARM_BUILTIN_MVE_PATTERN_START;
> +
> +  arm_init_simd_builtin_scalar_types ();
> +  arm_init_simd_builtin_types ();
> +
> +  for (i = 0; i < ARRAY_SIZE (mve_builtin_data); i++, fcode++)
> +    {
> +      arm_builtin_datum *d = &mve_builtin_data[i];
> +      arm_init_builtin (fcode, d, "__builtin_mve");
> +    }
> +}
> +
>  /* Set up all the NEON builtins, even builtins for instructions that 
> are not
>     in the current target ISA to allow the user to compile particular 
> modules
>     with different target specific options that differ from the 
> command line
> @@ -1961,8 +1994,10 @@ arm_init_builtins (void)
>        = add_builtin_function ("__builtin_arm_lane_check", lane_check_fpr,
>                                ARM_BUILTIN_SIMD_LANE_CHECK, BUILT_IN_MD,
>                                NULL, NULL_TREE);
> -
> -      arm_init_neon_builtins ();
> +      if (TARGET_HAVE_MVE)
> +       arm_init_mve_builtins ();
> +      else
> +       arm_init_neon_builtins ();
>        arm_init_vfp_builtins ();
>        arm_init_crypto_builtins ();
>      }
> @@ -2492,10 +2527,14 @@ arm_expand_builtin_1 (int fcode, tree exp, rtx 
> target,
>    int is_void = 0;
>    int k;
>    bool neon = false;
> +  bool mve = false;
>
>    if (IN_RANGE (fcode, ARM_BUILTIN_VFP_BASE, ARM_BUILTIN_ACLE_BASE - 1))
>      neon = true;
>
> +  if (IN_RANGE (fcode, ARM_BUILTIN_MVE_BASE, ARM_BUILTIN_MAX - 1))
> +    mve = true;
> +
>    is_void = !!(d->qualifiers[0] & qualifier_void);
>
>    num_args += is_void;
> @@ -2535,7 +2574,7 @@ arm_expand_builtin_1 (int fcode, tree exp, rtx 
> target,
>          }
>        else if (d->qualifiers[qualifiers_k] & qualifier_pointer)
>          {
> -         if (neon)
> +         if (neon || mve)
>              args[k] = ARG_BUILTIN_NEON_MEMORY;
>            else
>              args[k] = ARG_BUILTIN_MEMORY;
> @@ -2585,6 +2624,26 @@ arm_expand_acle_builtin (int fcode, tree exp, 
> rtx target)
>    return arm_expand_builtin_1 (fcode, exp, target, d);
>  }
>
> +/* Expand an MVE builtin, i.e. those registered only if their 
> respective target
> +   constraints are met.  This check happens within 
> arm_expand_builtin.  */
> +
> +static rtx
> +arm_expand_mve_builtin (int fcode, tree exp, rtx target)
> +{
> +  if (fcode >= ARM_BUILTIN_MVE_BASE && !TARGET_HAVE_MVE)
> +  {
> +    fatal_error (input_location,
> +               "You must enable MVE instructions"
> +               " to use these intrinsics");
> +    return const0_rtx;
> +  }
> +
> +  arm_builtin_datum *d
> +    = &mve_builtin_data[fcode - ARM_BUILTIN_MVE_PATTERN_START];
> +
> +  return arm_expand_builtin_1 (fcode, exp, target, d);
> +}
> +
>  /* Expand a Neon builtin, i.e. those registered only if TARGET_NEON 
> holds.
>     Most of these are "special" because they don't have symbolic
>     constants defined per-instruction or per instruction-variant.  
> Instead, the
> @@ -2678,6 +2737,8 @@ arm_expand_builtin (tree exp,
>        /* Don't generate any RTL.  */
>        return const0_rtx;
>      }
> +  if (fcode >= ARM_BUILTIN_MVE_BASE)
> +    return arm_expand_mve_builtin (fcode, exp, target);
>
>    if (fcode >= ARM_BUILTIN_ACLE_BASE)
>      return arm_expand_acle_builtin (fcode, exp, target);
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index 
> 5ffb466596b5d8fc330616a6fcc7ee37d3e28def..39c6a1551a72700292dde8ef6cea44ba0907af8d 
> 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -42,6 +42,13 @@ typedef __simd128_float16_t float16x8_t;
>  typedef __simd128_float32_t float32x4_t;
>  #endif
>
> +#if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
> +typedef struct { float16x8_t val[2]; } float16x8x2_t;
> +typedef struct { float16x8_t val[4]; } float16x8x4_t;
> +typedef struct { float32x4_t val[2]; } float32x4x2_t;
> +typedef struct { float32x4_t val[4]; } float32x4x4_t;
> +#endif
> +
>  typedef uint16_t mve_pred16_t;
>  typedef __simd128_uint8_t uint8x16_t;
>  typedef __simd128_uint16_t uint16x8_t;
> @@ -52,6 +59,330 @@ typedef __simd128_int16_t int16x8_t;
>  typedef __simd128_int32_t int32x4_t;
>  typedef __simd128_int64_t int64x2_t;
>
> +typedef struct { int16x8_t val[2]; } int16x8x2_t;
> +typedef struct { int16x8_t val[4]; } int16x8x4_t;
> +typedef struct { int32x4_t val[2]; } int32x4x2_t;
> +typedef struct { int32x4_t val[4]; } int32x4x4_t;
> +typedef struct { int8x16_t val[2]; } int8x16x2_t;
> +typedef struct { int8x16_t val[4]; } int8x16x4_t;
> +typedef struct { uint16x8_t val[2]; } uint16x8x2_t;
> +typedef struct { uint16x8_t val[4]; } uint16x8x4_t;
> +typedef struct { uint32x4_t val[2]; } uint32x4x2_t;
> +typedef struct { uint32x4_t val[4]; } uint32x4x4_t;
> +typedef struct { uint8x16_t val[2]; } uint8x16x2_t;
> +typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
> +
> +#ifndef __ARM_MVE_PRESERVE_USER_NAMESPACE
> +#define vst4q_s8( __addr, __value) __arm_vst4q_s8( __addr, __value)
> +#define vst4q_s16( __addr, __value) __arm_vst4q_s16( __addr, __value)
> +#define vst4q_s32( __addr, __value) __arm_vst4q_s32( __addr, __value)
> +#define vst4q_u8( __addr, __value) __arm_vst4q_u8( __addr, __value)
> +#define vst4q_u16( __addr, __value) __arm_vst4q_u16( __addr, __value)
> +#define vst4q_u32( __addr, __value) __arm_vst4q_u32( __addr, __value)
> +#define vst4q_f16( __addr, __value) __arm_vst4q_f16( __addr, __value)
> +#define vst4q_f32( __addr, __value) __arm_vst4q_f32( __addr, __value)
> +#endif
> +
> +__extension__ extern __inline void
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vst4q_s8 (int8_t * __addr, int8x16x4_t __value)
> +{
> +  union { int8x16x4_t __i; __builtin_neon_xi __o; } __rv;
> +  __rv.__i = __value;
> +  __builtin_mve_vst4qv16qi ((__builtin_neon_qi *) __addr, __rv.__o);
> +}
> +
> +__extension__ extern __inline void
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vst4q_s16 (int16_t * __addr, int16x8x4_t __value)
> +{
> +  union { int16x8x4_t __i; __builtin_neon_xi __o; } __rv;
> +  __rv.__i = __value;
> +  __builtin_mve_vst4qv8hi ((__builtin_neon_hi *) __addr, __rv.__o);
> +}
> +
> +__extension__ extern __inline void
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vst4q_s32 (int32_t * __addr, int32x4x4_t __value)
> +{
> +  union { int32x4x4_t __i; __builtin_neon_xi __o; } __rv;
> +  __rv.__i = __value;
> +  __builtin_mve_vst4qv4si ((__builtin_neon_si *) __addr, __rv.__o);
> +}
> +
> +__extension__ extern __inline void
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vst4q_u8 (uint8_t * __addr, uint8x16x4_t __value)
> +{
> +  union { uint8x16x4_t __i; __builtin_neon_xi __o; } __rv;
> +  __rv.__i = __value;
> +  __builtin_mve_vst4qv16qi ((__builtin_neon_qi *) __addr, __rv.__o);
> +}
> +
> +__extension__ extern __inline void
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vst4q_u16 (uint16_t * __addr, uint16x8x4_t __value)
> +{
> +  union { uint16x8x4_t __i; __builtin_neon_xi __o; } __rv;
> +  __rv.__i = __value;
> +  __builtin_mve_vst4qv8hi ((__builtin_neon_hi *) __addr, __rv.__o);
> +}
> +
> +__extension__ extern __inline void
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vst4q_u32 (uint32_t * __addr, uint32x4x4_t __value)
> +{
> +  union { uint32x4x4_t __i; __builtin_neon_xi __o; } __rv;
> +  __rv.__i = __value;
> +  __builtin_mve_vst4qv4si ((__builtin_neon_si *) __addr, __rv.__o);
> +}
> +
> +#if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
> +
> +__extension__ extern __inline void
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vst4q_f16 (float16_t * __addr, float16x8x4_t __value)
> +{
> +  union { float16x8x4_t __i; __builtin_neon_xi __o; } __rv;
> +  __rv.__i = __value;
> +  __builtin_mve_vst4qv8hf (__addr, __rv.__o);
> +}
> +
> +__extension__ extern __inline void
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vst4q_f32 (float32_t * __addr, float32x4x4_t __value)
> +{
> +  union { float32x4x4_t __i; __builtin_neon_xi __o; } __rv;
> +  __rv.__i = __value;
> +  __builtin_mve_vst4qv4sf (__addr, __rv.__o);
> +}
> +
> +#endif
> +
> +enum {
> +    __ARM_mve_type_float16_t = 1,
> +    __ARM_mve_type_float16_t_ptr,
> +    __ARM_mve_type_float16_t_const_ptr,
> +    __ARM_mve_type_float16x8_t,
> +    __ARM_mve_type_float16x8x2_t,
> +    __ARM_mve_type_float16x8x4_t,
> +    __ARM_mve_type_float32_t,
> +    __ARM_mve_type_float32_t_ptr,
> +    __ARM_mve_type_float32_t_const_ptr,
> +    __ARM_mve_type_float32x4_t,
> +    __ARM_mve_type_float32x4x2_t,
> +    __ARM_mve_type_float32x4x4_t,
> +    __ARM_mve_type_int16_t,
> +    __ARM_mve_type_int16_t_ptr,
> +    __ARM_mve_type_int16_t_const_ptr,
> +    __ARM_mve_type_int16x8_t,
> +    __ARM_mve_type_int16x8x2_t,
> +    __ARM_mve_type_int16x8x4_t,
> +    __ARM_mve_type_int32_t,
> +    __ARM_mve_type_int32_t_ptr,
> +    __ARM_mve_type_int32_t_const_ptr,
> +    __ARM_mve_type_int32x4_t,
> +    __ARM_mve_type_int32x4x2_t,
> +    __ARM_mve_type_int32x4x4_t,
> +    __ARM_mve_type_int64_t,
> +    __ARM_mve_type_int64_t_ptr,
> +    __ARM_mve_type_int64_t_const_ptr,
> +    __ARM_mve_type_int64x2_t,
> +    __ARM_mve_type_int8_t,
> +    __ARM_mve_type_int8_t_ptr,
> +    __ARM_mve_type_int8_t_const_ptr,
> +    __ARM_mve_type_int8x16_t,
> +    __ARM_mve_type_int8x16x2_t,
> +    __ARM_mve_type_int8x16x4_t,
> +    __ARM_mve_type_uint16_t,
> +    __ARM_mve_type_uint16_t_ptr,
> +    __ARM_mve_type_uint16_t_const_ptr,
> +    __ARM_mve_type_uint16x8_t,
> +    __ARM_mve_type_uint16x8x2_t,
> +    __ARM_mve_type_uint16x8x4_t,
> +    __ARM_mve_type_uint32_t,
> +    __ARM_mve_type_uint32_t_ptr,
> +    __ARM_mve_type_uint32_t_const_ptr,
> +    __ARM_mve_type_uint32x4_t,
> +    __ARM_mve_type_uint32x4x2_t,
> +    __ARM_mve_type_uint32x4x4_t,
> +    __ARM_mve_type_uint64_t,
> +    __ARM_mve_type_uint64_t_ptr,
> +    __ARM_mve_type_uint64_t_const_ptr,
> +    __ARM_mve_type_uint64x2_t,
> +    __ARM_mve_type_uint8_t,
> +    __ARM_mve_type_uint8_t_ptr,
> +    __ARM_mve_type_uint8_t_const_ptr,
> +    __ARM_mve_type_uint8x16_t,
> +    __ARM_mve_type_uint8x16x2_t,
> +    __ARM_mve_type_uint8x16x4_t,
> +    __ARM_mve_unsupported_type
> +};
> +
> +#if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
> +#define __ARM_mve_typeid(x) _Generic(x, \
> +    float16_t: __ARM_mve_type_float16_t, \
> +    float16_t *: __ARM_mve_type_float16_t_ptr, \
> +    float16_t const *: __ARM_mve_type_float16_t_const_ptr, \
> +    float16x8_t: __ARM_mve_type_float16x8_t, \
> +    float16x8x2_t: __ARM_mve_type_float16x8x2_t, \
> +    float16x8x4_t: __ARM_mve_type_float16x8x4_t, \
> +    float32_t: __ARM_mve_type_float32_t, \
> +    float32_t *: __ARM_mve_type_float32_t_ptr, \
> +    float32_t const *: __ARM_mve_type_float32_t_const_ptr, \
> +    float32x4_t: __ARM_mve_type_float32x4_t, \
> +    float32x4x2_t: __ARM_mve_type_float32x4x2_t, \
> +    float32x4x4_t: __ARM_mve_type_float32x4x4_t, \
> +    int16_t: __ARM_mve_type_int16_t, \
> +    int16_t *: __ARM_mve_type_int16_t_ptr, \
> +    int16_t const *: __ARM_mve_type_int16_t_const_ptr, \
> +    int16x8_t: __ARM_mve_type_int16x8_t, \
> +    int16x8x2_t: __ARM_mve_type_int16x8x2_t, \
> +    int16x8x4_t: __ARM_mve_type_int16x8x4_t, \
> +    int32_t: __ARM_mve_type_int32_t, \
> +    int32_t *: __ARM_mve_type_int32_t_ptr, \
> +    int32_t const *: __ARM_mve_type_int32_t_const_ptr, \
> +    int32x4_t: __ARM_mve_type_int32x4_t, \
> +    int32x4x2_t: __ARM_mve_type_int32x4x2_t, \
> +    int32x4x4_t: __ARM_mve_type_int32x4x4_t, \
> +    int64_t: __ARM_mve_type_int64_t, \
> +    int64_t *: __ARM_mve_type_int64_t_ptr, \
> +    int64_t const *: __ARM_mve_type_int64_t_const_ptr, \
> +    int64x2_t: __ARM_mve_type_int64x2_t, \
> +    int8_t: __ARM_mve_type_int8_t, \
> +    int8_t *: __ARM_mve_type_int8_t_ptr, \
> +    int8_t const *: __ARM_mve_type_int8_t_const_ptr, \
> +    int8x16_t: __ARM_mve_type_int8x16_t, \
> +    int8x16x2_t: __ARM_mve_type_int8x16x2_t, \
> +    int8x16x4_t: __ARM_mve_type_int8x16x4_t, \
> +    uint16_t: __ARM_mve_type_uint16_t, \
> +    uint16_t *: __ARM_mve_type_uint16_t_ptr, \
> +    uint16_t const *: __ARM_mve_type_uint16_t_const_ptr, \
> +    uint16x8_t: __ARM_mve_type_uint16x8_t, \
> +    uint16x8x2_t: __ARM_mve_type_uint16x8x2_t, \
> +    uint16x8x4_t: __ARM_mve_type_uint16x8x4_t, \
> +    uint32_t: __ARM_mve_type_uint32_t, \
> +    uint32_t *: __ARM_mve_type_uint32_t_ptr, \
> +    uint32_t const *: __ARM_mve_type_uint32_t_const_ptr, \
> +    uint32x4_t: __ARM_mve_type_uint32x4_t, \
> +    uint32x4x2_t: __ARM_mve_type_uint32x4x2_t, \
> +    uint32x4x4_t: __ARM_mve_type_uint32x4x4_t, \
> +    uint64_t: __ARM_mve_type_uint64_t, \
> +    uint64_t *: __ARM_mve_type_uint64_t_ptr, \
> +    uint64_t const *: __ARM_mve_type_uint64_t_const_ptr, \
> +    uint64x2_t: __ARM_mve_type_uint64x2_t, \
> +    uint8_t: __ARM_mve_type_uint8_t, \
> +    uint8_t *: __ARM_mve_type_uint8_t_ptr, \
> +    uint8_t const *: __ARM_mve_type_uint8_t_const_ptr, \
> +    uint8x16_t: __ARM_mve_type_uint8x16_t, \
> +    uint8x16x2_t: __ARM_mve_type_uint8x16x2_t, \
> +    uint8x16x4_t: __ARM_mve_type_uint8x16x4_t, \
> +    default: _Generic(x, \
> +       signed char: __ARM_mve_type_int8_t, \
> +       short: __ARM_mve_type_int16_t, \
> +       int: __ARM_mve_type_int32_t, \
> +       long: __ARM_mve_type_int32_t, \
> +       long long: __ARM_mve_type_int64_t, \
> +       unsigned char: __ARM_mve_type_uint8_t, \
> +       unsigned short: __ARM_mve_type_uint16_t, \
> +       unsigned int: __ARM_mve_type_uint32_t, \
> +       unsigned long: __ARM_mve_type_uint32_t, \
> +       unsigned long long: __ARM_mve_type_uint64_t, \
> +       default: __ARM_mve_unsupported_type))
> +#else
> +#define __ARM_mve_typeid(x) _Generic(x, \
> +    int16_t: __ARM_mve_type_int16_t, \
> +    int16_t *: __ARM_mve_type_int16_t_ptr, \
> +    int16_t const *: __ARM_mve_type_int16_t_const_ptr, \
> +    int16x8_t: __ARM_mve_type_int16x8_t, \
> +    int16x8x2_t: __ARM_mve_type_int16x8x2_t, \
> +    int16x8x4_t: __ARM_mve_type_int16x8x4_t, \
> +    int32_t: __ARM_mve_type_int32_t, \
> +    int32_t *: __ARM_mve_type_int32_t_ptr, \
> +    int32_t const *: __ARM_mve_type_int32_t_const_ptr, \
> +    int32x4_t: __ARM_mve_type_int32x4_t, \
> +    int32x4x2_t: __ARM_mve_type_int32x4x2_t, \
> +    int32x4x4_t: __ARM_mve_type_int32x4x4_t, \
> +    int64_t: __ARM_mve_type_int64_t, \
> +    int64_t *: __ARM_mve_type_int64_t_ptr, \
> +    int64_t const *: __ARM_mve_type_int64_t_const_ptr, \
> +    int64x2_t: __ARM_mve_type_int64x2_t, \
> +    int8_t: __ARM_mve_type_int8_t, \
> +    int8_t *: __ARM_mve_type_int8_t_ptr, \
> +    int8_t const *: __ARM_mve_type_int8_t_const_ptr, \
> +    int8x16_t: __ARM_mve_type_int8x16_t, \
> +    int8x16x2_t: __ARM_mve_type_int8x16x2_t, \
> +    int8x16x4_t: __ARM_mve_type_int8x16x4_t, \
> +    uint16_t: __ARM_mve_type_uint16_t, \
> +    uint16_t *: __ARM_mve_type_uint16_t_ptr, \
> +    uint16_t const *: __ARM_mve_type_uint16_t_const_ptr, \
> +    uint16x8_t: __ARM_mve_type_uint16x8_t, \
> +    uint16x8x2_t: __ARM_mve_type_uint16x8x2_t, \
> +    uint16x8x4_t: __ARM_mve_type_uint16x8x4_t, \
> +    uint32_t: __ARM_mve_type_uint32_t, \
> +    uint32_t *: __ARM_mve_type_uint32_t_ptr, \
> +    uint32_t const *: __ARM_mve_type_uint32_t_const_ptr, \
> +    uint32x4_t: __ARM_mve_type_uint32x4_t, \
> +    uint32x4x2_t: __ARM_mve_type_uint32x4x2_t, \
> +    uint32x4x4_t: __ARM_mve_type_uint32x4x4_t, \
> +    uint64_t: __ARM_mve_type_uint64_t, \
> +    uint64_t *: __ARM_mve_type_uint64_t_ptr, \
> +    uint64_t const *: __ARM_mve_type_uint64_t_const_ptr, \
> +    uint64x2_t: __ARM_mve_type_uint64x2_t, \
> +    uint8_t: __ARM_mve_type_uint8_t, \
> +    uint8_t *: __ARM_mve_type_uint8_t_ptr, \
> +    uint8_t const *: __ARM_mve_type_uint8_t_const_ptr, \
> +    uint8x16_t: __ARM_mve_type_uint8x16_t, \
> +    uint8x16x2_t: __ARM_mve_type_uint8x16x2_t, \
> +    uint8x16x4_t: __ARM_mve_type_uint8x16x4_t, \
> +    default: _Generic(x, \
> +       signed char: __ARM_mve_type_int8_t, \
> +       short: __ARM_mve_type_int16_t, \
> +       int: __ARM_mve_type_int32_t, \
> +       long: __ARM_mve_type_int32_t, \
> +       long long: __ARM_mve_type_int64_t, \
> +       unsigned char: __ARM_mve_type_uint8_t, \
> +       unsigned short: __ARM_mve_type_uint16_t, \
> +       unsigned int: __ARM_mve_type_uint32_t, \
> +       unsigned long: __ARM_mve_type_uint32_t, \
> +       unsigned long long: __ARM_mve_type_uint64_t, \
> +       default: __ARM_mve_unsupported_type))
> +#endif /* MVE Floating point.  */
> +
> +extern void *__ARM_undef;
> +#define __ARM_mve_coerce(param, type) \
> +    _Generic(param, type: param, default: *(type *)__ARM_undef)
> +
> +#if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
> +
> +#define vst4q(p0,p1) __arm_vst4q(p0,p1)
> +#define __arm_vst4q(p0,p1) ({ __typeof(p0) __p0 = (p0); \
> +  __typeof(p1) __p1 = (p1); \
> +  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> +  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16x4_t]: 
> __arm_vst4q_s8 (__ARM_mve_coerce(__p0, int8_t *), 
> __ARM_mve_coerce(__p1, int8x16x4_t)), \
> +  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8x4_t]: 
> __arm_vst4q_s16 (__ARM_mve_coerce(__p0, int16_t *), 
> __ARM_mve_coerce(__p1, int16x8x4_t)), \
> +  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4x4_t]: 
> __arm_vst4q_s32 (__ARM_mve_coerce(__p0, int32_t *), 
> __ARM_mve_coerce(__p1, int32x4x4_t)), \
> +  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16x4_t]: 
> __arm_vst4q_u8 (__ARM_mve_coerce(__p0, uint8_t *), 
> __ARM_mve_coerce(__p1, uint8x16x4_t)), \
> +  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8x4_t]: 
> __arm_vst4q_u16 (__ARM_mve_coerce(__p0, uint16_t *), 
> __ARM_mve_coerce(__p1, uint16x8x4_t)), \
> +  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4x4_t]: 
> __arm_vst4q_u32 (__ARM_mve_coerce(__p0, uint32_t *), 
> __ARM_mve_coerce(__p1, uint32x4x4_t)), \
> +  int 
> (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_float16x8x4_t]: 
> __arm_vst4q_f16 (__ARM_mve_coerce(__p0, float16_t *), 
> __ARM_mve_coerce(__p1, float16x8x4_t)), \
> +  int 
> (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4x4_t]: 
> __arm_vst4q_f32 (__ARM_mve_coerce(__p0, float32_t *), 
> __ARM_mve_coerce(__p1, float32x4x4_t)));})
> +
> +#else /* MVE Interger.  */
> +
> +#define vst4q(p0,p1) __arm_vst4q(p0,p1)
> +#define __arm_vst4q(p0,p1) ({ __typeof(p0) __p0 = (p0); \
> +  __typeof(p1) __p1 = (p1); \
> +  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> +  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16x4_t]: 
> __arm_vst4q_s8 (__ARM_mve_coerce(__p0, int8_t *), 
> __ARM_mve_coerce(__p1, int8x16x4_t)), \
> +  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8x4_t]: 
> __arm_vst4q_s16 (__ARM_mve_coerce(__p0, int16_t *), 
> __ARM_mve_coerce(__p1, int16x8x4_t)), \
> +  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4x4_t]: 
> __arm_vst4q_s32 (__ARM_mve_coerce(__p0, int32_t *), 
> __ARM_mve_coerce(__p1, int32x4x4_t)), \
> +  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16x4_t]: 
> __arm_vst4q_u8 (__ARM_mve_coerce(__p0, uint8_t *), 
> __ARM_mve_coerce(__p1, uint8x16x4_t)), \
> +  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8x4_t]: 
> __arm_vst4q_u16 (__ARM_mve_coerce(__p0, uint16_t *), 
> __ARM_mve_coerce(__p1, uint16x8x4_t)), \
> +  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4x4_t]: 
> __arm_vst4q_u32 (__ARM_mve_coerce(__p0, uint32_t *), 
> __ARM_mve_coerce(__p1, uint32x4x4_t)));})
> +
> +#endif /* MVE Floating point.  */
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/gcc/config/arm/arm_mve_builtins.def 
> b/gcc/config/arm/arm_mve_builtins.def
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..c7f3546c9cb0ec3ae794e181e533a1cbed38121c
> --- /dev/null
> +++ b/gcc/config/arm/arm_mve_builtins.def
> @@ -0,0 +1,21 @@
> +/*  MVE builtin definitions for Arm.
> +    Copyright  (C) 2019 Free Software Foundation, Inc.
> +    Contributed by Arm.
> +
> +    This file is part of GCC.
> +
> +    GCC is free software; you can redistribute it and/or modify it
> +    under the terms of the GNU General Public License as published
> +    by the Free Software Foundation; either version 3, or (at your
> +    option) any later version.
> +
> +    GCC is distributed in the hope that it will be useful, but WITHOUT
> +    ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> +    or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> +    License for more details.
> +
> +    You should have received a copy of the GNU General Public License
> +    along with GCC; see the file COPYING3.  If not see
> +    <http://www.gnu.org/licenses/>. */
> +
> +VAR5 (STORE1, vst4q, v16qi, v8hi, v4si, v8hf, v4sf)
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 
> 53334c6d329dedd482615b996232e85ded7a34f8..a2f5220eccb6d888b9b0b1bbd1054cc3a6be97f5 
> 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -17,9 +17,12 @@
>  ;; along with GCC; see the file COPYING3.  If not see
>  ;; <http://www.gnu.org/licenses/>.
>
> -(define_mode_iterator MVE_types [V16QI V8HI V4SI V2DI TI V8HF V4SF V2DF])
>  (define_mode_attr V_sz_elem2 [(V16QI "s8") (V8HI "u16") (V4SI "u32")
>                                (V2DI "u64")])
> +(define_mode_iterator MVE_types [V16QI V8HI V4SI V2DI TI V8HF V4SF V2DF])
> +(define_mode_iterator MVE_VLD_ST [V16QI V8HI V4SI V8HF V4SF])
> +
> +(define_c_enum "unspec" [VST4Q])
>
>  (define_insn "*mve_mov<mode>"
>    [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
> @@ -76,3 +79,37 @@
>    "TARGET_HAVE_MVE"
>    "vstrb.<V_sz_elem> %q1, %E0"
>    [(set_attr "type" "mve_store")])
> +
> +;;
> +;; [vst4q])
> +;;
> +(define_insn "mve_vst4q<mode>"
> +  [(set (match_operand:XI 0 "neon_struct_operand" "=Um")
> +       (unspec:XI [(match_operand:XI 1 "s_register_operand" "w")
> +                   (unspec:MVE_VLD_ST [(const_int 0)] 
> UNSPEC_VSTRUCTDUMMY)]
> +        VST4Q))
> +  ]
> +  "TARGET_HAVE_MVE"
> +{
> +   rtx ops[6];
> +   int regno = REGNO (operands[1]);
> +   ops[0] = gen_rtx_REG (TImode, regno);
> +   ops[1] = gen_rtx_REG (TImode, regno+4);
> +   ops[2] = gen_rtx_REG (TImode, regno+8);
> +   ops[3] = gen_rtx_REG (TImode, regno+12);
> +   rtx reg  = operands[0];
> +   while (reg && !REG_P (reg))
> +    reg = XEXP (reg, 0);
> +   gcc_assert (REG_P (reg));
> +   ops[4] = reg;
> +   ops[5] = operands[0];
> +   /* Here in first three instructions data is stored to ops[4]'s 
> location but
> +      in the fourth instruction data is stored to operands[0], this is to
> +      support the writeback.  */
> +   output_asm_insn ("vst40.<V_sz_elem>\t{%q0, %q1, %q2, %q3}, [%4]\n\t"
> +                   "vst41.<V_sz_elem>\t{%q0, %q1, %q2, %q3}, [%4]\n\t"
> +                   "vst42.<V_sz_elem>\t{%q0, %q1, %q2, %q3}, [%4]\n\t"
> +                   "vst43.<V_sz_elem>\t{%q0, %q1, %q2, %q3}, %5", ops);
> +   return "";
> +}
> +  [(set_attr "length" "16")])
> diff --git a/gcc/config/arm/t-arm b/gcc/config/arm/t-arm
> index 
> fda5e84355b56a20eb9a22919ab1c786120cc8f1..fbd99197b31ee4b69b7a06592123b6facd04f9af 
> 100644
> --- a/gcc/config/arm/t-arm
> +++ b/gcc/config/arm/t-arm
> @@ -135,7 +135,8 @@ arm.o: $(srcdir)/config/arm/arm.c $(CONFIG_H) 
> $(SYSTEM_H) coretypes.h $(TM_H) \
>    arm-cpu-data.h \
>    $(srcdir)/config/arm/arm-protos.h \
>    $(srcdir)/config/arm/arm_neon_builtins.def \
> -  $(srcdir)/config/arm/arm_vfp_builtins.def
> +  $(srcdir)/config/arm/arm_vfp_builtins.def \
> +  $(srcdir)/config/arm/arm_mve_builtins.def
>
>  arm-builtins.o: $(srcdir)/config/arm/arm-builtins.c $(CONFIG_H) \
>    $(SYSTEM_H) coretypes.h $(TM_H) \
> @@ -145,6 +146,7 @@ arm-builtins.o: 
> $(srcdir)/config/arm/arm-builtins.c $(CONFIG_H) \
>    $(srcdir)/config/arm/arm_acle_builtins.def \
>    $(srcdir)/config/arm/arm_neon_builtins.def \
>    $(srcdir)/config/arm/arm_vfp_builtins.def \
> +  $(srcdir)/config/arm/arm_mve_builtins.def \
>    $(srcdir)/config/arm/arm-simd-builtin-types.def
>          $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
>                  $(srcdir)/config/arm/arm-builtins.c
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_f16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_f16.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..6f8ea17345b85983202613b0f481b23e0532fd16
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_f16.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +void
> +foo (float16_t * addr, float16x8x4_t value)
> +{
> +  vst4q_f16 (addr, value);
> +}
> +
> +/* { dg-final { scan-assembler "vst40.16"  }  } */
> +/* { dg-final { scan-assembler "vst41.16"  }  } */
> +/* { dg-final { scan-assembler "vst42.16"  }  } */
> +/* { dg-final { scan-assembler "vst43.16"  }  } */
> +
> +void
> +foo1 (float16_t * addr, float16x8x4_t value)
> +{
> +  vst4q (addr, value);
> +}
> +
> +/* { dg-final { scan-assembler "vst40.16"  }  } */
> +/* { dg-final { scan-assembler "vst41.16"  }  } */
> +/* { dg-final { scan-assembler "vst42.16"  }  } */
> +/* { dg-final { scan-assembler "vst43.16"  }  } */
> +
> +void
> +foo2 (float16_t * addr, float16x8x4_t value)
> +{
> +  vst4q_f16 (addr, value);
> +  addr += 32;
> +  vst4q_f16 (addr, value);
> +}
> +
> +/* { dg-final { scan-assembler {vst43.16\s\{.*\}, \[.*\]!} }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_f32.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_f32.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..03b1dbb5b13e2c41013c772d944f29da8c2468f7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_f32.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +void
> +foo (float32_t * addr, float32x4x4_t value)
> +{
> +  vst4q_f32 (addr, value);
> +}
> +
> +/* { dg-final { scan-assembler "vst40.32"  }  } */
> +/* { dg-final { scan-assembler "vst41.32"  }  } */
> +/* { dg-final { scan-assembler "vst42.32"  }  } */
> +/* { dg-final { scan-assembler "vst43.32"  }  } */
> +
> +void
> +foo1 (float32_t * addr, float32x4x4_t value)
> +{
> +  vst4q (addr, value);
> +}
> +
> +/* { dg-final { scan-assembler "vst40.32"  }  } */
> +/* { dg-final { scan-assembler "vst41.32"  }  } */
> +/* { dg-final { scan-assembler "vst42.32"  }  } */
> +/* { dg-final { scan-assembler "vst43.32"  }  } */
> +
> +void
> +foo2 (float32_t * addr, float32x4x4_t value)
> +{
> +  vst4q_f32 (addr, value);
> +  addr += 16;
> +  vst4q_f32 (addr, value);
> +}
> +
> +/* { dg-final { scan-assembler {vst43.32\s\{.*\}, \[.*\]!} }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_s16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_s16.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..05f53fc9175b41e4d16c7a6bc525b57b626a632d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_s16.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +void
> +foo (int16_t * addr, int16x8x4_t value)
> +{
> +  vst4q_s16 (addr, value);
> +}
> +
> +/* { dg-final { scan-assembler "vst40.16"  }  } */
> +/* { dg-final { scan-assembler "vst41.16"  }  } */
> +/* { dg-final { scan-assembler "vst42.16"  }  } */
> +/* { dg-final { scan-assembler "vst43.16"  }  } */
> +
> +void
> +foo1 (int16_t * addr, int16x8x4_t value)
> +{
> +  vst4q (addr, value);
> +}
> +
> +/* { dg-final { scan-assembler "vst40.16"  }  } */
> +/* { dg-final { scan-assembler "vst41.16"  }  } */
> +/* { dg-final { scan-assembler "vst42.16"  }  } */
> +/* { dg-final { scan-assembler "vst43.16"  }  } */
> +
> +void
> +foo2 (int16_t * addr, int16x8x4_t value)
> +{
> +  vst4q_s16 (addr, value);
> +  addr += 32;
> +  vst4q_s16 (addr, value);
> +}
> +
> +/* { dg-final { scan-assembler {vst43.16\s\{.*\}, \[.*\]!} }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_s32.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_s32.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..9da2addb8c3d9566722b640f66bbee4f2b4b563d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_s32.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +void
> +foo (int32_t * addr, int32x4x4_t value)
> +{
> +  vst4q_s32 (addr, value);
> +}
> +
> +/* { dg-final { scan-assembler "vst40.32"  }  } */
> +/* { dg-final { scan-assembler "vst41.32"  }  } */
> +/* { dg-final { scan-assembler "vst42.32"  }  } */
> +/* { dg-final { scan-assembler "vst43.32"  }  } */
> +
> +void
> +foo1 (int32_t * addr, int32x4x4_t value)
> +{
> +  vst4q (addr, value);
> +}
> +
> +/* { dg-final { scan-assembler "vst40.32"  }  } */
> +/* { dg-final { scan-assembler "vst41.32"  }  } */
> +/* { dg-final { scan-assembler "vst42.32"  }  } */
> +/* { dg-final { scan-assembler "vst43.32"  }  } */
> +
> +void
> +foo2 (int32_t * addr, int32x4x4_t value)
> +{
> +  vst4q_s32 (addr, value);
> +  addr += 16;
> +  vst4q_s32 (addr, value);
> +}
> +
> +/* { dg-final { scan-assembler {vst43.32\s\{.*\}, \[.*\]!} }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_s8.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_s8.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..1243d135e1628de14596a58f24a79f45299ce517
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_s8.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +void
> +foo (int8_t * addr, int8x16x4_t value)
> +{
> +  vst4q_s8 (addr, value);
> +}
> +
> +/* { dg-final { scan-assembler "vst40.8"  }  } */
> +/* { dg-final { scan-assembler "vst41.8"  }  } */
> +/* { dg-final { scan-assembler "vst42.8"  }  } */
> +/* { dg-final { scan-assembler "vst43.8"  }  } */
> +
> +void
> +foo1 (int8_t * addr, int8x16x4_t value)
> +{
> +  vst4q (addr, value);
> +}
> +
> +/* { dg-final { scan-assembler "vst40.8"  }  } */
> +/* { dg-final { scan-assembler "vst41.8"  }  } */
> +/* { dg-final { scan-assembler "vst42.8"  }  } */
> +/* { dg-final { scan-assembler "vst43.8"  }  } */
> +
> +void
> +foo2 (int8_t * addr, int8x16x4_t value)
> +{
> +  vst4q_s8 (addr, value);
> +  addr += 16*4;
> +  vst4q_s8 (addr, value);
> +}
> +
> +/* { dg-final { scan-assembler {vst43.8\s\{.*\}, \[.*\]!} }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_u16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_u16.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..09097e7914ed08dda5e7616bafe35cdb30486f21
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_u16.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +void
> +foo (uint16_t * addr, uint16x8x4_t value)
> +{
> +  vst4q_u16 (addr, value);
> +}
> +
> +/* { dg-final { scan-assembler "vst40.16"  }  } */
> +/* { dg-final { scan-assembler "vst41.16"  }  } */
> +/* { dg-final { scan-assembler "vst42.16"  }  } */
> +/* { dg-final { scan-assembler "vst43.16"  }  } */
> +
> +void
> +foo1 (uint16_t * addr, uint16x8x4_t value)
> +{
> +  vst4q (addr, value);
> +}
> +
> +/* { dg-final { scan-assembler "vst40.16"  }  } */
> +/* { dg-final { scan-assembler "vst41.16"  }  } */
> +/* { dg-final { scan-assembler "vst42.16"  }  } */
> +/* { dg-final { scan-assembler "vst43.16"  }  } */
> +
> +void
> +foo2 (uint16_t * addr, uint16x8x4_t value)
> +{
> +  vst4q_u16 (addr, value);
> +  addr += 32;
> +  vst4q_u16 (addr, value);
> +}
> +
> +/* { dg-final { scan-assembler {vst43.16\s\{.*\}, \[.*\]!} }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_u32.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_u32.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..dcb0a10f9f21e4adb70d18b590fcc7b663547f16
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_u32.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +void
> +foo (uint32_t * addr, uint32x4x4_t value)
> +{
> +  vst4q_u32 (addr, value);
> +}
> +
> +/* { dg-final { scan-assembler "vst40.32"  }  } */
> +/* { dg-final { scan-assembler "vst41.32"  }  } */
> +/* { dg-final { scan-assembler "vst42.32"  }  } */
> +/* { dg-final { scan-assembler "vst43.32"  }  } */
> +
> +void
> +foo1 (uint32_t * addr, uint32x4x4_t value)
> +{
> +  vst4q (addr, value);
> +}
> +
> +/* { dg-final { scan-assembler "vst40.32"  }  } */
> +/* { dg-final { scan-assembler "vst41.32"  }  } */
> +/* { dg-final { scan-assembler "vst42.32"  }  } */
> +/* { dg-final { scan-assembler "vst43.32"  }  } */
> +
> +void
> +foo2 (uint32_t * addr, uint32x4x4_t value)
> +{
> +  vst4q_u32 (addr, value);
> +  addr += 16;
> +  vst4q_u32 (addr, value);
> +}
> +
> +/* { dg-final { scan-assembler {vst43.32\s\{.*\}, \[.*\]!} }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_u8.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_u8.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..b18f57dd6356c2fdf39cbf14ca85373c98f1c62c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst4q_u8.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +void
> +foo (uint8_t * addr, uint8x16x4_t value)
> +{
> +  vst4q_u8 (addr, value);
> +}
> +
> +/* { dg-final { scan-assembler "vst40.8"  }  } */
> +/* { dg-final { scan-assembler "vst41.8"  }  } */
> +/* { dg-final { scan-assembler "vst42.8"  }  } */
> +/* { dg-final { scan-assembler "vst43.8"  }  } */
> +
> +void
> +foo1 (uint8_t * addr, uint8x16x4_t value)
> +{
> +  vst4q (addr, value);
> +}
> +
> +/* { dg-final { scan-assembler "vst40.8"  }  } */
> +/* { dg-final { scan-assembler "vst41.8"  }  } */
> +/* { dg-final { scan-assembler "vst42.8"  }  } */
> +/* { dg-final { scan-assembler "vst43.8"  }  } */
> +
> +void
> +foo2 (uint8_t * addr, uint8x16x4_t value)
> +{
> +  vst4q_u8 (addr, value);
> +  addr += 16*4;
> +  vst4q_u8 (addr, value);
> +}
> +
> +/* { dg-final { scan-assembler {vst43.8\s\{.*\}, \[.*\]!} }  } */
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH][ARM][GCC][3/x]: MVE ACLE intrinsics framework patch.
  2019-11-14 19:13 ` [PATCH][ARM][GCC][3/x]: MVE ACLE intrinsics framework patch Srinath Parvathaneni
@ 2019-12-19 17:50   ` Kyrill Tkachov
  0 siblings, 0 replies; 48+ messages in thread
From: Kyrill Tkachov @ 2019-12-19 17:50 UTC (permalink / raw)
  To: Srinath Parvathaneni, gcc-patches; +Cc: Richard Earnshaw

Hi Srinath,

On 11/14/19 7:12 PM, Srinath Parvathaneni wrote:
> Hello,
>
> This patch is part of MVE ACLE intrinsics framework.
>
> The patch supports the use of emulation for the double-precision 
> arithmetic
> operations for MVE. This changes are to support the MVE ACLE 
> intrinsics which
> operates on vector floating point arithmetic operations.
>
> Please refer to Arm reference manual [1] for more details.
> [1] 
> https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf?_ga=2.102521798.659307368.1572453718-1501600630.1548848914
>
> Regression tested on arm-none-eabi and found no regressions.
>
> Ok for trunk?
>
> Thanks,
> Srinath.
>
> gcc/ChangeLog:
>
> 2019-11-11  Andre Vieira <andre.simoesdiasvieira@arm.com>
>             Srinath Parvathaneni <srinath.parvathaneni@arm.com>
>
>         * config/arm/arm.c (arm_libcall_uses_aapcs_base): Modify 
> function to add
>         emulator calls for dobule precision arithmetic operations for MVE.


I'm a bit confused by the changelog and the comment in the patch....


>
>
> ###############     Attachment also inlined for ease of reply    
> ###############
>
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 
> 6faed76206b93c1a9dea048e2f693dc16ee58072..358b2638b65a2007d1c7e8062844b67682597f45 
> 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -5658,9 +5658,25 @@ arm_libcall_uses_aapcs_base (const_rtx libcall)
>        /* Values from double-precision helper functions are returned 
> in core
>           registers if the selected core only supports single-precision
>           arithmetic, even if we are using the hard-float ABI.  The 
> same is
> -        true for single-precision helpers, but we will never be using the
> -        hard-float ABI on a CPU which doesn't support single-precision
> -        operations in hardware.  */
> +        true for single-precision helpers except in case of MVE, 
> because in
> +        MVE we will be using the hard-float ABI on a CPU which 
> doesn't support
> +        single-precision operations in hardware.  In MVE the 
> following check
> +        enables use of emulation for the double-precision arithmetic
> +        operations.  */
> +      if (TARGET_HAVE_MVE)
> +       {
> +         add_libcall (libcall_htab, optab_libfunc (add_optab, SFmode));
> +         add_libcall (libcall_htab, optab_libfunc (sdiv_optab, SFmode));
> +         add_libcall (libcall_htab, optab_libfunc (smul_optab, SFmode));
> +         add_libcall (libcall_htab, optab_libfunc (neg_optab, SFmode));
> +         add_libcall (libcall_htab, optab_libfunc (sub_optab, SFmode));
> +         add_libcall (libcall_htab, optab_libfunc (eq_optab, SFmode));
> +         add_libcall (libcall_htab, optab_libfunc (lt_optab, SFmode));
> +         add_libcall (libcall_htab, optab_libfunc (le_optab, SFmode));
> +         add_libcall (libcall_htab, optab_libfunc (ge_optab, SFmode));
> +         add_libcall (libcall_htab, optab_libfunc (gt_optab, SFmode));
> +         add_libcall (libcall_htab, optab_libfunc (unord_optab, SFmode));
> +       }

... this adds emulation for SFmode but you say you want double-precision 
emulation?

Can you demonstrate what this patch wants to achieve with a testcase?

Thanks,

Kyrill




>        add_libcall (libcall_htab, optab_libfunc (add_optab, DFmode));
>        add_libcall (libcall_htab, optab_libfunc (sdiv_optab, DFmode));
>        add_libcall (libcall_htab, optab_libfunc (smul_optab, DFmode));
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH][ARM][GCC][1/1x]: Patch to support MVE ACLE intrinsics with unary operand.
  2019-11-14 19:16 ` [PATCH][ARM][GCC][1/1x]: Patch to support MVE ACLE intrinsics with unary operand Srinath Parvathaneni
@ 2019-12-19 17:57   ` Kyrill Tkachov
  0 siblings, 0 replies; 48+ messages in thread
From: Kyrill Tkachov @ 2019-12-19 17:57 UTC (permalink / raw)
  To: Srinath Parvathaneni, gcc-patches; +Cc: Richard Earnshaw

Hi Srinath,

On 11/14/19 7:12 PM, Srinath Parvathaneni wrote:
> Hello,
>
> This patch supports MVE ACLE intrinsics vcvtq_f16_s16, vcvtq_f32_s32, 
> vcvtq_f16_u16, vcvtq_f32_u32n
> vrndxq_f16, vrndxq_f32, vrndq_f16, vrndq_f32, vrndpq_f16, vrndpq_f32, 
> vrndnq_f16, vrndnq_f32,
> vrndmq_f16, vrndmq_f32, vrndaq_f16, vrndaq_f32, vrev64q_f16, 
> vrev64q_f32, vnegq_f16, vnegq_f32,
> vdupq_n_f16, vdupq_n_f32, vabsq_f16, vabsq_f32, vrev32q_f16, 
> vcvttq_f32_f16, vcvtbq_f32_f16.
>
>
> Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for 
> more details.
> [1] 
> https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics
>
> Regression tested on arm-none-eabi and found no regressions.
>
> Ok for trunk?
>
> Thanks,
> Srinath.
>
> gcc/ChangeLog:
>
> 2019-10-17  Andre Vieira <andre.simoesdiasvieira@arm.com>
>             Mihail Ionescu  <mihail.ionescu@arm.com>
>             Srinath Parvathaneni <srinath.parvathaneni@arm.com>
>
>         * config/arm/arm-builtins.c (UNOP_NONE_NONE_QUALIFIERS): 
> Define macro.
>         (UNOP_NONE_SNONE_QUALIFIERS): Likewise.
>         (UNOP_NONE_UNONE_QUALIFIERS): Likewise.
>         * config/arm/arm_mve.h (vrndxq_f16): Define macro.
>         (vrndxq_f32): Likewise.
>         (vrndq_f16) Likewise.:
>         (vrndq_f32): Likewise.
>         (vrndpq_f16): Likewise.
>         (vrndpq_f32): Likewise.
>         (vrndnq_f16): Likewise.
>         (vrndnq_f32): Likewise.
>         (vrndmq_f16): Likewise.
>         (vrndmq_f32): Likewise.
>         (vrndaq_f16): Likewise.
>         (vrndaq_f32): Likewise.
>         (vrev64q_f16): Likewise.
>         (vrev64q_f32): Likewise.
>         (vnegq_f16): Likewise.
>         (vnegq_f32): Likewise.
>         (vdupq_n_f16): Likewise.
>         (vdupq_n_f32): Likewise.
>         (vabsq_f16): Likewise.
>         (vabsq_f32): Likewise.
>         (vrev32q_f16): Likewise.
>         (vcvttq_f32_f16): Likewise.
>         (vcvtbq_f32_f16): Likewise.
>         (vcvtq_f16_s16): Likewise.
>         (vcvtq_f32_s32): Likewise.
>         (vcvtq_f16_u16): Likewise.
>         (vcvtq_f32_u32): Likewise.
>         (__arm_vrndxq_f16): Define intrinsic.
>         (__arm_vrndxq_f32): Likewise.
>         (__arm_vrndq_f16): Likewise.
>         (__arm_vrndq_f32): Likewise.
>         (__arm_vrndpq_f16): Likewise.
>         (__arm_vrndpq_f32): Likewise.
>         (__arm_vrndnq_f16): Likewise.
>         (__arm_vrndnq_f32): Likewise.
>         (__arm_vrndmq_f16): Likewise.
>         (__arm_vrndmq_f32): Likewise.
>         (__arm_vrndaq_f16): Likewise.
>         (__arm_vrndaq_f32): Likewise.
>         (__arm_vrev64q_f16): Likewise.
>         (__arm_vrev64q_f32): Likewise.
>         (__arm_vnegq_f16): Likewise.
>         (__arm_vnegq_f32): Likewise.
>         (__arm_vdupq_n_f16): Likewise.
>         (__arm_vdupq_n_f32): Likewise.
>         (__arm_vabsq_f16): Likewise.
>         (__arm_vabsq_f32): Likewise.
>         (__arm_vrev32q_f16): Likewise.
>         (__arm_vcvttq_f32_f16): Likewise.
>         (__arm_vcvtbq_f32_f16): Likewise.
>         (__arm_vcvtq_f16_s16): Likewise.
>         (__arm_vcvtq_f32_s32): Likewise.
>         (__arm_vcvtq_f16_u16): Likewise.
>         (__arm_vcvtq_f32_u32): Likewise.
>         (vrndxq): Define polymorphic variants.
>         (vrndq): Likewise.
>         (vrndpq): Likewise.
>         (vrndnq): Likewise.
>         (vrndmq): Likewise.
>         (vrndaq): Likewise.
>         (vrev64q): Likewise.
>         (vnegq): Likewise.
>         (vabsq): Likewise.
>         (vrev32q): Likewise.
>         (vcvtbq_f32): Likewise.
>         (vcvttq_f32): Likewise.
>         (vcvtq): Likewise.
>         * config/arm/arm_mve_builtins.def (VAR2): Define.
>         (VAR1): Define.
>         * config/arm/mve.md (mve_vrndxq_f<mode>): Add RTL pattern.
>         (mve_vrndq_f<mode>): Likewise.
>         (mve_vrndpq_f<mode>): Likewise.
>         (mve_vrndnq_f<mode>): Likewise.
>         (mve_vrndmq_f<mode>): Likewise.
>         (mve_vrndaq_f<mode>): Likewise.
>         (mve_vrev64q_f<mode>): Likewise.
>         (mve_vnegq_f<mode>): Likewise.
>         (mve_vdupq_n_f<mode>): Likewise.
>         (mve_vabsq_f<mode>): Likewise.
>         (mve_vrev32q_fv8hf): Likewise.
>         (mve_vcvttq_f32_f16v4sf): Likewise.
>         (mve_vcvtbq_f32_f16v4sf): Likewise.
>         (mve_vcvtq_to_f_<supf><mode>): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 2019-10-17  Andre Vieira <andre.simoesdiasvieira@arm.com>
>             Mihail Ionescu  <mihail.ionescu@arm.com>
>             Srinath Parvathaneni <srinath.parvathaneni@arm.com>
>
>         * gcc.target/arm/mve/intrinsics/vabsq_f16.c: New test.
>         * gcc.target/arm/mve/intrinsics/vabsq_f32.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vcvtbq_f32_f16.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vcvtq_f16_s16.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vcvtq_f16_u16.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vcvtq_f32_s32.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vcvtq_f32_u32.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vcvttq_f32_f16.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vdupq_n_f16.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vdupq_n_f32.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vnegq_f16.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vnegq_f32.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vrev32q_f16.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vrev64q_f16.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vrev64q_f32.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vrndaq_f16.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vrndaq_f32.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vrndmq_f16.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vrndmq_f32.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vrndnq_f16.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vrndnq_f32.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vrndpq_f16.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vrndpq_f32.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vrndq_f16.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vrndq_f32.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vrndxq_f16.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vrndxq_f32.c: Likewise.
>
>
> ###############     Attachment also inlined for ease of reply    
> ###############
>
>
> diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
> index 
> a9f76971ef310118bf7edea6a8dd3de1da46b46b..2fee417fe6585f457edd4cf96655366b1d6bd1a0 
> 100644
> --- a/gcc/config/arm/arm-builtins.c
> +++ b/gcc/config/arm/arm-builtins.c
> @@ -293,6 +293,28 @@ arm_store1_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>    = { qualifier_void, qualifier_pointer_map_mode, qualifier_none };
>  #define STORE1_QUALIFIERS (arm_store1_qualifiers)
>
> +/* Qualifiers for MVE builtins.  */
> +
> +static enum arm_type_qualifiers
> +arm_unop_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
> +  = { qualifier_none, qualifier_none };
> +#define UNOP_NONE_NONE_QUALIFIERS \
> +  (arm_unop_none_none_qualifiers)
> +
> +static enum arm_type_qualifiers
> +arm_unop_none_snone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
> +  = { qualifier_none, qualifier_none };
> +#define UNOP_NONE_SNONE_QUALIFIERS \
> +  (arm_unop_none_snone_qualifiers)
> +
> +static enum arm_type_qualifiers
> +arm_unop_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
> +  = { qualifier_none, qualifier_unsigned };
> +#define UNOP_NONE_UNONE_QUALIFIERS \
> +  (arm_unop_none_unone_qualifiers)
> +
> +/* End of Qualifier for MVE builtins.  */
> +
>     /* void ([T element type] *, T, immediate).  */
>  static enum arm_type_qualifiers
>  arm_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index 
> 39c6a1551a72700292dde8ef6cea44ba0907af8d..9bcb04fb99a54b47057bb33cc807d6a5ad16401f 
> 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -81,6 +81,33 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
>  #define vst4q_u32( __addr, __value) __arm_vst4q_u32( __addr, __value)
>  #define vst4q_f16( __addr, __value) __arm_vst4q_f16( __addr, __value)
>  #define vst4q_f32( __addr, __value) __arm_vst4q_f32( __addr, __value)
> +#define vrndxq_f16(__a) __arm_vrndxq_f16(__a)
> +#define vrndxq_f32(__a) __arm_vrndxq_f32(__a)
> +#define vrndq_f16(__a) __arm_vrndq_f16(__a)
> +#define vrndq_f32(__a) __arm_vrndq_f32(__a)
> +#define vrndpq_f16(__a) __arm_vrndpq_f16(__a)
> +#define vrndpq_f32(__a) __arm_vrndpq_f32(__a)
> +#define vrndnq_f16(__a) __arm_vrndnq_f16(__a)
> +#define vrndnq_f32(__a) __arm_vrndnq_f32(__a)
> +#define vrndmq_f16(__a) __arm_vrndmq_f16(__a)
> +#define vrndmq_f32(__a) __arm_vrndmq_f32(__a)
> +#define vrndaq_f16(__a) __arm_vrndaq_f16(__a)
> +#define vrndaq_f32(__a) __arm_vrndaq_f32(__a)
> +#define vrev64q_f16(__a) __arm_vrev64q_f16(__a)
> +#define vrev64q_f32(__a) __arm_vrev64q_f32(__a)
> +#define vnegq_f16(__a) __arm_vnegq_f16(__a)
> +#define vnegq_f32(__a) __arm_vnegq_f32(__a)
> +#define vdupq_n_f16(__a) __arm_vdupq_n_f16(__a)
> +#define vdupq_n_f32(__a) __arm_vdupq_n_f32(__a)
> +#define vabsq_f16(__a) __arm_vabsq_f16(__a)
> +#define vabsq_f32(__a) __arm_vabsq_f32(__a)
> +#define vrev32q_f16(__a) __arm_vrev32q_f16(__a)
> +#define vcvttq_f32_f16(__a) __arm_vcvttq_f32_f16(__a)
> +#define vcvtbq_f32_f16(__a) __arm_vcvtbq_f32_f16(__a)
> +#define vcvtq_f16_s16(__a) __arm_vcvtq_f16_s16(__a)
> +#define vcvtq_f32_s32(__a) __arm_vcvtq_f32_s32(__a)
> +#define vcvtq_f16_u16(__a) __arm_vcvtq_f16_u16(__a)
> +#define vcvtq_f32_u32(__a) __arm_vcvtq_f32_u32(__a)
>  #endif
>
>  __extension__ extern __inline void
> @@ -157,6 +184,195 @@ __arm_vst4q_f32 (float32_t * __addr, 
> float32x4x4_t __value)
>    __builtin_mve_vst4qv4sf (__addr, __rv.__o);
>  }
>
> +__extension__ extern __inline float16x8_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vrndxq_f16 (float16x8_t __a)
> +{
> +  return __builtin_mve_vrndxq_fv8hf (__a);
> +}
> +
> +__extension__ extern __inline float32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vrndxq_f32 (float32x4_t __a)
> +{
> +  return __builtin_mve_vrndxq_fv4sf (__a);
> +}
> +
> +__extension__ extern __inline float16x8_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vrndq_f16 (float16x8_t __a)
> +{
> +  return __builtin_mve_vrndq_fv8hf (__a);
> +}
> +
> +__extension__ extern __inline float32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vrndq_f32 (float32x4_t __a)
> +{
> +  return __builtin_mve_vrndq_fv4sf (__a);
> +}
> +
> +__extension__ extern __inline float16x8_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vrndpq_f16 (float16x8_t __a)
> +{
> +  return __builtin_mve_vrndpq_fv8hf (__a);
> +}
> +
> +__extension__ extern __inline float32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vrndpq_f32 (float32x4_t __a)
> +{
> +  return __builtin_mve_vrndpq_fv4sf (__a);
> +}
> +
> +__extension__ extern __inline float16x8_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vrndnq_f16 (float16x8_t __a)
> +{
> +  return __builtin_mve_vrndnq_fv8hf (__a);
> +}
> +
> +__extension__ extern __inline float32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vrndnq_f32 (float32x4_t __a)
> +{
> +  return __builtin_mve_vrndnq_fv4sf (__a);
> +}
> +
> +__extension__ extern __inline float16x8_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vrndmq_f16 (float16x8_t __a)
> +{
> +  return __builtin_mve_vrndmq_fv8hf (__a);
> +}
> +
> +__extension__ extern __inline float32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vrndmq_f32 (float32x4_t __a)
> +{
> +  return __builtin_mve_vrndmq_fv4sf (__a);
> +}
> +
> +__extension__ extern __inline float16x8_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vrndaq_f16 (float16x8_t __a)
> +{
> +  return __builtin_mve_vrndaq_fv8hf (__a);
> +}
> +
> +__extension__ extern __inline float32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vrndaq_f32 (float32x4_t __a)
> +{
> +  return __builtin_mve_vrndaq_fv4sf (__a);
> +}
> +
> +__extension__ extern __inline float16x8_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vrev64q_f16 (float16x8_t __a)
> +{
> +  return __builtin_mve_vrev64q_fv8hf (__a);
> +}
> +
> +__extension__ extern __inline float32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vrev64q_f32 (float32x4_t __a)
> +{
> +  return __builtin_mve_vrev64q_fv4sf (__a);
> +}
> +
> +__extension__ extern __inline float16x8_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vnegq_f16 (float16x8_t __a)
> +{
> +  return __builtin_mve_vnegq_fv8hf (__a);
> +}
> +
> +__extension__ extern __inline float32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vnegq_f32 (float32x4_t __a)
> +{
> +  return __builtin_mve_vnegq_fv4sf (__a);
> +}
> +
> +__extension__ extern __inline float16x8_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vdupq_n_f16 (float16_t __a)
> +{
> +  return __builtin_mve_vdupq_n_fv8hf (__a);
> +}
> +
> +__extension__ extern __inline float32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vdupq_n_f32 (float32_t __a)
> +{
> +  return __builtin_mve_vdupq_n_fv4sf (__a);
> +}
> +
> +__extension__ extern __inline float16x8_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vabsq_f16 (float16x8_t __a)
> +{
> +  return __builtin_mve_vabsq_fv8hf (__a);
> +}
> +
> +__extension__ extern __inline float32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vabsq_f32 (float32x4_t __a)
> +{
> +  return __builtin_mve_vabsq_fv4sf (__a);
> +}
> +
> +__extension__ extern __inline float16x8_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vrev32q_f16 (float16x8_t __a)
> +{
> +  return __builtin_mve_vrev32q_fv8hf (__a);
> +}
> +
> +__extension__ extern __inline float32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vcvttq_f32_f16 (float16x8_t __a)
> +{
> +  return __builtin_mve_vcvttq_f32_f16v4sf (__a);
> +}
> +
> +__extension__ extern __inline float32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vcvtbq_f32_f16 (float16x8_t __a)
> +{
> +  return __builtin_mve_vcvtbq_f32_f16v4sf (__a);
> +}
> +
> +__extension__ extern __inline float16x8_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vcvtq_f16_s16 (int16x8_t __a)
> +{
> +  return __builtin_mve_vcvtq_to_f_sv8hf (__a);
> +}
> +
> +__extension__ extern __inline float32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vcvtq_f32_s32 (int32x4_t __a)
> +{
> +  return __builtin_mve_vcvtq_to_f_sv4sf (__a);
> +}
> +
> +__extension__ extern __inline float16x8_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vcvtq_f16_u16 (uint16x8_t __a)
> +{
> +  return __builtin_mve_vcvtq_to_f_uv8hf (__a);
> +}
> +
> +__extension__ extern __inline float32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vcvtq_f32_u32 (uint32x4_t __a)
> +{
> +  return __builtin_mve_vcvtq_to_f_uv4sf (__a);
> +}
> +
>  #endif
>
>  enum {
> @@ -368,6 +584,83 @@ extern void *__ARM_undef;
>    int 
> (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_float16x8x4_t]: 
> __arm_vst4q_f16 (__ARM_mve_coerce(__p0, float16_t *), 
> __ARM_mve_coerce(__p1, float16x8x4_t)), \
>    int 
> (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4x4_t]: 
> __arm_vst4q_f32 (__ARM_mve_coerce(__p0, float32_t *), 
> __ARM_mve_coerce(__p1, float32x4x4_t)));})
>
> +#define vrndxq(p0) __arm_vrndxq(p0)
> +#define __arm_vrndxq(p0) ({ __typeof(p0) __p0 = (p0); \
> +  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> +  int (*)[__ARM_mve_type_float16x8_t]: __arm_vrndxq_f16 
> (__ARM_mve_coerce(__p0, float16x8_t)), \
> +  int (*)[__ARM_mve_type_float32x4_t]: __arm_vrndxq_f32 
> (__ARM_mve_coerce(__p0, float32x4_t)));})
> +
> +#define vrndq(p0) __arm_vrndq(p0)
> +#define __arm_vrndq(p0) ({ __typeof(p0) __p0 = (p0); \
> +  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> +  int (*)[__ARM_mve_type_float16x8_t]: __arm_vrndq_f16 
> (__ARM_mve_coerce(__p0, float16x8_t)), \
> +  int (*)[__ARM_mve_type_float32x4_t]: __arm_vrndq_f32 
> (__ARM_mve_coerce(__p0, float32x4_t)));})
> +
> +#define vrndpq(p0) __arm_vrndpq(p0)
> +#define __arm_vrndpq(p0) ({ __typeof(p0) __p0 = (p0); \
> +  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> +  int (*)[__ARM_mve_type_float16x8_t]: __arm_vrndpq_f16 
> (__ARM_mve_coerce(__p0, float16x8_t)), \
> +  int (*)[__ARM_mve_type_float32x4_t]: __arm_vrndpq_f32 
> (__ARM_mve_coerce(__p0, float32x4_t)));})
> +
> +#define vrndnq(p0) __arm_vrndnq(p0)
> +#define __arm_vrndnq(p0) ({ __typeof(p0) __p0 = (p0); \
> +  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> +  int (*)[__ARM_mve_type_float16x8_t]: __arm_vrndnq_f16 
> (__ARM_mve_coerce(__p0, float16x8_t)), \
> +  int (*)[__ARM_mve_type_float32x4_t]: __arm_vrndnq_f32 
> (__ARM_mve_coerce(__p0, float32x4_t)));})
> +
> +#define vrndmq(p0) __arm_vrndmq(p0)
> +#define __arm_vrndmq(p0) ({ __typeof(p0) __p0 = (p0); \
> +  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> +  int (*)[__ARM_mve_type_float16x8_t]: __arm_vrndmq_f16 
> (__ARM_mve_coerce(__p0, float16x8_t)), \
> +  int (*)[__ARM_mve_type_float32x4_t]: __arm_vrndmq_f32 
> (__ARM_mve_coerce(__p0, float32x4_t)));})
> +
> +#define vrndaq(p0) __arm_vrndaq(p0)
> +#define __arm_vrndaq(p0) ({ __typeof(p0) __p0 = (p0); \
> +  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> +  int (*)[__ARM_mve_type_float16x8_t]: __arm_vrndaq_f16 
> (__ARM_mve_coerce(__p0, float16x8_t)), \
> +  int (*)[__ARM_mve_type_float32x4_t]: __arm_vrndaq_f32 
> (__ARM_mve_coerce(__p0, float32x4_t)));})
> +
> +#define vrev64q(p0) __arm_vrev64q(p0)
> +#define __arm_vrev64q(p0) ({ __typeof(p0) __p0 = (p0); \
> +  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> +  int (*)[__ARM_mve_type_float16x8_t]: __arm_vrev64q_f16 
> (__ARM_mve_coerce(__p0, float16x8_t)), \
> +  int (*)[__ARM_mve_type_float32x4_t]: __arm_vrev64q_f32 
> (__ARM_mve_coerce(__p0, float32x4_t)));})
> +
> +#define vnegq(p0) __arm_vnegq(p0)
> +#define __arm_vnegq(p0) ({ __typeof(p0) __p0 = (p0); \
> +  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> +  int (*)[__ARM_mve_type_float16x8_t]: __arm_vnegq_f16 
> (__ARM_mve_coerce(__p0, float16x8_t)), \
> +  int (*)[__ARM_mve_type_float32x4_t]: __arm_vnegq_f32 
> (__ARM_mve_coerce(__p0, float32x4_t)));})
> +
> +#define vabsq(p0) __arm_vabsq(p0)
> +#define __arm_vabsq(p0) ({ __typeof(p0) __p0 = (p0); \
> +  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> +  int (*)[__ARM_mve_type_float16x8_t]: __arm_vabsq_f16 
> (__ARM_mve_coerce(__p0, float16x8_t)), \
> +  int (*)[__ARM_mve_type_float32x4_t]: __arm_vabsq_f32 
> (__ARM_mve_coerce(__p0, float32x4_t)));})
> +
> +#define vrev32q(p0) __arm_vrev32q(p0)
> +#define __arm_vrev32q(p0) ({ __typeof(p0) __p0 = (p0); \
> +  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> +  int (*)[__ARM_mve_type_float16x8_t]: __arm_vrev32q_f16 
> (__ARM_mve_coerce(__p0, float16x8_t)));})
> +
> +#define vcvtbq_f32(p0) __arm_vcvtbq_f32(p0)
> +#define __arm_vcvtbq_f32(p0) ({ __typeof(p0) __p0 = (p0); \
> +  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> +  int (*)[__ARM_mve_type_float16x8_t]: __arm_vcvtbq_f32_f16 
> (__ARM_mve_coerce(__p0, float16x8_t)));})
> +
> +#define vcvttq_f32(p0) __arm_vcvttq_f32(p0)
> +#define __arm_vcvttq_f32(p0) ({ __typeof(p0) __p0 = (p0); \
> +  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> +  int (*)[__ARM_mve_type_float16x8_t]: __arm_vcvttq_f32_f16 
> (__ARM_mve_coerce(__p0, float16x8_t)));})
> +
> +#define vcvtq(p0) __arm_vcvtq(p0)
> +#define __arm_vcvtq(p0) ({ __typeof(p0) __p0 = (p0); \
> +  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> +  int (*)[__ARM_mve_type_int16x8_t]: __arm_vcvtq_f16_s16 
> (__ARM_mve_coerce(__p0, int16x8_t)), \
> +  int (*)[__ARM_mve_type_int32x4_t]: __arm_vcvtq_f32_s32 
> (__ARM_mve_coerce(__p0, int32x4_t)), \
> +  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vcvtq_f16_u16 
> (__ARM_mve_coerce(__p0, uint16x8_t)), \
> +  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vcvtq_f32_u32 
> (__ARM_mve_coerce(__p0, uint32x4_t)));})
> +
>  #else /* MVE Interger.  */
>
>  #define vst4q(p0,p1) __arm_vst4q(p0,p1)
> diff --git a/gcc/config/arm/arm_mve_builtins.def 
> b/gcc/config/arm/arm_mve_builtins.def
> index 
> c7f3546c9cb0ec3ae794e181e533a1cbed38121c..65dc58c9328525891a0aa0bb97a412ebc8257c18 
> 100644
> --- a/gcc/config/arm/arm_mve_builtins.def
> +++ b/gcc/config/arm/arm_mve_builtins.def
> @@ -19,3 +19,18 @@
>      <http://www.gnu.org/licenses/>. */
>
>  VAR5 (STORE1, vst4q, v16qi, v8hi, v4si, v8hf, v4sf)
> +VAR2 (UNOP_NONE_NONE, vrndxq_f, v8hf, v4sf)
> +VAR2 (UNOP_NONE_NONE, vrndq_f, v8hf, v4sf)
> +VAR2 (UNOP_NONE_NONE, vrndpq_f, v8hf, v4sf)
> +VAR2 (UNOP_NONE_NONE, vrndnq_f, v8hf, v4sf)
> +VAR2 (UNOP_NONE_NONE, vrndmq_f, v8hf, v4sf)
> +VAR2 (UNOP_NONE_NONE, vrndaq_f, v8hf, v4sf)
> +VAR2 (UNOP_NONE_NONE, vrev64q_f, v8hf, v4sf)
> +VAR2 (UNOP_NONE_NONE, vnegq_f, v8hf, v4sf)
> +VAR2 (UNOP_NONE_NONE, vdupq_n_f, v8hf, v4sf)
> +VAR2 (UNOP_NONE_NONE, vabsq_f, v8hf, v4sf)
> +VAR1 (UNOP_NONE_NONE, vrev32q_f, v8hf)
> +VAR1 (UNOP_NONE_NONE, vcvttq_f32_f16, v4sf)
> +VAR1 (UNOP_NONE_NONE, vcvtbq_f32_f16, v4sf)
> +VAR2 (UNOP_NONE_SNONE, vcvtq_to_f_s, v8hf, v4sf)
> +VAR2 (UNOP_NONE_UNONE, vcvtq_to_f_u, v8hf, v4sf)
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 
> a2f5220eccb6d888b9b0b1bbd1054cc3a6be97f5..7a31d0abdfff9a93d79faa1de44d1b224470e2eb 
> 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -21,8 +21,18 @@
>                                (V2DI "u64")])
>  (define_mode_iterator MVE_types [V16QI V8HI V4SI V2DI TI V8HF V4SF V2DF])
>  (define_mode_iterator MVE_VLD_ST [V16QI V8HI V4SI V8HF V4SF])
> +(define_mode_iterator MVE_0 [V8HF V4SF])
>
> -(define_c_enum "unspec" [VST4Q])
> +(define_c_enum "unspec" [VST4Q VRNDXQ_F VRNDQ_F VRNDPQ_F VRNDNQ_F 
> VRNDMQ_F
> +                        VRNDAQ_F VREV64Q_F VNEGQ_F VDUPQ_N_F VABSQ_F 
> VREV32Q_F
> +                        VCVTTQ_F32_F16 VCVTBQ_F32_F16 VCVTQ_TO_F_S
> +                        VCVTQ_TO_F_U])
> +
> +(define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF")
> +                           (V8HF "V8HI") (V4SF "V4SI")])
> +
> +(define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u")])
> +(define_int_iterator VCVTQ_TO_F [VCVTQ_TO_F_S VCVTQ_TO_F_U])
>
>  (define_insn "*mve_mov<mode>"
>    [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
> @@ -113,3 +123,198 @@
>     return "";
>  }
>    [(set_attr "length" "16")])
> +
> +;;
> +;; [vrndxq_f])
> +;;
> +(define_insn "mve_vrndxq_f<mode>"
> +  [
> +   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
> +       (unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")]
> +        VRNDXQ_F))
> +  ]
> +  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> +  "vrintx.f%#<V_sz_elem>       %q0, %q1"
> +  [(set_attr "type" "mve_move")
> +])
> +


One thing I noticed, though I suppose it should be part of one of the 
infrastructure patches, is that the MVE instructions shouldn't be put in 
IT blocks.

So they don't have the "predicable" attribute, which is correct.

But they should also have the "conds" attribute be "unconditional", like 
for the NEON instructions. For NEON, this is achieved through the 
is_neon_type attribute being used in the "conds" definition in arm.md

We should do something similar for mve instructions (is_mve_type ?). 
That can be done as a separate patch though from this.

This patch is ok.

Thanks,

Kyrill


> +;;
> +;; [vrndq_f])
> +;;
> +(define_insn "mve_vrndq_f<mode>"
> +  [
> +   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
> +       (unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")]
> +        VRNDQ_F))
> +  ]
> +  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> +  "vrintz.f%#<V_sz_elem>       %q0, %q1"
> +  [(set_attr "type" "mve_move")
> +])
> +
> +;;
> +;; [vrndpq_f])
> +;;
> +(define_insn "mve_vrndpq_f<mode>"
> +  [
> +   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
> +       (unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")]
> +        VRNDPQ_F))
> +  ]
> +  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> +  "vrintp.f%#<V_sz_elem>       %q0, %q1"
> +  [(set_attr "type" "mve_move")
> +])
> +
> +;;
> +;; [vrndnq_f])
> +;;
> +(define_insn "mve_vrndnq_f<mode>"
> +  [
> +   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
> +       (unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")]
> +        VRNDNQ_F))
> +  ]
> +  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> +  "vrintn.f%#<V_sz_elem>       %q0, %q1"
> +  [(set_attr "type" "mve_move")
> +])
> +
> +;;
> +;; [vrndmq_f])
> +;;
> +(define_insn "mve_vrndmq_f<mode>"
> +  [
> +   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
> +       (unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")]
> +        VRNDMQ_F))
> +  ]
> +  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> +  "vrintm.f%#<V_sz_elem>       %q0, %q1"
> +  [(set_attr "type" "mve_move")
> +])
> +
> +;;
> +;; [vrndaq_f])
> +;;
> +(define_insn "mve_vrndaq_f<mode>"
> +  [
> +   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
> +       (unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")]
> +        VRNDAQ_F))
> +  ]
> +  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> +  "vrinta.f%#<V_sz_elem>       %q0, %q1"
> +  [(set_attr "type" "mve_move")
> +])
> +
> +;;
> +;; [vrev64q_f])
> +;;
> +(define_insn "mve_vrev64q_f<mode>"
> +  [
> +   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
> +       (unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")]
> +        VREV64Q_F))
> +  ]
> +  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> +  "vrev64.%#<V_sz_elem> %q0, %q1"
> +  [(set_attr "type" "mve_move")
> +])
> +
> +;;
> +;; [vnegq_f])
> +;;
> +(define_insn "mve_vnegq_f<mode>"
> +  [
> +   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
> +       (unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")]
> +        VNEGQ_F))
> +  ]
> +  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> +  "vneg.f%#<V_sz_elem>  %q0, %q1"
> +  [(set_attr "type" "mve_move")
> +])
> +
> +;;
> +;; [vdupq_n_f])
> +;;
> +(define_insn "mve_vdupq_n_f<mode>"
> +  [
> +   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
> +       (unspec:MVE_0 [(match_operand:<V_elem> 1 "s_register_operand" 
> "r")]
> +        VDUPQ_N_F))
> +  ]
> +  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> +  "vdup.%#<V_sz_elem>   %q0, %1"
> +  [(set_attr "type" "mve_move")
> +])
> +
> +;;
> +;; [vabsq_f])
> +;;
> +(define_insn "mve_vabsq_f<mode>"
> +  [
> +   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
> +       (unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")]
> +        VABSQ_F))
> +  ]
> +  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> +  "vabs.f%#<V_sz_elem>  %q0, %q1"
> +  [(set_attr "type" "mve_move")
> +])
> +
> +;;
> +;; [vrev32q_f])
> +;;
> +(define_insn "mve_vrev32q_fv8hf"
> +  [
> +   (set (match_operand:V8HF 0 "s_register_operand" "=w")
> +       (unspec:V8HF [(match_operand:V8HF 1 "s_register_operand" "w")]
> +        VREV32Q_F))
> +  ]
> +  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> +  "vrev32.16 %q0, %q1"
> +  [(set_attr "type" "mve_move")
> +])
> +;;
> +;; [vcvttq_f32_f16])
> +;;
> +(define_insn "mve_vcvttq_f32_f16v4sf"
> +  [
> +   (set (match_operand:V4SF 0 "s_register_operand" "=w")
> +       (unspec:V4SF [(match_operand:V8HF 1 "s_register_operand" "w")]
> +        VCVTTQ_F32_F16))
> +  ]
> +  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> +  "vcvtt.f32.f16 %q0, %q1"
> +  [(set_attr "type" "mve_move")
> +])
> +
> +;;
> +;; [vcvtbq_f32_f16])
> +;;
> +(define_insn "mve_vcvtbq_f32_f16v4sf"
> +  [
> +   (set (match_operand:V4SF 0 "s_register_operand" "=w")
> +       (unspec:V4SF [(match_operand:V8HF 1 "s_register_operand" "w")]
> +        VCVTBQ_F32_F16))
> +  ]
> +  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> +  "vcvtb.f32.f16 %q0, %q1"
> +  [(set_attr "type" "mve_move")
> +])
> +
> +;;
> +;; [vcvtq_to_f_s, vcvtq_to_f_u])
> +;;
> +(define_insn "mve_vcvtq_to_f_<supf><mode>"
> +  [
> +   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
> +       (unspec:MVE_0 [(match_operand:<MVE_CNVT> 1 
> "s_register_operand" "w")]
> +        VCVTQ_TO_F))
> +  ]
> +  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> + "vcvt.f%#<V_sz_elem>.<supf>%#<V_sz_elem> %q0, %q1"
> +  [(set_attr "type" "mve_move")
> +])
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f16.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..c0b19dc2acf36a89705ab7a371925f1f35fe60ba
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f16.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float16x8_t
> +foo (float16x8_t a)
> +{
> +  return vabsq_f16 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vabs.f16"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f32.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f32.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..59c3b5ab2a4a4269f2fc94408d45a44e60c2db17
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f32.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float32x4_t
> +foo (float32x4_t a)
> +{
> +  return vabsq_f32 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vabs.f32"  }  } */
> diff --git 
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtbq_f32_f16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtbq_f32_f16.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..0f6c0b56a364515c7804acbb38e2402833006474
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtbq_f32_f16.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float32x4_t
> +foo (float16x8_t a)
> +{
> +  return vcvtbq_f32_f16 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vcvtb.f32.f16"  }  } */
> diff --git 
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f16_s16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f16_s16.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..45be0573781bebd1b0321eb63ae9717d25b2e95f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f16_s16.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float16x8_t
> +foo (int16x8_t a)
> +{
> +  return vcvtq_f16_s16 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vcvt.f16.s16"  }  } */
> diff --git 
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f16_u16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f16_u16.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..17b42c5771e5aab6a8c2dfef7c71c8616c5116f8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f16_u16.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float16x8_t
> +foo (uint16x8_t a)
> +{
> +  return vcvtq_f16_u16 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vcvt.f16.u16"  }  } */
> diff --git 
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f32_s32.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f32_s32.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..0e7dbd22effb90a47b53b7a82cbcea4d92e587a2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f32_s32.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float32x4_t
> +foo (int32x4_t a)
> +{
> +  return vcvtq_f32_s32 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vcvt.f32.s32"  }  } */
> diff --git 
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f32_u32.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f32_u32.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..2641d71558cd810a0dabc14a0fc038038fa6e2f4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_f32_u32.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float32x4_t
> +foo (uint32x4_t a)
> +{
> +  return vcvtq_f32_u32 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vcvt.f32.u32"  }  } */
> diff --git 
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvttq_f32_f16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvttq_f32_f16.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..f3983cf3aa4ec9a7489df84f895f10b365c6acd4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvttq_f32_f16.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float32x4_t
> +foo (float16x8_t a)
> +{
> +  return vcvttq_f32_f16 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vcvtt.f32.f16"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f16.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..a2b7bebddeb0ed92e4b674405d8a23a397234b15
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f16.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float16x8_t
> +foo (float16_t a)
> +{
> +  return vdupq_n_f16 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vdup.16"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f32.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f32.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..05ffc960217f8781d87870a3d100a1008c41ed86
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_n_f32.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float32x4_t
> +foo (float32_t a)
> +{
> +  return vdupq_n_f32 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vdup.32"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vnegq_f16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vnegq_f16.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..8a6e9a793c9ea12a74a08cb905a6f7cb6a4d6350
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vnegq_f16.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float16x8_t
> +foo (float16x8_t a)
> +{
> +  return vnegq_f16 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vneg.f16"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vnegq_f32.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vnegq_f32.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..63e94ca3eac1dda65aaf2d06309654560d0756f3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vnegq_f32.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float32x4_t
> +foo (float32x4_t a)
> +{
> +  return vnegq_f32 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vneg.f32"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev32q_f16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev32q_f16.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..21c666844179e723484678af5d8123a418177090
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev32q_f16.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float16x8_t
> +foo (float16x8_t a)
> +{
> +  return vrev32q_f16 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vrev32.16"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_f16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_f16.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..70db03e0fe4573961778a5290e9b585044faa393
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_f16.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float16x8_t
> +foo (float16x8_t a)
> +{
> +  return vrev64q_f16 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vrev64.16"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_f32.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_f32.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..e9686c63d162a8919bd24d0227f9e6acb6278142
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_f32.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float32x4_t
> +foo (float32x4_t a)
> +{
> +  return vrev64q_f32 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vrev64.32"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndaq_f16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndaq_f16.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..9c51833855e772346b2d1df3760de511aa58d1ca
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndaq_f16.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float16x8_t
> +foo (float16x8_t a)
> +{
> +  return vrndaq_f16 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vrinta.f16"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndaq_f32.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndaq_f32.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..0f2c76f38ea950a36a6e243ce656f334c46e5fa5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndaq_f32.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float32x4_t
> +foo (float32x4_t a)
> +{
> +  return vrndaq_f32 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vrinta.f32"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndmq_f16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndmq_f16.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..936b20cf9eda0869f828880dc4ca6cf8d22d05dc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndmq_f16.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float16x8_t
> +foo (float16x8_t a)
> +{
> +  return vrndmq_f16 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vrintm.f16"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndmq_f32.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndmq_f32.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..ba77c73ebb1aebc5608597739e0be57436f8aebd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndmq_f32.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float32x4_t
> +foo (float32x4_t a)
> +{
> +  return vrndmq_f32 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vrintm.f32"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndnq_f16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndnq_f16.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..43dd340e8c529e71cbcd6168adb14f5a29d141b2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndnq_f16.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float16x8_t
> +foo (float16x8_t a)
> +{
> +  return vrndnq_f16 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vrintn.f16"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndnq_f32.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndnq_f32.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..a665250c35b1fecb6bd017df8f336c86b6cae179
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndnq_f32.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float32x4_t
> +foo (float32x4_t a)
> +{
> +  return vrndnq_f32 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vrintn.f32"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndpq_f16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndpq_f16.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..effa934137adf3e2d5535b1abdd7812d8fa7cc41
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndpq_f16.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float16x8_t
> +foo (float16x8_t a)
> +{
> +  return vrndpq_f16 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vrintp.f16"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndpq_f32.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndpq_f32.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..0e1714f89b3a7f5b59a40cfbb9ab3fd72399e988
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndpq_f32.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float32x4_t
> +foo (float32x4_t a)
> +{
> +  return vrndpq_f32 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vrintp.f32"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndq_f16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndq_f16.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..0cd74d4849fbb73a0dc7cb64d0f5cbde5ee86927
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndq_f16.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float16x8_t
> +foo (float16x8_t a)
> +{
> +  return vrndq_f16 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vrintz.f16"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndq_f32.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndq_f32.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..eacd1d1f2e1c86169f0739ad43b3faf6da764f75
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndq_f32.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float32x4_t
> +foo (float32x4_t a)
> +{
> +  return vrndq_f32 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vrintz.f32"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndxq_f16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndxq_f16.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..0836f2233f814e0d53c100f6af42197fa88476a7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndxq_f16.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float16x8_t
> +foo (float16x8_t a)
> +{
> +  return vrndxq_f16 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vrintx.f16"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndxq_f32.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndxq_f32.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..012a50b6020cd78d5ac1b53d44de0eebedc6b587
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrndxq_f32.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float32x4_t
> +foo (float32x4_t a)
> +{
> +  return vrndxq_f32 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vrintx.f32"  }  } */
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH][ARM][GCC][2/1x]: MVE intrinsics with unary operand.
  2019-11-14 19:13 ` [PATCH][ARM][GCC][2/1x]: " Srinath Parvathaneni
@ 2019-12-19 18:05   ` Kyrill Tkachov
  0 siblings, 0 replies; 48+ messages in thread
From: Kyrill Tkachov @ 2019-12-19 18:05 UTC (permalink / raw)
  To: Srinath Parvathaneni, gcc-patches; +Cc: Richard Earnshaw

Hi Srinath,

On 11/14/19 7:13 PM, Srinath Parvathaneni wrote:
> Hello,
>
> This patch supports following MVE ACLE intrinsics with unary operand.
>
> vmvnq_n_s16, vmvnq_n_s32, vrev64q_s8, vrev64q_s16, vrev64q_s32, 
> vcvtq_s16_f16, vcvtq_s32_f32,
> vrev64q_u8, vrev64q_u16, vrev64q_u32, vmvnq_n_u16, vmvnq_n_u32, 
> vcvtq_u16_f16, vcvtq_u32_f32,
> vrev64q.
>
> Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for 
> more details.
> [1] 
> https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics
>
> Regression tested on arm-none-eabi and found no regressions.
>
> Ok for trunk?
>
> Thanks,
> Srinath.
>
> gcc/ChangeLog:
>
> 2019-10-21  Andre Vieira <andre.simoesdiasvieira@arm.com>
>             Mihail Ionescu  <mihail.ionescu@arm.com>
>             Srinath Parvathaneni <srinath.parvathaneni@arm.com>
>
>         * config/arm/arm-builtins.c (UNOP_SNONE_SNONE_QUALIFIERS): Define.
>         (UNOP_SNONE_NONE_QUALIFIERS): Likewise.
>         (UNOP_SNONE_IMM_QUALIFIERS): Likewise.
>         (UNOP_UNONE_NONE_QUALIFIERS): Likewise.
>         (UNOP_UNONE_UNONE_QUALIFIERS): Likewise.
>         (UNOP_UNONE_IMM_QUALIFIERS): Likewise.
>         * config/arm/arm_mve.h (vmvnq_n_s16): Define macro.
>         (vmvnq_n_s32): Likewise.
>         (vrev64q_s8): Likewise.
>         (vrev64q_s16): Likewise.
>         (vrev64q_s32): Likewise.
>         (vcvtq_s16_f16): Likewise.
>         (vcvtq_s32_f32): Likewise.
>         (vrev64q_u8): Likewise.
>         (vrev64q_u16): Likewise.
>         (vrev64q_u32): Likewise.
>         (vmvnq_n_u16): Likewise.
>         (vmvnq_n_u32): Likewise.
>         (vcvtq_u16_f16): Likewise.
>         (vcvtq_u32_f32): Likewise.
>         (__arm_vmvnq_n_s16): Define intrinsic.
>         (__arm_vmvnq_n_s32): Likewise.
>         (__arm_vrev64q_s8): Likewise.
>         (__arm_vrev64q_s16): Likewise.
>         (__arm_vrev64q_s32): Likewise.
>         (__arm_vrev64q_u8): Likewise.
>         (__arm_vrev64q_u16): Likewise.
>         (__arm_vrev64q_u32): Likewise.
>         (__arm_vmvnq_n_u16): Likewise.
>         (__arm_vmvnq_n_u32): Likewise.
>         (__arm_vcvtq_s16_f16): Likewise.
>         (__arm_vcvtq_s32_f32): Likewise.
>         (__arm_vcvtq_u16_f16): Likewise.
>         (__arm_vcvtq_u32_f32): Likewise.
>         (vrev64q): Define polymorphic variant.
>         * config/arm/arm_mve_builtins.def (UNOP_SNONE_SNONE): Use it.
>         (UNOP_SNONE_NONE): Likewise.
>         (UNOP_SNONE_IMM): Likewise.
>         (UNOP_UNONE_UNONE): Likewise.
>         (UNOP_UNONE_NONE): Likewise.
>         (UNOP_UNONE_IMM): Likewise.
>         * config/arm/mve.md (mve_vrev64q_<supf><mode>): Define RTL 
> pattern.
>         (mve_vcvtq_from_f_<supf><mode>): Likewise.
>         (mve_vmvnq_n_<supf><mode>): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 2019-10-21  Andre Vieira <andre.simoesdiasvieira@arm.com>
>             Mihail Ionescu  <mihail.ionescu@arm.com>
>             Srinath Parvathaneni <srinath.parvathaneni@arm.com>
>
>         * gcc.target/arm/mve/intrinsics/vcvtq_s16_f16.c: New test.
>         * gcc.target/arm/mve/intrinsics/vcvtq_s32_f32.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vcvtq_u16_f16.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vcvtq_u32_f32.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vmvnq_n_s16.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vmvnq_n_s32.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vmvnq_n_u16.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vmvnq_n_u32.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vrev64q_s16.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vrev64q_s32.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vrev64q_s8.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vrev64q_u16.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vrev64q_u32.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vrev64q_u8.c: Likewise.
>
>
> ###############     Attachment also inlined for ease of reply    
> ###############
>
>
> diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
> index 
> 2fee417fe6585f457edd4cf96655366b1d6bd1a0..21b213d8e1bc99a3946f15e97161e01d73832799 
> 100644
> --- a/gcc/config/arm/arm-builtins.c
> +++ b/gcc/config/arm/arm-builtins.c
> @@ -313,6 +313,42 @@ arm_unop_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>  #define UNOP_NONE_UNONE_QUALIFIERS \
>    (arm_unop_none_unone_qualifiers)
>
> +static enum arm_type_qualifiers
> +arm_unop_snone_snone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
> +  = { qualifier_none, qualifier_none };
> +#define UNOP_SNONE_SNONE_QUALIFIERS \
> +  (arm_unop_snone_snone_qualifiers)
> +
> +static enum arm_type_qualifiers
> +arm_unop_snone_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
> +  = { qualifier_none, qualifier_none };
> +#define UNOP_SNONE_NONE_QUALIFIERS \
> +  (arm_unop_snone_none_qualifiers)
> +
> +static enum arm_type_qualifiers
> +arm_unop_snone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
> +  = { qualifier_none, qualifier_immediate };
> +#define UNOP_SNONE_IMM_QUALIFIERS \
> +  (arm_unop_snone_imm_qualifiers)
> +
> +static enum arm_type_qualifiers
> +arm_unop_unone_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
> +  = { qualifier_unsigned, qualifier_none };
> +#define UNOP_UNONE_NONE_QUALIFIERS \
> +  (arm_unop_unone_none_qualifiers)
> +
> +static enum arm_type_qualifiers
> +arm_unop_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
> +  = { qualifier_unsigned, qualifier_unsigned };
> +#define UNOP_UNONE_UNONE_QUALIFIERS \
> +  (arm_unop_unone_unone_qualifiers)
> +
> +static enum arm_type_qualifiers
> +arm_unop_unone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
> +  = { qualifier_unsigned, qualifier_immediate };
> +#define UNOP_UNONE_IMM_QUALIFIERS \
> +  (arm_unop_unone_imm_qualifiers)
> +
>  /* End of Qualifier for MVE builtins.  */
>
>     /* void ([T element type] *, T, immediate).  */
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index 
> 9bcb04fb99a54b47057bb33cc807d6a5ad16401f..bd5162122b8c8e61ba25ba6ea89c56005f5a79dc 
> 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -108,6 +108,20 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
>  #define vcvtq_f32_s32(__a) __arm_vcvtq_f32_s32(__a)
>  #define vcvtq_f16_u16(__a) __arm_vcvtq_f16_u16(__a)
>  #define vcvtq_f32_u32(__a) __arm_vcvtq_f32_u32(__a)
> +#define vmvnq_n_s16( __imm) __arm_vmvnq_n_s16( __imm)
> +#define vmvnq_n_s32( __imm) __arm_vmvnq_n_s32( __imm)
> +#define vrev64q_s8(__a) __arm_vrev64q_s8(__a)
> +#define vrev64q_s16(__a) __arm_vrev64q_s16(__a)
> +#define vrev64q_s32(__a) __arm_vrev64q_s32(__a)
> +#define vcvtq_s16_f16(__a) __arm_vcvtq_s16_f16(__a)
> +#define vcvtq_s32_f32(__a) __arm_vcvtq_s32_f32(__a)
> +#define vrev64q_u8(__a) __arm_vrev64q_u8(__a)
> +#define vrev64q_u16(__a) __arm_vrev64q_u16(__a)
> +#define vrev64q_u32(__a) __arm_vrev64q_u32(__a)
> +#define vmvnq_n_u16( __imm) __arm_vmvnq_n_u16( __imm)
> +#define vmvnq_n_u32( __imm) __arm_vmvnq_n_u32( __imm)
> +#define vcvtq_u16_f16(__a) __arm_vcvtq_u16_f16(__a)
> +#define vcvtq_u32_f32(__a) __arm_vcvtq_u32_f32(__a)
>  #endif
>
>  __extension__ extern __inline void
> @@ -164,6 +178,76 @@ __arm_vst4q_u32 (uint32_t * __addr, uint32x4x4_t 
> __value)
>    __builtin_mve_vst4qv4si ((__builtin_neon_si *) __addr, __rv.__o);
>  }
>
> +__extension__ extern __inline int16x8_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vmvnq_n_s16 (const int __imm)
> +{
> +  return __builtin_mve_vmvnq_n_sv8hi (__imm);
> +}
> +
> +__extension__ extern __inline int32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vmvnq_n_s32 (const int __imm)
> +{
> +  return __builtin_mve_vmvnq_n_sv4si (__imm);
> +}


The spec says that the immediates should be int16_t and int32_t rather 
than ints. Same for the unsigned cases in the patch.


> +
> +__extension__ extern __inline int8x16_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vrev64q_s8 (int8x16_t __a)
> +{
> +  return __builtin_mve_vrev64q_sv16qi (__a);
> +}
> +
> +__extension__ extern __inline int16x8_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vrev64q_s16 (int16x8_t __a)
> +{
> +  return __builtin_mve_vrev64q_sv8hi (__a);
> +}
> +
> +__extension__ extern __inline int32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vrev64q_s32 (int32x4_t __a)
> +{
> +  return __builtin_mve_vrev64q_sv4si (__a);
> +}
> +
> +__extension__ extern __inline uint8x16_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vrev64q_u8 (uint8x16_t __a)
> +{
> +  return __builtin_mve_vrev64q_uv16qi (__a);
> +}
> +
> +__extension__ extern __inline uint16x8_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vrev64q_u16 (uint16x8_t __a)
> +{
> +  return __builtin_mve_vrev64q_uv8hi (__a);
> +}
> +
> +__extension__ extern __inline uint32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vrev64q_u32 (uint32x4_t __a)
> +{
> +  return __builtin_mve_vrev64q_uv4si (__a);
> +}
> +
> +__extension__ extern __inline uint16x8_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vmvnq_n_u16 (const int __imm)
> +{
> +  return __builtin_mve_vmvnq_n_uv8hi (__imm);
> +}
> +
> +__extension__ extern __inline uint32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vmvnq_n_u32 (const int __imm)
> +{
> +  return __builtin_mve_vmvnq_n_uv4si (__imm);
> +}
> +
>  #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
>
>  __extension__ extern __inline void
> @@ -373,6 +457,34 @@ __arm_vcvtq_f32_u32 (uint32x4_t __a)
>    return __builtin_mve_vcvtq_to_f_uv4sf (__a);
>  }
>
> +__extension__ extern __inline int16x8_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vcvtq_s16_f16 (float16x8_t __a)
> +{
> +  return __builtin_mve_vcvtq_from_f_sv8hi (__a);
> +}
> +
> +__extension__ extern __inline int32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vcvtq_s32_f32 (float32x4_t __a)
> +{
> +  return __builtin_mve_vcvtq_from_f_sv4si (__a);
> +}
> +
> +__extension__ extern __inline uint16x8_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vcvtq_u16_f16 (float16x8_t __a)
> +{
> +  return __builtin_mve_vcvtq_from_f_uv8hi (__a);
> +}
> +
> +__extension__ extern __inline uint32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vcvtq_u32_f32 (float32x4_t __a)
> +{
> +  return __builtin_mve_vcvtq_from_f_uv4si (__a);
> +}
> +
>  #endif
>
>  enum {
> @@ -674,6 +786,16 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8x4_t]: 
> __arm_vst4q_u16 (__ARM_mve_coerce(__p0, uint16_t *), 
> __ARM_mve_coerce(__p1, uint16x8x4_t)), \
>    int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4x4_t]: 
> __arm_vst4q_u32 (__ARM_mve_coerce(__p0, uint32_t *), 
> __ARM_mve_coerce(__p1, uint32x4x4_t)));})
>
> +#define vrev64q(p0) __arm_vrev64q(p0)
> +#define __arm_vrev64q(p0) ({ __typeof(p0) __p0 = (p0); \
> +  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> +  int (*)[__ARM_mve_type_int8x16_t]: __arm_vrev64q_s8 
> (__ARM_mve_coerce(__p0, int8x16_t)), \
> +  int (*)[__ARM_mve_type_int16x8_t]: __arm_vrev64q_s16 
> (__ARM_mve_coerce(__p0, int16x8_t)), \
> +  int (*)[__ARM_mve_type_int32x4_t]: __arm_vrev64q_s32 
> (__ARM_mve_coerce(__p0, int32x4_t)), \
> +  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vrev64q_u8 
> (__ARM_mve_coerce(__p0, uint8x16_t)), \
> +  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vrev64q_u16 
> (__ARM_mve_coerce(__p0, uint16x8_t)), \
> +  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vrev64q_u32 
> (__ARM_mve_coerce(__p0, uint32x4_t)));})
> +
>  #endif /* MVE Floating point.  */
>
>  #ifdef __cplusplus
> diff --git a/gcc/config/arm/arm_mve_builtins.def 
> b/gcc/config/arm/arm_mve_builtins.def
> index 
> 65dc58c9328525891a0aa0bb97a412ebc8257c18..d205aca28909a224bd4bad103b8a280631661538 
> 100644
> --- a/gcc/config/arm/arm_mve_builtins.def
> +++ b/gcc/config/arm/arm_mve_builtins.def
> @@ -34,3 +34,9 @@ VAR1 (UNOP_NONE_NONE, vcvttq_f32_f16, v4sf)
>  VAR1 (UNOP_NONE_NONE, vcvtbq_f32_f16, v4sf)
>  VAR2 (UNOP_NONE_SNONE, vcvtq_to_f_s, v8hf, v4sf)
>  VAR2 (UNOP_NONE_UNONE, vcvtq_to_f_u, v8hf, v4sf)
> +VAR3 (UNOP_SNONE_SNONE, vrev64q_s, v16qi, v8hi, v4si)
> +VAR2 (UNOP_SNONE_NONE, vcvtq_from_f_s, v8hi, v4si)
> +VAR2 (UNOP_SNONE_IMM, vmvnq_n_s, v8hi, v4si)
> +VAR3 (UNOP_UNONE_UNONE, vrev64q_u, v16qi, v8hi, v4si)
> +VAR2 (UNOP_UNONE_NONE, vcvtq_from_f_u, v8hi, v4si)
> +VAR2 (UNOP_UNONE_IMM, vmvnq_n_u, v8hi, v4si)
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 
> 7a31d0abdfff9a93d79faa1de44d1b224470e2eb..a1dd709a9ffe479cf16a88a5923975f1941531ef 
> 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -22,17 +22,26 @@
>  (define_mode_iterator MVE_types [V16QI V8HI V4SI V2DI TI V8HF V4SF V2DF])
>  (define_mode_iterator MVE_VLD_ST [V16QI V8HI V4SI V8HF V4SF])
>  (define_mode_iterator MVE_0 [V8HF V4SF])
> +(define_mode_iterator MVE_2 [V16QI V8HI V4SI])
> +(define_mode_iterator MVE_5 [V8HI V4SI])
>
>  (define_c_enum "unspec" [VST4Q VRNDXQ_F VRNDQ_F VRNDPQ_F VRNDNQ_F 
> VRNDMQ_F
>                           VRNDAQ_F VREV64Q_F VNEGQ_F VDUPQ_N_F VABSQ_F 
> VREV32Q_F
>                           VCVTTQ_F32_F16 VCVTBQ_F32_F16 VCVTQ_TO_F_S
> -                        VCVTQ_TO_F_U])
> +                        VCVTQ_TO_F_U VMVNQ_N_S VMVNQ_N_U VREV64Q_S 
> VREV64Q_U
> +                        VCVTQ_FROM_F_S VCVTQ_FROM_F_U])
>
>  (define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF")
>                              (V8HF "V8HI") (V4SF "V4SI")])
>
> -(define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u")])
> +(define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") 
> (VMVNQ_N_S "s")
> +                      (VMVNQ_N_U "u") (VREV64Q_U "u") (VREV64Q_S "s")
> +                      (VCVTQ_FROM_F_S "s") (VCVTQ_FROM_F_U "u")])
> +
>  (define_int_iterator VCVTQ_TO_F [VCVTQ_TO_F_S VCVTQ_TO_F_U])
> +(define_int_iterator VMVNQ_N [VMVNQ_N_U VMVNQ_N_S])
> +(define_int_iterator VREV64Q [VREV64Q_S VREV64Q_U])
> +(define_int_iterator VCVTQ_FROM_F [VCVTQ_FROM_F_S VCVTQ_FROM_F_U])
>
>  (define_insn "*mve_mov<mode>"
>    [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
> @@ -318,3 +327,45 @@
> "vcvt.f%#<V_sz_elem>.<supf>%#<V_sz_elem> %q0, %q1"
>    [(set_attr "type" "mve_move")
>  ])
> +
> +;;
> +;; [vrev64q_u, vrev64q_s])
> +;;
> +(define_insn "mve_vrev64q_<supf><mode>"
> +  [
> +   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> +       (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")]
> +        VREV64Q))
> +  ]
> +  "TARGET_HAVE_MVE"
> +  "vrev64.%#<V_sz_elem> %q0, %q1"
> +  [(set_attr "type" "mve_move")
> +])
> +
> +;;
> +;; [vcvtq_from_f_s, vcvtq_from_f_u])
> +;;
> +(define_insn "mve_vcvtq_from_f_<supf><mode>"
> +  [
> +   (set (match_operand:MVE_5 0 "s_register_operand" "=w")
> +       (unspec:MVE_5 [(match_operand:<MVE_CNVT> 1 
> "s_register_operand" "w")]
> +        VCVTQ_FROM_F))
> +  ]
> +  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> + "vcvt.<supf>%#<V_sz_elem>.f%#<V_sz_elem> %q0, %q1"
> +  [(set_attr "type" "mve_move")
> +])
> +
> +;;
> +;; [vmvnq_n_u, vmvnq_n_s])
> +;;
> +(define_insn "mve_vmvnq_n_<supf><mode>"
> +  [
> +   (set (match_operand:MVE_5 0 "s_register_operand" "=w")
> +       (unspec:MVE_5 [(match_operand:SI 1 "immediate_operand" "i")]
> +        VMVNQ_N))
> +  ]


Falling out from my previous comment in the s16/u16 cases the immediate 
should be HImode.

Thanks,

Kyrill


> +  "TARGET_HAVE_MVE"
> +  "vmvn.i%#<V_sz_elem>  %q0, %1"
> +  [(set_attr "type" "mve_move")
> +])
> diff --git 
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_s16_f16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_s16_f16.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..aa69b11b79a15dace81bb5d8112cc5053a6f8dc2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_s16_f16.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +int16x8_t
> +foo (float16x8_t a)
> +{
> +  return vcvtq_s16_f16 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vcvt.s16.f16"  }  } */
> diff --git 
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_s32_f32.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_s32_f32.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..0bfcba6dcf4e240a1de0cba5d98d85c2a529c09e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_s32_f32.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +int32x4_t
> +foo (float32x4_t a)
> +{
> +  return vcvtq_s32_f32 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vcvt.s32.f32"  }  } */
> diff --git 
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_u16_f16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_u16_f16.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..ed36c8082ee464e8878ae7453e04d26e09a87752
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_u16_f16.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +uint16x8_t
> +foo (float16x8_t a)
> +{
> +    return vcvtq_u16_f16 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vcvt.u16.f16"  }  } */
> diff --git 
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_u32_f32.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_u32_f32.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..fbd3989e19c8d832561d6a4265b68d0a87a678b7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_u32_f32.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +uint32x4_t
> +foo (float32x4_t a)
> +{
> +    return vcvtq_u32_f32 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vcvt.u32.f32"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_s16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_s16.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..39c31b4bbe743ed765c9a106778d6c4ba31d14eb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_s16.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +int16x8_t
> +foo ()
> +{
> +  return vmvnq_n_s16 (1);
> +}
> +
> +/* { dg-final { scan-assembler "vmvn.i16"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_s32.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_s32.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..6754cbf8baf11a702543c86d4c048df6bd9699a8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_s32.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +int32x4_t
> +foo ()
> +{
> +  return vmvnq_n_s32 (2);
> +}
> +
> +/* { dg-final { scan-assembler "vmvn.i32"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_u16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_u16.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..b7b12e7476917631927017d9413ef7226fedbe23
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_u16.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +uint16x8_t
> +foo ()
> +{
> +    return vmvnq_n_u16 (1);
> +}
> +
> +/* { dg-final { scan-assembler "vmvn.i16"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_u32.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_u32.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..d5fb831b41ca68d6c4d812051878835b074c47e5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmvnq_n_u32.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +uint32x4_t
> +foo ()
> +{
> +    return vmvnq_n_u32 (2);
> +}
> +
> +/* { dg-final { scan-assembler "vmvn.i32"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_s16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_s16.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..4eda96fefd369781a7639d7c5a9515d02b4b439e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_s16.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +int16x8_t
> +foo (int16x8_t a)
> +{
> +  return vrev64q_s16 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vrev64.16"  }  } */
> +
> +int16x8_t
> +foo1 (int16x8_t a)
> +{
> +  return vrev64q_s16 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vrev64.16"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_s32.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_s32.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..356f162c477e8159da73c9242caff6a545235cc1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_s32.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +int32x4_t
> +foo (int32x4_t a)
> +{
> +  return vrev64q_s32 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vrev64.32"  }  } */
> +
> +int32x4_t
> +foo1 (int32x4_t a)
> +{
> +  return vrev64q_s32 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vrev64.32"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_s8.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_s8.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..5cc4d0750f4d8de85e997247c77d7c076dfb624e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_s8.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +int8x16_t
> +foo (int8x16_t a)
> +{
> +  return vrev64q_s8 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vrev64.8"  }  } */
> +
> +int8x16_t
> +foo1 (int8x16_t a)
> +{
> +  return vrev64q (a);
> +}
> +
> +/* { dg-final { scan-assembler "vrev64.8"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_u16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_u16.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..ae7e3665c54b11e2eee8209ade6030875a201b6b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_u16.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +uint16x8_t
> +foo (uint16x8_t a)
> +{
> +    return vrev64q_u16 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vrev64.16"  }  } */
> +
> +uint16x8_t
> +foo1 (uint16x8_t a)
> +{
> +    return vrev64q (a);
> +}
> +
> +/* { dg-final { scan-assembler "vrev64.16"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_u32.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_u32.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..8c87cab925766ea981ce90cc47b3194bbf0913ff
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_u32.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +uint32x4_t
> +foo (uint32x4_t a)
> +{
> +    return vrev64q_u32 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vrev64.32"  }  } */
> +
> +uint32x4_t
> +foo1 (uint32x4_t a)
> +{
> +    return vrev64q (a);
> +}
> +
> +/* { dg-final { scan-assembler "vrev64.32"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_u8.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_u8.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..c4abd160e61517a4fcc2312c8fcdff1119686da6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrev64q_u8.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +uint8x16_t
> +foo (uint8x16_t a)
> +{
> +    return vrev64q_u8 (a);
> +}
> +
> +/* { dg-final { scan-assembler "vrev64.8"  }  } */
> +
> +uint8x16_t
> +foo1 (uint8x16_t a)
> +{
> +    return vrev64q (a);
> +}
> +
> +/* { dg-final { scan-assembler "vrev64.8"  }  } */
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH][ARM][GCC][4/1x]: MVE intrinsics with unary operand.
  2019-11-14 19:16 ` [PATCH][ARM][GCC][4/1x]: MVE intrinsics with unary operand Srinath Parvathaneni
@ 2019-12-19 18:16   ` Kyrill Tkachov
  0 siblings, 0 replies; 48+ messages in thread
From: Kyrill Tkachov @ 2019-12-19 18:16 UTC (permalink / raw)
  To: Srinath Parvathaneni, gcc-patches; +Cc: Richard Earnshaw

Hi Srinath,

On 11/14/19 7:13 PM, Srinath Parvathaneni wrote:
> Hello,
>
> This patch supports following MVE ACLE intrinsics with unary operand.
>
> vctp16q, vctp32q, vctp64q, vctp8q, vpnot.
>
> Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for 
> more details.
> [1] 
> https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics
>
> There are few conflicts in defining the machine registers, resolved by 
> re-ordering
> VPR_REGNUM, APSRQ_REGNUM and APSRGE_REGNUM.
>
> Regression tested on arm-none-eabi and found no regressions.
>
> Ok for trunk?
>
> Thanks,
> Srinath.
>
> gcc/ChangeLog:
>
> 2019-11-12  Andre Vieira <andre.simoesdiasvieira@arm.com>
>             Mihail Ionescu  <mihail.ionescu@arm.com>
>             Srinath Parvathaneni <srinath.parvathaneni@arm.com>
>
>         * config/arm/arm-builtins.c (hi_UP): Define mode.
>         * config/arm/arm.h (IS_VPR_REGNUM): Move.
>         * config/arm/arm.md (VPR_REGNUM): Define before APSRQ_REGNUM.
>         (APSRQ_REGNUM): Modify.
>         (APSRGE_REGNUM): Modify.
>         * config/arm/arm_mve.h (vctp16q): Define macro.
>         (vctp32q): Likewise.
>         (vctp64q): Likewise.
>         (vctp8q): Likewise.
>         (vpnot): Likewise.
>         (__arm_vctp16q): Define intrinsic.
>         (__arm_vctp32q): Likewise.
>         (__arm_vctp64q): Likewise.
>         (__arm_vctp8q): Likewise.
>         (__arm_vpnot): Likewise.
>         * config/arm/arm_mve_builtins.def (UNOP_UNONE_UNONE): Use builtin
>         qualifier.
>         * config/arm/mve.md (mve_vctp<mode1>qhi): Define RTL pattern.
>         (mve_vpnothi): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 2019-11-12  Andre Vieira <andre.simoesdiasvieira@arm.com>
>             Mihail Ionescu  <mihail.ionescu@arm.com>
>             Srinath Parvathaneni <srinath.parvathaneni@arm.com>
>
>         * gcc.target/arm/mve/intrinsics/vctp16q.c: New test.
>         * gcc.target/arm/mve/intrinsics/vctp32q.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vctp64q.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vctp8q.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vpnot.c: Likewise.
>
>
> ###############     Attachment also inlined for ease of reply    
> ###############
>
>
> diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
> index 
> 21b213d8e1bc99a3946f15e97161e01d73832799..cd82aa159089c288607e240de02a85dcbb134a14 
> 100644
> --- a/gcc/config/arm/arm-builtins.c
> +++ b/gcc/config/arm/arm-builtins.c
> @@ -387,6 +387,7 @@ arm_set_sat_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>  #define oi_UP    E_OImode
>  #define hf_UP    E_HFmode
>  #define si_UP    E_SImode
> +#define hi_UP    E_HImode
>  #define void_UP  E_VOIDmode
>
>  #define UP(X) X##_UP
> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> index 
> 485db72f05f16ca389227289a35c232dc982bf9d..95ec7963a57a1a5652a0a9dc30391a0ce6348242 
> 100644
> --- a/gcc/config/arm/arm.h
> +++ b/gcc/config/arm/arm.h
> @@ -955,6 +955,9 @@ extern int arm_arch_cmse;
>  #define IS_IWMMXT_GR_REGNUM(REGNUM) \
>    (((REGNUM) >= FIRST_IWMMXT_GR_REGNUM) && ((REGNUM) <= 
> LAST_IWMMXT_GR_REGNUM))
>
> +#define IS_VPR_REGNUM(REGNUM) \
> +  ((REGNUM) == VPR_REGNUM)
> +
>  /* Base register for access to local variables of the function.  */
>  #define FRAME_POINTER_REGNUM    102
>
> @@ -999,7 +1002,7 @@ extern int arm_arch_cmse;
>     && (LAST_VFP_REGNUM - (REGNUM) >= 2 * (N) - 1))
>
>  /* The number of hard registers is 16 ARM + 1 CC + 1 SFP + 1 AFP
> -   + 1 APSRQ + 1 APSRGE + 1 VPR.  */
> +   +1 VPR + 1 APSRQ + 1 APSRGE.  */
>  /* Intel Wireless MMX Technology registers add 16 + 4 more.  */
>  /* VFP (VFP3) adds 32 (64) + 1 VFPCC.  */
>  #define FIRST_PSEUDO_REGISTER   107
> @@ -1101,13 +1104,10 @@ extern int arm_regs_in_sequence[];
>    /* Registers not for general use.  */                \
>    CC_REGNUM, VFPCC_REGNUM,                     \
>    FRAME_POINTER_REGNUM, ARG_POINTER_REGNUM,    \
> -  SP_REGNUM, PC_REGNUM, APSRQ_REGNUM, APSRGE_REGNUM,   \
> -  VPR_REGNUM                                   \
> +  SP_REGNUM, PC_REGNUM, VPR_REGNUM, APSRQ_REGNUM,\
> +  APSRGE_REGNUM                                        \
>  }
>
> -#define IS_VPR_REGNUM(REGNUM) \
> -  ((REGNUM) == VPR_REGNUM)
> -
>  /* Use different register alloc ordering for Thumb.  */
>  #define ADJUST_REG_ALLOC_ORDER arm_order_regs_for_local_alloc ()
>
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 
> 689baa0b0ff63ef90f47d2fd844cb98c9a1457a0..2a90482a873f8250a3b2b1dec141669f55e0c58b 
> 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -39,9 +39,9 @@
>     (LAST_ARM_REGNUM  15)       ;
>     (CC_REGNUM       100)       ; Condition code pseudo register
>     (VFPCC_REGNUM    101)       ; VFP Condition code pseudo register
> -   (APSRQ_REGNUM    104)       ; Q bit pseudo register
> -   (APSRGE_REGNUM   105)       ; GE bits pseudo register
> -   (VPR_REGNUM      106)       ; Vector Predication Register - MVE 
> register.
> +   (VPR_REGNUM      104)       ; Vector Predication Register - MVE 
> register.
> +   (APSRQ_REGNUM    105)       ; Q bit pseudo register
> +   (APSRGE_REGNUM   106)       ; GE bits pseudo register
>    ]
>  )
>  ;; 3rd operand to select_dominance_cc_mode
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index 
> 1d357180ba9ddb26347b55cde625903bdb09eef6..c8d9b6471634725cea9bab3f9fa145810b506938 
> 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -192,6 +192,11 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
>  #define vcvtmq_u32_f32(__a) __arm_vcvtmq_u32_f32(__a)
>  #define vcvtaq_u16_f16(__a) __arm_vcvtaq_u16_f16(__a)
>  #define vcvtaq_u32_f32(__a) __arm_vcvtaq_u32_f32(__a)
> +#define vctp16q(__a) __arm_vctp16q(__a)
> +#define vctp32q(__a) __arm_vctp32q(__a)
> +#define vctp64q(__a) __arm_vctp64q(__a)
> +#define vctp8q(__a) __arm_vctp8q(__a)
> +#define vpnot(__a) __arm_vpnot(__a)
>  #endif
>
>  __extension__ extern __inline void
> @@ -703,6 +708,41 @@ __arm_vaddlvq_u32 (uint32x4_t __a)
>    return __builtin_mve_vaddlvq_uv4si (__a);
>  }
>
> +__extension__ extern __inline int64_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vctp16q (uint32_t __a)
> +{
> +  return __builtin_mve_vctp16qhi (__a);
> +}
> +
> +__extension__ extern __inline mve_pred16_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vctp32q (uint32_t __a)
> +{
> +  return __builtin_mve_vctp32qhi (__a);
> +}
> +
> +__extension__ extern __inline mve_pred16_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vctp64q (uint32_t __a)
> +{
> +  return __builtin_mve_vctp64qhi (__a);
> +}
> +
> +__extension__ extern __inline mve_pred16_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vctp8q (uint32_t __a)
> +{
> +  return __builtin_mve_vctp8qhi (__a);
> +}
> +
> +__extension__ extern __inline mve_pred16_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vpnot (mve_pred16_t __a)
> +{
> +  return __builtin_mve_vpnothi (__a);
> +}


Hmm, the spec says that this should generate:

VMSR P0,Rp
VPNOT
VMRS Rt,P0

but __builtin_mve_vpnothi will generate just a VPNOT... what's up with that?

Thanks,

Kyrill


> +
>  #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
>
>  __extension__ extern __inline void
> diff --git a/gcc/config/arm/arm_mve_builtins.def 
> b/gcc/config/arm/arm_mve_builtins.def
> index 
> aafcc2a4bede4e2f9f7845e60f30d4482b7f10bc..816b6dfca7fb221275212ca5f06fc6f679860a38 
> 100644
> --- a/gcc/config/arm/arm_mve_builtins.def
> +++ b/gcc/config/arm/arm_mve_builtins.def
> @@ -1,5 +1,5 @@
>  /*  MVE builtin definitions for Arm.
> -    Copyright  (C) 2019 Free Software Foundation, Inc.
> +    Copyright (C) 2019 Free Software Foundation, Inc.
>      Contributed by Arm.
>
>      This file is part of GCC.
> @@ -71,3 +71,8 @@ VAR2 (UNOP_UNONE_NONE, vcvtaq_u, v8hi, v4si)
>  VAR2 (UNOP_UNONE_IMM, vmvnq_n_u, v8hi, v4si)
>  VAR1 (UNOP_UNONE_UNONE, vrev16q_u, v16qi)
>  VAR1 (UNOP_UNONE_UNONE, vaddlvq_u, v4si)
> +VAR1 (UNOP_UNONE_UNONE, vctp16q, hi)
> +VAR1 (UNOP_UNONE_UNONE, vctp32q, hi)
> +VAR1 (UNOP_UNONE_UNONE, vctp64q, hi)
> +VAR1 (UNOP_UNONE_UNONE, vctp8q, hi)
> +VAR1 (UNOP_UNONE_UNONE, vpnot, hi)
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 
> 9ad53e5655b01c26143a5f46c675658a450e0adf..ee2263e04309e8274ec76ae4478afb7cfba59f8f 
> 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -36,7 +36,7 @@
>                           VREV32Q_U VREV32Q_S VMOVLTQ_U VMOVLTQ_S 
> VMOVLBQ_S
>                           VMOVLBQ_U VCVTQ_FROM_F_S VCVTQ_FROM_F_U VCVTPQ_S
>                           VCVTPQ_U VCVTNQ_S VCVTNQ_U VCVTMQ_S VCVTMQ_U
> -                        VADDLVQ_U])
> +                        VADDLVQ_U VCTP8Q VCTP16Q VCTP32Q VCTP64Q VPNOT])
>
>  (define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF")
>                              (V8HF "V8HI") (V4SF "V4SI")])
> @@ -54,6 +54,9 @@
>                         (VCLZQ_U "u") (VCLZQ_S "s") (VREV32Q_U "u")
>                         (VREV32Q_S "s") (VADDLVQ_U "u") (VADDLVQ_S "s")])
>
> +(define_int_attr mode1 [(VCTP8Q "8") (VCTP16Q "16") (VCTP32Q "32")
> +                       (VCTP64Q "64")])
> +
>  (define_int_iterator VCVTQ_TO_F [VCVTQ_TO_F_S VCVTQ_TO_F_U])
>  (define_int_iterator VMVNQ_N [VMVNQ_N_U VMVNQ_N_S])
>  (define_int_iterator VREV64Q [VREV64Q_S VREV64Q_U])
> @@ -71,6 +74,7 @@
>  (define_int_iterator VCVTNQ [VCVTNQ_S VCVTNQ_U])
>  (define_int_iterator VCVTMQ [VCVTMQ_S VCVTMQ_U])
>  (define_int_iterator VADDLVQ [VADDLVQ_U VADDLVQ_S])
> +(define_int_iterator VCTPQ [VCTP8Q VCTP16Q VCTP32Q VCTP64Q])
>
>  (define_insn "*mve_mov<mode>"
>    [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
> @@ -648,3 +652,31 @@
>    "vaddlv.<supf>32 %Q0, %R0, %q1"
>    [(set_attr "type" "mve_move")
>  ])
> +
> +;;
> +;; [vctp8q vctp16q vctp32q vctp64q])
> +;;
> +(define_insn "mve_vctp<mode1>qhi"
> +  [
> +   (set (match_operand:HI 0 "vpr_register_operand" "=Up")
> +       (unspec:HI [(match_operand:SI 1 "s_register_operand" "r")]
> +       VCTPQ))
> +  ]
> +  "TARGET_HAVE_MVE"
> +  "vctp.<mode1> %1"
> +  [(set_attr "type" "mve_move")
> +])
> +
> +;;
> +;; [vpnot])
> +;;
> +(define_insn "mve_vpnothi"
> +  [
> +   (set (match_operand:HI 0 "s_register_operand" "=r")
> +       (unspec:HI [(match_operand:HI 1 "vpr_register_operand" "Up")]
> +        VPNOT))
> +  ]
> +  "TARGET_HAVE_MVE"
> +  "vpnot"
> +  [(set_attr "type" "mve_move")
> +])
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp16q.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp16q.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..0530d23d6cb6283f19f9122bb85c22714eb71843
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp16q.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +mve_pred16_t
> +foo (uint32_t a)
> +{
> +  return vctp16q (a);
> +}
> +
> +/* { dg-final { scan-assembler "vctp.16"  }  } */
> +
> +mve_pred16_t
> +foo1 (uint32_t a)
> +{
> +  return vctp16q (a);
> +}
> +
> +/* { dg-final { scan-assembler "vctp.16"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp32q.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp32q.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..6e59362345a7b4e77028262c69bee42d230a6f38
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp32q.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +mve_pred16_t
> +foo (uint32_t a)
> +{
> +  return vctp32q (a);
> +}
> +
> +/* { dg-final { scan-assembler "vctp.32"  }  } */
> +
> +mve_pred16_t
> +foo1 (uint32_t a)
> +{
> +  return vctp32q (a);
> +}
> +
> +/* { dg-final { scan-assembler "vctp.32"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp64q.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp64q.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..9a9224ef88de71012373b0f0cebedbade045fccd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp64q.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +mve_pred16_t
> +foo (uint32_t a)
> +{
> +  return vctp64q (a);
> +}
> +
> +/* { dg-final { scan-assembler "vctp.64"  }  } */
> +
> +mve_pred16_t
> +foo1 (uint32_t a)
> +{
> +  return vctp64q (a);
> +}
> +
> +/* { dg-final { scan-assembler "vctp.64"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp8q.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp8q.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..3118d9b60a2eb8c72eb1273445a0b3370b603402
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vctp8q.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +mve_pred16_t
> +foo (uint32_t a)
> +{
> +  return vctp8q (a);
> +}
> +
> +/* { dg-final { scan-assembler "vctp.8"  }  } */
> +
> +mve_pred16_t
> +foo1 (uint32_t a)
> +{
> +  return vctp8q (a);
> +}
> +
> +/* { dg-final { scan-assembler "vctp.8"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vpnot.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vpnot.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..9422d342b9d8bf024c2a95877c61e8dd0bd4ffb1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vpnot.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +mve_pred16_t
> +foo (mve_pred16_t a)
> +{
> +  return vpnot (a);
> +}
> +
> +/* { dg-final { scan-assembler "vpnot"  }  } */
> +
> +mve_pred16_t
> +foo1 (mve_pred16_t a)
> +{
> +  return vpnot (a);
> +}
> +
> +/* { dg-final { scan-assembler "vpnot"  }  } */
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH][ARM][GCC][1/2x]: MVE intrinsics with binary operands.
  2019-11-14 19:14 ` [PATCH][ARM][GCC][1/2x]: " Srinath Parvathaneni
@ 2019-12-19 19:10   ` Kyrill Tkachov
  0 siblings, 0 replies; 48+ messages in thread
From: Kyrill Tkachov @ 2019-12-19 19:10 UTC (permalink / raw)
  To: Srinath Parvathaneni, gcc-patches; +Cc: Richard Earnshaw

Hi Srinath,

On 11/14/19 7:13 PM, Srinath Parvathaneni wrote:
> Hello,
>
> This patch supports following MVE ACLE intrinsics with binary operand.
>
> vsubq_n_f16, vsubq_n_f32, vbrsrq_n_f16, vbrsrq_n_f32, vcvtq_n_f16_s16,
> vcvtq_n_f32_s32, vcvtq_n_f16_u16, vcvtq_n_f32_u32, vcreateq_f16, 
> vcreateq_f32.
>
> Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for 
> more details.
> [1] 
> https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics
>
> In this patch new constraint "Rd" is added, which checks the constant 
> is with in the range of 1 to 16.
> Also a new predicate "mve_imm_16" is added, to check the the matching 
> constraint Rd.
>
> Regression tested on arm-none-eabi and found no regressions.
>
> Ok for trunk?
>
> Thanks,
> Srinath.
>
> gcc/ChangeLog:
>
> 2019-10-21  Andre Vieira <andre.simoesdiasvieira@arm.com>
>             Mihail Ionescu  <mihail.ionescu@arm.com>
>             Srinath Parvathaneni <srinath.parvathaneni@arm.com>
>
>         * config/arm/arm-builtins.c (BINOP_NONE_NONE_NONE_QUALIFIERS): 
> Define
>         qualifier for binary operands.
>         (BINOP_NONE_NONE_IMM_QUALIFIERS): Likewise.
>         (BINOP_NONE_UNONE_IMM_QUALIFIERS): Likewise.
>         (BINOP_NONE_UNONE_UNONE_QUALIFIERS): Likewise.
>         * config/arm/arm_mve.h (vsubq_n_f16): Define macro.
>         (vsubq_n_f32): Likewise.
>         (vbrsrq_n_f16): Likewise.
>         (vbrsrq_n_f32): Likewise.
>         (vcvtq_n_f16_s16): Likewise.
>         (vcvtq_n_f32_s32): Likewise.
>         (vcvtq_n_f16_u16): Likewise.
>         (vcvtq_n_f32_u32): Likewise.
>         (vcreateq_f16): Likewise.
>         (vcreateq_f32): Likewise.
>         (__arm_vsubq_n_f16): Define intrinsic.
>         (__arm_vsubq_n_f32): Likewise.
>         (__arm_vbrsrq_n_f16): Likewise.
>         (__arm_vbrsrq_n_f32): Likewise.
>         (__arm_vcvtq_n_f16_s16): Likewise.
>         (__arm_vcvtq_n_f32_s32): Likewise.
>         (__arm_vcvtq_n_f16_u16): Likewise.
>         (__arm_vcvtq_n_f32_u32): Likewise.
>         (__arm_vcreateq_f16): Likewise.
>         (__arm_vcreateq_f32): Likewise.
>         (vsubq): Define polymorphic variant.
>         (vbrsrq): Likewise.
>         (vcvtq_n): Likewise.
>         * config/arm/arm_mve_builtins.def 
> (BINOP_NONE_NONE_NONE_QUALIFIERS): Use
>         it.
>         (BINOP_NONE_NONE_IMM_QUALIFIERS): Likewise.
>         (BINOP_NONE_UNONE_IMM_QUALIFIERS): Likewise.
>         (BINOP_NONE_UNONE_UNONE_QUALIFIERS): Likewise.
>         * config/arm/constraints.md (Rd): Define constraint to check 
> constant is
>         in the range of 1 to 16.
>         * config/arm/mve.md (mve_vsubq_n_f<mode>): Define RTL pattern.
>         mve_vbrsrq_n_f<mode>: Likewise.
>         mve_vcvtq_n_to_f_<supf><mode>: Likewise.
>         mve_vcreateq_f<mode>: Likewise.
>         * config/arm/predicates.md (mve_imm_16): Define predicate to check
>         the matching constraint Rd.
>
> gcc/testsuite/ChangeLog:
>
> 2019-10-21  Andre Vieira <andre.simoesdiasvieira@arm.com>
>             Mihail Ionescu  <mihail.ionescu@arm.com>
>             Srinath Parvathaneni <srinath.parvathaneni@arm.com>
>
>         * gcc.target/arm/mve/intrinsics/vbrsrq_n_f16.c: New test.
>         * gcc.target/arm/mve/intrinsics/vbrsrq_n_f32.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vcreateq_f16.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vcreateq_f32.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vcvtq_n_f16_s16.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vcvtq_n_f16_u16.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vcvtq_n_f32_s32.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vcvtq_n_f32_u32.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vsubq_n_f16.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vsubq_n_f32.c: Likewise.
>
>
> ###############     Attachment also inlined for ease of reply    
> ###############
>
>
> diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
> index 
> cd82aa159089c288607e240de02a85dcbb134a14..c2dad057d1365914477c64d559aa1fd1c32bbf19 
> 100644
> --- a/gcc/config/arm/arm-builtins.c
> +++ b/gcc/config/arm/arm-builtins.c
> @@ -349,6 +349,30 @@ arm_unop_unone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>  #define UNOP_UNONE_IMM_QUALIFIERS \
>    (arm_unop_unone_imm_qualifiers)
>
> +static enum arm_type_qualifiers
> +arm_binop_none_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
> +  = { qualifier_none, qualifier_none, qualifier_none };
> +#define BINOP_NONE_NONE_NONE_QUALIFIERS \
> +  (arm_binop_none_none_none_qualifiers)
> +
> +static enum arm_type_qualifiers
> +arm_binop_none_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
> +  = { qualifier_none, qualifier_none, qualifier_immediate };
> +#define BINOP_NONE_NONE_IMM_QUALIFIERS \
> +  (arm_binop_none_none_imm_qualifiers)
> +
> +static enum arm_type_qualifiers
> +arm_binop_none_unone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
> +  = { qualifier_none, qualifier_unsigned, qualifier_immediate };
> +#define BINOP_NONE_UNONE_IMM_QUALIFIERS \
> +  (arm_binop_none_unone_imm_qualifiers)
> +
> +static enum arm_type_qualifiers
> +arm_binop_none_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
> +  = { qualifier_none, qualifier_unsigned, qualifier_unsigned };
> +#define BINOP_NONE_UNONE_UNONE_QUALIFIERS \
> +  (arm_binop_none_unone_unone_qualifiers)
> +
>  /* End of Qualifier for MVE builtins.  */
>
>     /* void ([T element type] *, T, immediate).  */
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index 
> c8d9b6471634725cea9bab3f9fa145810b506938..15b7ada025fe57b682f3873f0f275c43a30d1273 
> 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -197,6 +197,16 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
>  #define vctp64q(__a) __arm_vctp64q(__a)
>  #define vctp8q(__a) __arm_vctp8q(__a)
>  #define vpnot(__a) __arm_vpnot(__a)
> +#define vsubq_n_f16(__a, __b) __arm_vsubq_n_f16(__a, __b)
> +#define vsubq_n_f32(__a, __b) __arm_vsubq_n_f32(__a, __b)
> +#define vbrsrq_n_f16(__a, __b) __arm_vbrsrq_n_f16(__a, __b)
> +#define vbrsrq_n_f32(__a, __b) __arm_vbrsrq_n_f32(__a, __b)
> +#define vcvtq_n_f16_s16(__a,  __imm6) __arm_vcvtq_n_f16_s16(__a,  __imm6)
> +#define vcvtq_n_f32_s32(__a,  __imm6) __arm_vcvtq_n_f32_s32(__a,  __imm6)
> +#define vcvtq_n_f16_u16(__a,  __imm6) __arm_vcvtq_n_f16_u16(__a,  __imm6)
> +#define vcvtq_n_f32_u32(__a,  __imm6) __arm_vcvtq_n_f32_u32(__a,  __imm6)
> +#define vcreateq_f16(__a, __b) __arm_vcreateq_f16(__a, __b)
> +#define vcreateq_f32(__a, __b) __arm_vcreateq_f32(__a, __b)
>  #endif
>
>  __extension__ extern __inline void
> @@ -1085,6 +1095,76 @@ __arm_vcvtmq_s32_f32 (float32x4_t __a)
>    return __builtin_mve_vcvtmq_sv4si (__a);
>  }
>
> +__extension__ extern __inline float16x8_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vsubq_n_f16 (float16x8_t __a, float16_t __b)
> +{
> +  return __builtin_mve_vsubq_n_fv8hf (__a, __b);
> +}
> +
> +__extension__ extern __inline float32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vsubq_n_f32 (float32x4_t __a, float32_t __b)
> +{
> +  return __builtin_mve_vsubq_n_fv4sf (__a, __b);
> +}
> +
> +__extension__ extern __inline float16x8_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vbrsrq_n_f16 (float16x8_t __a, int32_t __b)
> +{
> +  return __builtin_mve_vbrsrq_n_fv8hf (__a, __b);
> +}
> +
> +__extension__ extern __inline float32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vbrsrq_n_f32 (float32x4_t __a, int32_t __b)
> +{
> +  return __builtin_mve_vbrsrq_n_fv4sf (__a, __b);
> +}
> +
> +__extension__ extern __inline float16x8_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vcvtq_n_f16_s16 (int16x8_t __a, const int __imm6)
> +{
> +  return __builtin_mve_vcvtq_n_to_f_sv8hf (__a, __imm6);
> +}
> +
> +__extension__ extern __inline float32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vcvtq_n_f32_s32 (int32x4_t __a, const int __imm6)
> +{
> +  return __builtin_mve_vcvtq_n_to_f_sv4sf (__a, __imm6);
> +}
> +
> +__extension__ extern __inline float16x8_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vcvtq_n_f16_u16 (uint16x8_t __a, const int __imm6)
> +{
> +  return __builtin_mve_vcvtq_n_to_f_uv8hf (__a, __imm6);
> +}
> +
> +__extension__ extern __inline float32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vcvtq_n_f32_u32 (uint32x4_t __a, const int __imm6)
> +{
> +  return __builtin_mve_vcvtq_n_to_f_uv4sf (__a, __imm6);
> +}
> +
> +__extension__ extern __inline float16x8_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vcreateq_f16 (uint64_t __a, uint64_t __b)
> +{
> +  return __builtin_mve_vcreateq_fv8hf (__a, __b);
> +}
> +
> +__extension__ extern __inline float32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vcreateq_f32 (uint64_t __a, uint64_t __b)
> +{
> +  return __builtin_mve_vcreateq_fv4sf (__a, __b);
> +}
> +
>  #endif
>
>  enum {
> @@ -1373,6 +1453,27 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint16x8_t]: __arm_vcvtq_f16_u16 
> (__ARM_mve_coerce(__p0, uint16x8_t)), \
>    int (*)[__ARM_mve_type_uint32x4_t]: __arm_vcvtq_f32_u32 
> (__ARM_mve_coerce(__p0, uint32x4_t)));})
>
> +#define vsubq(p0,p1) __arm_vsubq(p0,p1)
> +#define __arm_vsubq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
> +  __typeof(p1) __p1 = (p1); \
> +  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> +  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16_t]: 
> __arm_vsubq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), 
> __ARM_mve_coerce(__p1, float16_t)), \
> +  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32_t]: 
> __arm_vsubq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), 
> __ARM_mve_coerce(__p1, float32_t)));})
> +
> +#define vbrsrq(p0,p1) __arm_vbrsrq(p0,p1)
> +#define __arm_vbrsrq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
> +  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> +  int (*)[__ARM_mve_type_float16x8_t]: __arm_vbrsrq_n_f16 
> (__ARM_mve_coerce(__p0, float16x8_t), p1), \
> +  int (*)[__ARM_mve_type_float32x4_t]: __arm_vbrsrq_n_f32 
> (__ARM_mve_coerce(__p0, float32x4_t), p1));})
> +
> +#define vcvtq_n(p0,p1) __arm_vcvtq_n(p0,p1)
> +#define __arm_vcvtq_n(p0,p1) ({ __typeof(p0) __p0 = (p0); \
> +  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> +  int (*)[__ARM_mve_type_int16x8_t]: __arm_vcvtq_n_f16_s16 
> (__ARM_mve_coerce(__p0, int16x8_t), p1), \
> +  int (*)[__ARM_mve_type_int32x4_t]: __arm_vcvtq_n_f32_s32 
> (__ARM_mve_coerce(__p0, int32x4_t), p1), \
> +  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vcvtq_n_f16_u16 
> (__ARM_mve_coerce(__p0, uint16x8_t), p1), \
> +  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vcvtq_n_f32_u32 
> (__ARM_mve_coerce(__p0, uint32x4_t), p1));})
> +
>  #else /* MVE Interger.  */
>
>  #define vst4q(p0,p1) __arm_vst4q(p0,p1)
> diff --git a/gcc/config/arm/arm_mve_builtins.def 
> b/gcc/config/arm/arm_mve_builtins.def
> index 
> 816b6dfca7fb221275212ca5f06fc6f679860a38..8d1e4fac3d75e87fbe334e64e1073cb1fef0d96d 
> 100644
> --- a/gcc/config/arm/arm_mve_builtins.def
> +++ b/gcc/config/arm/arm_mve_builtins.def
> @@ -6,7 +6,7 @@
>
>      GCC is free software; you can redistribute it and/or modify it
>      under the terms of the GNU General Public License as published
> -    by the Free Software Foundation; either version 3, or (at your
> +    by the Free Software Foundation; either version 3, or (at your
>      option) any later version.
>
>      GCC is distributed in the hope that it will be useful, but WITHOUT
> @@ -76,3 +76,8 @@ VAR1 (UNOP_UNONE_UNONE, vctp32q, hi)
>  VAR1 (UNOP_UNONE_UNONE, vctp64q, hi)
>  VAR1 (UNOP_UNONE_UNONE, vctp8q, hi)
>  VAR1 (UNOP_UNONE_UNONE, vpnot, hi)
> +VAR2 (BINOP_NONE_NONE_NONE, vsubq_n_f, v8hf, v4sf)
> +VAR2 (BINOP_NONE_NONE_NONE, vbrsrq_n_f, v8hf, v4sf)
> +VAR2 (BINOP_NONE_NONE_IMM, vcvtq_n_to_f_s, v8hf, v4sf)
> +VAR2 (BINOP_NONE_UNONE_IMM, vcvtq_n_to_f_u, v8hf, v4sf)
> +VAR2 (BINOP_NONE_UNONE_UNONE, vcreateq_f, v8hf, v4sf)
> diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
> index 
> 4c193d20a48d5e9ed43bdca76468381fc17682da..86cd82334533acbe58be4a0c547cf38737d2bb84 
> 100644
> --- a/gcc/config/arm/constraints.md
> +++ b/gcc/config/arm/constraints.md
> @@ -35,7 +35,7 @@
>  ;; in ARM/Thumb-2 state: Da, Db, Dc, Dd, Dn, DN, Dm, Dl, DL, Do, Dv, 
> Dy, Di,
>  ;;                       Dt, Dp, Dz, Tu
>  ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
> -;; in Thumb-2 state: Ha, Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py, Pz
> +;; in Thumb-2 state: Ha, Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py, Pz, Rd
>  ;; in all states: Pf, Pg, UM, U1
>
>  ;; The following memory constraints have been used:
> @@ -51,6 +51,11 @@
>    "MVE EVEN registers @code{r0}, @code{r2}, @code{r4}, @code{r6}, 
> @code{r8},
>     @code{r10}, @code{r12}, @code{r14}")
>
> +(define_constraint "Rd"
> +  "@internal In Thumb-2 state a constant in range 1 to 16"
> +  (and (match_code "const_int")
> +       (match_test "TARGET_HAVE_MVE && ival >= 1 && ival <= 16")))
> +
>  (define_register_constraint "t" "TARGET_32BIT ? VFP_LO_REGS : NO_REGS"
>   "The VFP registers @code{s0}-@code{s31}.")
>
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 
> ee2263e04309e8274ec76ae4478afb7cfba59f8f..06701b024dabae85446bb60060a8b331e540cd6d 
> 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -36,7 +36,9 @@
>                           VREV32Q_U VREV32Q_S VMOVLTQ_U VMOVLTQ_S 
> VMOVLBQ_S
>                           VMOVLBQ_U VCVTQ_FROM_F_S VCVTQ_FROM_F_U VCVTPQ_S
>                           VCVTPQ_U VCVTNQ_S VCVTNQ_U VCVTMQ_S VCVTMQ_U
> -                        VADDLVQ_U VCTP8Q VCTP16Q VCTP32Q VCTP64Q VPNOT])
> +                        VADDLVQ_U VCTP8Q VCTP16Q VCTP32Q VCTP64Q VPNOT
> +                        VCREATEQ_F VCVTQ_N_TO_F_S VCVTQ_N_TO_F_U 
> VBRSRQ_N_F
> +                        VSUBQ_N_F])
>
>  (define_mode_attr MVE_CNVT [(V8HI "V8HF") (V4SI "V4SF")
>                              (V8HF "V8HI") (V4SF "V4SI")])
> @@ -52,7 +54,8 @@
>                         (VCVTPQ_S "s") (VCVTPQ_U "u") (VCVTNQ_S "s")
>                         (VCVTNQ_U "u") (VCVTMQ_S "s") (VCVTMQ_U "u")
>                         (VCLZQ_U "u") (VCLZQ_S "s") (VREV32Q_U "u")
> -                      (VREV32Q_S "s") (VADDLVQ_U "u") (VADDLVQ_S "s")])
> +                      (VREV32Q_S "s") (VADDLVQ_U "u") (VADDLVQ_S "s")
> +                      (VCVTQ_N_TO_F_S "s") (VCVTQ_N_TO_F_U "u")])
>
>  (define_int_attr mode1 [(VCTP8Q "8") (VCTP16Q "16") (VCTP32Q "32")
>                          (VCTP64Q "64")])
> @@ -75,6 +78,7 @@
>  (define_int_iterator VCVTMQ [VCVTMQ_S VCVTMQ_U])
>  (define_int_iterator VADDLVQ [VADDLVQ_U VADDLVQ_S])
>  (define_int_iterator VCTPQ [VCTP8Q VCTP16Q VCTP32Q VCTP64Q])
> +(define_int_iterator VCVTQ_N_TO_F [VCVTQ_N_TO_F_S VCVTQ_N_TO_F_U])
>
>  (define_insn "*mve_mov<mode>"
>    [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
> @@ -680,3 +684,62 @@
>    "vpnot"
>    [(set_attr "type" "mve_move")
>  ])
> +
> +;;
> +;; [vsubq_n_f])
> +;;
> +(define_insn "mve_vsubq_n_f<mode>"
> +  [
> +   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
> +       (unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
> +                      (match_operand:<V_elem> 2 "s_register_operand" 
> "r")]
> +        VSUBQ_N_F))
> +  ]
> +  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> +  "vsub.f%#<V_sz_elem>  %q0, %q1, %2"


What's with the "f%#<V_sz_elem>" ?

I think just "f<V_sz_elem>" should work here, and in the rest of the patch.

Ok with that fixed.

Thanks,

Kyrill


> +  [(set_attr "type" "mve_move")
> +])
> +
> +;;
> +;; [vbrsrq_n_f])
> +;;
> +(define_insn "mve_vbrsrq_n_f<mode>"
> +  [
> +   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
> +       (unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
> +                      (match_operand:SI 2 "s_register_operand" "r")]
> +        VBRSRQ_N_F))
> +  ]
> +  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> +  "vbrsr.%#<V_sz_elem>  %q0, %q1, %2"
> +  [(set_attr "type" "mve_move")
> +])
> +
> +;;
> +;; [vcvtq_n_to_f_s, vcvtq_n_to_f_u])
> +;;
> +(define_insn "mve_vcvtq_n_to_f_<supf><mode>"
> +  [
> +   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
> +       (unspec:MVE_0 [(match_operand:<MVE_CNVT> 1 
> "s_register_operand" "w")
> +                      (match_operand:SI 2 "mve_imm_16" "Rd")]
> +        VCVTQ_N_TO_F))
> +  ]
> +  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> + "vcvt.f%#<V_sz_elem>.<supf>%#<V_sz_elem>\t%q0, %q1, %2"
> +  [(set_attr "type" "mve_move")
> +])
> +
> +;; [vcreateq_f])
> +;;
> +(define_insn "mve_vcreateq_f<mode>"
> +  [
> +   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
> +       (unspec:MVE_0 [(match_operand:DI 1 "s_register_operand" "r")
> +                      (match_operand:DI 2 "s_register_operand" "r")]
> +        VCREATEQ_F))
> +  ]
> +  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> +  "vmov %q0[2], %q0[0], %Q2, %Q1\;vmov %q0[3], %q0[1], %R2, %R1"
> +  [(set_attr "type" "mve_move")
> +   (set_attr "length""8")])
> diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
> index 
> 9d74165fe065b03c77918fe9e4611967799535f1..80a44bb32cbce1ee3f850a44b70a0e4ceed548be 
> 100644
> --- a/gcc/config/arm/predicates.md
> +++ b/gcc/config/arm/predicates.md
> @@ -31,6 +31,10 @@
>                || REGNO_REG_CLASS (REGNO (op)) != NO_REGS));
>  })
>
> +;; True for immediates in the range of 1 to 16 for MVE.
> +(define_predicate "mve_imm_16"
> +  (match_test "satisfies_constraint_Rd (op)"))
> +
>  ; Predicate for stack protector guard's address in
>  ; stack_protect_combined_set_insn and 
> stack_protect_combined_test_insn patterns
>  (define_predicate "guard_addr_operand"
> diff --git 
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vbrsrq_n_f16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vbrsrq_n_f16.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..90f52b50e6bd0dbccc40e9da7e2f8badb593727d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vbrsrq_n_f16.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float16x8_t
> +foo (float16x8_t a, int32_t b)
> +{
> +  return vbrsrq_n_f16 (a, b);
> +}
> +
> +/* { dg-final { scan-assembler "vbrsr.16"  }  } */
> +
> +float16x8_t
> +foo1 (float16x8_t a, int32_t b)
> +{
> +  return vbrsrq (a, b);
> +}
> +
> +/* { dg-final { scan-assembler "vbrsr.16"  }  } */
> diff --git 
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vbrsrq_n_f32.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vbrsrq_n_f32.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..4ccd8dd313319ce1933e85f76955d1ea12952abd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vbrsrq_n_f32.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float32x4_t
> +foo (float32x4_t a, int32_t b)
> +{
> +  return vbrsrq_n_f32 (a, b);
> +}
> +
> +/* { dg-final { scan-assembler "vbrsr.32"  }  } */
> +
> +float32x4_t
> +foo1 (float32x4_t a, int32_t b)
> +{
> +  return vbrsrq (a, b);
> +}
> +
> +/* { dg-final { scan-assembler "vbrsr.32"  }  } */
> diff --git 
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f16.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..a83203ad0de1e6bfc6b4b9d469748108a335a391
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f16.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float16x8_t
> +foo (uint64_t a, uint64_t b)
> +{
> +  return vcreateq_f16 (a, b);
> +}
> +
> +/* { dg-final { scan-assembler "vmov"  }  } */
> diff --git 
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f32.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f32.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..279e922ed45165d0b104b236e4a2c8f782c4a59d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f32.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float32x4_t
> +foo (uint64_t a, uint64_t b)
> +{
> +  return vcreateq_f32 (a, b);
> +}
> +
> +/* { dg-final { scan-assembler "vmov"  }  } */
> diff --git 
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f16_s16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f16_s16.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..440d4df99b295deb66b91e3996cbca8f35f7d12b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f16_s16.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float16x8_t
> +foo (int16x8_t a)
> +{
> +  return vcvtq_n_f16_s16 (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler "vcvt.f16.s16"  }  } */
> +
> +float16x8_t
> +foo1 (int16x8_t a)
> +{
> +  return vcvtq_n (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler "vcvt.f16.s16"  }  } */
> diff --git 
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f16_u16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f16_u16.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..4d648d653ef91436bb87d2e07e15ccee0844309e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f16_u16.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float16x8_t
> +foo (uint16x8_t a)
> +{
> +  return vcvtq_n_f16_u16 (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler "vcvt.f16.u16"  }  } */
> +
> +float16x8_t
> +foo1 (uint16x8_t a)
> +{
> +  return vcvtq_n (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler "vcvt.f16.u16"  }  } */
> diff --git 
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f32_s32.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f32_s32.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..836727a9b2d6a2621b34de5b7d4643ee1bebf47c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f32_s32.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float32x4_t
> +foo (int32x4_t a)
> +{
> +  return vcvtq_n_f32_s32 (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler "vcvt.f32.s32"  }  } */
> +
> +float32x4_t
> +foo1 (int32x4_t a)
> +{
> +  return vcvtq_n (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler "vcvt.f32.s32"  }  } */
> diff --git 
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f32_u32.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f32_u32.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..fcd66a42fd49c78e1c11ac89393dc38cb06efbf5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtq_n_f32_u32.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float32x4_t
> +foo (uint32x4_t a)
> +{
> +  return vcvtq_n_f32_u32 (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler "vcvt.f32.u32"  }  } */
> +
> +float32x4_t
> +foo1 (uint32x4_t a)
> +{
> +  return vcvtq_n (a, 1);
> +}
> +
> +/* { dg-final { scan-assembler "vcvt.f32.u32"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f16.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..e6ff11edf3933de7acd7bdffd33d8b79c88a7f48
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f16.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float16x8_t
> +foo (float16x8_t a, float16_t b)
> +{
> +  return vsubq_n_f16 (a, b);
> +}
> +
> +/* { dg-final { scan-assembler "vsub.f16"  }  } */
> +
> +float16x8_t
> +foo1 (float16x8_t a, float16_t b)
> +{
> +  return vsubq (a, b);
> +}
> +
> +/* { dg-final { scan-assembler "vsub.f16"  }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f32.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f32.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..1092fd0aa6abc2a0ed0320a17d6b563f0396fb58
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsubq_n_f32.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile  } */
> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 
> -mfloat-abi=hard -O2"  }  */
> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} 
> } */
> +
> +#include "arm_mve.h"
> +
> +float32x4_t
> +foo (float32x4_t a, float32_t b)
> +{
> +  return vsubq_n_f32 (a, b);
> +}
> +
> +/* { dg-final { scan-assembler "vsub.f32"  }  } */
> +
> +float32x4_t
> +foo1 (float32x4_t a, float32_t b)
> +{
> +  return vsubq (a, b);
> +}
> +
> +/* { dg-final { scan-assembler "vsub.f32"  }  } */
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2019-12-19 18:16 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-14 19:34 [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Srinath Parvathaneni
2019-11-14 19:13 ` [PATCH][ARM][GCC][2/2x]: MVE intrinsics with binary operands Srinath Parvathaneni
2019-11-14 19:13 ` [PATCH][ARM][GCC][3/x]: MVE ACLE intrinsics framework patch Srinath Parvathaneni
2019-12-19 17:50   ` Kyrill Tkachov
2019-11-14 19:13 ` [PATCH][ARM][GCC][4/x]: MVE ACLE vector interleaving store intrinsics Srinath Parvathaneni
2019-12-19 17:39   ` Kyrill Tkachov
2019-11-14 19:13 ` [PATCH][ARM][GCC][2/x]: MVE ACLE intrinsics framework patch Srinath Parvathaneni
2019-12-19 17:24   ` Kyrill Tkachov
2019-11-14 19:13 ` [PATCH][ARM][GCC][3/1x]: MVE intrinsics with unary operand Srinath Parvathaneni
2019-11-14 19:13 ` [PATCH][ARM][GCC][2/1x]: " Srinath Parvathaneni
2019-12-19 18:05   ` Kyrill Tkachov
2019-11-14 19:13 ` [PATCH][ARM][GCC][3/3x]: MVE intrinsics with ternary operands Srinath Parvathaneni
2019-11-14 19:13 ` [PATCH][ARM][GCC][5/2x]: MVE intrinsics with binary operands Srinath Parvathaneni
2019-11-14 19:13 ` [PATCH][ARM][GCC][3/4x]: MVE intrinsics with quaternary operands Srinath Parvathaneni
2019-11-14 19:13 ` [PATCH][ARM][GCC][3/2x]: MVE intrinsics with binary operands Srinath Parvathaneni
2019-11-14 19:14 ` [PATCH][ARM][GCC][1/2x]: " Srinath Parvathaneni
2019-12-19 19:10   ` Kyrill Tkachov
2019-11-14 19:14 ` [PATCH][ARM][GCC][1/4x]: MVE intrinsics with quaternary operands Srinath Parvathaneni
2019-11-14 19:15 ` [PATCH][ARM][GCC][4/2x]: MVE intrinsics with binary operands Srinath Parvathaneni
2019-11-14 19:15 ` [PATCH][ARM][GCC][4/5x]: MVE load intrinsics with zero(_z) suffix Srinath Parvathaneni
2019-11-14 19:15 ` [PATCH][ARM][GCC][2/4x]: MVE intrinsics with quaternary operands Srinath Parvathaneni
2019-11-14 19:15 ` [PATCH][ARM][GCC][7/5x]: MVE store intrinsics which stores byte,half word or word to memory Srinath Parvathaneni
2019-11-14 19:15 ` [PATCH][ARM][GCC][8/5x]: Remaining MVE store intrinsics which stores an half word, word and double " Srinath Parvathaneni
2019-11-14 19:15 ` [PATCH][ARM][GCC][1/5x]: MVE store intrinsics Srinath Parvathaneni
2019-11-14 19:15 ` [PATCH][ARM][GCC][2/5x]: MVE load intrinsics Srinath Parvathaneni
2019-11-14 19:15 ` [PATCH][ARM][GCC][3/5x]: MVE store intrinsics with predicated suffix Srinath Parvathaneni
2019-11-14 19:16 ` [PATCH][ARM][GCC][13x]: MVE ACLE scalar shift intrinsics Srinath Parvathaneni
2019-11-14 19:16 ` [PATCH][ARM][GCC][2/8x]: MVE ACLE gather load and scatter store intrinsics with writeback Srinath Parvathaneni
2019-11-14 19:16 ` [PATCH][ARM][GCC][7x]: MVE vreinterpretq and vuninitializedq intrinsics Srinath Parvathaneni
2019-11-14 19:16 ` [PATCH][ARM][GCC][14x]: MVE ACLE whole vector left shift with carry intrinsics Srinath Parvathaneni
2019-11-14 19:16 ` [PATCH][ARM][GCC][5/5x]: MVE ACLE load intrinsics which load a byte, halfword, or word from memory Srinath Parvathaneni
2019-11-14 19:16 ` [PATCH][ARM][GCC][12x]: MVE ACLE intrinsics to set and get vector lane Srinath Parvathaneni
2019-11-14 19:16 ` [PATCH][ARM][GCC][6/5x]: Remaining MVE load intrinsics which loads half word and word or double word from memory Srinath Parvathaneni
2019-11-14 19:16 ` [PATCH][ARM][GCC][1/x]: MVE ACLE intrinsics framework patch Srinath Parvathaneni
2019-12-18 17:18   ` Kyrill Tkachov
2019-11-14 19:16 ` [PATCH][ARM][GCC][1/1x]: Patch to support MVE ACLE intrinsics with unary operand Srinath Parvathaneni
2019-12-19 17:57   ` Kyrill Tkachov
2019-11-14 19:16 ` [PATCH][ARM][GCC][9x]: MVE ACLE predicated intrinsics with (dont-care) variant Srinath Parvathaneni
2019-11-14 19:16 ` [PATCH][ARM][GCC][4/1x]: MVE intrinsics with unary operand Srinath Parvathaneni
2019-12-19 18:16   ` Kyrill Tkachov
2019-11-14 19:16 ` [PATCH][ARM][GCC][1/8x]: MVE ACLE vidup, vddup, viwdup and vdwdup intrinsics with writeback Srinath Parvathaneni
2019-11-14 19:16 ` [PATCH][ARM][GCC][10x]: MVE ACLE intrinsics "add with carry across beats" and "beat-wise substract" Srinath Parvathaneni
2019-11-14 19:16 ` [PATCH][ARM][GCC][6x]:MVE ACLE vaddq intrinsics using arithmetic plus operator Srinath Parvathaneni
2019-11-14 19:16 ` [PATCH][ARM][GCC][11x]: MVE ACLE vector interleaving store and deinterleaving load intrinsics and also aliases to vstr and vldr intrinsics Srinath Parvathaneni
2019-11-14 19:16 ` [PATCH][ARM][GCC][4/4x]: MVE intrinsics with quaternary operands Srinath Parvathaneni
2019-11-14 19:16 ` [PATCH][ARM][GCC][2/3x]: MVE intrinsics with ternary operands Srinath Parvathaneni
2019-11-14 19:27 ` [PATCH][ARM][GCC][1/3x]: " Srinath Parvathaneni
2019-12-12 16:09 ` [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics Kyrill Tkachov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).