public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH 1/6] arm: [MVE intrinsics] Factorize vcaddq vhcaddq
@ 2023-07-13 10:22 Christophe Lyon
  2023-07-13 10:22 ` [PATCH 2/6] arm: [MVE intrinsics] rework " Christophe Lyon
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Christophe Lyon @ 2023-07-13 10:22 UTC (permalink / raw)
  To: gcc-patches, Kyrylo.Tkachov, richard.earnshaw, richard.sandiford
  Cc: Christophe Lyon

Factorize vcaddq, vhcaddq so that they use the same parameterized
names.

To be able to use the same patterns, we add a suffix to vcaddq.

Note that vcadd uses UNSPEC_VCADDxx for builtins without predication,
and VCADDQ_ROTxx_M_x (that is, not starting with "UNSPEC_").  The
UNPEC_* names are also used by neon.md

2023-07-13  Christophe Lyon  <christophe.lyon@linaro.org>

	gcc/
	* config/arm/arm_mve_builtins.def (vcaddq_rot90_, vcaddq_rot270_)
	(vcaddq_rot90_f, vcaddq_rot90_f): Add "_" or "_f" suffix.
	* config/arm/iterators.md (mve_insn): Add vcadd, vhcadd.
	(isu): Add UNSPEC_VCADD90, UNSPEC_VCADD270, VCADDQ_ROT270_M_U,
	VCADDQ_ROT270_M_S, VCADDQ_ROT90_M_U, VCADDQ_ROT90_M_S,
	VHCADDQ_ROT90_M_S, VHCADDQ_ROT270_M_S, VHCADDQ_ROT90_S,
	VHCADDQ_ROT270_S.
	(rot): Add VCADDQ_ROT90_M_F, VCADDQ_ROT90_M_S, VCADDQ_ROT90_M_U,
	VCADDQ_ROT270_M_F, VCADDQ_ROT270_M_S, VCADDQ_ROT270_M_U,
	VHCADDQ_ROT90_S, VHCADDQ_ROT270_S, VHCADDQ_ROT90_M_S,
	VHCADDQ_ROT270_M_S.
	(mve_rot): Add VCADDQ_ROT90_M_F, VCADDQ_ROT90_M_S,
	VCADDQ_ROT90_M_U, VCADDQ_ROT270_M_F, VCADDQ_ROT270_M_S,
	VCADDQ_ROT270_M_U, VHCADDQ_ROT90_S, VHCADDQ_ROT270_S,
	VHCADDQ_ROT90_M_S, VHCADDQ_ROT270_M_S.
	(supf): Add VHCADDQ_ROT90_M_S, VHCADDQ_ROT270_M_S,
	VHCADDQ_ROT90_S, VHCADDQ_ROT270_S, UNSPEC_VCADD90,
	UNSPEC_VCADD270.
	(VCADDQ_ROT270_M): Delete.
	(VCADDQ_M_F VxCADDQ VxCADDQ_M): New.
	(VCADDQ_ROT90_M): Delete.
	* config/arm/mve.md (mve_vcaddq<mve_rot><mode>)
	(mve_vhcaddq_rot270_s<mode>, mve_vhcaddq_rot90_s<mode>): Merge
	into ...
	(@mve_<mve_insn>q<mve_rot>_<supf><mode>): ... this.
	(mve_vcaddq<mve_rot><mode>): Rename into ...
	(@mve_<mve_insn>q<mve_rot>_f<mode>): ... this
	(mve_vcaddq_rot270_m_<supf><mode>)
	(mve_vcaddq_rot90_m_<supf><mode>, mve_vhcaddq_rot270_m_s<mode>)
	(mve_vhcaddq_rot90_m_s<mode>): Merge into ...
	(@mve_<mve_insn>q<mve_rot>_m_<supf><mode>): ... this.
	(mve_vcaddq_rot270_m_f<mode>, mve_vcaddq_rot90_m_f<mode>): Merge
	into ...
	(@mve_<mve_insn>q<mve_rot>_m_f<mode>): ... this.
---
 gcc/config/arm/arm_mve_builtins.def |   6 +-
 gcc/config/arm/iterators.md         |  38 +++++++-
 gcc/config/arm/mve.md               | 135 +++++-----------------------
 3 files changed, 62 insertions(+), 117 deletions(-)

diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index 8de765de3b0..63ad1845593 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -187,6 +187,10 @@ VAR3 (BINOP_NONE_NONE_NONE, vmaxvq_s, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_NONE, vmaxq_s, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_NONE, vhsubq_s, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_NONE, vhsubq_n_s, v16qi, v8hi, v4si)
+VAR3 (BINOP_NONE_NONE_NONE, vcaddq_rot90_, v16qi, v8hi, v4si)
+VAR3 (BINOP_NONE_NONE_NONE, vcaddq_rot270_, v16qi, v8hi, v4si)
+VAR2 (BINOP_NONE_NONE_NONE, vcaddq_rot90_f, v8hf, v4sf)
+VAR2 (BINOP_NONE_NONE_NONE, vcaddq_rot270_f, v8hf, v4sf)
 VAR3 (BINOP_NONE_NONE_NONE, vhcaddq_rot90_s, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_NONE, vhcaddq_rot270_s, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_NONE, vhaddq_s, v16qi, v8hi, v4si)
@@ -870,8 +874,6 @@ VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vshlcq_m_vec_u, v16qi, v8hi, v4si)
 VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vshlcq_m_carry_u, v16qi, v8hi, v4si)
 
 /* optabs without any suffixes.  */
-VAR5 (BINOP_NONE_NONE_NONE, vcaddq_rot90, v16qi, v8hi, v4si, v8hf, v4sf)
-VAR5 (BINOP_NONE_NONE_NONE, vcaddq_rot270, v16qi, v8hi, v4si, v8hf, v4sf)
 VAR2 (BINOP_NONE_NONE_NONE, vcmulq_rot90, v8hf, v4sf)
 VAR2 (BINOP_NONE_NONE_NONE, vcmulq_rot270, v8hf, v4sf)
 VAR2 (BINOP_NONE_NONE_NONE, vcmulq_rot180, v8hf, v4sf)
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 9e77af55d60..da1ead34e58 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -902,6 +902,7 @@
 		     ])
 
 (define_int_attr mve_insn [
+		 (UNSPEC_VCADD90 "vcadd") (UNSPEC_VCADD270 "vcadd")
 		 (VABAVQ_P_S "vabav") (VABAVQ_P_U "vabav")
 		 (VABAVQ_S "vabav") (VABAVQ_U "vabav")
 		 (VABDQ_M_S "vabd") (VABDQ_M_U "vabd") (VABDQ_M_F "vabd")
@@ -925,6 +926,8 @@
 		 (VBICQ_N_S "vbic") (VBICQ_N_U "vbic")
 		 (VBRSRQ_M_N_S "vbrsr") (VBRSRQ_M_N_U "vbrsr") (VBRSRQ_M_N_F "vbrsr")
 		 (VBRSRQ_N_S "vbrsr") (VBRSRQ_N_U "vbrsr") (VBRSRQ_N_F "vbrsr")
+		 (VCADDQ_ROT270_M_U "vcadd") (VCADDQ_ROT270_M_S "vcadd") (VCADDQ_ROT270_M_F "vcadd")
+		 (VCADDQ_ROT90_M_U "vcadd") (VCADDQ_ROT90_M_S "vcadd") (VCADDQ_ROT90_M_F "vcadd")
 		 (VCLSQ_M_S "vcls")
 		 (VCLSQ_S "vcls")
 		 (VCLZQ_M_S "vclz") (VCLZQ_M_U "vclz")
@@ -944,6 +947,8 @@
 		 (VHADDQ_M_S "vhadd") (VHADDQ_M_U "vhadd")
 		 (VHADDQ_N_S "vhadd") (VHADDQ_N_U "vhadd")
 		 (VHADDQ_S "vhadd") (VHADDQ_U "vhadd")
+		 (VHCADDQ_ROT90_M_S "vhcadd") (VHCADDQ_ROT270_M_S "vhcadd")
+		 (VHCADDQ_ROT90_S "vhcadd") (VHCADDQ_ROT270_S "vhcadd")
 		 (VHSUBQ_M_N_S "vhsub") (VHSUBQ_M_N_U "vhsub")
 		 (VHSUBQ_M_S "vhsub") (VHSUBQ_M_U "vhsub")
 		 (VHSUBQ_N_S "vhsub") (VHSUBQ_N_U "vhsub")
@@ -1190,7 +1195,10 @@
 		 ])
 
 (define_int_attr isu    [
+		 (UNSPEC_VCADD90 "i") (UNSPEC_VCADD270 "i")
 		 (VABSQ_M_S "s")
+		 (VCADDQ_ROT270_M_U "i") (VCADDQ_ROT270_M_S "i")
+		 (VCADDQ_ROT90_M_U "i") (VCADDQ_ROT90_M_S "i")
 		 (VCLSQ_M_S "s")
 		 (VCLZQ_M_S "i")
 		 (VCLZQ_M_U "i")
@@ -1214,6 +1222,8 @@
 		 (VCMPNEQ_M_N_U "i")
 		 (VCMPNEQ_M_S "i")
 		 (VCMPNEQ_M_U "i")
+		 (VHCADDQ_ROT90_M_S "s") (VHCADDQ_ROT270_M_S "s")
+		 (VHCADDQ_ROT90_S "s") (VHCADDQ_ROT270_S "s")
 		 (VMOVNBQ_M_S "i") (VMOVNBQ_M_U "i")
 		 (VMOVNBQ_S "i") (VMOVNBQ_U "i")
 		 (VMOVNTQ_M_S "i") (VMOVNTQ_M_U "i")
@@ -2155,6 +2165,16 @@
 
 (define_int_attr rot [(UNSPEC_VCADD90 "90")
 		      (UNSPEC_VCADD270 "270")
+		      (VCADDQ_ROT90_M_F "90")
+		      (VCADDQ_ROT90_M_S "90")
+		      (VCADDQ_ROT90_M_U "90")
+		      (VCADDQ_ROT270_M_F "270")
+		      (VCADDQ_ROT270_M_S "270")
+		      (VCADDQ_ROT270_M_U "270")
+		      (VHCADDQ_ROT90_S "90")
+		      (VHCADDQ_ROT270_S "270")
+		      (VHCADDQ_ROT90_M_S "90")
+		      (VHCADDQ_ROT270_M_S "270")
 		      (UNSPEC_VCMUL "0")
 		      (UNSPEC_VCMUL90 "90")
 		      (UNSPEC_VCMUL180 "180")
@@ -2193,6 +2213,16 @@
 
 (define_int_attr mve_rot [(UNSPEC_VCADD90 "_rot90")
 			  (UNSPEC_VCADD270 "_rot270")
+			  (VCADDQ_ROT90_M_F "_rot90")
+			  (VCADDQ_ROT90_M_S "_rot90")
+			  (VCADDQ_ROT90_M_U "_rot90")
+			  (VCADDQ_ROT270_M_F "_rot270")
+			  (VCADDQ_ROT270_M_S "_rot270")
+			  (VCADDQ_ROT270_M_U "_rot270")
+			  (VHCADDQ_ROT90_S "_rot90")
+			  (VHCADDQ_ROT270_S "_rot270")
+			  (VHCADDQ_ROT90_M_S "_rot90")
+			  (VHCADDQ_ROT270_M_S "_rot270")
 			  (UNSPEC_VCMLA "")
 			  (UNSPEC_VCMLA90 "_rot90")
 			  (UNSPEC_VCMLA180 "_rot180")
@@ -2535,6 +2565,9 @@
 		       (VRMLALDAVHAQ_P_S "s") (VRMLALDAVHAQ_P_U "u")
 		       (VQSHLUQ_M_N_S "s")
 		       (VQSHLUQ_N_S "s")
+		       (VHCADDQ_ROT90_M_S "s") (VHCADDQ_ROT270_M_S "s")
+		       (VHCADDQ_ROT90_S "s") (VHCADDQ_ROT270_S "s")
+		       (UNSPEC_VCADD90 "") (UNSPEC_VCADD270 "")
 		       ])
 
 ;; Both kinds of return insn.
@@ -2767,7 +2800,9 @@
 (define_int_iterator VANDQ_M [VANDQ_M_U VANDQ_M_S])
 (define_int_iterator VBICQ_M [VBICQ_M_U VBICQ_M_S])
 (define_int_iterator VSHLQ_M_N [VSHLQ_M_N_S VSHLQ_M_N_U])
-(define_int_iterator VCADDQ_ROT270_M [VCADDQ_ROT270_M_U VCADDQ_ROT270_M_S])
+(define_int_iterator VCADDQ_M_F [VCADDQ_ROT90_M_F VCADDQ_ROT270_M_F])
+(define_int_iterator VxCADDQ [UNSPEC_VCADD90 UNSPEC_VCADD270 VHCADDQ_ROT90_S VHCADDQ_ROT270_S])
+(define_int_iterator VxCADDQ_M [VHCADDQ_ROT90_M_S VHCADDQ_ROT270_M_S VCADDQ_ROT90_M_U VCADDQ_ROT90_M_S VCADDQ_ROT270_M_U VCADDQ_ROT270_M_S])
 (define_int_iterator VQRSHLQ_M [VQRSHLQ_M_U VQRSHLQ_M_S])
 (define_int_iterator VQADDQ_M_N [VQADDQ_M_N_U VQADDQ_M_N_S])
 (define_int_iterator VADDQ_M_N [VADDQ_M_N_S VADDQ_M_N_U])
@@ -2777,7 +2812,6 @@
 (define_int_iterator VMLADAVAQ_P [VMLADAVAQ_P_U VMLADAVAQ_P_S])
 (define_int_iterator VBRSRQ_M_N [VBRSRQ_M_N_U VBRSRQ_M_N_S])
 (define_int_iterator VMULQ_M_N [VMULQ_M_N_U VMULQ_M_N_S])
-(define_int_iterator VCADDQ_ROT90_M [VCADDQ_ROT90_M_U VCADDQ_ROT90_M_S])
 (define_int_iterator VMULLTQ_INT_M [VMULLTQ_INT_M_S VMULLTQ_INT_M_U])
 (define_int_iterator VEORQ_M [VEORQ_M_S VEORQ_M_U])
 (define_int_iterator VSHRQ_M_N [VSHRQ_M_N_S VSHRQ_M_N_U])
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 74909ce47e1..a6db6d1b81d 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -839,17 +839,20 @@
 ])
 
 ;;
-;; [vcaddq, vcaddq_rot90, vcadd_rot180, vcadd_rot270])
+;; [vcaddq_rot90_s, vcadd_rot90_u]
+;; [vcaddq_rot270_s, vcadd_rot270_u]
+;; [vhcaddq_rot90_s]
+;; [vhcaddq_rot270_s]
 ;;
-(define_insn "mve_vcaddq<mve_rot><mode>"
+(define_insn "@mve_<mve_insn>q<mve_rot>_<supf><mode>"
   [
    (set (match_operand:MVE_2 0 "s_register_operand" "<earlyclobber_32>")
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VCADD))
+	 VxCADDQ))
   ]
   "TARGET_HAVE_MVE"
-  "vcadd.i%#<V_sz_elem>	%q0, %q1, %q2, #<rot>"
+  "<mve_insn>.<isu>%#<V_sz_elem>\t%q0, %q1, %q2, #<rot>"
   [(set_attr "type" "mve_move")
 ])
 
@@ -904,36 +907,6 @@
   [(set_attr "type" "mve_move")
 ])
 
-;;
-;; [vhcaddq_rot270_s])
-;;
-(define_insn "mve_vhcaddq_rot270_s<mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "<earlyclobber_32>")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VHCADDQ_ROT270_S))
-  ]
-  "TARGET_HAVE_MVE"
-  "vhcadd.s%#<V_sz_elem>\t%q0, %q1, %q2, #270"
-  [(set_attr "type" "mve_move")
-])
-
-;;
-;; [vhcaddq_rot90_s])
-;;
-(define_insn "mve_vhcaddq_rot90_s<mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "<earlyclobber_32>")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VHCADDQ_ROT90_S))
-  ]
-  "TARGET_HAVE_MVE"
-  "vhcadd.s%#<V_sz_elem>\t%q0, %q1, %q2, #90"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vmaxaq_s]
 ;; [vminaq_s]
@@ -1238,9 +1211,9 @@
 ])
 
 ;;
-;; [vcaddq, vcaddq_rot90, vcadd_rot180, vcadd_rot270])
+;; [vcaddq_rot90_f, vcaddq_rot270_f]
 ;;
-(define_insn "mve_vcaddq<mve_rot><mode>"
+(define_insn "@mve_<mve_insn>q<mve_rot>_f<mode>"
   [
    (set (match_operand:MVE_0 0 "s_register_operand" "<earlyclobber_32>")
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
@@ -1248,7 +1221,7 @@
 	 VCADD))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vcadd.f%#<V_sz_elem>	%q0, %q1, %q2, #<rot>"
+  "<mve_insn>.f%#<V_sz_elem>\t%q0, %q1, %q2, #<rot>"
   [(set_attr "type" "mve_move")
 ])
 
@@ -2788,36 +2761,22 @@
    (set_attr "length""8")])
 
 ;;
-;; [vcaddq_rot270_m_u, vcaddq_rot270_m_s])
+;; [vcaddq_rot90_m_u, vcaddq_rot90_m_s]
+;; [vcaddq_rot270_m_u, vcaddq_rot270_m_s]
+;; [vhcaddq_rot90_m_s]
+;; [vhcaddq_rot270_m_s]
 ;;
-(define_insn "mve_vcaddq_rot270_m_<supf><mode>"
+(define_insn "@mve_<mve_insn>q<mve_rot>_m_<supf><mode>"
   [
    (set (match_operand:MVE_2 0 "s_register_operand" "<earlyclobber_32>")
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
 		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VCADDQ_ROT270_M))
+	 VxCADDQ_M))
   ]
   "TARGET_HAVE_MVE"
-  "vpst\;vcaddt.i%#<V_sz_elem>	%q0, %q2, %q3, #270"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vcaddq_rot90_m_u, vcaddq_rot90_m_s])
-;;
-(define_insn "mve_vcaddq_rot90_m_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "<earlyclobber_32>")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VCADDQ_ROT90_M))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vcaddt.i%#<V_sz_elem>	%q0, %q2, %q3, #90"
+  "vpst\;<mve_insn>t.<isu>%#<V_sz_elem>\t%q0, %q2, %q3, #<rot>"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
@@ -2974,40 +2933,6 @@
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
-;;
-;; [vhcaddq_rot270_m_s])
-;;
-(define_insn "mve_vhcaddq_rot270_m_s<mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "<earlyclobber_32>")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VHCADDQ_ROT270_M_S))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vhcaddt.s%#<V_sz_elem>\t%q0, %q2, %q3, #270"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vhcaddq_rot90_m_s])
-;;
-(define_insn "mve_vhcaddq_rot90_m_s<mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "<earlyclobber_32>")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VHCADDQ_ROT90_M_S))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vhcaddt.s%#<V_sz_elem>\t%q0, %q2, %q3, #90"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
 ;;
 ;; [vmlaldavaq_p_u, vmlaldavaq_p_s]
 ;; [vmlaldavaxq_p_s]
@@ -3247,36 +3172,20 @@
    (set_attr "length""8")])
 
 ;;
-;; [vcaddq_rot270_m_f])
-;;
-(define_insn "mve_vcaddq_rot270_m_f<mode>"
-  [
-   (set (match_operand:MVE_0 0 "s_register_operand" "<earlyclobber_32>")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
-		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VCADDQ_ROT270_M_F))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vcaddt.f%#<V_sz_elem>	%q0, %q2, %q3, #270"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vcaddq_rot90_m_f])
+;; [vcaddq_rot90_m_f]
+;; [vcaddq_rot270_m_f]
 ;;
-(define_insn "mve_vcaddq_rot90_m_f<mode>"
+(define_insn "@mve_<mve_insn>q<mve_rot>_m_f<mode>"
   [
    (set (match_operand:MVE_0 0 "s_register_operand" "<earlyclobber_32>")
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:MVE_0 3 "s_register_operand" "w")
 		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VCADDQ_ROT90_M_F))
+	 VCADDQ_M_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vcaddt.f%#<V_sz_elem>	%q0, %q2, %q3, #90"
+  "vpst\;<mve_insn>t.f%#<V_sz_elem>\t%q0, %q2, %q3, #<rot>"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
-- 
2.34.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 2/6] arm: [MVE intrinsics] rework vcaddq vhcaddq
  2023-07-13 10:22 [PATCH 1/6] arm: [MVE intrinsics] Factorize vcaddq vhcaddq Christophe Lyon
@ 2023-07-13 10:22 ` Christophe Lyon
  2023-07-13 10:22 ` [PATCH 3/6] arm: [MVE intrinsics factorize vcmulq Christophe Lyon
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Christophe Lyon @ 2023-07-13 10:22 UTC (permalink / raw)
  To: gcc-patches, Kyrylo.Tkachov, richard.earnshaw, richard.sandiford
  Cc: Christophe Lyon

Implement vcaddq, vhcaddq using the new MVE builtins framework.

2023-07-13  Christophe Lyon  <christophe.lyon@linaro.org>

	gcc/
	* config/arm/arm-mve-builtins-base.cc (vcaddq_rot90)
	(vcaddq_rot270, vhcaddq_rot90, vhcaddq_rot270): New.
	* config/arm/arm-mve-builtins-base.def (vcaddq_rot90)
	(vcaddq_rot270, vhcaddq_rot90, vhcaddq_rot270): New.
	* config/arm/arm-mve-builtins-base.h: (vcaddq_rot90)
	(vcaddq_rot270, vhcaddq_rot90, vhcaddq_rot270): New.
	* config/arm/arm-mve-builtins-functions.h (class
	unspec_mve_function_exact_insn_rot): New.
	* config/arm/arm_mve.h (vcaddq_rot90): Delete.
	(vcaddq_rot270): Delete.
	(vhcaddq_rot90): Delete.
	(vhcaddq_rot270): Delete.
	(vcaddq_rot270_m): Delete.
	(vcaddq_rot90_m): Delete.
	(vhcaddq_rot270_m): Delete.
	(vhcaddq_rot90_m): Delete.
	(vcaddq_rot90_x): Delete.
	(vcaddq_rot270_x): Delete.
	(vhcaddq_rot90_x): Delete.
	(vhcaddq_rot270_x): Delete.
	(vcaddq_rot90_u8): Delete.
	(vcaddq_rot270_u8): Delete.
	(vhcaddq_rot90_s8): Delete.
	(vhcaddq_rot270_s8): Delete.
	(vcaddq_rot90_s8): Delete.
	(vcaddq_rot270_s8): Delete.
	(vcaddq_rot90_u16): Delete.
	(vcaddq_rot270_u16): Delete.
	(vhcaddq_rot90_s16): Delete.
	(vhcaddq_rot270_s16): Delete.
	(vcaddq_rot90_s16): Delete.
	(vcaddq_rot270_s16): Delete.
	(vcaddq_rot90_u32): Delete.
	(vcaddq_rot270_u32): Delete.
	(vhcaddq_rot90_s32): Delete.
	(vhcaddq_rot270_s32): Delete.
	(vcaddq_rot90_s32): Delete.
	(vcaddq_rot270_s32): Delete.
	(vcaddq_rot90_f16): Delete.
	(vcaddq_rot270_f16): Delete.
	(vcaddq_rot90_f32): Delete.
	(vcaddq_rot270_f32): Delete.
	(vcaddq_rot270_m_s8): Delete.
	(vcaddq_rot270_m_s32): Delete.
	(vcaddq_rot270_m_s16): Delete.
	(vcaddq_rot270_m_u8): Delete.
	(vcaddq_rot270_m_u32): Delete.
	(vcaddq_rot270_m_u16): Delete.
	(vcaddq_rot90_m_s8): Delete.
	(vcaddq_rot90_m_s32): Delete.
	(vcaddq_rot90_m_s16): Delete.
	(vcaddq_rot90_m_u8): Delete.
	(vcaddq_rot90_m_u32): Delete.
	(vcaddq_rot90_m_u16): Delete.
	(vhcaddq_rot270_m_s8): Delete.
	(vhcaddq_rot270_m_s32): Delete.
	(vhcaddq_rot270_m_s16): Delete.
	(vhcaddq_rot90_m_s8): Delete.
	(vhcaddq_rot90_m_s32): Delete.
	(vhcaddq_rot90_m_s16): Delete.
	(vcaddq_rot270_m_f32): Delete.
	(vcaddq_rot270_m_f16): Delete.
	(vcaddq_rot90_m_f32): Delete.
	(vcaddq_rot90_m_f16): Delete.
	(vcaddq_rot90_x_s8): Delete.
	(vcaddq_rot90_x_s16): Delete.
	(vcaddq_rot90_x_s32): Delete.
	(vcaddq_rot90_x_u8): Delete.
	(vcaddq_rot90_x_u16): Delete.
	(vcaddq_rot90_x_u32): Delete.
	(vcaddq_rot270_x_s8): Delete.
	(vcaddq_rot270_x_s16): Delete.
	(vcaddq_rot270_x_s32): Delete.
	(vcaddq_rot270_x_u8): Delete.
	(vcaddq_rot270_x_u16): Delete.
	(vcaddq_rot270_x_u32): Delete.
	(vhcaddq_rot90_x_s8): Delete.
	(vhcaddq_rot90_x_s16): Delete.
	(vhcaddq_rot90_x_s32): Delete.
	(vhcaddq_rot270_x_s8): Delete.
	(vhcaddq_rot270_x_s16): Delete.
	(vhcaddq_rot270_x_s32): Delete.
	(vcaddq_rot90_x_f16): Delete.
	(vcaddq_rot90_x_f32): Delete.
	(vcaddq_rot270_x_f16): Delete.
	(vcaddq_rot270_x_f32): Delete.
	(__arm_vcaddq_rot90_u8): Delete.
	(__arm_vcaddq_rot270_u8): Delete.
	(__arm_vhcaddq_rot90_s8): Delete.
	(__arm_vhcaddq_rot270_s8): Delete.
	(__arm_vcaddq_rot90_s8): Delete.
	(__arm_vcaddq_rot270_s8): Delete.
	(__arm_vcaddq_rot90_u16): Delete.
	(__arm_vcaddq_rot270_u16): Delete.
	(__arm_vhcaddq_rot90_s16): Delete.
	(__arm_vhcaddq_rot270_s16): Delete.
	(__arm_vcaddq_rot90_s16): Delete.
	(__arm_vcaddq_rot270_s16): Delete.
	(__arm_vcaddq_rot90_u32): Delete.
	(__arm_vcaddq_rot270_u32): Delete.
	(__arm_vhcaddq_rot90_s32): Delete.
	(__arm_vhcaddq_rot270_s32): Delete.
	(__arm_vcaddq_rot90_s32): Delete.
	(__arm_vcaddq_rot270_s32): Delete.
	(__arm_vcaddq_rot270_m_s8): Delete.
	(__arm_vcaddq_rot270_m_s32): Delete.
	(__arm_vcaddq_rot270_m_s16): Delete.
	(__arm_vcaddq_rot270_m_u8): Delete.
	(__arm_vcaddq_rot270_m_u32): Delete.
	(__arm_vcaddq_rot270_m_u16): Delete.
	(__arm_vcaddq_rot90_m_s8): Delete.
	(__arm_vcaddq_rot90_m_s32): Delete.
	(__arm_vcaddq_rot90_m_s16): Delete.
	(__arm_vcaddq_rot90_m_u8): Delete.
	(__arm_vcaddq_rot90_m_u32): Delete.
	(__arm_vcaddq_rot90_m_u16): Delete.
	(__arm_vhcaddq_rot270_m_s8): Delete.
	(__arm_vhcaddq_rot270_m_s32): Delete.
	(__arm_vhcaddq_rot270_m_s16): Delete.
	(__arm_vhcaddq_rot90_m_s8): Delete.
	(__arm_vhcaddq_rot90_m_s32): Delete.
	(__arm_vhcaddq_rot90_m_s16): Delete.
	(__arm_vcaddq_rot90_x_s8): Delete.
	(__arm_vcaddq_rot90_x_s16): Delete.
	(__arm_vcaddq_rot90_x_s32): Delete.
	(__arm_vcaddq_rot90_x_u8): Delete.
	(__arm_vcaddq_rot90_x_u16): Delete.
	(__arm_vcaddq_rot90_x_u32): Delete.
	(__arm_vcaddq_rot270_x_s8): Delete.
	(__arm_vcaddq_rot270_x_s16): Delete.
	(__arm_vcaddq_rot270_x_s32): Delete.
	(__arm_vcaddq_rot270_x_u8): Delete.
	(__arm_vcaddq_rot270_x_u16): Delete.
	(__arm_vcaddq_rot270_x_u32): Delete.
	(__arm_vhcaddq_rot90_x_s8): Delete.
	(__arm_vhcaddq_rot90_x_s16): Delete.
	(__arm_vhcaddq_rot90_x_s32): Delete.
	(__arm_vhcaddq_rot270_x_s8): Delete.
	(__arm_vhcaddq_rot270_x_s16): Delete.
	(__arm_vhcaddq_rot270_x_s32): Delete.
	(__arm_vcaddq_rot90_f16): Delete.
	(__arm_vcaddq_rot270_f16): Delete.
	(__arm_vcaddq_rot90_f32): Delete.
	(__arm_vcaddq_rot270_f32): Delete.
	(__arm_vcaddq_rot270_m_f32): Delete.
	(__arm_vcaddq_rot270_m_f16): Delete.
	(__arm_vcaddq_rot90_m_f32): Delete.
	(__arm_vcaddq_rot90_m_f16): Delete.
	(__arm_vcaddq_rot90_x_f16): Delete.
	(__arm_vcaddq_rot90_x_f32): Delete.
	(__arm_vcaddq_rot270_x_f16): Delete.
	(__arm_vcaddq_rot270_x_f32): Delete.
	(__arm_vcaddq_rot90): Delete.
	(__arm_vcaddq_rot270): Delete.
	(__arm_vhcaddq_rot90): Delete.
	(__arm_vhcaddq_rot270): Delete.
	(__arm_vcaddq_rot270_m): Delete.
	(__arm_vcaddq_rot90_m): Delete.
	(__arm_vhcaddq_rot270_m): Delete.
	(__arm_vhcaddq_rot90_m): Delete.
	(__arm_vcaddq_rot90_x): Delete.
	(__arm_vcaddq_rot270_x): Delete.
	(__arm_vhcaddq_rot90_x): Delete.
	(__arm_vhcaddq_rot270_x): Delete.
---
 gcc/config/arm/arm-mve-builtins-base.cc     |    4 +
 gcc/config/arm/arm-mve-builtins-base.def    |    6 +
 gcc/config/arm/arm-mve-builtins-base.h      |    4 +
 gcc/config/arm/arm-mve-builtins-functions.h |  103 ++
 gcc/config/arm/arm_mve.h                    | 1398 ++-----------------
 5 files changed, 215 insertions(+), 1300 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index af02397f1c4..f15bb926147 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -260,6 +260,10 @@ FUNCTION_PRED_P_S_U (vaddvq, VADDVQ)
 FUNCTION_PRED_P_S_U (vaddvaq, VADDVAQ)
 FUNCTION_WITH_RTX_M (vandq, AND, VANDQ)
 FUNCTION_ONLY_N (vbrsrq, VBRSRQ)
+FUNCTION (vcaddq_rot90, unspec_mve_function_exact_insn_rot, (UNSPEC_VCADD90, UNSPEC_VCADD90, UNSPEC_VCADD90, VCADDQ_ROT90_M_S, VCADDQ_ROT90_M_U, VCADDQ_ROT90_M_F))
+FUNCTION (vcaddq_rot270, unspec_mve_function_exact_insn_rot, (UNSPEC_VCADD270, UNSPEC_VCADD270, UNSPEC_VCADD270, VCADDQ_ROT270_M_S, VCADDQ_ROT270_M_U, VCADDQ_ROT270_M_F))
+FUNCTION (vhcaddq_rot90, unspec_mve_function_exact_insn_rot, (VHCADDQ_ROT90_S, -1, -1, VHCADDQ_ROT90_M_S, -1, -1))
+FUNCTION (vhcaddq_rot270, unspec_mve_function_exact_insn_rot, (VHCADDQ_ROT270_S, -1, -1, VHCADDQ_ROT270_M_S, -1, -1))
 FUNCTION_WITHOUT_N_NO_U_F (vclsq, VCLSQ)
 FUNCTION (vclzq, unspec_based_mve_function_exact_insn, (CLZ, CLZ, CLZ, -1, -1, -1, VCLZQ_M_S, VCLZQ_M_U, -1, -1, -1 ,-1))
 FUNCTION (vcmpeqq, unspec_based_mve_function_exact_insn_vcmp, (EQ, EQ, EQ, VCMPEQQ_M_S, VCMPEQQ_M_U, VCMPEQQ_M_F, VCMPEQQ_M_N_S, VCMPEQQ_M_N_U, VCMPEQQ_M_N_F))
diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
index ee08d063407..9a793147960 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -28,6 +28,8 @@ DEF_MVE_FUNCTION (vaddvaq, unary_int32_acc, all_integer, p_or_none)
 DEF_MVE_FUNCTION (vaddvq, unary_int32, all_integer, p_or_none)
 DEF_MVE_FUNCTION (vandq, binary, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vbrsrq, binary_imm32, all_integer, mx_or_none)
+DEF_MVE_FUNCTION (vcaddq_rot90, binary, all_integer, mx_or_none)
+DEF_MVE_FUNCTION (vcaddq_rot270, binary, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vclsq, unary, all_signed, mx_or_none)
 DEF_MVE_FUNCTION (vclzq, unary, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vcmpcsq, cmp, all_unsigned, m_or_none)
@@ -42,6 +44,8 @@ DEF_MVE_FUNCTION (vcreateq, create, all_integer_with_64, none)
 DEF_MVE_FUNCTION (vdupq, unary_n, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (veorq, binary, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vhaddq, binary_opt_n, all_integer, mx_or_none)
+DEF_MVE_FUNCTION (vhcaddq_rot90, binary, all_signed, mx_or_none)
+DEF_MVE_FUNCTION (vhcaddq_rot270, binary, all_signed, mx_or_none)
 DEF_MVE_FUNCTION (vhsubq, binary_opt_n, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vmaxaq, binary_maxamina, all_signed, m_or_none)
 DEF_MVE_FUNCTION (vmaxavq, binary_maxavminav, all_signed, p_or_none)
@@ -152,6 +156,8 @@ DEF_MVE_FUNCTION (vabsq, unary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vandq, binary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vbrsrq, binary_imm32, all_float, mx_or_none)
+DEF_MVE_FUNCTION (vcaddq_rot90, binary, all_float, mx_or_none)
+DEF_MVE_FUNCTION (vcaddq_rot270, binary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vcmpeqq, cmp, all_float, m_or_none)
 DEF_MVE_FUNCTION (vcmpgeq, cmp, all_float, m_or_none)
 DEF_MVE_FUNCTION (vcmpgtq, cmp, all_float, m_or_none)
diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h
index 942c8587446..8dac72970e0 100644
--- a/gcc/config/arm/arm-mve-builtins-base.h
+++ b/gcc/config/arm/arm-mve-builtins-base.h
@@ -33,6 +33,8 @@ extern const function_base *const vaddvaq;
 extern const function_base *const vaddvq;
 extern const function_base *const vandq;
 extern const function_base *const vbrsrq;
+extern const function_base *const vcaddq_rot90;
+extern const function_base *const vcaddq_rot270;
 extern const function_base *const vclsq;
 extern const function_base *const vclzq;
 extern const function_base *const vcmpcsq;
@@ -50,6 +52,8 @@ extern const function_base *const vfmaq;
 extern const function_base *const vfmasq;
 extern const function_base *const vfmsq;
 extern const function_base *const vhaddq;
+extern const function_base *const vhcaddq_rot90;
+extern const function_base *const vhcaddq_rot270;
 extern const function_base *const vhsubq;
 extern const function_base *const vmaxaq;
 extern const function_base *const vmaxavq;
diff --git a/gcc/config/arm/arm-mve-builtins-functions.h b/gcc/config/arm/arm-mve-builtins-functions.h
index 8f3bae4b7da..a6573844319 100644
--- a/gcc/config/arm/arm-mve-builtins-functions.h
+++ b/gcc/config/arm/arm-mve-builtins-functions.h
@@ -735,6 +735,109 @@ public:
   }
 };
 
+/* Map the function directly to CODE (UNSPEC, UNSPEC, UNSPEC, M) where
+   M is the vector mode associated with type suffix 0.  USed for the
+   operations where there is a "rot90" or "rot270" suffix, depending
+   on the UNSPEC.  */
+class unspec_mve_function_exact_insn_rot : public function_base
+{
+public:
+  CONSTEXPR unspec_mve_function_exact_insn_rot (int unspec_for_sint,
+						int unspec_for_uint,
+						int unspec_for_fp,
+						int unspec_for_m_sint,
+						int unspec_for_m_uint,
+						int unspec_for_m_fp)
+    : m_unspec_for_sint (unspec_for_sint),
+      m_unspec_for_uint (unspec_for_uint),
+      m_unspec_for_fp (unspec_for_fp),
+      m_unspec_for_m_sint (unspec_for_m_sint),
+      m_unspec_for_m_uint (unspec_for_m_uint),
+      m_unspec_for_m_fp (unspec_for_m_fp)
+  {}
+
+  /* The unspec code associated with signed-integer, unsigned-integer
+     and floating-point operations respectively.  It covers the cases
+     with and without the _m predicate.  */
+  int m_unspec_for_sint;
+  int m_unspec_for_uint;
+  int m_unspec_for_fp;
+  int m_unspec_for_m_sint;
+  int m_unspec_for_m_uint;
+  int m_unspec_for_m_fp;
+
+  rtx
+  expand (function_expander &e) const override
+  {
+    insn_code code;
+
+    switch (e.pred)
+      {
+      case PRED_none:
+	switch (e.mode_suffix_id)
+	  {
+	  case MODE_none:
+	    /* No predicate, no suffix.  */
+	    if (e.type_suffix (0).integer_p)
+	      if (e.type_suffix (0).unsigned_p)
+		code = code_for_mve_q (m_unspec_for_uint, m_unspec_for_uint, m_unspec_for_uint, e.vector_mode (0));
+	      else
+		code = code_for_mve_q (m_unspec_for_sint, m_unspec_for_sint, m_unspec_for_sint, e.vector_mode (0));
+	    else
+	      code = code_for_mve_q_f (m_unspec_for_fp, m_unspec_for_fp, e.vector_mode (0));
+	    break;
+
+	  default:
+	    gcc_unreachable ();
+	  }
+	return e.use_exact_insn (code);
+
+      case PRED_m:
+	switch (e.mode_suffix_id)
+	  {
+	  case MODE_none:
+	    /* No suffix, "m" predicate.  */
+	    if (e.type_suffix (0).integer_p)
+	      if (e.type_suffix (0).unsigned_p)
+		code = code_for_mve_q_m (m_unspec_for_m_uint, m_unspec_for_m_uint, m_unspec_for_m_uint, e.vector_mode (0));
+	      else
+		code = code_for_mve_q_m (m_unspec_for_m_sint, m_unspec_for_m_sint, m_unspec_for_m_sint, e.vector_mode (0));
+	    else
+	      code = code_for_mve_q_m_f (m_unspec_for_m_fp, m_unspec_for_m_fp, e.vector_mode (0));
+	    break;
+
+	  default:
+	    gcc_unreachable ();
+	  }
+	return e.use_cond_insn (code, 0);
+
+      case PRED_x:
+	switch (e.mode_suffix_id)
+	  {
+	  case MODE_none:
+	    /* No suffix, "x" predicate.  */
+	    if (e.type_suffix (0).integer_p)
+	      if (e.type_suffix (0).unsigned_p)
+		code = code_for_mve_q_m (m_unspec_for_m_uint, m_unspec_for_m_uint, m_unspec_for_m_uint, e.vector_mode (0));
+	      else
+		code = code_for_mve_q_m (m_unspec_for_m_sint, m_unspec_for_m_sint, m_unspec_for_m_sint, e.vector_mode (0));
+	    else
+	      code = code_for_mve_q_m_f (m_unspec_for_m_fp, m_unspec_for_m_fp, e.vector_mode (0));
+	    break;
+
+	  default:
+	    gcc_unreachable ();
+	  }
+	return e.use_pred_x_insn (code);
+
+      default:
+	gcc_unreachable ();
+      }
+
+    gcc_unreachable ();
+  }
+};
+
 } /* end namespace arm_mve */
 
 /* Declare the global function base NAME, creating it from an instance
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 30485e74cdd..9297ad757ab 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -45,20 +45,12 @@
 #define vornq(__a, __b) __arm_vornq(__a, __b)
 #define vmulltq_int(__a, __b) __arm_vmulltq_int(__a, __b)
 #define vmullbq_int(__a, __b) __arm_vmullbq_int(__a, __b)
-#define vcaddq_rot90(__a, __b) __arm_vcaddq_rot90(__a, __b)
-#define vcaddq_rot270(__a, __b) __arm_vcaddq_rot270(__a, __b)
 #define vbicq(__a, __b) __arm_vbicq(__a, __b)
-#define vhcaddq_rot90(__a, __b) __arm_vhcaddq_rot90(__a, __b)
-#define vhcaddq_rot270(__a, __b) __arm_vhcaddq_rot270(__a, __b)
 #define vmulltq_poly(__a, __b) __arm_vmulltq_poly(__a, __b)
 #define vmullbq_poly(__a, __b) __arm_vmullbq_poly(__a, __b)
 #define vbicq_m_n(__a, __imm, __p) __arm_vbicq_m_n(__a, __imm, __p)
 #define vshlcq(__a, __b, __imm) __arm_vshlcq(__a, __b, __imm)
 #define vbicq_m(__inactive, __a, __b, __p) __arm_vbicq_m(__inactive, __a, __b, __p)
-#define vcaddq_rot270_m(__inactive, __a, __b, __p) __arm_vcaddq_rot270_m(__inactive, __a, __b, __p)
-#define vcaddq_rot90_m(__inactive, __a, __b, __p) __arm_vcaddq_rot90_m(__inactive, __a, __b, __p)
-#define vhcaddq_rot270_m(__inactive, __a, __b, __p) __arm_vhcaddq_rot270_m(__inactive, __a, __b, __p)
-#define vhcaddq_rot90_m(__inactive, __a, __b, __p) __arm_vhcaddq_rot90_m(__inactive, __a, __b, __p)
 #define vmullbq_int_m(__inactive, __a, __b, __p) __arm_vmullbq_int_m(__inactive, __a, __b, __p)
 #define vmulltq_int_m(__inactive, __a, __b, __p) __arm_vmulltq_int_m(__inactive, __a, __b, __p)
 #define vornq_m(__inactive, __a, __b, __p) __arm_vornq_m(__inactive, __a, __b, __p)
@@ -141,10 +133,6 @@
 #define vmullbq_int_x(__a, __b, __p) __arm_vmullbq_int_x(__a, __b, __p)
 #define vmulltq_poly_x(__a, __b, __p) __arm_vmulltq_poly_x(__a, __b, __p)
 #define vmulltq_int_x(__a, __b, __p) __arm_vmulltq_int_x(__a, __b, __p)
-#define vcaddq_rot90_x(__a, __b, __p) __arm_vcaddq_rot90_x(__a, __b, __p)
-#define vcaddq_rot270_x(__a, __b, __p) __arm_vcaddq_rot270_x(__a, __b, __p)
-#define vhcaddq_rot90_x(__a, __b, __p) __arm_vhcaddq_rot90_x(__a, __b, __p)
-#define vhcaddq_rot270_x(__a, __b, __p) __arm_vhcaddq_rot270_x(__a, __b, __p)
 #define vbicq_x(__a, __b, __p) __arm_vbicq_x(__a, __b, __p)
 #define vornq_x(__a, __b, __p) __arm_vornq_x(__a, __b, __p)
 #define vadciq(__a, __b, __carry_out) __arm_vadciq(__a, __b, __carry_out)
@@ -249,44 +237,26 @@
 #define vornq_u8(__a, __b) __arm_vornq_u8(__a, __b)
 #define vmulltq_int_u8(__a, __b) __arm_vmulltq_int_u8(__a, __b)
 #define vmullbq_int_u8(__a, __b) __arm_vmullbq_int_u8(__a, __b)
-#define vcaddq_rot90_u8(__a, __b) __arm_vcaddq_rot90_u8(__a, __b)
-#define vcaddq_rot270_u8(__a, __b) __arm_vcaddq_rot270_u8(__a, __b)
 #define vbicq_u8(__a, __b) __arm_vbicq_u8(__a, __b)
 #define vornq_s8(__a, __b) __arm_vornq_s8(__a, __b)
 #define vmulltq_int_s8(__a, __b) __arm_vmulltq_int_s8(__a, __b)
 #define vmullbq_int_s8(__a, __b) __arm_vmullbq_int_s8(__a, __b)
-#define vhcaddq_rot90_s8(__a, __b) __arm_vhcaddq_rot90_s8(__a, __b)
-#define vhcaddq_rot270_s8(__a, __b) __arm_vhcaddq_rot270_s8(__a, __b)
-#define vcaddq_rot90_s8(__a, __b) __arm_vcaddq_rot90_s8(__a, __b)
-#define vcaddq_rot270_s8(__a, __b) __arm_vcaddq_rot270_s8(__a, __b)
 #define vbicq_s8(__a, __b) __arm_vbicq_s8(__a, __b)
 #define vornq_u16(__a, __b) __arm_vornq_u16(__a, __b)
 #define vmulltq_int_u16(__a, __b) __arm_vmulltq_int_u16(__a, __b)
 #define vmullbq_int_u16(__a, __b) __arm_vmullbq_int_u16(__a, __b)
-#define vcaddq_rot90_u16(__a, __b) __arm_vcaddq_rot90_u16(__a, __b)
-#define vcaddq_rot270_u16(__a, __b) __arm_vcaddq_rot270_u16(__a, __b)
 #define vbicq_u16(__a, __b) __arm_vbicq_u16(__a, __b)
 #define vornq_s16(__a, __b) __arm_vornq_s16(__a, __b)
 #define vmulltq_int_s16(__a, __b) __arm_vmulltq_int_s16(__a, __b)
 #define vmullbq_int_s16(__a, __b) __arm_vmullbq_int_s16(__a, __b)
-#define vhcaddq_rot90_s16(__a, __b) __arm_vhcaddq_rot90_s16(__a, __b)
-#define vhcaddq_rot270_s16(__a, __b) __arm_vhcaddq_rot270_s16(__a, __b)
-#define vcaddq_rot90_s16(__a, __b) __arm_vcaddq_rot90_s16(__a, __b)
-#define vcaddq_rot270_s16(__a, __b) __arm_vcaddq_rot270_s16(__a, __b)
 #define vbicq_s16(__a, __b) __arm_vbicq_s16(__a, __b)
 #define vornq_u32(__a, __b) __arm_vornq_u32(__a, __b)
 #define vmulltq_int_u32(__a, __b) __arm_vmulltq_int_u32(__a, __b)
 #define vmullbq_int_u32(__a, __b) __arm_vmullbq_int_u32(__a, __b)
-#define vcaddq_rot90_u32(__a, __b) __arm_vcaddq_rot90_u32(__a, __b)
-#define vcaddq_rot270_u32(__a, __b) __arm_vcaddq_rot270_u32(__a, __b)
 #define vbicq_u32(__a, __b) __arm_vbicq_u32(__a, __b)
 #define vornq_s32(__a, __b) __arm_vornq_s32(__a, __b)
 #define vmulltq_int_s32(__a, __b) __arm_vmulltq_int_s32(__a, __b)
 #define vmullbq_int_s32(__a, __b) __arm_vmullbq_int_s32(__a, __b)
-#define vhcaddq_rot90_s32(__a, __b) __arm_vhcaddq_rot90_s32(__a, __b)
-#define vhcaddq_rot270_s32(__a, __b) __arm_vhcaddq_rot270_s32(__a, __b)
-#define vcaddq_rot90_s32(__a, __b) __arm_vcaddq_rot90_s32(__a, __b)
-#define vcaddq_rot270_s32(__a, __b) __arm_vcaddq_rot270_s32(__a, __b)
 #define vbicq_s32(__a, __b) __arm_vbicq_s32(__a, __b)
 #define vmulltq_poly_p8(__a, __b) __arm_vmulltq_poly_p8(__a, __b)
 #define vmullbq_poly_p8(__a, __b) __arm_vmullbq_poly_p8(__a, __b)
@@ -296,8 +266,6 @@
 #define vcmulq_rot270_f16(__a, __b) __arm_vcmulq_rot270_f16(__a, __b)
 #define vcmulq_rot180_f16(__a, __b) __arm_vcmulq_rot180_f16(__a, __b)
 #define vcmulq_f16(__a, __b) __arm_vcmulq_f16(__a, __b)
-#define vcaddq_rot90_f16(__a, __b) __arm_vcaddq_rot90_f16(__a, __b)
-#define vcaddq_rot270_f16(__a, __b) __arm_vcaddq_rot270_f16(__a, __b)
 #define vbicq_f16(__a, __b) __arm_vbicq_f16(__a, __b)
 #define vbicq_n_s16(__a,  __imm) __arm_vbicq_n_s16(__a,  __imm)
 #define vmulltq_poly_p16(__a, __b) __arm_vmulltq_poly_p16(__a, __b)
@@ -308,8 +276,6 @@
 #define vcmulq_rot270_f32(__a, __b) __arm_vcmulq_rot270_f32(__a, __b)
 #define vcmulq_rot180_f32(__a, __b) __arm_vcmulq_rot180_f32(__a, __b)
 #define vcmulq_f32(__a, __b) __arm_vcmulq_f32(__a, __b)
-#define vcaddq_rot90_f32(__a, __b) __arm_vcaddq_rot90_f32(__a, __b)
-#define vcaddq_rot270_f32(__a, __b) __arm_vcaddq_rot270_f32(__a, __b)
 #define vbicq_f32(__a, __b) __arm_vbicq_f32(__a, __b)
 #define vbicq_n_s32(__a,  __imm) __arm_vbicq_n_s32(__a,  __imm)
 #define vctp8q_m(__a, __p) __arm_vctp8q_m(__a, __p)
@@ -374,24 +340,6 @@
 #define vbicq_m_u8(__inactive, __a, __b, __p) __arm_vbicq_m_u8(__inactive, __a, __b, __p)
 #define vbicq_m_u32(__inactive, __a, __b, __p) __arm_vbicq_m_u32(__inactive, __a, __b, __p)
 #define vbicq_m_u16(__inactive, __a, __b, __p) __arm_vbicq_m_u16(__inactive, __a, __b, __p)
-#define vcaddq_rot270_m_s8(__inactive, __a, __b, __p) __arm_vcaddq_rot270_m_s8(__inactive, __a, __b, __p)
-#define vcaddq_rot270_m_s32(__inactive, __a, __b, __p) __arm_vcaddq_rot270_m_s32(__inactive, __a, __b, __p)
-#define vcaddq_rot270_m_s16(__inactive, __a, __b, __p) __arm_vcaddq_rot270_m_s16(__inactive, __a, __b, __p)
-#define vcaddq_rot270_m_u8(__inactive, __a, __b, __p) __arm_vcaddq_rot270_m_u8(__inactive, __a, __b, __p)
-#define vcaddq_rot270_m_u32(__inactive, __a, __b, __p) __arm_vcaddq_rot270_m_u32(__inactive, __a, __b, __p)
-#define vcaddq_rot270_m_u16(__inactive, __a, __b, __p) __arm_vcaddq_rot270_m_u16(__inactive, __a, __b, __p)
-#define vcaddq_rot90_m_s8(__inactive, __a, __b, __p) __arm_vcaddq_rot90_m_s8(__inactive, __a, __b, __p)
-#define vcaddq_rot90_m_s32(__inactive, __a, __b, __p) __arm_vcaddq_rot90_m_s32(__inactive, __a, __b, __p)
-#define vcaddq_rot90_m_s16(__inactive, __a, __b, __p) __arm_vcaddq_rot90_m_s16(__inactive, __a, __b, __p)
-#define vcaddq_rot90_m_u8(__inactive, __a, __b, __p) __arm_vcaddq_rot90_m_u8(__inactive, __a, __b, __p)
-#define vcaddq_rot90_m_u32(__inactive, __a, __b, __p) __arm_vcaddq_rot90_m_u32(__inactive, __a, __b, __p)
-#define vcaddq_rot90_m_u16(__inactive, __a, __b, __p) __arm_vcaddq_rot90_m_u16(__inactive, __a, __b, __p)
-#define vhcaddq_rot270_m_s8(__inactive, __a, __b, __p) __arm_vhcaddq_rot270_m_s8(__inactive, __a, __b, __p)
-#define vhcaddq_rot270_m_s32(__inactive, __a, __b, __p) __arm_vhcaddq_rot270_m_s32(__inactive, __a, __b, __p)
-#define vhcaddq_rot270_m_s16(__inactive, __a, __b, __p) __arm_vhcaddq_rot270_m_s16(__inactive, __a, __b, __p)
-#define vhcaddq_rot90_m_s8(__inactive, __a, __b, __p) __arm_vhcaddq_rot90_m_s8(__inactive, __a, __b, __p)
-#define vhcaddq_rot90_m_s32(__inactive, __a, __b, __p) __arm_vhcaddq_rot90_m_s32(__inactive, __a, __b, __p)
-#define vhcaddq_rot90_m_s16(__inactive, __a, __b, __p) __arm_vhcaddq_rot90_m_s16(__inactive, __a, __b, __p)
 #define vmullbq_int_m_s8(__inactive, __a, __b, __p) __arm_vmullbq_int_m_s8(__inactive, __a, __b, __p)
 #define vmullbq_int_m_s32(__inactive, __a, __b, __p) __arm_vmullbq_int_m_s32(__inactive, __a, __b, __p)
 #define vmullbq_int_m_s16(__inactive, __a, __b, __p) __arm_vmullbq_int_m_s16(__inactive, __a, __b, __p)
@@ -416,10 +364,6 @@
 #define vmulltq_poly_m_p16(__inactive, __a, __b, __p) __arm_vmulltq_poly_m_p16(__inactive, __a, __b, __p)
 #define vbicq_m_f32(__inactive, __a, __b, __p) __arm_vbicq_m_f32(__inactive, __a, __b, __p)
 #define vbicq_m_f16(__inactive, __a, __b, __p) __arm_vbicq_m_f16(__inactive, __a, __b, __p)
-#define vcaddq_rot270_m_f32(__inactive, __a, __b, __p) __arm_vcaddq_rot270_m_f32(__inactive, __a, __b, __p)
-#define vcaddq_rot270_m_f16(__inactive, __a, __b, __p) __arm_vcaddq_rot270_m_f16(__inactive, __a, __b, __p)
-#define vcaddq_rot90_m_f32(__inactive, __a, __b, __p) __arm_vcaddq_rot90_m_f32(__inactive, __a, __b, __p)
-#define vcaddq_rot90_m_f16(__inactive, __a, __b, __p) __arm_vcaddq_rot90_m_f16(__inactive, __a, __b, __p)
 #define vcmlaq_m_f32(__a, __b, __c, __p) __arm_vcmlaq_m_f32(__a, __b, __c, __p)
 #define vcmlaq_m_f16(__a, __b, __c, __p) __arm_vcmlaq_m_f16(__a, __b, __c, __p)
 #define vcmlaq_rot180_m_f32(__a, __b, __c, __p) __arm_vcmlaq_rot180_m_f32(__a, __b, __c, __p)
@@ -756,24 +700,6 @@
 #define vmulltq_int_x_u8(__a, __b, __p) __arm_vmulltq_int_x_u8(__a, __b, __p)
 #define vmulltq_int_x_u16(__a, __b, __p) __arm_vmulltq_int_x_u16(__a, __b, __p)
 #define vmulltq_int_x_u32(__a, __b, __p) __arm_vmulltq_int_x_u32(__a, __b, __p)
-#define vcaddq_rot90_x_s8(__a, __b, __p) __arm_vcaddq_rot90_x_s8(__a, __b, __p)
-#define vcaddq_rot90_x_s16(__a, __b, __p) __arm_vcaddq_rot90_x_s16(__a, __b, __p)
-#define vcaddq_rot90_x_s32(__a, __b, __p) __arm_vcaddq_rot90_x_s32(__a, __b, __p)
-#define vcaddq_rot90_x_u8(__a, __b, __p) __arm_vcaddq_rot90_x_u8(__a, __b, __p)
-#define vcaddq_rot90_x_u16(__a, __b, __p) __arm_vcaddq_rot90_x_u16(__a, __b, __p)
-#define vcaddq_rot90_x_u32(__a, __b, __p) __arm_vcaddq_rot90_x_u32(__a, __b, __p)
-#define vcaddq_rot270_x_s8(__a, __b, __p) __arm_vcaddq_rot270_x_s8(__a, __b, __p)
-#define vcaddq_rot270_x_s16(__a, __b, __p) __arm_vcaddq_rot270_x_s16(__a, __b, __p)
-#define vcaddq_rot270_x_s32(__a, __b, __p) __arm_vcaddq_rot270_x_s32(__a, __b, __p)
-#define vcaddq_rot270_x_u8(__a, __b, __p) __arm_vcaddq_rot270_x_u8(__a, __b, __p)
-#define vcaddq_rot270_x_u16(__a, __b, __p) __arm_vcaddq_rot270_x_u16(__a, __b, __p)
-#define vcaddq_rot270_x_u32(__a, __b, __p) __arm_vcaddq_rot270_x_u32(__a, __b, __p)
-#define vhcaddq_rot90_x_s8(__a, __b, __p) __arm_vhcaddq_rot90_x_s8(__a, __b, __p)
-#define vhcaddq_rot90_x_s16(__a, __b, __p) __arm_vhcaddq_rot90_x_s16(__a, __b, __p)
-#define vhcaddq_rot90_x_s32(__a, __b, __p) __arm_vhcaddq_rot90_x_s32(__a, __b, __p)
-#define vhcaddq_rot270_x_s8(__a, __b, __p) __arm_vhcaddq_rot270_x_s8(__a, __b, __p)
-#define vhcaddq_rot270_x_s16(__a, __b, __p) __arm_vhcaddq_rot270_x_s16(__a, __b, __p)
-#define vhcaddq_rot270_x_s32(__a, __b, __p) __arm_vhcaddq_rot270_x_s32(__a, __b, __p)
 #define vbicq_x_s8(__a, __b, __p) __arm_vbicq_x_s8(__a, __b, __p)
 #define vbicq_x_s16(__a, __b, __p) __arm_vbicq_x_s16(__a, __b, __p)
 #define vbicq_x_s32(__a, __b, __p) __arm_vbicq_x_s32(__a, __b, __p)
@@ -786,10 +712,6 @@
 #define vornq_x_u8(__a, __b, __p) __arm_vornq_x_u8(__a, __b, __p)
 #define vornq_x_u16(__a, __b, __p) __arm_vornq_x_u16(__a, __b, __p)
 #define vornq_x_u32(__a, __b, __p) __arm_vornq_x_u32(__a, __b, __p)
-#define vcaddq_rot90_x_f16(__a, __b, __p) __arm_vcaddq_rot90_x_f16(__a, __b, __p)
-#define vcaddq_rot90_x_f32(__a, __b, __p) __arm_vcaddq_rot90_x_f32(__a, __b, __p)
-#define vcaddq_rot270_x_f16(__a, __b, __p) __arm_vcaddq_rot270_x_f16(__a, __b, __p)
-#define vcaddq_rot270_x_f32(__a, __b, __p) __arm_vcaddq_rot270_x_f32(__a, __b, __p)
 #define vcmulq_x_f16(__a, __b, __p) __arm_vcmulq_x_f16(__a, __b, __p)
 #define vcmulq_x_f32(__a, __b, __p) __arm_vcmulq_x_f32(__a, __b, __p)
 #define vcmulq_rot90_x_f16(__a, __b, __p) __arm_vcmulq_rot90_x_f16(__a, __b, __p)
@@ -1058,22 +980,6 @@ __arm_vmullbq_int_u8 (uint8x16_t __a, uint8x16_t __b)
   return __builtin_mve_vmullbq_int_uv16qi (__a, __b);
 }
 
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_u8 (uint8x16_t __a, uint8x16_t __b)
-{
-  return (uint8x16_t)
-    __builtin_mve_vcaddq_rot90v16qi ((int8x16_t)__a, (int8x16_t)__b);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_u8 (uint8x16_t __a, uint8x16_t __b)
-{
-  return (uint8x16_t)
-    __builtin_mve_vcaddq_rot270v16qi ((int8x16_t)__a, (int8x16_t)__b);
-}
-
 __extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq_u8 (uint8x16_t __a, uint8x16_t __b)
@@ -1102,34 +1008,6 @@ __arm_vmullbq_int_s8 (int8x16_t __a, int8x16_t __b)
   return __builtin_mve_vmullbq_int_sv16qi (__a, __b);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot90_s8 (int8x16_t __a, int8x16_t __b)
-{
-  return __builtin_mve_vhcaddq_rot90_sv16qi (__a, __b);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot270_s8 (int8x16_t __a, int8x16_t __b)
-{
-  return __builtin_mve_vhcaddq_rot270_sv16qi (__a, __b);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_s8 (int8x16_t __a, int8x16_t __b)
-{
-  return __builtin_mve_vcaddq_rot90v16qi (__a, __b);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_s8 (int8x16_t __a, int8x16_t __b)
-{
-  return __builtin_mve_vcaddq_rot270v16qi (__a, __b);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq_s8 (int8x16_t __a, int8x16_t __b)
@@ -1158,22 +1036,6 @@ __arm_vmullbq_int_u16 (uint16x8_t __a, uint16x8_t __b)
   return __builtin_mve_vmullbq_int_uv8hi (__a, __b);
 }
 
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_u16 (uint16x8_t __a, uint16x8_t __b)
-{
-  return (uint16x8_t)
-    __builtin_mve_vcaddq_rot90v8hi ((int16x8_t)__a, (int16x8_t)__b);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_u16 (uint16x8_t __a, uint16x8_t __b)
-{
-  return (uint16x8_t)
-    __builtin_mve_vcaddq_rot270v8hi ((int16x8_t)__a, (int16x8_t)__b);
-}
-
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq_u16 (uint16x8_t __a, uint16x8_t __b)
@@ -1202,34 +1064,6 @@ __arm_vmullbq_int_s16 (int16x8_t __a, int16x8_t __b)
   return __builtin_mve_vmullbq_int_sv8hi (__a, __b);
 }
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot90_s16 (int16x8_t __a, int16x8_t __b)
-{
-  return __builtin_mve_vhcaddq_rot90_sv8hi (__a, __b);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot270_s16 (int16x8_t __a, int16x8_t __b)
-{
-  return __builtin_mve_vhcaddq_rot270_sv8hi (__a, __b);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_s16 (int16x8_t __a, int16x8_t __b)
-{
-  return __builtin_mve_vcaddq_rot90v8hi (__a, __b);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_s16 (int16x8_t __a, int16x8_t __b)
-{
-  return __builtin_mve_vcaddq_rot270v8hi (__a, __b);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq_s16 (int16x8_t __a, int16x8_t __b)
@@ -1258,22 +1092,6 @@ __arm_vmullbq_int_u32 (uint32x4_t __a, uint32x4_t __b)
   return __builtin_mve_vmullbq_int_uv4si (__a, __b);
 }
 
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_u32 (uint32x4_t __a, uint32x4_t __b)
-{
-  return (uint32x4_t)
-    __builtin_mve_vcaddq_rot90v4si ((int32x4_t)__a, (int32x4_t)__b);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_u32 (uint32x4_t __a, uint32x4_t __b)
-{
-  return (uint32x4_t)
-    __builtin_mve_vcaddq_rot270v4si ((int32x4_t)__a, (int32x4_t)__b);
-}
-
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq_u32 (uint32x4_t __a, uint32x4_t __b)
@@ -1302,34 +1120,6 @@ __arm_vmullbq_int_s32 (int32x4_t __a, int32x4_t __b)
   return __builtin_mve_vmullbq_int_sv4si (__a, __b);
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot90_s32 (int32x4_t __a, int32x4_t __b)
-{
-  return __builtin_mve_vhcaddq_rot90_sv4si (__a, __b);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot270_s32 (int32x4_t __a, int32x4_t __b)
-{
-  return __builtin_mve_vhcaddq_rot270_sv4si (__a, __b);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_s32 (int32x4_t __a, int32x4_t __b)
-{
-  return __builtin_mve_vcaddq_rot90v4si (__a, __b);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_s32 (int32x4_t __a, int32x4_t __b)
-{
-  return __builtin_mve_vcaddq_rot270v4si (__a, __b);
-}
-
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq_s32 (int32x4_t __a, int32x4_t __b)
@@ -1545,132 +1335,6 @@ __arm_vbicq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pr
   return __builtin_mve_vbicq_m_uv8hi (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcaddq_rot270_m_sv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcaddq_rot270_m_sv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcaddq_rot270_m_sv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_m_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcaddq_rot270_m_uv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcaddq_rot270_m_uv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcaddq_rot270_m_uv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcaddq_rot90_m_sv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcaddq_rot90_m_sv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcaddq_rot90_m_sv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_m_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcaddq_rot90_m_uv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcaddq_rot90_m_uv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcaddq_rot90_m_uv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot270_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhcaddq_rot270_m_sv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot270_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhcaddq_rot270_m_sv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot270_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhcaddq_rot270_m_sv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot90_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhcaddq_rot90_m_sv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot90_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhcaddq_rot90_m_sv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot90_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhcaddq_rot90_m_sv8hi (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmullbq_int_m_s8 (int16x8_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
@@ -3850,281 +3514,155 @@ __arm_vmulltq_int_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
 
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
+__arm_vbicq_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
 {
-  return __builtin_mve_vcaddq_rot90_m_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
+  return __builtin_mve_vbicq_m_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
 }
 
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_x_s16 (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
+__arm_vbicq_x_s16 (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
 {
-  return __builtin_mve_vcaddq_rot90_m_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
+  return __builtin_mve_vbicq_m_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
 }
 
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_x_s32 (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
+__arm_vbicq_x_s32 (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
 {
-  return __builtin_mve_vcaddq_rot90_m_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
+  return __builtin_mve_vbicq_m_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
 }
 
 __extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_x_u8 (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
+__arm_vbicq_x_u8 (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
 {
-  return __builtin_mve_vcaddq_rot90_m_uv16qi (__arm_vuninitializedq_u8 (), __a, __b, __p);
+  return __builtin_mve_vbicq_m_uv16qi (__arm_vuninitializedq_u8 (), __a, __b, __p);
 }
 
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_x_u16 (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
+__arm_vbicq_x_u16 (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
 {
-  return __builtin_mve_vcaddq_rot90_m_uv8hi (__arm_vuninitializedq_u16 (), __a, __b, __p);
+  return __builtin_mve_vbicq_m_uv8hi (__arm_vuninitializedq_u16 (), __a, __b, __p);
 }
 
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
+__arm_vbicq_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
 {
-  return __builtin_mve_vcaddq_rot90_m_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
+  return __builtin_mve_vbicq_m_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
 }
 
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
+__arm_vornq_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
 {
-  return __builtin_mve_vcaddq_rot270_m_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
+  return __builtin_mve_vornq_m_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
 }
 
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_x_s16 (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
+__arm_vornq_x_s16 (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
 {
-  return __builtin_mve_vcaddq_rot270_m_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
+  return __builtin_mve_vornq_m_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
 }
 
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_x_s32 (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
+__arm_vornq_x_s32 (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
 {
-  return __builtin_mve_vcaddq_rot270_m_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
+  return __builtin_mve_vornq_m_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
 }
 
 __extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_x_u8 (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
+__arm_vornq_x_u8 (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
 {
-  return __builtin_mve_vcaddq_rot270_m_uv16qi (__arm_vuninitializedq_u8 (), __a, __b, __p);
+  return __builtin_mve_vornq_m_uv16qi (__arm_vuninitializedq_u8 (), __a, __b, __p);
 }
 
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_x_u16 (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
+__arm_vornq_x_u16 (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
 {
-  return __builtin_mve_vcaddq_rot270_m_uv8hi (__arm_vuninitializedq_u16 (), __a, __b, __p);
+  return __builtin_mve_vornq_m_uv8hi (__arm_vuninitializedq_u16 (), __a, __b, __p);
 }
 
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
+__arm_vornq_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
 {
-  return __builtin_mve_vcaddq_rot270_m_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
+  return __builtin_mve_vornq_m_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
+__extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot90_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
+__arm_vadciq_s32 (int32x4_t __a, int32x4_t __b, unsigned * __carry_out)
 {
-  return __builtin_mve_vhcaddq_rot90_m_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
+  int32x4_t __res = __builtin_mve_vadciq_sv4si (__a, __b);
+  *__carry_out = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
+  return __res;
 }
 
-__extension__ extern __inline int16x8_t
+__extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot90_x_s16 (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
+__arm_vadciq_u32 (uint32x4_t __a, uint32x4_t __b, unsigned * __carry_out)
 {
-  return __builtin_mve_vhcaddq_rot90_m_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
+  uint32x4_t __res = __builtin_mve_vadciq_uv4si (__a, __b);
+  *__carry_out = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
+  return __res;
 }
 
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot90_x_s32 (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhcaddq_rot90_m_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot270_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
+__arm_vadciq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, unsigned * __carry_out, mve_pred16_t __p)
 {
-  return __builtin_mve_vhcaddq_rot270_m_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
+  int32x4_t __res =  __builtin_mve_vadciq_m_sv4si (__inactive, __a, __b, __p);
+  *__carry_out = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
+  return __res;
 }
 
-__extension__ extern __inline int16x8_t
+__extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot270_x_s16 (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
+__arm_vadciq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, unsigned * __carry_out, mve_pred16_t __p)
 {
-  return __builtin_mve_vhcaddq_rot270_m_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
+  uint32x4_t __res = __builtin_mve_vadciq_m_uv4si (__inactive, __a, __b, __p);
+  *__carry_out = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
+  return __res;
 }
 
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot270_x_s32 (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
+__arm_vadcq_s32 (int32x4_t __a, int32x4_t __b, unsigned * __carry)
 {
-  return __builtin_mve_vhcaddq_rot270_m_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
+  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x20000000u) | ((*__carry & 0x1u) << 29));
+  int32x4_t __res = __builtin_mve_vadcq_sv4si (__a, __b);
+  *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
+  return __res;
 }
 
-__extension__ extern __inline int8x16_t
+__extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vbicq_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
+__arm_vadcq_u32 (uint32x4_t __a, uint32x4_t __b, unsigned * __carry)
 {
-  return __builtin_mve_vbicq_m_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
+  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x20000000u) | ((*__carry & 0x1u) << 29));
+  uint32x4_t __res = __builtin_mve_vadcq_uv4si (__a, __b);
+  *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
+  return __res;
 }
 
-__extension__ extern __inline int16x8_t
+__extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vbicq_x_s16 (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
+__arm_vadcq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, unsigned * __carry, mve_pred16_t __p)
 {
-  return __builtin_mve_vbicq_m_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
+  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x20000000u) | ((*__carry & 0x1u) << 29));
+  int32x4_t __res = __builtin_mve_vadcq_m_sv4si (__inactive, __a, __b, __p);
+  *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
+  return __res;
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vbicq_x_s32 (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vbicq_m_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vbicq_x_u8 (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vbicq_m_uv16qi (__arm_vuninitializedq_u8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vbicq_x_u16 (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vbicq_m_uv8hi (__arm_vuninitializedq_u16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vbicq_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vbicq_m_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vornq_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vornq_m_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vornq_x_s16 (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vornq_m_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vornq_x_s32 (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vornq_m_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vornq_x_u8 (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vornq_m_uv16qi (__arm_vuninitializedq_u8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vornq_x_u16 (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vornq_m_uv8hi (__arm_vuninitializedq_u16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vornq_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vornq_m_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vadciq_s32 (int32x4_t __a, int32x4_t __b, unsigned * __carry_out)
-{
-  int32x4_t __res = __builtin_mve_vadciq_sv4si (__a, __b);
-  *__carry_out = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
-  return __res;
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vadciq_u32 (uint32x4_t __a, uint32x4_t __b, unsigned * __carry_out)
-{
-  uint32x4_t __res = __builtin_mve_vadciq_uv4si (__a, __b);
-  *__carry_out = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
-  return __res;
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vadciq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, unsigned * __carry_out, mve_pred16_t __p)
-{
-  int32x4_t __res =  __builtin_mve_vadciq_m_sv4si (__inactive, __a, __b, __p);
-  *__carry_out = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
-  return __res;
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vadciq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, unsigned * __carry_out, mve_pred16_t __p)
-{
-  uint32x4_t __res = __builtin_mve_vadciq_m_uv4si (__inactive, __a, __b, __p);
-  *__carry_out = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
-  return __res;
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vadcq_s32 (int32x4_t __a, int32x4_t __b, unsigned * __carry)
-{
-  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x20000000u) | ((*__carry & 0x1u) << 29));
-  int32x4_t __res = __builtin_mve_vadcq_sv4si (__a, __b);
-  *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
-  return __res;
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vadcq_u32 (uint32x4_t __a, uint32x4_t __b, unsigned * __carry)
-{
-  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x20000000u) | ((*__carry & 0x1u) << 29));
-  uint32x4_t __res = __builtin_mve_vadcq_uv4si (__a, __b);
-  *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
-  return __res;
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vadcq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, unsigned * __carry, mve_pred16_t __p)
-{
-  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & ~0x20000000u) | ((*__carry & 0x1u) << 29));
-  int32x4_t __res = __builtin_mve_vadcq_m_sv4si (__inactive, __a, __b, __p);
-  *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
-  return __res;
-}
-
-__extension__ extern __inline uint32x4_t
+__extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vadcq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, unsigned * __carry, mve_pred16_t __p)
 {
@@ -5051,20 +4589,6 @@ __arm_vcmulq_f16 (float16x8_t __a, float16x8_t __b)
   return __builtin_mve_vcmulqv8hf (__a, __b);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_f16 (float16x8_t __a, float16x8_t __b)
-{
-  return __builtin_mve_vcaddq_rot90v8hf (__a, __b);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_f16 (float16x8_t __a, float16x8_t __b)
-{
-  return __builtin_mve_vcaddq_rot270v8hf (__a, __b);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq_f16 (float16x8_t __a, float16x8_t __b)
@@ -5107,20 +4631,6 @@ __arm_vcmulq_f32 (float32x4_t __a, float32x4_t __b)
   return __builtin_mve_vcmulqv4sf (__a, __b);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_f32 (float32x4_t __a, float32x4_t __b)
-{
-  return __builtin_mve_vcaddq_rot90v4sf (__a, __b);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_f32 (float32x4_t __a, float32x4_t __b)
-{
-  return __builtin_mve_vcaddq_rot270v4sf (__a, __b);
-}
-
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq_f32 (float32x4_t __a, float32x4_t __b)
@@ -5437,34 +4947,6 @@ __arm_vbicq_m_f16 (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve
   return __builtin_mve_vbicq_m_fv8hf (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_m_f32 (float32x4_t __inactive, float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcaddq_rot270_m_fv4sf (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_m_f16 (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcaddq_rot270_m_fv8hf (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_m_f32 (float32x4_t __inactive, float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcaddq_rot90_m_fv4sf (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_m_f16 (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcaddq_rot90_m_fv8hf (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcmlaq_m_f32 (float32x4_t __a, float32x4_t __b, float32x4_t __c, mve_pred16_t __p)
@@ -5877,34 +5359,6 @@ __arm_vstrwq_scatter_base_wb_p_f32 (uint32x4_t * __addr, const int __offset, flo
   *__addr = __builtin_mve_vstrwq_scatter_base_wb_p_fv4sf (*__addr, __offset, __value, __p);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_x_f16 (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcaddq_rot90_m_fv8hf (__arm_vuninitializedq_f16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_x_f32 (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcaddq_rot90_m_fv4sf (__arm_vuninitializedq_f32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_x_f16 (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcaddq_rot270_m_fv8hf (__arm_vuninitializedq_f16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_x_f32 (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcaddq_rot270_m_fv4sf (__arm_vuninitializedq_f32 (), __a, __b, __p);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcmulq_x_f16 (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
@@ -6408,20 +5862,6 @@ __arm_vmullbq_int (uint8x16_t __a, uint8x16_t __b)
  return __arm_vmullbq_int_u8 (__a, __b);
 }
 
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90 (uint8x16_t __a, uint8x16_t __b)
-{
- return __arm_vcaddq_rot90_u8 (__a, __b);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270 (uint8x16_t __a, uint8x16_t __b)
-{
- return __arm_vcaddq_rot270_u8 (__a, __b);
-}
-
 __extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq (uint8x16_t __a, uint8x16_t __b)
@@ -6450,34 +5890,6 @@ __arm_vmullbq_int (int8x16_t __a, int8x16_t __b)
  return __arm_vmullbq_int_s8 (__a, __b);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot90 (int8x16_t __a, int8x16_t __b)
-{
- return __arm_vhcaddq_rot90_s8 (__a, __b);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot270 (int8x16_t __a, int8x16_t __b)
-{
- return __arm_vhcaddq_rot270_s8 (__a, __b);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90 (int8x16_t __a, int8x16_t __b)
-{
- return __arm_vcaddq_rot90_s8 (__a, __b);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270 (int8x16_t __a, int8x16_t __b)
-{
- return __arm_vcaddq_rot270_s8 (__a, __b);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq (int8x16_t __a, int8x16_t __b)
@@ -6506,20 +5918,6 @@ __arm_vmullbq_int (uint16x8_t __a, uint16x8_t __b)
  return __arm_vmullbq_int_u16 (__a, __b);
 }
 
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90 (uint16x8_t __a, uint16x8_t __b)
-{
- return __arm_vcaddq_rot90_u16 (__a, __b);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270 (uint16x8_t __a, uint16x8_t __b)
-{
- return __arm_vcaddq_rot270_u16 (__a, __b);
-}
-
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq (uint16x8_t __a, uint16x8_t __b)
@@ -6548,34 +5946,6 @@ __arm_vmullbq_int (int16x8_t __a, int16x8_t __b)
  return __arm_vmullbq_int_s16 (__a, __b);
 }
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot90 (int16x8_t __a, int16x8_t __b)
-{
- return __arm_vhcaddq_rot90_s16 (__a, __b);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot270 (int16x8_t __a, int16x8_t __b)
-{
- return __arm_vhcaddq_rot270_s16 (__a, __b);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90 (int16x8_t __a, int16x8_t __b)
-{
- return __arm_vcaddq_rot90_s16 (__a, __b);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270 (int16x8_t __a, int16x8_t __b)
-{
- return __arm_vcaddq_rot270_s16 (__a, __b);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq (int16x8_t __a, int16x8_t __b)
@@ -6604,20 +5974,6 @@ __arm_vmullbq_int (uint32x4_t __a, uint32x4_t __b)
  return __arm_vmullbq_int_u32 (__a, __b);
 }
 
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90 (uint32x4_t __a, uint32x4_t __b)
-{
- return __arm_vcaddq_rot90_u32 (__a, __b);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270 (uint32x4_t __a, uint32x4_t __b)
-{
- return __arm_vcaddq_rot270_u32 (__a, __b);
-}
-
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq (uint32x4_t __a, uint32x4_t __b)
@@ -6646,34 +6002,6 @@ __arm_vmullbq_int (int32x4_t __a, int32x4_t __b)
  return __arm_vmullbq_int_s32 (__a, __b);
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot90 (int32x4_t __a, int32x4_t __b)
-{
- return __arm_vhcaddq_rot90_s32 (__a, __b);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot270 (int32x4_t __a, int32x4_t __b)
-{
- return __arm_vhcaddq_rot270_s32 (__a, __b);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90 (int32x4_t __a, int32x4_t __b)
-{
- return __arm_vcaddq_rot90_s32 (__a, __b);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270 (int32x4_t __a, int32x4_t __b)
-{
- return __arm_vcaddq_rot270_s32 (__a, __b);
-}
-
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq (int32x4_t __a, int32x4_t __b)
@@ -6751,228 +6079,102 @@ __arm_vbicq_m_n (int32x4_t __a, const int __imm, mve_pred16_t __p)
  return __arm_vbicq_m_n_s32 (__a, __imm, __p);
 }
 
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vbicq_m_n (uint16x8_t __a, const int __imm, mve_pred16_t __p)
-{
- return __arm_vbicq_m_n_u16 (__a, __imm, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vbicq_m_n (uint32x4_t __a, const int __imm, mve_pred16_t __p)
-{
- return __arm_vbicq_m_n_u32 (__a, __imm, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vshlcq (int8x16_t __a, uint32_t * __b, const int __imm)
-{
- return __arm_vshlcq_s8 (__a, __b, __imm);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vshlcq (uint8x16_t __a, uint32_t * __b, const int __imm)
-{
- return __arm_vshlcq_u8 (__a, __b, __imm);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vshlcq (int16x8_t __a, uint32_t * __b, const int __imm)
-{
- return __arm_vshlcq_s16 (__a, __b, __imm);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vshlcq (uint16x8_t __a, uint32_t * __b, const int __imm)
-{
- return __arm_vshlcq_u16 (__a, __b, __imm);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vshlcq (int32x4_t __a, uint32_t * __b, const int __imm)
-{
- return __arm_vshlcq_s32 (__a, __b, __imm);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vshlcq (uint32x4_t __a, uint32_t * __b, const int __imm)
-{
- return __arm_vshlcq_u32 (__a, __b, __imm);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vbicq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vbicq_m_s8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vbicq_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vbicq_m_s32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vbicq_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vbicq_m_s16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vbicq_m (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vbicq_m_u8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vbicq_m (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vbicq_m_u32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vbicq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vbicq_m_u16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vcaddq_rot270_m_s8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vcaddq_rot270_m_s32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vcaddq_rot270_m_s16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_m (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vcaddq_rot270_m_u8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
+__extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_m (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
+__arm_vbicq_m_n (uint16x8_t __a, const int __imm, mve_pred16_t __p)
 {
- return __arm_vcaddq_rot270_m_u32 (__inactive, __a, __b, __p);
+ return __arm_vbicq_m_n_u16 (__a, __imm, __p);
 }
 
-__extension__ extern __inline uint16x8_t
+__extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
+__arm_vbicq_m_n (uint32x4_t __a, const int __imm, mve_pred16_t __p)
 {
- return __arm_vcaddq_rot270_m_u16 (__inactive, __a, __b, __p);
+ return __arm_vbicq_m_n_u32 (__a, __imm, __p);
 }
 
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
+__arm_vshlcq (int8x16_t __a, uint32_t * __b, const int __imm)
 {
- return __arm_vcaddq_rot90_m_s8 (__inactive, __a, __b, __p);
+ return __arm_vshlcq_s8 (__a, __b, __imm);
 }
 
-__extension__ extern __inline int32x4_t
+__extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
+__arm_vshlcq (uint8x16_t __a, uint32_t * __b, const int __imm)
 {
- return __arm_vcaddq_rot90_m_s32 (__inactive, __a, __b, __p);
+ return __arm_vshlcq_u8 (__a, __b, __imm);
 }
 
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
+__arm_vshlcq (int16x8_t __a, uint32_t * __b, const int __imm)
 {
- return __arm_vcaddq_rot90_m_s16 (__inactive, __a, __b, __p);
+ return __arm_vshlcq_s16 (__a, __b, __imm);
 }
 
-__extension__ extern __inline uint8x16_t
+__extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_m (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
+__arm_vshlcq (uint16x8_t __a, uint32_t * __b, const int __imm)
 {
- return __arm_vcaddq_rot90_m_u8 (__inactive, __a, __b, __p);
+ return __arm_vshlcq_u16 (__a, __b, __imm);
 }
 
-__extension__ extern __inline uint32x4_t
+__extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_m (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
+__arm_vshlcq (int32x4_t __a, uint32_t * __b, const int __imm)
 {
- return __arm_vcaddq_rot90_m_u32 (__inactive, __a, __b, __p);
+ return __arm_vshlcq_s32 (__a, __b, __imm);
 }
 
-__extension__ extern __inline uint16x8_t
+__extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
+__arm_vshlcq (uint32x4_t __a, uint32_t * __b, const int __imm)
 {
- return __arm_vcaddq_rot90_m_u16 (__inactive, __a, __b, __p);
+ return __arm_vshlcq_u32 (__a, __b, __imm);
 }
 
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot270_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
+__arm_vbicq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
 {
- return __arm_vhcaddq_rot270_m_s8 (__inactive, __a, __b, __p);
+ return __arm_vbicq_m_s8 (__inactive, __a, __b, __p);
 }
 
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot270_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
+__arm_vbicq_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
 {
- return __arm_vhcaddq_rot270_m_s32 (__inactive, __a, __b, __p);
+ return __arm_vbicq_m_s32 (__inactive, __a, __b, __p);
 }
 
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot270_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
+__arm_vbicq_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
 {
- return __arm_vhcaddq_rot270_m_s16 (__inactive, __a, __b, __p);
+ return __arm_vbicq_m_s16 (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
+__extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot90_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
+__arm_vbicq_m (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
 {
- return __arm_vhcaddq_rot90_m_s8 (__inactive, __a, __b, __p);
+ return __arm_vbicq_m_u8 (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline int32x4_t
+__extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot90_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
+__arm_vbicq_m (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
 {
- return __arm_vhcaddq_rot90_m_s32 (__inactive, __a, __b, __p);
+ return __arm_vbicq_m_u32 (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline int16x8_t
+__extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot90_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
+__arm_vbicq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
 {
- return __arm_vhcaddq_rot90_m_s16 (__inactive, __a, __b, __p);
+ return __arm_vbicq_m_u16 (__inactive, __a, __b, __p);
 }
 
 __extension__ extern __inline int16x8_t
@@ -8725,132 +7927,6 @@ __arm_vmulltq_int_x (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
  return __arm_vmulltq_int_x_u32 (__a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vcaddq_rot90_x_s8 (__a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_x (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vcaddq_rot90_x_s16 (__a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_x (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vcaddq_rot90_x_s32 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_x (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vcaddq_rot90_x_u8 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_x (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vcaddq_rot90_x_u16 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_x (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vcaddq_rot90_x_u32 (__a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vcaddq_rot270_x_s8 (__a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_x (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vcaddq_rot270_x_s16 (__a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_x (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vcaddq_rot270_x_s32 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_x (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vcaddq_rot270_x_u8 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_x (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vcaddq_rot270_x_u16 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_x (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vcaddq_rot270_x_u32 (__a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot90_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vhcaddq_rot90_x_s8 (__a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot90_x (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vhcaddq_rot90_x_s16 (__a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot90_x (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vhcaddq_rot90_x_s32 (__a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot270_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vhcaddq_rot270_x_s8 (__a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot270_x (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vhcaddq_rot270_x_s16 (__a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhcaddq_rot270_x (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vhcaddq_rot270_x_s32 (__a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
@@ -9532,20 +8608,6 @@ __arm_vcmulq (float16x8_t __a, float16x8_t __b)
  return __arm_vcmulq_f16 (__a, __b);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90 (float16x8_t __a, float16x8_t __b)
-{
- return __arm_vcaddq_rot90_f16 (__a, __b);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270 (float16x8_t __a, float16x8_t __b)
-{
- return __arm_vcaddq_rot270_f16 (__a, __b);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq (float16x8_t __a, float16x8_t __b)
@@ -9588,20 +8650,6 @@ __arm_vcmulq (float32x4_t __a, float32x4_t __b)
  return __arm_vcmulq_f32 (__a, __b);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90 (float32x4_t __a, float32x4_t __b)
-{
- return __arm_vcaddq_rot90_f32 (__a, __b);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270 (float32x4_t __a, float32x4_t __b)
-{
- return __arm_vcaddq_rot270_f32 (__a, __b);
-}
-
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq (float32x4_t __a, float32x4_t __b)
@@ -9903,34 +8951,6 @@ __arm_vbicq_m (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pre
  return __arm_vbicq_m_f16 (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_m (float32x4_t __inactive, float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vcaddq_rot270_m_f32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_m (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vcaddq_rot270_m_f16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_m (float32x4_t __inactive, float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vcaddq_rot90_m_f32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_m (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vcaddq_rot90_m_f16 (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcmlaq_m (float32x4_t __a, float32x4_t __b, float32x4_t __c, mve_pred16_t __p)
@@ -10281,34 +9301,6 @@ __arm_vstrwq_scatter_base_wb_p (uint32x4_t * __addr, const int __offset, float32
  __arm_vstrwq_scatter_base_wb_p_f32 (__addr, __offset, __value, __p);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_x (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vcaddq_rot90_x_f16 (__a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot90_x (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vcaddq_rot90_x_f32 (__a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_x (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vcaddq_rot270_x_f16 (__a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcaddq_rot270_x (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vcaddq_rot270_x_f32 (__a, __b, __p);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcmulq_x (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
@@ -10920,30 +9912,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vornq_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t)), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vornq_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t)));})
 
-#define __arm_vcaddq_rot270(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcaddq_rot270_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcaddq_rot270_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcaddq_rot270_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vcaddq_rot270_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcaddq_rot270_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcaddq_rot270_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcaddq_rot270_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcaddq_rot270_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t)));})
-
-#define __arm_vcaddq_rot90(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcaddq_rot90_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcaddq_rot90_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcaddq_rot90_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vcaddq_rot90_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcaddq_rot90_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcaddq_rot90_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcaddq_rot90_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcaddq_rot90_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t)));})
-
 #define __arm_vcmulq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
@@ -10990,20 +9958,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmulltq_int_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmulltq_int_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
 
-#define __arm_vhcaddq_rot270(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vhcaddq_rot270_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhcaddq_rot270_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhcaddq_rot270_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)));})
-
-#define __arm_vhcaddq_rot90(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vhcaddq_rot90_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhcaddq_rot90_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhcaddq_rot90_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)));})
-
 #define __arm_vmullbq_int(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
@@ -11139,32 +10093,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vbicq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vbicq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
 
-#define __arm_vcaddq_rot270_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcaddq_rot270_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcaddq_rot270_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcaddq_rot270_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vcaddq_rot270_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcaddq_rot270_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcaddq_rot270_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcaddq_rot270_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcaddq_rot270_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
-
-#define __arm_vcaddq_rot90_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcaddq_rot90_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcaddq_rot90_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcaddq_rot90_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vcaddq_rot90_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcaddq_rot90_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcaddq_rot90_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcaddq_rot90_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcaddq_rot90_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
-
 #define __arm_vcmlaq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
@@ -11558,30 +10486,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vbicq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vbicq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
 
-#define __arm_vcaddq_rot270_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcaddq_rot270_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcaddq_rot270_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcaddq_rot270_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vcaddq_rot270_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcaddq_rot270_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcaddq_rot270_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcaddq_rot270_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcaddq_rot270_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
-
-#define __arm_vcaddq_rot90_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcaddq_rot90_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcaddq_rot90_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcaddq_rot90_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vcaddq_rot90_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcaddq_rot90_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcaddq_rot90_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcaddq_rot90_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcaddq_rot90_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
-
 #define __arm_vcmulq_rot180_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
@@ -11710,40 +10614,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmullbq_int_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmullbq_int_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
 
-#define __arm_vhcaddq_rot90(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vhcaddq_rot90_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhcaddq_rot90_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhcaddq_rot90_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)));})
-
-#define __arm_vhcaddq_rot270(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vhcaddq_rot270_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhcaddq_rot270_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhcaddq_rot270_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)));})
-
-#define __arm_vcaddq_rot90(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcaddq_rot90_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcaddq_rot90_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcaddq_rot90_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vcaddq_rot90_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcaddq_rot90_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcaddq_rot90_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
-
-#define __arm_vcaddq_rot270(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcaddq_rot270_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcaddq_rot270_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcaddq_rot270_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vcaddq_rot270_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcaddq_rot270_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcaddq_rot270_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
-
 #define __arm_vbicq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
@@ -11797,28 +10667,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vbicq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vbicq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
 
-#define __arm_vcaddq_rot270_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcaddq_rot270_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcaddq_rot270_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcaddq_rot270_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vcaddq_rot270_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcaddq_rot270_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcaddq_rot270_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
-
-#define __arm_vcaddq_rot90_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcaddq_rot90_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcaddq_rot90_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcaddq_rot90_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vcaddq_rot90_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcaddq_rot90_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcaddq_rot90_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
-
 #define __arm_vornq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
@@ -12067,26 +10915,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint32x4_t]: __arm_vuninitializedq_u32 (), \
   int (*)[__ARM_mve_type_uint64x2_t]: __arm_vuninitializedq_u64 ());})
 
-#define __arm_vcaddq_rot270_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcaddq_rot270_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcaddq_rot270_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcaddq_rot270_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vcaddq_rot270_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcaddq_rot270_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcaddq_rot270_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
-
-#define __arm_vcaddq_rot90_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vcaddq_rot90_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vcaddq_rot90_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vcaddq_rot90_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vcaddq_rot90_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcaddq_rot90_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcaddq_rot90_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
-
 #define __arm_vmullbq_int_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
@@ -12251,20 +11079,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int_n]: __arm_vddupq_x_n_u32 ((uint32_t) __p1, p2, p3), \
   int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vddupq_x_wb_u32 (__ARM_mve_coerce_u32_ptr(__p1, uint32_t *), p2, p3));})
 
-#define __arm_vhcaddq_rot270_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vhcaddq_rot270_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhcaddq_rot270_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhcaddq_rot270_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3));})
-
-#define __arm_vhcaddq_rot90_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vhcaddq_rot90_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhcaddq_rot90_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhcaddq_rot90_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3));})
-
 #define __arm_vadciq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
@@ -12358,22 +11172,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint16x8_t]: __arm_vldrbq_gather_offset_z_u16 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *), __ARM_mve_coerce(__p1, uint16x8_t), p2), \
   int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vldrbq_gather_offset_z_u32 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *), __ARM_mve_coerce(__p1, uint32x4_t), p2));})
 
-#define __arm_vhcaddq_rot270_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vhcaddq_rot270_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhcaddq_rot270_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhcaddq_rot270_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3));})
-
-#define __arm_vhcaddq_rot90_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vhcaddq_rot90_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhcaddq_rot90_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhcaddq_rot90_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3));})
-
 #define __arm_vmullbq_int_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
-- 
2.34.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 3/6] arm: [MVE intrinsics factorize vcmulq
  2023-07-13 10:22 [PATCH 1/6] arm: [MVE intrinsics] Factorize vcaddq vhcaddq Christophe Lyon
  2023-07-13 10:22 ` [PATCH 2/6] arm: [MVE intrinsics] rework " Christophe Lyon
@ 2023-07-13 10:22 ` Christophe Lyon
  2023-07-13 10:22 ` [PATCH 4/6] arm: [MVE intrinsics] rework vcmulq Christophe Lyon
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Christophe Lyon @ 2023-07-13 10:22 UTC (permalink / raw)
  To: gcc-patches, Kyrylo.Tkachov, richard.earnshaw, richard.sandiford
  Cc: Christophe Lyon

Factorize vcmulq builtins so that they use parameterized names.

We can merged them with vcadd.

2023-07-13  Christophe Lyon  <christophe.lyon@linaro.org>

	gcc/:
	* config/arm/arm_mve_builtins.def (vcmulq_rot90_f)
	(vcmulq_rot270_f, vcmulq_rot180_f, vcmulq_f): Add "_f" suffix.
	* config/arm/iterators.md (MVE_VCADDQ_VCMULQ)
	(MVE_VCADDQ_VCMULQ_M): New.
	(mve_insn): Add vcmul.
	(rot): Add VCMULQ_M_F, VCMULQ_ROT90_M_F, VCMULQ_ROT180_M_F,
	VCMULQ_ROT270_M_F.
	(VCMUL): Delete.
	(mve_rot): Add VCMULQ_M_F, VCMULQ_ROT90_M_F, VCMULQ_ROT180_M_F,
	VCMULQ_ROT270_M_F.
	* config/arm/mve.md (mve_vcmulq<mve_rot><mode>): Merge into
	@mve_<mve_insn>q<mve_rot>_f<mode>.
	(mve_vcmulq_m_f<mode>, mve_vcmulq_rot180_m_f<mode>)
	(mve_vcmulq_rot270_m_f<mode>, mve_vcmulq_rot90_m_f<mode>): Merge
	into @mve_<mve_insn>q<mve_rot>_m_f<mode>.
---
 gcc/config/arm/arm_mve_builtins.def |  8 +--
 gcc/config/arm/iterators.md         | 27 +++++++--
 gcc/config/arm/mve.md               | 92 +++--------------------------
 3 files changed, 33 insertions(+), 94 deletions(-)

diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index 63ad1845593..56358c0bd02 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -191,6 +191,10 @@ VAR3 (BINOP_NONE_NONE_NONE, vcaddq_rot90_, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_NONE, vcaddq_rot270_, v16qi, v8hi, v4si)
 VAR2 (BINOP_NONE_NONE_NONE, vcaddq_rot90_f, v8hf, v4sf)
 VAR2 (BINOP_NONE_NONE_NONE, vcaddq_rot270_f, v8hf, v4sf)
+VAR2 (BINOP_NONE_NONE_NONE, vcmulq_rot90_f, v8hf, v4sf)
+VAR2 (BINOP_NONE_NONE_NONE, vcmulq_rot270_f, v8hf, v4sf)
+VAR2 (BINOP_NONE_NONE_NONE, vcmulq_rot180_f, v8hf, v4sf)
+VAR2 (BINOP_NONE_NONE_NONE, vcmulq_f, v8hf, v4sf)
 VAR3 (BINOP_NONE_NONE_NONE, vhcaddq_rot90_s, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_NONE, vhcaddq_rot270_s, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_NONE, vhaddq_s, v16qi, v8hi, v4si)
@@ -874,10 +878,6 @@ VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vshlcq_m_vec_u, v16qi, v8hi, v4si)
 VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vshlcq_m_carry_u, v16qi, v8hi, v4si)
 
 /* optabs without any suffixes.  */
-VAR2 (BINOP_NONE_NONE_NONE, vcmulq_rot90, v8hf, v4sf)
-VAR2 (BINOP_NONE_NONE_NONE, vcmulq_rot270, v8hf, v4sf)
-VAR2 (BINOP_NONE_NONE_NONE, vcmulq_rot180, v8hf, v4sf)
-VAR2 (BINOP_NONE_NONE_NONE, vcmulq, v8hf, v4sf)
 VAR2 (TERNOP_NONE_NONE_NONE_NONE, vcmlaq_rot90, v8hf, v4sf)
 VAR2 (TERNOP_NONE_NONE_NONE_NONE, vcmlaq_rot270, v8hf, v4sf)
 VAR2 (TERNOP_NONE_NONE_NONE_NONE, vcmlaq_rot180, v8hf, v4sf)
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index da1ead34e58..9f71404e26c 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -901,8 +901,19 @@
 		     VPSELQ_F
 		     ])
 
+(define_int_iterator MVE_VCADDQ_VCMULQ [
+		     UNSPEC_VCADD90 UNSPEC_VCADD270
+		     UNSPEC_VCMUL UNSPEC_VCMUL90 UNSPEC_VCMUL180 UNSPEC_VCMUL270
+		     ])
+
+(define_int_iterator MVE_VCADDQ_VCMULQ_M [
+		     VCADDQ_ROT90_M_F VCADDQ_ROT270_M_F
+		     VCMULQ_M_F VCMULQ_ROT90_M_F VCMULQ_ROT180_M_F VCMULQ_ROT270_M_F
+		     ])
+
 (define_int_attr mve_insn [
 		 (UNSPEC_VCADD90 "vcadd") (UNSPEC_VCADD270 "vcadd")
+		 (UNSPEC_VCMUL "vcmul") (UNSPEC_VCMUL90 "vcmul") (UNSPEC_VCMUL180 "vcmul") (UNSPEC_VCMUL270 "vcmul")
 		 (VABAVQ_P_S "vabav") (VABAVQ_P_U "vabav")
 		 (VABAVQ_S "vabav") (VABAVQ_U "vabav")
 		 (VABDQ_M_S "vabd") (VABDQ_M_U "vabd") (VABDQ_M_F "vabd")
@@ -931,6 +942,7 @@
 		 (VCLSQ_M_S "vcls")
 		 (VCLSQ_S "vcls")
 		 (VCLZQ_M_S "vclz") (VCLZQ_M_U "vclz")
+		 (VCMULQ_M_F "vcmul") (VCMULQ_ROT90_M_F "vcmul") (VCMULQ_ROT180_M_F "vcmul") (VCMULQ_ROT270_M_F "vcmul")
 		 (VCREATEQ_S "vcreate") (VCREATEQ_U "vcreate") (VCREATEQ_F "vcreate")
 		 (VDUPQ_M_N_S "vdup") (VDUPQ_M_N_U "vdup") (VDUPQ_M_N_F "vdup")
 		 (VDUPQ_N_S "vdup") (VDUPQ_N_U "vdup") (VDUPQ_N_F "vdup")
@@ -2182,7 +2194,11 @@
 		      (UNSPEC_VCMLA "0")
 		      (UNSPEC_VCMLA90 "90")
 		      (UNSPEC_VCMLA180 "180")
-		      (UNSPEC_VCMLA270 "270")])
+		      (UNSPEC_VCMLA270 "270")
+		      (VCMULQ_M_F "0")
+		      (VCMULQ_ROT90_M_F "90")
+		      (VCMULQ_ROT180_M_F "180")
+		      (VCMULQ_ROT270_M_F "270")])
 
 ;; The complex operations when performed on a real complex number require two
 ;; instructions to perform the operation. e.g. complex multiplication requires
@@ -2230,10 +2246,11 @@
 			  (UNSPEC_VCMUL "")
 			  (UNSPEC_VCMUL90 "_rot90")
 			  (UNSPEC_VCMUL180 "_rot180")
-			  (UNSPEC_VCMUL270 "_rot270")])
-
-(define_int_iterator VCMUL [UNSPEC_VCMUL UNSPEC_VCMUL90
-			    UNSPEC_VCMUL180 UNSPEC_VCMUL270])
+			  (UNSPEC_VCMUL270 "_rot270")
+			  (VCMULQ_M_F "")
+			  (VCMULQ_ROT90_M_F "_rot90")
+			  (VCMULQ_ROT180_M_F "_rot180")
+			  (VCMULQ_ROT270_M_F "_rot270")])
 
 (define_int_attr fcmac1 [(UNSPEC_VCMLA "a") (UNSPEC_VCMLA_CONJ "a")
 			 (UNSPEC_VCMLA180 "s") (UNSPEC_VCMLA180_CONJ "s")])
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index a6db6d1b81d..0b99bf017dc 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -1212,13 +1212,14 @@
 
 ;;
 ;; [vcaddq_rot90_f, vcaddq_rot270_f]
+;; [vcmulq, vcmulq_rot90, vcmulq_rot180, vcmulq_rot270]
 ;;
 (define_insn "@mve_<mve_insn>q<mve_rot>_f<mode>"
   [
    (set (match_operand:MVE_0 0 "s_register_operand" "<earlyclobber_32>")
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")]
-	 VCADD))
+	 MVE_VCADDQ_VCMULQ))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
   "<mve_insn>.f%#<V_sz_elem>\t%q0, %q1, %q2, #<rot>"
@@ -1254,21 +1255,6 @@
   [(set_attr "type" "mve_move")
 ])
 
-;;
-;; [vcmulq, vcmulq_rot90, vcmulq_rot180, vcmulq_rot270])
-;;
-(define_insn "mve_vcmulq<mve_rot><mode>"
-  [
-   (set (match_operand:MVE_0 0 "s_register_operand" "<earlyclobber_32>")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
-		       (match_operand:MVE_0 2 "s_register_operand" "w")]
-	 VCMUL))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vcmul.f%#<V_sz_elem>	%q0, %q1, %q2, #<rot>"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vctp8q_m vctp16q_m vctp32q_m vctp64q_m])
 ;;
@@ -3174,6 +3160,10 @@
 ;;
 ;; [vcaddq_rot90_m_f]
 ;; [vcaddq_rot270_m_f]
+;; [vcmulq_m_f]
+;; [vcmulq_rot90_m_f]
+;; [vcmulq_rot180_m_f]
+;; [vcmulq_rot270_m_f]
 ;;
 (define_insn "@mve_<mve_insn>q<mve_rot>_m_f<mode>"
   [
@@ -3182,7 +3172,7 @@
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:MVE_0 3 "s_register_operand" "w")
 		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VCADDQ_M_F))
+	 MVE_VCADDQ_VCMULQ_M))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
   "vpst\;<mve_insn>t.f%#<V_sz_elem>\t%q0, %q2, %q3, #<rot>"
@@ -3257,74 +3247,6 @@
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
-;;
-;; [vcmulq_m_f])
-;;
-(define_insn "mve_vcmulq_m_f<mode>"
-  [
-   (set (match_operand:MVE_0 0 "s_register_operand" "<earlyclobber_32>")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
-		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VCMULQ_M_F))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vcmult.f%#<V_sz_elem>	%q0, %q2, %q3, #0"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vcmulq_rot180_m_f])
-;;
-(define_insn "mve_vcmulq_rot180_m_f<mode>"
-  [
-   (set (match_operand:MVE_0 0 "s_register_operand" "<earlyclobber_32>")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
-		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VCMULQ_ROT180_M_F))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vcmult.f%#<V_sz_elem>	%q0, %q2, %q3, #180"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vcmulq_rot270_m_f])
-;;
-(define_insn "mve_vcmulq_rot270_m_f<mode>"
-  [
-   (set (match_operand:MVE_0 0 "s_register_operand" "<earlyclobber_32>")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
-		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VCMULQ_ROT270_M_F))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vcmult.f%#<V_sz_elem>	%q0, %q2, %q3, #270"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vcmulq_rot90_m_f])
-;;
-(define_insn "mve_vcmulq_rot90_m_f<mode>"
-  [
-   (set (match_operand:MVE_0 0 "s_register_operand" "<earlyclobber_32>")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
-		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VCMULQ_ROT90_M_F))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vcmult.f%#<V_sz_elem>	%q0, %q2, %q3, #90"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
 ;;
 ;; [vornq_m_f])
 ;;
-- 
2.34.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 4/6] arm: [MVE intrinsics] rework vcmulq
  2023-07-13 10:22 [PATCH 1/6] arm: [MVE intrinsics] Factorize vcaddq vhcaddq Christophe Lyon
  2023-07-13 10:22 ` [PATCH 2/6] arm: [MVE intrinsics] rework " Christophe Lyon
  2023-07-13 10:22 ` [PATCH 3/6] arm: [MVE intrinsics factorize vcmulq Christophe Lyon
@ 2023-07-13 10:22 ` Christophe Lyon
  2023-07-13 10:22 ` [PATCH 5/6] arm: [MVE intrinsics] factorize vcmlaq Christophe Lyon
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Christophe Lyon @ 2023-07-13 10:22 UTC (permalink / raw)
  To: gcc-patches, Kyrylo.Tkachov, richard.earnshaw, richard.sandiford
  Cc: Christophe Lyon

Implement vcmulq using the new MVE builtins framework.

2023-07-13 Christophe Lyon  <christophe.lyon@linaro.org>

	gcc/
	* config/arm/arm-mve-builtins-base.cc (vcmulq, vcmulq_rot90)
	(vcmulq_rot180, vcmulq_rot270): New.
	* config/arm/arm-mve-builtins-base.def (vcmulq, vcmulq_rot90)
	(vcmulq_rot180, vcmulq_rot270): New.
	* config/arm/arm-mve-builtins-base.h: (vcmulq, vcmulq_rot90)
	(vcmulq_rot180, vcmulq_rot270): New.
	* config/arm/arm_mve.h (vcmulq_rot90): Delete.
	(vcmulq_rot270): Delete.
	(vcmulq_rot180): Delete.
	(vcmulq): Delete.
	(vcmulq_m): Delete.
	(vcmulq_rot180_m): Delete.
	(vcmulq_rot270_m): Delete.
	(vcmulq_rot90_m): Delete.
	(vcmulq_x): Delete.
	(vcmulq_rot90_x): Delete.
	(vcmulq_rot180_x): Delete.
	(vcmulq_rot270_x): Delete.
	(vcmulq_rot90_f16): Delete.
	(vcmulq_rot270_f16): Delete.
	(vcmulq_rot180_f16): Delete.
	(vcmulq_f16): Delete.
	(vcmulq_rot90_f32): Delete.
	(vcmulq_rot270_f32): Delete.
	(vcmulq_rot180_f32): Delete.
	(vcmulq_f32): Delete.
	(vcmulq_m_f32): Delete.
	(vcmulq_m_f16): Delete.
	(vcmulq_rot180_m_f32): Delete.
	(vcmulq_rot180_m_f16): Delete.
	(vcmulq_rot270_m_f32): Delete.
	(vcmulq_rot270_m_f16): Delete.
	(vcmulq_rot90_m_f32): Delete.
	(vcmulq_rot90_m_f16): Delete.
	(vcmulq_x_f16): Delete.
	(vcmulq_x_f32): Delete.
	(vcmulq_rot90_x_f16): Delete.
	(vcmulq_rot90_x_f32): Delete.
	(vcmulq_rot180_x_f16): Delete.
	(vcmulq_rot180_x_f32): Delete.
	(vcmulq_rot270_x_f16): Delete.
	(vcmulq_rot270_x_f32): Delete.
	(__arm_vcmulq_rot90_f16): Delete.
	(__arm_vcmulq_rot270_f16): Delete.
	(__arm_vcmulq_rot180_f16): Delete.
	(__arm_vcmulq_f16): Delete.
	(__arm_vcmulq_rot90_f32): Delete.
	(__arm_vcmulq_rot270_f32): Delete.
	(__arm_vcmulq_rot180_f32): Delete.
	(__arm_vcmulq_f32): Delete.
	(__arm_vcmulq_m_f32): Delete.
	(__arm_vcmulq_m_f16): Delete.
	(__arm_vcmulq_rot180_m_f32): Delete.
	(__arm_vcmulq_rot180_m_f16): Delete.
	(__arm_vcmulq_rot270_m_f32): Delete.
	(__arm_vcmulq_rot270_m_f16): Delete.
	(__arm_vcmulq_rot90_m_f32): Delete.
	(__arm_vcmulq_rot90_m_f16): Delete.
	(__arm_vcmulq_x_f16): Delete.
	(__arm_vcmulq_x_f32): Delete.
	(__arm_vcmulq_rot90_x_f16): Delete.
	(__arm_vcmulq_rot90_x_f32): Delete.
	(__arm_vcmulq_rot180_x_f16): Delete.
	(__arm_vcmulq_rot180_x_f32): Delete.
	(__arm_vcmulq_rot270_x_f16): Delete.
	(__arm_vcmulq_rot270_x_f32): Delete.
	(__arm_vcmulq_rot90): Delete.
	(__arm_vcmulq_rot270): Delete.
	(__arm_vcmulq_rot180): Delete.
	(__arm_vcmulq): Delete.
	(__arm_vcmulq_m): Delete.
	(__arm_vcmulq_rot180_m): Delete.
	(__arm_vcmulq_rot270_m): Delete.
	(__arm_vcmulq_rot90_m): Delete.
	(__arm_vcmulq_x): Delete.
	(__arm_vcmulq_rot90_x): Delete.
	(__arm_vcmulq_rot180_x): Delete.
	(__arm_vcmulq_rot270_x): Delete.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |   4 +
 gcc/config/arm/arm-mve-builtins-base.def |   4 +
 gcc/config/arm/arm-mve-builtins-base.h   |   4 +
 gcc/config/arm/arm_mve.h                 | 448 -----------------------
 4 files changed, 12 insertions(+), 448 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index f15bb926147..3ad8df304e8 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -262,6 +262,10 @@ FUNCTION_WITH_RTX_M (vandq, AND, VANDQ)
 FUNCTION_ONLY_N (vbrsrq, VBRSRQ)
 FUNCTION (vcaddq_rot90, unspec_mve_function_exact_insn_rot, (UNSPEC_VCADD90, UNSPEC_VCADD90, UNSPEC_VCADD90, VCADDQ_ROT90_M_S, VCADDQ_ROT90_M_U, VCADDQ_ROT90_M_F))
 FUNCTION (vcaddq_rot270, unspec_mve_function_exact_insn_rot, (UNSPEC_VCADD270, UNSPEC_VCADD270, UNSPEC_VCADD270, VCADDQ_ROT270_M_S, VCADDQ_ROT270_M_U, VCADDQ_ROT270_M_F))
+FUNCTION (vcmulq, unspec_mve_function_exact_insn_rot, (-1, -1, UNSPEC_VCMUL, -1, -1, VCMULQ_M_F))
+FUNCTION (vcmulq_rot90, unspec_mve_function_exact_insn_rot, (-1, -1, UNSPEC_VCMUL90, -1, -1, VCMULQ_ROT90_M_F))
+FUNCTION (vcmulq_rot180, unspec_mve_function_exact_insn_rot, (-1, -1, UNSPEC_VCMUL180, -1, -1, VCMULQ_ROT180_M_F))
+FUNCTION (vcmulq_rot270, unspec_mve_function_exact_insn_rot, (-1, -1, UNSPEC_VCMUL270, -1, -1, VCMULQ_ROT270_M_F))
 FUNCTION (vhcaddq_rot90, unspec_mve_function_exact_insn_rot, (VHCADDQ_ROT90_S, -1, -1, VHCADDQ_ROT90_M_S, -1, -1))
 FUNCTION (vhcaddq_rot270, unspec_mve_function_exact_insn_rot, (VHCADDQ_ROT270_S, -1, -1, VHCADDQ_ROT270_M_S, -1, -1))
 FUNCTION_WITHOUT_N_NO_U_F (vclsq, VCLSQ)
diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
index 9a793147960..cbcf0d296cd 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -158,6 +158,10 @@ DEF_MVE_FUNCTION (vandq, binary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vbrsrq, binary_imm32, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vcaddq_rot90, binary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vcaddq_rot270, binary, all_float, mx_or_none)
+DEF_MVE_FUNCTION (vcmulq, binary, all_float, mx_or_none)
+DEF_MVE_FUNCTION (vcmulq_rot90, binary, all_float, mx_or_none)
+DEF_MVE_FUNCTION (vcmulq_rot180, binary, all_float, mx_or_none)
+DEF_MVE_FUNCTION (vcmulq_rot270, binary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vcmpeqq, cmp, all_float, m_or_none)
 DEF_MVE_FUNCTION (vcmpgeq, cmp, all_float, m_or_none)
 DEF_MVE_FUNCTION (vcmpgtq, cmp, all_float, m_or_none)
diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h
index 8dac72970e0..875b333ebef 100644
--- a/gcc/config/arm/arm-mve-builtins-base.h
+++ b/gcc/config/arm/arm-mve-builtins-base.h
@@ -35,6 +35,10 @@ extern const function_base *const vandq;
 extern const function_base *const vbrsrq;
 extern const function_base *const vcaddq_rot90;
 extern const function_base *const vcaddq_rot270;
+extern const function_base *const vcmulq;
+extern const function_base *const vcmulq_rot90;
+extern const function_base *const vcmulq_rot180;
+extern const function_base *const vcmulq_rot270;
 extern const function_base *const vclsq;
 extern const function_base *const vclzq;
 extern const function_base *const vcmpcsq;
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 9297ad757ab..b9d3a876369 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -155,10 +155,6 @@
 #define vcvtbq_f32(__a) __arm_vcvtbq_f32(__a)
 #define vcvtq(__a) __arm_vcvtq(__a)
 #define vcvtq_n(__a, __imm6) __arm_vcvtq_n(__a, __imm6)
-#define vcmulq_rot90(__a, __b) __arm_vcmulq_rot90(__a, __b)
-#define vcmulq_rot270(__a, __b) __arm_vcmulq_rot270(__a, __b)
-#define vcmulq_rot180(__a, __b) __arm_vcmulq_rot180(__a, __b)
-#define vcmulq(__a, __b) __arm_vcmulq(__a, __b)
 #define vcvtaq_m(__inactive, __a, __p) __arm_vcvtaq_m(__inactive, __a, __p)
 #define vcvtq_m(__inactive, __a, __p) __arm_vcvtq_m(__inactive, __a, __p)
 #define vcvtbq_m(__a, __b, __p) __arm_vcvtbq_m(__a, __b, __p)
@@ -175,14 +171,6 @@
 #define vcmlaq_rot180_m(__a, __b, __c, __p) __arm_vcmlaq_rot180_m(__a, __b, __c, __p)
 #define vcmlaq_rot270_m(__a, __b, __c, __p) __arm_vcmlaq_rot270_m(__a, __b, __c, __p)
 #define vcmlaq_rot90_m(__a, __b, __c, __p) __arm_vcmlaq_rot90_m(__a, __b, __c, __p)
-#define vcmulq_m(__inactive, __a, __b, __p) __arm_vcmulq_m(__inactive, __a, __b, __p)
-#define vcmulq_rot180_m(__inactive, __a, __b, __p) __arm_vcmulq_rot180_m(__inactive, __a, __b, __p)
-#define vcmulq_rot270_m(__inactive, __a, __b, __p) __arm_vcmulq_rot270_m(__inactive, __a, __b, __p)
-#define vcmulq_rot90_m(__inactive, __a, __b, __p) __arm_vcmulq_rot90_m(__inactive, __a, __b, __p)
-#define vcmulq_x(__a, __b, __p) __arm_vcmulq_x(__a, __b, __p)
-#define vcmulq_rot90_x(__a, __b, __p) __arm_vcmulq_rot90_x(__a, __b, __p)
-#define vcmulq_rot180_x(__a, __b, __p) __arm_vcmulq_rot180_x(__a, __b, __p)
-#define vcmulq_rot270_x(__a, __b, __p) __arm_vcmulq_rot270_x(__a, __b, __p)
 #define vcvtq_x(__a, __p) __arm_vcvtq_x(__a, __p)
 #define vcvtq_x_n(__a, __imm6, __p) __arm_vcvtq_x_n(__a, __imm6, __p)
 
@@ -262,20 +250,12 @@
 #define vmullbq_poly_p8(__a, __b) __arm_vmullbq_poly_p8(__a, __b)
 #define vbicq_n_u16(__a,  __imm) __arm_vbicq_n_u16(__a,  __imm)
 #define vornq_f16(__a, __b) __arm_vornq_f16(__a, __b)
-#define vcmulq_rot90_f16(__a, __b) __arm_vcmulq_rot90_f16(__a, __b)
-#define vcmulq_rot270_f16(__a, __b) __arm_vcmulq_rot270_f16(__a, __b)
-#define vcmulq_rot180_f16(__a, __b) __arm_vcmulq_rot180_f16(__a, __b)
-#define vcmulq_f16(__a, __b) __arm_vcmulq_f16(__a, __b)
 #define vbicq_f16(__a, __b) __arm_vbicq_f16(__a, __b)
 #define vbicq_n_s16(__a,  __imm) __arm_vbicq_n_s16(__a,  __imm)
 #define vmulltq_poly_p16(__a, __b) __arm_vmulltq_poly_p16(__a, __b)
 #define vmullbq_poly_p16(__a, __b) __arm_vmullbq_poly_p16(__a, __b)
 #define vbicq_n_u32(__a,  __imm) __arm_vbicq_n_u32(__a,  __imm)
 #define vornq_f32(__a, __b) __arm_vornq_f32(__a, __b)
-#define vcmulq_rot90_f32(__a, __b) __arm_vcmulq_rot90_f32(__a, __b)
-#define vcmulq_rot270_f32(__a, __b) __arm_vcmulq_rot270_f32(__a, __b)
-#define vcmulq_rot180_f32(__a, __b) __arm_vcmulq_rot180_f32(__a, __b)
-#define vcmulq_f32(__a, __b) __arm_vcmulq_f32(__a, __b)
 #define vbicq_f32(__a, __b) __arm_vbicq_f32(__a, __b)
 #define vbicq_n_s32(__a,  __imm) __arm_vbicq_n_s32(__a,  __imm)
 #define vctp8q_m(__a, __p) __arm_vctp8q_m(__a, __p)
@@ -372,14 +352,6 @@
 #define vcmlaq_rot270_m_f16(__a, __b, __c, __p) __arm_vcmlaq_rot270_m_f16(__a, __b, __c, __p)
 #define vcmlaq_rot90_m_f32(__a, __b, __c, __p) __arm_vcmlaq_rot90_m_f32(__a, __b, __c, __p)
 #define vcmlaq_rot90_m_f16(__a, __b, __c, __p) __arm_vcmlaq_rot90_m_f16(__a, __b, __c, __p)
-#define vcmulq_m_f32(__inactive, __a, __b, __p) __arm_vcmulq_m_f32(__inactive, __a, __b, __p)
-#define vcmulq_m_f16(__inactive, __a, __b, __p) __arm_vcmulq_m_f16(__inactive, __a, __b, __p)
-#define vcmulq_rot180_m_f32(__inactive, __a, __b, __p) __arm_vcmulq_rot180_m_f32(__inactive, __a, __b, __p)
-#define vcmulq_rot180_m_f16(__inactive, __a, __b, __p) __arm_vcmulq_rot180_m_f16(__inactive, __a, __b, __p)
-#define vcmulq_rot270_m_f32(__inactive, __a, __b, __p) __arm_vcmulq_rot270_m_f32(__inactive, __a, __b, __p)
-#define vcmulq_rot270_m_f16(__inactive, __a, __b, __p) __arm_vcmulq_rot270_m_f16(__inactive, __a, __b, __p)
-#define vcmulq_rot90_m_f32(__inactive, __a, __b, __p) __arm_vcmulq_rot90_m_f32(__inactive, __a, __b, __p)
-#define vcmulq_rot90_m_f16(__inactive, __a, __b, __p) __arm_vcmulq_rot90_m_f16(__inactive, __a, __b, __p)
 #define vcvtq_m_n_s32_f32(__inactive, __a,  __imm6, __p) __arm_vcvtq_m_n_s32_f32(__inactive, __a,  __imm6, __p)
 #define vcvtq_m_n_s16_f16(__inactive, __a,  __imm6, __p) __arm_vcvtq_m_n_s16_f16(__inactive, __a,  __imm6, __p)
 #define vcvtq_m_n_u32_f32(__inactive, __a,  __imm6, __p) __arm_vcvtq_m_n_u32_f32(__inactive, __a,  __imm6, __p)
@@ -712,14 +684,6 @@
 #define vornq_x_u8(__a, __b, __p) __arm_vornq_x_u8(__a, __b, __p)
 #define vornq_x_u16(__a, __b, __p) __arm_vornq_x_u16(__a, __b, __p)
 #define vornq_x_u32(__a, __b, __p) __arm_vornq_x_u32(__a, __b, __p)
-#define vcmulq_x_f16(__a, __b, __p) __arm_vcmulq_x_f16(__a, __b, __p)
-#define vcmulq_x_f32(__a, __b, __p) __arm_vcmulq_x_f32(__a, __b, __p)
-#define vcmulq_rot90_x_f16(__a, __b, __p) __arm_vcmulq_rot90_x_f16(__a, __b, __p)
-#define vcmulq_rot90_x_f32(__a, __b, __p) __arm_vcmulq_rot90_x_f32(__a, __b, __p)
-#define vcmulq_rot180_x_f16(__a, __b, __p) __arm_vcmulq_rot180_x_f16(__a, __b, __p)
-#define vcmulq_rot180_x_f32(__a, __b, __p) __arm_vcmulq_rot180_x_f32(__a, __b, __p)
-#define vcmulq_rot270_x_f16(__a, __b, __p) __arm_vcmulq_rot270_x_f16(__a, __b, __p)
-#define vcmulq_rot270_x_f32(__a, __b, __p) __arm_vcmulq_rot270_x_f32(__a, __b, __p)
 #define vcvtaq_x_s16_f16(__a, __p) __arm_vcvtaq_x_s16_f16(__a, __p)
 #define vcvtaq_x_s32_f32(__a, __p) __arm_vcvtaq_x_s32_f32(__a, __p)
 #define vcvtaq_x_u16_f16(__a, __p) __arm_vcvtaq_x_u16_f16(__a, __p)
@@ -4561,34 +4525,6 @@ __arm_vornq_f16 (float16x8_t __a, float16x8_t __b)
   return __builtin_mve_vornq_fv8hf (__a, __b);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot90_f16 (float16x8_t __a, float16x8_t __b)
-{
-  return __builtin_mve_vcmulq_rot90v8hf (__a, __b);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot270_f16 (float16x8_t __a, float16x8_t __b)
-{
-  return __builtin_mve_vcmulq_rot270v8hf (__a, __b);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot180_f16 (float16x8_t __a, float16x8_t __b)
-{
-  return __builtin_mve_vcmulq_rot180v8hf (__a, __b);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_f16 (float16x8_t __a, float16x8_t __b)
-{
-  return __builtin_mve_vcmulqv8hf (__a, __b);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq_f16 (float16x8_t __a, float16x8_t __b)
@@ -4603,34 +4539,6 @@ __arm_vornq_f32 (float32x4_t __a, float32x4_t __b)
   return __builtin_mve_vornq_fv4sf (__a, __b);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot90_f32 (float32x4_t __a, float32x4_t __b)
-{
-  return __builtin_mve_vcmulq_rot90v4sf (__a, __b);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot270_f32 (float32x4_t __a, float32x4_t __b)
-{
-  return __builtin_mve_vcmulq_rot270v4sf (__a, __b);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot180_f32 (float32x4_t __a, float32x4_t __b)
-{
-  return __builtin_mve_vcmulq_rot180v4sf (__a, __b);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_f32 (float32x4_t __a, float32x4_t __b)
-{
-  return __builtin_mve_vcmulqv4sf (__a, __b);
-}
-
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq_f32 (float32x4_t __a, float32x4_t __b)
@@ -5003,62 +4911,6 @@ __arm_vcmlaq_rot90_m_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c, mve
   return __builtin_mve_vcmlaq_rot90_m_fv8hf (__a, __b, __c, __p);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_m_f32 (float32x4_t __inactive, float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcmulq_m_fv4sf (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_m_f16 (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcmulq_m_fv8hf (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot180_m_f32 (float32x4_t __inactive, float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcmulq_rot180_m_fv4sf (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot180_m_f16 (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcmulq_rot180_m_fv8hf (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot270_m_f32 (float32x4_t __inactive, float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcmulq_rot270_m_fv4sf (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot270_m_f16 (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcmulq_rot270_m_fv8hf (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot90_m_f32 (float32x4_t __inactive, float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcmulq_rot90_m_fv4sf (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot90_m_f16 (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcmulq_rot90_m_fv8hf (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcvtq_m_n_s32_f32 (int32x4_t __inactive, float32x4_t __a, const int __imm6, mve_pred16_t __p)
@@ -5359,62 +5211,6 @@ __arm_vstrwq_scatter_base_wb_p_f32 (uint32x4_t * __addr, const int __offset, flo
   *__addr = __builtin_mve_vstrwq_scatter_base_wb_p_fv4sf (*__addr, __offset, __value, __p);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_x_f16 (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcmulq_m_fv8hf (__arm_vuninitializedq_f16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_x_f32 (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcmulq_m_fv4sf (__arm_vuninitializedq_f32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot90_x_f16 (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcmulq_rot90_m_fv8hf (__arm_vuninitializedq_f16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot90_x_f32 (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcmulq_rot90_m_fv4sf (__arm_vuninitializedq_f32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot180_x_f16 (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcmulq_rot180_m_fv8hf (__arm_vuninitializedq_f16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot180_x_f32 (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcmulq_rot180_m_fv4sf (__arm_vuninitializedq_f32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot270_x_f16 (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcmulq_rot270_m_fv8hf (__arm_vuninitializedq_f16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot270_x_f32 (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vcmulq_rot270_m_fv4sf (__arm_vuninitializedq_f32 (), __a, __b, __p);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcvtaq_x_s16_f16 (float16x8_t __a, mve_pred16_t __p)
@@ -8580,34 +8376,6 @@ __arm_vornq (float16x8_t __a, float16x8_t __b)
  return __arm_vornq_f16 (__a, __b);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot90 (float16x8_t __a, float16x8_t __b)
-{
- return __arm_vcmulq_rot90_f16 (__a, __b);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot270 (float16x8_t __a, float16x8_t __b)
-{
- return __arm_vcmulq_rot270_f16 (__a, __b);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot180 (float16x8_t __a, float16x8_t __b)
-{
- return __arm_vcmulq_rot180_f16 (__a, __b);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq (float16x8_t __a, float16x8_t __b)
-{
- return __arm_vcmulq_f16 (__a, __b);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq (float16x8_t __a, float16x8_t __b)
@@ -8622,34 +8390,6 @@ __arm_vornq (float32x4_t __a, float32x4_t __b)
  return __arm_vornq_f32 (__a, __b);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot90 (float32x4_t __a, float32x4_t __b)
-{
- return __arm_vcmulq_rot90_f32 (__a, __b);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot270 (float32x4_t __a, float32x4_t __b)
-{
- return __arm_vcmulq_rot270_f32 (__a, __b);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot180 (float32x4_t __a, float32x4_t __b)
-{
- return __arm_vcmulq_rot180_f32 (__a, __b);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq (float32x4_t __a, float32x4_t __b)
-{
- return __arm_vcmulq_f32 (__a, __b);
-}
-
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq (float32x4_t __a, float32x4_t __b)
@@ -9007,62 +8747,6 @@ __arm_vcmlaq_rot90_m (float16x8_t __a, float16x8_t __b, float16x8_t __c, mve_pre
  return __arm_vcmlaq_rot90_m_f16 (__a, __b, __c, __p);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_m (float32x4_t __inactive, float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vcmulq_m_f32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_m (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vcmulq_m_f16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot180_m (float32x4_t __inactive, float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vcmulq_rot180_m_f32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot180_m (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vcmulq_rot180_m_f16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot270_m (float32x4_t __inactive, float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vcmulq_rot270_m_f32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot270_m (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vcmulq_rot270_m_f16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot90_m (float32x4_t __inactive, float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vcmulq_rot90_m_f32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot90_m (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vcmulq_rot90_m_f16 (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcvtq_m_n (int32x4_t __inactive, float32x4_t __a, const int __imm6, mve_pred16_t __p)
@@ -9301,62 +8985,6 @@ __arm_vstrwq_scatter_base_wb_p (uint32x4_t * __addr, const int __offset, float32
  __arm_vstrwq_scatter_base_wb_p_f32 (__addr, __offset, __value, __p);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_x (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vcmulq_x_f16 (__a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_x (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vcmulq_x_f32 (__a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot90_x (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vcmulq_rot90_x_f16 (__a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot90_x (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vcmulq_rot90_x_f32 (__a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot180_x (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vcmulq_rot180_x_f16 (__a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot180_x (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vcmulq_rot180_x_f32 (__a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot270_x (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vcmulq_rot270_x_f16 (__a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmulq_rot270_x (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vcmulq_rot270_x_f32 (__a, __b, __p);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcvtq_x (uint16x8_t __a, mve_pred16_t __p)
@@ -9912,30 +9540,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vornq_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t)), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vornq_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t)));})
 
-#define __arm_vcmulq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmulq_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmulq_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t)));})
-
-#define __arm_vcmulq_rot180(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmulq_rot180_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmulq_rot180_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t)));})
-
-#define __arm_vcmulq_rot270(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmulq_rot270_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmulq_rot270_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t)));})
-
-#define __arm_vcmulq_rot90(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmulq_rot90_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmulq_rot90_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t)));})
-
 #define __arm_vmulltq_poly(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
@@ -10121,34 +9725,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmlaq_rot90_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmlaq_rot90_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
 
-#define __arm_vcmulq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmulq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmulq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
-
-#define __arm_vcmulq_rot180_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmulq_rot180_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmulq_rot180_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
-
-#define __arm_vcmulq_rot270_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmulq_rot270_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmulq_rot270_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
-
-#define __arm_vcmulq_rot90_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)] [__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmulq_rot90_m_f16(__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmulq_rot90_m_f32(__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
-
 #define __arm_vornq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
@@ -10486,24 +10062,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vbicq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vbicq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
 
-#define __arm_vcmulq_rot180_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmulq_rot180_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmulq_rot180_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
-
-#define __arm_vcmulq_rot270_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmulq_rot270_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmulq_rot270_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
-
-#define __arm_vcmulq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmulq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmulq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
-
 #define __arm_vcvtq_x(p1,p2) ({ __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p1)])0, \
   int (*)[__ARM_mve_type_int16x8_t]: __arm_vcvtq_x_f16_s16 (__ARM_mve_coerce(__p1, int16x8_t), p2), \
@@ -10530,12 +10088,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vornq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vornq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
 
-#define __arm_vcmulq_rot90_x(p1,p2,p3)  ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmulq_rot90_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmulq_rot90_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
-
 #define __arm_vgetq_lane(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
   int (*)[__ARM_mve_type_int8x16_t]: __arm_vgetq_lane_s8 (__ARM_mve_coerce(__p0, int8x16_t), p1), \
-- 
2.34.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 5/6] arm: [MVE intrinsics] factorize vcmlaq
  2023-07-13 10:22 [PATCH 1/6] arm: [MVE intrinsics] Factorize vcaddq vhcaddq Christophe Lyon
                   ` (2 preceding siblings ...)
  2023-07-13 10:22 ` [PATCH 4/6] arm: [MVE intrinsics] rework vcmulq Christophe Lyon
@ 2023-07-13 10:22 ` Christophe Lyon
  2023-07-13 10:22 ` [PATCH 6/6] arm: [MVE intrinsics] rework vcmlaq Christophe Lyon
  2023-07-14 16:18 ` [PATCH 1/6] arm: [MVE intrinsics] Factorize vcaddq vhcaddq Kyrylo Tkachov
  5 siblings, 0 replies; 7+ messages in thread
From: Christophe Lyon @ 2023-07-13 10:22 UTC (permalink / raw)
  To: gcc-patches, Kyrylo.Tkachov, richard.earnshaw, richard.sandiford
  Cc: Christophe Lyon

Factorize vcmlaq builtins so that they use parameterized names.

2023-17-13  Christophe Lyon  <christophe.lyon@linaro.org>

	gcc/
	* config/arm/arm_mve_builtins.def (vcmlaq_rot90_f)
	(vcmlaq_rot270_f, vcmlaq_rot180_f, vcmlaq_f): Add "_f" suffix.
	* config/arm/iterators.md (MVE_VCMLAQ_M): New.
	(mve_insn): Add vcmla.
	(rot): Add VCMLAQ_M_F, VCMLAQ_ROT90_M_F, VCMLAQ_ROT180_M_F,
	VCMLAQ_ROT270_M_F.
	(mve_rot): Add VCMLAQ_M_F, VCMLAQ_ROT90_M_F, VCMLAQ_ROT180_M_F,
	VCMLAQ_ROT270_M_F.
	* config/arm/mve.md (mve_vcmlaq<mve_rot><mode>): Rename into ...
	(@mve_<mve_insn>q<mve_rot>_f<mode>): ... this.
	(mve_vcmlaq_m_f<mode>, mve_vcmlaq_rot180_m_f<mode>)
	(mve_vcmlaq_rot270_m_f<mode>, mve_vcmlaq_rot90_m_f<mode>): Merge
	into ...
	(@mve_<mve_insn>q<mve_rot>_m_f<mode>): ... this.
---
 gcc/config/arm/arm_mve_builtins.def | 10 ++---
 gcc/config/arm/iterators.md         | 19 ++++++++-
 gcc/config/arm/mve.md               | 64 ++++-------------------------
 3 files changed, 29 insertions(+), 64 deletions(-)

diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index 56358c0bd02..43dacc3dda1 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -378,6 +378,10 @@ VAR3 (TERNOP_NONE_NONE_NONE_NONE, vmlasq_n_s, v16qi, v8hi, v4si)
 VAR3 (TERNOP_NONE_NONE_NONE_NONE, vmlaq_n_s, v16qi, v8hi, v4si)
 VAR3 (TERNOP_NONE_NONE_NONE_NONE, vmladavaxq_s, v16qi, v8hi, v4si)
 VAR3 (TERNOP_NONE_NONE_NONE_NONE, vmladavaq_s, v16qi, v8hi, v4si)
+VAR2 (TERNOP_NONE_NONE_NONE_NONE, vcmlaq_rot90_f, v8hf, v4sf)
+VAR2 (TERNOP_NONE_NONE_NONE_NONE, vcmlaq_rot270_f, v8hf, v4sf)
+VAR2 (TERNOP_NONE_NONE_NONE_NONE, vcmlaq_rot180_f, v8hf, v4sf)
+VAR2 (TERNOP_NONE_NONE_NONE_NONE, vcmlaq_f, v8hf, v4sf)
 VAR3 (TERNOP_NONE_NONE_NONE_IMM, vsriq_n_s, v16qi, v8hi, v4si)
 VAR3 (TERNOP_NONE_NONE_NONE_IMM, vsliq_n_s, v16qi, v8hi, v4si)
 VAR2 (TERNOP_UNONE_UNONE_UNONE_PRED, vrev32q_m_u, v16qi, v8hi)
@@ -876,9 +880,3 @@ VAR3 (QUADOP_NONE_NONE_UNONE_IMM_PRED, vshlcq_m_vec_s, v16qi, v8hi, v4si)
 VAR3 (QUADOP_NONE_NONE_UNONE_IMM_PRED, vshlcq_m_carry_s, v16qi, v8hi, v4si)
 VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vshlcq_m_vec_u, v16qi, v8hi, v4si)
 VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vshlcq_m_carry_u, v16qi, v8hi, v4si)
-
-/* optabs without any suffixes.  */
-VAR2 (TERNOP_NONE_NONE_NONE_NONE, vcmlaq_rot90, v8hf, v4sf)
-VAR2 (TERNOP_NONE_NONE_NONE_NONE, vcmlaq_rot270, v8hf, v4sf)
-VAR2 (TERNOP_NONE_NONE_NONE_NONE, vcmlaq_rot180, v8hf, v4sf)
-VAR2 (TERNOP_NONE_NONE_NONE_NONE, vcmlaq, v8hf, v4sf)
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 9f71404e26c..b13ff53d36f 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -911,6 +911,10 @@
 		     VCMULQ_M_F VCMULQ_ROT90_M_F VCMULQ_ROT180_M_F VCMULQ_ROT270_M_F
 		     ])
 
+(define_int_iterator MVE_VCMLAQ_M [
+		     VCMLAQ_M_F VCMLAQ_ROT90_M_F VCMLAQ_ROT180_M_F VCMLAQ_ROT270_M_F
+		     ])
+
 (define_int_attr mve_insn [
 		 (UNSPEC_VCADD90 "vcadd") (UNSPEC_VCADD270 "vcadd")
 		 (UNSPEC_VCMUL "vcmul") (UNSPEC_VCMUL90 "vcmul") (UNSPEC_VCMUL180 "vcmul") (UNSPEC_VCMUL270 "vcmul")
@@ -942,6 +946,7 @@
 		 (VCLSQ_M_S "vcls")
 		 (VCLSQ_S "vcls")
 		 (VCLZQ_M_S "vclz") (VCLZQ_M_U "vclz")
+		 (VCMLAQ_M_F "vcmla") (VCMLAQ_ROT90_M_F "vcmla") (VCMLAQ_ROT180_M_F "vcmla") (VCMLAQ_ROT270_M_F "vcmla")
 		 (VCMULQ_M_F "vcmul") (VCMULQ_ROT90_M_F "vcmul") (VCMULQ_ROT180_M_F "vcmul") (VCMULQ_ROT270_M_F "vcmul")
 		 (VCREATEQ_S "vcreate") (VCREATEQ_U "vcreate") (VCREATEQ_F "vcreate")
 		 (VDUPQ_M_N_S "vdup") (VDUPQ_M_N_U "vdup") (VDUPQ_M_N_F "vdup")
@@ -1204,6 +1209,7 @@
 		 (VSUBQ_M_N_S "vsub") (VSUBQ_M_N_U "vsub") (VSUBQ_M_N_F "vsub")
 		 (VSUBQ_M_S "vsub") (VSUBQ_M_U "vsub") (VSUBQ_M_F "vsub")
 		 (VSUBQ_N_S "vsub") (VSUBQ_N_U "vsub") (VSUBQ_N_F "vsub")
+		 (UNSPEC_VCMLA "vcmla") (UNSPEC_VCMLA90 "vcmla") (UNSPEC_VCMLA180 "vcmla") (UNSPEC_VCMLA270 "vcmla")
 		 ])
 
 (define_int_attr isu    [
@@ -2198,7 +2204,12 @@
 		      (VCMULQ_M_F "0")
 		      (VCMULQ_ROT90_M_F "90")
 		      (VCMULQ_ROT180_M_F "180")
-		      (VCMULQ_ROT270_M_F "270")])
+		      (VCMULQ_ROT270_M_F "270")
+		      (VCMLAQ_M_F "0")
+		      (VCMLAQ_ROT90_M_F "90")
+		      (VCMLAQ_ROT180_M_F "180")
+		      (VCMLAQ_ROT270_M_F "270")
+		      ])
 
 ;; The complex operations when performed on a real complex number require two
 ;; instructions to perform the operation. e.g. complex multiplication requires
@@ -2250,7 +2261,11 @@
 			  (VCMULQ_M_F "")
 			  (VCMULQ_ROT90_M_F "_rot90")
 			  (VCMULQ_ROT180_M_F "_rot180")
-			  (VCMULQ_ROT270_M_F "_rot270")])
+			  (VCMULQ_ROT270_M_F "_rot270")
+			  (VCMLAQ_M_F "")
+			  (VCMLAQ_ROT90_M_F "_rot90")
+			  (VCMLAQ_ROT180_M_F "_rot180")
+			  (VCMLAQ_ROT270_M_F "_rot270")])
 
 (define_int_attr fcmac1 [(UNSPEC_VCMLA "a") (UNSPEC_VCMLA_CONJ "a")
 			 (UNSPEC_VCMLA180 "s") (UNSPEC_VCMLA180_CONJ "s")])
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 0b99bf017dc..a2cbcff1a6f 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -2087,7 +2087,7 @@
 ;;
 ;; [vcmlaq, vcmlaq_rot90, vcmlaq_rot180, vcmlaq_rot270])
 ;;
-(define_insn "mve_vcmlaq<mve_rot><mode>"
+(define_insn "@mve_<mve_insn>q<mve_rot>_f<mode>"
   [
    (set (match_operand:MVE_0 0 "s_register_operand" "=w,w")
 	(plus:MVE_0 (match_operand:MVE_0 1 "reg_or_zero_operand" "Dz,0")
@@ -3180,70 +3180,22 @@
    (set_attr "length""8")])
 
 ;;
-;; [vcmlaq_m_f])
-;;
-(define_insn "mve_vcmlaq_m_f<mode>"
-  [
-   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
-		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VCMLAQ_M_F))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vcmlat.f%#<V_sz_elem>	%q0, %q2, %q3, #0"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vcmlaq_rot180_m_f])
-;;
-(define_insn "mve_vcmlaq_rot180_m_f<mode>"
-  [
-   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
-		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VCMLAQ_ROT180_M_F))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vcmlat.f%#<V_sz_elem>	%q0, %q2, %q3, #180"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vcmlaq_rot270_m_f])
+;; [vcmlaq_m_f]
+;; [vcmlaq_rot90_m_f]
+;; [vcmlaq_rot180_m_f]
+;; [vcmlaq_rot270_m_f]
 ;;
-(define_insn "mve_vcmlaq_rot270_m_f<mode>"
-  [
-   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
-		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VCMLAQ_ROT270_M_F))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vcmlat.f%#<V_sz_elem>	%q0, %q2, %q3, #270"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vcmlaq_rot90_m_f])
-;;
-(define_insn "mve_vcmlaq_rot90_m_f<mode>"
+(define_insn "@mve_<mve_insn>q<mve_rot>_m_f<mode>"
   [
    (set (match_operand:MVE_0 0 "s_register_operand" "=w")
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:MVE_0 3 "s_register_operand" "w")
 		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VCMLAQ_ROT90_M_F))
+	 MVE_VCMLAQ_M))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vcmlat.f%#<V_sz_elem>	%q0, %q2, %q3, #90"
+  "vpst\;<mve_insn>t.f%#<V_sz_elem>\t%q0, %q2, %q3, #<rot>"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
-- 
2.34.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 6/6] arm: [MVE intrinsics] rework vcmlaq
  2023-07-13 10:22 [PATCH 1/6] arm: [MVE intrinsics] Factorize vcaddq vhcaddq Christophe Lyon
                   ` (3 preceding siblings ...)
  2023-07-13 10:22 ` [PATCH 5/6] arm: [MVE intrinsics] factorize vcmlaq Christophe Lyon
@ 2023-07-13 10:22 ` Christophe Lyon
  2023-07-14 16:18 ` [PATCH 1/6] arm: [MVE intrinsics] Factorize vcaddq vhcaddq Kyrylo Tkachov
  5 siblings, 0 replies; 7+ messages in thread
From: Christophe Lyon @ 2023-07-13 10:22 UTC (permalink / raw)
  To: gcc-patches, Kyrylo.Tkachov, richard.earnshaw, richard.sandiford
  Cc: Christophe Lyon

Implement vcmlaq using the new MVE builtins framework.

2023-07-13  Christophe Lyon  <christophe.lyon@linaro.org>

	gcc/
	* config/arm/arm-mve-builtins-base.cc (vcmlaq, vcmlaq_rot90)
	(vcmlaq_rot180, vcmlaq_rot270): New.
	* config/arm/arm-mve-builtins-base.def (vcmlaq, vcmlaq_rot90)
	(vcmlaq_rot180, vcmlaq_rot270): New.
	* config/arm/arm-mve-builtins-base.h: (vcmlaq, vcmlaq_rot90)
	(vcmlaq_rot180, vcmlaq_rot270): New.
	* config/arm/arm-mve-builtins.cc
	(function_instance::has_inactive_argument): Handle vcmlaq,
	vcmlaq_rot90, vcmlaq_rot180, vcmlaq_rot270.
	* config/arm/arm_mve.h (vcmlaq): Delete.
	(vcmlaq_rot180): Delete.
	(vcmlaq_rot270): Delete.
	(vcmlaq_rot90): Delete.
	(vcmlaq_m): Delete.
	(vcmlaq_rot180_m): Delete.
	(vcmlaq_rot270_m): Delete.
	(vcmlaq_rot90_m): Delete.
	(vcmlaq_f16): Delete.
	(vcmlaq_rot180_f16): Delete.
	(vcmlaq_rot270_f16): Delete.
	(vcmlaq_rot90_f16): Delete.
	(vcmlaq_f32): Delete.
	(vcmlaq_rot180_f32): Delete.
	(vcmlaq_rot270_f32): Delete.
	(vcmlaq_rot90_f32): Delete.
	(vcmlaq_m_f32): Delete.
	(vcmlaq_m_f16): Delete.
	(vcmlaq_rot180_m_f32): Delete.
	(vcmlaq_rot180_m_f16): Delete.
	(vcmlaq_rot270_m_f32): Delete.
	(vcmlaq_rot270_m_f16): Delete.
	(vcmlaq_rot90_m_f32): Delete.
	(vcmlaq_rot90_m_f16): Delete.
	(__arm_vcmlaq_f16): Delete.
	(__arm_vcmlaq_rot180_f16): Delete.
	(__arm_vcmlaq_rot270_f16): Delete.
	(__arm_vcmlaq_rot90_f16): Delete.
	(__arm_vcmlaq_f32): Delete.
	(__arm_vcmlaq_rot180_f32): Delete.
	(__arm_vcmlaq_rot270_f32): Delete.
	(__arm_vcmlaq_rot90_f32): Delete.
	(__arm_vcmlaq_m_f32): Delete.
	(__arm_vcmlaq_m_f16): Delete.
	(__arm_vcmlaq_rot180_m_f32): Delete.
	(__arm_vcmlaq_rot180_m_f16): Delete.
	(__arm_vcmlaq_rot270_m_f32): Delete.
	(__arm_vcmlaq_rot270_m_f16): Delete.
	(__arm_vcmlaq_rot90_m_f32): Delete.
	(__arm_vcmlaq_rot90_m_f16): Delete.
	(__arm_vcmlaq): Delete.
	(__arm_vcmlaq_rot180): Delete.
	(__arm_vcmlaq_rot270): Delete.
	(__arm_vcmlaq_rot90): Delete.
	(__arm_vcmlaq_m): Delete.
	(__arm_vcmlaq_rot180_m): Delete.
	(__arm_vcmlaq_rot270_m): Delete.
	(__arm_vcmlaq_rot90_m): Delete.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |   4 +
 gcc/config/arm/arm-mve-builtins-base.def |   4 +
 gcc/config/arm/arm-mve-builtins-base.h   |  16 +-
 gcc/config/arm/arm-mve-builtins.cc       |   4 +
 gcc/config/arm/arm_mve.h                 | 304 -----------------------
 5 files changed, 22 insertions(+), 310 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index 3ad8df304e8..e31095ae112 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -262,6 +262,10 @@ FUNCTION_WITH_RTX_M (vandq, AND, VANDQ)
 FUNCTION_ONLY_N (vbrsrq, VBRSRQ)
 FUNCTION (vcaddq_rot90, unspec_mve_function_exact_insn_rot, (UNSPEC_VCADD90, UNSPEC_VCADD90, UNSPEC_VCADD90, VCADDQ_ROT90_M_S, VCADDQ_ROT90_M_U, VCADDQ_ROT90_M_F))
 FUNCTION (vcaddq_rot270, unspec_mve_function_exact_insn_rot, (UNSPEC_VCADD270, UNSPEC_VCADD270, UNSPEC_VCADD270, VCADDQ_ROT270_M_S, VCADDQ_ROT270_M_U, VCADDQ_ROT270_M_F))
+FUNCTION (vcmlaq, unspec_mve_function_exact_insn_rot, (-1, -1, UNSPEC_VCMLA, -1, -1, VCMLAQ_M_F))
+FUNCTION (vcmlaq_rot90, unspec_mve_function_exact_insn_rot, (-1, -1, UNSPEC_VCMLA90, -1, -1, VCMLAQ_ROT90_M_F))
+FUNCTION (vcmlaq_rot180, unspec_mve_function_exact_insn_rot, (-1, -1, UNSPEC_VCMLA180, -1, -1, VCMLAQ_ROT180_M_F))
+FUNCTION (vcmlaq_rot270, unspec_mve_function_exact_insn_rot, (-1, -1, UNSPEC_VCMLA270, -1, -1, VCMLAQ_ROT270_M_F))
 FUNCTION (vcmulq, unspec_mve_function_exact_insn_rot, (-1, -1, UNSPEC_VCMUL, -1, -1, VCMULQ_M_F))
 FUNCTION (vcmulq_rot90, unspec_mve_function_exact_insn_rot, (-1, -1, UNSPEC_VCMUL90, -1, -1, VCMULQ_ROT90_M_F))
 FUNCTION (vcmulq_rot180, unspec_mve_function_exact_insn_rot, (-1, -1, UNSPEC_VCMUL180, -1, -1, VCMULQ_ROT180_M_F))
diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
index cbcf0d296cd..e7d466f2efd 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -158,6 +158,10 @@ DEF_MVE_FUNCTION (vandq, binary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vbrsrq, binary_imm32, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vcaddq_rot90, binary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vcaddq_rot270, binary, all_float, mx_or_none)
+DEF_MVE_FUNCTION (vcmlaq, ternary, all_float, m_or_none)
+DEF_MVE_FUNCTION (vcmlaq_rot90, ternary, all_float, m_or_none)
+DEF_MVE_FUNCTION (vcmlaq_rot180, ternary, all_float, m_or_none)
+DEF_MVE_FUNCTION (vcmlaq_rot270, ternary, all_float, m_or_none)
 DEF_MVE_FUNCTION (vcmulq, binary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vcmulq_rot90, binary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vcmulq_rot180, binary, all_float, mx_or_none)
diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h
index 875b333ebef..be3698b4f4c 100644
--- a/gcc/config/arm/arm-mve-builtins-base.h
+++ b/gcc/config/arm/arm-mve-builtins-base.h
@@ -33,14 +33,14 @@ extern const function_base *const vaddvaq;
 extern const function_base *const vaddvq;
 extern const function_base *const vandq;
 extern const function_base *const vbrsrq;
-extern const function_base *const vcaddq_rot90;
 extern const function_base *const vcaddq_rot270;
-extern const function_base *const vcmulq;
-extern const function_base *const vcmulq_rot90;
-extern const function_base *const vcmulq_rot180;
-extern const function_base *const vcmulq_rot270;
+extern const function_base *const vcaddq_rot90;
 extern const function_base *const vclsq;
 extern const function_base *const vclzq;
+extern const function_base *const vcmlaq;
+extern const function_base *const vcmlaq_rot180;
+extern const function_base *const vcmlaq_rot270;
+extern const function_base *const vcmlaq_rot90;
 extern const function_base *const vcmpcsq;
 extern const function_base *const vcmpeqq;
 extern const function_base *const vcmpgeq;
@@ -49,6 +49,10 @@ extern const function_base *const vcmphiq;
 extern const function_base *const vcmpleq;
 extern const function_base *const vcmpltq;
 extern const function_base *const vcmpneq;
+extern const function_base *const vcmulq;
+extern const function_base *const vcmulq_rot180;
+extern const function_base *const vcmulq_rot270;
+extern const function_base *const vcmulq_rot90;
 extern const function_base *const vcreateq;
 extern const function_base *const vdupq;
 extern const function_base *const veorq;
@@ -56,8 +60,8 @@ extern const function_base *const vfmaq;
 extern const function_base *const vfmasq;
 extern const function_base *const vfmsq;
 extern const function_base *const vhaddq;
-extern const function_base *const vhcaddq_rot90;
 extern const function_base *const vhcaddq_rot270;
+extern const function_base *const vhcaddq_rot90;
 extern const function_base *const vhsubq;
 extern const function_base *const vmaxaq;
 extern const function_base *const vmaxavq;
diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc
index 7033e41a571..3272ece6326 100644
--- a/gcc/config/arm/arm-mve-builtins.cc
+++ b/gcc/config/arm/arm-mve-builtins.cc
@@ -670,6 +670,10 @@ function_instance::has_inactive_argument () const
     return false;
 
   if (mode_suffix_id == MODE_r
+      || base == functions::vcmlaq
+      || base == functions::vcmlaq_rot90
+      || base == functions::vcmlaq_rot180
+      || base == functions::vcmlaq_rot270
       || base == functions::vcmpeqq
       || base == functions::vcmpneq
       || base == functions::vcmpgeq
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index b9d3a876369..88b2e77ffd9 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -159,18 +159,10 @@
 #define vcvtq_m(__inactive, __a, __p) __arm_vcvtq_m(__inactive, __a, __p)
 #define vcvtbq_m(__a, __b, __p) __arm_vcvtbq_m(__a, __b, __p)
 #define vcvttq_m(__a, __b, __p) __arm_vcvttq_m(__a, __b, __p)
-#define vcmlaq(__a, __b, __c) __arm_vcmlaq(__a, __b, __c)
-#define vcmlaq_rot180(__a, __b, __c) __arm_vcmlaq_rot180(__a, __b, __c)
-#define vcmlaq_rot270(__a, __b, __c) __arm_vcmlaq_rot270(__a, __b, __c)
-#define vcmlaq_rot90(__a, __b, __c) __arm_vcmlaq_rot90(__a, __b, __c)
 #define vcvtmq_m(__inactive, __a, __p) __arm_vcvtmq_m(__inactive, __a, __p)
 #define vcvtnq_m(__inactive, __a, __p) __arm_vcvtnq_m(__inactive, __a, __p)
 #define vcvtpq_m(__inactive, __a, __p) __arm_vcvtpq_m(__inactive, __a, __p)
 #define vcvtq_m_n(__inactive, __a, __imm6, __p) __arm_vcvtq_m_n(__inactive, __a, __imm6, __p)
-#define vcmlaq_m(__a, __b, __c, __p) __arm_vcmlaq_m(__a, __b, __c, __p)
-#define vcmlaq_rot180_m(__a, __b, __c, __p) __arm_vcmlaq_rot180_m(__a, __b, __c, __p)
-#define vcmlaq_rot270_m(__a, __b, __c, __p) __arm_vcmlaq_rot270_m(__a, __b, __c, __p)
-#define vcmlaq_rot90_m(__a, __b, __c, __p) __arm_vcmlaq_rot90_m(__a, __b, __c, __p)
 #define vcvtq_x(__a, __p) __arm_vcvtq_x(__a, __p)
 #define vcvtq_x_n(__a, __imm6, __p) __arm_vcvtq_x_n(__a, __imm6, __p)
 
@@ -286,10 +278,6 @@
 #define vcvtbq_m_f32_f16(__inactive, __a, __p) __arm_vcvtbq_m_f32_f16(__inactive, __a, __p)
 #define vcvttq_m_f16_f32(__a, __b, __p) __arm_vcvttq_m_f16_f32(__a, __b, __p)
 #define vcvttq_m_f32_f16(__inactive, __a, __p) __arm_vcvttq_m_f32_f16(__inactive, __a, __p)
-#define vcmlaq_f16(__a, __b, __c) __arm_vcmlaq_f16(__a, __b, __c)
-#define vcmlaq_rot180_f16(__a, __b, __c) __arm_vcmlaq_rot180_f16(__a, __b, __c)
-#define vcmlaq_rot270_f16(__a, __b, __c) __arm_vcmlaq_rot270_f16(__a, __b, __c)
-#define vcmlaq_rot90_f16(__a, __b, __c) __arm_vcmlaq_rot90_f16(__a, __b, __c)
 #define vcvtmq_m_s16_f16(__inactive, __a, __p) __arm_vcvtmq_m_s16_f16(__inactive, __a, __p)
 #define vcvtnq_m_s16_f16(__inactive, __a, __p) __arm_vcvtnq_m_s16_f16(__inactive, __a, __p)
 #define vcvtpq_m_s16_f16(__inactive, __a, __p) __arm_vcvtpq_m_s16_f16(__inactive, __a, __p)
@@ -298,10 +286,6 @@
 #define vcvtnq_m_u16_f16(__inactive, __a, __p) __arm_vcvtnq_m_u16_f16(__inactive, __a, __p)
 #define vcvtpq_m_u16_f16(__inactive, __a, __p) __arm_vcvtpq_m_u16_f16(__inactive, __a, __p)
 #define vcvtq_m_u16_f16(__inactive, __a, __p) __arm_vcvtq_m_u16_f16(__inactive, __a, __p)
-#define vcmlaq_f32(__a, __b, __c) __arm_vcmlaq_f32(__a, __b, __c)
-#define vcmlaq_rot180_f32(__a, __b, __c) __arm_vcmlaq_rot180_f32(__a, __b, __c)
-#define vcmlaq_rot270_f32(__a, __b, __c) __arm_vcmlaq_rot270_f32(__a, __b, __c)
-#define vcmlaq_rot90_f32(__a, __b, __c) __arm_vcmlaq_rot90_f32(__a, __b, __c)
 #define vcvtmq_m_s32_f32(__inactive, __a, __p) __arm_vcvtmq_m_s32_f32(__inactive, __a, __p)
 #define vcvtnq_m_s32_f32(__inactive, __a, __p) __arm_vcvtnq_m_s32_f32(__inactive, __a, __p)
 #define vcvtpq_m_s32_f32(__inactive, __a, __p) __arm_vcvtpq_m_s32_f32(__inactive, __a, __p)
@@ -344,14 +328,6 @@
 #define vmulltq_poly_m_p16(__inactive, __a, __b, __p) __arm_vmulltq_poly_m_p16(__inactive, __a, __b, __p)
 #define vbicq_m_f32(__inactive, __a, __b, __p) __arm_vbicq_m_f32(__inactive, __a, __b, __p)
 #define vbicq_m_f16(__inactive, __a, __b, __p) __arm_vbicq_m_f16(__inactive, __a, __b, __p)
-#define vcmlaq_m_f32(__a, __b, __c, __p) __arm_vcmlaq_m_f32(__a, __b, __c, __p)
-#define vcmlaq_m_f16(__a, __b, __c, __p) __arm_vcmlaq_m_f16(__a, __b, __c, __p)
-#define vcmlaq_rot180_m_f32(__a, __b, __c, __p) __arm_vcmlaq_rot180_m_f32(__a, __b, __c, __p)
-#define vcmlaq_rot180_m_f16(__a, __b, __c, __p) __arm_vcmlaq_rot180_m_f16(__a, __b, __c, __p)
-#define vcmlaq_rot270_m_f32(__a, __b, __c, __p) __arm_vcmlaq_rot270_m_f32(__a, __b, __c, __p)
-#define vcmlaq_rot270_m_f16(__a, __b, __c, __p) __arm_vcmlaq_rot270_m_f16(__a, __b, __c, __p)
-#define vcmlaq_rot90_m_f32(__a, __b, __c, __p) __arm_vcmlaq_rot90_m_f32(__a, __b, __c, __p)
-#define vcmlaq_rot90_m_f16(__a, __b, __c, __p) __arm_vcmlaq_rot90_m_f16(__a, __b, __c, __p)
 #define vcvtq_m_n_s32_f32(__inactive, __a,  __imm6, __p) __arm_vcvtq_m_n_s32_f32(__inactive, __a,  __imm6, __p)
 #define vcvtq_m_n_s16_f16(__inactive, __a,  __imm6, __p) __arm_vcvtq_m_n_s16_f16(__inactive, __a,  __imm6, __p)
 #define vcvtq_m_n_u32_f32(__inactive, __a,  __imm6, __p) __arm_vcvtq_m_n_u32_f32(__inactive, __a,  __imm6, __p)
@@ -4645,34 +4621,6 @@ __arm_vcvttq_m_f32_f16 (float32x4_t __inactive, float16x8_t __a, mve_pred16_t __
   return __builtin_mve_vcvttq_m_f32_f16v4sf (__inactive, __a, __p);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
-{
-  return __builtin_mve_vcmlaqv8hf (__a, __b, __c);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq_rot180_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
-{
-  return __builtin_mve_vcmlaq_rot180v8hf (__a, __b, __c);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq_rot270_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
-{
-  return __builtin_mve_vcmlaq_rot270v8hf (__a, __b, __c);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq_rot90_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
-{
-  return __builtin_mve_vcmlaq_rot90v8hf (__a, __b, __c);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcvtmq_m_s16_f16 (int16x8_t __inactive, float16x8_t __a, mve_pred16_t __p)
@@ -4729,34 +4677,6 @@ __arm_vcvtq_m_u16_f16 (uint16x8_t __inactive, float16x8_t __a, mve_pred16_t __p)
   return __builtin_mve_vcvtq_m_from_f_uv8hi (__inactive, __a, __p);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq_f32 (float32x4_t __a, float32x4_t __b, float32x4_t __c)
-{
-  return __builtin_mve_vcmlaqv4sf (__a, __b, __c);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq_rot180_f32 (float32x4_t __a, float32x4_t __b, float32x4_t __c)
-{
-  return __builtin_mve_vcmlaq_rot180v4sf (__a, __b, __c);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq_rot270_f32 (float32x4_t __a, float32x4_t __b, float32x4_t __c)
-{
-  return __builtin_mve_vcmlaq_rot270v4sf (__a, __b, __c);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq_rot90_f32 (float32x4_t __a, float32x4_t __b, float32x4_t __c)
-{
-  return __builtin_mve_vcmlaq_rot90v4sf (__a, __b, __c);
-}
-
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcvtmq_m_s32_f32 (int32x4_t __inactive, float32x4_t __a, mve_pred16_t __p)
@@ -4855,62 +4775,6 @@ __arm_vbicq_m_f16 (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve
   return __builtin_mve_vbicq_m_fv8hf (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq_m_f32 (float32x4_t __a, float32x4_t __b, float32x4_t __c, mve_pred16_t __p)
-{
-  return __builtin_mve_vcmlaq_m_fv4sf (__a, __b, __c, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq_m_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c, mve_pred16_t __p)
-{
-  return __builtin_mve_vcmlaq_m_fv8hf (__a, __b, __c, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq_rot180_m_f32 (float32x4_t __a, float32x4_t __b, float32x4_t __c, mve_pred16_t __p)
-{
-  return __builtin_mve_vcmlaq_rot180_m_fv4sf (__a, __b, __c, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq_rot180_m_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c, mve_pred16_t __p)
-{
-  return __builtin_mve_vcmlaq_rot180_m_fv8hf (__a, __b, __c, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq_rot270_m_f32 (float32x4_t __a, float32x4_t __b, float32x4_t __c, mve_pred16_t __p)
-{
-  return __builtin_mve_vcmlaq_rot270_m_fv4sf (__a, __b, __c, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq_rot270_m_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c, mve_pred16_t __p)
-{
-  return __builtin_mve_vcmlaq_rot270_m_fv8hf (__a, __b, __c, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq_rot90_m_f32 (float32x4_t __a, float32x4_t __b, float32x4_t __c, mve_pred16_t __p)
-{
-  return __builtin_mve_vcmlaq_rot90_m_fv4sf (__a, __b, __c, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq_rot90_m_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c, mve_pred16_t __p)
-{
-  return __builtin_mve_vcmlaq_rot90_m_fv8hf (__a, __b, __c, __p);
-}
-
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcvtq_m_n_s32_f32 (int32x4_t __inactive, float32x4_t __a, const int __imm6, mve_pred16_t __p)
@@ -8481,34 +8345,6 @@ __arm_vcvttq_m (float32x4_t __inactive, float16x8_t __a, mve_pred16_t __p)
  return __arm_vcvttq_m_f32_f16 (__inactive, __a, __p);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq (float16x8_t __a, float16x8_t __b, float16x8_t __c)
-{
- return __arm_vcmlaq_f16 (__a, __b, __c);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq_rot180 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
-{
- return __arm_vcmlaq_rot180_f16 (__a, __b, __c);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq_rot270 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
-{
- return __arm_vcmlaq_rot270_f16 (__a, __b, __c);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq_rot90 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
-{
- return __arm_vcmlaq_rot90_f16 (__a, __b, __c);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcvtmq_m (int16x8_t __inactive, float16x8_t __a, mve_pred16_t __p)
@@ -8565,34 +8401,6 @@ __arm_vcvtq_m (uint16x8_t __inactive, float16x8_t __a, mve_pred16_t __p)
  return __arm_vcvtq_m_u16_f16 (__inactive, __a, __p);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq (float32x4_t __a, float32x4_t __b, float32x4_t __c)
-{
- return __arm_vcmlaq_f32 (__a, __b, __c);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq_rot180 (float32x4_t __a, float32x4_t __b, float32x4_t __c)
-{
- return __arm_vcmlaq_rot180_f32 (__a, __b, __c);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq_rot270 (float32x4_t __a, float32x4_t __b, float32x4_t __c)
-{
- return __arm_vcmlaq_rot270_f32 (__a, __b, __c);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq_rot90 (float32x4_t __a, float32x4_t __b, float32x4_t __c)
-{
- return __arm_vcmlaq_rot90_f32 (__a, __b, __c);
-}
-
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcvtmq_m (int32x4_t __inactive, float32x4_t __a, mve_pred16_t __p)
@@ -8691,62 +8499,6 @@ __arm_vbicq_m (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pre
  return __arm_vbicq_m_f16 (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq_m (float32x4_t __a, float32x4_t __b, float32x4_t __c, mve_pred16_t __p)
-{
- return __arm_vcmlaq_m_f32 (__a, __b, __c, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq_m (float16x8_t __a, float16x8_t __b, float16x8_t __c, mve_pred16_t __p)
-{
- return __arm_vcmlaq_m_f16 (__a, __b, __c, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq_rot180_m (float32x4_t __a, float32x4_t __b, float32x4_t __c, mve_pred16_t __p)
-{
- return __arm_vcmlaq_rot180_m_f32 (__a, __b, __c, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq_rot180_m (float16x8_t __a, float16x8_t __b, float16x8_t __c, mve_pred16_t __p)
-{
- return __arm_vcmlaq_rot180_m_f16 (__a, __b, __c, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq_rot270_m (float32x4_t __a, float32x4_t __b, float32x4_t __c, mve_pred16_t __p)
-{
- return __arm_vcmlaq_rot270_m_f32 (__a, __b, __c, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq_rot270_m (float16x8_t __a, float16x8_t __b, float16x8_t __c, mve_pred16_t __p)
-{
- return __arm_vcmlaq_rot270_m_f16 (__a, __b, __c, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq_rot90_m (float32x4_t __a, float32x4_t __b, float32x4_t __c, mve_pred16_t __p)
-{
- return __arm_vcmlaq_rot90_m_f32 (__a, __b, __c, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcmlaq_rot90_m (float16x8_t __a, float16x8_t __b, float16x8_t __c, mve_pred16_t __p)
-{
- return __arm_vcmlaq_rot90_m_f16 (__a, __b, __c, __p);
-}
-
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcvtq_m_n (int32x4_t __inactive, float32x4_t __a, const int __imm6, mve_pred16_t __p)
@@ -9620,34 +9372,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcvtq_m_n_f16_u16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), p2, p3), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcvtq_m_n_f32_u32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), p2, p3));})
 
-#define __arm_vcmlaq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmlaq_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmlaq_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t)));})
-
-#define __arm_vcmlaq_rot180(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmlaq_rot180_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmlaq_rot180_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t)));})
-
-#define __arm_vcmlaq_rot270(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmlaq_rot270_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmlaq_rot270_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t)));})
-
-#define __arm_vcmlaq_rot90(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmlaq_rot90_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmlaq_rot90_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t)));})
-
 #define __arm_vcvtbq_m(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
@@ -9697,34 +9421,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vbicq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vbicq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
 
-#define __arm_vcmlaq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmlaq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmlaq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
-
-#define __arm_vcmlaq_rot180_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmlaq_rot180_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmlaq_rot180_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
-
-#define __arm_vcmlaq_rot270_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmlaq_rot270_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmlaq_rot270_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
-
-#define __arm_vcmlaq_rot90_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmlaq_rot90_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmlaq_rot90_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
-
 #define __arm_vornq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
-- 
2.34.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH 1/6] arm: [MVE intrinsics] Factorize vcaddq vhcaddq
  2023-07-13 10:22 [PATCH 1/6] arm: [MVE intrinsics] Factorize vcaddq vhcaddq Christophe Lyon
                   ` (4 preceding siblings ...)
  2023-07-13 10:22 ` [PATCH 6/6] arm: [MVE intrinsics] rework vcmlaq Christophe Lyon
@ 2023-07-14 16:18 ` Kyrylo Tkachov
  5 siblings, 0 replies; 7+ messages in thread
From: Kyrylo Tkachov @ 2023-07-14 16:18 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Earnshaw, Richard Sandiford



> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@linaro.org>
> Sent: Thursday, July 13, 2023 11:22 AM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Cc: Christophe Lyon <christophe.lyon@linaro.org>
> Subject: [PATCH 1/6] arm: [MVE intrinsics] Factorize vcaddq vhcaddq
> 
> Factorize vcaddq, vhcaddq so that they use the same parameterized
> names.
> 
> To be able to use the same patterns, we add a suffix to vcaddq.
> 
> Note that vcadd uses UNSPEC_VCADDxx for builtins without predication,
> and VCADDQ_ROTxx_M_x (that is, not starting with "UNSPEC_").  The
> UNPEC_* names are also used by neon.md

Thanks for working on this.
The series is ok.
Kyrill

> 
> 2023-07-13  Christophe Lyon  <christophe.lyon@linaro.org>
> 
> 	gcc/
> 	* config/arm/arm_mve_builtins.def (vcaddq_rot90_,
> vcaddq_rot270_)
> 	(vcaddq_rot90_f, vcaddq_rot90_f): Add "_" or "_f" suffix.
> 	* config/arm/iterators.md (mve_insn): Add vcadd, vhcadd.
> 	(isu): Add UNSPEC_VCADD90, UNSPEC_VCADD270,
> VCADDQ_ROT270_M_U,
> 	VCADDQ_ROT270_M_S, VCADDQ_ROT90_M_U,
> VCADDQ_ROT90_M_S,
> 	VHCADDQ_ROT90_M_S, VHCADDQ_ROT270_M_S,
> VHCADDQ_ROT90_S,
> 	VHCADDQ_ROT270_S.
> 	(rot): Add VCADDQ_ROT90_M_F, VCADDQ_ROT90_M_S,
> VCADDQ_ROT90_M_U,
> 	VCADDQ_ROT270_M_F, VCADDQ_ROT270_M_S,
> VCADDQ_ROT270_M_U,
> 	VHCADDQ_ROT90_S, VHCADDQ_ROT270_S, VHCADDQ_ROT90_M_S,
> 	VHCADDQ_ROT270_M_S.
> 	(mve_rot): Add VCADDQ_ROT90_M_F, VCADDQ_ROT90_M_S,
> 	VCADDQ_ROT90_M_U, VCADDQ_ROT270_M_F,
> VCADDQ_ROT270_M_S,
> 	VCADDQ_ROT270_M_U, VHCADDQ_ROT90_S, VHCADDQ_ROT270_S,
> 	VHCADDQ_ROT90_M_S, VHCADDQ_ROT270_M_S.
> 	(supf): Add VHCADDQ_ROT90_M_S, VHCADDQ_ROT270_M_S,
> 	VHCADDQ_ROT90_S, VHCADDQ_ROT270_S, UNSPEC_VCADD90,
> 	UNSPEC_VCADD270.
> 	(VCADDQ_ROT270_M): Delete.
> 	(VCADDQ_M_F VxCADDQ VxCADDQ_M): New.
> 	(VCADDQ_ROT90_M): Delete.
> 	* config/arm/mve.md (mve_vcaddq<mve_rot><mode>)
> 	(mve_vhcaddq_rot270_s<mode>, mve_vhcaddq_rot90_s<mode>):
> Merge
> 	into ...
> 	(@mve_<mve_insn>q<mve_rot>_<supf><mode>): ... this.
> 	(mve_vcaddq<mve_rot><mode>): Rename into ...
> 	(@mve_<mve_insn>q<mve_rot>_f<mode>): ... this
> 	(mve_vcaddq_rot270_m_<supf><mode>)
> 	(mve_vcaddq_rot90_m_<supf><mode>,
> mve_vhcaddq_rot270_m_s<mode>)
> 	(mve_vhcaddq_rot90_m_s<mode>): Merge into ...
> 	(@mve_<mve_insn>q<mve_rot>_m_<supf><mode>): ... this.
> 	(mve_vcaddq_rot270_m_f<mode>, mve_vcaddq_rot90_m_f<mode>):
> Merge
> 	into ...
> 	(@mve_<mve_insn>q<mve_rot>_m_f<mode>): ... this.
> ---
>  gcc/config/arm/arm_mve_builtins.def |   6 +-
>  gcc/config/arm/iterators.md         |  38 +++++++-
>  gcc/config/arm/mve.md               | 135 +++++-----------------------
>  3 files changed, 62 insertions(+), 117 deletions(-)
> 
> diff --git a/gcc/config/arm/arm_mve_builtins.def
> b/gcc/config/arm/arm_mve_builtins.def
> index 8de765de3b0..63ad1845593 100644
> --- a/gcc/config/arm/arm_mve_builtins.def
> +++ b/gcc/config/arm/arm_mve_builtins.def
> @@ -187,6 +187,10 @@ VAR3 (BINOP_NONE_NONE_NONE, vmaxvq_s, v16qi,
> v8hi, v4si)
>  VAR3 (BINOP_NONE_NONE_NONE, vmaxq_s, v16qi, v8hi, v4si)
>  VAR3 (BINOP_NONE_NONE_NONE, vhsubq_s, v16qi, v8hi, v4si)
>  VAR3 (BINOP_NONE_NONE_NONE, vhsubq_n_s, v16qi, v8hi, v4si)
> +VAR3 (BINOP_NONE_NONE_NONE, vcaddq_rot90_, v16qi, v8hi, v4si)
> +VAR3 (BINOP_NONE_NONE_NONE, vcaddq_rot270_, v16qi, v8hi, v4si)
> +VAR2 (BINOP_NONE_NONE_NONE, vcaddq_rot90_f, v8hf, v4sf)
> +VAR2 (BINOP_NONE_NONE_NONE, vcaddq_rot270_f, v8hf, v4sf)
>  VAR3 (BINOP_NONE_NONE_NONE, vhcaddq_rot90_s, v16qi, v8hi, v4si)
>  VAR3 (BINOP_NONE_NONE_NONE, vhcaddq_rot270_s, v16qi, v8hi, v4si)
>  VAR3 (BINOP_NONE_NONE_NONE, vhaddq_s, v16qi, v8hi, v4si)
> @@ -870,8 +874,6 @@ VAR3
> (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vshlcq_m_vec_u, v16qi,
> v8hi, v4si)
>  VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vshlcq_m_carry_u,
> v16qi, v8hi, v4si)
> 
>  /* optabs without any suffixes.  */
> -VAR5 (BINOP_NONE_NONE_NONE, vcaddq_rot90, v16qi, v8hi, v4si, v8hf,
> v4sf)
> -VAR5 (BINOP_NONE_NONE_NONE, vcaddq_rot270, v16qi, v8hi, v4si, v8hf,
> v4sf)
>  VAR2 (BINOP_NONE_NONE_NONE, vcmulq_rot90, v8hf, v4sf)
>  VAR2 (BINOP_NONE_NONE_NONE, vcmulq_rot270, v8hf, v4sf)
>  VAR2 (BINOP_NONE_NONE_NONE, vcmulq_rot180, v8hf, v4sf)
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 9e77af55d60..da1ead34e58 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -902,6 +902,7 @@
>  		     ])
> 
>  (define_int_attr mve_insn [
> +		 (UNSPEC_VCADD90 "vcadd") (UNSPEC_VCADD270 "vcadd")
>  		 (VABAVQ_P_S "vabav") (VABAVQ_P_U "vabav")
>  		 (VABAVQ_S "vabav") (VABAVQ_U "vabav")
>  		 (VABDQ_M_S "vabd") (VABDQ_M_U "vabd") (VABDQ_M_F
> "vabd")
> @@ -925,6 +926,8 @@
>  		 (VBICQ_N_S "vbic") (VBICQ_N_U "vbic")
>  		 (VBRSRQ_M_N_S "vbrsr") (VBRSRQ_M_N_U "vbrsr")
> (VBRSRQ_M_N_F "vbrsr")
>  		 (VBRSRQ_N_S "vbrsr") (VBRSRQ_N_U "vbrsr") (VBRSRQ_N_F
> "vbrsr")
> +		 (VCADDQ_ROT270_M_U "vcadd") (VCADDQ_ROT270_M_S
> "vcadd") (VCADDQ_ROT270_M_F "vcadd")
> +		 (VCADDQ_ROT90_M_U "vcadd") (VCADDQ_ROT90_M_S
> "vcadd") (VCADDQ_ROT90_M_F "vcadd")
>  		 (VCLSQ_M_S "vcls")
>  		 (VCLSQ_S "vcls")
>  		 (VCLZQ_M_S "vclz") (VCLZQ_M_U "vclz")
> @@ -944,6 +947,8 @@
>  		 (VHADDQ_M_S "vhadd") (VHADDQ_M_U "vhadd")
>  		 (VHADDQ_N_S "vhadd") (VHADDQ_N_U "vhadd")
>  		 (VHADDQ_S "vhadd") (VHADDQ_U "vhadd")
> +		 (VHCADDQ_ROT90_M_S "vhcadd") (VHCADDQ_ROT270_M_S
> "vhcadd")
> +		 (VHCADDQ_ROT90_S "vhcadd") (VHCADDQ_ROT270_S
> "vhcadd")
>  		 (VHSUBQ_M_N_S "vhsub") (VHSUBQ_M_N_U "vhsub")
>  		 (VHSUBQ_M_S "vhsub") (VHSUBQ_M_U "vhsub")
>  		 (VHSUBQ_N_S "vhsub") (VHSUBQ_N_U "vhsub")
> @@ -1190,7 +1195,10 @@
>  		 ])
> 
>  (define_int_attr isu    [
> +		 (UNSPEC_VCADD90 "i") (UNSPEC_VCADD270 "i")
>  		 (VABSQ_M_S "s")
> +		 (VCADDQ_ROT270_M_U "i") (VCADDQ_ROT270_M_S "i")
> +		 (VCADDQ_ROT90_M_U "i") (VCADDQ_ROT90_M_S "i")
>  		 (VCLSQ_M_S "s")
>  		 (VCLZQ_M_S "i")
>  		 (VCLZQ_M_U "i")
> @@ -1214,6 +1222,8 @@
>  		 (VCMPNEQ_M_N_U "i")
>  		 (VCMPNEQ_M_S "i")
>  		 (VCMPNEQ_M_U "i")
> +		 (VHCADDQ_ROT90_M_S "s") (VHCADDQ_ROT270_M_S "s")
> +		 (VHCADDQ_ROT90_S "s") (VHCADDQ_ROT270_S "s")
>  		 (VMOVNBQ_M_S "i") (VMOVNBQ_M_U "i")
>  		 (VMOVNBQ_S "i") (VMOVNBQ_U "i")
>  		 (VMOVNTQ_M_S "i") (VMOVNTQ_M_U "i")
> @@ -2155,6 +2165,16 @@
> 
>  (define_int_attr rot [(UNSPEC_VCADD90 "90")
>  		      (UNSPEC_VCADD270 "270")
> +		      (VCADDQ_ROT90_M_F "90")
> +		      (VCADDQ_ROT90_M_S "90")
> +		      (VCADDQ_ROT90_M_U "90")
> +		      (VCADDQ_ROT270_M_F "270")
> +		      (VCADDQ_ROT270_M_S "270")
> +		      (VCADDQ_ROT270_M_U "270")
> +		      (VHCADDQ_ROT90_S "90")
> +		      (VHCADDQ_ROT270_S "270")
> +		      (VHCADDQ_ROT90_M_S "90")
> +		      (VHCADDQ_ROT270_M_S "270")
>  		      (UNSPEC_VCMUL "0")
>  		      (UNSPEC_VCMUL90 "90")
>  		      (UNSPEC_VCMUL180 "180")
> @@ -2193,6 +2213,16 @@
> 
>  (define_int_attr mve_rot [(UNSPEC_VCADD90 "_rot90")
>  			  (UNSPEC_VCADD270 "_rot270")
> +			  (VCADDQ_ROT90_M_F "_rot90")
> +			  (VCADDQ_ROT90_M_S "_rot90")
> +			  (VCADDQ_ROT90_M_U "_rot90")
> +			  (VCADDQ_ROT270_M_F "_rot270")
> +			  (VCADDQ_ROT270_M_S "_rot270")
> +			  (VCADDQ_ROT270_M_U "_rot270")
> +			  (VHCADDQ_ROT90_S "_rot90")
> +			  (VHCADDQ_ROT270_S "_rot270")
> +			  (VHCADDQ_ROT90_M_S "_rot90")
> +			  (VHCADDQ_ROT270_M_S "_rot270")
>  			  (UNSPEC_VCMLA "")
>  			  (UNSPEC_VCMLA90 "_rot90")
>  			  (UNSPEC_VCMLA180 "_rot180")
> @@ -2535,6 +2565,9 @@
>  		       (VRMLALDAVHAQ_P_S "s") (VRMLALDAVHAQ_P_U "u")
>  		       (VQSHLUQ_M_N_S "s")
>  		       (VQSHLUQ_N_S "s")
> +		       (VHCADDQ_ROT90_M_S "s") (VHCADDQ_ROT270_M_S
> "s")
> +		       (VHCADDQ_ROT90_S "s") (VHCADDQ_ROT270_S "s")
> +		       (UNSPEC_VCADD90 "") (UNSPEC_VCADD270 "")
>  		       ])
> 
>  ;; Both kinds of return insn.
> @@ -2767,7 +2800,9 @@
>  (define_int_iterator VANDQ_M [VANDQ_M_U VANDQ_M_S])
>  (define_int_iterator VBICQ_M [VBICQ_M_U VBICQ_M_S])
>  (define_int_iterator VSHLQ_M_N [VSHLQ_M_N_S VSHLQ_M_N_U])
> -(define_int_iterator VCADDQ_ROT270_M [VCADDQ_ROT270_M_U
> VCADDQ_ROT270_M_S])
> +(define_int_iterator VCADDQ_M_F [VCADDQ_ROT90_M_F
> VCADDQ_ROT270_M_F])
> +(define_int_iterator VxCADDQ [UNSPEC_VCADD90 UNSPEC_VCADD270
> VHCADDQ_ROT90_S VHCADDQ_ROT270_S])
> +(define_int_iterator VxCADDQ_M [VHCADDQ_ROT90_M_S
> VHCADDQ_ROT270_M_S VCADDQ_ROT90_M_U VCADDQ_ROT90_M_S
> VCADDQ_ROT270_M_U VCADDQ_ROT270_M_S])
>  (define_int_iterator VQRSHLQ_M [VQRSHLQ_M_U VQRSHLQ_M_S])
>  (define_int_iterator VQADDQ_M_N [VQADDQ_M_N_U VQADDQ_M_N_S])
>  (define_int_iterator VADDQ_M_N [VADDQ_M_N_S VADDQ_M_N_U])
> @@ -2777,7 +2812,6 @@
>  (define_int_iterator VMLADAVAQ_P [VMLADAVAQ_P_U VMLADAVAQ_P_S])
>  (define_int_iterator VBRSRQ_M_N [VBRSRQ_M_N_U VBRSRQ_M_N_S])
>  (define_int_iterator VMULQ_M_N [VMULQ_M_N_U VMULQ_M_N_S])
> -(define_int_iterator VCADDQ_ROT90_M [VCADDQ_ROT90_M_U
> VCADDQ_ROT90_M_S])
>  (define_int_iterator VMULLTQ_INT_M [VMULLTQ_INT_M_S
> VMULLTQ_INT_M_U])
>  (define_int_iterator VEORQ_M [VEORQ_M_S VEORQ_M_U])
>  (define_int_iterator VSHRQ_M_N [VSHRQ_M_N_S VSHRQ_M_N_U])
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 74909ce47e1..a6db6d1b81d 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -839,17 +839,20 @@
>  ])
> 
>  ;;
> -;; [vcaddq, vcaddq_rot90, vcadd_rot180, vcadd_rot270])
> +;; [vcaddq_rot90_s, vcadd_rot90_u]
> +;; [vcaddq_rot270_s, vcadd_rot270_u]
> +;; [vhcaddq_rot90_s]
> +;; [vhcaddq_rot270_s]
>  ;;
> -(define_insn "mve_vcaddq<mve_rot><mode>"
> +(define_insn "@mve_<mve_insn>q<mve_rot>_<supf><mode>"
>    [
>     (set (match_operand:MVE_2 0 "s_register_operand" "<earlyclobber_32>")
>  	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand"
> "w")
>  		       (match_operand:MVE_2 2 "s_register_operand" "w")]
> -	 VCADD))
> +	 VxCADDQ))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vcadd.i%#<V_sz_elem>	%q0, %q1, %q2, #<rot>"
> +  "<mve_insn>.<isu>%#<V_sz_elem>\t%q0, %q1, %q2, #<rot>"
>    [(set_attr "type" "mve_move")
>  ])
> 
> @@ -904,36 +907,6 @@
>    [(set_attr "type" "mve_move")
>  ])
> 
> -;;
> -;; [vhcaddq_rot270_s])
> -;;
> -(define_insn "mve_vhcaddq_rot270_s<mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "<earlyclobber_32>")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand"
> "w")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")]
> -	 VHCADDQ_ROT270_S))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vhcadd.s%#<V_sz_elem>\t%q0, %q1, %q2, #270"
> -  [(set_attr "type" "mve_move")
> -])
> -
> -;;
> -;; [vhcaddq_rot90_s])
> -;;
> -(define_insn "mve_vhcaddq_rot90_s<mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "<earlyclobber_32>")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand"
> "w")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")]
> -	 VHCADDQ_ROT90_S))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vhcadd.s%#<V_sz_elem>\t%q0, %q1, %q2, #90"
> -  [(set_attr "type" "mve_move")
> -])
> -
>  ;;
>  ;; [vmaxaq_s]
>  ;; [vminaq_s]
> @@ -1238,9 +1211,9 @@
>  ])
> 
>  ;;
> -;; [vcaddq, vcaddq_rot90, vcadd_rot180, vcadd_rot270])
> +;; [vcaddq_rot90_f, vcaddq_rot270_f]
>  ;;
> -(define_insn "mve_vcaddq<mve_rot><mode>"
> +(define_insn "@mve_<mve_insn>q<mve_rot>_f<mode>"
>    [
>     (set (match_operand:MVE_0 0 "s_register_operand" "<earlyclobber_32>")
>  	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand"
> "w")
> @@ -1248,7 +1221,7 @@
>  	 VCADD))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vcadd.f%#<V_sz_elem>	%q0, %q1, %q2, #<rot>"
> +  "<mve_insn>.f%#<V_sz_elem>\t%q0, %q1, %q2, #<rot>"
>    [(set_attr "type" "mve_move")
>  ])
> 
> @@ -2788,36 +2761,22 @@
>     (set_attr "length""8")])
> 
>  ;;
> -;; [vcaddq_rot270_m_u, vcaddq_rot270_m_s])
> +;; [vcaddq_rot90_m_u, vcaddq_rot90_m_s]
> +;; [vcaddq_rot270_m_u, vcaddq_rot270_m_s]
> +;; [vhcaddq_rot90_m_s]
> +;; [vhcaddq_rot270_m_s]
>  ;;
> -(define_insn "mve_vcaddq_rot270_m_<supf><mode>"
> +(define_insn "@mve_<mve_insn>q<mve_rot>_m_<supf><mode>"
>    [
>     (set (match_operand:MVE_2 0 "s_register_operand" "<earlyclobber_32>")
>  	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
>  		       (match_operand:MVE_2 2 "s_register_operand" "w")
>  		       (match_operand:MVE_2 3 "s_register_operand" "w")
>  		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
> -	 VCADDQ_ROT270_M))
> +	 VxCADDQ_M))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vpst\;vcaddt.i%#<V_sz_elem>	%q0, %q2, %q3, #270"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
> -;;
> -;; [vcaddq_rot90_m_u, vcaddq_rot90_m_s])
> -;;
> -(define_insn "mve_vcaddq_rot90_m_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "<earlyclobber_32>")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:MVE_2 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
> -	 VCADDQ_ROT90_M))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vcaddt.i%#<V_sz_elem>	%q0, %q2, %q3, #90"
> +  "vpst\;<mve_insn>t.<isu>%#<V_sz_elem>\t%q0, %q2, %q3, #<rot>"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> @@ -2974,40 +2933,6 @@
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> -;;
> -;; [vhcaddq_rot270_m_s])
> -;;
> -(define_insn "mve_vhcaddq_rot270_m_s<mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "<earlyclobber_32>")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:MVE_2 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
> -	 VHCADDQ_ROT270_M_S))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vhcaddt.s%#<V_sz_elem>\t%q0, %q2, %q3, #270"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
> -;;
> -;; [vhcaddq_rot90_m_s])
> -;;
> -(define_insn "mve_vhcaddq_rot90_m_s<mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "<earlyclobber_32>")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:MVE_2 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
> -	 VHCADDQ_ROT90_M_S))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vhcaddt.s%#<V_sz_elem>\t%q0, %q2, %q3, #90"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
>  ;;
>  ;; [vmlaldavaq_p_u, vmlaldavaq_p_s]
>  ;; [vmlaldavaxq_p_s]
> @@ -3247,36 +3172,20 @@
>     (set_attr "length""8")])
> 
>  ;;
> -;; [vcaddq_rot270_m_f])
> -;;
> -(define_insn "mve_vcaddq_rot270_m_f<mode>"
> -  [
> -   (set (match_operand:MVE_0 0 "s_register_operand" "<earlyclobber_32>")
> -	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
> -		       (match_operand:MVE_0 2 "s_register_operand" "w")
> -		       (match_operand:MVE_0 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
> -	 VCADDQ_ROT270_M_F))
> -  ]
> -  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vpst\;vcaddt.f%#<V_sz_elem>	%q0, %q2, %q3, #270"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
> -;;
> -;; [vcaddq_rot90_m_f])
> +;; [vcaddq_rot90_m_f]
> +;; [vcaddq_rot270_m_f]
>  ;;
> -(define_insn "mve_vcaddq_rot90_m_f<mode>"
> +(define_insn "@mve_<mve_insn>q<mve_rot>_m_f<mode>"
>    [
>     (set (match_operand:MVE_0 0 "s_register_operand" "<earlyclobber_32>")
>  	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
>  		       (match_operand:MVE_0 2 "s_register_operand" "w")
>  		       (match_operand:MVE_0 3 "s_register_operand" "w")
>  		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
> -	 VCADDQ_ROT90_M_F))
> +	 VCADDQ_M_F))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vpst\;vcaddt.f%#<V_sz_elem>	%q0, %q2, %q3, #90"
> +  "vpst\;<mve_insn>t.f%#<V_sz_elem>\t%q0, %q2, %q3, #<rot>"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-07-14 16:18 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-13 10:22 [PATCH 1/6] arm: [MVE intrinsics] Factorize vcaddq vhcaddq Christophe Lyon
2023-07-13 10:22 ` [PATCH 2/6] arm: [MVE intrinsics] rework " Christophe Lyon
2023-07-13 10:22 ` [PATCH 3/6] arm: [MVE intrinsics factorize vcmulq Christophe Lyon
2023-07-13 10:22 ` [PATCH 4/6] arm: [MVE intrinsics] rework vcmulq Christophe Lyon
2023-07-13 10:22 ` [PATCH 5/6] arm: [MVE intrinsics] factorize vcmlaq Christophe Lyon
2023-07-13 10:22 ` [PATCH 6/6] arm: [MVE intrinsics] rework vcmlaq Christophe Lyon
2023-07-14 16:18 ` [PATCH 1/6] arm: [MVE intrinsics] Factorize vcaddq vhcaddq Kyrylo Tkachov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).