public inbox for gcc-patches@gcc.gnu.org
* [PATCH 00/22] arm: New framework for MVE intrinsics
@ 2023-04-18 13:45 Christophe Lyon
  2023-04-18 13:45 ` [PATCH 01/22] arm: move builtin function codes into general numberspace Christophe Lyon
                   ` (22 more replies)
  0 siblings, 23 replies; 55+ messages in thread
From: Christophe Lyon @ 2023-04-18 13:45 UTC (permalink / raw)
  To: gcc-patches, kyrylo.tkachov, richard.earnshaw, richard.sandiford
  Cc: Christophe Lyon

Hi,

This is the beginning of a long patch series to change the way Arm MVE
intrinsics are implemented. The goal is to get rid of arm_mve.h, which
takes a long time to parse and compile.

Roughly speaking, it's about using a framework very similar to what is
implemented for AArch64/SVE intrinsics. I haven't converted all the
intrinsics yet, but I think it would be good to start the conversion
when stage-1 reopens.

* Factorizing names
One of the main implementation differences I noticed between SVE and
MVE is that mve.md provides only full builtin names at the moment, and
makes almost no use of "parameterized names"
(https://gcc.gnu.org/onlinedocs/gccint/Parameterized-Names.html#Parameterized-Names).

Without this, we'd need the builtin expander to use a large
switch/case of the form:

switch (code)
  {
  case VADDQ_S: insn_code = code_for_mve_vaddq_s (...); break;
  case VADDQ_U: insn_code = code_for_mve_vaddq_u (...); break;
  case VSUBQ_S: insn_code = code_for_mve_vsubq_s (...); break;
  case VSUBQ_U: insn_code = code_for_mve_vsubq_u (...); break;
  ...
  }

so part of the work (which I called "factorize" in the commit
messages) is about replacing

(define_insn "mve_vaddq_n_<supf><mode>"
with
(define_insn "@mve_<mve_insn>q_n_<supf><mode>"
with the help of a new iterator (mve_insn).
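
For reference, mve_insn is an int attribute mapping each unspec to the
corresponding instruction name; a minimal sketch (the values shown
here are abridged and illustrative, the series defines many more):

(define_int_attr mve_insn [(VADDQ_N_S "vadd") (VADDQ_N_U "vadd")
			   (VSUBQ_N_S "vsub") (VSUBQ_N_U "vsub")])

The "@" prefix additionally makes genemit produce code_for_* selector
functions, so the expander can look up the insn code programmatically
instead of going through the big switch above.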

Doing so makes it more obvious that some patterns are identical,
except for the instruction name. I took this opportunity to merge
them, so for instance I have a patch which merges add, sub and mul
patterns.  Although not strictly necessary for the MVE intrinsics
restructuring work, this is a good opportunity to reduce such code
duplication (I did notice a few bugs during that process, which led me
to post a few small patches in the past months).  Note that identical
patterns will probably remain after the series; they can be merged
later if we want.
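
To make the merging concrete, here is a simplified sketch of what a
merged add/sub/mul pattern can look like (the iterator name
MVE_INT_BINARY and the unspec grouping are illustrative, not
necessarily the exact ones used in the series):

(define_insn "@mve_<mve_insn>q_<supf><mode>"
  [(set (match_operand:MVE_2 0 "s_register_operand" "=w")
	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
		       (match_operand:MVE_2 2 "s_register_operand" "w")]
	 MVE_INT_BINARY))]
  "TARGET_HAVE_MVE"
  "<mve_insn>.i%#<V_sz_elem>\t%q0, %q1, %q2"
  [(set_attr "type" "mve_move")])

where MVE_INT_BINARY would group the VADDQ_*, VSUBQ_* and VMULQ_*
unspecs, which all share the ".i" size suffix in their mnemonics.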

This factorization implies the introduction of new iterators, but it
also means that several existing ones become useless. These patches do
not remove them, because it's a bit painful to reorder patches which
remove lines at some "random" places, leading to merge conflicts. It's
much simpler to write a big cleanup patch at the end of the series to
remove all such useless iterators at once.

* Intrinsic re-implementation
After intrinsic names have been factorized, the actual
re-implementation patch is small:
- add 1 line in each of arm-mve-builtins-base.{cc,def,h} describing
  the intrinsic shape/signature, types and predicates involved,
  RTX/unspec codes
- remove the intrinsic definitions from arm_mve.h
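
Taking the vaddq rework later in the series as an example, those
per-intrinsic lines look roughly like this (exact spellings may still
change before the final patches):

  arm-mve-builtins-base.def:
    DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_integer, mx_or_none)
  arm-mve-builtins-base.h:
    extern const function_base *const vaddq;
  arm-mve-builtins-base.cc:
    FUNCTION_WITH_RTX_M_N (vaddq, PLUS, VADDQ)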

The full series of ~140 patches is organized like this:
- patches 1 and 2 introduce the new framework
- new implementation of vreinterpretq
- new implementation of vuninitialized
- patch groups of varying size, consisting of:
  - add a new "shape" if needed (e.g. unary, binary, ternary, ...)
  - add framework support functions if needed
  - factorize a set of intrinsics (at minimum, just make use of
    parameterized-names)
  - actual re-implementation of the intrinsics

I kept patches small so that the incremental progress is easy to
follow and check.  I'll submit the patches in small groups; this first
one is meant to make sure we agree on the implementation.

Tested on arm-eabi with -mthumb/-mfloat-abi=hard/-march=armv8.1-m.main+mve.

To help reviewers, I suggest comparing arm-mve-builtins.cc with
aarch64-sve-builtins.cc.

Christophe Lyon (22):
  arm: move builtin function codes into general numberspace
  arm: [MVE intrinsics] Add new framework
  arm: [MVE intrinsics] Rework vreinterpretq
  arm: [MVE intrinsics] Rework vuninitialized
  arm: [MVE intrinsics] add binary_opt_n shape
  arm: [MVE intrinsics] add unspec_based_mve_function_exact_insn
  arm: [MVE intrinsics] factorize vadd vsubq vmulq
  arm: [MVE intrinsics] rework vaddq vmulq vsubq
  arm: [MVE intrinsics] add binary shape
  arm: [MVE intrinsics] factorize vandq veorq vorrq vbicq
  arm: [MVE intrinsics] rework vandq veorq
  arm: [MVE intrinsics] add binary_orrq shape
  arm: [MVE intrinsics] rework vorrq
  arm: [MVE intrinsics] add unspec_mve_function_exact_insn
  arm: [MVE intrinsics] add create shape
  arm: [MVE intrinsics] factorize vcreateq
  arm: [MVE intrinsics] rework vcreateq
  arm: [MVE intrinsics] factorize several binary_m operations
  arm: [MVE intrinsics] factorize several binary _n operations
  arm: [MVE intrinsics] factorize several binary _m_n operations
  arm: [MVE intrinsics] factorize several binary operations
  arm: [MVE intrinsics] rework vhaddq vhsubq vmulhq vqaddq vqsubq
    vqdmulhq vrhaddq vrmulhq

 gcc/config.gcc                                |    2 +-
 gcc/config/arm/arm-builtins.cc                |  237 +-
 gcc/config/arm/arm-builtins.h                 |    1 +
 gcc/config/arm/arm-c.cc                       |   42 +-
 gcc/config/arm/arm-mve-builtins-base.cc       |  163 +
 gcc/config/arm/arm-mve-builtins-base.def      |   50 +
 gcc/config/arm/arm-mve-builtins-base.h        |   47 +
 gcc/config/arm/arm-mve-builtins-functions.h   |  387 +
 gcc/config/arm/arm-mve-builtins-shapes.cc     |  529 ++
 gcc/config/arm/arm-mve-builtins-shapes.h      |   47 +
 gcc/config/arm/arm-mve-builtins.cc            | 2013 ++++-
 gcc/config/arm/arm-mve-builtins.def           |   40 +-
 gcc/config/arm/arm-mve-builtins.h             |  672 +-
 gcc/config/arm/arm-protos.h                   |   24 +
 gcc/config/arm/arm.cc                         |   27 +
 gcc/config/arm/arm_mve.h                      | 7581 +----------------
 gcc/config/arm/arm_mve_builtins.def           |    6 -
 gcc/config/arm/arm_mve_types.h                | 1430 ----
 gcc/config/arm/iterators.md                   |  240 +-
 gcc/config/arm/mve.md                         | 1747 +---
 gcc/config/arm/predicates.md                  |    4 +
 gcc/config/arm/t-arm                          |   32 +-
 gcc/config/arm/unspecs.md                     |    1 +
 gcc/config/arm/vec-common.md                  |    8 +-
 gcc/testsuite/g++.target/arm/mve.exp          |    8 +-
 .../arm/mve/general-c++/nomve_fp_1.c          |   15 +
 .../arm/mve/general-c++/vreinterpretq_1.C     |   25 +
 .../gcc.target/arm/mve/general-c/nomve_fp_1.c |   15 +
 .../arm/mve/general-c/vreinterpretq_1.c       |   25 +
 29 files changed, 4926 insertions(+), 10492 deletions(-)
 create mode 100644 gcc/config/arm/arm-mve-builtins-base.cc
 create mode 100644 gcc/config/arm/arm-mve-builtins-base.def
 create mode 100644 gcc/config/arm/arm-mve-builtins-base.h
 create mode 100644 gcc/config/arm/arm-mve-builtins-functions.h
 create mode 100644 gcc/config/arm/arm-mve-builtins-shapes.cc
 create mode 100644 gcc/config/arm/arm-mve-builtins-shapes.h
 create mode 100644 gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c
 create mode 100644 gcc/testsuite/g++.target/arm/mve/general-c++/vreinterpretq_1.C
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/nomve_fp_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/vreinterpretq_1.c

-- 
2.34.1


* [PATCH 01/22] arm: move builtin function codes into general numberspace
  2023-04-18 13:45 [PATCH 00/22] arm: New framework for MVE intrinsics Christophe Lyon
@ 2023-04-18 13:45 ` Christophe Lyon
  2023-05-02  9:24   ` Kyrylo Tkachov
  2023-04-18 13:45 ` [PATCH 02/22] arm: [MVE intrinsics] Add new framework Christophe Lyon
                   ` (21 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Christophe Lyon @ 2023-04-18 13:45 UTC (permalink / raw)
  To: gcc-patches, kyrylo.tkachov, richard.earnshaw, richard.sandiford
  Cc: Christophe Lyon

This patch introduces a separate numberspace for general arm builtin
function codes.  The intent is to separate the space of function codes
assigned to general builtins from the codes of future MVE intrinsic
functions, using the low bit of each function code to differentiate
them.  This is identical to how SVE intrinsic functions are currently
differentiated from general aarch64 builtins.

Future intrinsic implementations may also make use of numberspacing by
changing the values of ARM_BUILTIN_SHIFT and ARM_BUILTIN_CLASS, and by
adding themselves to the arm_builtin_class enum.
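
Concretely, a general builtin with subcode N is now registered under
function code (N << ARM_BUILTIN_SHIFT) | ARM_BUILTIN_GENERAL, and each
dispatch site recovers both pieces, as the hunks below show:

  unsigned int subcode = code >> ARM_BUILTIN_SHIFT;
  switch (code & ARM_BUILTIN_CLASS)
    {
    case ARM_BUILTIN_GENERAL:
      /* ...handle the general builtin with subcode SUBCODE...  */
    default:
      gcc_unreachable ();
    }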

2022-09-08  Murray Steele  <murray.steele@arm.com>
	    Christophe Lyon  <christophe.lyon@arm.com>

gcc/ChangeLog:

	* config/arm/arm-builtins.cc (arm_general_add_builtin_function):
	New function.
	(arm_init_builtin): Use arm_general_add_builtin_function instead
	of arm_add_builtin_function.
	(arm_init_acle_builtins): Likewise.
	(arm_init_mve_builtins): Likewise.
	(arm_init_crypto_builtins): Likewise.
	(arm_init_builtins): Likewise.
	(arm_general_builtin_decl): New function.
	(arm_builtin_decl): Defer to numberspace-specialized functions.
	(arm_expand_builtin_args): Rename into arm_general_expand_builtin_args.
	(arm_expand_builtin_1): Rename into arm_general_expand_builtin_1 and ...
	(arm_general_expand_builtin_1): ... specialize for general builtins.
	(arm_expand_acle_builtin): Use arm_general_expand_builtin
	instead of arm_expand_builtin.
	(arm_expand_mve_builtin): Likewise.
	(arm_expand_neon_builtin): Likewise.
	(arm_expand_vfp_builtin): Likewise.
	(arm_general_expand_builtin): New function.
	(arm_expand_builtin): Specialize for general builtins.
	(arm_general_check_builtin_call): New function.
	(arm_check_builtin_call): Specialize for general builtins.
	(arm_describe_resolver): Validate numberspace.
	(arm_cde_end_args): Likewise.
	* config/arm/arm-protos.h (enum arm_builtin_class): New enum.
	(ARM_BUILTIN_SHIFT, ARM_BUILTIN_CLASS): New constants.

Co-authored-by: Christophe Lyon  <christophe.lyon@arm.com>
---
 gcc/config/arm/arm-builtins.cc | 226 ++++++++++++++++++++++-----------
 gcc/config/arm/arm-protos.h    |  16 +++
 2 files changed, 165 insertions(+), 77 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
index 9f5c568cbc3..adcb50d2185 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -1405,6 +1405,18 @@ static tree arm_simd_polyHI_type_node = NULL_TREE;
 static tree arm_simd_polyDI_type_node = NULL_TREE;
 static tree arm_simd_polyTI_type_node = NULL_TREE;
 
+/* Wrapper around add_builtin_function.  NAME is the name of the built-in
+   function, TYPE is the function type, CODE is the function subcode
+   (relative to ARM_BUILTIN_GENERAL), and ATTRS is the function
+   attributes.  */
+static tree
+arm_general_add_builtin_function (const char* name, tree type,
+				  unsigned int code, tree attrs = NULL_TREE)
+{
+  code = (code << ARM_BUILTIN_SHIFT) | ARM_BUILTIN_GENERAL;
+  return add_builtin_function (name, type, code, BUILT_IN_MD, NULL, attrs);
+}
+
 static const char *
 arm_mangle_builtin_scalar_type (const_tree type)
 {
@@ -1811,8 +1823,7 @@ arm_init_builtin (unsigned int fcode, arm_builtin_datum *d,
     snprintf (namebuf, sizeof (namebuf), "%s_%s",
 	      prefix, d->name);
 
-  fndecl = add_builtin_function (namebuf, ftype, fcode, BUILT_IN_MD,
-				 NULL, NULL_TREE);
+  fndecl = arm_general_add_builtin_function (namebuf, ftype, fcode);
   arm_builtin_decls[fcode] = fndecl;
 }
 
@@ -1832,7 +1843,7 @@ arm_init_bf16_types (void)
 /* Set up ACLE builtins, even builtins for instructions that are not
    in the current target ISA to allow the user to compile particular modules
    with different target specific options that differ from the command line
-   options.  Such builtins will be rejected in arm_expand_builtin.  */
+   options.  Such builtins will be rejected in arm_general_expand_builtin.  */
 
 static void
 arm_init_acle_builtins (void)
@@ -1845,9 +1856,9 @@ arm_init_acle_builtins (void)
 						 intSI_type_node,
 						 NULL);
   arm_builtin_decls[ARM_BUILTIN_SAT_IMM_CHECK]
-    = add_builtin_function ("__builtin_sat_imm_check", sat_check_fpr,
-			    ARM_BUILTIN_SAT_IMM_CHECK, BUILT_IN_MD,
-			    NULL, NULL_TREE);
+    = arm_general_add_builtin_function ("__builtin_sat_imm_check",
+					sat_check_fpr,
+					ARM_BUILTIN_SAT_IMM_CHECK);
 
   for (i = 0; i < ARRAY_SIZE (acle_builtin_data); i++, fcode++)
     {
@@ -1894,13 +1905,13 @@ arm_init_mve_builtins (void)
 						    intSI_type_node,
 						    NULL);
   arm_builtin_decls[ARM_BUILTIN_GET_FPSCR_NZCVQC]
-    = add_builtin_function ("__builtin_arm_get_fpscr_nzcvqc", get_fpscr_nzcvqc,
-			    ARM_BUILTIN_GET_FPSCR_NZCVQC, BUILT_IN_MD, NULL,
-			    NULL_TREE);
+    = arm_general_add_builtin_function ("__builtin_arm_get_fpscr_nzcvqc",
+					get_fpscr_nzcvqc,
+					ARM_BUILTIN_GET_FPSCR_NZCVQC);
   arm_builtin_decls[ARM_BUILTIN_SET_FPSCR_NZCVQC]
-    = add_builtin_function ("__builtin_arm_set_fpscr_nzcvqc", set_fpscr_nzcvqc,
-			    ARM_BUILTIN_SET_FPSCR_NZCVQC, BUILT_IN_MD, NULL,
-			    NULL_TREE);
+    = arm_general_add_builtin_function ("__builtin_arm_set_fpscr_nzcvqc",
+					set_fpscr_nzcvqc,
+					ARM_BUILTIN_SET_FPSCR_NZCVQC);
 
   for (i = 0; i < ARRAY_SIZE (mve_builtin_data); i++, fcode++)
     {
@@ -1912,7 +1923,7 @@ arm_init_mve_builtins (void)
 /* Set up all the NEON builtins, even builtins for instructions that are not
    in the current target ISA to allow the user to compile particular modules
    with different target specific options that differ from the command line
-   options. Such builtins will be rejected in arm_expand_builtin.  */
+   options.  Such builtins will be rejected in arm_general_expand_builtin.  */
 
 static void
 arm_init_neon_builtins (void)
@@ -2006,17 +2017,14 @@ arm_init_crypto_builtins (void)
     R##_ftype_##A1##_##A2##_##A3
   #define CRYPTO1(L, U, R, A) \
     arm_builtin_decls[C (U)] \
-      = add_builtin_function (N (L), FT1 (R, A), \
-		  C (U), BUILT_IN_MD, NULL, NULL_TREE);
+      = arm_general_add_builtin_function (N (L), FT1 (R, A), C (U));
   #define CRYPTO2(L, U, R, A1, A2)  \
     arm_builtin_decls[C (U)]	\
-      = add_builtin_function (N (L), FT2 (R, A1, A2), \
-		  C (U), BUILT_IN_MD, NULL, NULL_TREE);
+      = arm_general_add_builtin_function (N (L), FT2 (R, A1, A2), C (U));
 
   #define CRYPTO3(L, U, R, A1, A2, A3) \
     arm_builtin_decls[C (U)]	   \
-      = add_builtin_function (N (L), FT3 (R, A1, A2, A3), \
-				  C (U), BUILT_IN_MD, NULL, NULL_TREE);
+      = arm_general_add_builtin_function (N (L), FT3 (R, A1, A2, A3), C (U));
   #include "crypto.def"
 
   #undef CRYPTO1
@@ -2039,8 +2047,8 @@ arm_init_crypto_builtins (void)
 	  || bitmap_bit_p (arm_active_target.isa, FLAG))		\
 	{								\
 	  tree bdecl;							\
-	  bdecl = add_builtin_function ((NAME), (TYPE), (CODE),		\
-					BUILT_IN_MD, NULL, NULL_TREE);	\
+	  bdecl  = arm_general_add_builtin_function ((NAME), (TYPE),    \
+						     (CODE));		\
 	  arm_builtin_decls[CODE] = bdecl;				\
 	}								\
     }									\
@@ -2650,9 +2658,9 @@ arm_init_builtins (void)
 						      intSI_type_node,
 						      NULL);
       arm_builtin_decls[ARM_BUILTIN_SIMD_LANE_CHECK]
-      = add_builtin_function ("__builtin_arm_lane_check", lane_check_fpr,
-			      ARM_BUILTIN_SIMD_LANE_CHECK, BUILT_IN_MD,
-			      NULL, NULL_TREE);
+      = arm_general_add_builtin_function ("__builtin_arm_lane_check",
+					  lane_check_fpr,
+					  ARM_BUILTIN_SIMD_LANE_CHECK);
       if (TARGET_HAVE_MVE)
 	arm_init_mve_builtins ();
       else
@@ -2674,11 +2682,13 @@ arm_init_builtins (void)
 	= build_function_type_list (unsigned_type_node, NULL);
 
       arm_builtin_decls[ARM_BUILTIN_GET_FPSCR]
-	= add_builtin_function ("__builtin_arm_get_fpscr", ftype_get_fpscr,
-				ARM_BUILTIN_GET_FPSCR, BUILT_IN_MD, NULL, NULL_TREE);
+	= arm_general_add_builtin_function ("__builtin_arm_get_fpscr",
+					    ftype_get_fpscr,
+					    ARM_BUILTIN_GET_FPSCR);
       arm_builtin_decls[ARM_BUILTIN_SET_FPSCR]
-	= add_builtin_function ("__builtin_arm_set_fpscr", ftype_set_fpscr,
-				ARM_BUILTIN_SET_FPSCR, BUILT_IN_MD, NULL, NULL_TREE);
+	= arm_general_add_builtin_function ("__builtin_arm_set_fpscr",
+					    ftype_set_fpscr,
+					    ARM_BUILTIN_SET_FPSCR);
     }
 
   if (use_cmse)
@@ -2686,17 +2696,15 @@ arm_init_builtins (void)
       tree ftype_cmse_nonsecure_caller
 	= build_function_type_list (unsigned_type_node, NULL);
       arm_builtin_decls[ARM_BUILTIN_CMSE_NONSECURE_CALLER]
-	= add_builtin_function ("__builtin_arm_cmse_nonsecure_caller",
-				ftype_cmse_nonsecure_caller,
-				ARM_BUILTIN_CMSE_NONSECURE_CALLER, BUILT_IN_MD,
-				NULL, NULL_TREE);
+	= arm_general_add_builtin_function ("__builtin_arm_cmse_nonsecure_caller",
+					    ftype_cmse_nonsecure_caller,
+					    ARM_BUILTIN_CMSE_NONSECURE_CALLER);
     }
 }
 
-/* Return the ARM builtin for CODE.  */
-
+/* Implement TARGET_BUILTIN_DECL for general builtins.  */
 tree
-arm_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
+arm_general_builtin_decl (unsigned code)
 {
   if (code >= ARM_BUILTIN_MAX)
     return error_mark_node;
@@ -2704,6 +2712,20 @@ arm_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
   return arm_builtin_decls[code];
 }
 
+/* Return the ARM builtin for CODE.  */
+tree
+arm_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
+{
+  unsigned subcode = code >> ARM_BUILTIN_SHIFT;
+  switch (code & ARM_BUILTIN_CLASS)
+    {
+    case ARM_BUILTIN_GENERAL:
+      return arm_general_builtin_decl (subcode);
+    default:
+      gcc_unreachable ();
+    }
+}
+
 /* Errors in the source file can cause expand_expr to return const0_rtx
    where we expect a vector.  To avoid crashing, use one of the vector
    clear instructions.  */
@@ -2769,7 +2791,7 @@ arm_expand_ternop_builtin (enum insn_code icode,
   return target;
 }
 
-/* Subroutine of arm_expand_builtin to take care of binop insns.  */
+/* Subroutine of arm_general_expand_builtin to take care of binop insns.  */
 
 static rtx
 arm_expand_binop_builtin (enum insn_code icode,
@@ -2809,7 +2831,7 @@ arm_expand_binop_builtin (enum insn_code icode,
   return target;
 }
 
-/* Subroutine of arm_expand_builtin to take care of unop insns.  */
+/* Subroutine of arm_general_expand_builtin to take care of unop insns.  */
 
 static rtx
 arm_expand_unop_builtin (enum insn_code icode,
@@ -2946,11 +2968,11 @@ mve_dereference_pointer (tree exp, tree type, machine_mode reg_mode,
 		      build_int_cst (build_pointer_type (array_type), 0));
 }
 
-/* Expand a builtin.  */
+/* Implement TARGET_EXPAND_BUILTIN for general builtins.  */
 static rtx
-arm_expand_builtin_args (rtx target, machine_mode map_mode, int fcode,
-		      int icode, int have_retval, tree exp,
-		      builtin_arg *args)
+arm_general_expand_builtin_args (rtx target, machine_mode map_mode, int fcode,
+				 int icode, int have_retval, tree exp,
+				 builtin_arg *args)
 {
   rtx pat;
   tree arg[SIMD_MAX_BUILTIN_ARGS];
@@ -3234,13 +3256,13 @@ constant_arg:
   return target;
 }
 
-/* Expand a builtin.  These builtins are "special" because they don't have
-   symbolic constants defined per-instruction or per instruction-variant.
+/* Expand a general builtin.  These builtins are "special" because they don't
+   have symbolic constants defined per-instruction or per instruction-variant.
    Instead, the required info is looked up in the ARM_BUILTIN_DATA record that
    is passed into the function.  */
 
 static rtx
-arm_expand_builtin_1 (int fcode, tree exp, rtx target,
+arm_general_expand_builtin_1 (int fcode, tree exp, rtx target,
 			   arm_builtin_datum *d)
 {
   enum insn_code icode = d->code;
@@ -3308,16 +3330,16 @@ arm_expand_builtin_1 (int fcode, tree exp, rtx target,
     }
   args[k] = ARG_BUILTIN_STOP;
 
-  /* The interface to arm_expand_builtin_args expects a 0 if
+  /* The interface to arm_general_expand_builtin_args expects a 0 if
      the function is void, and a 1 if it is not.  */
-  return arm_expand_builtin_args
+  return arm_general_expand_builtin_args
     (target, d->mode, fcode, icode, !is_void, exp,
      &args[1]);
 }
 
 /* Expand an ACLE builtin, i.e. those registered only if their respective
    target constraints are met.  This check happens within
-   arm_expand_builtin_args.  */
+   arm_general_expand_builtin_args.  */
 
 static rtx
 arm_expand_acle_builtin (int fcode, tree exp, rtx target)
@@ -3351,11 +3373,12 @@ arm_expand_acle_builtin (int fcode, tree exp, rtx target)
       ? &acle_builtin_data[fcode - ARM_BUILTIN_ACLE_PATTERN_START]
       : &cde_builtin_data[fcode - ARM_BUILTIN_CDE_PATTERN_START].base;
 
-  return arm_expand_builtin_1 (fcode, exp, target, d);
+  return arm_general_expand_builtin_1 (fcode, exp, target, d);
 }
 
-/* Expand an MVE builtin, i.e. those registered only if their respective target
-   constraints are met.  This check happens within arm_expand_builtin.  */
+/* Expand an MVE builtin, i.e. those registered only if their respective
+   target constraints are met.  This check happens within
+   arm_general_expand_builtin.  */
 
 static rtx
 arm_expand_mve_builtin (int fcode, tree exp, rtx target)
@@ -3371,7 +3394,7 @@ arm_expand_mve_builtin (int fcode, tree exp, rtx target)
   arm_builtin_datum *d
     = &mve_builtin_data[fcode - ARM_BUILTIN_MVE_PATTERN_START];
 
-  return arm_expand_builtin_1 (fcode, exp, target, d);
+  return arm_general_expand_builtin_1 (fcode, exp, target, d);
 }
 
 /* Expand a Neon builtin, i.e. those registered only if TARGET_NEON holds.
@@ -3394,7 +3417,7 @@ arm_expand_neon_builtin (int fcode, tree exp, rtx target)
   arm_builtin_datum *d
     = &neon_builtin_data[fcode - ARM_BUILTIN_NEON_PATTERN_START];
 
-  return arm_expand_builtin_1 (fcode, exp, target, d);
+  return arm_general_expand_builtin_1 (fcode, exp, target, d);
 }
 
 /* Expand a VFP builtin.  These builtins are treated like
@@ -3415,25 +3438,18 @@ arm_expand_vfp_builtin (int fcode, tree exp, rtx target)
   arm_builtin_datum *d
     = &vfp_builtin_data[fcode - ARM_BUILTIN_VFP_PATTERN_START];
 
-  return arm_expand_builtin_1 (fcode, exp, target, d);
+  return arm_general_expand_builtin_1 (fcode, exp, target, d);
 }
 
-/* Expand an expression EXP that calls a built-in function,
-   with result going to TARGET if that's convenient
-   (and in mode MODE if that's convenient).
-   SUBTARGET may be used as the target for computing one of EXP's operands.
-   IGNORE is nonzero if the value is to be ignored.  */
-
+/* Implement TARGET_EXPAND_BUILTIN for general builtins.  */
 rtx
-arm_expand_builtin (tree exp,
+arm_general_expand_builtin (unsigned int fcode,
+			    tree exp,
 		    rtx target,
-		    rtx subtarget ATTRIBUTE_UNUSED,
-		    machine_mode mode ATTRIBUTE_UNUSED,
 		    int ignore ATTRIBUTE_UNUSED)
 {
   const struct builtin_description * d;
   enum insn_code    icode;
-  tree              fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
   tree              arg0;
   tree              arg1;
   tree              arg2;
@@ -3441,7 +3457,6 @@ arm_expand_builtin (tree exp,
   rtx               op1;
   rtx               op2;
   rtx               pat;
-  unsigned int      fcode = DECL_MD_FUNCTION_CODE (fndecl);
   size_t            i;
   machine_mode tmode;
   machine_mode mode0;
@@ -4052,6 +4067,31 @@ arm_expand_builtin (tree exp,
   return NULL_RTX;
 }
 
+/* Expand an expression EXP that calls a built-in function,
+   with result going to TARGET if that's convenient
+   (and in mode MODE if that's convenient).
+   SUBTARGET may be used as the target for computing one of EXP's operands.
+   IGNORE is nonzero if the value is to be ignored.  */
+
+rtx
+arm_expand_builtin (tree exp,
+		    rtx target,
+		    rtx subtarget ATTRIBUTE_UNUSED,
+		    machine_mode mode ATTRIBUTE_UNUSED,
+		    int ignore ATTRIBUTE_UNUSED)
+{
+  tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
+  unsigned int code = DECL_MD_FUNCTION_CODE (fndecl);
+  unsigned int subcode = code >> ARM_BUILTIN_SHIFT;
+  switch (code & ARM_BUILTIN_CLASS)
+    {
+    case ARM_BUILTIN_GENERAL:
+      return arm_general_expand_builtin (subcode, exp, target, ignore);
+    default:
+      gcc_unreachable ();
+    }
+}
+
 void
 arm_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
 {
@@ -4122,22 +4162,21 @@ arm_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
 			    reload_fenv, restore_fnenv), update_call);
 }
 
-/* Implement TARGET_CHECK_BUILTIN_CALL.  Record a read of the Q bit through
-   intrinsics in the machine function.  */
+/* Implement TARGET_CHECK_BUILTIN_CALL for general builtins.  Record a read of
+   the Q bit through intrinsics in the machine function for general built-in
+   functions.  */
 bool
-arm_check_builtin_call (location_t , vec<location_t> , tree fndecl,
-			tree, unsigned int, tree *)
+arm_general_check_builtin_call (unsigned int code)
 {
-  int fcode = DECL_MD_FUNCTION_CODE (fndecl);
-  if (fcode == ARM_BUILTIN_saturation_occurred
-      || fcode == ARM_BUILTIN_set_saturation)
+  if (code == ARM_BUILTIN_saturation_occurred
+     || code == ARM_BUILTIN_set_saturation)
     {
       if (cfun && cfun->decl)
 	DECL_ATTRIBUTES (cfun->decl)
 	  = tree_cons (get_identifier ("acle qbit"), NULL_TREE,
 		       DECL_ATTRIBUTES (cfun->decl));
     }
-  if (fcode == ARM_BUILTIN_sel)
+  else if (code == ARM_BUILTIN_sel)
     {
       if (cfun && cfun->decl)
 	DECL_ATTRIBUTES (cfun->decl)
@@ -4147,19 +4186,52 @@ arm_check_builtin_call (location_t , vec<location_t> , tree fndecl,
   return true;
 }
 
+/* Implement TARGET_CHECK_BUILTIN_CALL.  */
+bool
+arm_check_builtin_call (location_t, vec<location_t>, tree fndecl, tree,
+			unsigned int, tree *)
+{
+  unsigned int code = DECL_MD_FUNCTION_CODE (fndecl);
+  unsigned int subcode = code >> ARM_BUILTIN_SHIFT;
+  switch (code & ARM_BUILTIN_CLASS)
+    {
+    case ARM_BUILTIN_GENERAL:
+      return arm_general_check_builtin_call (subcode);
+    default:
+      gcc_unreachable ();
+    }
+
+}
+
 enum resolver_ident
 arm_describe_resolver (tree fndecl)
 {
-  if (DECL_MD_FUNCTION_CODE (fndecl) >= ARM_BUILTIN_vcx1qv16qi
-    && DECL_MD_FUNCTION_CODE (fndecl) < ARM_BUILTIN_MVE_BASE)
-    return arm_cde_resolver;
-  return arm_no_resolver;
+  unsigned int code = DECL_MD_FUNCTION_CODE (fndecl);
+  unsigned int subcode = code >> ARM_BUILTIN_SHIFT;
+  switch (code & ARM_BUILTIN_CLASS)
+    {
+    case ARM_BUILTIN_GENERAL:
+      if (subcode >= ARM_BUILTIN_vcx1qv16qi
+	&& subcode < ARM_BUILTIN_MVE_BASE)
+	return arm_cde_resolver;
+      return arm_no_resolver;
+    default:
+      gcc_unreachable ();
+    }
 }
 
 unsigned
 arm_cde_end_args (tree fndecl)
 {
-  return DECL_MD_FUNCTION_CODE (fndecl) >= ARM_BUILTIN_vcx1q_p_v16qi ? 2 : 1;
+  unsigned int code = DECL_MD_FUNCTION_CODE (fndecl);
+  unsigned int subcode = code >> ARM_BUILTIN_SHIFT;
+  switch (code & ARM_BUILTIN_CLASS)
+    {
+    case ARM_BUILTIN_GENERAL:
+      return subcode >= ARM_BUILTIN_vcx1q_p_v16qi ? 2 : 1;
+    default:
+      gcc_unreachable ();
+    }
 }
 
 #include "gt-arm-builtins.h"
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index c8ae5e1e9c1..1bdbd3b8ab3 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -210,6 +210,22 @@ extern opt_machine_mode arm_get_mask_mode (machine_mode mode);
 
 #endif /* RTX_CODE */
 
+/* It's convenient to divide the built-in function codes into groups,
+   rather than having everything in a single enum.  This type enumerates
+   those groups.  */
+enum arm_builtin_class
+{
+  ARM_BUILTIN_GENERAL
+};
+
+/* Built-in function codes are structured so that the low
+   ARM_BUILTIN_SHIFT bits contain the arm_builtin_class
+   and the upper bits contain a group-specific subcode.  */
+const unsigned int ARM_BUILTIN_SHIFT = 1;
+
+/* Mask that selects the arm part of a function code.  */
+const unsigned int ARM_BUILTIN_CLASS = (1 << ARM_BUILTIN_SHIFT) - 1;
+
 /* MVE functions.  */
 namespace arm_mve {
   void handle_arm_mve_types_h ();
-- 
2.34.1


* [PATCH 02/22] arm: [MVE intrinsics] Add new framework
  2023-04-18 13:45 [PATCH 00/22] arm: New framework for MVE intrinsics Christophe Lyon
  2023-04-18 13:45 ` [PATCH 01/22] arm: move builtin function codes into general numberspace Christophe Lyon
@ 2023-04-18 13:45 ` Christophe Lyon
  2023-05-02 10:17   ` Kyrylo Tkachov
  2023-04-18 13:45 ` [PATCH 03/22] arm: [MVE intrinsics] Rework vreinterpretq Christophe Lyon
                   ` (20 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Christophe Lyon @ 2023-04-18 13:45 UTC (permalink / raw)
  To: gcc-patches, kyrylo.tkachov, richard.earnshaw, richard.sandiford
  Cc: Christophe Lyon

This patch introduces the new MVE intrinsics framework, heavily
inspired by the SVE one in the aarch64 port.

Like the MVE intrinsic types implementation, the intrinsics framework
defines functions via a new pragma in arm_mve.h. A boolean parameter
is used to pass true when __ARM_MVE_PRESERVE_USER_NAMESPACE is
defined, and false when it is not, allowing for non-prefixed intrinsic
functions to be conditionally defined.
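
Concretely, arm_mve.h is expected to invoke the pragma along these
lines (a sketch; see the arm-c.cc hunk below for how the boolean
parameter is parsed):

#ifdef __ARM_MVE_PRESERVE_USER_NAMESPACE
#pragma GCC arm "arm_mve.h" true
#else
#pragma GCC arm "arm_mve.h" false
#endif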

Future patches will build on this framework by adding new intrinsic
functions and adding the features needed to support them.

Differences compared to the aarch64/SVE port include:
- when present, the predicate argument is the last one with MVE (it is
  the first one with SVE); see the example signatures after this list
- when using merging predicates ("_m" suffix), the "inactive" argument
  (if any) is inserted in the first position
- when using merging predicates ("_m" suffix), some functions do not
  have the "inactive" argument, so we maintain an exception list
- MVE intrinsics dealing with floating-point require the FP extension,
  while SVE may support different extensions
- regarding global state, MVE does not have any prefetch intrinsic, so
  we do not need a flag for this
- intrinsic names can be prefixed with "__arm", depending on whether
  preserve_user_namespace is true or false
- parse_signature: the maximum number of arguments is now a parameter,
  which helps detect overflows thanks to a new assert.
- suffixes and overloading can be controlled using
  explicit_mode_suffix_p and skip_overload_p in addition to
  explicit_type_suffix_p
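
For reference, the first two points look like this in ACLE: the MVE
merging form of vaddq takes the "inactive" vector first and the
predicate last,

  int8x16_t vaddq_m_s8 (int8x16_t inactive, int8x16_t a, int8x16_t b,
			mve_pred16_t p);

while the SVE counterpart takes the governing predicate first:

  svint8_t svadd_s8_m (svbool_t pg, svint8_t op1, svint8_t op2);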

At this implementation stage, there are some limitations compared
to aarch64/SVE, which are removed later in the series:
- "offset" mode is not supported yet
- gimple folding is not implemented

2022-09-08  Murray Steele  <murray.steele@arm.com>
	    Christophe Lyon  <christophe.lyon@arm.com>

gcc/ChangeLog:

	* config.gcc: Add arm-mve-builtins-base.o and
	arm-mve-builtins-shapes.o to extra_objs.
	* config/arm/arm-builtins.cc (arm_builtin_decl): Handle MVE builtin
	numberspace.
	(arm_expand_builtin): Likewise.
	(arm_check_builtin_call): Likewise.
	(arm_describe_resolver): Likewise.
	* config/arm/arm-builtins.h (enum resolver_ident): Add
	arm_mve_resolver.
	* config/arm/arm-c.cc (arm_pragma_arm): Handle new pragma.
	(arm_resolve_overloaded_builtin): Handle MVE builtins.
	(arm_register_target_pragmas): Register arm_check_builtin_call.
	* config/arm/arm-mve-builtins.cc (class registered_function): New
	class.
	(struct registered_function_hasher): New struct.
	(pred_suffixes): New table.
	(mode_suffixes): New table.
	(type_suffix_info): New table.
	(TYPES_float16): New.
	(TYPES_all_float): New.
	(TYPES_integer_8): New.
	(TYPES_integer_8_16): New.
	(TYPES_integer_16_32): New.
	(TYPES_integer_32): New.
	(TYPES_signed_16_32): New.
	(TYPES_signed_32): New.
	(TYPES_all_signed): New.
	(TYPES_all_unsigned): New.
	(TYPES_all_integer): New.
	(TYPES_all_integer_with_64): New.
	(DEF_VECTOR_TYPE): New.
	(DEF_DOUBLE_TYPE): New.
	(DEF_MVE_TYPES_ARRAY): New.
	(all_integer): New.
	(all_integer_with_64): New.
	(float16): New.
	(all_float): New.
	(all_signed): New.
	(all_unsigned): New.
	(integer_8): New.
	(integer_8_16): New.
	(integer_16_32): New.
	(integer_32): New.
	(signed_16_32): New.
	(signed_32): New.
	(register_vector_type): Use void_type_node for mve.fp-only types when
	mve.fp is not enabled.
	(register_builtin_tuple_types): Likewise.
	(handle_arm_mve_h): New function.
	(matches_type_p): Likewise.
	(report_out_of_range): Likewise.
	(report_not_enum): Likewise.
	(report_missing_float): Likewise.
	(report_non_ice): Likewise.
	(check_requires_float): Likewise.
	(function_instance::hash): Likewise.
	(function_instance::call_properties): Likewise.
	(function_instance::reads_global_state_p): Likewise.
	(function_instance::modifies_global_state_p): Likewise.
	(function_instance::could_trap_p): Likewise.
	(function_instance::has_inactive_argument): Likewise.
	(registered_function_hasher::hash): Likewise.
	(registered_function_hasher::equal): Likewise.
	(function_builder::function_builder): Likewise.
	(function_builder::~function_builder): Likewise.
	(function_builder::append_name): Likewise.
	(function_builder::finish_name): Likewise.
	(function_builder::get_name): Likewise.
	(add_attribute): Likewise.
	(function_builder::get_attributes): Likewise.
	(function_builder::add_function): Likewise.
	(function_builder::add_unique_function): Likewise.
	(function_builder::add_overloaded_function): Likewise.
	(function_builder::add_overloaded_functions): Likewise.
	(function_builder::register_function_group): Likewise.
	(function_call_info::function_call_info): Likewise.
	(function_resolver::function_resolver): Likewise.
	(function_resolver::get_vector_type): Likewise.
	(function_resolver::get_scalar_type_name): Likewise.
	(function_resolver::get_argument_type): Likewise.
	(function_resolver::scalar_argument_p): Likewise.
	(function_resolver::report_no_such_form): Likewise.
	(function_resolver::lookup_form): Likewise.
	(function_resolver::resolve_to): Likewise.
	(function_resolver::infer_vector_or_tuple_type): Likewise.
	(function_resolver::infer_vector_type): Likewise.
	(function_resolver::require_vector_or_scalar_type): Likewise.
	(function_resolver::require_vector_type): Likewise.
	(function_resolver::require_matching_vector_type): Likewise.
	(function_resolver::require_derived_vector_type): Likewise.
	(function_resolver::require_derived_scalar_type): Likewise.
	(function_resolver::require_integer_immediate): Likewise.
	(function_resolver::require_scalar_type): Likewise.
	(function_resolver::check_num_arguments): Likewise.
	(function_resolver::check_gp_argument): Likewise.
	(function_resolver::finish_opt_n_resolution): Likewise.
	(function_resolver::resolve_unary): Likewise.
	(function_resolver::resolve_unary_n): Likewise.
	(function_resolver::resolve_uniform): Likewise.
	(function_resolver::resolve_uniform_opt_n): Likewise.
	(function_resolver::resolve): Likewise.
	(function_checker::function_checker): Likewise.
	(function_checker::argument_exists_p): Likewise.
	(function_checker::require_immediate): Likewise.
	(function_checker::require_immediate_enum): Likewise.
	(function_checker::require_immediate_range): Likewise.
	(function_checker::check): Likewise.
	(gimple_folder::gimple_folder): Likewise.
	(gimple_folder::fold): Likewise.
	(function_expander::function_expander): Likewise.
	(function_expander::direct_optab_handler): Likewise.
	(function_expander::get_fallback_value): Likewise.
	(function_expander::get_reg_target): Likewise.
	(function_expander::add_output_operand): Likewise.
	(function_expander::add_input_operand): Likewise.
	(function_expander::add_integer_operand): Likewise.
	(function_expander::generate_insn): Likewise.
	(function_expander::use_exact_insn): Likewise.
	(function_expander::use_unpred_insn): Likewise.
	(function_expander::use_pred_x_insn): Likewise.
	(function_expander::use_cond_insn): Likewise.
	(function_expander::map_to_rtx_codes): Likewise.
	(function_expander::expand): Likewise.
	(resolve_overloaded_builtin): Likewise.
	(check_builtin_call): Likewise.
	(gimple_fold_builtin): Likewise.
	(expand_builtin): Likewise.
	(gt_ggc_mx): Likewise.
	(gt_pch_nx): Likewise.
	(gt_pch_nx): Likewise.
	* config/arm/arm-mve-builtins.def (s8): Define new type suffix.
	(s16): Likewise.
	(s32): Likewise.
	(s64): Likewise.
	(u8): Likewise.
	(u16): Likewise.
	(u32): Likewise.
	(u64): Likewise.
	(f16): Likewise.
	(f32): Likewise.
	(n): New mode.
	(offset): New mode.
	* config/arm/arm-mve-builtins.h (MAX_TUPLE_SIZE): New constant.
	(CP_READ_FPCR): Likewise.
	(CP_RAISE_FP_EXCEPTIONS): Likewise.
	(CP_READ_MEMORY): Likewise.
	(CP_WRITE_MEMORY): Likewise.
	(enum units_index): New enum.
	(enum predication_index): New.
	(enum type_class_index): New.
	(enum mode_suffix_index): New enum.
	(enum type_suffix_index): New.
	(struct mode_suffix_info): New struct.
	(struct type_suffix_info): New.
	(struct function_group_info): Likewise.
	(class function_instance): Likewise.
	(class registered_function): Likewise.
	(class function_builder): Likewise.
	(class function_call_info): Likewise.
	(class function_resolver): Likewise.
	(class function_checker): Likewise.
	(class gimple_folder): Likewise.
	(class function_expander): Likewise.
	(get_mve_pred16_t): Likewise.
	(find_mode_suffix): New function.
	(class function_base): Likewise.
	(class function_shape): Likewise.
	(function_instance::operator==): New function.
	(function_instance::operator!=): Likewise.
	(function_instance::vectors_per_tuple): Likewise.
	(function_instance::mode_suffix): Likewise.
	(function_instance::type_suffix): Likewise.
	(function_instance::scalar_type): Likewise.
	(function_instance::vector_type): Likewise.
	(function_instance::tuple_type): Likewise.
	(function_instance::vector_mode): Likewise.
	(function_call_info::function_returns_void_p): Likewise.
	(function_base::call_properties): Likewise.
	* config/arm/arm-protos.h (enum arm_builtin_class): Add
	ARM_BUILTIN_MVE.
	(handle_arm_mve_h): New.
	(resolve_overloaded_builtin): New.
	(check_builtin_call): New.
	(gimple_fold_builtin): New.
	(expand_builtin): New.
	* config/arm/arm.cc (TARGET_GIMPLE_FOLD_BUILTIN): Define as
	arm_gimple_fold_builtin.
	(arm_gimple_fold_builtin): New function.
	* config/arm/arm_mve.h: Use new arm_mve.h pragma.
	* config/arm/predicates.md (arm_any_register_operand): New predicate.
	* config/arm/t-arm: (arm-mve-builtins.o): Add includes.
	(arm-mve-builtins-shapes.o): New target.
	(arm-mve-builtins-base.o): New target.
	* config/arm/arm-mve-builtins-base.cc: New file.
	* config/arm/arm-mve-builtins-base.def: New file.
	* config/arm/arm-mve-builtins-base.h: New file.
	* config/arm/arm-mve-builtins-functions.h: New file.
	* config/arm/arm-mve-builtins-shapes.cc: New file.
	* config/arm/arm-mve-builtins-shapes.h: New file.

Co-authored-by: Christophe Lyon  <christophe.lyon@arm.com>
---
 gcc/config.gcc                              |    2 +-
 gcc/config/arm/arm-builtins.cc              |   15 +-
 gcc/config/arm/arm-builtins.h               |    1 +
 gcc/config/arm/arm-c.cc                     |   42 +-
 gcc/config/arm/arm-mve-builtins-base.cc     |   45 +
 gcc/config/arm/arm-mve-builtins-base.def    |   24 +
 gcc/config/arm/arm-mve-builtins-base.h      |   29 +
 gcc/config/arm/arm-mve-builtins-functions.h |   50 +
 gcc/config/arm/arm-mve-builtins-shapes.cc   |  343 ++++
 gcc/config/arm/arm-mve-builtins-shapes.h    |   30 +
 gcc/config/arm/arm-mve-builtins.cc          | 1950 ++++++++++++++++++-
 gcc/config/arm/arm-mve-builtins.def         |   40 +-
 gcc/config/arm/arm-mve-builtins.h           |  669 ++++++-
 gcc/config/arm/arm-protos.h                 |   10 +-
 gcc/config/arm/arm.cc                       |   27 +
 gcc/config/arm/arm_mve.h                    |    6 +
 gcc/config/arm/predicates.md                |    4 +
 gcc/config/arm/t-arm                        |   32 +-
 18 files changed, 3292 insertions(+), 27 deletions(-)
 create mode 100644 gcc/config/arm/arm-mve-builtins-base.cc
 create mode 100644 gcc/config/arm/arm-mve-builtins-base.def
 create mode 100644 gcc/config/arm/arm-mve-builtins-base.h
 create mode 100644 gcc/config/arm/arm-mve-builtins-functions.h
 create mode 100644 gcc/config/arm/arm-mve-builtins-shapes.cc
 create mode 100644 gcc/config/arm/arm-mve-builtins-shapes.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 6fd1594480a..5d49f5890ab 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -362,7 +362,7 @@ arc*-*-*)
 	;;
 arm*-*-*)
 	cpu_type=arm
-	extra_objs="arm-builtins.o arm-mve-builtins.o aarch-common.o aarch-bti-insert.o"
+	extra_objs="arm-builtins.o arm-mve-builtins.o arm-mve-builtins-shapes.o arm-mve-builtins-base.o aarch-common.o aarch-bti-insert.o"
 	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h arm_bf16.h arm_mve_types.h arm_mve.h arm_cde.h"
 	target_type_format_char='%'
 	c_target_objs="arm-c.o"
diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
index adcb50d2185..d0c57409b4c 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -2712,6 +2712,7 @@ arm_general_builtin_decl (unsigned code)
   return arm_builtin_decls[code];
 }
 
+/* Implement TARGET_BUILTIN_DECL.  */
 /* Return the ARM builtin for CODE.  */
 tree
 arm_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
@@ -2721,6 +2722,8 @@ arm_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
     {
     case ARM_BUILTIN_GENERAL:
       return arm_general_builtin_decl (subcode);
+    case ARM_BUILTIN_MVE:
+      return error_mark_node;
     default:
       gcc_unreachable ();
     }
@@ -4087,6 +4090,8 @@ arm_expand_builtin (tree exp,
     {
     case ARM_BUILTIN_GENERAL:
       return arm_general_expand_builtin (subcode, exp, target, ignore);
+    case ARM_BUILTIN_MVE:
+      return arm_mve::expand_builtin (subcode, exp, target);
     default:
       gcc_unreachable ();
     }
@@ -4188,8 +4193,9 @@ arm_general_check_builtin_call (unsigned int code)
 
 /* Implement TARGET_CHECK_BUILTIN_CALL.  */
 bool
-arm_check_builtin_call (location_t, vec<location_t>, tree fndecl, tree,
-			unsigned int, tree *)
+arm_check_builtin_call (location_t loc, vec<location_t> arg_loc,
+			tree fndecl, tree orig_fndecl,
+			unsigned int nargs, tree *args)
 {
   unsigned int code = DECL_MD_FUNCTION_CODE (fndecl);
   unsigned int subcode = code >> ARM_BUILTIN_SHIFT;
@@ -4197,6 +4203,9 @@ arm_check_builtin_call (location_t, vec<location_t>, tree fndecl, tree,
     {
     case ARM_BUILTIN_GENERAL:
       return arm_general_check_builtin_call (subcode);
+    case ARM_BUILTIN_MVE:
+      return arm_mve::check_builtin_call (loc, arg_loc, subcode,
+					  orig_fndecl, nargs, args);
     default:
       gcc_unreachable ();
     }
@@ -4215,6 +4224,8 @@ arm_describe_resolver (tree fndecl)
 	&& subcode < ARM_BUILTIN_MVE_BASE)
 	return arm_cde_resolver;
       return arm_no_resolver;
+    case ARM_BUILTIN_MVE:
+      return arm_mve_resolver;
     default:
       gcc_unreachable ();
     }
diff --git a/gcc/config/arm/arm-builtins.h b/gcc/config/arm/arm-builtins.h
index 8c94b6bc40b..494dcd09411 100644
--- a/gcc/config/arm/arm-builtins.h
+++ b/gcc/config/arm/arm-builtins.h
@@ -27,6 +27,7 @@
 
 enum resolver_ident {
     arm_cde_resolver,
+    arm_mve_resolver,
     arm_no_resolver
 };
 enum resolver_ident arm_describe_resolver (tree);
diff --git a/gcc/config/arm/arm-c.cc b/gcc/config/arm/arm-c.cc
index 59c0d8ce747..d3d93ceba00 100644
--- a/gcc/config/arm/arm-c.cc
+++ b/gcc/config/arm/arm-c.cc
@@ -144,20 +144,44 @@ arm_pragma_arm (cpp_reader *)
   const char *name = TREE_STRING_POINTER (x);
   if (strcmp (name, "arm_mve_types.h") == 0)
     arm_mve::handle_arm_mve_types_h ();
+  else if (strcmp (name, "arm_mve.h") == 0)
+    {
+      if (pragma_lex (&x) == CPP_NAME)
+	{
+	  if (strcmp (IDENTIFIER_POINTER (x), "true") == 0)
+	    arm_mve::handle_arm_mve_h (true);
+	  else if (strcmp (IDENTIFIER_POINTER (x), "false") == 0)
+	    arm_mve::handle_arm_mve_h (false);
+	  else
+	    error ("%<#pragma GCC arm \"arm_mve.h\"%> requires a boolean parameter");
+	}
+    }
   else
     error ("unknown %<#pragma GCC arm%> option %qs", name);
 }
 
-/* Implement TARGET_RESOLVE_OVERLOADED_BUILTIN.  This is currently only
-   used for the MVE related builtins for the CDE extension.
-   Here we ensure the type of arguments is such that the size is correct, and
-   then return a tree that describes the same function call but with the
-   relevant types cast as necessary.  */
+/* Implement TARGET_RESOLVE_OVERLOADED_BUILTIN.  */
 tree
-arm_resolve_overloaded_builtin (location_t loc, tree fndecl, void *arglist)
+arm_resolve_overloaded_builtin (location_t loc, tree fndecl,
+				void *uncast_arglist)
 {
-  if (arm_describe_resolver (fndecl) == arm_cde_resolver)
-    return arm_resolve_cde_builtin (loc, fndecl, arglist);
+  enum resolver_ident resolver = arm_describe_resolver (fndecl);
+  if (resolver == arm_cde_resolver)
+    return arm_resolve_cde_builtin (loc, fndecl, uncast_arglist);
+  if (resolver == arm_mve_resolver)
+    {
+      vec<tree, va_gc> empty = {};
+      vec<tree, va_gc> *arglist = (uncast_arglist
+				   ? (vec<tree, va_gc> *) uncast_arglist
+				   : &empty);
+      unsigned int code = DECL_MD_FUNCTION_CODE (fndecl);
+      unsigned int subcode = code >> ARM_BUILTIN_SHIFT;
+      tree new_fndecl = arm_mve::resolve_overloaded_builtin (loc, subcode, arglist);
+      if (new_fndecl == NULL_TREE || new_fndecl == error_mark_node)
+	return new_fndecl;
+      return build_function_call_vec (loc, vNULL, new_fndecl, arglist,
+				      NULL, fndecl);
+    }
   return NULL_TREE;
 }
 
@@ -519,7 +543,9 @@ arm_register_target_pragmas (void)
 {
   /* Update pragma hook to allow parsing #pragma GCC target.  */
   targetm.target_option.pragma_parse = arm_pragma_target_parse;
+
   targetm.resolve_overloaded_builtin = arm_resolve_overloaded_builtin;
+  targetm.check_builtin_call = arm_check_builtin_call;
 
   c_register_pragma ("GCC", "arm", arm_pragma_arm);
 
diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
new file mode 100644
index 00000000000..e9f285faf2b
--- /dev/null
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -0,0 +1,45 @@
+/* ACLE support for Arm MVE (__ARM_FEATURE_MVE intrinsics)
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "rtl.h"
+#include "memmodel.h"
+#include "insn-codes.h"
+#include "optabs.h"
+#include "basic-block.h"
+#include "function.h"
+#include "gimple.h"
+#include "arm-mve-builtins.h"
+#include "arm-mve-builtins-shapes.h"
+#include "arm-mve-builtins-base.h"
+#include "arm-mve-builtins-functions.h"
+
+using namespace arm_mve;
+
+namespace {
+
+} /* end anonymous namespace */
+
+namespace arm_mve {
+
+} /* end namespace arm_mve */
diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
new file mode 100644
index 00000000000..d15ba2e23e8
--- /dev/null
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -0,0 +1,24 @@
+/* ACLE support for Arm MVE (__ARM_FEATURE_MVE intrinsics)
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#define REQUIRES_FLOAT false
+#undef REQUIRES_FLOAT
+
+#define REQUIRES_FLOAT true
+#undef REQUIRES_FLOAT
diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h
new file mode 100644
index 00000000000..c4d7b750cd5
--- /dev/null
+++ b/gcc/config/arm/arm-mve-builtins-base.h
@@ -0,0 +1,29 @@
+/* ACLE support for Arm MVE (__ARM_FEATURE_MVE intrinsics)
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_ARM_MVE_BUILTINS_BASE_H
+#define GCC_ARM_MVE_BUILTINS_BASE_H
+
+namespace arm_mve {
+namespace functions {
+
+} /* end namespace arm_mve::functions */
+} /* end namespace arm_mve */
+
+#endif
diff --git a/gcc/config/arm/arm-mve-builtins-functions.h b/gcc/config/arm/arm-mve-builtins-functions.h
new file mode 100644
index 00000000000..dff01999bcd
--- /dev/null
+++ b/gcc/config/arm/arm-mve-builtins-functions.h
@@ -0,0 +1,50 @@
+/* ACLE support for Arm MVE (function_base classes)
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_ARM_MVE_BUILTINS_FUNCTIONS_H
+#define GCC_ARM_MVE_BUILTINS_FUNCTIONS_H
+
+namespace arm_mve {
+
+/* Wrap T, which is derived from function_base, and indicate that the
+   function never has side effects.  It is only necessary to use this
+   wrapper on functions that might have floating-point suffixes, since
+   otherwise we assume by default that the function has no side effects.  */
+template<typename T>
+class quiet : public T
+{
+public:
+  CONSTEXPR quiet () : T () {}
+
+  unsigned int
+  call_properties (const function_instance &) const override
+  {
+    return 0;
+  }
+};
+
+} /* end namespace arm_mve */
+
+/* Declare the global function base NAME, creating it from an instance
+   of class CLASS with constructor arguments ARGS.  */
+#define FUNCTION(NAME, CLASS, ARGS) \
+  namespace { static CONSTEXPR const CLASS NAME##_obj ARGS; } \
+  namespace functions { const function_base *const NAME = &NAME##_obj; }
+
+#endif
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
new file mode 100644
index 00000000000..f20660d8319
--- /dev/null
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -0,0 +1,343 @@
+/* ACLE support for Arm MVE (function shapes)
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "rtl.h"
+#include "memmodel.h"
+#include "insn-codes.h"
+#include "optabs.h"
+#include "arm-mve-builtins.h"
+#include "arm-mve-builtins-shapes.h"
+
+/* In the comments below, _t0 represents the first type suffix
+   (e.g. "_s8") and _t1 represents the second.  T0/T1 represent the
+   type full names (e.g. int8x16_t). Square brackets enclose
+   characters that are present in only the full name, not the
+   overloaded name.  Governing predicate arguments and predicate
+   suffixes are not shown, since they depend on the predication type,
+   which is a separate piece of information from the shape.  */
+
+namespace arm_mve {
+
+/* If INSTANCE has a predicate, add it to the list of argument types
+   in ARGUMENT_TYPES.  RETURN_TYPE is the type returned by the
+   function.  */
+static void
+apply_predication (const function_instance &instance, tree return_type,
+		   vec<tree> &argument_types)
+{
+  if (instance.pred != PRED_none)
+    {
+      /* When predicate is PRED_m, insert a first argument
+	 ("inactive") with the same type as return_type.  */
+      if (instance.has_inactive_argument ())
+	argument_types.quick_insert (0, return_type);
+      argument_types.quick_push (get_mve_pred16_t ());
+    }
+}
+
+/* Parse and move past an element type in FORMAT and return it as a type
+   suffix.  The format is:
+
+   [01]    - the element type in type suffix 0 or 1 of INSTANCE.
+   h<elt>  - a half-sized version of <elt>
+   s<bits> - a signed type with the given number of bits
+   s[01]   - a signed type with the same width as type suffix 0 or 1
+   u<bits> - an unsigned type with the given number of bits
+   u[01]   - an unsigned type with the same width as type suffix 0 or 1
+   w<elt>  - a double-sized version of <elt>
+   x<bits> - a type with the given number of bits and same signedness
+             as the next argument.
+
+   Future intrinsics will extend this format.  */
+static type_suffix_index
+parse_element_type (const function_instance &instance, const char *&format)
+{
+  int ch = *format++;
+
+
+  if (ch == 's' || ch == 'u')
+    {
+      type_class_index tclass = (ch == 'f' ? TYPE_float
+				 : ch == 's' ? TYPE_signed
+				 : TYPE_unsigned);
+      char *end;
+      unsigned int bits = strtol (format, &end, 10);
+      format = end;
+      if (bits == 0 || bits == 1)
+	bits = instance.type_suffix (bits).element_bits;
+      return find_type_suffix (tclass, bits);
+    }
+
+  if (ch == 'h')
+    {
+      type_suffix_index suffix = parse_element_type (instance, format);
+      return find_type_suffix (type_suffixes[suffix].tclass,
+			       type_suffixes[suffix].element_bits / 2);
+    }
+
+   if (ch == 'w')
+    {
+      type_suffix_index suffix = parse_element_type (instance, format);
+      return find_type_suffix (type_suffixes[suffix].tclass,
+			       type_suffixes[suffix].element_bits * 2);
+    }
+
+  if (ch == 'x')
+    {
+      const char *next = format;
+      next = strstr (format, ",");
+      next+=2;
+      type_suffix_index suffix = parse_element_type (instance, next);
+      type_class_index tclass = type_suffixes[suffix].tclass;
+      char *end;
+      unsigned int bits = strtol (format, &end, 10);
+      format = end;
+      return find_type_suffix (tclass, bits);
+    }
+
+  if (ch == '0' || ch == '1')
+    return instance.type_suffix_ids[ch - '0'];
+
+  gcc_unreachable ();
+}
+
+/* Read and return a type from FORMAT for function INSTANCE.  Advance
+   FORMAT beyond the type string.  The format is:
+
+   p       - predicates with type mve_pred16_t
+   s<elt>  - a scalar type with the given element suffix
+   t<elt>  - a vector or tuple type with given element suffix [*1]
+   v<elt>  - a vector with the given element suffix
+
+   where <elt> has the format described above parse_element_type.
+
+   Future intrinsics will extend this format.
+
+   [*1] the vectors_per_tuple function indicates whether the type should
+        be a tuple, and if so, how many vectors it should contain.  */
+static tree
+parse_type (const function_instance &instance, const char *&format)
+{
+  int ch = *format++;
+
+  if (ch == 'p')
+    return get_mve_pred16_t ();
+
+  if (ch == 's')
+    {
+      type_suffix_index suffix = parse_element_type (instance, format);
+      return scalar_types[type_suffixes[suffix].vector_type];
+    }
+
+  if (ch == 't')
+    {
+      type_suffix_index suffix = parse_element_type (instance, format);
+      vector_type_index vector_type = type_suffixes[suffix].vector_type;
+      unsigned int num_vectors = instance.vectors_per_tuple ();
+      return acle_vector_types[num_vectors - 1][vector_type];
+    }
+
+  if (ch == 'v')
+    {
+      type_suffix_index suffix = parse_element_type (instance, format);
+      return acle_vector_types[0][type_suffixes[suffix].vector_type];
+    }
+
+  gcc_unreachable ();
+}
+
+/* Read a type signature for INSTANCE from FORMAT.  Add the argument
+   types to ARGUMENT_TYPES and return the return type.  Assert there
+   are no more than MAX_ARGS arguments.
+
+   The format is a comma-separated list of types (as for parse_type),
+   with the first type being the return type and the rest being the
+   argument types.  */
+static tree
+parse_signature (const function_instance &instance, const char *format,
+		 vec<tree> &argument_types, unsigned int max_args)
+{
+  tree return_type = parse_type (instance, format);
+  unsigned int args = 0;
+  while (format[0] == ',')
+    {
+      gcc_assert (args < max_args);
+      format += 1;
+      tree argument_type = parse_type (instance, format);
+      argument_types.quick_push (argument_type);
+      args += 1;
+    }
+  gcc_assert (format[0] == 0);
+  return return_type;
+}
+
+/* Add one function instance for GROUP, using mode suffix MODE_SUFFIX_ID,
+   the type suffixes at index TI and the predication suffix at index PI.
+   The other arguments are as for build_all.  */
+static void
+build_one (function_builder &b, const char *signature,
+	   const function_group_info &group, mode_suffix_index mode_suffix_id,
+	   unsigned int ti, unsigned int pi, bool preserve_user_namespace,
+	   bool force_direct_overloads)
+{
+  /* Current functions take at most five arguments.  This must match
+     the MAX_ARGS argument passed to parse_signature below.  */
+  auto_vec<tree, 5> argument_types;
+  function_instance instance (group.base_name, *group.base, *group.shape,
+			      mode_suffix_id, group.types[ti],
+			      group.preds[pi]);
+  tree return_type = parse_signature (instance, signature, argument_types, 5);
+  apply_predication (instance, return_type, argument_types);
+  b.add_unique_function (instance, return_type, argument_types,
+			 preserve_user_namespace, group.requires_float,
+			 force_direct_overloads);
+}
+
+/* Add a function instance for every type and predicate combination in
+   GROUP, unless RESTRICT_TO_PREDS is non-null, in which case use only
+   the predicates it lists.  Take the function base name from GROUP and
+   the mode suffix from MODE_SUFFIX_ID.  Use SIGNATURE to construct the
+   function signature, then use apply_predication to add in the
+   predicate.
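+
+   For example, a binary shape's build method might call (sketch):
+
+     build_all (b, "v0,v0,v0", group, MODE_none, preserve_user_namespace);
+
+   to register, for each type and predicate combination in GROUP, a
+   function taking two vectors and returning a vector of the same
+   type.  */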
+static void
+build_all (function_builder &b, const char *signature,
+	   const function_group_info &group, mode_suffix_index mode_suffix_id,
+	   bool preserve_user_namespace,
+	   bool force_direct_overloads = false,
+	   const predication_index *restrict_to_preds = NULL)
+{
+  for (unsigned int pi = 0; group.preds[pi] != NUM_PREDS; ++pi)
+    {
+      unsigned int pi2 = 0;
+
+      if (restrict_to_preds)
+	for (; restrict_to_preds[pi2] != NUM_PREDS; ++pi2)
+	  if (restrict_to_preds[pi2] == group.preds[pi])
+	    break;
+
+      if (restrict_to_preds == NULL || restrict_to_preds[pi2] != NUM_PREDS)
+	for (unsigned int ti = 0;
+	     ti == 0 || group.types[ti][0] != NUM_TYPE_SUFFIXES; ++ti)
+	  build_one (b, signature, group, mode_suffix_id, ti, pi,
+		     preserve_user_namespace, force_direct_overloads);
+    }
+}
+
+/* Add a function instance for every type and predicate combination in
+   GROUP, restricted to the 16-bit and 32-bit integer types, and, if
+   RESTRICT_TO_PREDS is non-null, to the predicates it lists.  Take
+   the function base name from GROUP and the mode suffix from
+   MODE_SUFFIX_ID.  Use SIGNATURE to construct the function signature,
+   then use apply_predication to add in the predicate.  */
+static void
+build_16_32 (function_builder &b, const char *signature,
+	     const function_group_info &group, mode_suffix_index mode_suffix_id,
+	     bool preserve_user_namespace,
+	     bool force_direct_overloads = false,
+	     const predication_index *restrict_to_preds = NULL)
+{
+  for (unsigned int pi = 0; group.preds[pi] != NUM_PREDS; ++pi)
+    {
+      unsigned int pi2 = 0;
+
+      if (restrict_to_preds)
+	for (; restrict_to_preds[pi2] != NUM_PREDS; ++pi2)
+	  if (restrict_to_preds[pi2] == group.preds[pi])
+	    break;
+
+      if (restrict_to_preds == NULL || restrict_to_preds[pi2] != NUM_PREDS)
+	for (unsigned int ti = 0;
+	     ti == 0 || group.types[ti][0] != NUM_TYPE_SUFFIXES; ++ti)
+	  {
+	    unsigned int element_bits
+	      = type_suffixes[group.types[ti][0]].element_bits;
+	    type_class_index tclass
+	      = type_suffixes[group.types[ti][0]].tclass;
+	    if ((tclass == TYPE_signed || tclass == TYPE_unsigned)
+		&& (element_bits == 16 || element_bits == 32))
+	      build_one (b, signature, group, mode_suffix_id, ti, pi,
+			 preserve_user_namespace, force_direct_overloads);
+	  }
+    }
+}
+
+/* Declare the function shape NAME, pointing it to an instance
+   of class <NAME>_def.  */
+#define SHAPE(NAME) \
+  static CONSTEXPR const NAME##_def NAME##_obj; \
+  namespace shapes { const function_shape *const NAME = &NAME##_obj; }
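+
+/* For example, once a "unary_def" class exists (added by a later patch
+   in this series), "SHAPE (unary)" defines a static unary_def object
+   and exposes it as shapes::unary.  */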
+
+/* Base class for functions that are not overloaded.  */
+struct nonoverloaded_base : public function_shape
+{
+  bool
+  explicit_type_suffix_p (unsigned int, enum predication_index,
+			  enum mode_suffix_index) const override
+  {
+    return true;
+  }
+
+  bool
+  explicit_mode_suffix_p (enum predication_index,
+			  enum mode_suffix_index) const override
+  {
+    return true;
+  }
+
+  bool
+  skip_overload_p (enum predication_index,
+		   enum mode_suffix_index) const override
+  {
+    return false;
+  }
+
+  tree
+  resolve (function_resolver &) const override
+  {
+    gcc_unreachable ();
+  }
+};
+
+/* Base class for overloaded functions.  Bit N of EXPLICIT_MASK is true
+   if type suffix N appears in the overloaded name.  */
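+
+/* For example, overloaded_base<0> describes overloads in which neither
+   type suffix appears in the name (both being inferred from the
+   arguments), while overloaded_base<1> would keep type suffix 0
+   explicit.  */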
+template<unsigned int EXPLICIT_MASK>
+struct overloaded_base : public function_shape
+{
+  bool
+  explicit_type_suffix_p (unsigned int i, enum predication_index,
+			  enum mode_suffix_index) const override
+  {
+    return (EXPLICIT_MASK >> i) & 1;
+  }
+
+  bool
+  explicit_mode_suffix_p (enum predication_index,
+			  enum mode_suffix_index) const override
+  {
+    return false;
+  }
+
+  bool
+  skip_overload_p (enum predication_index,
+		   enum mode_suffix_index) const override
+  {
+    return false;
+  }
+};
+
+} /* end namespace arm_mve */
+
+#undef SHAPE
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
new file mode 100644
index 00000000000..9e353b85a76
--- /dev/null
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -0,0 +1,30 @@
+/* ACLE support for Arm MVE (function shapes)
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_ARM_MVE_BUILTINS_SHAPES_H
+#define GCC_ARM_MVE_BUILTINS_SHAPES_H
+
+namespace arm_mve
+{
+  namespace shapes
+  {
+  } /* end namespace arm_mve::shapes */
+} /* end namespace arm_mve */
+
+#endif
diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc
index 7586a82e3c1..b0cceb75ceb 100644
--- a/gcc/config/arm/arm-mve-builtins.cc
+++ b/gcc/config/arm/arm-mve-builtins.cc
@@ -24,7 +24,19 @@
 #include "coretypes.h"
 #include "tm.h"
 #include "tree.h"
+#include "rtl.h"
+#include "tm_p.h"
+#include "memmodel.h"
+#include "insn-codes.h"
+#include "optabs.h"
+#include "recog.h"
+#include "expr.h"
+#include "basic-block.h"
+#include "function.h"
 #include "fold-const.h"
+#include "gimple.h"
+#include "gimple-iterator.h"
+#include "emit-rtl.h"
 #include "langhooks.h"
 #include "stringpool.h"
 #include "attribs.h"
@@ -32,6 +44,8 @@
 #include "arm-protos.h"
 #include "arm-builtins.h"
 #include "arm-mve-builtins.h"
+#include "arm-mve-builtins-base.h"
+#include "arm-mve-builtins-shapes.h"
 
 namespace arm_mve {
 
@@ -46,6 +60,33 @@ struct vector_type_info
   const bool requires_float;
 };
 
+/* Describes a function decl.  */
+class GTY(()) registered_function
+{
+public:
+  /* The ACLE function that the decl represents.  */
+  function_instance instance GTY ((skip));
+
+  /* The decl itself.  */
+  tree decl;
+
+  /* Whether the function requires a floating point abi.  */
+  bool requires_float;
+
+  /* True if the decl represents an overloaded function that needs to be
+     resolved by function_resolver.  */
+  bool overloaded_p;
+};
+
+/* Hash traits for registered_function.  */
+struct registered_function_hasher : nofree_ptr_hash <registered_function>
+{
+  typedef function_instance compare_type;
+
+  static hashval_t hash (value_type);
+  static bool equal (value_type, const compare_type &);
+};
+
 /* Flag indicating whether the arm MVE types have been handled.  */
 static bool handle_arm_mve_types_p;
 
@@ -54,11 +95,167 @@ static CONSTEXPR const vector_type_info vector_types[] = {
 #define DEF_MVE_TYPE(ACLE_NAME, SCALAR_TYPE) \
   { #ACLE_NAME, REQUIRES_FLOAT },
 #include "arm-mve-builtins.def"
-#undef DEF_MVE_TYPE
+};
+
+/* The function name suffix associated with each predication type.  */
+static const char *const pred_suffixes[NUM_PREDS + 1] = {
+  "",
+  "_m",
+  "_p",
+  "_x",
+  "_z",
+  ""
+};
+
+/* Static information about each mode_suffix_index.  */
+CONSTEXPR const mode_suffix_info mode_suffixes[] = {
+#define VECTOR_TYPE_none NUM_VECTOR_TYPES
+#define DEF_MVE_MODE(NAME, BASE, DISPLACEMENT, UNITS) \
+  { "_" #NAME, VECTOR_TYPE_##BASE, VECTOR_TYPE_##DISPLACEMENT, UNITS_##UNITS },
+#include "arm-mve-builtins.def"
+#undef VECTOR_TYPE_none
+  { "", NUM_VECTOR_TYPES, NUM_VECTOR_TYPES, UNITS_none }
+};
+
+/* Static information about each type_suffix_index.  */
+CONSTEXPR const type_suffix_info type_suffixes[NUM_TYPE_SUFFIXES + 1] = {
+#define DEF_MVE_TYPE_SUFFIX(NAME, ACLE_TYPE, CLASS, BITS, MODE)	\
+  { "_" #NAME, \
+    VECTOR_TYPE_##ACLE_TYPE, \
+    TYPE_##CLASS, \
+    BITS, \
+    BITS / BITS_PER_UNIT, \
+    TYPE_##CLASS == TYPE_signed || TYPE_##CLASS == TYPE_unsigned, \
+    TYPE_##CLASS == TYPE_unsigned, \
+    TYPE_##CLASS == TYPE_float, \
+    0, \
+    MODE },
+#include "arm-mve-builtins.def"
+  { "", NUM_VECTOR_TYPES, TYPE_bool, 0, 0, false, false, false,
+    0, VOIDmode }
+};
+
+/* Define a TYPES_<combination> macro for each combination of type
+   suffixes that an ACLE function can have, where <combination> is the
+   name used in DEF_MVE_FUNCTION entries.
+
+   Use S (T) for single type suffix T and D (T1, T2) for a pair of type
+   suffixes T1 and T2.  Use commas to separate the suffixes.
+
+   Although the order shouldn't matter, the convention is to sort the
+   suffixes lexicographically after dividing suffixes into a type
+   class ("b", "f", etc.) and a numerical bit count.  */
+
+/* _f16.  */
+#define TYPES_float16(S, D) \
+  S (f16)
+
+/* _f16 _f32.  */
+#define TYPES_all_float(S, D) \
+  S (f16), S (f32)
+
+/* _s8 _u8.  */
+#define TYPES_integer_8(S, D) \
+  S (s8), S (u8)
+
+/* _s8 _s16
+   _u8 _u16.  */
+#define TYPES_integer_8_16(S, D) \
+  S (s8), S (s16), S (u8), S (u16)
+
+/* _s16 _s32
+   _u16 _u32.  */
+#define TYPES_integer_16_32(S, D)     \
+  S (s16), S (s32),		      \
+  S (u16), S (u32)
+
+/* _s16 _s32.  */
+#define TYPES_signed_16_32(S, D) \
+  S (s16), S (s32)
+
+/* _s8 _s16 _s32.  */
+#define TYPES_all_signed(S, D) \
+  S (s8), S (s16), S (s32)
+
+/* _u8 _u16 _u32.  */
+#define TYPES_all_unsigned(S, D) \
+  S (u8), S (u16), S (u32)
+
+/* _s8 _s16 _s32
+   _u8 _u16 _u32.  */
+#define TYPES_all_integer(S, D) \
+  TYPES_all_signed (S, D), TYPES_all_unsigned (S, D)
+
+/* _s8 _s16 _s32 _s64
+   _u8 _u16 _u32 _u64.  */
+#define TYPES_all_integer_with_64(S, D) \
+  TYPES_all_signed (S, D), S (s64), TYPES_all_unsigned (S, D), S (u64)
+
+/* _s32 _u32.  */
+#define TYPES_integer_32(S, D) \
+  S (s32), S (u32)
+
+/* _s32.  */
+#define TYPES_signed_32(S, D) \
+  S (s32)
+
+/* Describe a pair of type suffixes in which only the first is used.  */
+#define DEF_VECTOR_TYPE(X) { TYPE_SUFFIX_ ## X, NUM_TYPE_SUFFIXES }
+
+/* Describe a pair of type suffixes in which both are used.  */
+#define DEF_DOUBLE_TYPE(X, Y) { TYPE_SUFFIX_ ## X, TYPE_SUFFIX_ ## Y }
+
+/* Create an array that can be used in arm-mve-builtins.def to
+   select the type suffixes in TYPES_<NAME>.  */
+#define DEF_MVE_TYPES_ARRAY(NAME) \
+  static const type_suffix_pair types_##NAME[] = { \
+    TYPES_##NAME (DEF_VECTOR_TYPE, DEF_DOUBLE_TYPE), \
+    { NUM_TYPE_SUFFIXES, NUM_TYPE_SUFFIXES } \
+  }
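+
+/* For example, DEF_MVE_TYPES_ARRAY (all_signed) defines a
+   types_all_signed[] array whose entries pair _s8, _s16 and _s32 with
+   NUM_TYPE_SUFFIXES, followed by a terminator pair.  */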
+
+/* For functions that don't take any type suffixes.  */
+static const type_suffix_pair types_none[] = {
+  { NUM_TYPE_SUFFIXES, NUM_TYPE_SUFFIXES },
+  { NUM_TYPE_SUFFIXES, NUM_TYPE_SUFFIXES }
+};
+
+DEF_MVE_TYPES_ARRAY (all_integer);
+DEF_MVE_TYPES_ARRAY (all_integer_with_64);
+DEF_MVE_TYPES_ARRAY (float16);
+DEF_MVE_TYPES_ARRAY (all_float);
+DEF_MVE_TYPES_ARRAY (all_signed);
+DEF_MVE_TYPES_ARRAY (all_unsigned);
+DEF_MVE_TYPES_ARRAY (integer_8);
+DEF_MVE_TYPES_ARRAY (integer_8_16);
+DEF_MVE_TYPES_ARRAY (integer_16_32);
+DEF_MVE_TYPES_ARRAY (integer_32);
+DEF_MVE_TYPES_ARRAY (signed_16_32);
+DEF_MVE_TYPES_ARRAY (signed_32);
+
+/* Used by functions that have no governing predicate.  */
+static const predication_index preds_none[] = { PRED_none, NUM_PREDS };
+
+/* Used by functions that have the m (merging) predicated form, and in
+   addition have an unpredicated form.  */
+static const predication_index preds_m_or_none[] = {
+  PRED_m, PRED_none, NUM_PREDS
+};
+
+/* Used by functions that have the m (merging) and x ("don't care")
+   predicated forms, and in addition have an unpredicated form.  */
+static const predication_index preds_mx_or_none[] = {
+  PRED_m, PRED_x, PRED_none, NUM_PREDS
+};
+
+/* Used by functions that have the p predicated form, in addition to
+   an unpredicated form.  */
+static const predication_index preds_p_or_none[] = {
+  PRED_p, PRED_none, NUM_PREDS
 };
 
 /* The scalar type associated with each vector type.  */
-GTY(()) tree scalar_types[NUM_VECTOR_TYPES];
+extern GTY(()) tree scalar_types[NUM_VECTOR_TYPES];
+tree scalar_types[NUM_VECTOR_TYPES];
 
 /* The single-predicate and single-vector types, with their built-in
    "__simd128_..._t" name.  Allow an index of NUM_VECTOR_TYPES, which always
@@ -66,7 +263,20 @@ GTY(()) tree scalar_types[NUM_VECTOR_TYPES];
 static GTY(()) tree abi_vector_types[NUM_VECTOR_TYPES + 1];
 
 /* Same, but with the arm_mve.h names.  */
-GTY(()) tree acle_vector_types[3][NUM_VECTOR_TYPES + 1];
+extern GTY(()) tree acle_vector_types[MAX_TUPLE_SIZE][NUM_VECTOR_TYPES + 1];
+tree acle_vector_types[MAX_TUPLE_SIZE][NUM_VECTOR_TYPES + 1];
+
+/* The list of all registered function decls, indexed by code.  */
+static GTY(()) vec<registered_function *, va_gc> *registered_functions;
+
+/* All registered function decls, hashed on the function_instance
+   that they implement.  This is used for looking up implementations of
+   overloaded functions.  */
+static hash_table<registered_function_hasher> *function_table;
+
+/* True if we've already complained about attempts to use functions
+   when the required extension is disabled.  */
+static bool reported_missing_float_p;
 
 /* Return the MVE abi type with element of type TYPE.  */
 static tree
@@ -87,7 +297,6 @@ register_builtin_types ()
 #define DEF_MVE_TYPE(ACLE_NAME, SCALAR_TYPE) \
   scalar_types[VECTOR_TYPE_ ## ACLE_NAME] = SCALAR_TYPE;
 #include "arm-mve-builtins.def"
-#undef DEF_MVE_TYPE
   for (unsigned int i = 0; i < NUM_VECTOR_TYPES; ++i)
     {
       if (vector_types[i].requires_float && !TARGET_HAVE_MVE_FLOAT)
@@ -113,8 +322,18 @@ register_builtin_types ()
 static void
 register_vector_type (vector_type_index type)
 {
+  /* If the target does not have the mve.fp extension, but the type requires
+     it, then it needs to be assigned a non-dummy type so that functions
+     with those types in their signature can be registered.  This allows for
+     diagnostics about the missing extension, rather than about a missing
+     function definition.  */
   if (vector_types[type].requires_float && !TARGET_HAVE_MVE_FLOAT)
-    return;
+    {
+      acle_vector_types[0][type] = void_type_node;
+      return;
+    }
+
   tree vectype = abi_vector_types[type];
   tree id = get_identifier (vector_types[type].acle_name);
   tree decl = build_decl (input_location, TYPE_DECL, id, vectype);
@@ -133,15 +352,26 @@ register_vector_type (vector_type_index type)
   acle_vector_types[0][type] = vectype;
 }
 
-/* Register tuple type TYPE with NUM_VECTORS arity under its
-   arm_mve_types.h name.  */
+/* Register tuple types of element type TYPE under their arm_mve_types.h
+   names.  */
 static void
 register_builtin_tuple_types (vector_type_index type)
 {
   const vector_type_info* info = &vector_types[type];
+
+  /* If the target does not have the mve.fp extension, but the type requires
+     it, then it needs to be assigned a non-dummy type so that functions
+     with those types in their signature can be registered.  This allows for
+     diagnostics about the missing extension, rather than about a missing
+     function definition.  */
   if (scalar_types[type] == boolean_type_node
       || (info->requires_float && !TARGET_HAVE_MVE_FLOAT))
+    {
+      for (unsigned int num_vectors = 2; num_vectors <= 4; num_vectors += 2)
+	acle_vector_types[num_vectors >> 1][type] = void_type_node;
-    return;
+      return;
+    }
+
   const char *vector_type_name = info->acle_name;
   char buffer[sizeof ("float32x4x2_t")];
   for (unsigned int num_vectors = 2; num_vectors <= 4; num_vectors += 2)
@@ -189,8 +419,1710 @@ handle_arm_mve_types_h ()
     }
 }
 
-} /* end namespace arm_mve */
+/* Implement #pragma GCC arm "arm_mve.h" <bool>.  */
+void
+handle_arm_mve_h (bool preserve_user_namespace)
+{
+  if (function_table)
+    {
+      error ("duplicate definition of %qs", "arm_mve.h");
+      return;
+    }
 
-using namespace arm_mve;
+  /* Define MVE functions.  */
+  function_table = new hash_table<registered_function_hasher> (1023);
+}
+
+/* Return true if CANDIDATE is equivalent to MODEL_TYPE for overloading
+   purposes.  */
+static bool
+matches_type_p (const_tree model_type, const_tree candidate)
+{
+  if (VECTOR_TYPE_P (model_type))
+    {
+      if (!VECTOR_TYPE_P (candidate)
+	  || maybe_ne (TYPE_VECTOR_SUBPARTS (model_type),
+		       TYPE_VECTOR_SUBPARTS (candidate))
+	  || TYPE_MODE (model_type) != TYPE_MODE (candidate))
+	return false;
+
+      model_type = TREE_TYPE (model_type);
+      candidate = TREE_TYPE (candidate);
+    }
+  return (candidate != error_mark_node
+	  && TYPE_MAIN_VARIANT (model_type) == TYPE_MAIN_VARIANT (candidate));
+}
+
+/* Report an error against LOCATION that the user has tried to use
+   a floating point function when the mve.fp extension is disabled.  */
+static void
+report_missing_float (location_t location, tree fndecl)
+{
+  /* Avoid reporting a slew of messages for a single oversight.  */
+  if (reported_missing_float_p)
+    return;
+
+  error_at (location, "ACLE function %qD requires ISA extension %qs",
+	    fndecl, "mve.fp");
+  inform (location, "you can enable mve.fp by using the command-line"
+	  " option %<-march%>, or by using the %<target%>"
+	  " attribute or pragma");
+  reported_missing_float_p = true;
+}
+
+/* Report that LOCATION has a call to FNDECL in which argument ARGNO
+   was not an integer constant expression.  ARGNO counts from zero.  */
+static void
+report_non_ice (location_t location, tree fndecl, unsigned int argno)
+{
+  error_at (location, "argument %d of %qE must be an integer constant"
+	    " expression", argno + 1, fndecl);
+}
+
+/* Report that LOCATION has a call to FNDECL in which argument ARGNO has
+   the value ACTUAL, whereas the function requires a value in the range
+   [MIN, MAX].  ARGNO counts from zero.  */
+static void
+report_out_of_range (location_t location, tree fndecl, unsigned int argno,
+		     HOST_WIDE_INT actual, HOST_WIDE_INT min,
+		     HOST_WIDE_INT max)
+{
+  error_at (location, "passing %wd to argument %d of %qE, which expects"
+	    " a value in the range [%wd, %wd]", actual, argno + 1, fndecl,
+	    min, max);
+}
+
+/* Report that LOCATION has a call to FNDECL in which argument ARGNO has
+   the value ACTUAL, whereas the function requires a valid value of
+   enum type ENUMTYPE.  ARGNO counts from zero.  */
+static void
+report_not_enum (location_t location, tree fndecl, unsigned int argno,
+		 HOST_WIDE_INT actual, tree enumtype)
+{
+  error_at (location, "passing %wd to argument %d of %qE, which expects"
+	    " a valid %qT value", actual, argno + 1, fndecl, enumtype);
+}
+
+/* If REQUIRES_FLOAT is true, check that the mve.fp extension is
+   enabled for function FNDECL; report an error against LOCATION and
+   return false if it is not.  Return true otherwise.  */
+static bool
+check_requires_float (location_t location, tree fndecl,
+		      bool requires_float)
+{
+  if (requires_float && !TARGET_HAVE_MVE_FLOAT)
+    {
+      report_missing_float (location, fndecl);
+      return false;
+    }
+
+  return true;
+}
+
+/* Return a hash code for a function_instance.  */
+hashval_t
+function_instance::hash () const
+{
+  inchash::hash h;
+  /* BASE uniquely determines BASE_NAME, so we don't need to hash both.  */
+  h.add_ptr (base);
+  h.add_ptr (shape);
+  h.add_int (mode_suffix_id);
+  h.add_int (type_suffix_ids[0]);
+  h.add_int (type_suffix_ids[1]);
+  h.add_int (pred);
+  return h.end ();
+}
+
+/* Return a set of CP_* flags that describe what the function could do,
+   taking the command-line flags into account.  */
+unsigned int
+function_instance::call_properties () const
+{
+  unsigned int flags = base->call_properties (*this);
+
+  /* -fno-trapping-math means that we can assume any FP exceptions
+     are not user-visible.  */
+  if (!flag_trapping_math)
+    flags &= ~CP_RAISE_FP_EXCEPTIONS;
+
+  return flags;
+}
+
+/* Return true if calls to the function could read some form of
+   global state.  */
+bool
+function_instance::reads_global_state_p () const
+{
+  unsigned int flags = call_properties ();
+
+  /* Preserve any dependence on rounding mode, flush to zero mode, etc.
+     There is currently no way of turning this off; in particular,
+     -fno-rounding-math (which is the default) means that we should make
+     the usual assumptions about rounding mode, which for intrinsics means
+     acting as the instructions do.  */
+  if (flags & CP_READ_FPCR)
+    return true;
+
+  return false;
+}
+
+/* Return true if calls to the function could modify some form of
+   global state.  */
+bool
+function_instance::modifies_global_state_p () const
+{
+  unsigned int flags = call_properties ();
+
+  /* Preserve any exception state written back to the FPCR,
+     unless -fno-trapping-math says this is unnecessary.  */
+  if (flags & CP_RAISE_FP_EXCEPTIONS)
+    return true;
+
+  /* Handle direct modifications of global state.  */
+  return flags & CP_WRITE_MEMORY;
+}
+
+/* Return true if calls to the function could raise a signal.  */
+bool
+function_instance::could_trap_p () const
+{
+  unsigned int flags = call_properties ();
+
+  /* Handle functions that could raise SIGFPE.  */
+  if (flags & CP_RAISE_FP_EXCEPTIONS)
+    return true;
+
+  /* Handle functions that could raise SIGBUS or SIGSEGV.  */
+  if (flags & (CP_READ_MEMORY | CP_WRITE_MEMORY))
+    return true;
+
+  return false;
+}
+
+/* Return true if the function has an implicit "inactive" argument.
+   This is the case for most _m predicated functions, but not all.
+   The list will be updated as needed.  */
+bool
+function_instance::has_inactive_argument () const
+{
+  if (pred != PRED_m)
+    return false;
+
+  return true;
+}
+
+inline hashval_t
+registered_function_hasher::hash (value_type value)
+{
+  return value->instance.hash ();
+}
+
+inline bool
+registered_function_hasher::equal (value_type value, const compare_type &key)
+{
+  return value->instance == key;
+}
+
+function_builder::function_builder ()
+{
+  m_overload_type = build_function_type (void_type_node, void_list_node);
+  m_direct_overloads = lang_GNU_CXX ();
+  gcc_obstack_init (&m_string_obstack);
+}
+
+function_builder::~function_builder ()
+{
+  obstack_free (&m_string_obstack, NULL);
+}
+
+/* Add NAME to the end of the function name being built.  */
+void
+function_builder::append_name (const char *name)
+{
+  obstack_grow (&m_string_obstack, name, strlen (name));
+}
+
+/* Zero-terminate and complete the function name being built.  */
+char *
+function_builder::finish_name ()
+{
+  obstack_1grow (&m_string_obstack, 0);
+  return (char *) obstack_finish (&m_string_obstack);
+}
+
+/* Return the overloaded or full function name for INSTANCE, with an
+   optional prefix; PRESERVE_USER_NAMESPACE selects the prefix, and
+   OVERLOADED_P selects whether to return the overloaded or the full
+   function name.  Allocate the string on m_string_obstack; the caller
+   must use obstack_free to free it after use.
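+
+   For example (sketch): for the m-predicated _s32 form of vaddq, the
+   prefixed full name is "__arm_vaddq_m_s32" and the prefixed
+   overloaded name is "__arm_vaddq_m".  */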
+char *
+function_builder::get_name (const function_instance &instance,
+			    bool preserve_user_namespace,
+			    bool overloaded_p)
+{
+  if (preserve_user_namespace)
+    append_name ("__arm_");
+  append_name (instance.base_name);
+  append_name (pred_suffixes[instance.pred]);
+  if (!overloaded_p
+      || instance.shape->explicit_mode_suffix_p (instance.pred,
+						 instance.mode_suffix_id))
+    append_name (instance.mode_suffix ().string);
+  for (unsigned int i = 0; i < 2; ++i)
+    if (!overloaded_p
+	|| instance.shape->explicit_type_suffix_p (i, instance.pred,
+						   instance.mode_suffix_id))
+      append_name (instance.type_suffix (i).string);
+  return finish_name ();
+}
+
+/* Add attribute NAME to ATTRS.  */
+static tree
+add_attribute (const char *name, tree attrs)
+{
+  return tree_cons (get_identifier (name), NULL_TREE, attrs);
+}
+
+/* Return the appropriate function attributes for INSTANCE.  */
+tree
+function_builder::get_attributes (const function_instance &instance)
+{
+  tree attrs = NULL_TREE;
+
+  if (!instance.modifies_global_state_p ())
+    {
+      if (instance.reads_global_state_p ())
+	attrs = add_attribute ("pure", attrs);
+      else
+	attrs = add_attribute ("const", attrs);
+    }
+
+  if (!flag_non_call_exceptions || !instance.could_trap_p ())
+    attrs = add_attribute ("nothrow", attrs);
+
+  return add_attribute ("leaf", attrs);
+}
+
+/* Add a function called NAME with type FNTYPE and attributes ATTRS.
+   INSTANCE describes what the function does and OVERLOADED_P indicates
+   whether it is overloaded.  REQUIRES_FLOAT indicates whether the function
+   requires the mve.fp extension.  */
+registered_function &
+function_builder::add_function (const function_instance &instance,
+				const char *name, tree fntype, tree attrs,
+				bool requires_float,
+				bool overloaded_p,
+				bool placeholder_p)
+{
+  unsigned int code = vec_safe_length (registered_functions);
+  code = (code << ARM_BUILTIN_SHIFT) | ARM_BUILTIN_MVE;
+
+  /* We need to be able to generate placeholders to ensure that we have a
+     consistent numbering scheme for function codes between the C and C++
+     frontends, so that everything ties up in LTO.
+
+     Currently, tree-streamer-in.cc:unpack_ts_function_decl_value_fields
+     validates that tree nodes returned by TARGET_BUILTIN_DECL are non-NULL and
+     some node other than error_mark_node.  This is a holdover from when builtin
+     decls were streamed by code rather than by value.
+
+     Ultimately, we should be able to remove this validation of BUILT_IN_MD
+     nodes and remove the target hook.  For now, however, we need to appease the
+     validation and return a non-NULL, non-error_mark_node node, so we
+     arbitrarily choose integer_zero_node.  */
+  tree decl = placeholder_p
+    ? integer_zero_node
+    : simulate_builtin_function_decl (input_location, name, fntype,
+				      code, NULL, attrs);
+
+  registered_function &rfn = *ggc_alloc <registered_function> ();
+  rfn.instance = instance;
+  rfn.decl = decl;
+  rfn.requires_float = requires_float;
+  rfn.overloaded_p = overloaded_p;
+  vec_safe_push (registered_functions, &rfn);
+
+  return rfn;
+}
+
+/* Add a built-in function for INSTANCE, with the argument types given
+   by ARGUMENT_TYPES and the return type given by RETURN_TYPE.
+   REQUIRES_FLOAT indicates whether the function requires the mve.fp
+   extension.  If PRESERVE_USER_NAMESPACE is false, the function is
+   also registered under its non-prefixed name.  */
+void
+function_builder::add_unique_function (const function_instance &instance,
+				       tree return_type,
+				       vec<tree> &argument_types,
+				       bool preserve_user_namespace,
+				       bool requires_float,
+				       bool force_direct_overloads)
+{
+  /* Add the function under its full (unique) name with prefix.  */
+  char *name = get_name (instance, true, false);
+  tree fntype = build_function_type_array (return_type,
+					   argument_types.length (),
+					   argument_types.address ());
+  tree attrs = get_attributes (instance);
+  registered_function &rfn = add_function (instance, name, fntype, attrs,
+					   requires_float, false, false);
+
+  /* Enter the function into the hash table.  */
+  hashval_t hash = instance.hash ();
+  registered_function **rfn_slot
+    = function_table->find_slot_with_hash (instance, hash, INSERT);
+  gcc_assert (!*rfn_slot);
+  *rfn_slot = &rfn;
+
+  /* Also add the non-prefixed non-overloaded function, if the user namespace
+     does not need to be preserved.  */
+  if (!preserve_user_namespace)
+    {
+      char *noprefix_name = get_name (instance, false, false);
+      tree attrs = get_attributes (instance);
+      add_function (instance, noprefix_name, fntype, attrs, requires_float,
+		    false, false);
+    }
+
+  /* Also add the function under its overloaded alias, if we want
+     a separate decl for each instance of an overloaded function.  */
+  char *overload_name = get_name (instance, true, true);
+  if (strcmp (name, overload_name) != 0)
+    {
+      /* Attribute lists shouldn't be shared.  */
+      tree attrs = get_attributes (instance);
+      bool placeholder_p = !(m_direct_overloads || force_direct_overloads);
+      add_function (instance, overload_name, fntype, attrs,
+		    requires_float, false, placeholder_p);
+
+      /* Also add the non-prefixed overloaded function, if the user namespace
+	 does not need to be preserved.  */
+      if (!preserve_user_namespace)
+	{
+	  char *noprefix_overload_name = get_name (instance, false, true);
+	  tree attrs = get_attributes (instance);
+	  add_function (instance, noprefix_overload_name, fntype, attrs,
+			requires_float, false, placeholder_p);
+	}
+    }
+
+  obstack_free (&m_string_obstack, name);
+}
+
+/* Add one function decl for INSTANCE, to be used with manual overload
+   resolution.  REQUIRES_FLOAT indicates whether the function requires the
+   mve.fp extension.
+
+   For simplicity, partition functions by instance and required extensions,
+   and check whether the required extensions are available as part of resolving
+   the function to the relevant unique function.  */
+void
+function_builder::add_overloaded_function (const function_instance &instance,
+					   bool preserve_user_namespace,
+					   bool requires_float)
+{
+  char *name = get_name (instance, true, true);
+  if (registered_function **map_value = m_overload_names.get (name))
+    {
+      gcc_assert ((*map_value)->instance == instance);
+      obstack_free (&m_string_obstack, name);
+    }
+  else
+    {
+      registered_function &rfn
+	= add_function (instance, name, m_overload_type, NULL_TREE,
+			requires_float, true, m_direct_overloads);
+      m_overload_names.put (name, &rfn);
+      if (!preserve_user_namespace)
+	{
+	  char *noprefix_name = get_name (instance, false, true);
+	  registered_function &noprefix_rfn
+	    = add_function (instance, noprefix_name, m_overload_type,
+			    NULL_TREE, requires_float, true,
+			    m_direct_overloads);
+	  m_overload_names.put (noprefix_name, &noprefix_rfn);
+	}
+    }
+}
+
+/* If we are using manual overload resolution, add one function decl
+   for each overloaded function in GROUP.  Take the function base name
+   from GROUP and the mode from MODE.  */
+void
+function_builder::add_overloaded_functions (const function_group_info &group,
+					    mode_suffix_index mode,
+					    bool preserve_user_namespace)
+{
+  for (unsigned int pi = 0; group.preds[pi] != NUM_PREDS; ++pi)
+    {
+      unsigned int explicit_type0
+	= (*group.shape)->explicit_type_suffix_p (0, group.preds[pi], mode);
+      unsigned int explicit_type1
+	= (*group.shape)->explicit_type_suffix_p (1, group.preds[pi], mode);
+
+      if ((*group.shape)->skip_overload_p (group.preds[pi], mode))
+	continue;
+
+      if (!explicit_type0 && !explicit_type1)
+	{
+	  /* Deal with the common case in which there is one overloaded
+	     function for all type combinations.  */
+	  function_instance instance (group.base_name, *group.base,
+				      *group.shape, mode, types_none[0],
+				      group.preds[pi]);
+	  add_overloaded_function (instance, preserve_user_namespace,
+				   group.requires_float);
+	}
+      else
+	for (unsigned int ti = 0; group.types[ti][0] != NUM_TYPE_SUFFIXES;
+	     ++ti)
+	  {
+	    /* Stub out the types that are determined by overload
+	       resolution.  */
+	    type_suffix_pair types = {
+	      explicit_type0 ? group.types[ti][0] : NUM_TYPE_SUFFIXES,
+	      explicit_type1 ? group.types[ti][1] : NUM_TYPE_SUFFIXES
+	    };
+	    function_instance instance (group.base_name, *group.base,
+					*group.shape, mode, types,
+					group.preds[pi]);
+	    add_overloaded_function (instance, preserve_user_namespace,
+				     group.requires_float);
+	  }
+    }
+}
+
+/* Register all the functions in GROUP.  */
+void
+function_builder::register_function_group (const function_group_info &group,
+					   bool preserve_user_namespace)
+{
+  (*group.shape)->build (*this, group, preserve_user_namespace);
+}
+
+function_call_info::function_call_info (location_t location_in,
+					const function_instance &instance_in,
+					tree fndecl_in)
+  : function_instance (instance_in), location (location_in), fndecl (fndecl_in)
+{
+}
+
+function_resolver::function_resolver (location_t location,
+				      const function_instance &instance,
+				      tree fndecl, vec<tree, va_gc> &arglist)
+  : function_call_info (location, instance, fndecl), m_arglist (arglist)
+{
+}
+
+/* Return the vector type associated with type suffix TYPE.  */
+tree
+function_resolver::get_vector_type (type_suffix_index type)
+{
+  return acle_vector_types[0][type_suffixes[type].vector_type];
+}
+
+/* Return the <stdint.h> name associated with TYPE.  Using the <stdint.h>
+   name should be more user-friendly than the underlying canonical type,
+   since it makes the signedness and bitwidth explicit.  */
+const char *
+function_resolver::get_scalar_type_name (type_suffix_index type)
+{
+  return vector_types[type_suffixes[type].vector_type].acle_name + 2;
+}
+
+/* Return the type of argument I, or error_mark_node if it isn't
+   well-formed.  */
+tree
+function_resolver::get_argument_type (unsigned int i)
+{
+  tree arg = m_arglist[i];
+  return arg == error_mark_node ? arg : TREE_TYPE (arg);
+}
+
+/* Return true if argument I is some form of scalar value.  */
+bool
+function_resolver::scalar_argument_p (unsigned int i)
+{
+  tree type = get_argument_type (i);
+  return (INTEGRAL_TYPE_P (type)
+	  /* Allow pointer types, leaving the frontend to warn where
+	     necessary.  */
+	  || POINTER_TYPE_P (type)
+	  || SCALAR_FLOAT_TYPE_P (type));
+}
+
+/* Report that the function has no form that takes type suffix TYPE.
+   Return error_mark_node.  */
+tree
+function_resolver::report_no_such_form (type_suffix_index type)
+{
+  error_at (location, "%qE has no form that takes %qT arguments",
+	    fndecl, get_vector_type (type));
+  return error_mark_node;
+}
+
+/* Silently check whether there is an instance of the function with the
+   mode suffix given by MODE and the type suffixes given by TYPE0 and TYPE1.
+   Return its function decl if so, otherwise return null.  */
+tree
+function_resolver::lookup_form (mode_suffix_index mode,
+				type_suffix_index type0,
+				type_suffix_index type1)
+{
+  type_suffix_pair types = { type0, type1 };
+  function_instance instance (base_name, base, shape, mode, types, pred);
+  registered_function *rfn
+    = function_table->find_with_hash (instance, instance.hash ());
+  return rfn ? rfn->decl : NULL_TREE;
+}
+
+/* Resolve the function to one with the mode suffix given by MODE and the
+   type suffixes given by TYPE0 and TYPE1.  Return its function decl on
+   success, otherwise report an error and return error_mark_node.  */
+tree
+function_resolver::resolve_to (mode_suffix_index mode,
+			       type_suffix_index type0,
+			       type_suffix_index type1)
+{
+  tree res = lookup_form (mode, type0, type1);
+  if (!res)
+    {
+      if (type1 == NUM_TYPE_SUFFIXES)
+	return report_no_such_form (type0);
+      if (type0 == type_suffix_ids[0])
+	return report_no_such_form (type1);
+      /* To be filled in when we have other cases.  */
+      gcc_unreachable ();
+    }
+  return res;
+}
+
+/* Require argument ARGNO to be a single vector or a tuple of NUM_VECTORS
+   vectors; NUM_VECTORS is 1 for the former.  Return the associated type
+   suffix on success, using TYPE_SUFFIX_b for predicates.  Report an error
+   and return NUM_TYPE_SUFFIXES on failure.  */
+type_suffix_index
+function_resolver::infer_vector_or_tuple_type (unsigned int argno,
+					       unsigned int num_vectors)
+{
+  tree actual = get_argument_type (argno);
+  if (actual == error_mark_node)
+    return NUM_TYPE_SUFFIXES;
+
+  /* A linear search should be OK here, since the code isn't hot and
+     the number of types is only small.  */
+  for (unsigned int size_i = 0; size_i < MAX_TUPLE_SIZE; ++size_i)
+    for (unsigned int suffix_i = 0; suffix_i < NUM_TYPE_SUFFIXES; ++suffix_i)
+      {
+	vector_type_index type_i = type_suffixes[suffix_i].vector_type;
+	tree type = acle_vector_types[size_i][type_i];
+	if (type && matches_type_p (type, actual))
+	  {
+	    if (size_i + 1 == num_vectors)
+	      return type_suffix_index (suffix_i);
+
+	    if (num_vectors == 1)
+	      error_at (location, "passing %qT to argument %d of %qE, which"
+			" expects a single MVE vector rather than a tuple",
+			actual, argno + 1, fndecl);
+	    else if (size_i == 0 && type_i != VECTOR_TYPE_mve_pred16_t)
+	      /* num_vectors is always != 1, so the singular isn't needed.  */
+	      error_n (location, num_vectors, "%qT%d%qE%d",
+		       "passing single vector %qT to argument %d"
+		       " of %qE, which expects a tuple of %d vectors",
+		       actual, argno + 1, fndecl, num_vectors);
+	    else
+	      /* num_vectors is always != 1, so the singular isn't needed.  */
+	      error_n (location, num_vectors, "%qT%d%qE%d",
+		       "passing %qT to argument %d of %qE, which"
+		       " expects a tuple of %d vectors", actual, argno + 1,
+		       fndecl, num_vectors);
+	    return NUM_TYPE_SUFFIXES;
+	  }
+      }
+
+  if (num_vectors == 1)
+    error_at (location, "passing %qT to argument %d of %qE, which"
+	      " expects an MVE vector type", actual, argno + 1, fndecl);
+  else
+    error_at (location, "passing %qT to argument %d of %qE, which"
+	      " expects an MVE tuple type", actual, argno + 1, fndecl);
+  return NUM_TYPE_SUFFIXES;
+}
+
+/* Require argument ARGNO to have some form of vector type.  Return the
+   associated type suffix on success, using TYPE_SUFFIX_b for predicates.
+   Report an error and return NUM_TYPE_SUFFIXES on failure.  */
+type_suffix_index
+function_resolver::infer_vector_type (unsigned int argno)
+{
+  return infer_vector_or_tuple_type (argno, 1);
+}
+
+/* Require argument ARGNO to be a vector or scalar argument.  Return true
+   if it is, otherwise report an appropriate error.  */
+bool
+function_resolver::require_vector_or_scalar_type (unsigned int argno)
+{
+  tree actual = get_argument_type (argno);
+  if (actual == error_mark_node)
+    return false;
+
+  if (!scalar_argument_p (argno) && !VECTOR_TYPE_P (actual))
+    {
+      error_at (location, "passing %qT to argument %d of %qE, which"
+		" expects a vector or scalar type", actual, argno + 1, fndecl);
+      return false;
+    }
+
+  return true;
+}
+
+/* Require argument ARGNO to have vector type TYPE, in cases where this
+   requirement holds for all uses of the function.  Return true if the
+   argument has the right form, otherwise report an appropriate error.  */
+bool
+function_resolver::require_vector_type (unsigned int argno,
+					vector_type_index type)
+{
+  tree expected = acle_vector_types[0][type];
+  tree actual = get_argument_type (argno);
+  if (actual == error_mark_node)
+    return false;
+
+  if (!matches_type_p (expected, actual))
+    {
+      error_at (location, "passing %qT to argument %d of %qE, which"
+		" expects %qT", actual, argno + 1, fndecl, expected);
+      return false;
+    }
+  return true;
+}
+
+/* Like require_vector_type, but TYPE is inferred from previous arguments
+   rather than being a fixed part of the function signature.  This changes
+   the nature of the error messages.  */
+bool
+function_resolver::require_matching_vector_type (unsigned int argno,
+						 type_suffix_index type)
+{
+  type_suffix_index new_type = infer_vector_type (argno);
+  if (new_type == NUM_TYPE_SUFFIXES)
+    return false;
+
+  if (type != new_type)
+    {
+      error_at (location, "passing %qT to argument %d of %qE, but"
+		" previous arguments had type %qT",
+		get_vector_type (new_type), argno + 1, fndecl,
+		get_vector_type (type));
+      return false;
+    }
+  return true;
+}
+
+/* Require argument ARGNO to be a vector type with the following properties:
+
+   - the type class must be the same as FIRST_TYPE's if EXPECTED_TCLASS
+     is SAME_TYPE_CLASS, otherwise it must be EXPECTED_TCLASS itself.
+
+   - the element size must be:
+
+     - the same as FIRST_TYPE's if EXPECTED_BITS == SAME_SIZE
+     - half of FIRST_TYPE's if EXPECTED_BITS == HALF_SIZE
+     - a quarter of FIRST_TYPE's if EXPECTED_BITS == QUARTER_SIZE
+     - EXPECTED_BITS itself otherwise
+
+   Return true if the argument has the required type, otherwise report
+   an appropriate error.
+
+   FIRST_ARGNO is the first argument that is known to have type FIRST_TYPE.
+   Usually it comes before ARGNO, but sometimes it is more natural to resolve
+   arguments out of order.
+
+   If the required properties depend on FIRST_TYPE then both FIRST_ARGNO and
+   ARGNO contribute to the resolution process.  If the required properties
+   are fixed, only FIRST_ARGNO contributes to the resolution process.
+
+   This function is a bit of a Swiss army knife.  The complication comes
+   from trying to give good error messages when FIRST_ARGNO and ARGNO are
+   inconsistent, since either of them might be wrong.  */
+bool function_resolver::
+require_derived_vector_type (unsigned int argno,
+			     unsigned int first_argno,
+			     type_suffix_index first_type,
+			     type_class_index expected_tclass,
+			     unsigned int expected_bits)
+{
+  /* If the type needs to match FIRST_ARGNO exactly, use the preferred
+     error message for that case.  The VECTOR_TYPE_P test excludes tuple
+     types, which we handle below instead.  */
+  bool both_vectors_p = VECTOR_TYPE_P (get_argument_type (first_argno));
+  if (both_vectors_p
+      && expected_tclass == SAME_TYPE_CLASS
+      && expected_bits == SAME_SIZE)
+    {
+      /* There's no need to resolve this case out of order.  */
+      gcc_assert (argno > first_argno);
+      return require_matching_vector_type (argno, first_type);
+    }
+
+  /* Use FIRST_TYPE to get the expected type class and element size.  */
+  type_class_index orig_expected_tclass = expected_tclass;
+  if (expected_tclass == NUM_TYPE_CLASSES)
+    expected_tclass = type_suffixes[first_type].tclass;
+
+  unsigned int orig_expected_bits = expected_bits;
+  if (expected_bits == SAME_SIZE)
+    expected_bits = type_suffixes[first_type].element_bits;
+  else if (expected_bits == HALF_SIZE)
+    expected_bits = type_suffixes[first_type].element_bits / 2;
+  else if (expected_bits == QUARTER_SIZE)
+    expected_bits = type_suffixes[first_type].element_bits / 4;
+
+  /* If the expected type doesn't depend on FIRST_TYPE at all,
+     just check for the fixed choice of vector type.  */
+  if (expected_tclass == orig_expected_tclass
+      && expected_bits == orig_expected_bits)
+    {
+      const type_suffix_info &expected_suffix
+	= type_suffixes[find_type_suffix (expected_tclass, expected_bits)];
+      return require_vector_type (argno, expected_suffix.vector_type);
+    }
+
+  /* Require the argument to be some form of MVE vector type,
+     without being specific about the type of vector we want.  */
+  type_suffix_index actual_type = infer_vector_type (argno);
+  if (actual_type == NUM_TYPE_SUFFIXES)
+    return false;
+
+  /* Exit now if we got the right type.  */
+  bool tclass_ok_p = (type_suffixes[actual_type].tclass == expected_tclass);
+  bool size_ok_p = (type_suffixes[actual_type].element_bits == expected_bits);
+  if (tclass_ok_p && size_ok_p)
+    return true;
+
+  /* First look for cases in which the actual type contravenes a fixed
+     size requirement, without having to refer to FIRST_TYPE.  */
+  if (!size_ok_p && expected_bits == orig_expected_bits)
+    {
+      error_at (location, "passing %qT to argument %d of %qE, which"
+		" expects a vector of %d-bit elements",
+		get_vector_type (actual_type), argno + 1, fndecl,
+		expected_bits);
+      return false;
+    }
+
+  /* Likewise for a fixed type class requirement.  This is only ever
+     needed for signed and unsigned types, so don't create unnecessary
+     translation work for other type classes.  */
+  if (!tclass_ok_p && orig_expected_tclass == TYPE_signed)
+    {
+      error_at (location, "passing %qT to argument %d of %qE, which"
+		" expects a vector of signed integers",
+		get_vector_type (actual_type), argno + 1, fndecl);
+      return false;
+    }
+  if (!tclass_ok_p && orig_expected_tclass == TYPE_unsigned)
+    {
+      error_at (location, "passing %qT to argument %d of %qE, which"
+		" expects a vector of unsigned integers",
+		get_vector_type (actual_type), argno + 1, fndecl);
+      return false;
+    }
+
+  /* Make sure that FIRST_TYPE itself is sensible before using it
+     as a basis for an error message.  */
+  if (resolve_to (mode_suffix_id, first_type) == error_mark_node)
+    return false;
+
+  /* If the arguments have consistent type classes, but a link between
+     the sizes has been broken, try to describe the error in those terms.  */
+  if (both_vectors_p && tclass_ok_p && orig_expected_bits == SAME_SIZE)
+    {
+      if (argno < first_argno)
+	{
+	  std::swap (argno, first_argno);
+	  std::swap (actual_type, first_type);
+	}
+      error_at (location, "arguments %d and %d of %qE must have the"
+		" same element size, but the values passed here have type"
+		" %qT and %qT respectively", first_argno + 1, argno + 1,
+		fndecl, get_vector_type (first_type),
+		get_vector_type (actual_type));
+      return false;
+    }
+
+  /* Likewise in reverse: look for cases in which the sizes are consistent
+     but a link between the type classes has been broken.  */
+  if (both_vectors_p
+      && size_ok_p
+      && orig_expected_tclass == SAME_TYPE_CLASS
+      && type_suffixes[first_type].integer_p
+      && type_suffixes[actual_type].integer_p)
+    {
+      if (argno < first_argno)
+	{
+	  std::swap (argno, first_argno);
+	  std::swap (actual_type, first_type);
+	}
+      error_at (location, "arguments %d and %d of %qE must have the"
+		" same signedness, but the values passed here have type"
+		" %qT and %qT respectively", first_argno + 1, argno + 1,
+		fndecl, get_vector_type (first_type),
+		get_vector_type (actual_type));
+      return false;
+    }
+
+  /* The two arguments are wildly inconsistent.  */
+  type_suffix_index expected_type
+    = find_type_suffix (expected_tclass, expected_bits);
+  error_at (location, "passing %qT instead of the expected %qT to argument"
+	    " %d of %qE, after passing %qT to argument %d",
+	    get_vector_type (actual_type), get_vector_type (expected_type),
+	    argno + 1, fndecl, get_argument_type (first_argno),
+	    first_argno + 1);
+  return false;
+}
+
+/* Require argument ARGNO to be a (possibly variable) scalar, expecting it
+   to have the following properties:
+
+   - the type class must be the same as for type suffix 0 if EXPECTED_TCLASS
+     is SAME_TYPE_CLASS, otherwise it must be EXPECTED_TCLASS itself.
+
+   - the element size must be the same as for type suffix 0 if EXPECTED_BITS
+     is SAME_SIZE, otherwise it must be EXPECTED_BITS itself.
+
+   Return true if the argument is valid, otherwise report an appropriate error.
+
+   Note that we don't check whether the scalar type actually has the required
+   properties, since that's subject to implicit promotions and conversions.
+   Instead we just use the expected properties to tune the error message.  */
+bool function_resolver::
+require_derived_scalar_type (unsigned int argno,
+			     type_class_index expected_tclass,
+			     unsigned int expected_bits)
+{
+  gcc_assert (expected_tclass == SAME_TYPE_CLASS
+	      || expected_tclass == TYPE_signed
+	      || expected_tclass == TYPE_unsigned);
+
+  /* If the expected type doesn't depend on the type suffix at all,
+     just check for the fixed choice of scalar type.  */
+  if (expected_tclass != SAME_TYPE_CLASS && expected_bits != SAME_SIZE)
+    {
+      type_suffix_index expected_type
+	= find_type_suffix (expected_tclass, expected_bits);
+      return require_scalar_type (argno, get_scalar_type_name (expected_type));
+    }
+
+  if (scalar_argument_p (argno))
+    return true;
+
+  if (expected_tclass == SAME_TYPE_CLASS)
+    /* It doesn't really matter whether the element is expected to be
+       the same size as type suffix 0.  */
+    error_at (location, "passing %qT to argument %d of %qE, which"
+	      " expects a scalar element", get_argument_type (argno),
+	      argno + 1, fndecl);
+  else
+    /* It doesn't seem useful to distinguish between signed and unsigned
+       scalars here.  */
+    error_at (location, "passing %qT to argument %d of %qE, which"
+	      " expects a scalar integer", get_argument_type (argno),
+	      argno + 1, fndecl);
+  return false;
+}
+
+/* Require argument ARGNO to be suitable for an integer constant expression.
+   Return true if it is, otherwise report an appropriate error.
+
+   function_checker checks whether the argument is actually constant and
+   has a suitable range.  The reason for distinguishing immediate arguments
+   here is because it provides more consistent error messages than
+   require_scalar_type would.  */
+bool
+function_resolver::require_integer_immediate (unsigned int argno)
+{
+  if (!scalar_argument_p (argno))
+    {
+      report_non_ice (location, fndecl, argno);
+      return false;
+    }
+  return true;
+}
+
+/* Require argument ARGNO to be a (possibly variable) scalar, using EXPECTED
+   as the name of its expected type.  Return true if the argument has the
+   right form, otherwise report an appropriate error.  */
+bool
+function_resolver::require_scalar_type (unsigned int argno,
+					const char *expected)
+{
+  if (!scalar_argument_p (argno))
+    {
+      error_at (location, "passing %qT to argument %d of %qE, which"
+		" expects %qs", get_argument_type (argno), argno + 1,
+		fndecl, expected);
+      return false;
+    }
+  return true;
+}
+
+/* Require the function to have exactly EXPECTED arguments.  Return true
+   if it does, otherwise report an appropriate error.  */
+bool
+function_resolver::check_num_arguments (unsigned int expected)
+{
+  if (m_arglist.length () < expected)
+    error_at (location, "too few arguments to function %qE", fndecl);
+  else if (m_arglist.length () > expected)
+    error_at (location, "too many arguments to function %qE", fndecl);
+  return m_arglist.length () == expected;
+}
+
+/* If the function is predicated, check that the last argument is a
+   suitable predicate.  Also check that there are NOPS further
+   arguments before any predicate, but don't check what they are.
+
+   Return true on success, otherwise report a suitable error.
+   When returning true:
+
+   - set I to the index of the last unchecked argument.
+   - set NARGS to the total number of arguments.
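+
+   For example (sketch): for a PRED_m operation with an inactive
+   argument and NOPS == 2, NARGS becomes 4 (inactive, two operands,
+   then the predicate) and I becomes 2.  */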
+bool
+function_resolver::check_gp_argument (unsigned int nops,
+				      unsigned int &i, unsigned int &nargs)
+{
+  i = nops - 1;
+  if (pred != PRED_none)
+    {
+      switch (pred)
+	{
+	case PRED_m:
+	  /* Add first inactive argument if needed, and final predicate.  */
+	  if (has_inactive_argument ())
+	    nargs = nops + 2;
+	  else
+	    nargs = nops + 1;
+	  break;
+
+	case PRED_p:
+	case PRED_x:
+	  /* Add final predicate.  */
+	  nargs = nops + 1;
+	  break;
+
+	default:
+	  gcc_unreachable ();
+	}
+
+      if (!check_num_arguments (nargs)
+	  || !require_vector_type (nargs - 1, VECTOR_TYPE_mve_pred16_t))
+	return false;
+
+      i = nargs - 2;
+    }
+  else
+    {
+      nargs = nops;
+      if (!check_num_arguments (nargs))
+	return false;
+    }
+
+  return true;
+}
+
+/* Finish resolving a function whose final argument can be a vector
+   or a scalar, with the function having an implicit "_n" suffix
+   in the latter case.  This "_n" form might only exist for certain
+   type suffixes.
+
+   ARGNO is the index of the final argument.  The inferred type suffix
+   was obtained from argument FIRST_ARGNO, which has type FIRST_TYPE.
+   EXPECTED_TCLASS and EXPECTED_BITS describe the expected properties
+   of the final vector or scalar argument, in the same way as for
+   require_derived_vector_type.  INFERRED_TYPE is the inferred type
+   suffix itself, or NUM_TYPE_SUFFIXES if it's the same as FIRST_TYPE.
+
+   Return the function decl of the resolved function on success,
+   otherwise report a suitable error and return error_mark_node.  */
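+
+/* For example (sketch): given vaddq with an int32x4_t first operand,
+   a scalar final argument resolves to the MODE_n form vaddq_n_s32,
+   while a vector final argument resolves to vaddq_s32.  */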
+tree function_resolver::
+finish_opt_n_resolution (unsigned int argno, unsigned int first_argno,
+			 type_suffix_index first_type,
+			 type_class_index expected_tclass,
+			 unsigned int expected_bits,
+			 type_suffix_index inferred_type)
+{
+  if (inferred_type == NUM_TYPE_SUFFIXES)
+    inferred_type = first_type;
+  tree scalar_form = lookup_form (MODE_n, inferred_type);
+
+  /* Allow the final argument to be scalar, if an _n form exists.  */
+  if (scalar_argument_p (argno))
+    {
+      if (scalar_form)
+	return scalar_form;
+
+      /* Check the vector form normally.  If that succeeds, raise an
+	 error about having no corresponding _n form.  */
+      tree res = resolve_to (mode_suffix_id, inferred_type);
+      if (res != error_mark_node)
+	error_at (location, "passing %qT to argument %d of %qE, but its"
+		  " %qT form does not accept scalars",
+		  get_argument_type (argno), argno + 1, fndecl,
+		  get_vector_type (first_type));
+      return error_mark_node;
+    }
+
+  /* If an _n form does exist, provide a more accurate message than
+     require_derived_vector_type would for arguments that are neither
+     vectors nor scalars.  */
+  if (scalar_form && !require_vector_or_scalar_type (argno))
+    return error_mark_node;
+
+  /* Check for the correct vector type.  */
+  if (!require_derived_vector_type (argno, first_argno, first_type,
+				    expected_tclass, expected_bits))
+    return error_mark_node;
+
+  return resolve_to (mode_suffix_id, inferred_type);
+}
+
+/* Resolve a (possibly predicated) unary function.  If the function uses
+   merge predication or if TREAT_AS_MERGE_P is true, there is an extra
+   vector argument before the governing predicate that specifies the
+   values of inactive elements.  This argument has the following
+   properties:
+
+   - the type class must be the same as for active elements if MERGE_TCLASS
+     is SAME_TYPE_CLASS, otherwise it must be MERGE_TCLASS itself.
+
+   - the element size must be the same as for active elements if MERGE_BITS
+     is SAME_SIZE, otherwise it must be MERGE_BITS itself.
+
+   Return the function decl of the resolved function on success,
+   otherwise report a suitable error and return error_mark_node.  */
+tree
+function_resolver::resolve_unary (type_class_index merge_tclass,
+				  unsigned int merge_bits,
+				  bool treat_as_merge_p)
+{
+  type_suffix_index type;
+  if (pred == PRED_m || treat_as_merge_p)
+    {
+      if (!check_num_arguments (3))
+	return error_mark_node;
+      if (merge_tclass == SAME_TYPE_CLASS && merge_bits == SAME_SIZE)
+	{
+	  /* The inactive elements are the same as the active elements,
+	     so we can use normal left-to-right resolution.  */
+	  if ((type = infer_vector_type (0)) == NUM_TYPE_SUFFIXES
+	      /* Predicates are the last argument.  */
+	      || !require_vector_type (2, VECTOR_TYPE_mve_pred16_t)
+	      || !require_matching_vector_type (1, type))
+	    return error_mark_node;
+	}
+      else
+	{
+	  /* The inactive element type is a function of the active one,
+	     so resolve the active one first.  */
+	  if (!require_vector_type (1, VECTOR_TYPE_mve_pred16_t)
+	      || (type = infer_vector_type (2)) == NUM_TYPE_SUFFIXES
+	      || !require_derived_vector_type (0, 2, type, merge_tclass,
+					       merge_bits))
+	    return error_mark_node;
+	}
+    }
+  else
+    {
+      /* We just need to check the predicate (if any) and the single
+	 vector argument.  */
+      unsigned int i, nargs;
+      if (!check_gp_argument (1, i, nargs)
+	  || (type = infer_vector_type (i)) == NUM_TYPE_SUFFIXES)
+	return error_mark_node;
+    }
+
+  /* Handle convert-like functions in which the first type suffix is
+     explicit.  */
+  if (type_suffix_ids[0] != NUM_TYPE_SUFFIXES)
+    return resolve_to (mode_suffix_id, type_suffix_ids[0], type);
+
+  return resolve_to (mode_suffix_id, type);
+}
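+
+/* For example (an illustration, using the merging unary vabsq): given
+   INACTIVE of type int32x4_t, a call
+
+     vabsq_m (inactive, a, p)
+
+   infers s32 from INACTIVE (argument 0), checks that P (argument 2) is
+   a mve_pred16_t and that A (argument 1) matches, and resolves to
+   vabsq_m_s32.  */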
+
+/* Resolve a (possibly predicated) unary function taking a scalar
+   argument (_n suffix).  If the function uses merge predication,
+   there is an extra vector argument in the first position that
+   specifies the values of inactive elements, and the governing
+   predicate comes last.
+
+   Return the function decl of the resolved function on success,
+   otherwise report a suitable error and return error_mark_node.  */
+tree
+function_resolver::resolve_unary_n ()
+{
+  type_suffix_index type;
+
+  /* Currently we only support overrides for _m (vdupq).  */
+  if (pred != PRED_m)
+    return error_mark_node;
+
+  if (!check_num_arguments (3))
+    return error_mark_node;
+
+  /* The inactive elements are the same as the active elements,
+     so we can use normal left-to-right resolution.  */
+  if ((type = infer_vector_type (0)) == NUM_TYPE_SUFFIXES
+      /* Predicates are the last argument.  */
+      || !require_vector_type (2, VECTOR_TYPE_mve_pred16_t))
+    return error_mark_node;
+
+  /* Make sure the argument is scalar.  */
+  tree scalar_form = lookup_form (MODE_n, type);
+
+  if (scalar_argument_p (1) && scalar_form)
+    return scalar_form;
+
+  return error_mark_node;
+}
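+
+/* For example (an illustration of the vdupq override mentioned above):
+   given INACTIVE of type int32x4_t, a call
+
+     vdupq_m (inactive, 1, p)
+
+   infers s32 from INACTIVE, checks that P is a mve_pred16_t and that
+   argument 1 is a scalar, and resolves to vdupq_m_n_s32.  */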
+
+/* Resolve a (possibly predicated) function that takes NOPS like-typed
+   vector arguments followed by NIMM integer immediates.  Return the
+   function decl of the resolved function on success, otherwise report
+   a suitable error and return error_mark_node.  */
+tree
+function_resolver::resolve_uniform (unsigned int nops, unsigned int nimm)
+{
+  unsigned int i, nargs;
+  type_suffix_index type;
+  if (!check_gp_argument (nops + nimm, i, nargs)
+      || (type = infer_vector_type (0)) == NUM_TYPE_SUFFIXES)
+    return error_mark_node;
+
+  unsigned int last_arg = i + 1 - nimm;
+  for (i = 0; i < last_arg; i++)
+    if (!require_matching_vector_type (i, type))
+      return error_mark_node;
+
+  for (i = last_arg; i < nargs; ++i)
+    if (!require_integer_immediate (i))
+      return error_mark_node;
+
+  return resolve_to (mode_suffix_id, type);
+}
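+
+/* For example (an illustration with a hypothetical function F): with
+   NOPS == 2 and NIMM == 1, resolve_uniform (2, 1) accepts
+
+     f (v, w, imm)
+
+   requiring V and W to be vectors of the same type and IMM to be an
+   integer immediate, plus a trailing predicate for the predicated
+   forms.  */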
+
+/* Resolve a (possibly predicated) function that offers a choice between
+   taking:
+
+   - NOPS like-typed vector arguments or
+   - NOPS - 1 like-typed vector arguments followed by a scalar argument
+
+   Return the function decl of the resolved function on success,
+   otherwise report a suitable error and return error_mark_node.  */
+tree
+function_resolver::resolve_uniform_opt_n (unsigned int nops)
+{
+  unsigned int i, nargs;
+  type_suffix_index type;
+  if (!check_gp_argument (nops, i, nargs)
+      /* Unary operators should use resolve_unary, so using i - 1 is
+	 safe.  */
+      || (type = infer_vector_type (i - 1)) == NUM_TYPE_SUFFIXES)
+    return error_mark_node;
+
+  /* Skip the last argument, which may be a scalar.  */
+  unsigned int last_arg = i;
+  for (i = 0; i < last_arg; i++)
+    if (!require_matching_vector_type (i, type))
+      return error_mark_node;
+
+  return finish_opt_n_resolution (last_arg, 0, type);
+}
+
+/* If the call is erroneous, report an appropriate error and return
+   error_mark_node.  Otherwise, if the function is overloaded, return
+   the decl of the non-overloaded function.  Return NULL_TREE otherwise,
+   indicating that the call should be processed in the normal way.  */
+tree
+function_resolver::resolve ()
+{
+  return shape->resolve (*this);
+}
+
+function_checker::function_checker (location_t location,
+				    const function_instance &instance,
+				    tree fndecl, tree fntype,
+				    unsigned int nargs, tree *args)
+  : function_call_info (location, instance, fndecl),
+    m_fntype (fntype), m_nargs (nargs), m_args (args)
+{
+  if (instance.has_inactive_argument ())
+    m_base_arg = 1;
+  else
+    m_base_arg = 0;
+}
+
+/* Return true if argument ARGNO exists, which it might not for
+   erroneous calls.  It is safe to wave through checks if this
+   function returns false.  */
+bool
+function_checker::argument_exists_p (unsigned int argno)
+{
+  gcc_assert (argno < (unsigned int) type_num_arguments (m_fntype));
+  return argno < m_nargs;
+}
+
+/* Check that argument ARGNO is an integer constant expression and
+   store its value in VALUE_OUT if so.  The caller should first
+   check that argument ARGNO exists.  */
+bool
+function_checker::require_immediate (unsigned int argno,
+				     HOST_WIDE_INT &value_out)
+{
+  gcc_assert (argno < m_nargs);
+  tree arg = m_args[argno];
+
+  /* The type and range are unsigned, so read the argument as an
+     unsigned rather than signed HWI.  */
+  if (!tree_fits_uhwi_p (arg))
+    {
+      report_non_ice (location, fndecl, argno);
+      return false;
+    }
+
+  /* ...but treat VALUE_OUT as signed for error reporting, since printing
+     -1 is more user-friendly than the maximum uint64_t value.  */
+  value_out = tree_to_uhwi (arg);
+  return true;
+}
+
+/* Check that argument REL_ARGNO is an integer constant expression that has
+   a valid value for enumeration type TYPE.  REL_ARGNO counts from the end
+   of the predication arguments.  */
+bool
+function_checker::require_immediate_enum (unsigned int rel_argno, tree type)
+{
+  unsigned int argno = m_base_arg + rel_argno;
+  if (!argument_exists_p (argno))
+    return true;
+
+  HOST_WIDE_INT actual;
+  if (!require_immediate (argno, actual))
+    return false;
+
+  for (tree entry = TYPE_VALUES (type); entry; entry = TREE_CHAIN (entry))
+    {
+      /* The value is an INTEGER_CST for C and a CONST_DECL wrapper
+	 around an INTEGER_CST for C++.  */
+      tree value = TREE_VALUE (entry);
+      if (TREE_CODE (value) == CONST_DECL)
+	value = DECL_INITIAL (value);
+      if (wi::to_widest (value) == actual)
+	return true;
+    }
+
+  report_not_enum (location, fndecl, argno, actual, type);
+  return false;
+}
+
+/* Check that argument REL_ARGNO is an integer constant expression in the
+   range [MIN, MAX].  REL_ARGNO counts from the end of the predication
+   arguments.  */
+bool
+function_checker::require_immediate_range (unsigned int rel_argno,
+					   HOST_WIDE_INT min,
+					   HOST_WIDE_INT max)
+{
+  unsigned int argno = m_base_arg + rel_argno;
+  if (!argument_exists_p (argno))
+    return true;
+
+  /* Required because of the tree_to_uhwi -> HOST_WIDE_INT conversion
+     in require_immediate.  */
+  gcc_assert (min >= 0 && min <= max);
+  HOST_WIDE_INT actual;
+  if (!require_immediate (argno, actual))
+    return false;
+
+  if (!IN_RANGE (actual, min, max))
+    {
+      report_out_of_range (location, fndecl, argno, actual, min, max);
+      return false;
+    }
+
+  return true;
+}
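+
+/* For example (an illustration with hypothetical values): a shape whose
+   first non-predication argument must be an immediate in [1, 8] could
+   call
+
+     c.require_immediate_range (0, 1, 8);
+
+   where 0 is REL_ARGNO, counted as described above.  */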
+
+/* Perform semantic checks on the call.  Return true if the call is valid,
+   otherwise report a suitable error and return false.  */
+bool
+function_checker::check ()
+{
+  function_args_iterator iter;
+  tree type;
+  unsigned int i = 0;
+  FOREACH_FUNCTION_ARGS (m_fntype, type, iter)
+    {
+      if (type == void_type_node || i >= m_nargs)
+	break;
+
+      if (i >= m_base_arg
+	  && TREE_CODE (type) == ENUMERAL_TYPE
+	  && !require_immediate_enum (i - m_base_arg, type))
+	return false;
+
+      i += 1;
+    }
+
+  return shape->check (*this);
+}
+
+gimple_folder::gimple_folder (const function_instance &instance, tree fndecl,
+			      gcall *call_in)
+  : function_call_info (gimple_location (call_in), instance, fndecl),
+    call (call_in), lhs (gimple_call_lhs (call_in))
+{
+}
+
+/* Try to fold the call.  Return the new statement on success and null
+   on failure.  */
+gimple *
+gimple_folder::fold ()
+{
+  /* Don't fold anything when MVE is disabled; emit an error during
+     expansion instead.  */
+  if (!TARGET_HAVE_MVE)
+    return NULL;
+
+  /* Punt if the function has a return type and no result location is
+     provided.  The attributes should allow target-independent code to
+     remove the calls if appropriate.  */
+  if (!lhs && TREE_TYPE (gimple_call_fntype (call)) != void_type_node)
+    return NULL;
+
+  return base->fold (*this);
+}
+
+function_expander::function_expander (const function_instance &instance,
+				      tree fndecl, tree call_expr_in,
+				      rtx possible_target_in)
+  : function_call_info (EXPR_LOCATION (call_expr_in), instance, fndecl),
+    call_expr (call_expr_in), possible_target (possible_target_in)
+{
+}
+
+/* Return the handler of direct optab OP for type suffix SUFFIX_I.  */
+insn_code
+function_expander::direct_optab_handler (optab op, unsigned int suffix_i)
+{
+  return ::direct_optab_handler (op, vector_mode (suffix_i));
+}
+
+/* For a function that does the equivalent of:
+
+     OUTPUT = COND ? FN (INPUTS) : FALLBACK;
+
+   return the value of FALLBACK.
+
+   MODE is the mode of OUTPUT.
+   MERGE_ARGNO is the argument that provides FALLBACK for _m functions,
+   or DEFAULT_MERGE_ARGNO if we should apply the usual rules.
+
+   ARGNO is the caller's index into args.  If the returned value is
+   argument 0 (as for unary _m operations), increment ARGNO past the
+   returned argument.  */
+rtx
+function_expander::get_fallback_value (machine_mode mode,
+				       unsigned int merge_argno,
+				       unsigned int &argno)
+{
+  if (pred == PRED_z)
+    return CONST0_RTX (mode);
+
+  gcc_assert (pred == PRED_m || pred == PRED_x);
+
+  if (merge_argno == 0)
+    return args[argno++];
+
+  return args[merge_argno];
+}
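+
+/* For example (an illustration): for a unary _m operation whose ARGS
+   are (inactive, a, p), MERGE_ARGNO is 0, so
+
+     rtx fallback = get_fallback_value (mode, 0, argno);
+
+   returns ARGS[0], the inactive vector, and advances ARGNO past it to
+   the index of A.  For PRED_z it would instead return
+   CONST0_RTX (mode).  */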
+
+/* Return a REG rtx that can be used for the result of the function,
+   using the preferred target if suitable.  */
+rtx
+function_expander::get_reg_target ()
+{
+  machine_mode target_mode = TYPE_MODE (TREE_TYPE (TREE_TYPE (fndecl)));
+  if (!possible_target || GET_MODE (possible_target) != target_mode)
+    possible_target = gen_reg_rtx (target_mode);
+  return possible_target;
+}
+
+/* Add an output operand to the instruction we're building, which has
+   code ICODE.  Bind the output to the preferred target rtx if possible.  */
+void
+function_expander::add_output_operand (insn_code icode)
+{
+  unsigned int opno = m_ops.length ();
+  machine_mode mode = insn_data[icode].operand[opno].mode;
+  m_ops.safe_grow (opno + 1, true);
+  create_output_operand (&m_ops.last (), possible_target, mode);
+}
+
+/* Add an input operand to the instruction we're building, which has
+   code ICODE.  Calculate the value of the operand as follows:
+
+   - If the operand is a predicate, coerce X to have the
+     mode that the instruction expects.
+
+   - Otherwise use X directly.  The expand machinery checks that X has
+     the right mode for the instruction.  */
+void
+function_expander::add_input_operand (insn_code icode, rtx x)
+{
+  unsigned int opno = m_ops.length ();
+  const insn_operand_data &operand = insn_data[icode].operand[opno];
+  machine_mode mode = operand.mode;
+  if (mode == VOIDmode)
+    {
+      /* The only allowable use of VOIDmode is the wildcard
+	 arm_any_register_operand, which is used to avoid
+	 combinatorial explosion in the reinterpret patterns.  */
+      gcc_assert (operand.predicate == arm_any_register_operand);
+      mode = GET_MODE (x);
+    }
+  else if (VALID_MVE_PRED_MODE (mode))
+    x = gen_lowpart (mode, x);
+
+  m_ops.safe_grow (m_ops.length () + 1, true);
+  create_input_operand (&m_ops.last (), x, mode);
+}
+
+/* Add an integer operand with value X to the instruction.  */
+void
+function_expander::add_integer_operand (HOST_WIDE_INT x)
+{
+  m_ops.safe_grow (m_ops.length () + 1, true);
+  create_integer_operand (&m_ops.last (), x);
+}
+
+/* Generate instruction ICODE, given that its operands have already
+   been added to M_OPS.  Return the value of the first operand.  */
+rtx
+function_expander::generate_insn (insn_code icode)
+{
+  expand_insn (icode, m_ops.length (), m_ops.address ());
+  return function_returns_void_p () ? const0_rtx : m_ops[0].value;
+}
+
+/* Implement the call using instruction ICODE, with a 1:1 mapping between
+   arguments and input operands.  */
+rtx
+function_expander::use_exact_insn (insn_code icode)
+{
+  unsigned int nops = insn_data[icode].n_operands;
+  if (!function_returns_void_p ())
+    {
+      add_output_operand (icode);
+      nops -= 1;
+    }
+  for (unsigned int i = 0; i < nops; ++i)
+    add_input_operand (icode, args[i]);
+  return generate_insn (icode);
+}
+
+/* Implement the call using instruction ICODE, which does not use a
+   predicate.  */
+rtx
+function_expander::use_unpred_insn (insn_code icode)
+{
+  gcc_assert (pred == PRED_none);
+  /* Discount the output operand.  */
+  unsigned int nops = insn_data[icode].n_operands - 1;
+  unsigned int i = 0;
+
+  add_output_operand (icode);
+  for (; i < nops; ++i)
+    add_input_operand (icode, args[i]);
+
+  return generate_insn (icode);
+}
+
+/* Implement the call using instruction ICODE, which is a predicated
+   operation that returns arbitrary values for inactive lanes.  */
+rtx
+function_expander::use_pred_x_insn (insn_code icode)
+{
+  gcc_assert (pred == PRED_x);
+  unsigned int nops = args.length ();
+
+  add_output_operand (icode);
+  /* Use first operand as arbitrary inactive input.  */
+  add_input_operand (icode, possible_target);
+  emit_clobber (possible_target);
+  /* Copy remaining arguments, including the final predicate.  */
+  for (unsigned int i = 0; i < nops; ++i)
+    add_input_operand (icode, args[i]);
+
+  return generate_insn (icode);
+}
+
+/* Implement the call using instruction ICODE, which does the equivalent of:
+
+     OUTPUT = COND ? FN (INPUTS) : FALLBACK;
+
+   The instruction operands are in the order: OUTPUT, FALLBACK, INPUTS
+   and COND.  MERGE_ARGNO is the argument that provides FALLBACK for _m
+   functions, or DEFAULT_MERGE_ARGNO if we should apply the usual rules.  */
+rtx
+function_expander::use_cond_insn (insn_code icode, unsigned int merge_argno)
+{
+  /* At present we never need to handle PRED_none, which would involve
+     creating a new predicate rather than using one supplied by the user.  */
+  gcc_assert (pred != PRED_none);
+  /* For MVE, we only handle PRED_m at present.  */
+  gcc_assert (pred == PRED_m);
+
+  /* Discount the output, predicate and fallback value.  */
+  unsigned int nops = insn_data[icode].n_operands - 3;
+  machine_mode mode = insn_data[icode].operand[0].mode;
+
+  unsigned int opno = 0;
+  rtx fallback_arg = get_fallback_value (mode, merge_argno, opno);
+  rtx pred_arg = args[nops + 1];
+
+  add_output_operand (icode);
+  add_input_operand (icode, fallback_arg);
+  for (unsigned int i = 0; i < nops; ++i)
+    add_input_operand (icode, args[opno + i]);
+  add_input_operand (icode, pred_arg);
+  return generate_insn (icode);
+}
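+
+/* For example (an illustration): for a binary _m operation whose ARGS
+   are (inactive, a, b, p), use_cond_insn adds the operands in the
+   order required above: output, INACTIVE (the fallback), A, B and
+   finally P.  */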
+
+/* Implement the call using a normal unpredicated optab for PRED_none.
+
+   The rtx code is chosen according to the type suffix of the function
+   and then mapped to an optab:
+
+   - CODE_FOR_SINT for signed integers
+   - CODE_FOR_UINT for unsigned integers
+   - CODE_FOR_FP for floating-point values  */
+rtx
+function_expander::map_to_rtx_codes (rtx_code code_for_sint,
+				     rtx_code code_for_uint,
+				     rtx_code code_for_fp)
+{
+  gcc_assert (pred == PRED_none);
+  rtx_code code = (type_suffix (0).integer_p
+		   ? (type_suffix (0).unsigned_p
+		      ? code_for_uint : code_for_sint)
+		   : code_for_fp);
+  insn_code icode = direct_optab_handler (code_to_optab (code), 0);
+  if (icode == CODE_FOR_nothing)
+    gcc_unreachable ();
+
+  return use_unpred_insn (icode);
+}
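+
+/* For example (an illustration): an addition-like function could call
+
+     e.map_to_rtx_codes (PLUS, PLUS, PLUS);
+
+   so that signed, unsigned and float type suffixes all map to the add
+   optab, while a saturating operation might pass SS_PLUS and US_PLUS
+   as the signed and unsigned codes.  */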
+
+/* Expand the call and return its lhs.  */
+rtx
+function_expander::expand ()
+{
+  unsigned int nargs = call_expr_nargs (call_expr);
+  args.reserve (nargs);
+  for (unsigned int i = 0; i < nargs; ++i)
+    args.quick_push (expand_normal (CALL_EXPR_ARG (call_expr, i)));
+
+  return base->expand (*this);
+}
+
+/* If we're implementing manual overloading, check whether the MVE
+   function with subcode CODE is overloaded, and if so attempt to
+   determine the corresponding non-overloaded function.  The call
+   occurs at location LOCATION and has the arguments given by ARGLIST.
+
+   If the call is erroneous, report an appropriate error and return
+   error_mark_node.  Otherwise, if the function is overloaded, return
+   the decl of the non-overloaded function.  Return NULL_TREE otherwise,
+   indicating that the call should be processed in the normal way.  */
+tree
+resolve_overloaded_builtin (location_t location, unsigned int code,
+			    vec<tree, va_gc> *arglist)
+{
+  if (code >= vec_safe_length (registered_functions))
+    return NULL_TREE;
+
+  registered_function &rfn = *(*registered_functions)[code];
+  if (rfn.overloaded_p)
+    return function_resolver (location, rfn.instance, rfn.decl,
+			      *arglist).resolve ();
+  return NULL_TREE;
+}
+
+/* Perform any semantic checks needed for a call to the MVE function
+   with subcode CODE, such as testing for integer constant expressions.
+   The call occurs at location LOCATION and has NARGS arguments,
+   given by ARGS.  FNDECL is the original function decl, before
+   overload resolution.
+
+   Return true if the call is valid, otherwise report a suitable error.  */
+bool
+check_builtin_call (location_t location, vec<location_t>, unsigned int code,
+		    tree fndecl, unsigned int nargs, tree *args)
+{
+  const registered_function &rfn = *(*registered_functions)[code];
+  if (!check_requires_float (location, rfn.decl, rfn.requires_float))
+    return false;
+
+  return function_checker (location, rfn.instance, fndecl,
+			   TREE_TYPE (rfn.decl), nargs, args).check ();
+}
+
+/* Attempt to fold STMT, given that it's a call to the MVE function
+   with subcode CODE.  Return the new statement on success and null
+   on failure.  Insert any other new statements at GSI.  */
+gimple *
+gimple_fold_builtin (unsigned int code, gcall *stmt)
+{
+  registered_function &rfn = *(*registered_functions)[code];
+  return gimple_folder (rfn.instance, rfn.decl, stmt).fold ();
+}
+
+/* Expand a call to the MVE function with subcode CODE.  EXP is the call
+   expression and TARGET is the preferred location for the result.
+   Return the value of the lhs.  */
+rtx
+expand_builtin (unsigned int code, tree exp, rtx target)
+{
+  registered_function &rfn = *(*registered_functions)[code];
+  if (!check_requires_float (EXPR_LOCATION (exp), rfn.decl,
+			    rfn.requires_float))
+    return target;
+  return function_expander (rfn.instance, rfn.decl, exp, target).expand ();
+}
+
+} /* end namespace arm_mve */
+
+using namespace arm_mve;
+
+inline void
+gt_ggc_mx (function_instance *)
+{
+}
+
+inline void
+gt_pch_nx (function_instance *)
+{
+}
+
+inline void
+gt_pch_nx (function_instance *, gt_pointer_operator, void *)
+{
+}
 
 #include "gt-arm-mve-builtins.h"
diff --git a/gcc/config/arm/arm-mve-builtins.def b/gcc/config/arm/arm-mve-builtins.def
index 69f3f81b473..49d07364fa2 100644
--- a/gcc/config/arm/arm-mve-builtins.def
+++ b/gcc/config/arm/arm-mve-builtins.def
@@ -17,10 +17,25 @@
    along with GCC; see the file COPYING3.  If not see
    <http://www.gnu.org/licenses/>.  */
 
+#ifndef DEF_MVE_MODE
+#define DEF_MVE_MODE(A, B, C, D)
+#endif
+
 #ifndef DEF_MVE_TYPE
-#error "arm-mve-builtins.def included without defining DEF_MVE_TYPE"
+#define DEF_MVE_TYPE(A, B)
+#endif
+
+#ifndef DEF_MVE_TYPE_SUFFIX
+#define DEF_MVE_TYPE_SUFFIX(A, B, C, D, E)
 #endif
 
+#ifndef DEF_MVE_FUNCTION
+#define DEF_MVE_FUNCTION(A, B, C, D)
+#endif
+
+DEF_MVE_MODE (n, none, none, none)
+DEF_MVE_MODE (offset, none, none, bytes)
+
 #define REQUIRES_FLOAT false
 DEF_MVE_TYPE (mve_pred16_t, boolean_type_node)
 DEF_MVE_TYPE (uint8x16_t, unsigned_intQI_type_node)
@@ -37,3 +52,26 @@ DEF_MVE_TYPE (int64x2_t, intDI_type_node)
 DEF_MVE_TYPE (float16x8_t, arm_fp16_type_node)
 DEF_MVE_TYPE (float32x4_t, float_type_node)
 #undef REQUIRES_FLOAT
+
+#define REQUIRES_FLOAT false
+DEF_MVE_TYPE_SUFFIX (s8, int8x16_t, signed, 8, V16QImode)
+DEF_MVE_TYPE_SUFFIX (s16, int16x8_t, signed, 16, V8HImode)
+DEF_MVE_TYPE_SUFFIX (s32, int32x4_t, signed, 32, V4SImode)
+DEF_MVE_TYPE_SUFFIX (s64, int64x2_t, signed, 64, V2DImode)
+DEF_MVE_TYPE_SUFFIX (u8, uint8x16_t, unsigned, 8, V16QImode)
+DEF_MVE_TYPE_SUFFIX (u16, uint16x8_t, unsigned, 16, V8HImode)
+DEF_MVE_TYPE_SUFFIX (u32, uint32x4_t, unsigned, 32, V4SImode)
+DEF_MVE_TYPE_SUFFIX (u64, uint64x2_t, unsigned, 64, V2DImode)
+#undef REQUIRES_FLOAT
+
+#define REQUIRES_FLOAT true
+DEF_MVE_TYPE_SUFFIX (f16, float16x8_t, float, 16, V8HFmode)
+DEF_MVE_TYPE_SUFFIX (f32, float32x4_t, float, 32, V4SFmode)
+#undef REQUIRES_FLOAT
+
+#include "arm-mve-builtins-base.def"
+
+#undef DEF_MVE_TYPE
+#undef DEF_MVE_TYPE_SUFFIX
+#undef DEF_MVE_FUNCTION
+#undef DEF_MVE_MODE
diff --git a/gcc/config/arm/arm-mve-builtins.h b/gcc/config/arm/arm-mve-builtins.h
index 290a118ec92..a20d2fb5d86 100644
--- a/gcc/config/arm/arm-mve-builtins.h
+++ b/gcc/config/arm/arm-mve-builtins.h
@@ -20,7 +20,79 @@
 #ifndef GCC_ARM_MVE_BUILTINS_H
 #define GCC_ARM_MVE_BUILTINS_H
 
+/* The full name of an MVE ACLE function is the concatenation of:
+
+   - the base name ("vaddq", etc.)
+   - the predication suffix ("_m", "_x", etc.), if any
+   - the "mode" suffix ("_n", "_offset", etc.), if any
+   - the type suffixes ("_s32", "_f16", etc.)
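+
+   so that, for instance, "vaddq_m_n_s32" decomposes as
+   "vaddq" + "_m" + "_n" + "_s32" (an illustrative example; not every
+   function has every kind of suffix).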
+
+   Each piece of information is individually useful, so we retain this
+   classification throughout:
+
+   - function_base represents the base name
+
+   - mode_suffix_index represents the mode suffix
+
+   - type_suffix_index represents individual type suffixes, while
+     type_suffix_pair represents a pair of them
+
+   - prediction_index extends the predication suffix with an additional
+     alternative: PRED_implicit for implicitly-predicated operations
+
+   In addition to its unique full name, a function may have a shorter
+   overloaded alias.  This alias removes pieces of the suffixes that
+   can be inferred from the arguments, such as by shortening the mode
+   suffix or dropping some of the type suffixes.  The base name and the
+   predication suffix stay the same.
+
+   The function_shape class describes what arguments a given function
+   takes and what its overloaded alias is called.  In broad terms,
+   function_base describes how the underlying instruction behaves while
+   function_shape describes how that instruction has been presented at
+   the language level.
+
+   The static list of functions uses function_group to describe a group
+   of related functions.  The function_builder class is responsible for
+   expanding this static description into a list of individual functions
+   and registering the associated built-in functions.  function_instance
+   describes one of these individual functions in terms of the properties
+   described above.
+
+   The classes involved in compiling a function call are:
+
+   - function_resolver, which resolves an overloaded function call to a
+     specific function_instance and its associated function decl
+
+   - function_checker, which checks whether the values of the arguments
+     conform to the ACLE specification
+
+   - gimple_folder, which tries to fold a function call at the gimple level
+
+   - function_expander, which expands a function call into rtl instructions
+
+   function_resolver and function_checker operate at the language level
+   and so are associated with the function_shape.  gimple_folder and
+   function_expander are concerned with the behavior of the function
+   and so are associated with the function_base.
+
+   Note that we've specifically chosen not to fold calls in the frontend,
+   since calls to MVE intrinsics will hardly ever fold to a useful
+   language-level constant.  */
 namespace arm_mve {
+/* The maximum number of vectors in an ACLE tuple type.  */
+const unsigned int MAX_TUPLE_SIZE = 3;
+
+/* Used to represent the default merge argument index for _m functions.
+   The actual index depends on how many arguments the function takes.  */
+const unsigned int DEFAULT_MERGE_ARGNO = 0;
+
+/* Flags that describe what a function might do, in addition to reading
+   its arguments and returning a result.  */
+const unsigned int CP_READ_FPCR = 1U << 0;
+const unsigned int CP_RAISE_FP_EXCEPTIONS = 1U << 1;
+const unsigned int CP_READ_MEMORY = 1U << 2;
+const unsigned int CP_WRITE_MEMORY = 1U << 3;
 
 /* Enumerates the MVE predicate and (data) vector types, together called
    "vector types" for brevity.  */
@@ -30,11 +102,604 @@ enum vector_type_index
   VECTOR_TYPE_ ## ACLE_NAME,
 #include "arm-mve-builtins.def"
   NUM_VECTOR_TYPES
-#undef DEF_MVE_TYPE
 };
 
+/* Classifies the available measurement units for an address displacement.  */
+enum units_index
+{
+  UNITS_none,
+  UNITS_bytes
+};
+
+/* Describes the various uses of a governing predicate.  */
+enum predication_index
+{
+  /* No governing predicate is present.  */
+  PRED_none,
+
+  /* Merging predication: copy inactive lanes from the first data argument
+     to the vector result.  */
+  PRED_m,
+
+  /* Plain predication: inactive lanes are not used to compute the
+     scalar result.  */
+  PRED_p,
+
+  /* "Don't care" predication: set inactive lanes of the vector result
+     to arbitrary values.  */
+  PRED_x,
+
+  /* Zero predication: set inactive lanes of the vector result to zero.  */
+  PRED_z,
+
+  NUM_PREDS
+};
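+
+/* For example (an illustration): vaddq_m_s32 uses PRED_m,
+   vaddq_x_s32 uses PRED_x, vaddvq_p_s32 uses PRED_p, and the
+   unpredicated vaddq_s32 uses PRED_none.  */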
+
+/* Classifies element types, based on type suffixes with the bit count
+   removed.  */
+enum type_class_index
+{
+  TYPE_bool,
+  TYPE_float,
+  TYPE_signed,
+  TYPE_unsigned,
+  NUM_TYPE_CLASSES
+};
+
+/* Classifies an operation into "modes"; for example, to distinguish
+   vector-scalar operations from vector-vector operations, or to
+   distinguish between different addressing modes.  This classification
+   accounts for the function suffixes that occur between the base name
+   and the first type suffix.  */
+enum mode_suffix_index
+{
+#define DEF_MVE_MODE(NAME, BASE, DISPLACEMENT, UNITS) MODE_##NAME,
+#include "arm-mve-builtins.def"
+  MODE_none
+};
+
+/* Enumerates the possible type suffixes.  Each suffix is associated with
+   a vector type, but for predicates provides extra information about the
+   element size.  */
+enum type_suffix_index
+{
+#define DEF_MVE_TYPE_SUFFIX(NAME, ACLE_TYPE, CLASS, BITS, MODE)	\
+  TYPE_SUFFIX_ ## NAME,
+#include "arm-mve-builtins.def"
+  NUM_TYPE_SUFFIXES
+};
+
+/* Combines two type suffixes.  */
+typedef enum type_suffix_index type_suffix_pair[2];
+
+class function_base;
+class function_shape;
+
+/* Static information about a mode suffix.  */
+struct mode_suffix_info
+{
+  /* The suffix string itself.  */
+  const char *string;
+
+  /* The type of the vector base address, or NUM_VECTOR_TYPES if the
+     mode does not include a vector base address.  */
+  vector_type_index base_vector_type;
+
+  /* The type of the vector displacement, or NUM_VECTOR_TYPES if the
+     mode does not include a vector displacement.  (Note that scalar
+     displacements are always int64_t.)  */
+  vector_type_index displacement_vector_type;
+
+  /* The units in which the vector or scalar displacement is measured,
+     or UNITS_none if the mode doesn't take a displacement.  */
+  units_index displacement_units;
+};
+
+/* Static information about a type suffix.  */
+struct type_suffix_info
+{
+  /* The suffix string itself.  */
+  const char *string;
+
+  /* The associated ACLE vector or predicate type.  */
+  vector_type_index vector_type : 8;
+
+  /* What kind of type the suffix represents.  */
+  type_class_index tclass : 8;
+
+  /* The number of bits and bytes in an element.  For predicates this
+     measures the associated data elements.  */
+  unsigned int element_bits : 8;
+  unsigned int element_bytes : 8;
+
+  /* True if the suffix is for an integer type.  */
+  unsigned int integer_p : 1;
+  /* True if the suffix is for an unsigned type.  */
+  unsigned int unsigned_p : 1;
+  /* True if the suffix is for a floating-point type.  */
+  unsigned int float_p : 1;
+  unsigned int spare : 13;
+
+  /* The associated vector or predicate mode.  */
+  machine_mode vector_mode : 16;
+};
+
+/* Static information about a set of functions.  */
+struct function_group_info
+{
+  /* The base name, as a string.  */
+  const char *base_name;
+
+  /* Describes the behavior associated with the function base name.  */
+  const function_base *const *base;
+
+  /* The shape of the functions, as described above the class definition.
+     It's possible to have entries with the same base name but different
+     shapes.  */
+  const function_shape *const *shape;
+
+  /* A list of the available type suffixes, and of the available predication
+     types.  The function supports every combination of the two.
+
+     The list of type suffixes is terminated by two NUM_TYPE_SUFFIXES
+     while the list of predication types is terminated by NUM_PREDS.
+     The list of type suffixes is lexicographically ordered based
+     on the index value.  */
+  const type_suffix_pair *types;
+  const predication_index *preds;
+
+  /* Whether the function group requires a floating-point ABI.  */
+  bool requires_float;
+};
+
+/* Describes a single fully-resolved function (i.e. one that has a
+   unique full name).  */
+class GTY((user)) function_instance
+{
+public:
+  function_instance (const char *, const function_base *,
+		     const function_shape *, mode_suffix_index,
+		     const type_suffix_pair &, predication_index);
+
+  bool operator== (const function_instance &) const;
+  bool operator!= (const function_instance &) const;
+  hashval_t hash () const;
+
+  unsigned int call_properties () const;
+  bool reads_global_state_p () const;
+  bool modifies_global_state_p () const;
+  bool could_trap_p () const;
+
+  unsigned int vectors_per_tuple () const;
+
+  const mode_suffix_info &mode_suffix () const;
+
+  const type_suffix_info &type_suffix (unsigned int) const;
+  tree scalar_type (unsigned int) const;
+  tree vector_type (unsigned int) const;
+  tree tuple_type (unsigned int) const;
+  machine_mode vector_mode (unsigned int) const;
+  machine_mode gp_mode (unsigned int) const;
+
+  bool has_inactive_argument () const;
+
+  /* The properties of the function.  (The explicit "enum"s are required
+     for gengtype.)  */
+  const char *base_name;
+  const function_base *base;
+  const function_shape *shape;
+  enum mode_suffix_index mode_suffix_id;
+  type_suffix_pair type_suffix_ids;
+  enum predication_index pred;
+};
+
+class registered_function;
+
+/* A class for building and registering function decls.  */
+class function_builder
+{
+public:
+  function_builder ();
+  ~function_builder ();
+
+  void add_unique_function (const function_instance &, tree,
+			    vec<tree> &, bool, bool, bool);
+  void add_overloaded_function (const function_instance &, bool, bool);
+  void add_overloaded_functions (const function_group_info &,
+				 mode_suffix_index, bool);
+
+  void register_function_group (const function_group_info &, bool);
+
+private:
+  void append_name (const char *);
+  char *finish_name ();
+
+  char *get_name (const function_instance &, bool, bool);
+
+  tree get_attributes (const function_instance &);
+
+  registered_function &add_function (const function_instance &,
+				     const char *, tree, tree,
+				     bool, bool, bool);
+
+  /* The function type to use for functions that are resolved by
+     function_resolver.  */
+  tree m_overload_type;
+
+  /* True if we should create a separate decl for each instance of an
+     overloaded function, instead of using function_resolver.  */
+  bool m_direct_overloads;
+
+  /* Used for building up function names.  */
+  obstack m_string_obstack;
+
+  /* Maps all overloaded function names that we've registered so far
+     to their associated function_instances.  */
+  hash_map<nofree_string_hash, registered_function *> m_overload_names;
+};
+
+/* A base class for handling calls to built-in functions.  */
+class function_call_info : public function_instance
+{
+public:
+  function_call_info (location_t, const function_instance &, tree);
+
+  bool function_returns_void_p ();
+
+  /* The location of the call.  */
+  location_t location;
+
+  /* The FUNCTION_DECL that is being called.  */
+  tree fndecl;
+};
+
+/* A class for resolving an overloaded function call.  */
+class function_resolver : public function_call_info
+{
+public:
+  enum { SAME_SIZE = 256, HALF_SIZE, QUARTER_SIZE };
+  static const type_class_index SAME_TYPE_CLASS = NUM_TYPE_CLASSES;
+
+  function_resolver (location_t, const function_instance &, tree,
+		     vec<tree, va_gc> &);
+
+  tree get_vector_type (type_suffix_index);
+  const char *get_scalar_type_name (type_suffix_index);
+  tree get_argument_type (unsigned int);
+  bool scalar_argument_p (unsigned int);
+
+  tree report_no_such_form (type_suffix_index);
+  tree lookup_form (mode_suffix_index,
+		    type_suffix_index = NUM_TYPE_SUFFIXES,
+		    type_suffix_index = NUM_TYPE_SUFFIXES);
+  tree resolve_to (mode_suffix_index,
+		   type_suffix_index = NUM_TYPE_SUFFIXES,
+		   type_suffix_index = NUM_TYPE_SUFFIXES);
+
+  type_suffix_index infer_vector_or_tuple_type (unsigned int, unsigned int);
+  type_suffix_index infer_vector_type (unsigned int);
+
+  bool require_vector_or_scalar_type (unsigned int);
+
+  bool require_vector_type (unsigned int, vector_type_index);
+  bool require_matching_vector_type (unsigned int, type_suffix_index);
+  bool require_derived_vector_type (unsigned int, unsigned int,
+				    type_suffix_index,
+				    type_class_index = SAME_TYPE_CLASS,
+				    unsigned int = SAME_SIZE);
+  bool require_integer_immediate (unsigned int);
+  bool require_scalar_type (unsigned int, const char *);
+  bool require_derived_scalar_type (unsigned int, type_class_index,
+				    unsigned int = SAME_SIZE);
+
+  bool check_num_arguments (unsigned int);
+  bool check_gp_argument (unsigned int, unsigned int &, unsigned int &);
+  tree resolve_unary (type_class_index = SAME_TYPE_CLASS,
+		      unsigned int = SAME_SIZE, bool = false);
+  tree resolve_unary_n ();
+  tree resolve_uniform (unsigned int, unsigned int = 0);
+  tree resolve_uniform_opt_n (unsigned int);
+  tree finish_opt_n_resolution (unsigned int, unsigned int, type_suffix_index,
+				type_class_index = SAME_TYPE_CLASS,
+				unsigned int = SAME_SIZE,
+				type_suffix_index = NUM_TYPE_SUFFIXES);
+
+  tree resolve ();
+
+private:
+  /* The arguments to the overloaded function.  */
+  vec<tree, va_gc> &m_arglist;
+};
+
+/* A class for checking that the semantic constraints on a function call are
+   satisfied, such as arguments being integer constant expressions with
+   a particular range.  The parent class's FNDECL is the decl that was
+   called in the original source, before overload resolution.  */
+class function_checker : public function_call_info
+{
+public:
+  function_checker (location_t, const function_instance &, tree,
+		    tree, unsigned int, tree *);
+
+  bool require_immediate_enum (unsigned int, tree);
+  bool require_immediate_lane_index (unsigned int, unsigned int = 1);
+  bool require_immediate_range (unsigned int, HOST_WIDE_INT, HOST_WIDE_INT);
+
+  bool check ();
+
+private:
+  bool argument_exists_p (unsigned int);
+
+  bool require_immediate (unsigned int, HOST_WIDE_INT &);
+
+  /* The type of the resolved function.  */
+  tree m_fntype;
+
+  /* The arguments to the function.  */
+  unsigned int m_nargs;
+  tree *m_args;
+
+  /* The first argument not associated with the function's predication
+     type.  */
+  unsigned int m_base_arg;
+};
+
+/* A class for folding a gimple function call.  */
+class gimple_folder : public function_call_info
+{
+public:
+  gimple_folder (const function_instance &, tree,
+		 gcall *);
+
+  gimple *fold ();
+
+  /* The call we're folding.  */
+  gcall *call;
+
+  /* The result of the call, or null if none.  */
+  tree lhs;
+};
+
+/* A class for expanding a function call into RTL.  */
+class function_expander : public function_call_info
+{
+public:
+  function_expander (const function_instance &, tree, tree, rtx);
+  rtx expand ();
+
+  insn_code direct_optab_handler (optab, unsigned int = 0);
+
+  rtx get_fallback_value (machine_mode, unsigned int, unsigned int &);
+  rtx get_reg_target ();
+
+  void add_output_operand (insn_code);
+  void add_input_operand (insn_code, rtx);
+  void add_integer_operand (HOST_WIDE_INT);
+  rtx generate_insn (insn_code);
+
+  rtx use_exact_insn (insn_code);
+  rtx use_unpred_insn (insn_code);
+  rtx use_pred_x_insn (insn_code);
+  rtx use_cond_insn (insn_code, unsigned int = DEFAULT_MERGE_ARGNO);
+
+  rtx map_to_rtx_codes (rtx_code, rtx_code, rtx_code);
+
+  /* The function call expression.  */
+  tree call_expr;
+
+  /* For functions that return a value, this is the preferred location
+     of that value.  It could be null or could have a different mode
+     from the function return type.  */
+  rtx possible_target;
+
+  /* The expanded arguments.  */
+  auto_vec<rtx, 16> args;
+
+private:
+  /* Used to build up the operands to an instruction.  */
+  auto_vec<expand_operand, 8> m_ops;
+};
+
+/* Provides information about a particular function base name, and handles
+   tasks related to the base name.  */
+class function_base
+{
+public:
+  /* Return a set of CP_* flags that describe what the function might do,
+     in addition to reading its arguments and returning a result.  */
+  virtual unsigned int call_properties (const function_instance &) const;
+
+  /* If the function operates on tuples of vectors, return the number
+     of vectors in the tuples, otherwise return 1.  */
+  virtual unsigned int vectors_per_tuple () const { return 1; }
+
+  /* Try to fold the given gimple call.  Return the new gimple statement
+     on success, otherwise return null.  */
+  virtual gimple *fold (gimple_folder &) const { return NULL; }
+
+  /* Expand the given call into rtl.  Return the result of the function,
+     or an arbitrary value if the function doesn't return a result.  */
+  virtual rtx expand (function_expander &) const = 0;
+};
+
+/* Classifies functions into "shapes".  The idea is to take all the
+   type signatures for a set of functions, and classify what's left
+   based on:
+
+   - the number of arguments
+
+   - the process of determining the types in the signature from the mode
+     and type suffixes in the function name (including types that are not
+     affected by the suffixes)
+
+   - which arguments must be integer constant expressions, and what range
+     those arguments have
+
+   - the process for mapping overloaded names to "full" names.  */
+class function_shape
+{
+public:
+  virtual bool explicit_type_suffix_p (unsigned int, enum predication_index,
+				       enum mode_suffix_index) const = 0;
+  virtual bool explicit_mode_suffix_p (enum predication_index,
+				       enum mode_suffix_index) const = 0;
+  virtual bool skip_overload_p (enum predication_index,
+				enum mode_suffix_index) const = 0;
+
+  /* Define all functions associated with the given group.  */
+  virtual void build (function_builder &,
+		      const function_group_info &,
+		      bool) const = 0;
+
+  /* Try to resolve the overloaded call.  Return the non-overloaded
+     function decl on success and error_mark_node on failure.  */
+  virtual tree resolve (function_resolver &) const = 0;
+
+  /* Check whether the given call is semantically valid.  Return true
+     if it is, otherwise report an error and return false.  */
+  virtual bool check (function_checker &) const { return true; }
+};
+
+extern const type_suffix_info type_suffixes[NUM_TYPE_SUFFIXES + 1];
+extern const mode_suffix_info mode_suffixes[MODE_none + 1];
+
 extern tree scalar_types[NUM_VECTOR_TYPES];
-extern tree acle_vector_types[3][NUM_VECTOR_TYPES + 1];
+extern tree acle_vector_types[MAX_TUPLE_SIZE][NUM_VECTOR_TYPES + 1];
+
+/* Return the ACLE type mve_pred16_t.  */
+inline tree
+get_mve_pred16_t (void)
+{
+  return acle_vector_types[0][VECTOR_TYPE_mve_pred16_t];
+}
+
+/* Try to find a mode with the given mode_suffix_info fields.  Return the
+   mode on success or MODE_none on failure.  */
+inline mode_suffix_index
+find_mode_suffix (vector_type_index base_vector_type,
+		  vector_type_index displacement_vector_type,
+		  units_index displacement_units)
+{
+  for (unsigned int mode_i = 0; mode_i < ARRAY_SIZE (mode_suffixes); ++mode_i)
+    {
+      const mode_suffix_info &mode = mode_suffixes[mode_i];
+      if (mode.base_vector_type == base_vector_type
+	  && mode.displacement_vector_type == displacement_vector_type
+	  && mode.displacement_units == displacement_units)
+	return mode_suffix_index (mode_i);
+    }
+  return MODE_none;
+}
+
+/* Return the type suffix associated with ELEMENT_BITS-bit elements of type
+   class TCLASS.  */
+inline type_suffix_index
+find_type_suffix (type_class_index tclass, unsigned int element_bits)
+{
+  for (unsigned int i = 0; i < NUM_TYPE_SUFFIXES; ++i)
+    if (type_suffixes[i].tclass == tclass
+	&& type_suffixes[i].element_bits == element_bits)
+      return type_suffix_index (i);
+  gcc_unreachable ();
+}
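+
+/* For example (an illustration):
+
+     find_type_suffix (TYPE_signed, 32)
+
+   returns TYPE_SUFFIX_s32, matching the s32 entry registered in
+   arm-mve-builtins.def.  */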
+
+inline function_instance::
+function_instance (const char *base_name_in,
+		   const function_base *base_in,
+		   const function_shape *shape_in,
+		   mode_suffix_index mode_suffix_id_in,
+		   const type_suffix_pair &type_suffix_ids_in,
+		   predication_index pred_in)
+  : base_name (base_name_in), base (base_in), shape (shape_in),
+    mode_suffix_id (mode_suffix_id_in), pred (pred_in)
+{
+  memcpy (type_suffix_ids, type_suffix_ids_in, sizeof (type_suffix_ids));
+}
+
+inline bool
+function_instance::operator== (const function_instance &other) const
+{
+  return (base == other.base
+	  && shape == other.shape
+	  && mode_suffix_id == other.mode_suffix_id
+	  && pred == other.pred
+	  && type_suffix_ids[0] == other.type_suffix_ids[0]
+	  && type_suffix_ids[1] == other.type_suffix_ids[1]);
+}
+
+inline bool
+function_instance::operator!= (const function_instance &other) const
+{
+  return !operator== (other);
+}
+
+/* If the function operates on tuples of vectors, return the number
+   of vectors in the tuples, otherwise return 1.  */
+inline unsigned int
+function_instance::vectors_per_tuple () const
+{
+  return base->vectors_per_tuple ();
+}
+
+/* Return information about the function's mode suffix.  */
+inline const mode_suffix_info &
+function_instance::mode_suffix () const
+{
+  return mode_suffixes[mode_suffix_id];
+}
+
+/* Return information about type suffix I.  */
+inline const type_suffix_info &
+function_instance::type_suffix (unsigned int i) const
+{
+  return type_suffixes[type_suffix_ids[i]];
+}
+
+/* Return the scalar type associated with type suffix I.  */
+inline tree
+function_instance::scalar_type (unsigned int i) const
+{
+  return scalar_types[type_suffix (i).vector_type];
+}
+
+/* Return the vector type associated with type suffix I.  */
+inline tree
+function_instance::vector_type (unsigned int i) const
+{
+  return acle_vector_types[0][type_suffix (i).vector_type];
+}
+
+/* If the function operates on tuples of vectors, return the tuple type
+   associated with type suffix I, otherwise return the vector type associated
+   with type suffix I.  */
+inline tree
+function_instance::tuple_type (unsigned int i) const
+{
+  unsigned int num_vectors = vectors_per_tuple ();
+  return acle_vector_types[num_vectors - 1][type_suffix (i).vector_type];
+}
+
+/* Return the vector or predicate mode associated with type suffix I.  */
+inline machine_mode
+function_instance::vector_mode (unsigned int i) const
+{
+  return type_suffix (i).vector_mode;
+}
+
+/* Return true if the function has no return value.  */
+inline bool
+function_call_info::function_returns_void_p ()
+{
+  return TREE_TYPE (TREE_TYPE (fndecl)) == void_type_node;
+}
+
+/* Default implementation of function_base::call_properties, with
+   conservatively correct behavior for floating-point instructions.  */
+inline unsigned int
+function_base::call_properties (const function_instance &instance) const
+{
+  unsigned int flags = 0;
+  if (instance.type_suffix (0).float_p || instance.type_suffix (1).float_p)
+    flags |= CP_READ_FPCR | CP_RAISE_FP_EXCEPTIONS;
+  return flags;
+}
 
 } /* end namespace arm_mve */
 
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 1bdbd3b8ab3..61fcd671437 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -215,7 +215,8 @@ extern opt_machine_mode arm_get_mask_mode (machine_mode mode);
    those groups.  */
 enum arm_builtin_class
 {
-  ARM_BUILTIN_GENERAL
+  ARM_BUILTIN_GENERAL,
+  ARM_BUILTIN_MVE
 };
 
 /* Built-in function codes are structured so that the low
@@ -229,6 +230,13 @@ const unsigned int ARM_BUILTIN_CLASS = (1 << ARM_BUILTIN_SHIFT) - 1;
 /* MVE functions.  */
 namespace arm_mve {
   void handle_arm_mve_types_h ();
+  void handle_arm_mve_h (bool);
+  tree resolve_overloaded_builtin (location_t, unsigned int,
+				   vec<tree, va_gc> *);
+  bool check_builtin_call (location_t, vec<location_t>, unsigned int,
+			   tree, unsigned int, tree *);
+  gimple *gimple_fold_builtin (unsigned int code, gcall *stmt);
+  rtx expand_builtin (unsigned int, tree, rtx);
 }
 
 /* Thumb functions.  */
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index bf7ff9a9704..004e6c6194e 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -69,6 +69,7 @@
 #include "optabs-libfuncs.h"
 #include "gimplify.h"
 #include "gimple.h"
+#include "gimple-iterator.h"
 #include "selftest.h"
 #include "tree-vectorizer.h"
 #include "opts.h"
@@ -506,6 +507,9 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef TARGET_FUNCTION_VALUE_REGNO_P
 #define TARGET_FUNCTION_VALUE_REGNO_P arm_function_value_regno_p
 
+#undef TARGET_GIMPLE_FOLD_BUILTIN
+#define TARGET_GIMPLE_FOLD_BUILTIN arm_gimple_fold_builtin
+
 #undef  TARGET_ASM_OUTPUT_MI_THUNK
 #define TARGET_ASM_OUTPUT_MI_THUNK arm_output_mi_thunk
 #undef  TARGET_ASM_CAN_OUTPUT_MI_THUNK
@@ -2844,6 +2848,29 @@ arm_init_libfuncs (void)
   speculation_barrier_libfunc = init_one_libfunc ("__speculation_barrier");
 }
 
+/* Implement TARGET_GIMPLE_FOLD_BUILTIN.  */
+static bool
+arm_gimple_fold_builtin (gimple_stmt_iterator *gsi)
+{
+  gcall *stmt = as_a <gcall *> (gsi_stmt (*gsi));
+  tree fndecl = gimple_call_fndecl (stmt);
+  unsigned int code = DECL_MD_FUNCTION_CODE (fndecl);
+  unsigned int subcode = code >> ARM_BUILTIN_SHIFT;
+  gimple *new_stmt = NULL;
+  switch (code & ARM_BUILTIN_CLASS)
+    {
+    case ARM_BUILTIN_GENERAL:
+      break;
+    case ARM_BUILTIN_MVE:
+      new_stmt = arm_mve::gimple_fold_builtin (subcode, stmt);
+    }
+  if (!new_stmt)
+    return false;
+
+  gsi_replace (gsi, new_stmt, true);
+  return true;
+}
+
 /* On AAPCS systems, this is the "struct __va_list".  */
 static GTY(()) tree va_list_type;
 
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 1262d668121..0d2ba968fc0 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -34,6 +34,12 @@
 #endif
 #include "arm_mve_types.h"
 
+#ifdef __ARM_MVE_PRESERVE_USER_NAMESPACE
+#pragma GCC arm "arm_mve.h" true
+#else
+#pragma GCC arm "arm_mve.h" false
+#endif
+
 #ifndef __ARM_MVE_PRESERVE_USER_NAMESPACE
 #define vst4q(__addr, __value) __arm_vst4q(__addr, __value)
 #define vdupq_n(__a) __arm_vdupq_n(__a)
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 3139750c606..8e235f63ee6 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -903,3 +903,7 @@ (define_predicate "call_insn_operand"
 (define_special_predicate "aligned_operand"
   (ior (not (match_code "mem"))
        (match_test "MEM_ALIGN (op) >= GET_MODE_ALIGNMENT (mode)")))
+
+;; A special predicate that accepts a register regardless of its mode.
+(define_special_predicate "arm_any_register_operand"
+  (match_code "reg"))
diff --git a/gcc/config/arm/t-arm b/gcc/config/arm/t-arm
index 637e72af5bb..9a1b06368a1 100644
--- a/gcc/config/arm/t-arm
+++ b/gcc/config/arm/t-arm
@@ -154,15 +154,41 @@ arm-builtins.o: $(srcdir)/config/arm/arm-builtins.cc $(CONFIG_H) \
 		$(srcdir)/config/arm/arm-builtins.cc
 
 arm-mve-builtins.o: $(srcdir)/config/arm/arm-mve-builtins.cc $(CONFIG_H) \
-  $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) \
-  fold-const.h langhooks.h stringpool.h attribs.h diagnostic.h \
+  $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) $(RTL_H) $(TM_P_H) \
+  memmodel.h insn-codes.h optabs.h recog.h expr.h basic-block.h \
+  function.h fold-const.h gimple.h gimple-fold.h emit-rtl.h langhooks.h \
+  stringpool.h attribs.h diagnostic.h \
   $(srcdir)/config/arm/arm-protos.h \
   $(srcdir)/config/arm/arm-builtins.h \
   $(srcdir)/config/arm/arm-mve-builtins.h \
-  $(srcdir)/config/arm/arm-mve-builtins.def
+  $(srcdir)/config/arm/arm-mve-builtins-base.h \
+  $(srcdir)/config/arm/arm-mve-builtins-shapes.h \
+  $(srcdir)/config/arm/arm-mve-builtins.def \
+  $(srcdir)/config/arm/arm-mve-builtins-base.def
 	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
 		$(srcdir)/config/arm/arm-mve-builtins.cc
 
+arm-mve-builtins-shapes.o: \
+  $(srcdir)/config/arm/arm-mve-builtins-shapes.cc \
+  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) \
+  $(RTL_H) memmodel.h insn-codes.h optabs.h \
+  $(srcdir)/config/arm/arm-mve-builtins.h \
+  $(srcdir)/config/arm/arm-mve-builtins-shapes.h
+	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
+		$(srcdir)/config/arm/arm-mve-builtins-shapes.cc
+
+arm-mve-builtins-base.o: \
+  $(srcdir)/config/arm/arm-mve-builtins-base.cc \
+  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) $(RTL_H) \
+  memmodel.h insn-codes.h $(OPTABS_H) \
+  $(BASIC_BLOCK_H) $(FUNCTION_H) $(GIMPLE_H) \
+  $(srcdir)/config/arm/arm-mve-builtins.h \
+  $(srcdir)/config/arm/arm-mve-builtins-shapes.h \
+  $(srcdir)/config/arm/arm-mve-builtins-base.h \
+  $(srcdir)/config/arm/arm-mve-builtins-functions.h
+	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
+		$(srcdir)/config/arm/arm-mve-builtins-base.cc
+
 arm-c.o: $(srcdir)/config/arm/arm-c.cc $(CONFIG_H) $(SYSTEM_H) \
     coretypes.h $(TM_H) $(TREE_H) output.h $(C_COMMON_H)
 	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
-- 
2.34.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 03/22] arm: [MVE intrinsics] Rework vreinterpretq
  2023-04-18 13:45 [PATCH 00/22] arm: New framework for MVE intrinsics Christophe Lyon
  2023-04-18 13:45 ` [PATCH 01/22] arm: move builtin function codes into general numberspace Christophe Lyon
  2023-04-18 13:45 ` [PATCH 02/22] arm: [MVE intrinsics] Add new framework Christophe Lyon
@ 2023-04-18 13:45 ` Christophe Lyon
  2023-05-02 10:26   ` Kyrylo Tkachov
  2023-04-18 13:45 ` [PATCH 04/22] arm: [MVE intrinsics] Rework vuninitialized Christophe Lyon
                   ` (19 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Christophe Lyon @ 2023-04-18 13:45 UTC (permalink / raw)
  To: gcc-patches, kyrylo.tkachov, richard.earnshaw, richard.sandiford
  Cc: Christophe Lyon

This patch implements vreinterpretq using the new MVE intrinsics
framework.

The old definitions for vreinterpretq are removed as a consequence.

2022-09-08  Murray Steele  <murray.steele@arm.com>
	    Christophe Lyon  <christophe.lyon@arm.com>

	gcc/
	* config/arm/arm-mve-builtins-base.cc (vreinterpretq_impl): New class.
	* config/arm/arm-mve-builtins-base.def: Define vreinterpretq.
	* config/arm/arm-mve-builtins-base.h (vreinterpretq): New declaration.
	* config/arm/arm-mve-builtins-shapes.cc (parse_element_type): New function.
	(parse_type): Likewise.
	(parse_signature): Likewise.
	(build_one): Likewise.
	(build_all): Likewise.
	(overloaded_base): New struct.
	(unary_convert_def): Likewise.
	* config/arm/arm-mve-builtins-shapes.h (unary_convert): Declare.
	* config/arm/arm-mve-builtins.cc (TYPES_reinterpret_signed1): New
	macro.
	(TYPES_reinterpret_unsigned1): Likewise.
	(TYPES_reinterpret_integer): Likewise.
	(TYPES_reinterpret_integer1): Likewise.
	(TYPES_reinterpret_float1): Likewise.
	(TYPES_reinterpret_float): Likewise.
	(reinterpret_integer): New.
	(reinterpret_float): New.
	(handle_arm_mve_h): Register builtins.
	* config/arm/arm_mve.h (vreinterpretq_s16): Remove.
	(vreinterpretq_s32): Likewise.
	(vreinterpretq_s64): Likewise.
	(vreinterpretq_s8): Likewise.
	(vreinterpretq_u16): Likewise.
	(vreinterpretq_u32): Likewise.
	(vreinterpretq_u64): Likewise.
	(vreinterpretq_u8): Likewise.
	(vreinterpretq_f16): Likewise.
	(vreinterpretq_f32): Likewise.
	(vreinterpretq_s16_s32): Likewise.
	(vreinterpretq_s16_s64): Likewise.
	(vreinterpretq_s16_s8): Likewise.
	(vreinterpretq_s16_u16): Likewise.
	(vreinterpretq_s16_u32): Likewise.
	(vreinterpretq_s16_u64): Likewise.
	(vreinterpretq_s16_u8): Likewise.
	(vreinterpretq_s32_s16): Likewise.
	(vreinterpretq_s32_s64): Likewise.
	(vreinterpretq_s32_s8): Likewise.
	(vreinterpretq_s32_u16): Likewise.
	(vreinterpretq_s32_u32): Likewise.
	(vreinterpretq_s32_u64): Likewise.
	(vreinterpretq_s32_u8): Likewise.
	(vreinterpretq_s64_s16): Likewise.
	(vreinterpretq_s64_s32): Likewise.
	(vreinterpretq_s64_s8): Likewise.
	(vreinterpretq_s64_u16): Likewise.
	(vreinterpretq_s64_u32): Likewise.
	(vreinterpretq_s64_u64): Likewise.
	(vreinterpretq_s64_u8): Likewise.
	(vreinterpretq_s8_s16): Likewise.
	(vreinterpretq_s8_s32): Likewise.
	(vreinterpretq_s8_s64): Likewise.
	(vreinterpretq_s8_u16): Likewise.
	(vreinterpretq_s8_u32): Likewise.
	(vreinterpretq_s8_u64): Likewise.
	(vreinterpretq_s8_u8): Likewise.
	(vreinterpretq_u16_s16): Likewise.
	(vreinterpretq_u16_s32): Likewise.
	(vreinterpretq_u16_s64): Likewise.
	(vreinterpretq_u16_s8): Likewise.
	(vreinterpretq_u16_u32): Likewise.
	(vreinterpretq_u16_u64): Likewise.
	(vreinterpretq_u16_u8): Likewise.
	(vreinterpretq_u32_s16): Likewise.
	(vreinterpretq_u32_s32): Likewise.
	(vreinterpretq_u32_s64): Likewise.
	(vreinterpretq_u32_s8): Likewise.
	(vreinterpretq_u32_u16): Likewise.
	(vreinterpretq_u32_u64): Likewise.
	(vreinterpretq_u32_u8): Likewise.
	(vreinterpretq_u64_s16): Likewise.
	(vreinterpretq_u64_s32): Likewise.
	(vreinterpretq_u64_s64): Likewise.
	(vreinterpretq_u64_s8): Likewise.
	(vreinterpretq_u64_u16): Likewise.
	(vreinterpretq_u64_u32): Likewise.
	(vreinterpretq_u64_u8): Likewise.
	(vreinterpretq_u8_s16): Likewise.
	(vreinterpretq_u8_s32): Likewise.
	(vreinterpretq_u8_s64): Likewise.
	(vreinterpretq_u8_s8): Likewise.
	(vreinterpretq_u8_u16): Likewise.
	(vreinterpretq_u8_u32): Likewise.
	(vreinterpretq_u8_u64): Likewise.
	(vreinterpretq_s32_f16): Likewise.
	(vreinterpretq_s32_f32): Likewise.
	(vreinterpretq_u16_f16): Likewise.
	(vreinterpretq_u16_f32): Likewise.
	(vreinterpretq_u32_f16): Likewise.
	(vreinterpretq_u32_f32): Likewise.
	(vreinterpretq_u64_f16): Likewise.
	(vreinterpretq_u64_f32): Likewise.
	(vreinterpretq_u8_f16): Likewise.
	(vreinterpretq_u8_f32): Likewise.
	(vreinterpretq_f16_f32): Likewise.
	(vreinterpretq_f16_s16): Likewise.
	(vreinterpretq_f16_s32): Likewise.
	(vreinterpretq_f16_s64): Likewise.
	(vreinterpretq_f16_s8): Likewise.
	(vreinterpretq_f16_u16): Likewise.
	(vreinterpretq_f16_u32): Likewise.
	(vreinterpretq_f16_u64): Likewise.
	(vreinterpretq_f16_u8): Likewise.
	(vreinterpretq_f32_f16): Likewise.
	(vreinterpretq_f32_s16): Likewise.
	(vreinterpretq_f32_s32): Likewise.
	(vreinterpretq_f32_s64): Likewise.
	(vreinterpretq_f32_s8): Likewise.
	(vreinterpretq_f32_u16): Likewise.
	(vreinterpretq_f32_u32): Likewise.
	(vreinterpretq_f32_u64): Likewise.
	(vreinterpretq_f32_u8): Likewise.
	(vreinterpretq_s16_f16): Likewise.
	(vreinterpretq_s16_f32): Likewise.
	(vreinterpretq_s64_f16): Likewise.
	(vreinterpretq_s64_f32): Likewise.
	(vreinterpretq_s8_f16): Likewise.
	(vreinterpretq_s8_f32): Likewise.
	(__arm_vreinterpretq_f16): Likewise.
	(__arm_vreinterpretq_f32): Likewise.
	(__arm_vreinterpretq_s16): Likewise.
	(__arm_vreinterpretq_s32): Likewise.
	(__arm_vreinterpretq_s64): Likewise.
	(__arm_vreinterpretq_s8): Likewise.
	(__arm_vreinterpretq_u16): Likewise.
	(__arm_vreinterpretq_u32): Likewise.
	(__arm_vreinterpretq_u64): Likewise.
	(__arm_vreinterpretq_u8): Likewise.
	* config/arm/arm_mve_types.h (__arm_vreinterpretq_s16_s32): Remove.
	(__arm_vreinterpretq_s16_s64): Likewise.
	(__arm_vreinterpretq_s16_s8): Likewise.
	(__arm_vreinterpretq_s16_u16): Likewise.
	(__arm_vreinterpretq_s16_u32): Likewise.
	(__arm_vreinterpretq_s16_u64): Likewise.
	(__arm_vreinterpretq_s16_u8): Likewise.
	(__arm_vreinterpretq_s32_s16): Likewise.
	(__arm_vreinterpretq_s32_s64): Likewise.
	(__arm_vreinterpretq_s32_s8): Likewise.
	(__arm_vreinterpretq_s32_u16): Likewise.
	(__arm_vreinterpretq_s32_u32): Likewise.
	(__arm_vreinterpretq_s32_u64): Likewise.
	(__arm_vreinterpretq_s32_u8): Likewise.
	(__arm_vreinterpretq_s64_s16): Likewise.
	(__arm_vreinterpretq_s64_s32): Likewise.
	(__arm_vreinterpretq_s64_s8): Likewise.
	(__arm_vreinterpretq_s64_u16): Likewise.
	(__arm_vreinterpretq_s64_u32): Likewise.
	(__arm_vreinterpretq_s64_u64): Likewise.
	(__arm_vreinterpretq_s64_u8): Likewise.
	(__arm_vreinterpretq_s8_s16): Likewise.
	(__arm_vreinterpretq_s8_s32): Likewise.
	(__arm_vreinterpretq_s8_s64): Likewise.
	(__arm_vreinterpretq_s8_u16): Likewise.
	(__arm_vreinterpretq_s8_u32): Likewise.
	(__arm_vreinterpretq_s8_u64): Likewise.
	(__arm_vreinterpretq_s8_u8): Likewise.
	(__arm_vreinterpretq_u16_s16): Likewise.
	(__arm_vreinterpretq_u16_s32): Likewise.
	(__arm_vreinterpretq_u16_s64): Likewise.
	(__arm_vreinterpretq_u16_s8): Likewise.
	(__arm_vreinterpretq_u16_u32): Likewise.
	(__arm_vreinterpretq_u16_u64): Likewise.
	(__arm_vreinterpretq_u16_u8): Likewise.
	(__arm_vreinterpretq_u32_s16): Likewise.
	(__arm_vreinterpretq_u32_s32): Likewise.
	(__arm_vreinterpretq_u32_s64): Likewise.
	(__arm_vreinterpretq_u32_s8): Likewise.
	(__arm_vreinterpretq_u32_u16): Likewise.
	(__arm_vreinterpretq_u32_u64): Likewise.
	(__arm_vreinterpretq_u32_u8): Likewise.
	(__arm_vreinterpretq_u64_s16): Likewise.
	(__arm_vreinterpretq_u64_s32): Likewise.
	(__arm_vreinterpretq_u64_s64): Likewise.
	(__arm_vreinterpretq_u64_s8): Likewise.
	(__arm_vreinterpretq_u64_u16): Likewise.
	(__arm_vreinterpretq_u64_u32): Likewise.
	(__arm_vreinterpretq_u64_u8): Likewise.
	(__arm_vreinterpretq_u8_s16): Likewise.
	(__arm_vreinterpretq_u8_s32): Likewise.
	(__arm_vreinterpretq_u8_s64): Likewise.
	(__arm_vreinterpretq_u8_s8): Likewise.
	(__arm_vreinterpretq_u8_u16): Likewise.
	(__arm_vreinterpretq_u8_u32): Likewise.
	(__arm_vreinterpretq_u8_u64): Likewise.
	(__arm_vreinterpretq_s32_f16): Likewise.
	(__arm_vreinterpretq_s32_f32): Likewise.
	(__arm_vreinterpretq_s16_f16): Likewise.
	(__arm_vreinterpretq_s16_f32): Likewise.
	(__arm_vreinterpretq_s64_f16): Likewise.
	(__arm_vreinterpretq_s64_f32): Likewise.
	(__arm_vreinterpretq_s8_f16): Likewise.
	(__arm_vreinterpretq_s8_f32): Likewise.
	(__arm_vreinterpretq_u16_f16): Likewise.
	(__arm_vreinterpretq_u16_f32): Likewise.
	(__arm_vreinterpretq_u32_f16): Likewise.
	(__arm_vreinterpretq_u32_f32): Likewise.
	(__arm_vreinterpretq_u64_f16): Likewise.
	(__arm_vreinterpretq_u64_f32): Likewise.
	(__arm_vreinterpretq_u8_f16): Likewise.
	(__arm_vreinterpretq_u8_f32): Likewise.
	(__arm_vreinterpretq_f16_f32): Likewise.
	(__arm_vreinterpretq_f16_s16): Likewise.
	(__arm_vreinterpretq_f16_s32): Likewise.
	(__arm_vreinterpretq_f16_s64): Likewise.
	(__arm_vreinterpretq_f16_s8): Likewise.
	(__arm_vreinterpretq_f16_u16): Likewise.
	(__arm_vreinterpretq_f16_u32): Likewise.
	(__arm_vreinterpretq_f16_u64): Likewise.
	(__arm_vreinterpretq_f16_u8): Likewise.
	(__arm_vreinterpretq_f32_f16): Likewise.
	(__arm_vreinterpretq_f32_s16): Likewise.
	(__arm_vreinterpretq_f32_s32): Likewise.
	(__arm_vreinterpretq_f32_s64): Likewise.
	(__arm_vreinterpretq_f32_s8): Likewise.
	(__arm_vreinterpretq_f32_u16): Likewise.
	(__arm_vreinterpretq_f32_u32): Likewise.
	(__arm_vreinterpretq_f32_u64): Likewise.
	(__arm_vreinterpretq_f32_u8): Likewise.
	(__arm_vreinterpretq_s16): Likewise.
	(__arm_vreinterpretq_s32): Likewise.
	(__arm_vreinterpretq_s64): Likewise.
	(__arm_vreinterpretq_s8): Likewise.
	(__arm_vreinterpretq_u16): Likewise.
	(__arm_vreinterpretq_u32): Likewise.
	(__arm_vreinterpretq_u64): Likewise.
	(__arm_vreinterpretq_u8): Likewise.
	(__arm_vreinterpretq_f16): Likewise.
	(__arm_vreinterpretq_f32): Likewise.
	* config/arm/mve.md (@arm_mve_reinterpret<mode>): New pattern.
	* config/arm/unspecs.md (REINTERPRET): New unspec.

	gcc/testsuite/
	* g++.target/arm/mve.exp: Add general-c++ and general directories.
	* g++.target/arm/mve/general-c++/nomve_fp_1.c: New test.
	* g++.target/arm/mve/general-c++/vreinterpretq_1.C: New test.
	* gcc.target/arm/mve/general-c/nomve_fp_1.c: New test.
	* gcc.target/arm/mve/general-c/vreinterpretq_1.c: New test.
---
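A note for reviewers: a minimal sketch of the user-visible behavior
this patch preserves.  The function names f1/f2 are illustrative only;
the new tests added below exercise similar cases.

  #include <arm_mve.h>

  int16x8_t f1 (int8x16_t a)
  {
    return vreinterpretq_s16_s8 (a); /* explicit source type */
  }

  int16x8_t f2 (int8x16_t a)
  {
    return vreinterpretq_s16 (a);    /* overloaded, source type inferred */
  }

Both forms are now registered as builtins when arm_mve.h is included
and, where targetm.can_change_mode_class allows it, the call folds to a
VIEW_CONVERT_EXPR in gimple rather than going through the _Generic
dispatch removed below.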
 gcc/config/arm/arm-mve-builtins-base.cc       |   29 +
 gcc/config/arm/arm-mve-builtins-base.def      |    2 +
 gcc/config/arm/arm-mve-builtins-base.h        |    2 +
 gcc/config/arm/arm-mve-builtins-shapes.cc     |   28 +
 gcc/config/arm/arm-mve-builtins-shapes.h      |    8 +
 gcc/config/arm/arm-mve-builtins.cc            |   60 +
 gcc/config/arm/arm_mve.h                      |  300 ----
 gcc/config/arm/arm_mve_types.h                | 1365 +----------------
 gcc/config/arm/mve.md                         |   18 +
 gcc/config/arm/unspecs.md                     |    1 +
 gcc/testsuite/g++.target/arm/mve.exp          |    8 +-
 .../arm/mve/general-c++/nomve_fp_1.c          |   15 +
 .../arm/mve/general-c++/vreinterpretq_1.C     |   25 +
 .../gcc.target/arm/mve/general-c/nomve_fp_1.c |   15 +
 .../arm/mve/general-c/vreinterpretq_1.c       |   25 +
 15 files changed, 286 insertions(+), 1615 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c
 create mode 100644 gcc/testsuite/g++.target/arm/mve/general-c++/vreinterpretq_1.C
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/nomve_fp_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/vreinterpretq_1.c
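
For reviewers, a rough sketch of how the table-driven registration fits
together: with REQUIRES_FLOAT defined to false, the integer entry in
arm-mve-builtins-base.def

  DEF_MVE_FUNCTION (vreinterpretq, unary_convert, reinterpret_integer, none)

expands via the DEF_MVE_FUNCTION macro in arm-mve-builtins.cc into one
function_groups record, roughly:

  { "vreinterpretq", &functions::vreinterpretq, &shapes::unary_convert,
    types_reinterpret_integer, preds_none, false },

and handle_arm_mve_h then registers every group in that table.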

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index e9f285faf2b..ad8d500afc6 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -38,8 +38,37 @@ using namespace arm_mve;
 
 namespace {
 
+/* Implements vreinterpretq_* intrinsics.  */
+class vreinterpretq_impl : public quiet<function_base>
+{
+  gimple *
+  fold (gimple_folder &f) const override
+  {
+    /* Punt to rtl if the effect of the reinterpret on registers does not
+       conform to GCC's endianness model.  */
+    if (!targetm.can_change_mode_class (f.vector_mode (0),
+					f.vector_mode (1), VFP_REGS))
+      return NULL;
+
+    /* Otherwise vreinterpret corresponds directly to a VIEW_CONVERT_EXPR
+       reinterpretation.  */
+    tree rhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (f.lhs),
+		       gimple_call_arg (f.call, 0));
+    return gimple_build_assign (f.lhs, VIEW_CONVERT_EXPR, rhs);
+  }
+
+  rtx
+  expand (function_expander &e) const override
+  {
+    machine_mode mode = e.vector_mode (0);
+    return e.use_exact_insn (code_for_arm_mve_reinterpret (mode));
+  }
+};
+
 } /* end anonymous namespace */
 
 namespace arm_mve {
 
+FUNCTION (vreinterpretq, vreinterpretq_impl,)
+
 } /* end namespace arm_mve */
diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
index d15ba2e23e8..5c0c1b9cee7 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -18,7 +18,9 @@
    <http://www.gnu.org/licenses/>.  */
 
 #define REQUIRES_FLOAT false
+DEF_MVE_FUNCTION (vreinterpretq, unary_convert, reinterpret_integer, none)
 #undef REQUIRES_FLOAT
 
 #define REQUIRES_FLOAT true
+DEF_MVE_FUNCTION (vreinterpretq, unary_convert, reinterpret_float, none)
 #undef REQUIRES_FLOAT
diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h
index c4d7b750cd5..60e7bd24eda 100644
--- a/gcc/config/arm/arm-mve-builtins-base.h
+++ b/gcc/config/arm/arm-mve-builtins-base.h
@@ -23,6 +23,8 @@
 namespace arm_mve {
 namespace functions {
 
+extern const function_base *const vreinterpretq;
+
 } /* end namespace arm_mve::functions */
 } /* end namespace arm_mve */
 
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
index f20660d8319..d0da0ffef91 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -338,6 +338,34 @@ struct overloaded_base : public function_shape
   }
 };
 
+/* <T0>_t foo_t0[_t1](<T1>_t)
+
+   where the target type <t0> must be specified explicitly but the source
+   type <t1> can be inferred.
+
+   Example: vreinterpretq.
+   int16x8_t [__arm_]vreinterpretq_s16[_s8](int8x16_t a)
+   int32x4_t [__arm_]vreinterpretq_s32[_s8](int8x16_t a)
+   int8x16_t [__arm_]vreinterpretq_s8[_s16](int16x8_t a)
+   int8x16_t [__arm_]vreinterpretq_s8[_s32](int32x4_t a)  */
+struct unary_convert_def : public overloaded_base<1>
+{
+  void
+  build (function_builder &b, const function_group_info &group,
+	 bool preserve_user_namespace) const override
+  {
+    b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
+    build_all (b, "v0,v1", group, MODE_none, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+    return r.resolve_unary ();
+  }
+};
+SHAPE (unary_convert)
+
 } /* end namespace arm_mve */
 
 #undef SHAPE
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
index 9e353b85a76..04d19a02890 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -22,8 +22,16 @@
 
 namespace arm_mve
 {
+  /* The naming convention is:
+
+     - to use names like "unary" etc. if the rules are somewhat generic,
+       especially if there are no ranges involved.  */
+
   namespace shapes
   {
+
+    extern const function_shape *const unary_convert;
+
   } /* end namespace arm_mve::shapes */
 } /* end namespace arm_mve */
 
diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc
index b0cceb75ceb..e409a029346 100644
--- a/gcc/config/arm/arm-mve-builtins.cc
+++ b/gcc/config/arm/arm-mve-builtins.cc
@@ -199,6 +199,52 @@ CONSTEXPR const type_suffix_info type_suffixes[NUM_TYPE_SUFFIXES + 1] = {
 #define TYPES_signed_32(S, D) \
   S (s32)
 
+#define TYPES_reinterpret_signed1(D, A) \
+  D (A, s8), D (A, s16), D (A, s32), D (A, s64)
+
+#define TYPES_reinterpret_unsigned1(D, A) \
+  D (A, u8), D (A, u16), D (A, u32), D (A, u64)
+
+#define TYPES_reinterpret_integer(S, D) \
+  TYPES_reinterpret_unsigned1 (D, s8), \
+  D (s8, s16), D (s8, s32), D (s8, s64), \
+  TYPES_reinterpret_unsigned1 (D, s16), \
+  D (s16, s8), D (s16, s32), D (s16, s64), \
+  TYPES_reinterpret_unsigned1 (D, s32), \
+  D (s32, s8), D (s32, s16), D (s32, s64), \
+  TYPES_reinterpret_unsigned1 (D, s64), \
+  D (s64, s8), D (s64, s16), D (s64, s32), \
+  TYPES_reinterpret_signed1 (D, u8), \
+  D (u8, u16), D (u8, u32), D (u8, u64), \
+  TYPES_reinterpret_signed1 (D, u16), \
+  D (u16, u8), D (u16, u32), D (u16, u64), \
+  TYPES_reinterpret_signed1 (D, u32), \
+  D (u32, u8), D (u32, u16), D (u32, u64), \
+  TYPES_reinterpret_signed1 (D, u64), \
+  D (u64, u8), D (u64, u16), D (u64, u32)
+
+/* { _s8  _s16 _s32 _s64 } x { _s8  _s16 _s32 _s64 }
+   { _u8  _u16 _u32 _u64 }   { _u8  _u16 _u32 _u64 }.  */
+#define TYPES_reinterpret_integer1(D, A) \
+  TYPES_reinterpret_signed1 (D, A), \
+  TYPES_reinterpret_unsigned1 (D, A)
+
+#define TYPES_reinterpret_float1(D, A) \
+  D (A, f16), D (A, f32)
+
+#define TYPES_reinterpret_float(S, D) \
+  TYPES_reinterpret_float1 (D, s8), \
+  TYPES_reinterpret_float1 (D, s16), \
+  TYPES_reinterpret_float1 (D, s32), \
+  TYPES_reinterpret_float1 (D, s64), \
+  TYPES_reinterpret_float1 (D, u8), \
+  TYPES_reinterpret_float1 (D, u16), \
+  TYPES_reinterpret_float1 (D, u32), \
+  TYPES_reinterpret_float1 (D, u64), \
+  TYPES_reinterpret_integer1 (D, f16), \
+  TYPES_reinterpret_integer1 (D, f32), \
+  D (f16, f32), D (f32, f16)
+
 /* Describe a pair of type suffixes in which only the first is used.  */
 #define DEF_VECTOR_TYPE(X) { TYPE_SUFFIX_ ## X, NUM_TYPE_SUFFIXES }
 
@@ -231,6 +277,8 @@ DEF_MVE_TYPES_ARRAY (integer_16_32);
 DEF_MVE_TYPES_ARRAY (integer_32);
 DEF_MVE_TYPES_ARRAY (signed_16_32);
 DEF_MVE_TYPES_ARRAY (signed_32);
+DEF_MVE_TYPES_ARRAY (reinterpret_integer);
+DEF_MVE_TYPES_ARRAY (reinterpret_float);
 
 /* Used by functions that have no governing predicate.  */
 static const predication_index preds_none[] = { PRED_none, NUM_PREDS };
@@ -253,6 +301,14 @@ static const predication_index preds_p_or_none[] = {
   PRED_p, PRED_none, NUM_PREDS
 };
 
+/* A list of all MVE ACLE functions.  */
+static CONSTEXPR const function_group_info function_groups[] = {
+#define DEF_MVE_FUNCTION(NAME, SHAPE, TYPES, PREDS)			\
+  { #NAME, &functions::NAME, &shapes::SHAPE, types_##TYPES, preds_##PREDS, \
+    REQUIRES_FLOAT },
+#include "arm-mve-builtins.def"
+};
+
 /* The scalar type associated with each vector type.  */
 extern GTY(()) tree scalar_types[NUM_VECTOR_TYPES];
 tree scalar_types[NUM_VECTOR_TYPES];
@@ -431,6 +487,10 @@ handle_arm_mve_h (bool preserve_user_namespace)
 
   /* Define MVE functions.  */
   function_table = new hash_table<registered_function_hasher> (1023);
+  function_builder builder;
+  for (unsigned int i = 0; i < ARRAY_SIZE (function_groups); ++i)
+    builder.register_function_group (function_groups[i],
+				     preserve_user_namespace);
 }
 
 /* Return true if CANDIDATE is equivalent to MODEL_TYPE for overloading
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 0d2ba968fc0..7688b5a7e53 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -358,14 +358,6 @@
 #define vstrwq_scatter_shifted_offset_p(__base, __offset, __value, __p) __arm_vstrwq_scatter_shifted_offset_p(__base, __offset, __value, __p)
 #define vstrwq_scatter_shifted_offset(__base, __offset, __value) __arm_vstrwq_scatter_shifted_offset(__base, __offset, __value)
 #define vuninitializedq(__v) __arm_vuninitializedq(__v)
-#define vreinterpretq_s16(__a) __arm_vreinterpretq_s16(__a)
-#define vreinterpretq_s32(__a) __arm_vreinterpretq_s32(__a)
-#define vreinterpretq_s64(__a) __arm_vreinterpretq_s64(__a)
-#define vreinterpretq_s8(__a) __arm_vreinterpretq_s8(__a)
-#define vreinterpretq_u16(__a) __arm_vreinterpretq_u16(__a)
-#define vreinterpretq_u32(__a) __arm_vreinterpretq_u32(__a)
-#define vreinterpretq_u64(__a) __arm_vreinterpretq_u64(__a)
-#define vreinterpretq_u8(__a) __arm_vreinterpretq_u8(__a)
 #define vddupq_m(__inactive, __a, __imm, __p) __arm_vddupq_m(__inactive, __a, __imm, __p)
 #define vddupq_u8(__a, __imm) __arm_vddupq_u8(__a, __imm)
 #define vddupq_u32(__a, __imm) __arm_vddupq_u32(__a, __imm)
@@ -518,8 +510,6 @@
 #define vfmsq_m(__a, __b, __c, __p) __arm_vfmsq_m(__a, __b, __c, __p)
 #define vmaxnmq_m(__inactive, __a, __b, __p) __arm_vmaxnmq_m(__inactive, __a, __b, __p)
 #define vminnmq_m(__inactive, __a, __b, __p) __arm_vminnmq_m(__inactive, __a, __b, __p)
-#define vreinterpretq_f16(__a) __arm_vreinterpretq_f16(__a)
-#define vreinterpretq_f32(__a) __arm_vreinterpretq_f32(__a)
 #define vminnmq_x(__a, __b, __p) __arm_vminnmq_x(__a, __b, __p)
 #define vmaxnmq_x(__a, __b, __p) __arm_vmaxnmq_x(__a, __b, __p)
 #define vcmulq_x(__a, __b, __p) __arm_vcmulq_x(__a, __b, __p)
@@ -2365,96 +2355,6 @@
 #define vaddq_u32(__a, __b) __arm_vaddq_u32(__a, __b)
 #define vaddq_f16(__a, __b) __arm_vaddq_f16(__a, __b)
 #define vaddq_f32(__a, __b) __arm_vaddq_f32(__a, __b)
-#define vreinterpretq_s16_s32(__a) __arm_vreinterpretq_s16_s32(__a)
-#define vreinterpretq_s16_s64(__a) __arm_vreinterpretq_s16_s64(__a)
-#define vreinterpretq_s16_s8(__a) __arm_vreinterpretq_s16_s8(__a)
-#define vreinterpretq_s16_u16(__a) __arm_vreinterpretq_s16_u16(__a)
-#define vreinterpretq_s16_u32(__a) __arm_vreinterpretq_s16_u32(__a)
-#define vreinterpretq_s16_u64(__a) __arm_vreinterpretq_s16_u64(__a)
-#define vreinterpretq_s16_u8(__a) __arm_vreinterpretq_s16_u8(__a)
-#define vreinterpretq_s32_s16(__a) __arm_vreinterpretq_s32_s16(__a)
-#define vreinterpretq_s32_s64(__a) __arm_vreinterpretq_s32_s64(__a)
-#define vreinterpretq_s32_s8(__a) __arm_vreinterpretq_s32_s8(__a)
-#define vreinterpretq_s32_u16(__a) __arm_vreinterpretq_s32_u16(__a)
-#define vreinterpretq_s32_u32(__a) __arm_vreinterpretq_s32_u32(__a)
-#define vreinterpretq_s32_u64(__a) __arm_vreinterpretq_s32_u64(__a)
-#define vreinterpretq_s32_u8(__a) __arm_vreinterpretq_s32_u8(__a)
-#define vreinterpretq_s64_s16(__a) __arm_vreinterpretq_s64_s16(__a)
-#define vreinterpretq_s64_s32(__a) __arm_vreinterpretq_s64_s32(__a)
-#define vreinterpretq_s64_s8(__a) __arm_vreinterpretq_s64_s8(__a)
-#define vreinterpretq_s64_u16(__a) __arm_vreinterpretq_s64_u16(__a)
-#define vreinterpretq_s64_u32(__a) __arm_vreinterpretq_s64_u32(__a)
-#define vreinterpretq_s64_u64(__a) __arm_vreinterpretq_s64_u64(__a)
-#define vreinterpretq_s64_u8(__a) __arm_vreinterpretq_s64_u8(__a)
-#define vreinterpretq_s8_s16(__a) __arm_vreinterpretq_s8_s16(__a)
-#define vreinterpretq_s8_s32(__a) __arm_vreinterpretq_s8_s32(__a)
-#define vreinterpretq_s8_s64(__a) __arm_vreinterpretq_s8_s64(__a)
-#define vreinterpretq_s8_u16(__a) __arm_vreinterpretq_s8_u16(__a)
-#define vreinterpretq_s8_u32(__a) __arm_vreinterpretq_s8_u32(__a)
-#define vreinterpretq_s8_u64(__a) __arm_vreinterpretq_s8_u64(__a)
-#define vreinterpretq_s8_u8(__a) __arm_vreinterpretq_s8_u8(__a)
-#define vreinterpretq_u16_s16(__a) __arm_vreinterpretq_u16_s16(__a)
-#define vreinterpretq_u16_s32(__a) __arm_vreinterpretq_u16_s32(__a)
-#define vreinterpretq_u16_s64(__a) __arm_vreinterpretq_u16_s64(__a)
-#define vreinterpretq_u16_s8(__a) __arm_vreinterpretq_u16_s8(__a)
-#define vreinterpretq_u16_u32(__a) __arm_vreinterpretq_u16_u32(__a)
-#define vreinterpretq_u16_u64(__a) __arm_vreinterpretq_u16_u64(__a)
-#define vreinterpretq_u16_u8(__a) __arm_vreinterpretq_u16_u8(__a)
-#define vreinterpretq_u32_s16(__a) __arm_vreinterpretq_u32_s16(__a)
-#define vreinterpretq_u32_s32(__a) __arm_vreinterpretq_u32_s32(__a)
-#define vreinterpretq_u32_s64(__a) __arm_vreinterpretq_u32_s64(__a)
-#define vreinterpretq_u32_s8(__a) __arm_vreinterpretq_u32_s8(__a)
-#define vreinterpretq_u32_u16(__a) __arm_vreinterpretq_u32_u16(__a)
-#define vreinterpretq_u32_u64(__a) __arm_vreinterpretq_u32_u64(__a)
-#define vreinterpretq_u32_u8(__a) __arm_vreinterpretq_u32_u8(__a)
-#define vreinterpretq_u64_s16(__a) __arm_vreinterpretq_u64_s16(__a)
-#define vreinterpretq_u64_s32(__a) __arm_vreinterpretq_u64_s32(__a)
-#define vreinterpretq_u64_s64(__a) __arm_vreinterpretq_u64_s64(__a)
-#define vreinterpretq_u64_s8(__a) __arm_vreinterpretq_u64_s8(__a)
-#define vreinterpretq_u64_u16(__a) __arm_vreinterpretq_u64_u16(__a)
-#define vreinterpretq_u64_u32(__a) __arm_vreinterpretq_u64_u32(__a)
-#define vreinterpretq_u64_u8(__a) __arm_vreinterpretq_u64_u8(__a)
-#define vreinterpretq_u8_s16(__a) __arm_vreinterpretq_u8_s16(__a)
-#define vreinterpretq_u8_s32(__a) __arm_vreinterpretq_u8_s32(__a)
-#define vreinterpretq_u8_s64(__a) __arm_vreinterpretq_u8_s64(__a)
-#define vreinterpretq_u8_s8(__a) __arm_vreinterpretq_u8_s8(__a)
-#define vreinterpretq_u8_u16(__a) __arm_vreinterpretq_u8_u16(__a)
-#define vreinterpretq_u8_u32(__a) __arm_vreinterpretq_u8_u32(__a)
-#define vreinterpretq_u8_u64(__a) __arm_vreinterpretq_u8_u64(__a)
-#define vreinterpretq_s32_f16(__a) __arm_vreinterpretq_s32_f16(__a)
-#define vreinterpretq_s32_f32(__a) __arm_vreinterpretq_s32_f32(__a)
-#define vreinterpretq_u16_f16(__a) __arm_vreinterpretq_u16_f16(__a)
-#define vreinterpretq_u16_f32(__a) __arm_vreinterpretq_u16_f32(__a)
-#define vreinterpretq_u32_f16(__a) __arm_vreinterpretq_u32_f16(__a)
-#define vreinterpretq_u32_f32(__a) __arm_vreinterpretq_u32_f32(__a)
-#define vreinterpretq_u64_f16(__a) __arm_vreinterpretq_u64_f16(__a)
-#define vreinterpretq_u64_f32(__a) __arm_vreinterpretq_u64_f32(__a)
-#define vreinterpretq_u8_f16(__a) __arm_vreinterpretq_u8_f16(__a)
-#define vreinterpretq_u8_f32(__a) __arm_vreinterpretq_u8_f32(__a)
-#define vreinterpretq_f16_f32(__a) __arm_vreinterpretq_f16_f32(__a)
-#define vreinterpretq_f16_s16(__a) __arm_vreinterpretq_f16_s16(__a)
-#define vreinterpretq_f16_s32(__a) __arm_vreinterpretq_f16_s32(__a)
-#define vreinterpretq_f16_s64(__a) __arm_vreinterpretq_f16_s64(__a)
-#define vreinterpretq_f16_s8(__a) __arm_vreinterpretq_f16_s8(__a)
-#define vreinterpretq_f16_u16(__a) __arm_vreinterpretq_f16_u16(__a)
-#define vreinterpretq_f16_u32(__a) __arm_vreinterpretq_f16_u32(__a)
-#define vreinterpretq_f16_u64(__a) __arm_vreinterpretq_f16_u64(__a)
-#define vreinterpretq_f16_u8(__a) __arm_vreinterpretq_f16_u8(__a)
-#define vreinterpretq_f32_f16(__a) __arm_vreinterpretq_f32_f16(__a)
-#define vreinterpretq_f32_s16(__a) __arm_vreinterpretq_f32_s16(__a)
-#define vreinterpretq_f32_s32(__a) __arm_vreinterpretq_f32_s32(__a)
-#define vreinterpretq_f32_s64(__a) __arm_vreinterpretq_f32_s64(__a)
-#define vreinterpretq_f32_s8(__a) __arm_vreinterpretq_f32_s8(__a)
-#define vreinterpretq_f32_u16(__a) __arm_vreinterpretq_f32_u16(__a)
-#define vreinterpretq_f32_u32(__a) __arm_vreinterpretq_f32_u32(__a)
-#define vreinterpretq_f32_u64(__a) __arm_vreinterpretq_f32_u64(__a)
-#define vreinterpretq_f32_u8(__a) __arm_vreinterpretq_f32_u8(__a)
-#define vreinterpretq_s16_f16(__a) __arm_vreinterpretq_s16_f16(__a)
-#define vreinterpretq_s16_f32(__a) __arm_vreinterpretq_s16_f32(__a)
-#define vreinterpretq_s64_f16(__a) __arm_vreinterpretq_s64_f16(__a)
-#define vreinterpretq_s64_f32(__a) __arm_vreinterpretq_s64_f32(__a)
-#define vreinterpretq_s8_f16(__a) __arm_vreinterpretq_s8_f16(__a)
-#define vreinterpretq_s8_f32(__a) __arm_vreinterpretq_s8_f32(__a)
 #define vuninitializedq_u8(void) __arm_vuninitializedq_u8(void)
 #define vuninitializedq_u16(void) __arm_vuninitializedq_u16(void)
 #define vuninitializedq_u32(void) __arm_vuninitializedq_u32(void)
@@ -37874,126 +37774,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t]: __arm_vuninitializedq_f16 (), \
   int (*)[__ARM_mve_type_float32x4_t]: __arm_vuninitializedq_f32 ());})
 
-#define __arm_vreinterpretq_f16(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_f16_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_f16_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_f16_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_f16_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_f16_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_f16_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_f16_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_f16_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
-  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_f16_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
-
-#define __arm_vreinterpretq_f32(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_f32_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_f32_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_f32_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_f32_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_f32_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_f32_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_f32_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_f32_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
-  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_f32_f16 (__ARM_mve_coerce(__p0, float16x8_t)));})
-
-#define __arm_vreinterpretq_s16(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_s16_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_s16_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_s16_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_s16_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s16_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s16_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s16_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s16_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
-  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_s16_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
-
-#define __arm_vreinterpretq_s32(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_s32_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_s32_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_s32_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_s32_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s32_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s32_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s32_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s32_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
-  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_s32_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
-
-#define __arm_vreinterpretq_s64(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_s64_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_s64_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_s64_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_s64_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s64_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s64_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s64_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s64_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
-  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_s64_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
-
-#define __arm_vreinterpretq_s8(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_s8_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_s8_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_s8_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_s8_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s8_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s8_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s8_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s8_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
-  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_s8_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
-
-#define __arm_vreinterpretq_u16(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_u16_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u16_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u16_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u16_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_u16_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u16_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_u16_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_u16_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
-  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_u16_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
-
-#define __arm_vreinterpretq_u32(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_u32_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u32_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u32_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u32_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_u32_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_u32_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u32_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_u32_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
-  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_u32_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
-
-#define __arm_vreinterpretq_u64(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_u64_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u64_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u64_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u64_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_u64_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_u64_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_u64_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u64_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
-  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_u64_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
-
-#define __arm_vreinterpretq_u8(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_u8_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u8_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u8_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u8_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u8_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_u8_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_u8_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_u8_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
-  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_u8_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
-
 #define __arm_vstrwq_scatter_base_wb(p0,p1,p2) ({ __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p2)])0, \
   int (*)[__ARM_mve_type_int32x4_t]: __arm_vstrwq_scatter_base_wb_s32 (p0, p1, __ARM_mve_coerce(__p2, int32x4_t)), \
@@ -39931,86 +39711,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint32x4_t]: __arm_vuninitializedq_u32 (), \
   int (*)[__ARM_mve_type_uint64x2_t]: __arm_vuninitializedq_u64 ());})
 
-#define __arm_vreinterpretq_s16(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_s16_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_s16_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_s16_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s16_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s16_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s16_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s16_u64 (__ARM_mve_coerce(__p0, uint64x2_t)));})
-
-#define __arm_vreinterpretq_s32(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_s32_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_s32_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_s32_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s32_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s32_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s32_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s32_u64 (__ARM_mve_coerce(__p0, uint64x2_t)));})
-
-#define __arm_vreinterpretq_s64(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_s64_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_s64_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_s64_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s64_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s64_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s64_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s64_u64 (__ARM_mve_coerce(__p0, uint64x2_t)));})
-
-#define __arm_vreinterpretq_s8(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_s8_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_s8_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_s8_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s8_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s8_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s8_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s8_u64 (__ARM_mve_coerce(__p0, uint64x2_t)));})
-
-#define __arm_vreinterpretq_u16(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u16_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u16_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u16_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_u16_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u16_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_u16_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_u16_u64 (__ARM_mve_coerce(__p0, uint64x2_t)));})
-
-#define __arm_vreinterpretq_u32(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u32_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u32_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u32_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_u32_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_u32_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u32_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_u32_u64 (__ARM_mve_coerce(__p0, uint64x2_t)));})
-
-#define __arm_vreinterpretq_u64(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u64_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u64_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u64_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_u64_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_u64_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_u64_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u64_s64 (__ARM_mve_coerce(__p0, int64x2_t)));})
-
-#define __arm_vreinterpretq_u8(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u8_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u8_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u8_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u8_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_u8_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_u8_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_u8_u64 (__ARM_mve_coerce(__p0, uint64x2_t)));})
-
 #define __arm_vabsq_x(p1,p2) ({ __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p1)])0, \
   int (*)[__ARM_mve_type_int8x16_t]: __arm_vabsq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), p2), \
diff --git a/gcc/config/arm/arm_mve_types.h b/gcc/config/arm/arm_mve_types.h
index 12bb519142f..ae2591faa03 100644
--- a/gcc/config/arm/arm_mve_types.h
+++ b/gcc/config/arm/arm_mve_types.h
@@ -29,1124 +29,101 @@ typedef float float32_t;
 
 #pragma GCC arm "arm_mve_types.h"
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16_s32 (int32x4_t __a)
-{
-  return (int16x8_t)  __a;
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16_s64 (int64x2_t __a)
-{
-  return (int16x8_t)  __a;
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16_s8 (int8x16_t __a)
-{
-  return (int16x8_t)  __a;
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16_u16 (uint16x8_t __a)
-{
-  return (int16x8_t)  __a;
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16_u32 (uint32x4_t __a)
-{
-  return (int16x8_t)  __a;
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16_u64 (uint64x2_t __a)
-{
-  return (int16x8_t)  __a;
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16_u8 (uint8x16_t __a)
-{
-  return (int16x8_t)  __a;
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32_s16 (int16x8_t __a)
-{
-  return (int32x4_t)  __a;
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32_s64 (int64x2_t __a)
-{
-  return (int32x4_t)  __a;
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32_s8 (int8x16_t __a)
-{
-  return (int32x4_t)  __a;
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32_u16 (uint16x8_t __a)
-{
-  return (int32x4_t)  __a;
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32_u32 (uint32x4_t __a)
-{
-  return (int32x4_t)  __a;
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32_u64 (uint64x2_t __a)
-{
-  return (int32x4_t)  __a;
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32_u8 (uint8x16_t __a)
-{
-  return (int32x4_t)  __a;
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64_s16 (int16x8_t __a)
-{
-  return (int64x2_t)  __a;
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64_s32 (int32x4_t __a)
-{
-  return (int64x2_t)  __a;
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64_s8 (int8x16_t __a)
-{
-  return (int64x2_t)  __a;
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64_u16 (uint16x8_t __a)
-{
-  return (int64x2_t)  __a;
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64_u32 (uint32x4_t __a)
-{
-  return (int64x2_t)  __a;
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64_u64 (uint64x2_t __a)
-{
-  return (int64x2_t)  __a;
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64_u8 (uint8x16_t __a)
-{
-  return (int64x2_t)  __a;
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8_s16 (int16x8_t __a)
-{
-  return (int8x16_t)  __a;
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8_s32 (int32x4_t __a)
-{
-  return (int8x16_t)  __a;
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8_s64 (int64x2_t __a)
-{
-  return (int8x16_t)  __a;
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8_u16 (uint16x8_t __a)
-{
-  return (int8x16_t)  __a;
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8_u32 (uint32x4_t __a)
-{
-  return (int8x16_t)  __a;
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8_u64 (uint64x2_t __a)
-{
-  return (int8x16_t)  __a;
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8_u8 (uint8x16_t __a)
-{
-  return (int8x16_t)  __a;
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16_s16 (int16x8_t __a)
-{
-  return (uint16x8_t)  __a;
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16_s32 (int32x4_t __a)
-{
-  return (uint16x8_t)  __a;
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16_s64 (int64x2_t __a)
-{
-  return (uint16x8_t)  __a;
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16_s8 (int8x16_t __a)
-{
-  return (uint16x8_t)  __a;
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16_u32 (uint32x4_t __a)
-{
-  return (uint16x8_t)  __a;
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16_u64 (uint64x2_t __a)
-{
-  return (uint16x8_t)  __a;
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16_u8 (uint8x16_t __a)
-{
-  return (uint16x8_t)  __a;
-}
-
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32_s16 (int16x8_t __a)
-{
-  return (uint32x4_t)  __a;
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32_s32 (int32x4_t __a)
-{
-  return (uint32x4_t)  __a;
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32_s64 (int64x2_t __a)
-{
-  return (uint32x4_t)  __a;
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32_s8 (int8x16_t __a)
-{
-  return (uint32x4_t)  __a;
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32_u16 (uint16x8_t __a)
-{
-  return (uint32x4_t)  __a;
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32_u64 (uint64x2_t __a)
-{
-  return (uint32x4_t)  __a;
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32_u8 (uint8x16_t __a)
-{
-  return (uint32x4_t)  __a;
-}
-
-__extension__ extern __inline uint64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64_s16 (int16x8_t __a)
-{
-  return (uint64x2_t)  __a;
-}
-
-__extension__ extern __inline uint64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64_s32 (int32x4_t __a)
-{
-  return (uint64x2_t)  __a;
-}
-
-__extension__ extern __inline uint64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64_s64 (int64x2_t __a)
-{
-  return (uint64x2_t)  __a;
-}
-
-__extension__ extern __inline uint64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64_s8 (int8x16_t __a)
-{
-  return (uint64x2_t)  __a;
-}
-
-__extension__ extern __inline uint64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64_u16 (uint16x8_t __a)
-{
-  return (uint64x2_t)  __a;
-}
-
-__extension__ extern __inline uint64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64_u32 (uint32x4_t __a)
-{
-  return (uint64x2_t)  __a;
-}
-
-__extension__ extern __inline uint64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64_u8 (uint8x16_t __a)
-{
-  return (uint64x2_t)  __a;
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8_s16 (int16x8_t __a)
-{
-  return (uint8x16_t)  __a;
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8_s32 (int32x4_t __a)
-{
-  return (uint8x16_t)  __a;
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8_s64 (int64x2_t __a)
-{
-  return (uint8x16_t)  __a;
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8_s8 (int8x16_t __a)
-{
-  return (uint8x16_t)  __a;
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8_u16 (uint16x8_t __a)
-{
-  return (uint8x16_t)  __a;
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8_u32 (uint32x4_t __a)
-{
-  return (uint8x16_t)  __a;
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8_u64 (uint64x2_t __a)
-{
-  return (uint8x16_t)  __a;
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq_u8 (void)
-{
-  uint8x16_t __uninit;
-  __asm__ ("": "=w"(__uninit));
-  return __uninit;
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq_u16 (void)
-{
-  uint16x8_t __uninit;
-  __asm__ ("": "=w"(__uninit));
-  return __uninit;
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq_u32 (void)
-{
-  uint32x4_t __uninit;
-  __asm__ ("": "=w"(__uninit));
-  return __uninit;
-}
-
-__extension__ extern __inline uint64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq_u64 (void)
-{
-  uint64x2_t __uninit;
-  __asm__ ("": "=w"(__uninit));
-  return __uninit;
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq_s8 (void)
-{
-  int8x16_t __uninit;
-  __asm__ ("": "=w"(__uninit));
-  return __uninit;
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq_s16 (void)
-{
-  int16x8_t __uninit;
-  __asm__ ("": "=w"(__uninit));
-  return __uninit;
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq_s32 (void)
-{
-  int32x4_t __uninit;
-  __asm__ ("": "=w"(__uninit));
-  return __uninit;
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq_s64 (void)
-{
-  int64x2_t __uninit;
-  __asm__ ("": "=w"(__uninit));
-  return __uninit;
-}
-
-#if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32_f16 (float16x8_t __a)
-{
-  return (int32x4_t)  __a;
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32_f32 (float32x4_t __a)
-{
-  return (int32x4_t)  __a;
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16_f16 (float16x8_t __a)
-{
-  return (int16x8_t)  __a;
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16_f32 (float32x4_t __a)
-{
-  return (int16x8_t)  __a;
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64_f16 (float16x8_t __a)
-{
-  return (int64x2_t)  __a;
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64_f32 (float32x4_t __a)
-{
-  return (int64x2_t)  __a;
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8_f16 (float16x8_t __a)
-{
-  return (int8x16_t)  __a;
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8_f32 (float32x4_t __a)
-{
-  return (int8x16_t)  __a;
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16_f16 (float16x8_t __a)
-{
-  return (uint16x8_t)  __a;
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16_f32 (float32x4_t __a)
-{
-  return (uint16x8_t)  __a;
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32_f16 (float16x8_t __a)
-{
-  return (uint32x4_t)  __a;
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32_f32 (float32x4_t __a)
-{
-  return (uint32x4_t)  __a;
-}
-
-__extension__ extern __inline uint64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64_f16 (float16x8_t __a)
-{
-  return (uint64x2_t)  __a;
-}
-
-__extension__ extern __inline uint64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64_f32 (float32x4_t __a)
-{
-  return (uint64x2_t)  __a;
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8_f16 (float16x8_t __a)
-{
-  return (uint8x16_t)  __a;
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8_f32 (float32x4_t __a)
-{
-  return (uint8x16_t)  __a;
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16_f32 (float32x4_t __a)
-{
-  return (float16x8_t)  __a;
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16_s16 (int16x8_t __a)
-{
-  return (float16x8_t)  __a;
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16_s32 (int32x4_t __a)
-{
-  return (float16x8_t)  __a;
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16_s64 (int64x2_t __a)
-{
-  return (float16x8_t)  __a;
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16_s8 (int8x16_t __a)
-{
-  return (float16x8_t)  __a;
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16_u16 (uint16x8_t __a)
-{
-  return (float16x8_t)  __a;
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16_u32 (uint32x4_t __a)
-{
-  return (float16x8_t)  __a;
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16_u64 (uint64x2_t __a)
-{
-  return (float16x8_t)  __a;
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16_u8 (uint8x16_t __a)
-{
-  return (float16x8_t)  __a;
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32_f16 (float16x8_t __a)
-{
-  return (float32x4_t)  __a;
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32_s16 (int16x8_t __a)
-{
-  return (float32x4_t)  __a;
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32_s32 (int32x4_t __a)
-{
-  return (float32x4_t)  __a;
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32_s64 (int64x2_t __a)
-{
-  return (float32x4_t)  __a;
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32_s8 (int8x16_t __a)
-{
-  return (float32x4_t)  __a;
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32_u16 (uint16x8_t __a)
-{
-  return (float32x4_t)  __a;
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32_u32 (uint32x4_t __a)
-{
-  return (float32x4_t)  __a;
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32_u64 (uint64x2_t __a)
-{
-  return (float32x4_t)  __a;
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32_u8 (uint8x16_t __a)
-{
-  return (float32x4_t)  __a;
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq_f16 (void)
-{
-  float16x8_t __uninit;
-  __asm__ ("": "=w" (__uninit));
-  return __uninit;
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq_f32 (void)
-{
-  float32x4_t __uninit;
-  __asm__ ("": "=w" (__uninit));
-  return __uninit;
-}
-
-#endif
-
-#ifdef __cplusplus
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16 (int32x4_t __a)
-{
- return __arm_vreinterpretq_s16_s32 (__a);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16 (int64x2_t __a)
-{
- return __arm_vreinterpretq_s16_s64 (__a);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16 (int8x16_t __a)
-{
- return __arm_vreinterpretq_s16_s8 (__a);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16 (uint16x8_t __a)
-{
- return __arm_vreinterpretq_s16_u16 (__a);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16 (uint32x4_t __a)
-{
- return __arm_vreinterpretq_s16_u32 (__a);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16 (uint64x2_t __a)
-{
- return __arm_vreinterpretq_s16_u64 (__a);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16 (uint8x16_t __a)
-{
- return __arm_vreinterpretq_s16_u8 (__a);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32 (int16x8_t __a)
-{
- return __arm_vreinterpretq_s32_s16 (__a);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32 (int64x2_t __a)
-{
- return __arm_vreinterpretq_s32_s64 (__a);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32 (int8x16_t __a)
-{
- return __arm_vreinterpretq_s32_s8 (__a);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32 (uint16x8_t __a)
-{
- return __arm_vreinterpretq_s32_u16 (__a);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32 (uint32x4_t __a)
-{
- return __arm_vreinterpretq_s32_u32 (__a);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32 (uint64x2_t __a)
-{
- return __arm_vreinterpretq_s32_u64 (__a);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32 (uint8x16_t __a)
-{
- return __arm_vreinterpretq_s32_u8 (__a);
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64 (int16x8_t __a)
-{
- return __arm_vreinterpretq_s64_s16 (__a);
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64 (int32x4_t __a)
-{
- return __arm_vreinterpretq_s64_s32 (__a);
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64 (int8x16_t __a)
-{
- return __arm_vreinterpretq_s64_s8 (__a);
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64 (uint16x8_t __a)
-{
- return __arm_vreinterpretq_s64_u16 (__a);
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64 (uint32x4_t __a)
-{
- return __arm_vreinterpretq_s64_u32 (__a);
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64 (uint64x2_t __a)
-{
- return __arm_vreinterpretq_s64_u64 (__a);
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64 (uint8x16_t __a)
-{
- return __arm_vreinterpretq_s64_u8 (__a);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8 (int16x8_t __a)
-{
- return __arm_vreinterpretq_s8_s16 (__a);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8 (int32x4_t __a)
-{
- return __arm_vreinterpretq_s8_s32 (__a);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8 (int64x2_t __a)
-{
- return __arm_vreinterpretq_s8_s64 (__a);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8 (uint16x8_t __a)
-{
- return __arm_vreinterpretq_s8_u16 (__a);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8 (uint32x4_t __a)
-{
- return __arm_vreinterpretq_s8_u32 (__a);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8 (uint64x2_t __a)
-{
- return __arm_vreinterpretq_s8_u64 (__a);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8 (uint8x16_t __a)
-{
- return __arm_vreinterpretq_s8_u8 (__a);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16 (int16x8_t __a)
-{
- return __arm_vreinterpretq_u16_s16 (__a);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16 (int32x4_t __a)
-{
- return __arm_vreinterpretq_u16_s32 (__a);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16 (int64x2_t __a)
-{
- return __arm_vreinterpretq_u16_s64 (__a);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16 (int8x16_t __a)
-{
- return __arm_vreinterpretq_u16_s8 (__a);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16 (uint32x4_t __a)
-{
- return __arm_vreinterpretq_u16_u32 (__a);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16 (uint64x2_t __a)
-{
- return __arm_vreinterpretq_u16_u64 (__a);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16 (uint8x16_t __a)
-{
- return __arm_vreinterpretq_u16_u8 (__a);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32 (int16x8_t __a)
-{
- return __arm_vreinterpretq_u32_s16 (__a);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32 (int32x4_t __a)
-{
- return __arm_vreinterpretq_u32_s32 (__a);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32 (int64x2_t __a)
-{
- return __arm_vreinterpretq_u32_s64 (__a);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32 (int8x16_t __a)
-{
- return __arm_vreinterpretq_u32_s8 (__a);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32 (uint16x8_t __a)
-{
- return __arm_vreinterpretq_u32_u16 (__a);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32 (uint64x2_t __a)
-{
- return __arm_vreinterpretq_u32_u64 (__a);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32 (uint8x16_t __a)
-{
- return __arm_vreinterpretq_u32_u8 (__a);
-}
-
-__extension__ extern __inline uint64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64 (int16x8_t __a)
-{
- return __arm_vreinterpretq_u64_s16 (__a);
-}
-
-__extension__ extern __inline uint64x2_t
+__extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64 (int32x4_t __a)
+__arm_vuninitializedq_u8 (void)
 {
- return __arm_vreinterpretq_u64_s32 (__a);
+  uint8x16_t __uninit;
+  __asm__ ("": "=w"(__uninit));
+  return __uninit;
 }
 
-__extension__ extern __inline uint64x2_t
+__extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64 (int64x2_t __a)
+__arm_vuninitializedq_u16 (void)
 {
- return __arm_vreinterpretq_u64_s64 (__a);
+  uint16x8_t __uninit;
+  __asm__ ("": "=w"(__uninit));
+  return __uninit;
 }
 
-__extension__ extern __inline uint64x2_t
+__extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64 (int8x16_t __a)
+__arm_vuninitializedq_u32 (void)
 {
- return __arm_vreinterpretq_u64_s8 (__a);
+  uint32x4_t __uninit;
+  __asm__ ("": "=w"(__uninit));
+  return __uninit;
 }
 
 __extension__ extern __inline uint64x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64 (uint16x8_t __a)
+__arm_vuninitializedq_u64 (void)
 {
- return __arm_vreinterpretq_u64_u16 (__a);
+  uint64x2_t __uninit;
+  __asm__ ("": "=w"(__uninit));
+  return __uninit;
 }
 
-__extension__ extern __inline uint64x2_t
+__extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64 (uint32x4_t __a)
+__arm_vuninitializedq_s8 (void)
 {
- return __arm_vreinterpretq_u64_u32 (__a);
+  int8x16_t __uninit;
+  __asm__ ("": "=w"(__uninit));
+  return __uninit;
 }
 
-__extension__ extern __inline uint64x2_t
+__extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64 (uint8x16_t __a)
+__arm_vuninitializedq_s16 (void)
 {
- return __arm_vreinterpretq_u64_u8 (__a);
+  int16x8_t __uninit;
+  __asm__ ("": "=w"(__uninit));
+  return __uninit;
 }
 
-__extension__ extern __inline uint8x16_t
+__extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8 (int16x8_t __a)
+__arm_vuninitializedq_s32 (void)
 {
- return __arm_vreinterpretq_u8_s16 (__a);
+  int32x4_t __uninit;
+  __asm__ ("": "=w"(__uninit));
+  return __uninit;
 }
 
-__extension__ extern __inline uint8x16_t
+__extension__ extern __inline int64x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8 (int32x4_t __a)
+__arm_vuninitializedq_s64 (void)
 {
- return __arm_vreinterpretq_u8_s32 (__a);
+  int64x2_t __uninit;
+  __asm__ ("": "=w"(__uninit));
+  return __uninit;
 }
 
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8 (int64x2_t __a)
-{
- return __arm_vreinterpretq_u8_s64 (__a);
-}
+#if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
-__extension__ extern __inline uint8x16_t
+__extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8 (int8x16_t __a)
+__arm_vuninitializedq_f16 (void)
 {
- return __arm_vreinterpretq_u8_s8 (__a);
+  float16x8_t __uninit;
+  __asm__ ("": "=w" (__uninit));
+  return __uninit;
 }
 
-__extension__ extern __inline uint8x16_t
+__extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8 (uint16x8_t __a)
+__arm_vuninitializedq_f32 (void)
 {
- return __arm_vreinterpretq_u8_u16 (__a);
+  float32x4_t __uninit;
+  __asm__ ("": "=w" (__uninit));
+  return __uninit;
 }
 
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8 (uint32x4_t __a)
-{
- return __arm_vreinterpretq_u8_u32 (__a);
-}
+#endif
 
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8 (uint64x2_t __a)
-{
- return __arm_vreinterpretq_u8_u64 (__a);
-}
+#ifdef __cplusplus
 
 __extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
@@ -1205,244 +182,6 @@ __arm_vuninitializedq (int64x2_t /* __v ATTRIBUTE UNUSED */)
 }
 
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32 (float16x8_t __a)
-{
- return __arm_vreinterpretq_s32_f16 (__a);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32 (float32x4_t __a)
-{
- return __arm_vreinterpretq_s32_f32 (__a);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16 (float16x8_t __a)
-{
- return __arm_vreinterpretq_s16_f16 (__a);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16 (float32x4_t __a)
-{
- return __arm_vreinterpretq_s16_f32 (__a);
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64 (float16x8_t __a)
-{
- return __arm_vreinterpretq_s64_f16 (__a);
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64 (float32x4_t __a)
-{
- return __arm_vreinterpretq_s64_f32 (__a);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8 (float16x8_t __a)
-{
- return __arm_vreinterpretq_s8_f16 (__a);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8 (float32x4_t __a)
-{
- return __arm_vreinterpretq_s8_f32 (__a);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16 (float16x8_t __a)
-{
- return __arm_vreinterpretq_u16_f16 (__a);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16 (float32x4_t __a)
-{
- return __arm_vreinterpretq_u16_f32 (__a);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32 (float16x8_t __a)
-{
- return __arm_vreinterpretq_u32_f16 (__a);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32 (float32x4_t __a)
-{
- return __arm_vreinterpretq_u32_f32 (__a);
-}
-
-__extension__ extern __inline uint64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64 (float16x8_t __a)
-{
- return __arm_vreinterpretq_u64_f16 (__a);
-}
-
-__extension__ extern __inline uint64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64 (float32x4_t __a)
-{
- return __arm_vreinterpretq_u64_f32 (__a);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8 (float16x8_t __a)
-{
- return __arm_vreinterpretq_u8_f16 (__a);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8 (float32x4_t __a)
-{
- return __arm_vreinterpretq_u8_f32 (__a);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16 (float32x4_t __a)
-{
- return __arm_vreinterpretq_f16_f32 (__a);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16 (int16x8_t __a)
-{
- return __arm_vreinterpretq_f16_s16 (__a);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16 (int32x4_t __a)
-{
- return __arm_vreinterpretq_f16_s32 (__a);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16 (int64x2_t __a)
-{
- return __arm_vreinterpretq_f16_s64 (__a);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16 (int8x16_t __a)
-{
- return __arm_vreinterpretq_f16_s8 (__a);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16 (uint16x8_t __a)
-{
- return __arm_vreinterpretq_f16_u16 (__a);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16 (uint32x4_t __a)
-{
- return __arm_vreinterpretq_f16_u32 (__a);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16 (uint64x2_t __a)
-{
- return __arm_vreinterpretq_f16_u64 (__a);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16 (uint8x16_t __a)
-{
- return __arm_vreinterpretq_f16_u8 (__a);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32 (float16x8_t __a)
-{
- return __arm_vreinterpretq_f32_f16 (__a);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32 (int16x8_t __a)
-{
- return __arm_vreinterpretq_f32_s16 (__a);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32 (int32x4_t __a)
-{
- return __arm_vreinterpretq_f32_s32 (__a);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32 (int64x2_t __a)
-{
- return __arm_vreinterpretq_f32_s64 (__a);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32 (int8x16_t __a)
-{
- return __arm_vreinterpretq_f32_s8 (__a);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32 (uint16x8_t __a)
-{
- return __arm_vreinterpretq_f32_u16 (__a);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32 (uint32x4_t __a)
-{
- return __arm_vreinterpretq_f32_u32 (__a);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32 (uint64x2_t __a)
-{
- return __arm_vreinterpretq_f32_u64 (__a);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32 (uint8x16_t __a)
-{
- return __arm_vreinterpretq_f32_u8 (__a);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vuninitializedq (float16x8_t /* __v ATTRIBUTE UNUSED */)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 35eab6c94bf..ab688396f97 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -10561,3 +10561,21 @@ (define_expand "vcond_mask_<mode><MVE_vpred>"
     }
   DONE;
 })
+
+;; Reinterpret operand 1 in operand 0's mode, without changing its contents.
+(define_expand "@arm_mve_reinterpret<mode>"
+  [(set (match_operand:MVE_vecs 0 "register_operand")
+	(unspec:MVE_vecs
+	  [(match_operand 1 "arm_any_register_operand")]
+	  REINTERPRET))]
+  "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (<MODE>mode))
+    || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (<MODE>mode))"
+  {
+    machine_mode src_mode = GET_MODE (operands[1]);
+    if (targetm.can_change_mode_class (<MODE>mode, src_mode, VFP_REGS))
+      {
+	emit_move_insn (operands[0], gen_lowpart (<MODE>mode, operands[1]));
+	DONE;
+      }
+  }
+)
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 84384ee798d..dccda283573 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -1255,4 +1255,5 @@ (define_c_enum "unspec" [
   SQRSHRL_64
   SQRSHRL_48
   VSHLCQ_M_
+  REINTERPRET
 ])
diff --git a/gcc/testsuite/g++.target/arm/mve.exp b/gcc/testsuite/g++.target/arm/mve.exp
index cd824035540..f75ec20ea64 100644
--- a/gcc/testsuite/g++.target/arm/mve.exp
+++ b/gcc/testsuite/g++.target/arm/mve.exp
@@ -42,8 +42,12 @@ set dg-do-what-default "assemble"
 dg-init
 
 # Main loop.
-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/../../gcc.target/arm/mve/intrinsics/*.\[cCS\]]] \
-	"" $DEFAULT_CXXFLAGS
+set gcc_subdir [string replace $subdir 0 2 gcc]
+set files [glob -nocomplain \
+	       "$srcdir/$subdir/../../gcc.target/arm/mve/intrinsics/*.\[cCS\]" \
+	       "$srcdir/$gcc_subdir/mve/general/*.\[cCS\]" \
+	       "$srcdir/$subdir/mve/general-c++/*.\[cCS\]"]
+dg-runtest [lsort $files] "" $DEFAULT_CXXFLAGS
 
 # All done.
 set dg-do-what-default ${save-dg-do-what-default}
diff --git a/gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c b/gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c
new file mode 100644
index 00000000000..e0692ceb8c8
--- /dev/null
+++ b/gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* Do not use dg-add-options arm_v8_1m_mve, because this might expand to "",
+   which could imply mve+fp depending on the user settings. We want to make
+   sure the '+fp' extension is not enabled.  */
+/* { dg-options "-mfpu=auto -march=armv8.1-m.main+mve" } */
+
+#include <arm_mve.h>
+
+void
+f1 (uint8x16_t v)
+{
+  vreinterpretq_f16 (v); /* { dg-error {ACLE function 'void vreinterpretq_f16\(uint8x16_t\)' requires ISA extension 'mve.fp'} } */
+  /* { dg-message {note: you can enable mve.fp by using the command-line option '-march', or by using the 'target' attribute or pragma} "" {target *-*-*} .-1 } */
+}
diff --git a/gcc/testsuite/g++.target/arm/mve/general-c++/vreinterpretq_1.C b/gcc/testsuite/g++.target/arm/mve/general-c++/vreinterpretq_1.C
new file mode 100644
index 00000000000..8b29ee58163
--- /dev/null
+++ b/gcc/testsuite/g++.target/arm/mve/general-c++/vreinterpretq_1.C
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#include <arm_mve.h>
+
+void
+f1 (int8x16_t s8, uint16x8_t u16, float32x4_t f32)
+{
+  __arm_vreinterpretq_s8 (); /* { dg-error {no matching function for call to '__arm_vreinterpretq_s8\(\)'} } */
+  __arm_vreinterpretq_s8 (s8, s8); /* { dg-error {no matching function for call to '__arm_vreinterpretq_s8\(int8x16_t\&, int8x16_t\&\)'} } */
+  __arm_vreinterpretq_s8 (0); /* { dg-error {no matching function for call to '__arm_vreinterpretq_s8\(int\)'} } */
+  __arm_vreinterpretq_s8 (s8); /* { dg-error {no matching function for call to '__arm_vreinterpretq_s8\(int8x16_t\&\)'} } */
+  __arm_vreinterpretq_s8 (u16);
+  __arm_vreinterpretq_u16 (); /* { dg-error {no matching function for call to '__arm_vreinterpretq_u16\(\)'} } */
+  __arm_vreinterpretq_u16 (u16, u16); /* { dg-error {no matching function for call to '__arm_vreinterpretq_u16\(uint16x8_t\&, uint16x8_t\&\)'} } */
+  __arm_vreinterpretq_u16 (0); /* { dg-error {no matching function for call to '__arm_vreinterpretq_u16\(int\)'} } */
+  __arm_vreinterpretq_u16 (u16); /* { dg-error {no matching function for call to '__arm_vreinterpretq_u16\(uint16x8_t\&\)'} } */
+  __arm_vreinterpretq_u16 (f32);
+  __arm_vreinterpretq_f32 (); /* { dg-error {no matching function for call to '__arm_vreinterpretq_f32\(\)'} } */
+  __arm_vreinterpretq_f32 (f32, f32); /* { dg-error {no matching function for call to '__arm_vreinterpretq_f32\(float32x4_t\&, float32x4_t\&\)'} } */
+  __arm_vreinterpretq_f32 (0); /* { dg-error {no matching function for call to '__arm_vreinterpretq_f32\(int\)'} } */
+  __arm_vreinterpretq_f32 (f32); /* { dg-error {no matching function for call to '__arm_vreinterpretq_f32\(float32x4_t\&\)'} } */
+  __arm_vreinterpretq_f32 (s8);
+}
diff --git a/gcc/testsuite/gcc.target/arm/mve/general-c/nomve_fp_1.c b/gcc/testsuite/gcc.target/arm/mve/general-c/nomve_fp_1.c
new file mode 100644
index 00000000000..21c2af16a61
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/general-c/nomve_fp_1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* Do not use dg-add-options arm_v8_1m_mve, because this might expand to "",
+   which could imply mve+fp depending on the user settings. We want to make
+   sure the '+fp' extension is not enabled.  */
+/* { dg-options "-mfpu=auto -march=armv8.1-m.main+mve" } */
+
+#include <arm_mve.h>
+
+void
+foo (uint8x16_t v)
+{
+  vreinterpretq_f16 (v); /* { dg-error {ACLE function '__arm_vreinterpretq_f16_u8' requires ISA extension 'mve.fp'} } */
+  /* { dg-message {note: you can enable mve.fp by using the command-line option '-march', or by using the 'target' attribute or pragma} "" {target *-*-*} .-1 } */
+}
diff --git a/gcc/testsuite/gcc.target/arm/mve/general-c/vreinterpretq_1.c b/gcc/testsuite/gcc.target/arm/mve/general-c/vreinterpretq_1.c
new file mode 100644
index 00000000000..0297bd50198
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/general-c/vreinterpretq_1.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#include <arm_mve.h>
+
+void
+f1 (int8x16_t s8, uint16x8_t u16, float32x4_t f32)
+{
+  __arm_vreinterpretq_s8 (); /* { dg-error {too few arguments to function '__arm_vreinterpretq_s8'} } */
+  __arm_vreinterpretq_s8 (s8, s8); /* { dg-error {too many arguments to function '__arm_vreinterpretq_s8'} } */
+  __arm_vreinterpretq_s8 (0); /* { dg-error {passing 'int' to argument 1 of '__arm_vreinterpretq_s8', which expects an MVE vector type} } */
+  __arm_vreinterpretq_s8 (s8); /* { dg-error {'__arm_vreinterpretq_s8' has no form that takes 'int8x16_t' arguments} } */
+  __arm_vreinterpretq_s8 (u16);
+  __arm_vreinterpretq_u16 (); /* { dg-error {too few arguments to function '__arm_vreinterpretq_u16'} } */
+  __arm_vreinterpretq_u16 (u16, u16); /* { dg-error {too many arguments to function '__arm_vreinterpretq_u16'} } */
+  __arm_vreinterpretq_u16 (0); /* { dg-error {passing 'int' to argument 1 of '__arm_vreinterpretq_u16', which expects an MVE vector type} } */
+  __arm_vreinterpretq_u16 (u16); /* { dg-error {'__arm_vreinterpretq_u16' has no form that takes 'uint16x8_t' arguments} } */
+  __arm_vreinterpretq_u16 (f32);
+  __arm_vreinterpretq_f32 (); /* { dg-error {too few arguments to function '__arm_vreinterpretq_f32'} } */
+  __arm_vreinterpretq_f32 (f32, f32); /* { dg-error {too many arguments to function '__arm_vreinterpretq_f32'} } */
+  __arm_vreinterpretq_f32 (0); /* { dg-error {passing 'int' to argument 1 of '__arm_vreinterpretq_f32', which expects an MVE vector type} } */
+  __arm_vreinterpretq_f32 (f32); /* { dg-error {'__arm_vreinterpretq_f32' has no form that takes 'float32x4_t' arguments} } */
+  __arm_vreinterpretq_f32 (s8);
+}
-- 
2.34.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 04/22] arm: [MVE intrinsics] Rework vuninitialized
  2023-04-18 13:45 [PATCH 00/22] arm: New framework for MVE intrinsics Christophe Lyon
                   ` (2 preceding siblings ...)
  2023-04-18 13:45 ` [PATCH 03/22] arm: [MVE intrinsics] Rework vreinterpretq Christophe Lyon
@ 2023-04-18 13:45 ` Christophe Lyon
  2023-05-02 16:13   ` Kyrylo Tkachov
  2023-04-18 13:45 ` [PATCH 05/22] arm: [MVE intrinsics] add binary_opt_n shape Christophe Lyon
                   ` (18 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Christophe Lyon @ 2023-04-18 13:45 UTC (permalink / raw)
  To: gcc-patches, kyrylo.tkachov, richard.earnshaw, richard.sandiford
  Cc: Christophe Lyon

Implement vuninitializedq using the new MVE builtins framework.

We need to keep the overloaded __arm_vuninitializedq definitions
because their resolution depends on the result type only, which is not
currently supported by the resolver.
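
For illustration, a user-level sketch (not part of this patch) of why
the overloaded form needs an argument: the resolver can key off an
argument's type, but not off the expected result type alone.

  #include <arm_mve.h>

  int8x16_t
  f (void)
  {
    int8x16_t a = vuninitializedq_s8 (); /* explicit type suffix */
    int8x16_t b = vuninitializedq (a);   /* type taken from 'a' */
    return b;
  }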

2022-09-08  Murray Steele  <murray.steele@arm.com>
	    Christophe Lyon  <christophe.lyon@arm.com>

gcc/ChangeLog:

	* config/arm/arm-mve-builtins-base.cc (class
	vuninitializedq_impl): New.
	* config/arm/arm-mve-builtins-base.def (vuninitializedq): New.
	* config/arm/arm-mve-builtins-base.h (vuninitializedq): New
	declaration.
	* config/arm/arm-mve-builtins-shapes.cc	(inherent): New.
	* config/arm/arm-mve-builtins-shapes.h (inherent): New
	declaration.
	* config/arm/arm_mve_types.h (__arm_vuninitializedq): Move to ...
	* config/arm/arm_mve.h (__arm_vuninitializedq): ... here.
	(__arm_vuninitializedq_u8): Remove.
	(__arm_vuninitializedq_u16): Remove.
	(__arm_vuninitializedq_u32): Remove.
	(__arm_vuninitializedq_u64): Remove.
	(__arm_vuninitializedq_s8): Remove.
	(__arm_vuninitializedq_s16): Remove.
	(__arm_vuninitializedq_s32): Remove.
	(__arm_vuninitializedq_s64): Remove.
	(__arm_vuninitializedq_f16): Remove.
	(__arm_vuninitializedq_f32): Remove.
---
 gcc/config/arm/arm-mve-builtins-base.cc   |  14 ++
 gcc/config/arm/arm-mve-builtins-base.def  |   2 +
 gcc/config/arm/arm-mve-builtins-base.h    |   1 +
 gcc/config/arm/arm-mve-builtins-shapes.cc |  16 ++
 gcc/config/arm/arm-mve-builtins-shapes.h  |   7 +-
 gcc/config/arm/arm_mve.h                  |  73 ++++++++++
 gcc/config/arm/arm_mve_types.h            | 169 ----------------------
 7 files changed, 112 insertions(+), 170 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index ad8d500afc6..02a3b23865c 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -65,10 +65,24 @@ class vreinterpretq_impl : public quiet<function_base>
   }
 };
 
+/* Implements vuninitializedq_* intrinsics.  */
+class vuninitializedq_impl : public quiet<function_base>
+{
+
+  rtx
+  expand (function_expander &e) const override
+  {
+    rtx target = e.get_reg_target ();
+    emit_clobber (copy_rtx (target));
+    return target;
+  }
+};
+
 } /* end anonymous namespace */
 
 namespace arm_mve {
 
 FUNCTION (vreinterpretq, vreinterpretq_impl,)
+FUNCTION (vuninitializedq, vuninitializedq_impl,)
 
 } /* end namespace arm_mve */
diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
index 5c0c1b9cee7..f669642a259 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -19,8 +19,10 @@
 
 #define REQUIRES_FLOAT false
 DEF_MVE_FUNCTION (vreinterpretq, unary_convert, reinterpret_integer, none)
+DEF_MVE_FUNCTION (vuninitializedq, inherent, all_integer_with_64, none)
 #undef REQUIRES_FLOAT
 
 #define REQUIRES_FLOAT true
 DEF_MVE_FUNCTION (vreinterpretq, unary_convert, reinterpret_float, none)
+DEF_MVE_FUNCTION (vuninitializedq, inherent, all_float, none)
 #undef REQUIRES_FLOAT
diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h
index 60e7bd24eda..ec309cbe572 100644
--- a/gcc/config/arm/arm-mve-builtins-base.h
+++ b/gcc/config/arm/arm-mve-builtins-base.h
@@ -24,6 +24,7 @@ namespace arm_mve {
 namespace functions {
 
 extern const function_base *const vreinterpretq;
+extern const function_base *const vuninitializedq;
 
 } /* end namespace arm_mve::functions */
 } /* end namespace arm_mve */
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
index d0da0ffef91..ce476aa196e 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -338,6 +338,22 @@ struct overloaded_base : public function_shape
   }
 };
 
+/* <T0>[xN]_t vfoo_t0().
+
+   Example: vuninitializedq.
+   int8x16_t [__arm_]vuninitializedq_s8(void)
+   int8x16_t [__arm_]vuninitializedq(int8x16_t t)  */
+struct inherent_def : public nonoverloaded_base
+{
+  void
+  build (function_builder &b, const function_group_info &group,
+	 bool preserve_user_namespace) const override
+  {
+    build_all (b, "t0", group, MODE_none, preserve_user_namespace);
+  }
+};
+SHAPE (inherent)
+
 /* <T0>_t foo_t0[_t1](<T1>_t)
 
    where the target type <t0> must be specified explicitly but the source
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
index 04d19a02890..a491369425c 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -25,11 +25,16 @@ namespace arm_mve
   /* The naming convention is:
 
      - to use names like "unary" etc. if the rules are somewhat generic,
-       especially if there are no ranges involved.  */
+       especially if there are no ranges involved.
+
+     Also:
+
+     - "inherent" means that the function takes no arguments.  */
 
   namespace shapes
   {
 
+    extern const function_shape *const inherent;
     extern const function_shape *const unary_convert;
 
   } /* end namespace arm_mve::shapes */
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 7688b5a7e53..5dc5ecef134 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -35382,6 +35382,79 @@ __arm_vgetq_lane (float32x4_t __a, const int __idx)
 }
 #endif /* MVE Floating point.  */
 
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vuninitializedq (uint8x16_t /* __v ATTRIBUTE UNUSED */)
+{
+ return __arm_vuninitializedq_u8 ();
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vuninitializedq (uint16x8_t /* __v ATTRIBUTE UNUSED */)
+{
+ return __arm_vuninitializedq_u16 ();
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vuninitializedq (uint32x4_t /* __v ATTRIBUTE UNUSED */)
+{
+ return __arm_vuninitializedq_u32 ();
+}
+
+__extension__ extern __inline uint64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vuninitializedq (uint64x2_t /* __v ATTRIBUTE UNUSED */)
+{
+ return __arm_vuninitializedq_u64 ();
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vuninitializedq (int8x16_t /* __v ATTRIBUTE UNUSED */)
+{
+ return __arm_vuninitializedq_s8 ();
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vuninitializedq (int16x8_t /* __v ATTRIBUTE UNUSED */)
+{
+ return __arm_vuninitializedq_s16 ();
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vuninitializedq (int32x4_t /* __v ATTRIBUTE UNUSED */)
+{
+ return __arm_vuninitializedq_s32 ();
+}
+
+__extension__ extern __inline int64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vuninitializedq (int64x2_t /* __v ATTRIBUTE UNUSED */)
+{
+ return __arm_vuninitializedq_s64 ();
+}
+
+#if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vuninitializedq (float16x8_t /* __v ATTRIBUTE UNUSED */)
+{
+ return __arm_vuninitializedq_f16 ();
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vuninitializedq (float32x4_t /* __v ATTRIBUTE UNUSED */)
+{
+ return __arm_vuninitializedq_f32 ();
+}
+#endif /* __ARM_FEATURE_MVE & 2 (MVE floating point)  */
+
 #else
 enum {
     __ARM_mve_type_fp_n = 1,
diff --git a/gcc/config/arm/arm_mve_types.h b/gcc/config/arm/arm_mve_types.h
index ae2591faa03..32942e51a74 100644
--- a/gcc/config/arm/arm_mve_types.h
+++ b/gcc/config/arm/arm_mve_types.h
@@ -29,173 +29,4 @@ typedef float float32_t;
 
 #pragma GCC arm "arm_mve_types.h"
 
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq_u8 (void)
-{
-  uint8x16_t __uninit;
-  __asm__ ("": "=w"(__uninit));
-  return __uninit;
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq_u16 (void)
-{
-  uint16x8_t __uninit;
-  __asm__ ("": "=w"(__uninit));
-  return __uninit;
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq_u32 (void)
-{
-  uint32x4_t __uninit;
-  __asm__ ("": "=w"(__uninit));
-  return __uninit;
-}
-
-__extension__ extern __inline uint64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq_u64 (void)
-{
-  uint64x2_t __uninit;
-  __asm__ ("": "=w"(__uninit));
-  return __uninit;
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq_s8 (void)
-{
-  int8x16_t __uninit;
-  __asm__ ("": "=w"(__uninit));
-  return __uninit;
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq_s16 (void)
-{
-  int16x8_t __uninit;
-  __asm__ ("": "=w"(__uninit));
-  return __uninit;
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq_s32 (void)
-{
-  int32x4_t __uninit;
-  __asm__ ("": "=w"(__uninit));
-  return __uninit;
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq_s64 (void)
-{
-  int64x2_t __uninit;
-  __asm__ ("": "=w"(__uninit));
-  return __uninit;
-}
-
-#if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq_f16 (void)
-{
-  float16x8_t __uninit;
-  __asm__ ("": "=w" (__uninit));
-  return __uninit;
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq_f32 (void)
-{
-  float32x4_t __uninit;
-  __asm__ ("": "=w" (__uninit));
-  return __uninit;
-}
-
-#endif
-
-#ifdef __cplusplus
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq (uint8x16_t /* __v ATTRIBUTE UNUSED */)
-{
- return __arm_vuninitializedq_u8 ();
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq (uint16x8_t /* __v ATTRIBUTE UNUSED */)
-{
- return __arm_vuninitializedq_u16 ();
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq (uint32x4_t /* __v ATTRIBUTE UNUSED */)
-{
- return __arm_vuninitializedq_u32 ();
-}
-
-__extension__ extern __inline uint64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq (uint64x2_t /* __v ATTRIBUTE UNUSED */)
-{
- return __arm_vuninitializedq_u64 ();
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq (int8x16_t /* __v ATTRIBUTE UNUSED */)
-{
- return __arm_vuninitializedq_s8 ();
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq (int16x8_t /* __v ATTRIBUTE UNUSED */)
-{
- return __arm_vuninitializedq_s16 ();
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq (int32x4_t /* __v ATTRIBUTE UNUSED */)
-{
- return __arm_vuninitializedq_s32 ();
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq (int64x2_t /* __v ATTRIBUTE UNUSED */)
-{
- return __arm_vuninitializedq_s64 ();
-}
-
-#if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq (float16x8_t /* __v ATTRIBUTE UNUSED */)
-{
- return __arm_vuninitializedq_f16 ();
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq (float32x4_t /* __v ATTRIBUTE UNUSED */)
-{
- return __arm_vuninitializedq_f32 ();
-}
-#endif /* __ARM_FEATURE_MVE & 2 (MVE floating point)  */
-#endif /* __cplusplus */
-
 #endif /* _GCC_ARM_MVE_H.  */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 05/22] arm: [MVE intrinsics] add binary_opt_n shape
  2023-04-18 13:45 [PATCH 00/22] arm: New framework for MVE intrinsics Christophe Lyon
                   ` (3 preceding siblings ...)
  2023-04-18 13:45 ` [PATCH 04/22] arm: [MVE intrinsics] Rework vuninitialized Christophe Lyon
@ 2023-04-18 13:45 ` Christophe Lyon
  2023-05-02 16:16   ` Kyrylo Tkachov
  2023-04-18 13:45 ` [PATCH 06/22] arm: [MVE intrinsics] add unspec_based_mve_function_exact_insn Christophe Lyon
                   ` (17 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Christophe Lyon @ 2023-04-18 13:45 UTC (permalink / raw)
  To: gcc-patches, kyrylo.tkachov, richard.earnshaw, richard.sandiford
  Cc: Christophe Lyon

This patch adds the binary_opt_n shape description.
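
As a hedged sketch of what the shape enables once an intrinsic such as
vaddq is converted later in the series, a single DEF_MVE_FUNCTION entry
lets the resolver accept both the vector and the scalar ("_n") second
operand:

  #include <arm_mve.h>

  int8x16_t
  f (int8x16_t a, int8x16_t b, int8_t c)
  {
    int8x16_t r1 = vaddq (a, b); /* resolves to vaddq_s8 */
    int8x16_t r2 = vaddq (a, c); /* resolves to vaddq_n_s8 */
    return vaddq (r1, r2);
  }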

	gcc/
	* config/arm/arm-mve-builtins-shapes.cc (binary_opt_n): New.
	* config/arm/arm-mve-builtins-shapes.h (binary_opt_n): New.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 32 +++++++++++++++++++++++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
 2 files changed, 33 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
index ce476aa196e..033b304060a 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -338,6 +338,38 @@ struct overloaded_base : public function_shape
   }
 };
 
+/* <T0>_t vfoo[_t0](<T0>_t, <T0>_t)
+   <T0>_t vfoo[_n_t0](<T0>_t, <S0>_t)
+
+   i.e. the standard shape for binary operations that operate on
+   uniform types.
+
+   Example: vaddq.
+   int8x16_t [__arm_]vaddq[_s8](int8x16_t a, int8x16_t b)
+   int8x16_t [__arm_]vaddq[_n_s8](int8x16_t a, int8_t b)
+   int8x16_t [__arm_]vaddq_m[_s8](int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
+   int8x16_t [__arm_]vaddq_m[_n_s8](int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
+   int8x16_t [__arm_]vaddq_x[_s8](int8x16_t a, int8x16_t b, mve_pred16_t p)
+   int8x16_t [__arm_]vaddq_x[_n_s8](int8x16_t a, int8_t b, mve_pred16_t p)  */
+struct binary_opt_n_def : public overloaded_base<0>
+{
+  void
+  build (function_builder &b, const function_group_info &group,
+	 bool preserve_user_namespace) const override
+  {
+    b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
+    build_all (b, "v0,v0,v0", group, MODE_none, preserve_user_namespace);
+    build_all (b, "v0,v0,s0", group, MODE_n, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+    return r.resolve_uniform_opt_n (2);
+  }
+};
+SHAPE (binary_opt_n)
+
 /* <T0>[xN]_t vfoo_t0().
 
    Example: vuninitializedq.
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
index a491369425c..43798fdde57 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -34,6 +34,7 @@ namespace arm_mve
   namespace shapes
   {
 
+    extern const function_shape *const binary_opt_n;
     extern const function_shape *const inherent;
     extern const function_shape *const unary_convert;
 
-- 
2.34.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 06/22] arm: [MVE intrinsics] add unspec_based_mve_function_exact_insn
  2023-04-18 13:45 [PATCH 00/22] arm: New framework for MVE intrinsics Christophe Lyon
                   ` (4 preceding siblings ...)
  2023-04-18 13:45 ` [PATCH 05/22] arm: [MVE intrinsics] add binary_opt_n shape Christophe Lyon
@ 2023-04-18 13:45 ` Christophe Lyon
  2023-05-02 16:17   ` Kyrylo Tkachov
  2023-04-18 13:45 ` [PATCH 07/22] arm: [MVE intrinsics] factorize vaddq vsubq vmulq Christophe Lyon
                   ` (16 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Christophe Lyon @ 2023-04-18 13:45 UTC (permalink / raw)
  To: gcc-patches, kyrylo.tkachov, richard.earnshaw, richard.sandiford
  Cc: Christophe Lyon

Introduce helper classes that will be used to build intrinsics which
use RTX codes for the non-predicated, non-suffixed version, and
unspecs otherwise.
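
As a hedged sketch of the intended use (the actual instantiations come
with the per-intrinsic patches later in the series), an addition-like
intrinsic could pass the PLUS RTX code for the plain form and the
matching unspecs for the _n and predicated forms:

  /* Sketch only; the real definitions go through a FUNCTION-style
     macro rather than a bare object.  */
  static CONSTEXPR const unspec_based_mve_function_exact_insn vaddq_impl
    (PLUS, PLUS, PLUS,                       /* no suffix, no predicate */
     VADDQ_N_S, VADDQ_N_U, VADDQ_N_F,        /* _n */
     VADDQ_M_S, VADDQ_M_U, VADDQ_M_F,        /* _m */
     VADDQ_M_N_S, VADDQ_M_N_U, VADDQ_M_N_F); /* _m_n */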

2022-09-08  Christophe Lyon <christophe.lyon@arm.com>

gcc/ChangeLog:

	* config/arm/arm-mve-builtins-functions.h (class
	unspec_based_mve_function_base): New.
	(class unspec_based_mve_function_exact_insn): New.
---
 gcc/config/arm/arm-mve-builtins-functions.h | 186 ++++++++++++++++++++
 1 file changed, 186 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-functions.h b/gcc/config/arm/arm-mve-builtins-functions.h
index dff01999bcd..6d992b270b0 100644
--- a/gcc/config/arm/arm-mve-builtins-functions.h
+++ b/gcc/config/arm/arm-mve-builtins-functions.h
@@ -39,6 +39,192 @@ public:
   }
 };
 
+/* An incomplete function_base for functions that have an associated
+   rtx_code for signed integers, unsigned integers and floating-point
+   values for the non-predicated, non-suffixed intrinsic, and unspec
+   codes, with separate codes for signed integers, unsigned integers
+   and floating-point values.  The class simply records information
+   about the mapping for derived classes to use.  */
+class unspec_based_mve_function_base : public function_base
+{
+public:
+  CONSTEXPR unspec_based_mve_function_base (rtx_code code_for_sint,
+					    rtx_code code_for_uint,
+					    rtx_code code_for_fp,
+					    int unspec_for_n_sint,
+					    int unspec_for_n_uint,
+					    int unspec_for_n_fp,
+					    int unspec_for_m_sint,
+					    int unspec_for_m_uint,
+					    int unspec_for_m_fp,
+					    int unspec_for_m_n_sint,
+					    int unspec_for_m_n_uint,
+					    int unspec_for_m_n_fp)
+    : m_code_for_sint (code_for_sint),
+      m_code_for_uint (code_for_uint),
+      m_code_for_fp (code_for_fp),
+      m_unspec_for_n_sint (unspec_for_n_sint),
+      m_unspec_for_n_uint (unspec_for_n_uint),
+      m_unspec_for_n_fp (unspec_for_n_fp),
+      m_unspec_for_m_sint (unspec_for_m_sint),
+      m_unspec_for_m_uint (unspec_for_m_uint),
+      m_unspec_for_m_fp (unspec_for_m_fp),
+      m_unspec_for_m_n_sint (unspec_for_m_n_sint),
+      m_unspec_for_m_n_uint (unspec_for_m_n_uint),
+      m_unspec_for_m_n_fp (unspec_for_m_n_fp)
+  {}
+
+  /* The rtx code to use for signed, unsigned integers and
+     floating-point values respectively.  */
+  rtx_code m_code_for_sint;
+  rtx_code m_code_for_uint;
+  rtx_code m_code_for_fp;
+
+  /* The unspec code associated with signed-integer, unsigned-integer
+     and floating-point operations respectively.  It covers the cases
+     with the _n suffix, and/or the _m predicate.  */
+  int m_unspec_for_n_sint;
+  int m_unspec_for_n_uint;
+  int m_unspec_for_n_fp;
+  int m_unspec_for_m_sint;
+  int m_unspec_for_m_uint;
+  int m_unspec_for_m_fp;
+  int m_unspec_for_m_n_sint;
+  int m_unspec_for_m_n_uint;
+  int m_unspec_for_m_n_fp;
+};
+
+/* Map the function directly to CODE (UNSPEC, M) where M is the vector
+   mode associated with type suffix 0, except when there is no
+   predicate and no _n suffix, in which case we use the appropriate
+   rtx_code.  This is useful when the basic operation is mapped to a
+   standard RTX code and all other versions use different unspecs.  */
+class unspec_based_mve_function_exact_insn : public unspec_based_mve_function_base
+{
+public:
+  CONSTEXPR unspec_based_mve_function_exact_insn (rtx_code code_for_sint,
+						  rtx_code code_for_uint,
+						  rtx_code code_for_fp,
+						  int unspec_for_n_sint,
+						  int unspec_for_n_uint,
+						  int unspec_for_n_fp,
+						  int unspec_for_m_sint,
+						  int unspec_for_m_uint,
+						  int unspec_for_m_fp,
+						  int unspec_for_m_n_sint,
+						  int unspec_for_m_n_uint,
+						  int unspec_for_m_n_fp)
+    : unspec_based_mve_function_base (code_for_sint,
+				      code_for_uint,
+				      code_for_fp,
+				      unspec_for_n_sint,
+				      unspec_for_n_uint,
+				      unspec_for_n_fp,
+				      unspec_for_m_sint,
+				      unspec_for_m_uint,
+				      unspec_for_m_fp,
+				      unspec_for_m_n_sint,
+				      unspec_for_m_n_uint,
+				      unspec_for_m_n_fp)
+  {}
+
+  rtx
+  expand (function_expander &e) const override
+  {
+    /* No suffix, no predicate, use the right RTX code.  */
+    if ((e.mode_suffix_id != MODE_n)
+	&& (e.pred == PRED_none))
+      return e.map_to_rtx_codes (m_code_for_sint, m_code_for_uint,
+				 m_code_for_fp);
+
+    insn_code code;
+    switch (e.pred)
+      {
+      case PRED_none:
+	if (e.mode_suffix_id == MODE_n)
+	  /* No predicate, _n suffix.  */
+	  {
+	    if (e.type_suffix (0).integer_p)
+	      if (e.type_suffix (0).unsigned_p)
+		code = code_for_mve_q_n (m_unspec_for_n_uint, m_unspec_for_n_uint, e.vector_mode (0));
+	      else
+		code = code_for_mve_q_n (m_unspec_for_n_sint, m_unspec_for_n_sint, e.vector_mode (0));
+	    else
+	      code = code_for_mve_q_n_f (m_unspec_for_n_fp, e.vector_mode (0));
+
+	    return e.use_exact_insn (code);
+	  }
+	gcc_unreachable ();
+	break;
+
+      case PRED_m:
+	switch (e.mode_suffix_id)
+	  {
+	  case MODE_none:
+	    /* No suffix, "m" predicate.  */
+	    if (e.type_suffix (0).integer_p)
+	      if (e.type_suffix (0).unsigned_p)
+		code = code_for_mve_q_m (m_unspec_for_m_uint, m_unspec_for_m_uint, e.vector_mode (0));
+	      else
+		code = code_for_mve_q_m (m_unspec_for_m_sint, m_unspec_for_m_sint, e.vector_mode (0));
+	    else
+	      code = code_for_mve_q_m_f (m_unspec_for_m_fp, e.vector_mode (0));
+	    break;
+
+	  case MODE_n:
+	    /* _n suffix, "m" predicate.  */
+	    if (e.type_suffix (0).integer_p)
+	      if (e.type_suffix (0).unsigned_p)
+		code = code_for_mve_q_m_n (m_unspec_for_m_n_uint, m_unspec_for_m_n_uint, e.vector_mode (0));
+	      else
+		code = code_for_mve_q_m_n (m_unspec_for_m_n_sint, m_unspec_for_m_n_sint, e.vector_mode (0));
+	    else
+	      code = code_for_mve_q_m_n_f (m_unspec_for_m_n_fp, e.vector_mode (0));
+	    break;
+
+	  default:
+	    gcc_unreachable ();
+	  }
+	return e.use_cond_insn (code, 0);
+
+      case PRED_x:
+	switch (e.mode_suffix_id)
+	  {
+	  case MODE_none:
+	    /* No suffix, "x" predicate.  */
+	    if (e.type_suffix (0).integer_p)
+	      if (e.type_suffix (0).unsigned_p)
+		code = code_for_mve_q_m (m_unspec_for_m_uint, m_unspec_for_m_uint, e.vector_mode (0));
+	      else
+		code = code_for_mve_q_m (m_unspec_for_m_sint, m_unspec_for_m_sint, e.vector_mode (0));
+	    else
+	      code = code_for_mve_q_m_f (m_unspec_for_m_fp, e.vector_mode (0));
+	    break;
+
+	  case MODE_n:
+	    /* _n suffix, "x" predicate.  */
+	    if (e.type_suffix (0).integer_p)
+	      if (e.type_suffix (0).unsigned_p)
+		code = code_for_mve_q_m_n (m_unspec_for_m_n_uint, m_unspec_for_m_n_uint, e.vector_mode (0));
+	      else
+		code = code_for_mve_q_m_n (m_unspec_for_m_n_sint, m_unspec_for_m_n_sint, e.vector_mode (0));
+	    else
+	      code = code_for_mve_q_m_n_f (m_unspec_for_m_n_fp, e.vector_mode (0));
+	    break;
+
+	  default:
+	    gcc_unreachable ();
+	  }
+	return e.use_pred_x_insn (code);
+
+      default:
+	gcc_unreachable ();
+      }
+
+    gcc_unreachable ();
+  }
+};
+
 } /* end namespace arm_mve */
 
 /* Declare the global function base NAME, creating it from an instance
-- 
2.34.1



* [PATCH 07/22] arm: [MVE intrinsics] factorize vaddq vsubq vmulq
  2023-04-18 13:45 [PATCH 00/22] arm: New framework for MVE intrinsics Christophe Lyon
                   ` (5 preceding siblings ...)
  2023-04-18 13:45 ` [PATCH 06/22] arm: [MVE intrinsics] add unspec_based_mve_function_exact_insn Christophe Lyon
@ 2023-04-18 13:45 ` Christophe Lyon
  2023-05-02 16:19   ` Kyrylo Tkachov
  2023-04-18 13:45 ` [PATCH 08/22] arm: [MVE intrinsics] rework vaddq vmulq vsubq Christophe Lyon
                   ` (15 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Christophe Lyon @ 2023-04-18 13:45 UTC (permalink / raw)
  To: gcc-patches, kyrylo.tkachov, richard.earnshaw, richard.sandiford
  Cc: Christophe Lyon

In order to avoid using a huge switch when generating all the
intrinsics (e.g. mve_vaddq_n_sv4si, ...), we want to generate a single
function that takes the builtin code as a parameter (e.g. mve_q_n
(VADDQ_S, ...)).  This is achieved by using the new mve_insn iterator.

Having done that, it becomes easier to share similar patterns and to
avoid useless, error-prone code duplication.
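
A minimal sketch of what this buys (illustrative only: the exact
code_for_* signatures are generated by genemit from the parameterized
names, and VADDQ_N_S/V4SImode are just one instantiation):

  /* Resolve the insn behind vaddq_n_s32 with a single lookup through
     "@mve_<mve_insn>q_n_<supf><mode>"...  */
  insn_code icode = code_for_mve_q_n (VADDQ_N_S, VADDQ_N_S, V4SImode);
  /* ... instead of naming mve_vaddq_n_sv4si explicitly.  */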

2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>

gcc/ChangeLog:

	* config/arm/iterators.md (MVE_INT_BINARY_RTX, MVE_INT_M_BINARY)
	(MVE_INT_M_N_BINARY, MVE_INT_N_BINARY, MVE_FP_M_BINARY)
	(MVE_FP_M_N_BINARY, MVE_FP_N_BINARY, mve_addsubmul, mve_insn): New
	iterators.
	* config/arm/mve.md
	(mve_vsubq_n_f<mode>, mve_vaddq_n_f<mode>, mve_vmulq_n_f<mode>):
	Factorize into ...
	(@mve_<mve_insn>q_n_f<mode>): ... this.
	(mve_vaddq_n_<supf><mode>, mve_vmulq_n_<supf><mode>)
	(mve_vsubq_n_<supf><mode>): Factorize into ...
	(@mve_<mve_insn>q_n_<supf><mode>): ... this.
	(mve_vaddq<mode>, mve_vmulq<mode>, mve_vsubq<mode>): Factorize
	into ...
	(mve_<mve_addsubmul>q<mode>): ... this.
	(mve_vaddq_f<mode>, mve_vmulq_f<mode>, mve_vsubq_f<mode>):
	Factorize into ...
	(mve_<mve_addsubmul>q_f<mode>): ... this.
	(mve_vaddq_m_<supf><mode>, mve_vmulq_m_<supf><mode>)
	(mve_vsubq_m_<supf><mode>): Factorize into ...
	(@mve_<mve_insn>q_m_<supf><mode>): ... this.
	(mve_vaddq_m_n_<supf><mode>, mve_vmulq_m_n_<supf><mode>)
	(mve_vsubq_m_n_<supf><mode>): Factorize into ...
	(@mve_<mve_insn>q_m_n_<supf><mode>): ... this.
	(mve_vaddq_m_f<mode>, mve_vmulq_m_f<mode>, mve_vsubq_m_f<mode>):
	Factorize into ...
	(@mve_<mve_insn>q_m_f<mode>): ... this.
	(mve_vaddq_m_n_f<mode>, mve_vmulq_m_n_f<mode>)
	(mve_vsubq_m_n_f<mode>): Factorize into ...
	(@mve_<mve_insn>q_m_n_f<mode>): ... this.
---
 gcc/config/arm/iterators.md |  57 +++++++
 gcc/config/arm/mve.md       | 317 +++++-------------------------------
 2 files changed, 99 insertions(+), 275 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 39895ad62aa..d3bef594775 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -330,6 +330,63 @@ (define_code_iterator FCVT [unsigned_float float])
 ;; Saturating addition, subtraction
 (define_code_iterator SSPLUSMINUS [ss_plus ss_minus])
 
+;; MVE integer binary operations.
+(define_code_iterator MVE_INT_BINARY_RTX [plus minus mult])
+
+(define_int_iterator MVE_INT_M_BINARY   [
+		     VADDQ_M_S VADDQ_M_U
+		     VMULQ_M_S VMULQ_M_U
+		     VSUBQ_M_S VSUBQ_M_U
+		     ])
+
+(define_int_iterator MVE_INT_M_N_BINARY [
+		     VADDQ_M_N_S VADDQ_M_N_U
+		     VMULQ_M_N_S VMULQ_M_N_U
+		     VSUBQ_M_N_S VSUBQ_M_N_U
+		     ])
+
+(define_int_iterator MVE_INT_N_BINARY   [
+		     VADDQ_N_S VADDQ_N_U
+		     VMULQ_N_S VMULQ_N_U
+		     VSUBQ_N_S VSUBQ_N_U
+		     ])
+
+(define_int_iterator MVE_FP_M_BINARY   [
+		     VADDQ_M_F
+		     VMULQ_M_F
+		     VSUBQ_M_F
+		     ])
+
+(define_int_iterator MVE_FP_M_N_BINARY [
+		     VADDQ_M_N_F
+		     VMULQ_M_N_F
+		     VSUBQ_M_N_F
+		     ])
+
+(define_int_iterator MVE_FP_N_BINARY   [
+		     VADDQ_N_F
+		     VMULQ_N_F
+		     VSUBQ_N_F
+		     ])
+
+(define_code_attr mve_addsubmul [
+		 (minus "vsub")
+		 (mult "vmul")
+		 (plus "vadd")
+		 ])
+
+(define_int_attr mve_insn [
+		 (VADDQ_M_N_S "vadd") (VADDQ_M_N_U "vadd") (VADDQ_M_N_F "vadd")
+		 (VADDQ_M_S "vadd") (VADDQ_M_U "vadd") (VADDQ_M_F "vadd")
+		 (VADDQ_N_S "vadd") (VADDQ_N_U "vadd") (VADDQ_N_F "vadd")
+		 (VMULQ_M_N_S "vmul") (VMULQ_M_N_U "vmul") (VMULQ_M_N_F "vmul")
+		 (VMULQ_M_S "vmul") (VMULQ_M_U "vmul") (VMULQ_M_F "vmul")
+		 (VMULQ_N_S "vmul") (VMULQ_N_U "vmul") (VMULQ_N_F "vmul")
+		 (VSUBQ_M_N_S "vsub") (VSUBQ_M_N_U "vsub") (VSUBQ_M_N_F "vsub")
+		 (VSUBQ_M_S "vsub") (VSUBQ_M_U "vsub") (VSUBQ_M_F "vsub")
+		 (VSUBQ_N_S "vsub") (VSUBQ_N_U "vsub") (VSUBQ_N_F "vsub")
+		 ])
+
 ;; plus and minus are the only SHIFTABLE_OPS for which Thumb2 allows
 ;; a stack pointer operand.  The minus operation is a candidate for an rsub
 ;; and hence only plus is supported.
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index ab688396f97..5167fbc6add 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -668,21 +668,6 @@ (define_insn "mve_vpnotv16bi"
   [(set_attr "type" "mve_move")
 ])
 
-;;
-;; [vsubq_n_f])
-;;
-(define_insn "mve_vsubq_n_f<mode>"
-  [
-   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
-		       (match_operand:<V_elem> 2 "s_register_operand" "r")]
-	 VSUBQ_N_F))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vsub.f<V_sz_elem>\t%q0, %q1, %2"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vbrsrq_n_f])
 ;;
@@ -871,16 +856,18 @@ (define_insn "mve_vabdq_<supf><mode>"
 
 ;;
 ;; [vaddq_n_s, vaddq_n_u])
+;; [vsubq_n_s, vsubq_n_u])
+;; [vmulq_n_s, vmulq_n_u])
 ;;
-(define_insn "mve_vaddq_n_<supf><mode>"
+(define_insn "@mve_<mve_insn>q_n_<supf><mode>"
   [
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
 		       (match_operand:<V_elem> 2 "s_register_operand" "r")]
-	 VADDQ_N))
+	 MVE_INT_N_BINARY))
   ]
   "TARGET_HAVE_MVE"
-  "vadd.i%#<V_sz_elem>\t%q0, %q1, %2"
+  "<mve_insn>.i%#<V_sz_elem>\t%q0, %q1, %2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -1362,26 +1349,13 @@ (define_insn "mve_vmulltq_int_<supf><mode>"
 ])
 
 ;;
-;; [vmulq_n_u, vmulq_n_s])
-;;
-(define_insn "mve_vmulq_n_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		       (match_operand:<V_elem> 2 "s_register_operand" "r")]
-	 VMULQ_N))
-  ]
-  "TARGET_HAVE_MVE"
-  "vmul.i%#<V_sz_elem>\t%q0, %q1, %2"
-  [(set_attr "type" "mve_move")
-])
-
-;;
+;; [vaddq_s, vaddq_u])
 ;; [vmulq_u, vmulq_s])
+;; [vsubq_s, vsubq_u])
 ;;
 (define_insn "mve_vmulq_<supf><mode>"
   [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")]
 	 VMULQ))
@@ -1391,14 +1365,14 @@ (define_insn "mve_vmulq_<supf><mode>"
   [(set_attr "type" "mve_move")
 ])
 
-(define_insn "mve_vmulq<mode>"
+(define_insn "mve_<mve_addsubmul>q<mode>"
   [
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(mult:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
-		    (match_operand:MVE_2 2 "s_register_operand" "w")))
+	(MVE_INT_BINARY_RTX:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+			      (match_operand:MVE_2 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE"
-  "vmul.i%#<V_sz_elem>\t%q0, %q1, %q2"
+  "<mve_addsubmul>.i%#<V_sz_elem>\t%q0, %q1, %q2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -1768,21 +1742,6 @@ (define_insn "mve_vshlq_r_<supf><mode>"
   [(set_attr "type" "mve_move")
 ])
 
-;;
-;; [vsubq_n_s, vsubq_n_u])
-;;
-(define_insn "mve_vsubq_n_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		       (match_operand:<V_elem> 2 "s_register_operand" "r")]
-	 VSUBQ_N))
-  ]
-  "TARGET_HAVE_MVE"
-  "vsub.i%#<V_sz_elem>\t%q0, %q1, %2"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vsubq_s, vsubq_u])
 ;;
@@ -1798,17 +1757,6 @@ (define_insn "mve_vsubq_<supf><mode>"
   [(set_attr "type" "mve_move")
 ])
 
-(define_insn "mve_vsubq<mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(minus:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
-		     (match_operand:MVE_2 2 "s_register_operand" "w")))
-  ]
-  "TARGET_HAVE_MVE"
-  "vsub.i%#<V_sz_elem>\t%q0, %q1, %q2"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vabdq_f])
 ;;
@@ -1841,16 +1789,18 @@ (define_insn "mve_vaddlvaq_<supf>v4si"
 
 ;;
 ;; [vaddq_n_f])
+;; [vsubq_n_f])
+;; [vmulq_n_f])
 ;;
-(define_insn "mve_vaddq_n_f<mode>"
+(define_insn "@mve_<mve_insn>q_n_f<mode>"
   [
    (set (match_operand:MVE_0 0 "s_register_operand" "=w")
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
 		       (match_operand:<V_elem> 2 "s_register_operand" "r")]
-	 VADDQ_N_F))
+	 MVE_FP_N_BINARY))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vadd.f%#<V_sz_elem>\t%q0, %q1, %2"
+  "<mve_insn>.f%#<V_sz_elem>\t%q0, %q1, %2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -2224,31 +2174,18 @@ (define_insn "mve_vmovntq_<supf><mode>"
 ])
 
 ;;
+;; [vaddq_f])
 ;; [vmulq_f])
+;; [vsubq_f])
 ;;
-(define_insn "mve_vmulq_f<mode>"
+(define_insn "mve_<mve_addsubmul>q_f<mode>"
   [
    (set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(mult:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
+	(MVE_INT_BINARY_RTX:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
 		    (match_operand:MVE_0 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vmul.f%#<V_sz_elem>	%q0, %q1, %q2"
-  [(set_attr "type" "mve_move")
-])
-
-;;
-;; [vmulq_n_f])
-;;
-(define_insn "mve_vmulq_n_f<mode>"
-  [
-   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
-		       (match_operand:<V_elem> 2 "s_register_operand" "r")]
-	 VMULQ_N_F))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vmul.f%#<V_sz_elem>	%q0, %q1, %2"
+  "<mve_addsubmul>.f%#<V_sz_elem>\t%q0, %q1, %q2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -2490,20 +2427,6 @@ (define_insn "mve_vshlltq_n_<supf><mode>"
   [(set_attr "type" "mve_move")
 ])
 
-;;
-;; [vsubq_f])
-;;
-(define_insn "mve_vsubq_f<mode>"
-  [
-   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(minus:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
-		     (match_operand:MVE_0 2 "s_register_operand" "w")))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vsub.f%#<V_sz_elem>\t%q0, %q1, %q2"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vmulltq_poly_p])
 ;;
@@ -5032,23 +4955,6 @@ (define_insn "mve_vsriq_m_n_<supf><mode>"
   [(set_attr "type" "mve_move")
    (set_attr "length" "8")])
 
-;;
-;; [vsubq_m_u, vsubq_m_s])
-;;
-(define_insn "mve_vsubq_m_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VSUBQ_M))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vsubt.i%#<V_sz_elem>\t%q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length" "8")])
-
 ;;
 ;; [vcvtq_m_n_to_f_u, vcvtq_m_n_to_f_s])
 ;;
@@ -5084,35 +4990,39 @@ (define_insn "mve_vabdq_m_<supf><mode>"
 
 ;;
 ;; [vaddq_m_n_s, vaddq_m_n_u])
+;; [vsubq_m_n_s, vsubq_m_n_u])
+;; [vmulq_m_n_s, vmulq_m_n_u])
 ;;
-(define_insn "mve_vaddq_m_n_<supf><mode>"
+(define_insn "@mve_<mve_insn>q_m_n_<supf><mode>"
   [
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:<V_elem> 3 "s_register_operand" "r")
 		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VADDQ_M_N))
+	 MVE_INT_M_N_BINARY))
   ]
   "TARGET_HAVE_MVE"
-  "vpst\;vaddt.i%#<V_sz_elem>	%q0, %q2, %3"
+  "vpst\;<mve_insn>t.i%#<V_sz_elem>	%q0, %q2, %3"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
 ;;
 ;; [vaddq_m_u, vaddq_m_s])
+;; [vsubq_m_u, vsubq_m_s])
+;; [vmulq_m_u, vmulq_m_s])
 ;;
-(define_insn "mve_vaddq_m_<supf><mode>"
+(define_insn "@mve_<mve_insn>q_m_<supf><mode>"
   [
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
 		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VADDQ_M))
+	 MVE_INT_M_BINARY))
   ]
   "TARGET_HAVE_MVE"
-  "vpst\;vaddt.i%#<V_sz_elem>	%q0, %q2, %q3"
+  "vpst\;<mve_insn>t.i%#<V_sz_elem>	%q0, %q2, %q3"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
@@ -5422,40 +5332,6 @@ (define_insn "mve_vmulltq_int_m_<supf><mode>"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
-;;
-;; [vmulq_m_n_u, vmulq_m_n_s])
-;;
-(define_insn "mve_vmulq_m_n_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VMULQ_M_N))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vmult.i%#<V_sz_elem>	%q0, %q2, %3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vmulq_m_s, vmulq_m_u])
-;;
-(define_insn "mve_vmulq_m_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VMULQ_M))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vmult.i%#<V_sz_elem>	%q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
 ;;
 ;; [vornq_m_u, vornq_m_s])
 ;;
@@ -5796,23 +5672,6 @@ (define_insn "mve_vsliq_m_n_<supf><mode>"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
-;;
-;; [vsubq_m_n_s, vsubq_m_n_u])
-;;
-(define_insn "mve_vsubq_m_n_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VSUBQ_M_N))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vsubt.i%#<V_sz_elem>\t%q0, %q2, %3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
 ;;
 ;; [vhcaddq_rot270_m_s])
 ;;
@@ -6613,35 +6472,39 @@ (define_insn "mve_vabdq_m_f<mode>"
 
 ;;
 ;; [vaddq_m_f])
+;; [vsubq_m_f])
+;; [vmulq_m_f])
 ;;
-(define_insn "mve_vaddq_m_f<mode>"
+(define_insn "@mve_<mve_insn>q_m_f<mode>"
   [
    (set (match_operand:MVE_0 0 "s_register_operand" "=w")
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:MVE_0 3 "s_register_operand" "w")
 		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VADDQ_M_F))
+	 MVE_FP_M_BINARY))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vaddt.f%#<V_sz_elem>	%q0, %q2, %q3"
+  "vpst\;<mve_insn>t.f%#<V_sz_elem>	%q0, %q2, %q3"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
 ;;
 ;; [vaddq_m_n_f])
+;; [vsubq_m_n_f])
+;; [vmulq_m_n_f])
 ;;
-(define_insn "mve_vaddq_m_n_f<mode>"
+(define_insn "@mve_<mve_insn>q_m_n_f<mode>"
   [
    (set (match_operand:MVE_0 0 "s_register_operand" "=w")
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:<V_elem> 3 "s_register_operand" "r")
 		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VADDQ_M_N_F))
+	 MVE_FP_M_N_BINARY))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vaddt.f%#<V_sz_elem>	%q0, %q2, %3"
+  "vpst\;<mve_insn>t.f%#<V_sz_elem>	%q0, %q2, %3"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
@@ -6985,40 +6848,6 @@ (define_insn "mve_vminnmq_m_f<mode>"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
-;;
-;; [vmulq_m_f])
-;;
-(define_insn "mve_vmulq_m_f<mode>"
-  [
-   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
-		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VMULQ_M_F))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vmult.f%#<V_sz_elem>	%q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vmulq_m_n_f])
-;;
-(define_insn "mve_vmulq_m_n_f<mode>"
-  [
-   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
-		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VMULQ_M_N_F))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vmult.f%#<V_sz_elem>	%q0, %q2, %3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
 ;;
 ;; [vornq_m_f])
 ;;
@@ -7053,40 +6882,6 @@ (define_insn "mve_vorrq_m_f<mode>"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
-;;
-;; [vsubq_m_f])
-;;
-(define_insn "mve_vsubq_m_f<mode>"
-  [
-   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
-		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VSUBQ_M_F))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vsubt.f%#<V_sz_elem>\t%q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vsubq_m_n_f])
-;;
-(define_insn "mve_vsubq_m_n_f<mode>"
-  [
-   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
-		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VSUBQ_M_N_F))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vsubt.f%#<V_sz_elem>\t%q0, %q2, %3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
 ;;
 ;; [vstrbq_s vstrbq_u]
 ;;
@@ -8927,34 +8722,6 @@ (define_insn "mve_vstrwq_scatter_shifted_offset_<supf>v4si_insn"
   "vstrw.32\t%q2, [%0, %q1, uxtw #2]"
   [(set_attr "length" "4")])
 
-;;
-;; [vaddq_s, vaddq_u])
-;;
-(define_insn "mve_vaddq<mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(plus:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
-		    (match_operand:MVE_2 2 "s_register_operand" "w")))
-  ]
-  "TARGET_HAVE_MVE"
-  "vadd.i%#<V_sz_elem>\t%q0, %q1, %q2"
-  [(set_attr "type" "mve_move")
-])
-
-;;
-;; [vaddq_f])
-;;
-(define_insn "mve_vaddq_f<mode>"
-  [
-   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(plus:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
-		    (match_operand:MVE_0 2 "s_register_operand" "w")))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vadd.f%#<V_sz_elem>\t%q0, %q1, %q2"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vidupq_n_u])
 ;;
-- 
2.34.1



* [PATCH 08/22] arm: [MVE intrinsics] rework vaddq vmulq vsubq
  2023-04-18 13:45 [PATCH 00/22] arm: New framework for MVE intrinsics Christophe Lyon
                   ` (6 preceding siblings ...)
  2023-04-18 13:45 ` [PATCH 07/22] arm: [MVE intrinsics] factorize vaddq vsubq vmulq Christophe Lyon
@ 2023-04-18 13:45 ` Christophe Lyon
  2023-05-02 16:31   ` Kyrylo Tkachov
  2023-04-18 13:45 ` [PATCH 09/22] arm: [MVE intrinsics] add binary shape Christophe Lyon
                   ` (14 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Christophe Lyon @ 2023-04-18 13:45 UTC (permalink / raw)
  To: gcc-patches, kyrylo.tkachov, richard.earnshaw, richard.sandiford
  Cc: Christophe Lyon

Implement vaddq, vmulq, vsubq using the new MVE builtins framework.
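
As a user-level sketch (illustrative only: "add3" is a made-up
function, compiled e.g. with -march=armv8.1-m.main+mve
-mfloat-abi=hard), source code is unchanged by this conversion:

  #include <arm_mve.h>

  int32x4_t
  add3 (int32x4_t a, int32x4_t b, int32x4_t c)
  {
    /* Both the polymorphic and the explicitly-typed forms keep
       working, now resolved by the framework instead of the
       arm_mve.h macros.  */
    return vaddq (vaddq_s32 (a, b), c);
  }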

2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>

gcc/ChangeLog:

	* config/arm/arm-mve-builtins-base.cc (FUNCTION_WITH_RTX_M_N):
	New.
	(vaddq, vmulq, vsubq): New.
	* config/arm/arm-mve-builtins-base.def (vaddq, vmulq, vsubq): New.
	* config/arm/arm-mve-builtins-base.h (vaddq, vmulq, vsubq): New.
	* config/arm/arm_mve.h (vaddq): Remove.
	(vaddq_m): Remove.
	(vaddq_x): Remove.
	(vaddq_n_u8): Remove.
	(vaddq_n_s8): Remove.
	(vaddq_n_u16): Remove.
	(vaddq_n_s16): Remove.
	(vaddq_n_u32): Remove.
	(vaddq_n_s32): Remove.
	(vaddq_n_f16): Remove.
	(vaddq_n_f32): Remove.
	(vaddq_m_n_s8): Remove.
	(vaddq_m_n_s32): Remove.
	(vaddq_m_n_s16): Remove.
	(vaddq_m_n_u8): Remove.
	(vaddq_m_n_u32): Remove.
	(vaddq_m_n_u16): Remove.
	(vaddq_m_s8): Remove.
	(vaddq_m_s32): Remove.
	(vaddq_m_s16): Remove.
	(vaddq_m_u8): Remove.
	(vaddq_m_u32): Remove.
	(vaddq_m_u16): Remove.
	(vaddq_m_f32): Remove.
	(vaddq_m_f16): Remove.
	(vaddq_m_n_f32): Remove.
	(vaddq_m_n_f16): Remove.
	(vaddq_s8): Remove.
	(vaddq_s16): Remove.
	(vaddq_s32): Remove.
	(vaddq_u8): Remove.
	(vaddq_u16): Remove.
	(vaddq_u32): Remove.
	(vaddq_f16): Remove.
	(vaddq_f32): Remove.
	(vaddq_x_s8): Remove.
	(vaddq_x_s16): Remove.
	(vaddq_x_s32): Remove.
	(vaddq_x_n_s8): Remove.
	(vaddq_x_n_s16): Remove.
	(vaddq_x_n_s32): Remove.
	(vaddq_x_u8): Remove.
	(vaddq_x_u16): Remove.
	(vaddq_x_u32): Remove.
	(vaddq_x_n_u8): Remove.
	(vaddq_x_n_u16): Remove.
	(vaddq_x_n_u32): Remove.
	(vaddq_x_f16): Remove.
	(vaddq_x_f32): Remove.
	(vaddq_x_n_f16): Remove.
	(vaddq_x_n_f32): Remove.
	(__arm_vaddq_n_u8): Remove.
	(__arm_vaddq_n_s8): Remove.
	(__arm_vaddq_n_u16): Remove.
	(__arm_vaddq_n_s16): Remove.
	(__arm_vaddq_n_u32): Remove.
	(__arm_vaddq_n_s32): Remove.
	(__arm_vaddq_m_n_s8): Remove.
	(__arm_vaddq_m_n_s32): Remove.
	(__arm_vaddq_m_n_s16): Remove.
	(__arm_vaddq_m_n_u8): Remove.
	(__arm_vaddq_m_n_u32): Remove.
	(__arm_vaddq_m_n_u16): Remove.
	(__arm_vaddq_m_s8): Remove.
	(__arm_vaddq_m_s32): Remove.
	(__arm_vaddq_m_s16): Remove.
	(__arm_vaddq_m_u8): Remove.
	(__arm_vaddq_m_u32): Remove.
	(__arm_vaddq_m_u16): Remove.
	(__arm_vaddq_s8): Remove.
	(__arm_vaddq_s16): Remove.
	(__arm_vaddq_s32): Remove.
	(__arm_vaddq_u8): Remove.
	(__arm_vaddq_u16): Remove.
	(__arm_vaddq_u32): Remove.
	(__arm_vaddq_x_s8): Remove.
	(__arm_vaddq_x_s16): Remove.
	(__arm_vaddq_x_s32): Remove.
	(__arm_vaddq_x_n_s8): Remove.
	(__arm_vaddq_x_n_s16): Remove.
	(__arm_vaddq_x_n_s32): Remove.
	(__arm_vaddq_x_u8): Remove.
	(__arm_vaddq_x_u16): Remove.
	(__arm_vaddq_x_u32): Remove.
	(__arm_vaddq_x_n_u8): Remove.
	(__arm_vaddq_x_n_u16): Remove.
	(__arm_vaddq_x_n_u32): Remove.
	(__arm_vaddq_n_f16): Remove.
	(__arm_vaddq_n_f32): Remove.
	(__arm_vaddq_m_f32): Remove.
	(__arm_vaddq_m_f16): Remove.
	(__arm_vaddq_m_n_f32): Remove.
	(__arm_vaddq_m_n_f16): Remove.
	(__arm_vaddq_f16): Remove.
	(__arm_vaddq_f32): Remove.
	(__arm_vaddq_x_f16): Remove.
	(__arm_vaddq_x_f32): Remove.
	(__arm_vaddq_x_n_f16): Remove.
	(__arm_vaddq_x_n_f32): Remove.
	(__arm_vaddq): Remove.
	(__arm_vaddq_m): Remove.
	(__arm_vaddq_x): Remove.
	(vmulq): Remove.
	(vmulq_m): Remove.
	(vmulq_x): Remove.
	(vmulq_u8): Remove.
	(vmulq_n_u8): Remove.
	(vmulq_s8): Remove.
	(vmulq_n_s8): Remove.
	(vmulq_u16): Remove.
	(vmulq_n_u16): Remove.
	(vmulq_s16): Remove.
	(vmulq_n_s16): Remove.
	(vmulq_u32): Remove.
	(vmulq_n_u32): Remove.
	(vmulq_s32): Remove.
	(vmulq_n_s32): Remove.
	(vmulq_n_f16): Remove.
	(vmulq_f16): Remove.
	(vmulq_n_f32): Remove.
	(vmulq_f32): Remove.
	(vmulq_m_n_s8): Remove.
	(vmulq_m_n_s32): Remove.
	(vmulq_m_n_s16): Remove.
	(vmulq_m_n_u8): Remove.
	(vmulq_m_n_u32): Remove.
	(vmulq_m_n_u16): Remove.
	(vmulq_m_s8): Remove.
	(vmulq_m_s32): Remove.
	(vmulq_m_s16): Remove.
	(vmulq_m_u8): Remove.
	(vmulq_m_u32): Remove.
	(vmulq_m_u16): Remove.
	(vmulq_m_f32): Remove.
	(vmulq_m_f16): Remove.
	(vmulq_m_n_f32): Remove.
	(vmulq_m_n_f16): Remove.
	(vmulq_x_s8): Remove.
	(vmulq_x_s16): Remove.
	(vmulq_x_s32): Remove.
	(vmulq_x_n_s8): Remove.
	(vmulq_x_n_s16): Remove.
	(vmulq_x_n_s32): Remove.
	(vmulq_x_u8): Remove.
	(vmulq_x_u16): Remove.
	(vmulq_x_u32): Remove.
	(vmulq_x_n_u8): Remove.
	(vmulq_x_n_u16): Remove.
	(vmulq_x_n_u32): Remove.
	(vmulq_x_f16): Remove.
	(vmulq_x_f32): Remove.
	(vmulq_x_n_f16): Remove.
	(vmulq_x_n_f32): Remove.
	(__arm_vmulq_u8): Remove.
	(__arm_vmulq_n_u8): Remove.
	(__arm_vmulq_s8): Remove.
	(__arm_vmulq_n_s8): Remove.
	(__arm_vmulq_u16): Remove.
	(__arm_vmulq_n_u16): Remove.
	(__arm_vmulq_s16): Remove.
	(__arm_vmulq_n_s16): Remove.
	(__arm_vmulq_u32): Remove.
	(__arm_vmulq_n_u32): Remove.
	(__arm_vmulq_s32): Remove.
	(__arm_vmulq_n_s32): Remove.
	(__arm_vmulq_m_n_s8): Remove.
	(__arm_vmulq_m_n_s32): Remove.
	(__arm_vmulq_m_n_s16): Remove.
	(__arm_vmulq_m_n_u8): Remove.
	(__arm_vmulq_m_n_u32): Remove.
	(__arm_vmulq_m_n_u16): Remove.
	(__arm_vmulq_m_s8): Remove.
	(__arm_vmulq_m_s32): Remove.
	(__arm_vmulq_m_s16): Remove.
	(__arm_vmulq_m_u8): Remove.
	(__arm_vmulq_m_u32): Remove.
	(__arm_vmulq_m_u16): Remove.
	(__arm_vmulq_x_s8): Remove.
	(__arm_vmulq_x_s16): Remove.
	(__arm_vmulq_x_s32): Remove.
	(__arm_vmulq_x_n_s8): Remove.
	(__arm_vmulq_x_n_s16): Remove.
	(__arm_vmulq_x_n_s32): Remove.
	(__arm_vmulq_x_u8): Remove.
	(__arm_vmulq_x_u16): Remove.
	(__arm_vmulq_x_u32): Remove.
	(__arm_vmulq_x_n_u8): Remove.
	(__arm_vmulq_x_n_u16): Remove.
	(__arm_vmulq_x_n_u32): Remove.
	(__arm_vmulq_n_f16): Remove.
	(__arm_vmulq_f16): Remove.
	(__arm_vmulq_n_f32): Remove.
	(__arm_vmulq_f32): Remove.
	(__arm_vmulq_m_f32): Remove.
	(__arm_vmulq_m_f16): Remove.
	(__arm_vmulq_m_n_f32): Remove.
	(__arm_vmulq_m_n_f16): Remove.
	(__arm_vmulq_x_f16): Remove.
	(__arm_vmulq_x_f32): Remove.
	(__arm_vmulq_x_n_f16): Remove.
	(__arm_vmulq_x_n_f32): Remove.
	(__arm_vmulq): Remove.
	(__arm_vmulq_m): Remove.
	(__arm_vmulq_x): Remove.
	(vsubq): Remove.
	(vsubq_m): Remove.
	(vsubq_x): Remove.
	(vsubq_n_f16): Remove.
	(vsubq_n_f32): Remove.
	(vsubq_u8): Remove.
	(vsubq_n_u8): Remove.
	(vsubq_s8): Remove.
	(vsubq_n_s8): Remove.
	(vsubq_u16): Remove.
	(vsubq_n_u16): Remove.
	(vsubq_s16): Remove.
	(vsubq_n_s16): Remove.
	(vsubq_u32): Remove.
	(vsubq_n_u32): Remove.
	(vsubq_s32): Remove.
	(vsubq_n_s32): Remove.
	(vsubq_f16): Remove.
	(vsubq_f32): Remove.
	(vsubq_m_s8): Remove.
	(vsubq_m_u8): Remove.
	(vsubq_m_s16): Remove.
	(vsubq_m_u16): Remove.
	(vsubq_m_s32): Remove.
	(vsubq_m_u32): Remove.
	(vsubq_m_n_s8): Remove.
	(vsubq_m_n_s32): Remove.
	(vsubq_m_n_s16): Remove.
	(vsubq_m_n_u8): Remove.
	(vsubq_m_n_u32): Remove.
	(vsubq_m_n_u16): Remove.
	(vsubq_m_f32): Remove.
	(vsubq_m_f16): Remove.
	(vsubq_m_n_f32): Remove.
	(vsubq_m_n_f16): Remove.
	(vsubq_x_s8): Remove.
	(vsubq_x_s16): Remove.
	(vsubq_x_s32): Remove.
	(vsubq_x_n_s8): Remove.
	(vsubq_x_n_s16): Remove.
	(vsubq_x_n_s32): Remove.
	(vsubq_x_u8): Remove.
	(vsubq_x_u16): Remove.
	(vsubq_x_u32): Remove.
	(vsubq_x_n_u8): Remove.
	(vsubq_x_n_u16): Remove.
	(vsubq_x_n_u32): Remove.
	(vsubq_x_f16): Remove.
	(vsubq_x_f32): Remove.
	(vsubq_x_n_f16): Remove.
	(vsubq_x_n_f32): Remove.
	(__arm_vsubq_u8): Remove.
	(__arm_vsubq_n_u8): Remove.
	(__arm_vsubq_s8): Remove.
	(__arm_vsubq_n_s8): Remove.
	(__arm_vsubq_u16): Remove.
	(__arm_vsubq_n_u16): Remove.
	(__arm_vsubq_s16): Remove.
	(__arm_vsubq_n_s16): Remove.
	(__arm_vsubq_u32): Remove.
	(__arm_vsubq_n_u32): Remove.
	(__arm_vsubq_s32): Remove.
	(__arm_vsubq_n_s32): Remove.
	(__arm_vsubq_m_s8): Remove.
	(__arm_vsubq_m_u8): Remove.
	(__arm_vsubq_m_s16): Remove.
	(__arm_vsubq_m_u16): Remove.
	(__arm_vsubq_m_s32): Remove.
	(__arm_vsubq_m_u32): Remove.
	(__arm_vsubq_m_n_s8): Remove.
	(__arm_vsubq_m_n_s32): Remove.
	(__arm_vsubq_m_n_s16): Remove.
	(__arm_vsubq_m_n_u8): Remove.
	(__arm_vsubq_m_n_u32): Remove.
	(__arm_vsubq_m_n_u16): Remove.
	(__arm_vsubq_x_s8): Remove.
	(__arm_vsubq_x_s16): Remove.
	(__arm_vsubq_x_s32): Remove.
	(__arm_vsubq_x_n_s8): Remove.
	(__arm_vsubq_x_n_s16): Remove.
	(__arm_vsubq_x_n_s32): Remove.
	(__arm_vsubq_x_u8): Remove.
	(__arm_vsubq_x_u16): Remove.
	(__arm_vsubq_x_u32): Remove.
	(__arm_vsubq_x_n_u8): Remove.
	(__arm_vsubq_x_n_u16): Remove.
	(__arm_vsubq_x_n_u32): Remove.
	(__arm_vsubq_n_f16): Remove.
	(__arm_vsubq_n_f32): Remove.
	(__arm_vsubq_f16): Remove.
	(__arm_vsubq_f32): Remove.
	(__arm_vsubq_m_f32): Remove.
	(__arm_vsubq_m_f16): Remove.
	(__arm_vsubq_m_n_f32): Remove.
	(__arm_vsubq_m_n_f16): Remove.
	(__arm_vsubq_x_f16): Remove.
	(__arm_vsubq_x_f32): Remove.
	(__arm_vsubq_x_n_f16): Remove.
	(__arm_vsubq_x_n_f32): Remove.
	(__arm_vsubq): Remove.
	(__arm_vsubq_m): Remove.
	(__arm_vsubq_x): Remove.
	* config/arm/arm_mve_builtins.def (vsubq_u, vsubq_s, vsubq_f):
	Remove.
	(vmulq_u, vmulq_s, vmulq_f): Remove.
	* config/arm/mve.md (mve_vsubq_<supf><mode>): Remove.
	(mve_vmulq_<supf><mode>): Remove.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |   11 +
 gcc/config/arm/arm-mve-builtins-base.def |    6 +
 gcc/config/arm/arm-mve-builtins-base.h   |    3 +
 gcc/config/arm/arm_mve.h                 | 2498 ----------------------
 gcc/config/arm/arm_mve_builtins.def      |    6 -
 gcc/config/arm/mve.md                    |   27 -
 6 files changed, 20 insertions(+), 2531 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index 02a3b23865c..48b09bffd0c 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -82,7 +82,18 @@ class vuninitializedq_impl : public quiet<function_base>
 
 namespace arm_mve {
 
+  /* Helper for builtins with RTX codes, _m predicated and _n overrides.  */
+#define FUNCTION_WITH_RTX_M_N(NAME, RTX, UNSPEC) FUNCTION		\
+  (NAME, unspec_based_mve_function_exact_insn,				\
+   (RTX, RTX, RTX,							\
+    UNSPEC##_N_S, UNSPEC##_N_U, UNSPEC##_N_F,				\
+    UNSPEC##_M_S, UNSPEC##_M_U, UNSPEC##_M_F,				\
+    UNSPEC##_M_N_S, UNSPEC##_M_N_U, UNSPEC##_M_N_F))
+
+FUNCTION_WITH_RTX_M_N (vaddq, PLUS, VADDQ)
+FUNCTION_WITH_RTX_M_N (vmulq, MULT, VMULQ)
 FUNCTION (vreinterpretq, vreinterpretq_impl,)
+FUNCTION_WITH_RTX_M_N (vsubq, MINUS, VSUBQ)
 FUNCTION (vuninitializedq, vuninitializedq_impl,)
 
 } /* end namespace arm_mve */
diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
index f669642a259..624558c08b2 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -18,11 +18,17 @@
    <http://www.gnu.org/licenses/>.  */
 
 #define REQUIRES_FLOAT false
+DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_integer, mx_or_none)
+DEF_MVE_FUNCTION (vmulq, binary_opt_n, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vreinterpretq, unary_convert, reinterpret_integer, none)
+DEF_MVE_FUNCTION (vsubq, binary_opt_n, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vuninitializedq, inherent, all_integer_with_64, none)
 #undef REQUIRES_FLOAT
 
 #define REQUIRES_FLOAT true
+DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_float, mx_or_none)
+DEF_MVE_FUNCTION (vmulq, binary_opt_n, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vreinterpretq, unary_convert, reinterpret_float, none)
+DEF_MVE_FUNCTION (vsubq, binary_opt_n, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vuninitializedq, inherent, all_float, none)
 #undef REQUIRES_FLOAT
diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h
index ec309cbe572..30f8549c495 100644
--- a/gcc/config/arm/arm-mve-builtins-base.h
+++ b/gcc/config/arm/arm-mve-builtins-base.h
@@ -23,7 +23,10 @@
 namespace arm_mve {
 namespace functions {
 
+extern const function_base *const vaddq;
+extern const function_base *const vmulq;
 extern const function_base *const vreinterpretq;
+extern const function_base *const vsubq;
 extern const function_base *const vuninitializedq;
 
 } /* end namespace arm_mve::functions */
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 5dc5ecef134..42a1af2ae15 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -61,14 +61,12 @@
 #define vaddlvq_p(__a, __p) __arm_vaddlvq_p(__a, __p)
 #define vcmpneq(__a, __b) __arm_vcmpneq(__a, __b)
 #define vshlq(__a, __b) __arm_vshlq(__a, __b)
-#define vsubq(__a, __b) __arm_vsubq(__a, __b)
 #define vrmulhq(__a, __b) __arm_vrmulhq(__a, __b)
 #define vrhaddq(__a, __b) __arm_vrhaddq(__a, __b)
 #define vqsubq(__a, __b) __arm_vqsubq(__a, __b)
 #define vqaddq(__a, __b) __arm_vqaddq(__a, __b)
 #define vorrq(__a, __b) __arm_vorrq(__a, __b)
 #define vornq(__a, __b) __arm_vornq(__a, __b)
-#define vmulq(__a, __b) __arm_vmulq(__a, __b)
 #define vmulltq_int(__a, __b) __arm_vmulltq_int(__a, __b)
 #define vmullbq_int(__a, __b) __arm_vmullbq_int(__a, __b)
 #define vmulhq(__a, __b) __arm_vmulhq(__a, __b)
@@ -89,7 +87,6 @@
 #define vandq(__a, __b) __arm_vandq(__a, __b)
 #define vaddvq_p(__a, __p) __arm_vaddvq_p(__a, __p)
 #define vaddvaq(__a, __b) __arm_vaddvaq(__a, __b)
-#define vaddq(__a, __b) __arm_vaddq(__a, __b)
 #define vabdq(__a, __b) __arm_vabdq(__a, __b)
 #define vshlq_r(__a, __b) __arm_vshlq_r(__a, __b)
 #define vrshlq(__a, __b) __arm_vrshlq(__a, __b)
@@ -235,12 +232,10 @@
 #define vqmovunbq_m(__a, __b, __p) __arm_vqmovunbq_m(__a, __b, __p)
 #define vqmovuntq_m(__a, __b, __p) __arm_vqmovuntq_m(__a, __b, __p)
 #define vsriq_m(__a, __b, __imm, __p) __arm_vsriq_m(__a, __b, __imm, __p)
-#define vsubq_m(__inactive, __a, __b, __p) __arm_vsubq_m(__inactive, __a, __b, __p)
 #define vqshluq_m(__inactive, __a, __imm, __p) __arm_vqshluq_m(__inactive, __a, __imm, __p)
 #define vabavq_p(__a, __b, __c, __p) __arm_vabavq_p(__a, __b, __c, __p)
 #define vshlq_m(__inactive, __a, __b, __p) __arm_vshlq_m(__inactive, __a, __b, __p)
 #define vabdq_m(__inactive, __a, __b, __p) __arm_vabdq_m(__inactive, __a, __b, __p)
-#define vaddq_m(__inactive, __a, __b, __p) __arm_vaddq_m(__inactive, __a, __b, __p)
 #define vandq_m(__inactive, __a, __b, __p) __arm_vandq_m(__inactive, __a, __b, __p)
 #define vbicq_m(__inactive, __a, __b, __p) __arm_vbicq_m(__inactive, __a, __b, __p)
 #define vbrsrq_m(__inactive, __a, __b, __p) __arm_vbrsrq_m(__inactive, __a, __b, __p)
@@ -262,7 +257,6 @@
 #define vmulhq_m(__inactive, __a, __b, __p) __arm_vmulhq_m(__inactive, __a, __b, __p)
 #define vmullbq_int_m(__inactive, __a, __b, __p) __arm_vmullbq_int_m(__inactive, __a, __b, __p)
 #define vmulltq_int_m(__inactive, __a, __b, __p) __arm_vmulltq_int_m(__inactive, __a, __b, __p)
-#define vmulq_m(__inactive, __a, __b, __p) __arm_vmulq_m(__inactive, __a, __b, __p)
 #define vornq_m(__inactive, __a, __b, __p) __arm_vornq_m(__inactive, __a, __b, __p)
 #define vorrq_m(__inactive, __a, __b, __p) __arm_vorrq_m(__inactive, __a, __b, __p)
 #define vqaddq_m(__inactive, __a, __b, __p) __arm_vqaddq_m(__inactive, __a, __b, __p)
@@ -394,7 +388,6 @@
 #define vmaxq_x(__a, __b, __p) __arm_vmaxq_x(__a, __b, __p)
 #define vabdq_x(__a, __b, __p) __arm_vabdq_x(__a, __b, __p)
 #define vabsq_x(__a, __p) __arm_vabsq_x(__a, __p)
-#define vaddq_x(__a, __b, __p) __arm_vaddq_x(__a, __b, __p)
 #define vclsq_x(__a, __p) __arm_vclsq_x(__a, __p)
 #define vclzq_x(__a, __p) __arm_vclzq_x(__a, __p)
 #define vnegq_x(__a, __p) __arm_vnegq_x(__a, __p)
@@ -403,8 +396,6 @@
 #define vmullbq_int_x(__a, __b, __p) __arm_vmullbq_int_x(__a, __b, __p)
 #define vmulltq_poly_x(__a, __b, __p) __arm_vmulltq_poly_x(__a, __b, __p)
 #define vmulltq_int_x(__a, __b, __p) __arm_vmulltq_int_x(__a, __b, __p)
-#define vmulq_x(__a, __b, __p) __arm_vmulq_x(__a, __b, __p)
-#define vsubq_x(__a, __b, __p) __arm_vsubq_x(__a, __b, __p)
 #define vcaddq_rot90_x(__a, __b, __p) __arm_vcaddq_rot90_x(__a, __b, __p)
 #define vcaddq_rot270_x(__a, __b, __p) __arm_vcaddq_rot270_x(__a, __b, __p)
 #define vhaddq_x(__a, __b, __p) __arm_vhaddq_x(__a, __b, __p)
@@ -651,8 +642,6 @@
 #define vctp64q(__a) __arm_vctp64q(__a)
 #define vctp8q(__a) __arm_vctp8q(__a)
 #define vpnot(__a) __arm_vpnot(__a)
-#define vsubq_n_f16(__a, __b) __arm_vsubq_n_f16(__a, __b)
-#define vsubq_n_f32(__a, __b) __arm_vsubq_n_f32(__a, __b)
 #define vbrsrq_n_f16(__a, __b) __arm_vbrsrq_n_f16(__a, __b)
 #define vbrsrq_n_f32(__a, __b) __arm_vbrsrq_n_f32(__a, __b)
 #define vcvtq_n_f16_s16(__a,  __imm6) __arm_vcvtq_n_f16_s16(__a,  __imm6)
@@ -693,8 +682,6 @@
 #define vshlq_u8(__a, __b) __arm_vshlq_u8(__a, __b)
 #define vshlq_u16(__a, __b) __arm_vshlq_u16(__a, __b)
 #define vshlq_u32(__a, __b) __arm_vshlq_u32(__a, __b)
-#define vsubq_u8(__a, __b) __arm_vsubq_u8(__a, __b)
-#define vsubq_n_u8(__a, __b) __arm_vsubq_n_u8(__a, __b)
 #define vrmulhq_u8(__a, __b) __arm_vrmulhq_u8(__a, __b)
 #define vrhaddq_u8(__a, __b) __arm_vrhaddq_u8(__a, __b)
 #define vqsubq_u8(__a, __b) __arm_vqsubq_u8(__a, __b)
@@ -703,8 +690,6 @@
 #define vqaddq_n_u8(__a, __b) __arm_vqaddq_n_u8(__a, __b)
 #define vorrq_u8(__a, __b) __arm_vorrq_u8(__a, __b)
 #define vornq_u8(__a, __b) __arm_vornq_u8(__a, __b)
-#define vmulq_u8(__a, __b) __arm_vmulq_u8(__a, __b)
-#define vmulq_n_u8(__a, __b) __arm_vmulq_n_u8(__a, __b)
 #define vmulltq_int_u8(__a, __b) __arm_vmulltq_int_u8(__a, __b)
 #define vmullbq_int_u8(__a, __b) __arm_vmullbq_int_u8(__a, __b)
 #define vmulhq_u8(__a, __b) __arm_vmulhq_u8(__a, __b)
@@ -731,7 +716,6 @@
 #define vandq_u8(__a, __b) __arm_vandq_u8(__a, __b)
 #define vaddvq_p_u8(__a, __p) __arm_vaddvq_p_u8(__a, __p)
 #define vaddvaq_u8(__a, __b) __arm_vaddvaq_u8(__a, __b)
-#define vaddq_n_u8(__a, __b) __arm_vaddq_n_u8(__a, __b)
 #define vabdq_u8(__a, __b) __arm_vabdq_u8(__a, __b)
 #define vshlq_r_u8(__a, __b) __arm_vshlq_r_u8(__a, __b)
 #define vrshlq_u8(__a, __b) __arm_vrshlq_u8(__a, __b)
@@ -761,8 +745,6 @@
 #define vcmpeqq_n_s8(__a, __b) __arm_vcmpeqq_n_s8(__a, __b)
 #define vqshluq_n_s8(__a,  __imm) __arm_vqshluq_n_s8(__a,  __imm)
 #define vaddvq_p_s8(__a, __p) __arm_vaddvq_p_s8(__a, __p)
-#define vsubq_s8(__a, __b) __arm_vsubq_s8(__a, __b)
-#define vsubq_n_s8(__a, __b) __arm_vsubq_n_s8(__a, __b)
 #define vshlq_r_s8(__a, __b) __arm_vshlq_r_s8(__a, __b)
 #define vrshlq_s8(__a, __b) __arm_vrshlq_s8(__a, __b)
 #define vrshlq_n_s8(__a, __b) __arm_vrshlq_n_s8(__a, __b)
@@ -782,8 +764,6 @@
 #define vqaddq_n_s8(__a, __b) __arm_vqaddq_n_s8(__a, __b)
 #define vorrq_s8(__a, __b) __arm_vorrq_s8(__a, __b)
 #define vornq_s8(__a, __b) __arm_vornq_s8(__a, __b)
-#define vmulq_s8(__a, __b) __arm_vmulq_s8(__a, __b)
-#define vmulq_n_s8(__a, __b) __arm_vmulq_n_s8(__a, __b)
 #define vmulltq_int_s8(__a, __b) __arm_vmulltq_int_s8(__a, __b)
 #define vmullbq_int_s8(__a, __b) __arm_vmullbq_int_s8(__a, __b)
 #define vmulhq_s8(__a, __b) __arm_vmulhq_s8(__a, __b)
@@ -808,13 +788,10 @@
 #define vbicq_s8(__a, __b) __arm_vbicq_s8(__a, __b)
 #define vandq_s8(__a, __b) __arm_vandq_s8(__a, __b)
 #define vaddvaq_s8(__a, __b) __arm_vaddvaq_s8(__a, __b)
-#define vaddq_n_s8(__a, __b) __arm_vaddq_n_s8(__a, __b)
 #define vabdq_s8(__a, __b) __arm_vabdq_s8(__a, __b)
 #define vshlq_n_s8(__a,  __imm) __arm_vshlq_n_s8(__a,  __imm)
 #define vrshrq_n_s8(__a,  __imm) __arm_vrshrq_n_s8(__a,  __imm)
 #define vqshlq_n_s8(__a,  __imm) __arm_vqshlq_n_s8(__a,  __imm)
-#define vsubq_u16(__a, __b) __arm_vsubq_u16(__a, __b)
-#define vsubq_n_u16(__a, __b) __arm_vsubq_n_u16(__a, __b)
 #define vrmulhq_u16(__a, __b) __arm_vrmulhq_u16(__a, __b)
 #define vrhaddq_u16(__a, __b) __arm_vrhaddq_u16(__a, __b)
 #define vqsubq_u16(__a, __b) __arm_vqsubq_u16(__a, __b)
@@ -823,8 +800,6 @@
 #define vqaddq_n_u16(__a, __b) __arm_vqaddq_n_u16(__a, __b)
 #define vorrq_u16(__a, __b) __arm_vorrq_u16(__a, __b)
 #define vornq_u16(__a, __b) __arm_vornq_u16(__a, __b)
-#define vmulq_u16(__a, __b) __arm_vmulq_u16(__a, __b)
-#define vmulq_n_u16(__a, __b) __arm_vmulq_n_u16(__a, __b)
 #define vmulltq_int_u16(__a, __b) __arm_vmulltq_int_u16(__a, __b)
 #define vmullbq_int_u16(__a, __b) __arm_vmullbq_int_u16(__a, __b)
 #define vmulhq_u16(__a, __b) __arm_vmulhq_u16(__a, __b)
@@ -851,7 +826,6 @@
 #define vandq_u16(__a, __b) __arm_vandq_u16(__a, __b)
 #define vaddvq_p_u16(__a, __p) __arm_vaddvq_p_u16(__a, __p)
 #define vaddvaq_u16(__a, __b) __arm_vaddvaq_u16(__a, __b)
-#define vaddq_n_u16(__a, __b) __arm_vaddq_n_u16(__a, __b)
 #define vabdq_u16(__a, __b) __arm_vabdq_u16(__a, __b)
 #define vshlq_r_u16(__a, __b) __arm_vshlq_r_u16(__a, __b)
 #define vrshlq_u16(__a, __b) __arm_vrshlq_u16(__a, __b)
@@ -881,8 +855,6 @@
 #define vcmpeqq_n_s16(__a, __b) __arm_vcmpeqq_n_s16(__a, __b)
 #define vqshluq_n_s16(__a,  __imm) __arm_vqshluq_n_s16(__a,  __imm)
 #define vaddvq_p_s16(__a, __p) __arm_vaddvq_p_s16(__a, __p)
-#define vsubq_s16(__a, __b) __arm_vsubq_s16(__a, __b)
-#define vsubq_n_s16(__a, __b) __arm_vsubq_n_s16(__a, __b)
 #define vshlq_r_s16(__a, __b) __arm_vshlq_r_s16(__a, __b)
 #define vrshlq_s16(__a, __b) __arm_vrshlq_s16(__a, __b)
 #define vrshlq_n_s16(__a, __b) __arm_vrshlq_n_s16(__a, __b)
@@ -902,8 +874,6 @@
 #define vqaddq_n_s16(__a, __b) __arm_vqaddq_n_s16(__a, __b)
 #define vorrq_s16(__a, __b) __arm_vorrq_s16(__a, __b)
 #define vornq_s16(__a, __b) __arm_vornq_s16(__a, __b)
-#define vmulq_s16(__a, __b) __arm_vmulq_s16(__a, __b)
-#define vmulq_n_s16(__a, __b) __arm_vmulq_n_s16(__a, __b)
 #define vmulltq_int_s16(__a, __b) __arm_vmulltq_int_s16(__a, __b)
 #define vmullbq_int_s16(__a, __b) __arm_vmullbq_int_s16(__a, __b)
 #define vmulhq_s16(__a, __b) __arm_vmulhq_s16(__a, __b)
@@ -928,13 +898,10 @@
 #define vbicq_s16(__a, __b) __arm_vbicq_s16(__a, __b)
 #define vandq_s16(__a, __b) __arm_vandq_s16(__a, __b)
 #define vaddvaq_s16(__a, __b) __arm_vaddvaq_s16(__a, __b)
-#define vaddq_n_s16(__a, __b) __arm_vaddq_n_s16(__a, __b)
 #define vabdq_s16(__a, __b) __arm_vabdq_s16(__a, __b)
 #define vshlq_n_s16(__a,  __imm) __arm_vshlq_n_s16(__a,  __imm)
 #define vrshrq_n_s16(__a,  __imm) __arm_vrshrq_n_s16(__a,  __imm)
 #define vqshlq_n_s16(__a,  __imm) __arm_vqshlq_n_s16(__a,  __imm)
-#define vsubq_u32(__a, __b) __arm_vsubq_u32(__a, __b)
-#define vsubq_n_u32(__a, __b) __arm_vsubq_n_u32(__a, __b)
 #define vrmulhq_u32(__a, __b) __arm_vrmulhq_u32(__a, __b)
 #define vrhaddq_u32(__a, __b) __arm_vrhaddq_u32(__a, __b)
 #define vqsubq_u32(__a, __b) __arm_vqsubq_u32(__a, __b)
@@ -943,8 +910,6 @@
 #define vqaddq_n_u32(__a, __b) __arm_vqaddq_n_u32(__a, __b)
 #define vorrq_u32(__a, __b) __arm_vorrq_u32(__a, __b)
 #define vornq_u32(__a, __b) __arm_vornq_u32(__a, __b)
-#define vmulq_u32(__a, __b) __arm_vmulq_u32(__a, __b)
-#define vmulq_n_u32(__a, __b) __arm_vmulq_n_u32(__a, __b)
 #define vmulltq_int_u32(__a, __b) __arm_vmulltq_int_u32(__a, __b)
 #define vmullbq_int_u32(__a, __b) __arm_vmullbq_int_u32(__a, __b)
 #define vmulhq_u32(__a, __b) __arm_vmulhq_u32(__a, __b)
@@ -971,7 +936,6 @@
 #define vandq_u32(__a, __b) __arm_vandq_u32(__a, __b)
 #define vaddvq_p_u32(__a, __p) __arm_vaddvq_p_u32(__a, __p)
 #define vaddvaq_u32(__a, __b) __arm_vaddvaq_u32(__a, __b)
-#define vaddq_n_u32(__a, __b) __arm_vaddq_n_u32(__a, __b)
 #define vabdq_u32(__a, __b) __arm_vabdq_u32(__a, __b)
 #define vshlq_r_u32(__a, __b) __arm_vshlq_r_u32(__a, __b)
 #define vrshlq_u32(__a, __b) __arm_vrshlq_u32(__a, __b)
@@ -1001,8 +965,6 @@
 #define vcmpeqq_n_s32(__a, __b) __arm_vcmpeqq_n_s32(__a, __b)
 #define vqshluq_n_s32(__a,  __imm) __arm_vqshluq_n_s32(__a,  __imm)
 #define vaddvq_p_s32(__a, __p) __arm_vaddvq_p_s32(__a, __p)
-#define vsubq_s32(__a, __b) __arm_vsubq_s32(__a, __b)
-#define vsubq_n_s32(__a, __b) __arm_vsubq_n_s32(__a, __b)
 #define vshlq_r_s32(__a, __b) __arm_vshlq_r_s32(__a, __b)
 #define vrshlq_s32(__a, __b) __arm_vrshlq_s32(__a, __b)
 #define vrshlq_n_s32(__a, __b) __arm_vrshlq_n_s32(__a, __b)
@@ -1022,8 +984,6 @@
 #define vqaddq_n_s32(__a, __b) __arm_vqaddq_n_s32(__a, __b)
 #define vorrq_s32(__a, __b) __arm_vorrq_s32(__a, __b)
 #define vornq_s32(__a, __b) __arm_vornq_s32(__a, __b)
-#define vmulq_s32(__a, __b) __arm_vmulq_s32(__a, __b)
-#define vmulq_n_s32(__a, __b) __arm_vmulq_n_s32(__a, __b)
 #define vmulltq_int_s32(__a, __b) __arm_vmulltq_int_s32(__a, __b)
 #define vmullbq_int_s32(__a, __b) __arm_vmullbq_int_s32(__a, __b)
 #define vmulhq_s32(__a, __b) __arm_vmulhq_s32(__a, __b)
@@ -1048,7 +1008,6 @@
 #define vbicq_s32(__a, __b) __arm_vbicq_s32(__a, __b)
 #define vandq_s32(__a, __b) __arm_vandq_s32(__a, __b)
 #define vaddvaq_s32(__a, __b) __arm_vaddvaq_s32(__a, __b)
-#define vaddq_n_s32(__a, __b) __arm_vaddq_n_s32(__a, __b)
 #define vabdq_s32(__a, __b) __arm_vabdq_s32(__a, __b)
 #define vshlq_n_s32(__a,  __imm) __arm_vshlq_n_s32(__a,  __imm)
 #define vrshrq_n_s32(__a,  __imm) __arm_vrshrq_n_s32(__a,  __imm)
@@ -1078,7 +1037,6 @@
 #define vcmpgeq_f16(__a, __b) __arm_vcmpgeq_f16(__a, __b)
 #define vcmpeqq_n_f16(__a, __b) __arm_vcmpeqq_n_f16(__a, __b)
 #define vcmpeqq_f16(__a, __b) __arm_vcmpeqq_f16(__a, __b)
-#define vsubq_f16(__a, __b) __arm_vsubq_f16(__a, __b)
 #define vqmovntq_s16(__a, __b) __arm_vqmovntq_s16(__a, __b)
 #define vqmovnbq_s16(__a, __b) __arm_vqmovnbq_s16(__a, __b)
 #define vqdmulltq_s16(__a, __b) __arm_vqdmulltq_s16(__a, __b)
@@ -1087,8 +1045,6 @@
 #define vqdmullbq_n_s16(__a, __b) __arm_vqdmullbq_n_s16(__a, __b)
 #define vorrq_f16(__a, __b) __arm_vorrq_f16(__a, __b)
 #define vornq_f16(__a, __b) __arm_vornq_f16(__a, __b)
-#define vmulq_n_f16(__a, __b) __arm_vmulq_n_f16(__a, __b)
-#define vmulq_f16(__a, __b) __arm_vmulq_f16(__a, __b)
 #define vmovntq_s16(__a, __b) __arm_vmovntq_s16(__a, __b)
 #define vmovnbq_s16(__a, __b) __arm_vmovnbq_s16(__a, __b)
 #define vmlsldavxq_s16(__a, __b) __arm_vmlsldavxq_s16(__a, __b)
@@ -1112,7 +1068,6 @@
 #define vcaddq_rot270_f16(__a, __b) __arm_vcaddq_rot270_f16(__a, __b)
 #define vbicq_f16(__a, __b) __arm_vbicq_f16(__a, __b)
 #define vandq_f16(__a, __b) __arm_vandq_f16(__a, __b)
-#define vaddq_n_f16(__a, __b) __arm_vaddq_n_f16(__a, __b)
 #define vabdq_f16(__a, __b) __arm_vabdq_f16(__a, __b)
 #define vshlltq_n_s8(__a,  __imm) __arm_vshlltq_n_s8(__a,  __imm)
 #define vshllbq_n_s8(__a,  __imm) __arm_vshllbq_n_s8(__a,  __imm)
@@ -1143,7 +1098,6 @@
 #define vcmpgeq_f32(__a, __b) __arm_vcmpgeq_f32(__a, __b)
 #define vcmpeqq_n_f32(__a, __b) __arm_vcmpeqq_n_f32(__a, __b)
 #define vcmpeqq_f32(__a, __b) __arm_vcmpeqq_f32(__a, __b)
-#define vsubq_f32(__a, __b) __arm_vsubq_f32(__a, __b)
 #define vqmovntq_s32(__a, __b) __arm_vqmovntq_s32(__a, __b)
 #define vqmovnbq_s32(__a, __b) __arm_vqmovnbq_s32(__a, __b)
 #define vqdmulltq_s32(__a, __b) __arm_vqdmulltq_s32(__a, __b)
@@ -1152,8 +1106,6 @@
 #define vqdmullbq_n_s32(__a, __b) __arm_vqdmullbq_n_s32(__a, __b)
 #define vorrq_f32(__a, __b) __arm_vorrq_f32(__a, __b)
 #define vornq_f32(__a, __b) __arm_vornq_f32(__a, __b)
-#define vmulq_n_f32(__a, __b) __arm_vmulq_n_f32(__a, __b)
-#define vmulq_f32(__a, __b) __arm_vmulq_f32(__a, __b)
 #define vmovntq_s32(__a, __b) __arm_vmovntq_s32(__a, __b)
 #define vmovnbq_s32(__a, __b) __arm_vmovnbq_s32(__a, __b)
 #define vmlsldavxq_s32(__a, __b) __arm_vmlsldavxq_s32(__a, __b)
@@ -1177,7 +1129,6 @@
 #define vcaddq_rot270_f32(__a, __b) __arm_vcaddq_rot270_f32(__a, __b)
 #define vbicq_f32(__a, __b) __arm_vbicq_f32(__a, __b)
 #define vandq_f32(__a, __b) __arm_vandq_f32(__a, __b)
-#define vaddq_n_f32(__a, __b) __arm_vaddq_n_f32(__a, __b)
 #define vabdq_f32(__a, __b) __arm_vabdq_f32(__a, __b)
 #define vshlltq_n_s16(__a,  __imm) __arm_vshlltq_n_s16(__a,  __imm)
 #define vshllbq_n_s16(__a,  __imm) __arm_vshllbq_n_s16(__a,  __imm)
@@ -1681,34 +1632,28 @@
 #define vqmovntq_m_u32(__a, __b, __p) __arm_vqmovntq_m_u32(__a, __b, __p)
 #define vrev32q_m_u16(__inactive, __a, __p) __arm_vrev32q_m_u16(__inactive, __a, __p)
 #define vsriq_m_n_s8(__a, __b,  __imm, __p) __arm_vsriq_m_n_s8(__a, __b,  __imm, __p)
-#define vsubq_m_s8(__inactive, __a, __b, __p) __arm_vsubq_m_s8(__inactive, __a, __b, __p)
 #define vcvtq_m_n_f16_u16(__inactive, __a,  __imm6, __p) __arm_vcvtq_m_n_f16_u16(__inactive, __a,  __imm6, __p)
 #define vqshluq_m_n_s8(__inactive, __a,  __imm, __p) __arm_vqshluq_m_n_s8(__inactive, __a,  __imm, __p)
 #define vabavq_p_s8(__a, __b, __c, __p) __arm_vabavq_p_s8(__a, __b, __c, __p)
 #define vsriq_m_n_u8(__a, __b,  __imm, __p) __arm_vsriq_m_n_u8(__a, __b,  __imm, __p)
 #define vshlq_m_u8(__inactive, __a, __b, __p) __arm_vshlq_m_u8(__inactive, __a, __b, __p)
-#define vsubq_m_u8(__inactive, __a, __b, __p) __arm_vsubq_m_u8(__inactive, __a, __b, __p)
 #define vabavq_p_u8(__a, __b, __c, __p) __arm_vabavq_p_u8(__a, __b, __c, __p)
 #define vshlq_m_s8(__inactive, __a, __b, __p) __arm_vshlq_m_s8(__inactive, __a, __b, __p)
 #define vcvtq_m_n_f16_s16(__inactive, __a,  __imm6, __p) __arm_vcvtq_m_n_f16_s16(__inactive, __a,  __imm6, __p)
 #define vsriq_m_n_s16(__a, __b,  __imm, __p) __arm_vsriq_m_n_s16(__a, __b,  __imm, __p)
-#define vsubq_m_s16(__inactive, __a, __b, __p) __arm_vsubq_m_s16(__inactive, __a, __b, __p)
 #define vcvtq_m_n_f32_u32(__inactive, __a,  __imm6, __p) __arm_vcvtq_m_n_f32_u32(__inactive, __a,  __imm6, __p)
 #define vqshluq_m_n_s16(__inactive, __a,  __imm, __p) __arm_vqshluq_m_n_s16(__inactive, __a,  __imm, __p)
 #define vabavq_p_s16(__a, __b, __c, __p) __arm_vabavq_p_s16(__a, __b, __c, __p)
 #define vsriq_m_n_u16(__a, __b,  __imm, __p) __arm_vsriq_m_n_u16(__a, __b,  __imm, __p)
 #define vshlq_m_u16(__inactive, __a, __b, __p) __arm_vshlq_m_u16(__inactive, __a, __b, __p)
-#define vsubq_m_u16(__inactive, __a, __b, __p) __arm_vsubq_m_u16(__inactive, __a, __b, __p)
 #define vabavq_p_u16(__a, __b, __c, __p) __arm_vabavq_p_u16(__a, __b, __c, __p)
 #define vshlq_m_s16(__inactive, __a, __b, __p) __arm_vshlq_m_s16(__inactive, __a, __b, __p)
 #define vcvtq_m_n_f32_s32(__inactive, __a,  __imm6, __p) __arm_vcvtq_m_n_f32_s32(__inactive, __a,  __imm6, __p)
 #define vsriq_m_n_s32(__a, __b,  __imm, __p) __arm_vsriq_m_n_s32(__a, __b,  __imm, __p)
-#define vsubq_m_s32(__inactive, __a, __b, __p) __arm_vsubq_m_s32(__inactive, __a, __b, __p)
 #define vqshluq_m_n_s32(__inactive, __a,  __imm, __p) __arm_vqshluq_m_n_s32(__inactive, __a,  __imm, __p)
 #define vabavq_p_s32(__a, __b, __c, __p) __arm_vabavq_p_s32(__a, __b, __c, __p)
 #define vsriq_m_n_u32(__a, __b,  __imm, __p) __arm_vsriq_m_n_u32(__a, __b,  __imm, __p)
 #define vshlq_m_u32(__inactive, __a, __b, __p) __arm_vshlq_m_u32(__inactive, __a, __b, __p)
-#define vsubq_m_u32(__inactive, __a, __b, __p) __arm_vsubq_m_u32(__inactive, __a, __b, __p)
 #define vabavq_p_u32(__a, __b, __c, __p) __arm_vabavq_p_u32(__a, __b, __c, __p)
 #define vshlq_m_s32(__inactive, __a, __b, __p) __arm_vshlq_m_s32(__inactive, __a, __b, __p)
 #define vabdq_m_s8(__inactive, __a, __b, __p) __arm_vabdq_m_s8(__inactive, __a, __b, __p)
@@ -1717,18 +1662,6 @@
 #define vabdq_m_u8(__inactive, __a, __b, __p) __arm_vabdq_m_u8(__inactive, __a, __b, __p)
 #define vabdq_m_u32(__inactive, __a, __b, __p) __arm_vabdq_m_u32(__inactive, __a, __b, __p)
 #define vabdq_m_u16(__inactive, __a, __b, __p) __arm_vabdq_m_u16(__inactive, __a, __b, __p)
-#define vaddq_m_n_s8(__inactive, __a, __b, __p) __arm_vaddq_m_n_s8(__inactive, __a, __b, __p)
-#define vaddq_m_n_s32(__inactive, __a, __b, __p) __arm_vaddq_m_n_s32(__inactive, __a, __b, __p)
-#define vaddq_m_n_s16(__inactive, __a, __b, __p) __arm_vaddq_m_n_s16(__inactive, __a, __b, __p)
-#define vaddq_m_n_u8(__inactive, __a, __b, __p) __arm_vaddq_m_n_u8(__inactive, __a, __b, __p)
-#define vaddq_m_n_u32(__inactive, __a, __b, __p) __arm_vaddq_m_n_u32(__inactive, __a, __b, __p)
-#define vaddq_m_n_u16(__inactive, __a, __b, __p) __arm_vaddq_m_n_u16(__inactive, __a, __b, __p)
-#define vaddq_m_s8(__inactive, __a, __b, __p) __arm_vaddq_m_s8(__inactive, __a, __b, __p)
-#define vaddq_m_s32(__inactive, __a, __b, __p) __arm_vaddq_m_s32(__inactive, __a, __b, __p)
-#define vaddq_m_s16(__inactive, __a, __b, __p) __arm_vaddq_m_s16(__inactive, __a, __b, __p)
-#define vaddq_m_u8(__inactive, __a, __b, __p) __arm_vaddq_m_u8(__inactive, __a, __b, __p)
-#define vaddq_m_u32(__inactive, __a, __b, __p) __arm_vaddq_m_u32(__inactive, __a, __b, __p)
-#define vaddq_m_u16(__inactive, __a, __b, __p) __arm_vaddq_m_u16(__inactive, __a, __b, __p)
 #define vandq_m_s8(__inactive, __a, __b, __p) __arm_vandq_m_s8(__inactive, __a, __b, __p)
 #define vandq_m_s32(__inactive, __a, __b, __p) __arm_vandq_m_s32(__inactive, __a, __b, __p)
 #define vandq_m_s16(__inactive, __a, __b, __p) __arm_vandq_m_s16(__inactive, __a, __b, __p)
@@ -1852,18 +1785,6 @@
 #define vmulltq_int_m_u8(__inactive, __a, __b, __p) __arm_vmulltq_int_m_u8(__inactive, __a, __b, __p)
 #define vmulltq_int_m_u32(__inactive, __a, __b, __p) __arm_vmulltq_int_m_u32(__inactive, __a, __b, __p)
 #define vmulltq_int_m_u16(__inactive, __a, __b, __p) __arm_vmulltq_int_m_u16(__inactive, __a, __b, __p)
-#define vmulq_m_n_s8(__inactive, __a, __b, __p) __arm_vmulq_m_n_s8(__inactive, __a, __b, __p)
-#define vmulq_m_n_s32(__inactive, __a, __b, __p) __arm_vmulq_m_n_s32(__inactive, __a, __b, __p)
-#define vmulq_m_n_s16(__inactive, __a, __b, __p) __arm_vmulq_m_n_s16(__inactive, __a, __b, __p)
-#define vmulq_m_n_u8(__inactive, __a, __b, __p) __arm_vmulq_m_n_u8(__inactive, __a, __b, __p)
-#define vmulq_m_n_u32(__inactive, __a, __b, __p) __arm_vmulq_m_n_u32(__inactive, __a, __b, __p)
-#define vmulq_m_n_u16(__inactive, __a, __b, __p) __arm_vmulq_m_n_u16(__inactive, __a, __b, __p)
-#define vmulq_m_s8(__inactive, __a, __b, __p) __arm_vmulq_m_s8(__inactive, __a, __b, __p)
-#define vmulq_m_s32(__inactive, __a, __b, __p) __arm_vmulq_m_s32(__inactive, __a, __b, __p)
-#define vmulq_m_s16(__inactive, __a, __b, __p) __arm_vmulq_m_s16(__inactive, __a, __b, __p)
-#define vmulq_m_u8(__inactive, __a, __b, __p) __arm_vmulq_m_u8(__inactive, __a, __b, __p)
-#define vmulq_m_u32(__inactive, __a, __b, __p) __arm_vmulq_m_u32(__inactive, __a, __b, __p)
-#define vmulq_m_u16(__inactive, __a, __b, __p) __arm_vmulq_m_u16(__inactive, __a, __b, __p)
 #define vornq_m_s8(__inactive, __a, __b, __p) __arm_vornq_m_s8(__inactive, __a, __b, __p)
 #define vornq_m_s32(__inactive, __a, __b, __p) __arm_vornq_m_s32(__inactive, __a, __b, __p)
 #define vornq_m_s16(__inactive, __a, __b, __p) __arm_vornq_m_s16(__inactive, __a, __b, __p)
@@ -2008,12 +1929,6 @@
 #define vsliq_m_n_u8(__a, __b,  __imm, __p) __arm_vsliq_m_n_u8(__a, __b,  __imm, __p)
 #define vsliq_m_n_u32(__a, __b,  __imm, __p) __arm_vsliq_m_n_u32(__a, __b,  __imm, __p)
 #define vsliq_m_n_u16(__a, __b,  __imm, __p) __arm_vsliq_m_n_u16(__a, __b,  __imm, __p)
-#define vsubq_m_n_s8(__inactive, __a, __b, __p) __arm_vsubq_m_n_s8(__inactive, __a, __b, __p)
-#define vsubq_m_n_s32(__inactive, __a, __b, __p) __arm_vsubq_m_n_s32(__inactive, __a, __b, __p)
-#define vsubq_m_n_s16(__inactive, __a, __b, __p) __arm_vsubq_m_n_s16(__inactive, __a, __b, __p)
-#define vsubq_m_n_u8(__inactive, __a, __b, __p) __arm_vsubq_m_n_u8(__inactive, __a, __b, __p)
-#define vsubq_m_n_u32(__inactive, __a, __b, __p) __arm_vsubq_m_n_u32(__inactive, __a, __b, __p)
-#define vsubq_m_n_u16(__inactive, __a, __b, __p) __arm_vsubq_m_n_u16(__inactive, __a, __b, __p)
 #define vmlaldavaq_p_s32(__a, __b, __c, __p) __arm_vmlaldavaq_p_s32(__a, __b, __c, __p)
 #define vmlaldavaq_p_s16(__a, __b, __c, __p) __arm_vmlaldavaq_p_s16(__a, __b, __c, __p)
 #define vmlaldavaq_p_u32(__a, __b, __c, __p) __arm_vmlaldavaq_p_u32(__a, __b, __c, __p)
@@ -2091,10 +2006,6 @@
 #define vshrntq_m_n_u16(__a, __b,  __imm, __p) __arm_vshrntq_m_n_u16(__a, __b,  __imm, __p)
 #define vabdq_m_f32(__inactive, __a, __b, __p) __arm_vabdq_m_f32(__inactive, __a, __b, __p)
 #define vabdq_m_f16(__inactive, __a, __b, __p) __arm_vabdq_m_f16(__inactive, __a, __b, __p)
-#define vaddq_m_f32(__inactive, __a, __b, __p) __arm_vaddq_m_f32(__inactive, __a, __b, __p)
-#define vaddq_m_f16(__inactive, __a, __b, __p) __arm_vaddq_m_f16(__inactive, __a, __b, __p)
-#define vaddq_m_n_f32(__inactive, __a, __b, __p) __arm_vaddq_m_n_f32(__inactive, __a, __b, __p)
-#define vaddq_m_n_f16(__inactive, __a, __b, __p) __arm_vaddq_m_n_f16(__inactive, __a, __b, __p)
 #define vandq_m_f32(__inactive, __a, __b, __p) __arm_vandq_m_f32(__inactive, __a, __b, __p)
 #define vandq_m_f16(__inactive, __a, __b, __p) __arm_vandq_m_f16(__inactive, __a, __b, __p)
 #define vbicq_m_f32(__inactive, __a, __b, __p) __arm_vbicq_m_f32(__inactive, __a, __b, __p)
@@ -2139,18 +2050,10 @@
 #define vmaxnmq_m_f16(__inactive, __a, __b, __p) __arm_vmaxnmq_m_f16(__inactive, __a, __b, __p)
 #define vminnmq_m_f32(__inactive, __a, __b, __p) __arm_vminnmq_m_f32(__inactive, __a, __b, __p)
 #define vminnmq_m_f16(__inactive, __a, __b, __p) __arm_vminnmq_m_f16(__inactive, __a, __b, __p)
-#define vmulq_m_f32(__inactive, __a, __b, __p) __arm_vmulq_m_f32(__inactive, __a, __b, __p)
-#define vmulq_m_f16(__inactive, __a, __b, __p) __arm_vmulq_m_f16(__inactive, __a, __b, __p)
-#define vmulq_m_n_f32(__inactive, __a, __b, __p) __arm_vmulq_m_n_f32(__inactive, __a, __b, __p)
-#define vmulq_m_n_f16(__inactive, __a, __b, __p) __arm_vmulq_m_n_f16(__inactive, __a, __b, __p)
 #define vornq_m_f32(__inactive, __a, __b, __p) __arm_vornq_m_f32(__inactive, __a, __b, __p)
 #define vornq_m_f16(__inactive, __a, __b, __p) __arm_vornq_m_f16(__inactive, __a, __b, __p)
 #define vorrq_m_f32(__inactive, __a, __b, __p) __arm_vorrq_m_f32(__inactive, __a, __b, __p)
 #define vorrq_m_f16(__inactive, __a, __b, __p) __arm_vorrq_m_f16(__inactive, __a, __b, __p)
-#define vsubq_m_f32(__inactive, __a, __b, __p) __arm_vsubq_m_f32(__inactive, __a, __b, __p)
-#define vsubq_m_f16(__inactive, __a, __b, __p) __arm_vsubq_m_f16(__inactive, __a, __b, __p)
-#define vsubq_m_n_f32(__inactive, __a, __b, __p) __arm_vsubq_m_n_f32(__inactive, __a, __b, __p)
-#define vsubq_m_n_f16(__inactive, __a, __b, __p) __arm_vsubq_m_n_f16(__inactive, __a, __b, __p)
 #define vstrbq_s8( __addr, __value) __arm_vstrbq_s8( __addr, __value)
 #define vstrbq_u8( __addr, __value) __arm_vstrbq_u8( __addr, __value)
 #define vstrbq_u16( __addr, __value) __arm_vstrbq_u16( __addr, __value)
@@ -2347,14 +2250,6 @@
 #define vstrwq_scatter_shifted_offset_p_u32(__base, __offset, __value, __p) __arm_vstrwq_scatter_shifted_offset_p_u32(__base, __offset, __value, __p)
 #define vstrwq_scatter_shifted_offset_s32(__base, __offset, __value) __arm_vstrwq_scatter_shifted_offset_s32(__base, __offset, __value)
 #define vstrwq_scatter_shifted_offset_u32(__base, __offset, __value) __arm_vstrwq_scatter_shifted_offset_u32(__base, __offset, __value)
-#define vaddq_s8(__a, __b) __arm_vaddq_s8(__a, __b)
-#define vaddq_s16(__a, __b) __arm_vaddq_s16(__a, __b)
-#define vaddq_s32(__a, __b) __arm_vaddq_s32(__a, __b)
-#define vaddq_u8(__a, __b) __arm_vaddq_u8(__a, __b)
-#define vaddq_u16(__a, __b) __arm_vaddq_u16(__a, __b)
-#define vaddq_u32(__a, __b) __arm_vaddq_u32(__a, __b)
-#define vaddq_f16(__a, __b) __arm_vaddq_f16(__a, __b)
-#define vaddq_f32(__a, __b) __arm_vaddq_f32(__a, __b)
 #define vuninitializedq_u8(void) __arm_vuninitializedq_u8(void)
 #define vuninitializedq_u16(void) __arm_vuninitializedq_u16(void)
 #define vuninitializedq_u32(void) __arm_vuninitializedq_u32(void)
@@ -2484,18 +2379,6 @@
 #define vabsq_x_s8(__a, __p) __arm_vabsq_x_s8(__a, __p)
 #define vabsq_x_s16(__a, __p) __arm_vabsq_x_s16(__a, __p)
 #define vabsq_x_s32(__a, __p) __arm_vabsq_x_s32(__a, __p)
-#define vaddq_x_s8(__a, __b, __p) __arm_vaddq_x_s8(__a, __b, __p)
-#define vaddq_x_s16(__a, __b, __p) __arm_vaddq_x_s16(__a, __b, __p)
-#define vaddq_x_s32(__a, __b, __p) __arm_vaddq_x_s32(__a, __b, __p)
-#define vaddq_x_n_s8(__a, __b, __p) __arm_vaddq_x_n_s8(__a, __b, __p)
-#define vaddq_x_n_s16(__a, __b, __p) __arm_vaddq_x_n_s16(__a, __b, __p)
-#define vaddq_x_n_s32(__a, __b, __p) __arm_vaddq_x_n_s32(__a, __b, __p)
-#define vaddq_x_u8(__a, __b, __p) __arm_vaddq_x_u8(__a, __b, __p)
-#define vaddq_x_u16(__a, __b, __p) __arm_vaddq_x_u16(__a, __b, __p)
-#define vaddq_x_u32(__a, __b, __p) __arm_vaddq_x_u32(__a, __b, __p)
-#define vaddq_x_n_u8(__a, __b, __p) __arm_vaddq_x_n_u8(__a, __b, __p)
-#define vaddq_x_n_u16(__a, __b, __p) __arm_vaddq_x_n_u16(__a, __b, __p)
-#define vaddq_x_n_u32(__a, __b, __p) __arm_vaddq_x_n_u32(__a, __b, __p)
 #define vclsq_x_s8(__a, __p) __arm_vclsq_x_s8(__a, __p)
 #define vclsq_x_s16(__a, __p) __arm_vclsq_x_s16(__a, __p)
 #define vclsq_x_s32(__a, __p) __arm_vclsq_x_s32(__a, __p)
@@ -2530,30 +2413,6 @@
 #define vmulltq_int_x_u8(__a, __b, __p) __arm_vmulltq_int_x_u8(__a, __b, __p)
 #define vmulltq_int_x_u16(__a, __b, __p) __arm_vmulltq_int_x_u16(__a, __b, __p)
 #define vmulltq_int_x_u32(__a, __b, __p) __arm_vmulltq_int_x_u32(__a, __b, __p)
-#define vmulq_x_s8(__a, __b, __p) __arm_vmulq_x_s8(__a, __b, __p)
-#define vmulq_x_s16(__a, __b, __p) __arm_vmulq_x_s16(__a, __b, __p)
-#define vmulq_x_s32(__a, __b, __p) __arm_vmulq_x_s32(__a, __b, __p)
-#define vmulq_x_n_s8(__a, __b, __p) __arm_vmulq_x_n_s8(__a, __b, __p)
-#define vmulq_x_n_s16(__a, __b, __p) __arm_vmulq_x_n_s16(__a, __b, __p)
-#define vmulq_x_n_s32(__a, __b, __p) __arm_vmulq_x_n_s32(__a, __b, __p)
-#define vmulq_x_u8(__a, __b, __p) __arm_vmulq_x_u8(__a, __b, __p)
-#define vmulq_x_u16(__a, __b, __p) __arm_vmulq_x_u16(__a, __b, __p)
-#define vmulq_x_u32(__a, __b, __p) __arm_vmulq_x_u32(__a, __b, __p)
-#define vmulq_x_n_u8(__a, __b, __p) __arm_vmulq_x_n_u8(__a, __b, __p)
-#define vmulq_x_n_u16(__a, __b, __p) __arm_vmulq_x_n_u16(__a, __b, __p)
-#define vmulq_x_n_u32(__a, __b, __p) __arm_vmulq_x_n_u32(__a, __b, __p)
-#define vsubq_x_s8(__a, __b, __p) __arm_vsubq_x_s8(__a, __b, __p)
-#define vsubq_x_s16(__a, __b, __p) __arm_vsubq_x_s16(__a, __b, __p)
-#define vsubq_x_s32(__a, __b, __p) __arm_vsubq_x_s32(__a, __b, __p)
-#define vsubq_x_n_s8(__a, __b, __p) __arm_vsubq_x_n_s8(__a, __b, __p)
-#define vsubq_x_n_s16(__a, __b, __p) __arm_vsubq_x_n_s16(__a, __b, __p)
-#define vsubq_x_n_s32(__a, __b, __p) __arm_vsubq_x_n_s32(__a, __b, __p)
-#define vsubq_x_u8(__a, __b, __p) __arm_vsubq_x_u8(__a, __b, __p)
-#define vsubq_x_u16(__a, __b, __p) __arm_vsubq_x_u16(__a, __b, __p)
-#define vsubq_x_u32(__a, __b, __p) __arm_vsubq_x_u32(__a, __b, __p)
-#define vsubq_x_n_u8(__a, __b, __p) __arm_vsubq_x_n_u8(__a, __b, __p)
-#define vsubq_x_n_u16(__a, __b, __p) __arm_vsubq_x_n_u16(__a, __b, __p)
-#define vsubq_x_n_u32(__a, __b, __p) __arm_vsubq_x_n_u32(__a, __b, __p)
 #define vcaddq_rot90_x_s8(__a, __b, __p) __arm_vcaddq_rot90_x_s8(__a, __b, __p)
 #define vcaddq_rot90_x_s16(__a, __b, __p) __arm_vcaddq_rot90_x_s16(__a, __b, __p)
 #define vcaddq_rot90_x_s32(__a, __b, __p) __arm_vcaddq_rot90_x_s32(__a, __b, __p)
@@ -2722,20 +2581,8 @@
 #define vabdq_x_f32(__a, __b, __p) __arm_vabdq_x_f32(__a, __b, __p)
 #define vabsq_x_f16(__a, __p) __arm_vabsq_x_f16(__a, __p)
 #define vabsq_x_f32(__a, __p) __arm_vabsq_x_f32(__a, __p)
-#define vaddq_x_f16(__a, __b, __p) __arm_vaddq_x_f16(__a, __b, __p)
-#define vaddq_x_f32(__a, __b, __p) __arm_vaddq_x_f32(__a, __b, __p)
-#define vaddq_x_n_f16(__a, __b, __p) __arm_vaddq_x_n_f16(__a, __b, __p)
-#define vaddq_x_n_f32(__a, __b, __p) __arm_vaddq_x_n_f32(__a, __b, __p)
 #define vnegq_x_f16(__a, __p) __arm_vnegq_x_f16(__a, __p)
 #define vnegq_x_f32(__a, __p) __arm_vnegq_x_f32(__a, __p)
-#define vmulq_x_f16(__a, __b, __p) __arm_vmulq_x_f16(__a, __b, __p)
-#define vmulq_x_f32(__a, __b, __p) __arm_vmulq_x_f32(__a, __b, __p)
-#define vmulq_x_n_f16(__a, __b, __p) __arm_vmulq_x_n_f16(__a, __b, __p)
-#define vmulq_x_n_f32(__a, __b, __p) __arm_vmulq_x_n_f32(__a, __b, __p)
-#define vsubq_x_f16(__a, __b, __p) __arm_vsubq_x_f16(__a, __b, __p)
-#define vsubq_x_f32(__a, __b, __p) __arm_vsubq_x_f32(__a, __b, __p)
-#define vsubq_x_n_f16(__a, __b, __p) __arm_vsubq_x_n_f16(__a, __b, __p)
-#define vsubq_x_n_f32(__a, __b, __p) __arm_vsubq_x_n_f32(__a, __b, __p)
 #define vcaddq_rot90_x_f16(__a, __b, __p) __arm_vcaddq_rot90_x_f16(__a, __b, __p)
 #define vcaddq_rot90_x_f32(__a, __b, __p) __arm_vcaddq_rot90_x_f32(__a, __b, __p)
 #define vcaddq_rot270_x_f16(__a, __b, __p) __arm_vcaddq_rot270_x_f16(__a, __b, __p)
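What replaces the per-type #define layer being deleted in these hunks is a
direct registration of each intrinsic with the compiler, so arm_mve.h no
longer has to spell out every type/predication combination by hand.  As a
sketch for vaddq (FUNCTION_WITH_RTX_M_N, binary_opt_n and mx_or_none mirror
the naming conventions used elsewhere in the series; treat the exact
spellings as illustrative rather than definitive):

  /* arm-mve-builtins-base.cc: tie the unpredicated form to the PLUS
     RTX code and the predicated/_n forms to the VADDQ unspecs.  */
  FUNCTION_WITH_RTX_M_N (vaddq, PLUS, VADDQ)

  /* arm-mve-builtins-base.def: argument shape (vector,vector or
     vector,scalar), element types, and which predication suffixes
     (_m/_x) exist.  */
  DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_integer, mx_or_none)

  /* arm-mve-builtins-base.h: the handle used during resolution.  */
  extern const function_base *const vaddq;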
@@ -3659,19 +3506,6 @@ __arm_vshlq_u32 (uint32x4_t __a, int32x4_t __b)
 {
   return __builtin_mve_vshlq_uv4si (__a, __b);
 }
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_u8 (uint8x16_t __a, uint8x16_t __b)
-{
-  return __builtin_mve_vsubq_uv16qi (__a, __b);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_n_u8 (uint8x16_t __a, uint8_t __b)
-{
-  return __builtin_mve_vsubq_n_uv16qi (__a, __b);
-}
 
 __extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
@@ -3729,20 +3563,6 @@ __arm_vornq_u8 (uint8x16_t __a, uint8x16_t __b)
   return __builtin_mve_vornq_uv16qi (__a, __b);
 }
 
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_u8 (uint8x16_t __a, uint8x16_t __b)
-{
-  return __builtin_mve_vmulq_uv16qi (__a, __b);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_n_u8 (uint8x16_t __a, uint8_t __b)
-{
-  return __builtin_mve_vmulq_n_uv16qi (__a, __b);
-}
-
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmulltq_int_u8 (uint8x16_t __a, uint8x16_t __b)
@@ -3927,13 +3747,6 @@ __arm_vaddvaq_u8 (uint32_t __a, uint8x16_t __b)
   return __builtin_mve_vaddvaq_uv16qi (__a, __b);
 }
 
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_n_u8 (uint8x16_t __a, uint8_t __b)
-{
-  return __builtin_mve_vaddq_n_uv16qi (__a, __b);
-}
-
 __extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vabdq_u8 (uint8x16_t __a, uint8x16_t __b)
@@ -4137,20 +3950,6 @@ __arm_vaddvq_p_s8 (int8x16_t __a, mve_pred16_t __p)
   return __builtin_mve_vaddvq_p_sv16qi (__a, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_s8 (int8x16_t __a, int8x16_t __b)
-{
-  return __builtin_mve_vsubq_sv16qi (__a, __b);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_n_s8 (int8x16_t __a, int8_t __b)
-{
-  return __builtin_mve_vsubq_n_sv16qi (__a, __b);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vshlq_r_s8 (int8x16_t __a, int32_t __b)
@@ -4284,20 +4083,6 @@ __arm_vornq_s8 (int8x16_t __a, int8x16_t __b)
   return __builtin_mve_vornq_sv16qi (__a, __b);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_s8 (int8x16_t __a, int8x16_t __b)
-{
-  return __builtin_mve_vmulq_sv16qi (__a, __b);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_n_s8 (int8x16_t __a, int8_t __b)
-{
-  return __builtin_mve_vmulq_n_sv16qi (__a, __b);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmulltq_int_s8 (int8x16_t __a, int8x16_t __b)
@@ -4466,13 +4251,6 @@ __arm_vaddvaq_s8 (int32_t __a, int8x16_t __b)
   return __builtin_mve_vaddvaq_sv16qi (__a, __b);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_n_s8 (int8x16_t __a, int8_t __b)
-{
-  return __builtin_mve_vaddq_n_sv16qi (__a, __b);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vabdq_s8 (int8x16_t __a, int8x16_t __b)
@@ -4501,20 +4279,6 @@ __arm_vqshlq_n_s8 (int8x16_t __a, const int __imm)
   return __builtin_mve_vqshlq_n_sv16qi (__a, __imm);
 }
 
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_u16 (uint16x8_t __a, uint16x8_t __b)
-{
-  return __builtin_mve_vsubq_uv8hi (__a, __b);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_n_u16 (uint16x8_t __a, uint16_t __b)
-{
-  return __builtin_mve_vsubq_n_uv8hi (__a, __b);
-}
-
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vrmulhq_u16 (uint16x8_t __a, uint16x8_t __b)
@@ -4571,20 +4335,6 @@ __arm_vornq_u16 (uint16x8_t __a, uint16x8_t __b)
   return __builtin_mve_vornq_uv8hi (__a, __b);
 }
 
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_u16 (uint16x8_t __a, uint16x8_t __b)
-{
-  return __builtin_mve_vmulq_uv8hi (__a, __b);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_n_u16 (uint16x8_t __a, uint16_t __b)
-{
-  return __builtin_mve_vmulq_n_uv8hi (__a, __b);
-}
-
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmulltq_int_u16 (uint16x8_t __a, uint16x8_t __b)
@@ -4769,13 +4519,6 @@ __arm_vaddvaq_u16 (uint32_t __a, uint16x8_t __b)
   return __builtin_mve_vaddvaq_uv8hi (__a, __b);
 }
 
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_n_u16 (uint16x8_t __a, uint16_t __b)
-{
-  return __builtin_mve_vaddq_n_uv8hi (__a, __b);
-}
-
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vabdq_u16 (uint16x8_t __a, uint16x8_t __b)
@@ -4979,20 +4722,6 @@ __arm_vaddvq_p_s16 (int16x8_t __a, mve_pred16_t __p)
   return __builtin_mve_vaddvq_p_sv8hi (__a, __p);
 }
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_s16 (int16x8_t __a, int16x8_t __b)
-{
-  return __builtin_mve_vsubq_sv8hi (__a, __b);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_n_s16 (int16x8_t __a, int16_t __b)
-{
-  return __builtin_mve_vsubq_n_sv8hi (__a, __b);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vshlq_r_s16 (int16x8_t __a, int32_t __b)
@@ -5126,20 +4855,6 @@ __arm_vornq_s16 (int16x8_t __a, int16x8_t __b)
   return __builtin_mve_vornq_sv8hi (__a, __b);
 }
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_s16 (int16x8_t __a, int16x8_t __b)
-{
-  return __builtin_mve_vmulq_sv8hi (__a, __b);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_n_s16 (int16x8_t __a, int16_t __b)
-{
-  return __builtin_mve_vmulq_n_sv8hi (__a, __b);
-}
-
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmulltq_int_s16 (int16x8_t __a, int16x8_t __b)
@@ -5308,13 +5023,6 @@ __arm_vaddvaq_s16 (int32_t __a, int16x8_t __b)
   return __builtin_mve_vaddvaq_sv8hi (__a, __b);
 }
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_n_s16 (int16x8_t __a, int16_t __b)
-{
-  return __builtin_mve_vaddq_n_sv8hi (__a, __b);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vabdq_s16 (int16x8_t __a, int16x8_t __b)
@@ -5343,20 +5051,6 @@ __arm_vqshlq_n_s16 (int16x8_t __a, const int __imm)
   return __builtin_mve_vqshlq_n_sv8hi (__a, __imm);
 }
 
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_u32 (uint32x4_t __a, uint32x4_t __b)
-{
-  return __builtin_mve_vsubq_uv4si (__a, __b);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_n_u32 (uint32x4_t __a, uint32_t __b)
-{
-  return __builtin_mve_vsubq_n_uv4si (__a, __b);
-}
-
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vrmulhq_u32 (uint32x4_t __a, uint32x4_t __b)
@@ -5413,20 +5107,6 @@ __arm_vornq_u32 (uint32x4_t __a, uint32x4_t __b)
   return __builtin_mve_vornq_uv4si (__a, __b);
 }
 
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_u32 (uint32x4_t __a, uint32x4_t __b)
-{
-  return __builtin_mve_vmulq_uv4si (__a, __b);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_n_u32 (uint32x4_t __a, uint32_t __b)
-{
-  return __builtin_mve_vmulq_n_uv4si (__a, __b);
-}
-
 __extension__ extern __inline uint64x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmulltq_int_u32 (uint32x4_t __a, uint32x4_t __b)
@@ -5611,13 +5291,6 @@ __arm_vaddvaq_u32 (uint32_t __a, uint32x4_t __b)
   return __builtin_mve_vaddvaq_uv4si (__a, __b);
 }
 
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_n_u32 (uint32x4_t __a, uint32_t __b)
-{
-  return __builtin_mve_vaddq_n_uv4si (__a, __b);
-}
-
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vabdq_u32 (uint32x4_t __a, uint32x4_t __b)
@@ -5821,20 +5494,6 @@ __arm_vaddvq_p_s32 (int32x4_t __a, mve_pred16_t __p)
   return __builtin_mve_vaddvq_p_sv4si (__a, __p);
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_s32 (int32x4_t __a, int32x4_t __b)
-{
-  return __builtin_mve_vsubq_sv4si (__a, __b);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_n_s32 (int32x4_t __a, int32_t __b)
-{
-  return __builtin_mve_vsubq_n_sv4si (__a, __b);
-}
-
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vshlq_r_s32 (int32x4_t __a, int32_t __b)
@@ -5968,20 +5627,6 @@ __arm_vornq_s32 (int32x4_t __a, int32x4_t __b)
   return __builtin_mve_vornq_sv4si (__a, __b);
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_s32 (int32x4_t __a, int32x4_t __b)
-{
-  return __builtin_mve_vmulq_sv4si (__a, __b);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_n_s32 (int32x4_t __a, int32_t __b)
-{
-  return __builtin_mve_vmulq_n_sv4si (__a, __b);
-}
-
 __extension__ extern __inline int64x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmulltq_int_s32 (int32x4_t __a, int32x4_t __b)
@@ -6150,13 +5795,6 @@ __arm_vaddvaq_s32 (int32_t __a, int32x4_t __b)
   return __builtin_mve_vaddvaq_sv4si (__a, __b);
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_n_s32 (int32x4_t __a, int32_t __b)
-{
-  return __builtin_mve_vaddq_n_sv4si (__a, __b);
-}
-
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vabdq_s32 (int32x4_t __a, int32x4_t __b)
@@ -9355,13 +8993,6 @@ __arm_vsriq_m_n_s8 (int8x16_t __a, int8x16_t __b, const int __imm, mve_pred16_t
   return __builtin_mve_vsriq_m_n_sv16qi (__a, __b, __imm, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_sv16qi (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vqshluq_m_n_s8 (uint8x16_t __inactive, int8x16_t __a, const int __imm, mve_pred16_t __p)
@@ -9390,13 +9021,6 @@ __arm_vshlq_m_u8 (uint8x16_t __inactive, uint8x16_t __a, int8x16_t __b, mve_pred
   return __builtin_mve_vshlq_m_uv16qi (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_uv16qi (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline uint32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vabavq_p_u8 (uint32_t __a, uint8x16_t __b, uint8x16_t __c, mve_pred16_t __p)
@@ -9418,13 +9042,6 @@ __arm_vsriq_m_n_s16 (int16x8_t __a, int16x8_t __b, const int __imm, mve_pred16_t
   return __builtin_mve_vsriq_m_n_sv8hi (__a, __b, __imm, __p);
 }
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_sv8hi (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vqshluq_m_n_s16 (uint16x8_t __inactive, int16x8_t __a, const int __imm, mve_pred16_t __p)
@@ -9453,13 +9070,6 @@ __arm_vshlq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, int16x8_t __b, mve_pre
   return __builtin_mve_vshlq_m_uv8hi (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_uv8hi (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline uint32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vabavq_p_u16 (uint32_t __a, uint16x8_t __b, uint16x8_t __c, mve_pred16_t __p)
@@ -9481,13 +9091,6 @@ __arm_vsriq_m_n_s32 (int32x4_t __a, int32x4_t __b, const int __imm, mve_pred16_t
   return __builtin_mve_vsriq_m_n_sv4si (__a, __b, __imm, __p);
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_sv4si (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vqshluq_m_n_s32 (uint32x4_t __inactive, int32x4_t __a, const int __imm, mve_pred16_t __p)
@@ -9516,13 +9119,6 @@ __arm_vshlq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, int32x4_t __b, mve_pre
   return __builtin_mve_vshlq_m_uv4si (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_uv4si (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline uint32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vabavq_p_u32 (uint32_t __a, uint32x4_t __b, uint32x4_t __c, mve_pred16_t __p)
@@ -9579,90 +9175,6 @@ __arm_vabdq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pr
   return __builtin_mve_vabdq_m_uv8hi (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_n_sv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_n_sv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_n_sv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_n_uv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_n_uv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_n_uv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_sv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_sv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_sv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_uv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_uv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_uv8hi (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vandq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
@@ -10524,90 +10036,6 @@ __arm_vmulltq_int_m_u16 (uint32x4_t __inactive, uint16x8_t __a, uint16x8_t __b,
   return __builtin_mve_vmulltq_int_m_uv8hi (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_n_sv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_n_sv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_n_sv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m_n_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_n_uv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m_n_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_n_uv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m_n_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_n_uv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_sv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_sv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_sv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_uv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_uv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_uv8hi (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
@@ -11616,48 +11044,6 @@ __arm_vsliq_m_n_u16 (uint16x8_t __a, uint16x8_t __b, const int __imm, mve_pred16
   return __builtin_mve_vsliq_m_n_uv8hi (__a, __b, __imm, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_n_sv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_n_sv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_n_sv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m_n_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_n_uv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m_n_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_n_uv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m_n_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_n_uv8hi (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline int64_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmlaldavaq_p_s32 (int64_t __a, int32x4_t __b, int32x4_t __c, mve_pred16_t __p)
@@ -13333,48 +12719,6 @@ __arm_vstrwq_scatter_shifted_offset_u32 (uint32_t * __base, uint32x4_t __offset,
   __builtin_mve_vstrwq_scatter_shifted_offset_uv4si ((__builtin_neon_si *) __base, __offset, __value);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_s8 (int8x16_t __a, int8x16_t __b)
-{
-  return __a + __b;
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_s16 (int16x8_t __a, int16x8_t __b)
-{
-  return __a + __b;
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_s32 (int32x4_t __a, int32x4_t __b)
-{
-  return __a + __b;
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_u8 (uint8x16_t __a, uint8x16_t __b)
-{
-  return __a + __b;
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_u16 (uint16x8_t __a, uint16x8_t __b)
-{
-  return __a + __b;
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_u32 (uint32x4_t __a, uint32x4_t __b)
-{
-  return __a + __b;
-}
-
 __extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vddupq_m_n_u8 (uint8x16_t __inactive, uint32_t __a, const int __imm, mve_pred16_t __p)
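Note that the unpredicated vaddq bodies removed just above were plain C
vector arithmetic (__a + __b), which the middle end can already reason
about.  Assuming the framework expands these through the matching RTX codes
(PLUS, and likewise MINUS/MULT for vsubq/vmulq) rather than opaque unspecs,
that property is preserved, e.g.:

  #include <arm_mve.h>

  int8x16_t
  f (int8x16_t a)
  {
    /* With a PLUS-based expansion this addition of zero can simplify
       to just 'a'; an unspec-based expansion would be a black box to
       the RTL optimizers.  */
    return vaddq_s8 (a, vdupq_n_s8 (0));
  }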
@@ -14325,90 +13669,6 @@ __arm_vabsq_x_s32 (int32x4_t __a, mve_pred16_t __p)
   return __builtin_mve_vabsq_m_sv4si (__arm_vuninitializedq_s32 (), __a, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x_s16 (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x_s32 (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x_n_s8 (int8x16_t __a, int8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_n_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x_n_s16 (int16x8_t __a, int16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_n_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x_n_s32 (int32x4_t __a, int32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_n_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x_u8 (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_uv16qi (__arm_vuninitializedq_u8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x_u16 (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_uv8hi (__arm_vuninitializedq_u16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x_n_u8 (uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_n_uv16qi (__arm_vuninitializedq_u8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x_n_u16 (uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_n_uv8hi (__arm_vuninitializedq_u16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x_n_u32 (uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_n_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vclsq_x_s8 (int8x16_t __a, mve_pred16_t __p)
@@ -14647,174 +13907,6 @@ __arm_vmulltq_int_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
   return __builtin_mve_vmulltq_int_m_uv4si (__arm_vuninitializedq_u64 (), __a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x_s16 (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x_s32 (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x_n_s8 (int8x16_t __a, int8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_n_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x_n_s16 (int16x8_t __a, int16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_n_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x_n_s32 (int32x4_t __a, int32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_n_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x_u8 (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_uv16qi (__arm_vuninitializedq_u8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x_u16 (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_uv8hi (__arm_vuninitializedq_u16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x_n_u8 (uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_n_uv16qi (__arm_vuninitializedq_u8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x_n_u16 (uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_n_uv8hi (__arm_vuninitializedq_u16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x_n_u32 (uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_n_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x_s16 (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x_s32 (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x_n_s8 (int8x16_t __a, int8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_n_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x_n_s16 (int16x8_t __a, int16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_n_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x_n_s32 (int32x4_t __a, int32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_n_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x_u8 (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_uv16qi (__arm_vuninitializedq_u8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x_u16 (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_uv8hi (__arm_vuninitializedq_u16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x_n_u8 (uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_n_uv16qi (__arm_vuninitializedq_u8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x_n_u16 (uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_n_uv8hi (__arm_vuninitializedq_u16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x_n_u32 (uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_n_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcaddq_rot90_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
@@ -16970,20 +16062,6 @@ __arm_vcvtmq_s32_f32 (float32x4_t __a)
   return __builtin_mve_vcvtmq_sv4si (__a);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_n_f16 (float16x8_t __a, float16_t __b)
-{
-  return __builtin_mve_vsubq_n_fv8hf (__a, __b);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_n_f32 (float32x4_t __a, float32_t __b)
-{
-  return __builtin_mve_vsubq_n_fv4sf (__a, __b);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbrsrq_n_f16 (float16x8_t __a, int32_t __b)
@@ -17152,13 +16230,6 @@ __arm_vcmpeqq_f16 (float16x8_t __a, float16x8_t __b)
   return __builtin_mve_vcmpeqq_fv8hf (__a, __b);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_f16 (float16x8_t __a, float16x8_t __b)
-{
-  return __builtin_mve_vsubq_fv8hf (__a, __b);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vorrq_f16 (float16x8_t __a, float16x8_t __b)
@@ -17173,20 +16244,6 @@ __arm_vornq_f16 (float16x8_t __a, float16x8_t __b)
   return __builtin_mve_vornq_fv8hf (__a, __b);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_n_f16 (float16x8_t __a, float16_t __b)
-{
-  return __builtin_mve_vmulq_n_fv8hf (__a, __b);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_f16 (float16x8_t __a, float16x8_t __b)
-{
-  return __builtin_mve_vmulq_fv8hf (__a, __b);
-}
-
 __extension__ extern __inline float16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vminnmvq_f16 (float16_t __a, float16x8_t __b)
@@ -17306,13 +16363,6 @@ __arm_vandq_f16 (float16x8_t __a, float16x8_t __b)
   return __builtin_mve_vandq_fv8hf (__a, __b);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_n_f16 (float16x8_t __a, float16_t __b)
-{
-  return __builtin_mve_vaddq_n_fv8hf (__a, __b);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vabdq_f16 (float16x8_t __a, float16x8_t __b)
@@ -17404,13 +16454,6 @@ __arm_vcmpeqq_f32 (float32x4_t __a, float32x4_t __b)
   return __builtin_mve_vcmpeqq_fv4sf (__a, __b);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_f32 (float32x4_t __a, float32x4_t __b)
-{
-  return __builtin_mve_vsubq_fv4sf (__a, __b);
-}
-
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vorrq_f32 (float32x4_t __a, float32x4_t __b)
@@ -17425,20 +16468,6 @@ __arm_vornq_f32 (float32x4_t __a, float32x4_t __b)
   return __builtin_mve_vornq_fv4sf (__a, __b);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_n_f32 (float32x4_t __a, float32_t __b)
-{
-  return __builtin_mve_vmulq_n_fv4sf (__a, __b);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_f32 (float32x4_t __a, float32x4_t __b)
-{
-  return __builtin_mve_vmulq_fv4sf (__a, __b);
-}
-
 __extension__ extern __inline float32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vminnmvq_f32 (float32_t __a, float32x4_t __b)
@@ -17558,13 +16587,6 @@ __arm_vandq_f32 (float32x4_t __a, float32x4_t __b)
   return __builtin_mve_vandq_fv4sf (__a, __b);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_n_f32 (float32x4_t __a, float32_t __b)
-{
-  return __builtin_mve_vaddq_n_fv4sf (__a, __b);
-}
-
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vabdq_f32 (float32x4_t __a, float32x4_t __b)
@@ -18350,34 +17372,6 @@ __arm_vabdq_m_f16 (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve
   return __builtin_mve_vabdq_m_fv8hf (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_f32 (float32x4_t __inactive, float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_fv4sf (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_f16 (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_fv8hf (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_f32 (float32x4_t __inactive, float32x4_t __a, float32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_n_fv4sf (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_f16 (float16x8_t __inactive, float16x8_t __a, float16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_n_fv8hf (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vandq_m_f32 (float32x4_t __inactive, float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
@@ -18686,34 +17680,6 @@ __arm_vminnmq_m_f16 (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, m
   return __builtin_mve_vminnmq_m_fv8hf (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m_f32 (float32x4_t __inactive, float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_fv4sf (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m_f16 (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_fv8hf (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m_n_f32 (float32x4_t __inactive, float32x4_t __a, float32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_n_fv4sf (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m_n_f16 (float16x8_t __inactive, float16x8_t __a, float16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_n_fv8hf (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq_m_f32 (float32x4_t __inactive, float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
@@ -18742,34 +17708,6 @@ __arm_vorrq_m_f16 (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve
   return __builtin_mve_vorrq_m_fv8hf (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m_f32 (float32x4_t __inactive, float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_fv4sf (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m_f16 (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_fv8hf (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m_n_f32 (float32x4_t __inactive, float32x4_t __a, float32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_n_fv4sf (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m_n_f16 (float16x8_t __inactive, float16x8_t __a, float16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_n_fv8hf (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vld1q_f32 (float32_t const * __base)
@@ -18994,20 +17932,6 @@ __arm_vstrwq_scatter_shifted_offset_p_f32 (float32_t * __base, uint32x4_t __offs
   __builtin_mve_vstrwq_scatter_shifted_offset_p_fv4sf ((__builtin_neon_si *) __base, __offset, __value, __p);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_f16 (float16x8_t __a, float16x8_t __b)
-{
-  return __a + __b;
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_f32 (float32x4_t __a, float32x4_t __b)
-{
-  return __a + __b;
-}
-
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrwq_gather_base_wb_f32 (uint32x4_t * __addr, const int __offset)
@@ -19112,34 +18036,6 @@ __arm_vabsq_x_f32 (float32x4_t __a, mve_pred16_t __p)
   return __builtin_mve_vabsq_m_fv4sf (__arm_vuninitializedq_f32 (), __a, __p);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x_f16 (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_fv8hf (__arm_vuninitializedq_f16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x_f32 (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_fv4sf (__arm_vuninitializedq_f32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x_n_f16 (float16x8_t __a, float16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_n_fv8hf (__arm_vuninitializedq_f16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x_n_f32 (float32x4_t __a, float32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vaddq_m_n_fv4sf (__arm_vuninitializedq_f32 (), __a, __b, __p);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vnegq_x_f16 (float16x8_t __a, mve_pred16_t __p)
@@ -19154,62 +18050,6 @@ __arm_vnegq_x_f32 (float32x4_t __a, mve_pred16_t __p)
   return __builtin_mve_vnegq_m_fv4sf (__arm_vuninitializedq_f32 (), __a, __p);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x_f16 (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_fv8hf (__arm_vuninitializedq_f16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x_f32 (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_fv4sf (__arm_vuninitializedq_f32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x_n_f16 (float16x8_t __a, float16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_n_fv8hf (__arm_vuninitializedq_f16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x_n_f32 (float32x4_t __a, float32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulq_m_n_fv4sf (__arm_vuninitializedq_f32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x_f16 (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_fv8hf (__arm_vuninitializedq_f16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x_f32 (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_fv4sf (__arm_vuninitializedq_f32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x_n_f16 (float16x8_t __a, float16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_n_fv8hf (__arm_vuninitializedq_f16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x_n_f32 (float32x4_t __a, float32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vsubq_m_n_fv4sf (__arm_vuninitializedq_f32 (), __a, __b, __p);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcaddq_rot90_x_f16 (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
@@ -20448,20 +19288,6 @@ __arm_vshlq (uint32x4_t __a, int32x4_t __b)
  return __arm_vshlq_u32 (__a, __b);
 }
 
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq (uint8x16_t __a, uint8x16_t __b)
-{
- return __arm_vsubq_u8 (__a, __b);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq (uint8x16_t __a, uint8_t __b)
-{
- return __arm_vsubq_n_u8 (__a, __b);
-}
-
 __extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vrmulhq (uint8x16_t __a, uint8x16_t __b)
@@ -20518,20 +19344,6 @@ __arm_vornq (uint8x16_t __a, uint8x16_t __b)
  return __arm_vornq_u8 (__a, __b);
 }
 
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq (uint8x16_t __a, uint8x16_t __b)
-{
- return __arm_vmulq_u8 (__a, __b);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq (uint8x16_t __a, uint8_t __b)
-{
- return __arm_vmulq_n_u8 (__a, __b);
-}
-
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmulltq_int (uint8x16_t __a, uint8x16_t __b)
@@ -20714,13 +19526,6 @@ __arm_vaddvaq (uint32_t __a, uint8x16_t __b)
  return __arm_vaddvaq_u8 (__a, __b);
 }
 
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq (uint8x16_t __a, uint8_t __b)
-{
- return __arm_vaddq_n_u8 (__a, __b);
-}
-
 __extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vabdq (uint8x16_t __a, uint8x16_t __b)
@@ -20924,20 +19729,6 @@ __arm_vaddvq_p (int8x16_t __a, mve_pred16_t __p)
  return __arm_vaddvq_p_s8 (__a, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq (int8x16_t __a, int8x16_t __b)
-{
- return __arm_vsubq_s8 (__a, __b);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq (int8x16_t __a, int8_t __b)
-{
- return __arm_vsubq_n_s8 (__a, __b);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vshlq_r (int8x16_t __a, int32_t __b)
@@ -21071,20 +19862,6 @@ __arm_vornq (int8x16_t __a, int8x16_t __b)
  return __arm_vornq_s8 (__a, __b);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq (int8x16_t __a, int8x16_t __b)
-{
- return __arm_vmulq_s8 (__a, __b);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq (int8x16_t __a, int8_t __b)
-{
- return __arm_vmulq_n_s8 (__a, __b);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmulltq_int (int8x16_t __a, int8x16_t __b)
@@ -21253,13 +20030,6 @@ __arm_vaddvaq (int32_t __a, int8x16_t __b)
  return __arm_vaddvaq_s8 (__a, __b);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq (int8x16_t __a, int8_t __b)
-{
- return __arm_vaddq_n_s8 (__a, __b);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vabdq (int8x16_t __a, int8x16_t __b)
@@ -21288,20 +20058,6 @@ __arm_vqshlq_n (int8x16_t __a, const int __imm)
  return __arm_vqshlq_n_s8 (__a, __imm);
 }
 
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq (uint16x8_t __a, uint16x8_t __b)
-{
- return __arm_vsubq_u16 (__a, __b);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq (uint16x8_t __a, uint16_t __b)
-{
- return __arm_vsubq_n_u16 (__a, __b);
-}
-
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vrmulhq (uint16x8_t __a, uint16x8_t __b)
@@ -21358,20 +20114,6 @@ __arm_vornq (uint16x8_t __a, uint16x8_t __b)
  return __arm_vornq_u16 (__a, __b);
 }
 
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq (uint16x8_t __a, uint16x8_t __b)
-{
- return __arm_vmulq_u16 (__a, __b);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq (uint16x8_t __a, uint16_t __b)
-{
- return __arm_vmulq_n_u16 (__a, __b);
-}
-
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmulltq_int (uint16x8_t __a, uint16x8_t __b)
@@ -21554,13 +20296,6 @@ __arm_vaddvaq (uint32_t __a, uint16x8_t __b)
  return __arm_vaddvaq_u16 (__a, __b);
 }
 
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq (uint16x8_t __a, uint16_t __b)
-{
- return __arm_vaddq_n_u16 (__a, __b);
-}
-
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vabdq (uint16x8_t __a, uint16x8_t __b)
@@ -21764,20 +20499,6 @@ __arm_vaddvq_p (int16x8_t __a, mve_pred16_t __p)
  return __arm_vaddvq_p_s16 (__a, __p);
 }
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq (int16x8_t __a, int16x8_t __b)
-{
- return __arm_vsubq_s16 (__a, __b);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq (int16x8_t __a, int16_t __b)
-{
- return __arm_vsubq_n_s16 (__a, __b);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vshlq_r (int16x8_t __a, int32_t __b)
@@ -21911,20 +20632,6 @@ __arm_vornq (int16x8_t __a, int16x8_t __b)
  return __arm_vornq_s16 (__a, __b);
 }
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq (int16x8_t __a, int16x8_t __b)
-{
- return __arm_vmulq_s16 (__a, __b);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq (int16x8_t __a, int16_t __b)
-{
- return __arm_vmulq_n_s16 (__a, __b);
-}
-
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmulltq_int (int16x8_t __a, int16x8_t __b)
@@ -22093,13 +20800,6 @@ __arm_vaddvaq (int32_t __a, int16x8_t __b)
  return __arm_vaddvaq_s16 (__a, __b);
 }
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq (int16x8_t __a, int16_t __b)
-{
- return __arm_vaddq_n_s16 (__a, __b);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vabdq (int16x8_t __a, int16x8_t __b)
@@ -22128,20 +20828,6 @@ __arm_vqshlq_n (int16x8_t __a, const int __imm)
  return __arm_vqshlq_n_s16 (__a, __imm);
 }
 
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq (uint32x4_t __a, uint32x4_t __b)
-{
- return __arm_vsubq_u32 (__a, __b);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq (uint32x4_t __a, uint32_t __b)
-{
- return __arm_vsubq_n_u32 (__a, __b);
-}
-
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vrmulhq (uint32x4_t __a, uint32x4_t __b)
@@ -22198,20 +20884,6 @@ __arm_vornq (uint32x4_t __a, uint32x4_t __b)
  return __arm_vornq_u32 (__a, __b);
 }
 
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq (uint32x4_t __a, uint32x4_t __b)
-{
- return __arm_vmulq_u32 (__a, __b);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq (uint32x4_t __a, uint32_t __b)
-{
- return __arm_vmulq_n_u32 (__a, __b);
-}
-
 __extension__ extern __inline uint64x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmulltq_int (uint32x4_t __a, uint32x4_t __b)
@@ -22394,13 +21066,6 @@ __arm_vaddvaq (uint32_t __a, uint32x4_t __b)
  return __arm_vaddvaq_u32 (__a, __b);
 }
 
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq (uint32x4_t __a, uint32_t __b)
-{
- return __arm_vaddq_n_u32 (__a, __b);
-}
-
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vabdq (uint32x4_t __a, uint32x4_t __b)
@@ -22604,20 +21269,6 @@ __arm_vaddvq_p (int32x4_t __a, mve_pred16_t __p)
  return __arm_vaddvq_p_s32 (__a, __p);
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq (int32x4_t __a, int32x4_t __b)
-{
- return __arm_vsubq_s32 (__a, __b);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq (int32x4_t __a, int32_t __b)
-{
- return __arm_vsubq_n_s32 (__a, __b);
-}
-
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vshlq_r (int32x4_t __a, int32_t __b)
@@ -22751,20 +21402,6 @@ __arm_vornq (int32x4_t __a, int32x4_t __b)
  return __arm_vornq_s32 (__a, __b);
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq (int32x4_t __a, int32x4_t __b)
-{
- return __arm_vmulq_s32 (__a, __b);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq (int32x4_t __a, int32_t __b)
-{
- return __arm_vmulq_n_s32 (__a, __b);
-}
-
 __extension__ extern __inline int64x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmulltq_int (int32x4_t __a, int32x4_t __b)
@@ -22933,13 +21570,6 @@ __arm_vaddvaq (int32_t __a, int32x4_t __b)
  return __arm_vaddvaq_s32 (__a, __b);
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq (int32x4_t __a, int32_t __b)
-{
- return __arm_vaddq_n_s32 (__a, __b);
-}
-
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vabdq (int32x4_t __a, int32x4_t __b)
@@ -26097,13 +24727,6 @@ __arm_vsriq_m (int8x16_t __a, int8x16_t __b, const int __imm, mve_pred16_t __p)
  return __arm_vsriq_m_n_s8 (__a, __b, __imm, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_m_s8 (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vqshluq_m (uint8x16_t __inactive, int8x16_t __a, const int __imm, mve_pred16_t __p)
@@ -26132,13 +24755,6 @@ __arm_vshlq_m (uint8x16_t __inactive, uint8x16_t __a, int8x16_t __b, mve_pred16_
  return __arm_vshlq_m_u8 (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_m_u8 (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline uint32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vabavq_p (uint32_t __a, uint8x16_t __b, uint8x16_t __c, mve_pred16_t __p)
@@ -26160,13 +24776,6 @@ __arm_vsriq_m (int16x8_t __a, int16x8_t __b, const int __imm, mve_pred16_t __p)
  return __arm_vsriq_m_n_s16 (__a, __b, __imm, __p);
 }
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_m_s16 (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vqshluq_m (uint16x8_t __inactive, int16x8_t __a, const int __imm, mve_pred16_t __p)
@@ -26195,13 +24804,6 @@ __arm_vshlq_m (uint16x8_t __inactive, uint16x8_t __a, int16x8_t __b, mve_pred16_
  return __arm_vshlq_m_u16 (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_m_u16 (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline uint32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vabavq_p (uint32_t __a, uint16x8_t __b, uint16x8_t __c, mve_pred16_t __p)
@@ -26223,13 +24825,6 @@ __arm_vsriq_m (int32x4_t __a, int32x4_t __b, const int __imm, mve_pred16_t __p)
  return __arm_vsriq_m_n_s32 (__a, __b, __imm, __p);
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_m_s32 (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vqshluq_m (uint32x4_t __inactive, int32x4_t __a, const int __imm, mve_pred16_t __p)
@@ -26258,13 +24853,6 @@ __arm_vshlq_m (uint32x4_t __inactive, uint32x4_t __a, int32x4_t __b, mve_pred16_
  return __arm_vshlq_m_u32 (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_m_u32 (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline uint32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vabavq_p (uint32_t __a, uint32x4_t __b, uint32x4_t __c, mve_pred16_t __p)
@@ -26321,90 +24909,6 @@ __arm_vabdq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16
  return __arm_vabdq_m_u16 (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_m_n_s8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_m_n_s32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_m_n_s16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_m_n_u8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m (uint32x4_t __inactive, uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_m_n_u32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m (uint16x8_t __inactive, uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_m_n_u16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_m_s8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_m_s32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_m_s16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_m_u8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_m_u32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_m_u16 (__inactive, __a, __b, __p);
-}
-
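/* Illustrative sketch, not part of the patch: the vaddq_m overloads
   above merge under predication -- each result lane takes a + b where
   the predicate selects the lane, and the corresponding lane of
   inactive elsewhere.  Assuming an MVE-enabled target
   (e.g. -march=armv8.1-m.main+mve):  */

#include <arm_mve.h>

int8x16_t
add_merge_example (int8x16_t inactive, int8x16_t a, int8x16_t b,
		   mve_pred16_t p)
{
  /* r[i] = p selects lane i ? a[i] + b[i] : inactive[i].  */
  return vaddq_m (inactive, a, b, p);
}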
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vandq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
@@ -27266,90 +25770,6 @@ __arm_vmulltq_int_m (uint32x4_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_
  return __arm_vmulltq_int_m_u16 (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_m_n_s8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_m_n_s32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_m_n_s16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_m_n_u8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m (uint32x4_t __inactive, uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_m_n_u32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m (uint16x8_t __inactive, uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_m_n_u16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_m_s8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_m_s32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_m_s16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_m_u8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_m_u32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_m_u16 (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
@@ -28358,48 +26778,6 @@ __arm_vsliq_m (uint16x8_t __a, uint16x8_t __b, const int __imm, mve_pred16_t __p
  return __arm_vsliq_m_n_u16 (__a, __b, __imm, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_m_n_s8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_m_n_s32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_m_n_s16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_m_n_u8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m (uint32x4_t __inactive, uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_m_n_u32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m (uint16x8_t __inactive, uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_m_n_u16 (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline int64_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmlaldavaq_p (int64_t __a, int32x4_t __b, int32x4_t __c, mve_pred16_t __p)
@@ -29849,48 +28227,6 @@ __arm_vstrwq_scatter_shifted_offset (uint32_t * __base, uint32x4_t __offset, uin
  __arm_vstrwq_scatter_shifted_offset_u32 (__base, __offset, __value);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq (int8x16_t __a, int8x16_t __b)
-{
- return __arm_vaddq_s8 (__a, __b);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq (int16x8_t __a, int16x8_t __b)
-{
- return __arm_vaddq_s16 (__a, __b);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq (int32x4_t __a, int32x4_t __b)
-{
- return __arm_vaddq_s32 (__a, __b);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq (uint8x16_t __a, uint8x16_t __b)
-{
- return __arm_vaddq_u8 (__a, __b);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq (uint16x8_t __a, uint16x8_t __b)
-{
- return __arm_vaddq_u16 (__a, __b);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq (uint32x4_t __a, uint32x4_t __b)
-{
- return __arm_vaddq_u32 (__a, __b);
-}
-
 __extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vddupq_m (uint8x16_t __inactive, uint32_t __a, const int __imm, mve_pred16_t __p)
@@ -30598,90 +28934,6 @@ __arm_vabsq_x (int32x4_t __a, mve_pred16_t __p)
  return __arm_vabsq_x_s32 (__a, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_x_s8 (__a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_x_s16 (__a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_x_s32 (__a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x (int8x16_t __a, int8_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_x_n_s8 (__a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x (int16x8_t __a, int16_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_x_n_s16 (__a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x (int32x4_t __a, int32_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_x_n_s32 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_x_u8 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_x_u16 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_x_u32 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x (uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_x_n_u8 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x (uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_x_n_u16 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x (uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_x_n_u32 (__a, __b, __p);
-}
-
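/* For reference, an illustrative sketch (not part of the patch): as the
   earlier hunks show, the _x forms forward to the _m builtins with an
   uninitialized "inactive" vector, so result lanes not selected by the
   predicate have unspecified contents:  */

#include <arm_mve.h>

uint32x4_t
add_dontcare_example (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
{
  /* Only the lanes selected by p carry meaningful values.  */
  return vaddq_x (a, b, p);
}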
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vclsq_x (int8x16_t __a, mve_pred16_t __p)
@@ -30920,174 +29172,6 @@ __arm_vmulltq_int_x (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
  return __arm_vmulltq_int_x_u32 (__a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_x_s8 (__a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_x_s16 (__a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_x_s32 (__a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x (int8x16_t __a, int8_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_x_n_s8 (__a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x (int16x8_t __a, int16_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_x_n_s16 (__a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x (int32x4_t __a, int32_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_x_n_s32 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_x_u8 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_x_u16 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_x_u32 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x (uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_x_n_u8 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x (uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_x_n_u16 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x (uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_x_n_u32 (__a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_x_s8 (__a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_x_s16 (__a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_x_s32 (__a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x (int8x16_t __a, int8_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_x_n_s8 (__a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x (int16x8_t __a, int16_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_x_n_s16 (__a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x (int32x4_t __a, int32_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_x_n_s32 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_x_u8 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_x_u16 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_x_u32 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x (uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_x_n_u8 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x (uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_x_n_u16 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x (uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_x_n_u32 (__a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcaddq_rot90_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
@@ -32847,20 +30931,6 @@ __arm_vcvtq (uint32x4_t __a)
  return __arm_vcvtq_f32_u32 (__a);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq (float16x8_t __a, float16_t __b)
-{
- return __arm_vsubq_n_f16 (__a, __b);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq (float32x4_t __a, float32_t __b)
-{
- return __arm_vsubq_n_f32 (__a, __b);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbrsrq (float16x8_t __a, int32_t __b)
@@ -32987,13 +31057,6 @@ __arm_vcmpeqq (float16x8_t __a, float16x8_t __b)
  return __arm_vcmpeqq_f16 (__a, __b);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq (float16x8_t __a, float16x8_t __b)
-{
- return __arm_vsubq_f16 (__a, __b);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vorrq (float16x8_t __a, float16x8_t __b)
@@ -33008,20 +31071,6 @@ __arm_vornq (float16x8_t __a, float16x8_t __b)
  return __arm_vornq_f16 (__a, __b);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq (float16x8_t __a, float16_t __b)
-{
- return __arm_vmulq_n_f16 (__a, __b);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq (float16x8_t __a, float16x8_t __b)
-{
- return __arm_vmulq_f16 (__a, __b);
-}
-
 __extension__ extern __inline float16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vminnmvq (float16_t __a, float16x8_t __b)
@@ -33141,13 +31190,6 @@ __arm_vandq (float16x8_t __a, float16x8_t __b)
  return __arm_vandq_f16 (__a, __b);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq (float16x8_t __a, float16_t __b)
-{
- return __arm_vaddq_n_f16 (__a, __b);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vabdq (float16x8_t __a, float16x8_t __b)
@@ -33239,13 +31281,6 @@ __arm_vcmpeqq (float32x4_t __a, float32x4_t __b)
  return __arm_vcmpeqq_f32 (__a, __b);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq (float32x4_t __a, float32x4_t __b)
-{
- return __arm_vsubq_f32 (__a, __b);
-}
-
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vorrq (float32x4_t __a, float32x4_t __b)
@@ -33260,20 +31295,6 @@ __arm_vornq (float32x4_t __a, float32x4_t __b)
  return __arm_vornq_f32 (__a, __b);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq (float32x4_t __a, float32_t __b)
-{
- return __arm_vmulq_n_f32 (__a, __b);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq (float32x4_t __a, float32x4_t __b)
-{
- return __arm_vmulq_f32 (__a, __b);
-}
-
 __extension__ extern __inline float32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vminnmvq (float32_t __a, float32x4_t __b)
@@ -33393,13 +31414,6 @@ __arm_vandq (float32x4_t __a, float32x4_t __b)
  return __arm_vandq_f32 (__a, __b);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq (float32x4_t __a, float32_t __b)
-{
- return __arm_vaddq_n_f32 (__a, __b);
-}
-
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vabdq (float32x4_t __a, float32x4_t __b)
@@ -34170,34 +32184,6 @@ __arm_vabdq_m (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pre
  return __arm_vabdq_m_f16 (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m (float32x4_t __inactive, float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_m_f32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_m_f16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m (float32x4_t __inactive, float32x4_t __a, float32_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_m_n_f32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m (float16x8_t __inactive, float16x8_t __a, float16_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_m_n_f16 (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vandq_m (float32x4_t __inactive, float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
@@ -34506,34 +32492,6 @@ __arm_vminnmq_m (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_p
  return __arm_vminnmq_m_f16 (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m (float32x4_t __inactive, float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_m_f32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_m_f16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m (float32x4_t __inactive, float32x4_t __a, float32_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_m_n_f32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_m (float16x8_t __inactive, float16x8_t __a, float16_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_m_n_f16 (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq_m (float32x4_t __inactive, float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
@@ -34562,34 +32520,6 @@ __arm_vorrq_m (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pre
  return __arm_vorrq_m_f16 (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m (float32x4_t __inactive, float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_m_f32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_m_f16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m (float32x4_t __inactive, float32x4_t __a, float32_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_m_n_f32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_m (float16x8_t __inactive, float16x8_t __a, float16_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_m_n_f16 (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vld1q (float32_t const * __base)
@@ -34772,20 +32702,6 @@ __arm_vstrwq_scatter_shifted_offset_p (float32_t * __base, uint32x4_t __offset,
  __arm_vstrwq_scatter_shifted_offset_p_f32 (__base, __offset, __value, __p);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq (float16x8_t __a, float16x8_t __b)
-{
- return __arm_vaddq_f16 (__a, __b);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq (float32x4_t __a, float32x4_t __b)
-{
- return __arm_vaddq_f32 (__a, __b);
-}
-
 __extension__ extern __inline void
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vstrwq_scatter_base_wb (uint32x4_t * __addr, const int __offset, float32x4_t __value)
@@ -34856,34 +32772,6 @@ __arm_vabsq_x (float32x4_t __a, mve_pred16_t __p)
  return __arm_vabsq_x_f32 (__a, __p);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_x_f16 (__a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_x_f32 (__a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x (float16x8_t __a, float16_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_x_n_f16 (__a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_x (float32x4_t __a, float32_t __b, mve_pred16_t __p)
-{
- return __arm_vaddq_x_n_f32 (__a, __b, __p);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vnegq_x (float16x8_t __a, mve_pred16_t __p)
@@ -34898,62 +32786,6 @@ __arm_vnegq_x (float32x4_t __a, mve_pred16_t __p)
  return __arm_vnegq_x_f32 (__a, __p);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_x_f16 (__a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_x_f32 (__a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x (float16x8_t __a, float16_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_x_n_f16 (__a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulq_x (float32x4_t __a, float32_t __b, mve_pred16_t __p)
-{
- return __arm_vmulq_x_n_f32 (__a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_x_f16 (__a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_x_f32 (__a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x (float16x8_t __a, float16_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_x_n_f16 (__a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vsubq_x (float32x4_t __a, float32_t __b, mve_pred16_t __p)
-{
- return __arm_vsubq_x_n_f32 (__a, __b, __p);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcaddq_rot90_x (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
@@ -35846,26 +33678,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vabdq_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t)), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vabdq_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t)));})
 
-#define __arm_vaddq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vaddq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vaddq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vaddq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vaddq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vaddq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vaddq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vaddq_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vaddq_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vaddq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(p1, double)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vaddq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(p1, double)));})
-
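/* An illustration (not part of the patch) of the dispatch performed by
   the __arm_vaddq _Generic macro deleted above: the operand types pick
   either the full-vector form or the _n (scalar broadcast) form:  */

#include <arm_mve.h>

uint8x16_t
add_vectors (uint8x16_t a, uint8x16_t b)
{
  return vaddq (a, b);		/* resolves to __arm_vaddq_u8.  */
}

uint8x16_t
add_scalar (uint8x16_t a, uint8_t b)
{
  return vaddq (a, b);		/* resolves to __arm_vaddq_n_u8.  */
}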
 #define __arm_vandq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
@@ -35906,26 +33718,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vornq_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t)), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vornq_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t)));})
 
-#define __arm_vmulq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vmulq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(p1, double)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vmulq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(p1, double)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vmulq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmulq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmulq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vmulq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmulq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmulq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vmulq_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vmulq_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t)));})
-
 #define __arm_vcaddq_rot270(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
@@ -36147,26 +33939,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vminnmq_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t)), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vminnmq_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t)));})
 
-#define __arm_vsubq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vsubq_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce2(p1, double)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vsubq_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce2(p1, double)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vsubq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vsubq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vsubq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vsubq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vsubq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vsubq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vsubq_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vsubq_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t)));})
-
 #define __arm_vminnmvq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
@@ -37288,27 +35060,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vabdq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vabdq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
 
-#define __arm_vaddq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vaddq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vaddq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vaddq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vaddq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vaddq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vaddq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vaddq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vaddq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vaddq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(p2, double), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vaddq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(p2, double), p3));})
-
 #define __arm_vandq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
@@ -37479,27 +35230,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vminnmq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vminnmq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
 
-#define __arm_vmulq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vmulq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmulq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmulq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vmulq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmulq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmulq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vmulq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vmulq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vmulq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(p2, double), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vmulq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(p2, double), p3));})
-
 #define __arm_vornq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
@@ -37513,27 +35243,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vornq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vornq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
 
-#define __arm_vsubq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vsubq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vsubq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vsubq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vsubq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vsubq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vsubq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vsubq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vsubq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vsubq_m_n_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(p2, double), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vsubq_m_n_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(p2, double), p3));})
-
 #define __arm_vorrq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
@@ -37879,26 +35588,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t]: __arm_vabsq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), p2), \
   int (*)[__ARM_mve_type_float32x4_t]: __arm_vabsq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), p2));})
 
-#define __arm_vaddq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vaddq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vaddq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vaddq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vaddq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vaddq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vaddq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vaddq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vaddq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vaddq_x_n_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(p2, double), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vaddq_x_n_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(p2, double), p3));})
-
 #define __arm_vandq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
@@ -38014,26 +35703,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vminnmq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vminnmq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
 
-#define __arm_vmulq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vmulq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmulq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmulq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vmulq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmulq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmulq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vmulq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vmulq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vmulq_x_n_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(p2, double), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vmulq_x_n_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(p2, double), p3));})
-
 #define __arm_vnegq_x(p1,p2) ({ __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p1)])0, \
   int (*)[__ARM_mve_type_int8x16_t]: __arm_vnegq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), p2), \
@@ -38115,26 +35784,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t]: __arm_vrndxq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), p2), \
   int (*)[__ARM_mve_type_float32x4_t]: __arm_vrndxq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), p2));})
 
-#define __arm_vsubq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vsubq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vsubq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vsubq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vsubq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vsubq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vsubq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vsubq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vsubq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: __arm_vsubq_x_n_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(p2, double), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_fp_n]: __arm_vsubq_x_n_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce2(p2, double), p3));})
-
 #define __arm_vcmulq_rot90_x(p1,p2,p3)  ({ __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
@@ -38307,22 +35956,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vshlq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vshlq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, int32x4_t)));})
 
-#define __arm_vsubq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vsubq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vsubq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vsubq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vsubq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vsubq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vsubq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)));})
-
 #define __arm_vshlq_r(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
   int (*)[__ARM_mve_type_int8x16_t]: __arm_vshlq_r_s8 (__ARM_mve_coerce(__p0, int8x16_t), p1), \
@@ -38508,22 +36141,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vornq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vornq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
 
-#define __arm_vmulq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vmulq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmulq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmulq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vmulq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmulq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmulq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
-
 #define __arm_vmulltq_int(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
@@ -38687,22 +36304,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vbicq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vbicq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
 
-#define __arm_vaddq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vaddq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vaddq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vaddq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vaddq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vaddq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vaddq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)));})
-
 #define __arm_vandq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
@@ -39375,23 +36976,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int16x8_t]: __arm_vqmovunbq_m_s16 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int32x4_t]: __arm_vqmovunbq_m_s32 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, int32x4_t), p2));})
 
-#define __arm_vsubq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vsubq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vsubq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vsubq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vsubq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vsubq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vsubq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
-
 #define __arm_vabavq_p(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
@@ -39513,40 +37097,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vorrq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vorrq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
 
-#define __arm_vaddq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vaddq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vaddq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vaddq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vaddq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vaddq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vaddq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
-
-#define __arm_vmulq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vmulq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmulq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmulq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vmulq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmulq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmulq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
-
 #define __arm_vstrwq_scatter_base(p0,p1,p2) ({ __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p2)])0, \
   int (*)[__ARM_mve_type_int32x4_t]: __arm_vstrwq_scatter_base_s32(p0, p1, __ARM_mve_coerce(__p2, int32x4_t)), \
@@ -39790,22 +37340,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int16x8_t]: __arm_vabsq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), p2), \
   int (*)[__ARM_mve_type_int32x4_t]: __arm_vabsq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), p2));})
 
-#define __arm_vaddq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vaddq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vaddq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vaddq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vaddq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vaddq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vaddq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vaddq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3));})
-
 #define __arm_vcaddq_rot270_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
@@ -39892,22 +37426,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vmulltq_poly_x_p8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmulltq_poly_x_p16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3));})
 
-#define __arm_vmulq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vmulq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmulq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmulq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vmulq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmulq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmulq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmulq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3));})
-
 #define __arm_vnegq_x(p1,p2) ({ __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p1)])0, \
   int (*)[__ARM_mve_type_int8x16_t]: __arm_vnegq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), p2), \
@@ -40014,22 +37532,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16_t_ptr]: __arm_vld4q_u16 (__ARM_mve_coerce1(p0, uint16_t *)), \
   int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vld4q_u32 (__ARM_mve_coerce1(p0, uint32_t *))))
 
-#define __arm_vsubq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vsubq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vsubq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vsubq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vsubq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vsubq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vsubq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3));})
-
 #define __arm_vgetq_lane(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
   int (*)[__ARM_mve_type_int8x16_t]: __arm_vgetq_lane_s8 (__ARM_mve_coerce(__p0, int8x16_t), p1), \
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index 5e5510f6e37..8de765de3b0 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -92,7 +92,6 @@ VAR1 (BINOP_UNONE_UNONE_PRED, vaddlvq_p_u, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpneq_, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_NONE, vshlq_s, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_NONE, vshlq_u, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_UNONE_UNONE, vsubq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vsubq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vrmulhq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vrhaddq_u, v16qi, v8hi, v4si)
@@ -102,7 +101,6 @@ VAR3 (BINOP_UNONE_UNONE_UNONE, vqaddq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vqaddq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vorrq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vornq_u, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_UNONE_UNONE, vmulq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vmulq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vmulltq_int_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vmullbq_int_u, v16qi, v8hi, v4si)
@@ -155,7 +153,6 @@ VAR3 (BINOP_PRED_NONE_NONE, vcmpeqq_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpeqq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_NONE_IMM, vqshluq_n_s, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_PRED, vaddvq_p_s, v16qi, v8hi, v4si)
-VAR3 (BINOP_NONE_NONE_NONE, vsubq_s, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_NONE, vsubq_n_s, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_NONE, vshlq_r_s, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_NONE, vrshlq_s, v16qi, v8hi, v4si)
@@ -176,7 +173,6 @@ VAR3 (BINOP_NONE_NONE_NONE, vqaddq_s, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_NONE, vqaddq_n_s, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_NONE, vorrq_s, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_NONE, vornq_s, v16qi, v8hi, v4si)
-VAR3 (BINOP_NONE_NONE_NONE, vmulq_s, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_NONE, vmulq_n_s, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_NONE, vmulltq_int_s, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_NONE, vmullbq_int_s, v16qi, v8hi, v4si)
@@ -230,7 +226,6 @@ VAR2 (BINOP_PRED_NONE_NONE, vcmpgeq_n_f, v8hf, v4sf)
 VAR2 (BINOP_PRED_NONE_NONE, vcmpgeq_f, v8hf, v4sf)
 VAR2 (BINOP_PRED_NONE_NONE, vcmpeqq_n_f, v8hf, v4sf)
 VAR2 (BINOP_PRED_NONE_NONE, vcmpeqq_f, v8hf, v4sf)
-VAR2 (BINOP_NONE_NONE_NONE, vsubq_f, v8hf, v4sf)
 VAR2 (BINOP_NONE_NONE_NONE, vqmovntq_s, v8hi, v4si)
 VAR2 (BINOP_NONE_NONE_NONE, vqmovnbq_s, v8hi, v4si)
 VAR2 (BINOP_NONE_NONE_NONE, vqdmulltq_s, v8hi, v4si)
@@ -240,7 +235,6 @@ VAR2 (BINOP_NONE_NONE_NONE, vqdmullbq_n_s, v8hi, v4si)
 VAR2 (BINOP_NONE_NONE_NONE, vorrq_f, v8hf, v4sf)
 VAR2 (BINOP_NONE_NONE_NONE, vornq_f, v8hf, v4sf)
 VAR2 (BINOP_NONE_NONE_NONE, vmulq_n_f, v8hf, v4sf)
-VAR2 (BINOP_NONE_NONE_NONE, vmulq_f, v8hf, v4sf)
 VAR2 (BINOP_NONE_NONE_NONE, vmovntq_s, v8hi, v4si)
 VAR2 (BINOP_NONE_NONE_NONE, vmovnbq_s, v8hi, v4si)
 VAR2 (BINOP_NONE_NONE_NONE, vmlsldavxq_s, v8hi, v4si)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 5167fbc6add..ccb3cf23304 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -1353,18 +1353,6 @@ (define_insn "mve_vmulltq_int_<supf><mode>"
 ;; [vmulq_u, vmulq_s])
 ;; [vsubq_s, vsubq_u])
 ;;
-(define_insn "mve_vmulq_<supf><mode>"
-  [
-    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VMULQ))
-  ]
-  "TARGET_HAVE_MVE"
-  "vmul.i%#<V_sz_elem>\t%q0, %q1, %q2"
-  [(set_attr "type" "mve_move")
-])
-
 (define_insn "mve_<mve_addsubmul>q<mode>"
   [
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
@@ -1742,21 +1730,6 @@ (define_insn "mve_vshlq_r_<supf><mode>"
   [(set_attr "type" "mve_move")
 ])
 
-;;
-;; [vsubq_s, vsubq_u])
-;;
-(define_insn "mve_vsubq_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VSUBQ))
-  ]
-  "TARGET_HAVE_MVE"
-  "vsub.i%#<V_sz_elem>\t%q0, %q1, %q2"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vabdq_f])
 ;;
-- 
2.34.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 09/22] arm: [MVE intrinsics] add binary shape
  2023-04-18 13:45 [PATCH 00/22] arm: New framework for MVE intrinsics Christophe Lyon
                   ` (7 preceding siblings ...)
  2023-04-18 13:45 ` [PATCH 08/22] arm: [MVE intrinsics] rework vaddq vmulq vsubq Christophe Lyon
@ 2023-04-18 13:45 ` Christophe Lyon
  2023-05-02 16:32   ` Kyrylo Tkachov
  2023-04-18 13:45 ` [PATCH 10/22] arm: [MVE intrinsics] factorize vandq veorq vorrq vbicq Christophe Lyon
                   ` (13 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Christophe Lyon @ 2023-04-18 13:45 UTC (permalink / raw)
  To: gcc-patches, kyrylo.tkachov, richard.earnshaw, richard.sandiford
  Cc: Christophe Lyon

This patch adds the binary shape description, used by intrinsics that
take two operands of the same vector type and return that type.
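
As a concrete illustration of what this shape resolves (not part of
the patch itself): the overloads listed in the comment added below let
user code written against arm_mve.h pick the type-specific function
automatically.  The function name f is made up for the example, and
the snippet assumes an MVE-enabled target such as
-march=armv8.1-m.main+mve:

#include <arm_mve.h>

int8x16_t
f (int8x16_t a, int8x16_t b, mve_pred16_t p)
{
  int8x16_t r = vandq (a, b);       /* resolves to __arm_vandq_s8 */
  int8x16_t x = vandq_x (a, b, p);  /* predicated; unselected lanes undefined */
  return vandq_m (r, x, a, p);      /* predicated; unselected lanes taken from 'r' */
}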

2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>

	gcc/
	* config/arm/arm-mve-builtins-shapes.cc (binary): New.
	* config/arm/arm-mve-builtins-shapes.h (binary): New.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 27 +++++++++++++++++++++++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
 2 files changed, 28 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
index 033b304060a..e69faae4e2c 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -338,6 +338,33 @@ struct overloaded_base : public function_shape
   }
 };
 
+/* <T0>_t vfoo[_t0](<T0>_t, <T0>_t)
+
+   i.e. the standard shape for binary operations that operate on
+   uniform types.
+
+   Example: vandq.
+   int8x16_t [__arm_]vandq[_s8](int8x16_t a, int8x16_t b)
+   int8x16_t [__arm_]vandq_m[_s8](int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
+   int8x16_t [__arm_]vandq_x[_s8](int8x16_t a, int8x16_t b, mve_pred16_t p)  */
+struct binary_def : public overloaded_base<0>
+{
+  void
+  build (function_builder &b, const function_group_info &group,
+	 bool preserve_user_namespace) const override
+  {
+    b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
+    build_all (b, "v0,v0,v0", group, MODE_none, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+    return r.resolve_uniform (2);
+  }
+};
+SHAPE (binary)
+
 /* <T0>_t vfoo[_t0](<T0>_t, <T0>_t)
    <T0>_t vfoo[_n_t0](<T0>_t, <S0>_t)
 
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
index 43798fdde57..b00ee5eb57a 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -34,6 +34,7 @@ namespace arm_mve
   namespace shapes
   {
 
+    extern const function_shape *const binary;
     extern const function_shape *const binary_opt_n;
     extern const function_shape *const inherent;
     extern const function_shape *const unary_convert;
-- 
2.34.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 10/22] arm: [MVE intrinsics] factorize vandq veorq vorrq vbicq
  2023-04-18 13:45 [PATCH 00/22] arm: New framework for MVE intrinsics Christophe Lyon
                   ` (8 preceding siblings ...)
  2023-04-18 13:45 ` [PATCH 09/22] arm: [MVE intrinsics] add binary shape Christophe Lyon
@ 2023-04-18 13:45 ` Christophe Lyon
  2023-05-02 16:36   ` Kyrylo Tkachov
  2023-04-18 13:45 ` [PATCH 11/22] arm: [MVE intrinsics] rework vandq veorq Christophe Lyon
                   ` (12 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Christophe Lyon @ 2023-04-18 13:45 UTC (permalink / raw)
  To: gcc-patches, kyrylo.tkachov, richard.earnshaw, richard.sandiford
  Cc: Christophe Lyon

Factorize vandq, veorq, vorrq, vbicq so that they use the same
parameterized names.
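
To make the benefit concrete (illustrative only, not part of the
patch): once the four patterns share a single '@'-prefixed
parameterized name, genemit produces one overloaded code_for_* entry
point which the builtin expander can call with the unspec and mode,
instead of switching over per-intrinsic gen functions.  A rough
sketch, where the exact generated name and arity depend on the final
pattern string and the operand rtxes are placeholders:

/* Hypothetical expander fragment; 'target', 'inactive', 'op1', 'op2'
   and 'pred' stand for previously prepared rtxes.  The overload is
   keyed on the int iterator value (here VANDQ_M_S) and the mode;
   genemit may require the value once per <...> placeholder in the
   pattern name.  */
insn_code icode = code_for_mve_q_m (VANDQ_M_S, VANDQ_M_S, V16QImode);
emit_insn (GEN_FCN (icode) (target, inactive, op1, op2, pred));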

2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>

	gcc/
	* config/arm/iterators.md (MVE_INT_M_BINARY_LOGIC)
	(MVE_FP_M_BINARY_LOGIC): New.
	(MVE_INT_M_N_BINARY_LOGIC): New.
	(MVE_INT_N_BINARY_LOGIC): New.
	(mve_insn): Add vand, veor, vorr, vbic.
	* config/arm/mve.md (mve_vandq_m_<supf><mode>)
	(mve_veorq_m_<supf><mode>, mve_vorrq_m_<supf><mode>)
	(mve_vbicq_m_<supf><mode>): Merge into ...
	(@mve_<mve_insn>q_m_<supf><mode>): ... this.
	(mve_vandq_m_f<mode>, mve_veorq_m_f<mode>, mve_vorrq_m_f<mode>)
	(mve_vbicq_m_f<mode>): Merge into ...
	(@mve_<mve_insn>q_m_f<mode>): ... this.
	(mve_vorrq_n_<supf><mode>)
	(mve_vbicq_n_<supf><mode>): Merge into ...
	(@mve_<mve_insn>q_n_<supf><mode>): ... this.
	(mve_vorrq_m_n_<supf><mode>, mve_vbicq_m_n_<supf><mode>): Merge
	into ...
	(@mve_<mve_insn>q_m_n_<supf><mode>): ... this.
---
 gcc/config/arm/iterators.md |  32 +++++++
 gcc/config/arm/mve.md       | 161 +++++-------------------------------
 2 files changed, 51 insertions(+), 142 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index d3bef594775..b0ea1af77d2 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -339,24 +339,48 @@ (define_int_iterator MVE_INT_M_BINARY   [
 		     VSUBQ_M_S VSUBQ_M_U
 		     ])
 
+(define_int_iterator MVE_INT_M_BINARY_LOGIC   [
+		     VANDQ_M_S VANDQ_M_U
+		     VBICQ_M_S VBICQ_M_U
+		     VEORQ_M_S VEORQ_M_U
+		     VORRQ_M_S VORRQ_M_U
+		     ])
+
 (define_int_iterator MVE_INT_M_N_BINARY [
 		     VADDQ_M_N_S VADDQ_M_N_U
 		     VMULQ_M_N_S VMULQ_M_N_U
 		     VSUBQ_M_N_S VSUBQ_M_N_U
 		     ])
 
+(define_int_iterator MVE_INT_M_N_BINARY_LOGIC [
+		     VBICQ_M_N_S VBICQ_M_N_U
+		     VORRQ_M_N_S VORRQ_M_N_U
+		     ])
+
 (define_int_iterator MVE_INT_N_BINARY   [
 		     VADDQ_N_S VADDQ_N_U
 		     VMULQ_N_S VMULQ_N_U
 		     VSUBQ_N_S VSUBQ_N_U
 		     ])
 
+(define_int_iterator MVE_INT_N_BINARY_LOGIC   [
+		     VBICQ_N_S VBICQ_N_U
+		     VORRQ_N_S VORRQ_N_U
+		     ])
+
 (define_int_iterator MVE_FP_M_BINARY   [
 		     VADDQ_M_F
 		     VMULQ_M_F
 		     VSUBQ_M_F
 		     ])
 
+(define_int_iterator MVE_FP_M_BINARY_LOGIC   [
+		     VANDQ_M_F
+		     VBICQ_M_F
+		     VEORQ_M_F
+		     VORRQ_M_F
+		     ])
+
 (define_int_iterator MVE_FP_M_N_BINARY [
 		     VADDQ_M_N_F
 		     VMULQ_M_N_F
@@ -379,9 +403,17 @@ (define_int_attr mve_insn [
 		 (VADDQ_M_N_S "vadd") (VADDQ_M_N_U "vadd") (VADDQ_M_N_F "vadd")
 		 (VADDQ_M_S "vadd") (VADDQ_M_U "vadd") (VADDQ_M_F "vadd")
 		 (VADDQ_N_S "vadd") (VADDQ_N_U "vadd") (VADDQ_N_F "vadd")
+		 (VANDQ_M_S "vand") (VANDQ_M_U "vand") (VANDQ_M_F "vand")
+		 (VBICQ_M_N_S "vbic") (VBICQ_M_N_U "vbic")
+		 (VBICQ_M_S "vbic") (VBICQ_M_U "vbic") (VBICQ_M_F "vbic")
+		 (VBICQ_N_S "vbic") (VBICQ_N_U "vbic")
+		 (VEORQ_M_S "veor") (VEORQ_M_U "veor") (VEORQ_M_F "veor")
 		 (VMULQ_M_N_S "vmul") (VMULQ_M_N_U "vmul") (VMULQ_M_N_F "vmul")
 		 (VMULQ_M_S "vmul") (VMULQ_M_U "vmul") (VMULQ_M_F "vmul")
 		 (VMULQ_N_S "vmul") (VMULQ_N_U "vmul") (VMULQ_N_F "vmul")
+		 (VORRQ_M_N_S "vorr") (VORRQ_M_N_U "vorr")
+		 (VORRQ_M_S "vorr") (VORRQ_M_U "vorr") (VORRQ_M_F "vorr")
+		 (VORRQ_N_S "vorr") (VORRQ_N_U "vorr")
 		 (VSUBQ_M_N_S "vsub") (VSUBQ_M_N_U "vsub") (VSUBQ_M_N_F "vsub")
 		 (VSUBQ_M_S "vsub") (VSUBQ_M_U "vsub") (VSUBQ_M_F "vsub")
 		 (VSUBQ_N_S "vsub") (VSUBQ_N_U "vsub") (VSUBQ_N_F "vsub")
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index ccb3cf23304..fbae1d3791f 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -1805,21 +1805,6 @@ (define_insn "mve_vbicq_f<mode>"
   [(set_attr "type" "mve_move")
 ])
 
-;;
-;; [vbicq_n_s, vbicq_n_u])
-;;
-(define_insn "mve_vbicq_n_<supf><mode>"
-  [
-   (set (match_operand:MVE_5 0 "s_register_operand" "=w")
-	(unspec:MVE_5 [(match_operand:MVE_5 1 "s_register_operand" "0")
-		       (match_operand:SI 2 "immediate_operand" "i")]
-	 VBICQ_N))
-  ]
-  "TARGET_HAVE_MVE"
-  "vbic.i%#<V_sz_elem>	%q0, %2"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vcaddq, vcaddq_rot90, vcadd_rot180, vcadd_rot270])
 ;;
@@ -2191,17 +2176,18 @@ (define_insn "mve_vorrq_f<mode>"
 ])
 
 ;;
+;; [vbicq_n_s, vbicq_n_u])
 ;; [vorrq_n_u, vorrq_n_s])
 ;;
-(define_insn "mve_vorrq_n_<supf><mode>"
+(define_insn "@mve_<mve_insn>q_n_<supf><mode>"
   [
    (set (match_operand:MVE_5 0 "s_register_operand" "=w")
 	(unspec:MVE_5 [(match_operand:MVE_5 1 "s_register_operand" "0")
 		       (match_operand:SI 2 "immediate_operand" "i")]
-	 VORRQ_N))
+	 MVE_INT_N_BINARY_LOGIC))
   ]
   "TARGET_HAVE_MVE"
-  "vorr.i%#<V_sz_elem>	%q0, %2"
+  "<mve_insn>.i%#<V_sz_elem>	%q0, %2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -2445,21 +2431,6 @@ (define_insn "mve_vrmlaldavhq_<supf>v4si"
   [(set_attr "type" "mve_move")
 ])
 
-;;
-;; [vbicq_m_n_s, vbicq_m_n_u])
-;;
-(define_insn "mve_vbicq_m_n_<supf><mode>"
-  [
-   (set (match_operand:MVE_5 0 "s_register_operand" "=w")
-	(unspec:MVE_5 [(match_operand:MVE_5 1 "s_register_operand" "0")
-		       (match_operand:SI 2 "immediate_operand" "i")
-		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
-	 VBICQ_M_N))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vbict.i%#<V_sz_elem>	%q0, %2"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
 ;;
 ;; [vcmpeqq_m_f])
 ;;
@@ -4269,20 +4240,22 @@ (define_insn "mve_vnegq_m_f<mode>"
    (set_attr "length""8")])
 
 ;;
+;; [vbicq_m_n_s, vbicq_m_n_u])
 ;; [vorrq_m_n_s, vorrq_m_n_u])
 ;;
-(define_insn "mve_vorrq_m_n_<supf><mode>"
+(define_insn "@mve_<mve_insn>q_m_n_<supf><mode>"
   [
    (set (match_operand:MVE_5 0 "s_register_operand" "=w")
 	(unspec:MVE_5 [(match_operand:MVE_5 1 "s_register_operand" "0")
 		       (match_operand:SI 2 "immediate_operand" "i")
 		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
-	 VORRQ_M_N))
+	 MVE_INT_M_N_BINARY_LOGIC))
   ]
   "TARGET_HAVE_MVE"
-  "vpst\;vorrt.i%#<V_sz_elem>	%q0, %2"
+  "vpst\;<mve_insn>t.i%#<V_sz_elem>	%q0, %2"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
+
 ;;
 ;; [vpselq_f])
 ;;
@@ -5001,35 +4974,21 @@ (define_insn "@mve_<mve_insn>q_m_<supf><mode>"
 
 ;;
 ;; [vandq_m_u, vandq_m_s])
-;;
-(define_insn "mve_vandq_m_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VANDQ_M))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vandt %q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
 ;; [vbicq_m_u, vbicq_m_s])
+;; [veorq_m_u, veorq_m_s])
+;; [vorrq_m_u, vorrq_m_s])
 ;;
-(define_insn "mve_vbicq_m_<supf><mode>"
+(define_insn "@mve_<mve_insn>q_m_<supf><mode>"
   [
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
 		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VBICQ_M))
+	 MVE_INT_M_BINARY_LOGIC))
   ]
   "TARGET_HAVE_MVE"
-  "vpst\;vbict %q0, %q2, %q3"
+  "vpst\;<mve_insn>t %q0, %q2, %q3"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
@@ -5084,23 +5043,6 @@ (define_insn "mve_vcaddq_rot90_m_<supf><mode>"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
-;;
-;; [veorq_m_s, veorq_m_u])
-;;
-(define_insn "mve_veorq_m_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VEORQ_M))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;veort %q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
 ;;
 ;; [vhaddq_m_n_s, vhaddq_m_n_u])
 ;;
@@ -5322,23 +5264,6 @@ (define_insn "mve_vornq_m_<supf><mode>"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
-;;
-;; [vorrq_m_s, vorrq_m_u])
-;;
-(define_insn "mve_vorrq_m_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VORRQ_M))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vorrt %q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
 ;;
 ;; [vqaddq_m_n_u, vqaddq_m_n_s])
 ;;
@@ -6483,35 +6408,21 @@ (define_insn "@mve_<mve_insn>q_m_n_f<mode>"
 
 ;;
 ;; [vandq_m_f])
-;;
-(define_insn "mve_vandq_m_f<mode>"
-  [
-   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
-		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VANDQ_M_F))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vandt %q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
 ;; [vbicq_m_f])
+;; [veorq_m_f])
+;; [vorrq_m_f])
 ;;
-(define_insn "mve_vbicq_m_f<mode>"
+(define_insn "@mve_<mve_insn>q_m_f<mode>"
   [
    (set (match_operand:MVE_0 0 "s_register_operand" "=w")
 	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
 		       (match_operand:MVE_0 2 "s_register_operand" "w")
 		       (match_operand:MVE_0 3 "s_register_operand" "w")
 		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VBICQ_M_F))
+	 MVE_FP_M_BINARY_LOGIC))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vbict %q0, %q2, %q3"
+  "vpst\;<mve_insn>t %q0, %q2, %q3"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
@@ -6702,23 +6613,6 @@ (define_insn "mve_vcmulq_rot90_m_f<mode>"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
-;;
-;; [veorq_m_f])
-;;
-(define_insn "mve_veorq_m_f<mode>"
-  [
-   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
-		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VEORQ_M_F))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;veort %q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
 ;;
 ;; [vfmaq_m_f])
 ;;
@@ -6838,23 +6732,6 @@ (define_insn "mve_vornq_m_f<mode>"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
-;;
-;; [vorrq_m_f])
-;;
-(define_insn "mve_vorrq_m_f<mode>"
-  [
-   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
-		       (match_operand:MVE_0 2 "s_register_operand" "w")
-		       (match_operand:MVE_0 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VORRQ_M_F))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vorrt %q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
 ;;
 ;; [vstrbq_s vstrbq_u]
 ;;
-- 
2.34.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 11/22] arm: [MVE intrinsics] rework vandq veorq
  2023-04-18 13:45 [PATCH 00/22] arm: New framework for MVE intrinsics Christophe Lyon
                   ` (9 preceding siblings ...)
  2023-04-18 13:45 ` [PATCH 10/22] arm: [MVE intrinsics] factorize vandq veorq vorrq vbicq Christophe Lyon
@ 2023-04-18 13:45 ` Christophe Lyon
  2023-05-02 16:37   ` Kyrylo Tkachov
  2023-04-18 13:45 ` [PATCH 12/22] arm: [MVE intrinsics] add binary_orrq shape Christophe Lyon
                   ` (11 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Christophe Lyon @ 2023-04-18 13:45 UTC (permalink / raw)
  To: gcc-patches, kyrylo.tkachov, richard.earnshaw, richard.sandiford
  Cc: Christophe Lyon

Implement vandq, veorq using the new MVE builtins framework.
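
User-visible behaviour is unchanged; for illustration (a sketch, not
part of the patch), the overloaded forms keep resolving to the same
type-specific intrinsics, now through the framework's resolver instead
of the wrappers removed from arm_mve.h:

#include <arm_mve.h>

/* Sketch: this compiles exactly as before the rework.  */
float16x8_t
eor_x (float16x8_t a, float16x8_t b, mve_pred16_t p)
{
  return veorq_x (a, b, p);	/* resolves to veorq_x_f16  */
}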

2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>

	gcc/
	* config/arm/arm-mve-builtins-base.cc (FUNCTION_WITH_RTX_M): New.
	(vandq,veorq): New.
	* config/arm/arm-mve-builtins-base.def (vandq, veorq): New.
	* config/arm/arm-mve-builtins-base.h (vandq, veorq): New.
	* config/arm/arm_mve.h (vandq): Remove.
	(vandq_m): Remove.
	(vandq_x): Remove.
	(vandq_u8): Remove.
	(vandq_s8): Remove.
	(vandq_u16): Remove.
	(vandq_s16): Remove.
	(vandq_u32): Remove.
	(vandq_s32): Remove.
	(vandq_f16): Remove.
	(vandq_f32): Remove.
	(vandq_m_s8): Remove.
	(vandq_m_s32): Remove.
	(vandq_m_s16): Remove.
	(vandq_m_u8): Remove.
	(vandq_m_u32): Remove.
	(vandq_m_u16): Remove.
	(vandq_m_f32): Remove.
	(vandq_m_f16): Remove.
	(vandq_x_s8): Remove.
	(vandq_x_s16): Remove.
	(vandq_x_s32): Remove.
	(vandq_x_u8): Remove.
	(vandq_x_u16): Remove.
	(vandq_x_u32): Remove.
	(vandq_x_f16): Remove.
	(vandq_x_f32): Remove.
	(__arm_vandq_u8): Remove.
	(__arm_vandq_s8): Remove.
	(__arm_vandq_u16): Remove.
	(__arm_vandq_s16): Remove.
	(__arm_vandq_u32): Remove.
	(__arm_vandq_s32): Remove.
	(__arm_vandq_m_s8): Remove.
	(__arm_vandq_m_s32): Remove.
	(__arm_vandq_m_s16): Remove.
	(__arm_vandq_m_u8): Remove.
	(__arm_vandq_m_u32): Remove.
	(__arm_vandq_m_u16): Remove.
	(__arm_vandq_x_s8): Remove.
	(__arm_vandq_x_s16): Remove.
	(__arm_vandq_x_s32): Remove.
	(__arm_vandq_x_u8): Remove.
	(__arm_vandq_x_u16): Remove.
	(__arm_vandq_x_u32): Remove.
	(__arm_vandq_f16): Remove.
	(__arm_vandq_f32): Remove.
	(__arm_vandq_m_f32): Remove.
	(__arm_vandq_m_f16): Remove.
	(__arm_vandq_x_f16): Remove.
	(__arm_vandq_x_f32): Remove.
	(__arm_vandq): Remove.
	(__arm_vandq_m): Remove.
	(__arm_vandq_x): Remove.
	(veorq_m): Remove.
	(veorq_x): Remove.
	(veorq_u8): Remove.
	(veorq_s8): Remove.
	(veorq_u16): Remove.
	(veorq_s16): Remove.
	(veorq_u32): Remove.
	(veorq_s32): Remove.
	(veorq_f16): Remove.
	(veorq_f32): Remove.
	(veorq_m_s8): Remove.
	(veorq_m_s32): Remove.
	(veorq_m_s16): Remove.
	(veorq_m_u8): Remove.
	(veorq_m_u32): Remove.
	(veorq_m_u16): Remove.
	(veorq_m_f32): Remove.
	(veorq_m_f16): Remove.
	(veorq_x_s8): Remove.
	(veorq_x_s16): Remove.
	(veorq_x_s32): Remove.
	(veorq_x_u8): Remove.
	(veorq_x_u16): Remove.
	(veorq_x_u32): Remove.
	(veorq_x_f16): Remove.
	(veorq_x_f32): Remove.
	(__arm_veorq_u8): Remove.
	(__arm_veorq_s8): Remove.
	(__arm_veorq_u16): Remove.
	(__arm_veorq_s16): Remove.
	(__arm_veorq_u32): Remove.
	(__arm_veorq_s32): Remove.
	(__arm_veorq_m_s8): Remove.
	(__arm_veorq_m_s32): Remove.
	(__arm_veorq_m_s16): Remove.
	(__arm_veorq_m_u8): Remove.
	(__arm_veorq_m_u32): Remove.
	(__arm_veorq_m_u16): Remove.
	(__arm_veorq_x_s8): Remove.
	(__arm_veorq_x_s16): Remove.
	(__arm_veorq_x_s32): Remove.
	(__arm_veorq_x_u8): Remove.
	(__arm_veorq_x_u16): Remove.
	(__arm_veorq_x_u32): Remove.
	(__arm_veorq_f16): Remove.
	(__arm_veorq_f32): Remove.
	(__arm_veorq_m_f32): Remove.
	(__arm_veorq_m_f16): Remove.
	(__arm_veorq_x_f16): Remove.
	(__arm_veorq_x_f32): Remove.
	(__arm_veorq): Remove.
	(__arm_veorq_m): Remove.
	(__arm_veorq_x): Remove.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |  10 +
 gcc/config/arm/arm-mve-builtins-base.def |   4 +
 gcc/config/arm/arm-mve-builtins-base.h   |   2 +
 gcc/config/arm/arm_mve.h                 | 862 -----------------------
 4 files changed, 16 insertions(+), 862 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index 48b09bffd0c..51fed8f671f 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -90,7 +90,17 @@ namespace arm_mve {
     UNSPEC##_M_S, UNSPEC##_M_U, UNSPEC##_M_F,				\
     UNSPEC##_M_N_S, UNSPEC##_M_N_U, UNSPEC##_M_N_F))
 
+  /* Helper for builtins with RTX codes, and _m predicated overrides.  */
+#define FUNCTION_WITH_RTX_M(NAME, RTX, UNSPEC) FUNCTION			\
+  (NAME, unspec_based_mve_function_exact_insn,				\
+   (RTX, RTX, RTX,							\
+    -1, -1, -1,								\
+    UNSPEC##_M_S, UNSPEC##_M_U, UNSPEC##_M_F,				\
+    -1, -1, -1))
+
 FUNCTION_WITH_RTX_M_N (vaddq, PLUS, VADDQ)
+FUNCTION_WITH_RTX_M (vandq, AND, VANDQ)
+FUNCTION_WITH_RTX_M (veorq, XOR, VEORQ)
 FUNCTION_WITH_RTX_M_N (vmulq, MULT, VMULQ)
 FUNCTION (vreinterpretq, vreinterpretq_impl,)
 FUNCTION_WITH_RTX_M_N (vsubq, MINUS, VSUBQ)
diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
index 624558c08b2..a933c9fc91e 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -19,6 +19,8 @@
 
 #define REQUIRES_FLOAT false
 DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_integer, mx_or_none)
+DEF_MVE_FUNCTION (vandq, binary, all_integer, mx_or_none)
+DEF_MVE_FUNCTION (veorq, binary, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vmulq, binary_opt_n, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vreinterpretq, unary_convert, reinterpret_integer, none)
 DEF_MVE_FUNCTION (vsubq, binary_opt_n, all_integer, mx_or_none)
@@ -27,6 +29,8 @@ DEF_MVE_FUNCTION (vuninitializedq, inherent, all_integer_with_64, none)
 
 #define REQUIRES_FLOAT true
 DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_float, mx_or_none)
+DEF_MVE_FUNCTION (vandq, binary, all_float, mx_or_none)
+DEF_MVE_FUNCTION (veorq, binary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vmulq, binary_opt_n, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vreinterpretq, unary_convert, reinterpret_float, none)
 DEF_MVE_FUNCTION (vsubq, binary_opt_n, all_float, mx_or_none)
diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h
index 30f8549c495..4fcf55715b6 100644
--- a/gcc/config/arm/arm-mve-builtins-base.h
+++ b/gcc/config/arm/arm-mve-builtins-base.h
@@ -24,6 +24,8 @@ namespace arm_mve {
 namespace functions {
 
 extern const function_base *const vaddq;
+extern const function_base *const vandq;
+extern const function_base *const veorq;
 extern const function_base *const vmulq;
 extern const function_base *const vreinterpretq;
 extern const function_base *const vsubq;
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 42a1af2ae15..0ad0122e44f 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -77,14 +77,12 @@
 #define vmaxq(__a, __b) __arm_vmaxq(__a, __b)
 #define vhsubq(__a, __b) __arm_vhsubq(__a, __b)
 #define vhaddq(__a, __b) __arm_vhaddq(__a, __b)
-#define veorq(__a, __b) __arm_veorq(__a, __b)
 #define vcmphiq(__a, __b) __arm_vcmphiq(__a, __b)
 #define vcmpeqq(__a, __b) __arm_vcmpeqq(__a, __b)
 #define vcmpcsq(__a, __b) __arm_vcmpcsq(__a, __b)
 #define vcaddq_rot90(__a, __b) __arm_vcaddq_rot90(__a, __b)
 #define vcaddq_rot270(__a, __b) __arm_vcaddq_rot270(__a, __b)
 #define vbicq(__a, __b) __arm_vbicq(__a, __b)
-#define vandq(__a, __b) __arm_vandq(__a, __b)
 #define vaddvq_p(__a, __p) __arm_vaddvq_p(__a, __p)
 #define vaddvaq(__a, __b) __arm_vaddvaq(__a, __b)
 #define vabdq(__a, __b) __arm_vabdq(__a, __b)
@@ -236,12 +234,10 @@
 #define vabavq_p(__a, __b, __c, __p) __arm_vabavq_p(__a, __b, __c, __p)
 #define vshlq_m(__inactive, __a, __b, __p) __arm_vshlq_m(__inactive, __a, __b, __p)
 #define vabdq_m(__inactive, __a, __b, __p) __arm_vabdq_m(__inactive, __a, __b, __p)
-#define vandq_m(__inactive, __a, __b, __p) __arm_vandq_m(__inactive, __a, __b, __p)
 #define vbicq_m(__inactive, __a, __b, __p) __arm_vbicq_m(__inactive, __a, __b, __p)
 #define vbrsrq_m(__inactive, __a, __b, __p) __arm_vbrsrq_m(__inactive, __a, __b, __p)
 #define vcaddq_rot270_m(__inactive, __a, __b, __p) __arm_vcaddq_rot270_m(__inactive, __a, __b, __p)
 #define vcaddq_rot90_m(__inactive, __a, __b, __p) __arm_vcaddq_rot90_m(__inactive, __a, __b, __p)
-#define veorq_m(__inactive, __a, __b, __p) __arm_veorq_m(__inactive, __a, __b, __p)
 #define vhaddq_m(__inactive, __a, __b, __p) __arm_vhaddq_m(__inactive, __a, __b, __p)
 #define vhcaddq_rot270_m(__inactive, __a, __b, __p) __arm_vhcaddq_rot270_m(__inactive, __a, __b, __p)
 #define vhcaddq_rot90_m(__inactive, __a, __b, __p) __arm_vhcaddq_rot90_m(__inactive, __a, __b, __p)
@@ -404,10 +400,8 @@
 #define vhsubq_x(__a, __b, __p) __arm_vhsubq_x(__a, __b, __p)
 #define vrhaddq_x(__a, __b, __p) __arm_vrhaddq_x(__a, __b, __p)
 #define vrmulhq_x(__a, __b, __p) __arm_vrmulhq_x(__a, __b, __p)
-#define vandq_x(__a, __b, __p) __arm_vandq_x(__a, __b, __p)
 #define vbicq_x(__a, __b, __p) __arm_vbicq_x(__a, __b, __p)
 #define vbrsrq_x(__a, __b, __p) __arm_vbrsrq_x(__a, __b, __p)
-#define veorq_x(__a, __b, __p) __arm_veorq_x(__a, __b, __p)
 #define vmovlbq_x(__a, __p) __arm_vmovlbq_x(__a, __p)
 #define vmovltq_x(__a, __p) __arm_vmovltq_x(__a, __p)
 #define vmvnq_x(__a, __p) __arm_vmvnq_x(__a, __p)
@@ -702,7 +696,6 @@
 #define vhsubq_n_u8(__a, __b) __arm_vhsubq_n_u8(__a, __b)
 #define vhaddq_u8(__a, __b) __arm_vhaddq_u8(__a, __b)
 #define vhaddq_n_u8(__a, __b) __arm_vhaddq_n_u8(__a, __b)
-#define veorq_u8(__a, __b) __arm_veorq_u8(__a, __b)
 #define vcmpneq_n_u8(__a, __b) __arm_vcmpneq_n_u8(__a, __b)
 #define vcmphiq_u8(__a, __b) __arm_vcmphiq_u8(__a, __b)
 #define vcmphiq_n_u8(__a, __b) __arm_vcmphiq_n_u8(__a, __b)
@@ -713,7 +706,6 @@
 #define vcaddq_rot90_u8(__a, __b) __arm_vcaddq_rot90_u8(__a, __b)
 #define vcaddq_rot270_u8(__a, __b) __arm_vcaddq_rot270_u8(__a, __b)
 #define vbicq_u8(__a, __b) __arm_vbicq_u8(__a, __b)
-#define vandq_u8(__a, __b) __arm_vandq_u8(__a, __b)
 #define vaddvq_p_u8(__a, __p) __arm_vaddvq_p_u8(__a, __p)
 #define vaddvaq_u8(__a, __b) __arm_vaddvaq_u8(__a, __b)
 #define vabdq_u8(__a, __b) __arm_vabdq_u8(__a, __b)
@@ -781,12 +773,10 @@
 #define vhcaddq_rot270_s8(__a, __b) __arm_vhcaddq_rot270_s8(__a, __b)
 #define vhaddq_s8(__a, __b) __arm_vhaddq_s8(__a, __b)
 #define vhaddq_n_s8(__a, __b) __arm_vhaddq_n_s8(__a, __b)
-#define veorq_s8(__a, __b) __arm_veorq_s8(__a, __b)
 #define vcaddq_rot90_s8(__a, __b) __arm_vcaddq_rot90_s8(__a, __b)
 #define vcaddq_rot270_s8(__a, __b) __arm_vcaddq_rot270_s8(__a, __b)
 #define vbrsrq_n_s8(__a, __b) __arm_vbrsrq_n_s8(__a, __b)
 #define vbicq_s8(__a, __b) __arm_vbicq_s8(__a, __b)
-#define vandq_s8(__a, __b) __arm_vandq_s8(__a, __b)
 #define vaddvaq_s8(__a, __b) __arm_vaddvaq_s8(__a, __b)
 #define vabdq_s8(__a, __b) __arm_vabdq_s8(__a, __b)
 #define vshlq_n_s8(__a,  __imm) __arm_vshlq_n_s8(__a,  __imm)
@@ -812,7 +802,6 @@
 #define vhsubq_n_u16(__a, __b) __arm_vhsubq_n_u16(__a, __b)
 #define vhaddq_u16(__a, __b) __arm_vhaddq_u16(__a, __b)
 #define vhaddq_n_u16(__a, __b) __arm_vhaddq_n_u16(__a, __b)
-#define veorq_u16(__a, __b) __arm_veorq_u16(__a, __b)
 #define vcmpneq_n_u16(__a, __b) __arm_vcmpneq_n_u16(__a, __b)
 #define vcmphiq_u16(__a, __b) __arm_vcmphiq_u16(__a, __b)
 #define vcmphiq_n_u16(__a, __b) __arm_vcmphiq_n_u16(__a, __b)
@@ -823,7 +812,6 @@
 #define vcaddq_rot90_u16(__a, __b) __arm_vcaddq_rot90_u16(__a, __b)
 #define vcaddq_rot270_u16(__a, __b) __arm_vcaddq_rot270_u16(__a, __b)
 #define vbicq_u16(__a, __b) __arm_vbicq_u16(__a, __b)
-#define vandq_u16(__a, __b) __arm_vandq_u16(__a, __b)
 #define vaddvq_p_u16(__a, __p) __arm_vaddvq_p_u16(__a, __p)
 #define vaddvaq_u16(__a, __b) __arm_vaddvaq_u16(__a, __b)
 #define vabdq_u16(__a, __b) __arm_vabdq_u16(__a, __b)
@@ -891,12 +879,10 @@
 #define vhcaddq_rot270_s16(__a, __b) __arm_vhcaddq_rot270_s16(__a, __b)
 #define vhaddq_s16(__a, __b) __arm_vhaddq_s16(__a, __b)
 #define vhaddq_n_s16(__a, __b) __arm_vhaddq_n_s16(__a, __b)
-#define veorq_s16(__a, __b) __arm_veorq_s16(__a, __b)
 #define vcaddq_rot90_s16(__a, __b) __arm_vcaddq_rot90_s16(__a, __b)
 #define vcaddq_rot270_s16(__a, __b) __arm_vcaddq_rot270_s16(__a, __b)
 #define vbrsrq_n_s16(__a, __b) __arm_vbrsrq_n_s16(__a, __b)
 #define vbicq_s16(__a, __b) __arm_vbicq_s16(__a, __b)
-#define vandq_s16(__a, __b) __arm_vandq_s16(__a, __b)
 #define vaddvaq_s16(__a, __b) __arm_vaddvaq_s16(__a, __b)
 #define vabdq_s16(__a, __b) __arm_vabdq_s16(__a, __b)
 #define vshlq_n_s16(__a,  __imm) __arm_vshlq_n_s16(__a,  __imm)
@@ -922,7 +908,6 @@
 #define vhsubq_n_u32(__a, __b) __arm_vhsubq_n_u32(__a, __b)
 #define vhaddq_u32(__a, __b) __arm_vhaddq_u32(__a, __b)
 #define vhaddq_n_u32(__a, __b) __arm_vhaddq_n_u32(__a, __b)
-#define veorq_u32(__a, __b) __arm_veorq_u32(__a, __b)
 #define vcmpneq_n_u32(__a, __b) __arm_vcmpneq_n_u32(__a, __b)
 #define vcmphiq_u32(__a, __b) __arm_vcmphiq_u32(__a, __b)
 #define vcmphiq_n_u32(__a, __b) __arm_vcmphiq_n_u32(__a, __b)
@@ -933,7 +918,6 @@
 #define vcaddq_rot90_u32(__a, __b) __arm_vcaddq_rot90_u32(__a, __b)
 #define vcaddq_rot270_u32(__a, __b) __arm_vcaddq_rot270_u32(__a, __b)
 #define vbicq_u32(__a, __b) __arm_vbicq_u32(__a, __b)
-#define vandq_u32(__a, __b) __arm_vandq_u32(__a, __b)
 #define vaddvq_p_u32(__a, __p) __arm_vaddvq_p_u32(__a, __p)
 #define vaddvaq_u32(__a, __b) __arm_vaddvaq_u32(__a, __b)
 #define vabdq_u32(__a, __b) __arm_vabdq_u32(__a, __b)
@@ -1001,12 +985,10 @@
 #define vhcaddq_rot270_s32(__a, __b) __arm_vhcaddq_rot270_s32(__a, __b)
 #define vhaddq_s32(__a, __b) __arm_vhaddq_s32(__a, __b)
 #define vhaddq_n_s32(__a, __b) __arm_vhaddq_n_s32(__a, __b)
-#define veorq_s32(__a, __b) __arm_veorq_s32(__a, __b)
 #define vcaddq_rot90_s32(__a, __b) __arm_vcaddq_rot90_s32(__a, __b)
 #define vcaddq_rot270_s32(__a, __b) __arm_vcaddq_rot270_s32(__a, __b)
 #define vbrsrq_n_s32(__a, __b) __arm_vbrsrq_n_s32(__a, __b)
 #define vbicq_s32(__a, __b) __arm_vbicq_s32(__a, __b)
-#define vandq_s32(__a, __b) __arm_vandq_s32(__a, __b)
 #define vaddvaq_s32(__a, __b) __arm_vaddvaq_s32(__a, __b)
 #define vabdq_s32(__a, __b) __arm_vabdq_s32(__a, __b)
 #define vshlq_n_s32(__a,  __imm) __arm_vshlq_n_s32(__a,  __imm)
@@ -1059,7 +1041,6 @@
 #define vmaxnmq_f16(__a, __b) __arm_vmaxnmq_f16(__a, __b)
 #define vmaxnmavq_f16(__a, __b) __arm_vmaxnmavq_f16(__a, __b)
 #define vmaxnmaq_f16(__a, __b) __arm_vmaxnmaq_f16(__a, __b)
-#define veorq_f16(__a, __b) __arm_veorq_f16(__a, __b)
 #define vcmulq_rot90_f16(__a, __b) __arm_vcmulq_rot90_f16(__a, __b)
 #define vcmulq_rot270_f16(__a, __b) __arm_vcmulq_rot270_f16(__a, __b)
 #define vcmulq_rot180_f16(__a, __b) __arm_vcmulq_rot180_f16(__a, __b)
@@ -1067,7 +1048,6 @@
 #define vcaddq_rot90_f16(__a, __b) __arm_vcaddq_rot90_f16(__a, __b)
 #define vcaddq_rot270_f16(__a, __b) __arm_vcaddq_rot270_f16(__a, __b)
 #define vbicq_f16(__a, __b) __arm_vbicq_f16(__a, __b)
-#define vandq_f16(__a, __b) __arm_vandq_f16(__a, __b)
 #define vabdq_f16(__a, __b) __arm_vabdq_f16(__a, __b)
 #define vshlltq_n_s8(__a,  __imm) __arm_vshlltq_n_s8(__a,  __imm)
 #define vshllbq_n_s8(__a,  __imm) __arm_vshllbq_n_s8(__a,  __imm)
@@ -1120,7 +1100,6 @@
 #define vmaxnmq_f32(__a, __b) __arm_vmaxnmq_f32(__a, __b)
 #define vmaxnmavq_f32(__a, __b) __arm_vmaxnmavq_f32(__a, __b)
 #define vmaxnmaq_f32(__a, __b) __arm_vmaxnmaq_f32(__a, __b)
-#define veorq_f32(__a, __b) __arm_veorq_f32(__a, __b)
 #define vcmulq_rot90_f32(__a, __b) __arm_vcmulq_rot90_f32(__a, __b)
 #define vcmulq_rot270_f32(__a, __b) __arm_vcmulq_rot270_f32(__a, __b)
 #define vcmulq_rot180_f32(__a, __b) __arm_vcmulq_rot180_f32(__a, __b)
@@ -1128,7 +1107,6 @@
 #define vcaddq_rot90_f32(__a, __b) __arm_vcaddq_rot90_f32(__a, __b)
 #define vcaddq_rot270_f32(__a, __b) __arm_vcaddq_rot270_f32(__a, __b)
 #define vbicq_f32(__a, __b) __arm_vbicq_f32(__a, __b)
-#define vandq_f32(__a, __b) __arm_vandq_f32(__a, __b)
 #define vabdq_f32(__a, __b) __arm_vabdq_f32(__a, __b)
 #define vshlltq_n_s16(__a,  __imm) __arm_vshlltq_n_s16(__a,  __imm)
 #define vshllbq_n_s16(__a,  __imm) __arm_vshllbq_n_s16(__a,  __imm)
@@ -1662,12 +1640,6 @@
 #define vabdq_m_u8(__inactive, __a, __b, __p) __arm_vabdq_m_u8(__inactive, __a, __b, __p)
 #define vabdq_m_u32(__inactive, __a, __b, __p) __arm_vabdq_m_u32(__inactive, __a, __b, __p)
 #define vabdq_m_u16(__inactive, __a, __b, __p) __arm_vabdq_m_u16(__inactive, __a, __b, __p)
-#define vandq_m_s8(__inactive, __a, __b, __p) __arm_vandq_m_s8(__inactive, __a, __b, __p)
-#define vandq_m_s32(__inactive, __a, __b, __p) __arm_vandq_m_s32(__inactive, __a, __b, __p)
-#define vandq_m_s16(__inactive, __a, __b, __p) __arm_vandq_m_s16(__inactive, __a, __b, __p)
-#define vandq_m_u8(__inactive, __a, __b, __p) __arm_vandq_m_u8(__inactive, __a, __b, __p)
-#define vandq_m_u32(__inactive, __a, __b, __p) __arm_vandq_m_u32(__inactive, __a, __b, __p)
-#define vandq_m_u16(__inactive, __a, __b, __p) __arm_vandq_m_u16(__inactive, __a, __b, __p)
 #define vbicq_m_s8(__inactive, __a, __b, __p) __arm_vbicq_m_s8(__inactive, __a, __b, __p)
 #define vbicq_m_s32(__inactive, __a, __b, __p) __arm_vbicq_m_s32(__inactive, __a, __b, __p)
 #define vbicq_m_s16(__inactive, __a, __b, __p) __arm_vbicq_m_s16(__inactive, __a, __b, __p)
@@ -1692,12 +1664,6 @@
 #define vcaddq_rot90_m_u8(__inactive, __a, __b, __p) __arm_vcaddq_rot90_m_u8(__inactive, __a, __b, __p)
 #define vcaddq_rot90_m_u32(__inactive, __a, __b, __p) __arm_vcaddq_rot90_m_u32(__inactive, __a, __b, __p)
 #define vcaddq_rot90_m_u16(__inactive, __a, __b, __p) __arm_vcaddq_rot90_m_u16(__inactive, __a, __b, __p)
-#define veorq_m_s8(__inactive, __a, __b, __p) __arm_veorq_m_s8(__inactive, __a, __b, __p)
-#define veorq_m_s32(__inactive, __a, __b, __p) __arm_veorq_m_s32(__inactive, __a, __b, __p)
-#define veorq_m_s16(__inactive, __a, __b, __p) __arm_veorq_m_s16(__inactive, __a, __b, __p)
-#define veorq_m_u8(__inactive, __a, __b, __p) __arm_veorq_m_u8(__inactive, __a, __b, __p)
-#define veorq_m_u32(__inactive, __a, __b, __p) __arm_veorq_m_u32(__inactive, __a, __b, __p)
-#define veorq_m_u16(__inactive, __a, __b, __p) __arm_veorq_m_u16(__inactive, __a, __b, __p)
 #define vhaddq_m_n_s8(__inactive, __a, __b, __p) __arm_vhaddq_m_n_s8(__inactive, __a, __b, __p)
 #define vhaddq_m_n_s32(__inactive, __a, __b, __p) __arm_vhaddq_m_n_s32(__inactive, __a, __b, __p)
 #define vhaddq_m_n_s16(__inactive, __a, __b, __p) __arm_vhaddq_m_n_s16(__inactive, __a, __b, __p)
@@ -2006,8 +1972,6 @@
 #define vshrntq_m_n_u16(__a, __b,  __imm, __p) __arm_vshrntq_m_n_u16(__a, __b,  __imm, __p)
 #define vabdq_m_f32(__inactive, __a, __b, __p) __arm_vabdq_m_f32(__inactive, __a, __b, __p)
 #define vabdq_m_f16(__inactive, __a, __b, __p) __arm_vabdq_m_f16(__inactive, __a, __b, __p)
-#define vandq_m_f32(__inactive, __a, __b, __p) __arm_vandq_m_f32(__inactive, __a, __b, __p)
-#define vandq_m_f16(__inactive, __a, __b, __p) __arm_vandq_m_f16(__inactive, __a, __b, __p)
 #define vbicq_m_f32(__inactive, __a, __b, __p) __arm_vbicq_m_f32(__inactive, __a, __b, __p)
 #define vbicq_m_f16(__inactive, __a, __b, __p) __arm_vbicq_m_f16(__inactive, __a, __b, __p)
 #define vbrsrq_m_n_f32(__inactive, __a, __b, __p) __arm_vbrsrq_m_n_f32(__inactive, __a, __b, __p)
@@ -2036,8 +2000,6 @@
 #define vcvtq_m_n_s16_f16(__inactive, __a,  __imm6, __p) __arm_vcvtq_m_n_s16_f16(__inactive, __a,  __imm6, __p)
 #define vcvtq_m_n_u32_f32(__inactive, __a,  __imm6, __p) __arm_vcvtq_m_n_u32_f32(__inactive, __a,  __imm6, __p)
 #define vcvtq_m_n_u16_f16(__inactive, __a,  __imm6, __p) __arm_vcvtq_m_n_u16_f16(__inactive, __a,  __imm6, __p)
-#define veorq_m_f32(__inactive, __a, __b, __p) __arm_veorq_m_f32(__inactive, __a, __b, __p)
-#define veorq_m_f16(__inactive, __a, __b, __p) __arm_veorq_m_f16(__inactive, __a, __b, __p)
 #define vfmaq_m_f32(__a, __b, __c, __p) __arm_vfmaq_m_f32(__a, __b, __c, __p)
 #define vfmaq_m_f16(__a, __b, __c, __p) __arm_vfmaq_m_f16(__a, __b, __c, __p)
 #define vfmaq_m_n_f32(__a, __b, __c, __p) __arm_vfmaq_m_n_f32(__a, __b, __c, __p)
@@ -2467,12 +2429,6 @@
 #define vrmulhq_x_u8(__a, __b, __p) __arm_vrmulhq_x_u8(__a, __b, __p)
 #define vrmulhq_x_u16(__a, __b, __p) __arm_vrmulhq_x_u16(__a, __b, __p)
 #define vrmulhq_x_u32(__a, __b, __p) __arm_vrmulhq_x_u32(__a, __b, __p)
-#define vandq_x_s8(__a, __b, __p) __arm_vandq_x_s8(__a, __b, __p)
-#define vandq_x_s16(__a, __b, __p) __arm_vandq_x_s16(__a, __b, __p)
-#define vandq_x_s32(__a, __b, __p) __arm_vandq_x_s32(__a, __b, __p)
-#define vandq_x_u8(__a, __b, __p) __arm_vandq_x_u8(__a, __b, __p)
-#define vandq_x_u16(__a, __b, __p) __arm_vandq_x_u16(__a, __b, __p)
-#define vandq_x_u32(__a, __b, __p) __arm_vandq_x_u32(__a, __b, __p)
 #define vbicq_x_s8(__a, __b, __p) __arm_vbicq_x_s8(__a, __b, __p)
 #define vbicq_x_s16(__a, __b, __p) __arm_vbicq_x_s16(__a, __b, __p)
 #define vbicq_x_s32(__a, __b, __p) __arm_vbicq_x_s32(__a, __b, __p)
@@ -2485,12 +2441,6 @@
 #define vbrsrq_x_n_u8(__a, __b, __p) __arm_vbrsrq_x_n_u8(__a, __b, __p)
 #define vbrsrq_x_n_u16(__a, __b, __p) __arm_vbrsrq_x_n_u16(__a, __b, __p)
 #define vbrsrq_x_n_u32(__a, __b, __p) __arm_vbrsrq_x_n_u32(__a, __b, __p)
-#define veorq_x_s8(__a, __b, __p) __arm_veorq_x_s8(__a, __b, __p)
-#define veorq_x_s16(__a, __b, __p) __arm_veorq_x_s16(__a, __b, __p)
-#define veorq_x_s32(__a, __b, __p) __arm_veorq_x_s32(__a, __b, __p)
-#define veorq_x_u8(__a, __b, __p) __arm_veorq_x_u8(__a, __b, __p)
-#define veorq_x_u16(__a, __b, __p) __arm_veorq_x_u16(__a, __b, __p)
-#define veorq_x_u32(__a, __b, __p) __arm_veorq_x_u32(__a, __b, __p)
 #define vmovlbq_x_s8(__a, __p) __arm_vmovlbq_x_s8(__a, __p)
 #define vmovlbq_x_s16(__a, __p) __arm_vmovlbq_x_s16(__a, __p)
 #define vmovlbq_x_u8(__a, __p) __arm_vmovlbq_x_u8(__a, __p)
@@ -2641,14 +2591,10 @@
 #define vrndaq_x_f32(__a, __p) __arm_vrndaq_x_f32(__a, __p)
 #define vrndxq_x_f16(__a, __p) __arm_vrndxq_x_f16(__a, __p)
 #define vrndxq_x_f32(__a, __p) __arm_vrndxq_x_f32(__a, __p)
-#define vandq_x_f16(__a, __b, __p) __arm_vandq_x_f16(__a, __b, __p)
-#define vandq_x_f32(__a, __b, __p) __arm_vandq_x_f32(__a, __b, __p)
 #define vbicq_x_f16(__a, __b, __p) __arm_vbicq_x_f16(__a, __b, __p)
 #define vbicq_x_f32(__a, __b, __p) __arm_vbicq_x_f32(__a, __b, __p)
 #define vbrsrq_x_n_f16(__a, __b, __p) __arm_vbrsrq_x_n_f16(__a, __b, __p)
 #define vbrsrq_x_n_f32(__a, __b, __p) __arm_vbrsrq_x_n_f32(__a, __b, __p)
-#define veorq_x_f16(__a, __b, __p) __arm_veorq_x_f16(__a, __b, __p)
-#define veorq_x_f32(__a, __b, __p) __arm_veorq_x_f32(__a, __b, __p)
 #define vornq_x_f16(__a, __b, __p) __arm_vornq_x_f16(__a, __b, __p)
 #define vornq_x_f32(__a, __b, __p) __arm_vornq_x_f32(__a, __b, __p)
 #define vorrq_x_f16(__a, __b, __p) __arm_vorrq_x_f16(__a, __b, __p)
@@ -3647,13 +3593,6 @@ __arm_vhaddq_n_u8 (uint8x16_t __a, uint8_t __b)
   return __builtin_mve_vhaddq_n_uv16qi (__a, __b);
 }
 
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_u8 (uint8x16_t __a, uint8x16_t __b)
-{
-  return __builtin_mve_veorq_uv16qi (__a, __b);
-}
-
 __extension__ extern __inline mve_pred16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcmpneq_n_u8 (uint8x16_t __a, uint8_t __b)
@@ -3726,13 +3665,6 @@ __arm_vbicq_u8 (uint8x16_t __a, uint8x16_t __b)
   return __builtin_mve_vbicq_uv16qi (__a, __b);
 }
 
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_u8 (uint8x16_t __a, uint8x16_t __b)
-{
-  return __builtin_mve_vandq_uv16qi (__a, __b);
-}
-
 __extension__ extern __inline uint32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vaddvq_p_u8 (uint8x16_t __a, mve_pred16_t __p)
@@ -4202,13 +4134,6 @@ __arm_vhaddq_n_s8 (int8x16_t __a, int8_t __b)
   return __builtin_mve_vhaddq_n_sv16qi (__a, __b);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_s8 (int8x16_t __a, int8x16_t __b)
-{
-  return __builtin_mve_veorq_sv16qi (__a, __b);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcaddq_rot90_s8 (int8x16_t __a, int8x16_t __b)
@@ -4237,13 +4162,6 @@ __arm_vbicq_s8 (int8x16_t __a, int8x16_t __b)
   return __builtin_mve_vbicq_sv16qi (__a, __b);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_s8 (int8x16_t __a, int8x16_t __b)
-{
-  return __builtin_mve_vandq_sv16qi (__a, __b);
-}
-
 __extension__ extern __inline int32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vaddvaq_s8 (int32_t __a, int8x16_t __b)
@@ -4419,13 +4337,6 @@ __arm_vhaddq_n_u16 (uint16x8_t __a, uint16_t __b)
   return __builtin_mve_vhaddq_n_uv8hi (__a, __b);
 }
 
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_u16 (uint16x8_t __a, uint16x8_t __b)
-{
-  return __builtin_mve_veorq_uv8hi (__a, __b);
-}
-
 __extension__ extern __inline mve_pred16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcmpneq_n_u16 (uint16x8_t __a, uint16_t __b)
@@ -4498,13 +4409,6 @@ __arm_vbicq_u16 (uint16x8_t __a, uint16x8_t __b)
   return __builtin_mve_vbicq_uv8hi (__a, __b);
 }
 
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_u16 (uint16x8_t __a, uint16x8_t __b)
-{
-  return __builtin_mve_vandq_uv8hi (__a, __b);
-}
-
 __extension__ extern __inline uint32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vaddvq_p_u16 (uint16x8_t __a, mve_pred16_t __p)
@@ -4974,13 +4878,6 @@ __arm_vhaddq_n_s16 (int16x8_t __a, int16_t __b)
   return __builtin_mve_vhaddq_n_sv8hi (__a, __b);
 }
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_s16 (int16x8_t __a, int16x8_t __b)
-{
-  return __builtin_mve_veorq_sv8hi (__a, __b);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcaddq_rot90_s16 (int16x8_t __a, int16x8_t __b)
@@ -5009,13 +4906,6 @@ __arm_vbicq_s16 (int16x8_t __a, int16x8_t __b)
   return __builtin_mve_vbicq_sv8hi (__a, __b);
 }
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_s16 (int16x8_t __a, int16x8_t __b)
-{
-  return __builtin_mve_vandq_sv8hi (__a, __b);
-}
-
 __extension__ extern __inline int32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vaddvaq_s16 (int32_t __a, int16x8_t __b)
@@ -5191,13 +5081,6 @@ __arm_vhaddq_n_u32 (uint32x4_t __a, uint32_t __b)
   return __builtin_mve_vhaddq_n_uv4si (__a, __b);
 }
 
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_u32 (uint32x4_t __a, uint32x4_t __b)
-{
-  return __builtin_mve_veorq_uv4si (__a, __b);
-}
-
 __extension__ extern __inline mve_pred16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcmpneq_n_u32 (uint32x4_t __a, uint32_t __b)
@@ -5270,13 +5153,6 @@ __arm_vbicq_u32 (uint32x4_t __a, uint32x4_t __b)
   return __builtin_mve_vbicq_uv4si (__a, __b);
 }
 
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_u32 (uint32x4_t __a, uint32x4_t __b)
-{
-  return __builtin_mve_vandq_uv4si (__a, __b);
-}
-
 __extension__ extern __inline uint32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vaddvq_p_u32 (uint32x4_t __a, mve_pred16_t __p)
@@ -5746,13 +5622,6 @@ __arm_vhaddq_n_s32 (int32x4_t __a, int32_t __b)
   return __builtin_mve_vhaddq_n_sv4si (__a, __b);
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_s32 (int32x4_t __a, int32x4_t __b)
-{
-  return __builtin_mve_veorq_sv4si (__a, __b);
-}
-
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcaddq_rot90_s32 (int32x4_t __a, int32x4_t __b)
@@ -5781,13 +5650,6 @@ __arm_vbicq_s32 (int32x4_t __a, int32x4_t __b)
   return __builtin_mve_vbicq_sv4si (__a, __b);
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_s32 (int32x4_t __a, int32x4_t __b)
-{
-  return __builtin_mve_vandq_sv4si (__a, __b);
-}
-
 __extension__ extern __inline int32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vaddvaq_s32 (int32_t __a, int32x4_t __b)
@@ -9175,48 +9037,6 @@ __arm_vabdq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pr
   return __builtin_mve_vabdq_m_uv8hi (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vandq_m_sv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vandq_m_sv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vandq_m_sv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_m_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vandq_m_uv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vandq_m_uv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vandq_m_uv8hi (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
@@ -9385,48 +9205,6 @@ __arm_vcaddq_rot90_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b,
   return __builtin_mve_vcaddq_rot90_m_uv8hi (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_veorq_m_sv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_veorq_m_sv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_veorq_m_sv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_m_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_veorq_m_uv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_veorq_m_uv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_veorq_m_uv8hi (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vhaddq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
@@ -14285,48 +14063,6 @@ __arm_vrmulhq_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
   return __builtin_mve_vrmulhq_m_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vandq_m_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_x_s16 (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vandq_m_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_x_s32 (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vandq_m_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_x_u8 (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vandq_m_uv16qi (__arm_vuninitializedq_u8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_x_u16 (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vandq_m_uv8hi (__arm_vuninitializedq_u16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vandq_m_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
@@ -14411,48 +14147,6 @@ __arm_vbrsrq_x_n_u32 (uint32x4_t __a, int32_t __b, mve_pred16_t __p)
   return __builtin_mve_vbrsrq_m_n_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_veorq_m_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_x_s16 (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_veorq_m_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_x_s32 (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_veorq_m_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_x_u8 (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_veorq_m_uv16qi (__arm_vuninitializedq_u8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_x_u16 (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_veorq_m_uv8hi (__arm_vuninitializedq_u16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_veorq_m_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmovlbq_x_s8 (int8x16_t __a, mve_pred16_t __p)
@@ -16300,13 +15994,6 @@ __arm_vmaxnmaq_f16 (float16x8_t __a, float16x8_t __b)
   return __builtin_mve_vmaxnmaq_fv8hf (__a, __b);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_f16 (float16x8_t __a, float16x8_t __b)
-{
-  return __builtin_mve_veorq_fv8hf (__a, __b);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcmulq_rot90_f16 (float16x8_t __a, float16x8_t __b)
@@ -16356,13 +16043,6 @@ __arm_vbicq_f16 (float16x8_t __a, float16x8_t __b)
   return __builtin_mve_vbicq_fv8hf (__a, __b);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_f16 (float16x8_t __a, float16x8_t __b)
-{
-  return __builtin_mve_vandq_fv8hf (__a, __b);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vabdq_f16 (float16x8_t __a, float16x8_t __b)
@@ -16524,13 +16204,6 @@ __arm_vmaxnmaq_f32 (float32x4_t __a, float32x4_t __b)
   return __builtin_mve_vmaxnmaq_fv4sf (__a, __b);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_f32 (float32x4_t __a, float32x4_t __b)
-{
-  return __builtin_mve_veorq_fv4sf (__a, __b);
-}
-
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcmulq_rot90_f32 (float32x4_t __a, float32x4_t __b)
@@ -16580,13 +16253,6 @@ __arm_vbicq_f32 (float32x4_t __a, float32x4_t __b)
   return __builtin_mve_vbicq_fv4sf (__a, __b);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_f32 (float32x4_t __a, float32x4_t __b)
-{
-  return __builtin_mve_vandq_fv4sf (__a, __b);
-}
-
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vabdq_f32 (float32x4_t __a, float32x4_t __b)
@@ -17372,20 +17038,6 @@ __arm_vabdq_m_f16 (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve
   return __builtin_mve_vabdq_m_fv8hf (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_m_f32 (float32x4_t __inactive, float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vandq_m_fv4sf (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_m_f16 (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vandq_m_fv8hf (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq_m_f32 (float32x4_t __inactive, float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
@@ -17582,20 +17234,6 @@ __arm_vcvtq_m_n_u16_f16 (uint16x8_t __inactive, float16x8_t __a, const int __imm
   return __builtin_mve_vcvtq_m_n_from_f_uv8hi (__inactive, __a, __imm6, __p);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_m_f32 (float32x4_t __inactive, float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_veorq_m_fv4sf (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_m_f16 (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_veorq_m_fv8hf (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vfmaq_m_f32 (float32x4_t __a, float32x4_t __b, float32x4_t __c, mve_pred16_t __p)
@@ -18456,20 +18094,6 @@ __arm_vrndxq_x_f32 (float32x4_t __a, mve_pred16_t __p)
   return __builtin_mve_vrndxq_m_fv4sf (__arm_vuninitializedq_f32 (), __a, __p);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_x_f16 (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vandq_m_fv8hf (__arm_vuninitializedq_f16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_x_f32 (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vandq_m_fv4sf (__arm_vuninitializedq_f32 (), __a, __b, __p);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq_x_f16 (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
@@ -18498,20 +18122,6 @@ __arm_vbrsrq_x_n_f32 (float32x4_t __a, int32_t __b, mve_pred16_t __p)
   return __builtin_mve_vbrsrq_m_n_fv4sf (__arm_vuninitializedq_f32 (), __a, __b, __p);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_x_f16 (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_veorq_m_fv8hf (__arm_vuninitializedq_f16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_x_f32 (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_veorq_m_fv4sf (__arm_vuninitializedq_f32 (), __a, __b, __p);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq_x_f16 (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
@@ -19428,13 +19038,6 @@ __arm_vhaddq (uint8x16_t __a, uint8_t __b)
  return __arm_vhaddq_n_u8 (__a, __b);
 }
 
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq (uint8x16_t __a, uint8x16_t __b)
-{
- return __arm_veorq_u8 (__a, __b);
-}
-
 __extension__ extern __inline mve_pred16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcmpneq (uint8x16_t __a, uint8_t __b)
@@ -19505,13 +19108,6 @@ __arm_vbicq (uint8x16_t __a, uint8x16_t __b)
  return __arm_vbicq_u8 (__a, __b);
 }
 
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq (uint8x16_t __a, uint8x16_t __b)
-{
- return __arm_vandq_u8 (__a, __b);
-}
-
 __extension__ extern __inline uint32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vaddvq_p (uint8x16_t __a, mve_pred16_t __p)
@@ -19981,13 +19577,6 @@ __arm_vhaddq (int8x16_t __a, int8_t __b)
  return __arm_vhaddq_n_s8 (__a, __b);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq (int8x16_t __a, int8x16_t __b)
-{
- return __arm_veorq_s8 (__a, __b);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcaddq_rot90 (int8x16_t __a, int8x16_t __b)
@@ -20016,13 +19605,6 @@ __arm_vbicq (int8x16_t __a, int8x16_t __b)
  return __arm_vbicq_s8 (__a, __b);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq (int8x16_t __a, int8x16_t __b)
-{
- return __arm_vandq_s8 (__a, __b);
-}
-
 __extension__ extern __inline int32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vaddvaq (int32_t __a, int8x16_t __b)
@@ -20198,13 +19780,6 @@ __arm_vhaddq (uint16x8_t __a, uint16_t __b)
  return __arm_vhaddq_n_u16 (__a, __b);
 }
 
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq (uint16x8_t __a, uint16x8_t __b)
-{
- return __arm_veorq_u16 (__a, __b);
-}
-
 __extension__ extern __inline mve_pred16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcmpneq (uint16x8_t __a, uint16_t __b)
@@ -20275,13 +19850,6 @@ __arm_vbicq (uint16x8_t __a, uint16x8_t __b)
  return __arm_vbicq_u16 (__a, __b);
 }
 
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq (uint16x8_t __a, uint16x8_t __b)
-{
- return __arm_vandq_u16 (__a, __b);
-}
-
 __extension__ extern __inline uint32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vaddvq_p (uint16x8_t __a, mve_pred16_t __p)
@@ -20751,13 +20319,6 @@ __arm_vhaddq (int16x8_t __a, int16_t __b)
  return __arm_vhaddq_n_s16 (__a, __b);
 }
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq (int16x8_t __a, int16x8_t __b)
-{
- return __arm_veorq_s16 (__a, __b);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcaddq_rot90 (int16x8_t __a, int16x8_t __b)
@@ -20786,13 +20347,6 @@ __arm_vbicq (int16x8_t __a, int16x8_t __b)
  return __arm_vbicq_s16 (__a, __b);
 }
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq (int16x8_t __a, int16x8_t __b)
-{
- return __arm_vandq_s16 (__a, __b);
-}
-
 __extension__ extern __inline int32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vaddvaq (int32_t __a, int16x8_t __b)
@@ -20968,13 +20522,6 @@ __arm_vhaddq (uint32x4_t __a, uint32_t __b)
  return __arm_vhaddq_n_u32 (__a, __b);
 }
 
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq (uint32x4_t __a, uint32x4_t __b)
-{
- return __arm_veorq_u32 (__a, __b);
-}
-
 __extension__ extern __inline mve_pred16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcmpneq (uint32x4_t __a, uint32_t __b)
@@ -21045,13 +20592,6 @@ __arm_vbicq (uint32x4_t __a, uint32x4_t __b)
  return __arm_vbicq_u32 (__a, __b);
 }
 
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq (uint32x4_t __a, uint32x4_t __b)
-{
- return __arm_vandq_u32 (__a, __b);
-}
-
 __extension__ extern __inline uint32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vaddvq_p (uint32x4_t __a, mve_pred16_t __p)
@@ -21521,13 +21061,6 @@ __arm_vhaddq (int32x4_t __a, int32_t __b)
  return __arm_vhaddq_n_s32 (__a, __b);
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq (int32x4_t __a, int32x4_t __b)
-{
- return __arm_veorq_s32 (__a, __b);
-}
-
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcaddq_rot90 (int32x4_t __a, int32x4_t __b)
@@ -21556,13 +21089,6 @@ __arm_vbicq (int32x4_t __a, int32x4_t __b)
  return __arm_vbicq_s32 (__a, __b);
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq (int32x4_t __a, int32x4_t __b)
-{
- return __arm_vandq_s32 (__a, __b);
-}
-
 __extension__ extern __inline int32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vaddvaq (int32_t __a, int32x4_t __b)
@@ -24909,48 +24435,6 @@ __arm_vabdq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16
  return __arm_vabdq_m_u16 (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vandq_m_s8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vandq_m_s32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vandq_m_s16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_m (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vandq_m_u8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_m (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vandq_m_u32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vandq_m_u16 (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
@@ -25119,48 +24603,6 @@ __arm_vcaddq_rot90_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve
  return __arm_vcaddq_rot90_m_u16 (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_veorq_m_s8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_veorq_m_s32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_veorq_m_s16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_m (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
- return __arm_veorq_m_u8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_m (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
- return __arm_veorq_m_u32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
- return __arm_veorq_m_u16 (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vhaddq_m (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
@@ -29550,48 +28992,6 @@ __arm_vrmulhq_x (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
  return __arm_vrmulhq_x_u32 (__a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vandq_x_s8 (__a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_x (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vandq_x_s16 (__a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_x (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vandq_x_s32 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_x (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vandq_x_u8 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_x (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vandq_x_u16 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_x (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vandq_x_u32 (__a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
@@ -29676,48 +29076,6 @@ __arm_vbrsrq_x (uint32x4_t __a, int32_t __b, mve_pred16_t __p)
  return __arm_vbrsrq_x_n_u32 (__a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_veorq_x_s8 (__a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_x (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_veorq_x_s16 (__a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_x (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_veorq_x_s32 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_x (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
- return __arm_veorq_x_u8 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_x (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
- return __arm_veorq_x_u16 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_x (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
- return __arm_veorq_x_u32 (__a, __b, __p);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmovlbq_x (int8x16_t __a, mve_pred16_t __p)
@@ -31127,13 +30485,6 @@ __arm_vmaxnmaq (float16x8_t __a, float16x8_t __b)
  return __arm_vmaxnmaq_f16 (__a, __b);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq (float16x8_t __a, float16x8_t __b)
-{
- return __arm_veorq_f16 (__a, __b);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcmulq_rot90 (float16x8_t __a, float16x8_t __b)
@@ -31183,13 +30534,6 @@ __arm_vbicq (float16x8_t __a, float16x8_t __b)
  return __arm_vbicq_f16 (__a, __b);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq (float16x8_t __a, float16x8_t __b)
-{
- return __arm_vandq_f16 (__a, __b);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vabdq (float16x8_t __a, float16x8_t __b)
@@ -31351,13 +30695,6 @@ __arm_vmaxnmaq (float32x4_t __a, float32x4_t __b)
  return __arm_vmaxnmaq_f32 (__a, __b);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq (float32x4_t __a, float32x4_t __b)
-{
- return __arm_veorq_f32 (__a, __b);
-}
-
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcmulq_rot90 (float32x4_t __a, float32x4_t __b)
@@ -31407,13 +30744,6 @@ __arm_vbicq (float32x4_t __a, float32x4_t __b)
  return __arm_vbicq_f32 (__a, __b);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq (float32x4_t __a, float32x4_t __b)
-{
- return __arm_vandq_f32 (__a, __b);
-}
-
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vabdq (float32x4_t __a, float32x4_t __b)
@@ -32184,20 +31514,6 @@ __arm_vabdq_m (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pre
  return __arm_vabdq_m_f16 (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_m (float32x4_t __inactive, float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vandq_m_f32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_m (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vandq_m_f16 (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq_m (float32x4_t __inactive, float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
@@ -32394,20 +31710,6 @@ __arm_vcvtq_m_n (uint16x8_t __inactive, float16x8_t __a, const int __imm6, mve_p
  return __arm_vcvtq_m_n_u16_f16 (__inactive, __a, __imm6, __p);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_m (float32x4_t __inactive, float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
- return __arm_veorq_m_f32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_m (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
- return __arm_veorq_m_f16 (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vfmaq_m (float32x4_t __a, float32x4_t __b, float32x4_t __c, mve_pred16_t __p)
@@ -33010,20 +32312,6 @@ __arm_vrndxq_x (float32x4_t __a, mve_pred16_t __p)
  return __arm_vrndxq_x_f32 (__a, __p);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_x (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vandq_x_f16 (__a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vandq_x (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vandq_x_f32 (__a, __b, __p);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq_x (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
@@ -33052,20 +32340,6 @@ __arm_vbrsrq_x (float32x4_t __a, int32_t __b, mve_pred16_t __p)
  return __arm_vbrsrq_x_n_f32 (__a, __b, __p);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_x (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
- return __arm_veorq_x_f16 (__a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_veorq_x (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
- return __arm_veorq_x_f32 (__a, __b, __p);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq_x (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
@@ -33678,18 +32952,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vabdq_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t)), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vabdq_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t)));})
 
-#define __arm_vandq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vandq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vandq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vandq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vandq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vandq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vandq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vandq_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vandq_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t)));})
-
 #define __arm_vbicq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
@@ -33868,18 +33130,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmulq_rot90_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t)), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmulq_rot90_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t)));})
 
-#define __arm_veorq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_veorq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_veorq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_veorq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_veorq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_veorq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_veorq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_veorq_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_veorq_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t)));})
-
 #define __arm_vmaxnmaq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
@@ -35060,19 +34310,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vabdq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vabdq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
 
-#define __arm_vandq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vandq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vandq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vandq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vandq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vandq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vandq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vandq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vandq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
-
 #define __arm_vbicq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
@@ -35180,19 +34417,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmulq_rot90_m_f16(__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmulq_rot90_m_f32(__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
 
-#define __arm_veorq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_veorq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_veorq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_veorq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_veorq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_veorq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_veorq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_veorq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_veorq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
-
 #define __arm_vfmaq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
@@ -35588,18 +34812,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t]: __arm_vabsq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), p2), \
   int (*)[__ARM_mve_type_float32x4_t]: __arm_vabsq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), p2));})
 
-#define __arm_vandq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vandq_x_s8  (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vandq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vandq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vandq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vandq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vandq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vandq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vandq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
-
 #define __arm_vbicq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
@@ -35679,18 +34891,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t]: __arm_vcvtq_x_n_f16_u16 (__ARM_mve_coerce(__p1, uint16x8_t), p2, p3), \
   int (*)[__ARM_mve_type_uint32x4_t]: __arm_vcvtq_x_n_f32_u32 (__ARM_mve_coerce(__p1, uint32x4_t), p2, p3));})
 
-#define __arm_veorq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_veorq_x_s8(__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_veorq_x_s16(__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_veorq_x_s32(__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_veorq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_veorq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_veorq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_veorq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_veorq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
-
 #define __arm_vmaxnmq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
@@ -36251,16 +35451,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vhaddq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vhaddq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
 
-#define __arm_veorq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_veorq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_veorq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_veorq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_veorq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_veorq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_veorq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
-
 #define __arm_vcaddq_rot90(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
@@ -36304,16 +35494,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vbicq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vbicq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
 
-#define __arm_vandq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vandq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vandq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vandq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vandq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vandq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vandq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
-
 #define __arm_vabdq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
@@ -36998,17 +36178,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vabdq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vabdq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
 
-#define __arm_vandq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vandq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vandq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vandq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vandq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vandq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vandq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
-
 #define __arm_vbicq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
@@ -37053,17 +36222,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcaddq_rot90_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcaddq_rot90_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
 
-#define __arm_veorq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_veorq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_veorq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_veorq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_veorq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_veorq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_veorq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
-
 #define __arm_vmladavaq_p(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
@@ -37360,16 +36518,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcaddq_rot90_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcaddq_rot90_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
 
-#define __arm_veorq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_veorq_x_s8(__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_veorq_x_s16(__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_veorq_x_s32(__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_veorq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_veorq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_veorq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
-
 #define __arm_vmovlbq_x(p1,p2) ({ __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p1)])0, \
   int (*)[__ARM_mve_type_int8x16_t]: __arm_vmovlbq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), p2), \
@@ -37478,16 +36626,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vabdq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vabdq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
 
-#define __arm_vandq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vandq_x_s8  (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vandq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vandq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vandq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vandq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vandq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
-
 #define __arm_vbicq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-- 
2.34.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 12/22] arm: [MVE intrinsics] add binary_orrq shape
  2023-04-18 13:45 [PATCH 00/22] arm: New framework for MVE intrinsics Christophe Lyon
                   ` (10 preceding siblings ...)
  2023-04-18 13:45 ` [PATCH 11/22] arm: [MVE intrinsics] rework vandq veorq Christophe Lyon
@ 2023-04-18 13:45 ` Christophe Lyon
  2023-05-02 16:39   ` Kyrylo Tkachov
  2023-04-18 13:45 ` [PATCH 13/22] arm: [MVE intrinsics] rework vorrq Christophe Lyon
                   ` (10 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Christophe Lyon @ 2023-04-18 13:45 UTC (permalink / raw)
  To: gcc-patches, kyrylo.tkachov, richard.earnshaw, richard.sandiford
  Cc: Christophe Lyon

This patch adds the binary_orrq shape description.

MODE_n intrinsics use a set of predicates (preds_m_or_none) different
from the one used by MODE_none intrinsics, so we explicitly reference
preds_m_or_none from the shape; this requires making it a global array.
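
As an illustration (a minimal sketch; the function name f is purely
illustrative), the shape covers both the vector/vector form and the _n
immediate form, with the _m-predicated _n variant keeping its explicit
mode suffix:

  #include <arm_mve.h>

  int16x8_t
  f (int16x8_t a, int16x8_t b, mve_pred16_t p)
  {
    int16x8_t v = vorrq (a, b);    /* MODE_none, unpredicated.  */
    /* MODE_n with PRED_m keeps an explicit mode suffix
       (explicit_mode_suffix_p), so it does not share an overloaded
       instance with MODE_none.  */
    return vorrq_m_n_s16 (v, 0xff, p);
  }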

2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>

	gcc/
	* config/arm/arm-mve-builtins-shapes.cc (binary_orrq): New.
	* config/arm/arm-mve-builtins-shapes.h (binary_orrq): New.
	* config/arm/arm-mve-builtins.cc (preds_m_or_none): Remove static.
	* config/arm/arm-mve-builtins.h (preds_m_or_none): Declare.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 61 +++++++++++++++++++++++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
 gcc/config/arm/arm-mve-builtins.cc        |  2 +-
 gcc/config/arm/arm-mve-builtins.h         |  3 ++
 4 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
index e69faae4e2c..83410bbc51a 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -397,6 +397,67 @@ struct binary_opt_n_def : public overloaded_base<0>
 };
 SHAPE (binary_opt_n)
 
+/* <T0>_t vfoo[t0](<T0>_t, <T0>_t)
+   <T0>_t vfoo[_n_t0](<T0>_t, <S0>_t)
+
+   Where the _n form only supports s16/s32/u16/u32 types, as for vorrq.
+
+   Example: vorrq.
+   int16x8_t [__arm_]vorrq[_s16](int16x8_t a, int16x8_t b)
+   int16x8_t [__arm_]vorrq_m[_s16](int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
+   int16x8_t [__arm_]vorrq_x[_s16](int16x8_t a, int16x8_t b, mve_pred16_t p)
+   int16x8_t [__arm_]vorrq[_n_s16](int16x8_t a, const int16_t imm)
+   int16x8_t [__arm_]vorrq_m_n[_s16](int16x8_t a, const int16_t imm, mve_pred16_t p)  */
+struct binary_orrq_def : public overloaded_base<0>
+{
+  bool
+  explicit_mode_suffix_p (enum predication_index pred, enum mode_suffix_index mode) const override
+  {
+    return (mode == MODE_n
+	    && pred == PRED_m);
+  }
+
+  bool
+  skip_overload_p (enum predication_index pred, enum mode_suffix_index mode) const override
+  {
+    switch (mode)
+      {
+      case MODE_none:
+	return false;
+
+	/* For MODE_n, share the overloaded instance with MODE_none, except for PRED_m.  */
+      case MODE_n:
+	return pred != PRED_m;
+
+      default:
+	gcc_unreachable ();
+      }
+  }
+
+  void
+  build (function_builder &b, const function_group_info &group,
+	 bool preserve_user_namespace) const override
+  {
+    b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
+    b.add_overloaded_functions (group, MODE_n, preserve_user_namespace);
+    build_all (b, "v0,v0,v0", group, MODE_none, preserve_user_namespace);
+    build_16_32 (b, "v0,v0,s0", group, MODE_n, preserve_user_namespace, false, preds_m_or_none);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+    unsigned int i, nargs;
+    type_suffix_index type;
+    if (!r.check_gp_argument (2, i, nargs)
+	|| (type = r.infer_vector_type (0)) == NUM_TYPE_SUFFIXES)
+      return error_mark_node;
+
+    return r.finish_opt_n_resolution (i, 0, type);
+  }
+};
+SHAPE (binary_orrq)
+
 /* <T0>[xN]_t vfoo_t0().
 
    Example: vuninitializedq.
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
index b00ee5eb57a..618b3226050 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -36,6 +36,7 @@ namespace arm_mve
 
     extern const function_shape *const binary;
     extern const function_shape *const binary_opt_n;
+    extern const function_shape *const binary_orrq;
     extern const function_shape *const inherent;
     extern const function_shape *const unary_convert;
 
diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc
index e409a029346..c74e890bd3d 100644
--- a/gcc/config/arm/arm-mve-builtins.cc
+++ b/gcc/config/arm/arm-mve-builtins.cc
@@ -285,7 +285,7 @@ static const predication_index preds_none[] = { PRED_none, NUM_PREDS };
 
 /* Used by functions that have the m (merging) predicated form, and in
    addition have an unpredicated form.  */
-static const predication_index preds_m_or_none[] = {
+const predication_index preds_m_or_none[] = {
   PRED_m, PRED_none, NUM_PREDS
 };
 
diff --git a/gcc/config/arm/arm-mve-builtins.h b/gcc/config/arm/arm-mve-builtins.h
index a20d2fb5d86..c9b51a0c77b 100644
--- a/gcc/config/arm/arm-mve-builtins.h
+++ b/gcc/config/arm/arm-mve-builtins.h
@@ -135,6 +135,9 @@ enum predication_index
   NUM_PREDS
 };
 
+/* Some shapes need access to some predicate sets.  */
+extern const predication_index preds_m_or_none[];
+
 /* Classifies element types, based on type suffixes with the bit count
    removed.  */
 enum type_class_index
-- 
2.34.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 13/22] arm: [MVE intrinsics] rework vorrq
  2023-04-18 13:45 [PATCH 00/22] arm: New framework for MVE intrinsics Christophe Lyon
                   ` (11 preceding siblings ...)
  2023-04-18 13:45 ` [PATCH 12/22] arm: [MVE intrinsics] add binary_orrq shape Christophe Lyon
@ 2023-04-18 13:45 ` Christophe Lyon
  2023-05-02 16:41   ` Kyrylo Tkachov
  2023-04-18 13:46 ` [PATCH 14/22] arm: [MVE intrinsics] add unspec_mve_function_exact_insn Christophe Lyon
                   ` (9 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Christophe Lyon @ 2023-04-18 13:45 UTC (permalink / raw)
  To: gcc-patches, kyrylo.tkachov, richard.earnshaw, richard.sandiford
  Cc: Christophe Lyon

Implement vorrq using the new MVE builtins framework.
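
User-visible behavior is unchanged: the overloaded forms previously
dispatched via the _Generic macros in arm_mve.h are now resolved by the
framework.  A minimal sketch (illustrative function name only):

  #include <arm_mve.h>

  uint32x4_t
  g (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
  {
    /* Plain, merging-predicated and "don't care" variants, resolved
       to vorrq_u32, vorrq_m_u32 and vorrq_x_u32 respectively.  */
    uint32x4_t v = vorrq (a, b);
    v = vorrq_m (v, a, b, p);
    return vorrq_x (v, b, p);
  }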

2022-09-08  Christophe Lyon <christophe.lyon@arm.com>

	gcc/
	* config/arm/arm-mve-builtins-base.cc (FUNCTION_WITH_RTX_M_N_NO_N_F): New.
	(vorrq): New.
	* config/arm/arm-mve-builtins-base.def (vorrq): New.
	* config/arm/arm-mve-builtins-base.h (vorrq): New.
	* config/arm/arm-mve-builtins.cc
	(function_instance::has_inactive_argument): Handle vorrq.
	* config/arm/arm_mve.h (vorrq): Remove.
	(vorrq_m_n): Remove.
	(vorrq_m): Remove.
	(vorrq_x): Remove.
	(vorrq_u8): Remove.
	(vorrq_s8): Remove.
	(vorrq_u16): Remove.
	(vorrq_s16): Remove.
	(vorrq_u32): Remove.
	(vorrq_s32): Remove.
	(vorrq_n_u16): Remove.
	(vorrq_f16): Remove.
	(vorrq_n_s16): Remove.
	(vorrq_n_u32): Remove.
	(vorrq_f32): Remove.
	(vorrq_n_s32): Remove.
	(vorrq_m_n_s16): Remove.
	(vorrq_m_n_u16): Remove.
	(vorrq_m_n_s32): Remove.
	(vorrq_m_n_u32): Remove.
	(vorrq_m_s8): Remove.
	(vorrq_m_s32): Remove.
	(vorrq_m_s16): Remove.
	(vorrq_m_u8): Remove.
	(vorrq_m_u32): Remove.
	(vorrq_m_u16): Remove.
	(vorrq_m_f32): Remove.
	(vorrq_m_f16): Remove.
	(vorrq_x_s8): Remove.
	(vorrq_x_s16): Remove.
	(vorrq_x_s32): Remove.
	(vorrq_x_u8): Remove.
	(vorrq_x_u16): Remove.
	(vorrq_x_u32): Remove.
	(vorrq_x_f16): Remove.
	(vorrq_x_f32): Remove.
	(__arm_vorrq_u8): Remove.
	(__arm_vorrq_s8): Remove.
	(__arm_vorrq_u16): Remove.
	(__arm_vorrq_s16): Remove.
	(__arm_vorrq_u32): Remove.
	(__arm_vorrq_s32): Remove.
	(__arm_vorrq_n_u16): Remove.
	(__arm_vorrq_n_s16): Remove.
	(__arm_vorrq_n_u32): Remove.
	(__arm_vorrq_n_s32): Remove.
	(__arm_vorrq_m_n_s16): Remove.
	(__arm_vorrq_m_n_u16): Remove.
	(__arm_vorrq_m_n_s32): Remove.
	(__arm_vorrq_m_n_u32): Remove.
	(__arm_vorrq_m_s8): Remove.
	(__arm_vorrq_m_s32): Remove.
	(__arm_vorrq_m_s16): Remove.
	(__arm_vorrq_m_u8): Remove.
	(__arm_vorrq_m_u32): Remove.
	(__arm_vorrq_m_u16): Remove.
	(__arm_vorrq_x_s8): Remove.
	(__arm_vorrq_x_s16): Remove.
	(__arm_vorrq_x_s32): Remove.
	(__arm_vorrq_x_u8): Remove.
	(__arm_vorrq_x_u16): Remove.
	(__arm_vorrq_x_u32): Remove.
	(__arm_vorrq_f16): Remove.
	(__arm_vorrq_f32): Remove.
	(__arm_vorrq_m_f32): Remove.
	(__arm_vorrq_m_f16): Remove.
	(__arm_vorrq_x_f16): Remove.
	(__arm_vorrq_x_f32): Remove.
	(__arm_vorrq): Remove.
	(__arm_vorrq_m_n): Remove.
	(__arm_vorrq_m): Remove.
	(__arm_vorrq_x): Remove.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |   9 +
 gcc/config/arm/arm-mve-builtins-base.def |   2 +
 gcc/config/arm/arm-mve-builtins-base.h   |   1 +
 gcc/config/arm/arm-mve-builtins.cc       |   3 +
 gcc/config/arm/arm_mve.h                 | 559 -----------------------
 5 files changed, 15 insertions(+), 559 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index 51fed8f671f..499a1ef9f0e 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -98,10 +98,19 @@ namespace arm_mve {
     UNSPEC##_M_S, UNSPEC##_M_U, UNSPEC##_M_F,				\
     -1, -1, -1))
 
+  /* Helper for builtins with RTX codes, _m predicated and _n overrides.  */
+#define FUNCTION_WITH_RTX_M_N_NO_N_F(NAME, RTX, UNSPEC) FUNCTION	\
+  (NAME, unspec_based_mve_function_exact_insn,				\
+   (RTX, RTX, RTX,							\
+    UNSPEC##_N_S, UNSPEC##_N_U, -1,					\
+    UNSPEC##_M_S, UNSPEC##_M_U, UNSPEC##_M_F,				\
+    UNSPEC##_M_N_S, UNSPEC##_M_N_U, -1))
+
 FUNCTION_WITH_RTX_M_N (vaddq, PLUS, VADDQ)
 FUNCTION_WITH_RTX_M (vandq, AND, VANDQ)
 FUNCTION_WITH_RTX_M (veorq, XOR, VEORQ)
 FUNCTION_WITH_RTX_M_N (vmulq, MULT, VMULQ)
+FUNCTION_WITH_RTX_M_N_NO_N_F (vorrq, IOR, VORRQ)
 FUNCTION (vreinterpretq, vreinterpretq_impl,)
 FUNCTION_WITH_RTX_M_N (vsubq, MINUS, VSUBQ)
 FUNCTION (vuninitializedq, vuninitializedq_impl,)
diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
index a933c9fc91e..c3f8c0f0eeb 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -22,6 +22,7 @@ DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vandq, binary, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (veorq, binary, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vmulq, binary_opt_n, all_integer, mx_or_none)
+DEF_MVE_FUNCTION (vorrq, binary_orrq, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vreinterpretq, unary_convert, reinterpret_integer, none)
 DEF_MVE_FUNCTION (vsubq, binary_opt_n, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vuninitializedq, inherent, all_integer_with_64, none)
@@ -32,6 +33,7 @@ DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vandq, binary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (veorq, binary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vmulq, binary_opt_n, all_float, mx_or_none)
+DEF_MVE_FUNCTION (vorrq, binary_orrq, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vreinterpretq, unary_convert, reinterpret_float, none)
 DEF_MVE_FUNCTION (vsubq, binary_opt_n, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vuninitializedq, inherent, all_float, none)
diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h
index 4fcf55715b6..c450b373239 100644
--- a/gcc/config/arm/arm-mve-builtins-base.h
+++ b/gcc/config/arm/arm-mve-builtins-base.h
@@ -27,6 +27,7 @@ extern const function_base *const vaddq;
 extern const function_base *const vandq;
 extern const function_base *const veorq;
 extern const function_base *const vmulq;
+extern const function_base *const vorrq;
 extern const function_base *const vreinterpretq;
 extern const function_base *const vsubq;
 extern const function_base *const vuninitializedq;
diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc
index c74e890bd3d..0708d4fa94a 100644
--- a/gcc/config/arm/arm-mve-builtins.cc
+++ b/gcc/config/arm/arm-mve-builtins.cc
@@ -669,6 +669,9 @@ function_instance::has_inactive_argument () const
   if (pred != PRED_m)
     return false;
 
+  if (base == functions::vorrq && mode_suffix_id == MODE_n)
+    return false;
+
   return true;
 }
 
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 0ad0122e44f..edf8e247421 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -65,7 +65,6 @@
 #define vrhaddq(__a, __b) __arm_vrhaddq(__a, __b)
 #define vqsubq(__a, __b) __arm_vqsubq(__a, __b)
 #define vqaddq(__a, __b) __arm_vqaddq(__a, __b)
-#define vorrq(__a, __b) __arm_vorrq(__a, __b)
 #define vornq(__a, __b) __arm_vornq(__a, __b)
 #define vmulltq_int(__a, __b) __arm_vmulltq_int(__a, __b)
 #define vmullbq_int(__a, __b) __arm_vmullbq_int(__a, __b)
@@ -201,7 +200,6 @@
 #define vrmlaldavhxq_p(__a, __b, __p) __arm_vrmlaldavhxq_p(__a, __b, __p)
 #define vrmlsldavhq_p(__a, __b, __p) __arm_vrmlsldavhq_p(__a, __b, __p)
 #define vrmlsldavhxq_p(__a, __b, __p) __arm_vrmlsldavhxq_p(__a, __b, __p)
-#define vorrq_m_n(__a, __imm, __p) __arm_vorrq_m_n(__a, __imm, __p)
 #define vqrshrntq(__a, __b, __imm) __arm_vqrshrntq(__a, __b, __imm)
 #define vqshrnbq(__a, __b, __imm) __arm_vqshrnbq(__a, __b, __imm)
 #define vqshrntq(__a, __b, __imm) __arm_vqshrntq(__a, __b, __imm)
@@ -254,7 +252,6 @@
 #define vmullbq_int_m(__inactive, __a, __b, __p) __arm_vmullbq_int_m(__inactive, __a, __b, __p)
 #define vmulltq_int_m(__inactive, __a, __b, __p) __arm_vmulltq_int_m(__inactive, __a, __b, __p)
 #define vornq_m(__inactive, __a, __b, __p) __arm_vornq_m(__inactive, __a, __b, __p)
-#define vorrq_m(__inactive, __a, __b, __p) __arm_vorrq_m(__inactive, __a, __b, __p)
 #define vqaddq_m(__inactive, __a, __b, __p) __arm_vqaddq_m(__inactive, __a, __b, __p)
 #define vqdmladhq_m(__inactive, __a, __b, __p) __arm_vqdmladhq_m(__inactive, __a, __b, __p)
 #define vqdmlashq_m(__a, __b, __c, __p) __arm_vqdmlashq_m(__a, __b, __c, __p)
@@ -406,7 +403,6 @@
 #define vmovltq_x(__a, __p) __arm_vmovltq_x(__a, __p)
 #define vmvnq_x(__a, __p) __arm_vmvnq_x(__a, __p)
 #define vornq_x(__a, __b, __p) __arm_vornq_x(__a, __b, __p)
-#define vorrq_x(__a, __b, __p) __arm_vorrq_x(__a, __b, __p)
 #define vrev16q_x(__a, __p) __arm_vrev16q_x(__a, __p)
 #define vrev32q_x(__a, __p) __arm_vrev32q_x(__a, __p)
 #define vrev64q_x(__a, __p) __arm_vrev64q_x(__a, __p)
@@ -682,7 +678,6 @@
 #define vqsubq_n_u8(__a, __b) __arm_vqsubq_n_u8(__a, __b)
 #define vqaddq_u8(__a, __b) __arm_vqaddq_u8(__a, __b)
 #define vqaddq_n_u8(__a, __b) __arm_vqaddq_n_u8(__a, __b)
-#define vorrq_u8(__a, __b) __arm_vorrq_u8(__a, __b)
 #define vornq_u8(__a, __b) __arm_vornq_u8(__a, __b)
 #define vmulltq_int_u8(__a, __b) __arm_vmulltq_int_u8(__a, __b)
 #define vmullbq_int_u8(__a, __b) __arm_vmullbq_int_u8(__a, __b)
@@ -754,7 +749,6 @@
 #define vqdmulhq_n_s8(__a, __b) __arm_vqdmulhq_n_s8(__a, __b)
 #define vqaddq_s8(__a, __b) __arm_vqaddq_s8(__a, __b)
 #define vqaddq_n_s8(__a, __b) __arm_vqaddq_n_s8(__a, __b)
-#define vorrq_s8(__a, __b) __arm_vorrq_s8(__a, __b)
 #define vornq_s8(__a, __b) __arm_vornq_s8(__a, __b)
 #define vmulltq_int_s8(__a, __b) __arm_vmulltq_int_s8(__a, __b)
 #define vmullbq_int_s8(__a, __b) __arm_vmullbq_int_s8(__a, __b)
@@ -788,7 +782,6 @@
 #define vqsubq_n_u16(__a, __b) __arm_vqsubq_n_u16(__a, __b)
 #define vqaddq_u16(__a, __b) __arm_vqaddq_u16(__a, __b)
 #define vqaddq_n_u16(__a, __b) __arm_vqaddq_n_u16(__a, __b)
-#define vorrq_u16(__a, __b) __arm_vorrq_u16(__a, __b)
 #define vornq_u16(__a, __b) __arm_vornq_u16(__a, __b)
 #define vmulltq_int_u16(__a, __b) __arm_vmulltq_int_u16(__a, __b)
 #define vmullbq_int_u16(__a, __b) __arm_vmullbq_int_u16(__a, __b)
@@ -860,7 +853,6 @@
 #define vqdmulhq_n_s16(__a, __b) __arm_vqdmulhq_n_s16(__a, __b)
 #define vqaddq_s16(__a, __b) __arm_vqaddq_s16(__a, __b)
 #define vqaddq_n_s16(__a, __b) __arm_vqaddq_n_s16(__a, __b)
-#define vorrq_s16(__a, __b) __arm_vorrq_s16(__a, __b)
 #define vornq_s16(__a, __b) __arm_vornq_s16(__a, __b)
 #define vmulltq_int_s16(__a, __b) __arm_vmulltq_int_s16(__a, __b)
 #define vmullbq_int_s16(__a, __b) __arm_vmullbq_int_s16(__a, __b)
@@ -894,7 +886,6 @@
 #define vqsubq_n_u32(__a, __b) __arm_vqsubq_n_u32(__a, __b)
 #define vqaddq_u32(__a, __b) __arm_vqaddq_u32(__a, __b)
 #define vqaddq_n_u32(__a, __b) __arm_vqaddq_n_u32(__a, __b)
-#define vorrq_u32(__a, __b) __arm_vorrq_u32(__a, __b)
 #define vornq_u32(__a, __b) __arm_vornq_u32(__a, __b)
 #define vmulltq_int_u32(__a, __b) __arm_vmulltq_int_u32(__a, __b)
 #define vmullbq_int_u32(__a, __b) __arm_vmullbq_int_u32(__a, __b)
@@ -966,7 +957,6 @@
 #define vqdmulhq_n_s32(__a, __b) __arm_vqdmulhq_n_s32(__a, __b)
 #define vqaddq_s32(__a, __b) __arm_vqaddq_s32(__a, __b)
 #define vqaddq_n_s32(__a, __b) __arm_vqaddq_n_s32(__a, __b)
-#define vorrq_s32(__a, __b) __arm_vorrq_s32(__a, __b)
 #define vornq_s32(__a, __b) __arm_vornq_s32(__a, __b)
 #define vmulltq_int_s32(__a, __b) __arm_vmulltq_int_s32(__a, __b)
 #define vmullbq_int_s32(__a, __b) __arm_vmullbq_int_s32(__a, __b)
@@ -1005,7 +995,6 @@
 #define vqmovunbq_s16(__a, __b) __arm_vqmovunbq_s16(__a, __b)
 #define vshlltq_n_u8(__a,  __imm) __arm_vshlltq_n_u8(__a,  __imm)
 #define vshllbq_n_u8(__a,  __imm) __arm_vshllbq_n_u8(__a,  __imm)
-#define vorrq_n_u16(__a,  __imm) __arm_vorrq_n_u16(__a,  __imm)
 #define vbicq_n_u16(__a,  __imm) __arm_vbicq_n_u16(__a,  __imm)
 #define vcmpneq_n_f16(__a, __b) __arm_vcmpneq_n_f16(__a, __b)
 #define vcmpneq_f16(__a, __b) __arm_vcmpneq_f16(__a, __b)
@@ -1025,7 +1014,6 @@
 #define vqdmulltq_n_s16(__a, __b) __arm_vqdmulltq_n_s16(__a, __b)
 #define vqdmullbq_s16(__a, __b) __arm_vqdmullbq_s16(__a, __b)
 #define vqdmullbq_n_s16(__a, __b) __arm_vqdmullbq_n_s16(__a, __b)
-#define vorrq_f16(__a, __b) __arm_vorrq_f16(__a, __b)
 #define vornq_f16(__a, __b) __arm_vornq_f16(__a, __b)
 #define vmovntq_s16(__a, __b) __arm_vmovntq_s16(__a, __b)
 #define vmovnbq_s16(__a, __b) __arm_vmovnbq_s16(__a, __b)
@@ -1051,7 +1039,6 @@
 #define vabdq_f16(__a, __b) __arm_vabdq_f16(__a, __b)
 #define vshlltq_n_s8(__a,  __imm) __arm_vshlltq_n_s8(__a,  __imm)
 #define vshllbq_n_s8(__a,  __imm) __arm_vshllbq_n_s8(__a,  __imm)
-#define vorrq_n_s16(__a,  __imm) __arm_vorrq_n_s16(__a,  __imm)
 #define vbicq_n_s16(__a,  __imm) __arm_vbicq_n_s16(__a,  __imm)
 #define vqmovntq_u32(__a, __b) __arm_vqmovntq_u32(__a, __b)
 #define vqmovnbq_u32(__a, __b) __arm_vqmovnbq_u32(__a, __b)
@@ -1064,7 +1051,6 @@
 #define vqmovunbq_s32(__a, __b) __arm_vqmovunbq_s32(__a, __b)
 #define vshlltq_n_u16(__a,  __imm) __arm_vshlltq_n_u16(__a,  __imm)
 #define vshllbq_n_u16(__a,  __imm) __arm_vshllbq_n_u16(__a,  __imm)
-#define vorrq_n_u32(__a,  __imm) __arm_vorrq_n_u32(__a,  __imm)
 #define vbicq_n_u32(__a,  __imm) __arm_vbicq_n_u32(__a,  __imm)
 #define vcmpneq_n_f32(__a, __b) __arm_vcmpneq_n_f32(__a, __b)
 #define vcmpneq_f32(__a, __b) __arm_vcmpneq_f32(__a, __b)
@@ -1084,7 +1070,6 @@
 #define vqdmulltq_n_s32(__a, __b) __arm_vqdmulltq_n_s32(__a, __b)
 #define vqdmullbq_s32(__a, __b) __arm_vqdmullbq_s32(__a, __b)
 #define vqdmullbq_n_s32(__a, __b) __arm_vqdmullbq_n_s32(__a, __b)
-#define vorrq_f32(__a, __b) __arm_vorrq_f32(__a, __b)
 #define vornq_f32(__a, __b) __arm_vornq_f32(__a, __b)
 #define vmovntq_s32(__a, __b) __arm_vmovntq_s32(__a, __b)
 #define vmovnbq_s32(__a, __b) __arm_vmovnbq_s32(__a, __b)
@@ -1110,7 +1095,6 @@
 #define vabdq_f32(__a, __b) __arm_vabdq_f32(__a, __b)
 #define vshlltq_n_s16(__a,  __imm) __arm_vshlltq_n_s16(__a,  __imm)
 #define vshllbq_n_s16(__a,  __imm) __arm_vshllbq_n_s16(__a,  __imm)
-#define vorrq_n_s32(__a,  __imm) __arm_vorrq_n_s32(__a,  __imm)
 #define vbicq_n_s32(__a,  __imm) __arm_vbicq_n_s32(__a,  __imm)
 #define vrmlaldavhq_u32(__a, __b) __arm_vrmlaldavhq_u32(__a, __b)
 #define vctp8q_m(__a, __p) __arm_vctp8q_m(__a, __p)
@@ -1428,7 +1412,6 @@
 #define vrev16q_m_u8(__inactive, __a, __p) __arm_vrev16q_m_u8(__inactive, __a, __p)
 #define vrmlaldavhq_p_u32(__a, __b, __p) __arm_vrmlaldavhq_p_u32(__a, __b, __p)
 #define vmvnq_m_n_s16(__inactive,  __imm, __p) __arm_vmvnq_m_n_s16(__inactive,  __imm, __p)
-#define vorrq_m_n_s16(__a,  __imm, __p) __arm_vorrq_m_n_s16(__a,  __imm, __p)
 #define vqrshrntq_n_s16(__a, __b,  __imm) __arm_vqrshrntq_n_s16(__a, __b,  __imm)
 #define vqshrnbq_n_s16(__a, __b,  __imm) __arm_vqshrnbq_n_s16(__a, __b,  __imm)
 #define vqshrntq_n_s16(__a, __b,  __imm) __arm_vqshrntq_n_s16(__a, __b,  __imm)
@@ -1492,7 +1475,6 @@
 #define vcmpneq_m_f16(__a, __b, __p) __arm_vcmpneq_m_f16(__a, __b, __p)
 #define vcmpneq_m_n_f16(__a, __b, __p) __arm_vcmpneq_m_n_f16(__a, __b, __p)
 #define vmvnq_m_n_u16(__inactive,  __imm, __p) __arm_vmvnq_m_n_u16(__inactive,  __imm, __p)
-#define vorrq_m_n_u16(__a,  __imm, __p) __arm_vorrq_m_n_u16(__a,  __imm, __p)
 #define vqrshruntq_n_s16(__a, __b,  __imm) __arm_vqrshruntq_n_s16(__a, __b,  __imm)
 #define vqshrunbq_n_s16(__a, __b,  __imm) __arm_vqshrunbq_n_s16(__a, __b,  __imm)
 #define vqshruntq_n_s16(__a, __b,  __imm) __arm_vqshruntq_n_s16(__a, __b,  __imm)
@@ -1519,7 +1501,6 @@
 #define vqmovntq_m_u16(__a, __b, __p) __arm_vqmovntq_m_u16(__a, __b, __p)
 #define vrev32q_m_u8(__inactive, __a, __p) __arm_vrev32q_m_u8(__inactive, __a, __p)
 #define vmvnq_m_n_s32(__inactive,  __imm, __p) __arm_vmvnq_m_n_s32(__inactive,  __imm, __p)
-#define vorrq_m_n_s32(__a,  __imm, __p) __arm_vorrq_m_n_s32(__a,  __imm, __p)
 #define vqrshrntq_n_s32(__a, __b,  __imm) __arm_vqrshrntq_n_s32(__a, __b,  __imm)
 #define vqshrnbq_n_s32(__a, __b,  __imm) __arm_vqshrnbq_n_s32(__a, __b,  __imm)
 #define vqshrntq_n_s32(__a, __b,  __imm) __arm_vqshrntq_n_s32(__a, __b,  __imm)
@@ -1583,7 +1564,6 @@
 #define vcmpneq_m_f32(__a, __b, __p) __arm_vcmpneq_m_f32(__a, __b, __p)
 #define vcmpneq_m_n_f32(__a, __b, __p) __arm_vcmpneq_m_n_f32(__a, __b, __p)
 #define vmvnq_m_n_u32(__inactive,  __imm, __p) __arm_vmvnq_m_n_u32(__inactive,  __imm, __p)
-#define vorrq_m_n_u32(__a,  __imm, __p) __arm_vorrq_m_n_u32(__a,  __imm, __p)
 #define vqrshruntq_n_s32(__a, __b,  __imm) __arm_vqrshruntq_n_s32(__a, __b,  __imm)
 #define vqshrunbq_n_s32(__a, __b,  __imm) __arm_vqshrunbq_n_s32(__a, __b,  __imm)
 #define vqshruntq_n_s32(__a, __b,  __imm) __arm_vqshruntq_n_s32(__a, __b,  __imm)
@@ -1757,12 +1737,6 @@
 #define vornq_m_u8(__inactive, __a, __b, __p) __arm_vornq_m_u8(__inactive, __a, __b, __p)
 #define vornq_m_u32(__inactive, __a, __b, __p) __arm_vornq_m_u32(__inactive, __a, __b, __p)
 #define vornq_m_u16(__inactive, __a, __b, __p) __arm_vornq_m_u16(__inactive, __a, __b, __p)
-#define vorrq_m_s8(__inactive, __a, __b, __p) __arm_vorrq_m_s8(__inactive, __a, __b, __p)
-#define vorrq_m_s32(__inactive, __a, __b, __p) __arm_vorrq_m_s32(__inactive, __a, __b, __p)
-#define vorrq_m_s16(__inactive, __a, __b, __p) __arm_vorrq_m_s16(__inactive, __a, __b, __p)
-#define vorrq_m_u8(__inactive, __a, __b, __p) __arm_vorrq_m_u8(__inactive, __a, __b, __p)
-#define vorrq_m_u32(__inactive, __a, __b, __p) __arm_vorrq_m_u32(__inactive, __a, __b, __p)
-#define vorrq_m_u16(__inactive, __a, __b, __p) __arm_vorrq_m_u16(__inactive, __a, __b, __p)
 #define vqaddq_m_n_s8(__inactive, __a, __b, __p) __arm_vqaddq_m_n_s8(__inactive, __a, __b, __p)
 #define vqaddq_m_n_s32(__inactive, __a, __b, __p) __arm_vqaddq_m_n_s32(__inactive, __a, __b, __p)
 #define vqaddq_m_n_s16(__inactive, __a, __b, __p) __arm_vqaddq_m_n_s16(__inactive, __a, __b, __p)
@@ -2014,8 +1988,6 @@
 #define vminnmq_m_f16(__inactive, __a, __b, __p) __arm_vminnmq_m_f16(__inactive, __a, __b, __p)
 #define vornq_m_f32(__inactive, __a, __b, __p) __arm_vornq_m_f32(__inactive, __a, __b, __p)
 #define vornq_m_f16(__inactive, __a, __b, __p) __arm_vornq_m_f16(__inactive, __a, __b, __p)
-#define vorrq_m_f32(__inactive, __a, __b, __p) __arm_vorrq_m_f32(__inactive, __a, __b, __p)
-#define vorrq_m_f16(__inactive, __a, __b, __p) __arm_vorrq_m_f16(__inactive, __a, __b, __p)
 #define vstrbq_s8( __addr, __value) __arm_vstrbq_s8( __addr, __value)
 #define vstrbq_u8( __addr, __value) __arm_vstrbq_u8( __addr, __value)
 #define vstrbq_u16( __addr, __value) __arm_vstrbq_u16( __addr, __value)
@@ -2465,12 +2437,6 @@
 #define vornq_x_u8(__a, __b, __p) __arm_vornq_x_u8(__a, __b, __p)
 #define vornq_x_u16(__a, __b, __p) __arm_vornq_x_u16(__a, __b, __p)
 #define vornq_x_u32(__a, __b, __p) __arm_vornq_x_u32(__a, __b, __p)
-#define vorrq_x_s8(__a, __b, __p) __arm_vorrq_x_s8(__a, __b, __p)
-#define vorrq_x_s16(__a, __b, __p) __arm_vorrq_x_s16(__a, __b, __p)
-#define vorrq_x_s32(__a, __b, __p) __arm_vorrq_x_s32(__a, __b, __p)
-#define vorrq_x_u8(__a, __b, __p) __arm_vorrq_x_u8(__a, __b, __p)
-#define vorrq_x_u16(__a, __b, __p) __arm_vorrq_x_u16(__a, __b, __p)
-#define vorrq_x_u32(__a, __b, __p) __arm_vorrq_x_u32(__a, __b, __p)
 #define vrev16q_x_s8(__a, __p) __arm_vrev16q_x_s8(__a, __p)
 #define vrev16q_x_u8(__a, __p) __arm_vrev16q_x_u8(__a, __p)
 #define vrev32q_x_s8(__a, __p) __arm_vrev32q_x_s8(__a, __p)
@@ -2597,8 +2563,6 @@
 #define vbrsrq_x_n_f32(__a, __b, __p) __arm_vbrsrq_x_n_f32(__a, __b, __p)
 #define vornq_x_f16(__a, __b, __p) __arm_vornq_x_f16(__a, __b, __p)
 #define vornq_x_f32(__a, __b, __p) __arm_vornq_x_f32(__a, __b, __p)
-#define vorrq_x_f16(__a, __b, __p) __arm_vorrq_x_f16(__a, __b, __p)
-#define vorrq_x_f32(__a, __b, __p) __arm_vorrq_x_f32(__a, __b, __p)
 #define vrev32q_x_f16(__a, __p) __arm_vrev32q_x_f16(__a, __p)
 #define vrev64q_x_f16(__a, __p) __arm_vrev64q_x_f16(__a, __p)
 #define vrev64q_x_f32(__a, __p) __arm_vrev64q_x_f32(__a, __p)
@@ -3495,13 +3459,6 @@ __arm_vqaddq_n_u8 (uint8x16_t __a, uint8_t __b)
   return __builtin_mve_vqaddq_n_uv16qi (__a, __b);
 }
 
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_u8 (uint8x16_t __a, uint8x16_t __b)
-{
-  return __builtin_mve_vorrq_uv16qi (__a, __b);
-}
-
 __extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq_u8 (uint8x16_t __a, uint8x16_t __b)
@@ -4001,13 +3958,6 @@ __arm_vqaddq_n_s8 (int8x16_t __a, int8_t __b)
   return __builtin_mve_vqaddq_n_sv16qi (__a, __b);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_s8 (int8x16_t __a, int8x16_t __b)
-{
-  return __builtin_mve_vorrq_sv16qi (__a, __b);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq_s8 (int8x16_t __a, int8x16_t __b)
@@ -4239,13 +4189,6 @@ __arm_vqaddq_n_u16 (uint16x8_t __a, uint16_t __b)
   return __builtin_mve_vqaddq_n_uv8hi (__a, __b);
 }
 
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_u16 (uint16x8_t __a, uint16x8_t __b)
-{
-  return __builtin_mve_vorrq_uv8hi (__a, __b);
-}
-
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq_u16 (uint16x8_t __a, uint16x8_t __b)
@@ -4745,13 +4688,6 @@ __arm_vqaddq_n_s16 (int16x8_t __a, int16_t __b)
   return __builtin_mve_vqaddq_n_sv8hi (__a, __b);
 }
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_s16 (int16x8_t __a, int16x8_t __b)
-{
-  return __builtin_mve_vorrq_sv8hi (__a, __b);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq_s16 (int16x8_t __a, int16x8_t __b)
@@ -4983,13 +4919,6 @@ __arm_vqaddq_n_u32 (uint32x4_t __a, uint32_t __b)
   return __builtin_mve_vqaddq_n_uv4si (__a, __b);
 }
 
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_u32 (uint32x4_t __a, uint32x4_t __b)
-{
-  return __builtin_mve_vorrq_uv4si (__a, __b);
-}
-
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq_u32 (uint32x4_t __a, uint32x4_t __b)
@@ -5489,13 +5418,6 @@ __arm_vqaddq_n_s32 (int32x4_t __a, int32_t __b)
   return __builtin_mve_vqaddq_n_sv4si (__a, __b);
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_s32 (int32x4_t __a, int32x4_t __b)
-{
-  return __builtin_mve_vorrq_sv4si (__a, __b);
-}
-
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq_s32 (int32x4_t __a, int32x4_t __b)
@@ -5762,13 +5684,6 @@ __arm_vshllbq_n_u8 (uint8x16_t __a, const int __imm)
   return __builtin_mve_vshllbq_n_uv16qi (__a, __imm);
 }
 
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_n_u16 (uint16x8_t __a, const int __imm)
-{
-  return __builtin_mve_vorrq_n_uv8hi (__a, __imm);
-}
-
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq_n_u16 (uint16x8_t __a, const int __imm)
@@ -5874,13 +5789,6 @@ __arm_vshllbq_n_s8 (int8x16_t __a, const int __imm)
   return __builtin_mve_vshllbq_n_sv16qi (__a, __imm);
 }
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_n_s16 (int16x8_t __a, const int __imm)
-{
-  return __builtin_mve_vorrq_n_sv8hi (__a, __imm);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq_n_s16 (int16x8_t __a, const int __imm)
@@ -5965,13 +5873,6 @@ __arm_vshllbq_n_u16 (uint16x8_t __a, const int __imm)
   return __builtin_mve_vshllbq_n_uv8hi (__a, __imm);
 }
 
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_n_u32 (uint32x4_t __a, const int __imm)
-{
-  return __builtin_mve_vorrq_n_uv4si (__a, __imm);
-}
-
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq_n_u32 (uint32x4_t __a, const int __imm)
@@ -6077,13 +5978,6 @@ __arm_vshllbq_n_s16 (int16x8_t __a, const int __imm)
   return __builtin_mve_vshllbq_n_sv8hi (__a, __imm);
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_n_s32 (int32x4_t __a, const int __imm)
-{
-  return __builtin_mve_vorrq_n_sv4si (__a, __imm);
-}
-
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq_n_s32 (int32x4_t __a, const int __imm)
@@ -8197,13 +8091,6 @@ __arm_vmvnq_m_n_s16 (int16x8_t __inactive, const int __imm, mve_pred16_t __p)
   return __builtin_mve_vmvnq_m_n_sv8hi (__inactive, __imm, __p);
 }
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_m_n_s16 (int16x8_t __a, const int __imm, mve_pred16_t __p)
-{
-  return __builtin_mve_vorrq_m_n_sv8hi (__a, __imm, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vqrshrntq_n_s16 (int8x16_t __a, int16x8_t __b, const int __imm)
@@ -8365,13 +8252,6 @@ __arm_vmvnq_m_n_u16 (uint16x8_t __inactive, const int __imm, mve_pred16_t __p)
   return __builtin_mve_vmvnq_m_n_uv8hi (__inactive, __imm, __p);
 }
 
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_m_n_u16 (uint16x8_t __a, const int __imm, mve_pred16_t __p)
-{
-  return __builtin_mve_vorrq_m_n_uv8hi (__a, __imm, __p);
-}
-
 __extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vqrshruntq_n_s16 (uint8x16_t __a, int16x8_t __b, const int __imm)
@@ -8526,13 +8406,6 @@ __arm_vmvnq_m_n_s32 (int32x4_t __inactive, const int __imm, mve_pred16_t __p)
   return __builtin_mve_vmvnq_m_n_sv4si (__inactive, __imm, __p);
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_m_n_s32 (int32x4_t __a, const int __imm, mve_pred16_t __p)
-{
-  return __builtin_mve_vorrq_m_n_sv4si (__a, __imm, __p);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vqrshrntq_n_s32 (int16x8_t __a, int32x4_t __b, const int __imm)
@@ -8694,13 +8567,6 @@ __arm_vmvnq_m_n_u32 (uint32x4_t __inactive, const int __imm, mve_pred16_t __p)
   return __builtin_mve_vmvnq_m_n_uv4si (__inactive, __imm, __p);
 }
 
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_m_n_u32 (uint32x4_t __a, const int __imm, mve_pred16_t __p)
-{
-  return __builtin_mve_vorrq_m_n_uv4si (__a, __imm, __p);
-}
-
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vqrshruntq_n_s32 (uint16x8_t __a, int32x4_t __b, const int __imm)
@@ -9856,48 +9722,6 @@ __arm_vornq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pr
   return __builtin_mve_vornq_m_uv8hi (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vorrq_m_sv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vorrq_m_sv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vorrq_m_sv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_m_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vorrq_m_uv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vorrq_m_uv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vorrq_m_uv8hi (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vqaddq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
@@ -14315,48 +14139,6 @@ __arm_vornq_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
   return __builtin_mve_vornq_m_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vorrq_m_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_x_s16 (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vorrq_m_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_x_s32 (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vorrq_m_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_x_u8 (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vorrq_m_uv16qi (__arm_vuninitializedq_u8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_x_u16 (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vorrq_m_uv8hi (__arm_vuninitializedq_u16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vorrq_m_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vrev16q_x_s8 (int8x16_t __a, mve_pred16_t __p)
@@ -15924,13 +15706,6 @@ __arm_vcmpeqq_f16 (float16x8_t __a, float16x8_t __b)
   return __builtin_mve_vcmpeqq_fv8hf (__a, __b);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_f16 (float16x8_t __a, float16x8_t __b)
-{
-  return __builtin_mve_vorrq_fv8hf (__a, __b);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq_f16 (float16x8_t __a, float16x8_t __b)
@@ -16134,13 +15909,6 @@ __arm_vcmpeqq_f32 (float32x4_t __a, float32x4_t __b)
   return __builtin_mve_vcmpeqq_fv4sf (__a, __b);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_f32 (float32x4_t __a, float32x4_t __b)
-{
-  return __builtin_mve_vorrq_fv4sf (__a, __b);
-}
-
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq_f32 (float32x4_t __a, float32x4_t __b)
@@ -17332,20 +17100,6 @@ __arm_vornq_m_f16 (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve
   return __builtin_mve_vornq_m_fv8hf (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_m_f32 (float32x4_t __inactive, float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vorrq_m_fv4sf (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_m_f16 (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vorrq_m_fv8hf (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vld1q_f32 (float32_t const * __base)
@@ -18136,20 +17890,6 @@ __arm_vornq_x_f32 (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
   return __builtin_mve_vornq_m_fv4sf (__arm_vuninitializedq_f32 (), __a, __b, __p);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_x_f16 (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vorrq_m_fv8hf (__arm_vuninitializedq_f16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_x_f32 (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vorrq_m_fv4sf (__arm_vuninitializedq_f32 (), __a, __b, __p);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vrev32q_x_f16 (float16x8_t __a, mve_pred16_t __p)
@@ -18940,13 +18680,6 @@ __arm_vqaddq (uint8x16_t __a, uint8_t __b)
  return __arm_vqaddq_n_u8 (__a, __b);
 }
 
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq (uint8x16_t __a, uint8x16_t __b)
-{
- return __arm_vorrq_u8 (__a, __b);
-}
-
 __extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq (uint8x16_t __a, uint8x16_t __b)
@@ -19444,13 +19177,6 @@ __arm_vqaddq (int8x16_t __a, int8_t __b)
  return __arm_vqaddq_n_s8 (__a, __b);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq (int8x16_t __a, int8x16_t __b)
-{
- return __arm_vorrq_s8 (__a, __b);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq (int8x16_t __a, int8x16_t __b)
@@ -19682,13 +19408,6 @@ __arm_vqaddq (uint16x8_t __a, uint16_t __b)
  return __arm_vqaddq_n_u16 (__a, __b);
 }
 
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq (uint16x8_t __a, uint16x8_t __b)
-{
- return __arm_vorrq_u16 (__a, __b);
-}
-
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq (uint16x8_t __a, uint16x8_t __b)
@@ -20186,13 +19905,6 @@ __arm_vqaddq (int16x8_t __a, int16_t __b)
  return __arm_vqaddq_n_s16 (__a, __b);
 }
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq (int16x8_t __a, int16x8_t __b)
-{
- return __arm_vorrq_s16 (__a, __b);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq (int16x8_t __a, int16x8_t __b)
@@ -20424,13 +20136,6 @@ __arm_vqaddq (uint32x4_t __a, uint32_t __b)
  return __arm_vqaddq_n_u32 (__a, __b);
 }
 
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq (uint32x4_t __a, uint32x4_t __b)
-{
- return __arm_vorrq_u32 (__a, __b);
-}
-
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq (uint32x4_t __a, uint32x4_t __b)
@@ -20928,13 +20633,6 @@ __arm_vqaddq (int32x4_t __a, int32_t __b)
  return __arm_vqaddq_n_s32 (__a, __b);
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq (int32x4_t __a, int32x4_t __b)
-{
- return __arm_vorrq_s32 (__a, __b);
-}
-
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq (int32x4_t __a, int32x4_t __b)
@@ -21201,13 +20899,6 @@ __arm_vshllbq (uint8x16_t __a, const int __imm)
  return __arm_vshllbq_n_u8 (__a, __imm);
 }
 
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq (uint16x8_t __a, const int __imm)
-{
- return __arm_vorrq_n_u16 (__a, __imm);
-}
-
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq (uint16x8_t __a, const int __imm)
@@ -21313,13 +21004,6 @@ __arm_vshllbq (int8x16_t __a, const int __imm)
  return __arm_vshllbq_n_s8 (__a, __imm);
 }
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq (int16x8_t __a, const int __imm)
-{
- return __arm_vorrq_n_s16 (__a, __imm);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq (int16x8_t __a, const int __imm)
@@ -21404,13 +21088,6 @@ __arm_vshllbq (uint16x8_t __a, const int __imm)
  return __arm_vshllbq_n_u16 (__a, __imm);
 }
 
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq (uint32x4_t __a, const int __imm)
-{
- return __arm_vorrq_n_u32 (__a, __imm);
-}
-
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq (uint32x4_t __a, const int __imm)
@@ -21516,13 +21193,6 @@ __arm_vshllbq (int16x8_t __a, const int __imm)
  return __arm_vshllbq_n_s16 (__a, __imm);
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq (int32x4_t __a, const int __imm)
-{
- return __arm_vorrq_n_s32 (__a, __imm);
-}
-
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq (int32x4_t __a, const int __imm)
@@ -23595,13 +23265,6 @@ __arm_vmvnq_m (int16x8_t __inactive, const int __imm, mve_pred16_t __p)
  return __arm_vmvnq_m_n_s16 (__inactive, __imm, __p);
 }
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_m_n (int16x8_t __a, const int __imm, mve_pred16_t __p)
-{
- return __arm_vorrq_m_n_s16 (__a, __imm, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vqrshrntq (int8x16_t __a, int16x8_t __b, const int __imm)
@@ -23763,13 +23426,6 @@ __arm_vmvnq_m (uint16x8_t __inactive, const int __imm, mve_pred16_t __p)
  return __arm_vmvnq_m_n_u16 (__inactive, __imm, __p);
 }
 
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_m_n (uint16x8_t __a, const int __imm, mve_pred16_t __p)
-{
- return __arm_vorrq_m_n_u16 (__a, __imm, __p);
-}
-
 __extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vqrshruntq (uint8x16_t __a, int16x8_t __b, const int __imm)
@@ -23924,13 +23580,6 @@ __arm_vmvnq_m (int32x4_t __inactive, const int __imm, mve_pred16_t __p)
  return __arm_vmvnq_m_n_s32 (__inactive, __imm, __p);
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_m_n (int32x4_t __a, const int __imm, mve_pred16_t __p)
-{
- return __arm_vorrq_m_n_s32 (__a, __imm, __p);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vqrshrntq (int16x8_t __a, int32x4_t __b, const int __imm)
@@ -24092,13 +23741,6 @@ __arm_vmvnq_m (uint32x4_t __inactive, const int __imm, mve_pred16_t __p)
  return __arm_vmvnq_m_n_u32 (__inactive, __imm, __p);
 }
 
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_m_n (uint32x4_t __a, const int __imm, mve_pred16_t __p)
-{
- return __arm_vorrq_m_n_u32 (__a, __imm, __p);
-}
-
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vqrshruntq (uint16x8_t __a, int32x4_t __b, const int __imm)
@@ -25254,48 +24896,6 @@ __arm_vornq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16
  return __arm_vornq_m_u16 (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vorrq_m_s8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vorrq_m_s32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vorrq_m_s16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_m (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vorrq_m_u8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_m (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vorrq_m_u32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vorrq_m_u16 (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vqaddq_m (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
@@ -29216,48 +28816,6 @@ __arm_vornq_x (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
  return __arm_vornq_x_u32 (__a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vorrq_x_s8 (__a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_x (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vorrq_x_s16 (__a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_x (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vorrq_x_s32 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_x (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vorrq_x_u8 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_x (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vorrq_x_u16 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_x (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vorrq_x_u32 (__a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vrev16q_x (int8x16_t __a, mve_pred16_t __p)
@@ -30415,13 +29973,6 @@ __arm_vcmpeqq (float16x8_t __a, float16x8_t __b)
  return __arm_vcmpeqq_f16 (__a, __b);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq (float16x8_t __a, float16x8_t __b)
-{
- return __arm_vorrq_f16 (__a, __b);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq (float16x8_t __a, float16x8_t __b)
@@ -30625,13 +30176,6 @@ __arm_vcmpeqq (float32x4_t __a, float32x4_t __b)
  return __arm_vcmpeqq_f32 (__a, __b);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq (float32x4_t __a, float32x4_t __b)
-{
- return __arm_vorrq_f32 (__a, __b);
-}
-
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq (float32x4_t __a, float32x4_t __b)
@@ -31808,20 +31352,6 @@ __arm_vornq_m (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pre
  return __arm_vornq_m_f16 (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_m (float32x4_t __inactive, float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vorrq_m_f32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_m (float16x8_t __inactive, float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vorrq_m_f16 (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vld1q (float32_t const * __base)
@@ -32354,20 +31884,6 @@ __arm_vornq_x (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
  return __arm_vornq_x_f32 (__a, __b, __p);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_x (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vorrq_x_f16 (__a, __b, __p);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vorrq_x (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vorrq_x_f32 (__a, __b, __p);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vrev32q_x (float16x8_t __a, mve_pred16_t __p)
@@ -32928,18 +32444,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t]: __arm_vcvtq_n_f16_u16 (__ARM_mve_coerce(__p0, uint16x8_t), p1), \
   int (*)[__ARM_mve_type_uint32x4_t]: __arm_vcvtq_n_f32_u32 (__ARM_mve_coerce(__p0, uint32x4_t), p1));})
 
-#define __arm_vorrq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vorrq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vorrq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vorrq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vorrq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vorrq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vorrq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vorrq_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t)), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vorrq_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t)));})
-
 #define __arm_vabdq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
@@ -34467,19 +33971,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vornq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vornq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
 
-#define __arm_vorrq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vorrq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vorrq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vorrq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vorrq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vorrq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vorrq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vorrq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vorrq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
-
 #define __arm_vld1q(p0) (\
   _Generic( (int (*)[__ARM_mve_typeid(p0)])0, \
   int (*)[__ARM_mve_type_int8_t_ptr]: __arm_vld1q_s8 (__ARM_mve_coerce1(p0, int8_t *)), \
@@ -34923,18 +34414,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vornq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vornq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
 
-#define __arm_vorrq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vorrq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vorrq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vorrq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vorrq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vorrq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vorrq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
-  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vorrq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
-  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vorrq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
-
 #define __arm_vrev32q_x(p1,p2) ({ __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p1)])0, \
   int (*)[__ARM_mve_type_int8x16_t]: __arm_vrev32q_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), p2), \
@@ -35321,16 +34800,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vqaddq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vqaddq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
 
-#define __arm_vorrq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vorrq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vorrq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vorrq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vorrq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vorrq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vorrq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
-
 #define __arm_vornq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
@@ -36244,17 +35713,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vornq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vornq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
 
-#define __arm_vorrq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vorrq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vorrq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vorrq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vorrq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vorrq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vorrq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
-
 #define __arm_vstrwq_scatter_base(p0,p1,p2) ({ __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p2)])0, \
   int (*)[__ARM_mve_type_int32x4_t]: __arm_vstrwq_scatter_base_s32(p0, p1, __ARM_mve_coerce(__p2, int32x4_t)), \
@@ -36590,16 +36048,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vornq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vornq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
 
-#define __arm_vorrq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vorrq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vorrq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vorrq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vorrq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vorrq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vorrq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
-
 #define __arm_vrev32q_x(p1,p2) ({ __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p1)])0, \
   int (*)[__ARM_mve_type_int8x16_t]: __arm_vrev32q_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), p2), \
@@ -37378,13 +36826,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmvnq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce1(__p1, int) , p2), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmvnq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce1(__p1, int) , p2));})
 
-#define __arm_vorrq_m_n(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vorrq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), p1, p2), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vorrq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), p1, p2), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vorrq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), p1, p2), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vorrq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), p1, p2));})
-
 #define __arm_vqshrunbq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-- 
2.34.1



* [PATCH 14/22] arm: [MVE intrinsics] add unspec_mve_function_exact_insn
  2023-04-18 13:45 [PATCH 00/22] arm: New framework for MVE intrinsics Christophe Lyon
                   ` (12 preceding siblings ...)
  2023-04-18 13:45 ` [PATCH 13/22] arm: [MVE intrinsics] rework vorrq Christophe Lyon
@ 2023-04-18 13:46 ` Christophe Lyon
  2023-05-03  8:40   ` Kyrylo Tkachov
  2023-04-18 13:46 ` [PATCH 15/22] arm: [MVE intrinsics] add create shape Christophe Lyon
                   ` (8 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Christophe Lyon @ 2023-04-18 13:46 UTC (permalink / raw)
  To: gcc-patches, kyrylo.tkachov, richard.earnshaw, richard.sandiford
  Cc: Christophe Lyon

Introduce a function class that will be used to build intrinsics whose
variants (plain, _n, and the _m/_x predicated forms) are all
implemented via unspecs.
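
For illustration only (a sketch, not part of this patch): with this
class in place, a later patch in the series can describe an intrinsic
with a single FUNCTION line in arm-mve-builtins-base.cc, passing -1
for the variants that do not exist.  For vcreateq, which has no _n,
_m or _x forms, that would look roughly like:

  /* 12 unspecs: {plain,_n,_m,_m_n} x {signed,unsigned,float};
     -1 marks a variant the intrinsic does not have.  */
  FUNCTION (vcreateq, unspec_mve_function_exact_insn,
	    (VCREATEQ_S, VCREATEQ_U, VCREATEQ_F,
	     -1, -1, -1, -1, -1, -1, -1, -1, -1))

expand () then picks the matching code_for_mve_q* pattern from e.pred
and e.mode_suffix_id at expansion time.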

2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>

	gcc/
	* config/arm/arm-mve-builtins-functions.h (class
	unspec_mve_function_exact_insn): New.
---
 gcc/config/arm/arm-mve-builtins-functions.h | 151 ++++++++++++++++++++
 1 file changed, 151 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-functions.h b/gcc/config/arm/arm-mve-builtins-functions.h
index 6d992b270b0..5abf913d182 100644
--- a/gcc/config/arm/arm-mve-builtins-functions.h
+++ b/gcc/config/arm/arm-mve-builtins-functions.h
@@ -225,6 +225,157 @@ public:
   }
 };
 
+/* Map the function directly to CODE (UNSPEC, M) where M is the vector
+   mode associated with type suffix 0.  */
+class unspec_mve_function_exact_insn : public function_base
+{
+public:
+  CONSTEXPR unspec_mve_function_exact_insn (int unspec_for_sint,
+					    int unspec_for_uint,
+					    int unspec_for_fp,
+					    int unspec_for_n_sint,
+					    int unspec_for_n_uint,
+					    int unspec_for_n_fp,
+					    int unspec_for_m_sint,
+					    int unspec_for_m_uint,
+					    int unspec_for_m_fp,
+					    int unspec_for_m_n_sint,
+					    int unspec_for_m_n_uint,
+					    int unspec_for_m_n_fp)
+    : m_unspec_for_sint (unspec_for_sint),
+      m_unspec_for_uint (unspec_for_uint),
+      m_unspec_for_fp (unspec_for_fp),
+      m_unspec_for_n_sint (unspec_for_n_sint),
+      m_unspec_for_n_uint (unspec_for_n_uint),
+      m_unspec_for_n_fp (unspec_for_n_fp),
+      m_unspec_for_m_sint (unspec_for_m_sint),
+      m_unspec_for_m_uint (unspec_for_m_uint),
+      m_unspec_for_m_fp (unspec_for_m_fp),
+      m_unspec_for_m_n_sint (unspec_for_m_n_sint),
+      m_unspec_for_m_n_uint (unspec_for_m_n_uint),
+      m_unspec_for_m_n_fp (unspec_for_m_n_fp)
+  {}
+
+  /* The unspec code associated with signed-integer, unsigned-integer
+     and floating-point operations respectively.  It covers the cases
+     with the _n suffix, and/or the _m predicate.  */
+  int m_unspec_for_sint;
+  int m_unspec_for_uint;
+  int m_unspec_for_fp;
+  int m_unspec_for_n_sint;
+  int m_unspec_for_n_uint;
+  int m_unspec_for_n_fp;
+  int m_unspec_for_m_sint;
+  int m_unspec_for_m_uint;
+  int m_unspec_for_m_fp;
+  int m_unspec_for_m_n_sint;
+  int m_unspec_for_m_n_uint;
+  int m_unspec_for_m_n_fp;
+
+  rtx
+  expand (function_expander &e) const override
+  {
+    insn_code code;
+    switch (e.pred)
+      {
+      case PRED_none:
+	switch (e.mode_suffix_id)
+	  {
+	  case MODE_none:
+	    /* No predicate, no suffix.  */
+	    if (e.type_suffix (0).integer_p)
+	      if (e.type_suffix (0).unsigned_p)
+		code = code_for_mve_q (m_unspec_for_uint, m_unspec_for_uint, e.vector_mode (0));
+	      else
+		code = code_for_mve_q (m_unspec_for_sint, m_unspec_for_sint, e.vector_mode (0));
+	    else
+	      code = code_for_mve_q_f (m_unspec_for_fp, e.vector_mode (0));
+	    break;
+
+	  case MODE_n:
+	    /* No predicate, _n suffix.  */
+	    if (e.type_suffix (0).integer_p)
+	      if (e.type_suffix (0).unsigned_p)
+		code = code_for_mve_q_n (m_unspec_for_n_uint, m_unspec_for_n_uint, e.vector_mode (0));
+	      else
+		code = code_for_mve_q_n (m_unspec_for_n_sint, m_unspec_for_n_sint, e.vector_mode (0));
+	    else
+	      code = code_for_mve_q_n_f (m_unspec_for_n_fp, e.vector_mode (0));
+	    break;
+
+	  default:
+	    gcc_unreachable ();
+	  }
+	return e.use_exact_insn (code);
+
+      case PRED_m:
+	switch (e.mode_suffix_id)
+	  {
+	  case MODE_none:
+	    /* No suffix, "m" predicate.  */
+	    if (e.type_suffix (0).integer_p)
+	      if (e.type_suffix (0).unsigned_p)
+		code = code_for_mve_q_m (m_unspec_for_m_uint, m_unspec_for_m_uint, e.vector_mode (0));
+	      else
+		code = code_for_mve_q_m (m_unspec_for_m_sint, m_unspec_for_m_sint, e.vector_mode (0));
+	    else
+	      code = code_for_mve_q_m_f (m_unspec_for_m_fp, e.vector_mode (0));
+	    break;
+
+	  case MODE_n:
+	    /* _n suffix, "m" predicate.  */
+	    if (e.type_suffix (0).integer_p)
+	      if (e.type_suffix (0).unsigned_p)
+		code = code_for_mve_q_m_n (m_unspec_for_m_n_uint, m_unspec_for_m_n_uint, e.vector_mode (0));
+	      else
+		code = code_for_mve_q_m_n (m_unspec_for_m_n_sint, m_unspec_for_m_n_sint, e.vector_mode (0));
+	    else
+	      code = code_for_mve_q_m_n_f (m_unspec_for_m_n_fp, e.vector_mode (0));
+	    break;
+
+	  default:
+	    gcc_unreachable ();
+	  }
+	return e.use_cond_insn (code, 0);
+
+      case PRED_x:
+	switch (e.mode_suffix_id)
+	  {
+	  case MODE_none:
+	    /* No suffix, "x" predicate.  */
+	    if (e.type_suffix (0).integer_p)
+	      if (e.type_suffix (0).unsigned_p)
+		code = code_for_mve_q_m (m_unspec_for_m_uint, m_unspec_for_m_uint, e.vector_mode (0));
+	      else
+		code = code_for_mve_q_m (m_unspec_for_m_sint, m_unspec_for_m_sint, e.vector_mode (0));
+	    else
+	      code = code_for_mve_q_m_f (m_unspec_for_m_fp, e.vector_mode (0));
+	    break;
+
+	  case MODE_n:
+	    /* _n suffix, "x" predicate.  */
+	    if (e.type_suffix (0).integer_p)
+	      if (e.type_suffix (0).unsigned_p)
+		code = code_for_mve_q_m_n (m_unspec_for_m_n_uint, m_unspec_for_m_n_uint, e.vector_mode (0));
+	      else
+		code = code_for_mve_q_m_n (m_unspec_for_m_n_sint, m_unspec_for_m_n_sint, e.vector_mode (0));
+	    else
+	      code = code_for_mve_q_m_n_f (m_unspec_for_m_n_fp, e.vector_mode (0));
+	    break;
+
+	  default:
+	    gcc_unreachable ();
+	  }
+	return e.use_pred_x_insn (code);
+
+      default:
+	gcc_unreachable ();
+      }
+
+    gcc_unreachable ();
+  }
+};
+
 } /* end namespace arm_mve */
 
 /* Declare the global function base NAME, creating it from an instance
-- 
2.34.1



* [PATCH 15/22] arm: [MVE intrinsics] add create shape
  2023-04-18 13:45 [PATCH 00/22] arm: New framework for MVE intrinsics Christophe Lyon
                   ` (13 preceding siblings ...)
  2023-04-18 13:46 ` [PATCH 14/22] arm: [MVE intrinsics] add unspec_mve_function_exact_insn Christophe Lyon
@ 2023-04-18 13:46 ` Christophe Lyon
  2023-05-03  8:40   ` Kyrylo Tkachov
  2023-04-18 13:46 ` [PATCH 16/22] arm: [MVE intrinsics] factorize vcreateq Christophe Lyon
                   ` (7 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Christophe Lyon @ 2023-04-18 13:46 UTC (permalink / raw)
  To: gcc-patches, kyrylo.tkachov, richard.earnshaw, richard.sandiford
  Cc: Christophe Lyon

This patch adds the create shape description.
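
As a hint of how the shape is consumed (a sketch; the exact type-set
name is an assumption, the real one comes with the later vcreateq
rework): a subsequent patch ties an intrinsic to this shape in
arm-mve-builtins-base.def with a one-line entry such as:

  DEF_MVE_FUNCTION (vcreateq, create, all_integer_with_64, none)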

2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>

	gcc/
	* config/arm/arm-mve-builtins-shapes.cc (create): New.
	* config/arm/arm-mve-builtins-shapes.h (create): New.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 22 ++++++++++++++++++++++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
 2 files changed, 23 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
index 83410bbc51a..e4a42005852 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -458,6 +458,28 @@ struct binary_orrq_def : public overloaded_base<0>
 };
 SHAPE (binary_orrq)
 
+/* <T0>xN_t vfoo[_t0](uint64_t, uint64_t)
+
+   where there are N arguments in total.
+   Example: vcreateq.
+   int16x8_t [__arm_]vcreateq_s16(uint64_t a, uint64_t b)  */
+struct create_def : public nonoverloaded_base
+{
+  void
+  build (function_builder &b, const function_group_info &group,
+	 bool preserve_user_namespace) const override
+  {
+    build_all (b, "v0,su64,su64", group, MODE_none, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+    return r.resolve_uniform (0, 2);
+  }
+};
+SHAPE (create)
+
 /* <T0>[xN]_t vfoo_t0().
 
    Example: vuninitializedq.
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
index 618b3226050..3305d12877a 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -37,6 +37,7 @@ namespace arm_mve
     extern const function_shape *const binary;
     extern const function_shape *const binary_opt_n;
     extern const function_shape *const binary_orrq;
+    extern const function_shape *const create;
     extern const function_shape *const inherent;
     extern const function_shape *const unary_convert;
 
-- 
2.34.1


* [PATCH 16/22] arm: [MVE intrinsics] factorize vcreateq
  2023-04-18 13:45 [PATCH 00/22] arm: New framework for MVE intrinsics Christophe Lyon
                   ` (14 preceding siblings ...)
  2023-04-18 13:46 ` [PATCH 15/22] arm: [MVE intrinsics] add create shape Christophe Lyon
@ 2023-04-18 13:46 ` Christophe Lyon
  2023-05-03  8:42   ` Kyrylo Tkachov
  2023-04-18 13:46 ` [PATCH 17/22] arm: [MVE intrinsics] rework vcreateq Christophe Lyon
                   ` (6 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Christophe Lyon @ 2023-04-18 13:46 UTC (permalink / raw)
  To: gcc-patches, kyrylo.tkachov, richard.earnshaw, richard.sandiford
  Cc: Christophe Lyon

We need a 'fake' iterator (MVE_FP_CREATE, with VCREATEQ_F as its only
entry) to be able to use <mve_insn> in the vcreateq_f pattern name.
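
For context, here is what the rename buys us, sketched as a fragment of
GCC internals (not part of this patch, and only meaningful inside the
builtin expander): once the pattern is named "@mve_<mve_insn>q_f<mode>",
GCC generates a code_for_mve_q_f helper, so the expander can select the
insn from the unspec and the mode:

  /* Hypothetical call site; with the old name, only a per-intrinsic
     code_for_mve_vcreateq_f could have been generated.  */
  insn_code icode = code_for_mve_q_f (VCREATEQ_F, V8HFmode);

The single-entry MVE_FP_CREATE iterator exists only so that <mve_insn>
resolves to "vcreate" for VCREATEQ_F.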

2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>

	gcc/
	* config/arm/iterators.md (MVE_FP_CREATE): New.
	(mve_insn): Add VCREATEQ_S, VCREATEQ_U, VCREATEQ_F.
	* config/arm/mve.md (mve_vcreateq_f<mode>): Rename into ...
	(@mve_<mve_insn>q_f<mode>): ... this.
	(mve_vcreateq_<supf><mode>): Rename into ...
	(@mve_<mve_insn>q_<supf><mode>): ... this.
---
 gcc/config/arm/iterators.md | 5 +++++
 gcc/config/arm/mve.md       | 6 +++---
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index b0ea1af77d2..5a531d77a33 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -393,6 +393,10 @@ (define_int_iterator MVE_FP_N_BINARY   [
 		     VSUBQ_N_F
 		     ])
 
+(define_int_iterator MVE_FP_CREATE [
+		     VCREATEQ_F
+		     ])
+
 (define_code_attr mve_addsubmul [
 		 (minus "vsub")
 		 (mult "vmul")
@@ -407,6 +411,7 @@ (define_int_attr mve_insn [
 		 (VBICQ_M_N_S "vbic") (VBICQ_M_N_U "vbic")
 		 (VBICQ_M_S "vbic") (VBICQ_M_U "vbic") (VBICQ_M_F "vbic")
 		 (VBICQ_N_S "vbic") (VBICQ_N_U "vbic")
+		 (VCREATEQ_S "vcreate") (VCREATEQ_U "vcreate") (VCREATEQ_F "vcreate")
 		 (VEORQ_M_S "veor") (VEORQ_M_U "veor") (VEORQ_M_F "veor")
 		 (VMULQ_M_N_S "vmul") (VMULQ_M_N_U "vmul") (VMULQ_M_N_F "vmul")
 		 (VMULQ_M_S "vmul") (VMULQ_M_U "vmul") (VMULQ_M_F "vmul")
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index fbae1d3791f..f7f0ba65251 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -700,12 +700,12 @@ (define_insn "mve_vcvtq_n_to_f_<supf><mode>"
 
 ;; [vcreateq_f])
 ;;
-(define_insn "mve_vcreateq_f<mode>"
+(define_insn "@mve_<mve_insn>q_f<mode>"
   [
    (set (match_operand:MVE_0 0 "s_register_operand" "=w")
 	(unspec:MVE_0 [(match_operand:DI 1 "s_register_operand" "r")
 		       (match_operand:DI 2 "s_register_operand" "r")]
-	 VCREATEQ_F))
+	 MVE_FP_CREATE))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
   "vmov %q0[2], %q0[0], %Q1, %Q2\;vmov %q0[3], %q0[1], %R1, %R2"
@@ -715,7 +715,7 @@ (define_insn "mve_vcreateq_f<mode>"
 ;;
 ;; [vcreateq_u, vcreateq_s])
 ;;
-(define_insn "mve_vcreateq_<supf><mode>"
+(define_insn "@mve_<mve_insn>q_<supf><mode>"
   [
    (set (match_operand:MVE_1 0 "s_register_operand" "=w")
 	(unspec:MVE_1 [(match_operand:DI 1 "s_register_operand" "r")
-- 
2.34.1


* [PATCH 17/22] arm: [MVE intrinsics] rework vcreateq
  2023-04-18 13:45 [PATCH 00/22] arm: New framework for MVE intrinsics Christophe Lyon
                   ` (15 preceding siblings ...)
  2023-04-18 13:46 ` [PATCH 16/22] arm: [MVE intrinsics] factorize vcreateq Christophe Lyon
@ 2023-04-18 13:46 ` Christophe Lyon
  2023-05-03  8:44   ` Kyrylo Tkachov
  2023-04-18 13:46 ` [PATCH 18/22] arm: [MVE intrinsics] factorize several binary_m operations Christophe Lyon
                   ` (5 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Christophe Lyon @ 2023-04-18 13:46 UTC (permalink / raw)
  To: gcc-patches, kyrylo.tkachov, richard.earnshaw, richard.sandiford
  Cc: Christophe Lyon

Implement vcreateq using the new MVE builtins framework.
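
User-visible behavior should not change; for instance (usage sketch,
not part of the patch, with a made-up function name), this still
expands to the same two-vmov sequence as before:

  #include <arm_mve.h>

  uint32x4_t
  make_u32 (uint64_t a, uint64_t b)
  {
    /* Now resolved through the new framework instead of the inline
       wrappers removed from arm_mve.h below.  */
    return vcreateq_u32 (a, b);
  }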

2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>

	gcc/
	* config/arm/arm-mve-builtins-base.cc (FUNCTION_WITHOUT_M_N): New.
	(vcreateq): New.
	* config/arm/arm-mve-builtins-base.def (vcreateq): New.
	* config/arm/arm-mve-builtins-base.h (vcreateq): New.
	* config/arm/arm_mve.h (vcreateq_f16): Remove.
	(vcreateq_f32): Remove.
	(vcreateq_u8): Remove.
	(vcreateq_u16): Remove.
	(vcreateq_u32): Remove.
	(vcreateq_u64): Remove.
	(vcreateq_s8): Remove.
	(vcreateq_s16): Remove.
	(vcreateq_s32): Remove.
	(vcreateq_s64): Remove.
	(__arm_vcreateq_u8): Remove.
	(__arm_vcreateq_u16): Remove.
	(__arm_vcreateq_u32): Remove.
	(__arm_vcreateq_u64): Remove.
	(__arm_vcreateq_s8): Remove.
	(__arm_vcreateq_s16): Remove.
	(__arm_vcreateq_s32): Remove.
	(__arm_vcreateq_s64): Remove.
	(__arm_vcreateq_f16): Remove.
	(__arm_vcreateq_f32): Remove.
---
 gcc/config/arm/arm-mve-builtins-base.cc  | 10 +++
 gcc/config/arm/arm-mve-builtins-base.def |  2 +
 gcc/config/arm/arm-mve-builtins-base.h   |  1 +
 gcc/config/arm/arm_mve.h                 | 80 ------------------------
 4 files changed, 13 insertions(+), 80 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index 499a1ef9f0e..9722c861faf 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -106,8 +106,18 @@ namespace arm_mve {
     UNSPEC##_M_S, UNSPEC##_M_U, UNSPEC##_M_F,				\
     UNSPEC##_M_N_S, UNSPEC##_M_N_U, -1))
 
+  /* Helper for builtins without RTX codes, and with no _m predicated
+     or _n overrides.  */
+#define FUNCTION_WITHOUT_M_N(NAME, UNSPEC) FUNCTION			\
+  (NAME, unspec_mve_function_exact_insn,				\
+   (UNSPEC##_S, UNSPEC##_U, UNSPEC##_F,					\
+    -1, -1, -1,								\
+    -1, -1, -1,								\
+    -1, -1, -1))
+
 FUNCTION_WITH_RTX_M_N (vaddq, PLUS, VADDQ)
 FUNCTION_WITH_RTX_M (vandq, AND, VANDQ)
+FUNCTION_WITHOUT_M_N (vcreateq, VCREATEQ)
 FUNCTION_WITH_RTX_M (veorq, XOR, VEORQ)
 FUNCTION_WITH_RTX_M_N (vmulq, MULT, VMULQ)
 FUNCTION_WITH_RTX_M_N_NO_N_F (vorrq, IOR, VORRQ)
diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
index c3f8c0f0eeb..1bfd15f973c 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -20,6 +20,7 @@
 #define REQUIRES_FLOAT false
 DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vandq, binary, all_integer, mx_or_none)
+DEF_MVE_FUNCTION (vcreateq, create, all_integer_with_64, none)
 DEF_MVE_FUNCTION (veorq, binary, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vmulq, binary_opt_n, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vorrq, binary_orrq, all_integer, mx_or_none)
@@ -31,6 +32,7 @@ DEF_MVE_FUNCTION (vuninitializedq, inherent, all_integer_with_64, none)
 #define REQUIRES_FLOAT true
 DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vandq, binary, all_float, mx_or_none)
+DEF_MVE_FUNCTION (vcreateq, create, all_float, none)
 DEF_MVE_FUNCTION (veorq, binary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vmulq, binary_opt_n, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vorrq, binary_orrq, all_float, mx_or_none)
diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h
index c450b373239..8dd6bff01bf 100644
--- a/gcc/config/arm/arm-mve-builtins-base.h
+++ b/gcc/config/arm/arm-mve-builtins-base.h
@@ -25,6 +25,7 @@ namespace functions {
 
 extern const function_base *const vaddq;
 extern const function_base *const vandq;
+extern const function_base *const vcreateq;
 extern const function_base *const veorq;
 extern const function_base *const vmulq;
 extern const function_base *const vorrq;
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index edf8e247421..4810e2977d3 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -638,20 +638,10 @@
 #define vcvtq_n_f32_s32(__a,  __imm6) __arm_vcvtq_n_f32_s32(__a,  __imm6)
 #define vcvtq_n_f16_u16(__a,  __imm6) __arm_vcvtq_n_f16_u16(__a,  __imm6)
 #define vcvtq_n_f32_u32(__a,  __imm6) __arm_vcvtq_n_f32_u32(__a,  __imm6)
-#define vcreateq_f16(__a, __b) __arm_vcreateq_f16(__a, __b)
-#define vcreateq_f32(__a, __b) __arm_vcreateq_f32(__a, __b)
 #define vcvtq_n_s16_f16(__a,  __imm6) __arm_vcvtq_n_s16_f16(__a,  __imm6)
 #define vcvtq_n_s32_f32(__a,  __imm6) __arm_vcvtq_n_s32_f32(__a,  __imm6)
 #define vcvtq_n_u16_f16(__a,  __imm6) __arm_vcvtq_n_u16_f16(__a,  __imm6)
 #define vcvtq_n_u32_f32(__a,  __imm6) __arm_vcvtq_n_u32_f32(__a,  __imm6)
-#define vcreateq_u8(__a, __b) __arm_vcreateq_u8(__a, __b)
-#define vcreateq_u16(__a, __b) __arm_vcreateq_u16(__a, __b)
-#define vcreateq_u32(__a, __b) __arm_vcreateq_u32(__a, __b)
-#define vcreateq_u64(__a, __b) __arm_vcreateq_u64(__a, __b)
-#define vcreateq_s8(__a, __b) __arm_vcreateq_s8(__a, __b)
-#define vcreateq_s16(__a, __b) __arm_vcreateq_s16(__a, __b)
-#define vcreateq_s32(__a, __b) __arm_vcreateq_s32(__a, __b)
-#define vcreateq_s64(__a, __b) __arm_vcreateq_s64(__a, __b)
 #define vshrq_n_s8(__a,  __imm) __arm_vshrq_n_s8(__a,  __imm)
 #define vshrq_n_s16(__a,  __imm) __arm_vshrq_n_s16(__a,  __imm)
 #define vshrq_n_s32(__a,  __imm) __arm_vshrq_n_s32(__a,  __imm)
@@ -3222,62 +3212,6 @@ __arm_vpnot (mve_pred16_t __a)
   return __builtin_mve_vpnotv16bi (__a);
 }
 
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcreateq_u8 (uint64_t __a, uint64_t __b)
-{
-  return __builtin_mve_vcreateq_uv16qi (__a, __b);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcreateq_u16 (uint64_t __a, uint64_t __b)
-{
-  return __builtin_mve_vcreateq_uv8hi (__a, __b);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcreateq_u32 (uint64_t __a, uint64_t __b)
-{
-  return __builtin_mve_vcreateq_uv4si (__a, __b);
-}
-
-__extension__ extern __inline uint64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcreateq_u64 (uint64_t __a, uint64_t __b)
-{
-  return __builtin_mve_vcreateq_uv2di (__a, __b);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcreateq_s8 (uint64_t __a, uint64_t __b)
-{
-  return __builtin_mve_vcreateq_sv16qi (__a, __b);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcreateq_s16 (uint64_t __a, uint64_t __b)
-{
-  return __builtin_mve_vcreateq_sv8hi (__a, __b);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcreateq_s32 (uint64_t __a, uint64_t __b)
-{
-  return __builtin_mve_vcreateq_sv4si (__a, __b);
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcreateq_s64 (uint64_t __a, uint64_t __b)
-{
-  return __builtin_mve_vcreateq_sv2di (__a, __b);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vshrq_n_s8 (int8x16_t __a, const int __imm)
@@ -15580,20 +15514,6 @@ __arm_vcvtq_n_f32_u32 (uint32x4_t __a, const int __imm6)
   return __builtin_mve_vcvtq_n_to_f_uv4sf (__a, __imm6);
 }
 
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcreateq_f16 (uint64_t __a, uint64_t __b)
-{
-  return __builtin_mve_vcreateq_fv8hf (__a, __b);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vcreateq_f32 (uint64_t __a, uint64_t __b)
-{
-  return __builtin_mve_vcreateq_fv4sf (__a, __b);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcvtq_n_s16_f16 (float16x8_t __a, const int __imm6)
-- 
2.34.1


* [PATCH 18/22] arm: [MVE intrinsics] factorize several binary_m operations
  2023-04-18 13:45 [PATCH 00/22] arm: New framework for MVE intrinsics Christophe Lyon
                   ` (16 preceding siblings ...)
  2023-04-18 13:46 ` [PATCH 17/22] arm: [MVE intrinsics] rework vcreateq Christophe Lyon
@ 2023-04-18 13:46 ` Christophe Lyon
  2023-05-03  8:46   ` Kyrylo Tkachov
  2023-04-18 13:46 ` [PATCH 19/22] arm: [MVE intrinsics] factorize several binary _n operations Christophe Lyon
                   ` (4 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Christophe Lyon @ 2023-04-18 13:46 UTC (permalink / raw)
  To: gcc-patches, kyrylo.tkachov, richard.earnshaw, richard.sandiford
  Cc: Christophe Lyon

Factorize m-predicated versions of vabdq, vhaddq, vhsubq, vmaxq,
vminq, vmulhq, vqaddq, vqdmladhq, vqdmladhxq, vqdmlsdhq, vqdmlsdhxq,
vqdmulhq, vqrdmladhq, vqrdmladhxq, vqrdmlsdhq, vqrdmlsdhxq, vqrdmulhq,
vqrshlq, vqshlq, vqsubq, vrhaddq, vrmulhq, vrshlq, vshlq
so that they use the same pattern.
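
All of these share the "vpst" plus "t"-suffixed instruction form, which
is why one pattern suffices. For example (usage sketch, not part of the
patch, with a made-up wrapper name):

  #include <arm_mve.h>

  int8x16_t
  abd_m (int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
  {
    /* Lanes where p is clear take their value from 'inactive'.  */
    return vabdq_m_s8 (inactive, a, b, p);
  }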

2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>

	gcc/
	* config/arm/iterators.md (MVE_INT_SU_M_BINARY): New.
	(mve_insn): Add vabdq, vhaddq, vhsubq, vmaxq, vminq, vmulhq,
	vqaddq, vqdmladhq, vqdmladhxq, vqdmlsdhq, vqdmlsdhxq, vqdmulhq,
	vqrdmladhq, vqrdmladhxq, vqrdmlsdhq, vqrdmlsdhxq, vqrdmulhq,
	vqrshlq, vqshlq, vqsubq, vrhaddq, vrmulhq, vrshlq, vshlq.
	(supf): Add VQDMLADHQ_M_S, VQDMLADHXQ_M_S, VQDMLSDHQ_M_S,
	VQDMLSDHXQ_M_S, VQDMULHQ_M_S, VQRDMLADHQ_M_S, VQRDMLADHXQ_M_S,
	VQRDMLSDHQ_M_S, VQRDMLSDHXQ_M_S, VQRDMULHQ_M_S.
	* config/arm/mve.md (@mve_<mve_insn>q_m_<supf><mode>): New.
	(mve_vshlq_m_<supf><mode>): Merged into
	@mve_<mve_insn>q_m_<supf><mode>.
	(mve_vabdq_m_<supf><mode>): Likewise.
	(mve_vhaddq_m_<supf><mode>): Likewise.
	(mve_vhsubq_m_<supf><mode>): Likewise.
	(mve_vmaxq_m_<supf><mode>): Likewise.
	(mve_vminq_m_<supf><mode>): Likewise.
	(mve_vmulhq_m_<supf><mode>): Likewise.
	(mve_vqaddq_m_<supf><mode>): Likewise.
	(mve_vqrshlq_m_<supf><mode>): Likewise.
	(mve_vqshlq_m_<supf><mode>): Likewise.
	(mve_vqsubq_m_<supf><mode>): Likewise.
	(mve_vrhaddq_m_<supf><mode>): Likewise.
	(mve_vrmulhq_m_<supf><mode>): Likewise.
	(mve_vrshlq_m_<supf><mode>): Likewise.
	(mve_vqdmladhq_m_s<mode>): Likewise.
	(mve_vqdmladhxq_m_s<mode>): Likewise.
	(mve_vqdmlsdhq_m_s<mode>): Likewise.
	(mve_vqdmlsdhxq_m_s<mode>): Likewise.
	(mve_vqdmulhq_m_s<mode>): Likewise.
	(mve_vqrdmladhq_m_s<mode>): Likewise.
	(mve_vqrdmladhxq_m_s<mode>): Likewise.
	(mve_vqrdmlsdhq_m_s<mode>): Likewise.
	(mve_vqrdmlsdhxq_m_s<mode>): Likewise.
	(mve_vqrdmulhq_m_s<mode>): Likewise.
---
 gcc/config/arm/iterators.md |  65 +++++-
 gcc/config/arm/mve.md       | 420 +++---------------------------------
 2 files changed, 91 insertions(+), 394 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 5a531d77a33..18d70350bbe 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -339,6 +339,33 @@ (define_int_iterator MVE_INT_M_BINARY   [
 		     VSUBQ_M_S VSUBQ_M_U
 		     ])
 
+(define_int_iterator MVE_INT_SU_M_BINARY   [
+		     VABDQ_M_S VABDQ_M_U
+		     VHADDQ_M_S VHADDQ_M_U
+		     VHSUBQ_M_S VHSUBQ_M_U
+		     VMAXQ_M_S VMAXQ_M_U
+		     VMINQ_M_S VMINQ_M_U
+		     VMULHQ_M_S VMULHQ_M_U
+		     VQADDQ_M_S VQADDQ_M_U
+		     VQDMLADHQ_M_S
+		     VQDMLADHXQ_M_S
+		     VQDMLSDHQ_M_S
+		     VQDMLSDHXQ_M_S
+		     VQDMULHQ_M_S
+		     VQRDMLADHQ_M_S
+		     VQRDMLADHXQ_M_S
+		     VQRDMLSDHQ_M_S
+		     VQRDMLSDHXQ_M_S
+		     VQRDMULHQ_M_S
+		     VQRSHLQ_M_S VQRSHLQ_M_U
+		     VQSHLQ_M_S VQSHLQ_M_U
+		     VQSUBQ_M_S VQSUBQ_M_U
+		     VRHADDQ_M_S VRHADDQ_M_U
+		     VRMULHQ_M_S VRMULHQ_M_U
+		     VRSHLQ_M_S VRSHLQ_M_U
+		     VSHLQ_M_S VSHLQ_M_U
+		     ])
+
 (define_int_iterator MVE_INT_M_BINARY_LOGIC   [
 		     VANDQ_M_S VANDQ_M_U
 		     VBICQ_M_S VBICQ_M_U
@@ -404,6 +431,7 @@ (define_code_attr mve_addsubmul [
 		 ])
 
 (define_int_attr mve_insn [
+		 (VABDQ_M_S "vabd") (VABDQ_M_U "vabd")
 		 (VADDQ_M_N_S "vadd") (VADDQ_M_N_U "vadd") (VADDQ_M_N_F "vadd")
 		 (VADDQ_M_S "vadd") (VADDQ_M_U "vadd") (VADDQ_M_F "vadd")
 		 (VADDQ_N_S "vadd") (VADDQ_N_U "vadd") (VADDQ_N_F "vadd")
@@ -413,12 +441,35 @@ (define_int_attr mve_insn [
 		 (VBICQ_N_S "vbic") (VBICQ_N_U "vbic")
 		 (VCREATEQ_S "vcreate") (VCREATEQ_U "vcreate") (VCREATEQ_F "vcreate")
 		 (VEORQ_M_S "veor") (VEORQ_M_U "veor") (VEORQ_M_F "veor")
+		 (VHADDQ_M_S "vhadd") (VHADDQ_M_U "vhadd")
+		 (VHSUBQ_M_S "vhsub") (VHSUBQ_M_U "vhsub")
+		 (VMAXQ_M_S "vmax") (VMAXQ_M_U "vmax")
+		 (VMINQ_M_S "vmin") (VMINQ_M_U "vmin")
+		 (VMULHQ_M_S "vmulh") (VMULHQ_M_U "vmulh")
 		 (VMULQ_M_N_S "vmul") (VMULQ_M_N_U "vmul") (VMULQ_M_N_F "vmul")
 		 (VMULQ_M_S "vmul") (VMULQ_M_U "vmul") (VMULQ_M_F "vmul")
 		 (VMULQ_N_S "vmul") (VMULQ_N_U "vmul") (VMULQ_N_F "vmul")
 		 (VORRQ_M_N_S "vorr") (VORRQ_M_N_U "vorr")
 		 (VORRQ_M_S "vorr") (VORRQ_M_U "vorr") (VORRQ_M_F "vorr")
 		 (VORRQ_N_S "vorr") (VORRQ_N_U "vorr")
+		 (VQADDQ_M_S "vqadd") (VQADDQ_M_U "vqadd")
+		 (VQDMLADHQ_M_S "vqdmladh")
+		 (VQDMLADHXQ_M_S "vqdmladhx")
+		 (VQDMLSDHQ_M_S "vqdmlsdh")
+		 (VQDMLSDHXQ_M_S "vqdmlsdhx")
+		 (VQDMULHQ_M_S "vqdmulh")
+		 (VQRDMLADHQ_M_S "vqrdmladh")
+		 (VQRDMLADHXQ_M_S "vqrdmladhx")
+		 (VQRDMLSDHQ_M_S "vqrdmlsdh")
+		 (VQRDMLSDHXQ_M_S "vqrdmlsdhx")
+		 (VQRDMULHQ_M_S "vqrdmulh")
+		 (VQRSHLQ_M_S "vqrshl") (VQRSHLQ_M_U "vqrshl")
+		 (VQSHLQ_M_S "vqshl") (VQSHLQ_M_U "vqshl")
+		 (VQSUBQ_M_S "vqsub") (VQSUBQ_M_U "vqsub")
+		 (VRHADDQ_M_S "vrhadd") (VRHADDQ_M_U "vrhadd")
+		 (VRMULHQ_M_S "vrmulh") (VRMULHQ_M_U "vrmulh")
+		 (VRSHLQ_M_S "vrshl") (VRSHLQ_M_U "vrshl")
+		 (VSHLQ_M_S "vshl") (VSHLQ_M_U "vshl")
 		 (VSUBQ_M_N_S "vsub") (VSUBQ_M_N_U "vsub") (VSUBQ_M_N_F "vsub")
 		 (VSUBQ_M_S "vsub") (VSUBQ_M_U "vsub") (VSUBQ_M_F "vsub")
 		 (VSUBQ_N_S "vsub") (VSUBQ_N_U "vsub") (VSUBQ_N_F "vsub")
@@ -1557,7 +1608,19 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
 		       (VADCIQ_U "u") (VADCIQ_M_U "u") (VADCIQ_S "s")
 		       (VADCIQ_M_S "s") (SQRSHRL_64 "64") (SQRSHRL_48 "48")
 		       (UQRSHLL_64 "64") (UQRSHLL_48 "48") (VSHLCQ_M_S "s")
-		       (VSHLCQ_M_U "u")])
+		       (VSHLCQ_M_U "u")
+		       (VQDMLADHQ_M_S "s")
+		       (VQDMLADHXQ_M_S "s")
+		       (VQDMLSDHQ_M_S "s")
+		       (VQDMLSDHXQ_M_S "s")
+		       (VQDMULHQ_M_S "s")
+		       (VQRDMLADHQ_M_S "s")
+		       (VQRDMLADHXQ_M_S "s")
+		       (VQRDMLSDHQ_M_S "s")
+		       (VQRDMLSDHXQ_M_S "s")
+		       (VQRDMULHQ_M_S "s")
+		       ])
+
 ;; Both kinds of return insn.
 (define_code_iterator RETURNS [return simple_return])
 (define_code_attr return_str [(return "") (simple_return "simple_")])
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index f7f0ba65251..21c54197db5 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -4867,23 +4867,6 @@ (define_insn "mve_vqshluq_m_n_s<mode>"
   [(set_attr "type" "mve_move")
    (set_attr "length" "8")])
 
-;;
-;; [vshlq_m_s, vshlq_m_u])
-;;
-(define_insn "mve_vshlq_m_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VSHLQ_M))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vshlt.<supf>%#<V_sz_elem>\t%q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length" "8")])
-
 ;;
 ;; [vsriq_m_n_s, vsriq_m_n_u])
 ;;
@@ -4917,20 +4900,44 @@ (define_insn "mve_vcvtq_m_n_to_f_<supf><mode>"
   "vpst\;vcvtt.f%#<V_sz_elem>.<supf>%#<V_sz_elem>\t%q0, %q2, %3"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
+
 ;;
 ;; [vabdq_m_s, vabdq_m_u])
+;; [vhaddq_m_s, vhaddq_m_u])
+;; [vhsubq_m_s, vhsubq_m_u])
+;; [vmaxq_m_s, vmaxq_m_u])
+;; [vminq_m_s, vminq_m_u])
+;; [vmulhq_m_s, vmulhq_m_u])
+;; [vqaddq_m_u, vqaddq_m_s])
+;; [vqdmladhq_m_s])
+;; [vqdmladhxq_m_s])
+;; [vqdmlsdhq_m_s])
+;; [vqdmlsdhxq_m_s])
+;; [vqdmulhq_m_s])
+;; [vqrdmladhq_m_s])
+;; [vqrdmladhxq_m_s])
+;; [vqrdmlsdhq_m_s])
+;; [vqrdmlsdhxq_m_s])
+;; [vqrdmulhq_m_s])
+;; [vqrshlq_m_u, vqrshlq_m_s])
+;; [vqshlq_m_u, vqshlq_m_s])
+;; [vqsubq_m_u, vqsubq_m_s])
+;; [vrhaddq_m_u, vrhaddq_m_s])
+;; [vrmulhq_m_u, vrmulhq_m_s])
+;; [vrshlq_m_s, vrshlq_m_u])
+;; [vshlq_m_s, vshlq_m_u])
 ;;
-(define_insn "mve_vabdq_m_<supf><mode>"
+(define_insn "@mve_<mve_insn>q_m_<supf><mode>"
   [
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:MVE_2 3 "s_register_operand" "w")
 		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VABDQ_M))
+	 MVE_INT_SU_M_BINARY))
   ]
   "TARGET_HAVE_MVE"
-  "vpst\;vabdt.<supf>%#<V_sz_elem>	%q0, %q2, %q3"
+  "vpst\;<mve_insn>t.<supf>%#<V_sz_elem>\t%q0, %q2, %q3"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
@@ -5060,23 +5067,6 @@ (define_insn "mve_vhaddq_m_n_<supf><mode>"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
-;;
-;; [vhaddq_m_s, vhaddq_m_u])
-;;
-(define_insn "mve_vhaddq_m_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VHADDQ_M))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vhaddt.<supf>%#<V_sz_elem>	%q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
 ;;
 ;; [vhsubq_m_n_s, vhsubq_m_n_u])
 ;;
@@ -5095,56 +5085,6 @@ (define_insn "mve_vhsubq_m_n_<supf><mode>"
    (set_attr "length""8")])
 
 ;;
-;; [vhsubq_m_s, vhsubq_m_u])
-;;
-(define_insn "mve_vhsubq_m_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VHSUBQ_M))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vhsubt.<supf>%#<V_sz_elem>	%q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vmaxq_m_s, vmaxq_m_u])
-;;
-(define_insn "mve_vmaxq_m_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VMAXQ_M))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vmaxt.<supf>%#<V_sz_elem>	%q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vminq_m_s, vminq_m_u])
-;;
-(define_insn "mve_vminq_m_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VMINQ_M))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vmint.<supf>%#<V_sz_elem>	%q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
 ;;
 ;; [vmladavaq_p_u, vmladavaq_p_s])
 ;;
@@ -5196,23 +5136,6 @@ (define_insn "mve_vmlasq_m_n_<supf><mode>"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
-;;
-;; [vmulhq_m_s, vmulhq_m_u])
-;;
-(define_insn "mve_vmulhq_m_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VMULHQ_M))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vmulht.<supf>%#<V_sz_elem>	%q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
 ;;
 ;; [vmullbq_int_m_u, vmullbq_int_m_s])
 ;;
@@ -5281,23 +5204,6 @@ (define_insn "mve_vqaddq_m_n_<supf><mode>"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
-;;
-;; [vqaddq_m_u, vqaddq_m_s])
-;;
-(define_insn "mve_vqaddq_m_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VQADDQ_M))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vqaddt.<supf>%#<V_sz_elem>\t%q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
 ;;
 ;; [vqdmlahq_m_n_s])
 ;;
@@ -5366,23 +5272,6 @@ (define_insn "mve_vqrdmlashq_m_n_s<mode>"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
-;;
-;; [vqrshlq_m_u, vqrshlq_m_s])
-;;
-(define_insn "mve_vqrshlq_m_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VQRSHLQ_M))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vqrshlt.<supf>%#<V_sz_elem>\t%q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
 ;;
 ;; [vqshlq_m_n_s, vqshlq_m_n_u])
 ;;
@@ -5400,23 +5289,6 @@ (define_insn "mve_vqshlq_m_n_<supf><mode>"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
-;;
-;; [vqshlq_m_u, vqshlq_m_s])
-;;
-(define_insn "mve_vqshlq_m_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VQSHLQ_M))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vqshlt.<supf>%#<V_sz_elem>\t%q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
 ;;
 ;; [vqsubq_m_n_u, vqsubq_m_n_s])
 ;;
@@ -5434,74 +5306,6 @@ (define_insn "mve_vqsubq_m_n_<supf><mode>"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
-;;
-;; [vqsubq_m_u, vqsubq_m_s])
-;;
-(define_insn "mve_vqsubq_m_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VQSUBQ_M))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vqsubt.<supf>%#<V_sz_elem>\t%q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vrhaddq_m_u, vrhaddq_m_s])
-;;
-(define_insn "mve_vrhaddq_m_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VRHADDQ_M))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vrhaddt.<supf>%#<V_sz_elem>\t%q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vrmulhq_m_u, vrmulhq_m_s])
-;;
-(define_insn "mve_vrmulhq_m_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VRMULHQ_M))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vrmulht.<supf>%#<V_sz_elem>\t%q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vrshlq_m_s, vrshlq_m_u])
-;;
-(define_insn "mve_vrshlq_m_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VRSHLQ_M))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vrshlt.<supf>%#<V_sz_elem>\t%q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
 ;;
 ;; [vrshrq_m_n_s, vrshrq_m_n_u])
 ;;
@@ -5655,74 +5459,6 @@ (define_insn "mve_vmlsdavaxq_p_s<mode>"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
-;;
-;; [vqdmladhq_m_s])
-;;
-(define_insn "mve_vqdmladhq_m_s<mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VQDMLADHQ_M_S))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vqdmladht.s%#<V_sz_elem>\t%q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vqdmladhxq_m_s])
-;;
-(define_insn "mve_vqdmladhxq_m_s<mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VQDMLADHXQ_M_S))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vqdmladhxt.s%#<V_sz_elem>\t%q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vqdmlsdhq_m_s])
-;;
-(define_insn "mve_vqdmlsdhq_m_s<mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VQDMLSDHQ_M_S))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vqdmlsdht.s%#<V_sz_elem>\t%q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vqdmlsdhxq_m_s])
-;;
-(define_insn "mve_vqdmlsdhxq_m_s<mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VQDMLSDHXQ_M_S))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vqdmlsdhxt.s%#<V_sz_elem>\t%q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
 ;;
 ;; [vqdmulhq_m_n_s])
 ;;
@@ -5740,91 +5476,6 @@ (define_insn "mve_vqdmulhq_m_n_s<mode>"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
-;;
-;; [vqdmulhq_m_s])
-;;
-(define_insn "mve_vqdmulhq_m_s<mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VQDMULHQ_M_S))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vqdmulht.s%#<V_sz_elem>\t%q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vqrdmladhq_m_s])
-;;
-(define_insn "mve_vqrdmladhq_m_s<mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VQRDMLADHQ_M_S))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vqrdmladht.s%#<V_sz_elem>\t%q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vqrdmladhxq_m_s])
-;;
-(define_insn "mve_vqrdmladhxq_m_s<mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VQRDMLADHXQ_M_S))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vqrdmladhxt.s%#<V_sz_elem>\t%q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vqrdmlsdhq_m_s])
-;;
-(define_insn "mve_vqrdmlsdhq_m_s<mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VQRDMLSDHQ_M_S))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vqrdmlsdht.s%#<V_sz_elem>\t%q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vqrdmlsdhxq_m_s])
-;;
-(define_insn "mve_vqrdmlsdhxq_m_s<mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VQRDMLSDHXQ_M_S))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vqrdmlsdhxt.s%#<V_sz_elem>\t%q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
 ;;
 ;; [vqrdmulhq_m_n_s])
 ;;
@@ -5842,23 +5493,6 @@ (define_insn "mve_vqrdmulhq_m_n_s<mode>"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
-;;
-;; [vqrdmulhq_m_s])
-;;
-(define_insn "mve_vqrdmulhq_m_s<mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:MVE_2 3 "s_register_operand" "w")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VQRDMULHQ_M_S))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vqrdmulht.s%#<V_sz_elem>\t%q0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
 ;;
 ;; [vmlaldavaq_p_u, vmlaldavaq_p_s])
 ;;
-- 
2.34.1


* [PATCH 19/22] arm: [MVE intrinsics] factorize several binary _n operations
  2023-04-18 13:45 [PATCH 00/22] arm: New framework for MVE intrinsics Christophe Lyon
                   ` (17 preceding siblings ...)
  2023-04-18 13:46 ` [PATCH 18/22] arm: [MVE intrinsics] factorize several binary_m operations Christophe Lyon
@ 2023-04-18 13:46 ` Christophe Lyon
  2023-05-03  8:47   ` Kyrylo Tkachov
  2023-04-18 13:46 ` [PATCH 20/22] arm: [MVE intrinsics] factorize several binary _m_n operations Christophe Lyon
                   ` (3 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Christophe Lyon @ 2023-04-18 13:46 UTC (permalink / raw)
  To: gcc-patches, kyrylo.tkachov, richard.earnshaw, richard.sandiford
  Cc: Christophe Lyon

Factorize
vhaddq_n, vhsubq_n, vqaddq_n, vqdmulhq_n, vqrdmulhq_n, vqsubq_n
so that they use the same pattern.
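
These are the variants whose second operand is a scalar. For example
(usage sketch, not part of the patch, with a made-up wrapper name):

  #include <arm_mve.h>

  uint16x8_t
  halving_add_scalar (uint16x8_t a, uint16_t b)
  {
    /* The _n forms broadcast the scalar 'b' across all lanes.  */
    return vhaddq_n_u16 (a, b);
  }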

2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>

	gcc/
	* config/arm/iterators.md (MVE_INT_SU_N_BINARY): New.
	(mve_insn): Add vhaddq, vhsubq, vqaddq, vqdmulhq, vqrdmulhq,
	vqsubq.
	(supf): Add VQDMULHQ_N_S, VQRDMULHQ_N_S.
	* config/arm/mve.md (mve_vhaddq_n_<supf><mode>)
	(mve_vhsubq_n_<supf><mode>, mve_vqaddq_n_<supf><mode>)
	(mve_vqdmulhq_n_s<mode>, mve_vqrdmulhq_n_s<mode>)
	(mve_vqsubq_n_<supf><mode>): Merge into ...
	(@mve_<mve_insn>q_n_<supf><mode>): ... this.
---
 gcc/config/arm/iterators.md | 17 ++++++++
 gcc/config/arm/mve.md       | 86 ++++---------------------------------
 2 files changed, 25 insertions(+), 78 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 18d70350bbe..6dbc40f842c 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -390,6 +390,15 @@ (define_int_iterator MVE_INT_N_BINARY   [
 		     VSUBQ_N_S VSUBQ_N_U
 		     ])
 
+(define_int_iterator MVE_INT_SU_N_BINARY   [
+		     VHADDQ_N_S VHADDQ_N_U
+		     VHSUBQ_N_S VHSUBQ_N_U
+		     VQADDQ_N_S VQADDQ_N_U
+		     VQDMULHQ_N_S
+		     VQRDMULHQ_N_S
+		     VQSUBQ_N_S VQSUBQ_N_U
+		     ])
+
 (define_int_iterator MVE_INT_N_BINARY_LOGIC   [
 		     VBICQ_N_S VBICQ_N_U
 		     VORRQ_N_S VORRQ_N_U
@@ -442,7 +451,9 @@ (define_int_attr mve_insn [
 		 (VCREATEQ_S "vcreate") (VCREATEQ_U "vcreate") (VCREATEQ_F "vcreate")
 		 (VEORQ_M_S "veor") (VEORQ_M_U "veor") (VEORQ_M_F "veor")
 		 (VHADDQ_M_S "vhadd") (VHADDQ_M_U "vhadd")
+		 (VHADDQ_N_S "vhadd") (VHADDQ_N_U "vhadd")
 		 (VHSUBQ_M_S "vhsub") (VHSUBQ_M_U "vhsub")
+		 (VHSUBQ_N_S "vhsub") (VHSUBQ_N_U "vhsub")
 		 (VMAXQ_M_S "vmax") (VMAXQ_M_U "vmax")
 		 (VMINQ_M_S "vmin") (VMINQ_M_U "vmin")
 		 (VMULHQ_M_S "vmulh") (VMULHQ_M_U "vmulh")
@@ -453,19 +464,23 @@ (define_int_attr mve_insn [
 		 (VORRQ_M_S "vorr") (VORRQ_M_U "vorr") (VORRQ_M_F "vorr")
 		 (VORRQ_N_S "vorr") (VORRQ_N_U "vorr")
 		 (VQADDQ_M_S "vqadd") (VQADDQ_M_U "vqadd")
+		 (VQADDQ_N_S "vqadd") (VQADDQ_N_U "vqadd")
 		 (VQDMLADHQ_M_S "vqdmladh")
 		 (VQDMLADHXQ_M_S "vqdmladhx")
 		 (VQDMLSDHQ_M_S "vqdmlsdh")
 		 (VQDMLSDHXQ_M_S "vqdmlsdhx")
 		 (VQDMULHQ_M_S "vqdmulh")
+		 (VQDMULHQ_N_S "vqdmulh")
 		 (VQRDMLADHQ_M_S "vqrdmladh")
 		 (VQRDMLADHXQ_M_S "vqrdmladhx")
 		 (VQRDMLSDHQ_M_S "vqrdmlsdh")
 		 (VQRDMLSDHXQ_M_S "vqrdmlsdhx")
 		 (VQRDMULHQ_M_S "vqrdmulh")
+		 (VQRDMULHQ_N_S "vqrdmulh")
 		 (VQRSHLQ_M_S "vqrshl") (VQRSHLQ_M_U "vqrshl")
 		 (VQSHLQ_M_S "vqshl") (VQSHLQ_M_U "vqshl")
 		 (VQSUBQ_M_S "vqsub") (VQSUBQ_M_U "vqsub")
+		 (VQSUBQ_N_S "vqsub") (VQSUBQ_N_U "vqsub")
 		 (VRHADDQ_M_S "vrhadd") (VRHADDQ_M_U "vrhadd")
 		 (VRMULHQ_M_S "vrmulh") (VRMULHQ_M_U "vrmulh")
 		 (VRSHLQ_M_S "vrshl") (VRSHLQ_M_U "vrshl")
@@ -1619,6 +1634,8 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
 		       (VQRDMLSDHQ_M_S "s")
 		       (VQRDMLSDHXQ_M_S "s")
 		       (VQRDMULHQ_M_S "s")
+		       (VQDMULHQ_N_S "s")
+		       (VQRDMULHQ_N_S "s")
 		       ])
 
 ;; Both kinds of return insn.
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 21c54197db5..3377e03ee06 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -1015,16 +1015,21 @@ (define_expand "mve_veorq_s<mode>"
 
 ;;
 ;; [vhaddq_n_u, vhaddq_n_s])
+;; [vhsubq_n_u, vhsubq_n_s])
+;; [vqaddq_n_s, vqaddq_n_u])
+;; [vqdmulhq_n_s])
+;; [vqrdmulhq_n_s])
+;; [vqsubq_n_s, vqsubq_n_u])
 ;;
-(define_insn "mve_vhaddq_n_<supf><mode>"
+(define_insn "@mve_<mve_insn>q_n_<supf><mode>"
   [
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
 		       (match_operand:<V_elem> 2 "s_register_operand" "r")]
-	 VHADDQ_N))
+	 MVE_INT_SU_N_BINARY))
   ]
   "TARGET_HAVE_MVE"
-  "vhadd.<supf>%#<V_sz_elem>\t%q0, %q1, %2"
+  "<mve_insn>.<supf>%#<V_sz_elem>\t%q0, %q1, %2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -1073,21 +1078,6 @@ (define_insn "mve_vhcaddq_rot90_s<mode>"
   [(set_attr "type" "mve_move")
 ])
 
-;;
-;; [vhsubq_n_u, vhsubq_n_s])
-;;
-(define_insn "mve_vhsubq_n_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		       (match_operand:<V_elem> 2 "s_register_operand" "r")]
-	 VHSUBQ_N))
-  ]
-  "TARGET_HAVE_MVE"
-  "vhsub.<supf>%#<V_sz_elem>\t%q0, %q1, %2"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vhsubq_s, vhsubq_u])
 ;;
@@ -1415,21 +1405,6 @@ (define_expand "mve_vorrq_u<mode>"
   "TARGET_HAVE_MVE"
 )
 
-;;
-;; [vqaddq_n_s, vqaddq_n_u])
-;;
-(define_insn "mve_vqaddq_n_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		       (match_operand:<V_elem> 2 "s_register_operand" "r")]
-	 VQADDQ_N))
-  ]
-  "TARGET_HAVE_MVE"
-  "vqadd.<supf>%#<V_sz_elem>\t%q0, %q1, %2"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vqaddq_u, vqaddq_s])
 ;;
@@ -1445,21 +1420,6 @@ (define_insn "mve_vqaddq_<supf><mode>"
   [(set_attr "type" "mve_move")
 ])
 
-;;
-;; [vqdmulhq_n_s])
-;;
-(define_insn "mve_vqdmulhq_n_s<mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		       (match_operand:<V_elem> 2 "s_register_operand" "r")]
-	 VQDMULHQ_N_S))
-  ]
-  "TARGET_HAVE_MVE"
-  "vqdmulh.s%#<V_sz_elem>\t%q0, %q1, %2"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vqdmulhq_s])
 ;;
@@ -1475,21 +1435,6 @@ (define_insn "mve_vqdmulhq_s<mode>"
   [(set_attr "type" "mve_move")
 ])
 
-;;
-;; [vqrdmulhq_n_s])
-;;
-(define_insn "mve_vqrdmulhq_n_s<mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		       (match_operand:<V_elem> 2 "s_register_operand" "r")]
-	 VQRDMULHQ_N_S))
-  ]
-  "TARGET_HAVE_MVE"
-  "vqrdmulh.s%#<V_sz_elem>\t%q0, %q1, %2"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vqrdmulhq_s])
 ;;
@@ -1595,21 +1540,6 @@ (define_insn "mve_vqshluq_n_s<mode>"
   [(set_attr "type" "mve_move")
 ])
 
-;;
-;; [vqsubq_n_s, vqsubq_n_u])
-;;
-(define_insn "mve_vqsubq_n_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		       (match_operand:<V_elem> 2 "s_register_operand" "r")]
-	 VQSUBQ_N))
-  ]
-  "TARGET_HAVE_MVE"
-  "vqsub.<supf>%#<V_sz_elem>\t%q0, %q1, %2"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vqsubq_u, vqsubq_s])
 ;;
-- 
2.34.1


* [PATCH 20/22] arm: [MVE intrinsics] factorize several binary _m_n operations
  2023-04-18 13:45 [PATCH 00/22] arm: New framework for MVE intrinsics Christophe Lyon
                   ` (18 preceding siblings ...)
  2023-04-18 13:46 ` [PATCH 19/22] arm: [MVE intrinsics] factorize several binary _n operations Christophe Lyon
@ 2023-04-18 13:46 ` Christophe Lyon
  2023-05-03  8:48   ` Kyrylo Tkachov
  2023-04-18 13:46 ` [PATCH 21/22] arm: [MVE intrinsics] factorize several binary operations Christophe Lyon
                   ` (2 subsequent siblings)
  22 siblings, 1 reply; 55+ messages in thread
From: Christophe Lyon @ 2023-04-18 13:46 UTC (permalink / raw)
  To: gcc-patches, kyrylo.tkachov, richard.earnshaw, richard.sandiford
  Cc: Christophe Lyon

Factorize vhaddq_m_n, vhsubq_m_n, vmlaq_m_n, vmlasq_m_n, vqaddq_m_n,
vqdmlahq_m_n, vqdmlashq_m_n, vqdmulhq_m_n, vqrdmlahq_m_n,
vqrdmlashq_m_n, vqrdmulhq_m_n, vqsubq_m_n
so that they use the same pattern.
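
These variants combine a scalar second operand with an "m" predicate.
For example (usage sketch, not part of the patch, with a made-up
wrapper name):

  #include <arm_mve.h>

  int8x16_t
  sat_add_scalar_m (int8x16_t inactive, int8x16_t a, int8_t b,
		    mve_pred16_t p)
  {
    /* Saturating add of a broadcast scalar under predicate p; lanes
       where p is clear come from 'inactive'.  */
    return vqaddq_m_n_s8 (inactive, a, b, p);
  }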

2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>

	gcc/
	* config/arm/iterators.md (MVE_INT_SU_M_N_BINARY): New.
	(mve_insn): Add vhaddq, vhsubq, vmlaq, vmlasq, vqaddq, vqdmlahq,
	vqdmlashq, vqdmulhq, vqrdmlahq, vqrdmlashq, vqrdmulhq, vqsubq.
	(supf): Add VQDMLAHQ_M_N_S, VQDMLASHQ_M_N_S, VQRDMLAHQ_M_N_S,
	VQRDMLASHQ_M_N_S, VQDMULHQ_M_N_S, VQRDMULHQ_M_N_S.
	* config/arm/mve.md (mve_vhaddq_m_n_<supf><mode>)
	(mve_vhsubq_m_n_<supf><mode>, mve_vmlaq_m_n_<supf><mode>)
	(mve_vmlasq_m_n_<supf><mode>, mve_vqaddq_m_n_<supf><mode>)
	(mve_vqdmlahq_m_n_s<mode>, mve_vqdmlashq_m_n_s<mode>)
	(mve_vqrdmlahq_m_n_s<mode>, mve_vqrdmlashq_m_n_s<mode>)
	(mve_vqsubq_m_n_<supf><mode>, mve_vqdmulhq_m_n_s<mode>)
	(mve_vqrdmulhq_m_n_s<mode>): Merge into ...
	(@mve_<mve_insn>q_m_n_<supf><mode>): ... this.
---
 gcc/config/arm/iterators.md |  33 ++++++
 gcc/config/arm/mve.md       | 202 +++---------------------------------
 2 files changed, 46 insertions(+), 189 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 6dbc40f842c..60452cdefe3 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -384,6 +384,21 @@ (define_int_iterator MVE_INT_M_N_BINARY_LOGIC [
 		     VORRQ_M_N_S VORRQ_M_N_U
 		     ])
 
+(define_int_iterator MVE_INT_SU_M_N_BINARY   [
+		     VHADDQ_M_N_S VHADDQ_M_N_U
+		     VHSUBQ_M_N_S VHSUBQ_M_N_U
+		     VMLAQ_M_N_S VMLAQ_M_N_U
+		     VMLASQ_M_N_S VMLASQ_M_N_U
+		     VQDMLAHQ_M_N_S
+		     VQDMLASHQ_M_N_S
+		     VQRDMLAHQ_M_N_S
+		     VQRDMLASHQ_M_N_S
+		     VQADDQ_M_N_S VQADDQ_M_N_U
+		     VQSUBQ_M_N_S VQSUBQ_M_N_U
+		     VQDMULHQ_M_N_S
+		     VQRDMULHQ_M_N_S
+		     ])
+
 (define_int_iterator MVE_INT_N_BINARY   [
 		     VADDQ_N_S VADDQ_N_U
 		     VMULQ_N_S VMULQ_N_U
@@ -450,12 +465,16 @@ (define_int_attr mve_insn [
 		 (VBICQ_N_S "vbic") (VBICQ_N_U "vbic")
 		 (VCREATEQ_S "vcreate") (VCREATEQ_U "vcreate") (VCREATEQ_F "vcreate")
 		 (VEORQ_M_S "veor") (VEORQ_M_U "veor") (VEORQ_M_F "veor")
+		 (VHADDQ_M_N_S "vhadd") (VHADDQ_M_N_U "vhadd")
 		 (VHADDQ_M_S "vhadd") (VHADDQ_M_U "vhadd")
 		 (VHADDQ_N_S "vhadd") (VHADDQ_N_U "vhadd")
+		 (VHSUBQ_M_N_S "vhsub") (VHSUBQ_M_N_U "vhsub")
 		 (VHSUBQ_M_S "vhsub") (VHSUBQ_M_U "vhsub")
 		 (VHSUBQ_N_S "vhsub") (VHSUBQ_N_U "vhsub")
 		 (VMAXQ_M_S "vmax") (VMAXQ_M_U "vmax")
 		 (VMINQ_M_S "vmin") (VMINQ_M_U "vmin")
+		 (VMLAQ_M_N_S "vmla") (VMLAQ_M_N_U "vmla")
+		 (VMLASQ_M_N_S "vmlas") (VMLASQ_M_N_U "vmlas")
 		 (VMULHQ_M_S "vmulh") (VMULHQ_M_U "vmulh")
 		 (VMULQ_M_N_S "vmul") (VMULQ_M_N_U "vmul") (VMULQ_M_N_F "vmul")
 		 (VMULQ_M_S "vmul") (VMULQ_M_U "vmul") (VMULQ_M_F "vmul")
@@ -463,22 +482,30 @@ (define_int_attr mve_insn [
 		 (VORRQ_M_N_S "vorr") (VORRQ_M_N_U "vorr")
 		 (VORRQ_M_S "vorr") (VORRQ_M_U "vorr") (VORRQ_M_F "vorr")
 		 (VORRQ_N_S "vorr") (VORRQ_N_U "vorr")
+		 (VQADDQ_M_N_S "vqadd") (VQADDQ_M_N_U "vqadd")
 		 (VQADDQ_M_S "vqadd") (VQADDQ_M_U "vqadd")
 		 (VQADDQ_N_S "vqadd") (VQADDQ_N_U "vqadd")
 		 (VQDMLADHQ_M_S "vqdmladh")
 		 (VQDMLADHXQ_M_S "vqdmladhx")
+		 (VQDMLAHQ_M_N_S "vqdmlah")
+		 (VQDMLASHQ_M_N_S "vqdmlash")
 		 (VQDMLSDHQ_M_S "vqdmlsdh")
 		 (VQDMLSDHXQ_M_S "vqdmlsdhx")
+		 (VQDMULHQ_M_N_S "vqdmulh")
 		 (VQDMULHQ_M_S "vqdmulh")
 		 (VQDMULHQ_N_S "vqdmulh")
 		 (VQRDMLADHQ_M_S "vqrdmladh")
 		 (VQRDMLADHXQ_M_S "vqrdmladhx")
+		 (VQRDMLAHQ_M_N_S "vqrdmlah")
+		 (VQRDMLASHQ_M_N_S "vqrdmlash")
 		 (VQRDMLSDHQ_M_S "vqrdmlsdh")
 		 (VQRDMLSDHXQ_M_S "vqrdmlsdhx")
+		 (VQRDMULHQ_M_N_S "vqrdmulh")
 		 (VQRDMULHQ_M_S "vqrdmulh")
 		 (VQRDMULHQ_N_S "vqrdmulh")
 		 (VQRSHLQ_M_S "vqrshl") (VQRSHLQ_M_U "vqrshl")
 		 (VQSHLQ_M_S "vqshl") (VQSHLQ_M_U "vqshl")
+		 (VQSUBQ_M_N_S "vqsub") (VQSUBQ_M_N_U "vqsub")
 		 (VQSUBQ_M_S "vqsub") (VQSUBQ_M_U "vqsub")
 		 (VQSUBQ_N_S "vqsub") (VQSUBQ_N_U "vqsub")
 		 (VRHADDQ_M_S "vrhadd") (VRHADDQ_M_U "vrhadd")
@@ -1636,6 +1663,12 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
 		       (VQRDMULHQ_M_S "s")
 		       (VQDMULHQ_N_S "s")
 		       (VQRDMULHQ_N_S "s")
+		       (VQDMLAHQ_M_N_S "s")
+		       (VQDMLASHQ_M_N_S "s")
+		       (VQRDMLAHQ_M_N_S "s")
+		       (VQRDMLASHQ_M_N_S "s")
+		       (VQDMULHQ_M_N_S "s")
+		       (VQRDMULHQ_M_N_S "s")
 		       ])
 
 ;; Both kinds of return insn.
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 3377e03ee06..d14a04d5f82 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -4982,35 +4982,29 @@ (define_insn "mve_vcaddq_rot90_m_<supf><mode>"
 
 ;;
 ;; [vhaddq_m_n_s, vhaddq_m_n_u])
-;;
-(define_insn "mve_vhaddq_m_n_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VHADDQ_M_N))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vhaddt.<supf>%#<V_sz_elem>	%q0, %q2, %3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
 ;; [vhsubq_m_n_s, vhsubq_m_n_u])
+;; [vmlaq_m_n_s, vmlaq_m_n_u])
+;; [vmlasq_m_n_u, vmlasq_m_n_s])
+;; [vqaddq_m_n_u, vqaddq_m_n_s])
+;; [vqdmlahq_m_n_s])
+;; [vqdmlashq_m_n_s])
+;; [vqdmulhq_m_n_s])
+;; [vqrdmlahq_m_n_s])
+;; [vqrdmlashq_m_n_s])
+;; [vqrdmulhq_m_n_s])
+;; [vqsubq_m_n_u, vqsubq_m_n_s])
 ;;
-(define_insn "mve_vhsubq_m_n_<supf><mode>"
+(define_insn "@mve_<mve_insn>q_m_n_<supf><mode>"
   [
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")
 		       (match_operand:<V_elem> 3 "s_register_operand" "r")
 		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VHSUBQ_M_N))
+	 MVE_INT_SU_M_N_BINARY))
   ]
   "TARGET_HAVE_MVE"
-  "vpst\;vhsubt.<supf>%#<V_sz_elem>	%q0, %q2, %3"
+  "vpst\;<mve_insn>t.<supf>%#<V_sz_elem>\t%q0, %q2, %3"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
@@ -5032,40 +5026,6 @@ (define_insn "mve_vmladavaq_p_<supf><mode>"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
-;;
-;; [vmlaq_m_n_s, vmlaq_m_n_u])
-;;
-(define_insn "mve_vmlaq_m_n_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VMLAQ_M_N))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vmlat.<supf>%#<V_sz_elem>	%q0, %q2, %3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vmlasq_m_n_u, vmlasq_m_n_s])
-;;
-(define_insn "mve_vmlasq_m_n_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VMLASQ_M_N))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vmlast.<supf>%#<V_sz_elem>	%q0, %q2, %3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
 ;;
 ;; [vmullbq_int_m_u, vmullbq_int_m_s])
 ;;
@@ -5117,91 +5077,6 @@ (define_insn "mve_vornq_m_<supf><mode>"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
-;;
-;; [vqaddq_m_n_u, vqaddq_m_n_s])
-;;
-(define_insn "mve_vqaddq_m_n_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VQADDQ_M_N))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vqaddt.<supf>%#<V_sz_elem>\t%q0, %q2, %3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vqdmlahq_m_n_s])
-;;
-(define_insn "mve_vqdmlahq_m_n_s<mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VQDMLAHQ_M_N_S))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vqdmlaht.s%#<V_sz_elem>\t%q0, %q2, %3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vqdmlashq_m_n_s])
-;;
-(define_insn "mve_vqdmlashq_m_n_s<mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VQDMLASHQ_M_N_S))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vqdmlasht.s%#<V_sz_elem>\t%q0, %q2, %3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vqrdmlahq_m_n_s])
-;;
-(define_insn "mve_vqrdmlahq_m_n_s<mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VQRDMLAHQ_M_N_S))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vqrdmlaht.s%#<V_sz_elem>\t%q0, %q2, %3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vqrdmlashq_m_n_s])
-;;
-(define_insn "mve_vqrdmlashq_m_n_s<mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VQRDMLASHQ_M_N_S))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vqrdmlasht.s%#<V_sz_elem>\t%q0, %q2, %3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
 ;;
 ;; [vqshlq_m_n_s, vqshlq_m_n_u])
 ;;
@@ -5219,23 +5094,6 @@ (define_insn "mve_vqshlq_m_n_<supf><mode>"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
-;;
-;; [vqsubq_m_n_u, vqsubq_m_n_s])
-;;
-(define_insn "mve_vqsubq_m_n_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VQSUBQ_M_N))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vqsubt.<supf>%#<V_sz_elem>\t%q0, %q2, %3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
 ;;
 ;; [vrshrq_m_n_s, vrshrq_m_n_u])
 ;;
@@ -5389,40 +5247,6 @@ (define_insn "mve_vmlsdavaxq_p_s<mode>"
   [(set_attr "type" "mve_move")
    (set_attr "length""8")])
 
-;;
-;; [vqdmulhq_m_n_s])
-;;
-(define_insn "mve_vqdmulhq_m_n_s<mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VQDMULHQ_M_N_S))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vqdmulht.s%#<V_sz_elem>\t%q0, %q2, %3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
-;;
-;; [vqrdmulhq_m_n_s])
-;;
-(define_insn "mve_vqrdmulhq_m_n_s<mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")
-		       (match_operand:<V_elem> 3 "s_register_operand" "r")
-		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
-	 VQRDMULHQ_M_N_S))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vqrdmulht.s%#<V_sz_elem>\t%q0, %q2, %3"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
 ;;
 ;; [vmlaldavaq_p_u, vmlaldavaq_p_s])
 ;;
-- 
2.34.1


* [PATCH 21/22] arm: [MVE intrinsics] factorize several binary operations
  2023-04-18 13:45 [PATCH 00/22] arm: New framework for MVE intrinsics Christophe Lyon
                   ` (19 preceding siblings ...)
  2023-04-18 13:46 ` [PATCH 20/22] arm: [MVE intrinsics] factorize several binary _m_n operations Christophe Lyon
@ 2023-04-18 13:46 ` Christophe Lyon
  2023-05-03  8:49   ` Kyrylo Tkachov
  2023-04-18 13:46 ` [PATCH 22/22] arm: [MVE intrinsics] rework vhaddq vhsubq vmulhq vqaddq vqsubq vqdmulhq vrhaddq vrmulhq Christophe Lyon
  2023-05-02  9:18 ` [PATCH 00/22] arm: New framework for MVE intrinsics Kyrylo Tkachov
  22 siblings, 1 reply; 55+ messages in thread
From: Christophe Lyon @ 2023-04-18 13:46 UTC (permalink / raw)
  To: gcc-patches, kyrylo.tkachov, richard.earnshaw, richard.sandiford
  Cc: Christophe Lyon

Factorize vabdq, vhaddq, vhsubq, vmulhq, vqaddq_u, vqdmulhq,
vqrdmulhq, vqrshlq, vqshlq, vqsubq_u, vrhaddq, vrmulhq, vrshlq
so that they use the same pattern.
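
For illustration, this is what the merged pattern looks like from the
expander side.  The gen_mve_q call and its double unspec argument match
the vec-common.md hunk below; the wrapper function itself is only a
hypothetical sketch, not part of the patch:

  /* Hypothetical helper: emit one of the merged binary operations.
     UNSPEC (e.g. VHADDQ_S) is passed twice because the insn name
     @mve_<mve_insn>q_<supf><mode> resolves two int attributes,
     <mve_insn> and <supf>, from the same iterator value.  */
  static void
  emit_mve_su_binary (int unspec, machine_mode mode,
                      rtx dst, rtx src1, rtx src2)
  {
    emit_insn (gen_mve_q (unspec, unspec, mode, dst, src1, src2));
  }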

2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>

	gcc/
	* config/arm/iterators.md (MVE_INT_SU_BINARY): New.
	(mve_insn): Add vabdq, vhaddq, vhsubq, vmulhq, vqaddq, vqdmulhq,
	vqrdmulhq, vqrshlq, vqshlq, vqsubq, vrhaddq, vrmulhq, vrshlq.
	(supf): Add VQDMULHQ_S, VQRDMULHQ_S.
	* config/arm/mve.md (mve_vabdq_<supf><mode>)
	(@mve_vhaddq_<supf><mode>, mve_vhsubq_<supf><mode>)
	(mve_vmulhq_<supf><mode>, mve_vqaddq_<supf><mode>)
	(mve_vqdmulhq_s<mode>, mve_vqrdmulhq_s<mode>)
	(mve_vqrshlq_<supf><mode>, mve_vqshlq_<supf><mode>)
	(mve_vqsubq_<supf><mode>, @mve_vrhaddq_<supf><mode>)
	(mve_vrmulhq_<supf><mode>, mve_vrshlq_<supf><mode>): Merge into
	...
	(@mve_<mve_insn>q_<supf><mode>): ... this.
	* config/arm/vec-common.md (avg<mode>3_floor, uavg<mode>3_floor)
	(avg<mode>3_ceil, uavg<mode>3_ceil): Use gen_mve_q instead of
	gen_mve_vhaddq / gen_mve_vrhaddq.
---
 gcc/config/arm/iterators.md  |  31 ++++++
 gcc/config/arm/mve.md        | 198 +++--------------------------------
 gcc/config/arm/vec-common.md |   8 +-
 3 files changed, 50 insertions(+), 187 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 60452cdefe3..068ae25e578 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -414,6 +414,22 @@ (define_int_iterator MVE_INT_SU_N_BINARY   [
 		     VQSUBQ_N_S VQSUBQ_N_U
 		     ])
 
+(define_int_iterator MVE_INT_SU_BINARY   [
+		     VABDQ_S VABDQ_U
+		     VHADDQ_S VHADDQ_U
+		     VHSUBQ_S VHSUBQ_U
+		     VMULHQ_S VMULHQ_U
+		     VQADDQ_S VQADDQ_U
+		     VQDMULHQ_S
+		     VQRDMULHQ_S
+		     VQRSHLQ_S VQRSHLQ_U
+		     VQSHLQ_S VQSHLQ_U
+		     VQSUBQ_S VQSUBQ_U
+		     VRHADDQ_S VRHADDQ_U
+		     VRMULHQ_S VRMULHQ_U
+		     VRSHLQ_S VRSHLQ_U
+		     ])
+
 (define_int_iterator MVE_INT_N_BINARY_LOGIC   [
 		     VBICQ_N_S VBICQ_N_U
 		     VORRQ_N_S VORRQ_N_U
@@ -456,6 +472,7 @@ (define_code_attr mve_addsubmul [
 
 (define_int_attr mve_insn [
 		 (VABDQ_M_S "vabd") (VABDQ_M_U "vabd")
+		 (VABDQ_S "vabd") (VABDQ_U "vabd")
 		 (VADDQ_M_N_S "vadd") (VADDQ_M_N_U "vadd") (VADDQ_M_N_F "vadd")
 		 (VADDQ_M_S "vadd") (VADDQ_M_U "vadd") (VADDQ_M_F "vadd")
 		 (VADDQ_N_S "vadd") (VADDQ_N_U "vadd") (VADDQ_N_F "vadd")
@@ -468,14 +485,17 @@ (define_int_attr mve_insn [
 		 (VHADDQ_M_N_S "vhadd") (VHADDQ_M_N_U "vhadd")
 		 (VHADDQ_M_S "vhadd") (VHADDQ_M_U "vhadd")
 		 (VHADDQ_N_S "vhadd") (VHADDQ_N_U "vhadd")
+		 (VHADDQ_S "vhadd") (VHADDQ_U "vhadd")
 		 (VHSUBQ_M_N_S "vhsub") (VHSUBQ_M_N_U "vhsub")
 		 (VHSUBQ_M_S "vhsub") (VHSUBQ_M_U "vhsub")
 		 (VHSUBQ_N_S "vhsub") (VHSUBQ_N_U "vhsub")
+		 (VHSUBQ_S "vhsub") (VHSUBQ_U "vhsub")
 		 (VMAXQ_M_S "vmax") (VMAXQ_M_U "vmax")
 		 (VMINQ_M_S "vmin") (VMINQ_M_U "vmin")
 		 (VMLAQ_M_N_S "vmla") (VMLAQ_M_N_U "vmla")
 		 (VMLASQ_M_N_S "vmlas") (VMLASQ_M_N_U "vmlas")
 		 (VMULHQ_M_S "vmulh") (VMULHQ_M_U "vmulh")
+		 (VMULHQ_S "vmulh") (VMULHQ_U "vmulh")
 		 (VMULQ_M_N_S "vmul") (VMULQ_M_N_U "vmul") (VMULQ_M_N_F "vmul")
 		 (VMULQ_M_S "vmul") (VMULQ_M_U "vmul") (VMULQ_M_F "vmul")
 		 (VMULQ_N_S "vmul") (VMULQ_N_U "vmul") (VMULQ_N_F "vmul")
@@ -485,6 +505,7 @@ (define_int_attr mve_insn [
 		 (VQADDQ_M_N_S "vqadd") (VQADDQ_M_N_U "vqadd")
 		 (VQADDQ_M_S "vqadd") (VQADDQ_M_U "vqadd")
 		 (VQADDQ_N_S "vqadd") (VQADDQ_N_U "vqadd")
+		 (VQADDQ_S "vqadd") (VQADDQ_U "vqadd")
 		 (VQDMLADHQ_M_S "vqdmladh")
 		 (VQDMLADHXQ_M_S "vqdmladhx")
 		 (VQDMLAHQ_M_N_S "vqdmlah")
@@ -494,6 +515,7 @@ (define_int_attr mve_insn [
 		 (VQDMULHQ_M_N_S "vqdmulh")
 		 (VQDMULHQ_M_S "vqdmulh")
 		 (VQDMULHQ_N_S "vqdmulh")
+		 (VQDMULHQ_S "vqdmulh")
 		 (VQRDMLADHQ_M_S "vqrdmladh")
 		 (VQRDMLADHXQ_M_S "vqrdmladhx")
 		 (VQRDMLAHQ_M_N_S "vqrdmlah")
@@ -503,14 +525,21 @@ (define_int_attr mve_insn [
 		 (VQRDMULHQ_M_N_S "vqrdmulh")
 		 (VQRDMULHQ_M_S "vqrdmulh")
 		 (VQRDMULHQ_N_S "vqrdmulh")
+		 (VQRDMULHQ_S "vqrdmulh")
 		 (VQRSHLQ_M_S "vqrshl") (VQRSHLQ_M_U "vqrshl")
+		 (VQRSHLQ_S "vqrshl") (VQRSHLQ_U "vqrshl")
 		 (VQSHLQ_M_S "vqshl") (VQSHLQ_M_U "vqshl")
+		 (VQSHLQ_S "vqshl") (VQSHLQ_U "vqshl")
 		 (VQSUBQ_M_N_S "vqsub") (VQSUBQ_M_N_U "vqsub")
 		 (VQSUBQ_M_S "vqsub") (VQSUBQ_M_U "vqsub")
 		 (VQSUBQ_N_S "vqsub") (VQSUBQ_N_U "vqsub")
+		 (VQSUBQ_S "vqsub") (VQSUBQ_U "vqsub")
 		 (VRHADDQ_M_S "vrhadd") (VRHADDQ_M_U "vrhadd")
+		 (VRHADDQ_S "vrhadd") (VRHADDQ_U "vrhadd")
 		 (VRMULHQ_M_S "vrmulh") (VRMULHQ_M_U "vrmulh")
+		 (VRMULHQ_S "vrmulh") (VRMULHQ_U "vrmulh")
 		 (VRSHLQ_M_S "vrshl") (VRSHLQ_M_U "vrshl")
+		 (VRSHLQ_S "vrshl") (VRSHLQ_U "vrshl")
 		 (VSHLQ_M_S "vshl") (VSHLQ_M_U "vshl")
 		 (VSUBQ_M_N_S "vsub") (VSUBQ_M_N_U "vsub") (VSUBQ_M_N_F "vsub")
 		 (VSUBQ_M_S "vsub") (VSUBQ_M_U "vsub") (VSUBQ_M_F "vsub")
@@ -1669,6 +1698,8 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
 		       (VQRDMLASHQ_M_N_S "s")
 		       (VQDMULHQ_M_N_S "s")
 		       (VQRDMULHQ_M_N_S "s")
+		       (VQDMULHQ_S "s")
+		       (VQRDMULHQ_S "s")
 		       ])
 
 ;; Both kinds of return insn.
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index d14a04d5f82..b9126af2aa9 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -841,16 +841,28 @@ (define_insn "mve_vcmp<mve_cmp_op>q_n_<mode>"
 
 ;;
 ;; [vabdq_s, vabdq_u])
+;; [vhaddq_s, vhaddq_u])
+;; [vhsubq_s, vhsubq_u])
+;; [vmulhq_s, vmulhq_u])
+;; [vqaddq_u, vqaddq_s])
+;; [vqdmulhq_s])
+;; [vqrdmulhq_s])
+;; [vqrshlq_s, vqrshlq_u])
+;; [vqshlq_s, vqshlq_u])
+;; [vqsubq_u, vqsubq_s])
+;; [vrhaddq_s, vrhaddq_u])
+;; [vrmulhq_s, vrmulhq_u])
+;; [vrshlq_s, vrshlq_u])
 ;;
-(define_insn "mve_vabdq_<supf><mode>"
+(define_insn "@mve_<mve_insn>q_<supf><mode>"
   [
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
 	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
 		       (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VABDQ))
+	 MVE_INT_SU_BINARY))
   ]
   "TARGET_HAVE_MVE"
-  "vabd.<supf>%#<V_sz_elem>	%q0, %q1, %q2"
+  "<mve_insn>.<supf>%#<V_sz_elem>\t%q0, %q1, %q2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -1033,21 +1045,6 @@ (define_insn "@mve_<mve_insn>q_n_<supf><mode>"
   [(set_attr "type" "mve_move")
 ])
 
-;;
-;; [vhaddq_s, vhaddq_u])
-;;
-(define_insn "@mve_vhaddq_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VHADDQ))
-  ]
-  "TARGET_HAVE_MVE"
-  "vhadd.<supf>%#<V_sz_elem>\t%q0, %q1, %q2"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vhcaddq_rot270_s])
 ;;
@@ -1078,21 +1075,6 @@ (define_insn "mve_vhcaddq_rot90_s<mode>"
   [(set_attr "type" "mve_move")
 ])
 
-;;
-;; [vhsubq_s, vhsubq_u])
-;;
-(define_insn "mve_vhsubq_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VHSUBQ))
-  ]
-  "TARGET_HAVE_MVE"
-  "vhsub.<supf>%#<V_sz_elem>\t%q0, %q1, %q2"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vmaxaq_s])
 ;;
@@ -1293,21 +1275,6 @@ (define_insn "mve_vmlsdavxq_s<mode>"
   [(set_attr "type" "mve_move")
 ])
 
-;;
-;; [vmulhq_s, vmulhq_u])
-;;
-(define_insn "mve_vmulhq_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VMULHQ))
-  ]
-  "TARGET_HAVE_MVE"
-  "vmulh.<supf>%#<V_sz_elem>\t%q0, %q1, %q2"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vmullbq_int_u, vmullbq_int_s])
 ;;
@@ -1405,51 +1372,6 @@ (define_expand "mve_vorrq_u<mode>"
   "TARGET_HAVE_MVE"
 )
 
-;;
-;; [vqaddq_u, vqaddq_s])
-;;
-(define_insn "mve_vqaddq_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VQADDQ))
-  ]
-  "TARGET_HAVE_MVE"
-  "vqadd.<supf>%#<V_sz_elem>\t%q0, %q1, %q2"
-  [(set_attr "type" "mve_move")
-])
-
-;;
-;; [vqdmulhq_s])
-;;
-(define_insn "mve_vqdmulhq_s<mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VQDMULHQ_S))
-  ]
-  "TARGET_HAVE_MVE"
-  "vqdmulh.s%#<V_sz_elem>\t%q0, %q1, %q2"
-  [(set_attr "type" "mve_move")
-])
-
-;;
-;; [vqrdmulhq_s])
-;;
-(define_insn "mve_vqrdmulhq_s<mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VQRDMULHQ_S))
-  ]
-  "TARGET_HAVE_MVE"
-  "vqrdmulh.s%#<V_sz_elem>\t%q0, %q1, %q2"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vqrshlq_n_s, vqrshlq_n_u])
 ;;
@@ -1465,21 +1387,6 @@ (define_insn "mve_vqrshlq_n_<supf><mode>"
   [(set_attr "type" "mve_move")
 ])
 
-;;
-;; [vqrshlq_s, vqrshlq_u])
-;;
-(define_insn "mve_vqrshlq_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VQRSHLQ))
-  ]
-  "TARGET_HAVE_MVE"
-  "vqrshl.<supf>%#<V_sz_elem>\t%q0, %q1, %q2"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vqshlq_n_s, vqshlq_n_u])
 ;;
@@ -1510,21 +1417,6 @@ (define_insn "mve_vqshlq_r_<supf><mode>"
   [(set_attr "type" "mve_move")
 ])
 
-;;
-;; [vqshlq_s, vqshlq_u])
-;;
-(define_insn "mve_vqshlq_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VQSHLQ))
-  ]
-  "TARGET_HAVE_MVE"
-  "vqshl.<supf>%#<V_sz_elem>\t%q0, %q1, %q2"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vqshluq_n_s])
 ;;
@@ -1540,51 +1432,6 @@ (define_insn "mve_vqshluq_n_s<mode>"
   [(set_attr "type" "mve_move")
 ])
 
-;;
-;; [vqsubq_u, vqsubq_s])
-;;
-(define_insn "mve_vqsubq_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VQSUBQ))
-  ]
-  "TARGET_HAVE_MVE"
-  "vqsub.<supf>%#<V_sz_elem>\t%q0, %q1, %q2"
-  [(set_attr "type" "mve_move")
-])
-
-;;
-;; [vrhaddq_s, vrhaddq_u])
-;;
-(define_insn "@mve_vrhaddq_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VRHADDQ))
-  ]
-  "TARGET_HAVE_MVE"
-  "vrhadd.<supf>%#<V_sz_elem>\t%q0, %q1, %q2"
-  [(set_attr "type" "mve_move")
-])
-
-;;
-;; [vrmulhq_s, vrmulhq_u])
-;;
-(define_insn "mve_vrmulhq_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VRMULHQ))
-  ]
-  "TARGET_HAVE_MVE"
-  "vrmulh.<supf>%#<V_sz_elem>\t%q0, %q1, %q2"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vrshlq_n_u, vrshlq_n_s])
 ;;
@@ -1600,21 +1447,6 @@ (define_insn "mve_vrshlq_n_<supf><mode>"
   [(set_attr "type" "mve_move")
 ])
 
-;;
-;; [vrshlq_s, vrshlq_u])
-;;
-(define_insn "mve_vrshlq_<supf><mode>"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VRSHLQ))
-  ]
-  "TARGET_HAVE_MVE"
-  "vrshl.<supf>%#<V_sz_elem>\t%q0, %q1, %q2"
-  [(set_attr "type" "mve_move")
-])
-
 ;;
 ;; [vrshrq_n_s, vrshrq_n_u])
 ;;
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index f06df4db636..918338ca5c0 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -573,7 +573,7 @@ (define_expand "avg<mode>3_floor"
   "ARM_HAVE_<MODE>_ARITH"
 {
   if (TARGET_HAVE_MVE)
-    emit_insn (gen_mve_vhaddq (VHADDQ_S, <MODE>mode,
+    emit_insn (gen_mve_q (VHADDQ_S, VHADDQ_S, <MODE>mode,
 			       operands[0], operands[1], operands[2]));
   else
     emit_insn (gen_neon_vhadd (UNSPEC_VHADD_S, UNSPEC_VHADD_S, <MODE>mode,
@@ -588,7 +588,7 @@ (define_expand "uavg<mode>3_floor"
   "ARM_HAVE_<MODE>_ARITH"
 {
   if (TARGET_HAVE_MVE)
-    emit_insn (gen_mve_vhaddq (VHADDQ_U, <MODE>mode,
+    emit_insn (gen_mve_q (VHADDQ_U, VHADDQ_U, <MODE>mode,
 			       operands[0], operands[1], operands[2]));
   else
     emit_insn (gen_neon_vhadd (UNSPEC_VHADD_U, UNSPEC_VHADD_U, <MODE>mode,
@@ -603,7 +603,7 @@ (define_expand "avg<mode>3_ceil"
   "ARM_HAVE_<MODE>_ARITH"
 {
   if (TARGET_HAVE_MVE)
-    emit_insn (gen_mve_vrhaddq (VRHADDQ_S, <MODE>mode,
+    emit_insn (gen_mve_q (VRHADDQ_S, VRHADDQ_S, <MODE>mode,
 				operands[0], operands[1], operands[2]));
   else
     emit_insn (gen_neon_vhadd (UNSPEC_VRHADD_S, UNSPEC_VRHADD_S, <MODE>mode,
@@ -618,7 +618,7 @@ (define_expand "uavg<mode>3_ceil"
   "ARM_HAVE_<MODE>_ARITH"
 {
   if (TARGET_HAVE_MVE)
-    emit_insn (gen_mve_vrhaddq (VRHADDQ_U, <MODE>mode,
+    emit_insn (gen_mve_q (VRHADDQ_U, VRHADDQ_U, <MODE>mode,
 				operands[0], operands[1], operands[2]));
   else
     emit_insn (gen_neon_vhadd (UNSPEC_VRHADD_U, UNSPEC_VRHADD_U, <MODE>mode,
-- 
2.34.1



* [PATCH 22/22] arm: [MVE intrinsics] rework vhaddq vhsubq vmulhq vqaddq vqsubq vqdmulhq vrhaddq vrmulhq
  2023-04-18 13:45 [PATCH 00/22] arm: New framework for MVE intrinsics Christophe Lyon
                   ` (20 preceding siblings ...)
  2023-04-18 13:46 ` [PATCH 21/22] arm: [MVE intrinsics] factorize several binary operations Christophe Lyon
@ 2023-04-18 13:46 ` Christophe Lyon
  2023-05-03  8:51   ` Kyrylo Tkachov
  2023-05-02  9:18 ` [PATCH 00/22] arm: New framework for MVE intrinsics Kyrylo Tkachov
  22 siblings, 1 reply; 55+ messages in thread
From: Christophe Lyon @ 2023-04-18 13:46 UTC (permalink / raw)
  To: gcc-patches, kyrylo.tkachov, richard.earnshaw, richard.sandiford
  Cc: Christophe Lyon

Implement vhaddq, vhsubq, vmulhq, vqaddq, vqsubq, vqdmulhq, vrhaddq,
vrmulhq using the new MVE builtins framework.
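
The user-facing API does not change: the overloaded and predicated
forms keep working, they are simply resolved through the builtins
framework instead of the arm_mve.h macros removed below.  A minimal
sketch (hypothetical function name, assuming an MVE-enabled target
such as -march=armv8.1-m.main+mve):

  #include <arm_mve.h>

  /* Both the plain form and the _x (dont-care predicated) form of
     vhaddq are now registered by FUNCTION_WITH_M_N_NO_F below.  */
  uint8x16_t
  hadd_example (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
  {
    uint8x16_t t = vhaddq (a, b);
    return vhaddq_x (t, b, p);
  }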

2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>

	gcc/
	* config/arm/arm-mve-builtins-base.cc (FUNCTION_WITH_M_N_NO_F)
	(FUNCTION_WITHOUT_N_NO_F, FUNCTION_WITH_M_N_NO_U_F): New.
	(vhaddq, vhsubq, vmulhq, vqaddq, vqsubq, vqdmulhq, vrhaddq)
	(vrmulhq): New.
	* config/arm/arm-mve-builtins-base.def (vhaddq, vhsubq, vmulhq)
	(vqaddq, vqsubq, vqdmulhq, vrhaddq, vrmulhq): New.
	* config/arm/arm-mve-builtins-base.h (vhaddq, vhsubq, vmulhq)
	(vqaddq, vqsubq, vqdmulhq, vrhaddq, vrmulhq): New.
	* config/arm/arm_mve.h (vhsubq): Remove.
	(vhaddq): Remove.
	(vhaddq_m): Remove.
	(vhsubq_m): Remove.
	(vhaddq_x): Remove.
	(vhsubq_x): Remove.
	(vhsubq_u8): Remove.
	(vhsubq_n_u8): Remove.
	(vhaddq_u8): Remove.
	(vhaddq_n_u8): Remove.
	(vhsubq_s8): Remove.
	(vhsubq_n_s8): Remove.
	(vhaddq_s8): Remove.
	(vhaddq_n_s8): Remove.
	(vhsubq_u16): Remove.
	(vhsubq_n_u16): Remove.
	(vhaddq_u16): Remove.
	(vhaddq_n_u16): Remove.
	(vhsubq_s16): Remove.
	(vhsubq_n_s16): Remove.
	(vhaddq_s16): Remove.
	(vhaddq_n_s16): Remove.
	(vhsubq_u32): Remove.
	(vhsubq_n_u32): Remove.
	(vhaddq_u32): Remove.
	(vhaddq_n_u32): Remove.
	(vhsubq_s32): Remove.
	(vhsubq_n_s32): Remove.
	(vhaddq_s32): Remove.
	(vhaddq_n_s32): Remove.
	(vhaddq_m_n_s8): Remove.
	(vhaddq_m_n_s32): Remove.
	(vhaddq_m_n_s16): Remove.
	(vhaddq_m_n_u8): Remove.
	(vhaddq_m_n_u32): Remove.
	(vhaddq_m_n_u16): Remove.
	(vhaddq_m_s8): Remove.
	(vhaddq_m_s32): Remove.
	(vhaddq_m_s16): Remove.
	(vhaddq_m_u8): Remove.
	(vhaddq_m_u32): Remove.
	(vhaddq_m_u16): Remove.
	(vhsubq_m_n_s8): Remove.
	(vhsubq_m_n_s32): Remove.
	(vhsubq_m_n_s16): Remove.
	(vhsubq_m_n_u8): Remove.
	(vhsubq_m_n_u32): Remove.
	(vhsubq_m_n_u16): Remove.
	(vhsubq_m_s8): Remove.
	(vhsubq_m_s32): Remove.
	(vhsubq_m_s16): Remove.
	(vhsubq_m_u8): Remove.
	(vhsubq_m_u32): Remove.
	(vhsubq_m_u16): Remove.
	(vhaddq_x_n_s8): Remove.
	(vhaddq_x_n_s16): Remove.
	(vhaddq_x_n_s32): Remove.
	(vhaddq_x_n_u8): Remove.
	(vhaddq_x_n_u16): Remove.
	(vhaddq_x_n_u32): Remove.
	(vhaddq_x_s8): Remove.
	(vhaddq_x_s16): Remove.
	(vhaddq_x_s32): Remove.
	(vhaddq_x_u8): Remove.
	(vhaddq_x_u16): Remove.
	(vhaddq_x_u32): Remove.
	(vhsubq_x_n_s8): Remove.
	(vhsubq_x_n_s16): Remove.
	(vhsubq_x_n_s32): Remove.
	(vhsubq_x_n_u8): Remove.
	(vhsubq_x_n_u16): Remove.
	(vhsubq_x_n_u32): Remove.
	(vhsubq_x_s8): Remove.
	(vhsubq_x_s16): Remove.
	(vhsubq_x_s32): Remove.
	(vhsubq_x_u8): Remove.
	(vhsubq_x_u16): Remove.
	(vhsubq_x_u32): Remove.
	(__arm_vhsubq_u8): Remove.
	(__arm_vhsubq_n_u8): Remove.
	(__arm_vhaddq_u8): Remove.
	(__arm_vhaddq_n_u8): Remove.
	(__arm_vhsubq_s8): Remove.
	(__arm_vhsubq_n_s8): Remove.
	(__arm_vhaddq_s8): Remove.
	(__arm_vhaddq_n_s8): Remove.
	(__arm_vhsubq_u16): Remove.
	(__arm_vhsubq_n_u16): Remove.
	(__arm_vhaddq_u16): Remove.
	(__arm_vhaddq_n_u16): Remove.
	(__arm_vhsubq_s16): Remove.
	(__arm_vhsubq_n_s16): Remove.
	(__arm_vhaddq_s16): Remove.
	(__arm_vhaddq_n_s16): Remove.
	(__arm_vhsubq_u32): Remove.
	(__arm_vhsubq_n_u32): Remove.
	(__arm_vhaddq_u32): Remove.
	(__arm_vhaddq_n_u32): Remove.
	(__arm_vhsubq_s32): Remove.
	(__arm_vhsubq_n_s32): Remove.
	(__arm_vhaddq_s32): Remove.
	(__arm_vhaddq_n_s32): Remove.
	(__arm_vhaddq_m_n_s8): Remove.
	(__arm_vhaddq_m_n_s32): Remove.
	(__arm_vhaddq_m_n_s16): Remove.
	(__arm_vhaddq_m_n_u8): Remove.
	(__arm_vhaddq_m_n_u32): Remove.
	(__arm_vhaddq_m_n_u16): Remove.
	(__arm_vhaddq_m_s8): Remove.
	(__arm_vhaddq_m_s32): Remove.
	(__arm_vhaddq_m_s16): Remove.
	(__arm_vhaddq_m_u8): Remove.
	(__arm_vhaddq_m_u32): Remove.
	(__arm_vhaddq_m_u16): Remove.
	(__arm_vhsubq_m_n_s8): Remove.
	(__arm_vhsubq_m_n_s32): Remove.
	(__arm_vhsubq_m_n_s16): Remove.
	(__arm_vhsubq_m_n_u8): Remove.
	(__arm_vhsubq_m_n_u32): Remove.
	(__arm_vhsubq_m_n_u16): Remove.
	(__arm_vhsubq_m_s8): Remove.
	(__arm_vhsubq_m_s32): Remove.
	(__arm_vhsubq_m_s16): Remove.
	(__arm_vhsubq_m_u8): Remove.
	(__arm_vhsubq_m_u32): Remove.
	(__arm_vhsubq_m_u16): Remove.
	(__arm_vhaddq_x_n_s8): Remove.
	(__arm_vhaddq_x_n_s16): Remove.
	(__arm_vhaddq_x_n_s32): Remove.
	(__arm_vhaddq_x_n_u8): Remove.
	(__arm_vhaddq_x_n_u16): Remove.
	(__arm_vhaddq_x_n_u32): Remove.
	(__arm_vhaddq_x_s8): Remove.
	(__arm_vhaddq_x_s16): Remove.
	(__arm_vhaddq_x_s32): Remove.
	(__arm_vhaddq_x_u8): Remove.
	(__arm_vhaddq_x_u16): Remove.
	(__arm_vhaddq_x_u32): Remove.
	(__arm_vhsubq_x_n_s8): Remove.
	(__arm_vhsubq_x_n_s16): Remove.
	(__arm_vhsubq_x_n_s32): Remove.
	(__arm_vhsubq_x_n_u8): Remove.
	(__arm_vhsubq_x_n_u16): Remove.
	(__arm_vhsubq_x_n_u32): Remove.
	(__arm_vhsubq_x_s8): Remove.
	(__arm_vhsubq_x_s16): Remove.
	(__arm_vhsubq_x_s32): Remove.
	(__arm_vhsubq_x_u8): Remove.
	(__arm_vhsubq_x_u16): Remove.
	(__arm_vhsubq_x_u32): Remove.
	(__arm_vhsubq): Remove.
	(__arm_vhaddq): Remove.
	(__arm_vhaddq_m): Remove.
	(__arm_vhsubq_m): Remove.
	(__arm_vhaddq_x): Remove.
	(__arm_vhsubq_x): Remove.
	(vmulhq): Remove.
	(vmulhq_m): Remove.
	(vmulhq_x): Remove.
	(vmulhq_u8): Remove.
	(vmulhq_s8): Remove.
	(vmulhq_u16): Remove.
	(vmulhq_s16): Remove.
	(vmulhq_u32): Remove.
	(vmulhq_s32): Remove.
	(vmulhq_m_s8): Remove.
	(vmulhq_m_s32): Remove.
	(vmulhq_m_s16): Remove.
	(vmulhq_m_u8): Remove.
	(vmulhq_m_u32): Remove.
	(vmulhq_m_u16): Remove.
	(vmulhq_x_s8): Remove.
	(vmulhq_x_s16): Remove.
	(vmulhq_x_s32): Remove.
	(vmulhq_x_u8): Remove.
	(vmulhq_x_u16): Remove.
	(vmulhq_x_u32): Remove.
	(__arm_vmulhq_u8): Remove.
	(__arm_vmulhq_s8): Remove.
	(__arm_vmulhq_u16): Remove.
	(__arm_vmulhq_s16): Remove.
	(__arm_vmulhq_u32): Remove.
	(__arm_vmulhq_s32): Remove.
	(__arm_vmulhq_m_s8): Remove.
	(__arm_vmulhq_m_s32): Remove.
	(__arm_vmulhq_m_s16): Remove.
	(__arm_vmulhq_m_u8): Remove.
	(__arm_vmulhq_m_u32): Remove.
	(__arm_vmulhq_m_u16): Remove.
	(__arm_vmulhq_x_s8): Remove.
	(__arm_vmulhq_x_s16): Remove.
	(__arm_vmulhq_x_s32): Remove.
	(__arm_vmulhq_x_u8): Remove.
	(__arm_vmulhq_x_u16): Remove.
	(__arm_vmulhq_x_u32): Remove.
	(__arm_vmulhq): Remove.
	(__arm_vmulhq_m): Remove.
	(__arm_vmulhq_x): Remove.
	(vqsubq): Remove.
	(vqaddq): Remove.
	(vqaddq_m): Remove.
	(vqsubq_m): Remove.
	(vqsubq_u8): Remove.
	(vqsubq_n_u8): Remove.
	(vqaddq_u8): Remove.
	(vqaddq_n_u8): Remove.
	(vqsubq_s8): Remove.
	(vqsubq_n_s8): Remove.
	(vqaddq_s8): Remove.
	(vqaddq_n_s8): Remove.
	(vqsubq_u16): Remove.
	(vqsubq_n_u16): Remove.
	(vqaddq_u16): Remove.
	(vqaddq_n_u16): Remove.
	(vqsubq_s16): Remove.
	(vqsubq_n_s16): Remove.
	(vqaddq_s16): Remove.
	(vqaddq_n_s16): Remove.
	(vqsubq_u32): Remove.
	(vqsubq_n_u32): Remove.
	(vqaddq_u32): Remove.
	(vqaddq_n_u32): Remove.
	(vqsubq_s32): Remove.
	(vqsubq_n_s32): Remove.
	(vqaddq_s32): Remove.
	(vqaddq_n_s32): Remove.
	(vqaddq_m_n_s8): Remove.
	(vqaddq_m_n_s32): Remove.
	(vqaddq_m_n_s16): Remove.
	(vqaddq_m_n_u8): Remove.
	(vqaddq_m_n_u32): Remove.
	(vqaddq_m_n_u16): Remove.
	(vqaddq_m_s8): Remove.
	(vqaddq_m_s32): Remove.
	(vqaddq_m_s16): Remove.
	(vqaddq_m_u8): Remove.
	(vqaddq_m_u32): Remove.
	(vqaddq_m_u16): Remove.
	(vqsubq_m_n_s8): Remove.
	(vqsubq_m_n_s32): Remove.
	(vqsubq_m_n_s16): Remove.
	(vqsubq_m_n_u8): Remove.
	(vqsubq_m_n_u32): Remove.
	(vqsubq_m_n_u16): Remove.
	(vqsubq_m_s8): Remove.
	(vqsubq_m_s32): Remove.
	(vqsubq_m_s16): Remove.
	(vqsubq_m_u8): Remove.
	(vqsubq_m_u32): Remove.
	(vqsubq_m_u16): Remove.
	(__arm_vqsubq_u8): Remove.
	(__arm_vqsubq_n_u8): Remove.
	(__arm_vqaddq_u8): Remove.
	(__arm_vqaddq_n_u8): Remove.
	(__arm_vqsubq_s8): Remove.
	(__arm_vqsubq_n_s8): Remove.
	(__arm_vqaddq_s8): Remove.
	(__arm_vqaddq_n_s8): Remove.
	(__arm_vqsubq_u16): Remove.
	(__arm_vqsubq_n_u16): Remove.
	(__arm_vqaddq_u16): Remove.
	(__arm_vqaddq_n_u16): Remove.
	(__arm_vqsubq_s16): Remove.
	(__arm_vqsubq_n_s16): Remove.
	(__arm_vqaddq_s16): Remove.
	(__arm_vqaddq_n_s16): Remove.
	(__arm_vqsubq_u32): Remove.
	(__arm_vqsubq_n_u32): Remove.
	(__arm_vqaddq_u32): Remove.
	(__arm_vqaddq_n_u32): Remove.
	(__arm_vqsubq_s32): Remove.
	(__arm_vqsubq_n_s32): Remove.
	(__arm_vqaddq_s32): Remove.
	(__arm_vqaddq_n_s32): Remove.
	(__arm_vqaddq_m_n_s8): Remove.
	(__arm_vqaddq_m_n_s32): Remove.
	(__arm_vqaddq_m_n_s16): Remove.
	(__arm_vqaddq_m_n_u8): Remove.
	(__arm_vqaddq_m_n_u32): Remove.
	(__arm_vqaddq_m_n_u16): Remove.
	(__arm_vqaddq_m_s8): Remove.
	(__arm_vqaddq_m_s32): Remove.
	(__arm_vqaddq_m_s16): Remove.
	(__arm_vqaddq_m_u8): Remove.
	(__arm_vqaddq_m_u32): Remove.
	(__arm_vqaddq_m_u16): Remove.
	(__arm_vqsubq_m_n_s8): Remove.
	(__arm_vqsubq_m_n_s32): Remove.
	(__arm_vqsubq_m_n_s16): Remove.
	(__arm_vqsubq_m_n_u8): Remove.
	(__arm_vqsubq_m_n_u32): Remove.
	(__arm_vqsubq_m_n_u16): Remove.
	(__arm_vqsubq_m_s8): Remove.
	(__arm_vqsubq_m_s32): Remove.
	(__arm_vqsubq_m_s16): Remove.
	(__arm_vqsubq_m_u8): Remove.
	(__arm_vqsubq_m_u32): Remove.
	(__arm_vqsubq_m_u16): Remove.
	(__arm_vqsubq): Remove.
	(__arm_vqaddq): Remove.
	(__arm_vqaddq_m): Remove.
	(__arm_vqsubq_m): Remove.
	(vqdmulhq): Remove.
	(vqdmulhq_m): Remove.
	(vqdmulhq_s8): Remove.
	(vqdmulhq_n_s8): Remove.
	(vqdmulhq_s16): Remove.
	(vqdmulhq_n_s16): Remove.
	(vqdmulhq_s32): Remove.
	(vqdmulhq_n_s32): Remove.
	(vqdmulhq_m_n_s8): Remove.
	(vqdmulhq_m_n_s32): Remove.
	(vqdmulhq_m_n_s16): Remove.
	(vqdmulhq_m_s8): Remove.
	(vqdmulhq_m_s32): Remove.
	(vqdmulhq_m_s16): Remove.
	(__arm_vqdmulhq_s8): Remove.
	(__arm_vqdmulhq_n_s8): Remove.
	(__arm_vqdmulhq_s16): Remove.
	(__arm_vqdmulhq_n_s16): Remove.
	(__arm_vqdmulhq_s32): Remove.
	(__arm_vqdmulhq_n_s32): Remove.
	(__arm_vqdmulhq_m_n_s8): Remove.
	(__arm_vqdmulhq_m_n_s32): Remove.
	(__arm_vqdmulhq_m_n_s16): Remove.
	(__arm_vqdmulhq_m_s8): Remove.
	(__arm_vqdmulhq_m_s32): Remove.
	(__arm_vqdmulhq_m_s16): Remove.
	(__arm_vqdmulhq): Remove.
	(__arm_vqdmulhq_m): Remove.
	(vrhaddq): Remove.
	(vrhaddq_m): Remove.
	(vrhaddq_x): Remove.
	(vrhaddq_u8): Remove.
	(vrhaddq_s8): Remove.
	(vrhaddq_u16): Remove.
	(vrhaddq_s16): Remove.
	(vrhaddq_u32): Remove.
	(vrhaddq_s32): Remove.
	(vrhaddq_m_s8): Remove.
	(vrhaddq_m_s32): Remove.
	(vrhaddq_m_s16): Remove.
	(vrhaddq_m_u8): Remove.
	(vrhaddq_m_u32): Remove.
	(vrhaddq_m_u16): Remove.
	(vrhaddq_x_s8): Remove.
	(vrhaddq_x_s16): Remove.
	(vrhaddq_x_s32): Remove.
	(vrhaddq_x_u8): Remove.
	(vrhaddq_x_u16): Remove.
	(vrhaddq_x_u32): Remove.
	(__arm_vrhaddq_u8): Remove.
	(__arm_vrhaddq_s8): Remove.
	(__arm_vrhaddq_u16): Remove.
	(__arm_vrhaddq_s16): Remove.
	(__arm_vrhaddq_u32): Remove.
	(__arm_vrhaddq_s32): Remove.
	(__arm_vrhaddq_m_s8): Remove.
	(__arm_vrhaddq_m_s32): Remove.
	(__arm_vrhaddq_m_s16): Remove.
	(__arm_vrhaddq_m_u8): Remove.
	(__arm_vrhaddq_m_u32): Remove.
	(__arm_vrhaddq_m_u16): Remove.
	(__arm_vrhaddq_x_s8): Remove.
	(__arm_vrhaddq_x_s16): Remove.
	(__arm_vrhaddq_x_s32): Remove.
	(__arm_vrhaddq_x_u8): Remove.
	(__arm_vrhaddq_x_u16): Remove.
	(__arm_vrhaddq_x_u32): Remove.
	(__arm_vrhaddq): Remove.
	(__arm_vrhaddq_m): Remove.
	(__arm_vrhaddq_x): Remove.
	(vrmulhq): Remove.
	(vrmulhq_m): Remove.
	(vrmulhq_x): Remove.
	(vrmulhq_u8): Remove.
	(vrmulhq_s8): Remove.
	(vrmulhq_u16): Remove.
	(vrmulhq_s16): Remove.
	(vrmulhq_u32): Remove.
	(vrmulhq_s32): Remove.
	(vrmulhq_m_s8): Remove.
	(vrmulhq_m_s32): Remove.
	(vrmulhq_m_s16): Remove.
	(vrmulhq_m_u8): Remove.
	(vrmulhq_m_u32): Remove.
	(vrmulhq_m_u16): Remove.
	(vrmulhq_x_s8): Remove.
	(vrmulhq_x_s16): Remove.
	(vrmulhq_x_s32): Remove.
	(vrmulhq_x_u8): Remove.
	(vrmulhq_x_u16): Remove.
	(vrmulhq_x_u32): Remove.
	(__arm_vrmulhq_u8): Remove.
	(__arm_vrmulhq_s8): Remove.
	(__arm_vrmulhq_u16): Remove.
	(__arm_vrmulhq_s16): Remove.
	(__arm_vrmulhq_u32): Remove.
	(__arm_vrmulhq_s32): Remove.
	(__arm_vrmulhq_m_s8): Remove.
	(__arm_vrmulhq_m_s32): Remove.
	(__arm_vrmulhq_m_s16): Remove.
	(__arm_vrmulhq_m_u8): Remove.
	(__arm_vrmulhq_m_u32): Remove.
	(__arm_vrmulhq_m_u16): Remove.
	(__arm_vrmulhq_x_s8): Remove.
	(__arm_vrmulhq_x_s16): Remove.
	(__arm_vrmulhq_x_s32): Remove.
	(__arm_vrmulhq_x_u8): Remove.
	(__arm_vrmulhq_x_u16): Remove.
	(__arm_vrmulhq_x_u32): Remove.
	(__arm_vrmulhq): Remove.
	(__arm_vrmulhq_m): Remove.
	(__arm_vrmulhq_x): Remove.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |   35 +
 gcc/config/arm/arm-mve-builtins-base.def |    8 +
 gcc/config/arm/arm-mve-builtins-base.h   |    8 +
 gcc/config/arm/arm_mve.h                 | 3203 ----------------------
 4 files changed, 51 insertions(+), 3203 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index 9722c861faf..668f1fe9cda 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -115,13 +115,48 @@ namespace arm_mve {
     -1, -1, -1,								\
     -1, -1, -1))
 
+  /* Helper for builtins with only unspec codes, _m predicated and _n
+     overrides, but no floating-point version.  */
+#define FUNCTION_WITH_M_N_NO_F(NAME, UNSPEC) FUNCTION			\
+  (NAME, unspec_mve_function_exact_insn,				\
+   (UNSPEC##_S, UNSPEC##_U, -1,						\
+    UNSPEC##_N_S, UNSPEC##_N_U, -1,					\
+    UNSPEC##_M_S, UNSPEC##_M_U, -1,					\
+    UNSPEC##_M_N_S, UNSPEC##_M_N_U, -1))
+
+  /* Helper for builtins with only unspec codes, _m predicated
+     overrides, no _n and no floating-point version.  */
+#define FUNCTION_WITHOUT_N_NO_F(NAME, UNSPEC) FUNCTION			\
+  (NAME, unspec_mve_function_exact_insn,				\
+   (UNSPEC##_S, UNSPEC##_U, -1,						\
+    -1, -1, -1,								\
+    UNSPEC##_M_S, UNSPEC##_M_U, -1,					\
+    -1, -1, -1))
+
+  /* Helper for builtins with only unspec codes, _m predicated and _n
+     overrides, but no unsigned and floating-point versions.  */
+#define FUNCTION_WITH_M_N_NO_U_F(NAME, UNSPEC) FUNCTION			\
+  (NAME, unspec_mve_function_exact_insn,				\
+   (UNSPEC##_S, -1, -1,							\
+    UNSPEC##_N_S, -1, -1,						\
+    UNSPEC##_M_S, -1, -1,						\
+    UNSPEC##_M_N_S, -1, -1))
+
 FUNCTION_WITH_RTX_M_N (vaddq, PLUS, VADDQ)
 FUNCTION_WITH_RTX_M (vandq, AND, VANDQ)
 FUNCTION_WITHOUT_M_N (vcreateq, VCREATEQ)
 FUNCTION_WITH_RTX_M (veorq, XOR, VEORQ)
+FUNCTION_WITH_M_N_NO_F (vhaddq, VHADDQ)
+FUNCTION_WITH_M_N_NO_F (vhsubq, VHSUBQ)
+FUNCTION_WITHOUT_N_NO_F (vmulhq, VMULHQ)
 FUNCTION_WITH_RTX_M_N (vmulq, MULT, VMULQ)
 FUNCTION_WITH_RTX_M_N_NO_N_F (vorrq, IOR, VORRQ)
+FUNCTION_WITH_M_N_NO_F (vqaddq, VQADDQ)
+FUNCTION_WITH_M_N_NO_U_F (vqdmulhq, VQDMULHQ)
+FUNCTION_WITH_M_N_NO_F (vqsubq, VQSUBQ)
 FUNCTION (vreinterpretq, vreinterpretq_impl,)
+FUNCTION_WITHOUT_N_NO_F (vrhaddq, VRHADDQ)
+FUNCTION_WITHOUT_N_NO_F (vrmulhq, VRMULHQ)
 FUNCTION_WITH_RTX_M_N (vsubq, MINUS, VSUBQ)
 FUNCTION (vuninitializedq, vuninitializedq_impl,)
 
diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
index 1bfd15f973c..d256f3ebb2d 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -22,9 +22,17 @@ DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vandq, binary, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vcreateq, create, all_integer_with_64, none)
 DEF_MVE_FUNCTION (veorq, binary, all_integer, mx_or_none)
+DEF_MVE_FUNCTION (vhaddq, binary_opt_n, all_integer, mx_or_none)
+DEF_MVE_FUNCTION (vhsubq, binary_opt_n, all_integer, mx_or_none)
+DEF_MVE_FUNCTION (vmulhq, binary, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vmulq, binary_opt_n, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vorrq, binary_orrq, all_integer, mx_or_none)
+DEF_MVE_FUNCTION (vqaddq, binary_opt_n, all_integer, m_or_none)
+DEF_MVE_FUNCTION (vqdmulhq, binary_opt_n, all_signed, m_or_none)
+DEF_MVE_FUNCTION (vqsubq, binary_opt_n, all_integer, m_or_none)
 DEF_MVE_FUNCTION (vreinterpretq, unary_convert, reinterpret_integer, none)
+DEF_MVE_FUNCTION (vrhaddq, binary, all_integer, mx_or_none)
+DEF_MVE_FUNCTION (vrmulhq, binary, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vsubq, binary_opt_n, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vuninitializedq, inherent, all_integer_with_64, none)
 #undef REQUIRES_FLOAT
diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h
index 8dd6bff01bf..d64cb5e1dec 100644
--- a/gcc/config/arm/arm-mve-builtins-base.h
+++ b/gcc/config/arm/arm-mve-builtins-base.h
@@ -27,9 +27,17 @@ extern const function_base *const vaddq;
 extern const function_base *const vandq;
 extern const function_base *const vcreateq;
 extern const function_base *const veorq;
+extern const function_base *const vhaddq;
+extern const function_base *const vhsubq;
+extern const function_base *const vmulhq;
 extern const function_base *const vmulq;
 extern const function_base *const vorrq;
+extern const function_base *const vqaddq;
+extern const function_base *const vqdmulhq;
+extern const function_base *const vqsubq;
 extern const function_base *const vreinterpretq;
+extern const function_base *const vrhaddq;
+extern const function_base *const vrmulhq;
 extern const function_base *const vsubq;
 extern const function_base *const vuninitializedq;
 
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 4810e2977d3..9c5d14794a1 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -61,21 +61,14 @@
 #define vaddlvq_p(__a, __p) __arm_vaddlvq_p(__a, __p)
 #define vcmpneq(__a, __b) __arm_vcmpneq(__a, __b)
 #define vshlq(__a, __b) __arm_vshlq(__a, __b)
-#define vrmulhq(__a, __b) __arm_vrmulhq(__a, __b)
-#define vrhaddq(__a, __b) __arm_vrhaddq(__a, __b)
-#define vqsubq(__a, __b) __arm_vqsubq(__a, __b)
-#define vqaddq(__a, __b) __arm_vqaddq(__a, __b)
 #define vornq(__a, __b) __arm_vornq(__a, __b)
 #define vmulltq_int(__a, __b) __arm_vmulltq_int(__a, __b)
 #define vmullbq_int(__a, __b) __arm_vmullbq_int(__a, __b)
-#define vmulhq(__a, __b) __arm_vmulhq(__a, __b)
 #define vmladavq(__a, __b) __arm_vmladavq(__a, __b)
 #define vminvq(__a, __b) __arm_vminvq(__a, __b)
 #define vminq(__a, __b) __arm_vminq(__a, __b)
 #define vmaxvq(__a, __b) __arm_vmaxvq(__a, __b)
 #define vmaxq(__a, __b) __arm_vmaxq(__a, __b)
-#define vhsubq(__a, __b) __arm_vhsubq(__a, __b)
-#define vhaddq(__a, __b) __arm_vhaddq(__a, __b)
 #define vcmphiq(__a, __b) __arm_vcmphiq(__a, __b)
 #define vcmpeqq(__a, __b) __arm_vcmpeqq(__a, __b)
 #define vcmpcsq(__a, __b) __arm_vcmpcsq(__a, __b)
@@ -104,7 +97,6 @@
 #define vcmpgeq(__a, __b) __arm_vcmpgeq(__a, __b)
 #define vqshluq(__a, __imm) __arm_vqshluq(__a, __imm)
 #define vqrdmulhq(__a, __b) __arm_vqrdmulhq(__a, __b)
-#define vqdmulhq(__a, __b) __arm_vqdmulhq(__a, __b)
 #define vmlsdavxq(__a, __b) __arm_vmlsdavxq(__a, __b)
 #define vmlsdavq(__a, __b) __arm_vmlsdavq(__a, __b)
 #define vmladavxq(__a, __b) __arm_vmladavxq(__a, __b)
@@ -236,10 +228,8 @@
 #define vbrsrq_m(__inactive, __a, __b, __p) __arm_vbrsrq_m(__inactive, __a, __b, __p)
 #define vcaddq_rot270_m(__inactive, __a, __b, __p) __arm_vcaddq_rot270_m(__inactive, __a, __b, __p)
 #define vcaddq_rot90_m(__inactive, __a, __b, __p) __arm_vcaddq_rot90_m(__inactive, __a, __b, __p)
-#define vhaddq_m(__inactive, __a, __b, __p) __arm_vhaddq_m(__inactive, __a, __b, __p)
 #define vhcaddq_rot270_m(__inactive, __a, __b, __p) __arm_vhcaddq_rot270_m(__inactive, __a, __b, __p)
 #define vhcaddq_rot90_m(__inactive, __a, __b, __p) __arm_vhcaddq_rot90_m(__inactive, __a, __b, __p)
-#define vhsubq_m(__inactive, __a, __b, __p) __arm_vhsubq_m(__inactive, __a, __b, __p)
 #define vmaxq_m(__inactive, __a, __b, __p) __arm_vmaxq_m(__inactive, __a, __b, __p)
 #define vminq_m(__inactive, __a, __b, __p) __arm_vminq_m(__inactive, __a, __b, __p)
 #define vmladavaq_p(__a, __b, __c, __p) __arm_vmladavaq_p(__a, __b, __c, __p)
@@ -248,18 +238,15 @@
 #define vmlasq_m(__a, __b, __c, __p) __arm_vmlasq_m(__a, __b, __c, __p)
 #define vmlsdavaq_p(__a, __b, __c, __p) __arm_vmlsdavaq_p(__a, __b, __c, __p)
 #define vmlsdavaxq_p(__a, __b, __c, __p) __arm_vmlsdavaxq_p(__a, __b, __c, __p)
-#define vmulhq_m(__inactive, __a, __b, __p) __arm_vmulhq_m(__inactive, __a, __b, __p)
 #define vmullbq_int_m(__inactive, __a, __b, __p) __arm_vmullbq_int_m(__inactive, __a, __b, __p)
 #define vmulltq_int_m(__inactive, __a, __b, __p) __arm_vmulltq_int_m(__inactive, __a, __b, __p)
 #define vornq_m(__inactive, __a, __b, __p) __arm_vornq_m(__inactive, __a, __b, __p)
-#define vqaddq_m(__inactive, __a, __b, __p) __arm_vqaddq_m(__inactive, __a, __b, __p)
 #define vqdmladhq_m(__inactive, __a, __b, __p) __arm_vqdmladhq_m(__inactive, __a, __b, __p)
 #define vqdmlashq_m(__a, __b, __c, __p) __arm_vqdmlashq_m(__a, __b, __c, __p)
 #define vqdmladhxq_m(__inactive, __a, __b, __p) __arm_vqdmladhxq_m(__inactive, __a, __b, __p)
 #define vqdmlahq_m(__a, __b, __c, __p) __arm_vqdmlahq_m(__a, __b, __c, __p)
 #define vqdmlsdhq_m(__inactive, __a, __b, __p) __arm_vqdmlsdhq_m(__inactive, __a, __b, __p)
 #define vqdmlsdhxq_m(__inactive, __a, __b, __p) __arm_vqdmlsdhxq_m(__inactive, __a, __b, __p)
-#define vqdmulhq_m(__inactive, __a, __b, __p) __arm_vqdmulhq_m(__inactive, __a, __b, __p)
 #define vqrdmladhq_m(__inactive, __a, __b, __p) __arm_vqrdmladhq_m(__inactive, __a, __b, __p)
 #define vqrdmladhxq_m(__inactive, __a, __b, __p) __arm_vqrdmladhxq_m(__inactive, __a, __b, __p)
 #define vqrdmlahq_m(__a, __b, __c, __p) __arm_vqrdmlahq_m(__a, __b, __c, __p)
@@ -270,9 +257,6 @@
 #define vqrshlq_m(__inactive, __a, __b, __p) __arm_vqrshlq_m(__inactive, __a, __b, __p)
 #define vqshlq_m_n(__inactive, __a, __imm, __p) __arm_vqshlq_m_n(__inactive, __a, __imm, __p)
 #define vqshlq_m(__inactive, __a, __b, __p) __arm_vqshlq_m(__inactive, __a, __b, __p)
-#define vqsubq_m(__inactive, __a, __b, __p) __arm_vqsubq_m(__inactive, __a, __b, __p)
-#define vrhaddq_m(__inactive, __a, __b, __p) __arm_vrhaddq_m(__inactive, __a, __b, __p)
-#define vrmulhq_m(__inactive, __a, __b, __p) __arm_vrmulhq_m(__inactive, __a, __b, __p)
 #define vrshlq_m(__inactive, __a, __b, __p) __arm_vrshlq_m(__inactive, __a, __b, __p)
 #define vrshrq_m(__inactive, __a, __imm, __p) __arm_vrshrq_m(__inactive, __a, __imm, __p)
 #define vshlq_m_n(__inactive, __a, __imm, __p) __arm_vshlq_m_n(__inactive, __a, __imm, __p)
@@ -384,19 +368,14 @@
 #define vclsq_x(__a, __p) __arm_vclsq_x(__a, __p)
 #define vclzq_x(__a, __p) __arm_vclzq_x(__a, __p)
 #define vnegq_x(__a, __p) __arm_vnegq_x(__a, __p)
-#define vmulhq_x(__a, __b, __p) __arm_vmulhq_x(__a, __b, __p)
 #define vmullbq_poly_x(__a, __b, __p) __arm_vmullbq_poly_x(__a, __b, __p)
 #define vmullbq_int_x(__a, __b, __p) __arm_vmullbq_int_x(__a, __b, __p)
 #define vmulltq_poly_x(__a, __b, __p) __arm_vmulltq_poly_x(__a, __b, __p)
 #define vmulltq_int_x(__a, __b, __p) __arm_vmulltq_int_x(__a, __b, __p)
 #define vcaddq_rot90_x(__a, __b, __p) __arm_vcaddq_rot90_x(__a, __b, __p)
 #define vcaddq_rot270_x(__a, __b, __p) __arm_vcaddq_rot270_x(__a, __b, __p)
-#define vhaddq_x(__a, __b, __p) __arm_vhaddq_x(__a, __b, __p)
 #define vhcaddq_rot90_x(__a, __b, __p) __arm_vhcaddq_rot90_x(__a, __b, __p)
 #define vhcaddq_rot270_x(__a, __b, __p) __arm_vhcaddq_rot270_x(__a, __b, __p)
-#define vhsubq_x(__a, __b, __p) __arm_vhsubq_x(__a, __b, __p)
-#define vrhaddq_x(__a, __b, __p) __arm_vrhaddq_x(__a, __b, __p)
-#define vrmulhq_x(__a, __b, __p) __arm_vrmulhq_x(__a, __b, __p)
 #define vbicq_x(__a, __b, __p) __arm_vbicq_x(__a, __b, __p)
 #define vbrsrq_x(__a, __b, __p) __arm_vbrsrq_x(__a, __b, __p)
 #define vmovlbq_x(__a, __p) __arm_vmovlbq_x(__a, __p)
@@ -662,25 +641,14 @@
 #define vshlq_u8(__a, __b) __arm_vshlq_u8(__a, __b)
 #define vshlq_u16(__a, __b) __arm_vshlq_u16(__a, __b)
 #define vshlq_u32(__a, __b) __arm_vshlq_u32(__a, __b)
-#define vrmulhq_u8(__a, __b) __arm_vrmulhq_u8(__a, __b)
-#define vrhaddq_u8(__a, __b) __arm_vrhaddq_u8(__a, __b)
-#define vqsubq_u8(__a, __b) __arm_vqsubq_u8(__a, __b)
-#define vqsubq_n_u8(__a, __b) __arm_vqsubq_n_u8(__a, __b)
-#define vqaddq_u8(__a, __b) __arm_vqaddq_u8(__a, __b)
-#define vqaddq_n_u8(__a, __b) __arm_vqaddq_n_u8(__a, __b)
 #define vornq_u8(__a, __b) __arm_vornq_u8(__a, __b)
 #define vmulltq_int_u8(__a, __b) __arm_vmulltq_int_u8(__a, __b)
 #define vmullbq_int_u8(__a, __b) __arm_vmullbq_int_u8(__a, __b)
-#define vmulhq_u8(__a, __b) __arm_vmulhq_u8(__a, __b)
 #define vmladavq_u8(__a, __b) __arm_vmladavq_u8(__a, __b)
 #define vminvq_u8(__a, __b) __arm_vminvq_u8(__a, __b)
 #define vminq_u8(__a, __b) __arm_vminq_u8(__a, __b)
 #define vmaxvq_u8(__a, __b) __arm_vmaxvq_u8(__a, __b)
 #define vmaxq_u8(__a, __b) __arm_vmaxq_u8(__a, __b)
-#define vhsubq_u8(__a, __b) __arm_vhsubq_u8(__a, __b)
-#define vhsubq_n_u8(__a, __b) __arm_vhsubq_n_u8(__a, __b)
-#define vhaddq_u8(__a, __b) __arm_vhaddq_u8(__a, __b)
-#define vhaddq_n_u8(__a, __b) __arm_vhaddq_n_u8(__a, __b)
 #define vcmpneq_n_u8(__a, __b) __arm_vcmpneq_n_u8(__a, __b)
 #define vcmphiq_u8(__a, __b) __arm_vcmphiq_u8(__a, __b)
 #define vcmphiq_n_u8(__a, __b) __arm_vcmphiq_n_u8(__a, __b)
@@ -725,24 +693,15 @@
 #define vshlq_r_s8(__a, __b) __arm_vshlq_r_s8(__a, __b)
 #define vrshlq_s8(__a, __b) __arm_vrshlq_s8(__a, __b)
 #define vrshlq_n_s8(__a, __b) __arm_vrshlq_n_s8(__a, __b)
-#define vrmulhq_s8(__a, __b) __arm_vrmulhq_s8(__a, __b)
-#define vrhaddq_s8(__a, __b) __arm_vrhaddq_s8(__a, __b)
-#define vqsubq_s8(__a, __b) __arm_vqsubq_s8(__a, __b)
-#define vqsubq_n_s8(__a, __b) __arm_vqsubq_n_s8(__a, __b)
 #define vqshlq_s8(__a, __b) __arm_vqshlq_s8(__a, __b)
 #define vqshlq_r_s8(__a, __b) __arm_vqshlq_r_s8(__a, __b)
 #define vqrshlq_s8(__a, __b) __arm_vqrshlq_s8(__a, __b)
 #define vqrshlq_n_s8(__a, __b) __arm_vqrshlq_n_s8(__a, __b)
 #define vqrdmulhq_s8(__a, __b) __arm_vqrdmulhq_s8(__a, __b)
 #define vqrdmulhq_n_s8(__a, __b) __arm_vqrdmulhq_n_s8(__a, __b)
-#define vqdmulhq_s8(__a, __b) __arm_vqdmulhq_s8(__a, __b)
-#define vqdmulhq_n_s8(__a, __b) __arm_vqdmulhq_n_s8(__a, __b)
-#define vqaddq_s8(__a, __b) __arm_vqaddq_s8(__a, __b)
-#define vqaddq_n_s8(__a, __b) __arm_vqaddq_n_s8(__a, __b)
 #define vornq_s8(__a, __b) __arm_vornq_s8(__a, __b)
 #define vmulltq_int_s8(__a, __b) __arm_vmulltq_int_s8(__a, __b)
 #define vmullbq_int_s8(__a, __b) __arm_vmullbq_int_s8(__a, __b)
-#define vmulhq_s8(__a, __b) __arm_vmulhq_s8(__a, __b)
 #define vmlsdavxq_s8(__a, __b) __arm_vmlsdavxq_s8(__a, __b)
 #define vmlsdavq_s8(__a, __b) __arm_vmlsdavq_s8(__a, __b)
 #define vmladavxq_s8(__a, __b) __arm_vmladavxq_s8(__a, __b)
@@ -751,12 +710,8 @@
 #define vminq_s8(__a, __b) __arm_vminq_s8(__a, __b)
 #define vmaxvq_s8(__a, __b) __arm_vmaxvq_s8(__a, __b)
 #define vmaxq_s8(__a, __b) __arm_vmaxq_s8(__a, __b)
-#define vhsubq_s8(__a, __b) __arm_vhsubq_s8(__a, __b)
-#define vhsubq_n_s8(__a, __b) __arm_vhsubq_n_s8(__a, __b)
 #define vhcaddq_rot90_s8(__a, __b) __arm_vhcaddq_rot90_s8(__a, __b)
 #define vhcaddq_rot270_s8(__a, __b) __arm_vhcaddq_rot270_s8(__a, __b)
-#define vhaddq_s8(__a, __b) __arm_vhaddq_s8(__a, __b)
-#define vhaddq_n_s8(__a, __b) __arm_vhaddq_n_s8(__a, __b)
 #define vcaddq_rot90_s8(__a, __b) __arm_vcaddq_rot90_s8(__a, __b)
 #define vcaddq_rot270_s8(__a, __b) __arm_vcaddq_rot270_s8(__a, __b)
 #define vbrsrq_n_s8(__a, __b) __arm_vbrsrq_n_s8(__a, __b)
@@ -766,25 +721,14 @@
 #define vshlq_n_s8(__a,  __imm) __arm_vshlq_n_s8(__a,  __imm)
 #define vrshrq_n_s8(__a,  __imm) __arm_vrshrq_n_s8(__a,  __imm)
 #define vqshlq_n_s8(__a,  __imm) __arm_vqshlq_n_s8(__a,  __imm)
-#define vrmulhq_u16(__a, __b) __arm_vrmulhq_u16(__a, __b)
-#define vrhaddq_u16(__a, __b) __arm_vrhaddq_u16(__a, __b)
-#define vqsubq_u16(__a, __b) __arm_vqsubq_u16(__a, __b)
-#define vqsubq_n_u16(__a, __b) __arm_vqsubq_n_u16(__a, __b)
-#define vqaddq_u16(__a, __b) __arm_vqaddq_u16(__a, __b)
-#define vqaddq_n_u16(__a, __b) __arm_vqaddq_n_u16(__a, __b)
 #define vornq_u16(__a, __b) __arm_vornq_u16(__a, __b)
 #define vmulltq_int_u16(__a, __b) __arm_vmulltq_int_u16(__a, __b)
 #define vmullbq_int_u16(__a, __b) __arm_vmullbq_int_u16(__a, __b)
-#define vmulhq_u16(__a, __b) __arm_vmulhq_u16(__a, __b)
 #define vmladavq_u16(__a, __b) __arm_vmladavq_u16(__a, __b)
 #define vminvq_u16(__a, __b) __arm_vminvq_u16(__a, __b)
 #define vminq_u16(__a, __b) __arm_vminq_u16(__a, __b)
 #define vmaxvq_u16(__a, __b) __arm_vmaxvq_u16(__a, __b)
 #define vmaxq_u16(__a, __b) __arm_vmaxq_u16(__a, __b)
-#define vhsubq_u16(__a, __b) __arm_vhsubq_u16(__a, __b)
-#define vhsubq_n_u16(__a, __b) __arm_vhsubq_n_u16(__a, __b)
-#define vhaddq_u16(__a, __b) __arm_vhaddq_u16(__a, __b)
-#define vhaddq_n_u16(__a, __b) __arm_vhaddq_n_u16(__a, __b)
 #define vcmpneq_n_u16(__a, __b) __arm_vcmpneq_n_u16(__a, __b)
 #define vcmphiq_u16(__a, __b) __arm_vcmphiq_u16(__a, __b)
 #define vcmphiq_n_u16(__a, __b) __arm_vcmphiq_n_u16(__a, __b)
@@ -829,24 +773,15 @@
 #define vshlq_r_s16(__a, __b) __arm_vshlq_r_s16(__a, __b)
 #define vrshlq_s16(__a, __b) __arm_vrshlq_s16(__a, __b)
 #define vrshlq_n_s16(__a, __b) __arm_vrshlq_n_s16(__a, __b)
-#define vrmulhq_s16(__a, __b) __arm_vrmulhq_s16(__a, __b)
-#define vrhaddq_s16(__a, __b) __arm_vrhaddq_s16(__a, __b)
-#define vqsubq_s16(__a, __b) __arm_vqsubq_s16(__a, __b)
-#define vqsubq_n_s16(__a, __b) __arm_vqsubq_n_s16(__a, __b)
 #define vqshlq_s16(__a, __b) __arm_vqshlq_s16(__a, __b)
 #define vqshlq_r_s16(__a, __b) __arm_vqshlq_r_s16(__a, __b)
 #define vqrshlq_s16(__a, __b) __arm_vqrshlq_s16(__a, __b)
 #define vqrshlq_n_s16(__a, __b) __arm_vqrshlq_n_s16(__a, __b)
 #define vqrdmulhq_s16(__a, __b) __arm_vqrdmulhq_s16(__a, __b)
 #define vqrdmulhq_n_s16(__a, __b) __arm_vqrdmulhq_n_s16(__a, __b)
-#define vqdmulhq_s16(__a, __b) __arm_vqdmulhq_s16(__a, __b)
-#define vqdmulhq_n_s16(__a, __b) __arm_vqdmulhq_n_s16(__a, __b)
-#define vqaddq_s16(__a, __b) __arm_vqaddq_s16(__a, __b)
-#define vqaddq_n_s16(__a, __b) __arm_vqaddq_n_s16(__a, __b)
 #define vornq_s16(__a, __b) __arm_vornq_s16(__a, __b)
 #define vmulltq_int_s16(__a, __b) __arm_vmulltq_int_s16(__a, __b)
 #define vmullbq_int_s16(__a, __b) __arm_vmullbq_int_s16(__a, __b)
-#define vmulhq_s16(__a, __b) __arm_vmulhq_s16(__a, __b)
 #define vmlsdavxq_s16(__a, __b) __arm_vmlsdavxq_s16(__a, __b)
 #define vmlsdavq_s16(__a, __b) __arm_vmlsdavq_s16(__a, __b)
 #define vmladavxq_s16(__a, __b) __arm_vmladavxq_s16(__a, __b)
@@ -855,12 +790,8 @@
 #define vminq_s16(__a, __b) __arm_vminq_s16(__a, __b)
 #define vmaxvq_s16(__a, __b) __arm_vmaxvq_s16(__a, __b)
 #define vmaxq_s16(__a, __b) __arm_vmaxq_s16(__a, __b)
-#define vhsubq_s16(__a, __b) __arm_vhsubq_s16(__a, __b)
-#define vhsubq_n_s16(__a, __b) __arm_vhsubq_n_s16(__a, __b)
 #define vhcaddq_rot90_s16(__a, __b) __arm_vhcaddq_rot90_s16(__a, __b)
 #define vhcaddq_rot270_s16(__a, __b) __arm_vhcaddq_rot270_s16(__a, __b)
-#define vhaddq_s16(__a, __b) __arm_vhaddq_s16(__a, __b)
-#define vhaddq_n_s16(__a, __b) __arm_vhaddq_n_s16(__a, __b)
 #define vcaddq_rot90_s16(__a, __b) __arm_vcaddq_rot90_s16(__a, __b)
 #define vcaddq_rot270_s16(__a, __b) __arm_vcaddq_rot270_s16(__a, __b)
 #define vbrsrq_n_s16(__a, __b) __arm_vbrsrq_n_s16(__a, __b)
@@ -870,25 +801,14 @@
 #define vshlq_n_s16(__a,  __imm) __arm_vshlq_n_s16(__a,  __imm)
 #define vrshrq_n_s16(__a,  __imm) __arm_vrshrq_n_s16(__a,  __imm)
 #define vqshlq_n_s16(__a,  __imm) __arm_vqshlq_n_s16(__a,  __imm)
-#define vrmulhq_u32(__a, __b) __arm_vrmulhq_u32(__a, __b)
-#define vrhaddq_u32(__a, __b) __arm_vrhaddq_u32(__a, __b)
-#define vqsubq_u32(__a, __b) __arm_vqsubq_u32(__a, __b)
-#define vqsubq_n_u32(__a, __b) __arm_vqsubq_n_u32(__a, __b)
-#define vqaddq_u32(__a, __b) __arm_vqaddq_u32(__a, __b)
-#define vqaddq_n_u32(__a, __b) __arm_vqaddq_n_u32(__a, __b)
 #define vornq_u32(__a, __b) __arm_vornq_u32(__a, __b)
 #define vmulltq_int_u32(__a, __b) __arm_vmulltq_int_u32(__a, __b)
 #define vmullbq_int_u32(__a, __b) __arm_vmullbq_int_u32(__a, __b)
-#define vmulhq_u32(__a, __b) __arm_vmulhq_u32(__a, __b)
 #define vmladavq_u32(__a, __b) __arm_vmladavq_u32(__a, __b)
 #define vminvq_u32(__a, __b) __arm_vminvq_u32(__a, __b)
 #define vminq_u32(__a, __b) __arm_vminq_u32(__a, __b)
 #define vmaxvq_u32(__a, __b) __arm_vmaxvq_u32(__a, __b)
 #define vmaxq_u32(__a, __b) __arm_vmaxq_u32(__a, __b)
-#define vhsubq_u32(__a, __b) __arm_vhsubq_u32(__a, __b)
-#define vhsubq_n_u32(__a, __b) __arm_vhsubq_n_u32(__a, __b)
-#define vhaddq_u32(__a, __b) __arm_vhaddq_u32(__a, __b)
-#define vhaddq_n_u32(__a, __b) __arm_vhaddq_n_u32(__a, __b)
 #define vcmpneq_n_u32(__a, __b) __arm_vcmpneq_n_u32(__a, __b)
 #define vcmphiq_u32(__a, __b) __arm_vcmphiq_u32(__a, __b)
 #define vcmphiq_n_u32(__a, __b) __arm_vcmphiq_n_u32(__a, __b)
@@ -933,24 +853,15 @@
 #define vshlq_r_s32(__a, __b) __arm_vshlq_r_s32(__a, __b)
 #define vrshlq_s32(__a, __b) __arm_vrshlq_s32(__a, __b)
 #define vrshlq_n_s32(__a, __b) __arm_vrshlq_n_s32(__a, __b)
-#define vrmulhq_s32(__a, __b) __arm_vrmulhq_s32(__a, __b)
-#define vrhaddq_s32(__a, __b) __arm_vrhaddq_s32(__a, __b)
-#define vqsubq_s32(__a, __b) __arm_vqsubq_s32(__a, __b)
-#define vqsubq_n_s32(__a, __b) __arm_vqsubq_n_s32(__a, __b)
 #define vqshlq_s32(__a, __b) __arm_vqshlq_s32(__a, __b)
 #define vqshlq_r_s32(__a, __b) __arm_vqshlq_r_s32(__a, __b)
 #define vqrshlq_s32(__a, __b) __arm_vqrshlq_s32(__a, __b)
 #define vqrshlq_n_s32(__a, __b) __arm_vqrshlq_n_s32(__a, __b)
 #define vqrdmulhq_s32(__a, __b) __arm_vqrdmulhq_s32(__a, __b)
 #define vqrdmulhq_n_s32(__a, __b) __arm_vqrdmulhq_n_s32(__a, __b)
-#define vqdmulhq_s32(__a, __b) __arm_vqdmulhq_s32(__a, __b)
-#define vqdmulhq_n_s32(__a, __b) __arm_vqdmulhq_n_s32(__a, __b)
-#define vqaddq_s32(__a, __b) __arm_vqaddq_s32(__a, __b)
-#define vqaddq_n_s32(__a, __b) __arm_vqaddq_n_s32(__a, __b)
 #define vornq_s32(__a, __b) __arm_vornq_s32(__a, __b)
 #define vmulltq_int_s32(__a, __b) __arm_vmulltq_int_s32(__a, __b)
 #define vmullbq_int_s32(__a, __b) __arm_vmullbq_int_s32(__a, __b)
-#define vmulhq_s32(__a, __b) __arm_vmulhq_s32(__a, __b)
 #define vmlsdavxq_s32(__a, __b) __arm_vmlsdavxq_s32(__a, __b)
 #define vmlsdavq_s32(__a, __b) __arm_vmlsdavq_s32(__a, __b)
 #define vmladavxq_s32(__a, __b) __arm_vmladavxq_s32(__a, __b)
@@ -959,12 +870,8 @@
 #define vminq_s32(__a, __b) __arm_vminq_s32(__a, __b)
 #define vmaxvq_s32(__a, __b) __arm_vmaxvq_s32(__a, __b)
 #define vmaxq_s32(__a, __b) __arm_vmaxq_s32(__a, __b)
-#define vhsubq_s32(__a, __b) __arm_vhsubq_s32(__a, __b)
-#define vhsubq_n_s32(__a, __b) __arm_vhsubq_n_s32(__a, __b)
 #define vhcaddq_rot90_s32(__a, __b) __arm_vhcaddq_rot90_s32(__a, __b)
 #define vhcaddq_rot270_s32(__a, __b) __arm_vhcaddq_rot270_s32(__a, __b)
-#define vhaddq_s32(__a, __b) __arm_vhaddq_s32(__a, __b)
-#define vhaddq_n_s32(__a, __b) __arm_vhaddq_n_s32(__a, __b)
 #define vcaddq_rot90_s32(__a, __b) __arm_vcaddq_rot90_s32(__a, __b)
 #define vcaddq_rot270_s32(__a, __b) __arm_vcaddq_rot270_s32(__a, __b)
 #define vbrsrq_n_s32(__a, __b) __arm_vbrsrq_n_s32(__a, __b)
@@ -1634,36 +1541,12 @@
 #define vcaddq_rot90_m_u8(__inactive, __a, __b, __p) __arm_vcaddq_rot90_m_u8(__inactive, __a, __b, __p)
 #define vcaddq_rot90_m_u32(__inactive, __a, __b, __p) __arm_vcaddq_rot90_m_u32(__inactive, __a, __b, __p)
 #define vcaddq_rot90_m_u16(__inactive, __a, __b, __p) __arm_vcaddq_rot90_m_u16(__inactive, __a, __b, __p)
-#define vhaddq_m_n_s8(__inactive, __a, __b, __p) __arm_vhaddq_m_n_s8(__inactive, __a, __b, __p)
-#define vhaddq_m_n_s32(__inactive, __a, __b, __p) __arm_vhaddq_m_n_s32(__inactive, __a, __b, __p)
-#define vhaddq_m_n_s16(__inactive, __a, __b, __p) __arm_vhaddq_m_n_s16(__inactive, __a, __b, __p)
-#define vhaddq_m_n_u8(__inactive, __a, __b, __p) __arm_vhaddq_m_n_u8(__inactive, __a, __b, __p)
-#define vhaddq_m_n_u32(__inactive, __a, __b, __p) __arm_vhaddq_m_n_u32(__inactive, __a, __b, __p)
-#define vhaddq_m_n_u16(__inactive, __a, __b, __p) __arm_vhaddq_m_n_u16(__inactive, __a, __b, __p)
-#define vhaddq_m_s8(__inactive, __a, __b, __p) __arm_vhaddq_m_s8(__inactive, __a, __b, __p)
-#define vhaddq_m_s32(__inactive, __a, __b, __p) __arm_vhaddq_m_s32(__inactive, __a, __b, __p)
-#define vhaddq_m_s16(__inactive, __a, __b, __p) __arm_vhaddq_m_s16(__inactive, __a, __b, __p)
-#define vhaddq_m_u8(__inactive, __a, __b, __p) __arm_vhaddq_m_u8(__inactive, __a, __b, __p)
-#define vhaddq_m_u32(__inactive, __a, __b, __p) __arm_vhaddq_m_u32(__inactive, __a, __b, __p)
-#define vhaddq_m_u16(__inactive, __a, __b, __p) __arm_vhaddq_m_u16(__inactive, __a, __b, __p)
 #define vhcaddq_rot270_m_s8(__inactive, __a, __b, __p) __arm_vhcaddq_rot270_m_s8(__inactive, __a, __b, __p)
 #define vhcaddq_rot270_m_s32(__inactive, __a, __b, __p) __arm_vhcaddq_rot270_m_s32(__inactive, __a, __b, __p)
 #define vhcaddq_rot270_m_s16(__inactive, __a, __b, __p) __arm_vhcaddq_rot270_m_s16(__inactive, __a, __b, __p)
 #define vhcaddq_rot90_m_s8(__inactive, __a, __b, __p) __arm_vhcaddq_rot90_m_s8(__inactive, __a, __b, __p)
 #define vhcaddq_rot90_m_s32(__inactive, __a, __b, __p) __arm_vhcaddq_rot90_m_s32(__inactive, __a, __b, __p)
 #define vhcaddq_rot90_m_s16(__inactive, __a, __b, __p) __arm_vhcaddq_rot90_m_s16(__inactive, __a, __b, __p)
-#define vhsubq_m_n_s8(__inactive, __a, __b, __p) __arm_vhsubq_m_n_s8(__inactive, __a, __b, __p)
-#define vhsubq_m_n_s32(__inactive, __a, __b, __p) __arm_vhsubq_m_n_s32(__inactive, __a, __b, __p)
-#define vhsubq_m_n_s16(__inactive, __a, __b, __p) __arm_vhsubq_m_n_s16(__inactive, __a, __b, __p)
-#define vhsubq_m_n_u8(__inactive, __a, __b, __p) __arm_vhsubq_m_n_u8(__inactive, __a, __b, __p)
-#define vhsubq_m_n_u32(__inactive, __a, __b, __p) __arm_vhsubq_m_n_u32(__inactive, __a, __b, __p)
-#define vhsubq_m_n_u16(__inactive, __a, __b, __p) __arm_vhsubq_m_n_u16(__inactive, __a, __b, __p)
-#define vhsubq_m_s8(__inactive, __a, __b, __p) __arm_vhsubq_m_s8(__inactive, __a, __b, __p)
-#define vhsubq_m_s32(__inactive, __a, __b, __p) __arm_vhsubq_m_s32(__inactive, __a, __b, __p)
-#define vhsubq_m_s16(__inactive, __a, __b, __p) __arm_vhsubq_m_s16(__inactive, __a, __b, __p)
-#define vhsubq_m_u8(__inactive, __a, __b, __p) __arm_vhsubq_m_u8(__inactive, __a, __b, __p)
-#define vhsubq_m_u32(__inactive, __a, __b, __p) __arm_vhsubq_m_u32(__inactive, __a, __b, __p)
-#define vhsubq_m_u16(__inactive, __a, __b, __p) __arm_vhsubq_m_u16(__inactive, __a, __b, __p)
 #define vmaxq_m_s8(__inactive, __a, __b, __p) __arm_vmaxq_m_s8(__inactive, __a, __b, __p)
 #define vmaxq_m_s32(__inactive, __a, __b, __p) __arm_vmaxq_m_s32(__inactive, __a, __b, __p)
 #define vmaxq_m_s16(__inactive, __a, __b, __p) __arm_vmaxq_m_s16(__inactive, __a, __b, __p)
@@ -1703,12 +1586,6 @@
 #define vmlsdavaxq_p_s8(__a, __b, __c, __p) __arm_vmlsdavaxq_p_s8(__a, __b, __c, __p)
 #define vmlsdavaxq_p_s32(__a, __b, __c, __p) __arm_vmlsdavaxq_p_s32(__a, __b, __c, __p)
 #define vmlsdavaxq_p_s16(__a, __b, __c, __p) __arm_vmlsdavaxq_p_s16(__a, __b, __c, __p)
-#define vmulhq_m_s8(__inactive, __a, __b, __p) __arm_vmulhq_m_s8(__inactive, __a, __b, __p)
-#define vmulhq_m_s32(__inactive, __a, __b, __p) __arm_vmulhq_m_s32(__inactive, __a, __b, __p)
-#define vmulhq_m_s16(__inactive, __a, __b, __p) __arm_vmulhq_m_s16(__inactive, __a, __b, __p)
-#define vmulhq_m_u8(__inactive, __a, __b, __p) __arm_vmulhq_m_u8(__inactive, __a, __b, __p)
-#define vmulhq_m_u32(__inactive, __a, __b, __p) __arm_vmulhq_m_u32(__inactive, __a, __b, __p)
-#define vmulhq_m_u16(__inactive, __a, __b, __p) __arm_vmulhq_m_u16(__inactive, __a, __b, __p)
 #define vmullbq_int_m_s8(__inactive, __a, __b, __p) __arm_vmullbq_int_m_s8(__inactive, __a, __b, __p)
 #define vmullbq_int_m_s32(__inactive, __a, __b, __p) __arm_vmullbq_int_m_s32(__inactive, __a, __b, __p)
 #define vmullbq_int_m_s16(__inactive, __a, __b, __p) __arm_vmullbq_int_m_s16(__inactive, __a, __b, __p)
@@ -1727,18 +1604,6 @@
 #define vornq_m_u8(__inactive, __a, __b, __p) __arm_vornq_m_u8(__inactive, __a, __b, __p)
 #define vornq_m_u32(__inactive, __a, __b, __p) __arm_vornq_m_u32(__inactive, __a, __b, __p)
 #define vornq_m_u16(__inactive, __a, __b, __p) __arm_vornq_m_u16(__inactive, __a, __b, __p)
-#define vqaddq_m_n_s8(__inactive, __a, __b, __p) __arm_vqaddq_m_n_s8(__inactive, __a, __b, __p)
-#define vqaddq_m_n_s32(__inactive, __a, __b, __p) __arm_vqaddq_m_n_s32(__inactive, __a, __b, __p)
-#define vqaddq_m_n_s16(__inactive, __a, __b, __p) __arm_vqaddq_m_n_s16(__inactive, __a, __b, __p)
-#define vqaddq_m_n_u8(__inactive, __a, __b, __p) __arm_vqaddq_m_n_u8(__inactive, __a, __b, __p)
-#define vqaddq_m_n_u32(__inactive, __a, __b, __p) __arm_vqaddq_m_n_u32(__inactive, __a, __b, __p)
-#define vqaddq_m_n_u16(__inactive, __a, __b, __p) __arm_vqaddq_m_n_u16(__inactive, __a, __b, __p)
-#define vqaddq_m_s8(__inactive, __a, __b, __p) __arm_vqaddq_m_s8(__inactive, __a, __b, __p)
-#define vqaddq_m_s32(__inactive, __a, __b, __p) __arm_vqaddq_m_s32(__inactive, __a, __b, __p)
-#define vqaddq_m_s16(__inactive, __a, __b, __p) __arm_vqaddq_m_s16(__inactive, __a, __b, __p)
-#define vqaddq_m_u8(__inactive, __a, __b, __p) __arm_vqaddq_m_u8(__inactive, __a, __b, __p)
-#define vqaddq_m_u32(__inactive, __a, __b, __p) __arm_vqaddq_m_u32(__inactive, __a, __b, __p)
-#define vqaddq_m_u16(__inactive, __a, __b, __p) __arm_vqaddq_m_u16(__inactive, __a, __b, __p)
 #define vqdmladhq_m_s8(__inactive, __a, __b, __p) __arm_vqdmladhq_m_s8(__inactive, __a, __b, __p)
 #define vqdmladhq_m_s32(__inactive, __a, __b, __p) __arm_vqdmladhq_m_s32(__inactive, __a, __b, __p)
 #define vqdmladhq_m_s16(__inactive, __a, __b, __p) __arm_vqdmladhq_m_s16(__inactive, __a, __b, __p)
@@ -1757,12 +1622,6 @@
 #define vqdmlsdhxq_m_s8(__inactive, __a, __b, __p) __arm_vqdmlsdhxq_m_s8(__inactive, __a, __b, __p)
 #define vqdmlsdhxq_m_s32(__inactive, __a, __b, __p) __arm_vqdmlsdhxq_m_s32(__inactive, __a, __b, __p)
 #define vqdmlsdhxq_m_s16(__inactive, __a, __b, __p) __arm_vqdmlsdhxq_m_s16(__inactive, __a, __b, __p)
-#define vqdmulhq_m_n_s8(__inactive, __a, __b, __p) __arm_vqdmulhq_m_n_s8(__inactive, __a, __b, __p)
-#define vqdmulhq_m_n_s32(__inactive, __a, __b, __p) __arm_vqdmulhq_m_n_s32(__inactive, __a, __b, __p)
-#define vqdmulhq_m_n_s16(__inactive, __a, __b, __p) __arm_vqdmulhq_m_n_s16(__inactive, __a, __b, __p)
-#define vqdmulhq_m_s8(__inactive, __a, __b, __p) __arm_vqdmulhq_m_s8(__inactive, __a, __b, __p)
-#define vqdmulhq_m_s32(__inactive, __a, __b, __p) __arm_vqdmulhq_m_s32(__inactive, __a, __b, __p)
-#define vqdmulhq_m_s16(__inactive, __a, __b, __p) __arm_vqdmulhq_m_s16(__inactive, __a, __b, __p)
 #define vqrdmladhq_m_s8(__inactive, __a, __b, __p) __arm_vqrdmladhq_m_s8(__inactive, __a, __b, __p)
 #define vqrdmladhq_m_s32(__inactive, __a, __b, __p) __arm_vqrdmladhq_m_s32(__inactive, __a, __b, __p)
 #define vqrdmladhq_m_s16(__inactive, __a, __b, __p) __arm_vqrdmladhq_m_s16(__inactive, __a, __b, __p)
@@ -1805,30 +1664,6 @@
 #define vqshlq_m_u8(__inactive, __a, __b, __p) __arm_vqshlq_m_u8(__inactive, __a, __b, __p)
 #define vqshlq_m_u32(__inactive, __a, __b, __p) __arm_vqshlq_m_u32(__inactive, __a, __b, __p)
 #define vqshlq_m_u16(__inactive, __a, __b, __p) __arm_vqshlq_m_u16(__inactive, __a, __b, __p)
-#define vqsubq_m_n_s8(__inactive, __a, __b, __p) __arm_vqsubq_m_n_s8(__inactive, __a, __b, __p)
-#define vqsubq_m_n_s32(__inactive, __a, __b, __p) __arm_vqsubq_m_n_s32(__inactive, __a, __b, __p)
-#define vqsubq_m_n_s16(__inactive, __a, __b, __p) __arm_vqsubq_m_n_s16(__inactive, __a, __b, __p)
-#define vqsubq_m_n_u8(__inactive, __a, __b, __p) __arm_vqsubq_m_n_u8(__inactive, __a, __b, __p)
-#define vqsubq_m_n_u32(__inactive, __a, __b, __p) __arm_vqsubq_m_n_u32(__inactive, __a, __b, __p)
-#define vqsubq_m_n_u16(__inactive, __a, __b, __p) __arm_vqsubq_m_n_u16(__inactive, __a, __b, __p)
-#define vqsubq_m_s8(__inactive, __a, __b, __p) __arm_vqsubq_m_s8(__inactive, __a, __b, __p)
-#define vqsubq_m_s32(__inactive, __a, __b, __p) __arm_vqsubq_m_s32(__inactive, __a, __b, __p)
-#define vqsubq_m_s16(__inactive, __a, __b, __p) __arm_vqsubq_m_s16(__inactive, __a, __b, __p)
-#define vqsubq_m_u8(__inactive, __a, __b, __p) __arm_vqsubq_m_u8(__inactive, __a, __b, __p)
-#define vqsubq_m_u32(__inactive, __a, __b, __p) __arm_vqsubq_m_u32(__inactive, __a, __b, __p)
-#define vqsubq_m_u16(__inactive, __a, __b, __p) __arm_vqsubq_m_u16(__inactive, __a, __b, __p)
-#define vrhaddq_m_s8(__inactive, __a, __b, __p) __arm_vrhaddq_m_s8(__inactive, __a, __b, __p)
-#define vrhaddq_m_s32(__inactive, __a, __b, __p) __arm_vrhaddq_m_s32(__inactive, __a, __b, __p)
-#define vrhaddq_m_s16(__inactive, __a, __b, __p) __arm_vrhaddq_m_s16(__inactive, __a, __b, __p)
-#define vrhaddq_m_u8(__inactive, __a, __b, __p) __arm_vrhaddq_m_u8(__inactive, __a, __b, __p)
-#define vrhaddq_m_u32(__inactive, __a, __b, __p) __arm_vrhaddq_m_u32(__inactive, __a, __b, __p)
-#define vrhaddq_m_u16(__inactive, __a, __b, __p) __arm_vrhaddq_m_u16(__inactive, __a, __b, __p)
-#define vrmulhq_m_s8(__inactive, __a, __b, __p) __arm_vrmulhq_m_s8(__inactive, __a, __b, __p)
-#define vrmulhq_m_s32(__inactive, __a, __b, __p) __arm_vrmulhq_m_s32(__inactive, __a, __b, __p)
-#define vrmulhq_m_s16(__inactive, __a, __b, __p) __arm_vrmulhq_m_s16(__inactive, __a, __b, __p)
-#define vrmulhq_m_u8(__inactive, __a, __b, __p) __arm_vrmulhq_m_u8(__inactive, __a, __b, __p)
-#define vrmulhq_m_u32(__inactive, __a, __b, __p) __arm_vrmulhq_m_u32(__inactive, __a, __b, __p)
-#define vrmulhq_m_u16(__inactive, __a, __b, __p) __arm_vrmulhq_m_u16(__inactive, __a, __b, __p)
 #define vrshlq_m_s8(__inactive, __a, __b, __p) __arm_vrshlq_m_s8(__inactive, __a, __b, __p)
 #define vrshlq_m_s32(__inactive, __a, __b, __p) __arm_vrshlq_m_s32(__inactive, __a, __b, __p)
 #define vrshlq_m_s16(__inactive, __a, __b, __p) __arm_vrshlq_m_s16(__inactive, __a, __b, __p)
@@ -2315,12 +2150,6 @@
 #define vnegq_x_s8(__a, __p) __arm_vnegq_x_s8(__a, __p)
 #define vnegq_x_s16(__a, __p) __arm_vnegq_x_s16(__a, __p)
 #define vnegq_x_s32(__a, __p) __arm_vnegq_x_s32(__a, __p)
-#define vmulhq_x_s8(__a, __b, __p) __arm_vmulhq_x_s8(__a, __b, __p)
-#define vmulhq_x_s16(__a, __b, __p) __arm_vmulhq_x_s16(__a, __b, __p)
-#define vmulhq_x_s32(__a, __b, __p) __arm_vmulhq_x_s32(__a, __b, __p)
-#define vmulhq_x_u8(__a, __b, __p) __arm_vmulhq_x_u8(__a, __b, __p)
-#define vmulhq_x_u16(__a, __b, __p) __arm_vmulhq_x_u16(__a, __b, __p)
-#define vmulhq_x_u32(__a, __b, __p) __arm_vmulhq_x_u32(__a, __b, __p)
 #define vmullbq_poly_x_p8(__a, __b, __p) __arm_vmullbq_poly_x_p8(__a, __b, __p)
 #define vmullbq_poly_x_p16(__a, __b, __p) __arm_vmullbq_poly_x_p16(__a, __b, __p)
 #define vmullbq_int_x_s8(__a, __b, __p) __arm_vmullbq_int_x_s8(__a, __b, __p)
@@ -2349,48 +2178,12 @@
 #define vcaddq_rot270_x_u8(__a, __b, __p) __arm_vcaddq_rot270_x_u8(__a, __b, __p)
 #define vcaddq_rot270_x_u16(__a, __b, __p) __arm_vcaddq_rot270_x_u16(__a, __b, __p)
 #define vcaddq_rot270_x_u32(__a, __b, __p) __arm_vcaddq_rot270_x_u32(__a, __b, __p)
-#define vhaddq_x_n_s8(__a, __b, __p) __arm_vhaddq_x_n_s8(__a, __b, __p)
-#define vhaddq_x_n_s16(__a, __b, __p) __arm_vhaddq_x_n_s16(__a, __b, __p)
-#define vhaddq_x_n_s32(__a, __b, __p) __arm_vhaddq_x_n_s32(__a, __b, __p)
-#define vhaddq_x_n_u8(__a, __b, __p) __arm_vhaddq_x_n_u8(__a, __b, __p)
-#define vhaddq_x_n_u16(__a, __b, __p) __arm_vhaddq_x_n_u16(__a, __b, __p)
-#define vhaddq_x_n_u32(__a, __b, __p) __arm_vhaddq_x_n_u32(__a, __b, __p)
-#define vhaddq_x_s8(__a, __b, __p) __arm_vhaddq_x_s8(__a, __b, __p)
-#define vhaddq_x_s16(__a, __b, __p) __arm_vhaddq_x_s16(__a, __b, __p)
-#define vhaddq_x_s32(__a, __b, __p) __arm_vhaddq_x_s32(__a, __b, __p)
-#define vhaddq_x_u8(__a, __b, __p) __arm_vhaddq_x_u8(__a, __b, __p)
-#define vhaddq_x_u16(__a, __b, __p) __arm_vhaddq_x_u16(__a, __b, __p)
-#define vhaddq_x_u32(__a, __b, __p) __arm_vhaddq_x_u32(__a, __b, __p)
 #define vhcaddq_rot90_x_s8(__a, __b, __p) __arm_vhcaddq_rot90_x_s8(__a, __b, __p)
 #define vhcaddq_rot90_x_s16(__a, __b, __p) __arm_vhcaddq_rot90_x_s16(__a, __b, __p)
 #define vhcaddq_rot90_x_s32(__a, __b, __p) __arm_vhcaddq_rot90_x_s32(__a, __b, __p)
 #define vhcaddq_rot270_x_s8(__a, __b, __p) __arm_vhcaddq_rot270_x_s8(__a, __b, __p)
 #define vhcaddq_rot270_x_s16(__a, __b, __p) __arm_vhcaddq_rot270_x_s16(__a, __b, __p)
 #define vhcaddq_rot270_x_s32(__a, __b, __p) __arm_vhcaddq_rot270_x_s32(__a, __b, __p)
-#define vhsubq_x_n_s8(__a, __b, __p) __arm_vhsubq_x_n_s8(__a, __b, __p)
-#define vhsubq_x_n_s16(__a, __b, __p) __arm_vhsubq_x_n_s16(__a, __b, __p)
-#define vhsubq_x_n_s32(__a, __b, __p) __arm_vhsubq_x_n_s32(__a, __b, __p)
-#define vhsubq_x_n_u8(__a, __b, __p) __arm_vhsubq_x_n_u8(__a, __b, __p)
-#define vhsubq_x_n_u16(__a, __b, __p) __arm_vhsubq_x_n_u16(__a, __b, __p)
-#define vhsubq_x_n_u32(__a, __b, __p) __arm_vhsubq_x_n_u32(__a, __b, __p)
-#define vhsubq_x_s8(__a, __b, __p) __arm_vhsubq_x_s8(__a, __b, __p)
-#define vhsubq_x_s16(__a, __b, __p) __arm_vhsubq_x_s16(__a, __b, __p)
-#define vhsubq_x_s32(__a, __b, __p) __arm_vhsubq_x_s32(__a, __b, __p)
-#define vhsubq_x_u8(__a, __b, __p) __arm_vhsubq_x_u8(__a, __b, __p)
-#define vhsubq_x_u16(__a, __b, __p) __arm_vhsubq_x_u16(__a, __b, __p)
-#define vhsubq_x_u32(__a, __b, __p) __arm_vhsubq_x_u32(__a, __b, __p)
-#define vrhaddq_x_s8(__a, __b, __p) __arm_vrhaddq_x_s8(__a, __b, __p)
-#define vrhaddq_x_s16(__a, __b, __p) __arm_vrhaddq_x_s16(__a, __b, __p)
-#define vrhaddq_x_s32(__a, __b, __p) __arm_vrhaddq_x_s32(__a, __b, __p)
-#define vrhaddq_x_u8(__a, __b, __p) __arm_vrhaddq_x_u8(__a, __b, __p)
-#define vrhaddq_x_u16(__a, __b, __p) __arm_vrhaddq_x_u16(__a, __b, __p)
-#define vrhaddq_x_u32(__a, __b, __p) __arm_vrhaddq_x_u32(__a, __b, __p)
-#define vrmulhq_x_s8(__a, __b, __p) __arm_vrmulhq_x_s8(__a, __b, __p)
-#define vrmulhq_x_s16(__a, __b, __p) __arm_vrmulhq_x_s16(__a, __b, __p)
-#define vrmulhq_x_s32(__a, __b, __p) __arm_vrmulhq_x_s32(__a, __b, __p)
-#define vrmulhq_x_u8(__a, __b, __p) __arm_vrmulhq_x_u8(__a, __b, __p)
-#define vrmulhq_x_u16(__a, __b, __p) __arm_vrmulhq_x_u16(__a, __b, __p)
-#define vrmulhq_x_u32(__a, __b, __p) __arm_vrmulhq_x_u32(__a, __b, __p)
 #define vbicq_x_s8(__a, __b, __p) __arm_vbicq_x_s8(__a, __b, __p)
 #define vbicq_x_s16(__a, __b, __p) __arm_vbicq_x_s16(__a, __b, __p)
 #define vbicq_x_s32(__a, __b, __p) __arm_vbicq_x_s32(__a, __b, __p)
@@ -3351,48 +3144,6 @@ __arm_vshlq_u32 (uint32x4_t __a, int32x4_t __b)
   return __builtin_mve_vshlq_uv4si (__a, __b);
 }
 
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq_u8 (uint8x16_t __a, uint8x16_t __b)
-{
-  return __builtin_mve_vrmulhq_uv16qi (__a, __b);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq_u8 (uint8x16_t __a, uint8x16_t __b)
-{
-  return __builtin_mve_vrhaddq_uv16qi (__a, __b);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_u8 (uint8x16_t __a, uint8x16_t __b)
-{
-  return __builtin_mve_vqsubq_uv16qi (__a, __b);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_n_u8 (uint8x16_t __a, uint8_t __b)
-{
-  return __builtin_mve_vqsubq_n_uv16qi (__a, __b);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_u8 (uint8x16_t __a, uint8x16_t __b)
-{
-  return __builtin_mve_vqaddq_uv16qi (__a, __b);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_n_u8 (uint8x16_t __a, uint8_t __b)
-{
-  return __builtin_mve_vqaddq_n_uv16qi (__a, __b);
-}
-
 __extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq_u8 (uint8x16_t __a, uint8x16_t __b)
@@ -3414,13 +3165,6 @@ __arm_vmullbq_int_u8 (uint8x16_t __a, uint8x16_t __b)
   return __builtin_mve_vmullbq_int_uv16qi (__a, __b);
 }
 
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq_u8 (uint8x16_t __a, uint8x16_t __b)
-{
-  return __builtin_mve_vmulhq_uv16qi (__a, __b);
-}
-
 __extension__ extern __inline uint32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmladavq_u8 (uint8x16_t __a, uint8x16_t __b)
@@ -3456,34 +3200,6 @@ __arm_vmaxq_u8 (uint8x16_t __a, uint8x16_t __b)
   return __builtin_mve_vmaxq_uv16qi (__a, __b);
 }
 
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_u8 (uint8x16_t __a, uint8x16_t __b)
-{
-  return __builtin_mve_vhsubq_uv16qi (__a, __b);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_n_u8 (uint8x16_t __a, uint8_t __b)
-{
-  return __builtin_mve_vhsubq_n_uv16qi (__a, __b);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_u8 (uint8x16_t __a, uint8x16_t __b)
-{
-  return __builtin_mve_vhaddq_uv16qi (__a, __b);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_n_u8 (uint8x16_t __a, uint8_t __b)
-{
-  return __builtin_mve_vhaddq_n_uv16qi (__a, __b);
-}
-
 __extension__ extern __inline mve_pred16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcmpneq_n_u8 (uint8x16_t __a, uint8_t __b)
@@ -3794,34 +3510,6 @@ __arm_vrshlq_n_s8 (int8x16_t __a, int32_t __b)
   return __builtin_mve_vrshlq_n_sv16qi (__a, __b);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq_s8 (int8x16_t __a, int8x16_t __b)
-{
-  return __builtin_mve_vrmulhq_sv16qi (__a, __b);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq_s8 (int8x16_t __a, int8x16_t __b)
-{
-  return __builtin_mve_vrhaddq_sv16qi (__a, __b);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_s8 (int8x16_t __a, int8x16_t __b)
-{
-  return __builtin_mve_vqsubq_sv16qi (__a, __b);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_n_s8 (int8x16_t __a, int8_t __b)
-{
-  return __builtin_mve_vqsubq_n_sv16qi (__a, __b);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vqshlq_s8 (int8x16_t __a, int8x16_t __b)
@@ -3864,34 +3552,6 @@ __arm_vqrdmulhq_n_s8 (int8x16_t __a, int8_t __b)
   return __builtin_mve_vqrdmulhq_n_sv16qi (__a, __b);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqdmulhq_s8 (int8x16_t __a, int8x16_t __b)
-{
-  return __builtin_mve_vqdmulhq_sv16qi (__a, __b);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqdmulhq_n_s8 (int8x16_t __a, int8_t __b)
-{
-  return __builtin_mve_vqdmulhq_n_sv16qi (__a, __b);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_s8 (int8x16_t __a, int8x16_t __b)
-{
-  return __builtin_mve_vqaddq_sv16qi (__a, __b);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_n_s8 (int8x16_t __a, int8_t __b)
-{
-  return __builtin_mve_vqaddq_n_sv16qi (__a, __b);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq_s8 (int8x16_t __a, int8x16_t __b)
@@ -3913,13 +3573,6 @@ __arm_vmullbq_int_s8 (int8x16_t __a, int8x16_t __b)
   return __builtin_mve_vmullbq_int_sv16qi (__a, __b);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq_s8 (int8x16_t __a, int8x16_t __b)
-{
-  return __builtin_mve_vmulhq_sv16qi (__a, __b);
-}
-
 __extension__ extern __inline int32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmlsdavxq_s8 (int8x16_t __a, int8x16_t __b)
@@ -3976,20 +3629,6 @@ __arm_vmaxq_s8 (int8x16_t __a, int8x16_t __b)
   return __builtin_mve_vmaxq_sv16qi (__a, __b);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_s8 (int8x16_t __a, int8x16_t __b)
-{
-  return __builtin_mve_vhsubq_sv16qi (__a, __b);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_n_s8 (int8x16_t __a, int8_t __b)
-{
-  return __builtin_mve_vhsubq_n_sv16qi (__a, __b);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vhcaddq_rot90_s8 (int8x16_t __a, int8x16_t __b)
@@ -4004,20 +3643,6 @@ __arm_vhcaddq_rot270_s8 (int8x16_t __a, int8x16_t __b)
   return __builtin_mve_vhcaddq_rot270_sv16qi (__a, __b);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_s8 (int8x16_t __a, int8x16_t __b)
-{
-  return __builtin_mve_vhaddq_sv16qi (__a, __b);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_n_s8 (int8x16_t __a, int8_t __b)
-{
-  return __builtin_mve_vhaddq_n_sv16qi (__a, __b);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcaddq_rot90_s8 (int8x16_t __a, int8x16_t __b)
@@ -4081,48 +3706,6 @@ __arm_vqshlq_n_s8 (int8x16_t __a, const int __imm)
   return __builtin_mve_vqshlq_n_sv16qi (__a, __imm);
 }
 
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq_u16 (uint16x8_t __a, uint16x8_t __b)
-{
-  return __builtin_mve_vrmulhq_uv8hi (__a, __b);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq_u16 (uint16x8_t __a, uint16x8_t __b)
-{
-  return __builtin_mve_vrhaddq_uv8hi (__a, __b);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_u16 (uint16x8_t __a, uint16x8_t __b)
-{
-  return __builtin_mve_vqsubq_uv8hi (__a, __b);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_n_u16 (uint16x8_t __a, uint16_t __b)
-{
-  return __builtin_mve_vqsubq_n_uv8hi (__a, __b);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_u16 (uint16x8_t __a, uint16x8_t __b)
-{
-  return __builtin_mve_vqaddq_uv8hi (__a, __b);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_n_u16 (uint16x8_t __a, uint16_t __b)
-{
-  return __builtin_mve_vqaddq_n_uv8hi (__a, __b);
-}
-
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq_u16 (uint16x8_t __a, uint16x8_t __b)
@@ -4144,13 +3727,6 @@ __arm_vmullbq_int_u16 (uint16x8_t __a, uint16x8_t __b)
   return __builtin_mve_vmullbq_int_uv8hi (__a, __b);
 }
 
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq_u16 (uint16x8_t __a, uint16x8_t __b)
-{
-  return __builtin_mve_vmulhq_uv8hi (__a, __b);
-}
-
 __extension__ extern __inline uint32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmladavq_u16 (uint16x8_t __a, uint16x8_t __b)
@@ -4186,34 +3762,6 @@ __arm_vmaxq_u16 (uint16x8_t __a, uint16x8_t __b)
   return __builtin_mve_vmaxq_uv8hi (__a, __b);
 }
 
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_u16 (uint16x8_t __a, uint16x8_t __b)
-{
-  return __builtin_mve_vhsubq_uv8hi (__a, __b);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_n_u16 (uint16x8_t __a, uint16_t __b)
-{
-  return __builtin_mve_vhsubq_n_uv8hi (__a, __b);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_u16 (uint16x8_t __a, uint16x8_t __b)
-{
-  return __builtin_mve_vhaddq_uv8hi (__a, __b);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_n_u16 (uint16x8_t __a, uint16_t __b)
-{
-  return __builtin_mve_vhaddq_n_uv8hi (__a, __b);
-}
-
 __extension__ extern __inline mve_pred16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcmpneq_n_u16 (uint16x8_t __a, uint16_t __b)
@@ -4524,34 +4072,6 @@ __arm_vrshlq_n_s16 (int16x8_t __a, int32_t __b)
   return __builtin_mve_vrshlq_n_sv8hi (__a, __b);
 }
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq_s16 (int16x8_t __a, int16x8_t __b)
-{
-  return __builtin_mve_vrmulhq_sv8hi (__a, __b);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq_s16 (int16x8_t __a, int16x8_t __b)
-{
-  return __builtin_mve_vrhaddq_sv8hi (__a, __b);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_s16 (int16x8_t __a, int16x8_t __b)
-{
-  return __builtin_mve_vqsubq_sv8hi (__a, __b);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_n_s16 (int16x8_t __a, int16_t __b)
-{
-  return __builtin_mve_vqsubq_n_sv8hi (__a, __b);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vqshlq_s16 (int16x8_t __a, int16x8_t __b)
@@ -4594,34 +4114,6 @@ __arm_vqrdmulhq_n_s16 (int16x8_t __a, int16_t __b)
   return __builtin_mve_vqrdmulhq_n_sv8hi (__a, __b);
 }
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqdmulhq_s16 (int16x8_t __a, int16x8_t __b)
-{
-  return __builtin_mve_vqdmulhq_sv8hi (__a, __b);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqdmulhq_n_s16 (int16x8_t __a, int16_t __b)
-{
-  return __builtin_mve_vqdmulhq_n_sv8hi (__a, __b);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_s16 (int16x8_t __a, int16x8_t __b)
-{
-  return __builtin_mve_vqaddq_sv8hi (__a, __b);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_n_s16 (int16x8_t __a, int16_t __b)
-{
-  return __builtin_mve_vqaddq_n_sv8hi (__a, __b);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq_s16 (int16x8_t __a, int16x8_t __b)
@@ -4643,13 +4135,6 @@ __arm_vmullbq_int_s16 (int16x8_t __a, int16x8_t __b)
   return __builtin_mve_vmullbq_int_sv8hi (__a, __b);
 }
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq_s16 (int16x8_t __a, int16x8_t __b)
-{
-  return __builtin_mve_vmulhq_sv8hi (__a, __b);
-}
-
 __extension__ extern __inline int32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmlsdavxq_s16 (int16x8_t __a, int16x8_t __b)
@@ -4706,20 +4191,6 @@ __arm_vmaxq_s16 (int16x8_t __a, int16x8_t __b)
   return __builtin_mve_vmaxq_sv8hi (__a, __b);
 }
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_s16 (int16x8_t __a, int16x8_t __b)
-{
-  return __builtin_mve_vhsubq_sv8hi (__a, __b);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_n_s16 (int16x8_t __a, int16_t __b)
-{
-  return __builtin_mve_vhsubq_n_sv8hi (__a, __b);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vhcaddq_rot90_s16 (int16x8_t __a, int16x8_t __b)
@@ -4734,20 +4205,6 @@ __arm_vhcaddq_rot270_s16 (int16x8_t __a, int16x8_t __b)
   return __builtin_mve_vhcaddq_rot270_sv8hi (__a, __b);
 }
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_s16 (int16x8_t __a, int16x8_t __b)
-{
-  return __builtin_mve_vhaddq_sv8hi (__a, __b);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_n_s16 (int16x8_t __a, int16_t __b)
-{
-  return __builtin_mve_vhaddq_n_sv8hi (__a, __b);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcaddq_rot90_s16 (int16x8_t __a, int16x8_t __b)
@@ -4811,48 +4268,6 @@ __arm_vqshlq_n_s16 (int16x8_t __a, const int __imm)
   return __builtin_mve_vqshlq_n_sv8hi (__a, __imm);
 }
 
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq_u32 (uint32x4_t __a, uint32x4_t __b)
-{
-  return __builtin_mve_vrmulhq_uv4si (__a, __b);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq_u32 (uint32x4_t __a, uint32x4_t __b)
-{
-  return __builtin_mve_vrhaddq_uv4si (__a, __b);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_u32 (uint32x4_t __a, uint32x4_t __b)
-{
-  return __builtin_mve_vqsubq_uv4si (__a, __b);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_n_u32 (uint32x4_t __a, uint32_t __b)
-{
-  return __builtin_mve_vqsubq_n_uv4si (__a, __b);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_u32 (uint32x4_t __a, uint32x4_t __b)
-{
-  return __builtin_mve_vqaddq_uv4si (__a, __b);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_n_u32 (uint32x4_t __a, uint32_t __b)
-{
-  return __builtin_mve_vqaddq_n_uv4si (__a, __b);
-}
-
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq_u32 (uint32x4_t __a, uint32x4_t __b)
@@ -4874,13 +4289,6 @@ __arm_vmullbq_int_u32 (uint32x4_t __a, uint32x4_t __b)
   return __builtin_mve_vmullbq_int_uv4si (__a, __b);
 }
 
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq_u32 (uint32x4_t __a, uint32x4_t __b)
-{
-  return __builtin_mve_vmulhq_uv4si (__a, __b);
-}
-
 __extension__ extern __inline uint32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmladavq_u32 (uint32x4_t __a, uint32x4_t __b)
@@ -4916,34 +4324,6 @@ __arm_vmaxq_u32 (uint32x4_t __a, uint32x4_t __b)
   return __builtin_mve_vmaxq_uv4si (__a, __b);
 }
 
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_u32 (uint32x4_t __a, uint32x4_t __b)
-{
-  return __builtin_mve_vhsubq_uv4si (__a, __b);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_n_u32 (uint32x4_t __a, uint32_t __b)
-{
-  return __builtin_mve_vhsubq_n_uv4si (__a, __b);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_u32 (uint32x4_t __a, uint32x4_t __b)
-{
-  return __builtin_mve_vhaddq_uv4si (__a, __b);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_n_u32 (uint32x4_t __a, uint32_t __b)
-{
-  return __builtin_mve_vhaddq_n_uv4si (__a, __b);
-}
-
 __extension__ extern __inline mve_pred16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcmpneq_n_u32 (uint32x4_t __a, uint32_t __b)
@@ -5254,34 +4634,6 @@ __arm_vrshlq_n_s32 (int32x4_t __a, int32_t __b)
   return __builtin_mve_vrshlq_n_sv4si (__a, __b);
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq_s32 (int32x4_t __a, int32x4_t __b)
-{
-  return __builtin_mve_vrmulhq_sv4si (__a, __b);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq_s32 (int32x4_t __a, int32x4_t __b)
-{
-  return __builtin_mve_vrhaddq_sv4si (__a, __b);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_s32 (int32x4_t __a, int32x4_t __b)
-{
-  return __builtin_mve_vqsubq_sv4si (__a, __b);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_n_s32 (int32x4_t __a, int32_t __b)
-{
-  return __builtin_mve_vqsubq_n_sv4si (__a, __b);
-}
-
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vqshlq_s32 (int32x4_t __a, int32x4_t __b)
@@ -5324,34 +4676,6 @@ __arm_vqrdmulhq_n_s32 (int32x4_t __a, int32_t __b)
   return __builtin_mve_vqrdmulhq_n_sv4si (__a, __b);
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqdmulhq_s32 (int32x4_t __a, int32x4_t __b)
-{
-  return __builtin_mve_vqdmulhq_sv4si (__a, __b);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqdmulhq_n_s32 (int32x4_t __a, int32_t __b)
-{
-  return __builtin_mve_vqdmulhq_n_sv4si (__a, __b);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_s32 (int32x4_t __a, int32x4_t __b)
-{
-  return __builtin_mve_vqaddq_sv4si (__a, __b);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_n_s32 (int32x4_t __a, int32_t __b)
-{
-  return __builtin_mve_vqaddq_n_sv4si (__a, __b);
-}
-
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq_s32 (int32x4_t __a, int32x4_t __b)
@@ -5373,13 +4697,6 @@ __arm_vmullbq_int_s32 (int32x4_t __a, int32x4_t __b)
   return __builtin_mve_vmullbq_int_sv4si (__a, __b);
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq_s32 (int32x4_t __a, int32x4_t __b)
-{
-  return __builtin_mve_vmulhq_sv4si (__a, __b);
-}
-
 __extension__ extern __inline int32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmlsdavxq_s32 (int32x4_t __a, int32x4_t __b)
@@ -5436,20 +4753,6 @@ __arm_vmaxq_s32 (int32x4_t __a, int32x4_t __b)
   return __builtin_mve_vmaxq_sv4si (__a, __b);
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_s32 (int32x4_t __a, int32x4_t __b)
-{
-  return __builtin_mve_vhsubq_sv4si (__a, __b);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_n_s32 (int32x4_t __a, int32_t __b)
-{
-  return __builtin_mve_vhsubq_n_sv4si (__a, __b);
-}
-
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vhcaddq_rot90_s32 (int32x4_t __a, int32x4_t __b)
@@ -5464,20 +4767,6 @@ __arm_vhcaddq_rot270_s32 (int32x4_t __a, int32x4_t __b)
   return __builtin_mve_vhcaddq_rot270_sv4si (__a, __b);
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_s32 (int32x4_t __a, int32x4_t __b)
-{
-  return __builtin_mve_vhaddq_sv4si (__a, __b);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_n_s32 (int32x4_t __a, int32_t __b)
-{
-  return __builtin_mve_vhaddq_n_sv4si (__a, __b);
-}
-
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcaddq_rot90_s32 (int32x4_t __a, int32x4_t __b)
@@ -9005,90 +8294,6 @@ __arm_vcaddq_rot90_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b,
   return __builtin_mve_vcaddq_rot90_m_uv8hi (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhaddq_m_n_sv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhaddq_m_n_sv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhaddq_m_n_sv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_m_n_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhaddq_m_n_uv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_m_n_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhaddq_m_n_uv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_m_n_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhaddq_m_n_uv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhaddq_m_sv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhaddq_m_sv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhaddq_m_sv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_m_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhaddq_m_uv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhaddq_m_uv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhaddq_m_uv8hi (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vhcaddq_rot270_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
@@ -9131,90 +8336,6 @@ __arm_vhcaddq_rot90_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, m
   return __builtin_mve_vhcaddq_rot90_m_sv8hi (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhsubq_m_n_sv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhsubq_m_n_sv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhsubq_m_n_sv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_m_n_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhsubq_m_n_uv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_m_n_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhsubq_m_n_uv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_m_n_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhsubq_m_n_uv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhsubq_m_sv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhsubq_m_sv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhsubq_m_sv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_m_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhsubq_m_uv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhsubq_m_uv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhsubq_m_uv8hi (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmaxq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
@@ -9488,48 +8609,6 @@ __arm_vmlsdavaxq_p_s16 (int32_t __a, int16x8_t __b, int16x8_t __c, mve_pred16_t
   return __builtin_mve_vmlsdavaxq_p_sv8hi (__a, __b, __c, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulhq_m_sv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulhq_m_sv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulhq_m_sv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq_m_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulhq_m_uv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulhq_m_uv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulhq_m_uv8hi (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmullbq_int_m_s8 (int16x8_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
@@ -9656,90 +8735,6 @@ __arm_vornq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pr
   return __builtin_mve_vornq_m_uv8hi (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vqaddq_m_n_sv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vqaddq_m_n_sv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vqaddq_m_n_sv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_m_n_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vqaddq_m_n_uv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_m_n_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vqaddq_m_n_uv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_m_n_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vqaddq_m_n_uv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vqaddq_m_sv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vqaddq_m_sv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vqaddq_m_sv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_m_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vqaddq_m_uv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vqaddq_m_uv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vqaddq_m_uv8hi (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vqdmladhq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
@@ -9845,48 +8840,6 @@ __arm_vqdmlsdhxq_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_
   return __builtin_mve_vqdmlsdhxq_m_sv8hi (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqdmulhq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vqdmulhq_m_n_sv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqdmulhq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vqdmulhq_m_n_sv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqdmulhq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vqdmulhq_m_n_sv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqdmulhq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vqdmulhq_m_sv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqdmulhq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vqdmulhq_m_sv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqdmulhq_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vqdmulhq_m_sv8hi (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vqrdmladhq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
@@ -10202,174 +9155,6 @@ __arm_vqshlq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, int16x8_t __b, mve_pr
   return __builtin_mve_vqshlq_m_uv8hi (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vqsubq_m_n_sv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vqsubq_m_n_sv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vqsubq_m_n_sv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_m_n_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vqsubq_m_n_uv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_m_n_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vqsubq_m_n_uv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_m_n_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vqsubq_m_n_uv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vqsubq_m_sv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vqsubq_m_sv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vqsubq_m_sv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_m_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vqsubq_m_uv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vqsubq_m_uv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vqsubq_m_uv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vrhaddq_m_sv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vrhaddq_m_sv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vrhaddq_m_sv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq_m_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vrhaddq_m_uv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vrhaddq_m_uv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vrhaddq_m_uv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vrmulhq_m_sv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vrmulhq_m_sv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vrmulhq_m_sv8hi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq_m_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vrmulhq_m_uv16qi (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vrmulhq_m_uv4si (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vrmulhq_m_uv8hi (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vrshlq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
@@ -13289,48 +12074,6 @@ __arm_vnegq_x_s32 (int32x4_t __a, mve_pred16_t __p)
   return __builtin_mve_vnegq_m_sv4si (__arm_vuninitializedq_s32 (), __a, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulhq_m_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq_x_s16 (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulhq_m_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq_x_s32 (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulhq_m_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq_x_u8 (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulhq_m_uv16qi (__arm_vuninitializedq_u8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq_x_u16 (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulhq_m_uv8hi (__arm_vuninitializedq_u16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vmulhq_m_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
-}
-
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmullbq_poly_x_p8 (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
@@ -13527,90 +12270,6 @@ __arm_vcaddq_rot270_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
   return __builtin_mve_vcaddq_rot270_m_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_x_n_s8 (int8x16_t __a, int8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhaddq_m_n_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_x_n_s16 (int16x8_t __a, int16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhaddq_m_n_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_x_n_s32 (int32x4_t __a, int32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhaddq_m_n_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_x_n_u8 (uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhaddq_m_n_uv16qi (__arm_vuninitializedq_u8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_x_n_u16 (uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhaddq_m_n_uv8hi (__arm_vuninitializedq_u16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_x_n_u32 (uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhaddq_m_n_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhaddq_m_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_x_s16 (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhaddq_m_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_x_s32 (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhaddq_m_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_x_u8 (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhaddq_m_uv16qi (__arm_vuninitializedq_u8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_x_u16 (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhaddq_m_uv8hi (__arm_vuninitializedq_u16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhaddq_m_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vhcaddq_rot90_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
@@ -13653,174 +12312,6 @@ __arm_vhcaddq_rot270_x_s32 (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
   return __builtin_mve_vhcaddq_rot270_m_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_x_n_s8 (int8x16_t __a, int8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhsubq_m_n_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_x_n_s16 (int16x8_t __a, int16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhsubq_m_n_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_x_n_s32 (int32x4_t __a, int32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhsubq_m_n_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_x_n_u8 (uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhsubq_m_n_uv16qi (__arm_vuninitializedq_u8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_x_n_u16 (uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhsubq_m_n_uv8hi (__arm_vuninitializedq_u16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_x_n_u32 (uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhsubq_m_n_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhsubq_m_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_x_s16 (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhsubq_m_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_x_s32 (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhsubq_m_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_x_u8 (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhsubq_m_uv16qi (__arm_vuninitializedq_u8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_x_u16 (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhsubq_m_uv8hi (__arm_vuninitializedq_u16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vhsubq_m_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vrhaddq_m_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq_x_s16 (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vrhaddq_m_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq_x_s32 (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vrhaddq_m_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq_x_u8 (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vrhaddq_m_uv16qi (__arm_vuninitializedq_u8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq_x_u16 (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vrhaddq_m_uv8hi (__arm_vuninitializedq_u16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vrhaddq_m_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vrmulhq_m_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq_x_s16 (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vrmulhq_m_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq_x_s32 (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vrmulhq_m_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq_x_u8 (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vrmulhq_m_uv16qi (__arm_vuninitializedq_u8 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq_x_u16 (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vrmulhq_m_uv8hi (__arm_vuninitializedq_u16 (), __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
-  return __builtin_mve_vrmulhq_m_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
@@ -18558,48 +17049,6 @@ __arm_vshlq (uint32x4_t __a, int32x4_t __b)
  return __arm_vshlq_u32 (__a, __b);
 }
 
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq (uint8x16_t __a, uint8x16_t __b)
-{
- return __arm_vrmulhq_u8 (__a, __b);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq (uint8x16_t __a, uint8x16_t __b)
-{
- return __arm_vrhaddq_u8 (__a, __b);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq (uint8x16_t __a, uint8x16_t __b)
-{
- return __arm_vqsubq_u8 (__a, __b);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq (uint8x16_t __a, uint8_t __b)
-{
- return __arm_vqsubq_n_u8 (__a, __b);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq (uint8x16_t __a, uint8x16_t __b)
-{
- return __arm_vqaddq_u8 (__a, __b);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq (uint8x16_t __a, uint8_t __b)
-{
- return __arm_vqaddq_n_u8 (__a, __b);
-}
-
 __extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq (uint8x16_t __a, uint8x16_t __b)
@@ -18621,13 +17070,6 @@ __arm_vmullbq_int (uint8x16_t __a, uint8x16_t __b)
  return __arm_vmullbq_int_u8 (__a, __b);
 }
 
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq (uint8x16_t __a, uint8x16_t __b)
-{
- return __arm_vmulhq_u8 (__a, __b);
-}
-
 __extension__ extern __inline uint32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmladavq (uint8x16_t __a, uint8x16_t __b)
@@ -18663,34 +17105,6 @@ __arm_vmaxq (uint8x16_t __a, uint8x16_t __b)
  return __arm_vmaxq_u8 (__a, __b);
 }
 
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq (uint8x16_t __a, uint8x16_t __b)
-{
- return __arm_vhsubq_u8 (__a, __b);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq (uint8x16_t __a, uint8_t __b)
-{
- return __arm_vhsubq_n_u8 (__a, __b);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq (uint8x16_t __a, uint8x16_t __b)
-{
- return __arm_vhaddq_u8 (__a, __b);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq (uint8x16_t __a, uint8_t __b)
-{
- return __arm_vhaddq_n_u8 (__a, __b);
-}
-
 __extension__ extern __inline mve_pred16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcmpneq (uint8x16_t __a, uint8_t __b)
@@ -18999,34 +17413,6 @@ __arm_vrshlq (int8x16_t __a, int32_t __b)
  return __arm_vrshlq_n_s8 (__a, __b);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq (int8x16_t __a, int8x16_t __b)
-{
- return __arm_vrmulhq_s8 (__a, __b);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq (int8x16_t __a, int8x16_t __b)
-{
- return __arm_vrhaddq_s8 (__a, __b);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq (int8x16_t __a, int8x16_t __b)
-{
- return __arm_vqsubq_s8 (__a, __b);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq (int8x16_t __a, int8_t __b)
-{
- return __arm_vqsubq_n_s8 (__a, __b);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vqshlq (int8x16_t __a, int8x16_t __b)
@@ -19069,34 +17455,6 @@ __arm_vqrdmulhq (int8x16_t __a, int8_t __b)
  return __arm_vqrdmulhq_n_s8 (__a, __b);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqdmulhq (int8x16_t __a, int8x16_t __b)
-{
- return __arm_vqdmulhq_s8 (__a, __b);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqdmulhq (int8x16_t __a, int8_t __b)
-{
- return __arm_vqdmulhq_n_s8 (__a, __b);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq (int8x16_t __a, int8x16_t __b)
-{
- return __arm_vqaddq_s8 (__a, __b);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq (int8x16_t __a, int8_t __b)
-{
- return __arm_vqaddq_n_s8 (__a, __b);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq (int8x16_t __a, int8x16_t __b)
@@ -19118,13 +17476,6 @@ __arm_vmullbq_int (int8x16_t __a, int8x16_t __b)
  return __arm_vmullbq_int_s8 (__a, __b);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq (int8x16_t __a, int8x16_t __b)
-{
- return __arm_vmulhq_s8 (__a, __b);
-}
-
 __extension__ extern __inline int32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmlsdavxq (int8x16_t __a, int8x16_t __b)
@@ -19181,20 +17532,6 @@ __arm_vmaxq (int8x16_t __a, int8x16_t __b)
  return __arm_vmaxq_s8 (__a, __b);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq (int8x16_t __a, int8x16_t __b)
-{
- return __arm_vhsubq_s8 (__a, __b);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq (int8x16_t __a, int8_t __b)
-{
- return __arm_vhsubq_n_s8 (__a, __b);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vhcaddq_rot90 (int8x16_t __a, int8x16_t __b)
@@ -19209,20 +17546,6 @@ __arm_vhcaddq_rot270 (int8x16_t __a, int8x16_t __b)
  return __arm_vhcaddq_rot270_s8 (__a, __b);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq (int8x16_t __a, int8x16_t __b)
-{
- return __arm_vhaddq_s8 (__a, __b);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq (int8x16_t __a, int8_t __b)
-{
- return __arm_vhaddq_n_s8 (__a, __b);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcaddq_rot90 (int8x16_t __a, int8x16_t __b)
@@ -19286,48 +17609,6 @@ __arm_vqshlq_n (int8x16_t __a, const int __imm)
  return __arm_vqshlq_n_s8 (__a, __imm);
 }
 
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq (uint16x8_t __a, uint16x8_t __b)
-{
- return __arm_vrmulhq_u16 (__a, __b);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq (uint16x8_t __a, uint16x8_t __b)
-{
- return __arm_vrhaddq_u16 (__a, __b);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq (uint16x8_t __a, uint16x8_t __b)
-{
- return __arm_vqsubq_u16 (__a, __b);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq (uint16x8_t __a, uint16_t __b)
-{
- return __arm_vqsubq_n_u16 (__a, __b);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq (uint16x8_t __a, uint16x8_t __b)
-{
- return __arm_vqaddq_u16 (__a, __b);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq (uint16x8_t __a, uint16_t __b)
-{
- return __arm_vqaddq_n_u16 (__a, __b);
-}
-
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq (uint16x8_t __a, uint16x8_t __b)
@@ -19349,13 +17630,6 @@ __arm_vmullbq_int (uint16x8_t __a, uint16x8_t __b)
  return __arm_vmullbq_int_u16 (__a, __b);
 }
 
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq (uint16x8_t __a, uint16x8_t __b)
-{
- return __arm_vmulhq_u16 (__a, __b);
-}
-
 __extension__ extern __inline uint32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmladavq (uint16x8_t __a, uint16x8_t __b)
@@ -19391,34 +17665,6 @@ __arm_vmaxq (uint16x8_t __a, uint16x8_t __b)
  return __arm_vmaxq_u16 (__a, __b);
 }
 
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq (uint16x8_t __a, uint16x8_t __b)
-{
- return __arm_vhsubq_u16 (__a, __b);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq (uint16x8_t __a, uint16_t __b)
-{
- return __arm_vhsubq_n_u16 (__a, __b);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq (uint16x8_t __a, uint16x8_t __b)
-{
- return __arm_vhaddq_u16 (__a, __b);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq (uint16x8_t __a, uint16_t __b)
-{
- return __arm_vhaddq_n_u16 (__a, __b);
-}
-
 __extension__ extern __inline mve_pred16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcmpneq (uint16x8_t __a, uint16_t __b)
@@ -19727,34 +17973,6 @@ __arm_vrshlq (int16x8_t __a, int32_t __b)
  return __arm_vrshlq_n_s16 (__a, __b);
 }
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq (int16x8_t __a, int16x8_t __b)
-{
- return __arm_vrmulhq_s16 (__a, __b);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq (int16x8_t __a, int16x8_t __b)
-{
- return __arm_vrhaddq_s16 (__a, __b);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq (int16x8_t __a, int16x8_t __b)
-{
- return __arm_vqsubq_s16 (__a, __b);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq (int16x8_t __a, int16_t __b)
-{
- return __arm_vqsubq_n_s16 (__a, __b);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vqshlq (int16x8_t __a, int16x8_t __b)
@@ -19797,34 +18015,6 @@ __arm_vqrdmulhq (int16x8_t __a, int16_t __b)
  return __arm_vqrdmulhq_n_s16 (__a, __b);
 }
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqdmulhq (int16x8_t __a, int16x8_t __b)
-{
- return __arm_vqdmulhq_s16 (__a, __b);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqdmulhq (int16x8_t __a, int16_t __b)
-{
- return __arm_vqdmulhq_n_s16 (__a, __b);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq (int16x8_t __a, int16x8_t __b)
-{
- return __arm_vqaddq_s16 (__a, __b);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq (int16x8_t __a, int16_t __b)
-{
- return __arm_vqaddq_n_s16 (__a, __b);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq (int16x8_t __a, int16x8_t __b)
@@ -19846,13 +18036,6 @@ __arm_vmullbq_int (int16x8_t __a, int16x8_t __b)
  return __arm_vmullbq_int_s16 (__a, __b);
 }
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq (int16x8_t __a, int16x8_t __b)
-{
- return __arm_vmulhq_s16 (__a, __b);
-}
-
 __extension__ extern __inline int32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmlsdavxq (int16x8_t __a, int16x8_t __b)
@@ -19909,20 +18092,6 @@ __arm_vmaxq (int16x8_t __a, int16x8_t __b)
  return __arm_vmaxq_s16 (__a, __b);
 }
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq (int16x8_t __a, int16x8_t __b)
-{
- return __arm_vhsubq_s16 (__a, __b);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq (int16x8_t __a, int16_t __b)
-{
- return __arm_vhsubq_n_s16 (__a, __b);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vhcaddq_rot90 (int16x8_t __a, int16x8_t __b)
@@ -19937,20 +18106,6 @@ __arm_vhcaddq_rot270 (int16x8_t __a, int16x8_t __b)
  return __arm_vhcaddq_rot270_s16 (__a, __b);
 }
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq (int16x8_t __a, int16x8_t __b)
-{
- return __arm_vhaddq_s16 (__a, __b);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq (int16x8_t __a, int16_t __b)
-{
- return __arm_vhaddq_n_s16 (__a, __b);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcaddq_rot90 (int16x8_t __a, int16x8_t __b)
@@ -20014,48 +18169,6 @@ __arm_vqshlq_n (int16x8_t __a, const int __imm)
  return __arm_vqshlq_n_s16 (__a, __imm);
 }
 
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq (uint32x4_t __a, uint32x4_t __b)
-{
- return __arm_vrmulhq_u32 (__a, __b);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq (uint32x4_t __a, uint32x4_t __b)
-{
- return __arm_vrhaddq_u32 (__a, __b);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq (uint32x4_t __a, uint32x4_t __b)
-{
- return __arm_vqsubq_u32 (__a, __b);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq (uint32x4_t __a, uint32_t __b)
-{
- return __arm_vqsubq_n_u32 (__a, __b);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq (uint32x4_t __a, uint32x4_t __b)
-{
- return __arm_vqaddq_u32 (__a, __b);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq (uint32x4_t __a, uint32_t __b)
-{
- return __arm_vqaddq_n_u32 (__a, __b);
-}
-
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq (uint32x4_t __a, uint32x4_t __b)
@@ -20077,13 +18190,6 @@ __arm_vmullbq_int (uint32x4_t __a, uint32x4_t __b)
  return __arm_vmullbq_int_u32 (__a, __b);
 }
 
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq (uint32x4_t __a, uint32x4_t __b)
-{
- return __arm_vmulhq_u32 (__a, __b);
-}
-
 __extension__ extern __inline uint32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmladavq (uint32x4_t __a, uint32x4_t __b)
@@ -20119,34 +18225,6 @@ __arm_vmaxq (uint32x4_t __a, uint32x4_t __b)
  return __arm_vmaxq_u32 (__a, __b);
 }
 
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq (uint32x4_t __a, uint32x4_t __b)
-{
- return __arm_vhsubq_u32 (__a, __b);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq (uint32x4_t __a, uint32_t __b)
-{
- return __arm_vhsubq_n_u32 (__a, __b);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq (uint32x4_t __a, uint32x4_t __b)
-{
- return __arm_vhaddq_u32 (__a, __b);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq (uint32x4_t __a, uint32_t __b)
-{
- return __arm_vhaddq_n_u32 (__a, __b);
-}
-
 __extension__ extern __inline mve_pred16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcmpneq (uint32x4_t __a, uint32_t __b)
@@ -20455,34 +18533,6 @@ __arm_vrshlq (int32x4_t __a, int32_t __b)
  return __arm_vrshlq_n_s32 (__a, __b);
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq (int32x4_t __a, int32x4_t __b)
-{
- return __arm_vrmulhq_s32 (__a, __b);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq (int32x4_t __a, int32x4_t __b)
-{
- return __arm_vrhaddq_s32 (__a, __b);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq (int32x4_t __a, int32x4_t __b)
-{
- return __arm_vqsubq_s32 (__a, __b);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq (int32x4_t __a, int32_t __b)
-{
- return __arm_vqsubq_n_s32 (__a, __b);
-}
-
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vqshlq (int32x4_t __a, int32x4_t __b)
@@ -20525,34 +18575,6 @@ __arm_vqrdmulhq (int32x4_t __a, int32_t __b)
  return __arm_vqrdmulhq_n_s32 (__a, __b);
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqdmulhq (int32x4_t __a, int32x4_t __b)
-{
- return __arm_vqdmulhq_s32 (__a, __b);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqdmulhq (int32x4_t __a, int32_t __b)
-{
- return __arm_vqdmulhq_n_s32 (__a, __b);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq (int32x4_t __a, int32x4_t __b)
-{
- return __arm_vqaddq_s32 (__a, __b);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq (int32x4_t __a, int32_t __b)
-{
- return __arm_vqaddq_n_s32 (__a, __b);
-}
-
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vornq (int32x4_t __a, int32x4_t __b)
@@ -20574,13 +18596,6 @@ __arm_vmullbq_int (int32x4_t __a, int32x4_t __b)
  return __arm_vmullbq_int_s32 (__a, __b);
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq (int32x4_t __a, int32x4_t __b)
-{
- return __arm_vmulhq_s32 (__a, __b);
-}
-
 __extension__ extern __inline int32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmlsdavxq (int32x4_t __a, int32x4_t __b)
@@ -20637,20 +18652,6 @@ __arm_vmaxq (int32x4_t __a, int32x4_t __b)
  return __arm_vmaxq_s32 (__a, __b);
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq (int32x4_t __a, int32x4_t __b)
-{
- return __arm_vhsubq_s32 (__a, __b);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq (int32x4_t __a, int32_t __b)
-{
- return __arm_vhsubq_n_s32 (__a, __b);
-}
-
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vhcaddq_rot90 (int32x4_t __a, int32x4_t __b)
@@ -20665,20 +18666,6 @@ __arm_vhcaddq_rot270 (int32x4_t __a, int32x4_t __b)
  return __arm_vhcaddq_rot270_s32 (__a, __b);
 }
 
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq (int32x4_t __a, int32x4_t __b)
-{
- return __arm_vhaddq_s32 (__a, __b);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq (int32x4_t __a, int32_t __b)
-{
- return __arm_vhaddq_n_s32 (__a, __b);
-}
-
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcaddq_rot90 (int32x4_t __a, int32x4_t __b)
@@ -24165,90 +22152,6 @@ __arm_vcaddq_rot90_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve
  return __arm_vcaddq_rot90_m_u16 (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_m (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
-{
- return __arm_vhaddq_m_n_s8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_m (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t __p)
-{
- return __arm_vhaddq_m_n_s32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_m (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t __p)
-{
- return __arm_vhaddq_m_n_s16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_m (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
-{
- return __arm_vhaddq_m_n_u8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_m (uint32x4_t __inactive, uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
-{
- return __arm_vhaddq_m_n_u32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_m (uint16x8_t __inactive, uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
-{
- return __arm_vhaddq_m_n_u16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vhaddq_m_s8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vhaddq_m_s32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vhaddq_m_s16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_m (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vhaddq_m_u8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_m (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vhaddq_m_u32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vhaddq_m_u16 (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vhcaddq_rot270_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
@@ -24291,90 +22194,6 @@ __arm_vhcaddq_rot90_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_p
  return __arm_vhcaddq_rot90_m_s16 (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_m (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
-{
- return __arm_vhsubq_m_n_s8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_m (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t __p)
-{
- return __arm_vhsubq_m_n_s32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_m (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t __p)
-{
- return __arm_vhsubq_m_n_s16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_m (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
-{
- return __arm_vhsubq_m_n_u8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_m (uint32x4_t __inactive, uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
-{
- return __arm_vhsubq_m_n_u32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_m (uint16x8_t __inactive, uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
-{
- return __arm_vhsubq_m_n_u16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vhsubq_m_s8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vhsubq_m_s32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vhsubq_m_s16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_m (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vhsubq_m_u8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_m (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vhsubq_m_u32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vhsubq_m_u16 (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmaxq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
@@ -24648,48 +22467,6 @@ __arm_vmlsdavaxq_p (int32_t __a, int16x8_t __b, int16x8_t __c, mve_pred16_t __p)
  return __arm_vmlsdavaxq_p_s16 (__a, __b, __c, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vmulhq_m_s8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vmulhq_m_s32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vmulhq_m_s16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq_m (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vmulhq_m_u8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq_m (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vmulhq_m_u32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vmulhq_m_u16 (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmullbq_int_m (int16x8_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
@@ -24816,90 +22593,6 @@ __arm_vornq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16
  return __arm_vornq_m_u16 (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_m (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
-{
- return __arm_vqaddq_m_n_s8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_m (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t __p)
-{
- return __arm_vqaddq_m_n_s32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_m (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t __p)
-{
- return __arm_vqaddq_m_n_s16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_m (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
-{
- return __arm_vqaddq_m_n_u8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_m (uint32x4_t __inactive, uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
-{
- return __arm_vqaddq_m_n_u32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_m (uint16x8_t __inactive, uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
-{
- return __arm_vqaddq_m_n_u16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vqaddq_m_s8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vqaddq_m_s32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vqaddq_m_s16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_m (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vqaddq_m_u8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_m (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vqaddq_m_u32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqaddq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vqaddq_m_u16 (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vqdmladhq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
@@ -25005,48 +22698,6 @@ __arm_vqdmlsdhxq_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred
  return __arm_vqdmlsdhxq_m_s16 (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqdmulhq_m (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
-{
- return __arm_vqdmulhq_m_n_s8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqdmulhq_m (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t __p)
-{
- return __arm_vqdmulhq_m_n_s32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqdmulhq_m (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t __p)
-{
- return __arm_vqdmulhq_m_n_s16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqdmulhq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vqdmulhq_m_s8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqdmulhq_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vqdmulhq_m_s32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqdmulhq_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vqdmulhq_m_s16 (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vqrdmladhq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
@@ -25362,174 +23013,6 @@ __arm_vqshlq_m (uint16x8_t __inactive, uint16x8_t __a, int16x8_t __b, mve_pred16
  return __arm_vqshlq_m_u16 (__inactive, __a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_m (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
-{
- return __arm_vqsubq_m_n_s8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_m (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t __p)
-{
- return __arm_vqsubq_m_n_s32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_m (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t __p)
-{
- return __arm_vqsubq_m_n_s16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_m (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
-{
- return __arm_vqsubq_m_n_u8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_m (uint32x4_t __inactive, uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
-{
- return __arm_vqsubq_m_n_u32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_m (uint16x8_t __inactive, uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
-{
- return __arm_vqsubq_m_n_u16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vqsubq_m_s8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vqsubq_m_s32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vqsubq_m_s16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_m (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vqsubq_m_u8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_m (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vqsubq_m_u32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vqsubq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vqsubq_m_u16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vrhaddq_m_s8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vrhaddq_m_s32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vrhaddq_m_s16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq_m (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vrhaddq_m_u8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq_m (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vrhaddq_m_u32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vrhaddq_m_u16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vrmulhq_m_s8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vrmulhq_m_s32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vrmulhq_m_s16 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq_m (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vrmulhq_m_u8 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq_m (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vrmulhq_m_u32 (__inactive, __a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vrmulhq_m_u16 (__inactive, __a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vrshlq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
@@ -27980,48 +25463,6 @@ __arm_vnegq_x (int32x4_t __a, mve_pred16_t __p)
  return __arm_vnegq_x_s32 (__a, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vmulhq_x_s8 (__a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq_x (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vmulhq_x_s16 (__a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq_x (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vmulhq_x_s32 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq_x (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vmulhq_x_u8 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq_x (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vmulhq_x_u16 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vmulhq_x (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vmulhq_x_u32 (__a, __b, __p);
-}
-
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vmullbq_poly_x (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
@@ -28218,90 +25659,6 @@ __arm_vcaddq_rot270_x (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
  return __arm_vcaddq_rot270_x_u32 (__a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_x (int8x16_t __a, int8_t __b, mve_pred16_t __p)
-{
- return __arm_vhaddq_x_n_s8 (__a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_x (int16x8_t __a, int16_t __b, mve_pred16_t __p)
-{
- return __arm_vhaddq_x_n_s16 (__a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_x (int32x4_t __a, int32_t __b, mve_pred16_t __p)
-{
- return __arm_vhaddq_x_n_s32 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_x (uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
-{
- return __arm_vhaddq_x_n_u8 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_x (uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
-{
- return __arm_vhaddq_x_n_u16 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_x (uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
-{
- return __arm_vhaddq_x_n_u32 (__a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vhaddq_x_s8 (__a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_x (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vhaddq_x_s16 (__a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_x (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vhaddq_x_s32 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_x (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vhaddq_x_u8 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_x (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vhaddq_x_u16 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhaddq_x (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vhaddq_x_u32 (__a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vhcaddq_rot90_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
@@ -28344,174 +25701,6 @@ __arm_vhcaddq_rot270_x (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
  return __arm_vhcaddq_rot270_x_s32 (__a, __b, __p);
 }
 
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_x (int8x16_t __a, int8_t __b, mve_pred16_t __p)
-{
- return __arm_vhsubq_x_n_s8 (__a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_x (int16x8_t __a, int16_t __b, mve_pred16_t __p)
-{
- return __arm_vhsubq_x_n_s16 (__a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_x (int32x4_t __a, int32_t __b, mve_pred16_t __p)
-{
- return __arm_vhsubq_x_n_s32 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_x (uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
-{
- return __arm_vhsubq_x_n_u8 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_x (uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
-{
- return __arm_vhsubq_x_n_u16 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_x (uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
-{
- return __arm_vhsubq_x_n_u32 (__a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vhsubq_x_s8 (__a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_x (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vhsubq_x_s16 (__a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_x (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vhsubq_x_s32 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_x (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vhsubq_x_u8 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_x (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vhsubq_x_u16 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vhsubq_x (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vhsubq_x_u32 (__a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vrhaddq_x_s8 (__a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq_x (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vrhaddq_x_s16 (__a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq_x (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vrhaddq_x_s32 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq_x (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vrhaddq_x_u8 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq_x (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vrhaddq_x_u16 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrhaddq_x (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vrhaddq_x_u32 (__a, __b, __p);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vrmulhq_x_s8 (__a, __b, __p);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq_x (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vrmulhq_x_s16 (__a, __b, __p);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq_x (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vrmulhq_x_s32 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq_x (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
-{
- return __arm_vrmulhq_x_u8 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq_x (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
-{
- return __arm_vrmulhq_x_u16 (__a, __b, __p);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vrmulhq_x (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
-{
- return __arm_vrmulhq_x_u32 (__a, __b, __p);
-}
-
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vbicq_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
@@ -32685,42 +29874,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vrshlq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vrshlq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, int32x4_t)));})
 
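The macros deleted below all follow the same C11 _Generic dispatch scheme as the vrshlq macro kept above: the argument types index a table of monomorphic entry points. A self-contained sketch of the pattern, with hypothetical my_* names standing in for the real __arm_* functions:

#include <stdint.h>

typedef struct { int32_t v[4]; } s32x4;    /* stand-in for int32x4_t  */
typedef struct { uint32_t v[4]; } u32x4;   /* stand-in for uint32x4_t */

static inline s32x4
my_addq_s32 (s32x4 a, s32x4 b)
{
  for (int i = 0; i < 4; i++)
    a.v[i] += b.v[i];
  return a;
}

static inline u32x4
my_addq_u32 (u32x4 a, u32x4 b)
{
  for (int i = 0; i < 4; i++)
    a.v[i] += b.v[i];
  return a;
}

/* _Generic selects the monomorphic function from the static type of
   the first argument, just as the __ARM_mve_typeid tables do.  */
#define my_addq(a, b)            \
  _Generic ((a),                 \
            s32x4: my_addq_s32,  \
            u32x4: my_addq_u32) ((a), (b))
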
-#define __arm_vrmulhq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vrmulhq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vrmulhq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vrmulhq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vrmulhq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vrmulhq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vrmulhq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
-
-#define __arm_vrhaddq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vrhaddq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vrhaddq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vrhaddq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vrhaddq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vrhaddq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vrhaddq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
-
-#define __arm_vqsubq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqsubq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqsubq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqsubq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vqsubq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vqsubq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vqsubq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
-
 #define __arm_vqshluq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
   int (*)[__ARM_mve_type_int8x16_t]: __arm_vqshluq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), p1), \
@@ -32831,32 +29984,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqdmullbq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqdmullbq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)));})
 
-#define __arm_vqdmulhq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqdmulhq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqdmulhq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqdmulhq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)));})
-
-#define __arm_vqaddq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqaddq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqaddq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqaddq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vqaddq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vqaddq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vqaddq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
-
 #define __arm_vmulltq_poly(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
@@ -32879,22 +30006,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmulltq_int_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmulltq_int_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
 
-#define __arm_vhaddq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vhaddq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhaddq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhaddq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vhaddq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vhaddq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vhaddq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
-
 #define __arm_vhcaddq_rot270(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
@@ -32909,22 +30020,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhcaddq_rot90_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhcaddq_rot90_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)));})
 
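The vhsubq/vhaddq macros removed below additionally dispatch on whether the second operand is a vector or a scalar, routing scalars to the _n_ variants (that is what the __ARM_mve_type_int_n cases and __ARM_mve_coerce3 achieve). A sketch of that part of the scheme, again with hypothetical my_* names:

#include <stdint.h>

typedef struct { int32_t v[4]; } s32x4;    /* stand-in for int32x4_t */

static inline s32x4
my_vhsubq_s32 (s32x4 a, s32x4 b)
{
  for (int i = 0; i < 4; i++)
    a.v[i] = (int32_t) (((int64_t) a.v[i] - b.v[i]) >> 1);
  return a;
}

static inline s32x4
my_vhsubq_n_s32 (s32x4 a, int32_t b)
{
  for (int i = 0; i < 4; i++)
    a.v[i] = (int32_t) (((int64_t) a.v[i] - b) >> 1);
  return a;
}

/* A vector second operand picks the vector-vector entry point; any
   scalar falls through to the vector-scalar (_n_) one.  */
#define my_vhsubq(a, b)                   \
  _Generic ((b),                          \
            s32x4:   my_vhsubq_s32,       \
            default: my_vhsubq_n_s32) ((a), (b))
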
-#define __arm_vhsubq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vhsubq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhsubq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhsubq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vhsubq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vhsubq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vhsubq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
-
 #define __arm_vminq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
@@ -32975,16 +30070,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint16x8_t]: __arm_vmovnbq_u16 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint32x4_t]: __arm_vmovnbq_u32 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
 
-#define __arm_vmulhq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vmulhq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmulhq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmulhq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vmulhq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmulhq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmulhq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
-
 #define __arm_vmullbq_int(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
@@ -34580,42 +31665,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vrshlq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vrshlq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, int32x4_t)));})
 
-#define __arm_vrmulhq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vrmulhq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vrmulhq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vrmulhq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vrmulhq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vrmulhq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vrmulhq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
-
-#define __arm_vrhaddq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vrhaddq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vrhaddq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vrhaddq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vrhaddq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vrhaddq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vrhaddq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
-
-#define __arm_vqsubq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqsubq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqsubq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqsubq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vqsubq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vqsubq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vqsubq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
-
 #define __arm_vqshlq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
@@ -34694,32 +31743,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqrdmulhq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqrdmulhq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)));})
 
-#define __arm_vqdmulhq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqdmulhq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqdmulhq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqdmulhq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)));})
-
-#define __arm_vqaddq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqaddq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqaddq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqaddq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vqaddq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vqaddq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vqaddq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
-
 #define __arm_vornq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
@@ -34750,16 +31773,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmullbq_int_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmullbq_int_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
 
-#define __arm_vmulhq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vmulhq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmulhq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmulhq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vmulhq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmulhq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmulhq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
-
 #define __arm_vminq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
@@ -34794,22 +31807,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmaxaq_s16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmaxaq_s32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, int32x4_t)));})
 
-#define __arm_vhsubq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vhsubq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhsubq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhsubq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vhsubq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vhsubq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vhsubq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
-
 #define __arm_vhcaddq_rot90(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
@@ -34824,22 +31821,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhcaddq_rot270_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhcaddq_rot270_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)));})
 
-#define __arm_vhaddq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vhaddq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhaddq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhaddq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vhaddq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vhaddq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vhaddq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
-
 #define __arm_vcaddq_rot90(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
@@ -35910,16 +32891,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint8x16_t]: __arm_vmovltq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), p2), \
   int (*)[__ARM_mve_type_uint16x8_t]: __arm_vmovltq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), p2));})
 
-#define __arm_vmulhq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vmulhq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmulhq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmulhq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vmulhq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmulhq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmulhq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
-
 #define __arm_vmullbq_int_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
@@ -36095,16 +33066,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int8x16_t]: __arm_vrev16q_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), p2), \
   int (*)[__ARM_mve_type_uint8x16_t]: __arm_vrev16q_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), p2));})
 
-#define __arm_vrhaddq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vrhaddq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vrhaddq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vrhaddq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vrhaddq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vrhaddq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vrhaddq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
-
 #define __arm_vshlq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
@@ -36115,16 +33076,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vshlq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vshlq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3));})
 
-#define __arm_vrmulhq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vrmulhq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vrmulhq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vrmulhq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vrmulhq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vrmulhq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vrmulhq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
-
 #define __arm_vrshlq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
@@ -36236,22 +33187,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t]: __arm_vshrq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t), p2, p3), \
   int (*)[__ARM_mve_type_uint32x4_t]: __arm_vshrq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t), p2, p3));})
 
-#define __arm_vhaddq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_x_n_u8( __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_x_n_u16( __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_x_n_u32( __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vhaddq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhaddq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhaddq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vhaddq_x_u8( __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vhaddq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vhaddq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
-
 #define __arm_vhcaddq_rot270_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
@@ -36266,22 +33201,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhcaddq_rot90_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhcaddq_rot90_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3));})
 
-#define __arm_vhsubq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vhsubq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhsubq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhsubq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vhsubq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vhsubq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vhsubq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
-
 #define __arm_vclsq_x(p1,p2) ({ __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p1)])0, \
   int (*)[__ARM_mve_type_int8x16_t]: __arm_vclsq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), p2), \
@@ -36446,28 +33365,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqshlq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqshlq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3));})
 
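For the _m (merging) forms removed below, inactive lanes take their value from the extra first argument instead of being left undefined as in the _x forms. A scalar model under simplifying assumptions (one predicate bit per 32-bit lane is tested here; the real mve_pred16_t carries one bit per byte, i.e. four per int32 lane):

#include <stdint.h>

typedef struct { int32_t v[4]; } s32x4;    /* stand-in for int32x4_t */

static inline s32x4
model_vhaddq_m (s32x4 inactive, s32x4 a, s32x4 b, uint16_t p)
{
  s32x4 r;
  for (int i = 0; i < 4; i++)
    /* Active lanes get the halving add; inactive lanes are copied
       from the "inactive" operand.  */
    r.v[i] = (p >> (4 * i)) & 1
             ? (int32_t) (((int64_t) a.v[i] + b.v[i]) >> 1)
             : inactive.v[i];
  return r;
}
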
-#define __arm_vrhaddq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vrhaddq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vrhaddq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vrhaddq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vrhaddq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vrhaddq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vrhaddq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
-
-#define __arm_vrmulhq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vrmulhq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vrmulhq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vrmulhq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vrmulhq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vrmulhq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vrmulhq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
-
 #define __arm_vrshlq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
@@ -36509,23 +33406,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vsliq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t),  p2, p3), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vsliq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t),  p2, p3));})
 
-#define __arm_vqsubq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqsubq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqsubq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqsubq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vqsubq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vqsubq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vqsubq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
-
 #define __arm_vqrdmulhq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
@@ -36799,23 +33679,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vsriq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), p2, p3), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vsriq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), p2, p3));})
 
-#define __arm_vhaddq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vhaddq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhaddq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhaddq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vhaddq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vhaddq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vhaddq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
-
 #define __arm_vhcaddq_rot270_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
@@ -36832,23 +33695,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhcaddq_rot90_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhcaddq_rot90_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3));})
 
-#define __arm_vhsubq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vhsubq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhsubq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhsubq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vhsubq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vhsubq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vhsubq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3));})
-
 #define __arm_vmaxq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
@@ -36893,17 +33739,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmlasq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmlasq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3));})
 
-#define __arm_vmulhq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vmulhq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmulhq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmulhq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vmulhq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmulhq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmulhq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
-
 #define __arm_vmullbq_int_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
@@ -36933,23 +33768,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vmulltq_poly_m_p8 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmulltq_poly_m_p16 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3));})
 
-#define __arm_vqaddq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqaddq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqaddq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqaddq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vqaddq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vqaddq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vqaddq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
-
 #define __arm_vqdmlahq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
@@ -36958,17 +33776,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3));})
 
-#define __arm_vqdmulhq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqdmulhq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqdmulhq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqdmulhq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3));})
-
 #define __arm_vqdmullbq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
@@ -37562,16 +34369,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint8x16_t]: __arm_vmovltq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), p2), \
   int (*)[__ARM_mve_type_uint16x8_t]: __arm_vmovltq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), p2));})
 
-#define __arm_vmulhq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vmulhq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmulhq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmulhq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vmulhq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmulhq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmulhq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
-
 #define __arm_vmullbq_int_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-- 
2.34.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH 00/22] arm: New framework for MVE intrinsics
  2023-04-18 13:45 [PATCH 00/22] arm: New framework for MVE intrinsics Christophe Lyon
                   ` (21 preceding siblings ...)
  2023-04-18 13:46 ` [PATCH 22/22] arm: [MVE intrinsics] rework vhaddq vhsubq vmulhq vqaddq vqsubq vqdmulhq vrhaddq vrmulhq Christophe Lyon
@ 2023-05-02  9:18 ` Kyrylo Tkachov
  2023-05-02 15:04   ` Christophe Lyon
  22 siblings, 1 reply; 55+ messages in thread
From: Kyrylo Tkachov @ 2023-05-02  9:18 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Earnshaw, Richard Sandiford
  Cc: Christophe Lyon

Hi Christophe,

> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@arm.com>
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
> Subject: [PATCH 00/22] arm: New framework for MVE intrinsics
> 
> Hi,
> 
> This is the beginning of a long patch series to change the way Arm MVE
> intrinsics are implemented. The goal is to get rid of arm_mve.h, which
> takes a long time to parse and compile.
> 

Thanks for doing this. It is a significant improvement to the MVE intrinsics and should address some of the biggest maintainability and scalability issues we have in that area.
I'll be going through the patches one by one (I've already looked at them offline), but the approach looks good to me at a high level.

My hope is that we'll move all the intrinsics, including the Neon ones, to use this framework in the future, but getting the framework in place is a major first step in that direction.

Thanks,
Kyrill

> Roughly speaking, it's about using a framework very similar to what is
> implemented for AArch64/SVE intrinsics. I haven't converted all the
> intrinsics yet, but I think it would be good to start the conversion
> when stage-1 reopens.
> 
> * Factorizing names
> One of the main implementation differences I noticed between SVE and
> MVE is that mve.md provides only full builtin names at the moment, and
> makes almost no use of "parameterized names"
> (https://gcc.gnu.org/onlinedocs/gccint/Parameterized-Names.html#Parameterized-Names).
> 
> Without this, we'd need the builtin expander to use a large
> switch/case of the form:
> 
> switch (code)
> case VADDQ_S: insn_code = code_for_mve_vaddq_s (...)
> case VADDQ_U: insn_code = code_for_mve_vaddq_u (...)
> case VSUBQ_S: insn_code = code_for_mve_vsubq_s (...)
> case VSUBQ_U: insn_code = code_for_mve_vsubq_u (...)
> ....
> 
> so part of the work (which I called "factorize" in the commit
> messages) is about replacing
> 
> (define_insn "mve_vaddq_n_<supf><mode>"
> with
> (define_insn "@mve_<mve_insn>q_n_<supf><mode>"
> with the help of a new iterator (mve_insn).
> 
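For illustration, the expander side then reduces to a single code_for_* lookup along these lines (a sketch only: the helper name follows GCC's parameterized-names rule of dropping the <...> parts, and the exact argument list depends on the iterators involved, so "code" and the signature here are hypothetical, not quoted from the series):

  /* One parameterized-name lookup replaces the per-unspec switch above.  */
  insn_code icode = code_for_mve_q_n (code, e.vector_mode (0));
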
> Doing so makes it more obvious that some patterns are identical,
> except for the instruction name. I took this opportunity to merge
> them, so for instance I have a patch which merges add, sub and mul
> patterns.  Although not strictly necessary for the MVE intrinsics
> restructuring work, this is a good opportunity to reduce such code
> duplication (I did notice a few bugs during that process, which led me
> to post a few small patches in the past months).  Note that identical
> patterns will probably remain after the series, they can be merged
> later if we want.
> 
> This factorization also implies the introduction of new iterators, but
> also means that several existing ones become useless. These patches do
> not remove them because it's a bit painful to reorder patches which
> remove lines at some "random" places, leading to merge conflicts. It's
> much simpler to write a big cleanup patch at the end of the series to
> remove all such useless iterators at once.
> 
> * Intrinsic re-implementation
> After intrinsic names have been factorized, the actual
> re-implementation patch is small:
> - add 1 line in each of arm-mve-builtins-base.{cc,def,h} describing
>   the intrinsic shape/signature, types and predicates involved,
>   RTX/unspec codes
> - remove the intrinsic definitions from arm_mve.h
> 
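For vaddq, for example, the one-line description in arm-mve-builtins-base.def might look like the following sketch (hypothetical: "binary_opt_n" and "all_integer" are names introduced later in this series, while the DEF_MVE_FUNCTION spelling and the "mx_or_none" predicate set are assumptions here, not quotes from the patches):

  /* Sketch: intrinsic name, shape, type suffixes, predication forms.  */
  DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_integer, mx_or_none)
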
> The full series of ~140 patches is organized like this:
> - patches 1 and 2 introduce the new framework
> - new implementation of vreinterpretq
> - new implementation of vuninitialized
> - patch groups of varying size, consisting of:
>   - add a new "shape" if needed (e.g. unary, binary, ternary, ....)
>   - add framework support functions if needed
>   - factorize a set of intrinsics (at minimum, just make use of
>     parameterized-names)
>   - actual re-implementation of the intrinsics
> 
> I kept patches small so the incremental progress is easy to follow and
> check.  I'll submit the patches in small groups; this first one will
> make sure we agree on the implementation.
> 
> Tested on arm-eabi with -mthumb/-mfloat-abi=hard/-march=armv8.1-m.main+mve.
> 
> To help reviewers, I suggest comparing arm-mve-builtins.cc with
> aarch64-sve-builtins.cc.
> 
> Christophe Lyon (22):
>   arm: move builtin function codes into general numberspace
>   arm: [MVE intrinsics] Add new framework
>   arm: [MVE intrinsics] Rework vreinterpretq
>   arm: [MVE intrinsics] Rework vuninitialized
>   arm: [MVE intrinsics] add binary_opt_n shape
>   arm: [MVE intrinsics] add unspec_based_mve_function_exact_insn
>   arm: [MVE intrinsics] factorize vadd vsubq vmulq
>   arm: [MVE intrinsics] rework vaddq vmulq vsubq
>   arm: [MVE intrinsics] add binary shape
>   arm: [MVE intrinsics] factorize vandq veorq vorrq vbicq
>   arm: [MVE intrinsics] rework vandq veorq
>   arm: [MVE intrinsics] add binary_orrq shape
>   arm: [MVE intrinsics] rework vorrq
>   arm: [MVE intrinsics] add unspec_mve_function_exact_insn
>   arm: [MVE intrinsics] add create shape
>   arm: [MVE intrinsics] factorize vcreateq
>   arm: [MVE intrinsics] rework vcreateq
>   arm: [MVE intrinsics] factorize several binary_m operations
>   arm: [MVE intrinsics] factorize several binary _n operations
>   arm: [MVE intrinsics] factorize several binary _m_n operations
>   arm: [MVE intrinsics] factorize several binary operations
>   arm: [MVE intrinsics] rework vhaddq vhsubq vmulhq vqaddq vqsubq
>     vqdmulhq vrhaddq vrmulhq
> 
>  gcc/config.gcc                                |    2 +-
>  gcc/config/arm/arm-builtins.cc                |  237 +-
>  gcc/config/arm/arm-builtins.h                 |    1 +
>  gcc/config/arm/arm-c.cc                       |   42 +-
>  gcc/config/arm/arm-mve-builtins-base.cc       |  163 +
>  gcc/config/arm/arm-mve-builtins-base.def      |   50 +
>  gcc/config/arm/arm-mve-builtins-base.h        |   47 +
>  gcc/config/arm/arm-mve-builtins-functions.h   |  387 +
>  gcc/config/arm/arm-mve-builtins-shapes.cc     |  529 ++
>  gcc/config/arm/arm-mve-builtins-shapes.h      |   47 +
>  gcc/config/arm/arm-mve-builtins.cc            | 2013 ++++-
>  gcc/config/arm/arm-mve-builtins.def           |   40 +-
>  gcc/config/arm/arm-mve-builtins.h             |  672 +-
>  gcc/config/arm/arm-protos.h                   |   24 +
>  gcc/config/arm/arm.cc                         |   27 +
>  gcc/config/arm/arm_mve.h                      | 7581 +----------------
>  gcc/config/arm/arm_mve_builtins.def           |    6 -
>  gcc/config/arm/arm_mve_types.h                | 1430 ----
>  gcc/config/arm/iterators.md                   |  240 +-
>  gcc/config/arm/mve.md                         | 1747 +---
>  gcc/config/arm/predicates.md                  |    4 +
>  gcc/config/arm/t-arm                          |   32 +-
>  gcc/config/arm/unspecs.md                     |    1 +
>  gcc/config/arm/vec-common.md                  |    8 +-
>  gcc/testsuite/g++.target/arm/mve.exp          |    8 +-
>  .../arm/mve/general-c++/nomve_fp_1.c          |   15 +
>  .../arm/mve/general-c++/vreinterpretq_1.C     |   25 +
>  .../gcc.target/arm/mve/general-c/nomve_fp_1.c |   15 +
>  .../arm/mve/general-c/vreinterpretq_1.c       |   25 +
>  29 files changed, 4926 insertions(+), 10492 deletions(-)
>  create mode 100644 gcc/config/arm/arm-mve-builtins-base.cc
>  create mode 100644 gcc/config/arm/arm-mve-builtins-base.def
>  create mode 100644 gcc/config/arm/arm-mve-builtins-base.h
>  create mode 100644 gcc/config/arm/arm-mve-builtins-functions.h
>  create mode 100644 gcc/config/arm/arm-mve-builtins-shapes.cc
>  create mode 100644 gcc/config/arm/arm-mve-builtins-shapes.h
>  create mode 100644 gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c
>  create mode 100644 gcc/testsuite/g++.target/arm/mve/general-c++/vreinterpretq_1.C
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/nomve_fp_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/vreinterpretq_1.c
> 
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH 01/22] arm: move builtin function codes into general numberspace
  2023-04-18 13:45 ` [PATCH 01/22] arm: move builtin function codes into general numberspace Christophe Lyon
@ 2023-05-02  9:24   ` Kyrylo Tkachov
  0 siblings, 0 replies; 55+ messages in thread
From: Kyrylo Tkachov @ 2023-05-02  9:24 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Earnshaw, Richard Sandiford
  Cc: Christophe Lyon

Hi Christophe,

> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@arm.com>
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
> Subject: [PATCH 01/22] arm: move builtin function codes into general
> numberspace
> 
> This patch introduces a separate numberspace for general arm builtin
> function codes. The intent of this patch is to separate the space of
> function codes that may be assigned to general builtins and future
> MVE intrinsic functions by using the first bit of each function code
> to differentiate them. This is identical to how SVE intrinsic functions
> are currently differentiated from general aarch64 builtins.
> 
> Future intrinsics implementations may also make use of numberspacing by
> changing the values of ARM_BUILTIN_SHIFT and ARM_BUILTIN_CLASS, and
> adding themselves to the arm_builtin_class enum.

Ok.
Thanks,
Kyrill

> 
> 2022-09-08  Murray Steele  <murray.steele@arm.com>
> 	    Christophe Lyon  <christophe.lyon@arm.com>
> 
> gcc/ChangeLog:
> 
> 	* config/arm/arm-builtins.cc (arm_general_add_builtin_function):
> 	New function.
> 	(arm_init_builtin): Use arm_general_add_builtin_function instead
> 	of arm_add_builtin_function.
> 	(arm_init_acle_builtins): Likewise.
> 	(arm_init_mve_builtins): Likewise.
> 	(arm_init_crypto_builtins): Likewise.
> 	(arm_init_builtins): Likewise.
> 	(arm_general_builtin_decl): New function.
> 	(arm_builtin_decl): Defer to numberspace-specialized functions.
> 	(arm_expand_builtin_args): Rename into
> arm_general_expand_builtin_args.
> 	(arm_expand_builtin_1): Rename into
> arm_general_expand_builtin_1 and ...
> 	(arm_general_expand_builtin_1): ... specialize for general builtins.
> 	(arm_expand_acle_builtin): Use arm_general_expand_builtin
> 	instead of arm_expand_builtin.
> 	(arm_expand_mve_builtin): Likewise.
> 	(arm_expand_neon_builtin): Likewise.
> 	(arm_expand_vfp_builtin): Likewise.
> 	(arm_general_expand_builtin): New function.
> 	(arm_expand_builtin): Specialize for general builtins.
> 	(arm_general_check_builtin_call): New function.
> 	(arm_check_builtin_call): Specialize for general builtins.
> 	(arm_describe_resolver): Validate numberspace.
> 	(arm_cde_end_args): Likewise.
> 	* config/arm/arm-protos.h (enum arm_builtin_class): New enum.
> 	(ARM_BUILTIN_SHIFT, ARM_BUILTIN_CLASS): New constants.
> 
> Co-authored-by: Christophe Lyon  <christophe.lyon@arm.com>
> ---
>  gcc/config/arm/arm-builtins.cc | 226 ++++++++++++++++++++++-----------
>  gcc/config/arm/arm-protos.h    |  16 +++
>  2 files changed, 165 insertions(+), 77 deletions(-)
> 
> diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
> index 9f5c568cbc3..adcb50d2185 100644
> --- a/gcc/config/arm/arm-builtins.cc
> +++ b/gcc/config/arm/arm-builtins.cc
> @@ -1405,6 +1405,18 @@ static tree arm_simd_polyHI_type_node = NULL_TREE;
>  static tree arm_simd_polyDI_type_node = NULL_TREE;
>  static tree arm_simd_polyTI_type_node = NULL_TREE;
> 
> +/* Wrapper around add_builtin_function.  NAME is the name of the built-in
> +   function, TYPE is the function type, CODE is the function subcode
> +   (relative to ARM_BUILTIN_GENERAL), and ATTRS is the function
> +   attributes.  */
> +static tree
> +arm_general_add_builtin_function (const char* name, tree type,
> +				  unsigned int code, tree attrs = NULL_TREE)
> +{
> +  code = (code << ARM_BUILTIN_SHIFT) | ARM_BUILTIN_GENERAL;
> +  return add_builtin_function (name, type, code, BUILT_IN_MD, NULL, attrs);
> +}
> +
>  static const char *
>  arm_mangle_builtin_scalar_type (const_tree type)
>  {
> @@ -1811,8 +1823,7 @@ arm_init_builtin (unsigned int fcode, arm_builtin_datum *d,
>      snprintf (namebuf, sizeof (namebuf), "%s_%s",
>  	      prefix, d->name);
> 
> -  fndecl = add_builtin_function (namebuf, ftype, fcode, BUILT_IN_MD,
> -				 NULL, NULL_TREE);
> +  fndecl = arm_general_add_builtin_function (namebuf, ftype, fcode);
>    arm_builtin_decls[fcode] = fndecl;
>  }
> 
> @@ -1832,7 +1843,7 @@ arm_init_bf16_types (void)
>  /* Set up ACLE builtins, even builtins for instructions that are not
>     in the current target ISA to allow the user to compile particular modules
>     with different target specific options that differ from the command line
> -   options.  Such builtins will be rejected in arm_expand_builtin.  */
> +   options.  Such builtins will be rejected in arm_general_expand_builtin.  */
> 
>  static void
>  arm_init_acle_builtins (void)
> @@ -1845,9 +1856,9 @@ arm_init_acle_builtins (void)
>  						 intSI_type_node,
>  						 NULL);
>    arm_builtin_decls[ARM_BUILTIN_SAT_IMM_CHECK]
> -    = add_builtin_function ("__builtin_sat_imm_check", sat_check_fpr,
> -			    ARM_BUILTIN_SAT_IMM_CHECK, BUILT_IN_MD,
> -			    NULL, NULL_TREE);
> +    = arm_general_add_builtin_function ("__builtin_sat_imm_check",
> +					sat_check_fpr,
> +					ARM_BUILTIN_SAT_IMM_CHECK);
> 
>    for (i = 0; i < ARRAY_SIZE (acle_builtin_data); i++, fcode++)
>      {
> @@ -1894,13 +1905,13 @@ arm_init_mve_builtins (void)
>  						    intSI_type_node,
>  						    NULL);
>    arm_builtin_decls[ARM_BUILTIN_GET_FPSCR_NZCVQC]
> -    = add_builtin_function ("__builtin_arm_get_fpscr_nzcvqc", get_fpscr_nzcvqc,
> -			    ARM_BUILTIN_GET_FPSCR_NZCVQC, BUILT_IN_MD, NULL,
> -			    NULL_TREE);
> +    = arm_general_add_builtin_function ("__builtin_arm_get_fpscr_nzcvqc",
> +					get_fpscr_nzcvqc,
> +					ARM_BUILTIN_GET_FPSCR_NZCVQC);
>    arm_builtin_decls[ARM_BUILTIN_SET_FPSCR_NZCVQC]
> -    = add_builtin_function ("__builtin_arm_set_fpscr_nzcvqc", set_fpscr_nzcvqc,
> -			    ARM_BUILTIN_SET_FPSCR_NZCVQC, BUILT_IN_MD, NULL,
> -			    NULL_TREE);
> +    = arm_general_add_builtin_function ("__builtin_arm_set_fpscr_nzcvqc",
> +					set_fpscr_nzcvqc,
> +					ARM_BUILTIN_SET_FPSCR_NZCVQC);
> 
>    for (i = 0; i < ARRAY_SIZE (mve_builtin_data); i++, fcode++)
>      {
> @@ -1912,7 +1923,7 @@ arm_init_mve_builtins (void)
>  /* Set up all the NEON builtins, even builtins for instructions that are not
>     in the current target ISA to allow the user to compile particular modules
>     with different target specific options that differ from the command line
> -   options. Such builtins will be rejected in arm_expand_builtin.  */
> +   options.  Such builtins will be rejected in arm_general_expand_builtin.  */
> 
>  static void
>  arm_init_neon_builtins (void)
> @@ -2006,17 +2017,14 @@ arm_init_crypto_builtins (void)
>      R##_ftype_##A1##_##A2##_##A3
>    #define CRYPTO1(L, U, R, A) \
>      arm_builtin_decls[C (U)] \
> -      = add_builtin_function (N (L), FT1 (R, A), \
> -		  C (U), BUILT_IN_MD, NULL, NULL_TREE);
> +      = arm_general_add_builtin_function (N (L), FT1 (R, A), C (U));
>    #define CRYPTO2(L, U, R, A1, A2)  \
>      arm_builtin_decls[C (U)]	\
> -      = add_builtin_function (N (L), FT2 (R, A1, A2), \
> -		  C (U), BUILT_IN_MD, NULL, NULL_TREE);
> +      = arm_general_add_builtin_function (N (L), FT2 (R, A1, A2), C (U));
> 
>    #define CRYPTO3(L, U, R, A1, A2, A3) \
>      arm_builtin_decls[C (U)]	   \
> -      = add_builtin_function (N (L), FT3 (R, A1, A2, A3), \
> -				  C (U), BUILT_IN_MD, NULL, NULL_TREE);
> +      = arm_general_add_builtin_function (N (L), FT3 (R, A1, A2, A3), C (U));
>    #include "crypto.def"
> 
>    #undef CRYPTO1
> @@ -2039,8 +2047,8 @@ arm_init_crypto_builtins (void)
>  	  || bitmap_bit_p (arm_active_target.isa, FLAG))		\
>  	{								\
>  	  tree bdecl;							\
> -	  bdecl = add_builtin_function ((NAME), (TYPE), (CODE),	\
> -					BUILT_IN_MD, NULL, NULL_TREE);	\
> +	  bdecl  = arm_general_add_builtin_function ((NAME), (TYPE),    \
> +						     (CODE));		\
>  	  arm_builtin_decls[CODE] = bdecl;				\
>  	}								\
>      }									\
> @@ -2650,9 +2658,9 @@ arm_init_builtins (void)
>  						      intSI_type_node,
>  						      NULL);
>        arm_builtin_decls[ARM_BUILTIN_SIMD_LANE_CHECK]
> -      = add_builtin_function ("__builtin_arm_lane_check", lane_check_fpr,
> -			      ARM_BUILTIN_SIMD_LANE_CHECK, BUILT_IN_MD,
> -			      NULL, NULL_TREE);
> +      = arm_general_add_builtin_function ("__builtin_arm_lane_check",
> +					  lane_check_fpr,
> +					  ARM_BUILTIN_SIMD_LANE_CHECK);
>        if (TARGET_HAVE_MVE)
>  	arm_init_mve_builtins ();
>        else
> @@ -2674,11 +2682,13 @@ arm_init_builtins (void)
>  	= build_function_type_list (unsigned_type_node, NULL);
> 
>        arm_builtin_decls[ARM_BUILTIN_GET_FPSCR]
> -	= add_builtin_function ("__builtin_arm_get_fpscr", ftype_get_fpscr,
> -				ARM_BUILTIN_GET_FPSCR, BUILT_IN_MD, NULL, NULL_TREE);
> +	= arm_general_add_builtin_function ("__builtin_arm_get_fpscr",
> +					    ftype_get_fpscr,
> +					    ARM_BUILTIN_GET_FPSCR);
>        arm_builtin_decls[ARM_BUILTIN_SET_FPSCR]
> -	= add_builtin_function ("__builtin_arm_set_fpscr", ftype_set_fpscr,
> -				ARM_BUILTIN_SET_FPSCR, BUILT_IN_MD, NULL, NULL_TREE);
> +	= arm_general_add_builtin_function ("__builtin_arm_set_fpscr",
> +					    ftype_set_fpscr,
> +					    ARM_BUILTIN_SET_FPSCR);
>      }
> 
>    if (use_cmse)
> @@ -2686,17 +2696,15 @@ arm_init_builtins (void)
>        tree ftype_cmse_nonsecure_caller
>  	= build_function_type_list (unsigned_type_node, NULL);
>        arm_builtin_decls[ARM_BUILTIN_CMSE_NONSECURE_CALLER]
> -	= add_builtin_function ("__builtin_arm_cmse_nonsecure_caller",
> -				ftype_cmse_nonsecure_caller,
> -				ARM_BUILTIN_CMSE_NONSECURE_CALLER, BUILT_IN_MD,
> -				NULL, NULL_TREE);
> +	= arm_general_add_builtin_function ("__builtin_arm_cmse_nonsecure_caller",
> +					    ftype_cmse_nonsecure_caller,
> +					    ARM_BUILTIN_CMSE_NONSECURE_CALLER);
>      }
>  }
> 
> -/* Return the ARM builtin for CODE.  */
> -
> +/* Implement TARGET_BUILTIN_DECL for general builtins.  */
>  tree
> -arm_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
> +arm_general_builtin_decl (unsigned code)
>  {
>    if (code >= ARM_BUILTIN_MAX)
>      return error_mark_node;
> @@ -2704,6 +2712,20 @@ arm_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
>    return arm_builtin_decls[code];
>  }
> 
> +/* Return the ARM builtin for CODE.  */
> +tree
> +arm_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
> +{
> +  unsigned subcode = code >> ARM_BUILTIN_SHIFT;
> +  switch (code & ARM_BUILTIN_CLASS)
> +    {
> +    case ARM_BUILTIN_GENERAL:
> +      return arm_general_builtin_decl (subcode);
> +    default:
> +      gcc_unreachable ();
> +    }
> +}
> +
>  /* Errors in the source file can cause expand_expr to return const0_rtx
>     where we expect a vector.  To avoid crashing, use one of the vector
>     clear instructions.  */
> @@ -2769,7 +2791,7 @@ arm_expand_ternop_builtin (enum insn_code icode,
>    return target;
>  }
> 
> -/* Subroutine of arm_expand_builtin to take care of binop insns.  */
> +/* Subroutine of arm_general_expand_builtin to take care of binop insns.  */
> 
>  static rtx
>  arm_expand_binop_builtin (enum insn_code icode,
> @@ -2809,7 +2831,7 @@ arm_expand_binop_builtin (enum insn_code icode,
>    return target;
>  }
> 
> -/* Subroutine of arm_expand_builtin to take care of unop insns.  */
> +/* Subroutine of arm_general_expand_builtin to take care of unop insns.  */
> 
>  static rtx
>  arm_expand_unop_builtin (enum insn_code icode,
> @@ -2946,11 +2968,11 @@ mve_dereference_pointer (tree exp, tree type, machine_mode reg_mode,
>  		      build_int_cst (build_pointer_type (array_type), 0));
>  }
> 
> -/* Expand a builtin.  */
> +/* Implement TARGET_EXPAND_BUILTIN for general builtins.  */
>  static rtx
> -arm_expand_builtin_args (rtx target, machine_mode map_mode, int fcode,
> -		      int icode, int have_retval, tree exp,
> -		      builtin_arg *args)
> +arm_general_expand_builtin_args (rtx target, machine_mode map_mode, int fcode,
> +				 int icode, int have_retval, tree exp,
> +				 builtin_arg *args)
>  {
>    rtx pat;
>    tree arg[SIMD_MAX_BUILTIN_ARGS];
> @@ -3234,13 +3256,13 @@ constant_arg:
>    return target;
>  }
> 
> -/* Expand a builtin.  These builtins are "special" because they don't have
> -   symbolic constants defined per-instruction or per instruction-variant.
> +/* Expand a general builtin.  These builtins are "special" because they don't
> +   have symbolic constants defined per-instruction or per instruction-variant.
>    Instead, the required info is looked up in the ARM_BUILTIN_DATA record that
>     is passed into the function.  */
> 
>  static rtx
> -arm_expand_builtin_1 (int fcode, tree exp, rtx target,
> +arm_general_expand_builtin_1 (int fcode, tree exp, rtx target,
>  			   arm_builtin_datum *d)
>  {
>    enum insn_code icode = d->code;
> @@ -3308,16 +3330,16 @@ arm_expand_builtin_1 (int fcode, tree exp, rtx target,
>      }
>    args[k] = ARG_BUILTIN_STOP;
> 
> -  /* The interface to arm_expand_builtin_args expects a 0 if
> +  /* The interface to arm_general_expand_builtin_args expects a 0 if
>       the function is void, and a 1 if it is not.  */
> -  return arm_expand_builtin_args
> +  return arm_general_expand_builtin_args
>      (target, d->mode, fcode, icode, !is_void, exp,
>       &args[1]);
>  }
> 
>  /* Expand an ACLE builtin, i.e. those registered only if their respective
>     target constraints are met.  This check happens within
> -   arm_expand_builtin_args.  */
> +   arm_general_expand_builtin_args.  */
> 
>  static rtx
>  arm_expand_acle_builtin (int fcode, tree exp, rtx target)
> @@ -3351,11 +3373,12 @@ arm_expand_acle_builtin (int fcode, tree exp, rtx target)
>        ? &acle_builtin_data[fcode - ARM_BUILTIN_ACLE_PATTERN_START]
>        : &cde_builtin_data[fcode - ARM_BUILTIN_CDE_PATTERN_START].base;
> 
> -  return arm_expand_builtin_1 (fcode, exp, target, d);
> +  return arm_general_expand_builtin_1 (fcode, exp, target, d);
>  }
> 
> -/* Expand an MVE builtin, i.e. those registered only if their respective target
> -   constraints are met.  This check happens within arm_expand_builtin.  */
> +/* Expand an MVE builtin, i.e. those registered only if their respective
> +   target constraints are met.  This check happens within
> +   arm_general_expand_builtin.  */
> 
>  static rtx
>  arm_expand_mve_builtin (int fcode, tree exp, rtx target)
> @@ -3371,7 +3394,7 @@ arm_expand_mve_builtin (int fcode, tree exp, rtx target)
>    arm_builtin_datum *d
>      = &mve_builtin_data[fcode - ARM_BUILTIN_MVE_PATTERN_START];
> 
> -  return arm_expand_builtin_1 (fcode, exp, target, d);
> +  return arm_general_expand_builtin_1 (fcode, exp, target, d);
>  }
> 
>  /* Expand a Neon builtin, i.e. those registered only if TARGET_NEON holds.
> @@ -3394,7 +3417,7 @@ arm_expand_neon_builtin (int fcode, tree exp, rtx target)
>    arm_builtin_datum *d
>      = &neon_builtin_data[fcode - ARM_BUILTIN_NEON_PATTERN_START];
> 
> -  return arm_expand_builtin_1 (fcode, exp, target, d);
> +  return arm_general_expand_builtin_1 (fcode, exp, target, d);
>  }
> 
>  /* Expand a VFP builtin.  These builtins are treated like
> @@ -3415,25 +3438,18 @@ arm_expand_vfp_builtin (int fcode, tree exp, rtx target)
>    arm_builtin_datum *d
>      = &vfp_builtin_data[fcode - ARM_BUILTIN_VFP_PATTERN_START];
> 
> -  return arm_expand_builtin_1 (fcode, exp, target, d);
> +  return arm_general_expand_builtin_1 (fcode, exp, target, d);
>  }
> 
> -/* Expand an expression EXP that calls a built-in function,
> -   with result going to TARGET if that's convenient
> -   (and in mode MODE if that's convenient).
> -   SUBTARGET may be used as the target for computing one of EXP's operands.
> -   IGNORE is nonzero if the value is to be ignored.  */
> -
> +/* Implement TARGET_EXPAND_BUILTIN for general builtins.  */
>  rtx
> -arm_expand_builtin (tree exp,
> +arm_general_expand_builtin (unsigned int fcode,
> +			    tree exp,
>  		    rtx target,
> -		    rtx subtarget ATTRIBUTE_UNUSED,
> -		    machine_mode mode ATTRIBUTE_UNUSED,
>  		    int ignore ATTRIBUTE_UNUSED)
>  {
>    const struct builtin_description * d;
>    enum insn_code    icode;
> -  tree              fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
>    tree              arg0;
>    tree              arg1;
>    tree              arg2;
> @@ -3441,7 +3457,6 @@ arm_expand_builtin (tree exp,
>    rtx               op1;
>    rtx               op2;
>    rtx               pat;
> -  unsigned int      fcode = DECL_MD_FUNCTION_CODE (fndecl);
>    size_t            i;
>    machine_mode tmode;
>    machine_mode mode0;
> @@ -4052,6 +4067,31 @@ arm_expand_builtin (tree exp,
>    return NULL_RTX;
>  }
> 
> +/* Expand an expression EXP that calls a built-in function,
> +   with result going to TARGET if that's convenient
> +   (and in mode MODE if that's convenient).
> +   SUBTARGET may be used as the target for computing one of EXP's operands.
> +   IGNORE is nonzero if the value is to be ignored.  */
> +
> +rtx
> +arm_expand_builtin (tree exp,
> +		    rtx target,
> +		    rtx subtarget ATTRIBUTE_UNUSED,
> +		    machine_mode mode ATTRIBUTE_UNUSED,
> +		    int ignore ATTRIBUTE_UNUSED)
> +{
> +  tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
> +  unsigned int code = DECL_MD_FUNCTION_CODE (fndecl);
> +  unsigned int subcode = code >> ARM_BUILTIN_SHIFT;
> +  switch (code & ARM_BUILTIN_CLASS)
> +    {
> +    case ARM_BUILTIN_GENERAL:
> +      return arm_general_expand_builtin (subcode, exp, target, ignore);
> +    default:
> +      gcc_unreachable ();
> +    }
> +}
> +
>  void
>  arm_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
>  {
> @@ -4122,22 +4162,21 @@ arm_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
>  			    reload_fenv, restore_fnenv), update_call);
>  }
> 
> -/* Implement TARGET_CHECK_BUILTIN_CALL.  Record a read of the Q bit through
> -   intrinsics in the machine function.  */
> +/* Implement TARGET_CHECK_BUILTIN_CALL for general builtins.  Record a read of
> +   the Q bit through intrinsics in the machine function for general built-in
> +   functions.  */
>  bool
> -arm_check_builtin_call (location_t , vec<location_t> , tree fndecl,
> -			tree, unsigned int, tree *)
> +arm_general_check_builtin_call (unsigned int code)
>  {
> -  int fcode = DECL_MD_FUNCTION_CODE (fndecl);
> -  if (fcode == ARM_BUILTIN_saturation_occurred
> -      || fcode == ARM_BUILTIN_set_saturation)
> +  if (code == ARM_BUILTIN_saturation_occurred
> +     || code == ARM_BUILTIN_set_saturation)
>      {
>        if (cfun && cfun->decl)
>  	DECL_ATTRIBUTES (cfun->decl)
>  	  = tree_cons (get_identifier ("acle qbit"), NULL_TREE,
>  		       DECL_ATTRIBUTES (cfun->decl));
>      }
> -  if (fcode == ARM_BUILTIN_sel)
> +  else if (code == ARM_BUILTIN_sel)
>      {
>        if (cfun && cfun->decl)
>  	DECL_ATTRIBUTES (cfun->decl)
> @@ -4147,19 +4186,52 @@ arm_check_builtin_call (location_t , vec<location_t> , tree fndecl,
>    return true;
>  }
> 
> +/* Implement TARGET_CHECK_BUILTIN_CALL.  */
> +bool
> +arm_check_builtin_call (location_t, vec<location_t>, tree fndecl, tree,
> +			unsigned int, tree *)
> +{
> +  unsigned int code = DECL_MD_FUNCTION_CODE (fndecl);
> +  unsigned int subcode = code >> ARM_BUILTIN_SHIFT;
> +  switch (code & ARM_BUILTIN_CLASS)
> +    {
> +    case ARM_BUILTIN_GENERAL:
> +      return arm_general_check_builtin_call (subcode);
> +    default:
> +      gcc_unreachable ();
> +    }
> +
> +}
> +
>  enum resolver_ident
>  arm_describe_resolver (tree fndecl)
>  {
> -  if (DECL_MD_FUNCTION_CODE (fndecl) >= ARM_BUILTIN_vcx1qv16qi
> -    && DECL_MD_FUNCTION_CODE (fndecl) < ARM_BUILTIN_MVE_BASE)
> -    return arm_cde_resolver;
> -  return arm_no_resolver;
> +  unsigned int code = DECL_MD_FUNCTION_CODE (fndecl);
> +  unsigned int subcode = code >> ARM_BUILTIN_SHIFT;
> +  switch (code & ARM_BUILTIN_CLASS)
> +    {
> +    case ARM_BUILTIN_GENERAL:
> +      if (subcode >= ARM_BUILTIN_vcx1qv16qi
> +	&& subcode < ARM_BUILTIN_MVE_BASE)
> +	return arm_cde_resolver;
> +      return arm_no_resolver;
> +    default:
> +      gcc_unreachable ();
> +    }
>  }
> 
>  unsigned
>  arm_cde_end_args (tree fndecl)
>  {
> -  return DECL_MD_FUNCTION_CODE (fndecl) >= ARM_BUILTIN_vcx1q_p_v16qi ? 2 : 1;
> +  unsigned int code = DECL_MD_FUNCTION_CODE (fndecl);
> +  unsigned int subcode = code >> ARM_BUILTIN_SHIFT;
> +  switch (code & ARM_BUILTIN_CLASS)
> +    {
> +    case ARM_BUILTIN_GENERAL:
> +      return subcode >= ARM_BUILTIN_vcx1q_p_v16qi ? 2 : 1;
> +    default:
> +      gcc_unreachable ();
> +    }
>  }
> 
>  #include "gt-arm-builtins.h"
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index c8ae5e1e9c1..1bdbd3b8ab3 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -210,6 +210,22 @@ extern opt_machine_mode arm_get_mask_mode (machine_mode mode);
> 
>  #endif /* RTX_CODE */
> 
> +/* It's convenient to divide the built-in function codes into groups,
> +   rather than having everything in a single enum.  This type enumerates
> +   those groups.  */
> +enum arm_builtin_class
> +{
> +  ARM_BUILTIN_GENERAL
> +};
> +
> +/* Built-in function codes are structured so that the low
> +   ARM_BUILTIN_SHIFT bits contain the arm_builtin_class
> +   and the upper bits contain a group-specific subcode.  */
> +const unsigned int ARM_BUILTIN_SHIFT = 1;
> +
> +/* Mask that selects the arm part of a function code.  */
> +const unsigned int ARM_BUILTIN_CLASS = (1 << ARM_BUILTIN_SHIFT) - 1;
> +
>  /* MVE functions.  */
>  namespace arm_mve {
>    void handle_arm_mve_types_h ();
> --
> 2.34.1
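
To make the numberspace encoding concrete, here is a minimal sketch (not part of the patch; the helper names are invented for illustration) mirroring the encode step in arm_general_add_builtin_function and the decode step in arm_builtin_decl, using the ARM_BUILTIN_SHIFT, ARM_BUILTIN_CLASS and ARM_BUILTIN_GENERAL definitions added to arm-protos.h above:

  /* Encode: subcode in the high bits, class in the low bit.  */
  static unsigned int
  arm_general_encode_code (unsigned int subcode)
  {
    return (subcode << ARM_BUILTIN_SHIFT) | ARM_BUILTIN_GENERAL;
  }

  /* Decode: (code & ARM_BUILTIN_CLASS) recovers the arm_builtin_class,
     and the shift recovers the group-specific subcode.  */
  static unsigned int
  arm_general_decode_subcode (unsigned int code)
  {
    return code >> ARM_BUILTIN_SHIFT;
  }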


^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH 02/22] arm: [MVE intrinsics] Add new framework
  2023-04-18 13:45 ` [PATCH 02/22] arm: [MVE intrinsics] Add new framework Christophe Lyon
@ 2023-05-02 10:17   ` Kyrylo Tkachov
  0 siblings, 0 replies; 55+ messages in thread
From: Kyrylo Tkachov @ 2023-05-02 10:17 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Earnshaw, Richard Sandiford
  Cc: Christophe Lyon



> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@arm.com>
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
> Subject: [PATCH 02/22] arm: [MVE intrinsics] Add new framework
> 
> This patch introduces the new MVE intrinsics framework, heavily
> inspired by the SVE one in the aarch64 port.
> 
> Like the MVE intrinsic types implementation, the intrinsics framework
> defines functions via a new pragma in arm_mve.h. A boolean parameter
> is used to pass true when __ARM_MVE_PRESERVE_USER_NAMESPACE is
> defined, and false when it is not, allowing for non-prefixed intrinsic
> functions to be conditionally defined.
> 
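Concretely, judging from the pragma handler added to arm-c.cc further down in this patch, the header is expected to invoke the pragma along these lines (a sketch, not quoted from the patch itself):

  #ifdef __ARM_MVE_PRESERVE_USER_NAMESPACE
  #pragma GCC arm "arm_mve.h" true
  #else
  #pragma GCC arm "arm_mve.h" false
  #endif
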
> Future patches will build on this framework by adding new intrinsic
> functions and adding the features needed to support them.
> 
> Differences compared to the aarch64/SVE port include:
> - when present, the predicate argument is the last one with MVE (the
>   first one with SVE)
> - when using merging predicates ("_m" suffix), the "inactive" argument
>   (if any) is inserted in the first position
> - when using merging predicates ("_m" suffix), some functions do not
>   have the "inactive" argument, so we maintain an exception-list
> - MVE intrinsics dealing with floating-point require the FP extension,
>   while SVE may support different extensions
> - regarding global state, MVE does not have any prefetch intrinsic, so
>   we do not need a flag for this
> - intrinsic names can be prefixed with "__arm", depending on whether
>   preserve_user_namespace is true or false
> - parse_signature: the maximum number of arguments is now a parameter,
>   which helps detect an overflow with a new assert.
> - suffixes and overloading can be controlled using
>   explicit_mode_suffix_p and skip_overload_p in addition to
>   explicit_type_suffix_p
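
The first two points are visible in a merging intrinsic's prototype; for reference, a signature in the shape published by the MVE ACLE (shown for illustration only, not taken from this patch):

  /* "_m" (merging) form: "inactive" is inserted first and the
     predicate comes last; with SVE the predicate would come first.  */
  int32x4_t vaddq_m_s32 (int32x4_t inactive, int32x4_t a, int32x4_t b,
                         mve_pred16_t p);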

Ok.
Thanks,
Kyrill

> 
> At this implementation stage, there are some limitations compared
> to aarch64/SVE, which are removed later in the series:
> - "offset" mode is not supported yet
> - gimple folding is not implemented
> 
> 2022-09-08  Murray Steele  <murray.steele@arm.com>
> 	    Christophe Lyon  <christophe.lyon@arm.com>
> 
> gcc/ChangeLog:
> 
> 	* config.gcc: Add arm-mve-builtins-base.o and
> 	arm-mve-builtins-shapes.o to extra_objs.
> 	* config/arm/arm-builtins.cc (arm_builtin_decl): Handle MVE builtin
> 	numberspace.
> 	(arm_expand_builtin): Likewise
> 	(arm_check_builtin_call): Likewise
> 	(arm_describe_resolver): Likewise.
> 	* config/arm/arm-builtins.h (enum resolver_ident): Add
> 	arm_mve_resolver.
> 	* config/arm/arm-c.cc (arm_pragma_arm): Handle new pragma.
> 	(arm_resolve_overloaded_builtin): Handle MVE builtins.
> 	(arm_register_target_pragmas): Register arm_check_builtin_call.
> 	* config/arm/arm-mve-builtins.cc (class registered_function): New
> 	class.
> 	(struct registered_function_hasher): New struct.
> 	(pred_suffixes): New table.
> 	(mode_suffixes): New table.
> 	(type_suffix_info): New table.
> 	(TYPES_float16): New.
> 	(TYPES_all_float): New.
> 	(TYPES_integer_8): New.
> 	(TYPES_integer_8_16): New.
> 	(TYPES_integer_16_32): New.
> 	(TYPES_integer_32): New.
> 	(TYPES_signed_16_32): New.
> 	(TYPES_signed_32): New.
> 	(TYPES_all_signed): New.
> 	(TYPES_all_unsigned): New.
> 	(TYPES_all_integer): New.
> 	(TYPES_all_integer_with_64): New.
> 	(DEF_VECTOR_TYPE): New.
> 	(DEF_DOUBLE_TYPE): New.
> 	(DEF_MVE_TYPES_ARRAY): New.
> 	(all_integer): New.
> 	(all_integer_with_64): New.
> 	(float16): New.
> 	(all_float): New.
> 	(all_signed): New.
> 	(all_unsigned): New.
> 	(integer_8): New.
> 	(integer_8_16): New.
> 	(integer_16_32): New.
> 	(integer_32): New.
> 	(signed_16_32): New.
> 	(signed_32): New.
> 	(register_vector_type): Use void_type_node for mve.fp-only types when
> 	mve.fp is not enabled.
> 	(register_builtin_tuple_types): Likewise.
> 	(handle_arm_mve_h): New function.
> 	(matches_type_p): Likewise.
> 	(report_out_of_range): Likewise.
> 	(report_not_enum): Likewise.
> 	(report_missing_float): Likewise.
> 	(report_non_ice): Likewise.
> 	(check_requires_float): Likewise.
> 	(function_instance::hash): Likewise
> 	(function_instance::call_properties): Likewise.
> 	(function_instance::reads_global_state_p): Likewise.
> 	(function_instance::modifies_global_state_p): Likewise.
> 	(function_instance::could_trap_p): Likewise.
> 	(function_instance::has_inactive_argument): Likewise.
> 	(registered_function_hasher::hash): Likewise.
> 	(registered_function_hasher::equal): Likewise.
> 	(function_builder::function_builder): Likewise.
> 	(function_builder::~function_builder): Likewise.
> 	(function_builder::append_name): Likewise.
> 	(function_builder::finish_name): Likewise.
> 	(function_builder::get_name): Likewise.
> 	(add_attribute): Likewise.
> 	(function_builder::get_attributes): Likewise.
> 	(function_builder::add_function): Likewise.
> 	(function_builder::add_unique_function): Likewise.
> 	(function_builder::add_overloaded_function): Likewise.
> 	(function_builder::add_overloaded_functions): Likewise.
> 	(function_builder::register_function_group): Likewise.
> 	(function_call_info::function_call_info): Likewise.
> 	(function_resolver::function_resolver): Likewise.
> 	(function_resolver::get_vector_type): Likewise.
> 	(function_resolver::get_scalar_type_name): Likewise.
> 	(function_resolver::get_argument_type): Likewise.
> 	(function_resolver::scalar_argument_p): Likewise.
> 	(function_resolver::report_no_such_form): Likewise.
> 	(function_resolver::lookup_form): Likewise.
> 	(function_resolver::resolve_to): Likewise.
> 	(function_resolver::infer_vector_or_tuple_type): Likewise.
> 	(function_resolver::infer_vector_type): Likewise.
> 	(function_resolver::require_vector_or_scalar_type): Likewise.
> 	(function_resolver::require_vector_type): Likewise.
> 	(function_resolver::require_matching_vector_type): Likewise.
> 	(function_resolver::require_derived_vector_type): Likewise.
> 	(function_resolver::require_derived_scalar_type): Likewise.
> 	(function_resolver::require_integer_immediate): Likewise.
> 	(function_resolver::require_scalar_type): Likewise.
> 	(function_resolver::check_num_arguments): Likewise.
> 	(function_resolver::check_gp_argument): Likewise.
> 	(function_resolver::finish_opt_n_resolution): Likewise.
> 	(function_resolver::resolve_unary): Likewise.
> 	(function_resolver::resolve_unary_n): Likewise.
> 	(function_resolver::resolve_uniform): Likewise.
> 	(function_resolver::resolve_uniform_opt_n): Likewise.
> 	(function_resolver::resolve): Likewise.
> 	(function_checker::function_checker): Likewise.
> 	(function_checker::argument_exists_p): Likewise.
> 	(function_checker::require_immediate): Likewise.
> 	(function_checker::require_immediate_enum): Likewise.
> 	(function_checker::require_immediate_range): Likewise.
> 	(function_checker::check): Likewise.
> 	(gimple_folder::gimple_folder): Likewise.
> 	(gimple_folder::fold): Likewise.
> 	(function_expander::function_expander): Likewise.
> 	(function_expander::direct_optab_handler): Likewise.
> 	(function_expander::get_fallback_value): Likewise.
> 	(function_expander::get_reg_target): Likewise.
> 	(function_expander::add_output_operand): Likewise.
> 	(function_expander::add_input_operand): Likewise.
> 	(function_expander::add_integer_operand): Likewise.
> 	(function_expander::generate_insn): Likewise.
> 	(function_expander::use_exact_insn): Likewise.
> 	(function_expander::use_unpred_insn): Likewise.
> 	(function_expander::use_pred_x_insn): Likewise.
> 	(function_expander::use_cond_insn): Likewise.
> 	(function_expander::map_to_rtx_codes): Likewise.
> 	(function_expander::expand): Likewise.
> 	(resolve_overloaded_builtin): Likewise.
> 	(check_builtin_call): Likewise.
> 	(gimple_fold_builtin): Likewise.
> 	(expand_builtin): Likewise.
> 	(gt_ggc_mx): Likewise.
> 	(gt_pch_nx): Likewise.
> 	(gt_pch_nx): Likewise.
> 	* config/arm/arm-mve-builtins.def(s8): Define new type suffix.
> 	(s16): Likewise.
> 	(s32): Likewise.
> 	(s64): Likewise.
> 	(u8): Likewise.
> 	(u16): Likewise.
> 	(u32): Likewise.
> 	(u64): Likewise.
> 	(f16): Likewise.
> 	(f32): Likewise.
> 	(n): New mode.
> 	(offset): New mode.
> 	* config/arm/arm-mve-builtins.h (MAX_TUPLE_SIZE): New constant.
> 	(CP_READ_FPCR): Likewise.
> 	(CP_RAISE_FP_EXCEPTIONS): Likewise.
> 	(CP_READ_MEMORY): Likewise.
> 	(CP_WRITE_MEMORY): Likewise.
> 	(enum units_index): New enum.
> 	(enum predication_index): New.
> 	(enum type_class_index): New.
> 	(enum mode_suffix_index): New enum.
> 	(enum type_suffix_index): New.
> 	(struct mode_suffix_info): New struct.
> 	(struct type_suffix_info): New.
> 	(struct function_group_info): Likewise.
> 	(class function_instance): Likewise.
> 	(class registered_function): Likewise.
> 	(class function_builder): Likewise.
> 	(class function_call_info): Likewise.
> 	(class function_resolver): Likewise.
> 	(class function_checker): Likewise.
> 	(class gimple_folder): Likewise.
> 	(class function_expander): Likewise.
> 	(get_mve_pred16_t): Likewise.
> 	(find_mode_suffix): New function.
> 	(class function_base): Likewise.
> 	(class function_shape): Likewise.
> 	(function_instance::operator==): New function.
> 	(function_instance::operator!=): Likewise.
> 	(function_instance::vectors_per_tuple): Likewise.
> 	(function_instance::mode_suffix): Likewise.
> 	(function_instance::type_suffix): Likewise.
> 	(function_instance::scalar_type): Likewise.
> 	(function_instance::vector_type): Likewise.
> 	(function_instance::tuple_type): Likewise.
> 	(function_instance::vector_mode): Likewise.
> 	(function_call_info::function_returns_void_p): Likewise.
> 	(function_base::call_properties): Likewise.
> 	* config/arm/arm-protos.h (enum arm_builtin_class): Add
> 	ARM_BUILTIN_MVE.
> 	(handle_arm_mve_h): New.
> 	(resolve_overloaded_builtin): New.
> 	(check_builtin_call): New.
> 	(gimple_fold_builtin): New.
> 	(expand_builtin): New.
> 	* config/arm/arm.cc (TARGET_GIMPLE_FOLD_BUILTIN): Define as
> 	arm_gimple_fold_builtin.
> 	(arm_gimple_fold_builtin): New function.
> 	* config/arm/arm_mve.h: Use new arm_mve.h pragma.
> 	* config/arm/predicates.md (arm_any_register_operand): New
> predicate.
> 	* config/arm/t-arm: (arm-mve-builtins.o): Add includes.
> 	(arm-mve-builtins-shapes.o): New target.
> 	(arm-mve-builtins-base.o): New target.
> 	* config/arm/arm-mve-builtins-base.cc: New file.
> 	* config/arm/arm-mve-builtins-base.def: New file.
> 	* config/arm/arm-mve-builtins-base.h: New file.
> 	* config/arm/arm-mve-builtins-functions.h: New file.
> 	* config/arm/arm-mve-builtins-shapes.cc: New file.
> 	* config/arm/arm-mve-builtins-shapes.h: New file.
> 
> Co-authored-by: Christophe Lyon  <christophe.lyon@arm.com>
> ---
>  gcc/config.gcc                              |    2 +-
>  gcc/config/arm/arm-builtins.cc              |   15 +-
>  gcc/config/arm/arm-builtins.h               |    1 +
>  gcc/config/arm/arm-c.cc                     |   42 +-
>  gcc/config/arm/arm-mve-builtins-base.cc     |   45 +
>  gcc/config/arm/arm-mve-builtins-base.def    |   24 +
>  gcc/config/arm/arm-mve-builtins-base.h      |   29 +
>  gcc/config/arm/arm-mve-builtins-functions.h |   50 +
>  gcc/config/arm/arm-mve-builtins-shapes.cc   |  343 ++++
>  gcc/config/arm/arm-mve-builtins-shapes.h    |   30 +
>  gcc/config/arm/arm-mve-builtins.cc          | 1950 ++++++++++++++++++-
>  gcc/config/arm/arm-mve-builtins.def         |   40 +-
>  gcc/config/arm/arm-mve-builtins.h           |  669 ++++++-
>  gcc/config/arm/arm-protos.h                 |   10 +-
>  gcc/config/arm/arm.cc                       |   27 +
>  gcc/config/arm/arm_mve.h                    |    6 +
>  gcc/config/arm/predicates.md                |    4 +
>  gcc/config/arm/t-arm                        |   32 +-
>  18 files changed, 3292 insertions(+), 27 deletions(-)
>  create mode 100644 gcc/config/arm/arm-mve-builtins-base.cc
>  create mode 100644 gcc/config/arm/arm-mve-builtins-base.def
>  create mode 100644 gcc/config/arm/arm-mve-builtins-base.h
>  create mode 100644 gcc/config/arm/arm-mve-builtins-functions.h
>  create mode 100644 gcc/config/arm/arm-mve-builtins-shapes.cc
>  create mode 100644 gcc/config/arm/arm-mve-builtins-shapes.h
> 
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 6fd1594480a..5d49f5890ab 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -362,7 +362,7 @@ arc*-*-*)
>  	;;
>  arm*-*-*)
>  	cpu_type=arm
> -	extra_objs="arm-builtins.o arm-mve-builtins.o aarch-common.o aarch-bti-insert.o"
> +	extra_objs="arm-builtins.o arm-mve-builtins.o arm-mve-builtins-shapes.o arm-mve-builtins-base.o aarch-common.o aarch-bti-insert.o"
>  	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h arm_bf16.h arm_mve_types.h arm_mve.h arm_cde.h"
>  	target_type_format_char='%'
>  	c_target_objs="arm-c.o"
> diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
> index adcb50d2185..d0c57409b4c 100644
> --- a/gcc/config/arm/arm-builtins.cc
> +++ b/gcc/config/arm/arm-builtins.cc
> @@ -2712,6 +2712,7 @@ arm_general_builtin_decl (unsigned code)
>    return arm_builtin_decls[code];
>  }
> 
> +/* Implement TARGET_BUILTIN_DECL.  */
>  /* Return the ARM builtin for CODE.  */
>  tree
>  arm_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
> @@ -2721,6 +2722,8 @@ arm_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
>      {
>      case ARM_BUILTIN_GENERAL:
>        return arm_general_builtin_decl (subcode);
> +    case ARM_BUILTIN_MVE:
> +      return error_mark_node;
>      default:
>        gcc_unreachable ();
>      }
> @@ -4087,6 +4090,8 @@ arm_expand_builtin (tree exp,
>      {
>      case ARM_BUILTIN_GENERAL:
>        return arm_general_expand_builtin (subcode, exp, target, ignore);
> +    case ARM_BUILTIN_MVE:
> +      return arm_mve::expand_builtin (subcode, exp, target);
>      default:
>        gcc_unreachable ();
>      }
> @@ -4188,8 +4193,9 @@ arm_general_check_builtin_call (unsigned int code)
> 
>  /* Implement TARGET_CHECK_BUILTIN_CALL.  */
>  bool
> -arm_check_builtin_call (location_t, vec<location_t>, tree fndecl, tree,
> -			unsigned int, tree *)
> +arm_check_builtin_call (location_t loc, vec<location_t> arg_loc,
> +			tree fndecl, tree orig_fndecl,
> +			unsigned int nargs, tree *args)
>  {
>    unsigned int code = DECL_MD_FUNCTION_CODE (fndecl);
>    unsigned int subcode = code >> ARM_BUILTIN_SHIFT;
> @@ -4197,6 +4203,9 @@ arm_check_builtin_call (location_t, vec<location_t>, tree fndecl, tree,
>      {
>      case ARM_BUILTIN_GENERAL:
>        return arm_general_check_builtin_call (subcode);
> +    case ARM_BUILTIN_MVE:
> +      return arm_mve::check_builtin_call (loc, arg_loc, subcode,
> +					  orig_fndecl, nargs, args);
>      default:
>        gcc_unreachable ();
>      }
> @@ -4215,6 +4224,8 @@ arm_describe_resolver (tree fndecl)
>  	&& subcode < ARM_BUILTIN_MVE_BASE)
>  	return arm_cde_resolver;
>        return arm_no_resolver;
> +    case ARM_BUILTIN_MVE:
> +      return arm_mve_resolver;
>      default:
>        gcc_unreachable ();
>      }
> diff --git a/gcc/config/arm/arm-builtins.h b/gcc/config/arm/arm-builtins.h
> index 8c94b6bc40b..494dcd09411 100644
> --- a/gcc/config/arm/arm-builtins.h
> +++ b/gcc/config/arm/arm-builtins.h
> @@ -27,6 +27,7 @@
> 
>  enum resolver_ident {
>      arm_cde_resolver,
> +    arm_mve_resolver,
>      arm_no_resolver
>  };
>  enum resolver_ident arm_describe_resolver (tree);
> diff --git a/gcc/config/arm/arm-c.cc b/gcc/config/arm/arm-c.cc
> index 59c0d8ce747..d3d93ceba00 100644
> --- a/gcc/config/arm/arm-c.cc
> +++ b/gcc/config/arm/arm-c.cc
> @@ -144,20 +144,44 @@ arm_pragma_arm (cpp_reader *)
>    const char *name = TREE_STRING_POINTER (x);
>    if (strcmp (name, "arm_mve_types.h") == 0)
>      arm_mve::handle_arm_mve_types_h ();
> +  else if (strcmp (name, "arm_mve.h") == 0)
> +    {
> +      if (pragma_lex (&x) == CPP_NAME)
> +	{
> +	  if (strcmp (IDENTIFIER_POINTER (x), "true") == 0)
> +	    arm_mve::handle_arm_mve_h (true);
> +	  else if (strcmp (IDENTIFIER_POINTER (x), "false") == 0)
> +	    arm_mve::handle_arm_mve_h (false);
> +	  else
> +	    error ("%<#pragma GCC arm \"arm_mve.h\"%> requires a boolean parameter");
> +	}
> +    }
>    else
>      error ("unknown %<#pragma GCC arm%> option %qs", name);
>  }
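
(For readers following the series: the intent is that arm_mve.h itself
carries this pragma, so that including the header is what registers the
intrinsics.  A minimal sketch of the expected usage, where the guard
macro name is illustrative rather than taken from this patch:

    /* In arm_mve.h; illustrative only.  */
    #ifdef __ARM_MVE_PRESERVE_USER_NAMESPACE
    #pragma GCC arm "arm_mve.h" true   /* Only __arm_-prefixed names.  */
    #else
    #pragma GCC arm "arm_mve.h" false  /* Also unprefixed names.  */
    #endif
)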
> 
> -/* Implement TARGET_RESOLVE_OVERLOADED_BUILTIN.  This is currently only
> -   used for the MVE related builtins for the CDE extension.
> -   Here we ensure the type of arguments is such that the size is correct, and
> -   then return a tree that describes the same function call but with the
> -   relevant types cast as necessary.  */
> +/* Implement TARGET_RESOLVE_OVERLOADED_BUILTIN.  */
>  tree
> -arm_resolve_overloaded_builtin (location_t loc, tree fndecl, void *arglist)
> +arm_resolve_overloaded_builtin (location_t loc, tree fndecl,
> +				void *uncast_arglist)
>  {
> -  if (arm_describe_resolver (fndecl) == arm_cde_resolver)
> -    return arm_resolve_cde_builtin (loc, fndecl, arglist);
> +  enum resolver_ident resolver = arm_describe_resolver (fndecl);
> +  if (resolver == arm_cde_resolver)
> +    return arm_resolve_cde_builtin (loc, fndecl, uncast_arglist);
> +  if (resolver == arm_mve_resolver)
> +    {
> +      vec<tree, va_gc> empty = {};
> +      vec<tree, va_gc> *arglist = (uncast_arglist
> +				   ? (vec<tree, va_gc> *) uncast_arglist
> +				   : &empty);
> +      unsigned int code = DECL_MD_FUNCTION_CODE (fndecl);
> +      unsigned int subcode = code >> ARM_BUILTIN_SHIFT;
> +      tree new_fndecl
> +	= arm_mve::resolve_overloaded_builtin (loc, subcode, arglist);
> +      if (new_fndecl == NULL_TREE || new_fndecl == error_mark_node)
> +	return new_fndecl;
> +      return build_function_call_vec (loc, vNULL, new_fndecl, arglist,
> +				      NULL, fndecl);
> +    }
>    return NULL_TREE;
>  }
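
(Once an intrinsic is converted to the new framework in later patches,
this hook is what turns an overloaded call into a call to the unique
function.  Roughly, taking vaddq as an illustration of an intrinsic
converted later in the series:

    int8x16_t a, b;
    int8x16_t r = vaddq (a, b);
    /* arm_mve::resolve_overloaded_builtin resolves the overloaded decl
       to the unique decl __arm_vaddq_s8, and build_function_call_vec
       above rebuilds the call against that decl.  */
)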
> 
> @@ -519,7 +543,9 @@ arm_register_target_pragmas (void)
>  {
>    /* Update pragma hook to allow parsing #pragma GCC target.  */
>    targetm.target_option.pragma_parse = arm_pragma_target_parse;
> +
>    targetm.resolve_overloaded_builtin = arm_resolve_overloaded_builtin;
> +  targetm.check_builtin_call = arm_check_builtin_call;
> 
>    c_register_pragma ("GCC", "arm", arm_pragma_arm);
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
> new file mode 100644
> index 00000000000..e9f285faf2b
> --- /dev/null
> +++ b/gcc/config/arm/arm-mve-builtins-base.cc
> @@ -0,0 +1,45 @@
> +/* ACLE support for Arm MVE (__ARM_FEATURE_MVE intrinsics)
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tm.h"
> +#include "tree.h"
> +#include "rtl.h"
> +#include "memmodel.h"
> +#include "insn-codes.h"
> +#include "optabs.h"
> +#include "basic-block.h"
> +#include "function.h"
> +#include "gimple.h"
> +#include "arm-mve-builtins.h"
> +#include "arm-mve-builtins-shapes.h"
> +#include "arm-mve-builtins-base.h"
> +#include "arm-mve-builtins-functions.h"
> +
> +using namespace arm_mve;
> +
> +namespace {
> +
> +} /* end anonymous namespace */
> +
> +namespace arm_mve {
> +
> +} /* end namespace arm_mve */
> diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
> new file mode 100644
> index 00000000000..d15ba2e23e8
> --- /dev/null
> +++ b/gcc/config/arm/arm-mve-builtins-base.def
> @@ -0,0 +1,24 @@
> +/* ACLE support for Arm MVE (__ARM_FEATURE_MVE intrinsics)
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#define REQUIRES_FLOAT false
> +#undef REQUIRES_FLOAT
> +
> +#define REQUIRES_FLOAT true
> +#undef REQUIRES_FLOAT
> diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h
> new file mode 100644
> index 00000000000..c4d7b750cd5
> --- /dev/null
> +++ b/gcc/config/arm/arm-mve-builtins-base.h
> @@ -0,0 +1,29 @@
> +/* ACLE support for Arm MVE (__ARM_FEATURE_MVE intrinsics)
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_ARM_MVE_BUILTINS_BASE_H
> +#define GCC_ARM_MVE_BUILTINS_BASE_H
> +
> +namespace arm_mve {
> +namespace functions {
> +
> +} /* end namespace arm_mve::functions */
> +} /* end namespace arm_mve */
> +
> +#endif
> diff --git a/gcc/config/arm/arm-mve-builtins-functions.h b/gcc/config/arm/arm-mve-builtins-functions.h
> new file mode 100644
> index 00000000000..dff01999bcd
> --- /dev/null
> +++ b/gcc/config/arm/arm-mve-builtins-functions.h
> @@ -0,0 +1,50 @@
> +/* ACLE support for Arm MVE (function_base classes)
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_ARM_MVE_BUILTINS_FUNCTIONS_H
> +#define GCC_ARM_MVE_BUILTINS_FUNCTIONS_H
> +
> +namespace arm_mve {
> +
> +/* Wrap T, which is derived from function_base, and indicate that the
> +   function never has side effects.  It is only necessary to use this
> +   wrapper on functions that might have floating-point suffixes, since
> +   otherwise we assume by default that the function has no side effects.  */
> +template<typename T>
> +class quiet : public T
> +{
> +public:
> +  CONSTEXPR quiet () : T () {}
> +
> +  unsigned int
> +  call_properties (const function_instance &) const override
> +  {
> +    return 0;
> +  }
> +};
> +
> +} /* end namespace arm_mve */
> +
> +/* Declare the global function base NAME, creating it from an instance
> +   of class CLASS with constructor arguments ARGS.  */
> +#define FUNCTION(NAME, CLASS, ARGS) \
> +  namespace { static CONSTEXPR const CLASS NAME##_obj ARGS; } \
> +  namespace functions { const function_base *const NAME = &NAME##_obj; }
> +
> +#endif
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
> new file mode 100644
> index 00000000000..f20660d8319
> --- /dev/null
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
> @@ -0,0 +1,343 @@
> +/* ACLE support for Arm MVE (function shapes)
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tm.h"
> +#include "tree.h"
> +#include "rtl.h"
> +#include "memmodel.h"
> +#include "insn-codes.h"
> +#include "optabs.h"
> +#include "arm-mve-builtins.h"
> +#include "arm-mve-builtins-shapes.h"
> +
> +/* In the comments below, _t0 represents the first type suffix
> +   (e.g. "_s8") and _t1 represents the second.  T0/T1 represent the
> +   full type names (e.g. int8x16_t).  Square brackets enclose
> +   characters that are present only in the full name, not the
> +   overloaded name.  Governing predicate arguments and predicate
> +   suffixes are not shown, since they depend on the predication type,
> +   which is a separate piece of information from the shape.  */
> +
> +namespace arm_mve {
> +
> +/* If INSTANCE has a predicate, add it to the list of argument types
> +   in ARGUMENT_TYPES.  RETURN_TYPE is the type returned by the
> +   function.  */
> +static void
> +apply_predication (const function_instance &instance, tree return_type,
> +		   vec<tree> &argument_types)
> +{
> +  if (instance.pred != PRED_none)
> +    {
> +      /* When predicate is PRED_m, insert a first argument
> +	 ("inactive") with the same type as return_type.  */
> +      if (instance.has_inactive_argument ())
> +	argument_types.quick_insert (0, return_type);
> +      argument_types.quick_push (get_mve_pred16_t ());
> +    }
> +}
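
(To make apply_predication concrete: for a merging (_m) intrinsic such
as vaddq_m_s8, whose unpredicated signature is
int8x16_t (int8x16_t, int8x16_t), the resulting prototype is

    int8x16_t __arm_vaddq_m_s8 (int8x16_t inactive,
                                int8x16_t a, int8x16_t b,
                                mve_pred16_t p);

i.e. the "inactive" argument is inserted first and the predicate is
appended last.  vaddq itself is only converted later in the series.)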
> +
> +/* Parse and move past an element type in FORMAT and return it as a type
> +   suffix.  The format is:
> +
> +   [01]    - the element type in type suffix 0 or 1 of INSTANCE.
> +   h<elt>  - a half-sized version of <elt>
> +   s<bits> - a signed type with the given number of bits
> +   s[01]   - a signed type with the same width as type suffix 0 or 1
> +   u<bits> - an unsigned type with the given number of bits
> +   u[01]   - an unsigned type with the same width as type suffix 0 or 1
> +   w<elt>  - a double-sized version of <elt>
> +   x<bits> - a type with the given number of bits and same signedness
> +             as the next argument.
> +
> +   Future intrinsics will extend this format.  */
> +static type_suffix_index
> +parse_element_type (const function_instance &instance, const char *&format)
> +{
> +  int ch = *format++;
> +
> +  if (ch == 's' || ch == 'u')
> +    {
> +      type_class_index tclass = (ch == 'f' ? TYPE_float
> +				 : ch == 's' ? TYPE_signed
> +				 : TYPE_unsigned);
> +      char *end;
> +      unsigned int bits = strtol (format, &end, 10);
> +      format = end;
> +      if (bits == 0 || bits == 1)
> +	bits = instance.type_suffix (bits).element_bits;
> +      return find_type_suffix (tclass, bits);
> +    }
> +
> +  if (ch == 'h')
> +    {
> +      type_suffix_index suffix = parse_element_type (instance, format);
> +      return find_type_suffix (type_suffixes[suffix].tclass,
> +			       type_suffixes[suffix].element_bits / 2);
> +    }
> +
> +  if (ch == 'w')
> +    {
> +      type_suffix_index suffix = parse_element_type (instance, format);
> +      return find_type_suffix (type_suffixes[suffix].tclass,
> +			       type_suffixes[suffix].element_bits * 2);
> +    }
> +
> +  if (ch == 'x')
> +    {
> +      const char *next = format;
> +      next = strstr (format, ",");
> +      next += 2;
> +      type_suffix_index suffix = parse_element_type (instance, next);
> +      type_class_index tclass = type_suffixes[suffix].tclass;
> +      char *end;
> +      unsigned int bits = strtol (format, &end, 10);
> +      format = end;
> +      return find_type_suffix (tclass, bits);
> +    }
> +
> +  if (ch == '0' || ch == '1')
> +    return instance.type_suffix_ids[ch - '0'];
> +
> +  gcc_unreachable ();
> +}
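
(Worked examples, for an instance whose type suffix 0 is _s16:

    "0"  -> _s16  (type suffix 0 itself)
    "h0" -> _s8   (half width, same type class)
    "w0" -> _s32  (double width, same type class)
    "u0" -> _u16  (unsigned, same width as type suffix 0)
    "s8" -> _s8   (fixed, regardless of the instance)
)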
> +
> +/* Read and return a type from FORMAT for function INSTANCE.  Advance
> +   FORMAT beyond the type string.  The format is:
> +
> +   p       - predicates with type mve_pred16_t
> +   s<elt>  - a scalar type with the given element suffix
> +   t<elt>  - a vector or tuple type with given element suffix [*1]
> +   v<elt>  - a vector with the given element suffix
> +
> +   where <elt> has the format described above parse_element_type.
> +
> +   Future intrinsics will extend this format.
> +
> +   [*1] the vectors_per_tuple function indicates whether the type should
> +        be a tuple, and if so, how many vectors it should contain.  */
> +static tree
> +parse_type (const function_instance &instance, const char *&format)
> +{
> +  int ch = *format++;
> +
> +  if (ch == 'p')
> +    return get_mve_pred16_t ();
> +
> +  if (ch == 's')
> +    {
> +      type_suffix_index suffix = parse_element_type (instance, format);
> +      return scalar_types[type_suffixes[suffix].vector_type];
> +    }
> +
> +  if (ch == 't')
> +    {
> +      type_suffix_index suffix = parse_element_type (instance, format);
> +      vector_type_index vector_type = type_suffixes[suffix].vector_type;
> +      unsigned int num_vectors = instance.vectors_per_tuple ();
> +      /* x2 and x4 tuples are stored at indices 1 and 2, matching
> +	 register_builtin_tuple_types.  */
> +      return acle_vector_types[num_vectors >> 1][vector_type];
> +    }
> +
> +  if (ch == 'v')
> +    {
> +      type_suffix_index suffix = parse_element_type (instance, format);
> +      return acle_vector_types[0][type_suffixes[suffix].vector_type];
> +    }
> +
> +  gcc_unreachable ();
> +}
> +
> +/* Read a type signature for INSTANCE from FORMAT.  Add the argument
> +   types to ARGUMENT_TYPES and return the return type.  Assert there
> +   are no more than MAX_ARGS arguments.
> +
> +   The format is a comma-separated list of types (as for parse_type),
> +   with the first type being the return type and the rest being the
> +   argument types.  */
> +static tree
> +parse_signature (const function_instance &instance, const char *format,
> +		 vec<tree> &argument_types, unsigned int max_args)
> +{
> +  tree return_type = parse_type (instance, format);
> +  unsigned int args = 0;
> +  while (format[0] == ',')
> +    {
> +      gcc_assert (args < max_args);
> +      format += 1;
> +      tree argument_type = parse_type (instance, format);
> +      argument_types.quick_push (argument_type);
> +      args += 1;
> +    }
> +  gcc_assert (format[0] == 0);
> +  return return_type;
> +}
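
(Putting parse_type and parse_signature together: a plain binary shape
would use a signature string like "v0,v0,v0", meaning that the return
type and both arguments are vectors with the instance's first type
suffix.  For a _s8 instance this parses to

    int8x16_t (int8x16_t, int8x16_t)

before apply_predication adds the predicated variants.  The actual
signature strings only appear with the shapes in later patches of the
series; "v0,v0,v0" here is illustrative.)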
> +
> +/* Add one function instance for GROUP, using mode suffix
> MODE_SUFFIX_ID,
> +   the type suffixes at index TI and the predication suffix at index PI.
> +   The other arguments are as for build_all.  */
> +static void
> +build_one (function_builder &b, const char *signature,
> +	   const function_group_info &group, mode_suffix_index mode_suffix_id,
> +	   unsigned int ti, unsigned int pi, bool preserve_user_namespace,
> +	   bool force_direct_overloads)
> +{
> +  /* Current functions take at most five arguments.  This must match
> +     the MAX_ARGS value passed to parse_signature below.  */
> +  auto_vec<tree, 5> argument_types;
> +  function_instance instance (group.base_name, *group.base, *group.shape,
> +			      mode_suffix_id, group.types[ti],
> +			      group.preds[pi]);
> +  tree return_type = parse_signature (instance, signature, argument_types, 5);
> +  apply_predication (instance, return_type, argument_types);
> +  b.add_unique_function (instance, return_type, argument_types,
> +			 preserve_user_namespace, group.requires_float,
> +			 force_direct_overloads);
> +}
> +
> +/* Add a function instance for every type and predicate combination in
> +   GROUP, except if requested to use only the predicates listed in
> +   RESTRICT_TO_PREDS.  Take the function base name from GROUP and the
> +   mode suffix from MODE_SUFFIX_ID. Use SIGNATURE to construct the
> +   function signature, then use apply_predication to add in the
> +   predicate.  */
> +static void
> +build_all (function_builder &b, const char *signature,
> +	   const function_group_info &group, mode_suffix_index mode_suffix_id,
> +	   bool preserve_user_namespace,
> +	   bool force_direct_overloads = false,
> +	   const predication_index *restrict_to_preds = NULL)
> +{
> +  for (unsigned int pi = 0; group.preds[pi] != NUM_PREDS; ++pi)
> +    {
> +      unsigned int pi2 = 0;
> +
> +      if (restrict_to_preds)
> +	for (; restrict_to_preds[pi2] != NUM_PREDS; ++pi2)
> +	  if (restrict_to_preds[pi2] == group.preds[pi])
> +	    break;
> +
> +      if (restrict_to_preds == NULL || restrict_to_preds[pi2] != NUM_PREDS)
> +	for (unsigned int ti = 0;
> +	     ti == 0 || group.types[ti][0] != NUM_TYPE_SUFFIXES; ++ti)
> +	  build_one (b, signature, group, mode_suffix_id, ti, pi,
> +		     preserve_user_namespace, force_direct_overloads);
> +    }
> +}
> +
> +/* Add a function instance for every type and predicate combination in
> +   GROUP, except if requested to use only the predicates listed in
> +   RESTRICT_TO_PREDS, and only for 16-bit and 32-bit integers.  Take
> +   the function base name from GROUP and the mode suffix from
> +   MODE_SUFFIX_ID. Use SIGNATURE to construct the function signature,
> +   then use apply_predication to add in the predicate.  */
> +static void
> +build_16_32 (function_builder &b, const char *signature,
> +	     const function_group_info &group, mode_suffix_index mode_suffix_id,
> +	     bool preserve_user_namespace,
> +	     bool force_direct_overloads = false,
> +	     const predication_index *restrict_to_preds = NULL)
> +{
> +  for (unsigned int pi = 0; group.preds[pi] != NUM_PREDS; ++pi)
> +    {
> +      unsigned int pi2 = 0;
> +
> +      if (restrict_to_preds)
> +	for (; restrict_to_preds[pi2] != NUM_PREDS; ++pi2)
> +	  if (restrict_to_preds[pi2] == group.preds[pi])
> +	    break;
> +
> +      if (restrict_to_preds == NULL || restrict_to_preds[pi2] != NUM_PREDS)
> +	for (unsigned int ti = 0;
> +	     ti == 0 || group.types[ti][0] != NUM_TYPE_SUFFIXES; ++ti)
> +	  {
> +	    unsigned int element_bits
> +	      = type_suffixes[group.types[ti][0]].element_bits;
> +	    type_class_index tclass = type_suffixes[group.types[ti][0]].tclass;
> +	    if ((tclass == TYPE_signed || tclass == TYPE_unsigned)
> +		&& (element_bits == 16 || element_bits == 32))
> +	      build_one (b, signature, group, mode_suffix_id, ti, pi,
> +			 preserve_user_namespace, force_direct_overloads);
> +	  }
> +    }
> +}
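
(build_all and build_16_32 duplicate the restrict_to_preds scan; if
more width-restricted builders appear later in the series, it might be
worth factoring the test out, along these lines:

    /* Sketch only: return true if PRED is in the NUM_PREDS-terminated
       list RESTRICT_TO_PREDS, or if there is no restriction.  */
    static bool
    pred_allowed_p (predication_index pred,
                    const predication_index *restrict_to_preds)
    {
      if (!restrict_to_preds)
        return true;
      for (unsigned int i = 0; restrict_to_preds[i] != NUM_PREDS; ++i)
        if (restrict_to_preds[i] == pred)
          return true;
      return false;
    }
)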
> +
> +/* Declare the function shape NAME, pointing it to an instance
> +   of class <NAME>_def.  */
> +#define SHAPE(NAME) \
> +  static CONSTEXPR const NAME##_def NAME##_obj; \
> +  namespace shapes { const function_shape *const NAME = &NAME##_obj; }
> +
> +/* Base class for functions that are not overloaded.  */
> +struct nonoverloaded_base : public function_shape
> +{
> +  bool
> +  explicit_type_suffix_p (unsigned int, enum predication_index,
> +			  enum mode_suffix_index) const override
> +  {
> +    return true;
> +  }
> +
> +  bool
> +  explicit_mode_suffix_p (enum predication_index,
> +			  enum mode_suffix_index) const override
> +  {
> +    return true;
> +  }
> +
> +  bool
> +  skip_overload_p (enum predication_index,
> +		   enum mode_suffix_index) const override
> +  {
> +    return false;
> +  }
> +
> +  tree
> +  resolve (function_resolver &) const override
> +  {
> +    gcc_unreachable ();
> +  }
> +};
> +
> +/* Base class for overloaded functions.  Bit N of EXPLICIT_MASK is true
> +   if type suffix N appears in the overloaded name.  */
> +template<unsigned int EXPLICIT_MASK>
> +struct overloaded_base : public function_shape
> +{
> +  bool
> +  explicit_type_suffix_p (unsigned int i, enum predication_index,
> +			  enum mode_suffix_index) const override
> +  {
> +    return (EXPLICIT_MASK >> i) & 1;
> +  }
> +
> +  bool
> +  explicit_mode_suffix_p (enum predication_index,
> +			  enum mode_suffix_index) const override
> +  {
> +    return false;
> +  }
> +
> +  bool
> +  skip_overload_p (enum predication_index,
> +		   enum mode_suffix_index) const override
> +  {
> +    return false;
> +  }
> +};
> +
> +} /* end namespace arm_mve */
> +
> +#undef SHAPE
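
(To show how these pieces are meant to compose, here is a sketch of
what a shape definition looks like once later patches in the series
start adding them.  The signature string and the resolver body are
illustrative, not part of this patch:

    /* <T0>_t vfooq[_t0](<T0>_t, <T0>_t).  Sketch only.  */
    struct binary_def : public overloaded_base<0>
    {
      void
      build (function_builder &b, const function_group_info &group,
             bool preserve_user_namespace) const override
      {
        b.add_overloaded_functions (group, MODE_none,
                                    preserve_user_namespace);
        build_all (b, "v0,v0,v0", group, MODE_none,
                   preserve_user_namespace);
      }

      tree
      resolve (function_resolver &r) const override
      {
        /* Take the element type from argument 0, require argument 1
           to match, then look up the unique function.  */
        type_suffix_index type = r.infer_vector_type (0);
        if (type == NUM_TYPE_SUFFIXES
            || !r.require_vector_type (1, type_suffixes[type].vector_type))
          return error_mark_node;
        return r.resolve_to (r.mode_suffix_id, type);
      }
    };
    SHAPE (binary)
)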
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
> new file mode 100644
> index 00000000000..9e353b85a76
> --- /dev/null
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.h
> @@ -0,0 +1,30 @@
> +/* ACLE support for Arm MVE (function shapes)
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_ARM_MVE_BUILTINS_SHAPES_H
> +#define GCC_ARM_MVE_BUILTINS_SHAPES_H
> +
> +namespace arm_mve
> +{
> +  namespace shapes
> +  {
> +  } /* end namespace arm_mve::shapes */
> +} /* end namespace arm_mve */
> +
> +#endif
> diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc
> index 7586a82e3c1..b0cceb75ceb 100644
> --- a/gcc/config/arm/arm-mve-builtins.cc
> +++ b/gcc/config/arm/arm-mve-builtins.cc
> @@ -24,7 +24,19 @@
>  #include "coretypes.h"
>  #include "tm.h"
>  #include "tree.h"
> +#include "rtl.h"
> +#include "tm_p.h"
> +#include "memmodel.h"
> +#include "insn-codes.h"
> +#include "optabs.h"
> +#include "recog.h"
> +#include "expr.h"
> +#include "basic-block.h"
> +#include "function.h"
>  #include "fold-const.h"
> +#include "gimple.h"
> +#include "gimple-iterator.h"
> +#include "emit-rtl.h"
>  #include "langhooks.h"
>  #include "stringpool.h"
>  #include "attribs.h"
> @@ -32,6 +44,8 @@
>  #include "arm-protos.h"
>  #include "arm-builtins.h"
>  #include "arm-mve-builtins.h"
> +#include "arm-mve-builtins-base.h"
> +#include "arm-mve-builtins-shapes.h"
> 
>  namespace arm_mve {
> 
> @@ -46,6 +60,33 @@ struct vector_type_info
>    const bool requires_float;
>  };
> 
> +/* Describes a function decl.  */
> +class GTY(()) registered_function
> +{
> +public:
> +  /* The ACLE function that the decl represents.  */
> +  function_instance instance GTY ((skip));
> +
> +  /* The decl itself.  */
> +  tree decl;
> +
> +  /* Whether the function requires a floating-point ABI.  */
> +  bool requires_float;
> +
> +  /* True if the decl represents an overloaded function that needs to be
> +     resolved by function_resolver.  */
> +  bool overloaded_p;
> +};
> +
> +/* Hash traits for registered_function.  */
> +struct registered_function_hasher : nofree_ptr_hash <registered_function>
> +{
> +  typedef function_instance compare_type;
> +
> +  static hashval_t hash (value_type);
> +  static bool equal (value_type, const compare_type &);
> +};
> +
>  /* Flag indicating whether the arm MVE types have been handled.  */
>  static bool handle_arm_mve_types_p;
> 
> @@ -54,11 +95,167 @@ static CONSTEXPR const vector_type_info vector_types[] = {
>  #define DEF_MVE_TYPE(ACLE_NAME, SCALAR_TYPE) \
>    { #ACLE_NAME, REQUIRES_FLOAT },
>  #include "arm-mve-builtins.def"
> -#undef DEF_MVE_TYPE
> +};
> +
> +/* The function name suffix associated with each predication type.  */
> +static const char *const pred_suffixes[NUM_PREDS + 1] = {
> +  "",
> +  "_m",
> +  "_p",
> +  "_x",
> +  "_z",
> +  ""
> +};
> +
> +/* Static information about each mode_suffix_index.  */
> +CONSTEXPR const mode_suffix_info mode_suffixes[] = {
> +#define VECTOR_TYPE_none NUM_VECTOR_TYPES
> +#define DEF_MVE_MODE(NAME, BASE, DISPLACEMENT, UNITS) \
> +  { "_" #NAME, VECTOR_TYPE_##BASE, VECTOR_TYPE_##DISPLACEMENT, UNITS_##UNITS },
> +#include "arm-mve-builtins.def"
> +#undef VECTOR_TYPE_none
> +  { "", NUM_VECTOR_TYPES, NUM_VECTOR_TYPES, UNITS_none }
> +};
> +
> +/* Static information about each type_suffix_index.  */
> +CONSTEXPR const type_suffix_info type_suffixes[NUM_TYPE_SUFFIXES + 1] = {
> +#define DEF_MVE_TYPE_SUFFIX(NAME, ACLE_TYPE, CLASS, BITS, MODE)	\
> +  { "_" #NAME, \
> +    VECTOR_TYPE_##ACLE_TYPE, \
> +    TYPE_##CLASS, \
> +    BITS, \
> +    BITS / BITS_PER_UNIT, \
> +    TYPE_##CLASS == TYPE_signed || TYPE_##CLASS == TYPE_unsigned, \
> +    TYPE_##CLASS == TYPE_unsigned, \
> +    TYPE_##CLASS == TYPE_float, \
> +    0, \
> +    MODE },
> +#include "arm-mve-builtins.def"
> +  { "", NUM_VECTOR_TYPES, TYPE_bool, 0, 0, false, false, false,
> +    0, VOIDmode }
> +};
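
(As an example of what this generates: a .def entry along the lines of
DEF_MVE_TYPE_SUFFIX (s8, int8x16_t, signed, 8, V16QImode), where the
exact entry is my assumption rather than quoted from the patch,
expands to the initializer

    { "_s8", VECTOR_TYPE_int8x16_t, TYPE_signed,
      8, 1,          /* 8 bits, 1 byte per element.  */
      true, false,   /* integer, not unsigned.  */
      false, 0,      /* not float, plus the literal 0 from the macro.  */
      V16QImode },
)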
> +
> +/* Define a TYPES_<combination> macro for each combination of type
> +   suffixes that an ACLE function can have, where <combination> is the
> +   name used in DEF_MVE_FUNCTION entries.
> +
> +   Use S (T) for single type suffix T and D (T1, T2) for a pair of type
> +   suffixes T1 and T2.  Use commas to separate the suffixes.
> +
> +   Although the order shouldn't matter, the convention is to sort the
> +   suffixes lexicographically after dividing suffixes into a type
> +   class ("b", "f", etc.) and a numerical bit count.  */
> +
> +/* _f16.  */
> +#define TYPES_float16(S, D) \
> +  S (f16)
> +
> +/* _f16 _f32.  */
> +#define TYPES_all_float(S, D) \
> +  S (f16), S (f32)
> +
> +/* _s8 _u8.  */
> +#define TYPES_integer_8(S, D) \
> +  S (s8), S (u8)
> +
> +/* _s8 _s16
> +   _u8 _u16.  */
> +#define TYPES_integer_8_16(S, D) \
> +  S (s8), S (s16), S (u8), S (u16)
> +
> +/* _s16 _s32
> +   _u16 _u32.  */
> +#define TYPES_integer_16_32(S, D)     \
> +  S (s16), S (s32),		      \
> +  S (u16), S (u32)
> +
> +/* _s16 _s32.  */
> +#define TYPES_signed_16_32(S, D) \
> +  S (s16), S (s32)
> +
> +/* _s8 _s16 _s32.  */
> +#define TYPES_all_signed(S, D) \
> +  S (s8), S (s16), S (s32)
> +
> +/* _u8 _u16 _u32.  */
> +#define TYPES_all_unsigned(S, D) \
> +  S (u8), S (u16), S (u32)
> +
> +/* _s8 _s16 _s32
> +   _u8 _u16 _u32.  */
> +#define TYPES_all_integer(S, D) \
> +  TYPES_all_signed (S, D), TYPES_all_unsigned (S, D)
> +
> +/* _s8 _s16 _s32 _s64
> +   _u8 _u16 _u32 _u64.  */
> +#define TYPES_all_integer_with_64(S, D) \
> +  TYPES_all_signed (S, D), S (s64), TYPES_all_unsigned (S, D), S (u64)
> +
> +/* _s32 _u32.  */
> +#define TYPES_integer_32(S, D) \
> +  S (s32), S (u32)
> +
> +/* _s32.  */
> +#define TYPES_signed_32(S, D) \
> +  S (s32)
> +
> +/* Describe a pair of type suffixes in which only the first is used.  */
> +#define DEF_VECTOR_TYPE(X) { TYPE_SUFFIX_ ## X, NUM_TYPE_SUFFIXES }
> +
> +/* Describe a pair of type suffixes in which both are used.  */
> +#define DEF_DOUBLE_TYPE(X, Y) { TYPE_SUFFIX_ ## X, TYPE_SUFFIX_ ## Y }
> +
> +/* Create an array that can be used in arm-mve-builtins.def to
> +   select the type suffixes in TYPES_<NAME>.  */
> +#define DEF_MVE_TYPES_ARRAY(NAME) \
> +  static const type_suffix_pair types_##NAME[] = { \
> +    TYPES_##NAME (DEF_VECTOR_TYPE, DEF_DOUBLE_TYPE), \
> +    { NUM_TYPE_SUFFIXES, NUM_TYPE_SUFFIXES } \
> +  }
> +
> +/* For functions that don't take any type suffixes.  */
> +static const type_suffix_pair types_none[] = {
> +  { NUM_TYPE_SUFFIXES, NUM_TYPE_SUFFIXES },
> +  { NUM_TYPE_SUFFIXES, NUM_TYPE_SUFFIXES }
> +};
> +
> +DEF_MVE_TYPES_ARRAY (all_integer);
> +DEF_MVE_TYPES_ARRAY (all_integer_with_64);
> +DEF_MVE_TYPES_ARRAY (float16);
> +DEF_MVE_TYPES_ARRAY (all_float);
> +DEF_MVE_TYPES_ARRAY (all_signed);
> +DEF_MVE_TYPES_ARRAY (all_unsigned);
> +DEF_MVE_TYPES_ARRAY (integer_8);
> +DEF_MVE_TYPES_ARRAY (integer_8_16);
> +DEF_MVE_TYPES_ARRAY (integer_16_32);
> +DEF_MVE_TYPES_ARRAY (integer_32);
> +DEF_MVE_TYPES_ARRAY (signed_16_32);
> +DEF_MVE_TYPES_ARRAY (signed_32);
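
(For instance, DEF_MVE_TYPES_ARRAY (all_signed) expands to:

    static const type_suffix_pair types_all_signed[] = {
      { TYPE_SUFFIX_s8, NUM_TYPE_SUFFIXES },
      { TYPE_SUFFIX_s16, NUM_TYPE_SUFFIXES },
      { TYPE_SUFFIX_s32, NUM_TYPE_SUFFIXES },
      { NUM_TYPE_SUFFIXES, NUM_TYPE_SUFFIXES }
    };
)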
> +
> +/* Used by functions that have no governing predicate.  */
> +static const predication_index preds_none[] = { PRED_none, NUM_PREDS };
> +
> +/* Used by functions that have the m (merging) predicated form, and in
> +   addition have an unpredicated form.  */
> +static const predication_index preds_m_or_none[] = {
> +  PRED_m, PRED_none, NUM_PREDS
> +};
> +
> +/* Used by functions that have the mx (merging and "don't care")
> +   predicated forms, and in addition have an unpredicated form.  */
> +static const predication_index preds_mx_or_none[] = {
> +  PRED_m, PRED_x, PRED_none, NUM_PREDS
> +};
> +
> +/* Used by functions that have the p predicated form, in addition to
> +   an unpredicated form.  */
> +static const predication_index preds_p_or_none[] = {
> +  PRED_p, PRED_none, NUM_PREDS
>  };
> 
>  /* The scalar type associated with each vector type.  */
> -GTY(()) tree scalar_types[NUM_VECTOR_TYPES];
> +extern GTY(()) tree scalar_types[NUM_VECTOR_TYPES];
> +tree scalar_types[NUM_VECTOR_TYPES];
> 
>  /* The single-predicate and single-vector types, with their built-in
>     "__simd128_..._t" name.  Allow an index of NUM_VECTOR_TYPES, which always
> @@ -66,7 +263,20 @@ GTY(()) tree scalar_types[NUM_VECTOR_TYPES];
>  static GTY(()) tree abi_vector_types[NUM_VECTOR_TYPES + 1];
> 
>  /* Same, but with the arm_mve.h names.  */
> -GTY(()) tree acle_vector_types[3][NUM_VECTOR_TYPES + 1];
> +extern GTY(()) tree acle_vector_types[MAX_TUPLE_SIZE][NUM_VECTOR_TYPES + 1];
> +tree acle_vector_types[MAX_TUPLE_SIZE][NUM_VECTOR_TYPES + 1];
> +
> +/* The list of all registered function decls, indexed by code.  */
> +static GTY(()) vec<registered_function *, va_gc> *registered_functions;
> +
> +/* All registered function decls, hashed on the function_instance
> +   that they implement.  This is used for looking up implementations of
> +   overloaded functions.  */
> +static hash_table<registered_function_hasher> *function_table;
> +
> +/* True if we've already complained about attempts to use functions
> +   when the required extension is disabled.  */
> +static bool reported_missing_float_p;
> 
>  /* Return the MVE abi type with element of type TYPE.  */
>  static tree
> @@ -87,7 +297,6 @@ register_builtin_types ()
>  #define DEF_MVE_TYPE(ACLE_NAME, SCALAR_TYPE) \
>    scalar_types[VECTOR_TYPE_ ## ACLE_NAME] = SCALAR_TYPE;
>  #include "arm-mve-builtins.def"
> -#undef DEF_MVE_TYPE
>    for (unsigned int i = 0; i < NUM_VECTOR_TYPES; ++i)
>      {
>        if (vector_types[i].requires_float && !TARGET_HAVE_MVE_FLOAT)
> @@ -113,8 +322,18 @@ register_builtin_types ()
>  static void
>  register_vector_type (vector_type_index type)
>  {
> +
> +  /* If the target does not have the mve.fp extension, but the type requires
> +     it, then it needs to be assigned a non-dummy type so that functions
> +     with those types in their signature can be registered.  This allows for
> +     diagnostics about the missing extension, rather than about a missing
> +     function definition.  */
>    if (vector_types[type].requires_float && !TARGET_HAVE_MVE_FLOAT)
> -    return;
> +    {
> +      acle_vector_types[0][type] = void_type_node;
> +      return;
> +    }
> +
>    tree vectype = abi_vector_types[type];
>    tree id = get_identifier (vector_types[type].acle_name);
>    tree decl = build_decl (input_location, TYPE_DECL, id, vectype);
> @@ -133,15 +352,26 @@ register_vector_type (vector_type_index type)
>    acle_vector_types[0][type] = vectype;
>  }
> 
> -/* Register tuple type TYPE with NUM_VECTORS arity under its
> -   arm_mve_types.h name.  */
> +/* Register tuple types of element type TYPE under their arm_mve_types.h
> +   names.  */
>  static void
>  register_builtin_tuple_types (vector_type_index type)
>  {
>    const vector_type_info* info = &vector_types[type];
> +
> +  /* If the target does not have the mve.fp extension, but the type requires
> +     it, then it needs to be assigned a non-dummy type so that functions
> +     with those types in their signature can be registered.  This allows for
> +     diagnostics about the missing extension, rather than about a missing
> +     function definition.  */
>    if (scalar_types[type] == boolean_type_node
>        || (info->requires_float && !TARGET_HAVE_MVE_FLOAT))
> +    {
> +      for (unsigned int num_vectors = 2; num_vectors <= 4; num_vectors += 2)
> +	acle_vector_types[num_vectors >> 1][type] = void_type_node;
>      return;
> +    }
> +
>    const char *vector_type_name = info->acle_name;
>    char buffer[sizeof ("float32x4x2_t")];
>    for (unsigned int num_vectors = 2; num_vectors <= 4; num_vectors += 2)
> @@ -189,8 +419,1710 @@ handle_arm_mve_types_h ()
>      }
>  }
> 
> -} /* end namespace arm_mve */
> +/* Implement #pragma GCC arm "arm_mve.h" <bool>.  */
> +void
> +handle_arm_mve_h (bool preserve_user_namespace)
> +{
> +  if (function_table)
> +    {
> +      error ("duplicate definition of %qs", "arm_mve.h");
> +      return;
> +    }
> 
> -using namespace arm_mve;
> +  /* Define MVE functions.  */
> +  function_table = new hash_table<registered_function_hasher> (1023);
> +}
> +
> +/* Return true if CANDIDATE is equivalent to MODEL_TYPE for overloading
> +   purposes.  */
> +static bool
> +matches_type_p (const_tree model_type, const_tree candidate)
> +{
> +  if (VECTOR_TYPE_P (model_type))
> +    {
> +      if (!VECTOR_TYPE_P (candidate)
> +	  || maybe_ne (TYPE_VECTOR_SUBPARTS (model_type),
> +		       TYPE_VECTOR_SUBPARTS (candidate))
> +	  || TYPE_MODE (model_type) != TYPE_MODE (candidate))
> +	return false;
> +
> +      model_type = TREE_TYPE (model_type);
> +      candidate = TREE_TYPE (candidate);
> +    }
> +  return (candidate != error_mark_node
> +	  && TYPE_MAIN_VARIANT (model_type) == TYPE_MAIN_VARIANT (candidate));
> +}
> +
> +/* Report an error against LOCATION that the user has tried to use
> +   a floating point function when the mve.fp extension is disabled.  */
> +static void
> +report_missing_float (location_t location, tree fndecl)
> +{
> +  /* Avoid reporting a slew of messages for a single oversight.  */
> +  if (reported_missing_float_p)
> +    return;
> +
> +  error_at (location, "ACLE function %qD requires ISA extension %qs",
> +	    fndecl, "mve.fp");
> +  inform (location, "you can enable mve.fp by using the command-line"
> +	  " option %<-march%>, or by using the %<target%>"
> +	  " attribute or pragma");
> +  reported_missing_float_p = true;
> +}
> +
> +/* Report that LOCATION has a call to FNDECL in which argument ARGNO
> +   was not an integer constant expression.  ARGNO counts from zero.  */
> +static void
> +report_non_ice (location_t location, tree fndecl, unsigned int argno)
> +{
> +  error_at (location, "argument %d of %qE must be an integer constant"
> +	    " expression", argno + 1, fndecl);
> +}
> +
> +/* Report that LOCATION has a call to FNDECL in which argument ARGNO has
> +   the value ACTUAL, whereas the function requires a value in the range
> +   [MIN, MAX].  ARGNO counts from zero.  */
> +static void
> +report_out_of_range (location_t location, tree fndecl, unsigned int argno,
> +		     HOST_WIDE_INT actual, HOST_WIDE_INT min,
> +		     HOST_WIDE_INT max)
> +{
> +  error_at (location, "passing %wd to argument %d of %qE, which expects"
> +	    " a value in the range [%wd, %wd]", actual, argno + 1, fndecl,
> +	    min, max);
> +}
> +
> +/* Report that LOCATION has a call to FNDECL in which argument ARGNO has
> +   the value ACTUAL, whereas the function requires a valid value of
> +   enum type ENUMTYPE.  ARGNO counts from zero.  */
> +static void
> +report_not_enum (location_t location, tree fndecl, unsigned int argno,
> +		 HOST_WIDE_INT actual, tree enumtype)
> +{
> +  error_at (location, "passing %wd to argument %d of %qE, which expects"
> +	    " a valid %qT value", actual, argno + 1, fndecl, enumtype);
> +}
> +
> +/* Check that the mve.fp extension is enabled when REQUIRES_FLOAT says
> +   that function FNDECL needs it.  Return true on success; otherwise
> +   report an error against LOCATION and return false.  */
> +static bool
> +check_requires_float (location_t location, tree fndecl,
> +		      bool requires_float)
> +{
> +  if (requires_float && !TARGET_HAVE_MVE_FLOAT)
> +    {
> +      report_missing_float (location, fndecl);
> +      return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Return a hash code for a function_instance.  */
> +hashval_t
> +function_instance::hash () const
> +{
> +  inchash::hash h;
> +  /* BASE uniquely determines BASE_NAME, so we don't need to hash both.  */
> +  h.add_ptr (base);
> +  h.add_ptr (shape);
> +  h.add_int (mode_suffix_id);
> +  h.add_int (type_suffix_ids[0]);
> +  h.add_int (type_suffix_ids[1]);
> +  h.add_int (pred);
> +  return h.end ();
> +}
> +
> +/* Return a set of CP_* flags that describe what the function could do,
> +   taking the command-line flags into account.  */
> +unsigned int
> +function_instance::call_properties () const
> +{
> +  unsigned int flags = base->call_properties (*this);
> +
> +  /* -fno-trapping-math means that we can assume any FP exceptions
> +     are not user-visible.  */
> +  if (!flag_trapping_math)
> +    flags &= ~CP_RAISE_FP_EXCEPTIONS;
> +
> +  return flags;
> +}
> +
> +/* Return true if calls to the function could read some form of
> +   global state.  */
> +bool
> +function_instance::reads_global_state_p () const
> +{
> +  unsigned int flags = call_properties ();
> +
> +  /* Preserve any dependence on rounding mode, flush to zero mode, etc.
> +     There is currently no way of turning this off; in particular,
> +     -fno-rounding-math (which is the default) means that we should make
> +     the usual assumptions about rounding mode, which for intrinsics means
> +     acting as the instructions do.  */
> +  if (flags & CP_READ_FPCR)
> +    return true;
> +
> +  return false;
> +}
> +
> +/* Return true if calls to the function could modify some form of
> +   global state.  */
> +bool
> +function_instance::modifies_global_state_p () const
> +{
> +  unsigned int flags = call_properties ();
> +
> +  /* Preserve any exception state written back to the FPCR,
> +     unless -fno-trapping-math says this is unnecessary.  */
> +  if (flags & CP_RAISE_FP_EXCEPTIONS)
> +    return true;
> +
> +  /* Handle direct modifications of global state.  */
> +  return flags & CP_WRITE_MEMORY;
> +}
> +
> +/* Return true if calls to the function could raise a signal.  */
> +bool
> +function_instance::could_trap_p () const
> +{
> +  unsigned int flags = call_properties ();
> +
> +  /* Handle functions that could raise SIGFPE.  */
> +  if (flags & CP_RAISE_FP_EXCEPTIONS)
> +    return true;
> +
> +  /* Handle functions that could raise SIGBUS or SIGSEGV.  */
> +  if (flags & (CP_READ_MEMORY | CP_WRITE_MEMORY))
> +    return true;
> +
> +  return false;
> +}
> +
> +/* Return true if the function has an implicit "inactive" argument.
> +   This is the case for most _m predicated functions, but not all;
> +   the list of exceptions will be updated as needed.  */
> +bool
> +function_instance::has_inactive_argument () const
> +{
> +  if (pred != PRED_m)
> +    return false;
> +
> +  return true;
> +}
> +
> +inline hashval_t
> +registered_function_hasher::hash (value_type value)
> +{
> +  return value->instance.hash ();
> +}
> +
> +inline bool
> +registered_function_hasher::equal (value_type value, const compare_type &key)
> +{
> +  return value->instance == key;
> +}
> +
> +function_builder::function_builder ()
> +{
> +  m_overload_type = build_function_type (void_type_node, void_list_node);
> +  m_direct_overloads = lang_GNU_CXX ();
> +  gcc_obstack_init (&m_string_obstack);
> +}
> +
> +function_builder::~function_builder ()
> +{
> +  obstack_free (&m_string_obstack, NULL);
> +}
> +
> +/* Add NAME to the end of the function name being built.  */
> +void
> +function_builder::append_name (const char *name)
> +{
> +  obstack_grow (&m_string_obstack, name, strlen (name));
> +}
> +
> +/* Zero-terminate and complete the function name being built.  */
> +char *
> +function_builder::finish_name ()
> +{
> +  obstack_1grow (&m_string_obstack, 0);
> +  return (char *) obstack_finish (&m_string_obstack);
> +}
> +
> +/* Return the overloaded or full function name for INSTANCE, with an
> +   optional prefix; PRESERVE_USER_NAMESPACE selects the prefix, and
> +   OVERLOADED_P selects between the overloaded and the full name.
> +   Allocate the string on m_string_obstack; the caller must use
> +   obstack_free to free it after use.  */
> +char *
> +function_builder::get_name (const function_instance &instance,
> +			    bool preserve_user_namespace,
> +			    bool overloaded_p)
> +{
> +  if (preserve_user_namespace)
> +    append_name ("__arm_");
> +  append_name (instance.base_name);
> +  append_name (pred_suffixes[instance.pred]);
> +  if (!overloaded_p
> +      || instance.shape->explicit_mode_suffix_p (instance.pred,
> +						 instance.mode_suffix_id))
> +    append_name (instance.mode_suffix ().string);
> +  for (unsigned int i = 0; i < 2; ++i)
> +    if (!overloaded_p
> +	|| instance.shape->explicit_type_suffix_p (i, instance.pred,
> +						   instance.mode_suffix_id))
> +      append_name (instance.type_suffix (i).string);
> +  return finish_name ();
> +}
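
(To pick an illustrative instance: vaddq with PRED_m and type suffix
_s8 gets the full name "__arm_vaddq_m_s8", while its overloaded name is
"__arm_vaddq_m", assuming the shape keeps the mode and type suffixes
implicit for overloads; with preserve_user_namespace false, the same
names are also generated without the "__arm_" prefix.)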
> +
> +/* Add attribute NAME to ATTRS.  */
> +static tree
> +add_attribute (const char *name, tree attrs)
> +{
> +  return tree_cons (get_identifier (name), NULL_TREE, attrs);
> +}
> +
> +/* Return the appropriate function attributes for INSTANCE.  */
> +tree
> +function_builder::get_attributes (const function_instance &instance)
> +{
> +  tree attrs = NULL_TREE;
> +
> +  if (!instance.modifies_global_state_p ())
> +    {
> +      if (instance.reads_global_state_p ())
> +	attrs = add_attribute ("pure", attrs);
> +      else
> +	attrs = add_attribute ("const", attrs);
> +    }
> +
> +  if (!flag_non_call_exceptions || !instance.could_trap_p ())
> +    attrs = add_attribute ("nothrow", attrs);
> +
> +  return add_attribute ("leaf", attrs);
> +}
> +
> +/* Add a function called NAME with type FNTYPE and attributes ATTRS.
> +   INSTANCE describes what the function does and OVERLOADED_P indicates
> +   whether it is overloaded.  REQUIRES_FLOAT indicates whether the function
> +   requires the mve.fp extension.  */
> +registered_function &
> +function_builder::add_function (const function_instance &instance,
> +				const char *name, tree fntype, tree attrs,
> +				bool requires_float,
> +				bool overloaded_p,
> +				bool placeholder_p)
> +{
> +  unsigned int code = vec_safe_length (registered_functions);
> +  code = (code << ARM_BUILTIN_SHIFT) | ARM_BUILTIN_MVE;
> +
> +  /* We need to be able to generate placeholders to ensure that we have a
> +     consistent numbering scheme for function codes between the C and C++
> +     frontends, so that everything ties up in LTO.
> +
> +     Currently, tree-streamer-in.cc:unpack_ts_function_decl_value_fields
> +     validates that tree nodes returned by TARGET_BUILTIN_DECL are
> +     non-NULL and some node other than error_mark_node.  This is a
> +     holdover from when builtin decls were streamed by code rather than
> +     by value.
> +
> +     Ultimately, we should be able to remove this validation of
> +     BUILT_IN_MD nodes and remove the target hook.  For now, however, we
> +     need to appease the validation and return a non-NULL,
> +     non-error_mark_node node, so we arbitrarily choose
> +     integer_zero_node.  */
> +  tree decl = placeholder_p
> +    ? integer_zero_node
> +    : simulate_builtin_function_decl (input_location, name, fntype,
> +				      code, NULL, attrs);
> +
> +  registered_function &rfn = *ggc_alloc <registered_function> ();
> +  rfn.instance = instance;
> +  rfn.decl = decl;
> +  rfn.requires_float = requires_float;
> +  rfn.overloaded_p = overloaded_p;
> +  vec_safe_push (registered_functions, &rfn);
> +
> +  return rfn;
> +}
> +
> +/* Add a built-in function for INSTANCE, with the argument types given
> +   by ARGUMENT_TYPES and the return type given by RETURN_TYPE.
> +   REQUIRES_FLOAT indicates whether the function requires the mve.fp
> +   extension, and PRESERVE_USER_NAMESPACE indicates whether the
> +   function should also be registered under its non-prefixed name.  */
> +void
> +function_builder::add_unique_function (const function_instance &instance,
> +				       tree return_type,
> +				       vec<tree> &argument_types,
> +				       bool preserve_user_namespace,
> +				       bool requires_float,
> +				       bool force_direct_overloads)
> +{
> +  /* Add the function under its full (unique) name with prefix.  */
> +  char *name = get_name (instance, true, false);
> +  tree fntype = build_function_type_array (return_type,
> +					   argument_types.length (),
> +					   argument_types.address ());
> +  tree attrs = get_attributes (instance);
> +  registered_function &rfn = add_function (instance, name, fntype, attrs,
> +					   requires_float, false, false);
> +
> +  /* Enter the function into the hash table.  */
> +  hashval_t hash = instance.hash ();
> +  registered_function **rfn_slot
> +    = function_table->find_slot_with_hash (instance, hash, INSERT);
> +  gcc_assert (!*rfn_slot);
> +  *rfn_slot = &rfn;
> +
> +  /* Also add the non-prefixed non-overloaded function, if the user
> +     namespace does not need to be preserved.  */
> +  if (!preserve_user_namespace)
> +    {
> +      char *noprefix_name = get_name (instance, false, false);
> +      tree attrs = get_attributes (instance);
> +      add_function (instance, noprefix_name, fntype, attrs, requires_float,
> +		    false, false);
> +    }
> +
> +  /* Also add the function under its overloaded alias, if we want
> +     a separate decl for each instance of an overloaded function.  */
> +  char *overload_name = get_name (instance, true, true);
> +  if (strcmp (name, overload_name) != 0)
> +    {
> +      /* Attribute lists shouldn't be shared.  */
> +      tree attrs = get_attributes (instance);
> +      bool placeholder_p = !(m_direct_overloads || force_direct_overloads);
> +      add_function (instance, overload_name, fntype, attrs,
> +		    requires_float, false, placeholder_p);
> +
> +      /* Also add the non-prefixed overloaded function, if the user namespace
> +	 does not need to be preserved.  */
> +      if (!preserve_user_namespace)
> +	{
> +	  char *noprefix_overload_name = get_name (instance, false, true);
> +	  tree attrs = get_attributes (instance);
> +	  add_function (instance, noprefix_overload_name, fntype, attrs,
> +			requires_float, false, placeholder_p);
> +	}
> +    }
> +
> +  obstack_free (&m_string_obstack, name);
> +}
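
(Net effect, continuing the illustration above: with
preserve_user_namespace false, a single instance can register up to
four decls, e.g. __arm_vaddq_m_s8 and vaddq_m_s8 for the unique
function plus __arm_vaddq_m and vaddq_m for the overloaded alias, and
only the unique prefixed decl is entered into function_table.)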
> +
> +/* Add one function decl for INSTANCE, to be used with manual overload
> +   resolution.  REQUIRES_FLOAT indicates whether the function requires the
> +   mve.fp extension.
> +
> +   For simplicity, partition functions by instance and required extensions,
> +   and check whether the required extensions are available as part of
> +   resolving the function to the relevant unique function.  */
> +void
> +function_builder::add_overloaded_function (const function_instance &instance,
> +					   bool preserve_user_namespace,
> +					   bool requires_float)
> +{
> +  char *name = get_name (instance, true, true);
> +  if (registered_function **map_value = m_overload_names.get (name))
> +    {
> +      gcc_assert ((*map_value)->instance == instance);
> +      obstack_free (&m_string_obstack, name);
> +    }
> +  else
> +    {
> +      registered_function &rfn
> +	= add_function (instance, name, m_overload_type, NULL_TREE,
> +			requires_float, true, m_direct_overloads);
> +      m_overload_names.put (name, &rfn);
> +      if (!preserve_user_namespace)
> +	{
> +	  char *noprefix_name = get_name (instance, false, true);
> +	  registered_function &noprefix_rfn
> +	    = add_function (instance, noprefix_name, m_overload_type,
> +			    NULL_TREE, requires_float, true,
> +			    m_direct_overloads);
> +	  m_overload_names.put (noprefix_name, &noprefix_rfn);
> +	}
> +    }
> +}
> +
> +/* If we are using manual overload resolution, add one function decl
> +   for each overloaded function in GROUP.  Take the function base name
> +   from GROUP and the mode from MODE.  */
> +void
> +function_builder::add_overloaded_functions (const function_group_info &group,
> +					    mode_suffix_index mode,
> +					    bool preserve_user_namespace)
> +{
> +  for (unsigned int pi = 0; group.preds[pi] != NUM_PREDS; ++pi)
> +    {
> +      unsigned int explicit_type0
> +	= (*group.shape)->explicit_type_suffix_p (0, group.preds[pi], mode);
> +      unsigned int explicit_type1
> +	= (*group.shape)->explicit_type_suffix_p (1, group.preds[pi], mode);
> +
> +      if ((*group.shape)->skip_overload_p (group.preds[pi], mode))
> +	continue;
> +
> +      if (!explicit_type0 && !explicit_type1)
> +	{
> +	  /* Deal with the common case in which there is one overloaded
> +	     function for all type combinations.  */
> +	  function_instance instance (group.base_name, *group.base,
> +				      *group.shape, mode, types_none[0],
> +				      group.preds[pi]);
> +	  add_overloaded_function (instance, preserve_user_namespace,
> +				   group.requires_float);
> +	}
> +      else
> +	for (unsigned int ti = 0; group.types[ti][0] != NUM_TYPE_SUFFIXES;
> +	     ++ti)
> +	  {
> +	    /* Stub out the types that are determined by overload
> +	       resolution.  */
> +	    type_suffix_pair types = {
> +	      explicit_type0 ? group.types[ti][0] : NUM_TYPE_SUFFIXES,
> +	      explicit_type1 ? group.types[ti][1] : NUM_TYPE_SUFFIXES
> +	    };
> +	    function_instance instance (group.base_name, *group.base,
> +					*group.shape, mode, types,
> +					group.preds[pi]);
> +	    add_overloaded_function (instance, preserve_user_namespace,
> +				     group.requires_float);
> +	  }
> +    }
> +}
> +
> +/* Register all the functions in GROUP.  */
> +void
> +function_builder::register_function_group (const function_group_info &group,
> +					   bool preserve_user_namespace)
> +{
> +  (*group.shape)->build (*this, group, preserve_user_namespace);
> +}
> +
> +function_call_info::function_call_info (location_t location_in,
> +					const function_instance &instance_in,
> +					tree fndecl_in)
> +  : function_instance (instance_in), location (location_in), fndecl (fndecl_in)
> +{
> +}
> +
> +function_resolver::function_resolver (location_t location,
> +				      const function_instance &instance,
> +				      tree fndecl, vec<tree, va_gc> &arglist)
> +  : function_call_info (location, instance, fndecl), m_arglist (arglist)
> +{
> +}
> +
> +/* Return the vector type associated with type suffix TYPE.  */
> +tree
> +function_resolver::get_vector_type (type_suffix_index type)
> +{
> +  return acle_vector_types[0][type_suffixes[type].vector_type];
> +}
> +
> +/* Return the <stdint.h> name associated with TYPE.  Using the <stdint.h>
> +   name should be more user-friendly than the underlying canonical type,
> +   since it makes the signedness and bitwidth explicit.  */
> +const char *
> +function_resolver::get_scalar_type_name (type_suffix_index type)
> +{
> +  return vector_types[type_suffixes[type].vector_type].acle_name + 2;
> +}
> +
> +/* Return the type of argument I, or error_mark_node if it isn't
> +   well-formed.  */
> +tree
> +function_resolver::get_argument_type (unsigned int i)
> +{
> +  tree arg = m_arglist[i];
> +  return arg == error_mark_node ? arg : TREE_TYPE (arg);
> +}
> +
> +/* Return true if argument I is some form of scalar value.  */
> +bool
> +function_resolver::scalar_argument_p (unsigned int i)
> +{
> +  tree type = get_argument_type (i);
> +  return (INTEGRAL_TYPE_P (type)
> +	  /* Allow pointer types, leaving the frontend to warn where
> +	     necessary.  */
> +	  || POINTER_TYPE_P (type)
> +	  || SCALAR_FLOAT_TYPE_P (type));
> +}
> +
> +/* Report that the function has no form that takes type suffix TYPE.
> +   Return error_mark_node.  */
> +tree
> +function_resolver::report_no_such_form (type_suffix_index type)
> +{
> +  error_at (location, "%qE has no form that takes %qT arguments",
> +	    fndecl, get_vector_type (type));
> +  return error_mark_node;
> +}
> +
> +/* Silently check whether there is an instance of the function with the
> +   mode suffix given by MODE and the type suffixes given by TYPE0 and TYPE1.
> +   Return its function decl if so, otherwise return null.  */
> +tree
> +function_resolver::lookup_form (mode_suffix_index mode,
> +				type_suffix_index type0,
> +				type_suffix_index type1)
> +{
> +  type_suffix_pair types = { type0, type1 };
> +  function_instance instance (base_name, base, shape, mode, types, pred);
> +  registered_function *rfn
> +    = function_table->find_with_hash (instance, instance.hash ());
> +  return rfn ? rfn->decl : NULL_TREE;
> +}
> +
> +/* Resolve the function to one with the mode suffix given by MODE and the
> +   type suffixes given by TYPE0 and TYPE1.  Return its function decl on
> +   success, otherwise report an error and return error_mark_node.  */
> +tree
> +function_resolver::resolve_to (mode_suffix_index mode,
> +			       type_suffix_index type0,
> +			       type_suffix_index type1)
> +{
> +  tree res = lookup_form (mode, type0, type1);
> +  if (!res)
> +    {
> +      if (type1 == NUM_TYPE_SUFFIXES)
> +	return report_no_such_form (type0);
> +      if (type0 == type_suffix_ids[0])
> +	return report_no_such_form (type1);
> +      /* To be filled in when we have other cases.  */
> +      gcc_unreachable ();
> +    }
> +  return res;
> +}
> +
> +/* Require argument ARGNO to be a single vector or a tuple of
> +   NUM_VECTORS vectors; NUM_VECTORS is 1 for the former.  Return the
> +   associated type suffix on success, using TYPE_SUFFIX_b for
> +   predicates.  Report an error and return NUM_TYPE_SUFFIXES on
> +   failure.  */
> +type_suffix_index
> +function_resolver::infer_vector_or_tuple_type (unsigned int argno,
> +					       unsigned int num_vectors)
> +{
> +  tree actual = get_argument_type (argno);
> +  if (actual == error_mark_node)
> +    return NUM_TYPE_SUFFIXES;
> +
> +  /* A linear search should be OK here, since the code isn't hot and
> +     the number of types is only small.  */
> +  for (unsigned int size_i = 0; size_i < MAX_TUPLE_SIZE; ++size_i)
> +    for (unsigned int suffix_i = 0; suffix_i < NUM_TYPE_SUFFIXES; ++suffix_i)
> +      {
> +	vector_type_index type_i = type_suffixes[suffix_i].vector_type;
> +	tree type = acle_vector_types[size_i][type_i];
> +	if (type && matches_type_p (type, actual))
> +	  {
> +	    if (size_i + 1 == num_vectors)
> +	      return type_suffix_index (suffix_i);
> +
> +	    if (num_vectors == 1)
> +	      error_at (location, "passing %qT to argument %d of %qE, which"
> +			" expects a single MVE vector rather than a tuple",
> +			actual, argno + 1, fndecl);
> +	    else if (size_i == 0 && type_i != VECTOR_TYPE_mve_pred16_t)
> +	      /* num_vectors is always != 1, so the singular isn't needed.  */
> +	      error_n (location, num_vectors, "%qT%d%qE%d",
> +		       "passing single vector %qT to argument %d"
> +		       " of %qE, which expects a tuple of %d vectors",
> +		       actual, argno + 1, fndecl, num_vectors);
> +	    else
> +	      /* num_vectors is always != 1, so the singular isn't needed.  */
> +	      error_n (location, num_vectors, "%qT%d%qE%d",
> +		       "passing %qT to argument %d of %qE, which"
> +		       " expects a tuple of %d vectors", actual, argno + 1,
> +		       fndecl, num_vectors);
> +	    return NUM_TYPE_SUFFIXES;
> +	  }
> +      }
> +
> +  if (num_vectors == 1)
> +    error_at (location, "passing %qT to argument %d of %qE, which"
> +	      " expects an MVE vector type", actual, argno + 1, fndecl);
> +  else
> +    error_at (location, "passing %qT to argument %d of %qE, which"
> +	      " expects an MVE tuple type", actual, argno + 1, fndecl);
> +  return NUM_TYPE_SUFFIXES;
> +}
> +
> +/* Require argument ARGNO to have some form of vector type.  Return the
> +   associated type suffix on success, using TYPE_SUFFIX_b for predicates.
> +   Report an error and return NUM_TYPE_SUFFIXES on failure.  */
> +type_suffix_index
> +function_resolver::infer_vector_type (unsigned int argno)
> +{
> +  return infer_vector_or_tuple_type (argno, 1);
> +}
> +
> +/* Require argument ARGNO to be a vector or scalar argument.  Return true
> +   if it is, otherwise report an appropriate error.  */
> +bool
> +function_resolver::require_vector_or_scalar_type (unsigned int argno)
> +{
> +  tree actual = get_argument_type (argno);
> +  if (actual == error_mark_node)
> +    return false;
> +
> +  if (!scalar_argument_p (argno) && !VECTOR_TYPE_P (actual))
> +    {
> +      error_at (location, "passing %qT to argument %d of %qE, which"
> +		" expects a vector or scalar type", actual, argno + 1, fndecl);
> +      return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Require argument ARGNO to have vector type TYPE, in cases where this
> +   requirement holds for all uses of the function.  Return true if the
> +   argument has the right form, otherwise report an appropriate error.  */
> +bool
> +function_resolver::require_vector_type (unsigned int argno,
> +					vector_type_index type)
> +{
> +  tree expected = acle_vector_types[0][type];
> +  tree actual = get_argument_type (argno);
> +  if (actual == error_mark_node)
> +    return false;
> +
> +  if (!matches_type_p (expected, actual))
> +    {
> +      error_at (location, "passing %qT to argument %d of %qE, which"
> +		" expects %qT", actual, argno + 1, fndecl, expected);
> +      return false;
> +    }
> +  return true;
> +}
> +
> +/* Like require_vector_type, but TYPE is inferred from previous arguments
> +   rather than being a fixed part of the function signature.  This changes
> +   the nature of the error messages.  */
> +bool
> +function_resolver::require_matching_vector_type (unsigned int argno,
> +						 type_suffix_index type)
> +{
> +  type_suffix_index new_type = infer_vector_type (argno);
> +  if (new_type == NUM_TYPE_SUFFIXES)
> +    return false;
> +
> +  if (type != new_type)
> +    {
> +      error_at (location, "passing %qT to argument %d of %qE, but"
> +		" previous arguments had type %qT",
> +		get_vector_type (new_type), argno + 1, fndecl,
> +		get_vector_type (type));
> +      return false;
> +    }
> +  return true;
> +}
> +
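To illustrate the resulting diagnostic (assuming vaddq resolves through
this path), a mismatched call such as:

  int32x4_t f (int32x4_t a, int16x8_t b) { return vaddq (a, b); }

is rejected with "passing 'int16x8_t' to argument 2 of 'vaddq', but
previous arguments had type 'int32x4_t'".
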
> +/* Require argument ARGNO to be a vector type with the following
> +   properties:
> +
> +   - the type class must be the same as FIRST_TYPE's if EXPECTED_TCLASS
> +     is SAME_TYPE_CLASS, otherwise it must be EXPECTED_TCLASS itself.
> +
> +   - the element size must be:
> +
> +     - the same as FIRST_TYPE's if EXPECTED_BITS == SAME_SIZE
> +     - half of FIRST_TYPE's if EXPECTED_BITS == HALF_SIZE
> +     - a quarter of FIRST_TYPE's if EXPECTED_BITS == QUARTER_SIZE
> +     - EXPECTED_BITS itself otherwise
> +
> +   Return true if the argument has the required type, otherwise report
> +   an appropriate error.
> +
> +   FIRST_ARGNO is the first argument that is known to have type FIRST_TYPE.
> +   Usually it comes before ARGNO, but sometimes it is more natural to resolve
> +   arguments out of order.
> +
> +   If the required properties depend on FIRST_TYPE then both FIRST_ARGNO and
> +   ARGNO contribute to the resolution process.  If the required properties
> +   are fixed, only FIRST_ARGNO contributes to the resolution process.
> +
> +   This function is a bit of a Swiss army knife.  The complication comes
> +   from trying to give good error messages when FIRST_ARGNO and ARGNO are
> +   inconsistent, since either of them might be wrong.  */
> +bool function_resolver::
> +require_derived_vector_type (unsigned int argno,
> +			     unsigned int first_argno,
> +			     type_suffix_index first_type,
> +			     type_class_index expected_tclass,
> +			     unsigned int expected_bits)
> +{
> +  /* If the type needs to match FIRST_ARGNO exactly, use the preferred
> +     error message for that case.  The VECTOR_TYPE_P test excludes tuple
> +     types, which we handle below instead.  */
> +  bool both_vectors_p = VECTOR_TYPE_P (get_argument_type (first_argno));
> +  if (both_vectors_p
> +      && expected_tclass == SAME_TYPE_CLASS
> +      && expected_bits == SAME_SIZE)
> +    {
> +      /* There's no need to resolve this case out of order.  */
> +      gcc_assert (argno > first_argno);
> +      return require_matching_vector_type (argno, first_type);
> +    }
> +
> +  /* Use FIRST_TYPE to get the expected type class and element size.  */
> +  type_class_index orig_expected_tclass = expected_tclass;
> +  if (expected_tclass == NUM_TYPE_CLASSES)
> +    expected_tclass = type_suffixes[first_type].tclass;
> +
> +  unsigned int orig_expected_bits = expected_bits;
> +  if (expected_bits == SAME_SIZE)
> +    expected_bits = type_suffixes[first_type].element_bits;
> +  else if (expected_bits == HALF_SIZE)
> +    expected_bits = type_suffixes[first_type].element_bits / 2;
> +  else if (expected_bits == QUARTER_SIZE)
> +    expected_bits = type_suffixes[first_type].element_bits / 4;
> +
> +  /* If the expected type doesn't depend on FIRST_TYPE at all,
> +     just check for the fixed choice of vector type.  */
> +  if (expected_tclass == orig_expected_tclass
> +      && expected_bits == orig_expected_bits)
> +    {
> +      const type_suffix_info &expected_suffix
> +	= type_suffixes[find_type_suffix (expected_tclass, expected_bits)];
> +      return require_vector_type (argno, expected_suffix.vector_type);
> +    }
> +
> +  /* Require the argument to be some form of MVE vector type,
> +     without being specific about the type of vector we want.  */
> +  type_suffix_index actual_type = infer_vector_type (argno);
> +  if (actual_type == NUM_TYPE_SUFFIXES)
> +    return false;
> +
> +  /* Exit now if we got the right type.  */
> +  bool tclass_ok_p = (type_suffixes[actual_type].tclass == expected_tclass);
> +  bool size_ok_p = (type_suffixes[actual_type].element_bits == expected_bits);
> +  if (tclass_ok_p && size_ok_p)
> +    return true;
> +
> +  /* First look for cases in which the actual type contravenes a fixed
> +     size requirement, without having to refer to FIRST_TYPE.  */
> +  if (!size_ok_p && expected_bits == orig_expected_bits)
> +    {
> +      error_at (location, "passing %qT to argument %d of %qE, which"
> +		" expects a vector of %d-bit elements",
> +		get_vector_type (actual_type), argno + 1, fndecl,
> +		expected_bits);
> +      return false;
> +    }
> +
> +  /* Likewise for a fixed type class requirement.  This is only ever
> +     needed for signed and unsigned types, so don't create unnecessary
> +     translation work for other type classes.  */
> +  if (!tclass_ok_p && orig_expected_tclass == TYPE_signed)
> +    {
> +      error_at (location, "passing %qT to argument %d of %qE, which"
> +		" expects a vector of signed integers",
> +		get_vector_type (actual_type), argno + 1, fndecl);
> +      return false;
> +    }
> +  if (!tclass_ok_p && orig_expected_tclass == TYPE_unsigned)
> +    {
> +      error_at (location, "passing %qT to argument %d of %qE, which"
> +		" expects a vector of unsigned integers",
> +		get_vector_type (actual_type), argno + 1, fndecl);
> +      return false;
> +    }
> +
> +  /* Make sure that FIRST_TYPE itself is sensible before using it
> +     as a basis for an error message.  */
> +  if (resolve_to (mode_suffix_id, first_type) == error_mark_node)
> +    return false;
> +
> +  /* If the arguments have consistent type classes, but a link between
> +     the sizes has been broken, try to describe the error in those terms.  */
> +  if (both_vectors_p && tclass_ok_p && orig_expected_bits == SAME_SIZE)
> +    {
> +      if (argno < first_argno)
> +	{
> +	  std::swap (argno, first_argno);
> +	  std::swap (actual_type, first_type);
> +	}
> +      error_at (location, "arguments %d and %d of %qE must have the"
> +		" same element size, but the values passed here have type"
> +		" %qT and %qT respectively", first_argno + 1, argno + 1,
> +		fndecl, get_vector_type (first_type),
> +		get_vector_type (actual_type));
> +      return false;
> +    }
> +
> +  /* Likewise in reverse: look for cases in which the sizes are consistent
> +     but a link between the type classes has been broken.  */
> +  if (both_vectors_p
> +      && size_ok_p
> +      && orig_expected_tclass == SAME_TYPE_CLASS
> +      && type_suffixes[first_type].integer_p
> +      && type_suffixes[actual_type].integer_p)
> +    {
> +      if (argno < first_argno)
> +	{
> +	  std::swap (argno, first_argno);
> +	  std::swap (actual_type, first_type);
> +	}
> +      error_at (location, "arguments %d and %d of %qE must have the"
> +		" same signedness, but the values passed here have type"
> +		" %qT and %qT respectively", first_argno + 1, argno + 1,
> +		fndecl, get_vector_type (first_type),
> +		get_vector_type (actual_type));
> +      return false;
> +    }
> +
> +  /* The two arguments are wildly inconsistent.  */
> +  type_suffix_index expected_type
> +    = find_type_suffix (expected_tclass, expected_bits);
> +  error_at (location, "passing %qT instead of the expected %qT to argument"
> +	    " %d of %qE, after passing %qT to argument %d",
> +	    get_vector_type (actual_type), get_vector_type (expected_type),
> +	    argno + 1, fndecl, get_argument_type (first_argno),
> +	    first_argno + 1);
> +  return false;
> +}
> +
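A sketch of how a shape might use this, e.g. for a hypothetical widening
operation whose second vector must have elements half the size of (and the
same type class as) the first:

  type_suffix_index type = r.infer_vector_type (0);
  if (type == NUM_TYPE_SUFFIXES
      || !r.require_derived_vector_type (1, 0, type,
					 function_resolver::SAME_TYPE_CLASS,
					 function_resolver::HALF_SIZE))
    return error_mark_node;
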
> +/* Require argument ARGNO to be a (possibly variable) scalar, expecting it
> +   to have the following properties:
> +
> +   - the type class must be the same as for type suffix 0 if EXPECTED_TCLASS
> +     is SAME_TYPE_CLASS, otherwise it must be EXPECTED_TCLASS itself.
> +
> +   - the element size must be the same as for type suffix 0 if EXPECTED_BITS
> +     is SAME_SIZE, otherwise it must be EXPECTED_BITS itself.
> +
> +   Return true if the argument is valid, otherwise report an appropriate error.
> +
> +   Note that we don't check whether the scalar type actually has the required
> +   properties, since that's subject to implicit promotions and conversions.
> +   Instead we just use the expected properties to tune the error message.  */
> +bool function_resolver::
> +require_derived_scalar_type (unsigned int argno,
> +			     type_class_index expected_tclass,
> +			     unsigned int expected_bits)
> +{
> +  gcc_assert (expected_tclass == SAME_TYPE_CLASS
> +	      || expected_tclass == TYPE_signed
> +	      || expected_tclass == TYPE_unsigned);
> +
> +  /* If the expected type doesn't depend on the type suffix at all,
> +     just check for the fixed choice of scalar type.  */
> +  if (expected_tclass != SAME_TYPE_CLASS && expected_bits != SAME_SIZE)
> +    {
> +      type_suffix_index expected_type
> +	= find_type_suffix (expected_tclass, expected_bits);
> +      return require_scalar_type (argno,
> +				  get_scalar_type_name (expected_type));
> +    }
> +
> +  if (scalar_argument_p (argno))
> +    return true;
> +
> +  if (expected_tclass == SAME_TYPE_CLASS)
> +    /* It doesn't really matter whether the element is expected to be
> +       the same size as type suffix 0.  */
> +    error_at (location, "passing %qT to argument %d of %qE, which"
> +	      " expects a scalar element", get_argument_type (argno),
> +	      argno + 1, fndecl);
> +  else
> +    /* It doesn't seem useful to distinguish between signed and unsigned
> +       scalars here.  */
> +    error_at (location, "passing %qT to argument %d of %qE, which"
> +	      " expects a scalar integer", get_argument_type (argno),
> +	      argno + 1, fndecl);
> +  return false;
> +}
> +
> +/* Require argument ARGNO to be suitable for an integer constant expression.
> +   Return true if it is, otherwise report an appropriate error.
> +
> +   function_checker checks whether the argument is actually constant and
> +   has a suitable range.  The reason for distinguishing immediate arguments
> +   here is because it provides more consistent error messages than
> +   require_scalar_type would.  */
> +bool
> +function_resolver::require_integer_immediate (unsigned int argno)
> +{
> +  if (!scalar_argument_p (argno))
> +    {
> +      report_non_ice (location, fndecl, argno);
> +      return false;
> +    }
> +  return true;
> +}
> +
> +/* Require argument ARGNO to be a (possibly variable) scalar, using EXPECTED
> +   as the name of its expected type.  Return true if the argument has the
> +   right form, otherwise report an appropriate error.  */
> +bool
> +function_resolver::require_scalar_type (unsigned int argno,
> +					const char *expected)
> +{
> +  if (!scalar_argument_p (argno))
> +    {
> +      error_at (location, "passing %qT to argument %d of %qE, which"
> +		" expects %qs", get_argument_type (argno), argno + 1,
> +		fndecl, expected);
> +      return false;
> +    }
> +  return true;
> +}
> +
> +/* Require the function to have exactly EXPECTED arguments.  Return true
> +   if it does, otherwise report an appropriate error.  */
> +bool
> +function_resolver::check_num_arguments (unsigned int expected)
> +{
> +  if (m_arglist.length () < expected)
> +    error_at (location, "too few arguments to function %qE", fndecl);
> +  else if (m_arglist.length () > expected)
> +    error_at (location, "too many arguments to function %qE", fndecl);
> +  return m_arglist.length () == expected;
> +}
> +
> +/* If the function is predicated, check that the last argument is a
> +   suitable predicate.  Also check that there are NOPS further
> +   arguments before any predicate, but don't check what they are.
> +
> +   Return true on success, otherwise report a suitable error.
> +   When returning true:
> +
> +   - set I to the number of the last unchecked argument.
> +   - set NARGS to the total number of arguments.  */
> +bool
> +function_resolver::check_gp_argument (unsigned int nops,
> +				      unsigned int &i, unsigned int &nargs)
> +{
> +  i = nops - 1;
> +  if (pred != PRED_none)
> +    {
> +      switch (pred)
> +	{
> +	case PRED_m:
> +	  /* Add first inactive argument if needed, and final predicate.  */
> +	  if (has_inactive_argument ())
> +	    nargs = nops + 2;
> +	  else
> +	    nargs = nops + 1;
> +	  break;
> +
> +	case PRED_p:
> +	case PRED_x:
> +	  /* Add final predicate.  */
> +	  nargs = nops + 1;
> +	  break;
> +
> +	default:
> +	  gcc_unreachable ();
> +	}
> +
> +      if (!check_num_arguments (nargs)
> +	  || !require_vector_type (nargs - 1, VECTOR_TYPE_mve_pred16_t))
> +	return false;
> +
> +      i = nargs - 2;
> +    }
> +  else
> +    {
> +      nargs = nops;
> +      if (!check_num_arguments (nargs))
> +	return false;
> +    }
> +
> +  return true;
> +}
> +
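Concretely, for a two-input operation (nops == 2), and assuming the _m
form takes an inactive argument, the expected argument counts are:

  vaddq (a, b)                 /* PRED_none: nargs == 2, i == 1 */
  vaddq_x (a, b, p)            /* PRED_x:    nargs == 3, i == 1 */
  vaddq_m (inactive, a, b, p)  /* PRED_m:    nargs == 4, i == 2 */
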
> +/* Finish resolving a function whose final argument can be a vector
> +   or a scalar, with the function having an implicit "_n" suffix
> +   in the latter case.  This "_n" form might only exist for certain
> +   type suffixes.
> +
> +   ARGNO is the index of the final argument.  The inferred type suffix
> +   was obtained from argument FIRST_ARGNO, which has type FIRST_TYPE.
> +   EXPECTED_TCLASS and EXPECTED_BITS describe the expected properties
> +   of the final vector or scalar argument, in the same way as for
> +   require_derived_vector_type.  INFERRED_TYPE is the inferred type
> +   suffix itself, or NUM_TYPE_SUFFIXES if it's the same as FIRST_TYPE.
> +
> +   Return the function decl of the resolved function on success,
> +   otherwise report a suitable error and return error_mark_node.  */
> +tree function_resolver::
> +finish_opt_n_resolution (unsigned int argno, unsigned int first_argno,
> +			 type_suffix_index first_type,
> +			 type_class_index expected_tclass,
> +			 unsigned int expected_bits,
> +			 type_suffix_index inferred_type)
> +{
> +  if (inferred_type == NUM_TYPE_SUFFIXES)
> +    inferred_type = first_type;
> +  tree scalar_form = lookup_form (MODE_n, inferred_type);
> +
> +  /* Allow the final argument to be scalar, if an _n form exists.  */
> +  if (scalar_argument_p (argno))
> +    {
> +      if (scalar_form)
> +	return scalar_form;
> +
> +      /* Check the vector form normally.  If that succeeds, raise an
> +	 error about having no corresponding _n form.  */
> +      tree res = resolve_to (mode_suffix_id, inferred_type);
> +      if (res != error_mark_node)
> +	error_at (location, "passing %qT to argument %d of %qE, but its"
> +		  " %qT form does not accept scalars",
> +		  get_argument_type (argno), argno + 1, fndecl,
> +		  get_vector_type (first_type));
> +      return error_mark_node;
> +    }
> +
> +  /* If an _n form does exist, provide a more accurate message than
> +     require_derived_vector_type would for arguments that are neither
> +     vectors nor scalars.  */
> +  if (scalar_form && !require_vector_or_scalar_type (argno))
> +    return error_mark_node;
> +
> +  /* Check for the correct vector type.  */
> +  if (!require_derived_vector_type (argno, first_argno, first_type,
> +				    expected_tclass, expected_bits))
> +    return error_mark_node;
> +
> +  return resolve_to (mode_suffix_id, inferred_type);
> +}
> +
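A worked example, assuming vaddq resolves through this path:

  int32x4_t f (int32x4_t v) { return vaddq (v, 1); }

infer_vector_type on argument 0 yields the s32 suffix, argument 1 is a
scalar, and lookup_form (MODE_n, TYPE_SUFFIX_s32) finds vaddq_n_s32, so
that decl is returned.  If no _n form existed, the vector form would be
checked instead, purely to give the "does not accept scalars" error.
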
> +/* Resolve a (possibly predicated) unary function.  If the function uses
> +   merge predication or if TREAT_AS_MERGE_P is true, there is an extra
> +   vector argument before the governing predicate that specifies the
> +   values of inactive elements.  This argument has the following
> +   properties:
> +
> +   - the type class must be the same as for active elements if MERGE_TCLASS
> +     is SAME_TYPE_CLASS, otherwise it must be MERGE_TCLASS itself.
> +
> +   - the element size must be the same as for active elements if MERGE_BITS
> +     is SAME_SIZE, otherwise it must be MERGE_BITS itself.
> +
> +   Return the function decl of the resolved function on success,
> +   otherwise report a suitable error and return error_mark_node.  */
> +tree
> +function_resolver::resolve_unary (type_class_index merge_tclass,
> +				  unsigned int merge_bits,
> +				  bool treat_as_merge_p)
> +{
> +  type_suffix_index type;
> +  if (pred == PRED_m || treat_as_merge_p)
> +    {
> +      if (!check_num_arguments (3))
> +	return error_mark_node;
> +      if (merge_tclass == SAME_TYPE_CLASS && merge_bits == SAME_SIZE)
> +	{
> +	  /* The inactive elements are the same as the active elements,
> +	     so we can use normal left-to-right resolution.  */
> +	  if ((type = infer_vector_type (0)) == NUM_TYPE_SUFFIXES
> +	      /* Predicates are the last argument.  */
> +	      || !require_vector_type (2, VECTOR_TYPE_mve_pred16_t)
> +	      || !require_matching_vector_type (1, type))
> +	    return error_mark_node;
> +	}
> +      else
> +	{
> +	  /* The inactive element type is a function of the active one,
> +	     so resolve the active one first.  */
> +	  if (!require_vector_type (1, VECTOR_TYPE_mve_pred16_t)
> +	      || (type = infer_vector_type (2)) == NUM_TYPE_SUFFIXES
> +	      || !require_derived_vector_type (0, 2, type, merge_tclass,
> +					       merge_bits))
> +	    return error_mark_node;
> +	}
> +    }
> +  else
> +    {
> +      /* We just need to check the predicate (if any) and the single
> +	 vector argument.  */
> +      unsigned int i, nargs;
> +      if (!check_gp_argument (1, i, nargs)
> +	  || (type = infer_vector_type (i)) == NUM_TYPE_SUFFIXES)
> +	return error_mark_node;
> +    }
> +
> +  /* Handle convert-like functions in which the first type suffix is
> +     explicit.  */
> +  if (type_suffix_ids[0] != NUM_TYPE_SUFFIXES)
> +    return resolve_to (mode_suffix_id, type_suffix_ids[0], type);
> +
> +  return resolve_to (mode_suffix_id, type);
> +}
> +
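For example, assuming vabsq uses this resolver, a merge-predicated call:

  int32x4_t g (int32x4_t inactive, int32x4_t a, mve_pred16_t p)
  {
    return vabsq_m (inactive, a, p);	/* resolves to vabsq_m_s32 */
  }

infers the suffix from INACTIVE (argument 0), then checks that argument 2
is an mve_pred16_t and that argument 1 matches INACTIVE's type.
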
> +/* Resolve a (possibly predicated) unary function taking a scalar
> +   argument (_n suffix).  If the function uses merge predication,
> +   there is an extra vector argument in the first position, and the
> +   final governing predicate that specifies the values of inactive
> +   elements.
> +
> +   Return the function decl of the resolved function on success,
> +   otherwise report a suitable error and return error_mark_node.  */
> +tree
> +function_resolver::resolve_unary_n ()
> +{
> +  type_suffix_index type;
> +
> +  /* Currently only support overrides for _m (vdupq).  */
> +  if (pred != PRED_m)
> +    return error_mark_node;
> +
> +  if (pred == PRED_m)
> +    {
> +      if (!check_num_arguments (3))
> +	return error_mark_node;
> +
> +      /* The inactive elements are the same as the active elements,
> +	 so we can use normal left-to-right resolution.  */
> +      if ((type = infer_vector_type (0)) == NUM_TYPE_SUFFIXES
> +	  /* Predicates are the last argument.  */
> +	  || !require_vector_type (2, VECTOR_TYPE_mve_pred16_t))
> +	return error_mark_node;
> +    }
> +
> +  /* Make sure the argument is scalar.  */
> +  tree scalar_form = lookup_form (MODE_n, type);
> +
> +  if (scalar_argument_p (1) && scalar_form)
> +    return scalar_form;
> +
> +  return error_mark_node;
> +}
> +
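For example (illustrative):

  int32x4_t h (int32x4_t inactive, mve_pred16_t p)
  {
    return vdupq_m (inactive, 1, p);	/* resolves to vdupq_m_n_s32 */
  }

where the suffix is inferred from INACTIVE and the scalar form is then
looked up under MODE_n.
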
> +/* Resolve a (possibly predicated) function that takes NOPS like-typed
> +   vector arguments followed by NIMM integer immediates.  Return the
> +   function decl of the resolved function on success, otherwise report
> +   a suitable error and return error_mark_node.  */
> +tree
> +function_resolver::resolve_uniform (unsigned int nops, unsigned int nimm)
> +{
> +  unsigned int i, nargs;
> +  type_suffix_index type;
> +  if (!check_gp_argument (nops + nimm, i, nargs)
> +      || (type = infer_vector_type (0)) == NUM_TYPE_SUFFIXES)
> +    return error_mark_node;
> +
> +  unsigned int last_arg = i + 1 - nimm;
> +  for (i = 0; i < last_arg; i++)
> +    if (!require_matching_vector_type (i, type))
> +      return error_mark_node;
> +
> +  for (i = last_arg; i < nargs; ++i)
> +    if (!require_integer_immediate (i))
> +      return error_mark_node;
> +
> +  return resolve_to (mode_suffix_id, type);
> +}
> +
> +/* Resolve a (possibly predicated) function that offers a choice between
> +   taking:
> +
> +   - NOPS like-typed vector arguments or
> +   - NOPS - 1 like-typed vector arguments followed by a scalar argument
> +
> +   Return the function decl of the resolved function on success,
> +   otherwise report a suitable error and return error_mark_node.  */
> +tree
> +function_resolver::resolve_uniform_opt_n (unsigned int nops)
> +{
> +  unsigned int i, nargs;
> +  type_suffix_index type;
> +  if (!check_gp_argument (nops, i, nargs)
> +      /* Unary operators should use resolve_unary, so using i - 1 is
> +	 safe.  */
> +      || (type = infer_vector_type (i - 1)) == NUM_TYPE_SUFFIXES)
> +    return error_mark_node;
> +
> +  /* Skip the last argument, which may be a scalar.  */
> +  unsigned int last_arg = i;
> +  for (i = 0; i < last_arg; i++)
> +    if (!require_matching_vector_type (i, type))
> +      return error_mark_node;
> +
> +  return finish_opt_n_resolution (last_arg, 0, type);
> +}
> +
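So the "resolve" hook of a binary shape that also has _n forms (the shapes
themselves are added later in the series; this is just a sketch) can be as
small as:

  tree
  resolve (function_resolver &r) const override
  {
    return r.resolve_uniform_opt_n (2);
  }
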
> +/* If the call is erroneous, report an appropriate error and return
> +   error_mark_node.  Otherwise, if the function is overloaded, return
> +   the decl of the non-overloaded function.  Return NULL_TREE otherwise,
> +   indicating that the call should be processed in the normal way.  */
> +tree
> +function_resolver::resolve ()
> +{
> +  return shape->resolve (*this);
> +}
> +
> +function_checker::function_checker (location_t location,
> +				    const function_instance &instance,
> +				    tree fndecl, tree fntype,
> +				    unsigned int nargs, tree *args)
> +  : function_call_info (location, instance, fndecl),
> +    m_fntype (fntype), m_nargs (nargs), m_args (args)
> +{
> +  if (instance.has_inactive_argument ())
> +    m_base_arg = 1;
> +  else
> +    m_base_arg = 0;
> +}
> +
> +/* Return true if argument ARGNO exists, which it might not for
> +   erroneous calls.  It is safe to wave through checks if this
> +   function returns false.  */
> +bool
> +function_checker::argument_exists_p (unsigned int argno)
> +{
> +  gcc_assert (argno < (unsigned int) type_num_arguments (m_fntype));
> +  return argno < m_nargs;
> +}
> +
> +/* Check that argument ARGNO is an integer constant expression and
> +   store its value in VALUE_OUT if so.  The caller should first
> +   check that argument ARGNO exists.  */
> +bool
> +function_checker::require_immediate (unsigned int argno,
> +				     HOST_WIDE_INT &value_out)
> +{
> +  gcc_assert (argno < m_nargs);
> +  tree arg = m_args[argno];
> +
> +  /* The type and range are unsigned, so read the argument as an
> +     unsigned rather than signed HWI.  */
> +  if (!tree_fits_uhwi_p (arg))
> +    {
> +      report_non_ice (location, fndecl, argno);
> +      return false;
> +    }
> +
> +  /* ...but treat VALUE_OUT as signed for error reporting, since printing
> +     -1 is more user-friendly than the maximum uint64_t value.  */
> +  value_out = tree_to_uhwi (arg);
> +  return true;
> +}
> +
> +/* Check that argument REL_ARGNO is an integer constant expression that has
> +   a valid value for enumeration type TYPE.  REL_ARGNO counts from the end
> +   of the predication arguments.  */
> +bool
> +function_checker::require_immediate_enum (unsigned int rel_argno, tree type)
> +{
> +  unsigned int argno = m_base_arg + rel_argno;
> +  if (!argument_exists_p (argno))
> +    return true;
> +
> +  HOST_WIDE_INT actual;
> +  if (!require_immediate (argno, actual))
> +    return false;
> +
> +  for (tree entry = TYPE_VALUES (type); entry; entry = TREE_CHAIN (entry))
> +    {
> +      /* The value is an INTEGER_CST for C and a CONST_DECL wrapper
> +	 around an INTEGER_CST for C++.  */
> +      tree value = TREE_VALUE (entry);
> +      if (TREE_CODE (value) == CONST_DECL)
> +	value = DECL_INITIAL (value);
> +      if (wi::to_widest (value) == actual)
> +	return true;
> +    }
> +
> +  report_not_enum (location, fndecl, argno, actual, type);
> +  return false;
> +}
> +
> +/* Check that argument REL_ARGNO is an integer constant expression in the
> +   range [MIN, MAX].  REL_ARGNO counts from the end of the predication
> +   arguments.  */
> +bool
> +function_checker::require_immediate_range (unsigned int rel_argno,
> +					   HOST_WIDE_INT min,
> +					   HOST_WIDE_INT max)
> +{
> +  unsigned int argno = m_base_arg + rel_argno;
> +  if (!argument_exists_p (argno))
> +    return true;
> +
> +  /* Required because of the tree_to_uhwi -> HOST_WIDE_INT conversion
> +     in require_immediate.  */
> +  gcc_assert (min >= 0 && min <= max);
> +  HOST_WIDE_INT actual;
> +  if (!require_immediate (argno, actual))
> +    return false;
> +
> +  if (!IN_RANGE (actual, min, max))
> +    {
> +      report_out_of_range (location, fndecl, argno, actual, min, max);
> +      return false;
> +    }
> +
> +  return true;
> +}
> +
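For reference, a shape's "check" hook would use this along the lines of
(the argument number and range here are made up):

  bool
  check (function_checker &c) const override
  {
    return c.require_immediate_range (1, 1, 32);
  }
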
> +/* Perform semantic checks on the call.  Return true if the call is valid,
> +   otherwise report a suitable error.  */
> +bool
> +function_checker::check ()
> +{
> +  function_args_iterator iter;
> +  tree type;
> +  unsigned int i = 0;
> +  FOREACH_FUNCTION_ARGS (m_fntype, type, iter)
> +    {
> +      if (type == void_type_node || i >= m_nargs)
> +	break;
> +
> +      if (i >= m_base_arg
> +	  && TREE_CODE (type) == ENUMERAL_TYPE
> +	  && !require_immediate_enum (i - m_base_arg, type))
> +	return false;
> +
> +      i += 1;
> +    }
> +
> +  return shape->check (*this);
> +}
> +
> +gimple_folder::gimple_folder (const function_instance &instance, tree fndecl,
> +			      gcall *call_in)
> +  : function_call_info (gimple_location (call_in), instance, fndecl),
> +    call (call_in), lhs (gimple_call_lhs (call_in))
> +{
> +}
> +
> +/* Try to fold the call.  Return the new statement on success and null
> +   on failure.  */
> +gimple *
> +gimple_folder::fold ()
> +{
> +  /* Don't fold anything when MVE is disabled; emit an error during
> +     expansion instead.  */
> +  if (!TARGET_HAVE_MVE)
> +    return NULL;
> +
> +  /* Punt if the function has a return type and no result location is
> +     provided.  The attributes should allow target-independent code to
> +     remove the calls if appropriate.  */
> +  if (!lhs && TREE_TYPE (gimple_call_fntype (call)) != void_type_node)
> +    return NULL;
> +
> +  return base->fold (*this);
> +}
> +
> +function_expander::function_expander (const function_instance &instance,
> +				      tree fndecl, tree call_expr_in,
> +				      rtx possible_target_in)
> +  : function_call_info (EXPR_LOCATION (call_expr_in), instance, fndecl),
> +    call_expr (call_expr_in), possible_target (possible_target_in)
> +{
> +}
> +
> +/* Return the handler of direct optab OP for type suffix SUFFIX_I.  */
> +insn_code
> +function_expander::direct_optab_handler (optab op, unsigned int suffix_i)
> +{
> +  return ::direct_optab_handler (op, vector_mode (suffix_i));
> +}
> +
> +/* For a function that does the equivalent of:
> +
> +     OUTPUT = COND ? FN (INPUTS) : FALLBACK;
> +
> +   return the value of FALLBACK.
> +
> +   MODE is the mode of OUTPUT.
> +   MERGE_ARGNO is the argument that provides FALLBACK for _m functions,
> +   or DEFAULT_MERGE_ARGNO if we should apply the usual rules.
> +
> +   ARGNO is the caller's index into args.  If the returned value is
> +   argument 0 (as for unary _m operations), increment ARGNO past the
> +   returned argument.  */
> +rtx
> +function_expander::get_fallback_value (machine_mode mode,
> +				       unsigned int merge_argno,
> +				       unsigned int &argno)
> +{
> +  if (pred == PRED_z)
> +    return CONST0_RTX (mode);
> +
> +  gcc_assert (pred == PRED_m || pred == PRED_x);
> +
> +  if (merge_argno == 0)
> +    return args[argno++];
> +
> +  return args[merge_argno];
> +}
> +
> +/* Return a REG rtx that can be used for the result of the function,
> +   using the preferred target if suitable.  */
> +rtx
> +function_expander::get_reg_target ()
> +{
> +  machine_mode target_mode = TYPE_MODE (TREE_TYPE (TREE_TYPE (fndecl)));
> +  if (!possible_target || GET_MODE (possible_target) != target_mode)
> +    possible_target = gen_reg_rtx (target_mode);
> +  return possible_target;
> +}
> +
> +/* Add an output operand to the instruction we're building, which has
> +   code ICODE.  Bind the output to the preferred target rtx if possible.  */
> +void
> +function_expander::add_output_operand (insn_code icode)
> +{
> +  unsigned int opno = m_ops.length ();
> +  machine_mode mode = insn_data[icode].operand[opno].mode;
> +  m_ops.safe_grow (opno + 1, true);
> +  create_output_operand (&m_ops.last (), possible_target, mode);
> +}
> +
> +/* Add an input operand to the instruction we're building, which has
> +   code ICODE.  Calculate the value of the operand as follows:
> +
> +   - If the operand is a predicate, coerce X to have the
> +     mode that the instruction expects.
> +
> +   - Otherwise use X directly.  The expand machinery checks that X has
> +     the right mode for the instruction.  */
> +void
> +function_expander::add_input_operand (insn_code icode, rtx x)
> +{
> +  unsigned int opno = m_ops.length ();
> +  const insn_operand_data &operand = insn_data[icode].operand[opno];
> +  machine_mode mode = operand.mode;
> +  if (mode == VOIDmode)
> +    {
> +      /* The only allowable use of VOIDmode is the wildcard
> +	 arm_any_register_operand, which is used to avoid
> +	 combinatorial explosion in the reinterpret patterns.  */
> +      gcc_assert (operand.predicate == arm_any_register_operand);
> +      mode = GET_MODE (x);
> +    }
> +  else if (VALID_MVE_PRED_MODE (mode))
> +    x = gen_lowpart (mode, x);
> +
> +  m_ops.safe_grow (m_ops.length () + 1, true);
> +  create_input_operand (&m_ops.last (), x, mode);
> +}
> +
> +/* Add an integer operand with value X to the instruction.  */
> +void
> +function_expander::add_integer_operand (HOST_WIDE_INT x)
> +{
> +  m_ops.safe_grow (m_ops.length () + 1, true);
> +  create_integer_operand (&m_ops.last (), x);
> +}
> +
> +/* Generate instruction ICODE, given that its operands have already
> +   been added to M_OPS.  Return the value of the first operand.  */
> +rtx
> +function_expander::generate_insn (insn_code icode)
> +{
> +  expand_insn (icode, m_ops.length (), m_ops.address ());
> +  return function_returns_void_p () ? const0_rtx : m_ops[0].value;
> +}
> +
> +/* Implement the call using instruction ICODE, with a 1:1 mapping between
> +   arguments and input operands.  */
> +rtx
> +function_expander::use_exact_insn (insn_code icode)
> +{
> +  unsigned int nops = insn_data[icode].n_operands;
> +  if (!function_returns_void_p ())
> +    {
> +      add_output_operand (icode);
> +      nops -= 1;
> +    }
> +  for (unsigned int i = 0; i < nops; ++i)
> +    add_input_operand (icode, args[i]);
> +  return generate_insn (icode);
> +}
> +
> +/* Implement the call using instruction ICODE, which does not use a
> +   predicate.  */
> +rtx
> +function_expander::use_unpred_insn (insn_code icode)
> +{
> +  gcc_assert (pred == PRED_none);
> +  /* Discount the output operand.  */
> +  unsigned int nops = insn_data[icode].n_operands - 1;
> +  unsigned int i = 0;
> +
> +  add_output_operand (icode);
> +  for (; i < nops; ++i)
> +    add_input_operand (icode, args[i]);
> +
> +  return generate_insn (icode);
> +}
> +
> +/* Implement the call using instruction ICODE, which is a predicated
> +   operation that returns arbitrary values for inactive lanes.  */
> +rtx
> +function_expander::use_pred_x_insn (insn_code icode)
> +{
> +  gcc_assert (pred == PRED_x);
> +  unsigned int nops = args.length ();
> +
> +  add_output_operand (icode);
> +  /* Use first operand as arbitrary inactive input.  */
> +  add_input_operand (icode, possible_target);
> +  emit_clobber (possible_target);
> +  /* Copy remaining arguments, including the final predicate.  */
> +  for (unsigned int i = 0; i < nops; ++i)
> +    add_input_operand (icode, args[i]);
> +
> +  return generate_insn (icode);
> +}
> +
> +/* Implement the call using instruction ICODE, which does the equivalent of:
> +
> +     OUTPUT = COND ? FN (INPUTS) : FALLBACK;
> +
> +   The instruction operands are in the order above: OUTPUT, COND, INPUTS
> +   and FALLBACK.  MERGE_ARGNO is the argument that provides FALLBACK for _m
> +   functions, or DEFAULT_MERGE_ARGNO if we should apply the usual rules.  */
> +rtx
> +function_expander::use_cond_insn (insn_code icode, unsigned int merge_argno)
> +{
> +  /* At present we never need to handle PRED_none, which would involve
> +     creating a new predicate rather than using one supplied by the user.  */
> +  gcc_assert (pred != PRED_none);
> +  /* For MVE, we only handle PRED_m at present.  */
> +  gcc_assert (pred == PRED_m);
> +
> +  /* Discount the output, predicate and fallback value.  */
> +  unsigned int nops = insn_data[icode].n_operands - 3;
> +  machine_mode mode = insn_data[icode].operand[0].mode;
> +
> +  unsigned int opno = 0;
> +  rtx fallback_arg = get_fallback_value (mode, merge_argno, opno);
> +  rtx pred_arg = args[nops + 1];
> +
> +  add_output_operand (icode);
> +  add_input_operand (icode, fallback_arg);
> +  for (unsigned int i = 0; i < nops; ++i)
> +    add_input_operand (icode, args[opno + i]);
> +  add_input_operand (icode, pred_arg);
> +  return generate_insn (icode);
> +}
> +
> +/* Implement the call using a normal unpredicated optab for PRED_none.
> +
> +   The rtx code (and hence the optab) corresponds to:
> +
> +   - CODE_FOR_SINT for signed integers
> +   - CODE_FOR_UINT for unsigned integers
> +   - CODE_FOR_FP for floating-point values  */
> +rtx
> +function_expander::map_to_rtx_codes (rtx_code code_for_sint,
> +				     rtx_code code_for_uint,
> +				     rtx_code code_for_fp)
> +{
> +  gcc_assert (pred == PRED_none);
> +  rtx_code code
> +    = (type_suffix (0).integer_p
> +       ? (type_suffix (0).unsigned_p ? code_for_uint : code_for_sint)
> +       : code_for_fp);
> +  insn_code icode = direct_optab_handler (code_to_optab (code), 0);
> +  if (icode == CODE_FOR_nothing)
> +    gcc_unreachable ();
> +
> +  return use_unpred_insn (icode);
> +}
> +
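For example, an unpredicated vector addition could be expanded (sketch
only; the actual base implementations come later in the series) with:

  return e.map_to_rtx_codes (PLUS, PLUS, PLUS);

since PLUS is the appropriate rtx code for signed, unsigned and
floating-point addition alike.
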
> +/* Expand the call and return its lhs.  */
> +rtx
> +function_expander::expand ()
> +{
> +  unsigned int nargs = call_expr_nargs (call_expr);
> +  args.reserve (nargs);
> +  for (unsigned int i = 0; i < nargs; ++i)
> +    args.quick_push (expand_normal (CALL_EXPR_ARG (call_expr, i)));
> +
> +  return base->expand (*this);
> +}
> +
> +/* If we're implementing manual overloading, check whether the MVE
> +   function with subcode CODE is overloaded, and if so attempt to
> +   determine the corresponding non-overloaded function.  The call
> +   occurs at location LOCATION and has the arguments given by ARGLIST.
> +
> +   If the call is erroneous, report an appropriate error and return
> +   error_mark_node.  Otherwise, if the function is overloaded, return
> +   the decl of the non-overloaded function.  Return NULL_TREE otherwise,
> +   indicating that the call should be processed in the normal way.  */
> +tree
> +resolve_overloaded_builtin (location_t location, unsigned int code,
> +			    vec<tree, va_gc> *arglist)
> +{
> +  if (code >= vec_safe_length (registered_functions))
> +    return NULL_TREE;
> +
> +  registered_function &rfn = *(*registered_functions)[code];
> +  if (rfn.overloaded_p)
> +    return function_resolver (location, rfn.instance, rfn.decl,
> +			      *arglist).resolve ();
> +  return NULL_TREE;
> +}
> +
> +/* Perform any semantic checks needed for a call to the MVE function
> +   with subcode CODE, such as testing for integer constant expressions.
> +   The call occurs at location LOCATION and has NARGS arguments,
> +   given by ARGS.  FNDECL is the original function decl, before
> +   overload resolution.
> +
> +   Return true if the call is valid, otherwise report a suitable error.  */
> +bool
> +check_builtin_call (location_t location, vec<location_t>, unsigned int code,
> +		    tree fndecl, unsigned int nargs, tree *args)
> +{
> +  const registered_function &rfn = *(*registered_functions)[code];
> +  if (!check_requires_float (location, rfn.decl, rfn.requires_float))
> +    return false;
> +
> +  return function_checker (location, rfn.instance, fndecl,
> +			   TREE_TYPE (rfn.decl), nargs, args).check ();
> +}
> +
> +/* Attempt to fold STMT, given that it's a call to the MVE function
> +   with subcode CODE.  Return the new statement on success and null
> +   on failure.  Insert any other new statements at GSI.  */
> +gimple *
> +gimple_fold_builtin (unsigned int code, gcall *stmt)
> +{
> +  registered_function &rfn = *(*registered_functions)[code];
> +  return gimple_folder (rfn.instance, rfn.decl, stmt).fold ();
> +}
> +
> +/* Expand a call to the MVE function with subcode CODE.  EXP is the call
> +   expression and TARGET is the preferred location for the result.
> +   Return the value of the lhs.  */
> +rtx
> +expand_builtin (unsigned int code, tree exp, rtx target)
> +{
> +  registered_function &rfn = *(*registered_functions)[code];
> +  if (!check_requires_float (EXPR_LOCATION (exp), rfn.decl,
> +			    rfn.requires_float))
> +    return target;
> +  return function_expander (rfn.instance, rfn.decl, exp, target).expand ();
> +}
> +
> +} /* end namespace arm_mve */
> +
> +using namespace arm_mve;
> +
> +inline void
> +gt_ggc_mx (function_instance *)
> +{
> +}
> +
> +inline void
> +gt_pch_nx (function_instance *)
> +{
> +}
> +
> +inline void
> +gt_pch_nx (function_instance *, gt_pointer_operator, void *)
> +{
> +}
> 
>  #include "gt-arm-mve-builtins.h"
> diff --git a/gcc/config/arm/arm-mve-builtins.def b/gcc/config/arm/arm-mve-builtins.def
> index 69f3f81b473..49d07364fa2 100644
> --- a/gcc/config/arm/arm-mve-builtins.def
> +++ b/gcc/config/arm/arm-mve-builtins.def
> @@ -17,10 +17,25 @@
>     along with GCC; see the file COPYING3.  If not see
>     <http://www.gnu.org/licenses/>.  */
> 
> +#ifndef DEF_MVE_MODE
> +#define DEF_MVE_MODE(A, B, C, D)
> +#endif
> +
>  #ifndef DEF_MVE_TYPE
> -#error "arm-mve-builtins.def included without defining DEF_MVE_TYPE"
> +#define DEF_MVE_TYPE(A, B)
> +#endif
> +
> +#ifndef DEF_MVE_TYPE_SUFFIX
> +#define DEF_MVE_TYPE_SUFFIX(A, B, C, D, E)
>  #endif
> 
> +#ifndef DEF_MVE_FUNCTION
> +#define DEF_MVE_FUNCTION(A, B, C, D)
> +#endif
> +
> +DEF_MVE_MODE (n, none, none, none)
> +DEF_MVE_MODE (offset, none, none, bytes)
> +
>  #define REQUIRES_FLOAT false
>  DEF_MVE_TYPE (mve_pred16_t, boolean_type_node)
>  DEF_MVE_TYPE (uint8x16_t, unsigned_intQI_type_node)
> @@ -37,3 +52,26 @@ DEF_MVE_TYPE (int64x2_t, intDI_type_node)
>  DEF_MVE_TYPE (float16x8_t, arm_fp16_type_node)
>  DEF_MVE_TYPE (float32x4_t, float_type_node)
>  #undef REQUIRES_FLOAT
> +
> +#define REQUIRES_FLOAT false
> +DEF_MVE_TYPE_SUFFIX (s8, int8x16_t, signed, 8, V16QImode)
> +DEF_MVE_TYPE_SUFFIX (s16, int16x8_t, signed, 16, V8HImode)
> +DEF_MVE_TYPE_SUFFIX (s32, int32x4_t, signed, 32, V4SImode)
> +DEF_MVE_TYPE_SUFFIX (s64, int64x2_t, signed, 64, V2DImode)
> +DEF_MVE_TYPE_SUFFIX (u8, uint8x16_t, unsigned, 8, V16QImode)
> +DEF_MVE_TYPE_SUFFIX (u16, uint16x8_t, unsigned, 16, V8HImode)
> +DEF_MVE_TYPE_SUFFIX (u32, uint32x4_t, unsigned, 32, V4SImode)
> +DEF_MVE_TYPE_SUFFIX (u64, uint64x2_t, unsigned, 64, V2DImode)
> +#undef REQUIRES_FLOAT
> +
> +#define REQUIRES_FLOAT true
> +DEF_MVE_TYPE_SUFFIX (f16, float16x8_t, float, 16, V8HFmode)
> +DEF_MVE_TYPE_SUFFIX (f32, float32x4_t, float, 32, V4SFmode)
> +#undef REQUIRES_FLOAT
> +
> +#include "arm-mve-builtins-base.def"
> +
> +#undef DEF_MVE_TYPE
> +#undef DEF_MVE_TYPE_SUFFIX
> +#undef DEF_MVE_FUNCTION
> +#undef DEF_MVE_MODE
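
For reference, the entries pulled in from arm-mve-builtins-base.def (added
later in the series) take the form:

  DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_integer, mx_or_none)

naming the base, the shape, the set of type suffixes and the set of
predication types (this particular entry is illustrative).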
> diff --git a/gcc/config/arm/arm-mve-builtins.h b/gcc/config/arm/arm-mve-builtins.h
> index 290a118ec92..a20d2fb5d86 100644
> --- a/gcc/config/arm/arm-mve-builtins.h
> +++ b/gcc/config/arm/arm-mve-builtins.h
> @@ -20,7 +20,79 @@
>  #ifndef GCC_ARM_MVE_BUILTINS_H
>  #define GCC_ARM_MVE_BUILTINS_H
> 
> +/* The full name of an MVE ACLE function is the concatenation of:
> +
> +   - the base name ("vadd", etc.)
> +   - the "mode" suffix ("_n", "_index", etc.)
> +   - the type suffixes ("_s32", "_b8", etc.)
> +   - the predication suffix ("_x", "_z", etc.)
> +
> +   Each piece of information is individually useful, so we retain this
> +   classification throughout:
> +
> +   - function_base represents the base name
> +
> +   - mode_suffix_index represents the mode suffix
> +
> +   - type_suffix_index represents individual type suffixes, while
> +     type_suffix_pair represents a pair of them
> +
> +   - prediction_index extends the predication suffix with an additional
> +     alternative: PRED_implicit for implicitly-predicated operations
> +
> +   In addition to its unique full name, a function may have a shorter
> +   overloaded alias.  This alias removes pieces of the suffixes that
> +   can be inferred from the arguments, such as by shortening the mode
> +   suffix or dropping some of the type suffixes.  The base name and the
> +   predication suffix stay the same.
> +
> +   The function_shape class describes what arguments a given function
> +   takes and what its overloaded alias is called.  In broad terms,
> +   function_base describes how the underlying instruction behaves while
> +   function_shape describes how that instruction has been presented at
> +   the language level.
> +
> +   The static list of functions uses function_group to describe a group
> +   of related functions.  The function_builder class is responsible for
> +   expanding this static description into a list of individual functions
> +   and registering the associated built-in functions.  function_instance
> +   describes one of these individual functions in terms of the properties
> +   described above.
> +
> +   The classes involved in compiling a function call are:
> +
> +   - function_resolver, which resolves an overloaded function call to a
> +     specific function_instance and its associated function decl
> +
> +   - function_checker, which checks whether the values of the arguments
> +     conform to the ACLE specification
> +
> +   - gimple_folder, which tries to fold a function call at the gimple level
> +
> +   - function_expander, which expands a function call into rtl instructions
> +
> +   function_resolver and function_checker operate at the language level
> +   and so are associated with the function_shape.  gimple_folder and
> +   function_expander are concerned with the behavior of the function
> +   and so are associated with the function_base.
> +
> +   Note that we've specifically chosen not to fold calls in the frontend,
> +   since MVE intrinsics will hardly ever fold a useful language-level
> +   constant.  */
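
As a concrete example of the decomposition described above:

  vaddq_m_n_s32  ->  base name "vaddq", predication suffix "_m",
		     mode suffix "_n", type suffix "_s32"

and since the base name and the predication suffix survive overloading,
the corresponding overloaded alias is vaddq_m.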
>  namespace arm_mve {
> +/* The maximum number of vectors in an ACLE tuple type.  */
> +const unsigned int MAX_TUPLE_SIZE = 3;
> +
> +/* Used to represent the default merge argument index for _m functions.
> +   The actual index depends on how many arguments the function takes.  */
> +const unsigned int DEFAULT_MERGE_ARGNO = 0;
> +
> +/* Flags that describe what a function might do, in addition to reading
> +   its arguments and returning a result.  */
> +const unsigned int CP_READ_FPCR = 1U << 0;
> +const unsigned int CP_RAISE_FP_EXCEPTIONS = 1U << 1;
> +const unsigned int CP_READ_MEMORY = 1U << 2;
> +const unsigned int CP_WRITE_MEMORY = 1U << 3;
> 
>  /* Enumerates the MVE predicate and (data) vector types, together called
>     "vector types" for brevity.  */
> @@ -30,11 +102,604 @@ enum vector_type_index
>    VECTOR_TYPE_ ## ACLE_NAME,
>  #include "arm-mve-builtins.def"
>    NUM_VECTOR_TYPES
> -#undef DEF_MVE_TYPE
>  };
> 
> +/* Classifies the available measurement units for an address displacement.  */
> +enum units_index
> +{
> +  UNITS_none,
> +  UNITS_bytes
> +};
> +
> +/* Describes the various uses of a governing predicate.  */
> +enum predication_index
> +{
> +  /* No governing predicate is present.  */
> +  PRED_none,
> +
> +  /* Merging predication: copy inactive lanes from the first data argument
> +     to the vector result.  */
> +  PRED_m,
> +
> +  /* Plain predication: inactive lanes are not used to compute the
> +     scalar result.  */
> +  PRED_p,
> +
> +  /* "Don't care" predication: set inactive lanes of the vector result
> +     to arbitrary values.  */
> +  PRED_x,
> +
> +  /* Zero predication: set inactive lanes of the vector result to zero.  */
> +  PRED_z,
> +
> +  NUM_PREDS
> +};
> +
> +/* Classifies element types, based on type suffixes with the bit count
> +   removed.  */
> +enum type_class_index
> +{
> +  TYPE_bool,
> +  TYPE_float,
> +  TYPE_signed,
> +  TYPE_unsigned,
> +  NUM_TYPE_CLASSES
> +};
> +
> +/* Classifies an operation into "modes"; for example, to distinguish
> +   vector-scalar operations from vector-vector operations, or to
> +   distinguish between different addressing modes.  This classification
> +   accounts for the function suffixes that occur between the base name
> +   and the first type suffix.  */
> +enum mode_suffix_index
> +{
> +#define DEF_MVE_MODE(NAME, BASE, DISPLACEMENT, UNITS) MODE_##NAME,
> +#include "arm-mve-builtins.def"
> +  MODE_none
> +};
> +
> +/* Enumerates the possible type suffixes.  Each suffix is associated with
> +   a vector type, but for predicates provides extra information about the
> +   element size.  */
> +enum type_suffix_index
> +{
> +#define DEF_MVE_TYPE_SUFFIX(NAME, ACLE_TYPE, CLASS, BITS, MODE) \
> +  TYPE_SUFFIX_ ## NAME,
> +#include "arm-mve-builtins.def"
> +  NUM_TYPE_SUFFIXES
> +};
> +
> +/* Combines two type suffixes.  */
> +typedef enum type_suffix_index type_suffix_pair[2];
> +
> +class function_base;
> +class function_shape;
> +
> +/* Static information about a mode suffix.  */
> +struct mode_suffix_info
> +{
> +  /* The suffix string itself.  */
> +  const char *string;
> +
> +  /* The type of the vector base address, or NUM_VECTOR_TYPES if the
> +     mode does not include a vector base address.  */
> +  vector_type_index base_vector_type;
> +
> +  /* The type of the vector displacement, or NUM_VECTOR_TYPES if the
> +     mode does not include a vector displacement.  (Note that scalar
> +     displacements are always int64_t.)  */
> +  vector_type_index displacement_vector_type;
> +
> +  /* The units in which the vector or scalar displacement is measured,
> +     or UNITS_none if the mode doesn't take a displacement.  */
> +  units_index displacement_units;
> +};
> +
> +/* Static information about a type suffix.  */
> +struct type_suffix_info
> +{
> +  /* The suffix string itself.  */
> +  const char *string;
> +
> +  /* The associated ACLE vector or predicate type.  */
> +  vector_type_index vector_type : 8;
> +
> +  /* What kind of type the suffix represents.  */
> +  type_class_index tclass : 8;
> +
> +  /* The number of bits and bytes in an element.  For predicates this
> +     measures the associated data elements.  */
> +  unsigned int element_bits : 8;
> +  unsigned int element_bytes : 8;
> +
> +  /* True if the suffix is for an integer type.  */
> +  unsigned int integer_p : 1;
> +  /* True if the suffix is for an unsigned type.  */
> +  unsigned int unsigned_p : 1;
> +  /* True if the suffix is for a floating-point type.  */
> +  unsigned int float_p : 1;
> +  unsigned int spare : 13;
> +
> +  /* The associated vector or predicate mode.  */
> +  machine_mode vector_mode : 16;
> +};
> +
> +/* Static information about a set of functions.  */
> +struct function_group_info
> +{
> +  /* The base name, as a string.  */
> +  const char *base_name;
> +
> +  /* Describes the behavior associated with the function base name.  */
> +  const function_base *const *base;
> +
> +  /* The shape of the functions, as described above the class definition.
> +     It's possible to have entries with the same base name but different
> +     shapes.  */
> +  const function_shape *const *shape;
> +
> +  /* A list of the available type suffixes, and of the available predication
> +     types.  The function supports every combination of the two.
> +
> +     The list of type suffixes is terminated by two NUM_TYPE_SUFFIXES
> +     while the list of predication types is terminated by NUM_PREDS.
> +     The list of type suffixes is lexicographically ordered based
> +     on the index value.  */
> +  const type_suffix_pair *types;
> +  const predication_index *preds;
> +
> +  /* Whether the function group requires a floating-point ABI.  */
> +  bool requires_float;
> +};
> +
> +/* Describes a single fully-resolved function (i.e. one that has a
> +   unique full name).  */
> +class GTY((user)) function_instance
> +{
> +public:
> +  function_instance (const char *, const function_base *,
> +		     const function_shape *, mode_suffix_index,
> +		     const type_suffix_pair &, predication_index);
> +
> +  bool operator== (const function_instance &) const;
> +  bool operator!= (const function_instance &) const;
> +  hashval_t hash () const;
> +
> +  unsigned int call_properties () const;
> +  bool reads_global_state_p () const;
> +  bool modifies_global_state_p () const;
> +  bool could_trap_p () const;
> +
> +  unsigned int vectors_per_tuple () const;
> +
> +  const mode_suffix_info &mode_suffix () const;
> +
> +  const type_suffix_info &type_suffix (unsigned int) const;
> +  tree scalar_type (unsigned int) const;
> +  tree vector_type (unsigned int) const;
> +  tree tuple_type (unsigned int) const;
> +  machine_mode vector_mode (unsigned int) const;
> +  machine_mode gp_mode (unsigned int) const;
> +
> +  bool has_inactive_argument () const;
> +
> +  /* The properties of the function.  (The explicit "enum"s are required
> +     for gengtype.)  */
> +  const char *base_name;
> +  const function_base *base;
> +  const function_shape *shape;
> +  enum mode_suffix_index mode_suffix_id;
> +  type_suffix_pair type_suffix_ids;
> +  enum predication_index pred;
> +};
> +
> +class registered_function;
> +
> +/* A class for building and registering function decls.  */
> +class function_builder
> +{
> +public:
> +  function_builder ();
> +  ~function_builder ();
> +
> +  void add_unique_function (const function_instance &, tree,
> +			    vec<tree> &, bool, bool, bool);
> +  void add_overloaded_function (const function_instance &, bool, bool);
> +  void add_overloaded_functions (const function_group_info &,
> +				 mode_suffix_index, bool);
> +
> +  void register_function_group (const function_group_info &, bool);
> +
> +private:
> +  void append_name (const char *);
> +  char *finish_name ();
> +
> +  char *get_name (const function_instance &, bool, bool);
> +
> +  tree get_attributes (const function_instance &);
> +
> +  registered_function &add_function (const function_instance &,
> +				     const char *, tree, tree,
> +				     bool, bool, bool);
> +
> +  /* The function type to use for functions that are resolved by
> +     function_resolver.  */
> +  tree m_overload_type;
> +
> +  /* True if we should create a separate decl for each instance of an
> +     overloaded function, instead of using function_resolver.  */
> +  bool m_direct_overloads;
> +
> +  /* Used for building up function names.  */
> +  obstack m_string_obstack;
> +
> +  /* Maps all overloaded function names that we've registered so far
> +     to their associated function_instances.  */
> +  hash_map<nofree_string_hash, registered_function *> m_overload_names;
> +};
> +
> +/* A base class for handling calls to built-in functions.  */
> +class function_call_info : public function_instance
> +{
> +public:
> +  function_call_info (location_t, const function_instance &, tree);
> +
> +  bool function_returns_void_p ();
> +
> +  /* The location of the call.  */
> +  location_t location;
> +
> +  /* The FUNCTION_DECL that is being called.  */
> +  tree fndecl;
> +};
> +
> +/* A class for resolving an overloaded function call.  */
> +class function_resolver : public function_call_info
> +{
> +public:
> +  enum { SAME_SIZE = 256, HALF_SIZE, QUARTER_SIZE };
> +  static const type_class_index SAME_TYPE_CLASS = NUM_TYPE_CLASSES;
> +
> +  function_resolver (location_t, const function_instance &, tree,
> +		     vec<tree, va_gc> &);
> +
> +  tree get_vector_type (type_suffix_index);
> +  const char *get_scalar_type_name (type_suffix_index);
> +  tree get_argument_type (unsigned int);
> +  bool scalar_argument_p (unsigned int);
> +
> +  tree report_no_such_form (type_suffix_index);
> +  tree lookup_form (mode_suffix_index,
> +		    type_suffix_index = NUM_TYPE_SUFFIXES,
> +		    type_suffix_index = NUM_TYPE_SUFFIXES);
> +  tree resolve_to (mode_suffix_index,
> +		   type_suffix_index = NUM_TYPE_SUFFIXES,
> +		   type_suffix_index = NUM_TYPE_SUFFIXES);
> +
> +  type_suffix_index infer_vector_or_tuple_type (unsigned int, unsigned int);
> +  type_suffix_index infer_vector_type (unsigned int);
> +
> +  bool require_vector_or_scalar_type (unsigned int);
> +
> +  bool require_vector_type (unsigned int, vector_type_index);
> +  bool require_matching_vector_type (unsigned int, type_suffix_index);
> +  bool require_derived_vector_type (unsigned int, unsigned int,
> +				    type_suffix_index,
> +				    type_class_index = SAME_TYPE_CLASS,
> +				    unsigned int = SAME_SIZE);
> +  bool require_integer_immediate (unsigned int);
> +  bool require_scalar_type (unsigned int, const char *);
> +  bool require_derived_scalar_type (unsigned int, type_class_index,
> +				    unsigned int = SAME_SIZE);
> +
> +  bool check_num_arguments (unsigned int);
> +  bool check_gp_argument (unsigned int, unsigned int &, unsigned int &);
> +  tree resolve_unary (type_class_index = SAME_TYPE_CLASS,
> +		      unsigned int = SAME_SIZE, bool = false);
> +  tree resolve_unary_n ();
> +  tree resolve_uniform (unsigned int, unsigned int = 0);
> +  tree resolve_uniform_opt_n (unsigned int);
> +  tree finish_opt_n_resolution (unsigned int, unsigned int, type_suffix_index,
> +				type_class_index = SAME_TYPE_CLASS,
> +				unsigned int = SAME_SIZE,
> +				type_suffix_index = NUM_TYPE_SUFFIXES);
> +
> +  tree resolve ();
> +
> +private:
> +  /* The arguments to the overloaded function.  */
> +  vec<tree, va_gc> &m_arglist;
> +};
> +
> +/* A class for checking that the semantic constraints on a function call are
> +   satisfied, such as arguments being integer constant expressions with
> +   a particular range.  The parent class's FNDECL is the decl that was
> +   called in the original source, before overload resolution.  */
> +class function_checker : public function_call_info
> +{
> +public:
> +  function_checker (location_t, const function_instance &, tree,
> +		    tree, unsigned int, tree *);
> +
> +  bool require_immediate_enum (unsigned int, tree);
> +  bool require_immediate_lane_index (unsigned int, unsigned int = 1);
> +  bool require_immediate_range (unsigned int, HOST_WIDE_INT, HOST_WIDE_INT);
> +
> +  bool check ();
> +
> +private:
> +  bool argument_exists_p (unsigned int);
> +
> +  bool require_immediate (unsigned int, HOST_WIDE_INT &);
> +
> +  /* The type of the resolved function.  */
> +  tree m_fntype;
> +
> +  /* The arguments to the function.  */
> +  unsigned int m_nargs;
> +  tree *m_args;
> +
> +  /* The first argument not associated with the function's predication
> +     type.  */
> +  unsigned int m_base_arg;
> +};
> +
> +/* A class for folding a gimple function call.  */
> +class gimple_folder : public function_call_info
> +{
> +public:
> +  gimple_folder (const function_instance &, tree,
> +		 gcall *);
> +
> +  gimple *fold ();
> +
> +  /* The call we're folding.  */
> +  gcall *call;
> +
> +  /* The result of the call, or null if none.  */
> +  tree lhs;
> +};
> +
> +/* A class for expanding a function call into RTL.  */
> +class function_expander : public function_call_info
> +{
> +public:
> +  function_expander (const function_instance &, tree, tree, rtx);
> +  rtx expand ();
> +
> +  insn_code direct_optab_handler (optab, unsigned int = 0);
> +
> +  rtx get_fallback_value (machine_mode, unsigned int, unsigned int &);
> +  rtx get_reg_target ();
> +
> +  void add_output_operand (insn_code);
> +  void add_input_operand (insn_code, rtx);
> +  void add_integer_operand (HOST_WIDE_INT);
> +  rtx generate_insn (insn_code);
> +
> +  rtx use_exact_insn (insn_code);
> +  rtx use_unpred_insn (insn_code);
> +  rtx use_pred_x_insn (insn_code);
> +  rtx use_cond_insn (insn_code, unsigned int = DEFAULT_MERGE_ARGNO);
> +
> +  rtx map_to_rtx_codes (rtx_code, rtx_code, rtx_code);
> +
> +  /* The function call expression.  */
> +  tree call_expr;
> +
> +  /* For functions that return a value, this is the preferred location
> +     of that value.  It could be null or could have a different mode
> +     from the function return type.  */
> +  rtx possible_target;
> +
> +  /* The expanded arguments.  */
> +  auto_vec<rtx, 16> args;
> +
> +private:
> +  /* Used to build up the operands to an instruction.  */
> +  auto_vec<expand_operand, 8> m_ops;
> +};
> +
> +/* Provides information about a particular function base name, and handles
> +   tasks related to the base name.  */
> +class function_base
> +{
> +public:
> +  /* Return a set of CP_* flags that describe what the function might do,
> +     in addition to reading its arguments and returning a result.  */
> +  virtual unsigned int call_properties (const function_instance &) const;
> +
> +  /* If the function operates on tuples of vectors, return the number
> +     of vectors in the tuples, otherwise return 1.  */
> +  virtual unsigned int vectors_per_tuple () const { return 1; }
> +
> +  /* Try to fold the given gimple call.  Return the new gimple statement
> +     on success, otherwise return null.  */
> +  virtual gimple *fold (gimple_folder &) const { return NULL; }
> +
> +  /* Expand the given call into rtl.  Return the result of the function,
> +     or an arbitrary value if the function doesn't return a result.  */
> +  virtual rtx expand (function_expander &) const = 0;
> +};
> +
> +/* Classifies functions into "shapes".  The idea is to take all the
> +   type signatures for a set of functions, and classify what's left
> +   based on:
> +
> +   - the number of arguments
> +
> +   - the process of determining the types in the signature from the mode
> +     and type suffixes in the function name (including types that are not
> +     affected by the suffixes)
> +
> +   - which arguments must be integer constant expressions, and what range
> +     those arguments have
> +
> +   - the process for mapping overloaded names to "full" names.  */
> +class function_shape
> +{
> +public:
> +  virtual bool explicit_type_suffix_p (unsigned int, enum predication_index, enum mode_suffix_index) const = 0;
> +  virtual bool explicit_mode_suffix_p (enum predication_index, enum mode_suffix_index) const = 0;
> +  virtual bool skip_overload_p (enum predication_index, enum mode_suffix_index) const = 0;
> +
> +  /* Define all functions associated with the given group.  */
> +  virtual void build (function_builder &,
> +		      const function_group_info &,
> +		      bool) const = 0;
> +
> +  /* Try to resolve the overloaded call.  Return the non-overloaded
> +     function decl on success and error_mark_node on failure.  */
> +  virtual tree resolve (function_resolver &) const = 0;
> +
> +  /* Check whether the given call is semantically valid.  Return true
> +     if it is, otherwise report an error and return false.  */
> +  virtual bool check (function_checker &) const { return true; }
> +};
> +
> +extern const type_suffix_info type_suffixes[NUM_TYPE_SUFFIXES + 1];
> +extern const mode_suffix_info mode_suffixes[MODE_none + 1];
> +
>  extern tree scalar_types[NUM_VECTOR_TYPES];
> -extern tree acle_vector_types[3][NUM_VECTOR_TYPES + 1];
> +extern tree acle_vector_types[MAX_TUPLE_SIZE][NUM_VECTOR_TYPES + 1];
> +
> +/* Return the ACLE type mve_pred16_t.  */
> +inline tree
> +get_mve_pred16_t (void)
> +{
> +  return acle_vector_types[0][VECTOR_TYPE_mve_pred16_t];
> +}
> +
> +/* Try to find a mode with the given mode_suffix_info fields.  Return the
> +   mode on success or MODE_none on failure.  */
> +inline mode_suffix_index
> +find_mode_suffix (vector_type_index base_vector_type,
> +		  vector_type_index displacement_vector_type,
> +		  units_index displacement_units)
> +{
> +  for (unsigned int mode_i = 0; mode_i < ARRAY_SIZE (mode_suffixes); ++mode_i)
> +    {
> +      const mode_suffix_info &mode = mode_suffixes[mode_i];
> +      if (mode.base_vector_type == base_vector_type
> +	  && mode.displacement_vector_type == displacement_vector_type
> +	  && mode.displacement_units == displacement_units)
> +	return mode_suffix_index (mode_i);
> +    }
> +  return MODE_none;
> +}
> +
> +/* Return the type suffix associated with ELEMENT_BITS-bit elements of type
> +   class TCLASS.  */
> +inline type_suffix_index
> +find_type_suffix (type_class_index tclass, unsigned int element_bits)
> +{
> +  for (unsigned int i = 0; i < NUM_TYPE_SUFFIXES; ++i)
> +    if (type_suffixes[i].tclass == tclass
> +	&& type_suffixes[i].element_bits == element_bits)
> +      return type_suffix_index (i);
> +  gcc_unreachable ();
> +}
> +
> +inline function_instance::
> +function_instance (const char *base_name_in,
> +		   const function_base *base_in,
> +		   const function_shape *shape_in,
> +		   mode_suffix_index mode_suffix_id_in,
> +		   const type_suffix_pair &type_suffix_ids_in,
> +		   predication_index pred_in)
> +  : base_name (base_name_in), base (base_in), shape (shape_in),
> +    mode_suffix_id (mode_suffix_id_in), pred (pred_in)
> +{
> +  memcpy (type_suffix_ids, type_suffix_ids_in, sizeof (type_suffix_ids));
> +}
> +
> +inline bool
> +function_instance::operator== (const function_instance &other) const
> +{
> +  return (base == other.base
> +	  && shape == other.shape
> +	  && mode_suffix_id == other.mode_suffix_id
> +	  && pred == other.pred
> +	  && type_suffix_ids[0] == other.type_suffix_ids[0]
> +	  && type_suffix_ids[1] == other.type_suffix_ids[1]);
> +}
> +
> +inline bool
> +function_instance::operator!= (const function_instance &other) const
> +{
> +  return !operator== (other);
> +}
> +
> +/* If the function operates on tuples of vectors, return the number
> +   of vectors in the tuples, otherwise return 1.  */
> +inline unsigned int
> +function_instance::vectors_per_tuple () const
> +{
> +  return base->vectors_per_tuple ();
> +}
> +
> +/* Return information about the function's mode suffix.  */
> +inline const mode_suffix_info &
> +function_instance::mode_suffix () const
> +{
> +  return mode_suffixes[mode_suffix_id];
> +}
> +
> +/* Return information about type suffix I.  */
> +inline const type_suffix_info &
> +function_instance::type_suffix (unsigned int i) const
> +{
> +  return type_suffixes[type_suffix_ids[i]];
> +}
> +
> +/* Return the scalar type associated with type suffix I.  */
> +inline tree
> +function_instance::scalar_type (unsigned int i) const
> +{
> +  return scalar_types[type_suffix (i).vector_type];
> +}
> +
> +/* Return the vector type associated with type suffix I.  */
> +inline tree
> +function_instance::vector_type (unsigned int i) const
> +{
> +  return acle_vector_types[0][type_suffix (i).vector_type];
> +}
> +
> +/* If the function operates on tuples of vectors, return the tuple type
> +   associated with type suffix I, otherwise return the vector type associated
> +   with type suffix I.  */
> +inline tree
> +function_instance::tuple_type (unsigned int i) const
> +{
> +  unsigned int num_vectors = vectors_per_tuple ();
> +  return acle_vector_types[num_vectors - 1][type_suffix (i).vector_type];
> +}
> +
> +/* Return the vector or predicate mode associated with type suffix I.  */
> +inline machine_mode
> +function_instance::vector_mode (unsigned int i) const
> +{
> +  return type_suffix (i).vector_mode;
> +}
> +
> +/* Return true if the function has no return value.  */
> +inline bool
> +function_call_info::function_returns_void_p ()
> +{
> +  return TREE_TYPE (TREE_TYPE (fndecl)) == void_type_node;
> +}
> +
> +/* Default implementation of function::call_properties, with conservatively
> +   correct behavior for floating-point instructions.  */
> +inline unsigned int
> +function_base::call_properties (const function_instance &instance) const
> +{
> +  unsigned int flags = 0;
> +  if (instance.type_suffix (0).float_p || instance.type_suffix (1).float_p)
> +    flags |= CP_READ_FPCR | CP_RAISE_FP_EXCEPTIONS;
> +  return flags;
> +}
> 
>  } /* end namespace arm_mve */
> 
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index 1bdbd3b8ab3..61fcd671437 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -215,7 +215,8 @@ extern opt_machine_mode arm_get_mask_mode (machine_mode mode);
>     those groups.  */
>  enum arm_builtin_class
>  {
> -  ARM_BUILTIN_GENERAL
> +  ARM_BUILTIN_GENERAL,
> +  ARM_BUILTIN_MVE
>  };
> 
>  /* Built-in function codes are structured so that the low
> @@ -229,6 +230,13 @@ const unsigned int ARM_BUILTIN_CLASS = (1 << ARM_BUILTIN_SHIFT) - 1;
>  /* MVE functions.  */
>  namespace arm_mve {
>    void handle_arm_mve_types_h ();
> +  void handle_arm_mve_h (bool);
> +  tree resolve_overloaded_builtin (location_t, unsigned int,
> +				   vec<tree, va_gc> *);
> +  bool check_builtin_call (location_t, vec<location_t>, unsigned int,
> +			   tree, unsigned int, tree *);
> +  gimple *gimple_fold_builtin (unsigned int code, gcall *stmt);
> +  rtx expand_builtin (unsigned int, tree, rtx);
>  }
> 
>  /* Thumb functions.  */
> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
> index bf7ff9a9704..004e6c6194e 100644
> --- a/gcc/config/arm/arm.cc
> +++ b/gcc/config/arm/arm.cc
> @@ -69,6 +69,7 @@
>  #include "optabs-libfuncs.h"
>  #include "gimplify.h"
>  #include "gimple.h"
> +#include "gimple-iterator.h"
>  #include "selftest.h"
>  #include "tree-vectorizer.h"
>  #include "opts.h"
> @@ -506,6 +507,9 @@ static const struct attribute_spec arm_attribute_table[] =
>  #undef TARGET_FUNCTION_VALUE_REGNO_P
>  #define TARGET_FUNCTION_VALUE_REGNO_P arm_function_value_regno_p
> 
> +#undef TARGET_GIMPLE_FOLD_BUILTIN
> +#define TARGET_GIMPLE_FOLD_BUILTIN arm_gimple_fold_builtin
> +
>  #undef  TARGET_ASM_OUTPUT_MI_THUNK
>  #define TARGET_ASM_OUTPUT_MI_THUNK arm_output_mi_thunk
>  #undef  TARGET_ASM_CAN_OUTPUT_MI_THUNK
> @@ -2844,6 +2848,29 @@ arm_init_libfuncs (void)
>    speculation_barrier_libfunc = init_one_libfunc ("__speculation_barrier");
>  }
> 
> +/* Implement TARGET_GIMPLE_FOLD_BUILTIN.  */
> +static bool
> +arm_gimple_fold_builtin (gimple_stmt_iterator *gsi)
> +{
> +  gcall *stmt = as_a <gcall *> (gsi_stmt (*gsi));
> +  tree fndecl = gimple_call_fndecl (stmt);
> +  unsigned int code = DECL_MD_FUNCTION_CODE (fndecl);
> +  unsigned int subcode = code >> ARM_BUILTIN_SHIFT;
> +  gimple *new_stmt = NULL;
> +  switch (code & ARM_BUILTIN_CLASS)
> +    {
> +    case ARM_BUILTIN_GENERAL:
> +      break;
> +    case ARM_BUILTIN_MVE:
> +      new_stmt = arm_mve::gimple_fold_builtin (subcode, stmt);
> +    }
> +  if (!new_stmt)
> +    return false;
> +
> +  gsi_replace (gsi, new_stmt, true);
> +  return true;
> +}
> +
>  /* On AAPCS systems, this is the "struct __va_list".  */
>  static GTY(()) tree va_list_type;
> 
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index 1262d668121..0d2ba968fc0 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -34,6 +34,12 @@
>  #endif
>  #include "arm_mve_types.h"
> 
> +#ifdef __ARM_MVE_PRESERVE_USER_NAMESPACE
> +#pragma GCC arm "arm_mve.h" true
> +#else
> +#pragma GCC arm "arm_mve.h" false
> +#endif
> +
>  #ifndef __ARM_MVE_PRESERVE_USER_NAMESPACE
>  #define vst4q(__addr, __value) __arm_vst4q(__addr, __value)
>  #define vdupq_n(__a) __arm_vdupq_n(__a)
> diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
> index 3139750c606..8e235f63ee6 100644
> --- a/gcc/config/arm/predicates.md
> +++ b/gcc/config/arm/predicates.md
> @@ -903,3 +903,7 @@ (define_predicate "call_insn_operand"
>  (define_special_predicate "aligned_operand"
>    (ior (not (match_code "mem"))
>         (match_test "MEM_ALIGN (op) >= GET_MODE_ALIGNMENT (mode)")))
> +
> +;; A special predicate that doesn't match a particular mode.
> +(define_special_predicate "arm_any_register_operand"
> +  (match_code "reg"))
> diff --git a/gcc/config/arm/t-arm b/gcc/config/arm/t-arm
> index 637e72af5bb..9a1b06368a1 100644
> --- a/gcc/config/arm/t-arm
> +++ b/gcc/config/arm/t-arm
> @@ -154,15 +154,41 @@ arm-builtins.o: $(srcdir)/config/arm/arm-builtins.cc $(CONFIG_H) \
>  		$(srcdir)/config/arm/arm-builtins.cc
> 
>  arm-mve-builtins.o: $(srcdir)/config/arm/arm-mve-builtins.cc $(CONFIG_H) \
> -  $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) \
> -  fold-const.h langhooks.h stringpool.h attribs.h diagnostic.h \
> +  $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) $(RTL_H) $(TM_P_H) \
> +  memmodel.h insn-codes.h optabs.h recog.h expr.h basic-block.h \
> +  function.h fold-const.h gimple.h gimple-fold.h emit-rtl.h langhooks.h \
> +  stringpool.h attribs.h diagnostic.h \
>    $(srcdir)/config/arm/arm-protos.h \
>    $(srcdir)/config/arm/arm-builtins.h \
>    $(srcdir)/config/arm/arm-mve-builtins.h \
> -  $(srcdir)/config/arm/arm-mve-builtins.def
> +  $(srcdir)/config/arm/arm-mve-builtins-base.h \
> +  $(srcdir)/config/arm/arm-mve-builtins-shapes.h \
> +  $(srcdir)/config/arm/arm-mve-builtins.def \
> +  $(srcdir)/config/arm/arm-mve-builtins-base.def
>  	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
>  		$(srcdir)/config/arm/arm-mve-builtins.cc
> 
> +arm-mve-builtins-shapes.o: \
> +  $(srcdir)/config/arm/arm-mve-builtins-shapes.cc \
> +  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) \
> +  $(RTL_H) memmodel.h insn-codes.h optabs.h \
> +  $(srcdir)/config/arm/arm-mve-builtins.h \
> +  $(srcdir)/config/arm/arm-mve-builtins-shapes.h
> +	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
> +		$(srcdir)/config/arm/arm-mve-builtins-shapes.cc
> +
> +arm-mve-builtins-base.o: \
> +  $(srcdir)/config/arm/arm-mve-builtins-base.cc \
> +  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) $(RTL_H) \
> +  memmodel.h insn-codes.h $(OPTABS_H) \
> +  $(BASIC_BLOCK_H) $(FUNCTION_H) $(GIMPLE_H) \
> +  $(srcdir)/config/arm/arm-mve-builtins.h \
> +  $(srcdir)/config/arm/arm-mve-builtins-shapes.h \
> +  $(srcdir)/config/arm/arm-mve-builtins-base.h \
> +  $(srcdir)/config/arm/arm-mve-builtins-functions.h
> +	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
> +		$(srcdir)/config/arm/arm-mve-builtins-base.cc
> +
>  arm-c.o: $(srcdir)/config/arm/arm-c.cc $(CONFIG_H) $(SYSTEM_H) \
>      coretypes.h $(TM_H) $(TREE_H) output.h $(C_COMMON_H)
>  	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH 03/22] arm: [MVE intrinsics] Rework vreinterpretq
  2023-04-18 13:45 ` [PATCH 03/22] arm: [MVE intrinsics] Rework vreinterpretq Christophe Lyon
@ 2023-05-02 10:26   ` Kyrylo Tkachov
  2023-05-02 14:05     ` Christophe Lyon
  0 siblings, 1 reply; 55+ messages in thread
From: Kyrylo Tkachov @ 2023-05-02 10:26 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Earnshaw, Richard Sandiford
  Cc: Christophe Lyon

Hi Christophe,

> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@arm.com>
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
> Subject: [PATCH 03/22] arm: [MVE intrinsics] Rework vreinterpretq
> 
> This patch implements vreinterpretq using the new MVE intrinsics
> framework.
> 
> The old definitions for vreinterpretq are removed as a consequence.
> 
> 2022-09-08  Murray Steele  <murray.steele@arm.com>
> 	    Christophe Lyon  <christophe.lyon@arm.com>
> 
> 	gcc/
> 	* config/arm/arm-mve-builtins-base.cc (vreinterpretq_impl): New class.
> 	* config/arm/arm-mve-builtins-base.def: Define vreinterpretq.
> 	* config/arm/arm-mve-builtins-base.h (vreinterpretq): New declaration.
> 	* config/arm/arm-mve-builtins-shapes.cc (parse_element_type): New function.
> 	(parse_type): Likewise.
> 	(parse_signature): Likewise.
> 	(build_one): Likewise.
> 	(build_all): Likewise.
> 	(overloaded_base): New struct.
> 	(unary_convert_def): Likewise.
> 	* config/arm/arm-mve-builtins-shapes.h (unary_convert): Declare.
> 	* config/arm/arm-mve-builtins.cc (TYPES_reinterpret_signed1): New
> 	macro.
> 	(TYPES_reinterpret_unsigned1): Likewise.
> 	(TYPES_reinterpret_integer): Likewise.
> 	(TYPES_reinterpret_integer1): Likewise.
> 	(TYPES_reinterpret_float1): Likewise.
> 	(TYPES_reinterpret_float): Likewise.
> 	(reinterpret_integer): New.
> 	(reinterpret_float): New.
> 	(handle_arm_mve_h): Register builtins.
> 	* config/arm/arm_mve.h (vreinterpretq_s16): Remove.
> 	(vreinterpretq_s32): Likewise.
> 	(vreinterpretq_s64): Likewise.
> 	(vreinterpretq_s8): Likewise.
> 	(vreinterpretq_u16): Likewise.
> 	(vreinterpretq_u32): Likewise.
> 	(vreinterpretq_u64): Likewise.
> 	(vreinterpretq_u8): Likewise.
> 	(vreinterpretq_f16): Likewise.
> 	(vreinterpretq_f32): Likewise.
> 	(vreinterpretq_s16_s32): Likewise.
> 	(vreinterpretq_s16_s64): Likewise.
> 	(vreinterpretq_s16_s8): Likewise.
> 	(vreinterpretq_s16_u16): Likewise.
> 	(vreinterpretq_s16_u32): Likewise.
> 	(vreinterpretq_s16_u64): Likewise.
> 	(vreinterpretq_s16_u8): Likewise.
> 	(vreinterpretq_s32_s16): Likewise.
> 	(vreinterpretq_s32_s64): Likewise.
> 	(vreinterpretq_s32_s8): Likewise.
> 	(vreinterpretq_s32_u16): Likewise.
> 	(vreinterpretq_s32_u32): Likewise.
> 	(vreinterpretq_s32_u64): Likewise.
> 	(vreinterpretq_s32_u8): Likewise.
> 	(vreinterpretq_s64_s16): Likewise.
> 	(vreinterpretq_s64_s32): Likewise.
> 	(vreinterpretq_s64_s8): Likewise.
> 	(vreinterpretq_s64_u16): Likewise.
> 	(vreinterpretq_s64_u32): Likewise.
> 	(vreinterpretq_s64_u64): Likewise.
> 	(vreinterpretq_s64_u8): Likewise.
> 	(vreinterpretq_s8_s16): Likewise.
> 	(vreinterpretq_s8_s32): Likewise.
> 	(vreinterpretq_s8_s64): Likewise.
> 	(vreinterpretq_s8_u16): Likewise.
> 	(vreinterpretq_s8_u32): Likewise.
> 	(vreinterpretq_s8_u64): Likewise.
> 	(vreinterpretq_s8_u8): Likewise.
> 	(vreinterpretq_u16_s16): Likewise.
> 	(vreinterpretq_u16_s32): Likewise.
> 	(vreinterpretq_u16_s64): Likewise.
> 	(vreinterpretq_u16_s8): Likewise.
> 	(vreinterpretq_u16_u32): Likewise.
> 	(vreinterpretq_u16_u64): Likewise.
> 	(vreinterpretq_u16_u8): Likewise.
> 	(vreinterpretq_u32_s16): Likewise.
> 	(vreinterpretq_u32_s32): Likewise.
> 	(vreinterpretq_u32_s64): Likewise.
> 	(vreinterpretq_u32_s8): Likewise.
> 	(vreinterpretq_u32_u16): Likewise.
> 	(vreinterpretq_u32_u64): Likewise.
> 	(vreinterpretq_u32_u8): Likewise.
> 	(vreinterpretq_u64_s16): Likewise.
> 	(vreinterpretq_u64_s32): Likewise.
> 	(vreinterpretq_u64_s64): Likewise.
> 	(vreinterpretq_u64_s8): Likewise.
> 	(vreinterpretq_u64_u16): Likewise.
> 	(vreinterpretq_u64_u32): Likewise.
> 	(vreinterpretq_u64_u8): Likewise.
> 	(vreinterpretq_u8_s16): Likewise.
> 	(vreinterpretq_u8_s32): Likewise.
> 	(vreinterpretq_u8_s64): Likewise.
> 	(vreinterpretq_u8_s8): Likewise.
> 	(vreinterpretq_u8_u16): Likewise.
> 	(vreinterpretq_u8_u32): Likewise.
> 	(vreinterpretq_u8_u64): Likewise.
> 	(vreinterpretq_s32_f16): Likewise.
> 	(vreinterpretq_s32_f32): Likewise.
> 	(vreinterpretq_u16_f16): Likewise.
> 	(vreinterpretq_u16_f32): Likewise.
> 	(vreinterpretq_u32_f16): Likewise.
> 	(vreinterpretq_u32_f32): Likewise.
> 	(vreinterpretq_u64_f16): Likewise.
> 	(vreinterpretq_u64_f32): Likewise.
> 	(vreinterpretq_u8_f16): Likewise.
> 	(vreinterpretq_u8_f32): Likewise.
> 	(vreinterpretq_f16_f32): Likewise.
> 	(vreinterpretq_f16_s16): Likewise.
> 	(vreinterpretq_f16_s32): Likewise.
> 	(vreinterpretq_f16_s64): Likewise.
> 	(vreinterpretq_f16_s8): Likewise.
> 	(vreinterpretq_f16_u16): Likewise.
> 	(vreinterpretq_f16_u32): Likewise.
> 	(vreinterpretq_f16_u64): Likewise.
> 	(vreinterpretq_f16_u8): Likewise.
> 	(vreinterpretq_f32_f16): Likewise.
> 	(vreinterpretq_f32_s16): Likewise.
> 	(vreinterpretq_f32_s32): Likewise.
> 	(vreinterpretq_f32_s64): Likewise.
> 	(vreinterpretq_f32_s8): Likewise.
> 	(vreinterpretq_f32_u16): Likewise.
> 	(vreinterpretq_f32_u32): Likewise.
> 	(vreinterpretq_f32_u64): Likewise.
> 	(vreinterpretq_f32_u8): Likewise.
> 	(vreinterpretq_s16_f16): Likewise.
> 	(vreinterpretq_s16_f32): Likewise.
> 	(vreinterpretq_s64_f16): Likewise.
> 	(vreinterpretq_s64_f32): Likewise.
> 	(vreinterpretq_s8_f16): Likewise.
> 	(vreinterpretq_s8_f32): Likewise.
> 	(__arm_vreinterpretq_f16): Likewise.
> 	(__arm_vreinterpretq_f32): Likewise.
> 	(__arm_vreinterpretq_s16): Likewise.
> 	(__arm_vreinterpretq_s32): Likewise.
> 	(__arm_vreinterpretq_s64): Likewise.
> 	(__arm_vreinterpretq_s8): Likewise.
> 	(__arm_vreinterpretq_u16): Likewise.
> 	(__arm_vreinterpretq_u32): Likewise.
> 	(__arm_vreinterpretq_u64): Likewise.
> 	(__arm_vreinterpretq_u8): Likewise.
> 	* config/arm/arm_mve_types.h (__arm_vreinterpretq_s16_s32): Remove.
> 	(__arm_vreinterpretq_s16_s64): Likewise.
> 	(__arm_vreinterpretq_s16_s8): Likewise.
> 	(__arm_vreinterpretq_s16_u16): Likewise.
> 	(__arm_vreinterpretq_s16_u32): Likewise.
> 	(__arm_vreinterpretq_s16_u64): Likewise.
> 	(__arm_vreinterpretq_s16_u8): Likewise.
> 	(__arm_vreinterpretq_s32_s16): Likewise.
> 	(__arm_vreinterpretq_s32_s64): Likewise.
> 	(__arm_vreinterpretq_s32_s8): Likewise.
> 	(__arm_vreinterpretq_s32_u16): Likewise.
> 	(__arm_vreinterpretq_s32_u32): Likewise.
> 	(__arm_vreinterpretq_s32_u64): Likewise.
> 	(__arm_vreinterpretq_s32_u8): Likewise.
> 	(__arm_vreinterpretq_s64_s16): Likewise.
> 	(__arm_vreinterpretq_s64_s32): Likewise.
> 	(__arm_vreinterpretq_s64_s8): Likewise.
> 	(__arm_vreinterpretq_s64_u16): Likewise.
> 	(__arm_vreinterpretq_s64_u32): Likewise.
> 	(__arm_vreinterpretq_s64_u64): Likewise.
> 	(__arm_vreinterpretq_s64_u8): Likewise.
> 	(__arm_vreinterpretq_s8_s16): Likewise.
> 	(__arm_vreinterpretq_s8_s32): Likewise.
> 	(__arm_vreinterpretq_s8_s64): Likewise.
> 	(__arm_vreinterpretq_s8_u16): Likewise.
> 	(__arm_vreinterpretq_s8_u32): Likewise.
> 	(__arm_vreinterpretq_s8_u64): Likewise.
> 	(__arm_vreinterpretq_s8_u8): Likewise.
> 	(__arm_vreinterpretq_u16_s16): Likewise.
> 	(__arm_vreinterpretq_u16_s32): Likewise.
> 	(__arm_vreinterpretq_u16_s64): Likewise.
> 	(__arm_vreinterpretq_u16_s8): Likewise.
> 	(__arm_vreinterpretq_u16_u32): Likewise.
> 	(__arm_vreinterpretq_u16_u64): Likewise.
> 	(__arm_vreinterpretq_u16_u8): Likewise.
> 	(__arm_vreinterpretq_u32_s16): Likewise.
> 	(__arm_vreinterpretq_u32_s32): Likewise.
> 	(__arm_vreinterpretq_u32_s64): Likewise.
> 	(__arm_vreinterpretq_u32_s8): Likewise.
> 	(__arm_vreinterpretq_u32_u16): Likewise.
> 	(__arm_vreinterpretq_u32_u64): Likewise.
> 	(__arm_vreinterpretq_u32_u8): Likewise.
> 	(__arm_vreinterpretq_u64_s16): Likewise.
> 	(__arm_vreinterpretq_u64_s32): Likewise.
> 	(__arm_vreinterpretq_u64_s64): Likewise.
> 	(__arm_vreinterpretq_u64_s8): Likewise.
> 	(__arm_vreinterpretq_u64_u16): Likewise.
> 	(__arm_vreinterpretq_u64_u32): Likewise.
> 	(__arm_vreinterpretq_u64_u8): Likewise.
> 	(__arm_vreinterpretq_u8_s16): Likewise.
> 	(__arm_vreinterpretq_u8_s32): Likewise.
> 	(__arm_vreinterpretq_u8_s64): Likewise.
> 	(__arm_vreinterpretq_u8_s8): Likewise.
> 	(__arm_vreinterpretq_u8_u16): Likewise.
> 	(__arm_vreinterpretq_u8_u32): Likewise.
> 	(__arm_vreinterpretq_u8_u64): Likewise.
> 	(__arm_vreinterpretq_s32_f16): Likewise.
> 	(__arm_vreinterpretq_s32_f32): Likewise.
> 	(__arm_vreinterpretq_s16_f16): Likewise.
> 	(__arm_vreinterpretq_s16_f32): Likewise.
> 	(__arm_vreinterpretq_s64_f16): Likewise.
> 	(__arm_vreinterpretq_s64_f32): Likewise.
> 	(__arm_vreinterpretq_s8_f16): Likewise.
> 	(__arm_vreinterpretq_s8_f32): Likewise.
> 	(__arm_vreinterpretq_u16_f16): Likewise.
> 	(__arm_vreinterpretq_u16_f32): Likewise.
> 	(__arm_vreinterpretq_u32_f16): Likewise.
> 	(__arm_vreinterpretq_u32_f32): Likewise.
> 	(__arm_vreinterpretq_u64_f16): Likewise.
> 	(__arm_vreinterpretq_u64_f32): Likewise.
> 	(__arm_vreinterpretq_u8_f16): Likewise.
> 	(__arm_vreinterpretq_u8_f32): Likewise.
> 	(__arm_vreinterpretq_f16_f32): Likewise.
> 	(__arm_vreinterpretq_f16_s16): Likewise.
> 	(__arm_vreinterpretq_f16_s32): Likewise.
> 	(__arm_vreinterpretq_f16_s64): Likewise.
> 	(__arm_vreinterpretq_f16_s8): Likewise.
> 	(__arm_vreinterpretq_f16_u16): Likewise.
> 	(__arm_vreinterpretq_f16_u32): Likewise.
> 	(__arm_vreinterpretq_f16_u64): Likewise.
> 	(__arm_vreinterpretq_f16_u8): Likewise.
> 	(__arm_vreinterpretq_f32_f16): Likewise.
> 	(__arm_vreinterpretq_f32_s16): Likewise.
> 	(__arm_vreinterpretq_f32_s32): Likewise.
> 	(__arm_vreinterpretq_f32_s64): Likewise.
> 	(__arm_vreinterpretq_f32_s8): Likewise.
> 	(__arm_vreinterpretq_f32_u16): Likewise.
> 	(__arm_vreinterpretq_f32_u32): Likewise.
> 	(__arm_vreinterpretq_f32_u64): Likewise.
> 	(__arm_vreinterpretq_f32_u8): Likewise.
> 	(__arm_vreinterpretq_s16): Likewise.
> 	(__arm_vreinterpretq_s32): Likewise.
> 	(__arm_vreinterpretq_s64): Likewise.
> 	(__arm_vreinterpretq_s8): Likewise.
> 	(__arm_vreinterpretq_u16): Likewise.
> 	(__arm_vreinterpretq_u32): Likewise.
> 	(__arm_vreinterpretq_u64): Likewise.
> 	(__arm_vreinterpretq_u8): Likewise.
> 	(__arm_vreinterpretq_f16): Likewise.
> 	(__arm_vreinterpretq_f32): Likewise.
> 	* config/arm/mve.md (@arm_mve_reinterpret<mode>): New pattern.
> 	* config/arm/unspecs.md: (REINTERPRET): New unspec.
> 
> 	gcc/testsuite/
> 	* g++.target/arm/mve.exp: Add general-c++ and general directories.
> 	* g++.target/arm/mve/general-c++/nomve_fp_1.c: New test.
> 	* g++.target/arm/mve/general-c++/vreinterpretq_1.C: New test.
> 	* gcc.target/arm/mve/general-c/nomve_fp_1.c: New test.
> 	* gcc.target/arm/mve/general-c/vreinterpretq_1.c: New test.
> ---
>  gcc/config/arm/arm-mve-builtins-base.cc       |   29 +
>  gcc/config/arm/arm-mve-builtins-base.def      |    2 +
>  gcc/config/arm/arm-mve-builtins-base.h        |    2 +
>  gcc/config/arm/arm-mve-builtins-shapes.cc     |   28 +
>  gcc/config/arm/arm-mve-builtins-shapes.h      |    8 +
>  gcc/config/arm/arm-mve-builtins.cc            |   60 +
>  gcc/config/arm/arm_mve.h                      |  300 ----
>  gcc/config/arm/arm_mve_types.h                | 1365 +----------------
>  gcc/config/arm/mve.md                         |   18 +
>  gcc/config/arm/unspecs.md                     |    1 +
>  gcc/testsuite/g++.target/arm/mve.exp          |    8 +-
>  .../arm/mve/general-c++/nomve_fp_1.c          |   15 +
>  .../arm/mve/general-c++/vreinterpretq_1.C     |   25 +
>  .../gcc.target/arm/mve/general-c/nomve_fp_1.c |   15 +
>  .../arm/mve/general-c/vreinterpretq_1.c       |   25 +
>  15 files changed, 286 insertions(+), 1615 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c
>  create mode 100644 gcc/testsuite/g++.target/arm/mve/general-c++/vreinterpretq_1.C
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/nomve_fp_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/vreinterpretq_1.c
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
> index e9f285faf2b..ad8d500afc6 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.cc
> +++ b/gcc/config/arm/arm-mve-builtins-base.cc
> @@ -38,8 +38,37 @@ using namespace arm_mve;
> 
>  namespace {
> 
> +/* Implements vreinterpretq_* intrinsics.  */
> +class vreinterpretq_impl : public quiet<function_base>
> +{
> +  gimple *
> +  fold (gimple_folder &f) const override
> +  {
> +    /* Punt to rtl if the effect of the reinterpret on registers does not
> +       conform to GCC's endianness model.  */
> +    if (!targetm.can_change_mode_class (f.vector_mode (0),
> +					f.vector_mode (1), VFP_REGS))
> +      return NULL;
> +

So we punt to an RTL pattern here if we cannot change mode class...

[snip]

> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 35eab6c94bf..ab688396f97 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -10561,3 +10561,21 @@ (define_expand "vcond_mask_<mode><MVE_vpred>"
>      }
>    DONE;
>  })
> +
> +;; Reinterpret operand 1 in operand 0's mode, without changing its contents.
> +(define_expand "@arm_mve_reinterpret<mode>"
> +  [(set (match_operand:MVE_vecs 0 "register_operand")
> +	(unspec:MVE_vecs
> +	  [(match_operand 1 "arm_any_register_operand")]
> +	  REINTERPRET))]
> +  "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (<MODE>mode))
> +    || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (<MODE>mode))"
> +  {
> +    machine_mode src_mode = GET_MODE (operands[1]);
> +    if (targetm.can_change_mode_class (<MODE>mode, src_mode, VFP_REGS))
> +      {
> +	emit_move_insn (operands[0], gen_lowpart (<MODE>mode, operands[1]));
> +	DONE;
> +      }
> +  }
> +)

... But we still check can_change_mode_class in this pattern and if it's not true we emit the new REINTERPRET unspec
without a corresponding define_insn pattern. Won't that ICE? Would this case occur on big-endian targets?
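
To illustrate (an untested sketch on my part, since arm_mve.h currently #errors out on big-endian targets), something like the following would reach that path on a hypothetical big-endian MVE target:

#include <arm_mve.h>

int16x8_t
foo (int8x16_t value)
{
  /* Hypothetical: if can_change_mode_class returned false here, the
     expander would fall through and emit the REINTERPRET unspec with
     no define_insn to match it.  */
  return vreinterpretq_s16_s8 (value);
}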

Thanks,
Kyrill

> diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
> index 84384ee798d..dccda283573 100644
> --- a/gcc/config/arm/unspecs.md
> +++ b/gcc/config/arm/unspecs.md
> @@ -1255,4 +1255,5 @@ (define_c_enum "unspec" [
>    SQRSHRL_64
>    SQRSHRL_48
>    VSHLCQ_M_
> +  REINTERPRET
>  ])


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 03/22] arm: [MVE intrinsics] Rework vreinterpretq
  2023-05-02 10:26   ` Kyrylo Tkachov
@ 2023-05-02 14:05     ` Christophe Lyon
  2023-05-02 15:28       ` Kyrylo Tkachov
  0 siblings, 1 reply; 55+ messages in thread
From: Christophe Lyon @ 2023-05-02 14:05 UTC (permalink / raw)
  To: Kyrylo Tkachov, gcc-patches, Richard Earnshaw, Richard Sandiford

On 5/2/23 12:26, Kyrylo Tkachov wrote:
> Hi Christophe,
>
>> -----Original Message-----
>> From: Christophe Lyon <christophe.lyon@arm.com>
>> Sent: Tuesday, April 18, 2023 2:46 PM
>> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
>> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
>> <Richard.Sandiford@arm.com>
>> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
>> Subject: [PATCH 03/22] arm: [MVE intrinsics] Rework vreinterpretq
>>
>> This patch implements vreinterpretq using the new MVE intrinsics
>> framework.
>>
>> The old definitions for vreinterpretq are removed as a consequence.
>>
>> 2022-09-08  Murray Steele  <murray.steele@arm.com>
>> 	    Christophe Lyon  <christophe.lyon@arm.com>
>>
>> [snip]
>>
>> diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
>> index e9f285faf2b..ad8d500afc6 100644
>> --- a/gcc/config/arm/arm-mve-builtins-base.cc
>> +++ b/gcc/config/arm/arm-mve-builtins-base.cc
>> @@ -38,8 +38,37 @@ using namespace arm_mve;
>>
>>   namespace {
>>
>> +/* Implements vreinterpretq_* intrinsics.  */
>> +class vreinterpretq_impl : public quiet<function_base>
>> +{
>> +  gimple *
>> +  fold (gimple_folder &f) const override
>> +  {
>> +    /* Punt to rtl if the effect of the reinterpret on registers does not
>> +       conform to GCC's endianness model.  */
>> +    if (!targetm.can_change_mode_class (f.vector_mode (0),
>> +					f.vector_mode (1), VFP_REGS))
>> +      return NULL;
>> +
> So we punt to an RTL pattern here if we cannot change mode class...
>
> [snip]
>
>> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
>> index 35eab6c94bf..ab688396f97 100644
>> --- a/gcc/config/arm/mve.md
>> +++ b/gcc/config/arm/mve.md
>> @@ -10561,3 +10561,21 @@ (define_expand "vcond_mask_<mode><MVE_vpred>"
>>       }
>>     DONE;
>>   })
>> +
>> +;; Reinterpret operand 1 in operand 0's mode, without changing its contents.
>> +(define_expand "@arm_mve_reinterpret<mode>"
>> +  [(set (match_operand:MVE_vecs 0 "register_operand")
>> +	(unspec:MVE_vecs
>> +	  [(match_operand 1 "arm_any_register_operand")]
>> +	  REINTERPRET))]
>> +  "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (<MODE>mode))
>> +    || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (<MODE>mode))"
>> +  {
>> +    machine_mode src_mode = GET_MODE (operands[1]);
>> +    if (targetm.can_change_mode_class (<MODE>mode, src_mode, VFP_REGS))
>> +      {
>> +	emit_move_insn (operands[0], gen_lowpart (<MODE>mode,
>> operands[1]));
>> +	DONE;
>> +      }
>> +  }
>> +)
> ... But we still check can_change_mode_class in this pattern and if it's not true we emit the new REINTERPRET unspec
> without a corresponding define_insn pattern. Won't that ICE? Would this case occur on big-endian targets?


Looks like you are right. However, arm_mve.h is protected by:

#if __ARM_BIG_ENDIAN
#error "MVE intrinsics are not supported in Big-Endian mode."
#endif

Just tried to hack my arm_mve.h to accept big-endian, and indeed we do ICE.

In fact, this pattern and vreinterpretq_impl above are quite similar to
the aarch64 implementation.

I tried with a sample:

#include <arm_sve.h>

svint16_t foo(svint8_t value1)
{
  return svreinterpret_s16_s8(value1);
}

and it seems aarch64-none-elf-gcc -march=armv8.2-a+sve -mbig-endian is
OK, although aarch64_can_change_mode_class() has:

if (BYTES_BIG_ENDIAN)
  ...
  if (from_sve_p && GET_MODE_UNIT_SIZE (from) != GET_MODE_UNIT_SIZE (to))
    return false;

so it should have a similar problem? I'm not sure why it doesn't ICE.

Thanks,
Christophe

>
> Thanks,
> Kyrill
>
>> diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
>> index 84384ee798d..dccda283573 100644
>> --- a/gcc/config/arm/unspecs.md
>> +++ b/gcc/config/arm/unspecs.md
>> @@ -1255,4 +1255,5 @@ (define_c_enum "unspec" [
>>     SQRSHRL_64
>>     SQRSHRL_48
>>     VSHLCQ_M_
>> +  REINTERPRET
>>   ])

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 00/22] arm: New framework for MVE intrinsics
  2023-05-02  9:18 ` [PATCH 00/22] arm: New framework for MVE intrinsics Kyrylo Tkachov
@ 2023-05-02 15:04   ` Christophe Lyon
  2023-05-03 15:01     ` Christophe Lyon
  0 siblings, 1 reply; 55+ messages in thread
From: Christophe Lyon @ 2023-05-02 15:04 UTC (permalink / raw)
  To: Kyrylo Tkachov, gcc-patches, Richard Earnshaw, Richard Sandiford



On 5/2/23 11:18, Kyrylo Tkachov wrote:
> Hi Christophe,
> 
>> -----Original Message-----
>> From: Christophe Lyon <christophe.lyon@arm.com>
>> Sent: Tuesday, April 18, 2023 2:46 PM
>> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
>> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
>> <Richard.Sandiford@arm.com>
>> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
>> Subject: [PATCH 00/22] arm: New framework for MVE intrinsics
>>
>> Hi,
>>
>> This is the beginning of a long patch series to change the way Arm MVE
>> intrinsics are implemented. The goal is to get rid of arm_mve.h, which
>> takes a long time to parse and compile.
>>
> 
> Thanks for doing this. It is a significant improvement to the MVE intrinsics and should address some of the biggest maintainability and scalability issues we have in that area.
> I'll be going through the patches one-by-one (I've already looked at these offline), but the approach looks good to me at a high level.
> 
> My hope is that we'll move all the intrinsics, including the Neon ones to use this framework in the future, but getting the framework in place first is a good major first step in that direction.
> 

Indeed. Ideally we'd probably want to make this framework more generic 
so that it supports aarch64 SVE, arm MVE and Neon, but that can be done 
later. I tried to highlight the differences I noticed compared to SVE, 
so that it helps us think what needs to be specialized for different 
targets, as opposed to what is already generic enough.

Thanks,

Christophe

> Thanks,
> Kyrill
> 
>> Roughly speaking, it's about using a framework very similar to what is
>> implemented for AArch64/SVE intrinsics. I haven't converted all the
>> intrinsics yet, but I think it would be good to start the conversion
>> when stage-1 reopens.
>>
>> * Factorizing names
>> One of the main implementation differences I noticed between SVE and
>> MVE is that mve.md provides only full builtin names at the moment, and
>> makes almost no use of "parameterized names"
>> (https://gcc.gnu.org/onlinedocs/gccint/Parameterized-
>> Names.html#Parameterized-Names).
>>
>> Without this, we'd need the builtin expander to use a large
>> switch/case of the form:
>>
>> switch (code)
>> case VADDQ_S: insn_code = code_for_mve_vaddq_s (...)
>> case VADDQ_U: insn_code = code_for_mve_vaddq_u (...)
>> case VSUBQ_S: insn_code = code_for_mve_vsubq_s (...)
>> case VSUBQ_U: insn_code = code_for_mve_vsubq_u (...)
>> ....
>>
>> so part of the work (which I called "factorize" in the commit
>> messages) is about replacing
>>
>> (define_insn "mve_vaddq_n_<supf><mode>"
>> with
>> (define_insn "@mve_<mve_insn>q_n_<supf><mode>"
>> with the help of a new iterator (mve_insn).
>>
>> Doing so makes it more obvious that some patterns are identical,
>> except for the instruction name. I took this opportunity to merge
>> them, so for instance I have a patch which merges add, sub and mul
>> patterns.  Although not strictly necessary for the MVE intrinsics
>> restructuring work, this is a good opportunity to reduce such code
>> duplication (I did notice a few bugs during that process, which led me
>> to post a few small patches in the past months).  Note that identical
>> patterns will probably remain after the series, they can be merged
>> later if we want.
>>
>> This factorization also implies the introduction of new iterators, but
>> also means that several existing ones become useless. These patches do
>> not remove them because it's a bit painful to reorder patches which
>> remove lines at some "random" places, leading to merge conflicts. It's
>> much simpler to write a big cleanup patch at the end of the serie to
>> remove all such useless iterators at once.
>>
>> * Intrinsic re-implementation
>> After intrinsic names have been factorized, the actual
>> re-implementation patch is small:
>> - add 1 line in each of arm-mve-builtins-base.{cc,def,h} describing
>>    the intrinsic shape/signature, types and predicates involved,
>>    RTX/unspec codes
>> - remove the intrinsic definitions from arm_mve.h
>>
>> The full series of ~140 patches is organized like this:
>> - patches 1 and 2 introduce the new framework
>> - new implementation of vreinterpretq
>> - new implementation of vuninitialized
>> - patch groups of varying size, consisting in:
>>    - add a new "shape" if needed (e.g. unary, binary, ternary, ....)
>>    - add framework support functions if needed
>>    - factorize a set of intrinsics (at minimum, just make use of
>>      parameterized-names)
>>    - actual re-implementation of the intrinsics
>>
>> I kept patches small so the incremental progress is easy to follow and
>> check.  I'll submit the patches in small groups, this first one will
>> make sure we agree on the implementation.
>>
>> Tested on arm-eabi with -mthumb/-mfloat-abi=hard/-march=armv8.1-
>> m.main+mve.
>>
>> To help reviewers, I suggest to compare arm-mve-builtins.cc with
>> aarch64-sve-builtins.cc.
>>
>> Christophe Lyon (22):
>>    arm: move builtin function codes into general numberspace
>>    arm: [MVE intrinsics] Add new framework
>>    arm: [MVE intrinsics] Rework vreinterpretq
>>    arm: [MVE intrinsics] Rework vuninitialized
>>    arm: [MVE intrinsics] add binary_opt_n shape
>>    arm: [MVE intrinsics] add unspec_based_mve_function_exact_insn
>>    arm: [MVE intrinsics] factorize vadd vsubq vmulq
>>    arm: [MVE intrinsics] rework vaddq vmulq vsubq
>>    arm: [MVE intrinsics] add binary shape
>>    arm: [MVE intrinsics] factorize vandq veorq vorrq vbicq
>>    arm: [MVE intrinsics] rework vandq veorq
>>    arm: [MVE intrinsics] add binary_orrq shape
>>    arm: [MVE intrinsics] rework vorrq
>>    arm: [MVE intrinsics] add unspec_mve_function_exact_insn
>>    arm: [MVE intrinsics] add create shape
>>    arm: [MVE intrinsics] factorize vcreateq
>>    arm: [MVE intrinsics] rework vcreateq
>>    arm: [MVE intrinsics] factorize several binary_m operations
>>    arm: [MVE intrinsics] factorize several binary _n operations
>>    arm: [MVE intrinsics] factorize several binary _m_n operations
>>    arm: [MVE intrinsics] factorize several binary operations
>>    arm: [MVE intrinsics] rework vhaddq vhsubq vmulhq vqaddq vqsubq
>>      vqdmulhq vrhaddq vrmulhq
>>
>>   gcc/config.gcc                                |    2 +-
>>   gcc/config/arm/arm-builtins.cc                |  237 +-
>>   gcc/config/arm/arm-builtins.h                 |    1 +
>>   gcc/config/arm/arm-c.cc                       |   42 +-
>>   gcc/config/arm/arm-mve-builtins-base.cc       |  163 +
>>   gcc/config/arm/arm-mve-builtins-base.def      |   50 +
>>   gcc/config/arm/arm-mve-builtins-base.h        |   47 +
>>   gcc/config/arm/arm-mve-builtins-functions.h   |  387 +
>>   gcc/config/arm/arm-mve-builtins-shapes.cc     |  529 ++
>>   gcc/config/arm/arm-mve-builtins-shapes.h      |   47 +
>>   gcc/config/arm/arm-mve-builtins.cc            | 2013 ++++-
>>   gcc/config/arm/arm-mve-builtins.def           |   40 +-
>>   gcc/config/arm/arm-mve-builtins.h             |  672 +-
>>   gcc/config/arm/arm-protos.h                   |   24 +
>>   gcc/config/arm/arm.cc                         |   27 +
>>   gcc/config/arm/arm_mve.h                      | 7581 +----------------
>>   gcc/config/arm/arm_mve_builtins.def           |    6 -
>>   gcc/config/arm/arm_mve_types.h                | 1430 ----
>>   gcc/config/arm/iterators.md                   |  240 +-
>>   gcc/config/arm/mve.md                         | 1747 +---
>>   gcc/config/arm/predicates.md                  |    4 +
>>   gcc/config/arm/t-arm                          |   32 +-
>>   gcc/config/arm/unspecs.md                     |    1 +
>>   gcc/config/arm/vec-common.md                  |    8 +-
>>   gcc/testsuite/g++.target/arm/mve.exp          |    8 +-
>>   .../arm/mve/general-c++/nomve_fp_1.c          |   15 +
>>   .../arm/mve/general-c++/vreinterpretq_1.C     |   25 +
>>   .../gcc.target/arm/mve/general-c/nomve_fp_1.c |   15 +
>>   .../arm/mve/general-c/vreinterpretq_1.c       |   25 +
>>   29 files changed, 4926 insertions(+), 10492 deletions(-)
>>   create mode 100644 gcc/config/arm/arm-mve-builtins-base.cc
>>   create mode 100644 gcc/config/arm/arm-mve-builtins-base.def
>>   create mode 100644 gcc/config/arm/arm-mve-builtins-base.h
>>   create mode 100644 gcc/config/arm/arm-mve-builtins-functions.h
>>   create mode 100644 gcc/config/arm/arm-mve-builtins-shapes.cc
>>   create mode 100644 gcc/config/arm/arm-mve-builtins-shapes.h
>>   create mode 100644 gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c
>>   create mode 100644 gcc/testsuite/g++.target/arm/mve/general-c++/vreinterpretq_1.C
>>   create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/nomve_fp_1.c
>>   create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/vreinterpretq_1.c
>>
>> --
>> 2.34.1
> 


* RE: [PATCH 03/22] arm: [MVE intrinsics] Rework vreinterpretq
  2023-05-02 14:05     ` Christophe Lyon
@ 2023-05-02 15:28       ` Kyrylo Tkachov
  2023-05-02 15:49         ` Christophe Lyon
  0 siblings, 1 reply; 55+ messages in thread
From: Kyrylo Tkachov @ 2023-05-02 15:28 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Earnshaw, Richard Sandiford



> -----Original Message-----
> From: Christophe Lyon <Christophe.Lyon@arm.com>
> Sent: Tuesday, May 2, 2023 3:05 PM
> To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; gcc-patches@gcc.gnu.org;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Subject: Re: [PATCH 03/22] arm: [MVE intrinsics] Rework vreinterpretq
> 
> 
> 
> 
> On 5/2/23 12:26, Kyrylo Tkachov wrote:
> 
> 
> 	Hi Christophe,
> 
> 
> 		-----Original Message-----
> 		From: Christophe Lyon <christophe.lyon@arm.com>
> 		Sent: Tuesday, April 18, 2023 2:46 PM
> 		To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
> 		Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
> 		<Richard.Sandiford@arm.com>
> 		Cc: Christophe Lyon <Christophe.Lyon@arm.com>
> 		Subject: [PATCH 03/22] arm: [MVE intrinsics] Rework vreinterpretq
> 
> 		This patch implements vreinterpretq using the new MVE intrinsics framework.
> 
> 		The old definitions for vreinterpretq are removed as a consequence.
> 
> 		2022-09-08  Murray Steele  <murray.steele@arm.com>
> 			    Christophe Lyon  <christophe.lyon@arm.com>
> 
> 			gcc/
> 			* config/arm/arm-mve-builtins-base.cc (vreinterpretq_impl): New class.
> 			* config/arm/arm-mve-builtins-base.def: Define vreinterpretq.
> 			* config/arm/arm-mve-builtins-base.h (vreinterpretq): New declaration.
> 			* config/arm/arm-mve-builtins-shapes.cc (parse_element_type): New function.
> 			(parse_type): Likewise.
> 			(parse_signature): Likewise.
> 			(build_one): Likewise.
> 			(build_all): Likewise.
> 			(overloaded_base): New struct.
> 			(unary_convert_def): Likewise.
> 			* config/arm/arm-mve-builtins-shapes.h (unary_convert): Declare.
> 			* config/arm/arm-mve-builtins.cc (TYPES_reinterpret_signed1): New macro.
> 			(TYPES_reinterpret_unsigned1): Likewise.
> 			(TYPES_reinterpret_integer): Likewise.
> 			(TYPES_reinterpret_integer1): Likewise.
> 			(TYPES_reinterpret_float1): Likewise.
> 			(TYPES_reinterpret_float): Likewise.
> 			(reinterpret_integer): New.
> 			(reinterpret_float): New.
> 			(handle_arm_mve_h): Register builtins.
> 			* config/arm/arm_mve.h (vreinterpretq_s16): Remove.
> 			(vreinterpretq_s32): Likewise.
> 			(vreinterpretq_s64): Likewise.
> 			(vreinterpretq_s8): Likewise.
> 			(vreinterpretq_u16): Likewise.
> 			(vreinterpretq_u32): Likewise.
> 			(vreinterpretq_u64): Likewise.
> 			(vreinterpretq_u8): Likewise.
> 			(vreinterpretq_f16): Likewise.
> 			(vreinterpretq_f32): Likewise.
> 			(vreinterpretq_s16_s32): Likewise.
> 			(vreinterpretq_s16_s64): Likewise.
> 			(vreinterpretq_s16_s8): Likewise.
> 			(vreinterpretq_s16_u16): Likewise.
> 			(vreinterpretq_s16_u32): Likewise.
> 			(vreinterpretq_s16_u64): Likewise.
> 			(vreinterpretq_s16_u8): Likewise.
> 			(vreinterpretq_s32_s16): Likewise.
> 			(vreinterpretq_s32_s64): Likewise.
> 			(vreinterpretq_s32_s8): Likewise.
> 			(vreinterpretq_s32_u16): Likewise.
> 			(vreinterpretq_s32_u32): Likewise.
> 			(vreinterpretq_s32_u64): Likewise.
> 			(vreinterpretq_s32_u8): Likewise.
> 			(vreinterpretq_s64_s16): Likewise.
> 			(vreinterpretq_s64_s32): Likewise.
> 			(vreinterpretq_s64_s8): Likewise.
> 			(vreinterpretq_s64_u16): Likewise.
> 			(vreinterpretq_s64_u32): Likewise.
> 			(vreinterpretq_s64_u64): Likewise.
> 			(vreinterpretq_s64_u8): Likewise.
> 			(vreinterpretq_s8_s16): Likewise.
> 			(vreinterpretq_s8_s32): Likewise.
> 			(vreinterpretq_s8_s64): Likewise.
> 			(vreinterpretq_s8_u16): Likewise.
> 			(vreinterpretq_s8_u32): Likewise.
> 			(vreinterpretq_s8_u64): Likewise.
> 			(vreinterpretq_s8_u8): Likewise.
> 			(vreinterpretq_u16_s16): Likewise.
> 			(vreinterpretq_u16_s32): Likewise.
> 			(vreinterpretq_u16_s64): Likewise.
> 			(vreinterpretq_u16_s8): Likewise.
> 			(vreinterpretq_u16_u32): Likewise.
> 			(vreinterpretq_u16_u64): Likewise.
> 			(vreinterpretq_u16_u8): Likewise.
> 			(vreinterpretq_u32_s16): Likewise.
> 			(vreinterpretq_u32_s32): Likewise.
> 			(vreinterpretq_u32_s64): Likewise.
> 			(vreinterpretq_u32_s8): Likewise.
> 			(vreinterpretq_u32_u16): Likewise.
> 			(vreinterpretq_u32_u64): Likewise.
> 			(vreinterpretq_u32_u8): Likewise.
> 			(vreinterpretq_u64_s16): Likewise.
> 			(vreinterpretq_u64_s32): Likewise.
> 			(vreinterpretq_u64_s64): Likewise.
> 			(vreinterpretq_u64_s8): Likewise.
> 			(vreinterpretq_u64_u16): Likewise.
> 			(vreinterpretq_u64_u32): Likewise.
> 			(vreinterpretq_u64_u8): Likewise.
> 			(vreinterpretq_u8_s16): Likewise.
> 			(vreinterpretq_u8_s32): Likewise.
> 			(vreinterpretq_u8_s64): Likewise.
> 			(vreinterpretq_u8_s8): Likewise.
> 			(vreinterpretq_u8_u16): Likewise.
> 			(vreinterpretq_u8_u32): Likewise.
> 			(vreinterpretq_u8_u64): Likewise.
> 			(vreinterpretq_s32_f16): Likewise.
> 			(vreinterpretq_s32_f32): Likewise.
> 			(vreinterpretq_u16_f16): Likewise.
> 			(vreinterpretq_u16_f32): Likewise.
> 			(vreinterpretq_u32_f16): Likewise.
> 			(vreinterpretq_u32_f32): Likewise.
> 			(vreinterpretq_u64_f16): Likewise.
> 			(vreinterpretq_u64_f32): Likewise.
> 			(vreinterpretq_u8_f16): Likewise.
> 			(vreinterpretq_u8_f32): Likewise.
> 			(vreinterpretq_f16_f32): Likewise.
> 			(vreinterpretq_f16_s16): Likewise.
> 			(vreinterpretq_f16_s32): Likewise.
> 			(vreinterpretq_f16_s64): Likewise.
> 			(vreinterpretq_f16_s8): Likewise.
> 			(vreinterpretq_f16_u16): Likewise.
> 			(vreinterpretq_f16_u32): Likewise.
> 			(vreinterpretq_f16_u64): Likewise.
> 			(vreinterpretq_f16_u8): Likewise.
> 			(vreinterpretq_f32_f16): Likewise.
> 			(vreinterpretq_f32_s16): Likewise.
> 			(vreinterpretq_f32_s32): Likewise.
> 			(vreinterpretq_f32_s64): Likewise.
> 			(vreinterpretq_f32_s8): Likewise.
> 			(vreinterpretq_f32_u16): Likewise.
> 			(vreinterpretq_f32_u32): Likewise.
> 			(vreinterpretq_f32_u64): Likewise.
> 			(vreinterpretq_f32_u8): Likewise.
> 			(vreinterpretq_s16_f16): Likewise.
> 			(vreinterpretq_s16_f32): Likewise.
> 			(vreinterpretq_s64_f16): Likewise.
> 			(vreinterpretq_s64_f32): Likewise.
> 			(vreinterpretq_s8_f16): Likewise.
> 			(vreinterpretq_s8_f32): Likewise.
> 			(__arm_vreinterpretq_f16): Likewise.
> 			(__arm_vreinterpretq_f32): Likewise.
> 			(__arm_vreinterpretq_s16): Likewise.
> 			(__arm_vreinterpretq_s32): Likewise.
> 			(__arm_vreinterpretq_s64): Likewise.
> 			(__arm_vreinterpretq_s8): Likewise.
> 			(__arm_vreinterpretq_u16): Likewise.
> 			(__arm_vreinterpretq_u32): Likewise.
> 			(__arm_vreinterpretq_u64): Likewise.
> 			(__arm_vreinterpretq_u8): Likewise.
> 			* config/arm/arm_mve_types.h (__arm_vreinterpretq_s16_s32): Remove.
> 			(__arm_vreinterpretq_s16_s64): Likewise.
> 			(__arm_vreinterpretq_s16_s8): Likewise.
> 			(__arm_vreinterpretq_s16_u16): Likewise.
> 			(__arm_vreinterpretq_s16_u32): Likewise.
> 			(__arm_vreinterpretq_s16_u64): Likewise.
> 			(__arm_vreinterpretq_s16_u8): Likewise.
> 			(__arm_vreinterpretq_s32_s16): Likewise.
> 			(__arm_vreinterpretq_s32_s64): Likewise.
> 			(__arm_vreinterpretq_s32_s8): Likewise.
> 			(__arm_vreinterpretq_s32_u16): Likewise.
> 			(__arm_vreinterpretq_s32_u32): Likewise.
> 			(__arm_vreinterpretq_s32_u64): Likewise.
> 			(__arm_vreinterpretq_s32_u8): Likewise.
> 			(__arm_vreinterpretq_s64_s16): Likewise.
> 			(__arm_vreinterpretq_s64_s32): Likewise.
> 			(__arm_vreinterpretq_s64_s8): Likewise.
> 			(__arm_vreinterpretq_s64_u16): Likewise.
> 			(__arm_vreinterpretq_s64_u32): Likewise.
> 			(__arm_vreinterpretq_s64_u64): Likewise.
> 			(__arm_vreinterpretq_s64_u8): Likewise.
> 			(__arm_vreinterpretq_s8_s16): Likewise.
> 			(__arm_vreinterpretq_s8_s32): Likewise.
> 			(__arm_vreinterpretq_s8_s64): Likewise.
> 			(__arm_vreinterpretq_s8_u16): Likewise.
> 			(__arm_vreinterpretq_s8_u32): Likewise.
> 			(__arm_vreinterpretq_s8_u64): Likewise.
> 			(__arm_vreinterpretq_s8_u8): Likewise.
> 			(__arm_vreinterpretq_u16_s16): Likewise.
> 			(__arm_vreinterpretq_u16_s32): Likewise.
> 			(__arm_vreinterpretq_u16_s64): Likewise.
> 			(__arm_vreinterpretq_u16_s8): Likewise.
> 			(__arm_vreinterpretq_u16_u32): Likewise.
> 			(__arm_vreinterpretq_u16_u64): Likewise.
> 			(__arm_vreinterpretq_u16_u8): Likewise.
> 			(__arm_vreinterpretq_u32_s16): Likewise.
> 			(__arm_vreinterpretq_u32_s32): Likewise.
> 			(__arm_vreinterpretq_u32_s64): Likewise.
> 			(__arm_vreinterpretq_u32_s8): Likewise.
> 			(__arm_vreinterpretq_u32_u16): Likewise.
> 			(__arm_vreinterpretq_u32_u64): Likewise.
> 			(__arm_vreinterpretq_u32_u8): Likewise.
> 			(__arm_vreinterpretq_u64_s16): Likewise.
> 			(__arm_vreinterpretq_u64_s32): Likewise.
> 			(__arm_vreinterpretq_u64_s64): Likewise.
> 			(__arm_vreinterpretq_u64_s8): Likewise.
> 			(__arm_vreinterpretq_u64_u16): Likewise.
> 			(__arm_vreinterpretq_u64_u32): Likewise.
> 			(__arm_vreinterpretq_u64_u8): Likewise.
> 			(__arm_vreinterpretq_u8_s16): Likewise.
> 			(__arm_vreinterpretq_u8_s32): Likewise.
> 			(__arm_vreinterpretq_u8_s64): Likewise.
> 			(__arm_vreinterpretq_u8_s8): Likewise.
> 			(__arm_vreinterpretq_u8_u16): Likewise.
> 			(__arm_vreinterpretq_u8_u32): Likewise.
> 			(__arm_vreinterpretq_u8_u64): Likewise.
> 			(__arm_vreinterpretq_s32_f16): Likewise.
> 			(__arm_vreinterpretq_s32_f32): Likewise.
> 			(__arm_vreinterpretq_s16_f16): Likewise.
> 			(__arm_vreinterpretq_s16_f32): Likewise.
> 			(__arm_vreinterpretq_s64_f16): Likewise.
> 			(__arm_vreinterpretq_s64_f32): Likewise.
> 			(__arm_vreinterpretq_s8_f16): Likewise.
> 			(__arm_vreinterpretq_s8_f32): Likewise.
> 			(__arm_vreinterpretq_u16_f16): Likewise.
> 			(__arm_vreinterpretq_u16_f32): Likewise.
> 			(__arm_vreinterpretq_u32_f16): Likewise.
> 			(__arm_vreinterpretq_u32_f32): Likewise.
> 			(__arm_vreinterpretq_u64_f16): Likewise.
> 			(__arm_vreinterpretq_u64_f32): Likewise.
> 			(__arm_vreinterpretq_u8_f16): Likewise.
> 			(__arm_vreinterpretq_u8_f32): Likewise.
> 			(__arm_vreinterpretq_f16_f32): Likewise.
> 			(__arm_vreinterpretq_f16_s16): Likewise.
> 			(__arm_vreinterpretq_f16_s32): Likewise.
> 			(__arm_vreinterpretq_f16_s64): Likewise.
> 			(__arm_vreinterpretq_f16_s8): Likewise.
> 			(__arm_vreinterpretq_f16_u16): Likewise.
> 			(__arm_vreinterpretq_f16_u32): Likewise.
> 			(__arm_vreinterpretq_f16_u64): Likewise.
> 			(__arm_vreinterpretq_f16_u8): Likewise.
> 			(__arm_vreinterpretq_f32_f16): Likewise.
> 			(__arm_vreinterpretq_f32_s16): Likewise.
> 			(__arm_vreinterpretq_f32_s32): Likewise.
> 			(__arm_vreinterpretq_f32_s64): Likewise.
> 			(__arm_vreinterpretq_f32_s8): Likewise.
> 			(__arm_vreinterpretq_f32_u16): Likewise.
> 			(__arm_vreinterpretq_f32_u32): Likewise.
> 			(__arm_vreinterpretq_f32_u64): Likewise.
> 			(__arm_vreinterpretq_f32_u8): Likewise.
> 			(__arm_vreinterpretq_s16): Likewise.
> 			(__arm_vreinterpretq_s32): Likewise.
> 			(__arm_vreinterpretq_s64): Likewise.
> 			(__arm_vreinterpretq_s8): Likewise.
> 			(__arm_vreinterpretq_u16): Likewise.
> 			(__arm_vreinterpretq_u32): Likewise.
> 			(__arm_vreinterpretq_u64): Likewise.
> 			(__arm_vreinterpretq_u8): Likewise.
> 			(__arm_vreinterpretq_f16): Likewise.
> 			(__arm_vreinterpretq_f32): Likewise.
> 			* config/arm/mve.md (@arm_mve_reinterpret<mode>): New pattern.
> 			* config/arm/unspecs.md (REINTERPRET): New unspec.
> 
> 			gcc/testsuite/
> 			* g++.target/arm/mve.exp: Add general-c++ and general directories.
> 			* g++.target/arm/mve/general-c++/nomve_fp_1.c: New test.
> 			* g++.target/arm/mve/general-c++/vreinterpretq_1.C: New test.
> 			* gcc.target/arm/mve/general-c/nomve_fp_1.c: New test.
> 			* gcc.target/arm/mve/general-c/vreinterpretq_1.c: New test.
> 		---
> 		 gcc/config/arm/arm-mve-builtins-base.cc       |   29 +
> 		 gcc/config/arm/arm-mve-builtins-base.def      |    2 +
> 		 gcc/config/arm/arm-mve-builtins-base.h        |    2 +
> 		 gcc/config/arm/arm-mve-builtins-shapes.cc     |   28 +
> 		 gcc/config/arm/arm-mve-builtins-shapes.h      |    8 +
> 		 gcc/config/arm/arm-mve-builtins.cc            |   60 +
> 		 gcc/config/arm/arm_mve.h                      |  300 ----
> 		 gcc/config/arm/arm_mve_types.h                | 1365 +----------------
> 		 gcc/config/arm/mve.md                         |   18 +
> 		 gcc/config/arm/unspecs.md                     |    1 +
> 		 gcc/testsuite/g++.target/arm/mve.exp          |    8 +-
> 		 .../arm/mve/general-c++/nomve_fp_1.c          |   15 +
> 		 .../arm/mve/general-c++/vreinterpretq_1.C     |   25 +
> 		 .../gcc.target/arm/mve/general-c/nomve_fp_1.c |   15 +
> 		 .../arm/mve/general-c/vreinterpretq_1.c       |   25 +
> 		 15 files changed, 286 insertions(+), 1615 deletions(-)
> 		 create mode 100644 gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c
> 		 create mode 100644 gcc/testsuite/g++.target/arm/mve/general-c++/vreinterpretq_1.C
> 		 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/nomve_fp_1.c
> 		 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/vreinterpretq_1.c
> 
> 		diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
> 		index e9f285faf2b..ad8d500afc6 100644
> 		--- a/gcc/config/arm/arm-mve-builtins-base.cc
> 		+++ b/gcc/config/arm/arm-mve-builtins-base.cc
> 		@@ -38,8 +38,37 @@ using namespace arm_mve;
> 
> 		 namespace {
> 
> 		+/* Implements vreinterpretq_* intrinsics.  */
> 		+class vreinterpretq_impl : public quiet<function_base>
> 		+{
> 		+  gimple *
> 		+  fold (gimple_folder &f) const override
> 		+  {
> 		+    /* Punt to rtl if the effect of the reinterpret on registers does not
> 		+       conform to GCC's endianness model.  */
> 		+    if (!targetm.can_change_mode_class (f.vector_mode (0),
> 		+					f.vector_mode (1), VFP_REGS))
> 		+      return NULL;
> 		+
> 
> 
> 	So we punt to an RTL pattern here if we cannot change mode class...
> 
> 	[snip]
> 
> 
> 		diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> 		index 35eab6c94bf..ab688396f97 100644
> 		--- a/gcc/config/arm/mve.md
> 		+++ b/gcc/config/arm/mve.md
> 		@@ -10561,3 +10561,21 @@ (define_expand "vcond_mask_<mode><MVE_vpred>"
> 		     }
> 		   DONE;
> 		 })
> 		+
> 		+;; Reinterpret operand 1 in operand 0's mode, without changing its contents.
> 		+(define_expand "@arm_mve_reinterpret<mode>"
> 		+  [(set (match_operand:MVE_vecs 0 "register_operand")
> 		+	(unspec:MVE_vecs
> 		+	  [(match_operand 1 "arm_any_register_operand")]
> 		+	  REINTERPRET))]
> 		+  "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (<MODE>mode))
> 		+    || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (<MODE>mode))"
> 		+  {
> 		+    machine_mode src_mode = GET_MODE (operands[1]);
> 		+    if (targetm.can_change_mode_class (<MODE>mode, src_mode, VFP_REGS))
> 		+      {
> 		+	emit_move_insn (operands[0], gen_lowpart (<MODE>mode, operands[1]));
> 		+	DONE;
> 		+      }
> 		+  }
> 		+)
> 
> 
> 	... But we still check can_change_mode_class in this pattern, and if
> 	it's not true we emit the new REINTERPRET unspec without a
> 	corresponding define_insn pattern.  Won't that ICE?  Would this case
> 	occur on big-endian targets?
> 
> Looks like you are right. However, arm_mve.h is protected by:
> 
>   #if __ARM_BIG_ENDIAN
>   #error "MVE intrinsics are not supported in Big-Endian mode."
> 
> Just tried to hack my arm_mve.h to accept big-endian, and indeed we do ICE.
> 
> In fact, this pattern and vreinterpretq_impl above are quite similar to
> the aarch64 implementation.
> 
> I tried with a sample
> 
>   svint16_t foo (svint8_t value1)
>   {
>     return svreinterpret_s16_s8 (value1);
>   }
> 
> and it seems aarch64-none-elf-gcc -march=armv8.2-a+sve -mbig-endian is OK,
> although aarch64_can_change_mode_class() has:
> 
>   if (BYTES_BIG_ENDIAN)
>     ...
>     if (from_sve_p && GET_MODE_UNIT_SIZE (from) != GET_MODE_UNIT_SIZE (to))
>       return false;
> 
> so it should have a similar problem? I'm not sure why it doesn't ICE?

I believe that's because there's a pattern in aarch64-sve.md that converts everything into a simple set with the right modes forced in.

;; A pattern for handling type punning on big-endian targets.  We use a
;; special predicate for operand 1 to reduce the number of patterns.
(define_insn_and_split "*aarch64_sve_reinterpret<mode>"
  [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
        (unspec:SVE_ALL
          [(match_operand 1 "aarch64_any_register_operand" "w")]
          UNSPEC_REINTERPRET))]
  "TARGET_SVE"
  "#"
  "&& reload_completed"
  [(set (match_dup 0) (match_dup 1))]
  {
    operands[1] = aarch64_replace_reg_mode (operands[1], <MODE>mode);
  }
)

I guess since we don't claim to support big-endian MVE for now, we probably don't need to handle it, but I wonder whether we should instead
be asserting that targetm.can_change_mode_class is true in the folding code, and adding a comment that for future big-endian support it should be handled properly in the .md file, as on aarch64?
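
For illustration, the suggested assert could look something like this in
vreinterpretq_impl::fold (a sketch only, reusing the names from the hunk
quoted above, not the final patch):

  gimple *
  fold (gimple_folder &f) const override
  {
    /* Until big-endian MVE is supported this is expected to always hold;
       when adding big-endian support, handle the failing case in the .md
       file as aarch64 does, instead of asserting here.  */
    gcc_assert (targetm.can_change_mode_class (f.vector_mode (0),
                                               f.vector_mode (1),
                                               VFP_REGS));
    /* ... fold to a plain mode change, as before ...  */
  }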

Thanks,
Kyrill


> Thanks,
> Christophe
> 
> 
> 
> 	Thanks,
> 	Kyrill
> 
> 
> 		diff --git a/gcc/config/arm/unspecs.md
> b/gcc/config/arm/unspecs.md
> 		index 84384ee798d..dccda283573 100644
> 		--- a/gcc/config/arm/unspecs.md
> 		+++ b/gcc/config/arm/unspecs.md
> 		@@ -1255,4 +1255,5 @@ (define_c_enum "unspec" [
> 		   SQRSHRL_64
> 		   SQRSHRL_48
> 		   VSHLCQ_M_
> 		+  REINTERPRET
> 		 ])
> 
> 



* Re: [PATCH 03/22] arm: [MVE intrinsics] Rework vreinterpretq
  2023-05-02 15:28       ` Kyrylo Tkachov
@ 2023-05-02 15:49         ` Christophe Lyon
  2023-05-03 14:37           ` [PATCH v2 " Christophe Lyon
  0 siblings, 1 reply; 55+ messages in thread
From: Christophe Lyon @ 2023-05-02 15:49 UTC (permalink / raw)
  To: Kyrylo Tkachov, gcc-patches, Richard Earnshaw, Richard Sandiford



On 5/2/23 17:28, Kyrylo Tkachov wrote:
> 
> 
>> -----Original Message-----
>> From: Christophe Lyon <Christophe.Lyon@arm.com>
>> Sent: Tuesday, May 2, 2023 3:05 PM
>> To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; gcc-patches@gcc.gnu.org;
>> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
>> <Richard.Sandiford@arm.com>
>> Subject: Re: [PATCH 03/22] arm: [MVE intrinsics] Rework vreinterpretq
>>
>>
>>
>>
>> On 5/2/23 12:26, Kyrylo Tkachov wrote:
>>
>> [snip]
>>
>> 	... But we still check can_change_mode_class in this pattern, and if
>> 	it's not true we emit the new REINTERPRET unspec without a
>> 	corresponding define_insn pattern.  Won't that ICE?  Would this case
>> 	occur on big-endian targets?
>>
>> Looks like you are right. However, arm_mve.h is protected by:
>>
>>   #if __ARM_BIG_ENDIAN
>>   #error "MVE intrinsics are not supported in Big-Endian mode."
>>
>> Just tried to hack my arm_mve.h to accept big-endian, and indeed we do ICE.
>>
>> In fact, this pattern and vreinterpretq_impl above are quite similar to
>> the aarch64 implementation.
>>
>> I tried with a sample
>>
>>   svint16_t foo (svint8_t value1)
>>   {
>>     return svreinterpret_s16_s8 (value1);
>>   }
>>
>> and it seems aarch64-none-elf-gcc -march=armv8.2-a+sve -mbig-endian is OK,
>> although aarch64_can_change_mode_class() has:
>>
>>   if (BYTES_BIG_ENDIAN)
>>     ...
>>     if (from_sve_p && GET_MODE_UNIT_SIZE (from) != GET_MODE_UNIT_SIZE (to))
>>       return false;
>>
>> so it should have a similar problem? I'm not sure why it doesn't ICE?
> 
> I believe that's because there's a pattern in aarch64-sve.md that converts everything into a simple set with the right modes forced in.
> 
> ;; A pattern for handling type punning on big-endian targets.  We use a
> ;; special predicate for operand 1 to reduce the number of patterns.
> (define_insn_and_split "*aarch64_sve_reinterpret<mode>"
>    [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
>          (unspec:SVE_ALL
>            [(match_operand 1 "aarch64_any_register_operand" "w")]
>            UNSPEC_REINTERPRET))]
>    "TARGET_SVE"
>    "#"
>    "&& reload_completed"
>    [(set (match_dup 0) (match_dup 1))]
>    {
>      operands[1] = aarch64_replace_reg_mode (operands[1], <MODE>mode);
>    }
> )
> 
Ha, right, thanks.

> I guess since we don't claim to support big-endian MVE for now, we probably don't need to handle it, but I wonder whether we should instead
> be asserting that targetm.can_change_mode_class is true in the folding code, and adding a comment that for future big-endian support it should be handled properly in the .md file, as on aarch64?
> 

Sure, I can easily add an assert, will do for v2.

Thanks,

Christophe

> Thanks,
> Kyrill


* RE: [PATCH 04/22] arm: [MVE intrinsics] Rework vuninitialized
  2023-04-18 13:45 ` [PATCH 04/22] arm: [MVE intrinsics] Rework vuninitialized Christophe Lyon
@ 2023-05-02 16:13   ` Kyrylo Tkachov
  0 siblings, 0 replies; 55+ messages in thread
From: Kyrylo Tkachov @ 2023-05-02 16:13 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Earnshaw, Richard Sandiford
  Cc: Christophe Lyon



> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@arm.com>
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
> Subject: [PATCH 04/22] arm: [MVE intrinsics] Rework vuninitialized
> 
> Implement vuninitialized using the new MVE builtins framework.
> 
> We need to keep the overloaded __arm_vuninitializedq definitions
> because their resolution depends on the result type only, which is not
> currently supported by the resolver.
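
(As a concrete illustration, usage sketch only: the overloaded form is
selected through a dummy argument whose only job is to carry the desired
result type,

  int32x4_t v = __arm_vuninitializedq (v);  /* dummy argument picks the s32 variant */

which is why the explicit __arm_vuninitializedq overloads stay in arm_mve.h
until the resolver can dispatch on the result type.)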

Ok.
Thanks,
Kyrill

> 
> 2022-09-08  Murray Steele  <murray.steele@arm.com>
> 	    Christophe Lyon  <christophe.lyon@arm.com>
> 
> gcc/ChangeLog:
> 
> 	* config/arm/arm-mve-builtins-base.cc (class
> 	vuninitializedq_impl): New.
> 	* config/arm/arm-mve-builtins-base.def (vuninitializedq): New.
> 	* config/arm/arm-mve-builtins-base.h (vuninitializedq): New
> 	declaration.
> 	* config/arm/arm-mve-builtins-shapes.cc	(inherent): New.
> 	* config/arm/arm-mve-builtins-shapes.h (inherent): New
> 	declaration.
> 	* config/arm/arm_mve_types.h (__arm_vuninitializedq): Move to ...
> 	* config/arm/arm_mve.h (__arm_vuninitializedq): ... here.
> 	(__arm_vuninitializedq_u8): Remove.
> 	(__arm_vuninitializedq_u16): Remove.
> 	(__arm_vuninitializedq_u32): Remove.
> 	(__arm_vuninitializedq_u64): Remove.
> 	(__arm_vuninitializedq_s8): Remove.
> 	(__arm_vuninitializedq_s16): Remove.
> 	(__arm_vuninitializedq_s32): Remove.
> 	(__arm_vuninitializedq_s64): Remove.
> 	(__arm_vuninitializedq_f16): Remove.
> 	(__arm_vuninitializedq_f32): Remove.
> ---
>  gcc/config/arm/arm-mve-builtins-base.cc   |  14 ++
>  gcc/config/arm/arm-mve-builtins-base.def  |   2 +
>  gcc/config/arm/arm-mve-builtins-base.h    |   1 +
>  gcc/config/arm/arm-mve-builtins-shapes.cc |  16 ++
>  gcc/config/arm/arm-mve-builtins-shapes.h  |   7 +-
>  gcc/config/arm/arm_mve.h                  |  73 ++++++++++
>  gcc/config/arm/arm_mve_types.h            | 169 ----------------------
>  7 files changed, 112 insertions(+), 170 deletions(-)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
> index ad8d500afc6..02a3b23865c 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.cc
> +++ b/gcc/config/arm/arm-mve-builtins-base.cc
> @@ -65,10 +65,24 @@ class vreinterpretq_impl : public quiet<function_base>
>    }
>  };
> 
> +/* Implements vuninitializedq_* intrinsics.  */
> +class vuninitializedq_impl : public quiet<function_base>
> +{
> +
> +  rtx
> +  expand (function_expander &e) const override
> +  {
> +    rtx target = e.get_reg_target ();
> +    emit_clobber (copy_rtx (target));
> +    return target;
> +  }
> +};
> +
>  } /* end anonymous namespace */
> 
>  namespace arm_mve {
> 
>  FUNCTION (vreinterpretq, vreinterpretq_impl,)
> +FUNCTION (vuninitializedq, vuninitializedq_impl,)
> 
>  } /* end namespace arm_mve */
> diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
> index 5c0c1b9cee7..f669642a259 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.def
> +++ b/gcc/config/arm/arm-mve-builtins-base.def
> @@ -19,8 +19,10 @@
> 
>  #define REQUIRES_FLOAT false
> +DEF_MVE_FUNCTION (vreinterpretq, unary_convert, reinterpret_integer, none)
> +DEF_MVE_FUNCTION (vuninitializedq, inherent, all_integer_with_64, none)
>  #undef REQUIRES_FLOAT
> 
>  #define REQUIRES_FLOAT true
>  DEF_MVE_FUNCTION (vreinterpretq, unary_convert, reinterpret_float, none)
> +DEF_MVE_FUNCTION (vuninitializedq, inherent, all_float, none)
>  #undef REQUIRES_FLOAT
> diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h
> index 60e7bd24eda..ec309cbe572 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.h
> +++ b/gcc/config/arm/arm-mve-builtins-base.h
> @@ -24,6 +24,7 @@ namespace arm_mve {
>  namespace functions {
> 
>  extern const function_base *const vreinterpretq;
> +extern const function_base *const vuninitializedq;
> 
>  } /* end namespace arm_mve::functions */
>  } /* end namespace arm_mve */
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
> index d0da0ffef91..ce476aa196e 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.cc
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
> @@ -338,6 +338,22 @@ struct overloaded_base : public function_shape
>    }
>  };
> 
> +/* <T0>[xN]_t vfoo_t0().
> +
> +   Example: vuninitializedq.
> +   int8x16_t [__arm_]vuninitializedq_s8(void)
> +   int8x16_t [__arm_]vuninitializedq(int8x16_t t)  */
> +struct inherent_def : public nonoverloaded_base
> +{
> +  void
> +  build (function_builder &b, const function_group_info &group,
> +	 bool preserve_user_namespace) const override
> +  {
> +    build_all (b, "t0", group, MODE_none, preserve_user_namespace);
> +  }
> +};
> +SHAPE (inherent)
> +
>  /* <T0>_t foo_t0[_t1](<T1>_t)
> 
>     where the target type <t0> must be specified explicitly but the source
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
> index 04d19a02890..a491369425c 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.h
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.h
> @@ -25,11 +25,16 @@ namespace arm_mve
>    /* The naming convention is:
> 
>       - to use names like "unary" etc. if the rules are somewhat generic,
> -       especially if there are no ranges involved.  */
> +       especially if there are no ranges involved.
> +
> +     Also:
> +
> +     - "inherent" means that the function takes no arguments.  */
> 
>    namespace shapes
>    {
> 
> +    extern const function_shape *const inherent;
>      extern const function_shape *const unary_convert;
> 
>    } /* end namespace arm_mve::shapes */
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index 7688b5a7e53..5dc5ecef134 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -35382,6 +35382,79 @@ __arm_vgetq_lane (float32x4_t __a, const int __idx)
>  }
>  #endif /* MVE Floating point.  */
> 
> +
> +__extension__ extern __inline uint8x16_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vuninitializedq (uint8x16_t /* __v ATTRIBUTE UNUSED */)
> +{
> + return __arm_vuninitializedq_u8 ();
> +}
> +
> +__extension__ extern __inline uint16x8_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vuninitializedq (uint16x8_t /* __v ATTRIBUTE UNUSED */)
> +{
> + return __arm_vuninitializedq_u16 ();
> +}
> +
> +__extension__ extern __inline uint32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vuninitializedq (uint32x4_t /* __v ATTRIBUTE UNUSED */)
> +{
> + return __arm_vuninitializedq_u32 ();
> +}
> +
> +__extension__ extern __inline uint64x2_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vuninitializedq (uint64x2_t /* __v ATTRIBUTE UNUSED */)
> +{
> + return __arm_vuninitializedq_u64 ();
> +}
> +
> +__extension__ extern __inline int8x16_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vuninitializedq (int8x16_t /* __v ATTRIBUTE UNUSED */)
> +{
> + return __arm_vuninitializedq_s8 ();
> +}
> +
> +__extension__ extern __inline int16x8_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vuninitializedq (int16x8_t /* __v ATTRIBUTE UNUSED */)
> +{
> + return __arm_vuninitializedq_s16 ();
> +}
> +
> +__extension__ extern __inline int32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vuninitializedq (int32x4_t /* __v ATTRIBUTE UNUSED */)
> +{
> + return __arm_vuninitializedq_s32 ();
> +}
> +
> +__extension__ extern __inline int64x2_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vuninitializedq (int64x2_t /* __v ATTRIBUTE UNUSED */)
> +{
> + return __arm_vuninitializedq_s64 ();
> +}
> +
> +#if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
> +__extension__ extern __inline float16x8_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vuninitializedq (float16x8_t /* __v ATTRIBUTE UNUSED */)
> +{
> + return __arm_vuninitializedq_f16 ();
> +}
> +
> +__extension__ extern __inline float32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__arm_vuninitializedq (float32x4_t /* __v ATTRIBUTE UNUSED */)
> +{
> + return __arm_vuninitializedq_f32 ();
> +}
> +#endif /* __ARM_FEATURE_MVE & 2 (MVE floating point)  */
> +
>  #else
>  enum {
>      __ARM_mve_type_fp_n = 1,
> diff --git a/gcc/config/arm/arm_mve_types.h b/gcc/config/arm/arm_mve_types.h
> index ae2591faa03..32942e51a74 100644
> --- a/gcc/config/arm/arm_mve_types.h
> +++ b/gcc/config/arm/arm_mve_types.h
> @@ -29,173 +29,4 @@ typedef float float32_t;
> 
>  #pragma GCC arm "arm_mve_types.h"
> 
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vuninitializedq_u8 (void)
> -{
> -  uint8x16_t __uninit;
> -  __asm__ ("": "=w"(__uninit));
> -  return __uninit;
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vuninitializedq_u16 (void)
> -{
> -  uint16x8_t __uninit;
> -  __asm__ ("": "=w"(__uninit));
> -  return __uninit;
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vuninitializedq_u32 (void)
> -{
> -  uint32x4_t __uninit;
> -  __asm__ ("": "=w"(__uninit));
> -  return __uninit;
> -}
> -
> -__extension__ extern __inline uint64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vuninitializedq_u64 (void)
> -{
> -  uint64x2_t __uninit;
> -  __asm__ ("": "=w"(__uninit));
> -  return __uninit;
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vuninitializedq_s8 (void)
> -{
> -  int8x16_t __uninit;
> -  __asm__ ("": "=w"(__uninit));
> -  return __uninit;
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vuninitializedq_s16 (void)
> -{
> -  int16x8_t __uninit;
> -  __asm__ ("": "=w"(__uninit));
> -  return __uninit;
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vuninitializedq_s32 (void)
> -{
> -  int32x4_t __uninit;
> -  __asm__ ("": "=w"(__uninit));
> -  return __uninit;
> -}
> -
> -__extension__ extern __inline int64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vuninitializedq_s64 (void)
> -{
> -  int64x2_t __uninit;
> -  __asm__ ("": "=w"(__uninit));
> -  return __uninit;
> -}
> -
> -#if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
> -
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vuninitializedq_f16 (void)
> -{
> -  float16x8_t __uninit;
> -  __asm__ ("": "=w" (__uninit));
> -  return __uninit;
> -}
> -
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vuninitializedq_f32 (void)
> -{
> -  float32x4_t __uninit;
> -  __asm__ ("": "=w" (__uninit));
> -  return __uninit;
> -}
> -
> -#endif
> -
> -#ifdef __cplusplus
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vuninitializedq (uint8x16_t /* __v ATTRIBUTE UNUSED */)
> -{
> - return __arm_vuninitializedq_u8 ();
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vuninitializedq (uint16x8_t /* __v ATTRIBUTE UNUSED */)
> -{
> - return __arm_vuninitializedq_u16 ();
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vuninitializedq (uint32x4_t /* __v ATTRIBUTE UNUSED */)
> -{
> - return __arm_vuninitializedq_u32 ();
> -}
> -
> -__extension__ extern __inline uint64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vuninitializedq (uint64x2_t /* __v ATTRIBUTE UNUSED */)
> -{
> - return __arm_vuninitializedq_u64 ();
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vuninitializedq (int8x16_t /* __v ATTRIBUTE UNUSED */)
> -{
> - return __arm_vuninitializedq_s8 ();
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vuninitializedq (int16x8_t /* __v ATTRIBUTE UNUSED */)
> -{
> - return __arm_vuninitializedq_s16 ();
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vuninitializedq (int32x4_t /* __v ATTRIBUTE UNUSED */)
> -{
> - return __arm_vuninitializedq_s32 ();
> -}
> -
> -__extension__ extern __inline int64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vuninitializedq (int64x2_t /* __v ATTRIBUTE UNUSED */)
> -{
> - return __arm_vuninitializedq_s64 ();
> -}
> -
> -#if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vuninitializedq (float16x8_t /* __v ATTRIBUTE UNUSED */)
> -{
> - return __arm_vuninitializedq_f16 ();
> -}
> -
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vuninitializedq (float32x4_t /* __v ATTRIBUTE UNUSED */)
> -{
> - return __arm_vuninitializedq_f32 ();
> -}
> -#endif /* __ARM_FEATURE_MVE & 2 (MVE floating point)  */
> -#endif /* __cplusplus */
> -
>  #endif /* _GCC_ARM_MVE_H.  */
> --
> 2.34.1



* RE: [PATCH 05/22] arm: [MVE intrinsics] add binary_opt_n shape
  2023-04-18 13:45 ` [PATCH 05/22] arm: [MVE intrinsics] add binary_opt_n shape Christophe Lyon
@ 2023-05-02 16:16   ` Kyrylo Tkachov
  0 siblings, 0 replies; 55+ messages in thread
From: Kyrylo Tkachov @ 2023-05-02 16:16 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Earnshaw, Richard Sandiford
  Cc: Christophe Lyon



> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@arm.com>
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
> Subject: [PATCH 05/22] arm: [MVE intrinsics] add binary_opt_n shape
> 
> This patch adds the binary_opt_n shape description.
> 
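
As a usage sketch of what this shape accepts (vaddq is the example its own
comment uses below; the unprefixed names assume arm_mve.h's default,
non-namespace-preserving mode), both calls resolve through the same
binary_opt_n entry:

  #include <arm_mve.h>

  int8x16_t f1 (int8x16_t a, int8x16_t b) { return vaddq (a, b); } /* v0,v0,v0 */
  int8x16_t f2 (int8x16_t a, int8_t b) { return vaddq (a, b); }    /* v0,v0,s0, the _n form */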

Ok.
Thanks,
Kyrill

> 	gcc/
> 	* config/arm/arm-mve-builtins-shapes.cc (binary_opt_n): New.
> 	* config/arm/arm-mve-builtins-shapes.h (binary_opt_n): New.
> ---
>  gcc/config/arm/arm-mve-builtins-shapes.cc | 32 +++++++++++++++++++++++
>  gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
>  2 files changed, 33 insertions(+)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
> index ce476aa196e..033b304060a 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.cc
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
> @@ -338,6 +338,38 @@ struct overloaded_base : public function_shape
>    }
>  };
> 
> +/* <T0>_t vfoo[_t0](<T0>_t, <T0>_t)
> +   <T0>_t vfoo[_n_t0](<T0>_t, <S0>_t)
> +
> +   i.e. the standard shape for binary operations that operate on
> +   uniform types.
> +
> +   Example: vaddq.
> +   int8x16_t [__arm_]vaddq[_s8](int8x16_t a, int8x16_t b)
> +   int8x16_t [__arm_]vaddq[_n_s8](int8x16_t a, int8_t b)
> +   int8x16_t [__arm_]vaddq_m[_s8](int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
> +   int8x16_t [__arm_]vaddq_m[_n_s8](int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
> +   int8x16_t [__arm_]vaddq_x[_s8](int8x16_t a, int8x16_t b, mve_pred16_t p)
> +   int8x16_t [__arm_]vaddq_x[_n_s8](int8x16_t a, int8_t b, mve_pred16_t p)
> */
> +struct binary_opt_n_def : public overloaded_base<0>
> +{
> +  void
> +  build (function_builder &b, const function_group_info &group,
> +	 bool preserve_user_namespace) const override
> +  {
> +    b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
> +    build_all (b, "v0,v0,v0", group, MODE_none, preserve_user_namespace);
> +    build_all (b, "v0,v0,s0", group, MODE_n, preserve_user_namespace);
> +  }
> +
> +  tree
> +  resolve (function_resolver &r) const override
> +  {
> +    return r.resolve_uniform_opt_n (2);
> +  }
> +};
> +SHAPE (binary_opt_n)
> +
>  /* <T0>[xN]_t vfoo_t0().
> 
>     Example: vuninitializedq.
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
> index a491369425c..43798fdde57 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.h
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.h
> @@ -34,6 +34,7 @@ namespace arm_mve
>    namespace shapes
>    {
> 
> +    extern const function_shape *const binary_opt_n;
>      extern const function_shape *const inherent;
>      extern const function_shape *const unary_convert;
> 
> --
> 2.34.1



* RE: [PATCH 06/22] arm: [MVE intrinsics] add unspec_based_mve_function_exact_insn
  2023-04-18 13:45 ` [PATCH 06/22] arm: [MVE intrinsics] add unspec_based_mve_function_exact_insn Christophe Lyon
@ 2023-05-02 16:17   ` Kyrylo Tkachov
  0 siblings, 0 replies; 55+ messages in thread
From: Kyrylo Tkachov @ 2023-05-02 16:17 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Earnshaw, Richard Sandiford
  Cc: Christophe Lyon



> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@arm.com>
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
> Subject: [PATCH 06/22] arm: [MVE intrinsics] add unspec_based_mve_function_exact_insn
> 
> Introduce a function that will be used to build intrinsics which use
> RTX codes for the non-predicated, no-mode version, and UNSPECS
> otherwise.
> 
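
(Concretely, with names given for illustration: for an intrinsic like vaddq
this means the plain non-predicated vaddq expands through the PLUS rtx code,
while the _n, _m and _m_n variants go through unspecs such as VADDQ_N_S,
VADDQ_M_S and VADDQ_M_N_S.)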

Ok.
Thanks,
Kyrill

> 2022-09-08  Christophe Lyon <christophe.lyon@arm.com>
> 
> gcc/ChangeLog:
> 
> 	* config/arm/arm-mve-builtins-functions.h (class
> 	unspec_based_mve_function_base): New.
> 	(class unspec_based_mve_function_exact_insn): New.
> ---
>  gcc/config/arm/arm-mve-builtins-functions.h | 186 ++++++++++++++++++++
>  1 file changed, 186 insertions(+)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-functions.h b/gcc/config/arm/arm-mve-builtins-functions.h
> index dff01999bcd..6d992b270b0 100644
> --- a/gcc/config/arm/arm-mve-builtins-functions.h
> +++ b/gcc/config/arm/arm-mve-builtins-functions.h
> @@ -39,6 +39,192 @@ public:
>    }
>  };
> 
> +/* An incomplete function_base for functions that map to an rtx_code
> +   for the non-predicated, non-suffixed intrinsic and to unspec codes
> +   otherwise, with separate codes for signed integers, unsigned
> +   integers and floating-point values in each case.  The class simply
> +   records information about the mapping for derived classes to
> +   use.  */
> +class unspec_based_mve_function_base : public function_base
> +{
> +public:
> +  CONSTEXPR unspec_based_mve_function_base (rtx_code code_for_sint,
> +					    rtx_code code_for_uint,
> +					    rtx_code code_for_fp,
> +					    int unspec_for_n_sint,
> +					    int unspec_for_n_uint,
> +					    int unspec_for_n_fp,
> +					    int unspec_for_m_sint,
> +					    int unspec_for_m_uint,
> +					    int unspec_for_m_fp,
> +					    int unspec_for_m_n_sint,
> +					    int unspec_for_m_n_uint,
> +					    int unspec_for_m_n_fp)
> +    : m_code_for_sint (code_for_sint),
> +      m_code_for_uint (code_for_uint),
> +      m_code_for_fp (code_for_fp),
> +      m_unspec_for_n_sint (unspec_for_n_sint),
> +      m_unspec_for_n_uint (unspec_for_n_uint),
> +      m_unspec_for_n_fp (unspec_for_n_fp),
> +      m_unspec_for_m_sint (unspec_for_m_sint),
> +      m_unspec_for_m_uint (unspec_for_m_uint),
> +      m_unspec_for_m_fp (unspec_for_m_fp),
> +      m_unspec_for_m_n_sint (unspec_for_m_n_sint),
> +      m_unspec_for_m_n_uint (unspec_for_m_n_uint),
> +      m_unspec_for_m_n_fp (unspec_for_m_n_fp)
> +  {}
> +
> +  /* The rtx code to use for signed, unsigned integers and
> +     floating-point values respectively.  */
> +  rtx_code m_code_for_sint;
> +  rtx_code m_code_for_uint;
> +  rtx_code m_code_for_fp;
> +
> +  /* The unspec code associated with signed-integer, unsigned-integer
> +     and floating-point operations respectively.  It covers the cases
> +     with the _n suffix, and/or the _m predicate.  */
> +  int m_unspec_for_n_sint;
> +  int m_unspec_for_n_uint;
> +  int m_unspec_for_n_fp;
> +  int m_unspec_for_m_sint;
> +  int m_unspec_for_m_uint;
> +  int m_unspec_for_m_fp;
> +  int m_unspec_for_m_n_sint;
> +  int m_unspec_for_m_n_uint;
> +  int m_unspec_for_m_n_fp;
> +};
> +
> +/* Map the function directly to CODE (UNSPEC, M) where M is the vector
> +   mode associated with type suffix 0, except when there is no
> +   predicate and no _n suffix, in which case we use the appropriate
> +   rtx_code.  This is useful when the basic operation is mapped to a
> +   standard RTX code and all other versions use different unspecs.  */
> +class unspec_based_mve_function_exact_insn : public unspec_based_mve_function_base
> +{
> +public:
> +  CONSTEXPR unspec_based_mve_function_exact_insn (rtx_code code_for_sint,
> +						  rtx_code code_for_uint,
> +						  rtx_code code_for_fp,
> +						  int unspec_for_n_sint,
> +						  int unspec_for_n_uint,
> +						  int unspec_for_n_fp,
> +						  int unspec_for_m_sint,
> +						  int unspec_for_m_uint,
> +						  int unspec_for_m_fp,
> +						  int unspec_for_m_n_sint,
> +						  int unspec_for_m_n_uint,
> +						  int unspec_for_m_n_fp)
> +    : unspec_based_mve_function_base (code_for_sint,
> +				      code_for_uint,
> +				      code_for_fp,
> +				      unspec_for_n_sint,
> +				      unspec_for_n_uint,
> +				      unspec_for_n_fp,
> +				      unspec_for_m_sint,
> +				      unspec_for_m_uint,
> +				      unspec_for_m_fp,
> +				      unspec_for_m_n_sint,
> +				      unspec_for_m_n_uint,
> +				      unspec_for_m_n_fp)
> +  {}
> +
> +  rtx
> +  expand (function_expander &e) const override
> +  {
> +    /* No suffix, no predicate, use the right RTX code.  */
> +    if ((e.mode_suffix_id != MODE_n)
> +	&& (e.pred == PRED_none))
> +      return e.map_to_rtx_codes (m_code_for_sint, m_code_for_uint,
> +				 m_code_for_fp);
> +
> +    insn_code code;
> +    switch (e.pred)
> +      {
> +      case PRED_none:
> +	if (e.mode_suffix_id == MODE_n)
> +	  /* No predicate, _n suffix.  */
> +	  {
> +	    if (e.type_suffix (0).integer_p)
> +	      if (e.type_suffix (0).unsigned_p)
> +		code = code_for_mve_q_n (m_unspec_for_n_uint, m_unspec_for_n_uint, e.vector_mode (0));
> +	      else
> +		code = code_for_mve_q_n (m_unspec_for_n_sint, m_unspec_for_n_sint, e.vector_mode (0));
> +	    else
> +	      code = code_for_mve_q_n_f (m_unspec_for_n_fp, e.vector_mode (0));
> +
> +	    return e.use_exact_insn (code);
> +	  }
> +	gcc_unreachable ();
> +	break;
> +
> +      case PRED_m:
> +	switch (e.mode_suffix_id)
> +	  {
> +	  case MODE_none:
> +	    /* No suffix, "m" predicate.  */
> +	    if (e.type_suffix (0).integer_p)
> +	      if (e.type_suffix (0).unsigned_p)
> +		code = code_for_mve_q_m (m_unspec_for_m_uint, m_unspec_for_m_uint, e.vector_mode (0));
> +	      else
> +		code = code_for_mve_q_m (m_unspec_for_m_sint, m_unspec_for_m_sint, e.vector_mode (0));
> +	    else
> +	      code = code_for_mve_q_m_f (m_unspec_for_m_fp, e.vector_mode (0));
> +	    break;
> +
> +	  case MODE_n:
> +	    /* _n suffix, "m" predicate.  */
> +	    if (e.type_suffix (0).integer_p)
> +	      if (e.type_suffix (0).unsigned_p)
> +		code = code_for_mve_q_m_n (m_unspec_for_m_n_uint, m_unspec_for_m_n_uint, e.vector_mode (0));
> +	      else
> +		code = code_for_mve_q_m_n (m_unspec_for_m_n_sint, m_unspec_for_m_n_sint, e.vector_mode (0));
> +	    else
> +	      code = code_for_mve_q_m_n_f (m_unspec_for_m_n_fp, e.vector_mode (0));
> +	    break;
> +
> +	  default:
> +	    gcc_unreachable ();
> +	  }
> +	return e.use_cond_insn (code, 0);
> +
> +      case PRED_x:
> +	switch (e.mode_suffix_id)
> +	  {
> +	  case MODE_none:
> +	    /* No suffix, "x" predicate.  */
> +	    if (e.type_suffix (0).integer_p)
> +	      if (e.type_suffix (0).unsigned_p)
> +		code = code_for_mve_q_m (m_unspec_for_m_uint, m_unspec_for_m_uint, e.vector_mode (0));
> +	      else
> +		code = code_for_mve_q_m (m_unspec_for_m_sint, m_unspec_for_m_sint, e.vector_mode (0));
> +	    else
> +	      code = code_for_mve_q_m_f (m_unspec_for_m_fp, e.vector_mode (0));
> +	    break;
> +
> +	  case MODE_n:
> +	    /* _n suffix, "x" predicate.  */
> +	    if (e.type_suffix (0).integer_p)
> +	      if (e.type_suffix (0).unsigned_p)
> +		code = code_for_mve_q_m_n (m_unspec_for_m_n_uint, m_unspec_for_m_n_uint, e.vector_mode (0));
> +	      else
> +		code = code_for_mve_q_m_n (m_unspec_for_m_n_sint, m_unspec_for_m_n_sint, e.vector_mode (0));
> +	    else
> +	      code = code_for_mve_q_m_n_f (m_unspec_for_m_n_fp, e.vector_mode (0));
> +	    break;
> +
> +	  default:
> +	    gcc_unreachable ();
> +	  }
> +	return e.use_pred_x_insn (code);
> +
> +      default:
> +	gcc_unreachable ();
> +      }
> +
> +    gcc_unreachable ();
> +  }
> +};
> +
>  } /* end namespace arm_mve */
> 
>  /* Declare the global function base NAME, creating it from an instance
> --
> 2.34.1
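
To summarize the dispatch that expand implements above (a reading aid,
not code from the patch):

    /* pred       mode suffix    expander selected
       PRED_none  MODE_none  ->  e.map_to_rtx_codes (m_code_for_*)
       PRED_none  MODE_n     ->  code_for_mve_q_n     (m_unspec_for_n_*)
       PRED_m     MODE_none  ->  code_for_mve_q_m     (m_unspec_for_m_*)
       PRED_m     MODE_n     ->  code_for_mve_q_m_n   (m_unspec_for_m_n_*)
       PRED_x     reuses the _m patterns, emitted via use_pred_x_insn.  */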


^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH 07/22] arm: [MVE intrinsics] factorize vadd vsubq vmulq
  2023-04-18 13:45 ` [PATCH 07/22] arm: [MVE intrinsics] factorize vadd vsubq vmulq Christophe Lyon
@ 2023-05-02 16:19   ` Kyrylo Tkachov
  2023-05-02 16:22     ` Christophe Lyon
  0 siblings, 1 reply; 55+ messages in thread
From: Kyrylo Tkachov @ 2023-05-02 16:19 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Earnshaw, Richard Sandiford
  Cc: Christophe Lyon



> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@arm.com>
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
> Subject: [PATCH 07/22] arm: [MVE intrinsics] factorize vadd vsubq vmulq
> 
> In order to avoid using a huge switch when generating all the
> intrinsics (e.g. mve_vaddq_n_sv4si, ...), we want to generate a single
> function taking the builtin code as parameter (e.g. mve_q_n
> (VADDQ_S, ...)).
> This is achieved by using the new mve_insn iterator.
> 
> Having done that, it becomes easier to share similar patterns, and to
> avoid useless/error-prone code duplication.
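
As an illustration of what the parameterized name buys (a sketch, not
code from this patch): once a pattern is named
"@mve_<mve_insn>q_n_<supf><mode>", the generator tools emit a
code_for_mve_q_n lookup function, so the expander can select e.g.
mve_vaddq_n_sv4si without a switch over builtin codes:

    insn_code icode = code_for_mve_q_n (VADDQ_N_S, VADDQ_N_S, V4SImode);
    emit_insn (GEN_FCN (icode) (target, op1, op2));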

Nice!
Ok but...

> 
> 2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>
> 
> gcc/ChangeLog:
> 
> 	* config/arm/iterators.md (MVE_INT_BINARY_RTX, MVE_INT_M_BINARY)
> 	(MVE_INT_M_N_BINARY, MVE_INT_N_BINARY, MVE_FP_M_BINARY)
> 	(MVE_FP_M_N_BINARY, MVE_FP_N_BINARY, mve_addsubmul, mve_insn): New
> 	iterators.
> 	* config/arm/mve.md
> 	(mve_vsubq_n_f<mode>, mve_vaddq_n_f<mode>, mve_vmulq_n_f<mode>):
> 	Factorize into ...
> 	(@mve_<mve_insn>q_n_f<mode>): ... this.
> 	(mve_vaddq_n_<supf><mode>, mve_vmulq_n_<supf><mode>)
> 	(mve_vsubq_n_<supf><mode>): Factorize into ...
> 	(@mve_<mve_insn>q_n_<supf><mode>): ... this.
> 	(mve_vaddq<mode>, mve_vmulq<mode>, mve_vsubq<mode>): Factorize
> 	into ...
> 	(mve_<mve_addsubmul>q<mode>): ... this.
> 	(mve_vaddq_f<mode>, mve_vmulq_f<mode>, mve_vsubq_f<mode>):
> 	Factorize into ...
> 	(mve_<mve_addsubmul>q_f<mode>): ... this.
> 	(mve_vaddq_m_<supf><mode>, mve_vmulq_m_<supf><mode>)
> 	(mve_vsubq_m_<supf><mode>): Factorize into ...
> 	(@mve_<mve_insn>q_m_<supf><mode>): ... this.
> 	(mve_vaddq_m_n_<supf><mode>, mve_vmulq_m_n_<supf><mode>)
> 	(mve_vsubq_m_n_<supf><mode>): Factorize into ...
> 	(@mve_<mve_insn>q_m_n_<supf><mode>): ... this.
> 	(mve_vaddq_m_f<mode>, mve_vmulq_m_f<mode>, mve_vsubq_m_f<mode>):
> 	Factorize into ...
> 	(@mve_<mve_insn>q_m_f<mode>): ... this.
> 	(mve_vaddq_m_n_f<mode>, mve_vmulq_m_n_f<mode>)
> 	(mve_vsubq_m_n_f<mode>): Factorize into ...
> 	(@mve_<mve_insn>q_m_n_f<mode>): ... this.
> ---
>  gcc/config/arm/iterators.md |  57 +++++++
>  gcc/config/arm/mve.md       | 317 +++++-------------------------------
>  2 files changed, 99 insertions(+), 275 deletions(-)
> 
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 39895ad62aa..d3bef594775 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -330,6 +330,63 @@ (define_code_iterator FCVT [unsigned_float float])
>  ;; Saturating addition, subtraction
>  (define_code_iterator SSPLUSMINUS [ss_plus ss_minus])
> 
> +;; MVE integer binary operations.
> +(define_code_iterator MVE_INT_BINARY_RTX [plus minus mult])
> +
> +(define_int_iterator MVE_INT_M_BINARY   [
> +		     VADDQ_M_S VADDQ_M_U
> +		     VMULQ_M_S VMULQ_M_U
> +		     VSUBQ_M_S VSUBQ_M_U
> +		     ])
> +
> +(define_int_iterator MVE_INT_M_N_BINARY [
> +		     VADDQ_M_N_S VADDQ_M_N_U
> +		     VMULQ_M_N_S VMULQ_M_N_U
> +		     VSUBQ_M_N_S VSUBQ_M_N_U
> +		     ])
> +
> +(define_int_iterator MVE_INT_N_BINARY   [
> +		     VADDQ_N_S VADDQ_N_U
> +		     VMULQ_N_S VMULQ_N_U
> +		     VSUBQ_N_S VSUBQ_N_U
> +		     ])
> +
> +(define_int_iterator MVE_FP_M_BINARY   [
> +		     VADDQ_M_F
> +		     VMULQ_M_F
> +		     VSUBQ_M_F
> +		     ])
> +
> +(define_int_iterator MVE_FP_M_N_BINARY [
> +		     VADDQ_M_N_F
> +		     VMULQ_M_N_F
> +		     VSUBQ_M_N_F
> +		     ])
> +
> +(define_int_iterator MVE_FP_N_BINARY   [
> +		     VADDQ_N_F
> +		     VMULQ_N_F
> +		     VSUBQ_N_F
> +		     ])
> +
> +(define_code_attr mve_addsubmul [
> +		 (minus "vsub")
> +		 (mult "vmul")
> +		 (plus "vadd")
> +		 ])
> +
> +(define_int_attr mve_insn [
> +		 (VADDQ_M_N_S "vadd") (VADDQ_M_N_U "vadd") (VADDQ_M_N_F "vadd")
> +		 (VADDQ_M_S "vadd") (VADDQ_M_U "vadd") (VADDQ_M_F "vadd")
> +		 (VADDQ_N_S "vadd") (VADDQ_N_U "vadd") (VADDQ_N_F "vadd")
> +		 (VMULQ_M_N_S "vmul") (VMULQ_M_N_U "vmul") (VMULQ_M_N_F "vmul")
> +		 (VMULQ_M_S "vmul") (VMULQ_M_U "vmul") (VMULQ_M_F "vmul")
> +		 (VMULQ_N_S "vmul") (VMULQ_N_U "vmul") (VMULQ_N_F "vmul")
> +		 (VSUBQ_M_N_S "vsub") (VSUBQ_M_N_U "vsub") (VSUBQ_M_N_F "vsub")
> +		 (VSUBQ_M_S "vsub") (VSUBQ_M_U "vsub") (VSUBQ_M_F "vsub")
> +		 (VSUBQ_N_S "vsub") (VSUBQ_N_U "vsub") (VSUBQ_N_F "vsub")
> +		 ])
> +
>  ;; plus and minus are the only SHIFTABLE_OPS for which Thumb2 allows
>  ;; a stack pointer operand.  The minus operation is a candidate for an rsub
>  ;; and hence only plus is supported.
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index ab688396f97..5167fbc6add 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -668,21 +668,6 @@ (define_insn "mve_vpnotv16bi"
>    [(set_attr "type" "mve_move")
>  ])
> 
> -;;
> -;; [vsubq_n_f])
> -;;
> -(define_insn "mve_vsubq_n_f<mode>"
> -  [
> -   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
> -	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
> -		       (match_operand:<V_elem> 2 "s_register_operand" "r")]
> -	 VSUBQ_N_F))
> -  ]
> -  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vsub.f<V_sz_elem>\t%q0, %q1, %2"
> -  [(set_attr "type" "mve_move")
> -])
> -
>  ;;
>  ;; [vbrsrq_n_f])
>  ;;
> @@ -871,16 +856,18 @@ (define_insn "mve_vabdq_<supf><mode>"
> 
>  ;;
>  ;; [vaddq_n_s, vaddq_n_u])
> +;; [vsubq_n_s, vsubq_n_u])
> +;; [vmulq_n_s, vmulq_n_u])
>  ;;

... This trailing ')' is a pre-existing copy-pasto I think. Let's remove it.
Thanks,
Kyrill

> -(define_insn "mve_vaddq_n_<supf><mode>"
> +(define_insn "@mve_<mve_insn>q_n_<supf><mode>"
>    [
>     (set (match_operand:MVE_2 0 "s_register_operand" "=w")
>  	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
>  		       (match_operand:<V_elem> 2 "s_register_operand" "r")]
> -	 VADDQ_N))
> +	 MVE_INT_N_BINARY))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vadd.i%#<V_sz_elem>\t%q0, %q1, %2"
> +  "<mve_insn>.i%#<V_sz_elem>\t%q0, %q1, %2"
>    [(set_attr "type" "mve_move")
>  ])
> 
> @@ -1362,26 +1349,13 @@ (define_insn "mve_vmulltq_int_<supf><mode>"
>  ])
> 
>  ;;
> -;; [vmulq_n_u, vmulq_n_s])
> -;;
> -(define_insn "mve_vmulq_n_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
> -		       (match_operand:<V_elem> 2 "s_register_operand" "r")]
> -	 VMULQ_N))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vmul.i%#<V_sz_elem>\t%q0, %q1, %2"
> -  [(set_attr "type" "mve_move")
> -])
> -
> -;;
> +;; [vaddq_s, vaddq_u])
>  ;; [vmulq_u, vmulq_s])
> +;; [vsubq_s, vsubq_u])
>  ;;
>  (define_insn "mve_vmulq_<supf><mode>"
>    [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> +    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
>  	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
>  		       (match_operand:MVE_2 2 "s_register_operand" "w")]
>  	 VMULQ))
> @@ -1391,14 +1365,14 @@ (define_insn "mve_vmulq_<supf><mode>"
>    [(set_attr "type" "mve_move")
>  ])
> 
> -(define_insn "mve_vmulq<mode>"
> +(define_insn "mve_<mve_addsubmul>q<mode>"
>    [
>     (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(mult:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
> -		    (match_operand:MVE_2 2 "s_register_operand" "w")))
> +	(MVE_INT_BINARY_RTX:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
> +			      (match_operand:MVE_2 2 "s_register_operand" "w")))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vmul.i%#<V_sz_elem>\t%q0, %q1, %q2"
> +  "<mve_addsubmul>.i%#<V_sz_elem>\t%q0, %q1, %q2"
>    [(set_attr "type" "mve_move")
>  ])
> 
> @@ -1768,21 +1742,6 @@ (define_insn "mve_vshlq_r_<supf><mode>"
>    [(set_attr "type" "mve_move")
>  ])
> 
> -;;
> -;; [vsubq_n_s, vsubq_n_u])
> -;;
> -(define_insn "mve_vsubq_n_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
> -		       (match_operand:<V_elem> 2 "s_register_operand" "r")]
> -	 VSUBQ_N))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vsub.i%#<V_sz_elem>\t%q0, %q1, %2"
> -  [(set_attr "type" "mve_move")
> -])
> -
>  ;;
>  ;; [vsubq_s, vsubq_u])
>  ;;
> @@ -1798,17 +1757,6 @@ (define_insn "mve_vsubq_<supf><mode>"
>    [(set_attr "type" "mve_move")
>  ])
> 
> -(define_insn "mve_vsubq<mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(minus:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
> -		     (match_operand:MVE_2 2 "s_register_operand" "w")))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vsub.i%#<V_sz_elem>\t%q0, %q1, %q2"
> -  [(set_attr "type" "mve_move")
> -])
> -
>  ;;
>  ;; [vabdq_f])
>  ;;
> @@ -1841,16 +1789,18 @@ (define_insn "mve_vaddlvaq_<supf>v4si"
> 
>  ;;
>  ;; [vaddq_n_f])
> +;; [vsubq_n_f])
> +;; [vmulq_n_f])
>  ;;
> -(define_insn "mve_vaddq_n_f<mode>"
> +(define_insn "@mve_<mve_insn>q_n_f<mode>"
>    [
>     (set (match_operand:MVE_0 0 "s_register_operand" "=w")
>  	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
>  		       (match_operand:<V_elem> 2 "s_register_operand" "r")]
> -	 VADDQ_N_F))
> +	 MVE_FP_N_BINARY))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vadd.f%#<V_sz_elem>\t%q0, %q1, %2"
> +  "<mve_insn>.f%#<V_sz_elem>\t%q0, %q1, %2"
>    [(set_attr "type" "mve_move")
>  ])
> 
> @@ -2224,31 +2174,18 @@ (define_insn "mve_vmovntq_<supf><mode>"
>  ])
> 
>  ;;
> +;; [vaddq_f])
>  ;; [vmulq_f])
> +;; [vsubq_f])
>  ;;
> -(define_insn "mve_vmulq_f<mode>"
> +(define_insn "mve_<mve_addsubmul>q_f<mode>"
>    [
>     (set (match_operand:MVE_0 0 "s_register_operand" "=w")
> -	(mult:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
> +	(MVE_INT_BINARY_RTX:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
>  		    (match_operand:MVE_0 2 "s_register_operand" "w")))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vmul.f%#<V_sz_elem>	%q0, %q1, %q2"
> -  [(set_attr "type" "mve_move")
> -])
> -
> -;;
> -;; [vmulq_n_f])
> -;;
> -(define_insn "mve_vmulq_n_f<mode>"
> -  [
> -   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
> -	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
> -		       (match_operand:<V_elem> 2 "s_register_operand" "r")]
> -	 VMULQ_N_F))
> -  ]
> -  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vmul.f%#<V_sz_elem>	%q0, %q1, %2"
> +  "<mve_addsubmul>.f%#<V_sz_elem>\t%q0, %q1, %q2"
>    [(set_attr "type" "mve_move")
>  ])
> 
> @@ -2490,20 +2427,6 @@ (define_insn "mve_vshlltq_n_<supf><mode>"
>    [(set_attr "type" "mve_move")
>  ])
> 
> -;;
> -;; [vsubq_f])
> -;;
> -(define_insn "mve_vsubq_f<mode>"
> -  [
> -   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
> -	(minus:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
> -		     (match_operand:MVE_0 2 "s_register_operand" "w")))
> -  ]
> -  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vsub.f%#<V_sz_elem>\t%q0, %q1, %q2"
> -  [(set_attr "type" "mve_move")
> -])
> -
>  ;;
>  ;; [vmulltq_poly_p])
>  ;;
> @@ -5032,23 +4955,6 @@ (define_insn "mve_vsriq_m_n_<supf><mode>"
>    [(set_attr "type" "mve_move")
>     (set_attr "length" "8")])
> 
> -;;
> -;; [vsubq_m_u, vsubq_m_s])
> -;;
> -(define_insn "mve_vsubq_m_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:MVE_2 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VSUBQ_M))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vsubt.i%#<V_sz_elem>\t%q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length" "8")])
> -
>  ;;
>  ;; [vcvtq_m_n_to_f_u, vcvtq_m_n_to_f_s])
>  ;;
> @@ -5084,35 +4990,39 @@ (define_insn "mve_vabdq_m_<supf><mode>"
> 
>  ;;
>  ;; [vaddq_m_n_s, vaddq_m_n_u])
> +;; [vsubq_m_n_s, vsubq_m_n_u])
> +;; [vmulq_m_n_s, vmulq_m_n_u])
>  ;;
> -(define_insn "mve_vaddq_m_n_<supf><mode>"
> +(define_insn "@mve_<mve_insn>q_m_n_<supf><mode>"
>    [
>     (set (match_operand:MVE_2 0 "s_register_operand" "=w")
>  	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
>  		       (match_operand:MVE_2 2 "s_register_operand" "w")
>  		       (match_operand:<V_elem> 3 "s_register_operand" "r")
>  		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VADDQ_M_N))
> +	 MVE_INT_M_N_BINARY))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vpst\;vaddt.i%#<V_sz_elem>	%q0, %q2, %3"
> +  "vpst\;<mve_insn>t.i%#<V_sz_elem>	%q0, %q2, %3"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
>  ;;
>  ;; [vaddq_m_u, vaddq_m_s])
> +;; [vsubq_m_u, vsubq_m_s])
> +;; [vmulq_m_u, vmulq_m_s])
>  ;;
> -(define_insn "mve_vaddq_m_<supf><mode>"
> +(define_insn "@mve_<mve_insn>q_m_<supf><mode>"
>    [
>     (set (match_operand:MVE_2 0 "s_register_operand" "=w")
>  	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
>  		       (match_operand:MVE_2 2 "s_register_operand" "w")
>  		       (match_operand:MVE_2 3 "s_register_operand" "w")
>  		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VADDQ_M))
> +	 MVE_INT_M_BINARY))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vpst\;vaddt.i%#<V_sz_elem>	%q0, %q2, %q3"
> +  "vpst\;<mve_insn>t.i%#<V_sz_elem>	%q0, %q2, %q3"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> @@ -5422,40 +5332,6 @@ (define_insn "mve_vmulltq_int_m_<supf><mode>"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> -;;
> -;; [vmulq_m_n_u, vmulq_m_n_s])
> -;;
> -(define_insn "mve_vmulq_m_n_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:<V_elem> 3 "s_register_operand" "r")
> -		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VMULQ_M_N))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vmult.i%#<V_sz_elem>	%q0, %q2, %3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
> -;;
> -;; [vmulq_m_s, vmulq_m_u])
> -;;
> -(define_insn "mve_vmulq_m_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:MVE_2 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VMULQ_M))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vmult.i%#<V_sz_elem>	%q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
>  ;;
>  ;; [vornq_m_u, vornq_m_s])
>  ;;
> @@ -5796,23 +5672,6 @@ (define_insn "mve_vsliq_m_n_<supf><mode>"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> -;;
> -;; [vsubq_m_n_s, vsubq_m_n_u])
> -;;
> -(define_insn "mve_vsubq_m_n_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:<V_elem> 3 "s_register_operand" "r")
> -		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VSUBQ_M_N))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vsubt.i%#<V_sz_elem>\t%q0, %q2, %3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
>  ;;
>  ;; [vhcaddq_rot270_m_s])
>  ;;
> @@ -6613,35 +6472,39 @@ (define_insn "mve_vabdq_m_f<mode>"
> 
>  ;;
>  ;; [vaddq_m_f])
> +;; [vsubq_m_f])
> +;; [vmulq_m_f])
>  ;;
> -(define_insn "mve_vaddq_m_f<mode>"
> +(define_insn "@mve_<mve_insn>q_m_f<mode>"
>    [
>     (set (match_operand:MVE_0 0 "s_register_operand" "=w")
>  	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
>  		       (match_operand:MVE_0 2 "s_register_operand" "w")
>  		       (match_operand:MVE_0 3 "s_register_operand" "w")
>  		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VADDQ_M_F))
> +	 MVE_FP_M_BINARY))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vpst\;vaddt.f%#<V_sz_elem>	%q0, %q2, %q3"
> +  "vpst\;<mve_insn>t.f%#<V_sz_elem>	%q0, %q2, %q3"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
>  ;;
>  ;; [vaddq_m_n_f])
> +;; [vsubq_m_n_f])
> +;; [vmulq_m_n_f])
>  ;;
> -(define_insn "mve_vaddq_m_n_f<mode>"
> +(define_insn "@mve_<mve_insn>q_m_n_f<mode>"
>    [
>     (set (match_operand:MVE_0 0 "s_register_operand" "=w")
>  	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
>  		       (match_operand:MVE_0 2 "s_register_operand" "w")
>  		       (match_operand:<V_elem> 3 "s_register_operand" "r")
>  		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VADDQ_M_N_F))
> +	 MVE_FP_M_N_BINARY))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vpst\;vaddt.f%#<V_sz_elem>	%q0, %q2, %3"
> +  "vpst\;<mve_insn>t.f%#<V_sz_elem>	%q0, %q2, %3"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> @@ -6985,40 +6848,6 @@ (define_insn "mve_vminnmq_m_f<mode>"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> -;;
> -;; [vmulq_m_f])
> -;;
> -(define_insn "mve_vmulq_m_f<mode>"
> -  [
> -   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
> -	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
> -		       (match_operand:MVE_0 2 "s_register_operand" "w")
> -		       (match_operand:MVE_0 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VMULQ_M_F))
> -  ]
> -  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vpst\;vmult.f%#<V_sz_elem>	%q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
> -;;
> -;; [vmulq_m_n_f])
> -;;
> -(define_insn "mve_vmulq_m_n_f<mode>"
> -  [
> -   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
> -	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
> -		       (match_operand:MVE_0 2 "s_register_operand" "w")
> -		       (match_operand:<V_elem> 3 "s_register_operand" "r")
> -		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VMULQ_M_N_F))
> -  ]
> -  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vpst\;vmult.f%#<V_sz_elem>	%q0, %q2, %3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
>  ;;
>  ;; [vornq_m_f])
>  ;;
> @@ -7053,40 +6882,6 @@ (define_insn "mve_vorrq_m_f<mode>"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> -;;
> -;; [vsubq_m_f])
> -;;
> -(define_insn "mve_vsubq_m_f<mode>"
> -  [
> -   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
> -	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
> -		       (match_operand:MVE_0 2 "s_register_operand" "w")
> -		       (match_operand:MVE_0 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VSUBQ_M_F))
> -  ]
> -  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vpst\;vsubt.f%#<V_sz_elem>\t%q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
> -;;
> -;; [vsubq_m_n_f])
> -;;
> -(define_insn "mve_vsubq_m_n_f<mode>"
> -  [
> -   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
> -	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
> -		       (match_operand:MVE_0 2 "s_register_operand" "w")
> -		       (match_operand:<V_elem> 3 "s_register_operand" "r")
> -		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VSUBQ_M_N_F))
> -  ]
> -  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vpst\;vsubt.f%#<V_sz_elem>\t%q0, %q2, %3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
>  ;;
>  ;; [vstrbq_s vstrbq_u]
>  ;;
> @@ -8927,34 +8722,6 @@ (define_insn "mve_vstrwq_scatter_shifted_offset_<supf>v4si_insn"
>    "vstrw.32\t%q2, [%0, %q1, uxtw #2]"
>    [(set_attr "length" "4")])
> 
> -;;
> -;; [vaddq_s, vaddq_u])
> -;;
> -(define_insn "mve_vaddq<mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(plus:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
> -		    (match_operand:MVE_2 2 "s_register_operand" "w")))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vadd.i%#<V_sz_elem>\t%q0, %q1, %q2"
> -  [(set_attr "type" "mve_move")
> -])
> -
> -;;
> -;; [vaddq_f])
> -;;
> -(define_insn "mve_vaddq_f<mode>"
> -  [
> -   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
> -	(plus:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
> -		    (match_operand:MVE_0 2 "s_register_operand" "w")))
> -  ]
> -  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vadd.f%#<V_sz_elem>\t%q0, %q1, %q2"
> -  [(set_attr "type" "mve_move")
> -])
> -
>  ;;
>  ;; [vidupq_n_u])
>  ;;
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 07/22] arm: [MVE intrinsics] factorize vadd vsubq vmulq
  2023-05-02 16:19   ` Kyrylo Tkachov
@ 2023-05-02 16:22     ` Christophe Lyon
  0 siblings, 0 replies; 55+ messages in thread
From: Christophe Lyon @ 2023-05-02 16:22 UTC (permalink / raw)
  To: Kyrylo Tkachov, gcc-patches, Richard Earnshaw, Richard Sandiford



On 5/2/23 18:19, Kyrylo Tkachov wrote:
> 
> 
>> -----Original Message-----
>> From: Christophe Lyon <christophe.lyon@arm.com>
>> Sent: Tuesday, April 18, 2023 2:46 PM
>> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
>> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
>> <Richard.Sandiford@arm.com>
>> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
>> Subject: [PATCH 07/22] arm: [MVE intrinsics] factorize vadd vsubq vmulq
>>
>> In order to avoid using a huge switch when generating all the
>> intrinsics (e.g. mve_vaddq_n_sv4si, ...), we want to generate a single
>> function taking the builtin code as parameter (e.g. mve_q_n
>> (VADDQ_S, ...)).
>> This is achieved by using the new mve_insn iterator.
>>
>> Having done that, it becomes easier to share similar patterns, and to
>> avoid useless/error-prone code duplication.
> 
> Nice!
> Ok but...
> 
>>
>> 2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>
>>
>> gcc/ChangeLog:
>>
>> 	* config/arm/iterators.md (MVE_INT_BINARY_RTX, MVE_INT_M_BINARY)
>> 	(MVE_INT_M_N_BINARY, MVE_INT_N_BINARY, MVE_FP_M_BINARY)
>> 	(MVE_FP_M_N_BINARY, MVE_FP_N_BINARY, mve_addsubmul, mve_insn): New
>> 	iterators.
>> 	* config/arm/mve.md
>> 	(mve_vsubq_n_f<mode>, mve_vaddq_n_f<mode>, mve_vmulq_n_f<mode>):
>> 	Factorize into ...
>> 	(@mve_<mve_insn>q_n_f<mode>): ... this.
>> 	(mve_vaddq_n_<supf><mode>, mve_vmulq_n_<supf><mode>)
>> 	(mve_vsubq_n_<supf><mode>): Factorize into ...
>> 	(@mve_<mve_insn>q_n_<supf><mode>): ... this.
>> 	(mve_vaddq<mode>, mve_vmulq<mode>, mve_vsubq<mode>): Factorize
>> 	into ...
>> 	(mve_<mve_addsubmul>q<mode>): ... this.
>> 	(mve_vaddq_f<mode>, mve_vmulq_f<mode>, mve_vsubq_f<mode>):
>> 	Factorize into ...
>> 	(mve_<mve_addsubmul>q_f<mode>): ... this.
>> 	(mve_vaddq_m_<supf><mode>, mve_vmulq_m_<supf><mode>)
>> 	(mve_vsubq_m_<supf><mode>): Factorize into ...
>> 	(@mve_<mve_insn>q_m_<supf><mode>): ... this.
>> 	(mve_vaddq_m_n_<supf><mode>, mve_vmulq_m_n_<supf><mode>)
>> 	(mve_vsubq_m_n_<supf><mode>): Factorize into ...
>> 	(@mve_<mve_insn>q_m_n_<supf><mode>): ... this.
>> 	(mve_vaddq_m_f<mode>, mve_vmulq_m_f<mode>, mve_vsubq_m_f<mode>):
>> 	Factorize into ...
>> 	(@mve_<mve_insn>q_m_f<mode>): ... this.
>> 	(mve_vaddq_m_n_f<mode>, mve_vmulq_m_n_f<mode>)
>> 	(mve_vsubq_m_n_f<mode>): Factorize into ...
>> 	(@mve_<mve_insn>q_m_n_f<mode>): ... this.
>> ---
>>   gcc/config/arm/iterators.md |  57 +++++++
>>   gcc/config/arm/mve.md       | 317 +++++-------------------------------
>>   2 files changed, 99 insertions(+), 275 deletions(-)
>>
>> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
>> index 39895ad62aa..d3bef594775 100644
>> --- a/gcc/config/arm/iterators.md
>> +++ b/gcc/config/arm/iterators.md
>> @@ -330,6 +330,63 @@ (define_code_iterator FCVT [unsigned_float float])
>>   ;; Saturating addition, subtraction
>>   (define_code_iterator SSPLUSMINUS [ss_plus ss_minus])
>>
>> +;; MVE integer binary operations.
>> +(define_code_iterator MVE_INT_BINARY_RTX [plus minus mult])
>> +
>> +(define_int_iterator MVE_INT_M_BINARY   [
>> +		     VADDQ_M_S VADDQ_M_U
>> +		     VMULQ_M_S VMULQ_M_U
>> +		     VSUBQ_M_S VSUBQ_M_U
>> +		     ])
>> +
>> +(define_int_iterator MVE_INT_M_N_BINARY [
>> +		     VADDQ_M_N_S VADDQ_M_N_U
>> +		     VMULQ_M_N_S VMULQ_M_N_U
>> +		     VSUBQ_M_N_S VSUBQ_M_N_U
>> +		     ])
>> +
>> +(define_int_iterator MVE_INT_N_BINARY   [
>> +		     VADDQ_N_S VADDQ_N_U
>> +		     VMULQ_N_S VMULQ_N_U
>> +		     VSUBQ_N_S VSUBQ_N_U
>> +		     ])
>> +
>> +(define_int_iterator MVE_FP_M_BINARY   [
>> +		     VADDQ_M_F
>> +		     VMULQ_M_F
>> +		     VSUBQ_M_F
>> +		     ])
>> +
>> +(define_int_iterator MVE_FP_M_N_BINARY [
>> +		     VADDQ_M_N_F
>> +		     VMULQ_M_N_F
>> +		     VSUBQ_M_N_F
>> +		     ])
>> +
>> +(define_int_iterator MVE_FP_N_BINARY   [
>> +		     VADDQ_N_F
>> +		     VMULQ_N_F
>> +		     VSUBQ_N_F
>> +		     ])
>> +
>> +(define_code_attr mve_addsubmul [
>> +		 (minus "vsub")
>> +		 (mult "vmul")
>> +		 (plus "vadd")
>> +		 ])
>> +
>> +(define_int_attr mve_insn [
>> +		 (VADDQ_M_N_S "vadd") (VADDQ_M_N_U "vadd") (VADDQ_M_N_F "vadd")
>> +		 (VADDQ_M_S "vadd") (VADDQ_M_U "vadd") (VADDQ_M_F "vadd")
>> +		 (VADDQ_N_S "vadd") (VADDQ_N_U "vadd") (VADDQ_N_F "vadd")
>> +		 (VMULQ_M_N_S "vmul") (VMULQ_M_N_U "vmul") (VMULQ_M_N_F "vmul")
>> +		 (VMULQ_M_S "vmul") (VMULQ_M_U "vmul") (VMULQ_M_F "vmul")
>> +		 (VMULQ_N_S "vmul") (VMULQ_N_U "vmul") (VMULQ_N_F "vmul")
>> +		 (VSUBQ_M_N_S "vsub") (VSUBQ_M_N_U "vsub") (VSUBQ_M_N_F "vsub")
>> +		 (VSUBQ_M_S "vsub") (VSUBQ_M_U "vsub") (VSUBQ_M_F "vsub")
>> +		 (VSUBQ_N_S "vsub") (VSUBQ_N_U "vsub") (VSUBQ_N_F "vsub")
>> +		 ])
>> +
>>   ;; plus and minus are the only SHIFTABLE_OPS for which Thumb2 allows
>>   ;; a stack pointer operand.  The minus operation is a candidate for an rsub
>>   ;; and hence only plus is supported.
>> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
>> index ab688396f97..5167fbc6add 100644
>> --- a/gcc/config/arm/mve.md
>> +++ b/gcc/config/arm/mve.md
>> @@ -668,21 +668,6 @@ (define_insn "mve_vpnotv16bi"
>>     [(set_attr "type" "mve_move")
>>   ])
>>
>> -;;
>> -;; [vsubq_n_f])
>> -;;
>> -(define_insn "mve_vsubq_n_f<mode>"
>> -  [
>> -   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
>> -	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
>> -		       (match_operand:<V_elem> 2 "s_register_operand" "r")]
>> -	 VSUBQ_N_F))
>> -  ]
>> -  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
>> -  "vsub.f<V_sz_elem>\t%q0, %q1, %2"
>> -  [(set_attr "type" "mve_move")
>> -])
>> -
>>   ;;
>>   ;; [vbrsrq_n_f])
>>   ;;
>> @@ -871,16 +856,18 @@ (define_insn "mve_vabdq_<supf><mode>"
>>
>>   ;;
>>   ;; [vaddq_n_s, vaddq_n_u])
>> +;; [vsubq_n_s, vsubq_n_u])
>> +;; [vmulq_n_s, vmulq_n_u])
>>   ;;
> 
> ... This trailing ')' is a pre-existing copy-pasto I think. Let's remove it.

Yes, as you can see (almost?) everywhere in mve.md, such patterns have
this '])' ending.  (Indeed, I just moved those two lines from the
previous patterns.)

Christophe




^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH 08/22] arm: [MVE intrinsics] rework vaddq vmulq vsubq
  2023-04-18 13:45 ` [PATCH 08/22] arm: [MVE intrinsics] rework vaddq vmulq vsubq Christophe Lyon
@ 2023-05-02 16:31   ` Kyrylo Tkachov
  2023-05-03  9:06     ` Christophe Lyon
  0 siblings, 1 reply; 55+ messages in thread
From: Kyrylo Tkachov @ 2023-05-02 16:31 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Earnshaw, Richard Sandiford
  Cc: Christophe Lyon



> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@arm.com>
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
> Subject: [PATCH 08/22] arm: [MVE intrinsics] rework vaddq vmulq vsubq
> 
> Implement vaddq, vmulq, vsubq using the new MVE builtins framework.
> 
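
With the framework in place, each intrinsic then boils down to roughly
one line per file; a sketch based on the ChangeLog below (the macro
name FUNCTION_WITH_RTX_M_N is from this patch, the exact .def
arguments are assumptions):

    /* arm-mve-builtins-base.cc: PLUS rtx_code plus VADDQ_* unspecs.  */
    FUNCTION_WITH_RTX_M_N (vaddq, PLUS, VADDQ)
    /* arm-mve-builtins-base.def: shape, type suffixes, predicates.  */
    DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_integer, mx_or_none)
    /* arm-mve-builtins-base.h: declaration of the function_base.  */
    extern const function_base *const vaddq;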
> 2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>
> 
> 	gcc/
> 
> 	* config/arm/arm-mve-builtins-base.cc (FUNCTION_WITH_RTX_M_N):
> 	New.
> 	(vaddq, vmulq, vsubq): New.
> 	* config/arm/arm-mve-builtins-base.def (vaddq, vmulq, vsubq): New.
> 	* config/arm/arm-mve-builtins-base.h (vaddq, vmulq, vsubq): New.
> 	* config/arm/arm_mve.h (vaddq): Remove.
> 	(vaddq_m): Remove.
> 	(vaddq_x): Remove.
> 	(vaddq_n_u8): Remove.
> 	(vaddq_n_s8): Remove.
> 	(vaddq_n_u16): Remove.
> 	(vaddq_n_s16): Remove.
> 	(vaddq_n_u32): Remove.
> 	(vaddq_n_s32): Remove.
> 	(vaddq_n_f16): Remove.
> 	(vaddq_n_f32): Remove.
> 	(vaddq_m_n_s8): Remove.
> 	(vaddq_m_n_s32): Remove.
> 	(vaddq_m_n_s16): Remove.
> 	(vaddq_m_n_u8): Remove.
> 	(vaddq_m_n_u32): Remove.
> 	(vaddq_m_n_u16): Remove.
> 	(vaddq_m_s8): Remove.
> 	(vaddq_m_s32): Remove.
> 	(vaddq_m_s16): Remove.
> 	(vaddq_m_u8): Remove.
> 	(vaddq_m_u32): Remove.
> 	(vaddq_m_u16): Remove.
> 	(vaddq_m_f32): Remove.
> 	(vaddq_m_f16): Remove.
> 	(vaddq_m_n_f32): Remove.
> 	(vaddq_m_n_f16): Remove.
> 	(vaddq_s8): Remove.
> 	(vaddq_s16): Remove.
> 	(vaddq_s32): Remove.
> 	(vaddq_u8): Remove.
> 	(vaddq_u16): Remove.
> 	(vaddq_u32): Remove.
> 	(vaddq_f16): Remove.
> 	(vaddq_f32): Remove.
> 	(vaddq_x_s8): Remove.
> 	(vaddq_x_s16): Remove.
> 	(vaddq_x_s32): Remove.
> 	(vaddq_x_n_s8): Remove.
> 	(vaddq_x_n_s16): Remove.
> 	(vaddq_x_n_s32): Remove.
> 	(vaddq_x_u8): Remove.
> 	(vaddq_x_u16): Remove.
> 	(vaddq_x_u32): Remove.
> 	(vaddq_x_n_u8): Remove.
> 	(vaddq_x_n_u16): Remove.
> 	(vaddq_x_n_u32): Remove.
> 	(vaddq_x_f16): Remove.
> 	(vaddq_x_f32): Remove.
> 	(vaddq_x_n_f16): Remove.
> 	(vaddq_x_n_f32): Remove.
> 	(__arm_vaddq_n_u8): Remove.
> 	(__arm_vaddq_n_s8): Remove.
> 	(__arm_vaddq_n_u16): Remove.
> 	(__arm_vaddq_n_s16): Remove.
> 	(__arm_vaddq_n_u32): Remove.
> 	(__arm_vaddq_n_s32): Remove.
> 	(__arm_vaddq_m_n_s8): Remove.
> 	(__arm_vaddq_m_n_s32): Remove.
> 	(__arm_vaddq_m_n_s16): Remove.
> 	(__arm_vaddq_m_n_u8): Remove.
> 	(__arm_vaddq_m_n_u32): Remove.
> 	(__arm_vaddq_m_n_u16): Remove.
> 	(__arm_vaddq_m_s8): Remove.
> 	(__arm_vaddq_m_s32): Remove.
> 	(__arm_vaddq_m_s16): Remove.
> 	(__arm_vaddq_m_u8): Remove.
> 	(__arm_vaddq_m_u32): Remove.
> 	(__arm_vaddq_m_u16): Remove.
> 	(__arm_vaddq_s8): Remove.
> 	(__arm_vaddq_s16): Remove.
> 	(__arm_vaddq_s32): Remove.
> 	(__arm_vaddq_u8): Remove.
> 	(__arm_vaddq_u16): Remove.
> 	(__arm_vaddq_u32): Remove.
> 	(__arm_vaddq_x_s8): Remove.
> 	(__arm_vaddq_x_s16): Remove.
> 	(__arm_vaddq_x_s32): Remove.
> 	(__arm_vaddq_x_n_s8): Remove.
> 	(__arm_vaddq_x_n_s16): Remove.
> 	(__arm_vaddq_x_n_s32): Remove.
> 	(__arm_vaddq_x_u8): Remove.
> 	(__arm_vaddq_x_u16): Remove.
> 	(__arm_vaddq_x_u32): Remove.
> 	(__arm_vaddq_x_n_u8): Remove.
> 	(__arm_vaddq_x_n_u16): Remove.
> 	(__arm_vaddq_x_n_u32): Remove.
> 	(__arm_vaddq_n_f16): Remove.
> 	(__arm_vaddq_n_f32): Remove.
> 	(__arm_vaddq_m_f32): Remove.
> 	(__arm_vaddq_m_f16): Remove.
> 	(__arm_vaddq_m_n_f32): Remove.
> 	(__arm_vaddq_m_n_f16): Remove.
> 	(__arm_vaddq_f16): Remove.
> 	(__arm_vaddq_f32): Remove.
> 	(__arm_vaddq_x_f16): Remove.
> 	(__arm_vaddq_x_f32): Remove.
> 	(__arm_vaddq_x_n_f16): Remove.
> 	(__arm_vaddq_x_n_f32): Remove.
> 	(__arm_vaddq): Remove.
> 	(__arm_vaddq_m): Remove.
> 	(__arm_vaddq_x): Remove.
> 	(vmulq): Remove.
> 	(vmulq_m): Remove.
> 	(vmulq_x): Remove.
> 	(vmulq_u8): Remove.
> 	(vmulq_n_u8): Remove.
> 	(vmulq_s8): Remove.
> 	(vmulq_n_s8): Remove.
> 	(vmulq_u16): Remove.
> 	(vmulq_n_u16): Remove.
> 	(vmulq_s16): Remove.
> 	(vmulq_n_s16): Remove.
> 	(vmulq_u32): Remove.
> 	(vmulq_n_u32): Remove.
> 	(vmulq_s32): Remove.
> 	(vmulq_n_s32): Remove.
> 	(vmulq_n_f16): Remove.
> 	(vmulq_f16): Remove.
> 	(vmulq_n_f32): Remove.
> 	(vmulq_f32): Remove.
> 	(vmulq_m_n_s8): Remove.
> 	(vmulq_m_n_s32): Remove.
> 	(vmulq_m_n_s16): Remove.
> 	(vmulq_m_n_u8): Remove.
> 	(vmulq_m_n_u32): Remove.
> 	(vmulq_m_n_u16): Remove.
> 	(vmulq_m_s8): Remove.
> 	(vmulq_m_s32): Remove.
> 	(vmulq_m_s16): Remove.
> 	(vmulq_m_u8): Remove.
> 	(vmulq_m_u32): Remove.
> 	(vmulq_m_u16): Remove.
> 	(vmulq_m_f32): Remove.
> 	(vmulq_m_f16): Remove.
> 	(vmulq_m_n_f32): Remove.
> 	(vmulq_m_n_f16): Remove.
> 	(vmulq_x_s8): Remove.
> 	(vmulq_x_s16): Remove.
> 	(vmulq_x_s32): Remove.
> 	(vmulq_x_n_s8): Remove.
> 	(vmulq_x_n_s16): Remove.
> 	(vmulq_x_n_s32): Remove.
> 	(vmulq_x_u8): Remove.
> 	(vmulq_x_u16): Remove.
> 	(vmulq_x_u32): Remove.
> 	(vmulq_x_n_u8): Remove.
> 	(vmulq_x_n_u16): Remove.
> 	(vmulq_x_n_u32): Remove.
> 	(vmulq_x_f16): Remove.
> 	(vmulq_x_f32): Remove.
> 	(vmulq_x_n_f16): Remove.
> 	(vmulq_x_n_f32): Remove.
> 	(__arm_vmulq_u8): Remove.
> 	(__arm_vmulq_n_u8): Remove.
> 	(__arm_vmulq_s8): Remove.
> 	(__arm_vmulq_n_s8): Remove.
> 	(__arm_vmulq_u16): Remove.
> 	(__arm_vmulq_n_u16): Remove.
> 	(__arm_vmulq_s16): Remove.
> 	(__arm_vmulq_n_s16): Remove.
> 	(__arm_vmulq_u32): Remove.
> 	(__arm_vmulq_n_u32): Remove.
> 	(__arm_vmulq_s32): Remove.
> 	(__arm_vmulq_n_s32): Remove.
> 	(__arm_vmulq_m_n_s8): Remove.
> 	(__arm_vmulq_m_n_s32): Remove.
> 	(__arm_vmulq_m_n_s16): Remove.
> 	(__arm_vmulq_m_n_u8): Remove.
> 	(__arm_vmulq_m_n_u32): Remove.
> 	(__arm_vmulq_m_n_u16): Remove.
> 	(__arm_vmulq_m_s8): Remove.
> 	(__arm_vmulq_m_s32): Remove.
> 	(__arm_vmulq_m_s16): Remove.
> 	(__arm_vmulq_m_u8): Remove.
> 	(__arm_vmulq_m_u32): Remove.
> 	(__arm_vmulq_m_u16): Remove.
> 	(__arm_vmulq_x_s8): Remove.
> 	(__arm_vmulq_x_s16): Remove.
> 	(__arm_vmulq_x_s32): Remove.
> 	(__arm_vmulq_x_n_s8): Remove.
> 	(__arm_vmulq_x_n_s16): Remove.
> 	(__arm_vmulq_x_n_s32): Remove.
> 	(__arm_vmulq_x_u8): Remove.
> 	(__arm_vmulq_x_u16): Remove.
> 	(__arm_vmulq_x_u32): Remove.
> 	(__arm_vmulq_x_n_u8): Remove.
> 	(__arm_vmulq_x_n_u16): Remove.
> 	(__arm_vmulq_x_n_u32): Remove.
> 	(__arm_vmulq_n_f16): Remove.
> 	(__arm_vmulq_f16): Remove.
> 	(__arm_vmulq_n_f32): Remove.
> 	(__arm_vmulq_f32): Remove.
> 	(__arm_vmulq_m_f32): Remove.
> 	(__arm_vmulq_m_f16): Remove.
> 	(__arm_vmulq_m_n_f32): Remove.
> 	(__arm_vmulq_m_n_f16): Remove.
> 	(__arm_vmulq_x_f16): Remove.
> 	(__arm_vmulq_x_f32): Remove.
> 	(__arm_vmulq_x_n_f16): Remove.
> 	(__arm_vmulq_x_n_f32): Remove.
> 	(__arm_vmulq): Remove.
> 	(__arm_vmulq_m): Remove.
> 	(__arm_vmulq_x): Remove.
> 	(vsubq): Remove.
> 	(vsubq_m): Remove.
> 	(vsubq_x): Remove.
> 	(vsubq_n_f16): Remove.
> 	(vsubq_n_f32): Remove.
> 	(vsubq_u8): Remove.
> 	(vsubq_n_u8): Remove.
> 	(vsubq_s8): Remove.
> 	(vsubq_n_s8): Remove.
> 	(vsubq_u16): Remove.
> 	(vsubq_n_u16): Remove.
> 	(vsubq_s16): Remove.
> 	(vsubq_n_s16): Remove.
> 	(vsubq_u32): Remove.
> 	(vsubq_n_u32): Remove.
> 	(vsubq_s32): Remove.
> 	(vsubq_n_s32): Remove.
> 	(vsubq_f16): Remove.
> 	(vsubq_f32): Remove.
> 	(vsubq_m_s8): Remove.
> 	(vsubq_m_u8): Remove.
> 	(vsubq_m_s16): Remove.
> 	(vsubq_m_u16): Remove.
> 	(vsubq_m_s32): Remove.
> 	(vsubq_m_u32): Remove.
> 	(vsubq_m_n_s8): Remove.
> 	(vsubq_m_n_s32): Remove.
> 	(vsubq_m_n_s16): Remove.
> 	(vsubq_m_n_u8): Remove.
> 	(vsubq_m_n_u32): Remove.
> 	(vsubq_m_n_u16): Remove.
> 	(vsubq_m_f32): Remove.
> 	(vsubq_m_f16): Remove.
> 	(vsubq_m_n_f32): Remove.
> 	(vsubq_m_n_f16): Remove.
> 	(vsubq_x_s8): Remove.
> 	(vsubq_x_s16): Remove.
> 	(vsubq_x_s32): Remove.
> 	(vsubq_x_n_s8): Remove.
> 	(vsubq_x_n_s16): Remove.
> 	(vsubq_x_n_s32): Remove.
> 	(vsubq_x_u8): Remove.
> 	(vsubq_x_u16): Remove.
> 	(vsubq_x_u32): Remove.
> 	(vsubq_x_n_u8): Remove.
> 	(vsubq_x_n_u16): Remove.
> 	(vsubq_x_n_u32): Remove.
> 	(vsubq_x_f16): Remove.
> 	(vsubq_x_f32): Remove.
> 	(vsubq_x_n_f16): Remove.
> 	(vsubq_x_n_f32): Remove.
> 	(__arm_vsubq_u8): Remove.
> 	(__arm_vsubq_n_u8): Remove.
> 	(__arm_vsubq_s8): Remove.
> 	(__arm_vsubq_n_s8): Remove.
> 	(__arm_vsubq_u16): Remove.
> 	(__arm_vsubq_n_u16): Remove.
> 	(__arm_vsubq_s16): Remove.
> 	(__arm_vsubq_n_s16): Remove.
> 	(__arm_vsubq_u32): Remove.
> 	(__arm_vsubq_n_u32): Remove.
> 	(__arm_vsubq_s32): Remove.
> 	(__arm_vsubq_n_s32): Remove.
> 	(__arm_vsubq_m_s8): Remove.
> 	(__arm_vsubq_m_u8): Remove.
> 	(__arm_vsubq_m_s16): Remove.
> 	(__arm_vsubq_m_u16): Remove.
> 	(__arm_vsubq_m_s32): Remove.
> 	(__arm_vsubq_m_u32): Remove.
> 	(__arm_vsubq_m_n_s8): Remove.
> 	(__arm_vsubq_m_n_s32): Remove.
> 	(__arm_vsubq_m_n_s16): Remove.
> 	(__arm_vsubq_m_n_u8): Remove.
> 	(__arm_vsubq_m_n_u32): Remove.
> 	(__arm_vsubq_m_n_u16): Remove.
> 	(__arm_vsubq_x_s8): Remove.
> 	(__arm_vsubq_x_s16): Remove.
> 	(__arm_vsubq_x_s32): Remove.
> 	(__arm_vsubq_x_n_s8): Remove.
> 	(__arm_vsubq_x_n_s16): Remove.
> 	(__arm_vsubq_x_n_s32): Remove.
> 	(__arm_vsubq_x_u8): Remove.
> 	(__arm_vsubq_x_u16): Remove.
> 	(__arm_vsubq_x_u32): Remove.
> 	(__arm_vsubq_x_n_u8): Remove.
> 	(__arm_vsubq_x_n_u16): Remove.
> 	(__arm_vsubq_x_n_u32): Remove.
> 	(__arm_vsubq_n_f16): Remove.
> 	(__arm_vsubq_n_f32): Remove.
> 	(__arm_vsubq_f16): Remove.
> 	(__arm_vsubq_f32): Remove.
> 	(__arm_vsubq_m_f32): Remove.
> 	(__arm_vsubq_m_f16): Remove.
> 	(__arm_vsubq_m_n_f32): Remove.
> 	(__arm_vsubq_m_n_f16): Remove.
> 	(__arm_vsubq_x_f16): Remove.
> 	(__arm_vsubq_x_f32): Remove.
> 	(__arm_vsubq_x_n_f16): Remove.
> 	(__arm_vsubq_x_n_f32): Remove.
> 	(__arm_vsubq): Remove.
> 	(__arm_vsubq_m): Remove.
> 	(__arm_vsubq_x): Remove.
> 	* config/arm/arm_mve_builtins.def (vsubq_u, vsubq_s, vsubq_f):
> 	Remove.
> 	(vmulq_u, vmulq_s, vmulq_f): Remove.
> 	* config/arm/mve.md (mve_vsubq_<supf><mode>): Remove.
> 	(mve_vmulq_<supf><mode>): Remove.

[snip]

> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 5167fbc6add..ccb3cf23304 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -1353,18 +1353,6 @@ (define_insn "mve_vmulltq_int_<supf><mode>"
>  ;; [vmulq_u, vmulq_s])
>  ;; [vsubq_s, vsubq_u])
>  ;;
> -(define_insn "mve_vmulq_<supf><mode>"
> -  [
> -    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")]
> -	 VMULQ))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vmul.i%#<V_sz_elem>\t%q0, %q1, %q2"
> -  [(set_attr "type" "mve_move")
> -])
> -
>  (define_insn "mve_<mve_addsubmul>q<mode>"
>    [
>     (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> @@ -1742,21 +1730,6 @@ (define_insn "mve_vshlq_r_<supf><mode>"
>    [(set_attr "type" "mve_move")
>  ])
> 
> -;;
> -;; [vsubq_s, vsubq_u])
> -;;
> -(define_insn "mve_vsubq_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")]
> -	 VSUBQ))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vsub.i%#<V_sz_elem>\t%q0, %q1, %q2"
> -  [(set_attr "type" "mve_move")
> -])
> -

Just to make sure I understand correctly, are these patterns being removed because the new builtins are wired through the factored patterns in patch [07/22]?
If so, ok.
Thanks,
Kyrill
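
For readers following the series: since the unpredicated vaddq/vmulq/vsubq
forms are now expanded through the standard PLUS/MULT/MINUS RTX codes (see
FUNCTION_WITH_RTX_M_N in patch 11/22 below) rather than unspecs, they should
behave exactly like GNU C vector operators.  A minimal sketch; the function
names are illustrative only:

  #include <arm_mve.h>

  /* Both functions should compile to a single vadd.i32.  */
  int32x4_t add_intrinsic (int32x4_t a, int32x4_t b)
  {
    return vaddq_s32 (a, b);
  }

  int32x4_t add_generic (int32x4_t a, int32x4_t b)
  {
    return a + b;	/* GNU C vector extension.  */
  }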

>  ;;
>  ;; [vabdq_f])
>  ;;
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH 09/22] arm: [MVE intrinsics] add binary shape
  2023-04-18 13:45 ` [PATCH 09/22] arm: [MVE intrinsics] add binary shape Christophe Lyon
@ 2023-05-02 16:32   ` Kyrylo Tkachov
  0 siblings, 0 replies; 55+ messages in thread
From: Kyrylo Tkachov @ 2023-05-02 16:32 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Earnshaw, Richard Sandiford
  Cc: Christophe Lyon



> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@arm.com>
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
> Subject: [PATCH 09/22] arm: [MVE intrinsics] add binary shape
> 
> This patch adds the binary shape description.

Ok.
Thanks,
Kyrill
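
As a usage sketch of what the binary shape generates (the signatures match
the vandq example quoted below; the function f is illustrative only):

  #include <arm_mve.h>

  int8x16_t f (int8x16_t a, int8x16_t b, int8x16_t inactive, mve_pred16_t p)
  {
    int8x16_t r0 = vandq (a, b);                 /* overloaded form */
    int8x16_t r1 = vandq_m (inactive, a, b, p);  /* merging predication */
    int8x16_t r2 = vandq_x (a, b, p);            /* "don't care" predication */
    return veorq (r0, veorq (r1, r2));
  }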

> 
> 2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>
> 
> 	gcc/
> 	* config/arm/arm-mve-builtins-shapes.cc (binary): New.
> 	* config/arm/arm-mve-builtins-shapes.h (binary): New.
> ---
>  gcc/config/arm/arm-mve-builtins-shapes.cc | 27 +++++++++++++++++++++++
>  gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
>  2 files changed, 28 insertions(+)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
> index 033b304060a..e69faae4e2c 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.cc
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
> @@ -338,6 +338,33 @@ struct overloaded_base : public function_shape
>    }
>  };
> 
> +/* <T0>_t vfoo[_t0](<T0>_t, <T0>_t)
> +
> +   i.e. the standard shape for binary operations that operate on
> +   uniform types.
> +
> +   Example: vandq.
> +   int8x16_t [__arm_]vandq[_s8](int8x16_t a, int8x16_t b)
> +   int8x16_t [__arm_]vandq_m[_s8](int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
> +   int8x16_t [__arm_]vandq_x[_s8](int8x16_t a, int8x16_t b, mve_pred16_t p)  */
> +struct binary_def : public overloaded_base<0>
> +{
> +  void
> +  build (function_builder &b, const function_group_info &group,
> +	 bool preserve_user_namespace) const override
> +  {
> +    b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
> +    build_all (b, "v0,v0,v0", group, MODE_none, preserve_user_namespace);
> +  }
> +
> +  tree
> +  resolve (function_resolver &r) const override
> +  {
> +    return r.resolve_uniform (2);
> +  }
> +};
> +SHAPE (binary)
> +
>  /* <T0>_t vfoo[_t0](<T0>_t, <T0>_t)
>     <T0>_t vfoo[_n_t0](<T0>_t, <S0>_t)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
> index 43798fdde57..b00ee5eb57a 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.h
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.h
> @@ -34,6 +34,7 @@ namespace arm_mve
>    namespace shapes
>    {
> 
> +    extern const function_shape *const binary;
>      extern const function_shape *const binary_opt_n;
>      extern const function_shape *const inherent;
>      extern const function_shape *const unary_convert;
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH 10/22] arm: [MVE intrinsics] factorize vandq veorq vorrq vbicq
  2023-04-18 13:45 ` [PATCH 10/22] arm: [MVE intrinsics] factorize vandq veorq vorrq vbicq Christophe Lyon
@ 2023-05-02 16:36   ` Kyrylo Tkachov
  0 siblings, 0 replies; 55+ messages in thread
From: Kyrylo Tkachov @ 2023-05-02 16:36 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Earnshaw, Richard Sandiford
  Cc: Christophe Lyon



> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@arm.com>
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
> Subject: [PATCH 10/22] arm: [MVE intrinsics] factorize vandq veorq vorrq
> vbicq
> 
> Factorize vandq, veorq, vorrq, vbicq so that they use the same
> parameterized names.
> 
> 2022-09-08  Christophe Lyon <christophe.lyon@arm.com>
> 
> 	gcc/
> 	* config/arm/iterators.md (MVE_INT_M_BINARY_LOGIC)
> 	(MVE_FP_M_BINARY_LOGIC): New.
> 	(MVE_INT_M_N_BINARY_LOGIC): New.
> 	(MVE_INT_N_BINARY_LOGIC): New.
> 	(mve_insn): Add vand, veor, vorr, vbic.
> 	* config/arm/mve.md (mve_vandq_m_<supf><mode>)
> 	(mve_veorq_m_<supf><mode>, mve_vorrq_m_<supf><mode>)
> 	(mve_vbicq_m_<supf><mode>): Merge into ...
> 	(@mve_<mve_insn>q_m_<supf><mode>): ... this.
> 	(mve_vandq_m_f<mode>, mve_veorq_m_f<mode>, mve_vorrq_m_f<mode>)
> 	(mve_vbicq_m_f<mode>): Merge into ...
> 	(@mve_<mve_insn>q_m_f<mode>): ... this.
> 	(mve_vorrq_n_<supf><mode>)
> 	(mve_vbicq_n_<supf><mode>): Merge into ...
> 	(@mve_<mve_insn>q_n_<supf><mode>): ... this.
> 	(mve_vorrq_m_n_<supf><mode>, mve_vbicq_m_n_<supf><mode>): Merge
> 	into ...
> 	(@mve_<mve_insn>q_m_n_<supf><mode>): ... this.
> ---
>  gcc/config/arm/iterators.md |  32 +++++++
>  gcc/config/arm/mve.md       | 161 +++++-------------------------------
>  2 files changed, 51 insertions(+), 142 deletions(-)
> 
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index d3bef594775..b0ea1af77d2 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -339,24 +339,48 @@ (define_int_iterator MVE_INT_M_BINARY   [
>  		     VSUBQ_M_S VSUBQ_M_U
>  		     ])
> 
> +(define_int_iterator MVE_INT_M_BINARY_LOGIC   [
> +		     VANDQ_M_S VANDQ_M_U
> +		     VBICQ_M_S VBICQ_M_U
> +		     VEORQ_M_S VEORQ_M_U
> +		     VORRQ_M_S VORRQ_M_U
> +		     ])
> +
>  (define_int_iterator MVE_INT_M_N_BINARY [
>  		     VADDQ_M_N_S VADDQ_M_N_U
>  		     VMULQ_M_N_S VMULQ_M_N_U
>  		     VSUBQ_M_N_S VSUBQ_M_N_U
>  		     ])
> 
> +(define_int_iterator MVE_INT_M_N_BINARY_LOGIC [
> +		     VBICQ_M_N_S VBICQ_M_N_U
> +		     VORRQ_M_N_S VORRQ_M_N_U
> +		     ])
> +
>  (define_int_iterator MVE_INT_N_BINARY   [
>  		     VADDQ_N_S VADDQ_N_U
>  		     VMULQ_N_S VMULQ_N_U
>  		     VSUBQ_N_S VSUBQ_N_U
>  		     ])
> 
> +(define_int_iterator MVE_INT_N_BINARY_LOGIC   [
> +		     VBICQ_N_S VBICQ_N_U
> +		     VORRQ_N_S VORRQ_N_U
> +		     ])
> +
>  (define_int_iterator MVE_FP_M_BINARY   [
>  		     VADDQ_M_F
>  		     VMULQ_M_F
>  		     VSUBQ_M_F
>  		     ])
> 
> +(define_int_iterator MVE_FP_M_BINARY_LOGIC   [
> +		     VANDQ_M_F
> +		     VBICQ_M_F
> +		     VEORQ_M_F
> +		     VORRQ_M_F
> +		     ])
> +
>  (define_int_iterator MVE_FP_M_N_BINARY [
>  		     VADDQ_M_N_F
>  		     VMULQ_M_N_F
> @@ -379,9 +403,17 @@ (define_int_attr mve_insn [
>  		 (VADDQ_M_N_S "vadd") (VADDQ_M_N_U "vadd")
> (VADDQ_M_N_F "vadd")
>  		 (VADDQ_M_S "vadd") (VADDQ_M_U "vadd") (VADDQ_M_F
> "vadd")
>  		 (VADDQ_N_S "vadd") (VADDQ_N_U "vadd") (VADDQ_N_F
> "vadd")
> +		 (VANDQ_M_S "vand") (VANDQ_M_U "vand") (VANDQ_M_F
> "vand")
> +		 (VBICQ_M_N_S "vbic") (VBICQ_M_N_U "vbic")
> +		 (VBICQ_M_S "vbic") (VBICQ_M_U "vbic") (VBICQ_M_F
> "vbic")
> +		 (VBICQ_N_S "vbic") (VBICQ_N_U "vbic")
> +		 (VEORQ_M_S "veor") (VEORQ_M_U "veor") (VEORQ_M_F
> "veor")
>  		 (VMULQ_M_N_S "vmul") (VMULQ_M_N_U "vmul")
> (VMULQ_M_N_F "vmul")
>  		 (VMULQ_M_S "vmul") (VMULQ_M_U "vmul") (VMULQ_M_F
> "vmul")
>  		 (VMULQ_N_S "vmul") (VMULQ_N_U "vmul") (VMULQ_N_F
> "vmul")
> +		 (VORRQ_M_N_S "vorr") (VORRQ_M_N_U "vorr")
> +		 (VORRQ_M_S "vorr") (VORRQ_M_U "vorr") (VORRQ_M_F
> "vorr")
> +		 (VORRQ_N_S "vorr") (VORRQ_N_U "vorr")
>  		 (VSUBQ_M_N_S "vsub") (VSUBQ_M_N_U "vsub")
> (VSUBQ_M_N_F "vsub")
>  		 (VSUBQ_M_S "vsub") (VSUBQ_M_U "vsub") (VSUBQ_M_F
> "vsub")
>  		 (VSUBQ_N_S "vsub") (VSUBQ_N_U "vsub") (VSUBQ_N_F
> "vsub")
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index ccb3cf23304..fbae1d3791f 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -1805,21 +1805,6 @@ (define_insn "mve_vbicq_f<mode>"
>    [(set_attr "type" "mve_move")
>  ])
> 
> -;;
> -;; [vbicq_n_s, vbicq_n_u])
> -;;
> -(define_insn "mve_vbicq_n_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_5 0 "s_register_operand" "=w")
> -	(unspec:MVE_5 [(match_operand:MVE_5 1 "s_register_operand" "0")
> -		       (match_operand:SI 2 "immediate_operand" "i")]
> -	 VBICQ_N))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vbic.i%#<V_sz_elem>	%q0, %2"
> -  [(set_attr "type" "mve_move")
> -])
> -
>  ;;
>  ;; [vcaddq, vcaddq_rot90, vcadd_rot180, vcadd_rot270])
>  ;;
> @@ -2191,17 +2176,18 @@ (define_insn "mve_vorrq_f<mode>"
>  ])
> 
>  ;;
> +;; [vbicq_n_s, vbicq_n_u])
>  ;; [vorrq_n_u, vorrq_n_s])

As in the other patch, let's get rid of these trailing ')' in the patterns this patch touches.
We can clean up any remaining occurrences after the series with pre-approved patches.
Ok otherwise.
Thanks,
Kyrill
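
A small usage sketch of the immediate forms the merged patterns cover.  The
constant and the _m_n signature are assumptions for illustration; the
constant is chosen to be encodable in the instruction's immediate field:

  #include <arm_mve.h>

  /* Routed through the merged @mve_<mve_insn>q_n_<supf><mode> pattern,
     emitting vorr.i16 / vbic.i16 with an immediate.  */
  uint16x8_t set_low (uint16x8_t a)   { return vorrq_n_u16 (a, 0x00ff); }
  uint16x8_t clear_low (uint16x8_t a) { return vbicq_n_u16 (a, 0x00ff); }

  /* Predicated variant, routed through @mve_<mve_insn>q_m_n_<supf><mode>.  */
  uint16x8_t set_low_m (uint16x8_t a, mve_pred16_t p)
  {
    return vorrq_m_n_u16 (a, 0x00ff, p);
  }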

>  ;;
> -(define_insn "mve_vorrq_n_<supf><mode>"
> +(define_insn "@mve_<mve_insn>q_n_<supf><mode>"
>    [
>     (set (match_operand:MVE_5 0 "s_register_operand" "=w")
>  	(unspec:MVE_5 [(match_operand:MVE_5 1 "s_register_operand" "0")
>  		       (match_operand:SI 2 "immediate_operand" "i")]
> -	 VORRQ_N))
> +	 MVE_INT_N_BINARY_LOGIC))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vorr.i%#<V_sz_elem>	%q0, %2"
> +  "<mve_insn>.i%#<V_sz_elem>	%q0, %2"
>    [(set_attr "type" "mve_move")
>  ])
> 
> @@ -2445,21 +2431,6 @@ (define_insn "mve_vrmlaldavhq_<supf>v4si"
>    [(set_attr "type" "mve_move")
>  ])
> 
> -;;
> -;; [vbicq_m_n_s, vbicq_m_n_u])
> -;;
> -(define_insn "mve_vbicq_m_n_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_5 0 "s_register_operand" "=w")
> -	(unspec:MVE_5 [(match_operand:MVE_5 1 "s_register_operand" "0")
> -		       (match_operand:SI 2 "immediate_operand" "i")
> -		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
> -	 VBICQ_M_N))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vbict.i%#<V_sz_elem>	%q0, %2"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
>  ;;
>  ;; [vcmpeqq_m_f])
>  ;;
> @@ -4269,20 +4240,22 @@ (define_insn "mve_vnegq_m_f<mode>"
>     (set_attr "length""8")])
> 
>  ;;
> +;; [vbicq_m_n_s, vbicq_m_n_u])
>  ;; [vorrq_m_n_s, vorrq_m_n_u])
>  ;;
> -(define_insn "mve_vorrq_m_n_<supf><mode>"
> +(define_insn "@mve_<mve_insn>q_m_n_<supf><mode>"
>    [
>     (set (match_operand:MVE_5 0 "s_register_operand" "=w")
>  	(unspec:MVE_5 [(match_operand:MVE_5 1 "s_register_operand" "0")
>  		       (match_operand:SI 2 "immediate_operand" "i")
>  		       (match_operand:<MVE_VPRED> 3 "vpr_register_operand" "Up")]
> -	 VORRQ_M_N))
> +	 MVE_INT_M_N_BINARY_LOGIC))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vpst\;vorrt.i%#<V_sz_elem>	%q0, %2"
> +  "vpst\;<mve_insn>t.i%#<V_sz_elem>	%q0, %2"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> +
>  ;;
>  ;; [vpselq_f])
>  ;;
> @@ -5001,35 +4974,21 @@ (define_insn "@mve_<mve_insn>q_m_<supf><mode>"
> 
>  ;;
>  ;; [vandq_m_u, vandq_m_s])
> -;;
> -(define_insn "mve_vandq_m_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:MVE_2 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VANDQ_M))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vandt %q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
> -;;
>  ;; [vbicq_m_u, vbicq_m_s])
> +;; [veorq_m_u, veorq_m_s])
> +;; [vorrq_m_u, vorrq_m_s])
>  ;;
> -(define_insn "mve_vbicq_m_<supf><mode>"
> +(define_insn "@mve_<mve_insn>q_m_<supf><mode>"
>    [
>     (set (match_operand:MVE_2 0 "s_register_operand" "=w")
>  	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
>  		       (match_operand:MVE_2 2 "s_register_operand" "w")
>  		       (match_operand:MVE_2 3 "s_register_operand" "w")
>  		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VBICQ_M))
> +	 MVE_INT_M_BINARY_LOGIC))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vpst\;vbict %q0, %q2, %q3"
> +  "vpst\;<mve_insn>t %q0, %q2, %q3"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> @@ -5084,23 +5043,6 @@ (define_insn "mve_vcaddq_rot90_m_<supf><mode>"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> -;;
> -;; [veorq_m_s, veorq_m_u])
> -;;
> -(define_insn "mve_veorq_m_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:MVE_2 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VEORQ_M))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;veort %q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
>  ;;
>  ;; [vhaddq_m_n_s, vhaddq_m_n_u])
>  ;;
> @@ -5322,23 +5264,6 @@ (define_insn "mve_vornq_m_<supf><mode>"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> -;;
> -;; [vorrq_m_s, vorrq_m_u])
> -;;
> -(define_insn "mve_vorrq_m_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:MVE_2 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VORRQ_M))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vorrt %q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
>  ;;
>  ;; [vqaddq_m_n_u, vqaddq_m_n_s])
>  ;;
> @@ -6483,35 +6408,21 @@ (define_insn "@mve_<mve_insn>q_m_n_f<mode>"
> 
>  ;;
>  ;; [vandq_m_f])
> -;;
> -(define_insn "mve_vandq_m_f<mode>"
> -  [
> -   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
> -	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
> -		       (match_operand:MVE_0 2 "s_register_operand" "w")
> -		       (match_operand:MVE_0 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VANDQ_M_F))
> -  ]
> -  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vpst\;vandt %q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
> -;;
>  ;; [vbicq_m_f])
> +;; [veorq_m_f])
> +;; [vorrq_m_f])
>  ;;
> -(define_insn "mve_vbicq_m_f<mode>"
> +(define_insn "@mve_<mve_insn>q_m_f<mode>"
>    [
>     (set (match_operand:MVE_0 0 "s_register_operand" "=w")
>  	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
>  		       (match_operand:MVE_0 2 "s_register_operand" "w")
>  		       (match_operand:MVE_0 3 "s_register_operand" "w")
>  		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VBICQ_M_F))
> +	 MVE_FP_M_BINARY_LOGIC))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vpst\;vbict %q0, %q2, %q3"
> +  "vpst\;<mve_insn>t %q0, %q2, %q3"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> @@ -6702,23 +6613,6 @@ (define_insn "mve_vcmulq_rot90_m_f<mode>"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> -;;
> -;; [veorq_m_f])
> -;;
> -(define_insn "mve_veorq_m_f<mode>"
> -  [
> -   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
> -	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
> -		       (match_operand:MVE_0 2 "s_register_operand" "w")
> -		       (match_operand:MVE_0 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VEORQ_M_F))
> -  ]
> -  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vpst\;veort %q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
>  ;;
>  ;; [vfmaq_m_f])
>  ;;
> @@ -6838,23 +6732,6 @@ (define_insn "mve_vornq_m_f<mode>"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> -;;
> -;; [vorrq_m_f])
> -;;
> -(define_insn "mve_vorrq_m_f<mode>"
> -  [
> -   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
> -	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
> -		       (match_operand:MVE_0 2 "s_register_operand" "w")
> -		       (match_operand:MVE_0 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VORRQ_M_F))
> -  ]
> -  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vpst\;vorrt %q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
>  ;;
>  ;; [vstrbq_s vstrbq_u]
>  ;;
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH 11/22] arm: [MVE intrinsics] rework vandq veorq
  2023-04-18 13:45 ` [PATCH 11/22] arm: [MVE intrinsics] rework vandq veorq Christophe Lyon
@ 2023-05-02 16:37   ` Kyrylo Tkachov
  0 siblings, 0 replies; 55+ messages in thread
From: Kyrylo Tkachov @ 2023-05-02 16:37 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Earnshaw, Richard Sandiford
  Cc: Christophe Lyon



> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@arm.com>
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
> Subject: [PATCH 11/22] arm: [MVE intrinsics] rework vandq veorq
> 
> Implement vandq, veorq using the new MVE builtins framework.
> 

Ok.
Thanks,
Kyrill
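
For reference, hand-expanding the new helper quoted below for vandq gives
the following; the slot comments are editorial, following the argument
layout of the existing FUNCTION_WITH_RTX_M_N:

  FUNCTION (vandq, unspec_based_mve_function_exact_insn,
	    (AND, AND, AND,			/* s/u/f RTX codes.  */
	     -1, -1, -1,			/* no unpredicated _n forms.  */
	     VANDQ_M_S, VANDQ_M_U, VANDQ_M_F,	/* predicated unspecs.  */
	     -1, -1, -1))			/* no predicated _m_n forms.  */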

> 2022-09-08  Christophe Lyon <christophe.lyon@arm.com>
> 
> 	gcc/
> 	* config/arm/arm-mve-builtins-base.cc (FUNCTION_WITH_RTX_M): New.
> 	(vandq, veorq): New.
> 	* config/arm/arm-mve-builtins-base.def (vandq, veorq): New.
> 	* config/arm/arm-mve-builtins-base.h (vandq, veorq): New.
> 	* config/arm/arm_mve.h (vandq): Remove.
> 	(vandq_m): Remove.
> 	(vandq_x): Remove.
> 	(vandq_u8): Remove.
> 	(vandq_s8): Remove.
> 	(vandq_u16): Remove.
> 	(vandq_s16): Remove.
> 	(vandq_u32): Remove.
> 	(vandq_s32): Remove.
> 	(vandq_f16): Remove.
> 	(vandq_f32): Remove.
> 	(vandq_m_s8): Remove.
> 	(vandq_m_s32): Remove.
> 	(vandq_m_s16): Remove.
> 	(vandq_m_u8): Remove.
> 	(vandq_m_u32): Remove.
> 	(vandq_m_u16): Remove.
> 	(vandq_m_f32): Remove.
> 	(vandq_m_f16): Remove.
> 	(vandq_x_s8): Remove.
> 	(vandq_x_s16): Remove.
> 	(vandq_x_s32): Remove.
> 	(vandq_x_u8): Remove.
> 	(vandq_x_u16): Remove.
> 	(vandq_x_u32): Remove.
> 	(vandq_x_f16): Remove.
> 	(vandq_x_f32): Remove.
> 	(__arm_vandq_u8): Remove.
> 	(__arm_vandq_s8): Remove.
> 	(__arm_vandq_u16): Remove.
> 	(__arm_vandq_s16): Remove.
> 	(__arm_vandq_u32): Remove.
> 	(__arm_vandq_s32): Remove.
> 	(__arm_vandq_m_s8): Remove.
> 	(__arm_vandq_m_s32): Remove.
> 	(__arm_vandq_m_s16): Remove.
> 	(__arm_vandq_m_u8): Remove.
> 	(__arm_vandq_m_u32): Remove.
> 	(__arm_vandq_m_u16): Remove.
> 	(__arm_vandq_x_s8): Remove.
> 	(__arm_vandq_x_s16): Remove.
> 	(__arm_vandq_x_s32): Remove.
> 	(__arm_vandq_x_u8): Remove.
> 	(__arm_vandq_x_u16): Remove.
> 	(__arm_vandq_x_u32): Remove.
> 	(__arm_vandq_f16): Remove.
> 	(__arm_vandq_f32): Remove.
> 	(__arm_vandq_m_f32): Remove.
> 	(__arm_vandq_m_f16): Remove.
> 	(__arm_vandq_x_f16): Remove.
> 	(__arm_vandq_x_f32): Remove.
> 	(__arm_vandq): Remove.
> 	(__arm_vandq_m): Remove.
> 	(__arm_vandq_x): Remove.
> 	(veorq_m): Remove.
> 	(veorq_x): Remove.
> 	(veorq_u8): Remove.
> 	(veorq_s8): Remove.
> 	(veorq_u16): Remove.
> 	(veorq_s16): Remove.
> 	(veorq_u32): Remove.
> 	(veorq_s32): Remove.
> 	(veorq_f16): Remove.
> 	(veorq_f32): Remove.
> 	(veorq_m_s8): Remove.
> 	(veorq_m_s32): Remove.
> 	(veorq_m_s16): Remove.
> 	(veorq_m_u8): Remove.
> 	(veorq_m_u32): Remove.
> 	(veorq_m_u16): Remove.
> 	(veorq_m_f32): Remove.
> 	(veorq_m_f16): Remove.
> 	(veorq_x_s8): Remove.
> 	(veorq_x_s16): Remove.
> 	(veorq_x_s32): Remove.
> 	(veorq_x_u8): Remove.
> 	(veorq_x_u16): Remove.
> 	(veorq_x_u32): Remove.
> 	(veorq_x_f16): Remove.
> 	(veorq_x_f32): Remove.
> 	(__arm_veorq_u8): Remove.
> 	(__arm_veorq_s8): Remove.
> 	(__arm_veorq_u16): Remove.
> 	(__arm_veorq_s16): Remove.
> 	(__arm_veorq_u32): Remove.
> 	(__arm_veorq_s32): Remove.
> 	(__arm_veorq_m_s8): Remove.
> 	(__arm_veorq_m_s32): Remove.
> 	(__arm_veorq_m_s16): Remove.
> 	(__arm_veorq_m_u8): Remove.
> 	(__arm_veorq_m_u32): Remove.
> 	(__arm_veorq_m_u16): Remove.
> 	(__arm_veorq_x_s8): Remove.
> 	(__arm_veorq_x_s16): Remove.
> 	(__arm_veorq_x_s32): Remove.
> 	(__arm_veorq_x_u8): Remove.
> 	(__arm_veorq_x_u16): Remove.
> 	(__arm_veorq_x_u32): Remove.
> 	(__arm_veorq_f16): Remove.
> 	(__arm_veorq_f32): Remove.
> 	(__arm_veorq_m_f32): Remove.
> 	(__arm_veorq_m_f16): Remove.
> 	(__arm_veorq_x_f16): Remove.
> 	(__arm_veorq_x_f32): Remove.
> 	(__arm_veorq): Remove.
> 	(__arm_veorq_m): Remove.
> 	(__arm_veorq_x): Remove.
> ---
>  gcc/config/arm/arm-mve-builtins-base.cc  |  10 +
>  gcc/config/arm/arm-mve-builtins-base.def |   4 +
>  gcc/config/arm/arm-mve-builtins-base.h   |   2 +
>  gcc/config/arm/arm_mve.h                 | 862 -----------------------
>  4 files changed, 16 insertions(+), 862 deletions(-)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
> index 48b09bffd0c..51fed8f671f 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.cc
> +++ b/gcc/config/arm/arm-mve-builtins-base.cc
> @@ -90,7 +90,17 @@ namespace arm_mve {
>      UNSPEC##_M_S, UNSPEC##_M_U, UNSPEC##_M_F,				\
>      UNSPEC##_M_N_S, UNSPEC##_M_N_U, UNSPEC##_M_N_F))
> 
> +  /* Helper for builtins with RTX codes, and _m predicated overrides.  */
> +#define FUNCTION_WITH_RTX_M(NAME, RTX, UNSPEC) FUNCTION			\
> +  (NAME, unspec_based_mve_function_exact_insn,				\
> +   (RTX, RTX, RTX,							\
> +    -1, -1, -1,								\
> +    UNSPEC##_M_S, UNSPEC##_M_U, UNSPEC##_M_F,				\
> +    -1, -1, -1))
> +
>  FUNCTION_WITH_RTX_M_N (vaddq, PLUS, VADDQ)
> +FUNCTION_WITH_RTX_M (vandq, AND, VANDQ)
> +FUNCTION_WITH_RTX_M (veorq, XOR, VEORQ)
>  FUNCTION_WITH_RTX_M_N (vmulq, MULT, VMULQ)
>  FUNCTION (vreinterpretq, vreinterpretq_impl,)
>  FUNCTION_WITH_RTX_M_N (vsubq, MINUS, VSUBQ)
> diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
> index 624558c08b2..a933c9fc91e 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.def
> +++ b/gcc/config/arm/arm-mve-builtins-base.def
> @@ -19,6 +19,8 @@
> 
>  #define REQUIRES_FLOAT false
>  DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_integer, mx_or_none)
> +DEF_MVE_FUNCTION (vandq, binary, all_integer, mx_or_none)
> +DEF_MVE_FUNCTION (veorq, binary, all_integer, mx_or_none)
>  DEF_MVE_FUNCTION (vmulq, binary_opt_n, all_integer, mx_or_none)
>  DEF_MVE_FUNCTION (vreinterpretq, unary_convert, reinterpret_integer, none)
>  DEF_MVE_FUNCTION (vsubq, binary_opt_n, all_integer, mx_or_none)
> @@ -27,6 +29,8 @@ DEF_MVE_FUNCTION (vuninitializedq, inherent, all_integer_with_64, none)
> 
>  #define REQUIRES_FLOAT true
>  DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_float, mx_or_none)
> +DEF_MVE_FUNCTION (vandq, binary, all_float, mx_or_none)
> +DEF_MVE_FUNCTION (veorq, binary, all_float, mx_or_none)
>  DEF_MVE_FUNCTION (vmulq, binary_opt_n, all_float, mx_or_none)
>  DEF_MVE_FUNCTION (vreinterpretq, unary_convert, reinterpret_float, none)
>  DEF_MVE_FUNCTION (vsubq, binary_opt_n, all_float, mx_or_none)
> diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h
> index 30f8549c495..4fcf55715b6 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.h
> +++ b/gcc/config/arm/arm-mve-builtins-base.h
> @@ -24,6 +24,8 @@ namespace arm_mve {
>  namespace functions {
> 
>  extern const function_base *const vaddq;
> +extern const function_base *const vandq;
> +extern const function_base *const veorq;
>  extern const function_base *const vmulq;
>  extern const function_base *const vreinterpretq;
>  extern const function_base *const vsubq;
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index 42a1af2ae15..0ad0122e44f 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -77,14 +77,12 @@
>  #define vmaxq(__a, __b) __arm_vmaxq(__a, __b)
>  #define vhsubq(__a, __b) __arm_vhsubq(__a, __b)
>  #define vhaddq(__a, __b) __arm_vhaddq(__a, __b)
> -#define veorq(__a, __b) __arm_veorq(__a, __b)
>  #define vcmphiq(__a, __b) __arm_vcmphiq(__a, __b)
>  #define vcmpeqq(__a, __b) __arm_vcmpeqq(__a, __b)
>  #define vcmpcsq(__a, __b) __arm_vcmpcsq(__a, __b)
>  #define vcaddq_rot90(__a, __b) __arm_vcaddq_rot90(__a, __b)
>  #define vcaddq_rot270(__a, __b) __arm_vcaddq_rot270(__a, __b)
>  #define vbicq(__a, __b) __arm_vbicq(__a, __b)
> -#define vandq(__a, __b) __arm_vandq(__a, __b)
>  #define vaddvq_p(__a, __p) __arm_vaddvq_p(__a, __p)
>  #define vaddvaq(__a, __b) __arm_vaddvaq(__a, __b)
>  #define vabdq(__a, __b) __arm_vabdq(__a, __b)
> @@ -236,12 +234,10 @@
>  #define vabavq_p(__a, __b, __c, __p) __arm_vabavq_p(__a, __b, __c, __p)
>  #define vshlq_m(__inactive, __a, __b, __p) __arm_vshlq_m(__inactive, __a,
> __b, __p)
>  #define vabdq_m(__inactive, __a, __b, __p) __arm_vabdq_m(__inactive,
> __a, __b, __p)
> -#define vandq_m(__inactive, __a, __b, __p) __arm_vandq_m(__inactive,
> __a, __b, __p)
>  #define vbicq_m(__inactive, __a, __b, __p) __arm_vbicq_m(__inactive, __a,
> __b, __p)
>  #define vbrsrq_m(__inactive, __a, __b, __p) __arm_vbrsrq_m(__inactive,
> __a, __b, __p)
>  #define vcaddq_rot270_m(__inactive, __a, __b, __p)
> __arm_vcaddq_rot270_m(__inactive, __a, __b, __p)
>  #define vcaddq_rot90_m(__inactive, __a, __b, __p)
> __arm_vcaddq_rot90_m(__inactive, __a, __b, __p)
> -#define veorq_m(__inactive, __a, __b, __p) __arm_veorq_m(__inactive, __a,
> __b, __p)
>  #define vhaddq_m(__inactive, __a, __b, __p) __arm_vhaddq_m(__inactive,
> __a, __b, __p)
>  #define vhcaddq_rot270_m(__inactive, __a, __b, __p)
> __arm_vhcaddq_rot270_m(__inactive, __a, __b, __p)
>  #define vhcaddq_rot90_m(__inactive, __a, __b, __p)
> __arm_vhcaddq_rot90_m(__inactive, __a, __b, __p)
> @@ -404,10 +400,8 @@
>  #define vhsubq_x(__a, __b, __p) __arm_vhsubq_x(__a, __b, __p)
>  #define vrhaddq_x(__a, __b, __p) __arm_vrhaddq_x(__a, __b, __p)
>  #define vrmulhq_x(__a, __b, __p) __arm_vrmulhq_x(__a, __b, __p)
> -#define vandq_x(__a, __b, __p) __arm_vandq_x(__a, __b, __p)
>  #define vbicq_x(__a, __b, __p) __arm_vbicq_x(__a, __b, __p)
>  #define vbrsrq_x(__a, __b, __p) __arm_vbrsrq_x(__a, __b, __p)
> -#define veorq_x(__a, __b, __p) __arm_veorq_x(__a, __b, __p)
>  #define vmovlbq_x(__a, __p) __arm_vmovlbq_x(__a, __p)
>  #define vmovltq_x(__a, __p) __arm_vmovltq_x(__a, __p)
>  #define vmvnq_x(__a, __p) __arm_vmvnq_x(__a, __p)
> @@ -702,7 +696,6 @@
>  #define vhsubq_n_u8(__a, __b) __arm_vhsubq_n_u8(__a, __b)
>  #define vhaddq_u8(__a, __b) __arm_vhaddq_u8(__a, __b)
>  #define vhaddq_n_u8(__a, __b) __arm_vhaddq_n_u8(__a, __b)
> -#define veorq_u8(__a, __b) __arm_veorq_u8(__a, __b)
>  #define vcmpneq_n_u8(__a, __b) __arm_vcmpneq_n_u8(__a, __b)
>  #define vcmphiq_u8(__a, __b) __arm_vcmphiq_u8(__a, __b)
>  #define vcmphiq_n_u8(__a, __b) __arm_vcmphiq_n_u8(__a, __b)
> @@ -713,7 +706,6 @@
>  #define vcaddq_rot90_u8(__a, __b) __arm_vcaddq_rot90_u8(__a, __b)
>  #define vcaddq_rot270_u8(__a, __b) __arm_vcaddq_rot270_u8(__a, __b)
>  #define vbicq_u8(__a, __b) __arm_vbicq_u8(__a, __b)
> -#define vandq_u8(__a, __b) __arm_vandq_u8(__a, __b)
>  #define vaddvq_p_u8(__a, __p) __arm_vaddvq_p_u8(__a, __p)
>  #define vaddvaq_u8(__a, __b) __arm_vaddvaq_u8(__a, __b)
>  #define vabdq_u8(__a, __b) __arm_vabdq_u8(__a, __b)
> @@ -781,12 +773,10 @@
>  #define vhcaddq_rot270_s8(__a, __b) __arm_vhcaddq_rot270_s8(__a, __b)
>  #define vhaddq_s8(__a, __b) __arm_vhaddq_s8(__a, __b)
>  #define vhaddq_n_s8(__a, __b) __arm_vhaddq_n_s8(__a, __b)
> -#define veorq_s8(__a, __b) __arm_veorq_s8(__a, __b)
>  #define vcaddq_rot90_s8(__a, __b) __arm_vcaddq_rot90_s8(__a, __b)
>  #define vcaddq_rot270_s8(__a, __b) __arm_vcaddq_rot270_s8(__a, __b)
>  #define vbrsrq_n_s8(__a, __b) __arm_vbrsrq_n_s8(__a, __b)
>  #define vbicq_s8(__a, __b) __arm_vbicq_s8(__a, __b)
> -#define vandq_s8(__a, __b) __arm_vandq_s8(__a, __b)
>  #define vaddvaq_s8(__a, __b) __arm_vaddvaq_s8(__a, __b)
>  #define vabdq_s8(__a, __b) __arm_vabdq_s8(__a, __b)
>  #define vshlq_n_s8(__a,  __imm) __arm_vshlq_n_s8(__a,  __imm)
> @@ -812,7 +802,6 @@
>  #define vhsubq_n_u16(__a, __b) __arm_vhsubq_n_u16(__a, __b)
>  #define vhaddq_u16(__a, __b) __arm_vhaddq_u16(__a, __b)
>  #define vhaddq_n_u16(__a, __b) __arm_vhaddq_n_u16(__a, __b)
> -#define veorq_u16(__a, __b) __arm_veorq_u16(__a, __b)
>  #define vcmpneq_n_u16(__a, __b) __arm_vcmpneq_n_u16(__a, __b)
>  #define vcmphiq_u16(__a, __b) __arm_vcmphiq_u16(__a, __b)
>  #define vcmphiq_n_u16(__a, __b) __arm_vcmphiq_n_u16(__a, __b)
> @@ -823,7 +812,6 @@
>  #define vcaddq_rot90_u16(__a, __b) __arm_vcaddq_rot90_u16(__a, __b)
>  #define vcaddq_rot270_u16(__a, __b) __arm_vcaddq_rot270_u16(__a, __b)
>  #define vbicq_u16(__a, __b) __arm_vbicq_u16(__a, __b)
> -#define vandq_u16(__a, __b) __arm_vandq_u16(__a, __b)
>  #define vaddvq_p_u16(__a, __p) __arm_vaddvq_p_u16(__a, __p)
>  #define vaddvaq_u16(__a, __b) __arm_vaddvaq_u16(__a, __b)
>  #define vabdq_u16(__a, __b) __arm_vabdq_u16(__a, __b)
> @@ -891,12 +879,10 @@
>  #define vhcaddq_rot270_s16(__a, __b) __arm_vhcaddq_rot270_s16(__a,
> __b)
>  #define vhaddq_s16(__a, __b) __arm_vhaddq_s16(__a, __b)
>  #define vhaddq_n_s16(__a, __b) __arm_vhaddq_n_s16(__a, __b)
> -#define veorq_s16(__a, __b) __arm_veorq_s16(__a, __b)
>  #define vcaddq_rot90_s16(__a, __b) __arm_vcaddq_rot90_s16(__a, __b)
>  #define vcaddq_rot270_s16(__a, __b) __arm_vcaddq_rot270_s16(__a, __b)
>  #define vbrsrq_n_s16(__a, __b) __arm_vbrsrq_n_s16(__a, __b)
>  #define vbicq_s16(__a, __b) __arm_vbicq_s16(__a, __b)
> -#define vandq_s16(__a, __b) __arm_vandq_s16(__a, __b)
>  #define vaddvaq_s16(__a, __b) __arm_vaddvaq_s16(__a, __b)
>  #define vabdq_s16(__a, __b) __arm_vabdq_s16(__a, __b)
>  #define vshlq_n_s16(__a,  __imm) __arm_vshlq_n_s16(__a,  __imm)
> @@ -922,7 +908,6 @@
>  #define vhsubq_n_u32(__a, __b) __arm_vhsubq_n_u32(__a, __b)
>  #define vhaddq_u32(__a, __b) __arm_vhaddq_u32(__a, __b)
>  #define vhaddq_n_u32(__a, __b) __arm_vhaddq_n_u32(__a, __b)
> -#define veorq_u32(__a, __b) __arm_veorq_u32(__a, __b)
>  #define vcmpneq_n_u32(__a, __b) __arm_vcmpneq_n_u32(__a, __b)
>  #define vcmphiq_u32(__a, __b) __arm_vcmphiq_u32(__a, __b)
>  #define vcmphiq_n_u32(__a, __b) __arm_vcmphiq_n_u32(__a, __b)
> @@ -933,7 +918,6 @@
>  #define vcaddq_rot90_u32(__a, __b) __arm_vcaddq_rot90_u32(__a, __b)
>  #define vcaddq_rot270_u32(__a, __b) __arm_vcaddq_rot270_u32(__a, __b)
>  #define vbicq_u32(__a, __b) __arm_vbicq_u32(__a, __b)
> -#define vandq_u32(__a, __b) __arm_vandq_u32(__a, __b)
>  #define vaddvq_p_u32(__a, __p) __arm_vaddvq_p_u32(__a, __p)
>  #define vaddvaq_u32(__a, __b) __arm_vaddvaq_u32(__a, __b)
>  #define vabdq_u32(__a, __b) __arm_vabdq_u32(__a, __b)
> @@ -1001,12 +985,10 @@
>  #define vhcaddq_rot270_s32(__a, __b) __arm_vhcaddq_rot270_s32(__a,
> __b)
>  #define vhaddq_s32(__a, __b) __arm_vhaddq_s32(__a, __b)
>  #define vhaddq_n_s32(__a, __b) __arm_vhaddq_n_s32(__a, __b)
> -#define veorq_s32(__a, __b) __arm_veorq_s32(__a, __b)
>  #define vcaddq_rot90_s32(__a, __b) __arm_vcaddq_rot90_s32(__a, __b)
>  #define vcaddq_rot270_s32(__a, __b) __arm_vcaddq_rot270_s32(__a, __b)
>  #define vbrsrq_n_s32(__a, __b) __arm_vbrsrq_n_s32(__a, __b)
>  #define vbicq_s32(__a, __b) __arm_vbicq_s32(__a, __b)
> -#define vandq_s32(__a, __b) __arm_vandq_s32(__a, __b)
>  #define vaddvaq_s32(__a, __b) __arm_vaddvaq_s32(__a, __b)
>  #define vabdq_s32(__a, __b) __arm_vabdq_s32(__a, __b)
>  #define vshlq_n_s32(__a,  __imm) __arm_vshlq_n_s32(__a,  __imm)
> @@ -1059,7 +1041,6 @@
>  #define vmaxnmq_f16(__a, __b) __arm_vmaxnmq_f16(__a, __b)
>  #define vmaxnmavq_f16(__a, __b) __arm_vmaxnmavq_f16(__a, __b)
>  #define vmaxnmaq_f16(__a, __b) __arm_vmaxnmaq_f16(__a, __b)
> -#define veorq_f16(__a, __b) __arm_veorq_f16(__a, __b)
>  #define vcmulq_rot90_f16(__a, __b) __arm_vcmulq_rot90_f16(__a, __b)
>  #define vcmulq_rot270_f16(__a, __b) __arm_vcmulq_rot270_f16(__a, __b)
>  #define vcmulq_rot180_f16(__a, __b) __arm_vcmulq_rot180_f16(__a, __b)
> @@ -1067,7 +1048,6 @@
>  #define vcaddq_rot90_f16(__a, __b) __arm_vcaddq_rot90_f16(__a, __b)
>  #define vcaddq_rot270_f16(__a, __b) __arm_vcaddq_rot270_f16(__a, __b)
>  #define vbicq_f16(__a, __b) __arm_vbicq_f16(__a, __b)
> -#define vandq_f16(__a, __b) __arm_vandq_f16(__a, __b)
>  #define vabdq_f16(__a, __b) __arm_vabdq_f16(__a, __b)
>  #define vshlltq_n_s8(__a,  __imm) __arm_vshlltq_n_s8(__a,  __imm)
>  #define vshllbq_n_s8(__a,  __imm) __arm_vshllbq_n_s8(__a,  __imm)
> @@ -1120,7 +1100,6 @@
>  #define vmaxnmq_f32(__a, __b) __arm_vmaxnmq_f32(__a, __b)
>  #define vmaxnmavq_f32(__a, __b) __arm_vmaxnmavq_f32(__a, __b)
>  #define vmaxnmaq_f32(__a, __b) __arm_vmaxnmaq_f32(__a, __b)
> -#define veorq_f32(__a, __b) __arm_veorq_f32(__a, __b)
>  #define vcmulq_rot90_f32(__a, __b) __arm_vcmulq_rot90_f32(__a, __b)
>  #define vcmulq_rot270_f32(__a, __b) __arm_vcmulq_rot270_f32(__a, __b)
>  #define vcmulq_rot180_f32(__a, __b) __arm_vcmulq_rot180_f32(__a, __b)
> @@ -1128,7 +1107,6 @@
>  #define vcaddq_rot90_f32(__a, __b) __arm_vcaddq_rot90_f32(__a, __b)
>  #define vcaddq_rot270_f32(__a, __b) __arm_vcaddq_rot270_f32(__a, __b)
>  #define vbicq_f32(__a, __b) __arm_vbicq_f32(__a, __b)
> -#define vandq_f32(__a, __b) __arm_vandq_f32(__a, __b)
>  #define vabdq_f32(__a, __b) __arm_vabdq_f32(__a, __b)
>  #define vshlltq_n_s16(__a,  __imm) __arm_vshlltq_n_s16(__a,  __imm)
>  #define vshllbq_n_s16(__a,  __imm) __arm_vshllbq_n_s16(__a,  __imm)
> @@ -1662,12 +1640,6 @@
>  #define vabdq_m_u8(__inactive, __a, __b, __p)
> __arm_vabdq_m_u8(__inactive, __a, __b, __p)
>  #define vabdq_m_u32(__inactive, __a, __b, __p)
> __arm_vabdq_m_u32(__inactive, __a, __b, __p)
>  #define vabdq_m_u16(__inactive, __a, __b, __p)
> __arm_vabdq_m_u16(__inactive, __a, __b, __p)
> -#define vandq_m_s8(__inactive, __a, __b, __p)
> __arm_vandq_m_s8(__inactive, __a, __b, __p)
> -#define vandq_m_s32(__inactive, __a, __b, __p)
> __arm_vandq_m_s32(__inactive, __a, __b, __p)
> -#define vandq_m_s16(__inactive, __a, __b, __p)
> __arm_vandq_m_s16(__inactive, __a, __b, __p)
> -#define vandq_m_u8(__inactive, __a, __b, __p)
> __arm_vandq_m_u8(__inactive, __a, __b, __p)
> -#define vandq_m_u32(__inactive, __a, __b, __p)
> __arm_vandq_m_u32(__inactive, __a, __b, __p)
> -#define vandq_m_u16(__inactive, __a, __b, __p)
> __arm_vandq_m_u16(__inactive, __a, __b, __p)
>  #define vbicq_m_s8(__inactive, __a, __b, __p)
> __arm_vbicq_m_s8(__inactive, __a, __b, __p)
>  #define vbicq_m_s32(__inactive, __a, __b, __p)
> __arm_vbicq_m_s32(__inactive, __a, __b, __p)
>  #define vbicq_m_s16(__inactive, __a, __b, __p)
> __arm_vbicq_m_s16(__inactive, __a, __b, __p)
> @@ -1692,12 +1664,6 @@
>  #define vcaddq_rot90_m_u8(__inactive, __a, __b, __p)
> __arm_vcaddq_rot90_m_u8(__inactive, __a, __b, __p)
>  #define vcaddq_rot90_m_u32(__inactive, __a, __b, __p)
> __arm_vcaddq_rot90_m_u32(__inactive, __a, __b, __p)
>  #define vcaddq_rot90_m_u16(__inactive, __a, __b, __p)
> __arm_vcaddq_rot90_m_u16(__inactive, __a, __b, __p)
> -#define veorq_m_s8(__inactive, __a, __b, __p)
> __arm_veorq_m_s8(__inactive, __a, __b, __p)
> -#define veorq_m_s32(__inactive, __a, __b, __p)
> __arm_veorq_m_s32(__inactive, __a, __b, __p)
> -#define veorq_m_s16(__inactive, __a, __b, __p)
> __arm_veorq_m_s16(__inactive, __a, __b, __p)
> -#define veorq_m_u8(__inactive, __a, __b, __p)
> __arm_veorq_m_u8(__inactive, __a, __b, __p)
> -#define veorq_m_u32(__inactive, __a, __b, __p)
> __arm_veorq_m_u32(__inactive, __a, __b, __p)
> -#define veorq_m_u16(__inactive, __a, __b, __p)
> __arm_veorq_m_u16(__inactive, __a, __b, __p)
>  #define vhaddq_m_n_s8(__inactive, __a, __b, __p)
> __arm_vhaddq_m_n_s8(__inactive, __a, __b, __p)
>  #define vhaddq_m_n_s32(__inactive, __a, __b, __p)
> __arm_vhaddq_m_n_s32(__inactive, __a, __b, __p)
>  #define vhaddq_m_n_s16(__inactive, __a, __b, __p)
> __arm_vhaddq_m_n_s16(__inactive, __a, __b, __p)
> @@ -2006,8 +1972,6 @@
>  #define vshrntq_m_n_u16(__a, __b,  __imm, __p)
> __arm_vshrntq_m_n_u16(__a, __b,  __imm, __p)
>  #define vabdq_m_f32(__inactive, __a, __b, __p)
> __arm_vabdq_m_f32(__inactive, __a, __b, __p)
>  #define vabdq_m_f16(__inactive, __a, __b, __p)
> __arm_vabdq_m_f16(__inactive, __a, __b, __p)
> -#define vandq_m_f32(__inactive, __a, __b, __p)
> __arm_vandq_m_f32(__inactive, __a, __b, __p)
> -#define vandq_m_f16(__inactive, __a, __b, __p)
> __arm_vandq_m_f16(__inactive, __a, __b, __p)
>  #define vbicq_m_f32(__inactive, __a, __b, __p)
> __arm_vbicq_m_f32(__inactive, __a, __b, __p)
>  #define vbicq_m_f16(__inactive, __a, __b, __p)
> __arm_vbicq_m_f16(__inactive, __a, __b, __p)
>  #define vbrsrq_m_n_f32(__inactive, __a, __b, __p)
> __arm_vbrsrq_m_n_f32(__inactive, __a, __b, __p)
> @@ -2036,8 +2000,6 @@
>  #define vcvtq_m_n_s16_f16(__inactive, __a,  __imm6, __p)
> __arm_vcvtq_m_n_s16_f16(__inactive, __a,  __imm6, __p)
>  #define vcvtq_m_n_u32_f32(__inactive, __a,  __imm6, __p)
> __arm_vcvtq_m_n_u32_f32(__inactive, __a,  __imm6, __p)
>  #define vcvtq_m_n_u16_f16(__inactive, __a,  __imm6, __p)
> __arm_vcvtq_m_n_u16_f16(__inactive, __a,  __imm6, __p)
> -#define veorq_m_f32(__inactive, __a, __b, __p)
> __arm_veorq_m_f32(__inactive, __a, __b, __p)
> -#define veorq_m_f16(__inactive, __a, __b, __p)
> __arm_veorq_m_f16(__inactive, __a, __b, __p)
>  #define vfmaq_m_f32(__a, __b, __c, __p) __arm_vfmaq_m_f32(__a, __b,
> __c, __p)
>  #define vfmaq_m_f16(__a, __b, __c, __p) __arm_vfmaq_m_f16(__a, __b,
> __c, __p)
>  #define vfmaq_m_n_f32(__a, __b, __c, __p) __arm_vfmaq_m_n_f32(__a,
> __b, __c, __p)
> @@ -2467,12 +2429,6 @@
>  #define vrmulhq_x_u8(__a, __b, __p) __arm_vrmulhq_x_u8(__a, __b, __p)
>  #define vrmulhq_x_u16(__a, __b, __p) __arm_vrmulhq_x_u16(__a, __b,
> __p)
>  #define vrmulhq_x_u32(__a, __b, __p) __arm_vrmulhq_x_u32(__a, __b,
> __p)
> -#define vandq_x_s8(__a, __b, __p) __arm_vandq_x_s8(__a, __b, __p)
> -#define vandq_x_s16(__a, __b, __p) __arm_vandq_x_s16(__a, __b, __p)
> -#define vandq_x_s32(__a, __b, __p) __arm_vandq_x_s32(__a, __b, __p)
> -#define vandq_x_u8(__a, __b, __p) __arm_vandq_x_u8(__a, __b, __p)
> -#define vandq_x_u16(__a, __b, __p) __arm_vandq_x_u16(__a, __b, __p)
> -#define vandq_x_u32(__a, __b, __p) __arm_vandq_x_u32(__a, __b, __p)
>  #define vbicq_x_s8(__a, __b, __p) __arm_vbicq_x_s8(__a, __b, __p)
>  #define vbicq_x_s16(__a, __b, __p) __arm_vbicq_x_s16(__a, __b, __p)
>  #define vbicq_x_s32(__a, __b, __p) __arm_vbicq_x_s32(__a, __b, __p)
> @@ -2485,12 +2441,6 @@
>  #define vbrsrq_x_n_u8(__a, __b, __p) __arm_vbrsrq_x_n_u8(__a, __b, __p)
>  #define vbrsrq_x_n_u16(__a, __b, __p) __arm_vbrsrq_x_n_u16(__a, __b,
> __p)
>  #define vbrsrq_x_n_u32(__a, __b, __p) __arm_vbrsrq_x_n_u32(__a, __b,
> __p)
> -#define veorq_x_s8(__a, __b, __p) __arm_veorq_x_s8(__a, __b, __p)
> -#define veorq_x_s16(__a, __b, __p) __arm_veorq_x_s16(__a, __b, __p)
> -#define veorq_x_s32(__a, __b, __p) __arm_veorq_x_s32(__a, __b, __p)
> -#define veorq_x_u8(__a, __b, __p) __arm_veorq_x_u8(__a, __b, __p)
> -#define veorq_x_u16(__a, __b, __p) __arm_veorq_x_u16(__a, __b, __p)
> -#define veorq_x_u32(__a, __b, __p) __arm_veorq_x_u32(__a, __b, __p)
>  #define vmovlbq_x_s8(__a, __p) __arm_vmovlbq_x_s8(__a, __p)
>  #define vmovlbq_x_s16(__a, __p) __arm_vmovlbq_x_s16(__a, __p)
>  #define vmovlbq_x_u8(__a, __p) __arm_vmovlbq_x_u8(__a, __p)
> @@ -2641,14 +2591,10 @@
>  #define vrndaq_x_f32(__a, __p) __arm_vrndaq_x_f32(__a, __p)
>  #define vrndxq_x_f16(__a, __p) __arm_vrndxq_x_f16(__a, __p)
>  #define vrndxq_x_f32(__a, __p) __arm_vrndxq_x_f32(__a, __p)
> -#define vandq_x_f16(__a, __b, __p) __arm_vandq_x_f16(__a, __b, __p)
> -#define vandq_x_f32(__a, __b, __p) __arm_vandq_x_f32(__a, __b, __p)
>  #define vbicq_x_f16(__a, __b, __p) __arm_vbicq_x_f16(__a, __b, __p)
>  #define vbicq_x_f32(__a, __b, __p) __arm_vbicq_x_f32(__a, __b, __p)
>  #define vbrsrq_x_n_f16(__a, __b, __p) __arm_vbrsrq_x_n_f16(__a, __b,
> __p)
>  #define vbrsrq_x_n_f32(__a, __b, __p) __arm_vbrsrq_x_n_f32(__a, __b,
> __p)
> -#define veorq_x_f16(__a, __b, __p) __arm_veorq_x_f16(__a, __b, __p)
> -#define veorq_x_f32(__a, __b, __p) __arm_veorq_x_f32(__a, __b, __p)
>  #define vornq_x_f16(__a, __b, __p) __arm_vornq_x_f16(__a, __b, __p)
>  #define vornq_x_f32(__a, __b, __p) __arm_vornq_x_f32(__a, __b, __p)
>  #define vorrq_x_f16(__a, __b, __p) __arm_vorrq_x_f16(__a, __b, __p)
> @@ -3647,13 +3593,6 @@ __arm_vhaddq_n_u8 (uint8x16_t __a, uint8_t __b)
>    return __builtin_mve_vhaddq_n_uv16qi (__a, __b);
>  }
> 
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_u8 (uint8x16_t __a, uint8x16_t __b)
> -{
> -  return __builtin_mve_veorq_uv16qi (__a, __b);
> -}
> -
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq_n_u8 (uint8x16_t __a, uint8_t __b)
> @@ -3726,13 +3665,6 @@ __arm_vbicq_u8 (uint8x16_t __a, uint8x16_t __b)
>    return __builtin_mve_vbicq_uv16qi (__a, __b);
>  }
> 
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_u8 (uint8x16_t __a, uint8x16_t __b)
> -{
> -  return __builtin_mve_vandq_uv16qi (__a, __b);
> -}
> -
>  __extension__ extern __inline uint32_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vaddvq_p_u8 (uint8x16_t __a, mve_pred16_t __p)
> @@ -4202,13 +4134,6 @@ __arm_vhaddq_n_s8 (int8x16_t __a, int8_t __b)
>    return __builtin_mve_vhaddq_n_sv16qi (__a, __b);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_s8 (int8x16_t __a, int8x16_t __b)
> -{
> -  return __builtin_mve_veorq_sv16qi (__a, __b);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcaddq_rot90_s8 (int8x16_t __a, int8x16_t __b)
> @@ -4237,13 +4162,6 @@ __arm_vbicq_s8 (int8x16_t __a, int8x16_t __b)
>    return __builtin_mve_vbicq_sv16qi (__a, __b);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_s8 (int8x16_t __a, int8x16_t __b)
> -{
> -  return __builtin_mve_vandq_sv16qi (__a, __b);
> -}
> -
>  __extension__ extern __inline int32_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vaddvaq_s8 (int32_t __a, int8x16_t __b)
> @@ -4419,13 +4337,6 @@ __arm_vhaddq_n_u16 (uint16x8_t __a, uint16_t __b)
>    return __builtin_mve_vhaddq_n_uv8hi (__a, __b);
>  }
> 
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_u16 (uint16x8_t __a, uint16x8_t __b)
> -{
> -  return __builtin_mve_veorq_uv8hi (__a, __b);
> -}
> -
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq_n_u16 (uint16x8_t __a, uint16_t __b)
> @@ -4498,13 +4409,6 @@ __arm_vbicq_u16 (uint16x8_t __a, uint16x8_t
> __b)
>    return __builtin_mve_vbicq_uv8hi (__a, __b);
>  }
> 
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_u16 (uint16x8_t __a, uint16x8_t __b)
> -{
> -  return __builtin_mve_vandq_uv8hi (__a, __b);
> -}
> -
>  __extension__ extern __inline uint32_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vaddvq_p_u16 (uint16x8_t __a, mve_pred16_t __p)
> @@ -4974,13 +4878,6 @@ __arm_vhaddq_n_s16 (int16x8_t __a, int16_t __b)
>    return __builtin_mve_vhaddq_n_sv8hi (__a, __b);
>  }
> 
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_s16 (int16x8_t __a, int16x8_t __b)
> -{
> -  return __builtin_mve_veorq_sv8hi (__a, __b);
> -}
> -
>  __extension__ extern __inline int16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcaddq_rot90_s16 (int16x8_t __a, int16x8_t __b)
> @@ -5009,13 +4906,6 @@ __arm_vbicq_s16 (int16x8_t __a, int16x8_t __b)
>    return __builtin_mve_vbicq_sv8hi (__a, __b);
>  }
> 
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_s16 (int16x8_t __a, int16x8_t __b)
> -{
> -  return __builtin_mve_vandq_sv8hi (__a, __b);
> -}
> -
>  __extension__ extern __inline int32_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vaddvaq_s16 (int32_t __a, int16x8_t __b)
> @@ -5191,13 +5081,6 @@ __arm_vhaddq_n_u32 (uint32x4_t __a, uint32_t __b)
>    return __builtin_mve_vhaddq_n_uv4si (__a, __b);
>  }
> 
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_u32 (uint32x4_t __a, uint32x4_t __b)
> -{
> -  return __builtin_mve_veorq_uv4si (__a, __b);
> -}
> -
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq_n_u32 (uint32x4_t __a, uint32_t __b)
> @@ -5270,13 +5153,6 @@ __arm_vbicq_u32 (uint32x4_t __a, uint32x4_t
> __b)
>    return __builtin_mve_vbicq_uv4si (__a, __b);
>  }
> 
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_u32 (uint32x4_t __a, uint32x4_t __b)
> -{
> -  return __builtin_mve_vandq_uv4si (__a, __b);
> -}
> -
>  __extension__ extern __inline uint32_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vaddvq_p_u32 (uint32x4_t __a, mve_pred16_t __p)
> @@ -5746,13 +5622,6 @@ __arm_vhaddq_n_s32 (int32x4_t __a, int32_t __b)
>    return __builtin_mve_vhaddq_n_sv4si (__a, __b);
>  }
> 
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_s32 (int32x4_t __a, int32x4_t __b)
> -{
> -  return __builtin_mve_veorq_sv4si (__a, __b);
> -}
> -
>  __extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcaddq_rot90_s32 (int32x4_t __a, int32x4_t __b)
> @@ -5781,13 +5650,6 @@ __arm_vbicq_s32 (int32x4_t __a, int32x4_t __b)
>    return __builtin_mve_vbicq_sv4si (__a, __b);
>  }
> 
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_s32 (int32x4_t __a, int32x4_t __b)
> -{
> -  return __builtin_mve_vandq_sv4si (__a, __b);
> -}
> -
>  __extension__ extern __inline int32_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vaddvaq_s32 (int32_t __a, int32x4_t __b)
> @@ -9175,48 +9037,6 @@ __arm_vabdq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pr
>    return __builtin_mve_vabdq_m_uv8hi (__inactive, __a, __b, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b,
> mve_pred16_t __p)
> -{
> -  return __builtin_mve_vandq_m_sv16qi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b,
> mve_pred16_t __p)
> -{
> -  return __builtin_mve_vandq_m_sv4si (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b,
> mve_pred16_t __p)
> -{
> -  return __builtin_mve_vandq_m_sv8hi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_m_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b,
> mve_pred16_t __p)
> -{
> -  return __builtin_mve_vandq_m_uv16qi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t
> __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vandq_m_uv4si (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t
> __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vandq_m_uv8hi (__inactive, __a, __b, __p);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vbicq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b,
> mve_pred16_t __p)
> @@ -9385,48 +9205,6 @@ __arm_vcaddq_rot90_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b,
>    return __builtin_mve_vcaddq_rot90_m_uv8hi (__inactive, __a, __b, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b,
> mve_pred16_t __p)
> -{
> -  return __builtin_mve_veorq_m_sv16qi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b,
> mve_pred16_t __p)
> -{
> -  return __builtin_mve_veorq_m_sv4si (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b,
> mve_pred16_t __p)
> -{
> -  return __builtin_mve_veorq_m_sv8hi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_m_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b,
> mve_pred16_t __p)
> -{
> -  return __builtin_mve_veorq_m_uv16qi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b,
> mve_pred16_t __p)
> -{
> -  return __builtin_mve_veorq_m_uv4si (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b,
> mve_pred16_t __p)
> -{
> -  return __builtin_mve_veorq_m_uv8hi (__inactive, __a, __b, __p);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vhaddq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int8_t __b,
> mve_pred16_t __p)
> @@ -14285,48 +14063,6 @@ __arm_vrmulhq_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
>    return __builtin_mve_vrmulhq_m_uv4si (__arm_vuninitializedq_u32 (),
> __a, __b, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vandq_m_sv16qi (__arm_vuninitializedq_s8 (), __a,
> __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_x_s16 (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vandq_m_sv8hi (__arm_vuninitializedq_s16 (), __a,
> __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_x_s32 (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vandq_m_sv4si (__arm_vuninitializedq_s32 (), __a,
> __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_x_u8 (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vandq_m_uv16qi (__arm_vuninitializedq_u8 (), __a,
> __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_x_u16 (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vandq_m_uv8hi (__arm_vuninitializedq_u16 (), __a,
> __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vandq_m_uv4si (__arm_vuninitializedq_u32 (), __a,
> __b, __p);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vbicq_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> @@ -14411,48 +14147,6 @@ __arm_vbrsrq_x_n_u32 (uint32x4_t __a,
> int32_t __b, mve_pred16_t __p)
>    return __builtin_mve_vbrsrq_m_n_uv4si (__arm_vuninitializedq_u32 (),
> __a, __b, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_veorq_m_sv16qi (__arm_vuninitializedq_s8 (), __a,
> __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_x_s16 (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_veorq_m_sv8hi (__arm_vuninitializedq_s16 (), __a,
> __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_x_s32 (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_veorq_m_sv4si (__arm_vuninitializedq_s32 (), __a,
> __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_x_u8 (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_veorq_m_uv16qi (__arm_vuninitializedq_u8 (), __a,
> __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_x_u16 (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_veorq_m_uv8hi (__arm_vuninitializedq_u16 (), __a,
> __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_veorq_m_uv4si (__arm_vuninitializedq_u32 (), __a,
> __b, __p);
> -}
> -
>  __extension__ extern __inline int16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vmovlbq_x_s8 (int8x16_t __a, mve_pred16_t __p)
> @@ -16300,13 +15994,6 @@ __arm_vmaxnmaq_f16 (float16x8_t __a,
> float16x8_t __b)
>    return __builtin_mve_vmaxnmaq_fv8hf (__a, __b);
>  }
> 
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_f16 (float16x8_t __a, float16x8_t __b)
> -{
> -  return __builtin_mve_veorq_fv8hf (__a, __b);
> -}
> -
>  __extension__ extern __inline float16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmulq_rot90_f16 (float16x8_t __a, float16x8_t __b)
> @@ -16356,13 +16043,6 @@ __arm_vbicq_f16 (float16x8_t __a, float16x8_t
> __b)
>    return __builtin_mve_vbicq_fv8hf (__a, __b);
>  }
> 
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_f16 (float16x8_t __a, float16x8_t __b)
> -{
> -  return __builtin_mve_vandq_fv8hf (__a, __b);
> -}
> -
>  __extension__ extern __inline float16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vabdq_f16 (float16x8_t __a, float16x8_t __b)
> @@ -16524,13 +16204,6 @@ __arm_vmaxnmaq_f32 (float32x4_t __a,
> float32x4_t __b)
>    return __builtin_mve_vmaxnmaq_fv4sf (__a, __b);
>  }
> 
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_f32 (float32x4_t __a, float32x4_t __b)
> -{
> -  return __builtin_mve_veorq_fv4sf (__a, __b);
> -}
> -
>  __extension__ extern __inline float32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmulq_rot90_f32 (float32x4_t __a, float32x4_t __b)
> @@ -16580,13 +16253,6 @@ __arm_vbicq_f32 (float32x4_t __a, float32x4_t
> __b)
>    return __builtin_mve_vbicq_fv4sf (__a, __b);
>  }
> 
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_f32 (float32x4_t __a, float32x4_t __b)
> -{
> -  return __builtin_mve_vandq_fv4sf (__a, __b);
> -}
> -
>  __extension__ extern __inline float32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vabdq_f32 (float32x4_t __a, float32x4_t __b)
> @@ -17372,20 +17038,6 @@ __arm_vabdq_m_f16 (float16x8_t __inactive,
> float16x8_t __a, float16x8_t __b, mve
>    return __builtin_mve_vabdq_m_fv8hf (__inactive, __a, __b, __p);
>  }
> 
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_m_f32 (float32x4_t __inactive, float32x4_t __a, float32x4_t
> __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vandq_m_fv4sf (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_m_f16 (float16x8_t __inactive, float16x8_t __a, float16x8_t
> __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vandq_m_fv8hf (__inactive, __a, __b, __p);
> -}
> -
>  __extension__ extern __inline float32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vbicq_m_f32 (float32x4_t __inactive, float32x4_t __a, float32x4_t
> __b, mve_pred16_t __p)
> @@ -17582,20 +17234,6 @@ __arm_vcvtq_m_n_u16_f16 (uint16x8_t
> __inactive, float16x8_t __a, const int __imm
>    return __builtin_mve_vcvtq_m_n_from_f_uv8hi (__inactive, __a, __imm6,
> __p);
>  }
> 
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_m_f32 (float32x4_t __inactive, float32x4_t __a, float32x4_t
> __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_veorq_m_fv4sf (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_m_f16 (float16x8_t __inactive, float16x8_t __a, float16x8_t
> __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_veorq_m_fv8hf (__inactive, __a, __b, __p);
> -}
> -
>  __extension__ extern __inline float32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vfmaq_m_f32 (float32x4_t __a, float32x4_t __b, float32x4_t __c,
> mve_pred16_t __p)
> @@ -18456,20 +18094,6 @@ __arm_vrndxq_x_f32 (float32x4_t __a,
> mve_pred16_t __p)
>    return __builtin_mve_vrndxq_m_fv4sf (__arm_vuninitializedq_f32 (), __a,
> __p);
>  }
> 
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_x_f16 (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vandq_m_fv8hf (__arm_vuninitializedq_f16 (), __a,
> __b, __p);
> -}
> -
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_x_f32 (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vandq_m_fv4sf (__arm_vuninitializedq_f32 (), __a,
> __b, __p);
> -}
> -
>  __extension__ extern __inline float16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vbicq_x_f16 (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
> @@ -18498,20 +18122,6 @@ __arm_vbrsrq_x_n_f32 (float32x4_t __a,
> int32_t __b, mve_pred16_t __p)
>    return __builtin_mve_vbrsrq_m_n_fv4sf (__arm_vuninitializedq_f32 (), __a,
> __b, __p);
>  }
> 
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_x_f16 (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_veorq_m_fv8hf (__arm_vuninitializedq_f16 (), __a,
> __b, __p);
> -}
> -
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_x_f32 (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_veorq_m_fv4sf (__arm_vuninitializedq_f32 (), __a,
> __b, __p);
> -}
> -
>  __extension__ extern __inline float16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vornq_x_f16 (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
> @@ -19428,13 +19038,6 @@ __arm_vhaddq (uint8x16_t __a, uint8_t __b)
>   return __arm_vhaddq_n_u8 (__a, __b);
>  }
> 
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq (uint8x16_t __a, uint8x16_t __b)
> -{
> - return __arm_veorq_u8 (__a, __b);
> -}
> -
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq (uint8x16_t __a, uint8_t __b)
> @@ -19505,13 +19108,6 @@ __arm_vbicq (uint8x16_t __a, uint8x16_t __b)
>   return __arm_vbicq_u8 (__a, __b);
>  }
> 
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq (uint8x16_t __a, uint8x16_t __b)
> -{
> - return __arm_vandq_u8 (__a, __b);
> -}
> -
>  __extension__ extern __inline uint32_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vaddvq_p (uint8x16_t __a, mve_pred16_t __p)
> @@ -19981,13 +19577,6 @@ __arm_vhaddq (int8x16_t __a, int8_t __b)
>   return __arm_vhaddq_n_s8 (__a, __b);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq (int8x16_t __a, int8x16_t __b)
> -{
> - return __arm_veorq_s8 (__a, __b);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcaddq_rot90 (int8x16_t __a, int8x16_t __b)
> @@ -20016,13 +19605,6 @@ __arm_vbicq (int8x16_t __a, int8x16_t __b)
>   return __arm_vbicq_s8 (__a, __b);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq (int8x16_t __a, int8x16_t __b)
> -{
> - return __arm_vandq_s8 (__a, __b);
> -}
> -
>  __extension__ extern __inline int32_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vaddvaq (int32_t __a, int8x16_t __b)
> @@ -20198,13 +19780,6 @@ __arm_vhaddq (uint16x8_t __a, uint16_t __b)
>   return __arm_vhaddq_n_u16 (__a, __b);
>  }
> 
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq (uint16x8_t __a, uint16x8_t __b)
> -{
> - return __arm_veorq_u16 (__a, __b);
> -}
> -
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq (uint16x8_t __a, uint16_t __b)
> @@ -20275,13 +19850,6 @@ __arm_vbicq (uint16x8_t __a, uint16x8_t __b)
>   return __arm_vbicq_u16 (__a, __b);
>  }
> 
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq (uint16x8_t __a, uint16x8_t __b)
> -{
> - return __arm_vandq_u16 (__a, __b);
> -}
> -
>  __extension__ extern __inline uint32_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vaddvq_p (uint16x8_t __a, mve_pred16_t __p)
> @@ -20751,13 +20319,6 @@ __arm_vhaddq (int16x8_t __a, int16_t __b)
>   return __arm_vhaddq_n_s16 (__a, __b);
>  }
> 
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq (int16x8_t __a, int16x8_t __b)
> -{
> - return __arm_veorq_s16 (__a, __b);
> -}
> -
>  __extension__ extern __inline int16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcaddq_rot90 (int16x8_t __a, int16x8_t __b)
> @@ -20786,13 +20347,6 @@ __arm_vbicq (int16x8_t __a, int16x8_t __b)
>   return __arm_vbicq_s16 (__a, __b);
>  }
> 
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq (int16x8_t __a, int16x8_t __b)
> -{
> - return __arm_vandq_s16 (__a, __b);
> -}
> -
>  __extension__ extern __inline int32_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vaddvaq (int32_t __a, int16x8_t __b)
> @@ -20968,13 +20522,6 @@ __arm_vhaddq (uint32x4_t __a, uint32_t __b)
>   return __arm_vhaddq_n_u32 (__a, __b);
>  }
> 
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq (uint32x4_t __a, uint32x4_t __b)
> -{
> - return __arm_veorq_u32 (__a, __b);
> -}
> -
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq (uint32x4_t __a, uint32_t __b)
> @@ -21045,13 +20592,6 @@ __arm_vbicq (uint32x4_t __a, uint32x4_t __b)
>   return __arm_vbicq_u32 (__a, __b);
>  }
> 
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq (uint32x4_t __a, uint32x4_t __b)
> -{
> - return __arm_vandq_u32 (__a, __b);
> -}
> -
>  __extension__ extern __inline uint32_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vaddvq_p (uint32x4_t __a, mve_pred16_t __p)
> @@ -21521,13 +21061,6 @@ __arm_vhaddq (int32x4_t __a, int32_t __b)
>   return __arm_vhaddq_n_s32 (__a, __b);
>  }
> 
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq (int32x4_t __a, int32x4_t __b)
> -{
> - return __arm_veorq_s32 (__a, __b);
> -}
> -
>  __extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcaddq_rot90 (int32x4_t __a, int32x4_t __b)
> @@ -21556,13 +21089,6 @@ __arm_vbicq (int32x4_t __a, int32x4_t __b)
>   return __arm_vbicq_s32 (__a, __b);
>  }
> 
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq (int32x4_t __a, int32x4_t __b)
> -{
> - return __arm_vandq_s32 (__a, __b);
> -}
> -
>  __extension__ extern __inline int32_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vaddvaq (int32_t __a, int32x4_t __b)
> @@ -24909,48 +24435,6 @@ __arm_vabdq_m (uint16x8_t __inactive,
> uint16x8_t __a, uint16x8_t __b, mve_pred16
>   return __arm_vabdq_m_u16 (__inactive, __a, __b, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b,
> mve_pred16_t __p)
> -{
> - return __arm_vandq_m_s8 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b,
> mve_pred16_t __p)
> -{
> - return __arm_vandq_m_s32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b,
> mve_pred16_t __p)
> -{
> - return __arm_vandq_m_s16 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_m (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b,
> mve_pred16_t __p)
> -{
> - return __arm_vandq_m_u8 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_m (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b,
> mve_pred16_t __p)
> -{
> - return __arm_vandq_m_u32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b,
> mve_pred16_t __p)
> -{
> - return __arm_vandq_m_u16 (__inactive, __a, __b, __p);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vbicq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b,
> mve_pred16_t __p)
> @@ -25119,48 +24603,6 @@ __arm_vcaddq_rot90_m (uint16x8_t __inactive,
> uint16x8_t __a, uint16x8_t __b, mve
>   return __arm_vcaddq_rot90_m_u16 (__inactive, __a, __b, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b,
> mve_pred16_t __p)
> -{
> - return __arm_veorq_m_s8 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b,
> mve_pred16_t __p)
> -{
> - return __arm_veorq_m_s32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b,
> mve_pred16_t __p)
> -{
> - return __arm_veorq_m_s16 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_m (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b,
> mve_pred16_t __p)
> -{
> - return __arm_veorq_m_u8 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_m (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b,
> mve_pred16_t __p)
> -{
> - return __arm_veorq_m_u32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b,
> mve_pred16_t __p)
> -{
> - return __arm_veorq_m_u16 (__inactive, __a, __b, __p);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vhaddq_m (int8x16_t __inactive, int8x16_t __a, int8_t __b,
> mve_pred16_t __p)
> @@ -29550,48 +28992,6 @@ __arm_vrmulhq_x (uint32x4_t __a, uint32x4_t
> __b, mve_pred16_t __p)
>   return __arm_vrmulhq_x_u32 (__a, __b, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vandq_x_s8 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_x (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vandq_x_s16 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_x (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vandq_x_s32 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_x (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vandq_x_u8 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_x (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vandq_x_u16 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_x (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vandq_x_u32 (__a, __b, __p);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vbicq_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> @@ -29676,48 +29076,6 @@ __arm_vbrsrq_x (uint32x4_t __a, int32_t __b,
> mve_pred16_t __p)
>   return __arm_vbrsrq_x_n_u32 (__a, __b, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_veorq_x_s8 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_x (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_veorq_x_s16 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_x (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_veorq_x_s32 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_x (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_veorq_x_u8 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_x (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_veorq_x_u16 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_x (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_veorq_x_u32 (__a, __b, __p);
> -}
> -
>  __extension__ extern __inline int16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vmovlbq_x (int8x16_t __a, mve_pred16_t __p)
> @@ -31127,13 +30485,6 @@ __arm_vmaxnmaq (float16x8_t __a,
> float16x8_t __b)
>   return __arm_vmaxnmaq_f16 (__a, __b);
>  }
> 
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq (float16x8_t __a, float16x8_t __b)
> -{
> - return __arm_veorq_f16 (__a, __b);
> -}
> -
>  __extension__ extern __inline float16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmulq_rot90 (float16x8_t __a, float16x8_t __b)
> @@ -31183,13 +30534,6 @@ __arm_vbicq (float16x8_t __a, float16x8_t __b)
>   return __arm_vbicq_f16 (__a, __b);
>  }
> 
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq (float16x8_t __a, float16x8_t __b)
> -{
> - return __arm_vandq_f16 (__a, __b);
> -}
> -
>  __extension__ extern __inline float16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vabdq (float16x8_t __a, float16x8_t __b)
> @@ -31351,13 +30695,6 @@ __arm_vmaxnmaq (float32x4_t __a,
> float32x4_t __b)
>   return __arm_vmaxnmaq_f32 (__a, __b);
>  }
> 
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq (float32x4_t __a, float32x4_t __b)
> -{
> - return __arm_veorq_f32 (__a, __b);
> -}
> -
>  __extension__ extern __inline float32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmulq_rot90 (float32x4_t __a, float32x4_t __b)
> @@ -31407,13 +30744,6 @@ __arm_vbicq (float32x4_t __a, float32x4_t __b)
>   return __arm_vbicq_f32 (__a, __b);
>  }
> 
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq (float32x4_t __a, float32x4_t __b)
> -{
> - return __arm_vandq_f32 (__a, __b);
> -}
> -
>  __extension__ extern __inline float32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vabdq (float32x4_t __a, float32x4_t __b)
> @@ -32184,20 +31514,6 @@ __arm_vabdq_m (float16x8_t __inactive,
> float16x8_t __a, float16x8_t __b, mve_pre
>   return __arm_vabdq_m_f16 (__inactive, __a, __b, __p);
>  }
> 
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_m (float32x4_t __inactive, float32x4_t __a, float32x4_t __b,
> mve_pred16_t __p)
> -{
> - return __arm_vandq_m_f32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_m (float16x8_t __inactive, float16x8_t __a, float16x8_t __b,
> mve_pred16_t __p)
> -{
> - return __arm_vandq_m_f16 (__inactive, __a, __b, __p);
> -}
> -
>  __extension__ extern __inline float32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vbicq_m (float32x4_t __inactive, float32x4_t __a, float32x4_t __b,
> mve_pred16_t __p)
> @@ -32394,20 +31710,6 @@ __arm_vcvtq_m_n (uint16x8_t __inactive,
> float16x8_t __a, const int __imm6, mve_p
>   return __arm_vcvtq_m_n_u16_f16 (__inactive, __a, __imm6, __p);
>  }
> 
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_m (float32x4_t __inactive, float32x4_t __a, float32x4_t __b,
> mve_pred16_t __p)
> -{
> - return __arm_veorq_m_f32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_m (float16x8_t __inactive, float16x8_t __a, float16x8_t __b,
> mve_pred16_t __p)
> -{
> - return __arm_veorq_m_f16 (__inactive, __a, __b, __p);
> -}
> -
>  __extension__ extern __inline float32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vfmaq_m (float32x4_t __a, float32x4_t __b, float32x4_t __c,
> mve_pred16_t __p)
> @@ -33010,20 +32312,6 @@ __arm_vrndxq_x (float32x4_t __a,
> mve_pred16_t __p)
>   return __arm_vrndxq_x_f32 (__a, __p);
>  }
> 
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_x (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vandq_x_f16 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vandq_x (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vandq_x_f32 (__a, __b, __p);
> -}
> -
>  __extension__ extern __inline float16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vbicq_x (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
> @@ -33052,20 +32340,6 @@ __arm_vbrsrq_x (float32x4_t __a, int32_t __b,
> mve_pred16_t __p)
>   return __arm_vbrsrq_x_n_f32 (__a, __b, __p);
>  }
> 
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_x (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_veorq_x_f16 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_veorq_x (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_veorq_x_f32 (__a, __b, __p);
> -}
> -
>  __extension__ extern __inline float16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vornq_x (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
> @@ -33678,18 +32952,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
> __arm_vabdq_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce(__p1, float16x8_t)), \
>    int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
> __arm_vabdq_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce(__p1, float32x4_t)));})
> 
> -#define __arm_vandq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0,
> \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vandq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vandq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vandq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vandq_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vandq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vandq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t)), \
> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
> __arm_vandq_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce(__p1, float16x8_t)), \
> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
> __arm_vandq_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce(__p1, float32x4_t)));})
> -
>  #define __arm_vbicq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0,
> \
> @@ -33868,18 +33130,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
> __arm_vcmulq_rot90_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce(__p1, float16x8_t)), \
>    int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
> __arm_vcmulq_rot90_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce(__p1, float32x4_t)));})
> 
> -#define __arm_veorq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0,
> \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_veorq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_veorq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_veorq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_veorq_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_veorq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_veorq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t)), \
> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
> __arm_veorq_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce(__p1, float16x8_t)), \
> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
> __arm_veorq_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce(__p1, float32x4_t)));})
> -
>  #define __arm_vmaxnmaq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0,
> \
> @@ -35060,19 +34310,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vabdq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
>    int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vabdq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
> 
> -#define __arm_vandq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  __typeof(p2) __p2 = (p2); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vandq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vandq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vandq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vandq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vandq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vandq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vandq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vandq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
> -
>  #define __arm_vbicq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
> @@ -35180,19 +34417,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmulq_rot90_m_f16(__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
>    int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmulq_rot90_m_f32(__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
> 
> -#define __arm_veorq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  __typeof(p2) __p2 = (p2); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_veorq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_veorq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_veorq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_veorq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_veorq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_veorq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_veorq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_veorq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
> -
>  #define __arm_vfmaq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
> @@ -35588,18 +34812,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_float16x8_t]: __arm_vabsq_x_f16
> (__ARM_mve_coerce(__p1, float16x8_t), p2), \
>    int (*)[__ARM_mve_type_float32x4_t]: __arm_vabsq_x_f32
> (__ARM_mve_coerce(__p1, float32x4_t), p2));})
> 
> -#define __arm_vandq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
> -  __typeof(p2) __p2 = (p2); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0,
> \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vandq_x_s8  (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce(__p2, int8x16_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vandq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce(__p2, int16x8_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vandq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vandq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vandq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vandq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32x4_t), p3), \
> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
> __arm_vandq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce(__p2, float16x8_t), p3), \
> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
> __arm_vandq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce(__p2, float32x4_t), p3));})
> -
>  #define __arm_vbicq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0,
> \
> @@ -35679,18 +34891,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint16x8_t]: __arm_vcvtq_x_n_f16_u16
> (__ARM_mve_coerce(__p1, uint16x8_t), p2, p3), \
>    int (*)[__ARM_mve_type_uint32x4_t]: __arm_vcvtq_x_n_f32_u32
> (__ARM_mve_coerce(__p1, uint32x4_t), p2, p3));})
> 
> -#define __arm_veorq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
> -  __typeof(p2) __p2 = (p2); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0,
> \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_veorq_x_s8(__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce(__p2, int8x16_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_veorq_x_s16(__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce(__p2, int16x8_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_veorq_x_s32(__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_veorq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_veorq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_veorq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32x4_t), p3), \
> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
> __arm_veorq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce(__p2, float16x8_t), p3), \
> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
> __arm_veorq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce(__p2, float32x4_t), p3));})
> -
>  #define __arm_vmaxnmq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0,
> \
> @@ -36251,16 +35451,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vhaddq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16x8_t)), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vhaddq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t)));})
> 
> -#define __arm_veorq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0,
> \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_veorq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_veorq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_veorq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_veorq_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_veorq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_veorq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t)));})
> -
>  #define __arm_vcaddq_rot90(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0,
> \
> @@ -36304,16 +35494,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vbicq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16x8_t)), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vbicq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t)));})
> 
> -#define __arm_vandq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0,
> \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vandq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vandq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vandq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vandq_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vandq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vandq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t)));})
> -
>  #define __arm_vabdq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0,
> \
> @@ -36998,17 +36178,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vabdq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vabdq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
> 
> -#define __arm_vandq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  __typeof(p2) __p2 = (p2); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vandq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vandq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vandq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vandq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vandq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vandq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
> -
>  #define __arm_vbicq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
> @@ -37053,17 +36222,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vcaddq_rot90_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vcaddq_rot90_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
> 
> -#define __arm_veorq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  __typeof(p2) __p2 = (p2); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_veorq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_veorq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_veorq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_veorq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_veorq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_veorq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
> -
>  #define __arm_vmladavaq_p(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
> @@ -37360,16 +36518,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vcaddq_rot90_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16x8_t), p3), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vcaddq_rot90_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32x4_t), p3));})
> 
> -#define __arm_veorq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
> -  __typeof(p2) __p2 = (p2); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0,
> \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_veorq_x_s8(__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce(__p2, int8x16_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_veorq_x_s16(__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce(__p2, int16x8_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_veorq_x_s32(__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_veorq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_veorq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_veorq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32x4_t), p3));})
> -
>  #define __arm_vmovlbq_x(p1,p2) ({ __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p1)])0, \
>    int (*)[__ARM_mve_type_int8x16_t]: __arm_vmovlbq_x_s8
> (__ARM_mve_coerce(__p1, int8x16_t), p2), \
> @@ -37478,16 +36626,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vabdq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16x8_t), p3), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vabdq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32x4_t), p3));})
> 
> -#define __arm_vandq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
> -  __typeof(p2) __p2 = (p2); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0,
> \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vandq_x_s8  (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce(__p2, int8x16_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vandq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce(__p2, int16x8_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vandq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vandq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vandq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vandq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32x4_t), p3));})
> -
>  #define __arm_vbicq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0,
> \
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH 12/22] arm: [MVE intrinsics] add binary_orrq shape
  2023-04-18 13:45 ` [PATCH 12/22] arm: [MVE intrinsics] add binary_orrq shape Christophe Lyon
@ 2023-05-02 16:39   ` Kyrylo Tkachov
  0 siblings, 0 replies; 55+ messages in thread
From: Kyrylo Tkachov @ 2023-05-02 16:39 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Earnshaw, Richard Sandiford
  Cc: Christophe Lyon



> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@arm.com>
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
> Subject: [PATCH 12/22] arm: [MVE intrinsics] add binary_orrq shape
> 
> This patch adds the binary_orrq shape description.
> 
> MODE_n intrinsics use a set of predicates (preds_m_or_none) different
> from the MODE_none ones, so we explicitly reference preds_m_or_none
> from the shape; this requires making it a global array.
> 
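
To spell out the predicate point: the vector form of vorrq is
predicable as none/m/x, while the _n form only exists unpredicated or
with _m, hence the shape has to name the predicate set explicitly.
Roughly, the two arrays involved look like this (the second name is
from memory, so double-check it against arm-mve-builtins.cc):

  /* _n forms: vorrq_n, vorrq_m_n -- no _x variant.  */
  const predication_index preds_m_or_none[] = {
    PRED_m, PRED_none, NUM_PREDS
  };

  /* Vector forms: vorrq, vorrq_m, vorrq_x.  */
  static const predication_index preds_mx_or_none[] = {
    PRED_m, PRED_x, PRED_none, NUM_PREDS
  };
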
> 2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>
> 
> 	gcc/
> 	* config/arm/arm-mve-builtins-shapes.cc (binary_orrq): New.
> 	* config/arm/arm-mve-builtins-shapes.h (binary_orrq): New.
> 	* config/arm/arm-mve-builtins.cc (preds_m_or_none): Remove
> static.
> 	* config/arm/arm-mve-builtins.h (preds_m_or_none): Declare.
> ---
>  gcc/config/arm/arm-mve-builtins-shapes.cc | 61 +++++++++++++++++++++++
>  gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
>  gcc/config/arm/arm-mve-builtins.cc        |  2 +-
>  gcc/config/arm/arm-mve-builtins.h         |  3 ++
>  4 files changed, 66 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
> index e69faae4e2c..83410bbc51a 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.cc
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
> @@ -397,6 +397,67 @@ struct binary_opt_n_def : public
> overloaded_base<0>
>  };
>  SHAPE (binary_opt_n)
> 
> +/* <T0>_t vfoo[t0](<T0>_t, <T0>_t)
> +   <T0>_t vfoo[_n_t0](<T0>_t, <S0>_t)
> +
> +   Where the _n form has only supports s16/s32/u16/u32 types as for vorrq.

Delete the "has" in this sentence.
Ok otherwise.
Thanks,
Kyrill

> +
> +   Example: vorrq.
> +   int16x8_t [__arm_]vorrq[_s16](int16x8_t a, int16x8_t b)
> +   int16x8_t [__arm_]vorrq_m[_s16](int16x8_t inactive, int16x8_t a, int16x8_t
> b, mve_pred16_t p)
> +   int16x8_t [__arm_]vorrq_x[_s16](int16x8_t a, int16x8_t b, mve_pred16_t
> p)
> +   int16x8_t [__arm_]vorrq[_n_s16](int16x8_t a, const int16_t imm)
> +   int16x8_t [__arm_]vorrq_m_n[_s16](int16x8_t a, const int16_t imm,
> mve_pred16_t p)  */
> +struct binary_orrq_def : public overloaded_base<0>
> +{
> +  bool
> +  explicit_mode_suffix_p (enum predication_index pred, enum
> mode_suffix_index mode) const override
> +  {
> +    return (mode == MODE_n
> +	    && pred == PRED_m);
> +  }
> +
> +  bool
> +  skip_overload_p (enum predication_index pred, enum mode_suffix_index
> mode) const override
> +  {
> +    switch (mode)
> +      {
> +      case MODE_none:
> +	return false;
> +
> +	/* For MODE_n, share the overloaded instance with MODE_none,
> except for PRED_m.  */
> +      case MODE_n:
> +	return pred != PRED_m;
> +
> +      default:
> +	gcc_unreachable ();
> +      }
> +  }
> +
> +  void
> +  build (function_builder &b, const function_group_info &group,
> +	 bool preserve_user_namespace) const override
> +  {
> +    b.add_overloaded_functions (group, MODE_none,
> preserve_user_namespace);
> +    b.add_overloaded_functions (group, MODE_n,
> preserve_user_namespace);
> +    build_all (b, "v0,v0,v0", group, MODE_none, preserve_user_namespace);
> +    build_16_32 (b, "v0,v0,s0", group, MODE_n, preserve_user_namespace,
> false, preds_m_or_none);
> +  }
> +
> +  tree
> +  resolve (function_resolver &r) const override
> +  {
> +    unsigned int i, nargs;
> +    type_suffix_index type;
> +    if (!r.check_gp_argument (2, i, nargs)
> +	|| (type = r.infer_vector_type (0)) == NUM_TYPE_SUFFIXES)
> +      return error_mark_node;
> +
> +    return r.finish_opt_n_resolution (i, 0, type);
> +  }
> +};
> +SHAPE (binary_orrq)
> +
>  /* <T0>[xN]_t vfoo_t0().
> 
>     Example: vuninitializedq.
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-
> mve-builtins-shapes.h
> index b00ee5eb57a..618b3226050 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.h
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.h
> @@ -36,6 +36,7 @@ namespace arm_mve
> 
>      extern const function_shape *const binary;
>      extern const function_shape *const binary_opt_n;
> +    extern const function_shape *const binary_orrq;
>      extern const function_shape *const inherent;
>      extern const function_shape *const unary_convert;
> 
> diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-
> builtins.cc
> index e409a029346..c74e890bd3d 100644
> --- a/gcc/config/arm/arm-mve-builtins.cc
> +++ b/gcc/config/arm/arm-mve-builtins.cc
> @@ -285,7 +285,7 @@ static const predication_index preds_none[] = {
> PRED_none, NUM_PREDS };
> 
>  /* Used by functions that have the m (merging) predicated form, and in
>     addition have an unpredicated form.  */
> -static const predication_index preds_m_or_none[] = {
> +const predication_index preds_m_or_none[] = {
>    PRED_m, PRED_none, NUM_PREDS
>  };
> 
> diff --git a/gcc/config/arm/arm-mve-builtins.h b/gcc/config/arm/arm-mve-
> builtins.h
> index a20d2fb5d86..c9b51a0c77b 100644
> --- a/gcc/config/arm/arm-mve-builtins.h
> +++ b/gcc/config/arm/arm-mve-builtins.h
> @@ -135,6 +135,9 @@ enum predication_index
>    NUM_PREDS
>  };
> 
> +/* Some shapes need access to some predicate sets.  */
> +extern const predication_index preds_m_or_none[];
> +
>  /* Classifies element types, based on type suffixes with the bit count
>     removed.  */
>  enum type_class_index
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 55+ messages in thread
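
As context for the shape reviewed above, a sketch of what binary_orrq
buys at the source level, using the vorrq signatures quoted in the
shape comment. This assumes an MVE-enabled toolchain (e.g.
arm-none-eabi-gcc -march=armv8.1-m.main+mve -mfloat-abi=hard), and the
immediate must be encodable in VORR's immediate format, which 0x00ff
is for 16-bit lanes.

#include <arm_mve.h>

int16x8_t
orr_demo (int16x8_t a, int16x8_t b, mve_pred16_t p)
{
  int16x8_t r;

  r = vorrq (a, b);                 /* resolves to vorrq_s16 */
  r = vorrq (r, (int16_t) 0x00ff);  /* scalar 2nd argument: resolves to
                                       the _n form, per
                                       finish_opt_n_resolution */
  r = vorrq_m (r, a, b, p);         /* merging: false-predicated lanes
                                       come from the "inactive" first
                                       argument */
  r = vorrq_x (a, b, p);            /* "don't care" false-predicated
                                       lanes */
  r = vorrq_m_n (r, 0x00ff, p);     /* _n stays explicit here, since
                                       explicit_mode_suffix_p is true
                                       for MODE_n with PRED_m */
  return r;
}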

* RE: [PATCH 13/22] arm: [MVE intrinsics] rework vorrq
  2023-04-18 13:45 ` [PATCH 13/22] arm: [MVE intrinsics] rework vorrq Christophe Lyon
@ 2023-05-02 16:41   ` Kyrylo Tkachov
  0 siblings, 0 replies; 55+ messages in thread
From: Kyrylo Tkachov @ 2023-05-02 16:41 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Earnshaw, Richard Sandiford
  Cc: Christophe Lyon



> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@arm.com>
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
> Subject: [PATCH 13/22] arm: [MVE intrinsics] rework vorrq
> 
> Implement vorrq using the new MVE builtins framework.
> 

Ok.
Thanks,
Kyrill

> 2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>
> 
> 	gcc/
> 	* config/arm/arm-mve-builtins-base.cc
> (FUNCTION_WITH_RTX_M_N_NO_N_F): New.
> 	(vorrq): New.
> 	* config/arm/arm-mve-builtins-base.def (vorrq): New.
> 	* config/arm/arm-mve-builtins-base.h (vorrq): New.
> 	* config/arm/arm-mve-builtins.cc
> 	(function_instance::has_inactive_argument): Handle vorrq.
> 	* config/arm/arm_mve.h (vorrq): Remove.
> 	(vorrq_m_n): Remove.
> 	(vorrq_m): Remove.
> 	(vorrq_x): Remove.
> 	(vorrq_u8): Remove.
> 	(vorrq_s8): Remove.
> 	(vorrq_u16): Remove.
> 	(vorrq_s16): Remove.
> 	(vorrq_u32): Remove.
> 	(vorrq_s32): Remove.
> 	(vorrq_n_u16): Remove.
> 	(vorrq_f16): Remove.
> 	(vorrq_n_s16): Remove.
> 	(vorrq_n_u32): Remove.
> 	(vorrq_f32): Remove.
> 	(vorrq_n_s32): Remove.
> 	(vorrq_m_n_s16): Remove.
> 	(vorrq_m_n_u16): Remove.
> 	(vorrq_m_n_s32): Remove.
> 	(vorrq_m_n_u32): Remove.
> 	(vorrq_m_s8): Remove.
> 	(vorrq_m_s32): Remove.
> 	(vorrq_m_s16): Remove.
> 	(vorrq_m_u8): Remove.
> 	(vorrq_m_u32): Remove.
> 	(vorrq_m_u16): Remove.
> 	(vorrq_m_f32): Remove.
> 	(vorrq_m_f16): Remove.
> 	(vorrq_x_s8): Remove.
> 	(vorrq_x_s16): Remove.
> 	(vorrq_x_s32): Remove.
> 	(vorrq_x_u8): Remove.
> 	(vorrq_x_u16): Remove.
> 	(vorrq_x_u32): Remove.
> 	(vorrq_x_f16): Remove.
> 	(vorrq_x_f32): Remove.
> 	(__arm_vorrq_u8): Remove.
> 	(__arm_vorrq_s8): Remove.
> 	(__arm_vorrq_u16): Remove.
> 	(__arm_vorrq_s16): Remove.
> 	(__arm_vorrq_u32): Remove.
> 	(__arm_vorrq_s32): Remove.
> 	(__arm_vorrq_n_u16): Remove.
> 	(__arm_vorrq_n_s16): Remove.
> 	(__arm_vorrq_n_u32): Remove.
> 	(__arm_vorrq_n_s32): Remove.
> 	(__arm_vorrq_m_n_s16): Remove.
> 	(__arm_vorrq_m_n_u16): Remove.
> 	(__arm_vorrq_m_n_s32): Remove.
> 	(__arm_vorrq_m_n_u32): Remove.
> 	(__arm_vorrq_m_s8): Remove.
> 	(__arm_vorrq_m_s32): Remove.
> 	(__arm_vorrq_m_s16): Remove.
> 	(__arm_vorrq_m_u8): Remove.
> 	(__arm_vorrq_m_u32): Remove.
> 	(__arm_vorrq_m_u16): Remove.
> 	(__arm_vorrq_x_s8): Remove.
> 	(__arm_vorrq_x_s16): Remove.
> 	(__arm_vorrq_x_s32): Remove.
> 	(__arm_vorrq_x_u8): Remove.
> 	(__arm_vorrq_x_u16): Remove.
> 	(__arm_vorrq_x_u32): Remove.
> 	(__arm_vorrq_f16): Remove.
> 	(__arm_vorrq_f32): Remove.
> 	(__arm_vorrq_m_f32): Remove.
> 	(__arm_vorrq_m_f16): Remove.
> 	(__arm_vorrq_x_f16): Remove.
> 	(__arm_vorrq_x_f32): Remove.
> 	(__arm_vorrq): Remove.
> 	(__arm_vorrq_m_n): Remove.
> 	(__arm_vorrq_m): Remove.
> 	(__arm_vorrq_x): Remove.
> ---
>  gcc/config/arm/arm-mve-builtins-base.cc  |   9 +
>  gcc/config/arm/arm-mve-builtins-base.def |   2 +
>  gcc/config/arm/arm-mve-builtins-base.h   |   1 +
>  gcc/config/arm/arm-mve-builtins.cc       |   3 +
>  gcc/config/arm/arm_mve.h                 | 559 -----------------------
>  5 files changed, 15 insertions(+), 559 deletions(-)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-
> mve-builtins-base.cc
> index 51fed8f671f..499a1ef9f0e 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.cc
> +++ b/gcc/config/arm/arm-mve-builtins-base.cc
> @@ -98,10 +98,19 @@ namespace arm_mve {
>      UNSPEC##_M_S, UNSPEC##_M_U, UNSPEC##_M_F,
> 	\
>      -1, -1, -1))
> 
> +  /* Helper for builtins with RTX codes, _m predicated and _n overrides.  */
> +#define FUNCTION_WITH_RTX_M_N_NO_N_F(NAME, RTX, UNSPEC)
> FUNCTION	\
> +  (NAME, unspec_based_mve_function_exact_insn,
> 	\
> +   (RTX, RTX, RTX,							\
> +    UNSPEC##_N_S, UNSPEC##_N_U, -1,					\
> +    UNSPEC##_M_S, UNSPEC##_M_U, UNSPEC##_M_F,
> 	\
> +    UNSPEC##_M_N_S, UNSPEC##_M_N_U, -1))
> +
>  FUNCTION_WITH_RTX_M_N (vaddq, PLUS, VADDQ)
>  FUNCTION_WITH_RTX_M (vandq, AND, VANDQ)
>  FUNCTION_WITH_RTX_M (veorq, XOR, VEORQ)
>  FUNCTION_WITH_RTX_M_N (vmulq, MULT, VMULQ)
> +FUNCTION_WITH_RTX_M_N_NO_N_F (vorrq, IOR, VORRQ)
>  FUNCTION (vreinterpretq, vreinterpretq_impl,)
>  FUNCTION_WITH_RTX_M_N (vsubq, MINUS, VSUBQ)
>  FUNCTION (vuninitializedq, vuninitializedq_impl,)
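> 
For reference, a hand-reconstructed expansion of the new helper for the
vorrq case; it follows mechanically from the macro definition above.
The slot layout (RTX codes for s/u/f, then the _n, _m and _m_n unspecs)
is inferred from the macro's arguments, and the -1 entries are the "no
such variant" markers giving the NO_N_F part of the name: no
unpredicated _n form and no _m_n form for float modes.

/* FUNCTION_WITH_RTX_M_N_NO_N_F (vorrq, IOR, VORRQ) expands to: */
FUNCTION (vorrq, unspec_based_mve_function_exact_insn,
          (IOR, IOR, IOR,                      /* RTX codes: s, u, f */
           VORRQ_N_S, VORRQ_N_U, -1,           /* _n: none for float */
           VORRQ_M_S, VORRQ_M_U, VORRQ_M_F,    /* _m */
           VORRQ_M_N_S, VORRQ_M_N_U, -1))      /* _m_n: none for float */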
> diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-
> mve-builtins-base.def
> index a933c9fc91e..c3f8c0f0eeb 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.def
> +++ b/gcc/config/arm/arm-mve-builtins-base.def
> @@ -22,6 +22,7 @@ DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_integer,
> mx_or_none)
>  DEF_MVE_FUNCTION (vandq, binary, all_integer, mx_or_none)
>  DEF_MVE_FUNCTION (veorq, binary, all_integer, mx_or_none)
>  DEF_MVE_FUNCTION (vmulq, binary_opt_n, all_integer, mx_or_none)
> +DEF_MVE_FUNCTION (vorrq, binary_orrq, all_integer, mx_or_none)
>  DEF_MVE_FUNCTION (vreinterpretq, unary_convert, reinterpret_integer,
> none)
>  DEF_MVE_FUNCTION (vsubq, binary_opt_n, all_integer, mx_or_none)
>  DEF_MVE_FUNCTION (vuninitializedq, inherent, all_integer_with_64, none)
> @@ -32,6 +33,7 @@ DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_float,
> mx_or_none)
>  DEF_MVE_FUNCTION (vandq, binary, all_float, mx_or_none)
>  DEF_MVE_FUNCTION (veorq, binary, all_float, mx_or_none)
>  DEF_MVE_FUNCTION (vmulq, binary_opt_n, all_float, mx_or_none)
> +DEF_MVE_FUNCTION (vorrq, binary_orrq, all_float, mx_or_none)
>  DEF_MVE_FUNCTION (vreinterpretq, unary_convert, reinterpret_float, none)
>  DEF_MVE_FUNCTION (vsubq, binary_opt_n, all_float, mx_or_none)
>  DEF_MVE_FUNCTION (vuninitializedq, inherent, all_float, none)
> diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-
> mve-builtins-base.h
> index 4fcf55715b6..c450b373239 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.h
> +++ b/gcc/config/arm/arm-mve-builtins-base.h
> @@ -27,6 +27,7 @@ extern const function_base *const vaddq;
>  extern const function_base *const vandq;
>  extern const function_base *const veorq;
>  extern const function_base *const vmulq;
> +extern const function_base *const vorrq;
>  extern const function_base *const vreinterpretq;
>  extern const function_base *const vsubq;
>  extern const function_base *const vuninitializedq;
> diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-
> builtins.cc
> index c74e890bd3d..0708d4fa94a 100644
> --- a/gcc/config/arm/arm-mve-builtins.cc
> +++ b/gcc/config/arm/arm-mve-builtins.cc
> @@ -669,6 +669,9 @@ function_instance::has_inactive_argument () const
>    if (pred != PRED_m)
>      return false;
> 
> +  if (base == functions::vorrq && mode_suffix_id == MODE_n)
> +    return false;
> +
>    return true;
>  }
> 
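The early return added above is what lets vorrq_m_n keep its
two-operand-plus-predicate prototype. A hedged sketch of the
difference, using the signatures from the shape comment in patch 12/22
(MVE-enabled toolchain assumed; the lane behavior described in the
comments is the ACLE merging semantics):

#include <arm_mve.h>

int16x8_t
merge_demo (int16x8_t inactive, int16x8_t a, int16x8_t b,
            mve_pred16_t p)
{
  /* Most _m intrinsics take an explicit "inactive" first argument;
     lanes whose predicate bits are clear are copied from it.  */
  int16x8_t r1 = vorrq_m (inactive, a, b, p);

  /* vorrq_m_n is the exception handled by the early return above:
     there is no separate inactive operand, so lanes whose predicate
     bits are clear simply keep the value of 'a'.  */
  int16x8_t r2 = vorrq_m_n (a, 0x00ff, p);

  return vorrq (r1, r2);
}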
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index 0ad0122e44f..edf8e247421 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -65,7 +65,6 @@
>  #define vrhaddq(__a, __b) __arm_vrhaddq(__a, __b)
>  #define vqsubq(__a, __b) __arm_vqsubq(__a, __b)
>  #define vqaddq(__a, __b) __arm_vqaddq(__a, __b)
> -#define vorrq(__a, __b) __arm_vorrq(__a, __b)
>  #define vornq(__a, __b) __arm_vornq(__a, __b)
>  #define vmulltq_int(__a, __b) __arm_vmulltq_int(__a, __b)
>  #define vmullbq_int(__a, __b) __arm_vmullbq_int(__a, __b)
> @@ -201,7 +200,6 @@
>  #define vrmlaldavhxq_p(__a, __b, __p) __arm_vrmlaldavhxq_p(__a, __b,
> __p)
>  #define vrmlsldavhq_p(__a, __b, __p) __arm_vrmlsldavhq_p(__a, __b, __p)
>  #define vrmlsldavhxq_p(__a, __b, __p) __arm_vrmlsldavhxq_p(__a, __b,
> __p)
> -#define vorrq_m_n(__a, __imm, __p) __arm_vorrq_m_n(__a, __imm, __p)
>  #define vqrshrntq(__a, __b, __imm) __arm_vqrshrntq(__a, __b, __imm)
>  #define vqshrnbq(__a, __b, __imm) __arm_vqshrnbq(__a, __b, __imm)
>  #define vqshrntq(__a, __b, __imm) __arm_vqshrntq(__a, __b, __imm)
> @@ -254,7 +252,6 @@
>  #define vmullbq_int_m(__inactive, __a, __b, __p)
> __arm_vmullbq_int_m(__inactive, __a, __b, __p)
>  #define vmulltq_int_m(__inactive, __a, __b, __p)
> __arm_vmulltq_int_m(__inactive, __a, __b, __p)
>  #define vornq_m(__inactive, __a, __b, __p) __arm_vornq_m(__inactive, __a,
> __b, __p)
> -#define vorrq_m(__inactive, __a, __b, __p) __arm_vorrq_m(__inactive, __a,
> __b, __p)
>  #define vqaddq_m(__inactive, __a, __b, __p) __arm_vqaddq_m(__inactive,
> __a, __b, __p)
>  #define vqdmladhq_m(__inactive, __a, __b, __p)
> __arm_vqdmladhq_m(__inactive, __a, __b, __p)
>  #define vqdmlashq_m(__a, __b, __c, __p) __arm_vqdmlashq_m(__a, __b,
> __c, __p)
> @@ -406,7 +403,6 @@
>  #define vmovltq_x(__a, __p) __arm_vmovltq_x(__a, __p)
>  #define vmvnq_x(__a, __p) __arm_vmvnq_x(__a, __p)
>  #define vornq_x(__a, __b, __p) __arm_vornq_x(__a, __b, __p)
> -#define vorrq_x(__a, __b, __p) __arm_vorrq_x(__a, __b, __p)
>  #define vrev16q_x(__a, __p) __arm_vrev16q_x(__a, __p)
>  #define vrev32q_x(__a, __p) __arm_vrev32q_x(__a, __p)
>  #define vrev64q_x(__a, __p) __arm_vrev64q_x(__a, __p)
> @@ -682,7 +678,6 @@
>  #define vqsubq_n_u8(__a, __b) __arm_vqsubq_n_u8(__a, __b)
>  #define vqaddq_u8(__a, __b) __arm_vqaddq_u8(__a, __b)
>  #define vqaddq_n_u8(__a, __b) __arm_vqaddq_n_u8(__a, __b)
> -#define vorrq_u8(__a, __b) __arm_vorrq_u8(__a, __b)
>  #define vornq_u8(__a, __b) __arm_vornq_u8(__a, __b)
>  #define vmulltq_int_u8(__a, __b) __arm_vmulltq_int_u8(__a, __b)
>  #define vmullbq_int_u8(__a, __b) __arm_vmullbq_int_u8(__a, __b)
> @@ -754,7 +749,6 @@
>  #define vqdmulhq_n_s8(__a, __b) __arm_vqdmulhq_n_s8(__a, __b)
>  #define vqaddq_s8(__a, __b) __arm_vqaddq_s8(__a, __b)
>  #define vqaddq_n_s8(__a, __b) __arm_vqaddq_n_s8(__a, __b)
> -#define vorrq_s8(__a, __b) __arm_vorrq_s8(__a, __b)
>  #define vornq_s8(__a, __b) __arm_vornq_s8(__a, __b)
>  #define vmulltq_int_s8(__a, __b) __arm_vmulltq_int_s8(__a, __b)
>  #define vmullbq_int_s8(__a, __b) __arm_vmullbq_int_s8(__a, __b)
> @@ -788,7 +782,6 @@
>  #define vqsubq_n_u16(__a, __b) __arm_vqsubq_n_u16(__a, __b)
>  #define vqaddq_u16(__a, __b) __arm_vqaddq_u16(__a, __b)
>  #define vqaddq_n_u16(__a, __b) __arm_vqaddq_n_u16(__a, __b)
> -#define vorrq_u16(__a, __b) __arm_vorrq_u16(__a, __b)
>  #define vornq_u16(__a, __b) __arm_vornq_u16(__a, __b)
>  #define vmulltq_int_u16(__a, __b) __arm_vmulltq_int_u16(__a, __b)
>  #define vmullbq_int_u16(__a, __b) __arm_vmullbq_int_u16(__a, __b)
> @@ -860,7 +853,6 @@
>  #define vqdmulhq_n_s16(__a, __b) __arm_vqdmulhq_n_s16(__a, __b)
>  #define vqaddq_s16(__a, __b) __arm_vqaddq_s16(__a, __b)
>  #define vqaddq_n_s16(__a, __b) __arm_vqaddq_n_s16(__a, __b)
> -#define vorrq_s16(__a, __b) __arm_vorrq_s16(__a, __b)
>  #define vornq_s16(__a, __b) __arm_vornq_s16(__a, __b)
>  #define vmulltq_int_s16(__a, __b) __arm_vmulltq_int_s16(__a, __b)
>  #define vmullbq_int_s16(__a, __b) __arm_vmullbq_int_s16(__a, __b)
> @@ -894,7 +886,6 @@
>  #define vqsubq_n_u32(__a, __b) __arm_vqsubq_n_u32(__a, __b)
>  #define vqaddq_u32(__a, __b) __arm_vqaddq_u32(__a, __b)
>  #define vqaddq_n_u32(__a, __b) __arm_vqaddq_n_u32(__a, __b)
> -#define vorrq_u32(__a, __b) __arm_vorrq_u32(__a, __b)
>  #define vornq_u32(__a, __b) __arm_vornq_u32(__a, __b)
>  #define vmulltq_int_u32(__a, __b) __arm_vmulltq_int_u32(__a, __b)
>  #define vmullbq_int_u32(__a, __b) __arm_vmullbq_int_u32(__a, __b)
> @@ -966,7 +957,6 @@
>  #define vqdmulhq_n_s32(__a, __b) __arm_vqdmulhq_n_s32(__a, __b)
>  #define vqaddq_s32(__a, __b) __arm_vqaddq_s32(__a, __b)
>  #define vqaddq_n_s32(__a, __b) __arm_vqaddq_n_s32(__a, __b)
> -#define vorrq_s32(__a, __b) __arm_vorrq_s32(__a, __b)
>  #define vornq_s32(__a, __b) __arm_vornq_s32(__a, __b)
>  #define vmulltq_int_s32(__a, __b) __arm_vmulltq_int_s32(__a, __b)
>  #define vmullbq_int_s32(__a, __b) __arm_vmullbq_int_s32(__a, __b)
> @@ -1005,7 +995,6 @@
>  #define vqmovunbq_s16(__a, __b) __arm_vqmovunbq_s16(__a, __b)
>  #define vshlltq_n_u8(__a,  __imm) __arm_vshlltq_n_u8(__a,  __imm)
>  #define vshllbq_n_u8(__a,  __imm) __arm_vshllbq_n_u8(__a,  __imm)
> -#define vorrq_n_u16(__a,  __imm) __arm_vorrq_n_u16(__a,  __imm)
>  #define vbicq_n_u16(__a,  __imm) __arm_vbicq_n_u16(__a,  __imm)
>  #define vcmpneq_n_f16(__a, __b) __arm_vcmpneq_n_f16(__a, __b)
>  #define vcmpneq_f16(__a, __b) __arm_vcmpneq_f16(__a, __b)
> @@ -1025,7 +1014,6 @@
>  #define vqdmulltq_n_s16(__a, __b) __arm_vqdmulltq_n_s16(__a, __b)
>  #define vqdmullbq_s16(__a, __b) __arm_vqdmullbq_s16(__a, __b)
>  #define vqdmullbq_n_s16(__a, __b) __arm_vqdmullbq_n_s16(__a, __b)
> -#define vorrq_f16(__a, __b) __arm_vorrq_f16(__a, __b)
>  #define vornq_f16(__a, __b) __arm_vornq_f16(__a, __b)
>  #define vmovntq_s16(__a, __b) __arm_vmovntq_s16(__a, __b)
>  #define vmovnbq_s16(__a, __b) __arm_vmovnbq_s16(__a, __b)
> @@ -1051,7 +1039,6 @@
>  #define vabdq_f16(__a, __b) __arm_vabdq_f16(__a, __b)
>  #define vshlltq_n_s8(__a,  __imm) __arm_vshlltq_n_s8(__a,  __imm)
>  #define vshllbq_n_s8(__a,  __imm) __arm_vshllbq_n_s8(__a,  __imm)
> -#define vorrq_n_s16(__a,  __imm) __arm_vorrq_n_s16(__a,  __imm)
>  #define vbicq_n_s16(__a,  __imm) __arm_vbicq_n_s16(__a,  __imm)
>  #define vqmovntq_u32(__a, __b) __arm_vqmovntq_u32(__a, __b)
>  #define vqmovnbq_u32(__a, __b) __arm_vqmovnbq_u32(__a, __b)
> @@ -1064,7 +1051,6 @@
>  #define vqmovunbq_s32(__a, __b) __arm_vqmovunbq_s32(__a, __b)
>  #define vshlltq_n_u16(__a,  __imm) __arm_vshlltq_n_u16(__a,  __imm)
>  #define vshllbq_n_u16(__a,  __imm) __arm_vshllbq_n_u16(__a,  __imm)
> -#define vorrq_n_u32(__a,  __imm) __arm_vorrq_n_u32(__a,  __imm)
>  #define vbicq_n_u32(__a,  __imm) __arm_vbicq_n_u32(__a,  __imm)
>  #define vcmpneq_n_f32(__a, __b) __arm_vcmpneq_n_f32(__a, __b)
>  #define vcmpneq_f32(__a, __b) __arm_vcmpneq_f32(__a, __b)
> @@ -1084,7 +1070,6 @@
>  #define vqdmulltq_n_s32(__a, __b) __arm_vqdmulltq_n_s32(__a, __b)
>  #define vqdmullbq_s32(__a, __b) __arm_vqdmullbq_s32(__a, __b)
>  #define vqdmullbq_n_s32(__a, __b) __arm_vqdmullbq_n_s32(__a, __b)
> -#define vorrq_f32(__a, __b) __arm_vorrq_f32(__a, __b)
>  #define vornq_f32(__a, __b) __arm_vornq_f32(__a, __b)
>  #define vmovntq_s32(__a, __b) __arm_vmovntq_s32(__a, __b)
>  #define vmovnbq_s32(__a, __b) __arm_vmovnbq_s32(__a, __b)
> @@ -1110,7 +1095,6 @@
>  #define vabdq_f32(__a, __b) __arm_vabdq_f32(__a, __b)
>  #define vshlltq_n_s16(__a,  __imm) __arm_vshlltq_n_s16(__a,  __imm)
>  #define vshllbq_n_s16(__a,  __imm) __arm_vshllbq_n_s16(__a,  __imm)
> -#define vorrq_n_s32(__a,  __imm) __arm_vorrq_n_s32(__a,  __imm)
>  #define vbicq_n_s32(__a,  __imm) __arm_vbicq_n_s32(__a,  __imm)
>  #define vrmlaldavhq_u32(__a, __b) __arm_vrmlaldavhq_u32(__a, __b)
>  #define vctp8q_m(__a, __p) __arm_vctp8q_m(__a, __p)
> @@ -1428,7 +1412,6 @@
>  #define vrev16q_m_u8(__inactive, __a, __p)
> __arm_vrev16q_m_u8(__inactive, __a, __p)
>  #define vrmlaldavhq_p_u32(__a, __b, __p) __arm_vrmlaldavhq_p_u32(__a,
> __b, __p)
>  #define vmvnq_m_n_s16(__inactive,  __imm, __p)
> __arm_vmvnq_m_n_s16(__inactive,  __imm, __p)
> -#define vorrq_m_n_s16(__a,  __imm, __p) __arm_vorrq_m_n_s16(__a,
> __imm, __p)
>  #define vqrshrntq_n_s16(__a, __b,  __imm) __arm_vqrshrntq_n_s16(__a,
> __b,  __imm)
>  #define vqshrnbq_n_s16(__a, __b,  __imm) __arm_vqshrnbq_n_s16(__a,
> __b,  __imm)
>  #define vqshrntq_n_s16(__a, __b,  __imm) __arm_vqshrntq_n_s16(__a, __b,
> __imm)
> @@ -1492,7 +1475,6 @@
>  #define vcmpneq_m_f16(__a, __b, __p) __arm_vcmpneq_m_f16(__a, __b,
> __p)
>  #define vcmpneq_m_n_f16(__a, __b, __p) __arm_vcmpneq_m_n_f16(__a,
> __b, __p)
>  #define vmvnq_m_n_u16(__inactive,  __imm, __p)
> __arm_vmvnq_m_n_u16(__inactive,  __imm, __p)
> -#define vorrq_m_n_u16(__a,  __imm, __p) __arm_vorrq_m_n_u16(__a,
> __imm, __p)
>  #define vqrshruntq_n_s16(__a, __b,  __imm) __arm_vqrshruntq_n_s16(__a,
> __b,  __imm)
>  #define vqshrunbq_n_s16(__a, __b,  __imm) __arm_vqshrunbq_n_s16(__a,
> __b,  __imm)
>  #define vqshruntq_n_s16(__a, __b,  __imm) __arm_vqshruntq_n_s16(__a,
> __b,  __imm)
> @@ -1519,7 +1501,6 @@
>  #define vqmovntq_m_u16(__a, __b, __p) __arm_vqmovntq_m_u16(__a,
> __b, __p)
>  #define vrev32q_m_u8(__inactive, __a, __p)
> __arm_vrev32q_m_u8(__inactive, __a, __p)
>  #define vmvnq_m_n_s32(__inactive,  __imm, __p)
> __arm_vmvnq_m_n_s32(__inactive,  __imm, __p)
> -#define vorrq_m_n_s32(__a,  __imm, __p) __arm_vorrq_m_n_s32(__a,
> __imm, __p)
>  #define vqrshrntq_n_s32(__a, __b,  __imm) __arm_vqrshrntq_n_s32(__a,
> __b,  __imm)
>  #define vqshrnbq_n_s32(__a, __b,  __imm) __arm_vqshrnbq_n_s32(__a,
> __b,  __imm)
>  #define vqshrntq_n_s32(__a, __b,  __imm) __arm_vqshrntq_n_s32(__a, __b,
> __imm)
> @@ -1583,7 +1564,6 @@
>  #define vcmpneq_m_f32(__a, __b, __p) __arm_vcmpneq_m_f32(__a, __b,
> __p)
>  #define vcmpneq_m_n_f32(__a, __b, __p) __arm_vcmpneq_m_n_f32(__a,
> __b, __p)
>  #define vmvnq_m_n_u32(__inactive,  __imm, __p)
> __arm_vmvnq_m_n_u32(__inactive,  __imm, __p)
> -#define vorrq_m_n_u32(__a,  __imm, __p) __arm_vorrq_m_n_u32(__a,
> __imm, __p)
>  #define vqrshruntq_n_s32(__a, __b,  __imm) __arm_vqrshruntq_n_s32(__a,
> __b,  __imm)
>  #define vqshrunbq_n_s32(__a, __b,  __imm) __arm_vqshrunbq_n_s32(__a,
> __b,  __imm)
>  #define vqshruntq_n_s32(__a, __b,  __imm) __arm_vqshruntq_n_s32(__a,
> __b,  __imm)
> @@ -1757,12 +1737,6 @@
>  #define vornq_m_u8(__inactive, __a, __b, __p)
> __arm_vornq_m_u8(__inactive, __a, __b, __p)
>  #define vornq_m_u32(__inactive, __a, __b, __p)
> __arm_vornq_m_u32(__inactive, __a, __b, __p)
>  #define vornq_m_u16(__inactive, __a, __b, __p)
> __arm_vornq_m_u16(__inactive, __a, __b, __p)
> -#define vorrq_m_s8(__inactive, __a, __b, __p)
> __arm_vorrq_m_s8(__inactive, __a, __b, __p)
> -#define vorrq_m_s32(__inactive, __a, __b, __p)
> __arm_vorrq_m_s32(__inactive, __a, __b, __p)
> -#define vorrq_m_s16(__inactive, __a, __b, __p)
> __arm_vorrq_m_s16(__inactive, __a, __b, __p)
> -#define vorrq_m_u8(__inactive, __a, __b, __p)
> __arm_vorrq_m_u8(__inactive, __a, __b, __p)
> -#define vorrq_m_u32(__inactive, __a, __b, __p)
> __arm_vorrq_m_u32(__inactive, __a, __b, __p)
> -#define vorrq_m_u16(__inactive, __a, __b, __p)
> __arm_vorrq_m_u16(__inactive, __a, __b, __p)
>  #define vqaddq_m_n_s8(__inactive, __a, __b, __p)
> __arm_vqaddq_m_n_s8(__inactive, __a, __b, __p)
>  #define vqaddq_m_n_s32(__inactive, __a, __b, __p)
> __arm_vqaddq_m_n_s32(__inactive, __a, __b, __p)
>  #define vqaddq_m_n_s16(__inactive, __a, __b, __p)
> __arm_vqaddq_m_n_s16(__inactive, __a, __b, __p)
> @@ -2014,8 +1988,6 @@
>  #define vminnmq_m_f16(__inactive, __a, __b, __p)
> __arm_vminnmq_m_f16(__inactive, __a, __b, __p)
>  #define vornq_m_f32(__inactive, __a, __b, __p)
> __arm_vornq_m_f32(__inactive, __a, __b, __p)
>  #define vornq_m_f16(__inactive, __a, __b, __p)
> __arm_vornq_m_f16(__inactive, __a, __b, __p)
> -#define vorrq_m_f32(__inactive, __a, __b, __p)
> __arm_vorrq_m_f32(__inactive, __a, __b, __p)
> -#define vorrq_m_f16(__inactive, __a, __b, __p)
> __arm_vorrq_m_f16(__inactive, __a, __b, __p)
>  #define vstrbq_s8( __addr, __value) __arm_vstrbq_s8( __addr, __value)
>  #define vstrbq_u8( __addr, __value) __arm_vstrbq_u8( __addr, __value)
>  #define vstrbq_u16( __addr, __value) __arm_vstrbq_u16( __addr, __value)
> @@ -2465,12 +2437,6 @@
>  #define vornq_x_u8(__a, __b, __p) __arm_vornq_x_u8(__a, __b, __p)
>  #define vornq_x_u16(__a, __b, __p) __arm_vornq_x_u16(__a, __b, __p)
>  #define vornq_x_u32(__a, __b, __p) __arm_vornq_x_u32(__a, __b, __p)
> -#define vorrq_x_s8(__a, __b, __p) __arm_vorrq_x_s8(__a, __b, __p)
> -#define vorrq_x_s16(__a, __b, __p) __arm_vorrq_x_s16(__a, __b, __p)
> -#define vorrq_x_s32(__a, __b, __p) __arm_vorrq_x_s32(__a, __b, __p)
> -#define vorrq_x_u8(__a, __b, __p) __arm_vorrq_x_u8(__a, __b, __p)
> -#define vorrq_x_u16(__a, __b, __p) __arm_vorrq_x_u16(__a, __b, __p)
> -#define vorrq_x_u32(__a, __b, __p) __arm_vorrq_x_u32(__a, __b, __p)
>  #define vrev16q_x_s8(__a, __p) __arm_vrev16q_x_s8(__a, __p)
>  #define vrev16q_x_u8(__a, __p) __arm_vrev16q_x_u8(__a, __p)
>  #define vrev32q_x_s8(__a, __p) __arm_vrev32q_x_s8(__a, __p)
> @@ -2597,8 +2563,6 @@
>  #define vbrsrq_x_n_f32(__a, __b, __p) __arm_vbrsrq_x_n_f32(__a, __b,
> __p)
>  #define vornq_x_f16(__a, __b, __p) __arm_vornq_x_f16(__a, __b, __p)
>  #define vornq_x_f32(__a, __b, __p) __arm_vornq_x_f32(__a, __b, __p)
> -#define vorrq_x_f16(__a, __b, __p) __arm_vorrq_x_f16(__a, __b, __p)
> -#define vorrq_x_f32(__a, __b, __p) __arm_vorrq_x_f32(__a, __b, __p)
>  #define vrev32q_x_f16(__a, __p) __arm_vrev32q_x_f16(__a, __p)
>  #define vrev64q_x_f16(__a, __p) __arm_vrev64q_x_f16(__a, __p)
>  #define vrev64q_x_f32(__a, __p) __arm_vrev64q_x_f32(__a, __p)
> @@ -3495,13 +3459,6 @@ __arm_vqaddq_n_u8 (uint8x16_t __a, uint8_t
> __b)
>    return __builtin_mve_vqaddq_n_uv16qi (__a, __b);
>  }
> 
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_u8 (uint8x16_t __a, uint8x16_t __b)
> -{
> -  return __builtin_mve_vorrq_uv16qi (__a, __b);
> -}
> -
>  __extension__ extern __inline uint8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vornq_u8 (uint8x16_t __a, uint8x16_t __b)
> @@ -4001,13 +3958,6 @@ __arm_vqaddq_n_s8 (int8x16_t __a, int8_t __b)
>    return __builtin_mve_vqaddq_n_sv16qi (__a, __b);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_s8 (int8x16_t __a, int8x16_t __b)
> -{
> -  return __builtin_mve_vorrq_sv16qi (__a, __b);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vornq_s8 (int8x16_t __a, int8x16_t __b)
> @@ -4239,13 +4189,6 @@ __arm_vqaddq_n_u16 (uint16x8_t __a, uint16_t
> __b)
>    return __builtin_mve_vqaddq_n_uv8hi (__a, __b);
>  }
> 
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_u16 (uint16x8_t __a, uint16x8_t __b)
> -{
> -  return __builtin_mve_vorrq_uv8hi (__a, __b);
> -}
> -
>  __extension__ extern __inline uint16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vornq_u16 (uint16x8_t __a, uint16x8_t __b)
> @@ -4745,13 +4688,6 @@ __arm_vqaddq_n_s16 (int16x8_t __a, int16_t
> __b)
>    return __builtin_mve_vqaddq_n_sv8hi (__a, __b);
>  }
> 
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_s16 (int16x8_t __a, int16x8_t __b)
> -{
> -  return __builtin_mve_vorrq_sv8hi (__a, __b);
> -}
> -
>  __extension__ extern __inline int16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vornq_s16 (int16x8_t __a, int16x8_t __b)
> @@ -4983,13 +4919,6 @@ __arm_vqaddq_n_u32 (uint32x4_t __a, uint32_t
> __b)
>    return __builtin_mve_vqaddq_n_uv4si (__a, __b);
>  }
> 
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_u32 (uint32x4_t __a, uint32x4_t __b)
> -{
> -  return __builtin_mve_vorrq_uv4si (__a, __b);
> -}
> -
>  __extension__ extern __inline uint32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vornq_u32 (uint32x4_t __a, uint32x4_t __b)
> @@ -5489,13 +5418,6 @@ __arm_vqaddq_n_s32 (int32x4_t __a, int32_t
> __b)
>    return __builtin_mve_vqaddq_n_sv4si (__a, __b);
>  }
> 
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_s32 (int32x4_t __a, int32x4_t __b)
> -{
> -  return __builtin_mve_vorrq_sv4si (__a, __b);
> -}
> -
>  __extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vornq_s32 (int32x4_t __a, int32x4_t __b)
> @@ -5762,13 +5684,6 @@ __arm_vshllbq_n_u8 (uint8x16_t __a, const int
> __imm)
>    return __builtin_mve_vshllbq_n_uv16qi (__a, __imm);
>  }
> 
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_n_u16 (uint16x8_t __a, const int __imm)
> -{
> -  return __builtin_mve_vorrq_n_uv8hi (__a, __imm);
> -}
> -
>  __extension__ extern __inline uint16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vbicq_n_u16 (uint16x8_t __a, const int __imm)
> @@ -5874,13 +5789,6 @@ __arm_vshllbq_n_s8 (int8x16_t __a, const int
> __imm)
>    return __builtin_mve_vshllbq_n_sv16qi (__a, __imm);
>  }
> 
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_n_s16 (int16x8_t __a, const int __imm)
> -{
> -  return __builtin_mve_vorrq_n_sv8hi (__a, __imm);
> -}
> -
>  __extension__ extern __inline int16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vbicq_n_s16 (int16x8_t __a, const int __imm)
> @@ -5965,13 +5873,6 @@ __arm_vshllbq_n_u16 (uint16x8_t __a, const int
> __imm)
>    return __builtin_mve_vshllbq_n_uv8hi (__a, __imm);
>  }
> 
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_n_u32 (uint32x4_t __a, const int __imm)
> -{
> -  return __builtin_mve_vorrq_n_uv4si (__a, __imm);
> -}
> -
>  __extension__ extern __inline uint32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vbicq_n_u32 (uint32x4_t __a, const int __imm)
> @@ -6077,13 +5978,6 @@ __arm_vshllbq_n_s16 (int16x8_t __a, const int
> __imm)
>    return __builtin_mve_vshllbq_n_sv8hi (__a, __imm);
>  }
> 
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_n_s32 (int32x4_t __a, const int __imm)
> -{
> -  return __builtin_mve_vorrq_n_sv4si (__a, __imm);
> -}
> -
>  __extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vbicq_n_s32 (int32x4_t __a, const int __imm)
> @@ -8197,13 +8091,6 @@ __arm_vmvnq_m_n_s16 (int16x8_t __inactive,
> const int __imm, mve_pred16_t __p)
>    return __builtin_mve_vmvnq_m_n_sv8hi (__inactive, __imm, __p);
>  }
> 
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_m_n_s16 (int16x8_t __a, const int __imm, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vorrq_m_n_sv8hi (__a, __imm, __p);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vqrshrntq_n_s16 (int8x16_t __a, int16x8_t __b, const int __imm)
> @@ -8365,13 +8252,6 @@ __arm_vmvnq_m_n_u16 (uint16x8_t __inactive,
> const int __imm, mve_pred16_t __p)
>    return __builtin_mve_vmvnq_m_n_uv8hi (__inactive, __imm, __p);
>  }
> 
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_m_n_u16 (uint16x8_t __a, const int __imm, mve_pred16_t
> __p)
> -{
> -  return __builtin_mve_vorrq_m_n_uv8hi (__a, __imm, __p);
> -}
> -
>  __extension__ extern __inline uint8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vqrshruntq_n_s16 (uint8x16_t __a, int16x8_t __b, const int __imm)
> @@ -8526,13 +8406,6 @@ __arm_vmvnq_m_n_s32 (int32x4_t __inactive,
> const int __imm, mve_pred16_t __p)
>    return __builtin_mve_vmvnq_m_n_sv4si (__inactive, __imm, __p);
>  }
> 
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_m_n_s32 (int32x4_t __a, const int __imm, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vorrq_m_n_sv4si (__a, __imm, __p);
> -}
> -
>  __extension__ extern __inline int16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vqrshrntq_n_s32 (int16x8_t __a, int32x4_t __b, const int __imm)
> @@ -8694,13 +8567,6 @@ __arm_vmvnq_m_n_u32 (uint32x4_t __inactive,
> const int __imm, mve_pred16_t __p)
>    return __builtin_mve_vmvnq_m_n_uv4si (__inactive, __imm, __p);
>  }
> 
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_m_n_u32 (uint32x4_t __a, const int __imm, mve_pred16_t
> __p)
> -{
> -  return __builtin_mve_vorrq_m_n_uv4si (__a, __imm, __p);
> -}
> -
>  __extension__ extern __inline uint16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vqrshruntq_n_s32 (uint16x8_t __a, int32x4_t __b, const int __imm)
> @@ -9856,48 +9722,6 @@ __arm_vornq_m_u16 (uint16x8_t __inactive,
> uint16x8_t __a, uint16x8_t __b, mve_pr
>    return __builtin_mve_vornq_m_uv8hi (__inactive, __a, __b, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b,
> mve_pred16_t __p)
> -{
> -  return __builtin_mve_vorrq_m_sv16qi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b,
> mve_pred16_t __p)
> -{
> -  return __builtin_mve_vorrq_m_sv4si (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b,
> mve_pred16_t __p)
> -{
> -  return __builtin_mve_vorrq_m_sv8hi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_m_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b,
> mve_pred16_t __p)
> -{
> -  return __builtin_mve_vorrq_m_uv16qi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b,
> mve_pred16_t __p)
> -{
> -  return __builtin_mve_vorrq_m_uv4si (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b,
> mve_pred16_t __p)
> -{
> -  return __builtin_mve_vorrq_m_uv8hi (__inactive, __a, __b, __p);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vqaddq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int8_t __b,
> mve_pred16_t __p)
> @@ -14315,48 +14139,6 @@ __arm_vornq_x_u32 (uint32x4_t __a,
> uint32x4_t __b, mve_pred16_t __p)
>    return __builtin_mve_vornq_m_uv4si (__arm_vuninitializedq_u32 (), __a,
> __b, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vorrq_m_sv16qi (__arm_vuninitializedq_s8 (), __a,
> __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_x_s16 (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vorrq_m_sv8hi (__arm_vuninitializedq_s16 (), __a,
> __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_x_s32 (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vorrq_m_sv4si (__arm_vuninitializedq_s32 (), __a,
> __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_x_u8 (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vorrq_m_uv16qi (__arm_vuninitializedq_u8 (), __a,
> __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_x_u16 (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vorrq_m_uv8hi (__arm_vuninitializedq_u16 (), __a,
> __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vorrq_m_uv4si (__arm_vuninitializedq_u32 (), __a,
> __b, __p);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vrev16q_x_s8 (int8x16_t __a, mve_pred16_t __p)
> @@ -15924,13 +15706,6 @@ __arm_vcmpeqq_f16 (float16x8_t __a,
> float16x8_t __b)
>    return __builtin_mve_vcmpeqq_fv8hf (__a, __b);
>  }
> 
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_f16 (float16x8_t __a, float16x8_t __b)
> -{
> -  return __builtin_mve_vorrq_fv8hf (__a, __b);
> -}
> -
>  __extension__ extern __inline float16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vornq_f16 (float16x8_t __a, float16x8_t __b)
> @@ -16134,13 +15909,6 @@ __arm_vcmpeqq_f32 (float32x4_t __a,
> float32x4_t __b)
>    return __builtin_mve_vcmpeqq_fv4sf (__a, __b);
>  }
> 
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_f32 (float32x4_t __a, float32x4_t __b)
> -{
> -  return __builtin_mve_vorrq_fv4sf (__a, __b);
> -}
> -
>  __extension__ extern __inline float32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vornq_f32 (float32x4_t __a, float32x4_t __b)
> @@ -17332,20 +17100,6 @@ __arm_vornq_m_f16 (float16x8_t __inactive,
> float16x8_t __a, float16x8_t __b, mve
>    return __builtin_mve_vornq_m_fv8hf (__inactive, __a, __b, __p);
>  }
> 
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_m_f32 (float32x4_t __inactive, float32x4_t __a, float32x4_t
> __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vorrq_m_fv4sf (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_m_f16 (float16x8_t __inactive, float16x8_t __a, float16x8_t
> __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vorrq_m_fv8hf (__inactive, __a, __b, __p);
> -}
> -
>  __extension__ extern __inline float32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vld1q_f32 (float32_t const * __base)
> @@ -18136,20 +17890,6 @@ __arm_vornq_x_f32 (float32x4_t __a,
> float32x4_t __b, mve_pred16_t __p)
>    return __builtin_mve_vornq_m_fv4sf (__arm_vuninitializedq_f32 (), __a,
> __b, __p);
>  }
> 
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_x_f16 (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vorrq_m_fv8hf (__arm_vuninitializedq_f16 (), __a,
> __b, __p);
> -}
> -
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_x_f32 (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vorrq_m_fv4sf (__arm_vuninitializedq_f32 (), __a,
> __b, __p);
> -}
> -
>  __extension__ extern __inline float16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vrev32q_x_f16 (float16x8_t __a, mve_pred16_t __p)
> @@ -18940,13 +18680,6 @@ __arm_vqaddq (uint8x16_t __a, uint8_t __b)
>   return __arm_vqaddq_n_u8 (__a, __b);
>  }
> 
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq (uint8x16_t __a, uint8x16_t __b)
> -{
> - return __arm_vorrq_u8 (__a, __b);
> -}
> -
>  __extension__ extern __inline uint8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vornq (uint8x16_t __a, uint8x16_t __b)
> @@ -19444,13 +19177,6 @@ __arm_vqaddq (int8x16_t __a, int8_t __b)
>   return __arm_vqaddq_n_s8 (__a, __b);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq (int8x16_t __a, int8x16_t __b)
> -{
> - return __arm_vorrq_s8 (__a, __b);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vornq (int8x16_t __a, int8x16_t __b)
> @@ -19682,13 +19408,6 @@ __arm_vqaddq (uint16x8_t __a, uint16_t __b)
>   return __arm_vqaddq_n_u16 (__a, __b);
>  }
> 
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq (uint16x8_t __a, uint16x8_t __b)
> -{
> - return __arm_vorrq_u16 (__a, __b);
> -}
> -
>  __extension__ extern __inline uint16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vornq (uint16x8_t __a, uint16x8_t __b)
> @@ -20186,13 +19905,6 @@ __arm_vqaddq (int16x8_t __a, int16_t __b)
>   return __arm_vqaddq_n_s16 (__a, __b);
>  }
> 
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq (int16x8_t __a, int16x8_t __b)
> -{
> - return __arm_vorrq_s16 (__a, __b);
> -}
> -
>  __extension__ extern __inline int16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vornq (int16x8_t __a, int16x8_t __b)
> @@ -20424,13 +20136,6 @@ __arm_vqaddq (uint32x4_t __a, uint32_t __b)
>   return __arm_vqaddq_n_u32 (__a, __b);
>  }
> 
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq (uint32x4_t __a, uint32x4_t __b)
> -{
> - return __arm_vorrq_u32 (__a, __b);
> -}
> -
>  __extension__ extern __inline uint32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vornq (uint32x4_t __a, uint32x4_t __b)
> @@ -20928,13 +20633,6 @@ __arm_vqaddq (int32x4_t __a, int32_t __b)
>   return __arm_vqaddq_n_s32 (__a, __b);
>  }
> 
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq (int32x4_t __a, int32x4_t __b)
> -{
> - return __arm_vorrq_s32 (__a, __b);
> -}
> -
>  __extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vornq (int32x4_t __a, int32x4_t __b)
> @@ -21201,13 +20899,6 @@ __arm_vshllbq (uint8x16_t __a, const int
> __imm)
>   return __arm_vshllbq_n_u8 (__a, __imm);
>  }
> 
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq (uint16x8_t __a, const int __imm)
> -{
> - return __arm_vorrq_n_u16 (__a, __imm);
> -}
> -
>  __extension__ extern __inline uint16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vbicq (uint16x8_t __a, const int __imm)
> @@ -21313,13 +21004,6 @@ __arm_vshllbq (int8x16_t __a, const int __imm)
>   return __arm_vshllbq_n_s8 (__a, __imm);
>  }
> 
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq (int16x8_t __a, const int __imm)
> -{
> - return __arm_vorrq_n_s16 (__a, __imm);
> -}
> -
>  __extension__ extern __inline int16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vbicq (int16x8_t __a, const int __imm)
> @@ -21404,13 +21088,6 @@ __arm_vshllbq (uint16x8_t __a, const int
> __imm)
>   return __arm_vshllbq_n_u16 (__a, __imm);
>  }
> 
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq (uint32x4_t __a, const int __imm)
> -{
> - return __arm_vorrq_n_u32 (__a, __imm);
> -}
> -
>  __extension__ extern __inline uint32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vbicq (uint32x4_t __a, const int __imm)
> @@ -21516,13 +21193,6 @@ __arm_vshllbq (int16x8_t __a, const int __imm)
>   return __arm_vshllbq_n_s16 (__a, __imm);
>  }
> 
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq (int32x4_t __a, const int __imm)
> -{
> - return __arm_vorrq_n_s32 (__a, __imm);
> -}
> -
>  __extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vbicq (int32x4_t __a, const int __imm)
> @@ -23595,13 +23265,6 @@ __arm_vmvnq_m (int16x8_t __inactive, const
> int __imm, mve_pred16_t __p)
>   return __arm_vmvnq_m_n_s16 (__inactive, __imm, __p);
>  }
> 
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_m_n (int16x8_t __a, const int __imm, mve_pred16_t __p)
> -{
> - return __arm_vorrq_m_n_s16 (__a, __imm, __p);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vqrshrntq (int8x16_t __a, int16x8_t __b, const int __imm)
> @@ -23763,13 +23426,6 @@ __arm_vmvnq_m (uint16x8_t __inactive, const
> int __imm, mve_pred16_t __p)
>   return __arm_vmvnq_m_n_u16 (__inactive, __imm, __p);
>  }
> 
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_m_n (uint16x8_t __a, const int __imm, mve_pred16_t __p)
> -{
> - return __arm_vorrq_m_n_u16 (__a, __imm, __p);
> -}
> -
>  __extension__ extern __inline uint8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vqrshruntq (uint8x16_t __a, int16x8_t __b, const int __imm)
> @@ -23924,13 +23580,6 @@ __arm_vmvnq_m (int32x4_t __inactive, const
> int __imm, mve_pred16_t __p)
>   return __arm_vmvnq_m_n_s32 (__inactive, __imm, __p);
>  }
> 
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_m_n (int32x4_t __a, const int __imm, mve_pred16_t __p)
> -{
> - return __arm_vorrq_m_n_s32 (__a, __imm, __p);
> -}
> -
>  __extension__ extern __inline int16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vqrshrntq (int16x8_t __a, int32x4_t __b, const int __imm)
> @@ -24092,13 +23741,6 @@ __arm_vmvnq_m (uint32x4_t __inactive, const
> int __imm, mve_pred16_t __p)
>   return __arm_vmvnq_m_n_u32 (__inactive, __imm, __p);
>  }
> 
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_m_n (uint32x4_t __a, const int __imm, mve_pred16_t __p)
> -{
> - return __arm_vorrq_m_n_u32 (__a, __imm, __p);
> -}
> -
>  __extension__ extern __inline uint16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vqrshruntq (uint16x8_t __a, int32x4_t __b, const int __imm)
> @@ -25254,48 +24896,6 @@ __arm_vornq_m (uint16x8_t __inactive,
> uint16x8_t __a, uint16x8_t __b, mve_pred16
>   return __arm_vornq_m_u16 (__inactive, __a, __b, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b,
> mve_pred16_t __p)
> -{
> - return __arm_vorrq_m_s8 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b,
> mve_pred16_t __p)
> -{
> - return __arm_vorrq_m_s32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b,
> mve_pred16_t __p)
> -{
> - return __arm_vorrq_m_s16 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_m (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b,
> mve_pred16_t __p)
> -{
> - return __arm_vorrq_m_u8 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_m (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b,
> mve_pred16_t __p)
> -{
> - return __arm_vorrq_m_u32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b,
> mve_pred16_t __p)
> -{
> - return __arm_vorrq_m_u16 (__inactive, __a, __b, __p);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vqaddq_m (int8x16_t __inactive, int8x16_t __a, int8_t __b,
> mve_pred16_t __p)
> @@ -29216,48 +28816,6 @@ __arm_vornq_x (uint32x4_t __a, uint32x4_t
> __b, mve_pred16_t __p)
>   return __arm_vornq_x_u32 (__a, __b, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vorrq_x_s8 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_x (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vorrq_x_s16 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_x (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vorrq_x_s32 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_x (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vorrq_x_u8 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_x (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vorrq_x_u16 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_x (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vorrq_x_u32 (__a, __b, __p);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vrev16q_x (int8x16_t __a, mve_pred16_t __p)
> @@ -30415,13 +29973,6 @@ __arm_vcmpeqq (float16x8_t __a, float16x8_t
> __b)
>   return __arm_vcmpeqq_f16 (__a, __b);
>  }
> 
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq (float16x8_t __a, float16x8_t __b)
> -{
> - return __arm_vorrq_f16 (__a, __b);
> -}
> -
>  __extension__ extern __inline float16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vornq (float16x8_t __a, float16x8_t __b)
> @@ -30625,13 +30176,6 @@ __arm_vcmpeqq (float32x4_t __a, float32x4_t
> __b)
>   return __arm_vcmpeqq_f32 (__a, __b);
>  }
> 
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq (float32x4_t __a, float32x4_t __b)
> -{
> - return __arm_vorrq_f32 (__a, __b);
> -}
> -
>  __extension__ extern __inline float32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vornq (float32x4_t __a, float32x4_t __b)
> @@ -31808,20 +31352,6 @@ __arm_vornq_m (float16x8_t __inactive,
> float16x8_t __a, float16x8_t __b, mve_pre
>   return __arm_vornq_m_f16 (__inactive, __a, __b, __p);
>  }
> 
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_m (float32x4_t __inactive, float32x4_t __a, float32x4_t __b,
> mve_pred16_t __p)
> -{
> - return __arm_vorrq_m_f32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_m (float16x8_t __inactive, float16x8_t __a, float16x8_t __b,
> mve_pred16_t __p)
> -{
> - return __arm_vorrq_m_f16 (__inactive, __a, __b, __p);
> -}
> -
>  __extension__ extern __inline float32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vld1q (float32_t const * __base)
> @@ -32354,20 +31884,6 @@ __arm_vornq_x (float32x4_t __a, float32x4_t
> __b, mve_pred16_t __p)
>   return __arm_vornq_x_f32 (__a, __b, __p);
>  }
> 
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_x (float16x8_t __a, float16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vorrq_x_f16 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vorrq_x (float32x4_t __a, float32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vorrq_x_f32 (__a, __b, __p);
> -}
> -
>  __extension__ extern __inline float16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vrev32q_x (float16x8_t __a, mve_pred16_t __p)
> @@ -32928,18 +32444,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint16x8_t]: __arm_vcvtq_n_f16_u16
> (__ARM_mve_coerce(__p0, uint16x8_t), p1), \
>    int (*)[__ARM_mve_type_uint32x4_t]: __arm_vcvtq_n_f32_u32
> (__ARM_mve_coerce(__p0, uint32x4_t), p1));})
> 
> -#define __arm_vorrq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0,
> \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vorrq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vorrq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vorrq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vorrq_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vorrq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vorrq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t)), \
> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
> __arm_vorrq_f16 (__ARM_mve_coerce(__p0, float16x8_t),
> __ARM_mve_coerce(__p1, float16x8_t)), \
> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
> __arm_vorrq_f32 (__ARM_mve_coerce(__p0, float32x4_t),
> __ARM_mve_coerce(__p1, float32x4_t)));})
> -
>  #define __arm_vabdq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0,
> \
> @@ -34467,19 +33971,6 @@ extern void *__ARM_undef;
>    int
> (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_
> mve_type_float16x8_t]: __arm_vornq_m_f16 (__ARM_mve_coerce(__p0,
> float16x8_t), __ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce(__p2, float16x8_t), p3), \
>    int
> (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_
> mve_type_float32x4_t]: __arm_vornq_m_f32 (__ARM_mve_coerce(__p0,
> float32x4_t), __ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce(__p2, float32x4_t), p3));})
> 
> -#define __arm_vorrq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  __typeof(p2) __p2 = (p2); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vorrq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vorrq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vorrq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vorrq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vorrq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vorrq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vorrq_m_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vorrq_m_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
> -
>  #define __arm_vld1q(p0) (\
>    _Generic( (int (*)[__ARM_mve_typeid(p0)])0, \
>    int (*)[__ARM_mve_type_int8_t_ptr]: __arm_vld1q_s8
> (__ARM_mve_coerce1(p0, int8_t *)), \
> @@ -34923,18 +34414,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
> __arm_vornq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce(__p2, float16x8_t), p3), \
>    int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
> __arm_vornq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce(__p2, float32x4_t), p3));})
> 
> -#define __arm_vorrq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
> -  __typeof(p2) __p2 = (p2); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vorrq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vorrq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vorrq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vorrq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vorrq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vorrq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
> -  int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vorrq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \
> -  int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vorrq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));})
> -
>  #define __arm_vrev32q_x(p1,p2) ({ __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p1)])0, \
>    int (*)[__ARM_mve_type_int8x16_t]: __arm_vrev32q_x_s8
> (__ARM_mve_coerce(__p1, int8x16_t), p2), \
> @@ -35321,16 +34800,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vqaddq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16x8_t)), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vqaddq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t)));})
> 
> -#define __arm_vorrq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vorrq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vorrq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vorrq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vorrq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vorrq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vorrq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
> -
>  #define __arm_vornq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0,
> \
> @@ -36244,17 +35713,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vornq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vornq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
> 
> -#define __arm_vorrq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  __typeof(p2) __p2 = (p2); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vorrq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vorrq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vorrq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vorrq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vorrq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vorrq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
> -
>  #define __arm_vstrwq_scatter_base(p0,p1,p2) ({ __typeof(p2) __p2 = (p2); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p2)])0, \
>    int (*)[__ARM_mve_type_int32x4_t]: __arm_vstrwq_scatter_base_s32(p0,
> p1, __ARM_mve_coerce(__p2, int32x4_t)), \
> @@ -36590,16 +36048,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vornq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16x8_t), p3), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vornq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32x4_t), p3));})
> 
> -#define __arm_vorrq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
> -  __typeof(p2) __p2 = (p2); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vorrq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vorrq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vorrq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vorrq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vorrq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vorrq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
> -
>  #define __arm_vrev32q_x(p1,p2) ({ __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p1)])0, \
>    int (*)[__ARM_mve_type_int8x16_t]: __arm_vrev32q_x_s8
> (__ARM_mve_coerce(__p1, int8x16_t), p2), \
> @@ -37378,13 +36826,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vmvnq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce1(__p1, int) , p2), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vmvnq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce1(__p1, int) , p2));})
> 
> -#define __arm_vorrq_m_n(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> -  int (*)[__ARM_mve_type_int16x8_t]: __arm_vorrq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), p1, p2), \
> -  int (*)[__ARM_mve_type_int32x4_t]: __arm_vorrq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), p1, p2), \
> -  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vorrq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), p1, p2), \
> -  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vorrq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), p1, p2));})
> -
>  #define __arm_vqshrunbq(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0,
> \
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 55+ messages in thread
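
Deleting these _Generic dispatch tables does not change what user code
can write: the series resolves the same overloads inside the compiler
instead of in the preprocessor.  A minimal sketch, not part of the
patch, assuming an MVE target such as -march=armv8.1-m.main+mve.fp
-mfloat-abi=hard:

#include <arm_mve.h>

uint32x4_t
or_lanes (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
{
  uint32x4_t all = vorrq (a, b);   /* resolves to __arm_vorrq_u32 */
  return vorrq_x (all, b, p);      /* _x form: false-predicated lanes
                                      have undefined values */
}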

* RE: [PATCH 14/22] arm: [MVE intrinsics] add unspec_mve_function_exact_insn
  2023-04-18 13:46 ` [PATCH 14/22] arm: [MVE intrinsics] add unspec_mve_function_exact_insn Christophe Lyon
@ 2023-05-03  8:40   ` Kyrylo Tkachov
  0 siblings, 0 replies; 55+ messages in thread
From: Kyrylo Tkachov @ 2023-05-03  8:40 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Earnshaw, Richard Sandiford
  Cc: Christophe Lyon



> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@arm.com>
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
> Subject: [PATCH 14/22] arm: [MVE intrinsics] add
> unspec_mve_function_exact_insn
> 
> Introduce a function that will be used to build intrinsics which use
> unspecs for all their versions (plain, _n, _m and _m_n).
> 

Ok.
Thanks,
Kyrill

> 2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>
> 
> 	gcc/
> 	* config/arm/arm-mve-builtins-functions.h (class
> 	unspec_mve_function_exact_insn): New.
> ---
>  gcc/config/arm/arm-mve-builtins-functions.h | 151 ++++++++++++++++++++
>  1 file changed, 151 insertions(+)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-functions.h
> b/gcc/config/arm/arm-mve-builtins-functions.h
> index 6d992b270b0..5abf913d182 100644
> --- a/gcc/config/arm/arm-mve-builtins-functions.h
> +++ b/gcc/config/arm/arm-mve-builtins-functions.h
> @@ -225,6 +225,157 @@ public:
>    }
>  };
> 
> +/* Map the function directly to CODE (UNSPEC, M) where M is the vector
> +   mode associated with type suffix 0.  */
> +class unspec_mve_function_exact_insn : public function_base
> +{
> +public:
> +  CONSTEXPR unspec_mve_function_exact_insn (int unspec_for_sint,
> +					    int unspec_for_uint,
> +					    int unspec_for_fp,
> +					    int unspec_for_n_sint,
> +					    int unspec_for_n_uint,
> +					    int unspec_for_n_fp,
> +					    int unspec_for_m_sint,
> +					    int unspec_for_m_uint,
> +					    int unspec_for_m_fp,
> +					    int unspec_for_m_n_sint,
> +					    int unspec_for_m_n_uint,
> +					    int unspec_for_m_n_fp)
> +    : m_unspec_for_sint (unspec_for_sint),
> +      m_unspec_for_uint (unspec_for_uint),
> +      m_unspec_for_fp (unspec_for_fp),
> +      m_unspec_for_n_sint (unspec_for_n_sint),
> +      m_unspec_for_n_uint (unspec_for_n_uint),
> +      m_unspec_for_n_fp (unspec_for_n_fp),
> +      m_unspec_for_m_sint (unspec_for_m_sint),
> +      m_unspec_for_m_uint (unspec_for_m_uint),
> +      m_unspec_for_m_fp (unspec_for_m_fp),
> +      m_unspec_for_m_n_sint (unspec_for_m_n_sint),
> +      m_unspec_for_m_n_uint (unspec_for_m_n_uint),
> +      m_unspec_for_m_n_fp (unspec_for_m_n_fp)
> +  {}
> +
> +  /* The unspec code associated with signed-integer, unsigned-integer
> +     and floating-point operations respectively.  It covers the cases
> +     with the _n suffix, and/or the _m predicate.  */
> +  int m_unspec_for_sint;
> +  int m_unspec_for_uint;
> +  int m_unspec_for_fp;
> +  int m_unspec_for_n_sint;
> +  int m_unspec_for_n_uint;
> +  int m_unspec_for_n_fp;
> +  int m_unspec_for_m_sint;
> +  int m_unspec_for_m_uint;
> +  int m_unspec_for_m_fp;
> +  int m_unspec_for_m_n_sint;
> +  int m_unspec_for_m_n_uint;
> +  int m_unspec_for_m_n_fp;
> +
> +  rtx
> +  expand (function_expander &e) const override
> +  {
> +    insn_code code;
> +    switch (e.pred)
> +      {
> +      case PRED_none:
> +	switch (e.mode_suffix_id)
> +	  {
> +	  case MODE_none:
> +	    /* No predicate, no suffix.  */
> +	    if (e.type_suffix (0).integer_p)
> +	      if (e.type_suffix (0).unsigned_p)
> +		code = code_for_mve_q (m_unspec_for_uint,
> m_unspec_for_uint, e.vector_mode (0));
> +	      else
> +		code = code_for_mve_q (m_unspec_for_sint,
> m_unspec_for_sint, e.vector_mode (0));
> +	    else
> +	      code = code_for_mve_q_f (m_unspec_for_fp, e.vector_mode (0));
> +	    break;
> +
> +	  case MODE_n:
> +	    /* No predicate, _n suffix.  */
> +	    if (e.type_suffix (0).integer_p)
> +	      if (e.type_suffix (0).unsigned_p)
> +		code = code_for_mve_q_n (m_unspec_for_n_uint,
> m_unspec_for_n_uint, e.vector_mode (0));
> +	      else
> +		code = code_for_mve_q_n (m_unspec_for_n_sint,
> m_unspec_for_n_sint, e.vector_mode (0));
> +	    else
> +	      code = code_for_mve_q_n_f (m_unspec_for_n_fp, e.vector_mode
> (0));
> +	    break;
> +
> +	  default:
> +	    gcc_unreachable ();
> +	  }
> +	return e.use_exact_insn (code);
> +
> +      case PRED_m:
> +	switch (e.mode_suffix_id)
> +	  {
> +	  case MODE_none:
> +	    /* No suffix, "m" predicate.  */
> +	    if (e.type_suffix (0).integer_p)
> +	      if (e.type_suffix (0).unsigned_p)
> +		code = code_for_mve_q_m (m_unspec_for_m_uint,
> m_unspec_for_m_uint, e.vector_mode (0));
> +	      else
> +		code = code_for_mve_q_m (m_unspec_for_m_sint,
> m_unspec_for_m_sint, e.vector_mode (0));
> +	    else
> +	      code = code_for_mve_q_m_f (m_unspec_for_m_fp,
> e.vector_mode (0));
> +	    break;
> +
> +	  case MODE_n:
> +	    /* _n suffix, "m" predicate.  */
> +	    if (e.type_suffix (0).integer_p)
> +	      if (e.type_suffix (0).unsigned_p)
> +		code = code_for_mve_q_m_n (m_unspec_for_m_n_uint,
> m_unspec_for_m_n_uint, e.vector_mode (0));
> +	      else
> +		code = code_for_mve_q_m_n (m_unspec_for_m_n_sint,
> m_unspec_for_m_n_sint, e.vector_mode (0));
> +	    else
> +	      code = code_for_mve_q_m_n_f (m_unspec_for_m_n_fp,
> e.vector_mode (0));
> +	    break;
> +
> +	  default:
> +	    gcc_unreachable ();
> +	  }
> +	return e.use_cond_insn (code, 0);
> +
> +      case PRED_x:
> +	switch (e.mode_suffix_id)
> +	  {
> +	  case MODE_none:
> +	    /* No suffix, "x" predicate.  */
> +	    if (e.type_suffix (0).integer_p)
> +	      if (e.type_suffix (0).unsigned_p)
> +		code = code_for_mve_q_m (m_unspec_for_m_uint,
> m_unspec_for_m_uint, e.vector_mode (0));
> +	      else
> +		code = code_for_mve_q_m (m_unspec_for_m_sint,
> m_unspec_for_m_sint, e.vector_mode (0));
> +	    else
> +	      code = code_for_mve_q_m_f (m_unspec_for_m_fp,
> e.vector_mode (0));
> +	    break;
> +
> +	  case MODE_n:
> +	    /* _n suffix, "x" predicate.  */
> +	    if (e.type_suffix (0).integer_p)
> +	      if (e.type_suffix (0).unsigned_p)
> +		code = code_for_mve_q_m_n (m_unspec_for_m_n_uint,
> m_unspec_for_m_n_uint, e.vector_mode (0));
> +	      else
> +		code = code_for_mve_q_m_n (m_unspec_for_m_n_sint,
> m_unspec_for_m_n_sint, e.vector_mode (0));
> +	    else
> +	      code = code_for_mve_q_m_n_f (m_unspec_for_m_n_fp,
> e.vector_mode (0));
> +	    break;
> +
> +	  default:
> +	    gcc_unreachable ();
> +	  }
> +	return e.use_pred_x_insn (code);
> +
> +      default:
> +	gcc_unreachable ();
> +      }
> +
> +    gcc_unreachable ();
> +  }
> +};
> +
>  } /* end namespace arm_mve */
> 
>  /* Declare the global function base NAME, creating it from an instance
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 55+ messages in thread
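
To make the twelve constructor slots concrete, here is a hypothetical
instantiation (vfooq and the VFOOQ_* unspecs are invented for
illustration; real uses follow in later patches such as vcreateq).
Each slot names the unspec for one variant, and -1 marks a variant
the intrinsic does not have:

/* Hypothetical sketch only.  */
static CONSTEXPR const unspec_mve_function_exact_insn vfooq_impl
  (VFOOQ_S,     VFOOQ_U,     VFOOQ_F,      /* no suffix, no predicate */
   VFOOQ_N_S,   VFOOQ_N_U,   VFOOQ_N_F,    /* _n suffix */
   VFOOQ_M_S,   VFOOQ_M_U,   VFOOQ_M_F,    /* _m predicate */
   VFOOQ_M_N_S, VFOOQ_M_N_U, VFOOQ_M_N_F); /* _m predicate plus _n */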

* RE: [PATCH 15/22] arm: [MVE intrinsics] add create shape
  2023-04-18 13:46 ` [PATCH 15/22] arm: [MVE intrinsics] add create shape Christophe Lyon
@ 2023-05-03  8:40   ` Kyrylo Tkachov
  0 siblings, 0 replies; 55+ messages in thread
From: Kyrylo Tkachov @ 2023-05-03  8:40 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Earnshaw, Richard Sandiford
  Cc: Christophe Lyon



> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@arm.com>
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
> Subject: [PATCH 15/22] arm: [MVE intrinsics] add create shape
> 
> This patch adds the create shape description.
> 

Ok.
Thanks,
Kyrill

> 2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>
> 
> 	gcc/
> 	* config/arm/arm-mve-builtins-shapes.cc (create): New.
> 	* config/arm/arm-mve-builtins-shapes.h (create): New.
> ---
>  gcc/config/arm/arm-mve-builtins-shapes.cc | 22 ++++++++++++++++++++++
>  gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
>  2 files changed, 23 insertions(+)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-
> mve-builtins-shapes.cc
> index 83410bbc51a..e4a42005852 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.cc
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
> @@ -458,6 +458,28 @@ struct binary_orrq_def : public overloaded_base<0>
>  };
>  SHAPE (binary_orrq)
> 
> +/* <T0>xN_t vfoo[_t0](uint64_t, uint64_t)
> +
> +   where there are N arguments in total.
> +   Example: vcreateq.
> +   int16x8_t [__arm_]vcreateq_s16(uint64_t a, uint64_t b)  */
> +struct create_def : public nonoverloaded_base
> +{
> +  void
> +  build (function_builder &b, const function_group_info &group,
> +	 bool preserve_user_namespace) const override
> +  {
> +    build_all (b, "v0,su64,su64", group, MODE_none,
> preserve_user_namespace);
> +  }
> +
> +  tree
> +  resolve (function_resolver &r) const override
> +  {
> +    return r.resolve_uniform (0, 2);
> +  }
> +};
> +SHAPE (create)
> +
>  /* <T0>[xN]_t vfoo_t0().
> 
>     Example: vuninitializedq.
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-
> mve-builtins-shapes.h
> index 618b3226050..3305d12877a 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.h
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.h
> @@ -37,6 +37,7 @@ namespace arm_mve
>      extern const function_shape *const binary;
>      extern const function_shape *const binary_opt_n;
>      extern const function_shape *const binary_orrq;
> +    extern const function_shape *const create;
>      extern const function_shape *const inherent;
>      extern const function_shape *const unary_convert;
> 
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 55+ messages in thread
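
At the user level the shape describes calls like the following
(example mine, not from the patch): the type suffix fixes the element
type and lane count, and both arguments are plain uint64_t.

#include <arm_mve.h>

/* Lanes 0-3 come from the first argument and lanes 4-7 from the
   second, so this builds {0, 1, 2, 3, 4, 5, 6, 7}.  */
int16x8_t
iota16 (void)
{
  return vcreateq_s16 (0x0003000200010000ULL, 0x0007000600050004ULL);
}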

* RE: [PATCH 16/22] arm: [MVE intrinsics] factorize vcreateq
  2023-04-18 13:46 ` [PATCH 16/22] arm: [MVE intrinsics] factorize vcreateq Christophe Lyon
@ 2023-05-03  8:42   ` Kyrylo Tkachov
  0 siblings, 0 replies; 55+ messages in thread
From: Kyrylo Tkachov @ 2023-05-03  8:42 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Earnshaw, Richard Sandiford
  Cc: Christophe Lyon



> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@arm.com>
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
> Subject: [PATCH 16/22] arm: [MVE intrinsics] factorize vcreateq
> 
> We need a 'fake' iterator to be able to use mve_insn for vcreateq_f.
> 
> 2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>
> 
> 	gcc/
> 	* config/arm/iterators.md (MVE_FP_CREATE): New.
> 	(mve_insn): Add VCREATEQ_S, VCREATEQ_U, VCREATEQ_F.
> 	* config/arm/mve.md (mve_vcreateq_f<mode>): Rename into ...
> 	(@mve_<mve_insn>q_f<mode>): ... this.
> 	(mve_vcreateq_<supf><mode>): Rename into ...
> 	(@mve_<mve_insn>q_<supf><mode>): ... this.
> ---
>  gcc/config/arm/iterators.md | 5 +++++
>  gcc/config/arm/mve.md       | 6 +++---
>  2 files changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index b0ea1af77d2..5a531d77a33 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -393,6 +393,10 @@ (define_int_iterator MVE_FP_N_BINARY   [
>  		     VSUBQ_N_F
>  		     ])
> 
> +(define_int_iterator MVE_FP_CREATE [
> +		     VCREATEQ_F
> +		     ])

I believe in similar cases in the aarch64 port we used the convention "ONLY", so something like VCREATEQ_F_ONLY.
Ok with that change.
Thanks,
Kyrill

> +
>  (define_code_attr mve_addsubmul [
>  		 (minus "vsub")
>  		 (mult "vmul")
> @@ -407,6 +411,7 @@ (define_int_attr mve_insn [
>  		 (VBICQ_M_N_S "vbic") (VBICQ_M_N_U "vbic")
>  		 (VBICQ_M_S "vbic") (VBICQ_M_U "vbic") (VBICQ_M_F
> "vbic")
>  		 (VBICQ_N_S "vbic") (VBICQ_N_U "vbic")
> +		 (VCREATEQ_S "vcreate") (VCREATEQ_U "vcreate")
> (VCREATEQ_F "vcreate")
>  		 (VEORQ_M_S "veor") (VEORQ_M_U "veor") (VEORQ_M_F
> "veor")
>  		 (VMULQ_M_N_S "vmul") (VMULQ_M_N_U "vmul")
> (VMULQ_M_N_F "vmul")
>  		 (VMULQ_M_S "vmul") (VMULQ_M_U "vmul") (VMULQ_M_F
> "vmul")
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index fbae1d3791f..f7f0ba65251 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -700,12 +700,12 @@ (define_insn "mve_vcvtq_n_to_f_<supf><mode>"
> 
>  ;; [vcreateq_f])
>  ;;
> -(define_insn "mve_vcreateq_f<mode>"
> +(define_insn "@mve_<mve_insn>q_f<mode>"
>    [
>     (set (match_operand:MVE_0 0 "s_register_operand" "=w")
>  	(unspec:MVE_0 [(match_operand:DI 1 "s_register_operand" "r")
>  		       (match_operand:DI 2 "s_register_operand" "r")]
> -	 VCREATEQ_F))
> +	 MVE_FP_CREATE))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
>    "vmov %q0[2], %q0[0], %Q1, %Q2\;vmov %q0[3], %q0[1], %R1, %R2"
> @@ -715,7 +715,7 @@ (define_insn "mve_vcreateq_f<mode>"
>  ;;
>  ;; [vcreateq_u, vcreateq_s])
>  ;;
> -(define_insn "mve_vcreateq_<supf><mode>"
> +(define_insn "@mve_<mve_insn>q_<supf><mode>"
>    [
>     (set (match_operand:MVE_1 0 "s_register_operand" "=w")
>  	(unspec:MVE_1 [(match_operand:DI 1 "s_register_operand" "r")
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 55+ messages in thread
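
The practical effect of the '@' renames, sketched with the expander
accessors from patch 14 (illustrative, with the integer/float split
written out by hand; the signed case would pass VCREATEQ_S):

insn_code icode;
if (e.type_suffix (0).integer_p)
  /* @mve_<mve_insn>q_<supf><mode>: <mve_insn> and <supf> both key off
     the same int iterator, hence one argument per placeholder.  */
  icode = code_for_mve_q (VCREATEQ_U, VCREATEQ_U, e.vector_mode (0));
else
  /* @mve_<mve_insn>q_f<mode>  */
  icode = code_for_mve_q_f (VCREATEQ_F, e.vector_mode (0));
return e.use_exact_insn (icode);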

* RE: [PATCH 17/22] arm: [MVE intrinsics] rework vcreateq
  2023-04-18 13:46 ` [PATCH 17/22] arm: [MVE intrinsics] rework vcreateq Christophe Lyon
@ 2023-05-03  8:44   ` Kyrylo Tkachov
  0 siblings, 0 replies; 55+ messages in thread
From: Kyrylo Tkachov @ 2023-05-03  8:44 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Earnshaw, Richard Sandiford
  Cc: Christophe Lyon



> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@arm.com>
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
> Subject: [PATCH 17/22] arm: [MVE intrinsics] rework vcreateq
> 
> Implement vcreateq using the new MVE builtins framework.

Ok.
Thanks,
Kyrill

> 
> 2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>
> 
> 	gcc/
> 	* config/arm/arm-mve-builtins-base.cc (FUNCTION_WITHOUT_M_N):
> 	New.
> 	(vcreateq): New.
> 	* config/arm/arm-mve-builtins-base.def (vcreateq): New.
> 	* config/arm/arm-mve-builtins-base.h (vcreateq): New.
> 	* config/arm/arm_mve.h (vcreateq_f16): Remove.
> 	(vcreateq_f32): Remove.
> 	(vcreateq_u8): Remove.
> 	(vcreateq_u16): Remove.
> 	(vcreateq_u32): Remove.
> 	(vcreateq_u64): Remove.
> 	(vcreateq_s8): Remove.
> 	(vcreateq_s16): Remove.
> 	(vcreateq_s32): Remove.
> 	(vcreateq_s64): Remove.
> 	(__arm_vcreateq_u8): Remove.
> 	(__arm_vcreateq_u16): Remove.
> 	(__arm_vcreateq_u32): Remove.
> 	(__arm_vcreateq_u64): Remove.
> 	(__arm_vcreateq_s8): Remove.
> 	(__arm_vcreateq_s16): Remove.
> 	(__arm_vcreateq_s32): Remove.
> 	(__arm_vcreateq_s64): Remove.
> 	(__arm_vcreateq_f16): Remove.
> 	(__arm_vcreateq_f32): Remove.
> ---
>  gcc/config/arm/arm-mve-builtins-base.cc  | 10 +++
>  gcc/config/arm/arm-mve-builtins-base.def |  2 +
>  gcc/config/arm/arm-mve-builtins-base.h   |  1 +
>  gcc/config/arm/arm_mve.h                 | 80 ------------------------
>  4 files changed, 13 insertions(+), 80 deletions(-)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-
> mve-builtins-base.cc
> index 499a1ef9f0e..9722c861faf 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.cc
> +++ b/gcc/config/arm/arm-mve-builtins-base.cc
> @@ -106,8 +106,18 @@ namespace arm_mve {
>      UNSPEC##_M_S, UNSPEC##_M_U, UNSPEC##_M_F,
> 	\
>      UNSPEC##_M_N_S, UNSPEC##_M_N_U, -1))
> 
> +  /* Helper for builtins without RTX codes, no _m predicated and no _n
> +     overrides.  */
> +#define FUNCTION_WITHOUT_M_N(NAME, UNSPEC) FUNCTION
> 		\
> +  (NAME, unspec_mve_function_exact_insn,				\
> +   (UNSPEC##_S, UNSPEC##_U, UNSPEC##_F,
> 	\
> +    -1, -1, -1,								\
> +    -1, -1, -1,								\
> +    -1, -1, -1))
> +
>  FUNCTION_WITH_RTX_M_N (vaddq, PLUS, VADDQ)
>  FUNCTION_WITH_RTX_M (vandq, AND, VANDQ)
> +FUNCTION_WITHOUT_M_N (vcreateq, VCREATEQ)
>  FUNCTION_WITH_RTX_M (veorq, XOR, VEORQ)
>  FUNCTION_WITH_RTX_M_N (vmulq, MULT, VMULQ)
>  FUNCTION_WITH_RTX_M_N_NO_N_F (vorrq, IOR, VORRQ)
> diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-
> mve-builtins-base.def
> index c3f8c0f0eeb..1bfd15f973c 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.def
> +++ b/gcc/config/arm/arm-mve-builtins-base.def
> @@ -20,6 +20,7 @@
>  #define REQUIRES_FLOAT false
>  DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_integer, mx_or_none)
>  DEF_MVE_FUNCTION (vandq, binary, all_integer, mx_or_none)
> +DEF_MVE_FUNCTION (vcreateq, create, all_integer_with_64, none)
>  DEF_MVE_FUNCTION (veorq, binary, all_integer, mx_or_none)
>  DEF_MVE_FUNCTION (vmulq, binary_opt_n, all_integer, mx_or_none)
>  DEF_MVE_FUNCTION (vorrq, binary_orrq, all_integer, mx_or_none)
> @@ -31,6 +32,7 @@ DEF_MVE_FUNCTION (vuninitializedq, inherent,
> all_integer_with_64, none)
>  #define REQUIRES_FLOAT true
>  DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_float, mx_or_none)
>  DEF_MVE_FUNCTION (vandq, binary, all_float, mx_or_none)
> +DEF_MVE_FUNCTION (vcreateq, create, all_float, none)
>  DEF_MVE_FUNCTION (veorq, binary, all_float, mx_or_none)
>  DEF_MVE_FUNCTION (vmulq, binary_opt_n, all_float, mx_or_none)
>  DEF_MVE_FUNCTION (vorrq, binary_orrq, all_float, mx_or_none)
> diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-
> mve-builtins-base.h
> index c450b373239..8dd6bff01bf 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.h
> +++ b/gcc/config/arm/arm-mve-builtins-base.h
> @@ -25,6 +25,7 @@ namespace functions {
> 
>  extern const function_base *const vaddq;
>  extern const function_base *const vandq;
> +extern const function_base *const vcreateq;
>  extern const function_base *const veorq;
>  extern const function_base *const vmulq;
>  extern const function_base *const vorrq;
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index edf8e247421..4810e2977d3 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -638,20 +638,10 @@
>  #define vcvtq_n_f32_s32(__a,  __imm6) __arm_vcvtq_n_f32_s32(__a,
> __imm6)
>  #define vcvtq_n_f16_u16(__a,  __imm6) __arm_vcvtq_n_f16_u16(__a,
> __imm6)
>  #define vcvtq_n_f32_u32(__a,  __imm6) __arm_vcvtq_n_f32_u32(__a,
> __imm6)
> -#define vcreateq_f16(__a, __b) __arm_vcreateq_f16(__a, __b)
> -#define vcreateq_f32(__a, __b) __arm_vcreateq_f32(__a, __b)
>  #define vcvtq_n_s16_f16(__a,  __imm6) __arm_vcvtq_n_s16_f16(__a,
> __imm6)
>  #define vcvtq_n_s32_f32(__a,  __imm6) __arm_vcvtq_n_s32_f32(__a,
> __imm6)
>  #define vcvtq_n_u16_f16(__a,  __imm6) __arm_vcvtq_n_u16_f16(__a,
> __imm6)
>  #define vcvtq_n_u32_f32(__a,  __imm6) __arm_vcvtq_n_u32_f32(__a,
> __imm6)
> -#define vcreateq_u8(__a, __b) __arm_vcreateq_u8(__a, __b)
> -#define vcreateq_u16(__a, __b) __arm_vcreateq_u16(__a, __b)
> -#define vcreateq_u32(__a, __b) __arm_vcreateq_u32(__a, __b)
> -#define vcreateq_u64(__a, __b) __arm_vcreateq_u64(__a, __b)
> -#define vcreateq_s8(__a, __b) __arm_vcreateq_s8(__a, __b)
> -#define vcreateq_s16(__a, __b) __arm_vcreateq_s16(__a, __b)
> -#define vcreateq_s32(__a, __b) __arm_vcreateq_s32(__a, __b)
> -#define vcreateq_s64(__a, __b) __arm_vcreateq_s64(__a, __b)
>  #define vshrq_n_s8(__a,  __imm) __arm_vshrq_n_s8(__a,  __imm)
>  #define vshrq_n_s16(__a,  __imm) __arm_vshrq_n_s16(__a,  __imm)
>  #define vshrq_n_s32(__a,  __imm) __arm_vshrq_n_s32(__a,  __imm)
> @@ -3222,62 +3212,6 @@ __arm_vpnot (mve_pred16_t __a)
>    return __builtin_mve_vpnotv16bi (__a);
>  }
> 
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vcreateq_u8 (uint64_t __a, uint64_t __b)
> -{
> -  return __builtin_mve_vcreateq_uv16qi (__a, __b);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vcreateq_u16 (uint64_t __a, uint64_t __b)
> -{
> -  return __builtin_mve_vcreateq_uv8hi (__a, __b);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vcreateq_u32 (uint64_t __a, uint64_t __b)
> -{
> -  return __builtin_mve_vcreateq_uv4si (__a, __b);
> -}
> -
> -__extension__ extern __inline uint64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vcreateq_u64 (uint64_t __a, uint64_t __b)
> -{
> -  return __builtin_mve_vcreateq_uv2di (__a, __b);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vcreateq_s8 (uint64_t __a, uint64_t __b)
> -{
> -  return __builtin_mve_vcreateq_sv16qi (__a, __b);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vcreateq_s16 (uint64_t __a, uint64_t __b)
> -{
> -  return __builtin_mve_vcreateq_sv8hi (__a, __b);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vcreateq_s32 (uint64_t __a, uint64_t __b)
> -{
> -  return __builtin_mve_vcreateq_sv4si (__a, __b);
> -}
> -
> -__extension__ extern __inline int64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vcreateq_s64 (uint64_t __a, uint64_t __b)
> -{
> -  return __builtin_mve_vcreateq_sv2di (__a, __b);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vshrq_n_s8 (int8x16_t __a, const int __imm)
> @@ -15580,20 +15514,6 @@ __arm_vcvtq_n_f32_u32 (uint32x4_t __a, const
> int __imm6)
>    return __builtin_mve_vcvtq_n_to_f_uv4sf (__a, __imm6);
>  }
> 
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vcreateq_f16 (uint64_t __a, uint64_t __b)
> -{
> -  return __builtin_mve_vcreateq_fv8hf (__a, __b);
> -}
> -
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vcreateq_f32 (uint64_t __a, uint64_t __b)
> -{
> -  return __builtin_mve_vcreateq_fv4sf (__a, __b);
> -}
> -
>  __extension__ extern __inline int16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcvtq_n_s16_f16 (float16x8_t __a, const int __imm6)
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 55+ messages in thread
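
User code is unaffected by moving these definitions out of arm_mve.h;
the framework registers the same names and prototypes.  A small
sanity example (mine, not from the patch):

#include <arm_mve.h>

/* vcreateq_f16 copies raw bits into the vector halves; there is no
   integer-to-float conversion involved.  */
float16x8_t
rebuild (uint64_t lo, uint64_t hi)
{
  return vcreateq_f16 (lo, hi);
}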

* RE: [PATCH 18/22] arm: [MVE intrinsics] factorize several binary_m operations
  2023-04-18 13:46 ` [PATCH 18/22] arm: [MVE intrinsics] factorize several binary_m operations Christophe Lyon
@ 2023-05-03  8:46   ` Kyrylo Tkachov
  0 siblings, 0 replies; 55+ messages in thread
From: Kyrylo Tkachov @ 2023-05-03  8:46 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Earnshaw, Richard Sandiford
  Cc: Christophe Lyon



> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@arm.com>
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
> Subject: [PATCH 18/22] arm: [MVE intrinsics] factorize several binary_m
> operations
> 
> Factorize m-predicated versions of vabdq, vhaddq, vhsubq, vmaxq,
> vminq, vmulhq, vqaddq, vqdmladhq, vqdmladhxq, vqdmlsdhq, vqdmlsdhxq,
> vqdmulhq, vqrdmladhq, vqrdmladhxq, vqrdmlsdhq, vqrdmlsdhxq, vqrdmulhq,
> vqrshlq, vqshlq, vqsubq, vrhaddq, vrmulhq, vrshlq, vshlq
> so that they use the same pattern.
> 
> 2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>
> 
> 	gcc/
> 	* config/arm/iterators.md (MVE_INT_SU_M_BINARY): New.
> 	(mve_insn): Add vabdq, vhaddq, vhsubq, vmaxq, vminq, vmulhq,
> 	vqaddq, vqdmladhq, vqdmladhxq, vqdmlsdhq, vqdmlsdhxq,
> vqdmulhq,
> 	vqrdmladhq, vqrdmladhxq, vqrdmlsdhq, vqrdmlsdhxq, vqrdmulhq,
> 	vqrshlq, vqshlq, vqsubq, vrhaddq, vrmulhq, vrshlq, vshlq.
> 	(supf): Add VQDMLADHQ_M_S, VQDMLADHXQ_M_S,
> VQDMLSDHQ_M_S,
> 	VQDMLSDHXQ_M_S, VQDMULHQ_M_S, VQRDMLADHQ_M_S,
> VQRDMLADHXQ_M_S,
> 	VQRDMLSDHQ_M_S, VQRDMLSDHXQ_M_S, VQRDMULHQ_M_S.
> 	* config/arm/mve.md (@mve_<mve_insn>q_m_<supf><mode>):
> New.
> 	(mve_vshlq_m_<supf><mode>): Merged into
> 	@mve_<mve_insn>q_m_<supf><mode>.
> 	(mve_vabdq_m_<supf><mode>): Likewise.
> 	(mve_vhaddq_m_<supf><mode>): Likewise.
> 	(mve_vhsubq_m_<supf><mode>): Likewise.
> 	(mve_vmaxq_m_<supf><mode>): Likewise.
> 	(mve_vminq_m_<supf><mode>): Likewise.
> 	(mve_vmulhq_m_<supf><mode>): Likewise.
> 	(mve_vqaddq_m_<supf><mode>): Likewise.
> 	(mve_vqrshlq_m_<supf><mode>): Likewise.
> 	(mve_vqshlq_m_<supf><mode>): Likewise.
> 	(mve_vqsubq_m_<supf><mode>): Likewise.
> 	(mve_vrhaddq_m_<supf><mode>): Likewise.
> 	(mve_vrmulhq_m_<supf><mode>): Likewise.
> 	(mve_vrshlq_m_<supf><mode>): Likewise.
> 	(mve_vqdmladhq_m_s<mode>): Likewise.
> 	(mve_vqdmladhxq_m_s<mode>): Likewise.
> 	(mve_vqdmlsdhq_m_s<mode>): Likewise.
> 	(mve_vqdmlsdhxq_m_s<mode>): Likewise.
> 	(mve_vqdmulhq_m_s<mode>): Likewise.
> 	(mve_vqrdmladhq_m_s<mode>): Likewise.
> 	(mve_vqrdmladhxq_m_s<mode>): Likewise.
> 	(mve_vqrdmlsdhq_m_s<mode>): Likewise.
> 	(mve_vqrdmlsdhxq_m_s<mode>): Likewise.
> 	(mve_vqrdmulhq_m_s<mode>): Likewise.
> ---
>  gcc/config/arm/iterators.md |  65 +++++-
>  gcc/config/arm/mve.md       | 420 +++---------------------------------
>  2 files changed, 91 insertions(+), 394 deletions(-)
> 
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 5a531d77a33..18d70350bbe 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -339,6 +339,33 @@ (define_int_iterator MVE_INT_M_BINARY   [
>  		     VSUBQ_M_S VSUBQ_M_U
>  		     ])
> 
> +(define_int_iterator MVE_INT_SU_M_BINARY   [
> +		     VABDQ_M_S VABDQ_M_U
> +		     VHADDQ_M_S VHADDQ_M_U
> +		     VHSUBQ_M_S VHSUBQ_M_U
> +		     VMAXQ_M_S VMAXQ_M_U
> +		     VMINQ_M_S VMINQ_M_U
> +		     VMULHQ_M_S VMULHQ_M_U
> +		     VQADDQ_M_S VQADDQ_M_U
> +		     VQDMLADHQ_M_S
> +		     VQDMLADHXQ_M_S
> +		     VQDMLSDHQ_M_S
> +		     VQDMLSDHXQ_M_S
> +		     VQDMULHQ_M_S
> +		     VQRDMLADHQ_M_S
> +		     VQRDMLADHXQ_M_S
> +		     VQRDMLSDHQ_M_S
> +		     VQRDMLSDHXQ_M_S
> +		     VQRDMULHQ_M_S
> +		     VQRSHLQ_M_S VQRSHLQ_M_U
> +		     VQSHLQ_M_S VQSHLQ_M_U
> +		     VQSUBQ_M_S VQSUBQ_M_U
> +		     VRHADDQ_M_S VRHADDQ_M_U
> +		     VRMULHQ_M_S VRMULHQ_M_U
> +		     VRSHLQ_M_S VRSHLQ_M_U
> +		     VSHLQ_M_S VSHLQ_M_U
> +		     ])
> +
>  (define_int_iterator MVE_INT_M_BINARY_LOGIC   [
>  		     VANDQ_M_S VANDQ_M_U
>  		     VBICQ_M_S VBICQ_M_U
> @@ -404,6 +431,7 @@ (define_code_attr mve_addsubmul [
>  		 ])
> 
>  (define_int_attr mve_insn [
> +		 (VABDQ_M_S "vabd") (VABDQ_M_U "vabd")
>  		 (VADDQ_M_N_S "vadd") (VADDQ_M_N_U "vadd")
> (VADDQ_M_N_F "vadd")
>  		 (VADDQ_M_S "vadd") (VADDQ_M_U "vadd") (VADDQ_M_F
> "vadd")
>  		 (VADDQ_N_S "vadd") (VADDQ_N_U "vadd") (VADDQ_N_F
> "vadd")
> @@ -413,12 +441,35 @@ (define_int_attr mve_insn [
>  		 (VBICQ_N_S "vbic") (VBICQ_N_U "vbic")
>  		 (VCREATEQ_S "vcreate") (VCREATEQ_U "vcreate")
> (VCREATEQ_F "vcreate")
>  		 (VEORQ_M_S "veor") (VEORQ_M_U "veor") (VEORQ_M_F
> "veor")
> +		 (VHADDQ_M_S "vhadd") (VHADDQ_M_U "vhadd")
> +		 (VHSUBQ_M_S "vhsub") (VHSUBQ_M_U "vhsub")
> +		 (VMAXQ_M_S "vmax") (VMAXQ_M_U "vmax")
> +		 (VMINQ_M_S "vmin") (VMINQ_M_U "vmin")
> +		 (VMULHQ_M_S "vmulh") (VMULHQ_M_U "vmulh")
>  		 (VMULQ_M_N_S "vmul") (VMULQ_M_N_U "vmul")
> (VMULQ_M_N_F "vmul")
>  		 (VMULQ_M_S "vmul") (VMULQ_M_U "vmul") (VMULQ_M_F
> "vmul")
>  		 (VMULQ_N_S "vmul") (VMULQ_N_U "vmul") (VMULQ_N_F
> "vmul")
>  		 (VORRQ_M_N_S "vorr") (VORRQ_M_N_U "vorr")
>  		 (VORRQ_M_S "vorr") (VORRQ_M_U "vorr") (VORRQ_M_F
> "vorr")
>  		 (VORRQ_N_S "vorr") (VORRQ_N_U "vorr")
> +		 (VQADDQ_M_S "vqadd") (VQADDQ_M_U "vqadd")
> +		 (VQDMLADHQ_M_S "vqdmladh")
> +		 (VQDMLADHXQ_M_S "vqdmladhx")
> +		 (VQDMLSDHQ_M_S "vqdmlsdh")
> +		 (VQDMLSDHXQ_M_S "vqdmlsdhx")
> +		 (VQDMULHQ_M_S "vqdmulh")
> +		 (VQRDMLADHQ_M_S "vqrdmladh")
> +		 (VQRDMLADHXQ_M_S "vqrdmladhx")
> +		 (VQRDMLSDHQ_M_S "vqrdmlsdh")
> +		 (VQRDMLSDHXQ_M_S "vqrdmlsdhx")
> +		 (VQRDMULHQ_M_S "vqrdmulh")
> +		 (VQRSHLQ_M_S "vqrshl") (VQRSHLQ_M_U "vqrshl")
> +		 (VQSHLQ_M_S "vqshl") (VQSHLQ_M_U "vqshl")
> +		 (VQSUBQ_M_S "vqsub") (VQSUBQ_M_U "vqsub")
> +		 (VRHADDQ_M_S "vrhadd") (VRHADDQ_M_U "vrhadd")
> +		 (VRMULHQ_M_S "vrmulh") (VRMULHQ_M_U "vrmulh")
> +		 (VRSHLQ_M_S "vrshl") (VRSHLQ_M_U "vrshl")
> +		 (VSHLQ_M_S "vshl") (VSHLQ_M_U "vshl")
>  		 (VSUBQ_M_N_S "vsub") (VSUBQ_M_N_U "vsub")
> (VSUBQ_M_N_F "vsub")
>  		 (VSUBQ_M_S "vsub") (VSUBQ_M_U "vsub") (VSUBQ_M_F
> "vsub")
>  		 (VSUBQ_N_S "vsub") (VSUBQ_N_U "vsub") (VSUBQ_N_F
> "vsub")
> @@ -1557,7 +1608,19 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s")
> (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
>  		       (VADCIQ_U "u") (VADCIQ_M_U "u") (VADCIQ_S "s")
>  		       (VADCIQ_M_S "s") (SQRSHRL_64 "64") (SQRSHRL_48 "48")
>  		       (UQRSHLL_64 "64") (UQRSHLL_48 "48") (VSHLCQ_M_S
> "s")
> -		       (VSHLCQ_M_U "u")])
> +		       (VSHLCQ_M_U "u")
> +		       (VQDMLADHQ_M_S "s")
> +		       (VQDMLADHXQ_M_S "s")
> +		       (VQDMLSDHQ_M_S "s")
> +		       (VQDMLSDHXQ_M_S "s")
> +		       (VQDMULHQ_M_S "s")
> +		       (VQRDMLADHQ_M_S "s")
> +		       (VQRDMLADHXQ_M_S "s")
> +		       (VQRDMLSDHQ_M_S "s")
> +		       (VQRDMLSDHXQ_M_S "s")
> +		       (VQRDMULHQ_M_S "s")
> +		       ])
> +
>  ;; Both kinds of return insn.
>  (define_code_iterator RETURNS [return simple_return])
>  (define_code_attr return_str [(return "") (simple_return "simple_")])
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index f7f0ba65251..21c54197db5 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -4867,23 +4867,6 @@ (define_insn "mve_vqshluq_m_n_s<mode>"
>    [(set_attr "type" "mve_move")
>     (set_attr "length" "8")])
> 
> -;;
> -;; [vshlq_m_s, vshlq_m_u])
> -;;
> -(define_insn "mve_vshlq_m_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:MVE_2 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
> -	 VSHLQ_M))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vshlt.<supf>%#<V_sz_elem>\t%q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length" "8")])
> -
>  ;;
>  ;; [vsriq_m_n_s, vsriq_m_n_u])
>  ;;
> @@ -4917,20 +4900,44 @@ (define_insn
> "mve_vcvtq_m_n_to_f_<supf><mode>"
>    "vpst\;vcvtt.f%#<V_sz_elem>.<supf>%#<V_sz_elem>\t%q0, %q2, %3"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> +
>  ;;
>  ;; [vabdq_m_s, vabdq_m_u])
> +;; [vhaddq_m_s, vhaddq_m_u])
> +;; [vhsubq_m_s, vhsubq_m_u])
> +;; [vmaxq_m_s, vmaxq_m_u])
> +;; [vminq_m_s, vminq_m_u])
> +;; [vmulhq_m_s, vmulhq_m_u])
> +;; [vqaddq_m_u, vqaddq_m_s])
> +;; [vqdmladhq_m_s])
> +;; [vqdmladhxq_m_s])
> +;; [vqdmlsdhq_m_s])
> +;; [vqdmlsdhxq_m_s])
> +;; [vqdmulhq_m_s])
> +;; [vqrdmladhq_m_s])
> +;; [vqrdmladhxq_m_s])
> +;; [vqrdmlsdhq_m_s])
> +;; [vqrdmlsdhxq_m_s])
> +;; [vqrdmulhq_m_s])
> +;; [vqrshlq_m_u, vqrshlq_m_s])
> +;; [vqshlq_m_u, vqshlq_m_s])
> +;; [vqsubq_m_u, vqsubq_m_s])
> +;; [vrhaddq_m_u, vrhaddq_m_s])
> +;; [vrmulhq_m_u, vrmulhq_m_s])
> +;; [vrshlq_m_s, vrshlq_m_u])
> +;; [vshlq_m_s, vshlq_m_u])

Ok with the trailing ')' removed.
Thanks,
Kyrill

>  ;;
> -(define_insn "mve_vabdq_m_<supf><mode>"
> +(define_insn "@mve_<mve_insn>q_m_<supf><mode>"
>    [
>     (set (match_operand:MVE_2 0 "s_register_operand" "=w")
>  	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
>  		       (match_operand:MVE_2 2 "s_register_operand" "w")
>  		       (match_operand:MVE_2 3 "s_register_operand" "w")
>  		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
> -	 VABDQ_M))
> +	 MVE_INT_SU_M_BINARY))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vpst\;vabdt.<supf>%#<V_sz_elem>	%q0, %q2, %q3"
> +  "vpst\;<mve_insn>t.<supf>%#<V_sz_elem>\t%q0, %q2, %q3"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> @@ -5060,23 +5067,6 @@ (define_insn "mve_vhaddq_m_n_<supf><mode>"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> -;;
> -;; [vhaddq_m_s, vhaddq_m_u])
> -;;
> -(define_insn "mve_vhaddq_m_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:MVE_2 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
> -	 VHADDQ_M))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vhaddt.<supf>%#<V_sz_elem>	%q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
>  ;;
>  ;; [vhsubq_m_n_s, vhsubq_m_n_u])
>  ;;
> @@ -5095,56 +5085,6 @@ (define_insn "mve_vhsubq_m_n_<supf><mode>"
>     (set_attr "length""8")])
> 
>  ;;
> -;; [vhsubq_m_s, vhsubq_m_u])
> -;;
> -(define_insn "mve_vhsubq_m_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:MVE_2 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
> -	 VHSUBQ_M))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vhsubt.<supf>%#<V_sz_elem>	%q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
> -;;
> -;; [vmaxq_m_s, vmaxq_m_u])
> -;;
> -(define_insn "mve_vmaxq_m_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:MVE_2 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
> -	 VMAXQ_M))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vmaxt.<supf>%#<V_sz_elem>	%q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
> -;;
> -;; [vminq_m_s, vminq_m_u])
> -;;
> -(define_insn "mve_vminq_m_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:MVE_2 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
> -	 VMINQ_M))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vmint.<supf>%#<V_sz_elem>	%q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
>  ;;
>  ;; [vmladavaq_p_u, vmladavaq_p_s])
>  ;;
> @@ -5196,23 +5136,6 @@ (define_insn "mve_vmlasq_m_n_<supf><mode>"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> -;;
> -;; [vmulhq_m_s, vmulhq_m_u])
> -;;
> -(define_insn "mve_vmulhq_m_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:MVE_2 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
> -	 VMULHQ_M))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vmulht.<supf>%#<V_sz_elem>	%q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
>  ;;
>  ;; [vmullbq_int_m_u, vmullbq_int_m_s])
>  ;;
> @@ -5281,23 +5204,6 @@ (define_insn "mve_vqaddq_m_n_<supf><mode>"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> -;;
> -;; [vqaddq_m_u, vqaddq_m_s])
> -;;
> -(define_insn "mve_vqaddq_m_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:MVE_2 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
> -	 VQADDQ_M))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vqaddt.<supf>%#<V_sz_elem>\t%q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
>  ;;
>  ;; [vqdmlahq_m_n_s])
>  ;;
> @@ -5366,23 +5272,6 @@ (define_insn "mve_vqrdmlashq_m_n_s<mode>"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> -;;
> -;; [vqrshlq_m_u, vqrshlq_m_s])
> -;;
> -(define_insn "mve_vqrshlq_m_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:MVE_2 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
> -	 VQRSHLQ_M))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vqrshlt.<supf>%#<V_sz_elem>\t%q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
>  ;;
>  ;; [vqshlq_m_n_s, vqshlq_m_n_u])
>  ;;
> @@ -5400,23 +5289,6 @@ (define_insn "mve_vqshlq_m_n_<supf><mode>"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> -;;
> -;; [vqshlq_m_u, vqshlq_m_s])
> -;;
> -(define_insn "mve_vqshlq_m_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:MVE_2 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
> -	 VQSHLQ_M))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vqshlt.<supf>%#<V_sz_elem>\t%q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
>  ;;
>  ;; [vqsubq_m_n_u, vqsubq_m_n_s])
>  ;;
> @@ -5434,74 +5306,6 @@ (define_insn "mve_vqsubq_m_n_<supf><mode>"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> -;;
> -;; [vqsubq_m_u, vqsubq_m_s])
> -;;
> -(define_insn "mve_vqsubq_m_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:MVE_2 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
> -	 VQSUBQ_M))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vqsubt.<supf>%#<V_sz_elem>\t%q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
> -;;
> -;; [vrhaddq_m_u, vrhaddq_m_s])
> -;;
> -(define_insn "mve_vrhaddq_m_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:MVE_2 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
> -	 VRHADDQ_M))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vrhaddt.<supf>%#<V_sz_elem>\t%q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
> -;;
> -;; [vrmulhq_m_u, vrmulhq_m_s])
> -;;
> -(define_insn "mve_vrmulhq_m_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:MVE_2 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
> -	 VRMULHQ_M))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vrmulht.<supf>%#<V_sz_elem>\t%q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
> -;;
> -;; [vrshlq_m_s, vrshlq_m_u])
> -;;
> -(define_insn "mve_vrshlq_m_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:MVE_2 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
> -	 VRSHLQ_M))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vrshlt.<supf>%#<V_sz_elem>\t%q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
>  ;;
>  ;; [vrshrq_m_n_s, vrshrq_m_n_u])
>  ;;
> @@ -5655,74 +5459,6 @@ (define_insn "mve_vmlsdavaxq_p_s<mode>"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> -;;
> -;; [vqdmladhq_m_s])
> -;;
> -(define_insn "mve_vqdmladhq_m_s<mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:MVE_2 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
> -	 VQDMLADHQ_M_S))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vqdmladht.s%#<V_sz_elem>\t%q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
> -;;
> -;; [vqdmladhxq_m_s])
> -;;
> -(define_insn "mve_vqdmladhxq_m_s<mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:MVE_2 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
> -	 VQDMLADHXQ_M_S))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vqdmladhxt.s%#<V_sz_elem>\t%q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
> -;;
> -;; [vqdmlsdhq_m_s])
> -;;
> -(define_insn "mve_vqdmlsdhq_m_s<mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:MVE_2 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
> -	 VQDMLSDHQ_M_S))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vqdmlsdht.s%#<V_sz_elem>\t%q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
> -;;
> -;; [vqdmlsdhxq_m_s])
> -;;
> -(define_insn "mve_vqdmlsdhxq_m_s<mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:MVE_2 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
> -	 VQDMLSDHXQ_M_S))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vqdmlsdhxt.s%#<V_sz_elem>\t%q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
>  ;;
>  ;; [vqdmulhq_m_n_s])
>  ;;
> @@ -5740,91 +5476,6 @@ (define_insn "mve_vqdmulhq_m_n_s<mode>"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> -;;
> -;; [vqdmulhq_m_s])
> -;;
> -(define_insn "mve_vqdmulhq_m_s<mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:MVE_2 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
> -	 VQDMULHQ_M_S))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vqdmulht.s%#<V_sz_elem>\t%q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
> -;;
> -;; [vqrdmladhq_m_s])
> -;;
> -(define_insn "mve_vqrdmladhq_m_s<mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:MVE_2 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
> -	 VQRDMLADHQ_M_S))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vqrdmladht.s%#<V_sz_elem>\t%q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
> -;;
> -;; [vqrdmladhxq_m_s])
> -;;
> -(define_insn "mve_vqrdmladhxq_m_s<mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:MVE_2 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
> -	 VQRDMLADHXQ_M_S))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vqrdmladhxt.s%#<V_sz_elem>\t%q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
> -;;
> -;; [vqrdmlsdhq_m_s])
> -;;
> -(define_insn "mve_vqrdmlsdhq_m_s<mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:MVE_2 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4
> "vpr_register_operand" "Up")]
> -	 VQRDMLSDHQ_M_S))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vqrdmlsdht.s%#<V_sz_elem>\t%q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
> -;;
> -;; [vqrdmlsdhxq_m_s])
> -;;
> -(define_insn "mve_vqrdmlsdhxq_m_s<mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> "vpr_register_operand" "Up")]
> -	 VQRDMLSDHXQ_M_S))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vqrdmlsdhxt.s%#<V_sz_elem>\t%q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
>  ;;
>  ;; [vqrdmulhq_m_n_s])
>  ;;
> @@ -5842,23 +5493,6 @@ (define_insn "mve_vqrdmulhq_m_n_s<mode>"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> -;;
> -;; [vqrdmulhq_m_s])
> -;;
> -(define_insn "mve_vqrdmulhq_m_s<mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:MVE_2 3 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VQRDMULHQ_M_S))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vqrdmulht.s%#<V_sz_elem>\t%q0, %q2, %q3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
>  ;;
>  ;; [vmlaldavaq_p_u, vmlaldavaq_p_s])
>  ;;
> --
> 2.34.1



* RE: [PATCH 19/22] arm: [MVE intrinsics] factorize several binary _n operations
  2023-04-18 13:46 ` [PATCH 19/22] arm: [MVE intrinsics] factorize several binary _n operations Christophe Lyon
@ 2023-05-03  8:47   ` Kyrylo Tkachov
From: Kyrylo Tkachov @ 2023-05-03  8:47 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Earnshaw, Richard Sandiford
  Cc: Christophe Lyon



> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@arm.com>
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
> Subject: [PATCH 19/22] arm: [MVE intrinsics] factorize several binary _n
> operations
> 
> Factorize
> vhaddq_n, vhsubq_n, vqaddq_n, vqdmulhq_n, vqrdmulhq_n, vqsubq_n
> so that they use the same pattern.
> 
> 2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>
> 
> 	gcc/
> 	* config/arm/iterators.md (MVE_INT_SU_N_BINARY): New.
> 	(mve_insn): Add vhaddq, vhsubq, vqaddq, vqdmulhq, vqrdmulhq,
> 	vqsubq.
> 	(supf): Add VQDMULHQ_N_S, VQRDMULHQ_N_S.
> 	* config/arm/mve.md (mve_vhaddq_n_<supf><mode>)
> 	(mve_vhsubq_n_<supf><mode>, mve_vqaddq_n_<supf><mode>)
> 	(mve_vqdmulhq_n_s<mode>, mve_vqrdmulhq_n_s<mode>)
> 	(mve_vqsubq_n_<supf><mode>): Merge into ...
> 	(@mve_<mve_insn>q_n_<supf><mode>): ... this.
> ---
>  gcc/config/arm/iterators.md | 17 ++++++++
>  gcc/config/arm/mve.md       | 86 ++++---------------------------------
>  2 files changed, 25 insertions(+), 78 deletions(-)
> 
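> As a concrete illustration of what the parameterized name buys (a
> sketch only, not part of the patch): genemit derives a gen_mve_q_n
> helper from the '@' pattern, taking the unspec codes and the mode as
> arguments, so a hypothetical call site emitting vhadd.s16 of a vector
> and a scalar could look like this:
>
>   /* Illustrative only; the argument order is assumed to follow the
>      two-unspec convention visible in gen_mve_q in patch 21.  */
>   emit_insn (gen_mve_q_n (VHADDQ_N_S, VHADDQ_N_S, V8HImode,
> 			  operands[0], operands[1], operands[2]));
>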
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 18d70350bbe..6dbc40f842c 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -390,6 +390,15 @@ (define_int_iterator MVE_INT_N_BINARY   [
>  		     VSUBQ_N_S VSUBQ_N_U
>  		     ])
> 
> +(define_int_iterator MVE_INT_SU_N_BINARY   [
> +		     VHADDQ_N_S VHADDQ_N_U
> +		     VHSUBQ_N_S VHSUBQ_N_U
> +		     VQADDQ_N_S VQADDQ_N_U
> +		     VQDMULHQ_N_S
> +		     VQRDMULHQ_N_S
> +		     VQSUBQ_N_S VQSUBQ_N_U
> +		     ])
> +
>  (define_int_iterator MVE_INT_N_BINARY_LOGIC   [
>  		     VBICQ_N_S VBICQ_N_U
>  		     VORRQ_N_S VORRQ_N_U
> @@ -442,7 +451,9 @@ (define_int_attr mve_insn [
>  		 (VCREATEQ_S "vcreate") (VCREATEQ_U "vcreate") (VCREATEQ_F "vcreate")
>  		 (VEORQ_M_S "veor") (VEORQ_M_U "veor") (VEORQ_M_F "veor")
>  		 (VHADDQ_M_S "vhadd") (VHADDQ_M_U "vhadd")
> +		 (VHADDQ_N_S "vhadd") (VHADDQ_N_U "vhadd")
>  		 (VHSUBQ_M_S "vhsub") (VHSUBQ_M_U "vhsub")
> +		 (VHSUBQ_N_S "vhsub") (VHSUBQ_N_U "vhsub")
>  		 (VMAXQ_M_S "vmax") (VMAXQ_M_U "vmax")
>  		 (VMINQ_M_S "vmin") (VMINQ_M_U "vmin")
>  		 (VMULHQ_M_S "vmulh") (VMULHQ_M_U "vmulh")
> @@ -453,19 +464,23 @@ (define_int_attr mve_insn [
>  		 (VORRQ_M_S "vorr") (VORRQ_M_U "vorr") (VORRQ_M_F "vorr")
>  		 (VORRQ_N_S "vorr") (VORRQ_N_U "vorr")
>  		 (VQADDQ_M_S "vqadd") (VQADDQ_M_U "vqadd")
> +		 (VQADDQ_N_S "vqadd") (VQADDQ_N_U "vqadd")
>  		 (VQDMLADHQ_M_S "vqdmladh")
>  		 (VQDMLADHXQ_M_S "vqdmladhx")
>  		 (VQDMLSDHQ_M_S "vqdmlsdh")
>  		 (VQDMLSDHXQ_M_S "vqdmlsdhx")
>  		 (VQDMULHQ_M_S "vqdmulh")
> +		 (VQDMULHQ_N_S "vqdmulh")
>  		 (VQRDMLADHQ_M_S "vqrdmladh")
>  		 (VQRDMLADHXQ_M_S "vqrdmladhx")
>  		 (VQRDMLSDHQ_M_S "vqrdmlsdh")
>  		 (VQRDMLSDHXQ_M_S "vqrdmlsdhx")
>  		 (VQRDMULHQ_M_S "vqrdmulh")
> +		 (VQRDMULHQ_N_S "vqrdmulh")
>  		 (VQRSHLQ_M_S "vqrshl") (VQRSHLQ_M_U "vqrshl")
>  		 (VQSHLQ_M_S "vqshl") (VQSHLQ_M_U "vqshl")
>  		 (VQSUBQ_M_S "vqsub") (VQSUBQ_M_U "vqsub")
> +		 (VQSUBQ_N_S "vqsub") (VQSUBQ_N_U "vqsub")
>  		 (VRHADDQ_M_S "vrhadd") (VRHADDQ_M_U "vrhadd")
>  		 (VRMULHQ_M_S "vrmulh") (VRMULHQ_M_U "vrmulh")
>  		 (VRSHLQ_M_S "vrshl") (VRSHLQ_M_U "vrshl")
> @@ -1619,6 +1634,8 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
>  		       (VQRDMLSDHQ_M_S "s")
>  		       (VQRDMLSDHXQ_M_S "s")
>  		       (VQRDMULHQ_M_S "s")
> +		       (VQDMULHQ_N_S "s")
> +		       (VQRDMULHQ_N_S "s")
>  		       ])
> 
>  ;; Both kinds of return insn.
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 21c54197db5..3377e03ee06 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -1015,16 +1015,21 @@ (define_expand "mve_veorq_s<mode>"
> 
>  ;;
>  ;; [vhaddq_n_u, vhaddq_n_s])
> +;; [vhsubq_n_u, vhsubq_n_s])
> +;; [vqaddq_n_s, vqaddq_n_u])
> +;; [vqdmulhq_n_s])
> +;; [vqrdmulhq_n_s])
> +;; [vqsubq_n_s, vqsubq_n_u])

Ok with the trailing ')' removed.
Thanks,
Kyrill

>  ;;
> -(define_insn "mve_vhaddq_n_<supf><mode>"
> +(define_insn "@mve_<mve_insn>q_n_<supf><mode>"
>    [
>     (set (match_operand:MVE_2 0 "s_register_operand" "=w")
>  	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
>  		       (match_operand:<V_elem> 2 "s_register_operand" "r")]
> -	 VHADDQ_N))
> +	 MVE_INT_SU_N_BINARY))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vhadd.<supf>%#<V_sz_elem>\t%q0, %q1, %2"
> +  "<mve_insn>.<supf>%#<V_sz_elem>\t%q0, %q1, %2"
>    [(set_attr "type" "mve_move")
>  ])
> 
> @@ -1073,21 +1078,6 @@ (define_insn "mve_vhcaddq_rot90_s<mode>"
>    [(set_attr "type" "mve_move")
>  ])
> 
> -;;
> -;; [vhsubq_n_u, vhsubq_n_s])
> -;;
> -(define_insn "mve_vhsubq_n_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
> -		       (match_operand:<V_elem> 2 "s_register_operand" "r")]
> -	 VHSUBQ_N))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vhsub.<supf>%#<V_sz_elem>\t%q0, %q1, %2"
> -  [(set_attr "type" "mve_move")
> -])
> -
>  ;;
>  ;; [vhsubq_s, vhsubq_u])
>  ;;
> @@ -1415,21 +1405,6 @@ (define_expand "mve_vorrq_u<mode>"
>    "TARGET_HAVE_MVE"
>  )
> 
> -;;
> -;; [vqaddq_n_s, vqaddq_n_u])
> -;;
> -(define_insn "mve_vqaddq_n_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
> -		       (match_operand:<V_elem> 2 "s_register_operand" "r")]
> -	 VQADDQ_N))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vqadd.<supf>%#<V_sz_elem>\t%q0, %q1, %2"
> -  [(set_attr "type" "mve_move")
> -])
> -
>  ;;
>  ;; [vqaddq_u, vqaddq_s])
>  ;;
> @@ -1445,21 +1420,6 @@ (define_insn "mve_vqaddq_<supf><mode>"
>    [(set_attr "type" "mve_move")
>  ])
> 
> -;;
> -;; [vqdmulhq_n_s])
> -;;
> -(define_insn "mve_vqdmulhq_n_s<mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
> -		       (match_operand:<V_elem> 2 "s_register_operand" "r")]
> -	 VQDMULHQ_N_S))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vqdmulh.s%#<V_sz_elem>\t%q0, %q1, %2"
> -  [(set_attr "type" "mve_move")
> -])
> -
>  ;;
>  ;; [vqdmulhq_s])
>  ;;
> @@ -1475,21 +1435,6 @@ (define_insn "mve_vqdmulhq_s<mode>"
>    [(set_attr "type" "mve_move")
>  ])
> 
> -;;
> -;; [vqrdmulhq_n_s])
> -;;
> -(define_insn "mve_vqrdmulhq_n_s<mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
> -		       (match_operand:<V_elem> 2 "s_register_operand" "r")]
> -	 VQRDMULHQ_N_S))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vqrdmulh.s%#<V_sz_elem>\t%q0, %q1, %2"
> -  [(set_attr "type" "mve_move")
> -])
> -
>  ;;
>  ;; [vqrdmulhq_s])
>  ;;
> @@ -1595,21 +1540,6 @@ (define_insn "mve_vqshluq_n_s<mode>"
>    [(set_attr "type" "mve_move")
>  ])
> 
> -;;
> -;; [vqsubq_n_s, vqsubq_n_u])
> -;;
> -(define_insn "mve_vqsubq_n_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
> -		       (match_operand:<V_elem> 2 "s_register_operand" "r")]
> -	 VQSUBQ_N))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vqsub.<supf>%#<V_sz_elem>\t%q0, %q1, %2"
> -  [(set_attr "type" "mve_move")
> -])
> -
>  ;;
>  ;; [vqsubq_u, vqsubq_s])
>  ;;
> --
> 2.34.1



* RE: [PATCH 20/22] arm: [MVE intrinsics] factorize several binary _m_n operations
  2023-04-18 13:46 ` [PATCH 20/22] arm: [MVE intrinsics] factorize several binary _m_n operations Christophe Lyon
@ 2023-05-03  8:48   ` Kyrylo Tkachov
  0 siblings, 0 replies; 55+ messages in thread
From: Kyrylo Tkachov @ 2023-05-03  8:48 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Earnshaw, Richard Sandiford
  Cc: Christophe Lyon



> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@arm.com>
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
> Subject: [PATCH 20/22] arm: [MVE intrinsics] factorize several binary _m_n
> operations
> 
> Factorize vhaddq_m_n, vhsubq_m_n, vmlaq_m_n, vmlasq_m_n, vqaddq_m_n,
> vqdmlahq_m_n, vqdmlashq_m_n, vqdmulhq_m_n, vqrdmlahq_m_n,
> vqrdmlashq_m_n, vqrdmulhq_m_n, vqsubq_m_n
> so that they use the same pattern.
> 
> 2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>
> 
> 	gcc/
> 	* config/arm/iterators.md (MVE_INT_SU_M_N_BINARY): New.
> 	(mve_insn): Add vhaddq, vhsubq, vmlaq, vmlasq, vqaddq, vqdmlahq,
> 	vqdmlashq, vqdmulhq, vqrdmlahq, vqrdmlashq, vqrdmulhq, vqsubq.
> 	(supf): Add VQDMLAHQ_M_N_S, VQDMLASHQ_M_N_S, VQRDMLAHQ_M_N_S,
> 	VQRDMLASHQ_M_N_S, VQDMULHQ_M_N_S, VQRDMULHQ_M_N_S.
> 	* config/arm/mve.md (mve_vhaddq_m_n_<supf><mode>)
> 	(mve_vhsubq_m_n_<supf><mode>, mve_vmlaq_m_n_<supf><mode>)
> 	(mve_vmlasq_m_n_<supf><mode>, mve_vqaddq_m_n_<supf><mode>)
> 	(mve_vqdmlahq_m_n_s<mode>, mve_vqdmlashq_m_n_s<mode>)
> 	(mve_vqrdmlahq_m_n_s<mode>, mve_vqrdmlashq_m_n_s<mode>)
> 	(mve_vqsubq_m_n_<supf><mode>, mve_vqdmulhq_m_n_s<mode>)
> 	(mve_vqrdmulhq_m_n_s<mode>): Merge into ...
> 	(@mve_<mve_insn>q_m_n_<supf><mode>): ... this.
> ---
>  gcc/config/arm/iterators.md |  33 ++++++
>  gcc/config/arm/mve.md       | 202 +++---------------------------------
>  2 files changed, 46 insertions(+), 189 deletions(-)
> 
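> For reference, the user-visible semantics of the merged _m_n forms are
> unchanged; each of the twelve intrinsics above keeps its usual ACLE
> shape.  A small usage sketch with the standard intrinsic:
>
>   #include <arm_mve.h>
>
>   int16x8_t
>   f (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
>   {
>     /* Predicated halving add of a vector and a scalar: lanes whose
>        predicate bit is clear take their value from 'inactive',
>        matching the vpst;vhaddt.s16 sequence the pattern emits.  */
>     return vhaddq_m_n_s16 (inactive, a, b, p);
>   }
>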
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 6dbc40f842c..60452cdefe3 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -384,6 +384,21 @@ (define_int_iterator MVE_INT_M_N_BINARY_LOGIC [
>  		     VORRQ_M_N_S VORRQ_M_N_U
>  		     ])
> 
> +(define_int_iterator MVE_INT_SU_M_N_BINARY   [
> +		     VHADDQ_M_N_S VHADDQ_M_N_U
> +		     VHSUBQ_M_N_S VHSUBQ_M_N_U
> +		     VMLAQ_M_N_S VMLAQ_M_N_U
> +		     VMLASQ_M_N_S VMLASQ_M_N_U
> +		     VQDMLAHQ_M_N_S
> +		     VQDMLASHQ_M_N_S
> +		     VQRDMLAHQ_M_N_S
> +		     VQRDMLASHQ_M_N_S
> +		     VQADDQ_M_N_S VQADDQ_M_N_U
> +		     VQSUBQ_M_N_S VQSUBQ_M_N_U
> +		     VQDMULHQ_M_N_S
> +		     VQRDMULHQ_M_N_S
> +		     ])
> +
>  (define_int_iterator MVE_INT_N_BINARY   [
>  		     VADDQ_N_S VADDQ_N_U
>  		     VMULQ_N_S VMULQ_N_U
> @@ -450,12 +465,16 @@ (define_int_attr mve_insn [
>  		 (VBICQ_N_S "vbic") (VBICQ_N_U "vbic")
>  		 (VCREATEQ_S "vcreate") (VCREATEQ_U "vcreate") (VCREATEQ_F "vcreate")
>  		 (VEORQ_M_S "veor") (VEORQ_M_U "veor") (VEORQ_M_F "veor")
> +		 (VHADDQ_M_N_S "vhadd") (VHADDQ_M_N_U "vhadd")
>  		 (VHADDQ_M_S "vhadd") (VHADDQ_M_U "vhadd")
>  		 (VHADDQ_N_S "vhadd") (VHADDQ_N_U "vhadd")
> +		 (VHSUBQ_M_N_S "vhsub") (VHSUBQ_M_N_U "vhsub")
>  		 (VHSUBQ_M_S "vhsub") (VHSUBQ_M_U "vhsub")
>  		 (VHSUBQ_N_S "vhsub") (VHSUBQ_N_U "vhsub")
>  		 (VMAXQ_M_S "vmax") (VMAXQ_M_U "vmax")
>  		 (VMINQ_M_S "vmin") (VMINQ_M_U "vmin")
> +		 (VMLAQ_M_N_S "vmla") (VMLAQ_M_N_U "vmla")
> +		 (VMLASQ_M_N_S "vmlas") (VMLASQ_M_N_U "vmlas")
>  		 (VMULHQ_M_S "vmulh") (VMULHQ_M_U "vmulh")
>  		 (VMULQ_M_N_S "vmul") (VMULQ_M_N_U "vmul") (VMULQ_M_N_F "vmul")
>  		 (VMULQ_M_S "vmul") (VMULQ_M_U "vmul") (VMULQ_M_F "vmul")
> @@ -463,22 +482,30 @@ (define_int_attr mve_insn [
>  		 (VORRQ_M_N_S "vorr") (VORRQ_M_N_U "vorr")
>  		 (VORRQ_M_S "vorr") (VORRQ_M_U "vorr") (VORRQ_M_F "vorr")
>  		 (VORRQ_N_S "vorr") (VORRQ_N_U "vorr")
> +		 (VQADDQ_M_N_S "vqadd") (VQADDQ_M_N_U "vqadd")
>  		 (VQADDQ_M_S "vqadd") (VQADDQ_M_U "vqadd")
>  		 (VQADDQ_N_S "vqadd") (VQADDQ_N_U "vqadd")
>  		 (VQDMLADHQ_M_S "vqdmladh")
>  		 (VQDMLADHXQ_M_S "vqdmladhx")
> +		 (VQDMLAHQ_M_N_S "vqdmlah")
> +		 (VQDMLASHQ_M_N_S "vqdmlash")
>  		 (VQDMLSDHQ_M_S "vqdmlsdh")
>  		 (VQDMLSDHXQ_M_S "vqdmlsdhx")
> +		 (VQDMULHQ_M_N_S "vqdmulh")
>  		 (VQDMULHQ_M_S "vqdmulh")
>  		 (VQDMULHQ_N_S "vqdmulh")
>  		 (VQRDMLADHQ_M_S "vqrdmladh")
>  		 (VQRDMLADHXQ_M_S "vqrdmladhx")
> +		 (VQRDMLAHQ_M_N_S "vqrdmlah")
> +		 (VQRDMLASHQ_M_N_S "vqrdmlash")
>  		 (VQRDMLSDHQ_M_S "vqrdmlsdh")
>  		 (VQRDMLSDHXQ_M_S "vqrdmlsdhx")
> +		 (VQRDMULHQ_M_N_S "vqrdmulh")
>  		 (VQRDMULHQ_M_S "vqrdmulh")
>  		 (VQRDMULHQ_N_S "vqrdmulh")
>  		 (VQRSHLQ_M_S "vqrshl") (VQRSHLQ_M_U "vqrshl")
>  		 (VQSHLQ_M_S "vqshl") (VQSHLQ_M_U "vqshl")
> +		 (VQSUBQ_M_N_S "vqsub") (VQSUBQ_M_N_U "vqsub")
>  		 (VQSUBQ_M_S "vqsub") (VQSUBQ_M_U "vqsub")
>  		 (VQSUBQ_N_S "vqsub") (VQSUBQ_N_U "vqsub")
>  		 (VRHADDQ_M_S "vrhadd") (VRHADDQ_M_U "vrhadd")
> @@ -1636,6 +1663,12 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
>  		       (VQRDMULHQ_M_S "s")
>  		       (VQDMULHQ_N_S "s")
>  		       (VQRDMULHQ_N_S "s")
> +		       (VQDMLAHQ_M_N_S "s")
> +		       (VQDMLASHQ_M_N_S "s")
> +		       (VQRDMLAHQ_M_N_S "s")
> +		       (VQRDMLASHQ_M_N_S "s")
> +		       (VQDMULHQ_M_N_S "s")
> +		       (VQRDMULHQ_M_N_S "s")
>  		       ])
> 
>  ;; Both kinds of return insn.
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 3377e03ee06..d14a04d5f82 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -4982,35 +4982,29 @@ (define_insn "mve_vcaddq_rot90_m_<supf><mode>"
> 
>  ;;
>  ;; [vhaddq_m_n_s, vhaddq_m_n_u])
> -;;
> -(define_insn "mve_vhaddq_m_n_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:<V_elem> 3 "s_register_operand" "r")
> -		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VHADDQ_M_N))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vhaddt.<supf>%#<V_sz_elem>	%q0, %q2, %3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
> -;;
>  ;; [vhsubq_m_n_s, vhsubq_m_n_u])
> +;; [vmlaq_m_n_s, vmlaq_m_n_u])
> +;; [vmlasq_m_n_u, vmlasq_m_n_s])
> +;; [vqaddq_m_n_u, vqaddq_m_n_s])
> +;; [vqdmlahq_m_n_s])
> +;; [vqdmlashq_m_n_s])
> +;; [vqdmulhq_m_n_s])
> +;; [vqrdmlahq_m_n_s])
> +;; [vqrdmlashq_m_n_s])
> +;; [vqrdmulhq_m_n_s])
> +;; [vqsubq_m_n_u, vqsubq_m_n_s])
>  ;;

Ok with the trailing ')' removed.
Thanks,
Kyrill

> -(define_insn "mve_vhsubq_m_n_<supf><mode>"
> +(define_insn "@mve_<mve_insn>q_m_n_<supf><mode>"
>    [
>     (set (match_operand:MVE_2 0 "s_register_operand" "=w")
>  	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
>  		       (match_operand:MVE_2 2 "s_register_operand" "w")
>  		       (match_operand:<V_elem> 3 "s_register_operand" "r")
>  		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VHSUBQ_M_N))
> +	 MVE_INT_SU_M_N_BINARY))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vpst\;vhsubt.<supf>%#<V_sz_elem>	%q0, %q2, %3"
> +  "vpst\;<mve_insn>t.<supf>%#<V_sz_elem>\t%q0, %q2, %3"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> @@ -5032,40 +5026,6 @@ (define_insn "mve_vmladavaq_p_<supf><mode>"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> -;;
> -;; [vmlaq_m_n_s, vmlaq_m_n_u])
> -;;
> -(define_insn "mve_vmlaq_m_n_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:<V_elem> 3 "s_register_operand" "r")
> -		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VMLAQ_M_N))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vmlat.<supf>%#<V_sz_elem>	%q0, %q2, %3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
> -;;
> -;; [vmlasq_m_n_u, vmlasq_m_n_s])
> -;;
> -(define_insn "mve_vmlasq_m_n_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:<V_elem> 3 "s_register_operand" "r")
> -		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VMLASQ_M_N))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vmlast.<supf>%#<V_sz_elem>	%q0, %q2, %3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
>  ;;
>  ;; [vmullbq_int_m_u, vmullbq_int_m_s])
>  ;;
> @@ -5117,91 +5077,6 @@ (define_insn "mve_vornq_m_<supf><mode>"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> -;;
> -;; [vqaddq_m_n_u, vqaddq_m_n_s])
> -;;
> -(define_insn "mve_vqaddq_m_n_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:<V_elem> 3 "s_register_operand" "r")
> -		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VQADDQ_M_N))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vqaddt.<supf>%#<V_sz_elem>\t%q0, %q2, %3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
> -;;
> -;; [vqdmlahq_m_n_s])
> -;;
> -(define_insn "mve_vqdmlahq_m_n_s<mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:<V_elem> 3 "s_register_operand" "r")
> -		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VQDMLAHQ_M_N_S))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vqdmlaht.s%#<V_sz_elem>\t%q0, %q2, %3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
> -;;
> -;; [vqdmlashq_m_n_s])
> -;;
> -(define_insn "mve_vqdmlashq_m_n_s<mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:<V_elem> 3 "s_register_operand" "r")
> -		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VQDMLASHQ_M_N_S))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vqdmlasht.s%#<V_sz_elem>\t%q0, %q2, %3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
> -;;
> -;; [vqrdmlahq_m_n_s])
> -;;
> -(define_insn "mve_vqrdmlahq_m_n_s<mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:<V_elem> 3 "s_register_operand" "r")
> -		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VQRDMLAHQ_M_N_S))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vqrdmlaht.s%#<V_sz_elem>\t%q0, %q2, %3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
> -;;
> -;; [vqrdmlashq_m_n_s])
> -;;
> -(define_insn "mve_vqrdmlashq_m_n_s<mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:<V_elem> 3 "s_register_operand" "r")
> -		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VQRDMLASHQ_M_N_S))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vqrdmlasht.s%#<V_sz_elem>\t%q0, %q2, %3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
>  ;;
>  ;; [vqshlq_m_n_s, vqshlq_m_n_u])
>  ;;
> @@ -5219,23 +5094,6 @@ (define_insn "mve_vqshlq_m_n_<supf><mode>"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> -;;
> -;; [vqsubq_m_n_u, vqsubq_m_n_s])
> -;;
> -(define_insn "mve_vqsubq_m_n_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:<V_elem> 3 "s_register_operand" "r")
> -		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> -	 VQSUBQ_M_N))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vqsubt.<supf>%#<V_sz_elem>\t%q0, %q2, %3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
>  ;;
>  ;; [vrshrq_m_n_s, vrshrq_m_n_u])
>  ;;
> @@ -5389,40 +5247,6 @@ (define_insn "mve_vmlsdavaxq_p_s<mode>"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> -;;
> -;; [vqdmulhq_m_n_s])
> -;;
> -(define_insn "mve_vqdmulhq_m_n_s<mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> "vpr_register_operand" "Up")]
> -	 VQDMULHQ_M_N_S))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vqdmulht.s%#<V_sz_elem>\t%q0, %q2, %3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
> -;;
> -;; [vqrdmulhq_m_n_s])
> -;;
> -(define_insn "mve_vqrdmulhq_m_n_s<mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")
> -		       (match_operand:<MVE_VPRED> 4 "vpr_register_operand" "Up")]
> "vpr_register_operand" "Up")]
> -	 VQRDMULHQ_M_N_S))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vpst\;vqrdmulht.s%#<V_sz_elem>\t%q0, %q2, %3"
> -  [(set_attr "type" "mve_move")
> -   (set_attr "length""8")])
> -
>  ;;
>  ;; [vmlaldavaq_p_u, vmlaldavaq_p_s])
>  ;;
> --
> 2.34.1



* RE: [PATCH 21/22] arm: [MVE intrinsics] factorize several binary operations
  2023-04-18 13:46 ` [PATCH 21/22] arm: [MVE intrinsics] factorize several binary operations Christophe Lyon
@ 2023-05-03  8:49   ` Kyrylo Tkachov
From: Kyrylo Tkachov @ 2023-05-03  8:49 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Earnshaw, Richard Sandiford
  Cc: Christophe Lyon



> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@arm.com>
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
> Subject: [PATCH 21/22] arm: [MVE intrinsics] factorize several binary
> operations
> 
> Factorize vabdq, vhaddq, vhsubq, vmulhq, vqaddq_u, vqdmulhq,
> vqrdmulhq, vqrshlq, vqshlq, vqsubq_u, vrhaddq, vrmulhq, vrshlq
> so that they use the same pattern.
> 

Ok, as before without the trailing ')'.
Thanks,
Kyrill

> 2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>
> 
> 	gcc/
> 	* config/arm/iterators.md (MVE_INT_SU_BINARY): New.
> 	(mve_insn): Add vabdq, vhaddq, vhsubq, vmulhq, vqaddq, vqdmulhq,
> 	vqrdmulhq, vqrshlq, vqshlq, vqsubq, vrhaddq, vrmulhq, vrshlq.
> 	(supf): Add VQDMULHQ_S, VQRDMULHQ_S.
> 	* config/arm/mve.md (mve_vabdq_<supf><mode>)
> 	(@mve_vhaddq_<supf><mode>, mve_vhsubq_<supf><mode>)
> 	(mve_vmulhq_<supf><mode>, mve_vqaddq_<supf><mode>)
> 	(mve_vqdmulhq_s<mode>, mve_vqrdmulhq_s<mode>)
> 	(mve_vqrshlq_<supf><mode>, mve_vqshlq_<supf><mode>)
> 	(mve_vqsubq_<supf><mode>, @mve_vrhaddq_<supf><mode>)
> 	(mve_vrmulhq_<supf><mode>, mve_vrshlq_<supf><mode>): Merge
> 	(mve_vrmulhq_<supf><mode>, mve_vrshlq_<supf><mode>): Merge into
> 	...
> 	* config/arm/vec-common.md (avg<mode>3_floor,
> 	* config/arm/vec-common.md (avg<mode>3_floor, uavg<mode>3_floor)
> 	gen_mve_vhaddq / gen_mve_vrhaddq.
> ---
>  gcc/config/arm/iterators.md  |  31 ++++++
>  gcc/config/arm/mve.md        | 198 +++--------------------------------
>  gcc/config/arm/vec-common.md |   8 +-
>  3 files changed, 50 insertions(+), 187 deletions(-)
> 
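> Beyond the intrinsics, the vec-common.md hunks keep the standard
> average expanders working through the merged pattern.  A sketch of a
> loop the vectorizer can map onto it (illustrative only; assuming
> something like -O2 -march=armv8.1-m.main+mve):
>
>   #include <stdint.h>
>
>   void
>   avg_u8 (uint8_t *r, const uint8_t *a, const uint8_t *b, int n)
>   {
>     for (int i = 0; i < n; i++)
>       /* Rounding average: matches uavg<mode>3_ceil, i.e. vrhadd.u8.  */
>       r[i] = (a[i] + b[i] + 1) >> 1;
>   }
>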
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 60452cdefe3..068ae25e578 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -414,6 +414,22 @@ (define_int_iterator MVE_INT_SU_N_BINARY   [
>  		     VQSUBQ_N_S VQSUBQ_N_U
>  		     ])
> 
> +(define_int_iterator MVE_INT_SU_BINARY   [
> +		     VABDQ_S VABDQ_U
> +		     VHADDQ_S VHADDQ_U
> +		     VHSUBQ_S VHSUBQ_U
> +		     VMULHQ_S VMULHQ_U
> +		     VQADDQ_S VQADDQ_U
> +		     VQDMULHQ_S
> +		     VQRDMULHQ_S
> +		     VQRSHLQ_S VQRSHLQ_U
> +		     VQSHLQ_S VQSHLQ_U
> +		     VQSUBQ_S VQSUBQ_U
> +		     VRHADDQ_S VRHADDQ_U
> +		     VRMULHQ_S VRMULHQ_U
> +		     VRSHLQ_S VRSHLQ_U
> +		     ])
> +
>  (define_int_iterator MVE_INT_N_BINARY_LOGIC   [
>  		     VBICQ_N_S VBICQ_N_U
>  		     VORRQ_N_S VORRQ_N_U
> @@ -456,6 +472,7 @@ (define_code_attr mve_addsubmul [
> 
>  (define_int_attr mve_insn [
>  		 (VABDQ_M_S "vabd") (VABDQ_M_U "vabd")
> +		 (VABDQ_S "vabd") (VABDQ_U "vabd")
>  		 (VADDQ_M_N_S "vadd") (VADDQ_M_N_U "vadd") (VADDQ_M_N_F "vadd")
>  		 (VADDQ_M_S "vadd") (VADDQ_M_U "vadd") (VADDQ_M_F "vadd")
>  		 (VADDQ_N_S "vadd") (VADDQ_N_U "vadd") (VADDQ_N_F "vadd")
> @@ -468,14 +485,17 @@ (define_int_attr mve_insn [
>  		 (VHADDQ_M_N_S "vhadd") (VHADDQ_M_N_U "vhadd")
>  		 (VHADDQ_M_S "vhadd") (VHADDQ_M_U "vhadd")
>  		 (VHADDQ_N_S "vhadd") (VHADDQ_N_U "vhadd")
> +		 (VHADDQ_S "vhadd") (VHADDQ_U "vhadd")
>  		 (VHSUBQ_M_N_S "vhsub") (VHSUBQ_M_N_U "vhsub")
>  		 (VHSUBQ_M_S "vhsub") (VHSUBQ_M_U "vhsub")
>  		 (VHSUBQ_N_S "vhsub") (VHSUBQ_N_U "vhsub")
> +		 (VHSUBQ_S "vhsub") (VHSUBQ_U "vhsub")
>  		 (VMAXQ_M_S "vmax") (VMAXQ_M_U "vmax")
>  		 (VMINQ_M_S "vmin") (VMINQ_M_U "vmin")
>  		 (VMLAQ_M_N_S "vmla") (VMLAQ_M_N_U "vmla")
>  		 (VMLASQ_M_N_S "vmlas") (VMLASQ_M_N_U "vmlas")
>  		 (VMULHQ_M_S "vmulh") (VMULHQ_M_U "vmulh")
> +		 (VMULHQ_S "vmulh") (VMULHQ_U "vmulh")
>  		 (VMULQ_M_N_S "vmul") (VMULQ_M_N_U "vmul") (VMULQ_M_N_F "vmul")
>  		 (VMULQ_M_S "vmul") (VMULQ_M_U "vmul") (VMULQ_M_F "vmul")
>  		 (VMULQ_N_S "vmul") (VMULQ_N_U "vmul") (VMULQ_N_F "vmul")
> @@ -485,6 +505,7 @@ (define_int_attr mve_insn [
>  		 (VQADDQ_M_N_S "vqadd") (VQADDQ_M_N_U "vqadd")
>  		 (VQADDQ_M_S "vqadd") (VQADDQ_M_U "vqadd")
>  		 (VQADDQ_N_S "vqadd") (VQADDQ_N_U "vqadd")
> +		 (VQADDQ_S "vqadd") (VQADDQ_U "vqadd")
>  		 (VQDMLADHQ_M_S "vqdmladh")
>  		 (VQDMLADHXQ_M_S "vqdmladhx")
>  		 (VQDMLAHQ_M_N_S "vqdmlah")
> @@ -494,6 +515,7 @@ (define_int_attr mve_insn [
>  		 (VQDMULHQ_M_N_S "vqdmulh")
>  		 (VQDMULHQ_M_S "vqdmulh")
>  		 (VQDMULHQ_N_S "vqdmulh")
> +		 (VQDMULHQ_S "vqdmulh")
>  		 (VQRDMLADHQ_M_S "vqrdmladh")
>  		 (VQRDMLADHXQ_M_S "vqrdmladhx")
>  		 (VQRDMLAHQ_M_N_S "vqrdmlah")
> @@ -503,14 +525,21 @@ (define_int_attr mve_insn [
>  		 (VQRDMULHQ_M_N_S "vqrdmulh")
>  		 (VQRDMULHQ_M_S "vqrdmulh")
>  		 (VQRDMULHQ_N_S "vqrdmulh")
> +		 (VQRDMULHQ_S "vqrdmulh")
>  		 (VQRSHLQ_M_S "vqrshl") (VQRSHLQ_M_U "vqrshl")
> +		 (VQRSHLQ_S "vqrshl") (VQRSHLQ_U "vqrshl")
>  		 (VQSHLQ_M_S "vqshl") (VQSHLQ_M_U "vqshl")
> +		 (VQSHLQ_S "vqshl") (VQSHLQ_U "vqshl")
>  		 (VQSUBQ_M_N_S "vqsub") (VQSUBQ_M_N_U "vqsub")
>  		 (VQSUBQ_M_S "vqsub") (VQSUBQ_M_U "vqsub")
>  		 (VQSUBQ_N_S "vqsub") (VQSUBQ_N_U "vqsub")
> +		 (VQSUBQ_S "vqsub") (VQSUBQ_U "vqsub")
>  		 (VRHADDQ_M_S "vrhadd") (VRHADDQ_M_U "vrhadd")
> +		 (VRHADDQ_S "vrhadd") (VRHADDQ_U "vrhadd")
>  		 (VRMULHQ_M_S "vrmulh") (VRMULHQ_M_U "vrmulh")
> +		 (VRMULHQ_S "vrmulh") (VRMULHQ_U "vrmulh")
>  		 (VRSHLQ_M_S "vrshl") (VRSHLQ_M_U "vrshl")
> +		 (VRSHLQ_S "vrshl") (VRSHLQ_U "vrshl")
>  		 (VSHLQ_M_S "vshl") (VSHLQ_M_U "vshl")
>  		 (VSUBQ_M_N_S "vsub") (VSUBQ_M_N_U "vsub") (VSUBQ_M_N_F "vsub")
>  		 (VSUBQ_M_S "vsub") (VSUBQ_M_U "vsub") (VSUBQ_M_F "vsub")
> @@ -1669,6 +1698,8 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
>  		       (VQRDMLASHQ_M_N_S "s")
>  		       (VQDMULHQ_M_N_S "s")
>  		       (VQRDMULHQ_M_N_S "s")
> +		       (VQDMULHQ_S "s")
> +		       (VQRDMULHQ_S "s")
>  		       ])
> 
>  ;; Both kinds of return insn.
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index d14a04d5f82..b9126af2aa9 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -841,16 +841,28 @@ (define_insn "mve_vcmp<mve_cmp_op>q_n_<mode>"
> 
>  ;;
>  ;; [vabdq_s, vabdq_u])
> +;; [vhaddq_s, vhaddq_u])
> +;; [vhsubq_s, vhsubq_u])
> +;; [vmulhq_s, vmulhq_u])
> +;; [vqaddq_u, vqaddq_s])
> +;; [vqdmulhq_s])
> +;; [vqrdmulhq_s])
> +;; [vqrshlq_s, vqrshlq_u])
> +;; [vqshlq_s, vqshlq_u])
> +;; [vqsubq_u, vqsubq_s])
> +;; [vrhaddq_s, vrhaddq_u])
> +;; [vrmulhq_s, vrmulhq_u])
> +;; [vrshlq_s, vrshlq_u])
>  ;;
> -(define_insn "mve_vabdq_<supf><mode>"
> +(define_insn "@mve_<mve_insn>q_<supf><mode>"
>    [
>     (set (match_operand:MVE_2 0 "s_register_operand" "=w")
>  	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
>  		       (match_operand:MVE_2 2 "s_register_operand" "w")]
> -	 VABDQ))
> +	 MVE_INT_SU_BINARY))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vabd.<supf>%#<V_sz_elem>	%q0, %q1, %q2"
> +  "<mve_insn>.<supf>%#<V_sz_elem>\t%q0, %q1, %q2"
>    [(set_attr "type" "mve_move")
>  ])
> 
> @@ -1033,21 +1045,6 @@ (define_insn "@mve_<mve_insn>q_n_<supf><mode>"
>    [(set_attr "type" "mve_move")
>  ])
> 
> -;;
> -;; [vhaddq_s, vhaddq_u])
> -;;
> -(define_insn "@mve_vhaddq_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")]
> -	 VHADDQ))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vhadd.<supf>%#<V_sz_elem>\t%q0, %q1, %q2"
> -  [(set_attr "type" "mve_move")
> -])
> -
>  ;;
>  ;; [vhcaddq_rot270_s])
>  ;;
> @@ -1078,21 +1075,6 @@ (define_insn "mve_vhcaddq_rot90_s<mode>"
>    [(set_attr "type" "mve_move")
>  ])
> 
> -;;
> -;; [vhsubq_s, vhsubq_u])
> -;;
> -(define_insn "mve_vhsubq_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")]
> -	 VHSUBQ))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vhsub.<supf>%#<V_sz_elem>\t%q0, %q1, %q2"
> -  [(set_attr "type" "mve_move")
> -])
> -
>  ;;
>  ;; [vmaxaq_s])
>  ;;
> @@ -1293,21 +1275,6 @@ (define_insn "mve_vmlsdavxq_s<mode>"
>    [(set_attr "type" "mve_move")
>  ])
> 
> -;;
> -;; [vmulhq_s, vmulhq_u])
> -;;
> -(define_insn "mve_vmulhq_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")]
> -	 VMULHQ))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vmulh.<supf>%#<V_sz_elem>\t%q0, %q1, %q2"
> -  [(set_attr "type" "mve_move")
> -])
> -
>  ;;
>  ;; [vmullbq_int_u, vmullbq_int_s])
>  ;;
> @@ -1405,51 +1372,6 @@ (define_expand "mve_vorrq_u<mode>"
>    "TARGET_HAVE_MVE"
>  )
> 
> -;;
> -;; [vqaddq_u, vqaddq_s])
> -;;
> -(define_insn "mve_vqaddq_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")]
> -	 VQADDQ))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vqadd.<supf>%#<V_sz_elem>\t%q0, %q1, %q2"
> -  [(set_attr "type" "mve_move")
> -])
> -
> -;;
> -;; [vqdmulhq_s])
> -;;
> -(define_insn "mve_vqdmulhq_s<mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")]
> -	 VQDMULHQ_S))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vqdmulh.s%#<V_sz_elem>\t%q0, %q1, %q2"
> -  [(set_attr "type" "mve_move")
> -])
> -
> -;;
> -;; [vqrdmulhq_s])
> -;;
> -(define_insn "mve_vqrdmulhq_s<mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")]
> -	 VQRDMULHQ_S))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vqrdmulh.s%#<V_sz_elem>\t%q0, %q1, %q2"
> -  [(set_attr "type" "mve_move")
> -])
> -
>  ;;
>  ;; [vqrshlq_n_s, vqrshlq_n_u])
>  ;;
> @@ -1465,21 +1387,6 @@ (define_insn "mve_vqrshlq_n_<supf><mode>"
>    [(set_attr "type" "mve_move")
>  ])
> 
> -;;
> -;; [vqrshlq_s, vqrshlq_u])
> -;;
> -(define_insn "mve_vqrshlq_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")]
> -	 VQRSHLQ))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vqrshl.<supf>%#<V_sz_elem>\t%q0, %q1, %q2"
> -  [(set_attr "type" "mve_move")
> -])
> -
>  ;;
>  ;; [vqshlq_n_s, vqshlq_n_u])
>  ;;
> @@ -1510,21 +1417,6 @@ (define_insn "mve_vqshlq_r_<supf><mode>"
>    [(set_attr "type" "mve_move")
>  ])
> 
> -;;
> -;; [vqshlq_s, vqshlq_u])
> -;;
> -(define_insn "mve_vqshlq_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")]
> -	 VQSHLQ))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vqshl.<supf>%#<V_sz_elem>\t%q0, %q1, %q2"
> -  [(set_attr "type" "mve_move")
> -])
> -
>  ;;
>  ;; [vqshluq_n_s])
>  ;;
> @@ -1540,51 +1432,6 @@ (define_insn "mve_vqshluq_n_s<mode>"
>    [(set_attr "type" "mve_move")
>  ])
> 
> -;;
> -;; [vqsubq_u, vqsubq_s])
> -;;
> -(define_insn "mve_vqsubq_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")]
> -	 VQSUBQ))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vqsub.<supf>%#<V_sz_elem>\t%q0, %q1, %q2"
> -  [(set_attr "type" "mve_move")
> -])
> -
> -;;
> -;; [vrhaddq_s, vrhaddq_u])
> -;;
> -(define_insn "@mve_vrhaddq_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")]
> -	 VRHADDQ))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vrhadd.<supf>%#<V_sz_elem>\t%q0, %q1, %q2"
> -  [(set_attr "type" "mve_move")
> -])
> -
> -;;
> -;; [vrmulhq_s, vrmulhq_u])
> -;;
> -(define_insn "mve_vrmulhq_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")]
> -	 VRMULHQ))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vrmulh.<supf>%#<V_sz_elem>\t%q0, %q1, %q2"
> -  [(set_attr "type" "mve_move")
> -])
> -
>  ;;
>  ;; [vrshlq_n_u, vrshlq_n_s])
>  ;;
> @@ -1600,21 +1447,6 @@ (define_insn "mve_vrshlq_n_<supf><mode>"
>    [(set_attr "type" "mve_move")
>  ])
> 
> -;;
> -;; [vrshlq_s, vrshlq_u])
> -;;
> -(define_insn "mve_vrshlq_<supf><mode>"
> -  [
> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")]
> -	 VRSHLQ))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vrshl.<supf>%#<V_sz_elem>\t%q0, %q1, %q2"
> -  [(set_attr "type" "mve_move")
> -])
> -
>  ;;
>  ;; [vrshrq_n_s, vrshrq_n_u])
>  ;;
> diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
> index f06df4db636..918338ca5c0 100644
> --- a/gcc/config/arm/vec-common.md
> +++ b/gcc/config/arm/vec-common.md
> @@ -573,7 +573,7 @@ (define_expand "avg<mode>3_floor"
>    "ARM_HAVE_<MODE>_ARITH"
>  {
>    if (TARGET_HAVE_MVE)
> -    emit_insn (gen_mve_vhaddq (VHADDQ_S, <MODE>mode,
> +    emit_insn (gen_mve_q (VHADDQ_S, VHADDQ_S, <MODE>mode,
>  			       operands[0], operands[1], operands[2]));
>    else
>      emit_insn (gen_neon_vhadd (UNSPEC_VHADD_S, UNSPEC_VHADD_S, <MODE>mode,
> @@ -588,7 +588,7 @@ (define_expand "uavg<mode>3_floor"
>    "ARM_HAVE_<MODE>_ARITH"
>  {
>    if (TARGET_HAVE_MVE)
> -    emit_insn (gen_mve_vhaddq (VHADDQ_U, <MODE>mode,
> +    emit_insn (gen_mve_q (VHADDQ_U, VHADDQ_U, <MODE>mode,
>  			       operands[0], operands[1], operands[2]));
>    else
>      emit_insn (gen_neon_vhadd (UNSPEC_VHADD_U, UNSPEC_VHADD_U, <MODE>mode,
> @@ -603,7 +603,7 @@ (define_expand "avg<mode>3_ceil"
>    "ARM_HAVE_<MODE>_ARITH"
>  {
>    if (TARGET_HAVE_MVE)
> -    emit_insn (gen_mve_vrhaddq (VRHADDQ_S, <MODE>mode,
> +    emit_insn (gen_mve_q (VRHADDQ_S, VRHADDQ_S, <MODE>mode,
>  				operands[0], operands[1], operands[2]));
>    else
>      emit_insn (gen_neon_vhadd (UNSPEC_VRHADD_S, UNSPEC_VRHADD_S, <MODE>mode,
> @@ -618,7 +618,7 @@ (define_expand "uavg<mode>3_ceil"
>    "ARM_HAVE_<MODE>_ARITH"
>  {
>    if (TARGET_HAVE_MVE)
> -    emit_insn (gen_mve_vrhaddq (VRHADDQ_U, <MODE>mode,
> +    emit_insn (gen_mve_q (VRHADDQ_U, VRHADDQ_U, <MODE>mode,
>  				operands[0], operands[1], operands[2]));
>    else
>      emit_insn (gen_neon_vhadd (UNSPEC_VRHADD_U, UNSPEC_VRHADD_U, <MODE>mode,
> --
> 2.34.1



* RE: [PATCH 22/22] arm: [MVE intrinsics] rework vhaddq vhsubq vmulhq vqaddq vqsubq vqdmulhq vrhaddq vrmulhq
  2023-04-18 13:46 ` [PATCH 22/22] arm: [MVE intrinsics] rework vhaddq vhsubq vmulhq vqaddq vqsubq vqdmulhq vrhaddq vrmulhq Christophe Lyon
@ 2023-05-03  8:51   ` Kyrylo Tkachov
From: Kyrylo Tkachov @ 2023-05-03  8:51 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Earnshaw, Richard Sandiford
  Cc: Christophe Lyon



> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@arm.com>
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
> Subject: [PATCH 22/22] arm: [MVE intrinsics] rework vhaddq vhsubq vmulhq
> vqaddq vqsubq vqdmulhq vrhaddq vrmulhq
> 
> Implement vhaddq, vhsubq, vmulhq, vqaddq, vqsubq, vqdmulhq, vrhaddq,
> vrmulhq using the new MVE builtins framework.

Ok.
Thanks,
Kyrill

> 
> 2022-09-08  Christophe Lyon  <christophe.lyon@arm.com>
> 
> 	gcc/
> 	* config/arm/arm-mve-builtins-base.cc (FUNCTION_WITH_M_N_NO_F)
> 	(FUNCTION_WITHOUT_N_NO_F, FUNCTION_WITH_M_N_NO_U_F): New.
> 	(vhaddq, vhsubq, vmulhq, vqaddq, vqsubq, vqdmulhq, vrhaddq)
> 	(vrmulhq): New.
> 	* config/arm/arm-mve-builtins-base.def (vhaddq, vhsubq, vmulhq)
> 	(vqaddq, vqsubq, vqdmulhq, vrhaddq, vrmulhq): New.
> 	* config/arm/arm-mve-builtins-base.h (vhaddq, vhsubq, vmulhq)
> 	(vqaddq, vqsubq, vqdmulhq, vrhaddq, vrmulhq): New.
> 	* config/arm/arm_mve.h (vhsubq): Remove.
> 	(vhaddq): Remove.
> 	(vhaddq_m): Remove.
> 	(vhsubq_m): Remove.
> 	(vhaddq_x): Remove.
> 	(vhsubq_x): Remove.
> 	(vhsubq_u8): Remove.
> 	(vhsubq_n_u8): Remove.
> 	(vhaddq_u8): Remove.
> 	(vhaddq_n_u8): Remove.
> 	(vhsubq_s8): Remove.
> 	(vhsubq_n_s8): Remove.
> 	(vhaddq_s8): Remove.
> 	(vhaddq_n_s8): Remove.
> 	(vhsubq_u16): Remove.
> 	(vhsubq_n_u16): Remove.
> 	(vhaddq_u16): Remove.
> 	(vhaddq_n_u16): Remove.
> 	(vhsubq_s16): Remove.
> 	(vhsubq_n_s16): Remove.
> 	(vhaddq_s16): Remove.
> 	(vhaddq_n_s16): Remove.
> 	(vhsubq_u32): Remove.
> 	(vhsubq_n_u32): Remove.
> 	(vhaddq_u32): Remove.
> 	(vhaddq_n_u32): Remove.
> 	(vhsubq_s32): Remove.
> 	(vhsubq_n_s32): Remove.
> 	(vhaddq_s32): Remove.
> 	(vhaddq_n_s32): Remove.
> 	(vhaddq_m_n_s8): Remove.
> 	(vhaddq_m_n_s32): Remove.
> 	(vhaddq_m_n_s16): Remove.
> 	(vhaddq_m_n_u8): Remove.
> 	(vhaddq_m_n_u32): Remove.
> 	(vhaddq_m_n_u16): Remove.
> 	(vhaddq_m_s8): Remove.
> 	(vhaddq_m_s32): Remove.
> 	(vhaddq_m_s16): Remove.
> 	(vhaddq_m_u8): Remove.
> 	(vhaddq_m_u32): Remove.
> 	(vhaddq_m_u16): Remove.
> 	(vhsubq_m_n_s8): Remove.
> 	(vhsubq_m_n_s32): Remove.
> 	(vhsubq_m_n_s16): Remove.
> 	(vhsubq_m_n_u8): Remove.
> 	(vhsubq_m_n_u32): Remove.
> 	(vhsubq_m_n_u16): Remove.
> 	(vhsubq_m_s8): Remove.
> 	(vhsubq_m_s32): Remove.
> 	(vhsubq_m_s16): Remove.
> 	(vhsubq_m_u8): Remove.
> 	(vhsubq_m_u32): Remove.
> 	(vhsubq_m_u16): Remove.
> 	(vhaddq_x_n_s8): Remove.
> 	(vhaddq_x_n_s16): Remove.
> 	(vhaddq_x_n_s32): Remove.
> 	(vhaddq_x_n_u8): Remove.
> 	(vhaddq_x_n_u16): Remove.
> 	(vhaddq_x_n_u32): Remove.
> 	(vhaddq_x_s8): Remove.
> 	(vhaddq_x_s16): Remove.
> 	(vhaddq_x_s32): Remove.
> 	(vhaddq_x_u8): Remove.
> 	(vhaddq_x_u16): Remove.
> 	(vhaddq_x_u32): Remove.
> 	(vhsubq_x_n_s8): Remove.
> 	(vhsubq_x_n_s16): Remove.
> 	(vhsubq_x_n_s32): Remove.
> 	(vhsubq_x_n_u8): Remove.
> 	(vhsubq_x_n_u16): Remove.
> 	(vhsubq_x_n_u32): Remove.
> 	(vhsubq_x_s8): Remove.
> 	(vhsubq_x_s16): Remove.
> 	(vhsubq_x_s32): Remove.
> 	(vhsubq_x_u8): Remove.
> 	(vhsubq_x_u16): Remove.
> 	(vhsubq_x_u32): Remove.
> 	(__arm_vhsubq_u8): Remove.
> 	(__arm_vhsubq_n_u8): Remove.
> 	(__arm_vhaddq_u8): Remove.
> 	(__arm_vhaddq_n_u8): Remove.
> 	(__arm_vhsubq_s8): Remove.
> 	(__arm_vhsubq_n_s8): Remove.
> 	(__arm_vhaddq_s8): Remove.
> 	(__arm_vhaddq_n_s8): Remove.
> 	(__arm_vhsubq_u16): Remove.
> 	(__arm_vhsubq_n_u16): Remove.
> 	(__arm_vhaddq_u16): Remove.
> 	(__arm_vhaddq_n_u16): Remove.
> 	(__arm_vhsubq_s16): Remove.
> 	(__arm_vhsubq_n_s16): Remove.
> 	(__arm_vhaddq_s16): Remove.
> 	(__arm_vhaddq_n_s16): Remove.
> 	(__arm_vhsubq_u32): Remove.
> 	(__arm_vhsubq_n_u32): Remove.
> 	(__arm_vhaddq_u32): Remove.
> 	(__arm_vhaddq_n_u32): Remove.
> 	(__arm_vhsubq_s32): Remove.
> 	(__arm_vhsubq_n_s32): Remove.
> 	(__arm_vhaddq_s32): Remove.
> 	(__arm_vhaddq_n_s32): Remove.
> 	(__arm_vhaddq_m_n_s8): Remove.
> 	(__arm_vhaddq_m_n_s32): Remove.
> 	(__arm_vhaddq_m_n_s16): Remove.
> 	(__arm_vhaddq_m_n_u8): Remove.
> 	(__arm_vhaddq_m_n_u32): Remove.
> 	(__arm_vhaddq_m_n_u16): Remove.
> 	(__arm_vhaddq_m_s8): Remove.
> 	(__arm_vhaddq_m_s32): Remove.
> 	(__arm_vhaddq_m_s16): Remove.
> 	(__arm_vhaddq_m_u8): Remove.
> 	(__arm_vhaddq_m_u32): Remove.
> 	(__arm_vhaddq_m_u16): Remove.
> 	(__arm_vhsubq_m_n_s8): Remove.
> 	(__arm_vhsubq_m_n_s32): Remove.
> 	(__arm_vhsubq_m_n_s16): Remove.
> 	(__arm_vhsubq_m_n_u8): Remove.
> 	(__arm_vhsubq_m_n_u32): Remove.
> 	(__arm_vhsubq_m_n_u16): Remove.
> 	(__arm_vhsubq_m_s8): Remove.
> 	(__arm_vhsubq_m_s32): Remove.
> 	(__arm_vhsubq_m_s16): Remove.
> 	(__arm_vhsubq_m_u8): Remove.
> 	(__arm_vhsubq_m_u32): Remove.
> 	(__arm_vhsubq_m_u16): Remove.
> 	(__arm_vhaddq_x_n_s8): Remove.
> 	(__arm_vhaddq_x_n_s16): Remove.
> 	(__arm_vhaddq_x_n_s32): Remove.
> 	(__arm_vhaddq_x_n_u8): Remove.
> 	(__arm_vhaddq_x_n_u16): Remove.
> 	(__arm_vhaddq_x_n_u32): Remove.
> 	(__arm_vhaddq_x_s8): Remove.
> 	(__arm_vhaddq_x_s16): Remove.
> 	(__arm_vhaddq_x_s32): Remove.
> 	(__arm_vhaddq_x_u8): Remove.
> 	(__arm_vhaddq_x_u16): Remove.
> 	(__arm_vhaddq_x_u32): Remove.
> 	(__arm_vhsubq_x_n_s8): Remove.
> 	(__arm_vhsubq_x_n_s16): Remove.
> 	(__arm_vhsubq_x_n_s32): Remove.
> 	(__arm_vhsubq_x_n_u8): Remove.
> 	(__arm_vhsubq_x_n_u16): Remove.
> 	(__arm_vhsubq_x_n_u32): Remove.
> 	(__arm_vhsubq_x_s8): Remove.
> 	(__arm_vhsubq_x_s16): Remove.
> 	(__arm_vhsubq_x_s32): Remove.
> 	(__arm_vhsubq_x_u8): Remove.
> 	(__arm_vhsubq_x_u16): Remove.
> 	(__arm_vhsubq_x_u32): Remove.
> 	(__arm_vhsubq): Remove.
> 	(__arm_vhaddq): Remove.
> 	(__arm_vhaddq_m): Remove.
> 	(__arm_vhsubq_m): Remove.
> 	(__arm_vhaddq_x): Remove.
> 	(__arm_vhsubq_x): Remove.
> 	(vmulhq): Remove.
> 	(vmulhq_m): Remove.
> 	(vmulhq_x): Remove.
> 	(vmulhq_u8): Remove.
> 	(vmulhq_s8): Remove.
> 	(vmulhq_u16): Remove.
> 	(vmulhq_s16): Remove.
> 	(vmulhq_u32): Remove.
> 	(vmulhq_s32): Remove.
> 	(vmulhq_m_s8): Remove.
> 	(vmulhq_m_s32): Remove.
> 	(vmulhq_m_s16): Remove.
> 	(vmulhq_m_u8): Remove.
> 	(vmulhq_m_u32): Remove.
> 	(vmulhq_m_u16): Remove.
> 	(vmulhq_x_s8): Remove.
> 	(vmulhq_x_s16): Remove.
> 	(vmulhq_x_s32): Remove.
> 	(vmulhq_x_u8): Remove.
> 	(vmulhq_x_u16): Remove.
> 	(vmulhq_x_u32): Remove.
> 	(__arm_vmulhq_u8): Remove.
> 	(__arm_vmulhq_s8): Remove.
> 	(__arm_vmulhq_u16): Remove.
> 	(__arm_vmulhq_s16): Remove.
> 	(__arm_vmulhq_u32): Remove.
> 	(__arm_vmulhq_s32): Remove.
> 	(__arm_vmulhq_m_s8): Remove.
> 	(__arm_vmulhq_m_s32): Remove.
> 	(__arm_vmulhq_m_s16): Remove.
> 	(__arm_vmulhq_m_u8): Remove.
> 	(__arm_vmulhq_m_u32): Remove.
> 	(__arm_vmulhq_m_u16): Remove.
> 	(__arm_vmulhq_x_s8): Remove.
> 	(__arm_vmulhq_x_s16): Remove.
> 	(__arm_vmulhq_x_s32): Remove.
> 	(__arm_vmulhq_x_u8): Remove.
> 	(__arm_vmulhq_x_u16): Remove.
> 	(__arm_vmulhq_x_u32): Remove.
> 	(__arm_vmulhq): Remove.
> 	(__arm_vmulhq_m): Remove.
> 	(__arm_vmulhq_x): Remove.
> 	(vqsubq): Remove.
> 	(vqaddq): Remove.
> 	(vqaddq_m): Remove.
> 	(vqsubq_m): Remove.
> 	(vqsubq_u8): Remove.
> 	(vqsubq_n_u8): Remove.
> 	(vqaddq_u8): Remove.
> 	(vqaddq_n_u8): Remove.
> 	(vqsubq_s8): Remove.
> 	(vqsubq_n_s8): Remove.
> 	(vqaddq_s8): Remove.
> 	(vqaddq_n_s8): Remove.
> 	(vqsubq_u16): Remove.
> 	(vqsubq_n_u16): Remove.
> 	(vqaddq_u16): Remove.
> 	(vqaddq_n_u16): Remove.
> 	(vqsubq_s16): Remove.
> 	(vqsubq_n_s16): Remove.
> 	(vqaddq_s16): Remove.
> 	(vqaddq_n_s16): Remove.
> 	(vqsubq_u32): Remove.
> 	(vqsubq_n_u32): Remove.
> 	(vqaddq_u32): Remove.
> 	(vqaddq_n_u32): Remove.
> 	(vqsubq_s32): Remove.
> 	(vqsubq_n_s32): Remove.
> 	(vqaddq_s32): Remove.
> 	(vqaddq_n_s32): Remove.
> 	(vqaddq_m_n_s8): Remove.
> 	(vqaddq_m_n_s32): Remove.
> 	(vqaddq_m_n_s16): Remove.
> 	(vqaddq_m_n_u8): Remove.
> 	(vqaddq_m_n_u32): Remove.
> 	(vqaddq_m_n_u16): Remove.
> 	(vqaddq_m_s8): Remove.
> 	(vqaddq_m_s32): Remove.
> 	(vqaddq_m_s16): Remove.
> 	(vqaddq_m_u8): Remove.
> 	(vqaddq_m_u32): Remove.
> 	(vqaddq_m_u16): Remove.
> 	(vqsubq_m_n_s8): Remove.
> 	(vqsubq_m_n_s32): Remove.
> 	(vqsubq_m_n_s16): Remove.
> 	(vqsubq_m_n_u8): Remove.
> 	(vqsubq_m_n_u32): Remove.
> 	(vqsubq_m_n_u16): Remove.
> 	(vqsubq_m_s8): Remove.
> 	(vqsubq_m_s32): Remove.
> 	(vqsubq_m_s16): Remove.
> 	(vqsubq_m_u8): Remove.
> 	(vqsubq_m_u32): Remove.
> 	(vqsubq_m_u16): Remove.
> 	(__arm_vqsubq_u8): Remove.
> 	(__arm_vqsubq_n_u8): Remove.
> 	(__arm_vqaddq_u8): Remove.
> 	(__arm_vqaddq_n_u8): Remove.
> 	(__arm_vqsubq_s8): Remove.
> 	(__arm_vqsubq_n_s8): Remove.
> 	(__arm_vqaddq_s8): Remove.
> 	(__arm_vqaddq_n_s8): Remove.
> 	(__arm_vqsubq_u16): Remove.
> 	(__arm_vqsubq_n_u16): Remove.
> 	(__arm_vqaddq_u16): Remove.
> 	(__arm_vqaddq_n_u16): Remove.
> 	(__arm_vqsubq_s16): Remove.
> 	(__arm_vqsubq_n_s16): Remove.
> 	(__arm_vqaddq_s16): Remove.
> 	(__arm_vqaddq_n_s16): Remove.
> 	(__arm_vqsubq_u32): Remove.
> 	(__arm_vqsubq_n_u32): Remove.
> 	(__arm_vqaddq_u32): Remove.
> 	(__arm_vqaddq_n_u32): Remove.
> 	(__arm_vqsubq_s32): Remove.
> 	(__arm_vqsubq_n_s32): Remove.
> 	(__arm_vqaddq_s32): Remove.
> 	(__arm_vqaddq_n_s32): Remove.
> 	(__arm_vqaddq_m_n_s8): Remove.
> 	(__arm_vqaddq_m_n_s32): Remove.
> 	(__arm_vqaddq_m_n_s16): Remove.
> 	(__arm_vqaddq_m_n_u8): Remove.
> 	(__arm_vqaddq_m_n_u32): Remove.
> 	(__arm_vqaddq_m_n_u16): Remove.
> 	(__arm_vqaddq_m_s8): Remove.
> 	(__arm_vqaddq_m_s32): Remove.
> 	(__arm_vqaddq_m_s16): Remove.
> 	(__arm_vqaddq_m_u8): Remove.
> 	(__arm_vqaddq_m_u32): Remove.
> 	(__arm_vqaddq_m_u16): Remove.
> 	(__arm_vqsubq_m_n_s8): Remove.
> 	(__arm_vqsubq_m_n_s32): Remove.
> 	(__arm_vqsubq_m_n_s16): Remove.
> 	(__arm_vqsubq_m_n_u8): Remove.
> 	(__arm_vqsubq_m_n_u32): Remove.
> 	(__arm_vqsubq_m_n_u16): Remove.
> 	(__arm_vqsubq_m_s8): Remove.
> 	(__arm_vqsubq_m_s32): Remove.
> 	(__arm_vqsubq_m_s16): Remove.
> 	(__arm_vqsubq_m_u8): Remove.
> 	(__arm_vqsubq_m_u32): Remove.
> 	(__arm_vqsubq_m_u16): Remove.
> 	(__arm_vqsubq): Remove.
> 	(__arm_vqaddq): Remove.
> 	(__arm_vqaddq_m): Remove.
> 	(__arm_vqsubq_m): Remove.
> 	(vqdmulhq): Remove.
> 	(vqdmulhq_m): Remove.
> 	(vqdmulhq_s8): Remove.
> 	(vqdmulhq_n_s8): Remove.
> 	(vqdmulhq_s16): Remove.
> 	(vqdmulhq_n_s16): Remove.
> 	(vqdmulhq_s32): Remove.
> 	(vqdmulhq_n_s32): Remove.
> 	(vqdmulhq_m_n_s8): Remove.
> 	(vqdmulhq_m_n_s32): Remove.
> 	(vqdmulhq_m_n_s16): Remove.
> 	(vqdmulhq_m_s8): Remove.
> 	(vqdmulhq_m_s32): Remove.
> 	(vqdmulhq_m_s16): Remove.
> 	(__arm_vqdmulhq_s8): Remove.
> 	(__arm_vqdmulhq_n_s8): Remove.
> 	(__arm_vqdmulhq_s16): Remove.
> 	(__arm_vqdmulhq_n_s16): Remove.
> 	(__arm_vqdmulhq_s32): Remove.
> 	(__arm_vqdmulhq_n_s32): Remove.
> 	(__arm_vqdmulhq_m_n_s8): Remove.
> 	(__arm_vqdmulhq_m_n_s32): Remove.
> 	(__arm_vqdmulhq_m_n_s16): Remove.
> 	(__arm_vqdmulhq_m_s8): Remove.
> 	(__arm_vqdmulhq_m_s32): Remove.
> 	(__arm_vqdmulhq_m_s16): Remove.
> 	(__arm_vqdmulhq): Remove.
> 	(__arm_vqdmulhq_m): Remove.
> 	(vrhaddq): Remove.
> 	(vrhaddq_m): Remove.
> 	(vrhaddq_x): Remove.
> 	(vrhaddq_u8): Remove.
> 	(vrhaddq_s8): Remove.
> 	(vrhaddq_u16): Remove.
> 	(vrhaddq_s16): Remove.
> 	(vrhaddq_u32): Remove.
> 	(vrhaddq_s32): Remove.
> 	(vrhaddq_m_s8): Remove.
> 	(vrhaddq_m_s32): Remove.
> 	(vrhaddq_m_s16): Remove.
> 	(vrhaddq_m_u8): Remove.
> 	(vrhaddq_m_u32): Remove.
> 	(vrhaddq_m_u16): Remove.
> 	(vrhaddq_x_s8): Remove.
> 	(vrhaddq_x_s16): Remove.
> 	(vrhaddq_x_s32): Remove.
> 	(vrhaddq_x_u8): Remove.
> 	(vrhaddq_x_u16): Remove.
> 	(vrhaddq_x_u32): Remove.
> 	(__arm_vrhaddq_u8): Remove.
> 	(__arm_vrhaddq_s8): Remove.
> 	(__arm_vrhaddq_u16): Remove.
> 	(__arm_vrhaddq_s16): Remove.
> 	(__arm_vrhaddq_u32): Remove.
> 	(__arm_vrhaddq_s32): Remove.
> 	(__arm_vrhaddq_m_s8): Remove.
> 	(__arm_vrhaddq_m_s32): Remove.
> 	(__arm_vrhaddq_m_s16): Remove.
> 	(__arm_vrhaddq_m_u8): Remove.
> 	(__arm_vrhaddq_m_u32): Remove.
> 	(__arm_vrhaddq_m_u16): Remove.
> 	(__arm_vrhaddq_x_s8): Remove.
> 	(__arm_vrhaddq_x_s16): Remove.
> 	(__arm_vrhaddq_x_s32): Remove.
> 	(__arm_vrhaddq_x_u8): Remove.
> 	(__arm_vrhaddq_x_u16): Remove.
> 	(__arm_vrhaddq_x_u32): Remove.
> 	(__arm_vrhaddq): Remove.
> 	(__arm_vrhaddq_m): Remove.
> 	(__arm_vrhaddq_x): Remove.
> 	(vrmulhq): Remove.
> 	(vrmulhq_m): Remove.
> 	(vrmulhq_x): Remove.
> 	(vrmulhq_u8): Remove.
> 	(vrmulhq_s8): Remove.
> 	(vrmulhq_u16): Remove.
> 	(vrmulhq_s16): Remove.
> 	(vrmulhq_u32): Remove.
> 	(vrmulhq_s32): Remove.
> 	(vrmulhq_m_s8): Remove.
> 	(vrmulhq_m_s32): Remove.
> 	(vrmulhq_m_s16): Remove.
> 	(vrmulhq_m_u8): Remove.
> 	(vrmulhq_m_u32): Remove.
> 	(vrmulhq_m_u16): Remove.
> 	(vrmulhq_x_s8): Remove.
> 	(vrmulhq_x_s16): Remove.
> 	(vrmulhq_x_s32): Remove.
> 	(vrmulhq_x_u8): Remove.
> 	(vrmulhq_x_u16): Remove.
> 	(vrmulhq_x_u32): Remove.
> 	(__arm_vrmulhq_u8): Remove.
> 	(__arm_vrmulhq_s8): Remove.
> 	(__arm_vrmulhq_u16): Remove.
> 	(__arm_vrmulhq_s16): Remove.
> 	(__arm_vrmulhq_u32): Remove.
> 	(__arm_vrmulhq_s32): Remove.
> 	(__arm_vrmulhq_m_s8): Remove.
> 	(__arm_vrmulhq_m_s32): Remove.
> 	(__arm_vrmulhq_m_s16): Remove.
> 	(__arm_vrmulhq_m_u8): Remove.
> 	(__arm_vrmulhq_m_u32): Remove.
> 	(__arm_vrmulhq_m_u16): Remove.
> 	(__arm_vrmulhq_x_s8): Remove.
> 	(__arm_vrmulhq_x_s16): Remove.
> 	(__arm_vrmulhq_x_s32): Remove.
> 	(__arm_vrmulhq_x_u8): Remove.
> 	(__arm_vrmulhq_x_u16): Remove.
> 	(__arm_vrmulhq_x_u32): Remove.
> 	(__arm_vrmulhq): Remove.
> 	(__arm_vrmulhq_m): Remove.
> 	(__arm_vrmulhq_x): Remove.
> ---
>  gcc/config/arm/arm-mve-builtins-base.cc  |   35 +
>  gcc/config/arm/arm-mve-builtins-base.def |    8 +
>  gcc/config/arm/arm-mve-builtins-base.h   |    8 +
>  gcc/config/arm/arm_mve.h                 | 3203 ----------------------
>  4 files changed, 51 insertions(+), 3203 deletions(-)
> 
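> Since this only moves the implementation into the builtins framework
> (the arm_mve.h wrappers listed above go away), user code needs no
> change.  A minimal smoke test of one reworked overload (sketch):
>
>   #include <arm_mve.h>
>
>   uint8x16_t
>   g (uint8x16_t a, uint8x16_t b)
>   {
>     /* The overload resolves to vrhaddq_u8 and still emits vrhadd.u8.  */
>     return vrhaddq (a, b);
>   }
>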
> diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
> index 9722c861faf..668f1fe9cda 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.cc
> +++ b/gcc/config/arm/arm-mve-builtins-base.cc
> @@ -115,13 +115,48 @@ namespace arm_mve {
>      -1, -1, -1,								\
>      -1, -1, -1))
> 
> +  /* Helper for builtins with only unspec codes, _m predicated and _n
> +     overrides, but no floating-point version.  */
> +#define FUNCTION_WITH_M_N_NO_F(NAME, UNSPEC) FUNCTION			\
> +  (NAME, unspec_mve_function_exact_insn,				\
> +   (UNSPEC##_S, UNSPEC##_U, -1,						\
> +    UNSPEC##_N_S, UNSPEC##_N_U, -1,					\
> +    UNSPEC##_M_S, UNSPEC##_M_U, -1,					\
> +    UNSPEC##_M_N_S, UNSPEC##_M_N_U, -1))
> +
> +  /* Helper for builtins with only unspec codes, _m predicated
> +     overrides, no _n and no floating-point version.  */
> +#define FUNCTION_WITHOUT_N_NO_F(NAME, UNSPEC) FUNCTION			\
> +  (NAME, unspec_mve_function_exact_insn,				\
> +   (UNSPEC##_S, UNSPEC##_U, -1,						\
> +    -1, -1, -1,								\
> +    UNSPEC##_M_S, UNSPEC##_M_U, -1,					\
> +    -1, -1, -1))
> +
> +  /* Helper for builtins with only unspec codes, _m predicated and _n
> +     overrides, but no unsigned and floating-point versions.  */
> +#define FUNCTION_WITH_M_N_NO_U_F(NAME, UNSPEC) FUNCTION			\
> +  (NAME, unspec_mve_function_exact_insn,				\
> +   (UNSPEC##_S, -1, -1,							\
> +    UNSPEC##_N_S, -1, -1,						\
> +    UNSPEC##_M_S, -1, -1,						\
> +    UNSPEC##_M_N_S, -1, -1))
> +
>  FUNCTION_WITH_RTX_M_N (vaddq, PLUS, VADDQ)
>  FUNCTION_WITH_RTX_M (vandq, AND, VANDQ)
>  FUNCTION_WITHOUT_M_N (vcreateq, VCREATEQ)
>  FUNCTION_WITH_RTX_M (veorq, XOR, VEORQ)
> +FUNCTION_WITH_M_N_NO_F (vhaddq, VHADDQ)
> +FUNCTION_WITH_M_N_NO_F (vhsubq, VHSUBQ)
> +FUNCTION_WITHOUT_N_NO_F (vmulhq, VMULHQ)
>  FUNCTION_WITH_RTX_M_N (vmulq, MULT, VMULQ)
>  FUNCTION_WITH_RTX_M_N_NO_N_F (vorrq, IOR, VORRQ)
> +FUNCTION_WITH_M_N_NO_F (vqaddq, VQADDQ)
> +FUNCTION_WITH_M_N_NO_U_F (vqdmulhq, VQDMULHQ)
> +FUNCTION_WITH_M_N_NO_F (vqsubq, VQSUBQ)
>  FUNCTION (vreinterpretq, vreinterpretq_impl,)
> +FUNCTION_WITHOUT_N_NO_F (vrhaddq, VRHADDQ)
> +FUNCTION_WITHOUT_N_NO_F (vrmulhq, VRMULHQ)
>  FUNCTION_WITH_RTX_M_N (vsubq, MINUS, VSUBQ)
>  FUNCTION (vuninitializedq, vuninitializedq_impl,)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
> index 1bfd15f973c..d256f3ebb2d 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.def
> +++ b/gcc/config/arm/arm-mve-builtins-base.def
> @@ -22,9 +22,17 @@ DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_integer, mx_or_none)
>  DEF_MVE_FUNCTION (vandq, binary, all_integer, mx_or_none)
>  DEF_MVE_FUNCTION (vcreateq, create, all_integer_with_64, none)
>  DEF_MVE_FUNCTION (veorq, binary, all_integer, mx_or_none)
> +DEF_MVE_FUNCTION (vhaddq, binary_opt_n, all_integer, mx_or_none)
> +DEF_MVE_FUNCTION (vhsubq, binary_opt_n, all_integer, mx_or_none)
> +DEF_MVE_FUNCTION (vmulhq, binary, all_integer, mx_or_none)
>  DEF_MVE_FUNCTION (vmulq, binary_opt_n, all_integer, mx_or_none)
>  DEF_MVE_FUNCTION (vorrq, binary_orrq, all_integer, mx_or_none)
> +DEF_MVE_FUNCTION (vqaddq, binary_opt_n, all_integer, m_or_none)
> +DEF_MVE_FUNCTION (vqdmulhq, binary_opt_n, all_signed, m_or_none)
> +DEF_MVE_FUNCTION (vqsubq, binary_opt_n, all_integer, m_or_none)
>  DEF_MVE_FUNCTION (vreinterpretq, unary_convert, reinterpret_integer, none)
> +DEF_MVE_FUNCTION (vrhaddq, binary, all_integer, mx_or_none)
> +DEF_MVE_FUNCTION (vrmulhq, binary, all_integer, mx_or_none)
>  DEF_MVE_FUNCTION (vsubq, binary_opt_n, all_integer, mx_or_none)
>  DEF_MVE_FUNCTION (vuninitializedq, inherent, all_integer_with_64, none)
>  #undef REQUIRES_FLOAT
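(To illustrate what one .def entry provides: given the vhaddq line
above, with shape binary_opt_n, types all_integer and predicates
mx_or_none, the framework should expose the usual overload set.  A
minimal sketch, assuming the standard MVE naming scheme:

  #include <arm_mve.h>

  int8x16_t
  f (int8x16_t a, int8x16_t b, int8_t s, mve_pred16_t p)
  {
    int8x16_t t = vhaddq (a, b);	/* binary form */
    t = vhaddq (t, s);			/* opt_n: vector x scalar form */
    t = vhaddq_m (t, a, b, p);		/* _m: merging predication */
    return vhaddq_x (a, b, p);		/* _x: dont-care predication */
  }
)
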
> diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h
> index 8dd6bff01bf..d64cb5e1dec 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.h
> +++ b/gcc/config/arm/arm-mve-builtins-base.h
> @@ -27,9 +27,17 @@ extern const function_base *const vaddq;
>  extern const function_base *const vandq;
>  extern const function_base *const vcreateq;
>  extern const function_base *const veorq;
> +extern const function_base *const vhaddq;
> +extern const function_base *const vhsubq;
> +extern const function_base *const vmulhq;
>  extern const function_base *const vmulq;
>  extern const function_base *const vorrq;
> +extern const function_base *const vqaddq;
> +extern const function_base *const vqdmulhq;
> +extern const function_base *const vqsubq;
>  extern const function_base *const vreinterpretq;
> +extern const function_base *const vrhaddq;
> +extern const function_base *const vrmulhq;
>  extern const function_base *const vsubq;
>  extern const function_base *const vuninitializedq;
> 
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index 4810e2977d3..9c5d14794a1 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -61,21 +61,14 @@
>  #define vaddlvq_p(__a, __p) __arm_vaddlvq_p(__a, __p)
>  #define vcmpneq(__a, __b) __arm_vcmpneq(__a, __b)
>  #define vshlq(__a, __b) __arm_vshlq(__a, __b)
> -#define vrmulhq(__a, __b) __arm_vrmulhq(__a, __b)
> -#define vrhaddq(__a, __b) __arm_vrhaddq(__a, __b)
> -#define vqsubq(__a, __b) __arm_vqsubq(__a, __b)
> -#define vqaddq(__a, __b) __arm_vqaddq(__a, __b)
>  #define vornq(__a, __b) __arm_vornq(__a, __b)
>  #define vmulltq_int(__a, __b) __arm_vmulltq_int(__a, __b)
>  #define vmullbq_int(__a, __b) __arm_vmullbq_int(__a, __b)
> -#define vmulhq(__a, __b) __arm_vmulhq(__a, __b)
>  #define vmladavq(__a, __b) __arm_vmladavq(__a, __b)
>  #define vminvq(__a, __b) __arm_vminvq(__a, __b)
>  #define vminq(__a, __b) __arm_vminq(__a, __b)
>  #define vmaxvq(__a, __b) __arm_vmaxvq(__a, __b)
>  #define vmaxq(__a, __b) __arm_vmaxq(__a, __b)
> -#define vhsubq(__a, __b) __arm_vhsubq(__a, __b)
> -#define vhaddq(__a, __b) __arm_vhaddq(__a, __b)
>  #define vcmphiq(__a, __b) __arm_vcmphiq(__a, __b)
>  #define vcmpeqq(__a, __b) __arm_vcmpeqq(__a, __b)
>  #define vcmpcsq(__a, __b) __arm_vcmpcsq(__a, __b)
> @@ -104,7 +97,6 @@
>  #define vcmpgeq(__a, __b) __arm_vcmpgeq(__a, __b)
>  #define vqshluq(__a, __imm) __arm_vqshluq(__a, __imm)
>  #define vqrdmulhq(__a, __b) __arm_vqrdmulhq(__a, __b)
> -#define vqdmulhq(__a, __b) __arm_vqdmulhq(__a, __b)
>  #define vmlsdavxq(__a, __b) __arm_vmlsdavxq(__a, __b)
>  #define vmlsdavq(__a, __b) __arm_vmlsdavq(__a, __b)
>  #define vmladavxq(__a, __b) __arm_vmladavxq(__a, __b)
> @@ -236,10 +228,8 @@
>  #define vbrsrq_m(__inactive, __a, __b, __p) __arm_vbrsrq_m(__inactive, __a, __b, __p)
>  #define vcaddq_rot270_m(__inactive, __a, __b, __p) __arm_vcaddq_rot270_m(__inactive, __a, __b, __p)
>  #define vcaddq_rot90_m(__inactive, __a, __b, __p) __arm_vcaddq_rot90_m(__inactive, __a, __b, __p)
> -#define vhaddq_m(__inactive, __a, __b, __p) __arm_vhaddq_m(__inactive, __a, __b, __p)
>  #define vhcaddq_rot270_m(__inactive, __a, __b, __p) __arm_vhcaddq_rot270_m(__inactive, __a, __b, __p)
>  #define vhcaddq_rot90_m(__inactive, __a, __b, __p) __arm_vhcaddq_rot90_m(__inactive, __a, __b, __p)
> -#define vhsubq_m(__inactive, __a, __b, __p) __arm_vhsubq_m(__inactive, __a, __b, __p)
>  #define vmaxq_m(__inactive, __a, __b, __p) __arm_vmaxq_m(__inactive, __a, __b, __p)
>  #define vminq_m(__inactive, __a, __b, __p) __arm_vminq_m(__inactive, __a, __b, __p)
>  #define vmladavaq_p(__a, __b, __c, __p) __arm_vmladavaq_p(__a, __b, __c, __p)
> @@ -248,18 +238,15 @@
>  #define vmlasq_m(__a, __b, __c, __p) __arm_vmlasq_m(__a, __b, __c, __p)
>  #define vmlsdavaq_p(__a, __b, __c, __p) __arm_vmlsdavaq_p(__a, __b, __c, __p)
>  #define vmlsdavaxq_p(__a, __b, __c, __p) __arm_vmlsdavaxq_p(__a, __b, __c, __p)
> -#define vmulhq_m(__inactive, __a, __b, __p) __arm_vmulhq_m(__inactive, __a, __b, __p)
>  #define vmullbq_int_m(__inactive, __a, __b, __p) __arm_vmullbq_int_m(__inactive, __a, __b, __p)
>  #define vmulltq_int_m(__inactive, __a, __b, __p) __arm_vmulltq_int_m(__inactive, __a, __b, __p)
>  #define vornq_m(__inactive, __a, __b, __p) __arm_vornq_m(__inactive, __a, __b, __p)
> -#define vqaddq_m(__inactive, __a, __b, __p) __arm_vqaddq_m(__inactive, __a, __b, __p)
>  #define vqdmladhq_m(__inactive, __a, __b, __p) __arm_vqdmladhq_m(__inactive, __a, __b, __p)
>  #define vqdmlashq_m(__a, __b, __c, __p) __arm_vqdmlashq_m(__a, __b, __c, __p)
>  #define vqdmladhxq_m(__inactive, __a, __b, __p) __arm_vqdmladhxq_m(__inactive, __a, __b, __p)
>  #define vqdmlahq_m(__a, __b, __c, __p) __arm_vqdmlahq_m(__a, __b, __c, __p)
>  #define vqdmlsdhq_m(__inactive, __a, __b, __p) __arm_vqdmlsdhq_m(__inactive, __a, __b, __p)
>  #define vqdmlsdhxq_m(__inactive, __a, __b, __p) __arm_vqdmlsdhxq_m(__inactive, __a, __b, __p)
> -#define vqdmulhq_m(__inactive, __a, __b, __p) __arm_vqdmulhq_m(__inactive, __a, __b, __p)
>  #define vqrdmladhq_m(__inactive, __a, __b, __p) __arm_vqrdmladhq_m(__inactive, __a, __b, __p)
>  #define vqrdmladhxq_m(__inactive, __a, __b, __p) __arm_vqrdmladhxq_m(__inactive, __a, __b, __p)
>  #define vqrdmlahq_m(__a, __b, __c, __p) __arm_vqrdmlahq_m(__a, __b, __c, __p)
> @@ -270,9 +257,6 @@
>  #define vqrshlq_m(__inactive, __a, __b, __p) __arm_vqrshlq_m(__inactive, __a, __b, __p)
>  #define vqshlq_m_n(__inactive, __a, __imm, __p) __arm_vqshlq_m_n(__inactive, __a, __imm, __p)
>  #define vqshlq_m(__inactive, __a, __b, __p) __arm_vqshlq_m(__inactive, __a, __b, __p)
> -#define vqsubq_m(__inactive, __a, __b, __p) __arm_vqsubq_m(__inactive, __a, __b, __p)
> -#define vrhaddq_m(__inactive, __a, __b, __p) __arm_vrhaddq_m(__inactive, __a, __b, __p)
> -#define vrmulhq_m(__inactive, __a, __b, __p) __arm_vrmulhq_m(__inactive, __a, __b, __p)
>  #define vrshlq_m(__inactive, __a, __b, __p) __arm_vrshlq_m(__inactive, __a, __b, __p)
>  #define vrshrq_m(__inactive, __a, __imm, __p) __arm_vrshrq_m(__inactive, __a, __imm, __p)
>  #define vshlq_m_n(__inactive, __a, __imm, __p) __arm_vshlq_m_n(__inactive, __a, __imm, __p)
> @@ -384,19 +368,14 @@
>  #define vclsq_x(__a, __p) __arm_vclsq_x(__a, __p)
>  #define vclzq_x(__a, __p) __arm_vclzq_x(__a, __p)
>  #define vnegq_x(__a, __p) __arm_vnegq_x(__a, __p)
> -#define vmulhq_x(__a, __b, __p) __arm_vmulhq_x(__a, __b, __p)
>  #define vmullbq_poly_x(__a, __b, __p) __arm_vmullbq_poly_x(__a, __b, __p)
>  #define vmullbq_int_x(__a, __b, __p) __arm_vmullbq_int_x(__a, __b, __p)
>  #define vmulltq_poly_x(__a, __b, __p) __arm_vmulltq_poly_x(__a, __b, __p)
>  #define vmulltq_int_x(__a, __b, __p) __arm_vmulltq_int_x(__a, __b, __p)
>  #define vcaddq_rot90_x(__a, __b, __p) __arm_vcaddq_rot90_x(__a, __b, __p)
>  #define vcaddq_rot270_x(__a, __b, __p) __arm_vcaddq_rot270_x(__a, __b, __p)
> -#define vhaddq_x(__a, __b, __p) __arm_vhaddq_x(__a, __b, __p)
>  #define vhcaddq_rot90_x(__a, __b, __p) __arm_vhcaddq_rot90_x(__a, __b, __p)
>  #define vhcaddq_rot270_x(__a, __b, __p) __arm_vhcaddq_rot270_x(__a, __b, __p)
> -#define vhsubq_x(__a, __b, __p) __arm_vhsubq_x(__a, __b, __p)
> -#define vrhaddq_x(__a, __b, __p) __arm_vrhaddq_x(__a, __b, __p)
> -#define vrmulhq_x(__a, __b, __p) __arm_vrmulhq_x(__a, __b, __p)
>  #define vbicq_x(__a, __b, __p) __arm_vbicq_x(__a, __b, __p)
>  #define vbrsrq_x(__a, __b, __p) __arm_vbrsrq_x(__a, __b, __p)
>  #define vmovlbq_x(__a, __p) __arm_vmovlbq_x(__a, __p)
> @@ -662,25 +641,14 @@
>  #define vshlq_u8(__a, __b) __arm_vshlq_u8(__a, __b)
>  #define vshlq_u16(__a, __b) __arm_vshlq_u16(__a, __b)
>  #define vshlq_u32(__a, __b) __arm_vshlq_u32(__a, __b)
> -#define vrmulhq_u8(__a, __b) __arm_vrmulhq_u8(__a, __b)
> -#define vrhaddq_u8(__a, __b) __arm_vrhaddq_u8(__a, __b)
> -#define vqsubq_u8(__a, __b) __arm_vqsubq_u8(__a, __b)
> -#define vqsubq_n_u8(__a, __b) __arm_vqsubq_n_u8(__a, __b)
> -#define vqaddq_u8(__a, __b) __arm_vqaddq_u8(__a, __b)
> -#define vqaddq_n_u8(__a, __b) __arm_vqaddq_n_u8(__a, __b)
>  #define vornq_u8(__a, __b) __arm_vornq_u8(__a, __b)
>  #define vmulltq_int_u8(__a, __b) __arm_vmulltq_int_u8(__a, __b)
>  #define vmullbq_int_u8(__a, __b) __arm_vmullbq_int_u8(__a, __b)
> -#define vmulhq_u8(__a, __b) __arm_vmulhq_u8(__a, __b)
>  #define vmladavq_u8(__a, __b) __arm_vmladavq_u8(__a, __b)
>  #define vminvq_u8(__a, __b) __arm_vminvq_u8(__a, __b)
>  #define vminq_u8(__a, __b) __arm_vminq_u8(__a, __b)
>  #define vmaxvq_u8(__a, __b) __arm_vmaxvq_u8(__a, __b)
>  #define vmaxq_u8(__a, __b) __arm_vmaxq_u8(__a, __b)
> -#define vhsubq_u8(__a, __b) __arm_vhsubq_u8(__a, __b)
> -#define vhsubq_n_u8(__a, __b) __arm_vhsubq_n_u8(__a, __b)
> -#define vhaddq_u8(__a, __b) __arm_vhaddq_u8(__a, __b)
> -#define vhaddq_n_u8(__a, __b) __arm_vhaddq_n_u8(__a, __b)
>  #define vcmpneq_n_u8(__a, __b) __arm_vcmpneq_n_u8(__a, __b)
>  #define vcmphiq_u8(__a, __b) __arm_vcmphiq_u8(__a, __b)
>  #define vcmphiq_n_u8(__a, __b) __arm_vcmphiq_n_u8(__a, __b)
> @@ -725,24 +693,15 @@
>  #define vshlq_r_s8(__a, __b) __arm_vshlq_r_s8(__a, __b)
>  #define vrshlq_s8(__a, __b) __arm_vrshlq_s8(__a, __b)
>  #define vrshlq_n_s8(__a, __b) __arm_vrshlq_n_s8(__a, __b)
> -#define vrmulhq_s8(__a, __b) __arm_vrmulhq_s8(__a, __b)
> -#define vrhaddq_s8(__a, __b) __arm_vrhaddq_s8(__a, __b)
> -#define vqsubq_s8(__a, __b) __arm_vqsubq_s8(__a, __b)
> -#define vqsubq_n_s8(__a, __b) __arm_vqsubq_n_s8(__a, __b)
>  #define vqshlq_s8(__a, __b) __arm_vqshlq_s8(__a, __b)
>  #define vqshlq_r_s8(__a, __b) __arm_vqshlq_r_s8(__a, __b)
>  #define vqrshlq_s8(__a, __b) __arm_vqrshlq_s8(__a, __b)
>  #define vqrshlq_n_s8(__a, __b) __arm_vqrshlq_n_s8(__a, __b)
>  #define vqrdmulhq_s8(__a, __b) __arm_vqrdmulhq_s8(__a, __b)
>  #define vqrdmulhq_n_s8(__a, __b) __arm_vqrdmulhq_n_s8(__a, __b)
> -#define vqdmulhq_s8(__a, __b) __arm_vqdmulhq_s8(__a, __b)
> -#define vqdmulhq_n_s8(__a, __b) __arm_vqdmulhq_n_s8(__a, __b)
> -#define vqaddq_s8(__a, __b) __arm_vqaddq_s8(__a, __b)
> -#define vqaddq_n_s8(__a, __b) __arm_vqaddq_n_s8(__a, __b)
>  #define vornq_s8(__a, __b) __arm_vornq_s8(__a, __b)
>  #define vmulltq_int_s8(__a, __b) __arm_vmulltq_int_s8(__a, __b)
>  #define vmullbq_int_s8(__a, __b) __arm_vmullbq_int_s8(__a, __b)
> -#define vmulhq_s8(__a, __b) __arm_vmulhq_s8(__a, __b)
>  #define vmlsdavxq_s8(__a, __b) __arm_vmlsdavxq_s8(__a, __b)
>  #define vmlsdavq_s8(__a, __b) __arm_vmlsdavq_s8(__a, __b)
>  #define vmladavxq_s8(__a, __b) __arm_vmladavxq_s8(__a, __b)
> @@ -751,12 +710,8 @@
>  #define vminq_s8(__a, __b) __arm_vminq_s8(__a, __b)
>  #define vmaxvq_s8(__a, __b) __arm_vmaxvq_s8(__a, __b)
>  #define vmaxq_s8(__a, __b) __arm_vmaxq_s8(__a, __b)
> -#define vhsubq_s8(__a, __b) __arm_vhsubq_s8(__a, __b)
> -#define vhsubq_n_s8(__a, __b) __arm_vhsubq_n_s8(__a, __b)
>  #define vhcaddq_rot90_s8(__a, __b) __arm_vhcaddq_rot90_s8(__a, __b)
>  #define vhcaddq_rot270_s8(__a, __b) __arm_vhcaddq_rot270_s8(__a, __b)
> -#define vhaddq_s8(__a, __b) __arm_vhaddq_s8(__a, __b)
> -#define vhaddq_n_s8(__a, __b) __arm_vhaddq_n_s8(__a, __b)
>  #define vcaddq_rot90_s8(__a, __b) __arm_vcaddq_rot90_s8(__a, __b)
>  #define vcaddq_rot270_s8(__a, __b) __arm_vcaddq_rot270_s8(__a, __b)
>  #define vbrsrq_n_s8(__a, __b) __arm_vbrsrq_n_s8(__a, __b)
> @@ -766,25 +721,14 @@
>  #define vshlq_n_s8(__a,  __imm) __arm_vshlq_n_s8(__a,  __imm)
>  #define vrshrq_n_s8(__a,  __imm) __arm_vrshrq_n_s8(__a,  __imm)
>  #define vqshlq_n_s8(__a,  __imm) __arm_vqshlq_n_s8(__a,  __imm)
> -#define vrmulhq_u16(__a, __b) __arm_vrmulhq_u16(__a, __b)
> -#define vrhaddq_u16(__a, __b) __arm_vrhaddq_u16(__a, __b)
> -#define vqsubq_u16(__a, __b) __arm_vqsubq_u16(__a, __b)
> -#define vqsubq_n_u16(__a, __b) __arm_vqsubq_n_u16(__a, __b)
> -#define vqaddq_u16(__a, __b) __arm_vqaddq_u16(__a, __b)
> -#define vqaddq_n_u16(__a, __b) __arm_vqaddq_n_u16(__a, __b)
>  #define vornq_u16(__a, __b) __arm_vornq_u16(__a, __b)
>  #define vmulltq_int_u16(__a, __b) __arm_vmulltq_int_u16(__a, __b)
>  #define vmullbq_int_u16(__a, __b) __arm_vmullbq_int_u16(__a, __b)
> -#define vmulhq_u16(__a, __b) __arm_vmulhq_u16(__a, __b)
>  #define vmladavq_u16(__a, __b) __arm_vmladavq_u16(__a, __b)
>  #define vminvq_u16(__a, __b) __arm_vminvq_u16(__a, __b)
>  #define vminq_u16(__a, __b) __arm_vminq_u16(__a, __b)
>  #define vmaxvq_u16(__a, __b) __arm_vmaxvq_u16(__a, __b)
>  #define vmaxq_u16(__a, __b) __arm_vmaxq_u16(__a, __b)
> -#define vhsubq_u16(__a, __b) __arm_vhsubq_u16(__a, __b)
> -#define vhsubq_n_u16(__a, __b) __arm_vhsubq_n_u16(__a, __b)
> -#define vhaddq_u16(__a, __b) __arm_vhaddq_u16(__a, __b)
> -#define vhaddq_n_u16(__a, __b) __arm_vhaddq_n_u16(__a, __b)
>  #define vcmpneq_n_u16(__a, __b) __arm_vcmpneq_n_u16(__a, __b)
>  #define vcmphiq_u16(__a, __b) __arm_vcmphiq_u16(__a, __b)
>  #define vcmphiq_n_u16(__a, __b) __arm_vcmphiq_n_u16(__a, __b)
> @@ -829,24 +773,15 @@
>  #define vshlq_r_s16(__a, __b) __arm_vshlq_r_s16(__a, __b)
>  #define vrshlq_s16(__a, __b) __arm_vrshlq_s16(__a, __b)
>  #define vrshlq_n_s16(__a, __b) __arm_vrshlq_n_s16(__a, __b)
> -#define vrmulhq_s16(__a, __b) __arm_vrmulhq_s16(__a, __b)
> -#define vrhaddq_s16(__a, __b) __arm_vrhaddq_s16(__a, __b)
> -#define vqsubq_s16(__a, __b) __arm_vqsubq_s16(__a, __b)
> -#define vqsubq_n_s16(__a, __b) __arm_vqsubq_n_s16(__a, __b)
>  #define vqshlq_s16(__a, __b) __arm_vqshlq_s16(__a, __b)
>  #define vqshlq_r_s16(__a, __b) __arm_vqshlq_r_s16(__a, __b)
>  #define vqrshlq_s16(__a, __b) __arm_vqrshlq_s16(__a, __b)
>  #define vqrshlq_n_s16(__a, __b) __arm_vqrshlq_n_s16(__a, __b)
>  #define vqrdmulhq_s16(__a, __b) __arm_vqrdmulhq_s16(__a, __b)
>  #define vqrdmulhq_n_s16(__a, __b) __arm_vqrdmulhq_n_s16(__a, __b)
> -#define vqdmulhq_s16(__a, __b) __arm_vqdmulhq_s16(__a, __b)
> -#define vqdmulhq_n_s16(__a, __b) __arm_vqdmulhq_n_s16(__a, __b)
> -#define vqaddq_s16(__a, __b) __arm_vqaddq_s16(__a, __b)
> -#define vqaddq_n_s16(__a, __b) __arm_vqaddq_n_s16(__a, __b)
>  #define vornq_s16(__a, __b) __arm_vornq_s16(__a, __b)
>  #define vmulltq_int_s16(__a, __b) __arm_vmulltq_int_s16(__a, __b)
>  #define vmullbq_int_s16(__a, __b) __arm_vmullbq_int_s16(__a, __b)
> -#define vmulhq_s16(__a, __b) __arm_vmulhq_s16(__a, __b)
>  #define vmlsdavxq_s16(__a, __b) __arm_vmlsdavxq_s16(__a, __b)
>  #define vmlsdavq_s16(__a, __b) __arm_vmlsdavq_s16(__a, __b)
>  #define vmladavxq_s16(__a, __b) __arm_vmladavxq_s16(__a, __b)
> @@ -855,12 +790,8 @@
>  #define vminq_s16(__a, __b) __arm_vminq_s16(__a, __b)
>  #define vmaxvq_s16(__a, __b) __arm_vmaxvq_s16(__a, __b)
>  #define vmaxq_s16(__a, __b) __arm_vmaxq_s16(__a, __b)
> -#define vhsubq_s16(__a, __b) __arm_vhsubq_s16(__a, __b)
> -#define vhsubq_n_s16(__a, __b) __arm_vhsubq_n_s16(__a, __b)
>  #define vhcaddq_rot90_s16(__a, __b) __arm_vhcaddq_rot90_s16(__a, __b)
>  #define vhcaddq_rot270_s16(__a, __b) __arm_vhcaddq_rot270_s16(__a, __b)
> -#define vhaddq_s16(__a, __b) __arm_vhaddq_s16(__a, __b)
> -#define vhaddq_n_s16(__a, __b) __arm_vhaddq_n_s16(__a, __b)
>  #define vcaddq_rot90_s16(__a, __b) __arm_vcaddq_rot90_s16(__a, __b)
>  #define vcaddq_rot270_s16(__a, __b) __arm_vcaddq_rot270_s16(__a, __b)
>  #define vbrsrq_n_s16(__a, __b) __arm_vbrsrq_n_s16(__a, __b)
> @@ -870,25 +801,14 @@
>  #define vshlq_n_s16(__a,  __imm) __arm_vshlq_n_s16(__a,  __imm)
>  #define vrshrq_n_s16(__a,  __imm) __arm_vrshrq_n_s16(__a,  __imm)
>  #define vqshlq_n_s16(__a,  __imm) __arm_vqshlq_n_s16(__a,  __imm)
> -#define vrmulhq_u32(__a, __b) __arm_vrmulhq_u32(__a, __b)
> -#define vrhaddq_u32(__a, __b) __arm_vrhaddq_u32(__a, __b)
> -#define vqsubq_u32(__a, __b) __arm_vqsubq_u32(__a, __b)
> -#define vqsubq_n_u32(__a, __b) __arm_vqsubq_n_u32(__a, __b)
> -#define vqaddq_u32(__a, __b) __arm_vqaddq_u32(__a, __b)
> -#define vqaddq_n_u32(__a, __b) __arm_vqaddq_n_u32(__a, __b)
>  #define vornq_u32(__a, __b) __arm_vornq_u32(__a, __b)
>  #define vmulltq_int_u32(__a, __b) __arm_vmulltq_int_u32(__a, __b)
>  #define vmullbq_int_u32(__a, __b) __arm_vmullbq_int_u32(__a, __b)
> -#define vmulhq_u32(__a, __b) __arm_vmulhq_u32(__a, __b)
>  #define vmladavq_u32(__a, __b) __arm_vmladavq_u32(__a, __b)
>  #define vminvq_u32(__a, __b) __arm_vminvq_u32(__a, __b)
>  #define vminq_u32(__a, __b) __arm_vminq_u32(__a, __b)
>  #define vmaxvq_u32(__a, __b) __arm_vmaxvq_u32(__a, __b)
>  #define vmaxq_u32(__a, __b) __arm_vmaxq_u32(__a, __b)
> -#define vhsubq_u32(__a, __b) __arm_vhsubq_u32(__a, __b)
> -#define vhsubq_n_u32(__a, __b) __arm_vhsubq_n_u32(__a, __b)
> -#define vhaddq_u32(__a, __b) __arm_vhaddq_u32(__a, __b)
> -#define vhaddq_n_u32(__a, __b) __arm_vhaddq_n_u32(__a, __b)
>  #define vcmpneq_n_u32(__a, __b) __arm_vcmpneq_n_u32(__a, __b)
>  #define vcmphiq_u32(__a, __b) __arm_vcmphiq_u32(__a, __b)
>  #define vcmphiq_n_u32(__a, __b) __arm_vcmphiq_n_u32(__a, __b)
> @@ -933,24 +853,15 @@
>  #define vshlq_r_s32(__a, __b) __arm_vshlq_r_s32(__a, __b)
>  #define vrshlq_s32(__a, __b) __arm_vrshlq_s32(__a, __b)
>  #define vrshlq_n_s32(__a, __b) __arm_vrshlq_n_s32(__a, __b)
> -#define vrmulhq_s32(__a, __b) __arm_vrmulhq_s32(__a, __b)
> -#define vrhaddq_s32(__a, __b) __arm_vrhaddq_s32(__a, __b)
> -#define vqsubq_s32(__a, __b) __arm_vqsubq_s32(__a, __b)
> -#define vqsubq_n_s32(__a, __b) __arm_vqsubq_n_s32(__a, __b)
>  #define vqshlq_s32(__a, __b) __arm_vqshlq_s32(__a, __b)
>  #define vqshlq_r_s32(__a, __b) __arm_vqshlq_r_s32(__a, __b)
>  #define vqrshlq_s32(__a, __b) __arm_vqrshlq_s32(__a, __b)
>  #define vqrshlq_n_s32(__a, __b) __arm_vqrshlq_n_s32(__a, __b)
>  #define vqrdmulhq_s32(__a, __b) __arm_vqrdmulhq_s32(__a, __b)
>  #define vqrdmulhq_n_s32(__a, __b) __arm_vqrdmulhq_n_s32(__a, __b)
> -#define vqdmulhq_s32(__a, __b) __arm_vqdmulhq_s32(__a, __b)
> -#define vqdmulhq_n_s32(__a, __b) __arm_vqdmulhq_n_s32(__a, __b)
> -#define vqaddq_s32(__a, __b) __arm_vqaddq_s32(__a, __b)
> -#define vqaddq_n_s32(__a, __b) __arm_vqaddq_n_s32(__a, __b)
>  #define vornq_s32(__a, __b) __arm_vornq_s32(__a, __b)
>  #define vmulltq_int_s32(__a, __b) __arm_vmulltq_int_s32(__a, __b)
>  #define vmullbq_int_s32(__a, __b) __arm_vmullbq_int_s32(__a, __b)
> -#define vmulhq_s32(__a, __b) __arm_vmulhq_s32(__a, __b)
>  #define vmlsdavxq_s32(__a, __b) __arm_vmlsdavxq_s32(__a, __b)
>  #define vmlsdavq_s32(__a, __b) __arm_vmlsdavq_s32(__a, __b)
>  #define vmladavxq_s32(__a, __b) __arm_vmladavxq_s32(__a, __b)
> @@ -959,12 +870,8 @@
>  #define vminq_s32(__a, __b) __arm_vminq_s32(__a, __b)
>  #define vmaxvq_s32(__a, __b) __arm_vmaxvq_s32(__a, __b)
>  #define vmaxq_s32(__a, __b) __arm_vmaxq_s32(__a, __b)
> -#define vhsubq_s32(__a, __b) __arm_vhsubq_s32(__a, __b)
> -#define vhsubq_n_s32(__a, __b) __arm_vhsubq_n_s32(__a, __b)
>  #define vhcaddq_rot90_s32(__a, __b) __arm_vhcaddq_rot90_s32(__a, __b)
>  #define vhcaddq_rot270_s32(__a, __b) __arm_vhcaddq_rot270_s32(__a, __b)
> -#define vhaddq_s32(__a, __b) __arm_vhaddq_s32(__a, __b)
> -#define vhaddq_n_s32(__a, __b) __arm_vhaddq_n_s32(__a, __b)
>  #define vcaddq_rot90_s32(__a, __b) __arm_vcaddq_rot90_s32(__a, __b)
>  #define vcaddq_rot270_s32(__a, __b) __arm_vcaddq_rot270_s32(__a, __b)
>  #define vbrsrq_n_s32(__a, __b) __arm_vbrsrq_n_s32(__a, __b)
> @@ -1634,36 +1541,12 @@
>  #define vcaddq_rot90_m_u8(__inactive, __a, __b, __p) __arm_vcaddq_rot90_m_u8(__inactive, __a, __b, __p)
>  #define vcaddq_rot90_m_u32(__inactive, __a, __b, __p) __arm_vcaddq_rot90_m_u32(__inactive, __a, __b, __p)
>  #define vcaddq_rot90_m_u16(__inactive, __a, __b, __p) __arm_vcaddq_rot90_m_u16(__inactive, __a, __b, __p)
> -#define vhaddq_m_n_s8(__inactive, __a, __b, __p) __arm_vhaddq_m_n_s8(__inactive, __a, __b, __p)
> -#define vhaddq_m_n_s32(__inactive, __a, __b, __p) __arm_vhaddq_m_n_s32(__inactive, __a, __b, __p)
> -#define vhaddq_m_n_s16(__inactive, __a, __b, __p) __arm_vhaddq_m_n_s16(__inactive, __a, __b, __p)
> -#define vhaddq_m_n_u8(__inactive, __a, __b, __p) __arm_vhaddq_m_n_u8(__inactive, __a, __b, __p)
> -#define vhaddq_m_n_u32(__inactive, __a, __b, __p) __arm_vhaddq_m_n_u32(__inactive, __a, __b, __p)
> -#define vhaddq_m_n_u16(__inactive, __a, __b, __p) __arm_vhaddq_m_n_u16(__inactive, __a, __b, __p)
> -#define vhaddq_m_s8(__inactive, __a, __b, __p) __arm_vhaddq_m_s8(__inactive, __a, __b, __p)
> -#define vhaddq_m_s32(__inactive, __a, __b, __p) __arm_vhaddq_m_s32(__inactive, __a, __b, __p)
> -#define vhaddq_m_s16(__inactive, __a, __b, __p) __arm_vhaddq_m_s16(__inactive, __a, __b, __p)
> -#define vhaddq_m_u8(__inactive, __a, __b, __p) __arm_vhaddq_m_u8(__inactive, __a, __b, __p)
> -#define vhaddq_m_u32(__inactive, __a, __b, __p) __arm_vhaddq_m_u32(__inactive, __a, __b, __p)
> -#define vhaddq_m_u16(__inactive, __a, __b, __p) __arm_vhaddq_m_u16(__inactive, __a, __b, __p)
>  #define vhcaddq_rot270_m_s8(__inactive, __a, __b, __p) __arm_vhcaddq_rot270_m_s8(__inactive, __a, __b, __p)
>  #define vhcaddq_rot270_m_s32(__inactive, __a, __b, __p) __arm_vhcaddq_rot270_m_s32(__inactive, __a, __b, __p)
>  #define vhcaddq_rot270_m_s16(__inactive, __a, __b, __p) __arm_vhcaddq_rot270_m_s16(__inactive, __a, __b, __p)
>  #define vhcaddq_rot90_m_s8(__inactive, __a, __b, __p) __arm_vhcaddq_rot90_m_s8(__inactive, __a, __b, __p)
>  #define vhcaddq_rot90_m_s32(__inactive, __a, __b, __p) __arm_vhcaddq_rot90_m_s32(__inactive, __a, __b, __p)
>  #define vhcaddq_rot90_m_s16(__inactive, __a, __b, __p) __arm_vhcaddq_rot90_m_s16(__inactive, __a, __b, __p)
> -#define vhsubq_m_n_s8(__inactive, __a, __b, __p) __arm_vhsubq_m_n_s8(__inactive, __a, __b, __p)
> -#define vhsubq_m_n_s32(__inactive, __a, __b, __p) __arm_vhsubq_m_n_s32(__inactive, __a, __b, __p)
> -#define vhsubq_m_n_s16(__inactive, __a, __b, __p) __arm_vhsubq_m_n_s16(__inactive, __a, __b, __p)
> -#define vhsubq_m_n_u8(__inactive, __a, __b, __p) __arm_vhsubq_m_n_u8(__inactive, __a, __b, __p)
> -#define vhsubq_m_n_u32(__inactive, __a, __b, __p) __arm_vhsubq_m_n_u32(__inactive, __a, __b, __p)
> -#define vhsubq_m_n_u16(__inactive, __a, __b, __p) __arm_vhsubq_m_n_u16(__inactive, __a, __b, __p)
> -#define vhsubq_m_s8(__inactive, __a, __b, __p) __arm_vhsubq_m_s8(__inactive, __a, __b, __p)
> -#define vhsubq_m_s32(__inactive, __a, __b, __p) __arm_vhsubq_m_s32(__inactive, __a, __b, __p)
> -#define vhsubq_m_s16(__inactive, __a, __b, __p) __arm_vhsubq_m_s16(__inactive, __a, __b, __p)
> -#define vhsubq_m_u8(__inactive, __a, __b, __p) __arm_vhsubq_m_u8(__inactive, __a, __b, __p)
> -#define vhsubq_m_u32(__inactive, __a, __b, __p) __arm_vhsubq_m_u32(__inactive, __a, __b, __p)
> -#define vhsubq_m_u16(__inactive, __a, __b, __p) __arm_vhsubq_m_u16(__inactive, __a, __b, __p)
>  #define vmaxq_m_s8(__inactive, __a, __b, __p) __arm_vmaxq_m_s8(__inactive, __a, __b, __p)
>  #define vmaxq_m_s32(__inactive, __a, __b, __p) __arm_vmaxq_m_s32(__inactive, __a, __b, __p)
>  #define vmaxq_m_s16(__inactive, __a, __b, __p) __arm_vmaxq_m_s16(__inactive, __a, __b, __p)
> @@ -1703,12 +1586,6 @@
>  #define vmlsdavaxq_p_s8(__a, __b, __c, __p) __arm_vmlsdavaxq_p_s8(__a, __b, __c, __p)
>  #define vmlsdavaxq_p_s32(__a, __b, __c, __p) __arm_vmlsdavaxq_p_s32(__a, __b, __c, __p)
>  #define vmlsdavaxq_p_s16(__a, __b, __c, __p) __arm_vmlsdavaxq_p_s16(__a, __b, __c, __p)
> -#define vmulhq_m_s8(__inactive, __a, __b, __p) __arm_vmulhq_m_s8(__inactive, __a, __b, __p)
> -#define vmulhq_m_s32(__inactive, __a, __b, __p) __arm_vmulhq_m_s32(__inactive, __a, __b, __p)
> -#define vmulhq_m_s16(__inactive, __a, __b, __p) __arm_vmulhq_m_s16(__inactive, __a, __b, __p)
> -#define vmulhq_m_u8(__inactive, __a, __b, __p) __arm_vmulhq_m_u8(__inactive, __a, __b, __p)
> -#define vmulhq_m_u32(__inactive, __a, __b, __p) __arm_vmulhq_m_u32(__inactive, __a, __b, __p)
> -#define vmulhq_m_u16(__inactive, __a, __b, __p) __arm_vmulhq_m_u16(__inactive, __a, __b, __p)
>  #define vmullbq_int_m_s8(__inactive, __a, __b, __p) __arm_vmullbq_int_m_s8(__inactive, __a, __b, __p)
>  #define vmullbq_int_m_s32(__inactive, __a, __b, __p) __arm_vmullbq_int_m_s32(__inactive, __a, __b, __p)
>  #define vmullbq_int_m_s16(__inactive, __a, __b, __p) __arm_vmullbq_int_m_s16(__inactive, __a, __b, __p)
> @@ -1727,18 +1604,6 @@
>  #define vornq_m_u8(__inactive, __a, __b, __p) __arm_vornq_m_u8(__inactive, __a, __b, __p)
>  #define vornq_m_u32(__inactive, __a, __b, __p) __arm_vornq_m_u32(__inactive, __a, __b, __p)
>  #define vornq_m_u16(__inactive, __a, __b, __p) __arm_vornq_m_u16(__inactive, __a, __b, __p)
> -#define vqaddq_m_n_s8(__inactive, __a, __b, __p) __arm_vqaddq_m_n_s8(__inactive, __a, __b, __p)
> -#define vqaddq_m_n_s32(__inactive, __a, __b, __p) __arm_vqaddq_m_n_s32(__inactive, __a, __b, __p)
> -#define vqaddq_m_n_s16(__inactive, __a, __b, __p) __arm_vqaddq_m_n_s16(__inactive, __a, __b, __p)
> -#define vqaddq_m_n_u8(__inactive, __a, __b, __p) __arm_vqaddq_m_n_u8(__inactive, __a, __b, __p)
> -#define vqaddq_m_n_u32(__inactive, __a, __b, __p) __arm_vqaddq_m_n_u32(__inactive, __a, __b, __p)
> -#define vqaddq_m_n_u16(__inactive, __a, __b, __p) __arm_vqaddq_m_n_u16(__inactive, __a, __b, __p)
> -#define vqaddq_m_s8(__inactive, __a, __b, __p) __arm_vqaddq_m_s8(__inactive, __a, __b, __p)
> -#define vqaddq_m_s32(__inactive, __a, __b, __p) __arm_vqaddq_m_s32(__inactive, __a, __b, __p)
> -#define vqaddq_m_s16(__inactive, __a, __b, __p) __arm_vqaddq_m_s16(__inactive, __a, __b, __p)
> -#define vqaddq_m_u8(__inactive, __a, __b, __p) __arm_vqaddq_m_u8(__inactive, __a, __b, __p)
> -#define vqaddq_m_u32(__inactive, __a, __b, __p) __arm_vqaddq_m_u32(__inactive, __a, __b, __p)
> -#define vqaddq_m_u16(__inactive, __a, __b, __p) __arm_vqaddq_m_u16(__inactive, __a, __b, __p)
>  #define vqdmladhq_m_s8(__inactive, __a, __b, __p) __arm_vqdmladhq_m_s8(__inactive, __a, __b, __p)
>  #define vqdmladhq_m_s32(__inactive, __a, __b, __p) __arm_vqdmladhq_m_s32(__inactive, __a, __b, __p)
>  #define vqdmladhq_m_s16(__inactive, __a, __b, __p) __arm_vqdmladhq_m_s16(__inactive, __a, __b, __p)
> @@ -1757,12 +1622,6 @@
>  #define vqdmlsdhxq_m_s8(__inactive, __a, __b, __p) __arm_vqdmlsdhxq_m_s8(__inactive, __a, __b, __p)
>  #define vqdmlsdhxq_m_s32(__inactive, __a, __b, __p) __arm_vqdmlsdhxq_m_s32(__inactive, __a, __b, __p)
>  #define vqdmlsdhxq_m_s16(__inactive, __a, __b, __p) __arm_vqdmlsdhxq_m_s16(__inactive, __a, __b, __p)
> -#define vqdmulhq_m_n_s8(__inactive, __a, __b, __p) __arm_vqdmulhq_m_n_s8(__inactive, __a, __b, __p)
> -#define vqdmulhq_m_n_s32(__inactive, __a, __b, __p) __arm_vqdmulhq_m_n_s32(__inactive, __a, __b, __p)
> -#define vqdmulhq_m_n_s16(__inactive, __a, __b, __p) __arm_vqdmulhq_m_n_s16(__inactive, __a, __b, __p)
> -#define vqdmulhq_m_s8(__inactive, __a, __b, __p) __arm_vqdmulhq_m_s8(__inactive, __a, __b, __p)
> -#define vqdmulhq_m_s32(__inactive, __a, __b, __p) __arm_vqdmulhq_m_s32(__inactive, __a, __b, __p)
> -#define vqdmulhq_m_s16(__inactive, __a, __b, __p) __arm_vqdmulhq_m_s16(__inactive, __a, __b, __p)
>  #define vqrdmladhq_m_s8(__inactive, __a, __b, __p) __arm_vqrdmladhq_m_s8(__inactive, __a, __b, __p)
>  #define vqrdmladhq_m_s32(__inactive, __a, __b, __p) __arm_vqrdmladhq_m_s32(__inactive, __a, __b, __p)
>  #define vqrdmladhq_m_s16(__inactive, __a, __b, __p) __arm_vqrdmladhq_m_s16(__inactive, __a, __b, __p)
> @@ -1805,30 +1664,6 @@
>  #define vqshlq_m_u8(__inactive, __a, __b, __p) __arm_vqshlq_m_u8(__inactive, __a, __b, __p)
>  #define vqshlq_m_u32(__inactive, __a, __b, __p) __arm_vqshlq_m_u32(__inactive, __a, __b, __p)
>  #define vqshlq_m_u16(__inactive, __a, __b, __p) __arm_vqshlq_m_u16(__inactive, __a, __b, __p)
> -#define vqsubq_m_n_s8(__inactive, __a, __b, __p) __arm_vqsubq_m_n_s8(__inactive, __a, __b, __p)
> -#define vqsubq_m_n_s32(__inactive, __a, __b, __p) __arm_vqsubq_m_n_s32(__inactive, __a, __b, __p)
> -#define vqsubq_m_n_s16(__inactive, __a, __b, __p) __arm_vqsubq_m_n_s16(__inactive, __a, __b, __p)
> -#define vqsubq_m_n_u8(__inactive, __a, __b, __p) __arm_vqsubq_m_n_u8(__inactive, __a, __b, __p)
> -#define vqsubq_m_n_u32(__inactive, __a, __b, __p) __arm_vqsubq_m_n_u32(__inactive, __a, __b, __p)
> -#define vqsubq_m_n_u16(__inactive, __a, __b, __p) __arm_vqsubq_m_n_u16(__inactive, __a, __b, __p)
> -#define vqsubq_m_s8(__inactive, __a, __b, __p) __arm_vqsubq_m_s8(__inactive, __a, __b, __p)
> -#define vqsubq_m_s32(__inactive, __a, __b, __p) __arm_vqsubq_m_s32(__inactive, __a, __b, __p)
> -#define vqsubq_m_s16(__inactive, __a, __b, __p) __arm_vqsubq_m_s16(__inactive, __a, __b, __p)
> -#define vqsubq_m_u8(__inactive, __a, __b, __p) __arm_vqsubq_m_u8(__inactive, __a, __b, __p)
> -#define vqsubq_m_u32(__inactive, __a, __b, __p) __arm_vqsubq_m_u32(__inactive, __a, __b, __p)
> -#define vqsubq_m_u16(__inactive, __a, __b, __p) __arm_vqsubq_m_u16(__inactive, __a, __b, __p)
> -#define vrhaddq_m_s8(__inactive, __a, __b, __p) __arm_vrhaddq_m_s8(__inactive, __a, __b, __p)
> -#define vrhaddq_m_s32(__inactive, __a, __b, __p) __arm_vrhaddq_m_s32(__inactive, __a, __b, __p)
> -#define vrhaddq_m_s16(__inactive, __a, __b, __p) __arm_vrhaddq_m_s16(__inactive, __a, __b, __p)
> -#define vrhaddq_m_u8(__inactive, __a, __b, __p) __arm_vrhaddq_m_u8(__inactive, __a, __b, __p)
> -#define vrhaddq_m_u32(__inactive, __a, __b, __p) __arm_vrhaddq_m_u32(__inactive, __a, __b, __p)
> -#define vrhaddq_m_u16(__inactive, __a, __b, __p) __arm_vrhaddq_m_u16(__inactive, __a, __b, __p)
> -#define vrmulhq_m_s8(__inactive, __a, __b, __p) __arm_vrmulhq_m_s8(__inactive, __a, __b, __p)
> -#define vrmulhq_m_s32(__inactive, __a, __b, __p) __arm_vrmulhq_m_s32(__inactive, __a, __b, __p)
> -#define vrmulhq_m_s16(__inactive, __a, __b, __p) __arm_vrmulhq_m_s16(__inactive, __a, __b, __p)
> -#define vrmulhq_m_u8(__inactive, __a, __b, __p) __arm_vrmulhq_m_u8(__inactive, __a, __b, __p)
> -#define vrmulhq_m_u32(__inactive, __a, __b, __p) __arm_vrmulhq_m_u32(__inactive, __a, __b, __p)
> -#define vrmulhq_m_u16(__inactive, __a, __b, __p) __arm_vrmulhq_m_u16(__inactive, __a, __b, __p)
>  #define vrshlq_m_s8(__inactive, __a, __b, __p) __arm_vrshlq_m_s8(__inactive, __a, __b, __p)
>  #define vrshlq_m_s32(__inactive, __a, __b, __p) __arm_vrshlq_m_s32(__inactive, __a, __b, __p)
>  #define vrshlq_m_s16(__inactive, __a, __b, __p) __arm_vrshlq_m_s16(__inactive, __a, __b, __p)
> @@ -2315,12 +2150,6 @@
>  #define vnegq_x_s8(__a, __p) __arm_vnegq_x_s8(__a, __p)
>  #define vnegq_x_s16(__a, __p) __arm_vnegq_x_s16(__a, __p)
>  #define vnegq_x_s32(__a, __p) __arm_vnegq_x_s32(__a, __p)
> -#define vmulhq_x_s8(__a, __b, __p) __arm_vmulhq_x_s8(__a, __b, __p)
> -#define vmulhq_x_s16(__a, __b, __p) __arm_vmulhq_x_s16(__a, __b, __p)
> -#define vmulhq_x_s32(__a, __b, __p) __arm_vmulhq_x_s32(__a, __b, __p)
> -#define vmulhq_x_u8(__a, __b, __p) __arm_vmulhq_x_u8(__a, __b, __p)
> -#define vmulhq_x_u16(__a, __b, __p) __arm_vmulhq_x_u16(__a, __b, __p)
> -#define vmulhq_x_u32(__a, __b, __p) __arm_vmulhq_x_u32(__a, __b, __p)
>  #define vmullbq_poly_x_p8(__a, __b, __p) __arm_vmullbq_poly_x_p8(__a, __b, __p)
>  #define vmullbq_poly_x_p16(__a, __b, __p) __arm_vmullbq_poly_x_p16(__a, __b, __p)
>  #define vmullbq_int_x_s8(__a, __b, __p) __arm_vmullbq_int_x_s8(__a, __b, __p)
> @@ -2349,48 +2178,12 @@
>  #define vcaddq_rot270_x_u8(__a, __b, __p) __arm_vcaddq_rot270_x_u8(__a, __b, __p)
>  #define vcaddq_rot270_x_u16(__a, __b, __p) __arm_vcaddq_rot270_x_u16(__a, __b, __p)
>  #define vcaddq_rot270_x_u32(__a, __b, __p) __arm_vcaddq_rot270_x_u32(__a, __b, __p)
> -#define vhaddq_x_n_s8(__a, __b, __p) __arm_vhaddq_x_n_s8(__a, __b, __p)
> -#define vhaddq_x_n_s16(__a, __b, __p) __arm_vhaddq_x_n_s16(__a, __b, __p)
> -#define vhaddq_x_n_s32(__a, __b, __p) __arm_vhaddq_x_n_s32(__a, __b, __p)
> -#define vhaddq_x_n_u8(__a, __b, __p) __arm_vhaddq_x_n_u8(__a, __b, __p)
> -#define vhaddq_x_n_u16(__a, __b, __p) __arm_vhaddq_x_n_u16(__a, __b, __p)
> -#define vhaddq_x_n_u32(__a, __b, __p) __arm_vhaddq_x_n_u32(__a, __b, __p)
> -#define vhaddq_x_s8(__a, __b, __p) __arm_vhaddq_x_s8(__a, __b, __p)
> -#define vhaddq_x_s16(__a, __b, __p) __arm_vhaddq_x_s16(__a, __b, __p)
> -#define vhaddq_x_s32(__a, __b, __p) __arm_vhaddq_x_s32(__a, __b, __p)
> -#define vhaddq_x_u8(__a, __b, __p) __arm_vhaddq_x_u8(__a, __b, __p)
> -#define vhaddq_x_u16(__a, __b, __p) __arm_vhaddq_x_u16(__a, __b, __p)
> -#define vhaddq_x_u32(__a, __b, __p) __arm_vhaddq_x_u32(__a, __b, __p)
>  #define vhcaddq_rot90_x_s8(__a, __b, __p) __arm_vhcaddq_rot90_x_s8(__a, __b, __p)
>  #define vhcaddq_rot90_x_s16(__a, __b, __p) __arm_vhcaddq_rot90_x_s16(__a, __b, __p)
>  #define vhcaddq_rot90_x_s32(__a, __b, __p) __arm_vhcaddq_rot90_x_s32(__a, __b, __p)
>  #define vhcaddq_rot270_x_s8(__a, __b, __p) __arm_vhcaddq_rot270_x_s8(__a, __b, __p)
>  #define vhcaddq_rot270_x_s16(__a, __b, __p) __arm_vhcaddq_rot270_x_s16(__a, __b, __p)
>  #define vhcaddq_rot270_x_s32(__a, __b, __p) __arm_vhcaddq_rot270_x_s32(__a, __b, __p)
> -#define vhsubq_x_n_s8(__a, __b, __p) __arm_vhsubq_x_n_s8(__a, __b, __p)
> -#define vhsubq_x_n_s16(__a, __b, __p) __arm_vhsubq_x_n_s16(__a, __b, __p)
> -#define vhsubq_x_n_s32(__a, __b, __p) __arm_vhsubq_x_n_s32(__a, __b, __p)
> -#define vhsubq_x_n_u8(__a, __b, __p) __arm_vhsubq_x_n_u8(__a, __b, __p)
> -#define vhsubq_x_n_u16(__a, __b, __p) __arm_vhsubq_x_n_u16(__a, __b, __p)
> -#define vhsubq_x_n_u32(__a, __b, __p) __arm_vhsubq_x_n_u32(__a, __b, __p)
> -#define vhsubq_x_s8(__a, __b, __p) __arm_vhsubq_x_s8(__a, __b, __p)
> -#define vhsubq_x_s16(__a, __b, __p) __arm_vhsubq_x_s16(__a, __b, __p)
> -#define vhsubq_x_s32(__a, __b, __p) __arm_vhsubq_x_s32(__a, __b, __p)
> -#define vhsubq_x_u8(__a, __b, __p) __arm_vhsubq_x_u8(__a, __b, __p)
> -#define vhsubq_x_u16(__a, __b, __p) __arm_vhsubq_x_u16(__a, __b, __p)
> -#define vhsubq_x_u32(__a, __b, __p) __arm_vhsubq_x_u32(__a, __b, __p)
> -#define vrhaddq_x_s8(__a, __b, __p) __arm_vrhaddq_x_s8(__a, __b, __p)
> -#define vrhaddq_x_s16(__a, __b, __p) __arm_vrhaddq_x_s16(__a, __b, __p)
> -#define vrhaddq_x_s32(__a, __b, __p) __arm_vrhaddq_x_s32(__a, __b, __p)
> -#define vrhaddq_x_u8(__a, __b, __p) __arm_vrhaddq_x_u8(__a, __b, __p)
> -#define vrhaddq_x_u16(__a, __b, __p) __arm_vrhaddq_x_u16(__a, __b, __p)
> -#define vrhaddq_x_u32(__a, __b, __p) __arm_vrhaddq_x_u32(__a, __b, __p)
> -#define vrmulhq_x_s8(__a, __b, __p) __arm_vrmulhq_x_s8(__a, __b, __p)
> -#define vrmulhq_x_s16(__a, __b, __p) __arm_vrmulhq_x_s16(__a, __b, __p)
> -#define vrmulhq_x_s32(__a, __b, __p) __arm_vrmulhq_x_s32(__a, __b, __p)
> -#define vrmulhq_x_u8(__a, __b, __p) __arm_vrmulhq_x_u8(__a, __b, __p)
> -#define vrmulhq_x_u16(__a, __b, __p) __arm_vrmulhq_x_u16(__a, __b, __p)
> -#define vrmulhq_x_u32(__a, __b, __p) __arm_vrmulhq_x_u32(__a, __b, __p)
>  #define vbicq_x_s8(__a, __b, __p) __arm_vbicq_x_s8(__a, __b, __p)
>  #define vbicq_x_s16(__a, __b, __p) __arm_vbicq_x_s16(__a, __b, __p)
>  #define vbicq_x_s32(__a, __b, __p) __arm_vbicq_x_s32(__a, __b, __p)
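
(Note for reviewers: since the lines removed above were only the
_Generic-based dispatch macros, user code calling these intrinsics
should be unaffected; overloading is now resolved by the builtin
framework in the front end instead.  A sketch of the unchanged usage,
assuming the framework registers the same overloads:

  #include <arm_mve.h>

  uint8x16_t
  g (uint8x16_t a, uint8x16_t b, mve_pred16_t p)
  {
    /* Formerly dispatched via the #defines deleted above; now
       resolved directly by the compiler.  */
    return vrmulhq_x (a, b, p);
  }
)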
> @@ -3351,48 +3144,6 @@ __arm_vshlq_u32 (uint32x4_t __a, int32x4_t __b)
>    return __builtin_mve_vshlq_uv4si (__a, __b);
>  }
> 
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq_u8 (uint8x16_t __a, uint8x16_t __b)
> -{
> -  return __builtin_mve_vrmulhq_uv16qi (__a, __b);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq_u8 (uint8x16_t __a, uint8x16_t __b)
> -{
> -  return __builtin_mve_vrhaddq_uv16qi (__a, __b);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_u8 (uint8x16_t __a, uint8x16_t __b)
> -{
> -  return __builtin_mve_vqsubq_uv16qi (__a, __b);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_n_u8 (uint8x16_t __a, uint8_t __b)
> -{
> -  return __builtin_mve_vqsubq_n_uv16qi (__a, __b);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_u8 (uint8x16_t __a, uint8x16_t __b)
> -{
> -  return __builtin_mve_vqaddq_uv16qi (__a, __b);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_n_u8 (uint8x16_t __a, uint8_t __b)
> -{
> -  return __builtin_mve_vqaddq_n_uv16qi (__a, __b);
> -}
> -
>  __extension__ extern __inline uint8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vornq_u8 (uint8x16_t __a, uint8x16_t __b)
> @@ -3414,13 +3165,6 @@ __arm_vmullbq_int_u8 (uint8x16_t __a, uint8x16_t __b)
>    return __builtin_mve_vmullbq_int_uv16qi (__a, __b);
>  }
> 
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq_u8 (uint8x16_t __a, uint8x16_t __b)
> -{
> -  return __builtin_mve_vmulhq_uv16qi (__a, __b);
> -}
> -
>  __extension__ extern __inline uint32_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vmladavq_u8 (uint8x16_t __a, uint8x16_t __b)
> @@ -3456,34 +3200,6 @@ __arm_vmaxq_u8 (uint8x16_t __a, uint8x16_t __b)
>    return __builtin_mve_vmaxq_uv16qi (__a, __b);
>  }
> 
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_u8 (uint8x16_t __a, uint8x16_t __b)
> -{
> -  return __builtin_mve_vhsubq_uv16qi (__a, __b);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_n_u8 (uint8x16_t __a, uint8_t __b)
> -{
> -  return __builtin_mve_vhsubq_n_uv16qi (__a, __b);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_u8 (uint8x16_t __a, uint8x16_t __b)
> -{
> -  return __builtin_mve_vhaddq_uv16qi (__a, __b);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_n_u8 (uint8x16_t __a, uint8_t __b)
> -{
> -  return __builtin_mve_vhaddq_n_uv16qi (__a, __b);
> -}
> -
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq_n_u8 (uint8x16_t __a, uint8_t __b)
> @@ -3794,34 +3510,6 @@ __arm_vrshlq_n_s8 (int8x16_t __a, int32_t __b)
>    return __builtin_mve_vrshlq_n_sv16qi (__a, __b);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq_s8 (int8x16_t __a, int8x16_t __b)
> -{
> -  return __builtin_mve_vrmulhq_sv16qi (__a, __b);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq_s8 (int8x16_t __a, int8x16_t __b)
> -{
> -  return __builtin_mve_vrhaddq_sv16qi (__a, __b);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_s8 (int8x16_t __a, int8x16_t __b)
> -{
> -  return __builtin_mve_vqsubq_sv16qi (__a, __b);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_n_s8 (int8x16_t __a, int8_t __b)
> -{
> -  return __builtin_mve_vqsubq_n_sv16qi (__a, __b);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vqshlq_s8 (int8x16_t __a, int8x16_t __b)
> @@ -3864,34 +3552,6 @@ __arm_vqrdmulhq_n_s8 (int8x16_t __a, int8_t __b)
>    return __builtin_mve_vqrdmulhq_n_sv16qi (__a, __b);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqdmulhq_s8 (int8x16_t __a, int8x16_t __b)
> -{
> -  return __builtin_mve_vqdmulhq_sv16qi (__a, __b);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqdmulhq_n_s8 (int8x16_t __a, int8_t __b)
> -{
> -  return __builtin_mve_vqdmulhq_n_sv16qi (__a, __b);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_s8 (int8x16_t __a, int8x16_t __b)
> -{
> -  return __builtin_mve_vqaddq_sv16qi (__a, __b);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_n_s8 (int8x16_t __a, int8_t __b)
> -{
> -  return __builtin_mve_vqaddq_n_sv16qi (__a, __b);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vornq_s8 (int8x16_t __a, int8x16_t __b)
> @@ -3913,13 +3573,6 @@ __arm_vmullbq_int_s8 (int8x16_t __a, int8x16_t __b)
>    return __builtin_mve_vmullbq_int_sv16qi (__a, __b);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq_s8 (int8x16_t __a, int8x16_t __b)
> -{
> -  return __builtin_mve_vmulhq_sv16qi (__a, __b);
> -}
> -
>  __extension__ extern __inline int32_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vmlsdavxq_s8 (int8x16_t __a, int8x16_t __b)
> @@ -3976,20 +3629,6 @@ __arm_vmaxq_s8 (int8x16_t __a, int8x16_t __b)
>    return __builtin_mve_vmaxq_sv16qi (__a, __b);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_s8 (int8x16_t __a, int8x16_t __b)
> -{
> -  return __builtin_mve_vhsubq_sv16qi (__a, __b);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_n_s8 (int8x16_t __a, int8_t __b)
> -{
> -  return __builtin_mve_vhsubq_n_sv16qi (__a, __b);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vhcaddq_rot90_s8 (int8x16_t __a, int8x16_t __b)
> @@ -4004,20 +3643,6 @@ __arm_vhcaddq_rot270_s8 (int8x16_t __a, int8x16_t __b)
>    return __builtin_mve_vhcaddq_rot270_sv16qi (__a, __b);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_s8 (int8x16_t __a, int8x16_t __b)
> -{
> -  return __builtin_mve_vhaddq_sv16qi (__a, __b);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_n_s8 (int8x16_t __a, int8_t __b)
> -{
> -  return __builtin_mve_vhaddq_n_sv16qi (__a, __b);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcaddq_rot90_s8 (int8x16_t __a, int8x16_t __b)
> @@ -4081,48 +3706,6 @@ __arm_vqshlq_n_s8 (int8x16_t __a, const int __imm)
>    return __builtin_mve_vqshlq_n_sv16qi (__a, __imm);
>  }
> 
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq_u16 (uint16x8_t __a, uint16x8_t __b)
> -{
> -  return __builtin_mve_vrmulhq_uv8hi (__a, __b);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq_u16 (uint16x8_t __a, uint16x8_t __b)
> -{
> -  return __builtin_mve_vrhaddq_uv8hi (__a, __b);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_u16 (uint16x8_t __a, uint16x8_t __b)
> -{
> -  return __builtin_mve_vqsubq_uv8hi (__a, __b);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_n_u16 (uint16x8_t __a, uint16_t __b)
> -{
> -  return __builtin_mve_vqsubq_n_uv8hi (__a, __b);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_u16 (uint16x8_t __a, uint16x8_t __b)
> -{
> -  return __builtin_mve_vqaddq_uv8hi (__a, __b);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_n_u16 (uint16x8_t __a, uint16_t __b)
> -{
> -  return __builtin_mve_vqaddq_n_uv8hi (__a, __b);
> -}
> -
>  __extension__ extern __inline uint16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vornq_u16 (uint16x8_t __a, uint16x8_t __b)
> @@ -4144,13 +3727,6 @@ __arm_vmullbq_int_u16 (uint16x8_t __a, uint16x8_t __b)
>    return __builtin_mve_vmullbq_int_uv8hi (__a, __b);
>  }
> 
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq_u16 (uint16x8_t __a, uint16x8_t __b)
> -{
> -  return __builtin_mve_vmulhq_uv8hi (__a, __b);
> -}
> -
>  __extension__ extern __inline uint32_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vmladavq_u16 (uint16x8_t __a, uint16x8_t __b)
> @@ -4186,34 +3762,6 @@ __arm_vmaxq_u16 (uint16x8_t __a, uint16x8_t __b)
>    return __builtin_mve_vmaxq_uv8hi (__a, __b);
>  }
> 
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_u16 (uint16x8_t __a, uint16x8_t __b)
> -{
> -  return __builtin_mve_vhsubq_uv8hi (__a, __b);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_n_u16 (uint16x8_t __a, uint16_t __b)
> -{
> -  return __builtin_mve_vhsubq_n_uv8hi (__a, __b);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_u16 (uint16x8_t __a, uint16x8_t __b)
> -{
> -  return __builtin_mve_vhaddq_uv8hi (__a, __b);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_n_u16 (uint16x8_t __a, uint16_t __b)
> -{
> -  return __builtin_mve_vhaddq_n_uv8hi (__a, __b);
> -}
> -
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq_n_u16 (uint16x8_t __a, uint16_t __b)
> @@ -4524,34 +4072,6 @@ __arm_vrshlq_n_s16 (int16x8_t __a, int32_t __b)
>    return __builtin_mve_vrshlq_n_sv8hi (__a, __b);
>  }
> 
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq_s16 (int16x8_t __a, int16x8_t __b)
> -{
> -  return __builtin_mve_vrmulhq_sv8hi (__a, __b);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq_s16 (int16x8_t __a, int16x8_t __b)
> -{
> -  return __builtin_mve_vrhaddq_sv8hi (__a, __b);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_s16 (int16x8_t __a, int16x8_t __b)
> -{
> -  return __builtin_mve_vqsubq_sv8hi (__a, __b);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_n_s16 (int16x8_t __a, int16_t __b)
> -{
> -  return __builtin_mve_vqsubq_n_sv8hi (__a, __b);
> -}
> -
>  __extension__ extern __inline int16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vqshlq_s16 (int16x8_t __a, int16x8_t __b)
> @@ -4594,34 +4114,6 @@ __arm_vqrdmulhq_n_s16 (int16x8_t __a, int16_t __b)
>    return __builtin_mve_vqrdmulhq_n_sv8hi (__a, __b);
>  }
> 
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqdmulhq_s16 (int16x8_t __a, int16x8_t __b)
> -{
> -  return __builtin_mve_vqdmulhq_sv8hi (__a, __b);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqdmulhq_n_s16 (int16x8_t __a, int16_t __b)
> -{
> -  return __builtin_mve_vqdmulhq_n_sv8hi (__a, __b);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_s16 (int16x8_t __a, int16x8_t __b)
> -{
> -  return __builtin_mve_vqaddq_sv8hi (__a, __b);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_n_s16 (int16x8_t __a, int16_t __b)
> -{
> -  return __builtin_mve_vqaddq_n_sv8hi (__a, __b);
> -}
> -
>  __extension__ extern __inline int16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vornq_s16 (int16x8_t __a, int16x8_t __b)
> @@ -4643,13 +4135,6 @@ __arm_vmullbq_int_s16 (int16x8_t __a, int16x8_t __b)
>    return __builtin_mve_vmullbq_int_sv8hi (__a, __b);
>  }
> 
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq_s16 (int16x8_t __a, int16x8_t __b)
> -{
> -  return __builtin_mve_vmulhq_sv8hi (__a, __b);
> -}
> -
>  __extension__ extern __inline int32_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vmlsdavxq_s16 (int16x8_t __a, int16x8_t __b)
> @@ -4706,20 +4191,6 @@ __arm_vmaxq_s16 (int16x8_t __a, int16x8_t __b)
>    return __builtin_mve_vmaxq_sv8hi (__a, __b);
>  }
> 
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_s16 (int16x8_t __a, int16x8_t __b)
> -{
> -  return __builtin_mve_vhsubq_sv8hi (__a, __b);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_n_s16 (int16x8_t __a, int16_t __b)
> -{
> -  return __builtin_mve_vhsubq_n_sv8hi (__a, __b);
> -}
> -
>  __extension__ extern __inline int16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vhcaddq_rot90_s16 (int16x8_t __a, int16x8_t __b)
> @@ -4734,20 +4205,6 @@ __arm_vhcaddq_rot270_s16 (int16x8_t __a, int16x8_t __b)
>    return __builtin_mve_vhcaddq_rot270_sv8hi (__a, __b);
>  }
> 
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_s16 (int16x8_t __a, int16x8_t __b)
> -{
> -  return __builtin_mve_vhaddq_sv8hi (__a, __b);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_n_s16 (int16x8_t __a, int16_t __b)
> -{
> -  return __builtin_mve_vhaddq_n_sv8hi (__a, __b);
> -}
> -
>  __extension__ extern __inline int16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcaddq_rot90_s16 (int16x8_t __a, int16x8_t __b)
> @@ -4811,48 +4268,6 @@ __arm_vqshlq_n_s16 (int16x8_t __a, const int __imm)
>    return __builtin_mve_vqshlq_n_sv8hi (__a, __imm);
>  }
> 
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq_u32 (uint32x4_t __a, uint32x4_t __b)
> -{
> -  return __builtin_mve_vrmulhq_uv4si (__a, __b);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq_u32 (uint32x4_t __a, uint32x4_t __b)
> -{
> -  return __builtin_mve_vrhaddq_uv4si (__a, __b);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_u32 (uint32x4_t __a, uint32x4_t __b)
> -{
> -  return __builtin_mve_vqsubq_uv4si (__a, __b);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_n_u32 (uint32x4_t __a, uint32_t __b)
> -{
> -  return __builtin_mve_vqsubq_n_uv4si (__a, __b);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_u32 (uint32x4_t __a, uint32x4_t __b)
> -{
> -  return __builtin_mve_vqaddq_uv4si (__a, __b);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_n_u32 (uint32x4_t __a, uint32_t __b)
> -{
> -  return __builtin_mve_vqaddq_n_uv4si (__a, __b);
> -}
> -
>  __extension__ extern __inline uint32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vornq_u32 (uint32x4_t __a, uint32x4_t __b)
> @@ -4874,13 +4289,6 @@ __arm_vmullbq_int_u32 (uint32x4_t __a, uint32x4_t __b)
>    return __builtin_mve_vmullbq_int_uv4si (__a, __b);
>  }
> 
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq_u32 (uint32x4_t __a, uint32x4_t __b)
> -{
> -  return __builtin_mve_vmulhq_uv4si (__a, __b);
> -}
> -
>  __extension__ extern __inline uint32_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vmladavq_u32 (uint32x4_t __a, uint32x4_t __b)
> @@ -4916,34 +4324,6 @@ __arm_vmaxq_u32 (uint32x4_t __a, uint32x4_t __b)
>    return __builtin_mve_vmaxq_uv4si (__a, __b);
>  }
> 
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_u32 (uint32x4_t __a, uint32x4_t __b)
> -{
> -  return __builtin_mve_vhsubq_uv4si (__a, __b);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_n_u32 (uint32x4_t __a, uint32_t __b)
> -{
> -  return __builtin_mve_vhsubq_n_uv4si (__a, __b);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_u32 (uint32x4_t __a, uint32x4_t __b)
> -{
> -  return __builtin_mve_vhaddq_uv4si (__a, __b);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_n_u32 (uint32x4_t __a, uint32_t __b)
> -{
> -  return __builtin_mve_vhaddq_n_uv4si (__a, __b);
> -}
> -
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq_n_u32 (uint32x4_t __a, uint32_t __b)
> @@ -5254,34 +4634,6 @@ __arm_vrshlq_n_s32 (int32x4_t __a, int32_t __b)
>    return __builtin_mve_vrshlq_n_sv4si (__a, __b);
>  }
> 
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq_s32 (int32x4_t __a, int32x4_t __b)
> -{
> -  return __builtin_mve_vrmulhq_sv4si (__a, __b);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq_s32 (int32x4_t __a, int32x4_t __b)
> -{
> -  return __builtin_mve_vrhaddq_sv4si (__a, __b);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_s32 (int32x4_t __a, int32x4_t __b)
> -{
> -  return __builtin_mve_vqsubq_sv4si (__a, __b);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_n_s32 (int32x4_t __a, int32_t __b)
> -{
> -  return __builtin_mve_vqsubq_n_sv4si (__a, __b);
> -}
> -
>  __extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vqshlq_s32 (int32x4_t __a, int32x4_t __b)
> @@ -5324,34 +4676,6 @@ __arm_vqrdmulhq_n_s32 (int32x4_t __a, int32_t __b)
>    return __builtin_mve_vqrdmulhq_n_sv4si (__a, __b);
>  }
> 
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqdmulhq_s32 (int32x4_t __a, int32x4_t __b)
> -{
> -  return __builtin_mve_vqdmulhq_sv4si (__a, __b);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqdmulhq_n_s32 (int32x4_t __a, int32_t __b)
> -{
> -  return __builtin_mve_vqdmulhq_n_sv4si (__a, __b);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_s32 (int32x4_t __a, int32x4_t __b)
> -{
> -  return __builtin_mve_vqaddq_sv4si (__a, __b);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_n_s32 (int32x4_t __a, int32_t __b)
> -{
> -  return __builtin_mve_vqaddq_n_sv4si (__a, __b);
> -}
> -
>  __extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vornq_s32 (int32x4_t __a, int32x4_t __b)
> @@ -5373,13 +4697,6 @@ __arm_vmullbq_int_s32 (int32x4_t __a, int32x4_t __b)
>    return __builtin_mve_vmullbq_int_sv4si (__a, __b);
>  }
> 
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq_s32 (int32x4_t __a, int32x4_t __b)
> -{
> -  return __builtin_mve_vmulhq_sv4si (__a, __b);
> -}
> -
>  __extension__ extern __inline int32_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vmlsdavxq_s32 (int32x4_t __a, int32x4_t __b)
> @@ -5436,20 +4753,6 @@ __arm_vmaxq_s32 (int32x4_t __a, int32x4_t __b)
>    return __builtin_mve_vmaxq_sv4si (__a, __b);
>  }
> 
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_s32 (int32x4_t __a, int32x4_t __b)
> -{
> -  return __builtin_mve_vhsubq_sv4si (__a, __b);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_n_s32 (int32x4_t __a, int32_t __b)
> -{
> -  return __builtin_mve_vhsubq_n_sv4si (__a, __b);
> -}
> -
>  __extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vhcaddq_rot90_s32 (int32x4_t __a, int32x4_t __b)
> @@ -5464,20 +4767,6 @@ __arm_vhcaddq_rot270_s32 (int32x4_t __a, int32x4_t __b)
>    return __builtin_mve_vhcaddq_rot270_sv4si (__a, __b);
>  }
> 
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_s32 (int32x4_t __a, int32x4_t __b)
> -{
> -  return __builtin_mve_vhaddq_sv4si (__a, __b);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_n_s32 (int32x4_t __a, int32_t __b)
> -{
> -  return __builtin_mve_vhaddq_n_sv4si (__a, __b);
> -}
> -
>  __extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcaddq_rot90_s32 (int32x4_t __a, int32x4_t __b)
> @@ -9005,90 +8294,6 @@ __arm_vcaddq_rot90_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b,
>    return __builtin_mve_vcaddq_rot90_m_uv8hi (__inactive, __a, __b, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhaddq_m_n_sv16qi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhaddq_m_n_sv4si (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhaddq_m_n_sv8hi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_m_n_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhaddq_m_n_uv16qi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_m_n_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhaddq_m_n_uv4si (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_m_n_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhaddq_m_n_uv8hi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhaddq_m_sv16qi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhaddq_m_sv4si (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhaddq_m_sv8hi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_m_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhaddq_m_uv16qi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhaddq_m_uv4si (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhaddq_m_uv8hi (__inactive, __a, __b, __p);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vhcaddq_rot270_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> @@ -9131,90 +8336,6 @@ __arm_vhcaddq_rot90_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, m
>    return __builtin_mve_vhcaddq_rot90_m_sv8hi (__inactive, __a, __b, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhsubq_m_n_sv16qi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhsubq_m_n_sv4si (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhsubq_m_n_sv8hi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_m_n_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhsubq_m_n_uv16qi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_m_n_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhsubq_m_n_uv4si (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_m_n_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhsubq_m_n_uv8hi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhsubq_m_sv16qi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhsubq_m_sv4si (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhsubq_m_sv8hi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_m_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhsubq_m_uv16qi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhsubq_m_uv4si (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhsubq_m_uv8hi (__inactive, __a, __b, __p);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vmaxq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> @@ -9488,48 +8609,6 @@ __arm_vmlsdavaxq_p_s16 (int32_t __a, int16x8_t __b, int16x8_t __c, mve_pred16_t
>    return __builtin_mve_vmlsdavaxq_p_sv8hi (__a, __b, __c, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vmulhq_m_sv16qi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vmulhq_m_sv4si (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vmulhq_m_sv8hi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq_m_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vmulhq_m_uv16qi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vmulhq_m_uv4si (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vmulhq_m_uv8hi (__inactive, __a, __b, __p);
> -}
> -
>  __extension__ extern __inline int16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vmullbq_int_m_s8 (int16x8_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> @@ -9656,90 +8735,6 @@ __arm_vornq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pr
>    return __builtin_mve_vornq_m_uv8hi (__inactive, __a, __b, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vqaddq_m_n_sv16qi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vqaddq_m_n_sv4si (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vqaddq_m_n_sv8hi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_m_n_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vqaddq_m_n_uv16qi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_m_n_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vqaddq_m_n_uv4si (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_m_n_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vqaddq_m_n_uv8hi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vqaddq_m_sv16qi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vqaddq_m_sv4si (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vqaddq_m_sv8hi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_m_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vqaddq_m_uv16qi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vqaddq_m_uv4si (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vqaddq_m_uv8hi (__inactive, __a, __b, __p);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vqdmladhq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> @@ -9845,48 +8840,6 @@ __arm_vqdmlsdhxq_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_
>    return __builtin_mve_vqdmlsdhxq_m_sv8hi (__inactive, __a, __b, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqdmulhq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vqdmulhq_m_n_sv16qi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqdmulhq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vqdmulhq_m_n_sv4si (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqdmulhq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vqdmulhq_m_n_sv8hi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqdmulhq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vqdmulhq_m_sv16qi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqdmulhq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vqdmulhq_m_sv4si (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqdmulhq_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vqdmulhq_m_sv8hi (__inactive, __a, __b, __p);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vqrdmladhq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> @@ -10202,174 +9155,6 @@ __arm_vqshlq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, int16x8_t __b, mve_pr
>    return __builtin_mve_vqshlq_m_uv8hi (__inactive, __a, __b, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vqsubq_m_n_sv16qi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vqsubq_m_n_sv4si (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vqsubq_m_n_sv8hi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_m_n_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vqsubq_m_n_uv16qi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_m_n_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vqsubq_m_n_uv4si (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_m_n_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vqsubq_m_n_uv8hi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vqsubq_m_sv16qi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vqsubq_m_sv4si (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vqsubq_m_sv8hi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_m_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vqsubq_m_uv16qi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vqsubq_m_uv4si (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vqsubq_m_uv8hi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vrhaddq_m_sv16qi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vrhaddq_m_sv4si (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vrhaddq_m_sv8hi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq_m_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vrhaddq_m_uv16qi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vrhaddq_m_uv4si (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vrhaddq_m_uv8hi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vrmulhq_m_sv16qi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vrmulhq_m_sv4si (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq_m_s16 (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vrmulhq_m_sv8hi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq_m_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vrmulhq_m_uv16qi (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vrmulhq_m_uv4si (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vrmulhq_m_uv8hi (__inactive, __a, __b, __p);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vrshlq_m_s8 (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> @@ -13289,48 +12074,6 @@ __arm_vnegq_x_s32 (int32x4_t __a, mve_pred16_t __p)
>    return __builtin_mve_vnegq_m_sv4si (__arm_vuninitializedq_s32 (), __a, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vmulhq_m_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq_x_s16 (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vmulhq_m_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq_x_s32 (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vmulhq_m_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq_x_u8 (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vmulhq_m_uv16qi (__arm_vuninitializedq_u8 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq_x_u16 (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vmulhq_m_uv8hi (__arm_vuninitializedq_u16 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vmulhq_m_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
> -}
> -
>  __extension__ extern __inline uint16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vmullbq_poly_x_p8 (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> @@ -13527,90 +12270,6 @@ __arm_vcaddq_rot270_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
>    return __builtin_mve_vcaddq_rot270_m_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_x_n_s8 (int8x16_t __a, int8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhaddq_m_n_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_x_n_s16 (int16x8_t __a, int16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhaddq_m_n_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_x_n_s32 (int32x4_t __a, int32_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhaddq_m_n_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_x_n_u8 (uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhaddq_m_n_uv16qi (__arm_vuninitializedq_u8 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_x_n_u16 (uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhaddq_m_n_uv8hi (__arm_vuninitializedq_u16 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_x_n_u32 (uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhaddq_m_n_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhaddq_m_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_x_s16 (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhaddq_m_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_x_s32 (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhaddq_m_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_x_u8 (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhaddq_m_uv16qi (__arm_vuninitializedq_u8 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_x_u16 (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhaddq_m_uv8hi (__arm_vuninitializedq_u16 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhaddq_m_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vhcaddq_rot90_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> @@ -13653,174 +12312,6 @@ __arm_vhcaddq_rot270_x_s32 (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
>    return __builtin_mve_vhcaddq_rot270_m_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_x_n_s8 (int8x16_t __a, int8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhsubq_m_n_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_x_n_s16 (int16x8_t __a, int16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhsubq_m_n_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_x_n_s32 (int32x4_t __a, int32_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhsubq_m_n_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_x_n_u8 (uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhsubq_m_n_uv16qi (__arm_vuninitializedq_u8 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_x_n_u16 (uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhsubq_m_n_uv8hi (__arm_vuninitializedq_u16 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_x_n_u32 (uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhsubq_m_n_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhsubq_m_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_x_s16 (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhsubq_m_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_x_s32 (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhsubq_m_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_x_u8 (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhsubq_m_uv16qi (__arm_vuninitializedq_u8 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_x_u16 (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhsubq_m_uv8hi (__arm_vuninitializedq_u16 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vhsubq_m_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vrhaddq_m_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq_x_s16 (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vrhaddq_m_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq_x_s32 (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vrhaddq_m_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq_x_u8 (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vrhaddq_m_uv16qi (__arm_vuninitializedq_u8 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq_x_u16 (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vrhaddq_m_uv8hi (__arm_vuninitializedq_u16 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vrhaddq_m_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vrmulhq_m_sv16qi (__arm_vuninitializedq_s8 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq_x_s16 (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vrmulhq_m_sv8hi (__arm_vuninitializedq_s16 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq_x_s32 (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vrmulhq_m_sv4si (__arm_vuninitializedq_s32 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq_x_u8 (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vrmulhq_m_uv16qi (__arm_vuninitializedq_u8 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq_x_u16 (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vrmulhq_m_uv8hi (__arm_vuninitializedq_u16 (), __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq_x_u32 (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
> -{
> -  return __builtin_mve_vrmulhq_m_uv4si (__arm_vuninitializedq_u32 (), __a, __b, __p);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vbicq_x_s8 (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> @@ -18558,48 +17049,6 @@ __arm_vshlq (uint32x4_t __a, int32x4_t __b)
>   return __arm_vshlq_u32 (__a, __b);
>  }
> 
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq (uint8x16_t __a, uint8x16_t __b)
> -{
> - return __arm_vrmulhq_u8 (__a, __b);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq (uint8x16_t __a, uint8x16_t __b)
> -{
> - return __arm_vrhaddq_u8 (__a, __b);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq (uint8x16_t __a, uint8x16_t __b)
> -{
> - return __arm_vqsubq_u8 (__a, __b);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq (uint8x16_t __a, uint8_t __b)
> -{
> - return __arm_vqsubq_n_u8 (__a, __b);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq (uint8x16_t __a, uint8x16_t __b)
> -{
> - return __arm_vqaddq_u8 (__a, __b);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq (uint8x16_t __a, uint8_t __b)
> -{
> - return __arm_vqaddq_n_u8 (__a, __b);
> -}
> -
>  __extension__ extern __inline uint8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vornq (uint8x16_t __a, uint8x16_t __b)
> @@ -18621,13 +17070,6 @@ __arm_vmullbq_int (uint8x16_t __a, uint8x16_t __b)
>   return __arm_vmullbq_int_u8 (__a, __b);
>  }
> 
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq (uint8x16_t __a, uint8x16_t __b)
> -{
> - return __arm_vmulhq_u8 (__a, __b);
> -}
> -
>  __extension__ extern __inline uint32_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vmladavq (uint8x16_t __a, uint8x16_t __b)
> @@ -18663,34 +17105,6 @@ __arm_vmaxq (uint8x16_t __a, uint8x16_t __b)
>   return __arm_vmaxq_u8 (__a, __b);
>  }
> 
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq (uint8x16_t __a, uint8x16_t __b)
> -{
> - return __arm_vhsubq_u8 (__a, __b);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq (uint8x16_t __a, uint8_t __b)
> -{
> - return __arm_vhsubq_n_u8 (__a, __b);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq (uint8x16_t __a, uint8x16_t __b)
> -{
> - return __arm_vhaddq_u8 (__a, __b);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq (uint8x16_t __a, uint8_t __b)
> -{
> - return __arm_vhaddq_n_u8 (__a, __b);
> -}
> -
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq (uint8x16_t __a, uint8_t __b)
> @@ -18999,34 +17413,6 @@ __arm_vrshlq (int8x16_t __a, int32_t __b)
>   return __arm_vrshlq_n_s8 (__a, __b);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq (int8x16_t __a, int8x16_t __b)
> -{
> - return __arm_vrmulhq_s8 (__a, __b);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq (int8x16_t __a, int8x16_t __b)
> -{
> - return __arm_vrhaddq_s8 (__a, __b);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq (int8x16_t __a, int8x16_t __b)
> -{
> - return __arm_vqsubq_s8 (__a, __b);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq (int8x16_t __a, int8_t __b)
> -{
> - return __arm_vqsubq_n_s8 (__a, __b);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vqshlq (int8x16_t __a, int8x16_t __b)
> @@ -19069,34 +17455,6 @@ __arm_vqrdmulhq (int8x16_t __a, int8_t __b)
>   return __arm_vqrdmulhq_n_s8 (__a, __b);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqdmulhq (int8x16_t __a, int8x16_t __b)
> -{
> - return __arm_vqdmulhq_s8 (__a, __b);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqdmulhq (int8x16_t __a, int8_t __b)
> -{
> - return __arm_vqdmulhq_n_s8 (__a, __b);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq (int8x16_t __a, int8x16_t __b)
> -{
> - return __arm_vqaddq_s8 (__a, __b);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq (int8x16_t __a, int8_t __b)
> -{
> - return __arm_vqaddq_n_s8 (__a, __b);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vornq (int8x16_t __a, int8x16_t __b)
> @@ -19118,13 +17476,6 @@ __arm_vmullbq_int (int8x16_t __a, int8x16_t __b)
>   return __arm_vmullbq_int_s8 (__a, __b);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq (int8x16_t __a, int8x16_t __b)
> -{
> - return __arm_vmulhq_s8 (__a, __b);
> -}
> -
>  __extension__ extern __inline int32_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vmlsdavxq (int8x16_t __a, int8x16_t __b)
> @@ -19181,20 +17532,6 @@ __arm_vmaxq (int8x16_t __a, int8x16_t __b)
>   return __arm_vmaxq_s8 (__a, __b);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq (int8x16_t __a, int8x16_t __b)
> -{
> - return __arm_vhsubq_s8 (__a, __b);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq (int8x16_t __a, int8_t __b)
> -{
> - return __arm_vhsubq_n_s8 (__a, __b);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vhcaddq_rot90 (int8x16_t __a, int8x16_t __b)
> @@ -19209,20 +17546,6 @@ __arm_vhcaddq_rot270 (int8x16_t __a, int8x16_t __b)
>   return __arm_vhcaddq_rot270_s8 (__a, __b);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq (int8x16_t __a, int8x16_t __b)
> -{
> - return __arm_vhaddq_s8 (__a, __b);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq (int8x16_t __a, int8_t __b)
> -{
> - return __arm_vhaddq_n_s8 (__a, __b);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcaddq_rot90 (int8x16_t __a, int8x16_t __b)
> @@ -19286,48 +17609,6 @@ __arm_vqshlq_n (int8x16_t __a, const int __imm)
>   return __arm_vqshlq_n_s8 (__a, __imm);
>  }
> 
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq (uint16x8_t __a, uint16x8_t __b)
> -{
> - return __arm_vrmulhq_u16 (__a, __b);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq (uint16x8_t __a, uint16x8_t __b)
> -{
> - return __arm_vrhaddq_u16 (__a, __b);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq (uint16x8_t __a, uint16x8_t __b)
> -{
> - return __arm_vqsubq_u16 (__a, __b);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq (uint16x8_t __a, uint16_t __b)
> -{
> - return __arm_vqsubq_n_u16 (__a, __b);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq (uint16x8_t __a, uint16x8_t __b)
> -{
> - return __arm_vqaddq_u16 (__a, __b);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq (uint16x8_t __a, uint16_t __b)
> -{
> - return __arm_vqaddq_n_u16 (__a, __b);
> -}
> -
>  __extension__ extern __inline uint16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vornq (uint16x8_t __a, uint16x8_t __b)
> @@ -19349,13 +17630,6 @@ __arm_vmullbq_int (uint16x8_t __a, uint16x8_t __b)
>   return __arm_vmullbq_int_u16 (__a, __b);
>  }
> 
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq (uint16x8_t __a, uint16x8_t __b)
> -{
> - return __arm_vmulhq_u16 (__a, __b);
> -}
> -
>  __extension__ extern __inline uint32_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vmladavq (uint16x8_t __a, uint16x8_t __b)
> @@ -19391,34 +17665,6 @@ __arm_vmaxq (uint16x8_t __a, uint16x8_t __b)
>   return __arm_vmaxq_u16 (__a, __b);
>  }
> 
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq (uint16x8_t __a, uint16x8_t __b)
> -{
> - return __arm_vhsubq_u16 (__a, __b);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq (uint16x8_t __a, uint16_t __b)
> -{
> - return __arm_vhsubq_n_u16 (__a, __b);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq (uint16x8_t __a, uint16x8_t __b)
> -{
> - return __arm_vhaddq_u16 (__a, __b);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq (uint16x8_t __a, uint16_t __b)
> -{
> - return __arm_vhaddq_n_u16 (__a, __b);
> -}
> -
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq (uint16x8_t __a, uint16_t __b)
> @@ -19727,34 +17973,6 @@ __arm_vrshlq (int16x8_t __a, int32_t __b)
>   return __arm_vrshlq_n_s16 (__a, __b);
>  }
> 
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq (int16x8_t __a, int16x8_t __b)
> -{
> - return __arm_vrmulhq_s16 (__a, __b);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq (int16x8_t __a, int16x8_t __b)
> -{
> - return __arm_vrhaddq_s16 (__a, __b);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq (int16x8_t __a, int16x8_t __b)
> -{
> - return __arm_vqsubq_s16 (__a, __b);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq (int16x8_t __a, int16_t __b)
> -{
> - return __arm_vqsubq_n_s16 (__a, __b);
> -}
> -
>  __extension__ extern __inline int16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vqshlq (int16x8_t __a, int16x8_t __b)
> @@ -19797,34 +18015,6 @@ __arm_vqrdmulhq (int16x8_t __a, int16_t __b)
>   return __arm_vqrdmulhq_n_s16 (__a, __b);
>  }
> 
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqdmulhq (int16x8_t __a, int16x8_t __b)
> -{
> - return __arm_vqdmulhq_s16 (__a, __b);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqdmulhq (int16x8_t __a, int16_t __b)
> -{
> - return __arm_vqdmulhq_n_s16 (__a, __b);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq (int16x8_t __a, int16x8_t __b)
> -{
> - return __arm_vqaddq_s16 (__a, __b);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq (int16x8_t __a, int16_t __b)
> -{
> - return __arm_vqaddq_n_s16 (__a, __b);
> -}
> -
>  __extension__ extern __inline int16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vornq (int16x8_t __a, int16x8_t __b)
> @@ -19846,13 +18036,6 @@ __arm_vmullbq_int (int16x8_t __a, int16x8_t __b)
>   return __arm_vmullbq_int_s16 (__a, __b);
>  }
> 
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq (int16x8_t __a, int16x8_t __b)
> -{
> - return __arm_vmulhq_s16 (__a, __b);
> -}
> -
>  __extension__ extern __inline int32_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vmlsdavxq (int16x8_t __a, int16x8_t __b)
> @@ -19909,20 +18092,6 @@ __arm_vmaxq (int16x8_t __a, int16x8_t __b)
>   return __arm_vmaxq_s16 (__a, __b);
>  }
> 
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq (int16x8_t __a, int16x8_t __b)
> -{
> - return __arm_vhsubq_s16 (__a, __b);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq (int16x8_t __a, int16_t __b)
> -{
> - return __arm_vhsubq_n_s16 (__a, __b);
> -}
> -
>  __extension__ extern __inline int16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vhcaddq_rot90 (int16x8_t __a, int16x8_t __b)
> @@ -19937,20 +18106,6 @@ __arm_vhcaddq_rot270 (int16x8_t __a, int16x8_t __b)
>   return __arm_vhcaddq_rot270_s16 (__a, __b);
>  }
> 
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq (int16x8_t __a, int16x8_t __b)
> -{
> - return __arm_vhaddq_s16 (__a, __b);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq (int16x8_t __a, int16_t __b)
> -{
> - return __arm_vhaddq_n_s16 (__a, __b);
> -}
> -
>  __extension__ extern __inline int16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcaddq_rot90 (int16x8_t __a, int16x8_t __b)
> @@ -20014,48 +18169,6 @@ __arm_vqshlq_n (int16x8_t __a, const int __imm)
>   return __arm_vqshlq_n_s16 (__a, __imm);
>  }
> 
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq (uint32x4_t __a, uint32x4_t __b)
> -{
> - return __arm_vrmulhq_u32 (__a, __b);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq (uint32x4_t __a, uint32x4_t __b)
> -{
> - return __arm_vrhaddq_u32 (__a, __b);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq (uint32x4_t __a, uint32x4_t __b)
> -{
> - return __arm_vqsubq_u32 (__a, __b);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq (uint32x4_t __a, uint32_t __b)
> -{
> - return __arm_vqsubq_n_u32 (__a, __b);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq (uint32x4_t __a, uint32x4_t __b)
> -{
> - return __arm_vqaddq_u32 (__a, __b);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq (uint32x4_t __a, uint32_t __b)
> -{
> - return __arm_vqaddq_n_u32 (__a, __b);
> -}
> -
>  __extension__ extern __inline uint32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vornq (uint32x4_t __a, uint32x4_t __b)
> @@ -20077,13 +18190,6 @@ __arm_vmullbq_int (uint32x4_t __a, uint32x4_t __b)
>   return __arm_vmullbq_int_u32 (__a, __b);
>  }
> 
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq (uint32x4_t __a, uint32x4_t __b)
> -{
> - return __arm_vmulhq_u32 (__a, __b);
> -}
> -
>  __extension__ extern __inline uint32_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vmladavq (uint32x4_t __a, uint32x4_t __b)
> @@ -20119,34 +18225,6 @@ __arm_vmaxq (uint32x4_t __a, uint32x4_t __b)
>   return __arm_vmaxq_u32 (__a, __b);
>  }
> 
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq (uint32x4_t __a, uint32x4_t __b)
> -{
> - return __arm_vhsubq_u32 (__a, __b);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq (uint32x4_t __a, uint32_t __b)
> -{
> - return __arm_vhsubq_n_u32 (__a, __b);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq (uint32x4_t __a, uint32x4_t __b)
> -{
> - return __arm_vhaddq_u32 (__a, __b);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq (uint32x4_t __a, uint32_t __b)
> -{
> - return __arm_vhaddq_n_u32 (__a, __b);
> -}
> -
>  __extension__ extern __inline mve_pred16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcmpneq (uint32x4_t __a, uint32_t __b)
> @@ -20455,34 +18533,6 @@ __arm_vrshlq (int32x4_t __a, int32_t __b)
>   return __arm_vrshlq_n_s32 (__a, __b);
>  }
> 
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq (int32x4_t __a, int32x4_t __b)
> -{
> - return __arm_vrmulhq_s32 (__a, __b);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq (int32x4_t __a, int32x4_t __b)
> -{
> - return __arm_vrhaddq_s32 (__a, __b);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq (int32x4_t __a, int32x4_t __b)
> -{
> - return __arm_vqsubq_s32 (__a, __b);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq (int32x4_t __a, int32_t __b)
> -{
> - return __arm_vqsubq_n_s32 (__a, __b);
> -}
> -
>  __extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vqshlq (int32x4_t __a, int32x4_t __b)
> @@ -20525,34 +18575,6 @@ __arm_vqrdmulhq (int32x4_t __a, int32_t __b)
>   return __arm_vqrdmulhq_n_s32 (__a, __b);
>  }
> 
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqdmulhq (int32x4_t __a, int32x4_t __b)
> -{
> - return __arm_vqdmulhq_s32 (__a, __b);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqdmulhq (int32x4_t __a, int32_t __b)
> -{
> - return __arm_vqdmulhq_n_s32 (__a, __b);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq (int32x4_t __a, int32x4_t __b)
> -{
> - return __arm_vqaddq_s32 (__a, __b);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq (int32x4_t __a, int32_t __b)
> -{
> - return __arm_vqaddq_n_s32 (__a, __b);
> -}
> -
>  __extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vornq (int32x4_t __a, int32x4_t __b)
> @@ -20574,13 +18596,6 @@ __arm_vmullbq_int (int32x4_t __a, int32x4_t __b)
>   return __arm_vmullbq_int_s32 (__a, __b);
>  }
> 
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq (int32x4_t __a, int32x4_t __b)
> -{
> - return __arm_vmulhq_s32 (__a, __b);
> -}
> -
>  __extension__ extern __inline int32_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vmlsdavxq (int32x4_t __a, int32x4_t __b)
> @@ -20637,20 +18652,6 @@ __arm_vmaxq (int32x4_t __a, int32x4_t __b)
>   return __arm_vmaxq_s32 (__a, __b);
>  }
> 
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq (int32x4_t __a, int32x4_t __b)
> -{
> - return __arm_vhsubq_s32 (__a, __b);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq (int32x4_t __a, int32_t __b)
> -{
> - return __arm_vhsubq_n_s32 (__a, __b);
> -}
> -
>  __extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vhcaddq_rot90 (int32x4_t __a, int32x4_t __b)
> @@ -20665,20 +18666,6 @@ __arm_vhcaddq_rot270 (int32x4_t __a, int32x4_t __b)
>   return __arm_vhcaddq_rot270_s32 (__a, __b);
>  }
> 
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq (int32x4_t __a, int32x4_t __b)
> -{
> - return __arm_vhaddq_s32 (__a, __b);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq (int32x4_t __a, int32_t __b)
> -{
> - return __arm_vhaddq_n_s32 (__a, __b);
> -}
> -
>  __extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vcaddq_rot90 (int32x4_t __a, int32x4_t __b)
> @@ -24165,90 +22152,6 @@ __arm_vcaddq_rot90_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve
>   return __arm_vcaddq_rot90_m_u16 (__inactive, __a, __b, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_m (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhaddq_m_n_s8 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_m (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhaddq_m_n_s32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_m (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhaddq_m_n_s16 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_m (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhaddq_m_n_u8 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_m (uint32x4_t __inactive, uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhaddq_m_n_u32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_m (uint16x8_t __inactive, uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhaddq_m_n_u16 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhaddq_m_s8 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhaddq_m_s32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhaddq_m_s16 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_m (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhaddq_m_u8 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_m (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhaddq_m_u32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhaddq_m_u16 (__inactive, __a, __b, __p);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vhcaddq_rot270_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> @@ -24291,90 +22194,6 @@ __arm_vhcaddq_rot90_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_p
>   return __arm_vhcaddq_rot90_m_s16 (__inactive, __a, __b, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_m (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhsubq_m_n_s8 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_m (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhsubq_m_n_s32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_m (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhsubq_m_n_s16 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_m (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhsubq_m_n_u8 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_m (uint32x4_t __inactive, uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhsubq_m_n_u32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_m (uint16x8_t __inactive, uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhsubq_m_n_u16 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhsubq_m_s8 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhsubq_m_s32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhsubq_m_s16 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_m (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhsubq_m_u8 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_m (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhsubq_m_u32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhsubq_m_u16 (__inactive, __a, __b, __p);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vmaxq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> @@ -24648,48 +22467,6 @@ __arm_vmlsdavaxq_p (int32_t __a, int16x8_t __b, int16x8_t __c, mve_pred16_t __p)
>   return __arm_vmlsdavaxq_p_s16 (__a, __b, __c, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vmulhq_m_s8 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vmulhq_m_s32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vmulhq_m_s16 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq_m (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vmulhq_m_u8 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq_m (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vmulhq_m_u32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vmulhq_m_u16 (__inactive, __a, __b, __p);
> -}
> -
>  __extension__ extern __inline int16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vmullbq_int_m (int16x8_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> @@ -24816,90 +22593,6 @@ __arm_vornq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16
>   return __arm_vornq_m_u16 (__inactive, __a, __b, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_m (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vqaddq_m_n_s8 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_m (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t __p)
> -{
> - return __arm_vqaddq_m_n_s32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_m (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vqaddq_m_n_s16 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_m (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vqaddq_m_n_u8 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_m (uint32x4_t __inactive, uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
> -{
> - return __arm_vqaddq_m_n_u32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_m (uint16x8_t __inactive, uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vqaddq_m_n_u16 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vqaddq_m_s8 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vqaddq_m_s32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vqaddq_m_s16 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_m (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vqaddq_m_u8 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_m (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vqaddq_m_u32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqaddq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vqaddq_m_u16 (__inactive, __a, __b, __p);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vqdmladhq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> @@ -25005,48 +22698,6 @@ __arm_vqdmlsdhxq_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred
>   return __arm_vqdmlsdhxq_m_s16 (__inactive, __a, __b, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqdmulhq_m (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vqdmulhq_m_n_s8 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqdmulhq_m (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t __p)
> -{
> - return __arm_vqdmulhq_m_n_s32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqdmulhq_m (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vqdmulhq_m_n_s16 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqdmulhq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vqdmulhq_m_s8 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqdmulhq_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vqdmulhq_m_s32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqdmulhq_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vqdmulhq_m_s16 (__inactive, __a, __b, __p);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vqrdmladhq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> @@ -25362,174 +23013,6 @@ __arm_vqshlq_m (uint16x8_t __inactive, uint16x8_t __a, int16x8_t __b, mve_pred16
>   return __arm_vqshlq_m_u16 (__inactive, __a, __b, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_m (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vqsubq_m_n_s8 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_m (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t __p)
> -{
> - return __arm_vqsubq_m_n_s32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_m (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vqsubq_m_n_s16 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_m (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vqsubq_m_n_u8 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_m (uint32x4_t __inactive, uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
> -{
> - return __arm_vqsubq_m_n_u32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_m (uint16x8_t __inactive, uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vqsubq_m_n_u16 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vqsubq_m_s8 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vqsubq_m_s32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vqsubq_m_s16 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_m (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vqsubq_m_u8 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_m (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vqsubq_m_u32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vqsubq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vqsubq_m_u16 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vrhaddq_m_s8 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vrhaddq_m_s32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vrhaddq_m_s16 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq_m (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vrhaddq_m_u8 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq_m (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vrhaddq_m_u32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vrhaddq_m_u16 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vrmulhq_m_s8 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq_m (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vrmulhq_m_s32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq_m (int16x8_t __inactive, int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vrmulhq_m_s16 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq_m (uint8x16_t __inactive, uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vrmulhq_m_u8 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq_m (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vrmulhq_m_u32 (__inactive, __a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vrmulhq_m_u16 (__inactive, __a, __b, __p);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vrshlq_m (int8x16_t __inactive, int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> @@ -27980,48 +25463,6 @@ __arm_vnegq_x (int32x4_t __a, mve_pred16_t __p)
>   return __arm_vnegq_x_s32 (__a, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vmulhq_x_s8 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq_x (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vmulhq_x_s16 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq_x (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vmulhq_x_s32 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq_x (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vmulhq_x_u8 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq_x (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vmulhq_x_u16 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vmulhq_x (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vmulhq_x_u32 (__a, __b, __p);
> -}
> -
>  __extension__ extern __inline uint16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vmullbq_poly_x (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> @@ -28218,90 +25659,6 @@ __arm_vcaddq_rot270_x (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
>   return __arm_vcaddq_rot270_x_u32 (__a, __b, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_x (int8x16_t __a, int8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhaddq_x_n_s8 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_x (int16x8_t __a, int16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhaddq_x_n_s16 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_x (int32x4_t __a, int32_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhaddq_x_n_s32 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_x (uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhaddq_x_n_u8 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_x (uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhaddq_x_n_u16 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_x (uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhaddq_x_n_u32 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhaddq_x_s8 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_x (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhaddq_x_s16 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_x (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhaddq_x_s32 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_x (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhaddq_x_u8 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_x (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhaddq_x_u16 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhaddq_x (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhaddq_x_u32 (__a, __b, __p);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vhcaddq_rot90_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> @@ -28344,174 +25701,6 @@ __arm_vhcaddq_rot270_x (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
>   return __arm_vhcaddq_rot270_x_s32 (__a, __b, __p);
>  }
> 
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_x (int8x16_t __a, int8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhsubq_x_n_s8 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_x (int16x8_t __a, int16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhsubq_x_n_s16 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_x (int32x4_t __a, int32_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhsubq_x_n_s32 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_x (uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhsubq_x_n_u8 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_x (uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhsubq_x_n_u16 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_x (uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhsubq_x_n_u32 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhsubq_x_s8 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_x (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhsubq_x_s16 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_x (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhsubq_x_s32 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_x (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhsubq_x_u8 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_x (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhsubq_x_u16 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vhsubq_x (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vhsubq_x_u32 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vrhaddq_x_s8 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq_x (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vrhaddq_x_s16 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq_x (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vrhaddq_x_s32 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq_x (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vrhaddq_x_u8 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq_x (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vrhaddq_x_u16 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrhaddq_x (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vrhaddq_x_u32 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vrmulhq_x_s8 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq_x (int16x8_t __a, int16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vrmulhq_x_s16 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq_x (int32x4_t __a, int32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vrmulhq_x_s32 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq_x (uint8x16_t __a, uint8x16_t __b, mve_pred16_t __p)
> -{
> - return __arm_vrmulhq_x_u8 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq_x (uint16x8_t __a, uint16x8_t __b, mve_pred16_t __p)
> -{
> - return __arm_vrmulhq_x_u16 (__a, __b, __p);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vrmulhq_x (uint32x4_t __a, uint32x4_t __b, mve_pred16_t __p)
> -{
> - return __arm_vrmulhq_x_u32 (__a, __b, __p);
> -}
> -
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vbicq_x (int8x16_t __a, int8x16_t __b, mve_pred16_t __p)
> @@ -32685,42 +29874,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vrshlq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vrshlq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, int32x4_t)));})
> 
> -#define __arm_vrmulhq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vrmulhq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vrmulhq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vrmulhq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vrmulhq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vrmulhq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vrmulhq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
> -
> -#define __arm_vrhaddq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vrhaddq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vrhaddq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vrhaddq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vrhaddq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vrhaddq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vrhaddq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
> -
> -#define __arm_vqsubq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vqsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqsubq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqsubq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqsubq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vqsubq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vqsubq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vqsubq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
> -
>  #define __arm_vqshluq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
>    int (*)[__ARM_mve_type_int8x16_t]: __arm_vqshluq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), p1), \
> @@ -32831,32 +29984,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqdmullbq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqdmullbq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)));})
> 
> -#define __arm_vqdmulhq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqdmulhq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqdmulhq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqdmulhq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)));})
> -
> -#define __arm_vqaddq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vqaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqaddq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqaddq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqaddq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vqaddq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vqaddq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vqaddq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
> -
>  #define __arm_vmulltq_poly(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> @@ -32879,22 +30006,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmulltq_int_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmulltq_int_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
> 
> -#define __arm_vhaddq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vhaddq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhaddq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhaddq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vhaddq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vhaddq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vhaddq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
> -
>  #define __arm_vhcaddq_rot270(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> @@ -32909,22 +30020,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhcaddq_rot90_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhcaddq_rot90_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)));})
> 
> -#define __arm_vhsubq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vhsubq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhsubq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhsubq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vhsubq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vhsubq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vhsubq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
> -
>  #define __arm_vminq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> @@ -32975,16 +30070,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint16x8_t]: __arm_vmovnbq_u16 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint32x4_t]: __arm_vmovnbq_u32 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
> 
> -#define __arm_vmulhq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vmulhq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmulhq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmulhq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vmulhq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmulhq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmulhq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
> -
>  #define __arm_vmullbq_int(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> @@ -34580,42 +31665,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vrshlq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vrshlq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, int32x4_t)));})
> 
> -#define __arm_vrmulhq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vrmulhq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vrmulhq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vrmulhq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vrmulhq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vrmulhq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vrmulhq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
> -
> -#define __arm_vrhaddq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vrhaddq_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vrhaddq_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vrhaddq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vrhaddq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vrhaddq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vrhaddq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
> -
> -#define __arm_vqsubq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0,
> \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vqsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vqsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vqsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vqsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vqsubq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vqsubq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vqsubq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vqsubq_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vqsubq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vqsubq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t)));})
> -
>  #define __arm_vqshlq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0,
> \
> @@ -34694,32 +31743,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqrdmulhq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqrdmulhq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)));})
> 
> -#define __arm_vqdmulhq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0,
> \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vqdmulhq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqdmulhq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqdmulhq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vqdmulhq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vqdmulhq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vqdmulhq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)));})
> -
> -#define __arm_vqaddq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0,
> \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vqaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vqaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vqaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vqaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vqaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vqaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vqaddq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vqaddq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vqaddq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vqaddq_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vqaddq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vqaddq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t)));})
> -
>  #define __arm_vornq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0,
> \
> @@ -34750,16 +31773,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vmullbq_int_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16x8_t)), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vmullbq_int_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t)));})
> 
> -#define __arm_vmulhq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0,
> \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vmulhq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vmulhq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vmulhq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vmulhq_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vmulhq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vmulhq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t)));})
> -
>  #define __arm_vminq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0,
> \
> @@ -34794,22 +31807,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vmaxaq_s16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vmaxaq_s32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)));})
> 
> -#define __arm_vhsubq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0,
> \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vhsubq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vhsubq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vhsubq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vhsubq_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vhsubq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vhsubq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t)));})
> -
>  #define __arm_vhcaddq_rot90(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0,
> \
> @@ -34824,22 +31821,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vhcaddq_rot270_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vhcaddq_rot270_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)));})
> 
> -#define __arm_vhaddq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0,
> \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_n_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce3(p1, int)), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vhaddq_s8 (__ARM_mve_coerce(__p0, int8x16_t),
> __ARM_mve_coerce(__p1, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vhaddq_s16 (__ARM_mve_coerce(__p0, int16x8_t),
> __ARM_mve_coerce(__p1, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vhaddq_s32 (__ARM_mve_coerce(__p0, int32x4_t),
> __ARM_mve_coerce(__p1, int32x4_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vhaddq_u8 (__ARM_mve_coerce(__p0, uint8x16_t),
> __ARM_mve_coerce(__p1, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vhaddq_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vhaddq_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t)));})
> -
>  #define __arm_vcaddq_rot90(p0,p1) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0,
> \
> @@ -35910,16 +32891,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint8x16_t]: __arm_vmovltq_x_u8
> (__ARM_mve_coerce(__p1, uint8x16_t), p2), \
>    int (*)[__ARM_mve_type_uint16x8_t]: __arm_vmovltq_x_u16
> (__ARM_mve_coerce(__p1, uint16x8_t), p2));})
> 
> -#define __arm_vmulhq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
> -  __typeof(p2) __p2 = (p2); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0,
> \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vmulhq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce(__p2, int8x16_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vmulhq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce(__p2, int16x8_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vmulhq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vmulhq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vmulhq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vmulhq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32x4_t), p3));})
> -
>  #define __arm_vmullbq_int_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0,
> \
> @@ -36095,16 +33066,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int8x16_t]: __arm_vrev16q_x_s8
> (__ARM_mve_coerce(__p1, int8x16_t), p2), \
>    int (*)[__ARM_mve_type_uint8x16_t]: __arm_vrev16q_x_u8
> (__ARM_mve_coerce(__p1, uint8x16_t), p2));})
> 
> -#define __arm_vrhaddq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
> -  __typeof(p2) __p2 = (p2); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0,
> \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vrhaddq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce(__p2, int8x16_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vrhaddq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce(__p2, int16x8_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vrhaddq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vrhaddq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vrhaddq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vrhaddq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32x4_t), p3));})
> -
>  #define __arm_vshlq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0,
> \
> @@ -36115,16 +33076,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vshlq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, int16x8_t), p3), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vshlq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, int32x4_t), p3));})
> 
> -#define __arm_vrmulhq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
> -  __typeof(p2) __p2 = (p2); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0,
> \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vrmulhq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce(__p2, int8x16_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vrmulhq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce(__p2, int16x8_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vrmulhq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vrmulhq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vrmulhq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vrmulhq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32x4_t), p3));})
> -
>  #define __arm_vrshlq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0,
> \
> @@ -36236,22 +33187,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint16x8_t]: __arm_vshrq_x_n_u16
> (__ARM_mve_coerce(__p1, uint16x8_t), p2, p3), \
>    int (*)[__ARM_mve_type_uint32x4_t]: __arm_vshrq_x_n_u32
> (__ARM_mve_coerce(__p1, uint32x4_t), p2, p3));})
> 
> -#define __arm_vhaddq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
> -  __typeof(p2) __p2 = (p2); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0,
> \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_x_n_u8( __ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_x_n_u16( __ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vhaddq_x_n_u32( __ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vhaddq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce(__p2, int8x16_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vhaddq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce(__p2, int16x8_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vhaddq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vhaddq_x_u8( __ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vhaddq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vhaddq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32x4_t), p3));})
> -
>  #define __arm_vhcaddq_rot270_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0,
> \
> @@ -36266,22 +33201,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vhcaddq_rot90_x_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce(__p2, int16x8_t), p3), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vhcaddq_rot90_x_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce(__p2, int32x4_t), p3));})
> 
> -#define __arm_vhsubq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
> -  __typeof(p2) __p2 = (p2); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0,
> \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vhsubq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vhsubq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce(__p2, int8x16_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vhsubq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce(__p2, int16x8_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vhsubq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vhsubq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vhsubq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vhsubq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32x4_t), p3));})
> -
>  #define __arm_vclsq_x(p1,p2) ({ __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p1)])0, \
>    int (*)[__ARM_mve_type_int8x16_t]: __arm_vclsq_x_s8
> (__ARM_mve_coerce(__p1, int8x16_t), p2), \
> @@ -36446,28 +33365,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqshlq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqshlq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3));})
> 
> -#define __arm_vrhaddq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  __typeof(p2) __p2 = (p2); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vrhaddq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vrhaddq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vrhaddq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vrhaddq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vrhaddq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vrhaddq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
> -
> -#define __arm_vrmulhq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  __typeof(p2) __p2 = (p2); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vrmulhq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vrmulhq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vrmulhq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vrmulhq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vrmulhq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vrmulhq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
> -
>  #define __arm_vrshlq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
> @@ -36509,23 +33406,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vsliq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16x8_t),  p2, p3), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vsliq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t),  p2, p3));})
> 
> -#define __arm_vqsubq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  __typeof(p2) __p2 = (p2); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vqsubq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqsubq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqsubq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqsubq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vqsubq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vqsubq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vqsubq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
> -
>  #define __arm_vqrdmulhq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
> @@ -36799,23 +33679,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vsriq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t),
> __ARM_mve_coerce(__p1, uint16x8_t), p2, p3), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vsriq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t),
> __ARM_mve_coerce(__p1, uint32x4_t), p2, p3));})
> 
> -#define __arm_vhaddq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  __typeof(p2) __p2 = (p2); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhaddq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vhaddq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhaddq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhaddq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vhaddq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vhaddq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vhaddq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
> -
>  #define __arm_vhcaddq_rot270_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0);
> \
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
> @@ -36832,23 +33695,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhcaddq_rot90_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhcaddq_rot90_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3));})
> 
> -#define __arm_vhsubq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  __typeof(p2) __p2 = (p2); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vhsubq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vhsubq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vhsubq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vhsubq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vhsubq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vhsubq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vhsubq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3));})
> -
>  #define __arm_vmaxq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
> @@ -36893,17 +33739,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vmlasq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vmlasq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3));})
> 
> -#define __arm_vmulhq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  __typeof(p2) __p2 = (p2); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vmulhq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vmulhq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vmulhq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vmulhq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmulhq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vmulhq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
> -
>  #define __arm_vmullbq_int_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
> @@ -36933,23 +33768,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vmulltq_poly_m_p8 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
>    int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vmulltq_poly_m_p16 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3));})
> 
> -#define __arm_vqaddq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  __typeof(p2) __p2 = (p2); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vqaddq_m_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqaddq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqaddq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqaddq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vqaddq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vqaddq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vqaddq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
> -
>  #define __arm_vqdmlahq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
> @@ -36958,17 +33776,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
>    int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmlahq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3));})
> 
> -#define __arm_vqdmulhq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
> -  __typeof(p1) __p1 = (p1); \
> -  __typeof(p2) __p2 = (p2); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_m_n_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_m_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vqdmulhq_m_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vqdmulhq_m_s8 (__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vqdmulhq_m_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vqdmulhq_m_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3));})
> -
>  #define __arm_vqdmullbq_m(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
>    __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
> @@ -37562,16 +34369,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint8x16_t]: __arm_vmovltq_x_u8
> (__ARM_mve_coerce(__p1, uint8x16_t), p2), \
>    int (*)[__ARM_mve_type_uint16x8_t]: __arm_vmovltq_x_u16
> (__ARM_mve_coerce(__p1, uint16x8_t), p2));})
> 
> -#define __arm_vmulhq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
> -  __typeof(p2) __p2 = (p2); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0,
> \
> -  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vmulhq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce(__p2, int8x16_t), p3), \
> -  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vmulhq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce(__p2, int16x8_t), p3), \
> -  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vmulhq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce(__p2, int32x4_t), p3), \
> -  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vmulhq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> -  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vmulhq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> -  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vmulhq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32x4_t), p3));})
> -
>  #define __arm_vmullbq_int_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
>    __typeof(p2) __p2 = (p2); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0,
> \
> --
> 2.34.1
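
To summarize the removals above: every deleted overload macro follows the
same shape.  Here is a minimal two-case sketch of the idiom (the names are
taken from the hunks above, but the macro name and the reduction to two
cases are mine, for illustration only):

  /* Sketch: _Generic keys on a pointer-to-array type whose extents are
     the __ARM_mve_typeid() of each argument, so every supported type
     combination selects exactly one typed intrinsic.  */
  #define __arm_vmulhq_sketch(p0,p1) ({ __typeof(p0) __p0 = (p0); \
    __typeof(p1) __p1 = (p1); \
    _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
    int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: \
      __arm_vmulhq_s8 (__ARM_mve_coerce(__p0, int8x16_t), \
                       __ARM_mve_coerce(__p1, int8x16_t)), \
    int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: \
      __arm_vmulhq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), \
                       __ARM_mve_coerce(__p1, uint8x16_t)));})

Each additional type or predication variant needs one more _Generic case,
which is what makes these macros so bulky and what the new framework's
overload resolution replaces.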


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 08/22] arm: [MVE intrinsics] rework vaddq vmulq vsubq
  2023-05-02 16:31   ` Kyrylo Tkachov
@ 2023-05-03  9:06     ` Christophe Lyon
  0 siblings, 0 replies; 55+ messages in thread
From: Christophe Lyon @ 2023-05-03  9:06 UTC (permalink / raw)
  To: Kyrylo Tkachov, gcc-patches, Richard Earnshaw, Richard Sandiford



On 5/2/23 18:31, Kyrylo Tkachov wrote:
> 
> 
>> -----Original Message-----
>> From: Christophe Lyon <christophe.lyon@arm.com>
>> Sent: Tuesday, April 18, 2023 2:46 PM
>> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
>> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
>> <Richard.Sandiford@arm.com>
>> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
>> Subject: [PATCH 08/22] arm: [MVE intrinsics] rework vaddq vmulq vsubq
>>
>> Implement vaddq, vmulq, vsubq using the new MVE builtins framework.
>>
>> 2022-09-08  Christophe Lyon <christophe.lyon@arm.com>
>>
>> 	gcc/
>>
>> 	* config/arm/arm-mve-builtins-base.cc
>> (FUNCTION_WITH_RTX_M_N):
>> 	New.
>> 	(vaddq, vmulq, vsubq): New.
>> 	* config/arm/arm-mve-builtins-base.def (vaddq, vmulq, vsubq): New.
>> 	* config/arm/arm-mve-builtins-base.h (vaddq, vmulq, vsubq): New.
>> 	* config/arm/arm_mve.h (vaddq): Remove.
>> 	(vaddq_m): Remove.
>> 	(vaddq_x): Remove.
>> 	(vaddq_n_u8): Remove.
>> 	(vaddq_n_s8): Remove.
>> 	(vaddq_n_u16): Remove.
>> 	(vaddq_n_s16): Remove.
>> 	(vaddq_n_u32): Remove.
>> 	(vaddq_n_s32): Remove.
>> 	(vaddq_n_f16): Remove.
>> 	(vaddq_n_f32): Remove.
>> 	(vaddq_m_n_s8): Remove.
>> 	(vaddq_m_n_s32): Remove.
>> 	(vaddq_m_n_s16): Remove.
>> 	(vaddq_m_n_u8): Remove.
>> 	(vaddq_m_n_u32): Remove.
>> 	(vaddq_m_n_u16): Remove.
>> 	(vaddq_m_s8): Remove.
>> 	(vaddq_m_s32): Remove.
>> 	(vaddq_m_s16): Remove.
>> 	(vaddq_m_u8): Remove.
>> 	(vaddq_m_u32): Remove.
>> 	(vaddq_m_u16): Remove.
>> 	(vaddq_m_f32): Remove.
>> 	(vaddq_m_f16): Remove.
>> 	(vaddq_m_n_f32): Remove.
>> 	(vaddq_m_n_f16): Remove.
>> 	(vaddq_s8): Remove.
>> 	(vaddq_s16): Remove.
>> 	(vaddq_s32): Remove.
>> 	(vaddq_u8): Remove.
>> 	(vaddq_u16): Remove.
>> 	(vaddq_u32): Remove.
>> 	(vaddq_f16): Remove.
>> 	(vaddq_f32): Remove.
>> 	(vaddq_x_s8): Remove.
>> 	(vaddq_x_s16): Remove.
>> 	(vaddq_x_s32): Remove.
>> 	(vaddq_x_n_s8): Remove.
>> 	(vaddq_x_n_s16): Remove.
>> 	(vaddq_x_n_s32): Remove.
>> 	(vaddq_x_u8): Remove.
>> 	(vaddq_x_u16): Remove.
>> 	(vaddq_x_u32): Remove.
>> 	(vaddq_x_n_u8): Remove.
>> 	(vaddq_x_n_u16): Remove.
>> 	(vaddq_x_n_u32): Remove.
>> 	(vaddq_x_f16): Remove.
>> 	(vaddq_x_f32): Remove.
>> 	(vaddq_x_n_f16): Remove.
>> 	(vaddq_x_n_f32): Remove.
>> 	(__arm_vaddq_n_u8): Remove.
>> 	(__arm_vaddq_n_s8): Remove.
>> 	(__arm_vaddq_n_u16): Remove.
>> 	(__arm_vaddq_n_s16): Remove.
>> 	(__arm_vaddq_n_u32): Remove.
>> 	(__arm_vaddq_n_s32): Remove.
>> 	(__arm_vaddq_m_n_s8): Remove.
>> 	(__arm_vaddq_m_n_s32): Remove.
>> 	(__arm_vaddq_m_n_s16): Remove.
>> 	(__arm_vaddq_m_n_u8): Remove.
>> 	(__arm_vaddq_m_n_u32): Remove.
>> 	(__arm_vaddq_m_n_u16): Remove.
>> 	(__arm_vaddq_m_s8): Remove.
>> 	(__arm_vaddq_m_s32): Remove.
>> 	(__arm_vaddq_m_s16): Remove.
>> 	(__arm_vaddq_m_u8): Remove.
>> 	(__arm_vaddq_m_u32): Remove.
>> 	(__arm_vaddq_m_u16): Remove.
>> 	(__arm_vaddq_s8): Remove.
>> 	(__arm_vaddq_s16): Remove.
>> 	(__arm_vaddq_s32): Remove.
>> 	(__arm_vaddq_u8): Remove.
>> 	(__arm_vaddq_u16): Remove.
>> 	(__arm_vaddq_u32): Remove.
>> 	(__arm_vaddq_x_s8): Remove.
>> 	(__arm_vaddq_x_s16): Remove.
>> 	(__arm_vaddq_x_s32): Remove.
>> 	(__arm_vaddq_x_n_s8): Remove.
>> 	(__arm_vaddq_x_n_s16): Remove.
>> 	(__arm_vaddq_x_n_s32): Remove.
>> 	(__arm_vaddq_x_u8): Remove.
>> 	(__arm_vaddq_x_u16): Remove.
>> 	(__arm_vaddq_x_u32): Remove.
>> 	(__arm_vaddq_x_n_u8): Remove.
>> 	(__arm_vaddq_x_n_u16): Remove.
>> 	(__arm_vaddq_x_n_u32): Remove.
>> 	(__arm_vaddq_n_f16): Remove.
>> 	(__arm_vaddq_n_f32): Remove.
>> 	(__arm_vaddq_m_f32): Remove.
>> 	(__arm_vaddq_m_f16): Remove.
>> 	(__arm_vaddq_m_n_f32): Remove.
>> 	(__arm_vaddq_m_n_f16): Remove.
>> 	(__arm_vaddq_f16): Remove.
>> 	(__arm_vaddq_f32): Remove.
>> 	(__arm_vaddq_x_f16): Remove.
>> 	(__arm_vaddq_x_f32): Remove.
>> 	(__arm_vaddq_x_n_f16): Remove.
>> 	(__arm_vaddq_x_n_f32): Remove.
>> 	(__arm_vaddq): Remove.
>> 	(__arm_vaddq_m): Remove.
>> 	(__arm_vaddq_x): Remove.
>> 	(vmulq): Remove.
>> 	(vmulq_m): Remove.
>> 	(vmulq_x): Remove.
>> 	(vmulq_u8): Remove.
>> 	(vmulq_n_u8): Remove.
>> 	(vmulq_s8): Remove.
>> 	(vmulq_n_s8): Remove.
>> 	(vmulq_u16): Remove.
>> 	(vmulq_n_u16): Remove.
>> 	(vmulq_s16): Remove.
>> 	(vmulq_n_s16): Remove.
>> 	(vmulq_u32): Remove.
>> 	(vmulq_n_u32): Remove.
>> 	(vmulq_s32): Remove.
>> 	(vmulq_n_s32): Remove.
>> 	(vmulq_n_f16): Remove.
>> 	(vmulq_f16): Remove.
>> 	(vmulq_n_f32): Remove.
>> 	(vmulq_f32): Remove.
>> 	(vmulq_m_n_s8): Remove.
>> 	(vmulq_m_n_s32): Remove.
>> 	(vmulq_m_n_s16): Remove.
>> 	(vmulq_m_n_u8): Remove.
>> 	(vmulq_m_n_u32): Remove.
>> 	(vmulq_m_n_u16): Remove.
>> 	(vmulq_m_s8): Remove.
>> 	(vmulq_m_s32): Remove.
>> 	(vmulq_m_s16): Remove.
>> 	(vmulq_m_u8): Remove.
>> 	(vmulq_m_u32): Remove.
>> 	(vmulq_m_u16): Remove.
>> 	(vmulq_m_f32): Remove.
>> 	(vmulq_m_f16): Remove.
>> 	(vmulq_m_n_f32): Remove.
>> 	(vmulq_m_n_f16): Remove.
>> 	(vmulq_x_s8): Remove.
>> 	(vmulq_x_s16): Remove.
>> 	(vmulq_x_s32): Remove.
>> 	(vmulq_x_n_s8): Remove.
>> 	(vmulq_x_n_s16): Remove.
>> 	(vmulq_x_n_s32): Remove.
>> 	(vmulq_x_u8): Remove.
>> 	(vmulq_x_u16): Remove.
>> 	(vmulq_x_u32): Remove.
>> 	(vmulq_x_n_u8): Remove.
>> 	(vmulq_x_n_u16): Remove.
>> 	(vmulq_x_n_u32): Remove.
>> 	(vmulq_x_f16): Remove.
>> 	(vmulq_x_f32): Remove.
>> 	(vmulq_x_n_f16): Remove.
>> 	(vmulq_x_n_f32): Remove.
>> 	(__arm_vmulq_u8): Remove.
>> 	(__arm_vmulq_n_u8): Remove.
>> 	(__arm_vmulq_s8): Remove.
>> 	(__arm_vmulq_n_s8): Remove.
>> 	(__arm_vmulq_u16): Remove.
>> 	(__arm_vmulq_n_u16): Remove.
>> 	(__arm_vmulq_s16): Remove.
>> 	(__arm_vmulq_n_s16): Remove.
>> 	(__arm_vmulq_u32): Remove.
>> 	(__arm_vmulq_n_u32): Remove.
>> 	(__arm_vmulq_s32): Remove.
>> 	(__arm_vmulq_n_s32): Remove.
>> 	(__arm_vmulq_m_n_s8): Remove.
>> 	(__arm_vmulq_m_n_s32): Remove.
>> 	(__arm_vmulq_m_n_s16): Remove.
>> 	(__arm_vmulq_m_n_u8): Remove.
>> 	(__arm_vmulq_m_n_u32): Remove.
>> 	(__arm_vmulq_m_n_u16): Remove.
>> 	(__arm_vmulq_m_s8): Remove.
>> 	(__arm_vmulq_m_s32): Remove.
>> 	(__arm_vmulq_m_s16): Remove.
>> 	(__arm_vmulq_m_u8): Remove.
>> 	(__arm_vmulq_m_u32): Remove.
>> 	(__arm_vmulq_m_u16): Remove.
>> 	(__arm_vmulq_x_s8): Remove.
>> 	(__arm_vmulq_x_s16): Remove.
>> 	(__arm_vmulq_x_s32): Remove.
>> 	(__arm_vmulq_x_n_s8): Remove.
>> 	(__arm_vmulq_x_n_s16): Remove.
>> 	(__arm_vmulq_x_n_s32): Remove.
>> 	(__arm_vmulq_x_u8): Remove.
>> 	(__arm_vmulq_x_u16): Remove.
>> 	(__arm_vmulq_x_u32): Remove.
>> 	(__arm_vmulq_x_n_u8): Remove.
>> 	(__arm_vmulq_x_n_u16): Remove.
>> 	(__arm_vmulq_x_n_u32): Remove.
>> 	(__arm_vmulq_n_f16): Remove.
>> 	(__arm_vmulq_f16): Remove.
>> 	(__arm_vmulq_n_f32): Remove.
>> 	(__arm_vmulq_f32): Remove.
>> 	(__arm_vmulq_m_f32): Remove.
>> 	(__arm_vmulq_m_f16): Remove.
>> 	(__arm_vmulq_m_n_f32): Remove.
>> 	(__arm_vmulq_m_n_f16): Remove.
>> 	(__arm_vmulq_x_f16): Remove.
>> 	(__arm_vmulq_x_f32): Remove.
>> 	(__arm_vmulq_x_n_f16): Remove.
>> 	(__arm_vmulq_x_n_f32): Remove.
>> 	(__arm_vmulq): Remove.
>> 	(__arm_vmulq_m): Remove.
>> 	(__arm_vmulq_x): Remove.
>> 	(vsubq): Remove.
>> 	(vsubq_m): Remove.
>> 	(vsubq_x): Remove.
>> 	(vsubq_n_f16): Remove.
>> 	(vsubq_n_f32): Remove.
>> 	(vsubq_u8): Remove.
>> 	(vsubq_n_u8): Remove.
>> 	(vsubq_s8): Remove.
>> 	(vsubq_n_s8): Remove.
>> 	(vsubq_u16): Remove.
>> 	(vsubq_n_u16): Remove.
>> 	(vsubq_s16): Remove.
>> 	(vsubq_n_s16): Remove.
>> 	(vsubq_u32): Remove.
>> 	(vsubq_n_u32): Remove.
>> 	(vsubq_s32): Remove.
>> 	(vsubq_n_s32): Remove.
>> 	(vsubq_f16): Remove.
>> 	(vsubq_f32): Remove.
>> 	(vsubq_m_s8): Remove.
>> 	(vsubq_m_u8): Remove.
>> 	(vsubq_m_s16): Remove.
>> 	(vsubq_m_u16): Remove.
>> 	(vsubq_m_s32): Remove.
>> 	(vsubq_m_u32): Remove.
>> 	(vsubq_m_n_s8): Remove.
>> 	(vsubq_m_n_s32): Remove.
>> 	(vsubq_m_n_s16): Remove.
>> 	(vsubq_m_n_u8): Remove.
>> 	(vsubq_m_n_u32): Remove.
>> 	(vsubq_m_n_u16): Remove.
>> 	(vsubq_m_f32): Remove.
>> 	(vsubq_m_f16): Remove.
>> 	(vsubq_m_n_f32): Remove.
>> 	(vsubq_m_n_f16): Remove.
>> 	(vsubq_x_s8): Remove.
>> 	(vsubq_x_s16): Remove.
>> 	(vsubq_x_s32): Remove.
>> 	(vsubq_x_n_s8): Remove.
>> 	(vsubq_x_n_s16): Remove.
>> 	(vsubq_x_n_s32): Remove.
>> 	(vsubq_x_u8): Remove.
>> 	(vsubq_x_u16): Remove.
>> 	(vsubq_x_u32): Remove.
>> 	(vsubq_x_n_u8): Remove.
>> 	(vsubq_x_n_u16): Remove.
>> 	(vsubq_x_n_u32): Remove.
>> 	(vsubq_x_f16): Remove.
>> 	(vsubq_x_f32): Remove.
>> 	(vsubq_x_n_f16): Remove.
>> 	(vsubq_x_n_f32): Remove.
>> 	(__arm_vsubq_u8): Remove.
>> 	(__arm_vsubq_n_u8): Remove.
>> 	(__arm_vsubq_s8): Remove.
>> 	(__arm_vsubq_n_s8): Remove.
>> 	(__arm_vsubq_u16): Remove.
>> 	(__arm_vsubq_n_u16): Remove.
>> 	(__arm_vsubq_s16): Remove.
>> 	(__arm_vsubq_n_s16): Remove.
>> 	(__arm_vsubq_u32): Remove.
>> 	(__arm_vsubq_n_u32): Remove.
>> 	(__arm_vsubq_s32): Remove.
>> 	(__arm_vsubq_n_s32): Remove.
>> 	(__arm_vsubq_m_s8): Remove.
>> 	(__arm_vsubq_m_u8): Remove.
>> 	(__arm_vsubq_m_s16): Remove.
>> 	(__arm_vsubq_m_u16): Remove.
>> 	(__arm_vsubq_m_s32): Remove.
>> 	(__arm_vsubq_m_u32): Remove.
>> 	(__arm_vsubq_m_n_s8): Remove.
>> 	(__arm_vsubq_m_n_s32): Remove.
>> 	(__arm_vsubq_m_n_s16): Remove.
>> 	(__arm_vsubq_m_n_u8): Remove.
>> 	(__arm_vsubq_m_n_u32): Remove.
>> 	(__arm_vsubq_m_n_u16): Remove.
>> 	(__arm_vsubq_x_s8): Remove.
>> 	(__arm_vsubq_x_s16): Remove.
>> 	(__arm_vsubq_x_s32): Remove.
>> 	(__arm_vsubq_x_n_s8): Remove.
>> 	(__arm_vsubq_x_n_s16): Remove.
>> 	(__arm_vsubq_x_n_s32): Remove.
>> 	(__arm_vsubq_x_u8): Remove.
>> 	(__arm_vsubq_x_u16): Remove.
>> 	(__arm_vsubq_x_u32): Remove.
>> 	(__arm_vsubq_x_n_u8): Remove.
>> 	(__arm_vsubq_x_n_u16): Remove.
>> 	(__arm_vsubq_x_n_u32): Remove.
>> 	(__arm_vsubq_n_f16): Remove.
>> 	(__arm_vsubq_n_f32): Remove.
>> 	(__arm_vsubq_f16): Remove.
>> 	(__arm_vsubq_f32): Remove.
>> 	(__arm_vsubq_m_f32): Remove.
>> 	(__arm_vsubq_m_f16): Remove.
>> 	(__arm_vsubq_m_n_f32): Remove.
>> 	(__arm_vsubq_m_n_f16): Remove.
>> 	(__arm_vsubq_x_f16): Remove.
>> 	(__arm_vsubq_x_f32): Remove.
>> 	(__arm_vsubq_x_n_f16): Remove.
>> 	(__arm_vsubq_x_n_f32): Remove.
>> 	(__arm_vsubq): Remove.
>> 	(__arm_vsubq_m): Remove.
>> 	(__arm_vsubq_x): Remove.
>> 	* config/arm/arm_mve_builtins.def (vsubq_u, vsubq_s, vsubq_f):
>> 	Remove.
>> 	(vmulq_u, vmulq_s, vmulq_f): Remove.
>> 	* config/arm/mve.md (mve_vsubq_<supf><mode>): Remove.
>> 	(mve_vmulq_<supf><mode>): Remove.
> 
> [snip]
> 
>> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
>> index 5167fbc6add..ccb3cf23304 100644
>> --- a/gcc/config/arm/mve.md
>> +++ b/gcc/config/arm/mve.md
>> @@ -1353,18 +1353,6 @@ (define_insn "mve_vmulltq_int_<supf><mode>"
>>   ;; [vmulq_u, vmulq_s])
>>   ;; [vsubq_s, vsubq_u])
>>   ;;
>> -(define_insn "mve_vmulq_<supf><mode>"
>> -  [
>> -    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
>> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
>> -		       (match_operand:MVE_2 2 "s_register_operand" "w")]
>> -	 VMULQ))
>> -  ]
>> -  "TARGET_HAVE_MVE"
>> -  "vmul.i%#<V_sz_elem>\t%q0, %q1, %q2"
>> -  [(set_attr "type" "mve_move")
>> -])
>> -
>>   (define_insn "mve_<mve_addsubmul>q<mode>"
>>     [
>>      (set (match_operand:MVE_2 0 "s_register_operand" "=w")
>> @@ -1742,21 +1730,6 @@ (define_insn "mve_vshlq_r_<supf><mode>"
>>     [(set_attr "type" "mve_move")
>>   ])
>>
>> -;;
>> -;; [vsubq_s, vsubq_u])
>> -;;
>> -(define_insn "mve_vsubq_<supf><mode>"
>> -  [
>> -   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
>> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
>> -		       (match_operand:MVE_2 2 "s_register_operand" "w")]
>> -	 VSUBQ))
>> -  ]
>> -  "TARGET_HAVE_MVE"
>> -  "vsub.i%#<V_sz_elem>\t%q0, %q1, %q2"
>> -  [(set_attr "type" "mve_move")
>> -])
>> -
> 
> Just to make sure I understand correctly, are these patterns being removed because the new builtins are wired through the factored patterns in patch [07/22]?
> If so, ok.

Yes. In patch 07/22, we introduce mve_<mve_addsubmul>q<mode>, which uses 
standard RTX codes instead of unspecs.
With this patch 08/22, the builtins now rely on the RTX codes, so the 
unspec-based patterns are useless (as was already the case for "add").
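
For reference, the factored pattern from patch 07/22 looks roughly like 
this (a sketch from memory of that patch; the iterator and attribute 
names may differ in detail):

(define_insn "mve_<mve_addsubmul>q<mode>"
  [
   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
	(MVE_INT_BINARY_RTX:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
				  (match_operand:MVE_2 2 "s_register_operand" "w")))
  ]
  "TARGET_HAVE_MVE"
  "<mve_addsubmul>.i%#<V_sz_elem>\t%q0, %q1, %q2"
  [(set_attr "type" "mve_move")
])

Because the body is a plain (plus ...), (minus ...) or (mult ...) 
instead of an unspec, the optimizers can reason about it, and a single 
pattern covers vaddq, vsubq and vmulq.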

Christophe

> Thanks,
> Kyrill
> 
>>   ;;
>>   ;; [vabdq_f])
>>   ;;
>> --
>> 2.34.1
> 


* [PATCH v2 03/22] arm: [MVE intrinsics] Rework vreinterpretq
  2023-05-02 15:49         ` Christophe Lyon
@ 2023-05-03 14:37           ` Christophe Lyon
  2023-05-03 14:52             ` Kyrylo Tkachov
  0 siblings, 1 reply; 55+ messages in thread
From: Christophe Lyon @ 2023-05-03 14:37 UTC (permalink / raw)
  To: gcc-patches, kyrylo.tkachov, richard.earnshaw, richard.sandiford
  Cc: Christophe Lyon

This patch implements vreinterpretq using the new MVE intrinsics
framework.

The old definitions for vreinterpretq are removed as a consequence.
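
Since vreinterpretq is a pure bit-cast, a minimal usage sketch looks
like this (hypothetical example, not taken from the testsuite):

  #include <arm_mve.h>

  uint8x16_t
  bitcast (int32x4_t a)
  {
    /* Reuses the same 128 bits as sixteen u8 lanes; after the fold to
       VIEW_CONVERT_EXPR this should not need any instruction.  */
    return vreinterpretq_u8 (a);
  }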

2022-09-08  Murray Steele  <murray.steele@arm.com>
	    Christophe Lyon  <christophe.lyon@arm.com>

	gcc/
	* config/arm/arm-mve-builtins-base.cc (vreinterpretq_impl): New class.
	* config/arm/arm-mve-builtins-base.def: Define vreinterpretq.
	* config/arm/arm-mve-builtins-base.h (vreinterpretq): New declaration.
	* config/arm/arm-mve-builtins-shapes.cc (parse_element_type): New function.
	(parse_type): Likewise.
	(parse_signature): Likewise.
	(build_one): Likewise.
	(build_all): Likewise.
	(overloaded_base): New struct.
	(unary_convert_def): Likewise.
	* config/arm/arm-mve-builtins-shapes.h (unary_convert): Declare.
	* config/arm/arm-mve-builtins.cc (TYPES_reinterpret_signed1): New
	macro.
	(TYPES_reinterpret_unsigned1): Likewise.
	(TYPES_reinterpret_integer): Likewise.
	(TYPES_reinterpret_integer1): Likewise.
	(TYPES_reinterpret_float1): Likewise.
	(TYPES_reinterpret_float): Likewise.
	(reinterpret_integer): New.
	(reinterpret_float): New.
	(handle_arm_mve_h): Register builtins.
	* config/arm/arm_mve.h (vreinterpretq_s16): Remove.
	(vreinterpretq_s32): Likewise.
	(vreinterpretq_s64): Likewise.
	(vreinterpretq_s8): Likewise.
	(vreinterpretq_u16): Likewise.
	(vreinterpretq_u32): Likewise.
	(vreinterpretq_u64): Likewise.
	(vreinterpretq_u8): Likewise.
	(vreinterpretq_f16): Likewise.
	(vreinterpretq_f32): Likewise.
	(vreinterpretq_s16_s32): Likewise.
	(vreinterpretq_s16_s64): Likewise.
	(vreinterpretq_s16_s8): Likewise.
	(vreinterpretq_s16_u16): Likewise.
	(vreinterpretq_s16_u32): Likewise.
	(vreinterpretq_s16_u64): Likewise.
	(vreinterpretq_s16_u8): Likewise.
	(vreinterpretq_s32_s16): Likewise.
	(vreinterpretq_s32_s64): Likewise.
	(vreinterpretq_s32_s8): Likewise.
	(vreinterpretq_s32_u16): Likewise.
	(vreinterpretq_s32_u32): Likewise.
	(vreinterpretq_s32_u64): Likewise.
	(vreinterpretq_s32_u8): Likewise.
	(vreinterpretq_s64_s16): Likewise.
	(vreinterpretq_s64_s32): Likewise.
	(vreinterpretq_s64_s8): Likewise.
	(vreinterpretq_s64_u16): Likewise.
	(vreinterpretq_s64_u32): Likewise.
	(vreinterpretq_s64_u64): Likewise.
	(vreinterpretq_s64_u8): Likewise.
	(vreinterpretq_s8_s16): Likewise.
	(vreinterpretq_s8_s32): Likewise.
	(vreinterpretq_s8_s64): Likewise.
	(vreinterpretq_s8_u16): Likewise.
	(vreinterpretq_s8_u32): Likewise.
	(vreinterpretq_s8_u64): Likewise.
	(vreinterpretq_s8_u8): Likewise.
	(vreinterpretq_u16_s16): Likewise.
	(vreinterpretq_u16_s32): Likewise.
	(vreinterpretq_u16_s64): Likewise.
	(vreinterpretq_u16_s8): Likewise.
	(vreinterpretq_u16_u32): Likewise.
	(vreinterpretq_u16_u64): Likewise.
	(vreinterpretq_u16_u8): Likewise.
	(vreinterpretq_u32_s16): Likewise.
	(vreinterpretq_u32_s32): Likewise.
	(vreinterpretq_u32_s64): Likewise.
	(vreinterpretq_u32_s8): Likewise.
	(vreinterpretq_u32_u16): Likewise.
	(vreinterpretq_u32_u64): Likewise.
	(vreinterpretq_u32_u8): Likewise.
	(vreinterpretq_u64_s16): Likewise.
	(vreinterpretq_u64_s32): Likewise.
	(vreinterpretq_u64_s64): Likewise.
	(vreinterpretq_u64_s8): Likewise.
	(vreinterpretq_u64_u16): Likewise.
	(vreinterpretq_u64_u32): Likewise.
	(vreinterpretq_u64_u8): Likewise.
	(vreinterpretq_u8_s16): Likewise.
	(vreinterpretq_u8_s32): Likewise.
	(vreinterpretq_u8_s64): Likewise.
	(vreinterpretq_u8_s8): Likewise.
	(vreinterpretq_u8_u16): Likewise.
	(vreinterpretq_u8_u32): Likewise.
	(vreinterpretq_u8_u64): Likewise.
	(vreinterpretq_s32_f16): Likewise.
	(vreinterpretq_s32_f32): Likewise.
	(vreinterpretq_u16_f16): Likewise.
	(vreinterpretq_u16_f32): Likewise.
	(vreinterpretq_u32_f16): Likewise.
	(vreinterpretq_u32_f32): Likewise.
	(vreinterpretq_u64_f16): Likewise.
	(vreinterpretq_u64_f32): Likewise.
	(vreinterpretq_u8_f16): Likewise.
	(vreinterpretq_u8_f32): Likewise.
	(vreinterpretq_f16_f32): Likewise.
	(vreinterpretq_f16_s16): Likewise.
	(vreinterpretq_f16_s32): Likewise.
	(vreinterpretq_f16_s64): Likewise.
	(vreinterpretq_f16_s8): Likewise.
	(vreinterpretq_f16_u16): Likewise.
	(vreinterpretq_f16_u32): Likewise.
	(vreinterpretq_f16_u64): Likewise.
	(vreinterpretq_f16_u8): Likewise.
	(vreinterpretq_f32_f16): Likewise.
	(vreinterpretq_f32_s16): Likewise.
	(vreinterpretq_f32_s32): Likewise.
	(vreinterpretq_f32_s64): Likewise.
	(vreinterpretq_f32_s8): Likewise.
	(vreinterpretq_f32_u16): Likewise.
	(vreinterpretq_f32_u32): Likewise.
	(vreinterpretq_f32_u64): Likewise.
	(vreinterpretq_f32_u8): Likewise.
	(vreinterpretq_s16_f16): Likewise.
	(vreinterpretq_s16_f32): Likewise.
	(vreinterpretq_s64_f16): Likewise.
	(vreinterpretq_s64_f32): Likewise.
	(vreinterpretq_s8_f16): Likewise.
	(vreinterpretq_s8_f32): Likewise.
	(__arm_vreinterpretq_f16): Likewise.
	(__arm_vreinterpretq_f32): Likewise.
	(__arm_vreinterpretq_s16): Likewise.
	(__arm_vreinterpretq_s32): Likewise.
	(__arm_vreinterpretq_s64): Likewise.
	(__arm_vreinterpretq_s8): Likewise.
	(__arm_vreinterpretq_u16): Likewise.
	(__arm_vreinterpretq_u32): Likewise.
	(__arm_vreinterpretq_u64): Likewise.
	(__arm_vreinterpretq_u8): Likewise.
	* config/arm/arm_mve_types.h (__arm_vreinterpretq_s16_s32): Remove.
	(__arm_vreinterpretq_s16_s64): Likewise.
	(__arm_vreinterpretq_s16_s8): Likewise.
	(__arm_vreinterpretq_s16_u16): Likewise.
	(__arm_vreinterpretq_s16_u32): Likewise.
	(__arm_vreinterpretq_s16_u64): Likewise.
	(__arm_vreinterpretq_s16_u8): Likewise.
	(__arm_vreinterpretq_s32_s16): Likewise.
	(__arm_vreinterpretq_s32_s64): Likewise.
	(__arm_vreinterpretq_s32_s8): Likewise.
	(__arm_vreinterpretq_s32_u16): Likewise.
	(__arm_vreinterpretq_s32_u32): Likewise.
	(__arm_vreinterpretq_s32_u64): Likewise.
	(__arm_vreinterpretq_s32_u8): Likewise.
	(__arm_vreinterpretq_s64_s16): Likewise.
	(__arm_vreinterpretq_s64_s32): Likewise.
	(__arm_vreinterpretq_s64_s8): Likewise.
	(__arm_vreinterpretq_s64_u16): Likewise.
	(__arm_vreinterpretq_s64_u32): Likewise.
	(__arm_vreinterpretq_s64_u64): Likewise.
	(__arm_vreinterpretq_s64_u8): Likewise.
	(__arm_vreinterpretq_s8_s16): Likewise.
	(__arm_vreinterpretq_s8_s32): Likewise.
	(__arm_vreinterpretq_s8_s64): Likewise.
	(__arm_vreinterpretq_s8_u16): Likewise.
	(__arm_vreinterpretq_s8_u32): Likewise.
	(__arm_vreinterpretq_s8_u64): Likewise.
	(__arm_vreinterpretq_s8_u8): Likewise.
	(__arm_vreinterpretq_u16_s16): Likewise.
	(__arm_vreinterpretq_u16_s32): Likewise.
	(__arm_vreinterpretq_u16_s64): Likewise.
	(__arm_vreinterpretq_u16_s8): Likewise.
	(__arm_vreinterpretq_u16_u32): Likewise.
	(__arm_vreinterpretq_u16_u64): Likewise.
	(__arm_vreinterpretq_u16_u8): Likewise.
	(__arm_vreinterpretq_u32_s16): Likewise.
	(__arm_vreinterpretq_u32_s32): Likewise.
	(__arm_vreinterpretq_u32_s64): Likewise.
	(__arm_vreinterpretq_u32_s8): Likewise.
	(__arm_vreinterpretq_u32_u16): Likewise.
	(__arm_vreinterpretq_u32_u64): Likewise.
	(__arm_vreinterpretq_u32_u8): Likewise.
	(__arm_vreinterpretq_u64_s16): Likewise.
	(__arm_vreinterpretq_u64_s32): Likewise.
	(__arm_vreinterpretq_u64_s64): Likewise.
	(__arm_vreinterpretq_u64_s8): Likewise.
	(__arm_vreinterpretq_u64_u16): Likewise.
	(__arm_vreinterpretq_u64_u32): Likewise.
	(__arm_vreinterpretq_u64_u8): Likewise.
	(__arm_vreinterpretq_u8_s16): Likewise.
	(__arm_vreinterpretq_u8_s32): Likewise.
	(__arm_vreinterpretq_u8_s64): Likewise.
	(__arm_vreinterpretq_u8_s8): Likewise.
	(__arm_vreinterpretq_u8_u16): Likewise.
	(__arm_vreinterpretq_u8_u32): Likewise.
	(__arm_vreinterpretq_u8_u64): Likewise.
	(__arm_vreinterpretq_s32_f16): Likewise.
	(__arm_vreinterpretq_s32_f32): Likewise.
	(__arm_vreinterpretq_s16_f16): Likewise.
	(__arm_vreinterpretq_s16_f32): Likewise.
	(__arm_vreinterpretq_s64_f16): Likewise.
	(__arm_vreinterpretq_s64_f32): Likewise.
	(__arm_vreinterpretq_s8_f16): Likewise.
	(__arm_vreinterpretq_s8_f32): Likewise.
	(__arm_vreinterpretq_u16_f16): Likewise.
	(__arm_vreinterpretq_u16_f32): Likewise.
	(__arm_vreinterpretq_u32_f16): Likewise.
	(__arm_vreinterpretq_u32_f32): Likewise.
	(__arm_vreinterpretq_u64_f16): Likewise.
	(__arm_vreinterpretq_u64_f32): Likewise.
	(__arm_vreinterpretq_u8_f16): Likewise.
	(__arm_vreinterpretq_u8_f32): Likewise.
	(__arm_vreinterpretq_f16_f32): Likewise.
	(__arm_vreinterpretq_f16_s16): Likewise.
	(__arm_vreinterpretq_f16_s32): Likewise.
	(__arm_vreinterpretq_f16_s64): Likewise.
	(__arm_vreinterpretq_f16_s8): Likewise.
	(__arm_vreinterpretq_f16_u16): Likewise.
	(__arm_vreinterpretq_f16_u32): Likewise.
	(__arm_vreinterpretq_f16_u64): Likewise.
	(__arm_vreinterpretq_f16_u8): Likewise.
	(__arm_vreinterpretq_f32_f16): Likewise.
	(__arm_vreinterpretq_f32_s16): Likewise.
	(__arm_vreinterpretq_f32_s32): Likewise.
	(__arm_vreinterpretq_f32_s64): Likewise.
	(__arm_vreinterpretq_f32_s8): Likewise.
	(__arm_vreinterpretq_f32_u16): Likewise.
	(__arm_vreinterpretq_f32_u32): Likewise.
	(__arm_vreinterpretq_f32_u64): Likewise.
	(__arm_vreinterpretq_f32_u8): Likewise.
	(__arm_vreinterpretq_s16): Likewise.
	(__arm_vreinterpretq_s32): Likewise.
	(__arm_vreinterpretq_s64): Likewise.
	(__arm_vreinterpretq_s8): Likewise.
	(__arm_vreinterpretq_u16): Likewise.
	(__arm_vreinterpretq_u32): Likewise.
	(__arm_vreinterpretq_u64): Likewise.
	(__arm_vreinterpretq_u8): Likewise.
	(__arm_vreinterpretq_f16): Likewise.
	(__arm_vreinterpretq_f32): Likewise.
	* config/arm/mve.md (@arm_mve_reinterpret<mode>): New pattern.
	* config/arm/unspecs.md (REINTERPRET): New unspec.

	gcc/testsuite/
	* g++.target/arm/mve.exp: Add general-c++ and general directories.
	* g++.target/arm/mve/general-c++/nomve_fp_1.c: New test.
	* g++.target/arm/mve/general-c++/vreinterpretq_1.C: New test.
	* gcc.target/arm/mve/general-c/nomve_fp_1.c: New test.
	* gcc.target/arm/mve/general-c/vreinterpretq_1.c: New test.
---
 gcc/config/arm/arm-mve-builtins-base.cc       |   33 +
 gcc/config/arm/arm-mve-builtins-base.def      |    2 +
 gcc/config/arm/arm-mve-builtins-base.h        |    2 +
 gcc/config/arm/arm-mve-builtins-shapes.cc     |   28 +
 gcc/config/arm/arm-mve-builtins-shapes.h      |    8 +
 gcc/config/arm/arm-mve-builtins.cc            |   60 +
 gcc/config/arm/arm_mve.h                      |  300 ----
 gcc/config/arm/arm_mve_types.h                | 1365 +----------------
 gcc/config/arm/mve.md                         |   18 +
 gcc/config/arm/unspecs.md                     |    1 +
 gcc/testsuite/g++.target/arm/mve.exp          |    8 +-
 .../arm/mve/general-c++/nomve_fp_1.c          |   15 +
 .../arm/mve/general-c++/vreinterpretq_1.C     |   25 +
 .../gcc.target/arm/mve/general-c/nomve_fp_1.c |   15 +
 .../arm/mve/general-c/vreinterpretq_1.c       |   25 +
 15 files changed, 290 insertions(+), 1615 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c
 create mode 100644 gcc/testsuite/g++.target/arm/mve/general-c++/vreinterpretq_1.C
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/nomve_fp_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/vreinterpretq_1.c

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index e9f285faf2b..abf6a1e19de 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -38,8 +38,41 @@ using namespace arm_mve;
 
 namespace {
 
+/* Implements vreinterpretq_* intrinsics.  */
+class vreinterpretq_impl : public quiet<function_base>
+{
+  gimple *
+  fold (gimple_folder &f) const override
+  {
+    /* We should punt to rtl if the effect of the reinterpret on
+       registers does not conform to GCC's endianness model like we do
+       on aarch64, but MVE intrinsics are not currently supported on
+       big-endian.  For this, we'd need to handle big-endian properly
+       in the .md file, like we do on aarch64 with
+       define_insn_and_split "*aarch64_sve_reinterpret<mode>".  */
+    gcc_assert (targetm.can_change_mode_class (f.vector_mode (0),
+					       f.vector_mode (1),
+					       VFP_REGS));
+
+    /* Otherwise vreinterpret corresponds directly to a VIEW_CONVERT_EXPR
+       reinterpretation.  */
+    tree rhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (f.lhs),
+		       gimple_call_arg (f.call, 0));
+    return gimple_build_assign (f.lhs, VIEW_CONVERT_EXPR, rhs);
+  }
+
+  rtx
+  expand (function_expander &e) const override
+  {
+    machine_mode mode = e.vector_mode (0);
+    return e.use_exact_insn (code_for_arm_mve_reinterpret (mode));
+  }
+};
+
 } /* end anonymous namespace */
 
 namespace arm_mve {
 
+FUNCTION (vreinterpretq, vreinterpretq_impl,)
+
 } /* end namespace arm_mve */
diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
index d15ba2e23e8..5c0c1b9cee7 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -18,7 +18,9 @@
    <http://www.gnu.org/licenses/>.  */
 
 #define REQUIRES_FLOAT false
+DEF_MVE_FUNCTION (vreinterpretq, unary_convert, reinterpret_integer, none)
 #undef REQUIRES_FLOAT
 
 #define REQUIRES_FLOAT true
+DEF_MVE_FUNCTION (vreinterpretq, unary_convert, reinterpret_float, none)
 #undef REQUIRES_FLOAT
diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h
index c4d7b750cd5..60e7bd24eda 100644
--- a/gcc/config/arm/arm-mve-builtins-base.h
+++ b/gcc/config/arm/arm-mve-builtins-base.h
@@ -23,6 +23,8 @@
 namespace arm_mve {
 namespace functions {
 
+extern const function_base *const vreinterpretq;
+
 } /* end namespace arm_mve::functions */
 } /* end namespace arm_mve */
 
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
index f20660d8319..d0da0ffef91 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -338,6 +338,34 @@ struct overloaded_base : public function_shape
   }
 };
 
+/* <T0>_t foo_t0[_t1](<T1>_t)
+
+   where the target type <t0> must be specified explicitly but the source
+   type <t1> can be inferred.
+
+   Example: vreinterpretq.
+   int16x8_t [__arm_]vreinterpretq_s16[_s8](int8x16_t a)
+   int32x4_t [__arm_]vreinterpretq_s32[_s8](int8x16_t a)
+   int8x16_t [__arm_]vreinterpretq_s8[_s16](int16x8_t a)
+   int8x16_t [__arm_]vreinterpretq_s8[_s32](int32x4_t a)  */
+struct unary_convert_def : public overloaded_base<1>
+{
+  void
+  build (function_builder &b, const function_group_info &group,
+	 bool preserve_user_namespace) const override
+  {
+    b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
+    build_all (b, "v0,v1", group, MODE_none, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+    return r.resolve_unary ();
+  }
+};
+SHAPE (unary_convert)
+
 } /* end namespace arm_mve */
 
 #undef SHAPE
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
index 9e353b85a76..04d19a02890 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -22,8 +22,16 @@
 
 namespace arm_mve
 {
+  /* The naming convention is:
+
+     - to use names like "unary" etc. if the rules are somewhat generic,
+       especially if there are no ranges involved.  */
+
   namespace shapes
   {
+
+    extern const function_shape *const unary_convert;
+
   } /* end namespace arm_mve::shapes */
 } /* end namespace arm_mve */
 
diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc
index b0cceb75ceb..e409a029346 100644
--- a/gcc/config/arm/arm-mve-builtins.cc
+++ b/gcc/config/arm/arm-mve-builtins.cc
@@ -199,6 +199,52 @@ CONSTEXPR const type_suffix_info type_suffixes[NUM_TYPE_SUFFIXES + 1] = {
 #define TYPES_signed_32(S, D) \
   S (s32)
 
+#define TYPES_reinterpret_signed1(D, A) \
+  D (A, s8), D (A, s16), D (A, s32), D (A, s64)
+
+#define TYPES_reinterpret_unsigned1(D, A) \
+  D (A, u8), D (A, u16), D (A, u32), D (A, u64)
+
+#define TYPES_reinterpret_integer(S, D) \
+  TYPES_reinterpret_unsigned1 (D, s8), \
+  D (s8, s16), D (s8, s32), D (s8, s64), \
+  TYPES_reinterpret_unsigned1 (D, s16), \
+  D (s16, s8), D (s16, s32), D (s16, s64), \
+  TYPES_reinterpret_unsigned1 (D, s32), \
+  D (s32, s8), D (s32, s16), D (s32, s64), \
+  TYPES_reinterpret_unsigned1 (D, s64), \
+  D (s64, s8), D (s64, s16), D (s64, s32), \
+  TYPES_reinterpret_signed1 (D, u8), \
+  D (u8, u16), D (u8, u32), D (u8, u64), \
+  TYPES_reinterpret_signed1 (D, u16), \
+  D (u16, u8), D (u16, u32), D (u16, u64), \
+  TYPES_reinterpret_signed1 (D, u32), \
+  D (u32, u8), D (u32, u16), D (u32, u64), \
+  TYPES_reinterpret_signed1 (D, u64), \
+  D (u64, u8), D (u64, u16), D (u64, u32)
+
+/* { _s8  _s16 _s32 _s64 } x { _s8  _s16 _s32 _s64 }
+   { _u8  _u16 _u32 _u64 }   { _u8  _u16 _u32 _u64 }.  */
+#define TYPES_reinterpret_integer1(D, A) \
+  TYPES_reinterpret_signed1 (D, A), \
+  TYPES_reinterpret_unsigned1 (D, A)
+
+#define TYPES_reinterpret_float1(D, A) \
+  D (A, f16), D (A, f32)
+
+#define TYPES_reinterpret_float(S, D) \
+  TYPES_reinterpret_float1 (D, s8), \
+  TYPES_reinterpret_float1 (D, s16), \
+  TYPES_reinterpret_float1 (D, s32), \
+  TYPES_reinterpret_float1 (D, s64), \
+  TYPES_reinterpret_float1 (D, u8), \
+  TYPES_reinterpret_float1 (D, u16), \
+  TYPES_reinterpret_float1 (D, u32), \
+  TYPES_reinterpret_float1 (D, u64), \
+  TYPES_reinterpret_integer1 (D, f16), \
+  TYPES_reinterpret_integer1 (D, f32), \
+  D (f16, f32), D (f32, f16)
+
 /* Describe a pair of type suffixes in which only the first is used.  */
 #define DEF_VECTOR_TYPE(X) { TYPE_SUFFIX_ ## X, NUM_TYPE_SUFFIXES }
 
@@ -231,6 +277,8 @@ DEF_MVE_TYPES_ARRAY (integer_16_32);
 DEF_MVE_TYPES_ARRAY (integer_32);
 DEF_MVE_TYPES_ARRAY (signed_16_32);
 DEF_MVE_TYPES_ARRAY (signed_32);
+DEF_MVE_TYPES_ARRAY (reinterpret_integer);
+DEF_MVE_TYPES_ARRAY (reinterpret_float);
 
 /* Used by functions that have no governing predicate.  */
 static const predication_index preds_none[] = { PRED_none, NUM_PREDS };
@@ -253,6 +301,14 @@ static const predication_index preds_p_or_none[] = {
   PRED_p, PRED_none, NUM_PREDS
 };
 
+/* A list of all MVE ACLE functions.  */
+static CONSTEXPR const function_group_info function_groups[] = {
+#define DEF_MVE_FUNCTION(NAME, SHAPE, TYPES, PREDS)			\
+  { #NAME, &functions::NAME, &shapes::SHAPE, types_##TYPES, preds_##PREDS, \
+    REQUIRES_FLOAT },
+#include "arm-mve-builtins.def"
+};
+
 /* The scalar type associated with each vector type.  */
 extern GTY(()) tree scalar_types[NUM_VECTOR_TYPES];
 tree scalar_types[NUM_VECTOR_TYPES];
@@ -431,6 +487,10 @@ handle_arm_mve_h (bool preserve_user_namespace)
 
   /* Define MVE functions.  */
   function_table = new hash_table<registered_function_hasher> (1023);
+  function_builder builder;
+  for (unsigned int i = 0; i < ARRAY_SIZE (function_groups); ++i)
+    builder.register_function_group (function_groups[i],
+				     preserve_user_namespace);
 }
 
 /* Return true if CANDIDATE is equivalent to MODEL_TYPE for overloading
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 0d2ba968fc0..7688b5a7e53 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -358,14 +358,6 @@
 #define vstrwq_scatter_shifted_offset_p(__base, __offset, __value, __p) __arm_vstrwq_scatter_shifted_offset_p(__base, __offset, __value, __p)
 #define vstrwq_scatter_shifted_offset(__base, __offset, __value) __arm_vstrwq_scatter_shifted_offset(__base, __offset, __value)
 #define vuninitializedq(__v) __arm_vuninitializedq(__v)
-#define vreinterpretq_s16(__a) __arm_vreinterpretq_s16(__a)
-#define vreinterpretq_s32(__a) __arm_vreinterpretq_s32(__a)
-#define vreinterpretq_s64(__a) __arm_vreinterpretq_s64(__a)
-#define vreinterpretq_s8(__a) __arm_vreinterpretq_s8(__a)
-#define vreinterpretq_u16(__a) __arm_vreinterpretq_u16(__a)
-#define vreinterpretq_u32(__a) __arm_vreinterpretq_u32(__a)
-#define vreinterpretq_u64(__a) __arm_vreinterpretq_u64(__a)
-#define vreinterpretq_u8(__a) __arm_vreinterpretq_u8(__a)
 #define vddupq_m(__inactive, __a, __imm, __p) __arm_vddupq_m(__inactive, __a, __imm, __p)
 #define vddupq_u8(__a, __imm) __arm_vddupq_u8(__a, __imm)
 #define vddupq_u32(__a, __imm) __arm_vddupq_u32(__a, __imm)
@@ -518,8 +510,6 @@
 #define vfmsq_m(__a, __b, __c, __p) __arm_vfmsq_m(__a, __b, __c, __p)
 #define vmaxnmq_m(__inactive, __a, __b, __p) __arm_vmaxnmq_m(__inactive, __a, __b, __p)
 #define vminnmq_m(__inactive, __a, __b, __p) __arm_vminnmq_m(__inactive, __a, __b, __p)
-#define vreinterpretq_f16(__a) __arm_vreinterpretq_f16(__a)
-#define vreinterpretq_f32(__a) __arm_vreinterpretq_f32(__a)
 #define vminnmq_x(__a, __b, __p) __arm_vminnmq_x(__a, __b, __p)
 #define vmaxnmq_x(__a, __b, __p) __arm_vmaxnmq_x(__a, __b, __p)
 #define vcmulq_x(__a, __b, __p) __arm_vcmulq_x(__a, __b, __p)
@@ -2365,96 +2355,6 @@
 #define vaddq_u32(__a, __b) __arm_vaddq_u32(__a, __b)
 #define vaddq_f16(__a, __b) __arm_vaddq_f16(__a, __b)
 #define vaddq_f32(__a, __b) __arm_vaddq_f32(__a, __b)
-#define vreinterpretq_s16_s32(__a) __arm_vreinterpretq_s16_s32(__a)
-#define vreinterpretq_s16_s64(__a) __arm_vreinterpretq_s16_s64(__a)
-#define vreinterpretq_s16_s8(__a) __arm_vreinterpretq_s16_s8(__a)
-#define vreinterpretq_s16_u16(__a) __arm_vreinterpretq_s16_u16(__a)
-#define vreinterpretq_s16_u32(__a) __arm_vreinterpretq_s16_u32(__a)
-#define vreinterpretq_s16_u64(__a) __arm_vreinterpretq_s16_u64(__a)
-#define vreinterpretq_s16_u8(__a) __arm_vreinterpretq_s16_u8(__a)
-#define vreinterpretq_s32_s16(__a) __arm_vreinterpretq_s32_s16(__a)
-#define vreinterpretq_s32_s64(__a) __arm_vreinterpretq_s32_s64(__a)
-#define vreinterpretq_s32_s8(__a) __arm_vreinterpretq_s32_s8(__a)
-#define vreinterpretq_s32_u16(__a) __arm_vreinterpretq_s32_u16(__a)
-#define vreinterpretq_s32_u32(__a) __arm_vreinterpretq_s32_u32(__a)
-#define vreinterpretq_s32_u64(__a) __arm_vreinterpretq_s32_u64(__a)
-#define vreinterpretq_s32_u8(__a) __arm_vreinterpretq_s32_u8(__a)
-#define vreinterpretq_s64_s16(__a) __arm_vreinterpretq_s64_s16(__a)
-#define vreinterpretq_s64_s32(__a) __arm_vreinterpretq_s64_s32(__a)
-#define vreinterpretq_s64_s8(__a) __arm_vreinterpretq_s64_s8(__a)
-#define vreinterpretq_s64_u16(__a) __arm_vreinterpretq_s64_u16(__a)
-#define vreinterpretq_s64_u32(__a) __arm_vreinterpretq_s64_u32(__a)
-#define vreinterpretq_s64_u64(__a) __arm_vreinterpretq_s64_u64(__a)
-#define vreinterpretq_s64_u8(__a) __arm_vreinterpretq_s64_u8(__a)
-#define vreinterpretq_s8_s16(__a) __arm_vreinterpretq_s8_s16(__a)
-#define vreinterpretq_s8_s32(__a) __arm_vreinterpretq_s8_s32(__a)
-#define vreinterpretq_s8_s64(__a) __arm_vreinterpretq_s8_s64(__a)
-#define vreinterpretq_s8_u16(__a) __arm_vreinterpretq_s8_u16(__a)
-#define vreinterpretq_s8_u32(__a) __arm_vreinterpretq_s8_u32(__a)
-#define vreinterpretq_s8_u64(__a) __arm_vreinterpretq_s8_u64(__a)
-#define vreinterpretq_s8_u8(__a) __arm_vreinterpretq_s8_u8(__a)
-#define vreinterpretq_u16_s16(__a) __arm_vreinterpretq_u16_s16(__a)
-#define vreinterpretq_u16_s32(__a) __arm_vreinterpretq_u16_s32(__a)
-#define vreinterpretq_u16_s64(__a) __arm_vreinterpretq_u16_s64(__a)
-#define vreinterpretq_u16_s8(__a) __arm_vreinterpretq_u16_s8(__a)
-#define vreinterpretq_u16_u32(__a) __arm_vreinterpretq_u16_u32(__a)
-#define vreinterpretq_u16_u64(__a) __arm_vreinterpretq_u16_u64(__a)
-#define vreinterpretq_u16_u8(__a) __arm_vreinterpretq_u16_u8(__a)
-#define vreinterpretq_u32_s16(__a) __arm_vreinterpretq_u32_s16(__a)
-#define vreinterpretq_u32_s32(__a) __arm_vreinterpretq_u32_s32(__a)
-#define vreinterpretq_u32_s64(__a) __arm_vreinterpretq_u32_s64(__a)
-#define vreinterpretq_u32_s8(__a) __arm_vreinterpretq_u32_s8(__a)
-#define vreinterpretq_u32_u16(__a) __arm_vreinterpretq_u32_u16(__a)
-#define vreinterpretq_u32_u64(__a) __arm_vreinterpretq_u32_u64(__a)
-#define vreinterpretq_u32_u8(__a) __arm_vreinterpretq_u32_u8(__a)
-#define vreinterpretq_u64_s16(__a) __arm_vreinterpretq_u64_s16(__a)
-#define vreinterpretq_u64_s32(__a) __arm_vreinterpretq_u64_s32(__a)
-#define vreinterpretq_u64_s64(__a) __arm_vreinterpretq_u64_s64(__a)
-#define vreinterpretq_u64_s8(__a) __arm_vreinterpretq_u64_s8(__a)
-#define vreinterpretq_u64_u16(__a) __arm_vreinterpretq_u64_u16(__a)
-#define vreinterpretq_u64_u32(__a) __arm_vreinterpretq_u64_u32(__a)
-#define vreinterpretq_u64_u8(__a) __arm_vreinterpretq_u64_u8(__a)
-#define vreinterpretq_u8_s16(__a) __arm_vreinterpretq_u8_s16(__a)
-#define vreinterpretq_u8_s32(__a) __arm_vreinterpretq_u8_s32(__a)
-#define vreinterpretq_u8_s64(__a) __arm_vreinterpretq_u8_s64(__a)
-#define vreinterpretq_u8_s8(__a) __arm_vreinterpretq_u8_s8(__a)
-#define vreinterpretq_u8_u16(__a) __arm_vreinterpretq_u8_u16(__a)
-#define vreinterpretq_u8_u32(__a) __arm_vreinterpretq_u8_u32(__a)
-#define vreinterpretq_u8_u64(__a) __arm_vreinterpretq_u8_u64(__a)
-#define vreinterpretq_s32_f16(__a) __arm_vreinterpretq_s32_f16(__a)
-#define vreinterpretq_s32_f32(__a) __arm_vreinterpretq_s32_f32(__a)
-#define vreinterpretq_u16_f16(__a) __arm_vreinterpretq_u16_f16(__a)
-#define vreinterpretq_u16_f32(__a) __arm_vreinterpretq_u16_f32(__a)
-#define vreinterpretq_u32_f16(__a) __arm_vreinterpretq_u32_f16(__a)
-#define vreinterpretq_u32_f32(__a) __arm_vreinterpretq_u32_f32(__a)
-#define vreinterpretq_u64_f16(__a) __arm_vreinterpretq_u64_f16(__a)
-#define vreinterpretq_u64_f32(__a) __arm_vreinterpretq_u64_f32(__a)
-#define vreinterpretq_u8_f16(__a) __arm_vreinterpretq_u8_f16(__a)
-#define vreinterpretq_u8_f32(__a) __arm_vreinterpretq_u8_f32(__a)
-#define vreinterpretq_f16_f32(__a) __arm_vreinterpretq_f16_f32(__a)
-#define vreinterpretq_f16_s16(__a) __arm_vreinterpretq_f16_s16(__a)
-#define vreinterpretq_f16_s32(__a) __arm_vreinterpretq_f16_s32(__a)
-#define vreinterpretq_f16_s64(__a) __arm_vreinterpretq_f16_s64(__a)
-#define vreinterpretq_f16_s8(__a) __arm_vreinterpretq_f16_s8(__a)
-#define vreinterpretq_f16_u16(__a) __arm_vreinterpretq_f16_u16(__a)
-#define vreinterpretq_f16_u32(__a) __arm_vreinterpretq_f16_u32(__a)
-#define vreinterpretq_f16_u64(__a) __arm_vreinterpretq_f16_u64(__a)
-#define vreinterpretq_f16_u8(__a) __arm_vreinterpretq_f16_u8(__a)
-#define vreinterpretq_f32_f16(__a) __arm_vreinterpretq_f32_f16(__a)
-#define vreinterpretq_f32_s16(__a) __arm_vreinterpretq_f32_s16(__a)
-#define vreinterpretq_f32_s32(__a) __arm_vreinterpretq_f32_s32(__a)
-#define vreinterpretq_f32_s64(__a) __arm_vreinterpretq_f32_s64(__a)
-#define vreinterpretq_f32_s8(__a) __arm_vreinterpretq_f32_s8(__a)
-#define vreinterpretq_f32_u16(__a) __arm_vreinterpretq_f32_u16(__a)
-#define vreinterpretq_f32_u32(__a) __arm_vreinterpretq_f32_u32(__a)
-#define vreinterpretq_f32_u64(__a) __arm_vreinterpretq_f32_u64(__a)
-#define vreinterpretq_f32_u8(__a) __arm_vreinterpretq_f32_u8(__a)
-#define vreinterpretq_s16_f16(__a) __arm_vreinterpretq_s16_f16(__a)
-#define vreinterpretq_s16_f32(__a) __arm_vreinterpretq_s16_f32(__a)
-#define vreinterpretq_s64_f16(__a) __arm_vreinterpretq_s64_f16(__a)
-#define vreinterpretq_s64_f32(__a) __arm_vreinterpretq_s64_f32(__a)
-#define vreinterpretq_s8_f16(__a) __arm_vreinterpretq_s8_f16(__a)
-#define vreinterpretq_s8_f32(__a) __arm_vreinterpretq_s8_f32(__a)
 #define vuninitializedq_u8(void) __arm_vuninitializedq_u8(void)
 #define vuninitializedq_u16(void) __arm_vuninitializedq_u16(void)
 #define vuninitializedq_u32(void) __arm_vuninitializedq_u32(void)
@@ -37874,126 +37774,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16x8_t]: __arm_vuninitializedq_f16 (), \
   int (*)[__ARM_mve_type_float32x4_t]: __arm_vuninitializedq_f32 ());})
 
-#define __arm_vreinterpretq_f16(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_f16_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_f16_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_f16_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_f16_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_f16_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_f16_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_f16_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_f16_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
-  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_f16_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
-
-#define __arm_vreinterpretq_f32(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_f32_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_f32_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_f32_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_f32_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_f32_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_f32_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_f32_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_f32_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
-  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_f32_f16 (__ARM_mve_coerce(__p0, float16x8_t)));})
-
-#define __arm_vreinterpretq_s16(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_s16_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_s16_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_s16_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_s16_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s16_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s16_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s16_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s16_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
-  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_s16_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
-
-#define __arm_vreinterpretq_s32(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_s32_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_s32_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_s32_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_s32_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s32_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s32_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s32_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s32_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
-  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_s32_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
-
-#define __arm_vreinterpretq_s64(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_s64_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_s64_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_s64_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_s64_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s64_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s64_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s64_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s64_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
-  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_s64_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
-
-#define __arm_vreinterpretq_s8(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_s8_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_s8_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_s8_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_s8_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s8_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s8_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s8_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s8_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
-  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_s8_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
-
-#define __arm_vreinterpretq_u16(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_u16_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u16_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u16_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u16_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_u16_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u16_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_u16_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_u16_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
-  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_u16_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
-
-#define __arm_vreinterpretq_u32(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_u32_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u32_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u32_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u32_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_u32_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_u32_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u32_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_u32_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
-  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_u32_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
-
-#define __arm_vreinterpretq_u64(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_u64_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u64_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u64_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u64_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_u64_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_u64_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_u64_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u64_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
-  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_u64_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
-
-#define __arm_vreinterpretq_u8(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_u8_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u8_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u8_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u8_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u8_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_u8_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_u8_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_u8_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
-  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_u8_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
-
 #define __arm_vstrwq_scatter_base_wb(p0,p1,p2) ({ __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(__p2)])0, \
   int (*)[__ARM_mve_type_int32x4_t]: __arm_vstrwq_scatter_base_wb_s32 (p0, p1, __ARM_mve_coerce(__p2, int32x4_t)), \
@@ -39931,86 +39711,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint32x4_t]: __arm_vuninitializedq_u32 (), \
   int (*)[__ARM_mve_type_uint64x2_t]: __arm_vuninitializedq_u64 ());})
 
-#define __arm_vreinterpretq_s16(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_s16_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_s16_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_s16_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s16_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s16_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s16_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s16_u64 (__ARM_mve_coerce(__p0, uint64x2_t)));})
-
-#define __arm_vreinterpretq_s32(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_s32_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_s32_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_s32_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s32_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s32_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s32_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s32_u64 (__ARM_mve_coerce(__p0, uint64x2_t)));})
-
-#define __arm_vreinterpretq_s64(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_s64_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_s64_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_s64_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s64_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s64_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s64_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s64_u64 (__ARM_mve_coerce(__p0, uint64x2_t)));})
-
-#define __arm_vreinterpretq_s8(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_s8_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_s8_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_s8_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s8_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s8_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s8_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s8_u64 (__ARM_mve_coerce(__p0, uint64x2_t)));})
-
-#define __arm_vreinterpretq_u16(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u16_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u16_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u16_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_u16_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u16_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_u16_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_u16_u64 (__ARM_mve_coerce(__p0, uint64x2_t)));})
-
-#define __arm_vreinterpretq_u32(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u32_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u32_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u32_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_u32_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_u32_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u32_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_u32_u64 (__ARM_mve_coerce(__p0, uint64x2_t)));})
-
-#define __arm_vreinterpretq_u64(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u64_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u64_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u64_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_u64_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_u64_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_u64_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u64_s64 (__ARM_mve_coerce(__p0, int64x2_t)));})
-
-#define __arm_vreinterpretq_u8(p0) ({ __typeof(p0) __p0 = (p0); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
-  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u8_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
-  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u8_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
-  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u8_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
-  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u8_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
-  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_u8_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_u8_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
-  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_u8_u64 (__ARM_mve_coerce(__p0, uint64x2_t)));})
-
 #define __arm_vabsq_x(p1,p2) ({ __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p1)])0, \
   int (*)[__ARM_mve_type_int8x16_t]: __arm_vabsq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), p2), \
diff --git a/gcc/config/arm/arm_mve_types.h b/gcc/config/arm/arm_mve_types.h
index 12bb519142f..ae2591faa03 100644
--- a/gcc/config/arm/arm_mve_types.h
+++ b/gcc/config/arm/arm_mve_types.h
@@ -29,1124 +29,101 @@ typedef float float32_t;
 
 #pragma GCC arm "arm_mve_types.h"
 
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16_s32 (int32x4_t __a)
-{
-  return (int16x8_t)  __a;
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16_s64 (int64x2_t __a)
-{
-  return (int16x8_t)  __a;
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16_s8 (int8x16_t __a)
-{
-  return (int16x8_t)  __a;
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16_u16 (uint16x8_t __a)
-{
-  return (int16x8_t)  __a;
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16_u32 (uint32x4_t __a)
-{
-  return (int16x8_t)  __a;
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16_u64 (uint64x2_t __a)
-{
-  return (int16x8_t)  __a;
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16_u8 (uint8x16_t __a)
-{
-  return (int16x8_t)  __a;
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32_s16 (int16x8_t __a)
-{
-  return (int32x4_t)  __a;
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32_s64 (int64x2_t __a)
-{
-  return (int32x4_t)  __a;
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32_s8 (int8x16_t __a)
-{
-  return (int32x4_t)  __a;
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32_u16 (uint16x8_t __a)
-{
-  return (int32x4_t)  __a;
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32_u32 (uint32x4_t __a)
-{
-  return (int32x4_t)  __a;
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32_u64 (uint64x2_t __a)
-{
-  return (int32x4_t)  __a;
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32_u8 (uint8x16_t __a)
-{
-  return (int32x4_t)  __a;
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64_s16 (int16x8_t __a)
-{
-  return (int64x2_t)  __a;
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64_s32 (int32x4_t __a)
-{
-  return (int64x2_t)  __a;
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64_s8 (int8x16_t __a)
-{
-  return (int64x2_t)  __a;
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64_u16 (uint16x8_t __a)
-{
-  return (int64x2_t)  __a;
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64_u32 (uint32x4_t __a)
-{
-  return (int64x2_t)  __a;
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64_u64 (uint64x2_t __a)
-{
-  return (int64x2_t)  __a;
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64_u8 (uint8x16_t __a)
-{
-  return (int64x2_t)  __a;
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8_s16 (int16x8_t __a)
-{
-  return (int8x16_t)  __a;
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8_s32 (int32x4_t __a)
-{
-  return (int8x16_t)  __a;
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8_s64 (int64x2_t __a)
-{
-  return (int8x16_t)  __a;
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8_u16 (uint16x8_t __a)
-{
-  return (int8x16_t)  __a;
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8_u32 (uint32x4_t __a)
-{
-  return (int8x16_t)  __a;
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8_u64 (uint64x2_t __a)
-{
-  return (int8x16_t)  __a;
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8_u8 (uint8x16_t __a)
-{
-  return (int8x16_t)  __a;
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16_s16 (int16x8_t __a)
-{
-  return (uint16x8_t)  __a;
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16_s32 (int32x4_t __a)
-{
-  return (uint16x8_t)  __a;
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16_s64 (int64x2_t __a)
-{
-  return (uint16x8_t)  __a;
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16_s8 (int8x16_t __a)
-{
-  return (uint16x8_t)  __a;
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16_u32 (uint32x4_t __a)
-{
-  return (uint16x8_t)  __a;
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16_u64 (uint64x2_t __a)
-{
-  return (uint16x8_t)  __a;
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16_u8 (uint8x16_t __a)
-{
-  return (uint16x8_t)  __a;
-}
-
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32_s16 (int16x8_t __a)
-{
-  return (uint32x4_t)  __a;
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32_s32 (int32x4_t __a)
-{
-  return (uint32x4_t)  __a;
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32_s64 (int64x2_t __a)
-{
-  return (uint32x4_t)  __a;
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32_s8 (int8x16_t __a)
-{
-  return (uint32x4_t)  __a;
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32_u16 (uint16x8_t __a)
-{
-  return (uint32x4_t)  __a;
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32_u64 (uint64x2_t __a)
-{
-  return (uint32x4_t)  __a;
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32_u8 (uint8x16_t __a)
-{
-  return (uint32x4_t)  __a;
-}
-
-__extension__ extern __inline uint64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64_s16 (int16x8_t __a)
-{
-  return (uint64x2_t)  __a;
-}
-
-__extension__ extern __inline uint64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64_s32 (int32x4_t __a)
-{
-  return (uint64x2_t)  __a;
-}
-
-__extension__ extern __inline uint64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64_s64 (int64x2_t __a)
-{
-  return (uint64x2_t)  __a;
-}
-
-__extension__ extern __inline uint64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64_s8 (int8x16_t __a)
-{
-  return (uint64x2_t)  __a;
-}
-
-__extension__ extern __inline uint64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64_u16 (uint16x8_t __a)
-{
-  return (uint64x2_t)  __a;
-}
-
-__extension__ extern __inline uint64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64_u32 (uint32x4_t __a)
-{
-  return (uint64x2_t)  __a;
-}
-
-__extension__ extern __inline uint64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64_u8 (uint8x16_t __a)
-{
-  return (uint64x2_t)  __a;
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8_s16 (int16x8_t __a)
-{
-  return (uint8x16_t)  __a;
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8_s32 (int32x4_t __a)
-{
-  return (uint8x16_t)  __a;
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8_s64 (int64x2_t __a)
-{
-  return (uint8x16_t)  __a;
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8_s8 (int8x16_t __a)
-{
-  return (uint8x16_t)  __a;
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8_u16 (uint16x8_t __a)
-{
-  return (uint8x16_t)  __a;
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8_u32 (uint32x4_t __a)
-{
-  return (uint8x16_t)  __a;
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8_u64 (uint64x2_t __a)
-{
-  return (uint8x16_t)  __a;
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq_u8 (void)
-{
-  uint8x16_t __uninit;
-  __asm__ ("": "=w"(__uninit));
-  return __uninit;
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq_u16 (void)
-{
-  uint16x8_t __uninit;
-  __asm__ ("": "=w"(__uninit));
-  return __uninit;
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq_u32 (void)
-{
-  uint32x4_t __uninit;
-  __asm__ ("": "=w"(__uninit));
-  return __uninit;
-}
-
-__extension__ extern __inline uint64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq_u64 (void)
-{
-  uint64x2_t __uninit;
-  __asm__ ("": "=w"(__uninit));
-  return __uninit;
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq_s8 (void)
-{
-  int8x16_t __uninit;
-  __asm__ ("": "=w"(__uninit));
-  return __uninit;
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq_s16 (void)
-{
-  int16x8_t __uninit;
-  __asm__ ("": "=w"(__uninit));
-  return __uninit;
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq_s32 (void)
-{
-  int32x4_t __uninit;
-  __asm__ ("": "=w"(__uninit));
-  return __uninit;
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq_s64 (void)
-{
-  int64x2_t __uninit;
-  __asm__ ("": "=w"(__uninit));
-  return __uninit;
-}
-
-#if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32_f16 (float16x8_t __a)
-{
-  return (int32x4_t)  __a;
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32_f32 (float32x4_t __a)
-{
-  return (int32x4_t)  __a;
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16_f16 (float16x8_t __a)
-{
-  return (int16x8_t)  __a;
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16_f32 (float32x4_t __a)
-{
-  return (int16x8_t)  __a;
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64_f16 (float16x8_t __a)
-{
-  return (int64x2_t)  __a;
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64_f32 (float32x4_t __a)
-{
-  return (int64x2_t)  __a;
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8_f16 (float16x8_t __a)
-{
-  return (int8x16_t)  __a;
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8_f32 (float32x4_t __a)
-{
-  return (int8x16_t)  __a;
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16_f16 (float16x8_t __a)
-{
-  return (uint16x8_t)  __a;
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16_f32 (float32x4_t __a)
-{
-  return (uint16x8_t)  __a;
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32_f16 (float16x8_t __a)
-{
-  return (uint32x4_t)  __a;
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32_f32 (float32x4_t __a)
-{
-  return (uint32x4_t)  __a;
-}
-
-__extension__ extern __inline uint64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64_f16 (float16x8_t __a)
-{
-  return (uint64x2_t)  __a;
-}
-
-__extension__ extern __inline uint64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64_f32 (float32x4_t __a)
-{
-  return (uint64x2_t)  __a;
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8_f16 (float16x8_t __a)
-{
-  return (uint8x16_t)  __a;
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8_f32 (float32x4_t __a)
-{
-  return (uint8x16_t)  __a;
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16_f32 (float32x4_t __a)
-{
-  return (float16x8_t)  __a;
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16_s16 (int16x8_t __a)
-{
-  return (float16x8_t)  __a;
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16_s32 (int32x4_t __a)
-{
-  return (float16x8_t)  __a;
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16_s64 (int64x2_t __a)
-{
-  return (float16x8_t)  __a;
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16_s8 (int8x16_t __a)
-{
-  return (float16x8_t)  __a;
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16_u16 (uint16x8_t __a)
-{
-  return (float16x8_t)  __a;
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16_u32 (uint32x4_t __a)
-{
-  return (float16x8_t)  __a;
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16_u64 (uint64x2_t __a)
-{
-  return (float16x8_t)  __a;
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16_u8 (uint8x16_t __a)
-{
-  return (float16x8_t)  __a;
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32_f16 (float16x8_t __a)
-{
-  return (float32x4_t)  __a;
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32_s16 (int16x8_t __a)
-{
-  return (float32x4_t)  __a;
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32_s32 (int32x4_t __a)
-{
-  return (float32x4_t)  __a;
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32_s64 (int64x2_t __a)
-{
-  return (float32x4_t)  __a;
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32_s8 (int8x16_t __a)
-{
-  return (float32x4_t)  __a;
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32_u16 (uint16x8_t __a)
-{
-  return (float32x4_t)  __a;
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32_u32 (uint32x4_t __a)
-{
-  return (float32x4_t)  __a;
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32_u64 (uint64x2_t __a)
-{
-  return (float32x4_t)  __a;
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32_u8 (uint8x16_t __a)
-{
-  return (float32x4_t)  __a;
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq_f16 (void)
-{
-  float16x8_t __uninit;
-  __asm__ ("": "=w" (__uninit));
-  return __uninit;
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vuninitializedq_f32 (void)
-{
-  float32x4_t __uninit;
-  __asm__ ("": "=w" (__uninit));
-  return __uninit;
-}
-
-#endif
-
-#ifdef __cplusplus
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16 (int32x4_t __a)
-{
- return __arm_vreinterpretq_s16_s32 (__a);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16 (int64x2_t __a)
-{
- return __arm_vreinterpretq_s16_s64 (__a);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16 (int8x16_t __a)
-{
- return __arm_vreinterpretq_s16_s8 (__a);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16 (uint16x8_t __a)
-{
- return __arm_vreinterpretq_s16_u16 (__a);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16 (uint32x4_t __a)
-{
- return __arm_vreinterpretq_s16_u32 (__a);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16 (uint64x2_t __a)
-{
- return __arm_vreinterpretq_s16_u64 (__a);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16 (uint8x16_t __a)
-{
- return __arm_vreinterpretq_s16_u8 (__a);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32 (int16x8_t __a)
-{
- return __arm_vreinterpretq_s32_s16 (__a);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32 (int64x2_t __a)
-{
- return __arm_vreinterpretq_s32_s64 (__a);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32 (int8x16_t __a)
-{
- return __arm_vreinterpretq_s32_s8 (__a);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32 (uint16x8_t __a)
-{
- return __arm_vreinterpretq_s32_u16 (__a);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32 (uint32x4_t __a)
-{
- return __arm_vreinterpretq_s32_u32 (__a);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32 (uint64x2_t __a)
-{
- return __arm_vreinterpretq_s32_u64 (__a);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32 (uint8x16_t __a)
-{
- return __arm_vreinterpretq_s32_u8 (__a);
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64 (int16x8_t __a)
-{
- return __arm_vreinterpretq_s64_s16 (__a);
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64 (int32x4_t __a)
-{
- return __arm_vreinterpretq_s64_s32 (__a);
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64 (int8x16_t __a)
-{
- return __arm_vreinterpretq_s64_s8 (__a);
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64 (uint16x8_t __a)
-{
- return __arm_vreinterpretq_s64_u16 (__a);
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64 (uint32x4_t __a)
-{
- return __arm_vreinterpretq_s64_u32 (__a);
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64 (uint64x2_t __a)
-{
- return __arm_vreinterpretq_s64_u64 (__a);
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64 (uint8x16_t __a)
-{
- return __arm_vreinterpretq_s64_u8 (__a);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8 (int16x8_t __a)
-{
- return __arm_vreinterpretq_s8_s16 (__a);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8 (int32x4_t __a)
-{
- return __arm_vreinterpretq_s8_s32 (__a);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8 (int64x2_t __a)
-{
- return __arm_vreinterpretq_s8_s64 (__a);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8 (uint16x8_t __a)
-{
- return __arm_vreinterpretq_s8_u16 (__a);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8 (uint32x4_t __a)
-{
- return __arm_vreinterpretq_s8_u32 (__a);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8 (uint64x2_t __a)
-{
- return __arm_vreinterpretq_s8_u64 (__a);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8 (uint8x16_t __a)
-{
- return __arm_vreinterpretq_s8_u8 (__a);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16 (int16x8_t __a)
-{
- return __arm_vreinterpretq_u16_s16 (__a);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16 (int32x4_t __a)
-{
- return __arm_vreinterpretq_u16_s32 (__a);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16 (int64x2_t __a)
-{
- return __arm_vreinterpretq_u16_s64 (__a);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16 (int8x16_t __a)
-{
- return __arm_vreinterpretq_u16_s8 (__a);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16 (uint32x4_t __a)
-{
- return __arm_vreinterpretq_u16_u32 (__a);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16 (uint64x2_t __a)
-{
- return __arm_vreinterpretq_u16_u64 (__a);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16 (uint8x16_t __a)
-{
- return __arm_vreinterpretq_u16_u8 (__a);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32 (int16x8_t __a)
-{
- return __arm_vreinterpretq_u32_s16 (__a);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32 (int32x4_t __a)
-{
- return __arm_vreinterpretq_u32_s32 (__a);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32 (int64x2_t __a)
-{
- return __arm_vreinterpretq_u32_s64 (__a);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32 (int8x16_t __a)
-{
- return __arm_vreinterpretq_u32_s8 (__a);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32 (uint16x8_t __a)
-{
- return __arm_vreinterpretq_u32_u16 (__a);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32 (uint64x2_t __a)
-{
- return __arm_vreinterpretq_u32_u64 (__a);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32 (uint8x16_t __a)
-{
- return __arm_vreinterpretq_u32_u8 (__a);
-}
-
-__extension__ extern __inline uint64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64 (int16x8_t __a)
-{
- return __arm_vreinterpretq_u64_s16 (__a);
-}
-
-__extension__ extern __inline uint64x2_t
+__extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64 (int32x4_t __a)
+__arm_vuninitializedq_u8 (void)
 {
- return __arm_vreinterpretq_u64_s32 (__a);
+  uint8x16_t __uninit;
+  __asm__ ("": "=w"(__uninit));
+  return __uninit;
 }
 
-__extension__ extern __inline uint64x2_t
+__extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64 (int64x2_t __a)
+__arm_vuninitializedq_u16 (void)
 {
- return __arm_vreinterpretq_u64_s64 (__a);
+  uint16x8_t __uninit;
+  __asm__ ("": "=w"(__uninit));
+  return __uninit;
 }
 
-__extension__ extern __inline uint64x2_t
+__extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64 (int8x16_t __a)
+__arm_vuninitializedq_u32 (void)
 {
- return __arm_vreinterpretq_u64_s8 (__a);
+  uint32x4_t __uninit;
+  __asm__ ("": "=w"(__uninit));
+  return __uninit;
 }
 
 __extension__ extern __inline uint64x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64 (uint16x8_t __a)
+__arm_vuninitializedq_u64 (void)
 {
- return __arm_vreinterpretq_u64_u16 (__a);
+  uint64x2_t __uninit;
+  __asm__ ("": "=w"(__uninit));
+  return __uninit;
 }
 
-__extension__ extern __inline uint64x2_t
+__extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64 (uint32x4_t __a)
+__arm_vuninitializedq_s8 (void)
 {
- return __arm_vreinterpretq_u64_u32 (__a);
+  int8x16_t __uninit;
+  __asm__ ("": "=w"(__uninit));
+  return __uninit;
 }
 
-__extension__ extern __inline uint64x2_t
+__extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64 (uint8x16_t __a)
+__arm_vuninitializedq_s16 (void)
 {
- return __arm_vreinterpretq_u64_u8 (__a);
+  int16x8_t __uninit;
+  __asm__ ("": "=w"(__uninit));
+  return __uninit;
 }
 
-__extension__ extern __inline uint8x16_t
+__extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8 (int16x8_t __a)
+__arm_vuninitializedq_s32 (void)
 {
- return __arm_vreinterpretq_u8_s16 (__a);
+  int32x4_t __uninit;
+  __asm__ ("": "=w"(__uninit));
+  return __uninit;
 }
 
-__extension__ extern __inline uint8x16_t
+__extension__ extern __inline int64x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8 (int32x4_t __a)
+__arm_vuninitializedq_s64 (void)
 {
- return __arm_vreinterpretq_u8_s32 (__a);
+  int64x2_t __uninit;
+  __asm__ ("": "=w"(__uninit));
+  return __uninit;
 }
 
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8 (int64x2_t __a)
-{
- return __arm_vreinterpretq_u8_s64 (__a);
-}
+#if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
 
-__extension__ extern __inline uint8x16_t
+__extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8 (int8x16_t __a)
+__arm_vuninitializedq_f16 (void)
 {
- return __arm_vreinterpretq_u8_s8 (__a);
+  float16x8_t __uninit;
+  __asm__ ("": "=w" (__uninit));
+  return __uninit;
 }
 
-__extension__ extern __inline uint8x16_t
+__extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8 (uint16x8_t __a)
+__arm_vuninitializedq_f32 (void)
 {
- return __arm_vreinterpretq_u8_u16 (__a);
+  float32x4_t __uninit;
+  __asm__ ("": "=w" (__uninit));
+  return __uninit;
 }
 
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8 (uint32x4_t __a)
-{
- return __arm_vreinterpretq_u8_u32 (__a);
-}
+#endif
 
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8 (uint64x2_t __a)
-{
- return __arm_vreinterpretq_u8_u64 (__a);
-}
+#ifdef __cplusplus
 
 __extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
@@ -1205,244 +182,6 @@ __arm_vuninitializedq (int64x2_t /* __v ATTRIBUTE UNUSED */)
 }
 
 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32 (float16x8_t __a)
-{
- return __arm_vreinterpretq_s32_f16 (__a);
-}
-
-__extension__ extern __inline int32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s32 (float32x4_t __a)
-{
- return __arm_vreinterpretq_s32_f32 (__a);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16 (float16x8_t __a)
-{
- return __arm_vreinterpretq_s16_f16 (__a);
-}
-
-__extension__ extern __inline int16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s16 (float32x4_t __a)
-{
- return __arm_vreinterpretq_s16_f32 (__a);
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64 (float16x8_t __a)
-{
- return __arm_vreinterpretq_s64_f16 (__a);
-}
-
-__extension__ extern __inline int64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s64 (float32x4_t __a)
-{
- return __arm_vreinterpretq_s64_f32 (__a);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8 (float16x8_t __a)
-{
- return __arm_vreinterpretq_s8_f16 (__a);
-}
-
-__extension__ extern __inline int8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_s8 (float32x4_t __a)
-{
- return __arm_vreinterpretq_s8_f32 (__a);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16 (float16x8_t __a)
-{
- return __arm_vreinterpretq_u16_f16 (__a);
-}
-
-__extension__ extern __inline uint16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u16 (float32x4_t __a)
-{
- return __arm_vreinterpretq_u16_f32 (__a);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32 (float16x8_t __a)
-{
- return __arm_vreinterpretq_u32_f16 (__a);
-}
-
-__extension__ extern __inline uint32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u32 (float32x4_t __a)
-{
- return __arm_vreinterpretq_u32_f32 (__a);
-}
-
-__extension__ extern __inline uint64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64 (float16x8_t __a)
-{
- return __arm_vreinterpretq_u64_f16 (__a);
-}
-
-__extension__ extern __inline uint64x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u64 (float32x4_t __a)
-{
- return __arm_vreinterpretq_u64_f32 (__a);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8 (float16x8_t __a)
-{
- return __arm_vreinterpretq_u8_f16 (__a);
-}
-
-__extension__ extern __inline uint8x16_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_u8 (float32x4_t __a)
-{
- return __arm_vreinterpretq_u8_f32 (__a);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16 (float32x4_t __a)
-{
- return __arm_vreinterpretq_f16_f32 (__a);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16 (int16x8_t __a)
-{
- return __arm_vreinterpretq_f16_s16 (__a);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16 (int32x4_t __a)
-{
- return __arm_vreinterpretq_f16_s32 (__a);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16 (int64x2_t __a)
-{
- return __arm_vreinterpretq_f16_s64 (__a);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16 (int8x16_t __a)
-{
- return __arm_vreinterpretq_f16_s8 (__a);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16 (uint16x8_t __a)
-{
- return __arm_vreinterpretq_f16_u16 (__a);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16 (uint32x4_t __a)
-{
- return __arm_vreinterpretq_f16_u32 (__a);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16 (uint64x2_t __a)
-{
- return __arm_vreinterpretq_f16_u64 (__a);
-}
-
-__extension__ extern __inline float16x8_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f16 (uint8x16_t __a)
-{
- return __arm_vreinterpretq_f16_u8 (__a);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32 (float16x8_t __a)
-{
- return __arm_vreinterpretq_f32_f16 (__a);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32 (int16x8_t __a)
-{
- return __arm_vreinterpretq_f32_s16 (__a);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32 (int32x4_t __a)
-{
- return __arm_vreinterpretq_f32_s32 (__a);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32 (int64x2_t __a)
-{
- return __arm_vreinterpretq_f32_s64 (__a);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32 (int8x16_t __a)
-{
- return __arm_vreinterpretq_f32_s8 (__a);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32 (uint16x8_t __a)
-{
- return __arm_vreinterpretq_f32_u16 (__a);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32 (uint32x4_t __a)
-{
- return __arm_vreinterpretq_f32_u32 (__a);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32 (uint64x2_t __a)
-{
- return __arm_vreinterpretq_f32_u64 (__a);
-}
-
-__extension__ extern __inline float32x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vreinterpretq_f32 (uint8x16_t __a)
-{
- return __arm_vreinterpretq_f32_u8 (__a);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vuninitializedq (float16x8_t /* __v ATTRIBUTE UNUSED */)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 35eab6c94bf..ab688396f97 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -10561,3 +10561,21 @@ (define_expand "vcond_mask_<mode><MVE_vpred>"
     }
   DONE;
 })
+
+;; Reinterpret operand 1 in operand 0's mode, without changing its contents.
+(define_expand "@arm_mve_reinterpret<mode>"
+  [(set (match_operand:MVE_vecs 0 "register_operand")
+	(unspec:MVE_vecs
+	  [(match_operand 1 "arm_any_register_operand")]
+	  REINTERPRET))]
+  "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (<MODE>mode))
+    || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (<MODE>mode))"
+  {
+    machine_mode src_mode = GET_MODE (operands[1]);
+    if (targetm.can_change_mode_class (<MODE>mode, src_mode, VFP_REGS))
+      {
+	emit_move_insn (operands[0], gen_lowpart (<MODE>mode, operands[1]));
+	DONE;
+      }
+  }
+)
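
For illustration: when can_change_mode_class returns true, the expander
above emits nothing more than a bit-preserving register move
(gen_lowpart).  At the C level, vreinterpretq has the semantics of a
memcpy between two vector types.  The sketch below is illustrative
only, not part of the patch, and the helper name is made up:

  #include <arm_mve.h>
  #include <string.h>

  /* Hypothetical helper: reuse the 128 bits of X as a float vector,
     with no conversion instruction.  vreinterpretq_f32_s32 (x) must
     behave exactly like this.  */
  float32x4_t
  reinterpret_by_hand (int32x4_t x)
  {
    float32x4_t r;
    memcpy (&r, &x, sizeof r);
    return r;
  }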
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 84384ee798d..dccda283573 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -1255,4 +1255,5 @@ (define_c_enum "unspec" [
   SQRSHRL_64
   SQRSHRL_48
   VSHLCQ_M_
+  REINTERPRET
 ])
diff --git a/gcc/testsuite/g++.target/arm/mve.exp b/gcc/testsuite/g++.target/arm/mve.exp
index cd824035540..f75ec20ea64 100644
--- a/gcc/testsuite/g++.target/arm/mve.exp
+++ b/gcc/testsuite/g++.target/arm/mve.exp
@@ -42,8 +42,12 @@ set dg-do-what-default "assemble"
 dg-init
 
 # Main loop.
-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/../../gcc.target/arm/mve/intrinsics/*.\[cCS\]]] \
-	"" $DEFAULT_CXXFLAGS
+set gcc_subdir [string replace $subdir 0 2 gcc]
+set files [glob -nocomplain \
+	       "$srcdir/$subdir/../../gcc.target/arm/mve/intrinsics/*.\[cCS\]" \
+	       "$srcdir/$gcc_subdir/mve/general/*.\[cCS\]" \
+	       "$srcdir/$subdir/mve/general-c++/*.\[cCS\]"]
+dg-runtest [lsort $files] "" $DEFAULT_CXXFLAGS
 
 # All done.
 set dg-do-what-default ${save-dg-do-what-default}
diff --git a/gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c b/gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c
new file mode 100644
index 00000000000..e0692ceb8c8
--- /dev/null
+++ b/gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* Do not use dg-add-options arm_v8_1m_mve, because this might expand to "",
+   which could imply mve+fp depending on the user settings. We want to make
+   sure the '+fp' extension is not enabled.  */
+/* { dg-options "-mfpu=auto -march=armv8.1-m.main+mve" } */
+
+#include <arm_mve.h>
+
+void
+f1 (uint8x16_t v)
+{
+  vreinterpretq_f16 (v); /* { dg-error {ACLE function 'void vreinterpretq_f16\(uint8x16_t\)' requires ISA extension 'mve.fp'} } */
+  /* { dg-message {note: you can enable mve.fp by using the command-line option '-march', or by using the 'target' attribute or pragma} "" {target *-*-*} .-1 } */
+}
diff --git a/gcc/testsuite/g++.target/arm/mve/general-c++/vreinterpretq_1.C b/gcc/testsuite/g++.target/arm/mve/general-c++/vreinterpretq_1.C
new file mode 100644
index 00000000000..8b29ee58163
--- /dev/null
+++ b/gcc/testsuite/g++.target/arm/mve/general-c++/vreinterpretq_1.C
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#include <arm_mve.h>
+
+void
+f1 (int8x16_t s8, uint16x8_t u16, float32x4_t f32)
+{
+  __arm_vreinterpretq_s8 (); /* { dg-error {no matching function for call to '__arm_vreinterpretq_s8\(\)'} } */
+  __arm_vreinterpretq_s8 (s8, s8); /* { dg-error {no matching function for call to '__arm_vreinterpretq_s8\(int8x16_t\&, int8x16_t\&\)'} } */
+  __arm_vreinterpretq_s8 (0); /* { dg-error {no matching function for call to '__arm_vreinterpretq_s8\(int\)'} } */
+  __arm_vreinterpretq_s8 (s8); /* { dg-error {no matching function for call to '__arm_vreinterpretq_s8\(int8x16_t\&\)'} } */
+  __arm_vreinterpretq_s8 (u16);
+  __arm_vreinterpretq_u16 (); /* { dg-error {no matching function for call to '__arm_vreinterpretq_u16\(\)'} } */
+  __arm_vreinterpretq_u16 (u16, u16); /* { dg-error {no matching function for call to '__arm_vreinterpretq_u16\(uint16x8_t\&, uint16x8_t\&\)'} } */
+  __arm_vreinterpretq_u16 (0); /* { dg-error {no matching function for call to '__arm_vreinterpretq_u16\(int\)'} } */
+  __arm_vreinterpretq_u16 (u16); /* { dg-error {no matching function for call to '__arm_vreinterpretq_u16\(uint16x8_t\&\)'} } */
+  __arm_vreinterpretq_u16 (f32);
+  __arm_vreinterpretq_f32 (); /* { dg-error {no matching function for call to '__arm_vreinterpretq_f32\(\)'} } */
+  __arm_vreinterpretq_f32 (f32, f32); /* { dg-error {no matching function for call to '__arm_vreinterpretq_f32\(float32x4_t\&, float32x4_t\&\)'} } */
+  __arm_vreinterpretq_f32 (0); /* { dg-error {no matching function for call to '__arm_vreinterpretq_f32\(int\)'} } */
+  __arm_vreinterpretq_f32 (f32); /* { dg-error {no matching function for call to '__arm_vreinterpretq_f32\(float32x4_t\&\)'} } */
+  __arm_vreinterpretq_f32 (s8);
+}
diff --git a/gcc/testsuite/gcc.target/arm/mve/general-c/nomve_fp_1.c b/gcc/testsuite/gcc.target/arm/mve/general-c/nomve_fp_1.c
new file mode 100644
index 00000000000..21c2af16a61
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/general-c/nomve_fp_1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* Do not use dg-add-options arm_v8_1m_mve, because this might expand to "",
+   which could imply mve+fp depending on the user settings. We want to make
+   sure the '+fp' extension is not enabled.  */
+/* { dg-options "-mfpu=auto -march=armv8.1-m.main+mve" } */
+
+#include <arm_mve.h>
+
+void
+foo (uint8x16_t v)
+{
+  vreinterpretq_f16 (v); /* { dg-error {ACLE function '__arm_vreinterpretq_f16_u8' requires ISA extension 'mve.fp'} } */
+  /* { dg-message {note: you can enable mve.fp by using the command-line option '-march', or by using the 'target' attribute or pragma} "" {target *-*-*} .-1 } */
+}
diff --git a/gcc/testsuite/gcc.target/arm/mve/general-c/vreinterpretq_1.c b/gcc/testsuite/gcc.target/arm/mve/general-c/vreinterpretq_1.c
new file mode 100644
index 00000000000..0297bd50198
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/general-c/vreinterpretq_1.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#include <arm_mve.h>
+
+void
+f1 (int8x16_t s8, uint16x8_t u16, float32x4_t f32)
+{
+  __arm_vreinterpretq_s8 (); /* { dg-error {too few arguments to function '__arm_vreinterpretq_s8'} } */
+  __arm_vreinterpretq_s8 (s8, s8); /* { dg-error {too many arguments to function '__arm_vreinterpretq_s8'} } */
+  __arm_vreinterpretq_s8 (0); /* { dg-error {passing 'int' to argument 1 of '__arm_vreinterpretq_s8', which expects an MVE vector type} } */
+  __arm_vreinterpretq_s8 (s8); /* { dg-error {'__arm_vreinterpretq_s8' has no form that takes 'int8x16_t' arguments} } */
+  __arm_vreinterpretq_s8 (u16);
+  __arm_vreinterpretq_u16 (); /* { dg-error {too few arguments to function '__arm_vreinterpretq_u16'} } */
+  __arm_vreinterpretq_u16 (u16, u16); /* { dg-error {too many arguments to function '__arm_vreinterpretq_u16'} } */
+  __arm_vreinterpretq_u16 (0); /* { dg-error {passing 'int' to argument 1 of '__arm_vreinterpretq_u16', which expects an MVE vector type} } */
+  __arm_vreinterpretq_u16 (u16); /* { dg-error {'__arm_vreinterpretq_u16' has no form that takes 'uint16x8_t' arguments} } */
+  __arm_vreinterpretq_u16 (f32);
+  __arm_vreinterpretq_f32 (); /* { dg-error {too few arguments to function '__arm_vreinterpretq_f32'} } */
+  __arm_vreinterpretq_f32 (f32, f32); /* { dg-error {too many arguments to function '__arm_vreinterpretq_f32'} } */
+  __arm_vreinterpretq_f32 (0); /* { dg-error {passing 'int' to argument 1 of '__arm_vreinterpretq_f32', which expects an MVE vector type} } */
+  __arm_vreinterpretq_f32 (f32); /* { dg-error {'__arm_vreinterpretq_f32' has no form that takes 'float32x4_t' arguments} } */
+  __arm_vreinterpretq_f32 (s8);
+}
-- 
2.34.1



* RE: [PATCH v2 03/22] arm: [MVE intrinsics] Rework vreinterpretq
  2023-05-03 14:37           ` [PATCH v2 " Christophe Lyon
@ 2023-05-03 14:52             ` Kyrylo Tkachov
  0 siblings, 0 replies; 55+ messages in thread
From: Kyrylo Tkachov @ 2023-05-03 14:52 UTC (permalink / raw)
  To: Christophe Lyon, gcc-patches, Richard Earnshaw, Richard Sandiford
  Cc: Christophe Lyon



> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@arm.com>
> Sent: Wednesday, May 3, 2023 3:37 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
> Subject: [PATCH v2 03/22] arm: [MVE intrinsics] Rework vreinterpretq
> 
> This patch implements vreinterpretq using the new MVE intrinsics
> framework.
> 
> The old definitions for vreinterpretq are removed as a consequence.
> 

Ok.
Thanks,
Kyrill

> 2022-09-08  Murray Steele  <murray.steele@arm.com>
> 	    Christophe Lyon  <christophe.lyon@arm.com>
> 
> 	gcc/
> 	* config/arm/arm-mve-builtins-base.cc (vreinterpretq_impl): New
> 	class.
> 	* config/arm/arm-mve-builtins-base.def: Define vreinterpretq.
> 	* config/arm/arm-mve-builtins-base.h (vreinterpretq): New
> 	declaration.
> 	* config/arm/arm-mve-builtins-shapes.cc (parse_element_type): New
> 	function.
> 	(parse_type): Likewise.
> 	(parse_signature): Likewise.
> 	(build_one): Likewise.
> 	(build_all): Likewise.
> 	(overloaded_base): New struct.
> 	(unary_convert_def): Likewise.
> 	* config/arm/arm-mve-builtins-shapes.h (unary_convert): Declare.
> 	* config/arm/arm-mve-builtins.cc (TYPES_reinterpret_signed1): New
> 	macro.
> 	(TYPES_reinterpret_unsigned1): Likewise.
> 	(TYPES_reinterpret_integer): Likewise.
> 	(TYPES_reinterpret_integer1): Likewise.
> 	(TYPES_reinterpret_float1): Likewise.
> 	(TYPES_reinterpret_float): Likewise.
> 	(reinterpret_integer): New.
> 	(reinterpret_float): New.
> 	(handle_arm_mve_h): Register builtins.
> 	* config/arm/arm_mve.h (vreinterpretq_s16): Remove.
> 	(vreinterpretq_s32): Likewise.
> 	(vreinterpretq_s64): Likewise.
> 	(vreinterpretq_s8): Likewise.
> 	(vreinterpretq_u16): Likewise.
> 	(vreinterpretq_u32): Likewise.
> 	(vreinterpretq_u64): Likewise.
> 	(vreinterpretq_u8): Likewise.
> 	(vreinterpretq_f16): Likewise.
> 	(vreinterpretq_f32): Likewise.
> 	(vreinterpretq_s16_s32): Likewise.
> 	(vreinterpretq_s16_s64): Likewise.
> 	(vreinterpretq_s16_s8): Likewise.
> 	(vreinterpretq_s16_u16): Likewise.
> 	(vreinterpretq_s16_u32): Likewise.
> 	(vreinterpretq_s16_u64): Likewise.
> 	(vreinterpretq_s16_u8): Likewise.
> 	(vreinterpretq_s32_s16): Likewise.
> 	(vreinterpretq_s32_s64): Likewise.
> 	(vreinterpretq_s32_s8): Likewise.
> 	(vreinterpretq_s32_u16): Likewise.
> 	(vreinterpretq_s32_u32): Likewise.
> 	(vreinterpretq_s32_u64): Likewise.
> 	(vreinterpretq_s32_u8): Likewise.
> 	(vreinterpretq_s64_s16): Likewise.
> 	(vreinterpretq_s64_s32): Likewise.
> 	(vreinterpretq_s64_s8): Likewise.
> 	(vreinterpretq_s64_u16): Likewise.
> 	(vreinterpretq_s64_u32): Likewise.
> 	(vreinterpretq_s64_u64): Likewise.
> 	(vreinterpretq_s64_u8): Likewise.
> 	(vreinterpretq_s8_s16): Likewise.
> 	(vreinterpretq_s8_s32): Likewise.
> 	(vreinterpretq_s8_s64): Likewise.
> 	(vreinterpretq_s8_u16): Likewise.
> 	(vreinterpretq_s8_u32): Likewise.
> 	(vreinterpretq_s8_u64): Likewise.
> 	(vreinterpretq_s8_u8): Likewise.
> 	(vreinterpretq_u16_s16): Likewise.
> 	(vreinterpretq_u16_s32): Likewise.
> 	(vreinterpretq_u16_s64): Likewise.
> 	(vreinterpretq_u16_s8): Likewise.
> 	(vreinterpretq_u16_u32): Likewise.
> 	(vreinterpretq_u16_u64): Likewise.
> 	(vreinterpretq_u16_u8): Likewise.
> 	(vreinterpretq_u32_s16): Likewise.
> 	(vreinterpretq_u32_s32): Likewise.
> 	(vreinterpretq_u32_s64): Likewise.
> 	(vreinterpretq_u32_s8): Likewise.
> 	(vreinterpretq_u32_u16): Likewise.
> 	(vreinterpretq_u32_u64): Likewise.
> 	(vreinterpretq_u32_u8): Likewise.
> 	(vreinterpretq_u64_s16): Likewise.
> 	(vreinterpretq_u64_s32): Likewise.
> 	(vreinterpretq_u64_s64): Likewise.
> 	(vreinterpretq_u64_s8): Likewise.
> 	(vreinterpretq_u64_u16): Likewise.
> 	(vreinterpretq_u64_u32): Likewise.
> 	(vreinterpretq_u64_u8): Likewise.
> 	(vreinterpretq_u8_s16): Likewise.
> 	(vreinterpretq_u8_s32): Likewise.
> 	(vreinterpretq_u8_s64): Likewise.
> 	(vreinterpretq_u8_s8): Likewise.
> 	(vreinterpretq_u8_u16): Likewise.
> 	(vreinterpretq_u8_u32): Likewise.
> 	(vreinterpretq_u8_u64): Likewise.
> 	(vreinterpretq_s32_f16): Likewise.
> 	(vreinterpretq_s32_f32): Likewise.
> 	(vreinterpretq_u16_f16): Likewise.
> 	(vreinterpretq_u16_f32): Likewise.
> 	(vreinterpretq_u32_f16): Likewise.
> 	(vreinterpretq_u32_f32): Likewise.
> 	(vreinterpretq_u64_f16): Likewise.
> 	(vreinterpretq_u64_f32): Likewise.
> 	(vreinterpretq_u8_f16): Likewise.
> 	(vreinterpretq_u8_f32): Likewise.
> 	(vreinterpretq_f16_f32): Likewise.
> 	(vreinterpretq_f16_s16): Likewise.
> 	(vreinterpretq_f16_s32): Likewise.
> 	(vreinterpretq_f16_s64): Likewise.
> 	(vreinterpretq_f16_s8): Likewise.
> 	(vreinterpretq_f16_u16): Likewise.
> 	(vreinterpretq_f16_u32): Likewise.
> 	(vreinterpretq_f16_u64): Likewise.
> 	(vreinterpretq_f16_u8): Likewise.
> 	(vreinterpretq_f32_f16): Likewise.
> 	(vreinterpretq_f32_s16): Likewise.
> 	(vreinterpretq_f32_s32): Likewise.
> 	(vreinterpretq_f32_s64): Likewise.
> 	(vreinterpretq_f32_s8): Likewise.
> 	(vreinterpretq_f32_u16): Likewise.
> 	(vreinterpretq_f32_u32): Likewise.
> 	(vreinterpretq_f32_u64): Likewise.
> 	(vreinterpretq_f32_u8): Likewise.
> 	(vreinterpretq_s16_f16): Likewise.
> 	(vreinterpretq_s16_f32): Likewise.
> 	(vreinterpretq_s64_f16): Likewise.
> 	(vreinterpretq_s64_f32): Likewise.
> 	(vreinterpretq_s8_f16): Likewise.
> 	(vreinterpretq_s8_f32): Likewise.
> 	(__arm_vreinterpretq_f16): Likewise.
> 	(__arm_vreinterpretq_f32): Likewise.
> 	(__arm_vreinterpretq_s16): Likewise.
> 	(__arm_vreinterpretq_s32): Likewise.
> 	(__arm_vreinterpretq_s64): Likewise.
> 	(__arm_vreinterpretq_s8): Likewise.
> 	(__arm_vreinterpretq_u16): Likewise.
> 	(__arm_vreinterpretq_u32): Likewise.
> 	(__arm_vreinterpretq_u64): Likewise.
> 	(__arm_vreinterpretq_u8): Likewise.
> 	* config/arm/arm_mve_types.h (__arm_vreinterpretq_s16_s32):
> 	Remove.
> 	(__arm_vreinterpretq_s16_s64): Likewise.
> 	(__arm_vreinterpretq_s16_s8): Likewise.
> 	(__arm_vreinterpretq_s16_u16): Likewise.
> 	(__arm_vreinterpretq_s16_u32): Likewise.
> 	(__arm_vreinterpretq_s16_u64): Likewise.
> 	(__arm_vreinterpretq_s16_u8): Likewise.
> 	(__arm_vreinterpretq_s32_s16): Likewise.
> 	(__arm_vreinterpretq_s32_s64): Likewise.
> 	(__arm_vreinterpretq_s32_s8): Likewise.
> 	(__arm_vreinterpretq_s32_u16): Likewise.
> 	(__arm_vreinterpretq_s32_u32): Likewise.
> 	(__arm_vreinterpretq_s32_u64): Likewise.
> 	(__arm_vreinterpretq_s32_u8): Likewise.
> 	(__arm_vreinterpretq_s64_s16): Likewise.
> 	(__arm_vreinterpretq_s64_s32): Likewise.
> 	(__arm_vreinterpretq_s64_s8): Likewise.
> 	(__arm_vreinterpretq_s64_u16): Likewise.
> 	(__arm_vreinterpretq_s64_u32): Likewise.
> 	(__arm_vreinterpretq_s64_u64): Likewise.
> 	(__arm_vreinterpretq_s64_u8): Likewise.
> 	(__arm_vreinterpretq_s8_s16): Likewise.
> 	(__arm_vreinterpretq_s8_s32): Likewise.
> 	(__arm_vreinterpretq_s8_s64): Likewise.
> 	(__arm_vreinterpretq_s8_u16): Likewise.
> 	(__arm_vreinterpretq_s8_u32): Likewise.
> 	(__arm_vreinterpretq_s8_u64): Likewise.
> 	(__arm_vreinterpretq_s8_u8): Likewise.
> 	(__arm_vreinterpretq_u16_s16): Likewise.
> 	(__arm_vreinterpretq_u16_s32): Likewise.
> 	(__arm_vreinterpretq_u16_s64): Likewise.
> 	(__arm_vreinterpretq_u16_s8): Likewise.
> 	(__arm_vreinterpretq_u16_u32): Likewise.
> 	(__arm_vreinterpretq_u16_u64): Likewise.
> 	(__arm_vreinterpretq_u16_u8): Likewise.
> 	(__arm_vreinterpretq_u32_s16): Likewise.
> 	(__arm_vreinterpretq_u32_s32): Likewise.
> 	(__arm_vreinterpretq_u32_s64): Likewise.
> 	(__arm_vreinterpretq_u32_s8): Likewise.
> 	(__arm_vreinterpretq_u32_u16): Likewise.
> 	(__arm_vreinterpretq_u32_u64): Likewise.
> 	(__arm_vreinterpretq_u32_u8): Likewise.
> 	(__arm_vreinterpretq_u64_s16): Likewise.
> 	(__arm_vreinterpretq_u64_s32): Likewise.
> 	(__arm_vreinterpretq_u64_s64): Likewise.
> 	(__arm_vreinterpretq_u64_s8): Likewise.
> 	(__arm_vreinterpretq_u64_u16): Likewise.
> 	(__arm_vreinterpretq_u64_u32): Likewise.
> 	(__arm_vreinterpretq_u64_u8): Likewise.
> 	(__arm_vreinterpretq_u8_s16): Likewise.
> 	(__arm_vreinterpretq_u8_s32): Likewise.
> 	(__arm_vreinterpretq_u8_s64): Likewise.
> 	(__arm_vreinterpretq_u8_s8): Likewise.
> 	(__arm_vreinterpretq_u8_u16): Likewise.
> 	(__arm_vreinterpretq_u8_u32): Likewise.
> 	(__arm_vreinterpretq_u8_u64): Likewise.
> 	(__arm_vreinterpretq_s32_f16): Likewise.
> 	(__arm_vreinterpretq_s32_f32): Likewise.
> 	(__arm_vreinterpretq_s16_f16): Likewise.
> 	(__arm_vreinterpretq_s16_f32): Likewise.
> 	(__arm_vreinterpretq_s64_f16): Likewise.
> 	(__arm_vreinterpretq_s64_f32): Likewise.
> 	(__arm_vreinterpretq_s8_f16): Likewise.
> 	(__arm_vreinterpretq_s8_f32): Likewise.
> 	(__arm_vreinterpretq_u16_f16): Likewise.
> 	(__arm_vreinterpretq_u16_f32): Likewise.
> 	(__arm_vreinterpretq_u32_f16): Likewise.
> 	(__arm_vreinterpretq_u32_f32): Likewise.
> 	(__arm_vreinterpretq_u64_f16): Likewise.
> 	(__arm_vreinterpretq_u64_f32): Likewise.
> 	(__arm_vreinterpretq_u8_f16): Likewise.
> 	(__arm_vreinterpretq_u8_f32): Likewise.
> 	(__arm_vreinterpretq_f16_f32): Likewise.
> 	(__arm_vreinterpretq_f16_s16): Likewise.
> 	(__arm_vreinterpretq_f16_s32): Likewise.
> 	(__arm_vreinterpretq_f16_s64): Likewise.
> 	(__arm_vreinterpretq_f16_s8): Likewise.
> 	(__arm_vreinterpretq_f16_u16): Likewise.
> 	(__arm_vreinterpretq_f16_u32): Likewise.
> 	(__arm_vreinterpretq_f16_u64): Likewise.
> 	(__arm_vreinterpretq_f16_u8): Likewise.
> 	(__arm_vreinterpretq_f32_f16): Likewise.
> 	(__arm_vreinterpretq_f32_s16): Likewise.
> 	(__arm_vreinterpretq_f32_s32): Likewise.
> 	(__arm_vreinterpretq_f32_s64): Likewise.
> 	(__arm_vreinterpretq_f32_s8): Likewise.
> 	(__arm_vreinterpretq_f32_u16): Likewise.
> 	(__arm_vreinterpretq_f32_u32): Likewise.
> 	(__arm_vreinterpretq_f32_u64): Likewise.
> 	(__arm_vreinterpretq_f32_u8): Likewise.
> 	(__arm_vreinterpretq_s16): Likewise.
> 	(__arm_vreinterpretq_s32): Likewise.
> 	(__arm_vreinterpretq_s64): Likewise.
> 	(__arm_vreinterpretq_s8): Likewise.
> 	(__arm_vreinterpretq_u16): Likewise.
> 	(__arm_vreinterpretq_u32): Likewise.
> 	(__arm_vreinterpretq_u64): Likewise.
> 	(__arm_vreinterpretq_u8): Likewise.
> 	(__arm_vreinterpretq_f16): Likewise.
> 	(__arm_vreinterpretq_f32): Likewise.
> 	* config/arm/mve.md (@arm_mve_reinterpret<mode>): New
> 	pattern.
> 	* config/arm/unspecs.md (REINTERPRET): New unspec.
> 
> 	gcc/testsuite/
> 	* g++.target/arm/mve.exp: Add general-c++ and general directories.
> 	* g++.target/arm/mve/general-c++/nomve_fp_1.c: New test.
> 	* g++.target/arm/mve/general-c++/vreinterpretq_1.C: New test.
> 	* gcc.target/arm/mve/general-c/nomve_fp_1.c: New test.
> 	* gcc.target/arm/mve/general-c/vreinterpretq_1.c: New test.
> ---
>  gcc/config/arm/arm-mve-builtins-base.cc       |   33 +
>  gcc/config/arm/arm-mve-builtins-base.def      |    2 +
>  gcc/config/arm/arm-mve-builtins-base.h        |    2 +
>  gcc/config/arm/arm-mve-builtins-shapes.cc     |   28 +
>  gcc/config/arm/arm-mve-builtins-shapes.h      |    8 +
>  gcc/config/arm/arm-mve-builtins.cc            |   60 +
>  gcc/config/arm/arm_mve.h                      |  300 ----
>  gcc/config/arm/arm_mve_types.h                | 1365 +----------------
>  gcc/config/arm/mve.md                         |   18 +
>  gcc/config/arm/unspecs.md                     |    1 +
>  gcc/testsuite/g++.target/arm/mve.exp          |    8 +-
>  .../arm/mve/general-c++/nomve_fp_1.c          |   15 +
>  .../arm/mve/general-c++/vreinterpretq_1.C     |   25 +
>  .../gcc.target/arm/mve/general-c/nomve_fp_1.c |   15 +
>  .../arm/mve/general-c/vreinterpretq_1.c       |   25 +
>  15 files changed, 290 insertions(+), 1615 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c
>  create mode 100644 gcc/testsuite/g++.target/arm/mve/general-c++/vreinterpretq_1.C
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/nomve_fp_1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/vreinterpretq_1.c
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
> index e9f285faf2b..abf6a1e19de 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.cc
> +++ b/gcc/config/arm/arm-mve-builtins-base.cc
> @@ -38,8 +38,41 @@ using namespace arm_mve;
> 
>  namespace {
> 
> +/* Implements vreinterpretq_* intrinsics.  */
> +class vreinterpretq_impl : public quiet<function_base>
> +{
> +  gimple *
> +  fold (gimple_folder &f) const override
> +  {
> +    /* We should punt to rtl if the effect of the reinterpret on
> +       registers does not conform to GCC's endianness model like we do
> +       on aarch64, but MVE intrinsics are not currently supported on
> +       big-endian.  For this, we'd need to handle big-endian properly
> +       in the .md file, like we do on aarch64 with
> +       define_insn_and_split "*aarch64_sve_reinterpret<mode>".  */
> +    gcc_assert (targetm.can_change_mode_class (f.vector_mode (0),
> +					       f.vector_mode (1),
> +					       VFP_REGS));
> +
> +    /* Otherwise vreinterpret corresponds directly to a VIEW_CONVERT_EXPR
> +       reinterpretation.  */
> +    tree rhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (f.lhs),
> +		       gimple_call_arg (f.call, 0));
> +    return gimple_build_assign (f.lhs, VIEW_CONVERT_EXPR, rhs);
> +  }
> +
> +  rtx
> +  expand (function_expander &e) const override
> +  {
> +    machine_mode mode = e.vector_mode (0);
> +    return e.use_exact_insn (code_for_arm_mve_reinterpret (mode));
> +  }
> +};
> +
>  } /* end anonymous namespace */
> 
>  namespace arm_mve {
> 
> +FUNCTION (vreinterpretq, vreinterpretq_impl,)
> +
>  } /* end namespace arm_mve */
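
As a usage note: the fold hook above means that a call like the one
below is folded to a VIEW_CONVERT_EXPR in GIMPLE, so at most a register
move is generated.  Hypothetical user code, not part of the patch:

  #include <arm_mve.h>

  /* Hypothetical example: becomes
     r = VIEW_CONVERT_EXPR<uint8x16_t>(x); in GIMPLE.  */
  uint8x16_t
  f (int32x4_t x)
  {
    return vreinterpretq_u8_s32 (x);
  }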
> diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
> index d15ba2e23e8..5c0c1b9cee7 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.def
> +++ b/gcc/config/arm/arm-mve-builtins-base.def
> @@ -18,7 +18,9 @@
>     <http://www.gnu.org/licenses/>.  */
> 
>  #define REQUIRES_FLOAT false
> +DEF_MVE_FUNCTION (vreinterpretq, unary_convert, reinterpret_integer, none)
>  #undef REQUIRES_FLOAT
> 
>  #define REQUIRES_FLOAT true
> +DEF_MVE_FUNCTION (vreinterpretq, unary_convert, reinterpret_float, none)
>  #undef REQUIRES_FLOAT
> diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h
> index c4d7b750cd5..60e7bd24eda 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.h
> +++ b/gcc/config/arm/arm-mve-builtins-base.h
> @@ -23,6 +23,8 @@
>  namespace arm_mve {
>  namespace functions {
> 
> +extern const function_base *const vreinterpretq;
> +
>  } /* end namespace arm_mve::functions */
>  } /* end namespace arm_mve */
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
> index f20660d8319..d0da0ffef91 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.cc
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
> @@ -338,6 +338,34 @@ struct overloaded_base : public function_shape
>    }
>  };
> 
> +/* <T0>_t foo_t0[_t1](<T1>_t)
> +
> +   where the target type <t0> must be specified explicitly but the source
> +   type <t1> can be inferred.
> +
> +   Example: vreinterpretq.
> +   int16x8_t [__arm_]vreinterpretq_s16[_s8](int8x16_t a)
> +   int32x4_t [__arm_]vreinterpretq_s32[_s8](int8x16_t a)
> +   int8x16_t [__arm_]vreinterpretq_s8[_s16](int16x8_t a)
> +   int8x16_t [__arm_]vreinterpretq_s8[_s32](int32x4_t a)  */
> +struct unary_convert_def : public overloaded_base<1>
> +{
> +  void
> +  build (function_builder &b, const function_group_info &group,
> +	 bool preserve_user_namespace) const override
> +  {
> +    b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
> +    build_all (b, "v0,v1", group, MODE_none, preserve_user_namespace);
> +  }
> +
> +  tree
> +  resolve (function_resolver &r) const override
> +  {
> +    return r.resolve_unary ();
> +  }
> +};
> +SHAPE (unary_convert)
> +
>  } /* end namespace arm_mve */
> 
>  #undef SHAPE
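
[Note: the "v0,v1" signature string reads as "return a vector with type
suffix 0, take a vector with type suffix 1", so build_all stamps out one
full name per (t0, t1) pair in the types list, while resolve_unary infers
t1 from the argument for the overloaded spelling.  In user code the two
spellings below should resolve to the same builtin (sketch, assuming an
int8x16_t a):

  int16x8_t r1 = vreinterpretq_s16_s8 (a); /* explicit source suffix */
  int16x8_t r2 = vreinterpretq_s16 (a);    /* source inferred from a */]
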
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
> index 9e353b85a76..04d19a02890 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.h
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.h
> @@ -22,8 +22,16 @@
> 
>  namespace arm_mve
>  {
> +  /* The naming convention is:
> +
> +     - to use names like "unary" etc. if the rules are somewhat generic,
> +       especially if there are no ranges involved.  */
> +
>    namespace shapes
>    {
> +
> +    extern const function_shape *const unary_convert;
> +
>    } /* end namespace arm_mve::shapes */
>  } /* end namespace arm_mve */
> 
> diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc
> index b0cceb75ceb..e409a029346 100644
> --- a/gcc/config/arm/arm-mve-builtins.cc
> +++ b/gcc/config/arm/arm-mve-builtins.cc
> @@ -199,6 +199,52 @@ CONSTEXPR const type_suffix_info type_suffixes[NUM_TYPE_SUFFIXES + 1] = {
>  #define TYPES_signed_32(S, D) \
>    S (s32)
> 
> +#define TYPES_reinterpret_signed1(D, A) \
> +  D (A, s8), D (A, s16), D (A, s32), D (A, s64)
> +
> +#define TYPES_reinterpret_unsigned1(D, A) \
> +  D (A, u8), D (A, u16), D (A, u32), D (A, u64)
> +
> +#define TYPES_reinterpret_integer(S, D) \
> +  TYPES_reinterpret_unsigned1 (D, s8), \
> +  D (s8, s16), D (s8, s32), D (s8, s64), \
> +  TYPES_reinterpret_unsigned1 (D, s16), \
> +  D (s16, s8), D (s16, s32), D (s16, s64), \
> +  TYPES_reinterpret_unsigned1 (D, s32), \
> +  D (s32, s8), D (s32, s16), D (s32, s64), \
> +  TYPES_reinterpret_unsigned1 (D, s64), \
> +  D (s64, s8), D (s64, s16), D (s64, s32), \
> +  TYPES_reinterpret_signed1 (D, u8), \
> +  D (u8, u16), D (u8, u32), D (u8, u64), \
> +  TYPES_reinterpret_signed1 (D, u16), \
> +  D (u16, u8), D (u16, u32), D (u16, u64), \
> +  TYPES_reinterpret_signed1 (D, u32), \
> +  D (u32, u8), D (u32, u16), D (u32, u64), \
> +  TYPES_reinterpret_signed1 (D, u64), \
> +  D (u64, u8), D (u64, u16), D (u64, u32)
> +
> +/* { _s8  _s16 _s32 _s64 } x { _s8  _s16 _s32 _s64 }
> +   { _u8  _u16 _u32 _u64 }   { _u8  _u16 _u32 _u64 }.  */
> +#define TYPES_reinterpret_integer1(D, A) \
> +  TYPES_reinterpret_signed1 (D, A), \
> +  TYPES_reinterpret_unsigned1 (D, A)
> +
> +#define TYPES_reinterpret_float1(D, A) \
> +  D (A, f16), D (A, f32)
> +
> +#define TYPES_reinterpret_float(S, D) \
> +  TYPES_reinterpret_float1 (D, s8), \
> +  TYPES_reinterpret_float1 (D, s16), \
> +  TYPES_reinterpret_float1 (D, s32), \
> +  TYPES_reinterpret_float1 (D, s64), \
> +  TYPES_reinterpret_float1 (D, u8), \
> +  TYPES_reinterpret_float1 (D, u16), \
> +  TYPES_reinterpret_float1 (D, u32), \
> +  TYPES_reinterpret_float1 (D, u64), \
> +  TYPES_reinterpret_integer1 (D, f16), \
> +  TYPES_reinterpret_integer1 (D, f32), \
> +  D (f16, f32), D (f32, f16)
> +
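
[Note: to make the cross product above easier to audit, each D (X, Y)
entry stands for one (target, source) suffix pair.  For instance
TYPES_reinterpret_unsigned1 (D, s8) expands to

  D (s8, u8), D (s8, u16), D (s8, u32), D (s8, u64)

i.e. the four functions vreinterpretq_s8_u8 ... vreinterpretq_s8_u64 once
the shape turns the pairs into full names.]
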
>  /* Describe a pair of type suffixes in which only the first is used.  */
>  #define DEF_VECTOR_TYPE(X) { TYPE_SUFFIX_ ## X, NUM_TYPE_SUFFIXES }
> 
> @@ -231,6 +277,8 @@ DEF_MVE_TYPES_ARRAY (integer_16_32);
>  DEF_MVE_TYPES_ARRAY (integer_32);
>  DEF_MVE_TYPES_ARRAY (signed_16_32);
>  DEF_MVE_TYPES_ARRAY (signed_32);
> +DEF_MVE_TYPES_ARRAY (reinterpret_integer);
> +DEF_MVE_TYPES_ARRAY (reinterpret_float);
> 
>  /* Used by functions that have no governing predicate.  */
>  static const predication_index preds_none[] = { PRED_none, NUM_PREDS };
> @@ -253,6 +301,14 @@ static const predication_index preds_p_or_none[] = {
>    PRED_p, PRED_none, NUM_PREDS
>  };
> 
> +/* A list of all MVE ACLE functions.  */
> +static CONSTEXPR const function_group_info function_groups[] = {
> +#define DEF_MVE_FUNCTION(NAME, SHAPE, TYPES, PREDS)	\
> +  { #NAME, &functions::NAME, &shapes::SHAPE, types_##TYPES, preds_##PREDS, \
> +    REQUIRES_FLOAT },
> +#include "arm-mve-builtins.def"
> +};
> +
>  /* The scalar type associated with each vector type.  */
>  extern GTY(()) tree scalar_types[NUM_VECTOR_TYPES];
>  tree scalar_types[NUM_VECTOR_TYPES];
> @@ -431,6 +487,10 @@ handle_arm_mve_h (bool preserve_user_namespace)
> 
>    /* Define MVE functions.  */
>    function_table = new hash_table<registered_function_hasher> (1023);
> +  function_builder builder;
> +  for (unsigned int i = 0; i < ARRAY_SIZE (function_groups); ++i)
> +    builder.register_function_group (function_groups[i],
> +				     preserve_user_namespace);
>  }
> 
>  /* Return true if CANDIDATE is equivalent to MODEL_TYPE for overloading
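
[Note: written out by hand, the table entry that DEF_MVE_FUNCTION
produces for the integer vreinterpretq line in the .def file is

  { "vreinterpretq", &functions::vreinterpretq, &shapes::unary_convert,
    types_reinterpret_integer, preds_none, false },

so including arm_mve.h now costs one loop over function_groups when the
pragma is processed, instead of parsing thousands of lines of macros.]
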
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index 0d2ba968fc0..7688b5a7e53 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -358,14 +358,6 @@
>  #define vstrwq_scatter_shifted_offset_p(__base, __offset, __value, __p) __arm_vstrwq_scatter_shifted_offset_p(__base, __offset, __value, __p)
>  #define vstrwq_scatter_shifted_offset(__base, __offset, __value) __arm_vstrwq_scatter_shifted_offset(__base, __offset, __value)
>  #define vuninitializedq(__v) __arm_vuninitializedq(__v)
> -#define vreinterpretq_s16(__a) __arm_vreinterpretq_s16(__a)
> -#define vreinterpretq_s32(__a) __arm_vreinterpretq_s32(__a)
> -#define vreinterpretq_s64(__a) __arm_vreinterpretq_s64(__a)
> -#define vreinterpretq_s8(__a) __arm_vreinterpretq_s8(__a)
> -#define vreinterpretq_u16(__a) __arm_vreinterpretq_u16(__a)
> -#define vreinterpretq_u32(__a) __arm_vreinterpretq_u32(__a)
> -#define vreinterpretq_u64(__a) __arm_vreinterpretq_u64(__a)
> -#define vreinterpretq_u8(__a) __arm_vreinterpretq_u8(__a)
>  #define vddupq_m(__inactive, __a, __imm, __p) __arm_vddupq_m(__inactive, __a, __imm, __p)
>  #define vddupq_u8(__a, __imm) __arm_vddupq_u8(__a, __imm)
>  #define vddupq_u32(__a, __imm) __arm_vddupq_u32(__a, __imm)
> @@ -518,8 +510,6 @@
>  #define vfmsq_m(__a, __b, __c, __p) __arm_vfmsq_m(__a, __b, __c, __p)
>  #define vmaxnmq_m(__inactive, __a, __b, __p) __arm_vmaxnmq_m(__inactive, __a, __b, __p)
>  #define vminnmq_m(__inactive, __a, __b, __p) __arm_vminnmq_m(__inactive, __a, __b, __p)
> -#define vreinterpretq_f16(__a) __arm_vreinterpretq_f16(__a)
> -#define vreinterpretq_f32(__a) __arm_vreinterpretq_f32(__a)
>  #define vminnmq_x(__a, __b, __p) __arm_vminnmq_x(__a, __b, __p)
>  #define vmaxnmq_x(__a, __b, __p) __arm_vmaxnmq_x(__a, __b, __p)
>  #define vcmulq_x(__a, __b, __p) __arm_vcmulq_x(__a, __b, __p)
> @@ -2365,96 +2355,6 @@
>  #define vaddq_u32(__a, __b) __arm_vaddq_u32(__a, __b)
>  #define vaddq_f16(__a, __b) __arm_vaddq_f16(__a, __b)
>  #define vaddq_f32(__a, __b) __arm_vaddq_f32(__a, __b)
> -#define vreinterpretq_s16_s32(__a) __arm_vreinterpretq_s16_s32(__a)
> -#define vreinterpretq_s16_s64(__a) __arm_vreinterpretq_s16_s64(__a)
> -#define vreinterpretq_s16_s8(__a) __arm_vreinterpretq_s16_s8(__a)
> -#define vreinterpretq_s16_u16(__a) __arm_vreinterpretq_s16_u16(__a)
> -#define vreinterpretq_s16_u32(__a) __arm_vreinterpretq_s16_u32(__a)
> -#define vreinterpretq_s16_u64(__a) __arm_vreinterpretq_s16_u64(__a)
> -#define vreinterpretq_s16_u8(__a) __arm_vreinterpretq_s16_u8(__a)
> -#define vreinterpretq_s32_s16(__a) __arm_vreinterpretq_s32_s16(__a)
> -#define vreinterpretq_s32_s64(__a) __arm_vreinterpretq_s32_s64(__a)
> -#define vreinterpretq_s32_s8(__a) __arm_vreinterpretq_s32_s8(__a)
> -#define vreinterpretq_s32_u16(__a) __arm_vreinterpretq_s32_u16(__a)
> -#define vreinterpretq_s32_u32(__a) __arm_vreinterpretq_s32_u32(__a)
> -#define vreinterpretq_s32_u64(__a) __arm_vreinterpretq_s32_u64(__a)
> -#define vreinterpretq_s32_u8(__a) __arm_vreinterpretq_s32_u8(__a)
> -#define vreinterpretq_s64_s16(__a) __arm_vreinterpretq_s64_s16(__a)
> -#define vreinterpretq_s64_s32(__a) __arm_vreinterpretq_s64_s32(__a)
> -#define vreinterpretq_s64_s8(__a) __arm_vreinterpretq_s64_s8(__a)
> -#define vreinterpretq_s64_u16(__a) __arm_vreinterpretq_s64_u16(__a)
> -#define vreinterpretq_s64_u32(__a) __arm_vreinterpretq_s64_u32(__a)
> -#define vreinterpretq_s64_u64(__a) __arm_vreinterpretq_s64_u64(__a)
> -#define vreinterpretq_s64_u8(__a) __arm_vreinterpretq_s64_u8(__a)
> -#define vreinterpretq_s8_s16(__a) __arm_vreinterpretq_s8_s16(__a)
> -#define vreinterpretq_s8_s32(__a) __arm_vreinterpretq_s8_s32(__a)
> -#define vreinterpretq_s8_s64(__a) __arm_vreinterpretq_s8_s64(__a)
> -#define vreinterpretq_s8_u16(__a) __arm_vreinterpretq_s8_u16(__a)
> -#define vreinterpretq_s8_u32(__a) __arm_vreinterpretq_s8_u32(__a)
> -#define vreinterpretq_s8_u64(__a) __arm_vreinterpretq_s8_u64(__a)
> -#define vreinterpretq_s8_u8(__a) __arm_vreinterpretq_s8_u8(__a)
> -#define vreinterpretq_u16_s16(__a) __arm_vreinterpretq_u16_s16(__a)
> -#define vreinterpretq_u16_s32(__a) __arm_vreinterpretq_u16_s32(__a)
> -#define vreinterpretq_u16_s64(__a) __arm_vreinterpretq_u16_s64(__a)
> -#define vreinterpretq_u16_s8(__a) __arm_vreinterpretq_u16_s8(__a)
> -#define vreinterpretq_u16_u32(__a) __arm_vreinterpretq_u16_u32(__a)
> -#define vreinterpretq_u16_u64(__a) __arm_vreinterpretq_u16_u64(__a)
> -#define vreinterpretq_u16_u8(__a) __arm_vreinterpretq_u16_u8(__a)
> -#define vreinterpretq_u32_s16(__a) __arm_vreinterpretq_u32_s16(__a)
> -#define vreinterpretq_u32_s32(__a) __arm_vreinterpretq_u32_s32(__a)
> -#define vreinterpretq_u32_s64(__a) __arm_vreinterpretq_u32_s64(__a)
> -#define vreinterpretq_u32_s8(__a) __arm_vreinterpretq_u32_s8(__a)
> -#define vreinterpretq_u32_u16(__a) __arm_vreinterpretq_u32_u16(__a)
> -#define vreinterpretq_u32_u64(__a) __arm_vreinterpretq_u32_u64(__a)
> -#define vreinterpretq_u32_u8(__a) __arm_vreinterpretq_u32_u8(__a)
> -#define vreinterpretq_u64_s16(__a) __arm_vreinterpretq_u64_s16(__a)
> -#define vreinterpretq_u64_s32(__a) __arm_vreinterpretq_u64_s32(__a)
> -#define vreinterpretq_u64_s64(__a) __arm_vreinterpretq_u64_s64(__a)
> -#define vreinterpretq_u64_s8(__a) __arm_vreinterpretq_u64_s8(__a)
> -#define vreinterpretq_u64_u16(__a) __arm_vreinterpretq_u64_u16(__a)
> -#define vreinterpretq_u64_u32(__a) __arm_vreinterpretq_u64_u32(__a)
> -#define vreinterpretq_u64_u8(__a) __arm_vreinterpretq_u64_u8(__a)
> -#define vreinterpretq_u8_s16(__a) __arm_vreinterpretq_u8_s16(__a)
> -#define vreinterpretq_u8_s32(__a) __arm_vreinterpretq_u8_s32(__a)
> -#define vreinterpretq_u8_s64(__a) __arm_vreinterpretq_u8_s64(__a)
> -#define vreinterpretq_u8_s8(__a) __arm_vreinterpretq_u8_s8(__a)
> -#define vreinterpretq_u8_u16(__a) __arm_vreinterpretq_u8_u16(__a)
> -#define vreinterpretq_u8_u32(__a) __arm_vreinterpretq_u8_u32(__a)
> -#define vreinterpretq_u8_u64(__a) __arm_vreinterpretq_u8_u64(__a)
> -#define vreinterpretq_s32_f16(__a) __arm_vreinterpretq_s32_f16(__a)
> -#define vreinterpretq_s32_f32(__a) __arm_vreinterpretq_s32_f32(__a)
> -#define vreinterpretq_u16_f16(__a) __arm_vreinterpretq_u16_f16(__a)
> -#define vreinterpretq_u16_f32(__a) __arm_vreinterpretq_u16_f32(__a)
> -#define vreinterpretq_u32_f16(__a) __arm_vreinterpretq_u32_f16(__a)
> -#define vreinterpretq_u32_f32(__a) __arm_vreinterpretq_u32_f32(__a)
> -#define vreinterpretq_u64_f16(__a) __arm_vreinterpretq_u64_f16(__a)
> -#define vreinterpretq_u64_f32(__a) __arm_vreinterpretq_u64_f32(__a)
> -#define vreinterpretq_u8_f16(__a) __arm_vreinterpretq_u8_f16(__a)
> -#define vreinterpretq_u8_f32(__a) __arm_vreinterpretq_u8_f32(__a)
> -#define vreinterpretq_f16_f32(__a) __arm_vreinterpretq_f16_f32(__a)
> -#define vreinterpretq_f16_s16(__a) __arm_vreinterpretq_f16_s16(__a)
> -#define vreinterpretq_f16_s32(__a) __arm_vreinterpretq_f16_s32(__a)
> -#define vreinterpretq_f16_s64(__a) __arm_vreinterpretq_f16_s64(__a)
> -#define vreinterpretq_f16_s8(__a) __arm_vreinterpretq_f16_s8(__a)
> -#define vreinterpretq_f16_u16(__a) __arm_vreinterpretq_f16_u16(__a)
> -#define vreinterpretq_f16_u32(__a) __arm_vreinterpretq_f16_u32(__a)
> -#define vreinterpretq_f16_u64(__a) __arm_vreinterpretq_f16_u64(__a)
> -#define vreinterpretq_f16_u8(__a) __arm_vreinterpretq_f16_u8(__a)
> -#define vreinterpretq_f32_f16(__a) __arm_vreinterpretq_f32_f16(__a)
> -#define vreinterpretq_f32_s16(__a) __arm_vreinterpretq_f32_s16(__a)
> -#define vreinterpretq_f32_s32(__a) __arm_vreinterpretq_f32_s32(__a)
> -#define vreinterpretq_f32_s64(__a) __arm_vreinterpretq_f32_s64(__a)
> -#define vreinterpretq_f32_s8(__a) __arm_vreinterpretq_f32_s8(__a)
> -#define vreinterpretq_f32_u16(__a) __arm_vreinterpretq_f32_u16(__a)
> -#define vreinterpretq_f32_u32(__a) __arm_vreinterpretq_f32_u32(__a)
> -#define vreinterpretq_f32_u64(__a) __arm_vreinterpretq_f32_u64(__a)
> -#define vreinterpretq_f32_u8(__a) __arm_vreinterpretq_f32_u8(__a)
> -#define vreinterpretq_s16_f16(__a) __arm_vreinterpretq_s16_f16(__a)
> -#define vreinterpretq_s16_f32(__a) __arm_vreinterpretq_s16_f32(__a)
> -#define vreinterpretq_s64_f16(__a) __arm_vreinterpretq_s64_f16(__a)
> -#define vreinterpretq_s64_f32(__a) __arm_vreinterpretq_s64_f32(__a)
> -#define vreinterpretq_s8_f16(__a) __arm_vreinterpretq_s8_f16(__a)
> -#define vreinterpretq_s8_f32(__a) __arm_vreinterpretq_s8_f32(__a)
>  #define vuninitializedq_u8(void) __arm_vuninitializedq_u8(void)
>  #define vuninitializedq_u16(void) __arm_vuninitializedq_u16(void)
>  #define vuninitializedq_u32(void) __arm_vuninitializedq_u32(void)
> @@ -37874,126 +37774,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_float16x8_t]: __arm_vuninitializedq_f16 (), \
>    int (*)[__ARM_mve_type_float32x4_t]: __arm_vuninitializedq_f32 ());})
> 
> -#define __arm_vreinterpretq_f16(p0) ({ __typeof(p0) __p0 = (p0); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_f16_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_f16_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_f16_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
> -  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_f16_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_f16_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_f16_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_f16_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
> -  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_f16_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
> -  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_f16_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
> -
> -#define __arm_vreinterpretq_f32(p0) ({ __typeof(p0) __p0 = (p0); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_f32_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_f32_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_f32_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
> -  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_f32_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_f32_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_f32_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_f32_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
> -  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_f32_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
> -  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_f32_f16 (__ARM_mve_coerce(__p0, float16x8_t)));})
> -
> -#define __arm_vreinterpretq_s16(p0) ({ __typeof(p0) __p0 = (p0); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> -  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_s16_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
> -  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_s16_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_s16_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
> -  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_s16_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s16_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s16_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s16_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
> -  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s16_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
> -  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_s16_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
> -
> -#define __arm_vreinterpretq_s32(p0) ({ __typeof(p0) __p0 = (p0); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> -  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_s32_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_s32_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_s32_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_s32_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s32_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s32_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s32_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
> -  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s32_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
> -  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_s32_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
> -
> -#define __arm_vreinterpretq_s64(p0) ({ __typeof(p0) __p0 = (p0); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> -  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_s64_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_s64_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_s64_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
> -  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_s64_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s64_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s64_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s64_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
> -  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s64_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
> -  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_s64_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
> -
> -#define __arm_vreinterpretq_s8(p0) ({ __typeof(p0) __p0 = (p0); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> -  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_s8_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_s8_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_s8_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
> -  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_s8_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s8_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s8_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s8_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
> -  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s8_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
> -  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_s8_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
> -
> -#define __arm_vreinterpretq_u16(p0) ({ __typeof(p0) __p0 = (p0); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> -  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_u16_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
> -  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u16_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u16_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
> -  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u16_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_u16_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u16_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_u16_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
> -  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_u16_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
> -  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_u16_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
> -
> -#define __arm_vreinterpretq_u32(p0) ({ __typeof(p0) __p0 = (p0); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> -  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_u32_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u32_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u32_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u32_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_u32_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_u32_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u32_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
> -  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_u32_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
> -  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_u32_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
> -
> -#define __arm_vreinterpretq_u64(p0) ({ __typeof(p0) __p0 = (p0); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> -  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_u64_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u64_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u64_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
> -  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u64_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_u64_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_u64_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_u64_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
> -  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u64_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
> -  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_u64_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
> -
> -#define __arm_vreinterpretq_u8(p0) ({ __typeof(p0) __p0 = (p0); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> -  int (*)[__ARM_mve_type_float16x8_t]: __arm_vreinterpretq_u8_f16 (__ARM_mve_coerce(__p0, float16x8_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u8_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u8_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
> -  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u8_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
> -  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u8_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_u8_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_u8_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
> -  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_u8_u64 (__ARM_mve_coerce(__p0, uint64x2_t)), \
> -  int (*)[__ARM_mve_type_float32x4_t]: __arm_vreinterpretq_u8_f32 (__ARM_mve_coerce(__p0, float32x4_t)));})
> -
>  #define __arm_vstrwq_scatter_base_wb(p0,p1,p2) ({ __typeof(p2) __p2 = (p2); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p2)])0, \
>    int (*)[__ARM_mve_type_int32x4_t]: __arm_vstrwq_scatter_base_wb_s32 (p0, p1, __ARM_mve_coerce(__p2, int32x4_t)), \
> @@ -39931,86 +39711,6 @@ extern void *__ARM_undef;
>    int (*)[__ARM_mve_type_uint32x4_t]: __arm_vuninitializedq_u32 (), \
>    int (*)[__ARM_mve_type_uint64x2_t]: __arm_vuninitializedq_u64 ());})
> 
> -#define __arm_vreinterpretq_s16(p0) ({ __typeof(p0) __p0 = (p0); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_s16_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_s16_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
> -  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_s16_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s16_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s16_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s16_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
> -  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s16_u64 (__ARM_mve_coerce(__p0, uint64x2_t)));})
> -
> -#define __arm_vreinterpretq_s32(p0) ({ __typeof(p0) __p0 = (p0); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> -  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_s32_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_s32_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_s32_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s32_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s32_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s32_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
> -  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s32_u64 (__ARM_mve_coerce(__p0, uint64x2_t)));})
> -
> -#define __arm_vreinterpretq_s64(p0) ({ __typeof(p0) __p0 = (p0); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> -  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_s64_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_s64_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
> -  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_s64_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s64_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s64_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s64_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
> -  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s64_u64 (__ARM_mve_coerce(__p0, uint64x2_t)));})
> -
> -#define __arm_vreinterpretq_s8(p0) ({ __typeof(p0) __p0 = (p0); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> -  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_s8_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_s8_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
> -  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_s8_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_s8_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_s8_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_s8_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
> -  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_s8_u64 (__ARM_mve_coerce(__p0, uint64x2_t)));})
> -
> -#define __arm_vreinterpretq_u16(p0) ({ __typeof(p0) __p0 = (p0); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> -  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u16_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u16_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
> -  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u16_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_u16_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u16_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_u16_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
> -  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_u16_u64 (__ARM_mve_coerce(__p0, uint64x2_t)));})
> -
> -#define __arm_vreinterpretq_u32(p0) ({ __typeof(p0) __p0 = (p0); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> -  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u32_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u32_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
> -  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u32_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_u32_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_u32_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u32_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
> -  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_u32_u64 (__ARM_mve_coerce(__p0, uint64x2_t)));})
> -
> -#define __arm_vreinterpretq_u64(p0) ({ __typeof(p0) __p0 = (p0); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> -  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u64_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u64_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
> -  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u64_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
> -  int (*)[__ARM_mve_type_uint8x16_t]: __arm_vreinterpretq_u64_u8 (__ARM_mve_coerce(__p0, uint8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_u64_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_u64_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
> -  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u64_s64 (__ARM_mve_coerce(__p0, int64x2_t)));})
> -
> -#define __arm_vreinterpretq_u8(p0) ({ __typeof(p0) __p0 = (p0); \
> -  _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
> -  int (*)[__ARM_mve_type_int16x8_t]: __arm_vreinterpretq_u8_s16 (__ARM_mve_coerce(__p0, int16x8_t)), \
> -  int (*)[__ARM_mve_type_int32x4_t]: __arm_vreinterpretq_u8_s32 (__ARM_mve_coerce(__p0, int32x4_t)), \
> -  int (*)[__ARM_mve_type_int64x2_t]: __arm_vreinterpretq_u8_s64 (__ARM_mve_coerce(__p0, int64x2_t)), \
> -  int (*)[__ARM_mve_type_int8x16_t]: __arm_vreinterpretq_u8_s8 (__ARM_mve_coerce(__p0, int8x16_t)), \
> -  int (*)[__ARM_mve_type_uint16x8_t]: __arm_vreinterpretq_u8_u16 (__ARM_mve_coerce(__p0, uint16x8_t)), \
> -  int (*)[__ARM_mve_type_uint32x4_t]: __arm_vreinterpretq_u8_u32 (__ARM_mve_coerce(__p0, uint32x4_t)), \
> -  int (*)[__ARM_mve_type_uint64x2_t]: __arm_vreinterpretq_u8_u64 (__ARM_mve_coerce(__p0, uint64x2_t)));})
> -
>  #define __arm_vabsq_x(p1,p2) ({ __typeof(p1) __p1 = (p1); \
>    _Generic( (int (*)[__ARM_mve_typeid(__p1)])0, \
>    int (*)[__ARM_mve_type_int8x16_t]: __arm_vabsq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), p2), \
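
[Note: the _Generic blocks deleted above are the main reason arm_mve.h
was so slow to parse: every overload is a statement expression wrapping
a multi-way _Generic over pointer-to-array types.  A toy, self-contained
illustration of the idiom (not the MVE code itself):

  #define widen(p0) ({ __typeof(p0) __p = (p0); \
    _Generic ((int (*)[sizeof (__p)]) 0,        \
      int (*)[2]: (long) __p,                   \
      int (*)[4]: (long) __p); })

With the builtins registered under the pragma, overload resolution moves
into the front end and all of this machinery can go.]
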
> diff --git a/gcc/config/arm/arm_mve_types.h b/gcc/config/arm/arm_mve_types.h
> index 12bb519142f..ae2591faa03 100644
> --- a/gcc/config/arm/arm_mve_types.h
> +++ b/gcc/config/arm/arm_mve_types.h
> @@ -29,1124 +29,101 @@ typedef float float32_t;
> 
>  #pragma GCC arm "arm_mve_types.h"
> 
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s16_s32 (int32x4_t __a)
> -{
> -  return (int16x8_t)  __a;
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s16_s64 (int64x2_t __a)
> -{
> -  return (int16x8_t)  __a;
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s16_s8 (int8x16_t __a)
> -{
> -  return (int16x8_t)  __a;
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s16_u16 (uint16x8_t __a)
> -{
> -  return (int16x8_t)  __a;
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s16_u32 (uint32x4_t __a)
> -{
> -  return (int16x8_t)  __a;
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s16_u64 (uint64x2_t __a)
> -{
> -  return (int16x8_t)  __a;
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s16_u8 (uint8x16_t __a)
> -{
> -  return (int16x8_t)  __a;
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s32_s16 (int16x8_t __a)
> -{
> -  return (int32x4_t)  __a;
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s32_s64 (int64x2_t __a)
> -{
> -  return (int32x4_t)  __a;
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s32_s8 (int8x16_t __a)
> -{
> -  return (int32x4_t)  __a;
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s32_u16 (uint16x8_t __a)
> -{
> -  return (int32x4_t)  __a;
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s32_u32 (uint32x4_t __a)
> -{
> -  return (int32x4_t)  __a;
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s32_u64 (uint64x2_t __a)
> -{
> -  return (int32x4_t)  __a;
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s32_u8 (uint8x16_t __a)
> -{
> -  return (int32x4_t)  __a;
> -}
> -
> -__extension__ extern __inline int64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s64_s16 (int16x8_t __a)
> -{
> -  return (int64x2_t)  __a;
> -}
> -
> -__extension__ extern __inline int64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s64_s32 (int32x4_t __a)
> -{
> -  return (int64x2_t)  __a;
> -}
> -
> -__extension__ extern __inline int64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s64_s8 (int8x16_t __a)
> -{
> -  return (int64x2_t)  __a;
> -}
> -
> -__extension__ extern __inline int64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s64_u16 (uint16x8_t __a)
> -{
> -  return (int64x2_t)  __a;
> -}
> -
> -__extension__ extern __inline int64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s64_u32 (uint32x4_t __a)
> -{
> -  return (int64x2_t)  __a;
> -}
> -
> -__extension__ extern __inline int64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s64_u64 (uint64x2_t __a)
> -{
> -  return (int64x2_t)  __a;
> -}
> -
> -__extension__ extern __inline int64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s64_u8 (uint8x16_t __a)
> -{
> -  return (int64x2_t)  __a;
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s8_s16 (int16x8_t __a)
> -{
> -  return (int8x16_t)  __a;
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s8_s32 (int32x4_t __a)
> -{
> -  return (int8x16_t)  __a;
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s8_s64 (int64x2_t __a)
> -{
> -  return (int8x16_t)  __a;
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s8_u16 (uint16x8_t __a)
> -{
> -  return (int8x16_t)  __a;
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s8_u32 (uint32x4_t __a)
> -{
> -  return (int8x16_t)  __a;
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s8_u64 (uint64x2_t __a)
> -{
> -  return (int8x16_t)  __a;
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s8_u8 (uint8x16_t __a)
> -{
> -  return (int8x16_t)  __a;
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u16_s16 (int16x8_t __a)
> -{
> -  return (uint16x8_t)  __a;
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u16_s32 (int32x4_t __a)
> -{
> -  return (uint16x8_t)  __a;
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u16_s64 (int64x2_t __a)
> -{
> -  return (uint16x8_t)  __a;
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u16_s8 (int8x16_t __a)
> -{
> -  return (uint16x8_t)  __a;
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u16_u32 (uint32x4_t __a)
> -{
> -  return (uint16x8_t)  __a;
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u16_u64 (uint64x2_t __a)
> -{
> -  return (uint16x8_t)  __a;
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u16_u8 (uint8x16_t __a)
> -{
> -  return (uint16x8_t)  __a;
> -}
> -
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u32_s16 (int16x8_t __a)
> -{
> -  return (uint32x4_t)  __a;
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u32_s32 (int32x4_t __a)
> -{
> -  return (uint32x4_t)  __a;
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u32_s64 (int64x2_t __a)
> -{
> -  return (uint32x4_t)  __a;
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u32_s8 (int8x16_t __a)
> -{
> -  return (uint32x4_t)  __a;
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u32_u16 (uint16x8_t __a)
> -{
> -  return (uint32x4_t)  __a;
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u32_u64 (uint64x2_t __a)
> -{
> -  return (uint32x4_t)  __a;
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u32_u8 (uint8x16_t __a)
> -{
> -  return (uint32x4_t)  __a;
> -}
> -
> -__extension__ extern __inline uint64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u64_s16 (int16x8_t __a)
> -{
> -  return (uint64x2_t)  __a;
> -}
> -
> -__extension__ extern __inline uint64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u64_s32 (int32x4_t __a)
> -{
> -  return (uint64x2_t)  __a;
> -}
> -
> -__extension__ extern __inline uint64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u64_s64 (int64x2_t __a)
> -{
> -  return (uint64x2_t)  __a;
> -}
> -
> -__extension__ extern __inline uint64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u64_s8 (int8x16_t __a)
> -{
> -  return (uint64x2_t)  __a;
> -}
> -
> -__extension__ extern __inline uint64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u64_u16 (uint16x8_t __a)
> -{
> -  return (uint64x2_t)  __a;
> -}
> -
> -__extension__ extern __inline uint64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u64_u32 (uint32x4_t __a)
> -{
> -  return (uint64x2_t)  __a;
> -}
> -
> -__extension__ extern __inline uint64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u64_u8 (uint8x16_t __a)
> -{
> -  return (uint64x2_t)  __a;
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u8_s16 (int16x8_t __a)
> -{
> -  return (uint8x16_t)  __a;
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u8_s32 (int32x4_t __a)
> -{
> -  return (uint8x16_t)  __a;
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u8_s64 (int64x2_t __a)
> -{
> -  return (uint8x16_t)  __a;
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u8_s8 (int8x16_t __a)
> -{
> -  return (uint8x16_t)  __a;
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u8_u16 (uint16x8_t __a)
> -{
> -  return (uint8x16_t)  __a;
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u8_u32 (uint32x4_t __a)
> -{
> -  return (uint8x16_t)  __a;
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u8_u64 (uint64x2_t __a)
> -{
> -  return (uint8x16_t)  __a;
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vuninitializedq_u8 (void)
> -{
> -  uint8x16_t __uninit;
> -  __asm__ ("": "=w"(__uninit));
> -  return __uninit;
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vuninitializedq_u16 (void)
> -{
> -  uint16x8_t __uninit;
> -  __asm__ ("": "=w"(__uninit));
> -  return __uninit;
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vuninitializedq_u32 (void)
> -{
> -  uint32x4_t __uninit;
> -  __asm__ ("": "=w"(__uninit));
> -  return __uninit;
> -}
> -
> -__extension__ extern __inline uint64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vuninitializedq_u64 (void)
> -{
> -  uint64x2_t __uninit;
> -  __asm__ ("": "=w"(__uninit));
> -  return __uninit;
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vuninitializedq_s8 (void)
> -{
> -  int8x16_t __uninit;
> -  __asm__ ("": "=w"(__uninit));
> -  return __uninit;
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vuninitializedq_s16 (void)
> -{
> -  int16x8_t __uninit;
> -  __asm__ ("": "=w"(__uninit));
> -  return __uninit;
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vuninitializedq_s32 (void)
> -{
> -  int32x4_t __uninit;
> -  __asm__ ("": "=w"(__uninit));
> -  return __uninit;
> -}
> -
> -__extension__ extern __inline int64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vuninitializedq_s64 (void)
> -{
> -  int64x2_t __uninit;
> -  __asm__ ("": "=w"(__uninit));
> -  return __uninit;
> -}
> -
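
[Note: the __asm__ ("" : "=w" (x)) idiom in the deleted functions above
makes GCC treat x as written by an (empty) asm, so the "uninitialized"
vector neither emits an instruction nor triggers -Wmaybe-uninitialized.
In isolation the trick looks like this (same pattern as the deleted
code):

  float32x4_t
  make_undef (void)
  {
    float32x4_t v;
    __asm__ ("" : "=w" (v)); /* pretends to write v; emits nothing */
    return v;
  }

Presumably the new vuninitializedq implementation keeps this property.]
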
> -#if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s32_f16 (float16x8_t __a)
> -{
> -  return (int32x4_t)  __a;
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s32_f32 (float32x4_t __a)
> -{
> -  return (int32x4_t)  __a;
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s16_f16 (float16x8_t __a)
> -{
> -  return (int16x8_t)  __a;
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s16_f32 (float32x4_t __a)
> -{
> -  return (int16x8_t)  __a;
> -}
> -
> -__extension__ extern __inline int64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s64_f16 (float16x8_t __a)
> -{
> -  return (int64x2_t)  __a;
> -}
> -
> -__extension__ extern __inline int64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s64_f32 (float32x4_t __a)
> -{
> -  return (int64x2_t)  __a;
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s8_f16 (float16x8_t __a)
> -{
> -  return (int8x16_t)  __a;
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s8_f32 (float32x4_t __a)
> -{
> -  return (int8x16_t)  __a;
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u16_f16 (float16x8_t __a)
> -{
> -  return (uint16x8_t)  __a;
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u16_f32 (float32x4_t __a)
> -{
> -  return (uint16x8_t)  __a;
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u32_f16 (float16x8_t __a)
> -{
> -  return (uint32x4_t)  __a;
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u32_f32 (float32x4_t __a)
> -{
> -  return (uint32x4_t)  __a;
> -}
> -
> -__extension__ extern __inline uint64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u64_f16 (float16x8_t __a)
> -{
> -  return (uint64x2_t)  __a;
> -}
> -
> -__extension__ extern __inline uint64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u64_f32 (float32x4_t __a)
> -{
> -  return (uint64x2_t)  __a;
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u8_f16 (float16x8_t __a)
> -{
> -  return (uint8x16_t)  __a;
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u8_f32 (float32x4_t __a)
> -{
> -  return (uint8x16_t)  __a;
> -}
> -
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f16_f32 (float32x4_t __a)
> -{
> -  return (float16x8_t)  __a;
> -}
> -
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f16_s16 (int16x8_t __a)
> -{
> -  return (float16x8_t)  __a;
> -}
> -
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f16_s32 (int32x4_t __a)
> -{
> -  return (float16x8_t)  __a;
> -}
> -
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f16_s64 (int64x2_t __a)
> -{
> -  return (float16x8_t)  __a;
> -}
> -
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f16_s8 (int8x16_t __a)
> -{
> -  return (float16x8_t)  __a;
> -}
> -
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f16_u16 (uint16x8_t __a)
> -{
> -  return (float16x8_t)  __a;
> -}
> -
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f16_u32 (uint32x4_t __a)
> -{
> -  return (float16x8_t)  __a;
> -}
> -
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f16_u64 (uint64x2_t __a)
> -{
> -  return (float16x8_t)  __a;
> -}
> -
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f16_u8 (uint8x16_t __a)
> -{
> -  return (float16x8_t)  __a;
> -}
> -
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f32_f16 (float16x8_t __a)
> -{
> -  return (float32x4_t)  __a;
> -}
> -
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f32_s16 (int16x8_t __a)
> -{
> -  return (float32x4_t)  __a;
> -}
> -
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f32_s32 (int32x4_t __a)
> -{
> -  return (float32x4_t)  __a;
> -}
> -
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f32_s64 (int64x2_t __a)
> -{
> -  return (float32x4_t)  __a;
> -}
> -
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f32_s8 (int8x16_t __a)
> -{
> -  return (float32x4_t)  __a;
> -}
> -
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f32_u16 (uint16x8_t __a)
> -{
> -  return (float32x4_t)  __a;
> -}
> -
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f32_u32 (uint32x4_t __a)
> -{
> -  return (float32x4_t)  __a;
> -}
> -
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f32_u64 (uint64x2_t __a)
> -{
> -  return (float32x4_t)  __a;
> -}
> -
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f32_u8 (uint8x16_t __a)
> -{
> -  return (float32x4_t)  __a;
> -}
> -
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vuninitializedq_f16 (void)
> -{
> -  float16x8_t __uninit;
> -  __asm__ ("": "=w" (__uninit));
> -  return __uninit;
> -}
> -
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vuninitializedq_f32 (void)
> -{
> -  float32x4_t __uninit;
> -  __asm__ ("": "=w" (__uninit));
> -  return __uninit;
> -}
> -
> -#endif
> -
> -#ifdef __cplusplus
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s16 (int32x4_t __a)
> -{
> - return __arm_vreinterpretq_s16_s32 (__a);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s16 (int64x2_t __a)
> -{
> - return __arm_vreinterpretq_s16_s64 (__a);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s16 (int8x16_t __a)
> -{
> - return __arm_vreinterpretq_s16_s8 (__a);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s16 (uint16x8_t __a)
> -{
> - return __arm_vreinterpretq_s16_u16 (__a);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s16 (uint32x4_t __a)
> -{
> - return __arm_vreinterpretq_s16_u32 (__a);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s16 (uint64x2_t __a)
> -{
> - return __arm_vreinterpretq_s16_u64 (__a);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s16 (uint8x16_t __a)
> -{
> - return __arm_vreinterpretq_s16_u8 (__a);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s32 (int16x8_t __a)
> -{
> - return __arm_vreinterpretq_s32_s16 (__a);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s32 (int64x2_t __a)
> -{
> - return __arm_vreinterpretq_s32_s64 (__a);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s32 (int8x16_t __a)
> -{
> - return __arm_vreinterpretq_s32_s8 (__a);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s32 (uint16x8_t __a)
> -{
> - return __arm_vreinterpretq_s32_u16 (__a);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s32 (uint32x4_t __a)
> -{
> - return __arm_vreinterpretq_s32_u32 (__a);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s32 (uint64x2_t __a)
> -{
> - return __arm_vreinterpretq_s32_u64 (__a);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s32 (uint8x16_t __a)
> -{
> - return __arm_vreinterpretq_s32_u8 (__a);
> -}
> -
> -__extension__ extern __inline int64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s64 (int16x8_t __a)
> -{
> - return __arm_vreinterpretq_s64_s16 (__a);
> -}
> -
> -__extension__ extern __inline int64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s64 (int32x4_t __a)
> -{
> - return __arm_vreinterpretq_s64_s32 (__a);
> -}
> -
> -__extension__ extern __inline int64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s64 (int8x16_t __a)
> -{
> - return __arm_vreinterpretq_s64_s8 (__a);
> -}
> -
> -__extension__ extern __inline int64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s64 (uint16x8_t __a)
> -{
> - return __arm_vreinterpretq_s64_u16 (__a);
> -}
> -
> -__extension__ extern __inline int64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s64 (uint32x4_t __a)
> -{
> - return __arm_vreinterpretq_s64_u32 (__a);
> -}
> -
> -__extension__ extern __inline int64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s64 (uint64x2_t __a)
> -{
> - return __arm_vreinterpretq_s64_u64 (__a);
> -}
> -
> -__extension__ extern __inline int64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s64 (uint8x16_t __a)
> -{
> - return __arm_vreinterpretq_s64_u8 (__a);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s8 (int16x8_t __a)
> -{
> - return __arm_vreinterpretq_s8_s16 (__a);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s8 (int32x4_t __a)
> -{
> - return __arm_vreinterpretq_s8_s32 (__a);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s8 (int64x2_t __a)
> -{
> - return __arm_vreinterpretq_s8_s64 (__a);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s8 (uint16x8_t __a)
> -{
> - return __arm_vreinterpretq_s8_u16 (__a);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s8 (uint32x4_t __a)
> -{
> - return __arm_vreinterpretq_s8_u32 (__a);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s8 (uint64x2_t __a)
> -{
> - return __arm_vreinterpretq_s8_u64 (__a);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s8 (uint8x16_t __a)
> -{
> - return __arm_vreinterpretq_s8_u8 (__a);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u16 (int16x8_t __a)
> -{
> - return __arm_vreinterpretq_u16_s16 (__a);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u16 (int32x4_t __a)
> -{
> - return __arm_vreinterpretq_u16_s32 (__a);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u16 (int64x2_t __a)
> -{
> - return __arm_vreinterpretq_u16_s64 (__a);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u16 (int8x16_t __a)
> -{
> - return __arm_vreinterpretq_u16_s8 (__a);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u16 (uint32x4_t __a)
> -{
> - return __arm_vreinterpretq_u16_u32 (__a);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u16 (uint64x2_t __a)
> -{
> - return __arm_vreinterpretq_u16_u64 (__a);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u16 (uint8x16_t __a)
> -{
> - return __arm_vreinterpretq_u16_u8 (__a);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u32 (int16x8_t __a)
> -{
> - return __arm_vreinterpretq_u32_s16 (__a);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u32 (int32x4_t __a)
> -{
> - return __arm_vreinterpretq_u32_s32 (__a);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u32 (int64x2_t __a)
> -{
> - return __arm_vreinterpretq_u32_s64 (__a);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u32 (int8x16_t __a)
> -{
> - return __arm_vreinterpretq_u32_s8 (__a);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u32 (uint16x8_t __a)
> -{
> - return __arm_vreinterpretq_u32_u16 (__a);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u32 (uint64x2_t __a)
> -{
> - return __arm_vreinterpretq_u32_u64 (__a);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u32 (uint8x16_t __a)
> -{
> - return __arm_vreinterpretq_u32_u8 (__a);
> -}
> -
> -__extension__ extern __inline uint64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u64 (int16x8_t __a)
> -{
> - return __arm_vreinterpretq_u64_s16 (__a);
> -}
> -
> -__extension__ extern __inline uint64x2_t
> +__extension__ extern __inline uint8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u64 (int32x4_t __a)
> +__arm_vuninitializedq_u8 (void)
>  {
> - return __arm_vreinterpretq_u64_s32 (__a);
> +  uint8x16_t __uninit;
> +  __asm__ ("": "=w"(__uninit));
> +  return __uninit;
>  }
> 
> -__extension__ extern __inline uint64x2_t
> +__extension__ extern __inline uint16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u64 (int64x2_t __a)
> +__arm_vuninitializedq_u16 (void)
>  {
> - return __arm_vreinterpretq_u64_s64 (__a);
> +  uint16x8_t __uninit;
> +  __asm__ ("": "=w"(__uninit));
> +  return __uninit;
>  }
> 
> -__extension__ extern __inline uint64x2_t
> +__extension__ extern __inline uint32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u64 (int8x16_t __a)
> +__arm_vuninitializedq_u32 (void)
>  {
> - return __arm_vreinterpretq_u64_s8 (__a);
> +  uint32x4_t __uninit;
> +  __asm__ ("": "=w"(__uninit));
> +  return __uninit;
>  }
> 
>  __extension__ extern __inline uint64x2_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u64 (uint16x8_t __a)
> +__arm_vuninitializedq_u64 (void)
>  {
> - return __arm_vreinterpretq_u64_u16 (__a);
> +  uint64x2_t __uninit;
> +  __asm__ ("": "=w"(__uninit));
> +  return __uninit;
>  }
> 
> -__extension__ extern __inline uint64x2_t
> +__extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u64 (uint32x4_t __a)
> +__arm_vuninitializedq_s8 (void)
>  {
> - return __arm_vreinterpretq_u64_u32 (__a);
> +  int8x16_t __uninit;
> +  __asm__ ("": "=w"(__uninit));
> +  return __uninit;
>  }
> 
> -__extension__ extern __inline uint64x2_t
> +__extension__ extern __inline int16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u64 (uint8x16_t __a)
> +__arm_vuninitializedq_s16 (void)
>  {
> - return __arm_vreinterpretq_u64_u8 (__a);
> +  int16x8_t __uninit;
> +  __asm__ ("": "=w"(__uninit));
> +  return __uninit;
>  }
> 
> -__extension__ extern __inline uint8x16_t
> +__extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u8 (int16x8_t __a)
> +__arm_vuninitializedq_s32 (void)
>  {
> - return __arm_vreinterpretq_u8_s16 (__a);
> +  int32x4_t __uninit;
> +  __asm__ ("": "=w"(__uninit));
> +  return __uninit;
>  }
> 
> -__extension__ extern __inline uint8x16_t
> +__extension__ extern __inline int64x2_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u8 (int32x4_t __a)
> +__arm_vuninitializedq_s64 (void)
>  {
> - return __arm_vreinterpretq_u8_s32 (__a);
> +  int64x2_t __uninit;
> +  __asm__ ("": "=w"(__uninit));
> +  return __uninit;
>  }
> 
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u8 (int64x2_t __a)
> -{
> - return __arm_vreinterpretq_u8_s64 (__a);
> -}
> +#if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
> 
> -__extension__ extern __inline uint8x16_t
> +__extension__ extern __inline float16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u8 (int8x16_t __a)
> +__arm_vuninitializedq_f16 (void)
>  {
> - return __arm_vreinterpretq_u8_s8 (__a);
> +  float16x8_t __uninit;
> +  __asm__ ("": "=w" (__uninit));
> +  return __uninit;
>  }
> 
> -__extension__ extern __inline uint8x16_t
> +__extension__ extern __inline float32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u8 (uint16x8_t __a)
> +__arm_vuninitializedq_f32 (void)
>  {
> - return __arm_vreinterpretq_u8_u16 (__a);
> +  float32x4_t __uninit;
> +  __asm__ ("": "=w" (__uninit));
> +  return __uninit;
>  }
> 
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u8 (uint32x4_t __a)
> -{
> - return __arm_vreinterpretq_u8_u32 (__a);
> -}
> +#endif
> 
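A note on the idiom above: the empty asm with an "=w" output operand makes
the compiler treat the result as initialized, so -Wuninitialized stays
quiet, while no instruction is emitted.  As a minimal caller-side sketch,
illustrative rather than taken from this patch, such a value is typically
used as the don't-care "inactive" operand of a predicated intrinsic:

  #include <arm_mve.h>

  uint32x4_t
  add_active_lanes (uint32x4_t a, uint32x4_t b, mve_pred16_t p)
  {
    /* Lanes where p is false are taken from the inactive operand,
       whose contents do not matter here.  */
    return vaddq_m (vuninitializedq_u32 (), a, b, p);
  }
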
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u8 (uint64x2_t __a)
> -{
> - return __arm_vreinterpretq_u8_u64 (__a);
> -}
> +#ifdef __cplusplus
> 
>  __extension__ extern __inline uint8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> @@ -1205,244 +182,6 @@ __arm_vuninitializedq (int64x2_t /* __v ATTRIBUTE UNUSED */)
>  }
> 
>  #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s32 (float16x8_t __a)
> -{
> - return __arm_vreinterpretq_s32_f16 (__a);
> -}
> -
> -__extension__ extern __inline int32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s32 (float32x4_t __a)
> -{
> - return __arm_vreinterpretq_s32_f32 (__a);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s16 (float16x8_t __a)
> -{
> - return __arm_vreinterpretq_s16_f16 (__a);
> -}
> -
> -__extension__ extern __inline int16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s16 (float32x4_t __a)
> -{
> - return __arm_vreinterpretq_s16_f32 (__a);
> -}
> -
> -__extension__ extern __inline int64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s64 (float16x8_t __a)
> -{
> - return __arm_vreinterpretq_s64_f16 (__a);
> -}
> -
> -__extension__ extern __inline int64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s64 (float32x4_t __a)
> -{
> - return __arm_vreinterpretq_s64_f32 (__a);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s8 (float16x8_t __a)
> -{
> - return __arm_vreinterpretq_s8_f16 (__a);
> -}
> -
> -__extension__ extern __inline int8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_s8 (float32x4_t __a)
> -{
> - return __arm_vreinterpretq_s8_f32 (__a);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u16 (float16x8_t __a)
> -{
> - return __arm_vreinterpretq_u16_f16 (__a);
> -}
> -
> -__extension__ extern __inline uint16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u16 (float32x4_t __a)
> -{
> - return __arm_vreinterpretq_u16_f32 (__a);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u32 (float16x8_t __a)
> -{
> - return __arm_vreinterpretq_u32_f16 (__a);
> -}
> -
> -__extension__ extern __inline uint32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u32 (float32x4_t __a)
> -{
> - return __arm_vreinterpretq_u32_f32 (__a);
> -}
> -
> -__extension__ extern __inline uint64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u64 (float16x8_t __a)
> -{
> - return __arm_vreinterpretq_u64_f16 (__a);
> -}
> -
> -__extension__ extern __inline uint64x2_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u64 (float32x4_t __a)
> -{
> - return __arm_vreinterpretq_u64_f32 (__a);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u8 (float16x8_t __a)
> -{
> - return __arm_vreinterpretq_u8_f16 (__a);
> -}
> -
> -__extension__ extern __inline uint8x16_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_u8 (float32x4_t __a)
> -{
> - return __arm_vreinterpretq_u8_f32 (__a);
> -}
> -
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f16 (float32x4_t __a)
> -{
> - return __arm_vreinterpretq_f16_f32 (__a);
> -}
> -
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f16 (int16x8_t __a)
> -{
> - return __arm_vreinterpretq_f16_s16 (__a);
> -}
> -
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f16 (int32x4_t __a)
> -{
> - return __arm_vreinterpretq_f16_s32 (__a);
> -}
> -
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f16 (int64x2_t __a)
> -{
> - return __arm_vreinterpretq_f16_s64 (__a);
> -}
> -
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f16 (int8x16_t __a)
> -{
> - return __arm_vreinterpretq_f16_s8 (__a);
> -}
> -
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f16 (uint16x8_t __a)
> -{
> - return __arm_vreinterpretq_f16_u16 (__a);
> -}
> -
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f16 (uint32x4_t __a)
> -{
> - return __arm_vreinterpretq_f16_u32 (__a);
> -}
> -
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f16 (uint64x2_t __a)
> -{
> - return __arm_vreinterpretq_f16_u64 (__a);
> -}
> -
> -__extension__ extern __inline float16x8_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f16 (uint8x16_t __a)
> -{
> - return __arm_vreinterpretq_f16_u8 (__a);
> -}
> -
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f32 (float16x8_t __a)
> -{
> - return __arm_vreinterpretq_f32_f16 (__a);
> -}
> -
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f32 (int16x8_t __a)
> -{
> - return __arm_vreinterpretq_f32_s16 (__a);
> -}
> -
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f32 (int32x4_t __a)
> -{
> - return __arm_vreinterpretq_f32_s32 (__a);
> -}
> -
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f32 (int64x2_t __a)
> -{
> - return __arm_vreinterpretq_f32_s64 (__a);
> -}
> -
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f32 (int8x16_t __a)
> -{
> - return __arm_vreinterpretq_f32_s8 (__a);
> -}
> -
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f32 (uint16x8_t __a)
> -{
> - return __arm_vreinterpretq_f32_u16 (__a);
> -}
> -
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f32 (uint32x4_t __a)
> -{
> - return __arm_vreinterpretq_f32_u32 (__a);
> -}
> -
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f32 (uint64x2_t __a)
> -{
> - return __arm_vreinterpretq_f32_u64 (__a);
> -}
> -
> -__extension__ extern __inline float32x4_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vreinterpretq_f32 (uint8x16_t __a)
> -{
> - return __arm_vreinterpretq_f32_u8 (__a);
> -}
> -
>  __extension__ extern __inline float16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __arm_vuninitializedq (float16x8_t /* __v ATTRIBUTE UNUSED */)
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 35eab6c94bf..ab688396f97 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -10561,3 +10561,21 @@ (define_expand "vcond_mask_<mode><MVE_vpred>"
>      }
>    DONE;
>  })
> +
> +;; Reinterpret operand 1 in operand 0's mode, without changing its contents.
> +(define_expand "@arm_mve_reinterpret<mode>"
> +  [(set (match_operand:MVE_vecs 0 "register_operand")
> +	(unspec:MVE_vecs
> +	  [(match_operand 1 "arm_any_register_operand")]
> +	  REINTERPRET))]
> +  "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (<MODE>mode))
> +    || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (<MODE>mode))"
> +  {
> +    machine_mode src_mode = GET_MODE (operands[1]);
> +    if (targetm.can_change_mode_class (<MODE>mode, src_mode, VFP_REGS))
> +      {
> +	emit_move_insn (operands[0], gen_lowpart (<MODE>mode, operands[1]));
> +	DONE;
> +      }
> +  }
> +)
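
A nice payoff of the parameterized "@arm_mve_reinterpret<mode>" name is
that the builtin expander can be selected by mode rather than via a
switch on codes.  A minimal sketch of a caller, where the helper's name
and arguments are illustrative rather than taken from the patch:

  /* The generators provide code_for_arm_mve_reinterpret () for "@..."
     patterns; pick the instance matching the destination mode.  */
  static rtx
  expand_reinterpret (machine_mode to_mode, rtx dst, rtx src)
  {
    insn_code icode = code_for_arm_mve_reinterpret (to_mode);
    emit_insn (GEN_FCN (icode) (dst, src));
    return dst;
  }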
> diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
> index 84384ee798d..dccda283573 100644
> --- a/gcc/config/arm/unspecs.md
> +++ b/gcc/config/arm/unspecs.md
> @@ -1255,4 +1255,5 @@ (define_c_enum "unspec" [
>    SQRSHRL_64
>    SQRSHRL_48
>    VSHLCQ_M_
> +  REINTERPRET
>  ])
> diff --git a/gcc/testsuite/g++.target/arm/mve.exp b/gcc/testsuite/g++.target/arm/mve.exp
> index cd824035540..f75ec20ea64 100644
> --- a/gcc/testsuite/g++.target/arm/mve.exp
> +++ b/gcc/testsuite/g++.target/arm/mve.exp
> @@ -42,8 +42,12 @@ set dg-do-what-default "assemble"
>  dg-init
> 
>  # Main loop.
> -dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/../../gcc.target/arm/mve/intrinsics/*.\[cCS\]]] \
> -	"" $DEFAULT_CXXFLAGS
> +set gcc_subdir [string replace $subdir 0 2 gcc]
> +set files [glob -nocomplain \
> +	       "$srcdir/$subdir/../../gcc.target/arm/mve/intrinsics/*.\[cCS\]" \
> +	       "$srcdir/$gcc_subdir/mve/general/*.\[cCS\]" \
> +	       "$srcdir/$subdir/mve/general-c++/*.\[cCS\]"]
> +dg-runtest [lsort $files] "" $DEFAULT_CXXFLAGS
> 
>  # All done.
>  set dg-do-what-default ${save-dg-do-what-default}
> diff --git a/gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c b/gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c
> new file mode 100644
> index 00000000000..e0692ceb8c8
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* Do not use dg-add-options arm_v8_1m_mve, because this might expand to "",
> +   which could imply mve+fp depending on the user settings. We want to make
> +   sure the '+fp' extension is not enabled.  */
> +/* { dg-options "-mfpu=auto -march=armv8.1-m.main+mve" } */
> +
> +#include <arm_mve.h>
> +
> +void
> +f1 (uint8x16_t v)
> +{
> +  vreinterpretq_f16 (v); /* { dg-error {ACLE function 'void vreinterpretq_f16\(uint8x16_t\)' requires ISA extension 'mve.fp'} } */
> +  /* { dg-message {note: you can enable mve.fp by using the command-line option '-march', or by using the 'target' attribute or pragma} "" {target *-*-*} .-1 } */
> +}
> diff --git a/gcc/testsuite/g++.target/arm/mve/general-c++/vreinterpretq_1.C b/gcc/testsuite/g++.target/arm/mve/general-c++/vreinterpretq_1.C
> new file mode 100644
> index 00000000000..8b29ee58163
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/arm/mve/general-c++/vreinterpretq_1.C
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> +/* { dg-add-options arm_v8_1m_mve_fp } */
> +
> +#include <arm_mve.h>
> +
> +void
> +f1 (int8x16_t s8, uint16x8_t u16, float32x4_t f32)
> +{
> +  __arm_vreinterpretq_s8 (); /* { dg-error {no matching function for call to '__arm_vreinterpretq_s8\(\)'} } */
> +  __arm_vreinterpretq_s8 (s8, s8); /* { dg-error {no matching function for call to '__arm_vreinterpretq_s8\(int8x16_t\&, int8x16_t\&\)'} } */
> +  __arm_vreinterpretq_s8 (0); /* { dg-error {no matching function for call to '__arm_vreinterpretq_s8\(int\)'} } */
> +  __arm_vreinterpretq_s8 (s8); /* { dg-error {no matching function for call to '__arm_vreinterpretq_s8\(int8x16_t\&\)'} } */
> +  __arm_vreinterpretq_s8 (u16);
> +  __arm_vreinterpretq_u16 (); /* { dg-error {no matching function for call to '__arm_vreinterpretq_u16\(\)'} } */
> +  __arm_vreinterpretq_u16 (u16, u16); /* { dg-error {no matching function for call to '__arm_vreinterpretq_u16\(uint16x8_t\&, uint16x8_t\&\)'} } */
> +  __arm_vreinterpretq_u16 (0); /* { dg-error {no matching function for call to '__arm_vreinterpretq_u16\(int\)'} } */
> +  __arm_vreinterpretq_u16 (u16); /* { dg-error {no matching function for call to '__arm_vreinterpretq_u16\(uint16x8_t\&\)'} } */
> +  __arm_vreinterpretq_u16 (f32);
> +  __arm_vreinterpretq_f32 (); /* { dg-error {no matching function for call to '__arm_vreinterpretq_f32\(\)'} } */
> +  __arm_vreinterpretq_f32 (f32, f32); /* { dg-error {no matching function for call to '__arm_vreinterpretq_f32\(float32x4_t\&, float32x4_t\&\)'} } */
> +  __arm_vreinterpretq_f32 (0); /* { dg-error {no matching function for call to '__arm_vreinterpretq_f32\(int\)'} } */
> +  __arm_vreinterpretq_f32 (f32); /* { dg-error {no matching function for call to '__arm_vreinterpretq_f32\(float32x4_t\&\)'} } */
> +  __arm_vreinterpretq_f32 (s8);
> +}
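
Judging by the dg-error strings above, the C++ story relies on genuine
overloads: one __arm_vreinterpretq_* declaration per accepted source
type, so invalid calls are rejected by ordinary overload resolution.  An
illustrative, hand-written reduction (not the framework's actual
declarations):

  /* One overload per accepted source type; deliberately no
     (int8x16_t) overload, so a same-type reinterpret cannot match.  */
  int8x16_t __arm_vreinterpretq_s8 (uint16x8_t);
  int8x16_t __arm_vreinterpretq_s8 (float32x4_t);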
> diff --git a/gcc/testsuite/gcc.target/arm/mve/general-c/nomve_fp_1.c b/gcc/testsuite/gcc.target/arm/mve/general-c/nomve_fp_1.c
> new file mode 100644
> index 00000000000..21c2af16a61
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/general-c/nomve_fp_1.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* Do not use dg-add-options arm_v8_1m_mve, because this might expand to "",
> +   which could imply mve+fp depending on the user settings. We want to make
> +   sure the '+fp' extension is not enabled.  */
> +/* { dg-options "-mfpu=auto -march=armv8.1-m.main+mve" } */
> +
> +#include <arm_mve.h>
> +
> +void
> +foo (uint8x16_t v)
> +{
> +  vreinterpretq_f16 (v); /* { dg-error {ACLE function '__arm_vreinterpretq_f16_u8' requires ISA extension 'mve.fp'} } */
> +  /* { dg-message {note: you can enable mve.fp by using the command-line option '-march', or by using the 'target' attribute or pragma} "" {target *-*-*} .-1 } */
> +}
> diff --git a/gcc/testsuite/gcc.target/arm/mve/general-c/vreinterpretq_1.c b/gcc/testsuite/gcc.target/arm/mve/general-c/vreinterpretq_1.c
> new file mode 100644
> index 00000000000..0297bd50198
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/general-c/vreinterpretq_1.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> +/* { dg-add-options arm_v8_1m_mve_fp } */
> +
> +#include <arm_mve.h>
> +
> +void
> +f1 (int8x16_t s8, uint16x8_t u16, float32x4_t f32)
> +{
> +  __arm_vreinterpretq_s8 (); /* { dg-error {too few arguments to function '__arm_vreinterpretq_s8'} } */
> +  __arm_vreinterpretq_s8 (s8, s8); /* { dg-error {too many arguments to function '__arm_vreinterpretq_s8'} } */
> +  __arm_vreinterpretq_s8 (0); /* { dg-error {passing 'int' to argument 1 of '__arm_vreinterpretq_s8', which expects an MVE vector type} } */
> +  __arm_vreinterpretq_s8 (s8); /* { dg-error {'__arm_vreinterpretq_s8' has no form that takes 'int8x16_t' arguments} } */
> +  __arm_vreinterpretq_s8 (u16);
> +  __arm_vreinterpretq_u16 (); /* { dg-error {too few arguments to function '__arm_vreinterpretq_u16'} } */
> +  __arm_vreinterpretq_u16 (u16, u16); /* { dg-error {too many arguments to function '__arm_vreinterpretq_u16'} } */
> +  __arm_vreinterpretq_u16 (0); /* { dg-error {passing 'int' to argument 1 of '__arm_vreinterpretq_u16', which expects an MVE vector type} } */
> +  __arm_vreinterpretq_u16 (u16); /* { dg-error {'__arm_vreinterpretq_u16' has no form that takes 'uint16x8_t' arguments} } */
> +  __arm_vreinterpretq_u16 (f32);
> +  __arm_vreinterpretq_f32 (); /* { dg-error {too few arguments to function '__arm_vreinterpretq_f32'} } */
> +  __arm_vreinterpretq_f32 (f32, f32); /* { dg-error {too many arguments to function '__arm_vreinterpretq_f32'} } */
> +  __arm_vreinterpretq_f32 (0); /* { dg-error {passing 'int' to argument 1 of '__arm_vreinterpretq_f32', which expects an MVE vector type} } */
> +  __arm_vreinterpretq_f32 (f32); /* { dg-error {'__arm_vreinterpretq_f32' has no form that takes 'float32x4_t' arguments} } */
> +  __arm_vreinterpretq_f32 (s8);
> +}
> --
> 2.34.1



* Re: [PATCH 00/22] arm: New framework for MVE intrinsics
  2023-05-02 15:04   ` Christophe Lyon
@ 2023-05-03 15:01     ` Christophe Lyon
  0 siblings, 0 replies; 55+ messages in thread
From: Christophe Lyon @ 2023-05-03 15:01 UTC (permalink / raw)
  To: Kyrylo Tkachov, gcc-patches, Richard Earnshaw, Richard Sandiford



On 5/2/23 17:04, Christophe Lyon via Gcc-patches wrote:
> 
> 
> On 5/2/23 11:18, Kyrylo Tkachov wrote:
>> Hi Christophe,
>>
>>> -----Original Message-----
>>> From: Christophe Lyon <christophe.lyon@arm.com>
>>> Sent: Tuesday, April 18, 2023 2:46 PM
>>> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
>>> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
>>> <Richard.Sandiford@arm.com>
>>> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
>>> Subject: [PATCH 00/22] arm: New framework for MVE intrinsics
>>>
>>> Hi,
>>>
>>> This is the beginning of a long patch series to change the way Arm MVE
>>> intrinsics are implemented. The goal is to get rid of arm_mve.h, which
>>> takes a long time to parse and compile.
>>>
>>
>> Thanks for doing this. It is a significant improvement to the MVE 
>> intrinsics and should address some of the biggest maintainability and 
>> scalability issues we have in that area.
>> I'll be going through the patches one-by-one (I've looked at these 
>> offline already before), but the approach looks good to me at a high 
>> level.
>>
>> My hope is that we'll move all the intrinsics, including the Neon ones 
>> to use this framework in the future, but getting the framework in 
>> place first is a good major first step in that direction.
>>
> 
> Indeed. Ideally we'd probably want to make this framework more generic 
> so that it supports aarch64 SVE, arm MVE and Neon, but that can be done 
> later. I tried to highlight the differences I noticed compared to SVE, 
> so that it helps us think about what needs to be specialized for different 
> targets, as opposed to what is already generic enough.
> 
> Thanks,
> 
> Christophe
> 

Thank you Kyrill for the quick review.
I've pushed the series with the minor changes you requested.

I am going to prepare the next batch :-)

Thanks,

Christophe

>> Thanks,
>> Kyrill
>>
>>> Roughly speaking, it's about using a framework very similar to what is
>>> implemented for AArch64/SVE intrinsics. I haven't converted all the
>>> intrinsics yet, but I think it would be good to start the conversion
>>> when stage-1 reopens.
>>>
>>> * Factorizing names
>>> One of the main implementation differences I noticed between SVE and
>>> MVE is that mve.md provides only full builtin names at the moment, and
>>> makes almost no use of "parameterized names"
>>> (https://gcc.gnu.org/onlinedocs/gccint/Parameterized-Names.html#Parameterized-Names).
>>>
>>> Without this, we'd need the builtin expander to use a large
>>> switch/case of the form:
>>>
>>> switch (code)
>>> case VADDQ_S: insn_code = code_for_mve_vaddq_s (...)
>>> case VADDQ_U: insn_code = code_for_mve_vaddq_u (...)
>>> case VSUBQ_S: insn_code = code_for_mve_vsubq_s (...)
>>> case VSUBQ_U: insn_code = code_for_mve_vsubq_u (...)
>>> ....
>>>
>>> so part of the work (which I called "factorize" in the commit
>>> messages) is about replacing
>>>
>>> (define_insn "mve_vaddq_n_<supf><mode>"
>>> with
>>> (define_insn "@mve_<mve_insn>q_n_<supf><mode>"
>>> with the help of a new iterator (mve_insn).
>>>
>>> Doing so makes it more obvious that some patterns are identical,
>>> except for the instruction name. I took this opportunity to merge
>>> them, so for instance I have a patch which merges add, sub and mul
>>> patterns.  Although not strictly necessary for the MVE intrinsics
>>> restructuring work, this is a good opportunity to reduce such code
>>> duplication (I did notice a few bugs during that process, which led me
>>> to post a few small patches in the past months).  Note that identical
>>> patterns will probably remain after the series, they can be merged
>>> later if we want.
>>>
>>> This factorization also implies the introduction of new iterators, but
>>> also means that several existing ones become useless. These patches do
>>> not remove them because it's a bit painful to reorder patches which
>>> remove lines at some "random" places, leading to merge conflicts. It's
>>> much simpler to write a big cleanup patch at the end of the serie to
>>> remove all such useless iterators at once.
>>>
>>> * Intrinsic re-implementation
>>> After intrinsic names have been factorized, the actual
>>> re-implementation patch is small:
>>> - add 1 line in each of arm-mve-builtins-base.{cc,def,h} describing
>>>    the intrinsic shape/signature, types and predicates involved,
>>>    RTX/unspec codes
>>> - remove the intrinsic definitions from arm_mve.h
>>>
>>> The full series of ~140 patches is organized like this:
>>> - patches 1 and 2 introduce the new framework
>>> - new implementation of vreinterpretq
>>> - new implementation of vuninitialized
>>> - patch groups of varying size, consisting in:
>>>    - add a new "shape" if needed (e.g. unary, binary, ternary, ....)
>>>    - add framework support functions if needed
>>>    - factorize a set of intrinsics (at minimum, just make use of
>>>      parameterized-names)
>>>    - actual re-implementation of the intrinsics
>>>
>>> I kept patches small so the incremental progress is easy to follow and
>>> check.  I'll submit the patches in small groups, this first one will
>>> make sure we agree on the implementation.
>>>
>>> Tested on arm-eabi with -mthumb/-mfloat-abi=hard/-march=armv8.1-m.main+mve.
>>>
>>> To help reviewers, I suggest to compare arm-mve-builtins.cc with
>>> aarch64-sve-builtins.cc.
>>>
>>> Christophe Lyon (22):
>>>    arm: move builtin function codes into general numberspace
>>>    arm: [MVE intrinsics] Add new framework
>>>    arm: [MVE intrinsics] Rework vreinterpretq
>>>    arm: [MVE intrinsics] Rework vuninitialized
>>>    arm: [MVE intrinsics] add binary_opt_n shape
>>>    arm: [MVE intrinsics] add unspec_based_mve_function_exact_insn
>>>    arm: [MVE intrinsics] factorize vadd vsubq vmulq
>>>    arm: [MVE intrinsics] rework vaddq vmulq vsubq
>>>    arm: [MVE intrinsics] add binary shape
>>>    arm: [MVE intrinsics] factorize vandq veorq vorrq vbicq
>>>    arm: [MVE intrinsics] rework vandq veorq
>>>    arm: [MVE intrinsics] add binary_orrq shape
>>>    arm: [MVE intrinsics] rework vorrq
>>>    arm: [MVE intrinsics] add unspec_mve_function_exact_insn
>>>    arm: [MVE intrinsics] add create shape
>>>    arm: [MVE intrinsics] factorize vcreateq
>>>    arm: [MVE intrinsics] rework vcreateq
>>>    arm: [MVE intrinsics] factorize several binary_m operations
>>>    arm: [MVE intrinsics] factorize several binary _n operations
>>>    arm: [MVE intrinsics] factorize several binary _m_n operations
>>>    arm: [MVE intrinsics] factorize several binary operations
>>>    arm: [MVE intrinsics] rework vhaddq vhsubq vmulhq vqaddq vqsubq
>>>      vqdmulhq vrhaddq vrmulhq
>>>
>>>   gcc/config.gcc                                |    2 +-
>>>   gcc/config/arm/arm-builtins.cc                |  237 +-
>>>   gcc/config/arm/arm-builtins.h                 |    1 +
>>>   gcc/config/arm/arm-c.cc                       |   42 +-
>>>   gcc/config/arm/arm-mve-builtins-base.cc       |  163 +
>>>   gcc/config/arm/arm-mve-builtins-base.def      |   50 +
>>>   gcc/config/arm/arm-mve-builtins-base.h        |   47 +
>>>   gcc/config/arm/arm-mve-builtins-functions.h   |  387 +
>>>   gcc/config/arm/arm-mve-builtins-shapes.cc     |  529 ++
>>>   gcc/config/arm/arm-mve-builtins-shapes.h      |   47 +
>>>   gcc/config/arm/arm-mve-builtins.cc            | 2013 ++++-
>>>   gcc/config/arm/arm-mve-builtins.def           |   40 +-
>>>   gcc/config/arm/arm-mve-builtins.h             |  672 +-
>>>   gcc/config/arm/arm-protos.h                   |   24 +
>>>   gcc/config/arm/arm.cc                         |   27 +
>>>   gcc/config/arm/arm_mve.h                      | 7581 +----------------
>>>   gcc/config/arm/arm_mve_builtins.def           |    6 -
>>>   gcc/config/arm/arm_mve_types.h                | 1430 ----
>>>   gcc/config/arm/iterators.md                   |  240 +-
>>>   gcc/config/arm/mve.md                         | 1747 +---
>>>   gcc/config/arm/predicates.md                  |    4 +
>>>   gcc/config/arm/t-arm                          |   32 +-
>>>   gcc/config/arm/unspecs.md                     |    1 +
>>>   gcc/config/arm/vec-common.md                  |    8 +-
>>>   gcc/testsuite/g++.target/arm/mve.exp          |    8 +-
>>>   .../arm/mve/general-c++/nomve_fp_1.c          |   15 +
>>>   .../arm/mve/general-c++/vreinterpretq_1.C     |   25 +
>>>   .../gcc.target/arm/mve/general-c/nomve_fp_1.c |   15 +
>>>   .../arm/mve/general-c/vreinterpretq_1.c       |   25 +
>>>   29 files changed, 4926 insertions(+), 10492 deletions(-)
>>>   create mode 100644 gcc/config/arm/arm-mve-builtins-base.cc
>>>   create mode 100644 gcc/config/arm/arm-mve-builtins-base.def
>>>   create mode 100644 gcc/config/arm/arm-mve-builtins-base.h
>>>   create mode 100644 gcc/config/arm/arm-mve-builtins-functions.h
>>>   create mode 100644 gcc/config/arm/arm-mve-builtins-shapes.cc
>>>   create mode 100644 gcc/config/arm/arm-mve-builtins-shapes.h
>>>   create mode 100644 gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c
>>>   create mode 100644 gcc/testsuite/g++.target/arm/mve/general-c++/vreinterpretq_1.C
>>>   create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/nomve_fp_1.c
>>>   create mode 100644 gcc/testsuite/gcc.target/arm/mve/general-c/vreinterpretq_1.c
>>>
>>> -- 
>>> 2.34.1
>>


end of thread

Thread overview: 55+ messages
2023-04-18 13:45 [PATCH 00/22] arm: New framework for MVE intrinsics Christophe Lyon
2023-04-18 13:45 ` [PATCH 01/22] arm: move builtin function codes into general numberspace Christophe Lyon
2023-05-02  9:24   ` Kyrylo Tkachov
2023-04-18 13:45 ` [PATCH 02/22] arm: [MVE intrinsics] Add new framework Christophe Lyon
2023-05-02 10:17   ` Kyrylo Tkachov
2023-04-18 13:45 ` [PATCH 03/22] arm: [MVE intrinsics] Rework vreinterpretq Christophe Lyon
2023-05-02 10:26   ` Kyrylo Tkachov
2023-05-02 14:05     ` Christophe Lyon
2023-05-02 15:28       ` Kyrylo Tkachov
2023-05-02 15:49         ` Christophe Lyon
2023-05-03 14:37           ` [PATCH v2 " Christophe Lyon
2023-05-03 14:52             ` Kyrylo Tkachov
2023-04-18 13:45 ` [PATCH 04/22] arm: [MVE intrinsics] Rework vuninitialized Christophe Lyon
2023-05-02 16:13   ` Kyrylo Tkachov
2023-04-18 13:45 ` [PATCH 05/22] arm: [MVE intrinsics] add binary_opt_n shape Christophe Lyon
2023-05-02 16:16   ` Kyrylo Tkachov
2023-04-18 13:45 ` [PATCH 06/22] arm: [MVE intrinsics] add unspec_based_mve_function_exact_insn Christophe Lyon
2023-05-02 16:17   ` Kyrylo Tkachov
2023-04-18 13:45 ` [PATCH 07/22] arm: [MVE intrinsics] factorize vadd vsubq vmulq Christophe Lyon
2023-05-02 16:19   ` Kyrylo Tkachov
2023-05-02 16:22     ` Christophe Lyon
2023-04-18 13:45 ` [PATCH 08/22] arm: [MVE intrinsics] rework vaddq vmulq vsubq Christophe Lyon
2023-05-02 16:31   ` Kyrylo Tkachov
2023-05-03  9:06     ` Christophe Lyon
2023-04-18 13:45 ` [PATCH 09/22] arm: [MVE intrinsics] add binary shape Christophe Lyon
2023-05-02 16:32   ` Kyrylo Tkachov
2023-04-18 13:45 ` [PATCH 10/22] arm: [MVE intrinsics] factorize vandq veorq vorrq vbicq Christophe Lyon
2023-05-02 16:36   ` Kyrylo Tkachov
2023-04-18 13:45 ` [PATCH 11/22] arm: [MVE intrinsics] rework vandq veorq Christophe Lyon
2023-05-02 16:37   ` Kyrylo Tkachov
2023-04-18 13:45 ` [PATCH 12/22] arm: [MVE intrinsics] add binary_orrq shape Christophe Lyon
2023-05-02 16:39   ` Kyrylo Tkachov
2023-04-18 13:45 ` [PATCH 13/22] arm: [MVE intrinsics] rework vorrq Christophe Lyon
2023-05-02 16:41   ` Kyrylo Tkachov
2023-04-18 13:46 ` [PATCH 14/22] arm: [MVE intrinsics] add unspec_mve_function_exact_insn Christophe Lyon
2023-05-03  8:40   ` Kyrylo Tkachov
2023-04-18 13:46 ` [PATCH 15/22] arm: [MVE intrinsics] add create shape Christophe Lyon
2023-05-03  8:40   ` Kyrylo Tkachov
2023-04-18 13:46 ` [PATCH 16/22] arm: [MVE intrinsics] factorize vcreateq Christophe Lyon
2023-05-03  8:42   ` Kyrylo Tkachov
2023-04-18 13:46 ` [PATCH 17/22] arm: [MVE intrinsics] rework vcreateq Christophe Lyon
2023-05-03  8:44   ` Kyrylo Tkachov
2023-04-18 13:46 ` [PATCH 18/22] arm: [MVE intrinsics] factorize several binary_m operations Christophe Lyon
2023-05-03  8:46   ` Kyrylo Tkachov
2023-04-18 13:46 ` [PATCH 19/22] arm: [MVE intrinsics] factorize several binary _n operations Christophe Lyon
2023-05-03  8:47   ` Kyrylo Tkachov
2023-04-18 13:46 ` [PATCH 20/22] arm: [MVE intrinsics] factorize several binary _m_n operations Christophe Lyon
2023-05-03  8:48   ` Kyrylo Tkachov
2023-04-18 13:46 ` [PATCH 21/22] arm: [MVE intrinsics] factorize several binary operations Christophe Lyon
2023-05-03  8:49   ` Kyrylo Tkachov
2023-04-18 13:46 ` [PATCH 22/22] arm: [MVE intrinsics] rework vhaddq vhsubq vmulhq vqaddq vqsubq vqdmulhq vrhaddq vrmulhq Christophe Lyon
2023-05-03  8:51   ` Kyrylo Tkachov
2023-05-02  9:18 ` [PATCH 00/22] arm: New framework for MVE intrinsics Kyrylo Tkachov
2023-05-02 15:04   ` Christophe Lyon
2023-05-03 15:01     ` Christophe Lyon
