public inbox for gcc-patches@gcc.gnu.org
From: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
To: Christophe Lyon <Christophe.Lyon@arm.com>,
	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	Richard Earnshaw <Richard.Earnshaw@arm.com>,
	Richard Sandiford <Richard.Sandiford@arm.com>
Cc: Christophe Lyon <Christophe.Lyon@arm.com>
Subject: RE: [PATCH 02/22] arm: [MVE intrinsics] Add new framework
Date: Tue, 2 May 2023 10:17:04 +0000	[thread overview]
Message-ID: <PAXPR08MB6926816B6C8756D491C51F63936F9@PAXPR08MB6926.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <20230418134608.244751-3-christophe.lyon@arm.com>



> -----Original Message-----
> From: Christophe Lyon <christophe.lyon@arm.com>
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Cc: Christophe Lyon <Christophe.Lyon@arm.com>
> Subject: [PATCH 02/22] arm: [MVE intrinsics] Add new framework
> 
> This patch introduces the new MVE intrinsics framework, heavily
> inspired by the SVE one in the aarch64 port.
> 
> Like the MVE intrinsic types implementation, the intrinsics framework
> defines functions via a new pragma in arm_mve.h. A boolean parameter
> is used to pass true when __ARM_MVE_PRESERVE_USER_NAMESPACE is
> defined, and false when it is not, allowing for non-prefixed intrinsic
> functions to be conditionally defined.
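For readers following along, the header side of this presumably looks something like the sketch below (a hypothetical reconstruction for illustration; the actual arm_mve.h change is in the patch itself):

```c
/* Hypothetical sketch of how arm_mve.h drives the new pragma; the real
   header text is in the patch.  The boolean tells the compiler whether
   the non-prefixed intrinsic names should be suppressed.  */
#ifdef __ARM_MVE_PRESERVE_USER_NAMESPACE
#pragma GCC arm "arm_mve.h" true
#else
#pragma GCC arm "arm_mve.h" false
#endif
```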
> 
> Future patches will build on this framework by adding new intrinsic
> functions and adding the features needed to support them.
> 
> Differences compared to the aarch64/SVE port include:
> - when present, the predicate argument is the last one with MVE (the
>   first one with SVE)
> - when using merging predicates ("_m" suffix), the "inactive" argument
>   (if any) is inserted in the first position
> - when using merging predicates ("_m" suffix), some functions do not
>   have the "inactive" argument, so we maintain an exception-list
> - MVE intrinsics dealing with floating-point require the FP extension,
>   while SVE may support different extensions
> - regarding global state, MVE does not have any prefetch intrinsic, so
>   we do not need a flag for this
> - intrinsic names can be prefixed with "__arm", depending on whether
>   preserve_user_namespace is true or false
> - parse_signature: the maximum number of arguments is now a parameter;
>   this helps detect an overflow with a new assert.
> - suffixes and overloading can be controlled using
>   explicit_mode_suffix_p and skip_overload_p in addition to
>   explicit_type_suffix_p

Ok.
Thanks,
Kyrill

> 
> At this implementation stage, there are some limitations compared
> to aarch64/SVE, which are removed later in the series:
> - "offset" mode is not supported yet
> - gimple folding is not implemented
> 
> 2022-09-08  Murray Steele  <murray.steele@arm.com>
> 	    Christophe Lyon  <christophe.lyon@arm.com>
> 
> gcc/ChangeLog:
> 
> 	* config.gcc: Add arm-mve-builtins-base.o and
> 	arm-mve-builtins-shapes.o to extra_objs.
> 	* config/arm/arm-builtins.cc (arm_builtin_decl): Handle MVE builtin
> 	numberspace.
> 	(arm_expand_builtin): Likewise.
> 	(arm_check_builtin_call): Likewise.
> 	(arm_describe_resolver): Likewise.
> 	* config/arm/arm-builtins.h (enum resolver_ident): Add
> 	arm_mve_resolver.
> 	* config/arm/arm-c.cc (arm_pragma_arm): Handle new pragma.
> 	(arm_resolve_overloaded_builtin): Handle MVE builtins.
> 	(arm_register_target_pragmas): Register arm_check_builtin_call.
> 	* config/arm/arm-mve-builtins.cc (class registered_function): New
> 	class.
> 	(struct registered_function_hasher): New struct.
> 	(pred_suffixes): New table.
> 	(mode_suffixes): New table.
> 	(type_suffix_info): New table.
> 	(TYPES_float16): New.
> 	(TYPES_all_float): New.
> 	(TYPES_integer_8): New.
> 	(TYPES_integer_8_16): New.
> 	(TYPES_integer_16_32): New.
> 	(TYPES_integer_32): New.
> 	(TYPES_signed_16_32): New.
> 	(TYPES_signed_32): New.
> 	(TYPES_all_signed): New.
> 	(TYPES_all_unsigned): New.
> 	(TYPES_all_integer): New.
> 	(TYPES_all_integer_with_64): New.
> 	(DEF_VECTOR_TYPE): New.
> 	(DEF_DOUBLE_TYPE): New.
> 	(DEF_MVE_TYPES_ARRAY): New.
> 	(all_integer): New.
> 	(all_integer_with_64): New.
> 	(float16): New.
> 	(all_float): New.
> 	(all_signed): New.
> 	(all_unsigned): New.
> 	(integer_8): New.
> 	(integer_8_16): New.
> 	(integer_16_32): New.
> 	(integer_32): New.
> 	(signed_16_32): New.
> 	(signed_32): New.
> 	(register_vector_type): Use void_type_node for mve.fp-only types
> 	when mve.fp is not enabled.
> 	(register_builtin_tuple_types): Likewise.
> 	(handle_arm_mve_h): New function.
> 	(matches_type_p): Likewise.
> 	(report_out_of_range): Likewise.
> 	(report_not_enum): Likewise.
> 	(report_missing_float): Likewise.
> 	(report_non_ice): Likewise.
> 	(check_requires_float): Likewise.
> 	(function_instance::hash): Likewise.
> 	(function_instance::call_properties): Likewise.
> 	(function_instance::reads_global_state_p): Likewise.
> 	(function_instance::modifies_global_state_p): Likewise.
> 	(function_instance::could_trap_p): Likewise.
> 	(function_instance::has_inactive_argument): Likewise.
> 	(registered_function_hasher::hash): Likewise.
> 	(registered_function_hasher::equal): Likewise.
> 	(function_builder::function_builder): Likewise.
> 	(function_builder::~function_builder): Likewise.
> 	(function_builder::append_name): Likewise.
> 	(function_builder::finish_name): Likewise.
> 	(function_builder::get_name): Likewise.
> 	(add_attribute): Likewise.
> 	(function_builder::get_attributes): Likewise.
> 	(function_builder::add_function): Likewise.
> 	(function_builder::add_unique_function): Likewise.
> 	(function_builder::add_overloaded_function): Likewise.
> 	(function_builder::add_overloaded_functions): Likewise.
> 	(function_builder::register_function_group): Likewise.
> 	(function_call_info::function_call_info): Likewise.
> 	(function_resolver::function_resolver): Likewise.
> 	(function_resolver::get_vector_type): Likewise.
> 	(function_resolver::get_scalar_type_name): Likewise.
> 	(function_resolver::get_argument_type): Likewise.
> 	(function_resolver::scalar_argument_p): Likewise.
> 	(function_resolver::report_no_such_form): Likewise.
> 	(function_resolver::lookup_form): Likewise.
> 	(function_resolver::resolve_to): Likewise.
> 	(function_resolver::infer_vector_or_tuple_type): Likewise.
> 	(function_resolver::infer_vector_type): Likewise.
> 	(function_resolver::require_vector_or_scalar_type): Likewise.
> 	(function_resolver::require_vector_type): Likewise.
> 	(function_resolver::require_matching_vector_type): Likewise.
> 	(function_resolver::require_derived_vector_type): Likewise.
> 	(function_resolver::require_derived_scalar_type): Likewise.
> 	(function_resolver::require_integer_immediate): Likewise.
> 	(function_resolver::require_scalar_type): Likewise.
> 	(function_resolver::check_num_arguments): Likewise.
> 	(function_resolver::check_gp_argument): Likewise.
> 	(function_resolver::finish_opt_n_resolution): Likewise.
> 	(function_resolver::resolve_unary): Likewise.
> 	(function_resolver::resolve_unary_n): Likewise.
> 	(function_resolver::resolve_uniform): Likewise.
> 	(function_resolver::resolve_uniform_opt_n): Likewise.
> 	(function_resolver::resolve): Likewise.
> 	(function_checker::function_checker): Likewise.
> 	(function_checker::argument_exists_p): Likewise.
> 	(function_checker::require_immediate): Likewise.
> 	(function_checker::require_immediate_enum): Likewise.
> 	(function_checker::require_immediate_range): Likewise.
> 	(function_checker::check): Likewise.
> 	(gimple_folder::gimple_folder): Likewise.
> 	(gimple_folder::fold): Likewise.
> 	(function_expander::function_expander): Likewise.
> 	(function_expander::direct_optab_handler): Likewise.
> 	(function_expander::get_fallback_value): Likewise.
> 	(function_expander::get_reg_target): Likewise.
> 	(function_expander::add_output_operand): Likewise.
> 	(function_expander::add_input_operand): Likewise.
> 	(function_expander::add_integer_operand): Likewise.
> 	(function_expander::generate_insn): Likewise.
> 	(function_expander::use_exact_insn): Likewise.
> 	(function_expander::use_unpred_insn): Likewise.
> 	(function_expander::use_pred_x_insn): Likewise.
> 	(function_expander::use_cond_insn): Likewise.
> 	(function_expander::map_to_rtx_codes): Likewise.
> 	(function_expander::expand): Likewise.
> 	(resolve_overloaded_builtin): Likewise.
> 	(check_builtin_call): Likewise.
> 	(gimple_fold_builtin): Likewise.
> 	(expand_builtin): Likewise.
> 	(gt_ggc_mx): Likewise.
> 	(gt_pch_nx): Likewise.
> 	(gt_pch_nx): Likewise.
> 	* config/arm/arm-mve-builtins.def (s8): Define new type suffix.
> 	(s16): Likewise.
> 	(s32): Likewise.
> 	(s64): Likewise.
> 	(u8): Likewise.
> 	(u16): Likewise.
> 	(u32): Likewise.
> 	(u64): Likewise.
> 	(f16): Likewise.
> 	(f32): Likewise.
> 	(n): New mode.
> 	(offset): New mode.
> 	* config/arm/arm-mve-builtins.h (MAX_TUPLE_SIZE): New constant.
> 	(CP_READ_FPCR): Likewise.
> 	(CP_RAISE_FP_EXCEPTIONS): Likewise.
> 	(CP_READ_MEMORY): Likewise.
> 	(CP_WRITE_MEMORY): Likewise.
> 	(enum units_index): New enum.
> 	(enum predication_index): New.
> 	(enum type_class_index): New.
> 	(enum mode_suffix_index): New enum.
> 	(enum type_suffix_index): New.
> 	(struct mode_suffix_info): New struct.
> 	(struct type_suffix_info): New.
> 	(struct function_group_info): Likewise.
> 	(class function_instance): Likewise.
> 	(class registered_function): Likewise.
> 	(class function_builder): Likewise.
> 	(class function_call_info): Likewise.
> 	(class function_resolver): Likewise.
> 	(class function_checker): Likewise.
> 	(class gimple_folder): Likewise.
> 	(class function_expander): Likewise.
> 	(get_mve_pred16_t): Likewise.
> 	(find_mode_suffix): New function.
> 	(class function_base): Likewise.
> 	(class function_shape): Likewise.
> 	(function_instance::operator==): New function.
> 	(function_instance::operator!=): Likewise.
> 	(function_instance::vectors_per_tuple): Likewise.
> 	(function_instance::mode_suffix): Likewise.
> 	(function_instance::type_suffix): Likewise.
> 	(function_instance::scalar_type): Likewise.
> 	(function_instance::vector_type): Likewise.
> 	(function_instance::tuple_type): Likewise.
> 	(function_instance::vector_mode): Likewise.
> 	(function_call_info::function_returns_void_p): Likewise.
> 	(function_base::call_properties): Likewise.
> 	* config/arm/arm-protos.h (enum arm_builtin_class): Add
> 	ARM_BUILTIN_MVE.
> 	(handle_arm_mve_h): New.
> 	(resolve_overloaded_builtin): New.
> 	(check_builtin_call): New.
> 	(gimple_fold_builtin): New.
> 	(expand_builtin): New.
> 	* config/arm/arm.cc (TARGET_GIMPLE_FOLD_BUILTIN): Define as
> 	arm_gimple_fold_builtin.
> 	(arm_gimple_fold_builtin): New function.
> 	* config/arm/arm_mve.h: Use new arm_mve.h pragma.
> 	* config/arm/predicates.md (arm_any_register_operand): New
> 	predicate.
> 	* config/arm/t-arm (arm-mve-builtins.o): Add includes.
> 	(arm-mve-builtins-shapes.o): New target.
> 	(arm-mve-builtins-base.o): New target.
> 	* config/arm/arm-mve-builtins-base.cc: New file.
> 	* config/arm/arm-mve-builtins-base.def: New file.
> 	* config/arm/arm-mve-builtins-base.h: New file.
> 	* config/arm/arm-mve-builtins-functions.h: New file.
> 	* config/arm/arm-mve-builtins-shapes.cc: New file.
> 	* config/arm/arm-mve-builtins-shapes.h: New file.
> 
> Co-authored-by: Christophe Lyon  <christophe.lyon@arm.com>
> ---
>  gcc/config.gcc                              |    2 +-
>  gcc/config/arm/arm-builtins.cc              |   15 +-
>  gcc/config/arm/arm-builtins.h               |    1 +
>  gcc/config/arm/arm-c.cc                     |   42 +-
>  gcc/config/arm/arm-mve-builtins-base.cc     |   45 +
>  gcc/config/arm/arm-mve-builtins-base.def    |   24 +
>  gcc/config/arm/arm-mve-builtins-base.h      |   29 +
>  gcc/config/arm/arm-mve-builtins-functions.h |   50 +
>  gcc/config/arm/arm-mve-builtins-shapes.cc   |  343 ++++
>  gcc/config/arm/arm-mve-builtins-shapes.h    |   30 +
>  gcc/config/arm/arm-mve-builtins.cc          | 1950 ++++++++++++++++++-
>  gcc/config/arm/arm-mve-builtins.def         |   40 +-
>  gcc/config/arm/arm-mve-builtins.h           |  669 ++++++-
>  gcc/config/arm/arm-protos.h                 |   10 +-
>  gcc/config/arm/arm.cc                       |   27 +
>  gcc/config/arm/arm_mve.h                    |    6 +
>  gcc/config/arm/predicates.md                |    4 +
>  gcc/config/arm/t-arm                        |   32 +-
>  18 files changed, 3292 insertions(+), 27 deletions(-)
>  create mode 100644 gcc/config/arm/arm-mve-builtins-base.cc
>  create mode 100644 gcc/config/arm/arm-mve-builtins-base.def
>  create mode 100644 gcc/config/arm/arm-mve-builtins-base.h
>  create mode 100644 gcc/config/arm/arm-mve-builtins-functions.h
>  create mode 100644 gcc/config/arm/arm-mve-builtins-shapes.cc
>  create mode 100644 gcc/config/arm/arm-mve-builtins-shapes.h
> 
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 6fd1594480a..5d49f5890ab 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -362,7 +362,7 @@ arc*-*-*)
>  	;;
>  arm*-*-*)
>  	cpu_type=arm
> -	extra_objs="arm-builtins.o arm-mve-builtins.o aarch-common.o aarch-bti-insert.o"
> +	extra_objs="arm-builtins.o arm-mve-builtins.o arm-mve-builtins-shapes.o arm-mve-builtins-base.o aarch-common.o aarch-bti-insert.o"
> 	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h arm_bf16.h arm_mve_types.h arm_mve.h arm_cde.h"
>  	target_type_format_char='%'
>  	c_target_objs="arm-c.o"
> diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
> index adcb50d2185..d0c57409b4c 100644
> --- a/gcc/config/arm/arm-builtins.cc
> +++ b/gcc/config/arm/arm-builtins.cc
> @@ -2712,6 +2712,7 @@ arm_general_builtin_decl (unsigned code)
>    return arm_builtin_decls[code];
>  }
> 
> +/* Implement TARGET_BUILTIN_DECL.  */
>  /* Return the ARM builtin for CODE.  */
>  tree
>  arm_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
> @@ -2721,6 +2722,8 @@ arm_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
>      {
>      case ARM_BUILTIN_GENERAL:
>        return arm_general_builtin_decl (subcode);
> +    case ARM_BUILTIN_MVE:
> +      return error_mark_node;
>      default:
>        gcc_unreachable ();
>      }
> @@ -4087,6 +4090,8 @@ arm_expand_builtin (tree exp,
>      {
>      case ARM_BUILTIN_GENERAL:
>        return arm_general_expand_builtin (subcode, exp, target, ignore);
> +    case ARM_BUILTIN_MVE:
> +      return arm_mve::expand_builtin (subcode, exp, target);
>      default:
>        gcc_unreachable ();
>      }
> @@ -4188,8 +4193,9 @@ arm_general_check_builtin_call (unsigned int code)
> 
>  /* Implement TARGET_CHECK_BUILTIN_CALL.  */
>  bool
> -arm_check_builtin_call (location_t, vec<location_t>, tree fndecl, tree,
> -			unsigned int, tree *)
> +arm_check_builtin_call (location_t loc, vec<location_t> arg_loc,
> +			tree fndecl, tree orig_fndecl,
> +			unsigned int nargs, tree *args)
>  {
>    unsigned int code = DECL_MD_FUNCTION_CODE (fndecl);
>    unsigned int subcode = code >> ARM_BUILTIN_SHIFT;
> @@ -4197,6 +4203,9 @@ arm_check_builtin_call (location_t, vec<location_t>, tree fndecl, tree,
>      {
>      case ARM_BUILTIN_GENERAL:
>        return arm_general_check_builtin_call (subcode);
> +    case ARM_BUILTIN_MVE:
> +      return arm_mve::check_builtin_call (loc, arg_loc, subcode,
> +					  orig_fndecl, nargs, args);
>      default:
>        gcc_unreachable ();
>      }
> @@ -4215,6 +4224,8 @@ arm_describe_resolver (tree fndecl)
>  	&& subcode < ARM_BUILTIN_MVE_BASE)
>  	return arm_cde_resolver;
>        return arm_no_resolver;
> +    case ARM_BUILTIN_MVE:
> +      return arm_mve_resolver;
>      default:
>        gcc_unreachable ();
>      }
> diff --git a/gcc/config/arm/arm-builtins.h b/gcc/config/arm/arm-builtins.h
> index 8c94b6bc40b..494dcd09411 100644
> --- a/gcc/config/arm/arm-builtins.h
> +++ b/gcc/config/arm/arm-builtins.h
> @@ -27,6 +27,7 @@
> 
>  enum resolver_ident {
>      arm_cde_resolver,
> +    arm_mve_resolver,
>      arm_no_resolver
>  };
>  enum resolver_ident arm_describe_resolver (tree);
> diff --git a/gcc/config/arm/arm-c.cc b/gcc/config/arm/arm-c.cc
> index 59c0d8ce747..d3d93ceba00 100644
> --- a/gcc/config/arm/arm-c.cc
> +++ b/gcc/config/arm/arm-c.cc
> @@ -144,20 +144,44 @@ arm_pragma_arm (cpp_reader *)
>    const char *name = TREE_STRING_POINTER (x);
>    if (strcmp (name, "arm_mve_types.h") == 0)
>      arm_mve::handle_arm_mve_types_h ();
> +  else if (strcmp (name, "arm_mve.h") == 0)
> +    {
> +      if (pragma_lex (&x) == CPP_NAME)
> +	{
> +	  if (strcmp (IDENTIFIER_POINTER (x), "true") == 0)
> +	    arm_mve::handle_arm_mve_h (true);
> +	  else if (strcmp (IDENTIFIER_POINTER (x), "false") == 0)
> +	    arm_mve::handle_arm_mve_h (false);
> +	  else
> +	    error ("%<#pragma GCC arm \"arm_mve.h\"%> requires a boolean parameter");
> +	}
> +    }
>    else
>      error ("unknown %<#pragma GCC arm%> option %qs", name);
>  }
> 
> -/* Implement TARGET_RESOLVE_OVERLOADED_BUILTIN.  This is currently only
> -   used for the MVE related builtins for the CDE extension.
> -   Here we ensure the type of arguments is such that the size is correct, and
> -   then return a tree that describes the same function call but with the
> -   relevant types cast as necessary.  */
> +/* Implement TARGET_RESOLVE_OVERLOADED_BUILTIN.  */
>  tree
> -arm_resolve_overloaded_builtin (location_t loc, tree fndecl, void *arglist)
> +arm_resolve_overloaded_builtin (location_t loc, tree fndecl,
> +				void *uncast_arglist)
>  {
> -  if (arm_describe_resolver (fndecl) == arm_cde_resolver)
> -    return arm_resolve_cde_builtin (loc, fndecl, arglist);
> +  enum resolver_ident resolver = arm_describe_resolver (fndecl);
> +  if (resolver == arm_cde_resolver)
> +    return arm_resolve_cde_builtin (loc, fndecl, uncast_arglist);
> +  if (resolver == arm_mve_resolver)
> +    {
> +      vec<tree, va_gc> empty = {};
> +      vec<tree, va_gc> *arglist = (uncast_arglist
> +				   ? (vec<tree, va_gc> *) uncast_arglist
> +				   : &empty);
> +      unsigned int code = DECL_MD_FUNCTION_CODE (fndecl);
> +      unsigned int subcode = code >> ARM_BUILTIN_SHIFT;
> +      tree new_fndecl = arm_mve::resolve_overloaded_builtin (loc, subcode, arglist);
> +      if (new_fndecl == NULL_TREE || new_fndecl == error_mark_node)
> +	return new_fndecl;
> +      return build_function_call_vec (loc, vNULL, new_fndecl, arglist,
> +				      NULL, fndecl);
> +    }
>    return NULL_TREE;
>  }
> 
> @@ -519,7 +543,9 @@ arm_register_target_pragmas (void)
>  {
>    /* Update pragma hook to allow parsing #pragma GCC target.  */
>    targetm.target_option.pragma_parse = arm_pragma_target_parse;
> +
>    targetm.resolve_overloaded_builtin = arm_resolve_overloaded_builtin;
> +  targetm.check_builtin_call = arm_check_builtin_call;
> 
>    c_register_pragma ("GCC", "arm", arm_pragma_arm);
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
> new file mode 100644
> index 00000000000..e9f285faf2b
> --- /dev/null
> +++ b/gcc/config/arm/arm-mve-builtins-base.cc
> @@ -0,0 +1,45 @@
> +/* ACLE support for Arm MVE (__ARM_FEATURE_MVE intrinsics)
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tm.h"
> +#include "tree.h"
> +#include "rtl.h"
> +#include "memmodel.h"
> +#include "insn-codes.h"
> +#include "optabs.h"
> +#include "basic-block.h"
> +#include "function.h"
> +#include "gimple.h"
> +#include "arm-mve-builtins.h"
> +#include "arm-mve-builtins-shapes.h"
> +#include "arm-mve-builtins-base.h"
> +#include "arm-mve-builtins-functions.h"
> +
> +using namespace arm_mve;
> +
> +namespace {
> +
> +} /* end anonymous namespace */
> +
> +namespace arm_mve {
> +
> +} /* end namespace arm_mve */
> diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
> new file mode 100644
> index 00000000000..d15ba2e23e8
> --- /dev/null
> +++ b/gcc/config/arm/arm-mve-builtins-base.def
> @@ -0,0 +1,24 @@
> +/* ACLE support for Arm MVE (__ARM_FEATURE_MVE intrinsics)
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#define REQUIRES_FLOAT false
> +#undef REQUIRES_FLOAT
> +
> +#define REQUIRES_FLOAT true
> +#undef REQUIRES_FLOAT
> diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h
> new file mode 100644
> index 00000000000..c4d7b750cd5
> --- /dev/null
> +++ b/gcc/config/arm/arm-mve-builtins-base.h
> @@ -0,0 +1,29 @@
> +/* ACLE support for Arm MVE (__ARM_FEATURE_MVE intrinsics)
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_ARM_MVE_BUILTINS_BASE_H
> +#define GCC_ARM_MVE_BUILTINS_BASE_H
> +
> +namespace arm_mve {
> +namespace functions {
> +
> +} /* end namespace arm_mve::functions */
> +} /* end namespace arm_mve */
> +
> +#endif
> diff --git a/gcc/config/arm/arm-mve-builtins-functions.h b/gcc/config/arm/arm-mve-builtins-functions.h
> new file mode 100644
> index 00000000000..dff01999bcd
> --- /dev/null
> +++ b/gcc/config/arm/arm-mve-builtins-functions.h
> @@ -0,0 +1,50 @@
> +/* ACLE support for Arm MVE (function_base classes)
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_ARM_MVE_BUILTINS_FUNCTIONS_H
> +#define GCC_ARM_MVE_BUILTINS_FUNCTIONS_H
> +
> +namespace arm_mve {
> +
> +/* Wrap T, which is derived from function_base, and indicate that the
> +   function never has side effects.  It is only necessary to use this
> +   wrapper on functions that might have floating-point suffixes, since
> +   otherwise we assume by default that the function has no side effects.  */
> +template<typename T>
> +class quiet : public T
> +{
> +public:
> +  CONSTEXPR quiet () : T () {}
> +
> +  unsigned int
> +  call_properties (const function_instance &) const override
> +  {
> +    return 0;
> +  }
> +};
> +
> +} /* end namespace arm_mve */
> +
> +/* Declare the global function base NAME, creating it from an instance
> +   of class CLASS with constructor arguments ARGS.  */
> +#define FUNCTION(NAME, CLASS, ARGS) \
> +  namespace { static CONSTEXPR const CLASS NAME##_obj ARGS; } \
> +  namespace functions { const function_base *const NAME = &NAME##_obj; }
> +
> +#endif
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
> new file mode 100644
> index 00000000000..f20660d8319
> --- /dev/null
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
> @@ -0,0 +1,343 @@
> +/* ACLE support for Arm MVE (function shapes)
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tm.h"
> +#include "tree.h"
> +#include "rtl.h"
> +#include "memmodel.h"
> +#include "insn-codes.h"
> +#include "optabs.h"
> +#include "arm-mve-builtins.h"
> +#include "arm-mve-builtins-shapes.h"
> +
> +/* In the comments below, _t0 represents the first type suffix
> +   (e.g. "_s8") and _t1 represents the second.  T0/T1 represent the
> +   type full names (e.g. int8x16_t). Square brackets enclose
> +   characters that are present in only the full name, not the
> +   overloaded name.  Governing predicate arguments and predicate
> +   suffixes are not shown, since they depend on the predication type,
> +   which is a separate piece of information from the shape.  */
> +
> +namespace arm_mve {
> +
> +/* If INSTANCE has a predicate, add it to the list of argument types
> +   in ARGUMENT_TYPES.  RETURN_TYPE is the type returned by the
> +   function.  */
> +static void
> +apply_predication (const function_instance &instance, tree return_type,
> +		   vec<tree> &argument_types)
> +{
> +  if (instance.pred != PRED_none)
> +    {
> +      /* When predicate is PRED_m, insert a first argument
> +	 ("inactive") with the same type as return_type.  */
> +      if (instance.has_inactive_argument ())
> +	argument_types.quick_insert (0, return_type);
> +      argument_types.quick_push (get_mve_pred16_t ());
> +    }
> +}
> +
> +/* Parse and move past an element type in FORMAT and return it as a type
> +   suffix.  The format is:
> +
> +   [01]    - the element type in type suffix 0 or 1 of INSTANCE.
> +   h<elt>  - a half-sized version of <elt>
> +   s<bits> - a signed type with the given number of bits
> +   s[01]   - a signed type with the same width as type suffix 0 or 1
> +   u<bits> - an unsigned type with the given number of bits
> +   u[01]   - an unsigned type with the same width as type suffix 0 or 1
> +   w<elt>  - a double-sized version of <elt>
> +   x<bits> - a type with the given number of bits and same signedness
> +             as the next argument.
> +
> +   Future intrinsics will extend this format.  */
> +static type_suffix_index
> +parse_element_type (const function_instance &instance, const char *&format)
> +{
> +  int ch = *format++;
> +
> +
> +  if (ch == 's' || ch == 'u')
> +    {
> +      type_class_index tclass = (ch == 'f' ? TYPE_float
> +				 : ch == 's' ? TYPE_signed
> +				 : TYPE_unsigned);
> +      char *end;
> +      unsigned int bits = strtol (format, &end, 10);
> +      format = end;
> +      if (bits == 0 || bits == 1)
> +	bits = instance.type_suffix (bits).element_bits;
> +      return find_type_suffix (tclass, bits);
> +    }
> +
> +  if (ch == 'h')
> +    {
> +      type_suffix_index suffix = parse_element_type (instance, format);
> +      return find_type_suffix (type_suffixes[suffix].tclass,
> +			       type_suffixes[suffix].element_bits / 2);
> +    }
> +
> +   if (ch == 'w')
> +    {
> +      type_suffix_index suffix = parse_element_type (instance, format);
> +      return find_type_suffix (type_suffixes[suffix].tclass,
> +			       type_suffixes[suffix].element_bits * 2);
> +    }
> +
> +  if (ch == 'x')
> +    {
> +      const char *next = format;
> +      next = strstr (format, ",");
> +      next += 2;
> +      type_suffix_index suffix = parse_element_type (instance, next);
> +      type_class_index tclass = type_suffixes[suffix].tclass;
> +      char *end;
> +      unsigned int bits = strtol (format, &end, 10);
> +      format = end;
> +      return find_type_suffix (tclass, bits);
> +    }
> +
> +  if (ch == '0' || ch == '1')
> +    return instance.type_suffix_ids[ch - '0'];
> +
> +  gcc_unreachable ();
> +}
> +
> +/* Read and return a type from FORMAT for function INSTANCE.  Advance
> +   FORMAT beyond the type string.  The format is:
> +
> +   p       - predicates with type mve_pred16_t
> +   s<elt>  - a scalar type with the given element suffix
> +   t<elt>  - a vector or tuple type with given element suffix [*1]
> +   v<elt>  - a vector with the given element suffix
> +
> +   where <elt> has the format described above parse_element_type.
> +
> +   Future intrinsics will extend this format.
> +
> +   [*1] the vectors_per_tuple function indicates whether the type should
> +        be a tuple, and if so, how many vectors it should contain.  */
> +static tree
> +parse_type (const function_instance &instance, const char *&format)
> +{
> +  int ch = *format++;
> +
> +  if (ch == 'p')
> +    return get_mve_pred16_t ();
> +
> +  if (ch == 's')
> +    {
> +      type_suffix_index suffix = parse_element_type (instance, format);
> +      return scalar_types[type_suffixes[suffix].vector_type];
> +    }
> +
> +  if (ch == 't')
> +    {
> +      type_suffix_index suffix = parse_element_type (instance, format);
> +      vector_type_index vector_type = type_suffixes[suffix].vector_type;
> +      unsigned int num_vectors = instance.vectors_per_tuple ();
> +      return acle_vector_types[num_vectors - 1][vector_type];
> +    }
> +
> +  if (ch == 'v')
> +    {
> +      type_suffix_index suffix = parse_element_type (instance, format);
> +      return acle_vector_types[0][type_suffixes[suffix].vector_type];
> +    }
> +
> +  gcc_unreachable ();
> +}
> +
> +/* Read a type signature for INSTANCE from FORMAT.  Add the argument
> +   types to ARGUMENT_TYPES and return the return type.  Assert there
> +   are no more than MAX_ARGS arguments.
> +
> +   The format is a comma-separated list of types (as for parse_type),
> +   with the first type being the return type and the rest being the
> +   argument types.  */
> +static tree
> +parse_signature (const function_instance &instance, const char *format,
> +		 vec<tree> &argument_types, unsigned int max_args)
> +{
> +  tree return_type = parse_type (instance, format);
> +  unsigned int args = 0;
> +  while (format[0] == ',')
> +    {
> +      gcc_assert (args < max_args);
> +      format += 1;
> +      tree argument_type = parse_type (instance, format);
> +      argument_types.quick_push (argument_type);
> +      args += 1;
> +    }
> +  gcc_assert (format[0] == 0);
> +  return return_type;
> +}
> +
> +/* Add one function instance for GROUP, using mode suffix MODE_SUFFIX_ID,
> +   the type suffixes at index TI and the predication suffix at index PI.
> +   The other arguments are as for build_all.  */
> +static void
> +build_one (function_builder &b, const char *signature,
> +	   const function_group_info &group, mode_suffix_index mode_suffix_id,
> +	   unsigned int ti, unsigned int pi, bool preserve_user_namespace,
> +	   bool force_direct_overloads)
> +{
> +  /* Current functions take at most five arguments, matching the
> +     parse_signature call below.  */
> +  auto_vec<tree, 5> argument_types;
> +  function_instance instance (group.base_name, *group.base, *group.shape,
> +			      mode_suffix_id, group.types[ti],
> +			      group.preds[pi]);
> +  tree return_type = parse_signature (instance, signature, argument_types, 5);
> +  apply_predication (instance, return_type, argument_types);
> +  b.add_unique_function (instance, return_type, argument_types,
> +			 preserve_user_namespace, group.requires_float,
> +			 force_direct_overloads);
> +}
> +
> +/* Add a function instance for every type and predicate combination
> +   in GROUP, unless restricted to the predicates listed in
> +   RESTRICT_TO_PREDS.  Take the function base name from GROUP and the
> +   mode suffix from MODE_SUFFIX_ID.  Use SIGNATURE to construct the
> +   function signature, then use apply_predication to add in the
> +   predicate.  */
> +static void
> +build_all (function_builder &b, const char *signature,
> +	   const function_group_info &group, mode_suffix_index mode_suffix_id,
> +	   bool preserve_user_namespace,
> +	   bool force_direct_overloads = false,
> +	   const predication_index *restrict_to_preds = NULL)
> +{
> +  for (unsigned int pi = 0; group.preds[pi] != NUM_PREDS; ++pi)
> +    {
> +      unsigned int pi2 = 0;
> +
> +      if (restrict_to_preds)
> +	for (; restrict_to_preds[pi2] != NUM_PREDS; ++pi2)
> +	  if (restrict_to_preds[pi2] == group.preds[pi])
> +	    break;
> +
> +      if (restrict_to_preds == NULL || restrict_to_preds[pi2] != NUM_PREDS)
> +	for (unsigned int ti = 0;
> +	     ti == 0 || group.types[ti][0] != NUM_TYPE_SUFFIXES; ++ti)
> +	  build_one (b, signature, group, mode_suffix_id, ti, pi,
> +		     preserve_user_namespace, force_direct_overloads);
> +    }
> +}
> +
> +/* Add a function instance for every type and predicate combination
> +   in GROUP, unless restricted to the predicates listed in
> +   RESTRICT_TO_PREDS, and only for 16-bit and 32-bit integers.  Take
> +   the function base name from GROUP and the mode suffix from
> +   MODE_SUFFIX_ID.  Use SIGNATURE to construct the function signature,
> +   then use apply_predication to add in the predicate.  */
> +static void
> +build_16_32 (function_builder &b, const char *signature,
> +	     const function_group_info &group, mode_suffix_index mode_suffix_id,
> +	     bool preserve_user_namespace,
> +	     bool force_direct_overloads = false,
> +	     const predication_index *restrict_to_preds = NULL)
> +{
> +  for (unsigned int pi = 0; group.preds[pi] != NUM_PREDS; ++pi)
> +    {
> +      unsigned int pi2 = 0;
> +
> +      if (restrict_to_preds)
> +	for (; restrict_to_preds[pi2] != NUM_PREDS; ++pi2)
> +	  if (restrict_to_preds[pi2] == group.preds[pi])
> +	    break;
> +
> +      if (restrict_to_preds == NULL || restrict_to_preds[pi2] != NUM_PREDS)
> +	for (unsigned int ti = 0;
> +	     ti == 0 || group.types[ti][0] != NUM_TYPE_SUFFIXES; ++ti)
> +	  {
> +	    unsigned int element_bits = type_suffixes[group.types[ti][0]].element_bits;
> +	    type_class_index tclass = type_suffixes[group.types[ti][0]].tclass;
> +	    if ((tclass == TYPE_signed || tclass == TYPE_unsigned)
> +		&& (element_bits == 16 || element_bits == 32))
> +	      build_one (b, signature, group, mode_suffix_id, ti, pi,
> +			 preserve_user_namespace, force_direct_overloads);
> +	  }
> +    }
> +}
> +
> +/* Declare the function shape NAME, pointing it to an instance
> +   of class <NAME>_def.  */
> +#define SHAPE(NAME) \
> +  static CONSTEXPR const NAME##_def NAME##_obj; \
> +  namespace shapes { const function_shape *const NAME = &NAME##_obj; }
> +
> +/* Base class for functions that are not overloaded.  */
> +struct nonoverloaded_base : public function_shape
> +{
> +  bool
> +  explicit_type_suffix_p (unsigned int, enum predication_index, enum mode_suffix_index) const override
> +  {
> +    return true;
> +  }
> +
> +  bool
> +  explicit_mode_suffix_p (enum predication_index, enum mode_suffix_index) const override
> +  {
> +    return true;
> +  }
> +
> +  bool
> +  skip_overload_p (enum predication_index, enum mode_suffix_index) const override
> +  {
> +    return false;
> +  }
> +
> +  tree
> +  resolve (function_resolver &) const override
> +  {
> +    gcc_unreachable ();
> +  }
> +};
> +
> +/* Base class for overloaded functions.  Bit N of EXPLICIT_MASK is true
> +   if type suffix N appears in the overloaded name.  */
> +template<unsigned int EXPLICIT_MASK>
> +struct overloaded_base : public function_shape
> +{
> +  bool
> +  explicit_type_suffix_p (unsigned int i, enum predication_index, enum mode_suffix_index) const override
> +  {
> +    return (EXPLICIT_MASK >> i) & 1;
> +  }
> +
> +  bool
> +  explicit_mode_suffix_p (enum predication_index, enum mode_suffix_index) const override
> +  {
> +    return false;
> +  }
> +
> +  bool
> +  skip_overload_p (enum predication_index, enum mode_suffix_index) const override
> +  {
> +    return false;
> +  }
> +};
> +
> +} /* end namespace arm_mve */
> +
> +#undef SHAPE
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
> new file mode 100644
> index 00000000000..9e353b85a76
> --- /dev/null
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.h
> @@ -0,0 +1,30 @@
> +/* ACLE support for Arm MVE (function shapes)
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_ARM_MVE_BUILTINS_SHAPES_H
> +#define GCC_ARM_MVE_BUILTINS_SHAPES_H
> +
> +namespace arm_mve
> +{
> +  namespace shapes
> +  {
> +  } /* end namespace arm_mve::shapes */
> +} /* end namespace arm_mve */
> +
> +#endif
> diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc
> index 7586a82e3c1..b0cceb75ceb 100644
> --- a/gcc/config/arm/arm-mve-builtins.cc
> +++ b/gcc/config/arm/arm-mve-builtins.cc
> @@ -24,7 +24,19 @@
>  #include "coretypes.h"
>  #include "tm.h"
>  #include "tree.h"
> +#include "rtl.h"
> +#include "tm_p.h"
> +#include "memmodel.h"
> +#include "insn-codes.h"
> +#include "optabs.h"
> +#include "recog.h"
> +#include "expr.h"
> +#include "basic-block.h"
> +#include "function.h"
>  #include "fold-const.h"
> +#include "gimple.h"
> +#include "gimple-iterator.h"
> +#include "emit-rtl.h"
>  #include "langhooks.h"
>  #include "stringpool.h"
>  #include "attribs.h"
> @@ -32,6 +44,8 @@
>  #include "arm-protos.h"
>  #include "arm-builtins.h"
>  #include "arm-mve-builtins.h"
> +#include "arm-mve-builtins-base.h"
> +#include "arm-mve-builtins-shapes.h"
> 
>  namespace arm_mve {
> 
> @@ -46,6 +60,33 @@ struct vector_type_info
>    const bool requires_float;
>  };
> 
> +/* Describes a function decl.  */
> +class GTY(()) registered_function
> +{
> +public:
> +  /* The ACLE function that the decl represents.  */
> +  function_instance instance GTY ((skip));
> +
> +  /* The decl itself.  */
> +  tree decl;
> +
> +  /* Whether the function requires a floating point abi.  */
> +  bool requires_float;
> +
> +  /* True if the decl represents an overloaded function that needs to be
> +     resolved by function_resolver.  */
> +  bool overloaded_p;
> +};
> +
> +/* Hash traits for registered_function.  */
> +struct registered_function_hasher : nofree_ptr_hash <registered_function>
> +{
> +  typedef function_instance compare_type;
> +
> +  static hashval_t hash (value_type);
> +  static bool equal (value_type, const compare_type &);
> +};
> +
>  /* Flag indicating whether the arm MVE types have been handled.  */
>  static bool handle_arm_mve_types_p;
> 
> @@ -54,11 +95,167 @@ static CONSTEXPR const vector_type_info vector_types[] = {
>  #define DEF_MVE_TYPE(ACLE_NAME, SCALAR_TYPE) \
>    { #ACLE_NAME, REQUIRES_FLOAT },
>  #include "arm-mve-builtins.def"
> -#undef DEF_MVE_TYPE
> +};
> +
> +/* The function name suffix associated with each predication type.  */
> +static const char *const pred_suffixes[NUM_PREDS + 1] = {
> +  "",
> +  "_m",
> +  "_p",
> +  "_x",
> +  "_z",
> +  ""
> +};
> +
> +/* Static information about each mode_suffix_index.  */
> +CONSTEXPR const mode_suffix_info mode_suffixes[] = {
> +#define VECTOR_TYPE_none NUM_VECTOR_TYPES
> +#define DEF_MVE_MODE(NAME, BASE, DISPLACEMENT, UNITS) \
> +  { "_" #NAME, VECTOR_TYPE_##BASE, VECTOR_TYPE_##DISPLACEMENT, UNITS_##UNITS },
> +#include "arm-mve-builtins.def"
> +#undef VECTOR_TYPE_none
> +  { "", NUM_VECTOR_TYPES, NUM_VECTOR_TYPES, UNITS_none }
> +};
> +
> +/* Static information about each type_suffix_index.  */
> +CONSTEXPR const type_suffix_info type_suffixes[NUM_TYPE_SUFFIXES + 1] = {
> +#define DEF_MVE_TYPE_SUFFIX(NAME, ACLE_TYPE, CLASS, BITS, MODE)	\
> +  { "_" #NAME, \
> +    VECTOR_TYPE_##ACLE_TYPE, \
> +    TYPE_##CLASS, \
> +    BITS, \
> +    BITS / BITS_PER_UNIT, \
> +    TYPE_##CLASS == TYPE_signed || TYPE_##CLASS == TYPE_unsigned, \
> +    TYPE_##CLASS == TYPE_unsigned, \
> +    TYPE_##CLASS == TYPE_float, \
> +    0, \
> +    MODE },
> +#include "arm-mve-builtins.def"
> +  { "", NUM_VECTOR_TYPES, TYPE_bool, 0, 0, false, false, false,
> +    0, VOIDmode }
> +};
> +
> +/* Define a TYPES_<combination> macro for each combination of type
> +   suffixes that an ACLE function can have, where <combination> is the
> +   name used in DEF_MVE_FUNCTION entries.
> +
> +   Use S (T) for single type suffix T and D (T1, T2) for a pair of type
> +   suffixes T1 and T2.  Use commas to separate the suffixes.
> +
> +   Although the order shouldn't matter, the convention is to sort the
> +   suffixes lexicographically after dividing suffixes into a type
> +   class ("b", "f", etc.) and a numerical bit count.  */
> +
> +/* _f16.  */
> +#define TYPES_float16(S, D) \
> +  S (f16)
> +
> +/* _f16 _f32.  */
> +#define TYPES_all_float(S, D) \
> +  S (f16), S (f32)
> +
> +/* _s8 _u8.  */
> +#define TYPES_integer_8(S, D) \
> +  S (s8), S (u8)
> +
> +/* _s8 _s16
> +   _u8 _u16.  */
> +#define TYPES_integer_8_16(S, D) \
> +  S (s8), S (s16), S (u8), S (u16)
> +
> +/* _s16 _s32
> +   _u16 _u32.  */
> +#define TYPES_integer_16_32(S, D)     \
> +  S (s16), S (s32),		      \
> +  S (u16), S (u32)
> +
> +/* _s16 _s32.  */
> +#define TYPES_signed_16_32(S, D) \
> +  S (s16), S (s32)
> +
> +/* _s8 _s16 _s32.  */
> +#define TYPES_all_signed(S, D) \
> +  S (s8), S (s16), S (s32)
> +
> +/* _u8 _u16 _u32.  */
> +#define TYPES_all_unsigned(S, D) \
> +  S (u8), S (u16), S (u32)
> +
> +/* _s8 _s16 _s32
> +   _u8 _u16 _u32.  */
> +#define TYPES_all_integer(S, D) \
> +  TYPES_all_signed (S, D), TYPES_all_unsigned (S, D)
> +
> +/* _s8 _s16 _s32 _s64
> +   _u8 _u16 _u32 _u64.  */
> +#define TYPES_all_integer_with_64(S, D) \
> +  TYPES_all_signed (S, D), S (s64), TYPES_all_unsigned (S, D), S (u64)
> +
> +/* _s32 _u32.  */
> +#define TYPES_integer_32(S, D) \
> +  S (s32), S (u32)
> +
> +/* _s32.  */
> +#define TYPES_signed_32(S, D) \
> +  S (s32)
> +
> +/* Describe a pair of type suffixes in which only the first is used.  */
> +#define DEF_VECTOR_TYPE(X) { TYPE_SUFFIX_ ## X, NUM_TYPE_SUFFIXES }
> +
> +/* Describe a pair of type suffixes in which both are used.  */
> +#define DEF_DOUBLE_TYPE(X, Y) { TYPE_SUFFIX_ ## X, TYPE_SUFFIX_ ## Y }
> +
> +/* Create an array that can be used in arm-mve-builtins.def to
> +   select the type suffixes in TYPES_<NAME>.  */
> +#define DEF_MVE_TYPES_ARRAY(NAME) \
> +  static const type_suffix_pair types_##NAME[] = { \
> +    TYPES_##NAME (DEF_VECTOR_TYPE, DEF_DOUBLE_TYPE), \
> +    { NUM_TYPE_SUFFIXES, NUM_TYPE_SUFFIXES } \
> +  }
> +
> +/* For functions that don't take any type suffixes.  */
> +static const type_suffix_pair types_none[] = {
> +  { NUM_TYPE_SUFFIXES, NUM_TYPE_SUFFIXES },
> +  { NUM_TYPE_SUFFIXES, NUM_TYPE_SUFFIXES }
> +};
> +
> +DEF_MVE_TYPES_ARRAY (all_integer);
> +DEF_MVE_TYPES_ARRAY (all_integer_with_64);
> +DEF_MVE_TYPES_ARRAY (float16);
> +DEF_MVE_TYPES_ARRAY (all_float);
> +DEF_MVE_TYPES_ARRAY (all_signed);
> +DEF_MVE_TYPES_ARRAY (all_unsigned);
> +DEF_MVE_TYPES_ARRAY (integer_8);
> +DEF_MVE_TYPES_ARRAY (integer_8_16);
> +DEF_MVE_TYPES_ARRAY (integer_16_32);
> +DEF_MVE_TYPES_ARRAY (integer_32);
> +DEF_MVE_TYPES_ARRAY (signed_16_32);
> +DEF_MVE_TYPES_ARRAY (signed_32);
> +
> +/* Used by functions that have no governing predicate.  */
> +static const predication_index preds_none[] = { PRED_none, NUM_PREDS };
> +
> +/* Used by functions that have the m (merging) predicated form, and in
> +   addition have an unpredicated form.  */
> +static const predication_index preds_m_or_none[] = {
> +  PRED_m, PRED_none, NUM_PREDS
> +};
> +
> +/* Used by functions that have the mx (merging and "don't care")
> +   predicated forms, and in addition have an unpredicated form.  */
> +static const predication_index preds_mx_or_none[] = {
> +  PRED_m, PRED_x, PRED_none, NUM_PREDS
> +};
> +
> +/* Used by functions that have the p predicated form, in addition to
> +   an unpredicated form.  */
> +static const predication_index preds_p_or_none[] = {
> +  PRED_p, PRED_none, NUM_PREDS
>  };
> 
>  /* The scalar type associated with each vector type.  */
> -GTY(()) tree scalar_types[NUM_VECTOR_TYPES];
> +extern GTY(()) tree scalar_types[NUM_VECTOR_TYPES];
> +tree scalar_types[NUM_VECTOR_TYPES];
> 
>  /* The single-predicate and single-vector types, with their built-in
>     "__simd128_..._t" name.  Allow an index of NUM_VECTOR_TYPES, which always
> @@ -66,7 +263,20 @@ GTY(()) tree scalar_types[NUM_VECTOR_TYPES];
>  static GTY(()) tree abi_vector_types[NUM_VECTOR_TYPES + 1];
> 
>  /* Same, but with the arm_mve.h names.  */
> -GTY(()) tree acle_vector_types[3][NUM_VECTOR_TYPES + 1];
> +extern GTY(()) tree acle_vector_types[MAX_TUPLE_SIZE][NUM_VECTOR_TYPES + 1];
> +tree acle_vector_types[MAX_TUPLE_SIZE][NUM_VECTOR_TYPES + 1];
> +
> +/* The list of all registered function decls, indexed by code.  */
> +static GTY(()) vec<registered_function *, va_gc> *registered_functions;
> +
> +/* All registered function decls, hashed on the function_instance
> +   that they implement.  This is used for looking up implementations of
> +   overloaded functions.  */
> +static hash_table<registered_function_hasher> *function_table;
> +
> +/* True if we've already complained about attempts to use functions
> +   when the required extension is disabled.  */
> +static bool reported_missing_float_p;
> 
>  /* Return the MVE abi type with element of type TYPE.  */
>  static tree
> @@ -87,7 +297,6 @@ register_builtin_types ()
>  #define DEF_MVE_TYPE(ACLE_NAME, SCALAR_TYPE) \
>    scalar_types[VECTOR_TYPE_ ## ACLE_NAME] = SCALAR_TYPE;
>  #include "arm-mve-builtins.def"
> -#undef DEF_MVE_TYPE
>    for (unsigned int i = 0; i < NUM_VECTOR_TYPES; ++i)
>      {
>        if (vector_types[i].requires_float && !TARGET_HAVE_MVE_FLOAT)
> @@ -113,8 +322,18 @@ register_builtin_types ()
>  static void
>  register_vector_type (vector_type_index type)
>  {
> +
> +  /* If the target does not have the mve.fp extension, but the type requires
> +     it, then it needs to be assigned a non-dummy type so that functions
> +     with those types in their signature can be registered.  This allows for
> +     diagnostics about the missing extension, rather than about a missing
> +     function definition.  */
>    if (vector_types[type].requires_float && !TARGET_HAVE_MVE_FLOAT)
> -    return;
> +    {
> +      acle_vector_types[0][type] = void_type_node;
> +      return;
> +    }
> +
>    tree vectype = abi_vector_types[type];
>    tree id = get_identifier (vector_types[type].acle_name);
>    tree decl = build_decl (input_location, TYPE_DECL, id, vectype);
> @@ -133,15 +352,26 @@ register_vector_type (vector_type_index type)
>    acle_vector_types[0][type] = vectype;
>  }
> 
> -/* Register tuple type TYPE with NUM_VECTORS arity under its
> -   arm_mve_types.h name.  */
> +/* Register tuple types of element type TYPE under their arm_mve_types.h
> +   names.  */
>  static void
>  register_builtin_tuple_types (vector_type_index type)
>  {
>    const vector_type_info* info = &vector_types[type];
> +
> +  /* If the target does not have the mve.fp extension, but the type requires
> +     it, then it needs to be assigned a non-dummy type so that functions
> +     with those types in their signature can be registered.  This allows for
> +     diagnostics about the missing extension, rather than about a missing
> +     function definition.  */
>    if (scalar_types[type] == boolean_type_node
>        || (info->requires_float && !TARGET_HAVE_MVE_FLOAT))
> +    {
> +      for (unsigned int num_vectors = 2; num_vectors <= 4; num_vectors += 2)
> +	acle_vector_types[num_vectors >> 1][type] = void_type_node;
>      return;
> +    }
> +
>    const char *vector_type_name = info->acle_name;
>    char buffer[sizeof ("float32x4x2_t")];
>    for (unsigned int num_vectors = 2; num_vectors <= 4; num_vectors += 2)
> @@ -189,8 +419,1710 @@ handle_arm_mve_types_h ()
>      }
>  }
> 
> -} /* end namespace arm_mve */
> +/* Implement #pragma GCC arm "arm_mve.h" <bool>.  */
> +void
> +handle_arm_mve_h (bool preserve_user_namespace)
> +{
> +  if (function_table)
> +    {
> +      error ("duplicate definition of %qs", "arm_mve.h");
> +      return;
> +    }
> 
> -using namespace arm_mve;
> +  /* Define MVE functions.  */
> +  function_table = new hash_table<registered_function_hasher> (1023);
> +}
> +
> +/* Return true if CANDIDATE is equivalent to MODEL_TYPE for overloading
> +   purposes.  */
> +static bool
> +matches_type_p (const_tree model_type, const_tree candidate)
> +{
> +  if (VECTOR_TYPE_P (model_type))
> +    {
> +      if (!VECTOR_TYPE_P (candidate)
> +	  || maybe_ne (TYPE_VECTOR_SUBPARTS (model_type),
> +		       TYPE_VECTOR_SUBPARTS (candidate))
> +	  || TYPE_MODE (model_type) != TYPE_MODE (candidate))
> +	return false;
> +
> +      model_type = TREE_TYPE (model_type);
> +      candidate = TREE_TYPE (candidate);
> +    }
> +  return (candidate != error_mark_node
> +	  && TYPE_MAIN_VARIANT (model_type) == TYPE_MAIN_VARIANT (candidate));
> +}
> +
> +/* Report an error against LOCATION that the user has tried to use
> +   a floating point function when the mve.fp extension is disabled.  */
> +static void
> +report_missing_float (location_t location, tree fndecl)
> +{
> +  /* Avoid reporting a slew of messages for a single oversight.  */
> +  if (reported_missing_float_p)
> +    return;
> +
> +  error_at (location, "ACLE function %qD requires ISA extension %qs",
> +	    fndecl, "mve.fp");
> +  inform (location, "you can enable mve.fp by using the command-line"
> +	  " option %<-march%>, or by using the %<target%>"
> +	  " attribute or pragma");
> +  reported_missing_float_p = true;
> +}
> +
> +/* Report that LOCATION has a call to FNDECL in which argument ARGNO
> +   was not an integer constant expression.  ARGNO counts from zero.  */
> +static void
> +report_non_ice (location_t location, tree fndecl, unsigned int argno)
> +{
> +  error_at (location, "argument %d of %qE must be an integer constant"
> +	    " expression", argno + 1, fndecl);
> +}
> +
> +/* Report that LOCATION has a call to FNDECL in which argument ARGNO has
> +   the value ACTUAL, whereas the function requires a value in the range
> +   [MIN, MAX].  ARGNO counts from zero.  */
> +static void
> +report_out_of_range (location_t location, tree fndecl, unsigned int argno,
> +		     HOST_WIDE_INT actual, HOST_WIDE_INT min,
> +		     HOST_WIDE_INT max)
> +{
> +  error_at (location, "passing %wd to argument %d of %qE, which expects"
> +	    " a value in the range [%wd, %wd]", actual, argno + 1, fndecl,
> +	    min, max);
> +}
> +
> +/* Report that LOCATION has a call to FNDECL in which argument ARGNO has
> +   the value ACTUAL, whereas the function requires a valid value of
> +   enum type ENUMTYPE.  ARGNO counts from zero.  */
> +static void
> +report_not_enum (location_t location, tree fndecl, unsigned int argno,
> +		 HOST_WIDE_INT actual, tree enumtype)
> +{
> +  error_at (location, "passing %wd to argument %d of %qE, which expects"
> +	    " a valid %qT value", actual, argno + 1, fndecl, enumtype);
> +}
> +
> +/* Check that the mve.fp extension is enabled if REQUIRES_FLOAT
> +   indicates that function FNDECL requires it.  Report an error
> +   against LOCATION if the extension is not enabled.  */
> +static bool
> +check_requires_float (location_t location, tree fndecl,
> +		      bool requires_float)
> +{
> +  if (requires_float && !TARGET_HAVE_MVE_FLOAT)
> +    {
> +      report_missing_float (location, fndecl);
> +      return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Return a hash code for a function_instance.  */
> +hashval_t
> +function_instance::hash () const
> +{
> +  inchash::hash h;
> +  /* BASE uniquely determines BASE_NAME, so we don't need to hash both.  */
> +  h.add_ptr (base);
> +  h.add_ptr (shape);
> +  h.add_int (mode_suffix_id);
> +  h.add_int (type_suffix_ids[0]);
> +  h.add_int (type_suffix_ids[1]);
> +  h.add_int (pred);
> +  return h.end ();
> +}
> +
> +/* Return a set of CP_* flags that describe what the function could do,
> +   taking the command-line flags into account.  */
> +unsigned int
> +function_instance::call_properties () const
> +{
> +  unsigned int flags = base->call_properties (*this);
> +
> +  /* -fno-trapping-math means that we can assume any FP exceptions
> +     are not user-visible.  */
> +  if (!flag_trapping_math)
> +    flags &= ~CP_RAISE_FP_EXCEPTIONS;
> +
> +  return flags;
> +}
> +
> +/* Return true if calls to the function could read some form of
> +   global state.  */
> +bool
> +function_instance::reads_global_state_p () const
> +{
> +  unsigned int flags = call_properties ();
> +
> +  /* Preserve any dependence on rounding mode, flush to zero mode, etc.
> +     There is currently no way of turning this off; in particular,
> +     -fno-rounding-math (which is the default) means that we should make
> +     the usual assumptions about rounding mode, which for intrinsics means
> +     acting as the instructions do.  */
> +  if (flags & CP_READ_FPCR)
> +    return true;
> +
> +  return false;
> +}
> +
> +/* Return true if calls to the function could modify some form of
> +   global state.  */
> +bool
> +function_instance::modifies_global_state_p () const
> +{
> +  unsigned int flags = call_properties ();
> +
> +  /* Preserve any exception state written back to the FPCR,
> +     unless -fno-trapping-math says this is unnecessary.  */
> +  if (flags & CP_RAISE_FP_EXCEPTIONS)
> +    return true;
> +
> +  /* Handle direct modifications of global state.  */
> +  return flags & CP_WRITE_MEMORY;
> +}
> +
> +/* Return true if calls to the function could raise a signal.  */
> +bool
> +function_instance::could_trap_p () const
> +{
> +  unsigned int flags = call_properties ();
> +
> +  /* Handle functions that could raise SIGFPE.  */
> +  if (flags & CP_RAISE_FP_EXCEPTIONS)
> +    return true;
> +
> +  /* Handle functions that could raise SIGBUS or SIGSEGV.  */
> +  if (flags & (CP_READ_MEMORY | CP_WRITE_MEMORY))
> +    return true;
> +
> +  return false;
> +}
> +
> +/* Return true if the function has an implicit "inactive" argument.
> +   This is the case for most _m predicated functions, but not all.
> +   The list will be updated as needed.  */
> +bool
> +function_instance::has_inactive_argument () const
> +{
> +  if (pred != PRED_m)
> +    return false;
> +
> +  return true;
> +}
> +
> +inline hashval_t
> +registered_function_hasher::hash (value_type value)
> +{
> +  return value->instance.hash ();
> +}
> +
> +inline bool
> +registered_function_hasher::equal (value_type value, const compare_type &key)
> +{
> +  return value->instance == key;
> +}
> +
> +function_builder::function_builder ()
> +{
> +  m_overload_type = build_function_type (void_type_node, void_list_node);
> +  m_direct_overloads = lang_GNU_CXX ();
> +  gcc_obstack_init (&m_string_obstack);
> +}
> +
> +function_builder::~function_builder ()
> +{
> +  obstack_free (&m_string_obstack, NULL);
> +}
> +
> +/* Add NAME to the end of the function name being built.  */
> +void
> +function_builder::append_name (const char *name)
> +{
> +  obstack_grow (&m_string_obstack, name, strlen (name));
> +}
> +
> +/* Zero-terminate and complete the function name being built.  */
> +char *
> +function_builder::finish_name ()
> +{
> +  obstack_1grow (&m_string_obstack, 0);
> +  return (char *) obstack_finish (&m_string_obstack);
> +}
> +
> +/* Return the overloaded or full function name for INSTANCE, with an
> +   optional prefix; PRESERVE_USER_NAMESPACE selects the prefix, and
> +   OVERLOADED_P selects between the overloaded and full function name.
> +   Allocate the string on m_string_obstack; the caller must use
> +   obstack_free to free it after use.  */
> +char *
> +function_builder::get_name (const function_instance &instance,
> +			    bool preserve_user_namespace,
> +			    bool overloaded_p)
> +{
> +  if (preserve_user_namespace)
> +    append_name ("__arm_");
> +  append_name (instance.base_name);
> +  append_name (pred_suffixes[instance.pred]);
> +  if (!overloaded_p
> +      || instance.shape->explicit_mode_suffix_p (instance.pred,
> +						 instance.mode_suffix_id))
> +    append_name (instance.mode_suffix ().string);
> +  for (unsigned int i = 0; i < 2; ++i)
> +    if (!overloaded_p
> +	|| instance.shape->explicit_type_suffix_p (i, instance.pred,
> +						   instance.mode_suffix_id))
> +      append_name (instance.type_suffix (i).string);
> +  return finish_name ();
> +}
> +
> +/* Add attribute NAME to ATTRS.  */
> +static tree
> +add_attribute (const char *name, tree attrs)
> +{
> +  return tree_cons (get_identifier (name), NULL_TREE, attrs);
> +}
> +
> +/* Return the appropriate function attributes for INSTANCE.  */
> +tree
> +function_builder::get_attributes (const function_instance &instance)
> +{
> +  tree attrs = NULL_TREE;
> +
> +  if (!instance.modifies_global_state_p ())
> +    {
> +      if (instance.reads_global_state_p ())
> +	attrs = add_attribute ("pure", attrs);
> +      else
> +	attrs = add_attribute ("const", attrs);
> +    }
> +
> +  if (!flag_non_call_exceptions || !instance.could_trap_p ())
> +    attrs = add_attribute ("nothrow", attrs);
> +
> +  return add_attribute ("leaf", attrs);
> +}
> +
> +/* Add a function called NAME with type FNTYPE and attributes ATTRS.
> +   INSTANCE describes what the function does and OVERLOADED_P indicates
> +   whether it is overloaded.  REQUIRES_FLOAT indicates whether the function
> +   requires the mve.fp extension.  */
> +registered_function &
> +function_builder::add_function (const function_instance &instance,
> +				const char *name, tree fntype, tree attrs,
> +				bool requires_float,
> +				bool overloaded_p,
> +				bool placeholder_p)
> +{
> +  unsigned int code = vec_safe_length (registered_functions);
> +  code = (code << ARM_BUILTIN_SHIFT) | ARM_BUILTIN_MVE;
> +
> +  /* We need to be able to generate placeholders to ensure that we have a
> +     consistent numbering scheme for function codes between the C and C++
> +     frontends, so that everything ties up in LTO.
> +
> +     Currently, tree-streamer-in.cc:unpack_ts_function_decl_value_fields
> +     validates that tree nodes returned by TARGET_BUILTIN_DECL are
> +     non-NULL and some node other than error_mark_node.  This is a
> +     holdover from when builtin decls were streamed by code rather
> +     than by value.
> +
> +     Ultimately, we should be able to remove this validation of
> +     BUILT_IN_MD nodes and remove the target hook.  For now, however,
> +     we need to appease the validation and return a non-NULL,
> +     non-error_mark_node node, so we arbitrarily choose
> +     integer_zero_node.  */
> +  tree decl = placeholder_p
> +    ? integer_zero_node
> +    : simulate_builtin_function_decl (input_location, name, fntype,
> +				      code, NULL, attrs);
> +
> +  registered_function &rfn = *ggc_alloc <registered_function> ();
> +  rfn.instance = instance;
> +  rfn.decl = decl;
> +  rfn.requires_float = requires_float;
> +  rfn.overloaded_p = overloaded_p;
> +  vec_safe_push (registered_functions, &rfn);
> +
> +  return rfn;
> +}
> +
> +/* Add a built-in function for INSTANCE, with the argument types given
> +   by ARGUMENT_TYPES and the return type given by RETURN_TYPE.
> +   REQUIRES_FLOAT indicates whether the function requires the mve.fp
> +   extension, and PRESERVE_USER_NAMESPACE indicates whether the
> +   function should also be registered under its non-prefixed name.  */
> +void
> +function_builder::add_unique_function (const function_instance &instance,
> +				       tree return_type,
> +				       vec<tree> &argument_types,
> +				       bool preserve_user_namespace,
> +				       bool requires_float,
> +				       bool force_direct_overloads)
> +{
> +  /* Add the function under its full (unique) name with prefix.  */
> +  char *name = get_name (instance, true, false);
> +  tree fntype = build_function_type_array (return_type,
> +					   argument_types.length (),
> +					   argument_types.address ());
> +  tree attrs = get_attributes (instance);
> +  registered_function &rfn = add_function (instance, name, fntype, attrs,
> +					   requires_float, false, false);
> +
> +  /* Enter the function into the hash table.  */
> +  hashval_t hash = instance.hash ();
> +  registered_function **rfn_slot
> +    = function_table->find_slot_with_hash (instance, hash, INSERT);
> +  gcc_assert (!*rfn_slot);
> +  *rfn_slot = &rfn;
> +
> +  /* Also add the non-prefixed non-overloaded function, if the user
> +     namespace does not need to be preserved.  */
> +  if (!preserve_user_namespace)
> +    {
> +      char *noprefix_name = get_name (instance, false, false);
> +      tree attrs = get_attributes (instance);
> +      add_function (instance, noprefix_name, fntype, attrs, requires_float,
> +		    false, false);
> +    }
> +
> +  /* Also add the function under its overloaded alias, if we want
> +     a separate decl for each instance of an overloaded function.  */
> +  char *overload_name = get_name (instance, true, true);
> +  if (strcmp (name, overload_name) != 0)
> +    {
> +      /* Attribute lists shouldn't be shared.  */
> +      tree attrs = get_attributes (instance);
> +      bool placeholder_p = !(m_direct_overloads || force_direct_overloads);
> +      add_function (instance, overload_name, fntype, attrs,
> +		    requires_float, false, placeholder_p);
> +
> +      /* Also add the non-prefixed overloaded function, if the user namespace
> +	 does not need to be preserved.  */
> +      if (!preserve_user_namespace)
> +	{
> +	  char *noprefix_overload_name = get_name (instance, false, true);
> +	  tree attrs = get_attributes (instance);
> +	  add_function (instance, noprefix_overload_name, fntype, attrs,
> +			requires_float, false, placeholder_p);
> +	}
> +    }
> +
> +  obstack_free (&m_string_obstack, name);
> +}
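To make the naming scheme concrete, here is a tiny standalone sketch (a hypothetical helper, not the patch's get_name) of how one function instance can yield up to four decl names: prefixed/non-prefixed crossed with unique/overloaded.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the naming scheme in add_unique_function: the
// unique name carries the full type suffix, the overloaded alias drops
// it, and the "__arm_" prefix is omitted when the user namespace does
// not need to be preserved.  The names here are illustrative only.
static std::string
decl_name (const std::string &base, const std::string &suffix,
	   bool prefixed, bool overloaded)
{
  std::string name = prefixed ? "__arm_" + base : base;
  if (!overloaded)
    name += suffix;
  return name;
}
```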
> +
> +/* Add one function decl for INSTANCE, to be used with manual overload
> +   resolution.  REQUIRES_FLOAT indicates whether the function requires the
> +   mve.fp extension.
> +
> +   For simplicity, partition functions by instance and required extensions,
> +   and check whether the required extensions are available as part of resolving
> +   the function to the relevant unique function.  */
> +void
> +function_builder::add_overloaded_function (const function_instance &instance,
> +					   bool preserve_user_namespace,
> +					   bool requires_float)
> +{
> +  char *name = get_name (instance, true, true);
> +  if (registered_function **map_value = m_overload_names.get (name))
> +    {
> +      gcc_assert ((*map_value)->instance == instance);
> +      obstack_free (&m_string_obstack, name);
> +    }
> +  else
> +    {
> +      registered_function &rfn
> +	= add_function (instance, name, m_overload_type, NULL_TREE,
> +			requires_float, true, m_direct_overloads);
> +      m_overload_names.put (name, &rfn);
> +      if (!preserve_user_namespace)
> +	{
> +	  char *noprefix_name = get_name (instance, false, true);
> +	  registered_function &noprefix_rfn
> +	    = add_function (instance, noprefix_name, m_overload_type,
> +			    NULL_TREE, requires_float, true,
> +			    m_direct_overloads);
> +	  m_overload_names.put (noprefix_name, &noprefix_rfn);
> +	}
> +    }
> +}
> +
> +/* If we are using manual overload resolution, add one function decl
> +   for each overloaded function in GROUP.  Take the function base name
> +   from GROUP and the mode from MODE.  */
> +void
> +function_builder::add_overloaded_functions (const function_group_info &group,
> +					    mode_suffix_index mode,
> +					    bool preserve_user_namespace)
> +{
> +  for (unsigned int pi = 0; group.preds[pi] != NUM_PREDS; ++pi)
> +    {
> +      unsigned int explicit_type0
> +	= (*group.shape)->explicit_type_suffix_p (0, group.preds[pi], mode);
> +      unsigned int explicit_type1
> +	= (*group.shape)->explicit_type_suffix_p (1, group.preds[pi], mode);
> +
> +      if ((*group.shape)->skip_overload_p (group.preds[pi], mode))
> +	continue;
> +
> +      if (!explicit_type0 && !explicit_type1)
> +	{
> +	  /* Deal with the common case in which there is one overloaded
> +	     function for all type combinations.  */
> +	  function_instance instance (group.base_name, *group.base,
> +				      *group.shape, mode, types_none[0],
> +				      group.preds[pi]);
> +	  add_overloaded_function (instance, preserve_user_namespace,
> +				   group.requires_float);
> +	}
> +      else
> +	for (unsigned int ti = 0; group.types[ti][0] != NUM_TYPE_SUFFIXES;
> +	     ++ti)
> +	  {
> +	    /* Stub out the types that are determined by overload
> +	       resolution.  */
> +	    type_suffix_pair types = {
> +	      explicit_type0 ? group.types[ti][0] : NUM_TYPE_SUFFIXES,
> +	      explicit_type1 ? group.types[ti][1] : NUM_TYPE_SUFFIXES
> +	    };
> +	    function_instance instance (group.base_name, *group.base,
> +					*group.shape, mode, types,
> +					group.preds[pi]);
> +	    add_overloaded_function (instance, preserve_user_namespace,
> +				     group.requires_float);
> +	  }
> +    }
> +}
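The type-stubbing step reads as follows in miniature (hypothetical encodings; a plain int placeholder stands in for the real NUM_TYPE_SUFFIXES enumerator): suffixes that overload resolution will infer are replaced by the placeholder, so all such combinations collapse to a single overloaded decl.

```cpp
#include <cassert>
#include <utility>

// Placeholder standing in for the real NUM_TYPE_SUFFIXES enumerator.
constexpr int NUM_TYPE_SUFFIXES = -1;

// Stub out the type suffixes that are determined by overload
// resolution, keeping only the ones the shape declares as explicit.
static std::pair<int, int>
stub_types (bool explicit0, bool explicit1, int type0, int type1)
{
  return { explicit0 ? type0 : NUM_TYPE_SUFFIXES,
	   explicit1 ? type1 : NUM_TYPE_SUFFIXES };
}
```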
> +
> +/* Register all the functions in GROUP.  */
> +void
> +function_builder::register_function_group (const function_group_info &group,
> +					   bool preserve_user_namespace)
> +{
> +  (*group.shape)->build (*this, group, preserve_user_namespace);
> +}
> +
> +function_call_info::function_call_info (location_t location_in,
> +					const function_instance &instance_in,
> +					tree fndecl_in)
> +  : function_instance (instance_in), location (location_in), fndecl (fndecl_in)
> +{
> +}
> +
> +function_resolver::function_resolver (location_t location,
> +				      const function_instance &instance,
> +				      tree fndecl, vec<tree, va_gc> &arglist)
> +  : function_call_info (location, instance, fndecl), m_arglist (arglist)
> +{
> +}
> +
> +/* Return the vector type associated with type suffix TYPE.  */
> +tree
> +function_resolver::get_vector_type (type_suffix_index type)
> +{
> +  return acle_vector_types[0][type_suffixes[type].vector_type];
> +}
> +
> +/* Return the <stdint.h> name associated with TYPE.  Using the <stdint.h>
> +   name should be more user-friendly than the underlying canonical type,
> +   since it makes the signedness and bitwidth explicit.  */
> +const char *
> +function_resolver::get_scalar_type_name (type_suffix_index type)
> +{
> +  return vector_types[type_suffixes[type].vector_type].acle_name + 2;
> +}
> +
> +/* Return the type of argument I, or error_mark_node if it isn't
> +   well-formed.  */
> +tree
> +function_resolver::get_argument_type (unsigned int i)
> +{
> +  tree arg = m_arglist[i];
> +  return arg == error_mark_node ? arg : TREE_TYPE (arg);
> +}
> +
> +/* Return true if argument I is some form of scalar value.  */
> +bool
> +function_resolver::scalar_argument_p (unsigned int i)
> +{
> +  tree type = get_argument_type (i);
> +  return (INTEGRAL_TYPE_P (type)
> +	  /* Allow pointer types, leaving the frontend to warn where
> +	     necessary.  */
> +	  || POINTER_TYPE_P (type)
> +	  || SCALAR_FLOAT_TYPE_P (type));
> +}
> +
> +/* Report that the function has no form that takes type suffix TYPE.
> +   Return error_mark_node.  */
> +tree
> +function_resolver::report_no_such_form (type_suffix_index type)
> +{
> +  error_at (location, "%qE has no form that takes %qT arguments",
> +	    fndecl, get_vector_type (type));
> +  return error_mark_node;
> +}
> +
> +/* Silently check whether there is an instance of the function with the
> +   mode suffix given by MODE and the type suffixes given by TYPE0 and TYPE1.
> +   Return its function decl if so, otherwise return null.  */
> +tree
> +function_resolver::lookup_form (mode_suffix_index mode,
> +				type_suffix_index type0,
> +				type_suffix_index type1)
> +{
> +  type_suffix_pair types = { type0, type1 };
> +  function_instance instance (base_name, base, shape, mode, types, pred);
> +  registered_function *rfn
> +    = function_table->find_with_hash (instance, instance.hash ());
> +  return rfn ? rfn->decl : NULL_TREE;
> +}
> +
> +/* Resolve the function to one with the mode suffix given by MODE and the
> +   type suffixes given by TYPE0 and TYPE1.  Return its function decl on
> +   success, otherwise report an error and return error_mark_node.  */
> +tree
> +function_resolver::resolve_to (mode_suffix_index mode,
> +			       type_suffix_index type0,
> +			       type_suffix_index type1)
> +{
> +  tree res = lookup_form (mode, type0, type1);
> +  if (!res)
> +    {
> +      if (type1 == NUM_TYPE_SUFFIXES)
> +	return report_no_such_form (type0);
> +      if (type0 == type_suffix_ids[0])
> +	return report_no_such_form (type1);
> +      /* To be filled in when we have other cases.  */
> +      gcc_unreachable ();
> +    }
> +  return res;
> +}
> +
> +/* Require argument ARGNO to be a single vector or a tuple of NUM_VECTORS
> +   vectors; NUM_VECTORS is 1 for the former.  Return the associated type
> +   suffix on success, using TYPE_SUFFIX_b for predicates.  Report an error
> +   and return NUM_TYPE_SUFFIXES on failure.  */
> +type_suffix_index
> +function_resolver::infer_vector_or_tuple_type (unsigned int argno,
> +					       unsigned int num_vectors)
> +{
> +  tree actual = get_argument_type (argno);
> +  if (actual == error_mark_node)
> +    return NUM_TYPE_SUFFIXES;
> +
> +  /* A linear search should be OK here, since the code isn't hot and
> +     the number of types is only small.  */
> +  for (unsigned int size_i = 0; size_i < MAX_TUPLE_SIZE; ++size_i)
> +    for (unsigned int suffix_i = 0; suffix_i < NUM_TYPE_SUFFIXES; ++suffix_i)
> +      {
> +	vector_type_index type_i = type_suffixes[suffix_i].vector_type;
> +	tree type = acle_vector_types[size_i][type_i];
> +	if (type && matches_type_p (type, actual))
> +	  {
> +	    if (size_i + 1 == num_vectors)
> +	      return type_suffix_index (suffix_i);
> +
> +	    if (num_vectors == 1)
> +	      error_at (location, "passing %qT to argument %d of %qE, which"
> +			" expects a single MVE vector rather than a tuple",
> +			actual, argno + 1, fndecl);
> +	    else if (size_i == 0 && type_i != VECTOR_TYPE_mve_pred16_t)
> +	      /* num_vectors is always != 1, so the singular isn't needed.  */
> +	      error_n (location, num_vectors, "%qT%d%qE%d",
> +		       "passing single vector %qT to argument %d"
> +		       " of %qE, which expects a tuple of %d vectors",
> +		       actual, argno + 1, fndecl, num_vectors);
> +	    else
> +	      /* num_vectors is always != 1, so the singular isn't needed.  */
> +	      error_n (location, num_vectors, "%qT%d%qE%d",
> +		       "passing %qT to argument %d of %qE, which"
> +		       " expects a tuple of %d vectors", actual, argno + 1,
> +		       fndecl, num_vectors);
> +	    return NUM_TYPE_SUFFIXES;
> +	  }
> +      }
> +
> +  if (num_vectors == 1)
> +    error_at (location, "passing %qT to argument %d of %qE, which"
> +	      " expects an MVE vector type", actual, argno + 1, fndecl);
> +  else
> +    error_at (location, "passing %qT to argument %d of %qE, which"
> +	      " expects an MVE tuple type", actual, argno + 1, fndecl);
> +  return NUM_TYPE_SUFFIXES;
> +}
> +
> +/* Require argument ARGNO to have some form of vector type.  Return the
> +   associated type suffix on success, using TYPE_SUFFIX_b for predicates.
> +   Report an error and return NUM_TYPE_SUFFIXES on failure.  */
> +type_suffix_index
> +function_resolver::infer_vector_type (unsigned int argno)
> +{
> +  return infer_vector_or_tuple_type (argno, 1);
> +}
> +
> +/* Require argument ARGNO to be a vector or scalar argument.  Return true
> +   if it is, otherwise report an appropriate error.  */
> +bool
> +function_resolver::require_vector_or_scalar_type (unsigned int argno)
> +{
> +  tree actual = get_argument_type (argno);
> +  if (actual == error_mark_node)
> +    return false;
> +
> +  if (!scalar_argument_p (argno) && !VECTOR_TYPE_P (actual))
> +    {
> +      error_at (location, "passing %qT to argument %d of %qE, which"
> +		" expects a vector or scalar type", actual, argno + 1, fndecl);
> +      return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Require argument ARGNO to have vector type TYPE, in cases where this
> +   requirement holds for all uses of the function.  Return true if the
> +   argument has the right form, otherwise report an appropriate error.  */
> +bool
> +function_resolver::require_vector_type (unsigned int argno,
> +					vector_type_index type)
> +{
> +  tree expected = acle_vector_types[0][type];
> +  tree actual = get_argument_type (argno);
> +  if (actual == error_mark_node)
> +    return false;
> +
> +  if (!matches_type_p (expected, actual))
> +    {
> +      error_at (location, "passing %qT to argument %d of %qE, which"
> +		" expects %qT", actual, argno + 1, fndecl, expected);
> +      return false;
> +    }
> +  return true;
> +}
> +
> +/* Like require_vector_type, but TYPE is inferred from previous arguments
> +   rather than being a fixed part of the function signature.  This changes
> +   the nature of the error messages.  */
> +bool
> +function_resolver::require_matching_vector_type (unsigned int argno,
> +						 type_suffix_index type)
> +{
> +  type_suffix_index new_type = infer_vector_type (argno);
> +  if (new_type == NUM_TYPE_SUFFIXES)
> +    return false;
> +
> +  if (type != new_type)
> +    {
> +      error_at (location, "passing %qT to argument %d of %qE, but"
> +		" previous arguments had type %qT",
> +		get_vector_type (new_type), argno + 1, fndecl,
> +		get_vector_type (type));
> +      return false;
> +    }
> +  return true;
> +}
> +
> +/* Require argument ARGNO to be a vector type with the following properties:
> +
> +   - the type class must be the same as FIRST_TYPE's if EXPECTED_TCLASS
> +     is SAME_TYPE_CLASS, otherwise it must be EXPECTED_TCLASS itself.
> +
> +   - the element size must be:
> +
> +     - the same as FIRST_TYPE's if EXPECTED_BITS == SAME_SIZE
> +     - half of FIRST_TYPE's if EXPECTED_BITS == HALF_SIZE
> +     - a quarter of FIRST_TYPE's if EXPECTED_BITS == QUARTER_SIZE
> +     - EXPECTED_BITS itself otherwise
> +
> +   Return true if the argument has the required type, otherwise report
> +   an appropriate error.
> +
> +   FIRST_ARGNO is the first argument that is known to have type FIRST_TYPE.
> +   Usually it comes before ARGNO, but sometimes it is more natural to resolve
> +   arguments out of order.
> +
> +   If the required properties depend on FIRST_TYPE then both FIRST_ARGNO and
> +   ARGNO contribute to the resolution process.  If the required properties
> +   are fixed, only FIRST_ARGNO contributes to the resolution process.
> +
> +   This function is a bit of a Swiss army knife.  The complication comes
> +   from trying to give good error messages when FIRST_ARGNO and ARGNO are
> +   inconsistent, since either of them might be wrong.  */
> +bool function_resolver::
> +require_derived_vector_type (unsigned int argno,
> +			     unsigned int first_argno,
> +			     type_suffix_index first_type,
> +			     type_class_index expected_tclass,
> +			     unsigned int expected_bits)
> +{
> +  /* If the type needs to match FIRST_ARGNO exactly, use the preferred
> +     error message for that case.  The VECTOR_TYPE_P test excludes tuple
> +     types, which we handle below instead.  */
> +  bool both_vectors_p = VECTOR_TYPE_P (get_argument_type (first_argno));
> +  if (both_vectors_p
> +      && expected_tclass == SAME_TYPE_CLASS
> +      && expected_bits == SAME_SIZE)
> +    {
> +      /* There's no need to resolve this case out of order.  */
> +      gcc_assert (argno > first_argno);
> +      return require_matching_vector_type (argno, first_type);
> +    }
> +
> +  /* Use FIRST_TYPE to get the expected type class and element size.  */
> +  type_class_index orig_expected_tclass = expected_tclass;
> +  if (expected_tclass == NUM_TYPE_CLASSES)
> +    expected_tclass = type_suffixes[first_type].tclass;
> +
> +  unsigned int orig_expected_bits = expected_bits;
> +  if (expected_bits == SAME_SIZE)
> +    expected_bits = type_suffixes[first_type].element_bits;
> +  else if (expected_bits == HALF_SIZE)
> +    expected_bits = type_suffixes[first_type].element_bits / 2;
> +  else if (expected_bits == QUARTER_SIZE)
> +    expected_bits = type_suffixes[first_type].element_bits / 4;
> +
> +  /* If the expected type doesn't depend on FIRST_TYPE at all,
> +     just check for the fixed choice of vector type.  */
> +  if (expected_tclass == orig_expected_tclass
> +      && expected_bits == orig_expected_bits)
> +    {
> +      const type_suffix_info &expected_suffix
> +	= type_suffixes[find_type_suffix (expected_tclass, expected_bits)];
> +      return require_vector_type (argno, expected_suffix.vector_type);
> +    }
> +
> +  /* Require the argument to be some form of MVE vector type,
> +     without being specific about the type of vector we want.  */
> +  type_suffix_index actual_type = infer_vector_type (argno);
> +  if (actual_type == NUM_TYPE_SUFFIXES)
> +    return false;
> +
> +  /* Exit now if we got the right type.  */
> +  bool tclass_ok_p = (type_suffixes[actual_type].tclass == expected_tclass);
> +  bool size_ok_p = (type_suffixes[actual_type].element_bits == expected_bits);
> +  if (tclass_ok_p && size_ok_p)
> +    return true;
> +
> +  /* First look for cases in which the actual type contravenes a fixed
> +     size requirement, without having to refer to FIRST_TYPE.  */
> +  if (!size_ok_p && expected_bits == orig_expected_bits)
> +    {
> +      error_at (location, "passing %qT to argument %d of %qE, which"
> +		" expects a vector of %d-bit elements",
> +		get_vector_type (actual_type), argno + 1, fndecl,
> +		expected_bits);
> +      return false;
> +    }
> +
> +  /* Likewise for a fixed type class requirement.  This is only ever
> +     needed for signed and unsigned types, so don't create unnecessary
> +     translation work for other type classes.  */
> +  if (!tclass_ok_p && orig_expected_tclass == TYPE_signed)
> +    {
> +      error_at (location, "passing %qT to argument %d of %qE, which"
> +		" expects a vector of signed integers",
> +		get_vector_type (actual_type), argno + 1, fndecl);
> +      return false;
> +    }
> +  if (!tclass_ok_p && orig_expected_tclass == TYPE_unsigned)
> +    {
> +      error_at (location, "passing %qT to argument %d of %qE, which"
> +		" expects a vector of unsigned integers",
> +		get_vector_type (actual_type), argno + 1, fndecl);
> +      return false;
> +    }
> +
> +  /* Make sure that FIRST_TYPE itself is sensible before using it
> +     as a basis for an error message.  */
> +  if (resolve_to (mode_suffix_id, first_type) == error_mark_node)
> +    return false;
> +
> +  /* If the arguments have consistent type classes, but a link between
> +     the sizes has been broken, try to describe the error in those terms.  */
> +  if (both_vectors_p && tclass_ok_p && orig_expected_bits == SAME_SIZE)
> +    {
> +      if (argno < first_argno)
> +	{
> +	  std::swap (argno, first_argno);
> +	  std::swap (actual_type, first_type);
> +	}
> +      error_at (location, "arguments %d and %d of %qE must have the"
> +		" same element size, but the values passed here have type"
> +		" %qT and %qT respectively", first_argno + 1, argno + 1,
> +		fndecl, get_vector_type (first_type),
> +		get_vector_type (actual_type));
> +      return false;
> +    }
> +
> +  /* Likewise in reverse: look for cases in which the sizes are consistent
> +     but a link between the type classes has been broken.  */
> +  if (both_vectors_p
> +      && size_ok_p
> +      && orig_expected_tclass == SAME_TYPE_CLASS
> +      && type_suffixes[first_type].integer_p
> +      && type_suffixes[actual_type].integer_p)
> +    {
> +      if (argno < first_argno)
> +	{
> +	  std::swap (argno, first_argno);
> +	  std::swap (actual_type, first_type);
> +	}
> +      error_at (location, "arguments %d and %d of %qE must have the"
> +		" same signedness, but the values passed here have type"
> +		" %qT and %qT respectively", first_argno + 1, argno + 1,
> +		fndecl, get_vector_type (first_type),
> +		get_vector_type (actual_type));
> +      return false;
> +    }
> +
> +  /* The two arguments are wildly inconsistent.  */
> +  type_suffix_index expected_type
> +    = find_type_suffix (expected_tclass, expected_bits);
> +  error_at (location, "passing %qT instead of the expected %qT to argument"
> +	    " %d of %qE, after passing %qT to argument %d",
> +	    get_vector_type (actual_type), get_vector_type (expected_type),
> +	    argno + 1, fndecl, get_argument_type (first_argno),
> +	    first_argno + 1);
> +  return false;
> +}
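The size-derivation rule in the comment block above can be sketched in isolation (hypothetical encodings; the real SAME_SIZE/HALF_SIZE/QUARTER_SIZE markers are internal to the port):

```cpp
#include <cassert>

// Hypothetical encodings of the framework's size markers.
constexpr unsigned int SAME_SIZE = ~0u;
constexpr unsigned int HALF_SIZE = ~0u - 1;
constexpr unsigned int QUARTER_SIZE = ~0u - 2;

// Derive the expected element size from FIRST_TYPE's element size,
// following the rules described above require_derived_vector_type.
static unsigned int
derive_element_bits (unsigned int first_type_bits, unsigned int expected_bits)
{
  if (expected_bits == SAME_SIZE)
    return first_type_bits;
  if (expected_bits == HALF_SIZE)
    return first_type_bits / 2;
  if (expected_bits == QUARTER_SIZE)
    return first_type_bits / 4;
  return expected_bits;   // a fixed size such as 16 stands for itself
}
```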
> +
> +/* Require argument ARGNO to be a (possibly variable) scalar, expecting it
> +   to have the following properties:
> +
> +   - the type class must be the same as for type suffix 0 if EXPECTED_TCLASS
> +     is SAME_TYPE_CLASS, otherwise it must be EXPECTED_TCLASS itself.
> +
> +   - the element size must be the same as for type suffix 0 if EXPECTED_BITS
> +     is SAME_SIZE, otherwise it must be EXPECTED_BITS itself.
> +
> +   Return true if the argument is valid, otherwise report an appropriate error.
> +
> +   Note that we don't check whether the scalar type actually has the required
> +   properties, since that's subject to implicit promotions and conversions.
> +   Instead we just use the expected properties to tune the error message.  */
> +bool function_resolver::
> +require_derived_scalar_type (unsigned int argno,
> +			     type_class_index expected_tclass,
> +			     unsigned int expected_bits)
> +{
> +  gcc_assert (expected_tclass == SAME_TYPE_CLASS
> +	      || expected_tclass == TYPE_signed
> +	      || expected_tclass == TYPE_unsigned);
> +
> +  /* If the expected type doesn't depend on the type suffix at all,
> +     just check for the fixed choice of scalar type.  */
> +  if (expected_tclass != SAME_TYPE_CLASS && expected_bits != SAME_SIZE)
> +    {
> +      type_suffix_index expected_type
> +	= find_type_suffix (expected_tclass, expected_bits);
> +      return require_scalar_type (argno, get_scalar_type_name
> (expected_type));
> +    }
> +
> +  if (scalar_argument_p (argno))
> +    return true;
> +
> +  if (expected_tclass == SAME_TYPE_CLASS)
> +    /* It doesn't really matter whether the element is expected to be
> +       the same size as type suffix 0.  */
> +    error_at (location, "passing %qT to argument %d of %qE, which"
> +	      " expects a scalar element", get_argument_type (argno),
> +	      argno + 1, fndecl);
> +  else
> +    /* It doesn't seem useful to distinguish between signed and unsigned
> +       scalars here.  */
> +    error_at (location, "passing %qT to argument %d of %qE, which"
> +	      " expects a scalar integer", get_argument_type (argno),
> +	      argno + 1, fndecl);
> +  return false;
> +}
> +
> +/* Require argument ARGNO to be suitable for an integer constant expression.
> +   Return true if it is, otherwise report an appropriate error.
> +
> +   function_checker checks whether the argument is actually constant and
> +   has a suitable range.  The reason for distinguishing immediate arguments
> +   here is because it provides more consistent error messages than
> +   require_scalar_type would.  */
> +bool
> +function_resolver::require_integer_immediate (unsigned int argno)
> +{
> +  if (!scalar_argument_p (argno))
> +    {
> +      report_non_ice (location, fndecl, argno);
> +      return false;
> +    }
> +  return true;
> +}
> +
> +/* Require argument ARGNO to be a (possibly variable) scalar, using EXPECTED
> +   as the name of its expected type.  Return true if the argument has the
> +   right form, otherwise report an appropriate error.  */
> +bool
> +function_resolver::require_scalar_type (unsigned int argno,
> +					const char *expected)
> +{
> +  if (!scalar_argument_p (argno))
> +    {
> +      error_at (location, "passing %qT to argument %d of %qE, which"
> +		" expects %qs", get_argument_type (argno), argno + 1,
> +		fndecl, expected);
> +      return false;
> +    }
> +  return true;
> +}
> +
> +/* Require the function to have exactly EXPECTED arguments.  Return true
> +   if it does, otherwise report an appropriate error.  */
> +bool
> +function_resolver::check_num_arguments (unsigned int expected)
> +{
> +  if (m_arglist.length () < expected)
> +    error_at (location, "too few arguments to function %qE", fndecl);
> +  else if (m_arglist.length () > expected)
> +    error_at (location, "too many arguments to function %qE", fndecl);
> +  return m_arglist.length () == expected;
> +}
> +
> +/* If the function is predicated, check that the last argument is a
> +   suitable predicate.  Also check that there are NOPS further
> +   arguments before any predicate, but don't check what they are.
> +
> +   Return true on success, otherwise report a suitable error.
> +   When returning true:
> +
> +   - set I to the number of the last unchecked argument.
> +   - set NARGS to the total number of arguments.  */
> +bool
> +function_resolver::check_gp_argument (unsigned int nops,
> +				      unsigned int &i, unsigned int &nargs)
> +{
> +  i = nops - 1;
> +  if (pred != PRED_none)
> +    {
> +      switch (pred)
> +	{
> +	case PRED_m:
> +	  /* Add first inactive argument if needed, and final predicate.  */
> +	  if (has_inactive_argument ())
> +	    nargs = nops + 2;
> +	  else
> +	    nargs = nops + 1;
> +	  break;
> +
> +	case PRED_p:
> +	case PRED_x:
> +	  /* Add final predicate.  */
> +	  nargs = nops + 1;
> +	  break;
> +
> +	default:
> +	  gcc_unreachable ();
> +	}
> +
> +      if (!check_num_arguments (nargs)
> +	  || !require_vector_type (nargs - 1, VECTOR_TYPE_mve_pred16_t))
> +	return false;
> +
> +      i = nargs - 2;
> +    }
> +  else
> +    {
> +      nargs = nops;
> +      if (!check_num_arguments (nargs))
> +	return false;
> +    }
> +
> +  return true;
> +}
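The argument-count rule in check_gp_argument is easy to tabulate; a minimal sketch (hypothetical enum, mirroring the switch above): PRED_m adds the "inactive" first argument (when the function has one) plus the final predicate, while PRED_x/PRED_p add only the predicate.

```cpp
#include <cassert>

// Hypothetical predication kinds, mirroring the patch's PRED_* values.
enum pred_kind { PRED_none, PRED_m, PRED_x, PRED_p };

// Total number of arguments for a call with NOPS operands, given its
// predication kind and whether it takes an "inactive" first argument.
static unsigned int
total_args (unsigned int nops, pred_kind pred, bool has_inactive)
{
  switch (pred)
    {
    case PRED_m:
      return nops + (has_inactive ? 2 : 1);
    case PRED_x:
    case PRED_p:
      return nops + 1;
    default:
      return nops;
    }
}
```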
> +
> +/* Finish resolving a function whose final argument can be a vector
> +   or a scalar, with the function having an implicit "_n" suffix
> +   in the latter case.  This "_n" form might only exist for certain
> +   type suffixes.
> +
> +   ARGNO is the index of the final argument.  The inferred type suffix
> +   was obtained from argument FIRST_ARGNO, which has type FIRST_TYPE.
> +   EXPECTED_TCLASS and EXPECTED_BITS describe the expected properties
> +   of the final vector or scalar argument, in the same way as for
> +   require_derived_vector_type.  INFERRED_TYPE is the inferred type
> +   suffix itself, or NUM_TYPE_SUFFIXES if it's the same as FIRST_TYPE.
> +
> +   Return the function decl of the resolved function on success,
> +   otherwise report a suitable error and return error_mark_node.  */
> +tree function_resolver::
> +finish_opt_n_resolution (unsigned int argno, unsigned int first_argno,
> +			 type_suffix_index first_type,
> +			 type_class_index expected_tclass,
> +			 unsigned int expected_bits,
> +			 type_suffix_index inferred_type)
> +{
> +  if (inferred_type == NUM_TYPE_SUFFIXES)
> +    inferred_type = first_type;
> +  tree scalar_form = lookup_form (MODE_n, inferred_type);
> +
> +  /* Allow the final argument to be scalar, if an _n form exists.  */
> +  if (scalar_argument_p (argno))
> +    {
> +      if (scalar_form)
> +	return scalar_form;
> +
> +      /* Check the vector form normally.  If that succeeds, raise an
> +	 error about having no corresponding _n form.  */
> +      tree res = resolve_to (mode_suffix_id, inferred_type);
> +      if (res != error_mark_node)
> +	error_at (location, "passing %qT to argument %d of %qE, but its"
> +		  " %qT form does not accept scalars",
> +		  get_argument_type (argno), argno + 1, fndecl,
> +		  get_vector_type (first_type));
> +      return error_mark_node;
> +    }
> +
> +  /* If an _n form does exist, provide a more accurate message than
> +     require_derived_vector_type would for arguments that are neither
> +     vectors nor scalars.  */
> +  if (scalar_form && !require_vector_or_scalar_type (argno))
> +    return error_mark_node;
> +
> +  /* Check for the correct vector type.  */
> +  if (!require_derived_vector_type (argno, first_argno, first_type,
> +				    expected_tclass, expected_bits))
> +    return error_mark_node;
> +
> +  return resolve_to (mode_suffix_id, inferred_type);
> +}
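The "_n" dispatch above boils down to a two-way decision; a sketch with illustrative names (vaddq is just an example intrinsic, and the strings stand in for the resolved decls):

```cpp
#include <cassert>
#include <string>

// Illustrative dispatch for finish_opt_n_resolution: a scalar final
// argument selects the "_n" form when one exists; a vector final
// argument (assumed to type-check here) resolves to the vector form.
static std::string
resolve_opt_n (bool last_arg_is_scalar, bool n_form_exists)
{
  if (last_arg_is_scalar)
    return n_form_exists ? "vaddq_n_s32" : "error: no _n form";
  return "vaddq_s32";
}
```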
> +
> +/* Resolve a (possibly predicated) unary function.  If the function uses
> +   merge predication or if TREAT_AS_MERGE_P is true, there is an extra
> +   vector argument before the governing predicate that specifies the
> +   values of inactive elements.  This argument has the following
> +   properties:
> +
> +   - the type class must be the same as for active elements if MERGE_TCLASS
> +     is SAME_TYPE_CLASS, otherwise it must be MERGE_TCLASS itself.
> +
> +   - the element size must be the same as for active elements if MERGE_BITS
> +     is SAME_SIZE, otherwise it must be MERGE_BITS itself.
> +
> +   Return the function decl of the resolved function on success,
> +   otherwise report a suitable error and return error_mark_node.  */
> +tree
> +function_resolver::resolve_unary (type_class_index merge_tclass,
> +				  unsigned int merge_bits,
> +				  bool treat_as_merge_p)
> +{
> +  type_suffix_index type;
> +  if (pred == PRED_m || treat_as_merge_p)
> +    {
> +      if (!check_num_arguments (3))
> +	return error_mark_node;
> +      if (merge_tclass == SAME_TYPE_CLASS && merge_bits == SAME_SIZE)
> +	{
> +	  /* The inactive elements are the same as the active elements,
> +	     so we can use normal left-to-right resolution.  */
> +	  if ((type = infer_vector_type (0)) == NUM_TYPE_SUFFIXES
> +	      /* Predicates are the last argument.  */
> +	      || !require_vector_type (2, VECTOR_TYPE_mve_pred16_t)
> +	      || !require_matching_vector_type (1, type))
> +	    return error_mark_node;
> +	}
> +      else
> +	{
> +	  /* The inactive element type is a function of the active one,
> +	     so resolve the active one first.  */
> +	  if (!require_vector_type (1, VECTOR_TYPE_mve_pred16_t)
> +	      || (type = infer_vector_type (2)) == NUM_TYPE_SUFFIXES
> +	      || !require_derived_vector_type (0, 2, type, merge_tclass,
> +					       merge_bits))
> +	    return error_mark_node;
> +	}
> +    }
> +  else
> +    {
> +      /* We just need to check the predicate (if any) and the single
> +	 vector argument.  */
> +      unsigned int i, nargs;
> +      if (!check_gp_argument (1, i, nargs)
> +	  || (type = infer_vector_type (i)) == NUM_TYPE_SUFFIXES)
> +	return error_mark_node;
> +    }
> +
> +  /* Handle convert-like functions in which the first type suffix is
> +     explicit.  */
> +  if (type_suffix_ids[0] != NUM_TYPE_SUFFIXES)
> +    return resolve_to (mode_suffix_id, type_suffix_ids[0], type);
> +
> +  return resolve_to (mode_suffix_id, type);
> +}
> +
> +/* Resolve a (possibly predicated) unary function taking a scalar
> +   argument (_n suffix).  If the function uses merge predication,
> +   there is an extra vector argument in the first position, and the
> +   final argument is the governing predicate, which specifies the
> +   values of inactive elements.
> +
> +   Return the function decl of the resolved function on success,
> +   otherwise report a suitable error and return error_mark_node.  */
> +tree
> +function_resolver::resolve_unary_n ()
> +{
> +  type_suffix_index type;
> +
> +  /* We currently only support overrides for _m (vdupq).  */
> +  if (pred != PRED_m)
> +    return error_mark_node;
> +
> +  if (pred == PRED_m)
> +    {
> +      if (!check_num_arguments (3))
> +	return error_mark_node;
> +
> +      /* The inactive elements are the same as the active elements,
> +	 so we can use normal left-to-right resolution.  */
> +      if ((type = infer_vector_type (0)) == NUM_TYPE_SUFFIXES
> +	  /* Predicates are the last argument.  */
> +	  || !require_vector_type (2, VECTOR_TYPE_mve_pred16_t))
> +	return error_mark_node;
> +    }
> +
> +  /* Make sure the argument is scalar.  */
> +  tree scalar_form = lookup_form (MODE_n, type);
> +
> +  if (scalar_argument_p (1) && scalar_form)
> +    return scalar_form;
> +
> +  return error_mark_node;
> +}
> +
> +/* Resolve a (possibly predicated) function that takes NOPS like-typed
> +   vector arguments followed by NIMM integer immediates.  Return the
> +   function decl of the resolved function on success, otherwise report
> +   a suitable error and return error_mark_node.  */
> +tree
> +function_resolver::resolve_uniform (unsigned int nops, unsigned int nimm)
> +{
> +  unsigned int i, nargs;
> +  type_suffix_index type;
> +  if (!check_gp_argument (nops + nimm, i, nargs)
> +      || (type = infer_vector_type (0)) == NUM_TYPE_SUFFIXES)
> +    return error_mark_node;
> +
> +  unsigned int last_arg = i + 1 - nimm;
> +  for (i = 0; i < last_arg; i++)
> +    if (!require_matching_vector_type (i, type))
> +      return error_mark_node;
> +
> +  for (i = last_arg; i < nargs; ++i)
> +    if (!require_integer_immediate (i))
> +      return error_mark_node;
> +
> +  return resolve_to (mode_suffix_id, type);
> +}
> +
> +/* Resolve a (possibly predicated) function that offers a choice between
> +   taking:
> +
> +   - NOPS like-typed vector arguments or
> +   - NOPS - 1 like-typed vector arguments followed by a scalar argument
> +
> +   Return the function decl of the resolved function on success,
> +   otherwise report a suitable error and return error_mark_node.  */
> +tree
> +function_resolver::resolve_uniform_opt_n (unsigned int nops)
> +{
> +  unsigned int i, nargs;
> +  type_suffix_index type;
> +  if (!check_gp_argument (nops, i, nargs)
> +      /* Unary operators should use resolve_unary, so using i - 1 is
> +	 safe.  */
> +      || (type = infer_vector_type (i - 1)) == NUM_TYPE_SUFFIXES)
> +    return error_mark_node;
> +
> +  /* Skip the last argument, which may be scalar.  */
> +  unsigned int last_arg = i;
> +  for (i = 0; i < last_arg; i++)
> +    if (!require_matching_vector_type (i, type))
> +      return error_mark_node;
> +
> +  return finish_opt_n_resolution (last_arg, 0, type);
> +}
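The opt_n resolution rule above can be sketched outside GCC as a tiny standalone C model (the names here are hypothetical, not part of the patch): the overload keeps the plain mode suffix when the final operand is a vector and switches to the "_n" form when it is scalar, as in `vaddq (v, v2)` resolving to `vaddq_s32` versus `vaddq (v, 2)` resolving to `vaddq_n_s32`.

```c
#include <string.h>

/* Hypothetical model of resolve_uniform_opt_n: the kind of the last
   argument selects between the vector-vector form (empty mode suffix)
   and the vector-scalar "_n" form.  */
typedef enum { ARG_VECTOR, ARG_SCALAR } arg_kind;

static const char *
pick_mode_suffix (arg_kind last_arg)
{
  /* A scalar final operand picks the "_n" form; a vector picks the
     unsuffixed form.  */
  return last_arg == ARG_SCALAR ? "_n" : "";
}
```

The real resolver additionally infers the type suffix from the arguments; this sketch only captures the mode-suffix choice.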
> +
> +/* If the call is erroneous, report an appropriate error and return
> +   error_mark_node.  Otherwise, if the function is overloaded, return
> +   the decl of the non-overloaded function.  Return NULL_TREE otherwise,
> +   indicating that the call should be processed in the normal way.  */
> +tree
> +function_resolver::resolve ()
> +{
> +  return shape->resolve (*this);
> +}
> +
> +function_checker::function_checker (location_t location,
> +				    const function_instance &instance,
> +				    tree fndecl, tree fntype,
> +				    unsigned int nargs, tree *args)
> +  : function_call_info (location, instance, fndecl),
> +    m_fntype (fntype), m_nargs (nargs), m_args (args)
> +{
> +  if (instance.has_inactive_argument ())
> +    m_base_arg = 1;
> +  else
> +    m_base_arg = 0;
> +}
> +
> +/* Return true if argument ARGNO exists, which it might not for
> +   erroneous calls.  It is safe to wave through checks if this
> +   function returns false.  */
> +bool
> +function_checker::argument_exists_p (unsigned int argno)
> +{
> +  gcc_assert (argno < (unsigned int) type_num_arguments (m_fntype));
> +  return argno < m_nargs;
> +}
> +
> +/* Check that argument ARGNO is an integer constant expression and
> +   store its value in VALUE_OUT if so.  The caller should first
> +   check that argument ARGNO exists.  */
> +bool
> +function_checker::require_immediate (unsigned int argno,
> +				     HOST_WIDE_INT &value_out)
> +{
> +  gcc_assert (argno < m_nargs);
> +  tree arg = m_args[argno];
> +
> +  /* The type and range are unsigned, so read the argument as an
> +     unsigned rather than signed HWI.  */
> +  if (!tree_fits_uhwi_p (arg))
> +    {
> +      report_non_ice (location, fndecl, argno);
> +      return false;
> +    }
> +
> +  /* ...but treat VALUE_OUT as signed for error reporting, since printing
> +     -1 is more user-friendly than the maximum uint64_t value.  */
> +  value_out = tree_to_uhwi (arg);
> +  return true;
> +}
> +
> +/* Check that argument REL_ARGNO is an integer constant expression that has
> +   a valid value for enumeration type TYPE.  REL_ARGNO counts from the end
> +   of the predication arguments.  */
> +bool
> +function_checker::require_immediate_enum (unsigned int rel_argno, tree type)
> +{
> +  unsigned int argno = m_base_arg + rel_argno;
> +  if (!argument_exists_p (argno))
> +    return true;
> +
> +  HOST_WIDE_INT actual;
> +  if (!require_immediate (argno, actual))
> +    return false;
> +
> +  for (tree entry = TYPE_VALUES (type); entry; entry = TREE_CHAIN (entry))
> +    {
> +      /* The value is an INTEGER_CST for C and a CONST_DECL wrapper
> +	 around an INTEGER_CST for C++.  */
> +      tree value = TREE_VALUE (entry);
> +      if (TREE_CODE (value) == CONST_DECL)
> +	value = DECL_INITIAL (value);
> +      if (wi::to_widest (value) == actual)
> +	return true;
> +    }
> +
> +  report_not_enum (location, fndecl, argno, actual, type);
> +  return false;
> +}
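The enum walk above reduces to a membership test on the already-extracted constant. As a plain-C sketch (not GCC internals; the helper name is made up):

```c
/* Hypothetical model of require_immediate_enum: accept the constant
   only if it equals one of the enumeration's N values.  */
static int
valid_enum_value (long long actual, const long long *values, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    if (values[i] == actual)
      return 1;
  return 0;
}
```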
> +
> +/* Check that argument REL_ARGNO is an integer constant expression in the
> +   range [MIN, MAX].  REL_ARGNO counts from the end of the predication
> +   arguments.  */
> +bool
> +function_checker::require_immediate_range (unsigned int rel_argno,
> +					   HOST_WIDE_INT min,
> +					   HOST_WIDE_INT max)
> +{
> +  unsigned int argno = m_base_arg + rel_argno;
> +  if (!argument_exists_p (argno))
> +    return true;
> +
> +  /* Required because of the tree_to_uhwi -> HOST_WIDE_INT conversion
> +     in require_immediate.  */
> +  gcc_assert (min >= 0 && min <= max);
> +  HOST_WIDE_INT actual;
> +  if (!require_immediate (argno, actual))
> +    return false;
> +
> +  if (!IN_RANGE (actual, min, max))
> +    {
> +      report_out_of_range (location, fndecl, argno, actual, min, max);
> +      return false;
> +    }
> +
> +  return true;
> +}
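After the constant has been read, the range check above is just a closed-interval test, which is what `IN_RANGE` expands to. As a standalone sketch (plain C, not the GCC macro):

```c
/* Hypothetical model of the [MIN, MAX] immediate check done by
   require_immediate_range via IN_RANGE.  */
static int
immediate_in_range (long long actual, long long min, long long max)
{
  /* Both endpoints are inclusive.  */
  return min <= actual && actual <= max;
}
```

For example, a hypothetical shift-immediate intrinsic whose ACLE range is [1, 16] would accept 1 and 16 but reject 0 and 17.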
> +
> +/* Perform semantic checks on the call.  Return true if the call is valid,
> +   otherwise report a suitable error.  */
> +bool
> +function_checker::check ()
> +{
> +  function_args_iterator iter;
> +  tree type;
> +  unsigned int i = 0;
> +  FOREACH_FUNCTION_ARGS (m_fntype, type, iter)
> +    {
> +      if (type == void_type_node || i >= m_nargs)
> +	break;
> +
> +      if (i >= m_base_arg
> +	  && TREE_CODE (type) == ENUMERAL_TYPE
> +	  && !require_immediate_enum (i - m_base_arg, type))
> +	return false;
> +
> +      i += 1;
> +    }
> +
> +  return shape->check (*this);
> +}
> +
> +gimple_folder::gimple_folder (const function_instance &instance, tree fndecl,
> +			      gcall *call_in)
> +  : function_call_info (gimple_location (call_in), instance, fndecl),
> +    call (call_in), lhs (gimple_call_lhs (call_in))
> +{
> +}
> +
> +/* Try to fold the call.  Return the new statement on success and null
> +   on failure.  */
> +gimple *
> +gimple_folder::fold ()
> +{
> +  /* Don't fold anything when MVE is disabled; emit an error during
> +     expansion instead.  */
> +  if (!TARGET_HAVE_MVE)
> +    return NULL;
> +
> +  /* Punt if the function has a return type and no result location is
> +     provided.  The attributes should allow target-independent code to
> +     remove the calls if appropriate.  */
> +  if (!lhs && TREE_TYPE (gimple_call_fntype (call)) != void_type_node)
> +    return NULL;
> +
> +  return base->fold (*this);
> +}
> +
> +function_expander::function_expander (const function_instance &instance,
> +				      tree fndecl, tree call_expr_in,
> +				      rtx possible_target_in)
> +  : function_call_info (EXPR_LOCATION (call_expr_in), instance, fndecl),
> +    call_expr (call_expr_in), possible_target (possible_target_in)
> +{
> +}
> +
> +/* Return the handler of direct optab OP for type suffix SUFFIX_I.  */
> +insn_code
> +function_expander::direct_optab_handler (optab op, unsigned int suffix_i)
> +{
> +  return ::direct_optab_handler (op, vector_mode (suffix_i));
> +}
> +
> +/* For a function that does the equivalent of:
> +
> +     OUTPUT = COND ? FN (INPUTS) : FALLBACK;
> +
> +   return the value of FALLBACK.
> +
> +   MODE is the mode of OUTPUT.
> +   MERGE_ARGNO is the argument that provides FALLBACK for _m functions,
> +   or DEFAULT_MERGE_ARGNO if we should apply the usual rules.
> +
> +   ARGNO is the caller's index into args.  If the returned value is
> +   argument 0 (as for unary _m operations), increment ARGNO past the
> +   returned argument.  */
> +rtx
> +function_expander::get_fallback_value (machine_mode mode,
> +				       unsigned int merge_argno,
> +				       unsigned int &argno)
> +{
> +  if (pred == PRED_z)
> +    return CONST0_RTX (mode);
> +
> +  gcc_assert (pred == PRED_m || pred == PRED_x);
> +
> +  if (merge_argno == 0)
> +    return args[argno++];
> +
> +  return args[merge_argno];
> +}
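The fallback selection above can be modelled with plain integers (a hypothetical standalone sketch, not GCC code): `_z` predication yields zero, while `_m`/`_x` take the fallback from the merge argument, consuming argument 0 when it doubles as the "inactive" input.

```c
/* Hypothetical scalar model of get_fallback_value.  */
typedef enum { MODEL_PRED_M, MODEL_PRED_X, MODEL_PRED_Z } model_pred;

static int
model_fallback_value (model_pred pred, const int *args,
		      unsigned merge_argno, unsigned *argno)
{
  if (pred == MODEL_PRED_Z)
    return 0;			/* Zero predication: inactive lanes are 0.  */
  if (merge_argno == 0)
    return args[(*argno)++];	/* Consume argument 0 as the fallback.  */
  return args[merge_argno];	/* Explicit merge argument elsewhere.  */
}
```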
> +
> +/* Return a REG rtx that can be used for the result of the function,
> +   using the preferred target if suitable.  */
> +rtx
> +function_expander::get_reg_target ()
> +{
> +  machine_mode target_mode = TYPE_MODE (TREE_TYPE (TREE_TYPE (fndecl)));
> +  if (!possible_target || GET_MODE (possible_target) != target_mode)
> +    possible_target = gen_reg_rtx (target_mode);
> +  return possible_target;
> +}
> +
> +/* Add an output operand to the instruction we're building, which has
> +   code ICODE.  Bind the output to the preferred target rtx if possible.  */
> +void
> +function_expander::add_output_operand (insn_code icode)
> +{
> +  unsigned int opno = m_ops.length ();
> +  machine_mode mode = insn_data[icode].operand[opno].mode;
> +  m_ops.safe_grow (opno + 1, true);
> +  create_output_operand (&m_ops.last (), possible_target, mode);
> +}
> +
> +/* Add an input operand to the instruction we're building, which has
> +   code ICODE.  Calculate the value of the operand as follows:
> +
> +   - If the operand is a predicate, coerce X to have the
> +     mode that the instruction expects.
> +
> +   - Otherwise use X directly.  The expand machinery checks that X has
> +     the right mode for the instruction.  */
> +void
> +function_expander::add_input_operand (insn_code icode, rtx x)
> +{
> +  unsigned int opno = m_ops.length ();
> +  const insn_operand_data &operand = insn_data[icode].operand[opno];
> +  machine_mode mode = operand.mode;
> +  if (mode == VOIDmode)
> +    {
> +      /* The only allowable use of VOIDmode is the wildcard
> +	 arm_any_register_operand, which is used to avoid
> +	 combinatorial explosion in the reinterpret patterns.  */
> +      gcc_assert (operand.predicate == arm_any_register_operand);
> +      mode = GET_MODE (x);
> +    }
> +  else if (VALID_MVE_PRED_MODE (mode))
> +    x = gen_lowpart (mode, x);
> +
> +  m_ops.safe_grow (m_ops.length () + 1, true);
> +  create_input_operand (&m_ops.last (), x, mode);
> +}
> +
> +/* Add an integer operand with value X to the instruction.  */
> +void
> +function_expander::add_integer_operand (HOST_WIDE_INT x)
> +{
> +  m_ops.safe_grow (m_ops.length () + 1, true);
> +  create_integer_operand (&m_ops.last (), x);
> +}
> +
> +/* Generate instruction ICODE, given that its operands have already
> +   been added to M_OPS.  Return the value of the first operand.  */
> +rtx
> +function_expander::generate_insn (insn_code icode)
> +{
> +  expand_insn (icode, m_ops.length (), m_ops.address ());
> +  return function_returns_void_p () ? const0_rtx : m_ops[0].value;
> +}
> +
> +/* Implement the call using instruction ICODE, with a 1:1 mapping between
> +   arguments and input operands.  */
> +rtx
> +function_expander::use_exact_insn (insn_code icode)
> +{
> +  unsigned int nops = insn_data[icode].n_operands;
> +  if (!function_returns_void_p ())
> +    {
> +      add_output_operand (icode);
> +      nops -= 1;
> +    }
> +  for (unsigned int i = 0; i < nops; ++i)
> +    add_input_operand (icode, args[i]);
> +  return generate_insn (icode);
> +}
> +
> +/* Implement the call using instruction ICODE, which does not use a
> +   predicate.  */
> +rtx
> +function_expander::use_unpred_insn (insn_code icode)
> +{
> +  gcc_assert (pred == PRED_none);
> +  /* Discount the output operand.  */
> +  unsigned int nops = insn_data[icode].n_operands - 1;
> +  unsigned int i = 0;
> +
> +  add_output_operand (icode);
> +  for (; i < nops; ++i)
> +    add_input_operand (icode, args[i]);
> +
> +  return generate_insn (icode);
> +}
> +
> +/* Implement the call using instruction ICODE, which is a predicated
> +   operation that returns arbitrary values for inactive lanes.  */
> +rtx
> +function_expander::use_pred_x_insn (insn_code icode)
> +{
> +  gcc_assert (pred == PRED_x);
> +  unsigned int nops = args.length ();
> +
> +  add_output_operand (icode);
> +  /* Use first operand as arbitrary inactive input.  */
> +  add_input_operand (icode, possible_target);
> +  emit_clobber (possible_target);
> +  /* Copy remaining arguments, including the final predicate.  */
> +  for (unsigned int i = 0; i < nops; ++i)
> +    add_input_operand (icode, args[i]);
> +
> +  return generate_insn (icode);
> +}
> +
> +/* Implement the call using instruction ICODE, which does the equivalent of:
> +
> +     OUTPUT = COND ? FN (INPUTS) : FALLBACK;
> +
> +   The instruction operands are in the order above: OUTPUT, COND, INPUTS
> +   and FALLBACK.  MERGE_ARGNO is the argument that provides FALLBACK for _m
> +   functions, or DEFAULT_MERGE_ARGNO if we should apply the usual rules.  */
> +rtx
> +function_expander::use_cond_insn (insn_code icode, unsigned int merge_argno)
> +{
> +  /* At present we never need to handle PRED_none, which would involve
> +     creating a new predicate rather than using one supplied by the user.
> +     For MVE, we only handle PRED_m so far.  */
> +  gcc_assert (pred == PRED_m);
> +
> +  /* Discount the output, predicate and fallback value.  */
> +  unsigned int nops = insn_data[icode].n_operands - 3;
> +  machine_mode mode = insn_data[icode].operand[0].mode;
> +
> +  unsigned int opno = 0;
> +  rtx fallback_arg = get_fallback_value (mode, merge_argno, opno);
> +  rtx pred_arg = args[nops + 1];
> +
> +  add_output_operand (icode);
> +  add_input_operand (icode, fallback_arg);
> +  for (unsigned int i = 0; i < nops; ++i)
> +    add_input_operand (icode, args[opno + i]);
> +  add_input_operand (icode, pred_arg);
> +  return generate_insn (icode);
> +}
> +
> +/* Implement the call using a normal unpredicated optab for PRED_none.
> +
> +   The rtx code used to look up the optab is:
> +
> +   - CODE_FOR_SINT for signed integers
> +   - CODE_FOR_UINT for unsigned integers
> +   - CODE_FOR_FP for floating-point values  */
> +rtx
> +function_expander::map_to_rtx_codes (rtx_code code_for_sint,
> +				     rtx_code code_for_uint,
> +				     rtx_code code_for_fp)
> +{
> +  gcc_assert (pred == PRED_none);
> +  rtx_code code = (type_suffix (0).integer_p
> +		   ? (type_suffix (0).unsigned_p ? code_for_uint : code_for_sint)
> +		   : code_for_fp);
> +  insn_code icode = direct_optab_handler (code_to_optab (code), 0);
> +  if (icode == CODE_FOR_nothing)
> +    gcc_unreachable ();
> +
> +  return use_unpred_insn (icode);
> +}
> +
> +/* Expand the call and return its lhs.  */
> +rtx
> +function_expander::expand ()
> +{
> +  unsigned int nargs = call_expr_nargs (call_expr);
> +  args.reserve (nargs);
> +  for (unsigned int i = 0; i < nargs; ++i)
> +    args.quick_push (expand_normal (CALL_EXPR_ARG (call_expr, i)));
> +
> +  return base->expand (*this);
> +}
> +
> +/* If we're implementing manual overloading, check whether the MVE
> +   function with subcode CODE is overloaded, and if so attempt to
> +   determine the corresponding non-overloaded function.  The call
> +   occurs at location LOCATION and has the arguments given by ARGLIST.
> +
> +   If the call is erroneous, report an appropriate error and return
> +   error_mark_node.  Otherwise, if the function is overloaded, return
> +   the decl of the non-overloaded function.  Return NULL_TREE otherwise,
> +   indicating that the call should be processed in the normal way.  */
> +tree
> +resolve_overloaded_builtin (location_t location, unsigned int code,
> +			    vec<tree, va_gc> *arglist)
> +{
> +  if (code >= vec_safe_length (registered_functions))
> +    return NULL_TREE;
> +
> +  registered_function &rfn = *(*registered_functions)[code];
> +  if (rfn.overloaded_p)
> +    return function_resolver (location, rfn.instance, rfn.decl,
> +			      *arglist).resolve ();
> +  return NULL_TREE;
> +}
> +
> +/* Perform any semantic checks needed for a call to the MVE function
> +   with subcode CODE, such as testing for integer constant expressions.
> +   The call occurs at location LOCATION and has NARGS arguments,
> +   given by ARGS.  FNDECL is the original function decl, before
> +   overload resolution.
> +
> +   Return true if the call is valid, otherwise report a suitable error.  */
> +bool
> +check_builtin_call (location_t location, vec<location_t>, unsigned int code,
> +		    tree fndecl, unsigned int nargs, tree *args)
> +{
> +  const registered_function &rfn = *(*registered_functions)[code];
> +  if (!check_requires_float (location, rfn.decl, rfn.requires_float))
> +    return false;
> +
> +  return function_checker (location, rfn.instance, fndecl,
> +			   TREE_TYPE (rfn.decl), nargs, args).check ();
> +}
> +
> +/* Attempt to fold STMT, given that it's a call to the MVE function
> +   with subcode CODE.  Return the new statement on success and null
> +   on failure.  Insert any other new statements at GSI.  */
> +gimple *
> +gimple_fold_builtin (unsigned int code, gcall *stmt)
> +{
> +  registered_function &rfn = *(*registered_functions)[code];
> +  return gimple_folder (rfn.instance, rfn.decl, stmt).fold ();
> +}
> +
> +/* Expand a call to the MVE function with subcode CODE.  EXP is the call
> +   expression and TARGET is the preferred location for the result.
> +   Return the value of the lhs.  */
> +rtx
> +expand_builtin (unsigned int code, tree exp, rtx target)
> +{
> +  registered_function &rfn = *(*registered_functions)[code];
> +  if (!check_requires_float (EXPR_LOCATION (exp), rfn.decl,
> +			    rfn.requires_float))
> +    return target;
> +  return function_expander (rfn.instance, rfn.decl, exp, target).expand ();
> +}
> +
> +} /* end namespace arm_mve */
> +
> +using namespace arm_mve;
> +
> +inline void
> +gt_ggc_mx (function_instance *)
> +{
> +}
> +
> +inline void
> +gt_pch_nx (function_instance *)
> +{
> +}
> +
> +inline void
> +gt_pch_nx (function_instance *, gt_pointer_operator, void *)
> +{
> +}
> 
>  #include "gt-arm-mve-builtins.h"
> diff --git a/gcc/config/arm/arm-mve-builtins.def b/gcc/config/arm/arm-mve-builtins.def
> index 69f3f81b473..49d07364fa2 100644
> --- a/gcc/config/arm/arm-mve-builtins.def
> +++ b/gcc/config/arm/arm-mve-builtins.def
> @@ -17,10 +17,25 @@
>     along with GCC; see the file COPYING3.  If not see
>     <http://www.gnu.org/licenses/>.  */
> 
> +#ifndef DEF_MVE_MODE
> +#define DEF_MVE_MODE(A, B, C, D)
> +#endif
> +
>  #ifndef DEF_MVE_TYPE
> -#error "arm-mve-builtins.def included without defining DEF_MVE_TYPE"
> +#define DEF_MVE_TYPE(A, B)
> +#endif
> +
> +#ifndef DEF_MVE_TYPE_SUFFIX
> +#define DEF_MVE_TYPE_SUFFIX(A, B, C, D, E)
>  #endif
> 
> +#ifndef DEF_MVE_FUNCTION
> +#define DEF_MVE_FUNCTION(A, B, C, D)
> +#endif
> +
> +DEF_MVE_MODE (n, none, none, none)
> +DEF_MVE_MODE (offset, none, none, bytes)
> +
>  #define REQUIRES_FLOAT false
>  DEF_MVE_TYPE (mve_pred16_t, boolean_type_node)
>  DEF_MVE_TYPE (uint8x16_t, unsigned_intQI_type_node)
> @@ -37,3 +52,26 @@ DEF_MVE_TYPE (int64x2_t, intDI_type_node)
>  DEF_MVE_TYPE (float16x8_t, arm_fp16_type_node)
>  DEF_MVE_TYPE (float32x4_t, float_type_node)
>  #undef REQUIRES_FLOAT
> +
> +#define REQUIRES_FLOAT false
> +DEF_MVE_TYPE_SUFFIX (s8, int8x16_t, signed, 8, V16QImode)
> +DEF_MVE_TYPE_SUFFIX (s16, int16x8_t, signed, 16, V8HImode)
> +DEF_MVE_TYPE_SUFFIX (s32, int32x4_t, signed, 32, V4SImode)
> +DEF_MVE_TYPE_SUFFIX (s64, int64x2_t, signed, 64, V2DImode)
> +DEF_MVE_TYPE_SUFFIX (u8, uint8x16_t, unsigned, 8, V16QImode)
> +DEF_MVE_TYPE_SUFFIX (u16, uint16x8_t, unsigned, 16, V8HImode)
> +DEF_MVE_TYPE_SUFFIX (u32, uint32x4_t, unsigned, 32, V4SImode)
> +DEF_MVE_TYPE_SUFFIX (u64, uint64x2_t, unsigned, 64, V2DImode)
> +#undef REQUIRES_FLOAT
> +
> +#define REQUIRES_FLOAT true
> +DEF_MVE_TYPE_SUFFIX (f16, float16x8_t, float, 16, V8HFmode)
> +DEF_MVE_TYPE_SUFFIX (f32, float32x4_t, float, 32, V4SFmode)
> +#undef REQUIRES_FLOAT
> +
> +#include "arm-mve-builtins-base.def"
> +
> +#undef DEF_MVE_TYPE
> +#undef DEF_MVE_TYPE_SUFFIX
> +#undef DEF_MVE_FUNCTION
> +#undef DEF_MVE_MODE
> diff --git a/gcc/config/arm/arm-mve-builtins.h b/gcc/config/arm/arm-mve-builtins.h
> index 290a118ec92..a20d2fb5d86 100644
> --- a/gcc/config/arm/arm-mve-builtins.h
> +++ b/gcc/config/arm/arm-mve-builtins.h
> @@ -20,7 +20,79 @@
>  #ifndef GCC_ARM_MVE_BUILTINS_H
>  #define GCC_ARM_MVE_BUILTINS_H
> 
> +/* The full name of an MVE ACLE function is the concatenation of:
> +
> +   - the base name ("vadd", etc.)
> +   - the "mode" suffix ("_n", "_index", etc.)
> +   - the type suffixes ("_s32", "_b8", etc.)
> +   - the predication suffix ("_x", "_z", etc.)
> +
> +   Each piece of information is individually useful, so we retain this
> +   classification throughout:
> +
> +   - function_base represents the base name
> +
> +   - mode_suffix_index represents the mode suffix
> +
> +   - type_suffix_index represents individual type suffixes, while
> +     type_suffix_pair represents a pair of them
> +
> +   - prediction_index extends the predication suffix with an additional
> +     alternative: PRED_implicit for implicitly-predicated operations
> +
> +   In addition to its unique full name, a function may have a shorter
> +   overloaded alias.  This alias removes pieces of the suffixes that
> +   can be inferred from the arguments, such as by shortening the mode
> +   suffix or dropping some of the type suffixes.  The base name and the
> +   predication suffix stay the same.
> +
> +   The function_shape class describes what arguments a given function
> +   takes and what its overloaded alias is called.  In broad terms,
> +   function_base describes how the underlying instruction behaves while
> +   function_shape describes how that instruction has been presented at
> +   the language level.
> +
> +   The static list of functions uses function_group to describe a group
> +   of related functions.  The function_builder class is responsible for
> +   expanding this static description into a list of individual functions
> +   and registering the associated built-in functions.  function_instance
> +   describes one of these individual functions in terms of the properties
> +   described above.
> +
> +   The classes involved in compiling a function call are:
> +
> +   - function_resolver, which resolves an overloaded function call to a
> +     specific function_instance and its associated function decl
> +
> +   - function_checker, which checks whether the values of the arguments
> +     conform to the ACLE specification
> +
> +   - gimple_folder, which tries to fold a function call at the gimple level
> +
> +   - function_expander, which expands a function call into rtl instructions
> +
> +   function_resolver and function_checker operate at the language level
> +   and so are associated with the function_shape.  gimple_folder and
> +   function_expander are concerned with the behavior of the function
> +   and so are associated with the function_base.
> +
> +   Note that we've specifically chosen not to fold calls in the frontend,
> +   since MVE intrinsics will hardly ever fold a useful language-level
> +   constant.  */
>  namespace arm_mve {
> +/* The maximum number of vectors in an ACLE tuple type.  */
> +const unsigned int MAX_TUPLE_SIZE = 3;
> +
> +/* Used to represent the default merge argument index for _m functions.
> +   The actual index depends on how many arguments the function takes.  */
> +const unsigned int DEFAULT_MERGE_ARGNO = 0;
> +
> +/* Flags that describe what a function might do, in addition to reading
> +   its arguments and returning a result.  */
> +const unsigned int CP_READ_FPCR = 1U << 0;
> +const unsigned int CP_RAISE_FP_EXCEPTIONS = 1U << 1;
> +const unsigned int CP_READ_MEMORY = 1U << 2;
> +const unsigned int CP_WRITE_MEMORY = 1U << 3;
> 
>  /* Enumerates the MVE predicate and (data) vector types, together called
>     "vector types" for brevity.  */
> @@ -30,11 +102,604 @@ enum vector_type_index
>    VECTOR_TYPE_ ## ACLE_NAME,
>  #include "arm-mve-builtins.def"
>    NUM_VECTOR_TYPES
> -#undef DEF_MVE_TYPE
>  };
> 
> +/* Classifies the available measurement units for an address
> +   displacement.  */
> +enum units_index
> +{
> +  UNITS_none,
> +  UNITS_bytes
> +};
> +
> +/* Describes the various uses of a governing predicate.  */
> +enum predication_index
> +{
> +  /* No governing predicate is present.  */
> +  PRED_none,
> +
> +  /* Merging predication: copy inactive lanes from the first data argument
> +     to the vector result.  */
> +  PRED_m,
> +
> +  /* Plain predication: inactive lanes are not used to compute the
> +     scalar result.  */
> +  PRED_p,
> +
> +  /* "Don't care" predication: set inactive lanes of the vector result
> +     to arbitrary values.  */
> +  PRED_x,
> +
> +  /* Zero predication: set inactive lanes of the vector result to zero.  */
> +  PRED_z,
> +
> +  NUM_PREDS
> +};
> +
> +/* Classifies element types, based on type suffixes with the bit count
> +   removed.  */
> +enum type_class_index
> +{
> +  TYPE_bool,
> +  TYPE_float,
> +  TYPE_signed,
> +  TYPE_unsigned,
> +  NUM_TYPE_CLASSES
> +};
> +
> +/* Classifies an operation into "modes"; for example, to distinguish
> +   vector-scalar operations from vector-vector operations, or to
> +   distinguish between different addressing modes.  This classification
> +   accounts for the function suffixes that occur between the base name
> +   and the first type suffix.  */
> +enum mode_suffix_index
> +{
> +#define DEF_MVE_MODE(NAME, BASE, DISPLACEMENT, UNITS) MODE_##NAME,
> +#include "arm-mve-builtins.def"
> +  MODE_none
> +};
> +
> +/* Enumerates the possible type suffixes.  Each suffix is associated with
> +   a vector type, but for predicates provides extra information about the
> +   element size.  */
> +enum type_suffix_index
> +{
> +#define DEF_MVE_TYPE_SUFFIX(NAME, ACLE_TYPE, CLASS, BITS, MODE)	\
> +  TYPE_SUFFIX_ ## NAME,
> +#include "arm-mve-builtins.def"
> +  NUM_TYPE_SUFFIXES
> +};
> +
> +/* Combines two type suffixes.  */
> +typedef enum type_suffix_index type_suffix_pair[2];
> +
> +class function_base;
> +class function_shape;
> +
> +/* Static information about a mode suffix.  */
> +struct mode_suffix_info
> +{
> +  /* The suffix string itself.  */
> +  const char *string;
> +
> +  /* The type of the vector base address, or NUM_VECTOR_TYPES if the
> +     mode does not include a vector base address.  */
> +  vector_type_index base_vector_type;
> +
> +  /* The type of the vector displacement, or NUM_VECTOR_TYPES if the
> +     mode does not include a vector displacement.  (Note that scalar
> +     displacements are always int64_t.)  */
> +  vector_type_index displacement_vector_type;
> +
> +  /* The units in which the vector or scalar displacement is measured,
> +     or UNITS_none if the mode doesn't take a displacement.  */
> +  units_index displacement_units;
> +};
> +
> +/* Static information about a type suffix.  */
> +struct type_suffix_info
> +{
> +  /* The suffix string itself.  */
> +  const char *string;
> +
> +  /* The associated ACLE vector or predicate type.  */
> +  vector_type_index vector_type : 8;
> +
> +  /* What kind of type the suffix represents.  */
> +  type_class_index tclass : 8;
> +
> +  /* The number of bits and bytes in an element.  For predicates this
> +     measures the associated data elements.  */
> +  unsigned int element_bits : 8;
> +  unsigned int element_bytes : 8;
> +
> +  /* True if the suffix is for an integer type.  */
> +  unsigned int integer_p : 1;
> +  /* True if the suffix is for an unsigned type.  */
> +  unsigned int unsigned_p : 1;
> +  /* True if the suffix is for a floating-point type.  */
> +  unsigned int float_p : 1;
> +  unsigned int spare : 13;
> +
> +  /* The associated vector or predicate mode.  */
> +  machine_mode vector_mode : 16;
> +};
> +
> +/* Static information about a set of functions.  */
> +struct function_group_info
> +{
> +  /* The base name, as a string.  */
> +  const char *base_name;
> +
> +  /* Describes the behavior associated with the function base name.  */
> +  const function_base *const *base;
> +
> +  /* The shape of the functions, as described above the class definition.
> +     It's possible to have entries with the same base name but different
> +     shapes.  */
> +  const function_shape *const *shape;
> +
> +  /* A list of the available type suffixes, and of the available predication
> +     types.  The function supports every combination of the two.
> +
> +     The list of type suffixes is terminated by two NUM_TYPE_SUFFIXES
> +     while the list of predication types is terminated by NUM_PREDS.
> +     The list of type suffixes is lexicographically ordered based
> +     on the index value.  */
> +  const type_suffix_pair *types;
> +  const predication_index *preds;
> +
> +  /* Whether the function group requires a floating-point ABI.  */
> +  bool requires_float;
> +};
> +
> +/* Describes a single fully-resolved function (i.e. one that has a
> +   unique full name).  */
> +class GTY((user)) function_instance
> +{
> +public:
> +  function_instance (const char *, const function_base *,
> +		     const function_shape *, mode_suffix_index,
> +		     const type_suffix_pair &, predication_index);
> +
> +  bool operator== (const function_instance &) const;
> +  bool operator!= (const function_instance &) const;
> +  hashval_t hash () const;
> +
> +  unsigned int call_properties () const;
> +  bool reads_global_state_p () const;
> +  bool modifies_global_state_p () const;
> +  bool could_trap_p () const;
> +
> +  unsigned int vectors_per_tuple () const;
> +
> +  const mode_suffix_info &mode_suffix () const;
> +
> +  const type_suffix_info &type_suffix (unsigned int) const;
> +  tree scalar_type (unsigned int) const;
> +  tree vector_type (unsigned int) const;
> +  tree tuple_type (unsigned int) const;
> +  machine_mode vector_mode (unsigned int) const;
> +  machine_mode gp_mode (unsigned int) const;
> +
> +  bool has_inactive_argument () const;
> +
> +  /* The properties of the function.  (The explicit "enum"s are required
> +     for gengtype.)  */
> +  const char *base_name;
> +  const function_base *base;
> +  const function_shape *shape;
> +  enum mode_suffix_index mode_suffix_id;
> +  type_suffix_pair type_suffix_ids;
> +  enum predication_index pred;
> +};
> +
> +class registered_function;
> +
> +/* A class for building and registering function decls.  */
> +class function_builder
> +{
> +public:
> +  function_builder ();
> +  ~function_builder ();
> +
> +  void add_unique_function (const function_instance &, tree,
> +			    vec<tree> &, bool, bool, bool);
> +  void add_overloaded_function (const function_instance &, bool, bool);
> +  void add_overloaded_functions (const function_group_info &,
> +				 mode_suffix_index, bool);
> +
> +  void register_function_group (const function_group_info &, bool);
> +
> +private:
> +  void append_name (const char *);
> +  char *finish_name ();
> +
> +  char *get_name (const function_instance &, bool, bool);
> +
> +  tree get_attributes (const function_instance &);
> +
> +  registered_function &add_function (const function_instance &,
> +				     const char *, tree, tree,
> +				     bool, bool, bool);
> +
> +  /* The function type to use for functions that are resolved by
> +     function_resolver.  */
> +  tree m_overload_type;
> +
> +  /* True if we should create a separate decl for each instance of an
> +     overloaded function, instead of using function_resolver.  */
> +  bool m_direct_overloads;
> +
> +  /* Used for building up function names.  */
> +  obstack m_string_obstack;
> +
> +  /* Maps all overloaded function names that we've registered so far
> +     to their associated function_instances.  */
> +  hash_map<nofree_string_hash, registered_function *> m_overload_names;
> +};
> +
> +/* A base class for handling calls to built-in functions.  */
> +class function_call_info : public function_instance
> +{
> +public:
> +  function_call_info (location_t, const function_instance &, tree);
> +
> +  bool function_returns_void_p ();
> +
> +  /* The location of the call.  */
> +  location_t location;
> +
> +  /* The FUNCTION_DECL that is being called.  */
> +  tree fndecl;
> +};
> +
> +/* A class for resolving an overloaded function call.  */
> +class function_resolver : public function_call_info
> +{
> +public:
> +  enum { SAME_SIZE = 256, HALF_SIZE, QUARTER_SIZE };
> +  static const type_class_index SAME_TYPE_CLASS = NUM_TYPE_CLASSES;
> +
> +  function_resolver (location_t, const function_instance &, tree,
> +		     vec<tree, va_gc> &);
> +
> +  tree get_vector_type (type_suffix_index);
> +  const char *get_scalar_type_name (type_suffix_index);
> +  tree get_argument_type (unsigned int);
> +  bool scalar_argument_p (unsigned int);
> +
> +  tree report_no_such_form (type_suffix_index);
> +  tree lookup_form (mode_suffix_index,
> +		    type_suffix_index = NUM_TYPE_SUFFIXES,
> +		    type_suffix_index = NUM_TYPE_SUFFIXES);
> +  tree resolve_to (mode_suffix_index,
> +		   type_suffix_index = NUM_TYPE_SUFFIXES,
> +		   type_suffix_index = NUM_TYPE_SUFFIXES);
> +
> +  type_suffix_index infer_vector_or_tuple_type (unsigned int, unsigned int);
> +  type_suffix_index infer_vector_type (unsigned int);
> +
> +  bool require_vector_or_scalar_type (unsigned int);
> +
> +  bool require_vector_type (unsigned int, vector_type_index);
> +  bool require_matching_vector_type (unsigned int, type_suffix_index);
> +  bool require_derived_vector_type (unsigned int, unsigned int,
> +				    type_suffix_index,
> +				    type_class_index = SAME_TYPE_CLASS,
> +				    unsigned int = SAME_SIZE);
> +  bool require_integer_immediate (unsigned int);
> +  bool require_scalar_type (unsigned int, const char *);
> +  bool require_derived_scalar_type (unsigned int, type_class_index,
> +				    unsigned int = SAME_SIZE);
> +
> +  bool check_num_arguments (unsigned int);
> +  bool check_gp_argument (unsigned int, unsigned int &, unsigned int &);
> +  tree resolve_unary (type_class_index = SAME_TYPE_CLASS,
> +		      unsigned int = SAME_SIZE, bool = false);
> +  tree resolve_unary_n ();
> +  tree resolve_uniform (unsigned int, unsigned int = 0);
> +  tree resolve_uniform_opt_n (unsigned int);
> +  tree finish_opt_n_resolution (unsigned int, unsigned int, type_suffix_index,
> +				type_class_index = SAME_TYPE_CLASS,
> +				unsigned int = SAME_SIZE,
> +				type_suffix_index = NUM_TYPE_SUFFIXES);
> +
> +  tree resolve ();
> +
> +private:
> +  /* The arguments to the overloaded function.  */
> +  vec<tree, va_gc> &m_arglist;
> +};
> +
> +/* A class for checking that the semantic constraints on a function call are
> +   satisfied, such as arguments being integer constant expressions with
> +   a particular range.  The parent class's FNDECL is the decl that was
> +   called in the original source, before overload resolution.  */
> +class function_checker : public function_call_info
> +{
> +public:
> +  function_checker (location_t, const function_instance &, tree,
> +		    tree, unsigned int, tree *);
> +
> +  bool require_immediate_enum (unsigned int, tree);
> +  bool require_immediate_lane_index (unsigned int, unsigned int = 1);
> +  bool require_immediate_range (unsigned int, HOST_WIDE_INT, HOST_WIDE_INT);
> +
> +  bool check ();
> +
> +private:
> +  bool argument_exists_p (unsigned int);
> +
> +  bool require_immediate (unsigned int, HOST_WIDE_INT &);
> +
> +  /* The type of the resolved function.  */
> +  tree m_fntype;
> +
> +  /* The arguments to the function.  */
> +  unsigned int m_nargs;
> +  tree *m_args;
> +
> +  /* The first argument not associated with the function's predication
> +     type.  */
> +  unsigned int m_base_arg;
> +};
> +
> +/* A class for folding a gimple function call.  */
> +class gimple_folder : public function_call_info
> +{
> +public:
> +  gimple_folder (const function_instance &, tree,
> +		 gcall *);
> +
> +  gimple *fold ();
> +
> +  /* The call we're folding.  */
> +  gcall *call;
> +
> +  /* The result of the call, or null if none.  */
> +  tree lhs;
> +};
> +
> +/* A class for expanding a function call into RTL.  */
> +class function_expander : public function_call_info
> +{
> +public:
> +  function_expander (const function_instance &, tree, tree, rtx);
> +  rtx expand ();
> +
> +  insn_code direct_optab_handler (optab, unsigned int = 0);
> +
> +  rtx get_fallback_value (machine_mode, unsigned int, unsigned int &);
> +  rtx get_reg_target ();
> +
> +  void add_output_operand (insn_code);
> +  void add_input_operand (insn_code, rtx);
> +  void add_integer_operand (HOST_WIDE_INT);
> +  rtx generate_insn (insn_code);
> +
> +  rtx use_exact_insn (insn_code);
> +  rtx use_unpred_insn (insn_code);
> +  rtx use_pred_x_insn (insn_code);
> +  rtx use_cond_insn (insn_code, unsigned int = DEFAULT_MERGE_ARGNO);
> +
> +  rtx map_to_rtx_codes (rtx_code, rtx_code, rtx_code);
> +
> +  /* The function call expression.  */
> +  tree call_expr;
> +
> +  /* For functions that return a value, this is the preferred location
> +     of that value.  It could be null or could have a different mode
> +     from the function return type.  */
> +  rtx possible_target;
> +
> +  /* The expanded arguments.  */
> +  auto_vec<rtx, 16> args;
> +
> +private:
> +  /* Used to build up the operands to an instruction.  */
> +  auto_vec<expand_operand, 8> m_ops;
> +};
> +
> +/* Provides information about a particular function base name, and handles
> +   tasks related to the base name.  */
> +class function_base
> +{
> +public:
> +  /* Return a set of CP_* flags that describe what the function might do,
> +     in addition to reading its arguments and returning a result.  */
> +  virtual unsigned int call_properties (const function_instance &) const;
> +
> +  /* If the function operates on tuples of vectors, return the number
> +     of vectors in the tuples, otherwise return 1.  */
> +  virtual unsigned int vectors_per_tuple () const { return 1; }
> +
> +  /* Try to fold the given gimple call.  Return the new gimple statement
> +     on success, otherwise return null.  */
> +  virtual gimple *fold (gimple_folder &) const { return NULL; }
> +
> +  /* Expand the given call into rtl.  Return the result of the function,
> +     or an arbitrary value if the function doesn't return a result.  */
> +  virtual rtx expand (function_expander &) const = 0;
> +};
> +
> +/* Classifies functions into "shapes".  The idea is to take all the
> +   type signatures for a set of functions, and classify what's left
> +   based on:
> +
> +   - the number of arguments
> +
> +   - the process of determining the types in the signature from the mode
> +     and type suffixes in the function name (including types that are not
> +     affected by the suffixes)
> +
> +   - which arguments must be integer constant expressions, and what range
> +     those arguments have
> +
> +   - the process for mapping overloaded names to "full" names.  */
> +class function_shape
> +{
> +public:
> +  virtual bool explicit_type_suffix_p (unsigned int, enum predication_index,
> +				       enum mode_suffix_index) const = 0;
> +  virtual bool explicit_mode_suffix_p (enum predication_index,
> +				       enum mode_suffix_index) const = 0;
> +  virtual bool skip_overload_p (enum predication_index,
> +				enum mode_suffix_index) const = 0;
> +
> +  /* Define all functions associated with the given group.  */
> +  virtual void build (function_builder &,
> +		      const function_group_info &,
> +		      bool) const = 0;
> +
> +  /* Try to resolve the overloaded call.  Return the non-overloaded
> +     function decl on success and error_mark_node on failure.  */
> +  virtual tree resolve (function_resolver &) const = 0;
> +
> +  /* Check whether the given call is semantically valid.  Return true
> +     if it is, otherwise report an error and return false.  */
> +  virtual bool check (function_checker &) const { return true; }
> +};
> +
> +extern const type_suffix_info type_suffixes[NUM_TYPE_SUFFIXES + 1];
> +extern const mode_suffix_info mode_suffixes[MODE_none + 1];
> +
>  extern tree scalar_types[NUM_VECTOR_TYPES];
> -extern tree acle_vector_types[3][NUM_VECTOR_TYPES + 1];
> +extern tree acle_vector_types[MAX_TUPLE_SIZE][NUM_VECTOR_TYPES + 1];
> +
> +/* Return the ACLE type mve_pred16_t.  */
> +inline tree
> +get_mve_pred16_t (void)
> +{
> +  return acle_vector_types[0][VECTOR_TYPE_mve_pred16_t];
> +}
> +
> +/* Try to find a mode with the given mode_suffix_info fields.  Return the
> +   mode on success or MODE_none on failure.  */
> +inline mode_suffix_index
> +find_mode_suffix (vector_type_index base_vector_type,
> +		  vector_type_index displacement_vector_type,
> +		  units_index displacement_units)
> +{
> +  for (unsigned int mode_i = 0; mode_i < ARRAY_SIZE (mode_suffixes); ++mode_i)
> +    {
> +      const mode_suffix_info &mode = mode_suffixes[mode_i];
> +      if (mode.base_vector_type == base_vector_type
> +	  && mode.displacement_vector_type == displacement_vector_type
> +	  && mode.displacement_units == displacement_units)
> +	return mode_suffix_index (mode_i);
> +    }
> +  return MODE_none;
> +}
> +
> +/* Return the type suffix associated with ELEMENT_BITS-bit elements of type
> +   class TCLASS.  */
> +inline type_suffix_index
> +find_type_suffix (type_class_index tclass, unsigned int element_bits)
> +{
> +  for (unsigned int i = 0; i < NUM_TYPE_SUFFIXES; ++i)
> +    if (type_suffixes[i].tclass == tclass
> +	&& type_suffixes[i].element_bits == element_bits)
> +      return type_suffix_index (i);
> +  gcc_unreachable ();
> +}
> +
> +inline function_instance::
> +function_instance (const char *base_name_in,
> +		   const function_base *base_in,
> +		   const function_shape *shape_in,
> +		   mode_suffix_index mode_suffix_id_in,
> +		   const type_suffix_pair &type_suffix_ids_in,
> +		   predication_index pred_in)
> +  : base_name (base_name_in), base (base_in), shape (shape_in),
> +    mode_suffix_id (mode_suffix_id_in), pred (pred_in)
> +{
> +  memcpy (type_suffix_ids, type_suffix_ids_in, sizeof (type_suffix_ids));
> +}
> +
> +inline bool
> +function_instance::operator== (const function_instance &other) const
> +{
> +  return (base == other.base
> +	  && shape == other.shape
> +	  && mode_suffix_id == other.mode_suffix_id
> +	  && pred == other.pred
> +	  && type_suffix_ids[0] == other.type_suffix_ids[0]
> +	  && type_suffix_ids[1] == other.type_suffix_ids[1]);
> +}
> +
> +inline bool
> +function_instance::operator!= (const function_instance &other) const
> +{
> +  return !operator== (other);
> +}
> +
> +/* If the function operates on tuples of vectors, return the number
> +   of vectors in the tuples, otherwise return 1.  */
> +inline unsigned int
> +function_instance::vectors_per_tuple () const
> +{
> +  return base->vectors_per_tuple ();
> +}
> +
> +/* Return information about the function's mode suffix.  */
> +inline const mode_suffix_info &
> +function_instance::mode_suffix () const
> +{
> +  return mode_suffixes[mode_suffix_id];
> +}
> +
> +/* Return information about type suffix I.  */
> +inline const type_suffix_info &
> +function_instance::type_suffix (unsigned int i) const
> +{
> +  return type_suffixes[type_suffix_ids[i]];
> +}
> +
> +/* Return the scalar type associated with type suffix I.  */
> +inline tree
> +function_instance::scalar_type (unsigned int i) const
> +{
> +  return scalar_types[type_suffix (i).vector_type];
> +}
> +
> +/* Return the vector type associated with type suffix I.  */
> +inline tree
> +function_instance::vector_type (unsigned int i) const
> +{
> +  return acle_vector_types[0][type_suffix (i).vector_type];
> +}
> +
> +/* If the function operates on tuples of vectors, return the tuple type
> +   associated with type suffix I, otherwise return the vector type associated
> +   with type suffix I.  */
> +inline tree
> +function_instance::tuple_type (unsigned int i) const
> +{
> +  unsigned int num_vectors = vectors_per_tuple ();
> +  return acle_vector_types[num_vectors - 1][type_suffix (i).vector_type];
> +}
> +
> +/* Return the vector or predicate mode associated with type suffix I.  */
> +inline machine_mode
> +function_instance::vector_mode (unsigned int i) const
> +{
> +  return type_suffix (i).vector_mode;
> +}
> +
> +/* Return true if the function has no return value.  */
> +inline bool
> +function_call_info::function_returns_void_p ()
> +{
> +  return TREE_TYPE (TREE_TYPE (fndecl)) == void_type_node;
> +}
> +
> +/* Default implementation of function::call_properties, with conservatively
> +   correct behavior for floating-point instructions.  */
> +inline unsigned int
> +function_base::call_properties (const function_instance &instance) const
> +{
> +  unsigned int flags = 0;
> +  if (instance.type_suffix (0).float_p || instance.type_suffix (1).float_p)
> +    flags |= CP_READ_FPCR | CP_RAISE_FP_EXCEPTIONS;
> +  return flags;
> +}
> 
>  } /* end namespace arm_mve */
> 
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index 1bdbd3b8ab3..61fcd671437 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -215,7 +215,8 @@ extern opt_machine_mode arm_get_mask_mode (machine_mode mode);
>     those groups.  */
>  enum arm_builtin_class
>  {
> -  ARM_BUILTIN_GENERAL
> +  ARM_BUILTIN_GENERAL,
> +  ARM_BUILTIN_MVE
>  };
> 
>  /* Built-in function codes are structured so that the low
> @@ -229,6 +230,13 @@ const unsigned int ARM_BUILTIN_CLASS = (1 << ARM_BUILTIN_SHIFT) - 1;
>  /* MVE functions.  */
>  namespace arm_mve {
>    void handle_arm_mve_types_h ();
> +  void handle_arm_mve_h (bool);
> +  tree resolve_overloaded_builtin (location_t, unsigned int,
> +				   vec<tree, va_gc> *);
> +  bool check_builtin_call (location_t, vec<location_t>, unsigned int,
> +			   tree, unsigned int, tree *);
> +  gimple *gimple_fold_builtin (unsigned int code, gcall *stmt);
> +  rtx expand_builtin (unsigned int, tree, rtx);
>  }
> 
>  /* Thumb functions.  */
> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
> index bf7ff9a9704..004e6c6194e 100644
> --- a/gcc/config/arm/arm.cc
> +++ b/gcc/config/arm/arm.cc
> @@ -69,6 +69,7 @@
>  #include "optabs-libfuncs.h"
>  #include "gimplify.h"
>  #include "gimple.h"
> +#include "gimple-iterator.h"
>  #include "selftest.h"
>  #include "tree-vectorizer.h"
>  #include "opts.h"
> @@ -506,6 +507,9 @@ static const struct attribute_spec arm_attribute_table[] =
>  #undef TARGET_FUNCTION_VALUE_REGNO_P
>  #define TARGET_FUNCTION_VALUE_REGNO_P arm_function_value_regno_p
> 
> +#undef TARGET_GIMPLE_FOLD_BUILTIN
> +#define TARGET_GIMPLE_FOLD_BUILTIN arm_gimple_fold_builtin
> +
>  #undef  TARGET_ASM_OUTPUT_MI_THUNK
>  #define TARGET_ASM_OUTPUT_MI_THUNK arm_output_mi_thunk
>  #undef  TARGET_ASM_CAN_OUTPUT_MI_THUNK
> @@ -2844,6 +2848,29 @@ arm_init_libfuncs (void)
>    speculation_barrier_libfunc = init_one_libfunc ("__speculation_barrier");
>  }
> 
> +/* Implement TARGET_GIMPLE_FOLD_BUILTIN.  */
> +static bool
> +arm_gimple_fold_builtin (gimple_stmt_iterator *gsi)
> +{
> +  gcall *stmt = as_a <gcall *> (gsi_stmt (*gsi));
> +  tree fndecl = gimple_call_fndecl (stmt);
> +  unsigned int code = DECL_MD_FUNCTION_CODE (fndecl);
> +  unsigned int subcode = code >> ARM_BUILTIN_SHIFT;
> +  gimple *new_stmt = NULL;
> +  switch (code & ARM_BUILTIN_CLASS)
> +    {
> +    case ARM_BUILTIN_GENERAL:
> +      break;
> +    case ARM_BUILTIN_MVE:
> +      new_stmt = arm_mve::gimple_fold_builtin (subcode, stmt);
> +    }
> +  if (!new_stmt)
> +    return false;
> +
> +  gsi_replace (gsi, new_stmt, true);
> +  return true;
> +}
> +
>  /* On AAPCS systems, this is the "struct __va_list".  */
>  static GTY(()) tree va_list_type;
> 
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index 1262d668121..0d2ba968fc0 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -34,6 +34,12 @@
>  #endif
>  #include "arm_mve_types.h"
> 
> +#ifdef __ARM_MVE_PRESERVE_USER_NAMESPACE
> +#pragma GCC arm "arm_mve.h" true
> +#else
> +#pragma GCC arm "arm_mve.h" false
> +#endif
> +
>  #ifndef __ARM_MVE_PRESERVE_USER_NAMESPACE
>  #define vst4q(__addr, __value) __arm_vst4q(__addr, __value)
>  #define vdupq_n(__a) __arm_vdupq_n(__a)
> diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
> index 3139750c606..8e235f63ee6 100644
> --- a/gcc/config/arm/predicates.md
> +++ b/gcc/config/arm/predicates.md
> @@ -903,3 +903,7 @@ (define_predicate "call_insn_operand"
>  (define_special_predicate "aligned_operand"
>    (ior (not (match_code "mem"))
>         (match_test "MEM_ALIGN (op) >= GET_MODE_ALIGNMENT (mode)")))
> +
> +;; A special predicate that doesn't match a particular mode.
> +(define_special_predicate "arm_any_register_operand"
> +  (match_code "reg"))
> diff --git a/gcc/config/arm/t-arm b/gcc/config/arm/t-arm
> index 637e72af5bb..9a1b06368a1 100644
> --- a/gcc/config/arm/t-arm
> +++ b/gcc/config/arm/t-arm
> @@ -154,15 +154,41 @@ arm-builtins.o: $(srcdir)/config/arm/arm-builtins.cc $(CONFIG_H) \
>  		$(srcdir)/config/arm/arm-builtins.cc
> 
>  arm-mve-builtins.o: $(srcdir)/config/arm/arm-mve-builtins.cc $(CONFIG_H) \
> -  $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) \
> -  fold-const.h langhooks.h stringpool.h attribs.h diagnostic.h \
> +  $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) $(RTL_H) $(TM_P_H) \
> +  memmodel.h insn-codes.h optabs.h recog.h expr.h basic-block.h \
> +  function.h fold-const.h gimple.h gimple-fold.h emit-rtl.h langhooks.h \
> +  stringpool.h attribs.h diagnostic.h \
>    $(srcdir)/config/arm/arm-protos.h \
>    $(srcdir)/config/arm/arm-builtins.h \
>    $(srcdir)/config/arm/arm-mve-builtins.h \
> -  $(srcdir)/config/arm/arm-mve-builtins.def
> +  $(srcdir)/config/arm/arm-mve-builtins-base.h \
> +  $(srcdir)/config/arm/arm-mve-builtins-shapes.h \
> +  $(srcdir)/config/arm/arm-mve-builtins.def \
> +  $(srcdir)/config/arm/arm-mve-builtins-base.def
> 	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
>  		$(srcdir)/config/arm/arm-mve-builtins.cc
> 
> +arm-mve-builtins-shapes.o: \
> +  $(srcdir)/config/arm/arm-mve-builtins-shapes.cc \
> +  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) \
> +  $(RTL_H) memmodel.h insn-codes.h optabs.h \
> +  $(srcdir)/config/arm/arm-mve-builtins.h \
> +  $(srcdir)/config/arm/arm-mve-builtins-shapes.h
> +	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
> +		$(srcdir)/config/arm/arm-mve-builtins-shapes.cc
> +
> +arm-mve-builtins-base.o: \
> +  $(srcdir)/config/arm/arm-mve-builtins-base.cc \
> +  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) $(RTL_H) \
> +  memmodel.h insn-codes.h $(OPTABS_H) \
> +  $(BASIC_BLOCK_H) $(FUNCTION_H) $(GIMPLE_H) \
> +  $(srcdir)/config/arm/arm-mve-builtins.h \
> +  $(srcdir)/config/arm/arm-mve-builtins-shapes.h \
> +  $(srcdir)/config/arm/arm-mve-builtins-base.h \
> +  $(srcdir)/config/arm/arm-mve-builtins-functions.h
> +	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
> +		$(srcdir)/config/arm/arm-mve-builtins-base.cc
> +
>  arm-c.o: $(srcdir)/config/arm/arm-c.cc $(CONFIG_H) $(SYSTEM_H) \
>      coretypes.h $(TM_H) $(TREE_H) output.h $(C_COMMON_H)
> 	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
> --
> 2.34.1


Thread overview: 55+ messages
2023-04-18 13:45 [PATCH 00/22] arm: New framework for MVE intrinsics Christophe Lyon
2023-04-18 13:45 ` [PATCH 01/22] arm: move builtin function codes into general numberspace Christophe Lyon
2023-05-02  9:24   ` Kyrylo Tkachov
2023-04-18 13:45 ` [PATCH 02/22] arm: [MVE intrinsics] Add new framework Christophe Lyon
2023-05-02 10:17   ` Kyrylo Tkachov [this message]
2023-04-18 13:45 ` [PATCH 03/22] arm: [MVE intrinsics] Rework vreinterpretq Christophe Lyon
2023-05-02 10:26   ` Kyrylo Tkachov
2023-05-02 14:05     ` Christophe Lyon
2023-05-02 15:28       ` Kyrylo Tkachov
2023-05-02 15:49         ` Christophe Lyon
2023-05-03 14:37           ` [PATCH v2 " Christophe Lyon
2023-05-03 14:52             ` Kyrylo Tkachov
2023-04-18 13:45 ` [PATCH 04/22] arm: [MVE intrinsics] Rework vuninitialized Christophe Lyon
2023-05-02 16:13   ` Kyrylo Tkachov
2023-04-18 13:45 ` [PATCH 05/22] arm: [MVE intrinsics] add binary_opt_n shape Christophe Lyon
2023-05-02 16:16   ` Kyrylo Tkachov
2023-04-18 13:45 ` [PATCH 06/22] arm: [MVE intrinsics] add unspec_based_mve_function_exact_insn Christophe Lyon
2023-05-02 16:17   ` Kyrylo Tkachov
2023-04-18 13:45 ` [PATCH 07/22] arm: [MVE intrinsics] factorize vadd vsubq vmulq Christophe Lyon
2023-05-02 16:19   ` Kyrylo Tkachov
2023-05-02 16:22     ` Christophe Lyon
2023-04-18 13:45 ` [PATCH 08/22] arm: [MVE intrinsics] rework vaddq vmulq vsubq Christophe Lyon
2023-05-02 16:31   ` Kyrylo Tkachov
2023-05-03  9:06     ` Christophe Lyon
2023-04-18 13:45 ` [PATCH 09/22] arm: [MVE intrinsics] add binary shape Christophe Lyon
2023-05-02 16:32   ` Kyrylo Tkachov
2023-04-18 13:45 ` [PATCH 10/22] arm: [MVE intrinsics] factorize vandq veorq vorrq vbicq Christophe Lyon
2023-05-02 16:36   ` Kyrylo Tkachov
2023-04-18 13:45 ` [PATCH 11/22] arm: [MVE intrinsics] rework vandq veorq Christophe Lyon
2023-05-02 16:37   ` Kyrylo Tkachov
2023-04-18 13:45 ` [PATCH 12/22] arm: [MVE intrinsics] add binary_orrq shape Christophe Lyon
2023-05-02 16:39   ` Kyrylo Tkachov
2023-04-18 13:45 ` [PATCH 13/22] arm: [MVE intrinsics] rework vorrq Christophe Lyon
2023-05-02 16:41   ` Kyrylo Tkachov
2023-04-18 13:46 ` [PATCH 14/22] arm: [MVE intrinsics] add unspec_mve_function_exact_insn Christophe Lyon
2023-05-03  8:40   ` Kyrylo Tkachov
2023-04-18 13:46 ` [PATCH 15/22] arm: [MVE intrinsics] add create shape Christophe Lyon
2023-05-03  8:40   ` Kyrylo Tkachov
2023-04-18 13:46 ` [PATCH 16/22] arm: [MVE intrinsics] factorize vcreateq Christophe Lyon
2023-05-03  8:42   ` Kyrylo Tkachov
2023-04-18 13:46 ` [PATCH 17/22] arm: [MVE intrinsics] rework vcreateq Christophe Lyon
2023-05-03  8:44   ` Kyrylo Tkachov
2023-04-18 13:46 ` [PATCH 18/22] arm: [MVE intrinsics] factorize several binary_m operations Christophe Lyon
2023-05-03  8:46   ` Kyrylo Tkachov
2023-04-18 13:46 ` [PATCH 19/22] arm: [MVE intrinsics] factorize several binary _n operations Christophe Lyon
2023-05-03  8:47   ` Kyrylo Tkachov
2023-04-18 13:46 ` [PATCH 20/22] arm: [MVE intrinsics] factorize several binary _m_n operations Christophe Lyon
2023-05-03  8:48   ` Kyrylo Tkachov
2023-04-18 13:46 ` [PATCH 21/22] arm: [MVE intrinsics] factorize several binary operations Christophe Lyon
2023-05-03  8:49   ` Kyrylo Tkachov
2023-04-18 13:46 ` [PATCH 22/22] arm: [MVE intrinsics] rework vhaddq vhsubq vmulhq vqaddq vqsubq vqdmulhq vrhaddq vrmulhq Christophe Lyon
2023-05-03  8:51   ` Kyrylo Tkachov
2023-05-02  9:18 ` [PATCH 00/22] arm: New framework for MVE intrinsics Kyrylo Tkachov
2023-05-02 15:04   ` Christophe Lyon
2023-05-03 15:01     ` Christophe Lyon
