public inbox for gcc-patches@gcc.gnu.org
* [PATCH 00/16] aarch64: Add support for SME
@ 2022-11-13  9:59 Richard Sandiford
  2022-11-13  9:59 ` [PATCH 01/16] aarch64: Add arm_streaming(_compatible) attributes Richard Sandiford
                   ` (15 more replies)
  0 siblings, 16 replies; 17+ messages in thread
From: Richard Sandiford @ 2022-11-13  9:59 UTC (permalink / raw)
  To: gcc-patches

This series adds support for the Armv9-A Scalable Matrix Extension (SME).
Details about the extension are available here:

  https://developer.arm.com/documentation/ddi0616/aa/?lang=en

The ABI and ACLE documentation is available on github:

  https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst
  https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst
  https://github.com/ARM-software/acle/blob/main/main/acle.md#scalable-matrix-extension-sme

Series tested on aarch64-linux-gnu.  It depends on other patches
posted recently, and I'll give some time for comments & reviews,
so I won't be applying it just yet.

Thanks,
Richard


* [PATCH 01/16] aarch64: Add arm_streaming(_compatible) attributes
  2022-11-13  9:59 [PATCH 00/16] aarch64: Add support for SME Richard Sandiford
@ 2022-11-13  9:59 ` Richard Sandiford
  2022-11-13 10:00 ` [PATCH 02/16] aarch64: Add +sme Richard Sandiford
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Richard Sandiford @ 2022-11-13  9:59 UTC (permalink / raw)
  To: gcc-patches

This patch adds support for recognising the SME arm_streaming
and arm_streaming_compatible attributes.  arm_streaming indicates
that the processor is definitely in "streaming mode" (PSTATE.SM==1),
arm_streaming_compatible indicates that we don't know at compile
time either way, and the absence of both attributes indicates that
the processor is definitely not in streaming mode (PSTATE.SM==0).

As far as the compiler is concerned, this effectively creates three
ISA submodes: streaming mode enables things that are not available
in non-streaming mode, non-streaming mode enables things that are not
available in streaming mode, and streaming-compatible mode has to stick
to the common subset.  This means that some instructions are conditional
on PSTATE.SM==1 and some are conditional on PSTATE.SM==0.

I wondered about recording the streaming state in a new variable.
However, the set of available instructions is also influenced by
PSTATE.ZA (added later), so I think it makes sense to view this
as an instance of a more general mechanism.  Also, keeping the
PSTATE.SM state in the same flag variable as the other ISA
features makes it possible to sum up the requirements of an
ACLE function in a single value.

The patch therefore adds a new set of feature flags called "ISA modes".
Unlike the other two sets of flags (optional features and architecture-
level features), these ISA modes are not controlled directly by
command-line parameters or "target" attributes.

arm_streaming and arm_streaming_compatible are function type attributes
rather than function declaration attributes.  This means that we need
to find somewhere to copy the type information across to a function's
target options.  The patch does this in aarch64_set_current_function.

We also need to record which ISA mode a callee expects/requires
to be active on entry.  (The same mode is then active on return.)
The patch extends the current UNSPEC_CALLEE_ABI cookie to include
this information, as well as the PCS variant that it recorded
previously.

gcc/
	* config/aarch64/aarch64-isa-modes.def: New file.
	* config/aarch64/aarch64.h: Include it in the feature enumerations.
	(AARCH64_FL_SM_STATE, AARCH64_FL_ISA_MODES): New constants.
	(AARCH64_FL_DEFAULT_ISA_MODE): Likewise.
	(AARCH64_ISA_MODE): New macro.
	(CUMULATIVE_ARGS): Add an isa_mode field.
	* config/aarch64/aarch64-protos.h (aarch64_gen_callee_cookie): Declare.
	(aarch64_tlsdesc_abi_id): Return an arm_pcs.
	* config/aarch64/aarch64.cc (attr_streaming_exclusions): New variable.
	(aarch64_attribute_table): Add arm_streaming and
	arm_streaming_compatible.
	(aarch64_fntype_sm_state, aarch64_fntype_isa_mode): New functions.
	(aarch64_fndecl_sm_state, aarch64_fndecl_isa_mode): Likewise.
	(aarch64_gen_callee_cookie, aarch64_callee_abi): Likewise.
	(aarch64_insn_callee_cookie, aarch64_insn_callee_abi): Use them.
	(aarch64_function_arg, aarch64_output_mi_thunk): Likewise.
	(aarch64_init_cumulative_args): Initialize the isa_mode field.
	(aarch64_override_options): Add the ISA mode to the feature set.
	(aarch64_temporary_target::copy_from_fndecl): Likewise.
	(aarch64_fndecl_options, aarch64_handle_attr_arch): Likewise.
	(aarch64_set_current_function): Maintain the correct ISA mode.
	(aarch64_tlsdesc_abi_id): Return an arm_pcs.
	(aarch64_comp_type_attributes): Handle arm_streaming and
	arm_streaming_compatible.
	* config/aarch64/aarch64.md (tlsdesc_small_<mode>): Use
	aarch64_gen_callee_cookie to get the ABI cookie.
	* config/aarch64/t-aarch64 (TM_H): Add all feature-related .def files.

gcc/testsuite/
	* gcc.target/aarch64/sme/aarch64-sme.exp: New harness.
	* gcc.target/aarch64/sme/streaming_mode_1.c: New test.
	* gcc.target/aarch64/sme/streaming_mode_2.c: Likewise.
	* gcc.target/aarch64/auto-init-1.c: Only expect the call insn
	to contain 1 (const_int 0), not 2.
---
 gcc/config/aarch64/aarch64-isa-modes.def      |  35 ++++
 gcc/config/aarch64/aarch64-protos.h           |   3 +-
 gcc/config/aarch64/aarch64.cc                 | 194 +++++++++++++++---
 gcc/config/aarch64/aarch64.h                  |  24 ++-
 gcc/config/aarch64/aarch64.md                 |   3 +-
 gcc/config/aarch64/t-aarch64                  |   5 +-
 .../gcc.target/aarch64/auto-init-1.c          |   3 +-
 .../gcc.target/aarch64/sme/aarch64-sme.exp    |  41 ++++
 .../gcc.target/aarch64/sme/streaming_mode_1.c | 106 ++++++++++
 .../gcc.target/aarch64/sme/streaming_mode_2.c |  25 +++
 10 files changed, 403 insertions(+), 36 deletions(-)
 create mode 100644 gcc/config/aarch64/aarch64-isa-modes.def
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/aarch64-sme.exp
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_2.c

diff --git a/gcc/config/aarch64/aarch64-isa-modes.def b/gcc/config/aarch64/aarch64-isa-modes.def
new file mode 100644
index 00000000000..fba8eafbae1
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-isa-modes.def
@@ -0,0 +1,35 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file defines a set of "ISA modes"; in other words, it defines
+   various bits of runtime state that control the set of available
+   instructions or that affect the semantics of instructions in some way.
+
+   Before using #include to read this file, define a macro:
+
+      DEF_AARCH64_ISA_MODE(NAME)
+
+  where NAME is the name of the mode.  */
+
+/* Indicates that PSTATE.SM is known to be 1 or 0 respectively.  These
+   modes are mutually exclusive.  If neither mode is active then the state
+   of PSTATE.SM is not known at compile time.  */
+DEF_AARCH64_ISA_MODE(SM_ON)
+DEF_AARCH64_ISA_MODE(SM_OFF)
+
+#undef DEF_AARCH64_ISA_MODE
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 866d68ad4d7..06b926b42d6 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -771,6 +771,7 @@ bool aarch64_const_vec_all_same_in_range_p (rtx, HOST_WIDE_INT,
 bool aarch64_constant_address_p (rtx);
 bool aarch64_emit_approx_div (rtx, rtx, rtx);
 bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
+rtx aarch64_gen_callee_cookie (aarch64_feature_flags, arm_pcs);
 void aarch64_expand_call (rtx, rtx, rtx, bool);
 bool aarch64_expand_cpymem (rtx *);
 bool aarch64_expand_setmem (rtx *);
@@ -849,7 +850,7 @@ int aarch64_movk_shift (const wide_int_ref &, const wide_int_ref &);
 bool aarch64_use_return_insn_p (void);
 const char *aarch64_output_casesi (rtx *);
 
-unsigned int aarch64_tlsdesc_abi_id ();
+arm_pcs aarch64_tlsdesc_abi_id ();
 enum aarch64_symbol_type aarch64_classify_symbol (rtx, HOST_WIDE_INT);
 enum aarch64_symbol_type aarch64_classify_tls_symbol (rtx);
 enum reg_class aarch64_regno_regclass (unsigned);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index a40ac6fd903..a2e910daddf 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -2731,6 +2731,16 @@ handle_aarch64_vector_pcs_attribute (tree *node, tree name, tree,
   gcc_unreachable ();
 }
 
+/* Mutually-exclusive function type attributes for controlling PSTATE.SM.  */
+static const struct attribute_spec::exclusions attr_streaming_exclusions[] =
+{
+  /* Attribute name     exclusion applies to:
+			function, type, variable */
+  { "arm_streaming", false, false, false },
+  { "arm_streaming_compatible", false, true, false },
+  { NULL, false, false, false }
+};
+
 /* Table of machine attributes.  */
 static const struct attribute_spec aarch64_attribute_table[] =
 {
@@ -2738,6 +2748,10 @@ static const struct attribute_spec aarch64_attribute_table[] =
        affects_type_identity, handler, exclude } */
   { "aarch64_vector_pcs", 0, 0, false, true,  true,  true,
 			  handle_aarch64_vector_pcs_attribute, NULL },
+  { "arm_streaming",	  0, 0, false, true,  true,  true,
+			  NULL, attr_streaming_exclusions },
+  { "arm_streaming_compatible", 0, 0, false, true,  true,  true,
+			  NULL, attr_streaming_exclusions },
   { "arm_sve_vector_bits", 1, 1, false, true,  false, true,
 			  aarch64_sve::handle_arm_sve_vector_bits_attribute,
 			  NULL },
@@ -4048,6 +4062,47 @@ aarch64_fntype_abi (const_tree fntype)
   return default_function_abi;
 }
 
+/* Return the state of PSTATE.SM on entry to functions of type FNTYPE.  */
+
+static aarch64_feature_flags
+aarch64_fntype_sm_state (const_tree fntype)
+{
+  if (lookup_attribute ("arm_streaming", TYPE_ATTRIBUTES (fntype)))
+    return AARCH64_FL_SM_ON;
+
+  if (lookup_attribute ("arm_streaming_compatible", TYPE_ATTRIBUTES (fntype)))
+    return 0;
+
+  return AARCH64_FL_SM_OFF;
+}
+
+/* Return the ISA mode on entry to functions of type FNTYPE.  */
+
+static aarch64_feature_flags
+aarch64_fntype_isa_mode (const_tree fntype)
+{
+  return aarch64_fntype_sm_state (fntype);
+}
+
+/* Return the state of PSTATE.SM when compiling the body of
+   function FNDECL.  This might be different from the state of
+   PSTATE.SM on entry.  */
+
+static aarch64_feature_flags
+aarch64_fndecl_sm_state (const_tree fndecl)
+{
+  return aarch64_fntype_sm_state (TREE_TYPE (fndecl));
+}
+
+/* Return the ISA mode that should be used to compile the body of
+   function FNDECL.  */
+
+static aarch64_feature_flags
+aarch64_fndecl_isa_mode (const_tree fndecl)
+{
+  return aarch64_fndecl_sm_state (fndecl);
+}
+
 /* Implement TARGET_COMPATIBLE_VECTOR_TYPES_P.  */
 
 static bool
@@ -4110,17 +4165,46 @@ aarch64_reg_save_mode (unsigned int regno)
   gcc_unreachable ();
 }
 
-/* Implement TARGET_INSN_CALLEE_ABI.  */
+/* Given the ISA mode on entry to a callee and the ABI of the callee,
+   return the CONST_INT that should be placed in an UNSPEC_CALLEE_ABI rtx.  */
 
-const predefined_function_abi &
-aarch64_insn_callee_abi (const rtx_insn *insn)
+rtx
+aarch64_gen_callee_cookie (aarch64_feature_flags isa_mode, arm_pcs pcs_variant)
+{
+  return gen_int_mode ((unsigned int) isa_mode
+		       | (unsigned int) pcs_variant << AARCH64_NUM_ISA_MODES,
+		       DImode);
+}
+
+/* COOKIE is a CONST_INT from an UNSPEC_CALLEE_ABI rtx.  Return the
+   callee's ABI.  */
+
+static const predefined_function_abi &
+aarch64_callee_abi (rtx cookie)
+{
+  return function_abis[UINTVAL (cookie) >> AARCH64_NUM_ISA_MODES];
+}
+
+/* INSN is a call instruction.  Return the CONST_INT stored in its
+   UNSPEC_CALLEE_ABI rtx.  */
+
+static rtx
+aarch64_insn_callee_cookie (const rtx_insn *insn)
 {
   rtx pat = PATTERN (insn);
   gcc_assert (GET_CODE (pat) == PARALLEL);
   rtx unspec = XVECEXP (pat, 0, 1);
   gcc_assert (GET_CODE (unspec) == UNSPEC
 	      && XINT (unspec, 1) == UNSPEC_CALLEE_ABI);
-  return function_abis[INTVAL (XVECEXP (unspec, 0, 0))];
+  return XVECEXP (unspec, 0, 0);
+}
+
+/* Implement TARGET_INSN_CALLEE_ABI.  */
+
+const predefined_function_abi &
+aarch64_insn_callee_abi (const rtx_insn *insn)
+{
+  return aarch64_callee_abi (aarch64_insn_callee_cookie (insn));
 }
 
 /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The callee only saves
@@ -7861,7 +7945,7 @@ aarch64_function_arg (cumulative_args_t pcum_v, const function_arg_info &arg)
 	      || pcum->pcs_variant == ARM_PCS_SVE);
 
   if (arg.end_marker_p ())
-    return gen_int_mode (pcum->pcs_variant, DImode);
+    return aarch64_gen_callee_cookie (pcum->isa_mode, pcum->pcs_variant);
 
   aarch64_layout_arg (pcum_v, arg);
   return pcum->aapcs_reg;
@@ -7882,9 +7966,15 @@ aarch64_init_cumulative_args (CUMULATIVE_ARGS *pcum,
   pcum->aapcs_nextnvrn = 0;
   pcum->aapcs_nextnprn = 0;
   if (fntype)
-    pcum->pcs_variant = (arm_pcs) fntype_abi (fntype).id ();
+    {
+      pcum->pcs_variant = (arm_pcs) fntype_abi (fntype).id ();
+      pcum->isa_mode = aarch64_fntype_isa_mode (fntype);
+    }
   else
-    pcum->pcs_variant = ARM_PCS_AAPCS64;
+    {
+      pcum->pcs_variant = ARM_PCS_AAPCS64;
+      pcum->isa_mode = AARCH64_FL_DEFAULT_ISA_MODE;
+    }
   pcum->aapcs_reg = NULL_RTX;
   pcum->aapcs_arg_processed = false;
   pcum->aapcs_stack_words = 0;
@@ -10372,7 +10462,9 @@ aarch64_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED,
     }
   funexp = XEXP (DECL_RTL (function), 0);
   funexp = gen_rtx_MEM (FUNCTION_MODE, funexp);
-  rtx callee_abi = gen_int_mode (fndecl_abi (function).id (), DImode);
+  auto isa_mode = aarch64_fntype_isa_mode (TREE_TYPE (function));
+  auto pcs_variant = arm_pcs (fndecl_abi (function).id ());
+  rtx callee_abi = aarch64_gen_callee_cookie (isa_mode, pcs_variant);
   insn = emit_call_insn (gen_sibcall (funexp, const0_rtx, callee_abi));
   SIBLING_CALL_P (insn) = 1;
 
@@ -18315,6 +18407,7 @@ aarch64_override_options (void)
   SUBTARGET_OVERRIDE_OPTIONS;
 #endif
 
+  auto isa_mode = AARCH64_FL_DEFAULT_ISA_MODE;
   if (cpu && arch)
     {
       /* If both -mcpu and -march are specified, warn if they are not
@@ -18327,25 +18420,25 @@ aarch64_override_options (void)
 	}
 
       selected_arch = arch->arch;
-      aarch64_set_asm_isa_flags (arch_isa);
+      aarch64_set_asm_isa_flags (arch_isa | isa_mode);
     }
   else if (cpu)
     {
       selected_arch = cpu->arch;
-      aarch64_set_asm_isa_flags (cpu_isa);
+      aarch64_set_asm_isa_flags (cpu_isa | isa_mode);
     }
   else if (arch)
     {
       cpu = &all_cores[arch->ident];
       selected_arch = arch->arch;
-      aarch64_set_asm_isa_flags (arch_isa);
+      aarch64_set_asm_isa_flags (arch_isa | isa_mode);
     }
   else
     {
       /* No -mcpu or -march specified, so use the default CPU.  */
       cpu = &all_cores[TARGET_CPU_DEFAULT];
       selected_arch = cpu->arch;
-      aarch64_set_asm_isa_flags (cpu->flags);
+      aarch64_set_asm_isa_flags (cpu->flags | isa_mode);
     }
 
   selected_tune = tune ? tune->ident : cpu->ident;
@@ -18518,6 +18611,21 @@ aarch64_save_restore_target_globals (tree new_tree)
     TREE_TARGET_GLOBALS (new_tree) = save_target_globals_default_opts ();
 }
 
+/* Return the target_option_node for FNDECL, or the current options
+   if FNDECL is null.  */
+
+static tree
+aarch64_fndecl_options (tree fndecl)
+{
+  if (!fndecl)
+    return target_option_current_node;
+
+  if (tree options = DECL_FUNCTION_SPECIFIC_TARGET (fndecl))
+    return options;
+
+  return target_option_default_node;
+}
+
 /* Implement TARGET_SET_CURRENT_FUNCTION.  Unpack the codegen decisions
    like tuning and ISA features from the DECL_FUNCTION_SPECIFIC_TARGET
    of the function, if such exists.  This function may be called multiple
@@ -18527,25 +18635,24 @@ aarch64_save_restore_target_globals (tree new_tree)
 static void
 aarch64_set_current_function (tree fndecl)
 {
-  if (!fndecl || fndecl == aarch64_previous_fndecl)
-    return;
+  tree old_tree = aarch64_fndecl_options (aarch64_previous_fndecl);
+  tree new_tree = aarch64_fndecl_options (fndecl);
 
-  tree old_tree = (aarch64_previous_fndecl
-		   ? DECL_FUNCTION_SPECIFIC_TARGET (aarch64_previous_fndecl)
-		   : NULL_TREE);
-
-  tree new_tree = DECL_FUNCTION_SPECIFIC_TARGET (fndecl);
-
-  /* If current function has no attributes but the previous one did,
-     use the default node.  */
-  if (!new_tree && old_tree)
-    new_tree = target_option_default_node;
+  auto new_isa_mode = (fndecl
+		       ? aarch64_fndecl_isa_mode (fndecl)
+		       : AARCH64_FL_DEFAULT_ISA_MODE);
+  auto isa_flags = TREE_TARGET_OPTION (new_tree)->x_aarch64_isa_flags;
 
   /* If nothing to do, return.  #pragma GCC reset or #pragma GCC pop to
      the default have been handled by aarch64_save_restore_target_globals from
      aarch64_pragma_target_parse.  */
-  if (old_tree == new_tree)
-    return;
+  if (old_tree == new_tree
+      && (!fndecl || aarch64_previous_fndecl)
+      && (isa_flags & AARCH64_FL_ISA_MODES) == new_isa_mode)
+    {
+      gcc_assert (AARCH64_ISA_MODE == new_isa_mode);
+      return;
+    }
 
   aarch64_previous_fndecl = fndecl;
 
@@ -18553,7 +18660,28 @@ aarch64_set_current_function (tree fndecl)
   cl_target_option_restore (&global_options, &global_options_set,
 			    TREE_TARGET_OPTION (new_tree));
 
+  /* The ISA mode can vary based on function type attributes and
+     function declaration attributes.  Make sure that the target
+     options correctly reflect these attributes.  */
+  if ((isa_flags & AARCH64_FL_ISA_MODES) != new_isa_mode)
+    {
+      auto base_flags = (aarch64_asm_isa_flags & ~AARCH64_FL_ISA_MODES);
+      aarch64_set_asm_isa_flags (base_flags | new_isa_mode);
+
+      aarch64_override_options_internal (&global_options);
+      new_tree = build_target_option_node (&global_options,
+					   &global_options_set);
+      DECL_FUNCTION_SPECIFIC_TARGET (fndecl) = new_tree;
+
+      tree new_optimize = build_optimization_node (&global_options,
+						   &global_options_set);
+      if (new_optimize != optimization_default_node)
+	DECL_FUNCTION_SPECIFIC_OPTIMIZATION (fndecl) = new_optimize;
+    }
+
   aarch64_save_restore_target_globals (new_tree);
+
+  gcc_assert (AARCH64_ISA_MODE == new_isa_mode);
 }
 
 /* Enum describing the various ways we can handle attributes.
@@ -18603,7 +18731,7 @@ aarch64_handle_attr_arch (const char *str)
     {
       gcc_assert (tmp_arch);
       selected_arch = tmp_arch->arch;
-      aarch64_set_asm_isa_flags (tmp_flags);
+      aarch64_set_asm_isa_flags (tmp_flags | AARCH64_ISA_MODE);
       return true;
     }
 
@@ -18644,7 +18772,7 @@ aarch64_handle_attr_cpu (const char *str)
       gcc_assert (tmp_cpu);
       selected_tune = tmp_cpu->ident;
       selected_arch = tmp_cpu->arch;
-      aarch64_set_asm_isa_flags (tmp_flags);
+      aarch64_set_asm_isa_flags (tmp_flags | AARCH64_ISA_MODE);
       return true;
     }
 
@@ -18744,7 +18872,7 @@ aarch64_handle_attr_isa_flags (char *str)
      features if the user wants to handpick specific features.  */
   if (strncmp ("+nothing", str, 8) == 0)
     {
-      isa_flags = 0;
+      isa_flags = AARCH64_ISA_MODE;
       str += 8;
     }
 
@@ -19237,7 +19365,7 @@ aarch64_can_inline_p (tree caller, tree callee)
 /* Return the ID of the TLDESC ABI, initializing the descriptor if hasn't
    been already.  */
 
-unsigned int
+arm_pcs
 aarch64_tlsdesc_abi_id ()
 {
   predefined_function_abi &tlsdesc_abi = function_abis[ARM_PCS_TLSDESC];
@@ -19251,7 +19379,7 @@ aarch64_tlsdesc_abi_id ()
 	SET_HARD_REG_BIT (full_reg_clobbers, regno);
       tlsdesc_abi.initialize (ARM_PCS_TLSDESC, full_reg_clobbers);
     }
-  return tlsdesc_abi.id ();
+  return ARM_PCS_TLSDESC;
 }
 
 /* Return true if SYMBOL_REF X binds locally.  */
@@ -26956,6 +27084,10 @@ aarch64_comp_type_attributes (const_tree type1, const_tree type2)
     return 0;
   if (!check_attr ("SVE sizeless type"))
     return 0;
+  if (!check_attr ("arm_streaming"))
+    return 0;
+  if (!check_attr ("arm_streaming_compatible"))
+    return 0;
   return 1;
 }
 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index e60f9bce023..1ac37b902bf 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -157,10 +157,13 @@
 
 #ifndef USED_FOR_TARGET
 
-/* Define an enum of all features (architectures and extensions).  */
+/* Define an enum of all features (ISA modes, architectures and extensions).
+   The ISA modes must come first.  */
 enum class aarch64_feature : unsigned char {
+#define DEF_AARCH64_ISA_MODE(IDENT) IDENT,
 #define AARCH64_OPT_EXTENSION(A, IDENT, C, D, E, F) IDENT,
 #define AARCH64_ARCH(A, B, IDENT, D, E) IDENT,
+#include "aarch64-isa-modes.def"
 #include "aarch64-option-extensions.def"
 #include "aarch64-arches.def"
 };
@@ -169,16 +172,34 @@ enum class aarch64_feature : unsigned char {
 #define HANDLE(IDENT) \
   constexpr auto AARCH64_FL_##IDENT \
     = aarch64_feature_flags (1) << int (aarch64_feature::IDENT);
+#define DEF_AARCH64_ISA_MODE(IDENT) HANDLE (IDENT)
 #define AARCH64_OPT_EXTENSION(A, IDENT, C, D, E, F) HANDLE (IDENT)
 #define AARCH64_ARCH(A, B, IDENT, D, E) HANDLE (IDENT)
+#include "aarch64-isa-modes.def"
 #include "aarch64-option-extensions.def"
 #include "aarch64-arches.def"
 #undef HANDLE
 
+constexpr auto AARCH64_FL_SM_STATE = AARCH64_FL_SM_ON | AARCH64_FL_SM_OFF;
+
+constexpr unsigned int AARCH64_NUM_ISA_MODES = (0
+#define DEF_AARCH64_ISA_MODE(IDENT) + 1
+#include "aarch64-isa-modes.def"
+);
+
+/* The mask of all ISA modes.  */
+constexpr auto AARCH64_FL_ISA_MODES
+  = (aarch64_feature_flags (1) << AARCH64_NUM_ISA_MODES) - 1;
+
+/* The default ISA mode, for functions with no attributes that specify
+   something to the contrary.  */
+constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
+
 #endif
 
 /* Macros to test ISA flags.  */
 
+#define AARCH64_ISA_MODE           (aarch64_isa_flags & AARCH64_FL_ISA_MODES)
 #define AARCH64_ISA_CRC            (aarch64_isa_flags & AARCH64_FL_CRC)
 #define AARCH64_ISA_CRYPTO         (aarch64_isa_flags & AARCH64_FL_CRYPTO)
 #define AARCH64_ISA_FP             (aarch64_isa_flags & AARCH64_FL_FP)
@@ -895,6 +916,7 @@ enum arm_pcs
 typedef struct
 {
   enum arm_pcs pcs_variant;
+  aarch64_feature_flags isa_mode;
   int aapcs_arg_processed;	/* No need to lay out this argument again.  */
   int aapcs_ncrn;		/* Next Core register number.  */
   int aapcs_nextncrn;		/* Next next core register number.  */
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index ca2e618d9b9..cd6d5e5000c 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7088,7 +7088,8 @@ (define_expand "tlsdesc_small_<mode>"
   {
     if (TARGET_SVE)
       {
-	rtx abi = gen_int_mode (aarch64_tlsdesc_abi_id (), DImode);
+	rtx abi = aarch64_gen_callee_cookie (AARCH64_ISA_MODE,
+					     aarch64_tlsdesc_abi_id ());
 	rtx_insn *call
 	  = emit_call_insn (gen_tlsdesc_small_sve_<mode> (operands[0], abi));
 	RTL_CONST_CALL_P (call) = 1;
diff --git a/gcc/config/aarch64/t-aarch64 b/gcc/config/aarch64/t-aarch64
index ba74abc0a43..47a753c5f1b 100644
--- a/gcc/config/aarch64/t-aarch64
+++ b/gcc/config/aarch64/t-aarch64
@@ -18,7 +18,10 @@
 #  along with GCC; see the file COPYING3.  If not see
 #  <http://www.gnu.org/licenses/>.
 
-TM_H += $(srcdir)/config/aarch64/aarch64-cores.def
+TM_H += $(srcdir)/config/aarch64/aarch64-cores.def \
+	$(srcdir)/config/aarch64/aarch64-isa-modes.def \
+	$(srcdir)/config/aarch64/aarch64-option-extensions.def \
+	$(srcdir)/config/aarch64/aarch64-arches.def
 OPTIONS_H_EXTRA += $(srcdir)/config/aarch64/aarch64-cores.def \
 		   $(srcdir)/config/aarch64/aarch64-arches.def \
 		   $(srcdir)/config/aarch64/aarch64-fusion-pairs.def \
diff --git a/gcc/testsuite/gcc.target/aarch64/auto-init-1.c b/gcc/testsuite/gcc.target/aarch64/auto-init-1.c
index 0fa470880bf..48c5bb6a45c 100644
--- a/gcc/testsuite/gcc.target/aarch64/auto-init-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/auto-init-1.c
@@ -29,4 +29,5 @@ void foo()
   return;
 }
 
-/* { dg-final { scan-rtl-dump-times "const_int 0" 11 "expand" } } */
+/* Includes 1 for the call instruction and one for a nop.  */
+/* { dg-final { scan-rtl-dump-times "const_int 0" 10 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/aarch64-sme.exp b/gcc/testsuite/gcc.target/aarch64/sme/aarch64-sme.exp
new file mode 100644
index 00000000000..c542912e14a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/aarch64-sme.exp
@@ -0,0 +1,41 @@
+#  Specific regression driver for AArch64 SME.
+#  Copyright (C) 2009-2022 Free Software Foundation, Inc.
+#  Contributed by ARM Ltd.
+#
+#  This file is part of GCC.
+#
+#  GCC is free software; you can redistribute it and/or modify it
+#  under the terms of the GNU General Public License as published by
+#  the Free Software Foundation; either version 3, or (at your option)
+#  any later version.
+#
+#  GCC is distributed in the hope that it will be useful, but
+#  WITHOUT ANY WARRANTY; without even the implied warranty of
+#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+#  General Public License for more details.
+#
+#  You should have received a copy of the GNU General Public License
+#  along with GCC; see the file COPYING3.  If not see
+#  <http://www.gnu.org/licenses/>.  */
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Exit immediately if this isn't an AArch64 target.
+if {![istarget aarch64*-*-*] } then {
+    return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# Initialize `dg'.
+dg-init
+
+aarch64-with-arch-dg-options "" {
+    # Main loop.
+    dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cCS\]]] \
+	"" ""
+}
+
+# All done.
+dg-finish
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_1.c b/gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_1.c
new file mode 100644
index 00000000000..22d4a8bcc97
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_1.c
@@ -0,0 +1,106 @@
+// { dg-options "" }
+
+void __attribute__((arm_streaming_compatible)) sc_a ();
+void sc_a (); // { dg-error "conflicting types" }
+
+void sc_b ();
+void __attribute__((arm_streaming_compatible)) sc_b (); // { dg-error "conflicting types" }
+
+void __attribute__((arm_streaming_compatible)) sc_c ();
+void sc_c () {} // Inherits attribute from declaration (confusingly).
+
+void sc_d ();
+void __attribute__((arm_streaming_compatible)) sc_d () {} // { dg-error "conflicting types" }
+
+void __attribute__((arm_streaming_compatible)) sc_e () {}
+void sc_e (); // { dg-error "conflicting types" }
+
+void sc_f () {}
+void __attribute__((arm_streaming_compatible)) sc_f (); // { dg-error "conflicting types" }
+
+extern void (*sc_g) ();
+extern __attribute__((arm_streaming_compatible)) void (*sc_g) (); // { dg-error "conflicting types" }
+
+extern __attribute__((arm_streaming_compatible)) void (*sc_h) ();
+extern void (*sc_h) (); // { dg-error "conflicting types" }
+
+//----------------------------------------------------------------------------
+
+void __attribute__((arm_streaming)) s_a ();
+void s_a (); // { dg-error "conflicting types" }
+
+void s_b ();
+void __attribute__((arm_streaming)) s_b (); // { dg-error "conflicting types" }
+
+void __attribute__((arm_streaming)) s_c ();
+void s_c () {} // Inherits attribute from declaration (confusingly).
+
+void s_d ();
+void __attribute__((arm_streaming)) s_d () {} // { dg-error "conflicting types" }
+
+void __attribute__((arm_streaming)) s_e () {}
+void s_e (); // { dg-error "conflicting types" }
+
+void s_f () {}
+void __attribute__((arm_streaming)) s_f (); // { dg-error "conflicting types" }
+
+extern void (*s_g) ();
+extern __attribute__((arm_streaming)) void (*s_g) (); // { dg-error "conflicting types" }
+
+extern __attribute__((arm_streaming)) void (*s_h) ();
+extern void (*s_h) (); // { dg-error "conflicting types" }
+
+//----------------------------------------------------------------------------
+
+void __attribute__((arm_streaming)) mixed_a ();
+void __attribute__((arm_streaming_compatible)) mixed_a (); // { dg-error "conflicting types" }
+// { dg-warning "ignoring attribute" "" { target *-*-* } .-1 }
+
+void __attribute__((arm_streaming_compatible)) mixed_b ();
+void __attribute__((arm_streaming)) mixed_b (); // { dg-error "conflicting types" }
+// { dg-warning "ignoring attribute" "" { target *-*-* } .-1 }
+
+void __attribute__((arm_streaming)) mixed_c ();
+void __attribute__((arm_streaming_compatible)) mixed_c () {} // { dg-warning "ignoring attribute" }
+
+void __attribute__((arm_streaming_compatible)) mixed_d ();
+void __attribute__((arm_streaming)) mixed_d () {} // { dg-warning "ignoring attribute" }
+
+void __attribute__((arm_streaming)) mixed_e () {}
+void __attribute__((arm_streaming_compatible)) mixed_e (); // { dg-error "conflicting types" }
+// { dg-warning "ignoring attribute" "" { target *-*-* } .-1 }
+
+void __attribute__((arm_streaming_compatible)) mixed_f () {}
+void __attribute__((arm_streaming)) mixed_f (); // { dg-error "conflicting types" }
+// { dg-warning "ignoring attribute" "" { target *-*-* } .-1 }
+
+extern __attribute__((arm_streaming_compatible)) void (*mixed_g) ();
+extern __attribute__((arm_streaming)) void (*mixed_g) (); // { dg-error "conflicting types" }
+
+extern __attribute__((arm_streaming)) void (*mixed_h) ();
+extern __attribute__((arm_streaming_compatible)) void (*mixed_h) (); // { dg-error "conflicting types" }
+
+//----------------------------------------------------------------------------
+
+void __attribute__((arm_streaming, arm_streaming_compatible)) contradiction_1(); // { dg-warning "conflicts with attribute" }
+void __attribute__((arm_streaming_compatible, arm_streaming)) contradiction_2(); // { dg-warning "conflicts with attribute" }
+
+int __attribute__((arm_streaming_compatible)) int_attr; // { dg-warning "only applies to function types" }
+void *__attribute__((arm_streaming)) ptr_attr; // { dg-warning "only applies to function types" }
+
+typedef void __attribute__((arm_streaming)) s_callback ();
+typedef void __attribute__((arm_streaming_compatible)) sc_callback ();
+
+void (*__attribute__((arm_streaming)) s_callback_ptr) ();
+void (*__attribute__((arm_streaming_compatible)) sc_callback_ptr) ();
+
+typedef void __attribute__((arm_streaming, arm_streaming_compatible)) contradiction_callback_1 (); // { dg-warning "conflicts with attribute" }
+typedef void __attribute__((arm_streaming_compatible, arm_streaming)) contradiction_callback_2 (); // { dg-warning "conflicts with attribute" }
+
+void __attribute__((arm_streaming, arm_streaming_compatible)) (*contradiction_callback_ptr_1) (); // { dg-warning "conflicts with attribute" }
+void __attribute__((arm_streaming_compatible, arm_streaming)) (*contradiction_callback_ptr_2) (); // { dg-warning "conflicts with attribute" }
+
+struct s {
+  void __attribute__((arm_streaming, arm_streaming_compatible)) (*contradiction_callback_ptr_1) (); // { dg-warning "conflicts with attribute" }
+  void __attribute__((arm_streaming_compatible, arm_streaming)) (*contradiction_callback_ptr_2) (); // { dg-warning "conflicts with attribute" }
+};
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_2.c b/gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_2.c
new file mode 100644
index 00000000000..448ddb5feb1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_2.c
@@ -0,0 +1,25 @@
+// { dg-options "" }
+
+void __attribute__((arm_streaming_compatible)) sc_fn ();
+void __attribute__((arm_streaming)) s_fn ();
+void ns_fn ();
+
+__attribute__((arm_streaming_compatible)) void (*sc_fn_ptr) ();
+__attribute__((arm_streaming)) void (*s_fn_ptr) ();
+void (*ns_fn_ptr) ();
+
+void
+f ()
+{
+  sc_fn_ptr = sc_fn;
+  sc_fn_ptr = s_fn; // { dg-warning "incompatible pointer type" }
+  sc_fn_ptr = ns_fn; // { dg-warning "incompatible pointer type" }
+
+  s_fn_ptr = sc_fn; // { dg-warning "incompatible pointer type" }
+  s_fn_ptr = s_fn;
+  s_fn_ptr = ns_fn; // { dg-warning "incompatible pointer type" }
+
+  ns_fn_ptr = sc_fn; // { dg-warning "incompatible pointer type" }
+  ns_fn_ptr = s_fn; // { dg-warning "incompatible pointer type" }
+  ns_fn_ptr = ns_fn;
+}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 02/16] aarch64: Add +sme
  2022-11-13  9:59 [PATCH 00/16] aarch64: Add support for SME Richard Sandiford
  2022-11-13  9:59 ` [PATCH 01/16] aarch64: Add arm_streaming(_compatible) attributes Richard Sandiford
@ 2022-11-13 10:00 ` Richard Sandiford
  2022-11-13 10:00 ` [PATCH 03/16] aarch64: Distinguish streaming-compatible AdvSIMD insns Richard Sandiford
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Richard Sandiford @ 2022-11-13 10:00 UTC (permalink / raw)
  To: gcc-patches

This patch adds the +sme ISA feature and requires it to be present
when compiling arm_streaming code.  (arm_streaming_compatible code
does not necessarily assume the presence of SME.  It just has to
work when SME is present and streaming mode is enabled.)
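To illustrate the behaviour this patch enforces, here is a minimal
sketch (function names are illustrative, not taken from the patch;
see streaming_mode_3.c below for the real tests):

  /* Compiled without +sme: a streaming function definition is
     rejected, while a streaming-compatible one is accepted, since
     the latter only has to *tolerate* streaming mode.  */
  __attribute__((arm_streaming_compatible)) void sc_fn (void) {} // OK
  __attribute__((arm_streaming)) void s_fn (void) {}
  // error: streaming functions require the ISA extension 'sme'

  /* Enabling +sme via the target pragma makes it valid.  */
  #pragma GCC target "+sme"
  __attribute__((arm_streaming)) void s_fn2 (void) {} // OK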

gcc/
	* doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst:
	Document SME.
	* doc/gccint/testsuites/directives-used-within-dejagnu-tests/keywords-describing-target-attributes.rst:
	Document aarch64_sme.
	* config/aarch64/aarch64-option-extensions.def (sme): Define.
	* config/aarch64/aarch64.h (AARCH64_ISA_SME): New macro.
	* config/aarch64/aarch64.cc (aarch64_override_options_internal):
	Ensure that SME is present when compiling streaming code.
	(aarch64_start_call_args): New function.  Report an error if
	a streaming call is made without SME enabled.
	(TARGET_START_CALL_ARGS): Define.

gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_aarch64_sme): New
	target test.
	* gcc.target/aarch64/sme/aarch64-sme.exp: Force SME to be enabled
	if it isn't by default.
	* gcc.target/aarch64/sme/streaming_mode_3.c: New test.
	* gcc.target/aarch64/sme/streaming_mode_4.c: Likewise.
---
 .../aarch64/aarch64-option-extensions.def     |  2 +
 gcc/config/aarch64/aarch64.cc                 | 33 ++++++++++
 gcc/config/aarch64/aarch64.h                  |  1 +
 .../aarch64-options.rst                       |  3 +
 .../keywords-describing-target-attributes.rst |  3 +
 .../gcc.target/aarch64/sme/aarch64-sme.exp    | 10 ++-
 .../gcc.target/aarch64/sme/streaming_mode_3.c | 63 +++++++++++++++++++
 .../gcc.target/aarch64/sme/streaming_mode_4.c | 22 +++++++
 gcc/testsuite/lib/target-supports.exp         | 12 ++++
 9 files changed, 147 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_4.c

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index bdf4baf309c..402a9832f87 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -129,6 +129,8 @@ AARCH64_OPT_EXTENSION("sve2-sha3", SVE2_SHA3, (SVE2, SHA3), (), (), "svesha3")
 AARCH64_OPT_EXTENSION("sve2-bitperm", SVE2_BITPERM, (SVE2), (), (),
 		      "svebitperm")
 
+AARCH64_OPT_EXTENSION("sme", SME, (SVE2), (), (), "sme")
+
 AARCH64_OPT_EXTENSION("tme", TME, (), (), (), "")
 
 AARCH64_OPT_EXTENSION("i8mm", I8MM, (SIMD), (), (), "i8mm")
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index a2e910daddf..fc6f0bc208a 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -11374,6 +11374,23 @@ aarch64_fixed_condition_code_regs (unsigned int *p1, unsigned int *p2)
   return true;
 }
 
+/* Implement TARGET_START_CALL_ARGS.  */
+
+static void
+aarch64_start_call_args (cumulative_args_t ca_v)
+{
+  CUMULATIVE_ARGS *ca = get_cumulative_args (ca_v);
+
+  if (!TARGET_SME && (ca->isa_mode & AARCH64_FL_SM_ON))
+    {
+      error ("calling a streaming function requires the ISA extension %qs",
+	     "sme");
+      inform (input_location, "you can enable %qs using the command-line"
+	      " option %<-march%>, or by using the %<target%>"
+	      " attribute or pragma", "sme");
+    }
+}
+
 /* This function is used by the call expanders of the machine description.
    RESULT is the register in which the result is returned.  It's NULL for
    "call" and "sibcall".
@@ -17865,6 +17882,19 @@ aarch64_override_options_internal (struct gcc_options *opts)
       && !fixed_regs[R18_REGNUM])
     error ("%<-fsanitize=shadow-call-stack%> requires %<-ffixed-x18%>");
 
+  if ((opts->x_aarch64_isa_flags & AARCH64_FL_SM_ON)
+      && !(opts->x_aarch64_isa_flags & AARCH64_FL_SME))
+    {
+      error ("streaming functions require the ISA extension %qs", "sme");
+      inform (input_location, "you can enable %qs using the command-line"
+	      " option %<-march%>, or by using the %<target%>"
+	      " attribute or pragma", "sme");
+      opts->x_target_flags &= ~MASK_GENERAL_REGS_ONLY;
+      auto new_flags = (opts->x_aarch64_asm_isa_flags
+			| feature_deps::SME ().enable);
+      aarch64_set_asm_isa_flags (opts, new_flags);
+    }
+
   initialize_aarch64_code_model (opts);
   initialize_aarch64_tls_size (opts);
 
@@ -27721,6 +27751,9 @@ aarch64_run_selftests (void)
 #undef TARGET_FUNCTION_VALUE_REGNO_P
 #define TARGET_FUNCTION_VALUE_REGNO_P aarch64_function_value_regno_p
 
+#undef TARGET_START_CALL_ARGS
+#define TARGET_START_CALL_ARGS aarch64_start_call_args
+
 #undef TARGET_GIMPLE_FOLD_BUILTIN
 #define TARGET_GIMPLE_FOLD_BUILTIN aarch64_gimple_fold_builtin
 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 1ac37b902bf..c47f27eefec 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -214,6 +214,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
 #define AARCH64_ISA_SVE2_BITPERM  (aarch64_isa_flags & AARCH64_FL_SVE2_BITPERM)
 #define AARCH64_ISA_SVE2_SHA3	   (aarch64_isa_flags & AARCH64_FL_SVE2_SHA3)
 #define AARCH64_ISA_SVE2_SM4	   (aarch64_isa_flags & AARCH64_FL_SVE2_SM4)
+#define AARCH64_ISA_SME		   (aarch64_isa_flags & AARCH64_FL_SME)
 #define AARCH64_ISA_V8_3A	   (aarch64_isa_flags & AARCH64_FL_V8_3A)
 #define AARCH64_ISA_DOTPROD	   (aarch64_isa_flags & AARCH64_FL_DOTPROD)
 #define AARCH64_ISA_AES	           (aarch64_isa_flags & AARCH64_FL_AES)
diff --git a/gcc/doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst b/gcc/doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst
index c2b23a6ee97..f6d82f4435b 100644
--- a/gcc/doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst
+++ b/gcc/doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst
@@ -544,6 +544,9 @@ the following and their inverses no :samp:`{feature}` :
 :samp:`pauth`
   Enable the Pointer Authentication Extension.
 
+:samp:`sme`
+  Enable the Scalable Matrix Extension.
+
 Feature ``crypto`` implies ``aes``, ``sha2``, and ``simd``,
 which implies ``fp``.
 Conversely, ``nofp`` implies ``nosimd``, which implies
diff --git a/gcc/doc/gccint/testsuites/directives-used-within-dejagnu-tests/keywords-describing-target-attributes.rst b/gcc/doc/gccint/testsuites/directives-used-within-dejagnu-tests/keywords-describing-target-attributes.rst
index 709e4ea2b90..84822b4335c 100644
--- a/gcc/doc/gccint/testsuites/directives-used-within-dejagnu-tests/keywords-describing-target-attributes.rst
+++ b/gcc/doc/gccint/testsuites/directives-used-within-dejagnu-tests/keywords-describing-target-attributes.rst
@@ -886,6 +886,9 @@ AArch64-specific attributes
   AArch64 target that is able to generate and execute armv8.3-a FJCVTZS
   instruction.
 
+``aarch64_sme``
+  AArch64 target that generates instructions for SME.
+
 MIPS-specific attributes
 ~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/aarch64-sme.exp b/gcc/testsuite/gcc.target/aarch64/sme/aarch64-sme.exp
index c542912e14a..b3ad2ea4c5e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sme/aarch64-sme.exp
+++ b/gcc/testsuite/gcc.target/aarch64/sme/aarch64-sme.exp
@@ -31,10 +31,16 @@ load_lib gcc-dg.exp
 # Initialize `dg'.
 dg-init
 
-aarch64-with-arch-dg-options "" {
+if { [check_effective_target_aarch64_sme] } {
+    set sme_flags ""
+} else {
+    set sme_flags "-march=armv8.2-a+sme"
+}
+
+aarch64-with-arch-dg-options $sme_flags {
     # Main loop.
     dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cCS\]]] \
-	"" ""
+	"" $sme_flags
 }
 
 # All done.
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_3.c b/gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_3.c
new file mode 100644
index 00000000000..926ffa24e45
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_3.c
@@ -0,0 +1,63 @@
+// { dg-options "" }
+
+#pragma GCC target "+nosme"
+
+void __attribute__((arm_streaming_compatible)) sc_a () {}
+void __attribute__((arm_streaming)) s_a () {} // { dg-error "streaming functions require the ISA extension 'sme'" }
+void ns_a () {}
+
+void __attribute__((arm_streaming_compatible)) sc_b () {}
+void ns_b () {}
+void __attribute__((arm_streaming)) s_b () {} // { dg-error "streaming functions require the ISA extension 'sme'" }
+
+void __attribute__((arm_streaming_compatible)) sc_c () {}
+void __attribute__((arm_streaming_compatible)) sc_d () {}
+
+void __attribute__((arm_streaming)) s_c () {} // { dg-error "streaming functions require the ISA extension 'sme'" }
+void __attribute__((arm_streaming)) s_d () {} // { dg-error "streaming functions require the ISA extension 'sme'" }
+
+void ns_c () {}
+void ns_d () {}
+
+void __attribute__((arm_streaming_compatible)) sc_e ();
+void __attribute__((arm_streaming)) s_e ();
+void ns_e ();
+
+#pragma GCC target "+sme"
+
+void __attribute__((arm_streaming_compatible)) sc_f () {}
+void __attribute__((arm_streaming)) s_f () {}
+void ns_f () {}
+
+void __attribute__((arm_streaming_compatible)) sc_g () {}
+void ns_g () {}
+void __attribute__((arm_streaming)) s_g () {}
+
+void __attribute__((arm_streaming_compatible)) sc_h () {}
+void __attribute__((arm_streaming_compatible)) sc_i () {}
+
+void __attribute__((arm_streaming)) s_h () {}
+void __attribute__((arm_streaming)) s_i () {}
+
+void ns_h () {}
+void ns_i () {}
+
+void __attribute__((arm_streaming_compatible)) sc_j ();
+void __attribute__((arm_streaming)) s_j ();
+void ns_j ();
+
+#pragma GCC target "+sme"
+
+void __attribute__((arm_streaming_compatible)) sc_k () {}
+
+#pragma GCC target "+nosme"
+#pragma GCC target "+sme"
+
+void __attribute__((arm_streaming)) s_k () {}
+
+#pragma GCC target "+nosme"
+#pragma GCC target "+sme"
+
+void ns_k () {}
+
+#pragma GCC target "+nosme"
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_4.c b/gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_4.c
new file mode 100644
index 00000000000..d777d7ee0d9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_4.c
@@ -0,0 +1,22 @@
+// { dg-options "-mgeneral-regs-only" }
+
+void __attribute__((arm_streaming_compatible)) sc_a () {}
+void __attribute__((arm_streaming)) s_a () {} // { dg-error "streaming functions require the ISA extension 'sme'" }
+void ns_a () {}
+
+void __attribute__((arm_streaming_compatible)) sc_b () {}
+void ns_b () {}
+void __attribute__((arm_streaming)) s_b () {} // { dg-error "streaming functions require the ISA extension 'sme'" }
+
+void __attribute__((arm_streaming_compatible)) sc_c () {}
+void __attribute__((arm_streaming_compatible)) sc_d () {}
+
+void __attribute__((arm_streaming)) s_c () {} // { dg-error "streaming functions require the ISA extension 'sme'" }
+void __attribute__((arm_streaming)) s_d () {} // { dg-error "streaming functions require the ISA extension 'sme'" }
+
+void ns_c () {}
+void ns_d () {}
+
+void __attribute__((arm_streaming_compatible)) sc_e ();
+void __attribute__((arm_streaming)) s_e ();
+void ns_e ();
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index c7f583d6d14..f6cb16521b3 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3967,6 +3967,18 @@ proc aarch64_sve_bits { } {
     }]
 }
 
+# Return 1 if this is an AArch64 target that generates instructions for SME.
+proc check_effective_target_aarch64_sme { } {
+    if { ![istarget aarch64*-*-*] } {
+	return 0
+    }
+    return [check_no_compiler_messages aarch64_sme assembly {
+	#if !defined (__ARM_FEATURE_SME)
+	#error FOO
+	#endif
+    }]
+}
+
 # Return 1 if this is a compiler supporting ARC atomic operations
 proc check_effective_target_arc_atomic { } {
     return [check_no_compiler_messages arc_atomic assembly {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 03/16] aarch64: Distinguish streaming-compatible AdvSIMD insns
  2022-11-13  9:59 [PATCH 00/16] aarch64: Add support for SME Richard Sandiford
  2022-11-13  9:59 ` [PATCH 01/16] aarch64: Add arm_streaming(_compatible) attributes Richard Sandiford
  2022-11-13 10:00 ` [PATCH 02/16] aarch64: Add +sme Richard Sandiford
@ 2022-11-13 10:00 ` Richard Sandiford
  2022-11-13 10:00 ` [PATCH 04/16] aarch64: Mark relevant SVE instructions as non-streaming Richard Sandiford
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Richard Sandiford @ 2022-11-13 10:00 UTC (permalink / raw)
  To: gcc-patches

The vast majority of Advanced SIMD instructions are not
available in streaming mode, but some of the load/store/move
instructions are.  This patch adds a new target feature macro
called TARGET_BASE_SIMD for this streaming-compatible subset.

The vector-to-vector move instructions are not streaming-compatible,
so we need to use the SVE move instructions where enabled, or fall
back to the nofp16 handling otherwise.

I haven't found a good way of testing the SVE EXT alternative
in aarch64_simd_mov_from_<mode>high, but I'd rather provide it
than not.
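As a concrete example of the base-SIMD subset in action (mirroring
the new movdf_2.c test; the function name is illustrative):

  /* Streaming-compatible code can still move FP values between
     registers: GCC uses fmov (available in streaming mode) rather
     than an Advanced SIMD vector mov, which is not.  */
  double __attribute__((arm_streaming_compatible))
  fpr_to_fpr (double q0, double q1)
  {
    return q1;   /* fmov d0, d1 */
  }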

gcc/
	* config/aarch64/aarch64.h (TARGET_BASE_SIMD): New macro.
	(TARGET_SIMD): Require PSTATE.SM to be 0.
	(AARCH64_ISA_SM_OFF): New macro.
	* config/aarch64/aarch64.cc (aarch64_array_mode_supported_p):
	Allow Advanced SIMD structure modes for TARGET_BASE_SIMD.
	(aarch64_print_operand): Support '%Z'.
	(aarch64_secondary_reload): Expect SVE moves to be used for
	Advanced SIMD modes if SVE is enabled and non-streaming
	Advanced SIMD isn't.
	(aarch64_register_move_cost): Likewise.
	(aarch64_simd_container_mode): Extend Advanced SIMD mode
	handling to TARGET_BASE_SIMD.
	(aarch64_expand_cpymem): Expand commentary.
	* config/aarch64/aarch64.md (arches): Add base_simd.
	(arch_enabled): Handle it.
	(*mov<mode>_aarch64): Extend UMOV alternative to TARGET_BASE_SIMD.
	(*movti_aarch64): Use an SVE move instruction if non-streaming
	SIMD isn't available.
	(*mov<TFD:mode>_aarch64): Likewise.
	(load_pair_dw_tftf): Extend to TARGET_BASE_SIMD.
	(store_pair_dw_tftf): Likewise.
	(loadwb_pair<TX:mode>_<P:mode>): Likewise.
	(storewb_pair<TX:mode>_<P:mode>): Likewise.
	* config/aarch64/aarch64-simd.md (*aarch64_simd_mov<VDMOV:mode>):
	Allow UMOV in streaming mode.
	(*aarch64_simd_mov<VQMOV:mode>): Use an SVE move instruction
	if non-streaming SIMD isn't available.
	(aarch64_store_lane0<mode>): Depend on TARGET_FLOAT rather than
	TARGET_SIMD.
	(aarch64_simd_mov_from_<mode>low): Likewise.  Use fmov if
	Advanced SIMD is completely disabled.
	(aarch64_simd_mov_from_<mode>high): Use SVE EXT instructions if
	non-streaming SIMD isn't available.

gcc/testsuite/
	* gcc.target/aarch64/movdf_2.c: New test.
	* gcc.target/aarch64/movdi_3.c: Likewise.
	* gcc.target/aarch64/movhf_2.c: Likewise.
	* gcc.target/aarch64/movhi_2.c: Likewise.
	* gcc.target/aarch64/movqi_2.c: Likewise.
	* gcc.target/aarch64/movsf_2.c: Likewise.
	* gcc.target/aarch64/movsi_2.c: Likewise.
	* gcc.target/aarch64/movtf_3.c: Likewise.
	* gcc.target/aarch64/movtf_4.c: Likewise.
	* gcc.target/aarch64/movti_3.c: Likewise.
	* gcc.target/aarch64/movti_4.c: Likewise.
	* gcc.target/aarch64/movv16qi_4.c: Likewise.
	* gcc.target/aarch64/movv16qi_5.c: Likewise.
	* gcc.target/aarch64/movv8qi_4.c: Likewise.
	* gcc.target/aarch64/sme/arm_neon_1.c: Likewise.
	* gcc.target/aarch64/sme/arm_neon_2.c: Likewise.
	* gcc.target/aarch64/sme/arm_neon_3.c: Likewise.
---
 gcc/config/aarch64/aarch64-simd.md            | 43 ++++++----
 gcc/config/aarch64/aarch64.cc                 | 22 +++--
 gcc/config/aarch64/aarch64.h                  | 12 ++-
 gcc/config/aarch64/aarch64.md                 | 45 +++++-----
 gcc/testsuite/gcc.target/aarch64/movdf_2.c    | 51 +++++++++++
 gcc/testsuite/gcc.target/aarch64/movdi_3.c    | 59 +++++++++++++
 gcc/testsuite/gcc.target/aarch64/movhf_2.c    | 53 ++++++++++++
 gcc/testsuite/gcc.target/aarch64/movhi_2.c    | 61 +++++++++++++
 gcc/testsuite/gcc.target/aarch64/movqi_2.c    | 59 +++++++++++++
 gcc/testsuite/gcc.target/aarch64/movsf_2.c    | 51 +++++++++++
 gcc/testsuite/gcc.target/aarch64/movsi_2.c    | 59 +++++++++++++
 gcc/testsuite/gcc.target/aarch64/movtf_3.c    | 81 +++++++++++++++++
 gcc/testsuite/gcc.target/aarch64/movtf_4.c    | 78 +++++++++++++++++
 gcc/testsuite/gcc.target/aarch64/movti_3.c    | 86 +++++++++++++++++++
 gcc/testsuite/gcc.target/aarch64/movti_4.c    | 83 ++++++++++++++++++
 gcc/testsuite/gcc.target/aarch64/movv16qi_4.c | 82 ++++++++++++++++++
 gcc/testsuite/gcc.target/aarch64/movv16qi_5.c | 79 +++++++++++++++++
 gcc/testsuite/gcc.target/aarch64/movv8qi_4.c  | 55 ++++++++++++
 .../gcc.target/aarch64/sme/arm_neon_1.c       | 13 +++
 .../gcc.target/aarch64/sme/arm_neon_2.c       | 11 +++
 .../gcc.target/aarch64/sme/arm_neon_3.c       | 11 +++
 21 files changed, 1047 insertions(+), 47 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movdf_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movdi_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movhf_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movhi_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movqi_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movsf_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movsi_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movtf_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movtf_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movti_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movti_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movv16qi_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movv16qi_5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movv8qi_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/arm_neon_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/arm_neon_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/arm_neon_3.c

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 5386043739a..b6313cba172 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -133,7 +133,7 @@ (define_insn "*aarch64_simd_mov<VDMOV:mode>"
 	 return "mov\t%0.<Vbtype>, %1.<Vbtype>";
        return "fmov\t%d0, %d1";
      case 4:
-       if (TARGET_SIMD)
+       if (TARGET_BASE_SIMD)
 	 return "umov\t%0, %1.d[0]";
        return "fmov\t%x0, %d1";
      case 5: return "fmov\t%d0, %1";
@@ -152,9 +152,9 @@ (define_insn "*aarch64_simd_mov<VDMOV:mode>"
 
 (define_insn "*aarch64_simd_mov<VQMOV:mode>"
   [(set (match_operand:VQMOV 0 "nonimmediate_operand"
-		"=w, Umn,  m,  w, ?r, ?w, ?r, w,  w")
+		"=w, Umn, m,  w,  w, ?r, ?w, ?r,  w,  w")
 	(match_operand:VQMOV 1 "general_operand"
-		"m,  Dz, w,  w,  w,  r,  r, Dn, Dz"))]
+		 "m,  Dz, w,  w,  w,  w,  r,  r, Dn, Dz"))]
   "TARGET_FLOAT
    && (register_operand (operands[0], <MODE>mode)
        || aarch64_simd_reg_or_zero (operands[1], <MODE>mode))"
@@ -170,22 +170,24 @@ (define_insn "*aarch64_simd_mov<VQMOV:mode>"
     case 3:
 	return "mov\t%0.<Vbtype>, %1.<Vbtype>";
     case 4:
+	return "mov\t%Z0.d, %Z1.d";
     case 5:
     case 6:
-	return "#";
     case 7:
-	return aarch64_output_simd_mov_immediate (operands[1], 128);
+	return "#";
     case 8:
+	return aarch64_output_simd_mov_immediate (operands[1], 128);
+    case 9:
 	return "fmov\t%d0, xzr";
     default:
 	gcc_unreachable ();
     }
 }
   [(set_attr "type" "neon_load1_1reg<q>, store_16, neon_store1_1reg<q>,\
-		     neon_logic<q>, multiple, multiple,\
-		     multiple, neon_move<q>, fmov")
-   (set_attr "length" "4,4,4,4,8,8,8,4,4")
-   (set_attr "arch" "*,*,*,simd,*,*,*,simd,*")]
+		     neon_logic<q>, *, multiple, multiple,\
+		     multiple, neon_move<q>, f_mcr")
+   (set_attr "length" "4,4,4,4,4,8,8,8,4,4")
+   (set_attr "arch" "*,*,*,simd,sve,*,*,*,simd,*")]
 )
 
 ;; When storing lane zero we can use the normal STR and its more permissive
@@ -195,7 +197,7 @@ (define_insn "aarch64_store_lane0<mode>"
   [(set (match_operand:<VEL> 0 "memory_operand" "=m")
 	(vec_select:<VEL> (match_operand:VALL_F16 1 "register_operand" "w")
 			(parallel [(match_operand 2 "const_int_operand" "n")])))]
-  "TARGET_SIMD
+  "TARGET_FLOAT
    && ENDIAN_LANE_N (<nunits>, INTVAL (operands[2])) == 0"
   "str\\t%<Vetype>1, %0"
   [(set_attr "type" "neon_store1_1reg<q>")]
@@ -353,35 +355,38 @@ (define_expand "aarch64_get_high<mode>"
 )
 
 (define_insn_and_split "aarch64_simd_mov_from_<mode>low"
-  [(set (match_operand:<VHALF> 0 "register_operand" "=w,?r")
+  [(set (match_operand:<VHALF> 0 "register_operand" "=w,?r,?r")
         (vec_select:<VHALF>
-          (match_operand:VQMOV_NO2E 1 "register_operand" "w,w")
+          (match_operand:VQMOV_NO2E 1 "register_operand" "w,w,w")
           (match_operand:VQMOV_NO2E 2 "vect_par_cnst_lo_half" "")))]
-  "TARGET_SIMD"
+  "TARGET_FLOAT"
   "@
    #
-   umov\t%0, %1.d[0]"
+   umov\t%0, %1.d[0]
+   fmov\t%0, %d1"
   "&& reload_completed && aarch64_simd_register (operands[0], <VHALF>mode)"
   [(set (match_dup 0) (match_dup 1))]
   {
     operands[1] = aarch64_replace_reg_mode (operands[1], <VHALF>mode);
   }
-  [(set_attr "type" "mov_reg,neon_to_gp<q>")
+  [(set_attr "type" "mov_reg,neon_to_gp<q>,f_mrc")
+   (set_attr "arch" "simd,base_simd,*")
    (set_attr "length" "4")]
 )
 
 (define_insn "aarch64_simd_mov_from_<mode>high"
-  [(set (match_operand:<VHALF> 0 "register_operand" "=w,?r,?r")
+  [(set (match_operand:<VHALF> 0 "register_operand" "=w,w,?r,?r")
         (vec_select:<VHALF>
-          (match_operand:VQMOV_NO2E 1 "register_operand" "w,w,w")
+          (match_operand:VQMOV_NO2E 1 "register_operand" "w,0,w,w")
           (match_operand:VQMOV_NO2E 2 "vect_par_cnst_hi_half" "")))]
   "TARGET_FLOAT"
   "@
    dup\t%d0, %1.d[1]
+   ext\t%Z0.b, %Z0.b, %Z0.b, #8
    umov\t%0, %1.d[1]
    fmov\t%0, %1.d[1]"
-  [(set_attr "type" "neon_dup<q>,neon_to_gp<q>,f_mrc")
-   (set_attr "arch" "simd,simd,*")
+  [(set_attr "type" "neon_dup<q>,*,neon_to_gp<q>,f_mrc")
+   (set_attr "arch" "simd,sve,simd,*")
    (set_attr "length" "4")]
 )
 
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index fc6f0bc208a..36ef0435b4e 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -3726,7 +3726,7 @@ static bool
 aarch64_array_mode_supported_p (machine_mode mode,
 				unsigned HOST_WIDE_INT nelems)
 {
-  if (TARGET_SIMD
+  if (TARGET_BASE_SIMD
       && (AARCH64_VALID_SIMD_QREG_MODE (mode)
 	  || AARCH64_VALID_SIMD_DREG_MODE (mode))
       && (nelems >= 2 && nelems <= 4))
@@ -11876,6 +11876,10 @@ sizetochar (int size)
      'N':		Take the duplicated element in a vector constant
 			and print the negative of it in decimal.
      'b/h/s/d/q':	Print a scalar FP/SIMD register name.
+     'Z':		Same for SVE registers.  ('z' was already taken.)
+			Note that it is not necessary to use %Z for operands
+			that have SVE modes.  The convention is to use %Z
+			only for non-SVE (or potentially non-SVE) modes.
      'S/T/U/V':		Print a FP/SIMD register name for a register list.
 			The register printed is the FP/SIMD register name
 			of X + 0/1/2/3 for S/T/U/V.
@@ -12048,6 +12052,8 @@ aarch64_print_operand (FILE *f, rtx x, int code)
     case 's':
     case 'd':
     case 'q':
+    case 'Z':
+      code = TOLOWER (code);
       if (!REG_P (x) || !FP_REGNUM_P (REGNO (x)))
 	{
 	  output_operand_lossage ("incompatible floating point / vector register operand for '%%%c'", code);
@@ -12702,8 +12708,8 @@ aarch64_secondary_reload (bool in_p ATTRIBUTE_UNUSED, rtx x,
       return NO_REGS;
     }
 
-  /* Without the TARGET_SIMD instructions we cannot move a Q register
-     to a Q register directly.  We need a scratch.  */
+  /* Without the TARGET_SIMD or TARGET_SVE instructions we cannot move a
+     Q register to a Q register directly.  We need a scratch.  */
   if (REG_P (x)
       && (mode == TFmode
 	  || mode == TImode
@@ -15273,7 +15279,7 @@ aarch64_register_move_cost (machine_mode mode,
 	 secondary reload.  A general register is used as a scratch to move
 	 the upper DI value and the lower DI value is moved directly,
 	 hence the cost is the sum of three moves. */
-      if (! TARGET_SIMD)
+      if (!TARGET_SIMD && !TARGET_SVE)
 	return regmove_cost->GP2FP + regmove_cost->FP2GP + regmove_cost->FP2FP;
 
       return regmove_cost->FP2FP;
@@ -20773,7 +20779,7 @@ aarch64_simd_container_mode (scalar_mode mode, poly_int64 width)
     return aarch64_full_sve_mode (mode).else_mode (word_mode);
 
   gcc_assert (known_eq (width, 64) || known_eq (width, 128));
-  if (TARGET_SIMD)
+  if (TARGET_BASE_SIMD)
     {
       if (known_eq (width, 128))
 	return aarch64_vq_mode (mode).else_mode (word_mode);
@@ -24908,7 +24914,11 @@ aarch64_expand_cpymem (rtx *operands)
   int copy_bits = 256;
 
   /* Default to 256-bit LDP/STP on large copies, however small copies, no SIMD
-     support or slow 256-bit LDP/STP fall back to 128-bit chunks.  */
+     support or slow 256-bit LDP/STP fall back to 128-bit chunks.
+
+     ??? Although it would be possible to use LDP/STP Qn in streaming mode
+     (so using TARGET_BASE_SIMD instead of TARGET_SIMD), it isn't clear
+     whether that would improve performance.  */
   if (size <= 24
       || !TARGET_SIMD
       || (aarch64_tune_params.extra_tuning_flags
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index c47f27eefec..398cc03fd1f 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -61,8 +61,15 @@
 #define WORDS_BIG_ENDIAN (BYTES_BIG_ENDIAN)
 
 /* AdvSIMD is supported in the default configuration, unless disabled by
-   -mgeneral-regs-only or by the +nosimd extension.  */
-#define TARGET_SIMD (AARCH64_ISA_SIMD)
+   -mgeneral-regs-only or by the +nosimd extension.  The set of available
+   instructions is then subdivided into:
+
+   - the "base" set, available both in SME streaming mode and in
+     non-streaming mode
+
+   - the full set, available only in non-streaming mode.  */
+#define TARGET_BASE_SIMD (AARCH64_ISA_SIMD)
+#define TARGET_SIMD (AARCH64_ISA_SIMD && AARCH64_ISA_SM_OFF)
 #define TARGET_FLOAT (AARCH64_ISA_FP)
 
 #define UNITS_PER_WORD		8
@@ -199,6 +206,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
 
 /* Macros to test ISA flags.  */
 
+#define AARCH64_ISA_SM_OFF         (aarch64_isa_flags & AARCH64_FL_SM_OFF)
 #define AARCH64_ISA_MODE           (aarch64_isa_flags & AARCH64_FL_ISA_MODES)
 #define AARCH64_ISA_CRC            (aarch64_isa_flags & AARCH64_FL_CRC)
 #define AARCH64_ISA_CRYPTO         (aarch64_isa_flags & AARCH64_FL_CRYPTO)
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index cd6d5e5000c..3dc877ba9fe 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -374,7 +374,7 @@ (define_constants
 ;; As a convenience, "fp_q" means "fp" + the ability to move between
 ;; Q registers and is equivalent to "simd".
 
-(define_enum "arches" [ any rcpc8_4 fp fp_q simd sve fp16])
+(define_enum "arches" [any rcpc8_4 fp fp_q base_simd simd sve fp16])
 
 (define_enum_attr "arch" "arches" (const_string "any"))
 
@@ -402,6 +402,9 @@ (define_attr "arch_enabled" "no,yes"
 	(and (eq_attr "arch" "fp")
 	     (match_test "TARGET_FLOAT"))
 
+	(and (eq_attr "arch" "base_simd")
+	     (match_test "TARGET_BASE_SIMD"))
+
 	(and (eq_attr "arch" "fp_q, simd")
 	     (match_test "TARGET_SIMD"))
 
@@ -1215,7 +1218,7 @@ (define_insn "*mov<mode>_aarch64"
      case 8:
        return "str\t%<size>1, %0";
      case 9:
-       return TARGET_SIMD ? "umov\t%w0, %1.<v>[0]" : "fmov\t%w0, %s1";
+       return TARGET_BASE_SIMD ? "umov\t%w0, %1.<v>[0]" : "fmov\t%w0, %s1";
      case 10:
        return TARGET_SIMD ? "dup\t%0.<Vallxd>, %w1" : "fmov\t%s0, %w1";
      case 11:
@@ -1395,9 +1398,9 @@ (define_expand "movti"
 
 (define_insn "*movti_aarch64"
   [(set (match_operand:TI 0
-	 "nonimmediate_operand"  "=   r,w,w,w, r,w,r,m,m,w,m")
+	 "nonimmediate_operand"  "=   r,w,w,w, r,w,w,r,m,m,w,m")
 	(match_operand:TI 1
-	 "aarch64_movti_operand" " rUti,Z,Z,r, w,w,m,r,Z,m,w"))]
+	 "aarch64_movti_operand" " rUti,Z,Z,r, w,w,w,m,r,Z,m,w"))]
   "(register_operand (operands[0], TImode)
     || aarch64_reg_or_zero (operands[1], TImode))"
   "@
@@ -1407,16 +1410,17 @@ (define_insn "*movti_aarch64"
    #
    #
    mov\\t%0.16b, %1.16b
+   mov\\t%Z0.d, %Z1.d
    ldp\\t%0, %H0, %1
    stp\\t%1, %H1, %0
    stp\\txzr, xzr, %0
    ldr\\t%q0, %1
    str\\t%q1, %0"
-  [(set_attr "type" "multiple,neon_move,f_mcr,f_mcr,f_mrc,neon_logic_q, \
+  [(set_attr "type" "multiple,neon_move,f_mcr,f_mcr,f_mrc,neon_logic_q,*,\
 		             load_16,store_16,store_16,\
                              load_16,store_16")
-   (set_attr "length" "8,4,4,8,8,4,4,4,4,4,4")
-   (set_attr "arch" "*,simd,*,*,*,simd,*,*,*,fp,fp")]
+   (set_attr "length" "8,4,4,8,8,4,4,4,4,4,4,4")
+   (set_attr "arch" "*,simd,*,*,*,simd,sve,*,*,*,fp,fp")]
 )
 
 ;; Split a TImode register-register or register-immediate move into
@@ -1552,13 +1556,14 @@ (define_split
 
 (define_insn "*mov<mode>_aarch64"
   [(set (match_operand:TFD 0
-	 "nonimmediate_operand" "=w,?r ,w ,?r,w,?w,w,m,?r,m ,m")
+	 "nonimmediate_operand" "=w,w,?r ,w ,?r,w,?w,w,m,?r,m ,m")
 	(match_operand:TFD 1
-	 "general_operand"      " w,?rY,?r,w ,Y,Y ,m,w,m ,?r,Y"))]
+	 "general_operand"      " w,w,?rY,?r,w ,Y,Y ,m,w,m ,?r,Y"))]
   "TARGET_FLOAT && (register_operand (operands[0], <MODE>mode)
     || aarch64_reg_or_fp_zero (operands[1], <MODE>mode))"
   "@
    mov\\t%0.16b, %1.16b
+   mov\\t%Z0.d, %Z1.d
    #
    #
    #
@@ -1569,10 +1574,10 @@ (define_insn "*mov<mode>_aarch64"
    ldp\\t%0, %H0, %1
    stp\\t%1, %H1, %0
    stp\\txzr, xzr, %0"
-  [(set_attr "type" "logic_reg,multiple,f_mcr,f_mrc,neon_move_q,f_mcr,\
+  [(set_attr "type" "logic_reg,*,multiple,f_mcr,f_mrc,neon_move_q,f_mcr,\
                      f_loadd,f_stored,load_16,store_16,store_16")
-   (set_attr "length" "4,8,8,8,4,4,4,4,4,4,4")
-   (set_attr "arch" "simd,*,*,*,simd,*,*,*,*,*,*")]
+   (set_attr "length" "4,4,8,8,8,4,4,4,4,4,4,4")
+   (set_attr "arch" "simd,sve,*,*,*,simd,*,*,*,*,*,*")]
 )
 
 (define_split
@@ -1756,7 +1761,7 @@ (define_insn "load_pair_dw_tftf"
 	(match_operand:TF 1 "aarch64_mem_pair_operand" "Ump"))
    (set (match_operand:TF 2 "register_operand" "=w")
 	(match_operand:TF 3 "memory_operand" "m"))]
-   "TARGET_SIMD
+   "TARGET_BASE_SIMD
     && rtx_equal_p (XEXP (operands[3], 0),
 		    plus_constant (Pmode,
 				   XEXP (operands[1], 0),
@@ -1806,11 +1811,11 @@ (define_insn "store_pair_dw_tftf"
 	(match_operand:TF 1 "register_operand" "w"))
    (set (match_operand:TF 2 "memory_operand" "=m")
 	(match_operand:TF 3 "register_operand" "w"))]
-   "TARGET_SIMD &&
-    rtx_equal_p (XEXP (operands[2], 0),
-		 plus_constant (Pmode,
-				XEXP (operands[0], 0),
-				GET_MODE_SIZE (TFmode)))"
+   "TARGET_BASE_SIMD
+    && rtx_equal_p (XEXP (operands[2], 0),
+		    plus_constant (Pmode,
+				   XEXP (operands[0], 0),
+				   GET_MODE_SIZE (TFmode)))"
   "stp\\t%q1, %q3, %z0"
   [(set_attr "type" "neon_stp_q")
    (set_attr "fp" "yes")]
@@ -1858,7 +1863,7 @@ (define_insn "loadwb_pair<TX:mode>_<P:mode>"
      (set (match_operand:TX 3 "register_operand" "=w")
           (mem:TX (plus:P (match_dup 1)
 			  (match_operand:P 5 "const_int_operand" "n"))))])]
-  "TARGET_SIMD && INTVAL (operands[5]) == GET_MODE_SIZE (<TX:MODE>mode)"
+  "TARGET_BASE_SIMD && INTVAL (operands[5]) == GET_MODE_SIZE (<TX:MODE>mode)"
   "ldp\\t%q2, %q3, [%1], %4"
   [(set_attr "type" "neon_ldp_q")]
 )
@@ -1908,7 +1913,7 @@ (define_insn "storewb_pair<TX:mode>_<P:mode>"
      (set (mem:TX (plus:P (match_dup 0)
 			  (match_operand:P 5 "const_int_operand" "n")))
           (match_operand:TX 3 "register_operand" "w"))])]
-  "TARGET_SIMD
+  "TARGET_BASE_SIMD
    && INTVAL (operands[5])
       == INTVAL (operands[4]) + GET_MODE_SIZE (<TX:MODE>mode)"
   "stp\\t%q2, %q3, [%0, %4]!"
diff --git a/gcc/testsuite/gcc.target/aarch64/movdf_2.c b/gcc/testsuite/gcc.target/aarch64/movdf_2.c
new file mode 100644
index 00000000000..c2454d2c83e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movdf_2.c
@@ -0,0 +1,51 @@
+/* { dg-do assemble } */
+/* { dg-options "-O --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/*
+** fpr_to_fpr:
+**	fmov	d0, d1
+**	ret
+*/
+double __attribute__((arm_streaming_compatible))
+fpr_to_fpr (double q0, double q1)
+{
+  return q1;
+}
+
+/*
+** gpr_to_fpr:
+**	fmov	d0, x0
+**	ret
+*/
+double __attribute__((arm_streaming_compatible))
+gpr_to_fpr ()
+{
+  register double x0 asm ("x0");
+  asm volatile ("" : "=r" (x0));
+  return x0;
+}
+
+/*
+** zero_to_fpr:
+**	fmov	d0, xzr
+**	ret
+*/
+double __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  return 0;
+}
+
+/*
+** fpr_to_gpr:
+**	fmov	x0, d0
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_gpr (double q0)
+{
+  register double x0 asm ("x0");
+  x0 = q0;
+  asm volatile ("" :: "r" (x0));
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movdi_3.c b/gcc/testsuite/gcc.target/aarch64/movdi_3.c
new file mode 100644
index 00000000000..5d369b27356
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movdi_3.c
@@ -0,0 +1,59 @@
+/* { dg-do assemble } */
+/* { dg-options "-O --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#include <stdint.h>
+
+/*
+** fpr_to_fpr:
+**	fmov	d0, d1
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_fpr (void)
+{
+  register uint64_t q0 asm ("q0");
+  register uint64_t q1 asm ("q1");
+  asm volatile ("" : "=w" (q1));
+  q0 = q1;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** gpr_to_fpr:
+**	fmov	d0, x0
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+gpr_to_fpr (uint64_t x0)
+{
+  register uint64_t q0 asm ("q0");
+  q0 = x0;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** zero_to_fpr:
+**	fmov	d0, xzr
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  register uint64_t q0 asm ("q0");
+  q0 = 0;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** fpr_to_gpr:
+**	fmov	x0, d0
+**	ret
+*/
+uint64_t __attribute__((arm_streaming_compatible))
+fpr_to_gpr ()
+{
+  register uint64_t q0 asm ("q0");
+  asm volatile ("" : "=w" (q0));
+  return q0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movhf_2.c b/gcc/testsuite/gcc.target/aarch64/movhf_2.c
new file mode 100644
index 00000000000..cf3af357b84
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movhf_2.c
@@ -0,0 +1,53 @@
+/* { dg-do assemble } */
+/* { dg-options "-O --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC target "+nothing+simd"
+
+/*
+** fpr_to_fpr:
+**	fmov	s0, s1
+**	ret
+*/
+_Float16 __attribute__((arm_streaming_compatible))
+fpr_to_fpr (_Float16 q0, _Float16 q1)
+{
+  return q1;
+}
+
+/*
+** gpr_to_fpr:
+**	fmov	s0, w0
+**	ret
+*/
+_Float16 __attribute__((arm_streaming_compatible))
+gpr_to_fpr ()
+{
+  register _Float16 w0 asm ("w0");
+  asm volatile ("" : "=r" (w0));
+  return w0;
+}
+
+/*
+** zero_to_fpr:
+**	fmov	s0, wzr
+**	ret
+*/
+_Float16 __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  return 0;
+}
+
+/*
+** fpr_to_gpr:
+**	fmov	w0, s0
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_gpr (_Float16 q0)
+{
+  register _Float16 w0 asm ("w0");
+  w0 = q0;
+  asm volatile ("" :: "r" (w0));
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movhi_2.c b/gcc/testsuite/gcc.target/aarch64/movhi_2.c
new file mode 100644
index 00000000000..108923449b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movhi_2.c
@@ -0,0 +1,61 @@
+/* { dg-do assemble } */
+/* { dg-options "-O --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC target "+nothing+simd"
+
+#include <stdint.h>
+
+/*
+** fpr_to_fpr:
+**	fmov	s0, s1
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_fpr (void)
+{
+  register uint16_t q0 asm ("q0");
+  register uint16_t q1 asm ("q1");
+  asm volatile ("" : "=w" (q1));
+  q0 = q1;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** gpr_to_fpr:
+**	fmov	s0, w0
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+gpr_to_fpr (uint16_t w0)
+{
+  register uint16_t q0 asm ("q0");
+  q0 = w0;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** zero_to_fpr:
+**	fmov	s0, wzr
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  register uint16_t q0 asm ("q0");
+  q0 = 0;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** fpr_to_gpr:
+**	umov	w0, v0.h\[0\]
+**	ret
+*/
+uint16_t __attribute__((arm_streaming_compatible))
+fpr_to_gpr ()
+{
+  register uint16_t q0 asm ("q0");
+  asm volatile ("" : "=w" (q0));
+  return q0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movqi_2.c b/gcc/testsuite/gcc.target/aarch64/movqi_2.c
new file mode 100644
index 00000000000..a28547d2ba3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movqi_2.c
@@ -0,0 +1,59 @@
+/* { dg-do assemble } */
+/* { dg-options "-O --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#include <stdint.h>
+
+/*
+** fpr_to_fpr:
+**	fmov	s0, s1
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_fpr (void)
+{
+  register uint8_t q0 asm ("q0");
+  register uint8_t q1 asm ("q1");
+  asm volatile ("" : "=w" (q1));
+  q0 = q1;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** gpr_to_fpr:
+**	fmov	s0, w0
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+gpr_to_fpr (uint8_t w0)
+{
+  register uint8_t q0 asm ("q0");
+  q0 = w0;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** zero_to_fpr:
+**	fmov	s0, wzr
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  register uint8_t q0 asm ("q0");
+  q0 = 0;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** fpr_to_gpr:
+**	umov	w0, v0.b\[0\]
+**	ret
+*/
+uint8_t __attribute__((arm_streaming_compatible))
+fpr_to_gpr ()
+{
+  register uint8_t q0 asm ("q0");
+  asm volatile ("" : "=w" (q0));
+  return q0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movsf_2.c b/gcc/testsuite/gcc.target/aarch64/movsf_2.c
new file mode 100644
index 00000000000..53abd380510
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movsf_2.c
@@ -0,0 +1,51 @@
+/* { dg-do assemble } */
+/* { dg-options "-O --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/*
+** fpr_to_fpr:
+**	fmov	s0, s1
+**	ret
+*/
+float __attribute__((arm_streaming_compatible))
+fpr_to_fpr (float q0, float q1)
+{
+  return q1;
+}
+
+/*
+** gpr_to_fpr:
+**	fmov	s0, w0
+**	ret
+*/
+float __attribute__((arm_streaming_compatible))
+gpr_to_fpr ()
+{
+  register float w0 asm ("w0");
+  asm volatile ("" : "=r" (w0));
+  return w0;
+}
+
+/*
+** zero_to_fpr:
+**	fmov	s0, wzr
+**	ret
+*/
+float __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  return 0;
+}
+
+/*
+** fpr_to_gpr:
+**	fmov	w0, s0
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_gpr (float q0)
+{
+  register float w0 asm ("w0");
+  w0 = q0;
+  asm volatile ("" :: "r" (w0));
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movsi_2.c b/gcc/testsuite/gcc.target/aarch64/movsi_2.c
new file mode 100644
index 00000000000..a0159d3fc1e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movsi_2.c
@@ -0,0 +1,59 @@
+/* { dg-do assemble } */
+/* { dg-options "-O --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#include <stdint.h>
+
+/*
+** fpr_to_fpr:
+**	fmov	s0, s1
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_fpr (void)
+{
+  register uint32_t q0 asm ("q0");
+  register uint32_t q1 asm ("q1");
+  asm volatile ("" : "=w" (q1));
+  q0 = q1;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** gpr_to_fpr:
+**	fmov	s0, w0
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+gpr_to_fpr (uint32_t w0)
+{
+  register uint32_t q0 asm ("q0");
+  q0 = w0;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** zero_to_fpr:
+**	fmov	s0, wzr
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  register uint32_t q0 asm ("q0");
+  q0 = 0;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** fpr_to_gpr:
+**	fmov	w0, s0
+**	ret
+*/
+uint32_t __attribute__((arm_streaming_compatible))
+fpr_to_gpr ()
+{
+  register uint32_t q0 asm ("q0");
+  asm volatile ("" : "=w" (q0));
+  return q0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movtf_3.c b/gcc/testsuite/gcc.target/aarch64/movtf_3.c
new file mode 100644
index 00000000000..d38f59e2a1f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movtf_3.c
@@ -0,0 +1,81 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target large_long_double } */
+/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC target "+nosve"
+
+/*
+** fpr_to_fpr:
+**	sub	sp, sp, #16
+**	str	q1, \[sp\]
+**	ldr	q0, \[sp\]
+**	add	sp, sp, #?16
+**	ret
+*/
+long double __attribute__((arm_streaming_compatible))
+fpr_to_fpr (long double q0, long double q1)
+{
+  return q1;
+}
+
+/*
+** gpr_to_fpr:	{ target aarch64_little_endian }
+**	fmov	d0, x0
+**	fmov	v0.d\[1\], x1
+**	ret
+*/
+/*
+** gpr_to_fpr:	{ target aarch64_big_endian }
+**	fmov	d0, x1
+**	fmov	v0.d\[1\], x0
+**	ret
+*/
+long double __attribute__((arm_streaming_compatible))
+gpr_to_fpr ()
+{
+  register long double x0 asm ("x0");
+  asm volatile ("" : "=r" (x0));
+  return x0;
+}
+
+/*
+** zero_to_fpr:
+**	fmov	s0, wzr
+**	ret
+*/
+long double __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  return 0;
+}
+
+/*
+** fpr_to_gpr:	{ target aarch64_little_endian }
+** (
+**	fmov	x0, d0
+**	fmov	x1, v0.d\[1\]
+** |
+**	fmov	x1, v0.d\[1\]
+**	fmov	x0, d0
+** )
+**	ret
+*/
+/*
+** fpr_to_gpr:	{ target aarch64_big_endian }
+** (
+**	fmov	x1, d0
+**	fmov	x0, v0.d\[1\]
+** |
+**	fmov	x0, v0.d\[1\]
+**	fmov	x1, d0
+** )
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_gpr (long double q0)
+{
+  register long double x0 asm ("x0");
+  x0 = q0;
+  asm volatile ("" :: "r" (x0));
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movtf_4.c b/gcc/testsuite/gcc.target/aarch64/movtf_4.c
new file mode 100644
index 00000000000..5b7486c7887
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movtf_4.c
@@ -0,0 +1,78 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target large_long_double } */
+/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC target "+sve"
+
+/*
+** fpr_to_fpr:
+**	mov	z0.d, z1.d
+**	ret
+*/
+long double __attribute__((arm_streaming_compatible))
+fpr_to_fpr (long double q0, long double q1)
+{
+  return q1;
+}
+
+/*
+** gpr_to_fpr:	{ target aarch64_little_endian }
+**	fmov	d0, x0
+**	fmov	v0.d\[1\], x1
+**	ret
+*/
+/*
+** gpr_to_fpr:	{ target aarch64_big_endian }
+**	fmov	d0, x1
+**	fmov	v0.d\[1\], x0
+**	ret
+*/
+long double __attribute__((arm_streaming_compatible))
+gpr_to_fpr ()
+{
+  register long double x0 asm ("x0");
+  asm volatile ("" : "=r" (x0));
+  return x0;
+}
+
+/*
+** zero_to_fpr:
+**	fmov	s0, wzr
+**	ret
+*/
+long double __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  return 0;
+}
+
+/*
+** fpr_to_gpr:	{ target aarch64_little_endian }
+** (
+**	fmov	x0, d0
+**	fmov	x1, v0.d\[1\]
+** |
+**	fmov	x1, v0.d\[1\]
+**	fmov	x0, d0
+** )
+**	ret
+*/
+/*
+** fpr_to_gpr:	{ target aarch64_big_endian }
+** (
+**	fmov	x1, d0
+**	fmov	x0, v0.d\[1\]
+** |
+**	fmov	x0, v0.d\[1\]
+**	fmov	x1, d0
+** )
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_gpr (long double q0)
+{
+  register long double x0 asm ("x0");
+  x0 = q0;
+  asm volatile ("" :: "r" (x0));
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movti_3.c b/gcc/testsuite/gcc.target/aarch64/movti_3.c
new file mode 100644
index 00000000000..d846b09497e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movti_3.c
@@ -0,0 +1,86 @@
+/* { dg-do assemble } */
+/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC target "+nosve"
+
+/*
+** fpr_to_fpr:
+**	sub	sp, sp, #16
+**	str	q1, \[sp\]
+**	ldr	q0, \[sp\]
+**	add	sp, sp, #?16
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_fpr (void)
+{
+  register __int128_t q0 asm ("q0");
+  register __int128_t q1 asm ("q1");
+  asm volatile ("" : "=w" (q1));
+  q0 = q1;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** gpr_to_fpr:	{ target aarch64_little_endian }
+**	fmov	d0, x0
+**	fmov	v0.d\[1\], x1
+**	ret
+*/
+/*
+** gpr_to_fpr:	{ target aarch64_big_endian }
+**	fmov	d0, x1
+**	fmov	v0.d\[1\], x0
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+gpr_to_fpr (__int128_t x0)
+{
+  register __int128_t q0 asm ("q0");
+  q0 = x0;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** zero_to_fpr:
+**	fmov	d0, xzr
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  register __int128_t q0 asm ("q0");
+  q0 = 0;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** fpr_to_gpr:	{ target aarch64_little_endian }
+** (
+**	fmov	x0, d0
+**	fmov	x1, v0.d\[1\]
+** |
+**	fmov	x1, v0.d\[1\]
+**	fmov	x0, d0
+** )
+**	ret
+*/
+/*
+** fpr_to_gpr:	{ target aarch64_big_endian }
+** (
+**	fmov	x1, d0
+**	fmov	x0, v0.d\[1\]
+** |
+**	fmov	x0, v0.d\[1\]
+**	fmov	x1, d0
+** )
+**	ret
+*/
+__int128_t __attribute__((arm_streaming_compatible))
+fpr_to_gpr ()
+{
+  register __int128_t q0 asm ("q0");
+  asm volatile ("" : "=w" (q0));
+  return q0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movti_4.c b/gcc/testsuite/gcc.target/aarch64/movti_4.c
new file mode 100644
index 00000000000..01e5537e88f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movti_4.c
@@ -0,0 +1,83 @@
+/* { dg-do assemble } */
+/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC target "+sve"
+
+/*
+** fpr_to_fpr:
+**	mov	z0\.d, z1\.d
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_fpr (void)
+{
+  register __int128_t q0 asm ("q0");
+  register __int128_t q1 asm ("q1");
+  asm volatile ("" : "=w" (q1));
+  q0 = q1;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** gpr_to_fpr:	{ target aarch64_little_endian }
+**	fmov	d0, x0
+**	fmov	v0.d\[1\], x1
+**	ret
+*/
+/*
+** gpr_to_fpr:	{ target aarch64_big_endian }
+**	fmov	d0, x1
+**	fmov	v0.d\[1\], x0
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+gpr_to_fpr (__int128_t x0)
+{
+  register __int128_t q0 asm ("q0");
+  q0 = x0;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** zero_to_fpr:
+**	fmov	d0, xzr
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  register __int128_t q0 asm ("q0");
+  q0 = 0;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** fpr_to_gpr:	{ target aarch64_little_endian }
+** (
+**	fmov	x0, d0
+**	fmov	x1, v0.d\[1\]
+** |
+**	fmov	x1, v0.d\[1\]
+**	fmov	x0, d0
+** )
+**	ret
+*/
+/*
+** fpr_to_gpr:	{ target aarch64_big_endian }
+** (
+**	fmov	x1, d0
+**	fmov	x0, v0.d\[1\]
+** |
+**	fmov	x0, v0.d\[1\]
+**	fmov	x1, d0
+** )
+**	ret
+*/
+__int128_t __attribute__((arm_streaming_compatible))
+fpr_to_gpr ()
+{
+  register __int128_t q0 asm ("q0");
+  asm volatile ("" : "=w" (q0));
+  return q0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movv16qi_4.c b/gcc/testsuite/gcc.target/aarch64/movv16qi_4.c
new file mode 100644
index 00000000000..f0f8cb95750
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movv16qi_4.c
@@ -0,0 +1,82 @@
+/* { dg-do assemble } */
+/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC target "+nosve"
+
+typedef unsigned char v16qi __attribute__((vector_size(16)));
+
+/*
+** fpr_to_fpr:
+**	sub	sp, sp, #16
+**	str	q1, \[sp\]
+**	ldr	q0, \[sp\]
+**	add	sp, sp, #?16
+**	ret
+*/
+v16qi __attribute__((arm_streaming_compatible))
+fpr_to_fpr (v16qi q0, v16qi q1)
+{
+  return q1;
+}
+
+/*
+** gpr_to_fpr:	{ target aarch64_little_endian }
+**	fmov	d0, x0
+**	fmov	v0.d\[1\], x1
+**	ret
+*/
+/*
+** gpr_to_fpr:	{ target aarch64_big_endian }
+**	fmov	d0, x1
+**	fmov	v0.d\[1\], x0
+**	ret
+*/
+v16qi __attribute__((arm_streaming_compatible))
+gpr_to_fpr ()
+{
+  register v16qi x0 asm ("x0");
+  asm volatile ("" : "=r" (x0));
+  return x0;
+}
+
+/*
+** zero_to_fpr:
+**	fmov	d0, xzr
+**	ret
+*/
+v16qi __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  return (v16qi) {};
+}
+
+/*
+** fpr_to_gpr:	{ target aarch64_little_endian }
+** (
+**	umov	x0, v0.d\[0\]
+**	fmov	x1, v0.d\[1\]
+** |
+**	fmov	x1, v0.d\[1\]
+**	umov	x0, v0.d\[0\]
+** )
+**	ret
+*/
+/*
+** fpr_to_gpr:	{ target aarch64_big_endian }
+** (
+**	umov	x1, v0.d\[0\]
+**	fmov	x0, v0.d\[1\]
+** |
+**	fmov	x0, v0.d\[1\]
+**	umov	x1, v0.d\[0\]
+** )
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_gpr (v16qi q0)
+{
+  register v16qi x0 asm ("x0");
+  x0 = q0;
+  asm volatile ("" :: "r" (x0));
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movv16qi_5.c b/gcc/testsuite/gcc.target/aarch64/movv16qi_5.c
new file mode 100644
index 00000000000..db59f01376e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movv16qi_5.c
@@ -0,0 +1,79 @@
+/* { dg-do assemble } */
+/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC target "+sve"
+
+typedef unsigned char v16qi __attribute__((vector_size(16)));
+
+/*
+** fpr_to_fpr:
+**	mov	z0.d, z1.d
+**	ret
+*/
+v16qi __attribute__((arm_streaming_compatible))
+fpr_to_fpr (v16qi q0, v16qi q1)
+{
+  return q1;
+}
+
+/*
+** gpr_to_fpr:	{ target aarch64_little_endian }
+**	fmov	d0, x0
+**	fmov	v0.d\[1\], x1
+**	ret
+*/
+/*
+** gpr_to_fpr:	{ target aarch64_big_endian }
+**	fmov	d0, x1
+**	fmov	v0.d\[1\], x0
+**	ret
+*/
+v16qi __attribute__((arm_streaming_compatible))
+gpr_to_fpr ()
+{
+  register v16qi x0 asm ("x0");
+  asm volatile ("" : "=r" (x0));
+  return x0;
+}
+
+/*
+** zero_to_fpr:
+**	fmov	d0, xzr
+**	ret
+*/
+v16qi __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  return (v16qi) {};
+}
+
+/*
+** fpr_to_gpr:	{ target aarch64_little_endian }
+** (
+**	umov	x0, v0.d\[0\]
+**	fmov	x1, v0.d\[1\]
+** |
+**	fmov	x1, v0.d\[1\]
+**	umov	x0, v0.d\[0\]
+** )
+**	ret
+*/
+/*
+** fpr_to_gpr:	{ target aarch64_big_endian }
+** (
+**	umov	x1, v0.d\[0\]
+**	fmov	x0, v0.d\[1\]
+** |
+**	fmov	x0, v0.d\[1\]
+**	umov	x1, v0.d\[0\]
+** )
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_gpr (v16qi q0)
+{
+  register v16qi x0 asm ("x0");
+  x0 = q0;
+  asm volatile ("" :: "r" (x0));
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movv8qi_4.c b/gcc/testsuite/gcc.target/aarch64/movv8qi_4.c
new file mode 100644
index 00000000000..49eb2d31910
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movv8qi_4.c
@@ -0,0 +1,55 @@
+/* { dg-do assemble } */
+/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC target "+nosve"
+
+typedef unsigned char v8qi __attribute__((vector_size(8)));
+
+/*
+** fpr_to_fpr:
+**	fmov	d0, d1
+**	ret
+*/
+v8qi __attribute__((arm_streaming_compatible))
+fpr_to_fpr (v8qi q0, v8qi q1)
+{
+  return q1;
+}
+
+/*
+** gpr_to_fpr:
+**	fmov	d0, x0
+**	ret
+*/
+v8qi __attribute__((arm_streaming_compatible))
+gpr_to_fpr ()
+{
+  register v8qi x0 asm ("x0");
+  asm volatile ("" : "=r" (x0));
+  return x0;
+}
+
+/*
+** zero_to_fpr:
+**	fmov	d0, xzr
+**	ret
+*/
+v8qi __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  return (v8qi) {};
+}
+
+/*
+** fpr_to_gpr:
+**	umov	x0, v0\.d\[0\]
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_gpr (v8qi q0)
+{
+  register v8qi x0 asm ("x0");
+  x0 = q0;
+  asm volatile ("" :: "r" (x0));
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_1.c b/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_1.c
new file mode 100644
index 00000000000..4a526e7d125
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_1.c
@@ -0,0 +1,13 @@
+// { dg-options "" }
+
+#include <arm_neon.h>
+
+#pragma GCC target "+nosme"
+
+// { dg-error {inlining failed.*'vaddq_s32'} "" { target *-*-* } 0 }
+
+int32x4_t __attribute__((arm_streaming_compatible))
+foo (int32x4_t x, int32x4_t y)
+{
+  return vaddq_s32 (x, y);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_2.c b/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_2.c
new file mode 100644
index 00000000000..e7183caa6f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_2.c
@@ -0,0 +1,11 @@
+// { dg-options "" }
+
+#include <arm_neon.h>
+
+// { dg-error {inlining failed.*'vaddq_s32'} "" { target *-*-* } 0 }
+
+int32x4_t __attribute__((arm_streaming_compatible))
+foo (int32x4_t x, int32x4_t y)
+{
+  return vaddq_s32 (x, y);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_3.c b/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_3.c
new file mode 100644
index 00000000000..e11570e41d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_3.c
@@ -0,0 +1,11 @@
+// { dg-options "" }
+
+#include <arm_neon.h>
+
+// { dg-error {inlining failed.*'vaddq_s32'} "" { target *-*-* } 0 }
+
+int32x4_t __attribute__((arm_streaming))
+foo (int32x4_t x, int32x4_t y)
+{
+  return vaddq_s32 (x, y);
+}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 04/16] aarch64: Mark relevant SVE instructions as non-streaming
  2022-11-13  9:59 [PATCH 00/16] aarch64: Add support for SME Richard Sandiford
                   ` (2 preceding siblings ...)
  2022-11-13 10:00 ` [PATCH 03/16] aarch64: Distinguish streaming-compatible AdvSIMD insns Richard Sandiford
@ 2022-11-13 10:00 ` Richard Sandiford
  2022-11-13 10:00 ` [PATCH 05/16] aarch64: Switch PSTATE.SM around calls Richard Sandiford
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Richard Sandiford @ 2022-11-13 10:00 UTC (permalink / raw)
  To: gcc-patches

Following on from the previous Advanced SIMD patch, this one
divides SVE instructions into non-streaming and streaming-
compatible groups.

gcc/
	* config/aarch64/aarch64.h (TARGET_NON_STREAMING): New macro.
	(TARGET_SVE2_AES, TARGET_SVE2_BITPERM): Use it.
	(TARGET_SVE2_SHA3, TARGET_SVE2_SM4): Likewise.
	* config/aarch64/aarch64-sve-builtins-base.def: Separate out
	the functions that require PSTATE.SM to be 0 and guard them
	with AARCH64_FL_SM_OFF.
	* config/aarch64/aarch64-sve-builtins-sve2.def: Likewise.
	* config/aarch64/aarch64-sve-builtins.cc (check_required_extensions):
	Enforce AARCH64_FL_SM_OFF requirements.
	* config/aarch64/aarch64-sve.md (aarch64_wrffr): Require
	TARGET_NON_STREAMING.
	(aarch64_rdffr, aarch64_rdffr_z, *aarch64_rdffr_z_ptest): Likewise.
	(*aarch64_rdffr_ptest, *aarch64_rdffr_z_cc, *aarch64_rdffr_cc)
	(@aarch64_ld<fn>f1<mode>): Likewise.
	(@aarch64_ld<fn>f1_<ANY_EXTEND:optab><SVE_HSDI:mode><SVE_PARTIAL_I:mode>)
	(gather_load<mode><v_int_container>): Likewise.
	(mask_gather_load<mode><v_int_container>): Likewise.
	(*mask_gather_load<mode><v_int_container>_<su>xtw_unpacked): Likewise.
	(*mask_gather_load<mode><v_int_container>_sxtw): Likewise.
	(*mask_gather_load<mode><v_int_container>_uxtw): Likewise.
	(@aarch64_gather_load_<ANY_EXTEND:optab><SVE_4HSI:mode><SVE_4BHI:mode>)
	(@aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
	<SVE_2BHSI:mode>): Likewise.
	(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
	<SVE_2BHSI:mode>_<ANY_EXTEND2:su>xtw_unpacked)
	(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
	<SVE_2BHSI:mode>_sxtw): Likewise.
	(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
	<SVE_2BHSI:mode>_uxtw): Likewise.
	(@aarch64_ldff1_gather<mode>, @aarch64_ldff1_gather<mode>): Likewise.
	(*aarch64_ldff1_gather<mode>_sxtw): Likewise.
	(*aarch64_ldff1_gather<mode>_uxtw): Likewise.
	(@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx4_WIDE:mode>
	<VNx4_NARROW:mode>): Likewise.
	(@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
	<VNx2_NARROW:mode>): Likewise.
	(*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
	<VNx2_NARROW:mode>_sxtw): Likewise.
	(*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
	<VNx2_NARROW:mode>_uxtw): Likewise.
	(@aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx4SI_ONLY:mode>)
	(@aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx2DI_ONLY:mode>)
	(*aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx2DI_ONLY:mode>_sxtw)
	(*aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx2DI_ONLY:mode>_uxtw)
	(scatter_store<mode><v_int_container>): Likewise.
	(mask_scatter_store<mode><v_int_container>): Likewise.
	(*mask_scatter_store<mode><v_int_container>_<su>xtw_unpacked)
	(*mask_scatter_store<mode><v_int_container>_sxtw): Likewise.
	(*mask_scatter_store<mode><v_int_container>_uxtw): Likewise.
	(@aarch64_scatter_store_trunc<VNx4_NARROW:mode><VNx4_WIDE:mode>)
	(@aarch64_scatter_store_trunc<VNx2_NARROW:mode><VNx2_WIDE:mode>)
	(*aarch64_scatter_store_trunc<VNx2_NARROW:mode><VNx2_WIDE:mode>_sxtw)
	(*aarch64_scatter_store_trunc<VNx2_NARROW:mode><VNx2_WIDE:mode>_uxtw)
	(@aarch64_sve_ld1ro<mode>, @aarch64_adr<mode>): Likewise.
	(*aarch64_adr_sxtw, *aarch64_adr_uxtw_unspec): Likewise.
	(*aarch64_adr_uxtw_and, @aarch64_adr<mode>_shift): Likewise.
	(*aarch64_adr<mode>_shift, *aarch64_adr_shift_sxtw): Likewise.
	(*aarch64_adr_shift_uxtw, @aarch64_sve_add_<optab><vsi2qi>): Likewise.
	(@aarch64_sve_<sve_fp_op><mode>, fold_left_plus_<mode>): Likewise.
	(mask_fold_left_plus_<mode>, @aarch64_sve_compact<mode>): Likewise.
	* config/aarch64/aarch64-sve2.md (@aarch64_gather_ldnt<mode>)
	(@aarch64_gather_ldnt_<ANY_EXTEND:optab><SVE_FULL_SDI:mode>
	<SVE_PARTIAL_I:mode>): Likewise.
	(@aarch64_sve2_histcnt<mode>, @aarch64_sve2_histseg<mode>): Likewise.
	(@aarch64_pred_<SVE2_MATCH:sve_int_op><mode>): Likewise.
	(*aarch64_pred_<SVE2_MATCH:sve_int_op><mode>_cc): Likewise.
	(*aarch64_pred_<SVE2_MATCH:sve_int_op><mode>_ptest): Likewise.
	* config/aarch64/iterators.md (SVE_FP_UNARY_INT): Make FEXPA
	depend on TARGET_NON_STREAMING.
	(SVE_BFLOAT_TERNARY_LONG): Likewise BFMMLA.

gcc/testsuite/
	* g++.target/aarch64/sve/aarch64-ssve.exp: New harness.
	* g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp: Add
	-DSTREAMING_COMPATIBLE to the list of options.
	* g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Likewise.
	* gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp: Likewise.
	* gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Likewise.
	Fix pasto in variable name.
	* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Mark functions
	as streaming-compatible if STREAMING_COMPATIBLE is defined.
	* gcc.target/aarch64/sve/acle/asm/adda_f16.c: Disable for
	streaming-compatible code.
	* gcc.target/aarch64/sve/acle/asm/adda_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/adda_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/adrb.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/adrd.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/adrh.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/adrw.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/bfmmla_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/compact_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/compact_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/compact_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/compact_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/compact_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/compact_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/expa_f16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/expa_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/expa_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1_gather_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1_gather_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1_gather_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1_gather_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_f16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ro_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1sb_gather_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1sb_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1sb_gather_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1sb_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1sh_gather_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1sh_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1sh_gather_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1sh_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1sw_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1sw_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ub_gather_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ub_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ub_gather_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1ub_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1uh_gather_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1uh_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1uh_gather_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1uh_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1uw_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ld1uw_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_f16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_gather_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_gather_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_gather_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_gather_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sb_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sb_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sb_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sb_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sb_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sb_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sh_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sh_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sh_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sh_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sw_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sw_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sw_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1sw_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1ub_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1ub_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1ub_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1ub_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1ub_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1ub_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1uh_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1uh_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1uh_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1uh_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1uw_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1uw_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1uw_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldff1uw_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1_bf16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1_f16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1sb_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1sb_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1sb_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1sb_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1sb_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1sb_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1sh_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1sh_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1sh_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1sh_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1sw_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1sw_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1ub_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1ub_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1ub_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1ub_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1ub_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1ub_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1uh_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1uh_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1uh_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1uh_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1uw_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/ldnf1uw_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mmla_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mmla_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mmla_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mmla_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/prfb_gather.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/prfd_gather.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/prfh_gather.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/prfw_gather.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/rdffr_1.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1_scatter_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1_scatter_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1_scatter_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1_scatter_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1_scatter_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1_scatter_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1b_scatter_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1b_scatter_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1b_scatter_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1b_scatter_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1h_scatter_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1h_scatter_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1h_scatter_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1h_scatter_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1w_scatter_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/st1w_scatter_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/tmad_f16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/tmad_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/tmad_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/tsmul_f16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/tsmul_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/tsmul_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/tssel_f16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/tssel_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/tssel_f64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/usmmla_s32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/aesd_u8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/aese_u8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/aesimc_u8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/aesmc_u8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/bdep_u16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/bdep_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/bdep_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/bdep_u8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/bext_u16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/bext_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/bext_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/bext_u8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/bgrp_u16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/bgrp_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/bgrp_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/bgrp_u8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/histcnt_s32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/histcnt_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/histcnt_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/histcnt_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/histseg_s8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/histseg_u8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_f32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_f64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_s32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_s32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1sw_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1sw_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_s32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1uw_gather_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/ldnt1uw_gather_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/match_s16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/match_s8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/match_u16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/match_u8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/nmatch_s16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/nmatch_s8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/nmatch_u16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/nmatch_u8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/pmullb_pair_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/pmullt_pair_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/rax1_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/rax1_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/sm4e_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/sm4ekey_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_f32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_f64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_s32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_s32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1w_scatter_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/stnt1w_scatter_u64.c: Likewise.
---
 .../aarch64/aarch64-sve-builtins-base.def     | 150 +++++----
 .../aarch64/aarch64-sve-builtins-sve2.def     |  65 ++--
 gcc/config/aarch64/aarch64-sve-builtins.cc    |   7 +
 gcc/config/aarch64/aarch64-sve.md             | 124 +++----
 gcc/config/aarch64/aarch64-sve2.md            |  11 +-
 gcc/config/aarch64/aarch64.h                  |  11 +-
 gcc/config/aarch64/iterators.md               |   4 +-
 .../g++.target/aarch64/sve/aarch64-ssve.exp   | 309 ++++++++++++++++++
 .../aarch64/sve/acle/aarch64-sve-acle-asm.exp |   1 +
 .../sve2/acle/aarch64-sve2-acle-asm.exp       |   1 +
 .../aarch64/sve/acle/aarch64-sve-acle-asm.exp |   1 +
 .../aarch64/sve/acle/asm/adda_f16.c           |   1 +
 .../aarch64/sve/acle/asm/adda_f32.c           |   1 +
 .../aarch64/sve/acle/asm/adda_f64.c           |   1 +
 .../gcc.target/aarch64/sve/acle/asm/adrb.c    |   1 +
 .../gcc.target/aarch64/sve/acle/asm/adrd.c    |   1 +
 .../gcc.target/aarch64/sve/acle/asm/adrh.c    |   1 +
 .../gcc.target/aarch64/sve/acle/asm/adrw.c    |   1 +
 .../aarch64/sve/acle/asm/bfmmla_f32.c         |   1 +
 .../aarch64/sve/acle/asm/compact_f32.c        |   1 +
 .../aarch64/sve/acle/asm/compact_f64.c        |   1 +
 .../aarch64/sve/acle/asm/compact_s32.c        |   1 +
 .../aarch64/sve/acle/asm/compact_s64.c        |   1 +
 .../aarch64/sve/acle/asm/compact_u32.c        |   1 +
 .../aarch64/sve/acle/asm/compact_u64.c        |   1 +
 .../aarch64/sve/acle/asm/expa_f16.c           |   1 +
 .../aarch64/sve/acle/asm/expa_f32.c           |   1 +
 .../aarch64/sve/acle/asm/expa_f64.c           |   1 +
 .../aarch64/sve/acle/asm/ld1_gather_f32.c     |   1 +
 .../aarch64/sve/acle/asm/ld1_gather_f64.c     |   1 +
 .../aarch64/sve/acle/asm/ld1_gather_s32.c     |   1 +
 .../aarch64/sve/acle/asm/ld1_gather_s64.c     |   1 +
 .../aarch64/sve/acle/asm/ld1_gather_u32.c     |   1 +
 .../aarch64/sve/acle/asm/ld1_gather_u64.c     |   1 +
 .../aarch64/sve/acle/asm/ld1ro_bf16.c         |   1 +
 .../aarch64/sve/acle/asm/ld1ro_f16.c          |   1 +
 .../aarch64/sve/acle/asm/ld1ro_f32.c          |   1 +
 .../aarch64/sve/acle/asm/ld1ro_f64.c          |   1 +
 .../aarch64/sve/acle/asm/ld1ro_s16.c          |   1 +
 .../aarch64/sve/acle/asm/ld1ro_s32.c          |   1 +
 .../aarch64/sve/acle/asm/ld1ro_s64.c          |   1 +
 .../aarch64/sve/acle/asm/ld1ro_s8.c           |   1 +
 .../aarch64/sve/acle/asm/ld1ro_u16.c          |   1 +
 .../aarch64/sve/acle/asm/ld1ro_u32.c          |   1 +
 .../aarch64/sve/acle/asm/ld1ro_u64.c          |   1 +
 .../aarch64/sve/acle/asm/ld1ro_u8.c           |   1 +
 .../aarch64/sve/acle/asm/ld1sb_gather_s32.c   |   1 +
 .../aarch64/sve/acle/asm/ld1sb_gather_s64.c   |   1 +
 .../aarch64/sve/acle/asm/ld1sb_gather_u32.c   |   1 +
 .../aarch64/sve/acle/asm/ld1sb_gather_u64.c   |   1 +
 .../aarch64/sve/acle/asm/ld1sh_gather_s32.c   |   1 +
 .../aarch64/sve/acle/asm/ld1sh_gather_s64.c   |   1 +
 .../aarch64/sve/acle/asm/ld1sh_gather_u32.c   |   1 +
 .../aarch64/sve/acle/asm/ld1sh_gather_u64.c   |   1 +
 .../aarch64/sve/acle/asm/ld1sw_gather_s64.c   |   1 +
 .../aarch64/sve/acle/asm/ld1sw_gather_u64.c   |   1 +
 .../aarch64/sve/acle/asm/ld1ub_gather_s32.c   |   1 +
 .../aarch64/sve/acle/asm/ld1ub_gather_s64.c   |   1 +
 .../aarch64/sve/acle/asm/ld1ub_gather_u32.c   |   1 +
 .../aarch64/sve/acle/asm/ld1ub_gather_u64.c   |   1 +
 .../aarch64/sve/acle/asm/ld1uh_gather_s32.c   |   1 +
 .../aarch64/sve/acle/asm/ld1uh_gather_s64.c   |   1 +
 .../aarch64/sve/acle/asm/ld1uh_gather_u32.c   |   1 +
 .../aarch64/sve/acle/asm/ld1uh_gather_u64.c   |   1 +
 .../aarch64/sve/acle/asm/ld1uw_gather_s64.c   |   1 +
 .../aarch64/sve/acle/asm/ld1uw_gather_u64.c   |   1 +
 .../aarch64/sve/acle/asm/ldff1_bf16.c         |   1 +
 .../aarch64/sve/acle/asm/ldff1_f16.c          |   1 +
 .../aarch64/sve/acle/asm/ldff1_f32.c          |   1 +
 .../aarch64/sve/acle/asm/ldff1_f64.c          |   1 +
 .../aarch64/sve/acle/asm/ldff1_gather_f32.c   |   1 +
 .../aarch64/sve/acle/asm/ldff1_gather_f64.c   |   1 +
 .../aarch64/sve/acle/asm/ldff1_gather_s32.c   |   1 +
 .../aarch64/sve/acle/asm/ldff1_gather_s64.c   |   1 +
 .../aarch64/sve/acle/asm/ldff1_gather_u32.c   |   1 +
 .../aarch64/sve/acle/asm/ldff1_gather_u64.c   |   1 +
 .../aarch64/sve/acle/asm/ldff1_s16.c          |   1 +
 .../aarch64/sve/acle/asm/ldff1_s32.c          |   1 +
 .../aarch64/sve/acle/asm/ldff1_s64.c          |   1 +
 .../aarch64/sve/acle/asm/ldff1_s8.c           |   1 +
 .../aarch64/sve/acle/asm/ldff1_u16.c          |   1 +
 .../aarch64/sve/acle/asm/ldff1_u32.c          |   1 +
 .../aarch64/sve/acle/asm/ldff1_u64.c          |   1 +
 .../aarch64/sve/acle/asm/ldff1_u8.c           |   1 +
 .../aarch64/sve/acle/asm/ldff1sb_gather_s32.c |   1 +
 .../aarch64/sve/acle/asm/ldff1sb_gather_s64.c |   1 +
 .../aarch64/sve/acle/asm/ldff1sb_gather_u32.c |   1 +
 .../aarch64/sve/acle/asm/ldff1sb_gather_u64.c |   1 +
 .../aarch64/sve/acle/asm/ldff1sb_s16.c        |   1 +
 .../aarch64/sve/acle/asm/ldff1sb_s32.c        |   1 +
 .../aarch64/sve/acle/asm/ldff1sb_s64.c        |   1 +
 .../aarch64/sve/acle/asm/ldff1sb_u16.c        |   1 +
 .../aarch64/sve/acle/asm/ldff1sb_u32.c        |   1 +
 .../aarch64/sve/acle/asm/ldff1sb_u64.c        |   1 +
 .../aarch64/sve/acle/asm/ldff1sh_gather_s32.c |   1 +
 .../aarch64/sve/acle/asm/ldff1sh_gather_s64.c |   1 +
 .../aarch64/sve/acle/asm/ldff1sh_gather_u32.c |   1 +
 .../aarch64/sve/acle/asm/ldff1sh_gather_u64.c |   1 +
 .../aarch64/sve/acle/asm/ldff1sh_s32.c        |   1 +
 .../aarch64/sve/acle/asm/ldff1sh_s64.c        |   1 +
 .../aarch64/sve/acle/asm/ldff1sh_u32.c        |   1 +
 .../aarch64/sve/acle/asm/ldff1sh_u64.c        |   1 +
 .../aarch64/sve/acle/asm/ldff1sw_gather_s64.c |   1 +
 .../aarch64/sve/acle/asm/ldff1sw_gather_u64.c |   1 +
 .../aarch64/sve/acle/asm/ldff1sw_s64.c        |   1 +
 .../aarch64/sve/acle/asm/ldff1sw_u64.c        |   1 +
 .../aarch64/sve/acle/asm/ldff1ub_gather_s32.c |   1 +
 .../aarch64/sve/acle/asm/ldff1ub_gather_s64.c |   1 +
 .../aarch64/sve/acle/asm/ldff1ub_gather_u32.c |   1 +
 .../aarch64/sve/acle/asm/ldff1ub_gather_u64.c |   1 +
 .../aarch64/sve/acle/asm/ldff1ub_s16.c        |   1 +
 .../aarch64/sve/acle/asm/ldff1ub_s32.c        |   1 +
 .../aarch64/sve/acle/asm/ldff1ub_s64.c        |   1 +
 .../aarch64/sve/acle/asm/ldff1ub_u16.c        |   1 +
 .../aarch64/sve/acle/asm/ldff1ub_u32.c        |   1 +
 .../aarch64/sve/acle/asm/ldff1ub_u64.c        |   1 +
 .../aarch64/sve/acle/asm/ldff1uh_gather_s32.c |   1 +
 .../aarch64/sve/acle/asm/ldff1uh_gather_s64.c |   1 +
 .../aarch64/sve/acle/asm/ldff1uh_gather_u32.c |   1 +
 .../aarch64/sve/acle/asm/ldff1uh_gather_u64.c |   1 +
 .../aarch64/sve/acle/asm/ldff1uh_s32.c        |   1 +
 .../aarch64/sve/acle/asm/ldff1uh_s64.c        |   1 +
 .../aarch64/sve/acle/asm/ldff1uh_u32.c        |   1 +
 .../aarch64/sve/acle/asm/ldff1uh_u64.c        |   1 +
 .../aarch64/sve/acle/asm/ldff1uw_gather_s64.c |   1 +
 .../aarch64/sve/acle/asm/ldff1uw_gather_u64.c |   1 +
 .../aarch64/sve/acle/asm/ldff1uw_s64.c        |   1 +
 .../aarch64/sve/acle/asm/ldff1uw_u64.c        |   1 +
 .../aarch64/sve/acle/asm/ldnf1_bf16.c         |   1 +
 .../aarch64/sve/acle/asm/ldnf1_f16.c          |   1 +
 .../aarch64/sve/acle/asm/ldnf1_f32.c          |   1 +
 .../aarch64/sve/acle/asm/ldnf1_f64.c          |   1 +
 .../aarch64/sve/acle/asm/ldnf1_s16.c          |   1 +
 .../aarch64/sve/acle/asm/ldnf1_s32.c          |   1 +
 .../aarch64/sve/acle/asm/ldnf1_s64.c          |   1 +
 .../aarch64/sve/acle/asm/ldnf1_s8.c           |   1 +
 .../aarch64/sve/acle/asm/ldnf1_u16.c          |   1 +
 .../aarch64/sve/acle/asm/ldnf1_u32.c          |   1 +
 .../aarch64/sve/acle/asm/ldnf1_u64.c          |   1 +
 .../aarch64/sve/acle/asm/ldnf1_u8.c           |   1 +
 .../aarch64/sve/acle/asm/ldnf1sb_s16.c        |   1 +
 .../aarch64/sve/acle/asm/ldnf1sb_s32.c        |   1 +
 .../aarch64/sve/acle/asm/ldnf1sb_s64.c        |   1 +
 .../aarch64/sve/acle/asm/ldnf1sb_u16.c        |   1 +
 .../aarch64/sve/acle/asm/ldnf1sb_u32.c        |   1 +
 .../aarch64/sve/acle/asm/ldnf1sb_u64.c        |   1 +
 .../aarch64/sve/acle/asm/ldnf1sh_s32.c        |   1 +
 .../aarch64/sve/acle/asm/ldnf1sh_s64.c        |   1 +
 .../aarch64/sve/acle/asm/ldnf1sh_u32.c        |   1 +
 .../aarch64/sve/acle/asm/ldnf1sh_u64.c        |   1 +
 .../aarch64/sve/acle/asm/ldnf1sw_s64.c        |   1 +
 .../aarch64/sve/acle/asm/ldnf1sw_u64.c        |   1 +
 .../aarch64/sve/acle/asm/ldnf1ub_s16.c        |   1 +
 .../aarch64/sve/acle/asm/ldnf1ub_s32.c        |   1 +
 .../aarch64/sve/acle/asm/ldnf1ub_s64.c        |   1 +
 .../aarch64/sve/acle/asm/ldnf1ub_u16.c        |   1 +
 .../aarch64/sve/acle/asm/ldnf1ub_u32.c        |   1 +
 .../aarch64/sve/acle/asm/ldnf1ub_u64.c        |   1 +
 .../aarch64/sve/acle/asm/ldnf1uh_s32.c        |   1 +
 .../aarch64/sve/acle/asm/ldnf1uh_s64.c        |   1 +
 .../aarch64/sve/acle/asm/ldnf1uh_u32.c        |   1 +
 .../aarch64/sve/acle/asm/ldnf1uh_u64.c        |   1 +
 .../aarch64/sve/acle/asm/ldnf1uw_s64.c        |   1 +
 .../aarch64/sve/acle/asm/ldnf1uw_u64.c        |   1 +
 .../aarch64/sve/acle/asm/mmla_f32.c           |   1 +
 .../aarch64/sve/acle/asm/mmla_f64.c           |   1 +
 .../aarch64/sve/acle/asm/mmla_s32.c           |   1 +
 .../aarch64/sve/acle/asm/mmla_u32.c           |   1 +
 .../aarch64/sve/acle/asm/prfb_gather.c        |   1 +
 .../aarch64/sve/acle/asm/prfd_gather.c        |   1 +
 .../aarch64/sve/acle/asm/prfh_gather.c        |   1 +
 .../aarch64/sve/acle/asm/prfw_gather.c        |   1 +
 .../gcc.target/aarch64/sve/acle/asm/rdffr_1.c |   1 +
 .../aarch64/sve/acle/asm/st1_scatter_f32.c    |   1 +
 .../aarch64/sve/acle/asm/st1_scatter_f64.c    |   1 +
 .../aarch64/sve/acle/asm/st1_scatter_s32.c    |   1 +
 .../aarch64/sve/acle/asm/st1_scatter_s64.c    |   1 +
 .../aarch64/sve/acle/asm/st1_scatter_u32.c    |   1 +
 .../aarch64/sve/acle/asm/st1_scatter_u64.c    |   1 +
 .../aarch64/sve/acle/asm/st1b_scatter_s32.c   |   1 +
 .../aarch64/sve/acle/asm/st1b_scatter_s64.c   |   1 +
 .../aarch64/sve/acle/asm/st1b_scatter_u32.c   |   1 +
 .../aarch64/sve/acle/asm/st1b_scatter_u64.c   |   1 +
 .../aarch64/sve/acle/asm/st1h_scatter_s32.c   |   1 +
 .../aarch64/sve/acle/asm/st1h_scatter_s64.c   |   1 +
 .../aarch64/sve/acle/asm/st1h_scatter_u32.c   |   1 +
 .../aarch64/sve/acle/asm/st1h_scatter_u64.c   |   1 +
 .../aarch64/sve/acle/asm/st1w_scatter_s64.c   |   1 +
 .../aarch64/sve/acle/asm/st1w_scatter_u64.c   |   1 +
 .../aarch64/sve/acle/asm/test_sve_acle.h      |  11 +-
 .../aarch64/sve/acle/asm/tmad_f16.c           |   1 +
 .../aarch64/sve/acle/asm/tmad_f32.c           |   1 +
 .../aarch64/sve/acle/asm/tmad_f64.c           |   1 +
 .../aarch64/sve/acle/asm/tsmul_f16.c          |   1 +
 .../aarch64/sve/acle/asm/tsmul_f32.c          |   1 +
 .../aarch64/sve/acle/asm/tsmul_f64.c          |   1 +
 .../aarch64/sve/acle/asm/tssel_f16.c          |   1 +
 .../aarch64/sve/acle/asm/tssel_f32.c          |   1 +
 .../aarch64/sve/acle/asm/tssel_f64.c          |   1 +
 .../aarch64/sve/acle/asm/usmmla_s32.c         |   1 +
 .../sve2/acle/aarch64-sve2-acle-asm.exp       |   3 +-
 .../aarch64/sve2/acle/asm/aesd_u8.c           |   1 +
 .../aarch64/sve2/acle/asm/aese_u8.c           |   1 +
 .../aarch64/sve2/acle/asm/aesimc_u8.c         |   1 +
 .../aarch64/sve2/acle/asm/aesmc_u8.c          |   1 +
 .../aarch64/sve2/acle/asm/bdep_u16.c          |   1 +
 .../aarch64/sve2/acle/asm/bdep_u32.c          |   1 +
 .../aarch64/sve2/acle/asm/bdep_u64.c          |   1 +
 .../aarch64/sve2/acle/asm/bdep_u8.c           |   1 +
 .../aarch64/sve2/acle/asm/bext_u16.c          |   1 +
 .../aarch64/sve2/acle/asm/bext_u32.c          |   1 +
 .../aarch64/sve2/acle/asm/bext_u64.c          |   1 +
 .../aarch64/sve2/acle/asm/bext_u8.c           |   1 +
 .../aarch64/sve2/acle/asm/bgrp_u16.c          |   1 +
 .../aarch64/sve2/acle/asm/bgrp_u32.c          |   1 +
 .../aarch64/sve2/acle/asm/bgrp_u64.c          |   1 +
 .../aarch64/sve2/acle/asm/bgrp_u8.c           |   1 +
 .../aarch64/sve2/acle/asm/histcnt_s32.c       |   1 +
 .../aarch64/sve2/acle/asm/histcnt_s64.c       |   1 +
 .../aarch64/sve2/acle/asm/histcnt_u32.c       |   1 +
 .../aarch64/sve2/acle/asm/histcnt_u64.c       |   1 +
 .../aarch64/sve2/acle/asm/histseg_s8.c        |   1 +
 .../aarch64/sve2/acle/asm/histseg_u8.c        |   1 +
 .../aarch64/sve2/acle/asm/ldnt1_gather_f32.c  |   1 +
 .../aarch64/sve2/acle/asm/ldnt1_gather_f64.c  |   1 +
 .../aarch64/sve2/acle/asm/ldnt1_gather_s32.c  |   1 +
 .../aarch64/sve2/acle/asm/ldnt1_gather_s64.c  |   1 +
 .../aarch64/sve2/acle/asm/ldnt1_gather_u32.c  |   1 +
 .../aarch64/sve2/acle/asm/ldnt1_gather_u64.c  |   1 +
 .../sve2/acle/asm/ldnt1sb_gather_s32.c        |   1 +
 .../sve2/acle/asm/ldnt1sb_gather_s64.c        |   1 +
 .../sve2/acle/asm/ldnt1sb_gather_u32.c        |   1 +
 .../sve2/acle/asm/ldnt1sb_gather_u64.c        |   1 +
 .../sve2/acle/asm/ldnt1sh_gather_s32.c        |   1 +
 .../sve2/acle/asm/ldnt1sh_gather_s64.c        |   1 +
 .../sve2/acle/asm/ldnt1sh_gather_u32.c        |   1 +
 .../sve2/acle/asm/ldnt1sh_gather_u64.c        |   1 +
 .../sve2/acle/asm/ldnt1sw_gather_s64.c        |   1 +
 .../sve2/acle/asm/ldnt1sw_gather_u64.c        |   1 +
 .../sve2/acle/asm/ldnt1ub_gather_s32.c        |   1 +
 .../sve2/acle/asm/ldnt1ub_gather_s64.c        |   1 +
 .../sve2/acle/asm/ldnt1ub_gather_u32.c        |   1 +
 .../sve2/acle/asm/ldnt1ub_gather_u64.c        |   1 +
 .../sve2/acle/asm/ldnt1uh_gather_s32.c        |   1 +
 .../sve2/acle/asm/ldnt1uh_gather_s64.c        |   1 +
 .../sve2/acle/asm/ldnt1uh_gather_u32.c        |   1 +
 .../sve2/acle/asm/ldnt1uh_gather_u64.c        |   1 +
 .../sve2/acle/asm/ldnt1uw_gather_s64.c        |   1 +
 .../sve2/acle/asm/ldnt1uw_gather_u64.c        |   1 +
 .../aarch64/sve2/acle/asm/match_s16.c         |   1 +
 .../aarch64/sve2/acle/asm/match_s8.c          |   1 +
 .../aarch64/sve2/acle/asm/match_u16.c         |   1 +
 .../aarch64/sve2/acle/asm/match_u8.c          |   1 +
 .../aarch64/sve2/acle/asm/nmatch_s16.c        |   1 +
 .../aarch64/sve2/acle/asm/nmatch_s8.c         |   1 +
 .../aarch64/sve2/acle/asm/nmatch_u16.c        |   1 +
 .../aarch64/sve2/acle/asm/nmatch_u8.c         |   1 +
 .../aarch64/sve2/acle/asm/pmullb_pair_u64.c   |   1 +
 .../aarch64/sve2/acle/asm/pmullt_pair_u64.c   |   1 +
 .../aarch64/sve2/acle/asm/rax1_s64.c          |   1 +
 .../aarch64/sve2/acle/asm/rax1_u64.c          |   1 +
 .../aarch64/sve2/acle/asm/sm4e_u32.c          |   1 +
 .../aarch64/sve2/acle/asm/sm4ekey_u32.c       |   1 +
 .../aarch64/sve2/acle/asm/stnt1_scatter_f32.c |   1 +
 .../aarch64/sve2/acle/asm/stnt1_scatter_f64.c |   1 +
 .../aarch64/sve2/acle/asm/stnt1_scatter_s32.c |   1 +
 .../aarch64/sve2/acle/asm/stnt1_scatter_s64.c |   1 +
 .../aarch64/sve2/acle/asm/stnt1_scatter_u32.c |   1 +
 .../aarch64/sve2/acle/asm/stnt1_scatter_u64.c |   1 +
 .../sve2/acle/asm/stnt1b_scatter_s32.c        |   1 +
 .../sve2/acle/asm/stnt1b_scatter_s64.c        |   1 +
 .../sve2/acle/asm/stnt1b_scatter_u32.c        |   1 +
 .../sve2/acle/asm/stnt1b_scatter_u64.c        |   1 +
 .../sve2/acle/asm/stnt1h_scatter_s32.c        |   1 +
 .../sve2/acle/asm/stnt1h_scatter_s64.c        |   1 +
 .../sve2/acle/asm/stnt1h_scatter_u32.c        |   1 +
 .../sve2/acle/asm/stnt1h_scatter_u64.c        |   1 +
 .../sve2/acle/asm/stnt1w_scatter_s64.c        |   1 +
 .../sve2/acle/asm/stnt1w_scatter_u64.c        |   1 +
 279 files changed, 799 insertions(+), 165 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/aarch64/sve/aarch64-ssve.exp

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.def b/gcc/config/aarch64/aarch64-sve-builtins-base.def
index ffdf7cb4c32..a2d0cea6c5b 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.def
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.def
@@ -25,12 +25,7 @@ DEF_SVE_FUNCTION (svacgt, compare_opt_n, all_float, implicit)
 DEF_SVE_FUNCTION (svacle, compare_opt_n, all_float, implicit)
 DEF_SVE_FUNCTION (svaclt, compare_opt_n, all_float, implicit)
 DEF_SVE_FUNCTION (svadd, binary_opt_n, all_arith, mxz)
-DEF_SVE_FUNCTION (svadda, fold_left, all_float, implicit)
 DEF_SVE_FUNCTION (svaddv, reduction_wide, all_arith, implicit)
-DEF_SVE_FUNCTION (svadrb, adr_offset, none, none)
-DEF_SVE_FUNCTION (svadrd, adr_index, none, none)
-DEF_SVE_FUNCTION (svadrh, adr_index, none, none)
-DEF_SVE_FUNCTION (svadrw, adr_index, none, none)
 DEF_SVE_FUNCTION (svand, binary_opt_n, all_integer, mxz)
 DEF_SVE_FUNCTION (svand, binary_opt_n, b, z)
 DEF_SVE_FUNCTION (svandv, reduction, all_integer, implicit)
@@ -75,7 +70,6 @@ DEF_SVE_FUNCTION (svcnth_pat, count_pat, none, none)
 DEF_SVE_FUNCTION (svcntp, count_pred, all_pred, implicit)
 DEF_SVE_FUNCTION (svcntw, count_inherent, none, none)
 DEF_SVE_FUNCTION (svcntw_pat, count_pat, none, none)
-DEF_SVE_FUNCTION (svcompact, unary, sd_data, implicit)
 DEF_SVE_FUNCTION (svcreate2, create, all_data, none)
 DEF_SVE_FUNCTION (svcreate3, create, all_data, none)
 DEF_SVE_FUNCTION (svcreate4, create, all_data, none)
@@ -93,7 +87,6 @@ DEF_SVE_FUNCTION (svdupq_lane, binary_uint64_n, all_data, none)
 DEF_SVE_FUNCTION (sveor, binary_opt_n, all_integer, mxz)
 DEF_SVE_FUNCTION (sveor, binary_opt_n, b, z)
 DEF_SVE_FUNCTION (sveorv, reduction, all_integer, implicit)
-DEF_SVE_FUNCTION (svexpa, unary_uint, all_float, none)
 DEF_SVE_FUNCTION (svext, ext, all_data, none)
 DEF_SVE_FUNCTION (svextb, unary, hsd_integer, mxz)
 DEF_SVE_FUNCTION (svexth, unary, sd_integer, mxz)
@@ -106,51 +99,13 @@ DEF_SVE_FUNCTION (svinsr, binary_n, all_data, none)
 DEF_SVE_FUNCTION (svlasta, reduction, all_data, implicit)
 DEF_SVE_FUNCTION (svlastb, reduction, all_data, implicit)
 DEF_SVE_FUNCTION (svld1, load, all_data, implicit)
-DEF_SVE_FUNCTION (svld1_gather, load_gather_sv, sd_data, implicit)
-DEF_SVE_FUNCTION (svld1_gather, load_gather_vs, sd_data, implicit)
 DEF_SVE_FUNCTION (svld1rq, load_replicate, all_data, implicit)
 DEF_SVE_FUNCTION (svld1sb, load_ext, hsd_integer, implicit)
-DEF_SVE_FUNCTION (svld1sb_gather, load_ext_gather_offset, sd_integer, implicit)
 DEF_SVE_FUNCTION (svld1sh, load_ext, sd_integer, implicit)
-DEF_SVE_FUNCTION (svld1sh_gather, load_ext_gather_offset, sd_integer, implicit)
-DEF_SVE_FUNCTION (svld1sh_gather, load_ext_gather_index, sd_integer, implicit)
 DEF_SVE_FUNCTION (svld1sw, load_ext, d_integer, implicit)
-DEF_SVE_FUNCTION (svld1sw_gather, load_ext_gather_offset, d_integer, implicit)
-DEF_SVE_FUNCTION (svld1sw_gather, load_ext_gather_index, d_integer, implicit)
 DEF_SVE_FUNCTION (svld1ub, load_ext, hsd_integer, implicit)
-DEF_SVE_FUNCTION (svld1ub_gather, load_ext_gather_offset, sd_integer, implicit)
 DEF_SVE_FUNCTION (svld1uh, load_ext, sd_integer, implicit)
-DEF_SVE_FUNCTION (svld1uh_gather, load_ext_gather_offset, sd_integer, implicit)
-DEF_SVE_FUNCTION (svld1uh_gather, load_ext_gather_index, sd_integer, implicit)
 DEF_SVE_FUNCTION (svld1uw, load_ext, d_integer, implicit)
-DEF_SVE_FUNCTION (svld1uw_gather, load_ext_gather_offset, d_integer, implicit)
-DEF_SVE_FUNCTION (svld1uw_gather, load_ext_gather_index, d_integer, implicit)
-DEF_SVE_FUNCTION (svldff1, load, all_data, implicit)
-DEF_SVE_FUNCTION (svldff1_gather, load_gather_sv, sd_data, implicit)
-DEF_SVE_FUNCTION (svldff1_gather, load_gather_vs, sd_data, implicit)
-DEF_SVE_FUNCTION (svldff1sb, load_ext, hsd_integer, implicit)
-DEF_SVE_FUNCTION (svldff1sb_gather, load_ext_gather_offset, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldff1sh, load_ext, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldff1sh_gather, load_ext_gather_offset, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldff1sh_gather, load_ext_gather_index, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldff1sw, load_ext, d_integer, implicit)
-DEF_SVE_FUNCTION (svldff1sw_gather, load_ext_gather_offset, d_integer, implicit)
-DEF_SVE_FUNCTION (svldff1sw_gather, load_ext_gather_index, d_integer, implicit)
-DEF_SVE_FUNCTION (svldff1ub, load_ext, hsd_integer, implicit)
-DEF_SVE_FUNCTION (svldff1ub_gather, load_ext_gather_offset, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldff1uh, load_ext, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldff1uh_gather, load_ext_gather_offset, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldff1uh_gather, load_ext_gather_index, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldff1uw, load_ext, d_integer, implicit)
-DEF_SVE_FUNCTION (svldff1uw_gather, load_ext_gather_offset, d_integer, implicit)
-DEF_SVE_FUNCTION (svldff1uw_gather, load_ext_gather_index, d_integer, implicit)
-DEF_SVE_FUNCTION (svldnf1, load, all_data, implicit)
-DEF_SVE_FUNCTION (svldnf1sb, load_ext, hsd_integer, implicit)
-DEF_SVE_FUNCTION (svldnf1sh, load_ext, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldnf1sw, load_ext, d_integer, implicit)
-DEF_SVE_FUNCTION (svldnf1ub, load_ext, hsd_integer, implicit)
-DEF_SVE_FUNCTION (svldnf1uh, load_ext, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldnf1uw, load_ext, d_integer, implicit)
 DEF_SVE_FUNCTION (svldnt1, load, all_data, implicit)
 DEF_SVE_FUNCTION (svld2, load, all_data, implicit)
 DEF_SVE_FUNCTION (svld3, load, all_data, implicit)
@@ -173,7 +128,6 @@ DEF_SVE_FUNCTION (svmla, ternary_opt_n, all_arith, mxz)
 DEF_SVE_FUNCTION (svmla_lane, ternary_lane, all_float, none)
 DEF_SVE_FUNCTION (svmls, ternary_opt_n, all_arith, mxz)
 DEF_SVE_FUNCTION (svmls_lane, ternary_lane, all_float, none)
-DEF_SVE_FUNCTION (svmmla, mmla, none, none)
 DEF_SVE_FUNCTION (svmov, unary, b, z)
 DEF_SVE_FUNCTION (svmsb, ternary_opt_n, all_arith, mxz)
 DEF_SVE_FUNCTION (svmul, binary_opt_n, all_arith, mxz)
@@ -197,13 +151,9 @@ DEF_SVE_FUNCTION (svpfalse, inherent_b, b, none)
 DEF_SVE_FUNCTION (svpfirst, unary, b, implicit)
 DEF_SVE_FUNCTION (svpnext, unary_pred, all_pred, implicit)
 DEF_SVE_FUNCTION (svprfb, prefetch, none, implicit)
-DEF_SVE_FUNCTION (svprfb_gather, prefetch_gather_offset, none, implicit)
 DEF_SVE_FUNCTION (svprfd, prefetch, none, implicit)
-DEF_SVE_FUNCTION (svprfd_gather, prefetch_gather_index, none, implicit)
 DEF_SVE_FUNCTION (svprfh, prefetch, none, implicit)
-DEF_SVE_FUNCTION (svprfh_gather, prefetch_gather_index, none, implicit)
 DEF_SVE_FUNCTION (svprfw, prefetch, none, implicit)
-DEF_SVE_FUNCTION (svprfw_gather, prefetch_gather_index, none, implicit)
 DEF_SVE_FUNCTION (svptest_any, ptest, none, implicit)
 DEF_SVE_FUNCTION (svptest_first, ptest, none, implicit)
 DEF_SVE_FUNCTION (svptest_last, ptest, none, implicit)
@@ -244,7 +194,6 @@ DEF_SVE_FUNCTION (svqincw_pat, inc_dec_pat, s_integer, none)
 DEF_SVE_FUNCTION (svqincw_pat, inc_dec_pat, sd_integer, none)
 DEF_SVE_FUNCTION (svqsub, binary_opt_n, all_integer, none)
 DEF_SVE_FUNCTION (svrbit, unary, all_integer, mxz)
-DEF_SVE_FUNCTION (svrdffr, rdffr, none, z_or_none)
 DEF_SVE_FUNCTION (svrecpe, unary, all_float, none)
 DEF_SVE_FUNCTION (svrecps, binary, all_float, none)
 DEF_SVE_FUNCTION (svrecpx, unary, all_float, mxz)
@@ -269,20 +218,12 @@ DEF_SVE_FUNCTION (svsel, binary, b, implicit)
 DEF_SVE_FUNCTION (svset2, set, all_data, none)
 DEF_SVE_FUNCTION (svset3, set, all_data, none)
 DEF_SVE_FUNCTION (svset4, set, all_data, none)
-DEF_SVE_FUNCTION (svsetffr, setffr, none, none)
 DEF_SVE_FUNCTION (svsplice, binary, all_data, implicit)
 DEF_SVE_FUNCTION (svsqrt, unary, all_float, mxz)
 DEF_SVE_FUNCTION (svst1, store, all_data, implicit)
-DEF_SVE_FUNCTION (svst1_scatter, store_scatter_index, sd_data, implicit)
-DEF_SVE_FUNCTION (svst1_scatter, store_scatter_offset, sd_data, implicit)
 DEF_SVE_FUNCTION (svst1b, store, hsd_integer, implicit)
-DEF_SVE_FUNCTION (svst1b_scatter, store_scatter_offset, sd_integer, implicit)
 DEF_SVE_FUNCTION (svst1h, store, sd_integer, implicit)
-DEF_SVE_FUNCTION (svst1h_scatter, store_scatter_index, sd_integer, implicit)
-DEF_SVE_FUNCTION (svst1h_scatter, store_scatter_offset, sd_integer, implicit)
 DEF_SVE_FUNCTION (svst1w, store, d_integer, implicit)
-DEF_SVE_FUNCTION (svst1w_scatter, store_scatter_index, d_integer, implicit)
-DEF_SVE_FUNCTION (svst1w_scatter, store_scatter_offset, d_integer, implicit)
 DEF_SVE_FUNCTION (svst2, store, all_data, implicit)
 DEF_SVE_FUNCTION (svst3, store, all_data, implicit)
 DEF_SVE_FUNCTION (svst4, store, all_data, implicit)
@@ -290,13 +231,10 @@ DEF_SVE_FUNCTION (svstnt1, store, all_data, implicit)
 DEF_SVE_FUNCTION (svsub, binary_opt_n, all_arith, mxz)
 DEF_SVE_FUNCTION (svsubr, binary_opt_n, all_arith, mxz)
 DEF_SVE_FUNCTION (svtbl, binary_uint, all_data, none)
-DEF_SVE_FUNCTION (svtmad, tmad, all_float, none)
 DEF_SVE_FUNCTION (svtrn1, binary, all_data, none)
 DEF_SVE_FUNCTION (svtrn1, binary_pred, all_pred, none)
 DEF_SVE_FUNCTION (svtrn2, binary, all_data, none)
 DEF_SVE_FUNCTION (svtrn2, binary_pred, all_pred, none)
-DEF_SVE_FUNCTION (svtsmul, binary_uint, all_float, none)
-DEF_SVE_FUNCTION (svtssel, binary_uint, all_float, none)
 DEF_SVE_FUNCTION (svundef, inherent, all_data, none)
 DEF_SVE_FUNCTION (svundef2, inherent, all_data, none)
 DEF_SVE_FUNCTION (svundef3, inherent, all_data, none)
@@ -311,13 +249,78 @@ DEF_SVE_FUNCTION (svuzp2, binary, all_data, none)
 DEF_SVE_FUNCTION (svuzp2, binary_pred, all_pred, none)
 DEF_SVE_FUNCTION (svwhilele, compare_scalar, while, none)
 DEF_SVE_FUNCTION (svwhilelt, compare_scalar, while, none)
-DEF_SVE_FUNCTION (svwrffr, setffr, none, implicit)
 DEF_SVE_FUNCTION (svzip1, binary, all_data, none)
 DEF_SVE_FUNCTION (svzip1, binary_pred, all_pred, none)
 DEF_SVE_FUNCTION (svzip2, binary, all_data, none)
 DEF_SVE_FUNCTION (svzip2, binary_pred, all_pred, none)
 #undef REQUIRED_EXTENSIONS
 
+#define REQUIRED_EXTENSIONS AARCH64_FL_SM_OFF
+DEF_SVE_FUNCTION (svadda, fold_left, all_float, implicit)
+DEF_SVE_FUNCTION (svadrb, adr_offset, none, none)
+DEF_SVE_FUNCTION (svadrd, adr_index, none, none)
+DEF_SVE_FUNCTION (svadrh, adr_index, none, none)
+DEF_SVE_FUNCTION (svadrw, adr_index, none, none)
+DEF_SVE_FUNCTION (svcompact, unary, sd_data, implicit)
+DEF_SVE_FUNCTION (svexpa, unary_uint, all_float, none)
+DEF_SVE_FUNCTION (svld1_gather, load_gather_sv, sd_data, implicit)
+DEF_SVE_FUNCTION (svld1_gather, load_gather_vs, sd_data, implicit)
+DEF_SVE_FUNCTION (svld1sb_gather, load_ext_gather_offset, sd_integer, implicit)
+DEF_SVE_FUNCTION (svld1sh_gather, load_ext_gather_offset, sd_integer, implicit)
+DEF_SVE_FUNCTION (svld1sh_gather, load_ext_gather_index, sd_integer, implicit)
+DEF_SVE_FUNCTION (svld1sw_gather, load_ext_gather_offset, d_integer, implicit)
+DEF_SVE_FUNCTION (svld1sw_gather, load_ext_gather_index, d_integer, implicit)
+DEF_SVE_FUNCTION (svld1ub_gather, load_ext_gather_offset, sd_integer, implicit)
+DEF_SVE_FUNCTION (svld1uh_gather, load_ext_gather_offset, sd_integer, implicit)
+DEF_SVE_FUNCTION (svld1uh_gather, load_ext_gather_index, sd_integer, implicit)
+DEF_SVE_FUNCTION (svld1uw_gather, load_ext_gather_offset, d_integer, implicit)
+DEF_SVE_FUNCTION (svld1uw_gather, load_ext_gather_index, d_integer, implicit)
+DEF_SVE_FUNCTION (svldff1, load, all_data, implicit)
+DEF_SVE_FUNCTION (svldff1_gather, load_gather_sv, sd_data, implicit)
+DEF_SVE_FUNCTION (svldff1_gather, load_gather_vs, sd_data, implicit)
+DEF_SVE_FUNCTION (svldff1sb, load_ext, hsd_integer, implicit)
+DEF_SVE_FUNCTION (svldff1sb_gather, load_ext_gather_offset, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldff1sh, load_ext, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldff1sh_gather, load_ext_gather_offset, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldff1sh_gather, load_ext_gather_index, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldff1sw, load_ext, d_integer, implicit)
+DEF_SVE_FUNCTION (svldff1sw_gather, load_ext_gather_offset, d_integer, implicit)
+DEF_SVE_FUNCTION (svldff1sw_gather, load_ext_gather_index, d_integer, implicit)
+DEF_SVE_FUNCTION (svldff1ub, load_ext, hsd_integer, implicit)
+DEF_SVE_FUNCTION (svldff1ub_gather, load_ext_gather_offset, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldff1uh, load_ext, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldff1uh_gather, load_ext_gather_offset, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldff1uh_gather, load_ext_gather_index, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldff1uw, load_ext, d_integer, implicit)
+DEF_SVE_FUNCTION (svldff1uw_gather, load_ext_gather_offset, d_integer, implicit)
+DEF_SVE_FUNCTION (svldff1uw_gather, load_ext_gather_index, d_integer, implicit)
+DEF_SVE_FUNCTION (svldnf1, load, all_data, implicit)
+DEF_SVE_FUNCTION (svldnf1sb, load_ext, hsd_integer, implicit)
+DEF_SVE_FUNCTION (svldnf1sh, load_ext, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldnf1sw, load_ext, d_integer, implicit)
+DEF_SVE_FUNCTION (svldnf1ub, load_ext, hsd_integer, implicit)
+DEF_SVE_FUNCTION (svldnf1uh, load_ext, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldnf1uw, load_ext, d_integer, implicit)
+DEF_SVE_FUNCTION (svmmla, mmla, none, none)
+DEF_SVE_FUNCTION (svprfb_gather, prefetch_gather_offset, none, implicit)
+DEF_SVE_FUNCTION (svprfd_gather, prefetch_gather_index, none, implicit)
+DEF_SVE_FUNCTION (svprfh_gather, prefetch_gather_index, none, implicit)
+DEF_SVE_FUNCTION (svprfw_gather, prefetch_gather_index, none, implicit)
+DEF_SVE_FUNCTION (svrdffr, rdffr, none, z_or_none)
+DEF_SVE_FUNCTION (svsetffr, setffr, none, none)
+DEF_SVE_FUNCTION (svst1_scatter, store_scatter_index, sd_data, implicit)
+DEF_SVE_FUNCTION (svst1_scatter, store_scatter_offset, sd_data, implicit)
+DEF_SVE_FUNCTION (svst1b_scatter, store_scatter_offset, sd_integer, implicit)
+DEF_SVE_FUNCTION (svst1h_scatter, store_scatter_index, sd_integer, implicit)
+DEF_SVE_FUNCTION (svst1h_scatter, store_scatter_offset, sd_integer, implicit)
+DEF_SVE_FUNCTION (svst1w_scatter, store_scatter_index, d_integer, implicit)
+DEF_SVE_FUNCTION (svst1w_scatter, store_scatter_offset, d_integer, implicit)
+DEF_SVE_FUNCTION (svtmad, tmad, all_float, none)
+DEF_SVE_FUNCTION (svtsmul, binary_uint, all_float, none)
+DEF_SVE_FUNCTION (svtssel, binary_uint, all_float, none)
+DEF_SVE_FUNCTION (svwrffr, setffr, none, implicit)
+#undef REQUIRED_EXTENSIONS
+
 #define REQUIRED_EXTENSIONS AARCH64_FL_BF16
 DEF_SVE_FUNCTION (svbfdot, ternary_bfloat_opt_n, s_float, none)
 DEF_SVE_FUNCTION (svbfdot_lane, ternary_bfloat_lanex2, s_float, none)
@@ -325,27 +328,31 @@ DEF_SVE_FUNCTION (svbfmlalb, ternary_bfloat_opt_n, s_float, none)
 DEF_SVE_FUNCTION (svbfmlalb_lane, ternary_bfloat_lane, s_float, none)
 DEF_SVE_FUNCTION (svbfmlalt, ternary_bfloat_opt_n, s_float, none)
 DEF_SVE_FUNCTION (svbfmlalt_lane, ternary_bfloat_lane, s_float, none)
-DEF_SVE_FUNCTION (svbfmmla, ternary_bfloat, s_float, none)
 DEF_SVE_FUNCTION (svcvt, unary_convert, cvt_bfloat, mxz)
 DEF_SVE_FUNCTION (svcvtnt, unary_convert_narrowt, cvt_bfloat, mx)
 #undef REQUIRED_EXTENSIONS
 
+#define REQUIRED_EXTENSIONS AARCH64_FL_BF16 | AARCH64_FL_SM_OFF
+DEF_SVE_FUNCTION (svbfmmla, ternary_bfloat, s_float, none)
+#undef REQUIRED_EXTENSIONS
+
 #define REQUIRED_EXTENSIONS AARCH64_FL_I8MM
-DEF_SVE_FUNCTION (svmmla, mmla, s_integer, none)
-DEF_SVE_FUNCTION (svusmmla, ternary_uintq_intq, s_signed, none)
 DEF_SVE_FUNCTION (svsudot, ternary_intq_uintq_opt_n, s_signed, none)
 DEF_SVE_FUNCTION (svsudot_lane, ternary_intq_uintq_lane, s_signed, none)
 DEF_SVE_FUNCTION (svusdot, ternary_uintq_intq_opt_n, s_signed, none)
 DEF_SVE_FUNCTION (svusdot_lane, ternary_uintq_intq_lane, s_signed, none)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS AARCH64_FL_F32MM
+#define REQUIRED_EXTENSIONS AARCH64_FL_I8MM | AARCH64_FL_SM_OFF
+DEF_SVE_FUNCTION (svmmla, mmla, s_integer, none)
+DEF_SVE_FUNCTION (svusmmla, ternary_uintq_intq, s_signed, none)
+#undef REQUIRED_EXTENSIONS
+
+#define REQUIRED_EXTENSIONS AARCH64_FL_F32MM | AARCH64_FL_SM_OFF
 DEF_SVE_FUNCTION (svmmla, mmla, s_float, none)
 #undef REQUIRED_EXTENSIONS
 
 #define REQUIRED_EXTENSIONS AARCH64_FL_F64MM
-DEF_SVE_FUNCTION (svld1ro, load_replicate, all_data, implicit)
-DEF_SVE_FUNCTION (svmmla, mmla, d_float, none)
 DEF_SVE_FUNCTION (svtrn1q, binary, all_data, none)
 DEF_SVE_FUNCTION (svtrn2q, binary, all_data, none)
 DEF_SVE_FUNCTION (svuzp1q, binary, all_data, none)
@@ -353,3 +360,8 @@ DEF_SVE_FUNCTION (svuzp2q, binary, all_data, none)
 DEF_SVE_FUNCTION (svzip1q, binary, all_data, none)
 DEF_SVE_FUNCTION (svzip2q, binary, all_data, none)
 #undef REQUIRED_EXTENSIONS
+
+#define REQUIRED_EXTENSIONS AARCH64_FL_F64MM | AARCH64_FL_SM_OFF
+DEF_SVE_FUNCTION (svld1ro, load_replicate, all_data, implicit)
+DEF_SVE_FUNCTION (svmmla, mmla, d_float, none)
+#undef REQUIRED_EXTENSIONS
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sve2.def b/gcc/config/aarch64/aarch64-sve-builtins-sve2.def
index 635089ffc58..4e0466b4cf8 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-sve2.def
+++ b/gcc/config/aarch64/aarch64-sve-builtins-sve2.def
@@ -51,24 +51,9 @@ DEF_SVE_FUNCTION (sveor3, ternary_opt_n, all_integer, none)
 DEF_SVE_FUNCTION (sveorbt, ternary_opt_n, all_integer, none)
 DEF_SVE_FUNCTION (sveortb, ternary_opt_n, all_integer, none)
 DEF_SVE_FUNCTION (svhadd, binary_opt_n, all_integer, mxz)
-DEF_SVE_FUNCTION (svhistcnt, binary_to_uint, sd_integer, z)
-DEF_SVE_FUNCTION (svhistseg, binary_to_uint, b_integer, none)
 DEF_SVE_FUNCTION (svhsub, binary_opt_n, all_integer, mxz)
 DEF_SVE_FUNCTION (svhsubr, binary_opt_n, all_integer, mxz)
-DEF_SVE_FUNCTION (svldnt1_gather, load_gather_sv_restricted, sd_data, implicit)
-DEF_SVE_FUNCTION (svldnt1_gather, load_gather_vs, sd_data, implicit)
-DEF_SVE_FUNCTION (svldnt1sb_gather, load_ext_gather_offset_restricted, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldnt1sh_gather, load_ext_gather_offset_restricted, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldnt1sh_gather, load_ext_gather_index_restricted, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldnt1sw_gather, load_ext_gather_offset_restricted, d_integer, implicit)
-DEF_SVE_FUNCTION (svldnt1sw_gather, load_ext_gather_index_restricted, d_integer, implicit)
-DEF_SVE_FUNCTION (svldnt1ub_gather, load_ext_gather_offset_restricted, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldnt1uh_gather, load_ext_gather_offset_restricted, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldnt1uh_gather, load_ext_gather_index_restricted, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldnt1uw_gather, load_ext_gather_offset_restricted, d_integer, implicit)
-DEF_SVE_FUNCTION (svldnt1uw_gather, load_ext_gather_index_restricted, d_integer, implicit)
 DEF_SVE_FUNCTION (svlogb, unary_to_int, all_float, mxz)
-DEF_SVE_FUNCTION (svmatch, compare, bh_integer, implicit)
 DEF_SVE_FUNCTION (svmaxp, binary, all_arith, mx)
 DEF_SVE_FUNCTION (svmaxnmp, binary, all_float, mx)
 DEF_SVE_FUNCTION (svmla_lane, ternary_lane, hsd_integer, none)
@@ -91,7 +76,6 @@ DEF_SVE_FUNCTION (svmullb_lane, binary_long_lane, sd_integer, none)
 DEF_SVE_FUNCTION (svmullt, binary_long_opt_n, hsd_integer, none)
 DEF_SVE_FUNCTION (svmullt_lane, binary_long_lane, sd_integer, none)
 DEF_SVE_FUNCTION (svnbsl, ternary_opt_n, all_integer, none)
-DEF_SVE_FUNCTION (svnmatch, compare, bh_integer, implicit)
 DEF_SVE_FUNCTION (svpmul, binary_opt_n, b_unsigned, none)
 DEF_SVE_FUNCTION (svpmullb, binary_long_opt_n, hd_unsigned, none)
 DEF_SVE_FUNCTION (svpmullb_pair, binary_opt_n, bs_unsigned, none)
@@ -164,13 +148,6 @@ DEF_SVE_FUNCTION (svsli, ternary_shift_left_imm, all_integer, none)
 DEF_SVE_FUNCTION (svsqadd, binary_int_opt_n, all_unsigned, mxz)
 DEF_SVE_FUNCTION (svsra, ternary_shift_right_imm, all_integer, none)
 DEF_SVE_FUNCTION (svsri, ternary_shift_right_imm, all_integer, none)
-DEF_SVE_FUNCTION (svstnt1_scatter, store_scatter_index_restricted, sd_data, implicit)
-DEF_SVE_FUNCTION (svstnt1_scatter, store_scatter_offset_restricted, sd_data, implicit)
-DEF_SVE_FUNCTION (svstnt1b_scatter, store_scatter_offset_restricted, sd_integer, implicit)
-DEF_SVE_FUNCTION (svstnt1h_scatter, store_scatter_index_restricted, sd_integer, implicit)
-DEF_SVE_FUNCTION (svstnt1h_scatter, store_scatter_offset_restricted, sd_integer, implicit)
-DEF_SVE_FUNCTION (svstnt1w_scatter, store_scatter_index_restricted, d_integer, implicit)
-DEF_SVE_FUNCTION (svstnt1w_scatter, store_scatter_offset_restricted, d_integer, implicit)
 DEF_SVE_FUNCTION (svsubhnb, binary_narrowb_opt_n, hsd_integer, none)
 DEF_SVE_FUNCTION (svsubhnt, binary_narrowt_opt_n, hsd_integer, none)
 DEF_SVE_FUNCTION (svsublb, binary_long_opt_n, hsd_integer, none)
@@ -189,7 +166,35 @@ DEF_SVE_FUNCTION (svwhilewr, compare_ptr, all_data, none)
 DEF_SVE_FUNCTION (svxar, ternary_shift_right_imm, all_integer, none)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE2 | AARCH64_FL_SVE2_AES)
+#define REQUIRED_EXTENSIONS AARCH64_FL_SVE2 | AARCH64_FL_SM_OFF
+DEF_SVE_FUNCTION (svhistcnt, binary_to_uint, sd_integer, z)
+DEF_SVE_FUNCTION (svhistseg, binary_to_uint, b_integer, none)
+DEF_SVE_FUNCTION (svldnt1_gather, load_gather_sv_restricted, sd_data, implicit)
+DEF_SVE_FUNCTION (svldnt1_gather, load_gather_vs, sd_data, implicit)
+DEF_SVE_FUNCTION (svldnt1sb_gather, load_ext_gather_offset_restricted, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldnt1sh_gather, load_ext_gather_offset_restricted, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldnt1sh_gather, load_ext_gather_index_restricted, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldnt1sw_gather, load_ext_gather_offset_restricted, d_integer, implicit)
+DEF_SVE_FUNCTION (svldnt1sw_gather, load_ext_gather_index_restricted, d_integer, implicit)
+DEF_SVE_FUNCTION (svldnt1ub_gather, load_ext_gather_offset_restricted, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldnt1uh_gather, load_ext_gather_offset_restricted, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldnt1uh_gather, load_ext_gather_index_restricted, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldnt1uw_gather, load_ext_gather_offset_restricted, d_integer, implicit)
+DEF_SVE_FUNCTION (svldnt1uw_gather, load_ext_gather_index_restricted, d_integer, implicit)
+DEF_SVE_FUNCTION (svmatch, compare, bh_integer, implicit)
+DEF_SVE_FUNCTION (svnmatch, compare, bh_integer, implicit)
+DEF_SVE_FUNCTION (svstnt1_scatter, store_scatter_index_restricted, sd_data, implicit)
+DEF_SVE_FUNCTION (svstnt1_scatter, store_scatter_offset_restricted, sd_data, implicit)
+DEF_SVE_FUNCTION (svstnt1b_scatter, store_scatter_offset_restricted, sd_integer, implicit)
+DEF_SVE_FUNCTION (svstnt1h_scatter, store_scatter_index_restricted, sd_integer, implicit)
+DEF_SVE_FUNCTION (svstnt1h_scatter, store_scatter_offset_restricted, sd_integer, implicit)
+DEF_SVE_FUNCTION (svstnt1w_scatter, store_scatter_index_restricted, d_integer, implicit)
+DEF_SVE_FUNCTION (svstnt1w_scatter, store_scatter_offset_restricted, d_integer, implicit)
+#undef REQUIRED_EXTENSIONS
+
+#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE2 \
+			     | AARCH64_FL_SVE2_AES \
+			     | AARCH64_FL_SM_OFF)
 DEF_SVE_FUNCTION (svaesd, binary, b_unsigned, none)
 DEF_SVE_FUNCTION (svaese, binary, b_unsigned, none)
 DEF_SVE_FUNCTION (svaesmc, unary, b_unsigned, none)
@@ -198,17 +203,23 @@ DEF_SVE_FUNCTION (svpmullb_pair, binary_opt_n, d_unsigned, none)
 DEF_SVE_FUNCTION (svpmullt_pair, binary_opt_n, d_unsigned, none)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE2 | AARCH64_FL_SVE2_BITPERM)
+#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE2 \
+			     | AARCH64_FL_SVE2_BITPERM \
+			     | AARCH64_FL_SM_OFF)
 DEF_SVE_FUNCTION (svbdep, binary_opt_n, all_unsigned, none)
 DEF_SVE_FUNCTION (svbext, binary_opt_n, all_unsigned, none)
 DEF_SVE_FUNCTION (svbgrp, binary_opt_n, all_unsigned, none)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE2 | AARCH64_FL_SVE2_SHA3)
+#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE2 \
+			     | AARCH64_FL_SVE2_SHA3 \
+			     | AARCH64_FL_SM_OFF)
 DEF_SVE_FUNCTION (svrax1, binary, d_integer, none)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE2 | AARCH64_FL_SVE2_SM4)
+#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE2 \
+			     | AARCH64_FL_SVE2_SM4 \
+			     | AARCH64_FL_SM_OFF)
 DEF_SVE_FUNCTION (svsm4e, binary, s_unsigned, none)
 DEF_SVE_FUNCTION (svsm4ekey, binary, s_unsigned, none)
 #undef REQUIRED_EXTENSIONS
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index e168c83344a..a6de1068da9 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -700,6 +700,13 @@ check_required_extensions (location_t location, tree fndecl,
   if (missing_extensions == 0)
     return check_required_registers (location, fndecl);
 
+  if (missing_extensions & AARCH64_FL_SM_OFF)
+    {
+      error_at (location, "ACLE function %qD cannot be called when"
+		" SME streaming mode is enabled", fndecl);
+      return false;
+    }
+
   static const struct {
     aarch64_feature_flags flag;
     const char *name;
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index b8cc47ef5fc..e98fbcbeb0e 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -1082,7 +1082,7 @@ (define_insn "aarch64_wrffr"
 	(match_operand:VNx16BI 0 "aarch64_simd_reg_or_minus_one" "Dm, Upa"))
    (set (reg:VNx16BI FFRT_REGNUM)
 	(unspec:VNx16BI [(match_dup 0)] UNSPEC_WRFFR))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "@
    setffr
    wrffr\t%0.b"
@@ -1123,7 +1123,7 @@ (define_insn "aarch64_copy_ffr_to_ffrt"
 (define_insn "aarch64_rdffr"
   [(set (match_operand:VNx16BI 0 "register_operand" "=Upa")
 	(reg:VNx16BI FFRT_REGNUM))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "rdffr\t%0.b"
 )
 
@@ -1133,7 +1133,7 @@ (define_insn "aarch64_rdffr_z"
 	(and:VNx16BI
 	  (reg:VNx16BI FFRT_REGNUM)
 	  (match_operand:VNx16BI 1 "register_operand" "Upa")))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "rdffr\t%0.b, %1/z"
 )
 
@@ -1149,7 +1149,7 @@ (define_insn "*aarch64_rdffr_z_ptest"
 	     (match_dup 1))]
 	  UNSPEC_PTEST))
    (clobber (match_scratch:VNx16BI 0 "=Upa"))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "rdffrs\t%0.b, %1/z"
 )
 
@@ -1163,7 +1163,7 @@ (define_insn "*aarch64_rdffr_ptest"
 	   (reg:VNx16BI FFRT_REGNUM)]
 	  UNSPEC_PTEST))
    (clobber (match_scratch:VNx16BI 0 "=Upa"))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "rdffrs\t%0.b, %1/z"
 )
 
@@ -1182,7 +1182,7 @@ (define_insn "*aarch64_rdffr_z_cc"
 	(and:VNx16BI
 	  (reg:VNx16BI FFRT_REGNUM)
 	  (match_dup 1)))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "rdffrs\t%0.b, %1/z"
 )
 
@@ -1197,7 +1197,7 @@ (define_insn "*aarch64_rdffr_cc"
 	  UNSPEC_PTEST))
    (set (match_operand:VNx16BI 0 "register_operand" "=Upa")
 	(reg:VNx16BI FFRT_REGNUM))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "rdffrs\t%0.b, %1/z"
 )
 
@@ -1327,7 +1327,7 @@ (define_insn "@aarch64_ld<fn>f1<mode>"
 	   (match_operand:SVE_FULL 1 "aarch64_sve_ld<fn>f1_operand" "Ut<fn>")
 	   (reg:VNx16BI FFRT_REGNUM)]
 	  SVE_LDFF1_LDNF1))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "ld<fn>f1<Vesize>\t%0.<Vetype>, %2/z, %1"
 )
 
@@ -1361,7 +1361,9 @@ (define_insn_and_rewrite "@aarch64_ld<fn>f1_<ANY_EXTEND:optab><SVE_HSDI:mode><SV
 		(reg:VNx16BI FFRT_REGNUM)]
 	       SVE_LDFF1_LDNF1))]
 	  UNSPEC_PRED_X))]
-  "TARGET_SVE && (~<SVE_HSDI:narrower_mask> & <SVE_PARTIAL_I:self_mask>) == 0"
+  "TARGET_SVE
+   && TARGET_NON_STREAMING
+   && (~<SVE_HSDI:narrower_mask> & <SVE_PARTIAL_I:self_mask>) == 0"
   "ld<fn>f1<ANY_EXTEND:s><SVE_PARTIAL_I:Vesize>\t%0.<SVE_HSDI:Vctype>, %2/z, %1"
   "&& !CONSTANT_P (operands[3])"
   {
@@ -1409,7 +1411,7 @@ (define_expand "gather_load<mode><v_int_container>"
 	   (match_operand:DI 4 "aarch64_gather_scale_operand_<Vesize>")
 	   (mem:BLK (scratch))]
 	  UNSPEC_LD1_GATHER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   {
     operands[5] = aarch64_ptrue_reg (<VPRED>mode);
   }
@@ -1427,7 +1429,7 @@ (define_insn "mask_gather_load<mode><v_int_container>"
 	   (match_operand:DI 4 "aarch64_gather_scale_operand_<Vesize>" "Ui1, Ui1, Ui1, Ui1, i, i")
 	   (mem:BLK (scratch))]
 	  UNSPEC_LD1_GATHER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "@
    ld1<Vesize>\t%0.s, %5/z, [%2.s]
    ld1<Vesize>\t%0.s, %5/z, [%2.s, #%1]
@@ -1449,7 +1451,7 @@ (define_insn "mask_gather_load<mode><v_int_container>"
 	   (match_operand:DI 4 "aarch64_gather_scale_operand_<Vesize>" "Ui1, Ui1, Ui1, i")
 	   (mem:BLK (scratch))]
 	  UNSPEC_LD1_GATHER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "@
    ld1<Vesize>\t%0.d, %5/z, [%2.d]
    ld1<Vesize>\t%0.d, %5/z, [%2.d, #%1]
@@ -1472,7 +1474,7 @@ (define_insn_and_rewrite "*mask_gather_load<mode><v_int_container>_<su>xtw_unpac
 	   (match_operand:DI 4 "aarch64_gather_scale_operand_<Vesize>" "Ui1, i")
 	   (mem:BLK (scratch))]
 	  UNSPEC_LD1_GATHER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "@
    ld1<Vesize>\t%0.d, %5/z, [%1, %2.d, <su>xtw]
    ld1<Vesize>\t%0.d, %5/z, [%1, %2.d, <su>xtw %p4]"
@@ -1499,7 +1501,7 @@ (define_insn_and_rewrite "*mask_gather_load<mode><v_int_container>_sxtw"
 	   (match_operand:DI 4 "aarch64_gather_scale_operand_<Vesize>" "Ui1, i")
 	   (mem:BLK (scratch))]
 	  UNSPEC_LD1_GATHER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "@
    ld1<Vesize>\t%0.d, %5/z, [%1, %2.d, sxtw]
    ld1<Vesize>\t%0.d, %5/z, [%1, %2.d, sxtw %p4]"
@@ -1523,7 +1525,7 @@ (define_insn "*mask_gather_load<mode><v_int_container>_uxtw"
 	   (match_operand:DI 4 "aarch64_gather_scale_operand_<Vesize>" "Ui1, i")
 	   (mem:BLK (scratch))]
 	  UNSPEC_LD1_GATHER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "@
    ld1<Vesize>\t%0.d, %5/z, [%1, %2.d, uxtw]
    ld1<Vesize>\t%0.d, %5/z, [%1, %2.d, uxtw %p4]"
@@ -1557,7 +1559,9 @@ (define_insn_and_rewrite "@aarch64_gather_load_<ANY_EXTEND:optab><SVE_4HSI:mode>
 		(mem:BLK (scratch))]
 	       UNSPEC_LD1_GATHER))]
 	  UNSPEC_PRED_X))]
-  "TARGET_SVE && (~<SVE_4HSI:narrower_mask> & <SVE_4BHI:self_mask>) == 0"
+  "TARGET_SVE
+   && TARGET_NON_STREAMING
+   && (~<SVE_4HSI:narrower_mask> & <SVE_4BHI:self_mask>) == 0"
   "@
    ld1<ANY_EXTEND:s><SVE_4BHI:Vesize>\t%0.s, %5/z, [%2.s]
    ld1<ANY_EXTEND:s><SVE_4BHI:Vesize>\t%0.s, %5/z, [%2.s, #%1]
@@ -1587,7 +1591,9 @@ (define_insn_and_rewrite "@aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode
 		(mem:BLK (scratch))]
 	       UNSPEC_LD1_GATHER))]
 	  UNSPEC_PRED_X))]
-  "TARGET_SVE && (~<SVE_2HSDI:narrower_mask> & <SVE_2BHSI:self_mask>) == 0"
+  "TARGET_SVE
+   && TARGET_NON_STREAMING
+   && (~<SVE_2HSDI:narrower_mask> & <SVE_2BHSI:self_mask>) == 0"
   "@
    ld1<ANY_EXTEND:s><SVE_2BHSI:Vesize>\t%0.d, %5/z, [%2.d]
    ld1<ANY_EXTEND:s><SVE_2BHSI:Vesize>\t%0.d, %5/z, [%2.d, #%1]
@@ -1618,7 +1624,9 @@ (define_insn_and_rewrite "*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode
 		(mem:BLK (scratch))]
 	       UNSPEC_LD1_GATHER))]
 	  UNSPEC_PRED_X))]
-  "TARGET_SVE && (~<SVE_2HSDI:narrower_mask> & <SVE_2BHSI:self_mask>) == 0"
+  "TARGET_SVE
+   && TARGET_NON_STREAMING
+   && (~<SVE_2HSDI:narrower_mask> & <SVE_2BHSI:self_mask>) == 0"
   "@
    ld1<ANY_EXTEND:s><SVE_2BHSI:Vesize>\t%0.d, %5/z, [%1, %2.d, <ANY_EXTEND2:su>xtw]
    ld1<ANY_EXTEND:s><SVE_2BHSI:Vesize>\t%0.d, %5/z, [%1, %2.d, <ANY_EXTEND2:su>xtw %p4]"
@@ -1650,7 +1658,9 @@ (define_insn_and_rewrite "*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode
 		(mem:BLK (scratch))]
 	       UNSPEC_LD1_GATHER))]
 	  UNSPEC_PRED_X))]
-  "TARGET_SVE && (~<SVE_2HSDI:narrower_mask> & <SVE_2BHSI:self_mask>) == 0"
+  "TARGET_SVE
+   && TARGET_NON_STREAMING
+   && (~<SVE_2HSDI:narrower_mask> & <SVE_2BHSI:self_mask>) == 0"
   "@
    ld1<ANY_EXTEND:s><SVE_2BHSI:Vesize>\t%0.d, %5/z, [%1, %2.d, sxtw]
    ld1<ANY_EXTEND:s><SVE_2BHSI:Vesize>\t%0.d, %5/z, [%1, %2.d, sxtw %p4]"
@@ -1679,7 +1689,9 @@ (define_insn_and_rewrite "*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode
 		(mem:BLK (scratch))]
 	       UNSPEC_LD1_GATHER))]
 	  UNSPEC_PRED_X))]
-  "TARGET_SVE && (~<SVE_2HSDI:narrower_mask> & <SVE_2BHSI:self_mask>) == 0"
+  "TARGET_SVE
+   && TARGET_NON_STREAMING
+   && (~<SVE_2HSDI:narrower_mask> & <SVE_2BHSI:self_mask>) == 0"
   "@
    ld1<ANY_EXTEND:s><SVE_2BHSI:Vesize>\t%0.d, %5/z, [%1, %2.d, uxtw]
    ld1<ANY_EXTEND:s><SVE_2BHSI:Vesize>\t%0.d, %5/z, [%1, %2.d, uxtw %p4]"
@@ -1710,7 +1722,7 @@ (define_insn "@aarch64_ldff1_gather<mode>"
 	   (mem:BLK (scratch))
 	   (reg:VNx16BI FFRT_REGNUM)]
 	  UNSPEC_LDFF1_GATHER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "@
    ldff1w\t%0.s, %5/z, [%2.s]
    ldff1w\t%0.s, %5/z, [%2.s, #%1]
@@ -1733,7 +1745,7 @@ (define_insn "@aarch64_ldff1_gather<mode>"
 	   (mem:BLK (scratch))
 	   (reg:VNx16BI FFRT_REGNUM)]
 	  UNSPEC_LDFF1_GATHER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "@
    ldff1d\t%0.d, %5/z, [%2.d]
    ldff1d\t%0.d, %5/z, [%2.d, #%1]
@@ -1758,7 +1770,7 @@ (define_insn_and_rewrite "*aarch64_ldff1_gather<mode>_sxtw"
 	   (mem:BLK (scratch))
 	   (reg:VNx16BI FFRT_REGNUM)]
 	  UNSPEC_LDFF1_GATHER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "@
    ldff1d\t%0.d, %5/z, [%1, %2.d, sxtw]
    ldff1d\t%0.d, %5/z, [%1, %2.d, sxtw %p4]"
@@ -1782,7 +1794,7 @@ (define_insn "*aarch64_ldff1_gather<mode>_uxtw"
 	   (mem:BLK (scratch))
 	   (reg:VNx16BI FFRT_REGNUM)]
 	  UNSPEC_LDFF1_GATHER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "@
    ldff1d\t%0.d, %5/z, [%1, %2.d, uxtw]
    ldff1d\t%0.d, %5/z, [%1, %2.d, uxtw %p4]"
@@ -1817,7 +1829,7 @@ (define_insn_and_rewrite "@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx4_WIDE:mod
 		(reg:VNx16BI FFRT_REGNUM)]
 	       UNSPEC_LDFF1_GATHER))]
 	  UNSPEC_PRED_X))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "@
    ldff1<ANY_EXTEND:s><VNx4_NARROW:Vesize>\t%0.s, %5/z, [%2.s]
    ldff1<ANY_EXTEND:s><VNx4_NARROW:Vesize>\t%0.s, %5/z, [%2.s, #%1]
@@ -1848,7 +1860,7 @@ (define_insn_and_rewrite "@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mod
 		(reg:VNx16BI FFRT_REGNUM)]
 	       UNSPEC_LDFF1_GATHER))]
 	  UNSPEC_PRED_X))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "@
    ldff1<ANY_EXTEND:s><VNx2_NARROW:Vesize>\t%0.d, %5/z, [%2.d]
    ldff1<ANY_EXTEND:s><VNx2_NARROW:Vesize>\t%0.d, %5/z, [%2.d, #%1]
@@ -1881,7 +1893,7 @@ (define_insn_and_rewrite "*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mod
 		(reg:VNx16BI FFRT_REGNUM)]
 	       UNSPEC_LDFF1_GATHER))]
 	  UNSPEC_PRED_X))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "@
    ldff1<ANY_EXTEND:s><VNx2_NARROW:Vesize>\t%0.d, %5/z, [%1, %2.d, sxtw]
    ldff1<ANY_EXTEND:s><VNx2_NARROW:Vesize>\t%0.d, %5/z, [%1, %2.d, sxtw %p4]"
@@ -1910,7 +1922,7 @@ (define_insn_and_rewrite "*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mod
 		(reg:VNx16BI FFRT_REGNUM)]
 	       UNSPEC_LDFF1_GATHER))]
 	  UNSPEC_PRED_X))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "@
    ldff1<ANY_EXTEND:s><VNx2_NARROW:Vesize>\t%0.d, %5/z, [%1, %2.d, uxtw]
    ldff1<ANY_EXTEND:s><VNx2_NARROW:Vesize>\t%0.d, %5/z, [%1, %2.d, uxtw %p4]"
@@ -1985,7 +1997,7 @@ (define_insn "@aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx4SI_ONLY:mode>"
 	       UNSPEC_SVE_PREFETCH_GATHER)
 	     (match_operand:DI 7 "const_int_operand")
 	     (match_operand:DI 8 "const_int_operand"))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   {
     static const char *const insns[][2] = {
       "prf<SVE_FULL_I:Vesize>", "%0, [%2.s]",
@@ -2014,7 +2026,7 @@ (define_insn "@aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx2DI_ONLY:mode>"
 	       UNSPEC_SVE_PREFETCH_GATHER)
 	     (match_operand:DI 7 "const_int_operand")
 	     (match_operand:DI 8 "const_int_operand"))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   {
     static const char *const insns[][2] = {
       "prf<SVE_FULL_I:Vesize>", "%0, [%2.d]",
@@ -2045,7 +2057,7 @@ (define_insn_and_rewrite "*aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx2DI_O
 	       UNSPEC_SVE_PREFETCH_GATHER)
 	     (match_operand:DI 7 "const_int_operand")
 	     (match_operand:DI 8 "const_int_operand"))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   {
     static const char *const insns[][2] = {
       "prfb", "%0, [%1, %2.d, sxtw]",
@@ -2075,7 +2087,7 @@ (define_insn "*aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx2DI_ONLY:mode>_ux
 	       UNSPEC_SVE_PREFETCH_GATHER)
 	     (match_operand:DI 7 "const_int_operand")
 	     (match_operand:DI 8 "const_int_operand"))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   {
     static const char *const insns[][2] = {
       "prfb", "%0, [%1, %2.d, uxtw]",
@@ -2242,7 +2254,7 @@ (define_expand "scatter_store<mode><v_int_container>"
 	   (match_operand:DI 3 "aarch64_gather_scale_operand_<Vesize>")
 	   (match_operand:SVE_24 4 "register_operand")]
 	  UNSPEC_ST1_SCATTER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   {
     operands[5] = aarch64_ptrue_reg (<VPRED>mode);
   }
@@ -2260,7 +2272,7 @@ (define_insn "mask_scatter_store<mode><v_int_container>"
 	   (match_operand:DI 3 "aarch64_gather_scale_operand_<Vesize>" "Ui1, Ui1, Ui1, Ui1, i, i")
 	   (match_operand:SVE_4 4 "register_operand" "w, w, w, w, w, w")]
 	  UNSPEC_ST1_SCATTER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "@
    st1<Vesize>\t%4.s, %5, [%1.s]
    st1<Vesize>\t%4.s, %5, [%1.s, #%0]
@@ -2282,7 +2294,7 @@ (define_insn "mask_scatter_store<mode><v_int_container>"
 	   (match_operand:DI 3 "aarch64_gather_scale_operand_<Vesize>" "Ui1, Ui1, Ui1, i")
 	   (match_operand:SVE_2 4 "register_operand" "w, w, w, w")]
 	  UNSPEC_ST1_SCATTER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "@
    st1<Vesize>\t%4.d, %5, [%1.d]
    st1<Vesize>\t%4.d, %5, [%1.d, #%0]
@@ -2305,7 +2317,7 @@ (define_insn_and_rewrite "*mask_scatter_store<mode><v_int_container>_<su>xtw_unp
 	   (match_operand:DI 3 "aarch64_gather_scale_operand_<Vesize>" "Ui1, i")
 	   (match_operand:SVE_2 4 "register_operand" "w, w")]
 	  UNSPEC_ST1_SCATTER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "@
    st1<Vesize>\t%4.d, %5, [%0, %1.d, <su>xtw]
    st1<Vesize>\t%4.d, %5, [%0, %1.d, <su>xtw %p3]"
@@ -2332,7 +2344,7 @@ (define_insn_and_rewrite "*mask_scatter_store<mode><v_int_container>_sxtw"
 	   (match_operand:DI 3 "aarch64_gather_scale_operand_<Vesize>" "Ui1, i")
 	   (match_operand:SVE_2 4 "register_operand" "w, w")]
 	  UNSPEC_ST1_SCATTER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "@
    st1<Vesize>\t%4.d, %5, [%0, %1.d, sxtw]
    st1<Vesize>\t%4.d, %5, [%0, %1.d, sxtw %p3]"
@@ -2356,7 +2368,7 @@ (define_insn "*mask_scatter_store<mode><v_int_container>_uxtw"
 	   (match_operand:DI 3 "aarch64_gather_scale_operand_<Vesize>" "Ui1, i")
 	   (match_operand:SVE_2 4 "register_operand" "w, w")]
 	  UNSPEC_ST1_SCATTER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "@
    st1<Vesize>\t%4.d, %5, [%0, %1.d, uxtw]
    st1<Vesize>\t%4.d, %5, [%0, %1.d, uxtw %p3]"
@@ -2384,7 +2396,7 @@ (define_insn "@aarch64_scatter_store_trunc<VNx4_NARROW:mode><VNx4_WIDE:mode>"
 	   (truncate:VNx4_NARROW
 	     (match_operand:VNx4_WIDE 4 "register_operand" "w, w, w, w, w, w"))]
 	  UNSPEC_ST1_SCATTER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "@
    st1<VNx4_NARROW:Vesize>\t%4.s, %5, [%1.s]
    st1<VNx4_NARROW:Vesize>\t%4.s, %5, [%1.s, #%0]
@@ -2407,7 +2419,7 @@ (define_insn "@aarch64_scatter_store_trunc<VNx2_NARROW:mode><VNx2_WIDE:mode>"
 	   (truncate:VNx2_NARROW
 	     (match_operand:VNx2_WIDE 4 "register_operand" "w, w, w, w"))]
 	  UNSPEC_ST1_SCATTER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "@
    st1<VNx2_NARROW:Vesize>\t%4.d, %5, [%1.d]
    st1<VNx2_NARROW:Vesize>\t%4.d, %5, [%1.d, #%0]
@@ -2432,7 +2444,7 @@ (define_insn_and_rewrite "*aarch64_scatter_store_trunc<VNx2_NARROW:mode><VNx2_WI
 	   (truncate:VNx2_NARROW
 	     (match_operand:VNx2_WIDE 4 "register_operand" "w, w"))]
 	  UNSPEC_ST1_SCATTER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "@
    st1<VNx2_NARROW:Vesize>\t%4.d, %5, [%0, %1.d, sxtw]
    st1<VNx2_NARROW:Vesize>\t%4.d, %5, [%0, %1.d, sxtw %p3]"
@@ -2456,7 +2468,7 @@ (define_insn "*aarch64_scatter_store_trunc<VNx2_NARROW:mode><VNx2_WIDE:mode>_uxt
 	   (truncate:VNx2_NARROW
 	     (match_operand:VNx2_WIDE 4 "register_operand" "w, w"))]
 	  UNSPEC_ST1_SCATTER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "@
    st1<VNx2_NARROW:Vesize>\t%4.d, %5, [%0, %1.d, uxtw]
    st1<VNx2_NARROW:Vesize>\t%4.d, %5, [%0, %1.d, uxtw %p3]"
@@ -2602,7 +2614,7 @@ (define_insn "@aarch64_sve_ld1ro<mode>"
 	   (match_operand:OI 1 "aarch64_sve_ld1ro_operand_<Vesize>"
 			       "UO<Vesize>")]
 	  UNSPEC_LD1RO))]
-  "TARGET_SVE_F64MM"
+  "TARGET_SVE_F64MM && TARGET_NON_STREAMING"
   {
     operands[1] = gen_rtx_MEM (<VEL>mode, XEXP (operands[1], 0));
     return "ld1ro<Vesize>\t%0.<Vetype>, %2/z, %1";
@@ -3834,7 +3846,7 @@ (define_insn "@aarch64_adr<mode>"
 	  [(match_operand:SVE_FULL_SDI 1 "register_operand" "w")
 	   (match_operand:SVE_FULL_SDI 2 "register_operand" "w")]
 	  UNSPEC_ADR))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "adr\t%0.<Vetype>, [%1.<Vetype>, %2.<Vetype>]"
 )
 
@@ -3850,7 +3862,7 @@ (define_insn_and_rewrite "*aarch64_adr_sxtw"
 		  (match_operand:VNx2DI 2 "register_operand" "w")))]
 	     UNSPEC_PRED_X)]
 	  UNSPEC_ADR))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "adr\t%0.d, [%1.d, %2.d, sxtw]"
   "&& !CONSTANT_P (operands[3])"
   {
@@ -3867,7 +3879,7 @@ (define_insn "*aarch64_adr_uxtw_unspec"
 	     (match_operand:VNx2DI 2 "register_operand" "w")
 	     (match_operand:VNx2DI 3 "aarch64_sve_uxtw_immediate"))]
 	  UNSPEC_ADR))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "adr\t%0.d, [%1.d, %2.d, uxtw]"
 )
 
@@ -3879,7 +3891,7 @@ (define_insn "*aarch64_adr_uxtw_and"
 	    (match_operand:VNx2DI 2 "register_operand" "w")
 	    (match_operand:VNx2DI 3 "aarch64_sve_uxtw_immediate"))
 	  (match_operand:VNx2DI 1 "register_operand" "w")))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "adr\t%0.d, [%1.d, %2.d, uxtw]"
 )
 
@@ -3894,7 +3906,7 @@ (define_expand "@aarch64_adr<mode>_shift"
 	       (match_operand:SVE_FULL_SDI 3 "const_1_to_3_operand"))]
 	    UNSPEC_PRED_X)
 	  (match_operand:SVE_FULL_SDI 1 "register_operand")))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   {
     operands[4] = CONSTM1_RTX (<VPRED>mode);
   }
@@ -3910,7 +3922,7 @@ (define_insn_and_rewrite "*aarch64_adr<mode>_shift"
 	       (match_operand:SVE_24I 3 "const_1_to_3_operand"))]
 	    UNSPEC_PRED_X)
 	  (match_operand:SVE_24I 1 "register_operand" "w")))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "adr\t%0.<Vctype>, [%1.<Vctype>, %2.<Vctype>, lsl %3]"
   "&& !CONSTANT_P (operands[4])"
   {
@@ -3934,7 +3946,7 @@ (define_insn_and_rewrite "*aarch64_adr_shift_sxtw"
 	       (match_operand:VNx2DI 3 "const_1_to_3_operand"))]
 	    UNSPEC_PRED_X)
 	  (match_operand:VNx2DI 1 "register_operand" "w")))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "adr\t%0.d, [%1.d, %2.d, sxtw %3]"
   "&& (!CONSTANT_P (operands[4]) || !CONSTANT_P (operands[5]))"
   {
@@ -3955,7 +3967,7 @@ (define_insn_and_rewrite "*aarch64_adr_shift_uxtw"
 	       (match_operand:VNx2DI 3 "const_1_to_3_operand"))]
 	    UNSPEC_PRED_X)
 	  (match_operand:VNx2DI 1 "register_operand" "w")))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "adr\t%0.d, [%1.d, %2.d, uxtw %3]"
   "&& !CONSTANT_P (operands[5])"
   {
@@ -6967,7 +6979,7 @@ (define_insn "@aarch64_sve_add_<optab><vsi2qi>"
 	     (match_operand:<VSI2QI> 3 "register_operand" "w, w")]
 	    MATMUL)
 	  (match_operand:VNx4SI_ONLY 1 "register_operand" "0, w")))]
-  "TARGET_SVE_I8MM"
+  "TARGET_SVE_I8MM && TARGET_NON_STREAMING"
   "@
    <sur>mmla\\t%0.s, %2.b, %3.b
    movprfx\t%0, %1\;<sur>mmla\\t%0.s, %2.b, %3.b"
@@ -7538,7 +7550,7 @@ (define_insn "@aarch64_sve_<sve_fp_op><mode>"
 	   (match_operand:SVE_MATMULF 3 "register_operand" "w, w")
 	   (match_operand:SVE_MATMULF 1 "register_operand" "0, w")]
 	  FMMLA))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "@
    <sve_fp_op>\\t%0.<Vetype>, %2.<Vetype>, %3.<Vetype>
    movprfx\t%0, %1\;<sve_fp_op>\\t%0.<Vetype>, %2.<Vetype>, %3.<Vetype>"
@@ -8601,7 +8613,7 @@ (define_expand "fold_left_plus_<mode>"
 		       (match_operand:<VEL> 1 "register_operand")
 		       (match_operand:SVE_FULL_F 2 "register_operand")]
 		      UNSPEC_FADDA))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   {
     operands[3] = aarch64_ptrue_reg (<VPRED>mode);
   }
@@ -8614,7 +8626,7 @@ (define_insn "mask_fold_left_plus_<mode>"
 		       (match_operand:<VEL> 1 "register_operand" "0")
 		       (match_operand:SVE_FULL_F 2 "register_operand" "w")]
 		      UNSPEC_FADDA))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "fadda\t%<Vetype>0, %3, %<Vetype>0, %2.<Vetype>"
 )
 
@@ -8668,7 +8680,7 @@ (define_insn "@aarch64_sve_compact<mode>"
 	  [(match_operand:<VPRED> 1 "register_operand" "Upl")
 	   (match_operand:SVE_FULL_SD 2 "register_operand" "w")]
 	  UNSPEC_SVE_COMPACT))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "compact\t%0.<Vetype>, %1, %2.<Vetype>"
 )
 
diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
index 5df38e3f951..033520740cd 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -109,7 +109,7 @@ (define_insn "@aarch64_gather_ldnt<mode>"
 	   (match_operand:<V_INT_EQUIV> 3 "register_operand" "w, w")
 	   (mem:BLK (scratch))]
 	  UNSPEC_LDNT1_GATHER))]
-  "TARGET_SVE2"
+  "TARGET_SVE2 && TARGET_NON_STREAMING"
   "@
    ldnt1<Vesize>\t%0.<Vetype>, %1/z, [%3.<Vetype>]
    ldnt1<Vesize>\t%0.<Vetype>, %1/z, [%3.<Vetype>, %2]"
@@ -129,6 +129,7 @@ (define_insn_and_rewrite "@aarch64_gather_ldnt_<ANY_EXTEND:optab><SVE_FULL_SDI:m
 	       UNSPEC_LDNT1_GATHER))]
 	  UNSPEC_PRED_X))]
   "TARGET_SVE2
+   && TARGET_NON_STREAMING
    && (~<SVE_FULL_SDI:narrower_mask> & <SVE_PARTIAL_I:self_mask>) == 0"
   "@
    ldnt1<ANY_EXTEND:s><SVE_PARTIAL_I:Vesize>\t%0.<SVE_FULL_SDI:Vetype>, %1/z, [%3.<SVE_FULL_SDI:Vetype>]
@@ -2426,7 +2427,7 @@ (define_insn "@aarch64_sve2_histcnt<mode>"
 	   (match_operand:SVE_FULL_SDI 2 "register_operand" "w")
 	   (match_operand:SVE_FULL_SDI 3 "register_operand" "w")]
 	  UNSPEC_HISTCNT))]
-  "TARGET_SVE2"
+  "TARGET_SVE2 && TARGET_NON_STREAMING"
   "histcnt\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>"
 )
 
@@ -2436,7 +2437,7 @@ (define_insn "@aarch64_sve2_histseg<mode>"
 	  [(match_operand:VNx16QI_ONLY 1 "register_operand" "w")
 	   (match_operand:VNx16QI_ONLY 2 "register_operand" "w")]
 	  UNSPEC_HISTSEG))]
-  "TARGET_SVE2"
+  "TARGET_SVE2 && TARGET_NON_STREAMING"
   "histseg\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>"
 )
 
@@ -2460,7 +2461,7 @@ (define_insn "@aarch64_pred_<sve_int_op><mode>"
 	     SVE2_MATCH)]
 	  UNSPEC_PRED_Z))
    (clobber (reg:CC_NZC CC_REGNUM))]
-  "TARGET_SVE2"
+  "TARGET_SVE2 && TARGET_NON_STREAMING"
   "<sve_int_op>\t%0.<Vetype>, %1/z, %3.<Vetype>, %4.<Vetype>"
 )
 
@@ -2491,6 +2492,7 @@ (define_insn_and_rewrite "*aarch64_pred_<sve_int_op><mode>_cc"
 	     SVE2_MATCH)]
 	  UNSPEC_PRED_Z))]
   "TARGET_SVE2
+   && TARGET_NON_STREAMING
    && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
   "<sve_int_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>"
   "&& !rtx_equal_p (operands[4], operands[6])"
@@ -2518,6 +2520,7 @@ (define_insn_and_rewrite "*aarch64_pred_<sve_int_op><mode>_ptest"
 	  UNSPEC_PTEST))
    (clobber (match_scratch:<VPRED> 0 "=Upa"))]
   "TARGET_SVE2
+   && TARGET_NON_STREAMING
    && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
   "<sve_int_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>"
   "&& !rtx_equal_p (operands[4], operands[6])"
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 398cc03fd1f..8359cf709c1 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -252,6 +252,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
 #define AARCH64_ISA_MOPS	   (aarch64_isa_flags & AARCH64_FL_MOPS)
 #define AARCH64_ISA_LS64	   (aarch64_isa_flags & AARCH64_FL_LS64)
 
+/* The current function is a normal non-streaming function.  */
+#define TARGET_NON_STREAMING (AARCH64_ISA_SM_OFF)
+
 /* Crypto is an optional extension to AdvSIMD.  */
 #define TARGET_CRYPTO (AARCH64_ISA_CRYPTO)
 
@@ -290,16 +293,16 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
 #define TARGET_SVE2 (AARCH64_ISA_SVE2)
 
 /* SVE2 AES instructions, enabled through +sve2-aes.  */
-#define TARGET_SVE2_AES (AARCH64_ISA_SVE2_AES)
+#define TARGET_SVE2_AES (AARCH64_ISA_SVE2_AES && TARGET_NON_STREAMING)
 
 /* SVE2 BITPERM instructions, enabled through +sve2-bitperm.  */
-#define TARGET_SVE2_BITPERM (AARCH64_ISA_SVE2_BITPERM)
+#define TARGET_SVE2_BITPERM (AARCH64_ISA_SVE2_BITPERM && TARGET_NON_STREAMING)
 
 /* SVE2 SHA3 instructions, enabled through +sve2-sha3.  */
-#define TARGET_SVE2_SHA3 (AARCH64_ISA_SVE2_SHA3)
+#define TARGET_SVE2_SHA3 (AARCH64_ISA_SVE2_SHA3 && TARGET_NON_STREAMING)
 
 /* SVE2 SM4 instructions, enabled through +sve2-sm4.  */
-#define TARGET_SVE2_SM4 (AARCH64_ISA_SVE2_SM4)
+#define TARGET_SVE2_SM4 (AARCH64_ISA_SVE2_SM4 && TARGET_NON_STREAMING)
 
 /* ARMv8.3-A features.  */
 #define TARGET_ARMV8_3	(AARCH64_ISA_V8_3A)
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index a8ad4e5ff21..8d65fadbdf6 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -2709,7 +2709,7 @@ (define_int_iterator SVE_INT_UNARY [UNSPEC_RBIT UNSPEC_REVB
 
 (define_int_iterator SVE_FP_UNARY [UNSPEC_FRECPE UNSPEC_RSQRTE])
 
-(define_int_iterator SVE_FP_UNARY_INT [UNSPEC_FEXPA])
+(define_int_iterator SVE_FP_UNARY_INT [(UNSPEC_FEXPA "TARGET_NON_STREAMING")])
 
 (define_int_iterator SVE_INT_SHIFT_IMM [UNSPEC_ASRD
 					(UNSPEC_SQSHLU "TARGET_SVE2")
@@ -2723,7 +2723,7 @@ (define_int_iterator SVE_FP_BINARY_INT [UNSPEC_FTSMUL UNSPEC_FTSSEL])
 (define_int_iterator SVE_BFLOAT_TERNARY_LONG [UNSPEC_BFDOT
 					      UNSPEC_BFMLALB
 					      UNSPEC_BFMLALT
-					      UNSPEC_BFMMLA])
+					      (UNSPEC_BFMMLA "TARGET_NON_STREAMING")])
 
 (define_int_iterator SVE_BFLOAT_TERNARY_LONG_LANE [UNSPEC_BFDOT
 						   UNSPEC_BFMLALB
diff --git a/gcc/testsuite/g++.target/aarch64/sve/aarch64-ssve.exp b/gcc/testsuite/g++.target/aarch64/sve/aarch64-ssve.exp
new file mode 100644
index 00000000000..23f23f8ec42
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/sve/aarch64-ssve.exp
@@ -0,0 +1,309 @@
+#  Specific regression driver for AArch64 SME.
+#  Copyright (C) 2009-2022 Free Software Foundation, Inc.
+#  Contributed by ARM Ltd.
+#
+#  This file is part of GCC.
+#
+#  GCC is free software; you can redistribute it and/or modify it
+#  under the terms of the GNU General Public License as published by
+#  the Free Software Foundation; either version 3, or (at your option)
+#  any later version.
+#
+#  GCC is distributed in the hope that it will be useful, but
+#  WITHOUT ANY WARRANTY; without even the implied warranty of
+#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+#  General Public License for more details.
+#
+#  You should have received a copy of the GNU General Public License
+#  along with GCC; see the file COPYING3.  If not see
+#  <http://www.gnu.org/licenses/>.
+
+# Test whether certain SVE instructions are accepted or rejected in
+# SME streaming mode.
+
+# Exit immediately if this isn't an AArch64 target.
+if {![istarget aarch64*-*-*] } then {
+    return
+}
+
+load_lib gcc-defs.exp
+
+gcc_parallel_test_enable 0
+
+# Code shared by all tests.
+set preamble {
+#include <arm_sve.h>
+
+#pragma GCC target "+i8mm+f32mm+f64mm+sve2+sve2-bitperm+sve2-sm4+sve2-aes+sve2-sha3+sme"
+
+extern svbool_t &pred;
+
+extern svint8_t &s8;
+extern svint32_t &s32;
+
+extern svuint8_t &u8;
+extern svuint16_t &u16;
+extern svuint32_t &u32;
+extern svuint64_t &u64;
+
+extern svbfloat16_t &bf16;
+extern svfloat32_t &f32;
+
+extern void *void_ptr;
+
+extern int8_t *s8_ptr;
+extern int16_t *s16_ptr;
+extern int32_t *s32_ptr;
+
+extern uint8_t *u8_ptr;
+extern uint16_t *u16_ptr;
+extern uint32_t *u32_ptr;
+extern uint64_t *u64_ptr;
+
+extern uint64_t indx;
+}
+
+# Wrap a standalone call in a streaming-compatible function.
+set sc_harness {
+void __attribute__((arm_streaming_compatible))
+foo ()
+{
+  $CALL;
+}
+}
+
+# HARNESS is some source code that should be appended to the preamble
+# variable defined above.  It includes the string "$CALL", which should be
+# replaced by the function call in CALL.  The result after both steps is
+# a complete C++ translation unit.
+#
+# Try compiling the C++ code and see what output GCC produces.
+# The expected output is either:
+#
+# - empty, if SHOULD_PASS is true
+# - a message rejecting CALL in streaming mode, if SHOULD_PASS is false
+#
+# CALL is simple enough that it can be used in test names.
+proc check_ssve_call { harness name call should_pass } {
+    global preamble
+
+    set filename test-[pid]
+    set fd [open $filename.cc w]
+    puts $fd $preamble
+    puts -nonewline $fd [string map [list {$CALL} $call] $harness]
+    close $fd
+    remote_download host $filename.cc
+
+    set test "streaming SVE call $name"
+
+    set gcc_output [g++_target_compile $filename.cc $filename.s assembly ""]
+    remote_file build delete $filename.cc $filename.s
+
+    if { [string equal $gcc_output ""] } {
+	if { $should_pass } {
+	    pass $test
+	} else {
+	    fail $test
+	}
+	return
+    }
+
+    set lines [split $gcc_output "\n"]
+    set error_text "cannot be called when SME streaming mode is enabled"
+    if { [llength $lines] == 3
+	 && [string first "In function" [lindex $lines 0]] >= 0
+	 && [string first $error_text [lindex $lines 1]] >= 0
+	 && [string equal [lindex $lines 2] ""] } {
+	if { $should_pass } {
+	    fail $test
+	} else {
+	    pass $test
+	}
+	return
+    }
+
+    verbose -log "$test: unexpected output"
+    fail $test
+}
+
+# Apply check_ssve_call to each line in CALLS.  The other arguments are
+# as for check_ssve_call.
+proc check_ssve_calls { harness calls should_pass } {
+    foreach line [split $calls "\n"] {
+	set call [string trim $line]
+	if { [string equal $call ""] } {
+	    continue
+	}
+	check_ssve_call $harness "$call" $call $should_pass
+    }
+}
+
+# A small selection of things that are valid in streaming mode.
+set streaming_ok {
+    s8 = svadd_x (pred, s8, s8)
+    s8 = svld1 (pred, s8_ptr)
+}
+
+# This order follows the list in the SME manual.
+set nonstreaming_only {
+    u32 = svadrb_offset (u32, u32)
+    u64 = svadrb_offset (u64, u64)
+    u32 = svadrh_index (u32, u32)
+    u64 = svadrh_index (u64, u64)
+    u32 = svadrw_index (u32, u32)
+    u64 = svadrw_index (u64, u64)
+    u32 = svadrd_index (u32, u32)
+    u64 = svadrd_index (u64, u64)
+    u8 = svaesd (u8, u8)
+    u8 = svaese (u8, u8)
+    u8 = svaesimc (u8)
+    u8 = svaesmc (u8)
+    u8 = svbdep (u8, u8)
+    u8 = svbext (u8, u8)
+    f32 = svbfmmla (f32, bf16, bf16)
+    u8 = svbgrp (u8, u8)
+    u32 = svcompact (pred, u32)
+    f32 = svadda (pred, 1.0f, f32)
+    f32 = svexpa (u32)
+    f32 = svmmla (f32, f32, f32)
+    f32 = svtmad (f32, f32, 0)
+    f32 = svtsmul (f32, u32)
+    f32 = svtssel (f32, u32)
+    u32 = svhistcnt_z (pred, u32, u32)
+    u8 = svhistseg (u8, u8)
+    u32 = svld1ub_gather_offset_u32 (pred, u8_ptr, u32)
+    u32 = svld1ub_gather_offset_u32 (pred, u32, 1)
+    u64 = svld1_gather_index (pred, u64_ptr, u64)
+    u64 = svld1_gather_index_u64 (pred, u64, 1)
+    u32 = svld1uh_gather_index_u32 (pred, u16_ptr, u32)
+    u32 = svld1uh_gather_index_u32 (pred, u32, 1)
+    u8 = svld1ro (pred, u8_ptr + indx)
+    u8 = svld1ro (pred, u8_ptr + 1)
+    u16 = svld1ro (pred, u16_ptr + indx)
+    u16 = svld1ro (pred, u16_ptr + 1)
+    u32 = svld1ro (pred, u32_ptr + indx)
+    u32 = svld1ro (pred, u32_ptr + 1)
+    u64 = svld1ro (pred, u64_ptr + indx)
+    u64 = svld1ro (pred, u64_ptr + 1)
+    u32 = svld1sb_gather_offset_u32 (pred, s8_ptr, u32)
+    u32 = svld1sb_gather_offset_u32 (pred, u32, 1)
+    u32 = svld1sh_gather_index_u32 (pred, s16_ptr, u32)
+    u32 = svld1sh_gather_index_u32 (pred, u32, 1)
+    u64 = svld1sw_gather_index_u64 (pred, s32_ptr, u64)
+    u64 = svld1sw_gather_index_u64 (pred, u64, 1)
+    u64 = svld1uw_gather_index_u64 (pred, u32_ptr, u64)
+    u64 = svld1uw_gather_index_u64 (pred, u64, 1)
+    u32 = svld1_gather_index (pred, u32_ptr, u32)
+    u32 = svld1_gather_index_u32 (pred, u32, 1)
+    u8 = svldff1(pred, u8_ptr)
+    u16 = svldff1ub_u16(pred, u8_ptr)
+    u32 = svldff1ub_u32(pred, u8_ptr)
+    u64 = svldff1ub_u64(pred, u8_ptr)
+    u32 = svldff1ub_gather_offset_u32 (pred, u8_ptr, u32)
+    u32 = svldff1ub_gather_offset_u32 (pred, u32, 1)
+    u64 = svldff1(pred, u64_ptr)
+    u64 = svldff1_gather_index (pred, u64_ptr, u64)
+    u64 = svldff1_gather_index_u64 (pred, u64, 1)
+    u16 = svldff1(pred, u16_ptr)
+    u32 = svldff1uh_u32(pred, u16_ptr)
+    u64 = svldff1uh_u64(pred, u16_ptr)
+    u32 = svldff1uh_gather_offset_u32 (pred, u16_ptr, u32)
+    u32 = svldff1uh_gather_offset_u32 (pred, u32, 1)
+    u16 = svldff1sb_u16(pred, s8_ptr)
+    u32 = svldff1sb_u32(pred, s8_ptr)
+    u64 = svldff1sb_u64(pred, s8_ptr)
+    u32 = svldff1sb_gather_offset_u32 (pred, s8_ptr, u32)
+    u32 = svldff1sb_gather_offset_u32 (pred, u32, 1)
+    u32 = svldff1sh_u32(pred, s16_ptr)
+    u64 = svldff1sh_u64(pred, s16_ptr)
+    u32 = svldff1sh_gather_offset_u32 (pred, s16_ptr, u32)
+    u32 = svldff1sh_gather_offset_u32 (pred, u32, 1)
+    u64 = svldff1sw_u64(pred, s32_ptr)
+    u64 = svldff1sw_gather_offset_u64 (pred, s32_ptr, u64)
+    u64 = svldff1sw_gather_offset_u64 (pred, u64, 1)
+    u32 = svldff1(pred, u32_ptr)
+    u32 = svldff1_gather_index (pred, u32_ptr, u32)
+    u32 = svldff1_gather_index_u32 (pred, u32, 1)
+    u64 = svldff1uw_u64(pred, u32_ptr)
+    u64 = svldff1uw_gather_offset_u64 (pred, u32_ptr, u64)
+    u64 = svldff1uw_gather_offset_u64 (pred, u64, 1)
+    u8 = svldnf1(pred, u8_ptr)
+    u16 = svldnf1ub_u16(pred, u8_ptr)
+    u32 = svldnf1ub_u32(pred, u8_ptr)
+    u64 = svldnf1ub_u64(pred, u8_ptr)
+    u64 = svldnf1(pred, u64_ptr)
+    u16 = svldnf1(pred, u16_ptr)
+    u32 = svldnf1uh_u32(pred, u16_ptr)
+    u64 = svldnf1uh_u64(pred, u16_ptr)
+    u16 = svldnf1sb_u16(pred, s8_ptr)
+    u32 = svldnf1sb_u32(pred, s8_ptr)
+    u64 = svldnf1sb_u64(pred, s8_ptr)
+    u32 = svldnf1sh_u32(pred, s16_ptr)
+    u64 = svldnf1sh_u64(pred, s16_ptr)
+    u64 = svldnf1sw_u64(pred, s32_ptr)
+    u32 = svldnf1(pred, u32_ptr)
+    u64 = svldnf1uw_u64(pred, u32_ptr)
+    u32 = svldnt1ub_gather_offset_u32 (pred, u8_ptr, u32)
+    u32 = svldnt1ub_gather_offset_u32 (pred, u32, 1)
+    u64 = svldnt1_gather_index (pred, u64_ptr, u64)
+    u64 = svldnt1_gather_index_u64 (pred, u64, 1)
+    u32 = svldnt1uh_gather_offset_u32 (pred, u16_ptr, u32)
+    u32 = svldnt1uh_gather_offset_u32 (pred, u32, 1)
+    u32 = svldnt1sb_gather_offset_u32 (pred, s8_ptr, u32)
+    u32 = svldnt1sb_gather_offset_u32 (pred, u32, 1)
+    u32 = svldnt1sh_gather_offset_u32 (pred, s16_ptr, u32)
+    u32 = svldnt1sh_gather_offset_u32 (pred, u32, 1)
+    u64 = svldnt1sw_gather_offset_u64 (pred, s32_ptr, u64)
+    u64 = svldnt1sw_gather_offset_u64 (pred, u64, 1)
+    u64 = svldnt1uw_gather_offset_u64 (pred, u32_ptr, u64)
+    u64 = svldnt1uw_gather_offset_u64 (pred, u64, 1)
+    u32 = svldnt1_gather_offset (pred, u32_ptr, u32)
+    u32 = svldnt1_gather_offset_u32 (pred, u32, 1)
+    pred = svmatch (pred, u8, u8)
+    pred = svnmatch (pred, u8, u8)
+    u64 = svpmullb_pair (u64, u64)
+    u64 = svpmullt_pair (u64, u64)
+    svprfb_gather_offset (pred, void_ptr, u64, SV_PLDL1KEEP)
+    svprfb_gather_offset (pred, u64, 1, SV_PLDL1KEEP)
+    svprfd_gather_index (pred, void_ptr, u64, SV_PLDL1KEEP)
+    svprfd_gather_index (pred, u64, 1, SV_PLDL1KEEP)
+    svprfh_gather_index (pred, void_ptr, u64, SV_PLDL1KEEP)
+    svprfh_gather_index (pred, u64, 1, SV_PLDL1KEEP)
+    svprfw_gather_index (pred, void_ptr, u64, SV_PLDL1KEEP)
+    svprfw_gather_index (pred, u64, 1, SV_PLDL1KEEP)
+    u64 = svrax1 (u64, u64)
+    pred = svrdffr ()
+    pred = svrdffr_z (pred)
+    svsetffr ()
+    u32 = svsm4e (u32, u32)
+    u32 = svsm4ekey (u32, u32)
+    s32 = svmmla (s32, s8, s8)
+    svst1b_scatter_offset (pred, u8_ptr, u32, u32)
+    svst1b_scatter_offset (pred, u32, 1, u32)
+    svst1_scatter_index (pred, u64_ptr, u64, u64)
+    svst1_scatter_index (pred, u64, 1, u64)
+    svst1h_scatter_index (pred, u16_ptr, u32, u32)
+    svst1h_scatter_index (pred, u32, 1, u32)
+    svst1w_scatter_index (pred, u32_ptr, u64, u64)
+    svst1w_scatter_index (pred, u64, 1, u64)
+    svst1_scatter_index (pred, u32_ptr, u32, u32)
+    svst1_scatter_index (pred, u32, 1, u32)
+    svstnt1b_scatter_offset (pred, u8_ptr, u32, u32)
+    svstnt1b_scatter_offset (pred, u32, 1, u32)
+    svstnt1_scatter_offset (pred, u64_ptr, u64, u64)
+    svstnt1_scatter_offset (pred, u64, 1, u64)
+    svstnt1h_scatter_offset (pred, u16_ptr, u32, u32)
+    svstnt1h_scatter_offset (pred, u32, 1, u32)
+    svstnt1w_scatter_offset (pred, u32_ptr, u64, u64)
+    svstnt1w_scatter_offset (pred, u64, 1, u64)
+    svstnt1_scatter_offset (pred, u32_ptr, u32, u32)
+    svstnt1_scatter_offset (pred, u32, 1, u32)
+    u32 = svmmla (u32, u8, u8)
+    s32 = svusmmla (s32, u8, s8)
+    svwrffr (pred)
+}
+
+check_ssve_calls $sc_harness $streaming_ok 1
+check_ssve_calls $sc_harness $nonstreaming_only 0
+
+gcc_parallel_test_enable 1
diff --git a/gcc/testsuite/g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp b/gcc/testsuite/g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp
index 38140413a97..1f49c98f077 100644
--- a/gcc/testsuite/g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp
+++ b/gcc/testsuite/g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp
@@ -50,6 +50,7 @@ if { [info exists gcc_runtest_parallelize_limit_minor] } {
 torture-init
 set-torture-options {
     "-std=c++98 -O0 -g"
+    "-std=c++98 -O0 -DSTREAMING_COMPATIBLE"
     "-std=c++98 -O1 -g"
     "-std=c++11 -O2 -g"
     "-std=c++14 -O3 -g"
diff --git a/gcc/testsuite/g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp b/gcc/testsuite/g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp
index 78e8ecae729..8d562171a01 100644
--- a/gcc/testsuite/g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp
+++ b/gcc/testsuite/g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp
@@ -53,6 +53,7 @@ if { [info exists gcc_runtest_parallelize_limit_minor] } {
 torture-init
 set-torture-options {
     "-std=c++98 -O0 -g"
+    "-std=c++98 -O0 -DSTREAMING_COMPATIBLE"
     "-std=c++98 -O1 -g"
     "-std=c++11 -O2 -g"
     "-std=c++14 -O3 -g"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp b/gcc/testsuite/gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp
index a271f1793f4..8cb2b9bb4fc 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp
@@ -50,6 +50,7 @@ if { [info exists gcc_runtest_parallelize_limit_minor] } {
 torture-init
 set-torture-options {
     "-std=c90 -O0 -g"
+    "-std=c90 -O0 -DSTREAMING_COMPATIBLE"
     "-std=c90 -O1 -g"
     "-std=c99 -O2 -g"
     "-std=c11 -O3 -g"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adda_f16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adda_f16.c
index 6c6bfa1c294..4d6ec2d65f7 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adda_f16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adda_f16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adda_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adda_f32.c
index 8b2a1dd1c68..04afbcee6c0 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adda_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adda_f32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adda_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adda_f64.c
index 90a56420a6a..8b4c7d1ff7f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adda_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adda_f64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrb.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrb.c
index a61eec9712e..5dcdc54b007 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrb.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrb.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrd.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrd.c
index 970485bd67d..d9d16ce3f7d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrd.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrd.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrh.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrh.c
index d06f51fe35b..a358c240389 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrh.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrh.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrw.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrw.c
index b23f25a1125..bd1e9af0a6d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrw.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrw.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bfmmla_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bfmmla_f32.c
index b1d98fbf536..4bb2912a45a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bfmmla_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bfmmla_f32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-additional-options "-march=armv8.2-a+sve+bf16" } */
 /* { dg-require-effective-target aarch64_asm_bf16_ok }  */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_f32.c
index 2e80d6830ca..d261ec00b92 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_f32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_f64.c
index e0bc33efec2..024b0510faa 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_f64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_s32.c
index e4634982bf6..0b32dfb609c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_s64.c
index 71cb97b8a2a..38688dbca73 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_u32.c
index 954329a0b2f..a3e89cc97a1 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_u64.c
index ec664845f4a..602ab048c99 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/expa_f16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/expa_f16.c
index 5a5411e46cb..87c26e6ea6b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/expa_f16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/expa_f16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/expa_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/expa_f32.c
index 4ded1c5756e..5e9839537c7 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/expa_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/expa_f32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/expa_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/expa_f64.c
index c31f9ccb5b2..b117df2a4b1 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/expa_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/expa_f64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_f32.c
index 00b68ff290c..8b972f61b49 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_f32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_f64.c
index 47127960c0d..413d4d62d4e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_f64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_s32.c
index 9b6335547f5..b3df7d154cf 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_s64.c
index c9cea3ad8c7..0da1e52966b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_u32.c
index 2cccc8d4906..a3304c4197a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_u64.c
index 6ee1d48ab0c..73ef94805dc 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_bf16.c
index cb1801778d4..fe909b666c9 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_bf16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_bf16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 /* { dg-additional-options "-march=armv8.6-a+f64mm" } */
 /* { dg-require-effective-target aarch64_asm_f64mm_ok }  */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_f16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_f16.c
index 86081edbd65..30ba3063900 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_f16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_f16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 /* { dg-additional-options "-march=armv8.6-a+f64mm" } */
 /* { dg-require-effective-target aarch64_asm_f64mm_ok }  */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_f32.c
index c8df00f8a02..cf62fada91a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_f32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 /* { dg-additional-options "-march=armv8.6-a+f64mm" } */
 /* { dg-require-effective-target aarch64_asm_f64mm_ok }  */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_f64.c
index 2fb9d5b7486..b9fde4dac69 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_f64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 /* { dg-additional-options "-march=armv8.6-a+f64mm" } */
 /* { dg-require-effective-target aarch64_asm_f64mm_ok }  */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s16.c
index 3cd211b1646..35b7dd1d27e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 /* { dg-additional-options "-march=armv8.6-a+f64mm" } */
 /* { dg-require-effective-target aarch64_asm_f64mm_ok }  */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s32.c
index 44b16ed5f72..57b6a6567c0 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 /* { dg-additional-options "-march=armv8.6-a+f64mm" } */
 /* { dg-require-effective-target aarch64_asm_f64mm_ok }  */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s64.c
index 3aa9a15eeee..bd7e28478e2 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 /* { dg-additional-options "-march=armv8.6-a+f64mm" } */
 /* { dg-require-effective-target aarch64_asm_f64mm_ok }  */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s8.c
index 49aff5146f2..1438000038e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 /* { dg-additional-options "-march=armv8.6-a+f64mm" } */
 /* { dg-require-effective-target aarch64_asm_f64mm_ok }  */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u16.c
index 00bf9e129f5..145b0b7f3aa 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 /* { dg-additional-options "-march=armv8.6-a+f64mm" } */
 /* { dg-require-effective-target aarch64_asm_f64mm_ok }  */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u32.c
index 9e9b3290a12..9f150631b94 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 /* { dg-additional-options "-march=armv8.6-a+f64mm" } */
 /* { dg-require-effective-target aarch64_asm_f64mm_ok }  */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u64.c
index 64ec628714b..8dd75d13607 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 /* { dg-additional-options "-march=armv8.6-a+f64mm" } */
 /* { dg-require-effective-target aarch64_asm_f64mm_ok }  */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u8.c
index 22701320bf7..f154545868b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 /* { dg-additional-options "-march=armv8.6-a+f64mm" } */
 /* { dg-require-effective-target aarch64_asm_f64mm_ok }  */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_s32.c
index 16a5316a9e4..06249ad4c5c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_s64.c
index 3f953247ea1..8d141e133e6 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_u32.c
index 424de65a6fe..77836cbf652 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_u64.c
index aa375bea2e3..f4b24ab419a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_s32.c
index ed07b4dfcfa..1b978236845 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_s64.c
index 20ca4272059..2009dec812e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_u32.c
index e3a85a23fb6..0e1d4896665 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_u64.c
index 3a0094fba59..115d7d3a996 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sw_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sw_gather_s64.c
index 4d076b4861a..5dc44421ca4 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sw_gather_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sw_gather_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sw_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sw_gather_u64.c
index ffa85eb3e73..fac4ec41c00 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sw_gather_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sw_gather_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_s32.c
index a9c4182659e..f57df42266d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_s64.c
index 99af86ddf82..0c069fa4f44 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_u32.c
index 77c7e0a2dff..98102e01393 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_u64.c
index b605f8b67e3..f86a34d1248 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_s32.c
index 84fb5c335d7..13937187895 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_s64.c
index 44700179322..f0338aae6b4 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_u32.c
index 09d3cc8c298..5810bc0accb 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_u64.c
index f3dcf03cd81..52e95abb9b4 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uw_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uw_gather_s64.c
index f4e9d5db970..0889eefdddd 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uw_gather_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uw_gather_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uw_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uw_gather_u64.c
index 854d19233f5..fb144d756ab 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uw_gather_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uw_gather_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_bf16.c
index 80f6468700e..1f997480ea8 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_bf16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_bf16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_f16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_f16.c
index 13ce863c96a..60405d0a0ed 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_f16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_f16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_f32.c
index 2fcc633906c..225e9969dd2 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_f32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_f64.c
index cc15b927aba..366e36afdbe 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_f64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_f32.c
index 7e330c04221..b84b9bcdda7 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_f32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_f64.c
index d0e47f0bf19..e779b071283 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_f64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_s32.c
index 66bf0f74630..17e0f9aa2d8 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_s64.c
index faf71bf9dd5..030f187b152 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_u32.c
index 41c7dc9cf31..fb86530166f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_u64.c
index 8b53ce94f85..5be30a2d842 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s16.c
index 1d5fde0e639..61d242c074b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s32.c
index 97a36e88499..afe748ef939 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s64.c
index c018a4c1ca6..bee22285539 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s8.c
index cf620d1f4b0..ccaac2ca4eb 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u16.c
index 1fa819296cb..c8416f99df9 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u32.c
index 5224ec40ac8..ec26a82ca19 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u64.c
index 18e87f2b805..e211f179486 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u8.c
index 83883fca43a..24dfe452f03 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_s32.c
index c2a676807a5..f7e3977bfcf 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_s64.c
index 2f2a04d24bb..7f2a829a8e4 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_u32.c
index e3e83a205cb..685f628088d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_u64.c
index 769f2c266e9..49a7a85367f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_s16.c
index e0a748c6a6b..1d30c7ba618 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_s16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_s32.c
index 86716da9ba1..c2b3f42cb5b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_s64.c
index e7a4aa6e93d..585a6241e0b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_u16.c
index 69ba96d52e2..ebb2f0f66f0 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_u16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_u32.c
index e1a1873f0a4..f4ea96cf91c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_u64.c
index 0a49cbcc07f..e3735239c4e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_s32.c
index b633335dc71..67e70361b5c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_s64.c
index 32a4309b633..5755c79bc1a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_u32.c
index 73a9be8923b..a5848999573 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_u64.c
index 94ea73b6306..b1875120980 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_s32.c
index 81b64e836b8..bffac936527 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_s64.c
index 453b3ff244a..a4acb1e5ea9 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_u32.c
index bbbed79dc35..828288cd825 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_u64.c
index 5430e256b46..e3432c46c27 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_gather_s64.c
index e5da8a83dc3..78aa34ec055 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_gather_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_gather_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_gather_u64.c
index 41142875673..9dad1212c81 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_gather_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_gather_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_s64.c
index d795ace6391..33b6c10ddc5 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_u64.c
index 6caf2f5045d..e8c9c845f95 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_s32.c
index af0be08d21c..b1c9c81357f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_s64.c
index 43124dd8930..9ab776a218f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_u32.c
index 90c4e58a275..745740dfa3f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_u64.c
index 302623a400b..3a7bd6a436b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_s16.c
index 88ad2d1dc61..ade0704f7ad 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_s16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_s32.c
index e8e06411f98..5d3e0ce95e5 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_s64.c
index 21d02ddb721..08ae802ee26 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_u16.c
index 904cb027e3e..d8dc5e15738 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_u16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_u32.c
index a400123188b..042ae5a9f02 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_u64.c
index a9a98a68362..d0844fa5197 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_s32.c
index d02e443428a..12460105d0e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_s64.c
index 663a73d2715..536331371b0 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_u32.c
index 5e0ef067f54..602e6a686e6 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_u64.c
index 1cfae1b9532..4b307b3416e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_s32.c
index abb3d769a74..db205b1ef7b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_s64.c
index 6e330e8e8a8..0eac877eb82 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_u32.c
index 4eb5323e957..266ecf167fe 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_u64.c
index ebac26e7d37..bdd725e4a35 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_gather_s64.c
index 6c0daea52b5..ab2c79da782 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_gather_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_gather_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_gather_u64.c
index 0e400c6790f..361d7de05d8 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_gather_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_gather_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_s64.c
index ac97798991c..8adcec3d512 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_u64.c
index c7ab0617106..781fc1a9c66 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_bf16.c
index 947a896e778..93b4425ecb5 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_bf16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_bf16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_f16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_f16.c
index cf017868839..d47d748c76c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_f16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_f16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_f32.c
index 83b73ec8e09..e390d685797 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_f32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_f64.c
index 778096e826b..97a0e39e7c8 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_f64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s16.c
index 592c8237de3..21008d7f9ca 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s32.c
index 634092af8ea..8a3d795b309 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s64.c
index 4a03f66767a..c0b57a2f3fc 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s8.c
index 162ee176ad5..6714152d93c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u16.c
index e920ac43b45..3df404d77bb 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u32.c
index 65e28c5c206..e899a4a6ff4 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u64.c
index 70d3f27d87a..ab69656cfa8 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u8.c
index 5c29f1d196a..5d7b074973e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_s16.c
index e04b9a7887f..5b53c885d6a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_s16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_s32.c
index 0553fc98da4..992eba7cc2f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_s64.c
index 61a474fdf52..99e0f8bd091 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_u16.c
index be63d8bf9b2..fe23913f23c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_u16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_u32.c
index 4f52490b4a8..6deb39770a1 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_u64.c
index 73f50d182a5..e76457da6cd 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_s32.c
index 08c7dc6dd4d..e49a7f8ed49 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_s64.c
index 6a41bc26b7f..00b40281c24 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_u32.c
index 2f7718730f1..41560af330f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_u64.c
index d7f1a68a4cd..0acf4b34916 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sw_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sw_s64.c
index 5b483e4aa1d..5782128982c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sw_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sw_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sw_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sw_u64.c
index 62121ce0a44..8249c4c3f79 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sw_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sw_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_s16.c
index 8fe13411f31..e59c451f790 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_s16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_s32.c
index 50122e3b786..d788576e275 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_s64.c
index d7cce11b60c..b21fdb96491 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_u16.c
index 7bf82c3b6c0..1ae41b002ff 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_u16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_u32.c
index e2fef064b47..e3d8fb3b5f0 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_u64.c
index 57c61e122ac..df9a0c07fa7 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_s32.c
index ed9686c4ed5..c3467d84675 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_s64.c
index a3107f562b8..bf3355e9986 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_u32.c
index 93d5abaf76e..bcc3eb3fd8f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_u64.c
index 32d36a84ce3..4c01c13ac3f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uw_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uw_s64.c
index 373922791d0..3c655659115 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uw_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uw_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uw_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uw_u64.c
index b3c3be1d01f..b222a0dc648 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uw_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uw_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_f32.c
index f66dbf397c4..e1c7f47dc96 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_f32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-require-effective-target aarch64_asm_f32mm_ok } */
 /* { dg-additional-options "-march=armv8.2-a+f32mm" } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_f64.c
index 49dc0607cff..c45caa70001 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_f64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-require-effective-target aarch64_asm_f64mm_ok } */
 /* { dg-additional-options "-march=armv8.2-a+f64mm" } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_s32.c
index e7ce009acfc..dc155461c61 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-require-effective-target aarch64_asm_i8mm_ok } */
 /* { dg-additional-options "-march=armv8.2-a+sve+i8mm" } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_u32.c
index 81f5166fbf9..43d601a471d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-require-effective-target aarch64_asm_i8mm_ok } */
 /* { dg-additional-options "-march=armv8.2-a+sve+i8mm" } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfb_gather.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfb_gather.c
index c4bfbbbf7d7..f32cfbfcb19 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfb_gather.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfb_gather.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfd_gather.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfd_gather.c
index a84acb1a106..8a4293b6253 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfd_gather.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfd_gather.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfh_gather.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfh_gather.c
index 04b7a15758c..6beca4b8e0f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfh_gather.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfh_gather.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfw_gather.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfw_gather.c
index 2bbae1b9e02..6af44ac8290 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfw_gather.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfw_gather.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/rdffr_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/rdffr_1.c
index 5564e967fcf..7e28ef6412f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/rdffr_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/rdffr_1.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_f32.c
index cb6774ad04f..1efd4344532 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_f32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_f64.c
index fe978bbe5f1..f50c43e8309 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_f64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_s32.c
index d244e701a81..bb6fb10b83f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_s64.c
index 5c4ebf440bc..19ec78e9e6e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_u32.c
index fe3f7259f24..57fbb91b0ef 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_u64.c
index 23212356625..60018be5b80 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_s32.c
index d59033356be..fb1bb29dbe2 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_s64.c
index c7a35f1b470..65ee9a071fd 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_u32.c
index e098cb9b77e..ceec6193952 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_u64.c
index 058d1313fc2..aeedbc6d7a7 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_s32.c
index 2a23d41f3a1..2d69d085bc0 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_s64.c
index 6a1adb05609..3e5733ef9bb 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_u32.c
index 12197315d09..5cd330a3dec 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_u64.c
index 7021ea68f49..0ee9948cb4e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1w_scatter_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1w_scatter_s64.c
index 2363f592b19..f18bedce1ca 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1w_scatter_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1w_scatter_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1w_scatter_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1w_scatter_u64.c
index 767c009b4f7..6850865ec9a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1w_scatter_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1w_scatter_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/test_sve_acle.h b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/test_sve_acle.h
index fbf392b3ed4..5ee272e270c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/test_sve_acle.h
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/test_sve_acle.h
@@ -11,10 +11,17 @@
 #error "Please define -DTEST_OVERLOADS or -DTEST_FULL"
 #endif
 
+#ifdef STREAMING_COMPATIBLE
+#define ATTR __attribute__ ((arm_streaming_compatible))
+#else
+#define ATTR
+#endif
+
 #ifdef __cplusplus
-#define PROTO(NAME, RET, ARGS) extern "C" RET NAME ARGS; RET NAME ARGS
+#define PROTO(NAME, RET, ARGS) \
+  extern "C" RET ATTR NAME ARGS; RET ATTR NAME ARGS
 #else
-#define PROTO(NAME, RET, ARGS) RET NAME ARGS
+#define PROTO(NAME, RET, ARGS) RET ATTR NAME ARGS
 #endif
 
 #define TEST_UNIFORM_Z(NAME, TYPE, CODE1, CODE2)		\
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tmad_f16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tmad_f16.c
index 3a00716e37f..c0b03a0d331 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tmad_f16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tmad_f16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tmad_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tmad_f32.c
index b73d420fbac..8eef8a12ca8 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tmad_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tmad_f32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tmad_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tmad_f64.c
index fc31928a6c3..5c96c55796c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tmad_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tmad_f64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tsmul_f16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tsmul_f16.c
index 94bc696eb07..9deed667f89 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tsmul_f16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tsmul_f16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tsmul_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tsmul_f32.c
index d0ec91882d2..749ea8664be 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tsmul_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tsmul_f32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tsmul_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tsmul_f64.c
index 23e0da3f7a0..053abcb26e9 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tsmul_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tsmul_f64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tssel_f16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tssel_f16.c
index e7c3ea03b81..3ab251fe04a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tssel_f16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tssel_f16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tssel_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tssel_f32.c
index 022573a191d..6c6471c5e56 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tssel_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tssel_f32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tssel_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tssel_f64.c
index ffcdf4224b3..9559e0f352d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tssel_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tssel_f64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/usmmla_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/usmmla_s32.c
index 9440f3fd919..a0dd7e334aa 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/usmmla_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/usmmla_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-require-effective-target aarch64_asm_i8mm_ok } */
 /* { dg-additional-options "-march=armv8.2-a+sve+i8mm" } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp b/gcc/testsuite/gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp
index e08cd612190..41fd283fdbc 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp
@@ -39,7 +39,7 @@ if { [check_effective_target_aarch64_sve2] } {
 
 # Turn off any codegen tweaks by default that may affect expected assembly.
 # Tests relying on those should turn them on explicitly.
-set sve_flags "$sve_flags -mtune=generic -moverride=tune=none"
+set sve2_flags "$sve2_flags -mtune=generic -moverride=tune=none"
 
 lappend extra_flags "-fno-ipa-icf"
 
@@ -52,6 +52,7 @@ if { [info exists gcc_runtest_parallelize_limit_minor] } {
 torture-init
 set-torture-options {
     "-std=c90 -O0 -g"
+    "-std=c90 -O0 -DSTREAMING_COMPATIBLE"
     "-std=c90 -O1 -g"
     "-std=c99 -O2 -g"
     "-std=c11 -O3 -g"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aesd_u8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aesd_u8.c
index 622f5cf4609..484f7251f75 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aesd_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aesd_u8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aese_u8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aese_u8.c
index 6555bbb1de7..6869bbd0527 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aese_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aese_u8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aesimc_u8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aesimc_u8.c
index 4630595ff20..534ffe06f35 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aesimc_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aesimc_u8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aesmc_u8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aesmc_u8.c
index 6e8acf48f2a..1660a8eaf01 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aesmc_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aesmc_u8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u16.c
index 14230850f70..c1a4e10614f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u32.c
index 7f08df4baa2..4f14cc4c432 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u64.c
index 7f7cbbeebad..091253ec60b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u8.c
index b420323b906..deb1ad27d90 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u16.c
index 50a647918e5..9efa501efa8 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u32.c
index 9f98b843c1a..18963da5bd3 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u64.c
index 9dbaec1b762..91591f93b88 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u8.c
index 81ed5a463a0..1211587ef41 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u16.c
index 70aeae3f329..72868bea7f6 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u32.c
index 6e19e38d897..c8923816fe4 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u64.c
index 27fa40f4777..86989529faf 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u8.c
index b667e03e3a4..5cd941a7a6e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_s32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_s32.c
index 7bf783a7c18..53d6c5c5636 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_s64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_s64.c
index 001f5f0f187..c6d9862e31f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_u32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_u32.c
index d93091adc55..cb11a00261b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_u64.c
index 3b889802395..0bb06cdb45d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histseg_s8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histseg_s8.c
index 380ccdf85a5..ce3458e5ef6 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histseg_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histseg_s8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histseg_u8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histseg_u8.c
index f43292f0ccd..7b1eff811c5 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histseg_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histseg_u8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_f32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_f32.c
index 102810e25c8..17e3673a4a7 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_f32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_f64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_f64.c
index a0ed71227e8..8ce32e9f9ff 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_f64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_s32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_s32.c
index 94c64971c77..b7e1d7a99c8 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_s64.c
index a0aa6703f9c..b0789ad21ce 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_u32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_u32.c
index e1479684e82..df09eaa7680 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_u64.c
index 77cdcfebafe..5f185ea824b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_s32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_s32.c
index bb729483fcd..71fece575d9 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_s64.c
index de5b693140c..1183e72f0fb 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_u32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_u32.c
index d01ec18e442..4d5e6e7716f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_u64.c
index b96e94353f1..ed329a23f19 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s32.c
index 1dcfbc0fb95..6dbd6cea0f6 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s64.c
index 4166ed0a6c8..4ea3335a29f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u32.c
index 7680344da28..d5545151994 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u64.c
index 2427c83ab67..18c8ca44e7b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sw_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sw_gather_s64.c
index 2f538e847c2..41bff31d021 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sw_gather_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sw_gather_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sw_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sw_gather_u64.c
index ace1c2f2fe5..30b8f6948f7 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sw_gather_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sw_gather_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_s32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_s32.c
index d3b29eb193d..8750d11af0f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_s64.c
index 3bc406620d7..f7981991a6a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_u32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_u32.c
index 0af4b40b851..4d5ee4ef4ef 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_u64.c
index fe28d78ed46..005c29c0644 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s32.c
index 985432615ca..92613b16685 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s64.c
index 3c5baeee60e..be2e6d126e8 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u32.c
index 4d945e9f994..4d122059f72 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u64.c
index 680238ac4f7..e3bc1044cd7 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uw_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uw_gather_s64.c
index 787ae9defb2..9efa4b2cbf0 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uw_gather_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uw_gather_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uw_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uw_gather_u64.c
index 4810bc3c45c..4ded4454df1 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uw_gather_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uw_gather_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_s16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_s16.c
index baebc7693c6..d0ce8129475 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_s16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_s8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_s8.c
index f35a753791d..03473906aa2 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_s8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_u16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_u16.c
index 0bdf4462f3d..2a8b4d250ab 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_u16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_u8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_u8.c
index 6d78692bdb4..8409276d905 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_u8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_s16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_s16.c
index 935b19a1040..044ba1de397 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_s16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_s8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_s8.c
index 8a00b30f308..6c2d890fa41 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_s8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_u16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_u16.c
index 868c20a11e5..863e31054e2 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_u16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_u8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_u8.c
index af6b5816513..a62783db763 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_u8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/pmullb_pair_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/pmullb_pair_u64.c
index 944609214a1..1fd85e0ce80 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/pmullb_pair_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/pmullb_pair_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/pmullt_pair_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/pmullt_pair_u64.c
index 90e2e991f9b..300d885abb0 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/pmullt_pair_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/pmullt_pair_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/rax1_s64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/rax1_s64.c
index ea80d40dbdf..9dbc7183992 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/rax1_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/rax1_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/rax1_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/rax1_u64.c
index b237c7edd5a..5caa2a5443b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/rax1_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/rax1_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sm4e_u32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sm4e_u32.c
index 0ff5746d814..14194eef6c4 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sm4e_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sm4e_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sm4ekey_u32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sm4ekey_u32.c
index 58ad33c5ddb..e72384108e6 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sm4ekey_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sm4ekey_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_f32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_f32.c
index 3f928e20eac..75539f6928f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_f32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_f64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_f64.c
index 8a35c76b90a..c0d47d0c13f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_f64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_s32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_s32.c
index bd600268228..80fb3e8695b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_s64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_s64.c
index 0bfa2616ef5..edd2bc41832 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_u32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_u32.c
index fbfa008c1d5..a6e5059def9 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_u64.c
index c283135c4ec..067e5b109c3 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_s32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_s32.c
index bf6ba597362..498fe82e5c2 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_s64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_s64.c
index a24d0c89c76..614f5fb1a49 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_u32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_u32.c
index 2b05a7720bd..ce2c482afbd 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_u64.c
index a13c5f5bb9d..593dc193975 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s32.c
index 4e012f61f34..b9d06c1c5ab 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s64.c
index e934a708d89..006e0e24dec 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u32.c
index db21821eb58..8cd7cb86ab3 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u64.c
index 53f930da1fc..972ee36896b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1w_scatter_s64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1w_scatter_s64.c
index ec6c837d907..368a17c4769 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1w_scatter_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1w_scatter_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1w_scatter_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1w_scatter_u64.c
index 3c5d96de4f8..57d60a350de 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1w_scatter_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1w_scatter_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
-- 
2.25.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 05/16] aarch64: Switch PSTATE.SM around calls
  2022-11-13  9:59 [PATCH 00/16] aarch64: Add support for SME Richard Sandiford
                   ` (3 preceding siblings ...)
  2022-11-13 10:00 ` [PATCH 04/16] aarch64: Mark relevant SVE instructions as non-streaming Richard Sandiford
@ 2022-11-13 10:00 ` Richard Sandiford
  2022-11-13 10:01 ` [PATCH 06/16] aarch64: Add support for SME ZA attributes Richard Sandiford
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Richard Sandiford @ 2022-11-13 10:00 UTC (permalink / raw)
  To: gcc-patches

This patch adds support for switching to the appropriate SME mode
for each call.  Switching to streaming mode requires an SMSTART SM
instruction and switching to non-streaming mode requires an SMSTOP SM
instruction.  If the call is being made from streaming-compatible code,
these switches are conditional on the current mode being the opposite
of the one that the call needs.

Since changing PSTATE.SM changes the vector length and effectively
changes the ISA, the code to do the switching has to be emitted late.
The patch does this using a new pass that runs next to late prologue/
epilogue insertion.  (It doesn't use md_reorg because later additions
need the CFG.)

If a streaming-compatible function needs to switch mode for a call,
it must restore the original mode afterwards.  The old mode must
therefore be available immediately after the call.  The easiest
way of ensuring this is to force the use of a hard frame pointer
and ensure that the old state is saved at an in-range offset
from there.

Changing modes clobbers the Z and P registers, so we need to
save and restore live Z and P state around each mode switch.
However, mode switches are not expected to be performance
critical, so it seemed better to err on the side of being
correct rather than trying to optimise the save and restore
with surrounding code.

gcc/
	* config/aarch64/aarch64-passes.def
	(pass_late_thread_prologue_and_epilogue): New pass.
	* config/aarch64/aarch64-sme.md: New file.
	* config/aarch64/aarch64.md: Include it.
	(*tb<optab><mode>1): Rename to...
	(@aarch64_tb<optab><mode>): ...this.
	(call, call_value, sibcall, sibcall_value): Don't require operand 2
	to be a CONST_INT.
	* config/aarch64/aarch64-protos.h (aarch64_emit_call_insn): Return
	the insn.
	(make_pass_switch_sm_state): Declare.
	* config/aarch64/aarch64.h (TARGET_STREAMING_COMPATIBLE): New macro.
	(TARGET_SME): Likewise.
	(aarch64_frame::old_svcr_offset): New member variable.
	(machine_function::call_switches_sm_state): Likewise.
	(CUMULATIVE_ARGS::num_sme_mode_switch_args): Likewise.
	(CUMULATIVE_ARGS::sme_mode_switch_args): Likewise.
	* config/aarch64/aarch64.cc: Include tree-pass.h and cfgbuild.h.
	(aarch64_cfun_incoming_sm_state): New function.
	(aarch64_call_switches_sm_state): Likewise.
	(aarch64_callee_isa_mode): Likewise.
	(aarch64_insn_callee_isa_mode): Likewise.
	(aarch64_guard_switch_pstate_sm): Likewise.
	(aarch64_switch_pstate_sm): Likewise.
	(aarch64_sme_mode_switch_regs): New class.
	(aarch64_record_sme_mode_switch_args): New function.
	(aarch64_finish_sme_mode_switch_args): Likewise.
	(aarch64_function_arg): Handle the end marker by returning a
	PARALLEL that contains the ABI cookie that we used previously
	alongside the result of aarch64_finish_sme_mode_switch_args.
	(aarch64_init_cumulative_args): Initialize num_sme_mode_switch_args.
	(aarch64_function_arg_advance): If a call would switch SM state,
	record all argument registers that would need to be saved around
	the mode switch.
	(aarch64_need_old_pstate_sm): New function.
	(aarch64_layout_frame): Decide whether the frame needs to store the
	incoming value of PSTATE.SM and allocate a save slot for it if so.
	(aarch64_old_svcr_mem): New function.
	(aarch64_read_old_svcr): Likewise.
	(aarch64_guard_switch_pstate_sm): Likewise.
	(aarch64_expand_prologue): Initialize any SVCR save slot.
	(aarch64_expand_call): Allow the cookie to be PARALLEL that contains
	both the UNSPEC_CALLEE_ABI value and a list of registers that need
	to be preserved across a change to PSTATE.SM.  If the call does
	involve such a change to PSTATE.SM, record the registers that
	would be clobbered by this process.  Update call_switches_sm_state
	accordingly.
	(aarch64_emit_call_insn): Return the emitted instruction.
	(aarch64_frame_pointer_required): New function.
	(aarch64_switch_sm_state_for_call): Likewise.
	(pass_data_switch_sm_state): New pass variable.
	(pass_switch_sm_state): New pass class.
	(make_pass_switch_sm_state): New function.
	(TARGET_FRAME_POINTER_REQUIRED): Define.
	* config/aarch64/t-aarch64 (s-check-sve-md): Add aarch64-sme.md.

gcc/testsuite/
	* gcc.target/aarch64/sme/call_sm_switch_1.c: New test.
	* gcc.target/aarch64/sme/call_sm_switch_2.c: Likewise.
	* gcc.target/aarch64/sme/call_sm_switch_3.c: Likewise.
	* gcc.target/aarch64/sme/call_sm_switch_4.c: Likewise.
	* gcc.target/aarch64/sme/call_sm_switch_5.c: Likewise.
	* gcc.target/aarch64/sme/call_sm_switch_6.c: Likewise.
	* gcc.target/aarch64/sme/call_sm_switch_7.c: Likewise.
	* gcc.target/aarch64/sme/call_sm_switch_8.c: Likewise.
	* gcc.target/aarch64/sme/call_sm_switch_9.c: Likewise.
	* gcc.target/aarch64/sme/call_sm_switch_10.c: Likewise.
---
 gcc/config/aarch64/aarch64-passes.def         |   1 +
 gcc/config/aarch64/aarch64-protos.h           |   3 +-
 gcc/config/aarch64/aarch64-sme.md             | 133 +++
 gcc/config/aarch64/aarch64.cc                 | 815 +++++++++++++++++-
 gcc/config/aarch64/aarch64.h                  |  25 +
 gcc/config/aarch64/aarch64.md                 |  13 +-
 gcc/config/aarch64/t-aarch64                  |   3 +-
 .../gcc.target/aarch64/sme/call_sm_switch_1.c | 195 +++++
 .../aarch64/sme/call_sm_switch_10.c           |  37 +
 .../gcc.target/aarch64/sme/call_sm_switch_2.c |  43 +
 .../gcc.target/aarch64/sme/call_sm_switch_3.c | 156 ++++
 .../gcc.target/aarch64/sme/call_sm_switch_4.c |  43 +
 .../gcc.target/aarch64/sme/call_sm_switch_5.c | 308 +++++++
 .../gcc.target/aarch64/sme/call_sm_switch_6.c |  45 +
 .../gcc.target/aarch64/sme/call_sm_switch_7.c | 516 +++++++++++
 .../gcc.target/aarch64/sme/call_sm_switch_8.c |  87 ++
 .../gcc.target/aarch64/sme/call_sm_switch_9.c | 103 +++
 17 files changed, 2512 insertions(+), 14 deletions(-)
 create mode 100644 gcc/config/aarch64/aarch64-sme.md
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_10.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_6.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_7.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_9.c

diff --git a/gcc/config/aarch64/aarch64-passes.def b/gcc/config/aarch64/aarch64-passes.def
index a2babc112c3..0bd558001e4 100644
--- a/gcc/config/aarch64/aarch64-passes.def
+++ b/gcc/config/aarch64/aarch64-passes.def
@@ -20,6 +20,7 @@
 
 INSERT_PASS_AFTER (pass_regrename, 1, pass_fma_steering);
 INSERT_PASS_BEFORE (pass_reorder_blocks, 1, pass_track_speculation);
+INSERT_PASS_BEFORE (pass_late_thread_prologue_and_epilogue, 1, pass_switch_sm_state);
 INSERT_PASS_AFTER (pass_machine_reorg, 1, pass_tag_collision_avoidance);
 INSERT_PASS_BEFORE (pass_shorten_branches, 1, pass_insert_bti);
 INSERT_PASS_AFTER (pass_if_after_combine, 1, pass_cc_fusion);
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 06b926b42d6..0f686fba4bd 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -910,7 +910,7 @@ void aarch64_sve_expand_vector_init (rtx, rtx);
 void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx,
 				   const_tree, unsigned, bool = false);
 void aarch64_init_expanders (void);
-void aarch64_emit_call_insn (rtx);
+rtx_insn *aarch64_emit_call_insn (rtx);
 void aarch64_register_pragmas (void);
 void aarch64_relayout_simd_types (void);
 void aarch64_reset_previous_fndecl (void);
@@ -1051,6 +1051,7 @@ rtl_opt_pass *make_pass_track_speculation (gcc::context *);
 rtl_opt_pass *make_pass_tag_collision_avoidance (gcc::context *);
 rtl_opt_pass *make_pass_insert_bti (gcc::context *ctxt);
 rtl_opt_pass *make_pass_cc_fusion (gcc::context *ctxt);
+rtl_opt_pass *make_pass_switch_sm_state (gcc::context *ctxt);
 
 poly_uint64 aarch64_regmode_natural_size (machine_mode);
 
diff --git a/gcc/config/aarch64/aarch64-sme.md b/gcc/config/aarch64/aarch64-sme.md
new file mode 100644
index 00000000000..88f1526fa34
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-sme.md
@@ -0,0 +1,133 @@
+;; Machine description for AArch64 SME.
+;; Copyright (C) 2022 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; The file is organised into the following sections (search for the full
+;; line):
+;;
+;; == State management
+;; ---- Test current state
+;; ---- PSTATE.SM management
+
+;; =========================================================================
+;; == State management
+;; =========================================================================
+
+;; -------------------------------------------------------------------------
+;; ---- Test current state
+;; -------------------------------------------------------------------------
+
+(define_c_enum "unspec" [
+  UNSPEC_GET_SME_STATE
+  UNSPEC_READ_SVCR
+])
+
+(define_insn "aarch64_get_sme_state"
+  [(set (reg:TI R0_REGNUM)
+	(unspec_volatile:TI [(const_int 0)] UNSPEC_GET_SME_STATE))
+   (clobber (reg:DI R16_REGNUM))
+   (clobber (reg:DI R17_REGNUM))
+   (clobber (reg:DI R18_REGNUM))
+   (clobber (reg:DI R30_REGNUM))
+   (clobber (reg:CC CC_REGNUM))]
+  ""
+  "bl\t__arm_sme_state"
+)
+
+(define_insn "aarch64_read_svcr"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(unspec_volatile:DI [(const_int 0)] UNSPEC_READ_SVCR))]
+  "TARGET_SME"
+  "mrs\t%0, svcr"
+)
+
+;; -------------------------------------------------------------------------
+;; ---- PSTATE.SM management
+;; -------------------------------------------------------------------------
+;; Includes
+;; - SMSTART SM
+;; - SMSTOP SM
+;; -------------------------------------------------------------------------
+
+(define_c_enum "unspec" [
+  UNSPEC_SMSTART_SM
+  UNSPEC_SMSTOP_SM
+])
+
+;; Doesn't depend on a TARGET_* since (a) the instruction is always
+;; emitted under direct control of aarch64 code and (b) it is sometimes
+;; used conditionally.
+(define_insn "aarch64_smstart_sm"
+  [(unspec_volatile [(const_int 0)] UNSPEC_SMSTART_SM)
+   (clobber (reg:V4x16QI V0_REGNUM))
+   (clobber (reg:V4x16QI V4_REGNUM))
+   (clobber (reg:V4x16QI V8_REGNUM))
+   (clobber (reg:V4x16QI V12_REGNUM))
+   (clobber (reg:V4x16QI V16_REGNUM))
+   (clobber (reg:V4x16QI V20_REGNUM))
+   (clobber (reg:V4x16QI V24_REGNUM))
+   (clobber (reg:V4x16QI V28_REGNUM))
+   (clobber (reg:VNx16BI P0_REGNUM))
+   (clobber (reg:VNx16BI P1_REGNUM))
+   (clobber (reg:VNx16BI P2_REGNUM))
+   (clobber (reg:VNx16BI P3_REGNUM))
+   (clobber (reg:VNx16BI P4_REGNUM))
+   (clobber (reg:VNx16BI P5_REGNUM))
+   (clobber (reg:VNx16BI P6_REGNUM))
+   (clobber (reg:VNx16BI P7_REGNUM))
+   (clobber (reg:VNx16BI P8_REGNUM))
+   (clobber (reg:VNx16BI P9_REGNUM))
+   (clobber (reg:VNx16BI P10_REGNUM))
+   (clobber (reg:VNx16BI P11_REGNUM))
+   (clobber (reg:VNx16BI P12_REGNUM))
+   (clobber (reg:VNx16BI P13_REGNUM))
+   (clobber (reg:VNx16BI P14_REGNUM))
+   (clobber (reg:VNx16BI P15_REGNUM))]
+  ""
+  "smstart\tsm"
+)
+
+(define_insn "aarch64_smstop_sm"
+  [(unspec_volatile [(const_int 0)] UNSPEC_SMSTOP_SM)
+   (clobber (reg:V4x16QI V0_REGNUM))
+   (clobber (reg:V4x16QI V4_REGNUM))
+   (clobber (reg:V4x16QI V8_REGNUM))
+   (clobber (reg:V4x16QI V12_REGNUM))
+   (clobber (reg:V4x16QI V16_REGNUM))
+   (clobber (reg:V4x16QI V20_REGNUM))
+   (clobber (reg:V4x16QI V24_REGNUM))
+   (clobber (reg:V4x16QI V28_REGNUM))
+   (clobber (reg:VNx16BI P0_REGNUM))
+   (clobber (reg:VNx16BI P1_REGNUM))
+   (clobber (reg:VNx16BI P2_REGNUM))
+   (clobber (reg:VNx16BI P3_REGNUM))
+   (clobber (reg:VNx16BI P4_REGNUM))
+   (clobber (reg:VNx16BI P5_REGNUM))
+   (clobber (reg:VNx16BI P6_REGNUM))
+   (clobber (reg:VNx16BI P7_REGNUM))
+   (clobber (reg:VNx16BI P8_REGNUM))
+   (clobber (reg:VNx16BI P9_REGNUM))
+   (clobber (reg:VNx16BI P10_REGNUM))
+   (clobber (reg:VNx16BI P11_REGNUM))
+   (clobber (reg:VNx16BI P12_REGNUM))
+   (clobber (reg:VNx16BI P13_REGNUM))
+   (clobber (reg:VNx16BI P14_REGNUM))
+   (clobber (reg:VNx16BI P15_REGNUM))]
+  ""
+  "smstop\tsm"
+)
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 36ef0435b4e..d8310eb8597 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -82,6 +82,8 @@
 #include "tree-dfa.h"
 #include "asan.h"
 #include "aarch64-feature-deps.h"
+#include "tree-pass.h"
+#include "cfgbuild.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -4103,6 +4105,26 @@ aarch64_fndecl_isa_mode (const_tree fndecl)
   return aarch64_fndecl_sm_state (fndecl);
 }
 
+/* Return the state of PSTATE.SM on entry to the current function.
+   This might be different from the state of PSTATE.SM in the function
+   body.  */
+
+static aarch64_feature_flags
+aarch64_cfun_incoming_sm_state ()
+{
+  return aarch64_fntype_sm_state (TREE_TYPE (cfun->decl));
+}
+
+/* Return true if a call from the current function to a function with
+   ISA mode CALLEE_MODE would involve a change to PSTATE.SM around
+   the BL instruction.  */
+
+static bool
+aarch64_call_switches_sm_state (aarch64_feature_flags callee_mode)
+{
+  return (callee_mode & ~AARCH64_ISA_MODE & AARCH64_FL_SM_STATE) != 0;
+}
+
 /* Implement TARGET_COMPATIBLE_VECTOR_TYPES_P.  */
 
 static bool
@@ -4185,6 +4207,16 @@ aarch64_callee_abi (rtx cookie)
   return function_abis[UINTVAL (cookie) >> AARCH64_NUM_ISA_MODES];
 }
 
+/* COOKIE is a CONST_INT from an UNSPEC_CALLEE_ABI rtx.  Return the
+   required ISA mode on entry to the callee, which is also the ISA
+   mode on return from the callee.  */
+
+static aarch64_feature_flags
+aarch64_callee_isa_mode (rtx cookie)
+{
+  return UINTVAL (cookie) & AARCH64_FL_ISA_MODES;
+}
+
 /* INSN is a call instruction.  Return the CONST_INT stored in its
    UNSPEC_CALLEE_ABI rtx.  */
 
@@ -4207,6 +4239,15 @@ aarch64_insn_callee_abi (const rtx_insn *insn)
   return aarch64_callee_abi (aarch64_insn_callee_cookie (insn));
 }
 
+/* INSN is a call instruction.  Return the required ISA mode on entry to
+   the callee, which is also the ISA mode on return from the callee.  */
+
+static aarch64_feature_flags
+aarch64_insn_callee_isa_mode (const rtx_insn *insn)
+{
+  return aarch64_callee_isa_mode (aarch64_insn_callee_cookie (insn));
+}
+
 /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The callee only saves
    the lower 64 bits of a 128-bit register.  Tell the compiler the callee
    clobbers the top 64 bits when restoring the bottom 64 bits.  */
@@ -6394,6 +6435,428 @@ aarch64_sub_sp (rtx temp1, rtx temp2, poly_int64 delta, bool frame_related_p,
 		      temp1, temp2, frame_related_p, emit_move_imm);
 }
 
+/* A streaming-compatible function needs to switch temporarily to the known
+   PSTATE.SM mode described by LOCAL_MODE.  The low bit of OLD_SVCR contains
+   the runtime state of PSTATE.SM in the streaming-compatible code, before
+   the start of the switch to LOCAL_MODE.
+
+   Emit instructions to branch around the mode switch if PSTATE.SM already
+   matches LOCAL_MODE.  Return the label that the branch jumps to.  */
+
+static rtx_insn *
+aarch64_guard_switch_pstate_sm (rtx old_svcr, aarch64_feature_flags local_mode)
+{
+  local_mode &= AARCH64_FL_SM_STATE;
+  gcc_assert (local_mode != 0);
+  auto already_ok_cond = (local_mode & AARCH64_FL_SM_ON ? NE : EQ);
+  auto *label = gen_label_rtx ();
+  auto *jump = emit_jump_insn (gen_aarch64_tb (already_ok_cond, DImode,
+					       old_svcr, const0_rtx, label));
+  JUMP_LABEL (jump) = label;
+  return label;
+}
+
+/* Emit code to switch from the PSTATE.SM state in OLD_MODE to the PSTATE.SM
+   state in NEW_MODE.  This is known to involve either an SMSTART SM or
+   an SMSTOP SM.  */
+
+static void
+aarch64_switch_pstate_sm (aarch64_feature_flags old_mode,
+			  aarch64_feature_flags new_mode)
+{
+  old_mode &= AARCH64_FL_SM_STATE;
+  new_mode &= AARCH64_FL_SM_STATE;
+  gcc_assert (old_mode != new_mode);
+
+  if ((new_mode & AARCH64_FL_SM_ON)
+      || (new_mode == 0 && (old_mode & AARCH64_FL_SM_OFF)))
+    emit_insn (gen_aarch64_smstart_sm ());
+  else
+    emit_insn (gen_aarch64_smstop_sm ());
+}
+
+/* As a side-effect, SMSTART SM and SMSTOP SM clobber the contents of all
+   FP and predicate registers.  This class emits code to preserve any
+   necessary registers around the mode switch.
+
+   The class uses four approaches to saving and restoring contents, enumerated
+   by group_type:
+
+   - GPR: save and restore the contents of FP registers using GPRs.
+     This is used if the FP register contains no more than 64 significant
+     bits.  The registers used are FIRST_GPR onwards.
+
+   - MEM_128: save and restore 128-bit SIMD registers using memory.
+
+   - MEM_SVE_PRED: save and restore full SVE predicate registers using memory.
+
+   - MEM_SVE_DATA: save and restore full SVE vector registers using memory.
+
+   The save slots within each memory group are consecutive, with the
+   MEM_SVE_PRED slots occupying a region below the MEM_SVE_DATA slots.
+
+   There will only be two mode switches for each use of SME, so they should
+   not be particularly performance-sensitive.  It's also rare for SIMD, SVE
+   or predicate registers to be live across mode switches.  We therefore
+   don't preallocate the save slots but instead allocate them locally on
+   demand.  This makes the code emitted by the class self-contained.  */
+
+class aarch64_sme_mode_switch_regs
+{
+public:
+  static const unsigned int FIRST_GPR = R10_REGNUM;
+
+  void add_reg (machine_mode, unsigned int);
+  void add_call_args (rtx_call_insn *);
+  void add_call_result (rtx_call_insn *);
+
+  void emit_prologue ();
+  void emit_epilogue ();
+
+  /* The number of GPRs needed to save FP registers, starting from
+     FIRST_GPR.  */
+  unsigned int num_gprs () { return m_group_count[GPR]; }
+
+private:
+  enum sequence { PROLOGUE, EPILOGUE };
+  enum group_type { GPR, MEM_128, MEM_SVE_PRED, MEM_SVE_DATA, NUM_GROUPS };
+
+  /* Information about the save location for one FP, SIMD, SVE data, or
+     SVE predicate register.  */
+  struct save_location {
+    /* The register to be saved.  */
+    rtx reg;
+
+    /* Which group the save location belongs to.  */
+    group_type group;
+
+    /* A zero-based index of the register within the group.  */
+    unsigned int index;
+  };
+
+  unsigned int sve_data_headroom ();
+  rtx get_slot_mem (machine_mode, poly_int64);
+  void emit_stack_adjust (sequence, poly_int64);
+  void emit_mem_move (sequence, const save_location &, poly_int64);
+
+  void emit_gpr_moves (sequence);
+  void emit_mem_128_moves (sequence);
+  void emit_sve_sp_adjust (sequence);
+  void emit_sve_pred_moves (sequence);
+  void emit_sve_data_moves (sequence);
+
+  /* All save locations, in no particular order.  */
+  auto_vec<save_location, 12> m_save_locations;
+
+  /* The number of registers in each group.  */
+  unsigned int m_group_count[NUM_GROUPS] = {};
+};
+
+/* Record that (reg:MODE REGNO) needs to be preserved around the mode
+   switch.  */
+
+void
+aarch64_sme_mode_switch_regs::add_reg (machine_mode mode, unsigned int regno)
+{
+  if (!FP_REGNUM_P (regno) && !PR_REGNUM_P (regno))
+    return;
+
+  unsigned int end_regno = end_hard_regno (mode, regno);
+  unsigned int vec_flags = aarch64_classify_vector_mode (mode);
+  gcc_assert ((vec_flags & VEC_STRUCT) || end_regno == regno + 1);
+  for (; regno < end_regno; regno++)
+    {
+      machine_mode submode = mode;
+      if (vec_flags & VEC_STRUCT)
+	{
+	  if (vec_flags & VEC_SVE_DATA)
+	    submode = SVE_BYTE_MODE;
+	  else if (vec_flags & VEC_PARTIAL)
+	    submode = V8QImode;
+	  else
+	    submode = V16QImode;
+	}
+      save_location loc;
+      loc.reg = gen_rtx_REG (submode, regno);
+      if (vec_flags == VEC_SVE_PRED)
+	{
+	  gcc_assert (PR_REGNUM_P (regno));
+	  loc.group = MEM_SVE_PRED;
+	}
+      else
+	{
+	  gcc_assert (FP_REGNUM_P (regno));
+	  if (known_le (GET_MODE_SIZE (submode), 8))
+	    loc.group = GPR;
+	  else if (known_eq (GET_MODE_SIZE (submode), 16))
+	    loc.group = MEM_128;
+	  else
+	    loc.group = MEM_SVE_DATA;
+	}
+      loc.index = m_group_count[loc.group]++;
+      m_save_locations.quick_push (loc);
+    }
+}
+
+/* Record that the arguments to CALL_INSN need to be preserved around
+   the mode switch.  */
+
+void
+aarch64_sme_mode_switch_regs::add_call_args (rtx_call_insn *call_insn)
+{
+  for (rtx node = CALL_INSN_FUNCTION_USAGE (call_insn);
+       node; node = XEXP (node, 1))
+    {
+      rtx item = XEXP (node, 0);
+      if (GET_CODE (item) != USE)
+	continue;
+      item = XEXP (item, 0);
+      if (!REG_P (item))
+	continue;
+      add_reg (GET_MODE (item), REGNO (item));
+    }
+}
+
+/* Record that the return value from CALL_INSN (if any) needs to be
+   preserved around the mode switch.  */
+
+void
+aarch64_sme_mode_switch_regs::add_call_result (rtx_call_insn *call_insn)
+{
+  rtx pat = PATTERN (call_insn);
+  gcc_assert (GET_CODE (pat) == PARALLEL);
+  pat = XVECEXP (pat, 0, 0);
+  if (GET_CODE (pat) == CALL)
+    return;
+  rtx dest = SET_DEST (pat);
+  add_reg (GET_MODE (dest), REGNO (dest));
+}
+
+/* Emit code to save registers before the mode switch.  */
+
+void
+aarch64_sme_mode_switch_regs::emit_prologue ()
+{
+  emit_sve_sp_adjust (PROLOGUE);
+  emit_sve_pred_moves (PROLOGUE);
+  emit_sve_data_moves (PROLOGUE);
+  emit_mem_128_moves (PROLOGUE);
+  emit_gpr_moves (PROLOGUE);
+}
+
+/* Emit code to restore registers after the mode switch.  */
+
+void
+aarch64_sme_mode_switch_regs::emit_epilogue ()
+{
+  emit_gpr_moves (EPILOGUE);
+  emit_mem_128_moves (EPILOGUE);
+  emit_sve_pred_moves (EPILOGUE);
+  emit_sve_data_moves (EPILOGUE);
+  emit_sve_sp_adjust (EPILOGUE);
+}
+
+/* The SVE predicate registers are stored below the SVE data registers,
+   with the predicate save area being padded to a data-register-sized
+   boundary.  Return the size of this padded area as a whole number
+   of data register slots.  */
+
+unsigned int
+aarch64_sme_mode_switch_regs::sve_data_headroom ()
+{
+  return CEIL (m_group_count[MEM_SVE_PRED], 8);
+}
+
+/* Return a memory reference of mode MODE to OFFSET bytes from the
+   stack pointer.  */
+
+rtx
+aarch64_sme_mode_switch_regs::get_slot_mem (machine_mode mode,
+					    poly_int64 offset)
+{
+  rtx addr = plus_constant (Pmode, stack_pointer_rtx, offset);
+  return gen_rtx_MEM (mode, addr);
+}
+
+/* Allocate or deallocate SIZE bytes of stack space: SEQ decides which.  */
+
+void
+aarch64_sme_mode_switch_regs::emit_stack_adjust (sequence seq,
+						 poly_int64 size)
+{
+  if (seq == PROLOGUE)
+    size = -size;
+  emit_insn (gen_rtx_SET (stack_pointer_rtx,
+			  plus_constant (Pmode, stack_pointer_rtx, size)));
+}
+
+/* Save or restore the register in LOC, whose slot is OFFSET bytes from
+   the stack pointer.  SEQ chooses between saving and restoring.  */
+
+void
+aarch64_sme_mode_switch_regs::emit_mem_move (sequence seq,
+					     const save_location &loc,
+					     poly_int64 offset)
+{
+  rtx mem = get_slot_mem (GET_MODE (loc.reg), offset);
+  if (seq == PROLOGUE)
+    emit_move_insn (mem, loc.reg);
+  else
+    emit_move_insn (loc.reg, mem);
+}
+
+/* Emit instructions to save or restore the GPR group.  SEQ chooses between
+   saving and restoring.  */
+
+void
+aarch64_sme_mode_switch_regs::emit_gpr_moves (sequence seq)
+{
+  for (auto &loc : m_save_locations)
+    if (loc.group == GPR)
+      {
+	gcc_assert (loc.index < 8);
+	rtx gpr = gen_rtx_REG (GET_MODE (loc.reg), FIRST_GPR + loc.index);
+	if (seq == PROLOGUE)
+	  emit_move_insn (gpr, loc.reg);
+	else
+	  emit_move_insn (loc.reg, gpr);
+      }
+}
+
+/* Emit instructions to save or restore the MEM_128 group.  SEQ chooses
+   between saving and restoring.  */
+
+void
+aarch64_sme_mode_switch_regs::emit_mem_128_moves (sequence seq)
+{
+  HOST_WIDE_INT count = m_group_count[MEM_128];
+  if (count == 0)
+    return;
+
+  auto sp = stack_pointer_rtx;
+  auto sp_adjust = (seq == PROLOGUE ? -count : count) * 16;
+
+  /* Pick a common mode that supports LDR & STR with pre/post-modification
+     and LDP & STP with pre/post-modification.  */
+  auto mode = TFmode;
+
+  /* An instruction pattern that should be emitted at the end.  */
+  rtx last_pat = NULL_RTX;
+
+  /* A previous MEM_128 location that hasn't been handled yet.  */
+  save_location *prev_loc = nullptr;
+
+  /* Look for LDP/STPs and record any leftover LDR/STR in PREV_LOC.  */
+  for (auto &loc : m_save_locations)
+    if (loc.group == MEM_128)
+      {
+	if (!prev_loc)
+	  {
+	    prev_loc = &loc;
+	    continue;
+	  }
+	gcc_assert (loc.index == prev_loc->index + 1);
+
+	/* The offset of the base of the save area from the current
+	   stack pointer.  */
+	HOST_WIDE_INT bias = 0;
+	if (prev_loc->index == 0 && seq == PROLOGUE)
+	  bias = sp_adjust;
+
+	/* Get the two sets in the LDP/STP.  */
+	rtx ops[] = {
+	  gen_rtx_REG (mode, REGNO (prev_loc->reg)),
+	  get_slot_mem (mode, prev_loc->index * 16 + bias),
+	  gen_rtx_REG (mode, REGNO (loc.reg)),
+	  get_slot_mem (mode, loc.index * 16 + bias)
+	};
+	unsigned int lhs = (seq == PROLOGUE);
+	rtx set1 = gen_rtx_SET (ops[lhs], ops[1 - lhs]);
+	rtx set2 = gen_rtx_SET (ops[lhs + 2], ops[3 - lhs]);
+
+	/* Combine the sets with any stack allocation/deallocation.  */
+	rtvec vec;
+	if (prev_loc->index == 0)
+	  {
+	    rtx plus_sp = plus_constant (Pmode, sp, sp_adjust);
+	    vec = gen_rtvec (3, gen_rtx_SET (sp, plus_sp), set1, set2);
+	  }
+	else
+	  vec = gen_rtvec (2, set1, set2);
+	rtx pat = gen_rtx_PARALLEL (VOIDmode, vec);
+
+	/* Queue a deallocation to the end, otherwise emit the
+	   instruction now.  */
+	if (seq == EPILOGUE && prev_loc->index == 0)
+	  last_pat = pat;
+	else
+	  emit_insn (pat);
+	prev_loc = nullptr;
+      }
+
+  /* Handle any leftover LDR/STR.  */
+  if (prev_loc)
+    {
+      rtx reg = gen_rtx_REG (mode, REGNO (prev_loc->reg));
+      rtx addr;
+      if (prev_loc->index != 0)
+	addr = plus_constant (Pmode, sp, prev_loc->index * 16);
+      else if (seq == PROLOGUE)
+	{
+	  rtx allocate = plus_constant (Pmode, sp, -count * 16);
+	  addr = gen_rtx_PRE_MODIFY (Pmode, sp, allocate);
+	}
+      else
+	{
+	  rtx deallocate = plus_constant (Pmode, sp, count * 16);
+	  addr = gen_rtx_POST_MODIFY (Pmode, sp, deallocate);
+	}
+      rtx mem = gen_rtx_MEM (mode, addr);
+      if (seq == PROLOGUE)
+	emit_move_insn (mem, reg);
+      else
+	emit_move_insn (reg, mem);
+    }
+
+  if (last_pat)
+    emit_insn (last_pat);
+}
+
+/* Allocate or deallocate the stack space needed by the SVE groups.
+   SEQ chooses between allocating and deallocating.  */
+
+void
+aarch64_sme_mode_switch_regs::emit_sve_sp_adjust (sequence seq)
+{
+  if (unsigned int count = m_group_count[MEM_SVE_DATA] + sve_data_headroom ())
+    emit_stack_adjust (seq, count * BYTES_PER_SVE_VECTOR);
+}
+
+/* Save or restore the MEM_SVE_DATA group.  SEQ chooses between saving
+   and restoring.  */
+
+void
+aarch64_sme_mode_switch_regs::emit_sve_data_moves (sequence seq)
+{
+  for (auto &loc : m_save_locations)
+    if (loc.group == MEM_SVE_DATA)
+      {
+	auto index = loc.index + sve_data_headroom ();
+	emit_mem_move (seq, loc, index * BYTES_PER_SVE_VECTOR);
+      }
+}
+
+/* Save or restore the MEM_SVE_PRED group.  SEQ chooses between saving
+   and restoring.  */
+
+void
+aarch64_sme_mode_switch_regs::emit_sve_pred_moves (sequence seq)
+{
+  for (auto &loc : m_save_locations)
+    if (loc.group == MEM_SVE_PRED)
+      emit_mem_move (seq, loc, loc.index * BYTES_PER_SVE_PRED);
+}
+
 /* Set DEST to (vec_series BASE STEP).  */
 
 static void
@@ -7934,6 +8397,40 @@ on_stack:
   return;
 }
 
+/* Add the current argument register to the set of those that need
+   to be saved and restored around a change to PSTATE.SM.  */
+
+static void
+aarch64_record_sme_mode_switch_args (CUMULATIVE_ARGS *pcum)
+{
+  subrtx_var_iterator::array_type array;
+  FOR_EACH_SUBRTX_VAR (iter, array, pcum->aapcs_reg, NONCONST)
+    {
+      rtx x = *iter;
+      if (REG_P (x) && (FP_REGNUM_P (REGNO (x)) || PR_REGNUM_P (REGNO (x))))
+	{
+	  unsigned int i = pcum->num_sme_mode_switch_args++;
+	  gcc_assert (i < ARRAY_SIZE (pcum->sme_mode_switch_args));
+	  pcum->sme_mode_switch_args[i] = x;
+	}
+    }
+}
+
+/* Return a parallel that contains all the registers that need to be
+   saved around a change to PSTATE.SM.  Return const0_rtx if there is
+   no such mode switch, or if no registers need to be saved.  */
+
+static rtx
+aarch64_finish_sme_mode_switch_args (CUMULATIVE_ARGS *pcum)
+{
+  if (!pcum->num_sme_mode_switch_args)
+    return const0_rtx;
+
+  auto argvec = gen_rtvec_v (pcum->num_sme_mode_switch_args,
+			     pcum->sme_mode_switch_args);
+  return gen_rtx_PARALLEL (VOIDmode, argvec);
+}
+
 /* Implement TARGET_FUNCTION_ARG.  */
 
 static rtx
@@ -7945,7 +8442,13 @@ aarch64_function_arg (cumulative_args_t pcum_v, const function_arg_info &arg)
 	      || pcum->pcs_variant == ARM_PCS_SVE);
 
   if (arg.end_marker_p ())
-    return aarch64_gen_callee_cookie (pcum->isa_mode, pcum->pcs_variant);
+    {
+      rtx abi_cookie = aarch64_gen_callee_cookie (pcum->isa_mode,
+						  pcum->pcs_variant);
+      rtx sme_mode_switch_args = aarch64_finish_sme_mode_switch_args (pcum);
+      return gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, abi_cookie,
+						    sme_mode_switch_args));
+    }
 
   aarch64_layout_arg (pcum_v, arg);
   return pcum->aapcs_reg;
@@ -7980,6 +8483,7 @@ aarch64_init_cumulative_args (CUMULATIVE_ARGS *pcum,
   pcum->aapcs_stack_words = 0;
   pcum->aapcs_stack_size = 0;
   pcum->silent_p = silent_p;
+  pcum->num_sme_mode_switch_args = 0;
 
   if (!silent_p
       && !TARGET_FLOAT
@@ -8020,6 +8524,10 @@ aarch64_function_arg_advance (cumulative_args_t pcum_v,
       aarch64_layout_arg (pcum_v, arg);
       gcc_assert ((pcum->aapcs_reg != NULL_RTX)
 		  != (pcum->aapcs_stack_words != 0));
+      if (pcum->aapcs_reg
+	  && aarch64_call_switches_sm_state (pcum->isa_mode))
+	aarch64_record_sme_mode_switch_args (pcum);
+
       pcum->aapcs_arg_processed = false;
       pcum->aapcs_ncrn = pcum->aapcs_nextncrn;
       pcum->aapcs_nvrn = pcum->aapcs_nextnvrn;
@@ -8457,6 +8965,30 @@ aarch64_needs_frame_chain (void)
   return aarch64_use_frame_pointer;
 }
 
+/* Return true if the current function needs to record the incoming
+   value of PSTATE.SM.  */
+static bool
+aarch64_need_old_pstate_sm ()
+{
+  /* Exit early if the incoming value of PSTATE.SM is known at
+     compile time.  */
+  if (aarch64_cfun_incoming_sm_state () != 0)
+    return false;
+
+  if (cfun->machine->call_switches_sm_state)
+    for (auto insn = get_insns (); insn; insn = NEXT_INSN (insn))
+      if (auto *call = dyn_cast<rtx_call_insn *> (insn))
+	if (!SIBLING_CALL_P (call))
+	  {
+	    /* Return true if there is a call to a non-streaming-compatible
+	       function.  */
+	    auto callee_isa_mode = aarch64_insn_callee_isa_mode (call);
+	    if (aarch64_call_switches_sm_state (callee_isa_mode))
+	      return true;
+	  }
+  return false;
+}
+
 /* Mark the registers that need to be saved by the callee and calculate
    the size of the callee-saved registers area and frame record (both FP
    and LR may be omitted).  */
@@ -8486,6 +9018,7 @@ aarch64_layout_frame (void)
   /* First mark all the registers that really need to be saved...  */
   for (regno = 0; regno <= LAST_SAVED_REGNUM; regno++)
     frame.reg_offset[regno] = SLOT_NOT_REQUIRED;
+  frame.old_svcr_offset = SLOT_NOT_REQUIRED;
 
   /* ... that includes the eh data registers (if needed)...  */
   if (crtl->calls_eh_return)
@@ -8612,6 +9145,12 @@ aarch64_layout_frame (void)
 	offset += UNITS_PER_WORD;
       }
 
+  if (aarch64_need_old_pstate_sm ())
+    {
+      frame.old_svcr_offset = offset;
+      offset += UNITS_PER_WORD;
+    }
+
   poly_int64 max_int_offset = offset;
   offset = aligned_upper_bound (offset, STACK_BOUNDARY / BITS_PER_UNIT);
   bool has_align_gap = maybe_ne (offset, max_int_offset);
@@ -9908,6 +10447,48 @@ aarch64_epilogue_uses (int regno)
   return 0;
 }
 
+/* The current function's frame has a save slot for the incoming state
+   of SVCR.  Return a legitimate memory for the slot, based on the hard
+   frame pointer.  */
+
+static rtx
+aarch64_old_svcr_mem ()
+{
+  gcc_assert (frame_pointer_needed
+	      && known_ge (cfun->machine->frame.old_svcr_offset, 0));
+  rtx base = hard_frame_pointer_rtx;
+  poly_int64 offset = (/* hard fp -> top of frame.  */
+		       cfun->machine->frame.hard_fp_offset
+		       /* top of frame -> bottom of frame.  */
+		       - cfun->machine->frame.frame_size
+		       /* bottom of frame -> save slot.  */
+		       + cfun->machine->frame.old_svcr_offset);
+  return gen_frame_mem (DImode, plus_constant (Pmode, base, offset));
+}
+
+/* The current function's frame has a save slot for the incoming state
+   of SVCR.  Load the slot into register REGNO and return the register.  */
+
+static rtx
+aarch64_read_old_svcr (unsigned int regno)
+{
+  rtx svcr = gen_rtx_REG (DImode, regno);
+  emit_move_insn (svcr, aarch64_old_svcr_mem ());
+  return svcr;
+}
+
+/* Like the rtx version of aarch64_guard_switch_pstate_sm, but first
+   load the incoming value of SVCR from its save slot into temporary
+   register REGNO.  */
+
+static rtx_insn *
+aarch64_guard_switch_pstate_sm (unsigned int regno,
+				aarch64_feature_flags local_mode)
+{
+  rtx old_svcr = aarch64_read_old_svcr (regno);
+  return aarch64_guard_switch_pstate_sm (old_svcr, local_mode);
+}
+
 /* AArch64 stack frames generated by this compiler look like:
 
 	+-------------------------------+
@@ -10141,6 +10722,40 @@ aarch64_expand_prologue (void)
      that is assumed by the called.  */
   aarch64_allocate_and_probe_stack_space (tmp1_rtx, tmp0_rtx, final_adjust,
 					  !frame_pointer_needed, true);
+
+  /* Save the incoming value of PSTATE.SM, if required.  */
+  if (known_ge (cfun->machine->frame.old_svcr_offset, 0))
+    {
+      rtx mem = aarch64_old_svcr_mem ();
+      MEM_VOLATILE_P (mem) = 1;
+      if (TARGET_SME)
+	{
+	  rtx reg = gen_rtx_REG (DImode, IP0_REGNUM);
+	  emit_insn (gen_aarch64_read_svcr (reg));
+	  emit_move_insn (mem, reg);
+	}
+      else
+	{
+	  rtx old_r0 = NULL_RTX, old_r1 = NULL_RTX;
+	  auto &args = crtl->args.info;
+	  if (args.aapcs_ncrn > 0)
+	    {
+	      old_r0 = gen_rtx_REG (DImode, PROBE_STACK_FIRST_REGNUM);
+	      emit_move_insn (old_r0, gen_rtx_REG (DImode, R0_REGNUM));
+	    }
+	  if (args.aapcs_ncrn > 1)
+	    {
+	      old_r1 = gen_rtx_REG (DImode, PROBE_STACK_SECOND_REGNUM);
+	      emit_move_insn (old_r1, gen_rtx_REG (DImode, R1_REGNUM));
+	    }
+	  emit_insn (gen_aarch64_get_sme_state ());
+	  emit_move_insn (mem, gen_rtx_REG (DImode, R0_REGNUM));
+	  if (old_r0)
+	    emit_move_insn (gen_rtx_REG (DImode, R0_REGNUM), old_r0);
+	  if (old_r1)
+	    emit_move_insn (gen_rtx_REG (DImode, R1_REGNUM), old_r1);
+	}
+    }
 }
 
 /* Return TRUE if we can use a simple_return insn.
@@ -11395,17 +12010,33 @@ aarch64_start_call_args (cumulative_args_t ca_v)
    RESULT is the register in which the result is returned.  It's NULL for
    "call" and "sibcall".
    MEM is the location of the function call.
-   CALLEE_ABI is a const_int that gives the arm_pcs of the callee.
+   COOKIE is either:
+     - a const_int that gives the argument to the call's UNSPEC_CALLEE_ABI.
+     - a PARALLEL that contains such a const_int as its first element.
+       The second element is a PARALLEL that lists all the argument
+       registers that need to be saved and restored around a change
+       in PSTATE.SM, or const0_rtx if no such switch is needed.
    SIBCALL indicates whether this function call is normal call or sibling call.
    It will generate different pattern accordingly.  */
 
 void
-aarch64_expand_call (rtx result, rtx mem, rtx callee_abi, bool sibcall)
+aarch64_expand_call (rtx result, rtx mem, rtx cookie, bool sibcall)
 {
   rtx call, callee, tmp;
   rtvec vec;
   machine_mode mode;
 
+  rtx callee_abi = cookie;
+  rtx sme_mode_switch_args = const0_rtx;
+  if (GET_CODE (cookie) == PARALLEL)
+    {
+      callee_abi = XVECEXP (cookie, 0, 0);
+      sme_mode_switch_args = XVECEXP (cookie, 0, 1);
+    }
+
+  gcc_assert (CONST_INT_P (callee_abi));
+  auto callee_isa_mode = aarch64_callee_isa_mode (callee_abi);
+
   gcc_assert (MEM_P (mem));
   callee = XEXP (mem, 0);
   mode = GET_MODE (callee);
@@ -11430,26 +12061,67 @@ aarch64_expand_call (rtx result, rtx mem, rtx callee_abi, bool sibcall)
   else
     tmp = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, LR_REGNUM));
 
-  gcc_assert (CONST_INT_P (callee_abi));
   callee_abi = gen_rtx_UNSPEC (DImode, gen_rtvec (1, callee_abi),
 			       UNSPEC_CALLEE_ABI);
 
   vec = gen_rtvec (3, call, callee_abi, tmp);
   call = gen_rtx_PARALLEL (VOIDmode, vec);
 
-  aarch64_emit_call_insn (call);
+  auto call_insn = aarch64_emit_call_insn (call);
+
+  /* Check whether the call requires a change to PSTATE.SM.  We can't
+     emit the instructions to change PSTATE.SM yet, since they involve
+     a change in vector length and a change in instruction set, which
+     cannot be represented in RTL.
+
+     For now, just record which registers will be clobbered by the
+     changes to PSTATE.SM.  */
+  if (!sibcall && aarch64_call_switches_sm_state (callee_isa_mode))
+    {
+      aarch64_sme_mode_switch_regs args_switch;
+      if (sme_mode_switch_args != const0_rtx)
+	{
+	  unsigned int num_args = XVECLEN (sme_mode_switch_args, 0);
+	  for (unsigned int i = 0; i < num_args; ++i)
+	    {
+	      rtx x = XVECEXP (sme_mode_switch_args, 0, i);
+	      args_switch.add_reg (GET_MODE (x), REGNO (x));
+	    }
+	}
+
+      aarch64_sme_mode_switch_regs result_switch;
+      if (result)
+	result_switch.add_reg (GET_MODE (result), REGNO (result));
+
+      unsigned int num_gprs = MAX (args_switch.num_gprs (),
+				   result_switch.num_gprs ());
+      for (unsigned int i = 0; i < num_gprs; ++i)
+	clobber_reg (&CALL_INSN_FUNCTION_USAGE (call_insn),
+		     gen_rtx_REG (DImode, args_switch.FIRST_GPR + i));
+
+      for (int regno = V0_REGNUM; regno < V0_REGNUM + 32; regno += 4)
+	clobber_reg (&CALL_INSN_FUNCTION_USAGE (call_insn),
+		     gen_rtx_REG (V4x16QImode, regno));
+
+      for (int regno = P0_REGNUM; regno < P0_REGNUM + 16; regno += 1)
+	clobber_reg (&CALL_INSN_FUNCTION_USAGE (call_insn),
+		     gen_rtx_REG (VNx16BImode, regno));
+
+      cfun->machine->call_switches_sm_state = true;
+    }
 }
 
 /* Emit call insn with PAT and do aarch64-specific handling.  */
 
-void
+rtx_insn *
 aarch64_emit_call_insn (rtx pat)
 {
-  rtx insn = emit_call_insn (pat);
+  auto insn = emit_call_insn (pat);
 
   rtx *fusage = &CALL_INSN_FUNCTION_USAGE (insn);
   clobber_reg (fusage, gen_rtx_REG (word_mode, IP0_REGNUM));
   clobber_reg (fusage, gen_rtx_REG (word_mode, IP1_REGNUM));
+  return insn;
 }
 
 machine_mode
@@ -12761,6 +13433,16 @@ aarch64_secondary_memory_needed (machine_mode mode, reg_class_t class1,
   return false;
 }
 
+/* Implement TARGET_FRAME_POINTER_REQUIRED.  */
+
+static bool
+aarch64_frame_pointer_required ()
+{
+  /* If the function needs to record the incoming value of PSTATE.SM,
+     make sure that the slot is accessible from the frame pointer.  */
+  return aarch64_need_old_pstate_sm ();
+}
+
 static bool
 aarch64_can_eliminate (const int from ATTRIBUTE_UNUSED, const int to)
 {
@@ -27496,6 +28178,122 @@ aarch64_indirect_call_asm (rtx addr)
   return "";
 }
 
+/* If CALL involves a change in PSTATE.SM, emit the instructions needed
+   to switch to the new mode and the instructions needed to restore the
+   original mode.  Return true if something changed.  */
+static bool
+aarch64_switch_sm_state_for_call (rtx_call_insn *call)
+{
+  /* Mode switches for sibling calls are handled via the epilogue.  */
+  if (SIBLING_CALL_P (call))
+    return false;
+
+  auto callee_isa_mode = aarch64_insn_callee_isa_mode (call);
+  if (!aarch64_call_switches_sm_state (callee_isa_mode))
+    return false;
+
+  /* Switch mode before the call, preserving any argument registers
+     across the switch.  */
+  start_sequence ();
+  rtx_insn *args_guard_label = nullptr;
+  if (TARGET_STREAMING_COMPATIBLE)
+    args_guard_label = aarch64_guard_switch_pstate_sm (IP0_REGNUM,
+						       callee_isa_mode);
+  aarch64_sme_mode_switch_regs args_switch;
+  args_switch.add_call_args (call);
+  args_switch.emit_prologue ();
+  aarch64_switch_pstate_sm (AARCH64_ISA_MODE, callee_isa_mode);
+  args_switch.emit_epilogue ();
+  if (args_guard_label)
+    emit_label (args_guard_label);
+  auto args_seq = get_insns ();
+  end_sequence ();
+  emit_insn_before (args_seq, call);
+
+  if (find_reg_note (call, REG_NORETURN, NULL_RTX))
+    return true;
+
+  /* Switch mode after the call, preserving any return registers across
+     the switch.  */
+  start_sequence ();
+  rtx_insn *return_guard_label = nullptr;
+  if (TARGET_STREAMING_COMPATIBLE)
+    return_guard_label = aarch64_guard_switch_pstate_sm (IP0_REGNUM,
+							 callee_isa_mode);
+  aarch64_sme_mode_switch_regs return_switch;
+  return_switch.add_call_result (call);
+  return_switch.emit_prologue ();
+  aarch64_switch_pstate_sm (callee_isa_mode, AARCH64_ISA_MODE);
+  return_switch.emit_epilogue ();
+  if (return_guard_label)
+    emit_label (return_guard_label);
+  auto result_seq = get_insns ();
+  end_sequence ();
+  emit_insn_after (result_seq, call);
+  return true;
+}
+
+namespace {
+
+const pass_data pass_data_switch_sm_state =
+{
+  RTL_PASS, // type
+  "smstarts", // name
+  OPTGROUP_NONE, // optinfo_flags
+  TV_NONE, // tv_id
+  0, // properties_required
+  0, // properties_provided
+  0, // properties_destroyed
+  0, // todo_flags_start
+  TODO_df_finish, // todo_flags_finish
+};
+
+class pass_switch_sm_state : public rtl_opt_pass
+{
+public:
+  pass_switch_sm_state (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_switch_sm_state, ctxt)
+  {}
+
+  // opt_pass methods:
+  bool gate (function *) override final;
+  unsigned int execute (function *) override final;
+};
+
+bool
+pass_switch_sm_state::gate (function *)
+{
+  return cfun->machine->call_switches_sm_state;
+}
+
+/* Emit any instructions needed to switch PSTATE.SM.  */
+unsigned int
+pass_switch_sm_state::execute (function *fn)
+{
+  basic_block bb;
+
+  auto_sbitmap blocks (last_basic_block_for_fn (cfun));
+  bitmap_clear (blocks);
+  FOR_EACH_BB_FN (bb, fn)
+    {
+      rtx_insn *insn;
+      FOR_BB_INSNS (bb, insn)
+	if (auto *call = dyn_cast<rtx_call_insn *> (insn))
+	  if (aarch64_switch_sm_state_for_call (call))
+	    bitmap_set_bit (blocks, bb->index);
+    }
+  find_many_sub_basic_blocks (blocks);
+  return 0;
+}
+
+}
+
+rtl_opt_pass *
+make_pass_switch_sm_state (gcc::context *ctxt)
+{
+  return new pass_switch_sm_state (ctxt);
+}
+
 /* Target-specific selftests.  */
 
 #if CHECKING_P
@@ -27683,6 +28481,9 @@ aarch64_run_selftests (void)
 #undef TARGET_CALLEE_COPIES
 #define TARGET_CALLEE_COPIES hook_bool_CUMULATIVE_ARGS_arg_info_false
 
+#undef TARGET_FRAME_POINTER_REQUIRED
+#define TARGET_FRAME_POINTER_REQUIRED aarch64_frame_pointer_required
+
 #undef TARGET_CAN_ELIMINATE
 #define TARGET_CAN_ELIMINATE aarch64_can_eliminate
 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 8359cf709c1..f23edea35f5 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -255,6 +255,10 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
 /* The current function is a normal non-streaming function.  */
 #define TARGET_NON_STREAMING (AARCH64_ISA_SM_OFF)
 
+/* The current function has a streaming-compatible body.  */
+#define TARGET_STREAMING_COMPATIBLE \
+  ((aarch64_isa_flags & AARCH64_FL_SM_STATE) == 0)
+
 /* Crypto is an optional extension to AdvSIMD.  */
 #define TARGET_CRYPTO (AARCH64_ISA_CRYPTO)
 
@@ -304,6 +308,10 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
 /* SVE2 SM4 instructions, enabled through +sve2-sm4.  */
 #define TARGET_SVE2_SM4 (AARCH64_ISA_SVE2_SM4 && TARGET_NON_STREAMING)
 
+/* SME instructions, enabled through +sme.  Note that this does not
+   imply anything about the state of PSTATE.SM.  */
+#define TARGET_SME (AARCH64_ISA_SME)
+
 /* ARMv8.3-A features.  */
 #define TARGET_ARMV8_3	(AARCH64_ISA_V8_3A)
 
@@ -802,6 +810,13 @@ struct GTY (()) aarch64_frame
      STACK_BOUNDARY.  */
   poly_int64 locals_offset;
 
+  /* The offset from the base of the frame of a 64-bit slot whose low
+     bit contains the incoming value of PSTATE.SM.  This slot must be
+     within reach of the hard frame pointer.
+
+     The offset is -1 if such a slot isn't needed.  */
+  poly_int64 old_svcr_offset;
+
   /* Offset from the base of the frame (incomming SP) to the
      hard_frame_pointer.  This value is always a multiple of
      STACK_BOUNDARY.  */
@@ -884,6 +899,10 @@ typedef struct GTY (()) machine_function
   /* One entry for each general purpose register.  */
   rtx call_via[SP_REGNUM];
   bool label_is_assembled;
+  /* True if we've expanded at least one call to a function that changes
+     PSTATE.SM.  This should only be used for saving compile time: false
+     guarantees that no such mode switch exists.  */
+  bool call_switches_sm_state;
 } machine_function;
 #endif
 
@@ -948,6 +967,12 @@ typedef struct
 				   stack arg area so far.  */
   bool silent_p;		/* True if we should act silently, rather than
 				   raise an error for invalid calls.  */
+
+  /* A list of registers that need to be saved and restored around a
+     change to PSTATE.SM.  An auto_vec would be more convenient, but those
+     can't be copied.  */
+  unsigned int num_sme_mode_switch_args;
+  rtx sme_mode_switch_args[12];
 } CUMULATIVE_ARGS;
 #endif
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 3dc877ba9fe..991f46fbc80 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -940,7 +940,7 @@ (define_insn "*cb<optab><mode>1"
 		      (const_int 1)))]
 )
 
-(define_insn "*tb<optab><mode>1"
+(define_insn "@aarch64_tb<optab><mode>"
   [(set (pc) (if_then_else
 	      (EQL (zero_extract:DI (match_operand:GPI 0 "register_operand" "r")
 				    (const_int 1)
@@ -1027,7 +1027,7 @@ (define_expand "call"
   [(parallel
      [(call (match_operand 0 "memory_operand")
 	    (match_operand 1 "general_operand"))
-      (unspec:DI [(match_operand 2 "const_int_operand")] UNSPEC_CALLEE_ABI)
+      (unspec:DI [(match_operand 2)] UNSPEC_CALLEE_ABI)
       (clobber (reg:DI LR_REGNUM))])]
   ""
   "
@@ -1053,7 +1053,7 @@ (define_expand "call_value"
      [(set (match_operand 0 "")
 	   (call (match_operand 1 "memory_operand")
 		 (match_operand 2 "general_operand")))
-     (unspec:DI [(match_operand 3 "const_int_operand")] UNSPEC_CALLEE_ABI)
+     (unspec:DI [(match_operand 3)] UNSPEC_CALLEE_ABI)
      (clobber (reg:DI LR_REGNUM))])]
   ""
   "
@@ -1080,7 +1080,7 @@ (define_expand "sibcall"
   [(parallel
      [(call (match_operand 0 "memory_operand")
 	    (match_operand 1 "general_operand"))
-      (unspec:DI [(match_operand 2 "const_int_operand")] UNSPEC_CALLEE_ABI)
+      (unspec:DI [(match_operand 2)] UNSPEC_CALLEE_ABI)
       (return)])]
   ""
   {
@@ -1094,7 +1094,7 @@ (define_expand "sibcall_value"
      [(set (match_operand 0 "")
 	   (call (match_operand 1 "memory_operand")
 		 (match_operand 2 "general_operand")))
-      (unspec:DI [(match_operand 3 "const_int_operand")] UNSPEC_CALLEE_ABI)
+      (unspec:DI [(match_operand 3)] UNSPEC_CALLEE_ABI)
       (return)])]
   ""
   {
@@ -7783,3 +7783,6 @@ (define_insn "st64bv0"
 
 ;; SVE2.
 (include "aarch64-sve2.md")
+
+;; SME and extensions
+(include "aarch64-sme.md")
diff --git a/gcc/config/aarch64/t-aarch64 b/gcc/config/aarch64/t-aarch64
index 47a753c5f1b..c1c8f5c7dae 100644
--- a/gcc/config/aarch64/t-aarch64
+++ b/gcc/config/aarch64/t-aarch64
@@ -186,7 +186,8 @@ MULTILIB_DIRNAMES   = $(subst $(comma), ,$(TM_MULTILIB_CONFIG))
 insn-conditions.md: s-check-sve-md
 s-check-sve-md: $(srcdir)/config/aarch64/check-sve-md.awk \
 		$(srcdir)/config/aarch64/aarch64-sve.md \
-		$(srcdir)/config/aarch64/aarch64-sve2.md
+		$(srcdir)/config/aarch64/aarch64-sve2.md \
+		$(srcdir)/config/aarch64/aarch64-sme.md
 	$(AWK) -f $(srcdir)/config/aarch64/check-sve-md.awk \
 	  $(srcdir)/config/aarch64/aarch64-sve.md
 	$(AWK) -f $(srcdir)/config/aarch64/check-sve-md.awk \
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_1.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_1.c
new file mode 100644
index 00000000000..b4931c1bc37
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_1.c
@@ -0,0 +1,195 @@
+// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls" }
+// { dg-final { check-function-bodies "**" "" } }
+
+void ns_callee ();
+__attribute__((arm_streaming)) void s_callee ();
+__attribute__((arm_streaming_compatible)) void sc_callee ();
+
+struct callbacks {
+  void (*ns_ptr) ();
+  __attribute__((arm_streaming)) void (*s_ptr) ();
+  __attribute__((arm_streaming_compatible)) void (*sc_ptr) ();
+};
+
+/*
+** n_caller:	{ target lp64 }
+**	stp	(x19|x2[0-8]), x30, \[sp, #?-80\]!
+**	stp	d8, d9, \[sp, #?16\]
+**	stp	d10, d11, \[sp, #?32\]
+**	stp	d12, d13, \[sp, #?48\]
+**	stp	d14, d15, \[sp, #?64\]
+**	mov	\1, x0
+**	bl	ns_callee
+**	smstart	sm
+**	bl	s_callee
+**	smstop	sm
+**	bl	sc_callee
+**	ldr	(x[0-9]+), \[\1\]
+**	blr	\2
+**	ldr	(x[0-9]+), \[\1, #?8\]
+**	smstart	sm
+**	blr	\3
+**	smstop	sm
+**	ldr	(x[0-9]+), \[\1, #?16\]
+**	blr	\4
+**	ldp	d8, d9, \[sp, #?16\]
+**	ldp	d10, d11, \[sp, #?32\]
+**	ldp	d12, d13, \[sp, #?48\]
+**	ldp	d14, d15, \[sp, #?64\]
+**	ldp	\1, x30, \[sp\], #?80
+**	ret
+*/
+void
+n_caller (struct callbacks *c)
+{
+  ns_callee ();
+  s_callee ();
+  sc_callee ();
+
+  c->ns_ptr ();
+  c->s_ptr ();
+  c->sc_ptr ();
+}
+
+/*
+** s_caller:	{ target lp64 }
+**	stp	(x19|x2[0-8]), x30, \[sp, #?-80\]!
+**	stp	d8, d9, \[sp, #?16\]
+**	stp	d10, d11, \[sp, #?32\]
+**	stp	d12, d13, \[sp, #?48\]
+**	stp	d14, d15, \[sp, #?64\]
+**	mov	\1, x0
+**	smstop	sm
+**	bl	ns_callee
+**	smstart	sm
+**	bl	s_callee
+**	bl	sc_callee
+**	ldr	(x[0-9]+), \[\1\]
+**	smstop	sm
+**	blr	\2
+**	smstart	sm
+**	ldr	(x[0-9]+), \[\1, #?8\]
+**	blr	\3
+**	ldr	(x[0-9]+), \[\1, #?16\]
+**	blr	\4
+**	ldp	d8, d9, \[sp, #?16\]
+**	ldp	d10, d11, \[sp, #?32\]
+**	ldp	d12, d13, \[sp, #?48\]
+**	ldp	d14, d15, \[sp, #?64\]
+**	ldp	\1, x30, \[sp\], #?80
+**	ret
+*/
+void __attribute__((arm_streaming))
+s_caller (struct callbacks *c)
+{
+  ns_callee ();
+  s_callee ();
+  sc_callee ();
+
+  c->ns_ptr ();
+  c->s_ptr ();
+  c->sc_ptr ();
+}
+
+/*
+** sc_caller_sme:
+**	stp	x29, x30, \[sp, #?-96\]!
+**	mov	x29, sp
+**	stp	d8, d9, \[sp, #?32\]
+**	stp	d10, d11, \[sp, #?48\]
+**	stp	d12, d13, \[sp, #?64\]
+**	stp	d14, d15, \[sp, #?80\]
+**	mrs	x16, svcr
+**	str	x16, \[x29, #?16\]
+**	ldr	x16, \[x29, #?16\]
+**	tbz	x16, 0, .*
+**	smstop	sm
+**	bl	ns_callee
+**	ldr	x16, \[x29, #?16\]
+**	tbz	x16, 0, .*
+**	smstart	sm
+**	ldr	x16, \[x29, #?16\]
+**	tbnz	x16, 0, .*
+**	smstart	sm
+**	bl	s_callee
+**	ldr	x16, \[x29, #?16\]
+**	tbnz	x16, 0, .*
+**	smstop	sm
+**	bl	sc_callee
+**	ldp	d8, d9, \[sp, #?32\]
+**	ldp	d10, d11, \[sp, #?48\]
+**	ldp	d12, d13, \[sp, #?64\]
+**	ldp	d14, d15, \[sp, #?80\]
+**	ldp	x29, x30, \[sp\], #?96
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+sc_caller_sme ()
+{
+  ns_callee ();
+  s_callee ();
+  sc_callee ();
+}
+
+#pragma GCC target "+nosme"
+
+/*
+** sc_caller:
+**	stp	x29, x30, \[sp, #?-96\]!
+**	mov	x29, sp
+**	stp	d8, d9, \[sp, #?32\]
+**	stp	d10, d11, \[sp, #?48\]
+**	stp	d12, d13, \[sp, #?64\]
+**	stp	d14, d15, \[sp, #?80\]
+**	bl	__arm_sme_state
+**	str	x0, \[x29, #?16\]
+**	...
+**	bl	sc_callee
+**	ldp	d8, d9, \[sp, #?32\]
+**	ldp	d10, d11, \[sp, #?48\]
+**	ldp	d12, d13, \[sp, #?64\]
+**	ldp	d14, d15, \[sp, #?80\]
+**	ldp	x29, x30, \[sp\], #?96
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+sc_caller ()
+{
+  ns_callee ();
+  sc_callee ();
+}
+
+/*
+** sc_caller_x0:
+**	...
+**	mov	x10, x0
+**	bl	__arm_sme_state
+**	...
+**	str	wzr, \[x10\]
+**	...
+*/
+void __attribute__((arm_streaming_compatible))
+sc_caller_x0 (int *ptr)
+{
+  *ptr = 0;
+  ns_callee ();
+  sc_callee ();
+}
+
+/*
+** sc_caller_x1:
+**	...
+**	mov	x10, x0
+**	mov	x11, x1
+**	bl	__arm_sme_state
+**	...
+**	str	w11, \[x10\]
+**	...
+*/
+void __attribute__((arm_streaming_compatible))
+sc_caller_x1 (int *ptr, int a)
+{
+  *ptr = a;
+  ns_callee ();
+  sc_callee ();
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_10.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_10.c
new file mode 100644
index 00000000000..f70378541a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_10.c
@@ -0,0 +1,37 @@
+// { dg-options "" }
+
+#pragma GCC target "+nosme"
+
+void ns_callee ();
+__attribute__((arm_streaming)) void s_callee ();
+__attribute__((arm_streaming_compatible)) void sc_callee ();
+
+struct callbacks {
+  void (*ns_ptr) ();
+  __attribute__((arm_streaming)) void (*s_ptr) ();
+  __attribute__((arm_streaming_compatible)) void (*sc_ptr) ();
+};
+
+void
+n_caller (struct callbacks *c)
+{
+  ns_callee ();
+  s_callee (); // { dg-error "calling a streaming function requires the ISA extension 'sme'" }
+  sc_callee ();
+
+  c->ns_ptr ();
+  c->s_ptr (); // { dg-error "calling a streaming function requires the ISA extension 'sme'" }
+  c->sc_ptr ();
+}
+
+void __attribute__((arm_streaming_compatible))
+sc_caller_sme (struct callbacks *c)
+{
+  ns_callee ();
+  s_callee (); // { dg-error "calling a streaming function requires the ISA extension 'sme'" }
+  sc_callee ();
+
+  c->ns_ptr ();
+  c->s_ptr (); // { dg-error "calling a streaming function requires the ISA extension 'sme'" }
+  c->sc_ptr ();
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_2.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_2.c
new file mode 100644
index 00000000000..9a1b646a2f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_2.c
@@ -0,0 +1,43 @@
+// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls" }
+
+void ns_callee ();
+__attribute__((arm_streaming)) void s_callee ();
+__attribute__((arm_streaming_compatible)) void sc_callee ();
+
+struct callbacks {
+  void (*ns_ptr) ();
+  __attribute__((arm_streaming)) void (*s_ptr) ();
+  __attribute__((arm_streaming_compatible)) void (*sc_ptr) ();
+};
+
+void
+n_caller (struct callbacks *c)
+{
+  ns_callee ();
+  sc_callee ();
+
+  c->ns_ptr ();
+  c->sc_ptr ();
+}
+
+void __attribute__((arm_streaming))
+s_caller (struct callbacks *c)
+{
+  s_callee ();
+  sc_callee ();
+
+  c->s_ptr ();
+  c->sc_ptr ();
+}
+
+void __attribute__((arm_streaming_compatible))
+sc_caller (struct callbacks *c)
+{
+  sc_callee ();
+
+  c->sc_ptr ();
+}
+
+// { dg-final { scan-assembler-not {[dpqz][0-9]+,} } }
+// { dg-final { scan-assembler-not {smstart\tsm} } }
+// { dg-final { scan-assembler-not {smstop\tsm} } }
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_3.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_3.c
new file mode 100644
index 00000000000..9ad6b2c1fff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_3.c
@@ -0,0 +1,156 @@
+// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls" }
+// { dg-final { check-function-bodies "**" "" } }
+
+__attribute__((aarch64_vector_pcs)) void ns_callee ();
+__attribute__((arm_streaming, aarch64_vector_pcs)) void s_callee ();
+__attribute__((arm_streaming_compatible, aarch64_vector_pcs)) void sc_callee ();
+
+struct callbacks {
+  __attribute__((aarch64_vector_pcs)) void (*ns_ptr) ();
+  __attribute__((arm_streaming, aarch64_vector_pcs)) void (*s_ptr) ();
+  __attribute__((arm_streaming_compatible, aarch64_vector_pcs)) void (*sc_ptr) ();
+};
+
+/*
+** n_caller:	{ target lp64 }
+**	stp	(x19|x2[0-8]), x30, \[sp, #?-272\]!
+**	stp	q8, q9, \[sp, #?16\]
+**	stp	q10, q11, \[sp, #?48\]
+**	stp	q12, q13, \[sp, #?80\]
+**	stp	q14, q15, \[sp, #?112\]
+**	stp	q16, q17, \[sp, #?144\]
+**	stp	q18, q19, \[sp, #?176\]
+**	stp	q20, q21, \[sp, #?208\]
+**	stp	q22, q23, \[sp, #?240\]
+**	mov	\1, x0
+**	bl	ns_callee
+**	smstart	sm
+**	bl	s_callee
+**	smstop	sm
+**	bl	sc_callee
+**	ldr	(x[0-9]+), \[\1\]
+**	blr	\2
+**	ldr	(x[0-9]+), \[\1, #?8\]
+**	smstart	sm
+**	blr	\3
+**	smstop	sm
+**	ldr	(x[0-9]+), \[\1, #?16\]
+**	blr	\4
+**	ldp	q8, q9, \[sp, #?16\]
+**	ldp	q10, q11, \[sp, #?48\]
+**	ldp	q12, q13, \[sp, #?80\]
+**	ldp	q14, q15, \[sp, #?112\]
+**	ldp	q16, q17, \[sp, #?144\]
+**	ldp	q18, q19, \[sp, #?176\]
+**	ldp	q20, q21, \[sp, #?208\]
+**	ldp	q22, q23, \[sp, #?240\]
+**	ldp	\1, x30, \[sp\], #?272
+**	ret
+*/
+void __attribute__((aarch64_vector_pcs))
+n_caller (struct callbacks *c)
+{
+  ns_callee ();
+  s_callee ();
+  sc_callee ();
+
+  c->ns_ptr ();
+  c->s_ptr ();
+  c->sc_ptr ();
+}
+
+/*
+** s_caller:	{ target lp64 }
+**	stp	(x19|x2[0-8]), x30, \[sp, #?-272\]!
+**	stp	q8, q9, \[sp, #?16\]
+**	stp	q10, q11, \[sp, #?48\]
+**	stp	q12, q13, \[sp, #?80\]
+**	stp	q14, q15, \[sp, #?112\]
+**	stp	q16, q17, \[sp, #?144\]
+**	stp	q18, q19, \[sp, #?176\]
+**	stp	q20, q21, \[sp, #?208\]
+**	stp	q22, q23, \[sp, #?240\]
+**	mov	\1, x0
+**	smstop	sm
+**	bl	ns_callee
+**	smstart	sm
+**	bl	s_callee
+**	bl	sc_callee
+**	ldr	(x[0-9]+), \[\1\]
+**	smstop	sm
+**	blr	\2
+**	smstart	sm
+**	ldr	(x[0-9]+), \[\1, #?8\]
+**	blr	\3
+**	ldr	(x[0-9]+), \[\1, #?16\]
+**	blr	\4
+**	ldp	q8, q9, \[sp, #?16\]
+**	ldp	q10, q11, \[sp, #?48\]
+**	ldp	q12, q13, \[sp, #?80\]
+**	ldp	q14, q15, \[sp, #?112\]
+**	ldp	q16, q17, \[sp, #?144\]
+**	ldp	q18, q19, \[sp, #?176\]
+**	ldp	q20, q21, \[sp, #?208\]
+**	ldp	q22, q23, \[sp, #?240\]
+**	ldp	\1, x30, \[sp\], #?272
+**	ret
+*/
+void __attribute__((arm_streaming, aarch64_vector_pcs))
+s_caller (struct callbacks *c)
+{
+  ns_callee ();
+  s_callee ();
+  sc_callee ();
+
+  c->ns_ptr ();
+  c->s_ptr ();
+  c->sc_ptr ();
+}
+
+/*
+** sc_caller:
+**	stp	x29, x30, \[sp, #?-288\]!
+**	mov	x29, sp
+**	stp	q8, q9, \[sp, #?32\]
+**	stp	q10, q11, \[sp, #?64\]
+**	stp	q12, q13, \[sp, #?96\]
+**	stp	q14, q15, \[sp, #?128\]
+**	stp	q16, q17, \[sp, #?160\]
+**	stp	q18, q19, \[sp, #?192\]
+**	stp	q20, q21, \[sp, #?224\]
+**	stp	q22, q23, \[sp, #?256\]
+**	mrs	x16, svcr
+**	str	x16, \[x29, #?16\]
+**	ldr	x16, \[x29, #?16\]
+**	tbz	x16, 0, .*
+**	smstop	sm
+**	bl	ns_callee
+**	ldr	x16, \[x29, #?16\]
+**	tbz	x16, 0, .*
+**	smstart	sm
+**	ldr	x16, \[x29, #?16\]
+**	tbnz	x16, 0, .*
+**	smstart	sm
+**	bl	s_callee
+**	ldr	x16, \[x29, #?16\]
+**	tbnz	x16, 0, .*
+**	smstop	sm
+**	bl	sc_callee
+**	ldp	q8, q9, \[sp, #?32\]
+**	ldp	q10, q11, \[sp, #?64\]
+**	ldp	q12, q13, \[sp, #?96\]
+**	ldp	q14, q15, \[sp, #?128\]
+**	ldp	q16, q17, \[sp, #?160\]
+**	ldp	q18, q19, \[sp, #?192\]
+**	ldp	q20, q21, \[sp, #?224\]
+**	ldp	q22, q23, \[sp, #?256\]
+**	ldp	x29, x30, \[sp\], #?288
+**	ret
+*/
+void __attribute__((arm_streaming_compatible, aarch64_vector_pcs))
+sc_caller ()
+{
+  ns_callee ();
+  s_callee ();
+  sc_callee ();
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_4.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_4.c
new file mode 100644
index 00000000000..1dd1eeb2439
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_4.c
@@ -0,0 +1,43 @@
+// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls" }
+
+__attribute__((aarch64_vector_pcs)) void ns_callee ();
+__attribute__((aarch64_vector_pcs, arm_streaming)) void s_callee ();
+__attribute__((aarch64_vector_pcs, arm_streaming_compatible)) void sc_callee ();
+
+struct callbacks {
+  __attribute__((aarch64_vector_pcs)) void (*ns_ptr) ();
+  __attribute__((aarch64_vector_pcs, arm_streaming)) void (*s_ptr) ();
+  __attribute__((aarch64_vector_pcs, arm_streaming_compatible)) void (*sc_ptr) ();
+};
+
+void __attribute__((aarch64_vector_pcs))
+n_caller (struct callbacks *c)
+{
+  ns_callee ();
+  sc_callee ();
+
+  c->ns_ptr ();
+  c->sc_ptr ();
+}
+
+void __attribute__((aarch64_vector_pcs, arm_streaming))
+s_caller (struct callbacks *c)
+{
+  s_callee ();
+  sc_callee ();
+
+  c->s_ptr ();
+  c->sc_ptr ();
+}
+
+void __attribute__((aarch64_vector_pcs, arm_streaming_compatible))
+sc_caller (struct callbacks *c)
+{
+  sc_callee ();
+
+  c->sc_ptr ();
+}
+
+// { dg-final { scan-assembler-not {[dpqz][0-9]+,} } }
+// { dg-final { scan-assembler-not {smstart\tsm} } }
+// { dg-final { scan-assembler-not {smstop\tsm} } }
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5.c
new file mode 100644
index 00000000000..e9f7af16445
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5.c
@@ -0,0 +1,308 @@
+// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls" }
+// { dg-final { check-function-bodies "**" "" } }
+
+#include <arm_sve.h>
+
+svbool_t ns_callee ();
+__attribute__((arm_streaming)) svbool_t s_callee ();
+__attribute__((arm_streaming_compatible)) svbool_t sc_callee ();
+
+struct callbacks {
+  svbool_t (*ns_ptr) ();
+  __attribute__((arm_streaming)) svbool_t (*s_ptr) ();
+  __attribute__((arm_streaming_compatible)) svbool_t (*sc_ptr) ();
+};
+
+/*
+** n_caller:	{ target lp64 }
+**	stp	(x19|x2[0-8]), x30, \[sp, #?-16\]!
+**	addvl	sp, sp, #-18
+**	str	p4, \[sp\]
+**	str	p5, \[sp, #1, mul vl\]
+**	str	p6, \[sp, #2, mul vl\]
+**	str	p7, \[sp, #3, mul vl\]
+**	str	p8, \[sp, #4, mul vl\]
+**	str	p9, \[sp, #5, mul vl\]
+**	str	p10, \[sp, #6, mul vl\]
+**	str	p11, \[sp, #7, mul vl\]
+**	str	p12, \[sp, #8, mul vl\]
+**	str	p13, \[sp, #9, mul vl\]
+**	str	p14, \[sp, #10, mul vl\]
+**	str	p15, \[sp, #11, mul vl\]
+**	str	z8, \[sp, #2, mul vl\]
+**	str	z9, \[sp, #3, mul vl\]
+**	str	z10, \[sp, #4, mul vl\]
+**	str	z11, \[sp, #5, mul vl\]
+**	str	z12, \[sp, #6, mul vl\]
+**	str	z13, \[sp, #7, mul vl\]
+**	str	z14, \[sp, #8, mul vl\]
+**	str	z15, \[sp, #9, mul vl\]
+**	str	z16, \[sp, #10, mul vl\]
+**	str	z17, \[sp, #11, mul vl\]
+**	str	z18, \[sp, #12, mul vl\]
+**	str	z19, \[sp, #13, mul vl\]
+**	str	z20, \[sp, #14, mul vl\]
+**	str	z21, \[sp, #15, mul vl\]
+**	str	z22, \[sp, #16, mul vl\]
+**	str	z23, \[sp, #17, mul vl\]
+**	mov	\1, x0
+**	bl	ns_callee
+**	smstart	sm
+**	bl	s_callee
+**	addvl	sp, sp, #-1
+**	str	p0, \[sp\]
+**	smstop	sm
+**	ldr	p0, \[sp\]
+**	addvl	sp, sp, #1
+**	bl	sc_callee
+**	ldr	(x[0-9]+), \[\1\]
+**	blr	\2
+**	ldr	(x[0-9]+), \[\1, #?8\]
+**	smstart	sm
+**	blr	\3
+**	addvl	sp, sp, #-1
+**	str	p0, \[sp\]
+**	smstop	sm
+**	ldr	p0, \[sp\]
+**	addvl	sp, sp, #1
+**	ldr	(x[0-9]+), \[\1, #?16\]
+**	blr	\4
+**	ldr	z8, \[sp, #2, mul vl\]
+**	ldr	z9, \[sp, #3, mul vl\]
+**	ldr	z10, \[sp, #4, mul vl\]
+**	ldr	z11, \[sp, #5, mul vl\]
+**	ldr	z12, \[sp, #6, mul vl\]
+**	ldr	z13, \[sp, #7, mul vl\]
+**	ldr	z14, \[sp, #8, mul vl\]
+**	ldr	z15, \[sp, #9, mul vl\]
+**	ldr	z16, \[sp, #10, mul vl\]
+**	ldr	z17, \[sp, #11, mul vl\]
+**	ldr	z18, \[sp, #12, mul vl\]
+**	ldr	z19, \[sp, #13, mul vl\]
+**	ldr	z20, \[sp, #14, mul vl\]
+**	ldr	z21, \[sp, #15, mul vl\]
+**	ldr	z22, \[sp, #16, mul vl\]
+**	ldr	z23, \[sp, #17, mul vl\]
+**	ldr	p4, \[sp\]
+**	ldr	p5, \[sp, #1, mul vl\]
+**	ldr	p6, \[sp, #2, mul vl\]
+**	ldr	p7, \[sp, #3, mul vl\]
+**	ldr	p8, \[sp, #4, mul vl\]
+**	ldr	p9, \[sp, #5, mul vl\]
+**	ldr	p10, \[sp, #6, mul vl\]
+**	ldr	p11, \[sp, #7, mul vl\]
+**	ldr	p12, \[sp, #8, mul vl\]
+**	ldr	p13, \[sp, #9, mul vl\]
+**	ldr	p14, \[sp, #10, mul vl\]
+**	ldr	p15, \[sp, #11, mul vl\]
+**	addvl	sp, sp, #18
+**	ldp	\1, x30, \[sp\], #?16
+**	ret
+*/
+svbool_t
+n_caller (struct callbacks *c)
+{
+  ns_callee ();
+  s_callee ();
+  sc_callee ();
+
+  c->ns_ptr ();
+  c->s_ptr ();
+  return c->sc_ptr ();
+}
+
+/*
+** s_caller:	{ target lp64 }
+**	stp	(x19|x2[0-8]), x30, \[sp, #?-16\]!
+**	addvl	sp, sp, #-18
+**	str	p4, \[sp\]
+**	str	p5, \[sp, #1, mul vl\]
+**	str	p6, \[sp, #2, mul vl\]
+**	str	p7, \[sp, #3, mul vl\]
+**	str	p8, \[sp, #4, mul vl\]
+**	str	p9, \[sp, #5, mul vl\]
+**	str	p10, \[sp, #6, mul vl\]
+**	str	p11, \[sp, #7, mul vl\]
+**	str	p12, \[sp, #8, mul vl\]
+**	str	p13, \[sp, #9, mul vl\]
+**	str	p14, \[sp, #10, mul vl\]
+**	str	p15, \[sp, #11, mul vl\]
+**	str	z8, \[sp, #2, mul vl\]
+**	str	z9, \[sp, #3, mul vl\]
+**	str	z10, \[sp, #4, mul vl\]
+**	str	z11, \[sp, #5, mul vl\]
+**	str	z12, \[sp, #6, mul vl\]
+**	str	z13, \[sp, #7, mul vl\]
+**	str	z14, \[sp, #8, mul vl\]
+**	str	z15, \[sp, #9, mul vl\]
+**	str	z16, \[sp, #10, mul vl\]
+**	str	z17, \[sp, #11, mul vl\]
+**	str	z18, \[sp, #12, mul vl\]
+**	str	z19, \[sp, #13, mul vl\]
+**	str	z20, \[sp, #14, mul vl\]
+**	str	z21, \[sp, #15, mul vl\]
+**	str	z22, \[sp, #16, mul vl\]
+**	str	z23, \[sp, #17, mul vl\]
+**	mov	\1, x0
+**	smstop	sm
+**	bl	ns_callee
+**	addvl	sp, sp, #-1
+**	str	p0, \[sp\]
+**	smstart	sm
+**	ldr	p0, \[sp\]
+**	addvl	sp, sp, #1
+**	bl	s_callee
+**	bl	sc_callee
+**	ldr	(x[0-9]+), \[\1\]
+**	smstop	sm
+**	blr	\2
+**	addvl	sp, sp, #-1
+**	str	p0, \[sp\]
+**	smstart	sm
+**	ldr	p0, \[sp\]
+**	addvl	sp, sp, #1
+**	ldr	(x[0-9]+), \[\1, #?8\]
+**	blr	\3
+**	ldr	(x[0-9]+), \[\1, #?16\]
+**	blr	\4
+**	ldr	z8, \[sp, #2, mul vl\]
+**	ldr	z9, \[sp, #3, mul vl\]
+**	ldr	z10, \[sp, #4, mul vl\]
+**	ldr	z11, \[sp, #5, mul vl\]
+**	ldr	z12, \[sp, #6, mul vl\]
+**	ldr	z13, \[sp, #7, mul vl\]
+**	ldr	z14, \[sp, #8, mul vl\]
+**	ldr	z15, \[sp, #9, mul vl\]
+**	ldr	z16, \[sp, #10, mul vl\]
+**	ldr	z17, \[sp, #11, mul vl\]
+**	ldr	z18, \[sp, #12, mul vl\]
+**	ldr	z19, \[sp, #13, mul vl\]
+**	ldr	z20, \[sp, #14, mul vl\]
+**	ldr	z21, \[sp, #15, mul vl\]
+**	ldr	z22, \[sp, #16, mul vl\]
+**	ldr	z23, \[sp, #17, mul vl\]
+**	ldr	p4, \[sp\]
+**	ldr	p5, \[sp, #1, mul vl\]
+**	ldr	p6, \[sp, #2, mul vl\]
+**	ldr	p7, \[sp, #3, mul vl\]
+**	ldr	p8, \[sp, #4, mul vl\]
+**	ldr	p9, \[sp, #5, mul vl\]
+**	ldr	p10, \[sp, #6, mul vl\]
+**	ldr	p11, \[sp, #7, mul vl\]
+**	ldr	p12, \[sp, #8, mul vl\]
+**	ldr	p13, \[sp, #9, mul vl\]
+**	ldr	p14, \[sp, #10, mul vl\]
+**	ldr	p15, \[sp, #11, mul vl\]
+**	addvl	sp, sp, #18
+**	ldp	\1, x30, \[sp\], #?16
+**	ret
+*/
+svbool_t __attribute__((arm_streaming))
+s_caller (struct callbacks *c)
+{
+  ns_callee ();
+  s_callee ();
+  sc_callee ();
+
+  c->ns_ptr ();
+  c->s_ptr ();
+  return c->sc_ptr ();
+}
+
+/*
+** sc_caller:
+**	stp	x29, x30, \[sp, #?-32\]!
+**	mov	x29, sp
+**	addvl	sp, sp, #-18
+**	str	p4, \[sp\]
+**	str	p5, \[sp, #1, mul vl\]
+**	str	p6, \[sp, #2, mul vl\]
+**	str	p7, \[sp, #3, mul vl\]
+**	str	p8, \[sp, #4, mul vl\]
+**	str	p9, \[sp, #5, mul vl\]
+**	str	p10, \[sp, #6, mul vl\]
+**	str	p11, \[sp, #7, mul vl\]
+**	str	p12, \[sp, #8, mul vl\]
+**	str	p13, \[sp, #9, mul vl\]
+**	str	p14, \[sp, #10, mul vl\]
+**	str	p15, \[sp, #11, mul vl\]
+**	str	z8, \[sp, #2, mul vl\]
+**	str	z9, \[sp, #3, mul vl\]
+**	str	z10, \[sp, #4, mul vl\]
+**	str	z11, \[sp, #5, mul vl\]
+**	str	z12, \[sp, #6, mul vl\]
+**	str	z13, \[sp, #7, mul vl\]
+**	str	z14, \[sp, #8, mul vl\]
+**	str	z15, \[sp, #9, mul vl\]
+**	str	z16, \[sp, #10, mul vl\]
+**	str	z17, \[sp, #11, mul vl\]
+**	str	z18, \[sp, #12, mul vl\]
+**	str	z19, \[sp, #13, mul vl\]
+**	str	z20, \[sp, #14, mul vl\]
+**	str	z21, \[sp, #15, mul vl\]
+**	str	z22, \[sp, #16, mul vl\]
+**	str	z23, \[sp, #17, mul vl\]
+**	mrs	x16, svcr
+**	str	x16, \[x29, #?16\]
+**	ldr	x16, \[x29, #?16\]
+**	tbz	x16, 0, .*
+**	smstop	sm
+**	bl	ns_callee
+**	ldr	x16, \[x29, #?16\]
+**	tbz	x16, 0, .*
+**	addvl	sp, sp, #-1
+**	str	p0, \[sp\]
+**	smstart	sm
+**	ldr	p0, \[sp\]
+**	addvl	sp, sp, #1
+**	ldr	x16, \[x29, #?16\]
+**	tbnz	x16, 0, .*
+**	smstart	sm
+**	bl	s_callee
+**	ldr	x16, \[x29, #?16\]
+**	tbnz	x16, 0, .*
+**	addvl	sp, sp, #-1
+**	str	p0, \[sp\]
+**	smstop	sm
+**	ldr	p0, \[sp\]
+**	addvl	sp, sp, #1
+**	bl	sc_callee
+**	ldr	z8, \[sp, #2, mul vl\]
+**	ldr	z9, \[sp, #3, mul vl\]
+**	ldr	z10, \[sp, #4, mul vl\]
+**	ldr	z11, \[sp, #5, mul vl\]
+**	ldr	z12, \[sp, #6, mul vl\]
+**	ldr	z13, \[sp, #7, mul vl\]
+**	ldr	z14, \[sp, #8, mul vl\]
+**	ldr	z15, \[sp, #9, mul vl\]
+**	ldr	z16, \[sp, #10, mul vl\]
+**	ldr	z17, \[sp, #11, mul vl\]
+**	ldr	z18, \[sp, #12, mul vl\]
+**	ldr	z19, \[sp, #13, mul vl\]
+**	ldr	z20, \[sp, #14, mul vl\]
+**	ldr	z21, \[sp, #15, mul vl\]
+**	ldr	z22, \[sp, #16, mul vl\]
+**	ldr	z23, \[sp, #17, mul vl\]
+**	ldr	p4, \[sp\]
+**	ldr	p5, \[sp, #1, mul vl\]
+**	ldr	p6, \[sp, #2, mul vl\]
+**	ldr	p7, \[sp, #3, mul vl\]
+**	ldr	p8, \[sp, #4, mul vl\]
+**	ldr	p9, \[sp, #5, mul vl\]
+**	ldr	p10, \[sp, #6, mul vl\]
+**	ldr	p11, \[sp, #7, mul vl\]
+**	ldr	p12, \[sp, #8, mul vl\]
+**	ldr	p13, \[sp, #9, mul vl\]
+**	ldr	p14, \[sp, #10, mul vl\]
+**	ldr	p15, \[sp, #11, mul vl\]
+**	addvl	sp, sp, #18
+**	ldp	x29, x30, \[sp\], #?32
+**	ret
+*/
+svbool_t __attribute__((arm_streaming_compatible))
+sc_caller ()
+{
+  ns_callee ();
+  s_callee ();
+  return sc_callee ();
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_6.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_6.c
new file mode 100644
index 00000000000..507e2856a8c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_6.c
@@ -0,0 +1,45 @@
+// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls" }
+
+#include <arm_sve.h>
+
+svbool_t ns_callee ();
+__attribute__((arm_streaming)) svbool_t s_callee ();
+__attribute__((arm_streaming_compatible)) svbool_t sc_callee ();
+
+struct callbacks {
+  svbool_t (*ns_ptr) ();
+  __attribute__((arm_streaming)) svbool_t (*s_ptr) ();
+  __attribute__((arm_streaming_compatible)) svbool_t (*sc_ptr) ();
+};
+
+svbool_t
+n_caller (struct callbacks *c)
+{
+  ns_callee ();
+  sc_callee ();
+
+  c->ns_ptr ();
+  return c->sc_ptr ();
+}
+
+svbool_t __attribute__((arm_streaming))
+s_caller (struct callbacks *c)
+{
+  s_callee ();
+  sc_callee ();
+
+  c->s_ptr ();
+  return c->sc_ptr ();
+}
+
+svbool_t __attribute__((arm_streaming_compatible))
+sc_caller (struct callbacks *c)
+{
+  sc_callee ();
+
+  return c->sc_ptr ();
+}
+
+// { dg-final { scan-assembler-not {[dpqz][0-9]+,} } }
+// { dg-final { scan-assembler-not {smstart\tsm} } }
+// { dg-final { scan-assembler-not {smstop\tsm} } }
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_7.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_7.c
new file mode 100644
index 00000000000..af4d64b26ec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_7.c
@@ -0,0 +1,516 @@
+// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls" }
+// { dg-final { check-function-bodies "**" "" } }
+
+#include <arm_neon.h>
+#include <arm_sve.h>
+
+double produce_d0 ();
+void consume_d0 (double);
+
+/*
+** test_d0:
+**	...
+**	smstop	sm
+**	bl	produce_d0
+**	fmov	x10, d0
+**	smstart	sm
+**	fmov	d0, x10
+**	fmov	x10, d0
+**	smstop	sm
+**	fmov	d0, x10
+**	bl	consume_d0
+**	...
+*/
+void __attribute__((arm_streaming))
+test_d0 ()
+{
+  double res = produce_d0 ();
+  asm volatile ("");
+  consume_d0 (res);
+}
+
+int8x8_t produce_d0_vec ();
+void consume_d0_vec (int8x8_t);
+
+/*
+** test_d0_vec:
+**	...
+**	smstop	sm
+**	bl	produce_d0_vec
+** (
+**	fmov	x10, d0
+** |
+**	umov	x10, v0.d\[0\]
+** )
+**	smstart	sm
+**	fmov	d0, x10
+** (
+**	fmov	x10, d0
+** |
+**	umov	x10, v0.d\[0\]
+** )
+**	smstop	sm
+**	fmov	d0, x10
+**	bl	consume_d0_vec
+**	...
+*/
+void __attribute__((arm_streaming))
+test_d0_vec ()
+{
+  int8x8_t res = produce_d0_vec ();
+  asm volatile ("");
+  consume_d0_vec (res);
+}
+
+int8x16_t produce_q0 ();
+void consume_q0 (int8x16_t);
+
+/*
+** test_q0:
+**	...
+**	smstop	sm
+**	bl	produce_q0
+**	str	q0, \[sp, #?-16\]!
+**	smstart	sm
+**	ldr	q0, \[sp\], #?16
+**	str	q0, \[sp, #?-16\]!
+**	smstop	sm
+**	ldr	q0, \[sp\], #?16
+**	bl	consume_q0
+**	...
+*/
+void __attribute__((arm_streaming))
+test_q0 ()
+{
+  int8x16_t res = produce_q0 ();
+  asm volatile ("");
+  consume_q0 (res);
+}
+
+int8x16x2_t produce_q1 ();
+void consume_q1 (int8x16x2_t);
+
+/*
+** test_q1:
+**	...
+**	smstop	sm
+**	bl	produce_q1
+**	stp	q0, q1, \[sp, #?-32\]!
+**	smstart	sm
+**	ldp	q0, q1, \[sp\], #?32
+**	stp	q0, q1, \[sp, #?-32\]!
+**	smstop	sm
+**	ldp	q0, q1, \[sp\], #?32
+**	bl	consume_q1
+**	...
+*/
+void __attribute__((arm_streaming))
+test_q1 ()
+{
+  int8x16x2_t res = produce_q1 ();
+  asm volatile ("");
+  consume_q1 (res);
+}
+
+int8x16x3_t produce_q2 ();
+void consume_q2 (int8x16x3_t);
+
+/*
+** test_q2:
+**	...
+**	smstop	sm
+**	bl	produce_q2
+**	stp	q0, q1, \[sp, #?-48\]!
+**	str	q2, \[sp, #?32\]
+**	smstart	sm
+**	ldr	q2, \[sp, #?32\]
+**	ldp	q0, q1, \[sp\], #?48
+**	stp	q0, q1, \[sp, #?-48\]!
+**	str	q2, \[sp, #?32\]
+**	smstop	sm
+**	ldr	q2, \[sp, #?32\]
+**	ldp	q0, q1, \[sp\], #?48
+**	bl	consume_q2
+**	...
+*/
+void __attribute__((arm_streaming))
+test_q2 ()
+{
+  int8x16x3_t res = produce_q2 ();
+  asm volatile ("");
+  consume_q2 (res);
+}
+
+int8x16x4_t produce_q3 ();
+void consume_q3 (int8x16x4_t);
+
+/*
+** test_q3:
+**	...
+**	smstop	sm
+**	bl	produce_q3
+**	stp	q0, q1, \[sp, #?-64\]!
+**	stp	q2, q3, \[sp, #?32\]
+**	smstart	sm
+**	ldp	q2, q3, \[sp, #?32\]
+**	ldp	q0, q1, \[sp\], #?64
+**	stp	q0, q1, \[sp, #?-64\]!
+**	stp	q2, q3, \[sp, #?32\]
+**	smstop	sm
+**	ldp	q2, q3, \[sp, #?32\]
+**	ldp	q0, q1, \[sp\], #?64
+**	bl	consume_q3
+**	...
+*/
+void __attribute__((arm_streaming))
+test_q3 ()
+{
+  int8x16x4_t res = produce_q3 ();
+  asm volatile ("");
+  consume_q3 (res);
+}
+
+svint8_t produce_z0 ();
+void consume_z0 (svint8_t);
+
+/*
+** test_z0:
+**	...
+**	smstop	sm
+**	bl	produce_z0
+**	addvl	sp, sp, #-1
+**	str	z0, \[sp\]
+**	smstart	sm
+**	ldr	z0, \[sp\]
+**	addvl	sp, sp, #1
+**	addvl	sp, sp, #-1
+**	str	z0, \[sp\]
+**	smstop	sm
+**	ldr	z0, \[sp\]
+**	addvl	sp, sp, #1
+**	bl	consume_z0
+**	...
+*/
+void __attribute__((arm_streaming))
+test_z0 ()
+{
+  svint8_t res = produce_z0 ();
+  asm volatile ("");
+  consume_z0 (res);
+}
+
+svint8x4_t produce_z3 ();
+void consume_z3 (svint8x4_t);
+
+/*
+** test_z3:
+**	...
+**	smstop	sm
+**	bl	produce_z3
+**	addvl	sp, sp, #-4
+**	str	z0, \[sp\]
+**	str	z1, \[sp, #1, mul vl\]
+**	str	z2, \[sp, #2, mul vl\]
+**	str	z3, \[sp, #3, mul vl\]
+**	smstart	sm
+**	ldr	z0, \[sp\]
+**	ldr	z1, \[sp, #1, mul vl\]
+**	ldr	z2, \[sp, #2, mul vl\]
+**	ldr	z3, \[sp, #3, mul vl\]
+**	addvl	sp, sp, #4
+**	addvl	sp, sp, #-4
+**	str	z0, \[sp\]
+**	str	z1, \[sp, #1, mul vl\]
+**	str	z2, \[sp, #2, mul vl\]
+**	str	z3, \[sp, #3, mul vl\]
+**	smstop	sm
+**	ldr	z0, \[sp\]
+**	ldr	z1, \[sp, #1, mul vl\]
+**	ldr	z2, \[sp, #2, mul vl\]
+**	ldr	z3, \[sp, #3, mul vl\]
+**	addvl	sp, sp, #4
+**	bl	consume_z3
+**	...
+*/
+void __attribute__((arm_streaming))
+test_z3 ()
+{
+  svint8x4_t res = produce_z3 ();
+  asm volatile ("");
+  consume_z3 (res);
+}
+
+svbool_t produce_p0 ();
+void consume_p0 (svbool_t);
+
+/*
+** test_p0:
+**	...
+**	smstop	sm
+**	bl	produce_p0
+**	addvl	sp, sp, #-1
+**	str	p0, \[sp\]
+**	smstart	sm
+**	ldr	p0, \[sp\]
+**	addvl	sp, sp, #1
+**	addvl	sp, sp, #-1
+**	str	p0, \[sp\]
+**	smstop	sm
+**	ldr	p0, \[sp\]
+**	addvl	sp, sp, #1
+**	bl	consume_p0
+**	...
+*/
+void __attribute__((arm_streaming))
+test_p0 ()
+{
+  svbool_t res = produce_p0 ();
+  asm volatile ("");
+  consume_p0 (res);
+}
+
+void consume_d7 (double, double, double, double, double, double, double,
+		 double);
+
+/*
+** test_d7:
+**	...
+**	fmov	x10, d0
+**	fmov	x11, d1
+**	fmov	x12, d2
+**	fmov	x13, d3
+**	fmov	x14, d4
+**	fmov	x15, d5
+**	fmov	x16, d6
+**	fmov	x17, d7
+**	smstop	sm
+**	fmov	d0, x10
+**	fmov	d1, x11
+**	fmov	d2, x12
+**	fmov	d3, x13
+**	fmov	d4, x14
+**	fmov	d5, x15
+**	fmov	d6, x16
+**	fmov	d7, x17
+**	bl	consume_d7
+**	...
+*/
+void __attribute__((arm_streaming))
+test_d7 ()
+{
+  consume_d7 (1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0);
+}
+
+void consume_d7_vec (int8x8_t, int8x8_t, int8x8_t, int8x8_t, int8x8_t,
+		     int8x8_t, int8x8_t, int8x8_t);
+
+/*
+** test_d7_vec:
+**	...
+** (
+**	fmov	x10, d0
+**	fmov	x11, d1
+**	fmov	x12, d2
+**	fmov	x13, d3
+**	fmov	x14, d4
+**	fmov	x15, d5
+**	fmov	x16, d6
+**	fmov	x17, d7
+** |
+**	umov	x10, v0.d\[0\]
+**	umov	x11, v1.d\[0\]
+**	umov	x12, v2.d\[0\]
+**	umov	x13, v3.d\[0\]
+**	umov	x14, v4.d\[0\]
+**	umov	x15, v5.d\[0\]
+**	umov	x16, v6.d\[0\]
+**	umov	x17, v7.d\[0\]
+** )
+**	smstop	sm
+**	fmov	d0, x10
+**	fmov	d1, x11
+**	fmov	d2, x12
+**	fmov	d3, x13
+**	fmov	d4, x14
+**	fmov	d5, x15
+**	fmov	d6, x16
+**	fmov	d7, x17
+**	bl	consume_d7_vec
+**	...
+*/
+void __attribute__((arm_streaming))
+test_d7_vec (int8x8_t *ptr)
+{
+  consume_d7_vec (*ptr, *ptr, *ptr, *ptr, *ptr, *ptr, *ptr, *ptr);
+}
+
+void consume_q7 (int8x16_t, int8x16_t, int8x16_t, int8x16_t, int8x16_t,
+		 int8x16_t, int8x16_t, int8x16_t);
+
+/*
+** test_q7:
+**	...
+**	stp	q0, q1, \[sp, #?-128\]!
+**	stp	q2, q3, \[sp, #?32\]
+**	stp	q4, q5, \[sp, #?64\]
+**	stp	q6, q7, \[sp, #?96\]
+**	smstop	sm
+**	ldp	q2, q3, \[sp, #?32\]
+**	ldp	q4, q5, \[sp, #?64\]
+**	ldp	q6, q7, \[sp, #?96\]
+**	ldp	q0, q1, \[sp\], #?128
+**	bl	consume_q7
+**	...
+*/
+void __attribute__((arm_streaming))
+test_q7 (int8x16_t *ptr)
+{
+  consume_q7 (*ptr, *ptr, *ptr, *ptr, *ptr, *ptr, *ptr, *ptr);
+}
+
+void consume_z7 (svint8_t, svint8_t, svint8_t, svint8_t, svint8_t,
+		 svint8_t, svint8_t, svint8_t);
+
+/*
+** test_z7:
+**	...
+**	addvl	sp, sp, #-8
+**	str	z0, \[sp\]
+**	str	z1, \[sp, #1, mul vl\]
+**	str	z2, \[sp, #2, mul vl\]
+**	str	z3, \[sp, #3, mul vl\]
+**	str	z4, \[sp, #4, mul vl\]
+**	str	z5, \[sp, #5, mul vl\]
+**	str	z6, \[sp, #6, mul vl\]
+**	str	z7, \[sp, #7, mul vl\]
+**	smstop	sm
+**	ldr	z0, \[sp\]
+**	ldr	z1, \[sp, #1, mul vl\]
+**	ldr	z2, \[sp, #2, mul vl\]
+**	ldr	z3, \[sp, #3, mul vl\]
+**	ldr	z4, \[sp, #4, mul vl\]
+**	ldr	z5, \[sp, #5, mul vl\]
+**	ldr	z6, \[sp, #6, mul vl\]
+**	ldr	z7, \[sp, #7, mul vl\]
+**	addvl	sp, sp, #8
+**	bl	consume_z7
+**	...
+*/
+void __attribute__((arm_streaming))
+test_z7 (svint8_t *ptr)
+{
+  consume_z7 (*ptr, *ptr, *ptr, *ptr, *ptr, *ptr, *ptr, *ptr);
+}
+
+void consume_p3 (svbool_t, svbool_t, svbool_t, svbool_t);
+
+/*
+** test_p3:
+**	...
+**	addvl	sp, sp, #-1
+**	str	p0, \[sp\]
+**	str	p1, \[sp, #1, mul vl\]
+**	str	p2, \[sp, #2, mul vl\]
+**	str	p3, \[sp, #3, mul vl\]
+**	smstop	sm
+**	ldr	p0, \[sp\]
+**	ldr	p1, \[sp, #1, mul vl\]
+**	ldr	p2, \[sp, #2, mul vl\]
+**	ldr	p3, \[sp, #3, mul vl\]
+**	addvl	sp, sp, #1
+**	bl	consume_p3
+**	...
+*/
+void __attribute__((arm_streaming))
+test_p3 (svbool_t *ptr)
+{
+  consume_p3 (*ptr, *ptr, *ptr, *ptr);
+}
+
+void consume_mixed (float, double, float32x4_t, svfloat32_t,
+		    float, double, float64x2_t, svfloat64_t,
+		    svbool_t, svbool_t, svbool_t, svbool_t);
+
+/*
+** test_mixed:
+**	...
+**	addvl	sp, sp, #-3
+**	str	p0, \[sp\]
+**	str	p1, \[sp, #1, mul vl\]
+**	str	p2, \[sp, #2, mul vl\]
+**	str	p3, \[sp, #3, mul vl\]
+**	str	z3, \[sp, #1, mul vl\]
+**	str	z7, \[sp, #2, mul vl\]
+**	stp	q2, q6, \[sp, #?-32\]!
+**	fmov	w10, s0
+**	fmov	x11, d1
+**	fmov	w12, s4
+**	fmov	x13, d5
+**	smstop	sm
+**	fmov	s0, w10
+**	fmov	d1, x11
+**	fmov	s4, w12
+**	fmov	d5, x13
+**	ldp	q2, q6, \[sp\], #?32
+**	ldr	p0, \[sp\]
+**	ldr	p1, \[sp, #1, mul vl\]
+**	ldr	p2, \[sp, #2, mul vl\]
+**	ldr	p3, \[sp, #3, mul vl\]
+**	ldr	z3, \[sp, #1, mul vl\]
+**	ldr	z7, \[sp, #2, mul vl\]
+**	addvl	sp, sp, #3
+**	bl	consume_mixed
+**	...
+*/
+void __attribute__((arm_streaming))
+test_mixed (float32x4_t *float32x4_ptr,
+	    svfloat32_t *svfloat32_ptr,
+	    float64x2_t *float64x2_ptr,
+	    svfloat64_t *svfloat64_ptr,
+	    svbool_t *svbool_ptr)
+{
+  consume_mixed (1.0f, 2.0, *float32x4_ptr, *svfloat32_ptr,
+		 3.0f, 4.0, *float64x2_ptr, *svfloat64_ptr,
+		 *svbool_ptr, *svbool_ptr, *svbool_ptr, *svbool_ptr);
+}
+
+void consume_varargs (float, ...);
+
+/*
+** test_varargs:
+**	...
+**	stp	q3, q7, \[sp, #?-32\]!
+**	fmov	w10, s0
+**	fmov	x11, d1
+** (
+**	fmov	x12, d2
+** |
+**	umov	x12, v2.d\[0\]
+** )
+**	fmov	x13, d4
+**	fmov	x14, d5
+** (
+**	fmov	x15, d6
+** |
+**	umov	x15, v6.d\[0\]
+** )
+**	smstop	sm
+**	fmov	s0, w10
+**	fmov	d1, x11
+**	fmov	d2, x12
+**	fmov	d4, x13
+**	fmov	d5, x14
+**	fmov	d6, x15
+**	ldp	q3, q7, \[sp\], #?32
+**	bl	consume_varargs
+**	...
+*/
+void __attribute__((arm_streaming))
+test_varargs (float32x2_t *float32x2_ptr,
+	      float32x4_t *float32x4_ptr,
+	      float64x1_t *float64x1_ptr,
+	      float64x2_t *float64x2_ptr)
+{
+  consume_varargs (1.0f, 2.0, *float32x2_ptr, *float32x4_ptr,
+		   3.0f, 4.0, *float64x1_ptr, *float64x2_ptr);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_8.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_8.c
new file mode 100644
index 00000000000..1a28da795de
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_8.c
@@ -0,0 +1,87 @@
+// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls -msve-vector-bits=128" }
+// { dg-final { check-function-bodies "**" "" } }
+
+#include <arm_sve.h>
+
+svint8_t produce_z0 ();
+void consume_z0 (svint8_t);
+
+/*
+** test_z0:
+**	...
+**	smstop	sm
+**	bl	produce_z0
+**	str	q0, \[sp, #?-16\]!
+**	smstart	sm
+**	ldr	q0, \[sp\], #?16
+**	str	q0, \[sp, #?-16\]!
+**	smstop	sm
+**	ldr	q0, \[sp\], #?16
+**	bl	consume_z0
+**	...
+*/
+void __attribute__((arm_streaming))
+test_z0 ()
+{
+  svint8_t res = produce_z0 ();
+  asm volatile ("");
+  consume_z0 (res);
+}
+
+svint8x4_t produce_z3 ();
+void consume_z3 (svint8x4_t);
+
+/*
+** test_z3:
+**	...
+**	smstop	sm
+**	bl	produce_z3
+**	stp	q0, q1, \[sp, #?-64\]!
+**	stp	q2, q3, \[sp, #?32\]
+**	smstart	sm
+**	ldp	q2, q3, \[sp, #?32\]
+**	ldp	q0, q1, \[sp\], #?64
+**	stp	q0, q1, \[sp, #?-64\]!
+**	stp	q2, q3, \[sp, #?32\]
+**	smstop	sm
+**	ldp	q2, q3, \[sp, #?32\]
+**	ldp	q0, q1, \[sp\], #?64
+**	bl	consume_z3
+**	...
+*/
+void __attribute__((arm_streaming))
+test_z3 ()
+{
+  svint8x4_t res = produce_z3 ();
+  asm volatile ("");
+  consume_z3 (res);
+}
+
+svbool_t produce_p0 ();
+void consume_p0 (svbool_t);
+
+/*
+** test_p0:
+**	...
+**	smstop	sm
+**	bl	produce_p0
+**	sub	sp, sp, #?16
+**	str	p0, \[sp\]
+**	smstart	sm
+**	ldr	p0, \[sp\]
+**	add	sp, sp, #?16
+**	sub	sp, sp, #?16
+**	str	p0, \[sp\]
+**	smstop	sm
+**	ldr	p0, \[sp\]
+**	add	sp, sp, #?16
+**	bl	consume_p0
+**	...
+*/
+void __attribute__((arm_streaming))
+test_p0 ()
+{
+  svbool_t res = produce_p0 ();
+  asm volatile ("");
+  consume_p0 (res);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_9.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_9.c
new file mode 100644
index 00000000000..fd880ee7931
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_9.c
@@ -0,0 +1,103 @@
+// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls -msve-vector-bits=256" }
+// { dg-final { check-function-bodies "**" "" } }
+
+#include <arm_sve.h>
+
+svint8_t produce_z0 ();
+void consume_z0 (svint8_t);
+
+/*
+** test_z0:
+**	...
+**	smstop	sm
+**	bl	produce_z0
+**	sub	sp, sp, #?32
+**	str	z0, \[sp\]
+**	smstart	sm
+**	ldr	z0, \[sp\]
+**	add	sp, sp, #?32
+**	sub	sp, sp, #?32
+**	str	z0, \[sp\]
+**	smstop	sm
+**	ldr	z0, \[sp\]
+**	add	sp, sp, #?32
+**	bl	consume_z0
+**	...
+*/
+void __attribute__((arm_streaming))
+test_z0 ()
+{
+  svint8_t res = produce_z0 ();
+  asm volatile ("");
+  consume_z0 (res);
+}
+
+svint8x4_t produce_z3 ();
+void consume_z3 (svint8x4_t);
+
+/*
+** test_z3:
+**	...
+**	smstop	sm
+**	bl	produce_z3
+**	sub	sp, sp, #?128
+**	str	z0, \[sp\]
+**	str	z1, \[sp, #1, mul vl\]
+**	str	z2, \[sp, #2, mul vl\]
+**	str	z3, \[sp, #3, mul vl\]
+**	smstart	sm
+**	ldr	z0, \[sp\]
+**	ldr	z1, \[sp, #1, mul vl\]
+**	ldr	z2, \[sp, #2, mul vl\]
+**	ldr	z3, \[sp, #3, mul vl\]
+**	add	sp, sp, #?128
+**	sub	sp, sp, #?128
+**	str	z0, \[sp\]
+**	str	z1, \[sp, #1, mul vl\]
+**	str	z2, \[sp, #2, mul vl\]
+**	str	z3, \[sp, #3, mul vl\]
+**	smstop	sm
+**	ldr	z0, \[sp\]
+**	ldr	z1, \[sp, #1, mul vl\]
+**	ldr	z2, \[sp, #2, mul vl\]
+**	ldr	z3, \[sp, #3, mul vl\]
+**	add	sp, sp, #?128
+**	bl	consume_z3
+**	...
+*/
+void __attribute__((arm_streaming))
+test_z3 ()
+{
+  svint8x4_t res = produce_z3 ();
+  asm volatile ("");
+  consume_z3 (res);
+}
+
+svbool_t produce_p0 ();
+void consume_p0 (svbool_t);
+
+/*
+** test_p0:
+**	...
+**	smstop	sm
+**	bl	produce_p0
+**	sub	sp, sp, #?32
+**	str	p0, \[sp\]
+**	smstart	sm
+**	ldr	p0, \[sp\]
+**	add	sp, sp, #?32
+**	sub	sp, sp, #?32
+**	str	p0, \[sp\]
+**	smstop	sm
+**	ldr	p0, \[sp\]
+**	add	sp, sp, #?32
+**	bl	consume_p0
+**	...
+*/
+void __attribute__((arm_streaming))
+test_p0 ()
+{
+  svbool_t res = produce_p0 ();
+  asm volatile ("");
+  consume_p0 (res);
+}
-- 
2.25.1



* [PATCH 06/16] aarch64: Add support for SME ZA attributes
  2022-11-13  9:59 [PATCH 00/16] aarch64: Add support for SME Richard Sandiford
                   ` (4 preceding siblings ...)
  2022-11-13 10:00 ` [PATCH 05/16] aarch64: Switch PSTATE.SM around calls Richard Sandiford
@ 2022-11-13 10:01 ` Richard Sandiford
  2022-11-13 10:01 ` [PATCH 07/16] aarch64: Add a register class for w12-w15 Richard Sandiford
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Richard Sandiford @ 2022-11-13 10:01 UTC (permalink / raw)
  To: gcc-patches

SME has an array called ZA that can be enabled and disabled separately
from streaming mode.  A status bit called PSTATE.ZA indicates whether
ZA is currently enabled or not.

In C and C++, the state of PSTATE.ZA is controlled using function
attributes.  If a function's type has an arm_shared_za attribute,
PSTATE.ZA==1 on entry to the function and on return from the function,
and the function shares the contents of ZA with its caller.  Otherwise,
the caller and callee have separate ZA contexts; they do not use ZA to
share data.

Although normal non-arm_shared_za functions have a separate
ZA context from their callers, nested uses of ZA are expected
to be rare.  The ABI therefore defines a cooperative lazy saving
scheme that allows saves and restore of ZA to be kept to a minimum.
(Callers still have the option of doing a full save and restore
if they prefer.)

Functions that want to use ZA internally have an arm_new_za
attribute, which tells the compiler to enable PSTATE.ZA for
the duration of the function body.  It also tells the compiler
to commit any lazy save initiated by a caller.

There is also a function type attribute called arm_preserves_za,
which a function can use to guarantee to callers that it doesn't
change ZA (and so that callers don't need to save and restore it).
A known flaw is that it should be possible to assign preserves-ZA
functions to normal function pointers, but currently that results
in a diagnostic.  (Assignment in the opposite direction is invalid
and rightly rejected.)
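
To make the intended usage concrete, here is a small illustrative
sketch of how the three attributes combine (function names are
invented for the example; the attribute spellings match those used
in the testsuite above):

```c
/* Shares ZA with its caller: PSTATE.ZA==1 on entry and on return,
   and caller and callee operate on the same ZA contents.  */
__attribute__((arm_shared_za)) void accumulate_tile ();

/* Guarantees to callers that ZA is left unchanged, so callers do
   not need to save and restore ZA around the call.  */
__attribute__((arm_preserves_za)) void query_state ();

/* Creates new ZA state internally: the compiler enables PSTATE.ZA
   for the duration of the body and commits any lazy save that a
   caller may have initiated.  */
__attribute__((arm_new_za)) void
matmul_kernel ()
{
  accumulate_tile ();  /* Shares this function's ZA context.  */
  query_state ();      /* No ZA save/restore needed here.  */
}
```

A plain function called from matmul_kernel would instead get a
separate ZA context, relying on the lazy saving scheme described
above.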

gcc/
	* config/aarch64/aarch64-isa-modes.def (ZA_ON): New ISA mode.
	* config/aarch64/aarch64-protos.h (aarch64_rdsvl_immediate_p)
	(aarch64_output_rdsvl, aarch64_restore_za): Declare.
	* config/aarch64/constraints.md (UsR): New constraint.
	* config/aarch64/aarch64.md (ZA_REGNUM, OLD_ZA_REGNUM): New constants.
	(UNSPEC_SME_VQ): New unspec.
	(arches): Add sme.
	(arch_enabled): Handle it.
	(*cb<optab><mode>1): Rename to...
	(aarch64_cb<optab><mode>1): ...this.
	(*movsi_aarch64): Add an alternative for RDSVL.
	(*movdi_aarch64): Likewise.
	* config/aarch64/aarch64-sme.md (UNSPEC_SMSTART_ZA, UNSPEC_SMSTOP_ZA)
	(UNSPEC_TPIDR2_SAVE, UNSPEC_TPIDR2_RESTORE, UNSPEC_READ_TPIDR2)
	(UNSPEC_CLEAR_TPIDR2): New unspecs.
	(aarch64_smstart_za, aarch64_smstop_za, aarch64_tpidr2_save)
	(aarch64_tpidr2_restore, aarch64_read_tpidr2, aarch64_clear_tpidr2)
	(aarch64_save_za, aarch64_restore_za): New patterns.
	* config/aarch64/aarch64.h (AARCH64_ISA_ZA_ON, TARGET_ZA): New macros.
	(FIXED_REGISTERS, REGISTER_NAMES): Add the ZA registers.
	(CALL_USED_REGISTERS): Replace with...
	(CALL_REALLY_USED_REGISTERS): ...this and add the ZA registers.
	(FIRST_PSEUDO_REGISTER): Bump to include ZA registers.
	(ZA_REGS): New register class.
	(REG_CLASS_NAMES): Update accordingly.
	(REG_CLASS_CONTENTS): Likewise.
	(aarch64_frame::has_new_za_state): New member variable.
	(machine_function::tpidr2_block): Likewise.
	(machine_function::tpidr2_block_ptr): Likewise.
	(machine_function::za_save_buffer): Likewise.
	(CUMULATIVE_ARGS::preserves_za): Likewise.
	* config/aarch64/aarch64.cc (handle_arm_new_za_attribute): New
	function.
	(attr_arm_new_za_exclusions): New variable.
	(attr_no_arm_new_za): Likewise.
	(aarch64_attribute_table): Add arm_new_za, arm_shared_za, and
	arm_preserves_za.
	(aarch64_hard_regno_nregs): Handle the ZA registers.
	(aarch64_hard_regno_mode_ok): Likewise.
	(aarch64_regno_regclass): Likewise.
	(aarch64_class_max_nregs): Likewise.
	(aarch64_conditional_register_usage): Likewise.
	(aarch64_fntype_za_state): New function.
	(aarch64_fntype_isa_mode): Call it.
	(aarch64_fntype_preserves_za): New function.
	(aarch64_fndecl_has_new_za_state): Likewise.
	(aarch64_fndecl_za_state): Likewise.
	(aarch64_fndecl_isa_mode): Call it.
	(aarch64_fndecl_preserves_za): New function.
	(aarch64_cfun_incoming_za_state): Likewise.
	(aarch64_cfun_has_new_za_state): Likewise.
	(aarch64_sme_vq_immediate): Likewise.
	(aarch64_sme_vq_unspec_p): Likewise.
	(aarch64_rdsvl_immediate_p): Likewise.
	(aarch64_output_rdsvl): Likewise.
	(aarch64_expand_mov_immediate): Handle RDSVL immediates.
	(aarch64_mov_operand_p): Likewise.
	(aarch64_init_cumulative_args): Record whether the call preserves ZA.
	(aarch64_layout_frame): Check whether the current function creates
	new ZA state.  Record that it clobbers LR if so.
	(aarch64_epilogue_uses): Handle ZA_REGNUM.
	(aarch64_expand_prologue): Handle functions that create new ZA state.
	(aarch64_expand_epilogue): Likewise.
	(aarch64_create_tpidr2_block): New function.
	(aarch64_restore_za): Likewise.
	(aarch64_start_call_args): Disallow calls to shared-ZA functions
	from functions that have no ZA state.  Set up a lazy save if the
	call might clobber the caller's ZA state.
	(aarch64_expand_call): Record that shared-ZA functions use ZA_REGNUM.
	(aarch64_end_call_args): New function.
	(aarch64_override_options_internal): Require TARGET_SME for
	functions that have ZA state.
	(aarch64_comp_type_attributes): Handle arm_shared_za and
	arm_preserves_za.
	(aarch64_merge_decl_attributes): New function.
	(TARGET_END_CALL_ARGS, TARGET_MERGE_DECL_ATTRIBUTES): Define.
	(TARGET_MD_ASM_ADJUST): Use aarch64_md_asm_adjust.

gcc/testsuite/
	* gcc.target/aarch64/sme/za_state_1.c: New test.
	* gcc.target/aarch64/sme/za_state_2.c: Likewise.
	* gcc.target/aarch64/sme/za_state_3.c: Likewise.
	* gcc.target/aarch64/sme/za_state_4.c: Likewise.
	* gcc.target/aarch64/sme/za_state_5.c: Likewise.
	* gcc.target/aarch64/sme/za_state_6.c: Likewise.
	* gcc.target/aarch64/sme/za_state_7.c: Likewise.
---
 gcc/config/aarch64/aarch64-isa-modes.def      |   5 +
 gcc/config/aarch64/aarch64-protos.h           |   4 +
 gcc/config/aarch64/aarch64-sme.md             | 138 ++++++
 gcc/config/aarch64/aarch64.cc                 | 408 +++++++++++++++++-
 gcc/config/aarch64/aarch64.h                  |  43 +-
 gcc/config/aarch64/aarch64.md                 |  39 +-
 gcc/config/aarch64/constraints.md             |   6 +
 .../gcc.target/aarch64/sme/za_state_1.c       | 102 +++++
 .../gcc.target/aarch64/sme/za_state_2.c       |  96 +++++
 .../gcc.target/aarch64/sme/za_state_3.c       |  27 ++
 .../gcc.target/aarch64/sme/za_state_4.c       | 277 ++++++++++++
 .../gcc.target/aarch64/sme/za_state_5.c       | 241 +++++++++++
 .../gcc.target/aarch64/sme/za_state_6.c       | 132 ++++++
 .../gcc.target/aarch64/sme/za_state_7.c       |  55 +++
 14 files changed, 1548 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/za_state_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/za_state_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/za_state_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/za_state_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/za_state_5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/za_state_6.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/za_state_7.c

diff --git a/gcc/config/aarch64/aarch64-isa-modes.def b/gcc/config/aarch64/aarch64-isa-modes.def
index fba8eafbae1..001ef54a59b 100644
--- a/gcc/config/aarch64/aarch64-isa-modes.def
+++ b/gcc/config/aarch64/aarch64-isa-modes.def
@@ -32,4 +32,9 @@
 DEF_AARCH64_ISA_MODE(SM_ON)
 DEF_AARCH64_ISA_MODE(SM_OFF)
 
+/* Indicates that PSTATE.ZA is known to be 1.  The converse is that
+   PSTATE.ZA might be 0 or 1, depending on whether there is an uncommitted
+   lazy save.  */
+DEF_AARCH64_ISA_MODE(ZA_ON)
+
 #undef DEF_AARCH64_ISA_MODE
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 0f686fba4bd..97a84f616a2 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -807,6 +807,8 @@ bool aarch64_sve_addvl_addpl_immediate_p (rtx);
 bool aarch64_sve_vector_inc_dec_immediate_p (rtx);
 int aarch64_add_offset_temporaries (rtx);
 void aarch64_split_add_offset (scalar_int_mode, rtx, rtx, rtx, rtx, rtx);
+bool aarch64_rdsvl_immediate_p (const_rtx);
+char *aarch64_output_rdsvl (const_rtx);
 bool aarch64_mov_operand_p (rtx, machine_mode);
 rtx aarch64_reverse_mask (machine_mode, unsigned int);
 bool aarch64_offset_7bit_signed_scaled_p (machine_mode, poly_int64);
@@ -1077,4 +1079,6 @@ const char *aarch64_indirect_call_asm (rtx);
 extern bool aarch64_harden_sls_retbr_p (void);
 extern bool aarch64_harden_sls_blr_p (void);
 
+void aarch64_restore_za ();
+
 #endif /* GCC_AARCH64_PROTOS_H */
diff --git a/gcc/config/aarch64/aarch64-sme.md b/gcc/config/aarch64/aarch64-sme.md
index 88f1526fa34..55fb00db12d 100644
--- a/gcc/config/aarch64/aarch64-sme.md
+++ b/gcc/config/aarch64/aarch64-sme.md
@@ -23,6 +23,7 @@
 ;; == State management
 ;; ---- Test current state
 ;; ---- PSTATE.SM management
+;; ---- PSTATE.ZA management
 
 ;; =========================================================================
 ;; == State management
@@ -131,3 +132,140 @@ (define_insn "aarch64_smstop_sm"
   ""
   "smstop\tsm"
 )
+
+;; -------------------------------------------------------------------------
+;; ---- PSTATE.ZA management
+;; -------------------------------------------------------------------------
+;; Includes
+;; - SMSTART ZA
+;; - SMSTOP ZA
+;; plus calls to support routines.
+;; -------------------------------------------------------------------------
+
+(define_c_enum "unspec" [
+  UNSPEC_SMSTART_ZA
+  UNSPEC_SMSTOP_ZA
+  UNSPEC_TPIDR2_SAVE
+  UNSPEC_TPIDR2_RESTORE
+  UNSPEC_READ_TPIDR2
+  UNSPEC_CLEAR_TPIDR2
+])
+
+;; Enable ZA, starting with fresh ZA contents.  This is only valid when
+;; SME is present, but the pattern does not depend on TARGET_SME since
+;; it can be used conditionally.
+(define_insn "aarch64_smstart_za"
+  [(unspec_volatile [(const_int 0)] UNSPEC_SMSTART_ZA)
+   (clobber (reg:VNx16QI ZA_REGNUM))]
+  ""
+  "smstart\tza"
+)
+
+;; Disable ZA and discard its current contents.  This is only valid when
+;; SME is present, but the pattern does not depend on TARGET_SME since
+;; it can be used conditionally.
+;;
+;; The ABI says that the ZA save buffer must be null whenever PSTATE.ZA
+;; is zero.  This instruction is therefore sequenced wrt writes to
+;; OLD_ZA_REGNUM.
+(define_insn "aarch64_smstop_za"
+  [(unspec_volatile [(reg:VNx16QI OLD_ZA_REGNUM)] UNSPEC_SMSTOP_ZA)
+   (clobber (reg:VNx16QI ZA_REGNUM))]
+  ""
+  "smstop\tza"
+)
+
+;; Use the ABI-defined routine to commit any uncommitted lazy save.
+(define_insn "aarch64_tpidr2_save"
+  [(unspec_volatile:DI [(reg:VNx16QI OLD_ZA_REGNUM)
+			(reg:VNx16QI ZA_REGNUM)] UNSPEC_TPIDR2_SAVE)
+   (clobber (reg:DI R14_REGNUM))
+   (clobber (reg:DI R15_REGNUM))
+   (clobber (reg:DI R16_REGNUM))
+   (clobber (reg:DI R17_REGNUM))
+   (clobber (reg:DI R18_REGNUM))
+   (clobber (reg:DI R30_REGNUM))
+   (clobber (reg:CC CC_REGNUM))]
+  ""
+  "bl\t__arm_tpidr2_save"
+)
+
+;; Use the ABI-defined routine to restore lazy-saved ZA contents
+;; from the TPIDR2 block pointed to by X0.
+(define_insn "aarch64_tpidr2_restore"
+  [(set (reg:VNx16QI ZA_REGNUM)
+	(unspec:VNx16QI [(reg:VNx16QI OLD_ZA_REGNUM)
+			 (reg:DI R0_REGNUM)] UNSPEC_TPIDR2_RESTORE))
+   (clobber (reg:DI R14_REGNUM))
+   (clobber (reg:DI R15_REGNUM))
+   (clobber (reg:DI R16_REGNUM))
+   (clobber (reg:DI R17_REGNUM))
+   (clobber (reg:DI R18_REGNUM))
+   (clobber (reg:DI R30_REGNUM))
+   (clobber (reg:CC CC_REGNUM))]
+  ""
+  "bl\t__arm_tpidr2_restore"
+)
+
+;; Check whether a lazy save of ZA is active.  This is only valid when
+;; SME is present, but the pattern does not depend on TARGET_SME since
+;; it can be used conditionally.
+(define_insn "aarch64_read_tpidr2"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(unspec:DI [(reg:VNx16QI OLD_ZA_REGNUM)] UNSPEC_READ_TPIDR2))]
+  ""
+  "mrs\t%0, tpidr2_el0"
+)
+
+;; Clear TPIDR2_EL0, cancelling any uncommitted lazy save.  This is only
+;; valid when SME is present, but the pattern does not depend on TARGET_SME
+;; since it can be used conditionally.
+(define_insn "aarch64_clear_tpidr2"
+  [(set (reg:VNx16QI OLD_ZA_REGNUM)
+	(unspec:VNx16QI [(const_int 0)] UNSPEC_CLEAR_TPIDR2))]
+  ""
+  "msr\ttpidr2_el0, xzr"
+)
+
+;; Set up a lazy save of ZA.  Operand 0 points to the TPIDR2 block and
+;; operand 1 is the contents of that block.  Operand 1 exists only to
+;; provide dependency information: the TPIDR2 block must be valid
+;; before TPIDR2_EL0 is updated.
+(define_insn "aarch64_save_za"
+  [(set (reg:VNx16QI OLD_ZA_REGNUM)
+	(reg:VNx16QI ZA_REGNUM))
+   (use (match_operand 0 "pmode_register_operand" "r"))
+   (use (match_operand:V16QI 1 "memory_operand" "m"))]
+  ""
+  "msr\ttpidr2_el0, %0"
+)
+
+;; Check whether a lazy save set up by aarch64_save_za was committed
+;; and restore the saved contents if so.
+(define_insn_and_split "aarch64_restore_za"
+  [(set (reg:VNx16QI ZA_REGNUM)
+	(reg:VNx16QI OLD_ZA_REGNUM))
+   (clobber (reg:DI R14_REGNUM))
+   (clobber (reg:DI R15_REGNUM))
+   (clobber (reg:DI R16_REGNUM))
+   (clobber (reg:DI R17_REGNUM))
+   (clobber (reg:DI R18_REGNUM))
+   (clobber (reg:DI R30_REGNUM))
+   (clobber (reg:CC CC_REGNUM))
+   (clobber (reg:VNx16QI OLD_ZA_REGNUM))]
+  ""
+  "#"
+  "&& epilogue_completed"
+  [(const_int 0)]
+  {
+    auto label = gen_label_rtx ();
+    auto tpidr2 = gen_rtx_REG (DImode, R16_REGNUM);
+    emit_insn (gen_aarch64_read_tpidr2 (tpidr2));
+    auto jump = emit_jump_insn (gen_aarch64_cbnedi1 (tpidr2, label));
+    JUMP_LABEL (jump) = label;
+    aarch64_restore_za ();
+    emit_label (label);
+    emit_insn (gen_aarch64_clear_tpidr2 ());
+    DONE;
+  }
+)
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index d8310eb8597..b200d2a9f80 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -2733,6 +2733,22 @@ handle_aarch64_vector_pcs_attribute (tree *node, tree name, tree,
   gcc_unreachable ();
 }
 
+/* Check whether an 'arm_new_za' attribute is valid.  */
+
+static tree
+handle_arm_new_za_attribute (tree *node, tree name, tree,
+			     int, bool *no_add_attrs)
+{
+  tree decl = *node;
+  if (TREE_CODE (decl) != FUNCTION_DECL)
+    {
+      error_at (DECL_SOURCE_LOCATION (decl),
+		"%qE attribute applies only to functions", name);
+      *no_add_attrs = true;
+    }
+  return NULL_TREE;
+}
+
 /* Mutually-exclusive function type attributes for controlling PSTATE.SM.  */
 static const struct attribute_spec::exclusions attr_streaming_exclusions[] =
 {
@@ -2743,6 +2759,26 @@ static const struct attribute_spec::exclusions attr_streaming_exclusions[] =
   { NULL, false, false, false }
 };
 
+/* Function type attributes that are mutually-exclusive with arm_new_za.  */
+static const struct attribute_spec::exclusions attr_arm_new_za_exclusions[] =
+{
+  /* Attribute name     exclusion applies to:
+			function, type, variable */
+  { "arm_preserves_za", true, false, false },
+  { "arm_shared_za", true, false, false },
+  { NULL, false, false, false }
+};
+
+/* Used by function type attributes that are mutually-exclusive with
+   arm_new_za.  */
+static const struct attribute_spec::exclusions attr_no_arm_new_za[] =
+{
+  /* Attribute name     exclusion applies to:
+			function, type, variable */
+  { "arm_new_za", true, false, false },
+  { NULL, false, false, false }
+};
+
 /* Table of machine attributes.  */
 static const struct attribute_spec aarch64_attribute_table[] =
 {
@@ -2754,6 +2790,13 @@ static const struct attribute_spec aarch64_attribute_table[] =
 			  NULL, attr_streaming_exclusions },
   { "arm_streaming_compatible", 0, 0, false, true,  true,  true,
 			  NULL, attr_streaming_exclusions },
+  { "arm_new_za",	  0, 0, true, false, false, false,
+			  handle_arm_new_za_attribute,
+			  attr_arm_new_za_exclusions },
+  { "arm_shared_za",	  0, 0, false, true,  true,  true,
+			  NULL, attr_no_arm_new_za },
+  { "arm_preserves_za",	  0, 0, false, true,  true,  true,
+			  NULL, attr_no_arm_new_za },
   { "arm_sve_vector_bits", 1, 1, false, true,  false, true,
 			  aarch64_sve::handle_arm_sve_vector_bits_attribute,
 			  NULL },
@@ -3929,6 +3972,7 @@ aarch64_hard_regno_nregs (unsigned regno, machine_mode mode)
     case PR_HI_REGS:
     case FFR_REGS:
     case PR_AND_FFR_REGS:
+    case ZA_REGS:
       return 1;
     default:
       return CEIL (lowest_size, UNITS_PER_WORD);
@@ -3959,6 +4003,9 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
   if (pr_or_ffr_regnum_p (regno))
     return false;
 
+  if (regno == ZA_REGNUM || regno == OLD_ZA_REGNUM)
+    return true;
+
   if (regno == SP_REGNUM)
     /* The purpose of comparing with ptr_mode is to support the
        global register variable associated with the stack pointer
@@ -4078,12 +4125,41 @@ aarch64_fntype_sm_state (const_tree fntype)
   return AARCH64_FL_SM_OFF;
 }
 
+/* Return the state of PSTATE.ZA on entry to functions of type FNTYPE.  */
+
+static aarch64_feature_flags
+aarch64_fntype_za_state (const_tree fntype)
+{
+  if (lookup_attribute ("arm_shared_za", TYPE_ATTRIBUTES (fntype)))
+    return AARCH64_FL_ZA_ON;
+
+  return 0;
+}
+
 /* Return the ISA mode on entry to functions of type FNTYPE.  */
 
 static aarch64_feature_flags
 aarch64_fntype_isa_mode (const_tree fntype)
 {
-  return aarch64_fntype_sm_state (fntype);
+  return (aarch64_fntype_sm_state (fntype)
+	  | aarch64_fntype_za_state (fntype));
+}
+
+/* Return true if functions of type FNTYPE preserve the contents of ZA.  */
+
+static bool
+aarch64_fntype_preserves_za (const_tree fntype)
+{
+  return lookup_attribute ("arm_preserves_za", TYPE_ATTRIBUTES (fntype));
+}
+
+/* Return true if FNDECL creates new ZA state (as opposed to sharing
+   ZA with its callers or ignoring ZA altogether).  */
+
+static bool
+aarch64_fndecl_has_new_za_state (const_tree fndecl)
+{
+  return lookup_attribute ("arm_new_za", DECL_ATTRIBUTES (fndecl));
 }
 
 /* Return the state of PSTATE.SM when compiling the body of
@@ -4096,13 +4172,34 @@ aarch64_fndecl_sm_state (const_tree fndecl)
   return aarch64_fntype_sm_state (TREE_TYPE (fndecl));
 }
 
+/* Return the state of PSTATE.ZA when compiling the body of function FNDECL.
+   This might be different from the state of PSTATE.ZA on entry.  */
+
+static aarch64_feature_flags
+aarch64_fndecl_za_state (const_tree fndecl)
+{
+  if (aarch64_fndecl_has_new_za_state (fndecl))
+    return AARCH64_FL_ZA_ON;
+
+  return aarch64_fntype_za_state (TREE_TYPE (fndecl));
+}
+
 /* Return the ISA mode that should be used to compile the body of
    function FNDECL.  */
 
 static aarch64_feature_flags
 aarch64_fndecl_isa_mode (const_tree fndecl)
 {
-  return aarch64_fndecl_sm_state (fndecl);
+  return (aarch64_fndecl_sm_state (fndecl)
+	  | aarch64_fndecl_za_state (fndecl));
+}
+
+/* Return true if function FNDECL preserves the contents of ZA.  */
+
+static bool
+aarch64_fndecl_preserves_za (const_tree fndecl)
+{
+  return aarch64_fntype_preserves_za (TREE_TYPE (fndecl));
 }
 
 /* Return the state of PSTATE.SM on entry to the current function.
@@ -4115,6 +4212,25 @@ aarch64_cfun_incoming_sm_state ()
   return aarch64_fntype_sm_state (TREE_TYPE (cfun->decl));
 }
 
+/* Return the state of PSTATE.ZA on entry to the current function
+   (which might be different from the state of PSTATE.ZA in the
+   function body).  */
+
+static aarch64_feature_flags
+aarch64_cfun_incoming_za_state ()
+{
+  return aarch64_fntype_za_state (TREE_TYPE (cfun->decl));
+}
+
+/* Return true if the current function creates new ZA state (as opposed
+   to sharing ZA with its callers or ignoring ZA altogether).  */
+
+static bool
+aarch64_cfun_has_new_za_state ()
+{
+  return aarch64_fndecl_has_new_za_state (cfun->decl);
+}
+
 /* Return true if a call from the current function to a function with
    ISA mode CALLEE_MODE would involve a change to PSTATE.SM around
    the BL instruction.  */
@@ -5678,6 +5794,74 @@ aarch64_output_sve_vector_inc_dec (const char *operands, rtx x)
 					     factor, nelts_per_vq);
 }
 
+/* Return a constant that represents FACTOR multiplied by the
+   number of 128-bit quadwords in an SME vector.  ISA_MODE is the
+   ISA mode in which the calculation is being performed.  */
+
+static rtx
+aarch64_sme_vq_immediate (machine_mode mode, HOST_WIDE_INT factor,
+			  aarch64_feature_flags isa_mode)
+{
+  gcc_assert (aarch64_sve_rdvl_factor_p (factor));
+  if (isa_mode & AARCH64_FL_SM_ON)
+    /* We're in streaming mode, so we can use normal poly-int values.  */
+    return gen_int_mode ({ factor, factor }, mode);
+
+  rtvec vec = gen_rtvec (1, gen_int_mode (factor, SImode));
+  rtx unspec = gen_rtx_UNSPEC (mode, vec, UNSPEC_SME_VQ);
+  return gen_rtx_CONST (mode, unspec);
+}
+
+/* Return true if X is a constant that represents some number Y
+   multiplied by the number of quadwords in an SME vector.  Store this Y
+   in *FACTOR if so.  */
+
+static bool
+aarch64_sme_vq_unspec_p (const_rtx x, HOST_WIDE_INT *factor)
+{
+  if (!TARGET_SME || GET_CODE (x) != CONST)
+    return false;
+
+  x = XEXP (x, 0);
+  if (GET_CODE (x) != UNSPEC
+      || XINT (x, 1) != UNSPEC_SME_VQ
+      || XVECLEN (x, 0) != 1)
+    return false;
+
+  x = XVECEXP (x, 0, 0);
+  if (!CONST_INT_P (x))
+    return false;
+
+  *factor = INTVAL (x);
+  return true;
+}
+
+/* Return true if X is a constant that represents some number Y
+   multiplied by the number of quadwords in an SME vector, and if
+   that Y is in the range of RDSVL.  */
+
+bool
+aarch64_rdsvl_immediate_p (const_rtx x)
+{
+  HOST_WIDE_INT factor;
+  return (aarch64_sme_vq_unspec_p (x, &factor)
+	  && aarch64_sve_rdvl_factor_p (factor));
+}
+
+/* Return the asm string for an RDSVL instruction that calculates X,
+   which is a constant that satisfies aarch64_rdsvl_immediate_p.  */
+
+char *
+aarch64_output_rdsvl (const_rtx x)
+{
+  gcc_assert (aarch64_rdsvl_immediate_p (x));
+  static char buffer[sizeof ("rdsvl\t%x0, #-") + 3 * sizeof (int)];
+  x = XVECEXP (XEXP (x, 0), 0, 0);
+  snprintf (buffer, sizeof (buffer), "rdsvl\t%%x0, #%d",
+	    (int) INTVAL (x) / 16);
+  return buffer;
+}
+
 /* Multipliers for repeating bitmasks of width 32, 16, 8, 4, and 2.  */
 
 static const unsigned HOST_WIDE_INT bitmask_imm_mul[] =
@@ -7457,6 +7641,15 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 	  return;
 	}
 
+      if (aarch64_rdsvl_immediate_p (base))
+	{
+	  /* We could handle non-constant offsets if they are ever
+	     generated.  */
+	  gcc_assert (const_offset == 0);
+	  emit_insn (gen_rtx_SET (dest, imm));
+	  return;
+	}
+
       sty = aarch64_classify_symbol (base, const_offset);
       switch (sty)
 	{
@@ -8458,7 +8651,7 @@ void
 aarch64_init_cumulative_args (CUMULATIVE_ARGS *pcum,
 			      const_tree fntype,
 			      rtx libname ATTRIBUTE_UNUSED,
-			      const_tree fndecl ATTRIBUTE_UNUSED,
+			      const_tree fndecl,
 			      unsigned n_named ATTRIBUTE_UNUSED,
 			      bool silent_p)
 {
@@ -8483,6 +8676,9 @@ aarch64_init_cumulative_args (CUMULATIVE_ARGS *pcum,
   pcum->aapcs_stack_words = 0;
   pcum->aapcs_stack_size = 0;
   pcum->silent_p = silent_p;
+  pcum->preserves_za = (fndecl ? aarch64_fndecl_preserves_za (fndecl)
+			: fntype ? aarch64_fntype_preserves_za (fntype)
+			: false);
   pcum->num_sme_mode_switch_args = 0;
 
   if (!silent_p
@@ -9015,6 +9211,12 @@ aarch64_layout_frame (void)
   frame.wb_push_candidate2 = INVALID_REGNUM;
   frame.spare_pred_reg = INVALID_REGNUM;
 
+  frame.has_new_za_state = (aarch64_cfun_has_new_za_state ()
+			    && DF_REG_USE_COUNT (ZA_REGNUM) > 0);
+  if (frame.has_new_za_state)
+    /* Saving any old ZA state involves a call to __arm_tpidr2_save.  */
+    df_set_regs_ever_live (R30_REGNUM, true);
+
   /* First mark all the registers that really need to be saved...  */
   for (regno = 0; regno <= LAST_SAVED_REGNUM; regno++)
     frame.reg_offset[regno] = SLOT_NOT_REQUIRED;
@@ -10443,7 +10645,11 @@ aarch64_epilogue_uses (int regno)
     {
       if (regno == LR_REGNUM)
 	return 1;
+      if (regno == ZA_REGNUM)
+	return 1;
     }
+  if (regno == ZA_REGNUM && aarch64_cfun_incoming_za_state ())
+    return 1;
   return 0;
 }
 
@@ -10756,6 +10962,27 @@ aarch64_expand_prologue (void)
 	    emit_move_insn (gen_rtx_REG (DImode, R1_REGNUM), old_r1);
 	}
     }
+
+  if (cfun->machine->frame.has_new_za_state)
+    {
+      /* Commit any uncommitted lazy save and turn ZA on.  The sequence is:
+
+	     mrs <temp>, tpidr2_el0
+	     cbz <temp>, no_save
+	     bl __arm_tpidr2_save
+	     msr tpidr2_el0, xzr
+	 no_save:
+	     smstart za  */
+      auto label = gen_label_rtx ();
+      auto tmp_reg = gen_rtx_REG (DImode, STACK_CLASH_SVE_CFA_REGNUM);
+      emit_insn (gen_aarch64_read_tpidr2 (tmp_reg));
+      auto jump = emit_jump_insn (gen_aarch64_cbeqdi1 (tmp_reg, label));
+      JUMP_LABEL (jump) = label;
+      emit_insn (gen_aarch64_tpidr2_save ());
+      emit_insn (gen_aarch64_clear_tpidr2 ());
+      emit_label (label);
+      emit_insn (gen_aarch64_smstart_za ());
+    }
 }
 
 /* Return TRUE if we can use a simple_return insn.
@@ -10829,6 +11056,11 @@ aarch64_expand_epilogue (rtx_call_insn *sibcall)
     = maybe_ne (get_frame_size ()
 		+ cfun->machine->frame.saved_varargs_size, 0);
 
+  if (cfun->machine->frame.has_new_za_state)
+    /* Turn ZA off before returning.  TPIDR2_EL0 is already null at
+       this point.  */
+    emit_insn (gen_aarch64_smstop_za ());
+
   /* Emit a barrier to prevent loads from a deallocated stack.  */
   if (maybe_gt (final_adjust, crtl->outgoing_args_size)
       || cfun->calls_alloca
@@ -11989,6 +12221,66 @@ aarch64_fixed_condition_code_regs (unsigned int *p1, unsigned int *p2)
   return true;
 }
 
+/* Make the start of the current function allocate a ZA lazy save buffer
+   and associated TPIDR2 block.  Also make it initialize the TPIDR2 block
+   to point to the ZA save buffer.  */
+
+static void
+aarch64_create_tpidr2_block ()
+{
+  if (cfun->machine->tpidr2_block)
+    return;
+
+  start_sequence ();
+  NO_DEFER_POP;
+
+  /* The TPIDR2 block is 16 bytes in size and must be aligned to a 128-bit
+     boundary.  */
+  rtx block = assign_stack_local (V16QImode, 16, 128);
+
+  /* We use the block by moving its address into TPIDR2_EL0, so we need
+     a simple register pointer to it rather than a general address.  */
+  rtx ptr = force_reg (Pmode, XEXP (block, 0));
+  cfun->machine->tpidr2_block_ptr = ptr;
+  cfun->machine->tpidr2_block = replace_equiv_address (block, ptr);
+
+  /* The ZA save buffer is SVL.B*SVL.B bytes in size.  */
+  rtx svl_bytes = aarch64_sme_vq_immediate (Pmode, 16, AARCH64_ISA_MODE);
+  rtx za_size = expand_simple_binop (Pmode, MULT, svl_bytes, svl_bytes,
+				     NULL, 0, OPTAB_LIB_WIDEN);
+  rtx za_save_buffer = allocate_dynamic_stack_space (za_size, 128, 128,
+						     -1, true);
+  za_save_buffer = force_reg (Pmode, za_save_buffer);
+  cfun->machine->za_save_buffer = za_save_buffer;
+
+  /* The first word of the block points to the save buffer and the second
+     word is the number of ZA slices to save.  */
+  rtx block_0 = adjust_address (block, DImode, 0);
+  rtx block_8 = adjust_address (block, DImode, 8);
+  emit_insn (gen_store_pair_dw_didi (block_0, za_save_buffer,
+				     block_8, force_reg (DImode, svl_bytes)));
+
+  OK_DEFER_POP;
+  auto insns = get_insns ();
+  end_sequence ();
+
+  emit_insn_after (insns, parm_birth_insn);
+}
+
+/* Restore the contents of ZA from the lazy save buffer.  PSTATE.ZA is
+   known to be 0 and TPIDR2_EL0 is known to be null.  */
+
+void
+aarch64_restore_za ()
+{
+  gcc_assert (cfun->machine->tpidr2_block);
+
+  emit_insn (gen_aarch64_smstart_za ());
+  emit_move_insn (gen_rtx_REG (Pmode, R0_REGNUM),
+		  cfun->machine->tpidr2_block_ptr);
+  emit_insn (gen_aarch64_tpidr2_restore ());
+}
+
 /* Implement TARGET_START_CALL_ARGS.  */
 
 static void
@@ -12004,6 +12296,23 @@ aarch64_start_call_args (cumulative_args_t ca_v)
 	      " option %<-march%>, or by using the %<target%>"
 	      " attribute or pragma", "sme");
     }
+
+  if (!TARGET_ZA && (ca->isa_mode & AARCH64_FL_ZA_ON))
+    error ("call to an %<arm_shared_za%> function from a function"
+	   " that has no ZA state");
+
+  /* Set up a lazy save buffer if the current function has ZA state
+     that is not shared with the callee and if the callee might
+     clobber the state.  */
+  if (TARGET_ZA
+      && !(ca->isa_mode & AARCH64_FL_ZA_ON)
+      && !ca->preserves_za)
+    {
+      if (!cfun->machine->tpidr2_block)
+	aarch64_create_tpidr2_block ();
+      emit_insn (gen_aarch64_save_za (cfun->machine->tpidr2_block_ptr,
+				      cfun->machine->tpidr2_block));
+    }
 }
 
 /* This function is used by the call expanders of the machine description.
@@ -12109,6 +12418,27 @@ aarch64_expand_call (rtx result, rtx mem, rtx cookie, bool sibcall)
 
       cfun->machine->call_switches_sm_state = true;
     }
+
+  /* If the callee is a shared ZA function, record that it uses the
+     current value of ZA.  */
+  if (callee_isa_mode & AARCH64_FL_ZA_ON)
+    use_reg (&CALL_INSN_FUNCTION_USAGE (call_insn),
+	     gen_rtx_REG (VNx16BImode, ZA_REGNUM));
+}
+
+/* Implement TARGET_END_CALL_ARGS.  */
+
+static void
+aarch64_end_call_args (cumulative_args_t ca_v)
+{
+  CUMULATIVE_ARGS *ca = get_cumulative_args (ca_v);
+
+  /* If we set up a ZA lazy save before the call, check whether the save
+     was committed.  Restore the contents of ZA from the buffer if so.  */
+  if (TARGET_ZA
+      && !(ca->isa_mode & AARCH64_FL_ZA_ON)
+      && !ca->preserves_za)
+    emit_insn (gen_aarch64_restore_za ());
 }
 
 /* Emit call insn with PAT and do aarch64-specific handling.  */
@@ -13246,6 +13576,9 @@ aarch64_regno_regclass (unsigned regno)
   if (regno == FFR_REGNUM || regno == FFRT_REGNUM)
     return FFR_REGS;
 
+  if (regno == ZA_REGNUM || regno == OLD_ZA_REGNUM)
+    return ZA_REGS;
+
   return NO_REGS;
 }
 
@@ -13601,12 +13934,14 @@ aarch64_class_max_nregs (reg_class_t regclass, machine_mode mode)
       return (vec_flags & VEC_ADVSIMD
 	      ? CEIL (lowest_size, UNITS_PER_VREG)
 	      : CEIL (lowest_size, UNITS_PER_WORD));
+
     case STACK_REG:
     case PR_REGS:
     case PR_LO_REGS:
     case PR_HI_REGS:
     case FFR_REGS:
     case PR_AND_FFR_REGS:
+    case ZA_REGS:
       return 1;
 
     case NO_REGS:
@@ -18570,10 +18905,13 @@ aarch64_override_options_internal (struct gcc_options *opts)
       && !fixed_regs[R18_REGNUM])
     error ("%<-fsanitize=shadow-call-stack%> requires %<-ffixed-x18%>");
 
-  if ((opts->x_aarch64_isa_flags & AARCH64_FL_SM_ON)
+  if ((opts->x_aarch64_isa_flags & (AARCH64_FL_SM_ON | AARCH64_FL_ZA_ON))
       && !(opts->x_aarch64_isa_flags & AARCH64_FL_SME))
     {
-      error ("streaming functions require the ISA extension %qs", "sme");
+      if (opts->x_aarch64_isa_flags & AARCH64_FL_SM_ON)
+	error ("streaming functions require the ISA extension %qs", "sme");
+      else
+	error ("functions with ZA state require the ISA extension %qs", "sme");
       inform (input_location, "you can enable %qs using the command-line"
 	      " option %<-march%>, or by using the %<target%>"
 	      " attribute or pragma", "sme");
@@ -20900,9 +21238,11 @@ aarch64_conditional_register_usage (void)
 	call_used_regs[i] = 1;
       }
 
-  /* Only allow the FFR and FFRT to be accessed via special patterns.  */
+  /* Only allow these registers to be accessed via special patterns.  */
   CLEAR_HARD_REG_BIT (operand_reg_set, FFR_REGNUM);
   CLEAR_HARD_REG_BIT (operand_reg_set, FFRT_REGNUM);
+  CLEAR_HARD_REG_BIT (operand_reg_set, ZA_REGNUM);
+  CLEAR_HARD_REG_BIT (operand_reg_set, OLD_ZA_REGNUM);
 
   /* When tracking speculation, we need a couple of call-clobbered registers
      to track the speculation state.  It would be nice to just use
@@ -22359,6 +22699,9 @@ aarch64_mov_operand_p (rtx x, machine_mode mode)
 	  || aarch64_sve_rdvl_immediate_p (x)))
     return true;
 
+  if (aarch64_rdsvl_immediate_p (x))
+    return true;
+
   return aarch64_classify_symbolic_expression (x)
     == SYMBOL_TINY_ABSOLUTE;
 }
@@ -27810,9 +28153,36 @@ aarch64_comp_type_attributes (const_tree type1, const_tree type2)
     return 0;
   if (!check_attr ("arm_streaming_compatible"))
     return 0;
+  if (!check_attr ("arm_shared_za"))
+    return 0;
+  if (!check_attr ("arm_preserves_za"))
+    return 0;
   return 1;
 }
 
+/* Implement TARGET_MERGE_DECL_ATTRIBUTES.  */
+
+static tree
+aarch64_merge_decl_attributes (tree olddecl, tree newdecl)
+{
+  tree attrs = merge_attributes (DECL_ATTRIBUTES (olddecl),
+				 DECL_ATTRIBUTES (newdecl));
+
+  if (DECL_INITIAL (olddecl))
+    for (auto name : { "arm_new_za" })
+      if (!lookup_attribute (name, DECL_ATTRIBUTES (olddecl))
+	  && lookup_attribute (name, DECL_ATTRIBUTES (newdecl)))
+	{
+	  error ("cannot apply attribute %qs to %q+D after the function"
+		 " has been defined", name, newdecl);
+	  inform (DECL_SOURCE_LOCATION (olddecl), "%q+D defined here",
+		  newdecl);
+	  attrs = remove_attribute (name, attrs);
+	}
+
+  return attrs;
+}
+
 /* Implement TARGET_GET_MULTILIB_ABI_NAME */
 
 static const char *
@@ -28178,6 +28548,24 @@ aarch64_indirect_call_asm (rtx addr)
   return "";
 }
 
+/* Implement TARGET_MD_ASM_ADJUST.  */
+
+rtx_insn *
+aarch64_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &inputs,
+		       vec<machine_mode> &input_modes,
+		       vec<const char *> &constraints,
+		       vec<rtx> &uses, vec<rtx> &clobbers,
+		       HARD_REG_SET &clobbered_regs, location_t loc)
+{
+  /* "za" in the clobber list is defined to mean that the asm can read
+     from and write to ZA.  */
+  if (TEST_HARD_REG_BIT (clobbered_regs, ZA_REGNUM))
+    uses.safe_push (gen_rtx_REG (VNx16QImode, ZA_REGNUM));
+
+  return arm_md_asm_adjust (outputs, inputs, input_modes, constraints,
+			    uses, clobbers, clobbered_regs, loc);
+}
+
 /* If CALL involves a change in PSTATE.SM, emit the instructions needed
    to switch to the new mode and the instructions needed to restore the
    original mode.  Return true if something changed.  */
@@ -28565,6 +28953,9 @@ aarch64_run_selftests (void)
 #undef TARGET_START_CALL_ARGS
 #define TARGET_START_CALL_ARGS aarch64_start_call_args
 
+#undef TARGET_END_CALL_ARGS
+#define TARGET_END_CALL_ARGS aarch64_end_call_args
+
 #undef TARGET_GIMPLE_FOLD_BUILTIN
 #define TARGET_GIMPLE_FOLD_BUILTIN aarch64_gimple_fold_builtin
 
@@ -28926,6 +29317,9 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_COMP_TYPE_ATTRIBUTES
 #define TARGET_COMP_TYPE_ATTRIBUTES aarch64_comp_type_attributes
 
+#undef TARGET_MERGE_DECL_ATTRIBUTES
+#define TARGET_MERGE_DECL_ATTRIBUTES aarch64_merge_decl_attributes
+
 #undef TARGET_GET_MULTILIB_ABI_NAME
 #define TARGET_GET_MULTILIB_ABI_NAME aarch64_get_multilib_abi_name
 
@@ -28947,7 +29341,7 @@ aarch64_libgcc_floating_mode_supported_p
 #define TARGET_STRICT_ARGUMENT_NAMING hook_bool_CUMULATIVE_ARGS_true
 
 #undef TARGET_MD_ASM_ADJUST
-#define TARGET_MD_ASM_ADJUST arm_md_asm_adjust
+#define TARGET_MD_ASM_ADJUST aarch64_md_asm_adjust
 
 #undef TARGET_ASM_FILE_END
 #define TARGET_ASM_FILE_END aarch64_asm_file_end
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index f23edea35f5..b5877e7e61e 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -207,6 +207,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
 /* Macros to test ISA flags.  */
 
 #define AARCH64_ISA_SM_OFF         (aarch64_isa_flags & AARCH64_FL_SM_OFF)
+#define AARCH64_ISA_ZA_ON          (aarch64_isa_flags & AARCH64_FL_ZA_ON)
 #define AARCH64_ISA_MODE           (aarch64_isa_flags & AARCH64_FL_ISA_MODES)
 #define AARCH64_ISA_CRC            (aarch64_isa_flags & AARCH64_FL_CRC)
 #define AARCH64_ISA_CRYPTO         (aarch64_isa_flags & AARCH64_FL_CRYPTO)
@@ -259,6 +260,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
 #define TARGET_STREAMING_COMPATIBLE \
   ((aarch64_isa_flags & AARCH64_FL_SM_STATE) == 0)
 
+/* PSTATE.ZA is enabled in the current function body.  */
+#define TARGET_ZA (AARCH64_ISA_ZA_ON)
+
 /* Crypto is an optional extension to AdvSIMD.  */
 #define TARGET_CRYPTO (AARCH64_ISA_CRYPTO)
 
@@ -445,7 +449,8 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
     1, 1, 1, 1,			/* SFP, AP, CC, VG */	\
     0, 0, 0, 0,   0, 0, 0, 0,   /* P0 - P7 */           \
     0, 0, 0, 0,   0, 0, 0, 0,   /* P8 - P15 */          \
-    1, 1			/* FFR and FFRT */	\
+    1, 1,			/* FFR and FFRT */	\
+    1, 1			/* TPIDR2 and ZA */	\
   }
 
 /* X30 is marked as caller-saved which is in line with regular function call
@@ -455,7 +460,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
    true but not until function epilogues have been generated.  This ensures
    that X30 is available for use in leaf functions if needed.  */
 
-#define CALL_USED_REGISTERS				\
+#define CALL_REALLY_USED_REGISTERS			\
   {							\
     1, 1, 1, 1,   1, 1, 1, 1,	/* R0 - R7 */		\
     1, 1, 1, 1,   1, 1, 1, 1,	/* R8 - R15 */		\
@@ -468,7 +473,8 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
     1, 1, 1, 1,			/* SFP, AP, CC, VG */	\
     1, 1, 1, 1,   1, 1, 1, 1,	/* P0 - P7 */		\
     1, 1, 1, 1,   1, 1, 1, 1,	/* P8 - P15 */		\
-    1, 1			/* FFR and FFRT */	\
+    1, 1,			/* FFR and FFRT */	\
+    1, 0			/* TPIDR2 and ZA */	\
   }
 
 #define REGISTER_NAMES						\
@@ -484,7 +490,8 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
     "sfp", "ap",  "cc",  "vg",					\
     "p0",  "p1",  "p2",  "p3",  "p4",  "p5",  "p6",  "p7",	\
     "p8",  "p9",  "p10", "p11", "p12", "p13", "p14", "p15",	\
-    "ffr", "ffrt"						\
+    "ffr", "ffrt",						\
+    "za", "old_za"						\
   }
 
 /* Generate the register aliases for core register N */
@@ -533,7 +540,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
 #define FRAME_POINTER_REGNUM		SFP_REGNUM
 #define STACK_POINTER_REGNUM		SP_REGNUM
 #define ARG_POINTER_REGNUM		AP_REGNUM
-#define FIRST_PSEUDO_REGISTER		(FFRT_REGNUM + 1)
+#define FIRST_PSEUDO_REGISTER		(OLD_ZA_REGNUM + 1)
 
 /* The number of argument registers available for each class.  */
 #define NUM_ARG_REGS			8
@@ -673,6 +680,7 @@ enum reg_class
   PR_REGS,
   FFR_REGS,
   PR_AND_FFR_REGS,
+  ZA_REGS,
   ALL_REGS,
   LIM_REG_CLASSES		/* Last */
 };
@@ -696,6 +704,7 @@ enum reg_class
   "PR_REGS",					\
   "FFR_REGS",					\
   "PR_AND_FFR_REGS",				\
+  "ZA_REGS",					\
   "ALL_REGS"					\
 }
 
@@ -716,6 +725,7 @@ enum reg_class
   { 0x00000000, 0x00000000, 0x000ffff0 },	/* PR_REGS */		\
   { 0x00000000, 0x00000000, 0x00300000 },	/* FFR_REGS */		\
   { 0x00000000, 0x00000000, 0x003ffff0 },	/* PR_AND_FFR_REGS */	\
+  { 0x00000000, 0x00000000, 0x00c00000 },	/* ZA_REGS */		\
   { 0xffffffff, 0xffffffff, 0x000fffff }	/* ALL_REGS */		\
 }
 
@@ -889,16 +899,36 @@ struct GTY (()) aarch64_frame
 
   /* True if shadow call stack should be enabled for the current function.  */
   bool is_scs_enabled;
+
+  /* True if the function has an arm_new_za attribute and if ZA is
+     actually used by the function.  */
+  bool has_new_za_state;
 };
 
 typedef struct GTY (()) machine_function
 {
   struct aarch64_frame frame;
+
   /* One entry for each hard register.  */
   bool reg_is_wrapped_separately[LAST_SAVED_REGNUM];
+
   /* One entry for each general purpose register.  */
   rtx call_via[SP_REGNUM];
+
+  /* A MEM for the whole of the function's TPIDR2 block, or null if the
+     function doesn't have a TPIDR2 block.  */
+  rtx tpidr2_block;
+
+  /* A pseudo register that points to the function's TPIDR2 block, or null
+     if the function doesn't have a TPIDR2 block.  */
+  rtx tpidr2_block_ptr;
+
+  /* A pseudo register that points to the function's ZA save buffer,
+     or null if none.  */
+  rtx za_save_buffer;
+
   bool label_is_assembled;
+
   /* True if we've expanded at least one call to a function that changes
      PSTATE.SM.  This should only be used for saving compile time: false
      guarantees that no such mode switch exists.  */
@@ -968,6 +998,9 @@ typedef struct
   bool silent_p;		/* True if we should act silently, rather than
 				   raise an error for invalid calls.  */
 
+  /* True if the call preserves ZA.  */
+  bool preserves_za;
+
   /* A list of registers that need to be saved and restored around a
      change to PSTATE.SM.  An auto_vec would be more convenient, but those
      can't be copied.  */
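[Editorial aside, not part of the patch: a quick sanity check on the new `ZA_REGS` mask above. `REG_CLASS_CONTENTS` packs hard registers 32 per 32-bit word, so `ZA_REGNUM` (86) and `OLD_ZA_REGNUM` (87) land in the third word at bits 22 and 23, giving the `0x00c00000` value in the hunk.]

```python
ZA_REGNUM = 86
OLD_ZA_REGNUM = 87

def class_word_mask(regnos):
    # Each REG_CLASS_CONTENTS entry packs hard registers 32 per word;
    # registers 64..95 fall in the third word, at bit (regno - 64).
    mask = 0
    for r in regnos:
        assert 64 <= r < 96, "register outside the third word"
        mask |= 1 << (r - 64)
    return mask

# Matches the ZA_REGS entry added to REG_CLASS_CONTENTS.
assert class_word_mask([ZA_REGNUM, OLD_ZA_REGNUM]) == 0x00c00000
```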
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 991f46fbc80..3ebe8690c31 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -111,6 +111,11 @@ (define_constants
     ;; "FFR token": a fake register used for representing the scheduling
     ;; restrictions on FFR-related operations.
     (FFRT_REGNUM	85)
+    (ZA_REGNUM		86)
+    ;; Represents a lazy-populated back-up of the ZA contents, as managed
+    ;; by TPIDR2_EL0.  Modelling this as a simple register allows the RTL
+    ;; optimizers to remove redundant saves and restores.
+    (OLD_ZA_REGNUM	87)
     ;; The pair of scratch registers used for stack probing with -fstack-check.
     ;; Leave R9 alone as a possible choice for the static chain.
     ;; Note that the use of these registers is mutually exclusive with the use
@@ -303,6 +308,9 @@ (define_c_enum "unspec" [
     UNSPEC_TAG_SPACE		; Translate address to MTE tag address space.
     UNSPEC_LD1RO
     UNSPEC_SALT_ADDR
+    ;; Wraps a constant integer that should be multiplied by the number
+    ;; of quadwords in an SME vector.
+    UNSPEC_SME_VQ
 ])
 
 (define_c_enum "unspecv" [
@@ -374,7 +382,7 @@ (define_constants
 ;; As a convenience, "fp_q" means "fp" + the ability to move between
 ;; Q registers and is equivalent to "simd".
 
-(define_enum "arches" [any rcpc8_4 fp fp_q base_simd simd sve fp16])
+(define_enum "arches" [any rcpc8_4 fp fp_q base_simd simd sve fp16 sme])
 
 (define_enum_attr "arch" "arches" (const_string "any"))
 
@@ -412,7 +420,10 @@ (define_attr "arch_enabled" "no,yes"
 	     (match_test "TARGET_FP_F16INST"))
 
 	(and (eq_attr "arch" "sve")
-	     (match_test "TARGET_SVE")))
+	     (match_test "TARGET_SVE"))
+
+	(and (eq_attr "arch" "sme")
+	     (match_test "TARGET_SME")))
     (const_string "yes")
     (const_string "no")))
 
@@ -915,7 +926,7 @@ (define_insn "simple_return"
    (set_attr "sls_length" "retbr")]
 )
 
-(define_insn "*cb<optab><mode>1"
+(define_insn "aarch64_cb<optab><mode>1"
   [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
 				(const_int 0))
 			   (label_ref (match_operand 1 "" ""))
@@ -1268,8 +1279,8 @@ (define_expand "mov<mode>"
 )
 
 (define_insn_and_split "*movsi_aarch64"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,  r,  r,r,w, m,m,  r,  r,  r, w,r,w, w")
-	(match_operand:SI 1 "aarch64_mov_operand"  " r,r,k,M,n,Usv,Usr,m,m,rZ,w,Usw,Usa,Ush,rZ,w,w,Ds"))]
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,  r,  r,  r,r,w, m,m,  r,  r,  r, w,r,w, w")
+	(match_operand:SI 1 "aarch64_mov_operand"  " r,r,k,M,n,Usv,Usr,UsR,m,m,rZ,w,Usw,Usa,Ush,rZ,w,w,Ds"))]
   "(register_operand (operands[0], SImode)
     || aarch64_reg_or_zero (operands[1], SImode))"
   "@
@@ -1280,6 +1291,7 @@ (define_insn_and_split "*movsi_aarch64"
    #
    * return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\", operands[1]);
    * return aarch64_output_sve_rdvl (operands[1]);
+   * return aarch64_output_rdsvl (operands[1]);
    ldr\\t%w0, %1
    ldr\\t%s0, %1
    str\\t%w1, %0
@@ -1300,17 +1312,17 @@ (define_insn_and_split "*movsi_aarch64"
     }"
   ;; The "mov_imm" type for CNT is just a placeholder.
   [(set_attr "type" "mov_reg,mov_reg,mov_reg,
-		     mov_imm,mov_imm,mov_imm,mov_imm,
+		     mov_imm,mov_imm,mov_imm,mov_imm,mov_imm,
 		     load_4,load_4,store_4,store_4,load_4,
 		     adr,adr,f_mcr,f_mrc,fmov,neon_move")
-   (set_attr "arch"   "*,*,*,*,*,sve,sve,*,fp,*,fp,*,*,*,fp,fp,fp,simd")
-   (set_attr "length" "4,4,4,4,*,  4,  4,4, 4,4, 4,8,4,4, 4, 4, 4,   4")
+   (set_attr "arch"   "*,*,*,*,*,sve,sve,sme,*,fp,*,fp,*,*,*,fp,fp,fp,simd")
+   (set_attr "length" "4,4,4,4,*,  4,  4,  4,4, 4,4, 4,8,4,4, 4, 4, 4,   4")
 ]
 )
 
 (define_insn_and_split "*movdi_aarch64"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,r,  r,  r,r,w, m,m,  r,  r,  r, w,r,w, w")
-	(match_operand:DI 1 "aarch64_mov_operand"  " r,r,k,N,M,n,Usv,Usr,m,m,rZ,w,Usw,Usa,Ush,rZ,w,w,Dd"))]
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,r,  r,  r,  r,r,w, m,m,  r,  r,  r, w,r,w, w")
+	(match_operand:DI 1 "aarch64_mov_operand"  " r,r,k,N,M,n,Usv,Usr,UsR,m,m,rZ,w,Usw,Usa,Ush,rZ,w,w,Dd"))]
   "(register_operand (operands[0], DImode)
     || aarch64_reg_or_zero (operands[1], DImode))"
   "@
@@ -1322,6 +1334,7 @@ (define_insn_and_split "*movdi_aarch64"
    #
    * return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\", operands[1]);
    * return aarch64_output_sve_rdvl (operands[1]);
+   * return aarch64_output_rdsvl (operands[1]);
    ldr\\t%x0, %1
    ldr\\t%d0, %1
    str\\t%x1, %0
@@ -1342,11 +1355,11 @@ (define_insn_and_split "*movdi_aarch64"
     }"
   ;; The "mov_imm" type for CNTD is just a placeholder.
   [(set_attr "type" "mov_reg,mov_reg,mov_reg,
-		     mov_imm,mov_imm,mov_imm,mov_imm,mov_imm,
+		     mov_imm,mov_imm,mov_imm,mov_imm,mov_imm,mov_imm,
 		     load_8,load_8,store_8,store_8,load_8,
 		     adr,adr,f_mcr,f_mrc,fmov,neon_move")
-   (set_attr "arch"   "*,*,*,*,*,*,sve,sve,*,fp,*,fp,*,*,*,fp,fp,fp,simd")
-   (set_attr "length" "4,4,4,4,4,*,  4,  4,4, 4,4, 4,8,4,4, 4, 4, 4,   4")]
+   (set_attr "arch"   "*,*,*,*,*,*,sve,sve,sme,*,fp,*,fp,*,*,*,fp,fp,fp,simd")
+   (set_attr "length" "4,4,4,4,4,*,  4,  4,  4,4, 4,4, 4,8,4,4, 4, 4, 4,   4")]
 )
 
 (define_insn "insv_imm<mode>"
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index 3664e4dbdd6..8d4393f30a1 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -215,6 +215,12 @@ (define_constraint "Usr"
  (and (match_code "const_poly_int")
       (match_test "aarch64_sve_rdvl_immediate_p (op)")))
 
+(define_constraint "UsR"
+  "@internal
+   A constraint that matches a value produced by RDSVL."
+ (and (match_code "const")
+      (match_test "aarch64_rdsvl_immediate_p (op)")))
+
 (define_constraint "Usv"
   "@internal
    A constraint that matches a VG-based constant that can be loaded by
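[Editorial aside, not part of the patch: the new `UsR` constraint matches constants that RDSVL can produce. Per the `UNSPEC_SME_VQ` comment in aarch64.md, such a constant is an integer multiplied by the number of quadwords in an SME vector, i.e. `imm * SVL.B`. A small model of that arithmetic follows; the immediate range shown is an assumption about the RDSVL encoding, not something stated in the patch.]

```python
def rdsvl(imm, svl_bits):
    # RDSVL Xd, #imm is modelled here as returning imm * SVL.B:
    # imm times the number of quadwords per SME vector, times 16 bytes.
    assert -32 <= imm <= 31          # assumed signed immediate range
    vq = svl_bits // 128             # quadwords per vector (UNSPEC_SME_VQ)
    return imm * vq * 16

# e.g. with a 256-bit streaming vector length, RDSVL x0, #1 yields 32.
assert rdsvl(1, 256) == 32
```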
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/za_state_1.c b/gcc/testsuite/gcc.target/aarch64/sme/za_state_1.c
new file mode 100644
index 00000000000..cd3cfd0cf4d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/za_state_1.c
@@ -0,0 +1,102 @@
+// { dg-options "" }
+
+void __attribute__((arm_shared_za)) shared_a ();
+void shared_a (); // { dg-error "conflicting types" }
+
+void shared_b ();
+void __attribute__((arm_shared_za)) shared_b (); // { dg-error "conflicting types" }
+
+void __attribute__((arm_shared_za)) shared_c ();
+void shared_c () {} // Inherits attribute from declaration (confusingly).
+
+void shared_d ();
+void __attribute__((arm_shared_za)) shared_d () {} // { dg-error "conflicting types" }
+
+void __attribute__((arm_shared_za)) shared_e () {}
+void shared_e (); // { dg-error "conflicting types" }
+
+void shared_f () {}
+void __attribute__((arm_shared_za)) shared_f (); // { dg-error "conflicting types" }
+
+extern void (*shared_g) ();
+extern __attribute__((arm_shared_za)) void (*shared_g) (); // { dg-error "conflicting types" }
+
+extern __attribute__((arm_shared_za)) void (*shared_h) ();
+extern void (*shared_h) (); // { dg-error "conflicting types" }
+
+//----------------------------------------------------------------------------
+
+void __attribute__((arm_preserves_za)) preserves_a ();
+void preserves_a (); // { dg-error "conflicting types" }
+
+void preserves_b ();
+void __attribute__((arm_preserves_za)) preserves_b (); // { dg-error "conflicting types" }
+
+void __attribute__((arm_preserves_za)) preserves_c ();
+void preserves_c () {} // Inherits attribute from declaration (confusingly).
+
+void preserves_d ();
+void __attribute__((arm_preserves_za)) preserves_d () {} // { dg-error "conflicting types" }
+
+void __attribute__((arm_preserves_za)) preserves_e () {}
+void preserves_e (); // { dg-error "conflicting types" }
+
+void preserves_f () {}
+void __attribute__((arm_preserves_za)) preserves_f (); // { dg-error "conflicting types" }
+
+extern void (*preserves_g) ();
+extern __attribute__((arm_preserves_za)) void (*preserves_g) (); // { dg-error "conflicting types" }
+
+extern __attribute__((arm_preserves_za)) void (*preserves_h) ();
+extern void (*preserves_h) (); // { dg-error "conflicting types" }
+
+//----------------------------------------------------------------------------
+
+void __attribute__((arm_preserves_za)) mixed_a ();
+void __attribute__((arm_shared_za)) mixed_a (); // { dg-error "conflicting types" }
+
+void __attribute__((arm_shared_za)) mixed_b ();
+void __attribute__((arm_preserves_za)) mixed_b (); // { dg-error "conflicting types" }
+
+void __attribute__((arm_preserves_za)) mixed_c ();
+void __attribute__((arm_shared_za)) mixed_c () {} // { dg-error "conflicting types" }
+
+void __attribute__((arm_shared_za)) mixed_d ();
+void __attribute__((arm_preserves_za)) mixed_d () {} // { dg-error "conflicting types" }
+
+void __attribute__((arm_preserves_za)) mixed_e () {}
+void __attribute__((arm_shared_za)) mixed_e (); // { dg-error "conflicting types" }
+
+void __attribute__((arm_shared_za)) mixed_f () {}
+void __attribute__((arm_preserves_za)) mixed_f (); // { dg-error "conflicting types" }
+
+extern __attribute__((arm_shared_za)) void (*mixed_g) ();
+extern __attribute__((arm_preserves_za)) void (*mixed_g) (); // { dg-error "conflicting types" }
+
+extern __attribute__((arm_preserves_za)) void (*mixed_h) ();
+extern __attribute__((arm_shared_za)) void (*mixed_h) (); // { dg-error "conflicting types" }
+
+//----------------------------------------------------------------------------
+
+void __attribute__((arm_preserves_za, arm_shared_za)) complementary_1();
+void __attribute__((arm_shared_za, arm_preserves_za)) complementary_2();
+
+int __attribute__((arm_shared_za)) int_attr; // { dg-warning "only applies to function types" }
+void *__attribute__((arm_preserves_za)) ptr_attr; // { dg-warning "only applies to function types" }
+
+typedef void __attribute__((arm_preserves_za)) preserves_callback ();
+typedef void __attribute__((arm_shared_za)) shared_callback ();
+
+void (*__attribute__((arm_preserves_za)) preserves_callback_ptr) ();
+void (*__attribute__((arm_shared_za)) shared_callback_ptr) ();
+
+typedef void __attribute__((arm_preserves_za, arm_shared_za)) complementary_callback_1 ();
+typedef void __attribute__((arm_shared_za, arm_preserves_za)) complementary_callback_2 ();
+
+void __attribute__((arm_preserves_za, arm_shared_za)) (*complementary_callback_ptr_1) ();
+void __attribute__((arm_shared_za, arm_preserves_za)) (*complementary_callback_ptr_2) ();
+
+struct s {
+  void __attribute__((arm_preserves_za, arm_shared_za)) (*complementary_callback_ptr_1) ();
+  void __attribute__((arm_shared_za, arm_preserves_za)) (*complementary_callback_ptr_2) ();
+};
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/za_state_2.c b/gcc/testsuite/gcc.target/aarch64/sme/za_state_2.c
new file mode 100644
index 00000000000..261c500aff1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/za_state_2.c
@@ -0,0 +1,96 @@
+// { dg-options "" }
+
+void __attribute__((arm_new_za)) new_za_a ();
+void new_za_a ();
+
+void new_za_b ();
+void __attribute__((arm_new_za)) new_za_b ();
+
+void __attribute__((arm_new_za)) new_za_c ();
+void new_za_c () {}
+
+void new_za_d ();
+void __attribute__((arm_new_za)) new_za_d () {}
+
+void __attribute__((arm_new_za)) new_za_e () {}
+void new_za_e ();
+
+void new_za_f () {}
+void __attribute__((arm_new_za)) new_za_f (); // { dg-error "cannot apply attribute 'arm_new_za' to 'new_za_f' after the function has been defined" }
+
+extern void (*new_za_g) ();
+extern __attribute__((arm_new_za)) void (*new_za_g) (); // { dg-error "applies only to functions" }
+
+extern __attribute__((arm_new_za)) void (*new_za_h) (); // { dg-error "applies only to functions" }
+extern void (*new_za_h) ();
+
+//----------------------------------------------------------------------------
+
+void __attribute__((arm_new_za)) shared_a ();
+void __attribute__((arm_shared_za)) shared_a (); // { dg-warning "conflicts with attribute" }
+
+void __attribute__((arm_shared_za)) shared_b ();
+void __attribute__((arm_new_za)) shared_b (); // { dg-error "conflicting types" }
+// { dg-warning "conflicts with attribute" "" { target *-*-* } .-1 }
+
+void __attribute__((arm_new_za)) shared_c ();
+void __attribute__((arm_shared_za)) shared_c () {} // { dg-warning "conflicts with attribute" }
+
+void __attribute__((arm_shared_za)) shared_d ();
+void __attribute__((arm_new_za)) shared_d () {} // { dg-warning "conflicts with attribute" }
+
+void __attribute__((arm_new_za)) shared_e () {}
+void __attribute__((arm_shared_za)) shared_e (); // { dg-warning "conflicts with attribute" }
+
+void __attribute__((arm_shared_za)) shared_f () {}
+void __attribute__((arm_new_za)) shared_f (); // { dg-error "conflicting types" }
+// { dg-warning "conflicts with attribute" "" { target *-*-* } .-1 }
+
+//----------------------------------------------------------------------------
+
+void __attribute__((arm_new_za)) preserves_a ();
+void __attribute__((arm_preserves_za)) preserves_a (); // { dg-warning "conflicts with attribute" }
+
+void __attribute__((arm_preserves_za)) preserves_b ();
+void __attribute__((arm_new_za)) preserves_b (); // { dg-error "conflicting types" }
+// { dg-warning "conflicts with attribute" "" { target *-*-* } .-1 }
+
+void __attribute__((arm_new_za)) preserves_c ();
+void __attribute__((arm_preserves_za)) preserves_c () {} // { dg-warning "conflicts with attribute" }
+
+void __attribute__((arm_preserves_za)) preserves_d ();
+void __attribute__((arm_new_za)) preserves_d () {} // { dg-warning "conflicts with attribute" }
+
+void __attribute__((arm_new_za)) preserves_e () {}
+void __attribute__((arm_preserves_za)) preserves_e (); // { dg-warning "conflicts with attribute" }
+
+void __attribute__((arm_preserves_za)) preserves_f () {}
+void __attribute__((arm_new_za)) preserves_f (); // { dg-error "conflicting types" }
+// { dg-warning "conflicts with attribute" "" { target *-*-* } .-1 }
+
+//----------------------------------------------------------------------------
+
+void __attribute__((arm_new_za, arm_shared_za)) contradiction_1(); // { dg-warning "conflicts with attribute" }
+void __attribute__((arm_shared_za, arm_new_za)) contradiction_2(); // { dg-warning "conflicts with attribute" }
+void __attribute__((arm_new_za, arm_preserves_za)) contradiction_3(); // { dg-warning "conflicts with attribute" }
+void __attribute__((arm_preserves_za, arm_new_za)) contradiction_4(); // { dg-warning "conflicts with attribute" }
+
+int __attribute__((arm_new_za)) int_attr; // { dg-error "applies only to functions" }
+typedef __attribute__((arm_new_za)) int int_typdef; // { dg-error "applies only to functions" }
+typedef void __attribute__((arm_new_za)) new_za_callback (); // { dg-error "applies only to functions" }
+
+//----------------------------------------------------------------------------
+
+void __attribute__((arm_streaming, arm_new_za)) complementary_1 () {}
+void __attribute__((arm_new_za, arm_streaming)) complementary_2 () {}
+void __attribute__((arm_streaming_compatible, arm_new_za)) complementary_3 () {}
+void __attribute__((arm_new_za, arm_streaming_compatible)) complementary_4 () {}
+
+//----------------------------------------------------------------------------
+
+#pragma GCC target "+nosme"
+
+void __attribute__((arm_new_za)) bereft_1 ();
+void __attribute__((arm_new_za)) bereft_2 () {} // { dg-error "functions with ZA state require the ISA extension 'sme'" }
+void __attribute__((arm_shared_za)) bereft_3 ();
+void __attribute__((arm_shared_za)) bereft_4 () {} // { dg-error "functions with ZA state require the ISA extension 'sme'" }
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/za_state_3.c b/gcc/testsuite/gcc.target/aarch64/sme/za_state_3.c
new file mode 100644
index 00000000000..fc5771070e4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/za_state_3.c
@@ -0,0 +1,27 @@
+// { dg-options "" }
+
+void normal_callee ();
+__attribute__((arm_shared_za)) void shared_callee ();
+__attribute__((arm_preserves_za)) void preserves_callee ();
+__attribute__((arm_shared_za, arm_preserves_za)) void shared_preserves_callee ();
+
+struct callbacks {
+  void (*normal_ptr) ();
+  __attribute__((arm_shared_za)) void (*shared_ptr) ();
+  __attribute__((arm_preserves_za)) void (*preserves_ptr) ();
+  __attribute__((arm_shared_za, arm_preserves_za)) void (*shared_preserves_ptr) ();
+};
+
+void
+normal_caller (struct callbacks *c)
+{
+  normal_callee ();
+  shared_callee (); // { dg-error "call to an 'arm_shared_za' function from a function that has no ZA state" }
+  preserves_callee ();
+  shared_preserves_callee (); // { dg-error "call to an 'arm_shared_za' function from a function that has no ZA state" }
+
+  c->normal_ptr ();
+  c->shared_ptr (); // { dg-error "call to an 'arm_shared_za' function from a function that has no ZA state" }
+  c->preserves_ptr ();
+  c->shared_preserves_ptr (); // { dg-error "call to an 'arm_shared_za' function from a function that has no ZA state" }
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/za_state_4.c b/gcc/testsuite/gcc.target/aarch64/sme/za_state_4.c
new file mode 100644
index 00000000000..34369101085
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/za_state_4.c
@@ -0,0 +1,277 @@
+// { dg-options "-O -fno-optimize-sibling-calls" }
+// { dg-final { check-function-bodies "**" "" } }
+
+void ns_normal_callee ();
+__attribute__((arm_shared_za)) void ns_shared_callee ();
+__attribute__((arm_preserves_za)) void ns_preserves_callee ();
+__attribute__((arm_shared_za, arm_preserves_za)) void ns_shared_preserves_callee ();
+
+__attribute__((arm_streaming)) void s_normal_callee ();
+__attribute__((arm_streaming, arm_shared_za)) void s_shared_callee ();
+__attribute__((arm_streaming, arm_preserves_za)) void s_preserves_callee ();
+__attribute__((arm_streaming, arm_shared_za, arm_preserves_za)) void s_shared_preserves_callee ();
+
+__attribute__((arm_streaming_compatible)) void sc_normal_callee ();
+__attribute__((arm_streaming_compatible, arm_shared_za)) void sc_shared_callee ();
+__attribute__((arm_streaming_compatible, arm_preserves_za)) void sc_preserves_callee ();
+__attribute__((arm_streaming_compatible, arm_shared_za, arm_preserves_za)) void sc_shared_preserves_callee ();
+
+struct callbacks {
+  void (*normal_ptr) ();
+  __attribute__((arm_shared_za)) void (*shared_ptr) ();
+  __attribute__((arm_preserves_za)) void (*preserves_ptr) ();
+  __attribute__((arm_shared_za, arm_preserves_za)) void (*shared_preserves_ptr) ();
+};
+
+/*
+** ns_caller1:
+**	...
+**	mrs	x11, tpidr2_el0
+**	cbz	x11, .*
+**	bl	__arm_tpidr2_save
+**	smstart	za
+**	...
+**	add	(x[0-9]+), x29, .*
+**	rdsvl	(x[0-9]+), #1
+**	mov	(x[0-9]+), sp
+**	msub	(x[0-9]+), \2, \2, \3
+**	mov	sp, \4
+**	stp	\4, \2, .*
+**	msr	tpidr2_el0, \1
+**	bl	ns_normal_callee
+**	mrs	x16, tpidr2_el0
+**	cbnz	x16, .*
+**	smstart	za
+**	mov	x0, \1
+**	bl	__arm_tpidr2_restore
+**	msr	tpidr2_el0, xzr
+**	bl	ns_shared_callee
+**	bl	ns_preserves_callee
+**	bl	ns_shared_preserves_callee
+**	msr	tpidr2_el0, \1
+**	ldr	x[0-9]+, .*
+**	blr	x[0-9]+
+**	mrs	x16, tpidr2_el0
+**	cbnz	x16, .*
+**	smstart	za
+**	mov	x0, \1
+**	bl	__arm_tpidr2_restore
+**	msr	tpidr2_el0, xzr
+**	ldr	x[0-9]+, .*
+**	blr	x[0-9]+
+**	ldr	x[0-9]+, .*
+**	blr	x[0-9]+
+**	ldr	x[0-9]+, .*
+**	blr	x[0-9]+
+**	smstop	za
+**	...
+*/
+void __attribute__((arm_new_za))
+ns_caller1 (struct callbacks *c)
+{
+  ns_normal_callee ();
+  ns_shared_callee ();
+  ns_preserves_callee ();
+  ns_shared_preserves_callee ();
+
+  c->normal_ptr ();
+  c->shared_ptr ();
+  c->preserves_ptr ();
+  c->shared_preserves_ptr ();
+}
+
+/*
+** ns_caller2:
+**	...
+**	mrs	x11, tpidr2_el0
+**	cbz	x11, .*
+**	bl	__arm_tpidr2_save
+**	msr	tpidr2_el0, xzr
+**	smstart	za
+**	bl	ns_shared_callee
+**	smstop	za
+**	...
+*/
+void __attribute__((arm_new_za))
+ns_caller2 (struct callbacks *c)
+{
+  ns_shared_callee ();
+}
+
+/*
+** ns_caller3:
+**	...
+**	mrs	x11, tpidr2_el0
+**	cbz	x11, .*
+**	bl	__arm_tpidr2_save
+**	msr	tpidr2_el0, xzr
+**	smstart	za
+**	bl	ns_preserves_callee
+**	bl	ns_shared_callee
+**	bl	ns_shared_preserves_callee
+**	smstop	za
+**	...
+*/
+void __attribute__((arm_new_za))
+ns_caller3 (struct callbacks *c)
+{
+  ns_preserves_callee ();
+  ns_shared_callee ();
+  ns_shared_preserves_callee ();
+}
+
+/*
+** ns_caller4:
+**	...
+**	mrs	x11, tpidr2_el0
+**	cbz	x11, .*
+**	bl	__arm_tpidr2_save
+**	smstart	za
+**	...
+**	add	(x[0-9]+), x29, .*
+**	rdsvl	(x[0-9]+), #1
+**	mov	(x[0-9]+), sp
+**	msub	(x[0-9]+), \2, \2, \3
+**	mov	sp, \4
+**	stp	\4, \2, .*
+**	msr	tpidr2_el0, \1
+**	smstart	sm
+**	bl	s_normal_callee
+**	smstop	sm
+**	mrs	x16, tpidr2_el0
+**	cbnz	x16, .*
+**	smstart	za
+**	mov	x0, \1
+**	bl	__arm_tpidr2_restore
+**	msr	tpidr2_el0, xzr
+**	smstart	sm
+**	bl	s_shared_callee
+**	smstop	sm
+**	smstart	sm
+**	bl	s_preserves_callee
+**	smstop	sm
+**	smstart	sm
+**	bl	s_shared_preserves_callee
+**	smstop	sm
+**	smstop	za
+**	...
+*/
+void __attribute__((arm_new_za))
+ns_caller4 (struct callbacks *c)
+{
+  s_normal_callee ();
+  s_shared_callee ();
+  s_preserves_callee ();
+  s_shared_preserves_callee ();
+}
+
+/*
+** ns_caller5:
+**	...
+**	mrs	x11, tpidr2_el0
+**	cbz	x11, .*
+**	bl	__arm_tpidr2_save
+**	smstart	za
+**	...
+**	add	(x[0-9]+), x29, .*
+**	rdsvl	(x[0-9]+), #1
+**	mov	(x[0-9]+), sp
+**	msub	(x[0-9]+), \2, \2, \3
+**	mov	sp, \4
+**	stp	\4, \2, .*
+**	msr	tpidr2_el0, \1
+**	bl	sc_normal_callee
+**	mrs	x16, tpidr2_el0
+**	cbnz	x16, .*
+**	smstart	za
+**	mov	x0, \1
+**	bl	__arm_tpidr2_restore
+**	msr	tpidr2_el0, xzr
+**	bl	sc_shared_callee
+**	bl	sc_preserves_callee
+**	bl	sc_shared_preserves_callee
+**	smstop	za
+**	...
+*/
+void __attribute__((arm_new_za))
+ns_caller5 (struct callbacks *c)
+{
+  sc_normal_callee ();
+  sc_shared_callee ();
+  sc_preserves_callee ();
+  sc_shared_preserves_callee ();
+}
+
+/*
+** s_caller1:
+**	...
+**	mrs	x11, tpidr2_el0
+**	cbz	x11, .*
+**	bl	__arm_tpidr2_save
+**	smstart	za
+**	...
+**	add	(x[0-9]+), x29, .*
+**	cntb	(x[0-9]+)
+**	mov	(x[0-9]+), sp
+**	msub	(x[0-9]+), \2, \2, \3
+**	mov	sp, \4
+**	stp	\4, \2, .*
+**	msr	tpidr2_el0, \1
+**	bl	s_normal_callee
+**	mrs	x16, tpidr2_el0
+**	cbnz	x16, .*
+**	smstart	za
+**	mov	x0, \1
+**	bl	__arm_tpidr2_restore
+**	msr	tpidr2_el0, xzr
+**	bl	s_shared_callee
+**	bl	s_preserves_callee
+**	bl	s_shared_preserves_callee
+**	smstop	za
+**	...
+*/
+void __attribute__((arm_new_za, arm_streaming))
+s_caller1 (struct callbacks *c)
+{
+  s_normal_callee ();
+  s_shared_callee ();
+  s_preserves_callee ();
+  s_shared_preserves_callee ();
+}
+
+/*
+** sc_caller1:
+**	...
+**	mrs	x11, tpidr2_el0
+**	cbz	x11, .*
+**	bl	__arm_tpidr2_save
+**	smstart	za
+**	...
+**	add	(x[0-9]+), x29, .*
+**	rdsvl	(x[0-9]+), #1
+**	mov	(x[0-9]+), sp
+**	msub	(x[0-9]+), \2, \2, \3
+**	mov	sp, \4
+**	stp	\4, \2, .*
+**	msr	tpidr2_el0, \1
+**	bl	sc_normal_callee
+**	mrs	x16, tpidr2_el0
+**	cbnz	x16, .*
+**	smstart	za
+**	mov	x0, \1
+**	bl	__arm_tpidr2_restore
+**	msr	tpidr2_el0, xzr
+**	bl	sc_shared_callee
+**	bl	sc_preserves_callee
+**	bl	sc_shared_preserves_callee
+**	smstop	za
+**	...
+*/
+void __attribute__((arm_new_za, arm_streaming_compatible))
+sc_caller1 (struct callbacks *c)
+{
+  sc_normal_callee ();
+  sc_shared_callee ();
+  sc_preserves_callee ();
+  sc_shared_preserves_callee ();
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/za_state_5.c b/gcc/testsuite/gcc.target/aarch64/sme/za_state_5.c
new file mode 100644
index 00000000000..b18d3fff652
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/za_state_5.c
@@ -0,0 +1,241 @@
+// { dg-options "-O -fno-optimize-sibling-calls" }
+// { dg-final { check-function-bodies "**" "" } }
+
+void ns_normal_callee ();
+__attribute__((arm_shared_za)) void ns_shared_callee ();
+__attribute__((arm_preserves_za)) void ns_preserves_callee ();
+__attribute__((arm_shared_za, arm_preserves_za)) void ns_shared_preserves_callee ();
+
+__attribute__((arm_streaming)) void s_normal_callee ();
+__attribute__((arm_streaming, arm_shared_za)) void s_shared_callee ();
+__attribute__((arm_streaming, arm_preserves_za)) void s_preserves_callee ();
+__attribute__((arm_streaming, arm_shared_za, arm_preserves_za)) void s_shared_preserves_callee ();
+
+__attribute__((arm_streaming_compatible)) void sc_normal_callee ();
+__attribute__((arm_streaming_compatible, arm_shared_za)) void sc_shared_callee ();
+__attribute__((arm_streaming_compatible, arm_preserves_za)) void sc_preserves_callee ();
+__attribute__((arm_streaming_compatible, arm_shared_za, arm_preserves_za)) void sc_shared_preserves_callee ();
+
+struct callbacks {
+  void (*normal_ptr) ();
+  __attribute__((arm_shared_za)) void (*shared_ptr) ();
+  __attribute__((arm_preserves_za)) void (*preserves_ptr) ();
+  __attribute__((arm_shared_za, arm_preserves_za)) void (*shared_preserves_ptr) ();
+};
+
+/*
+** ns_caller1:
+**	...
+**	add	(x[0-9]+), x29, .*
+**	rdsvl	(x[0-9]+), #1
+**	mov	(x[0-9]+), sp
+**	msub	(x[0-9]+), \2, \2, \3
+**	mov	sp, \4
+**	stp	\4, \2, .*
+**	msr	tpidr2_el0, \1
+**	bl	ns_normal_callee
+**	mrs	x16, tpidr2_el0
+**	cbnz	x16, .*
+**	smstart	za
+**	mov	x0, \1
+**	bl	__arm_tpidr2_restore
+**	msr	tpidr2_el0, xzr
+**	bl	ns_shared_callee
+**	bl	ns_preserves_callee
+**	bl	ns_shared_preserves_callee
+**	msr	tpidr2_el0, \1
+**	ldr	x[0-9]+, .*
+**	blr	x[0-9]+
+**	mrs	x16, tpidr2_el0
+**	cbnz	x16, .*
+**	smstart	za
+**	mov	x0, \1
+**	bl	__arm_tpidr2_restore
+**	msr	tpidr2_el0, xzr
+**	ldr	x[0-9]+, .*
+**	blr	x[0-9]+
+**	ldr	x[0-9]+, .*
+**	blr	x[0-9]+
+**	ldr	x[0-9]+, .*
+**	blr	x[0-9]+
+**	...
+*/
+void __attribute__((arm_shared_za))
+ns_caller1 (struct callbacks *c)
+{
+  ns_normal_callee ();
+  ns_shared_callee ();
+  ns_preserves_callee ();
+  ns_shared_preserves_callee ();
+
+  c->normal_ptr ();
+  c->shared_ptr ();
+  c->preserves_ptr ();
+  c->shared_preserves_ptr ();
+}
+
+/*
+** ns_caller2:
+**	stp	x29, x30, \[sp, #?-16\]!
+**	mov	x29, sp
+**	bl	ns_shared_callee
+**	ldp	x29, x30, \[sp\], #?16
+**	ret
+*/
+void __attribute__((arm_shared_za))
+ns_caller2 (struct callbacks *c)
+{
+  ns_shared_callee ();
+}
+
+/*
+** ns_caller3:
+**	stp	x29, x30, \[sp, #?-16\]!
+**	mov	x29, sp
+**	bl	ns_preserves_callee
+**	bl	ns_shared_callee
+**	bl	ns_shared_preserves_callee
+**	ldp	x29, x30, \[sp\], #?16
+**	ret
+*/
+void __attribute__((arm_shared_za))
+ns_caller3 (struct callbacks *c)
+{
+  ns_preserves_callee ();
+  ns_shared_callee ();
+  ns_shared_preserves_callee ();
+}
+
+/*
+** ns_caller4:
+**	...
+**	add	(x[0-9]+), x29, .*
+**	rdsvl	(x[0-9]+), #1
+**	mov	(x[0-9]+), sp
+**	msub	(x[0-9]+), \2, \2, \3
+**	mov	sp, \4
+**	stp	\4, \2, .*
+**	msr	tpidr2_el0, \1
+**	smstart	sm
+**	bl	s_normal_callee
+**	smstop	sm
+**	mrs	x16, tpidr2_el0
+**	cbnz	x16, .*
+**	smstart	za
+**	mov	x0, \1
+**	bl	__arm_tpidr2_restore
+**	msr	tpidr2_el0, xzr
+**	smstart	sm
+**	bl	s_shared_callee
+**	smstop	sm
+**	smstart	sm
+**	bl	s_preserves_callee
+**	smstop	sm
+**	smstart	sm
+**	bl	s_shared_preserves_callee
+**	smstop	sm
+**	...
+*/
+void __attribute__((arm_shared_za))
+ns_caller4 (struct callbacks *c)
+{
+  s_normal_callee ();
+  s_shared_callee ();
+  s_preserves_callee ();
+  s_shared_preserves_callee ();
+}
+
+/*
+** ns_caller5:
+**	...
+**	add	(x[0-9]+), x29, .*
+**	rdsvl	(x[0-9]+), #1
+**	mov	(x[0-9]+), sp
+**	msub	(x[0-9]+), \2, \2, \3
+**	mov	sp, \4
+**	stp	\4, \2, .*
+**	msr	tpidr2_el0, \1
+**	bl	sc_normal_callee
+**	mrs	x16, tpidr2_el0
+**	cbnz	x16, .*
+**	smstart	za
+**	mov	x0, \1
+**	bl	__arm_tpidr2_restore
+**	msr	tpidr2_el0, xzr
+**	bl	sc_shared_callee
+**	bl	sc_preserves_callee
+**	bl	sc_shared_preserves_callee
+**	...
+*/
+void __attribute__((arm_shared_za))
+ns_caller5 (struct callbacks *c)
+{
+  sc_normal_callee ();
+  sc_shared_callee ();
+  sc_preserves_callee ();
+  sc_shared_preserves_callee ();
+}
+
+/*
+** s_caller1:
+**	...
+**	add	(x[0-9]+), x29, .*
+**	cntb	(x[0-9]+)
+**	mov	(x[0-9]+), sp
+**	msub	(x[0-9]+), \2, \2, \3
+**	mov	sp, \4
+**	stp	\4, \2, .*
+**	msr	tpidr2_el0, \1
+**	bl	s_normal_callee
+**	mrs	x16, tpidr2_el0
+**	cbnz	x16, .*
+**	smstart	za
+**	mov	x0, \1
+**	bl	__arm_tpidr2_restore
+**	msr	tpidr2_el0, xzr
+**	bl	s_shared_callee
+**	bl	s_preserves_callee
+**	bl	s_shared_preserves_callee
+**	...
+*/
+void __attribute__((arm_shared_za, arm_streaming))
+s_caller1 (struct callbacks *c)
+{
+  s_normal_callee ();
+  s_shared_callee ();
+  s_preserves_callee ();
+  s_shared_preserves_callee ();
+}
+
+/*
+** sc_caller1:
+**	...
+**	add	(x[0-9]+), x29, .*
+**	rdsvl	(x[0-9]+), #1
+**	mov	(x[0-9]+), sp
+**	msub	(x[0-9]+), \2, \2, \3
+**	mov	sp, \4
+**	stp	\4, \2, .*
+**	msr	tpidr2_el0, \1
+**	bl	sc_normal_callee
+**	mrs	x16, tpidr2_el0
+**	cbnz	x16, .*
+**	smstart	za
+**	mov	x0, \1
+**	bl	__arm_tpidr2_restore
+**	msr	tpidr2_el0, xzr
+**	bl	sc_shared_callee
+**	bl	sc_preserves_callee
+**	bl	sc_shared_preserves_callee
+**	...
+*/
+void __attribute__((arm_shared_za, arm_streaming_compatible))
+sc_caller1 (struct callbacks *c)
+{
+  sc_normal_callee ();
+  sc_shared_callee ();
+  sc_preserves_callee ();
+  sc_shared_preserves_callee ();
+}
+
+// { dg-final { scan-assembler-not {\tsmstop\tza} } }
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/za_state_6.c b/gcc/testsuite/gcc.target/aarch64/sme/za_state_6.c
new file mode 100644
index 00000000000..c0b9e2275f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/za_state_6.c
@@ -0,0 +1,132 @@
+// { dg-options "-O -fno-optimize-sibling-calls" }
+
+void ns_normal_callee ();
+__attribute__((arm_shared_za)) void ns_shared_callee ();
+__attribute__((arm_preserves_za)) void ns_preserves_callee ();
+__attribute__((arm_shared_za, arm_preserves_za)) void ns_shared_preserves_callee ();
+
+__attribute__((arm_streaming)) void s_normal_callee ();
+__attribute__((arm_streaming, arm_shared_za)) void s_shared_callee ();
+__attribute__((arm_streaming, arm_preserves_za)) void s_preserves_callee ();
+__attribute__((arm_streaming, arm_shared_za, arm_preserves_za)) void s_shared_preserves_callee ();
+
+__attribute__((arm_streaming_compatible)) void sc_normal_callee ();
+__attribute__((arm_streaming_compatible, arm_shared_za)) void sc_shared_callee ();
+__attribute__((arm_streaming_compatible, arm_preserves_za)) void sc_preserves_callee ();
+__attribute__((arm_streaming_compatible, arm_shared_za, arm_preserves_za)) void sc_shared_preserves_callee ();
+
+void __attribute__((arm_new_za))
+caller1 ()
+{
+  ns_normal_callee ();
+  ns_shared_callee ();
+  ns_preserves_callee ();
+  ns_shared_preserves_callee ();
+
+  s_normal_callee ();
+  s_shared_callee ();
+  s_preserves_callee ();
+  s_shared_preserves_callee ();
+
+  sc_normal_callee ();
+  sc_shared_callee ();
+  sc_preserves_callee ();
+  sc_shared_preserves_callee ();
+}
+
+void __attribute__((arm_shared_za))
+caller2 ()
+{
+  ns_normal_callee ();
+  ns_shared_callee ();
+  ns_preserves_callee ();
+  ns_shared_preserves_callee ();
+
+  s_normal_callee ();
+  s_shared_callee ();
+  s_preserves_callee ();
+  s_shared_preserves_callee ();
+
+  sc_normal_callee ();
+  sc_shared_callee ();
+  sc_preserves_callee ();
+  sc_shared_preserves_callee ();
+}
+
+void __attribute__((arm_new_za, arm_streaming))
+caller3 ()
+{
+  ns_normal_callee ();
+  ns_shared_callee ();
+  ns_preserves_callee ();
+  ns_shared_preserves_callee ();
+
+  s_normal_callee ();
+  s_shared_callee ();
+  s_preserves_callee ();
+  s_shared_preserves_callee ();
+
+  sc_normal_callee ();
+  sc_shared_callee ();
+  sc_preserves_callee ();
+  sc_shared_preserves_callee ();
+}
+
+void __attribute__((arm_shared_za, arm_streaming))
+caller4 ()
+{
+  ns_normal_callee ();
+  ns_shared_callee ();
+  ns_preserves_callee ();
+  ns_shared_preserves_callee ();
+
+  s_normal_callee ();
+  s_shared_callee ();
+  s_preserves_callee ();
+  s_shared_preserves_callee ();
+
+  sc_normal_callee ();
+  sc_shared_callee ();
+  sc_preserves_callee ();
+  sc_shared_preserves_callee ();
+}
+
+void __attribute__((arm_new_za, arm_streaming_compatible))
+caller5 ()
+{
+  ns_normal_callee ();
+  ns_shared_callee ();
+  ns_preserves_callee ();
+  ns_shared_preserves_callee ();
+
+  s_normal_callee ();
+  s_shared_callee ();
+  s_preserves_callee ();
+  s_shared_preserves_callee ();
+
+  sc_normal_callee ();
+  sc_shared_callee ();
+  sc_preserves_callee ();
+  sc_shared_preserves_callee ();
+}
+
+void __attribute__((arm_shared_za, arm_streaming_compatible))
+caller6 ()
+{
+  ns_normal_callee ();
+  ns_shared_callee ();
+  ns_preserves_callee ();
+  ns_shared_preserves_callee ();
+
+  s_normal_callee ();
+  s_shared_callee ();
+  s_preserves_callee ();
+  s_shared_preserves_callee ();
+
+  sc_normal_callee ();
+  sc_shared_callee ();
+  sc_preserves_callee ();
+  sc_shared_preserves_callee ();
+}
+
+// { dg-final { scan-assembler-times {\tmsr\ttpidr2_el0, xzr} 18 } }
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/za_state_7.c b/gcc/testsuite/gcc.target/aarch64/sme/za_state_7.c
new file mode 100644
index 00000000000..4b588517cb1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/za_state_7.c
@@ -0,0 +1,55 @@
+// { dg-options "-O -fno-optimize-sibling-calls -fomit-frame-pointer" }
+// { dg-final { check-function-bodies "**" "" } }
+
+/*
+** za1:
+**	mov	w0, #?1
+**	ret
+*/
+int __attribute__((arm_new_za))
+za1 ()
+{
+  asm ("");
+  return 1;
+}
+
+/*
+** za2:
+**	str	x30, \[sp, #?-16\]!
+**	mrs	x11, tpidr2_el0
+**	cbz	x11, .*
+**	bl	__arm_tpidr2_save
+**	msr	tpidr2_el0, xzr
+**	smstart	za
+**	mov	w0, #?1
+**	smstop	za
+**	ldr	x30, \[sp\], #?16
+**	ret
+*/
+int __attribute__((arm_new_za))
+za2 ()
+{
+  asm ("" ::: "za");
+  return 1;
+}
+
+/*
+** za3:
+**	str	x30, \[sp, #?-16\]!
+**	mrs	x11, tpidr2_el0
+**	cbz	x11, .*
+**	bl	__arm_tpidr2_save
+**	msr	tpidr2_el0, xzr
+**	smstart	za
+**	mov	w0, w2
+**	smstop	za
+**	ldr	x30, \[sp\], #?16
+**	ret
+*/
+int __attribute__((arm_new_za))
+za3 ()
+{
+  register int ret asm ("x2");
+  asm ("" : "=r" (ret) :: "za");
+  return ret;
+}
-- 
2.25.1



* [PATCH 07/16] aarch64: Add a register class for w12-w15
  2022-11-13  9:59 [PATCH 00/16] aarch64: Add support for SME Richard Sandiford
                   ` (5 preceding siblings ...)
  2022-11-13 10:01 ` [PATCH 06/16] aarch64: Add support for SME ZA attributes Richard Sandiford
@ 2022-11-13 10:01 ` Richard Sandiford
  2022-11-13 10:01 ` [PATCH 08/16] aarch64: Add a VNx1TI mode Richard Sandiford
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Richard Sandiford @ 2022-11-13 10:01 UTC (permalink / raw)
  To: gcc-patches

Some SME instructions use w12-w15 to index ZA.  This patch
adds a register class for that range.

gcc/
	* config/aarch64/aarch64.h (ZA_INDEX_REGNUM_P): New macro.
	(ZA_INDEX_REGS): New register class.
	(REG_CLASS_NAMES, REG_CLASS_CONTENTS): Add entries for it.
	* config/aarch64/aarch64.cc (aarch64_regno_regclass)
	(aarch64_class_max_nregs, aarch64_register_move_cost): Handle
	ZA_INDEX_REGS.
---
 gcc/config/aarch64/aarch64.cc | 12 +++++++-----
 gcc/config/aarch64/aarch64.h  |  6 ++++++
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index b200d2a9f80..d29cfefee6b 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -13553,6 +13553,9 @@ aarch64_label_mentioned_p (rtx x)
 enum reg_class
 aarch64_regno_regclass (unsigned regno)
 {
+  if (ZA_INDEX_REGNUM_P (regno))
+    return ZA_INDEX_REGS;
+
   if (STUB_REGNUM_P (regno))
     return STUB_REGS;
 
@@ -13917,6 +13920,7 @@ aarch64_class_max_nregs (reg_class_t regclass, machine_mode mode)
   unsigned int nregs, vec_flags;
   switch (regclass)
     {
+    case ZA_INDEX_REGS:
     case STUB_REGS:
     case TAILCALL_ADDR_REGS:
     case POINTER_REGS:
@@ -16252,13 +16256,11 @@ aarch64_register_move_cost (machine_mode mode,
   const struct cpu_regmove_cost *regmove_cost
     = aarch64_tune_params.regmove_cost;
 
-  /* Caller save and pointer regs are equivalent to GENERAL_REGS.  */
-  if (to == TAILCALL_ADDR_REGS || to == POINTER_REGS
-      || to == STUB_REGS)
+  /* Treat any subset of GENERAL_REGS as though it were GENERAL_REGS.  */
+  if (reg_class_subset_p (to, GENERAL_REGS))
     to = GENERAL_REGS;
 
-  if (from == TAILCALL_ADDR_REGS || from == POINTER_REGS
-      || from == STUB_REGS)
+  if (reg_class_subset_p (from, GENERAL_REGS))
     from = GENERAL_REGS;
 
   /* Make RDFFR very expensive.  In particular, if we know that the FFR
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index b5877e7e61e..bfa28726221 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -643,6 +643,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
    && (REGNO) != R17_REGNUM \
    && (REGNO) != R30_REGNUM) \
 
+#define ZA_INDEX_REGNUM_P(REGNO) \
+  IN_RANGE (REGNO, R12_REGNUM, R15_REGNUM)
+
 #define FP_REGNUM_P(REGNO)			\
   (((unsigned) (REGNO - V0_REGNUM)) <= (V31_REGNUM - V0_REGNUM))
 
@@ -666,6 +669,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
 enum reg_class
 {
   NO_REGS,
+  ZA_INDEX_REGS,
   TAILCALL_ADDR_REGS,
   STUB_REGS,
   GENERAL_REGS,
@@ -690,6 +694,7 @@ enum reg_class
 #define REG_CLASS_NAMES				\
 {						\
   "NO_REGS",					\
+  "ZA_INDEX_REGS",				\
   "TAILCALL_ADDR_REGS",				\
   "STUB_REGS",					\
   "GENERAL_REGS",				\
@@ -711,6 +716,7 @@ enum reg_class
 #define REG_CLASS_CONTENTS						\
 {									\
   { 0x00000000, 0x00000000, 0x00000000 },	/* NO_REGS */		\
+  { 0x0000f000, 0x00000000, 0x00000000 },	/* ZA_INDEX_REGS */	\
   { 0x00030000, 0x00000000, 0x00000000 },	/* TAILCALL_ADDR_REGS */\
   { 0x3ffcffff, 0x00000000, 0x00000000 },	/* STUB_REGS */		\
   { 0x7fffffff, 0x00000000, 0x00000003 },	/* GENERAL_REGS */	\
-- 
2.25.1



* [PATCH 08/16] aarch64: Add a VNx1TI mode
  2022-11-13  9:59 [PATCH 00/16] aarch64: Add support for SME Richard Sandiford
                   ` (6 preceding siblings ...)
  2022-11-13 10:01 ` [PATCH 07/16] aarch64: Add a register class for w12-w15 Richard Sandiford
@ 2022-11-13 10:01 ` Richard Sandiford
  2022-11-13 10:01 ` [PATCH 09/16] aarch64: Make AARCH64_FL_SVE requirements explicit Richard Sandiford
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Richard Sandiford @ 2022-11-13 10:01 UTC (permalink / raw)
  To: gcc-patches

Although TI isn't really a native SVE element mode, it's convenient
for SME if we define VNx1TI anyway, so that it can be used to
distinguish .Q ZA operations from others.  It's purely an RTL
convenience and isn't (yet) a valid storage mode.

gcc/
	* config/aarch64/aarch64-modes.def: Add VNx1TI.
---
 gcc/config/aarch64/aarch64-modes.def | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def
index 0fd4c32ad0b..e960b649a6b 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -148,7 +148,7 @@ ADV_SIMD_Q_REG_STRUCT_MODES (4, V4x16, V4x8, V4x4, V4x2)
    for 8-bit, 16-bit, 32-bit and 64-bit elements respectively.  It isn't
    strictly necessary to set the alignment here, since the default would
    be clamped to BIGGEST_ALIGNMENT anyhow, but it seems clearer.  */
-#define SVE_MODES(NVECS, VB, VH, VS, VD) \
+#define SVE_MODES(NVECS, VB, VH, VS, VD, VT) \
   VECTOR_MODES_WITH_PREFIX (VNx, INT, 16 * NVECS, NVECS == 1 ? 1 : 4); \
   VECTOR_MODES_WITH_PREFIX (VNx, FLOAT, 16 * NVECS, NVECS == 1 ? 1 : 4); \
   \
@@ -156,6 +156,7 @@ ADV_SIMD_Q_REG_STRUCT_MODES (4, V4x16, V4x8, V4x4, V4x2)
   ADJUST_NUNITS (VH##HI, aarch64_sve_vg * NVECS * 4); \
   ADJUST_NUNITS (VS##SI, aarch64_sve_vg * NVECS * 2); \
   ADJUST_NUNITS (VD##DI, aarch64_sve_vg * NVECS); \
+  ADJUST_NUNITS (VT##TI, exact_div (aarch64_sve_vg * NVECS, 2)); \
   ADJUST_NUNITS (VH##BF, aarch64_sve_vg * NVECS * 4); \
   ADJUST_NUNITS (VH##HF, aarch64_sve_vg * NVECS * 4); \
   ADJUST_NUNITS (VS##SF, aarch64_sve_vg * NVECS * 2); \
@@ -165,17 +166,23 @@ ADV_SIMD_Q_REG_STRUCT_MODES (4, V4x16, V4x8, V4x4, V4x2)
   ADJUST_ALIGNMENT (VH##HI, 16); \
   ADJUST_ALIGNMENT (VS##SI, 16); \
   ADJUST_ALIGNMENT (VD##DI, 16); \
+  ADJUST_ALIGNMENT (VT##TI, 16); \
   ADJUST_ALIGNMENT (VH##BF, 16); \
   ADJUST_ALIGNMENT (VH##HF, 16); \
   ADJUST_ALIGNMENT (VS##SF, 16); \
   ADJUST_ALIGNMENT (VD##DF, 16);
 
-/* Give SVE vectors the names normally used for 256-bit vectors.
-   The actual number depends on command-line flags.  */
-SVE_MODES (1, VNx16, VNx8, VNx4, VNx2)
-SVE_MODES (2, VNx32, VNx16, VNx8, VNx4)
-SVE_MODES (3, VNx48, VNx24, VNx12, VNx6)
-SVE_MODES (4, VNx64, VNx32, VNx16, VNx8)
+/* Give SVE vectors names of the form VNxX, where X describes what is
+   stored in each 128-bit unit.  The actual size of the mode depends
+   on command-line flags.
+
+   VNx1TI isn't really a native SVE mode, but it can be useful in some
+   limited situations.  */
+VECTOR_MODE_WITH_PREFIX (VNx, INT, TI, 1, 1);
+SVE_MODES (1, VNx16, VNx8, VNx4, VNx2, VNx1)
+SVE_MODES (2, VNx32, VNx16, VNx8, VNx4, VNx2)
+SVE_MODES (3, VNx48, VNx24, VNx12, VNx6, VNx3)
+SVE_MODES (4, VNx64, VNx32, VNx16, VNx8, VNx4)
 
 /* Partial SVE vectors:
 
-- 
2.25.1



* [PATCH 09/16] aarch64: Make AARCH64_FL_SVE requirements explicit
  2022-11-13  9:59 [PATCH 00/16] aarch64: Add support for SME Richard Sandiford
                   ` (7 preceding siblings ...)
  2022-11-13 10:01 ` [PATCH 08/16] aarch64: Add a VNx1TI mode Richard Sandiford
@ 2022-11-13 10:01 ` Richard Sandiford
  2022-11-13 10:02 ` [PATCH 10/16] aarch64: Generalise unspec_based_function_base Richard Sandiford
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Richard Sandiford @ 2022-11-13 10:01 UTC (permalink / raw)
  To: gcc-patches

So far, all intrinsics covered by the aarch64-sve-builtins*
framework have (naturally enough) required at least SVE.
However, arm_sme.h defines a couple of intrinsics that can
be called by any code.  It's therefore necessary to make
the implicit SVE requirement explicit.

gcc/
	* config/aarch64/aarch64-sve-builtins.cc (function_groups): Remove
	implied requirement on SVE.
	* config/aarch64/aarch64-sve-builtins-base.def: Explicitly require SVE.
	* config/aarch64/aarch64-sve-builtins-sve2.def: Likewise.
---
 .../aarch64/aarch64-sve-builtins-base.def     | 26 ++++++++++++-------
 .../aarch64/aarch64-sve-builtins-sve2.def     | 18 ++++++++-----
 gcc/config/aarch64/aarch64-sve-builtins.cc    |  2 +-
 3 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.def b/gcc/config/aarch64/aarch64-sve-builtins-base.def
index a2d0cea6c5b..d35cdffe20f 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.def
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.def
@@ -17,7 +17,7 @@
    along with GCC; see the file COPYING3.  If not see
    <http://www.gnu.org/licenses/>.  */
 
-#define REQUIRED_EXTENSIONS 0
+#define REQUIRED_EXTENSIONS AARCH64_FL_SVE
 DEF_SVE_FUNCTION (svabd, binary_opt_n, all_arith, mxz)
 DEF_SVE_FUNCTION (svabs, unary, all_float_and_signed, mxz)
 DEF_SVE_FUNCTION (svacge, compare_opt_n, all_float, implicit)
@@ -255,7 +255,7 @@ DEF_SVE_FUNCTION (svzip2, binary, all_data, none)
 DEF_SVE_FUNCTION (svzip2, binary_pred, all_pred, none)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS AARCH64_FL_SM_OFF
+#define REQUIRED_EXTENSIONS AARCH64_FL_SVE | AARCH64_FL_SM_OFF
 DEF_SVE_FUNCTION (svadda, fold_left, all_float, implicit)
 DEF_SVE_FUNCTION (svadrb, adr_offset, none, none)
 DEF_SVE_FUNCTION (svadrd, adr_index, none, none)
@@ -321,7 +321,7 @@ DEF_SVE_FUNCTION (svtssel, binary_uint, all_float, none)
 DEF_SVE_FUNCTION (svwrffr, setffr, none, implicit)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS AARCH64_FL_BF16
+#define REQUIRED_EXTENSIONS AARCH64_FL_SVE | AARCH64_FL_BF16
 DEF_SVE_FUNCTION (svbfdot, ternary_bfloat_opt_n, s_float, none)
 DEF_SVE_FUNCTION (svbfdot_lane, ternary_bfloat_lanex2, s_float, none)
 DEF_SVE_FUNCTION (svbfmlalb, ternary_bfloat_opt_n, s_float, none)
@@ -332,27 +332,33 @@ DEF_SVE_FUNCTION (svcvt, unary_convert, cvt_bfloat, mxz)
 DEF_SVE_FUNCTION (svcvtnt, unary_convert_narrowt, cvt_bfloat, mx)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS AARCH64_FL_BF16 | AARCH64_FL_SM_OFF
+#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE \
+			     | AARCH64_FL_BF16 \
+			     | AARCH64_FL_SM_OFF)
 DEF_SVE_FUNCTION (svbfmmla, ternary_bfloat, s_float, none)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS AARCH64_FL_I8MM
+#define REQUIRED_EXTENSIONS AARCH64_FL_SVE | AARCH64_FL_I8MM
 DEF_SVE_FUNCTION (svsudot, ternary_intq_uintq_opt_n, s_signed, none)
 DEF_SVE_FUNCTION (svsudot_lane, ternary_intq_uintq_lane, s_signed, none)
 DEF_SVE_FUNCTION (svusdot, ternary_uintq_intq_opt_n, s_signed, none)
 DEF_SVE_FUNCTION (svusdot_lane, ternary_uintq_intq_lane, s_signed, none)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS AARCH64_FL_I8MM | AARCH64_FL_SM_OFF
+#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE \
+			     | AARCH64_FL_I8MM \
+			     | AARCH64_FL_SM_OFF)
 DEF_SVE_FUNCTION (svmmla, mmla, s_integer, none)
 DEF_SVE_FUNCTION (svusmmla, ternary_uintq_intq, s_signed, none)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS AARCH64_FL_F32MM | AARCH64_FL_SM_OFF
+#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE \
+			     | AARCH64_FL_F32MM \
+			     | AARCH64_FL_SM_OFF)
 DEF_SVE_FUNCTION (svmmla, mmla, s_float, none)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS AARCH64_FL_F64MM
+#define REQUIRED_EXTENSIONS AARCH64_FL_SVE | AARCH64_FL_F64MM
 DEF_SVE_FUNCTION (svtrn1q, binary, all_data, none)
 DEF_SVE_FUNCTION (svtrn2q, binary, all_data, none)
 DEF_SVE_FUNCTION (svuzp1q, binary, all_data, none)
@@ -361,7 +367,9 @@ DEF_SVE_FUNCTION (svzip1q, binary, all_data, none)
 DEF_SVE_FUNCTION (svzip2q, binary, all_data, none)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS AARCH64_FL_F64MM | AARCH64_FL_SM_OFF
+#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE \
+			     | AARCH64_FL_F64MM \
+			     | AARCH64_FL_SM_OFF)
 DEF_SVE_FUNCTION (svld1ro, load_replicate, all_data, implicit)
 DEF_SVE_FUNCTION (svmmla, mmla, d_float, none)
 #undef REQUIRED_EXTENSIONS
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sve2.def b/gcc/config/aarch64/aarch64-sve-builtins-sve2.def
index 4e0466b4cf8..3c0a0e072f2 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-sve2.def
+++ b/gcc/config/aarch64/aarch64-sve-builtins-sve2.def
@@ -17,7 +17,7 @@
    along with GCC; see the file COPYING3.  If not see
    <http://www.gnu.org/licenses/>.  */
 
-#define REQUIRED_EXTENSIONS AARCH64_FL_SVE2
+#define REQUIRED_EXTENSIONS AARCH64_FL_SVE | AARCH64_FL_SVE2
 DEF_SVE_FUNCTION (svaba, ternary_opt_n, all_integer, none)
 DEF_SVE_FUNCTION (svabalb, ternary_long_opt_n, hsd_integer, none)
 DEF_SVE_FUNCTION (svabalt, ternary_long_opt_n, hsd_integer, none)
@@ -166,7 +166,9 @@ DEF_SVE_FUNCTION (svwhilewr, compare_ptr, all_data, none)
 DEF_SVE_FUNCTION (svxar, ternary_shift_right_imm, all_integer, none)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS AARCH64_FL_SVE2 | AARCH64_FL_SM_OFF
+#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE \
+			     | AARCH64_FL_SVE2 \
+			     | AARCH64_FL_SM_OFF)
 DEF_SVE_FUNCTION (svhistcnt, binary_to_uint, sd_integer, z)
 DEF_SVE_FUNCTION (svhistseg, binary_to_uint, b_integer, none)
 DEF_SVE_FUNCTION (svldnt1_gather, load_gather_sv_restricted, sd_data, implicit)
@@ -192,7 +194,8 @@ DEF_SVE_FUNCTION (svstnt1w_scatter, store_scatter_index_restricted, d_integer, i
 DEF_SVE_FUNCTION (svstnt1w_scatter, store_scatter_offset_restricted, d_integer, implicit)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE2 \
+#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE \
+			     | AARCH64_FL_SVE2 \
 			     | AARCH64_FL_SVE2_AES \
 			     | AARCH64_FL_SM_OFF)
 DEF_SVE_FUNCTION (svaesd, binary, b_unsigned, none)
@@ -203,7 +206,8 @@ DEF_SVE_FUNCTION (svpmullb_pair, binary_opt_n, d_unsigned, none)
 DEF_SVE_FUNCTION (svpmullt_pair, binary_opt_n, d_unsigned, none)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE2 \
+#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE \
+			     | AARCH64_FL_SVE2 \
 			     | AARCH64_FL_SVE2_BITPERM \
 			     | AARCH64_FL_SM_OFF)
 DEF_SVE_FUNCTION (svbdep, binary_opt_n, all_unsigned, none)
@@ -211,13 +215,15 @@ DEF_SVE_FUNCTION (svbext, binary_opt_n, all_unsigned, none)
 DEF_SVE_FUNCTION (svbgrp, binary_opt_n, all_unsigned, none)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE2 \
+#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE \
+			     | AARCH64_FL_SVE2 \
 			     | AARCH64_FL_SVE2_SHA3 \
 			     | AARCH64_FL_SM_OFF)
 DEF_SVE_FUNCTION (svrax1, binary, d_integer, none)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE2 \
+#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE \
+			     | AARCH64_FL_SVE2 \
 			     | AARCH64_FL_SVE2_SM4 \
 			     | AARCH64_FL_SM_OFF)
 DEF_SVE_FUNCTION (svsm4e, binary, s_unsigned, none)
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index a6de1068da9..cb3eb76dd77 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -525,7 +525,7 @@ static const predication_index preds_z[] = { PRED_z, NUM_PREDS };
 static CONSTEXPR const function_group_info function_groups[] = {
 #define DEF_SVE_FUNCTION(NAME, SHAPE, TYPES, PREDS) \
   { #NAME, &functions::NAME, &shapes::SHAPE, types_##TYPES, preds_##PREDS, \
-    REQUIRED_EXTENSIONS | AARCH64_FL_SVE },
+    REQUIRED_EXTENSIONS },
 #include "aarch64-sve-builtins.def"
 };
 
-- 
2.25.1



* [PATCH 10/16] aarch64: Generalise unspec_based_function_base
  2022-11-13  9:59 [PATCH 00/16] aarch64: Add support for SME Richard Sandiford
                   ` (8 preceding siblings ...)
  2022-11-13 10:01 ` [PATCH 09/16] aarch64: Make AARCH64_FL_SVE requirements explicit Richard Sandiford
@ 2022-11-13 10:02 ` Richard Sandiford
  2022-11-13 10:02 ` [PATCH 11/16] aarch64: Generalise _m rules for SVE intrinsics Richard Sandiford
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Richard Sandiford @ 2022-11-13 10:02 UTC (permalink / raw)
  To: gcc-patches

Until now, SVE intrinsics that map directly to unspecs
have always used type suffix 0 to distinguish between signed
integers, unsigned integers, and floating-point values.
SME adds functions that need to use type suffix 1 instead.
This patch generalises the classes accordingly.

gcc/
	* config/aarch64/aarch64-sve-builtins-functions.h
	(unspec_based_function_base): Allow type suffix 1 to determine
	the mode of the operation.
	(unspec_based_fused_function): Update accordingly.
	(unspec_based_fused_lane_function): Likewise.
---
 .../aarch64/aarch64-sve-builtins-functions.h  | 29 ++++++++++++-------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-functions.h b/gcc/config/aarch64/aarch64-sve-builtins-functions.h
index 472e26c17ff..2fd135aab07 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-functions.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins-functions.h
@@ -229,18 +229,21 @@ class unspec_based_function_base : public function_base
 public:
   CONSTEXPR unspec_based_function_base (int unspec_for_sint,
 					int unspec_for_uint,
-					int unspec_for_fp)
+					int unspec_for_fp,
+					unsigned int suffix_index = 0)
     : m_unspec_for_sint (unspec_for_sint),
       m_unspec_for_uint (unspec_for_uint),
-      m_unspec_for_fp (unspec_for_fp)
+      m_unspec_for_fp (unspec_for_fp),
+      m_suffix_index (suffix_index)
   {}
 
   /* Return the unspec code to use for INSTANCE, based on type suffix 0.  */
   int
   unspec_for (const function_instance &instance) const
   {
-    return (!instance.type_suffix (0).integer_p ? m_unspec_for_fp
-	    : instance.type_suffix (0).unsigned_p ? m_unspec_for_uint
+    auto &suffix = instance.type_suffix (m_suffix_index);
+    return (!suffix.integer_p ? m_unspec_for_fp
+	    : suffix.unsigned_p ? m_unspec_for_uint
 	    : m_unspec_for_sint);
   }
 
@@ -249,6 +252,9 @@ public:
   int m_unspec_for_sint;
   int m_unspec_for_uint;
   int m_unspec_for_fp;
+
+  /* Which type suffix is used to choose between the unspecs.  */
+  unsigned int m_suffix_index;
 };
 
 /* A function_base for functions that have an associated unspec code.
@@ -301,7 +307,8 @@ public:
   rtx
   expand (function_expander &e) const override
   {
-    return e.use_exact_insn (CODE (unspec_for (e), e.vector_mode (0)));
+    return e.use_exact_insn (CODE (unspec_for (e),
+				   e.vector_mode (m_suffix_index)));
   }
 };
 
@@ -355,16 +362,16 @@ public:
   {
     int unspec = unspec_for (e);
     insn_code icode;
-    if (e.type_suffix (0).float_p)
+    if (e.type_suffix (m_suffix_index).float_p)
       {
 	/* Put the operands in the normal (fma ...) order, with the accumulator
 	   last.  This fits naturally since that's also the unprinted operand
 	   in the asm output.  */
 	e.rotate_inputs_left (0, e.pred != PRED_none ? 4 : 3);
-	icode = code_for_aarch64_sve (unspec, e.vector_mode (0));
+	icode = code_for_aarch64_sve (unspec, e.vector_mode (m_suffix_index));
       }
     else
-      icode = INT_CODE (unspec, e.vector_mode (0));
+      icode = INT_CODE (unspec, e.vector_mode (m_suffix_index));
     return e.use_exact_insn (icode);
   }
 };
@@ -385,16 +392,16 @@ public:
   {
     int unspec = unspec_for (e);
     insn_code icode;
-    if (e.type_suffix (0).float_p)
+    if (e.type_suffix (m_suffix_index).float_p)
       {
 	/* Put the operands in the normal (fma ...) order, with the accumulator
 	   last.  This fits naturally since that's also the unprinted operand
 	   in the asm output.  */
 	e.rotate_inputs_left (0, e.pred != PRED_none ? 5 : 4);
-	icode = code_for_aarch64_lane (unspec, e.vector_mode (0));
+	icode = code_for_aarch64_lane (unspec, e.vector_mode (m_suffix_index));
       }
     else
-      icode = INT_CODE (unspec, e.vector_mode (0));
+      icode = INT_CODE (unspec, e.vector_mode (m_suffix_index));
     return e.use_exact_insn (icode);
   }
 };
-- 
2.25.1



* [PATCH 11/16] aarch64: Generalise _m rules for SVE intrinsics
  2022-11-13  9:59 [PATCH 00/16] aarch64: Add support for SME Richard Sandiford
                   ` (9 preceding siblings ...)
  2022-11-13 10:02 ` [PATCH 10/16] aarch64: Generalise unspec_based_function_base Richard Sandiford
@ 2022-11-13 10:02 ` Richard Sandiford
  2022-11-13 10:02 ` [PATCH 12/16] aarch64: Tweaks to function_resolver::resolve_to Richard Sandiford
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Richard Sandiford @ 2022-11-13 10:02 UTC (permalink / raw)
  To: gcc-patches

In SVE there was a simple rule that unary merging (_m) intrinsics
had a separate initial argument to specify the values of inactive
lanes, whereas other merging functions took inactive lanes from
the first operand to the operation.

That rule began to break down in SVE2, and it continues to do
so in SME.  This patch therefore adds a virtual function to
specify whether the separate initial argument is present or not.
The old rule is still the default.

gcc/
	* config/aarch64/aarch64-sve-builtins.h
	(function_shape::has_merge_argument_p): New member function.
	* config/aarch64/aarch64-sve-builtins.cc
	(function_resolver::check_gp_argument): Use it.
	(function_expander::get_fallback_value): Likewise.
	* config/aarch64/aarch64-sve-builtins-shapes.cc
	(apply_predication): Likewise.
	(unary_convert_narrowt_def::has_merge_argument_p): New function.
---
 gcc/config/aarch64/aarch64-sve-builtins-shapes.cc | 10 ++++++++--
 gcc/config/aarch64/aarch64-sve-builtins.cc        |  4 ++--
 gcc/config/aarch64/aarch64-sve-builtins.h         | 13 +++++++++++++
 3 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
index 8e26bd8a60f..5b47dff0b41 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
@@ -66,8 +66,8 @@ apply_predication (const function_instance &instance, tree return_type,
 	 the same type as the result.  For unary_convert_narrowt it also
 	 provides the "bottom" half of active elements, and is present
 	 for all types of predication.  */
-      if ((argument_types.length () == 2 && instance.pred == PRED_m)
-	  || instance.shape == shapes::unary_convert_narrowt)
+      auto nargs = argument_types.length () - 1;
+      if (instance.shape->has_merge_argument_p (instance, nargs))
 	argument_types.quick_insert (0, return_type);
     }
 }
@@ -3238,6 +3238,12 @@ SHAPE (unary_convert)
    predicate.  */
 struct unary_convert_narrowt_def : public overloaded_base<1>
 {
+  bool
+  has_merge_argument_p (const function_instance &, unsigned int) const override
+  {
+    return true;
+  }
+
   void
   build (function_builder &b, const function_group_info &group) const override
   {
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index cb3eb76dd77..450a8d958a8 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -2152,7 +2152,7 @@ function_resolver::check_gp_argument (unsigned int nops,
   if (pred != PRED_none)
     {
       /* Unary merge operations should use resolve_unary instead.  */
-      gcc_assert (nops != 1 || pred != PRED_m);
+      gcc_assert (!shape->has_merge_argument_p (*this, nops));
       nargs = nops + 1;
       if (!check_num_arguments (nargs)
 	  || !require_vector_type (i, VECTOR_TYPE_svbool_t))
@@ -2790,7 +2790,7 @@ function_expander::get_fallback_value (machine_mode mode, unsigned int nops,
 
   gcc_assert (pred == PRED_m || pred == PRED_x);
   if (merge_argno == DEFAULT_MERGE_ARGNO)
-    merge_argno = nops == 1 && pred == PRED_m ? 0 : 1;
+    merge_argno = shape->has_merge_argument_p (*this, nops) ? 0 : 1;
 
   if (merge_argno == 0)
     return args[argno++];
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
index 0d130b871d0..623b9e3a07b 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins.h
@@ -636,6 +636,9 @@ public:
 class function_shape
 {
 public:
+  virtual bool has_merge_argument_p (const function_instance &,
+				     unsigned int) const;
+
   virtual bool explicit_type_suffix_p (unsigned int) const = 0;
 
   /* Define all functions associated with the given group.  */
@@ -877,6 +880,16 @@ function_base::call_properties (const function_instance &instance) const
   return flags;
 }
 
+/* Return true if INSTANCE (which has NARGS arguments) has an initial
+   vector argument whose only purpose is to specify the values of
+   inactive lanes.  */
+inline bool
+function_shape::has_merge_argument_p (const function_instance &instance,
+				      unsigned int nargs) const
+{
+  return nargs == 1 && instance.pred == PRED_m;
+}
+
 }
 
 #endif
-- 
2.25.1



* [PATCH 12/16] aarch64: Tweaks to function_resolver::resolve_to
  2022-11-13  9:59 [PATCH 00/16] aarch64: Add support for SME Richard Sandiford
                   ` (10 preceding siblings ...)
  2022-11-13 10:02 ` [PATCH 11/16] aarch64: Generalise _m rules for SVE intrinsics Richard Sandiford
@ 2022-11-13 10:02 ` Richard Sandiford
  2022-11-13 10:02 ` [PATCH 13/16] aarch64: Add support for <arm_sme.h> Richard Sandiford
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Richard Sandiford @ 2022-11-13 10:02 UTC (permalink / raw)
  To: gcc-patches

This patch adds a new interface to function_resolver::resolve_to
in which the mode suffix stays the same (which is the common case).
It then moves the handling of explicit first type suffixes from
function_resolver::resolve_unary to this new function.

This makes things slightly simpler for existing code.  However, the
main reason for doing it is that it helps require_derived_vector_type
handle explicit type suffixes correctly, which in turn improves the
error messages generated by the manual C overloading code in a
follow-up SME patch.

gcc/
	* config/aarch64/aarch64-sve-builtins.h
	(function_resolver::resolve_to): Add an overload that takes
	only the type suffixes.
	* config/aarch64/aarch64-sve-builtins.cc
	(function_resolver::resolve_to): Likewise.  Handle explicit type
	suffixes here rather than...
	(function_resolver::resolve_unary): ...here.
	(function_resolver::require_derived_vector_type): Simplify accordingly.
	(function_resolver::finish_opt_n_resolution): Likewise.
	(function_resolver::resolve_uniform): Likewise.
	* config/aarch64/aarch64-sve-builtins-shapes.cc
	(binary_imm_narrowt_base::resolve): Likewise.
	(load_contiguous_base::resolve): Likewise.
	(mmla_def::resolve): Likewise.
	(ternary_resize2_base::resolve): Likewise.
	(ternary_resize2_lane_base::resolve): Likewise.
	(unary_narrowt_base::resolve): Likewise.
	(binary_n_def::resolve): Likewise.
	(binary_uint_def::resolve): Likewise.
	(binary_uint_n_def::resolve): Likewise.
	(binary_uint64_n_def::resolve): Likewise.
	(binary_wide_def::resolve): Likewise.
	(compare_ptr_def::resolve): Likewise.
	(compare_scalar_def::resolve): Likewise.
	(fold_left_def::resolve): Likewise.
	(get_def::resolve): Likewise.
	(inc_dec_pred_def::resolve): Likewise.
	(inc_dec_pred_scalar_def::resolve): Likewise.
	(set_def::resolve): Likewise.
	(store_def::resolve): Likewise.
	(tbl_tuple_def::resolve): Likewise.
	(ternary_qq_lane_rotate_def::resolve): Likewise.
	(ternary_qq_rotate_def::resolve): Likewise.
	(ternary_uint_def::resolve): Likewise.
	(unary_def::resolve): Likewise.
	(unary_widen_def::resolve): Likewise.
---
 .../aarch64/aarch64-sve-builtins-shapes.cc    | 48 +++++++++----------
 gcc/config/aarch64/aarch64-sve-builtins.cc    | 34 +++++++++----
 gcc/config/aarch64/aarch64-sve-builtins.h     |  1 +
 3 files changed, 49 insertions(+), 34 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
index 5b47dff0b41..df2d5414c07 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
@@ -550,7 +550,7 @@ struct binary_imm_narrowt_base : public overloaded_base<0>
 	|| !r.require_integer_immediate (i + 2))
       return error_mark_node;
 
-    return r.resolve_to (r.mode_suffix_id, type);
+    return r.resolve_to (type);
   }
 };
 
@@ -649,7 +649,7 @@ struct load_contiguous_base : public overloaded_base<0>
 	|| (vnum_p && !r.require_scalar_type (i + 1, "int64_t")))
       return error_mark_node;
 
-    return r.resolve_to (r.mode_suffix_id, type);
+    return r.resolve_to (type);
   }
 };
 
@@ -739,7 +739,7 @@ struct mmla_def : public overloaded_base<0>
 
     /* Make sure that the function exists now, since not all forms
        follow a set pattern after this point.  */
-    tree res = r.resolve_to (r.mode_suffix_id, type);
+    tree res = r.resolve_to (type);
     if (res == error_mark_node)
       return res;
 
@@ -896,7 +896,7 @@ struct ternary_resize2_base : public overloaded_base<0>
 					   MODIFIER))
       return error_mark_node;
 
-    return r.resolve_to (r.mode_suffix_id, type);
+    return r.resolve_to (type);
   }
 };
 
@@ -921,7 +921,7 @@ struct ternary_resize2_lane_base : public overloaded_base<0>
 	|| !r.require_integer_immediate (i + 3))
       return error_mark_node;
 
-    return r.resolve_to (r.mode_suffix_id, type);
+    return r.resolve_to (type);
   }
 };
 
@@ -1012,7 +1012,7 @@ struct unary_narrowt_base : public overloaded_base<0>
 	|| !r.require_derived_vector_type (i, i + 1, type, CLASS, r.HALF_SIZE))
       return error_mark_node;
 
-    return r.resolve_to (r.mode_suffix_id, type);
+    return r.resolve_to (type);
   }
 };
 
@@ -1218,7 +1218,7 @@ struct binary_n_def : public overloaded_base<0>
 	|| !r.require_derived_scalar_type (i + 1, r.SAME_TYPE_CLASS))
       return error_mark_node;
 
-    return r.resolve_to (r.mode_suffix_id, type);
+    return r.resolve_to (type);
   }
 };
 SHAPE (binary_n)
@@ -1399,7 +1399,7 @@ struct binary_uint_def : public overloaded_base<0>
 	|| !r.require_derived_vector_type (i + 1, i, type, TYPE_unsigned))
       return error_mark_node;
 
-    return r.resolve_to (r.mode_suffix_id, type);
+    return r.resolve_to (type);
   }
 };
 SHAPE (binary_uint)
@@ -1427,7 +1427,7 @@ struct binary_uint_n_def : public overloaded_base<0>
 	|| !r.require_derived_scalar_type (i + 1, TYPE_unsigned))
       return error_mark_node;
 
-    return r.resolve_to (r.mode_suffix_id, type);
+    return r.resolve_to (type);
   }
 };
 SHAPE (binary_uint_n)
@@ -1484,7 +1484,7 @@ struct binary_uint64_n_def : public overloaded_base<0>
 	|| !r.require_scalar_type (i + 1, "uint64_t"))
       return error_mark_node;
 
-    return r.resolve_to (r.mode_suffix_id, type);
+    return r.resolve_to (type);
   }
 };
 SHAPE (binary_uint64_n)
@@ -1539,7 +1539,7 @@ struct binary_wide_def : public overloaded_base<0>
 					   r.HALF_SIZE))
       return error_mark_node;
 
-    return r.resolve_to (r.mode_suffix_id, type);
+    return r.resolve_to (type);
   }
 };
 SHAPE (binary_wide)
@@ -1671,7 +1671,7 @@ struct compare_ptr_def : public overloaded_base<0>
 	|| !r.require_matching_pointer_type (i + 1, i, type))
       return error_mark_node;
 
-    return r.resolve_to (r.mode_suffix_id, type);
+    return r.resolve_to (type);
   }
 };
 SHAPE (compare_ptr)
@@ -1700,7 +1700,7 @@ struct compare_scalar_def : public overloaded_base<1>
 	|| !r.require_matching_integer_scalar_type (i + 1, i, type))
       return error_mark_node;
 
-    return r.resolve_to (r.mode_suffix_id, r.type_suffix_ids[0], type);
+    return r.resolve_to (type);
   }
 };
 SHAPE (compare_scalar)
@@ -1877,7 +1877,7 @@ struct fold_left_def : public overloaded_base<0>
 	|| (type = r.infer_vector_type (i + 1)) == NUM_TYPE_SUFFIXES)
       return error_mark_node;
 
-    return r.resolve_to (r.mode_suffix_id, type);
+    return r.resolve_to (type);
   }
 };
 SHAPE (fold_left)
@@ -1905,7 +1905,7 @@ struct get_def : public overloaded_base<0>
 	|| !r.require_integer_immediate (i + 1))
       return error_mark_node;
 
-    return r.resolve_to (r.mode_suffix_id, type);
+    return r.resolve_to (type);
   }
 
   bool
@@ -1987,7 +1987,7 @@ struct inc_dec_pred_def : public overloaded_base<0>
 	|| !r.require_vector_type (i + 1, VECTOR_TYPE_svbool_t))
       return error_mark_node;
 
-    return r.resolve_to (r.mode_suffix_id, type);
+    return r.resolve_to (type);
   }
 };
 SHAPE (inc_dec_pred)
@@ -2014,7 +2014,7 @@ struct inc_dec_pred_scalar_def : public overloaded_base<2>
 	|| !r.require_vector_type (i + 1, VECTOR_TYPE_svbool_t))
       return error_mark_node;
 
-    return r.resolve_to (r.mode_suffix_id, type, r.type_suffix_ids[1]);
+    return r.resolve_to (type, r.type_suffix_ids[1]);
   }
 };
 SHAPE (inc_dec_pred_scalar)
@@ -2419,7 +2419,7 @@ struct set_def : public overloaded_base<0>
 	|| !r.require_derived_vector_type (i + 2, i, type))
       return error_mark_node;
 
-    return r.resolve_to (r.mode_suffix_id, type);
+    return r.resolve_to (type);
   }
 
   bool
@@ -2594,7 +2594,7 @@ struct store_def : public overloaded_base<0>
 	|| ((type = r.infer_tuple_type (nargs - 1)) == NUM_TYPE_SUFFIXES))
       return error_mark_node;
 
-    return r.resolve_to (r.mode_suffix_id, type);
+    return r.resolve_to (type);
   }
 };
 SHAPE (store)
@@ -2714,7 +2714,7 @@ struct tbl_tuple_def : public overloaded_base<0>
 	|| !r.require_derived_vector_type (i + 1, i, type, TYPE_unsigned))
       return error_mark_node;
 
-    return r.resolve_to (r.mode_suffix_id, type);
+    return r.resolve_to (type);
   }
 };
 SHAPE (tbl_tuple)
@@ -2959,7 +2959,7 @@ struct ternary_qq_lane_rotate_def : public overloaded_base<0>
 	|| !r.require_integer_immediate (i + 4))
       return error_mark_node;
 
-    return r.resolve_to (r.mode_suffix_id, type);
+    return r.resolve_to (type);
   }
 
   bool
@@ -3018,7 +3018,7 @@ struct ternary_qq_rotate_def : public overloaded_base<0>
 	|| !r.require_integer_immediate (i + 3))
       return error_mark_node;
 
-    return r.resolve_to (r.mode_suffix_id, type);
+    return r.resolve_to (type);
   }
 
   bool
@@ -3107,7 +3107,7 @@ struct ternary_uint_def : public overloaded_base<0>
 	|| !r.require_derived_vector_type (i + 2, i, type, TYPE_unsigned))
       return error_mark_node;
 
-    return r.resolve_to (r.mode_suffix_id, type);
+    return r.resolve_to (type);
   }
 };
 SHAPE (ternary_uint)
@@ -3437,7 +3437,7 @@ struct unary_widen_def : public overloaded_base<0>
 
     /* There is only a single form for predicates.  */
     if (type == TYPE_SUFFIX_b)
-      return r.resolve_to (r.mode_suffix_id, type);
+      return r.resolve_to (type);
 
     if (type_suffixes[type].integer_p
 	&& type_suffixes[type].element_bits < 64)
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 450a8d958a8..e50a58dcc0a 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -1255,6 +1255,25 @@ function_resolver::resolve_to (mode_suffix_index mode,
   return res;
 }
 
+/* A cut-down interface to the function above that keeps the mode suffix
+   unchanged.  As a convenience, resolve_to (TYPE0) can be used for functions
+   whose first type suffix is explicit, with TYPE0 then describing the
+   second type suffix rather than the first.  */
+tree
+function_resolver::resolve_to (type_suffix_index type0,
+			       type_suffix_index type1)
+{
+  /* Handle convert-like functions in which the first type suffix is
+     explicit.  */
+  if (type_suffix_ids[0] != NUM_TYPE_SUFFIXES && type1 == NUM_TYPE_SUFFIXES)
+    {
+      type1 = type0;
+      type0 = type_suffix_ids[0];
+    }
+
+  return resolve_to (mode_suffix_id, type0, type1);
+}
+
 /* Require argument ARGNO to be a 32-bit or 64-bit scalar integer type.
    Return the associated type suffix on success, otherwise report an
    error and return NUM_TYPE_SUFFIXES.  */
@@ -1636,7 +1655,7 @@ require_derived_vector_type (unsigned int argno,
 
   /* Make sure that FIRST_TYPE itself is sensible before using it
      as a basis for an error message.  */
-  if (resolve_to (mode_suffix_id, first_type) == error_mark_node)
+  if (resolve_to (first_type) == error_mark_node)
     return false;
 
   /* If the arguments have consistent type classes, but a link between
@@ -2202,7 +2221,7 @@ finish_opt_n_resolution (unsigned int argno, unsigned int first_argno,
 
       /* Check the vector form normally.  If that succeeds, raise an
 	 error about having no corresponding _n form.  */
-      tree res = resolve_to (mode_suffix_id, inferred_type);
+      tree res = resolve_to (inferred_type);
       if (res != error_mark_node)
 	error_at (location, "passing %qT to argument %d of %qE, but its"
 		  " %qT form does not accept scalars",
@@ -2222,7 +2241,7 @@ finish_opt_n_resolution (unsigned int argno, unsigned int first_argno,
 				    expected_tclass, expected_bits))
     return error_mark_node;
 
-  return resolve_to (mode_suffix_id, inferred_type);
+  return resolve_to (inferred_type);
 }
 
 /* Resolve a (possibly predicated) unary function.  If the function uses
@@ -2279,12 +2298,7 @@ function_resolver::resolve_unary (type_class_index merge_tclass,
 	return error_mark_node;
     }
 
-  /* Handle convert-like functions in which the first type suffix is
-     explicit.  */
-  if (type_suffix_ids[0] != NUM_TYPE_SUFFIXES)
-    return resolve_to (mode_suffix_id, type_suffix_ids[0], type);
-
-  return resolve_to (mode_suffix_id, type);
+  return resolve_to (type);
 }
 
 /* Resolve a (possibly predicated) function that takes NOPS like-typed
@@ -2309,7 +2323,7 @@ function_resolver::resolve_uniform (unsigned int nops, unsigned int nimm)
     if (!require_integer_immediate (i))
       return error_mark_node;
 
-  return resolve_to (mode_suffix_id, type);
+  return resolve_to (type);
 }
 
 /* Resolve a (possibly predicated) function that offers a choice between
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
index 623b9e3a07b..479b248bef1 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins.h
@@ -394,6 +394,7 @@ public:
   tree resolve_to (mode_suffix_index,
 		   type_suffix_index = NUM_TYPE_SUFFIXES,
 		   type_suffix_index = NUM_TYPE_SUFFIXES);
+  tree resolve_to (type_suffix_index, type_suffix_index = NUM_TYPE_SUFFIXES);
 
   type_suffix_index infer_integer_scalar_type (unsigned int);
   type_suffix_index infer_pointer_type (unsigned int, bool = false);
-- 
2.25.1



* [PATCH 13/16] aarch64: Add support for <arm_sme.h>
  2022-11-13  9:59 [PATCH 00/16] aarch64: Add support for SME Richard Sandiford
                   ` (11 preceding siblings ...)
  2022-11-13 10:02 ` [PATCH 12/16] aarch64: Tweaks to function_resolver::resolve_to Richard Sandiford
@ 2022-11-13 10:02 ` Richard Sandiford
  2022-11-13 10:03 ` [PATCH 14/16] aarch64: Add support for arm_locally_streaming Richard Sandiford
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Richard Sandiford @ 2022-11-13 10:02 UTC (permalink / raw)
  To: gcc-patches

This adds support for the SME parts of arm_sme.h.  The SME2 parts
will follow at a later date.

The patch doesn't add the ACLE-defined __ARM_FEATURE macros though.
I'm planning to do this later, once we're sure everything has been
implemented.

The feature names sme-i16i64 and sme-f64f64 are different from
the ones that GAS currently expects, but we thought it would be
better to make the compiler names predictable from the architecture
FEAT_* names.  I'm going to add sme-i16i64 and sme-f64f64 aliases
to GAS soon.

gcc/
	* doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst:
	Document +sme-i16i64 and +sme-f64f64.
	* config.gcc (aarch64*-*-*): Add arm_sme.h to the list of headers
	to install and aarch64-sve-builtins-sme.o to the list of objects
	to build.
	* config/aarch64/aarch64-c.cc (aarch64_pragma_aarch64): Handle
	arm_sme.h.
	* config/aarch64/aarch64-option-extensions.def (sme-i16i64)
	(sme-f64f64): New extensions.
	* config/aarch64/aarch64-protos.h (aarch64_sme_vq_immediate)
	(aarch64_addsvl_addspl_immediate_p, aarch64_output_addsvl_addspl)
	(aarch64_output_sme_zero): Declare.
	(aarch64_output_move_struct): Delete.
	(aarch64_sme_ldr_vnum_offset): Declare.
	(aarch64_sve::handle_arm_sme_h): Likewise.
	* config/aarch64/aarch64.h (AARCH64_ISA_SM_ON): New macro.
	(AARCH64_ISA_SME_I16I64, AARCH64_ISA_SME_F64F64): Likewise.
	(TARGET_STREAMING, TARGET_STREAMING_SME): Likewise.
	(TARGET_SME_I16I64, TARGET_SME_F64F64): Likewise.
	* config/aarch64/aarch64.cc (aarch64_sve_rdvl_factor_p): Rename to...
	(aarch64_sve_rdvl_addvl_factor_p): ...this.
	(aarch64_sve_rdvl_immediate_p): Update accordingly.
	(aarch64_rdsvl_immediate_p, aarch64_add_offset): Likewise.
	(aarch64_sme_vq_immediate): Likewise.  Make public.
	(aarch64_sve_addpl_factor_p): New function.
	(aarch64_sve_addvl_addpl_immediate_p): Use
	aarch64_sve_rdvl_addvl_factor_p and aarch64_sve_addpl_factor_p.
	(aarch64_addsvl_addspl_immediate_p): New function.
	(aarch64_output_addsvl_addspl): Likewise.
	(aarch64_cannot_force_const_mem): Return true for RDSVL immediates.
	(aarch64_classify_index): Handle .Q scaling for VNx1TImode.
	(aarch64_classify_address): Likewise for vnum offsets.
	(aarch64_output_sme_zero): New function.
	(aarch64_sme_ldr_vnum_offset_p): Likewise.
	* config/aarch64/predicates.md (aarch64_addsvl_addspl_immediate):
	New predicate.
	(aarch64_pluslong_operand): Include it for SME.
	* config/aarch64/constraints.md (Uci, Uav): New constraints.
	* config/aarch64/iterators.md (VNx1TI_ONLY): New mode iterator.
	(SME_ZA_I, SME_ZA_SDI, SME_MOP_BHI, SME_MOP_HSDF): Likewise.
	(UNSPEC_SME_ADDHA, UNSPEC_SME_ADDVA, UNSPEC_SME_FMOPA)
	(UNSPEC_SME_FMOPS, UNSPEC_SME_LD1_HOR, UNSPEC_SME_LD1_VER)
	(UNSPEC_SME_READ_HOR, UNSPEC_SME_READ_VER, UNSPEC_SME_SMOPA)
	(UNSPEC_SME_SMOPS, UNSPEC_SME_ST1_HOR, UNSPEC_SME_ST1_VER)
	(UNSPEC_SME_SUMOPA, UNSPEC_SME_SUMOPS, UNSPEC_SME_UMOPA)
	(UNSPEC_SME_UMOPS, UNSPEC_SME_USMOPA, UNSPEC_SME_USMOPS)
	(UNSPEC_SME_WRITE_HOR, UNSPEC_SME_WRITE_VER): New unspecs.
	(Vetype, Vesize, VPRED): Handle VNx1TI.
	(V4xWIDE, V4xWIDE_PRED, V4xwetype): New mode attributes.
	(SME_FMOP_WIDE, SME_FMOP_WIDE_PRED, sme_fmop_wide_etype, b): Likewise.
	(SME_LD1, SME_READ, SME_ST1, SME_WRITE, SME_UNARY_SDI, SME_INT_MOP)
	(SME_FP_MOP): New int iterators.
	(optab): Handle SME unspecs.
	(hv): New int attribute.
	* config/aarch64/aarch64.md (*add<mode>3_aarch64): Handle ADDSVL
	and ADDSPL.
	* config/aarch64/aarch64-sme.md (UNSPEC_SME_LDR): New unspec.
	(@aarch64_sme_<optab><mode>, *aarch64_sme_<optab><mode>_plus)
	(aarch64_sme_ldr0, *aarch64_sme_ldrn<mode>): New patterns.
	(UNSPEC_SME_STR): New unspec.
	(@aarch64_sme_<optab><mode>, *aarch64_sme_<optab><mode>_plus)
	(aarch64_sme_str0, *aarch64_sme_strn<mode>): New patterns.
	(@aarch64_sme_<optab><v_int_container><mode>): Likewise.
	(*aarch64_sme_<optab><v_int_container><mode>_plus): Likewise.
	(@aarch64_sme_<optab><VNx1TI_ONLY:mode><SVE_FULL:mode>): Likewise.
	(@aarch64_sme_<optab><v_int_container><mode>): Likewise.
	(*aarch64_sme_<optab><v_int_container><mode>_plus): Likewise.
	(@aarch64_sme_<optab><VNx1TI_ONLY:mode><SVE_FULL:mode>): Likewise.
	(UNSPEC_SME_ZERO): New unspec.
	(aarch64_sme_zero): New pattern.
	(@aarch64_sme_<SME_UNARY_SDI:optab><mode>): Likewise.
	(@aarch64_sme_<SME_INT_MOP:optab><mode>): Likewise.
	(@aarch64_sme_<SME_FP_MOP:optab><mode>): Likewise.
	* config/aarch64/aarch64-sve-builtins.def: Add ZA type suffixes.
	* config/aarch64/aarch64-sve-builtins.h (CP_READ_ZA): New call
	property.
	(CP_WRITE_ZA): Likewise.
	(PRED_za_m): New predication type.
	(type_suffix_info): Add vector_p and za_p fields.
	(function_instance::shared_za_p): New member function.
	(function_instance::preserves_za_p): Likewise.
	(function_instance::num_za_tiles): Likewise.
	(function_expander::get_contiguous_base): Take a base argument
	number, a vnum argument number, and an argument that indicates
	whether the vnum parameter is a factor of the SME vector length
	or the prevailing vector length.
	(function_expander::add_integer_operand): Take a poly_int64.
	(sve_switcher::sve_switcher): Take a base set of flags.
	(sme_switcher): New class.
	(scalar_types): Add a null entry for NUM_VECTOR_TYPES.
	* config/aarch64/aarch64-sve-builtins.cc: Include
	aarch64-sve-builtins-sme.h.
	(pred_suffixes): Add an entry for PRED_za_m.
	(type_suffixes): Initialize vector_p and za_p.  Handle ZA suffixes.
	(TYPES_all_za, TYPES_all_za_data, TYPES_s_za_integer)
	(TYPES_d_za_integer, TYPES_mop_base, TYPES_mop_base_signed)
	(TYPES_mop_base_unsigned, TYPES_mop_i16i64, TYPES_mop_i16i64_signed)
	(TYPES_mop_i16i64_unsigned, TYPES_mop_f64f64, TYPES_za): New
	type suffix macros.
	(preds_m, preds_za_m): New predication lists.
	(scalar_types): Add an entry for NUM_VECTOR_TYPES.
	(find_type_suffix_for_scalar_type): Check positively for vectors
	rather than negatively for predicates.
	(check_required_extensions): Handle arm_streaming and arm_shared_za
	requirements.
	(function_instance::reads_global_state_p): Return true for functions
	that read ZA.
	(function_instance::modifies_global_state_p): Return true for functions
	that write to ZA.
	(function_instance::shared_za_p): New function.
	(function_instance::preserves_za_p): Likewise.
	(sve_switcher::sve_switcher): Add a base flags argument.
	(function_builder::get_name): Handle "__arm_" prefixes.
	(function_builder::get_attributes): Add arm_shared_za and
	arm_preserved_za attributes where appropriate.
	(function_resolver::check_gp_argument): Assert that the predication
	isn't ZA _m predication.
	(function_checker::function_checker): Don't bias the argument
	number for ZA _m predication.
	(function_expander::get_contiguous_base): Add arguments that
	specify the base argument number, the vnum argument number,
	and an argument that indicates whether the vnum parameter is
	a factor of the SME vector length or the prevailing vector length.
	Handle the SME case.
	(function_expander::add_integer_operand): Take a poly_int64.
	(init_builtins): Call handle_arm_sme_h for LTO.
	(handle_arm_sve_h): Skip SME intrinsics.
	(handle_arm_sme_h): New function.
	* config/aarch64/aarch64-sve-builtins-functions.h
	(read_write_za, write_za): New classes.
	(unspec_based_sme_function, za_arith_function): New using aliases.
	(quiet_za_arith_function): Likewise.
	* config/aarch64/aarch64-sve-builtins-shapes.h
	(binary_za_int_m, binary_za_m, binary_za_uint_m, bool_inherent)
	(inherent_za, inherent_mask_za, ldr_za, load_za, read_za, store_za)
	(str_za, unary_za_m, write_za): Declare.
	* config/aarch64/aarch64-sve-builtins-shapes.cc (apply_predication):
	Expect za_m functions to have an existing governing predicate.
	(binary_za_m_base, binary_za_int_m_def, binary_za_m_def)
	(binary_za_uint_m_def, bool_inherent_def, inherent_za_def)
	(inherent_mask_za_def, ldr_za_def, load_za_def, read_za_def)
	(store_za_def, str_za_def, unary_za_m_def, write_za_def): New classes.
	* config/aarch64/aarch64-sve-builtins-base.cc
	(svundef_impl::call_properties): New function.  Handle ZA suffixes.
	(svundef_impl::expand): Handle ZA suffixes here too.
	* config/aarch64/arm_sme.h: New file.
	* config/aarch64/aarch64-sve-builtins-sme.h: Likewise.
	* config/aarch64/aarch64-sve-builtins-sme.cc: Likewise.
	* config/aarch64/aarch64-sve-builtins-sme.def: Likewise.
	* config/aarch64/t-aarch64 (aarch64-sve-builtins.o): Depend on
	aarch64-sve-builtins-sme.def and aarch64-sve-builtins-sme.h.
	(aarch64-sve-builtins-sme.o): New rule.

gcc/testsuite/
	* lib/target-supports.exp: Add sme and sme-i16i64 features.
	* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Allow functions
	to be marked as arm_streaming and/or arm_shared_za.
	* g++.target/aarch64/sve/acle/general-c++/func_redef_4.c: Mark the
	function as arm_preserves_za.
	* g++.target/aarch64/sve/acle/general-c++/func_redef_5.c: Likewise.
	* g++.target/aarch64/sve/acle/general-c++/func_redef_7.c: Likewise.
	* gcc.target/aarch64/sve/acle/general-c/func_redef_4.c: Likewise.
	* gcc.target/aarch64/sve/acle/general-c/func_redef_5.c: Likewise.
	* g++.target/aarch64/sme/aarch64-sme-acle-asm.exp: New test harness.
	* gcc.target/aarch64/sme/aarch64-sme-acle-asm.exp: Likewise.
	* gcc.target/aarch64/sme/acle-asm/addha_za32.c: New test.
	* gcc.target/aarch64/sme/acle-asm/addha_za64.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/addva_za32.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/addva_za64.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/arm_has_sme_sc.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/arm_in_streaming_mode_ns.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/arm_in_streaming_mode_s.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/arm_in_streaming_mode_sc.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/cntsb_s.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/cntsb_sc.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/cntsd_s.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/cntsd_sc.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/cntsh_s.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/cntsh_sc.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/cntsw_s.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/cntsw_sc.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/ld1_hor_vnum_za128.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/ld1_hor_vnum_za16.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/ld1_hor_vnum_za32.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/ld1_hor_vnum_za64.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/ld1_hor_vnum_za8.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/ld1_hor_za128.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/ld1_hor_za16.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/ld1_hor_za32.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/ld1_hor_za64.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/ld1_hor_za8.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/ldr_vnum_za_s.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/ldr_vnum_za_sc.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/ldr_za_s.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/ldr_za_sc.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/mopa_za32.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/mopa_za64.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/mops_za32.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/mops_za64.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/read_hor_za128.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/read_hor_za16.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/read_hor_za32.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/read_hor_za64.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/read_hor_za8.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/read_ver_za128.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/read_ver_za16.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/read_ver_za32.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/read_ver_za64.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/read_ver_za8.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/st1_hor_vnum_za128.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/st1_hor_vnum_za16.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/st1_hor_vnum_za32.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/st1_hor_vnum_za64.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/st1_hor_vnum_za8.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/st1_hor_za128.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/st1_hor_za16.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/st1_hor_za32.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/st1_hor_za64.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/st1_hor_za8.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/str_vnum_za_s.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/str_vnum_za_sc.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/str_za_s.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/str_za_sc.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/sumopa_za32.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/sumopa_za64.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/sumops_za32.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/sumops_za64.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/test_sme_acle.h: Likewise.
	* gcc.target/aarch64/sme/acle-asm/undef_za.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/usmopa_za32.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/usmopa_za64.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/usmops_za32.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/usmops_za64.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/write_hor_za128.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/write_hor_za16.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/write_hor_za32.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/write_hor_za64.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/write_hor_za8.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/write_ver_za128.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/write_ver_za16.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/write_ver_za32.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/write_ver_za64.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/write_ver_za8.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/zero_mask_za.c: Likewise.
	* gcc.target/aarch64/sme/acle-asm/zero_za.c: Likewise.
	* gcc.target/aarch64/sve/acle/general-c/binary_za_int_m_1.c: Likewise.
	* gcc.target/aarch64/sve/acle/general-c/binary_za_m_1.c: Likewise.
	* gcc.target/aarch64/sve/acle/general-c/binary_za_m_2.c: Likewise.
	* gcc.target/aarch64/sve/acle/general-c/binary_za_uint_m_1.c: Likewise.
	* gcc.target/aarch64/sve/acle/general-c/read_za_m_1.c: Likewise.
	* gcc.target/aarch64/sve/acle/general-c/unary_za_m_1.c: Likewise.
	* gcc.target/aarch64/sve/acle/general-c/write_za_m_1.c: Likewise.
---
 gcc/config.gcc                                |   4 +-
 gcc/config/aarch64/aarch64-c.cc               |   2 +
 .../aarch64/aarch64-option-extensions.def     |   4 +
 gcc/config/aarch64/aarch64-protos.h           |   8 +-
 gcc/config/aarch64/aarch64-sme.md             | 335 ++++++++++++++++
 .../aarch64/aarch64-sve-builtins-base.cc      |  13 +-
 .../aarch64/aarch64-sve-builtins-functions.h  |  39 ++
 .../aarch64/aarch64-sve-builtins-shapes.cc    | 280 ++++++++++++-
 .../aarch64/aarch64-sve-builtins-shapes.h     |  13 +
 .../aarch64/aarch64-sve-builtins-sme.cc       | 351 +++++++++++++++++
 .../aarch64/aarch64-sve-builtins-sme.def      |  83 ++++
 gcc/config/aarch64/aarch64-sve-builtins-sme.h |  56 +++
 gcc/config/aarch64/aarch64-sve-builtins.cc    | 275 +++++++++++--
 gcc/config/aarch64/aarch64-sve-builtins.def   |  15 +
 gcc/config/aarch64/aarch64-sve-builtins.h     |  46 ++-
 gcc/config/aarch64/aarch64.cc                 | 143 ++++++-
 gcc/config/aarch64/aarch64.h                  |  15 +
 gcc/config/aarch64/aarch64.md                 |  14 +-
 gcc/config/aarch64/arm_sme.h                  |  46 +++
 gcc/config/aarch64/constraints.md             |   9 +
 gcc/config/aarch64/iterators.md               |  98 +++++
 gcc/config/aarch64/predicates.md              |   8 +-
 gcc/config/aarch64/t-aarch64                  |  17 +-
 .../aarch64-options.rst                       |   6 +
 .../aarch64/sme/aarch64-sme-acle-asm.exp      |  86 ++++
 .../sve/acle/general-c++/func_redef_4.c       |   2 +-
 .../sve/acle/general-c++/func_redef_5.c       |   2 +-
 .../sve/acle/general-c++/func_redef_7.c       |   2 +-
 .../aarch64/sme/aarch64-sme-acle-asm.exp      |  82 ++++
 .../aarch64/sme/acle-asm/addha_za32.c         |  48 +++
 .../aarch64/sme/acle-asm/addha_za64.c         |  50 +++
 .../aarch64/sme/acle-asm/addva_za32.c         |  48 +++
 .../aarch64/sme/acle-asm/addva_za64.c         |  50 +++
 .../aarch64/sme/acle-asm/arm_has_sme_sc.c     |  25 ++
 .../sme/acle-asm/arm_in_streaming_mode_ns.c   |  11 +
 .../sme/acle-asm/arm_in_streaming_mode_s.c    |  11 +
 .../sme/acle-asm/arm_in_streaming_mode_sc.c   |  26 ++
 .../gcc.target/aarch64/sme/acle-asm/cntsb_s.c | 310 +++++++++++++++
 .../aarch64/sme/acle-asm/cntsb_sc.c           |  12 +
 .../gcc.target/aarch64/sme/acle-asm/cntsd_s.c | 277 +++++++++++++
 .../aarch64/sme/acle-asm/cntsd_sc.c           |  13 +
 .../gcc.target/aarch64/sme/acle-asm/cntsh_s.c | 279 +++++++++++++
 .../aarch64/sme/acle-asm/cntsh_sc.c           |  13 +
 .../gcc.target/aarch64/sme/acle-asm/cntsw_s.c | 278 +++++++++++++
 .../aarch64/sme/acle-asm/cntsw_sc.c           |  13 +
 .../aarch64/sme/acle-asm/ld1_hor_vnum_za128.c |  46 +++
 .../aarch64/sme/acle-asm/ld1_hor_vnum_za16.c  |  46 +++
 .../aarch64/sme/acle-asm/ld1_hor_vnum_za32.c  |  46 +++
 .../aarch64/sme/acle-asm/ld1_hor_vnum_za64.c  |  46 +++
 .../aarch64/sme/acle-asm/ld1_hor_vnum_za8.c   |  46 +++
 .../aarch64/sme/acle-asm/ld1_hor_za128.c      |  63 +++
 .../aarch64/sme/acle-asm/ld1_hor_za16.c       |  94 +++++
 .../aarch64/sme/acle-asm/ld1_hor_za32.c       |  93 +++++
 .../aarch64/sme/acle-asm/ld1_hor_za64.c       |  73 ++++
 .../aarch64/sme/acle-asm/ld1_hor_za8.c        |  63 +++
 .../aarch64/sme/acle-asm/ld1_ver_vnum_za128.c |   0
 .../aarch64/sme/acle-asm/ld1_ver_vnum_za16.c  |   0
 .../aarch64/sme/acle-asm/ld1_ver_vnum_za32.c  |   0
 .../aarch64/sme/acle-asm/ld1_ver_vnum_za64.c  |   0
 .../aarch64/sme/acle-asm/ld1_ver_vnum_za8.c   |   0
 .../aarch64/sme/acle-asm/ld1_ver_za128.c      |   0
 .../aarch64/sme/acle-asm/ld1_ver_za16.c       |   0
 .../aarch64/sme/acle-asm/ld1_ver_za32.c       |   0
 .../aarch64/sme/acle-asm/ld1_ver_za64.c       |   0
 .../aarch64/sme/acle-asm/ld1_ver_za8.c        |   0
 .../aarch64/sme/acle-asm/ldr_vnum_za_s.c      | 121 ++++++
 .../aarch64/sme/acle-asm/ldr_vnum_za_sc.c     | 166 ++++++++
 .../aarch64/sme/acle-asm/ldr_za_s.c           | 104 +++++
 .../aarch64/sme/acle-asm/ldr_za_sc.c          |  51 +++
 .../aarch64/sme/acle-asm/mopa_za32.c          | 102 +++++
 .../aarch64/sme/acle-asm/mopa_za64.c          |  70 ++++
 .../aarch64/sme/acle-asm/mops_za32.c          | 102 +++++
 .../aarch64/sme/acle-asm/mops_za64.c          |  70 ++++
 .../aarch64/sme/acle-asm/read_hor_za128.c     | 367 ++++++++++++++++++
 .../aarch64/sme/acle-asm/read_hor_za16.c      | 171 ++++++++
 .../aarch64/sme/acle-asm/read_hor_za32.c      | 164 ++++++++
 .../aarch64/sme/acle-asm/read_hor_za64.c      | 154 ++++++++
 .../aarch64/sme/acle-asm/read_hor_za8.c       |  97 +++++
 .../aarch64/sme/acle-asm/read_ver_za128.c     | 367 ++++++++++++++++++
 .../aarch64/sme/acle-asm/read_ver_za16.c      | 171 ++++++++
 .../aarch64/sme/acle-asm/read_ver_za32.c      | 164 ++++++++
 .../aarch64/sme/acle-asm/read_ver_za64.c      | 154 ++++++++
 .../aarch64/sme/acle-asm/read_ver_za8.c       |  97 +++++
 .../aarch64/sme/acle-asm/st1_hor_vnum_za128.c |  46 +++
 .../aarch64/sme/acle-asm/st1_hor_vnum_za16.c  |  46 +++
 .../aarch64/sme/acle-asm/st1_hor_vnum_za32.c  |  46 +++
 .../aarch64/sme/acle-asm/st1_hor_vnum_za64.c  |  46 +++
 .../aarch64/sme/acle-asm/st1_hor_vnum_za8.c   |  46 +++
 .../aarch64/sme/acle-asm/st1_hor_za128.c      |  63 +++
 .../aarch64/sme/acle-asm/st1_hor_za16.c       |  94 +++++
 .../aarch64/sme/acle-asm/st1_hor_za32.c       |  93 +++++
 .../aarch64/sme/acle-asm/st1_hor_za64.c       |  73 ++++
 .../aarch64/sme/acle-asm/st1_hor_za8.c        |  63 +++
 .../aarch64/sme/acle-asm/st1_ver_vnum_za128.c |   0
 .../aarch64/sme/acle-asm/st1_ver_vnum_za16.c  |   0
 .../aarch64/sme/acle-asm/st1_ver_vnum_za32.c  |   0
 .../aarch64/sme/acle-asm/st1_ver_vnum_za64.c  |   0
 .../aarch64/sme/acle-asm/st1_ver_vnum_za8.c   |   0
 .../aarch64/sme/acle-asm/st1_ver_za128.c      |   0
 .../aarch64/sme/acle-asm/st1_ver_za16.c       |   0
 .../aarch64/sme/acle-asm/st1_ver_za32.c       |   0
 .../aarch64/sme/acle-asm/st1_ver_za64.c       |   0
 .../aarch64/sme/acle-asm/st1_ver_za8.c        |   0
 .../aarch64/sme/acle-asm/str_vnum_za_s.c      | 121 ++++++
 .../aarch64/sme/acle-asm/str_vnum_za_sc.c     | 166 ++++++++
 .../aarch64/sme/acle-asm/str_za_s.c           | 104 +++++
 .../aarch64/sme/acle-asm/str_za_sc.c          |  51 +++
 .../aarch64/sme/acle-asm/sumopa_za32.c        |  30 ++
 .../aarch64/sme/acle-asm/sumopa_za64.c        |  32 ++
 .../aarch64/sme/acle-asm/sumops_za32.c        |  30 ++
 .../aarch64/sme/acle-asm/sumops_za64.c        |  32 ++
 .../aarch64/sme/acle-asm/test_sme_acle.h      |  62 +++
 .../aarch64/sme/acle-asm/undef_za.c           |  33 ++
 .../aarch64/sme/acle-asm/usmopa_za32.c        |  30 ++
 .../aarch64/sme/acle-asm/usmopa_za64.c        |  32 ++
 .../aarch64/sme/acle-asm/usmops_za32.c        |  30 ++
 .../aarch64/sme/acle-asm/usmops_za64.c        |  32 ++
 .../aarch64/sme/acle-asm/write_hor_za128.c    | 173 +++++++++
 .../aarch64/sme/acle-asm/write_hor_za16.c     | 113 ++++++
 .../aarch64/sme/acle-asm/write_hor_za32.c     | 123 ++++++
 .../aarch64/sme/acle-asm/write_hor_za64.c     | 113 ++++++
 .../aarch64/sme/acle-asm/write_hor_za8.c      |  73 ++++
 .../aarch64/sme/acle-asm/write_ver_za128.c    | 173 +++++++++
 .../aarch64/sme/acle-asm/write_ver_za16.c     | 113 ++++++
 .../aarch64/sme/acle-asm/write_ver_za32.c     | 123 ++++++
 .../aarch64/sme/acle-asm/write_ver_za64.c     | 113 ++++++
 .../aarch64/sme/acle-asm/write_ver_za8.c      |  73 ++++
 .../aarch64/sme/acle-asm/zero_mask_za.c       | 130 +++++++
 .../gcc.target/aarch64/sme/acle-asm/zero_za.c |  11 +
 .../aarch64/sve/acle/asm/test_sve_acle.h      |  16 +-
 .../sve/acle/general-c/binary_za_int_m_1.c    |  48 +++
 .../sve/acle/general-c/binary_za_m_1.c        |  48 +++
 .../sve/acle/general-c/binary_za_m_2.c        |  11 +
 .../sve/acle/general-c/binary_za_uint_m_1.c   |  48 +++
 .../aarch64/sve/acle/general-c/func_redef_4.c |   2 +-
 .../aarch64/sve/acle/general-c/func_redef_5.c |   2 +-
 .../aarch64/sve/acle/general-c/read_za_m_1.c  |  47 +++
 .../aarch64/sve/acle/general-c/unary_za_m_1.c |  47 +++
 .../aarch64/sve/acle/general-c/write_za_m_1.c |  47 +++
 gcc/testsuite/lib/target-supports.exp         |   3 +-
 140 files changed, 9806 insertions(+), 71 deletions(-)
 create mode 100644 gcc/config/aarch64/aarch64-sve-builtins-sme.cc
 create mode 100644 gcc/config/aarch64/aarch64-sve-builtins-sme.def
 create mode 100644 gcc/config/aarch64/aarch64-sve-builtins-sme.h
 create mode 100644 gcc/config/aarch64/arm_sme.h
 create mode 100644 gcc/testsuite/g++.target/aarch64/sme/aarch64-sme-acle-asm.exp
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/aarch64-sme-acle-asm.exp
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/addha_za32.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/addha_za64.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/addva_za32.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/addva_za64.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/arm_has_sme_sc.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/arm_in_streaming_mode_ns.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/arm_in_streaming_mode_s.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/arm_in_streaming_mode_sc.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsb_s.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsb_sc.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsd_s.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsd_sc.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsh_s.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsh_sc.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsw_s.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsw_sc.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_vnum_za128.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_vnum_za16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_vnum_za32.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_vnum_za64.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_vnum_za8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_za128.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_za16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_za32.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_za64.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_za8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_ver_vnum_za128.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_ver_vnum_za16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_ver_vnum_za32.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_ver_vnum_za64.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_ver_vnum_za8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_ver_za128.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_ver_za16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_ver_za32.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_ver_za64.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_ver_za8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ldr_vnum_za_s.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ldr_vnum_za_sc.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ldr_za_s.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ldr_za_sc.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/mopa_za32.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/mopa_za64.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/mops_za32.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/mops_za64.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_hor_za128.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_hor_za16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_hor_za32.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_hor_za64.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_hor_za8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_ver_za128.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_ver_za16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_ver_za32.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_ver_za64.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_ver_za8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/st1_hor_vnum_za128.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/st1_hor_vnum_za16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/st1_hor_vnum_za32.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/st1_hor_vnum_za64.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/st1_hor_vnum_za8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/st1_hor_za128.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/st1_hor_za16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/st1_hor_za32.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/st1_hor_za64.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/st1_hor_za8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/st1_ver_vnum_za128.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/st1_ver_vnum_za16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/st1_ver_vnum_za32.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/st1_ver_vnum_za64.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/st1_ver_vnum_za8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/st1_ver_za128.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/st1_ver_za16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/st1_ver_za32.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/st1_ver_za64.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/st1_ver_za8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/str_vnum_za_s.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/str_vnum_za_sc.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/str_za_s.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/str_za_sc.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/sumopa_za32.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/sumopa_za64.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/sumops_za32.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/sumops_za64.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/test_sme_acle.h
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/undef_za.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/usmopa_za32.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/usmopa_za64.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/usmops_za32.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/usmops_za64.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/write_hor_za128.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/write_hor_za16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/write_hor_za32.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/write_hor_za64.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/write_hor_za8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/write_ver_za128.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/write_ver_za16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/write_ver_za32.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/write_ver_za64.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/write_ver_za8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/zero_mask_za.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/zero_za.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/binary_za_int_m_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/binary_za_m_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/binary_za_m_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/binary_za_uint_m_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/read_za_m_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/unary_za_m_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/write_za_m_1.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index b5eda046033..79673619bd4 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -323,11 +323,11 @@ m32c*-*-*)
         ;;
 aarch64*-*-*)
 	cpu_type=aarch64
-	extra_headers="arm_fp16.h arm_neon.h arm_bf16.h arm_acle.h arm_sve.h"
+	extra_headers="arm_fp16.h arm_neon.h arm_bf16.h arm_acle.h arm_sve.h arm_sme.h"
 	c_target_objs="aarch64-c.o"
 	cxx_target_objs="aarch64-c.o"
 	d_target_objs="aarch64-d.o"
-	extra_objs="aarch64-builtins.o aarch-common.o aarch64-sve-builtins.o aarch64-sve-builtins-shapes.o aarch64-sve-builtins-base.o aarch64-sve-builtins-sve2.o cortex-a57-fma-steering.o aarch64-speculation.o falkor-tag-collision-avoidance.o aarch64-bti-insert.o aarch64-cc-fusion.o"
+	extra_objs="aarch64-builtins.o aarch-common.o aarch64-sve-builtins.o aarch64-sve-builtins-shapes.o aarch64-sve-builtins-base.o aarch64-sve-builtins-sve2.o aarch64-sve-builtins-sme.o cortex-a57-fma-steering.o aarch64-speculation.o falkor-tag-collision-avoidance.o aarch64-bti-insert.o aarch64-cc-fusion.o"
 	target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.cc \$(srcdir)/config/aarch64/aarch64-sve-builtins.h \$(srcdir)/config/aarch64/aarch64-sve-builtins.cc"
 	target_has_targetm_common=yes
 	;;
diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index e296c73350f..db2705ac6d2 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -288,6 +288,8 @@ aarch64_pragma_aarch64 (cpp_reader *)
   const char *name = TREE_STRING_POINTER (x);
   if (strcmp (name, "arm_sve.h") == 0)
     aarch64_sve::handle_arm_sve_h ();
+  else if (strcmp (name, "arm_sme.h") == 0)
+    aarch64_sve::handle_arm_sme_h ();
   else if (strcmp (name, "arm_neon.h") == 0)
     handle_arm_neon_h ();
   else if (strcmp (name, "arm_acle.h") == 0)
diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index 402a9832f87..cf55742ae60 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -131,6 +131,10 @@ AARCH64_OPT_EXTENSION("sve2-bitperm", SVE2_BITPERM, (SVE2), (), (),
 
 AARCH64_OPT_EXTENSION("sme", SME, (SVE2), (), (), "sme")
 
+AARCH64_OPT_EXTENSION("sme-i16i64", SME_I16I64, (SME), (), (), "")
+
+AARCH64_OPT_EXTENSION("sme-f64f64", SME_F64F64, (SME), (), (), "")
+
 AARCH64_OPT_EXTENSION("tme", TME, (), (), (), "")
 
 AARCH64_OPT_EXTENSION("i8mm", I8MM, (SIMD), (), (), "i8mm")
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 97a84f616a2..700d5fb1c77 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -808,7 +808,11 @@ bool aarch64_sve_vector_inc_dec_immediate_p (rtx);
 int aarch64_add_offset_temporaries (rtx);
 void aarch64_split_add_offset (scalar_int_mode, rtx, rtx, rtx, rtx, rtx);
 bool aarch64_rdsvl_immediate_p (const_rtx);
+rtx aarch64_sme_vq_immediate (machine_mode mode, HOST_WIDE_INT,
+			      aarch64_feature_flags);
 char *aarch64_output_rdsvl (const_rtx);
+bool aarch64_addsvl_addspl_immediate_p (const_rtx);
+char *aarch64_output_addsvl_addspl (rtx);
 bool aarch64_mov_operand_p (rtx, machine_mode);
 rtx aarch64_reverse_mask (machine_mode, unsigned int);
 bool aarch64_offset_7bit_signed_scaled_p (machine_mode, poly_int64);
@@ -851,6 +855,7 @@ bool aarch64_uimm12_shift (HOST_WIDE_INT);
 int aarch64_movk_shift (const wide_int_ref &, const wide_int_ref &);
 bool aarch64_use_return_insn_p (void);
 const char *aarch64_output_casesi (rtx *);
+const char *aarch64_output_sme_zero (rtx);
 
 arm_pcs aarch64_tlsdesc_abi_id ();
 enum aarch64_symbol_type aarch64_classify_symbol (rtx, HOST_WIDE_INT);
@@ -865,7 +870,6 @@ int aarch64_uxt_size (int, HOST_WIDE_INT);
 int aarch64_vec_fpconst_pow_of_2 (rtx);
 rtx aarch64_eh_return_handler_rtx (void);
 rtx aarch64_mask_from_zextract_ops (rtx, rtx);
-const char *aarch64_output_move_struct (rtx *operands);
 rtx aarch64_return_addr_rtx (void);
 rtx aarch64_return_addr (int, rtx);
 rtx aarch64_simd_gen_const_vector_dup (machine_mode, HOST_WIDE_INT);
@@ -879,6 +883,7 @@ bool aarch64_sve_ldnf1_operand_p (rtx);
 bool aarch64_sve_ldr_operand_p (rtx);
 bool aarch64_sve_prefetch_operand_p (rtx, machine_mode);
 bool aarch64_sve_struct_memory_operand_p (rtx);
+bool aarch64_sme_ldr_vnum_offset_p (rtx, rtx);
 rtx aarch64_simd_vect_par_cnst_half (machine_mode, int, bool);
 rtx aarch64_gen_stepped_int_parallel (unsigned int, int, int);
 bool aarch64_stepped_int_parallel_p (rtx, int);
@@ -996,6 +1001,7 @@ void handle_arm_neon_h (void);
 namespace aarch64_sve {
   void init_builtins ();
   void handle_arm_sve_h ();
+  void handle_arm_sme_h ();
   tree builtin_decl (unsigned, bool);
   bool builtin_type_p (const_tree);
   bool builtin_type_p (const_tree, unsigned int *, unsigned int *);
diff --git a/gcc/config/aarch64/aarch64-sme.md b/gcc/config/aarch64/aarch64-sme.md
index 55fb00db12d..7b3ccea2e11 100644
--- a/gcc/config/aarch64/aarch64-sme.md
+++ b/gcc/config/aarch64/aarch64-sme.md
@@ -24,6 +24,18 @@
 ;; ---- Test current state
 ;; ---- PSTATE.SM management
 ;; ---- PSTATE.ZA management
+;;
+;; == Loads, stores and moves
+;; ---- Single-vector loads
+;; ---- Single-vector stores
+;; ---- Single-vector moves
+;; ---- Zeroing
+;;
+;; == Unary operations
+;; ---- Single vector input
+;;
+;; == Binary operations
+;; ---- Sum of outer products
 
 ;; =========================================================================
 ;; == State management
@@ -269,3 +281,326 @@ (define_insn_and_split "aarch64_restore_za"
     DONE;
   }
 )
+;; =========================================================================
+;; == Loads, stores and moves
+;; =========================================================================
+
+;; -------------------------------------------------------------------------
+;; ---- Single-vector loads
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - LD1
+;; - LDR
+;; -------------------------------------------------------------------------
+
+(define_c_enum "unspec" [
+  UNSPEC_SME_LDR
+])
+
+(define_insn "@aarch64_sme_<optab><mode>"
+  [(set (reg:SME_ZA_I ZA_REGNUM)
+	(unspec:SME_ZA_I
+	  [(reg:SME_ZA_I ZA_REGNUM)
+	   (match_operand:DI 0 "const_int_operand")
+	   (match_operand:SI 1 "register_operand" "Uci")
+	   (match_operand:<VPRED> 2 "register_operand" "Upl")
+	   (match_operand:SME_ZA_I 3 "aarch64_sve_ldff1_operand" "Utf")]
+	  SME_LD1))]
+  "TARGET_STREAMING_SME"
+  "ld1<Vesize>\t{ za%0<hv>.<Vetype>[%w1, 0] }, %2/z, %3"
+)
+
+(define_insn "*aarch64_sme_<optab><mode>_plus"
+  [(set (reg:SME_ZA_I ZA_REGNUM)
+	(unspec:SME_ZA_I
+	  [(reg:SME_ZA_I ZA_REGNUM)
+	   (match_operand:DI 0 "const_int_operand")
+	   (plus:SI (match_operand:SI 1 "register_operand" "Uci")
+		    (match_operand:SI 2 "const_int_operand"))
+	   (match_operand:<VPRED> 3 "register_operand" "Upl")
+	   (match_operand:SME_ZA_I 4 "aarch64_sve_ldff1_operand" "Utf")]
+	  SME_LD1))]
+  "TARGET_STREAMING_SME
+   && IN_RANGE (UINTVAL (operands[2]), 0,
+		15 / GET_MODE_UNIT_SIZE (<MODE>mode))"
+  "ld1<Vesize>\t{ za%0<hv>.<Vetype>[%w1, %2] }, %3/z, %4"
+)
+
+(define_insn "aarch64_sme_ldr0"
+  [(set (reg:VNx16QI ZA_REGNUM)
+	(unspec:VNx16QI
+	  [(reg:VNx16QI ZA_REGNUM)
+	   (match_operand:SI 0 "register_operand" "Uci")
+	   (match_operand:VNx16QI 1 "aarch64_sync_memory_operand" "Q")]
+	  UNSPEC_SME_LDR))]
+  "TARGET_SME"
+  "ldr\tza[%w0, 0], %1"
+)
+
+(define_insn "*aarch64_sme_ldrn<mode>"
+  [(set (reg:VNx16QI ZA_REGNUM)
+	(unspec:VNx16QI
+	  [(reg:VNx16QI ZA_REGNUM)
+	   (plus:SI (match_operand:SI 0 "register_operand" "Uci")
+		    (match_operand:SI 1 "const_int_operand"))
+	   (mem:VNx16QI
+	     (plus:P (match_operand:P 2 "register_operand" "rk")
+		     (match_operand 3)))]
+	  UNSPEC_SME_LDR))]
+  "TARGET_SME
+   && aarch64_sme_ldr_vnum_offset_p (operands[1], operands[3])"
+  "ldr\tza[%w0, %1], [%2, #%1, mul vl]"
+)
+
+;; -------------------------------------------------------------------------
+;; ---- Single-vector stores
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - ST1
+;; - STR
+;; -------------------------------------------------------------------------
+
+(define_c_enum "unspec" [
+  UNSPEC_SME_STR
+])
+
+(define_insn "@aarch64_sme_<optab><mode>"
+  [(set (match_operand:SME_ZA_I 0 "aarch64_sve_ldff1_operand" "+Utf")
+	(unspec:SME_ZA_I
+	  [(match_dup 0)
+	   (match_operand:DI 1 "const_int_operand")
+	   (match_operand:SI 2 "register_operand" "Uci")
+	   (match_operand:<VPRED> 3 "register_operand" "Upl")
+	   (reg:SME_ZA_I ZA_REGNUM)]
+	  SME_ST1))]
+  "TARGET_STREAMING_SME"
+  "st1<Vesize>\t{ za%1<hv>.<Vetype>[%w2, 0] }, %3/z, %0"
+)
+
+(define_insn "*aarch64_sme_<optab><mode>_plus"
+  [(set (match_operand:SME_ZA_I 0 "aarch64_sve_ldff1_operand" "+Utf")
+	(unspec:SME_ZA_I
+	  [(match_dup 0)
+	   (match_operand:DI 1 "const_int_operand")
+	   (plus:SI (match_operand:SI 2 "register_operand" "Uci")
+		    (match_operand:SI 3 "const_int_operand"))
+	   (match_operand:<VPRED> 4 "register_operand" "Upl")
+	   (reg:SME_ZA_I ZA_REGNUM)]
+	  SME_ST1))]
+  "TARGET_STREAMING_SME
+   && IN_RANGE (UINTVAL (operands[3]), 0,
+		15 / GET_MODE_UNIT_SIZE (<MODE>mode))"
+  "st1<Vesize>\t{ za%1<hv>.<Vetype>[%w2, %3] }, %4/z, %0"
+)
+
+(define_insn "aarch64_sme_str0"
+  [(set (match_operand:VNx16QI 0 "aarch64_sync_memory_operand" "+Q")
+	(unspec:VNx16QI
+	  [(match_dup 0)
+	   (match_operand:SI 1 "register_operand" "Uci")
+	   (reg:VNx16QI ZA_REGNUM)]
+	  UNSPEC_SME_STR))]
+  "TARGET_SME"
+  "str\tza[%w1, 0], %0"
+)
+
+(define_insn "*aarch64_sme_strn<mode>"
+  [(set (mem:VNx16QI
+	  (plus:P (match_operand:P 2 "register_operand" "rk")
+		  (match_operand 3)))
+	(unspec:VNx16QI
+	  [(mem:VNx16QI (plus:P (match_dup 2) (match_dup 3)))
+	   (plus:SI (match_operand:SI 0 "register_operand" "Uci")
+		    (match_operand:SI 1 "const_int_operand"))
+	   (reg:VNx16QI ZA_REGNUM)]
+	  UNSPEC_SME_STR))]
+  "TARGET_SME
+   && aarch64_sme_ldr_vnum_offset_p (operands[1], operands[3])"
+  "str\tza[%w0, %1], [%2, #%1, mul vl]"
+)
+
+;; -------------------------------------------------------------------------
+;; ---- Single-vector moves
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - MOVA
+;; -------------------------------------------------------------------------
+
+(define_insn "@aarch64_sme_<optab><v_int_container><mode>"
+  [(set (match_operand:SVE_FULL 0 "register_operand" "=w")
+	(unspec:SVE_FULL
+	  [(match_operand:SVE_FULL 1 "register_operand" "0")
+	   (match_operand:<VPRED> 2 "register_operand" "Upl")
+	   (match_operand:DI 3 "const_int_operand")
+	   (match_operand:SI 4 "register_operand" "Uci")
+	   (reg:<V_INT_CONTAINER> ZA_REGNUM)]
+	  SME_READ))]
+  "TARGET_STREAMING_SME"
+  "mova\t%0.<Vetype>, %2/m, za%3<hv>.<Vetype>[%w4, 0]"
+)
+
+(define_insn "*aarch64_sme_<optab><v_int_container><mode>_plus"
+  [(set (match_operand:SVE_FULL 0 "register_operand" "=w")
+	(unspec:SVE_FULL
+	  [(match_operand:SVE_FULL 1 "register_operand" "0")
+	   (match_operand:<VPRED> 2 "register_operand" "Upl")
+	   (match_operand:DI 3 "const_int_operand")
+	   (plus:SI (match_operand:SI 4 "register_operand" "Uci")
+		    (match_operand:SI 5 "const_int_operand"))
+	   (reg:<V_INT_CONTAINER> ZA_REGNUM)]
+	  SME_READ))]
+  "TARGET_STREAMING_SME
+   && IN_RANGE (UINTVAL (operands[5]), 0,
+		15 / GET_MODE_UNIT_SIZE (<MODE>mode))"
+  "mova\t%0.<Vetype>, %2/m, za%3<hv>.<Vetype>[%w4, %5]"
+)
+
+(define_insn "@aarch64_sme_<optab><VNx1TI_ONLY:mode><SVE_FULL:mode>"
+  [(set (match_operand:SVE_FULL 0 "register_operand" "=w")
+	(unspec:SVE_FULL
+	  [(match_operand:SVE_FULL 1 "register_operand" "0")
+	   (match_operand:VNx2BI 2 "register_operand" "Upl")
+	   (match_operand:DI 3 "const_int_operand")
+	   (match_operand:SI 4 "register_operand" "Uci")
+	   (reg:VNx1TI_ONLY ZA_REGNUM)]
+	  SME_READ))]
+  "TARGET_STREAMING_SME"
+  "mova\t%0.q, %2/m, za%3<hv>.q[%w4, 0]"
+)
+
+(define_insn "@aarch64_sme_<optab><v_int_container><mode>"
+  [(set (reg:<V_INT_CONTAINER> ZA_REGNUM)
+	(unspec:<V_INT_CONTAINER>
+	  [(reg:SVE_FULL ZA_REGNUM)
+	   (match_operand:DI 0 "const_int_operand")
+	   (match_operand:SI 1 "register_operand" "Uci")
+	   (match_operand:<VPRED> 2 "register_operand" "Upl")
+	   (match_operand:SVE_FULL 3 "register_operand" "w")]
+	  SME_WRITE))]
+  "TARGET_STREAMING_SME"
+  "mova\tza%0<hv>.<Vetype>[%w1, 0], %2/m, %3.<Vetype>"
+)
+
+(define_insn "*aarch64_sme_<optab><v_int_container><mode>_plus"
+  [(set (reg:<V_INT_CONTAINER> ZA_REGNUM)
+	(unspec:<V_INT_CONTAINER>
+	  [(reg:SVE_FULL ZA_REGNUM)
+	   (match_operand:DI 0 "const_int_operand")
+	   (plus:SI (match_operand:SI 1 "register_operand" "Uci")
+		    (match_operand:SI 2 "const_int_operand"))
+	   (match_operand:<VPRED> 3 "register_operand" "Upl")
+	   (match_operand:SVE_FULL 4 "register_operand" "w")]
+	  SME_WRITE))]
+  "TARGET_STREAMING_SME
+   && IN_RANGE (UINTVAL (operands[2]), 0,
+		15 / GET_MODE_UNIT_SIZE (<MODE>mode))"
+  "mova\tza%0<hv>.<Vetype>[%w1, %2], %3/m, %4.<Vetype>"
+)
+
+(define_insn "@aarch64_sme_<optab><VNx1TI_ONLY:mode><SVE_FULL:mode>"
+  [(set (reg:VNx1TI_ONLY ZA_REGNUM)
+	(unspec:VNx1TI_ONLY
+	  [(reg:VNx1TI_ONLY ZA_REGNUM)
+	   (match_operand:DI 0 "const_int_operand")
+	   (match_operand:SI 1 "register_operand" "Uci")
+	   (match_operand:VNx2BI 2 "register_operand" "Upl")
+	   (match_operand:SVE_FULL 3 "register_operand" "w")]
+	  SME_WRITE))]
+  "TARGET_STREAMING_SME"
+  "mova\tza%0<hv>.q[%w1, 0], %2/m, %3.q"
+)
+
+;; -------------------------------------------------------------------------
+;; ---- Zeroing
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - ZERO
+;; -------------------------------------------------------------------------
+
+(define_c_enum "unspec" [UNSPEC_SME_ZERO])
+
+(define_insn "aarch64_sme_zero"
+  [(set (reg:VNx16QI ZA_REGNUM)
+	(unspec:VNx16QI [(reg:VNx16QI ZA_REGNUM)
+			 (match_operand:DI 0 "const_int_operand")]
+			UNSPEC_SME_ZERO))]
+  "TARGET_SME"
+  {
+    return aarch64_output_sme_zero (operands[0]);
+  }
+)
+
+;; =========================================================================
+;; == Unary operations
+;; =========================================================================
+
+;; -------------------------------------------------------------------------
+;; ---- Single vector input
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - ADDHA
+;; - ADDVA
+;; -------------------------------------------------------------------------
+
+(define_insn "@aarch64_sme_<optab><mode>"
+  [(set (reg:SME_ZA_SDI ZA_REGNUM)
+	(unspec:SME_ZA_SDI
+	  [(reg:SME_ZA_SDI ZA_REGNUM)
+	   (match_operand:DI 0 "const_int_operand")
+	   (match_operand:<VPRED> 1 "register_operand" "Upl")
+	   (match_operand:<VPRED> 2 "register_operand" "Upl")
+	   (match_operand:SME_ZA_SDI 3 "register_operand" "w")]
+	  SME_UNARY_SDI))]
+  "TARGET_STREAMING_SME"
+  "<optab>\tza%0.<Vetype>, %1/m, %2/m, %3.<Vetype>"
+)
+
+;; =========================================================================
+;; == Binary operations
+;; =========================================================================
+
+;; -------------------------------------------------------------------------
+;; ---- Sum of outer products
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - BFMOPA
+;; - BFMOPS
+;; - FMOPA
+;; - FMOPS
+;; - SMOPA
+;; - SMOPS
+;; - SUMOPA
+;; - SUMOPS
+;; - UMOPA
+;; - UMOPS
+;; - USMOPA
+;; - USMOPS
+;; -------------------------------------------------------------------------
+
+(define_insn "@aarch64_sme_<optab><mode>"
+  [(set (reg:<V4xWIDE> ZA_REGNUM)
+	(unspec:<V4xWIDE>
+	  [(reg:<V4xWIDE> ZA_REGNUM)
+	   (match_operand:DI 0 "const_int_operand")
+	   (match_operand:<V4xWIDE_PRED> 1 "register_operand" "Upl")
+	   (match_operand:<V4xWIDE_PRED> 2 "register_operand" "Upl")
+	   (match_operand:SME_MOP_BHI 3 "register_operand" "w")
+	   (match_operand:SME_MOP_BHI 4 "register_operand" "w")]
+	  SME_INT_MOP))]
+  "TARGET_STREAMING_SME"
+  "<optab>\tza%0.<V4xwetype>, %1/m, %2/m, %3.<Vetype>, %4.<Vetype>"
+)
+
+(define_insn "@aarch64_sme_<optab><mode>"
+  [(set (reg:<SME_FMOP_WIDE> ZA_REGNUM)
+	(unspec:<SME_FMOP_WIDE>
+	  [(reg:<SME_FMOP_WIDE> ZA_REGNUM)
+	   (match_operand:DI 0 "const_int_operand")
+	   (match_operand:<SME_FMOP_WIDE_PRED> 1 "register_operand" "Upl")
+	   (match_operand:<SME_FMOP_WIDE_PRED> 2 "register_operand" "Upl")
+	   (match_operand:SME_MOP_HSDF 3 "register_operand" "w")
+	   (match_operand:SME_MOP_HSDF 4 "register_operand" "w")]
+	  SME_FP_MOP))]
+  "TARGET_STREAMING_SME"
+  "<b><optab>\tza%0.<sme_fmop_wide_etype>, %1/m, %2/m, %3.<Vetype>, %4.<Vetype>"
+)
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 6347407555f..f4765e6e541 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -2332,10 +2332,21 @@ class svundef_impl : public quiet<multi_vector_function>
 public:
   using quiet<multi_vector_function>::quiet;
 
+  unsigned int
+  call_properties (const function_instance &fi) const override
+  {
+    auto base = quiet<multi_vector_function>::call_properties (fi);
+    if (fi.type_suffix (0).za_p)
+      base |= CP_WRITE_ZA;
+    return base;
+  }
+
   rtx
   expand (function_expander &e) const override
   {
-    rtx target = e.get_reg_target ();
+    rtx target = (e.type_suffix (0).za_p
+		  ? gen_rtx_REG (VNx16QImode, ZA_REGNUM)
+		  : e.get_reg_target ());
     emit_clobber (copy_rtx (target));
     return target;
   }
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-functions.h b/gcc/config/aarch64/aarch64-sve-builtins-functions.h
index 2fd135aab07..70cfb6a7c23 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-functions.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins-functions.h
@@ -39,6 +39,36 @@ public:
   }
 };
 
+/* Wrap T, which is derived from function_base, and indicate that the
+   function reads from and writes to ZA.  */
+template<typename T>
+class read_write_za : public T
+{
+public:
+  using T::T;
+
+  unsigned int
+  call_properties (const function_instance &fi) const override
+  {
+    return T::call_properties (fi) | CP_READ_ZA | CP_WRITE_ZA;
+  }
+};
+
+/* Wrap T, which is derived from function_base, and indicate that the
+   function writes to ZA (but does not read from it).  */
+template<typename T>
+class write_za : public T
+{
+public:
+  using T::T;
+
+  unsigned int
+  call_properties (const function_instance &fi) const override
+  {
+    return T::call_properties (fi) | CP_WRITE_ZA;
+  }
+};
+
 /* A function_base that sometimes or always operates on tuples of
    vectors.  */
 class multi_vector_function : public function_base
@@ -348,6 +378,15 @@ typedef unspec_based_function_exact_insn<code_for_aarch64_sve_sub>
 typedef unspec_based_function_exact_insn<code_for_aarch64_sve_sub_lane>
   unspec_based_sub_lane_function;
 
+/* General SME unspec-based functions.  */
+typedef unspec_based_function_exact_insn<code_for_aarch64_sme>
+  unspec_based_sme_function;
+
+/* SME functions that read from and write to ZA.  */
+typedef read_write_za<unspec_based_sme_function> za_arith_function;
+typedef read_write_za<quiet<unspec_based_sme_function>>
+  quiet_za_arith_function;
+
 /* A function that acts like unspec_based_function_exact_insn<INT_CODE>
    when operating on integers, but that expands to an (fma ...)-style
    aarch64_sve* operation when applied to floats.  */
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
index df2d5414c07..69c5304a8ba 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
@@ -59,7 +59,10 @@ static void
 apply_predication (const function_instance &instance, tree return_type,
 		   vec<tree> &argument_types)
 {
-  if (instance.pred != PRED_none)
+  /* There are currently no SME ZA instructions that have both merging and
+     unpredicated forms, so for simplicity, the predicates are always included
+     in the original format string.  */
+  if (instance.pred != PRED_none && instance.pred != PRED_za_m)
     {
       argument_types.quick_insert (0, get_svbool_t ());
       /* For unary merge operations, the first argument is a vector with
@@ -584,6 +587,32 @@ struct binary_imm_long_base : public overloaded_base<0>
   }
 };
 
+template<type_class_index TCLASS = function_resolver::SAME_TYPE_CLASS,
+	 unsigned int BITS = function_resolver::SAME_SIZE>
+struct binary_za_m_base : public overloaded_base<1>
+{
+  tree
+  resolve (function_resolver &r) const override
+  {
+    type_suffix_index type;
+    if (!r.check_num_arguments (5)
+	|| !r.require_integer_immediate (0)
+	|| !r.require_vector_type (1, VECTOR_TYPE_svbool_t)
+	|| !r.require_vector_type (2, VECTOR_TYPE_svbool_t)
+	|| (type = r.infer_vector_type (3)) == NUM_TYPE_SUFFIXES
+	|| !r.require_derived_vector_type (4, 3, type, TCLASS, BITS))
+      return error_mark_node;
+
+    return r.resolve_to (type);
+  }
+
+  bool
+  check (function_checker &c) const override
+  {
+    return c.require_immediate_range (0, 0, c.num_za_tiles () - 1);
+  }
+};
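[Not part of the patch: for reviewers unfamiliar with ZA tiling, the `require_immediate_range (0, 0, c.num_za_tiles () - 1)` checks in these shapes rely on the architectural rule that ZA is divided into esize/8 tiles for a given element size (ZA0.B alone for 8-bit, ZA0.H-ZA1.H for 16-bit, up to ZA0.Q-ZA15.Q for 128-bit). A sketch of that relationship, with an illustrative function name:]

```c
/* Number of ZA tiles for a given element size in bits: one tile per
   byte of element width, so the valid tile immediates are
   0 .. element_bits / 8 - 1.  */
static unsigned
num_za_tiles_model (unsigned element_bits)
{
  return element_bits / 8;
}
```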
+
 /* Base class for inc_dec and inc_dec_pat.  */
 struct inc_dec_base : public overloaded_base<0>
 {
@@ -1571,6 +1600,61 @@ struct binary_wide_opt_n_def : public overloaded_base<0>
 };
 SHAPE (binary_wide_opt_n)
 
+/* void svfoo_t0[_t1](uint64_t, svbool_t, svbool_t, sv<t1>_t,
+		      sv<t1:int>_t).  */
+struct binary_za_int_m_def : public binary_za_m_base<TYPE_signed>
+{
+  void
+  build (function_builder &b, const function_group_info &group) const override
+  {
+    b.add_overloaded_functions (group, MODE_none);
+    build_all (b, "_,su64,vp,vp,t1,ts1", group, MODE_none);
+  }
+};
+SHAPE (binary_za_int_m)
+
+/* void svfoo_t0[_t1](uint64_t, svbool_t, svbool_t, sv<t1>_t, sv<t1>_t).  */
+struct binary_za_m_def : public binary_za_m_base<>
+{
+  void
+  build (function_builder &b, const function_group_info &group) const override
+  {
+    b.add_overloaded_functions (group, MODE_none);
+    /* Allow the overloaded form to be specified separately, with just
+       a single suffix.  This is necessary for the 64-bit SME MOP intrinsics,
+       which have some forms dependent on FEAT_SME_I16I64 and some forms
+       dependent on FEAT_SME_F64F64.  The resolver needs to be defined
+       for base SME.  */
+    if (group.types[0][1] != NUM_TYPE_SUFFIXES)
+      build_all (b, "_,su64,vp,vp,t1,t1", group, MODE_none);
+  }
+};
+SHAPE (binary_za_m)
+
+/* void svfoo_t0[_t1](uint64_t, svbool_t, svbool_t, sv<t1>_t,
+		      sv<t1:uint>_t).  */
+struct binary_za_uint_m_def : public binary_za_m_base<TYPE_unsigned>
+{
+  void
+  build (function_builder &b, const function_group_info &group) const override
+  {
+    b.add_overloaded_functions (group, MODE_none);
+    build_all (b, "_,su64,vp,vp,t1,tu1", group, MODE_none);
+  }
+};
+SHAPE (binary_za_uint_m)
+
+/* bool svfoo().  */
+struct bool_inherent_def : public nonoverloaded_base
+{
+  void
+  build (function_builder &b, const function_group_info &group) const override
+  {
+    build_all (b, "sp", group, MODE_none);
+  }
+};
+SHAPE (bool_inherent)
+
 /* sv<t0>_t svfoo[_t0](sv<t0>_t, sv<t0>_t)
    <t0>_t svfoo[_n_t0](<t0>_t, sv<t0>_t).  */
 struct clast_def : public overloaded_base<0>
@@ -2050,6 +2134,41 @@ struct inherent_b_def : public overloaded_base<0>
 };
 SHAPE (inherent_b)
 
+/* void svfoo_t0().  */
+struct inherent_za_def : public nonoverloaded_base
+{
+  void
+  build (function_builder &b, const function_group_info &group) const override
+  {
+    build_all (b, "_", group, MODE_none);
+  }
+};
+SHAPE (inherent_za)
+
+/* void svfoo_t0(uint64_t).  */
+struct inherent_mask_za_def : public nonoverloaded_base
+{
+  void
+  build (function_builder &b, const function_group_info &group) const override
+  {
+    build_all (b, "_,su64", group, MODE_none);
+  }
+};
+SHAPE (inherent_mask_za)
+
+/* void svfoo_t0(uint32_t, const void *)
+   void svfoo_vnum_t0(uint32_t, const void *, int64_t).  */
+struct ldr_za_def : public nonoverloaded_base
+{
+  void
+  build (function_builder &b, const function_group_info &group) const override
+  {
+    build_all (b, "_,su32,al", group, MODE_none);
+    build_all (b, "_,su32,al,ss64", group, MODE_vnum);
+  }
+};
+SHAPE (ldr_za)
+
 /* sv<t0>[xN]_t svfoo[_t0](const <t0>_t *)
    sv<t0>[xN]_t svfoo_vnum[_t0](const <t0>_t *, int64_t).  */
 struct load_def : public load_contiguous_base
@@ -2260,6 +2379,27 @@ struct load_replicate_def : public load_contiguous_base
 };
 SHAPE (load_replicate)
 
+/* void svfoo_t0(uint64_t, uint32_t, svbool_t, const void *)
+   void svfoo_vnum_t0(uint64_t, uint32_t, svbool_t, const void *, int64_t)
+
+   where the first two arguments form a (ZA tile, slice) pair.  */
+struct load_za_def : public nonoverloaded_base
+{
+  void
+  build (function_builder &b, const function_group_info &group) const override
+  {
+    build_all (b, "_,su64,su32,vp,al", group, MODE_none);
+    build_all (b, "_,su64,su32,vp,al,ss64", group, MODE_vnum);
+  }
+
+  bool
+  check (function_checker &c) const override
+  {
+    return c.require_immediate_range (0, 0, c.num_za_tiles () - 1);
+  }
+};
+SHAPE (load_za)
+
 /* svbool_t svfoo(enum svpattern).  */
 struct pattern_pred_def : public nonoverloaded_base
 {
@@ -2354,6 +2494,46 @@ struct rdffr_def : public nonoverloaded_base
 };
 SHAPE (rdffr)
 
+/* sv<t1>_t svfoo_t0[_t1](uint64_t, uint32_t).  */
+struct read_za_def : public overloaded_base<1>
+{
+  bool
+  has_merge_argument_p (const function_instance &, unsigned int) const override
+  {
+    return true;
+  }
+
+  void
+  build (function_builder &b, const function_group_info &group) const override
+  {
+    b.add_overloaded_functions (group, MODE_none);
+    build_all (b, "t1,su64,su32", group, MODE_none);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+    gcc_assert (r.pred == PRED_m);
+    type_suffix_index type;
+    if (!r.check_num_arguments (4)
+	|| (type = r.infer_vector_type (0)) == NUM_TYPE_SUFFIXES
+	|| !r.require_vector_type (1, VECTOR_TYPE_svbool_t)
+	|| !r.require_integer_immediate (2)
+	|| !r.require_scalar_type (3, "uint32_t"))
+      return error_mark_node;
+
+    return r.resolve_to (type);
+  }
+
+  bool
+  check (function_checker &c) const override
+  {
+    gcc_assert (c.pred == PRED_m);
+    return c.require_immediate_range (1, 0, c.num_za_tiles () - 1);
+  }
+};
+SHAPE (read_za)
+
 /* <t0>_t svfoo[_t0](sv<t0>_t).  */
 struct reduction_def : public overloaded_base<0>
 {
@@ -2694,6 +2874,40 @@ struct store_scatter_offset_restricted_def : public store_scatter_base
 };
 SHAPE (store_scatter_offset_restricted)
 
+/* void svfoo_t0(uint64_t, uint32_t, svbool_t, void *)
+   void svfoo_vnum_t0(uint64_t, uint32_t, svbool_t, void *, int64_t)
+
+   where the first two arguments form a (ZA tile, slice) pair.  */
+struct store_za_def : public nonoverloaded_base
+{
+  void
+  build (function_builder &b, const function_group_info &group) const override
+  {
+    build_all (b, "_,su64,su32,vp,as", group, MODE_none);
+    build_all (b, "_,su64,su32,vp,as,ss64", group, MODE_vnum);
+  }
+
+  bool
+  check (function_checker &c) const override
+  {
+    return c.require_immediate_range (0, 0, c.num_za_tiles () - 1);
+  }
+};
+SHAPE (store_za)
+
+/* void svfoo_t0(uint32_t, void *)
+   void svfoo_vnum_t0(uint32_t, void *, int64_t).  */
+struct str_za_def : public nonoverloaded_base
+{
+  void
+  build (function_builder &b, const function_group_info &group) const override
+  {
+    build_all (b, "_,su32,as", group, MODE_none);
+    build_all (b, "_,su32,as,ss64", group, MODE_vnum);
+  }
+};
+SHAPE (str_za)
+
 /* sv<t0>_t svfoo[_t0](sv<t0>xN_t, sv<t0:uint>_t).  */
 struct tbl_tuple_def : public overloaded_base<0>
 {
@@ -3454,4 +3668,68 @@ struct unary_widen_def : public overloaded_base<0>
 };
 SHAPE (unary_widen)
 
+/* void svfoo_t0[_t1](uint64_t, svbool_t, svbool_t, sv<t1>_t).  */
+struct unary_za_m_def : public overloaded_base<1>
+{
+  void
+  build (function_builder &b, const function_group_info &group) const override
+  {
+    b.add_overloaded_functions (group, MODE_none);
+    build_all (b, "_,su64,vp,vp,t1", group, MODE_none);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+    type_suffix_index type;
+    if (!r.check_num_arguments (4)
+	|| !r.require_integer_immediate (0)
+	|| !r.require_vector_type (1, VECTOR_TYPE_svbool_t)
+	|| !r.require_vector_type (2, VECTOR_TYPE_svbool_t)
+	|| (type = r.infer_vector_type (3)) == NUM_TYPE_SUFFIXES)
+      return error_mark_node;
+
+    return r.resolve_to (type);
+  }
+
+  bool
+  check (function_checker &c) const override
+  {
+    return c.require_immediate_range (0, 0, c.num_za_tiles () - 1);
+  }
+};
+SHAPE (unary_za_m)
+
+/* void svfoo_t0[_t1](uint64_t, uint32_t, svbool_t, sv<t1>_t).  */
+struct write_za_def : public overloaded_base<1>
+{
+  void
+  build (function_builder &b, const function_group_info &group) const override
+  {
+    b.add_overloaded_functions (group, MODE_none);
+    build_all (b, "_,su64,su32,vp,t1", group, MODE_none);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+    type_suffix_index type;
+    if (!r.check_num_arguments (4)
+	|| !r.require_integer_immediate (0)
+	|| !r.require_scalar_type (1, "uint32_t")
+	|| !r.require_vector_type (2, VECTOR_TYPE_svbool_t)
+	|| (type = r.infer_vector_type (3)) == NUM_TYPE_SUFFIXES)
+      return error_mark_node;
+
+    return r.resolve_to (type);
+  }
+
+  bool
+  check (function_checker &c) const override
+  {
+    return c.require_immediate_range (0, 0, c.num_za_tiles () - 1);
+  }
+};
+SHAPE (write_za)
+
 }
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-shapes.h b/gcc/config/aarch64/aarch64-sve-builtins-shapes.h
index 3b0025f85db..f7f9cdd3351 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-shapes.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins-shapes.h
@@ -93,6 +93,10 @@ namespace aarch64_sve
     extern const function_shape *const binary_uint64_opt_n;
     extern const function_shape *const binary_wide;
     extern const function_shape *const binary_wide_opt_n;
+    extern const function_shape *const binary_za_int_m;
+    extern const function_shape *const binary_za_m;
+    extern const function_shape *const binary_za_uint_m;
+    extern const function_shape *const bool_inherent;
     extern const function_shape *const clast;
     extern const function_shape *const compare;
     extern const function_shape *const compare_opt_n;
@@ -114,6 +118,9 @@ namespace aarch64_sve
     extern const function_shape *const inc_dec_pred_scalar;
     extern const function_shape *const inherent;
     extern const function_shape *const inherent_b;
+    extern const function_shape *const inherent_za;
+    extern const function_shape *const inherent_mask_za;
+    extern const function_shape *const ldr_za;
     extern const function_shape *const load;
     extern const function_shape *const load_ext;
     extern const function_shape *const load_ext_gather_index;
@@ -124,6 +131,7 @@ namespace aarch64_sve
     extern const function_shape *const load_gather_sv_restricted;
     extern const function_shape *const load_gather_vs;
     extern const function_shape *const load_replicate;
+    extern const function_shape *const load_za;
     extern const function_shape *const mmla;
     extern const function_shape *const pattern_pred;
     extern const function_shape *const prefetch;
@@ -131,6 +139,7 @@ namespace aarch64_sve
     extern const function_shape *const prefetch_gather_offset;
     extern const function_shape *const ptest;
     extern const function_shape *const rdffr;
+    extern const function_shape *const read_za;
     extern const function_shape *const reduction;
     extern const function_shape *const reduction_wide;
     extern const function_shape *const set;
@@ -147,6 +156,8 @@ namespace aarch64_sve
     extern const function_shape *const store_scatter_index_restricted;
     extern const function_shape *const store_scatter_offset;
     extern const function_shape *const store_scatter_offset_restricted;
+    extern const function_shape *const store_za;
+    extern const function_shape *const str_za;
     extern const function_shape *const tbl_tuple;
     extern const function_shape *const ternary_bfloat;
     extern const function_shape *const ternary_bfloat_lane;
@@ -185,6 +196,8 @@ namespace aarch64_sve
     extern const function_shape *const unary_to_uint;
     extern const function_shape *const unary_uint;
     extern const function_shape *const unary_widen;
+    extern const function_shape *const unary_za_m;
+    extern const function_shape *const write_za;
   }
 }
 
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sme.cc b/gcc/config/aarch64/aarch64-sve-builtins-sme.cc
new file mode 100644
index 00000000000..fa6683c0088
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-sve-builtins-sme.cc
@@ -0,0 +1,351 @@
+/* ACLE support for AArch64 SME (__ARM_FEATURE_SME intrinsics)
+   Copyright (C) 2022 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "rtl.h"
+#include "tm_p.h"
+#include "memmodel.h"
+#include "insn-codes.h"
+#include "optabs.h"
+#include "recog.h"
+#include "expr.h"
+#include "basic-block.h"
+#include "function.h"
+#include "fold-const.h"
+#include "gimple.h"
+#include "gimple-iterator.h"
+#include "gimplify.h"
+#include "explow.h"
+#include "emit-rtl.h"
+#include "aarch64-sve-builtins.h"
+#include "aarch64-sve-builtins-shapes.h"
+#include "aarch64-sve-builtins-base.h"
+#include "aarch64-sve-builtins-sme.h"
+#include "aarch64-sve-builtins-functions.h"
+
+using namespace aarch64_sve;
+
+namespace {
+
+class load_store_za_base : public function_base
+{
+public:
+  tree
+  memory_scalar_type (const function_instance &) const override
+  {
+    return void_type_node;
+  }
+};
+
+class read_write_za_base : public function_base
+{
+public:
+  constexpr read_write_za_base (int unspec) : m_unspec (unspec) {}
+
+  rtx
+  expand (function_expander &e) const override
+  {
+    auto za_mode = e.vector_mode (0);
+    auto z_mode = e.vector_mode (1);
+    auto icode = (za_mode == VNx1TImode
+		  ? code_for_aarch64_sme (m_unspec, za_mode, z_mode)
+		  : code_for_aarch64_sme (m_unspec, z_mode, z_mode));
+    return e.use_exact_insn (icode);
+  }
+
+  int m_unspec;
+};
+
+class load_za_base : public load_store_za_base
+{
+public:
+  unsigned int
+  call_properties (const function_instance &) const override
+  {
+    return CP_READ_MEMORY | CP_WRITE_ZA;
+  }
+};
+
+class store_za_base : public load_store_za_base
+{
+public:
+  unsigned int
+  call_properties (const function_instance &) const override
+  {
+    return CP_WRITE_MEMORY | CP_READ_ZA;
+  }
+};
+
+static void
+add_load_store_operand (function_expander &e, unsigned int base_argno)
+{
+  auto mode = e.vector_mode (0);
+  rtx base = e.get_contiguous_base (mode, base_argno, base_argno + 1,
+				    AARCH64_FL_SM_ON);
+  auto mem = gen_rtx_MEM (mode, force_reg (Pmode, base));
+  set_mem_align (mem, BITS_PER_UNIT);
+  e.add_fixed_operand (mem);
+}
+
+class arm_has_sme_impl : public function_base
+{
+  gimple *
+  fold (gimple_folder &f) const override
+  {
+    if (TARGET_SME)
+      return f.fold_to_cstu (1);
+    return nullptr;
+  }
+
+  rtx
+  expand (function_expander &e) const override
+  {
+    if (TARGET_SME)
+      return const1_rtx;
+    emit_insn (gen_aarch64_get_sme_state ());
+    return expand_simple_binop (DImode, LSHIFTRT,
+				gen_rtx_REG (DImode, R0_REGNUM),
+				gen_int_mode (63, QImode),
+				e.possible_target, true, OPTAB_LIB_WIDEN);
+  }
+};
+
+class arm_in_streaming_mode_impl : public function_base
+{
+  gimple *
+  fold (gimple_folder &f) const override
+  {
+    if (TARGET_STREAMING)
+      return f.fold_to_cstu (1);
+    if (TARGET_NON_STREAMING)
+      return f.fold_to_cstu (0);
+    return nullptr;
+  }
+
+  rtx
+  expand (function_expander &e) const override
+  {
+    if (TARGET_STREAMING)
+      return const1_rtx;
+
+    if (TARGET_NON_STREAMING)
+      return const0_rtx;
+
+    rtx reg;
+    if (TARGET_SME)
+      {
+	reg = gen_reg_rtx (DImode);
+	emit_insn (gen_aarch64_read_svcr (reg));
+      }
+    else
+      {
+	emit_insn (gen_aarch64_get_sme_state ());
+	reg = gen_rtx_REG (DImode, R0_REGNUM);
+      }
+    return expand_simple_binop (DImode, AND, reg, gen_int_mode (1, DImode),
+				e.possible_target, true, OPTAB_LIB_WIDEN);
+  }
+};
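[Not part of the patch: both expanders above reduce to simple bit tests on a 64-bit status value, matching the layout this patch assumes: a `__arm_sme_state`-style result with SME availability in bit 63, and SVCR with PSTATE.SM in bit 0. A sketch of the two extractions (illustrative names, not GCC code):]

```c
#include <stdint.h>

/* Bit 63 of the SME state value => SME is available
   (the LSHIFTRT-by-63 emitted by arm_has_sme_impl).  */
static unsigned
has_sme_model (uint64_t sme_state)
{
  return (unsigned) (sme_state >> 63);
}

/* Bit 0 of SVCR is PSTATE.SM => currently in streaming mode
   (the AND-with-1 emitted by arm_in_streaming_mode_impl).  */
static unsigned
in_streaming_mode_model (uint64_t svcr)
{
  return (unsigned) (svcr & 1);
}
```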
+
+/* Implements svcnts[bhwd].  */
+class svcnts_bhwd_impl : public function_base
+{
+public:
+  constexpr svcnts_bhwd_impl (machine_mode ref_mode) : m_ref_mode (ref_mode) {}
+
+  unsigned int
+  get_shift () const
+  {
+    return exact_log2 (GET_MODE_UNIT_SIZE (m_ref_mode));
+  }
+
+  gimple *
+  fold (gimple_folder &f) const override
+  {
+    if (TARGET_STREAMING)
+      return f.fold_to_cstu (GET_MODE_NUNITS (m_ref_mode));
+    return nullptr;
+  }
+
+  rtx
+  expand (function_expander &e) const override
+  {
+    rtx cntsb = aarch64_sme_vq_immediate (DImode, 16, AARCH64_ISA_MODE);
+    auto shift = get_shift ();
+    if (!shift)
+      return cntsb;
+
+    return expand_simple_binop (DImode, LSHIFTRT, cntsb,
+				gen_int_mode (shift, QImode),
+				e.possible_target, true, OPTAB_LIB_WIDEN);
+  }
+
+  /* The mode of the vector associated with the [bhwd] suffix.  */
+  machine_mode m_ref_mode;
+};
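[Not part of the patch: the svcnts[bhwd] expansion above derives each count from the streaming vector length in bytes (the CNTSB value) by shifting right by log2 of the element size, which is what `get_shift ()` computes via `exact_log2`. A sketch under that assumption:]

```c
#include <stdint.h>

/* svcnts[bhwd] = streaming VL in bytes >> log2 (element size in
   bytes); a shift of 0 (the byte case) returns the VL unchanged.  */
static uint64_t
svcnts_model (uint64_t streaming_vl_bytes, unsigned element_bytes)
{
  unsigned shift = 0;
  while ((1u << shift) < element_bytes)  /* exact_log2 of the unit size */
    ++shift;
  return streaming_vl_bytes >> shift;
}
```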
+
+class svld1_impl : public load_za_base
+{
+public:
+  constexpr svld1_impl (int unspec) : m_unspec (unspec) {}
+
+  rtx
+  expand (function_expander &e) const override
+  {
+    auto icode = code_for_aarch64_sme (m_unspec, e.vector_mode (0));
+    for (int i = 0; i < 3; ++i)
+      e.add_input_operand (icode, e.args[i]);
+    add_load_store_operand (e, 3);
+    return e.generate_insn (icode);
+  }
+
+  int m_unspec;
+};
+
+class svldr_impl : public load_za_base
+{
+public:
+  rtx
+  expand (function_expander &e) const override
+  {
+    auto icode = CODE_FOR_aarch64_sme_ldr0;
+    e.add_input_operand (icode, e.args[0]);
+    add_load_store_operand (e, 1);
+    return e.generate_insn (icode);
+  }
+};
+
+class svread_impl : public read_write_za_base
+{
+public:
+  using read_write_za_base::read_write_za_base;
+
+  unsigned int
+  call_properties (const function_instance &) const override
+  {
+    return CP_READ_ZA;
+  }
+};
+
+class svst1_impl : public store_za_base
+{
+public:
+  constexpr svst1_impl (int unspec) : m_unspec (unspec) {}
+
+  rtx
+  expand (function_expander &e) const override
+  {
+    auto icode = code_for_aarch64_sme (m_unspec, e.vector_mode (0));
+    add_load_store_operand (e, 3);
+    for (int i = 0; i < 3; ++i)
+      e.add_input_operand (icode, e.args[i]);
+    return e.generate_insn (icode);
+  }
+
+  int m_unspec;
+};
+
+class svstr_impl : public store_za_base
+{
+public:
+  rtx
+  expand (function_expander &e) const override
+  {
+    auto icode = CODE_FOR_aarch64_sme_str0;
+    add_load_store_operand (e, 1);
+    e.add_input_operand (icode, e.args[0]);
+    return e.generate_insn (icode);
+  }
+};
+
+class svwrite_impl : public read_write_za_base
+{
+public:
+  using read_write_za_base::read_write_za_base;
+
+  unsigned int
+  call_properties (const function_instance &) const override
+  {
+    return CP_WRITE_ZA;
+  }
+};
+
+class svzero_impl : public write_za<function_base>
+{
+public:
+  rtx
+  expand (function_expander &) const override
+  {
+    emit_insn (gen_aarch64_sme_zero (gen_int_mode (0xff, SImode)));
+    return const0_rtx;
+  }
+};
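[Not part of the patch: svzero_impl above passes the constant mask 0xff to the ZERO pattern. In the ZERO encoding each mask bit k selects tile ZAk.D, so an all-ones 8-bit mask zeroes all eight 64-bit tiles and hence the whole of ZA. A sketch of the mask decoding (illustrative function, not GCC code):]

```c
/* Count how many 64-bit ZA tiles an 8-bit ZERO mask selects:
   bit k of the mask corresponds to tile ZAk.D.  */
static unsigned
tiles_zeroed_model (unsigned mask)
{
  unsigned count = 0;
  for (unsigned k = 0; k < 8; ++k)
    if (mask & (1u << k))
      ++count;
  return count;
}
```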
+
+class svzero_mask_impl : public write_za<function_base>
+{
+public:
+  rtx
+  expand (function_expander &e) const override
+  {
+    return e.use_exact_insn (CODE_FOR_aarch64_sme_zero);
+  }
+};
+
+} /* end anonymous namespace */
+
+namespace aarch64_sve {
+
+FUNCTION (arm_has_sme, arm_has_sme_impl, )
+FUNCTION (arm_in_streaming_mode, arm_in_streaming_mode_impl, )
+FUNCTION (svaddha, za_arith_function, (UNSPEC_SME_ADDHA,
+				       UNSPEC_SME_ADDHA, -1, 1))
+FUNCTION (svaddva, za_arith_function, (UNSPEC_SME_ADDVA,
+				       UNSPEC_SME_ADDVA, -1, 1))
+FUNCTION (svcntsb, svcnts_bhwd_impl, (VNx16QImode))
+FUNCTION (svcntsd, svcnts_bhwd_impl, (VNx2DImode))
+FUNCTION (svcntsh, svcnts_bhwd_impl, (VNx8HImode))
+FUNCTION (svcntsw, svcnts_bhwd_impl, (VNx4SImode))
+FUNCTION (svld1_hor, svld1_impl, (UNSPEC_SME_LD1_HOR))
+FUNCTION (svld1_ver, svld1_impl, (UNSPEC_SME_LD1_VER))
+FUNCTION (svldr, svldr_impl, )
+FUNCTION (svmopa, quiet_za_arith_function, (UNSPEC_SME_SMOPA,
+					    UNSPEC_SME_UMOPA,
+					    UNSPEC_SME_FMOPA, 1))
+FUNCTION (svmops, quiet_za_arith_function, (UNSPEC_SME_SMOPS,
+					    UNSPEC_SME_UMOPS,
+					    UNSPEC_SME_FMOPS, 1))
+FUNCTION (svread_hor, svread_impl, (UNSPEC_SME_READ_HOR))
+FUNCTION (svread_ver, svread_impl, (UNSPEC_SME_READ_VER))
+FUNCTION (svst1_hor, svst1_impl, (UNSPEC_SME_ST1_HOR))
+FUNCTION (svst1_ver, svst1_impl, (UNSPEC_SME_ST1_VER))
+FUNCTION (svsumopa, quiet_za_arith_function, (UNSPEC_SME_SUMOPA, -1, -1, 1))
+FUNCTION (svsumops, quiet_za_arith_function, (UNSPEC_SME_SUMOPS, -1, -1, 1))
+FUNCTION (svusmopa, quiet_za_arith_function, (-1, UNSPEC_SME_USMOPA, -1, 1))
+FUNCTION (svusmops, quiet_za_arith_function, (-1, UNSPEC_SME_USMOPS, -1, 1))
+FUNCTION (svstr, svstr_impl, )
+FUNCTION (svwrite_hor, svwrite_impl, (UNSPEC_SME_WRITE_HOR))
+FUNCTION (svwrite_ver, svwrite_impl, (UNSPEC_SME_WRITE_VER))
+FUNCTION (svzero, svzero_impl, )
+FUNCTION (svzero_mask, svzero_mask_impl, )
+
+} /* end namespace aarch64_sve */
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sme.def b/gcc/config/aarch64/aarch64-sve-builtins-sme.def
new file mode 100644
index 00000000000..a1d496dd809
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-sve-builtins-sme.def
@@ -0,0 +1,83 @@
+/* ACLE support for AArch64 SME (__ARM_FEATURE_SME intrinsics)
+   Copyright (C) 2022 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#define REQUIRED_EXTENSIONS 0
+DEF_SVE_FUNCTION (arm_has_sme, bool_inherent, none, none)
+DEF_SVE_FUNCTION (arm_in_streaming_mode, bool_inherent, none, none)
+#undef REQUIRED_EXTENSIONS
+
+#define REQUIRED_EXTENSIONS AARCH64_FL_SME
+DEF_SVE_FUNCTION (svcntsb, count_inherent, none, none)
+DEF_SVE_FUNCTION (svcntsd, count_inherent, none, none)
+DEF_SVE_FUNCTION (svcntsh, count_inherent, none, none)
+DEF_SVE_FUNCTION (svcntsw, count_inherent, none, none)
+#undef REQUIRED_EXTENSIONS
+
+#define REQUIRED_EXTENSIONS AARCH64_FL_SME | AARCH64_FL_ZA_ON
+DEF_SVE_FUNCTION (svldr, ldr_za, za, none)
+DEF_SVE_FUNCTION (svstr, str_za, za, none)
+DEF_SVE_FUNCTION (svundef, inherent_za, za, none)
+DEF_SVE_FUNCTION (svzero, inherent_za, za, none)
+DEF_SVE_FUNCTION (svzero_mask, inherent_mask_za, za, none)
+#undef REQUIRED_EXTENSIONS
+
+#define REQUIRED_EXTENSIONS (AARCH64_FL_SME \
+			     | AARCH64_FL_SM_ON \
+			     | AARCH64_FL_ZA_ON)
+DEF_SVE_FUNCTION (svaddha, unary_za_m, s_za_integer, za_m)
+DEF_SVE_FUNCTION (svaddva, unary_za_m, s_za_integer, za_m)
+DEF_SVE_FUNCTION (svld1_hor, load_za, all_za, none)
+DEF_SVE_FUNCTION (svld1_ver, load_za, all_za, none)
+DEF_SVE_FUNCTION (svmopa, binary_za_m, mop_base, za_m)
+DEF_SVE_FUNCTION (svmopa, binary_za_m, d_za, za_m)
+DEF_SVE_FUNCTION (svmops, binary_za_m, mop_base, za_m)
+DEF_SVE_FUNCTION (svmops, binary_za_m, d_za, za_m)
+DEF_SVE_FUNCTION (svread_hor, read_za, all_za_data, m)
+DEF_SVE_FUNCTION (svread_ver, read_za, all_za_data, m)
+DEF_SVE_FUNCTION (svst1_hor, store_za, all_za, none)
+DEF_SVE_FUNCTION (svst1_ver, store_za, all_za, none)
+DEF_SVE_FUNCTION (svsumopa, binary_za_uint_m, mop_base_signed, za_m)
+DEF_SVE_FUNCTION (svsumops, binary_za_uint_m, mop_base_signed, za_m)
+DEF_SVE_FUNCTION (svusmopa, binary_za_int_m, mop_base_unsigned, za_m)
+DEF_SVE_FUNCTION (svusmops, binary_za_int_m, mop_base_unsigned, za_m)
+DEF_SVE_FUNCTION (svwrite_hor, write_za, all_za_data, za_m)
+DEF_SVE_FUNCTION (svwrite_ver, write_za, all_za_data, za_m)
+#undef REQUIRED_EXTENSIONS
+
+#define REQUIRED_EXTENSIONS (AARCH64_FL_SME \
+			     | AARCH64_FL_SME_I16I64 \
+			     | AARCH64_FL_SM_ON \
+			     | AARCH64_FL_ZA_ON)
+DEF_SVE_FUNCTION (svaddha, unary_za_m, d_za_integer, za_m)
+DEF_SVE_FUNCTION (svaddva, unary_za_m, d_za_integer, za_m)
+DEF_SVE_FUNCTION (svmopa, binary_za_m, mop_i16i64, za_m)
+DEF_SVE_FUNCTION (svmops, binary_za_m, mop_i16i64, za_m)
+DEF_SVE_FUNCTION (svsumopa, binary_za_uint_m, mop_i16i64_signed, za_m)
+DEF_SVE_FUNCTION (svsumops, binary_za_uint_m, mop_i16i64_signed, za_m)
+DEF_SVE_FUNCTION (svusmopa, binary_za_int_m, mop_i16i64_unsigned, za_m)
+DEF_SVE_FUNCTION (svusmops, binary_za_int_m, mop_i16i64_unsigned, za_m)
+#undef REQUIRED_EXTENSIONS
+
+#define REQUIRED_EXTENSIONS (AARCH64_FL_SME \
+			     | AARCH64_FL_SME_F64F64 \
+			     | AARCH64_FL_SM_ON \
+			     | AARCH64_FL_ZA_ON)
+DEF_SVE_FUNCTION (svmopa, binary_za_m, mop_f64f64, za_m)
+DEF_SVE_FUNCTION (svmops, binary_za_m, mop_f64f64, za_m)
+#undef REQUIRED_EXTENSIONS
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sme.h b/gcc/config/aarch64/aarch64-sve-builtins-sme.h
new file mode 100644
index 00000000000..952e6867e9f
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-sve-builtins-sme.h
@@ -0,0 +1,56 @@
+/* ACLE support for AArch64 SME (__ARM_FEATURE_SME intrinsics)
+   Copyright (C) 2022 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_SVE_BUILTINS_SME_H
+#define GCC_AARCH64_SVE_BUILTINS_SME_H
+
+namespace aarch64_sve
+{
+  namespace functions
+  {
+    extern const function_base *const arm_has_sme;
+    extern const function_base *const arm_in_streaming_mode;
+    extern const function_base *const svaddha;
+    extern const function_base *const svaddva;
+    extern const function_base *const svcntsb;
+    extern const function_base *const svcntsd;
+    extern const function_base *const svcntsh;
+    extern const function_base *const svcntsw;
+    extern const function_base *const svld1_hor;
+    extern const function_base *const svld1_ver;
+    extern const function_base *const svldr;
+    extern const function_base *const svmopa;
+    extern const function_base *const svmops;
+    extern const function_base *const svread_hor;
+    extern const function_base *const svread_ver;
+    extern const function_base *const svst1_hor;
+    extern const function_base *const svst1_ver;
+    extern const function_base *const svstr;
+    extern const function_base *const svsumopa;
+    extern const function_base *const svsumops;
+    extern const function_base *const svusmopa;
+    extern const function_base *const svusmops;
+    extern const function_base *const svwrite_hor;
+    extern const function_base *const svwrite_ver;
+    extern const function_base *const svzero;
+    extern const function_base *const svzero_mask;
+  }
+}
+
+#endif
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index e50a58dcc0a..c8e8bbcdc50 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -51,6 +51,7 @@
 #include "aarch64-sve-builtins.h"
 #include "aarch64-sve-builtins-base.h"
 #include "aarch64-sve-builtins-sve2.h"
+#include "aarch64-sve-builtins-sme.h"
 #include "aarch64-sve-builtins-shapes.h"
 
 namespace aarch64_sve {
@@ -112,6 +113,7 @@ static const char *const pred_suffixes[NUM_PREDS + 1] = {
   "_m",
   "_x",
   "_z",
+  "_m",
   ""
 };
 
@@ -136,12 +138,28 @@ CONSTEXPR const type_suffix_info type_suffixes[NUM_TYPE_SUFFIXES + 1] = {
     TYPE_##CLASS == TYPE_signed || TYPE_##CLASS == TYPE_unsigned, \
     TYPE_##CLASS == TYPE_unsigned, \
     TYPE_##CLASS == TYPE_float, \
+    TYPE_##CLASS != TYPE_bool, \
     TYPE_##CLASS == TYPE_bool, \
+    false, \
+    0, \
+    MODE },
+#define DEF_SME_ZA_SUFFIX(NAME, BITS, MODE) \
+  { "_" #NAME, \
+    NUM_VECTOR_TYPES, \
+    NUM_TYPE_CLASSES, \
+    BITS, \
+    BITS / BITS_PER_UNIT, \
+    false, \
+    false, \
+    false, \
+    false, \
+    false, \
+    true, \
     0, \
     MODE },
 #include "aarch64-sve-builtins.def"
   { "", NUM_VECTOR_TYPES, TYPE_bool, 0, 0, false, false, false, false,
-    0, VOIDmode }
+    false, false, 0, VOIDmode }
 };
 
 /* Define a TYPES_<combination> macro for each combination of type
@@ -415,6 +433,73 @@ CONSTEXPR const type_suffix_info type_suffixes[NUM_TYPE_SUFFIXES + 1] = {
   TYPES_while1 (D, b32), \
   TYPES_while1 (D, b64)
 
+/* _za8 _za16 _za32 _za64 _za128.  */
+#define TYPES_all_za(S, D) \
+  S (za8), S (za16), S (za32), S (za64), S (za128)
+
+/* _za64.  */
+#define TYPES_d_za(S, D) \
+  S (za64)
+
+/* {   _za8 } x {             _s8  _u8 }
+
+   {  _za16 } x { _bf16 _f16 _s16 _u16 }
+
+   {  _za32 } x {       _f32 _s32 _u32 }
+
+   {  _za64 } x {       _f64 _s64 _u64 }
+
+   { _za128 } x {      _bf16           }
+		{       _f16 _f32 _f64 }
+		{ _s8   _s16 _s32 _s64 }
+		{ _u8   _u16 _u32 _u64 }.  */
+#define TYPES_all_za_data(S, D) \
+  D (za8, s8), D (za8, u8), \
+  D (za16, bf16), D (za16, f16), D (za16, s16), D (za16, u16), \
+  D (za32, f32), D (za32, s32), D (za32, u32), \
+  D (za64, f64), D (za64, s64), D (za64, u64), \
+  TYPES_reinterpret1 (D, za128)
+
+/* _za32 x { _s32 _u32 }.  */
+#define TYPES_s_za_integer(S, D) \
+  D (za32, s32), D (za32, u32)
+
+/* _za64 x { _s64 _u64 }.  */
+#define TYPES_d_za_integer(S, D) \
+  D (za64, s64), D (za64, u64)
+
+/* _za32 x { _s8 _u8 _bf16 _f16 _f32 }.  */
+#define TYPES_mop_base(S, D) \
+  D (za32, s8), D (za32, u8), D (za32, bf16), D (za32, f16), D (za32, f32)
+
+/* _za32_s8.  */
+#define TYPES_mop_base_signed(S, D) \
+  D (za32, s8)
+
+/* _za32_u8.  */
+#define TYPES_mop_base_unsigned(S, D) \
+  D (za32, u8)
+
+/* _za64 x { _s16 _u16 }.  */
+#define TYPES_mop_i16i64(S, D) \
+  D (za64, s16), D (za64, u16)
+
+/* _za64_s16.  */
+#define TYPES_mop_i16i64_signed(S, D) \
+  D (za64, s16)
+
+/* _za64_u16.  */
+#define TYPES_mop_i16i64_unsigned(S, D) \
+  D (za64, u16)
+
+/* _za64_f64.  */
+#define TYPES_mop_f64f64(S, D) \
+  D (za64, f64)
+
+/* _za.  */
+#define TYPES_za(S, D) \
+  S (za)
+
 /* Describe a pair of type suffixes in which only the first is used.  */
 #define DEF_VECTOR_TYPE(X) { TYPE_SUFFIX_ ## X, NUM_TYPE_SUFFIXES }
 
@@ -482,6 +567,19 @@ DEF_SVE_TYPES_ARRAY (cvt_narrow);
 DEF_SVE_TYPES_ARRAY (inc_dec_n);
 DEF_SVE_TYPES_ARRAY (reinterpret);
 DEF_SVE_TYPES_ARRAY (while);
+DEF_SVE_TYPES_ARRAY (all_za);
+DEF_SVE_TYPES_ARRAY (d_za);
+DEF_SVE_TYPES_ARRAY (all_za_data);
+DEF_SVE_TYPES_ARRAY (s_za_integer);
+DEF_SVE_TYPES_ARRAY (d_za_integer);
+DEF_SVE_TYPES_ARRAY (mop_base);
+DEF_SVE_TYPES_ARRAY (mop_base_signed);
+DEF_SVE_TYPES_ARRAY (mop_base_unsigned);
+DEF_SVE_TYPES_ARRAY (mop_i16i64);
+DEF_SVE_TYPES_ARRAY (mop_i16i64_signed);
+DEF_SVE_TYPES_ARRAY (mop_i16i64_unsigned);
+DEF_SVE_TYPES_ARRAY (mop_f64f64);
+DEF_SVE_TYPES_ARRAY (za);
 
 /* Used by functions that have no governing predicate.  */
 static const predication_index preds_none[] = { PRED_none, NUM_PREDS };
@@ -490,6 +588,9 @@ static const predication_index preds_none[] = { PRED_none, NUM_PREDS };
    explicit suffix.  */
 static const predication_index preds_implicit[] = { PRED_implicit, NUM_PREDS };
 
+/* Used by functions that only support "_m" predication.  */
+static const predication_index preds_m[] = { PRED_m, NUM_PREDS };
+
 /* Used by functions that allow merging and "don't care" predication,
    but are not suitable for predicated MOVPRFX.  */
 static const predication_index preds_mx[] = {
@@ -521,6 +622,9 @@ static const predication_index preds_z_or_none[] = {
 /* Used by (mostly predicate) functions that only support "_z" predication.  */
 static const predication_index preds_z[] = { PRED_z, NUM_PREDS };
 
+/* Used by SME instructions that always merge into ZA.  */
+static const predication_index preds_za_m[] = { PRED_za_m, NUM_PREDS };
+
 /* A list of all SVE ACLE functions.  */
 static CONSTEXPR const function_group_info function_groups[] = {
 #define DEF_SVE_FUNCTION(NAME, SHAPE, TYPES, PREDS) \
@@ -530,8 +634,8 @@ static CONSTEXPR const function_group_info function_groups[] = {
 };
 
 /* The scalar type associated with each vector type.  */
-extern GTY(()) tree scalar_types[NUM_VECTOR_TYPES];
-tree scalar_types[NUM_VECTOR_TYPES];
+extern GTY(()) tree scalar_types[NUM_VECTOR_TYPES + 1];
+tree scalar_types[NUM_VECTOR_TYPES + 1];
 
 /* The single-predicate and single-vector types, with their built-in
    "__SV..._t" name.  Allow an index of NUM_VECTOR_TYPES, which always
@@ -639,7 +743,7 @@ find_type_suffix_for_scalar_type (const_tree type)
   /* A linear search should be OK here, since the code isn't hot and
      the number of types is only small.  */
   for (unsigned int suffix_i = 0; suffix_i < NUM_TYPE_SUFFIXES; ++suffix_i)
-    if (!type_suffixes[suffix_i].bool_p)
+    if (type_suffixes[suffix_i].vector_p)
       {
 	vector_type_index vector_i = type_suffixes[suffix_i].vector_type;
 	if (matches_type_p (scalar_types[vector_i], type))
@@ -707,6 +811,20 @@ check_required_extensions (location_t location, tree fndecl,
       return false;
     }
 
+  if (missing_extensions & AARCH64_FL_SM_ON)
+    {
+      error_at (location, "ACLE function %qD can only be called when"
+		" SME streaming mode is enabled", fndecl);
+      return false;
+    }
+
+  if (missing_extensions & AARCH64_FL_ZA_ON)
+    {
+      error_at (location, "ACLE function %qD can only be called from"
+		" a function that has ZA state", fndecl);
+      return false;
+    }
+
   static const struct {
     aarch64_feature_flags flag;
     const char *name;
@@ -742,9 +860,13 @@ report_out_of_range (location_t location, tree fndecl, unsigned int argno,
 		     HOST_WIDE_INT actual, HOST_WIDE_INT min,
 		     HOST_WIDE_INT max)
 {
-  error_at (location, "passing %wd to argument %d of %qE, which expects"
-	    " a value in the range [%wd, %wd]", actual, argno + 1, fndecl,
-	    min, max);
+  if (min == max)
+    error_at (location, "passing %wd to argument %d of %qE, which expects"
+	      " the value %wd", actual, argno + 1, fndecl, min);
+  else
+    error_at (location, "passing %wd to argument %d of %qE, which expects"
+	      " a value in the range [%wd, %wd]", actual, argno + 1, fndecl,
+	      min, max);
 }
 
 /* Report that LOCATION has a call to FNDECL in which argument ARGNO has
@@ -830,7 +952,7 @@ function_instance::reads_global_state_p () const
     return true;
 
   /* Handle direct reads of global state.  */
-  return flags & (CP_READ_MEMORY | CP_READ_FFR);
+  return flags & (CP_READ_MEMORY | CP_READ_FFR | CP_READ_ZA);
 }
 
 /* Return true if calls to the function could modify some form of
@@ -851,7 +973,7 @@ function_instance::modifies_global_state_p () const
     return true;
 
   /* Handle direct modifications of global state.  */
-  return flags & (CP_WRITE_MEMORY | CP_WRITE_FFR);
+  return flags & (CP_WRITE_MEMORY | CP_WRITE_FFR | CP_WRITE_ZA);
 }
 
 /* Return true if calls to the function could raise a signal.  */
@@ -871,6 +993,20 @@ function_instance::could_trap_p () const
   return false;
 }
 
+/* Return true if the function shares ZA state with its caller.  */
+bool
+function_instance::shared_za_p () const
+{
+  return (call_properties () & (CP_READ_ZA | CP_WRITE_ZA)) != 0;
+}
+
+/* Return true if the function preserves ZA.  */
+bool
+function_instance::preserves_za_p () const
+{
+  return (call_properties () & CP_WRITE_ZA) == 0;
+}
+
 inline hashval_t
 registered_function_hasher::hash (value_type value)
 {
@@ -883,8 +1019,8 @@ registered_function_hasher::equal (value_type value, const compare_type &key)
   return value->instance == key;
 }
 
-sve_switcher::sve_switcher ()
-  : aarch64_simd_switcher (AARCH64_FL_F16 | AARCH64_FL_SVE)
+sve_switcher::sve_switcher (aarch64_feature_flags flags)
+  : aarch64_simd_switcher (AARCH64_FL_F16 | AARCH64_FL_SVE | flags)
 {
   /* Changing the ISA flags and have_regs_of_mode should be enough here.
      We shouldn't need to pay the compile-time cost of a full target
@@ -940,6 +1076,10 @@ char *
 function_builder::get_name (const function_instance &instance,
 			    bool overloaded_p)
 {
+  /* __arm_* functions are listed as arm_*, so that the associated GCC
+     code is not in the implementation namespace.  */
+  if (strncmp (instance.base_name, "arm_", 4) == 0)
+    append_name ("__");
   append_name (instance.base_name);
   if (overloaded_p)
     switch (instance.displacement_units ())
@@ -981,6 +1121,11 @@ function_builder::get_attributes (const function_instance &instance)
 {
   tree attrs = NULL_TREE;
 
+  if (instance.shared_za_p ())
+    attrs = add_attribute ("arm_shared_za", attrs);
+  if (instance.preserves_za_p ())
+    attrs = add_attribute ("arm_preserves_za", attrs);
+
   if (!instance.modifies_global_state_p ())
     {
       if (instance.reads_global_state_p ())
@@ -1236,12 +1381,24 @@ function_resolver::lookup_form (mode_suffix_index mode,
 
 /* Resolve the function to one with the mode suffix given by MODE and the
    type suffixes given by TYPE0 and TYPE1.  Return its function decl on
-   success, otherwise report an error and return error_mark_node.  */
+   success, otherwise report an error and return error_mark_node.
+
+   As a convenience, resolve_to (MODE, TYPE0) can be used for functions
+   whose first type suffix is explicit, with TYPE0 then describing the
+   second type suffix rather than the first.  */
 tree
 function_resolver::resolve_to (mode_suffix_index mode,
 			       type_suffix_index type0,
 			       type_suffix_index type1)
 {
+  /* Handle convert-like functions in which the first type suffix is
+     explicit.  */
+  if (type_suffix_ids[0] != NUM_TYPE_SUFFIXES && type0 != type_suffix_ids[0])
+    {
+      type1 = type0;
+      type0 = type_suffix_ids[0];
+    }
+
   tree res = lookup_form (mode, type0, type1);
   if (!res)
     {
@@ -2167,6 +2324,7 @@ bool
 function_resolver::check_gp_argument (unsigned int nops,
 				      unsigned int &i, unsigned int &nargs)
 {
+  gcc_assert (pred != PRED_za_m);
   i = 0;
   if (pred != PRED_none)
     {
@@ -2367,9 +2525,7 @@ function_checker::function_checker (location_t location,
 				    unsigned int nargs, tree *args)
   : function_call_info (location, instance, fndecl),
     m_fntype (fntype), m_nargs (nargs), m_args (args),
-    /* We don't have to worry about unary _m operations here, since they
-       never have arguments that need checking.  */
-    m_base_arg (pred != PRED_none ? 1 : 0)
+    m_base_arg (pred != PRED_none && pred != PRED_za_m ? 1 : 0)
 {
 }
 
@@ -2762,21 +2918,51 @@ function_expander::convert_to_pmode (rtx x)
 }
 
 /* Return the base address for a contiguous load or store function.
-   MEM_MODE is the mode of the addressed memory.  */
+   MEM_MODE is the mode of the addressed memory, BASE_ARGNO is
+   the index of the base argument, and VNUM_ARGNO is the index of
+   the vnum offset argument (if any).  VL_ISA_MODE is AARCH64_FL_SM_ON
+   if the vnum argument is a factor of the SME vector length, 0 if it
+   is a factor of the current prevailing vector length.  */
 rtx
-function_expander::get_contiguous_base (machine_mode mem_mode)
+function_expander::get_contiguous_base (machine_mode mem_mode,
+					unsigned int base_argno,
+					unsigned int vnum_argno,
+					aarch64_feature_flags vl_isa_mode)
 {
-  rtx base = convert_to_pmode (args[1]);
+  rtx base = convert_to_pmode (args[base_argno]);
   if (mode_suffix_id == MODE_vnum)
     {
-      /* Use the size of the memory mode for extending loads and truncating
-	 stores.  Use the size of a full vector for non-extending loads
-	 and non-truncating stores (including svld[234] and svst[234]).  */
-      poly_int64 size = ordered_min (GET_MODE_SIZE (mem_mode),
-				     BYTES_PER_SVE_VECTOR);
-      rtx offset = gen_int_mode (size, Pmode);
-      offset = simplify_gen_binary (MULT, Pmode, args[2], offset);
-      base = simplify_gen_binary (PLUS, Pmode, base, offset);
+      rtx vnum = args[vnum_argno];
+      if (vnum != const0_rtx)
+	{
+	  /* Use the size of the memory mode for extending loads and truncating
+	     stores.  Use the size of a full vector for non-extending loads
+	     and non-truncating stores (including svld[234] and svst[234]).  */
+	  poly_int64 size = ordered_min (GET_MODE_SIZE (mem_mode),
+					 BYTES_PER_SVE_VECTOR);
+	  rtx offset;
+	  if ((vl_isa_mode & AARCH64_FL_SM_ON)
+	      && !TARGET_STREAMING
+	      && !size.is_constant ())
+	    {
+	      gcc_assert (known_eq (size, BYTES_PER_SVE_VECTOR));
+	      if (CONST_INT_P (vnum) && IN_RANGE (INTVAL (vnum), -32, 31))
+		offset = aarch64_sme_vq_immediate (Pmode, INTVAL (vnum) * 16,
+						   AARCH64_ISA_MODE);
+	      else
+		{
+		  offset = aarch64_sme_vq_immediate (Pmode, 16,
+						     AARCH64_ISA_MODE);
+		  offset = simplify_gen_binary (MULT, Pmode, vnum, offset);
+		}
+	    }
+	  else
+	    {
+	      offset = gen_int_mode (size, Pmode);
+	      offset = simplify_gen_binary (MULT, Pmode, vnum, offset);
+	    }
+	  base = simplify_gen_binary (PLUS, Pmode, base, offset);
+	}
     }
   return base;
 }
@@ -2883,7 +3069,7 @@ function_expander::add_input_operand (insn_code icode, rtx x)
 
 /* Add an integer operand with value X to the instruction.  */
 void
-function_expander::add_integer_operand (HOST_WIDE_INT x)
+function_expander::add_integer_operand (poly_int64 x)
 {
   m_ops.safe_grow (m_ops.length () + 1, true);
   create_integer_operand (&m_ops.last (), x);
@@ -3428,7 +3614,10 @@ init_builtins ()
   sve_switcher sve;
   register_builtin_types ();
   if (in_lto_p)
-    handle_arm_sve_h ();
+    {
+      handle_arm_sve_h ();
+      handle_arm_sme_h ();
+    }
 }
 
 /* Register vector type TYPE under its arm_sve.h name.  */
@@ -3578,7 +3767,8 @@ handle_arm_sve_h ()
   function_table = new hash_table<registered_function_hasher> (1023);
   function_builder builder;
   for (unsigned int i = 0; i < ARRAY_SIZE (function_groups); ++i)
-    builder.register_function_group (function_groups[i]);
+    if (!(function_groups[i].required_extensions & AARCH64_FL_SME))
+      builder.register_function_group (function_groups[i]);
 }
 
 /* Return the function decl with SVE function subcode CODE, or error_mark_node
@@ -3591,6 +3781,33 @@ builtin_decl (unsigned int code, bool)
   return (*registered_functions)[code]->decl;
 }
 
+/* Implement #pragma GCC aarch64 "arm_sme.h".  */
+void
+handle_arm_sme_h ()
+{
+  if (!function_table)
+    {
+      error ("%qs defined without first defining %qs",
+	     "arm_sme.h", "arm_sve.h");
+      return;
+    }
+
+  static bool initialized_p;
+  if (initialized_p)
+    {
+      error ("duplicate definition of %qs", "arm_sme.h");
+      return;
+    }
+  initialized_p = true;
+
+  sme_switcher sme;
+
+  function_builder builder;
+  for (unsigned int i = 0; i < ARRAY_SIZE (function_groups); ++i)
+    if (function_groups[i].required_extensions & AARCH64_FL_SME)
+      builder.register_function_group (function_groups[i]);
+}
+
 /* If we're implementing manual overloading, check whether the SVE
    function with subcode CODE is overloaded, and if so attempt to
    determine the corresponding non-overloaded function.  The call
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.def b/gcc/config/aarch64/aarch64-sve-builtins.def
index 6e4dcdbc97e..39ef94dc936 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.def
+++ b/gcc/config/aarch64/aarch64-sve-builtins.def
@@ -29,6 +29,10 @@
 #define DEF_SVE_TYPE_SUFFIX(A, B, C, D, E)
 #endif
 
+#ifndef DEF_SME_ZA_SUFFIX
+#define DEF_SME_ZA_SUFFIX(A, B, C)
+#endif
+
 #ifndef DEF_SVE_FUNCTION
 #define DEF_SVE_FUNCTION(A, B, C, D)
 #endif
@@ -95,10 +99,21 @@ DEF_SVE_TYPE_SUFFIX (u16, svuint16_t, unsigned, 16, VNx8HImode)
 DEF_SVE_TYPE_SUFFIX (u32, svuint32_t, unsigned, 32, VNx4SImode)
 DEF_SVE_TYPE_SUFFIX (u64, svuint64_t, unsigned, 64, VNx2DImode)
 
+/* Arbitrarily associate _za with bytes (by analogy with char's role in C).  */
+DEF_SME_ZA_SUFFIX (za, 8, VNx16QImode)
+
+DEF_SME_ZA_SUFFIX (za8, 8, VNx16QImode)
+DEF_SME_ZA_SUFFIX (za16, 16, VNx8HImode)
+DEF_SME_ZA_SUFFIX (za32, 32, VNx4SImode)
+DEF_SME_ZA_SUFFIX (za64, 64, VNx2DImode)
+DEF_SME_ZA_SUFFIX (za128, 128, VNx1TImode)
+
 #include "aarch64-sve-builtins-base.def"
 #include "aarch64-sve-builtins-sve2.def"
+#include "aarch64-sve-builtins-sme.def"
 
 #undef DEF_SVE_FUNCTION
+#undef DEF_SME_ZA_SUFFIX
 #undef DEF_SVE_TYPE_SUFFIX
 #undef DEF_SVE_TYPE
 #undef DEF_SVE_MODE
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
index 479b248bef1..f5d66987be3 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins.h
@@ -97,6 +97,8 @@ const unsigned int CP_PREFETCH_MEMORY = 1U << 3;
 const unsigned int CP_WRITE_MEMORY = 1U << 4;
 const unsigned int CP_READ_FFR = 1U << 5;
 const unsigned int CP_WRITE_FFR = 1U << 6;
+const unsigned int CP_READ_ZA = 1U << 7;
+const unsigned int CP_WRITE_ZA = 1U << 8;
 
 /* Enumerates the SVE predicate and (data) vector types, together called
    "vector types" for brevity.  */
@@ -142,6 +144,10 @@ enum predication_index
   /* Zero predication: set inactive lanes of the vector result to zero.  */
   PRED_z,
 
+  /* Merging predication for SME's ZA: merge into slices of the array
+     instead of overwriting the whole slices.  */
+  PRED_za_m,
+
   NUM_PREDS
 };
 
@@ -176,6 +182,8 @@ enum type_suffix_index
 {
 #define DEF_SVE_TYPE_SUFFIX(NAME, ACLE_TYPE, CLASS, BITS, MODE) \
   TYPE_SUFFIX_ ## NAME,
+#define DEF_SME_ZA_SUFFIX(NAME, BITS, MODE) \
+  TYPE_SUFFIX_ ## NAME,
 #include "aarch64-sve-builtins.def"
   NUM_TYPE_SUFFIXES
 };
@@ -229,9 +237,13 @@ struct type_suffix_info
   unsigned int unsigned_p : 1;
   /* True if the suffix is for a floating-point type.  */
   unsigned int float_p : 1;
+  /* True if the suffix is for a vector type (integer or float).  */
+  unsigned int vector_p : 1;
   /* True if the suffix is for a boolean type.  */
   unsigned int bool_p : 1;
-  unsigned int spare : 12;
+  /* True if the suffix is for SME's ZA.  */
+  unsigned int za_p : 1;
+  unsigned int spare : 10;
 
   /* The associated vector or predicate mode.  */
   machine_mode vector_mode : 16;
@@ -283,6 +295,8 @@ public:
   bool reads_global_state_p () const;
   bool modifies_global_state_p () const;
   bool could_trap_p () const;
+  bool shared_za_p () const;
+  bool preserves_za_p () const;
 
   unsigned int vectors_per_tuple () const;
   tree memory_scalar_type () const;
@@ -293,11 +307,13 @@ public:
   tree displacement_vector_type () const;
   units_index displacement_units () const;
 
+  unsigned int num_za_tiles () const;
+
   const type_suffix_info &type_suffix (unsigned int) const;
   tree scalar_type (unsigned int) const;
   tree vector_type (unsigned int) const;
   tree tuple_type (unsigned int) const;
-  unsigned int elements_per_vq (unsigned int i) const;
+  unsigned int elements_per_vq (unsigned int) const;
   machine_mode vector_mode (unsigned int) const;
   machine_mode gp_mode (unsigned int) const;
 
@@ -532,7 +548,8 @@ public:
   bool overlaps_input_p (rtx);
 
   rtx convert_to_pmode (rtx);
-  rtx get_contiguous_base (machine_mode);
+  rtx get_contiguous_base (machine_mode, unsigned int = 1, unsigned int = 2,
+			   aarch64_feature_flags = 0);
   rtx get_fallback_value (machine_mode, unsigned int,
 			  unsigned int, unsigned int &);
   rtx get_reg_target ();
@@ -540,7 +557,7 @@ public:
 
   void add_output_operand (insn_code);
   void add_input_operand (insn_code, rtx);
-  void add_integer_operand (HOST_WIDE_INT);
+  void add_integer_operand (poly_int64);
   void add_mem_operand (machine_mode, rtx);
   void add_address_operand (rtx);
   void add_fixed_operand (rtx);
@@ -660,7 +677,7 @@ public:
 class sve_switcher : public aarch64_simd_switcher
 {
 public:
-  sve_switcher ();
+  sve_switcher (aarch64_feature_flags = 0);
   ~sve_switcher ();
 
 private:
@@ -668,10 +685,17 @@ private:
   bool m_old_have_regs_of_mode[MAX_MACHINE_MODE];
 };
 
+/* Extends sve_switcher enough for defining arm_sme.h.  */
+class sme_switcher : public sve_switcher
+{
+public:
+  sme_switcher () : sve_switcher (AARCH64_FL_SME) {}
+};
+
 extern const type_suffix_info type_suffixes[NUM_TYPE_SUFFIXES + 1];
 extern const mode_suffix_info mode_suffixes[MODE_none + 1];
 
-extern tree scalar_types[NUM_VECTOR_TYPES];
+extern tree scalar_types[NUM_VECTOR_TYPES + 1];
 extern tree acle_vector_types[MAX_TUPLE_SIZE][NUM_VECTOR_TYPES + 1];
 extern tree acle_svpattern;
 extern tree acle_svprfop;
@@ -801,6 +825,16 @@ function_instance::displacement_vector_type () const
   return acle_vector_types[0][mode_suffix ().displacement_vector_type];
 }
 
+/* Return the number of ZA tiles associated with the _za<N> suffix
+   (which is always the first type suffix).  */
+inline unsigned int
+function_instance::num_za_tiles () const
+{
+  auto &suffix = type_suffix (0);
+  gcc_checking_assert (suffix.za_p);
+  return suffix.element_bytes;
+}
+
 /* If the function takes a vector or scalar displacement, return the units
    in which the displacement is measured, otherwise return UNITS_none.  */
 inline units_index
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index d29cfefee6b..966d13abe4c 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -5643,15 +5643,26 @@ aarch64_output_sve_scalar_inc_dec (rtx offset)
 }
 
 /* Return true if a single RDVL instruction can multiply FACTOR by the
-   number of 128-bit quadwords in an SVE vector.  */
+   number of 128-bit quadwords in an SVE vector.  This is also the
+   range of ADDVL.  */
 
 static bool
-aarch64_sve_rdvl_factor_p (HOST_WIDE_INT factor)
+aarch64_sve_rdvl_addvl_factor_p (HOST_WIDE_INT factor)
 {
   return (multiple_p (factor, 16)
 	  && IN_RANGE (factor, -32 * 16, 31 * 16));
 }
 
+/* Return true if ADDPL can be used to add FACTOR multiplied by the number
+   of quadwords in an SVE vector.  */
+
+static bool
+aarch64_sve_addpl_factor_p (HOST_WIDE_INT factor)
+{
+  return (multiple_p (factor, 2)
+	  && IN_RANGE (factor, -32 * 2, 31 * 2));
+}
+
 /* Return true if we can move VALUE into a register using a single
    RDVL instruction.  */
 
@@ -5659,7 +5670,7 @@ static bool
 aarch64_sve_rdvl_immediate_p (poly_int64 value)
 {
   HOST_WIDE_INT factor = value.coeffs[0];
-  return value.coeffs[1] == factor && aarch64_sve_rdvl_factor_p (factor);
+  return value.coeffs[1] == factor && aarch64_sve_rdvl_addvl_factor_p (factor);
 }
 
 /* Likewise for rtx X.  */
@@ -5695,10 +5706,8 @@ aarch64_sve_addvl_addpl_immediate_p (poly_int64 value)
   HOST_WIDE_INT factor = value.coeffs[0];
   if (factor == 0 || value.coeffs[1] != factor)
     return false;
-  /* FACTOR counts VG / 2, so a value of 2 is one predicate width
-     and a value of 16 is one vector width.  */
-  return (((factor & 15) == 0 && IN_RANGE (factor, -32 * 16, 31 * 16))
-	  || ((factor & 1) == 0 && IN_RANGE (factor, -32 * 2, 31 * 2)));
+  return (aarch64_sve_rdvl_addvl_factor_p (factor)
+	  || aarch64_sve_addpl_factor_p (factor));
 }
 
 /* Likewise for rtx X.  */
@@ -5798,11 +5807,11 @@ aarch64_output_sve_vector_inc_dec (const char *operands, rtx x)
    number of 128-bit quadwords in an SME vector.  ISA_MODE is the
    ISA mode in which the calculation is being performed.  */
 
-static rtx
+rtx
 aarch64_sme_vq_immediate (machine_mode mode, HOST_WIDE_INT factor,
 			  aarch64_feature_flags isa_mode)
 {
-  gcc_assert (aarch64_sve_rdvl_factor_p (factor));
+  gcc_assert (aarch64_sve_rdvl_addvl_factor_p (factor));
   if (isa_mode & AARCH64_FL_SM_ON)
     /* We're in streaming mode, so we can use normal poly-int values.  */
     return gen_int_mode ({ factor, factor }, mode);
@@ -5845,7 +5854,7 @@ aarch64_rdsvl_immediate_p (const_rtx x)
 {
   HOST_WIDE_INT factor;
   return (aarch64_sme_vq_unspec_p (x, &factor)
-	  && aarch64_sve_rdvl_factor_p (factor));
+	  && aarch64_sve_rdvl_addvl_factor_p (factor));
 }
 
 /* Return the asm string for an RDSVL instruction that calculates X,
@@ -5862,6 +5871,38 @@ aarch64_output_rdsvl (const_rtx x)
   return buffer;
 }
 
+/* Return true if X is a constant that can be added using ADDSVL or ADDSPL.  */
+
+bool
+aarch64_addsvl_addspl_immediate_p (const_rtx x)
+{
+  HOST_WIDE_INT factor;
+  return (aarch64_sme_vq_unspec_p (x, &factor)
+	  && (aarch64_sve_rdvl_addvl_factor_p (factor)
+	      || aarch64_sve_addpl_factor_p (factor)));
+}
+
+/* X is a constant that satisfies aarch64_addsvl_addspl_immediate_p.
+   Return the asm string for the associated instruction.  */
+
+char *
+aarch64_output_addsvl_addspl (rtx x)
+{
+  static char buffer[sizeof ("addspl\t%x0, %x1, #-") + 3 * sizeof (int)];
+  HOST_WIDE_INT factor;
+  if (!aarch64_sme_vq_unspec_p (x, &factor))
+    gcc_unreachable ();
+  if (aarch64_sve_rdvl_addvl_factor_p (factor))
+    snprintf (buffer, sizeof (buffer), "addsvl\t%%x0, %%x1, #%d",
+	      (int) factor / 16);
+  else if (aarch64_sve_addpl_factor_p (factor))
+    snprintf (buffer, sizeof (buffer), "addspl\t%%x0, %%x1, #%d",
+	      (int) factor / 2);
+  else
+    gcc_unreachable ();
+  return buffer;
+}
+
 /* Multipliers for repeating bitmasks of width 32, 16, 8, 4, and 2.  */
 
 static const unsigned HOST_WIDE_INT bitmask_imm_mul[] =
@@ -6471,7 +6512,7 @@ aarch64_add_offset (scalar_int_mode mode, rtx dest, rtx src,
 	      shift = 0;
 	    }
 	  /* Try to use an unshifted RDVL.  */
-	  else if (aarch64_sve_rdvl_factor_p (factor))
+	  else if (aarch64_sve_rdvl_addvl_factor_p (factor))
 	    {
 	      val = gen_int_mode (poly_int64 (factor, factor), mode);
 	      shift = 0;
@@ -11354,6 +11395,9 @@ aarch64_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
   if (GET_CODE (x) == HIGH)
     return true;
 
+  if (aarch64_rdsvl_immediate_p (x))
+    return true;
+
   /* There's no way to calculate VL-based values using relocations.  */
   subrtx_iterator::array_type array;
   FOR_EACH_SUBRTX (iter, array, x, ALL)
@@ -11569,7 +11613,7 @@ aarch64_classify_index (struct aarch64_address_info *info, rtx x,
       && contains_reg_of_mode[GENERAL_REGS][GET_MODE (SUBREG_REG (index))])
     index = SUBREG_REG (index);
 
-  if (aarch64_sve_data_mode_p (mode))
+  if (aarch64_sve_data_mode_p (mode) || mode == VNx1TImode)
     {
       if (type != ADDRESS_REG_REG
 	  || (1 << shift) != GET_MODE_UNIT_SIZE (mode))
@@ -11672,7 +11716,8 @@ aarch64_classify_address (struct aarch64_address_info *info,
 			    && ((vec_flags == 0
 				 && known_lt (GET_MODE_SIZE (mode), 16))
 				|| vec_flags == VEC_ADVSIMD
-				|| vec_flags & VEC_SVE_DATA));
+				|| vec_flags & VEC_SVE_DATA
+				|| mode == VNx1TImode));
 
   /* For SVE, only accept [Rn], [Rn, #offset, MUL VL] and [Rn, Rm, LSL #shift].
      The latter is not valid for SVE predicates, and that's rejected through
@@ -11791,7 +11836,7 @@ aarch64_classify_address (struct aarch64_address_info *info,
 	  /* Make "m" use the LD1 offset range for SVE data modes, so
 	     that pre-RTL optimizers like ivopts will work to that
 	     instead of the wider LDR/STR range.  */
-	  if (vec_flags == VEC_SVE_DATA)
+	  if (vec_flags == VEC_SVE_DATA || mode == VNx1TImode)
 	    return (type == ADDR_QUERY_M
 		    ? offset_4bit_signed_scaled_p (mode, offset)
 		    : offset_9bit_signed_scaled_p (mode, offset));
@@ -14090,6 +14135,51 @@ aarch64_output_casesi (rtx *operands)
   return "";
 }
 
+/* Return the asm string for an SME ZERO instruction whose 8-bit mask
+   operand is MASK.  */
+const char *
+aarch64_output_sme_zero (rtx mask)
+{
+  auto mask_val = UINTVAL (mask);
+  if (mask_val == 0)
+    return "zero\t{}";
+
+  if (mask_val == 0xff)
+    return "zero\t{ za }";
+
+  static constexpr std::pair<unsigned int, char> tiles[] = {
+    { 0xff, 'b' },
+    { 0x55, 'h' },
+    { 0x11, 's' },
+    { 0x01, 'd' }
+  };
+  /* The last entry in the list has the form "za7.d }", but that's the
+     same length as "za7.d, ".  */
+  static char buffer[sizeof ("zero\t{ ") + sizeof ("za7.d, ") * 8 + 1];
+  unsigned int i = 0;
+  i += snprintf (buffer + i, sizeof (buffer) - i, "zero\t");
+  const char *prefix = "{ ";
+  for (auto &tile : tiles)
+    {
+      auto tile_mask = tile.first;
+      unsigned int tile_index = 0;
+      while (tile_mask < 0x100)
+	{
+	  if ((mask_val & tile_mask) == tile_mask)
+	    {
+	      i += snprintf (buffer + i, sizeof (buffer) - i, "%sza%d.%c",
+			     prefix, tile_index, tile.second);
+	      prefix = ", ";
+	      mask_val &= ~tile_mask;
+	    }
+	  tile_mask <<= 1;
+	  tile_index += 1;
+	}
+    }
+  gcc_assert (mask_val == 0 && i + 3 <= sizeof (buffer));
+  snprintf (buffer + i, sizeof (buffer) - i, " }");
+  return buffer;
+}
 
 /* Return size in bits of an arithmetic operand which is shifted/scaled and
    masked such that it is suitable for a UXTB, UXTH, or UXTW extend
@@ -23015,6 +23105,31 @@ aarch64_sve_struct_memory_operand_p (rtx op)
 	  && offset_4bit_signed_scaled_p (SVE_BYTE_MODE, last));
 }
 
+/* Return true if OFFSET is a constant integer and if VNUM is
+   OFFSET * the number of bytes in an SVE vector.  This is the requirement
+   that exists in SME LDR and STR instructions, where the VL offset must
+   equal the ZA slice offset.  */
+bool
+aarch64_sme_ldr_vnum_offset_p (rtx offset, rtx vnum)
+{
+  if (!CONST_INT_P (offset) || !IN_RANGE (INTVAL (offset), 0, 15))
+    return false;
+
+  if (TARGET_STREAMING)
+    {
+      poly_int64 const_vnum;
+      return (poly_int_rtx_p (vnum, &const_vnum)
+	      && known_eq (const_vnum,
+			   INTVAL (offset) * BYTES_PER_SVE_VECTOR));
+    }
+  else
+    {
+      HOST_WIDE_INT factor;
+      return (aarch64_sme_vq_unspec_p (vnum, &factor)
+	      && factor == INTVAL (offset) * 16);
+    }
+}
+
 /* Emit a register copy from operand to operand, taking care not to
    early-clobber source registers in the process.
 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index bfa28726221..bc86d7220f1 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -207,6 +207,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
 /* Macros to test ISA flags.  */
 
 #define AARCH64_ISA_SM_OFF         (aarch64_isa_flags & AARCH64_FL_SM_OFF)
+#define AARCH64_ISA_SM_ON          (aarch64_isa_flags & AARCH64_FL_SM_ON)
 #define AARCH64_ISA_ZA_ON          (aarch64_isa_flags & AARCH64_FL_ZA_ON)
 #define AARCH64_ISA_MODE           (aarch64_isa_flags & AARCH64_FL_ISA_MODES)
 #define AARCH64_ISA_CRC            (aarch64_isa_flags & AARCH64_FL_CRC)
@@ -224,6 +225,8 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
 #define AARCH64_ISA_SVE2_SHA3	   (aarch64_isa_flags & AARCH64_FL_SVE2_SHA3)
 #define AARCH64_ISA_SVE2_SM4	   (aarch64_isa_flags & AARCH64_FL_SVE2_SM4)
 #define AARCH64_ISA_SME		   (aarch64_isa_flags & AARCH64_FL_SME)
+#define AARCH64_ISA_SME_I16I64	   (aarch64_isa_flags & AARCH64_FL_SME_I16I64)
+#define AARCH64_ISA_SME_F64F64	   (aarch64_isa_flags & AARCH64_FL_SME_F64F64)
 #define AARCH64_ISA_V8_3A	   (aarch64_isa_flags & AARCH64_FL_V8_3A)
 #define AARCH64_ISA_DOTPROD	   (aarch64_isa_flags & AARCH64_FL_DOTPROD)
 #define AARCH64_ISA_AES	           (aarch64_isa_flags & AARCH64_FL_AES)
@@ -256,6 +259,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
 /* The current function is a normal non-streaming function.  */
 #define TARGET_NON_STREAMING (AARCH64_ISA_SM_OFF)
 
+/* The current function has a streaming body.  */
+#define TARGET_STREAMING (AARCH64_ISA_SM_ON)
+
 /* The current function has a streaming-compatible body.  */
 #define TARGET_STREAMING_COMPATIBLE \
   ((aarch64_isa_flags & AARCH64_FL_SM_STATE) == 0)
@@ -316,6 +322,15 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
    imply anything about the state of PSTATE.SM.  */
 #define TARGET_SME (AARCH64_ISA_SME)
 
+/* Streaming-mode SME instructions.  */
+#define TARGET_STREAMING_SME (TARGET_STREAMING && TARGET_SME)
+
+/* The FEAT_SME_I16I64 extension to SME, enabled through +sme-i16i64.  */
+#define TARGET_SME_I16I64 (AARCH64_ISA_SME_I16I64)
+
+/* The FEAT_SME_F64F64 extension to SME, enabled through +sme-f64f64.  */
+#define TARGET_SME_F64F64 (AARCH64_ISA_SME_F64F64)
+
 /* ARMv8.3-A features.  */
 #define TARGET_ARMV8_3	(AARCH64_ISA_V8_3A)
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 3ebe8690c31..de6bf5e6c4d 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2097,10 +2097,10 @@ (define_expand "add<mode>3"
 
 (define_insn "*add<mode>3_aarch64"
   [(set
-    (match_operand:GPI 0 "register_operand" "=rk,rk,w,rk,r,r,rk")
+    (match_operand:GPI 0 "register_operand" "=rk,rk,w,rk,r,r,rk,rk")
     (plus:GPI
-     (match_operand:GPI 1 "register_operand" "%rk,rk,w,rk,rk,0,rk")
-     (match_operand:GPI 2 "aarch64_pluslong_operand" "I,r,w,J,Uaa,Uai,Uav")))]
+     (match_operand:GPI 1 "register_operand" "%rk,rk,w,rk,rk,0,rk,rk")
+     (match_operand:GPI 2 "aarch64_pluslong_operand" "I,r,w,J,Uaa,Uai,Uav,UaV")))]
   ""
   "@
   add\\t%<w>0, %<w>1, %2
@@ -2109,10 +2109,12 @@ (define_insn "*add<mode>3_aarch64"
   sub\\t%<w>0, %<w>1, #%n2
   #
   * return aarch64_output_sve_scalar_inc_dec (operands[2]);
-  * return aarch64_output_sve_addvl_addpl (operands[2]);"
+  * return aarch64_output_sve_addvl_addpl (operands[2]);
+  * return aarch64_output_addsvl_addspl (operands[2]);"
   ;; The "alu_imm" types for INC/DEC and ADDVL/ADDPL are just placeholders.
-  [(set_attr "type" "alu_imm,alu_sreg,neon_add,alu_imm,multiple,alu_imm,alu_imm")
-   (set_attr "arch" "*,*,simd,*,*,sve,sve")]
+  [(set_attr "type" "alu_imm,alu_sreg,neon_add,alu_imm,multiple,alu_imm,
+		     alu_imm,alu_imm")
+   (set_attr "arch" "*,*,simd,*,*,sve,sve,sme")]
 )
 
 ;; zero_extend version of above
diff --git a/gcc/config/aarch64/arm_sme.h b/gcc/config/aarch64/arm_sme.h
new file mode 100644
index 00000000000..ab6ec3341c3
--- /dev/null
+++ b/gcc/config/aarch64/arm_sme.h
@@ -0,0 +1,46 @@
+/* AArch64 SME intrinsics include file.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _ARM_SME_H_
+#define _ARM_SME_H_
+
+#include <arm_sve.h>
+#pragma GCC aarch64 "arm_sme.h"
+
+__attribute__((arm_streaming_compatible))
+void __arm_za_disable(void);
+
+__attribute__((arm_streaming_compatible, arm_preserves_za))
+void *__arm_sc_memcpy(void *, const void *, __SIZE_TYPE__);
+
+__attribute__((arm_streaming_compatible, arm_preserves_za))
+void *__arm_sc_memmove(void *, const void *, __SIZE_TYPE__);
+
+__attribute__((arm_streaming_compatible, arm_preserves_za))
+void *__arm_sc_memset(void *, int, __SIZE_TYPE__);
+
+__attribute__((arm_streaming_compatible, arm_preserves_za))
+void *__arm_sc_memchr(void *, int, __SIZE_TYPE__);
+
+#endif
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index 8d4393f30a1..7e35374975e 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -21,6 +21,9 @@
 (define_register_constraint "k" "STACK_REG"
   "@internal The stack register.")
 
+(define_register_constraint "Uci" "ZA_INDEX_REGS"
+  "@internal r12-r15, which can be used to index ZA.")
+
 (define_register_constraint "Ucs" "TAILCALL_ADDR_REGS"
   "@internal Registers suitable for an indirect tail call")
 
@@ -74,6 +77,12 @@ (define_constraint "Uav"
    a single ADDVL or ADDPL."
  (match_operand 0 "aarch64_sve_addvl_addpl_immediate"))
 
+(define_constraint "UaV"
+  "@internal
+   A constraint that matches a VG-based constant that can be added by
+   a single ADDSVL or ADDSPL."
+ (match_operand 0 "aarch64_addsvl_addspl_immediate"))
+
 (define_constraint "Uat"
   "@internal
    A constraint that matches a VG-based constant that can be added by
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 8d65fadbdf6..5a71049751d 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -450,6 +450,7 @@ (define_mode_iterator VNx4SI_ONLY [VNx4SI])
 (define_mode_iterator VNx4SF_ONLY [VNx4SF])
 (define_mode_iterator VNx2DI_ONLY [VNx2DI])
 (define_mode_iterator VNx2DF_ONLY [VNx2DF])
+(define_mode_iterator VNx1TI_ONLY [VNx1TI])
 
 ;; All SVE vector structure modes.
 (define_mode_iterator SVE_STRUCT [VNx32QI VNx16HI VNx8SI VNx4DI
@@ -598,6 +599,15 @@ (define_mode_iterator PRED_HSD [VNx8BI VNx4BI VNx2BI])
 ;; Bfloat16 modes to which V4SF can be converted
 (define_mode_iterator V4SF_TO_BF [V4BF V8BF])
 
+;; The modes used to represent different ZA access sizes.
+(define_mode_iterator SME_ZA_I [VNx16QI VNx8HI VNx4SI VNx2DI VNx1TI])
+(define_mode_iterator SME_ZA_SDI [VNx4SI (VNx2DI "TARGET_SME_I16I64")])
+
+;; The modes for which outer product instructions are supported.
+(define_mode_iterator SME_MOP_BHI [VNx16QI (VNx8HI "TARGET_SME_I16I64")])
+(define_mode_iterator SME_MOP_HSDF [VNx8BF VNx8HF VNx4SF
+				    (VNx2DF "TARGET_SME_F64F64")])
+
 ;; ------------------------------------------------------------------
 ;; Unspec enumerations for Advance SIMD. These could well go into
 ;; aarch64.md but for their use in int_iterators here.
@@ -976,6 +986,28 @@ (define_c_enum "unspec"
     UNSPEC_BFCVTN2     ; Used in aarch64-simd.md.
     UNSPEC_BFCVT       ; Used in aarch64-simd.md.
     UNSPEC_FCVTXN	; Used in aarch64-simd.md.
+
+    ;; All used in aarch64-sme.md
+    UNSPEC_SME_ADDHA
+    UNSPEC_SME_ADDVA
+    UNSPEC_SME_FMOPA
+    UNSPEC_SME_FMOPS
+    UNSPEC_SME_LD1_HOR
+    UNSPEC_SME_LD1_VER
+    UNSPEC_SME_READ_HOR
+    UNSPEC_SME_READ_VER
+    UNSPEC_SME_SMOPA
+    UNSPEC_SME_SMOPS
+    UNSPEC_SME_ST1_HOR
+    UNSPEC_SME_ST1_VER
+    UNSPEC_SME_SUMOPA
+    UNSPEC_SME_SUMOPS
+    UNSPEC_SME_UMOPA
+    UNSPEC_SME_UMOPS
+    UNSPEC_SME_USMOPA
+    UNSPEC_SME_USMOPS
+    UNSPEC_SME_WRITE_HOR
+    UNSPEC_SME_WRITE_VER
 ])
 
 ;; ------------------------------------------------------------------
@@ -1232,6 +1264,7 @@ (define_mode_attr Vetype [(V8QI "b") (V16QI "b")
 			  (VNx4SF "s") (VNx2SF "s")
 			  (VNx2DI "d")
 			  (VNx2DF "d")
+			  (VNx1TI "q")
 			  (BF "h") (V4BF "h") (V8BF "h")
 			  (HF "h")
 			  (SF "s") (DF "d")
@@ -1250,6 +1283,7 @@ (define_mode_attr Vesize [(VNx16QI "b") (VNx8QI "b") (VNx4QI "b") (VNx2QI "b")
 			  (VNx4SF "w") (VNx2SF "w")
 			  (VNx2DI "d")
 			  (VNx2DF "d")
+			  (VNx1TI "q")
 			  (VNx32QI "b") (VNx48QI "b") (VNx64QI "b")
 			  (VNx16HI "h") (VNx24HI "h") (VNx32HI "h")
 			  (VNx16HF "h") (VNx24HF "h") (VNx32HF "h")
@@ -1574,6 +1608,15 @@ (define_mode_attr Vmwtype [(V8QI ".8h") (V4HI ".4s")
 			   (V4HF ".4s") (V2SF ".2d")
 			   (SI   "")    (HI   "")])
 
+;; Vector modes whose elements are four times wider.
+(define_mode_attr V4xWIDE [(VNx16QI "VNx4SI") (VNx8HI "VNx2DI")])
+
+;; Predicate modes for V4xWIDE.
+(define_mode_attr V4xWIDE_PRED [(VNx16QI "VNx4BI") (VNx8HI "VNx2BI")])
+
+;; Element suffix for V4xWIDE.
+(define_mode_attr V4xwetype [(VNx16QI "s") (VNx8HI "d")])
+
 ;; Lower part register suffixes for VQW/VQ_HSF.
 (define_mode_attr Vhalftype [(V16QI "8b") (V8HI "4h")
 			     (V4SI "2s") (V8HF "4h")
@@ -2046,6 +2089,7 @@ (define_mode_attr VPRED [(VNx16QI "VNx16BI") (VNx8QI "VNx8BI")
 			 (VNx4SF "VNx4BI") (VNx2SF "VNx2BI")
 			 (VNx2DI "VNx2BI")
 			 (VNx2DF "VNx2BI")
+			 (VNx1TI "VNx2BI")
 			 (VNx32QI "VNx16BI")
 			 (VNx16HI "VNx8BI") (VNx16HF "VNx8BI")
 			 (VNx16BF "VNx8BI")
@@ -2126,6 +2170,17 @@ (define_mode_attr sve_lane_con [(VNx8HI "y") (VNx4SI "y") (VNx2DI "x")
 ;; The constraint to use for an SVE FCMLA lane index.
 (define_mode_attr sve_lane_pair_con [(VNx8HF "y") (VNx4SF "x")])
 
+(define_mode_attr SME_FMOP_WIDE [(VNx8BF "VNx4SF") (VNx8HF "VNx4SF")
+				 (VNx4SF "VNx4SF") (VNx2DF "VNx2DF")])
+
+(define_mode_attr SME_FMOP_WIDE_PRED [(VNx8BF "VNx4BI") (VNx8HF "VNx4BI")
+				      (VNx4SF "VNx4BI") (VNx2DF "VNx2BI")])
+
+(define_mode_attr sme_fmop_wide_etype [(VNx8BF "s") (VNx8HF "s")
+				       (VNx4SF "s") (VNx2DF "d")])
+
+(define_mode_attr b [(VNx8BF "b") (VNx8HF "") (VNx4SF "") (VNx2DF "")])
+
 ;; -------------------------------------------------------------------
 ;; Code Iterators
 ;; -------------------------------------------------------------------
@@ -3160,6 +3215,20 @@ (define_int_iterator FCMLA_OP [UNSPEC_FCMLA
 (define_int_iterator FCMUL_OP [UNSPEC_FCMUL
 			       UNSPEC_FCMUL_CONJ])
 
+(define_int_iterator SME_LD1 [UNSPEC_SME_LD1_HOR UNSPEC_SME_LD1_VER])
+(define_int_iterator SME_READ [UNSPEC_SME_READ_HOR UNSPEC_SME_READ_VER])
+(define_int_iterator SME_ST1 [UNSPEC_SME_ST1_HOR UNSPEC_SME_ST1_VER])
+(define_int_iterator SME_WRITE [UNSPEC_SME_WRITE_HOR UNSPEC_SME_WRITE_VER])
+
+(define_int_iterator SME_UNARY_SDI [UNSPEC_SME_ADDHA UNSPEC_SME_ADDVA])
+
+(define_int_iterator SME_INT_MOP [UNSPEC_SME_SMOPA UNSPEC_SME_SMOPS
+				  UNSPEC_SME_SUMOPA UNSPEC_SME_SUMOPS
+				  UNSPEC_SME_UMOPA UNSPEC_SME_UMOPS
+				  UNSPEC_SME_USMOPA UNSPEC_SME_USMOPS])
+
+(define_int_iterator SME_FP_MOP [UNSPEC_SME_FMOPA UNSPEC_SME_FMOPS])
+
 ;; Iterators for atomic operations.
 
 (define_int_iterator ATOMIC_LDOP
@@ -3232,6 +3301,26 @@ (define_int_attr optab [(UNSPEC_ANDF "and")
 			(UNSPEC_PMULLT "pmullt")
 			(UNSPEC_PMULLT_PAIR "pmullt_pair")
 			(UNSPEC_SMATMUL "smatmul")
+			(UNSPEC_SME_ADDHA "addha")
+			(UNSPEC_SME_ADDVA "addva")
+			(UNSPEC_SME_FMOPA "fmopa")
+			(UNSPEC_SME_FMOPS "fmops")
+			(UNSPEC_SME_LD1_HOR "ld1_hor")
+			(UNSPEC_SME_LD1_VER "ld1_ver")
+			(UNSPEC_SME_READ_HOR "read_hor")
+			(UNSPEC_SME_READ_VER "read_ver")
+			(UNSPEC_SME_SMOPA "smopa")
+			(UNSPEC_SME_SMOPS "smops")
+			(UNSPEC_SME_ST1_HOR "st1_hor")
+			(UNSPEC_SME_ST1_VER "st1_ver")
+			(UNSPEC_SME_SUMOPA "sumopa")
+			(UNSPEC_SME_SUMOPS "sumops")
+			(UNSPEC_SME_UMOPA "umopa")
+			(UNSPEC_SME_UMOPS "umops")
+			(UNSPEC_SME_USMOPA "usmopa")
+			(UNSPEC_SME_USMOPS "usmops")
+			(UNSPEC_SME_WRITE_HOR "write_hor")
+			(UNSPEC_SME_WRITE_VER "write_ver")
 			(UNSPEC_SQCADD90 "sqcadd90")
 			(UNSPEC_SQCADD270 "sqcadd270")
 			(UNSPEC_SQRDCMLAH "sqrdcmlah")
@@ -4001,6 +4090,15 @@ (define_int_attr min_elem_bits [(UNSPEC_RBIT "8")
 (define_int_attr unspec [(UNSPEC_WHILERW "UNSPEC_WHILERW")
 			 (UNSPEC_WHILEWR "UNSPEC_WHILEWR")])
 
+(define_int_attr hv [(UNSPEC_SME_LD1_HOR "h")
+		     (UNSPEC_SME_LD1_VER "v")
+		     (UNSPEC_SME_READ_HOR "h")
+		     (UNSPEC_SME_READ_VER "v")
+		     (UNSPEC_SME_ST1_HOR "h")
+		     (UNSPEC_SME_ST1_VER "v")
+		     (UNSPEC_SME_WRITE_HOR "h")
+		     (UNSPEC_SME_WRITE_VER "v")])
+
 ;; Iterators and attributes for fpcr fpsr getter setters
 
 (define_int_iterator GET_FPSCR
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index c308015ac2c..9e4a70ad9e9 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -168,11 +168,17 @@ (define_predicate "aarch64_split_add_offset_immediate"
   (and (match_code "const_poly_int")
        (match_test "aarch64_add_offset_temporaries (op) == 1")))
 
+(define_predicate "aarch64_addsvl_addspl_immediate"
+  (and (match_code "const")
+       (match_test "aarch64_addsvl_addspl_immediate_p (op)")))
+
 (define_predicate "aarch64_pluslong_operand"
   (ior (match_operand 0 "register_operand")
        (match_operand 0 "aarch64_pluslong_immediate")
        (and (match_test "TARGET_SVE")
-	    (match_operand 0 "aarch64_sve_plus_immediate"))))
+	    (match_operand 0 "aarch64_sve_plus_immediate"))
+       (and (match_test "TARGET_SME")
+	    (match_operand 0 "aarch64_addsvl_addspl_immediate"))))
 
 (define_predicate "aarch64_pluslong_or_poly_operand"
   (ior (match_operand 0 "aarch64_pluslong_operand")
diff --git a/gcc/config/aarch64/t-aarch64 b/gcc/config/aarch64/t-aarch64
index c1c8f5c7dae..2438b78a87f 100644
--- a/gcc/config/aarch64/t-aarch64
+++ b/gcc/config/aarch64/t-aarch64
@@ -63,6 +63,7 @@ aarch64-sve-builtins.o: $(srcdir)/config/aarch64/aarch64-sve-builtins.cc \
   $(srcdir)/config/aarch64/aarch64-sve-builtins.def \
   $(srcdir)/config/aarch64/aarch64-sve-builtins-base.def \
   $(srcdir)/config/aarch64/aarch64-sve-builtins-sve2.def \
+  $(srcdir)/config/aarch64/aarch64-sve-builtins-sme.def \
   $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) $(RTL_H) \
   $(TM_P_H) memmodel.h insn-codes.h $(OPTABS_H) $(RECOG_H) $(DIAGNOSTIC_H) \
   $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) fold-const.h $(GIMPLE_H) \
@@ -72,7 +73,8 @@ aarch64-sve-builtins.o: $(srcdir)/config/aarch64/aarch64-sve-builtins.cc \
   $(srcdir)/config/aarch64/aarch64-sve-builtins.h \
   $(srcdir)/config/aarch64/aarch64-sve-builtins-shapes.h \
   $(srcdir)/config/aarch64/aarch64-sve-builtins-base.h \
-  $(srcdir)/config/aarch64/aarch64-sve-builtins-sve2.h
+  $(srcdir)/config/aarch64/aarch64-sve-builtins-sve2.h \
+  $(srcdir)/config/aarch64/aarch64-sve-builtins-sme.h
 	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
 		$(srcdir)/config/aarch64/aarch64-sve-builtins.cc
 
@@ -113,6 +115,19 @@ aarch64-sve-builtins-sve2.o: \
 	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
 		$(srcdir)/config/aarch64/aarch64-sve-builtins-sve2.cc
 
+aarch64-sve-builtins-sme.o: \
+  $(srcdir)/config/aarch64/aarch64-sve-builtins-sme.cc \
+  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) $(RTL_H) \
+  $(TM_P_H) memmodel.h insn-codes.h $(OPTABS_H) $(RECOG_H) \
+  $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) fold-const.h $(GIMPLE_H) \
+  gimple-iterator.h gimplify.h explow.h $(EMIT_RTL_H) \
+  $(srcdir)/config/aarch64/aarch64-sve-builtins.h \
+  $(srcdir)/config/aarch64/aarch64-sve-builtins-shapes.h \
+  $(srcdir)/config/aarch64/aarch64-sve-builtins-sme.h \
+  $(srcdir)/config/aarch64/aarch64-sve-builtins-functions.h
+	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
+		$(srcdir)/config/aarch64/aarch64-sve-builtins-sme.cc
+
 aarch64-builtin-iterators.h: $(srcdir)/config/aarch64/geniterators.sh \
 	$(srcdir)/config/aarch64/iterators.md
 	$(SHELL) $(srcdir)/config/aarch64/geniterators.sh \
diff --git a/gcc/doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst b/gcc/doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst
index f6d82f4435b..03da1a867bb 100644
--- a/gcc/doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst
+++ b/gcc/doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst
@@ -547,6 +547,12 @@ the following and their inverses no :samp:`{feature}` :
 :samp:`sme`
   Enable the Scalable Matrix Extension.
 
+:samp:`sme-i16i64`
+  Enable the FEAT_SME_I16I64 extension to SME.
+
+:samp:`sme-f64f64`
+  Enable the FEAT_SME_F64F64 extension to SME.
+
 Feature ``crypto`` implies ``aes``, ``sha2``, and ``simd``,
 which implies ``fp``.
 Conversely, ``nofp`` implies ``nosimd``, which implies
diff --git a/gcc/testsuite/g++.target/aarch64/sme/aarch64-sme-acle-asm.exp b/gcc/testsuite/g++.target/aarch64/sme/aarch64-sme-acle-asm.exp
new file mode 100644
index 00000000000..f05b9de76c5
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/sme/aarch64-sme-acle-asm.exp
@@ -0,0 +1,86 @@
+#  Assembly-based regression-test driver for the SME ACLE
+#  Copyright (C) 2009-2022 Free Software Foundation, Inc.
+#
+#  This file is part of GCC.
+#
+#  GCC is free software; you can redistribute it and/or modify it
+#  under the terms of the GNU General Public License as published by
+#  the Free Software Foundation; either version 3, or (at your option)
+#  any later version.
+#
+#  GCC is distributed in the hope that it will be useful, but
+#  WITHOUT ANY WARRANTY; without even the implied warranty of
+#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+#  General Public License for more details.
+#
+#  You should have received a copy of the GNU General Public License
+#  along with GCC; see the file COPYING3.  If not see
+#  <http://www.gnu.org/licenses/>.
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Exit immediately if this isn't an AArch64 target.
+if {![istarget aarch64*-*-*] } {
+    return
+}
+
+# Load support procs.
+load_lib g++-dg.exp
+
+# Initialize `dg'.
+dg-init
+
+# Force SME if we're not testing it already.
+if { [check_effective_target_aarch64_sme] } {
+    set sme_flags ""
+} else {
+    set sme_flags "-march=armv8.2-a+sme"
+}
+
+# Turn off any codegen tweaks by default that may affect expected assembly.
+# Tests relying on those should turn them on explicitly.
+set sme_flags "$sme_flags -mtune=generic -moverride=tune=none"
+
+global gcc_runtest_parallelize_limit_minor
+if { [info exists gcc_runtest_parallelize_limit_minor] } {
+    set old_limit_minor $gcc_runtest_parallelize_limit_minor
+    set gcc_runtest_parallelize_limit_minor 1
+}
+
+torture-init
+set-torture-options {
+    "-std=c++98 -O0 -g"
+    "-std=c++98 -O1 -g"
+    "-std=c++11 -O2 -g"
+    "-std=c++14 -O3 -g"
+    "-std=c++17 -Og -g"
+    "-std=c++2a -Os -g"
+    "-std=gnu++98 -O2 -fno-schedule-insns -fno-schedule-insns2 -DCHECK_ASM --save-temps"
+    "-std=gnu++11 -Ofast -g"
+    "-std=gnu++17 -O3 -g"
+    "-std=gnu++2a -O0 -g"
+} {
+    "-DTEST_FULL"
+    "-DTEST_OVERLOADS"
+}
+
+# Main loop.
+set gcc_subdir [string replace $subdir 0 2 gcc]
+set files [glob -nocomplain $srcdir/$gcc_subdir/acle-asm/*.c]
+set save-dg-do-what-default ${dg-do-what-default}
+if { [check_effective_target_aarch64_asm_sme-i16i64_ok] } {
+    set dg-do-what-default assemble
+} else {
+    set dg-do-what-default compile
+}
+gcc-dg-runtest [lsort $files] "" "$sme_flags -fno-ipa-icf"
+set dg-do-what-default ${save-dg-do-what-default}
+
+torture-finish
+
+if { [info exists gcc_runtest_parallelize_limit_minor] } {
+    set gcc_runtest_parallelize_limit_minor $old_limit_minor
+}
+
+# All done.
+dg-finish
diff --git a/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/func_redef_4.c b/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/func_redef_4.c
index 9591e3d01d6..8ad86a3c024 100644
--- a/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/func_redef_4.c
+++ b/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/func_redef_4.c
@@ -4,6 +4,6 @@
    to be diagnosed.  Any attempt to call the function before including
    arm_sve.h will lead to a link failure.  (Same for taking its address,
    etc.)  */
-extern __SVUint8_t svadd_u8_x (__SVBool_t, __SVUint8_t, __SVUint8_t);
+extern __attribute__((arm_preserves_za)) __SVUint8_t svadd_u8_x (__SVBool_t, __SVUint8_t, __SVUint8_t);
 
 #pragma GCC aarch64 "arm_sve.h"
diff --git a/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/func_redef_5.c b/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/func_redef_5.c
index f87201984b8..7c2f4c440cb 100644
--- a/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/func_redef_5.c
+++ b/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/func_redef_5.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 
-__SVUint8_t
+__SVUint8_t __attribute__((arm_preserves_za))
 svadd_u8_x (__SVBool_t pg, __SVUint8_t x, __SVUint8_t y)
 {
   return x;
diff --git a/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/func_redef_7.c b/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/func_redef_7.c
index 1f2e4bf66a3..31b8d7ddfab 100644
--- a/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/func_redef_7.c
+++ b/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/func_redef_7.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 
-__SVUint8_t
+__SVUint8_t __attribute__((arm_preserves_za))
 svadd_x (__SVBool_t pg, __SVUint8_t x, __SVUint8_t y)
 {
   return x;
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/aarch64-sme-acle-asm.exp b/gcc/testsuite/gcc.target/aarch64/sme/aarch64-sme-acle-asm.exp
new file mode 100644
index 00000000000..ad998583b70
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/aarch64-sme-acle-asm.exp
@@ -0,0 +1,82 @@
+#  Assembly-based regression-test driver for the SME ACLE
+#  Copyright (C) 2009-2022 Free Software Foundation, Inc.
+#
+#  This file is part of GCC.
+#
+#  GCC is free software; you can redistribute it and/or modify it
+#  under the terms of the GNU General Public License as published by
+#  the Free Software Foundation; either version 3, or (at your option)
+#  any later version.
+#
+#  GCC is distributed in the hope that it will be useful, but
+#  WITHOUT ANY WARRANTY; without even the implied warranty of
+#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+#  General Public License for more details.
+#
+#  You should have received a copy of the GNU General Public License
+#  along with GCC; see the file COPYING3.  If not see
+#  <http://www.gnu.org/licenses/>.
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Exit immediately if this isn't an AArch64 target.
+if {![istarget aarch64*-*-*] } {
+    return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# Initialize `dg'.
+dg-init
+
+# Force SME if we're not testing it already.
+if { [check_effective_target_aarch64_sme] } {
+    set sme_flags ""
+} else {
+    set sme_flags "-march=armv8.2-a+sme"
+}
+
+# Turn off any codegen tweaks by default that may affect expected assembly.
+# Tests relying on those should turn them on explicitly.
+set sme_flags "$sme_flags -mtune=generic -moverride=tune=none"
+
+global gcc_runtest_parallelize_limit_minor
+if { [info exists gcc_runtest_parallelize_limit_minor] } {
+    set old_limit_minor $gcc_runtest_parallelize_limit_minor
+    set gcc_runtest_parallelize_limit_minor 1
+}
+
+torture-init
+set-torture-options {
+    "-std=c90 -O0 -g"
+    "-std=c90 -O1 -g"
+    "-std=c99 -O2 -g"
+    "-std=c11 -O3 -g"
+    "-std=gnu90 -O2 -fno-schedule-insns -fno-schedule-insns2 -DCHECK_ASM --save-temps"
+    "-std=gnu99 -Ofast -g"
+    "-std=gnu11 -Os -g"
+} {
+    "-DTEST_FULL"
+    "-DTEST_OVERLOADS"
+}
+
+# Main loop.
+set files [glob -nocomplain $srcdir/$subdir/acle-asm/*.c]
+set save-dg-do-what-default ${dg-do-what-default}
+if { [check_effective_target_aarch64_asm_sme-i16i64_ok] } {
+    set dg-do-what-default assemble
+} else {
+    set dg-do-what-default compile
+}
+gcc-dg-runtest [lsort $files] "" "$sme_flags -fno-ipa-icf"
+set dg-do-what-default ${save-dg-do-what-default}
+
+torture-finish
+
+if { [info exists gcc_runtest_parallelize_limit_minor] } {
+    set gcc_runtest_parallelize_limit_minor $old_limit_minor
+}
+
+# All done.
+dg-finish
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/addha_za32.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/addha_za32.c
new file mode 100644
index 00000000000..8dee401458c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/addha_za32.c
@@ -0,0 +1,48 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sme_acle.h"
+
+/*
+** addha_za32_s32_0_p0_p1_z0:
+**	addha	za0\.s, p0/m, p1/m, z0\.s
+**	ret
+*/
+TEST_UNIFORM_ZA (addha_za32_s32_0_p0_p1_z0, svint32_t,
+		 svaddha_za32_s32_m (0, p0, p1, z0),
+		 svaddha_za32_m (0, p0, p1, z0))
+
+/*
+** addha_za32_s32_0_p1_p0_z1:
+**	addha	za0\.s, p1/m, p0/m, z1\.s
+**	ret
+*/
+TEST_UNIFORM_ZA (addha_za32_s32_0_p1_p0_z1, svint32_t,
+		 svaddha_za32_s32_m (0, p1, p0, z1),
+		 svaddha_za32_m (0, p1, p0, z1))
+
+/*
+** addha_za32_s32_1_p0_p1_z0:
+**	addha	za1\.s, p0/m, p1/m, z0\.s
+**	ret
+*/
+TEST_UNIFORM_ZA (addha_za32_s32_1_p0_p1_z0, svint32_t,
+		 svaddha_za32_s32_m (1, p0, p1, z0),
+		 svaddha_za32_m (1, p0, p1, z0))
+
+/*
+** addha_za32_s32_3_p0_p1_z0:
+**	addha	za3\.s, p0/m, p1/m, z0\.s
+**	ret
+*/
+TEST_UNIFORM_ZA (addha_za32_s32_3_p0_p1_z0, svint32_t,
+		 svaddha_za32_s32_m (3, p0, p1, z0),
+		 svaddha_za32_m (3, p0, p1, z0))
+
+/*
+** addha_za32_u32_0_p0_p1_z0:
+**	addha	za0\.s, p0/m, p1/m, z0\.s
+**	ret
+*/
+TEST_UNIFORM_ZA (addha_za32_u32_0_p0_p1_z0, svuint32_t,
+		 svaddha_za32_u32_m (0, p0, p1, z0),
+		 svaddha_za32_m (0, p0, p1, z0))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/addha_za64.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/addha_za64.c
new file mode 100644
index 00000000000..363ff1aab21
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/addha_za64.c
@@ -0,0 +1,50 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sme_acle.h"
+
+#pragma GCC target "+sme-i16i64"
+
+/*
+** addha_za64_s64_0_p0_p1_z0:
+**	addha	za0\.d, p0/m, p1/m, z0\.d
+**	ret
+*/
+TEST_UNIFORM_ZA (addha_za64_s64_0_p0_p1_z0, svint64_t,
+		 svaddha_za64_s64_m (0, p0, p1, z0),
+		 svaddha_za64_m (0, p0, p1, z0))
+
+/*
+** addha_za64_s64_0_p1_p0_z1:
+**	addha	za0\.d, p1/m, p0/m, z1\.d
+**	ret
+*/
+TEST_UNIFORM_ZA (addha_za64_s64_0_p1_p0_z1, svint64_t,
+		 svaddha_za64_s64_m (0, p1, p0, z1),
+		 svaddha_za64_m (0, p1, p0, z1))
+
+/*
+** addha_za64_s64_1_p0_p1_z0:
+**	addha	za1\.d, p0/m, p1/m, z0\.d
+**	ret
+*/
+TEST_UNIFORM_ZA (addha_za64_s64_1_p0_p1_z0, svint64_t,
+		 svaddha_za64_s64_m (1, p0, p1, z0),
+		 svaddha_za64_m (1, p0, p1, z0))
+
+/*
+** addha_za64_s64_7_p0_p1_z0:
+**	addha	za7\.d, p0/m, p1/m, z0\.d
+**	ret
+*/
+TEST_UNIFORM_ZA (addha_za64_s64_7_p0_p1_z0, svint64_t,
+		 svaddha_za64_s64_m (7, p0, p1, z0),
+		 svaddha_za64_m (7, p0, p1, z0))
+
+/*
+** addha_za64_u64_0_p0_p1_z0:
+**	addha	za0\.d, p0/m, p1/m, z0\.d
+**	ret
+*/
+TEST_UNIFORM_ZA (addha_za64_u64_0_p0_p1_z0, svuint64_t,
+		 svaddha_za64_u64_m (0, p0, p1, z0),
+		 svaddha_za64_m (0, p0, p1, z0))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/addva_za32.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/addva_za32.c
new file mode 100644
index 00000000000..0de019ac86a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/addva_za32.c
@@ -0,0 +1,48 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sme_acle.h"
+
+/*
+** addva_za32_s32_0_p0_p1_z0:
+**	addva	za0\.s, p0/m, p1/m, z0\.s
+**	ret
+*/
+TEST_UNIFORM_ZA (addva_za32_s32_0_p0_p1_z0, svint32_t,
+		 svaddva_za32_s32_m (0, p0, p1, z0),
+		 svaddva_za32_m (0, p0, p1, z0))
+
+/*
+** addva_za32_s32_0_p1_p0_z1:
+**	addva	za0\.s, p1/m, p0/m, z1\.s
+**	ret
+*/
+TEST_UNIFORM_ZA (addva_za32_s32_0_p1_p0_z1, svint32_t,
+		 svaddva_za32_s32_m (0, p1, p0, z1),
+		 svaddva_za32_m (0, p1, p0, z1))
+
+/*
+** addva_za32_s32_1_p0_p1_z0:
+**	addva	za1\.s, p0/m, p1/m, z0\.s
+**	ret
+*/
+TEST_UNIFORM_ZA (addva_za32_s32_1_p0_p1_z0, svint32_t,
+		 svaddva_za32_s32_m (1, p0, p1, z0),
+		 svaddva_za32_m (1, p0, p1, z0))
+
+/*
+** addva_za32_s32_3_p0_p1_z0:
+**	addva	za3\.s, p0/m, p1/m, z0\.s
+**	ret
+*/
+TEST_UNIFORM_ZA (addva_za32_s32_3_p0_p1_z0, svint32_t,
+		 svaddva_za32_s32_m (3, p0, p1, z0),
+		 svaddva_za32_m (3, p0, p1, z0))
+
+/*
+** addva_za32_u32_0_p0_p1_z0:
+**	addva	za0\.s, p0/m, p1/m, z0\.s
+**	ret
+*/
+TEST_UNIFORM_ZA (addva_za32_u32_0_p0_p1_z0, svuint32_t,
+		 svaddva_za32_u32_m (0, p0, p1, z0),
+		 svaddva_za32_m (0, p0, p1, z0))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/addva_za64.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/addva_za64.c
new file mode 100644
index 00000000000..d83d4e03c6a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/addva_za64.c
@@ -0,0 +1,50 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sme_acle.h"
+
+#pragma GCC target "+sme-i16i64"
+
+/*
+** addva_za64_s64_0_p0_p1_z0:
+**	addva	za0\.d, p0/m, p1/m, z0\.d
+**	ret
+*/
+TEST_UNIFORM_ZA (addva_za64_s64_0_p0_p1_z0, svint64_t,
+		 svaddva_za64_s64_m (0, p0, p1, z0),
+		 svaddva_za64_m (0, p0, p1, z0))
+
+/*
+** addva_za64_s64_0_p1_p0_z1:
+**	addva	za0\.d, p1/m, p0/m, z1\.d
+**	ret
+*/
+TEST_UNIFORM_ZA (addva_za64_s64_0_p1_p0_z1, svint64_t,
+		 svaddva_za64_s64_m (0, p1, p0, z1),
+		 svaddva_za64_m (0, p1, p0, z1))
+
+/*
+** addva_za64_s64_1_p0_p1_z0:
+**	addva	za1\.d, p0/m, p1/m, z0\.d
+**	ret
+*/
+TEST_UNIFORM_ZA (addva_za64_s64_1_p0_p1_z0, svint64_t,
+		 svaddva_za64_s64_m (1, p0, p1, z0),
+		 svaddva_za64_m (1, p0, p1, z0))
+
+/*
+** addva_za64_s64_7_p0_p1_z0:
+**	addva	za7\.d, p0/m, p1/m, z0\.d
+**	ret
+*/
+TEST_UNIFORM_ZA (addva_za64_s64_7_p0_p1_z0, svint64_t,
+		 svaddva_za64_s64_m (7, p0, p1, z0),
+		 svaddva_za64_m (7, p0, p1, z0))
+
+/*
+** addva_za64_u64_0_p0_p1_z0:
+**	addva	za0\.d, p0/m, p1/m, z0\.d
+**	ret
+*/
+TEST_UNIFORM_ZA (addva_za64_u64_0_p0_p1_z0, svuint64_t,
+		 svaddva_za64_u64_m (0, p0, p1, z0),
+		 svaddva_za64_m (0, p0, p1, z0))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/arm_has_sme_sc.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/arm_has_sme_sc.c
new file mode 100644
index 00000000000..e37793f9e75
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/arm_has_sme_sc.c
@@ -0,0 +1,25 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#define STREAMING_COMPATIBLE
+#define NO_SHARED_ZA
+#include "test_sme_acle.h"
+
+#pragma GCC target "+nosme"
+
+/*
+** test_nosme:
+**	...
+**	bl	__arm_sme_state
+**	lsr	x0, x0, #?63
+**	...
+*/
+PROTO (test_nosme, int, ()) { return __arm_has_sme (); }
+
+#pragma GCC target "+sme"
+
+/*
+** test_sme:
+**	mov	w0, #?1
+**	ret
+*/
+PROTO (test_sme, int, ()) { return __arm_has_sme (); }
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/arm_in_streaming_mode_ns.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/arm_in_streaming_mode_ns.c
new file mode 100644
index 00000000000..ba475d67bb2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/arm_in_streaming_mode_ns.c
@@ -0,0 +1,11 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#define NON_STREAMING
+#include "test_sme_acle.h"
+
+/*
+** test_sme:
+**	mov	w0, #?0
+**	ret
+*/
+PROTO (test_sme, int, ()) { return __arm_in_streaming_mode (); }
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/arm_in_streaming_mode_s.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/arm_in_streaming_mode_s.c
new file mode 100644
index 00000000000..b88d47921bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/arm_in_streaming_mode_s.c
@@ -0,0 +1,11 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#define NO_SHARED_ZA
+#include "test_sme_acle.h"
+
+/*
+** test_sme:
+**	mov	w0, #?1
+**	ret
+*/
+PROTO (test_sme, int, ()) { return __arm_in_streaming_mode (); }
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/arm_in_streaming_mode_sc.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/arm_in_streaming_mode_sc.c
new file mode 100644
index 00000000000..fb3588a642e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/arm_in_streaming_mode_sc.c
@@ -0,0 +1,26 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#define STREAMING_COMPATIBLE
+#define NO_SHARED_ZA
+#include "test_sme_acle.h"
+
+#pragma GCC target "+nosme"
+
+/*
+** test_nosme:
+**	...
+**	bl	__arm_sme_state
+**	and	w0, w0, #?1
+**	...
+*/
+PROTO (test_nosme, int, ()) { return __arm_in_streaming_mode (); }
+
+#pragma GCC target "+sme"
+
+/*
+** test_sme:
+**	mrs	x([0-9]+), svcr
+**	and	w0, w\1, #?1
+**	ret
+*/
+PROTO (test_sme, int, ()) { return __arm_in_streaming_mode (); }
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsb_s.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsb_s.c
new file mode 100644
index 00000000000..0a8de45be4d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsb_s.c
@@ -0,0 +1,310 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#define NO_SHARED_ZA
+#include "test_sme_acle.h"
+
+/*
+** cntb_1:
+**	cntb	x0
+**	ret
+*/
+PROTO (cntb_1, uint64_t, ()) { return svcntsb (); }
+
+/*
+** cntb_2:
+**	cntb	x0, all, mul #2
+**	ret
+*/
+PROTO (cntb_2, uint64_t, ()) { return svcntsb () * 2; }
+
+/*
+** cntb_3:
+**	cntb	x0, all, mul #3
+**	ret
+*/
+PROTO (cntb_3, uint64_t, ()) { return svcntsb () * 3; }
+
+/*
+** cntb_4:
+**	cntb	x0, all, mul #4
+**	ret
+*/
+PROTO (cntb_4, uint64_t, ()) { return svcntsb () * 4; }
+
+/*
+** cntb_8:
+**	cntb	x0, all, mul #8
+**	ret
+*/
+PROTO (cntb_8, uint64_t, ()) { return svcntsb () * 8; }
+
+/*
+** cntb_15:
+**	cntb	x0, all, mul #15
+**	ret
+*/
+PROTO (cntb_15, uint64_t, ()) { return svcntsb () * 15; }
+
+/*
+** cntb_16:
+**	cntb	x0, all, mul #16
+**	ret
+*/
+PROTO (cntb_16, uint64_t, ()) { return svcntsb () * 16; }
+
+/*
+** cntb_17:
+**	rdvl	x0, #17
+**	ret
+*/
+PROTO (cntb_17, uint64_t, ()) { return svcntsb () * 17; }
+
+/*
+** cntb_31:
+**	rdvl	x0, #31
+**	ret
+*/
+PROTO (cntb_31, uint64_t, ()) { return svcntsb () * 31; }
+
+/*
+** cntb_32:
+**	cntb	(x[0-9]+)
+**	lsl	x0, \1, 5
+**	ret
+*/
+PROTO (cntb_32, uint64_t, ()) { return svcntsb () * 32; }
+
+/* Other sequences would be OK.  */
+/*
+** cntb_33:
+**	cntb	(x[0-9]+)
+**	lsl	x0, \1, 5
+**	incb	x0
+**	ret
+*/
+PROTO (cntb_33, uint64_t, ()) { return svcntsb () * 33; }
+
+/*
+** cntb_64:
+**	cntb	(x[0-9]+)
+**	lsl	x0, \1, 6
+**	ret
+*/
+PROTO (cntb_64, uint64_t, ()) { return svcntsb () * 64; }
+
+/*
+** cntb_128:
+**	cntb	(x[0-9]+)
+**	lsl	x0, \1, 7
+**	ret
+*/
+PROTO (cntb_128, uint64_t, ()) { return svcntsb () * 128; }
+
+/* Other sequences would be OK.  */
+/*
+** cntb_129:
+**	cntb	(x[0-9]+)
+**	lsl	x0, \1, 7
+**	incb	x0
+**	ret
+*/
+PROTO (cntb_129, uint64_t, ()) { return svcntsb () * 129; }
+
+/*
+** cntb_m1:
+**	rdvl	x0, #-1
+**	ret
+*/
+PROTO (cntb_m1, uint64_t, ()) { return -svcntsb (); }
+
+/*
+** cntb_m13:
+**	rdvl	x0, #-13
+**	ret
+*/
+PROTO (cntb_m13, uint64_t, ()) { return -svcntsb () * 13; }
+
+/*
+** cntb_m15:
+**	rdvl	x0, #-15
+**	ret
+*/
+PROTO (cntb_m15, uint64_t, ()) { return -svcntsb () * 15; }
+
+/*
+** cntb_m16:
+**	rdvl	x0, #-16
+**	ret
+*/
+PROTO (cntb_m16, uint64_t, ()) { return -svcntsb () * 16; }
+
+/*
+** cntb_m17:
+**	rdvl	x0, #-17
+**	ret
+*/
+PROTO (cntb_m17, uint64_t, ()) { return -svcntsb () * 17; }
+
+/*
+** cntb_m32:
+**	rdvl	x0, #-32
+**	ret
+*/
+PROTO (cntb_m32, uint64_t, ()) { return -svcntsb () * 32; }
+
+/*
+** cntb_m33:
+**	rdvl	x0, #-32
+**	decb	x0
+**	ret
+*/
+PROTO (cntb_m33, uint64_t, ()) { return -svcntsb () * 33; }
+
+/*
+** cntb_m34:
+**	rdvl	(x[0-9]+), #-17
+**	lsl	x0, \1, #?1
+**	ret
+*/
+PROTO (cntb_m34, uint64_t, ()) { return -svcntsb () * 34; }
+
+/*
+** cntb_m64:
+**	rdvl	(x[0-9]+), #-1
+**	lsl	x0, \1, #?6
+**	ret
+*/
+PROTO (cntb_m64, uint64_t, ()) { return -svcntsb () * 64; }
+
+/*
+** incb_1:
+**	incb	x0
+**	ret
+*/
+PROTO (incb_1, uint64_t, (uint64_t x0)) { return x0 + svcntsb (); }
+
+/*
+** incb_2:
+**	incb	x0, all, mul #2
+**	ret
+*/
+PROTO (incb_2, uint64_t, (uint64_t x0)) { return x0 + svcntsb () * 2; }
+
+/*
+** incb_3:
+**	incb	x0, all, mul #3
+**	ret
+*/
+PROTO (incb_3, uint64_t, (uint64_t x0)) { return x0 + svcntsb () * 3; }
+
+/*
+** incb_4:
+**	incb	x0, all, mul #4
+**	ret
+*/
+PROTO (incb_4, uint64_t, (uint64_t x0)) { return x0 + svcntsb () * 4; }
+
+/*
+** incb_8:
+**	incb	x0, all, mul #8
+**	ret
+*/
+PROTO (incb_8, uint64_t, (uint64_t x0)) { return x0 + svcntsb () * 8; }
+
+/*
+** incb_15:
+**	incb	x0, all, mul #15
+**	ret
+*/
+PROTO (incb_15, uint64_t, (uint64_t x0)) { return x0 + svcntsb () * 15; }
+
+/*
+** incb_16:
+**	incb	x0, all, mul #16
+**	ret
+*/
+PROTO (incb_16, uint64_t, (uint64_t x0)) { return x0 + svcntsb () * 16; }
+
+/*
+** incb_17:
+**	addvl	x0, x0, #17
+**	ret
+*/
+PROTO (incb_17, uint64_t, (uint64_t x0)) { return x0 + svcntsb () * 17; }
+
+/*
+** incb_31:
+**	addvl	x0, x0, #31
+**	ret
+*/
+PROTO (incb_31, uint64_t, (uint64_t x0)) { return x0 + svcntsb () * 31; }
+
+/*
+** decb_1:
+**	decb	x0
+**	ret
+*/
+PROTO (decb_1, uint64_t, (uint64_t x0)) { return x0 - svcntsb (); }
+
+/*
+** decb_2:
+**	decb	x0, all, mul #2
+**	ret
+*/
+PROTO (decb_2, uint64_t, (uint64_t x0)) { return x0 - svcntsb () * 2; }
+
+/*
+** decb_3:
+**	decb	x0, all, mul #3
+**	ret
+*/
+PROTO (decb_3, uint64_t, (uint64_t x0)) { return x0 - svcntsb () * 3; }
+
+/*
+** decb_4:
+**	decb	x0, all, mul #4
+**	ret
+*/
+PROTO (decb_4, uint64_t, (uint64_t x0)) { return x0 - svcntsb () * 4; }
+
+/*
+** decb_8:
+**	decb	x0, all, mul #8
+**	ret
+*/
+PROTO (decb_8, uint64_t, (uint64_t x0)) { return x0 - svcntsb () * 8; }
+
+/*
+** decb_15:
+**	decb	x0, all, mul #15
+**	ret
+*/
+PROTO (decb_15, uint64_t, (uint64_t x0)) { return x0 - svcntsb () * 15; }
+
+/*
+** decb_16:
+**	decb	x0, all, mul #16
+**	ret
+*/
+PROTO (decb_16, uint64_t, (uint64_t x0)) { return x0 - svcntsb () * 16; }
+
+/*
+** decb_17:
+**	addvl	x0, x0, #-17
+**	ret
+*/
+PROTO (decb_17, uint64_t, (uint64_t x0)) { return x0 - svcntsb () * 17; }
+
+/*
+** decb_31:
+**	addvl	x0, x0, #-31
+**	ret
+*/
+PROTO (decb_31, uint64_t, (uint64_t x0)) { return x0 - svcntsb () * 31; }
+
+/*
+** decb_32:
+**	addvl	x0, x0, #-32
+**	ret
+*/
+PROTO (decb_32, uint64_t, (uint64_t x0)) { return x0 - svcntsb () * 32; }
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsb_sc.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsb_sc.c
new file mode 100644
index 00000000000..9ee4c8afc36
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsb_sc.c
@@ -0,0 +1,12 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#define STREAMING_COMPATIBLE
+#define NO_SHARED_ZA
+#include "test_sme_acle.h"
+
+/*
+** cntsb:
+**	rdsvl	x0, #1
+**	ret
+*/
+PROTO (cntsb, uint64_t, ()) { return svcntsb (); }
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsd_s.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsd_s.c
new file mode 100644
index 00000000000..3bf9498e925
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsd_s.c
@@ -0,0 +1,277 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#define NO_SHARED_ZA
+#include "test_sme_acle.h"
+
+/*
+** cntd_1:
+**	cntd	x0
+**	ret
+*/
+PROTO (cntd_1, uint64_t, ()) { return svcntsd (); }
+
+/*
+** cntd_2:
+**	cntw	x0
+**	ret
+*/
+PROTO (cntd_2, uint64_t, ()) { return svcntsd () * 2; }
+
+/*
+** cntd_3:
+**	cntd	x0, all, mul #3
+**	ret
+*/
+PROTO (cntd_3, uint64_t, ()) { return svcntsd () * 3; }
+
+/*
+** cntd_4:
+**	cnth	x0
+**	ret
+*/
+PROTO (cntd_4, uint64_t, ()) { return svcntsd () * 4; }
+
+/*
+** cntd_8:
+**	cntb	x0
+**	ret
+*/
+PROTO (cntd_8, uint64_t, ()) { return svcntsd () * 8; }
+
+/*
+** cntd_15:
+**	cntd	x0, all, mul #15
+**	ret
+*/
+PROTO (cntd_15, uint64_t, ()) { return svcntsd () * 15; }
+
+/*
+** cntd_16:
+**	cntb	x0, all, mul #2
+**	ret
+*/
+PROTO (cntd_16, uint64_t, ()) { return svcntsd () * 16; }
+
+/* Other sequences would be OK.  */
+/*
+** cntd_17:
+**	rdvl	(x[0-9]+), #17
+**	asr	x0, \1, 3
+**	ret
+*/
+PROTO (cntd_17, uint64_t, ()) { return svcntsd () * 17; }
+
+/*
+** cntd_32:
+**	cntb	x0, all, mul #4
+**	ret
+*/
+PROTO (cntd_32, uint64_t, ()) { return svcntsd () * 32; }
+
+/*
+** cntd_64:
+**	cntb	x0, all, mul #8
+**	ret
+*/
+PROTO (cntd_64, uint64_t, ()) { return svcntsd () * 64; }
+
+/*
+** cntd_128:
+**	cntb	x0, all, mul #16
+**	ret
+*/
+PROTO (cntd_128, uint64_t, ()) { return svcntsd () * 128; }
+
+/*
+** cntd_m1:
+**	cntd	(x[0-9]+)
+**	neg	x0, \1
+**	ret
+*/
+PROTO (cntd_m1, uint64_t, ()) { return -svcntsd (); }
+
+/*
+** cntd_m13:
+**	cntd	(x[0-9]+), all, mul #13
+**	neg	x0, \1
+**	ret
+*/
+PROTO (cntd_m13, uint64_t, ()) { return -svcntsd () * 13; }
+
+/*
+** cntd_m15:
+**	cntd	(x[0-9]+), all, mul #15
+**	neg	x0, \1
+**	ret
+*/
+PROTO (cntd_m15, uint64_t, ()) { return -svcntsd () * 15; }
+
+/*
+** cntd_m16:
+**	rdvl	x0, #-2
+**	ret
+*/
+PROTO (cntd_m16, uint64_t, ()) { return -svcntsd () * 16; }
+
+/* Other sequences would be OK.  */
+/*
+** cntd_m17:
+**	rdvl	(x[0-9]+), #-17
+**	asr	x0, \1, 3
+**	ret
+*/
+PROTO (cntd_m17, uint64_t, ()) { return -svcntsd () * 17; }
+
+/*
+** incd_1:
+**	incd	x0
+**	ret
+*/
+PROTO (incd_1, uint64_t, (uint64_t x0)) { return x0 + svcntsd (); }
+
+/*
+** incd_2:
+**	incw	x0
+**	ret
+*/
+PROTO (incd_2, uint64_t, (uint64_t x0)) { return x0 + svcntsd () * 2; }
+
+/*
+** incd_3:
+**	incd	x0, all, mul #3
+**	ret
+*/
+PROTO (incd_3, uint64_t, (uint64_t x0)) { return x0 + svcntsd () * 3; }
+
+/*
+** incd_4:
+**	inch	x0
+**	ret
+*/
+PROTO (incd_4, uint64_t, (uint64_t x0)) { return x0 + svcntsd () * 4; }
+
+/*
+** incd_7:
+**	incd	x0, all, mul #7
+**	ret
+*/
+PROTO (incd_7, uint64_t, (uint64_t x0)) { return x0 + svcntsd () * 7; }
+
+/*
+** incd_8:
+**	incb	x0
+**	ret
+*/
+PROTO (incd_8, uint64_t, (uint64_t x0)) { return x0 + svcntsd () * 8; }
+
+/*
+** incd_9:
+**	incd	x0, all, mul #9
+**	ret
+*/
+PROTO (incd_9, uint64_t, (uint64_t x0)) { return x0 + svcntsd () * 9; }
+
+/*
+** incd_15:
+**	incd	x0, all, mul #15
+**	ret
+*/
+PROTO (incd_15, uint64_t, (uint64_t x0)) { return x0 + svcntsd () * 15; }
+
+/*
+** incd_16:
+**	incb	x0, all, mul #2
+**	ret
+*/
+PROTO (incd_16, uint64_t, (uint64_t x0)) { return x0 + svcntsd () * 16; }
+
+/*
+** incd_18:
+**	incw	x0, all, mul #9
+**	ret
+*/
+PROTO (incd_18, uint64_t, (uint64_t x0)) { return x0 + svcntsd () * 18; }
+
+/*
+** incd_30:
+**	incw	x0, all, mul #15
+**	ret
+*/
+PROTO (incd_30, uint64_t, (uint64_t x0)) { return x0 + svcntsd () * 30; }
+
+/*
+** decd_1:
+**	decd	x0
+**	ret
+*/
+PROTO (decd_1, uint64_t, (uint64_t x0)) { return x0 - svcntsd (); }
+
+/*
+** decd_2:
+**	decw	x0
+**	ret
+*/
+PROTO (decd_2, uint64_t, (uint64_t x0)) { return x0 - svcntsd () * 2; }
+
+/*
+** decd_3:
+**	decd	x0, all, mul #3
+**	ret
+*/
+PROTO (decd_3, uint64_t, (uint64_t x0)) { return x0 - svcntsd () * 3; }
+
+/*
+** decd_4:
+**	dech	x0
+**	ret
+*/
+PROTO (decd_4, uint64_t, (uint64_t x0)) { return x0 - svcntsd () * 4; }
+
+/*
+** decd_7:
+**	decd	x0, all, mul #7
+**	ret
+*/
+PROTO (decd_7, uint64_t, (uint64_t x0)) { return x0 - svcntsd () * 7; }
+
+/*
+** decd_8:
+**	decb	x0
+**	ret
+*/
+PROTO (decd_8, uint64_t, (uint64_t x0)) { return x0 - svcntsd () * 8; }
+
+/*
+** decd_9:
+**	decd	x0, all, mul #9
+**	ret
+*/
+PROTO (decd_9, uint64_t, (uint64_t x0)) { return x0 - svcntsd () * 9; }
+
+/*
+** decd_15:
+**	decd	x0, all, mul #15
+**	ret
+*/
+PROTO (decd_15, uint64_t, (uint64_t x0)) { return x0 - svcntsd () * 15; }
+
+/*
+** decd_16:
+**	decb	x0, all, mul #2
+**	ret
+*/
+PROTO (decd_16, uint64_t, (uint64_t x0)) { return x0 - svcntsd () * 16; }
+
+/*
+** decd_18:
+**	decw	x0, all, mul #9
+**	ret
+*/
+PROTO (decd_18, uint64_t, (uint64_t x0)) { return x0 - svcntsd () * 18; }
+
+/*
+** decd_30:
+**	decw	x0, all, mul #15
+**	ret
+*/
+PROTO (decd_30, uint64_t, (uint64_t x0)) { return x0 - svcntsd () * 30; }
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsd_sc.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsd_sc.c
new file mode 100644
index 00000000000..90fb374bac9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsd_sc.c
@@ -0,0 +1,13 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#define STREAMING_COMPATIBLE
+#define NO_SHARED_ZA
+#include "test_sme_acle.h"
+
+/*
+** cntsd:
+**	rdsvl	(x[0-9]+), #1
+**	lsr	x0, \1, #?3
+**	ret
+*/
+PROTO (cntsd, uint64_t, ()) { return svcntsd (); }
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsh_s.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsh_s.c
new file mode 100644
index 00000000000..021c39a1467
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsh_s.c
@@ -0,0 +1,279 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#define NO_SHARED_ZA
+#include "test_sme_acle.h"
+
+/*
+** cnth_1:
+**	cnth	x0
+**	ret
+*/
+PROTO (cnth_1, uint64_t, ()) { return svcntsh (); }
+
+/*
+** cnth_2:
+**	cntb	x0
+**	ret
+*/
+PROTO (cnth_2, uint64_t, ()) { return svcntsh () * 2; }
+
+/*
+** cnth_3:
+**	cnth	x0, all, mul #3
+**	ret
+*/
+PROTO (cnth_3, uint64_t, ()) { return svcntsh () * 3; }
+
+/*
+** cnth_4:
+**	cntb	x0, all, mul #2
+**	ret
+*/
+PROTO (cnth_4, uint64_t, ()) { return svcntsh () * 4; }
+
+/*
+** cnth_8:
+**	cntb	x0, all, mul #4
+**	ret
+*/
+PROTO (cnth_8, uint64_t, ()) { return svcntsh () * 8; }
+
+/*
+** cnth_15:
+**	cnth	x0, all, mul #15
+**	ret
+*/
+PROTO (cnth_15, uint64_t, ()) { return svcntsh () * 15; }
+
+/*
+** cnth_16:
+**	cntb	x0, all, mul #8
+**	ret
+*/
+PROTO (cnth_16, uint64_t, ()) { return svcntsh () * 16; }
+
+/* Other sequences would be OK.  */
+/*
+** cnth_17:
+**	rdvl	(x[0-9]+), #17
+**	asr	x0, \1, 1
+**	ret
+*/
+PROTO (cnth_17, uint64_t, ()) { return svcntsh () * 17; }
+
+/*
+** cnth_32:
+**	cntb	x0, all, mul #16
+**	ret
+*/
+PROTO (cnth_32, uint64_t, ()) { return svcntsh () * 32; }
+
+/*
+** cnth_64:
+**	cntb	(x[0-9]+)
+**	lsl	x0, \1, 5
+**	ret
+*/
+PROTO (cnth_64, uint64_t, ()) { return svcntsh () * 64; }
+
+/*
+** cnth_128:
+**	cntb	(x[0-9]+)
+**	lsl	x0, \1, 6
+**	ret
+*/
+PROTO (cnth_128, uint64_t, ()) { return svcntsh () * 128; }
+
+/*
+** cnth_m1:
+**	cnth	(x[0-9]+)
+**	neg	x0, \1
+**	ret
+*/
+PROTO (cnth_m1, uint64_t, ()) { return -svcntsh (); }
+
+/*
+** cnth_m13:
+**	cnth	(x[0-9]+), all, mul #13
+**	neg	x0, \1
+**	ret
+*/
+PROTO (cnth_m13, uint64_t, ()) { return -svcntsh () * 13; }
+
+/*
+** cnth_m15:
+**	cnth	(x[0-9]+), all, mul #15
+**	neg	x0, \1
+**	ret
+*/
+PROTO (cnth_m15, uint64_t, ()) { return -svcntsh () * 15; }
+
+/*
+** cnth_m16:
+**	rdvl	x0, #-8
+**	ret
+*/
+PROTO (cnth_m16, uint64_t, ()) { return -svcntsh () * 16; }
+
+/* Other sequences would be OK.  */
+/*
+** cnth_m17:
+**	rdvl	(x[0-9]+), #-17
+**	asr	x0, \1, 1
+**	ret
+*/
+PROTO (cnth_m17, uint64_t, ()) { return -svcntsh () * 17; }
+
+/*
+** inch_1:
+**	inch	x0
+**	ret
+*/
+PROTO (inch_1, uint64_t, (uint64_t x0)) { return x0 + svcntsh (); }
+
+/*
+** inch_2:
+**	incb	x0
+**	ret
+*/
+PROTO (inch_2, uint64_t, (uint64_t x0)) { return x0 + svcntsh () * 2; }
+
+/*
+** inch_3:
+**	inch	x0, all, mul #3
+**	ret
+*/
+PROTO (inch_3, uint64_t, (uint64_t x0)) { return x0 + svcntsh () * 3; }
+
+/*
+** inch_4:
+**	incb	x0, all, mul #2
+**	ret
+*/
+PROTO (inch_4, uint64_t, (uint64_t x0)) { return x0 + svcntsh () * 4; }
+
+/*
+** inch_7:
+**	inch	x0, all, mul #7
+**	ret
+*/
+PROTO (inch_7, uint64_t, (uint64_t x0)) { return x0 + svcntsh () * 7; }
+
+/*
+** inch_8:
+**	incb	x0, all, mul #4
+**	ret
+*/
+PROTO (inch_8, uint64_t, (uint64_t x0)) { return x0 + svcntsh () * 8; }
+
+/*
+** inch_9:
+**	inch	x0, all, mul #9
+**	ret
+*/
+PROTO (inch_9, uint64_t, (uint64_t x0)) { return x0 + svcntsh () * 9; }
+
+/*
+** inch_15:
+**	inch	x0, all, mul #15
+**	ret
+*/
+PROTO (inch_15, uint64_t, (uint64_t x0)) { return x0 + svcntsh () * 15; }
+
+/*
+** inch_16:
+**	incb	x0, all, mul #8
+**	ret
+*/
+PROTO (inch_16, uint64_t, (uint64_t x0)) { return x0 + svcntsh () * 16; }
+
+/*
+** inch_18:
+**	incb	x0, all, mul #9
+**	ret
+*/
+PROTO (inch_18, uint64_t, (uint64_t x0)) { return x0 + svcntsh () * 18; }
+
+/*
+** inch_30:
+**	incb	x0, all, mul #15
+**	ret
+*/
+PROTO (inch_30, uint64_t, (uint64_t x0)) { return x0 + svcntsh () * 30; }
+
+/*
+** dech_1:
+**	dech	x0
+**	ret
+*/
+PROTO (dech_1, uint64_t, (uint64_t x0)) { return x0 - svcntsh (); }
+
+/*
+** dech_2:
+**	decb	x0
+**	ret
+*/
+PROTO (dech_2, uint64_t, (uint64_t x0)) { return x0 - svcntsh () * 2; }
+
+/*
+** dech_3:
+**	dech	x0, all, mul #3
+**	ret
+*/
+PROTO (dech_3, uint64_t, (uint64_t x0)) { return x0 - svcntsh () * 3; }
+
+/*
+** dech_4:
+**	decb	x0, all, mul #2
+**	ret
+*/
+PROTO (dech_4, uint64_t, (uint64_t x0)) { return x0 - svcntsh () * 4; }
+
+/*
+** dech_7:
+**	dech	x0, all, mul #7
+**	ret
+*/
+PROTO (dech_7, uint64_t, (uint64_t x0)) { return x0 - svcntsh () * 7; }
+
+/*
+** dech_8:
+**	decb	x0, all, mul #4
+**	ret
+*/
+PROTO (dech_8, uint64_t, (uint64_t x0)) { return x0 - svcntsh () * 8; }
+
+/*
+** dech_9:
+**	dech	x0, all, mul #9
+**	ret
+*/
+PROTO (dech_9, uint64_t, (uint64_t x0)) { return x0 - svcntsh () * 9; }
+
+/*
+** dech_15:
+**	dech	x0, all, mul #15
+**	ret
+*/
+PROTO (dech_15, uint64_t, (uint64_t x0)) { return x0 - svcntsh () * 15; }
+
+/*
+** dech_16:
+**	decb	x0, all, mul #8
+**	ret
+*/
+PROTO (dech_16, uint64_t, (uint64_t x0)) { return x0 - svcntsh () * 16; }
+
+/*
+** dech_18:
+**	decb	x0, all, mul #9
+**	ret
+*/
+PROTO (dech_18, uint64_t, (uint64_t x0)) { return x0 - svcntsh () * 18; }
+
+/*
+** dech_30:
+**	decb	x0, all, mul #15
+**	ret
+*/
+PROTO (dech_30, uint64_t, (uint64_t x0)) { return x0 - svcntsh () * 30; }
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsh_sc.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsh_sc.c
new file mode 100644
index 00000000000..9f6c85208a6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsh_sc.c
@@ -0,0 +1,13 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#define STREAMING_COMPATIBLE
+#define NO_SHARED_ZA
+#include "test_sme_acle.h"
+
+/*
+** cntsh:
+**	rdsvl	(x[0-9]+), #1
+**	lsr	x0, \1, #?1
+**	ret
+*/
+PROTO (cntsh, uint64_t, ()) { return svcntsh (); }
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsw_s.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsw_s.c
new file mode 100644
index 00000000000..c421e1b8e1f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsw_s.c
@@ -0,0 +1,278 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#define NO_SHARED_ZA
+#include "test_sme_acle.h"
+
+/*
+** cntw_1:
+**	cntw	x0
+**	ret
+*/
+PROTO (cntw_1, uint64_t, ()) { return svcntsw (); }
+
+/*
+** cntw_2:
+**	cnth	x0
+**	ret
+*/
+PROTO (cntw_2, uint64_t, ()) { return svcntsw () * 2; }
+
+/*
+** cntw_3:
+**	cntw	x0, all, mul #3
+**	ret
+*/
+PROTO (cntw_3, uint64_t, ()) { return svcntsw () * 3; }
+
+/*
+** cntw_4:
+**	cntb	x0
+**	ret
+*/
+PROTO (cntw_4, uint64_t, ()) { return svcntsw () * 4; }
+
+/*
+** cntw_8:
+**	cntb	x0, all, mul #2
+**	ret
+*/
+PROTO (cntw_8, uint64_t, ()) { return svcntsw () * 8; }
+
+/*
+** cntw_15:
+**	cntw	x0, all, mul #15
+**	ret
+*/
+PROTO (cntw_15, uint64_t, ()) { return svcntsw () * 15; }
+
+/*
+** cntw_16:
+**	cntb	x0, all, mul #4
+**	ret
+*/
+PROTO (cntw_16, uint64_t, ()) { return svcntsw () * 16; }
+
+/* Other sequences would be OK.  */
+/*
+** cntw_17:
+**	rdvl	(x[0-9]+), #17
+**	asr	x0, \1, 2
+**	ret
+*/
+PROTO (cntw_17, uint64_t, ()) { return svcntsw () * 17; }
+
+/*
+** cntw_32:
+**	cntb	x0, all, mul #8
+**	ret
+*/
+PROTO (cntw_32, uint64_t, ()) { return svcntsw () * 32; }
+
+/*
+** cntw_64:
+**	cntb	x0, all, mul #16
+**	ret
+*/
+PROTO (cntw_64, uint64_t, ()) { return svcntsw () * 64; }
+
+/*
+** cntw_128:
+**	cntb	(x[0-9]+)
+**	lsl	x0, \1, 5
+**	ret
+*/
+PROTO (cntw_128, uint64_t, ()) { return svcntsw () * 128; }
+
+/*
+** cntw_m1:
+**	cntw	(x[0-9]+)
+**	neg	x0, \1
+**	ret
+*/
+PROTO (cntw_m1, uint64_t, ()) { return -svcntsw (); }
+
+/*
+** cntw_m13:
+**	cntw	(x[0-9]+), all, mul #13
+**	neg	x0, \1
+**	ret
+*/
+PROTO (cntw_m13, uint64_t, ()) { return -svcntsw () * 13; }
+
+/*
+** cntw_m15:
+**	cntw	(x[0-9]+), all, mul #15
+**	neg	x0, \1
+**	ret
+*/
+PROTO (cntw_m15, uint64_t, ()) { return -svcntsw () * 15; }
+
+/*
+** cntw_m16:
+**	rdvl	x0, #-4
+**	ret
+*/
+PROTO (cntw_m16, uint64_t, ()) { return -svcntsw () * 16; }
+
+/* Other sequences would be OK.  */
+/*
+** cntw_m17:
+**	rdvl	(x[0-9]+), #-17
+**	asr	x0, \1, 2
+**	ret
+*/
+PROTO (cntw_m17, uint64_t, ()) { return -svcntsw () * 17; }
+
+/*
+** incw_1:
+**	incw	x0
+**	ret
+*/
+PROTO (incw_1, uint64_t, (uint64_t x0)) { return x0 + svcntsw (); }
+
+/*
+** incw_2:
+**	inch	x0
+**	ret
+*/
+PROTO (incw_2, uint64_t, (uint64_t x0)) { return x0 + svcntsw () * 2; }
+
+/*
+** incw_3:
+**	incw	x0, all, mul #3
+**	ret
+*/
+PROTO (incw_3, uint64_t, (uint64_t x0)) { return x0 + svcntsw () * 3; }
+
+/*
+** incw_4:
+**	incb	x0
+**	ret
+*/
+PROTO (incw_4, uint64_t, (uint64_t x0)) { return x0 + svcntsw () * 4; }
+
+/*
+** incw_7:
+**	incw	x0, all, mul #7
+**	ret
+*/
+PROTO (incw_7, uint64_t, (uint64_t x0)) { return x0 + svcntsw () * 7; }
+
+/*
+** incw_8:
+**	incb	x0, all, mul #2
+**	ret
+*/
+PROTO (incw_8, uint64_t, (uint64_t x0)) { return x0 + svcntsw () * 8; }
+
+/*
+** incw_9:
+**	incw	x0, all, mul #9
+**	ret
+*/
+PROTO (incw_9, uint64_t, (uint64_t x0)) { return x0 + svcntsw () * 9; }
+
+/*
+** incw_15:
+**	incw	x0, all, mul #15
+**	ret
+*/
+PROTO (incw_15, uint64_t, (uint64_t x0)) { return x0 + svcntsw () * 15; }
+
+/*
+** incw_16:
+**	incb	x0, all, mul #4
+**	ret
+*/
+PROTO (incw_16, uint64_t, (uint64_t x0)) { return x0 + svcntsw () * 16; }
+
+/*
+** incw_18:
+**	inch	x0, all, mul #9
+**	ret
+*/
+PROTO (incw_18, uint64_t, (uint64_t x0)) { return x0 + svcntsw () * 18; }
+
+/*
+** incw_30:
+**	inch	x0, all, mul #15
+**	ret
+*/
+PROTO (incw_30, uint64_t, (uint64_t x0)) { return x0 + svcntsw () * 30; }
+
+/*
+** decw_1:
+**	decw	x0
+**	ret
+*/
+PROTO (decw_1, uint64_t, (uint64_t x0)) { return x0 - svcntsw (); }
+
+/*
+** decw_2:
+**	dech	x0
+**	ret
+*/
+PROTO (decw_2, uint64_t, (uint64_t x0)) { return x0 - svcntsw () * 2; }
+
+/*
+** decw_3:
+**	decw	x0, all, mul #3
+**	ret
+*/
+PROTO (decw_3, uint64_t, (uint64_t x0)) { return x0 - svcntsw () * 3; }
+
+/*
+** decw_4:
+**	decb	x0
+**	ret
+*/
+PROTO (decw_4, uint64_t, (uint64_t x0)) { return x0 - svcntsw () * 4; }
+
+/*
+** decw_7:
+**	decw	x0, all, mul #7
+**	ret
+*/
+PROTO (decw_7, uint64_t, (uint64_t x0)) { return x0 - svcntsw () * 7; }
+
+/*
+** decw_8:
+**	decb	x0, all, mul #2
+**	ret
+*/
+PROTO (decw_8, uint64_t, (uint64_t x0)) { return x0 - svcntsw () * 8; }
+
+/*
+** decw_9:
+**	decw	x0, all, mul #9
+**	ret
+*/
+PROTO (decw_9, uint64_t, (uint64_t x0)) { return x0 - svcntsw () * 9; }
+
+/*
+** decw_15:
+**	decw	x0, all, mul #15
+**	ret
+*/
+PROTO (decw_15, uint64_t, (uint64_t x0)) { return x0 - svcntsw () * 15; }
+
+/*
+** decw_16:
+**	decb	x0, all, mul #4
+**	ret
+*/
+PROTO (decw_16, uint64_t, (uint64_t x0)) { return x0 - svcntsw () * 16; }
+
+/*
+** decw_18:
+**	dech	x0, all, mul #9
+**	ret
+*/
+PROTO (decw_18, uint64_t, (uint64_t x0)) { return x0 - svcntsw () * 18; }
+
+/*
+** decw_30:
+**	dech	x0, all, mul #15
+**	ret
+*/
+PROTO (decw_30, uint64_t, (uint64_t x0)) { return x0 - svcntsw () * 30; }
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsw_sc.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsw_sc.c
new file mode 100644
index 00000000000..75ca937c48f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/cntsw_sc.c
@@ -0,0 +1,13 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#define STREAMING_COMPATIBLE
+#define NO_SHARED_ZA
+#include "test_sme_acle.h"
+
+/*
+** cntsw:
+**	rdsvl	(x[0-9]+), #1
+**	lsr	x0, \1, #?2
+**	ret
+*/
+PROTO (cntsw, uint64_t, ()) { return svcntsw (); }
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_vnum_za128.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_vnum_za128.c
new file mode 100644
index 00000000000..897b5522d8d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_vnum_za128.c
@@ -0,0 +1,46 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sme_acle.h"
+
+/*
+** ld1_vnum_za128_0_0:
+**	mov	(w1[2-5]), w0
+**	ld1q	{ za0h\.q\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_vnum_za128_0_0,
+	      svld1_hor_vnum_za128 (0, w0, p0, x1, 0),
+	      svld1_hor_vnum_za128 (0, w0, p0, x1, 0))
+
+/*
+** ld1_vnum_za128_5_0:
+**	incb	x1, all, mul #13
+**	mov	(w1[2-5]), w0
+**	ld1q	{ za5h\.q\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_vnum_za128_5_0,
+	      svld1_hor_vnum_za128 (5, w0, p0, x1, 13),
+	      svld1_hor_vnum_za128 (5, w0, p0, x1, 13))
+
+/*
+** ld1_vnum_za128_11_0:
+**	cntb	(x[0-9]+)
+**	madd	(x[0-9]+), (?:\1, x2|x2, \1), x1
+**	mov	(w1[2-5]), w0
+**	ld1q	{ za11h\.q\[\3, 0\] }, p0/z, \[\2\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_vnum_za128_11_0,
+	      svld1_hor_vnum_za128 (11, w0, p0, x1, x2),
+	      svld1_hor_vnum_za128 (11, w0, p0, x1, x2))
+
+/*
+** ld1_vnum_za128_0_1:
+**	add	(w1[2-5]), w0, #?1
+**	ld1q	{ za0h\.q\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_vnum_za128_0_1,
+	      svld1_hor_vnum_za128 (0, w0 + 1, p0, x1, 0),
+	      svld1_hor_vnum_za128 (0, w0 + 1, p0, x1, 0))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_vnum_za16.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_vnum_za16.c
new file mode 100644
index 00000000000..4cf4417b9b0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_vnum_za16.c
@@ -0,0 +1,46 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sme_acle.h"
+
+/*
+** ld1_vnum_za16_0_0:
+**	mov	(w1[2-5]), w0
+**	ld1h	{ za0h\.h\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_vnum_za16_0_0,
+	      svld1_hor_vnum_za16 (0, w0, p0, x1, 0),
+	      svld1_hor_vnum_za16 (0, w0, p0, x1, 0))
+
+/*
+** ld1_vnum_za16_0_1:
+**	incb	x1, all, mul #9
+**	mov	(w1[2-5]), w0
+**	ld1h	{ za0h\.h\[\1, 1\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_vnum_za16_0_1,
+	      svld1_hor_vnum_za16 (0, w0 + 1, p0, x1, 9),
+	      svld1_hor_vnum_za16 (0, w0 + 1, p0, x1, 9))
+
+/*
+** ld1_vnum_za16_1_7:
+**	cntb	(x[0-9]+)
+**	madd	(x[0-9]+), (?:\1, x2|x2, \1), x1
+**	mov	(w1[2-5]), w0
+**	ld1h	{ za1h\.h\[\3, 7\] }, p0/z, \[\2\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_vnum_za16_1_7,
+	      svld1_hor_vnum_za16 (1, w0 + 7, p0, x1, x2),
+	      svld1_hor_vnum_za16 (1, w0 + 7, p0, x1, x2))
+
+/*
+** ld1_vnum_za16_0_8:
+**	add	(w1[2-5]), w0, #?8
+**	ld1h	{ za0h\.h\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_vnum_za16_0_8,
+	      svld1_hor_vnum_za16 (0, w0 + 8, p0, x1, 0),
+	      svld1_hor_vnum_za16 (0, w0 + 8, p0, x1, 0))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_vnum_za32.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_vnum_za32.c
new file mode 100644
index 00000000000..9dc0d0b0309
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_vnum_za32.c
@@ -0,0 +1,46 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sme_acle.h"
+
+/*
+** ld1_vnum_za32_0_0:
+**	mov	(w1[2-5]), w0
+**	ld1w	{ za0h\.s\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_vnum_za32_0_0,
+	      svld1_hor_vnum_za32 (0, w0, p0, x1, 0),
+	      svld1_hor_vnum_za32 (0, w0, p0, x1, 0))
+
+/*
+** ld1_vnum_za32_0_1:
+**	incb	x1, all, mul #5
+**	mov	(w1[2-5]), w0
+**	ld1w	{ za0h\.s\[\1, 1\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_vnum_za32_0_1,
+	      svld1_hor_vnum_za32 (0, w0 + 1, p0, x1, 5),
+	      svld1_hor_vnum_za32 (0, w0 + 1, p0, x1, 5))
+
+/*
+** ld1_vnum_za32_2_3:
+**	cntb	(x[0-9]+)
+**	madd	(x[0-9]+), (?:\1, x2|x2, \1), x1
+**	mov	(w1[2-5]), w0
+**	ld1w	{ za2h\.s\[\3, 3\] }, p0/z, \[\2\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_vnum_za32_2_3,
+	      svld1_hor_vnum_za32 (2, w0 + 3, p0, x1, x2),
+	      svld1_hor_vnum_za32 (2, w0 + 3, p0, x1, x2))
+
+/*
+** ld1_vnum_za32_0_4:
+**	add	(w1[2-5]), w0, #?4
+**	ld1w	{ za0h\.s\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_vnum_za32_0_4,
+	      svld1_hor_vnum_za32 (0, w0 + 4, p0, x1, 0),
+	      svld1_hor_vnum_za32 (0, w0 + 4, p0, x1, 0))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_vnum_za64.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_vnum_za64.c
new file mode 100644
index 00000000000..ad3258718a6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_vnum_za64.c
@@ -0,0 +1,46 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sme_acle.h"
+
+/*
+** ld1_vnum_za64_0_0:
+**	mov	(w1[2-5]), w0
+**	ld1d	{ za0h\.d\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_vnum_za64_0_0,
+	      svld1_hor_vnum_za64 (0, w0, p0, x1, 0),
+	      svld1_hor_vnum_za64 (0, w0, p0, x1, 0))
+
+/*
+** ld1_vnum_za64_0_1:
+**	incb	x1, all, mul #13
+**	mov	(w1[2-5]), w0
+**	ld1d	{ za0h\.d\[\1, 1\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_vnum_za64_0_1,
+	      svld1_hor_vnum_za64 (0, w0 + 1, p0, x1, 13),
+	      svld1_hor_vnum_za64 (0, w0 + 1, p0, x1, 13))
+
+/*
+** ld1_vnum_za64_5_1:
+**	cntb	(x[0-9]+)
+**	madd	(x[0-9]+), (?:\1, x2|x2, \1), x1
+**	mov	(w1[2-5]), w0
+**	ld1d	{ za5h\.d\[\3, 1\] }, p0/z, \[\2\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_vnum_za64_5_1,
+	      svld1_hor_vnum_za64 (5, w0 + 1, p0, x1, x2),
+	      svld1_hor_vnum_za64 (5, w0 + 1, p0, x1, x2))
+
+/*
+** ld1_vnum_za64_0_2:
+**	add	(w1[2-5]), w0, #?2
+**	ld1d	{ za0h\.d\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_vnum_za64_0_2,
+	      svld1_hor_vnum_za64 (0, w0 + 2, p0, x1, 0),
+	      svld1_hor_vnum_za64 (0, w0 + 2, p0, x1, 0))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_vnum_za8.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_vnum_za8.c
new file mode 100644
index 00000000000..68b43dc32a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_vnum_za8.c
@@ -0,0 +1,46 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sme_acle.h"
+
+/*
+** ld1_vnum_za8_0_0:
+**	mov	(w1[2-5]), w0
+**	ld1b	{ za0h\.b\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_vnum_za8_0_0,
+	      svld1_hor_vnum_za8 (0, w0, p0, x1, 0),
+	      svld1_hor_vnum_za8 (0, w0, p0, x1, 0))
+
+/*
+** ld1_vnum_za8_0_1:
+**	incb	x1, all, mul #11
+**	mov	(w1[2-5]), w0
+**	ld1b	{ za0h\.b\[\1, 1\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_vnum_za8_0_1,
+	      svld1_hor_vnum_za8 (0, w0 + 1, p0, x1, 11),
+	      svld1_hor_vnum_za8 (0, w0 + 1, p0, x1, 11))
+
+/*
+** ld1_vnum_za8_0_15:
+**	cntb	(x[0-9]+)
+**	mul	(x[0-9]+), (?:\1, x2|x2, \1)
+**	mov	(w1[2-5]), w0
+**	ld1b	{ za0h\.b\[\3, 15\] }, p0/z, \[x1, \2\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_vnum_za8_0_15,
+	      svld1_hor_vnum_za8 (0, w0 + 15, p0, x1, x2),
+	      svld1_hor_vnum_za8 (0, w0 + 15, p0, x1, x2))
+
+/*
+** ld1_vnum_za8_0_16:
+**	add	(w1[2-5]), w0, #?16
+**	ld1b	{ za0h\.b\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_vnum_za8_0_16,
+	      svld1_hor_vnum_za8 (0, w0 + 16, p0, x1, 0),
+	      svld1_hor_vnum_za8 (0, w0 + 16, p0, x1, 0))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_za128.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_za128.c
new file mode 100644
index 00000000000..554028a7e3e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_za128.c
@@ -0,0 +1,63 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sme_acle.h"
+
+/*
+** ld1_za128_0_0:
+**	mov	(w1[2-5]), w0
+**	ld1q	{ za0h\.q\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za128_0_0,
+	      svld1_hor_za128 (0, w0, p0, x1),
+	      svld1_hor_za128 (0, w0, p0, x1))
+
+/*
+** ld1_za128_0_1:
+**	add	(w1[2-5]), w0, #?1
+**	ld1q	{ za0h\.q\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za128_0_1,
+	      svld1_hor_za128 (0, w0 + 1, p0, x1),
+	      svld1_hor_za128 (0, w0 + 1, p0, x1))
+
+/*
+** ld1_za128_7_0:
+**	mov	(w1[2-5]), w0
+**	ld1q	{ za7h\.q\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za128_7_0,
+	      svld1_hor_za128 (7, w0, p0, x1),
+	      svld1_hor_za128 (7, w0, p0, x1))
+
+/*
+** ld1_za128_13_0:
+**	mov	(w1[2-5]), w0
+**	ld1q	{ za13h\.q\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za128_13_0,
+	      svld1_hor_za128 (13, w0, p0, x1),
+	      svld1_hor_za128 (13, w0, p0, x1))
+
+/*
+** ld1_za128_15_0:
+**	mov	(w1[2-5]), w0
+**	ld1q	{ za15h\.q\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za128_15_0,
+	      svld1_hor_za128 (15, w0, p0, x1),
+	      svld1_hor_za128 (15, w0, p0, x1))
+
+/*
+** ld1_za128_9_0_index:
+**	mov	(w1[2-5]), w0
+**	ld1q	{ za9h\.q\[\1, 0\] }, p0/z, \[x1, x2, lsl #?4\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za128_9_0_index,
+	      svld1_hor_za128 (9, w0, p0, x1 + x2 * 16),
+	      svld1_hor_za128 (9, w0, p0, x1 + x2 * 16))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_za16.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_za16.c
new file mode 100644
index 00000000000..4f807e6aa7a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_za16.c
@@ -0,0 +1,94 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sme_acle.h"
+
+/*
+** ld1_za16_0_0:
+**	mov	(w1[2-5]), w0
+**	ld1h	{ za0h\.h\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za16_0_0,
+	      svld1_hor_za16 (0, w0, p0, x1),
+	      svld1_hor_za16 (0, w0, p0, x1))
+
+/*
+** ld1_za16_0_1:
+**	mov	(w1[2-5]), w0
+**	ld1h	{ za0h\.h\[\1, 1\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za16_0_1,
+	      svld1_hor_za16 (0, w0 + 1, p0, x1),
+	      svld1_hor_za16 (0, w0 + 1, p0, x1))
+
+/*
+** ld1_za16_0_7:
+**	mov	(w1[2-5]), w0
+**	ld1h	{ za0h\.h\[\1, 7\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za16_0_7,
+	      svld1_hor_za16 (0, w0 + 7, p0, x1),
+	      svld1_hor_za16 (0, w0 + 7, p0, x1))
+
+/*
+** ld1_za16_1_0:
+**	mov	(w1[2-5]), w0
+**	ld1h	{ za1h\.h\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za16_1_0,
+	      svld1_hor_za16 (1, w0, p0, x1),
+	      svld1_hor_za16 (1, w0, p0, x1))
+
+
+/*
+** ld1_za16_1_1:
+**	mov	(w1[2-5]), w0
+**	ld1h	{ za1h\.h\[\1, 1\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za16_1_1,
+	      svld1_hor_za16 (1, w0 + 1, p0, x1),
+	      svld1_hor_za16 (1, w0 + 1, p0, x1))
+
+/*
+** ld1_za16_1_7:
+**	mov	(w1[2-5]), w0
+**	ld1h	{ za1h\.h\[\1, 7\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za16_1_7,
+	      svld1_hor_za16 (1, w0 + 7, p0, x1),
+	      svld1_hor_za16 (1, w0 + 7, p0, x1))
+
+/*
+** ld1_za16_1_5_index:
+**	mov	(w1[2-5]), w0
+**	ld1h	{ za1h\.h\[\1, 5\] }, p0/z, \[x1, x2, lsl #?1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za16_1_5_index,
+	      svld1_hor_za16 (1, w0 + 5, p0, x1 + x2 * 2),
+	      svld1_hor_za16 (1, w0 + 5, p0, x1 + x2 * 2))
+
+/*
+** ld1_za16_0_8:
+**	add	(w1[2-5]), w0, #?8
+**	ld1h	{ za0h\.h\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za16_0_8,
+	      svld1_hor_za16 (0, w0 + 8, p0, x1),
+	      svld1_hor_za16 (0, w0 + 8, p0, x1))
+
+/*
+** ld1_za16_0_m1:
+**	sub	(w1[2-5]), w0, #?1
+**	ld1h	{ za0h\.h\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za16_0_m1,
+	      svld1_hor_za16 (0, w0 - 1, p0, x1),
+	      svld1_hor_za16 (0, w0 - 1, p0, x1))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_za32.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_za32.c
new file mode 100644
index 00000000000..253783047a0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_za32.c
@@ -0,0 +1,93 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sme_acle.h"
+
+/*
+** ld1_za32_0_0:
+**	mov	(w1[2-5]), w0
+**	ld1w	{ za0h\.s\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za32_0_0,
+	      svld1_hor_za32 (0, w0, p0, x1),
+	      svld1_hor_za32 (0, w0, p0, x1))
+
+/*
+** ld1_za32_0_1:
+**	mov	(w1[2-5]), w0
+**	ld1w	{ za0h\.s\[\1, 1\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za32_0_1,
+	      svld1_hor_za32 (0, w0 + 1, p0, x1),
+	      svld1_hor_za32 (0, w0 + 1, p0, x1))
+
+/*
+** ld1_za32_0_3:
+**	mov	(w1[2-5]), w0
+**	ld1w	{ za0h\.s\[\1, 3\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za32_0_3,
+	      svld1_hor_za32 (0, w0 + 3, p0, x1),
+	      svld1_hor_za32 (0, w0 + 3, p0, x1))
+
+/*
+** ld1_za32_3_0:
+**	mov	(w1[2-5]), w0
+**	ld1w	{ za3h\.s\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za32_3_0,
+	      svld1_hor_za32 (3, w0, p0, x1),
+	      svld1_hor_za32 (3, w0, p0, x1))
+
+/*
+** ld1_za32_3_1:
+**	mov	(w1[2-5]), w0
+**	ld1w	{ za3h\.s\[\1, 1\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za32_3_1,
+	      svld1_hor_za32 (3, w0 + 1, p0, x1),
+	      svld1_hor_za32 (3, w0 + 1, p0, x1))
+
+/*
+** ld1_za32_3_3:
+**	mov	(w1[2-5]), w0
+**	ld1w	{ za3h\.s\[\1, 3\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za32_3_3,
+	      svld1_hor_za32 (3, w0 + 3, p0, x1),
+	      svld1_hor_za32 (3, w0 + 3, p0, x1))
+
+/*
+** ld1_za32_1_2_index:
+**	mov	(w1[2-5]), w0
+**	ld1w	{ za1h\.s\[\1, 2\] }, p0/z, \[x1, x2, lsl #?2\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za32_1_2_index,
+	      svld1_hor_za32 (1, w0 + 2, p0, x1 + x2 * 4),
+	      svld1_hor_za32 (1, w0 + 2, p0, x1 + x2 * 4))
+
+/*
+** ld1_za32_0_4:
+**	add	(w1[2-5]), w0, #?4
+**	ld1w	{ za0h\.s\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za32_0_4,
+	      svld1_hor_za32 (0, w0 + 4, p0, x1),
+	      svld1_hor_za32 (0, w0 + 4, p0, x1))
+
+/*
+** ld1_za32_0_m1:
+**	sub	(w1[2-5]), w0, #?1
+**	ld1w	{ za0h\.s\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za32_0_m1,
+	      svld1_hor_za32 (0, w0 - 1, p0, x1),
+	      svld1_hor_za32 (0, w0 - 1, p0, x1))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_za64.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_za64.c
new file mode 100644
index 00000000000..b90b49dd054
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_za64.c
@@ -0,0 +1,73 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sme_acle.h"
+
+/*
+** ld1_za64_0_0:
+**	mov	(w1[2-5]), w0
+**	ld1d	{ za0h\.d\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za64_0_0,
+	      svld1_hor_za64 (0, w0, p0, x1),
+	      svld1_hor_za64 (0, w0, p0, x1))
+
+/*
+** ld1_za64_0_1:
+**	mov	(w1[2-5]), w0
+**	ld1d	{ za0h\.d\[\1, 1\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za64_0_1,
+	      svld1_hor_za64 (0, w0 + 1, p0, x1),
+	      svld1_hor_za64 (0, w0 + 1, p0, x1))
+
+/*
+** ld1_za64_7_0:
+**	mov	(w1[2-5]), w0
+**	ld1d	{ za7h\.d\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za64_7_0,
+	      svld1_hor_za64 (7, w0, p0, x1),
+	      svld1_hor_za64 (7, w0, p0, x1))
+
+/*
+** ld1_za64_7_1:
+**	mov	(w1[2-5]), w0
+**	ld1d	{ za7h\.d\[\1, 1\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za64_7_1,
+	      svld1_hor_za64 (7, w0 + 1, p0, x1),
+	      svld1_hor_za64 (7, w0 + 1, p0, x1))
+
+/*
+** ld1_za64_5_1_index:
+**	mov	(w1[2-5]), w0
+**	ld1d	{ za5h\.d\[\1, 1\] }, p0/z, \[x1, x2, lsl #?3\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za64_5_1_index,
+	      svld1_hor_za64 (5, w0 + 1, p0, x1 + x2 * 8),
+	      svld1_hor_za64 (5, w0 + 1, p0, x1 + x2 * 8))
+
+/*
+** ld1_za64_0_2:
+**	add	(w1[2-5]), w0, #?2
+**	ld1d	{ za0h\.d\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za64_0_2,
+	      svld1_hor_za64 (0, w0 + 2, p0, x1),
+	      svld1_hor_za64 (0, w0 + 2, p0, x1))
+
+/*
+** ld1_za64_0_m1:
+**	sub	(w1[2-5]), w0, #?1
+**	ld1d	{ za0h\.d\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za64_0_m1,
+	      svld1_hor_za64 (0, w0 - 1, p0, x1),
+	      svld1_hor_za64 (0, w0 - 1, p0, x1))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_za8.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_za8.c
new file mode 100644
index 00000000000..937e6376cf1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_hor_za8.c
@@ -0,0 +1,63 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sme_acle.h"
+
+/*
+** ld1_za8_0_0:
+**	mov	(w1[2-5]), w0
+**	ld1b	{ za0h\.b\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za8_0_0,
+	      svld1_hor_za8 (0, w0, p0, x1),
+	      svld1_hor_za8 (0, w0, p0, x1))
+
+/*
+** ld1_za8_0_1:
+**	mov	(w1[2-5]), w0
+**	ld1b	{ za0h\.b\[\1, 1\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za8_0_1,
+	      svld1_hor_za8 (0, w0 + 1, p0, x1),
+	      svld1_hor_za8 (0, w0 + 1, p0, x1))
+
+/*
+** ld1_za8_0_15:
+**	mov	(w1[2-5]), w0
+**	ld1b	{ za0h\.b\[\1, 15\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za8_0_15,
+	      svld1_hor_za8 (0, w0 + 15, p0, x1),
+	      svld1_hor_za8 (0, w0 + 15, p0, x1))
+
+/*
+** ld1_za8_0_13_index:
+**	mov	(w1[2-5]), w0
+**	ld1b	{ za0h\.b\[\1, 13\] }, p0/z, \[x1, x2\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za8_0_13_index,
+	      svld1_hor_za8 (0, w0 + 13, p0, x1 + x2),
+	      svld1_hor_za8 (0, w0 + 13, p0, x1 + x2))
+
+/*
+** ld1_za8_0_16:
+**	add	(w1[2-5]), w0, #?16
+**	ld1b	{ za0h\.b\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za8_0_16,
+	      svld1_hor_za8 (0, w0 + 16, p0, x1),
+	      svld1_hor_za8 (0, w0 + 16, p0, x1))
+
+/*
+** ld1_za8_0_m1:
+**	sub	(w1[2-5]), w0, #?1
+**	ld1b	{ za0h\.b\[\1, 0\] }, p0/z, \[x1\]
+**	ret
+*/
+TEST_LOAD_ZA (ld1_za8_0_m1,
+	      svld1_hor_za8 (0, w0 - 1, p0, x1),
+	      svld1_hor_za8 (0, w0 - 1, p0, x1))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_ver_vnum_za128.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_ver_vnum_za128.c
new file mode 100644
index 00000000000..e69de29bb2d
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_ver_vnum_za16.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_ver_vnum_za16.c
new file mode 100644
index 00000000000..e69de29bb2d
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_ver_vnum_za32.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_ver_vnum_za32.c
new file mode 100644
index 00000000000..e69de29bb2d
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_ver_vnum_za64.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_ver_vnum_za64.c
new file mode 100644
index 00000000000..e69de29bb2d
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_ver_vnum_za8.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_ver_vnum_za8.c
new file mode 100644
index 00000000000..e69de29bb2d
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_ver_za128.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_ver_za128.c
new file mode 100644
index 00000000000..e69de29bb2d
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_ver_za16.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_ver_za16.c
new file mode 100644
index 00000000000..e69de29bb2d
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_ver_za32.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_ver_za32.c
new file mode 100644
index 00000000000..e69de29bb2d
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_ver_za64.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_ver_za64.c
new file mode 100644
index 00000000000..e69de29bb2d
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_ver_za8.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ld1_ver_za8.c
new file mode 100644
index 00000000000..e69de29bb2d
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ldr_vnum_za_s.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ldr_vnum_za_s.c
new file mode 100644
index 00000000000..592cfc3c145
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ldr_vnum_za_s.c
@@ -0,0 +1,121 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sme_acle.h"
+
+/*
+** ldr_vnum_za_0:
+**	mov	(w1[2-5]), w0
+**	ldr	za\[\1, 0\], \[x1(?:, #0, mul vl)?\]
+**	ret
+*/
+TEST_LOAD_ZA (ldr_vnum_za_0,
+	      svldr_vnum_za (w0, x1, 0),
+	      svldr_vnum_za (w0, x1, 0))
+
+/*
+** ldr_vnum_za_1:
+**	mov	(w1[2-5]), w0
+**	ldr	za\[\1, 1\], \[x1, #1, mul vl\]
+**	ret
+*/
+TEST_LOAD_ZA (ldr_vnum_za_1,
+	      svldr_vnum_za (w0 + 1, x1, 1),
+	      svldr_vnum_za (w0 + 1, x1, 1))
+
+/*
+** ldr_vnum_za_13:
+**	mov	(w1[2-5]), w0
+**	ldr	za\[\1, 13\], \[x1, #13, mul vl\]
+**	ret
+*/
+TEST_LOAD_ZA (ldr_vnum_za_13,
+	      svldr_vnum_za (w0 + 13, x1, 13),
+	      svldr_vnum_za (w0 + 13, x1, 13))
+
+/*
+** ldr_vnum_za_15:
+**	mov	(w1[2-5]), w0
+**	ldr	za\[\1, 15\], \[x1, #15, mul vl\]
+**	ret
+*/
+TEST_LOAD_ZA (ldr_vnum_za_15,
+	      svldr_vnum_za (w0 + 15, x1, 15),
+	      svldr_vnum_za (w0 + 15, x1, 15))
+
+/*
+** ldr_vnum_za_16:
+** (
+**	add	(w1[2-5]), w0, #?16
+**	incb	x1, all, mul #16
+**	ldr	za\[\1, 0\], \[x1(?:, #0, mul vl)?\]
+** |
+**	incb	x1, all, mul #16
+**	add	(w1[2-5]), w0, #?16
+**	ldr	za\[\2, 0\], \[x1(?:, #0, mul vl)?\]
+** )
+**	ret
+*/
+TEST_LOAD_ZA (ldr_vnum_za_16,
+	      svldr_vnum_za (w0 + 16, x1, 16),
+	      svldr_vnum_za (w0 + 16, x1, 16))
+
+/*
+** ldr_vnum_za_m1:
+** (
+**	sub	(w1[2-5]), w0, #?1
+**	decb	x1
+**	ldr	za\[\1, 0\], \[x1(?:, #0, mul vl)?\]
+** |
+**	decb	x1
+**	sub	(w1[2-5]), w0, #?1
+**	ldr	za\[\2, 0\], \[x1(?:, #0, mul vl)?\]
+** )
+**	ret
+*/
+TEST_LOAD_ZA (ldr_vnum_za_m1,
+	      svldr_vnum_za (w0 - 1, x1, -1),
+	      svldr_vnum_za (w0 - 1, x1, -1))
+
+/*
+** ldr_vnum_za_mixed_1:
+**	add	(w1[2-5]), w0, #?1
+**	ldr	za\[\1, 0\], \[x1(?:, #0, mul vl)?\]
+**	ret
+*/
+TEST_LOAD_ZA (ldr_vnum_za_mixed_1,
+	      svldr_vnum_za (w0 + 1, x1, 0),
+	      svldr_vnum_za (w0 + 1, x1, 0))
+
+/*
+** ldr_vnum_za_mixed_2:
+** (
+**	mov	(w1[2-5]), w0
+**	incb	x1
+**	ldr	za\[\1, 0\], \[x1(?:, #0, mul vl)?\]
+** |
+**	incb	x1
+**	mov	(w1[2-5]), w0
+**	ldr	za\[\2, 0\], \[x1(?:, #0, mul vl)?\]
+** )
+**	ret
+*/
+TEST_LOAD_ZA (ldr_vnum_za_mixed_2,
+	      svldr_vnum_za (w0, x1, 1),
+	      svldr_vnum_za (w0, x1, 1))
+
+/*
+** ldr_vnum_za_mixed_3:
+** (
+**	add	(w1[2-5]), w0, #?2
+**	incb	x1
+**	ldr	za\[\1, 0\], \[x1(?:, #0, mul vl)?\]
+** |
+**	incb	x1
+**	add	(w1[2-5]), w0, #?2
+**	ldr	za\[\2, 0\], \[x1(?:, #0, mul vl)?\]
+** )
+**	ret
+*/
+TEST_LOAD_ZA (ldr_vnum_za_mixed_3,
+	      svldr_vnum_za (w0 + 2, x1, 1),
+	      svldr_vnum_za (w0 + 2, x1, 1))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ldr_vnum_za_sc.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ldr_vnum_za_sc.c
new file mode 100644
index 00000000000..303cf5b03eb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ldr_vnum_za_sc.c
@@ -0,0 +1,166 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#define STREAMING_COMPATIBLE
+#include "test_sme_acle.h"
+
+/*
+** ldr_vnum_za_0:
+**	mov	(w1[2-5]), w0
+**	ldr	za\[\1, 0\], \[x1(?:, #0, mul vl)?\]
+**	ret
+*/
+TEST_LOAD_ZA (ldr_vnum_za_0,
+	      svldr_vnum_za (w0, x1, 0),
+	      svldr_vnum_za (w0, x1, 0))
+
+/*
+** ldr_vnum_za_1:
+**	mov	(w1[2-5]), w0
+**	ldr	za\[\1, 1\], \[x1, #1, mul vl\]
+**	ret
+*/
+TEST_LOAD_ZA (ldr_vnum_za_1,
+	      svldr_vnum_za (w0 + 1, x1, 1),
+	      svldr_vnum_za (w0 + 1, x1, 1))
+
+/*
+** ldr_vnum_za_13:
+**	mov	(w1[2-5]), w0
+**	ldr	za\[\1, 13\], \[x1, #13, mul vl\]
+**	ret
+*/
+TEST_LOAD_ZA (ldr_vnum_za_13,
+	      svldr_vnum_za (w0 + 13, x1, 13),
+	      svldr_vnum_za (w0 + 13, x1, 13))
+
+/*
+** ldr_vnum_za_15:
+**	mov	(w1[2-5]), w0
+**	ldr	za\[\1, 15\], \[x1, #15, mul vl\]
+**	ret
+*/
+TEST_LOAD_ZA (ldr_vnum_za_15,
+	      svldr_vnum_za (w0 + 15, x1, 15),
+	      svldr_vnum_za (w0 + 15, x1, 15))
+
+/*
+** ldr_vnum_za_16:
+** (
+**	add	(w1[2-5]), w0, #?16
+**	addsvl	(x[0-9]+), x1, #16
+**	ldr	za\[\1, 0\], \[\2(?:, #0, mul vl)?\]
+** |
+**	addsvl	(x[0-9]+), x1, #16
+**	add	(w1[2-5]), w0, #?16
+**	ldr	za\[\4, 0\], \[\3(?:, #0, mul vl)?\]
+** )
+**	ret
+*/
+TEST_LOAD_ZA (ldr_vnum_za_16,
+	      svldr_vnum_za (w0 + 16, x1, 16),
+	      svldr_vnum_za (w0 + 16, x1, 16))
+
+/*
+** ldr_vnum_za_m1:
+** (
+**	sub	(w1[2-5]), w0, #?1
+**	addsvl	(x[0-9]+), x1, #-1
+**	ldr	za\[\1, 0\], \[\2(?:, #0, mul vl)?\]
+** |
+**	addsvl	(x[0-9]+), x1, #-1
+**	sub	(w1[2-5]), w0, #?1
+**	ldr	za\[\4, 0\], \[\3(?:, #0, mul vl)?\]
+** )
+**	ret
+*/
+TEST_LOAD_ZA (ldr_vnum_za_m1,
+	      svldr_vnum_za (w0 - 1, x1, -1),
+	      svldr_vnum_za (w0 - 1, x1, -1))
+
+/*
+** ldr_vnum_za_mixed_1:
+**	add	(w1[2-5]), w0, #?1
+**	ldr	za\[\1, 0\], \[x1(?:, #0, mul vl)?\]
+**	ret
+*/
+TEST_LOAD_ZA (ldr_vnum_za_mixed_1,
+	      svldr_vnum_za (w0 + 1, x1, 0),
+	      svldr_vnum_za (w0 + 1, x1, 0))
+
+/*
+** ldr_vnum_za_mixed_2:
+** (
+**	mov	(w1[2-5]), w0
+**	addsvl	(x[0-9]+), x1, #1
+**	ldr	za\[\1, 0\], \[\2(?:, #0, mul vl)?\]
+** |
+**	addsvl	(x[0-9]+), x1, #1
+**	mov	(w1[2-5]), w0
+**	ldr	za\[\4, 0\], \[\3(?:, #0, mul vl)?\]
+** )
+**	ret
+*/
+TEST_LOAD_ZA (ldr_vnum_za_mixed_2,
+	      svldr_vnum_za (w0, x1, 1),
+	      svldr_vnum_za (w0, x1, 1))
+
+/*
+** ldr_vnum_za_mixed_3:
+** (
+**	add	(w1[2-5]), w0, #?2
+**	addsvl	(x[0-9]+), x1, #1
+**	ldr	za\[\1, 0\], \[\2(?:, #0, mul vl)?\]
+** |
+**	addsvl	(x[0-9]+), x1, #1
+**	add	(w1[2-5]), w0, #?2
+**	ldr	za\[\4, 0\], \[\3(?:, #0, mul vl)?\]
+** )
+**	ret
+*/
+TEST_LOAD_ZA (ldr_vnum_za_mixed_3,
+	      svldr_vnum_za (w0 + 2, x1, 1),
+	      svldr_vnum_za (w0 + 2, x1, 1))
+
+/*
+** ldr_vnum_za_mixed_4:
+**	...
+**	addsvl	x[0-9]+, x1, #-32
+**	...
+**	ret
+*/
+TEST_LOAD_ZA (ldr_vnum_za_mixed_4,
+	      svldr_vnum_za (w0 + 3, x1, -32),
+	      svldr_vnum_za (w0 + 3, x1, -32))
+
+/*
+** ldr_vnum_za_mixed_5:
+**	...
+**	rdsvl	x[0-9]+, #1
+**	...
+**	ret
+*/
+TEST_LOAD_ZA (ldr_vnum_za_mixed_5,
+	      svldr_vnum_za (w0 + 3, x1, -33),
+	      svldr_vnum_za (w0 + 3, x1, -33))
+
+/*
+** ldr_vnum_za_mixed_6:
+**	...
+**	addsvl	x[0-9]+, x1, #31
+**	...
+**	ret
+*/
+TEST_LOAD_ZA (ldr_vnum_za_mixed_6,
+	      svldr_vnum_za (w0 + 4, x1, 31),
+	      svldr_vnum_za (w0 + 4, x1, 31))
+
+/*
+** ldr_vnum_za_mixed_7:
+**	...
+**	rdsvl	x[0-9]+, #1
+**	...
+**	ret
+*/
+TEST_LOAD_ZA (ldr_vnum_za_mixed_7,
+	      svldr_vnum_za (w0 + 3, x1, 32),
+	      svldr_vnum_za (w0 + 3, x1, 32))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ldr_za_s.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ldr_za_s.c
new file mode 100644
index 00000000000..72c335c4d83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ldr_za_s.c
@@ -0,0 +1,104 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sme_acle.h"
+
+/*
+** ldr_za_0:
+**	mov	(w1[2-5]), w0
+**	ldr	za\[\1, 0\], \[x1(?:, #0, mul vl)?\]
+**	ret
+*/
+TEST_LOAD_ZA (ldr_za_0,
+	      svldr_za (w0, x1),
+	      svldr_za (w0, x1))
+
+/*
+** ldr_za_1_vnum:
+**	mov	(w1[2-5]), w0
+**	ldr	za\[\1, 1\], \[x1, #1, mul vl\]
+**	ret
+*/
+TEST_LOAD_ZA (ldr_za_1_vnum,
+	      svldr_za (w0 + 1, x1 + svcntsb ()),
+	      svldr_za (w0 + 1, x1 + svcntsb ()))
+
+/*
+** ldr_za_13_vnum:
+**	mov	(w1[2-5]), w0
+**	ldr	za\[\1, 13\], \[x1, #13, mul vl\]
+**	ret
+*/
+TEST_LOAD_ZA (ldr_za_13_vnum,
+	      svldr_za (w0 + 13, x1 + svcntsb () * 13),
+	      svldr_za (w0 + 13, x1 + svcntsb () * 13))
+
+/*
+** ldr_za_15_vnum:
+**	mov	(w1[2-5]), w0
+**	ldr	za\[\1, 15\], \[x1, #15, mul vl\]
+**	ret
+*/
+TEST_LOAD_ZA (ldr_za_15_vnum,
+	      svldr_za (w0 + 15, x1 + svcntsb () * 15),
+	      svldr_za (w0 + 15, x1 + svcntsb () * 15))
+
+/*
+** ldr_za_16_vnum:
+** (
+**	add	(w1[2-5]), w0, #?16
+**	incb	x1, all, mul #16
+**	ldr	za\[\1, 0\], \[x1(?:, #0, mul vl)?\]
+** |
+**	incb	x1, all, mul #16
+**	add	(w1[2-5]), w0, #?16
+**	ldr	za\[\2, 0\], \[x1(?:, #0, mul vl)?\]
+** )
+**	ret
+*/
+TEST_LOAD_ZA (ldr_za_16_vnum,
+	      svldr_za (w0 + 16, x1 + svcntsb () * 16),
+	      svldr_za (w0 + 16, x1 + svcntsb () * 16))
+
+/*
+** ldr_za_m1_vnum:
+** (
+**	sub	(w1[2-5]), w0, #?1
+**	decb	x1
+**	ldr	za\[\1, 0\], \[x1(?:, #0, mul vl)?\]
+** |
+**	decb	x1
+**	sub	(w1[2-5]), w0, #?1
+**	ldr	za\[\2, 0\], \[x1(?:, #0, mul vl)?\]
+** )
+**	ret
+*/
+TEST_LOAD_ZA (ldr_za_m1_vnum,
+	      svldr_za (w0 - 1, x1 - svcntsb ()),
+	      svldr_za (w0 - 1, x1 - svcntsb ()))
+
+/*
+** ldr_za_2:
+**	add	(w1[2-5]), w0, #?2
+**	ldr	za\[\1, 0\], \[x1(?:, #0, mul vl)?\]
+**	ret
+*/
+TEST_LOAD_ZA (ldr_za_2,
+	      svldr_za (w0 + 2, x1),
+	      svldr_za (w0 + 2, x1))
+
+/*
+** ldr_za_offset:
+** (
+**	mov	(w1[2-5]), w0
+**	add	(x[0-9]+), x1, #?1
+**	ldr	za\[\1, 0\], \[\2(?:, #0, mul vl)?\]
+** |
+**	add	(x[0-9]+), x1, #?1
+**	mov	(w1[2-5]), w0
+**	ldr	za\[\4, 0\], \[\3(?:, #0, mul vl)?\]
+** )
+**	ret
+*/
+TEST_LOAD_ZA (ldr_za_offset,
+	      svldr_za (w0, x1 + 1),
+	      svldr_za (w0, x1 + 1))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ldr_za_sc.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ldr_za_sc.c
new file mode 100644
index 00000000000..3f9593fa5fb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/ldr_za_sc.c
@@ -0,0 +1,51 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#define STREAMING_COMPATIBLE
+#include "test_sme_acle.h"
+
+/*
+** ldr_za_0:
+**	mov	(w1[2-5]), w0
+**	ldr	za\[\1, 0\], \[x1(?:, #0, mul vl)?\]
+**	ret
+*/
+TEST_LOAD_ZA (ldr_za_0,
+	      svldr_za (w0, x1),
+	      svldr_za (w0, x1))
+
+/*
+** ldr_za_1_vnum:
+**	mov	(w1[2-5]), w0
+**	ldr	za\[\1, 1\], \[x1, #1, mul vl\]
+**	ret
+*/
+TEST_LOAD_ZA (ldr_za_1_vnum,
+	      svldr_za (w0 + 1, x1 + svcntsb ()),
+	      svldr_za (w0 + 1, x1 + svcntsb ()))
+
+/*
+** ldr_za_2:
+**	add	(w1[2-5]), w0, #?2
+**	ldr	za\[\1, 0\], \[x1(?:, #0, mul vl)?\]
+**	ret
+*/
+TEST_LOAD_ZA (ldr_za_2,
+	      svldr_za (w0 + 2, x1),
+	      svldr_za (w0 + 2, x1))
+
+/*
+** ldr_za_offset:
+** (
+**	mov	(w1[2-5]), w0
+**	add	(x[0-9]+), x1, #?1
+**	ldr	za\[\1, 0\], \[\2(?:, #0, mul vl)?\]
+** |
+**	add	(x[0-9]+), x1, #?1
+**	mov	(w1[2-5]), w0
+**	ldr	za\[\4, 0\], \[\3(?:, #0, mul vl)?\]
+** )
+**	ret
+*/
+TEST_LOAD_ZA (ldr_za_offset,
+	      svldr_za (w0, x1 + 1),
+	      svldr_za (w0, x1 + 1))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/mopa_za32.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/mopa_za32.c
new file mode 100644
index 00000000000..480de2c7faf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/mopa_za32.c
@@ -0,0 +1,102 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sme_acle.h"
+
+/*
+** mopa_za32_s8_0_p0_p1_z0_z1:
+**	smopa	za0\.s, p0/m, p1/m, z0\.b, z1\.b
+**	ret
+*/
+TEST_UNIFORM_ZA (mopa_za32_s8_0_p0_p1_z0_z1, svint8_t,
+		 svmopa_za32_s8_m (0, p0, p1, z0, z1),
+		 svmopa_za32_m (0, p0, p1, z0, z1))
+
+/*
+** mopa_za32_s8_0_p1_p0_z1_z0:
+**	smopa	za0\.s, p1/m, p0/m, z1\.b, z0\.b
+**	ret
+*/
+TEST_UNIFORM_ZA (mopa_za32_s8_0_p1_p0_z1_z0, svint8_t,
+		 svmopa_za32_s8_m (0, p1, p0, z1, z0),
+		 svmopa_za32_m (0, p1, p0, z1, z0))
+
+/*
+** mopa_za32_s8_3_p0_p1_z0_z1:
+**	smopa	za3\.s, p0/m, p1/m, z0\.b, z1\.b
+**	ret
+*/
+TEST_UNIFORM_ZA (mopa_za32_s8_3_p0_p1_z0_z1, svint8_t,
+		 svmopa_za32_s8_m (3, p0, p1, z0, z1),
+		 svmopa_za32_m (3, p0, p1, z0, z1))
+
+/*
+** mopa_za32_u8_0_p0_p1_z0_z1:
+**	umopa	za0\.s, p0/m, p1/m, z0\.b, z1\.b
+**	ret
+*/
+TEST_UNIFORM_ZA (mopa_za32_u8_0_p0_p1_z0_z1, svuint8_t,
+		 svmopa_za32_u8_m (0, p0, p1, z0, z1),
+		 svmopa_za32_m (0, p0, p1, z0, z1))
+
+/*
+** mopa_za32_u8_3_p0_p1_z0_z1:
+**	umopa	za3\.s, p0/m, p1/m, z0\.b, z1\.b
+**	ret
+*/
+TEST_UNIFORM_ZA (mopa_za32_u8_3_p0_p1_z0_z1, svuint8_t,
+		 svmopa_za32_u8_m (3, p0, p1, z0, z1),
+		 svmopa_za32_m (3, p0, p1, z0, z1))
+
+/*
+** mopa_za32_bf16_0_p0_p1_z0_z1:
+**	bfmopa	za0\.s, p0/m, p1/m, z0\.h, z1\.h
+**	ret
+*/
+TEST_UNIFORM_ZA (mopa_za32_bf16_0_p0_p1_z0_z1, svbfloat16_t,
+		 svmopa_za32_bf16_m (0, p0, p1, z0, z1),
+		 svmopa_za32_m (0, p0, p1, z0, z1))
+
+/*
+** mopa_za32_bf16_3_p0_p1_z0_z1:
+**	bfmopa	za3\.s, p0/m, p1/m, z0\.h, z1\.h
+**	ret
+*/
+TEST_UNIFORM_ZA (mopa_za32_bf16_3_p0_p1_z0_z1, svbfloat16_t,
+		 svmopa_za32_bf16_m (3, p0, p1, z0, z1),
+		 svmopa_za32_m (3, p0, p1, z0, z1))
+
+/*
+** mopa_za32_f16_0_p0_p1_z0_z1:
+**	fmopa	za0\.s, p0/m, p1/m, z0\.h, z1\.h
+**	ret
+*/
+TEST_UNIFORM_ZA (mopa_za32_f16_0_p0_p1_z0_z1, svfloat16_t,
+		 svmopa_za32_f16_m (0, p0, p1, z0, z1),
+		 svmopa_za32_m (0, p0, p1, z0, z1))
+
+/*
+** mopa_za32_f16_3_p0_p1_z0_z1:
+**	fmopa	za3\.s, p0/m, p1/m, z0\.h, z1\.h
+**	ret
+*/
+TEST_UNIFORM_ZA (mopa_za32_f16_3_p0_p1_z0_z1, svfloat16_t,
+		 svmopa_za32_f16_m (3, p0, p1, z0, z1),
+		 svmopa_za32_m (3, p0, p1, z0, z1))
+
+/*
+** mopa_za32_f32_0_p0_p1_z0_z1:
+**	fmopa	za0\.s, p0/m, p1/m, z0\.s, z1\.s
+**	ret
+*/
+TEST_UNIFORM_ZA (mopa_za32_f32_0_p0_p1_z0_z1, svfloat32_t,
+		 svmopa_za32_f32_m (0, p0, p1, z0, z1),
+		 svmopa_za32_m (0, p0, p1, z0, z1))
+
+/*
+** mopa_za32_f32_3_p0_p1_z0_z1:
+**	fmopa	za3\.s, p0/m, p1/m, z0\.s, z1\.s
+**	ret
+*/
+TEST_UNIFORM_ZA (mopa_za32_f32_3_p0_p1_z0_z1, svfloat32_t,
+		 svmopa_za32_f32_m (3, p0, p1, z0, z1),
+		 svmopa_za32_m (3, p0, p1, z0, z1))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/mopa_za64.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/mopa_za64.c
new file mode 100644
index 00000000000..f523b960538
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/mopa_za64.c
@@ -0,0 +1,70 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sme_acle.h"
+
+#pragma GCC target "+sme-i16i64"
+
+/*
+** mopa_za64_s16_0_p0_p1_z0_z1:
+**	smopa	za0\.d, p0/m, p1/m, z0\.h, z1\.h
+**	ret
+*/
+TEST_UNIFORM_ZA (mopa_za64_s16_0_p0_p1_z0_z1, svint16_t,
+		 svmopa_za64_s16_m (0, p0, p1, z0, z1),
+		 svmopa_za64_m (0, p0, p1, z0, z1))
+
+/*
+** mopa_za64_s16_0_p1_p0_z1_z0:
+**	smopa	za0\.d, p1/m, p0/m, z1\.h, z0\.h
+**	ret
+*/
+TEST_UNIFORM_ZA (mopa_za64_s16_0_p1_p0_z1_z0, svint16_t,
+		 svmopa_za64_s16_m (0, p1, p0, z1, z0),
+		 svmopa_za64_m (0, p1, p0, z1, z0))
+
+/*
+** mopa_za64_s16_7_p0_p1_z0_z1:
+**	smopa	za7\.d, p0/m, p1/m, z0\.h, z1\.h
+**	ret
+*/
+TEST_UNIFORM_ZA (mopa_za64_s16_7_p0_p1_z0_z1, svint16_t,
+		 svmopa_za64_s16_m (7, p0, p1, z0, z1),
+		 svmopa_za64_m (7, p0, p1, z0, z1))
+
+/*
+** mopa_za64_u16_0_p0_p1_z0_z1:
+**	umopa	za0\.d, p0/m, p1/m, z0\.h, z1\.h
+**	ret
+*/
+TEST_UNIFORM_ZA (mopa_za64_u16_0_p0_p1_z0_z1, svuint16_t,
+		 svmopa_za64_u16_m (0, p0, p1, z0, z1),
+		 svmopa_za64_m (0, p0, p1, z0, z1))
+
+/*
+** mopa_za64_u16_7_p0_p1_z0_z1:
+**	umopa	za7\.d, p0/m, p1/m, z0\.h, z1\.h
+**	ret
+*/
+TEST_UNIFORM_ZA (mopa_za64_u16_7_p0_p1_z0_z1, svuint16_t,
+		 svmopa_za64_u16_m (7, p0, p1, z0, z1),
+		 svmopa_za64_m (7, p0, p1, z0, z1))
+
+#pragma GCC target "+nosme-i16i64+sme-f64f64"
+
+/*
+** mopa_za64_f64_0_p0_p1_z0_z1:
+**	fmopa	za0\.d, p0/m, p1/m, z0\.d, z1\.d
+**	ret
+*/
+TEST_UNIFORM_ZA (mopa_za64_f64_0_p0_p1_z0_z1, svfloat64_t,
+		 svmopa_za64_f64_m (0, p0, p1, z0, z1),
+		 svmopa_za64_m (0, p0, p1, z0, z1))
+
+/*
+** mopa_za64_f64_7_p0_p1_z0_z1:
+**	fmopa	za7\.d, p0/m, p1/m, z0\.d, z1\.d
+**	ret
+*/
+TEST_UNIFORM_ZA (mopa_za64_f64_7_p0_p1_z0_z1, svfloat64_t,
+		 svmopa_za64_f64_m (7, p0, p1, z0, z1),
+		 svmopa_za64_m (7, p0, p1, z0, z1))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/mops_za32.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/mops_za32.c
new file mode 100644
index 00000000000..63c2b80fd5b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/mops_za32.c
@@ -0,0 +1,102 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sme_acle.h"
+
+/*
+** mops_za32_s8_0_p0_p1_z0_z1:
+**	smops	za0\.s, p0/m, p1/m, z0\.b, z1\.b
+**	ret
+*/
+TEST_UNIFORM_ZA (mops_za32_s8_0_p0_p1_z0_z1, svint8_t,
+		 svmops_za32_s8_m (0, p0, p1, z0, z1),
+		 svmops_za32_m (0, p0, p1, z0, z1))
+
+/*
+** mops_za32_s8_0_p1_p0_z1_z0:
+**	smops	za0\.s, p1/m, p0/m, z1\.b, z0\.b
+**	ret
+*/
+TEST_UNIFORM_ZA (mops_za32_s8_0_p1_p0_z1_z0, svint8_t,
+		 svmops_za32_s8_m (0, p1, p0, z1, z0),
+		 svmops_za32_m (0, p1, p0, z1, z0))
+
+/*
+** mops_za32_s8_3_p0_p1_z0_z1:
+**	smops	za3\.s, p0/m, p1/m, z0\.b, z1\.b
+**	ret
+*/
+TEST_UNIFORM_ZA (mops_za32_s8_3_p0_p1_z0_z1, svint8_t,
+		 svmops_za32_s8_m (3, p0, p1, z0, z1),
+		 svmops_za32_m (3, p0, p1, z0, z1))
+
+/*
+** mops_za32_u8_0_p0_p1_z0_z1:
+**	umops	za0\.s, p0/m, p1/m, z0\.b, z1\.b
+**	ret
+*/
+TEST_UNIFORM_ZA (mops_za32_u8_0_p0_p1_z0_z1, svuint8_t,
+		 svmops_za32_u8_m (0, p0, p1, z0, z1),
+		 svmops_za32_m (0, p0, p1, z0, z1))
+
+/*
+** mops_za32_u8_3_p0_p1_z0_z1:
+**	umops	za3\.s, p0/m, p1/m, z0\.b, z1\.b
+**	ret
+*/
+TEST_UNIFORM_ZA (mops_za32_u8_3_p0_p1_z0_z1, svuint8_t,
+		 svmops_za32_u8_m (3, p0, p1, z0, z1),
+		 svmops_za32_m (3, p0, p1, z0, z1))
+
+/*
+** mops_za32_bf16_0_p0_p1_z0_z1:
+**	bfmops	za0\.s, p0/m, p1/m, z0\.h, z1\.h
+**	ret
+*/
+TEST_UNIFORM_ZA (mops_za32_bf16_0_p0_p1_z0_z1, svbfloat16_t,
+		 svmops_za32_bf16_m (0, p0, p1, z0, z1),
+		 svmops_za32_m (0, p0, p1, z0, z1))
+
+/*
+** mops_za32_bf16_3_p0_p1_z0_z1:
+**	bfmops	za3\.s, p0/m, p1/m, z0\.h, z1\.h
+**	ret
+*/
+TEST_UNIFORM_ZA (mops_za32_bf16_3_p0_p1_z0_z1, svbfloat16_t,
+		 svmops_za32_bf16_m (3, p0, p1, z0, z1),
+		 svmops_za32_m (3, p0, p1, z0, z1))
+
+/*
+** mops_za32_f16_0_p0_p1_z0_z1:
+**	fmops	za0\.s, p0/m, p1/m, z0\.h, z1\.h
+**	ret
+*/
+TEST_UNIFORM_ZA (mops_za32_f16_0_p0_p1_z0_z1, svfloat16_t,
+		 svmops_za32_f16_m (0, p0, p1, z0, z1),
+		 svmops_za32_m (0, p0, p1, z0, z1))
+
+/*
+** mops_za32_f16_3_p0_p1_z0_z1:
+**	fmops	za3\.s, p0/m, p1/m, z0\.h, z1\.h
+**	ret
+*/
+TEST_UNIFORM_ZA (mops_za32_f16_3_p0_p1_z0_z1, svfloat16_t,
+		 svmops_za32_f16_m (3, p0, p1, z0, z1),
+		 svmops_za32_m (3, p0, p1, z0, z1))
+
+/*
+** mops_za32_f32_0_p0_p1_z0_z1:
+**	fmops	za0\.s, p0/m, p1/m, z0\.s, z1\.s
+**	ret
+*/
+TEST_UNIFORM_ZA (mops_za32_f32_0_p0_p1_z0_z1, svfloat32_t,
+		 svmops_za32_f32_m (0, p0, p1, z0, z1),
+		 svmops_za32_m (0, p0, p1, z0, z1))
+
+/*
+** mops_za32_f32_3_p0_p1_z0_z1:
+**	fmops	za3\.s, p0/m, p1/m, z0\.s, z1\.s
+**	ret
+*/
+TEST_UNIFORM_ZA (mops_za32_f32_3_p0_p1_z0_z1, svfloat32_t,
+		 svmops_za32_f32_m (3, p0, p1, z0, z1),
+		 svmops_za32_m (3, p0, p1, z0, z1))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/mops_za64.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/mops_za64.c
new file mode 100644
index 00000000000..bc04c3cf7fa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/mops_za64.c
@@ -0,0 +1,70 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sme_acle.h"
+
+#pragma GCC target "+sme-i16i64"
+
+/*
+** mops_za64_s16_0_p0_p1_z0_z1:
+**	smops	za0\.d, p0/m, p1/m, z0\.h, z1\.h
+**	ret
+*/
+TEST_UNIFORM_ZA (mops_za64_s16_0_p0_p1_z0_z1, svint16_t,
+		 svmops_za64_s16_m (0, p0, p1, z0, z1),
+		 svmops_za64_m (0, p0, p1, z0, z1))
+
+/*
+** mops_za64_s16_0_p1_p0_z1_z0:
+**	smops	za0\.d, p1/m, p0/m, z1\.h, z0\.h
+**	ret
+*/
+TEST_UNIFORM_ZA (mops_za64_s16_0_p1_p0_z1_z0, svint16_t,
+		 svmops_za64_s16_m (0, p1, p0, z1, z0),
+		 svmops_za64_m (0, p1, p0, z1, z0))
+
+/*
+** mops_za64_s16_7_p0_p1_z0_z1:
+**	smops	za7\.d, p0/m, p1/m, z0\.h, z1\.h
+**	ret
+*/
+TEST_UNIFORM_ZA (mops_za64_s16_7_p0_p1_z0_z1, svint16_t,
+		 svmops_za64_s16_m (7, p0, p1, z0, z1),
+		 svmops_za64_m (7, p0, p1, z0, z1))
+
+/*
+** mops_za64_u16_0_p0_p1_z0_z1:
+**	umops	za0\.d, p0/m, p1/m, z0\.h, z1\.h
+**	ret
+*/
+TEST_UNIFORM_ZA (mops_za64_u16_0_p0_p1_z0_z1, svuint16_t,
+		 svmops_za64_u16_m (0, p0, p1, z0, z1),
+		 svmops_za64_m (0, p0, p1, z0, z1))
+
+/*
+** mops_za64_u16_7_p0_p1_z0_z1:
+**	umops	za7\.d, p0/m, p1/m, z0\.h, z1\.h
+**	ret
+*/
+TEST_UNIFORM_ZA (mops_za64_u16_7_p0_p1_z0_z1, svuint16_t,
+		 svmops_za64_u16_m (7, p0, p1, z0, z1),
+		 svmops_za64_m (7, p0, p1, z0, z1))
+
+#pragma GCC target "+nosme-i16i64+sme-f64f64"
+
+/*
+** mops_za64_f64_0_p0_p1_z0_z1:
+**	fmops	za0\.d, p0/m, p1/m, z0\.d, z1\.d
+**	ret
+*/
+TEST_UNIFORM_ZA (mops_za64_f64_0_p0_p1_z0_z1, svfloat64_t,
+		 svmops_za64_f64_m (0, p0, p1, z0, z1),
+		 svmops_za64_m (0, p0, p1, z0, z1))
+
+/*
+** mops_za64_f64_7_p0_p1_z0_z1:
+**	fmops	za7\.d, p0/m, p1/m, z0\.d, z1\.d
+**	ret
+*/
+TEST_UNIFORM_ZA (mops_za64_f64_7_p0_p1_z0_z1, svfloat64_t,
+		 svmops_za64_f64_m (7, p0, p1, z0, z1),
+		 svmops_za64_m (7, p0, p1, z0, z1))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_hor_za128.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_hor_za128.c
new file mode 100644
index 00000000000..0dd503143e5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_hor_za128.c
@@ -0,0 +1,367 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sme_acle.h"
+
+/*
+** read_za128_s8_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0h\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_s8_0_0_tied, svint8_t,
+	      z0 = svread_hor_za128_s8_m (z0, p0, 0, w0),
+	      z0 = svread_hor_za128_m (z0, p0, 0, w0))
+
+/*
+** read_za128_s8_0_1_tied:
+**	add	(w1[2-5]), w0, #?1
+**	mova	z0\.q, p0/m, za0h\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_s8_0_1_tied, svint8_t,
+	      z0 = svread_hor_za128_s8_m (z0, p0, 0, w0 + 1),
+	      z0 = svread_hor_za128_m (z0, p0, 0, w0 + 1))
+
+/*
+** read_za128_s8_0_m1_tied:
+**	sub	(w1[2-5]), w0, #?1
+**	mova	z0\.q, p0/m, za0h\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_s8_0_m1_tied, svint8_t,
+	      z0 = svread_hor_za128_s8_m (z0, p0, 0, w0 - 1),
+	      z0 = svread_hor_za128_m (z0, p0, 0, w0 - 1))
+
+/*
+** read_za128_s8_1_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za1h\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_s8_1_0_tied, svint8_t,
+	      z0 = svread_hor_za128_s8_m (z0, p0, 1, w0),
+	      z0 = svread_hor_za128_m (z0, p0, 1, w0))
+
+/*
+** read_za128_s8_15_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za15h\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_s8_15_0_tied, svint8_t,
+	      z0 = svread_hor_za128_s8_m (z0, p0, 15, w0),
+	      z0 = svread_hor_za128_m (z0, p0, 15, w0))
+
+/*
+** read_za128_s8_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.q, p0/m, za0h\.q\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0h\.q\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za128_s8_0_0_untied, svint8_t,
+	      z0 = svread_hor_za128_s8_m (z1, p0, 0, w0),
+	      z0 = svread_hor_za128_m (z1, p0, 0, w0))
+
+/*
+** read_za128_u8_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0h\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_u8_0_0_tied, svuint8_t,
+	      z0 = svread_hor_za128_u8_m (z0, p0, 0, w0),
+	      z0 = svread_hor_za128_m (z0, p0, 0, w0))
+
+/*
+** read_za128_u8_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.q, p0/m, za0h\.q\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0h\.q\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za128_u8_0_0_untied, svuint8_t,
+	      z0 = svread_hor_za128_u8_m (z1, p0, 0, w0),
+	      z0 = svread_hor_za128_m (z1, p0, 0, w0))
+
+/*
+** read_za128_s16_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0h\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_s16_0_0_tied, svint16_t,
+	      z0 = svread_hor_za128_s16_m (z0, p0, 0, w0),
+	      z0 = svread_hor_za128_m (z0, p0, 0, w0))
+
+/*
+** read_za128_s16_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.q, p0/m, za0h\.q\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0h\.q\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za128_s16_0_0_untied, svint16_t,
+	      z0 = svread_hor_za128_s16_m (z1, p0, 0, w0),
+	      z0 = svread_hor_za128_m (z1, p0, 0, w0))
+
+/*
+** read_za128_u16_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0h\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_u16_0_0_tied, svuint16_t,
+	      z0 = svread_hor_za128_u16_m (z0, p0, 0, w0),
+	      z0 = svread_hor_za128_m (z0, p0, 0, w0))
+
+/*
+** read_za128_u16_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.q, p0/m, za0h\.q\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0h\.q\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za128_u16_0_0_untied, svuint16_t,
+	      z0 = svread_hor_za128_u16_m (z1, p0, 0, w0),
+	      z0 = svread_hor_za128_m (z1, p0, 0, w0))
+
+/*
+** read_za128_f16_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0h\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_f16_0_0_tied, svfloat16_t,
+	      z0 = svread_hor_za128_f16_m (z0, p0, 0, w0),
+	      z0 = svread_hor_za128_m (z0, p0, 0, w0))
+
+/*
+** read_za128_f16_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.q, p0/m, za0h\.q\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0h\.q\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za128_f16_0_0_untied, svfloat16_t,
+	      z0 = svread_hor_za128_f16_m (z1, p0, 0, w0),
+	      z0 = svread_hor_za128_m (z1, p0, 0, w0))
+
+/*
+** read_za128_bf16_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0h\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_bf16_0_0_tied, svbfloat16_t,
+	      z0 = svread_hor_za128_bf16_m (z0, p0, 0, w0),
+	      z0 = svread_hor_za128_m (z0, p0, 0, w0))
+
+/*
+** read_za128_bf16_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.q, p0/m, za0h\.q\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0h\.q\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za128_bf16_0_0_untied, svbfloat16_t,
+	      z0 = svread_hor_za128_bf16_m (z1, p0, 0, w0),
+	      z0 = svread_hor_za128_m (z1, p0, 0, w0))
+
+/*
+** read_za128_s32_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0h\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_s32_0_0_tied, svint32_t,
+	      z0 = svread_hor_za128_s32_m (z0, p0, 0, w0),
+	      z0 = svread_hor_za128_m (z0, p0, 0, w0))
+
+/*
+** read_za128_s32_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.q, p0/m, za0h\.q\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0h\.q\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za128_s32_0_0_untied, svint32_t,
+	      z0 = svread_hor_za128_s32_m (z1, p0, 0, w0),
+	      z0 = svread_hor_za128_m (z1, p0, 0, w0))
+
+/*
+** read_za128_u32_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0h\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_u32_0_0_tied, svuint32_t,
+	      z0 = svread_hor_za128_u32_m (z0, p0, 0, w0),
+	      z0 = svread_hor_za128_m (z0, p0, 0, w0))
+
+/*
+** read_za128_u32_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.q, p0/m, za0h\.q\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0h\.q\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za128_u32_0_0_untied, svuint32_t,
+	      z0 = svread_hor_za128_u32_m (z1, p0, 0, w0),
+	      z0 = svread_hor_za128_m (z1, p0, 0, w0))
+
+/*
+** read_za128_f32_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0h\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_f32_0_0_tied, svfloat32_t,
+	      z0 = svread_hor_za128_f32_m (z0, p0, 0, w0),
+	      z0 = svread_hor_za128_m (z0, p0, 0, w0))
+
+/*
+** read_za128_f32_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.q, p0/m, za0h\.q\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0h\.q\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za128_f32_0_0_untied, svfloat32_t,
+	      z0 = svread_hor_za128_f32_m (z1, p0, 0, w0),
+	      z0 = svread_hor_za128_m (z1, p0, 0, w0))
+
+/*
+** read_za128_s64_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0h\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_s64_0_0_tied, svint64_t,
+	      z0 = svread_hor_za128_s64_m (z0, p0, 0, w0),
+	      z0 = svread_hor_za128_m (z0, p0, 0, w0))
+
+/*
+** read_za128_s64_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.q, p0/m, za0h\.q\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0h\.q\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za128_s64_0_0_untied, svint64_t,
+	      z0 = svread_hor_za128_s64_m (z1, p0, 0, w0),
+	      z0 = svread_hor_za128_m (z1, p0, 0, w0))
+
+/*
+** read_za128_u64_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0h\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_u64_0_0_tied, svuint64_t,
+	      z0 = svread_hor_za128_u64_m (z0, p0, 0, w0),
+	      z0 = svread_hor_za128_m (z0, p0, 0, w0))
+
+/*
+** read_za128_u64_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.q, p0/m, za0h\.q\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0h\.q\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za128_u64_0_0_untied, svuint64_t,
+	      z0 = svread_hor_za128_u64_m (z1, p0, 0, w0),
+	      z0 = svread_hor_za128_m (z1, p0, 0, w0))
+
+/*
+** read_za128_f64_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0h\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_f64_0_0_tied, svfloat64_t,
+	      z0 = svread_hor_za128_f64_m (z0, p0, 0, w0),
+	      z0 = svread_hor_za128_m (z0, p0, 0, w0))
+
+/*
+** read_za128_f64_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.q, p0/m, za0h\.q\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0h\.q\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za128_f64_0_0_untied, svfloat64_t,
+	      z0 = svread_hor_za128_f64_m (z1, p0, 0, w0),
+	      z0 = svread_hor_za128_m (z1, p0, 0, w0))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_hor_za16.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_hor_za16.c
new file mode 100644
index 00000000000..c52d94a584b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_hor_za16.c
@@ -0,0 +1,171 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sme_acle.h"
+
+/*
+** read_za16_s16_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.h, p0/m, za0h\.h\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za16_s16_0_0_tied, svint16_t,
+	      z0 = svread_hor_za16_s16_m (z0, p0, 0, w0),
+	      z0 = svread_hor_za16_m (z0, p0, 0, w0))
+
+/*
+** read_za16_s16_0_1_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.h, p0/m, za0h\.h\[\1, 1\]
+**	ret
+*/
+TEST_READ_ZA (read_za16_s16_0_1_tied, svint16_t,
+	      z0 = svread_hor_za16_s16_m (z0, p0, 0, w0 + 1),
+	      z0 = svread_hor_za16_m (z0, p0, 0, w0 + 1))
+
+/*
+** read_za16_s16_0_7_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.h, p0/m, za0h\.h\[\1, 7\]
+**	ret
+*/
+TEST_READ_ZA (read_za16_s16_0_7_tied, svint16_t,
+	      z0 = svread_hor_za16_s16_m (z0, p0, 0, w0 + 7),
+	      z0 = svread_hor_za16_m (z0, p0, 0, w0 + 7))
+
+/*
+** read_za16_s16_0_8_tied:
+**	add	(w1[2-5]), w0, #?8
+**	mova	z0\.h, p0/m, za0h\.h\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za16_s16_0_8_tied, svint16_t,
+	      z0 = svread_hor_za16_s16_m (z0, p0, 0, w0 + 8),
+	      z0 = svread_hor_za16_m (z0, p0, 0, w0 + 8))
+
+/*
+** read_za16_s16_0_m1_tied:
+**	sub	(w1[2-5]), w0, #?1
+**	mova	z0\.h, p0/m, za0h\.h\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za16_s16_0_m1_tied, svint16_t,
+	      z0 = svread_hor_za16_s16_m (z0, p0, 0, w0 - 1),
+	      z0 = svread_hor_za16_m (z0, p0, 0, w0 - 1))
+
+/*
+** read_za16_s16_1_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.h, p0/m, za1h\.h\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za16_s16_1_0_tied, svint16_t,
+	      z0 = svread_hor_za16_s16_m (z0, p0, 1, w0),
+	      z0 = svread_hor_za16_m (z0, p0, 1, w0))
+
+/*
+** read_za16_s16_1_7_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.h, p0/m, za1h\.h\[\1, 7\]
+**	ret
+*/
+TEST_READ_ZA (read_za16_s16_1_7_tied, svint16_t,
+	      z0 = svread_hor_za16_s16_m (z0, p0, 1, w0 + 7),
+	      z0 = svread_hor_za16_m (z0, p0, 1, w0 + 7))
+
+/*
+** read_za16_s16_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.h, p0/m, za0h\.h\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.h, p0/m, za0h\.h\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za16_s16_0_0_untied, svint16_t,
+	      z0 = svread_hor_za16_s16_m (z1, p0, 0, w0),
+	      z0 = svread_hor_za16_m (z1, p0, 0, w0))
+
+/*
+** read_za16_u16_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.h, p0/m, za0h\.h\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za16_u16_0_0_tied, svuint16_t,
+	      z0 = svread_hor_za16_u16_m (z0, p0, 0, w0),
+	      z0 = svread_hor_za16_m (z0, p0, 0, w0))
+
+/*
+** read_za16_u16_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.h, p0/m, za0h\.h\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.h, p0/m, za0h\.h\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za16_u16_0_0_untied, svuint16_t,
+	      z0 = svread_hor_za16_u16_m (z1, p0, 0, w0),
+	      z0 = svread_hor_za16_m (z1, p0, 0, w0))
+
+/*
+** read_za16_f16_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.h, p0/m, za0h\.h\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za16_f16_0_0_tied, svfloat16_t,
+	      z0 = svread_hor_za16_f16_m (z0, p0, 0, w0),
+	      z0 = svread_hor_za16_m (z0, p0, 0, w0))
+
+/*
+** read_za16_f16_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.h, p0/m, za0h\.h\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.h, p0/m, za0h\.h\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za16_f16_0_0_untied, svfloat16_t,
+	      z0 = svread_hor_za16_f16_m (z1, p0, 0, w0),
+	      z0 = svread_hor_za16_m (z1, p0, 0, w0))
+
+/*
+** read_za16_bf16_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.h, p0/m, za0h\.h\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za16_bf16_0_0_tied, svbfloat16_t,
+	      z0 = svread_hor_za16_bf16_m (z0, p0, 0, w0),
+	      z0 = svread_hor_za16_m (z0, p0, 0, w0))
+
+/*
+** read_za16_bf16_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.h, p0/m, za0h\.h\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.h, p0/m, za0h\.h\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za16_bf16_0_0_untied, svbfloat16_t,
+	      z0 = svread_hor_za16_bf16_m (z1, p0, 0, w0),
+	      z0 = svread_hor_za16_m (z1, p0, 0, w0))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_hor_za32.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_hor_za32.c
new file mode 100644
index 00000000000..a085dc7fea7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_hor_za32.c
@@ -0,0 +1,164 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sme_acle.h"
+
+/*
+** read_za32_s32_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.s, p0/m, za0h\.s\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za32_s32_0_0_tied, svint32_t,
+	      z0 = svread_hor_za32_s32_m (z0, p0, 0, w0),
+	      z0 = svread_hor_za32_m (z0, p0, 0, w0))
+
+/*
+** read_za32_s32_0_1_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.s, p0/m, za0h\.s\[\1, 1\]
+**	ret
+*/
+TEST_READ_ZA (read_za32_s32_0_1_tied, svint32_t,
+	      z0 = svread_hor_za32_s32_m (z0, p0, 0, w0 + 1),
+	      z0 = svread_hor_za32_m (z0, p0, 0, w0 + 1))
+
+/*
+** read_za32_s32_0_3_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.s, p0/m, za0h\.s\[\1, 3\]
+**	ret
+*/
+TEST_READ_ZA (read_za32_s32_0_3_tied, svint32_t,
+	      z0 = svread_hor_za32_s32_m (z0, p0, 0, w0 + 3),
+	      z0 = svread_hor_za32_m (z0, p0, 0, w0 + 3))
+
+/*
+** read_za32_s32_0_4_tied:
+**	add	(w1[2-5]), w0, #?4
+**	mova	z0\.s, p0/m, za0h\.s\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za32_s32_0_4_tied, svint32_t,
+	      z0 = svread_hor_za32_s32_m (z0, p0, 0, w0 + 4),
+	      z0 = svread_hor_za32_m (z0, p0, 0, w0 + 4))
+
+/*
+** read_za32_s32_0_m1_tied:
+**	sub	(w1[2-5]), w0, #?1
+**	mova	z0\.s, p0/m, za0h\.s\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za32_s32_0_m1_tied, svint32_t,
+	      z0 = svread_hor_za32_s32_m (z0, p0, 0, w0 - 1),
+	      z0 = svread_hor_za32_m (z0, p0, 0, w0 - 1))
+
+/*
+** read_za32_s32_1_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.s, p0/m, za1h\.s\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za32_s32_1_0_tied, svint32_t,
+	      z0 = svread_hor_za32_s32_m (z0, p0, 1, w0),
+	      z0 = svread_hor_za32_m (z0, p0, 1, w0))
+
+/*
+** read_za32_s32_1_3_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.s, p0/m, za1h\.s\[\1, 3\]
+**	ret
+*/
+TEST_READ_ZA (read_za32_s32_1_3_tied, svint32_t,
+	      z0 = svread_hor_za32_s32_m (z0, p0, 1, w0 + 3),
+	      z0 = svread_hor_za32_m (z0, p0, 1, w0 + 3))
+
+/*
+** read_za32_s32_3_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.s, p0/m, za3h\.s\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za32_s32_3_0_tied, svint32_t,
+	      z0 = svread_hor_za32_s32_m (z0, p0, 3, w0),
+	      z0 = svread_hor_za32_m (z0, p0, 3, w0))
+
+/*
+** read_za32_s32_3_3_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.s, p0/m, za3h\.s\[\1, 3\]
+**	ret
+*/
+TEST_READ_ZA (read_za32_s32_3_3_tied, svint32_t,
+	      z0 = svread_hor_za32_s32_m (z0, p0, 3, w0 + 3),
+	      z0 = svread_hor_za32_m (z0, p0, 3, w0 + 3))
+
+/*
+** read_za32_s32_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.s, p0/m, za0h\.s\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.s, p0/m, za0h\.s\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za32_s32_0_0_untied, svint32_t,
+	      z0 = svread_hor_za32_s32_m (z1, p0, 0, w0),
+	      z0 = svread_hor_za32_m (z1, p0, 0, w0))
+
+/*
+** read_za32_u32_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.s, p0/m, za0h\.s\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za32_u32_0_0_tied, svuint32_t,
+	      z0 = svread_hor_za32_u32_m (z0, p0, 0, w0),
+	      z0 = svread_hor_za32_m (z0, p0, 0, w0))
+
+/*
+** read_za32_u32_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.s, p0/m, za0h\.s\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.s, p0/m, za0h\.s\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za32_u32_0_0_untied, svuint32_t,
+	      z0 = svread_hor_za32_u32_m (z1, p0, 0, w0),
+	      z0 = svread_hor_za32_m (z1, p0, 0, w0))
+
+/*
+** read_za32_f32_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.s, p0/m, za0h\.s\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za32_f32_0_0_tied, svfloat32_t,
+	      z0 = svread_hor_za32_f32_m (z0, p0, 0, w0),
+	      z0 = svread_hor_za32_m (z0, p0, 0, w0))
+
+/*
+** read_za32_f32_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.s, p0/m, za0h\.s\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.s, p0/m, za0h\.s\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za32_f32_0_0_untied, svfloat32_t,
+	      z0 = svread_hor_za32_f32_m (z1, p0, 0, w0),
+	      z0 = svread_hor_za32_m (z1, p0, 0, w0))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_hor_za64.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_hor_za64.c
new file mode 100644
index 00000000000..021e3460f0c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_hor_za64.c
@@ -0,0 +1,154 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sme_acle.h"
+
+/*
+** read_za64_s64_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.d, p0/m, za0h\.d\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za64_s64_0_0_tied, svint64_t,
+	      z0 = svread_hor_za64_s64_m (z0, p0, 0, w0),
+	      z0 = svread_hor_za64_m (z0, p0, 0, w0))
+
+/*
+** read_za64_s64_0_1_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.d, p0/m, za0h\.d\[\1, 1\]
+**	ret
+*/
+TEST_READ_ZA (read_za64_s64_0_1_tied, svint64_t,
+	      z0 = svread_hor_za64_s64_m (z0, p0, 0, w0 + 1),
+	      z0 = svread_hor_za64_m (z0, p0, 0, w0 + 1))
+
+/*
+** read_za64_s64_0_2_tied:
+**	add	(w1[2-5]), w0, #?2
+**	mova	z0\.d, p0/m, za0h\.d\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za64_s64_0_2_tied, svint64_t,
+	      z0 = svread_hor_za64_s64_m (z0, p0, 0, w0 + 2),
+	      z0 = svread_hor_za64_m (z0, p0, 0, w0 + 2))
+
+/*
+** read_za64_s64_0_m1_tied:
+**	sub	(w1[2-5]), w0, #?1
+**	mova	z0\.d, p0/m, za0h\.d\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za64_s64_0_m1_tied, svint64_t,
+	      z0 = svread_hor_za64_s64_m (z0, p0, 0, w0 - 1),
+	      z0 = svread_hor_za64_m (z0, p0, 0, w0 - 1))
+
+/*
+** read_za64_s64_1_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.d, p0/m, za1h\.d\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za64_s64_1_0_tied, svint64_t,
+	      z0 = svread_hor_za64_s64_m (z0, p0, 1, w0),
+	      z0 = svread_hor_za64_m (z0, p0, 1, w0))
+
+/*
+** read_za64_s64_1_1_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.d, p0/m, za1h\.d\[\1, 1\]
+**	ret
+*/
+TEST_READ_ZA (read_za64_s64_1_1_tied, svint64_t,
+	      z0 = svread_hor_za64_s64_m (z0, p0, 1, w0 + 1),
+	      z0 = svread_hor_za64_m (z0, p0, 1, w0 + 1))
+
+/*
+** read_za64_s64_7_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.d, p0/m, za7h\.d\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za64_s64_7_0_tied, svint64_t,
+	      z0 = svread_hor_za64_s64_m (z0, p0, 7, w0),
+	      z0 = svread_hor_za64_m (z0, p0, 7, w0))
+
+/*
+** read_za64_s64_7_1_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.d, p0/m, za7h\.d\[\1, 1\]
+**	ret
+*/
+TEST_READ_ZA (read_za64_s64_7_1_tied, svint64_t,
+	      z0 = svread_hor_za64_s64_m (z0, p0, 7, w0 + 1),
+	      z0 = svread_hor_za64_m (z0, p0, 7, w0 + 1))
+
+/*
+** read_za64_s64_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.d, p0/m, za0h\.d\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.d, p0/m, za0h\.d\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za64_s64_0_0_untied, svint64_t,
+	      z0 = svread_hor_za64_s64_m (z1, p0, 0, w0),
+	      z0 = svread_hor_za64_m (z1, p0, 0, w0))
+
+/*
+** read_za64_u64_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.d, p0/m, za0h\.d\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za64_u64_0_0_tied, svuint64_t,
+	      z0 = svread_hor_za64_u64_m (z0, p0, 0, w0),
+	      z0 = svread_hor_za64_m (z0, p0, 0, w0))
+
+/*
+** read_za64_u64_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.d, p0/m, za0h\.d\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.d, p0/m, za0h\.d\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za64_u64_0_0_untied, svuint64_t,
+	      z0 = svread_hor_za64_u64_m (z1, p0, 0, w0),
+	      z0 = svread_hor_za64_m (z1, p0, 0, w0))
+
+/*
+** read_za64_f64_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.d, p0/m, za0h\.d\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za64_f64_0_0_tied, svfloat64_t,
+	      z0 = svread_hor_za64_f64_m (z0, p0, 0, w0),
+	      z0 = svread_hor_za64_m (z0, p0, 0, w0))
+
+/*
+** read_za64_f64_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.d, p0/m, za0h\.d\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.d, p0/m, za0h\.d\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za64_f64_0_0_untied, svfloat64_t,
+	      z0 = svread_hor_za64_f64_m (z1, p0, 0, w0),
+	      z0 = svread_hor_za64_m (z1, p0, 0, w0))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_hor_za8.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_hor_za8.c
new file mode 100644
index 00000000000..0558aa0e583
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_hor_za8.c
@@ -0,0 +1,97 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sme_acle.h"
+
+/*
+** read_za8_s8_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.b, p0/m, za0h\.b\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za8_s8_0_0_tied, svint8_t,
+	      z0 = svread_hor_za8_s8_m (z0, p0, 0, w0),
+	      z0 = svread_hor_za8_m (z0, p0, 0, w0))
+
+/*
+** read_za8_s8_0_1_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.b, p0/m, za0h\.b\[\1, 1\]
+**	ret
+*/
+TEST_READ_ZA (read_za8_s8_0_1_tied, svint8_t,
+	      z0 = svread_hor_za8_s8_m (z0, p0, 0, w0 + 1),
+	      z0 = svread_hor_za8_m (z0, p0, 0, w0 + 1))
+
+/*
+** read_za8_s8_0_15_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.b, p0/m, za0h\.b\[\1, 15\]
+**	ret
+*/
+TEST_READ_ZA (read_za8_s8_0_15_tied, svint8_t,
+	      z0 = svread_hor_za8_s8_m (z0, p0, 0, w0 + 15),
+	      z0 = svread_hor_za8_m (z0, p0, 0, w0 + 15))
+
+/*
+** read_za8_s8_0_16_tied:
+**	add	(w1[2-5]), w0, #?16
+**	mova	z0\.b, p0/m, za0h\.b\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za8_s8_0_16_tied, svint8_t,
+	      z0 = svread_hor_za8_s8_m (z0, p0, 0, w0 + 16),
+	      z0 = svread_hor_za8_m (z0, p0, 0, w0 + 16))
+
+/*
+** read_za8_s8_0_m1_tied:
+**	sub	(w1[2-5]), w0, #?1
+**	mova	z0\.b, p0/m, za0h\.b\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za8_s8_0_m1_tied, svint8_t,
+	      z0 = svread_hor_za8_s8_m (z0, p0, 0, w0 - 1),
+	      z0 = svread_hor_za8_m (z0, p0, 0, w0 - 1))
+
+/*
+** read_za8_s8_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.b, p0/m, za0h\.b\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.b, p0/m, za0h\.b\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za8_s8_0_0_untied, svint8_t,
+	      z0 = svread_hor_za8_s8_m (z1, p0, 0, w0),
+	      z0 = svread_hor_za8_m (z1, p0, 0, w0))
+
+/*
+** read_za8_u8_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.b, p0/m, za0h\.b\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za8_u8_0_0_tied, svuint8_t,
+	      z0 = svread_hor_za8_u8_m (z0, p0, 0, w0),
+	      z0 = svread_hor_za8_m (z0, p0, 0, w0))
+
+/*
+** read_za8_u8_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.b, p0/m, za0h\.b\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.b, p0/m, za0h\.b\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za8_u8_0_0_untied, svuint8_t,
+	      z0 = svread_hor_za8_u8_m (z1, p0, 0, w0),
+	      z0 = svread_hor_za8_m (z1, p0, 0, w0))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_ver_za128.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_ver_za128.c
new file mode 100644
index 00000000000..177fa7a8124
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_ver_za128.c
@@ -0,0 +1,367 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sme_acle.h"
+
+/*
+** read_za128_s8_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0v\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_s8_0_0_tied, svint8_t,
+	      z0 = svread_ver_za128_s8_m (z0, p0, 0, w0),
+	      z0 = svread_ver_za128_m (z0, p0, 0, w0))
+
+/*
+** read_za128_s8_0_1_tied:
+**	add	(w1[2-5]), w0, #?1
+**	mova	z0\.q, p0/m, za0v\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_s8_0_1_tied, svint8_t,
+	      z0 = svread_ver_za128_s8_m (z0, p0, 0, w0 + 1),
+	      z0 = svread_ver_za128_m (z0, p0, 0, w0 + 1))
+
+/*
+** read_za128_s8_0_m1_tied:
+**	sub	(w1[2-5]), w0, #?1
+**	mova	z0\.q, p0/m, za0v\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_s8_0_m1_tied, svint8_t,
+	      z0 = svread_ver_za128_s8_m (z0, p0, 0, w0 - 1),
+	      z0 = svread_ver_za128_m (z0, p0, 0, w0 - 1))
+
+/*
+** read_za128_s8_1_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za1v\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_s8_1_0_tied, svint8_t,
+	      z0 = svread_ver_za128_s8_m (z0, p0, 1, w0),
+	      z0 = svread_ver_za128_m (z0, p0, 1, w0))
+
+/*
+** read_za128_s8_15_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za15v\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_s8_15_0_tied, svint8_t,
+	      z0 = svread_ver_za128_s8_m (z0, p0, 15, w0),
+	      z0 = svread_ver_za128_m (z0, p0, 15, w0))
+
+/*
+** read_za128_s8_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.q, p0/m, za0v\.q\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0v\.q\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za128_s8_0_0_untied, svint8_t,
+	      z0 = svread_ver_za128_s8_m (z1, p0, 0, w0),
+	      z0 = svread_ver_za128_m (z1, p0, 0, w0))
+
+/*
+** read_za128_u8_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0v\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_u8_0_0_tied, svuint8_t,
+	      z0 = svread_ver_za128_u8_m (z0, p0, 0, w0),
+	      z0 = svread_ver_za128_m (z0, p0, 0, w0))
+
+/*
+** read_za128_u8_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.q, p0/m, za0v\.q\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0v\.q\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za128_u8_0_0_untied, svuint8_t,
+	      z0 = svread_ver_za128_u8_m (z1, p0, 0, w0),
+	      z0 = svread_ver_za128_m (z1, p0, 0, w0))
+
+/*
+** read_za128_s16_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0v\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_s16_0_0_tied, svint16_t,
+	      z0 = svread_ver_za128_s16_m (z0, p0, 0, w0),
+	      z0 = svread_ver_za128_m (z0, p0, 0, w0))
+
+/*
+** read_za128_s16_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.q, p0/m, za0v\.q\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0v\.q\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za128_s16_0_0_untied, svint16_t,
+	      z0 = svread_ver_za128_s16_m (z1, p0, 0, w0),
+	      z0 = svread_ver_za128_m (z1, p0, 0, w0))
+
+/*
+** read_za128_u16_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0v\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_u16_0_0_tied, svuint16_t,
+	      z0 = svread_ver_za128_u16_m (z0, p0, 0, w0),
+	      z0 = svread_ver_za128_m (z0, p0, 0, w0))
+
+/*
+** read_za128_u16_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.q, p0/m, za0v\.q\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0v\.q\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za128_u16_0_0_untied, svuint16_t,
+	      z0 = svread_ver_za128_u16_m (z1, p0, 0, w0),
+	      z0 = svread_ver_za128_m (z1, p0, 0, w0))
+
+/*
+** read_za128_f16_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0v\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_f16_0_0_tied, svfloat16_t,
+	      z0 = svread_ver_za128_f16_m (z0, p0, 0, w0),
+	      z0 = svread_ver_za128_m (z0, p0, 0, w0))
+
+/*
+** read_za128_f16_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.q, p0/m, za0v\.q\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0v\.q\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za128_f16_0_0_untied, svfloat16_t,
+	      z0 = svread_ver_za128_f16_m (z1, p0, 0, w0),
+	      z0 = svread_ver_za128_m (z1, p0, 0, w0))
+
+/*
+** read_za128_bf16_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0v\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_bf16_0_0_tied, svbfloat16_t,
+	      z0 = svread_ver_za128_bf16_m (z0, p0, 0, w0),
+	      z0 = svread_ver_za128_m (z0, p0, 0, w0))
+
+/*
+** read_za128_bf16_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.q, p0/m, za0v\.q\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0v\.q\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za128_bf16_0_0_untied, svbfloat16_t,
+	      z0 = svread_ver_za128_bf16_m (z1, p0, 0, w0),
+	      z0 = svread_ver_za128_m (z1, p0, 0, w0))
+
+/*
+** read_za128_s32_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0v\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_s32_0_0_tied, svint32_t,
+	      z0 = svread_ver_za128_s32_m (z0, p0, 0, w0),
+	      z0 = svread_ver_za128_m (z0, p0, 0, w0))
+
+/*
+** read_za128_s32_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.q, p0/m, za0v\.q\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0v\.q\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za128_s32_0_0_untied, svint32_t,
+	      z0 = svread_ver_za128_s32_m (z1, p0, 0, w0),
+	      z0 = svread_ver_za128_m (z1, p0, 0, w0))
+
+/*
+** read_za128_u32_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0v\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_u32_0_0_tied, svuint32_t,
+	      z0 = svread_ver_za128_u32_m (z0, p0, 0, w0),
+	      z0 = svread_ver_za128_m (z0, p0, 0, w0))
+
+/*
+** read_za128_u32_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.q, p0/m, za0v\.q\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0v\.q\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za128_u32_0_0_untied, svuint32_t,
+	      z0 = svread_ver_za128_u32_m (z1, p0, 0, w0),
+	      z0 = svread_ver_za128_m (z1, p0, 0, w0))
+
+/*
+** read_za128_f32_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0v\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_f32_0_0_tied, svfloat32_t,
+	      z0 = svread_ver_za128_f32_m (z0, p0, 0, w0),
+	      z0 = svread_ver_za128_m (z0, p0, 0, w0))
+
+/*
+** read_za128_f32_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.q, p0/m, za0v\.q\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0v\.q\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za128_f32_0_0_untied, svfloat32_t,
+	      z0 = svread_ver_za128_f32_m (z1, p0, 0, w0),
+	      z0 = svread_ver_za128_m (z1, p0, 0, w0))
+
+/*
+** read_za128_s64_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0v\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_s64_0_0_tied, svint64_t,
+	      z0 = svread_ver_za128_s64_m (z0, p0, 0, w0),
+	      z0 = svread_ver_za128_m (z0, p0, 0, w0))
+
+/*
+** read_za128_s64_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.q, p0/m, za0v\.q\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0v\.q\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za128_s64_0_0_untied, svint64_t,
+	      z0 = svread_ver_za128_s64_m (z1, p0, 0, w0),
+	      z0 = svread_ver_za128_m (z1, p0, 0, w0))
+
+/*
+** read_za128_u64_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0v\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_u64_0_0_tied, svuint64_t,
+	      z0 = svread_ver_za128_u64_m (z0, p0, 0, w0),
+	      z0 = svread_ver_za128_m (z0, p0, 0, w0))
+
+/*
+** read_za128_u64_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.q, p0/m, za0v\.q\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0v\.q\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za128_u64_0_0_untied, svuint64_t,
+	      z0 = svread_ver_za128_u64_m (z1, p0, 0, w0),
+	      z0 = svread_ver_za128_m (z1, p0, 0, w0))
+
+/*
+** read_za128_f64_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0v\.q\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za128_f64_0_0_tied, svfloat64_t,
+	      z0 = svread_ver_za128_f64_m (z0, p0, 0, w0),
+	      z0 = svread_ver_za128_m (z0, p0, 0, w0))
+
+/*
+** read_za128_f64_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.q, p0/m, za0v\.q\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.q, p0/m, za0v\.q\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za128_f64_0_0_untied, svfloat64_t,
+	      z0 = svread_ver_za128_f64_m (z1, p0, 0, w0),
+	      z0 = svread_ver_za128_m (z1, p0, 0, w0))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_ver_za16.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_ver_za16.c
new file mode 100644
index 00000000000..ea67289f0ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_ver_za16.c
@@ -0,0 +1,171 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sme_acle.h"
+
+/*
+** read_za16_s16_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.h, p0/m, za0v\.h\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za16_s16_0_0_tied, svint16_t,
+	      z0 = svread_ver_za16_s16_m (z0, p0, 0, w0),
+	      z0 = svread_ver_za16_m (z0, p0, 0, w0))
+
+/*
+** read_za16_s16_0_1_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.h, p0/m, za0v\.h\[\1, 1\]
+**	ret
+*/
+TEST_READ_ZA (read_za16_s16_0_1_tied, svint16_t,
+	      z0 = svread_ver_za16_s16_m (z0, p0, 0, w0 + 1),
+	      z0 = svread_ver_za16_m (z0, p0, 0, w0 + 1))
+
+/*
+** read_za16_s16_0_7_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.h, p0/m, za0v\.h\[\1, 7\]
+**	ret
+*/
+TEST_READ_ZA (read_za16_s16_0_7_tied, svint16_t,
+	      z0 = svread_ver_za16_s16_m (z0, p0, 0, w0 + 7),
+	      z0 = svread_ver_za16_m (z0, p0, 0, w0 + 7))
+
+/*
+** read_za16_s16_0_8_tied:
+**	add	(w1[2-5]), w0, #?8
+**	mova	z0\.h, p0/m, za0v\.h\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za16_s16_0_8_tied, svint16_t,
+	      z0 = svread_ver_za16_s16_m (z0, p0, 0, w0 + 8),
+	      z0 = svread_ver_za16_m (z0, p0, 0, w0 + 8))
+
+/*
+** read_za16_s16_0_m1_tied:
+**	sub	(w1[2-5]), w0, #?1
+**	mova	z0\.h, p0/m, za0v\.h\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za16_s16_0_m1_tied, svint16_t,
+	      z0 = svread_ver_za16_s16_m (z0, p0, 0, w0 - 1),
+	      z0 = svread_ver_za16_m (z0, p0, 0, w0 - 1))
+
+/*
+** read_za16_s16_1_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.h, p0/m, za1v\.h\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za16_s16_1_0_tied, svint16_t,
+	      z0 = svread_ver_za16_s16_m (z0, p0, 1, w0),
+	      z0 = svread_ver_za16_m (z0, p0, 1, w0))
+
+/*
+** read_za16_s16_1_7_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.h, p0/m, za1v\.h\[\1, 7\]
+**	ret
+*/
+TEST_READ_ZA (read_za16_s16_1_7_tied, svint16_t,
+	      z0 = svread_ver_za16_s16_m (z0, p0, 1, w0 + 7),
+	      z0 = svread_ver_za16_m (z0, p0, 1, w0 + 7))
+
+/*
+** read_za16_s16_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.h, p0/m, za0v\.h\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.h, p0/m, za0v\.h\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za16_s16_0_0_untied, svint16_t,
+	      z0 = svread_ver_za16_s16_m (z1, p0, 0, w0),
+	      z0 = svread_ver_za16_m (z1, p0, 0, w0))
+
+/*
+** read_za16_u16_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.h, p0/m, za0v\.h\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za16_u16_0_0_tied, svuint16_t,
+	      z0 = svread_ver_za16_u16_m (z0, p0, 0, w0),
+	      z0 = svread_ver_za16_m (z0, p0, 0, w0))
+
+/*
+** read_za16_u16_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.h, p0/m, za0v\.h\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.h, p0/m, za0v\.h\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za16_u16_0_0_untied, svuint16_t,
+	      z0 = svread_ver_za16_u16_m (z1, p0, 0, w0),
+	      z0 = svread_ver_za16_m (z1, p0, 0, w0))
+
+/*
+** read_za16_f16_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.h, p0/m, za0v\.h\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za16_f16_0_0_tied, svfloat16_t,
+	      z0 = svread_ver_za16_f16_m (z0, p0, 0, w0),
+	      z0 = svread_ver_za16_m (z0, p0, 0, w0))
+
+/*
+** read_za16_f16_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.h, p0/m, za0v\.h\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.h, p0/m, za0v\.h\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za16_f16_0_0_untied, svfloat16_t,
+	      z0 = svread_ver_za16_f16_m (z1, p0, 0, w0),
+	      z0 = svread_ver_za16_m (z1, p0, 0, w0))
+
+/*
+** read_za16_bf16_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.h, p0/m, za0v\.h\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za16_bf16_0_0_tied, svbfloat16_t,
+	      z0 = svread_ver_za16_bf16_m (z0, p0, 0, w0),
+	      z0 = svread_ver_za16_m (z0, p0, 0, w0))
+
+/*
+** read_za16_bf16_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.h, p0/m, za0v\.h\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.h, p0/m, za0v\.h\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za16_bf16_0_0_untied, svbfloat16_t,
+	      z0 = svread_ver_za16_bf16_m (z1, p0, 0, w0),
+	      z0 = svread_ver_za16_m (z1, p0, 0, w0))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_ver_za32.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_ver_za32.c
new file mode 100644
index 00000000000..97d7a2d627b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_ver_za32.c
@@ -0,0 +1,164 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sme_acle.h"
+
+/*
+** read_za32_s32_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.s, p0/m, za0v\.s\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za32_s32_0_0_tied, svint32_t,
+	      z0 = svread_ver_za32_s32_m (z0, p0, 0, w0),
+	      z0 = svread_ver_za32_m (z0, p0, 0, w0))
+
+/*
+** read_za32_s32_0_1_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.s, p0/m, za0v\.s\[\1, 1\]
+**	ret
+*/
+TEST_READ_ZA (read_za32_s32_0_1_tied, svint32_t,
+	      z0 = svread_ver_za32_s32_m (z0, p0, 0, w0 + 1),
+	      z0 = svread_ver_za32_m (z0, p0, 0, w0 + 1))
+
+/*
+** read_za32_s32_0_3_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.s, p0/m, za0v\.s\[\1, 3\]
+**	ret
+*/
+TEST_READ_ZA (read_za32_s32_0_3_tied, svint32_t,
+	      z0 = svread_ver_za32_s32_m (z0, p0, 0, w0 + 3),
+	      z0 = svread_ver_za32_m (z0, p0, 0, w0 + 3))
+
+/*
+** read_za32_s32_0_4_tied:
+**	add	(w1[2-5]), w0, #?4
+**	mova	z0\.s, p0/m, za0v\.s\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za32_s32_0_4_tied, svint32_t,
+	      z0 = svread_ver_za32_s32_m (z0, p0, 0, w0 + 4),
+	      z0 = svread_ver_za32_m (z0, p0, 0, w0 + 4))
+
+/*
+** read_za32_s32_0_m1_tied:
+**	sub	(w1[2-5]), w0, #?1
+**	mova	z0\.s, p0/m, za0v\.s\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za32_s32_0_m1_tied, svint32_t,
+	      z0 = svread_ver_za32_s32_m (z0, p0, 0, w0 - 1),
+	      z0 = svread_ver_za32_m (z0, p0, 0, w0 - 1))
+
+/*
+** read_za32_s32_1_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.s, p0/m, za1v\.s\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za32_s32_1_0_tied, svint32_t,
+	      z0 = svread_ver_za32_s32_m (z0, p0, 1, w0),
+	      z0 = svread_ver_za32_m (z0, p0, 1, w0))
+
+/*
+** read_za32_s32_1_3_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.s, p0/m, za1v\.s\[\1, 3\]
+**	ret
+*/
+TEST_READ_ZA (read_za32_s32_1_3_tied, svint32_t,
+	      z0 = svread_ver_za32_s32_m (z0, p0, 1, w0 + 3),
+	      z0 = svread_ver_za32_m (z0, p0, 1, w0 + 3))
+
+/*
+** read_za32_s32_3_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.s, p0/m, za3v\.s\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za32_s32_3_0_tied, svint32_t,
+	      z0 = svread_ver_za32_s32_m (z0, p0, 3, w0),
+	      z0 = svread_ver_za32_m (z0, p0, 3, w0))
+
+/*
+** read_za32_s32_3_3_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.s, p0/m, za3v\.s\[\1, 3\]
+**	ret
+*/
+TEST_READ_ZA (read_za32_s32_3_3_tied, svint32_t,
+	      z0 = svread_ver_za32_s32_m (z0, p0, 3, w0 + 3),
+	      z0 = svread_ver_za32_m (z0, p0, 3, w0 + 3))
+
+/*
+** read_za32_s32_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.s, p0/m, za0v\.s\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.s, p0/m, za0v\.s\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za32_s32_0_0_untied, svint32_t,
+	      z0 = svread_ver_za32_s32_m (z1, p0, 0, w0),
+	      z0 = svread_ver_za32_m (z1, p0, 0, w0))
+
+/*
+** read_za32_u32_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.s, p0/m, za0v\.s\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za32_u32_0_0_tied, svuint32_t,
+	      z0 = svread_ver_za32_u32_m (z0, p0, 0, w0),
+	      z0 = svread_ver_za32_m (z0, p0, 0, w0))
+
+/*
+** read_za32_u32_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.s, p0/m, za0v\.s\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.s, p0/m, za0v\.s\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za32_u32_0_0_untied, svuint32_t,
+	      z0 = svread_ver_za32_u32_m (z1, p0, 0, w0),
+	      z0 = svread_ver_za32_m (z1, p0, 0, w0))
+
+/*
+** read_za32_f32_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.s, p0/m, za0v\.s\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za32_f32_0_0_tied, svfloat32_t,
+	      z0 = svread_ver_za32_f32_m (z0, p0, 0, w0),
+	      z0 = svread_ver_za32_m (z0, p0, 0, w0))
+
+/*
+** read_za32_f32_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.s, p0/m, za0v\.s\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.s, p0/m, za0v\.s\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za32_f32_0_0_untied, svfloat32_t,
+	      z0 = svread_ver_za32_f32_m (z1, p0, 0, w0),
+	      z0 = svread_ver_za32_m (z1, p0, 0, w0))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_ver_za64.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_ver_za64.c
new file mode 100644
index 00000000000..ce1348f147b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_ver_za64.c
@@ -0,0 +1,154 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sme_acle.h"
+
+/*
+** read_za64_s64_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.d, p0/m, za0v\.d\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za64_s64_0_0_tied, svint64_t,
+	      z0 = svread_ver_za64_s64_m (z0, p0, 0, w0),
+	      z0 = svread_ver_za64_m (z0, p0, 0, w0))
+
+/*
+** read_za64_s64_0_1_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.d, p0/m, za0v\.d\[\1, 1\]
+**	ret
+*/
+TEST_READ_ZA (read_za64_s64_0_1_tied, svint64_t,
+	      z0 = svread_ver_za64_s64_m (z0, p0, 0, w0 + 1),
+	      z0 = svread_ver_za64_m (z0, p0, 0, w0 + 1))
+
+/*
+** read_za64_s64_0_2_tied:
+**	add	(w1[2-5]), w0, #?2
+**	mova	z0\.d, p0/m, za0v\.d\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za64_s64_0_2_tied, svint64_t,
+	      z0 = svread_ver_za64_s64_m (z0, p0, 0, w0 + 2),
+	      z0 = svread_ver_za64_m (z0, p0, 0, w0 + 2))
+
+/*
+** read_za64_s64_0_m1_tied:
+**	sub	(w1[2-5]), w0, #?1
+**	mova	z0\.d, p0/m, za0v\.d\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za64_s64_0_m1_tied, svint64_t,
+	      z0 = svread_ver_za64_s64_m (z0, p0, 0, w0 - 1),
+	      z0 = svread_ver_za64_m (z0, p0, 0, w0 - 1))
+
+/*
+** read_za64_s64_1_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.d, p0/m, za1v\.d\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za64_s64_1_0_tied, svint64_t,
+	      z0 = svread_ver_za64_s64_m (z0, p0, 1, w0),
+	      z0 = svread_ver_za64_m (z0, p0, 1, w0))
+
+/*
+** read_za64_s64_1_1_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.d, p0/m, za1v\.d\[\1, 1\]
+**	ret
+*/
+TEST_READ_ZA (read_za64_s64_1_1_tied, svint64_t,
+	      z0 = svread_ver_za64_s64_m (z0, p0, 1, w0 + 1),
+	      z0 = svread_ver_za64_m (z0, p0, 1, w0 + 1))
+
+/*
+** read_za64_s64_7_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.d, p0/m, za7v\.d\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za64_s64_7_0_tied, svint64_t,
+	      z0 = svread_ver_za64_s64_m (z0, p0, 7, w0),
+	      z0 = svread_ver_za64_m (z0, p0, 7, w0))
+
+/*
+** read_za64_s64_7_1_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.d, p0/m, za7v\.d\[\1, 1\]
+**	ret
+*/
+TEST_READ_ZA (read_za64_s64_7_1_tied, svint64_t,
+	      z0 = svread_ver_za64_s64_m (z0, p0, 7, w0 + 1),
+	      z0 = svread_ver_za64_m (z0, p0, 7, w0 + 1))
+
+/*
+** read_za64_s64_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.d, p0/m, za0v\.d\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.d, p0/m, za0v\.d\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za64_s64_0_0_untied, svint64_t,
+	      z0 = svread_ver_za64_s64_m (z1, p0, 0, w0),
+	      z0 = svread_ver_za64_m (z1, p0, 0, w0))
+
+/*
+** read_za64_u64_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.d, p0/m, za0v\.d\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za64_u64_0_0_tied, svuint64_t,
+	      z0 = svread_ver_za64_u64_m (z0, p0, 0, w0),
+	      z0 = svread_ver_za64_m (z0, p0, 0, w0))
+
+/*
+** read_za64_u64_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.d, p0/m, za0v\.d\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.d, p0/m, za0v\.d\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za64_u64_0_0_untied, svuint64_t,
+	      z0 = svread_ver_za64_u64_m (z1, p0, 0, w0),
+	      z0 = svread_ver_za64_m (z1, p0, 0, w0))
+
+/*
+** read_za64_f64_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.d, p0/m, za0v\.d\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za64_f64_0_0_tied, svfloat64_t,
+	      z0 = svread_ver_za64_f64_m (z0, p0, 0, w0),
+	      z0 = svread_ver_za64_m (z0, p0, 0, w0))
+
+/*
+** read_za64_f64_0_0_untied:
+** (
+**	mov	(w1[2-5]), w0
+**	mov	z0\.d, z1\.d
+**	mova	z0\.d, p0/m, za0v\.d\[\1, 0\]
+** |
+**	mov	z0\.d, z1\.d
+**	mov	(w1[2-5]), w0
+**	mova	z0\.d, p0/m, za0v\.d\[\2, 0\]
+** )
+**	ret
+*/
+TEST_READ_ZA (read_za64_f64_0_0_untied, svfloat64_t,
+	      z0 = svread_ver_za64_f64_m (z1, p0, 0, w0),
+	      z0 = svread_ver_za64_m (z1, p0, 0, w0))
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_ver_za8.c b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_ver_za8.c
new file mode 100644
index 00000000000..916155f5261
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/acle-asm/read_ver_za8.c
@@ -0,0 +1,97 @@
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sme_acle.h"
+
+/*
+** read_za8_s8_0_0_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.b, p0/m, za0v\.b\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za8_s8_0_0_tied, svint8_t,
+	      z0 = svread_ver_za8_s8_m (z0, p0, 0, w0),
+	      z0 = svread_ver_za8_m (z0, p0, 0, w0))
+
+/*
+** read_za8_s8_0_1_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.b, p0/m, za0v\.b\[\1, 1\]
+**	ret
+*/
+TEST_READ_ZA (read_za8_s8_0_1_tied, svint8_t,
+	      z0 = svread_ver_za8_s8_m (z0, p0, 0, w0 + 1),
+	      z0 = svread_ver_za8_m (z0, p0, 0, w0 + 1))
+
+/*
+** read_za8_s8_0_15_tied:
+**	mov	(w1[2-5]), w0
+**	mova	z0\.b, p0/m, za0v\.b\[\1, 15\]
+**	ret
+*/
+TEST_READ_ZA (read_za8_s8_0_15_tied, svint8_t,
+	      z0 = svread_ver_za8_s8_m (z0, p0, 0, w0 + 15),
+	      z0 = svread_ver_za8_m (z0, p0, 0, w0 + 15))
+
+/*
+** read_za8_s8_0_16_tied:
+**	add	(w1[2-5]), w0, #?16
+**	mova	z0\.b, p0/m, za0v\.b\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za8_s8_0_16_tied, svint8_t,
+	      z0 = svread_ver_za8_s8_m (z0, p0, 0, w0 + 16),
+	      z0 = svread_ver_za8_m (z0, p0, 0, w0 + 16))
+
+/*
+** read_za8_s8_0_m1_tied:
+**	sub	(w1[2-5]), w0, #?1
+**	mova	z0\.b, p0/m, za0v\.b\[\1, 0\]
+**	ret
+*/
+TEST_READ_ZA (read_za8_s8_0_m1_tied, svint8_t,
+	      z0 = svread_ver_za8_s8_m (z0, p0, 0, w0 - 1),
+	      z0 = svread_ver_za8_m (z0, p0, 0, w0 - 1))