public inbox for gcc-patches@gcc.gnu.org
* [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions.
@ 2022-05-13 17:11 Tamar Christina
  2022-05-13 17:11 ` [PATCH 2/3]AArch64 Promote function arguments using a paradoxical subreg when beneficial Tamar Christina
From: Tamar Christina @ 2022-05-13 17:11 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jeffreyalaw, richard.sandiford


Hi All,

Some targets require function parameters to be promoted to a different
type at expand time because the target may not have native instructions
that operate on such types.  For example, the AArch64 port has no native
instructions that operate on 8- or 16-bit integer values, so it promotes
every parameter of these types to 32 bits.

A target may perform this promotion for two reasons:

1. For correctness.  This may be an ABI requirement (the AAPCS, for
   instance).
2. For efficiency.  By promoting the argument at expansion time we don't have
   to keep promoting the type back and forth after each operation on it,
   i.e. the promotion simplifies the RTL.

This patch adds the ability for a target to decide whether, during expansion,
to handle the promotion with an extend or with a paradoxical subreg.

A paradoxical subreg can be used when there are no correctness issues and you
still want the RTL efficiency of not performing the promotion constantly.

This also allows the target to avoid generating any code when the top bits
are not significant.
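
As a rough illustration (hand-written RTL with made-up pseudo numbers, not
actual compiler output), the two strategies for a QImode argument promoted
to SImode look like:

    ;; extend: all 32 bits of pseudo 98 are defined
    (set (reg:SI 98) (zero_extend:SI (reg:QI 97)))

    ;; paradoxical subreg: only the low 8 bits of pseudo 98 are defined
    (set (reg:SI 98) (subreg:SI (reg:QI 97) 0))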

For example, on AArch64 the extend in the following function is unneeded:

uint8_t fd2 (uint8_t xr){
    return xr + 1;
}

currently generates:

fd2:
        and     w0, w0, 255
        add     w0, w0, 1
        ret

instead of

fd2:
        add     w0, w0, #1
        ret
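
For contrast, a sketch of a case where the top bits are significant and the
masking has to stay (hypothetical function, not part of this patch):

uint32_t fd3 (uint8_t xr){
    return xr + 1;
}

Here the full 32-bit result depends on the real value of xr, so the
`and w0, w0, 255` cannot be removed.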

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
Bootstrapped on x86_64-pc-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* cfgexpand.cc (set_rtl): Check for function promotion.
	* tree-outof-ssa.cc (insert_value_copy_on_edge): Likewise.
	* function.cc (assign_parm_setup_reg): Likewise.
	* hooks.cc (hook_bool_mode_mode_int_tree_false,
	hook_bool_mode_mode_int_tree_true): New.
	* hooks.h (hook_bool_mode_mode_int_tree_false,
	hook_bool_mode_mode_int_tree_true): New.
	* target.def (promote_function_args_subreg_p): New.
	* doc/tm.texi: Document it.
	* doc/tm.texi.in: Likewise.

--- inline copy of patch -- 
diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index d3cc77d2ca98f620b29623fc5696410bad9bc184..df95184cfa185312c2a46cb92daa051718d9f4f3 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -206,14 +206,20 @@ set_rtl (tree t, rtx x)
      have to compute it ourselves.  For RESULT_DECLs, we accept mode
      mismatches too, as long as we have BLKmode or are not coalescing
      across variables, so that we don't reject BLKmode PARALLELs or
-     unpromoted REGs.  */
+     unpromoted REGs.  For promoted types that result in a paradoxical
+     subreg, also accept the argument.  */
   gcc_checking_assert (!x || x == pc_rtx || TREE_CODE (t) != SSA_NAME
 		       || (SSAVAR (t)
 			   && TREE_CODE (SSAVAR (t)) == RESULT_DECL
 			   && (promote_ssa_mode (t, NULL) == BLKmode
 			       || !flag_tree_coalesce_vars))
 		       || !use_register_for_decl (t)
-		       || GET_MODE (x) == promote_ssa_mode (t, NULL));
+		       || GET_MODE (x) == promote_ssa_mode (t, NULL)
+		       || targetm.calls.promote_function_args_subreg_p (
+					  GET_MODE (x),
+					  promote_ssa_mode (t, NULL),
+					  TYPE_UNSIGNED (TREE_TYPE (t)),
+					  SSAVAR (t)));
 
   if (x)
     {
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 2f92d37da8c0091e9879a493cfe8a361eb1d9299..6314cd83a2488dc225d4a1a15599e8e51e639f7f 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -3906,6 +3906,15 @@ cases of mismatch, it also makes for better code on certain machines.
 The default is to not promote prototypes.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P (machine_mode @var{mode}, machine_mode @var{promoted_mode}, int @var{unsignedp}, tree @var{v})
+When a function argument is promoted with @code{PROMOTE_MODE}, this
+hook is used to determine whether all the bits of the promoted type are
+significant in the expression pointed to by @var{v}.  If they are, an
+extend is generated; if they are not, a paradoxical subreg is created
+for the argument from @code{mode} to @code{promoted_mode}.
+The default is to promote using an extend.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_PUSH_ARGUMENT (unsigned int @var{npush})
 This target hook returns @code{true} if push instructions will be
 used to pass outgoing arguments.  When the push instruction usage is
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f869ddd5e5b8b7acbd8e9765fb103af24a1085b6..35f955803ec0a5a93be18a028fa1043f90858982 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3103,6 +3103,8 @@ control passing certain arguments in registers.
 
 @hook TARGET_PROMOTE_PROTOTYPES
 
+@hook TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P
+
 @hook TARGET_PUSH_ARGUMENT
 
 @defmac PUSH_ARGS_REVERSED
diff --git a/gcc/function.cc b/gcc/function.cc
index d5ed51a6a663a1ef472f5b1c090543f359c18f42..92f469bfd5d1ebfb09cc94d9be854715cd2f90f8 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -3161,7 +3161,7 @@ assign_parm_setup_reg (struct assign_parm_data_all *all, tree parm,
   machine_mode promoted_nominal_mode;
   int unsignedp = TYPE_UNSIGNED (TREE_TYPE (parm));
   bool did_conversion = false;
-  bool need_conversion, moved;
+  bool need_conversion, moved, use_subregs;
   enum insn_code icode;
   rtx rtl;
 
@@ -3172,7 +3172,20 @@ assign_parm_setup_reg (struct assign_parm_data_all *all, tree parm,
     = promote_function_mode (data->nominal_type, data->nominal_mode, &unsignedp,
 			     TREE_TYPE (current_function_decl), 2);
 
-  parmreg = gen_reg_rtx (promoted_nominal_mode);
+  /* Check to see how the target wants the promotion of function arguments to
+     be handled.  */
+  use_subregs
+    = targetm.calls.promote_function_args_subreg_p (data->nominal_mode,
+						    promoted_nominal_mode,
+						    unsignedp, parm);
+
+  /* If we're promoting using a paradoxical subreg then we need to keep using
+     the unpromoted type because that's the only fully defined value.  */
+  if (use_subregs)
+    parmreg = gen_reg_rtx (data->nominal_mode);
+  else
+    parmreg = gen_reg_rtx (promoted_nominal_mode);
+
   if (!DECL_ARTIFICIAL (parm))
     mark_user_reg (parmreg);
 
@@ -3256,9 +3269,19 @@ assign_parm_setup_reg (struct assign_parm_data_all *all, tree parm,
 	    }
 	  else
 	    t = op1;
-	  rtx_insn *pat = gen_extend_insn (op0, t, promoted_nominal_mode,
-					   data->passed_mode, unsignedp);
-	  emit_insn (pat);
+
+	  /* Promote the argument itself now if a target wants it.  This
+	     prevents unneeded back-and-forth conversions in RTL between
+	     the original and promoted type.  */
+	  if (use_subregs)
+	    emit_move_insn (op0, lowpart_subreg (promoted_nominal_mode, t,
+						 data->nominal_mode));
+	  else
+	    {
+	      rtx_insn *pat = gen_extend_insn (op0, t, promoted_nominal_mode,
+					       data->passed_mode, unsignedp);
+	      emit_insn (pat);
+	    }
 	  insns = get_insns ();
 
 	  moved = true;
diff --git a/gcc/hooks.h b/gcc/hooks.h
index 1056e1e9e4dc3e6ce298557351047caa2f84227f..8d68de5cdb9adaea0a79ebf6de599f66b40aa67a 100644
--- a/gcc/hooks.h
+++ b/gcc/hooks.h
@@ -31,6 +31,8 @@ extern bool hook_bool_const_int_const_int_true (const int, const int);
 extern bool hook_bool_mode_false (machine_mode);
 extern bool hook_bool_mode_true (machine_mode);
 extern bool hook_bool_mode_mode_true (machine_mode, machine_mode);
+extern bool hook_bool_mode_mode_int_tree_false (machine_mode, machine_mode, int, tree);
+extern bool hook_bool_mode_mode_int_tree_true (machine_mode, machine_mode, int, tree);
 extern bool hook_bool_mode_const_rtx_false (machine_mode, const_rtx);
 extern bool hook_bool_mode_const_rtx_true (machine_mode, const_rtx);
 extern bool hook_bool_mode_rtx_false (machine_mode, rtx);
diff --git a/gcc/hooks.cc b/gcc/hooks.cc
index b29233f4f852fb81ede75a5065d743cd16cc9219..7647774f9e8efbbe13d5607e4a4b2f1c9d22f045 100644
--- a/gcc/hooks.cc
+++ b/gcc/hooks.cc
@@ -89,6 +89,22 @@ hook_bool_mode_mode_true (machine_mode, machine_mode)
   return true;
 }
 
+/* Generic hook that takes (machine_mode, machine_mode, int, tree) and
+   returns false.  */
+bool
+hook_bool_mode_mode_int_tree_false (machine_mode, machine_mode, int, tree)
+{
+  return false;
+}
+
+/* Generic hook that takes (machine_mode, machine_mode, int, tree) and
+   returns true.  */
+bool
+hook_bool_mode_mode_int_tree_true (machine_mode, machine_mode, int, tree)
+{
+  return true;
+}
+
 /* Generic hook that takes (machine_mode, const_rtx) and returns false.  */
 bool
 hook_bool_mode_const_rtx_false (machine_mode, const_rtx)
diff --git a/gcc/target.def b/gcc/target.def
index 72c2e1ef756cf70a1c92abe81f8a6577eaaa2501..bdbacf8c5fd7b0626a37951f6f6ec649f3194977 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4561,6 +4561,17 @@ The default is to not promote prototypes.",
  bool, (const_tree fntype),
  hook_bool_const_tree_false)
 
+DEFHOOK
+(promote_function_args_subreg_p,
+ "When a function argument is promoted with @code{PROMOTE_MODE} then this\n\
+hook is used to determine whether the bits of the promoted type are all\n\
+significant in the expression pointed to by V.  If they are an extend is\n\
+generated, if they are not a paradoxical subreg is created for the argument\n\
+from @code{mode} to @code{promoted_mode}.\n\
+The default is to promote using an extend.",
+ bool, (machine_mode mode, machine_mode promoted_mode, int unsignedp, tree v),
+ hook_bool_mode_mode_int_tree_false)
+
 DEFHOOK
 (struct_value_rtx,
  "This target hook should return the location of the structure value\n\
diff --git a/gcc/tree-outof-ssa.cc b/gcc/tree-outof-ssa.cc
index ec883126ad86d86a2c2dafee4592b8d83e2ed917..0f437023983baa0f23da25221f7bce8fc559a8b8 100644
--- a/gcc/tree-outof-ssa.cc
+++ b/gcc/tree-outof-ssa.cc
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-coalesce.h"
 #include "tree-outof-ssa.h"
 #include "dojump.h"
+#include "target.h"
 
 /* FIXME: A lot of code here deals with expanding to RTL.  All that code
    should be in cfgexpand.cc.  */
@@ -333,7 +334,10 @@ insert_value_copy_on_edge (edge e, int dest, tree src, location_t locus)
   dest_mode = GET_MODE (dest_rtx);
   gcc_assert (src_mode == TYPE_MODE (TREE_TYPE (name)));
   gcc_assert (!REG_P (dest_rtx)
-	      || dest_mode == promote_ssa_mode (name, &unsignedp));
+	      || dest_mode == promote_ssa_mode (name, &unsignedp)
+	      || targetm.calls.promote_function_args_subreg_p (
+			dest_mode, promote_ssa_mode (name, NULL), unsignedp,
+			name));
 
   if (src_mode != dest_mode)
     {





* [PATCH 2/3]AArch64 Promote function arguments using a paradoxical subreg when beneficial.
  2022-05-13 17:11 [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions Tamar Christina
@ 2022-05-13 17:11 ` Tamar Christina
  2022-10-27  3:15   ` Andrew Pinski
  2022-05-13 17:12 ` [PATCH 3/3]AArch64 Update the testsuite to remove xfails Tamar Christina
From: Tamar Christina @ 2022-05-13 17:11 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov,
	richard.sandiford


Hi All,

On AArch64, PROMOTE_MODE always promotes 8- and 16-bit parameters to 32 bits.
This promotion is not required by the ABI, which states:


```
C.9	If the argument is an Integral or Pointer Type, the size of the argument is
less than or equal to 8 bytes and the NGRN is less than 8, the argument is
copied to the least significant bits in x[NGRN]. The NGRN is incremented by one.
The argument has now been allocated.

C.16	If the size of the argument is less than 8 bytes then the size of the
argument is set to 8 bytes. The effect is as if the argument was copied to the
least significant bits of a 64-bit register and the remaining bits filled with
unspecified values
```

That is, the bits in the registers are unspecified and callees cannot rely
on them having any particular value.
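
A minimal sketch of the consequence for a callee (hypothetical function;
hand-written asm in the comment, not compiler output):

uint8_t is_big (uint8_t x)
{
  /* The caller may leave bits 8-63 of w0 unspecified, so the compare
     must only see the low 8 bits, roughly:
	 and	w0, w0, 255
	 cmp	w0, 4
	 cset	w0, hi
	 ret  */
  return x > 4;
}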

This means that we can avoid the promotion and still generate correct code,
as the language-level promotion rules require values to be extended whenever
the bits are significant.

So if we are e.g. OR-ing two 8-bit values, no extend is needed as the top
bits are irrelevant.  If we are doing e.g. an addition, then the top bits
*might* be relevant depending on the result type, but the middle end will
always contain the appropriate extend in those cases.
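
As a sketch of the two situations (hypothetical functions, expected codegen
noted in the comments):

/* Top bits irrelevant: no extend is needed, a plain orr suffices.  */
uint8_t g1 (uint8_t a, uint8_t b){
    return a | b;
}

/* The result type is wider than the operands, so the top bits matter and
   the middle end keeps the extends (and w0/w1 with 255) before the add.  */
uint16_t g2 (uint8_t a, uint8_t b){
    return a + b;
}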

The middle end also has optimizations built around this assumption, and the
AArch64 port actively undoes them.

So for instance

uint16_t fd (uint8_t xr){
    return xr + 1;
}

uint8_t fd2 (uint8_t xr){
    return xr + 1;
}

should produce

fd:                                     // @fd
        and     w8, w0, #0xff
        add     w0, w8, #1
        ret
fd2:                                    // @fd2
        add     w0, w0, #1
        ret

like Clang does, instead of

fd:
        and     w0, w0, 255
        add     w0, w0, 1
        ret
fd2:
        and     w0, w0, 255
        add     w0, w0, 1
        ret

like we do now.  Removing this forced promotion maintains correctness and
fixes various codegen defects.  It also brings us in line with Clang.

Note that C, C++, Fortran etc. all correctly specify what should happen
w.r.t. extends for e.g. array indexing, pointer arithmetic etc., so we never
generate incorrect code.

There is however a second reason for doing this promotion: RTL efficiency.
The promotion saves us from having to promote the values to SImode in order
to use them in instructions and then truncate them again afterwards.

To get both the efficiency and the simpler RTL we can instead promote to a
paradoxical subreg.  This patch implements the hook for AArch64 and adds an
explicit opt-out for values that feed into comparisons.  This is done because:

1. Our comparison patterns already allow us to absorb the zero extend.
2. The extension allows us to use cbz/cbnz/tbz etc.  In some cases, such as

int foo (char a, char b)
{
   if (a)
     {
       if (b)
	 bar1 ();
       else
	 ...
     }
   else
     {
       if (b)
	 bar2 ();
       else
	 ...
     }
}

by zero-extending the value we can avoid having to test it repeatedly before
each branch.  Allowing the zero extend also allows our existing `ands`
patterns to work as expected.
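
As a rough sketch of the intent for the nested-if case above (hand-written
asm, not compiler output; prologue/epilogue, register saves and labels are
assumptions), keeping the extend lets every later test be a bare cbz/cbnz
on a saved register:

foo:
	...
	and	w19, w0, 255	// extend a once on entry
	and	w20, w1, 255	// likewise b
	cbz	w19, .Louter_else
	cbz	w20, .Linner_else
	bl	bar1		// w19/w20 survive the call, no re-masking
	...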

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
This patch and the last patch in the series have to be committed together,
but for ease of review I have split them up here.  They fix 209
missed-optimization xfails.

No performance difference on SPEC CPU 2017, and no failures.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_promote_function_args_subreg_p): New.
	(TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P): New.
	* config/aarch64/aarch64.h (PROMOTE_MODE): Expand doc.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/apc-subreg.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index efa46ac0b8799b5849b609d591186e26e5cb37ff..cc74a816fcc6458aa065246a30a4d2184692ad74 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -34,7 +34,8 @@
 
 #define REGISTER_TARGET_PRAGMAS() aarch64_register_pragmas ()
 
-/* Target machine storage layout.  */
+/* Target machine storage layout.  See also
+   TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P.  */
 
 #define PROMOTE_MODE(MODE, UNSIGNEDP, TYPE)	\
   if (GET_MODE_CLASS (MODE) == MODE_INT		\
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 2f559600cff55af9d468e8d0810545583cc986f5..252d6c2af72afc1dfee1a86644a5753784b41f59 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -3736,6 +3736,57 @@ aarch64_array_mode_supported_p (machine_mode mode,
   return false;
 }
 
+/* Implement target hook TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P to complement
+   PROMOTE_MODE.  Return true if promotion should use a paradoxical subreg.  */
+static bool
+aarch64_promote_function_args_subreg_p (machine_mode mode,
+					machine_mode promoted_mode,
+					int /* unsignedp */, tree parm)
+{
+  bool candidate_p = GET_MODE_CLASS (mode) == MODE_INT
+		     && GET_MODE_CLASS (promoted_mode) == MODE_INT
+		     && known_lt (GET_MODE_SIZE (mode), 4)
+		     && promoted_mode == SImode;
+
+  if (!candidate_p)
+    return false;
+
+  if (!parm || !is_gimple_reg (parm))
+    return true;
+
+  tree var = parm;
+  if (!VAR_P (var))
+    {
+      if (TREE_CODE (parm) == SSA_NAME
+	   && !(var = SSA_NAME_VAR (var)))
+	return true;
+      else if (TREE_CODE (parm) != PARM_DECL)
+	return true;
+    }
+
+  /* If the variable is used inside a comparison which sets CC then we
+     should still promote using an extend.  Doing so makes it easier to
+     use cbz/cbnz, and avoids repeatedly having to test the value in
+     cases such as nested ifs that test the same value with calls in
+     between.  */
+  tree ssa_var = ssa_default_def (cfun, var);
+  if (!ssa_var)
+    return true;
+
+  const ssa_use_operand_t *const head = &(SSA_NAME_IMM_USE_NODE (ssa_var));
+  const ssa_use_operand_t *ptr;
+
+  for (ptr = head->next; ptr != head; ptr = ptr->next)
+    if (USE_STMT (ptr) && is_gimple_assign (USE_STMT (ptr)))
+      {
+	tree_code code = gimple_assign_rhs_code (USE_STMT (ptr));
+	if (TREE_CODE_CLASS (code) == tcc_comparison)
+	  return false;
+      }
+
+  return true;
+}
+
 /* MODE is some form of SVE vector mode.  For data modes, return the number
    of vector register bits that each element of MODE occupies, such as 64
    for both VNx2DImode and VNx2SImode (where each 32-bit value is stored
@@ -27490,6 +27541,10 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_ARRAY_MODE_SUPPORTED_P
 #define TARGET_ARRAY_MODE_SUPPORTED_P aarch64_array_mode_supported_p
 
+#undef TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P
+#define TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P \
+  aarch64_promote_function_args_subreg_p
+
 #undef TARGET_VECTORIZE_CREATE_COSTS
 #define TARGET_VECTORIZE_CREATE_COSTS aarch64_vectorize_create_costs
 
diff --git a/gcc/testsuite/gcc.target/aarch64/apc-subreg.c b/gcc/testsuite/gcc.target/aarch64/apc-subreg.c
new file mode 100644
index 0000000000000000000000000000000000000000..2d7563a11ce11fa677f7ad4bf2a090e6a136e4d9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/apc-subreg.c
@@ -0,0 +1,103 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#include <stdint.h>
+
+/*
+** f0:
+**	mvn	w0, w0
+**	ret
+*/
+uint8_t f0 (uint8_t xr){
+    return (uint8_t) (0xff - xr);
+}
+
+/*
+** f1:
+**	mvn	w0, w0
+**	ret
+*/
+int8_t f1 (int8_t xr){
+    return (int8_t) (0xff - xr);
+}
+
+/*
+** f2:
+**	mvn	w0, w0
+**	ret
+*/
+uint16_t f2 (uint16_t xr){
+    return (uint16_t) (0xffFF - xr);
+}
+
+/*
+** f3:
+**	mvn	w0, w0
+**	ret
+*/
+uint32_t f3 (uint32_t xr){
+    return (uint32_t) (0xffFFffff - xr);
+}
+
+/*
+** f4:
+**	mvn	x0, x0
+**	ret
+*/
+uint64_t f4 (uint64_t xr){
+    return (uint64_t) (0xffFFffffffffffff - xr);
+}
+
+/*
+** f5:
+**	mvn	w0, w0
+**	sub	w0, w0, w1
+**	ret
+*/
+uint8_t f5 (uint8_t xr, uint8_t xc){
+    return (uint8_t) (0xff - xr - xc);
+}
+
+/*
+** f6:
+**	mvn	w0, w0
+**	and	w0, w0, 255
+**	and	w1, w1, 255
+**	mul	w0, w0, w1
+**	ret
+*/
+uint16_t f6 (uint8_t xr, uint8_t xc){
+    return ((uint8_t) (0xff - xr)) * xc;
+}
+
+/*
+** f7:
+**	and	w0, w0, 255
+**	and	w1, w1, 255
+**	mul	w0, w0, w1
+**	ret
+*/
+uint16_t f7 (uint8_t xr, uint8_t xc){
+    return xr * xc;
+}
+
+/*
+** f8:
+**	mul	w0, w0, w1
+**	and	w0, w0, 255
+**	ret
+*/
+uint16_t f8 (uint8_t xr, uint8_t xc){
+    return (uint8_t)(xr * xc);
+}
+
+/*
+** f9:
+**	and	w0, w0, 255
+**	add	w0, w0, w1
+**	ret
+*/
+uint16_t f9 (uint8_t xr, uint16_t xc){
+    return xr + xc;
+}





* [PATCH 3/3]AArch64 Update the testsuite to remove xfails.
  2022-05-13 17:11 [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions Tamar Christina
  2022-05-13 17:11 ` [PATCH 2/3]AArch64 Promote function arguments using a paradoxical subreg when beneficial Tamar Christina
@ 2022-05-13 17:12 ` Tamar Christina
  2022-05-16  6:31 ` [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions Richard Biener
  2022-05-16 11:36 ` Richard Sandiford
From: Tamar Christina @ 2022-05-13 17:12 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov,
	richard.sandiford


Hi All,

This removes the xfails from the tests below, which now all pass.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues; this
fixes 209 SVE xfails.

Ok for master?

Thanks,
Tamar

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/acle/asm/abd_s16.c: Remove xfail.
	* gcc.target/aarch64/sve/acle/asm/abd_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/abd_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/abd_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/add_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/add_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/add_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/add_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/and_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/and_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/and_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/and_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/asr_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/asr_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/bic_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/bic_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/bic_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/bic_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/dot_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/dot_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/dot_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/dot_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/eor_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/eor_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/eor_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/eor_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsr_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsr_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mad_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mad_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mad_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mad_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/max_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/max_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/max_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/max_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/min_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/min_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/min_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/min_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mla_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mla_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mla_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mla_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mls_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mls_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mls_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mls_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/msb_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/msb_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/msb_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/msb_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mul_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mul_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mulh_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mulh_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mulh_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mulh_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/orr_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/orr_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/orr_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/orr_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/scale_f16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/sub_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/sub_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/sub_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/sub_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/subr_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/subr_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/subr_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/subr_u8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/bcax_s16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/bcax_s8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/bcax_u16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/bcax_u8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/qadd_s16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/qadd_s8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/qadd_u16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/qadd_u8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/qdmlalb_s16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/qdmlalb_s32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/qdmlalbt_s16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/qdmlalbt_s32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/qsub_s16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/qsub_s8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/qsub_u16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/qsub_u8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/qsubr_s16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/qsubr_s8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/qsubr_u16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/qsubr_u8.c: Likewise.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/abd_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/abd_s16.c
index e2d0c0fb7ef3f9cd6f232bb5da7f5a46205a093f..030d952ecda6da44b2351d48daa850fc7b5033b7 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/abd_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/abd_s16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (abd_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svabd_m (p0, z0, x0))
 
 /*
-** abd_w0_s16_m_untied: { xfail *-*-* }
+** abd_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	sabd	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/abd_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/abd_s8.c
index 49a2cc388f960848e15680219c098fdd0ab21671..e1a74a16f6f745966889f90e82e4da473bd1e706 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/abd_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/abd_s8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (abd_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svabd_m (p0, z0, x0))
 
 /*
-** abd_w0_s8_m_untied: { xfail *-*-* }
+** abd_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	sabd	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/abd_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/abd_u16.c
index 60aa9429ea62b41c2ae098b4f2cf6c5357fdf9af..f763da3ec941d334009c1cfe6c80861e4d15cd33 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/abd_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/abd_u16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (abd_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svabd_m (p0, z0, x0))
 
 /*
-** abd_w0_u16_m_untied: { xfail *-*-* }
+** abd_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	uabd	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/abd_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/abd_u8.c
index 454ef153cc3c51b3595525e76ea0a0d7ca70805b..f46f1943484f001a5e9babdba8cb8b9160fef314 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/abd_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/abd_u8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (abd_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svabd_m (p0, z0, x0))
 
 /*
-** abd_w0_u8_m_untied: { xfail *-*-* }
+** abd_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	uabd	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/add_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/add_s16.c
index c0883edf9ab4eedf5dca5104e1443b614397cad5..7dd48c778634bd958445cf9bfa29e54fa38b3921 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/add_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/add_s16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (add_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svadd_m (p0, z0, x0))
 
 /*
-** add_w0_s16_m_untied: { xfail *-*-* }
+** add_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	add	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/add_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/add_s8.c
index 0889c189d59699e6fdbc1c91f8a36ab0f9296f52..6775f26cfb24dfaa1c5d60302ae5eb1766102def 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/add_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/add_s8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (add_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svadd_m (p0, z0, x0))
 
 /*
-** add_w0_s8_m_untied: { xfail *-*-* }
+** add_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	add	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/add_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/add_u16.c
index 25cb90353d3b852334885c6eb306bb2e97452c2e..f0fafdb1c2dc526f2b9199a451a62975111e53a8 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/add_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/add_u16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (add_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svadd_m (p0, z0, x0))
 
 /*
-** add_w0_u16_m_untied: { xfail *-*-* }
+** add_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	add	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/add_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/add_u8.c
index 06b68c97ce8c84494c960f28b2ae4c5dcafd0d03..2bdb7a85f56a8c1969e9390b6ba7c0eb10336c97 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/add_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/add_u8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (add_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svadd_m (p0, z0, x0))
 
 /*
-** add_w0_u8_m_untied: { xfail *-*-* }
+** add_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	add	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/and_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/and_s16.c
index d54613e915d221b02d957cf57c44d690328cb430..15e5d58334455a75a42eee29e2aaf1eb89e75acd 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/and_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/and_s16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (and_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svand_m (p0, z0, x0))
 
 /*
-** and_w0_s16_m_untied: { xfail *-*-* }
+** and_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	and	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/and_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/and_s8.c
index 61d168d3fdf8968295724cdb0e63fcb087d30d27..197c3011cee8d17725ed0e42557745325b4cfa2d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/and_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/and_s8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (and_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svand_m (p0, z0, x0))
 
 /*
-** and_w0_s8_m_untied: { xfail *-*-* }
+** and_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	and	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/and_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/and_u16.c
index 875a08d71d1822c54b3d549ce46e6c951ef38bc6..8210fba2f6382e493fba46f7b087b6de860f7f66 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/and_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/and_u16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (and_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svand_m (p0, z0, x0))
 
 /*
-** and_w0_u16_m_untied: { xfail *-*-* }
+** and_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	and	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/and_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/and_u8.c
index b0f1c9529f05d614a1b37bdba68caf843ae12f65..5d3fd60382ffe4016f4539f347a1e34045a4e827 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/and_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/and_u8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (and_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svand_m (p0, z0, x0))
 
 /*
-** and_w0_u8_m_untied: { xfail *-*-* }
+** and_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	and	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_s16.c
index 877bf10685a4b29cc0c6e067ce0cec61530df59f..f9ce790da95e655ac73389ccdc36b9e16cdfebb9 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_s16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (asr_w0_s16_m_tied1, svint16_t, uint16_t,
 		 z0 = svasr_m (p0, z0, x0))
 
 /*
-** asr_w0_s16_m_untied: { xfail *-*-* }
+** asr_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	asr	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_s8.c
index 992e93fdef7a6a425d247229ebc14801caf02ef0..5cf3a712c282534031e1dbf301f82058467da4ec 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_s8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (asr_w0_s8_m_tied1, svint8_t, uint8_t,
 		 z0 = svasr_m (p0, z0, x0))
 
 /*
-** asr_w0_s8_m_untied: { xfail *-*-* }
+** asr_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	asr	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s16.c
index c80f5697f5f475cb052527b2a0af5c37b794ba3b..9cf22ebe7fc37a12f9ef3d0a5c76b9c7db82d0ae 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (bic_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svbic_m (p0, z0, x0))
 
 /*
-** bic_w0_s16_m_untied: { xfail *-*-* }
+** bic_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	bic	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s8.c
index 0958a34039394d79ecd00ac3c855c01cecbce86a..6795716810c8ba79da1c4e96d8eeef1cfdd1dc40 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (bic_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svbic_m (p0, z0, x0))
 
 /*
-** bic_w0_s8_m_untied: { xfail *-*-* }
+** bic_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	bic	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u16.c
index 30209ffb418f410094afd9df594dad98aa380a5d..5409954caff45d6c12bd13c4e090712f72e8da0e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (bic_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svbic_m (p0, z0, x0))
 
 /*
-** bic_w0_u16_m_untied: { xfail *-*-* }
+** bic_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	bic	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u8.c
index 80c489b9cdb2b6c6dfd3ab5377bc902f8ef86f6d..a0fbae4796259694b6918a88d9743c4edb9f390d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (bic_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svbic_m (p0, z0, x0))
 
 /*
-** bic_w0_u8_m_untied: { xfail *-*-* }
+** bic_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	bic	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/dot_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/dot_s32.c
index 605bd1b30f25f4b2b2bb9e923a9e7dfee8784b07..e9603adf20b79bb96cb1f8ea7e804c308212b319 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/dot_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/dot_s32.c
@@ -54,7 +54,7 @@ TEST_DUAL_ZX (dot_w0_s32_tied1, svint32_t, svint8_t, int8_t,
 	      z0 = svdot (z0, z4, x0))
 
 /*
-** dot_w0_s32_untied: { xfail *-*-* }
+** dot_w0_s32_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	sdot	z0\.s, z4\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/dot_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/dot_s64.c
index b6574740b7e7a9a9ff5f79568d8590d2744ad7a4..7a13c62c535150d827e4247de3151588e61d9e68 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/dot_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/dot_s64.c
@@ -54,7 +54,7 @@ TEST_DUAL_ZX (dot_w0_s64_tied1, svint64_t, svint16_t, int16_t,
 	      z0 = svdot (z0, z4, x0))
 
 /*
-** dot_w0_s64_untied: { xfail *-*-* }
+** dot_w0_s64_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	sdot	z0\.d, z4\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/dot_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/dot_u32.c
index 541e71cc212e7ce0d96a550c373c4c30a0db58ea..e46cac469d98fc59408296676566c61f88fc1c77 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/dot_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/dot_u32.c
@@ -54,7 +54,7 @@ TEST_DUAL_ZX (dot_w0_u32_tied1, svuint32_t, svuint8_t, uint8_t,
 	      z0 = svdot (z0, z4, x0))
 
 /*
-** dot_w0_u32_untied: { xfail *-*-* }
+** dot_w0_u32_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	udot	z0\.s, z4\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/dot_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/dot_u64.c
index cc0e853737df001f4d46c0a12edad45dd568f745..67576c8da171ff61bb32db4f89b2dd3074be48e0 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/dot_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/dot_u64.c
@@ -54,7 +54,7 @@ TEST_DUAL_ZX (dot_w0_u64_tied1, svuint64_t, svuint16_t, uint16_t,
 	      z0 = svdot (z0, z4, x0))
 
 /*
-** dot_w0_u64_untied: { xfail *-*-* }
+** dot_w0_u64_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	udot	z0\.d, z4\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/eor_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/eor_s16.c
index 7cf73609a1aa188e0385838803fe75f45014a1bc..123cdfb0a7a74be73be8bea5b21ec2593a21f96a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/eor_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/eor_s16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (eor_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = sveor_m (p0, z0, x0))
 
 /*
-** eor_w0_s16_m_untied: { xfail *-*-* }
+** eor_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	eor	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/eor_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/eor_s8.c
index 083ac2dde06e43dbdcdcf441c7d50ea77ac5c9cd..5da5908ac36bb729564e2e810a43965bd0913be3 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/eor_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/eor_s8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (eor_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = sveor_m (p0, z0, x0))
 
 /*
-** eor_w0_s8_m_untied: { xfail *-*-* }
+** eor_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	eor	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/eor_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/eor_u16.c
index 40b43a5f89b480a6286c1fcc0746375eb8031914..b3c2e0c4e0b4320653b663f55246e9aa59afa722 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/eor_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/eor_u16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (eor_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = sveor_m (p0, z0, x0))
 
 /*
-** eor_w0_u16_m_untied: { xfail *-*-* }
+** eor_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	eor	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/eor_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/eor_u8.c
index 006637699e8b199aad22112b72d748c04f8f5257..021af308174b1d5e0003e7a2960cc821739c4aab 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/eor_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/eor_u8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (eor_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = sveor_m (p0, z0, x0))
 
 /*
-** eor_w0_u8_m_untied: { xfail *-*-* }
+** eor_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	eor	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_s16.c
index edaaca5f155b9af6cd7dd1d15ff94e86c82c486f..67db99697ee88367f8393f9580b4cc11655c2327 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_s16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (lsl_w0_s16_m_tied1, svint16_t, uint16_t,
 		 z0 = svlsl_m (p0, z0, x0))
 
 /*
-** lsl_w0_s16_m_untied: { xfail *-*-* }
+** lsl_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	lsl	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_s8.c
index 9a9cc959c33da729324920a6946c15cb6ba099bd..67a09a745e24e08c29bdbadde1b2f9640683a1d0 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_s8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (lsl_w0_s8_m_tied1, svint8_t, uint8_t,
 		 z0 = svlsl_m (p0, z0, x0))
 
 /*
-** lsl_w0_s8_m_untied: { xfail *-*-* }
+** lsl_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	lsl	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_u16.c
index 57db0fda66af3d642eeacd22ec3bd48c94b34048..b5e7386ce632a672b49b4bc05d8380fbc62bc0bb 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_u16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (lsl_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svlsl_m (p0, z0, x0))
 
 /*
-** lsl_w0_u16_m_untied: { xfail *-*-* }
+** lsl_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	lsl	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_u8.c
index 894b5513857b5949a6942156148778a9ae9cadbe..905bf9ed1692ab596af0237a7b33fa40bca24ddd 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_u8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (lsl_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svlsl_m (p0, z0, x0))
 
 /*
-** lsl_w0_u8_m_untied: { xfail *-*-* }
+** lsl_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	lsl	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_u16.c
index 61575645fad086970193875926d051ecc6482ef6..a41411986f798e251e73915542f999f8468fd2eb 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_u16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (lsr_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svlsr_m (p0, z0, x0))
 
 /*
-** lsr_w0_u16_m_untied: { xfail *-*-* }
+** lsr_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	lsr	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_u8.c
index a049ca90556e5daa42a928a82a1a704803194cd6..b773eedba7fe5b4e390170076a32d9281527d690 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_u8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (lsr_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svlsr_m (p0, z0, x0))
 
 /*
-** lsr_w0_u8_m_untied: { xfail *-*-* }
+** lsr_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	lsr	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_s16.c
index 02a6d4588b85f315e8e695b196db8bbf5c214454..bb6e0ea31c521dfb8199aff6409b56335d492320 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_s16.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (mad_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svmad_m (p0, z0, z1, x0))
 
 /*
-** mad_w0_s16_m_untied: { xfail *-*-* }
+** mad_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	mad	z0\.h, p0/m, z2\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_s8.c
index 90d712686ca5ab9752c245dcf2ae4230546fd9fc..adc6972a7dca903a148e83b649140f8b97f10eee 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_s8.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (mad_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svmad_m (p0, z0, z1, x0))
 
 /*
-** mad_w0_s8_m_untied: { xfail *-*-* }
+** mad_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	mad	z0\.b, p0/m, z2\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_u16.c
index 1d2ad9c5fc9d972c08bbf690b1fb4fb4d26c6f29..5b3c390b14c8b26b268cad5b364ac32a13a502b5 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_u16.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (mad_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svmad_m (p0, z0, z1, x0))
 
 /*
-** mad_w0_u16_m_untied: { xfail *-*-* }
+** mad_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	mad	z0\.h, p0/m, z2\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_u8.c
index 0b4b1b8cfe6e3b00eb4b1fe0516ca84fd2418aa0..a23c813f18f1a82b9dca777a84d44c2d2993c6ce 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_u8.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (mad_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svmad_m (p0, z0, z1, x0))
 
 /*
-** mad_w0_u8_m_untied: { xfail *-*-* }
+** mad_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	mad	z0\.b, p0/m, z2\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_s16.c
index 6a21675228274043ec4ed46405964015a9f34744..d5f5e6e629dcadf40618dd7156a39fda36b3896a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_s16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (max_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svmax_m (p0, z0, x0))
 
 /*
-** max_w0_s16_m_untied: { xfail *-*-* }
+** max_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	smax	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_s8.c
index c651a26f0d1a92bdaca0ea7260421a6e251e2622..e23dbeeec6b33ef8f28ddde8a52af1e1cbb3e815 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_s8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (max_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svmax_m (p0, z0, x0))
 
 /*
-** max_w0_s8_m_untied: { xfail *-*-* }
+** max_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	smax	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_u16.c
index 9a0b9543169d9a626e20009a8a838ba16fd6c48c..9ad6705ab703b204a78b07174590cbaa8813731d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_u16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (max_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svmax_m (p0, z0, x0))
 
 /*
-** max_w0_u16_m_untied: { xfail *-*-* }
+** max_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	umax	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_u8.c
index 04c9ddb36a23c13e345781c04c3ef566e9c0f2af..6326891f68005c6223afe0c1f24ffa1b81d059e3 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_u8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (max_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svmax_m (p0, z0, x0))
 
 /*
-** max_w0_u8_m_untied: { xfail *-*-* }
+** max_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	umax	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_s16.c
index 14dfcc4c333b69fe24a4a9f978c02999f8abffce..1cf12277add40827921a4a1004d59e6706977acf 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_s16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (min_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svmin_m (p0, z0, x0))
 
 /*
-** min_w0_s16_m_untied: { xfail *-*-* }
+** min_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	smin	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_s8.c
index 714b1576d5c6fa27d2648c2db0e38076aeb266c0..4edc0b72fadbbe508d7b9666f6d438163eb8c2cb 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_s8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (min_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svmin_m (p0, z0, x0))
 
 /*
-** min_w0_s8_m_untied: { xfail *-*-* }
+** min_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	smin	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_u16.c
index df35cf1135ec0fcf7a6528f78657271b95b8defc..2147f96d09f251703845745f2fbe505793bf51cd 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_u16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (min_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svmin_m (p0, z0, x0))
 
 /*
-** min_w0_u16_m_untied: { xfail *-*-* }
+** min_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	umin	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_u8.c
index 2ca274278a29a0e16e0756a5015f438bfa90f839..734925e6ea348316ee8cf6a3b9356f38c2f819ea 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_u8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (min_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svmin_m (p0, z0, x0))
 
 /*
-** min_w0_u8_m_untied: { xfail *-*-* }
+** min_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	umin	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_s16.c
index f3ed191db6abe5947313cf9cc9259735f39aa789..c016acd816290082b4dde6b83ca4f7c74f975ae0 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_s16.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (mla_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svmla_m (p0, z0, z1, x0))
 
 /*
-** mla_w0_s16_m_untied: { xfail *-*-* }
+** mla_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	mla	z0\.h, p0/m, z2\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_s8.c
index 47468947d78b686ee83a309ad7ff2f31deed5872..ccbf145850acf927ebdc02ad950bd518606eb79d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_s8.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (mla_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svmla_m (p0, z0, z1, x0))
 
 /*
-** mla_w0_s8_m_untied: { xfail *-*-* }
+** mla_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	mla	z0\.b, p0/m, z2\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_u16.c
index 7238e428f68668c23ef46d62a6bfb036bc669641..c3121bb761261c1acf632709ff4836b2fd4e5692 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_u16.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (mla_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svmla_m (p0, z0, z1, x0))
 
 /*
-** mla_w0_u16_m_untied: { xfail *-*-* }
+** mla_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	mla	z0\.h, p0/m, z2\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_u8.c
index 832ed41410e39b1c0e554799d9d881357939ab1c..ecc70274cb6e452b1576d02968b8afe5a7dc7fac 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_u8.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (mla_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svmla_m (p0, z0, z1, x0))
 
 /*
-** mla_w0_u8_m_untied: { xfail *-*-* }
+** mla_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	mla	z0\.b, p0/m, z2\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_s16.c
index e199829c4adc8da64824e5523e77a02b1dc0acf4..7be0a4128f6165a65ef2a2f2defb659c50d64b45 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_s16.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (mls_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svmls_m (p0, z0, z1, x0))
 
 /*
-** mls_w0_s16_m_untied: { xfail *-*-* }
+** mls_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	mls	z0\.h, p0/m, z2\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_s8.c
index c60c431455f099fc45e5ad2a026a72a6f7ae3eb9..fc3a65c7a0944e5c6865a2ddb899a06d15b21976 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_s8.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (mls_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svmls_m (p0, z0, z1, x0))
 
 /*
-** mls_w0_s8_m_untied: { xfail *-*-* }
+** mls_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	mls	z0\.b, p0/m, z2\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_u16.c
index e8a9f5cd94c6d823f988915373bfa5eebbc44c45..10779fb6674aefef7bcbd209dbc94ec3badb4801 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_u16.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (mls_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svmls_m (p0, z0, z1, x0))
 
 /*
-** mls_w0_u16_m_untied: { xfail *-*-* }
+** mls_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	mls	z0\.h, p0/m, z2\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_u8.c
index 0489aaa7cf96af5d0d02233f813b0b95119b1169..c9277e11cb5cefbe43511da5e01a74aa5f7eac6c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_u8.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (mls_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svmls_m (p0, z0, z1, x0))
 
 /*
-** mls_w0_u8_m_untied: { xfail *-*-* }
+** mls_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	mls	z0\.b, p0/m, z2\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_s16.c
index 56347cfb91828d45f5609c030e0869c650c5fd2f..52f3721c93633568f390cc6b7196af38a715b844 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_s16.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (msb_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svmsb_m (p0, z0, z1, x0))
 
 /*
-** msb_w0_s16_m_untied: { xfail *-*-* }
+** msb_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	msb	z0\.h, p0/m, z2\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_s8.c
index d7fcafdd0dfab6f4b1f7da8551570ab898ca2eab..381773c4a2b43538a336a63a36f4d3307e0e405f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_s8.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (msb_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svmsb_m (p0, z0, z1, x0))
 
 /*
-** msb_w0_s8_m_untied: { xfail *-*-* }
+** msb_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	msb	z0\.b, p0/m, z2\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_u16.c
index 437a96040e12ca2dc83d8dc56ff2dbf14d3728ba..3b710f72c2dcce03df64634b35a5838330eda410 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_u16.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (msb_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svmsb_m (p0, z0, z1, x0))
 
 /*
-** msb_w0_u16_m_untied: { xfail *-*-* }
+** msb_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	msb	z0\.h, p0/m, z2\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_u8.c
index 5665ec9e32075c0d8a9665ab624c4dfde2042d5a..a0e86c799183d758a5be4a5116c592aba9d50021 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_u8.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (msb_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svmsb_m (p0, z0, z1, x0))
 
 /*
-** msb_w0_u8_m_untied: { xfail *-*-* }
+** msb_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	msb	z0\.b, p0/m, z2\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s16.c
index aa08bc2740507c7df20a31acea455b61e937ae4e..52c6767ede0df73ac5b5a7c32c283c13f46d1c85 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (mul_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svmul_m (p0, z0, x0))
 
 /*
-** mul_w0_s16_m_untied: { xfail *-*-* }
+** mul_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	mul	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s8.c
index 012e6f250989dd0965b72bbec26d05facf0d6bda..5fc9d93b2b4567f381521c63d9b174cdd4dea1e0 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (mul_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svmul_m (p0, z0, x0))
 
 /*
-** mul_w0_s8_m_untied: { xfail *-*-* }
+** mul_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	mul	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
index 300987eb6e63677a0fa5e0f2b99f39700f9520bb..b081d230be3ed49044c506da156bc00bbaeb28a6 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (mul_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svmul_m (p0, z0, x0))
 
 /*
-** mul_w0_u16_m_untied: { xfail *-*-* }
+** mul_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	mul	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
index b2745a48f506cabee8fe9d952c7d89fa8eabdd45..e02be8b343e4e02d765bfea3a55e5f2f43345867 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (mul_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svmul_m (p0, z0, x0))
 
 /*
-** mul_w0_u8_m_untied: { xfail *-*-* }
+** mul_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	mul	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_s16.c
index a81532f5d8987dc2405cf0a51ca476052f5b984e..3cd3a2477052ad17c435a9bcb0db92789a2b8611 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_s16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (mulh_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svmulh_m (p0, z0, x0))
 
 /*
-** mulh_w0_s16_m_untied: { xfail *-*-* }
+** mulh_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	smulh	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_s8.c
index f9cd01afdc964f89e6fd24a057aa17f69bb3172c..08ead93a80da38a9df5dc770141f84a23e5ac1a6 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_s8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (mulh_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svmulh_m (p0, z0, x0))
 
 /*
-** mulh_w0_s8_m_untied: { xfail *-*-* }
+** mulh_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	smulh	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_u16.c
index e9173eb243ec9c2514809fd0bdf208a7811377a8..01a0de738bb980adfcd0394ca221fe765fb8207c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_u16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (mulh_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svmulh_m (p0, z0, x0))
 
 /*
-** mulh_w0_u16_m_untied: { xfail *-*-* }
+** mulh_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	umulh	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_u8.c
index db7b1be1bdf92ec534ae8d467cb44d0df380423a..ce547ffbe2b6ced08ad52f366800f68c7bda9dfb 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_u8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (mulh_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svmulh_m (p0, z0, x0))
 
 /*
-** mulh_w0_u8_m_untied: { xfail *-*-* }
+** mulh_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	umulh	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_s16.c
index 62b707a9c696164ec1da32daac76182f3dbcf1fa..c0a3aaf3dfe6b7bf29dfdf91a16b16e64d4b966a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_s16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (orr_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svorr_m (p0, z0, x0))
 
 /*
-** orr_w0_s16_m_untied: { xfail *-*-* }
+** orr_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	orr	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_s8.c
index b6483b6e76ec82e0e6e99e701271fb2fe3d507b5..96c265d856ab5aba848a04c44c1ebc2db790e21d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_s8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (orr_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svorr_m (p0, z0, x0))
 
 /*
-** orr_w0_s8_m_untied: { xfail *-*-* }
+** orr_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	orr	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_u16.c
index 000a0444c9b08f8c17bc44183381f57c5136081a..8757ff18499b28fd4e3156abfa9e331058b895a8 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_u16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (orr_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svorr_m (p0, z0, x0))
 
 /*
-** orr_w0_u16_m_untied: { xfail *-*-* }
+** orr_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	orr	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_u8.c
index efe5591b47287412ec558ece8a9449302b9df31d..c9701f2c3492520f15c369ca74b60a2c370b78f2 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_u8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (orr_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svorr_m (p0, z0, x0))
 
 /*
-** orr_w0_u8_m_untied: { xfail *-*-* }
+** orr_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	orr	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f16.c
index 9c554255b443844d04427ce5777264921a3c5f61..50a718d46cf772ec12697d14bee61a3dfb59f6ec 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (scale_w0_f16_m_tied1, svfloat16_t, int16_t,
 		 z0 = svscale_m (p0, z0, x0))
 
 /*
-** scale_w0_f16_m_untied: { xfail *-*-* }
+** scale_w0_f16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	fscale	z0\.h, p0/m, z0\.h, \1
@@ -211,7 +211,7 @@ TEST_UNIFORM_ZX (scale_w0_f16_x_tied1, svfloat16_t, int16_t,
 		 z0 = svscale_x (p0, z0, x0))
 
 /*
-** scale_w0_f16_x_untied: { xfail *-*-* }
+** scale_w0_f16_x_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	fscale	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_s16.c
index aea8ea2b4aa545b5e1dc0416f972a2dfec512ba4..fcab3a87162d5df76371166951efb18fde493e55 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_s16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (sub_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svsub_m (p0, z0, x0))
 
 /*
-** sub_w0_s16_m_untied: { xfail *-*-* }
+** sub_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	sub	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_s8.c
index 0d7ba99aa5695c97e74ffc729dc01a96d2fbeec5..4de12f0b387d9d595c5bc4043018aaabdd643425 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_s8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (sub_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svsub_m (p0, z0, x0))
 
 /*
-** sub_w0_s8_m_untied: { xfail *-*-* }
+** sub_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	sub	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_u16.c
index 89620e159bf3b55a88c00900c901cafb547510b1..d682efa93eac4c97984e75d302268c2b478e07b9 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_u16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (sub_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svsub_m (p0, z0, x0))
 
 /*
-** sub_w0_u16_m_untied: { xfail *-*-* }
+** sub_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	sub	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_u8.c
index 4552041910f7e86cbe896c04971da1b974a5eda0..39bc519c642585eba13fda8edd8060dff5627371 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_u8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (sub_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svsub_m (p0, z0, x0))
 
 /*
-** sub_w0_u8_m_untied: { xfail *-*-* }
+** sub_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	sub	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_s16.c
index d3dad62dafeb93db1f2a309e4f4caab967a6dedd..ad6f4519e536adab2c45ad015b922d4c96714ab7 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_s16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (subr_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svsubr_m (p0, z0, x0))
 
 /*
-** subr_w0_s16_m_untied: { xfail *-*-* }
+** subr_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	subr	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_s8.c
index 90d2a6de9a5fc935fd134c459b89829a933f116d..c31948202e2405990d0c6c99bd11fcb564983cc9 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_s8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (subr_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svsubr_m (p0, z0, x0))
 
 /*
-** subr_w0_s8_m_untied: { xfail *-*-* }
+** subr_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	subr	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_u16.c
index 379a80fb189796f747c92948d33ff3bf8cf6d0a7..631593aad78af792659842fd635b3335ea439366 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_u16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (subr_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svsubr_m (p0, z0, x0))
 
 /*
-** subr_w0_u16_m_untied: { xfail *-*-* }
+** subr_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	subr	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_u8.c
index fe5f96da833565d6013383281e14005f380410ba..b9ab757ca81b75d1d76fc969982e078db76b8a28 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_u8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (subr_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svsubr_m (p0, z0, x0))
 
 /*
-** subr_w0_u8_m_untied: { xfail *-*-* }
+** subr_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	subr	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_s16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_s16.c
index acad87d963540c6cbf3df1a929ac368c754007a7..2fb2bdff574222033b090498b93b99efae68073e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_s16.c
@@ -66,7 +66,7 @@ TEST_UNIFORM_ZX (bcax_w0_s16_tied2, svint16_t, int16_t,
 		 z0 = svbcax (z1, z0, x0))
 
 /*
-** bcax_w0_s16_untied: { xfail *-*-*}
+** bcax_w0_s16_untied:
 **	mov	(z[0-9]+)\.h, w0
 **	movprfx	z0, z1
 **	bcax	z0\.d, z0\.d, (z2\.d, \1\.d|\1\.d, z2\.d)
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_s8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_s8.c
index 548aafad85739d8420bd2c14bb877f14d8e755bc..3925f56ff20ca9db559a338587e1f19db8e9a0a6 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_s8.c
@@ -66,7 +66,7 @@ TEST_UNIFORM_ZX (bcax_w0_s8_tied2, svint8_t, int8_t,
 		 z0 = svbcax (z1, z0, x0))
 
 /*
-** bcax_w0_s8_untied: { xfail *-*-*}
+** bcax_w0_s8_untied:
 **	mov	(z[0-9]+)\.b, w0
 **	movprfx	z0, z1
 **	bcax	z0\.d, z0\.d, (z2\.d, \1\.d|\1\.d, z2\.d)
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_u16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_u16.c
index b63a4774ba73e5df19fcc5190483a3fda4092598..589e868b205c492b510aafd21d77bb187d35429d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_u16.c
@@ -66,7 +66,7 @@ TEST_UNIFORM_ZX (bcax_w0_u16_tied2, svuint16_t, uint16_t,
 		 z0 = svbcax (z1, z0, x0))
 
 /*
-** bcax_w0_u16_untied: { xfail *-*-*}
+** bcax_w0_u16_untied:
 **	mov	(z[0-9]+)\.h, w0
 **	movprfx	z0, z1
 **	bcax	z0\.d, z0\.d, (z2\.d, \1\.d|\1\.d, z2\.d)
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_u8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_u8.c
index 0957d58bd0ecd348ea4148ce84e05ffcb2848bcd..ee0b603830b702fd7e99723d129e5369ded0ab9a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_u8.c
@@ -66,7 +66,7 @@ TEST_UNIFORM_ZX (bcax_w0_u8_tied2, svuint8_t, uint8_t,
 		 z0 = svbcax (z1, z0, x0))
 
 /*
-** bcax_w0_u8_untied: { xfail *-*-*}
+** bcax_w0_u8_untied:
 **	mov	(z[0-9]+)\.b, w0
 **	movprfx	z0, z1
 **	bcax	z0\.d, z0\.d, (z2\.d, \1\.d|\1\.d, z2\.d)
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_s16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_s16.c
index 6330c4265bb17b80f12a884e663db342b9c346b2..286f2e2b969139106f8a38b41b42097c9a79c276 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_s16.c
@@ -163,7 +163,7 @@ TEST_UNIFORM_ZX (qadd_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svqadd_m (p0, z0, x0))
 
 /*
-** qadd_w0_s16_m_untied: { xfail *-*-* }
+** qadd_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	sqadd	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_s8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_s8.c
index 61343beacb899b843c2358b609cc26a1ca1110a3..ad439892a6749aa3d346424f8b2aebcd4bd4762c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_s8.c
@@ -163,7 +163,7 @@ TEST_UNIFORM_ZX (qadd_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svqadd_m (p0, z0, x0))
 
 /*
-** qadd_w0_s8_m_untied: { xfail *-*-* }
+** qadd_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	sqadd	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_u16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_u16.c
index f6c7ca9e075b7106209b858763a0cec160210ddf..9dafc3617ef84cf5a6ac13ad532a0e63065c7409 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_u16.c
@@ -166,7 +166,7 @@ TEST_UNIFORM_ZX (qadd_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svqadd_m (p0, z0, x0))
 
 /*
-** qadd_w0_u16_m_untied: { xfail *-*-* }
+** qadd_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	uqadd	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_u8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_u8.c
index 6c856e2871c26e1fbfa32c4e37df60c8884e76dd..a485cf3eeb656dded02c7eba8944757d164fbf6c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_u8.c
@@ -163,7 +163,7 @@ TEST_UNIFORM_ZX (qadd_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svqadd_m (p0, z0, x0))
 
 /*
-** qadd_w0_u8_m_untied: { xfail *-*-* }
+** qadd_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	uqadd	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalb_s16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalb_s16.c
index 4d1e90395e212b323f304f837504f2d8b158afc8..4880a6940d2981dd37b049431a68c326264c9503 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalb_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalb_s16.c
@@ -54,7 +54,7 @@ TEST_DUAL_ZX (qdmlalb_w0_s16_tied1, svint16_t, svint8_t, int8_t,
 	      z0 = svqdmlalb (z0, z4, x0))
 
 /*
-** qdmlalb_w0_s16_untied: { xfail *-*-* }
+** qdmlalb_w0_s16_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	sqdmlalb	z0\.h, z4\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalb_s32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalb_s32.c
index 94373773e61e3ad6ce0ff5c2a7fda38f8e2086cb..03eebba5997c583768253586c252ec13c939b890 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalb_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalb_s32.c
@@ -54,7 +54,7 @@ TEST_DUAL_ZX (qdmlalb_w0_s32_tied1, svint32_t, svint16_t, int16_t,
 	      z0 = svqdmlalb (z0, z4, x0))
 
 /*
-** qdmlalb_w0_s32_untied: { xfail *-*-* }
+** qdmlalb_w0_s32_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	sqdmlalb	z0\.s, z4\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalbt_s16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalbt_s16.c
index d591db3cfb8da203395b3b5e2955805381a3e4f3..0b3d5279db96bd7a39f286c6c5b37d9e1dea7d5a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalbt_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalbt_s16.c
@@ -54,7 +54,7 @@ TEST_DUAL_ZX (qdmlalbt_w0_s16_tied1, svint16_t, svint8_t, int8_t,
 	      z0 = svqdmlalbt (z0, z4, x0))
 
 /*
-** qdmlalbt_w0_s16_untied: { xfail *-*-*}
+** qdmlalbt_w0_s16_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	sqdmlalbt	z0\.h, z4\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalbt_s32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalbt_s32.c
index e8326fed6171531cad3dd1c8c921d89bc1f29ce9..9f2dd464e638d8dbe7d15b1fa1c1538aef18169a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalbt_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalbt_s32.c
@@ -54,7 +54,7 @@ TEST_DUAL_ZX (qdmlalbt_w0_s32_tied1, svint32_t, svint16_t, int16_t,
 	      z0 = svqdmlalbt (z0, z4, x0))
 
 /*
-** qdmlalbt_w0_s32_untied: { xfail *-*-*}
+** qdmlalbt_w0_s32_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	sqdmlalbt	z0\.s, z4\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_s16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_s16.c
index c102e58ed910c07aef6bee739f45fb96dd5aed80..618a61bf48fd912014b59874372e8d22eb7ee325 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_s16.c
@@ -163,7 +163,7 @@ TEST_UNIFORM_ZX (qsub_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svqsub_m (p0, z0, x0))
 
 /*
-** qsub_w0_s16_m_untied: { xfail *-*-* }
+** qsub_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	sqsub	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_s8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_s8.c
index 067ee6e6cb1026fc7691268c47d318f4e67aa66c..f0d6b97bf97cb511e3e3a8c1d01948bbe28e393d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_s8.c
@@ -163,7 +163,7 @@ TEST_UNIFORM_ZX (qsub_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svqsub_m (p0, z0, x0))
 
 /*
-** qsub_w0_s8_m_untied: { xfail *-*-* }
+** qsub_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	sqsub	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_u16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_u16.c
index 61be74634723fe1cac7a5c2e38b5d5e72d73dd84..3e87dd162bfe0494d060098c0640094591ef6072 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_u16.c
@@ -166,7 +166,7 @@ TEST_UNIFORM_ZX (qsub_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svqsub_m (p0, z0, x0))
 
 /*
-** qsub_w0_u16_m_untied: { xfail *-*-* }
+** qsub_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	uqsub	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_u8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_u8.c
index 686b2b425fb5f0ff9439571f7c518aa4b2e43167..92bf92909a7874a8bb4e073250b9f71e1bc82ebe 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_u8.c
@@ -163,7 +163,7 @@ TEST_UNIFORM_ZX (qsub_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svqsub_m (p0, z0, x0))
 
 /*
-** qsub_w0_u8_m_untied: { xfail *-*-* }
+** qsub_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	uqsub	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_s16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_s16.c
index 577310d9614be6f063e473eb2d1b6be7aebf4cc2..e42821849b266391555d39773efa6f072bfa810c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_s16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (qsubr_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svqsubr_m (p0, z0, x0))
 
 /*
-** qsubr_w0_s16_m_untied: { xfail *-*-* }
+** qsubr_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	sqsubr	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_s8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_s8.c
index ce814a8393e94f50ecff8869649bf0acbb8eeb7e..92e66f70cbfdb066be1e2b5a745f57dff0b85c43 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_s8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (qsubr_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svqsubr_m (p0, z0, x0))
 
 /*
-** qsubr_w0_s8_m_untied: { xfail *-*-* }
+** qsubr_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	sqsubr	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_u16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_u16.c
index f406bf2ed86c606bd984671b30c925b1c1233fd4..8018fdaef9e3d2a99a916aeae114ad273ea8ee92 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_u16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (qsubr_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svqsubr_m (p0, z0, x0))
 
 /*
-** qsubr_w0_u16_m_untied: { xfail *-*-* }
+** qsubr_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	uqsubr	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_u8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_u8.c
index 7b487fd93b19d4ad620541069759fd0b05564b1c..bb29faa83504c4d1d0df942dfba2ed936c3986a1 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_u8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (qsubr_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svqsubr_m (p0, z0, x0))
 
 /*
-** qsubr_w0_u8_m_untied: { xfail *-*-* }
+** qsubr_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	uqsubr	z0\.b, p0/m, z0\.b, \1

-- 

[-- Attachment #2: rb15723.patch --]
[-- Type: text/plain, Size: 58818 bytes --]

index 61575645fad086970193875926d051ecc6482ef6..a41411986f798e251e73915542f999f8468fd2eb 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_u16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (lsr_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svlsr_m (p0, z0, x0))
 
 /*
-** lsr_w0_u16_m_untied: { xfail *-*-* }
+** lsr_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	lsr	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_u8.c
index a049ca90556e5daa42a928a82a1a704803194cd6..b773eedba7fe5b4e390170076a32d9281527d690 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_u8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (lsr_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svlsr_m (p0, z0, x0))
 
 /*
-** lsr_w0_u8_m_untied: { xfail *-*-* }
+** lsr_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	lsr	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_s16.c
index 02a6d4588b85f315e8e695b196db8bbf5c214454..bb6e0ea31c521dfb8199aff6409b56335d492320 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_s16.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (mad_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svmad_m (p0, z0, z1, x0))
 
 /*
-** mad_w0_s16_m_untied: { xfail *-*-* }
+** mad_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	mad	z0\.h, p0/m, z2\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_s8.c
index 90d712686ca5ab9752c245dcf2ae4230546fd9fc..adc6972a7dca903a148e83b649140f8b97f10eee 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_s8.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (mad_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svmad_m (p0, z0, z1, x0))
 
 /*
-** mad_w0_s8_m_untied: { xfail *-*-* }
+** mad_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	mad	z0\.b, p0/m, z2\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_u16.c
index 1d2ad9c5fc9d972c08bbf690b1fb4fb4d26c6f29..5b3c390b14c8b26b268cad5b364ac32a13a502b5 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_u16.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (mad_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svmad_m (p0, z0, z1, x0))
 
 /*
-** mad_w0_u16_m_untied: { xfail *-*-* }
+** mad_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	mad	z0\.h, p0/m, z2\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_u8.c
index 0b4b1b8cfe6e3b00eb4b1fe0516ca84fd2418aa0..a23c813f18f1a82b9dca777a84d44c2d2993c6ce 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mad_u8.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (mad_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svmad_m (p0, z0, z1, x0))
 
 /*
-** mad_w0_u8_m_untied: { xfail *-*-* }
+** mad_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	mad	z0\.b, p0/m, z2\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_s16.c
index 6a21675228274043ec4ed46405964015a9f34744..d5f5e6e629dcadf40618dd7156a39fda36b3896a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_s16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (max_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svmax_m (p0, z0, x0))
 
 /*
-** max_w0_s16_m_untied: { xfail *-*-* }
+** max_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	smax	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_s8.c
index c651a26f0d1a92bdaca0ea7260421a6e251e2622..e23dbeeec6b33ef8f28ddde8a52af1e1cbb3e815 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_s8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (max_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svmax_m (p0, z0, x0))
 
 /*
-** max_w0_s8_m_untied: { xfail *-*-* }
+** max_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	smax	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_u16.c
index 9a0b9543169d9a626e20009a8a838ba16fd6c48c..9ad6705ab703b204a78b07174590cbaa8813731d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_u16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (max_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svmax_m (p0, z0, x0))
 
 /*
-** max_w0_u16_m_untied: { xfail *-*-* }
+** max_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	umax	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_u8.c
index 04c9ddb36a23c13e345781c04c3ef566e9c0f2af..6326891f68005c6223afe0c1f24ffa1b81d059e3 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/max_u8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (max_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svmax_m (p0, z0, x0))
 
 /*
-** max_w0_u8_m_untied: { xfail *-*-* }
+** max_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	umax	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_s16.c
index 14dfcc4c333b69fe24a4a9f978c02999f8abffce..1cf12277add40827921a4a1004d59e6706977acf 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_s16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (min_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svmin_m (p0, z0, x0))
 
 /*
-** min_w0_s16_m_untied: { xfail *-*-* }
+** min_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	smin	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_s8.c
index 714b1576d5c6fa27d2648c2db0e38076aeb266c0..4edc0b72fadbbe508d7b9666f6d438163eb8c2cb 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_s8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (min_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svmin_m (p0, z0, x0))
 
 /*
-** min_w0_s8_m_untied: { xfail *-*-* }
+** min_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	smin	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_u16.c
index df35cf1135ec0fcf7a6528f78657271b95b8defc..2147f96d09f251703845745f2fbe505793bf51cd 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_u16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (min_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svmin_m (p0, z0, x0))
 
 /*
-** min_w0_u16_m_untied: { xfail *-*-* }
+** min_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	umin	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_u8.c
index 2ca274278a29a0e16e0756a5015f438bfa90f839..734925e6ea348316ee8cf6a3b9356f38c2f819ea 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/min_u8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (min_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svmin_m (p0, z0, x0))
 
 /*
-** min_w0_u8_m_untied: { xfail *-*-* }
+** min_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	umin	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_s16.c
index f3ed191db6abe5947313cf9cc9259735f39aa789..c016acd816290082b4dde6b83ca4f7c74f975ae0 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_s16.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (mla_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svmla_m (p0, z0, z1, x0))
 
 /*
-** mla_w0_s16_m_untied: { xfail *-*-* }
+** mla_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	mla	z0\.h, p0/m, z2\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_s8.c
index 47468947d78b686ee83a309ad7ff2f31deed5872..ccbf145850acf927ebdc02ad950bd518606eb79d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_s8.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (mla_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svmla_m (p0, z0, z1, x0))
 
 /*
-** mla_w0_s8_m_untied: { xfail *-*-* }
+** mla_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	mla	z0\.b, p0/m, z2\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_u16.c
index 7238e428f68668c23ef46d62a6bfb036bc669641..c3121bb761261c1acf632709ff4836b2fd4e5692 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_u16.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (mla_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svmla_m (p0, z0, z1, x0))
 
 /*
-** mla_w0_u16_m_untied: { xfail *-*-* }
+** mla_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	mla	z0\.h, p0/m, z2\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_u8.c
index 832ed41410e39b1c0e554799d9d881357939ab1c..ecc70274cb6e452b1576d02968b8afe5a7dc7fac 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mla_u8.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (mla_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svmla_m (p0, z0, z1, x0))
 
 /*
-** mla_w0_u8_m_untied: { xfail *-*-* }
+** mla_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	mla	z0\.b, p0/m, z2\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_s16.c
index e199829c4adc8da64824e5523e77a02b1dc0acf4..7be0a4128f6165a65ef2a2f2defb659c50d64b45 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_s16.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (mls_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svmls_m (p0, z0, z1, x0))
 
 /*
-** mls_w0_s16_m_untied: { xfail *-*-* }
+** mls_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	mls	z0\.h, p0/m, z2\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_s8.c
index c60c431455f099fc45e5ad2a026a72a6f7ae3eb9..fc3a65c7a0944e5c6865a2ddb899a06d15b21976 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_s8.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (mls_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svmls_m (p0, z0, z1, x0))
 
 /*
-** mls_w0_s8_m_untied: { xfail *-*-* }
+** mls_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	mls	z0\.b, p0/m, z2\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_u16.c
index e8a9f5cd94c6d823f988915373bfa5eebbc44c45..10779fb6674aefef7bcbd209dbc94ec3badb4801 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_u16.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (mls_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svmls_m (p0, z0, z1, x0))
 
 /*
-** mls_w0_u16_m_untied: { xfail *-*-* }
+** mls_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	mls	z0\.h, p0/m, z2\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_u8.c
index 0489aaa7cf96af5d0d02233f813b0b95119b1169..c9277e11cb5cefbe43511da5e01a74aa5f7eac6c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mls_u8.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (mls_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svmls_m (p0, z0, z1, x0))
 
 /*
-** mls_w0_u8_m_untied: { xfail *-*-* }
+** mls_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	mls	z0\.b, p0/m, z2\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_s16.c
index 56347cfb91828d45f5609c030e0869c650c5fd2f..52f3721c93633568f390cc6b7196af38a715b844 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_s16.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (msb_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svmsb_m (p0, z0, z1, x0))
 
 /*
-** msb_w0_s16_m_untied: { xfail *-*-* }
+** msb_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	msb	z0\.h, p0/m, z2\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_s8.c
index d7fcafdd0dfab6f4b1f7da8551570ab898ca2eab..381773c4a2b43538a336a63a36f4d3307e0e405f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_s8.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (msb_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svmsb_m (p0, z0, z1, x0))
 
 /*
-** msb_w0_s8_m_untied: { xfail *-*-* }
+** msb_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	msb	z0\.b, p0/m, z2\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_u16.c
index 437a96040e12ca2dc83d8dc56ff2dbf14d3728ba..3b710f72c2dcce03df64634b35a5838330eda410 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_u16.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (msb_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svmsb_m (p0, z0, z1, x0))
 
 /*
-** msb_w0_u16_m_untied: { xfail *-*-* }
+** msb_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	msb	z0\.h, p0/m, z2\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_u8.c
index 5665ec9e32075c0d8a9665ab624c4dfde2042d5a..a0e86c799183d758a5be4a5116c592aba9d50021 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/msb_u8.c
@@ -54,7 +54,7 @@ TEST_UNIFORM_ZX (msb_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svmsb_m (p0, z0, z1, x0))
 
 /*
-** msb_w0_u8_m_untied: { xfail *-*-* }
+** msb_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	msb	z0\.b, p0/m, z2\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s16.c
index aa08bc2740507c7df20a31acea455b61e937ae4e..52c6767ede0df73ac5b5a7c32c283c13f46d1c85 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (mul_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svmul_m (p0, z0, x0))
 
 /*
-** mul_w0_s16_m_untied: { xfail *-*-* }
+** mul_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	mul	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s8.c
index 012e6f250989dd0965b72bbec26d05facf0d6bda..5fc9d93b2b4567f381521c63d9b174cdd4dea1e0 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (mul_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svmul_m (p0, z0, x0))
 
 /*
-** mul_w0_s8_m_untied: { xfail *-*-* }
+** mul_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	mul	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
index 300987eb6e63677a0fa5e0f2b99f39700f9520bb..b081d230be3ed49044c506da156bc00bbaeb28a6 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (mul_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svmul_m (p0, z0, x0))
 
 /*
-** mul_w0_u16_m_untied: { xfail *-*-* }
+** mul_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	mul	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
index b2745a48f506cabee8fe9d952c7d89fa8eabdd45..e02be8b343e4e02d765bfea3a55e5f2f43345867 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (mul_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svmul_m (p0, z0, x0))
 
 /*
-** mul_w0_u8_m_untied: { xfail *-*-* }
+** mul_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	mul	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_s16.c
index a81532f5d8987dc2405cf0a51ca476052f5b984e..3cd3a2477052ad17c435a9bcb0db92789a2b8611 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_s16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (mulh_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svmulh_m (p0, z0, x0))
 
 /*
-** mulh_w0_s16_m_untied: { xfail *-*-* }
+** mulh_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	smulh	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_s8.c
index f9cd01afdc964f89e6fd24a057aa17f69bb3172c..08ead93a80da38a9df5dc770141f84a23e5ac1a6 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_s8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (mulh_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svmulh_m (p0, z0, x0))
 
 /*
-** mulh_w0_s8_m_untied: { xfail *-*-* }
+** mulh_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	smulh	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_u16.c
index e9173eb243ec9c2514809fd0bdf208a7811377a8..01a0de738bb980adfcd0394ca221fe765fb8207c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_u16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (mulh_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svmulh_m (p0, z0, x0))
 
 /*
-** mulh_w0_u16_m_untied: { xfail *-*-* }
+** mulh_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	umulh	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_u8.c
index db7b1be1bdf92ec534ae8d467cb44d0df380423a..ce547ffbe2b6ced08ad52f366800f68c7bda9dfb 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mulh_u8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (mulh_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svmulh_m (p0, z0, x0))
 
 /*
-** mulh_w0_u8_m_untied: { xfail *-*-* }
+** mulh_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	umulh	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_s16.c
index 62b707a9c696164ec1da32daac76182f3dbcf1fa..c0a3aaf3dfe6b7bf29dfdf91a16b16e64d4b966a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_s16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (orr_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svorr_m (p0, z0, x0))
 
 /*
-** orr_w0_s16_m_untied: { xfail *-*-* }
+** orr_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	orr	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_s8.c
index b6483b6e76ec82e0e6e99e701271fb2fe3d507b5..96c265d856ab5aba848a04c44c1ebc2db790e21d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_s8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (orr_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svorr_m (p0, z0, x0))
 
 /*
-** orr_w0_s8_m_untied: { xfail *-*-* }
+** orr_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	orr	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_u16.c
index 000a0444c9b08f8c17bc44183381f57c5136081a..8757ff18499b28fd4e3156abfa9e331058b895a8 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_u16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (orr_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svorr_m (p0, z0, x0))
 
 /*
-** orr_w0_u16_m_untied: { xfail *-*-* }
+** orr_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	orr	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_u8.c
index efe5591b47287412ec558ece8a9449302b9df31d..c9701f2c3492520f15c369ca74b60a2c370b78f2 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/orr_u8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (orr_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svorr_m (p0, z0, x0))
 
 /*
-** orr_w0_u8_m_untied: { xfail *-*-* }
+** orr_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	orr	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f16.c
index 9c554255b443844d04427ce5777264921a3c5f61..50a718d46cf772ec12697d14bee61a3dfb59f6ec 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (scale_w0_f16_m_tied1, svfloat16_t, int16_t,
 		 z0 = svscale_m (p0, z0, x0))
 
 /*
-** scale_w0_f16_m_untied: { xfail *-*-* }
+** scale_w0_f16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	fscale	z0\.h, p0/m, z0\.h, \1
@@ -211,7 +211,7 @@ TEST_UNIFORM_ZX (scale_w0_f16_x_tied1, svfloat16_t, int16_t,
 		 z0 = svscale_x (p0, z0, x0))
 
 /*
-** scale_w0_f16_x_untied: { xfail *-*-* }
+** scale_w0_f16_x_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	fscale	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_s16.c
index aea8ea2b4aa545b5e1dc0416f972a2dfec512ba4..fcab3a87162d5df76371166951efb18fde493e55 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_s16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (sub_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svsub_m (p0, z0, x0))
 
 /*
-** sub_w0_s16_m_untied: { xfail *-*-* }
+** sub_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	sub	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_s8.c
index 0d7ba99aa5695c97e74ffc729dc01a96d2fbeec5..4de12f0b387d9d595c5bc4043018aaabdd643425 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_s8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (sub_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svsub_m (p0, z0, x0))
 
 /*
-** sub_w0_s8_m_untied: { xfail *-*-* }
+** sub_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	sub	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_u16.c
index 89620e159bf3b55a88c00900c901cafb547510b1..d682efa93eac4c97984e75d302268c2b478e07b9 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_u16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (sub_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svsub_m (p0, z0, x0))
 
 /*
-** sub_w0_u16_m_untied: { xfail *-*-* }
+** sub_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	sub	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_u8.c
index 4552041910f7e86cbe896c04971da1b974a5eda0..39bc519c642585eba13fda8edd8060dff5627371 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/sub_u8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (sub_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svsub_m (p0, z0, x0))
 
 /*
-** sub_w0_u8_m_untied: { xfail *-*-* }
+** sub_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	sub	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_s16.c
index d3dad62dafeb93db1f2a309e4f4caab967a6dedd..ad6f4519e536adab2c45ad015b922d4c96714ab7 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_s16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (subr_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svsubr_m (p0, z0, x0))
 
 /*
-** subr_w0_s16_m_untied: { xfail *-*-* }
+** subr_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	subr	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_s8.c
index 90d2a6de9a5fc935fd134c459b89829a933f116d..c31948202e2405990d0c6c99bd11fcb564983cc9 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_s8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (subr_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svsubr_m (p0, z0, x0))
 
 /*
-** subr_w0_s8_m_untied: { xfail *-*-* }
+** subr_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	subr	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_u16.c
index 379a80fb189796f747c92948d33ff3bf8cf6d0a7..631593aad78af792659842fd635b3335ea439366 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_u16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (subr_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svsubr_m (p0, z0, x0))
 
 /*
-** subr_w0_u16_m_untied: { xfail *-*-* }
+** subr_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	subr	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_u8.c
index fe5f96da833565d6013383281e14005f380410ba..b9ab757ca81b75d1d76fc969982e078db76b8a28 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_u8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (subr_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svsubr_m (p0, z0, x0))
 
 /*
-** subr_w0_u8_m_untied: { xfail *-*-* }
+** subr_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	subr	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_s16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_s16.c
index acad87d963540c6cbf3df1a929ac368c754007a7..2fb2bdff574222033b090498b93b99efae68073e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_s16.c
@@ -66,7 +66,7 @@ TEST_UNIFORM_ZX (bcax_w0_s16_tied2, svint16_t, int16_t,
 		 z0 = svbcax (z1, z0, x0))
 
 /*
-** bcax_w0_s16_untied: { xfail *-*-*}
+** bcax_w0_s16_untied:
 **	mov	(z[0-9]+)\.h, w0
 **	movprfx	z0, z1
 **	bcax	z0\.d, z0\.d, (z2\.d, \1\.d|\1\.d, z2\.d)
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_s8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_s8.c
index 548aafad85739d8420bd2c14bb877f14d8e755bc..3925f56ff20ca9db559a338587e1f19db8e9a0a6 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_s8.c
@@ -66,7 +66,7 @@ TEST_UNIFORM_ZX (bcax_w0_s8_tied2, svint8_t, int8_t,
 		 z0 = svbcax (z1, z0, x0))
 
 /*
-** bcax_w0_s8_untied: { xfail *-*-*}
+** bcax_w0_s8_untied:
 **	mov	(z[0-9]+)\.b, w0
 **	movprfx	z0, z1
 **	bcax	z0\.d, z0\.d, (z2\.d, \1\.d|\1\.d, z2\.d)
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_u16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_u16.c
index b63a4774ba73e5df19fcc5190483a3fda4092598..589e868b205c492b510aafd21d77bb187d35429d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_u16.c
@@ -66,7 +66,7 @@ TEST_UNIFORM_ZX (bcax_w0_u16_tied2, svuint16_t, uint16_t,
 		 z0 = svbcax (z1, z0, x0))
 
 /*
-** bcax_w0_u16_untied: { xfail *-*-*}
+** bcax_w0_u16_untied:
 **	mov	(z[0-9]+)\.h, w0
 **	movprfx	z0, z1
 **	bcax	z0\.d, z0\.d, (z2\.d, \1\.d|\1\.d, z2\.d)
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_u8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_u8.c
index 0957d58bd0ecd348ea4148ce84e05ffcb2848bcd..ee0b603830b702fd7e99723d129e5369ded0ab9a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bcax_u8.c
@@ -66,7 +66,7 @@ TEST_UNIFORM_ZX (bcax_w0_u8_tied2, svuint8_t, uint8_t,
 		 z0 = svbcax (z1, z0, x0))
 
 /*
-** bcax_w0_u8_untied: { xfail *-*-*}
+** bcax_w0_u8_untied:
 **	mov	(z[0-9]+)\.b, w0
 **	movprfx	z0, z1
 **	bcax	z0\.d, z0\.d, (z2\.d, \1\.d|\1\.d, z2\.d)
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_s16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_s16.c
index 6330c4265bb17b80f12a884e663db342b9c346b2..286f2e2b969139106f8a38b41b42097c9a79c276 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_s16.c
@@ -163,7 +163,7 @@ TEST_UNIFORM_ZX (qadd_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svqadd_m (p0, z0, x0))
 
 /*
-** qadd_w0_s16_m_untied: { xfail *-*-* }
+** qadd_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	sqadd	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_s8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_s8.c
index 61343beacb899b843c2358b609cc26a1ca1110a3..ad439892a6749aa3d346424f8b2aebcd4bd4762c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_s8.c
@@ -163,7 +163,7 @@ TEST_UNIFORM_ZX (qadd_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svqadd_m (p0, z0, x0))
 
 /*
-** qadd_w0_s8_m_untied: { xfail *-*-* }
+** qadd_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	sqadd	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_u16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_u16.c
index f6c7ca9e075b7106209b858763a0cec160210ddf..9dafc3617ef84cf5a6ac13ad532a0e63065c7409 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_u16.c
@@ -166,7 +166,7 @@ TEST_UNIFORM_ZX (qadd_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svqadd_m (p0, z0, x0))
 
 /*
-** qadd_w0_u16_m_untied: { xfail *-*-* }
+** qadd_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	uqadd	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_u8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_u8.c
index 6c856e2871c26e1fbfa32c4e37df60c8884e76dd..a485cf3eeb656dded02c7eba8944757d164fbf6c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qadd_u8.c
@@ -163,7 +163,7 @@ TEST_UNIFORM_ZX (qadd_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svqadd_m (p0, z0, x0))
 
 /*
-** qadd_w0_u8_m_untied: { xfail *-*-* }
+** qadd_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	uqadd	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalb_s16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalb_s16.c
index 4d1e90395e212b323f304f837504f2d8b158afc8..4880a6940d2981dd37b049431a68c326264c9503 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalb_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalb_s16.c
@@ -54,7 +54,7 @@ TEST_DUAL_ZX (qdmlalb_w0_s16_tied1, svint16_t, svint8_t, int8_t,
 	      z0 = svqdmlalb (z0, z4, x0))
 
 /*
-** qdmlalb_w0_s16_untied: { xfail *-*-* }
+** qdmlalb_w0_s16_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	sqdmlalb	z0\.h, z4\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalb_s32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalb_s32.c
index 94373773e61e3ad6ce0ff5c2a7fda38f8e2086cb..03eebba5997c583768253586c252ec13c939b890 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalb_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalb_s32.c
@@ -54,7 +54,7 @@ TEST_DUAL_ZX (qdmlalb_w0_s32_tied1, svint32_t, svint16_t, int16_t,
 	      z0 = svqdmlalb (z0, z4, x0))
 
 /*
-** qdmlalb_w0_s32_untied: { xfail *-*-* }
+** qdmlalb_w0_s32_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	sqdmlalb	z0\.s, z4\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalbt_s16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalbt_s16.c
index d591db3cfb8da203395b3b5e2955805381a3e4f3..0b3d5279db96bd7a39f286c6c5b37d9e1dea7d5a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalbt_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalbt_s16.c
@@ -54,7 +54,7 @@ TEST_DUAL_ZX (qdmlalbt_w0_s16_tied1, svint16_t, svint8_t, int8_t,
 	      z0 = svqdmlalbt (z0, z4, x0))
 
 /*
-** qdmlalbt_w0_s16_untied: { xfail *-*-*}
+** qdmlalbt_w0_s16_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	sqdmlalbt	z0\.h, z4\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalbt_s32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalbt_s32.c
index e8326fed6171531cad3dd1c8c921d89bc1f29ce9..9f2dd464e638d8dbe7d15b1fa1c1538aef18169a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalbt_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qdmlalbt_s32.c
@@ -54,7 +54,7 @@ TEST_DUAL_ZX (qdmlalbt_w0_s32_tied1, svint32_t, svint16_t, int16_t,
 	      z0 = svqdmlalbt (z0, z4, x0))
 
 /*
-** qdmlalbt_w0_s32_untied: { xfail *-*-*}
+** qdmlalbt_w0_s32_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	sqdmlalbt	z0\.s, z4\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_s16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_s16.c
index c102e58ed910c07aef6bee739f45fb96dd5aed80..618a61bf48fd912014b59874372e8d22eb7ee325 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_s16.c
@@ -163,7 +163,7 @@ TEST_UNIFORM_ZX (qsub_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svqsub_m (p0, z0, x0))
 
 /*
-** qsub_w0_s16_m_untied: { xfail *-*-* }
+** qsub_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	sqsub	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_s8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_s8.c
index 067ee6e6cb1026fc7691268c47d318f4e67aa66c..f0d6b97bf97cb511e3e3a8c1d01948bbe28e393d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_s8.c
@@ -163,7 +163,7 @@ TEST_UNIFORM_ZX (qsub_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svqsub_m (p0, z0, x0))
 
 /*
-** qsub_w0_s8_m_untied: { xfail *-*-* }
+** qsub_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	sqsub	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_u16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_u16.c
index 61be74634723fe1cac7a5c2e38b5d5e72d73dd84..3e87dd162bfe0494d060098c0640094591ef6072 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_u16.c
@@ -166,7 +166,7 @@ TEST_UNIFORM_ZX (qsub_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svqsub_m (p0, z0, x0))
 
 /*
-** qsub_w0_u16_m_untied: { xfail *-*-* }
+** qsub_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	uqsub	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_u8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_u8.c
index 686b2b425fb5f0ff9439571f7c518aa4b2e43167..92bf92909a7874a8bb4e073250b9f71e1bc82ebe 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsub_u8.c
@@ -163,7 +163,7 @@ TEST_UNIFORM_ZX (qsub_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svqsub_m (p0, z0, x0))
 
 /*
-** qsub_w0_u8_m_untied: { xfail *-*-* }
+** qsub_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	uqsub	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_s16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_s16.c
index 577310d9614be6f063e473eb2d1b6be7aebf4cc2..e42821849b266391555d39773efa6f072bfa810c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_s16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (qsubr_w0_s16_m_tied1, svint16_t, int16_t,
 		 z0 = svqsubr_m (p0, z0, x0))
 
 /*
-** qsubr_w0_s16_m_untied: { xfail *-*-* }
+** qsubr_w0_s16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	sqsubr	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_s8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_s8.c
index ce814a8393e94f50ecff8869649bf0acbb8eeb7e..92e66f70cbfdb066be1e2b5a745f57dff0b85c43 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_s8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (qsubr_w0_s8_m_tied1, svint8_t, int8_t,
 		 z0 = svqsubr_m (p0, z0, x0))
 
 /*
-** qsubr_w0_s8_m_untied: { xfail *-*-* }
+** qsubr_w0_s8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	sqsubr	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_u16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_u16.c
index f406bf2ed86c606bd984671b30c925b1c1233fd4..8018fdaef9e3d2a99a916aeae114ad273ea8ee92 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_u16.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (qsubr_w0_u16_m_tied1, svuint16_t, uint16_t,
 		 z0 = svqsubr_m (p0, z0, x0))
 
 /*
-** qsubr_w0_u16_m_untied: { xfail *-*-* }
+** qsubr_w0_u16_m_untied:
 **	mov	(z[0-9]+\.h), w0
 **	movprfx	z0, z1
 **	uqsubr	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_u8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_u8.c
index 7b487fd93b19d4ad620541069759fd0b05564b1c..bb29faa83504c4d1d0df942dfba2ed936c3986a1 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/qsubr_u8.c
@@ -43,7 +43,7 @@ TEST_UNIFORM_ZX (qsubr_w0_u8_m_tied1, svuint8_t, uint8_t,
 		 z0 = svqsubr_m (p0, z0, x0))
 
 /*
-** qsubr_w0_u8_m_untied: { xfail *-*-* }
+** qsubr_w0_u8_m_untied:
 **	mov	(z[0-9]+\.b), w0
 **	movprfx	z0, z1
 **	uqsubr	z0\.b, p0/m, z0\.b, \1





* Re: [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions.
  2022-05-13 17:11 [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions Tamar Christina
  2022-05-13 17:11 ` [PATCH 2/3]AArch64 Promote function arguments using a paradoxical subreg when beneficial Tamar Christina
  2022-05-13 17:12 ` [PATCH 3/3]AArch64 Update the testsuite to remove xfails Tamar Christina
@ 2022-05-16  6:31 ` Richard Biener
  2022-05-16  8:26   ` Tamar Christina
  2022-05-16 11:36 ` Richard Sandiford
  3 siblings, 1 reply; 19+ messages in thread
From: Richard Biener @ 2022-05-16  6:31 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jeffreyalaw, richard.sandiford

On Fri, 13 May 2022, Tamar Christina wrote:

> Hi All,
> 
> Some targets require function parameters to be promoted to a different
> type on expand time because the target may not have native instructions
> to work on such types.  As an example the AArch64 port does not have native
> instructions working on integer 8- or 16-bit values.  As such it promotes
> every parameter of these types to 32-bits.
> 
> This promotion could be done by a target for two reasons:
> 
> 1. For correctness.  This may be an APCS requirement for instance.
> 2. For efficiency.  By promoting the argument at expansion time we don't have
>    to keep promoting the type back and forth after each operation on it.
>    i.e. the promotion simplies the RTL.
> 
> This patch adds the ability for a target to decide whether during the expansion
> to use an extend to handle promotion or to use a paradoxical subreg.
> 
> A pradoxical subreg can be used when there's no correctness issues and when you
> still want the RTL efficiency of not doing the promotion constantly.
> 
> This also allows the target to not need to generate any code when the top bits
> are not significant.
> 
> An example is in AArch64 the following extend is unneeded:
> 
> uint8_t fd2 (uint8_t xr){
>     return xr + 1;
> }
> 
> currently generates:
> 
> fd2:
>         and     w0, w0, 255
>         add     w0, w0, 1
>         ret
> 
> instead of
> 
> fd2:
>         add     w0, w0, #1
>         ret
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> Bootstrapped on x86_64-pc-linux-gnu and no issues
> 
> Ok for master?

Why do we need a target hook for this?  Why doesn't the target's
function_arg(?) hook just return (subreg:SI (reg:QI 0)) here instead
when no zero-extension is required and (reg:QI 0) when it is?
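
I.e. roughly something like the following, as an untested sketch
(extension_needed_p is invented here purely for illustration):

  static rtx
  aarch64_function_arg (cumulative_args_t pcum_v,
			const function_arg_info &arg)
  {
    /* ... existing ABI logic selecting the hard register ...  */
    rtx reg = gen_rtx_REG (arg.mode, R0_REGNUM);
    /* If the upper bits of the promoted mode need not be defined,
       describe the argument as a paradoxical subreg of the narrow
       hard reg rather than as the promoted hard reg.  */
    if (!extension_needed_p (arg))
      return gen_rtx_SUBREG (SImode, reg, 0);
    return reg;
  }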

That said, an extra hook looks like a bad design to me; the existing
cumulative-args way of querying the target ABI should be enough,
and if not, it should be extended in a less hackish way.

Of course I am not familiar at all with the current state, but
since you specifically CCed me ...

Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* cfgexpand.cc (set_rtl): Check for function promotion.
> 	* tree-outof-ssa.cc (insert_value_copy_on_edge): Likewise.
> 	* function.cc (assign_parm_setup_reg): Likewise.
> 	* hooks.cc (hook_bool_mode_mode_int_tree_false,
> 	hook_bool_mode_mode_int_tree_true): New.
> 	* hooks.h (hook_bool_mode_mode_int_tree_false,
> 	hook_bool_mode_mode_int_tree_true): New.
> 	* target.def (promote_function_args_subreg_p): New.
> 	* doc/tm.texi: Document it.
> 	* doc/tm.texi.in: Likewise.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> index d3cc77d2ca98f620b29623fc5696410bad9bc184..df95184cfa185312c2a46cb92daa051718d9f4f3 100644
> --- a/gcc/cfgexpand.cc
> +++ b/gcc/cfgexpand.cc
> @@ -206,14 +206,20 @@ set_rtl (tree t, rtx x)
>       have to compute it ourselves.  For RESULT_DECLs, we accept mode
>       mismatches too, as long as we have BLKmode or are not coalescing
>       across variables, so that we don't reject BLKmode PARALLELs or
> -     unpromoted REGs.  */
> +     unpromoted REGs.  For any promoted types that result in a
> +     paradoxical subreg also accept the argument.  */
>    gcc_checking_assert (!x || x == pc_rtx || TREE_CODE (t) != SSA_NAME
>  		       || (SSAVAR (t)
>  			   && TREE_CODE (SSAVAR (t)) == RESULT_DECL
>  			   && (promote_ssa_mode (t, NULL) == BLKmode
>  			       || !flag_tree_coalesce_vars))
>  		       || !use_register_for_decl (t)
> -		       || GET_MODE (x) == promote_ssa_mode (t, NULL));
> +		       || GET_MODE (x) == promote_ssa_mode (t, NULL)
> +		       || targetm.calls.promote_function_args_subreg_p (
> +					  GET_MODE (x),
> +					  promote_ssa_mode (t, NULL),
> +					  TYPE_UNSIGNED (TREE_TYPE (t)),
> +					  SSAVAR (t)));
>  
>    if (x)
>      {
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 2f92d37da8c0091e9879a493cfe8a361eb1d9299..6314cd83a2488dc225d4a1a15599e8e51e639f7f 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -3906,6 +3906,15 @@ cases of mismatch, it also makes for better code on certain machines.
>  The default is to not promote prototypes.
>  @end deftypefn
>  
> +@deftypefn {Target Hook} bool TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P (machine_mode @var{mode}, machine_mode @var{promoted_mode}, int @var{unsignedp}, tree @var{v})
> +When a function argument is promoted with @code{PROMOTE_MODE} then this
> +hook is used to determine whether the bits of the promoted type are all
> +significant in the expression pointed to by V.  If they are an extend is
> +generated, if they are not a paradoxical subreg is created for the argument
> +from @code{mode} to @code{promoted_mode}.
> +The default is to promote using an extend.
> +@end deftypefn
> +
>  @deftypefn {Target Hook} bool TARGET_PUSH_ARGUMENT (unsigned int @var{npush})
>  This target hook returns @code{true} if push instructions will be
>  used to pass outgoing arguments.  When the push instruction usage is
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index f869ddd5e5b8b7acbd8e9765fb103af24a1085b6..35f955803ec0a5a93be18a028fa1043f90858982 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -3103,6 +3103,8 @@ control passing certain arguments in registers.
>  
>  @hook TARGET_PROMOTE_PROTOTYPES
>  
> +@hook TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P
> +
>  @hook TARGET_PUSH_ARGUMENT
>  
>  @defmac PUSH_ARGS_REVERSED
> diff --git a/gcc/function.cc b/gcc/function.cc
> index d5ed51a6a663a1ef472f5b1c090543f359c18f42..92f469bfd5d1ebfb09cc94d9be854715cd2f90f8 100644
> --- a/gcc/function.cc
> +++ b/gcc/function.cc
> @@ -3161,7 +3161,7 @@ assign_parm_setup_reg (struct assign_parm_data_all *all, tree parm,
>    machine_mode promoted_nominal_mode;
>    int unsignedp = TYPE_UNSIGNED (TREE_TYPE (parm));
>    bool did_conversion = false;
> -  bool need_conversion, moved;
> +  bool need_conversion, moved, use_subregs;
>    enum insn_code icode;
>    rtx rtl;
>  
> @@ -3172,7 +3172,20 @@ assign_parm_setup_reg (struct assign_parm_data_all *all, tree parm,
>      = promote_function_mode (data->nominal_type, data->nominal_mode, &unsignedp,
>  			     TREE_TYPE (current_function_decl), 2);
>  
> -  parmreg = gen_reg_rtx (promoted_nominal_mode);
> +  /* Check to see how the target wants the promotion of function arguments to
> +     be handled.  */
> +  use_subregs
> +    = targetm.calls.promote_function_args_subreg_p (data->nominal_mode,
> +						    promoted_nominal_mode,
> +						    unsignedp, parm);
> +
> +  /* If we're promoting using a paradoxical subreg then we need to keep using
> +     the unpromoted type because that's the only fully defined value.  */
> +  if (use_subregs)
> +    parmreg = gen_reg_rtx (data->nominal_mode);
> +  else
> +    parmreg = gen_reg_rtx (promoted_nominal_mode);
> +
>    if (!DECL_ARTIFICIAL (parm))
>      mark_user_reg (parmreg);
>  
> @@ -3256,9 +3269,19 @@ assign_parm_setup_reg (struct assign_parm_data_all *all, tree parm,
>  	    }
>  	  else
>  	    t = op1;
> -	  rtx_insn *pat = gen_extend_insn (op0, t, promoted_nominal_mode,
> -					   data->passed_mode, unsignedp);
> -	  emit_insn (pat);
> +
> +	  /* Promote the argument itself now if a target wants it.  This
> +	     prevents unneeded back and forth convertions in RTL between
> +	     the original and promoted type.  */
> +	  if (use_subregs)
> +	    emit_move_insn (op0, lowpart_subreg (promoted_nominal_mode, t,
> +						 data->nominal_mode));
> +	  else
> +	    {
> +	      rtx_insn *pat = gen_extend_insn (op0, t, promoted_nominal_mode,
> +					       data->passed_mode, unsignedp);
> +	      emit_insn (pat);
> +	    }
>  	  insns = get_insns ();
>  
>  	  moved = true;
> diff --git a/gcc/hooks.h b/gcc/hooks.h
> index 1056e1e9e4dc3e6ce298557351047caa2f84227f..8d68de5cdb9adaea0a79ebf6de599f66b40aa67a 100644
> --- a/gcc/hooks.h
> +++ b/gcc/hooks.h
> @@ -31,6 +31,8 @@ extern bool hook_bool_const_int_const_int_true (const int, const int);
>  extern bool hook_bool_mode_false (machine_mode);
>  extern bool hook_bool_mode_true (machine_mode);
>  extern bool hook_bool_mode_mode_true (machine_mode, machine_mode);
> +extern bool hook_bool_mode_mode_int_tree_false (machine_mode, machine_mode, int, tree);
> +extern bool hook_bool_mode_mode_int_tree_true (machine_mode, machine_mode, int, tree);
>  extern bool hook_bool_mode_const_rtx_false (machine_mode, const_rtx);
>  extern bool hook_bool_mode_const_rtx_true (machine_mode, const_rtx);
>  extern bool hook_bool_mode_rtx_false (machine_mode, rtx);
> diff --git a/gcc/hooks.cc b/gcc/hooks.cc
> index b29233f4f852fb81ede75a5065d743cd16cc9219..7647774f9e8efbbe13d5607e4a4b2f1c9d22f045 100644
> --- a/gcc/hooks.cc
> +++ b/gcc/hooks.cc
> @@ -89,6 +89,22 @@ hook_bool_mode_mode_true (machine_mode, machine_mode)
>    return true;
>  }
>  
> +/* Generic hook that takes (machine_mode, machine_mode, int, tree) and
> +   returns false.  */
> +bool
> +hook_bool_mode_mode_int_tree_false (machine_mode, machine_mode, int, tree)
> +{
> +  return false;
> +}
> +
> +/* Generic hook that takes (machine_mode, machine_mode, int, tree) and
> +   returns true.  */
> +bool
> +hook_bool_mode_mode_int_tree_true (machine_mode, machine_mode, int, tree)
> +{
> +  return true;
> +}
> +
>  /* Generic hook that takes (machine_mode, const_rtx) and returns false.  */
>  bool
>  hook_bool_mode_const_rtx_false (machine_mode, const_rtx)
> diff --git a/gcc/target.def b/gcc/target.def
> index 72c2e1ef756cf70a1c92abe81f8a6577eaaa2501..bdbacf8c5fd7b0626a37951f6f6ec649f3194977 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -4561,6 +4561,17 @@ The default is to not promote prototypes.",
>   bool, (const_tree fntype),
>   hook_bool_const_tree_false)
>  
> +DEFHOOK
> +(promote_function_args_subreg_p,
> + "When a function argument is promoted with @code{PROMOTE_MODE} then this\n\
> +hook is used to determine whether the bits of the promoted type are all\n\
> +significant in the expression pointed to by V.  If they are an extend is\n\
> +generated, if they are not a paradoxical subreg is created for the argument\n\
> +from @code{mode} to @code{promoted_mode}.\n\
> +The default is to promote using an extend.",
> + bool, (machine_mode mode, machine_mode promoted_mode, int unsignedp, tree v),
> + hook_bool_mode_mode_int_tree_false)
> +
>  DEFHOOK
>  (struct_value_rtx,
>   "This target hook should return the location of the structure value\n\
> diff --git a/gcc/tree-outof-ssa.cc b/gcc/tree-outof-ssa.cc
> index ec883126ad86d86a2c2dafee4592b8d83e2ed917..0f437023983baa0f23da25221f7bce8fc559a8b8 100644
> --- a/gcc/tree-outof-ssa.cc
> +++ b/gcc/tree-outof-ssa.cc
> @@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-ssa-coalesce.h"
>  #include "tree-outof-ssa.h"
>  #include "dojump.h"
> +#include "target.h"
>  
>  /* FIXME: A lot of code here deals with expanding to RTL.  All that code
>     should be in cfgexpand.cc.  */
> @@ -333,7 +334,10 @@ insert_value_copy_on_edge (edge e, int dest, tree src, location_t locus)
>    dest_mode = GET_MODE (dest_rtx);
>    gcc_assert (src_mode == TYPE_MODE (TREE_TYPE (name)));
>    gcc_assert (!REG_P (dest_rtx)
> -	      || dest_mode == promote_ssa_mode (name, &unsignedp));
> +	      || dest_mode == promote_ssa_mode (name, &unsignedp)
> +	      || targetm.calls.promote_function_args_subreg_p (
> +			dest_mode, promote_ssa_mode (name, NULL), unsignedp,
> +			name));
>  
>    if (src_mode != dest_mode)
>      {
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)


* RE: [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions.
  2022-05-16  6:31 ` [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions Richard Biener
@ 2022-05-16  8:26   ` Tamar Christina
  0 siblings, 0 replies; 19+ messages in thread
From: Tamar Christina @ 2022-05-16  8:26 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jeffreyalaw, Richard Sandiford



> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Monday, May 16, 2022 7:31 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jeffreyalaw@gmail.com;
> Richard Sandiford <Richard.Sandiford@arm.com>
> Subject: Re: [PATCH 1/3]middle-end: Add the ability to let the target decide
> the method of argument promotions.
> 
> On Fri, 13 May 2022, Tamar Christina wrote:
> 
> > Hi All,
> >
> > Some targets require function parameters to be promoted to a different
> > type on expand time because the target may not have native
> > instructions to work on such types.  As an example the AArch64 port
> > does not have native instructions working on integer 8- or 16-bit
> > values.  As such it promotes every parameter of these types to 32-bits.
> >
> > This promotion could be done by a target for two reasons:
> >
> > 1. For correctness.  This may be an APCS requirement for instance.
> > 2. For efficiency.  By promoting the argument at expansion time we don't
> have
> >    to keep promoting the type back and forth after each operation on it.
> >    i.e. the promotion simplies the RTL.
> >
> > This patch adds the ability for a target to decide whether during the
> > expansion to use an extend to handle promotion or to use a paradoxical
> subreg.
> >
> > A pradoxical subreg can be used when there's no correctness issues and
> > when you still want the RTL efficiency of not doing the promotion
> constantly.
> >
> > This also allows the target to not need to generate any code when the
> > top bits are not significant.
> >
> > An example is in AArch64 the following extend is unneeded:
> >
> > uint8_t fd2 (uint8_t xr){
> >     return xr + 1;
> > }
> >
> > currently generates:
> >
> > fd2:
> >         and     w0, w0, 255
> >         add     w0, w0, 1
> >         ret
> >
> > instead of
> >
> > fd2:
> >         add     w0, w0, #1
> >         ret
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > Bootstrapped on x86_64-pc-linux-gnu and no issues
> >
> > Ok for master?
> 
> Why do we need a target hook for this?  Why doesn't the target's
> function_arg(?) hook just return (subreg:SI (reg:QI 0)) here instead when no
> zero-extension is required and (reg:QI 0) when it is?
> 

Because I'm not sure it's allowed to.  From what I can tell, function_arg expects
the returned registers to be hard regs, i.e. the actual APCS-determined register for
the argument.  And since it's a hard reg we can't make it a paradoxical subreg; that
would just resize the register to the new mode.

assign_parm_setup_reg gets around this by creating a new pseudo and then
assigning the resulting rtl to the param with set_parm_rtl (parm, rtl), which as far
as I can tell changes what expand will expand the parm to without changing the
actual APCS register.  That is why I think the current code does the promotions
there.
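
(For reference, the rough shape of what assign_parm_setup_reg does today,
heavily simplified from gcc/function.cc:

  /* Promote the parm into a fresh pseudo rather than touching the
     incoming hard reg.  */
  rtx parmreg = gen_reg_rtx (promoted_nominal_mode);
  /* ... emit the extend from the incoming value into PARMREG ... */
  set_parm_rtl (parm, parmreg);

so expand later uses PARMREG for the parm while the APCS hard reg itself
is left alone.)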

> That said, an extra hook looks like a bad design to me, the existing
> cumulative args way of querying the target ABI should be enough, and if
> not, should be extended in a less hackish way.

I could perhaps modify function_arg_info to have a field for the subreg extension
which I can then use in function_arg for the promotion.  However I'll have a problem
with the asserts in insert_value_copy_on_edge and set_rtl, both of which check that
either the in/out types match or the out type is a promoted in type.

set_rtl I can extend with enough information to determine whether the subreg was
intentional, but insert_value_copy_on_edge doesn't give me access to enough
information.

Which is probably why the current code re-queries the target.  Can I get to the parm
information here?

I also initially tried to extend PROMOTE_MODE itself; however, it is used by promote_mode,
which is used in many other places, and I am not sure some of those usages, such as
promote_ssa_mode, are safe to change.

With the new hook I know it only interacts with parms.  Any thoughts appreciated 😊
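
(To make it concrete, a minimal sketch of what an implementation of the
proposed hook could look like -- hypothetical, not the actual 2/3 patch:

  /* Sketch only: return true if MODE can be promoted to PROMOTED_MODE
     with a paradoxical subreg for the value V.  */
  static bool
  aarch64_promote_function_args_subreg_p (machine_mode mode,
					  machine_mode promoted_mode,
					  int, tree v)
  {
    /* Only narrow scalar integer promotions are done this way.  */
    if (!SCALAR_INT_MODE_P (mode)
	|| !SCALAR_INT_MODE_P (promoted_mode)
	|| known_ge (GET_MODE_SIZE (mode), GET_MODE_SIZE (promoted_mode)))
      return false;
    /* ... walk the uses of V and return false if any use needs the top
       bits to be defined, e.g. a comparison against a constant ... */
    return true;
  }

The interesting part is of course the elided use analysis.)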

Cheers,
Tamar

> 
> But of course I am not familiar at all with the current state, but since you
> specifially CCed me ...
> 
> Richard.
> 
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* cfgexpand.cc (set_rtl): Check for function promotion.
> > 	* tree-outof-ssa.cc (insert_value_copy_on_edge): Likewise.
> > 	* function.cc (assign_parm_setup_reg): Likewise.
> > 	* hooks.cc (hook_bool_mode_mode_int_tree_false,
> > 	hook_bool_mode_mode_int_tree_true): New.
> > 	* hooks.h (hook_bool_mode_mode_int_tree_false,
> > 	hook_bool_mode_mode_int_tree_true): New.
> > 	* target.def (promote_function_args_subreg_p): New.
> > 	* doc/tm.texi: Document it.
> > 	* doc/tm.texi.in: Likewise.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc index
> >
> d3cc77d2ca98f620b29623fc5696410bad9bc184..df95184cfa185312c2a46cb92da
> a
> > 051718d9f4f3 100644
> > --- a/gcc/cfgexpand.cc
> > +++ b/gcc/cfgexpand.cc
> > @@ -206,14 +206,20 @@ set_rtl (tree t, rtx x)
> >       have to compute it ourselves.  For RESULT_DECLs, we accept mode
> >       mismatches too, as long as we have BLKmode or are not coalescing
> >       across variables, so that we don't reject BLKmode PARALLELs or
> > -     unpromoted REGs.  */
> > +     unpromoted REGs.  For any promoted types that result in a
> > +     paradoxical subreg also accept the argument.  */
> >    gcc_checking_assert (!x || x == pc_rtx || TREE_CODE (t) != SSA_NAME
> >  		       || (SSAVAR (t)
> >  			   && TREE_CODE (SSAVAR (t)) == RESULT_DECL
> >  			   && (promote_ssa_mode (t, NULL) == BLKmode
> >  			       || !flag_tree_coalesce_vars))
> >  		       || !use_register_for_decl (t)
> > -		       || GET_MODE (x) == promote_ssa_mode (t, NULL));
> > +		       || GET_MODE (x) == promote_ssa_mode (t, NULL)
> > +		       || targetm.calls.promote_function_args_subreg_p (
> > +					  GET_MODE (x),
> > +					  promote_ssa_mode (t, NULL),
> > +					  TYPE_UNSIGNED (TREE_TYPE (t)),
> > +					  SSAVAR (t)));
> >
> >    if (x)
> >      {
> > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index
> >
> 2f92d37da8c0091e9879a493cfe8a361eb1d9299..6314cd83a2488dc225d4a1a15
> 599
> > e8e51e639f7f 100644
> > --- a/gcc/doc/tm.texi
> > +++ b/gcc/doc/tm.texi
> > @@ -3906,6 +3906,15 @@ cases of mismatch, it also makes for better code
> on certain machines.
> >  The default is to not promote prototypes.
> >  @end deftypefn
> >
> > +@deftypefn {Target Hook} bool
> TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P
> > +(machine_mode @var{mode}, machine_mode @var{promoted_mode},
> int
> > +@var{unsignedp}, tree @var{v}) When a function argument is promoted
> > +with @code{PROMOTE_MODE} then this hook is used to determine
> whether
> > +the bits of the promoted type are all significant in the expression
> > +pointed to by V.  If they are, an extend is generated; if they are not, a
> paradoxical subreg is created for the argument from @code{mode} to
> @code{promoted_mode}.
> > +The default is to promote using an extend.
> > +@end deftypefn
> > +
> >  @deftypefn {Target Hook} bool TARGET_PUSH_ARGUMENT (unsigned int
> > @var{npush})  This target hook returns @code{true} if push
> > instructions will be  used to pass outgoing arguments.  When the push
> > instruction usage is diff --git a/gcc/doc/tm.texi.in
> > b/gcc/doc/tm.texi.in index
> >
> f869ddd5e5b8b7acbd8e9765fb103af24a1085b6..35f955803ec0a5a93be18a028
> fa1
> > 043f90858982 100644
> > --- a/gcc/doc/tm.texi.in
> > +++ b/gcc/doc/tm.texi.in
> > @@ -3103,6 +3103,8 @@ control passing certain arguments in registers.
> >
> >  @hook TARGET_PROMOTE_PROTOTYPES
> >
> > +@hook TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P
> > +
> >  @hook TARGET_PUSH_ARGUMENT
> >
> >  @defmac PUSH_ARGS_REVERSED
> > diff --git a/gcc/function.cc b/gcc/function.cc index
> >
> d5ed51a6a663a1ef472f5b1c090543f359c18f42..92f469bfd5d1ebfb09cc94d9be
> 85
> > 4715cd2f90f8 100644
> > --- a/gcc/function.cc
> > +++ b/gcc/function.cc
> > @@ -3161,7 +3161,7 @@ assign_parm_setup_reg (struct
> assign_parm_data_all *all, tree parm,
> >    machine_mode promoted_nominal_mode;
> >    int unsignedp = TYPE_UNSIGNED (TREE_TYPE (parm));
> >    bool did_conversion = false;
> > -  bool need_conversion, moved;
> > +  bool need_conversion, moved, use_subregs;
> >    enum insn_code icode;
> >    rtx rtl;
> >
> > @@ -3172,7 +3172,20 @@ assign_parm_setup_reg (struct
> assign_parm_data_all *all, tree parm,
> >      = promote_function_mode (data->nominal_type, data-
> >nominal_mode, &unsignedp,
> >  			     TREE_TYPE (current_function_decl), 2);
> >
> > -  parmreg = gen_reg_rtx (promoted_nominal_mode);
> > +  /* Check to see how the target wants the promotion of function
> arguments to
> > +     be handled.  */
> > +  use_subregs
> > +    = targetm.calls.promote_function_args_subreg_p (data-
> >nominal_mode,
> > +						    promoted_nominal_mode,
> > +						    unsignedp, parm);
> > +
> > +  /* If we're promoting using a paradoxical subreg then we need to keep
> using
> > +     the unpromoted type because that's the only fully defined value.
> > + */  if (use_subregs)
> > +    parmreg = gen_reg_rtx (data->nominal_mode);  else
> > +    parmreg = gen_reg_rtx (promoted_nominal_mode);
> > +
> >    if (!DECL_ARTIFICIAL (parm))
> >      mark_user_reg (parmreg);
> >
> > @@ -3256,9 +3269,19 @@ assign_parm_setup_reg (struct
> assign_parm_data_all *all, tree parm,
> >  	    }
> >  	  else
> >  	    t = op1;
> > -	  rtx_insn *pat = gen_extend_insn (op0, t,
> promoted_nominal_mode,
> > -					   data->passed_mode, unsignedp);
> > -	  emit_insn (pat);
> > +
> > +	  /* Promote the argument itself now if a target wants it.  This
> > +	     prevents unneeded back and forth conversions in RTL between
> > +	     the original and promoted type.  */
> > +	  if (use_subregs)
> > +	    emit_move_insn (op0, lowpart_subreg
> (promoted_nominal_mode, t,
> > +						 data->nominal_mode));
> > +	  else
> > +	    {
> > +	      rtx_insn *pat = gen_extend_insn (op0, t,
> promoted_nominal_mode,
> > +					       data->passed_mode, unsignedp);
> > +	      emit_insn (pat);
> > +	    }
> >  	  insns = get_insns ();
> >
> >  	  moved = true;
> > diff --git a/gcc/hooks.h b/gcc/hooks.h index
> >
> 1056e1e9e4dc3e6ce298557351047caa2f84227f..8d68de5cdb9adaea0a79ebf6d
> e59
> > 9f66b40aa67a 100644
> > --- a/gcc/hooks.h
> > +++ b/gcc/hooks.h
> > @@ -31,6 +31,8 @@ extern bool hook_bool_const_int_const_int_true
> > (const int, const int);  extern bool hook_bool_mode_false
> > (machine_mode);  extern bool hook_bool_mode_true (machine_mode);
> > extern bool hook_bool_mode_mode_true (machine_mode,
> machine_mode);
> > +extern bool hook_bool_mode_mode_int_tree_false (machine_mode,
> > +machine_mode, int, tree); extern bool
> > +hook_bool_mode_mode_int_tree_true (machine_mode,
> machine_mode, int,
> > +tree);
> >  extern bool hook_bool_mode_const_rtx_false (machine_mode,
> const_rtx);
> > extern bool hook_bool_mode_const_rtx_true (machine_mode,
> const_rtx);
> > extern bool hook_bool_mode_rtx_false (machine_mode, rtx); diff --git
> > a/gcc/hooks.cc b/gcc/hooks.cc index
> >
> b29233f4f852fb81ede75a5065d743cd16cc9219..7647774f9e8efbbe13d5607e4
> a4b
> > 2f1c9d22f045 100644
> > --- a/gcc/hooks.cc
> > +++ b/gcc/hooks.cc
> > @@ -89,6 +89,22 @@ hook_bool_mode_mode_true (machine_mode,
> machine_mode)
> >    return true;
> >  }
> >
> > +/* Generic hook that takes (machine_mode, machine_mode, int, tree)
> and
> > +   returns false.  */
> > +bool
> > +hook_bool_mode_mode_int_tree_false (machine_mode,
> machine_mode, int,
> > +tree) {
> > +  return false;
> > +}
> > +
> > +/* Generic hook that takes (machine_mode, machine_mode, int, tree)
> and
> > +   returns true.  */
> > +bool
> > +hook_bool_mode_mode_int_tree_true (machine_mode,
> machine_mode, int,
> > +tree) {
> > +  return true;
> > +}
> > +
> >  /* Generic hook that takes (machine_mode, const_rtx) and returns
> > false.  */  bool  hook_bool_mode_const_rtx_false (machine_mode,
> > const_rtx) diff --git a/gcc/target.def b/gcc/target.def index
> >
> 72c2e1ef756cf70a1c92abe81f8a6577eaaa2501..bdbacf8c5fd7b0626a37951f6f6
> e
> > c649f3194977 100644
> > --- a/gcc/target.def
> > +++ b/gcc/target.def
> > @@ -4561,6 +4561,17 @@ The default is to not promote prototypes.",
> >   bool, (const_tree fntype),
> >   hook_bool_const_tree_false)
> >
> > +DEFHOOK
> > +(promote_function_args_subreg_p,
> > + "When a function argument is promoted with @code{PROMOTE_MODE}
> then
> > +this\n\ hook is used to determine whether the bits of the promoted
> > +type are all\n\ significant in the expression pointed to by V.  If
> > +they are an extend is\n\ generated, if they are not a paradoxical
> > +subreg is created for the argument\n\ from @code{mode} to
> > +@code{promoted_mode}.\n\ The default is to promote using an
> extend.",
> > +bool, (machine_mode mode, machine_mode promoted_mode, int
> unsignedp,
> > +tree v),
> > + hook_bool_mode_mode_int_tree_false)
> > +
> >  DEFHOOK
> >  (struct_value_rtx,
> >   "This target hook should return the location of the structure
> > value\n\ diff --git a/gcc/tree-outof-ssa.cc b/gcc/tree-outof-ssa.cc
> > index
> >
> ec883126ad86d86a2c2dafee4592b8d83e2ed917..0f437023983baa0f23da25221
> f7b
> > ce8fc559a8b8 100644
> > --- a/gcc/tree-outof-ssa.cc
> > +++ b/gcc/tree-outof-ssa.cc
> > @@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
> > #include "tree-ssa-coalesce.h"
> >  #include "tree-outof-ssa.h"
> >  #include "dojump.h"
> > +#include "target.h"
> >
> >  /* FIXME: A lot of code here deals with expanding to RTL.  All that code
> >     should be in cfgexpand.cc.  */
> > @@ -333,7 +334,10 @@ insert_value_copy_on_edge (edge e, int dest,
> tree src, location_t locus)
> >    dest_mode = GET_MODE (dest_rtx);
> >    gcc_assert (src_mode == TYPE_MODE (TREE_TYPE (name)));
> >    gcc_assert (!REG_P (dest_rtx)
> > -	      || dest_mode == promote_ssa_mode (name, &unsignedp));
> > +	      || dest_mode == promote_ssa_mode (name, &unsignedp)
> > +	      || targetm.calls.promote_function_args_subreg_p (
> > +			dest_mode, promote_ssa_mode (name, NULL),
> unsignedp,
> > +			name));
> >
> >    if (src_mode != dest_mode)
> >      {
> >
> >
> >
> >
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> Nuernberg, Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions.
  2022-05-13 17:11 [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions Tamar Christina
                   ` (2 preceding siblings ...)
  2022-05-16  6:31 ` [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions Richard Biener
@ 2022-05-16 11:36 ` Richard Sandiford
  2022-05-16 11:49   ` Tamar Christina
  3 siblings, 1 reply; 19+ messages in thread
From: Richard Sandiford @ 2022-05-16 11:36 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, rguenther, jeffreyalaw

Tamar Christina <tamar.christina@arm.com> writes:
> Hi All,
>
> Some targets require function parameters to be promoted to a different
> type on expand time because the target may not have native instructions
> to work on such types.  As an example the AArch64 port does not have native
> instructions working on integer 8- or 16-bit values.  As such it promotes
> every parameter of these types to 32-bits.

This doesn't seem specific to parameters though.  It applies to any
8- or 16-bit variable.  E.g.:

#include <stdint.h>
uint8_t foo(uint32_t x, uint32_t y) {
    uint8_t z = x != 0 ? x : y;
    return z + 1;
}

generates:

foo:
        cmp     w0, 0
        and     w1, w1, 255
        and     w0, w0, 255
        csel    w0, w1, w0, eq
        add     w0, w0, 1
        ret

So I think the new behaviour is really a modification of the PROMOTE_MODE
behaviour rather than the PROMOTE_FUNCTION_MODE behaviour.

FWIW, I agree with Richard that it would be better not to add a new hook.
I think we're really making PROMOTE_MODE choose between SIGN_EXTEND,
ZERO_EXTEND or SUBREG (what LLVM would call “any extend”) rather
than the current SIGN_EXTEND vs. ZERO_EXTEND choice.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions.
  2022-05-16 11:36 ` Richard Sandiford
@ 2022-05-16 11:49   ` Tamar Christina
  2022-05-16 12:14     ` Richard Sandiford
  0 siblings, 1 reply; 19+ messages in thread
From: Tamar Christina @ 2022-05-16 11:49 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, nd, rguenther, jeffreyalaw



> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Monday, May 16, 2022 12:36 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; rguenther@suse.de;
> jeffreyalaw@gmail.com
> Subject: Re: [PATCH 1/3]middle-end: Add the ability to let the target decide
> the method of argument promotions.
> 
> Tamar Christina <tamar.christina@arm.com> writes:
> > Hi All,
> >
> > Some targets require function parameters to be promoted to a different
> > type on expand time because the target may not have native
> > instructions to work on such types.  As an example the AArch64 port
> > does not have native instructions working on integer 8- or 16-bit
> > values.  As such it promotes every parameter of these types to 32-bits.
> 
> This doesn't seem specific to parameters though.  It applies to any
> 8- or 16-bit variable.  E.g.:
> 
> #include <stdint.h>
> uint8_t foo(uint32_t x, uint32_t y) {
>     uint8_t z = x != 0 ? x : y;
>     return z + 1;
> }
> 
> generates:
> 
> foo:
>         cmp     w0, 0
>         and     w1, w1, 255
>         and     w0, w0, 255
>         csel    w0, w1, w0, eq
>         add     w0, w0, 1
>         ret
> 
> So I think the new behaviour is really a modification of the PROMOTE_MODE
> behaviour rather than the PROMOTE_FUNCTION_MODE behaviour.
> 
> FWIW, I agree with Richard that it would be better not to add a new hook.
> I think we're really making PROMOTE_MODE choose between
> SIGN_EXTEND, ZERO_EXTEND or SUBREG (what LLVM would call “any
> extend”) rather than the current SIGN_EXTEND vs. ZERO_EXTEND choice.

Ah, I hadn't realized this also applied to locals.  OK, I can modify PROMOTE_MODE then,
but I also need the actual SSA_NAME and not just the type, so I will have to pass that along.

From a practical point of view, the actual hook is implemented by 34 targets.  Would I
need to CC the maintainers for each of them, or would global maintainer approval
suffice for these mostly mechanical changes?

Cheers,
Tamar

> 
> Thanks,
> Richard

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions.
  2022-05-16 11:49   ` Tamar Christina
@ 2022-05-16 12:14     ` Richard Sandiford
  2022-05-16 12:18       ` Richard Sandiford
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Sandiford @ 2022-05-16 12:14 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, rguenther, jeffreyalaw

Tamar Christina <Tamar.Christina@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Monday, May 16, 2022 12:36 PM
>> To: Tamar Christina <Tamar.Christina@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; rguenther@suse.de;
>> jeffreyalaw@gmail.com
>> Subject: Re: [PATCH 1/3]middle-end: Add the ability to let the target decide
>> the method of argument promotions.
>> 
>> Tamar Christina <tamar.christina@arm.com> writes:
>> > Hi All,
>> >
>> > Some targets require function parameters to be promoted to a different
>> > type on expand time because the target may not have native
>> > instructions to work on such types.  As an example the AArch64 port
>> > does not have native instructions working on integer 8- or 16-bit
>> > values.  As such it promotes every parameter of these types to 32-bits.
>> 
>> This doesn't seem specific to parameters though.  It applies to any
>> 8- or 16-bit variable.  E.g.:
>> 
>> #include <stdint.h>
>> uint8_t foo(uint32_t x, uint32_t y) {
>>     uint8_t z = x != 0 ? x : y;
>>     return z + 1;
>> }
>> 
>> generates:
>> 
>> foo:
>>         cmp     w0, 0
>>         and     w1, w1, 255
>>         and     w0, w0, 255
>>         csel    w0, w1, w0, eq
>>         add     w0, w0, 1
>>         ret
>> 
>> So I think the new behaviour is really a modification of the PROMOTE_MODE
>> behaviour rather than the PROMOTE_FUNCTION_MODE behaviour.
>> 
>> FWIW, I agree with Richard that it would be better not to add a new hook.
>> I think we're really making PROMOTE_MODE choose between
>> SIGN_EXTEND, ZERO_EXTEND or SUBREG (what LLVM would call “any
>> extend”) rather than the current SIGN_EXTEND vs. ZERO_EXTEND choice.
>
> Ah, I hadn't realized this also applied to locals.. ok I can modify PROMOTE_MODE then,
> but I also need the actual SSA_NAME and not just the type so will have to pass this along.
>
> From a practical point of view.. the actual hook however is implemented by 34 targets,
> would I need to CC maintainers for each of them or would global maintainer approval
> suffice for these mostly mechanical changes?

Yeah, single approval should be enough for mechanical changes.  It would be
good to do the interface change and mechanical target changes as a
separate prepatch if possible though.

I'm not sure about passing the SSA name to the target though, or the
way that the aarch64 hook uses the info.  It looks like a single cold
comparison could defeat the optimisation for hot code.

If we do try to make the decision based on uses at expand time, it might
be better for the analysis to be in target-independent code, with help
from the target to decide where extensions are cheap.  It still feels
a bit hacky though.

What stops us from forming cbz/cbnz when the extension is done close to
the comparison (from the comment in 2/3)?  If we can solve that, could
we simply do an any-extend all the time, and treat removing redundant
extensions as a global availability problem?

What kind of code do we emit when do an extension just before
an operation?  If the DECL_RTL is (subreg:QI (reg:SI R) 0), say,
then it should be safe to do the extension directly into R:

  (set (reg:SI X) (zero_extend:SI (subreg:QI (reg:SI X))))

which avoids the problem of having two values live at once
(the zero-extended value and the any-extended value).

Having identical instructions for each extension would also make
it easier for any global pass to move them and remove redundancies.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions.
  2022-05-16 12:14     ` Richard Sandiford
@ 2022-05-16 12:18       ` Richard Sandiford
  2022-05-16 13:02         ` Tamar Christina
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Sandiford @ 2022-05-16 12:18 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, rguenther, jeffreyalaw

Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Tamar Christina <Tamar.Christina@arm.com> writes:
>>> -----Original Message-----
>>> From: Richard Sandiford <richard.sandiford@arm.com>
>>> Sent: Monday, May 16, 2022 12:36 PM
>>> To: Tamar Christina <Tamar.Christina@arm.com>
>>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; rguenther@suse.de;
>>> jeffreyalaw@gmail.com
>>> Subject: Re: [PATCH 1/3]middle-end: Add the ability to let the target decide
>>> the method of argument promotions.
>>> 
>>> Tamar Christina <tamar.christina@arm.com> writes:
>>> > Hi All,
>>> >
>>> > Some targets require function parameters to be promoted to a different
>>> > type on expand time because the target may not have native
>>> > instructions to work on such types.  As an example the AArch64 port
>>> > does not have native instructions working on integer 8- or 16-bit
>>> > values.  As such it promotes every parameter of these types to 32-bits.
>>> 
>>> This doesn't seem specific to parameters though.  It applies to any
>>> 8- or 16-bit variable.  E.g.:
>>> 
>>> #include <stdint.h>
>>> uint8_t foo(uint32_t x, uint32_t y) {
>>>     uint8_t z = x != 0 ? x : y;
>>>     return z + 1;
>>> }
>>> 
>>> generates:
>>> 
>>> foo:
>>>         cmp     w0, 0
>>>         and     w1, w1, 255
>>>         and     w0, w0, 255
>>>         csel    w0, w1, w0, eq
>>>         add     w0, w0, 1
>>>         ret
>>> 
>>> So I think the new behaviour is really a modification of the PROMOTE_MODE
>>> behaviour rather than the PROMOTE_FUNCTION_MODE behaviour.
>>> 
>>> FWIW, I agree with Richard that it would be better not to add a new hook.
>>> I think we're really making PROMOTE_MODE choose between
>>> SIGN_EXTEND, ZERO_EXTEND or SUBREG (what LLVM would call “any
>>> extend”) rather than the current SIGN_EXTEND vs. ZERO_EXTEND choice.
>>
>> Ah, I hadn't realized this also applied to locals.. ok I can modify PROMOTE_MODE then,
>> but I also need the actual SSA_NAME and not just the type so will have to pass this along.
>>
>> From a practical point of view.. the actual hook however is implemented by 34 targets,
>> would I need to CC maintainers for each of them or would global maintainer approval
>> suffice for these mostly mechanical changes?
>
> Yeah, single approval should be enough mechanical changes.  It would be
> good to do the interface change and mechanical target changes as a
> separate prepatch if possible though.
>
> I'm not sure about passing the SSA name to the target though, or the
> way that the aarch64 hook uses the info.  It looks like a single cold
> comparison could defeat the optimisation for hot code.
>
> If we do try to make the decision based on uses at expand time, it might
> be better for the analysis to be in target-independent code, with help
> from the target to decide where extensions are cheap.  It still feels
> a bit hacky though.
>
> What stops us from forming cbz/cbnz when the extension is done close to
> the comparison (from the comment in 2/3)?  If we can solve that, could
> we simply do an any-extend all the time, and treat removing redundant
> extensions as a global availability problem?

(which would run after combine)

>
> What kind of code do we emit when we do an extension just before
> an operation?  If the DECL_RTL is (subreg:QI (reg:SI R) 0), say,
> then it should be safe to do the extension directly into R:
>
>   (set (reg:SI X) (zero_extend:SI (subreg:QI (reg:SI X))))

Oops, that should of course be:

  (set (reg:SI R) (zero_extend:SI (subreg:QI (reg:SI R))))

> which avoids the problem of having two values live at once
> (the zero-extended value and the any-extended value).
>
> Having identical instructions for each extension would also make
> it easier for any global pass to move them and remove redundancies.
>
> Thanks,
> Richard

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions.
  2022-05-16 12:18       ` Richard Sandiford
@ 2022-05-16 13:02         ` Tamar Christina
  2022-05-16 13:24           ` Richard Sandiford
  0 siblings, 1 reply; 19+ messages in thread
From: Tamar Christina @ 2022-05-16 13:02 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, nd, rguenther, jeffreyalaw

> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Monday, May 16, 2022 1:18 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; rguenther@suse.de;
> jeffreyalaw@gmail.com
> Subject: Re: [PATCH 1/3]middle-end: Add the ability to let the target decide
> the method of argument promotions.
> 
> Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > Tamar Christina <Tamar.Christina@arm.com> writes:
> >>> -----Original Message-----
> >>> From: Richard Sandiford <richard.sandiford@arm.com>
> >>> Sent: Monday, May 16, 2022 12:36 PM
> >>> To: Tamar Christina <Tamar.Christina@arm.com>
> >>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; rguenther@suse.de;
> >>> jeffreyalaw@gmail.com
> >>> Subject: Re: [PATCH 1/3]middle-end: Add the ability to let the
> >>> target decide the method of argument promotions.
> >>>
> >>> Tamar Christina <tamar.christina@arm.com> writes:
> >>> > Hi All,
> >>> >
> >>> > Some targets require function parameters to be promoted to a
> >>> > different type on expand time because the target may not have
> >>> > native instructions to work on such types.  As an example the
> >>> > AArch64 port does not have native instructions working on integer
> >>> > 8- or 16-bit values.  As such it promotes every parameter of these
> types to 32-bits.
> >>>
> >>> This doesn't seem specific to parameters though.  It applies to any
> >>> 8- or 16-bit variable.  E.g.:
> >>>
> >>> #include <stdint.h>
> >>> uint8_t foo(uint32_t x, uint32_t y) {
> >>>     uint8_t z = x != 0 ? x : y;
> >>>     return z + 1;
> >>> }
> >>>
> >>> generates:
> >>>
> >>> foo:
> >>>         cmp     w0, 0
> >>>         and     w1, w1, 255
> >>>         and     w0, w0, 255
> >>>         csel    w0, w1, w0, eq
> >>>         add     w0, w0, 1
> >>>         ret
> >>>
> >>> So I think the new behaviour is really a modification of the
> >>> PROMOTE_MODE behaviour rather than the
> PROMOTE_FUNCTION_MODE behaviour.
> >>>
> >>> FWIW, I agree with Richard that it would be better not to add a new
> hook.
> >>> I think we're really making PROMOTE_MODE choose between
> SIGN_EXTEND,
> >>> ZERO_EXTEND or SUBREG (what LLVM would call “any
> >>> extend”) rather than the current SIGN_EXTEND vs. ZERO_EXTEND
> choice.
> >>
> >> Ah, I hadn't realized this also applied to locals.. ok I can modify
> >> PROMOTE_MODE then, but I also need the actual SSA_NAME and not just
> the type so will have to pass this along.
> >>
> >> From a practical point of view.. the actual hook however is
> >> implemented by 34 targets, would I need to CC maintainers for each of
> >> them or would global maintainer approval suffice for these mostly
> mechanical changes?
> >
> > Yeah, single approval should be enough for mechanical changes.  It would
> > be good to do the interface change and mechanical target changes as a
> > separate prepatch if possible though.
> >
> > I'm not sure about passing the SSA name to the target though, or the
> > way that the aarch64 hook uses the info.  It looks like a single cold
> > comparison could defeat the optimisation for hot code.

I'm not sure I follow why the likelihood of the comparison matters in this case;
I'll expand on it below.

> >
> > If we do try to make the decision based on uses at expand time, it
> > might be better for the analysis to be in target-independent code,
> > with help from the target to decide where extensions are cheap.  It
> > still feels a bit hacky though.

I thought about it but can't see most targets having this problem.  I did go
with an optimistic heuristic.  There are of course various ways to defeat it,
but looking through a corpus of code I don't see anything but the simple cases
in practice (more below).

> >
> > What stops us from forming cbz/cbnz when the extension is done close
> > to the comparison (from the comment in 2/3)?  If we can solve that,
> > could we simply do an any-extend all the time, and treat removing
> > redundant extensions as a global availability problem?
> 

In such cases there's no gain from doing the extension at all, e.g.
and w0, w0, 255
cmp w0, 0
b.eq .Lfoo

will be optimized to

tst w0, 0xff
b.ne .Lfoo

already.

In RTL the problem occurs when you have nested control flow like nested if and switch statements.
The example in 2/3 was intended to show that, before, what we'd do is:

and w22, w0, 255
.... <code that clobbers cc and caller saves>
<switch1>
cbz w22, .Lfoo1
....
<switch2>
cbz w22, .Lfoo2

If we have a single comparison we already sink the zero_extend today.

Now if we instead any-extend w0 we end up with:

mov w22, w0
.... <code that clobbers cc and caller saves>
<switch1>
tst w22, 0xff
b.ne .Lfoo1
....
<switch2>
tst w22, 0xff
b.ne .Lfoo2

So you get an additional tst before each branch.  You also can't perform the tst higher up since CC is clobbered.
The cbz is nice because the zero extend doesn't use CC, of course, and so having the value live allows us to
optimize the branch.

I don't think branch likeliness matters here, as you must keep w22 live in both cases somehow.  In the SPEC CPU 2017
benchmark perlbench (which uses a lot of nested switches) this pattern is responsible for an extra 0.3% code-size
increase, which the approach in 2/3 prevents.
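
(A reduced C example of the shape that hits this -- illustrative, not the
exact testcase from 2/3:

#include <stdint.h>
void g1 (void);
void g2 (void);

void f (uint8_t x)
{
  g1 ();	/* clobbers CC and caller-saved regs */
  if (x)	/* with a zero extend: cbz/cbnz on the saved copy */
    g2 ();
  g1 ();
  if (x)	/* with an any-extend: needs tst + b.ne each time */
    g2 ();
}

x must survive both calls in a callee-saved register either way; the only
question is whether its top bits are known to be zero at each branch.)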

> (which would run after combine)
> 
> >
> > What kind of code do we emit when we do an extension just before an
> > operation?  If the DECL_RTL is (subreg:QI (reg:SI R) 0), say, then it
> > should be safe to do the extension directly into R:
> >
> >   (set (reg:SI X) (zero_extend:SI (subreg:QI (reg:SI X))))
> 
> Oops, that should of course be:
> 
>   (set (reg:SI R) (zero_extend:SI (subreg:QI (reg:SI R))))
> 
> > which avoids the problem of having two values live at once (the
> > zero-extended value and the any-extended value).

I'm not sure it does, as the any-extended value must remain live.  I.e. above you can't get rid of w22;
you can only choose between having it be zero- or any-extended.  But I am not sure how you would do
that after expand.

Kind Regards,
Tamar

> >
> > Having identical instructions for each extension would also make it
> > easier for any global pass to move them and remove redundancies.
> >
> > Thanks,
> > Richard

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions.
  2022-05-16 13:02         ` Tamar Christina
@ 2022-05-16 13:24           ` Richard Sandiford
  2022-05-16 15:29             ` Tamar Christina
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Sandiford @ 2022-05-16 13:24 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, rguenther, jeffreyalaw

Tamar Christina <Tamar.Christina@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Monday, May 16, 2022 1:18 PM
>> To: Tamar Christina <Tamar.Christina@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; rguenther@suse.de;
>> jeffreyalaw@gmail.com
>> Subject: Re: [PATCH 1/3]middle-end: Add the ability to let the target decide
>> the method of argument promotions.
>> 
>> Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> > Tamar Christina <Tamar.Christina@arm.com> writes:
>> >>> -----Original Message-----
>> >>> From: Richard Sandiford <richard.sandiford@arm.com>
>> >>> Sent: Monday, May 16, 2022 12:36 PM
>> >>> To: Tamar Christina <Tamar.Christina@arm.com>
>> >>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; rguenther@suse.de;
>> >>> jeffreyalaw@gmail.com
>> >>> Subject: Re: [PATCH 1/3]middle-end: Add the ability to let the
>> >>> target decide the method of argument promotions.
>> >>>
>> >>> Tamar Christina <tamar.christina@arm.com> writes:
>> >>> > Hi All,
>> >>> >
>> >>> > Some targets require function parameters to be promoted to a
>> >>> > different type on expand time because the target may not have
>> >>> > native instructions to work on such types.  As an example the
>> >>> > AArch64 port does not have native instructions working on integer
>> >>> > 8- or 16-bit values.  As such it promotes every parameter of these
>> types to 32-bits.
>> >>>
>> >>> This doesn't seem specific to parameters though.  It applies to any
>> >>> 8- or 16-bit variable.  E.g.:
>> >>>
>> >>> #include <stdint.h>
>> >>> uint8_t foo(uint32_t x, uint32_t y) {
>> >>>     uint8_t z = x != 0 ? x : y;
>> >>>     return z + 1;
>> >>> }
>> >>>
>> >>> generates:
>> >>>
>> >>> foo:
>> >>>         cmp     w0, 0
>> >>>         and     w1, w1, 255
>> >>>         and     w0, w0, 255
>> >>>         csel    w0, w1, w0, eq
>> >>>         add     w0, w0, 1
>> >>>         ret
>> >>>
>> >>> So I think the new behaviour is really a modification of the
>> >>> PROMOTE_MODE behaviour rather than the
>> PROMOTE_FUNCTION_MODE behaviour.
>> >>>
>> >>> FWIW, I agree with Richard that it would be better not to add a new
>> hook.
>> >>> I think we're really making PROMOTE_MODE choose between
>> SIGN_EXTEND,
>> >>> ZERO_EXTEND or SUBREG (what LLVM would call “any
>> >>> extend”) rather than the current SIGN_EXTEND vs. ZERO_EXTEND
>> choice.
>> >>
>> >> Ah, I hadn't realized this also applied to locals.. ok I can modify
>> >> PROMOTE_MODE then, but I also need the actual SSA_NAME and not just
>> the type so will have to pass this along.
>> >>
>> >> From a practical point of view.. the actual hook however is
>> >> implemented by 34 targets, would I need to CC maintainers for each of
>> >> them or would global maintainer approval suffice for these mostly
>> mechanical changes?
>> >
>> > Yeah, single approval should be enough for mechanical changes.  It would
>> > be good to do the interface change and mechanical target changes as a
>> > separate prepatch if possible though.
>> >
>> > I'm not sure about passing the SSA name to the target though, or the
>> > way that the aarch64 hook uses the info.  It looks like a single cold
>> > comparison could defeat the optimisation for hot code.
>
> I'm not sure I follow why the likelihood of the comparison matters in this case;
> I'll expand on it below.

I meant the likelihood that the comparison is executed at all,
not which outcome is more likely.  E.g. suppose the only comparison
occurs on a failure path that eventually calls abort, and that there are
other paths (without comparisons of the same value) that would benefit
from the any-extend optimisation.  We'd prioritise the cold comparison
over optimising the other (hot) code.

I'm just suspicious of heuristics along the lines of “don't do X
if there is a single instance of Y”. :-)

>> > If we do try to make the decision based on uses at expand time, it
>> > might be better for the analysis to be in target-independent code,
>> > with help from the target to decide where extensions are cheap.  It
>> > still feels a bit hacky though.
>
> I thought about it but can't see most target having this problem. I did go
> with an optimistic heuristics. There are of course various ways to defeat it
> but looking through the corpus of code I don't see any but the simple cases
> in practice. (more below).
>
>> >
>> > What stops us from forming cbz/cbnz when the extension is done close
>> > to the comparison (from the comment in 2/3)?  If we can solve that,
>> > could we simply do an any-extend all the time, and treat removing
>> > redundant extensions as a global availability problem?
>> 
>
> In such cases there's no gain from doing the extension at all, e.g.
> and w0, w0, 255
> cmp w0, 0
> b.eq .Lfoo
>
> will be optimized to
>
> tst w0, 0xff
> b.ne .Lfoo
>
> already.
>
> In RTL the problem occurs when you have nested control flow like nested if and switch statements
> The example in 2/3 was intended to show that before what we'd do is
>
> and w22, w0, 255
> .... <code that clobbers cc and caller saves>
> <switch1>
> cbz w22, .Lfoo1
> ....
> <switch2>
> cbz w22, .Lfoo2
>
> If we have a single comparison we already sink the zero_extend today.
>
> Now if we instead any-extend w0 we end up with:
>
> mov w22, w0
> .... <code that clobbers cc and caller saves>
> <switch1>
> tst w22, 0xff
> b.ne .Lfoo1
> ....
> <switch2>
> tst w22, 0xff
> b.ne .Lfoo2
>
> So you get an additional tst before each branch.  You also can't perform the tst higher up since CC is clobbered.
> The cbz is nice because the zero extend doesn't use CC, of course, and so having the value live allows us to optimize
> the branch.

Once the cbz has been formed (in combine), where does the optimisation
of it happen?

> I don't think branch likeliness matters here, as you must keep w22 live in both cases somehow.  In the SPEC CPU 2017
> benchmark perlbench (which uses a lot of nested switches) this pattern is responsible for an extra 0.3% code-size
> increase, which the approach in 2/3 prevents.
>
>> (which would run after combine)
>> 
>> >
>> > What kind of code do we emit when we do an extension just before an
>> > operation?  If the DECL_RTL is (subreg:QI (reg:SI R) 0), say, then it
>> > should be safe to do the extension directly into R:
>> >
>> >   (set (reg:SI X) (zero_extend:SI (subreg:QI (reg:SI X))))
>> 
>> Oops, that should of course be:
>> 
>>   (set (reg:SI R) (zero_extend:SI (subreg:QI (reg:SI R))))
>> 
>> > which avoids the problem of having two values live at once (the
>> > zero-extended value and the any-extended value).
>
> I'm not sure it does, as the any-extended value must remain live. i.e. above you can't get rid of w22,
> you can only choose between having it be zero- or any-extended.  But I am not sure how you would do
> that after expand.

These per-operation extends are emitted during expand.  The question is
whether we do them into fresh registers:

   (set (reg:SI Rtmp) (zero_extend:SI (subreg:QI (reg:SI R))))

which leaves both R and Rtmp live at the same time, or whether we
do them in-situ:

   (set (reg:SI R) (zero_extend:SI (subreg:QI (reg:SI R))))

Expand should know that the latter is valid, given the DECL_RTL.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions.
  2022-05-16 13:24           ` Richard Sandiford
@ 2022-05-16 15:29             ` Tamar Christina
  2022-05-16 16:48               ` Richard Sandiford
  0 siblings, 1 reply; 19+ messages in thread
From: Tamar Christina @ 2022-05-16 15:29 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, nd, rguenther, jeffreyalaw

> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Monday, May 16, 2022 2:24 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; rguenther@suse.de;
> jeffreyalaw@gmail.com
> Subject: Re: [PATCH 1/3]middle-end: Add the ability to let the target decide
> the method of argument promotions.
> 
> Tamar Christina <Tamar.Christina@arm.com> writes:
> >> -----Original Message-----
> >> From: Richard Sandiford <richard.sandiford@arm.com>
> >> Sent: Monday, May 16, 2022 1:18 PM
> >> To: Tamar Christina <Tamar.Christina@arm.com>
> >> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; rguenther@suse.de;
> >> jeffreyalaw@gmail.com
> >> Subject: Re: [PATCH 1/3]middle-end: Add the ability to let the target
> >> decide the method of argument promotions.
> >>
> >> Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> >> > Tamar Christina <Tamar.Christina@arm.com> writes:
> >> >>> -----Original Message-----
> >> >>> From: Richard Sandiford <richard.sandiford@arm.com>
> >> >>> Sent: Monday, May 16, 2022 12:36 PM
> >> >>> To: Tamar Christina <Tamar.Christina@arm.com>
> >> >>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>;
> rguenther@suse.de;
> >> >>> jeffreyalaw@gmail.com
> >> >>> Subject: Re: [PATCH 1/3]middle-end: Add the ability to let the
> >> >>> target decide the method of argument promotions.
> >> >>>
> >> >>> Tamar Christina <tamar.christina@arm.com> writes:
> >> >>> > Hi All,
> >> >>> >
> >> >>> > Some targets require function parameters to be promoted to a
> >> >>> > different type on expand time because the target may not have
> >> >>> > native instructions to work on such types.  As an example the
> >> >>> > AArch64 port does not have native instructions working on
> >> >>> > integer
> >> >>> > 8- or 16-bit values.  As such it promotes every parameter of
> >> >>> > these
> >> types to 32-bits.
> >> >>>
> >> >>> This doesn't seem specific to parameters though.  It applies to
> >> >>> any
> >> >>> 8- or 16-bit variable.  E.g.:
> >> >>>
> >> >>> #include <stdint.h>
> >> >>> uint8_t foo(uint32_t x, uint32_t y) {
> >> >>>     uint8_t z = x != 0 ? x : y;
> >> >>>     return z + 1;
> >> >>> }
> >> >>>
> >> >>> generates:
> >> >>>
> >> >>> foo:
> >> >>>         cmp     w0, 0
> >> >>>         and     w1, w1, 255
> >> >>>         and     w0, w0, 255
> >> >>>         csel    w0, w1, w0, eq
> >> >>>         add     w0, w0, 1
> >> >>>         ret
> >> >>>
> >> >>> So I think the new behaviour is really a modification of the
> >> >>> PROMOTE_MODE behaviour rather than the
> >> PROMOTE_FUNCTION_MODE behaviour.
> >> >>>
> >> >>> FWIW, I agree with Richard that it would be better not to add a
> >> >>> new
> >> hook.
> >> >>> I think we're really making PROMOTE_MODE choose between
> >> SIGN_EXTEND,
> >> >>> ZERO_EXTEND or SUBREG (what LLVM would call “any
> >> >>> extend”) rather than the current SIGN_EXTEND vs. ZERO_EXTEND
> >> choice.
> >> >>
> >> >> Ah, I hadn't realized this also applied to locals.. ok I can
> >> >> modify PROMOTE_MODE then, but I also need the actual SSA_NAME
> and
> >> >> not just
> >> the type so will have to pass this along.
> >> >>
> >> >> From a practical point of view.. the actual hook however is
> >> >> implemented by 34 targets, would I need to CC maintainers for each
> >> >> of them or would global maintainer approval suffice for these
> >> >> mostly
> >> mechanical changes?
> >> >
> >> > Yeah, single approval should be enough for mechanical changes.  It
> >> > would be good to do the interface change and mechanical target
> >> > changes as a separate prepatch if possible though.
> >> >
> >> > I'm not sure about passing the SSA name to the target though, or
> >> > the way that the aarch64 hook uses the info.  It looks like a
> >> > single cold comparison could defeat the optimisation for hot code.
> >
> > I'm not sure I follow why the likelihood of the comparison matters in this
> > case; I'll expand on it below.
> 
> I meant the likelihood that the comparison is executed at all, not which
> outcome is more likely.  E.g. suppose the only comparison occurs on a failure
> path that eventually calls abort, and that there are other paths (without
> comparisons of the same value) that would benefit from the any-extend
> optimisation.  We'd prioritise the cold comparison over optimising the other
> (hot) code.
> 
> I'm just suspicious of heuristics along the lines of “don't do X if there is a
> single instance of Y”. :-)

I'm probably being very dense here, sorry, but if there's

1 use: the zero extend gets pushed down into the branch which needs it.

i.e. in:

extern void foo ();
extern void bar ();

uint8_t f (uint8_t a, uint8_t b)
{
  if (b) {
    if (a)
      foo ();
    else
      return f (a, b);
  } else {
      bar ();
  }
  return b;
}

The zero extend of a is only done in the true branch of if (b).  Secondly, the
zero-extended form is the basis for all other patterns we form, such as ands,
which is the combination of the zero extend and the compare.

2 uses, both live:

extern void foo ();
extern void bar (uint8_t);

uint8_t f (uint8_t a, uint8_t b)
{
  if (b) {
    if (a)
      foo ();
    else
      return f (a, b);
  } else {
      bar (a);
  }
  return b;
}

In which case the extend of a is done before the if (b) and only the extended value
is used.

Even if you had multiple cold/unused branches, I struggle to see any case where the
any-extend would be better.  Reload must keep the value live as it's a param.  Either you:

1. Have enough registers to keep the value live, in which case, instead of doing a "mov"
   to copy the value and then later an AND or TST, it's better to just do an AND instead
   of the mov.  You keep the same number of registers live, but in the best case you have
   one instruction less and in the worst case zero instructions more.
2. Don't have enough registers to keep the value live, in which case the zero-extended
   value is still better, because on the reload it can simply use ldrb ..., cbz, as the
   load gives an implicit zero extend (sketched after this list).  Which is still better
   than ldrb ..., tst, cbnz for an any-extend.
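
Sketching case 2, assuming the value was spilled to a stack slot (the
offset is made up):

	// zero-extend kept: the byte load already zero-extends
	ldrb	w22, [sp, 40]
	cbz	w22, .Lfoo

	// any-extend: the top bits are undefined as far as the compiler
	// knows, so it must re-test the low bits before branching
	ldrb	w22, [sp, 40]
	tst	w22, 0xff
	b.eq	.Lfoo

So the any-extended form costs an extra instruction per branch even after
a reload.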

So I am genuinely struggling to see a case where any-extend is better for comparisons.
And the only reason I am singling out comparisons is that in GIMPLE integer constants
don't get an explicit promotion to int.  Otherwise I wouldn't have needed to, as it would
have always required an extend here.

> 
> >> > If we do try to make the decision based on uses at expand time, it
> >> > might be better for the analysis to be in target-independent code,
> >> > with help from the target to decide where extensions are cheap.  It
> >> > still feels a bit hacky though.
> >
> > I thought about it but can't see most targets having this problem.  I
> > did go with an optimistic heuristic.  There are of course various ways
> > to defeat it but looking through the corpus of code I don't see any
> > but the simple cases in practice. (more below).
> >
> >> >
> >> > What stops us from forming cbz/cbnz when the extension is done
> >> > close to the comparison (from the comment in 2/3)?  If we can solve
> >> > that, could we simply do an any-extend all the time, and treat
> >> > removing redundant extensions as a global availability problem?
> >>
> >
> > In such cases there's no gain from doing the extension at all, e.g.
> > and w0, w0, 255
> > cmp w0, 0
> > b.eq .Lfoo
> >
> > will be optimized to
> >
> > tst w0, 0xff
> > b.ne .Lfoo
> >
> > already.
> >
> > In RTL the problem occurs when you have nested control flow like
> > nested if and switch statements The example in 2/3 was intended to
> > show that before what we'd do is
> >
> > and w22, w0, 255
> > .... <code that clobbers cc and caller saves> <switch1> cbz w22,
> > .Lfoo1 ....
> > <switch2>
> > cbz w22, .Lfoo2
> >
> > If we have a single comparison we already sink the zero_extend today.
> >
> > Now if we instead any-extend w0 we end up with:
> >
> > mov w22, w0
> > .... <code that clobbers cc and caller saves> <switch1> tst w22, 0xff
> > b.ne .Lfoo1 ....
> > <switch2>
> > tst w22, 0xff
> > b.ne .Lfoo2
> >
> > So you get an additional tst before each branch. You also can't perform the
> tst higher since CC is clobbered.
> > The cbz is nice because the zero extend doesn't use CC, of course, and
> > so having the value live allows us to optimize the branch.
> 
> Once the cbz has been formed (in combine), where does the optimisation of
> it happen?

There's no real "optimization".  Combine combines the cmp 0 and br, leaving the AND
behind.  Because of the live range required for the value, reload must copy it away from
a caller-saved register.  It chooses to move it to w22 in this case.

and w0, w0, 255
mov w22, w0

This simply gets simplified into and w22, w0, 255 by a zero-extending move pattern.
The only "optimization" here is that when the pattern isn't single use, it's simply not moved/folded.

The only options available to combine are

cmp, br = tst + br (in the case of a subreg where it can't tell what the top bits are)
and, cmp, br = ands + br (if value is single use)
cmp, br = cbz (in the case it knows that the top bits are 0).

If we emit a zero extend, both operations above are possible, and we emit them depending on
the value being single use or not.  If we emit a paradoxical subreg, we never form cbz unless
the value comes from an operation where GIMPLE has maintained C semantics.

But I am probably missing something, so I'll just make the changes and see where we land 😊

> 
> > I don't think branch likeliness matters here, as you must keep w22 live
> > in both cases somehow.  In the SPEC CPU 2017 benchmark perlbench (which
> > uses a lot of nested switches) this pattern is responsible for an extra 0.3%
> > code-size increase, which the approach in 2/3 prevents.
> >
> >> (which would run after combine)
> >>
> >> >
> >> > What kind of code do we emit when we do an extension just before an
> >> > operation?  If the DECL_RTL is (subreg:QI (reg:SI R) 0), say, then
> >> > it should be safe to do the extension directly into R:
> >> >
> >> >   (set (reg:SI X) (zero_extend:SI (subreg:QI (reg:SI X))))
> >>
> >> Oops, that should of course be:
> >>
> >>   (set (reg:SI R) (zero_extend:SI (subreg:QI (reg:SI R))))
> >>
> >> > which avoids the problem of having two values live at once (the
> >> > zero-extended value and the any-extended value)

I'm assuming R here is the hard reg which has the parameter?  In which case
wouldn't the subreg be folded away?  I.e. you end up with

(set (reg:SI R) (zero_extend:SI (reg:QI R)))

?  But that SET isn't paradoxical, so we wouldn't generate it.

For example, for:

#include <stdint.h>

uint16_t f8 (uint8_t xr, uint8_t xc){
    return (uint8_t)(xr * xc);
}

(insn 9 6 10 2 (set (reg:HI 101)
        (zero_extend:HI (reg/v:QI 96 [ xr ]))) "prom.c":4:16 -1
     (nil))
(insn 10 9 11 2 (set (reg:HI 102)
        (zero_extend:HI (reg/v:QI 98 [ xc ]))) "prom.c":4:16 -1
     (nil))
(insn 11 10 12 2 (set (reg:SI 103)
        (mult:SI (subreg:SI (reg:HI 101) 0)
            (subreg:SI (reg:HI 102) 0))) "prom.c":4:16 -1
     (nil))

we generate the RTL above out of expand.  The paradoxical subreg isn't generated
at all out of expand unless it's needed.  It does keep the original params around, unused:

(insn 2 7 4 2 (set (reg:QI 97)
        (reg:QI 0 x0 [ xr ])) "prom.c":3:37 -1
     (nil))
(insn 4 2 3 2 (set (reg:QI 99)
        (reg:QI 1 x1 [ xc ])) "prom.c":3:37 -1
     (nil))

And the paradoxical subreg is moved into the first operation requiring it:

(insn 11 10 12 2 (set (reg:SI 103)
        (mult:SI (subreg:SI (reg:HI 101) 0)
            (subreg:SI (reg:HI 102) 0))) "prom.c":4:16 -1
     (nil))

In any case, I'm still not totally sure what the objection here is.  Afaik
compares need to be treated specially because in GIMPLE they already
are.  C's integer promotion rules state that in the comparison the 0 should
have been promoted to an integer constant of rank int, and so the comparison
itself should have been done as an integer, i.e. extended.  And most of our
patterns are based around this.

Gimple however doesn't do this: the comparison is done in the rank of the
variable and there is no explicit conversion.  This happened to be fixed up
before by the forced promotion.  So to me the heuristic doesn't seem
that crazy.
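
To make the difference concrete (just an illustration, not code from the
patch):

#include <stdint.h>

int g (uint8_t a)
{
  /* C rules: a is promoted, so this is (int) a != 0 and needs an
     extend.  GIMPLE instead does _2 = a_1(D) != 0; directly on the
     unsigned char, with no conversion statement.  */
  return a != 0;
}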

But I'll respin the patch without the hook and see where things land.

Thanks,
Tamar

> >
> > I'm not sure it does, as the any-extended value must remain live, i.e.
> > above you can't get rid of w22, you can only choose between having it
> > be zero or any extended.  But I am not sure how you would do that after
> > expand.
> 
> These per-operation extends are emitted during expand.  The question is
> whether we do them into fresh registers:
> 
>    (set (reg:SI Rtmp) (zero_extend:SI (subreg:QI (reg:SI R))))
> 
> which leaves both R and Rtmp live at the same time, or whether we do them
> in-situ:
> 
>    (set (reg:SI R) (zero_extend:SI (subreg:QI (reg:SI R))))
> 
> Expand should know that the latter is valid, given the DECL_RTL.
> 
> Thanks,
> Richard

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions.
  2022-05-16 15:29             ` Tamar Christina
@ 2022-05-16 16:48               ` Richard Sandiford
  2022-05-17  7:55                 ` Tamar Christina
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Sandiford @ 2022-05-16 16:48 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, rguenther, jeffreyalaw

Tamar Christina <Tamar.Christina@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Monday, May 16, 2022 2:24 PM
>> To: Tamar Christina <Tamar.Christina@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; rguenther@suse.de;
>> jeffreyalaw@gmail.com
>> Subject: Re: [PATCH 1/3]middle-end: Add the ability to let the target decide
>> the method of argument promotions.
>> 
>> Tamar Christina <Tamar.Christina@arm.com> writes:
>> >> -----Original Message-----
>> >> From: Richard Sandiford <richard.sandiford@arm.com>
>> >> Sent: Monday, May 16, 2022 1:18 PM
>> >> To: Tamar Christina <Tamar.Christina@arm.com>
>> >> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; rguenther@suse.de;
>> >> jeffreyalaw@gmail.com
>> >> Subject: Re: [PATCH 1/3]middle-end: Add the ability to let the target
>> >> decide the method of argument promotions.
>> >>
>> >> Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> >> > Tamar Christina <Tamar.Christina@arm.com> writes:
>> >> >>> -----Original Message-----
>> >> >>> From: Richard Sandiford <richard.sandiford@arm.com>
>> >> >>> Sent: Monday, May 16, 2022 12:36 PM
>> >> >>> To: Tamar Christina <Tamar.Christina@arm.com>
>> >> >>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>;
>> rguenther@suse.de;
>> >> >>> jeffreyalaw@gmail.com
>> >> >>> Subject: Re: [PATCH 1/3]middle-end: Add the ability to let the
>> >> >>> target decide the method of argument promotions.
>> >> >>>
>> >> >>> Tamar Christina <tamar.christina@arm.com> writes:
>> >> >>> > Hi All,
>> >> >>> >
>> >> >>> > Some targets require function parameters to be promoted to a
>> >> >>> > different type on expand time because the target may not have
>> >> >>> > native instructions to work on such types.  As an example the
>> >> >>> > AArch64 port does not have native instructions working on
>> >> >>> > integer
>> >> >>> > 8- or 16-bit values.  As such it promotes every parameter of
>> >> >>> > these
>> >> types to 32-bits.
>> >> >>>
>> >> >>> This doesn't seem specific to parameters though.  It applies to
>> >> >>> any
>> >> >>> 8- or 16-bit variable.  E.g.:
>> >> >>>
>> >> >>> #include <stdint.h>
>> >> >>> uint8_t foo(uint32_t x, uint32_t y) {
>> >> >>>     uint8_t z = x != 0 ? x : y;
>> >> >>>     return z + 1;
>> >> >>> }
>> >> >>>
>> >> >>> generates:
>> >> >>>
>> >> >>> foo:
>> >> >>>         cmp     w0, 0
>> >> >>>         and     w1, w1, 255
>> >> >>>         and     w0, w0, 255
>> >> >>>         csel    w0, w1, w0, eq
>> >> >>>         add     w0, w0, 1
>> >> >>>         ret
>> >> >>>
>> >> >>> So I think the new behaviour is really a modification of the
>> >> >>> PROMOTE_MODE behaviour rather than the
>> >> PROMOTE_FUNCTION_MODE behaviour.
>> >> >>>
>> >> >>> FWIW, I agree with Richard that it would be better not to add a
>> >> >>> new
>> >> hook.
>> >> >>> I think we're really making PROMOTE_MODE choose between
>> >> SIGN_EXTEND,
>> >> >>> ZERO_EXTEND or SUBREG (what LLVM would call “any
>> >> >>> extend”) rather than the current SIGN_EXTEND vs. ZERO_EXTEND
>> >> choice.
>> >> >>
>> >> >> Ah, I hadn't realized this also applied to locals.. ok I can
>> >> >> modify PROMOTE_MODE then, but I also need the actual SSA_NAME
>> and
>> >> >> not just
>> >> the type so will have to pass this along.
>> >> >>
>> >> >> From a practical point of view.. the actual hook however is
>> >> >> implemented by 34 targets, would I need to CC maintainers for each
>> >> >> of them or would global maintainer approval suffice for these
>> >> >> mostly
>> >> mechanical changes?
>> >> >
>> >> > Yeah, single approval should be enough mechanical changes.  It
>> >> > would be good to do the interface change and mechanical target
>> >> > changes as a separate prepatch if possible though.
>> >> >
>> >> > I'm not sure about passing the SSA name to the target though, or
>> >> > the way that the aarch64 hook uses the info.  It looks like a
>> >> > single cold comparison could defeat the optimisation for hot code.
>> >
>> > I'm not sure I follow why the likelihood of the comparison matters in this
>> case..
>> > I'll expand on it below..
>> 
>> I meant the likelihood that the comparison is executed at all, not which
>> outcome is more likely.  E.g. suppose the only comparison occurs on a failure
>> path that eventually calls abort, and that there are other paths (without
>> comparisons of the same value) that would benefit from the any-extend
>> optimisation.  We'd prioritise the cold comparison over optimising the other
>> (hot) code.
>> 
>> I'm just suspicious of heuristics along the lines of “don't do X if there is a
>> single instance of Y”. :-)
>
> I'm probably very dense here sorry.. but if there's
>
> 1 use: the zero extend gets pushed down into the branch which needs it.
>
> i.e. in:
>
> extern void foo ();
> extern void bar ();
>
> uint8_t f (uint8_t a, uint8_t b)
> {
>   if (b) {
>     if (a)
>       foo ();
>     else
>       return f (a, b);
>   } else {
>       bar ();
>   }
>   return b;
> }
>
> The zero extend of a is only done in the true branch for if (b).  Secondly the zero
> extended form is the basis for all other patterns we form, such as ands, which is
> the combination of the zero extend and compare.
>
> 2 uses, both live:
>
> extern void foo ();
> extern void bar (uint8_t);
>
> uint8_t f (uint8_t a, uint8_t b)
> {
>   if (b) {
>     if (a)
>       foo ();
>     else
>       return f (a, b);
>   } else {
>       bar (a);
>   }
>   return b;
> }
>
> In which case the extend of a is done before the if (b) and only the extended values
> used.
>
> Even if you had multiple cold/unused branches, I struggle to see any case where the
> any-extend would be better.  Reload must keep the value live as it's a param. Either you:
>
> 1. You have enough registers to keep the value live, in which case, instead of
>    doing a "mov" to copy the value and then later an AND or TST, it's better to
>    just do an AND instead of the mov.  You keep the same number of registers
>    live, but in the best case you have one instruction less and in the worst
>    case zero instructions more.
> 2. You don't have enough registers to keep the value live, in which case the
>    zero-extended value is still better, because on the reload it can simply use
>    ldrb ..., cbz, as we use the load for an implicit zero extend.  That is still
>    better than ldrb ..., tst, cbnz for an any-extend.
>
> So I am genuinely struggling to see a case where any-extend is better for comparison. And the only
> reason I am singling out comparisons is because in GIMPLE integer constants don't get an explicit
> promotion to int.  Otherwise I wouldn't have needed to as it would have always required an extend
> here.

IIUC, you're talking about cases involving multiple comparisons.  I was
instead talking about the case where there is 1 cold comparison that doesn't
benefit from any-extend and multiple hot operations (not comparisons)
that do benefit.  The patch then seemed to avoid any-extend because of
the cold comparison.

E.g. does the patch avoid the AND in:

#include <stdint.h>
#include <stdio.h>
uint8_t foo(uint8_t x, int y) {
    if (y) {
        printf("Foo %d\n", x ? 1 : 2);
        __builtin_abort ();
    }
    return x + 1;
}

?

>> >> > If we do try to make the decision based on uses at expand time, it
>> >> > might be better for the analysis to be in target-independent code,
>> >> > with help from the target to decide where extensions are cheap.  It
>> >> > still feels a bit hacky though.
>> >
>> > I thought about it but can't see most target having this problem. I
>> > did go with an optimistic heuristics. There are of course various ways
>> > to defeat it but looking through the corpus of code I don't see any
>> > but the simple cases in practice. (more below).
>> >
>> >> >
>> >> > What stops us from forming cbz/cbnz when the extension is done
>> >> > close to the comparison (from the comment in 2/3)?  If we can solve
>> >> > that, could we simply do an any-extend all the time, and treat
>> >> > removing redundant extensions as a global availability problem?
>> >>
>> >
>> > In such cases there's no gain from doing the extension at all, e.g.
>> > and w0, w0, 255
>> > cmp w0, 0
>> > b.eq .Lfoo
>> >
>> > will be optimized to
>> >
>> > tst w0, 0xff
>> > b.ne .Lfoo
>> >
>> > already.
>> >
>> > In RTL the problem occurs when you have nested control flow like
>> > nested if and switch statements.  The example in 2/3 was intended to
>> > show that before, what we'd do is
>> >
>> > and w22, w0, 255
>> > .... <code that clobbers cc and caller saves>
>> > <switch1>
>> > cbz w22, .Lfoo1
>> > ....
>> > <switch2>
>> > cbz w22, .Lfoo2
>> >
>> > If we have a single comparison we already sink the zero_extend today.
>> >
>> > Now if we instead any-extend w0 we end up with:
>> >
>> > mov w22, w0
>> > .... <code that clobbers cc and caller saves>
>> > <switch1>
>> > tst w22, 0xff
>> > b.ne .Lfoo1
>> > ....
>> > <switch2>
>> > tst w22, 0xff
>> > b.ne .Lfoo2
>> >
>> > So you get an additional tst before each branch. You also can't perform the
>> tst higher since CC is clobbered.
>> > The cbz is nice because the zero extend doesn't use CC of course and
> so having the value live allows us to optimize the branch.
>> 
>> Once the cbz has been formed (in combine), where does the optimisation of
>> it happen?
>
> There's no real "optimization". Combine combines the cmp 0 and br leaving the AND
> behind.  Because of the live range required for the value reload must copy it away from
> a caller save.  It chooses to move it to w22 in this case.
>
> and w0, w0, 255
> mov w22, w0
>
> this simply gets simplified into and w22, w0, 255 by a zero extending move pattern.
> The only optimization here is that when the pattern isn't single use, it's simply not moved/folded.
>
> The only options available to combine are
>
> cmp, br = tst + br (in the case of a subreg where it can't tell what the top bits are)
> and, cmp, br = ands + br (if value is single use)
> cmp, br = cbz (in the case it knows that the top bits are 0).
>
> If we emit a zero extend both operations above are possible, and we emit them depending on
> value being single use or not.  If we emit a paradoxical subreg, we never form cbz unless the value
> comes from an operation where GIMPLE has maintained C semantics.
>
> But I am probably missing something.. so I'll just make the changes and see where we land 😊

No, I agree/was agreeing with the description of the combine behaviour.
I guess I just misunderstood what you meant by “the cbz is nice because
the zero extend doesn't use CC of course and so having the value live
allows us to optimize the branch”.

>> > I don't think branch likeliness matters here as you must keep w22 live
>> > in both cases somehow. In the SPECCPU 2017 Benchmark perlbench (which
>> > uses a lot of nested switches) this pattern is responsible for an extra 0.3%
>> codesize increase which the approach in 2/3 prevents.
>> >
>> >> (which would run after combine)
>> >>
>> >> >
>> >> > What kind of code do we emit when do an extension just before an
>> >> > operation?  If the DECL_RTL is (subreg:QI (reg:SI R) 0), say, then
>> >> > it should be safe to do the extension directly into R:
>> >> >
>> >> >   (set (reg:SI X) (zero_extend:SI (subreg:QI (reg:SI X))))
>> >>
>> >> Oops, that should of course be:
>> >>
>> >>   (set (reg:SI R) (zero_extend:SI (subreg:QI (reg:SI R))))
>> >>
>> >> > which avoids the problem of having two values live at once (the
>> >> > zero-extended value and the any-extended value)
>
> I'm assuming R here is the hardreg which has the parameter? In which case
> wouldn't the subreg be folded away? I.e you end up with
>
> (set (reg:SI R) (zero_extend:SI (reg:QI R)))

No, R is the pseudo that holds the DECL_RTL (for both VAR_DECLs and
PARM_DECLs).

> ? But that SET isn’t paradoxical, we wouldn't generate it.
>
> We generate for e.g.:
>
> #include <stdint.h>
>
> uint16_t f8 (uint8_t xr, uint8_t xc){
>     return (uint8_t)(xr * xc);
> }
>
> (insn 9 6 10 2 (set (reg:HI 101)
>         (zero_extend:HI (reg/v:QI 96 [ xr ]))) "prom.c":4:16 -1
>      (nil))
> (insn 10 9 11 2 (set (reg:HI 102)
>         (zero_extend:HI (reg/v:QI 98 [ xc ]))) "prom.c":4:16 -1
>      (nil))
> (insn 11 10 12 2 (set (reg:SI 103)
>         (mult:SI (subreg:SI (reg:HI 101) 0)
>             (subreg:SI (reg:HI 102) 0))) "prom.c":4:16 -1
>      (nil))
>
> Out of expand. The paradoxical subreg isn't generated at all out of expand
> unless it's needed. It does keep the original params around as unused:
>
> (insn 2 7 4 2 (set (reg:QI 97)
>         (reg:QI 0 x0 [ xr ])) "prom.c":3:37 -1
>      (nil))
> (insn 4 2 3 2 (set (reg:QI 99)
>         (reg:QI 1 x1 [ xc ])) "prom.c":3:37 -1
>      (nil))
>
> And the paradoxical subreg is moved into the first operation requiring it:
>
> (insn 11 10 12 2 (set (reg:SI 103)
>         (mult:SI (subreg:SI (reg:HI 101) 0)
>             (subreg:SI (reg:HI 102) 0))) "prom.c":4:16 -1
>      (nil))

Ah, OK, this isn't what I'd imagined.  I thought the xr and xc registers
would be SIs and the DECL_RTLs would be QI subregs of those SI regs.
I think that might work better, for the reasons above.  (That is,
whenever we need the register in extended form, we can simply extend
the existing reg rather than create a new one.)

I think that's where confusion was coming from.

> In any case, I'm still not totally sure what the objection here is.  Afaik,
> compares need to be treated specially because in GIMPLE they already
> are.  Afaik, C integer promotion rules state that in the comparison 0 should
> have been promoted to an integer constant of rank int and so the comparison itself
> should have been done as integer. i.e. extended.  And most of our patterns
> are based around this.
>
> Gimple however doesn't do this, the comparison is done in the rank of the
> variable and there is no explicit conversion.  This happened to be fixed up
> before during the forced promotion.  So to me the heuristic doesn't seem
> to be that crazy..

I guess I still don't see the distinction.  C says the same thing about
+, -, >>, etc.  And gimple is free to do those operations in narrow types
if it wants to, and if that doesn't change the semantics.  (Not that gimple
always does them in narrow types.  But it is valid gimple.)

The optimisation problem doesn't come from C or gimple semantics,
but from the fact that AArch64 (unlike x86 say) doesn't have byte add,
byte compare, byte right shift, etc.  We therefore need to promote
8-bit and 16-bit operations to 32 bits first.

For add, subtract, multiply, left shift, and logic ops, we can avoid
defining the upper bits of the inputs when we do these extensions,
because the upper bits of the inputs don't affect the useful bits of
the result.  But for comparisons, right shifts, and divides, etc.,
we do need to extend.
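
To make that concrete (examples of my own, not from the patch):

#include <stdint.h>

/* A 32-bit add is fine even if the upper 24 bits of x are undefined:
   they cannot affect the low 8 bits of the result.  */
uint8_t add1 (uint8_t x) { return x + 1; }

/* Bit 7 of this result comes from bit 8 of the 32-bit register, so
   the upper bits must be made well-defined (extended) first.  */
uint8_t shr1 (uint8_t x) { return x >> 1; }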

AIUI, the comparison case is special because (for AArch64-specific
reasons), we prefer extend + cbz to tst + branch, especially when the
extend can be shared.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions.
  2022-05-16 16:48               ` Richard Sandiford
@ 2022-05-17  7:55                 ` Tamar Christina
  2022-05-17  9:03                   ` Richard Sandiford
  0 siblings, 1 reply; 19+ messages in thread
From: Tamar Christina @ 2022-05-17  7:55 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, nd, rguenther, jeffreyalaw

> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Monday, May 16, 2022 5:48 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; rguenther@suse.de;
> jeffreyalaw@gmail.com
> Subject: Re: [PATCH 1/3]middle-end: Add the ability to let the target decide
> the method of argument promotions.
> 
> Tamar Christina <Tamar.Christina@arm.com> writes:
> >> -----Original Message-----
> >> From: Richard Sandiford <richard.sandiford@arm.com>
> >> Sent: Monday, May 16, 2022 2:24 PM
> >> To: Tamar Christina <Tamar.Christina@arm.com>
> >> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; rguenther@suse.de;
> >> jeffreyalaw@gmail.com
> >> Subject: Re: [PATCH 1/3]middle-end: Add the ability to let the target
> >> decide the method of argument promotions.
> >>
> >> Tamar Christina <Tamar.Christina@arm.com> writes:
> >> >> -----Original Message-----
> >> >> From: Richard Sandiford <richard.sandiford@arm.com>
> >> >> Sent: Monday, May 16, 2022 1:18 PM
> >> >> To: Tamar Christina <Tamar.Christina@arm.com>
> >> >> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; rguenther@suse.de;
> >> >> jeffreyalaw@gmail.com
> >> >> Subject: Re: [PATCH 1/3]middle-end: Add the ability to let the
> >> >> target decide the method of argument promotions.
> >> >>
> >> >> Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> >> >> > Tamar Christina <Tamar.Christina@arm.com> writes:
> >> >> >>> -----Original Message-----
> >> >> >>> From: Richard Sandiford <richard.sandiford@arm.com>
> >> >> >>> Sent: Monday, May 16, 2022 12:36 PM
> >> >> >>> To: Tamar Christina <Tamar.Christina@arm.com>
> >> >> >>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>;
> >> rguenther@suse.de;
> >> >> >>> jeffreyalaw@gmail.com
> >> >> >>> Subject: Re: [PATCH 1/3]middle-end: Add the ability to let the
> >> >> >>> target decide the method of argument promotions.
> >> >> >>>
> >> >> >>> Tamar Christina <tamar.christina@arm.com> writes:
> >> >> >>> > Hi All,
> >> >> >>> >
> >> >> >>> > Some targets require function parameters to be promoted to a
> >> >> >>> > different type on expand time because the target may not
> >> >> >>> > have native instructions to work on such types.  As an
> >> >> >>> > example the
> >> >> >>> > AArch64 port does not have native instructions working on
> >> >> >>> > integer
> >> >> >>> > 8- or 16-bit values.  As such it promotes every parameter of
> >> >> >>> > these
> >> >> types to 32-bits.
> >> >> >>>
> >> >> >>> This doesn't seem specific to parameters though.  It applies
> >> >> >>> to any
> >> >> >>> 8- or 16-bit variable.  E.g.:
> >> >> >>>
> >> >> >>> #include <stdint.h>
> >> >> >>> uint8_t foo(uint32_t x, uint32_t y) {
> >> >> >>>     uint8_t z = x != 0 ? x : y;
> >> >> >>>     return z + 1;
> >> >> >>> }
> >> >> >>>
> >> >> >>> generates:
> >> >> >>>
> >> >> >>> foo:
> >> >> >>>         cmp     w0, 0
> >> >> >>>         and     w1, w1, 255
> >> >> >>>         and     w0, w0, 255
> >> >> >>>         csel    w0, w1, w0, eq
> >> >> >>>         add     w0, w0, 1
> >> >> >>>         ret
> >> >> >>>
> >> >> >>> So I think the new behaviour is really a modification of the
> >> >> >>> PROMOTE_MODE behaviour rather than the
> >> >> PROMOTE_FUNCTION_MODE behaviour.
> >> >> >>>
> >> >> >>> FWIW, I agree with Richard that it would be better not to add
> >> >> >>> a new
> >> >> hook.
> >> >> >>> I think we're really making PROMOTE_MODE choose between
> >> >> SIGN_EXTEND,
> >> >> >>> ZERO_EXTEND or SUBREG (what LLVM would call “any
> >> >> >>> extend”) rather than the current SIGN_EXTEND vs. ZERO_EXTEND
> >> >> choice.
> >> >> >>
> >> >> >> Ah, I hadn't realized this also applied to locals.. ok I can
> >> >> >> modify PROMOTE_MODE then, but I also need the actual
> SSA_NAME
> >> and
> >> >> >> not just
> >> >> the type so will have to pass this along.
> >> >> >>
> >> >> >> From a practical point of view.. the actual hook however is
> >> >> >> implemented by 34 targets, would I need to CC maintainers for
> >> >> >> each of them or would global maintainer approval suffice for
> >> >> >> these mostly
> >> >> mechanical changes?
> >> >> >
> >> >> > Yeah, single approval should be enough mechanical changes.  It
> >> >> > would be good to do the interface change and mechanical target
> >> >> > changes as a separate prepatch if possible though.
> >> >> >
> >> >> > I'm not sure about passing the SSA name to the target though, or
> >> >> > the way that the aarch64 hook uses the info.  It looks like a
> >> >> > single cold comparison could defeat the optimisation for hot code.
> >> >
> >> > I'm not sure I follow why the likelihood of the comparison matters
> >> > in this
> >> case..
> >> > I'll expand on it below..
> >>
> >> I meant the likelihood that the comparison is executed at all, not
> >> which outcome is more likely.  E.g. suppose the only comparison
> >> occurs on a failure path that eventually calls abort, and that there
> >> are other paths (without comparisons of the same value) that would
> >> benefit from the any-extend optimisation.  We'd prioritise the cold
> >> comparison over optimising the other
> >> (hot) code.
> >>
> >> I'm just suspicious of heuristics along the lines of “don't do X if
> >> there is a single instance of Y”. :-)
> >
> > I'm probably very dense here sorry.. but if there's
> >
> > 1 use: the zero extend gets pushed down into the branch which needs it.
> >
> > i.e. in:
> >
> > extern void foo ();
> > extern void bar ();
> >
> > uint8_t f (uint8_t a, uint8_t b)
> > {
> >   if (b) {
> >     if (a)
> >       foo ();
> >     else
> >       return f (a, b);
> >   } else {
> >       bar ();
> >   }
> >   return b;
> > }
> >
> > The zero extend of a is only done in the true branch for if (b).
> > Secondly the zero extended form is the basis for all other patterns we
> > form, such as ands, which is the combination of the zero extend and
> compare.
> >
> > 2 uses, both live:
> >
> > extern void foo ();
> > extern void bar (uint8_t);
> >
> > uint8_t f (uint8_t a, uint8_t b)
> > {
> >   if (b) {
> >     if (a)
> >       foo ();
> >     else
> >       return f (a, b);
> >   } else {
> >       bar (a);
> >   }
> >   return b;
> > }
> >
> > In which case the extend of a is done before the if (b) and only the
> > extended values used.
> >
> > Even if you had multiple cold/unused branches, I struggle to see any
> > case where the any-extend would be better.  Reload must keep the value
> > live as it's a param.  Either you:
> >
> > 1. You have enough registers to keep the value live, in which case, instead
> >    of doing a "mov" to copy the value and then later an AND or TST, it's
> >    better to just do an AND instead of the mov.  You keep the same number
> >    of registers live, but in the best case you have one instruction less
> >    and in the worst case zero instructions more.
> > 2. You don't have enough registers to keep the value live, in which case
> >    the zero-extended value is still better, because on the reload it can
> >    simply use ldrb ..., cbz, as we use the load for an implicit zero
> >    extend.  That is still better than ldrb ..., tst, cbnz for an
> >    any-extend.
> >
> > So I am genuinely struggling to see a case where any-extend is better
> > for comparison. And the only reason I am singling out comparisons is
> > because in GIMPLE integer constants don't get an explicit promotion to
> > int.  Otherwise I wouldn't have needed to as it would have always required
> an extend here.
> 
> IIUC, you're talking about cases involving multiple comparisons.  I was instead
> talking about the case where there is 1 cold comparison that doesn't benefit
> from any-extend and multiple hot operations (not comparisons) that do
> benefit.  The patch then seemed to avoid any-extend because of the cold
> comparison.
> 
> E.g. does the patch avoid the AND in:
> 
> #include <stdint.h>
> uint8_t foo(uint8_t x, int y) {
>     if (y) {
>         printf("Foo %d\n", x ? 1 : 2);
>         __builtin_abort ();
>     }
>     return x + 1;
> }
> 
> ?

Morning,

It does actually, it generates:

foo:
        cbnz    w1, .L9
        add     w0, w0, 1
        ret
.L9:
        tst     w0, 255
        stp     x29, x30, [sp, -16]!
        cset    w1, ne
        add     w1, w1, 1
        mov     x29, sp
        adrp    x0, .LC0
        add     x0, x0, :lo12:.LC0
        bl      printf
        bl      abort
        .size   foo, .-foo

Now I will admit that this isn't because of a grand master design, but
purely because the patch works around the cases seen in SPEC.  In those
cases the comparisons in question were floated out of the if statement.

The heuristic in patch 2/3 allows this because it only looks for compares in
gimple assigns whereas in this case the compare is in the Gimple cond
directly.
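
To illustrate (hand-written GIMPLE, not from a dump):

  _1 = x_2(D) != 0;     <-- compare inside a gimple assign:
                            the heuristic sees this

  if (x_2(D) != 0)      <-- compare directly inside a gimple cond:
    goto <bb 3>;            it doesn't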

> 
> >> >> > If we do try to make the decision based on uses at expand time,
> >> >> > it might be better for the analysis to be in target-independent
> >> >> > code, with help from the target to decide where extensions are
> >> >> > cheap.  It still feels a bit hacky though.
> >> >
> >> > I thought about it but can't see most target having this problem. I
> >> > did go with an optimistic heuristics. There are of course various
> >> > ways to defeat it but looking through the corpus of code I don't
> >> > see any but the simple cases in practice. (more below).
> >> >
> >> >> >
> >> >> > What stops us from forming cbz/cbnz when the extension is done
> >> >> > close to the comparison (from the comment in 2/3)?  If we can
> >> >> > solve that, could we simply do an any-extend all the time, and
> >> >> > treat removing redundant extensions as a global availability problem?
> >> >>
> >> >
> >> > In such cases there's no gain from doing the extension at all, e.g.
> >> > and w0, w0, 255
> >> > cmp w0, 0
> >> > b.eq .Lfoo
> >> >
> >> > will be optimized to
> >> >
> >> > tst w0, 0xff
> >> > b.ne .Lfoo
> >> >
> >> > already.
> >> >
> >> > In RTL the problem occurs when you have nested control flow like
> >> > nested if and switch statements.  The example in 2/3 was intended to
> >> > show that before, what we'd do is
> >> >
> >> > and w22, w0, 255
> >> > .... <code that clobbers cc and caller saves>
> >> > <switch1>
> >> > cbz w22, .Lfoo1
> >> > ....
> >> > <switch2>
> >> > cbz w22, .Lfoo2
> >> >
> >> > If we have a single comparison we already sink the zero_extend today.
> >> >
> >> > Now if we instead any-extend w0 we end up with:
> >> >
> >> > mov w22, w0
> >> > .... <code that clobbers cc and caller saves>
> >> > <switch1>
> >> > tst w22, 0xff
> >> > b.ne .Lfoo1
> >> > ....
> >> > <switch2>
> >> > tst w22, 0xff
> >> > b.ne .Lfoo2
> >> >
> >> > So you get an additional tst before each branch. You also can't
> >> > perform the tst higher since CC is clobbered.
> >> > The cbz is nice because the zero extend doesn't use CC of course
> >> > and so having the value live allows us to optimize the branch.
> >>
> >> Once the cbz has been formed (in combine), where does the
> >> optimisation of it happen?
> >
> > There's no real "optimization". Combine combines the cmp 0 and br
> > leaving the AND behind.  Because of the live range required for the
> > value reload must copy it away from a caller save.  It chooses to move it to
> w22 in this case.
> >
> > and w0, w0, 255
> > mov w22, w0
> >
> > this simply gets simplified into and w22, w0, 255 by a zero extending move
> pattern.
> > The only optimization here is that when the pattern isn't single use, it's
> > simply not moved/folded.
> >
> > The only options available to combine are
> >
> > cmp, br = tst + br (in the case of a subreg where it can't tell what
> > the top bits are) and, cmp, br = ands + br (if value is single use)
> > cmp, br = cbz (in the case it knows that the top bits are 0).
> >
> > If we emit a zero extend both operations above are possible, and we
> > emit them depending on value being single use or not.  If we emit a
> > paradoxical subreg, we never form cbz unless the value comes from an
> operation where GIMPLE has maintained C semantics.
> >
> > But I am probably missing something.. so I'll just make the changes
> > and see where we land 😊
> 
> No, I agree/was agreeing with the description of the combine behaviour.
> I guess I just misunderstood what you meant by “the cbz is nice because the
> zero extend doesn't use CC of course and so having the value live allows us
> to optimize the branch”.
> 
> >> > I don't think branch likeliness matters here as you must keep w22
> >> > live in both cases somehow. In the SPECCPU 2017 Benchmark perlbench
> >> > (which uses a lot of nested switches) this pattern is responsible
> >> > for an extra 0.3%
> >> codesize increase which the approach in 2/3 prevents.
> >> >
> >> >> (which would run after combine)
> >> >>
> >> >> >
> >> >> > What kind of code do we emit when do an extension just before an
> >> >> > operation?  If the DECL_RTL is (subreg:QI (reg:SI R) 0), say,
> >> >> > then it should be safe to do the extension directly into R:
> >> >> >
> >> >> >   (set (reg:SI X) (zero_extend:SI (subreg:QI (reg:SI X))))
> >> >>
> >> >> Oops, that should of course be:
> >> >>
> >> >>   (set (reg:SI R) (zero_extend:SI (subreg:QI (reg:SI R))))
> >> >>
> >> >> > which avoids the problem of having two values live at once (the
> >> >> > zero-extended value and the any-extended value)
> >
> > I'm assuming R here is the hardreg which has the parameter? In which
> > case wouldn't the subreg be folded away? I.e you end up with
> >
> > (set (reg:SI R) (zero_extend:SI (reg:QI R)))
> 
> No, R is the pseudo that holds the DECL_RTL (for both VAR_DECLs and
> PARM_DECLs).
> 
> > ? But that SET isn’t paradoxical, we wouldn't generate it.
> >
> > We generate for e.g.:
> >
> > #include <stdint.h>
> >
> > uint16_t f8 (uint8_t xr, uint8_t xc){
> >     return (uint8_t)(xr * xc);
> > }
> >
> > (insn 9 6 10 2 (set (reg:HI 101)
> >         (zero_extend:HI (reg/v:QI 96 [ xr ]))) "prom.c":4:16 -1
> >      (nil))
> > (insn 10 9 11 2 (set (reg:HI 102)
> >         (zero_extend:HI (reg/v:QI 98 [ xc ]))) "prom.c":4:16 -1
> >      (nil))
> > (insn 11 10 12 2 (set (reg:SI 103)
> >         (mult:SI (subreg:SI (reg:HI 101) 0)
> >             (subreg:SI (reg:HI 102) 0))) "prom.c":4:16 -1
> >      (nil))
> >
> > Out of expand. The paradoxical subreg isn't generated at all out of
> > expand unless it's needed. It does keep the original params around as
> unused:
> >
> > (insn 2 7 4 2 (set (reg:QI 97)
> >         (reg:QI 0 x0 [ xr ])) "prom.c":3:37 -1
> >      (nil))
> > (insn 4 2 3 2 (set (reg:QI 99)
> >         (reg:QI 1 x1 [ xc ])) "prom.c":3:37 -1
> >      (nil))
> >
> > And the paradoxical subreg is moved into the first operation requiring it:
> >
> > (insn 11 10 12 2 (set (reg:SI 103)
> >         (mult:SI (subreg:SI (reg:HI 101) 0)
> >             (subreg:SI (reg:HI 102) 0))) "prom.c":4:16 -1
> >      (nil))
> 
> Ah, OK, this isn't what I'd imagined.  I thought the xr and xc registers would
> be SIs and the DECL_RTLs would be QI subregs of those SI regs.
> I think that might work better, for the reasons above.  (That is, whenever we
> need the register in extended form, we can simply extend the existing reg
> rather than create a new one.)

Ah, I see, no, I explicitly avoid this. When doing the type promotions I tell it that
the size of the copies of xr and xc is still the original size, e.g. QI (i.e. I don't change 97 and 99).
This is different from what we do with extends, where 97 and 99 *would* be changed.

The reason is that if I make this SI, the compiler thinks it knows the value of all the bits
in the register, which led to various miscompares as it thinks it can use the SI value directly.

This happens because again the xr and xc are hard regs. So having 97 be

(set (reg:SI 97) (subreg:SI (reg:QI 0 x0 [ xr ]) 0))

gets folded to an incorrect

(set (reg:SI 97) (reg:SI 0 x0 [ xr ]))

And now 97 is free to be used without any zero extension, as 97 on its own is an invalid RTX.

So I have to keep the intermediate copy QI mode, after which the RTX optimizations
done during expand generate the forms above.

> 
> I think that's where confusion was coming from.
> 
> > In any case, I'm still not totally sure what the objection here is.
> > Afaik, compares need to be treated specially because in GIMPLE they
> > already are.  Afaik, C integer promotion rules state that in the
> > comparison 0 should have been promoted to an integer constant of rank
> > int and so the comparison itself should have been done as integer.
> > i.e. extended.  And most of our patterns are based around this.
> >
> > Gimple however doesn't do this, the comparison is done in the rank of
> > the variable and there is no explicit conversion.  This happened to be
> > fixed up before during the forced promotion.  So to me the heuristic
> > doesn't seem to be that crazy..
> 
> I guess I still don't see the distinction.  C says the same thing about
> +, -, >>, etc.  And gimple is free to do those operations in narrow
> types
> if it wants to, and if that doesn't change the semantics.  (Not that gimple
> always does them in narrow types.  But it is valid gimple.)
> 
> The optimisation problem doesn't come from C or gimple semantics, but
> from the fact that AArch64 (unlike x86 say) doesn't have byte add, byte
> compare, byte right shift, etc.  We therefore need to promote 8-bit and 16-
> bit operations to 32 bits first.
> 
> For add, subtract, multiply, left shift, and logic ops, we can avoid defining the
> upper bits of the inputs when we do these extensions, because the upper
> bits of the inputs don't affect the useful bits of the result.  But for
> comparisons, right shifts, and divides, etc., we do need to extend.
> 
> AIUI, the comparison case is special because (for AArch64-specific reasons),
> we prefer extend + cbz to tst + branch, especially when the extend can be
> shared.

Agreed, so I think we agree on this 😊 I guess the disagreement is where this
should be done. I'll admit that the testcase above works by coincidence. But if
we don't do it during expand time, the only place I can think of to introduce
the zero extends is to add various patterns to do an early split of any-extend
+ cmp.  But wouldn't that be more fragile? At least at expand time all comparisons
are tcc_comparisons. 

Kind regards,
Tamar
> 
> Thanks,
> Richard

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions.
  2022-05-17  7:55                 ` Tamar Christina
@ 2022-05-17  9:03                   ` Richard Sandiford
  2022-05-17 17:45                     ` Tamar Christina
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Sandiford @ 2022-05-17  9:03 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, rguenther, jeffreyalaw

Tamar Christina <Tamar.Christina@arm.com> writes:
[…]
>> E.g. does the patch avoid the AND in:
>> 
>> #include <stdint.h>
>> uint8_t foo(uint8_t x, int y) {
>>     if (y) {
>>         printf("Foo %d\n", x ? 1 : 2);
>>         __builtin_abort ();
>>     }
>>     return x + 1;
>> }
>> 
>> ?
>
> Morning,
>
> It does actually, it generates:
>
> foo:
>         cbnz    w1, .L9
>         add     w0, w0, 1
>         ret
> .L9:
>         tst     w0, 255
>         stp     x29, x30, [sp, -16]!
>         cset    w1, ne
>         add     w1, w1, 1
>         mov     x29, sp
>         adrp    x0, .LC0
>         add     x0, x0, :lo12:.LC0
>         bl      printf
>         bl      abort
>         .size   foo, .-foo

Ah, nice.

> Now I will admit that this isn't because of a grand master design, but
> purely because the patch works around the cases seen in SPEC.  In those
> cases the comparisons in question were floated out of the if statement.
>
> The heuristic in patch 2/3 allows this because it only looks for compares in
> gimple assigns whereas in this case the compare is in the Gimple cond
> directly.

OK.

[…]
>> > We generate for e.g.:
>> >
>> > #include <stdint.h>
>> >
>> > uint16_t f8 (uint8_t xr, uint8_t xc){
>> >     return (uint8_t)(xr * xc);
>> > }
>> >
>> > (insn 9 6 10 2 (set (reg:HI 101)
>> >         (zero_extend:HI (reg/v:QI 96 [ xr ]))) "prom.c":4:16 -1
>> >      (nil))
>> > (insn 10 9 11 2 (set (reg:HI 102)
>> >         (zero_extend:HI (reg/v:QI 98 [ xc ]))) "prom.c":4:16 -1
>> >      (nil))
>> > (insn 11 10 12 2 (set (reg:SI 103)
>> >         (mult:SI (subreg:SI (reg:HI 101) 0)
>> >             (subreg:SI (reg:HI 102) 0))) "prom.c":4:16 -1
>> >      (nil))
>> >
>> > Out of expand. The paradoxical subreg isn't generated at all out of
>> > expand unless it's needed. It does keep the original params around as
>> unused:
>> >
>> > (insn 2 7 4 2 (set (reg:QI 97)
>> >         (reg:QI 0 x0 [ xr ])) "prom.c":3:37 -1
>> >      (nil))
>> > (insn 4 2 3 2 (set (reg:QI 99)
>> >         (reg:QI 1 x1 [ xc ])) "prom.c":3:37 -1
>> >      (nil))
>> >
>> > And the paradoxical subreg is moved into the first operation requiring it:
>> >
>> > (insn 11 10 12 2 (set (reg:SI 103)
>> >         (mult:SI (subreg:SI (reg:HI 101) 0)
>> >             (subreg:SI (reg:HI 102) 0))) "prom.c":4:16 -1
>> >      (nil))
>> 
>> Ah, OK, this isn't what I'd imagined.  I thought the xr and xc registers would
>> be SIs and the DECL_RTLs would be QI subregs of those SI regs.
>> I think that might work better, for the reasons above.  (That is, whenever we
>> need the register in extended form, we can simply extend the existing reg
>> rather than create a new one.)
>
> Ah, I see, no, I explicitly avoid this. When doing the type promotions I tell it that
> size of the copies of xr and xc is still the original size, e.g. QI (i.e. I don't change 97 and 99).
> This is different from what we do with extends where 97 and 99 *would* be changed.
>
> The reason is that if I make this SI the compiler thinks it knows the value of all the bits
> in the register which led to various miscompares as it thinks it can use the SI value directly.
>
> This happens because again the xr and xc are hard regs. So having 97 be
>
> (set (reg:SI 97) (subreg:SI (reg:QI 0 x0 [ xr ]) 0))
>
> gets folded to an incorrect
>
> (set (reg:SI 97) (reg:SI 0 x0 [ xr ]))

This part I would expect (and hope for :-)).

> And now 97 is free to be used without any zero extension, as 97 on its own is an invalid RTX.

But the way I'd imagined it working, expand would need to insert an
extension before any operation that needs the upper 24 bits to be
defined (e.g. comparisons, right shifts).  If the DECL_RTL is
(subreg:QI (reg:SI x) 0) then the upper bits are not defined,
since SUBREG_PROMOTED_VAR_P would/should be false for the subreg.

E.g. for:

  int8_t foo(int8_t x) { return x >> 1; }

x would have a DECL_RTL of (subreg:QI (reg:SI x) 0), the parameter
assignment would be expanded as:

  (set (reg:SI x) (reg:SI x0))

the shift would be expanded as:

  (set (reg:SI x) (zero_extend:SI (subreg:QI (reg:SI x) 0)))
  (set (reg:SI x) (ashiftrt:SI (reg:SI x) (const_int 1)))

and the return assignment would be expanded as:

  (set (reg:SI x0) (reg:SI x))

x + 1 would instead be expanded to just:

  (set (reg:SI x) (plus:SI (reg:SI x) (const_int 1)))

(without an extension).

I realised later though that, although reusing the DECL_RTL reg for
the extension has the nice RA property of avoiding multiple live values,
it would make it harder to combine the extension into the operation
if the variable is still live afterwards.  So I guess we lose something
both ways.
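
To spell that out with a variation of the example above in which the
result goes to a new pseudo y:

  (set (reg:SI x) (zero_extend:SI (subreg:QI (reg:SI x) 0)))
  (set (reg:SI y) (ashiftrt:SI (reg:SI x) (const_int 1)))

Combine can merge the extension into the shift when x dies in the
shift, but it's harder when x stays live; a fresh temporary wouldn't
have that problem, at the cost of two simultaneously-live values.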

Maybe we need a different approach, not based on changing PROMOTE_MODE.

I wonder how easy it would be to do the promotion in gimple,
then reuse backprop to determine when a sign/zero-extension
(i.e. a normal gimple cast) can be converted into an “any extend”
(probably represented as a new ifn).
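
E.g. (with .ANY_EXTEND as a placeholder name for the new ifn),
backprop could turn:

  _2 = (unsigned int) x_1(D);

into:

  _2 = .ANY_EXTEND (x_1(D));

when it can prove that no use of _2 cares about the upper bits.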

> So I have to keep the intermediate copy QI mode, after which the RTX optimizations
> being done during expand generates the forms above.
>
>> 
>> I think that's where confusion was coming from.
>> 
>> > In any case, I'm still not totally sure what the objection here is.
>> > Afaik, compares need to be treated specially because in GIMPLE they
>> > already are.  Afaik, C integer promotion rules state that in the
>> > comparison 0 should have been promoted to an integer constant of rank
>> > int and so the comparison itself should have been done as integer.
>> > i.e. extended.  And most of our patterns are based around this.
>> >
>> > Gimple however doesn't do this, the comparison is done in the rank of
>> > the variable and there is no explicit conversion.  This happened to be
>> > fixed up before during the forced promotion.  So to me the heuristic
>> > doesn't seem to be that crazy..
>> 
>> I guess I still don't see the distinction.  C says the same thing about
>> +, -, >>, etc.  And gimple is free to do those operations in narrow
>> types
>> if it wants to, and if that doesn't change the semantics.  (Not that gimple
>> always does them in narrow types.  But it is valid gimple.)
>> 
>> The optimisation problem doesn't come from C or gimple semantics, but
>> from the fact that AArch64 (unlike x86 say) doesn't have byte add, byte
>> compare, byte right shift, etc.  We therefore need to promote 8-bit and 16-
>> bit operations to 32 bits first.
>> 
>> For add, subtract, multiply, left shift, and logic ops, we can avoid defining the
>> upper bits of the inputs when we do these extensions, because the upper
>> bits of the inputs don't affect the useful bits of the result.  But for
>> comparisons, right shifts, and divides, etc., we do need to extend.
>> 
>> AIUI, the comparison case is special because (for AArch64-specific reasons),
>> we prefer extend + cbz to tst + branch, especially when the extend can be
>> shared.
>
> Agreed, so I think we agree on this 😊 I guess the disagreement is where this
> should be done. I'll admit that the testcase above works by coincidence. But if
> we don't do it during expand time, the only place I can think of to introduce
> the zero extends is to add various patterns to do an early split of any-extend
> + cmp.  But wouldn't that be more fragile? At least at expand time all comparisons
> are tcc_comparisons. 

I guess one question is: is the patch without the comparison handling
just exacerbating an existing problem?  Do we already make similar bad
choices between extend+cbz and tst+branch in cases where the variables
aren't short, but where intermediate calculations involve &s?  If so,
it might be something worth tackling in its own right, regardless of
where the &s or extensions come from.

But yeah, I'm not sure how easily that would fit into existing passes.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions.
  2022-05-17  9:03                   ` Richard Sandiford
@ 2022-05-17 17:45                     ` Tamar Christina
  2022-05-18  7:49                       ` Richard Sandiford
  0 siblings, 1 reply; 19+ messages in thread
From: Tamar Christina @ 2022-05-17 17:45 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, nd, rguenther, jeffreyalaw

 […]
> >> > We generate for e.g.:
> >> >
> >> > #include <stdint.h>
> >> >
> >> > uint16_t f8 (uint8_t xr, uint8_t xc){
> >> >     return (uint8_t)(xr * xc);
> >> > }
> >> >
> >> > (insn 9 6 10 2 (set (reg:HI 101)
> >> >         (zero_extend:HI (reg/v:QI 96 [ xr ]))) "prom.c":4:16 -1
> >> >      (nil))
> >> > (insn 10 9 11 2 (set (reg:HI 102)
> >> >         (zero_extend:HI (reg/v:QI 98 [ xc ]))) "prom.c":4:16 -1
> >> >      (nil))
> >> > (insn 11 10 12 2 (set (reg:SI 103)
> >> >         (mult:SI (subreg:SI (reg:HI 101) 0)
> >> >             (subreg:SI (reg:HI 102) 0))) "prom.c":4:16 -1
> >> >      (nil))
> >> >
> >> > Out of expand. The paradoxical subreg isn't generated at all out of
> >> > expand unless it's needed. It does keep the original params around
> >> > as
> >> unused:
> >> >
> >> > (insn 2 7 4 2 (set (reg:QI 97)
> >> >         (reg:QI 0 x0 [ xr ])) "prom.c":3:37 -1
> >> >      (nil))
> >> > (insn 4 2 3 2 (set (reg:QI 99)
> >> >         (reg:QI 1 x1 [ xc ])) "prom.c":3:37 -1
> >> >      (nil))
> >> >
> >> > And the paradoxical subreg is moved into the first operation requiring it:
> >> >
> >> > (insn 11 10 12 2 (set (reg:SI 103)
> >> >         (mult:SI (subreg:SI (reg:HI 101) 0)
> >> >             (subreg:SI (reg:HI 102) 0))) "prom.c":4:16 -1
> >> >      (nil))
> >>
> >> Ah, OK, this isn't what I'd imagined.  I thought the xr and xc
> >> registers would be SIs and the DECL_RTLs would be QI subregs of those SI
> regs.
> >> I think that might work better, for the reasons above.  (That is,
> >> whenever we need the register in extended form, we can simply extend
> >> the existing reg rather than create a new one.)
> >
> > Ah, I see, no, I explicitly avoid this. When doing the type promotions
> > I tell it that size of the copies of xr and xc is still the original size, e.g. QI (i.e. I
> don't change 97 and 99).
> > This is different from what we do with extends where 97 and 99 *would*
> be changed.
> >
> > The reason is that if I make this SI the compiler thinks it knows the
> > value of all the bits in the register which led to various miscompares as it
> thinks it can use the SI value directly.
> >
> > This happens because again the xr and xc are hard regs. So having 97
> > be
> >
> > (set (reg:SI 97) (subreg:SI (reg:QI 0 x0 [ xr ]) 0))
> >
> > gets folded to an incorrect
> >
> > (set (reg:SI 97) (reg:SI 0 x0 [ xr ]))
> 
> This part I would expect (and hope for :-)).
> 
> > And now 97 is free to be used without any zero extension, as 97 on its own
> > is an invalid RTX.
> 
> But the way I'd imagined it working, expand would need to insert an
> extension before any operation that needs the upper 24 bits to be defined
> (e.g. comparisons, right shifts).  If the DECL_RTL is (subreg:QI (reg:SI x) 0)
> then the upper bits are not defined, since SUBREG_PROMOTED_VAR_P
> would/should be false for the subreg.

Ah I see.  My fear here was that if we have a pattern which splits out the zero-extend
for whatever reason, then if it gets folded it would be invalid.  But I think I understand
what you meant: in your case we'd never again use the hardreg, everything goes through 97.
Got it.

> 
> E.g. for:
> 
>   int8_t foo(int8_t x) { return x >> 1; }
> 
> x would have a DECL_RTL of (subreg:QI (reg:SI x) 0), the parameter
> assignment would be expanded as:
> 
>   (set (reg:SI x) (reg:SI x0))
> 
> the shift would be expanded as:
> 
>   (set (reg:SI x) (zero_extend:SI (subreg:QI (reg:SI x) 0)))
>   (set (reg:SI x) (ashiftrt:SI (reg:SI x) (const_int 1)))
> 
> and the return assignment would be expanded as:
> 
>   (set (reg:SI x0) (reg:SI x))
> 
> x + 1 would instead be expanded to just:
> 
>   (set (reg:SI x) (plus:SI (reg:SI x) (const_int 1)))
> 
> (without an extension).
> 
> I realised later though that, although reusing the DECL_RTL reg for the
> extension has the nice RA property of avoiding multiple live values, it would
> make it harder to combine the extension into the operation if the variable is
> still live afterwards.  So I guess we lose something both ways.
> 
> Maybe we need a different approach, not based on changing
> PROMOTE_MODE.
> 
> I wonder how easy it would be to do the promotion in gimple, then reuse
> backprop to determine when a sign/zero-extension (i.e. a normal gimple cast)
> can be converted into an “any extend”
> (probably represented as a new ifn).

Do you mean without changing the hook implementation but keeping the current promotion?

I guess the problem here is the inverse case, isn't it?  It's not that in gimple there are
unneeded extends, it's that some operations require an any-extend, no?

Like ~a in gimple, where a is an 8-bit quantity: it requires an any-extend, but no cast
would be there in gimple for backprop to convert.

So for instance

#include <stdint.h>

uint8_t f (uint8_t a)
{
    return ~a;
}

Is just simply:

f (uint8_t a)
{
  uint8_t _2;

  <bb 2> [local count: 1073741824]:
  _2 = ~a_1(D);
  return _2;

}

In gimple.  I'm also slightly worried about interfering with phi opts: backprop runs
before ifcombine and phiopt for instance, and there are various phi opts like ifcombine_ifandif
that rely on the BB containing only the phi node.  Adding an any-extend in between would break this.

I also wonder whether the IFN would interfere with range analysis of expressions, unless we
manage to strategically insert the IFNs on entire expressions and not on the intermediate
SSA components.
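
E.g. (using the placeholder .ANY_EXTEND name again), in:

  _2 = .ANY_EXTEND (a_1(D));
  _3 = _2 + 1;

range analysis can no longer derive a range for _2 or _3 from a_1's
unsigned char type, whereas with a normal cast it could.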

> 
> > So I have to keep the intermediate copy QI mode, after which the RTX
> > optimizations being done during expand generates the forms above.
> >
> >>
> >> I think that's where confusion was coming from.
> >>
> >> > In any case, I'm still not totally sure what the objection here is.
> >> > Afaik, compares need to be treated specially because in GIMPLE they
> >> > already are.  Afaik, C integer promotion rules state that in the
> >> > comparison 0 should have been promoted to an integer constant of
> >> > rank int and so the comparison itself should have been done as integer.
> >> > i.e. extended.  And most of our patterns are based around this.
> >> >
> >> > Gimple however doesn't do this, the comparison is done in the rank
> >> > of the variable and there is no explicit conversion.  This happened
> >> > to be fixed up before during the forced promotion.  So to me the
> >> > heuristic doesn't seem to be that crazy..
> >>
> >> I guess I still don't see the distinction.  C says the same thing
> >> about
> >> +, -, >>, etc.  And gimple is free to do those operations in narrow
> >> types
> >> if it wants to, and if that doesn't change the semantics.  (Not that
> >> gimple always does them in narrow types.  But it is valid gimple.)
> >>
> >> The optimisation problem doesn't come from C or gimple semantics, but
> >> from the fact that AArch64 (unlike x86 say) doesn't have byte add,
> >> byte compare, byte right shift, etc.  We therefore need to promote
> >> 8-bit and 16- bit operations to 32 bits first.
> >>
> >> For add, subtract, multiply, left shift, and logic ops, we can avoid
> >> defining the upper bits of the inputs when we do these extensions,
> >> because the upper bits of the inputs don't affect the useful bits of
> >> the result.  But for comparisons, right shifts, and divides, etc., we do need
> to extend.
> >>
> >> AIUI, the comparison case is special because (for AArch64-specific
> >> reasons), we prefer extend + cbz to tst + branch, especially when the
> >> extend can be shared.
> >
> > Agreed, so I think we agree on this 😊 I guess the disagreement is
> > where this should be done. I'll admit that the testcase above works by
> > coincidence. But if we don't do it during expand time, the only place
> > I can think of to introduce the zero extends is to add various
> > patterns to do an early split of any-extend
> > + cmp.  But wouldn't that be more fragile? At least at expand time all
> > + comparisons
> > are tcc_comparisons.
> 
> I guess one question is: is the patch without the comparison handling just
> exacerbating an existing problem?  Do we already make similar bad choices
> between extend+cbz and tst+branch in cases where the variables aren't
> short, but where intermediate calculations involve &s?  If so, it might be
> something worth tackling in its own right, regardless of where the &s or
> extensions come from.

I've tried various cases but they all look correct.  This is mostly because
we already have actual instructions for these.  We generate a chain of
and + tst in most != 0 cases; in the rest we do the least amount of &s
and then a normal cmp.
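
For example, one of the shapes I tried (an illustrative sketch only):

#include <stdint.h>

int g (int x, int y)
{
    /* intermediate calculation involving an & that feeds a comparison */
    if ((x & 0xff) != 0)
        return y + 1;
    return y;
}

Here the & and the != 0 test combine into a single and/tst, so there is no
separate extend whose placement could go wrong.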

For the case in spec, we correctly only get an additional mov to copy
the value.  So it looks like it's only an issue with short cases.

Cheers,
Tamar

> 
> But yeah, I'm not sure how easily that would fit into existing passes.
> 
> Thanks,
> Richard

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions.
  2022-05-17 17:45                     ` Tamar Christina
@ 2022-05-18  7:49                       ` Richard Sandiford
  0 siblings, 0 replies; 19+ messages in thread
From: Richard Sandiford @ 2022-05-18  7:49 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, rguenther, jeffreyalaw

Tamar Christina <Tamar.Christina@arm.com> writes:
>  […]
>> >> > We generate for e.g.:
>> >> >
>> >> > #include <stdint.h>
>> >> >
>> >> > uint16_t f8 (uint8_t xr, uint8_t xc){
>> >> >     return (uint8_t)(xr * xc);
>> >> > }
>> >> >
>> >> > (insn 9 6 10 2 (set (reg:HI 101)
>> >> >         (zero_extend:HI (reg/v:QI 96 [ xr ]))) "prom.c":4:16 -1
>> >> >      (nil))
>> >> > (insn 10 9 11 2 (set (reg:HI 102)
>> >> >         (zero_extend:HI (reg/v:QI 98 [ xc ]))) "prom.c":4:16 -1
>> >> >      (nil))
>> >> > (insn 11 10 12 2 (set (reg:SI 103)
>> >> >         (mult:SI (subreg:SI (reg:HI 101) 0)
>> >> >             (subreg:SI (reg:HI 102) 0))) "prom.c":4:16 -1
>> >> >      (nil))
>> >> >
>> >> > Out of expand.  The paradoxical subreg isn't generated at all out of
>> >> > expand unless it's needed.  It does keep the original params around
>> >> > as unused:
>> >> >
>> >> > (insn 2 7 4 2 (set (reg:QI 97)
>> >> >         (reg:QI 0 x0 [ xr ])) "prom.c":3:37 -1
>> >> >      (nil))
>> >> > (insn 4 2 3 2 (set (reg:QI 99)
>> >> >         (reg:QI 1 x1 [ xc ])) "prom.c":3:37 -1
>> >> >      (nil))
>> >> >
>> >> > And the paradoxical subreg is moved into the first operation requiring it:
>> >> >
>> >> > (insn 11 10 12 2 (set (reg:SI 103)
>> >> >         (mult:SI (subreg:SI (reg:HI 101) 0)
>> >> >             (subreg:SI (reg:HI 102) 0))) "prom.c":4:16 -1
>> >> >      (nil))
>> >>
>> >> Ah, OK, this isn't what I'd imagined.  I thought the xr and xc
>> >> registers would be SIs and the DECL_RTLs would be QI subregs of those
>> >> SI regs.
>> >> I think that might work better, for the reasons above.  (That is,
>> >> whenever we need the register in extended form, we can simply extend
>> >> the existing reg rather than create a new one.)
>> >
>> > Ah, I see, no, I explicitly avoid this.  When doing the type promotions
>> > I tell it that the size of the copies of xr and xc is still the original
>> > size, e.g. QI (i.e. I don't change 97 and 99).
>> > This is different from what we do with extends, where 97 and 99 *would*
>> > be changed.
>> >
>> > The reason is that if I make this SI the compiler thinks it knows the
>> > value of all the bits in the register, which led to various miscompares
>> > as it thinks it can use the SI value directly.
>> >
>> > This happens because again the xr and xc are hard regs.  So having 97 be
>> >
>> > (set (reg:SI 97) (subreg:SI (reg:QI 0 x0 [ xr ]) 0))
>> >
>> > gets folded to an incorrect
>> >
>> > (set (reg:SI 97) (reg:SI 0 x0 [ xr ]))
>> 
>> This part I would expect (and hope for :-)).
>> 
>> > And now 97 is free to be used without any zero extension, as 97 on its
>> > own is an invalid RTX.
>> 
>> But the way I'd imagined it working, expand would need to insert an
>> extension before any operation that needs the upper 24 bits to be defined
>> (e.g. comparisons, right shifts).  If the DECL_RTL is (subreg:QI (reg:SI x) 0)
>> then the upper bits are not defined, since SUBREG_PROMOTED_VAR_P
>> would/should be false for the subreg.
>
> Ah, I see.  My fear here was that if we have a pattern which splits out the zero-extend
> for whatever reason, the folded result would be invalid.  But I think I understand what
> you meant: in your case we'd never use the hard reg again; everything goes through 97.  Got it.

Yeah.  The expand code is supposed to move the hard register into a
pseudo at the earliest opportunity (at the head of the function) and
then everything else should use the pseudo.  Using the hard register
later could lead to spill failures, or to attempts to keep the register
live across calls.

>> E.g. for:
>> 
>>   int8_t foo(int8_t x) { return x >> 1; }
>> 
>> x would have a DECL_RTL of (subreg:QI (reg:SI x) 0), the parameter
>> assignment would be expanded as:
>> 
>>   (set (reg:SI x) (reg:SI x0))
>> 
>> the shift would be expanded as:
>> 
>>   (set (reg:SI x) (zero_extend:SI (subreg:QI (reg:SI x) 0)))
>>   (set (reg:SI x) (ashiftrt:SI (reg:SI x) (const_int 1)))
>> 
>> and the return assignment would be expanded as:
>> 
>>   (set (reg:SI x0) (reg:SI x))
>> 
>> x + 1 would instead be expanded to just:
>> 
>>   (set (reg:SI x) (plus:SI (reg:SI x) (const_int 1)))
>> 
>> (without an extension).
>> 
>> I realised later though that, although reusing the DECL_RTL reg for the
>> extension has the nice RA property of avoiding multiple live values, it would
>> make it harder to combine the extension into the operation if the variable is
>> still live afterwards.  So I guess we lose something both ways.
>> 
>> Maybe we need a different approach, not based on changing
>> PROMOTE_MODE.
>> 
>> I wonder how easy it would be to do the promotion in gimple, then reuse
>> backprop to determine when a sign/zero-extension (i.e. a normal gimple cast)
>> can be converted into an “any extend”
>> (probably represented as a new ifn).
>
> Do you mean without changing the hook implementation but keeping the current promotion?

Yeah, keep the hook definitions as they are now, but do the promotion
in gimple by widening types where necessary.

> I guess the problem here is really the inverse case, isn't it? It's not that
> gimple contains unneeded extends; it's that some operations require an any-extend, no?
>
> Take ~a in gimple, where a is an 8-bit quantity: the operation requires an any-extend,
> but no cast is present in the gimple.
>
> So for instance
>
> #include <stdint.h>
>
> uint8_t f (uint8_t a)
> {
>     return ~a;
> }
>
> Is just simply:
>
> f (uint8_t a)
> {
>   uint8_t _2;
>
>   <bb 2> [local count: 1073741824]:
>   _2 = ~a_1(D);
>   return _2;
>
> }

Right.  But the idea was to run a late-ish isel-like pass that converts
this to:

f (uint8_t a)
{
  uint32_t _2;
  uint32_t _3;
  uint8_t _4;

  <bb 2> [local count: 1073741824]:
  _2 = .ANY_EXTEND(a_1(D), (uint32_t)0);
  _3 = ~_2;
  _4 = (uint8_t)_3;
  return _4;
}

We'd need to experiment with the best placing of the pass.

> In gimple. I'm also slightly worried about interfering with phi opts. Backprop runs
> before ifcombine and phiopt for instance, and there are various phi opts like ifcombine_ifandif
> that rely on the BB containing only the phi node.  Adding an any-extend in between would break this.

Yeah, the current backprop pass runs quite early.  But I meant that we
could reuse the code (or simply run another instance of the pass,
extended to handle this case) at the appropriate point.

> I also wonder whether the IFN would interfere with range analysis of expressions, unless we manage
> to strategically insert the IFNs on entire expressions and not on the intermediate SSA components.

This would be part of the trade-off in placing the promotion pass.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3]AArch64 Promote function arguments using a paradoxical subreg when beneficial.
  2022-05-13 17:11 ` [PATCH 2/3]AArch64 Promote function arguments using a paradoxical subreg when beneficial Tamar Christina
@ 2022-10-27  3:15   ` Andrew Pinski
  2022-10-28  9:57     ` Tamar Christina
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Pinski @ 2022-10-27  3:15 UTC (permalink / raw)
  To: Tamar Christina
  Cc: gcc-patches, Richard.Earnshaw, nd, richard.sandiford, Marcus.Shawcroft

On Fri, May 13, 2022 at 10:14 AM Tamar Christina via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi All,
>
> The PROMOTE_MODE macro always promotes 8- and 16-bit parameters to 32 bits.
> This promotion is not required for the ABI which states:
>
>
> ```
> C.9     If the argument is an Integral or Pointer Type, the size of the argument is
> less than or equal to 8 bytes and the NGRN is less than 8, the argument is
> copied to the least significant bits in x[NGRN]. The NGRN is incremented by one.
> The argument has now been allocated.
>
> C.16    If the size of the argument is less than 8 bytes then the size of the
> argument is set to 8 bytes. The effect is as if the argument was copied to the
> least significant bits of a 64-bit register and the remaining bits filled with
> unspecified values
> ```
>
> That is, the bits in the registers are unspecified and callees cannot assume
> any particular status.
>
> This means that we can avoid the promotion and still get correct code as the
> language level promotion rules require values to be extended when the bits are
> significant.
>
> So if we are e.g. OR-ing two 8-bit values, no extend is needed as the top bits
> are irrelevant.  If we are doing e.g. addition, then the top bits *might* be
> relevant depending on the result type.  But the middle end will always
> contain the appropriate extend in those cases.
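> 
> For example (an illustrative sketch, not one of the testcases in the patch):
> 
> uint8_t f_or (uint8_t a, uint8_t b)
> {
>     /* only the low 8 bits of the result are observable, and OR never
>        propagates high bits downwards, so no extend or masking is needed */
>     return a | b;
> }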
>
> The mid-end also has optimizations around this assumption and the AArch64 port
> actively undoes them.
>
> So for instance
>
> uint16_t fd (uint8_t xr){
>     return xr + 1;
> }
>
> uint8_t fd2 (uint8_t xr){
>     return xr + 1;
> }
>
> should produce
>
> fd:                                     // @fd
>         and     w8, w0, #0xff
>         add     w0, w8, #1
>         ret
> fd2:                                    // @fd2
>         add     w0, w0, #1
>         ret
>
> like clang does instead of
>
> fd:
>         and     w0, w0, 255
>         add     w0, w0, 1
>         ret
> fd2:
>         and     w0, w0, 255
>         add     w0, w0, 1
>         ret
>
> like we do now.  Removing this forced expansion maintains correctness and fixes
> various codegen defects.  It also brings us in line with clang.
>
> Note that C, C++, Fortran etc. all correctly specify what should happen w.r.t.
> extends for e.g. array indexing, pointer arithmetic etc., so we never get
> incorrect code.
>
> There is however a second reason for doing this promotion: RTL efficiency.
> The promotion saves us from having to extend the values to SImode in order to
> use them in instructions and then truncate them again afterwards.
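> 
> Schematically (illustrative RTL only, not taken from an actual dump):
> 
> ;; with the forced promotion: extend on entry, truncate on the way out
> (set (reg:SI 100) (zero_extend:SI (reg:QI 96)))
> ;; ... all arithmetic happens on reg 100 in SImode ...
> (set (reg:QI 101) (subreg:QI (reg:SI 100) 0))
> 
> ;; with a paradoxical subreg the round trip disappears
> (set (reg:SI 103)
>      (plus:SI (subreg:SI (reg:QI 96) 0)
>               (const_int 1)))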
>
> To get both the efficiency and the simpler RTL we can instead promote to a
> paradoxical subreg.  This patch implements the hook for AArch64 and adds an
> explicit opt-out for values that feed into comparisons.  This is done because:
>
> 1. Our comparison patterns already allow us to absorb the zero extend.
> 2. The extension allows us to use cbz/cbnz/tbz etc.  In some cases, such as
>
> int foo (char a, char b)
> {
>    if (a)
>      if (b)
>        bar1 ();
>      else
>        ...
>     else
>      if (b)
>        bar2 ();
>      else
>        ...
> }
>
> by zero extending the value we can avoid having to repeatedly test the value
> before a branch.  Allowing the zero extend also allows our existing `ands`
> patterns to work as expected.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> I have to commit this and the last patch together, but for ease of review
> I have split them up here.  Note that 209 missed-optimization xfails are
> fixed.
>
> No performance difference on SPECCPU 2017, and no failures.
>
> Ok for master?

Did this patch ever get approved?  It is a nice improvement that would be
good to get into GCC 13 before the close of stage 1.

Thanks,
Andrew

>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>         * config/aarch64/aarch64.cc
>         (aarch64_promote_function_args_subreg_p): New.
>         (TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P): New.
>         * config/aarch64/aarch64.h (PROMOTE_MODE): Expand doc.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/aarch64/apc-subreg.c: New test.
>
> --- inline copy of patch --
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index efa46ac0b8799b5849b609d591186e26e5cb37ff..cc74a816fcc6458aa065246a30a4d2184692ad74 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -34,7 +34,8 @@
>
>  #define REGISTER_TARGET_PRAGMAS() aarch64_register_pragmas ()
>
> -/* Target machine storage layout.  */
> +/* Target machine storage layout.  See also
> +   TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P.  */
>
>  #define PROMOTE_MODE(MODE, UNSIGNEDP, TYPE)    \
>    if (GET_MODE_CLASS (MODE) == MODE_INT                \
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 2f559600cff55af9d468e8d0810545583cc986f5..252d6c2af72afc1dfee1a86644a5753784b41f59 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -3736,6 +3736,57 @@ aarch64_array_mode_supported_p (machine_mode mode,
>    return false;
>  }
>
> +/* Implement target hook TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P to complement
> +   PROMOTE_MODE.  If any argument promotion was done, do them as subregs.  */
> +static bool
> +aarch64_promote_function_args_subreg_p (machine_mode mode,
> +                                       machine_mode promoted_mode,
> +                                       int /* unsignedp */, tree parm)
> +{
> +  bool candidate_p = GET_MODE_CLASS (mode) == MODE_INT
> +                    && GET_MODE_CLASS (promoted_mode) == MODE_INT
> +                    && known_lt (GET_MODE_SIZE (mode), 4)
> +                    && promoted_mode == SImode;
> +
> +  if (!candidate_p)
> +    return false;
> +
> +  if (!parm || !is_gimple_reg (parm))
> +    return true;
> +
> +  tree var = parm;
> +  if (!VAR_P (var))
> +    {
> +      if (TREE_CODE (parm) == SSA_NAME
> +          && !(var = SSA_NAME_VAR (var)))
> +       return true;
> +      else if (TREE_CODE (parm) != PARM_DECL)
> +       return true;
> +    }
> +
> +  /* If the variable is used inside a comparison which sets CC then we should
> +     still promote using an extend.  By doing this we make it easier to use
> +     cbz/cbnz and also avoid repeatedly having to test the value in certain
> +     circumstances, like nested ifs that test the same value with calls
> +     in between.  */
> +  tree ssa_var = ssa_default_def (cfun, var);
> +  if (!ssa_var)
> +    return true;
> +
> +  const ssa_use_operand_t *const head = &(SSA_NAME_IMM_USE_NODE (ssa_var));
> +  const ssa_use_operand_t *ptr;
> +
> +  for (ptr = head->next; ptr != head; ptr = ptr->next)
> +    if (USE_STMT (ptr) && is_gimple_assign (USE_STMT (ptr)))
> +      {
> +       tree_code code = gimple_assign_rhs_code (USE_STMT (ptr));
> +       if (TREE_CODE_CLASS (code) == tcc_comparison)
> +         return false;
> +      }
> +
> +  return true;
> +}
> +
>  /* MODE is some form of SVE vector mode.  For data modes, return the number
>     of vector register bits that each element of MODE occupies, such as 64
>     for both VNx2DImode and VNx2SImode (where each 32-bit value is stored
> @@ -27490,6 +27541,10 @@ aarch64_libgcc_floating_mode_supported_p
>  #undef TARGET_ARRAY_MODE_SUPPORTED_P
>  #define TARGET_ARRAY_MODE_SUPPORTED_P aarch64_array_mode_supported_p
>
> +#undef TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P
> +#define TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P \
> +  aarch64_promote_function_args_subreg_p
> +
>  #undef TARGET_VECTORIZE_CREATE_COSTS
>  #define TARGET_VECTORIZE_CREATE_COSTS aarch64_vectorize_create_costs
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/apc-subreg.c b/gcc/testsuite/gcc.target/aarch64/apc-subreg.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..2d7563a11ce11fa677f7ad4bf2a090e6a136e4d9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/apc-subreg.c
> @@ -0,0 +1,103 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +
> +#include <stdint.h>
> +
> +/*
> +** f0:
> +**     mvn     w0, w0
> +**     ret
> +*/
> +uint8_t f0 (uint8_t xr){
> +    return (uint8_t) (0xff - xr);
> +}
> +
> +/*
> +** f1:
> +**     mvn     w0, w0
> +**     ret
> +*/
> +int8_t f1 (int8_t xr){
> +    return (int8_t) (0xff - xr);
> +}
> +
> +/*
> +** f2:
> +**     mvn     w0, w0
> +**     ret
> +*/
> +uint16_t f2 (uint16_t xr){
> +    return (uint16_t) (0xffFF - xr);
> +}
> +
> +/*
> +** f3:
> +**     mvn     w0, w0
> +**     ret
> +*/
> +uint32_t f3 (uint32_t xr){
> +    return (uint32_t) (0xffFFffff - xr);
> +}
> +
> +/*
> +** f4:
> +**     mvn     x0, x0
> +**     ret
> +*/
> +uint64_t f4 (uint64_t xr){
> +    return (uint64_t) (0xffFFffffffffffff - xr);
> +}
> +
> +/*
> +** f5:
> +**     mvn     w0, w0
> +**     sub     w0, w0, w1
> +**     ret
> +*/
> +uint8_t f5 (uint8_t xr, uint8_t xc){
> +    return (uint8_t) (0xff - xr - xc);
> +}
> +
> +/*
> +** f6:
> +**     mvn     w0, w0
> +**     and     w0, w0, 255
> +**     and     w1, w1, 255
> +**     mul     w0, w0, w1
> +**     ret
> +*/
> +uint16_t f6 (uint8_t xr, uint8_t xc){
> +    return ((uint8_t) (0xff - xr)) * xc;
> +}
> +
> +/*
> +** f7:
> +**     and     w0, w0, 255
> +**     and     w1, w1, 255
> +**     mul     w0, w0, w1
> +**     ret
> +*/
> +uint16_t f7 (uint8_t xr, uint8_t xc){
> +    return xr * xc;
> +}
> +
> +/*
> +** f8:
> +**     mul     w0, w0, w1
> +**     and     w0, w0, 255
> +**     ret
> +*/
> +uint16_t f8 (uint8_t xr, uint8_t xc){
> +    return (uint8_t)(xr * xc);
> +}
> +
> +/*
> +** f9:
> +**     and     w0, w0, 255
> +**     add     w0, w0, w1
> +**     ret
> +*/
> +uint16_t f9 (uint8_t xr, uint16_t xc){
> +    return xr + xc;
> +}
>
> --

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH 2/3]AArch64 Promote function arguments using a paradoxical subreg when beneficial.
  2022-10-27  3:15   ` Andrew Pinski
@ 2022-10-28  9:57     ` Tamar Christina
  0 siblings, 0 replies; 19+ messages in thread
From: Tamar Christina @ 2022-10-28  9:57 UTC (permalink / raw)
  To: Andrew Pinski
  Cc: gcc-patches, Richard Earnshaw, nd, Richard Sandiford, Marcus Shawcroft

> -----Original Message-----
> From: Andrew Pinski <pinskia@gmail.com>
> Sent: Thursday, October 27, 2022 4:15 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; nd <nd@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>; Marcus Shawcroft
> <Marcus.Shawcroft@arm.com>
> Subject: Re: [PATCH 2/3]AArch64 Promote function arguments using a
> paradoxical subreg when beneficial.
> 
> On Fri, May 13, 2022 at 10:14 AM Tamar Christina via Gcc-patches <gcc-
> patches@gcc.gnu.org> wrote:
> >
> > […]
> >
> > Ok for master?
> 
> Did this patch ever get approved?  It is a nice improvement that would be
> good to get into GCC 13 before the close of stage 1.

No, it was requested that I make a standalone pass that introduces a new kind of extension
in the mid-end.  Unfortunately, due to constraints on how much time I can dedicate to
that this year, I've had to drop it for GCC 13.

I'll try to pick it up again during GCC 14.

Regards,
Tamar

> 
> Thanks,
> Andrew
>
> > […]

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2022-10-28  9:57 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-13 17:11 [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions Tamar Christina
2022-05-13 17:11 ` [PATCH 2/3]AArch64 Promote function arguments using a paradoxical subreg when beneficial Tamar Christina
2022-10-27  3:15   ` Andrew Pinski
2022-10-28  9:57     ` Tamar Christina
2022-05-13 17:12 ` [PATCH 3/3]AArch64 Update the testsuite to remove xfails Tamar Christina
2022-05-16  6:31 ` [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions Richard Biener
2022-05-16  8:26   ` Tamar Christina
2022-05-16 11:36 ` Richard Sandiford
2022-05-16 11:49   ` Tamar Christina
2022-05-16 12:14     ` Richard Sandiford
2022-05-16 12:18       ` Richard Sandiford
2022-05-16 13:02         ` Tamar Christina
2022-05-16 13:24           ` Richard Sandiford
2022-05-16 15:29             ` Tamar Christina
2022-05-16 16:48               ` Richard Sandiford
2022-05-17  7:55                 ` Tamar Christina
2022-05-17  9:03                   ` Richard Sandiford
2022-05-17 17:45                     ` Tamar Christina
2022-05-18  7:49                       ` Richard Sandiford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).