public inbox for gcc-patches@gcc.gnu.org
* [0/4] Add SVE support for load/store_lanes
@ 2017-11-08 15:12 Richard Sandiford
  2017-11-08 15:13 ` [1/4] Give the target more control over ARRAY_TYPE modes Richard Sandiford
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Richard Sandiford @ 2017-11-08 15:12 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft

This mixture of target-specific and target-independent patches adds
support for SVE LD[234] and ST[234].  The main difference from
Advanced SIMD is that SVE uses an extended vector mode for the
array of vectors, instead of the integer modes used by Advanced SIMD.

Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
and powerpc64-linux-gnu.  OK to install?

Thanks,
Richard


* [1/4] Give the target more control over ARRAY_TYPE modes
  2017-11-08 15:12 [0/4] Add SVE support for load/store_lanes Richard Sandiford
@ 2017-11-08 15:13 ` Richard Sandiford
  2017-11-21 16:38   ` Jeff Law
  2017-11-08 15:16 ` [2/4] [AArch64] SVE load/store_lanes support Richard Sandiford
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Richard Sandiford @ 2017-11-08 15:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft

So far we've used integer modes for LD[234] and ST[234] arrays.
That doesn't scale well to SVE, since the sizes aren't fixed at
compile time (and even if they were, we wouldn't want integers
to be so wide).

This patch lets the target use double-, triple- and quadruple-length
vectors instead.
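
For illustration only (this is not part of the patch): the lookup order
that mode_for_array follows after this change can be modelled with a
small standalone sketch.  Everything named toy_* below is hypothetical
and merely stands in for GCC's real machine modes and hooks.  The rule
that integer modes exist only for power-of-two sizes is a simplifying
assumption, and the special case for one-element arrays is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for machine modes; these are NOT GCC's real types.  */
enum toy_mode { TOY_NONE, TOY_BLK, TOY_INT, TOY_VEC_X2, TOY_VEC_X3, TOY_VEC_X4 };

/* Hypothetical target hook: arrays of 2, 3 or 4 vectors get a
   double-, triple- or quadruple-length vector mode.  Anything else
   returns "no mode", meaning the target has no special requirements.  */
static enum toy_mode
toy_target_array_mode (bool elem_is_vector, unsigned nelems)
{
  if (elem_is_vector && nelems >= 2 && nelems <= 4)
    return (enum toy_mode) (TOY_VEC_X2 + (nelems - 2));
  return TOY_NONE;
}

/* Model of the lookup order: ask the target first, then look for an
   integer mode of the right size, and fall back to BLK otherwise.
   MAX_INT_BITS plays the role of the integer mode size limit.  */
static enum toy_mode
toy_mode_for_array (bool elem_is_vector, unsigned elem_bits,
		    unsigned nelems, unsigned max_int_bits)
{
  assert (nelems > 0 && elem_bits > 0);

  enum toy_mode mode = toy_target_array_mode (elem_is_vector, nelems);
  if (mode != TOY_NONE)
    return mode;

  /* Simplifying assumption: an integer mode exists only for
     power-of-two sizes up to the limit.  */
  unsigned bits = elem_bits * nelems;
  if (bits <= max_int_bits && (bits & (bits - 1)) == 0)
    return TOY_INT;

  return TOY_BLK;
}
```

With this model, an array of two 128-bit vectors gets TOY_VEC_X2 even
when the integer limit is 128 bits, whereas an array of five vectors
falls through to the integer search and ends up as TOY_BLK.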


2017-11-08  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* target.def (array_mode): New target hook.
	* doc/tm.texi.in (TARGET_ARRAY_MODE): New hook.
	* doc/tm.texi: Regenerate.
	* hooks.h (hook_optmode_mode_uhwi_none): Declare.
	* hooks.c (hook_optmode_mode_uhwi_none): New function.
	* tree-vect-data-refs.c (vect_lanes_optab_supported_p): Use
	targetm.array_mode.
	* stor-layout.c (mode_for_array): Likewise.  Support polynomial
	type sizes.

Index: gcc/target.def
===================================================================
--- gcc/target.def	2017-11-08 15:05:33.783582091 +0000
+++ gcc/target.def	2017-11-08 15:06:16.086850270 +0000
@@ -3400,6 +3400,22 @@ the vector element type.",
  HOST_WIDE_INT, (const_tree type),
  default_vector_alignment)
 
+DEFHOOK
+(array_mode,
+ "Return the mode that GCC should use for an array that has\n\
+@var{nelems} elements, with each element having mode @var{mode}.\n\
+Return no mode if the target has no special requirements.  In the\n\
+latter case, GCC looks for an integer mode of the appropriate size\n\
+if available and uses BLKmode otherwise.  Usually the search for the\n\
+integer mode is limited to @code{MAX_FIXED_MODE_SIZE}, but the\n\
+@code{TARGET_ARRAY_MODE_SUPPORTED_P} hook allows a larger mode to be\n\
+used in specific cases.\n\
+\n\
+The main use of this hook is to specify that an array of vectors should\n\
+also have a vector mode.  The default implementation returns no mode.",
+ opt_machine_mode, (machine_mode mode, unsigned HOST_WIDE_INT nelems),
+ hook_optmode_mode_uhwi_none)
+
 /* True if we should try to use a scalar mode to represent an array,
    overriding the usual MAX_FIXED_MODE limit.  */
 DEFHOOK
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	2017-11-08 15:05:32.663824754 +0000
+++ gcc/doc/tm.texi.in	2017-11-08 15:06:16.085850270 +0000
@@ -3322,6 +3322,8 @@ stack.
 
 @hook TARGET_VECTOR_MODE_SUPPORTED_P
 
+@hook TARGET_ARRAY_MODE
+
 @hook TARGET_ARRAY_MODE_SUPPORTED_P
 
 @hook TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	2017-11-08 15:05:33.777173014 +0000
+++ gcc/doc/tm.texi	2017-11-08 15:06:16.084850270 +0000
@@ -4243,6 +4243,20 @@ insns involving vector mode @var{mode}.
 must have move patterns for this mode.
 @end deftypefn
 
+@deftypefn {Target Hook} opt_machine_mode TARGET_ARRAY_MODE (machine_mode @var{mode}, unsigned HOST_WIDE_INT @var{nelems})
+Return the mode that GCC should use for an array that has
+@var{nelems} elements, with each element having mode @var{mode}.
+Return no mode if the target has no special requirements.  In the
+latter case, GCC looks for an integer mode of the appropriate size
+if available and uses BLKmode otherwise.  Usually the search for the
+integer mode is limited to @code{MAX_FIXED_MODE_SIZE}, but the
+@code{TARGET_ARRAY_MODE_SUPPORTED_P} hook allows a larger mode to be
+used in specific cases.
+
+The main use of this hook is to specify that an array of vectors should
+also have a vector mode.  The default implementation returns no mode.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_ARRAY_MODE_SUPPORTED_P (machine_mode @var{mode}, unsigned HOST_WIDE_INT @var{nelems})
 Return true if GCC should try to use a scalar mode to store an array
 of @var{nelems} elements, given that each element has mode @var{mode}.
Index: gcc/hooks.h
===================================================================
--- gcc/hooks.h	2017-11-08 15:05:32.622623544 +0000
+++ gcc/hooks.h	2017-11-08 15:06:16.085850270 +0000
@@ -124,4 +124,7 @@ extern const char *hook_constcharptr_con
 extern const char *hook_constcharptr_const_tree_const_tree_null (const_tree, const_tree);
 extern const char *hook_constcharptr_int_const_tree_null (int, const_tree);
 extern const char *hook_constcharptr_int_const_tree_const_tree_null (int, const_tree, const_tree);
+
+extern opt_machine_mode hook_optmode_mode_uhwi_none (machine_mode,
+						     unsigned HOST_WIDE_INT);
 #endif
Index: gcc/hooks.c
===================================================================
--- gcc/hooks.c	2017-11-08 15:05:32.622623544 +0000
+++ gcc/hooks.c	2017-11-08 15:06:16.085850270 +0000
@@ -525,3 +525,11 @@ hook_bool_mode_reg_class_t_reg_class_t_f
   return false;
 }
 
+/* Generic hook that takes a mode and an unsigned HOST_WIDE_INT and
+   returns no mode.  */
+
+opt_machine_mode
+hook_optmode_mode_uhwi_none (machine_mode, unsigned HOST_WIDE_INT)
+{
+  return opt_machine_mode ();
+}
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2017-11-08 15:05:36.346297369 +0000
+++ gcc/tree-vect-data-refs.c	2017-11-08 15:06:16.087850270 +0000
@@ -60,20 +60,23 @@ Software Foundation; either version 3, o
 vect_lanes_optab_supported_p (const char *name, convert_optab optab,
 			      tree vectype, unsigned HOST_WIDE_INT count)
 {
-  machine_mode mode;
-  scalar_int_mode array_mode;
+  machine_mode mode, array_mode;
   bool limit_p;
 
   mode = TYPE_MODE (vectype);
-  limit_p = !targetm.array_mode_supported_p (mode, count);
-  if (!int_mode_for_size (count * GET_MODE_BITSIZE (mode),
-			  limit_p).exists (&array_mode))
+  if (!targetm.array_mode (mode, count).exists (&array_mode))
     {
-      if (dump_enabled_p ())
-	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                         "no array mode for %s[" HOST_WIDE_INT_PRINT_DEC "]\n",
-                         GET_MODE_NAME (mode), count);
-      return false;
+      poly_uint64 bits = count * GET_MODE_BITSIZE (mode);
+      limit_p = !targetm.array_mode_supported_p (mode, count);
+      if (!int_mode_for_size (bits, limit_p).exists (&array_mode))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "no array mode for %s["
+			     HOST_WIDE_INT_PRINT_DEC "]\n",
+			     GET_MODE_NAME (mode), count);
+	  return false;
+	}
     }
 
   if (convert_optab_handler (optab, array_mode, mode) == CODE_FOR_nothing)
Index: gcc/stor-layout.c
===================================================================
--- gcc/stor-layout.c	2017-11-08 15:05:36.974386931 +0000
+++ gcc/stor-layout.c	2017-11-08 15:06:16.086850270 +0000
@@ -545,7 +545,8 @@ get_mode_alignment (machine_mode mode)
 mode_for_array (tree elem_type, tree size)
 {
   tree elem_size;
-  unsigned HOST_WIDE_INT int_size, int_elem_size;
+  poly_uint64 int_size, int_elem_size;
+  unsigned HOST_WIDE_INT num_elems;
   bool limit_p;
 
   /* One-element arrays get the component type's mode.  */
@@ -554,14 +555,16 @@ mode_for_array (tree elem_type, tree siz
     return TYPE_MODE (elem_type);
 
   limit_p = true;
-  if (tree_fits_uhwi_p (size) && tree_fits_uhwi_p (elem_size))
+  if (poly_int_tree_p (size, &int_size)
+      && poly_int_tree_p (elem_size, &int_elem_size)
+      && may_ne (int_elem_size, 0U)
+      && constant_multiple_p (int_size, int_elem_size, &num_elems))
     {
-      int_size = tree_to_uhwi (size);
-      int_elem_size = tree_to_uhwi (elem_size);
-      if (int_elem_size > 0
-	  && int_size % int_elem_size == 0
-	  && targetm.array_mode_supported_p (TYPE_MODE (elem_type),
-					     int_size / int_elem_size))
+      machine_mode elem_mode = TYPE_MODE (elem_type);
+      machine_mode mode;
+      if (targetm.array_mode (elem_mode, num_elems).exists (&mode))
+	return mode;
+      if (targetm.array_mode_supported_p (elem_mode, num_elems))
 	limit_p = false;
     }
   return mode_for_size_tree (size, MODE_INT, limit_p).else_blk ();


* [2/4] [AArch64] SVE load/store_lanes support
  2017-11-08 15:12 [0/4] Add SVE support for load/store_lanes Richard Sandiford
  2017-11-08 15:13 ` [1/4] Give the target more control over ARRAY_TYPE modes Richard Sandiford
@ 2017-11-08 15:16 ` Richard Sandiford
  2018-01-06 19:45   ` Richard Sandiford
  2017-11-08 15:18 ` [3/4] load/store_lanes testsuite markup Richard Sandiford
  2017-11-08 15:18 ` [4/4] [AArch64] Tests for SVE structure modes Richard Sandiford
  3 siblings, 1 reply; 9+ messages in thread
From: Richard Sandiford @ 2017-11-08 15:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft

This patch adds support for SVE LD[234], ST[234] and associated
structure modes.  Unlike Advanced SIMD, these modes are extra-long
vector modes instead of integer modes.
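
For illustration only (this is not one of the tests in the series): the
kind of loop that load/store_lanes is aimed at reads or writes
interleaved groups of elements.  The function name below is made up.
Whether the vectoriser really picks LD2 for it depends on the options
in use and on the cost model, so treat this as a sketch of the access
pattern rather than a guaranteed code-generation example.

```c
#include <stddef.h>

/* Sum each adjacent pair of an interleaved input array.  The reads of
   in[2 * i] and in[2 * i + 1] form a group of two with stride 2, which
   is the pattern a two-vector structure load is meant to cover.  */
void
sum_pairs (int *restrict out, const int *restrict in, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    out[i] = in[2 * i] + in[2 * i + 1];
}
```

A group of three or four interleaved elements would correspond to the
x3 and x4 structure modes in the same way.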


2017-11-08  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* config/aarch64/aarch64-modes.def: Define x2, x3 and x4 vector
	modes for SVE.
	* config/aarch64/aarch64-protos.h
	(aarch64_sve_struct_memory_operand_p): Declare.
	* config/aarch64/iterators.md (SVE_STRUCT): New mode iterator.
	(vector_count, insn_length, VSINGLE, vsingle): New mode attributes.
	(VPRED, vpred): Handle SVE structure modes.
	* config/aarch64/constraints.md (Utx): New constraint.
	* config/aarch64/predicates.md (aarch64_sve_struct_memory_operand)
	(aarch64_sve_struct_nonimmediate_operand): New predicates.
	* config/aarch64/aarch64.md (UNSPEC_LDN, UNSPEC_STN): New unspecs.
	* config/aarch64/aarch64-sve.md (mov<mode>, *aarch64_sve_mov<mode>_le)
	(*aarch64_sve_mov<mode>_be, pred_mov<mode>): New patterns for
	structure modes.  Split into pieces after RA.
	(vec_load_lanes<mode><vsingle>, vec_mask_load_lanes<mode><vsingle>)
	(vec_store_lanes<mode><vsingle>, vec_mask_store_lanes<mode><vsingle>):
	New patterns.
	* config/aarch64/aarch64.c (aarch64_classify_vector_mode): Handle
	SVE structure modes.
	(aarch64_classify_address): Likewise.
	(sizetochar): Move earlier in file.
	(aarch64_print_operand): Handle SVE register lists.
	(aarch64_array_mode): New function.
	(aarch64_sve_struct_memory_operand_p): Likewise.
	(TARGET_ARRAY_MODE): Redefine.

Index: gcc/config/aarch64/aarch64-modes.def
===================================================================
--- gcc/config/aarch64/aarch64-modes.def	2017-11-08 15:05:50.202852894 +0000
+++ gcc/config/aarch64/aarch64-modes.def	2017-11-08 15:06:19.687849905 +0000
@@ -87,6 +87,9 @@ INT_MODE (XI, 64);
 /* Give SVE vectors the names normally used for 256-bit vectors.
    The actual number depends on command-line flags.  */
 SVE_MODES (1, V32, V16, V8, V4)
+SVE_MODES (2, V64, V32, V16, V8)
+SVE_MODES (3, V96, V48, V24, V12)
+SVE_MODES (4, V128, V64, V32, V16)
 
 /* Quad float: 128-bit floating mode for long doubles.  */
 FLOAT_MODE (TF, 16, ieee_quad_format);
Index: gcc/config/aarch64/aarch64-protos.h
===================================================================
--- gcc/config/aarch64/aarch64-protos.h	2017-11-08 15:05:50.204852894 +0000
+++ gcc/config/aarch64/aarch64-protos.h	2017-11-08 15:06:19.687849905 +0000
@@ -432,6 +432,7 @@ rtx aarch64_simd_gen_const_vector_dup (m
 bool aarch64_simd_mem_operand_p (rtx);
 bool aarch64_sve_ld1r_operand_p (rtx);
 bool aarch64_sve_ldr_operand_p (rtx);
+bool aarch64_sve_struct_memory_operand_p (rtx);
 rtx aarch64_simd_vect_par_cnst_half (machine_mode, int, bool);
 rtx aarch64_tls_get_addr (void);
 tree aarch64_fold_builtin (tree, int, tree *, bool);
Index: gcc/config/aarch64/iterators.md
===================================================================
--- gcc/config/aarch64/iterators.md	2017-11-08 15:05:50.216852893 +0000
+++ gcc/config/aarch64/iterators.md	2017-11-08 15:06:19.690849904 +0000
@@ -249,6 +249,11 @@ (define_mode_iterator VMUL_CHANGE_NLANES
 ;; All SVE vector modes.
 (define_mode_iterator SVE_ALL [V32QI V16HI V8SI V4DI V16HF V8SF V4DF])
 
+;; All SVE vector structure modes.
+(define_mode_iterator SVE_STRUCT [V64QI V32HI V16SI V8DI V32HF V16SF V8DF
+				  V96QI V48HI V24SI V12DI V48HF V24SF V12DF
+				  V128QI V64HI V32SI V16DI V64HF V32SF V16DF])
+
 ;; All SVE vector modes that have 8-bit or 16-bit elements.
 (define_mode_iterator SVE_BH [V32QI V16HI V16HF])
 
@@ -588,7 +593,14 @@ (define_mode_attr Vetype [(V8QI "b") (V1
 (define_mode_attr Vesize [(V32QI "b")
 			  (V16HI "h") (V16HF "h")
 			  (V8SI  "w") (V8SF  "w")
-			  (V4DI  "d") (V4DF  "d")])
+			  (V4DI  "d") (V4DF  "d")
+			  (V64QI "b") (V96QI "b") (V128QI "b")
+			  (V32HI "h") (V48HI "h") (V64HI "h")
+			  (V32HF "h") (V48HF "h") (V64HF "h")
+			  (V16SI "w") (V24SI "w") (V32SI "w")
+			  (V16SF "w") (V24SF "w") (V32SF "w")
+			  (V8DI "d")  (V12DI "d") (V16DI "d")
+			  (V8DF "d")  (V12DF "d") (V16DF "d")])
 
 ;; Vetype is used everywhere in scheduling type and assembly output,
 ;; sometimes they are not the same, for example HF modes on some
@@ -946,17 +958,87 @@ (define_mode_attr insn_count [(OI "8") (
 ;; No need of iterator for -fPIC as it use got_lo12 for both modes.
 (define_mode_attr got_modifier [(SI "gotpage_lo14") (DI "gotpage_lo15")])
 
-;; The predicate mode associated with an SVE data mode.
+;; The number of subvectors in an SVE_STRUCT.
+(define_mode_attr vector_count [(V64QI "2") (V32HI "2")
+				(V16SI "2") (V8DI "2")
+				(V32HF "2") (V16SF "2") (V8DF "2")
+				(V96QI "3") (V48HI "3")
+				(V24SI "3") (V12DI "3")
+				(V48HF "3") (V24SF "3") (V12DF "3")
+				(V128QI "4") (V64HI "4")
+				(V32SI "4") (V16DI "4")
+				(V64HF "4") (V32SF "4") (V16DF "4")])
+
+;; The number of instruction bytes needed for an SVE_STRUCT move.  This is
+;; equal to vector_count * 4.
+(define_mode_attr insn_length [(V64QI "8") (V32HI "8")
+			       (V16SI "8") (V8DI "8")
+			       (V32HF "8") (V16SF "8") (V8DF "8")
+			       (V96QI "12") (V48HI "12")
+			       (V24SI "12") (V12DI "12")
+			       (V48HF "12") (V24SF "12") (V12DF "12")
+			       (V128QI "16") (V64HI "16")
+			       (V32SI "16") (V16DI "16")
+			       (V64HF "16") (V32SF "16") (V16DF "16")])
+
+;; The type of a subvector in an SVE_STRUCT.
+(define_mode_attr VSINGLE [(V64QI "V32QI") (V32HI "V16HI")
+			   (V16SI "V8SI") (V8DI "V4DI")
+			   (V32HF "V16HF") (V16SF "V8SF") (V8DF "V4DF")
+			   (V96QI "V32QI") (V48HI "V16HI")
+			   (V24SI "V8SI") (V12DI "V4DI")
+			   (V48HF "V16HF") (V24SF "V8SF") (V12DF "V4DF")
+			   (V128QI "V32QI") (V64HI "V16HI")
+			   (V32SI "V8SI") (V16DI "V4DI")
+			   (V64HF "V16HF") (V32SF "V8SF") (V16DF "V4DF")])
+
+;; ...and again in lower case.
+(define_mode_attr vsingle [(V64QI "v32qi") (V32HI "v16hi")
+			   (V16SI "v8si") (V8DI "v4di")
+			   (V32HF "v16hf") (V16SF "v8sf") (V8DF "v4df")
+			   (V96QI "v32qi") (V48HI "v16hi")
+			   (V24SI "v8si") (V12DI "v4di")
+			   (V48HF "v16hf") (V24SF "v8sf") (V12DF "v4df")
+			   (V128QI "v32qi") (V64HI "v16hi")
+			   (V32SI "v8si") (V16DI "v4di")
+			   (V64HF "v16hf") (V32SF "v8sf") (V16DF "v4df")])
+
+;; The predicate mode associated with an SVE data mode.  For structure modes
+;; this is equivalent to the <VPRED> of the subvector mode.
 (define_mode_attr VPRED [(V32QI "V32BI")
 			 (V16HI "V16BI") (V16HF "V16BI")
 			 (V8SI "V8BI") (V8SF "V8BI")
-			 (V4DI "V4BI") (V4DF "V4BI")])
+			 (V4DI "V4BI") (V4DF "V4BI")
+			 (V64QI "V32BI")
+			 (V32HI "V16BI") (V32HF "V16BI")
+			 (V16SI "V8BI") (V16SF "V8BI")
+			 (V8DI "V4BI") (V8DF "V4BI")
+			 (V96QI "V32BI")
+			 (V48HI "V16BI") (V48HF "V16BI")
+			 (V24SI "V8BI") (V24SF "V8BI")
+			 (V12DI "V4BI") (V12DF "V4BI")
+			 (V128QI "V32BI")
+			 (V64HI "V16BI") (V64HF "V16BI")
+			 (V32SI "V8BI") (V32SF "V8BI")
+			 (V16DI "V4BI") (V16DF "V4BI")])
 
 ;; ...and again in lower case.
 (define_mode_attr vpred [(V32QI "v32bi")
 			 (V16HI "v16bi") (V16HF "v16bi")
 			 (V8SI "v8bi") (V8SF "v8bi")
-			 (V4DI "v4bi") (V4DF "v4bi")])
+			 (V4DI "v4bi") (V4DF "v4bi")
+			 (V64QI "v32bi")
+			 (V32HI "v16bi") (V32HF "v16bi")
+			 (V16SI "v8bi") (V16SF "v8bi")
+			 (V8DI "v4bi") (V8DF "v4bi")
+			 (V96QI "v32bi")
+			 (V48HI "v16bi") (V48HF "v16bi")
+			 (V24SI "v8bi") (V24SF "v8bi")
+			 (V12DI "v4bi") (V12DF "v4bi")
+			 (V128QI "v32bi")
+			 (V64HI "v16bi") (V64HF "v16bi")
+			 (V32SI "v8bi") (V32SF "v8bi")
+			 (V16DI "v4bi") (V16DF "v4bi")])
 
 ;; -------------------------------------------------------------------
 ;; Code Iterators
Index: gcc/config/aarch64/constraints.md
===================================================================
--- gcc/config/aarch64/constraints.md	2017-11-08 15:05:50.215852893 +0000
+++ gcc/config/aarch64/constraints.md	2017-11-08 15:06:19.690849904 +0000
@@ -210,6 +210,12 @@ (define_memory_constraint "Utw"
   (and (match_code "mem")
        (match_test "aarch64_sve_ld1r_operand_p (op)")))
 
+(define_memory_constraint "Utx"
+  "@internal
+   An address valid for SVE structure mov patterns (as distinct from
+   LD[234] and ST[234] patterns)."
+  (match_operand 0 "aarch64_sve_struct_memory_operand"))
+
 (define_constraint "Ufc"
   "A floating point constant which can be used with an\
    FMOV immediate operation."
Index: gcc/config/aarch64/predicates.md
===================================================================
--- gcc/config/aarch64/predicates.md	2017-11-08 15:05:50.216852893 +0000
+++ gcc/config/aarch64/predicates.md	2017-11-08 15:06:19.690849904 +0000
@@ -471,6 +471,14 @@ (define_predicate "aarch64_sve_general_o
 	    (match_operand 0 "aarch64_sve_ldr_operand")
 	    (match_test "aarch64_mov_operand_p (op, mode)"))))
 
+(define_predicate "aarch64_sve_struct_memory_operand"
+  (and (match_code "mem")
+       (match_test "aarch64_sve_struct_memory_operand_p (op)")))
+
+(define_predicate "aarch64_sve_struct_nonimmediate_operand"
+  (ior (match_operand 0 "register_operand")
+       (match_operand 0 "aarch64_sve_struct_memory_operand")))
+
 ;; Doesn't include immediates, since those are handled by the move
 ;; patterns instead.
 (define_predicate "aarch64_sve_dup_operand"
Index: gcc/config/aarch64/aarch64.md
===================================================================
--- gcc/config/aarch64/aarch64.md	2017-11-08 15:05:50.214852893 +0000
+++ gcc/config/aarch64/aarch64.md	2017-11-08 15:06:19.689849905 +0000
@@ -160,6 +160,8 @@ (define_c_enum "unspec" [
     UNSPEC_PACK
     UNSPEC_FLOAT_CONVERT
     UNSPEC_WHILE_LO
+    UNSPEC_LDN
+    UNSPEC_STN
 ])
 
 (define_c_enum "unspecv" [
Index: gcc/config/aarch64/aarch64-sve.md
===================================================================
--- gcc/config/aarch64/aarch64-sve.md	2017-11-08 15:05:50.206852894 +0000
+++ gcc/config/aarch64/aarch64-sve.md	2017-11-08 15:06:19.687849905 +0000
@@ -189,6 +189,105 @@ (define_insn "maskstore<mode><vpred>"
   "st1<Vesize>\t%1.<Vetype>, %2, %0"
 )
 
+;; SVE structure moves.
+(define_expand "mov<mode>"
+  [(set (match_operand:SVE_STRUCT 0 "nonimmediate_operand")
+	(match_operand:SVE_STRUCT 1 "general_operand"))]
+  "TARGET_SVE"
+  {
+    /* Big-endian loads and stores need to be done via LD1 and ST1;
+       see the comment at the head of the file for details.  */
+    if ((MEM_P (operands[0]) || MEM_P (operands[1]))
+	&& BYTES_BIG_ENDIAN)
+      {
+	gcc_assert (can_create_pseudo_p ());
+	aarch64_expand_sve_mem_move (operands[0], operands[1], <VPRED>mode);
+	DONE;
+      }
+
+    if (CONSTANT_P (operands[1]))
+      {
+	aarch64_expand_mov_immediate (operands[0], operands[1]);
+	DONE;
+      }
+  }
+)
+
+;; Unpredicated structure moves (little-endian).
+(define_insn "*aarch64_sve_mov<mode>_le"
+  [(set (match_operand:SVE_STRUCT 0 "aarch64_sve_nonimmediate_operand" "=w, Utr, w, w")
+	(match_operand:SVE_STRUCT 1 "aarch64_sve_general_operand" "Utr, w, w, Dn"))]
+  "TARGET_SVE && !BYTES_BIG_ENDIAN"
+  "#"
+  [(set_attr "length" "<insn_length>")]
+)
+
+;; Unpredicated structure moves (big-endian).  Memory accesses require
+;; secondary reloads.
+(define_insn "*aarch64_sve_mov<mode>_be"
+  [(set (match_operand:SVE_STRUCT 0 "register_operand" "=w, w")
+	(match_operand:SVE_STRUCT 1 "aarch64_nonmemory_operand" "w, Dn"))]
+  "TARGET_SVE && BYTES_BIG_ENDIAN"
+  "#"
+  [(set_attr "length" "<insn_length>")]
+)
+
+;; Split unpredicated structure moves into pieces.  This is the same
+;; for both big-endian and little-endian code, although it only needs
+;; to handle memory operands for little-endian code.
+(define_split
+  [(set (match_operand:SVE_STRUCT 0 "aarch64_sve_nonimmediate_operand")
+	(match_operand:SVE_STRUCT 1 "aarch64_sve_general_operand"))]
+  "TARGET_SVE && reload_completed"
+  [(const_int 0)]
+  {
+    rtx dest = operands[0];
+    rtx src = operands[1];
+    if (REG_P (dest) && REG_P (src))
+      aarch64_simd_emit_reg_reg_move (operands, <VSINGLE>mode, <vector_count>);
+    else
+      for (unsigned int i = 0; i < <vector_count>; ++i)
+	{
+	  rtx subdest = simplify_gen_subreg (<VSINGLE>mode, dest, <MODE>mode,
+					     i * BYTES_PER_SVE_VECTOR);
+	  rtx subsrc = simplify_gen_subreg (<VSINGLE>mode, src, <MODE>mode,
+					    i * BYTES_PER_SVE_VECTOR);
+	  emit_insn (gen_rtx_SET (subdest, subsrc));
+	}
+    DONE;
+  }
+)
+
+;; Predicated structure moves.  This works for both endiannesses but in
+;; practice is only useful for big-endian.
+(define_insn_and_split "pred_mov<mode>"
+  [(set (match_operand:SVE_STRUCT 0 "aarch64_sve_struct_nonimmediate_operand" "=w, Utx")
+	(unspec:SVE_STRUCT
+	  [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+	   (match_operand:SVE_STRUCT 2 "aarch64_sve_struct_nonimmediate_operand" "Utx, w")]
+	  UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE
+   && (register_operand (operands[0], <MODE>mode)
+       || register_operand (operands[2], <MODE>mode))"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+  {
+    for (unsigned int i = 0; i < <vector_count>; ++i)
+      {
+	rtx subdest = simplify_gen_subreg (<VSINGLE>mode, operands[0],
+					   <MODE>mode,
+					   i * BYTES_PER_SVE_VECTOR);
+	rtx subsrc = simplify_gen_subreg (<VSINGLE>mode, operands[2],
+					  <MODE>mode,
+					  i * BYTES_PER_SVE_VECTOR);
+	aarch64_emit_sve_pred_move (subdest, operands[1], subsrc);
+      }
+    DONE;
+  }
+  [(set_attr "length" "<insn_length>")]
+)
+
 (define_expand "mov<mode>"
   [(set (match_operand:PRED_ALL 0 "nonimmediate_operand")
 	(match_operand:PRED_ALL 1 "general_operand"))]
@@ -447,6 +546,60 @@ (define_insn "*vec_series<mode>_plus"
   }
 )
 
+;; Unpredicated LD[234].
+(define_expand "vec_load_lanes<mode><vsingle>"
+  [(set (match_operand:SVE_STRUCT 0 "register_operand")
+	(unspec:SVE_STRUCT
+	  [(match_dup 2)
+	   (match_operand:SVE_STRUCT 1 "memory_operand")]
+	  UNSPEC_LDN))]
+  "TARGET_SVE"
+  {
+    operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+  }
+)
+
+;; Predicated LD[234].
+(define_insn "vec_mask_load_lanes<mode><vsingle>"
+  [(set (match_operand:SVE_STRUCT 0 "register_operand" "=w")
+	(unspec:SVE_STRUCT
+	  [(match_operand:<VPRED> 2 "register_operand" "Upl")
+	   (match_operand:SVE_STRUCT 1 "memory_operand" "m")]
+	  UNSPEC_LDN))]
+  "TARGET_SVE"
+  "ld<vector_count><Vesize>\t%0, %2/z, %1"
+)
+
+;; Unpredicated ST[234].  This is always a full update, so the dependence
+;; on the old value of the memory location (via (match_dup 0)) is redundant.
+;; There doesn't seem to be any obvious benefit to treating the all-true
+;; case differently though.  In particular, it's very unlikely that we'll
+;; only find out during RTL that a store_lanes is dead.
+(define_expand "vec_store_lanes<mode><vsingle>"
+  [(set (match_operand:SVE_STRUCT 0 "memory_operand")
+	(unspec:SVE_STRUCT
+	  [(match_dup 2)
+	   (match_operand:SVE_STRUCT 1 "register_operand")
+	   (match_dup 0)]
+	  UNSPEC_STN))]
+  "TARGET_SVE"
+  {
+    operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+  }
+)
+
+;; Predicated ST[234].
+(define_insn "vec_mask_store_lanes<mode><vsingle>"
+  [(set (match_operand:SVE_STRUCT 0 "memory_operand" "+m")
+	(unspec:SVE_STRUCT
+	  [(match_operand:<VPRED> 2 "register_operand" "Upl")
+	   (match_operand:SVE_STRUCT 1 "register_operand" "w")
+	   (match_dup 0)]
+	  UNSPEC_STN))]
+  "TARGET_SVE"
+  "st<vector_count><Vesize>\t%1, %2, %0"
+)
+
 (define_expand "vec_perm_const<mode>"
   [(match_operand:SVE_ALL 0 "register_operand")
    (match_operand:SVE_ALL 1 "register_operand")
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	2017-11-08 15:05:50.213852893 +0000
+++ gcc/config/aarch64/aarch64.c	2017-11-08 15:06:19.689849905 +0000
@@ -1179,9 +1179,15 @@ aarch64_classify_vector_mode (machine_mo
 	  || inner == DImode
 	  || inner == DFmode))
     {
-      if (TARGET_SVE
-	  && must_eq (GET_MODE_BITSIZE (mode), BITS_PER_SVE_VECTOR))
-	return VEC_SVE_DATA;
+      if (TARGET_SVE)
+	{
+	  if (must_eq (GET_MODE_BITSIZE (mode), BITS_PER_SVE_VECTOR))
+	    return VEC_SVE_DATA;
+	  if (must_eq (GET_MODE_BITSIZE (mode), BITS_PER_SVE_VECTOR * 2)
+	      || must_eq (GET_MODE_BITSIZE (mode), BITS_PER_SVE_VECTOR * 3)
+	      || must_eq (GET_MODE_BITSIZE (mode), BITS_PER_SVE_VECTOR * 4))
+	    return VEC_SVE_DATA | VEC_STRUCT;
+	}
 
       /* This includes V1DF but not V1DI (which doesn't exist).  */
       if (TARGET_SIMD
@@ -1209,6 +1215,18 @@ aarch64_sve_data_mode_p (machine_mode mo
   return aarch64_classify_vector_mode (mode) & VEC_SVE_DATA;
 }
 
+/* Implement target hook TARGET_ARRAY_MODE.  */
+static opt_machine_mode
+aarch64_array_mode (machine_mode mode, unsigned HOST_WIDE_INT nelems)
+{
+  if (aarch64_classify_vector_mode (mode) == VEC_SVE_DATA
+      && IN_RANGE (nelems, 2, 4))
+    return mode_for_vector (GET_MODE_INNER (mode),
+			    GET_MODE_NUNITS (mode) * nelems);
+
+  return opt_machine_mode ();
+}
+
 /* Implement target hook TARGET_ARRAY_MODE_SUPPORTED_P.  */
 static bool
 aarch64_array_mode_supported_p (machine_mode mode,
@@ -5677,6 +5695,18 @@ aarch64_classify_address (struct aarch64
 		    ? offset_4bit_signed_scaled_p (mode, offset)
 		    : offset_9bit_signed_scaled_p (mode, offset));
 
+	  if (vec_flags == (VEC_SVE_DATA | VEC_STRUCT))
+	    {
+	      poly_int64 end_offset = (offset
+				       + GET_MODE_SIZE (mode)
+				       - BYTES_PER_SVE_VECTOR);
+	      return (type == ADDR_QUERY_M
+		      ? offset_4bit_signed_scaled_p (mode, offset)
+		      : (offset_9bit_signed_scaled_p (SVE_BYTE_MODE, offset)
+			 && offset_9bit_signed_scaled_p (SVE_BYTE_MODE,
+							 end_offset)));
+	    }
+
 	  if (vec_flags == VEC_SVE_PRED)
 	    return offset_9bit_signed_scaled_p (mode, offset);
 
@@ -6391,6 +6421,20 @@ aarch64_print_vector_float_operand (FILE
   return true;
 }
 
+/* Return the equivalent letter for size.  */
+static char
+sizetochar (int size)
+{
+  switch (size)
+    {
+    case 64: return 'd';
+    case 32: return 's';
+    case 16: return 'h';
+    case 8 : return 'b';
+    default: gcc_unreachable ();
+    }
+}
+
 /* Print operand X to file F in a target specific manner according to CODE.
    The acceptable formatting commands given by CODE are:
      'c':		An integer or symbol address without a preceding #
@@ -6674,7 +6718,18 @@ aarch64_print_operand (FILE *f, rtx x, i
 	{
 	case REG:
 	  if (aarch64_sve_data_mode_p (GET_MODE (x)))
-	    asm_fprintf (f, "z%d", REGNO (x) - V0_REGNUM);
+	    {
+	      if (REG_NREGS (x) == 1)
+		asm_fprintf (f, "z%d", REGNO (x) - V0_REGNUM);
+	      else
+		{
+		  char suffix
+		    = sizetochar (GET_MODE_UNIT_BITSIZE (GET_MODE (x)));
+		  asm_fprintf (f, "{z%d.%c - z%d.%c}",
+			       REGNO (x) - V0_REGNUM, suffix,
+			       END_REGNO (x) - V0_REGNUM - 1, suffix);
+		}
+	    }
 	  else
 	    asm_fprintf (f, "%s", reg_names [REGNO (x)]);
 	  break;
@@ -12825,20 +12880,6 @@ aarch64_final_prescan_insn (rtx_insn *in
 }
 
 
-/* Return the equivalent letter for size.  */
-static char
-sizetochar (int size)
-{
-  switch (size)
-    {
-    case 64: return 'd';
-    case 32: return 's';
-    case 16: return 'h';
-    case 8 : return 'b';
-    default: gcc_unreachable ();
-    }
-}
-
 /* Return true if BASE_OR_STEP is a valid immediate operand for an SVE INDEX
    instruction.  */
 
@@ -13432,6 +13473,28 @@ aarch64_sve_ldr_operand_p (rtx op)
 	  && addr.type == ADDRESS_REG_IMM);
 }
 
+/* Return true if OP is a valid MEM operand for an SVE_STRUCT mode.
+   We need to be able to access the individual pieces, so the range
+   is different from LD[234] and ST[234].  */
+bool
+aarch64_sve_struct_memory_operand_p (rtx op)
+{
+  if (!MEM_P (op))
+    return false;
+
+  machine_mode mode = GET_MODE (op);
+  struct aarch64_address_info addr;
+  if (!aarch64_classify_address (&addr, XEXP (op, 0), SVE_BYTE_MODE, false,
+				 ADDR_QUERY_ANY)
+      || addr.type != ADDRESS_REG_IMM)
+    return false;
+
+  poly_int64 first = addr.const_offset;
+  poly_int64 last = first + GET_MODE_SIZE (mode) - BYTES_PER_SVE_VECTOR;
+  return (offset_4bit_signed_scaled_p (SVE_BYTE_MODE, first)
+	  && offset_4bit_signed_scaled_p (SVE_BYTE_MODE, last));
+}
+
 /* Emit a register copy from operand to operand, taking care not to
    early-clobber source registers in the process.
 
@@ -17542,6 +17605,9 @@ #define TARGET_VECTOR_MODE_SUPPORTED_P a
 #define TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT \
   aarch64_builtin_support_vector_misalignment
 
+#undef TARGET_ARRAY_MODE
+#define TARGET_ARRAY_MODE aarch64_array_mode
+
 #undef TARGET_ARRAY_MODE_SUPPORTED_P
 #define TARGET_ARRAY_MODE_SUPPORTED_P aarch64_array_mode_supported_p
 


* [4/4] [AArch64] Tests for SVE structure modes
  2017-11-08 15:12 [0/4] Add SVE support for load/store_lanes Richard Sandiford
                   ` (2 preceding siblings ...)
  2017-11-08 15:18 ` [3/4] load/store_lanes testsuite markup Richard Sandiford
@ 2017-11-08 15:18 ` Richard Sandiford
  2018-01-06 19:46   ` Richard Sandiford
  3 siblings, 1 reply; 9+ messages in thread
From: Richard Sandiford @ 2017-11-08 15:18 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft

This patch adds tests for the SVE structure mode move patterns
and for LD[234] and ST[234] vectorisation.


2017-11-08  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/testsuite/
	* gcc.target/aarch64/sve_struct_move_1.c: New test.
	* gcc.target/aarch64/sve_struct_move_2.c: Likewise.
	* gcc.target/aarch64/sve_struct_move_3.c: Likewise.
	* gcc.target/aarch64/sve_struct_move_4.c: Likewise.
	* gcc.target/aarch64/sve_struct_move_5.c: Likewise.
	* gcc.target/aarch64/sve_struct_move_6.c: Likewise.
	* gcc.target/aarch64/sve_struct_vect_1.c: Likewise.
	* gcc.target/aarch64/sve_struct_vect_1_run.c: Likewise.
	* gcc.target/aarch64/sve_struct_vect_2.c: Likewise.
	* gcc.target/aarch64/sve_struct_vect_2_run.c: Likewise.
	* gcc.target/aarch64/sve_struct_vect_3.c: Likewise.
	* gcc.target/aarch64/sve_struct_vect_3_run.c: Likewise.
	* gcc.target/aarch64/sve_struct_vect_4.c: Likewise.
	* gcc.target/aarch64/sve_struct_vect_4_run.c: Likewise.
	* gcc.target/aarch64/sve_struct_vect_5.c: Likewise.
	* gcc.target/aarch64/sve_struct_vect_5_run.c: Likewise.
	* gcc.target/aarch64/sve_struct_vect_6.c: Likewise.
	* gcc.target/aarch64/sve_struct_vect_6_run.c: Likewise.
	* gcc.target/aarch64/sve_struct_vect_7.c: Likewise.
	* gcc.target/aarch64/sve_struct_vect_7_run.c: Likewise.
	* gcc.target/aarch64/sve_struct_vect_8.c: Likewise.
	* gcc.target/aarch64/sve_struct_vect_8_run.c: Likewise.
	* gcc.target/aarch64/sve_struct_vect_9.c: Likewise.
	* gcc.target/aarch64/sve_struct_vect_9_run.c: Likewise.
	* gcc.target/aarch64/sve_struct_vect_10.c: Likewise.
	* gcc.target/aarch64/sve_struct_vect_10_run.c: Likewise.
	* gcc.target/aarch64/sve_struct_vect_11.c: Likewise.
	* gcc.target/aarch64/sve_struct_vect_11_run.c: Likewise.
	* gcc.target/aarch64/sve_struct_vect_12.c: Likewise.
	* gcc.target/aarch64/sve_struct_vect_12_run.c: Likewise.
	* gcc.target/aarch64/sve_struct_vect_13.c: Likewise.
	* gcc.target/aarch64/sve_struct_vect_13_run.c: Likewise.
	* gcc.target/aarch64/sve_struct_vect_14.c: Likewise.
	* gcc.target/aarch64/sve_struct_vect_15.c: Likewise.
	* gcc.target/aarch64/sve_struct_vect_16.c: Likewise.
	* gcc.target/aarch64/sve_struct_vect_17.c: Likewise.

Index: gcc/testsuite/gcc.target/aarch64/sve_struct_move_1.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_move_1.c	2017-11-08 15:06:27.247849138 +0000
@@ -0,0 +1,129 @@
+/* { dg-do assemble } */
+/* { dg-options "-O -march=armv8-a+sve -msve-vector-bits=256 -mbig-endian --save-temps" } */
+
+typedef char v32qi __attribute__((vector_size(32)));
+typedef struct { v32qi a[2]; } v64qi;
+
+typedef short v16hi __attribute__((vector_size(32)));
+typedef struct { v16hi a[2]; } v32hi;
+
+typedef int v8si __attribute__((vector_size(32)));
+typedef struct { v8si a[2]; } v16si;
+
+typedef long v4di __attribute__((vector_size(32)));
+typedef struct { v4di a[2]; } v8di;
+
+typedef _Float16 v16hf __attribute__((vector_size(32)));
+typedef struct { v16hf a[2]; } v32hf;
+
+typedef float v8sf __attribute__((vector_size(32)));
+typedef struct { v8sf a[2]; } v16sf;
+
+typedef double v4df __attribute__((vector_size(32)));
+typedef struct { v4df a[2]; } v8df;
+
+#define TEST_TYPE(TYPE, REG1, REG2)			\
+  void							\
+  f1_##TYPE (TYPE *a)					\
+  {							\
+    register TYPE x asm (#REG1) = a[0];			\
+    asm volatile ("# test " #TYPE " 1 %S0" :: "w" (x));	\
+    register TYPE y asm (#REG2) = x;			\
+    asm volatile ("# test " #TYPE " 2 %S0, %S1, %S2"	\
+		  : "=&w" (x) : "0" (x), "w" (y));	\
+    a[1] = x;						\
+  }							\
+  /* This must compile, but we don't care how.  */	\
+  void							\
+  f2_##TYPE (TYPE *a)					\
+  {							\
+    TYPE x = a[0];					\
+    x.a[0][3] = 1;					\
+    x.a[1][2] = 12;					\
+    asm volatile ("# %0" :: "w" (x));			\
+  }							\
+  void							\
+  f3_##TYPE (TYPE *a, int i)				\
+  {							\
+    TYPE x = a[0];					\
+    x.a[0][i] = 1;					\
+    asm volatile ("# %0" :: "w" (x));			\
+  }							\
+  void							\
+  f4_##TYPE (TYPE *a, int i, int j)			\
+  {							\
+    TYPE x = a[0];					\
+    x.a[i][j] = 44;					\
+    asm volatile ("# %0" :: "w" (x));			\
+  }
+
+TEST_TYPE (v64qi, z0, z2)
+TEST_TYPE (v32hi, z5, z7)
+TEST_TYPE (v16si, z10, z12)
+TEST_TYPE (v8di, z15, z17)
+TEST_TYPE (v32hf, z18, z20)
+TEST_TYPE (v16sf, z21, z23)
+TEST_TYPE (v8df, z28, z30)
+
+/* { dg-final { scan-assembler {\tld1b\tz0.b, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1b\tz1.b, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v64qi 1 z0\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz2.d, z0.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz3.d, z1.d\n} } } */
+/* { dg-final { scan-assembler { test v64qi 2 z0, z0, z2\n} } } */
+/* { dg-final { scan-assembler {\tst1b\tz0.b, p[0-7], \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1b\tz1.b, p[0-7], \[x0, #3, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1h\tz5.h, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz6.h, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v32hi 1 z5\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz7.d, z5.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz8.d, z6.d\n} } } */
+/* { dg-final { scan-assembler { test v32hi 2 z5, z5, z7\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz5.h, p[0-7], \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz6.h, p[0-7], \[x0, #3, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1w\tz10.s, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz11.s, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v16si 1 z10\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz12.d, z10.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz13.d, z11.d\n} } } */
+/* { dg-final { scan-assembler { test v16si 2 z10, z10, z12\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz10.s, p[0-7], \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz11.s, p[0-7], \[x0, #3, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1d\tz15.d, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\tz16.d, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v8di 1 z15\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz17.d, z15.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz18.d, z16.d\n} } } */
+/* { dg-final { scan-assembler { test v8di 2 z15, z15, z17\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz15.d, p[0-7], \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz16.d, p[0-7], \[x0, #3, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1h\tz18.h, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz19.h, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v32hf 1 z18\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz20.d, z18.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz21.d, z19.d\n} } } */
+/* { dg-final { scan-assembler { test v32hf 2 z18, z18, z20\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz18.h, p[0-7], \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz19.h, p[0-7], \[x0, #3, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1w\tz21.s, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz22.s, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v16sf 1 z21\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz23.d, z21.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz24.d, z22.d\n} } } */
+/* { dg-final { scan-assembler { test v16sf 2 z21, z21, z23\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz21.s, p[0-7], \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz22.s, p[0-7], \[x0, #3, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1d\tz28.d, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\tz29.d, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v8df 1 z28\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz30.d, z28.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz31.d, z29.d\n} } } */
+/* { dg-final { scan-assembler { test v8df 2 z28, z28, z30\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz28.d, p[0-7], \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz29.d, p[0-7], \[x0, #3, mul vl\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_move_2.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_move_2.c	2017-11-08 15:06:27.247849138 +0000
@@ -0,0 +1,127 @@
+/* { dg-do assemble } */
+/* { dg-options "-O -march=armv8-a+sve -msve-vector-bits=256 -mbig-endian --save-temps" } */
+
+typedef char v32qi __attribute__((vector_size(32)));
+typedef struct { v32qi a[3]; } v96qi;
+
+typedef short v16hi __attribute__((vector_size(32)));
+typedef struct { v16hi a[3]; } v48hi;
+
+typedef int v8si __attribute__((vector_size(32)));
+typedef struct { v8si a[3]; } v24si;
+
+typedef long v4di __attribute__((vector_size(32)));
+typedef struct { v4di a[3]; } v12di;
+
+typedef _Float16 v16hf __attribute__((vector_size(32)));
+typedef struct { v16hf a[3]; } v48hf;
+
+typedef float v8sf __attribute__((vector_size(32)));
+typedef struct { v8sf a[3]; } v24sf;
+
+typedef double v4df __attribute__((vector_size(32)));
+typedef struct { v4df a[3]; } v12df;
+
+#define TEST_TYPE(TYPE, REG1, REG2) \
+  void \
+  f_##TYPE (TYPE *a) \
+  { \
+    register TYPE x asm (#REG1) = a[0]; \
+    asm volatile ("# test " #TYPE " 1 %S0" :: "w" (x)); \
+    register TYPE y asm (#REG2) = x; \
+    asm volatile ("# test " #TYPE " 2 %S0, %S1, %S2" \
+		  : "=&w" (x) : "0" (x), "w" (y)); \
+    a[1] = x; \
+  }
+
+TEST_TYPE (v96qi, z0, z3)
+TEST_TYPE (v48hi, z6, z2)
+TEST_TYPE (v24si, z12, z15)
+TEST_TYPE (v12di, z16, z13)
+TEST_TYPE (v48hf, z18, z1)
+TEST_TYPE (v24sf, z20, z23)
+TEST_TYPE (v12df, z26, z29)
+
+/* { dg-final { scan-assembler {\tld1b\tz0.b, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1b\tz1.b, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1b\tz2.b, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v96qi 1 z0\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz3.d, z0.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz4.d, z1.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz5.d, z2.d\n} } } */
+/* { dg-final { scan-assembler { test v96qi 2 z0, z0, z3\n} } } */
+/* { dg-final { scan-assembler {\tst1b\tz0.b, p[0-7], \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1b\tz1.b, p[0-7], \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1b\tz2.b, p[0-7], \[x0, #5, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1h\tz6.h, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz7.h, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz8.h, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v48hi 1 z6\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz2.d, z6.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz3.d, z7.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz4.d, z8.d\n} } } */
+/* { dg-final { scan-assembler { test v48hi 2 z6, z6, z2\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz6.h, p[0-7], \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz7.h, p[0-7], \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz8.h, p[0-7], \[x0, #5, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1w\tz12.s, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz13.s, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz14.s, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v24si 1 z12\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz15.d, z12.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz16.d, z13.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz17.d, z14.d\n} } } */
+/* { dg-final { scan-assembler { test v24si 2 z12, z12, z15\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz12.s, p[0-7], \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz13.s, p[0-7], \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz14.s, p[0-7], \[x0, #5, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1d\tz16.d, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\tz17.d, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\tz18.d, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v12di 1 z16\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz13.d, z16.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz14.d, z17.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz15.d, z18.d\n} } } */
+/* { dg-final { scan-assembler { test v12di 2 z16, z16, z13\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz16.d, p[0-7], \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz17.d, p[0-7], \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz18.d, p[0-7], \[x0, #5, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1h\tz18.h, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz19.h, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz20.h, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v48hf 1 z18\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz1.d, z18.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz2.d, z19.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz3.d, z20.d\n} } } */
+/* { dg-final { scan-assembler { test v48hf 2 z18, z18, z1\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz18.h, p[0-7], \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz19.h, p[0-7], \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz20.h, p[0-7], \[x0, #5, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1w\tz20.s, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz21.s, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz22.s, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v24sf 1 z20\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz23.d, z20.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz24.d, z21.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz25.d, z22.d\n} } } */
+/* { dg-final { scan-assembler { test v24sf 2 z20, z20, z23\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz20.s, p[0-7], \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz21.s, p[0-7], \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz22.s, p[0-7], \[x0, #5, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1d\tz26.d, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\tz27.d, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\tz28.d, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v12df 1 z26\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz29.d, z26.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz30.d, z27.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz31.d, z28.d\n} } } */
+/* { dg-final { scan-assembler { test v12df 2 z26, z26, z29\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz26.d, p[0-7], \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz27.d, p[0-7], \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz28.d, p[0-7], \[x0, #5, mul vl\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_move_3.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_move_3.c	2017-11-08 15:06:27.247849138 +0000
@@ -0,0 +1,148 @@
+/* { dg-do assemble } */
+/* { dg-options "-O -march=armv8-a+sve -msve-vector-bits=256 -mbig-endian --save-temps" } */
+
+typedef char v32qi __attribute__((vector_size(32)));
+typedef struct { v32qi a[4]; } v128qi;
+
+typedef short v16hi __attribute__((vector_size(32)));
+typedef struct { v16hi a[4]; } v64hi;
+
+typedef int v8si __attribute__((vector_size(32)));
+typedef struct { v8si a[4]; } v32si;
+
+typedef long v4di __attribute__((vector_size(32)));
+typedef struct { v4di a[4]; } v16di;
+
+typedef _Float16 v16hf __attribute__((vector_size(32)));
+typedef struct { v16hf a[4]; } v64hf;
+
+typedef float v8sf __attribute__((vector_size(32)));
+typedef struct { v8sf a[4]; } v32sf;
+
+typedef double v4df __attribute__((vector_size(32)));
+typedef struct { v4df a[4]; } v16df;
+
+#define TEST_TYPE(TYPE, REG1, REG2) \
+  void \
+  f_##TYPE (TYPE *a) \
+  { \
+    register TYPE x asm (#REG1) = a[0]; \
+    asm volatile ("# test " #TYPE " 1 %S0" :: "w" (x)); \
+    register TYPE y asm (#REG2) = x; \
+    asm volatile ("# test " #TYPE " 2 %S0, %S1, %S2" \
+		  : "=&w" (x) : "0" (x), "w" (y)); \
+    a[1] = x; \
+  }
+
+TEST_TYPE (v128qi, z0, z4)
+TEST_TYPE (v64hi, z6, z2)
+TEST_TYPE (v32si, z12, z16)
+TEST_TYPE (v16di, z17, z13)
+TEST_TYPE (v64hf, z18, z1)
+TEST_TYPE (v32sf, z20, z16)
+TEST_TYPE (v16df, z24, z28)
+
+/* { dg-final { scan-assembler {\tld1b\tz0.b, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1b\tz1.b, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1b\tz2.b, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1b\tz3.b, p[0-7]/z, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v128qi 1 z0\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz4.d, z0.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz5.d, z1.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz6.d, z2.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz7.d, z3.d\n} } } */
+/* { dg-final { scan-assembler { test v128qi 2 z0, z0, z4\n} } } */
+/* { dg-final { scan-assembler {\tst1b\tz0.b, p[0-7], \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1b\tz1.b, p[0-7], \[x0, #5, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1b\tz2.b, p[0-7], \[x0, #6, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1b\tz3.b, p[0-7], \[x0, #7, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1h\tz6.h, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz7.h, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz8.h, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz9.h, p[0-7]/z, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v64hi 1 z6\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz2.d, z6.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz3.d, z7.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz4.d, z8.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz5.d, z9.d\n} } } */
+/* { dg-final { scan-assembler { test v64hi 2 z6, z6, z2\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz6.h, p[0-7], \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz7.h, p[0-7], \[x0, #5, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz8.h, p[0-7], \[x0, #6, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz9.h, p[0-7], \[x0, #7, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1w\tz12.s, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz13.s, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz14.s, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz15.s, p[0-7]/z, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v32si 1 z12\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz16.d, z12.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz17.d, z13.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz18.d, z14.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz19.d, z15.d\n} } } */
+/* { dg-final { scan-assembler { test v32si 2 z12, z12, z16\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz12.s, p[0-7], \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz13.s, p[0-7], \[x0, #5, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz14.s, p[0-7], \[x0, #6, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz15.s, p[0-7], \[x0, #7, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1d\tz17.d, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\tz18.d, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\tz19.d, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\tz20.d, p[0-7]/z, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v16di 1 z17\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz13.d, z17.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz14.d, z18.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz15.d, z19.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz16.d, z20.d\n} } } */
+/* { dg-final { scan-assembler { test v16di 2 z17, z17, z13\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz17.d, p[0-7], \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz18.d, p[0-7], \[x0, #5, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz19.d, p[0-7], \[x0, #6, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz20.d, p[0-7], \[x0, #7, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1h\tz18.h, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz19.h, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz20.h, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz21.h, p[0-7]/z, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v64hf 1 z18\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz1.d, z18.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz2.d, z19.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz3.d, z20.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz4.d, z21.d\n} } } */
+/* { dg-final { scan-assembler { test v64hf 2 z18, z18, z1\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz18.h, p[0-7], \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz19.h, p[0-7], \[x0, #5, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz20.h, p[0-7], \[x0, #6, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz21.h, p[0-7], \[x0, #7, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1w\tz20.s, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz21.s, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz22.s, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz23.s, p[0-7]/z, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v32sf 1 z20\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz16.d, z20.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz17.d, z21.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz18.d, z22.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz19.d, z23.d\n} } } */
+/* { dg-final { scan-assembler { test v32sf 2 z20, z20, z16\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz20.s, p[0-7], \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz21.s, p[0-7], \[x0, #5, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz22.s, p[0-7], \[x0, #6, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz23.s, p[0-7], \[x0, #7, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1d\tz24.d, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\tz25.d, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\tz26.d, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\tz27.d, p[0-7]/z, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v16df 1 z24\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz28.d, z24.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz29.d, z25.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz30.d, z26.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz31.d, z27.d\n} } } */
+/* { dg-final { scan-assembler { test v16df 2 z24, z24, z28\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz24.d, p[0-7], \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz25.d, p[0-7], \[x0, #5, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz26.d, p[0-7], \[x0, #6, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz27.d, p[0-7], \[x0, #7, mul vl\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_move_4.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_move_4.c	2017-11-08 15:06:27.247849138 +0000
@@ -0,0 +1,116 @@
+/* { dg-do assemble } */
+/* { dg-options "-O -march=armv8-a+sve -msve-vector-bits=256 -mlittle-endian --save-temps" } */
+
+typedef char v32qi __attribute__((vector_size(32)));
+typedef struct { v32qi a[2]; } v64qi;
+
+typedef short v16hi __attribute__((vector_size(32)));
+typedef struct { v16hi a[2]; } v32hi;
+
+typedef int v8si __attribute__((vector_size(32)));
+typedef struct { v8si a[2]; } v16si;
+
+typedef long v4di __attribute__((vector_size(32)));
+typedef struct { v4di a[2]; } v8di;
+
+typedef float v8sf __attribute__((vector_size(32)));
+typedef struct { v8sf a[2]; } v16sf;
+
+typedef double v4df __attribute__((vector_size(32)));
+typedef struct { v4df a[2]; } v8df;
+
+#define TEST_TYPE(TYPE, REG1, REG2)			\
+  void							\
+  f1_##TYPE (TYPE *a)					\
+  {							\
+    register TYPE x asm (#REG1) = a[0];			\
+    asm volatile ("# test " #TYPE " 1 %S0" :: "w" (x));	\
+    register TYPE y asm (#REG2) = x;			\
+    asm volatile ("# test " #TYPE " 2 %S0, %S1, %S2"	\
+		  : "=&w" (x) : "0" (x), "w" (y));	\
+    a[1] = x;						\
+  }							\
+  /* This must compile, but we don't care how.  */	\
+  void							\
+  f2_##TYPE (TYPE *a)					\
+  {							\
+    TYPE x = a[0];					\
+    x.a[0][3] = 1;					\
+    x.a[1][2] = 12;					\
+    asm volatile ("# %0" :: "w" (x));			\
+  }							\
+  void							\
+  f3_##TYPE (TYPE *a, int i)				\
+  {							\
+    TYPE x = a[0];					\
+    x.a[0][i] = 1;					\
+    asm volatile ("# %0" :: "w" (x));			\
+  }							\
+  void							\
+  f4_##TYPE (TYPE *a, int i, int j)			\
+  {							\
+    TYPE x = a[0];					\
+    x.a[i][j] = 44;					\
+    asm volatile ("# %0" :: "w" (x));			\
+  }
+
+TEST_TYPE (v64qi, z0, z2)
+TEST_TYPE (v32hi, z5, z7)
+TEST_TYPE (v16si, z10, z12)
+TEST_TYPE (v8di, z15, z17)
+TEST_TYPE (v16sf, z20, z23)
+TEST_TYPE (v8df, z28, z30)
+
+/* { dg-final { scan-assembler {\tldr\tz0, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz1, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v64qi 1 z0\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz2.d, z0.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz3.d, z1.d\n} } } */
+/* { dg-final { scan-assembler { test v64qi 2 z0, z0, z2\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz0, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz1, \[x0, #3, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tldr\tz5, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz6, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v32hi 1 z5\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz7.d, z5.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz8.d, z6.d\n} } } */
+/* { dg-final { scan-assembler { test v32hi 2 z5, z5, z7\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz5, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz6, \[x0, #3, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tldr\tz10, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz11, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v16si 1 z10\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz12.d, z10.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz13.d, z11.d\n} } } */
+/* { dg-final { scan-assembler { test v16si 2 z10, z10, z12\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz10, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz11, \[x0, #3, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tldr\tz15, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz16, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v8di 1 z15\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz17.d, z15.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz18.d, z16.d\n} } } */
+/* { dg-final { scan-assembler { test v8di 2 z15, z15, z17\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz15, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz16, \[x0, #3, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tldr\tz20, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz21, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v16sf 1 z20\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz23.d, z20.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz24.d, z21.d\n} } } */
+/* { dg-final { scan-assembler { test v16sf 2 z20, z20, z23\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz20, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz21, \[x0, #3, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tldr\tz28, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz29, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v8df 1 z28\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz30.d, z28.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz31.d, z29.d\n} } } */
+/* { dg-final { scan-assembler { test v8df 2 z28, z28, z30\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz28, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz29, \[x0, #3, mul vl\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_move_5.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_move_5.c	2017-11-08 15:06:27.248849138 +0000
@@ -0,0 +1,111 @@
+/* { dg-do assemble } */
+/* { dg-options "-O -march=armv8-a+sve -msve-vector-bits=256 -mlittle-endian --save-temps" } */
+
+typedef char v32qi __attribute__((vector_size(32)));
+typedef struct { v32qi a[3]; } v96qi;
+
+typedef short v16hi __attribute__((vector_size(32)));
+typedef struct { v16hi a[3]; } v48hi;
+
+typedef int v8si __attribute__((vector_size(32)));
+typedef struct { v8si a[3]; } v24si;
+
+typedef long v4di __attribute__((vector_size(32)));
+typedef struct { v4di a[3]; } v12di;
+
+typedef float v8sf __attribute__((vector_size(32)));
+typedef struct { v8sf a[3]; } v24sf;
+
+typedef double v4df __attribute__((vector_size(32)));
+typedef struct { v4df a[3]; } v12df;
+
+#define TEST_TYPE(TYPE, REG1, REG2) \
+  void \
+  f_##TYPE (TYPE *a) \
+  { \
+    register TYPE x asm (#REG1) = a[0]; \
+    asm volatile ("# test " #TYPE " 1 %S0" :: "w" (x)); \
+    register TYPE y asm (#REG2) = x; \
+    asm volatile ("# test " #TYPE " 2 %S0, %S1, %S2" \
+		  : "=&w" (x) : "0" (x), "w" (y)); \
+    a[1] = x; \
+  }
+
+TEST_TYPE (v96qi, z0, z3)
+TEST_TYPE (v48hi, z6, z2)
+TEST_TYPE (v24si, z12, z15)
+TEST_TYPE (v12di, z16, z13)
+TEST_TYPE (v24sf, z20, z23)
+TEST_TYPE (v12df, z26, z29)
+
+/* { dg-final { scan-assembler {\tldr\tz0, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz1, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz2, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v96qi 1 z0\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz3.d, z0.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz4.d, z1.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz5.d, z2.d\n} } } */
+/* { dg-final { scan-assembler { test v96qi 2 z0, z0, z3\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz0, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz1, \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz2, \[x0, #5, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tldr\tz6, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz7, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz8, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v48hi 1 z6\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz2.d, z6.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz3.d, z7.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz4.d, z8.d\n} } } */
+/* { dg-final { scan-assembler { test v48hi 2 z6, z6, z2\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz6, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz7, \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz8, \[x0, #5, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tldr\tz12, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz13, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz14, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v24si 1 z12\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz15.d, z12.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz16.d, z13.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz17.d, z14.d\n} } } */
+/* { dg-final { scan-assembler { test v24si 2 z12, z12, z15\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz12, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz13, \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz14, \[x0, #5, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tldr\tz16, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz17, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz18, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v12di 1 z16\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz13.d, z16.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz14.d, z17.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz15.d, z18.d\n} } } */
+/* { dg-final { scan-assembler { test v12di 2 z16, z16, z13\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz16, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz17, \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz18, \[x0, #5, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tldr\tz20, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz21, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz22, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v24sf 1 z20\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz23.d, z20.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz24.d, z21.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz25.d, z22.d\n} } } */
+/* { dg-final { scan-assembler { test v24sf 2 z20, z20, z23\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz20, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz21, \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz22, \[x0, #5, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tldr\tz26, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz27, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz28, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v12df 1 z26\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz29.d, z26.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz30.d, z27.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz31.d, z28.d\n} } } */
+/* { dg-final { scan-assembler { test v12df 2 z26, z26, z29\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz26, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz27, \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz28, \[x0, #5, mul vl\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_move_6.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_move_6.c	2017-11-08 15:06:27.248849138 +0000
@@ -0,0 +1,129 @@
+/* { dg-do assemble } */
+/* { dg-options "-O -march=armv8-a+sve -msve-vector-bits=256 -mlittle-endian --save-temps" } */
+
+typedef char v32qi __attribute__((vector_size(32)));
+typedef struct { v32qi a[4]; } v128qi;
+
+typedef short v16hi __attribute__((vector_size(32)));
+typedef struct { v16hi a[4]; } v64hi;
+
+typedef int v8si __attribute__((vector_size(32)));
+typedef struct { v8si a[4]; } v32si;
+
+typedef long v4di __attribute__((vector_size(32)));
+typedef struct { v4di a[4]; } v16di;
+
+typedef float v8sf __attribute__((vector_size(32)));
+typedef struct { v8sf a[4]; } v32sf;
+
+typedef double v4df __attribute__((vector_size(32)));
+typedef struct { v4df a[4]; } v16df;
+
+#define TEST_TYPE(TYPE, REG1, REG2) \
+  void \
+  f_##TYPE (TYPE *a) \
+  { \
+    register TYPE x asm (#REG1) = a[0]; \
+    asm volatile ("# test " #TYPE " 1 %S0" :: "w" (x)); \
+    register TYPE y asm (#REG2) = x; \
+    asm volatile ("# test " #TYPE " 2 %S0, %S1, %S2" \
+		  : "=&w" (x) : "0" (x), "w" (y)); \
+    a[1] = x; \
+  }
+
+TEST_TYPE (v128qi, z0, z4)
+TEST_TYPE (v64hi, z6, z2)
+TEST_TYPE (v32si, z12, z16)
+TEST_TYPE (v16di, z17, z13)
+TEST_TYPE (v32sf, z20, z16)
+TEST_TYPE (v16df, z24, z28)
+
+/* { dg-final { scan-assembler {\tldr\tz0, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz1, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz2, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz3, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v128qi 1 z0\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz4.d, z0.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz5.d, z1.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz6.d, z2.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz7.d, z3.d\n} } } */
+/* { dg-final { scan-assembler { test v128qi 2 z0, z0, z4\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz0, \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz1, \[x0, #5, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz2, \[x0, #6, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz3, \[x0, #7, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tldr\tz6, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz7, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz8, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz9, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v64hi 1 z6\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz2.d, z6.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz3.d, z7.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz4.d, z8.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz5.d, z9.d\n} } } */
+/* { dg-final { scan-assembler { test v64hi 2 z6, z6, z2\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz6, \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz7, \[x0, #5, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz8, \[x0, #6, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz9, \[x0, #7, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tldr\tz12, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz13, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz14, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz15, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v32si 1 z12\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz16.d, z12.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz17.d, z13.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz18.d, z14.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz19.d, z15.d\n} } } */
+/* { dg-final { scan-assembler { test v32si 2 z12, z12, z16\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz12, \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz13, \[x0, #5, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz14, \[x0, #6, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz15, \[x0, #7, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tldr\tz17, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz18, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz19, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz20, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v16di 1 z17\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz13.d, z17.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz14.d, z18.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz15.d, z19.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz16.d, z20.d\n} } } */
+/* { dg-final { scan-assembler { test v16di 2 z17, z17, z13\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz17, \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz18, \[x0, #5, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz19, \[x0, #6, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz20, \[x0, #7, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tldr\tz20, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz21, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz22, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz23, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v32sf 1 z20\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz16.d, z20.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz17.d, z21.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz18.d, z22.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz19.d, z23.d\n} } } */
+/* { dg-final { scan-assembler { test v32sf 2 z20, z20, z16\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz20, \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz21, \[x0, #5, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz22, \[x0, #6, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz23, \[x0, #7, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tldr\tz24, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz25, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz26, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz27, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v16df 1 z24\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz28.d, z24.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz29.d, z25.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz30.d, z26.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz31.d, z27.d\n} } } */
+/* { dg-final { scan-assembler { test v16df 2 z24, z24, z28\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz24, \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz25, \[x0, #5, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz26, \[x0, #6, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz27, \[x0, #7, mul vl\]\n} } } */
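The scan-assembler lines in sve_struct_move_5.c and sve_struct_move_6.c follow a regular pattern: a tuple of NVECS vectors pinned to zB occupies zB..zB+NVECS-1, a[0] is loaded from offsets 0..NVECS-1 (in units of VL), and a[1] is stored at offsets NVECS..2*NVECS-1. The following is a minimal sketch of that pattern in plain C. The names tuple_reg, store_offset and print_expected are hypothetical helpers for illustration only and are not part of the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: number of the register holding vector IDX of a
   tuple whose first vector lives in z<BASE>.  */
int
tuple_reg (int base, int idx)
{
  assert (base >= 0 && idx >= 0 && base + idx <= 31);
  return base + idx;
}

/* Hypothetical helper: VL-scaled offset used to store vector IDX of a[1].
   a[1] starts NVECS vectors after a[0], so the store offsets continue
   where the load offsets stop.  */
int
store_offset (int nvecs, int idx)
{
  assert (nvecs >= 2 && nvecs <= 4 && idx >= 0 && idx < nvecs);
  return nvecs + idx;
}

/* Print the instructions the tests look for when x is pinned to z<SRC>
   and y is pinned to z<DST>.  */
void
print_expected (FILE *out, int src, int dst, int nvecs)
{
  for (int i = 0; i < nvecs; ++i)
    if (i == 0)
      fprintf (out, "ldr\tz%d, [x0]\n", tuple_reg (src, i));
    else
      fprintf (out, "ldr\tz%d, [x0, #%d, mul vl]\n", tuple_reg (src, i), i);

  /* The copy y = x becomes one whole-register MOV per vector.  */
  for (int i = 0; i < nvecs; ++i)
    fprintf (out, "mov\tz%d.d, z%d.d\n", tuple_reg (dst, i),
	     tuple_reg (src, i));

  for (int i = 0; i < nvecs; ++i)
    fprintf (out, "str\tz%d, [x0, #%d, mul vl]\n", tuple_reg (src, i),
	     store_offset (nvecs, i));
}
```

Calling print_expected (stdout, 24, 28, 4) lists the twelve instructions that the v16df case above checks for, without the "# test" marker comments.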
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_1.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_1.c	2017-11-08 15:06:27.248849138 +0000
@@ -0,0 +1,89 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#ifndef TYPE
+#define TYPE unsigned char
+#endif
+
+#ifndef NAME
+#define NAME(X) X
+#endif
+
+#define N 1024
+
+void __attribute__ ((noinline, noclone))
+NAME(f2) (TYPE *__restrict a, TYPE *__restrict b, TYPE *__restrict c)
+{
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = c[i * 2];
+      b[i] = c[i * 2 + 1];
+    }
+}
+
+void __attribute__ ((noinline, noclone))
+NAME(f3) (TYPE *__restrict a, TYPE *__restrict b, TYPE *__restrict c,
+	  TYPE *__restrict d)
+{
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = d[i * 3];
+      b[i] = d[i * 3 + 1];
+      c[i] = d[i * 3 + 2];
+    }
+}
+
+void __attribute__ ((noinline, noclone))
+NAME(f4) (TYPE *__restrict a, TYPE *__restrict b, TYPE *__restrict c,
+	  TYPE *__restrict d, TYPE *__restrict e)
+{
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = e[i * 4];
+      b[i] = e[i * 4 + 1];
+      c[i] = e[i * 4 + 2];
+      d[i] = e[i * 4 + 3];
+    }
+}
+
+void __attribute__ ((noinline, noclone))
+NAME(g2) (TYPE *__restrict a, TYPE *__restrict b, TYPE *__restrict c)
+{
+  for (int i = 0; i < N; ++i)
+    {
+      c[i * 2] = a[i];
+      c[i * 2 + 1] = b[i];
+    }
+}
+
+void __attribute__ ((noinline, noclone))
+NAME(g3) (TYPE *__restrict a, TYPE *__restrict b, TYPE *__restrict c,
+	  TYPE *__restrict d)
+{
+  for (int i = 0; i < N; ++i)
+    {
+      d[i * 3] = a[i];
+      d[i * 3 + 1] = b[i];
+      d[i * 3 + 2] = c[i];
+    }
+}
+
+void __attribute__ ((noinline, noclone))
+NAME(g4) (TYPE *__restrict a, TYPE *__restrict b, TYPE *__restrict c,
+	  TYPE *__restrict d, TYPE *__restrict e)
+{
+  for (int i = 0; i < N; ++i)
+    {
+      e[i * 4] = a[i];
+      e[i * 4 + 1] = b[i];
+      e[i * 4 + 2] = c[i];
+      e[i * 4 + 3] = d[i];
+    }
+}
+
+/* { dg-final { scan-assembler {\tld2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_1_run.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_1_run.c	2017-11-08 15:06:27.249849138 +0000
@@ -0,0 +1,63 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#include "sve_struct_vect_1.c"
+
+TYPE a[N], b[N], c[N], d[N], e[N * 4];
+
+void __attribute__ ((noinline, noclone))
+init_array (TYPE *array, int n, TYPE base, TYPE step)
+{
+  for (int i = 0; i < n; ++i)
+    array[i] = base + step * i;
+}
+
+void __attribute__ ((noinline, noclone))
+check_array (TYPE *array, int n, TYPE base, TYPE step)
+{
+  for (int i = 0; i < n; ++i)
+    if (array[i] != (TYPE) (base + step * i))
+      __builtin_abort ();
+}
+
+int __attribute__ ((optimize (1)))
+main (void)
+{
+  init_array (e, 2 * N, 11, 5);
+  f2 (a, b, e);
+  check_array (a, N, 11, 10);
+  check_array (b, N, 16, 10);
+
+  init_array (e, 3 * N, 7, 6);
+  f3 (a, b, c, e);
+  check_array (a, N, 7, 18);
+  check_array (b, N, 13, 18);
+  check_array (c, N, 19, 18);
+
+  init_array (e, 4 * N, 4, 11);
+  f4 (a, b, c, d, e);
+  check_array (a, N, 4, 44);
+  check_array (b, N, 15, 44);
+  check_array (c, N, 26, 44);
+  check_array (d, N, 37, 44);
+
+  init_array (a, N, 2, 8);
+  init_array (b, N, 6, 8);
+  g2 (a, b, e);
+  check_array (e, 2 * N, 2, 4);
+
+  init_array (a, N, 4, 15);
+  init_array (b, N, 9, 15);
+  init_array (c, N, 14, 15);
+  g3 (a, b, c, e);
+  check_array (e, 3 * N, 4, 5);
+
+  init_array (a, N, 14, 36);
+  init_array (b, N, 23, 36);
+  init_array (c, N, 32, 36);
+  init_array (d, N, 41, 36);
+  g4 (a, b, c, d, e);
+  check_array (e, 4 * N, 14, 9);
+
+  return 0;
+}
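The constants passed to check_array in sve_struct_vect_1_run.c follow from the strided accesses: if e[j] = base + step * j, then element i of output k of an NVECS-way de-interleave is e[i * NVECS + k] = (base + k * step) + (NVECS * step) * i. The following is a host-independent scalar sketch of that arithmetic for the default TYPE of unsigned char. LEN, elem and the function names are assumptions made for illustration; the test itself uses N = 1024 and the vectorized f2/f3/f4.

```c
#include <assert.h>

#define LEN 64	/* Illustrative length only; the test uses N = 1024.  */

typedef unsigned char elem;

/* Scalar model of init_array: array[i] = base + step * i, truncated
   to the element type.  */
void
fill (elem *array, int n, elem base, elem step)
{
  for (int i = 0; i < n; ++i)
    array[i] = base + step * i;
}

/* Return 1 if ARRAY holds the sequence base + step * i, as check_array
   requires, and 0 otherwise.  */
int
matches (const elem *array, int n, elem base, elem step)
{
  for (int i = 0; i < n; ++i)
    if (array[i] != (elem) (base + step * i))
      return 0;
  return 1;
}

/* De-interleave IN NVECS ways and check each output against the
   sequence that starts at base + k * step and advances by
   nvecs * step.  Return 1 if every output matches.  */
int
check_deinterleave (int nvecs, elem base, elem step)
{
  elem in[LEN * 4], out[LEN];

  assert (nvecs >= 2 && nvecs <= 4);
  fill (in, nvecs * LEN, base, step);
  for (int k = 0; k < nvecs; ++k)
    {
      for (int i = 0; i < LEN; ++i)
	out[i] = in[i * nvecs + k];
      if (!matches (out, LEN, (elem) (base + k * step),
		    (elem) (step * nvecs)))
	return 0;
    }
  return 1;
}
```

With the constants from main, check_deinterleave (2, 11, 5) corresponds to the f2 checks (bases 11 and 16, step 10), check_deinterleave (3, 7, 6) to f3 and check_deinterleave (4, 4, 11) to f4. The g2/g3/g4 checks are the same identity read in the other direction.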
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_2.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_2.c	2017-11-08 15:06:27.249849138 +0000
@@ -0,0 +1,12 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE unsigned short
+#include "sve_struct_vect_1.c"
+
+/* { dg-final { scan-assembler {\tld2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld4h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst4h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_2_run.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_2_run.c	2017-11-08 15:06:27.250849138 +0000
@@ -0,0 +1,5 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE unsigned short
+#include "sve_struct_vect_1_run.c"
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_3.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_3.c	2017-11-08 15:06:27.250849138 +0000
@@ -0,0 +1,12 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE unsigned int
+#include "sve_struct_vect_1.c"
+
+/* { dg-final { scan-assembler {\tld2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_3_run.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_3_run.c	2017-11-08 15:06:27.250849138 +0000
@@ -0,0 +1,5 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE unsigned int
+#include "sve_struct_vect_1_run.c"
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_4.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_4.c	2017-11-08 15:06:27.250849138 +0000
@@ -0,0 +1,12 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE unsigned long
+#include "sve_struct_vect_1.c"
+
+/* { dg-final { scan-assembler {\tld2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_4_run.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_4_run.c	2017-11-08 15:06:27.250849138 +0000
@@ -0,0 +1,5 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE unsigned long
+#include "sve_struct_vect_1_run.c"
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_5.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_5.c	2017-11-08 15:06:27.250849138 +0000
@@ -0,0 +1,12 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE float
+#include "sve_struct_vect_1.c"
+
+/* { dg-final { scan-assembler {\tld2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_5_run.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_5_run.c	2017-11-08 15:06:27.250849138 +0000
@@ -0,0 +1,5 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE float
+#include "sve_struct_vect_1_run.c"
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_6.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_6.c	2017-11-08 15:06:27.250849138 +0000
@@ -0,0 +1,12 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE double
+#include "sve_struct_vect_1.c"
+
+/* { dg-final { scan-assembler {\tld2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_6_run.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_6_run.c	2017-11-08 15:06:27.250849138 +0000
@@ -0,0 +1,5 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE double
+#include "sve_struct_vect_1_run.c"
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_7.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_7.c	2017-11-08 15:06:27.250849138 +0000
@@ -0,0 +1,84 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#ifndef TYPE
+#define TYPE unsigned char
+#define ITYPE signed char
+#endif
+
+void __attribute__ ((noinline, noclone))
+f2 (TYPE *__restrict a, TYPE *__restrict b, TYPE *__restrict c, ITYPE n)
+{
+  for (ITYPE i = 0; i < n; ++i)
+    {
+      a[i] = c[i * 2];
+      b[i] = c[i * 2 + 1];
+    }
+}
+
+void __attribute__ ((noinline, noclone))
+f3 (TYPE *__restrict a, TYPE *__restrict b, TYPE *__restrict c,
+    TYPE *__restrict d, ITYPE n)
+{
+  for (ITYPE i = 0; i < n; ++i)
+    {
+      a[i] = d[i * 3];
+      b[i] = d[i * 3 + 1];
+      c[i] = d[i * 3 + 2];
+    }
+}
+
+void __attribute__ ((noinline, noclone))
+f4 (TYPE *__restrict a, TYPE *__restrict b, TYPE *__restrict c,
+    TYPE *__restrict d, TYPE *__restrict e, ITYPE n)
+{
+  for (ITYPE i = 0; i < n; ++i)
+    {
+      a[i] = e[i * 4];
+      b[i] = e[i * 4 + 1];
+      c[i] = e[i * 4 + 2];
+      d[i] = e[i * 4 + 3];
+    }
+}
+
+void __attribute__ ((noinline, noclone))
+g2 (TYPE *__restrict a, TYPE *__restrict b, TYPE *__restrict c, ITYPE n)
+{
+  for (ITYPE i = 0; i < n; ++i)
+    {
+      c[i * 2] = a[i];
+      c[i * 2 + 1] = b[i];
+    }
+}
+
+void __attribute__ ((noinline, noclone))
+g3 (TYPE *__restrict a, TYPE *__restrict b, TYPE *__restrict c,
+    TYPE *__restrict d, ITYPE n)
+{
+  for (ITYPE i = 0; i < n; ++i)
+    {
+      d[i * 3] = a[i];
+      d[i * 3 + 1] = b[i];
+      d[i * 3 + 2] = c[i];
+    }
+}
+
+void __attribute__ ((noinline, noclone))
+g4 (TYPE *__restrict a, TYPE *__restrict b, TYPE *__restrict c,
+    TYPE *__restrict d, TYPE *__restrict e, ITYPE n)
+{
+  for (ITYPE i = 0; i < n; ++i)
+    {
+      e[i * 4] = a[i];
+      e[i * 4 + 1] = b[i];
+      e[i * 4 + 2] = c[i];
+      e[i * 4 + 3] = d[i];
+    }
+}
+
+/* { dg-final { scan-assembler {\tld2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_7_run.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_7_run.c	2017-11-08 15:06:27.250849138 +0000
@@ -0,0 +1,65 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#include "sve_struct_vect_7.c"
+
+#define N 93
+
+TYPE a[N], b[N], c[N], d[N], e[N * 4];
+
+void __attribute__ ((noinline, noclone))
+init_array (TYPE *array, int n, TYPE base, TYPE step)
+{
+  for (int i = 0; i < n; ++i)
+    array[i] = base + step * i;
+}
+
+void __attribute__ ((noinline, noclone))
+check_array (TYPE *array, int n, TYPE base, TYPE step)
+{
+  for (int i = 0; i < n; ++i)
+    if (array[i] != (TYPE) (base + step * i))
+      __builtin_abort ();
+}
+
+int __attribute__ ((optimize (1)))
+main (void)
+{
+  init_array (e, 2 * N, 11, 5);
+  f2 (a, b, e, N);
+  check_array (a, N, 11, 10);
+  check_array (b, N, 16, 10);
+
+  init_array (e, 3 * N, 7, 6);
+  f3 (a, b, c, e, N);
+  check_array (a, N, 7, 18);
+  check_array (b, N, 13, 18);
+  check_array (c, N, 19, 18);
+
+  init_array (e, 4 * N, 4, 11);
+  f4 (a, b, c, d, e, N);
+  check_array (a, N, 4, 44);
+  check_array (b, N, 15, 44);
+  check_array (c, N, 26, 44);
+  check_array (d, N, 37, 44);
+
+  init_array (a, N, 2, 8);
+  init_array (b, N, 6, 8);
+  g2 (a, b, e, N);
+  check_array (e, 2 * N, 2, 4);
+
+  init_array (a, N, 4, 15);
+  init_array (b, N, 9, 15);
+  init_array (c, N, 14, 15);
+  g3 (a, b, c, e, N);
+  check_array (e, 3 * N, 4, 5);
+
+  init_array (a, N, 14, 36);
+  init_array (b, N, 23, 36);
+  init_array (c, N, 32, 36);
+  init_array (d, N, 41, 36);
+  g4 (a, b, c, d, e, N);
+  check_array (e, 4 * N, 14, 9);
+
+  return 0;
+}
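One property of N = 93 in sve_struct_vect_7_run.c is that it fits in the default ITYPE of signed char and is odd, so it can never be a whole number of SVE vectors: the lane count per vector is always even. Whether that is why 93 was chosen is an assumption here, not something the patch states. The sketch below computes how many elements would be left for a final, partially active vector at a given vector length. The helper names are hypothetical, and the sketch assumes the loop consumes one vector's worth of scalar iterations per vector iteration.

```c
#include <assert.h>

/* Number of ELEM_BITS-wide lanes in one SVE vector of VL_BITS.  SVE
   vector lengths are multiples of 128 bits, up to 2048 bits.  */
int
lanes (int vl_bits, int elem_bits)
{
  assert (vl_bits >= 128 && vl_bits <= 2048 && vl_bits % 128 == 0);
  assert (elem_bits == 8 || elem_bits == 16
	  || elem_bits == 32 || elem_bits == 64);
  return vl_bits / elem_bits;
}

/* Number of scalar iterations left over once N iterations have been
   split into whole vectors.  A nonzero result means the last vector
   iteration runs with only some lanes active.  */
int
tail_lanes (int n, int vl_bits, int elem_bits)
{
  return n % lanes (vl_bits, elem_bits);
}
```

For example, tail_lanes (93, 128, 8) is 13 and tail_lanes (93, 256, 8) is 29, while at 2048 bits all 93 byte elements fit in a single partial vector.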
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_8.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_8.c	2017-11-08 15:06:27.251849138 +0000
@@ -0,0 +1,13 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE unsigned short
+#define ITYPE short
+#include "sve_struct_vect_7.c"
+
+/* { dg-final { scan-assembler {\tld2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld4h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst4h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_8_run.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_8_run.c	2017-11-08 15:06:27.251849138 +0000
@@ -0,0 +1,6 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE unsigned short
+#define ITYPE short
+#include "sve_struct_vect_7_run.c"
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_9.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_9.c	2017-11-08 15:06:27.251849138 +0000
@@ -0,0 +1,13 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE unsigned int
+#define ITYPE int
+#include "sve_struct_vect_7.c"
+
+/* { dg-final { scan-assembler {\tld2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_9_run.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_9_run.c	2017-11-08 15:06:27.251849138 +0000
@@ -0,0 +1,6 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE unsigned int
+#define ITYPE int
+#include "sve_struct_vect_7_run.c"
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_10.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_10.c	2017-11-08 15:06:27.248849138 +0000
@@ -0,0 +1,13 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE unsigned long
+#define ITYPE long
+#include "sve_struct_vect_7.c"
+
+/* { dg-final { scan-assembler {\tld2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_10_run.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_10_run.c	2017-11-08 15:06:27.248849138 +0000
@@ -0,0 +1,6 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE unsigned long
+#define ITYPE long
+#include "sve_struct_vect_7_run.c"
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_11.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_11.c	2017-11-08 15:06:27.248849138 +0000
@@ -0,0 +1,13 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE _Float16
+#define ITYPE short
+#include "sve_struct_vect_7.c"
+
+/* { dg-final { scan-assembler {\tld2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld4h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst4h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_11_run.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_11_run.c	2017-11-08 15:06:27.248849138 +0000
@@ -0,0 +1,6 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE _Float16
+#define ITYPE short
+#include "sve_struct_vect_7_run.c"
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_12.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_12.c	2017-11-08 15:06:27.248849138 +0000
@@ -0,0 +1,13 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE float
+#define ITYPE int
+#include "sve_struct_vect_7.c"
+
+/* { dg-final { scan-assembler {\tld2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_12_run.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_12_run.c	2017-11-08 15:06:27.248849138 +0000
@@ -0,0 +1,6 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE float
+#define ITYPE int
+#include "sve_struct_vect_7_run.c"
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_13.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_13.c	2017-11-08 15:06:27.249849138 +0000
@@ -0,0 +1,13 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE double
+#define ITYPE long
+#include "sve_struct_vect_7.c"
+
+/* { dg-final { scan-assembler {\tld2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_13_run.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_13_run.c	2017-11-08 15:06:27.249849138 +0000
@@ -0,0 +1,6 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE double
+#define ITYPE long
+#include "sve_struct_vect_7_run.c"
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_14.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_14.c	2017-11-08 15:06:27.249849138 +0000
@@ -0,0 +1,72 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve -msve-vector-bits=256 --save-temps" } */
+
+#define TYPE unsigned char
+#define NAME(X) qi_##X
+#include "sve_struct_vect_1.c"
+#undef NAME
+#undef TYPE
+
+#define TYPE unsigned short
+#define NAME(X) hi_##X
+#include "sve_struct_vect_1.c"
+#undef NAME
+#undef TYPE
+
+#define TYPE unsigned int
+#define NAME(X) si_##X
+#include "sve_struct_vect_1.c"
+#undef NAME
+#undef TYPE
+
+#define TYPE unsigned long
+#define NAME(X) di_##X
+#include "sve_struct_vect_1.c"
+#undef NAME
+#undef TYPE
+
+#define TYPE _Float16
+#define NAME(X) hf_##X
+#include "sve_struct_vect_1.c"
+#undef NAME
+#undef TYPE
+
+#define TYPE float
+#define NAME(X) sf_##X
+#include "sve_struct_vect_1.c"
+#undef NAME
+#undef TYPE
+
+#define TYPE double
+#define NAME(X) df_##X
+#include "sve_struct_vect_1.c"
+#undef NAME
+#undef TYPE
+
+/* { dg-final { scan-assembler-times {\tld2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tld2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld4h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst4h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tld2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tld2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} 2 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_15.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_15.c	2017-11-08 15:06:27.249849138 +0000
@@ -0,0 +1,32 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve -msve-vector-bits=512 --save-temps" } */
+
+#include "sve_struct_vect_14.c"
+
+/* { dg-final { scan-assembler-times {\tld2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tld2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld4h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst4h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tld2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tld2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} 2 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_16.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_16.c	2017-11-08 15:06:27.249849138 +0000
@@ -0,0 +1,32 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve -msve-vector-bits=1024 --save-temps" } */
+
+#include "sve_struct_vect_14.c"
+
+/* { dg-final { scan-assembler-times {\tld2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tld2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld4h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst4h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tld2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tld2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} 2 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_17.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_17.c	2017-11-08 15:06:27.249849138 +0000
@@ -0,0 +1,32 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve -msve-vector-bits=2048 --save-temps" } */
+
+#include "sve_struct_vect_14.c"
+
+/* { dg-final { scan-assembler-times {\tld2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tld2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld4h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst4h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tld2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tld2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} 2 } } */

* [3/4] load/store_lanes testsuite markup
  2017-11-08 15:12 [0/4] Add SVE support for load/store_lanes Richard Sandiford
  2017-11-08 15:13 ` [1/4] Give the target more control over ARRAY_TYPE modes Richard Sandiford
  2017-11-08 15:16 ` [2/4] [AArch64] SVE load/store_lanes support Richard Sandiford
@ 2017-11-08 15:18 ` Richard Sandiford
  2017-11-20  4:53   ` Jeff Law
  2017-11-08 15:18 ` [4/4] [AArch64] Tests for SVE structure modes Richard Sandiford
  3 siblings, 1 reply; 9+ messages in thread
From: Richard Sandiford @ 2017-11-08 15:18 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft

Now that load/store lanes are supported for variable-length vectors,
the vectoriser uses them in place of SLP (which can't yet handle
external and constant definitions for variable-length vectors -- fixed
by a later patch).  Previously load/store lanes failed as well, so we
fell back to 128-bit vectorisation.  Tests that scan for SLP therefore
need an XFAIL for this combination.
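
As a rough illustration of the kind of loop affected (a hypothetical
sketch, not taken from the patch or from any of the listed tests; the
function names are made up), consider a group of two interleaved
accesses that add different constants.  With fixed-length vectors this
would normally be an SLP candidate; with variable-length vectors it
might instead be handled with LD2/ST2-style structure loads and stores,
since SLP would need a constant vector alternating between 5 and 6:

```c
#include <assert.h>

/* Hypothetical example: the two statements form an interleaved group
   of size 2, and each lane of the group adds a different constant.  */
void
add_pair_consts (int *restrict out, const int *restrict in, int n)
{
  for (int i = 0; i < n; ++i)
    {
      out[i * 2] = in[i * 2] + 5;
      out[i * 2 + 1] = in[i * 2 + 1] + 6;
    }
}

/* Scalar sanity check of the loop's semantics; it says nothing about
   which vectorisation strategy the compiler picks.  */
void
add_pair_consts_self_test (void)
{
  int in[4] = { 1, 2, 3, 4 };
  int out[4];
  add_pair_consts (out, in, 2);
  assert (out[0] == 6 && out[1] == 8 && out[2] == 8 && out[3] == 10);
}
```

Compiling something like this with the options used by the SVE tests
(-O2 -ftree-vectorize -march=armv8-a+sve) and inspecting the "vect" dump
or the assembly should show which strategy was chosen.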


2017-11-08  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_vect_load_lanes):
	Return true for SVE too.
	* g++.dg/vect/pr36648.cc: XFAIL for variable-length vectors
	if load/store lanes are supported.
	* gcc.dg/vect/no-scevccp-slp-30.c: Likewise.
	* gcc.dg/vect/pr37027.c: Likewise.
	* gcc.dg/vect/pr67790.c: Likewise.
	* gcc.dg/vect/slp-1.c: Likewise.
	* gcc.dg/vect/slp-10.c: Likewise.
	* gcc.dg/vect/slp-12b.c: Likewise.
	* gcc.dg/vect/slp-12c.c: Likewise.
	* gcc.dg/vect/slp-17.c: Likewise.
	* gcc.dg/vect/slp-19b.c: Likewise.
	* gcc.dg/vect/slp-2.c: Likewise.
	* gcc.dg/vect/slp-20.c: Likewise.
	* gcc.dg/vect/slp-21.c: Likewise.
	* gcc.dg/vect/slp-22.c: Likewise.
	* gcc.dg/vect/slp-24-big-array.c: Likewise.
	* gcc.dg/vect/slp-24.c: Likewise.
	* gcc.dg/vect/slp-28.c: Likewise.
	* gcc.dg/vect/slp-33.c: Likewise.
	* gcc.dg/vect/slp-39.c: Likewise.
	* gcc.dg/vect/slp-6.c: Likewise.
	* gcc.dg/vect/slp-7.c: Likewise.
	* gcc.dg/vect/slp-cond-1.c: Likewise.
	* gcc.dg/vect/slp-cond-2-big-array.c: Likewise.
	* gcc.dg/vect/slp-cond-2.c: Likewise.
	* gcc.dg/vect/slp-multitypes-1.c: Likewise.
	* gcc.dg/vect/slp-multitypes-10.c: Likewise.
	* gcc.dg/vect/slp-multitypes-11-big-array.c: Likewise.
	* gcc.dg/vect/slp-multitypes-11.c: Likewise.
	* gcc.dg/vect/slp-multitypes-12.c: Likewise.
	* gcc.dg/vect/slp-multitypes-8.c: Likewise.
	* gcc.dg/vect/slp-multitypes-9.c: Likewise.
	* gcc.dg/vect/slp-reduc-1.c: Likewise.
	* gcc.dg/vect/slp-reduc-2.c: Likewise.
	* gcc.dg/vect/slp-reduc-5.c: Likewise.
	* gcc.dg/vect/slp-widen-mult-half.c: Likewise.
	* gcc.dg/vect/vect-cselim-1.c: Likewise.
	* gcc.dg/vect/slp-25.c: Remove XFAIL for variable-length SVE.
	* gcc.dg/vect/slp-perm-5.c: Likewise.
	* gcc.dg/vect/slp-perm-6.c: Likewise.
	* gcc.dg/vect/slp-perm-9.c: Likewise.
	* gcc.dg/vect/vect-119.c: Likewise.
	* gcc.dg/vect/vect-live-slp-1.c: Likewise.
	* gcc.dg/vect/vect-live-slp-2.c: Likewise.
	* gcc.dg/vect/vect-live-slp-3.c: Likewise.
	* gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise.
	* gcc.dg/vect/vect-over-widen-1.c: Likewise.
	* gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
	* gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
	* gcc.dg/vect/vect-over-widen-4.c: Likewise.
	* gcc.dg/vect/slp-reduc-6.c: Remove XFAIL for variable-length vectors.
	* gcc.dg/vect/vect-load-lanes-peeling-1.c: Expect an epilogue loop
	for variable-length vectors.

Index: gcc/testsuite/lib/target-supports.exp
===================================================================
--- gcc/testsuite/lib/target-supports.exp	2017-11-08 15:05:50.258852888 +0000
+++ gcc/testsuite/lib/target-supports.exp	2017-11-08 15:06:23.208849548 +0000
@@ -6506,8 +6506,7 @@ proc check_effective_target_vect_load_la
     } else {
 	set et_vect_load_lanes 0
 	if { ([istarget arm*-*-*] && [check_effective_target_arm_neon_ok])
-	     || ([istarget aarch64*-*-*]
-		 && ![check_effective_target_aarch64_sve]) } {
+	     || [istarget aarch64*-*-*] } {
 	    set et_vect_load_lanes 1
 	}
     }
Index: gcc/testsuite/g++.dg/vect/pr36648.cc
===================================================================
--- gcc/testsuite/g++.dg/vect/pr36648.cc	2017-02-23 19:54:10.000000000 +0000
+++ gcc/testsuite/g++.dg/vect/pr36648.cc	2017-11-08 15:06:23.202849548 +0000
@@ -25,6 +25,6 @@ int main() { }
    targets, ! vect_no_align is a sufficient test.  */
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { { { !  vect_no_align } && { ! powerpc*-*-* } } || { powerpc*-*-* && vect_hw_misalign } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { { { ! vect_no_align } && { ! powerpc*-*-* } } || { powerpc*-*-* && vect_hw_misalign } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { { { ! vect_no_align } && { ! powerpc*-*-* } } || { powerpc*-*-* && vect_hw_misalign } } xfail { vect_variable_length && vect_load_lanes } } } } */
 
 
Index: gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c	2015-06-02 23:53:38.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c	2017-11-08 15:06:23.202849548 +0000
@@ -52,5 +52,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { vect_variable_length && vect_load_lanes } } } } */
   
Index: gcc/testsuite/gcc.dg/vect/pr37027.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr37027.c	2015-11-11 15:40:09.000000000 +0000
+++ gcc/testsuite/gcc.dg/vect/pr37027.c	2017-11-08 15:06:23.202849548 +0000
@@ -32,5 +32,5 @@ foo (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail vect_no_int_add } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail vect_no_int_add } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { vect_no_int_add || { vect_variable_length && vect_load_lanes } } } } } */
 
Index: gcc/testsuite/gcc.dg/vect/pr67790.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr67790.c	2015-12-01 19:44:24.000000000 +0000
+++ gcc/testsuite/gcc.dg/vect/pr67790.c	2017-11-08 15:06:23.202849548 +0000
@@ -37,4 +37,4 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { xfail { vect_variable_length && vect_load_lanes } } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-1.c	2015-06-02 23:53:38.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-1.c	2017-11-08 15:06:23.202849548 +0000
@@ -118,5 +118,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect"  } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect"  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { xfail { vect_variable_length && vect_load_lanes } } } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-10.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-10.c	2017-10-04 16:25:39.696051107 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-10.c	2017-11-08 15:06:23.202849548 +0000
@@ -107,7 +107,7 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect"  {target {vect_uintfloat_cvt && vect_int_mult} } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  {target {{! { vect_uintfloat_cvt}} && vect_int_mult} } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  {target {{! { vect_uintfloat_cvt}} && { ! {vect_int_mult}}} } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" {target {vect_uintfloat_cvt && vect_int_mult} } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" {target { vect_uintfloat_cvt && vect_int_mult } xfail { vect_variable_length && vect_load_lanes } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect"  {target {{! { vect_uintfloat_cvt}} && vect_int_mult} } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  {target {{! { vect_uintfloat_cvt}} && { ! {vect_int_mult}}} } } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-12b.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-12b.c	2017-10-04 16:25:39.697051107 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-12b.c	2017-11-08 15:06:23.202849548 +0000
@@ -46,6 +46,6 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_strided2 && vect_int_mult } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  { target { ! { vect_strided2 && vect_int_mult } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  { target { vect_strided2 && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  { target { vect_strided2 && vect_int_mult } xfail { vect_variable_length && vect_load_lanes } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  { target { ! { vect_strided2 && vect_int_mult } } } } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-12c.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-12c.c	2015-06-02 23:53:38.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-12c.c	2017-11-08 15:06:23.202849548 +0000
@@ -48,5 +48,5 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_int_mult } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  { target { ! vect_int_mult } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_int_mult } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_int_mult xfail { vect_variable_length && vect_load_lanes } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! vect_int_mult } } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-17.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-17.c	2015-06-02 23:53:38.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-17.c	2017-11-08 15:06:23.203849548 +0000
@@ -51,5 +51,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect"  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail { vect_variable_length && vect_load_lanes } } } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-19b.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-19b.c	2015-06-02 23:53:38.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-19b.c	2017-11-08 15:06:23.203849548 +0000
@@ -53,5 +53,5 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided4 } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! vect_strided4 } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_strided4 } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_strided4 xfail { vect_variable_length && vect_load_lanes } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! vect_strided4 } } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-2.c	2015-06-02 23:53:38.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-2.c	2017-11-08 15:06:23.203849548 +0000
@@ -140,5 +140,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect"  } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect"  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { xfail { vect_variable_length && vect_load_lanes } } } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-20.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-20.c	2015-06-02 23:53:38.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-20.c	2017-11-08 15:06:23.203849548 +0000
@@ -110,5 +110,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect"  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { xfail { vect_variable_length && vect_load_lanes } } } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-21.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-21.c	2015-06-02 23:53:38.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-21.c	2017-11-08 15:06:23.203849548 +0000
@@ -201,6 +201,6 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect"  { target { vect_strided4 || vect_extract_even_odd } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target  { ! { vect_strided4 || vect_extract_even_odd } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_strided4 }  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_strided4 xfail { vect_variable_length && vect_load_lanes } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  { target { ! { vect_strided4 } } } } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-22.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-22.c	2015-06-02 23:53:38.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-22.c	2017-11-08 15:06:23.203849548 +0000
@@ -129,5 +129,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 6 "vect"  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 6 "vect" { xfail { vect_variable_length && vect_load_lanes } } } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-24-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-24-big-array.c	2017-11-08 15:05:40.718203564 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-24-big-array.c	2017-11-08 15:06:23.203849548 +0000
@@ -91,4 +91,4 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { vect_no_align && ilp32 } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail { vect_no_align && ilp32 } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail { { vect_no_align && ilp32 } || { vect_variable_length && vect_load_lanes } } } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-24.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-24.c	2017-11-08 15:05:40.718203564 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-24.c	2017-11-08 15:06:23.204849548 +0000
@@ -77,4 +77,4 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { vect_no_align && ilp32 } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail { vect_no_align && ilp32 } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail { { vect_no_align && ilp32 } || { vect_variable_length && vect_load_lanes } } } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-28.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-28.c	2017-11-08 15:05:42.968853628 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-28.c	2017-11-08 15:06:23.204849548 +0000
@@ -89,5 +89,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { vect_variable_length && vect_load_lanes } } } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-33.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-33.c	2017-10-04 16:25:39.697051107 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-33.c	2017-11-08 15:06:23.204849548 +0000
@@ -105,7 +105,7 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect"  {target {vect_uintfloat_cvt && vect_int_mult} } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  {target {{! { vect_uintfloat_cvt}} && vect_int_mult} } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  {target {{! { vect_uintfloat_cvt}} && {! {vect_int_mult}}} } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" {target {vect_uintfloat_cvt && vect_int_mult} } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" {target {vect_uintfloat_cvt && vect_int_mult} xfail { vect_variable_length && vect_load_lanes } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect"  {target {{! { vect_uintfloat_cvt}} && vect_int_mult} } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  {target {{! { vect_uintfloat_cvt}} && {! {vect_int_mult}}} } } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-39.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-39.c	2015-06-02 23:53:38.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-39.c	2017-11-08 15:06:23.204849548 +0000
@@ -21,4 +21,4 @@ void bar (double w)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail { vect_variable_length && vect_load_lanes } } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-6.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-6.c	2015-06-02 23:53:35.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-6.c	2017-11-08 15:06:23.204849548 +0000
@@ -116,6 +116,6 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect"  {target vect_int_mult} } } */
 /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  {target  { ! { vect_int_mult } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" {target vect_int_mult  } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" {target vect_int_mult xfail { vect_variable_length && vect_load_lanes } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" {target  { ! { vect_int_mult } } } } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-7.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-7.c	2015-06-02 23:53:35.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-7.c	2017-11-08 15:06:23.204849548 +0000
@@ -122,6 +122,6 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect"  { target vect_short_mult } } }*/
 /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  { target { ! { vect_short_mult } } } } }*/
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect"  { target vect_short_mult } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect"  { target vect_short_mult xfail { vect_variable_length && vect_load_lanes } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect"  { target { ! { vect_short_mult } } } } } */
  
Index: gcc/testsuite/gcc.dg/vect/slp-cond-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-cond-1.c	2015-06-02 23:53:38.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-cond-1.c	2017-11-08 15:06:23.204849548 +0000
@@ -122,4 +122,4 @@ main ()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { xfail { vect_variable_length && vect_load_lanes } } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-cond-2-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-cond-2-big-array.c	2017-10-04 16:25:39.698051107 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-cond-2-big-array.c	2017-11-08 15:06:23.205849548 +0000
@@ -125,4 +125,4 @@ main ()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { xfail { vect_variable_length && vect_load_lanes } } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-cond-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-cond-2.c	2017-10-04 16:25:39.698051107 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-cond-2.c	2017-11-08 15:06:23.205849548 +0000
@@ -125,4 +125,4 @@ main ()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { xfail { vect_variable_length && vect_load_lanes } } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-multitypes-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-multitypes-1.c	2015-06-02 23:53:38.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-multitypes-1.c	2017-11-08 15:06:23.205849548 +0000
@@ -52,5 +52,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect"  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail { vect_variable_length && vect_load_lanes } } } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-multitypes-10.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-multitypes-10.c	2015-06-02 23:53:38.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-multitypes-10.c	2017-11-08 15:06:23.205849548 +0000
@@ -46,5 +46,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_pack_trunc } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  { target vect_pack_trunc } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  { target vect_pack_trunc xfail { vect_variable_length && vect_load_lanes } } } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-multitypes-11-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-multitypes-11-big-array.c	2017-11-08 15:05:40.720950312 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-multitypes-11-big-array.c	2017-11-08 15:06:23.205849548 +0000
@@ -55,5 +55,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_unpack } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  { target vect_unpack } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  { target vect_unpack xfail { vect_variable_length && vect_load_lanes } } } } */
 
Index: gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c	2015-06-02 23:53:38.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c	2017-11-08 15:06:23.205849548 +0000
@@ -49,5 +49,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_unpack } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  { target vect_unpack } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  { target vect_unpack xfail { vect_variable_length && vect_load_lanes } } } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-multitypes-12.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-multitypes-12.c	2015-06-02 23:53:38.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-multitypes-12.c	2017-11-08 15:06:23.205849548 +0000
@@ -62,5 +62,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect"  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { xfail { vect_variable_length && vect_load_lanes } } } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-multitypes-8.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-multitypes-8.c	2015-06-02 23:53:38.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-multitypes-8.c	2017-11-08 15:06:23.205849548 +0000
@@ -40,5 +40,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_unpack } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  { target vect_unpack } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  { target vect_unpack xfail { vect_variable_length && vect_load_lanes } } } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-multitypes-9.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-multitypes-9.c	2015-06-02 23:53:38.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-multitypes-9.c	2017-11-08 15:06:23.205849548 +0000
@@ -40,5 +40,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_pack_trunc } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  { target vect_pack_trunc } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  { target vect_pack_trunc xfail { vect_variable_length && vect_load_lanes } } } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-reduc-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-reduc-1.c	2015-06-02 23:53:38.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-reduc-1.c	2017-11-08 15:06:23.206849548 +0000
@@ -43,5 +43,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail vect_no_int_add } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail vect_no_int_add } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { vect_no_int_add || { vect_variable_length && vect_load_lanes } } } } } */
 
Index: gcc/testsuite/gcc.dg/vect/slp-reduc-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-reduc-2.c	2015-06-02 23:53:38.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-reduc-2.c	2017-11-08 15:06:23.206849548 +0000
@@ -38,5 +38,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail vect_no_int_add } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail vect_no_int_add } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { vect_no_int_add || { vect_variable_length && vect_load_lanes } } } } } */
 
Index: gcc/testsuite/gcc.dg/vect/slp-reduc-5.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-reduc-5.c	2015-09-07 18:51:04.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-reduc-5.c	2017-11-08 15:06:23.206849548 +0000
@@ -43,5 +43,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail vect_no_int_min_max } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail vect_no_int_min_max } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { vect_no_int_min_max || { vect_variable_length && vect_load_lanes } } } } } */
 
Index: gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c	2016-11-22 21:16:10.000000000 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c	2017-11-08 15:06:23.206849548 +0000
@@ -46,7 +46,7 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_widen_mult_hi_to_si } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_widen_mult_hi_to_si } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_widen_mult_hi_to_si xfail { vect_variable_length && vect_load_lanes } } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 2 "vect" { target vect_widen_mult_hi_to_si_pattern } } } */
 /* { dg-final { scan-tree-dump-times "pattern recognized" 2 "vect" { target vect_widen_mult_hi_to_si_pattern } } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-cselim-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-cselim-1.c	2017-11-08 15:05:50.253852889 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-cselim-1.c	2017-11-08 15:06:23.206849548 +0000
@@ -83,6 +83,4 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! vect_masked_store } xfail { { vect_no_align && { ! vect_hw_misalign } } || { ! vect_strided2 } } } } } */
-/* Fails for variable-length SVE because we can't yet handle the
-   interleaved load.  This is fixed by a later patch.  */
-/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target vect_masked_store xfail { aarch64_sve && vect_variable_length } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target vect_masked_store } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-25.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-25.c	2017-11-08 15:05:50.252852889 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-25.c	2017-11-08 15:06:23.204849548 +0000
@@ -57,6 +57,4 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  } } */
 /* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" } } */
-/* Needs store_lanes for SVE, otherwise falls back to Advanced SIMD.
-   Will be fixed when SVE LOAD_LANES support is added.  */
-/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 2 "vect" { xfail { { { ! vect_unaligned_possible } || { ! vect_natural_alignment } } && { ! { aarch64_sve && vect_variable_length } } } } } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 2 "vect" { xfail { { ! vect_unaligned_possible } || { ! vect_natural_alignment } } } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-perm-5.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-perm-5.c	2017-11-08 15:05:50.252852889 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-perm-5.c	2017-11-08 15:06:23.205849548 +0000
@@ -104,9 +104,7 @@ int main (int argc, const char* argv[])
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_perm } } } */
-/* Fails for variable-length SVE because we fall back to Advanced SIMD
-   and use LD3/ST3.  Will be fixed when SVE LOAD_LANES support is added.  */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_perm3_int && {! vect_load_lanes } } xfail { aarch64_sve && vect_variable_length } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_perm3_int && { ! vect_load_lanes } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target vect_load_lanes } } } */
 /* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use load/store-lanes" "vect" { target { vect_perm3_int && vect_load_lanes } } } } */
 /* { dg-final { scan-tree-dump "LOAD_LANES" "vect" { target vect_load_lanes } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-perm-6.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-perm-6.c	2017-11-08 15:05:50.252852889 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-perm-6.c	2017-11-08 15:06:23.206849548 +0000
@@ -103,10 +103,8 @@ int main (int argc, const char* argv[])
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_perm } } } */
-/* Fails for variable-length SVE because we fall back to Advanced SIMD
-   and use LD3/ST3.  Will be fixed when SVE LOAD_LANES support is added.  */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_perm3_int && {! vect_load_lanes } } xfail { aarch64_sve && vect_variable_length } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_load_lanes } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_perm3_int && { ! vect_load_lanes } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_load_lanes xfail { vect_variable_length && vect_load_lanes } } } } */
 /* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use load/store-lanes" "vect" { target { vect_perm3_int && vect_load_lanes } } } } */
 /* { dg-final { scan-tree-dump "LOAD_LANES" "vect" { target vect_load_lanes } } } */
 /* { dg-final { scan-tree-dump "STORE_LANES" "vect" { target vect_load_lanes } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-perm-9.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-perm-9.c	2017-11-08 15:05:50.252852889 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-perm-9.c	2017-11-08 15:06:23.206849548 +0000
@@ -57,9 +57,7 @@ int main (int argc, const char* argv[])
   return 0;
 }
 
-/* Fails for variable-length SVE because we fall back to Advanced SIMD
-   and use LD3/ST3.  Will be fixed when SVE LOAD_LANES support is added.  */
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 2 "vect" { target { ! { vect_perm_short || vect_load_lanes } } xfail { aarch64_sve && vect_variable_length } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 2 "vect" { target { ! { vect_perm_short || vect_load_lanes } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_perm_short || vect_load_lanes } } } } */
 /* { dg-final { scan-tree-dump-times "permutation requires at least three vectors" 1 "vect" { target { vect_perm_short && { ! vect_perm3_short } } } } } */
 /* { dg-final { scan-tree-dump-not "permutation requires at least three vectors" "vect" { target vect_perm3_short } } } */
Index: gcc/testsuite/gcc.dg/vect/vect-119.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-119.c	2017-11-08 15:05:50.253852889 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-119.c	2017-11-08 15:06:23.206849548 +0000
@@ -25,7 +25,4 @@ unsigned int foo (const unsigned int x[O
   return sum;
 }
 
-/* Requires load-lanes for SVE, which is implemented by a later patch.
-   Until then we report it twice, once for SVE and once for 128-bit
-   Advanced SIMD.  */
-/* { dg-final { scan-tree-dump-times "Detected interleaving load of size 2" 1 "vect" { xfail { aarch64_sve && vect_variable_length } } } } */
+/* { dg-final { scan-tree-dump-times "Detected interleaving load of size 2" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-live-slp-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-live-slp-1.c	2017-11-08 15:05:50.254852889 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-live-slp-1.c	2017-11-08 15:06:23.206849548 +0000
@@ -68,8 +68,5 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" } } */
-/* We can't yet create the necessary SLP constant vector for variable-length
-   SVE and so fall back to Advanced SIMD.  This means that we repeat each
-   analysis note.  */
-/* { dg-final { scan-tree-dump-times "vec_stmt_relevant_p: stmt live but not relevant" 4 "vect" { xfail { aarch64_sve && vect_variable_length } } } }*/
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { xfail { vect_variable_length && vect_load_lanes } } } } */
+/* { dg-final { scan-tree-dump-times "vec_stmt_relevant_p: stmt live but not relevant" 4 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-live-slp-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-live-slp-2.c	2017-11-08 15:05:50.254852889 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-live-slp-2.c	2017-11-08 15:06:23.207849548 +0000
@@ -62,8 +62,5 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" } } */
-/* We can't yet create the necessary SLP constant vector for variable-length
-   SVE and so fall back to Advanced SIMD.  This means that we repeat each
-   analysis note.  */
-/* { dg-final { scan-tree-dump-times "vec_stmt_relevant_p: stmt live but not relevant" 2 "vect" { xfail { aarch64_sve && vect_variable_length } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail { vect_variable_length && vect_load_lanes } } } } */
+/* { dg-final { scan-tree-dump-times "vec_stmt_relevant_p: stmt live but not relevant" 2 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c	2017-11-08 15:05:50.254852889 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c	2017-11-08 15:06:23.207849548 +0000
@@ -69,8 +69,5 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" } } */
-/* We can't yet create the necessary SLP constant vector for variable-length
-   SVE and so fall back to Advanced SIMD.  This means that we repeat each
-   analysis note.  */
-/* { dg-final { scan-tree-dump-times "vec_stmt_relevant_p: stmt live but not relevant" 4 "vect" { xfail { aarch64_sve && vect_variable_length } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { xfail { vect_variable_length && vect_load_lanes } } } } */
+/* { dg-final { scan-tree-dump-times "vec_stmt_relevant_p: stmt live but not relevant" 4 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c	2017-11-08 15:05:50.255852889 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c	2017-11-08 15:06:23.207849548 +0000
@@ -59,8 +59,6 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
-/* Requires LD4 for variable-length SVE.  Until that's supported we fall
-   back to Advanced SIMD, which does have widening shifts.  */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } xfail { aarch64_sve && vect_variable_length } } } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c	2017-11-08 15:05:50.255852889 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c	2017-11-08 15:06:23.207849548 +0000
@@ -63,9 +63,7 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
-/* Requires LD4 for variable-length SVE.  Until that's supported we fall
-   back to Advanced SIMD, which does have widening shifts.  */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } xfail { aarch64_sve && vect_variable_length } } } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c	2017-11-08 15:05:50.256852889 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c	2017-11-08 15:06:23.207849548 +0000
@@ -59,9 +59,7 @@ int main (void)
   return 0;
 }
 
-/* Requires LD4 for variable-length SVE.  Until that's supported we fall
-   back to Advanced SIMD, which does have widening shifts.  */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target { ! vect_widen_shift } xfail { aarch64_sve && vect_variable_length } } } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target { ! vect_widen_shift } } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 1 "vect" { target vect_widen_shift } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c	2017-11-08 15:05:50.256852889 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c	2017-11-08 15:06:23.207849548 +0000
@@ -63,8 +63,6 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
-/* Requires LD4 for variable-length SVE.  Until that's supported we fall
-   back to Advanced SIMD, which does have widening shifts.  */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } xfail { aarch64_sve && vect_variable_length } } } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c	2017-11-08 15:05:50.256852889 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c	2017-11-08 15:06:23.207849548 +0000
@@ -67,9 +67,7 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
-/* Requires LD4 for variable-length SVE.  Until that's supported we fall
-   back to Advanced SIMD, which does have widening shifts.  */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } xfail { aarch64_sve && vect_variable_length } } } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/slp-reduc-6.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-reduc-6.c	2017-11-08 15:05:46.805853239 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-reduc-6.c	2017-11-08 15:06:23.206849548 +0000
@@ -44,5 +44,5 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail { vect_no_int_add || { ! { vect_unpack || vect_strided2 } } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } } */
-/* { dg-final { scan-tree-dump-times "different interleaving chains in one node" 1 "vect" { target { ! vect_no_int_add } xfail vect_variable_length } } } */
+/* { dg-final { scan-tree-dump-times "different interleaving chains in one node" 1 "vect" { target { ! vect_no_int_add } } } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-load-lanes-peeling-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-load-lanes-peeling-1.c	2016-11-22 21:16:10.000000000 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-load-lanes-peeling-1.c	2017-11-08 15:06:23.207849548 +0000
@@ -10,4 +10,4 @@ f (int *__restrict a, int *__restrict b)
 }
 
 /* { dg-final { scan-tree-dump-not "Data access with gaps" "vect" } } */
-/* { dg-final { scan-tree-dump-not "epilog loop required" "vect" } } */
+/* { dg-final { scan-tree-dump-not "epilog loop required" "vect" { xfail vect_variable_length } } } */

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [3/4] load/store_lanes testsuite markup
  2017-11-08 15:18 ` [3/4] load/store_lanes testsuite markup Richard Sandiford
@ 2017-11-20  4:53   ` Jeff Law
  0 siblings, 0 replies; 9+ messages in thread
From: Jeff Law @ 2017-11-20  4:53 UTC (permalink / raw)
  To: gcc-patches, richard.earnshaw, james.greenhalgh,
	marcus.shawcroft, richard.sandiford

On 11/08/2017 08:14 AM, Richard Sandiford wrote:
> Supporting load/store lanes for variable-length vectors means that
> we use them instead of SLP (for which we can't yet handle external
> and constant definitions -- fixed by a later patch).  Previously
> we'd fail to use load/store lanes too and fall back to 128-bit
> vectorisation.
> 
> 
> 2017-11-08  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/testsuite/
> 	* lib/target-supports.exp (check_effective_target_vect_load_lanes):
> 	Return true for SVE too.
> 	* g++.dg/vect/pr36648.cc: XFAIL for variable-length vectors
> 	if load/store lanes are supported.
> 	* gcc.dg/vect/no-scevccp-slp-30.c: Likewise.
> 	* gcc.dg/vect/pr37027.c: Likewise.
> 	* gcc.dg/vect/pr67790.c: Likewise.
> 	* gcc.dg/vect/slp-1.c: Likewise.
> 	* gcc.dg/vect/slp-10.c: Likewise.
> 	* gcc.dg/vect/slp-12b.c: Likewise.
> 	* gcc.dg/vect/slp-12c.c: Likewise.
> 	* gcc.dg/vect/slp-17.c: Likewise.
> 	* gcc.dg/vect/slp-19b.c: Likewise.
> 	* gcc.dg/vect/slp-2.c: Likewise.
> 	* gcc.dg/vect/slp-20.c: Likewise.
> 	* gcc.dg/vect/slp-21.c: Likewise.
> 	* gcc.dg/vect/slp-22.c: Likewise.
> 	* gcc.dg/vect/slp-24-big-array.c: Likewise.
> 	* gcc.dg/vect/slp-24.c: Likewise.
> 	* gcc.dg/vect/slp-28.c: Likewise.
> 	* gcc.dg/vect/slp-33.c: Likewise.
> 	* gcc.dg/vect/slp-39.c: Likewise.
> 	* gcc.dg/vect/slp-6.c: Likewise.
> 	* gcc.dg/vect/slp-7.c: Likewise.
> 	* gcc.dg/vect/slp-cond-1.c: Likewise.
> 	* gcc.dg/vect/slp-cond-2-big-array.c: Likewise.
> 	* gcc.dg/vect/slp-cond-2.c: Likewise.
> 	* gcc.dg/vect/slp-multitypes-1.c: Likewise.
> 	* gcc.dg/vect/slp-multitypes-10.c: Likewise.
> 	* gcc.dg/vect/slp-multitypes-11-big-array.c: Likewise.
> 	* gcc.dg/vect/slp-multitypes-11.c: Likewise.
> 	* gcc.dg/vect/slp-multitypes-12.c: Likewise.
> 	* gcc.dg/vect/slp-multitypes-8.c: Likewise.
> 	* gcc.dg/vect/slp-multitypes-9.c: Likewise.
> 	* gcc.dg/vect/slp-reduc-1.c: Likewise.
> 	* gcc.dg/vect/slp-reduc-2.c: Likewise.
> 	* gcc.dg/vect/slp-reduc-5.c: Likewise.
> 	* gcc.dg/vect/slp-widen-mult-half.c: Likewise.
> 	* gcc.dg/vect/vect-cselim-1.c: Likewise.
> 	* gcc.dg/vect/slp-25.c: Remove XFAIL for variable-length SVE.
> 	* gcc.dg/vect/slp-perm-5.c: Likewise.
> 	* gcc.dg/vect/slp-perm-6.c: Likewise.
> 	* gcc.dg/vect/slp-perm-9.c: Likewise.
> 	* gcc.dg/vect/vect-119.c: Likewise.
> 	* gcc.dg/vect/vect-live-slp-1.c: Likewise.
> 	* gcc.dg/vect/vect-live-slp-2.c: Likewise.
> 	* gcc.dg/vect/vect-live-slp-3.c: Likewise.
> 	* gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise.
> 	* gcc.dg/vect/vect-over-widen-1.c: Likewise.
> 	* gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
> 	* gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
> 	* gcc.dg/vect/vect-over-widen-4.c: Likewise.
> 	* gcc.dg/vect/slp-reduc-6.c: Remove XFAIL for variable-length vectors.
> 	* gcc.dg/vect/vect-load-lanes-peeling-1.c: Expect an epilogue loop
> 	for variable-length vectors.
OK once the set and any prereqs are approved.

jeff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [1/4] Give the target more control over ARRAY_TYPE modes
  2017-11-08 15:13 ` [1/4] Give the target more control over ARRAY_TYPE modes Richard Sandiford
@ 2017-11-21 16:38   ` Jeff Law
  0 siblings, 0 replies; 9+ messages in thread
From: Jeff Law @ 2017-11-21 16:38 UTC (permalink / raw)
  To: gcc-patches, richard.earnshaw, james.greenhalgh,
	marcus.shawcroft, richard.sandiford

On 11/08/2017 08:12 AM, Richard Sandiford wrote:
> So far we've used integer modes for LD[234] and ST[234] arrays.
> That doesn't scale well to SVE, since the sizes aren't fixed at
> compile time (and even if they were, we wouldn't want integers
> to be so wide).
> 
> This patch lets the target use double-, triple- and quadruple-length
> vectors instead.
> 
> 
> 2017-11-08  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* target.def (array_mode): New target hook.
> 	* doc/tm.texi.in (TARGET_ARRAY_MODE): New hook.
> 	* doc/tm.texi: Regenerate.
> 	* hooks.h (hook_optmode_mode_uhwi_none): Declare.
> 	* hooks.c (hook_optmode_mode_uhwi_none): New function.
> 	* tree-vect-data-refs.c (vect_lanes_optab_supported_p): Use
> 	targetm.array_mode.
> 	* stor-layout.c (mode_for_array): Likewise.  Support polynomial
> 	type sizes.
> 
Whoops.  I'd started, but not completed review on this one a few days ago.

OK.  I think this covers the target independent bits from the series, right?

jeff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [2/4] [AArch64] SVE load/store_lanes support
  2017-11-08 15:16 ` [2/4] [AArch64] SVE load/store_lanes support Richard Sandiford
@ 2018-01-06 19:45   ` Richard Sandiford
  0 siblings, 0 replies; 9+ messages in thread
From: Richard Sandiford @ 2018-01-06 19:45 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft

Both a ping and a repost with the new VNx names.  See:

   https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00592.html

for the full series.

Thanks,
Richard

---

This patch adds support for SVE LD[234], ST[234] and associated
structure modes.  Unlike Advanced SIMD, these modes are extra-long
vector modes instead of integer modes.
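
As an illustration only (a sketch, not taken from the patch or its tests),
the loop below shows the kind of interleaved access that load/store lanes
are meant to cover.  The function name swap_pairs is hypothetical.  Whether
the vectoriser actually chooses LD2 and ST2 for it is an assumption that
depends on the target flags and the cost model.

```c
#include <stddef.h>

/* Hypothetical example: each iteration reads one two-element "structure"
   and writes it back with the elements swapped.  With load/store-lanes
   support, a vectoriser could in principle use LD2 to split the even and
   odd elements into two vectors and ST2 to re-interleave them.  */
void
swap_pairs (int *restrict dst, const int *restrict src, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    {
      dst[i * 2] = src[i * 2 + 1];
      dst[i * 2 + 1] = src[i * 2];
    }
}
```

Here n counts pairs, so both arrays need at least 2 * n elements.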

2017-11-06  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* config/aarch64/aarch64-modes.def: Define x2, x3 and x4 vector
	modes for SVE.
	* config/aarch64/aarch64-protos.h
	(aarch64_sve_struct_memory_operand_p): Declare.
	* config/aarch64/iterators.md (SVE_STRUCT): New mode iterator.
	(vector_count, insn_length, VSINGLE, vsingle): New mode attributes.
	(VPRED, vpred): Handle SVE structure modes.
	* config/aarch64/constraints.md (Utx): New constraint.
	* config/aarch64/predicates.md (aarch64_sve_struct_memory_operand)
	(aarch64_sve_struct_nonimmediate_operand): New predicates.
	* config/aarch64/aarch64.md (UNSPEC_LDN, UNSPEC_STN): New unspecs.
	* config/aarch64/aarch64-sve.md (mov<mode>, *aarch64_sve_mov<mode>_le)
	(*aarch64_sve_mov<mode>_be, pred_mov<mode>): New patterns for
	structure modes.  Split into pieces after RA.
	(vec_load_lanes<mode><vsingle>, vec_mask_load_lanes<mode><vsingle>)
	(vec_store_lanes<mode><vsingle>, vec_mask_store_lanes<mode><vsingle>):
	New patterns.
	* config/aarch64/aarch64.c (aarch64_classify_vector_mode): Handle
	SVE structure modes.
	(aarch64_classify_address): Likewise.
	(sizetochar): Move earlier in file.
	(aarch64_print_operand): Handle SVE register lists.
	(aarch64_array_mode): New function.
	(aarch64_sve_struct_memory_operand_p): Likewise.
	(TARGET_ARRAY_MODE): Redefine.

Index: gcc/config/aarch64/aarch64-modes.def
===================================================================
--- gcc/config/aarch64/aarch64-modes.def	2017-12-22 16:00:58.471012631 +0000
+++ gcc/config/aarch64/aarch64-modes.def	2017-12-22 16:01:42.042358758 +0000
@@ -87,6 +87,9 @@ INT_MODE (XI, 64);
 /* Give SVE vectors the names normally used for 256-bit vectors.
    The actual number depends on command-line flags.  */
 SVE_MODES (1, VNx16, VNx8, VNx4, VNx2)
+SVE_MODES (2, VNx32, VNx16, VNx8, VNx4)
+SVE_MODES (3, VNx48, VNx24, VNx12, VNx6)
+SVE_MODES (4, VNx64, VNx32, VNx16, VNx8)
 
 /* Quad float: 128-bit floating mode for long doubles.  */
 FLOAT_MODE (TF, 16, ieee_quad_format);
Index: gcc/config/aarch64/aarch64-protos.h
===================================================================
--- gcc/config/aarch64/aarch64-protos.h	2017-12-22 16:00:58.471012631 +0000
+++ gcc/config/aarch64/aarch64-protos.h	2017-12-22 16:01:42.043358720 +0000
@@ -432,6 +432,7 @@ rtx aarch64_simd_gen_const_vector_dup (m
 bool aarch64_simd_mem_operand_p (rtx);
 bool aarch64_sve_ld1r_operand_p (rtx);
 bool aarch64_sve_ldr_operand_p (rtx);
+bool aarch64_sve_struct_memory_operand_p (rtx);
 rtx aarch64_simd_vect_par_cnst_half (machine_mode, int, bool);
 rtx aarch64_tls_get_addr (void);
 tree aarch64_fold_builtin (tree, int, tree *, bool);
Index: gcc/config/aarch64/iterators.md
===================================================================
--- gcc/config/aarch64/iterators.md	2017-12-22 16:00:58.477012402 +0000
+++ gcc/config/aarch64/iterators.md	2017-12-22 16:01:42.045358644 +0000
@@ -250,6 +250,14 @@ (define_mode_iterator VMUL_CHANGE_NLANES
 (define_mode_iterator SVE_ALL [VNx16QI VNx8HI VNx4SI VNx2DI
 			       VNx8HF VNx4SF VNx2DF])
 
+;; All SVE vector structure modes.
+(define_mode_iterator SVE_STRUCT [VNx32QI VNx16HI VNx8SI VNx4DI
+				  VNx16HF VNx8SF VNx4DF
+				  VNx48QI VNx24HI VNx12SI VNx6DI
+				  VNx24HF VNx12SF VNx6DF
+				  VNx64QI VNx32HI VNx16SI VNx8DI
+				  VNx32HF VNx16SF VNx8DF])
+
 ;; All SVE vector modes that have 8-bit or 16-bit elements.
 (define_mode_iterator SVE_BH [VNx16QI VNx8HI VNx8HF])
 
@@ -587,9 +595,16 @@ (define_mode_attr Vetype [(V8QI "b") (V1
 
 ;; Equivalent of "size" for a vector element.
 (define_mode_attr Vesize [(VNx16QI "b")
-			  (VNx8HI  "h") (VNx8HF "h")
-			  (VNx4SI  "w") (VNx4SF "w")
-			  (VNx2DI  "d") (VNx2DF "d")])
+			  (VNx8HI  "h") (VNx8HF  "h")
+			  (VNx4SI  "w") (VNx4SF  "w")
+			  (VNx2DI  "d") (VNx2DF  "d")
+			  (VNx32QI "b") (VNx48QI "b") (VNx64QI "b")
+			  (VNx16HI "h") (VNx24HI "h") (VNx32HI "h")
+			  (VNx16HF "h") (VNx24HF "h") (VNx32HF "h")
+			  (VNx8SI  "w") (VNx12SI "w") (VNx16SI "w")
+			  (VNx8SF  "w") (VNx12SF "w") (VNx16SF "w")
+			  (VNx4DI  "d") (VNx6DI  "d") (VNx8DI  "d")
+			  (VNx4DF  "d") (VNx6DF  "d") (VNx8DF  "d")])
 
 ;; Vetype is used everywhere in scheduling type and assembly output,
 ;; sometimes they are not the same, for example HF modes on some
@@ -957,17 +972,93 @@ (define_mode_attr insn_count [(OI "8") (
 ;; No need of iterator for -fPIC as it use got_lo12 for both modes.
 (define_mode_attr got_modifier [(SI "gotpage_lo14") (DI "gotpage_lo15")])
 
-;; The predicate mode associated with an SVE data mode.
+;; The number of subvectors in an SVE_STRUCT.
+(define_mode_attr vector_count [(VNx32QI "2") (VNx16HI "2")
+				(VNx8SI  "2") (VNx4DI  "2")
+				(VNx16HF "2") (VNx8SF  "2") (VNx4DF "2")
+				(VNx48QI "3") (VNx24HI "3")
+				(VNx12SI "3") (VNx6DI  "3")
+				(VNx24HF "3") (VNx12SF "3") (VNx6DF "3")
+				(VNx64QI "4") (VNx32HI "4")
+				(VNx16SI "4") (VNx8DI  "4")
+				(VNx32HF "4") (VNx16SF "4") (VNx8DF "4")])
+
+;; The number of instruction bytes needed for an SVE_STRUCT move.  This is
+;; equal to vector_count * 4.
+(define_mode_attr insn_length [(VNx32QI "8")  (VNx16HI "8")
+			       (VNx8SI  "8")  (VNx4DI  "8")
+			       (VNx16HF "8")  (VNx8SF  "8")  (VNx4DF "8")
+			       (VNx48QI "12") (VNx24HI "12")
+			       (VNx12SI "12") (VNx6DI  "12")
+			       (VNx24HF "12") (VNx12SF "12") (VNx6DF "12")
+			       (VNx64QI "16") (VNx32HI "16")
+			       (VNx16SI "16") (VNx8DI  "16")
+			       (VNx32HF "16") (VNx16SF "16") (VNx8DF "16")])
+
+;; The type of a subvector in an SVE_STRUCT.
+(define_mode_attr VSINGLE [(VNx32QI "VNx16QI")
+			   (VNx16HI "VNx8HI") (VNx16HF "VNx8HF")
+			   (VNx8SI "VNx4SI") (VNx8SF "VNx4SF")
+			   (VNx4DI "VNx2DI") (VNx4DF "VNx2DF")
+			   (VNx48QI "VNx16QI")
+			   (VNx24HI "VNx8HI") (VNx24HF "VNx8HF")
+			   (VNx12SI "VNx4SI") (VNx12SF "VNx4SF")
+			   (VNx6DI "VNx2DI") (VNx6DF "VNx2DF")
+			   (VNx64QI "VNx16QI")
+			   (VNx32HI "VNx8HI") (VNx32HF "VNx8HF")
+			   (VNx16SI "VNx4SI") (VNx16SF "VNx4SF")
+			   (VNx8DI "VNx2DI") (VNx8DF "VNx2DF")])
+
+;; ...and again in lower case.
+(define_mode_attr vsingle [(VNx32QI "vnx16qi")
+			   (VNx16HI "vnx8hi") (VNx16HF "vnx8hf")
+			   (VNx8SI "vnx4si") (VNx8SF "vnx4sf")
+			   (VNx4DI "vnx2di") (VNx4DF "vnx2df")
+			   (VNx48QI "vnx16qi")
+			   (VNx24HI "vnx8hi") (VNx24HF "vnx8hf")
+			   (VNx12SI "vnx4si") (VNx12SF "vnx4sf")
+			   (VNx6DI "vnx2di") (VNx6DF "vnx2df")
+			   (VNx64QI "vnx16qi")
+			   (VNx32HI "vnx8hi") (VNx32HF "vnx8hf")
+			   (VNx16SI "vnx4si") (VNx16SF "vnx4sf")
+			   (VNx8DI "vnx2di") (VNx8DF "vnx2df")])
+
+;; The predicate mode associated with an SVE data mode.  For structure modes
+;; this is equivalent to the <VPRED> of the subvector mode.
 (define_mode_attr VPRED [(VNx16QI "VNx16BI")
 			 (VNx8HI "VNx8BI") (VNx8HF "VNx8BI")
 			 (VNx4SI "VNx4BI") (VNx4SF "VNx4BI")
-			 (VNx2DI "VNx2BI") (VNx2DF "VNx2BI")])
+			 (VNx2DI "VNx2BI") (VNx2DF "VNx2BI")
+			 (VNx32QI "VNx16BI")
+			 (VNx16HI "VNx8BI") (VNx16HF "VNx8BI")
+			 (VNx8SI "VNx4BI") (VNx8SF "VNx4BI")
+			 (VNx4DI "VNx2BI") (VNx4DF "VNx2BI")
+			 (VNx48QI "VNx16BI")
+			 (VNx24HI "VNx8BI") (VNx24HF "VNx8BI")
+			 (VNx12SI "VNx4BI") (VNx12SF "VNx4BI")
+			 (VNx6DI "VNx2BI") (VNx6DF "VNx2BI")
+			 (VNx64QI "VNx16BI")
+			 (VNx32HI "VNx8BI") (VNx32HF "VNx8BI")
+			 (VNx16SI "VNx4BI") (VNx16SF "VNx4BI")
+			 (VNx8DI "VNx2BI") (VNx8DF "VNx2BI")])
 
 ;; ...and again in lower case.
 (define_mode_attr vpred [(VNx16QI "vnx16bi")
 			 (VNx8HI "vnx8bi") (VNx8HF "vnx8bi")
 			 (VNx4SI "vnx4bi") (VNx4SF "vnx4bi")
-			 (VNx2DI "vnx2bi") (VNx2DF "vnx2bi")])
+			 (VNx2DI "vnx2bi") (VNx2DF "vnx2bi")
+			 (VNx32QI "vnx16bi")
+			 (VNx16HI "vnx8bi") (VNx16HF "vnx8bi")
+			 (VNx8SI "vnx4bi") (VNx8SF "vnx4bi")
+			 (VNx4DI "vnx2bi") (VNx4DF "vnx2bi")
+			 (VNx48QI "vnx16bi")
+			 (VNx24HI "vnx8bi") (VNx24HF "vnx8bi")
+			 (VNx12SI "vnx4bi") (VNx12SF "vnx4bi")
+			 (VNx6DI "vnx2bi") (VNx6DF "vnx2bi")
+			 (VNx64QI "vnx16bi")
+			 (VNx32HI "vnx8bi") (VNx32HF "vnx8bi")
+			 (VNx16SI "vnx4bi") (VNx16SF "vnx4bi")
+			 (VNx8DI "vnx2bi") (VNx8DF "vnx2bi")])
 
 ;; -------------------------------------------------------------------
 ;; Code Iterators
Index: gcc/config/aarch64/constraints.md
===================================================================
--- gcc/config/aarch64/constraints.md	2017-12-22 16:00:58.476012440 +0000
+++ gcc/config/aarch64/constraints.md	2017-12-22 16:01:42.045358644 +0000
@@ -237,6 +237,12 @@ (define_memory_constraint "Uty"
   (and (match_code "mem")
        (match_test "aarch64_sve_ld1r_operand_p (op)")))
 
+(define_memory_constraint "Utx"
+  "@internal
+   An address valid for SVE structure mov patterns (as distinct from
+   LD[234] and ST[234] patterns)."
+  (match_operand 0 "aarch64_sve_struct_memory_operand"))
+
 (define_constraint "Ufc"
   "A floating point constant which can be used with an\
    FMOV immediate operation."
Index: gcc/config/aarch64/predicates.md
===================================================================
--- gcc/config/aarch64/predicates.md	2017-12-22 16:00:58.477012402 +0000
+++ gcc/config/aarch64/predicates.md	2017-12-22 16:01:42.045358644 +0000
@@ -482,6 +482,14 @@ (define_predicate "aarch64_sve_general_o
 	    (match_operand 0 "aarch64_sve_ldr_operand")
 	    (match_test "aarch64_mov_operand_p (op, mode)"))))
 
+(define_predicate "aarch64_sve_struct_memory_operand"
+  (and (match_code "mem")
+       (match_test "aarch64_sve_struct_memory_operand_p (op)")))
+
+(define_predicate "aarch64_sve_struct_nonimmediate_operand"
+  (ior (match_operand 0 "register_operand")
+       (match_operand 0 "aarch64_sve_struct_memory_operand")))
+
 ;; Doesn't include immediates, since those are handled by the move
 ;; patterns instead.
 (define_predicate "aarch64_sve_dup_operand"
Index: gcc/config/aarch64/aarch64.md
===================================================================
--- gcc/config/aarch64/aarch64.md	2017-12-22 16:00:58.476012440 +0000
+++ gcc/config/aarch64/aarch64.md	2017-12-22 16:01:42.045358644 +0000
@@ -161,6 +161,8 @@ (define_c_enum "unspec" [
     UNSPEC_PACK
     UNSPEC_FLOAT_CONVERT
     UNSPEC_WHILE_LO
+    UNSPEC_LDN
+    UNSPEC_STN
 ])
 
 (define_c_enum "unspecv" [
Index: gcc/config/aarch64/aarch64-sve.md
===================================================================
--- gcc/config/aarch64/aarch64-sve.md	2017-12-22 16:00:58.471012631 +0000
+++ gcc/config/aarch64/aarch64-sve.md	2017-12-22 16:01:42.043358720 +0000
@@ -189,6 +189,105 @@ (define_insn "maskstore<mode><vpred>"
   "st1<Vesize>\t%1.<Vetype>, %2, %0"
 )
 
+;; SVE structure moves.
+(define_expand "mov<mode>"
+  [(set (match_operand:SVE_STRUCT 0 "nonimmediate_operand")
+	(match_operand:SVE_STRUCT 1 "general_operand"))]
+  "TARGET_SVE"
+  {
+    /* Big-endian loads and stores need to be done via LD1 and ST1;
+       see the comment at the head of the file for details.  */
+    if ((MEM_P (operands[0]) || MEM_P (operands[1]))
+	&& BYTES_BIG_ENDIAN)
+      {
+	gcc_assert (can_create_pseudo_p ());
+	aarch64_expand_sve_mem_move (operands[0], operands[1], <VPRED>mode);
+	DONE;
+      }
+
+    if (CONSTANT_P (operands[1]))
+      {
+	aarch64_expand_mov_immediate (operands[0], operands[1]);
+	DONE;
+      }
+  }
+)
+
+;; Unpredicated structure moves (little-endian).
+(define_insn "*aarch64_sve_mov<mode>_le"
+  [(set (match_operand:SVE_STRUCT 0 "aarch64_sve_nonimmediate_operand" "=w, Utr, w, w")
+	(match_operand:SVE_STRUCT 1 "aarch64_sve_general_operand" "Utr, w, w, Dn"))]
+  "TARGET_SVE && !BYTES_BIG_ENDIAN"
+  "#"
+  [(set_attr "length" "<insn_length>")]
+)
+
+;; Unpredicated structure moves (big-endian).  Memory accesses require
+;; secondary reloads.
+(define_insn "*aarch64_sve_mov<mode>_be"
+  [(set (match_operand:SVE_STRUCT 0 "register_operand" "=w, w")
+	(match_operand:SVE_STRUCT 1 "aarch64_nonmemory_operand" "w, Dn"))]
+  "TARGET_SVE && BYTES_BIG_ENDIAN"
+  "#"
+  [(set_attr "length" "<insn_length>")]
+)
+
+;; Split unpredicated structure moves into pieces.  This is the same
+;; for both big-endian and little-endian code, although it only needs
+;; to handle memory operands for little-endian code.
+(define_split
+  [(set (match_operand:SVE_STRUCT 0 "aarch64_sve_nonimmediate_operand")
+	(match_operand:SVE_STRUCT 1 "aarch64_sve_general_operand"))]
+  "TARGET_SVE && reload_completed"
+  [(const_int 0)]
+  {
+    rtx dest = operands[0];
+    rtx src = operands[1];
+    if (REG_P (dest) && REG_P (src))
+      aarch64_simd_emit_reg_reg_move (operands, <VSINGLE>mode, <vector_count>);
+    else
+      for (unsigned int i = 0; i < <vector_count>; ++i)
+	{
+	  rtx subdest = simplify_gen_subreg (<VSINGLE>mode, dest, <MODE>mode,
+					     i * BYTES_PER_SVE_VECTOR);
+	  rtx subsrc = simplify_gen_subreg (<VSINGLE>mode, src, <MODE>mode,
+					    i * BYTES_PER_SVE_VECTOR);
+	  emit_insn (gen_rtx_SET (subdest, subsrc));
+	}
+    DONE;
+  }
+)
+
+;; Predicated structure moves.  This works for both endiannesses but in
+;; practice is only useful for big-endian.
+(define_insn_and_split "pred_mov<mode>"
+  [(set (match_operand:SVE_STRUCT 0 "aarch64_sve_struct_nonimmediate_operand" "=w, Utx")
+	(unspec:SVE_STRUCT
+	  [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+	   (match_operand:SVE_STRUCT 2 "aarch64_sve_struct_nonimmediate_operand" "Utx, w")]
+	  UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE
+   && (register_operand (operands[0], <MODE>mode)
+       || register_operand (operands[2], <MODE>mode))"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+  {
+    for (unsigned int i = 0; i < <vector_count>; ++i)
+      {
+	rtx subdest = simplify_gen_subreg (<VSINGLE>mode, operands[0],
+					   <MODE>mode,
+					   i * BYTES_PER_SVE_VECTOR);
+	rtx subsrc = simplify_gen_subreg (<VSINGLE>mode, operands[2],
+					  <MODE>mode,
+					  i * BYTES_PER_SVE_VECTOR);
+	aarch64_emit_sve_pred_move (subdest, operands[1], subsrc);
+      }
+    DONE;
+  }
+  [(set_attr "length" "<insn_length>")]
+)
+
 (define_expand "mov<mode>"
   [(set (match_operand:PRED_ALL 0 "nonimmediate_operand")
 	(match_operand:PRED_ALL 1 "general_operand"))]
@@ -460,6 +559,60 @@ (define_insn "*vec_series<mode>_plus"
   }
 )
 
+;; Unpredicated LD[234].
+(define_expand "vec_load_lanes<mode><vsingle>"
+  [(set (match_operand:SVE_STRUCT 0 "register_operand")
+	(unspec:SVE_STRUCT
+	  [(match_dup 2)
+	   (match_operand:SVE_STRUCT 1 "memory_operand")]
+	  UNSPEC_LDN))]
+  "TARGET_SVE"
+  {
+    operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+  }
+)
+
+;; Predicated LD[234].
+(define_insn "vec_mask_load_lanes<mode><vsingle>"
+  [(set (match_operand:SVE_STRUCT 0 "register_operand" "=w")
+	(unspec:SVE_STRUCT
+	  [(match_operand:<VPRED> 2 "register_operand" "Upl")
+	   (match_operand:SVE_STRUCT 1 "memory_operand" "m")]
+	  UNSPEC_LDN))]
+  "TARGET_SVE"
+  "ld<vector_count><Vesize>\t%0, %2/z, %1"
+)
+
+;; Unpredicated ST[234].  This is always a full update, so the dependence
+;; on the old value of the memory location (via (match_dup 0)) is redundant.
+;; There doesn't seem to be any obvious benefit to treating the all-true
+;; case differently though.  In particular, it's very unlikely that we'll
+;; only find out during RTL that a store_lanes is dead.
+(define_expand "vec_store_lanes<mode><vsingle>"
+  [(set (match_operand:SVE_STRUCT 0 "memory_operand")
+	(unspec:SVE_STRUCT
+	  [(match_dup 2)
+	   (match_operand:SVE_STRUCT 1 "register_operand")
+	   (match_dup 0)]
+	  UNSPEC_STN))]
+  "TARGET_SVE"
+  {
+    operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+  }
+)
+
+;; Predicated ST[234].
+(define_insn "vec_mask_store_lanes<mode><vsingle>"
+  [(set (match_operand:SVE_STRUCT 0 "memory_operand" "+m")
+	(unspec:SVE_STRUCT
+	  [(match_operand:<VPRED> 2 "register_operand" "Upl")
+	   (match_operand:SVE_STRUCT 1 "register_operand" "w")
+	   (match_dup 0)]
+	  UNSPEC_STN))]
+  "TARGET_SVE"
+  "st<vector_count><Vesize>\t%1, %2, %0"
+)
+
 (define_expand "vec_perm<mode>"
   [(match_operand:SVE_ALL 0 "register_operand")
    (match_operand:SVE_ALL 1 "register_operand")
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	2017-12-22 16:00:42.829606965 +0000
+++ gcc/config/aarch64/aarch64.c	2017-12-22 16:01:42.044358682 +0000
@@ -1178,9 +1178,15 @@ aarch64_classify_vector_mode (machine_mo
 	  || inner == DImode
 	  || inner == DFmode))
     {
-      if (TARGET_SVE
-	  && known_eq (GET_MODE_BITSIZE (mode), BITS_PER_SVE_VECTOR))
-	return VEC_SVE_DATA;
+      if (TARGET_SVE)
+	{
+	  if (known_eq (GET_MODE_BITSIZE (mode), BITS_PER_SVE_VECTOR))
+	    return VEC_SVE_DATA;
+	  if (known_eq (GET_MODE_BITSIZE (mode), BITS_PER_SVE_VECTOR * 2)
+	      || known_eq (GET_MODE_BITSIZE (mode), BITS_PER_SVE_VECTOR * 3)
+	      || known_eq (GET_MODE_BITSIZE (mode), BITS_PER_SVE_VECTOR * 4))
+	    return VEC_SVE_DATA | VEC_STRUCT;
+	}
 
       /* This includes V1DF but not V1DI (which doesn't exist).  */
       if (TARGET_SIMD
@@ -1208,6 +1214,18 @@ aarch64_sve_data_mode_p (machine_mode mo
   return aarch64_classify_vector_mode (mode) & VEC_SVE_DATA;
 }
 
+/* Implement target hook TARGET_ARRAY_MODE.  */
+static opt_machine_mode
+aarch64_array_mode (machine_mode mode, unsigned HOST_WIDE_INT nelems)
+{
+  if (aarch64_classify_vector_mode (mode) == VEC_SVE_DATA
+      && IN_RANGE (nelems, 2, 4))
+    return mode_for_vector (GET_MODE_INNER (mode),
+			    GET_MODE_NUNITS (mode) * nelems);
+
+  return opt_machine_mode ();
+}
+
 /* Implement target hook TARGET_ARRAY_MODE_SUPPORTED_P.  */
 static bool
 aarch64_array_mode_supported_p (machine_mode mode,
@@ -5778,6 +5796,18 @@ aarch64_classify_address (struct aarch64
 		    ? offset_4bit_signed_scaled_p (mode, offset)
 		    : offset_9bit_signed_scaled_p (mode, offset));
 
+	  if (vec_flags == (VEC_SVE_DATA | VEC_STRUCT))
+	    {
+	      poly_int64 end_offset = (offset
+				       + GET_MODE_SIZE (mode)
+				       - BYTES_PER_SVE_VECTOR);
+	      return (type == ADDR_QUERY_M
+		      ? offset_4bit_signed_scaled_p (mode, offset)
+		      : (offset_9bit_signed_scaled_p (SVE_BYTE_MODE, offset)
+			 && offset_9bit_signed_scaled_p (SVE_BYTE_MODE,
+							 end_offset)));
+	    }
+
 	  if (vec_flags == VEC_SVE_PRED)
 	    return offset_9bit_signed_scaled_p (mode, offset);
 
@@ -6490,6 +6520,20 @@ aarch64_print_vector_float_operand (FILE
   return true;
 }
 
+/* Return the equivalent letter for size.  */
+static char
+sizetochar (int size)
+{
+  switch (size)
+    {
+    case 64: return 'd';
+    case 32: return 's';
+    case 16: return 'h';
+    case 8 : return 'b';
+    default: gcc_unreachable ();
+    }
+}
+
 /* Print operand X to file F in a target specific manner according to CODE.
    The acceptable formatting commands given by CODE are:
      'c':		An integer or symbol address without a preceding #
@@ -6777,7 +6821,18 @@ aarch64_print_operand (FILE *f, rtx x, i
 	{
 	case REG:
 	  if (aarch64_sve_data_mode_p (GET_MODE (x)))
-	    asm_fprintf (f, "z%d", REGNO (x) - V0_REGNUM);
+	    {
+	      if (REG_NREGS (x) == 1)
+		asm_fprintf (f, "z%d", REGNO (x) - V0_REGNUM);
+	      else
+		{
+		  char suffix
+		    = sizetochar (GET_MODE_UNIT_BITSIZE (GET_MODE (x)));
+		  asm_fprintf (f, "{z%d.%c - z%d.%c}",
+			       REGNO (x) - V0_REGNUM, suffix,
+			       END_REGNO (x) - V0_REGNUM - 1, suffix);
+		}
+	    }
 	  else
 	    asm_fprintf (f, "%s", reg_names [REGNO (x)]);
 	  break;
@@ -12952,20 +13007,6 @@ aarch64_final_prescan_insn (rtx_insn *in
 }
 
 
-/* Return the equivalent letter for size.  */
-static char
-sizetochar (int size)
-{
-  switch (size)
-    {
-    case 64: return 'd';
-    case 32: return 's';
-    case 16: return 'h';
-    case 8 : return 'b';
-    default: gcc_unreachable ();
-    }
-}
-
 /* Return true if BASE_OR_STEP is a valid immediate operand for an SVE INDEX
    instruction.  */
 
@@ -13560,6 +13601,28 @@ aarch64_sve_ldr_operand_p (rtx op)
 	  && addr.type == ADDRESS_REG_IMM);
 }
 
+/* Return true if OP is a valid MEM operand for an SVE_STRUCT mode.
+   We need to be able to access the individual pieces, so the range
+   is different from LD[234] and ST[234].  */
+bool
+aarch64_sve_struct_memory_operand_p (rtx op)
+{
+  if (!MEM_P (op))
+    return false;
+
+  machine_mode mode = GET_MODE (op);
+  struct aarch64_address_info addr;
+  if (!aarch64_classify_address (&addr, XEXP (op, 0), SVE_BYTE_MODE, false,
+				 ADDR_QUERY_ANY)
+      || addr.type != ADDRESS_REG_IMM)
+    return false;
+
+  poly_int64 first = addr.const_offset;
+  poly_int64 last = first + GET_MODE_SIZE (mode) - BYTES_PER_SVE_VECTOR;
+  return (offset_4bit_signed_scaled_p (SVE_BYTE_MODE, first)
+	  && offset_4bit_signed_scaled_p (SVE_BYTE_MODE, last));
+}
+
 /* Emit a register copy from operand to operand, taking care not to
    early-clobber source registers in the process.
 
@@ -17629,6 +17692,9 @@ #define TARGET_VECTOR_MODE_SUPPORTED_P a
 #define TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT \
   aarch64_builtin_support_vector_misalignment
 
+#undef TARGET_ARRAY_MODE
+#define TARGET_ARRAY_MODE aarch64_array_mode
+
 #undef TARGET_ARRAY_MODE_SUPPORTED_P
 #define TARGET_ARRAY_MODE_SUPPORTED_P aarch64_array_mode_supported_p
 

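A note on the range check in aarch64_sve_struct_memory_operand_p: the
following is a simplified standalone model, not the GCC code.  It assumes
the offset is already a whole number of vector lengths and that a "4-bit
signed scaled" offset means a multiple in the range [-8, 7].  The names
fits_4bit_signed and struct_move_offset_ok are hypothetical.

```c
#include <stdbool.h>

/* Assumed range of a 4-bit signed immediate, in units of one vector.  */
static bool
fits_4bit_signed (long multiple)
{
  return multiple >= -8 && multiple <= 7;
}

/* Model of the check: a structure move is split into VECTOR_COUNT
   single-vector accesses, so both the first and the last piece must
   have an encodable offset.  FIRST_VL is the offset of the first piece
   in units of the vector length.  */
bool
struct_move_offset_ok (long first_vl, unsigned int vector_count)
{
  long last_vl = first_vl + (long) vector_count - 1;
  return fits_4bit_signed (first_vl) && fits_4bit_signed (last_vl);
}
```

Under these assumptions a two-vector move at offset 6 is accepted, while
the same move at offset 7 is rejected because its second piece would need
offset 8.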
^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [4/4] [AArch64] Tests for SVE structure modes
  2017-11-08 15:18 ` [4/4] [AArch64] Tests for SVE structure modes Richard Sandiford
@ 2018-01-06 19:46   ` Richard Sandiford
  0 siblings, 0 replies; 9+ messages in thread
From: Richard Sandiford @ 2018-01-06 19:46 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft

Ping

Richard Sandiford <richard.sandiford@linaro.org> writes:
> This patch adds tests for the SVE structure mode move patterns
> and for LD[234] and ST[234] vectorisation.
>
>
> 2017-11-08  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
>
> gcc/testsuite/
> 	* gcc.target/aarch64/sve_struct_move_1.c: New test.
> 	* gcc.target/aarch64/sve_struct_move_2.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_move_3.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_move_4.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_move_5.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_move_6.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_vect_1.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_vect_1_run.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_vect_2.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_vect_2_run.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_vect_3.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_vect_3_run.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_vect_4.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_vect_4_run.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_vect_5.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_vect_5_run.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_vect_6.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_vect_6_run.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_vect_7.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_vect_7_run.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_vect_8.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_vect_8_run.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_vect_9.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_vect_9_run.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_vect_10.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_vect_10_run.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_vect_11.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_vect_11_run.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_vect_12.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_vect_12_run.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_vect_13.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_vect_13_run.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_vect_14.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_vect_15.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_vect_16.c: Likewise.
> 	* gcc.target/aarch64/sve_struct_vect_17.c: Likewise.

Index: gcc/testsuite/gcc.target/aarch64/sve_struct_move_1.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_move_1.c	2017-11-08 15:06:27.247849138 +0000
@@ -0,0 +1,129 @@
+/* { dg-do assemble } */
+/* { dg-options "-O -march=armv8-a+sve -msve-vector-bits=256 -mbig-endian --save-temps" } */
+
+typedef char v32qi __attribute__((vector_size(32)));
+typedef struct { v32qi a[2]; } v64qi;
+
+typedef short v16hi __attribute__((vector_size(32)));
+typedef struct { v16hi a[2]; } v32hi;
+
+typedef int v8si __attribute__((vector_size(32)));
+typedef struct { v8si a[2]; } v16si;
+
+typedef long v4di __attribute__((vector_size(32)));
+typedef struct { v4di a[2]; } v8di;
+
+typedef _Float16 v16hf __attribute__((vector_size(32)));
+typedef struct { v16hf a[2]; } v32hf;
+
+typedef float v8sf __attribute__((vector_size(32)));
+typedef struct { v8sf a[2]; } v16sf;
+
+typedef double v4df __attribute__((vector_size(32)));
+typedef struct { v4df a[2]; } v8df;
+
+#define TEST_TYPE(TYPE, REG1, REG2)			\
+  void							\
+  f1_##TYPE (TYPE *a)					\
+  {							\
+    register TYPE x asm (#REG1) = a[0];			\
+    asm volatile ("# test " #TYPE " 1 %S0" :: "w" (x));	\
+    register TYPE y asm (#REG2) = x;			\
+    asm volatile ("# test " #TYPE " 2 %S0, %S1, %S2"	\
+		  : "=&w" (x) : "0" (x), "w" (y));	\
+    a[1] = x;						\
+  }							\
+  /* This must compile, but we don't care how.  */	\
+  void							\
+  f2_##TYPE (TYPE *a)					\
+  {							\
+    TYPE x = a[0];					\
+    x.a[0][3] = 1;					\
+    x.a[1][2] = 12;					\
+    asm volatile ("# %0" :: "w" (x));			\
+  }							\
+  void							\
+  f3_##TYPE (TYPE *a, int i)				\
+  {							\
+    TYPE x = a[0];					\
+    x.a[0][i] = 1;					\
+    asm volatile ("# %0" :: "w" (x));			\
+  }							\
+  void							\
+  f4_##TYPE (TYPE *a, int i, int j)			\
+  {							\
+    TYPE x = a[0];					\
+    x.a[i][j] = 44;					\
+    asm volatile ("# %0" :: "w" (x));			\
+  }
+
+TEST_TYPE (v64qi, z0, z2)
+TEST_TYPE (v32hi, z5, z7)
+TEST_TYPE (v16si, z10, z12)
+TEST_TYPE (v8di, z15, z17)
+TEST_TYPE (v32hf, z18, z20)
+TEST_TYPE (v16sf, z21, z23)
+TEST_TYPE (v8df, z28, z30)
+
+/* { dg-final { scan-assembler {\tld1b\tz0.b, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1b\tz1.b, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v64qi 1 z0\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz2.d, z0.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz3.d, z1.d\n} } } */
+/* { dg-final { scan-assembler { test v64qi 2 z0, z0, z2\n} } } */
+/* { dg-final { scan-assembler {\tst1b\tz0.b, p[0-7], \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1b\tz1.b, p[0-7], \[x0, #3, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1h\tz5.h, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz6.h, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v32hi 1 z5\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz7.d, z5.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz8.d, z6.d\n} } } */
+/* { dg-final { scan-assembler { test v32hi 2 z5, z5, z7\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz5.h, p[0-7], \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz6.h, p[0-7], \[x0, #3, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1w\tz10.s, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz11.s, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v16si 1 z10\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz12.d, z10.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz13.d, z11.d\n} } } */
+/* { dg-final { scan-assembler { test v16si 2 z10, z10, z12\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz10.s, p[0-7], \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz11.s, p[0-7], \[x0, #3, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1d\tz15.d, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\tz16.d, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v8di 1 z15\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz17.d, z15.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz18.d, z16.d\n} } } */
+/* { dg-final { scan-assembler { test v8di 2 z15, z15, z17\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz15.d, p[0-7], \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz16.d, p[0-7], \[x0, #3, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1h\tz18.h, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz19.h, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v32hf 1 z18\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz20.d, z18.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz21.d, z19.d\n} } } */
+/* { dg-final { scan-assembler { test v32hf 2 z18, z18, z20\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz18.h, p[0-7], \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz19.h, p[0-7], \[x0, #3, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1w\tz21.s, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz22.s, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v16sf 1 z21\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz23.d, z21.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz24.d, z22.d\n} } } */
+/* { dg-final { scan-assembler { test v16sf 2 z21, z21, z23\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz21.s, p[0-7], \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz22.s, p[0-7], \[x0, #3, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1d\tz28.d, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\tz29.d, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v8df 1 z28\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz30.d, z28.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz31.d, z29.d\n} } } */
+/* { dg-final { scan-assembler { test v8df 2 z28, z28, z30\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz28.d, p[0-7], \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz29.d, p[0-7], \[x0, #3, mul vl\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_move_2.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_move_2.c	2017-11-08 15:06:27.247849138 +0000
@@ -0,0 +1,127 @@
+/* { dg-do assemble } */
+/* { dg-options "-O -march=armv8-a+sve -msve-vector-bits=256 -mbig-endian --save-temps" } */
+
+typedef char v32qi __attribute__((vector_size(32)));
+typedef struct { v32qi a[3]; } v96qi;
+
+typedef short v16hi __attribute__((vector_size(32)));
+typedef struct { v16hi a[3]; } v48hi;
+
+typedef int v8si __attribute__((vector_size(32)));
+typedef struct { v8si a[3]; } v24si;
+
+typedef long v4di __attribute__((vector_size(32)));
+typedef struct { v4di a[3]; } v12di;
+
+typedef _Float16 v16hf __attribute__((vector_size(32)));
+typedef struct { v16hf a[3]; } v48hf;
+
+typedef float v8sf __attribute__((vector_size(32)));
+typedef struct { v8sf a[3]; } v24sf;
+
+typedef double v4df __attribute__((vector_size(32)));
+typedef struct { v4df a[3]; } v12df;
+
+#define TEST_TYPE(TYPE, REG1, REG2) \
+  void \
+  f_##TYPE (TYPE *a) \
+  { \
+    register TYPE x asm (#REG1) = a[0]; \
+    asm volatile ("# test " #TYPE " 1 %S0" :: "w" (x)); \
+    register TYPE y asm (#REG2) = x; \
+    asm volatile ("# test " #TYPE " 2 %S0, %S1, %S2" \
+		  : "=&w" (x) : "0" (x), "w" (y)); \
+    a[1] = x; \
+  }
+
+TEST_TYPE (v96qi, z0, z3)
+TEST_TYPE (v48hi, z6, z2)
+TEST_TYPE (v24si, z12, z15)
+TEST_TYPE (v12di, z16, z13)
+TEST_TYPE (v48hf, z18, z1)
+TEST_TYPE (v24sf, z20, z23)
+TEST_TYPE (v12df, z26, z29)
+
+/* { dg-final { scan-assembler {\tld1b\tz0.b, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1b\tz1.b, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1b\tz2.b, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v96qi 1 z0\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz3.d, z0.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz4.d, z1.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz5.d, z2.d\n} } } */
+/* { dg-final { scan-assembler { test v96qi 2 z0, z0, z3\n} } } */
+/* { dg-final { scan-assembler {\tst1b\tz0.b, p[0-7], \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1b\tz1.b, p[0-7], \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1b\tz2.b, p[0-7], \[x0, #5, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1h\tz6.h, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz7.h, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz8.h, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v48hi 1 z6\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz2.d, z6.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz3.d, z7.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz4.d, z8.d\n} } } */
+/* { dg-final { scan-assembler { test v48hi 2 z6, z6, z2\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz6.h, p[0-7], \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz7.h, p[0-7], \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz8.h, p[0-7], \[x0, #5, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1w\tz12.s, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz13.s, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz14.s, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v24si 1 z12\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz15.d, z12.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz16.d, z13.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz17.d, z14.d\n} } } */
+/* { dg-final { scan-assembler { test v24si 2 z12, z12, z15\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz12.s, p[0-7], \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz13.s, p[0-7], \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz14.s, p[0-7], \[x0, #5, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1d\tz16.d, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\tz17.d, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\tz18.d, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v12di 1 z16\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz13.d, z16.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz14.d, z17.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz15.d, z18.d\n} } } */
+/* { dg-final { scan-assembler { test v12di 2 z16, z16, z13\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz16.d, p[0-7], \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz17.d, p[0-7], \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz18.d, p[0-7], \[x0, #5, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1h\tz18.h, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz19.h, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz20.h, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v48hf 1 z18\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz1.d, z18.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz2.d, z19.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz3.d, z20.d\n} } } */
+/* { dg-final { scan-assembler { test v48hf 2 z18, z18, z1\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz18.h, p[0-7], \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz19.h, p[0-7], \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz20.h, p[0-7], \[x0, #5, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1w\tz20.s, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz21.s, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz22.s, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v24sf 1 z20\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz23.d, z20.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz24.d, z21.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz25.d, z22.d\n} } } */
+/* { dg-final { scan-assembler { test v24sf 2 z20, z20, z23\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz20.s, p[0-7], \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz21.s, p[0-7], \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz22.s, p[0-7], \[x0, #5, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1d\tz26.d, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\tz27.d, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\tz28.d, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v12df 1 z26\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz29.d, z26.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz30.d, z27.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz31.d, z28.d\n} } } */
+/* { dg-final { scan-assembler { test v12df 2 z26, z26, z29\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz26.d, p[0-7], \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz27.d, p[0-7], \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz28.d, p[0-7], \[x0, #5, mul vl\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_move_3.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_move_3.c	2017-11-08 15:06:27.247849138 +0000
@@ -0,0 +1,148 @@
+/* { dg-do assemble } */
+/* { dg-options "-O -march=armv8-a+sve -msve-vector-bits=256 -mbig-endian --save-temps" } */
+
+typedef char v32qi __attribute__((vector_size(32)));
+typedef struct { v32qi a[4]; } v128qi;
+
+typedef short v16hi __attribute__((vector_size(32)));
+typedef struct { v16hi a[4]; } v64hi;
+
+typedef int v8si __attribute__((vector_size(32)));
+typedef struct { v8si a[4]; } v32si;
+
+typedef long v4di __attribute__((vector_size(32)));
+typedef struct { v4di a[4]; } v16di;
+
+typedef _Float16 v16hf __attribute__((vector_size(32)));
+typedef struct { v16hf a[4]; } v64hf;
+
+typedef float v8sf __attribute__((vector_size(32)));
+typedef struct { v8sf a[4]; } v32sf;
+
+typedef double v4df __attribute__((vector_size(32)));
+typedef struct { v4df a[4]; } v16df;
+
+#define TEST_TYPE(TYPE, REG1, REG2) \
+  void \
+  f_##TYPE (TYPE *a) \
+  { \
+    register TYPE x asm (#REG1) = a[0]; \
+    asm volatile ("# test " #TYPE " 1 %S0" :: "w" (x)); \
+    register TYPE y asm (#REG2) = x; \
+    asm volatile ("# test " #TYPE " 2 %S0, %S1, %S2" \
+		  : "=&w" (x) : "0" (x), "w" (y)); \
+    a[1] = x; \
+  }
+
+TEST_TYPE (v128qi, z0, z4)
+TEST_TYPE (v64hi, z6, z2)
+TEST_TYPE (v32si, z12, z16)
+TEST_TYPE (v16di, z17, z13)
+TEST_TYPE (v64hf, z18, z1)
+TEST_TYPE (v32sf, z20, z16)
+TEST_TYPE (v16df, z24, z28)
+
+/* { dg-final { scan-assembler {\tld1b\tz0.b, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1b\tz1.b, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1b\tz2.b, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1b\tz3.b, p[0-7]/z, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v128qi 1 z0\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz4.d, z0.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz5.d, z1.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz6.d, z2.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz7.d, z3.d\n} } } */
+/* { dg-final { scan-assembler { test v128qi 2 z0, z0, z4\n} } } */
+/* { dg-final { scan-assembler {\tst1b\tz0.b, p[0-7], \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1b\tz1.b, p[0-7], \[x0, #5, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1b\tz2.b, p[0-7], \[x0, #6, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1b\tz3.b, p[0-7], \[x0, #7, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1h\tz6.h, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz7.h, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz8.h, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz9.h, p[0-7]/z, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v64hi 1 z6\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz2.d, z6.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz3.d, z7.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz4.d, z8.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz5.d, z9.d\n} } } */
+/* { dg-final { scan-assembler { test v64hi 2 z6, z6, z2\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz6.h, p[0-7], \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz7.h, p[0-7], \[x0, #5, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz8.h, p[0-7], \[x0, #6, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz9.h, p[0-7], \[x0, #7, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1w\tz12.s, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz13.s, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz14.s, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz15.s, p[0-7]/z, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v32si 1 z12\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz16.d, z12.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz17.d, z13.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz18.d, z14.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz19.d, z15.d\n} } } */
+/* { dg-final { scan-assembler { test v32si 2 z12, z12, z16\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz12.s, p[0-7], \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz13.s, p[0-7], \[x0, #5, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz14.s, p[0-7], \[x0, #6, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz15.s, p[0-7], \[x0, #7, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1d\tz17.d, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\tz18.d, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\tz19.d, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\tz20.d, p[0-7]/z, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v16di 1 z17\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz13.d, z17.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz14.d, z18.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz15.d, z19.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz16.d, z20.d\n} } } */
+/* { dg-final { scan-assembler { test v16di 2 z17, z17, z13\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz17.d, p[0-7], \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz18.d, p[0-7], \[x0, #5, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz19.d, p[0-7], \[x0, #6, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz20.d, p[0-7], \[x0, #7, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1h\tz18.h, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz19.h, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz20.h, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz21.h, p[0-7]/z, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v64hf 1 z18\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz1.d, z18.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz2.d, z19.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz3.d, z20.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz4.d, z21.d\n} } } */
+/* { dg-final { scan-assembler { test v64hf 2 z18, z18, z1\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz18.h, p[0-7], \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz19.h, p[0-7], \[x0, #5, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz20.h, p[0-7], \[x0, #6, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1h\tz21.h, p[0-7], \[x0, #7, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1w\tz20.s, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz21.s, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz22.s, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz23.s, p[0-7]/z, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v32sf 1 z20\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz16.d, z20.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz17.d, z21.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz18.d, z22.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz19.d, z23.d\n} } } */
+/* { dg-final { scan-assembler { test v32sf 2 z20, z20, z16\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz20.s, p[0-7], \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz21.s, p[0-7], \[x0, #5, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz22.s, p[0-7], \[x0, #6, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz23.s, p[0-7], \[x0, #7, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tld1d\tz24.d, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\tz25.d, p[0-7]/z, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\tz26.d, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\tz27.d, p[0-7]/z, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v16df 1 z24\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz28.d, z24.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz29.d, z25.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz30.d, z26.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz31.d, z27.d\n} } } */
+/* { dg-final { scan-assembler { test v16df 2 z24, z24, z28\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz24.d, p[0-7], \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz25.d, p[0-7], \[x0, #5, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz26.d, p[0-7], \[x0, #6, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1d\tz27.d, p[0-7], \[x0, #7, mul vl\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_move_4.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_move_4.c	2017-11-08 15:06:27.247849138 +0000
@@ -0,0 +1,116 @@
+/* { dg-do assemble } */
+/* { dg-options "-O -march=armv8-a+sve -msve-vector-bits=256 -mlittle-endian --save-temps" } */
+
+typedef char v32qi __attribute__((vector_size(32)));
+typedef struct { v32qi a[2]; } v64qi;
+
+typedef short v16hi __attribute__((vector_size(32)));
+typedef struct { v16hi a[2]; } v32hi;
+
+typedef int v8si __attribute__((vector_size(32)));
+typedef struct { v8si a[2]; } v16si;
+
+typedef long v4di __attribute__((vector_size(32)));
+typedef struct { v4di a[2]; } v8di;
+
+typedef float v8sf __attribute__((vector_size(32)));
+typedef struct { v8sf a[2]; } v16sf;
+
+typedef double v4df __attribute__((vector_size(32)));
+typedef struct { v4df a[2]; } v8df;
+
+#define TEST_TYPE(TYPE, REG1, REG2)			\
+  void							\
+  f1_##TYPE (TYPE *a)					\
+  {							\
+    register TYPE x asm (#REG1) = a[0];			\
+    asm volatile ("# test " #TYPE " 1 %S0" :: "w" (x));	\
+    register TYPE y asm (#REG2) = x;			\
+    asm volatile ("# test " #TYPE " 2 %S0, %S1, %S2"	\
+		  : "=&w" (x) : "0" (x), "w" (y));	\
+    a[1] = x;						\
+  }							\
+  /* This must compile, but we don't care how.  */	\
+  void							\
+  f2_##TYPE (TYPE *a)					\
+  {							\
+    TYPE x = a[0];					\
+    x.a[0][3] = 1;					\
+    x.a[1][2] = 12;					\
+    asm volatile ("# %0" :: "w" (x));			\
+  }							\
+  void							\
+  f3_##TYPE (TYPE *a, int i)				\
+  {							\
+    TYPE x = a[0];					\
+    x.a[0][i] = 1;					\
+    asm volatile ("# %0" :: "w" (x));			\
+  }							\
+  void							\
+  f4_##TYPE (TYPE *a, int i, int j)			\
+  {							\
+    TYPE x = a[0];					\
+    x.a[i][j] = 44;					\
+    asm volatile ("# %0" :: "w" (x));			\
+  }
+
+TEST_TYPE (v64qi, z0, z2)
+TEST_TYPE (v32hi, z5, z7)
+TEST_TYPE (v16si, z10, z12)
+TEST_TYPE (v8di, z15, z17)
+TEST_TYPE (v16sf, z20, z23)
+TEST_TYPE (v8df, z28, z30)
+
+/* { dg-final { scan-assembler {\tldr\tz0, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz1, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v64qi 1 z0\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz2.d, z0.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz3.d, z1.d\n} } } */
+/* { dg-final { scan-assembler { test v64qi 2 z0, z0, z2\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz0, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz1, \[x0, #3, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tldr\tz5, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz6, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v32hi 1 z5\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz7.d, z5.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz8.d, z6.d\n} } } */
+/* { dg-final { scan-assembler { test v32hi 2 z5, z5, z7\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz5, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz6, \[x0, #3, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tldr\tz10, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz11, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v16si 1 z10\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz12.d, z10.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz13.d, z11.d\n} } } */
+/* { dg-final { scan-assembler { test v16si 2 z10, z10, z12\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz10, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz11, \[x0, #3, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tldr\tz15, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz16, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v8di 1 z15\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz17.d, z15.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz18.d, z16.d\n} } } */
+/* { dg-final { scan-assembler { test v8di 2 z15, z15, z17\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz15, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz16, \[x0, #3, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tldr\tz20, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz21, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v16sf 1 z20\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz23.d, z20.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz24.d, z21.d\n} } } */
+/* { dg-final { scan-assembler { test v16sf 2 z20, z20, z23\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz20, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz21, \[x0, #3, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tldr\tz28, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz29, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v8df 1 z28\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz30.d, z28.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz31.d, z29.d\n} } } */
+/* { dg-final { scan-assembler { test v8df 2 z28, z28, z30\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz28, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz29, \[x0, #3, mul vl\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_move_5.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_move_5.c	2017-11-08 15:06:27.248849138 +0000
@@ -0,0 +1,111 @@
+/* { dg-do assemble } */
+/* { dg-options "-O -march=armv8-a+sve -msve-vector-bits=256 -mlittle-endian --save-temps" } */
+
+typedef char v32qi __attribute__((vector_size(32)));
+typedef struct { v32qi a[3]; } v96qi;
+
+typedef short v16hi __attribute__((vector_size(32)));
+typedef struct { v16hi a[3]; } v48hi;
+
+typedef int v8si __attribute__((vector_size(32)));
+typedef struct { v8si a[3]; } v24si;
+
+typedef long v4di __attribute__((vector_size(32)));
+typedef struct { v4di a[3]; } v12di;
+
+typedef float v8sf __attribute__((vector_size(32)));
+typedef struct { v8sf a[3]; } v24sf;
+
+typedef double v4df __attribute__((vector_size(32)));
+typedef struct { v4df a[3]; } v12df;
+
+#define TEST_TYPE(TYPE, REG1, REG2) \
+  void \
+  f_##TYPE (TYPE *a) \
+  { \
+    register TYPE x asm (#REG1) = a[0]; \
+    asm volatile ("# test " #TYPE " 1 %S0" :: "w" (x)); \
+    register TYPE y asm (#REG2) = x; \
+    asm volatile ("# test " #TYPE " 2 %S0, %S1, %S2" \
+		  : "=&w" (x) : "0" (x), "w" (y)); \
+    a[1] = x; \
+  }
+
+TEST_TYPE (v96qi, z0, z3)
+TEST_TYPE (v48hi, z6, z2)
+TEST_TYPE (v24si, z12, z15)
+TEST_TYPE (v12di, z16, z13)
+TEST_TYPE (v24sf, z20, z23)
+TEST_TYPE (v12df, z26, z29)
+
+/* { dg-final { scan-assembler {\tldr\tz0, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz1, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz2, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v96qi 1 z0\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz3.d, z0.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz4.d, z1.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz5.d, z2.d\n} } } */
+/* { dg-final { scan-assembler { test v96qi 2 z0, z0, z3\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz0, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz1, \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz2, \[x0, #5, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tldr\tz6, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz7, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz8, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v48hi 1 z6\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz2.d, z6.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz3.d, z7.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz4.d, z8.d\n} } } */
+/* { dg-final { scan-assembler { test v48hi 2 z6, z6, z2\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz6, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz7, \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz8, \[x0, #5, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tldr\tz12, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz13, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz14, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v24si 1 z12\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz15.d, z12.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz16.d, z13.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz17.d, z14.d\n} } } */
+/* { dg-final { scan-assembler { test v24si 2 z12, z12, z15\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz12, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz13, \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz14, \[x0, #5, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tldr\tz16, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz17, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz18, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v12di 1 z16\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz13.d, z16.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz14.d, z17.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz15.d, z18.d\n} } } */
+/* { dg-final { scan-assembler { test v12di 2 z16, z16, z13\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz16, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz17, \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz18, \[x0, #5, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tldr\tz20, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz21, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz22, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v24sf 1 z20\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz23.d, z20.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz24.d, z21.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz25.d, z22.d\n} } } */
+/* { dg-final { scan-assembler { test v24sf 2 z20, z20, z23\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz20, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz21, \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz22, \[x0, #5, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tldr\tz26, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz27, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz28, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v12df 1 z26\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz29.d, z26.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz30.d, z27.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz31.d, z28.d\n} } } */
+/* { dg-final { scan-assembler { test v12df 2 z26, z26, z29\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz26, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz27, \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz28, \[x0, #5, mul vl\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_move_6.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_move_6.c	2017-11-08 15:06:27.248849138 +0000
@@ -0,0 +1,129 @@
+/* { dg-do assemble } */
+/* { dg-options "-O -march=armv8-a+sve -msve-vector-bits=256 -mlittle-endian --save-temps" } */
+
+typedef char v32qi __attribute__((vector_size(32)));
+typedef struct { v32qi a[4]; } v128qi;
+
+typedef short v16hi __attribute__((vector_size(32)));
+typedef struct { v16hi a[4]; } v64hi;
+
+typedef int v8si __attribute__((vector_size(32)));
+typedef struct { v8si a[4]; } v32si;
+
+typedef long v4di __attribute__((vector_size(32)));
+typedef struct { v4di a[4]; } v16di;
+
+typedef float v8sf __attribute__((vector_size(32)));
+typedef struct { v8sf a[4]; } v32sf;
+
+typedef double v4df __attribute__((vector_size(32)));
+typedef struct { v4df a[4]; } v16df;
+
+#define TEST_TYPE(TYPE, REG1, REG2) \
+  void \
+  f_##TYPE (TYPE *a) \
+  { \
+    register TYPE x asm (#REG1) = a[0]; \
+    asm volatile ("# test " #TYPE " 1 %S0" :: "w" (x)); \
+    register TYPE y asm (#REG2) = x; \
+    asm volatile ("# test " #TYPE " 2 %S0, %S1, %S2" \
+		  : "=&w" (x) : "0" (x), "w" (y)); \
+    a[1] = x; \
+  }
+
+TEST_TYPE (v128qi, z0, z4)
+TEST_TYPE (v64hi, z6, z2)
+TEST_TYPE (v32si, z12, z16)
+TEST_TYPE (v16di, z17, z13)
+TEST_TYPE (v32sf, z20, z16)
+TEST_TYPE (v16df, z24, z28)
+
+/* { dg-final { scan-assembler {\tldr\tz0, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz1, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz2, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz3, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v128qi 1 z0\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz4.d, z0.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz5.d, z1.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz6.d, z2.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz7.d, z3.d\n} } } */
+/* { dg-final { scan-assembler { test v128qi 2 z0, z0, z4\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz0, \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz1, \[x0, #5, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz2, \[x0, #6, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz3, \[x0, #7, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tldr\tz6, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz7, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz8, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz9, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v64hi 1 z6\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz2.d, z6.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz3.d, z7.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz4.d, z8.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz5.d, z9.d\n} } } */
+/* { dg-final { scan-assembler { test v64hi 2 z6, z6, z2\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz6, \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz7, \[x0, #5, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz8, \[x0, #6, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz9, \[x0, #7, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tldr\tz12, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz13, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz14, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz15, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v32si 1 z12\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz16.d, z12.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz17.d, z13.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz18.d, z14.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz19.d, z15.d\n} } } */
+/* { dg-final { scan-assembler { test v32si 2 z12, z12, z16\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz12, \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz13, \[x0, #5, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz14, \[x0, #6, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz15, \[x0, #7, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tldr\tz17, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz18, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz19, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz20, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v16di 1 z17\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz13.d, z17.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz14.d, z18.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz15.d, z19.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz16.d, z20.d\n} } } */
+/* { dg-final { scan-assembler { test v16di 2 z17, z17, z13\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz17, \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz18, \[x0, #5, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz19, \[x0, #6, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz20, \[x0, #7, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tldr\tz20, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz21, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz22, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz23, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v32sf 1 z20\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz16.d, z20.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz17.d, z21.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz18.d, z22.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz19.d, z23.d\n} } } */
+/* { dg-final { scan-assembler { test v32sf 2 z20, z20, z16\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz20, \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz21, \[x0, #5, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz22, \[x0, #6, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz23, \[x0, #7, mul vl\]\n} } } */
+
+/* { dg-final { scan-assembler {\tldr\tz24, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz25, \[x0, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz26, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tldr\tz27, \[x0, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler { test v16df 1 z24\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz28.d, z24.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz29.d, z25.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz30.d, z26.d\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz31.d, z27.d\n} } } */
+/* { dg-final { scan-assembler { test v16df 2 z24, z24, z28\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz24, \[x0, #4, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz25, \[x0, #5, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz26, \[x0, #6, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tstr\tz27, \[x0, #7, mul vl\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_1.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_1.c	2017-11-08 15:06:27.248849138 +0000
@@ -0,0 +1,89 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#ifndef TYPE
+#define TYPE unsigned char
+#endif
+
+#ifndef NAME
+#define NAME(X) X
+#endif
+
+#define N 1024
+
+void __attribute__ ((noinline, noclone))
+NAME(f2) (TYPE *__restrict a, TYPE *__restrict b, TYPE *__restrict c)
+{
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = c[i * 2];
+      b[i] = c[i * 2 + 1];
+    }
+}
+
+void __attribute__ ((noinline, noclone))
+NAME(f3) (TYPE *__restrict a, TYPE *__restrict b, TYPE *__restrict c,
+	  TYPE *__restrict d)
+{
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = d[i * 3];
+      b[i] = d[i * 3 + 1];
+      c[i] = d[i * 3 + 2];
+    }
+}
+
+void __attribute__ ((noinline, noclone))
+NAME(f4) (TYPE *__restrict a, TYPE *__restrict b, TYPE *__restrict c,
+	  TYPE *__restrict d, TYPE *__restrict e)
+{
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = e[i * 4];
+      b[i] = e[i * 4 + 1];
+      c[i] = e[i * 4 + 2];
+      d[i] = e[i * 4 + 3];
+    }
+}
+
+void __attribute__ ((noinline, noclone))
+NAME(g2) (TYPE *__restrict a, TYPE *__restrict b, TYPE *__restrict c)
+{
+  for (int i = 0; i < N; ++i)
+    {
+      c[i * 2] = a[i];
+      c[i * 2 + 1] = b[i];
+    }
+}
+
+void __attribute__ ((noinline, noclone))
+NAME(g3) (TYPE *__restrict a, TYPE *__restrict b, TYPE *__restrict c,
+	  TYPE *__restrict d)
+{
+  for (int i = 0; i < N; ++i)
+    {
+      d[i * 3] = a[i];
+      d[i * 3 + 1] = b[i];
+      d[i * 3 + 2] = c[i];
+    }
+}
+
+void __attribute__ ((noinline, noclone))
+NAME(g4) (TYPE *__restrict a, TYPE *__restrict b, TYPE *__restrict c,
+	  TYPE *__restrict d, TYPE *__restrict e)
+{
+  for (int i = 0; i < N; ++i)
+    {
+      e[i * 4] = a[i];
+      e[i * 4 + 1] = b[i];
+      e[i * 4 + 2] = c[i];
+      e[i * 4 + 3] = d[i];
+    }
+}
+
+/* { dg-final { scan-assembler {\tld2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_1_run.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_1_run.c	2017-11-08 15:06:27.249849138 +0000
@@ -0,0 +1,63 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#include "sve_struct_vect_1.c"
+
+TYPE a[N], b[N], c[N], d[N], e[N * 4];
+
+void __attribute__ ((noinline, noclone))
+init_array (TYPE *array, int n, TYPE base, TYPE step)
+{
+  for (int i = 0; i < n; ++i)
+    array[i] = base + step * i;
+}
+
+void __attribute__ ((noinline, noclone))
+check_array (TYPE *array, int n, TYPE base, TYPE step)
+{
+  for (int i = 0; i < n; ++i)
+    if (array[i] != (TYPE) (base + step * i))
+      __builtin_abort ();
+}
+
+int __attribute__ ((optimize (1)))
+main (void)
+{
+  init_array (e, 2 * N, 11, 5);
+  f2 (a, b, e);
+  check_array (a, N, 11, 10);
+  check_array (b, N, 16, 10);
+
+  init_array (e, 3 * N, 7, 6);
+  f3 (a, b, c, e);
+  check_array (a, N, 7, 18);
+  check_array (b, N, 13, 18);
+  check_array (c, N, 19, 18);
+
+  init_array (e, 4 * N, 4, 11);
+  f4 (a, b, c, d, e);
+  check_array (a, N, 4, 44);
+  check_array (b, N, 15, 44);
+  check_array (c, N, 26, 44);
+  check_array (d, N, 37, 44);
+
+  init_array (a, N, 2, 8);
+  init_array (b, N, 6, 8);
+  g2 (a, b, e);
+  check_array (e, 2 * N, 2, 4);
+
+  init_array (a, N, 4, 15);
+  init_array (b, N, 9, 15);
+  init_array (c, N, 14, 15);
+  g3 (a, b, c, e);
+  check_array (e, 3 * N, 4, 5);
+
+  init_array (a, N, 14, 36);
+  init_array (b, N, 23, 36);
+  init_array (c, N, 32, 36);
+  init_array (d, N, 41, 36);
+  g4 (a, b, c, d, e);
+  check_array (e, 4 * N, 14, 9);
+
+  return 0;
+}
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_2.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_2.c	2017-11-08 15:06:27.249849138 +0000
@@ -0,0 +1,12 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE unsigned short
+#include "sve_struct_vect_1.c"
+
+/* { dg-final { scan-assembler {\tld2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld4h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst4h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_2_run.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_2_run.c	2017-11-08 15:06:27.250849138 +0000
@@ -0,0 +1,5 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE unsigned short
+#include "sve_struct_vect_1_run.c"
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_3.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_3.c	2017-11-08 15:06:27.250849138 +0000
@@ -0,0 +1,12 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE unsigned int
+#include "sve_struct_vect_1.c"
+
+/* { dg-final { scan-assembler {\tld2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_3_run.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_3_run.c	2017-11-08 15:06:27.250849138 +0000
@@ -0,0 +1,5 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE unsigned int
+#include "sve_struct_vect_1_run.c"
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_4.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_4.c	2017-11-08 15:06:27.250849138 +0000
@@ -0,0 +1,12 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE unsigned long
+#include "sve_struct_vect_1.c"
+
+/* { dg-final { scan-assembler {\tld2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_4_run.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_4_run.c	2017-11-08 15:06:27.250849138 +0000
@@ -0,0 +1,5 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE unsigned long
+#include "sve_struct_vect_1_run.c"
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_5.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_5.c	2017-11-08 15:06:27.250849138 +0000
@@ -0,0 +1,12 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE float
+#include "sve_struct_vect_1.c"
+
+/* { dg-final { scan-assembler {\tld2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_5_run.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_5_run.c	2017-11-08 15:06:27.250849138 +0000
@@ -0,0 +1,5 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE float
+#include "sve_struct_vect_1_run.c"
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_6.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_6.c	2017-11-08 15:06:27.250849138 +0000
@@ -0,0 +1,12 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE double
+#include "sve_struct_vect_1.c"
+
+/* { dg-final { scan-assembler {\tld2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_6_run.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_6_run.c	2017-11-08 15:06:27.250849138 +0000
@@ -0,0 +1,5 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE double
+#include "sve_struct_vect_1_run.c"
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_7.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_7.c	2017-11-08 15:06:27.250849138 +0000
@@ -0,0 +1,84 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#ifndef TYPE
+#define TYPE unsigned char
+#define ITYPE signed char
+#endif
+
+void __attribute__ ((noinline, noclone))
+f2 (TYPE *__restrict a, TYPE *__restrict b, TYPE *__restrict c, ITYPE n)
+{
+  for (ITYPE i = 0; i < n; ++i)
+    {
+      a[i] = c[i * 2];
+      b[i] = c[i * 2 + 1];
+    }
+}
+
+void __attribute__ ((noinline, noclone))
+f3 (TYPE *__restrict a, TYPE *__restrict b, TYPE *__restrict c,
+    TYPE *__restrict d, ITYPE n)
+{
+  for (ITYPE i = 0; i < n; ++i)
+    {
+      a[i] = d[i * 3];
+      b[i] = d[i * 3 + 1];
+      c[i] = d[i * 3 + 2];
+    }
+}
+
+void __attribute__ ((noinline, noclone))
+f4 (TYPE *__restrict a, TYPE *__restrict b, TYPE *__restrict c,
+    TYPE *__restrict d, TYPE *__restrict e, ITYPE n)
+{
+  for (ITYPE i = 0; i < n; ++i)
+    {
+      a[i] = e[i * 4];
+      b[i] = e[i * 4 + 1];
+      c[i] = e[i * 4 + 2];
+      d[i] = e[i * 4 + 3];
+    }
+}
+
+void __attribute__ ((noinline, noclone))
+g2 (TYPE *__restrict a, TYPE *__restrict b, TYPE *__restrict c, ITYPE n)
+{
+  for (ITYPE i = 0; i < n; ++i)
+    {
+      c[i * 2] = a[i];
+      c[i * 2 + 1] = b[i];
+    }
+}
+
+void __attribute__ ((noinline, noclone))
+g3 (TYPE *__restrict a, TYPE *__restrict b, TYPE *__restrict c,
+    TYPE *__restrict d, ITYPE n)
+{
+  for (ITYPE i = 0; i < n; ++i)
+    {
+      d[i * 3] = a[i];
+      d[i * 3 + 1] = b[i];
+      d[i * 3 + 2] = c[i];
+    }
+}
+
+void __attribute__ ((noinline, noclone))
+g4 (TYPE *__restrict a, TYPE *__restrict b, TYPE *__restrict c,
+    TYPE *__restrict d, TYPE *__restrict e, ITYPE n)
+{
+  for (ITYPE i = 0; i < n; ++i)
+    {
+      e[i * 4] = a[i];
+      e[i * 4 + 1] = b[i];
+      e[i * 4 + 2] = c[i];
+      e[i * 4 + 3] = d[i];
+    }
+}
+
+/* { dg-final { scan-assembler {\tld2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_7_run.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_7_run.c	2017-11-08 15:06:27.250849138 +0000
@@ -0,0 +1,65 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#include "sve_struct_vect_7.c"
+
+#define N 93
+
+TYPE a[N], b[N], c[N], d[N], e[N * 4];
+
+void __attribute__ ((noinline, noclone))
+init_array (TYPE *array, int n, TYPE base, TYPE step)
+{
+  for (int i = 0; i < n; ++i)
+    array[i] = base + step * i;
+}
+
+void __attribute__ ((noinline, noclone))
+check_array (TYPE *array, int n, TYPE base, TYPE step)
+{
+  for (int i = 0; i < n; ++i)
+    if (array[i] != (TYPE) (base + step * i))
+      __builtin_abort ();
+}
+
+int __attribute__ ((optimize (1)))
+main (void)
+{
+  init_array (e, 2 * N, 11, 5);
+  f2 (a, b, e, N);
+  check_array (a, N, 11, 10);
+  check_array (b, N, 16, 10);
+
+  init_array (e, 3 * N, 7, 6);
+  f3 (a, b, c, e, N);
+  check_array (a, N, 7, 18);
+  check_array (b, N, 13, 18);
+  check_array (c, N, 19, 18);
+
+  init_array (e, 4 * N, 4, 11);
+  f4 (a, b, c, d, e, N);
+  check_array (a, N, 4, 44);
+  check_array (b, N, 15, 44);
+  check_array (c, N, 26, 44);
+  check_array (d, N, 37, 44);
+
+  init_array (a, N, 2, 8);
+  init_array (b, N, 6, 8);
+  g2 (a, b, e, N);
+  check_array (e, 2 * N, 2, 4);
+
+  init_array (a, N, 4, 15);
+  init_array (b, N, 9, 15);
+  init_array (c, N, 14, 15);
+  g3 (a, b, c, e, N);
+  check_array (e, 3 * N, 4, 5);
+
+  init_array (a, N, 14, 36);
+  init_array (b, N, 23, 36);
+  init_array (c, N, 32, 36);
+  init_array (d, N, 41, 36);
+  g4 (a, b, c, d, e, N);
+  check_array (e, 4 * N, 14, 9);
+
+  return 0;
+}
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_8.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_8.c	2017-11-08 15:06:27.251849138 +0000
@@ -0,0 +1,13 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE unsigned short
+#define ITYPE short
+#include "sve_struct_vect_7.c"
+
+/* { dg-final { scan-assembler {\tld2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld4h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst4h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_8_run.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_8_run.c	2017-11-08 15:06:27.251849138 +0000
@@ -0,0 +1,6 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE unsigned short
+#define ITYPE short
+#include "sve_struct_vect_7_run.c"
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_9.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_9.c	2017-11-08 15:06:27.251849138 +0000
@@ -0,0 +1,13 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE unsigned int
+#define ITYPE int
+#include "sve_struct_vect_7.c"
+
+/* { dg-final { scan-assembler {\tld2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_9_run.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_9_run.c	2017-11-08 15:06:27.251849138 +0000
@@ -0,0 +1,6 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE unsigned int
+#define ITYPE int
+#include "sve_struct_vect_7_run.c"
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_10.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_10.c	2017-11-08 15:06:27.248849138 +0000
@@ -0,0 +1,13 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE unsigned long
+#define ITYPE long
+#include "sve_struct_vect_7.c"
+
+/* { dg-final { scan-assembler {\tld2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_10_run.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_10_run.c	2017-11-08 15:06:27.248849138 +0000
@@ -0,0 +1,6 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE unsigned long
+#define ITYPE long
+#include "sve_struct_vect_7_run.c"
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_11.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_11.c	2017-11-08 15:06:27.248849138 +0000
@@ -0,0 +1,13 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE _Float16
+#define ITYPE short
+#include "sve_struct_vect_7.c"
+
+/* { dg-final { scan-assembler {\tld2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld4h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst4h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_11_run.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_11_run.c	2017-11-08 15:06:27.248849138 +0000
@@ -0,0 +1,6 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE _Float16
+#define ITYPE short
+#include "sve_struct_vect_7_run.c"
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_12.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_12.c	2017-11-08 15:06:27.248849138 +0000
@@ -0,0 +1,13 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE float
+#define ITYPE int
+#include "sve_struct_vect_7.c"
+
+/* { dg-final { scan-assembler {\tld2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_12_run.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_12_run.c	2017-11-08 15:06:27.248849138 +0000
@@ -0,0 +1,6 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE float
+#define ITYPE int
+#include "sve_struct_vect_7_run.c"
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_13.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_13.c	2017-11-08 15:06:27.249849138 +0000
@@ -0,0 +1,13 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE double
+#define ITYPE long
+#include "sve_struct_vect_7.c"
+
+/* { dg-final { scan-assembler {\tld2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tst4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_13_run.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_13_run.c	2017-11-08 15:06:27.249849138 +0000
@@ -0,0 +1,6 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define TYPE double
+#define ITYPE long
+#include "sve_struct_vect_7_run.c"
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_14.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_14.c	2017-11-08 15:06:27.249849138 +0000
@@ -0,0 +1,72 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve -msve-vector-bits=256 --save-temps" } */
+
+#define TYPE unsigned char
+#define NAME(X) qi_##X
+#include "sve_struct_vect_1.c"
+#undef NAME
+#undef TYPE
+
+#define TYPE unsigned short
+#define NAME(X) hi_##X
+#include "sve_struct_vect_1.c"
+#undef NAME
+#undef TYPE
+
+#define TYPE unsigned int
+#define NAME(X) si_##X
+#include "sve_struct_vect_1.c"
+#undef NAME
+#undef TYPE
+
+#define TYPE unsigned long
+#define NAME(X) di_##X
+#include "sve_struct_vect_1.c"
+#undef NAME
+#undef TYPE
+
+#define TYPE _Float16
+#define NAME(X) hf_##X
+#include "sve_struct_vect_1.c"
+#undef NAME
+#undef TYPE
+
+#define TYPE float
+#define NAME(X) sf_##X
+#include "sve_struct_vect_1.c"
+#undef NAME
+#undef TYPE
+
+#define TYPE double
+#define NAME(X) df_##X
+#include "sve_struct_vect_1.c"
+#undef NAME
+#undef TYPE
+
+/* { dg-final { scan-assembler-times {\tld2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tld2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld4h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst4h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tld2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tld2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} 2 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_15.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_15.c	2017-11-08 15:06:27.249849138 +0000
@@ -0,0 +1,32 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve -msve-vector-bits=512 --save-temps" } */
+
+#include "sve_struct_vect_14.c"
+
+/* { dg-final { scan-assembler-times {\tld2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tld2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld4h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst4h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tld2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tld2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} 2 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_16.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_16.c	2017-11-08 15:06:27.249849138 +0000
@@ -0,0 +1,32 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve -msve-vector-bits=1024 --save-temps" } */
+
+#include "sve_struct_vect_14.c"
+
+/* { dg-final { scan-assembler-times {\tld2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tld2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld4h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst4h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tld2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tld2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} 2 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_struct_vect_17.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_struct_vect_17.c	2017-11-08 15:06:27.249849138 +0000
@@ -0,0 +1,32 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve -msve-vector-bits=2048 --save-temps" } */
+
+#include "sve_struct_vect_14.c"
+
+/* { dg-final { scan-assembler-times {\tld2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7]/z, \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst2b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst3b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst4b\t{z[0-9]+.b - z[0-9]+.b}, p[0-7], \[x[0-9]+\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tld2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld4h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst2h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst3h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst4h\t{z[0-9]+.h - z[0-9]+.h}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tld2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst2w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst3w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst4w\t{z[0-9]+.s - z[0-9]+.s}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tld2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7]/z, \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst2d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst3d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tst4d\t{z[0-9]+.d - z[0-9]+.d}, p[0-7], \[x[0-9]+\]\n} 2 } } */


End of thread (newest message: ~2018-01-06 19:46 UTC)

Thread overview: 9+ messages
2017-11-08 15:12 [0/4] Add SVE support for load/store_lanes Richard Sandiford
2017-11-08 15:13 ` [1/4] Give the target more control over ARRAY_TYPE modes Richard Sandiford
2017-11-21 16:38   ` Jeff Law
2017-11-08 15:16 ` [2/4] [AArch64] SVE load/store_lanes support Richard Sandiford
2018-01-06 19:45   ` Richard Sandiford
2017-11-08 15:18 ` [3/4] load/store_lanes testsuite markup Richard Sandiford
2017-11-20  4:53   ` Jeff Law
2017-11-08 15:18 ` [4/4] [AArch64] Tests for SVE structure modes Richard Sandiford
2018-01-06 19:46   ` Richard Sandiford
