public inbox for gcc-patches@gcc.gnu.org
* [0/n] Support multiple vector sizes for vectorisation
@ 2019-10-25 12:32 Richard Sandiford
  2019-10-25 12:34 ` [6/n] Use build_vector_type_for_mode in get_vectype_for_scalar_type_and_size Richard Sandiford
                   ` (13 more replies)
  0 siblings, 14 replies; 48+ messages in thread
From: Richard Sandiford @ 2019-10-25 12:32 UTC (permalink / raw)
  To: gcc-patches

This is a continuation of the patch series I started on Wednesday,
this time posted under a covering message.  Parts 1-5 were:

[1/n] https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01634.html
[2/n] https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01637.html
[3/n] https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01638.html
[4/n] https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01639.html
[5/n] https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01641.html

Some parts of the series will conflict with Andre's patches,
so I'll hold off applying anything that gets approved until those
patches have gone in.  The conflicts should only be minor though,
and won't change the approach, so I thought it was worth posting
for comments now anyway.

I tested each patch individually on aarch64-linux-gnu and the series as
a whole on x86_64-linux-gnu.  I also tried building at least one target
per CPU directory and spot-checked that they were behaving sensibly after
the patches.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [7/n] Use consistent compatibility checks in vectorizable_shift
  2019-10-25 12:32 [0/n] Support multiple vector sizes for vectorisation Richard Sandiford
  2019-10-25 12:34 ` [6/n] Use build_vector_type_for_mode in get_vectype_for_scalar_type_and_size Richard Sandiford
@ 2019-10-25 12:34 ` Richard Sandiford
  2019-10-30 14:33   ` Richard Biener
  2019-10-25 12:39 ` [8/n] Replace autovectorize_vector_sizes with autovectorize_vector_modes Richard Sandiford
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: Richard Sandiford @ 2019-10-25 12:34 UTC (permalink / raw)
  To: gcc-patches

The validation phase of vectorizable_shift used TYPE_MODE to check
whether the shift amount vector was compatible with the shifted vector:

      if ((op1_vectype == NULL_TREE
	   || TYPE_MODE (op1_vectype) != TYPE_MODE (vectype))
 	  && (!slp_node
 	      || SLP_TREE_DEF_TYPE
 		   (SLP_TREE_CHILDREN (slp_node)[1]) != vect_constant_def))

But the generation phase was stricter and required the element types to
be equivalent:

		   && !useless_type_conversion_p (TREE_TYPE (vectype),
						  TREE_TYPE (op1)))

This difference led to an ICE with a later patch.

The first condition seems a bit too lax given that the function
supports vect_worthwhile_without_simd_p, where two different vector
types could have the same integer mode.  But it seems too strict
to reject signed shifts by unsigned amounts or unsigned shifts by
signed amounts; verify_gimple_assign_binary is happy with those.
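As a concrete illustration (my own reduced example, not taken from the
patch or its testsuite), the kind of mixed-signedness shift that
verify_gimple_assign_binary accepts, and that the new check should
therefore not reject, looks like this:

```c
#include <assert.h>

/* A signed shift whose amount comes from an unsigned array.  The two
   vector types the vectorizer builds for A and B differ only in the
   signedness of the element type; they have the same mode and the same
   number of elements, so the shift is still vectorizable.  */
void
shift_by_unsigned (int *a, unsigned int *b, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = a[i] << b[i];
}
```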

This patch therefore goes for a middle ground of checking both TYPE_MODE
and TYPE_VECTOR_SUBPARTS, using the same condition in both places.


2019-10-24  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* tree-vect-stmts.c (vectorizable_shift): Check the number
	of vector elements as well as the type mode when deciding
	whether an op1_vectype is compatible.  Reuse the result of
	this check when generating vector statements.

Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2019-10-25 13:27:08.653811531 +0100
+++ gcc/tree-vect-stmts.c	2019-10-25 13:27:12.121787027 +0100
@@ -5522,6 +5522,7 @@ vectorizable_shift (stmt_vec_info stmt_i
   bool scalar_shift_arg = true;
   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
   vec_info *vinfo = stmt_info->vinfo;
+  bool incompatible_op1_vectype_p = false;
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
     return false;
@@ -5666,8 +5667,12 @@ vectorizable_shift (stmt_vec_info stmt_i
 
       if (!op1_vectype)
 	op1_vectype = get_same_sized_vectype (TREE_TYPE (op1), vectype_out);
-      if ((op1_vectype == NULL_TREE
-	   || TYPE_MODE (op1_vectype) != TYPE_MODE (vectype))
+      incompatible_op1_vectype_p
+	= (op1_vectype == NULL_TREE
+	   || maybe_ne (TYPE_VECTOR_SUBPARTS (op1_vectype),
+			TYPE_VECTOR_SUBPARTS (vectype))
+	   || TYPE_MODE (op1_vectype) != TYPE_MODE (vectype));
+      if (incompatible_op1_vectype_p
 	  && (!slp_node
 	      || SLP_TREE_DEF_TYPE
 		   (SLP_TREE_CHILDREN (slp_node)[1]) != vect_constant_def))
@@ -5813,9 +5818,7 @@ vectorizable_shift (stmt_vec_info stmt_i
                     }
                 }
             }
-	  else if (slp_node
-		   && !useless_type_conversion_p (TREE_TYPE (vectype),
-						  TREE_TYPE (op1)))
+	  else if (slp_node && incompatible_op1_vectype_p)
 	    {
 	      if (was_scalar_shift_arg)
 		{


* [6/n] Use build_vector_type_for_mode in get_vectype_for_scalar_type_and_size
  2019-10-25 12:32 [0/n] Support multiple vector sizes for vectorisation Richard Sandiford
@ 2019-10-25 12:34 ` Richard Sandiford
  2019-10-30 14:32   ` Richard Biener
  2019-10-25 12:34 ` [7/n] Use consistent compatibility checks in vectorizable_shift Richard Sandiford
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: Richard Sandiford @ 2019-10-25 12:34 UTC (permalink / raw)
  To: gcc-patches

Except for one case, get_vectype_for_scalar_type_and_size calculates
what the vector mode should be and then calls build_vector_type,
which recomputes the mode from scratch.  This patch makes it use
build_vector_type_for_mode instead.

The exception mentioned above is when preferred_simd_mode returns
an integer mode, which it does if no appropriate vector mode exists.
The integer mode in question is usually word_mode, although epiphany
can return a doubleword mode in some cases.

There's no guarantee that this integer mode is appropriate, since for
example the scalar type could be a float.  The traditional behaviour is
therefore to use the integer mode to determine a size only, and leave
mode_for_vector to pick the TYPE_MODE.  (Note that it can actually end
up picking a vector mode if the target defines a disabled vector mode.
We therefore still need to check TYPE_MODE after building the type.)
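The "size only" fallback can be sketched as follows (a hypothetical
stand-alone helper for illustration, not GCC code; modes are modelled
simply as byte sizes):

```c
#include <assert.h>
#include <stdbool.h>

/* When preferred_simd_mode returns an integer mode, its size is only
   used to derive the number of vector elements; mode_for_vector then
   picks the real TYPE_MODE.  Returns false if no whole number of
   elements of ELEM_BYTES fits in INT_MODE_BYTES.  */
static bool
nunits_for_int_mode (unsigned int int_mode_bytes, unsigned int elem_bytes,
		     unsigned int *nunits)
{
  if (elem_bytes == 0 || int_mode_bytes % elem_bytes != 0)
    return false;
  *nunits = int_mode_bytes / elem_bytes;
  return true;			/* nunits == 1 is allowed.  */
}
```

For example, a word_mode of 8 bytes and a 4-byte float element give
nunits == 2, and it is then up to mode_for_vector whether that becomes
V2SFmode or stays an integer mode.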


2019-10-24  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): If
	targetm.vectorize.preferred_simd_mode returns an integer mode,
	use mode_for_vector to decide what the vector type's mode
	should actually be.  Use build_vector_type_for_mode instead
	of build_vector_type.

Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2019-10-25 13:26:59.309877555 +0100
+++ gcc/tree-vect-stmts.c	2019-10-25 13:27:08.653811531 +0100
@@ -11162,16 +11162,31 @@ get_vectype_for_scalar_type_and_size (tr
   /* If no size was supplied use the mode the target prefers.   Otherwise
      lookup a vector mode of the specified size.  */
   if (known_eq (size, 0U))
-    simd_mode = targetm.vectorize.preferred_simd_mode (inner_mode);
+    {
+      simd_mode = targetm.vectorize.preferred_simd_mode (inner_mode);
+      if (SCALAR_INT_MODE_P (simd_mode))
+	{
+	  /* Traditional behavior is not to take the integer mode
+	     literally, but simply to use it as a way of determining
+	     the vector size.  It is up to mode_for_vector to decide
+	     what the TYPE_MODE should be.
+
+	     Note that nunits == 1 is allowed in order to support single
+	     element vector types.  */
+	  if (!multiple_p (GET_MODE_SIZE (simd_mode), nbytes, &nunits)
+	      || !mode_for_vector (inner_mode, nunits).exists (&simd_mode))
+	    return NULL_TREE;
+	}
+    }
   else if (!multiple_p (size, nbytes, &nunits)
 	   || !mode_for_vector (inner_mode, nunits).exists (&simd_mode))
     return NULL_TREE;
-  /* NOTE: nunits == 1 is allowed to support single element vector types.  */
-  if (!multiple_p (GET_MODE_SIZE (simd_mode), nbytes, &nunits))
-    return NULL_TREE;
 
-  vectype = build_vector_type (scalar_type, nunits);
+  vectype = build_vector_type_for_mode (scalar_type, simd_mode);
 
+  /* In cases where the mode was chosen by mode_for_vector, check that
+     the target actually supports the chosen mode, or that it at least
+     allows the vector mode to be replaced by a like-sized integer.  */
   if (!VECTOR_MODE_P (TYPE_MODE (vectype))
       && !INTEGRAL_MODE_P (TYPE_MODE (vectype)))
     return NULL_TREE;


* [8/n] Replace autovectorize_vector_sizes with autovectorize_vector_modes
  2019-10-25 12:32 [0/n] Support multiple vector sizes for vectorisation Richard Sandiford
  2019-10-25 12:34 ` [6/n] Use build_vector_type_for_mode in get_vectype_for_scalar_type_and_size Richard Sandiford
  2019-10-25 12:34 ` [7/n] Use consistent compatibility checks in vectorizable_shift Richard Sandiford
@ 2019-10-25 12:39 ` Richard Sandiford
  2019-10-30 14:48   ` Richard Biener
  2019-10-25 12:41 ` [9/n] Replace vec_info::vector_size with vec_info::vector_mode Richard Sandiford
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: Richard Sandiford @ 2019-10-25 12:39 UTC (permalink / raw)
  To: gcc-patches

This is another patch in the series to remove the assumption that
all modes involved in vectorisation have to be the same size.
Rather than have the target provide a list of vector sizes,
it makes the target provide a list of vector "approaches",
with each approach represented by a mode.

A later patch will pass this mode to targetm.vectorize.related_mode
to get the vector mode for a given element mode.  Until then, the modes
simply act as an alternative way of specifying the vector size.
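One consequence, visible in the omp_max_vf hunk below, is that the
maximum VF falls out directly from the returned modes: since each mode
already uses the smallest element size for its approach, the maximum VF
is just the largest number of units over the list.  A minimal sketch
(my own illustration; a plain unsigned int stands in for poly_uint64):

```c
#include <assert.h>

/* Given the GET_MODE_NUNITS of each candidate mode, the maximum
   vectorization factor is simply the largest entry.  */
static unsigned int
max_vf_from_nunits (const unsigned int *nunits, unsigned int n_modes)
{
  unsigned int vf = 0;
  for (unsigned int i = 0; i < n_modes; ++i)
    if (nunits[i] > vf)
      vf = nunits[i];
  return vf;
}
```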


2019-10-24  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* target.h (vector_sizes, auto_vector_sizes): Delete.
	(vector_modes, auto_vector_modes): New typedefs.
	* target.def (autovectorize_vector_sizes): Replace with...
	(autovectorize_vector_modes): ...this new hook.
	* doc/tm.texi.in (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES):
	Replace with...
	(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): ...this new hook.
	* doc/tm.texi: Regenerate.
	* targhooks.h (default_autovectorize_vector_sizes): Delete.
	(default_autovectorize_vector_modes): New function.
	* targhooks.c (default_autovectorize_vector_sizes): Delete.
	(default_autovectorize_vector_modes): New function.
	* omp-general.c (omp_max_vf): Use autovectorize_vector_modes instead
	of autovectorize_vector_sizes.  Use the number of units in the mode
	to calculate the maximum VF.
	* omp-low.c (omp_clause_aligned_alignment): Use
	autovectorize_vector_modes instead of autovectorize_vector_sizes.
	Use a loop based on related_mode to iterate through all supported
	vector modes for a given scalar mode.
	* optabs-query.c (can_vec_mask_load_store_p): Use
	autovectorize_vector_modes instead of autovectorize_vector_sizes.
	* tree-vect-loop.c (vect_analyze_loop, vect_transform_loop): Likewise.
	* tree-vect-slp.c (vect_slp_bb_region): Likewise.
	* config/aarch64/aarch64.c (aarch64_autovectorize_vector_sizes):
	Replace with...
	(aarch64_autovectorize_vector_modes): ...this new function.
	(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
	(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
	* config/arc/arc.c (arc_autovectorize_vector_sizes): Replace with...
	(arc_autovectorize_vector_modes): ...this new function.
	(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
	(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
	* config/arm/arm.c (arm_autovectorize_vector_sizes): Replace with...
	(arm_autovectorize_vector_modes): ...this new function.
	(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
	(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
	* config/i386/i386.c (ix86_autovectorize_vector_sizes): Replace with...
	(ix86_autovectorize_vector_modes): ...this new function.
	(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
	(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
	* config/mips/mips.c (mips_autovectorize_vector_sizes): Replace with...
	(mips_autovectorize_vector_modes): ...this new function.
	(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
	(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.

Index: gcc/target.h
===================================================================
--- gcc/target.h	2019-09-30 17:19:39.843166118 +0100
+++ gcc/target.h	2019-10-25 13:27:15.525762975 +0100
@@ -205,11 +205,11 @@ enum vect_cost_model_location {
 class vec_perm_indices;
 
 /* The type to use for lists of vector sizes.  */
-typedef vec<poly_uint64> vector_sizes;
+typedef vec<machine_mode> vector_modes;
 
 /* Same, but can be used to construct local lists that are
    automatically freed.  */
-typedef auto_vec<poly_uint64, 8> auto_vector_sizes;
+typedef auto_vec<machine_mode, 8> auto_vector_modes;
 
 /* The target structure.  This holds all the backend hooks.  */
 #define DEFHOOKPOD(NAME, DOC, TYPE, INIT) TYPE NAME;
Index: gcc/target.def
===================================================================
--- gcc/target.def	2019-10-25 13:26:59.309877555 +0100
+++ gcc/target.def	2019-10-25 13:27:15.525762975 +0100
@@ -1894,20 +1894,28 @@ reached.  The default is @var{mode} whic
 /* Returns a mask of vector sizes to iterate over when auto-vectorizing
    after processing the preferred one derived from preferred_simd_mode.  */
 DEFHOOK
-(autovectorize_vector_sizes,
- "If the mode returned by @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE} is not\n\
-the only one that is worth considering, this hook should add all suitable\n\
-vector sizes to @var{sizes}, in order of decreasing preference.  The first\n\
-one should be the size of @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE}.\n\
-If @var{all} is true, add suitable vector sizes even when they are generally\n\
+(autovectorize_vector_modes,
+ "If using the mode returned by @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE}\n\
+is not the only approach worth considering, this hook should add one mode to\n\
+@var{modes} for each useful alternative approach.  These modes are then\n\
+passed to @code{TARGET_VECTORIZE_RELATED_MODE} to obtain the vector mode\n\
+for a given element mode.\n\
+\n\
+The modes returned in @var{modes} should use the smallest element mode\n\
+possible for the vectorization approach that they represent, preferring\n\
+integer modes over floating-point modes in the event of a tie.  The first\n\
+mode should be the @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE} for its\n\
+element mode.\n\
+\n\
+If @var{all} is true, add suitable vector modes even when they are generally\n\
 not expected to be worthwhile.\n\
 \n\
 The hook does not need to do anything if the vector returned by\n\
 @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE} is the only one relevant\n\
 for autovectorization.  The default implementation does nothing.",
  void,
- (vector_sizes *sizes, bool all),
- default_autovectorize_vector_sizes)
+ (vector_modes *modes, bool all),
+ default_autovectorize_vector_modes)
 
 DEFHOOK
 (related_mode,
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	2019-10-25 13:26:59.009879675 +0100
+++ gcc/doc/tm.texi.in	2019-10-25 13:27:15.521763003 +0100
@@ -4179,7 +4179,7 @@ address;  but often a machine-dependent
 
 @hook TARGET_VECTORIZE_SPLIT_REDUCTION
 
-@hook TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
+@hook TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES
 
 @hook TARGET_VECTORIZE_RELATED_MODE
 
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	2019-10-25 13:26:59.305877583 +0100
+++ gcc/doc/tm.texi	2019-10-25 13:27:15.521763003 +0100
@@ -6016,12 +6016,20 @@ against lower halves of vectors recursiv
 reached.  The default is @var{mode} which means no splitting.
 @end deftypefn
 
-@deftypefn {Target Hook} void TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES (vector_sizes *@var{sizes}, bool @var{all})
-If the mode returned by @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE} is not
-the only one that is worth considering, this hook should add all suitable
-vector sizes to @var{sizes}, in order of decreasing preference.  The first
-one should be the size of @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE}.
-If @var{all} is true, add suitable vector sizes even when they are generally
+@deftypefn {Target Hook} void TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES (vector_modes *@var{modes}, bool @var{all})
+If using the mode returned by @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE}
+is not the only approach worth considering, this hook should add one mode to
+@var{modes} for each useful alternative approach.  These modes are then
+passed to @code{TARGET_VECTORIZE_RELATED_MODE} to obtain the vector mode
+for a given element mode.
+
+The modes returned in @var{modes} should use the smallest element mode
+possible for the vectorization approach that they represent, preferring
+integer modes over floating-point modes in the event of a tie.  The first
+mode should be the @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE} for its
+element mode.
+
+If @var{all} is true, add suitable vector modes even when they are generally
 not expected to be worthwhile.
 
 The hook does not need to do anything if the vector returned by
Index: gcc/targhooks.h
===================================================================
--- gcc/targhooks.h	2019-10-25 13:26:59.309877555 +0100
+++ gcc/targhooks.h	2019-10-25 13:27:15.525762975 +0100
@@ -113,7 +113,7 @@ default_builtin_support_vector_misalignm
 					     int, bool);
 extern machine_mode default_preferred_simd_mode (scalar_mode mode);
 extern machine_mode default_split_reduction (machine_mode);
-extern void default_autovectorize_vector_sizes (vector_sizes *, bool);
+extern void default_autovectorize_vector_modes (vector_modes *, bool);
 extern opt_machine_mode default_vectorize_related_mode (machine_mode,
 							scalar_mode,
 							poly_uint64);
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c	2019-10-25 13:26:59.309877555 +0100
+++ gcc/targhooks.c	2019-10-25 13:27:15.525762975 +0100
@@ -1299,11 +1299,10 @@ default_split_reduction (machine_mode mo
   return mode;
 }
 
-/* By default only the size derived from the preferred vector mode
-   is tried.  */
+/* By default only the preferred vector mode is tried.  */
 
 void
-default_autovectorize_vector_sizes (vector_sizes *, bool)
+default_autovectorize_vector_modes (vector_modes *, bool)
 {
 }
 
Index: gcc/omp-general.c
===================================================================
--- gcc/omp-general.c	2019-10-25 09:21:28.798326303 +0100
+++ gcc/omp-general.c	2019-10-25 13:27:15.521763003 +0100
@@ -508,13 +508,16 @@ omp_max_vf (void)
 	  && global_options_set.x_flag_tree_loop_vectorize))
     return 1;
 
-  auto_vector_sizes sizes;
-  targetm.vectorize.autovectorize_vector_sizes (&sizes, true);
-  if (!sizes.is_empty ())
+  auto_vector_modes modes;
+  targetm.vectorize.autovectorize_vector_modes (&modes, true);
+  if (!modes.is_empty ())
     {
       poly_uint64 vf = 0;
-      for (unsigned int i = 0; i < sizes.length (); ++i)
-	vf = ordered_max (vf, sizes[i]);
+      for (unsigned int i = 0; i < modes.length (); ++i)
+	/* The returned modes use the smallest element size (and thus
+	   the largest nunits) for the vectorization approach that they
+	   represent.  */
+	vf = ordered_max (vf, GET_MODE_NUNITS (modes[i]));
       return vf;
     }
 
Index: gcc/omp-low.c
===================================================================
--- gcc/omp-low.c	2019-10-11 15:43:51.283513446 +0100
+++ gcc/omp-low.c	2019-10-25 13:27:15.525762975 +0100
@@ -3947,11 +3947,8 @@ omp_clause_aligned_alignment (tree claus
   /* Otherwise return implementation defined alignment.  */
   unsigned int al = 1;
   opt_scalar_mode mode_iter;
-  auto_vector_sizes sizes;
-  targetm.vectorize.autovectorize_vector_sizes (&sizes, true);
-  poly_uint64 vs = 0;
-  for (unsigned int i = 0; i < sizes.length (); ++i)
-    vs = ordered_max (vs, sizes[i]);
+  auto_vector_modes modes;
+  targetm.vectorize.autovectorize_vector_modes (&modes, true);
   static enum mode_class classes[]
     = { MODE_INT, MODE_VECTOR_INT, MODE_FLOAT, MODE_VECTOR_FLOAT };
   for (int i = 0; i < 4; i += 2)
@@ -3962,19 +3959,18 @@ omp_clause_aligned_alignment (tree claus
 	machine_mode vmode = targetm.vectorize.preferred_simd_mode (mode);
 	if (GET_MODE_CLASS (vmode) != classes[i + 1])
 	  continue;
-	while (maybe_ne (vs, 0U)
-	       && known_lt (GET_MODE_SIZE (vmode), vs)
-	       && GET_MODE_2XWIDER_MODE (vmode).exists ())
-	  vmode = GET_MODE_2XWIDER_MODE (vmode).require ();
+	machine_mode alt_vmode;
+	for (unsigned int j = 0; j < modes.length (); ++j)
+	  if (related_vector_mode (modes[j], mode).exists (&alt_vmode)
+	      && known_ge (GET_MODE_SIZE (alt_vmode), GET_MODE_SIZE (vmode)))
+	    vmode = alt_vmode;
 
 	tree type = lang_hooks.types.type_for_mode (mode, 1);
 	if (type == NULL_TREE || TYPE_MODE (type) != mode)
 	  continue;
-	poly_uint64 nelts = exact_div (GET_MODE_SIZE (vmode),
-				       GET_MODE_SIZE (mode));
-	type = build_vector_type (type, nelts);
-	if (TYPE_MODE (type) != vmode)
-	  continue;
+	type = build_vector_type_for_mode (type, vmode);
+	/* The functions above are not allowed to return invalid modes.  */
+	gcc_assert (TYPE_MODE (type) == vmode);
 	if (TYPE_ALIGN_UNIT (type) > al)
 	  al = TYPE_ALIGN_UNIT (type);
       }
Index: gcc/optabs-query.c
===================================================================
--- gcc/optabs-query.c	2019-10-25 13:26:59.305877583 +0100
+++ gcc/optabs-query.c	2019-10-25 13:27:15.525762975 +0100
@@ -589,11 +589,11 @@ can_vec_mask_load_store_p (machine_mode
       && convert_optab_handler (op, vmode, mask_mode) != CODE_FOR_nothing)
     return true;
 
-  auto_vector_sizes vector_sizes;
-  targetm.vectorize.autovectorize_vector_sizes (&vector_sizes, true);
-  for (unsigned int i = 0; i < vector_sizes.length (); ++i)
+  auto_vector_modes vector_modes;
+  targetm.vectorize.autovectorize_vector_modes (&vector_modes, true);
+  for (unsigned int i = 0; i < vector_modes.length (); ++i)
     {
-      poly_uint64 cur = vector_sizes[i];
+      poly_uint64 cur = GET_MODE_SIZE (vector_modes[i]);
       poly_uint64 nunits;
       if (!multiple_p (cur, GET_MODE_SIZE (smode), &nunits))
 	continue;
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2019-10-25 13:26:59.137878771 +0100
+++ gcc/tree-vect-loop.c	2019-10-25 13:27:15.525762975 +0100
@@ -2319,12 +2319,12 @@ vect_analyze_loop_2 (loop_vec_info loop_
 vect_analyze_loop (class loop *loop, loop_vec_info orig_loop_vinfo,
 		   vec_info_shared *shared)
 {
-  auto_vector_sizes vector_sizes;
+  auto_vector_modes vector_modes;
 
   /* Autodetect first vector size we try.  */
-  targetm.vectorize.autovectorize_vector_sizes (&vector_sizes,
+  targetm.vectorize.autovectorize_vector_modes (&vector_modes,
 						loop->simdlen != 0);
-  unsigned int next_size = 0;
+  unsigned int mode_i = 0;
 
   DUMP_VECT_SCOPE ("analyze_loop_nest");
 
@@ -2343,7 +2343,7 @@ vect_analyze_loop (class loop *loop, loo
   unsigned n_stmts = 0;
   poly_uint64 autodetected_vector_size = 0;
   opt_loop_vec_info first_loop_vinfo = opt_loop_vec_info::success (NULL);
-  poly_uint64 next_vector_size = 0;
+  machine_mode next_vector_mode = VOIDmode;
   while (1)
     {
       /* Check the CFG characteristics of the loop (nesting, entry/exit).  */
@@ -2357,7 +2357,7 @@ vect_analyze_loop (class loop *loop, loo
 	  gcc_checking_assert (first_loop_vinfo == NULL);
 	  return loop_vinfo;
 	}
-      loop_vinfo->vector_size = next_vector_size;
+      loop_vinfo->vector_size = GET_MODE_SIZE (next_vector_mode);
 
       bool fatal = false;
 
@@ -2365,7 +2365,7 @@ vect_analyze_loop (class loop *loop, loo
 	LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = orig_loop_vinfo;
 
       opt_result res = vect_analyze_loop_2 (loop_vinfo, fatal, &n_stmts);
-      if (next_size == 0)
+      if (mode_i == 0)
 	autodetected_vector_size = loop_vinfo->vector_size;
 
       if (res)
@@ -2399,11 +2399,12 @@ vect_analyze_loop (class loop *loop, loo
 	  return opt_loop_vec_info::propagate_failure (res);
 	}
 
-      if (next_size < vector_sizes.length ()
-	  && known_eq (vector_sizes[next_size], autodetected_vector_size))
-	next_size += 1;
+      if (mode_i < vector_modes.length ()
+	  && known_eq (GET_MODE_SIZE (vector_modes[mode_i]),
+		       autodetected_vector_size))
+	mode_i += 1;
 
-      if (next_size == vector_sizes.length ()
+      if (mode_i == vector_modes.length ()
 	  || known_eq (autodetected_vector_size, 0U))
 	{
 	  if (first_loop_vinfo)
@@ -2423,15 +2424,11 @@ vect_analyze_loop (class loop *loop, loo
 	}
 
       /* Try the next biggest vector size.  */
-      next_vector_size = vector_sizes[next_size++];
+      next_vector_mode = vector_modes[mode_i++];
       if (dump_enabled_p ())
-	{
-	  dump_printf_loc (MSG_NOTE, vect_location,
-			   "***** Re-trying analysis with "
-			   "vector size ");
-	  dump_dec (MSG_NOTE, next_vector_size);
-	  dump_printf (MSG_NOTE, "\n");
-	}
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "***** Re-trying analysis with vector mode %s\n",
+			 GET_MODE_NAME (next_vector_mode));
     }
 }
 
@@ -8277,9 +8274,9 @@ vect_transform_loop (loop_vec_info loop_
 
   if (epilogue)
     {
-      auto_vector_sizes vector_sizes;
-      targetm.vectorize.autovectorize_vector_sizes (&vector_sizes, false);
-      unsigned int next_size = 0;
+      auto_vector_modes vector_modes;
+      targetm.vectorize.autovectorize_vector_modes (&vector_modes, false);
+      unsigned int next_i = 0;
 
       /* Note LOOP_VINFO_NITERS_KNOWN_P and LOOP_VINFO_INT_NITERS work
         on niters already adjusted for the iterations of the prologue.  */
@@ -8295,18 +8292,20 @@ vect_transform_loop (loop_vec_info loop_
 	  epilogue->any_upper_bound = true;
 
 	  unsigned int ratio;
-	  while (next_size < vector_sizes.length ()
-		 && !(constant_multiple_p (loop_vinfo->vector_size,
-					   vector_sizes[next_size], &ratio)
+	  while (next_i < vector_modes.length ()
+		 && !(constant_multiple_p
+		      (loop_vinfo->vector_size,
+		       GET_MODE_SIZE (vector_modes[next_i]), &ratio)
 		      && eiters >= lowest_vf / ratio))
-	    next_size += 1;
+	    next_i += 1;
 	}
       else
-	while (next_size < vector_sizes.length ()
-	       && maybe_lt (loop_vinfo->vector_size, vector_sizes[next_size]))
-	  next_size += 1;
+	while (next_i < vector_modes.length ()
+	       && maybe_lt (loop_vinfo->vector_size,
+			    GET_MODE_SIZE (vector_modes[next_i])))
+	  next_i += 1;
 
-      if (next_size == vector_sizes.length ())
+      if (next_i == vector_modes.length ())
 	epilogue = NULL;
     }
 
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2019-10-25 13:26:59.141878743 +0100
+++ gcc/tree-vect-slp.c	2019-10-25 13:27:15.525762975 +0100
@@ -3087,12 +3087,12 @@ vect_slp_bb_region (gimple_stmt_iterator
 		    unsigned int n_stmts)
 {
   bb_vec_info bb_vinfo;
-  auto_vector_sizes vector_sizes;
+  auto_vector_modes vector_modes;
 
   /* Autodetect first vector size we try.  */
-  poly_uint64 next_vector_size = 0;
-  targetm.vectorize.autovectorize_vector_sizes (&vector_sizes, false);
-  unsigned int next_size = 0;
+  machine_mode next_vector_mode = VOIDmode;
+  targetm.vectorize.autovectorize_vector_modes (&vector_modes, false);
+  unsigned int mode_i = 0;
 
   vec_info_shared shared;
 
@@ -3109,7 +3109,7 @@ vect_slp_bb_region (gimple_stmt_iterator
 	bb_vinfo->shared->save_datarefs ();
       else
 	bb_vinfo->shared->check_datarefs ();
-      bb_vinfo->vector_size = next_vector_size;
+      bb_vinfo->vector_size = GET_MODE_SIZE (next_vector_mode);
 
       if (vect_slp_analyze_bb_1 (bb_vinfo, n_stmts, fatal)
 	  && dbg_cnt (vect_slp))
@@ -3136,17 +3136,18 @@ vect_slp_bb_region (gimple_stmt_iterator
 	  vectorized = true;
 	}
 
-      if (next_size == 0)
+      if (mode_i == 0)
 	autodetected_vector_size = bb_vinfo->vector_size;
 
       delete bb_vinfo;
 
-      if (next_size < vector_sizes.length ()
-	  && known_eq (vector_sizes[next_size], autodetected_vector_size))
-	next_size += 1;
+      if (mode_i < vector_modes.length ()
+	  && known_eq (GET_MODE_SIZE (vector_modes[mode_i]),
+		       autodetected_vector_size))
+	mode_i += 1;
 
       if (vectorized
-	  || next_size == vector_sizes.length ()
+	  || mode_i == vector_modes.length ()
 	  || known_eq (autodetected_vector_size, 0U)
 	  /* If vect_slp_analyze_bb_1 signaled that analysis for all
 	     vector sizes will fail do not bother iterating.  */
@@ -3154,15 +3155,11 @@ vect_slp_bb_region (gimple_stmt_iterator
 	return vectorized;
 
       /* Try the next biggest vector size.  */
-      next_vector_size = vector_sizes[next_size++];
+      next_vector_mode = vector_modes[mode_i++];
       if (dump_enabled_p ())
-	{
-	  dump_printf_loc (MSG_NOTE, vect_location,
-			   "***** Re-trying analysis with "
-			   "vector size ");
-	  dump_dec (MSG_NOTE, next_vector_size);
-	  dump_printf (MSG_NOTE, "\n");
-	}
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "***** Re-trying analysis with vector mode %s\n",
+			 GET_MODE_NAME (next_vector_mode));
     }
 }
 
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	2019-10-25 13:26:59.177878488 +0100
+++ gcc/config/aarch64/aarch64.c	2019-10-25 13:27:15.505763118 +0100
@@ -15203,12 +15203,12 @@ aarch64_preferred_simd_mode (scalar_mode
 /* Return a list of possible vector sizes for the vectorizer
    to iterate over.  */
 static void
-aarch64_autovectorize_vector_sizes (vector_sizes *sizes, bool)
+aarch64_autovectorize_vector_modes (vector_modes *modes, bool)
 {
   if (TARGET_SVE)
-    sizes->safe_push (BYTES_PER_SVE_VECTOR);
-  sizes->safe_push (16);
-  sizes->safe_push (8);
+    modes->safe_push (VNx16QImode);
+  modes->safe_push (V16QImode);
+  modes->safe_push (V8QImode);
 }
 
 /* Implement TARGET_MANGLE_TYPE.  */
@@ -20915,9 +20915,9 @@ #define TARGET_VECTORIZE_BUILTINS
 #define TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION \
   aarch64_builtin_vectorized_function
 
-#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
-#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
-  aarch64_autovectorize_vector_sizes
+#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES
+#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES \
+  aarch64_autovectorize_vector_modes
 
 #undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
 #define TARGET_ATOMIC_ASSIGN_EXPAND_FENV \
Index: gcc/config/arc/arc.c
===================================================================
--- gcc/config/arc/arc.c	2019-10-25 09:21:25.974346475 +0100
+++ gcc/config/arc/arc.c	2019-10-25 13:27:15.505763118 +0100
@@ -607,15 +607,15 @@ arc_preferred_simd_mode (scalar_mode mod
 }
 
 /* Implements target hook
-   TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES.  */
+   TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES.  */
 
 static void
-arc_autovectorize_vector_sizes (vector_sizes *sizes, bool)
+arc_autovectorize_vector_modes (vector_modes *modes, bool)
 {
   if (TARGET_PLUS_QMACW)
     {
-      sizes->quick_push (8);
-      sizes->quick_push (4);
+      modes->quick_push (V4HImode);
+      modes->quick_push (V2HImode);
     }
 }
 
@@ -726,8 +726,8 @@ #define TARGET_VECTOR_MODE_SUPPORTED_P a
 #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
 #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE arc_preferred_simd_mode
 
-#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
-#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES arc_autovectorize_vector_sizes
+#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES
+#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES arc_autovectorize_vector_modes
 
 #undef TARGET_CAN_USE_DOLOOP_P
 #define TARGET_CAN_USE_DOLOOP_P arc_can_use_doloop_p
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	2019-10-23 11:29:47.933883742 +0100
+++ gcc/config/arm/arm.c	2019-10-25 13:27:15.513763059 +0100
@@ -289,7 +289,7 @@ static bool arm_builtin_support_vector_m
 static void arm_conditional_register_usage (void);
 static enum flt_eval_method arm_excess_precision (enum excess_precision_type);
 static reg_class_t arm_preferred_rename_class (reg_class_t rclass);
-static void arm_autovectorize_vector_sizes (vector_sizes *, bool);
+static void arm_autovectorize_vector_modes (vector_modes *, bool);
 static int arm_default_branch_cost (bool, bool);
 static int arm_cortex_a5_branch_cost (bool, bool);
 static int arm_cortex_m_branch_cost (bool, bool);
@@ -522,9 +522,9 @@ #define TARGET_VECTOR_MODE_SUPPORTED_P a
 #define TARGET_ARRAY_MODE_SUPPORTED_P arm_array_mode_supported_p
 #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
 #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE arm_preferred_simd_mode
-#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
-#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
-  arm_autovectorize_vector_sizes
+#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES
+#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES \
+  arm_autovectorize_vector_modes
 
 #undef  TARGET_MACHINE_DEPENDENT_REORG
 #define TARGET_MACHINE_DEPENDENT_REORG arm_reorg
@@ -29012,12 +29012,12 @@ arm_vector_alignment (const_tree type)
 }
 
 static void
-arm_autovectorize_vector_sizes (vector_sizes *sizes, bool)
+arm_autovectorize_vector_modes (vector_modes *modes, bool)
 {
   if (!TARGET_NEON_VECTORIZE_DOUBLE)
     {
-      sizes->safe_push (16);
-      sizes->safe_push (8);
+      modes->safe_push (V16QImode);
+      modes->safe_push (V8QImode);
     }
 }
 
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	2019-10-25 13:26:59.277877782 +0100
+++ gcc/config/i386/i386.c	2019-10-25 13:27:15.517763031 +0100
@@ -21387,35 +21387,35 @@ ix86_preferred_simd_mode (scalar_mode mo
    256bit and 128bit vectors.  */
 
 static void
-ix86_autovectorize_vector_sizes (vector_sizes *sizes, bool all)
+ix86_autovectorize_vector_modes (vector_modes *modes, bool all)
 {
   if (TARGET_AVX512F && !TARGET_PREFER_AVX256)
     {
-      sizes->safe_push (64);
-      sizes->safe_push (32);
-      sizes->safe_push (16);
+      modes->safe_push (V64QImode);
+      modes->safe_push (V32QImode);
+      modes->safe_push (V16QImode);
     }
   else if (TARGET_AVX512F && all)
     {
-      sizes->safe_push (32);
-      sizes->safe_push (16);
-      sizes->safe_push (64);
+      modes->safe_push (V32QImode);
+      modes->safe_push (V16QImode);
+      modes->safe_push (V64QImode);
     }
   else if (TARGET_AVX && !TARGET_PREFER_AVX128)
     {
-      sizes->safe_push (32);
-      sizes->safe_push (16);
+      modes->safe_push (V32QImode);
+      modes->safe_push (V16QImode);
     }
   else if (TARGET_AVX && all)
     {
-      sizes->safe_push (16);
-      sizes->safe_push (32);
+      modes->safe_push (V16QImode);
+      modes->safe_push (V32QImode);
     }
   else if (TARGET_MMX_WITH_SSE)
-    sizes->safe_push (16);
+    modes->safe_push (V16QImode);
 
   if (TARGET_MMX_WITH_SSE)
-    sizes->safe_push (8);
+    modes->safe_push (V8QImode);
 }
 
 /* Implemenation of targetm.vectorize.get_mask_mode.  */
@@ -22954,9 +22954,9 @@ #define TARGET_VECTORIZE_PREFERRED_SIMD_
 #undef TARGET_VECTORIZE_SPLIT_REDUCTION
 #define TARGET_VECTORIZE_SPLIT_REDUCTION \
   ix86_split_reduction
-#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
-#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
-  ix86_autovectorize_vector_sizes
+#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES
+#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES \
+  ix86_autovectorize_vector_modes
 #undef TARGET_VECTORIZE_GET_MASK_MODE
 #define TARGET_VECTORIZE_GET_MASK_MODE ix86_get_mask_mode
 #undef TARGET_VECTORIZE_INIT_COST
Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	2019-10-17 14:22:54.903313423 +0100
+++ gcc/config/mips/mips.c	2019-10-25 13:27:15.517763031 +0100
@@ -13453,13 +13453,13 @@ mips_preferred_simd_mode (scalar_mode mo
   return word_mode;
 }
 
-/* Implement TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES.  */
+/* Implement TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES.  */
 
 static void
-mips_autovectorize_vector_sizes (vector_sizes *sizes, bool)
+mips_autovectorize_vector_modes (vector_modes *modes, bool)
 {
   if (ISA_HAS_MSA)
-    sizes->safe_push (16);
+    modes->safe_push (V16QImode);
 }
 
 /* Implement TARGET_INIT_LIBFUNCS.  */
@@ -22694,9 +22694,9 @@ #define TARGET_SCALAR_MODE_SUPPORTED_P m
 
 #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
 #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE mips_preferred_simd_mode
-#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
-#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
-  mips_autovectorize_vector_sizes
+#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES
+#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES \
+  mips_autovectorize_vector_modes
 
 #undef TARGET_INIT_BUILTINS
 #define TARGET_INIT_BUILTINS mips_init_builtins


* [9/n] Replace vec_info::vector_size with vec_info::vector_mode
  2019-10-25 12:32 [0/n] Support multiple vector sizes for vectorisation Richard Sandiford
                   ` (2 preceding siblings ...)
  2019-10-25 12:39 ` [8/n] Replace autovectorize_vector_sizes with autovectorize_vector_modes Richard Sandiford
@ 2019-10-25 12:41 ` Richard Sandiford
  2019-11-05 12:47   ` Richard Biener
  2019-10-25 12:43 ` [10/n] Make less use of get_same_sized_vectype Richard Sandiford
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: Richard Sandiford @ 2019-10-25 12:41 UTC (permalink / raw)
  To: gcc-patches

This patch replaces vec_info::vector_size with vec_info::vector_mode,
but for now continues to use it as a way of specifying a single
vector size.  This makes it easier for later patches to use
related_vector_mode instead.


2019-10-24  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* tree-vectorizer.h (vec_info::vector_size): Replace with...
	(vec_info::vector_mode): ...this new field.
	* tree-vect-loop.c (vect_update_vf_for_slp): Update accordingly.
	(vect_analyze_loop, vect_transform_loop): Likewise.
	* tree-vect-slp.c (can_duplicate_and_interleave_p): Likewise.
	(vect_make_slp_decision, vect_slp_bb_region): Likewise.
	* tree-vect-stmts.c (get_vectype_for_scalar_type): Likewise.
	* tree-vectorizer.c (try_vectorize_loop_1): Likewise.

gcc/testsuite/
	* gcc.dg/vect/vect-tail-nomask-1.c: Update expected epilogue
	vectorization message.

Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	2019-10-25 13:26:59.093879082 +0100
+++ gcc/tree-vectorizer.h	2019-10-25 13:27:19.317736181 +0100
@@ -329,9 +329,9 @@ typedef std::pair<tree, tree> vec_object
   /* Cost data used by the target cost model.  */
   void *target_cost_data;
 
-  /* The vector size for this loop in bytes, or 0 if we haven't picked
-     a size yet.  */
-  poly_uint64 vector_size;
+  /* If we've chosen a vector size for this vectorization region,
+     this is one mode that has such a size, otherwise it is VOIDmode.  */
+  machine_mode vector_mode;
 
 private:
   stmt_vec_info new_stmt_vec_info (gimple *stmt);
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2019-10-25 13:27:15.525762975 +0100
+++ gcc/tree-vect-loop.c	2019-10-25 13:27:19.309736237 +0100
@@ -1414,8 +1414,8 @@ vect_update_vf_for_slp (loop_vec_info lo
 	dump_printf_loc (MSG_NOTE, vect_location,
 			 "Loop contains SLP and non-SLP stmts\n");
       /* Both the vectorization factor and unroll factor have the form
-	 loop_vinfo->vector_size * X for some rational X, so they must have
-	 a common multiple.  */
+	 GET_MODE_SIZE (loop_vinfo->vector_mode) * X for some rational X,
+	 so they must have a common multiple.  */
       vectorization_factor
 	= force_common_multiple (vectorization_factor,
 				 LOOP_VINFO_SLP_UNROLLING_FACTOR (loop_vinfo));
@@ -2341,7 +2341,7 @@ vect_analyze_loop (class loop *loop, loo
        " loops cannot be vectorized\n");
 
   unsigned n_stmts = 0;
-  poly_uint64 autodetected_vector_size = 0;
+  machine_mode autodetected_vector_mode = VOIDmode;
   opt_loop_vec_info first_loop_vinfo = opt_loop_vec_info::success (NULL);
   machine_mode next_vector_mode = VOIDmode;
   while (1)
@@ -2357,7 +2357,7 @@ vect_analyze_loop (class loop *loop, loo
 	  gcc_checking_assert (first_loop_vinfo == NULL);
 	  return loop_vinfo;
 	}
-      loop_vinfo->vector_size = GET_MODE_SIZE (next_vector_mode);
+      loop_vinfo->vector_mode = next_vector_mode;
 
       bool fatal = false;
 
@@ -2366,7 +2366,7 @@ vect_analyze_loop (class loop *loop, loo
 
       opt_result res = vect_analyze_loop_2 (loop_vinfo, fatal, &n_stmts);
       if (mode_i == 0)
-	autodetected_vector_size = loop_vinfo->vector_size;
+	autodetected_vector_mode = loop_vinfo->vector_mode;
 
       if (res)
 	{
@@ -2401,21 +2401,21 @@ vect_analyze_loop (class loop *loop, loo
 
       if (mode_i < vector_modes.length ()
 	  && known_eq (GET_MODE_SIZE (vector_modes[mode_i]),
-		       autodetected_vector_size))
+		       GET_MODE_SIZE (autodetected_vector_mode)))
 	mode_i += 1;
 
       if (mode_i == vector_modes.length ()
-	  || known_eq (autodetected_vector_size, 0U))
+	  || autodetected_vector_mode == VOIDmode)
 	{
 	  if (first_loop_vinfo)
 	    {
 	      loop->aux = (loop_vec_info) first_loop_vinfo;
 	      if (dump_enabled_p ())
 		{
+		  machine_mode mode = first_loop_vinfo->vector_mode;
 		  dump_printf_loc (MSG_NOTE, vect_location,
-				   "***** Choosing vector size ");
-		  dump_dec (MSG_NOTE, first_loop_vinfo->vector_size);
-		  dump_printf (MSG_NOTE, "\n");
+				   "***** Choosing vector mode %s\n",
+				   GET_MODE_NAME (mode));
 		}
 	      return first_loop_vinfo;
 	    }
@@ -8238,12 +8238,9 @@ vect_transform_loop (loop_vec_info loop_
 	  dump_printf (MSG_NOTE, "\n");
 	}
       else
-	{
-	  dump_printf_loc (MSG_NOTE, vect_location,
-			   "LOOP EPILOGUE VECTORIZED (VS=");
-	  dump_dec (MSG_NOTE, loop_vinfo->vector_size);
-	  dump_printf (MSG_NOTE, ")\n");
-	}
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "LOOP EPILOGUE VECTORIZED (MODE=%s)\n",
+			 GET_MODE_NAME (loop_vinfo->vector_mode));
     }
 
   /* Loops vectorized with a variable factor won't benefit from
@@ -8294,14 +8291,14 @@ vect_transform_loop (loop_vec_info loop_
 	  unsigned int ratio;
 	  while (next_i < vector_modes.length ()
 		 && !(constant_multiple_p
-		      (loop_vinfo->vector_size,
+		      (GET_MODE_SIZE (loop_vinfo->vector_mode),
 		       GET_MODE_SIZE (vector_modes[next_i]), &ratio)
 		      && eiters >= lowest_vf / ratio))
 	    next_i += 1;
 	}
       else
 	while (next_i < vector_modes.length ()
-	       && maybe_lt (loop_vinfo->vector_size,
+	       && maybe_lt (GET_MODE_SIZE (loop_vinfo->vector_mode),
 			    GET_MODE_SIZE (vector_modes[next_i])))
 	  next_i += 1;
 
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2019-10-25 13:27:15.525762975 +0100
+++ gcc/tree-vect-slp.c	2019-10-25 13:27:19.313736209 +0100
@@ -274,7 +274,7 @@ can_duplicate_and_interleave_p (vec_info
     {
       scalar_int_mode int_mode;
       poly_int64 elt_bits = elt_bytes * BITS_PER_UNIT;
-      if (multiple_p (vinfo->vector_size, elt_bytes, &nelts)
+      if (multiple_p (GET_MODE_SIZE (vinfo->vector_mode), elt_bytes, &nelts)
 	  && int_mode_for_size (elt_bits, 0).exists (&int_mode))
 	{
 	  tree int_type = build_nonstandard_integer_type
@@ -474,7 +474,7 @@ vect_get_and_check_slp_defs (vec_info *v
 	    }
 	  if ((dt == vect_constant_def
 	       || dt == vect_external_def)
-	      && !vinfo->vector_size.is_constant ()
+	      && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
 	      && (TREE_CODE (type) == BOOLEAN_TYPE
 		  || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
 						      TYPE_MODE (type))))
@@ -2339,8 +2339,11 @@ vect_make_slp_decision (loop_vec_info lo
   FOR_EACH_VEC_ELT (slp_instances, i, instance)
     {
       /* FORNOW: SLP if you can.  */
-      /* All unroll factors have the form vinfo->vector_size * X for some
-	 rational X, so they must have a common multiple.  */
+      /* All unroll factors have the form:
+
+	   GET_MODE_SIZE (vinfo->vector_mode) * X
+
+	 for some rational X, so they must have a common multiple.  */
       unrolling_factor
 	= force_common_multiple (unrolling_factor,
 				 SLP_INSTANCE_UNROLLING_FACTOR (instance));
@@ -3096,7 +3099,7 @@ vect_slp_bb_region (gimple_stmt_iterator
 
   vec_info_shared shared;
 
-  poly_uint64 autodetected_vector_size = 0;
+  machine_mode autodetected_vector_mode = VOIDmode;
   while (1)
     {
       bool vectorized = false;
@@ -3109,7 +3112,7 @@ vect_slp_bb_region (gimple_stmt_iterator
 	bb_vinfo->shared->save_datarefs ();
       else
 	bb_vinfo->shared->check_datarefs ();
-      bb_vinfo->vector_size = GET_MODE_SIZE (next_vector_mode);
+      bb_vinfo->vector_mode = next_vector_mode;
 
       if (vect_slp_analyze_bb_1 (bb_vinfo, n_stmts, fatal)
 	  && dbg_cnt (vect_slp))
@@ -3123,7 +3126,7 @@ vect_slp_bb_region (gimple_stmt_iterator
 	  unsigned HOST_WIDE_INT bytes;
 	  if (dump_enabled_p ())
 	    {
-	      if (bb_vinfo->vector_size.is_constant (&bytes))
+	      if (GET_MODE_SIZE (bb_vinfo->vector_mode).is_constant (&bytes))
 		dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
 				 "basic block part vectorized using %wu byte "
 				 "vectors\n", bytes);
@@ -3137,18 +3140,18 @@ vect_slp_bb_region (gimple_stmt_iterator
 	}
 
       if (mode_i == 0)
-	autodetected_vector_size = bb_vinfo->vector_size;
+	autodetected_vector_mode = bb_vinfo->vector_mode;
 
       delete bb_vinfo;
 
       if (mode_i < vector_modes.length ()
 	  && known_eq (GET_MODE_SIZE (vector_modes[mode_i]),
-		       autodetected_vector_size))
+		       GET_MODE_SIZE (autodetected_vector_mode)))
 	mode_i += 1;
 
       if (vectorized
 	  || mode_i == vector_modes.length ()
-	  || known_eq (autodetected_vector_size, 0U)
+	  || autodetected_vector_mode == VOIDmode
 	  /* If vect_slp_analyze_bb_1 signaled that analysis for all
 	     vector sizes will fail do not bother iterating.  */
 	  || fatal)
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2019-10-25 13:27:12.121787027 +0100
+++ gcc/tree-vect-stmts.c	2019-10-25 13:27:19.313736209 +0100
@@ -11212,11 +11212,10 @@ get_vectype_for_scalar_type_and_size (tr
 get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type)
 {
   tree vectype;
-  vectype = get_vectype_for_scalar_type_and_size (scalar_type,
-						  vinfo->vector_size);
-  if (vectype
-      && known_eq (vinfo->vector_size, 0U))
-    vinfo->vector_size = GET_MODE_SIZE (TYPE_MODE (vectype));
+  poly_uint64 vector_size = GET_MODE_SIZE (vinfo->vector_mode);
+  vectype = get_vectype_for_scalar_type_and_size (scalar_type, vector_size);
+  if (vectype && vinfo->vector_mode == VOIDmode)
+    vinfo->vector_mode = TYPE_MODE (vectype);
   return vectype;
 }
 
Index: gcc/tree-vectorizer.c
===================================================================
--- gcc/tree-vectorizer.c	2019-10-21 07:41:32.997886232 +0100
+++ gcc/tree-vectorizer.c	2019-10-25 13:27:19.317736181 +0100
@@ -971,7 +971,7 @@ try_vectorize_loop_1 (hash_table<simduid
   unsigned HOST_WIDE_INT bytes;
   if (dump_enabled_p ())
     {
-      if (loop_vinfo->vector_size.is_constant (&bytes))
+      if (GET_MODE_SIZE (loop_vinfo->vector_mode).is_constant (&bytes))
 	dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
 			 "loop vectorized using %wu byte vectors\n", bytes);
       else
Index: gcc/testsuite/gcc.dg/vect/vect-tail-nomask-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-tail-nomask-1.c	2019-03-08 18:15:02.260871260 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-tail-nomask-1.c	2019-10-25 13:27:19.309736237 +0100
@@ -106,4 +106,4 @@ main (int argc, const char **argv)
 }
 
 /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { target avx2_runtime } } } */
-/* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED \\(VS=16\\)" 2 "vect" { target avx2_runtime } } } */
+/* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED \\(MODE=V16QI\\)" 2 "vect" { target avx2_runtime } } } */


* [10/n] Make less use of get_same_sized_vectype
  2019-10-25 12:32 [0/n] Support multiple vector sizes for vectorisation Richard Sandiford
                   ` (3 preceding siblings ...)
  2019-10-25 12:41 ` [9/n] Replace vec_info::vector_size with vec_info::vector_mode Richard Sandiford
@ 2019-10-25 12:43 ` Richard Sandiford
  2019-11-05 12:50   ` Richard Biener
  2019-10-25 12:44 ` [11/n] Support vectorisation with mixed vector sizes Richard Sandiford
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: Richard Sandiford @ 2019-10-25 12:43 UTC (permalink / raw)
  To: gcc-patches

Some callers of get_same_sized_vectype were dealing with operands that
are constant or defined externally, and so have no STMT_VINFO_VECTYPE
available.  Under the current model, using get_same_sized_vectype for
that case is equivalent to using get_vectype_for_scalar_type, since
get_vectype_for_scalar_type always returns vectors of the same size,
once a size is fixed.

Using get_vectype_for_scalar_type is arguably more obvious though:
if we're using the same scalar type as we would for internal
definitions, we should use the same vector type too.  (Constant and
external definitions sometimes let us change the original scalar type
to a "nicer" scalar type, but that isn't what's happening here.)

This is a prerequisite to supporting multiple vector sizes in the same
vec_info.


2019-10-24  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* tree-vect-stmts.c (vectorizable_call): If an operand is
	constant or external, use get_vectype_for_scalar_type
	rather than get_same_sized_vectype to get its vector type.
	(vectorizable_conversion, vectorizable_shift): Likewise.
	(vectorizable_operation): Likewise.

Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2019-10-25 13:27:19.313736209 +0100
+++ gcc/tree-vect-stmts.c	2019-10-25 13:27:22.985710263 +0100
@@ -3308,10 +3308,10 @@ vectorizable_call (stmt_vec_info stmt_in
 	  return false;
 	}
     }
-  /* If all arguments are external or constant defs use a vector type with
-     the same size as the output vector type.  */
+  /* If all arguments are external or constant defs, infer the vector type
+     from the scalar type.  */
   if (!vectype_in)
-    vectype_in = get_same_sized_vectype (rhs_type, vectype_out);
+    vectype_in = get_vectype_for_scalar_type (vinfo, rhs_type);
   if (vec_stmt)
     gcc_assert (vectype_in);
   if (!vectype_in)
@@ -4800,10 +4800,10 @@ vectorizable_conversion (stmt_vec_info s
 	}
     }
 
-  /* If op0 is an external or constant defs use a vector type of
-     the same size as the output vector type.  */
+  /* If op0 is an external or constant def, infer the vector type
+     from the scalar type.  */
   if (!vectype_in)
-    vectype_in = get_same_sized_vectype (rhs_type, vectype_out);
+    vectype_in = get_vectype_for_scalar_type (vinfo, rhs_type);
   if (vec_stmt)
     gcc_assert (vectype_in);
   if (!vectype_in)
@@ -5564,10 +5564,10 @@ vectorizable_shift (stmt_vec_info stmt_i
                          "use not simple.\n");
       return false;
     }
-  /* If op0 is an external or constant def use a vector type with
-     the same size as the output vector type.  */
+  /* If op0 is an external or constant def, infer the vector type
+     from the scalar type.  */
   if (!vectype)
-    vectype = get_same_sized_vectype (TREE_TYPE (op0), vectype_out);
+    vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op0));
   if (vec_stmt)
     gcc_assert (vectype);
   if (!vectype)
@@ -5666,7 +5666,7 @@ vectorizable_shift (stmt_vec_info stmt_i
                          "vector/vector shift/rotate found.\n");
 
       if (!op1_vectype)
-	op1_vectype = get_same_sized_vectype (TREE_TYPE (op1), vectype_out);
+	op1_vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op1));
       incompatible_op1_vectype_p
 	= (op1_vectype == NULL_TREE
 	   || maybe_ne (TYPE_VECTOR_SUBPARTS (op1_vectype),
@@ -5997,8 +5997,8 @@ vectorizable_operation (stmt_vec_info st
                          "use not simple.\n");
       return false;
     }
-  /* If op0 is an external or constant def use a vector type with
-     the same size as the output vector type.  */
+  /* If op0 is an external or constant def, infer the vector type
+     from the scalar type.  */
   if (!vectype)
     {
       /* For boolean type we cannot determine vectype by
@@ -6018,7 +6018,7 @@ vectorizable_operation (stmt_vec_info st
 	  vectype = vectype_out;
 	}
       else
-	vectype = get_same_sized_vectype (TREE_TYPE (op0), vectype_out);
+	vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op0));
     }
   if (vec_stmt)
     gcc_assert (vectype);


* [11/n] Support vectorisation with mixed vector sizes
  2019-10-25 12:32 [0/n] Support multiple vector sizes for vectorisation Richard Sandiford
                   ` (4 preceding siblings ...)
  2019-10-25 12:43 ` [10/n] Make less use of get_same_sized_vectype Richard Sandiford
@ 2019-10-25 12:44 ` Richard Sandiford
  2019-11-05 12:57   ` Richard Biener
  2019-10-25 12:49 ` [12/n] [AArch64] Support vectorising with multiple " Richard Sandiford
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: Richard Sandiford @ 2019-10-25 12:44 UTC (permalink / raw)
  To: gcc-patches

After previous patches, it's now possible to make the vectoriser
support multiple vector sizes in the same vectorization region, using
related_vector_mode to pick the right vector mode for a given
element mode.  No port yet takes advantage of this, but I have
a follow-on patch for AArch64.

This patch also seemed like a good opportunity to add some more dump
messages: one to make it clear which vector size/mode was being used
when analysis passed or failed, and another to say when we've decided
to skip a redundant vector size/mode.


2019-10-24  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* machmode.h (opt_machine_mode::operator==): New function.
	(opt_machine_mode::operator!=): Likewise.
	* tree-vectorizer.h (vec_info::vector_mode): Update comment.
	(get_related_vectype_for_scalar_type): Delete.
	(get_vectype_for_scalar_type_and_size): Declare.
	* tree-vect-slp.c (vect_slp_bb_region): Print dump messages to say
	whether analysis passed or failed, and with what vector modes.
	Use related_vector_mode to check whether trying a particular
	vector mode would be redundant with the autodetected mode,
	and print a dump message if we decide to skip it.
	* tree-vect-loop.c (vect_analyze_loop): Likewise.
	(vect_create_epilog_for_reduction): Use
	get_related_vectype_for_scalar_type instead of
	get_vectype_for_scalar_type_and_size.
	* tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Replace
	with...
	(get_related_vectype_for_scalar_type): ...this new function.
	Take a starting/"prevailing" vector mode rather than a vector size.
	Take an optional nunits argument, with the same meaning as for
	related_vector_mode.  Use related_vector_mode when not
	auto-detecting a mode, falling back to mode_for_vector if no
	target mode exists.
	(get_vectype_for_scalar_type): Update accordingly.
	(get_same_sized_vectype): Likewise.
	* tree-vectorizer.c (get_vec_alignment_for_array_type): Likewise.

Index: gcc/machmode.h
===================================================================
--- gcc/machmode.h	2019-10-25 13:26:59.053879364 +0100
+++ gcc/machmode.h	2019-10-25 13:27:26.201687539 +0100
@@ -258,6 +258,9 @@ #define CLASS_HAS_WIDER_MODES_P(CLASS)
   bool exists () const;
   template<typename U> bool exists (U *) const;
 
+  bool operator== (const T &m) const { return m_mode == m; }
+  bool operator!= (const T &m) const { return m_mode != m; }
+
 private:
   machine_mode m_mode;
 };
Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	2019-10-25 13:27:19.317736181 +0100
+++ gcc/tree-vectorizer.h	2019-10-25 13:27:26.209687483 +0100
@@ -329,8 +329,9 @@ typedef std::pair<tree, tree> vec_object
   /* Cost data used by the target cost model.  */
   void *target_cost_data;
 
-  /* If we've chosen a vector size for this vectorization region,
-     this is one mode that has such a size, otherwise it is VOIDmode.  */
+  /* The argument we should pass to related_vector_mode when looking up
+     the vector mode for a scalar mode, or VOIDmode if we haven't yet
+     made any decisions about which vector modes to use.  */
   machine_mode vector_mode;
 
 private:
@@ -1595,8 +1596,9 @@ extern dump_user_location_t find_loop_lo
 extern bool vect_can_advance_ivs_p (loop_vec_info);
 
 /* In tree-vect-stmts.c.  */
+extern tree get_related_vectype_for_scalar_type (machine_mode, tree,
+						 poly_uint64 = 0);
 extern tree get_vectype_for_scalar_type (vec_info *, tree);
-extern tree get_vectype_for_scalar_type_and_size (tree, poly_uint64);
 extern tree get_mask_type_for_scalar_type (vec_info *, tree);
 extern tree get_same_sized_vectype (tree, tree);
 extern bool vect_get_loop_mask_type (loop_vec_info);
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2019-10-25 13:27:19.313736209 +0100
+++ gcc/tree-vect-slp.c	2019-10-25 13:27:26.205687511 +0100
@@ -3118,7 +3118,12 @@ vect_slp_bb_region (gimple_stmt_iterator
 	  && dbg_cnt (vect_slp))
 	{
 	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB part\n");
+	    {
+	      dump_printf_loc (MSG_NOTE, vect_location,
+			       "***** Analysis succeeded with vector mode"
+			       " %s\n", GET_MODE_NAME (bb_vinfo->vector_mode));
+	      dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB part\n");
+	    }
 
 	  bb_vinfo->shared->check_datarefs ();
 	  vect_schedule_slp (bb_vinfo);
@@ -3138,6 +3143,13 @@ vect_slp_bb_region (gimple_stmt_iterator
 
 	  vectorized = true;
 	}
+      else
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "***** Analysis failed with vector mode %s\n",
+			     GET_MODE_NAME (bb_vinfo->vector_mode));
+	}
 
       if (mode_i == 0)
 	autodetected_vector_mode = bb_vinfo->vector_mode;
@@ -3145,9 +3157,22 @@ vect_slp_bb_region (gimple_stmt_iterator
       delete bb_vinfo;
 
       if (mode_i < vector_modes.length ()
-	  && known_eq (GET_MODE_SIZE (vector_modes[mode_i]),
-		       GET_MODE_SIZE (autodetected_vector_mode)))
-	mode_i += 1;
+	  && VECTOR_MODE_P (autodetected_vector_mode)
+	  && (related_vector_mode (vector_modes[mode_i],
+				   GET_MODE_INNER (autodetected_vector_mode))
+	      == autodetected_vector_mode)
+	  && (related_vector_mode (autodetected_vector_mode,
+				   GET_MODE_INNER (vector_modes[mode_i]))
+	      == vector_modes[mode_i]))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "***** Skipping vector mode %s, which would"
+			     " repeat the analysis for %s\n",
+			     GET_MODE_NAME (vector_modes[mode_i]),
+			     GET_MODE_NAME (autodetected_vector_mode));
+	  mode_i += 1;
+	}
 
       if (vectorized
 	  || mode_i == vector_modes.length ()
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2019-10-25 13:27:19.309736237 +0100
+++ gcc/tree-vect-loop.c	2019-10-25 13:27:26.201687539 +0100
@@ -2367,6 +2367,17 @@ vect_analyze_loop (class loop *loop, loo
       opt_result res = vect_analyze_loop_2 (loop_vinfo, fatal, &n_stmts);
       if (mode_i == 0)
 	autodetected_vector_mode = loop_vinfo->vector_mode;
+      if (dump_enabled_p ())
+	{
+	  if (res)
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "***** Analysis succeeded with vector mode %s\n",
+			     GET_MODE_NAME (loop_vinfo->vector_mode));
+	  else
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "***** Analysis failed with vector mode %s\n",
+			     GET_MODE_NAME (loop_vinfo->vector_mode));
+	}
 
       if (res)
 	{
@@ -2400,9 +2411,22 @@ vect_analyze_loop (class loop *loop, loo
 	}
 
       if (mode_i < vector_modes.length ()
-	  && known_eq (GET_MODE_SIZE (vector_modes[mode_i]),
-		       GET_MODE_SIZE (autodetected_vector_mode)))
-	mode_i += 1;
+	  && VECTOR_MODE_P (autodetected_vector_mode)
+	  && (related_vector_mode (vector_modes[mode_i],
+				   GET_MODE_INNER (autodetected_vector_mode))
+	      == autodetected_vector_mode)
+	  && (related_vector_mode (autodetected_vector_mode,
+				   GET_MODE_INNER (vector_modes[mode_i]))
+	      == vector_modes[mode_i]))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "***** Skipping vector mode %s, which would"
+			     " repeat the analysis for %s\n",
+			     GET_MODE_NAME (vector_modes[mode_i]),
+			     GET_MODE_NAME (autodetected_vector_mode));
+	  mode_i += 1;
+	}
 
       if (mode_i == vector_modes.length ()
 	  || autodetected_vector_mode == VOIDmode)
@@ -4763,7 +4787,10 @@ vect_create_epilog_for_reduction (stmt_v
 	  && (mode1 = targetm.vectorize.split_reduction (mode)) != mode)
 	sz1 = GET_MODE_SIZE (mode1).to_constant ();
 
-      tree vectype1 = get_vectype_for_scalar_type_and_size (scalar_type, sz1);
+      unsigned int scalar_bytes = tree_to_uhwi (TYPE_SIZE_UNIT (scalar_type));
+      tree vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
+							   scalar_type,
+							   sz1 / scalar_bytes);
       reduce_with_shift = have_whole_vector_shift (mode1);
       if (!VECTOR_MODE_P (mode1))
 	reduce_with_shift = false;
@@ -4781,7 +4808,9 @@ vect_create_epilog_for_reduction (stmt_v
 	{
 	  gcc_assert (!slp_reduc);
 	  sz /= 2;
-	  vectype1 = get_vectype_for_scalar_type_and_size (scalar_type, sz);
+	  vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
+							  scalar_type,
+							  sz / scalar_bytes);
 
 	  /* The target has to make sure we support lowpart/highpart
 	     extraction, either via direct vector extract or through
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2019-10-25 13:27:22.985710263 +0100
+++ gcc/tree-vect-stmts.c	2019-10-25 13:27:26.205687511 +0100
@@ -11111,18 +11111,28 @@ vect_remove_stores (stmt_vec_info first_
     }
 }
 
-/* Function get_vectype_for_scalar_type_and_size.
-
-   Returns the vector type corresponding to SCALAR_TYPE  and SIZE as supported
-   by the target.  */
+/* If NUNITS is nonzero, return a vector type that contains NUNITS
+   elements of type SCALAR_TYPE, or null if the target doesn't support
+   such a type.
+
+   If NUNITS is zero, return a vector type that contains elements of
+   type SCALAR_TYPE, choosing whichever vector size the target prefers.
+
+   If PREVAILING_MODE is VOIDmode, we have not yet chosen a vector mode
+   for this vectorization region and want to "autodetect" the best choice.
+   Otherwise, PREVAILING_MODE is a previously-chosen vector TYPE_MODE
+   and we want the new type to be interoperable with it.   PREVAILING_MODE
+   in this case can be a scalar integer mode or a vector mode; when it
+   is a vector mode, the function acts like a tree-level version of
+   related_vector_mode.  */
 
 tree
-get_vectype_for_scalar_type_and_size (tree scalar_type, poly_uint64 size)
+get_related_vectype_for_scalar_type (machine_mode prevailing_mode,
+				     tree scalar_type, poly_uint64 nunits)
 {
   tree orig_scalar_type = scalar_type;
   scalar_mode inner_mode;
   machine_mode simd_mode;
-  poly_uint64 nunits;
   tree vectype;
 
   if (!is_int_mode (TYPE_MODE (scalar_type), &inner_mode)
@@ -11162,10 +11172,11 @@ get_vectype_for_scalar_type_and_size (tr
   if (scalar_type == NULL_TREE)
     return NULL_TREE;
 
-  /* If no size was supplied use the mode the target prefers.   Otherwise
-     lookup a vector mode of the specified size.  */
-  if (known_eq (size, 0U))
+  /* If no prevailing mode was supplied, use the mode the target prefers.
+     Otherwise lookup a vector mode based on the prevailing mode.  */
+  if (prevailing_mode == VOIDmode)
     {
+      gcc_assert (known_eq (nunits, 0U));
       simd_mode = targetm.vectorize.preferred_simd_mode (inner_mode);
       if (SCALAR_INT_MODE_P (simd_mode))
 	{
@@ -11181,9 +11192,19 @@ get_vectype_for_scalar_type_and_size (tr
 	    return NULL_TREE;
 	}
     }
-  else if (!multiple_p (size, nbytes, &nunits)
-	   || !mode_for_vector (inner_mode, nunits).exists (&simd_mode))
-    return NULL_TREE;
+  else if (SCALAR_INT_MODE_P (prevailing_mode)
+	   || !related_vector_mode (prevailing_mode,
+				    inner_mode, nunits).exists (&simd_mode))
+    {
+      /* Fall back to using mode_for_vector, mostly in the hope of being
+	 able to use an integer mode.  */
+      if (known_eq (nunits, 0U)
+	  && !multiple_p (GET_MODE_SIZE (prevailing_mode), nbytes, &nunits))
+	return NULL_TREE;
+
+      if (!mode_for_vector (inner_mode, nunits).exists (&simd_mode))
+	return NULL_TREE;
+    }
 
   vectype = build_vector_type_for_mode (scalar_type, simd_mode);
 
@@ -11211,9 +11232,8 @@ get_vectype_for_scalar_type_and_size (tr
 tree
 get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type)
 {
-  tree vectype;
-  poly_uint64 vector_size = GET_MODE_SIZE (vinfo->vector_mode);
-  vectype = get_vectype_for_scalar_type_and_size (scalar_type, vector_size);
+  tree vectype = get_related_vectype_for_scalar_type (vinfo->vector_mode,
+						      scalar_type);
   if (vectype && vinfo->vector_mode == VOIDmode)
     vinfo->vector_mode = TYPE_MODE (vectype);
   return vectype;
@@ -11246,8 +11266,13 @@ get_same_sized_vectype (tree scalar_type
   if (VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type))
     return truth_type_for (vector_type);
 
-  return get_vectype_for_scalar_type_and_size
-	   (scalar_type, GET_MODE_SIZE (TYPE_MODE (vector_type)));
+  poly_uint64 nunits;
+  if (!multiple_p (GET_MODE_SIZE (TYPE_MODE (vector_type)),
+		   GET_MODE_SIZE (TYPE_MODE (scalar_type)), &nunits))
+    return NULL_TREE;
+
+  return get_related_vectype_for_scalar_type (TYPE_MODE (vector_type),
+					      scalar_type, nunits);
 }
 
 /* Function vect_is_simple_use.
Index: gcc/tree-vectorizer.c
===================================================================
--- gcc/tree-vectorizer.c	2019-10-25 13:27:19.317736181 +0100
+++ gcc/tree-vectorizer.c	2019-10-25 13:27:26.209687483 +0100
@@ -1348,7 +1348,7 @@ get_vec_alignment_for_array_type (tree t
   poly_uint64 array_size, vector_size;
 
   tree scalar_type = strip_array_types (type);
-  tree vectype = get_vectype_for_scalar_type_and_size (scalar_type, 0);
+  tree vectype = get_related_vectype_for_scalar_type (VOIDmode, scalar_type);
   if (!vectype
       || !poly_int_tree_p (TYPE_SIZE (type), &array_size)
       || !poly_int_tree_p (TYPE_SIZE (vectype), &vector_size)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [12/n] [AArch64] Support vectorising with multiple vector sizes
  2019-10-25 12:32 [0/n] Support multiple vector sizes for vectorisation Richard Sandiford
                   ` (5 preceding siblings ...)
  2019-10-25 12:44 ` [11/n] Support vectorisation with mixed vector sizes Richard Sandiford
@ 2019-10-25 12:49 ` Richard Sandiford
  2019-10-25 12:51 ` [13/n] Allow mixed vector sizes within a single vectorised stmt Richard Sandiford
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 48+ messages in thread
From: Richard Sandiford @ 2019-10-25 12:49 UTC (permalink / raw)
  To: gcc-patches

This patch makes the vectoriser try mixtures of 64-bit and 128-bit
vector modes on AArch64.  It fixes some existing XFAILs and allows
kernel 24 from the Livermore Loops test to be vectorised (by using
a mixture of V2DF and V2SI).

I'll apply this if the prerequisites are approved.


2019-10-24  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* config/aarch64/aarch64.c (aarch64_vectorize_related_mode): New
	function.
	(aarch64_autovectorize_vector_modes): Also add V4HImode and V2SImode.
	(TARGET_VECTORIZE_RELATED_MODE): Define.

gcc/testsuite/
	* gcc.dg/vect/vect-outer-4f.c: Expect the test to pass on aarch64
	targets.
	* gcc.dg/vect/vect-outer-4g.c: Likewise.
	* gcc.dg/vect/vect-outer-4k.c: Likewise.
	* gcc.dg/vect/vect-outer-4l.c: Likewise.
	* gfortran.dg/vect/vect-8.f90: Expect kernel 24 to be vectorized
	for aarch64.
	* gcc.target/aarch64/sve/reduc_strict_3.c: Update the number of
	times that "Detected double reduction" is printed.
	* gcc.target/aarch64/vect_mixed_sizes_1.c: New test.
	* gcc.target/aarch64/vect_mixed_sizes_2.c: Likewise.
	* gcc.target/aarch64/vect_mixed_sizes_3.c: Likewise.
	* gcc.target/aarch64/vect_mixed_sizes_4.c: Likewise.

Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	2019-10-25 13:27:15.505763118 +0100
+++ gcc/config/aarch64/aarch64.c	2019-10-25 13:27:29.685662922 +0100
@@ -1767,6 +1767,30 @@ aarch64_sve_int_mode (machine_mode mode)
   return aarch64_sve_data_mode (int_mode, GET_MODE_NUNITS (mode)).require ();
 }
 
+/* Implement TARGET_VECTORIZE_RELATED_MODE.  */
+
+static opt_machine_mode
+aarch64_vectorize_related_mode (machine_mode vector_mode,
+				scalar_mode element_mode,
+				poly_uint64 nunits)
+{
+  unsigned int vec_flags = aarch64_classify_vector_mode (vector_mode);
+
+  /* Prefer to use 1 128-bit vector instead of 2 64-bit vectors.  */
+  if ((vec_flags & VEC_ADVSIMD)
+      && known_eq (nunits, 0U)
+      && known_eq (GET_MODE_BITSIZE (vector_mode), 64U)
+      && maybe_ge (GET_MODE_BITSIZE (element_mode)
+		   * GET_MODE_NUNITS (vector_mode), 128U))
+    {
+      machine_mode res = aarch64_simd_container_mode (element_mode, 128);
+      if (VECTOR_MODE_P (res))
+	return res;
+    }
+
+  return default_vectorize_related_mode (vector_mode, element_mode, nunits);
+}
+
 /* Implement TARGET_PREFERRED_ELSE_VALUE.  For binary operations,
    prefer to use the first arithmetic operand as the else value if
    the else value doesn't matter, since that exactly matches the SVE
@@ -15207,8 +15231,27 @@ aarch64_autovectorize_vector_modes (vect
 {
   if (TARGET_SVE)
     modes->safe_push (VNx16QImode);
+
+  /* Try using 128-bit vectors for all element types.  */
   modes->safe_push (V16QImode);
+
+  /* Try using 64-bit vectors for 8-bit elements and 128-bit vectors
+     for wider elements.  */
   modes->safe_push (V8QImode);
+
+  /* Try using 64-bit vectors for 16-bit elements and 128-bit vectors
+     for wider elements.
+
+     TODO: We could support a limited form of V4QImode too, so that
+     we use 32-bit vectors for 8-bit elements.  */
+  modes->safe_push (V4HImode);
+
+  /* Try using 64-bit vectors for 32-bit elements and 128-bit vectors
+     for 64-bit elements.
+
+     TODO: We could similarly support limited forms of V2QImode and V2HImode
+     for this case.  */
+  modes->safe_push (V2SImode);
 }
 
 /* Implement TARGET_MANGLE_TYPE.  */
@@ -20950,6 +20993,8 @@ #define TARGET_VECTORIZE_VECTOR_ALIGNMEN
 #define TARGET_VECTORIZE_VEC_PERM_CONST \
   aarch64_vectorize_vec_perm_const
 
+#undef TARGET_VECTORIZE_RELATED_MODE
+#define TARGET_VECTORIZE_RELATED_MODE aarch64_vectorize_related_mode
 #undef TARGET_VECTORIZE_GET_MASK_MODE
 #define TARGET_VECTORIZE_GET_MASK_MODE aarch64_get_mask_mode
 #undef TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE
Index: gcc/testsuite/gcc.dg/vect/vect-outer-4f.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-outer-4f.c	2019-03-08 18:15:02.304871094 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-outer-4f.c	2019-10-25 13:27:29.685662922 +0100
@@ -65,4 +65,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail { ! aarch64*-*-* } } } } */
Index: gcc/testsuite/gcc.dg/vect/vect-outer-4g.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-outer-4g.c	2019-03-08 18:15:02.268871230 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-outer-4g.c	2019-10-25 13:27:29.685662922 +0100
@@ -65,4 +65,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail { ! aarch64*-*-* } } } } */
Index: gcc/testsuite/gcc.dg/vect/vect-outer-4k.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-outer-4k.c	2019-03-08 18:15:02.280871184 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-outer-4k.c	2019-10-25 13:27:29.685662922 +0100
@@ -65,4 +65,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail { ! aarch64*-*-* } } } } */
Index: gcc/testsuite/gcc.dg/vect/vect-outer-4l.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-outer-4l.c	2019-03-08 18:15:02.240871337 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-outer-4l.c	2019-10-25 13:27:29.685662922 +0100
@@ -65,4 +65,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail { ! aarch64*-*-* } } } } */
Index: gcc/testsuite/gfortran.dg/vect/vect-8.f90
===================================================================
--- gcc/testsuite/gfortran.dg/vect/vect-8.f90	2019-10-07 09:33:30.357955705 +0100
+++ gcc/testsuite/gfortran.dg/vect/vect-8.f90	2019-10-25 13:27:29.689662894 +0100
@@ -704,5 +704,6 @@ CALL track('KERNEL  ')
 RETURN
 END SUBROUTINE kernel
 
-! { dg-final { scan-tree-dump-times "vectorized 22 loops" 1 "vect" { target vect_intdouble_cvt } } }
-! { dg-final { scan-tree-dump-times "vectorized 17 loops" 1 "vect" { target { ! vect_intdouble_cvt } } } }
+! { dg-final { scan-tree-dump-times "vectorized 23 loops" 1 "vect" { target aarch64*-*-* } } }
+! { dg-final { scan-tree-dump-times "vectorized 22 loops" 1 "vect" { target { vect_intdouble_cvt && { ! aarch64*-*-* } } } } }
+! { dg-final { scan-tree-dump-times "vectorized 17 loops" 1 "vect" { target { { ! vect_intdouble_cvt } && { ! aarch64*-*-* } } } } }
Index: gcc/testsuite/gcc.target/aarch64/sve/reduc_strict_3.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve/reduc_strict_3.c	2019-10-25 10:16:43.638748945 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/reduc_strict_3.c	2019-10-25 13:27:29.685662922 +0100
@@ -122,8 +122,7 @@ double_reduc3 (float *restrict i, float
 /* { dg-final { scan-assembler-times {\tfadda\ts[0-9]+, p[0-7], s[0-9]+, z[0-9]+\.s} 4 } } */
 /* { dg-final { scan-assembler-times {\tfadda\td[0-9]+, p[0-7], d[0-9]+, z[0-9]+\.d} 9 } } */
 /* 1 reduction each for double_reduc{1,2} and 2 for double_reduc3.  Each one
-   is reported three times, once for SVE, once for 128-bit AdvSIMD and once
-   for 64-bit AdvSIMD.  */
-/* { dg-final { scan-tree-dump-times "Detected double reduction" 12 "vect" } } */
+   is reported five times, once for each of the autovectorize_vector_modes.  */
+/* { dg-final { scan-tree-dump-times "Detected double reduction" 20 "vect" } } */
 /* double_reduc2 has 2 reductions and slp_non_chained_reduc has 3.  */
 /* { dg-final { scan-tree-dump-times "Detected reduction" 10 "vect" } } */
Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_1.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_1.c	2019-10-25 13:27:29.685662922 +0100
@@ -0,0 +1,18 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#pragma GCC target "+nosve"
+
+#include <stdint.h>
+
+void
+f (int64_t *x, int64_t *y, int32_t *z, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      x[i] += y[i];
+      z[i] += z[i - 2];
+    }
+}
+
+/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.2d,} 1 } } */
+/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.2s,} 1 } } */
Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_2.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_2.c	2019-10-25 13:27:29.685662922 +0100
@@ -0,0 +1,19 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#pragma GCC target "+nosve"
+
+#include <stdint.h>
+
+void
+f (int32_t *x, int32_t *y, int16_t *z, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      x[i] += y[i];
+      z[i] += z[i - 4];
+    }
+}
+
+/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.4s,} 1 } } */
+/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.4h,} 1 } } */
+/* { dg-final { scan-assembler-not {\tadd\tv[0-9]+\.2s,} } } */
Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_3.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_3.c	2019-10-25 13:27:29.689662894 +0100
@@ -0,0 +1,19 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#pragma GCC target "+nosve"
+
+#include <stdint.h>
+
+void
+f (int16_t *x, int16_t *y, int8_t *z, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      x[i] += y[i];
+      z[i] += z[i - 8];
+    }
+}
+
+/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.8h,} 1 } } */
+/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.8b,} 1 } } */
+/* { dg-final { scan-assembler-not {\tadd\tv[0-9]+\.4h,} } } */
Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_4.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_4.c	2019-10-25 13:27:29.689662894 +0100
@@ -0,0 +1,18 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#pragma GCC target "+nosve"
+
+#include <stdint.h>
+
+void
+f (int64_t *x, int64_t *y, int8_t *z, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      x[i] += y[i];
+      z[i] += z[i - 8];
+    }
+}
+
+/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.2d,} 4 } } */
+/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.8b,} 1 } } */

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [13/n] Allow mixed vector sizes within a single vectorised stmt
  2019-10-25 12:32 [0/n] Support multiple vector sizes for vectorisation Richard Sandiford
                   ` (6 preceding siblings ...)
  2019-10-25 12:49 ` [12/n] [AArch64] Support vectorising with multiple " Richard Sandiford
@ 2019-10-25 12:51 ` Richard Sandiford
  2019-11-05 12:58   ` Richard Biener
  2019-10-25 13:00 ` [14/n] Vectorise conversions between differently-sized integer vectors Richard Sandiford
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: Richard Sandiford @ 2019-10-25 12:51 UTC (permalink / raw)
  To: gcc-patches

Although a previous patch allowed mixed vector sizes within a vector
region, we generally still required equal vector sizes within a vector
stmt.  Specifically, vect_get_vector_types_for_stmt computes two vector
types: the vector type corresponding to STMT_VINFO_VECTYPE and the
vector type that determines the minimum vectorisation factor for the
stmt ("nunits_vectype").  It then required these two types to be
the same size.

There doesn't seem to be any need for that restriction though.  AFAICT,
all vectorizable_* functions either do their own compatibility checks
or don't need to do them (because gimple guarantees that the scalar
types are compatible).

It should always be the case that nunits_vectype has at least as many
elements as the other vectype, but that's something we can assert for.

I couldn't resist a couple of other tweaks while there:

- there's no need to compute nunits_vectype if its element type is
  the same as STMT_VINFO_VECTYPE's.

- it's useful to distinguish the nunits_vectype from the main vectype
  in dump messages.

- when reusing the existing STMT_VINFO_VECTYPE, it's useful to say so
  in the dump, and say what the type is.


2019-10-24  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* tree-vect-stmts.c (vect_get_vector_types_for_stmt): Don't
	require vectype and nunits_vectype to have the same size;
	instead assert that nunits_vectype has at least as many
	elements as vectype.  Don't compute a separate nunits_vectype
	if the scalar type is obviously the same as vectype's.
	Tweak dump messages.

Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2019-10-25 13:27:26.205687511 +0100
+++ gcc/tree-vect-stmts.c	2019-10-25 13:27:32.877640367 +0100
@@ -11973,7 +11973,12 @@ vect_get_vector_types_for_stmt (stmt_vec
   tree vectype;
   tree scalar_type = NULL_TREE;
   if (STMT_VINFO_VECTYPE (stmt_info))
-    *stmt_vectype_out = vectype = STMT_VINFO_VECTYPE (stmt_info);
+    {
+      *stmt_vectype_out = vectype = STMT_VINFO_VECTYPE (stmt_info);
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "precomputed vectype: %T\n", vectype);
+    }
   else
     {
       gcc_assert (!STMT_VINFO_DATA_REF (stmt_info));
@@ -12005,7 +12010,7 @@ vect_get_vector_types_for_stmt (stmt_vec
 
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_NOTE, vect_location,
-			 "get vectype for scalar type:  %T\n", scalar_type);
+			 "get vectype for scalar type: %T\n", scalar_type);
       vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
       if (!vectype)
 	return opt_result::failure_at (stmt,
@@ -12022,42 +12027,38 @@ vect_get_vector_types_for_stmt (stmt_vec
 
   /* Don't try to compute scalar types if the stmt produces a boolean
      vector; use the existing vector type instead.  */
-  tree nunits_vectype;
-  if (VECTOR_BOOLEAN_TYPE_P (vectype))
-    nunits_vectype = vectype;
-  else
+  tree nunits_vectype = vectype;
+  if (!VECTOR_BOOLEAN_TYPE_P (vectype)
+      && *stmt_vectype_out != boolean_type_node)
     {
       /* The number of units is set according to the smallest scalar
 	 type (or the largest vector size, but we only support one
 	 vector size per vectorization).  */
-      if (*stmt_vectype_out != boolean_type_node)
+      HOST_WIDE_INT dummy;
+      scalar_type = vect_get_smallest_scalar_type (stmt_info, &dummy, &dummy);
+      if (scalar_type != TREE_TYPE (vectype))
 	{
-	  HOST_WIDE_INT dummy;
-	  scalar_type = vect_get_smallest_scalar_type (stmt_info,
-						       &dummy, &dummy);
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "get vectype for smallest scalar type: %T\n",
+			     scalar_type);
+	  nunits_vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
+	  if (!nunits_vectype)
+	    return opt_result::failure_at
+	      (stmt, "not vectorized: unsupported data-type %T\n",
+	       scalar_type);
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location, "nunits vectype: %T\n",
+			     nunits_vectype);
 	}
-      if (dump_enabled_p ())
-	dump_printf_loc (MSG_NOTE, vect_location,
-			 "get vectype for scalar type:  %T\n", scalar_type);
-      nunits_vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
     }
-  if (!nunits_vectype)
-    return opt_result::failure_at (stmt,
-				   "not vectorized: unsupported data-type %T\n",
-				   scalar_type);
-
-  if (maybe_ne (GET_MODE_SIZE (TYPE_MODE (vectype)),
-		GET_MODE_SIZE (TYPE_MODE (nunits_vectype))))
-    return opt_result::failure_at (stmt,
-				   "not vectorized: different sized vector "
-				   "types in statement, %T and %T\n",
-				   vectype, nunits_vectype);
+
+  gcc_assert (*stmt_vectype_out == boolean_type_node
+	      || multiple_p (TYPE_VECTOR_SUBPARTS (nunits_vectype),
+			     TYPE_VECTOR_SUBPARTS (*stmt_vectype_out)));
 
   if (dump_enabled_p ())
     {
-      dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n",
-		       nunits_vectype);
-
       dump_printf_loc (MSG_NOTE, vect_location, "nunits = ");
       dump_dec (MSG_NOTE, TYPE_VECTOR_SUBPARTS (nunits_vectype));
       dump_printf (MSG_NOTE, "\n");

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [14/n] Vectorise conversions between differently-sized integer vectors
  2019-10-25 12:32 [0/n] Support multiple vector sizes for vectorisation Richard Sandiford
                   ` (7 preceding siblings ...)
  2019-10-25 12:51 ` [13/n] Allow mixed vector sizes within a single vectorised stmt Richard Sandiford
@ 2019-10-25 13:00 ` Richard Sandiford
  2019-11-05 13:02   ` Richard Biener
  2019-10-29 17:05 ` [15/n] Consider building nodes from scalars in vect_slp_analyze_node_operations Richard Sandiford
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: Richard Sandiford @ 2019-10-25 13:00 UTC (permalink / raw)
  To: gcc-patches

This patch adds AArch64 patterns for converting between 64-bit and
128-bit integer vectors, and makes the vectoriser and expand pass
use them.


2019-10-24  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* tree-vect-stmts.c (vectorizable_conversion): Extend the
	non-widening and non-narrowing path to handle standard
	conversion codes, if the target supports them.
	* expr.c (convert_move): Try using the extend and truncate optabs
	for vectors.
	* optabs-tree.c (supportable_convert_operation): Likewise.
	* config/aarch64/iterators.md (Vnarroqw): New iterator.
	* config/aarch64/aarch64-simd.md (<optab><Vnarrowq><mode>2)
	(trunc<mode><Vnarrowq>2): New patterns.

gcc/testsuite/
	* gcc.dg/vect/no-scevccp-outer-12.c: Expect the test to pass
	on aarch64 targets.
	* gcc.dg/vect/vect-double-reduc-5.c: Likewise.
	* gcc.dg/vect/vect-outer-4e.c: Likewise.
	* gcc.target/aarch64/vect_mixed_sizes_5.c: New test.
	* gcc.target/aarch64/vect_mixed_sizes_6.c: Likewise.
	* gcc.target/aarch64/vect_mixed_sizes_7.c: Likewise.
	* gcc.target/aarch64/vect_mixed_sizes_8.c: Likewise.
	* gcc.target/aarch64/vect_mixed_sizes_9.c: Likewise.
	* gcc.target/aarch64/vect_mixed_sizes_10.c: Likewise.
	* gcc.target/aarch64/vect_mixed_sizes_11.c: Likewise.
	* gcc.target/aarch64/vect_mixed_sizes_12.c: Likewise.
	* gcc.target/aarch64/vect_mixed_sizes_13.c: Likewise.

Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2019-10-25 13:27:32.877640367 +0100
+++ gcc/tree-vect-stmts.c	2019-10-25 13:27:36.197616908 +0100
@@ -4861,7 +4861,9 @@ vectorizable_conversion (stmt_vec_info s
   switch (modifier)
     {
     case NONE:
-      if (code != FIX_TRUNC_EXPR && code != FLOAT_EXPR)
+      if (code != FIX_TRUNC_EXPR
+	  && code != FLOAT_EXPR
+	  && !CONVERT_EXPR_CODE_P (code))
 	return false;
       if (supportable_convert_operation (code, vectype_out, vectype_in,
 					 &decl1, &code1))
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2019-10-22 08:46:57.359355939 +0100
+++ gcc/expr.c	2019-10-25 13:27:36.193616936 +0100
@@ -250,6 +250,31 @@ convert_move (rtx to, rtx from, int unsi
 
   if (VECTOR_MODE_P (to_mode) || VECTOR_MODE_P (from_mode))
     {
+      if (GET_MODE_UNIT_PRECISION (to_mode)
+	  > GET_MODE_UNIT_PRECISION (from_mode))
+	{
+	  optab op = unsignedp ? zext_optab : sext_optab;
+	  insn_code icode = convert_optab_handler (op, to_mode, from_mode);
+	  if (icode != CODE_FOR_nothing)
+	    {
+	      emit_unop_insn (icode, to, from,
+			      unsignedp ? ZERO_EXTEND : SIGN_EXTEND);
+	      return;
+	    }
+	}
+
+      if (GET_MODE_UNIT_PRECISION (to_mode)
+	  < GET_MODE_UNIT_PRECISION (from_mode))
+	{
+	  insn_code icode = convert_optab_handler (trunc_optab,
+						   to_mode, from_mode);
+	  if (icode != CODE_FOR_nothing)
+	    {
+	      emit_unop_insn (icode, to, from, TRUNCATE);
+	      return;
+	    }
+	}
+
       gcc_assert (known_eq (GET_MODE_BITSIZE (from_mode),
 			    GET_MODE_BITSIZE (to_mode)));
 
Index: gcc/optabs-tree.c
===================================================================
--- gcc/optabs-tree.c	2019-10-08 09:23:31.894529571 +0100
+++ gcc/optabs-tree.c	2019-10-25 13:27:36.193616936 +0100
@@ -303,6 +303,20 @@ supportable_convert_operation (enum tree
       return true;
     }
 
+  if (GET_MODE_UNIT_PRECISION (m1) > GET_MODE_UNIT_PRECISION (m2)
+      && can_extend_p (m1, m2, TYPE_UNSIGNED (vectype_in)))
+    {
+      *code1 = code;
+      return true;
+    }
+
+  if (GET_MODE_UNIT_PRECISION (m1) < GET_MODE_UNIT_PRECISION (m2)
+      && convert_optab_handler (trunc_optab, m1, m2) != CODE_FOR_nothing)
+    {
+      *code1 = code;
+      return true;
+    }
+
   /* Now check for builtin.  */
   if (targetm.vectorize.builtin_conversion
       && targetm.vectorize.builtin_conversion (code, vectype_out, vectype_in))
Index: gcc/config/aarch64/iterators.md
===================================================================
--- gcc/config/aarch64/iterators.md	2019-10-17 14:23:07.711222242 +0100
+++ gcc/config/aarch64/iterators.md	2019-10-25 13:27:36.189616964 +0100
@@ -860,6 +860,8 @@ (define_mode_attr VNARROWQ [(V8HI "V8QI"
 			    (V2DI "V2SI")
 			    (DI	  "SI")	  (SI	"HI")
 			    (HI	  "QI")])
+(define_mode_attr Vnarrowq [(V8HI "v8qi") (V4SI "v4hi")
+			    (V2DI "v2si")])
 
 ;; Narrowed quad-modes for VQN (Used for XTN2).
 (define_mode_attr VNARROWQ2 [(V8HI "V16QI") (V4SI "V8HI")
Index: gcc/config/aarch64/aarch64-simd.md
===================================================================
--- gcc/config/aarch64/aarch64-simd.md	2019-08-25 19:10:35.550157075 +0100
+++ gcc/config/aarch64/aarch64-simd.md	2019-10-25 13:27:36.189616964 +0100
@@ -7007,3 +7007,21 @@ (define_insn "aarch64_crypto_pmullv2di"
   "pmull2\\t%0.1q, %1.2d, %2.2d"
   [(set_attr "type" "crypto_pmull")]
 )
+
+;; Sign- or zero-extend a 64-bit integer vector to a 128-bit vector.
+(define_insn "<optab><Vnarrowq><mode>2"
+  [(set (match_operand:VQN 0 "register_operand" "=w")
+	(ANY_EXTEND:VQN (match_operand:<VNARROWQ> 1 "register_operand" "w")))]
+  "TARGET_SIMD"
+  "<su>xtl\t%0.<Vtype>, %1.<Vntype>"
+  [(set_attr "type" "neon_shift_imm_long")]
+)
+
+;; Truncate a 128-bit integer vector to a 64-bit vector.
+(define_insn "trunc<mode><Vnarrowq>2"
+  [(set (match_operand:<VNARROWQ> 0 "register_operand" "=w")
+	(truncate:<VNARROWQ> (match_operand:VQN 1 "register_operand" "w")))]
+  "TARGET_SIMD"
+  "xtn\t%0.<Vntype>, %1.<Vtype>"
+  [(set_attr "type" "neon_shift_imm_narrow_q")]
+)
Index: gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c	2019-03-08 18:15:02.252871290 +0000
+++ gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c	2019-10-25 13:27:36.193616936 +0100
@@ -46,4 +46,4 @@ int main (void)
 }
 
 /* Until we support multiple types in the inner loop  */
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { ! aarch64*-*-* } } } } */
Index: gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c	2019-03-08 18:15:02.244871320 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c	2019-10-25 13:27:36.193616936 +0100
@@ -52,5 +52,5 @@ int main ()
 
 /* Vectorization of loops with multiple types and double reduction is not 
    supported yet.  */       
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail { ! aarch64*-*-* } } } } */
       
Index: gcc/testsuite/gcc.dg/vect/vect-outer-4e.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-outer-4e.c	2019-03-08 18:15:02.264871246 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-outer-4e.c	2019-10-25 13:27:36.193616936 +0100
@@ -23,4 +23,4 @@ foo (){
   return;
 }
 
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail { ! aarch64*-*-* } } } } */
Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_5.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_5.c	2019-10-25 13:27:36.193616936 +0100
@@ -0,0 +1,18 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#pragma GCC target "+nosve"
+
+#include <stdint.h>
+
+void
+f (int64_t *x, int64_t *y, int32_t *z, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      x[i] = z[i];
+      y[i] += y[i - 2];
+    }
+}
+
+/* { dg-final { scan-assembler-times {\tsxtl\tv[0-9]+\.2d, v[0-9]+\.2s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.2d,} 1 } } */
Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_6.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_6.c	2019-10-25 13:27:36.193616936 +0100
@@ -0,0 +1,18 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#pragma GCC target "+nosve"
+
+#include <stdint.h>
+
+void
+f (int32_t *x, int32_t *y, int16_t *z, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      x[i] = z[i];
+      y[i] += y[i - 4];
+    }
+}
+
+/* { dg-final { scan-assembler-times {\tsxtl\tv[0-9]+\.4s, v[0-9]+\.4h\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.4s,} 1 } } */
Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_7.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_7.c	2019-10-25 13:27:36.193616936 +0100
@@ -0,0 +1,18 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#pragma GCC target "+nosve"
+
+#include <stdint.h>
+
+void
+f (int16_t *x, int16_t *y, int8_t *z, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      x[i] = z[i];
+      y[i] += y[i - 8];
+    }
+}
+
+/* { dg-final { scan-assembler-times {\tsxtl\tv[0-9]+\.8h, v[0-9]+\.8b\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.8h,} 1 } } */
Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_8.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_8.c	2019-10-25 13:27:36.193616936 +0100
@@ -0,0 +1,18 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#pragma GCC target "+nosve"
+
+#include <stdint.h>
+
+void
+f (int64_t *x, int64_t *y, uint32_t *z, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      x[i] = z[i];
+      y[i] += y[i - 2];
+    }
+}
+
+/* { dg-final { scan-assembler-times {\tuxtl\tv[0-9]+\.2d, v[0-9]+\.2s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.2d,} 1 } } */
Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_9.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_9.c	2019-10-25 13:27:36.193616936 +0100
@@ -0,0 +1,18 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#pragma GCC target "+nosve"
+
+#include <stdint.h>
+
+void
+f (int32_t *x, int32_t *y, uint16_t *z, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      x[i] = z[i];
+      y[i] += y[i - 4];
+    }
+}
+
+/* { dg-final { scan-assembler-times {\tuxtl\tv[0-9]+\.4s, v[0-9]+\.4h\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.4s,} 1 } } */
Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_10.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_10.c	2019-10-25 13:27:36.193616936 +0100
@@ -0,0 +1,18 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#pragma GCC target "+nosve"
+
+#include <stdint.h>
+
+void
+f (int16_t *x, int16_t *y, uint8_t *z, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      x[i] = z[i];
+      y[i] += y[i - 8];
+    }
+}
+
+/* { dg-final { scan-assembler-times {\tuxtl\tv[0-9]+\.8h, v[0-9]+\.8b\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.8h,} 1 } } */
Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_11.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_11.c	2019-10-25 13:27:36.193616936 +0100
@@ -0,0 +1,18 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#pragma GCC target "+nosve"
+
+#include <stdint.h>
+
+void
+f (int32_t *x, int64_t *y, int64_t *z, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      x[i] = z[i];
+      y[i] += y[i - 2];
+    }
+}
+
+/* { dg-final { scan-assembler-times {\txtn\tv[0-9]+\.2s, v[0-9]+\.2d\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.2d,} 1 } } */
Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_12.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_12.c	2019-10-25 13:27:36.193616936 +0100
@@ -0,0 +1,18 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#pragma GCC target "+nosve"
+
+#include <stdint.h>
+
+void
+f (int16_t *x, int32_t *y, int32_t *z, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      x[i] = z[i];
+      y[i] += y[i - 4];
+    }
+}
+
+/* { dg-final { scan-assembler-times {\txtn\tv[0-9]+\.4h, v[0-9]+\.4s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.4s,} 1 } } */
Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_13.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_13.c	2019-10-25 13:27:36.193616936 +0100
@@ -0,0 +1,18 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#pragma GCC target "+nosve"
+
+#include <stdint.h>
+
+void
+f (int8_t *x, int16_t *y, int16_t *z, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      x[i] = z[i];
+      y[i] += y[i - 8];
+    }
+}
+
+/* { dg-final { scan-assembler-times {\txtn\tv[0-9]+\.8b, v[0-9]+\.8h\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.8h,} 1 } } */


* [15/n] Consider building nodes from scalars in vect_slp_analyze_node_operations
  2019-10-25 12:32 [0/n] Support multiple vector sizes for vectorisation Richard Sandiford
                   ` (8 preceding siblings ...)
  2019-10-25 13:00 ` [14/n] Vectorise conversions between differently-sized integer vectors Richard Sandiford
@ 2019-10-29 17:05 ` Richard Sandiford
  2019-11-05 13:07   ` Richard Biener
  2019-10-29 17:14 ` [16/n] Apply maximum nunits for BB SLP Richard Sandiford
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: Richard Sandiford @ 2019-10-29 17:05 UTC (permalink / raw)
  To: gcc-patches

If the statements in an SLP node aren't similar enough to be vectorised,
or aren't something the vectoriser has code to handle, the BB vectoriser
tries building the vector from scalars instead.  This patch does the
same thing if we're able to build a viable-looking tree but fail later
during the analysis phase, e.g. because the target doesn't support a
particular vector operation.

This is needed to avoid regressions with a later patch.
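As a self-contained illustration (a sketch, not part of the patch or its testsuite), the following block mixes additions, which form a vectorizable SLP group, with divisions that a target may have no vector instruction for.  With this change the division results can be gathered as scalars and packed into a vector operand, so the additions are still vectorized instead of the whole block being given up on:

```c
int x[4], y[4], z[4];

/* The four += statements form one SLP group.  If the target lacks a
   vector integer division, the y[i] / z[i] values can now be built
   from scalars (vect_external_def) while the additions and the
   load/store of x[] remain vectorized.  */
void
f (void)
{
  x[0] += y[0] / z[0];
  x[1] += y[1] / z[1];
  x[2] += y[2] / z[2];
  x[3] += y[3] / z[3];
}
```

The function is semantically ordinary C either way; only the generated code differs depending on whether the fallback fires.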


2019-10-29  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* tree-vect-slp.c (vect_contains_pattern_stmt_p): New function.
	(vect_slp_convert_to_external): Likewise.
	(vect_slp_analyze_node_operations): If analysis fails, try building
	the node from scalars instead.

gcc/testsuite/
	* gcc.dg/vect/bb-slp-div-2.c: New test.

Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2019-10-29 17:01:46.000000000 +0000
+++ gcc/tree-vect-slp.c	2019-10-29 17:02:06.355512105 +0000
@@ -225,6 +225,19 @@ vect_free_oprnd_info (vec<slp_oprnd_info
 }
 
 
+/* Return true if STMTS contains a pattern statement.  */
+
+static bool
+vect_contains_pattern_stmt_p (vec<stmt_vec_info> stmts)
+{
+  stmt_vec_info stmt_info;
+  unsigned int i;
+  FOR_EACH_VEC_ELT (stmts, i, stmt_info)
+    if (is_pattern_stmt_p (stmt_info))
+      return true;
+  return false;
+}
+
 /* Find the place of the data-ref in STMT_INFO in the interleaving chain
    that starts from FIRST_STMT_INFO.  Return -1 if the data-ref is not a part
    of the chain.  */
@@ -2630,6 +2643,39 @@ vect_slp_analyze_node_operations_1 (vec_
   return vect_analyze_stmt (stmt_info, &dummy, node, node_instance, cost_vec);
 }
 
+/* Try to build NODE from scalars, returning true on success.
+   NODE_INSTANCE is the SLP instance that contains NODE.  */
+
+static bool
+vect_slp_convert_to_external (vec_info *vinfo, slp_tree node,
+			      slp_instance node_instance)
+{
+  stmt_vec_info stmt_info;
+  unsigned int i;
+
+  if (!is_a <bb_vec_info> (vinfo)
+      || node == SLP_INSTANCE_TREE (node_instance)
+      || vect_contains_pattern_stmt_p (SLP_TREE_SCALAR_STMTS (node)))
+    return false;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "Building vector operands from scalars instead\n");
+
+  /* Don't remove and free the child nodes here, since they could be
+     referenced by other structures.  The analysis and scheduling phases
+     (need to) ignore child nodes of anything that isn't vect_internal_def.  */
+  unsigned int group_size = SLP_TREE_SCALAR_STMTS (node).length ();
+  SLP_TREE_DEF_TYPE (node) = vect_external_def;
+  SLP_TREE_SCALAR_OPS (node).safe_grow (group_size);
+  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
+    {
+      tree lhs = gimple_get_lhs (vect_orig_stmt (stmt_info)->stmt);
+      SLP_TREE_SCALAR_OPS (node)[i] = lhs;
+    }
+  return true;
+}
+
 /* Analyze statements contained in SLP tree NODE after recursively analyzing
    the subtree.  NODE_INSTANCE contains NODE and VINFO contains INSTANCE.
 
@@ -2656,6 +2702,13 @@ vect_slp_analyze_node_operations (vec_in
     {
       SLP_TREE_NUMBER_OF_VEC_STMTS (node)
 	= SLP_TREE_NUMBER_OF_VEC_STMTS (*leader);
+      /* Cope with cases in which we made a late decision to build the
+	 node from scalars.  */
+      if (SLP_TREE_DEF_TYPE (*leader) == vect_external_def
+	  && vect_slp_convert_to_external (vinfo, node, node_instance))
+	;
+      else
+	gcc_assert (SLP_TREE_DEF_TYPE (node) == SLP_TREE_DEF_TYPE (*leader));
       return true;
     }
 
@@ -2715,6 +2768,11 @@ vect_slp_analyze_node_operations (vec_in
     if (SLP_TREE_SCALAR_STMTS (child).length () != 0)
       STMT_VINFO_DEF_TYPE (SLP_TREE_SCALAR_STMTS (child)[0]) = dt[j];
 
+  /* If this node can't be vectorized, try pruning the tree here rather
+     than felling the whole thing.  */
+  if (!res && vect_slp_convert_to_external (vinfo, node, node_instance))
+    res = true;
+
   return res;
 }
 
Index: gcc/testsuite/gcc.dg/vect/bb-slp-div-2.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.dg/vect/bb-slp-div-2.c	2019-10-29 17:02:06.351512133 +0000
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+
+int x[4], y[4], z[4];
+
+void
+f (void)
+{
+  x[0] += y[0] / z[0] * 2;
+  x[1] += y[1] / z[1] * 2;
+  x[2] += y[2] / z[2] * 2;
+  x[3] += y[3] / z[3] * 2;
+}
+
+/* { dg-final { scan-tree-dump "basic block vectorized" "slp2" { target vect_int } } } */


* [16/n] Apply maximum nunits for BB SLP
  2019-10-25 12:32 [0/n] Support multiple vector sizes for vectorisation Richard Sandiford
                   ` (9 preceding siblings ...)
  2019-10-29 17:05 ` [15/n] Consider building nodes from scalars in vect_slp_analyze_node_operations Richard Sandiford
@ 2019-10-29 17:14 ` Richard Sandiford
  2019-11-05 13:22   ` Richard Biener
  2019-11-05 20:10 ` [10a/n] Require equal type sizes for vectorised calls Richard Sandiford
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: Richard Sandiford @ 2019-10-29 17:14 UTC (permalink / raw)
  To: gcc-patches

The BB vectoriser picked vector types in the same way as the loop
vectoriser: it picked a vector mode/size for the region and then
based all the vector types off that choice.  This meant we could
end up trying to use vector types that had too many elements for
the group size.

The main part of this patch is therefore about passing the SLP
group size down to routines like get_vectype_for_scalar_type and
ensuring that each vector type in the SLP tree is chosen with respect
to the group size.  That part in itself is pretty easy and mechanical.

The main warts are:

(1) We normally pick a STMT_VINFO_VECTYPE for data references at an
    early stage (vect_analyze_data_refs).  However, nothing in the
    BB vectoriser relied on this, or on the min_vf calculated from it.
    I couldn't see anything other than vect_recog_bool_pattern that
    tried to access the vector type before the SLP tree is built.

(2) It's possible for the same statement to be used in groups of
    different sizes.  Taking the group size into account meant that
    we could try to pick different vector types for the same statement.

    This problem should go away with the move to doing everything on
    SLP trees, where presumably we would attach the vector type to the
    SLP node rather than the stmt_vec_info.  Until then, the patch just
    uses a first-come, first-served approach.

(3) A similar problem exists for grouped data references, where
    different statements in the same dataref group could be used
    in SLP nodes that have different group sizes.  The patch copes
    with that by making sure that all vector types in a dataref
    group remain consistent.

The patch means that:

    void
    f (int *x, short *y)
    {
      x[0] += y[0];
      x[1] += y[1];
      x[2] += y[2];
      x[3] += y[3];
    }

now produces:

        ldr     q0, [x0]
        ldr     d1, [x1]
        saddw   v0.4s, v0.4s, v1.4h
        str     q0, [x0]
        ret

instead of:

        ldrsh   w2, [x1]
        ldrsh   w3, [x1, 2]
        fmov    s0, w2
        ldrsh   w2, [x1, 4]
        ldrsh   w1, [x1, 6]
        ins     v0.s[1], w3
        ldr     q1, [x0]
        ins     v0.s[2], w2
        ins     v0.s[3], w1
        add     v0.4s, v0.4s, v1.4s
        str     q0, [x0]
        ret

Unfortunately it also means we start to vectorise
gcc.target/i386/pr84101.c for -m32.  That seems like a target
cost issue though; see PR92265 for details.
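When clamping the vector type to the group size, the new code in
get_vectype_for_scalar_type starts from the largest power of two that
does not exceed the group size and halves from there.  A standalone
sketch of just that starting-point computation (floor_log2_local is a
local stand-in for GCC's floor_log2 from hwint.h, not the real
routine):

```c
/* Largest power of two <= X, expressed via floor of log2.  */
static unsigned int
floor_log2_local (unsigned int x)
{
  unsigned int r = 0;
  while (x >>= 1)
    r++;
  return r;
}

/* Mirrors the patch's "1 << floor_log2 (group_size)" starting point:
   the first nunits candidate tried when limiting a BB SLP vector type
   to the group size.  */
unsigned int
initial_nunits (unsigned int group_size)
{
  return 1u << floor_log2_local (group_size);
}
```

So a group of 6 statements first tries a 4-element vector, and the BB
vectorizer is left to carve the remainder into smaller pieces.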


2019-10-29  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* tree-vectorizer.h (vect_get_vector_types_for_stmt): Take an
	optional maximum nunits.
	(get_vectype_for_scalar_type): Likewise.  Also declare a form that
	takes an slp_tree.
	(get_mask_type_for_scalar_type): Take an optional slp_tree.
	(vect_get_mask_type_for_stmt): Likewise.
	* tree-vect-data-refs.c (vect_analyze_data_refs): Don't store
	the vector type in STMT_VINFO_VECTYPE for BB vectorization.
	* tree-vect-patterns.c (vect_recog_bool_pattern): Use
	vect_get_vector_types_for_stmt instead of STMT_VINFO_VECTYPE
	to get an assumed vector type for data references.
	* tree-vect-slp.c (vect_update_shared_vectype): New function.
	(vect_update_all_shared_vectypes): Likewise.
	(vect_build_slp_tree_1): Pass the group size to
	vect_get_vector_types_for_stmt.  Use vect_update_shared_vectype
	for BB vectorization.
	(vect_build_slp_tree_2): Call vect_update_all_shared_vectypes
	before building the vector from scalars.
	(vect_analyze_slp_instance): Pass the group size to
	get_vectype_for_scalar_type.
	(vect_slp_analyze_node_operations_1): Don't recompute the vector
	types for BB vectorization here; just handle the case in which
	we deferred the choice for booleans.
	(vect_get_constant_vectors): Pass the slp_tree to
	get_vectype_for_scalar_type.
	* tree-vect-stmts.c (vect_prologue_cost_for_slp_op): Likewise.
	(vectorizable_call): Likewise.
	(vectorizable_simd_clone_call): Likewise.
	(vectorizable_conversion): Likewise.
	(vectorizable_shift): Likewise.
	(vectorizable_operation): Likewise.
	(vectorizable_comparison): Likewise.
	(vect_is_simple_cond): Take the slp_tree as argument and
	pass it to get_vectype_for_scalar_type.
	(vectorizable_condition): Update call accordingly.
	(get_vectype_for_scalar_type): Take a group_size argument.
	For BB vectorization, limit the vector to that number
	of elements.  Also define an overload that takes an slp_tree.
	(get_mask_type_for_scalar_type): Add an slp_tree argument and
	pass it to get_vectype_for_scalar_type.
	(vect_get_vector_types_for_stmt): Add a group_size argument
	and pass it to get_vectype_for_scalar_type.  Don't use the
	cached vector type for BB vectorization if a group size is given.
	Handle data references in that case.
	(vect_get_mask_type_for_stmt): Take an slp_tree argument and
	pass it to get_mask_type_for_scalar_type.

gcc/testsuite/
	* gcc.dg/vect/bb-slp-4.c: Expect the block to be vectorized
	with -fno-vect-cost-model.
	* gcc.dg/vect/bb-slp-bool-1.c: New test.
	* gcc.target/aarch64/vect_mixed_sizes_14.c: Likewise.
	* gcc.target/i386/pr84101.c: XFAIL for -m32.

Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	2019-10-29 17:01:42.835677274 +0000
+++ gcc/tree-vectorizer.h	2019-10-29 17:02:09.883487330 +0000
@@ -1598,8 +1598,9 @@ extern bool vect_can_advance_ivs_p (loop
 /* In tree-vect-stmts.c.  */
 extern tree get_related_vectype_for_scalar_type (machine_mode, tree,
 						 poly_uint64 = 0);
-extern tree get_vectype_for_scalar_type (vec_info *, tree);
-extern tree get_mask_type_for_scalar_type (vec_info *, tree);
+extern tree get_vectype_for_scalar_type (vec_info *, tree, unsigned int = 0);
+extern tree get_vectype_for_scalar_type (vec_info *, tree, slp_tree);
+extern tree get_mask_type_for_scalar_type (vec_info *, tree, slp_tree = 0);
 extern tree get_same_sized_vectype (tree, tree);
 extern bool vect_get_loop_mask_type (loop_vec_info);
 extern bool vect_is_simple_use (tree, vec_info *, enum vect_def_type *,
@@ -1649,8 +1650,8 @@ extern void optimize_mask_stores (class
 extern gcall *vect_gen_while (tree, tree, tree);
 extern tree vect_gen_while_not (gimple_seq *, tree, tree, tree);
 extern opt_result vect_get_vector_types_for_stmt (stmt_vec_info, tree *,
-						  tree *);
-extern opt_tree vect_get_mask_type_for_stmt (stmt_vec_info);
+						  tree *, unsigned int = 0);
+extern opt_tree vect_get_mask_type_for_stmt (stmt_vec_info, slp_tree = 0);
 
 /* In tree-vect-data-refs.c.  */
 extern bool vect_can_force_dr_alignment_p (const_tree, poly_uint64);
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2019-10-25 09:21:28.606327675 +0100
+++ gcc/tree-vect-data-refs.c	2019-10-29 17:02:09.875487386 +0000
@@ -4343,9 +4343,8 @@ vect_analyze_data_refs (vec_info *vinfo,
 
       /* Set vectype for STMT.  */
       scalar_type = TREE_TYPE (DR_REF (dr));
-      STMT_VINFO_VECTYPE (stmt_info)
-	= get_vectype_for_scalar_type (vinfo, scalar_type);
-      if (!STMT_VINFO_VECTYPE (stmt_info))
+      tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
+      if (!vectype)
         {
           if (dump_enabled_p ())
             {
@@ -4378,14 +4377,19 @@ vect_analyze_data_refs (vec_info *vinfo,
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_NOTE, vect_location,
 			     "got vectype for stmt: %G%T\n",
-			     stmt_info->stmt, STMT_VINFO_VECTYPE (stmt_info));
+			     stmt_info->stmt, vectype);
 	}
 
       /* Adjust the minimal vectorization factor according to the
 	 vector type.  */
-      vf = TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
+      vf = TYPE_VECTOR_SUBPARTS (vectype);
       *min_vf = upper_bound (*min_vf, vf);
 
+      /* Leave the BB vectorizer to pick the vector type later, based on
+	 the final dataref group size and SLP node size.  */
+      if (is_a <loop_vec_info> (vinfo))
+	STMT_VINFO_VECTYPE (stmt_info) = vectype;
+
       if (gatherscatter != SG_NONE)
 	{
 	  gather_scatter_info gs_info;
Index: gcc/tree-vect-patterns.c
===================================================================
--- gcc/tree-vect-patterns.c	2019-10-29 17:01:42.543679326 +0000
+++ gcc/tree-vect-patterns.c	2019-10-29 17:02:09.879487358 +0000
@@ -4153,9 +4153,10 @@ vect_recog_bool_pattern (stmt_vec_info s
 	   && STMT_VINFO_DATA_REF (stmt_vinfo))
     {
       stmt_vec_info pattern_stmt_info;
-      vectype = STMT_VINFO_VECTYPE (stmt_vinfo);
-      gcc_assert (vectype != NULL_TREE);
-      if (!VECTOR_MODE_P (TYPE_MODE (vectype)))
+      tree nunits_vectype;
+      if (!vect_get_vector_types_for_stmt (stmt_vinfo, &vectype,
+					   &nunits_vectype)
+	  || !VECTOR_MODE_P (TYPE_MODE (vectype)))
 	return NULL;
 
       if (check_bool_pattern (var, vinfo, bool_stmts))
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2019-10-29 17:02:06.355512105 +0000
+++ gcc/tree-vect-slp.c	2019-10-29 17:02:09.879487358 +0000
@@ -601,6 +601,77 @@ vect_get_and_check_slp_defs (vec_info *v
   return 0;
 }
 
+/* Try to assign vector type VECTYPE to STMT_INFO for BB vectorization.
+   Return true if we can, meaning that this choice doesn't conflict with
+   existing SLP nodes that use STMT_INFO.  */
+
+static bool
+vect_update_shared_vectype (stmt_vec_info stmt_info, tree vectype)
+{
+  tree old_vectype = STMT_VINFO_VECTYPE (stmt_info);
+  if (old_vectype && useless_type_conversion_p (vectype, old_vectype))
+    return true;
+
+  if (STMT_VINFO_GROUPED_ACCESS (stmt_info)
+      && DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info)))
+    {
+      /* We maintain the invariant that if any statement in the group is
+	 used, all other members of the group have the same vector type.  */
+      stmt_vec_info first_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
+      stmt_vec_info member_info = first_info;
+      for (; member_info; member_info = DR_GROUP_NEXT_ELEMENT (member_info))
+	if (STMT_VINFO_NUM_SLP_USES (member_info) > 0
+	    || is_pattern_stmt_p (member_info))
+	  break;
+
+      if (!member_info)
+	{
+	  for (member_info = first_info; member_info;
+	       member_info = DR_GROUP_NEXT_ELEMENT (member_info))
+	    STMT_VINFO_VECTYPE (member_info) = vectype;
+	  return true;
+	}
+    }
+  else if (STMT_VINFO_NUM_SLP_USES (stmt_info) == 0
+	   && !is_pattern_stmt_p (stmt_info))
+    {
+      STMT_VINFO_VECTYPE (stmt_info) = vectype;
+      return true;
+    }
+
+  if (dump_enabled_p ())
+    {
+      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+		       "Build SLP failed: incompatible vector"
+		       " types for: %G", stmt_info->stmt);
+      dump_printf_loc (MSG_NOTE, vect_location,
+		       "    old vector type: %T\n", old_vectype);
+      dump_printf_loc (MSG_NOTE, vect_location,
+		       "    new vector type: %T\n", vectype);
+    }
+  return false;
+}
+
+/* Try to infer and assign a vector type to all the statements in STMTS.
+   Used only for BB vectorization.  */
+
+static bool
+vect_update_all_shared_vectypes (vec<stmt_vec_info> stmts)
+{
+  tree vectype, nunits_vectype;
+  if (!vect_get_vector_types_for_stmt (stmts[0], &vectype,
+				       &nunits_vectype, stmts.length ()))
+    return false;
+
+  stmt_vec_info stmt_info;
+  unsigned int i;
+  FOR_EACH_VEC_ELT (stmts, i, stmt_info)
+    if (!vect_update_shared_vectype (stmt_info, vectype))
+      return false;
+
+  return true;
+}
+
 /* Return true if call statements CALL1 and CALL2 are similar enough
    to be combined into the same SLP group.  */
 
@@ -747,6 +818,7 @@ vect_build_slp_tree_1 (unsigned char *sw
   stmt_vec_info stmt_info;
   FOR_EACH_VEC_ELT (stmts, i, stmt_info)
     {
+      vec_info *vinfo = stmt_info->vinfo;
       gimple *stmt = stmt_info->stmt;
       swap[i] = 0;
       matches[i] = false;
@@ -780,7 +852,7 @@ vect_build_slp_tree_1 (unsigned char *sw
 
       tree nunits_vectype;
       if (!vect_get_vector_types_for_stmt (stmt_info, &vectype,
-					   &nunits_vectype)
+					   &nunits_vectype, group_size)
 	  || (nunits_vectype
 	      && !vect_record_max_nunits (stmt_info, group_size,
 					  nunits_vectype, max_nunits)))
@@ -792,6 +864,10 @@ vect_build_slp_tree_1 (unsigned char *sw
 
       gcc_assert (vectype);
 
+      if (is_a <bb_vec_info> (vinfo)
+	  && !vect_update_shared_vectype (stmt_info, vectype))
+	continue;
+
       if (gcall *call_stmt = dyn_cast <gcall *> (stmt))
 	{
 	  rhs_code = CALL_EXPR;
@@ -1330,7 +1406,8 @@ vect_build_slp_tree_2 (vec_info *vinfo,
 	      FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (child), j, grandchild)
 		if (SLP_TREE_DEF_TYPE (grandchild) != vect_external_def)
 		  break;
-	      if (!grandchild)
+	      if (!grandchild
+		  && vect_update_all_shared_vectypes (oprnd_info->def_stmts))
 		{
 		  /* Roll back.  */
 		  this_tree_size = old_tree_size;
@@ -1371,7 +1448,8 @@ vect_build_slp_tree_2 (vec_info *vinfo,
 	     do extra work to cancel the pattern so the uses see the
 	     scalar version.  */
 	  && !is_pattern_stmt_p (stmt_info)
-	  && !oprnd_info->any_pattern)
+	  && !oprnd_info->any_pattern
+	  && vect_update_all_shared_vectypes (oprnd_info->def_stmts))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_NOTE, vect_location,
@@ -1468,7 +1546,9 @@ vect_build_slp_tree_2 (vec_info *vinfo,
 		  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (child), j, grandchild)
 		    if (SLP_TREE_DEF_TYPE (grandchild) != vect_external_def)
 		      break;
-		  if (!grandchild)
+		  if (!grandchild
+		      && (vect_update_all_shared_vectypes
+			  (oprnd_info->def_stmts)))
 		    {
 		      /* Roll back.  */
 		      this_tree_size = old_tree_size;
@@ -2003,8 +2083,8 @@ vect_analyze_slp_instance (vec_info *vin
   if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
     {
       scalar_type = TREE_TYPE (DR_REF (dr));
-      vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
       group_size = DR_GROUP_SIZE (stmt_info);
+      vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
     }
   else if (!dr && REDUC_GROUP_FIRST_ELEMENT (stmt_info))
     {
@@ -2586,22 +2666,13 @@ vect_slp_analyze_node_operations_1 (vec_
      Memory accesses already got their vector type assigned
      in vect_analyze_data_refs.  */
   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
-  if (bb_vinfo
-      && ! STMT_VINFO_DATA_REF (stmt_info))
+  if (bb_vinfo && STMT_VINFO_VECTYPE (stmt_info) == boolean_type_node)
     {
-      tree vectype, nunits_vectype;
-      if (!vect_get_vector_types_for_stmt (stmt_info, &vectype,
-					   &nunits_vectype))
-	/* We checked this when building the node.  */
-	gcc_unreachable ();
-      if (vectype == boolean_type_node)
-	{
-	  vectype = vect_get_mask_type_for_stmt (stmt_info);
-	  if (!vectype)
-	    /* vect_get_mask_type_for_stmt has already explained the
-	       failure.  */
-	    return false;
-	}
+      tree vectype = vect_get_mask_type_for_stmt (stmt_info, node);
+      if (!vectype)
+	/* vect_get_mask_type_for_stmt has already explained the
+	   failure.  */
+	return false;
 
       stmt_vec_info sstmt_info;
       unsigned int i;
@@ -3475,7 +3546,7 @@ vect_get_constant_vectors (slp_tree op_n
       && vect_mask_constant_operand_p (stmt_vinfo))
     vector_type = truth_type_for (stmt_vectype);
   else
-    vector_type = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op));
+    vector_type = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op), op_node);
 
   unsigned int number_of_vectors
     = vect_get_num_vectors (SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node)
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2019-10-29 17:01:42.951676460 +0000
+++ gcc/tree-vect-stmts.c	2019-10-29 17:02:09.883487330 +0000
@@ -783,7 +783,7 @@ vect_prologue_cost_for_slp_op (slp_tree
   /* Without looking at the actual initializer a vector of
      constants can be implemented as load from the constant pool.
      When all elements are the same we can use a splat.  */
-  tree vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op));
+  tree vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op), node);
   unsigned group_size = SLP_TREE_SCALAR_STMTS (node).length ();
   unsigned num_vects_to_check;
   unsigned HOST_WIDE_INT const_nunits;
@@ -3290,7 +3290,7 @@ vectorizable_call (stmt_vec_info stmt_in
   /* If all arguments are external or constant defs, infer the vector type
      from the scalar type.  */
   if (!vectype_in)
-    vectype_in = get_vectype_for_scalar_type (vinfo, rhs_type);
+    vectype_in = get_vectype_for_scalar_type (vinfo, rhs_type, slp_node);
   if (vec_stmt)
     gcc_assert (vectype_in);
   if (!vectype_in)
@@ -4066,7 +4066,8 @@ vectorizable_simd_clone_call (stmt_vec_i
 	&& bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR)
       {
 	tree arg_type = TREE_TYPE (gimple_call_arg (stmt, i));
-	arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type);
+	arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type,
+							  slp_node);
 	if (arginfo[i].vectype == NULL
 	    || (simd_clone_subparts (arginfo[i].vectype)
 		> bestn->simdclone->simdlen))
@@ -4782,7 +4783,7 @@ vectorizable_conversion (stmt_vec_info s
   /* If op0 is an external or constant def, infer the vector type
      from the scalar type.  */
   if (!vectype_in)
-    vectype_in = get_vectype_for_scalar_type (vinfo, rhs_type);
+    vectype_in = get_vectype_for_scalar_type (vinfo, rhs_type, slp_node);
   if (vec_stmt)
     gcc_assert (vectype_in);
   if (!vectype_in)
@@ -5548,7 +5549,7 @@ vectorizable_shift (stmt_vec_info stmt_i
   /* If op0 is an external or constant def, infer the vector type
      from the scalar type.  */
   if (!vectype)
-    vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op0));
+    vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op0), slp_node);
   if (vec_stmt)
     gcc_assert (vectype);
   if (!vectype)
@@ -5647,7 +5648,8 @@ vectorizable_shift (stmt_vec_info stmt_i
                          "vector/vector shift/rotate found.\n");
 
       if (!op1_vectype)
-	op1_vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op1));
+	op1_vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op1),
+						   slp_node);
       incompatible_op1_vectype_p
 	= (op1_vectype == NULL_TREE
 	   || maybe_ne (TYPE_VECTOR_SUBPARTS (op1_vectype),
@@ -5999,7 +6001,8 @@ vectorizable_operation (stmt_vec_info st
 	  vectype = vectype_out;
 	}
       else
-	vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op0));
+	vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op0),
+					       slp_node);
     }
   if (vec_stmt)
     gcc_assert (vectype);
@@ -9741,7 +9744,7 @@ vectorizable_load (stmt_vec_info stmt_in
    condition operands are supportable using vec_is_simple_use.  */
 
 static bool
-vect_is_simple_cond (tree cond, vec_info *vinfo,
+vect_is_simple_cond (tree cond, vec_info *vinfo, slp_tree slp_node,
 		     tree *comp_vectype, enum vect_def_type *dts,
 		     tree vectype)
 {
@@ -9805,7 +9808,8 @@ vect_is_simple_cond (tree cond, vec_info
 	scalar_type = build_nonstandard_integer_type
 	  (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (vectype))),
 	   TYPE_UNSIGNED (scalar_type));
-      *comp_vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
+      *comp_vectype = get_vectype_for_scalar_type (vinfo, scalar_type,
+						   slp_node);
     }
 
   return true;
@@ -9912,7 +9916,7 @@ vectorizable_condition (stmt_vec_info st
   then_clause = gimple_assign_rhs2 (stmt);
   else_clause = gimple_assign_rhs3 (stmt);
 
-  if (!vect_is_simple_cond (cond_expr, stmt_info->vinfo,
+  if (!vect_is_simple_cond (cond_expr, stmt_info->vinfo, slp_node,
 			    &comp_vectype, &dts[0], slp_node ? NULL : vectype)
       || !comp_vectype)
     return false;
@@ -10391,7 +10395,8 @@ vectorizable_comparison (stmt_vec_info s
   /* Invariant comparison.  */
   if (!vectype)
     {
-      vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (rhs1));
+      vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (rhs1),
+					     slp_node);
       if (maybe_ne (TYPE_VECTOR_SUBPARTS (vectype), nunits))
 	return false;
     }
@@ -11199,27 +11204,87 @@ get_related_vectype_for_scalar_type (mac
 /* Function get_vectype_for_scalar_type.
 
    Returns the vector type corresponding to SCALAR_TYPE as supported
-   by the target.  */
+   by the target.  If GROUP_SIZE is nonzero and we're performing BB
+   vectorization, make sure that the number of elements in the vector
+   is no bigger than GROUP_SIZE.  */
 
 tree
-get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type)
+get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type,
+			     unsigned int group_size)
 {
+  /* For BB vectorization, we should always have a group size once we've
+     constructed the SLP tree; the only valid uses of zero GROUP_SIZEs
+     are tentative requests during things like early data reference
+     analysis and pattern recognition.  */
+  if (is_a <bb_vec_info> (vinfo))
+    gcc_assert (vinfo->slp_instances.is_empty () || group_size != 0);
+  else
+    group_size = 0;
+
   tree vectype = get_related_vectype_for_scalar_type (vinfo->vector_mode,
 						      scalar_type);
   if (vectype && vinfo->vector_mode == VOIDmode)
     vinfo->vector_mode = TYPE_MODE (vectype);
+
+  /* If the natural choice of vector type doesn't satisfy GROUP_SIZE,
+     try again with an explicit number of elements.  */
+  if (vectype
+      && group_size
+      && maybe_ge (TYPE_VECTOR_SUBPARTS (vectype), group_size))
+    {
+      /* Start with the biggest number of units that fits within
+	 GROUP_SIZE and halve it until we find a valid vector type.
+	 Usually either the first attempt will succeed or all will
+	 fail (in the latter case because GROUP_SIZE is too small
+	 for the target), but it's possible that a target could have
+	 a hole between supported vector types.
+
+	 If GROUP_SIZE is not a power of 2, this has the effect of
+	 trying the largest power of 2 that fits within the group,
+	 even though the group is not a multiple of that vector size.
+	 The BB vectorizer will then try to carve up the group into
+	 smaller pieces.  */
+      unsigned int nunits = 1 << floor_log2 (group_size);
+      do
+	{
+	  vectype = get_related_vectype_for_scalar_type (vinfo->vector_mode,
+							 scalar_type, nunits);
+	  nunits /= 2;
+	}
+      while (nunits > 1 && !vectype);
+    }
   return vectype;
 }
 
+/* Return the vector type corresponding to SCALAR_TYPE as supported
+   by the target.  NODE, if nonnull, is the SLP tree node that will
+   use the returned vector type.  */
+
+tree
+get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type, slp_tree node)
+{
+  unsigned int group_size = 0;
+  if (node)
+    {
+      group_size = SLP_TREE_SCALAR_OPS (node).length ();
+      if (group_size == 0)
+	group_size = SLP_TREE_SCALAR_STMTS (node).length ();
+    }
+  return get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
+}
+
 /* Function get_mask_type_for_scalar_type.
 
    Returns the mask type corresponding to a result of comparison
-   of vectors of specified SCALAR_TYPE as supported by target.  */
+   of vectors of specified SCALAR_TYPE as supported by target.
+   NODE, if nonnull, is the SLP tree node that will use the returned
+   vector type.  */
 
 tree
-get_mask_type_for_scalar_type (vec_info *vinfo, tree scalar_type)
+get_mask_type_for_scalar_type (vec_info *vinfo, tree scalar_type,
+			       slp_tree node)
 {
-  tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
+  tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type, node);
 
   if (!vectype)
     return NULL;
@@ -11892,6 +11957,9 @@ vect_gen_while_not (gimple_seq *seq, tre
 
 /* Try to compute the vector types required to vectorize STMT_INFO,
    returning true on success and false if vectorization isn't possible.
+   If GROUP_SIZE is nonzero and we're performing BB vectorization,
   make sure that the number of elements in the vectors is no bigger
+   than GROUP_SIZE.
 
    On success:
 
@@ -11909,11 +11977,21 @@ vect_gen_while_not (gimple_seq *seq, tre
 opt_result
 vect_get_vector_types_for_stmt (stmt_vec_info stmt_info,
 				tree *stmt_vectype_out,
-				tree *nunits_vectype_out)
+				tree *nunits_vectype_out,
+				unsigned int group_size)
 {
   vec_info *vinfo = stmt_info->vinfo;
   gimple *stmt = stmt_info->stmt;
 
+  /* For BB vectorization, we should always have a group size once we've
+     constructed the SLP tree; the only valid uses of zero GROUP_SIZEs
+     are tentative requests during things like early data reference
+     analysis and pattern recognition.  */
+  if (is_a <bb_vec_info> (vinfo))
+    gcc_assert (vinfo->slp_instances.is_empty () || group_size != 0);
+  else
+    group_size = 0;
+
   *stmt_vectype_out = NULL_TREE;
   *nunits_vectype_out = NULL_TREE;
 
@@ -11944,7 +12022,7 @@ vect_get_vector_types_for_stmt (stmt_vec
 
   tree vectype;
   tree scalar_type = NULL_TREE;
-  if (STMT_VINFO_VECTYPE (stmt_info))
+  if (group_size == 0 && STMT_VINFO_VECTYPE (stmt_info))
     {
       *stmt_vectype_out = vectype = STMT_VINFO_VECTYPE (stmt_info);
       if (dump_enabled_p ())
@@ -11953,15 +12031,17 @@ vect_get_vector_types_for_stmt (stmt_vec
     }
   else
     {
-      gcc_assert (!STMT_VINFO_DATA_REF (stmt_info));
-      if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
+      if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
+	scalar_type = TREE_TYPE (DR_REF (dr));
+      else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
 	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
       else
 	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
 
       /* Pure bool ops don't participate in number-of-units computation.
 	 For comparisons use the types being compared.  */
-      if (VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type)
+      if (!STMT_VINFO_DATA_REF (stmt_info)
+	  && VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type)
 	  && is_gimple_assign (stmt)
 	  && gimple_assign_rhs_code (stmt) != COND_EXPR)
 	{
@@ -11981,9 +12061,16 @@ vect_get_vector_types_for_stmt (stmt_vec
 	}
 
       if (dump_enabled_p ())
-	dump_printf_loc (MSG_NOTE, vect_location,
-			 "get vectype for scalar type: %T\n", scalar_type);
-      vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
+	{
+	  if (group_size)
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "get vectype for scalar type (group size %d):"
+			     " %T\n", group_size, scalar_type);
+	  else
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "get vectype for scalar type: %T\n", scalar_type);
+	}
+      vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
       if (!vectype)
 	return opt_result::failure_at (stmt,
 				       "not vectorized:"
@@ -12014,7 +12101,8 @@ vect_get_vector_types_for_stmt (stmt_vec
 	    dump_printf_loc (MSG_NOTE, vect_location,
 			     "get vectype for smallest scalar type: %T\n",
 			     scalar_type);
-	  nunits_vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
+	  nunits_vectype = get_vectype_for_scalar_type (vinfo, scalar_type,
+							group_size);
 	  if (!nunits_vectype)
 	    return opt_result::failure_at
 	      (stmt, "not vectorized: unsupported data-type %T\n",
@@ -12042,10 +12130,11 @@ vect_get_vector_types_for_stmt (stmt_vec
 
 /* Try to determine the correct vector type for STMT_INFO, which is a
    statement that produces a scalar boolean result.  Return the vector
-   type on success, otherwise return NULL_TREE.  */
+   type on success, otherwise return NULL_TREE.  NODE, if nonnull,
+   is the SLP tree node that will use the returned vector type.  */
 
 opt_tree
-vect_get_mask_type_for_stmt (stmt_vec_info stmt_info)
+vect_get_mask_type_for_stmt (stmt_vec_info stmt_info, slp_tree node)
 {
   vec_info *vinfo = stmt_info->vinfo;
   gimple *stmt = stmt_info->stmt;
@@ -12057,7 +12146,7 @@ vect_get_mask_type_for_stmt (stmt_vec_in
       && !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (stmt))))
     {
       scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
-      mask_type = get_mask_type_for_scalar_type (vinfo, scalar_type);
+      mask_type = get_mask_type_for_scalar_type (vinfo, scalar_type, node);
 
       if (!mask_type)
 	return opt_tree::failure_at (stmt,
Index: gcc/testsuite/gcc.dg/vect/bb-slp-4.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/bb-slp-4.c	2019-03-08 18:15:02.268871230 +0000
+++ gcc/testsuite/gcc.dg/vect/bb-slp-4.c	2019-10-29 17:02:09.875487386 +0000
@@ -38,5 +38,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "basic block vectorized" 0 "slp2" } } */
-  
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp2" } } */
Index: gcc/testsuite/gcc.dg/vect/bb-slp-bool-1.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.dg/vect/bb-slp-bool-1.c	2019-10-29 17:02:09.875487386 +0000
@@ -0,0 +1,44 @@
+#include "tree-vect.h"
+
+void __attribute__ ((noipa))
+f1 (_Bool *x, unsigned short *y)
+{
+  x[0] = (y[0] == 1);
+  x[1] = (y[1] == 1);
+}
+
+void __attribute__ ((noipa))
+f2 (_Bool *x, unsigned short *y)
+{
+  x[0] = (y[0] == 1);
+  x[1] = (y[1] == 1);
+  x[2] = (y[2] == 1);
+  x[3] = (y[3] == 1);
+  x[4] = (y[4] == 1);
+  x[5] = (y[5] == 1);
+  x[6] = (y[6] == 1);
+  x[7] = (y[7] == 1);
+}
+
+_Bool x[8];
+unsigned short y[8] = { 11, 1, 9, 5, 1, 44, 1, 1 };
+
+int
+main (void)
+{
+  check_vect ();
+
+  f1 (x, y);
+
+  if (x[0] || !x[1])
+    __builtin_abort ();
+
+  x[1] = 0;
+
+  f2 (x, y);
+
+  if (x[0] || !x[1] || x[2] || x[3] || !x[4] || x[5] || !x[6] || !x[7])
+    __builtin_abort ();
+
+  return 0;
+}
Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_14.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_14.c	2019-10-29 17:02:09.875487386 +0000
@@ -0,0 +1,26 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+** foo:
+** (
+**	ldr	d([0-9]+), \[x1\]
+**	ldr	q([0-9]+), \[x0\]
+**	saddw	v([0-9]+)\.4s, v\2\.4s, v\1\.4h
+**	str	q\3, \[x0\]
+** |
+**	ldr	q([0-9]+), \[x0\]
+**	ldr	d([0-9]+), \[x1\]
+**	saddw	v([0-9]+)\.4s, v\4\.4s, v\5\.4h
+**	str	q\6, \[x0\]
+** )
+**	ret
+*/
+void
+foo (int *x, short *y)
+{
+  x[0] += y[0];
+  x[1] += y[1];
+  x[2] += y[2];
+  x[3] += y[3];
+}
Index: gcc/testsuite/gcc.target/i386/pr84101.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pr84101.c	2019-04-04 08:34:50.849942379 +0100
+++ gcc/testsuite/gcc.target/i386/pr84101.c	2019-10-29 17:02:09.875487386 +0000
@@ -18,4 +18,5 @@ uint64_pair_t pair(int num)
   return p ;
 }
 
-/* { dg-final { scan-tree-dump-not "basic block vectorized" "slp2" } } */
+/* See PR92266 for the XFAIL.  */
+/* { dg-final { scan-tree-dump-not "basic block vectorized" "slp2" { xfail ilp32 } } } */

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [6/n] Use build_vector_type_for_mode in get_vectype_for_scalar_type_and_size
  2019-10-25 12:34 ` [6/n] Use build_vector_type_for_mode in get_vectype_for_scalar_type_and_size Richard Sandiford
@ 2019-10-30 14:32   ` Richard Biener
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Biener @ 2019-10-30 14:32 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: GCC Patches

On Fri, Oct 25, 2019 at 2:32 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Except for one case, get_vectype_for_scalar_type_and_size calculates
> what the vector mode should be and then calls build_vector_type,
> which recomputes the mode from scratch.  This patch makes it use
> build_vector_type_for_mode instead.
>
> The exception mentioned above is when preferred_simd_mode returns
> an integer mode, which it does if no appropriate vector mode exists.
> The integer mode in question is usually word_mode, although epiphany
> can return a doubleword mode in some cases.
>
> There's no guarantee that this integer mode is appropriate, since for
> example the scalar type could be a float.  The traditional behaviour is
> therefore to use the integer mode to determine a size only, and leave
> mode_for_vector to pick the TYPE_MODE.  (Note that it can actually end
> up picking a vector mode if the target defines a disabled vector mode.
> We therefore still need to check TYPE_MODE after building the type.)

OK.

Thanks,
Richard.

>
> 2019-10-24  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): If
>         targetm.vectorize.preferred_simd_mode returns an integer mode,
>         use mode_for_vector to decide what the vector type's mode
>         should actually be.  Use build_vector_type_for_mode instead
>         of build_vector_type.
>
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2019-10-25 13:26:59.309877555 +0100
> +++ gcc/tree-vect-stmts.c       2019-10-25 13:27:08.653811531 +0100
> @@ -11162,16 +11162,31 @@ get_vectype_for_scalar_type_and_size (tr
>    /* If no size was supplied use the mode the target prefers.   Otherwise
>       lookup a vector mode of the specified size.  */
>    if (known_eq (size, 0U))
> -    simd_mode = targetm.vectorize.preferred_simd_mode (inner_mode);
> +    {
> +      simd_mode = targetm.vectorize.preferred_simd_mode (inner_mode);
> +      if (SCALAR_INT_MODE_P (simd_mode))
> +       {
> +         /* Traditional behavior is not to take the integer mode
> +            literally, but simply to use it as a way of determining
> +            the vector size.  It is up to mode_for_vector to decide
> +            what the TYPE_MODE should be.
> +
> +            Note that nunits == 1 is allowed in order to support single
> +            element vector types.  */
> +         if (!multiple_p (GET_MODE_SIZE (simd_mode), nbytes, &nunits)
> +             || !mode_for_vector (inner_mode, nunits).exists (&simd_mode))
> +           return NULL_TREE;
> +       }
> +    }
>    else if (!multiple_p (size, nbytes, &nunits)
>            || !mode_for_vector (inner_mode, nunits).exists (&simd_mode))
>      return NULL_TREE;
> -  /* NOTE: nunits == 1 is allowed to support single element vector types.  */
> -  if (!multiple_p (GET_MODE_SIZE (simd_mode), nbytes, &nunits))
> -    return NULL_TREE;
>
> -  vectype = build_vector_type (scalar_type, nunits);
> +  vectype = build_vector_type_for_mode (scalar_type, simd_mode);
>
> +  /* In cases where the mode was chosen by mode_for_vector, check that
> +     the target actually supports the chosen mode, or that it at least
> +     allows the vector mode to be replaced by a like-sized integer.  */
>    if (!VECTOR_MODE_P (TYPE_MODE (vectype))
>        && !INTEGRAL_MODE_P (TYPE_MODE (vectype)))
>      return NULL_TREE;

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [7/n] Use consistent compatibility checks in vectorizable_shift
  2019-10-25 12:34 ` [7/n] Use consistent compatibility checks in vectorizable_shift Richard Sandiford
@ 2019-10-30 14:33   ` Richard Biener
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Biener @ 2019-10-30 14:33 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: GCC Patches

On Fri, Oct 25, 2019 at 2:34 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> The validation phase of vectorizable_shift used TYPE_MODE to check
> whether the shift amount vector was compatible with the shifted vector:
>
>       if ((op1_vectype == NULL_TREE
>            || TYPE_MODE (op1_vectype) != TYPE_MODE (vectype))
>           && (!slp_node
>               || SLP_TREE_DEF_TYPE
>                    (SLP_TREE_CHILDREN (slp_node)[1]) != vect_constant_def))
>
> But the generation phase was stricter and required the element types to
> be equivalent:
>
>                    && !useless_type_conversion_p (TREE_TYPE (vectype),
>                                                   TREE_TYPE (op1)))
>
> This difference led to an ICE with a later patch.
>
> The first condition seems a bit too lax given that the function
> supports vect_worthwhile_without_simd_p, where two different vector
> types could have the same integer mode.  But it seems too strict
> to reject signed shifts by unsigned amounts or unsigned shifts by
> signed amounts; verify_gimple_assign_binary is happy with those.
>
> This patch therefore goes for a middle ground of checking both TYPE_MODE
> and TYPE_VECTOR_SUBPARTS, using the same condition in both places.

OK.  The whole vectorizable_shift needs a rewrite ;)  (no good reason to
not support widening/narrowing of a shift argument)

Richard.

>
> 2019-10-24  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * tree-vect-stmts.c (vectorizable_shift): Check the number
>         of vector elements as well as the type mode when deciding
>         whether an op1_vectype is compatible.  Reuse the result of
>         this check when generating vector statements.
>
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2019-10-25 13:27:08.653811531 +0100
> +++ gcc/tree-vect-stmts.c       2019-10-25 13:27:12.121787027 +0100
> @@ -5522,6 +5522,7 @@ vectorizable_shift (stmt_vec_info stmt_i
>    bool scalar_shift_arg = true;
>    bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
>    vec_info *vinfo = stmt_info->vinfo;
> +  bool incompatible_op1_vectype_p = false;
>
>    if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
>      return false;
> @@ -5666,8 +5667,12 @@ vectorizable_shift (stmt_vec_info stmt_i
>
>        if (!op1_vectype)
>         op1_vectype = get_same_sized_vectype (TREE_TYPE (op1), vectype_out);
> -      if ((op1_vectype == NULL_TREE
> -          || TYPE_MODE (op1_vectype) != TYPE_MODE (vectype))
> +      incompatible_op1_vectype_p
> +       = (op1_vectype == NULL_TREE
> +          || maybe_ne (TYPE_VECTOR_SUBPARTS (op1_vectype),
> +                       TYPE_VECTOR_SUBPARTS (vectype))
> +          || TYPE_MODE (op1_vectype) != TYPE_MODE (vectype));
> +      if (incompatible_op1_vectype_p
>           && (!slp_node
>               || SLP_TREE_DEF_TYPE
>                    (SLP_TREE_CHILDREN (slp_node)[1]) != vect_constant_def))
> @@ -5813,9 +5818,7 @@ vectorizable_shift (stmt_vec_info stmt_i
>                      }
>                  }
>              }
> -         else if (slp_node
> -                  && !useless_type_conversion_p (TREE_TYPE (vectype),
> -                                                 TREE_TYPE (op1)))
> +         else if (slp_node && incompatible_op1_vectype_p)
>             {
>               if (was_scalar_shift_arg)
>                 {

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [8/n] Replace autovectorize_vector_sizes with autovectorize_vector_modes
  2019-10-25 12:39 ` [8/n] Replace autovectorize_vector_sizes with autovectorize_vector_modes Richard Sandiford
@ 2019-10-30 14:48   ` Richard Biener
  2019-10-30 16:33     ` Richard Sandiford
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Biener @ 2019-10-30 14:48 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: GCC Patches

On Fri, Oct 25, 2019 at 2:37 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> This is another patch in the series to remove the assumption that
> all modes involved in vectorisation have to be the same size.
> Rather than have the target provide a list of vector sizes,
> it makes the target provide a list of vector "approaches",
> with each approach represented by a mode.
>
> A later patch will pass this mode to targetm.vectorize.related_mode
> to get the vector mode for a given element mode.  Until then, the modes
> simply act as an alternative way of specifying the vector size.

Is there a restriction to use integer vector modes for the hook
or would FP vector modes be OK as well?  Note that your
x86 change likely disables word_mode vectorization with -mno-sse?

That is, how do we represent GPR vectorization "size" here?
The preferred SIMD mode hook may return an integer mode,
are non-vector modes OK for autovectorize_vector_modes?

Thanks,
Richard.
>
> 2019-10-24  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * target.h (vector_sizes, auto_vector_sizes): Delete.
>         (vector_modes, auto_vector_modes): New typedefs.
>         * target.def (autovectorize_vector_sizes): Replace with...
>         (autovectorize_vector_modes): ...this new hook.
>         * doc/tm.texi.in (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES):
>         Replace with...
>         (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): ...this new hook.
>         * doc/tm.texi: Regenerate.
>         * targhooks.h (default_autovectorize_vector_sizes): Delete.
>         (default_autovectorize_vector_modes): New function.
>         * targhooks.c (default_autovectorize_vector_sizes): Delete.
>         (default_autovectorize_vector_modes): New function.
>         * omp-general.c (omp_max_vf): Use autovectorize_vector_modes instead
>         of autovectorize_vector_sizes.  Use the number of units in the mode
>         to calculate the maximum VF.
>         * omp-low.c (omp_clause_aligned_alignment): Use
>         autovectorize_vector_modes instead of autovectorize_vector_sizes.
>         Use a loop based on related_mode to iterate through all supported
>         vector modes for a given scalar mode.
>         * optabs-query.c (can_vec_mask_load_store_p): Use
>         autovectorize_vector_modes instead of autovectorize_vector_sizes.
>         * tree-vect-loop.c (vect_analyze_loop, vect_transform_loop): Likewise.
>         * tree-vect-slp.c (vect_slp_bb_region): Likewise.
>         * config/aarch64/aarch64.c (aarch64_autovectorize_vector_sizes):
>         Replace with...
>         (aarch64_autovectorize_vector_modes): ...this new function.
>         (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
>         (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
>         * config/arc/arc.c (arc_autovectorize_vector_sizes): Replace with...
>         (arc_autovectorize_vector_modes): ...this new function.
>         (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
>         (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
>         * config/arm/arm.c (arm_autovectorize_vector_sizes): Replace with...
>         (arm_autovectorize_vector_modes): ...this new function.
>         (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
>         (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
>         * config/i386/i386.c (ix86_autovectorize_vector_sizes): Replace with...
>         (ix86_autovectorize_vector_modes): ...this new function.
>         (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
>         (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
>         * config/mips/mips.c (mips_autovectorize_vector_sizes): Replace with...
>         (mips_autovectorize_vector_modes): ...this new function.
>         (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
>         (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
>
> Index: gcc/target.h
> ===================================================================
> --- gcc/target.h        2019-09-30 17:19:39.843166118 +0100
> +++ gcc/target.h        2019-10-25 13:27:15.525762975 +0100
> @@ -205,11 +205,11 @@ enum vect_cost_model_location {
>  class vec_perm_indices;
>
>  /* The type to use for lists of vector sizes.  */
> -typedef vec<poly_uint64> vector_sizes;
> +typedef vec<machine_mode> vector_modes;
>
>  /* Same, but can be used to construct local lists that are
>     automatically freed.  */
> -typedef auto_vec<poly_uint64, 8> auto_vector_sizes;
> +typedef auto_vec<machine_mode, 8> auto_vector_modes;
>
>  /* The target structure.  This holds all the backend hooks.  */
>  #define DEFHOOKPOD(NAME, DOC, TYPE, INIT) TYPE NAME;
> Index: gcc/target.def
> ===================================================================
> --- gcc/target.def      2019-10-25 13:26:59.309877555 +0100
> +++ gcc/target.def      2019-10-25 13:27:15.525762975 +0100
> @@ -1894,20 +1894,28 @@ reached.  The default is @var{mode} whic
>  /* Returns a mask of vector sizes to iterate over when auto-vectorizing
>     after processing the preferred one derived from preferred_simd_mode.  */
>  DEFHOOK
> -(autovectorize_vector_sizes,
> - "If the mode returned by @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE} is not\n\
> -the only one that is worth considering, this hook should add all suitable\n\
> -vector sizes to @var{sizes}, in order of decreasing preference.  The first\n\
> -one should be the size of @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE}.\n\
> -If @var{all} is true, add suitable vector sizes even when they are generally\n\
> +(autovectorize_vector_modes,
> + "If using the mode returned by @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE}\n\
> +is not the only approach worth considering, this hook should add one mode to\n\
> +@var{modes} for each useful alternative approach.  These modes are then\n\
> +passed to @code{TARGET_VECTORIZE_RELATED_MODE} to obtain the vector mode\n\
> +for a given element mode.\n\
> +\n\
> +The modes returned in @var{modes} should use the smallest element mode\n\
> +possible for the vectorization approach that they represent, preferring\n\
> +integer modes over floating-point modes in the event of a tie.  The first\n\
> +mode should be the @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE} for its\n\
> +element mode.\n\
> +\n\
> +If @var{all} is true, add suitable vector modes even when they are generally\n\
>  not expected to be worthwhile.\n\
>  \n\
>  The hook does not need to do anything if the vector returned by\n\
>  @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE} is the only one relevant\n\
>  for autovectorization.  The default implementation does nothing.",
>   void,
> - (vector_sizes *sizes, bool all),
> - default_autovectorize_vector_sizes)
> + (vector_modes *modes, bool all),
> + default_autovectorize_vector_modes)
>
>  DEFHOOK
>  (related_mode,
> Index: gcc/doc/tm.texi.in
> ===================================================================
> --- gcc/doc/tm.texi.in  2019-10-25 13:26:59.009879675 +0100
> +++ gcc/doc/tm.texi.in  2019-10-25 13:27:15.521763003 +0100
> @@ -4179,7 +4179,7 @@ address;  but often a machine-dependent
>
>  @hook TARGET_VECTORIZE_SPLIT_REDUCTION
>
> -@hook TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
> +@hook TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES
>
>  @hook TARGET_VECTORIZE_RELATED_MODE
>
> Index: gcc/doc/tm.texi
> ===================================================================
> --- gcc/doc/tm.texi     2019-10-25 13:26:59.305877583 +0100
> +++ gcc/doc/tm.texi     2019-10-25 13:27:15.521763003 +0100
> @@ -6016,12 +6016,20 @@ against lower halves of vectors recursiv
>  reached.  The default is @var{mode} which means no splitting.
>  @end deftypefn
>
> -@deftypefn {Target Hook} void TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES (vector_sizes *@var{sizes}, bool @var{all})
> -If the mode returned by @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE} is not
> -the only one that is worth considering, this hook should add all suitable
> -vector sizes to @var{sizes}, in order of decreasing preference.  The first
> -one should be the size of @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE}.
> -If @var{all} is true, add suitable vector sizes even when they are generally
> +@deftypefn {Target Hook} void TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES (vector_modes *@var{modes}, bool @var{all})
> +If using the mode returned by @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE}
> +is not the only approach worth considering, this hook should add one mode to
> +@var{modes} for each useful alternative approach.  These modes are then
> +passed to @code{TARGET_VECTORIZE_RELATED_MODE} to obtain the vector mode
> +for a given element mode.
> +
> +The modes returned in @var{modes} should use the smallest element mode
> +possible for the vectorization approach that they represent, preferring
> +integer modes over floating-point modes in the event of a tie.  The first
> +mode should be the @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE} for its
> +element mode.
> +
> +If @var{all} is true, add suitable vector modes even when they are generally
>  not expected to be worthwhile.
>
>  The hook does not need to do anything if the vector returned by
> Index: gcc/targhooks.h
> ===================================================================
> --- gcc/targhooks.h     2019-10-25 13:26:59.309877555 +0100
> +++ gcc/targhooks.h     2019-10-25 13:27:15.525762975 +0100
> @@ -113,7 +113,7 @@ default_builtin_support_vector_misalignm
>                                              int, bool);
>  extern machine_mode default_preferred_simd_mode (scalar_mode mode);
>  extern machine_mode default_split_reduction (machine_mode);
> -extern void default_autovectorize_vector_sizes (vector_sizes *, bool);
> +extern void default_autovectorize_vector_modes (vector_modes *, bool);
>  extern opt_machine_mode default_vectorize_related_mode (machine_mode,
>                                                         scalar_mode,
>                                                         poly_uint64);
> Index: gcc/targhooks.c
> ===================================================================
> --- gcc/targhooks.c     2019-10-25 13:26:59.309877555 +0100
> +++ gcc/targhooks.c     2019-10-25 13:27:15.525762975 +0100
> @@ -1299,11 +1299,10 @@ default_split_reduction (machine_mode mo
>    return mode;
>  }
>
> -/* By default only the size derived from the preferred vector mode
> -   is tried.  */
> +/* By default only the preferred vector mode is tried.  */
>
>  void
> -default_autovectorize_vector_sizes (vector_sizes *, bool)
> +default_autovectorize_vector_modes (vector_modes *, bool)
>  {
>  }
>
> Index: gcc/omp-general.c
> ===================================================================
> --- gcc/omp-general.c   2019-10-25 09:21:28.798326303 +0100
> +++ gcc/omp-general.c   2019-10-25 13:27:15.521763003 +0100
> @@ -508,13 +508,16 @@ omp_max_vf (void)
>           && global_options_set.x_flag_tree_loop_vectorize))
>      return 1;
>
> -  auto_vector_sizes sizes;
> -  targetm.vectorize.autovectorize_vector_sizes (&sizes, true);
> -  if (!sizes.is_empty ())
> +  auto_vector_modes modes;
> +  targetm.vectorize.autovectorize_vector_modes (&modes, true);
> +  if (!modes.is_empty ())
>      {
>        poly_uint64 vf = 0;
> -      for (unsigned int i = 0; i < sizes.length (); ++i)
> -       vf = ordered_max (vf, sizes[i]);
> +      for (unsigned int i = 0; i < modes.length (); ++i)
> +       /* The returned modes use the smallest element size (and thus
> +          the largest nunits) for the vectorization approach that they
> +          represent.  */
> +       vf = ordered_max (vf, GET_MODE_NUNITS (modes[i]));
>        return vf;
>      }
>
> Index: gcc/omp-low.c
> ===================================================================
> --- gcc/omp-low.c       2019-10-11 15:43:51.283513446 +0100
> +++ gcc/omp-low.c       2019-10-25 13:27:15.525762975 +0100
> @@ -3947,11 +3947,8 @@ omp_clause_aligned_alignment (tree claus
>    /* Otherwise return implementation defined alignment.  */
>    unsigned int al = 1;
>    opt_scalar_mode mode_iter;
> -  auto_vector_sizes sizes;
> -  targetm.vectorize.autovectorize_vector_sizes (&sizes, true);
> -  poly_uint64 vs = 0;
> -  for (unsigned int i = 0; i < sizes.length (); ++i)
> -    vs = ordered_max (vs, sizes[i]);
> +  auto_vector_modes modes;
> +  targetm.vectorize.autovectorize_vector_modes (&modes, true);
>    static enum mode_class classes[]
>      = { MODE_INT, MODE_VECTOR_INT, MODE_FLOAT, MODE_VECTOR_FLOAT };
>    for (int i = 0; i < 4; i += 2)
> @@ -3962,19 +3959,18 @@ omp_clause_aligned_alignment (tree claus
>         machine_mode vmode = targetm.vectorize.preferred_simd_mode (mode);
>         if (GET_MODE_CLASS (vmode) != classes[i + 1])
>           continue;
> -       while (maybe_ne (vs, 0U)
> -              && known_lt (GET_MODE_SIZE (vmode), vs)
> -              && GET_MODE_2XWIDER_MODE (vmode).exists ())
> -         vmode = GET_MODE_2XWIDER_MODE (vmode).require ();
> +       machine_mode alt_vmode;
> +       for (unsigned int j = 0; j < modes.length (); ++j)
> +         if (related_vector_mode (modes[j], mode).exists (&alt_vmode)
> +             && known_ge (GET_MODE_SIZE (alt_vmode), GET_MODE_SIZE (vmode)))
> +           vmode = alt_vmode;
>
>         tree type = lang_hooks.types.type_for_mode (mode, 1);
>         if (type == NULL_TREE || TYPE_MODE (type) != mode)
>           continue;
> -       poly_uint64 nelts = exact_div (GET_MODE_SIZE (vmode),
> -                                      GET_MODE_SIZE (mode));
> -       type = build_vector_type (type, nelts);
> -       if (TYPE_MODE (type) != vmode)
> -         continue;
> +       type = build_vector_type_for_mode (type, vmode);
> +       /* The functions above are not allowed to return invalid modes.  */
> +       gcc_assert (TYPE_MODE (type) == vmode);
>         if (TYPE_ALIGN_UNIT (type) > al)
>           al = TYPE_ALIGN_UNIT (type);
>        }
> Index: gcc/optabs-query.c
> ===================================================================
> --- gcc/optabs-query.c  2019-10-25 13:26:59.305877583 +0100
> +++ gcc/optabs-query.c  2019-10-25 13:27:15.525762975 +0100
> @@ -589,11 +589,11 @@ can_vec_mask_load_store_p (machine_mode
>        && convert_optab_handler (op, vmode, mask_mode) != CODE_FOR_nothing)
>      return true;
>
> -  auto_vector_sizes vector_sizes;
> -  targetm.vectorize.autovectorize_vector_sizes (&vector_sizes, true);
> -  for (unsigned int i = 0; i < vector_sizes.length (); ++i)
> +  auto_vector_modes vector_modes;
> +  targetm.vectorize.autovectorize_vector_modes (&vector_modes, true);
> +  for (unsigned int i = 0; i < vector_modes.length (); ++i)
>      {
> -      poly_uint64 cur = vector_sizes[i];
> +      poly_uint64 cur = GET_MODE_SIZE (vector_modes[i]);
>        poly_uint64 nunits;
>        if (!multiple_p (cur, GET_MODE_SIZE (smode), &nunits))
>         continue;
> Index: gcc/tree-vect-loop.c
> ===================================================================
> --- gcc/tree-vect-loop.c        2019-10-25 13:26:59.137878771 +0100
> +++ gcc/tree-vect-loop.c        2019-10-25 13:27:15.525762975 +0100
> @@ -2319,12 +2319,12 @@ vect_analyze_loop_2 (loop_vec_info loop_
>  vect_analyze_loop (class loop *loop, loop_vec_info orig_loop_vinfo,
>                    vec_info_shared *shared)
>  {
> -  auto_vector_sizes vector_sizes;
> +  auto_vector_modes vector_modes;
>
>    /* Autodetect first vector size we try.  */
> -  targetm.vectorize.autovectorize_vector_sizes (&vector_sizes,
> +  targetm.vectorize.autovectorize_vector_modes (&vector_modes,
>                                                 loop->simdlen != 0);
> -  unsigned int next_size = 0;
> +  unsigned int mode_i = 0;
>
>    DUMP_VECT_SCOPE ("analyze_loop_nest");
>
> @@ -2343,7 +2343,7 @@ vect_analyze_loop (class loop *loop, loo
>    unsigned n_stmts = 0;
>    poly_uint64 autodetected_vector_size = 0;
>    opt_loop_vec_info first_loop_vinfo = opt_loop_vec_info::success (NULL);
> -  poly_uint64 next_vector_size = 0;
> +  machine_mode next_vector_mode = VOIDmode;
>    while (1)
>      {
>        /* Check the CFG characteristics of the loop (nesting, entry/exit).  */
> @@ -2357,7 +2357,7 @@ vect_analyze_loop (class loop *loop, loo
>           gcc_checking_assert (first_loop_vinfo == NULL);
>           return loop_vinfo;
>         }
> -      loop_vinfo->vector_size = next_vector_size;
> +      loop_vinfo->vector_size = GET_MODE_SIZE (next_vector_mode);
>
>        bool fatal = false;
>
> @@ -2365,7 +2365,7 @@ vect_analyze_loop (class loop *loop, loo
>         LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = orig_loop_vinfo;
>
>        opt_result res = vect_analyze_loop_2 (loop_vinfo, fatal, &n_stmts);
> -      if (next_size == 0)
> +      if (mode_i == 0)
>         autodetected_vector_size = loop_vinfo->vector_size;
>
>        if (res)
> @@ -2399,11 +2399,12 @@ vect_analyze_loop (class loop *loop, loo
>           return opt_loop_vec_info::propagate_failure (res);
>         }
>
> -      if (next_size < vector_sizes.length ()
> -         && known_eq (vector_sizes[next_size], autodetected_vector_size))
> -       next_size += 1;
> +      if (mode_i < vector_modes.length ()
> +         && known_eq (GET_MODE_SIZE (vector_modes[mode_i]),
> +                      autodetected_vector_size))
> +       mode_i += 1;
>
> -      if (next_size == vector_sizes.length ()
> +      if (mode_i == vector_modes.length ()
>           || known_eq (autodetected_vector_size, 0U))
>         {
>           if (first_loop_vinfo)
> @@ -2423,15 +2424,11 @@ vect_analyze_loop (class loop *loop, loo
>         }
>
>        /* Try the next biggest vector size.  */
> -      next_vector_size = vector_sizes[next_size++];
> +      next_vector_mode = vector_modes[mode_i++];
>        if (dump_enabled_p ())
> -       {
> -         dump_printf_loc (MSG_NOTE, vect_location,
> -                          "***** Re-trying analysis with "
> -                          "vector size ");
> -         dump_dec (MSG_NOTE, next_vector_size);
> -         dump_printf (MSG_NOTE, "\n");
> -       }
> +       dump_printf_loc (MSG_NOTE, vect_location,
> +                        "***** Re-trying analysis with vector mode %s\n",
> +                        GET_MODE_NAME (next_vector_mode));
>      }
>  }
>
> @@ -8277,9 +8274,9 @@ vect_transform_loop (loop_vec_info loop_
>
>    if (epilogue)
>      {
> -      auto_vector_sizes vector_sizes;
> -      targetm.vectorize.autovectorize_vector_sizes (&vector_sizes, false);
> -      unsigned int next_size = 0;
> +      auto_vector_modes vector_modes;
> +      targetm.vectorize.autovectorize_vector_modes (&vector_modes, false);
> +      unsigned int next_i = 0;
>
>        /* Note LOOP_VINFO_NITERS_KNOWN_P and LOOP_VINFO_INT_NITERS work
>           on niters already ajusted for the iterations of the prologue.  */
> @@ -8295,18 +8292,20 @@ vect_transform_loop (loop_vec_info loop_
>           epilogue->any_upper_bound = true;
>
>           unsigned int ratio;
> -         while (next_size < vector_sizes.length ()
> -                && !(constant_multiple_p (loop_vinfo->vector_size,
> -                                          vector_sizes[next_size], &ratio)
> +         while (next_i < vector_modes.length ()
> +                && !(constant_multiple_p
> +                     (loop_vinfo->vector_size,
> +                      GET_MODE_SIZE (vector_modes[next_i]), &ratio)
>                       && eiters >= lowest_vf / ratio))
> -           next_size += 1;
> +           next_i += 1;
>         }
>        else
> -       while (next_size < vector_sizes.length ()
> -              && maybe_lt (loop_vinfo->vector_size, vector_sizes[next_size]))
> -         next_size += 1;
> +       while (next_i < vector_modes.length ()
> +              && maybe_lt (loop_vinfo->vector_size,
> +                           GET_MODE_SIZE (vector_modes[next_i])))
> +         next_i += 1;
>
> -      if (next_size == vector_sizes.length ())
> +      if (next_i == vector_modes.length ())
>         epilogue = NULL;
>      }
>
> Index: gcc/tree-vect-slp.c
> ===================================================================
> --- gcc/tree-vect-slp.c 2019-10-25 13:26:59.141878743 +0100
> +++ gcc/tree-vect-slp.c 2019-10-25 13:27:15.525762975 +0100
> @@ -3087,12 +3087,12 @@ vect_slp_bb_region (gimple_stmt_iterator
>                     unsigned int n_stmts)
>  {
>    bb_vec_info bb_vinfo;
> -  auto_vector_sizes vector_sizes;
> +  auto_vector_modes vector_modes;
>
>    /* Autodetect first vector size we try.  */
> -  poly_uint64 next_vector_size = 0;
> -  targetm.vectorize.autovectorize_vector_sizes (&vector_sizes, false);
> -  unsigned int next_size = 0;
> +  machine_mode next_vector_mode = VOIDmode;
> +  targetm.vectorize.autovectorize_vector_modes (&vector_modes, false);
> +  unsigned int mode_i = 0;
>
>    vec_info_shared shared;
>
> @@ -3109,7 +3109,7 @@ vect_slp_bb_region (gimple_stmt_iterator
>         bb_vinfo->shared->save_datarefs ();
>        else
>         bb_vinfo->shared->check_datarefs ();
> -      bb_vinfo->vector_size = next_vector_size;
> +      bb_vinfo->vector_size = GET_MODE_SIZE (next_vector_mode);
>
>        if (vect_slp_analyze_bb_1 (bb_vinfo, n_stmts, fatal)
>           && dbg_cnt (vect_slp))
> @@ -3136,17 +3136,18 @@ vect_slp_bb_region (gimple_stmt_iterator
>           vectorized = true;
>         }
>
> -      if (next_size == 0)
> +      if (mode_i == 0)
>         autodetected_vector_size = bb_vinfo->vector_size;
>
>        delete bb_vinfo;
>
> -      if (next_size < vector_sizes.length ()
> -         && known_eq (vector_sizes[next_size], autodetected_vector_size))
> -       next_size += 1;
> +      if (mode_i < vector_modes.length ()
> +         && known_eq (GET_MODE_SIZE (vector_modes[mode_i]),
> +                      autodetected_vector_size))
> +       mode_i += 1;
>
>        if (vectorized
> -         || next_size == vector_sizes.length ()
> +         || mode_i == vector_modes.length ()
>           || known_eq (autodetected_vector_size, 0U)
>           /* If vect_slp_analyze_bb_1 signaled that analysis for all
>              vector sizes will fail do not bother iterating.  */
> @@ -3154,15 +3155,11 @@ vect_slp_bb_region (gimple_stmt_iterator
>         return vectorized;
>
>        /* Try the next biggest vector size.  */
> -      next_vector_size = vector_sizes[next_size++];
> +      next_vector_mode = vector_modes[mode_i++];
>        if (dump_enabled_p ())
> -       {
> -         dump_printf_loc (MSG_NOTE, vect_location,
> -                          "***** Re-trying analysis with "
> -                          "vector size ");
> -         dump_dec (MSG_NOTE, next_vector_size);
> -         dump_printf (MSG_NOTE, "\n");
> -       }
> +       dump_printf_loc (MSG_NOTE, vect_location,
> +                        "***** Re-trying analysis with vector mode %s\n",
> +                        GET_MODE_NAME (next_vector_mode));
>      }
>  }
>
> Index: gcc/config/aarch64/aarch64.c
> ===================================================================
> --- gcc/config/aarch64/aarch64.c        2019-10-25 13:26:59.177878488 +0100
> +++ gcc/config/aarch64/aarch64.c        2019-10-25 13:27:15.505763118 +0100
> @@ -15203,12 +15203,12 @@ aarch64_preferred_simd_mode (scalar_mode
>  /* Return a list of possible vector sizes for the vectorizer
>     to iterate over.  */
>  static void
> -aarch64_autovectorize_vector_sizes (vector_sizes *sizes, bool)
> +aarch64_autovectorize_vector_modes (vector_modes *modes, bool)
>  {
>    if (TARGET_SVE)
> -    sizes->safe_push (BYTES_PER_SVE_VECTOR);
> -  sizes->safe_push (16);
> -  sizes->safe_push (8);
> +    modes->safe_push (VNx16QImode);
> +  modes->safe_push (V16QImode);
> +  modes->safe_push (V8QImode);
>  }
>
>  /* Implement TARGET_MANGLE_TYPE.  */
> @@ -20915,9 +20915,9 @@ #define TARGET_VECTORIZE_BUILTINS
>  #define TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION \
>    aarch64_builtin_vectorized_function
>
> -#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
> -#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
> -  aarch64_autovectorize_vector_sizes
> +#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES
> +#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES \
> +  aarch64_autovectorize_vector_modes
>
>  #undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
>  #define TARGET_ATOMIC_ASSIGN_EXPAND_FENV \
> Index: gcc/config/arc/arc.c
> ===================================================================
> --- gcc/config/arc/arc.c        2019-10-25 09:21:25.974346475 +0100
> +++ gcc/config/arc/arc.c        2019-10-25 13:27:15.505763118 +0100
> @@ -607,15 +607,15 @@ arc_preferred_simd_mode (scalar_mode mod
>  }
>
>  /* Implements target hook
> -   TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES.  */
> +   TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES.  */
>
>  static void
> -arc_autovectorize_vector_sizes (vector_sizes *sizes, bool)
> +arc_autovectorize_vector_modes (vector_modes *modes, bool)
>  {
>    if (TARGET_PLUS_QMACW)
>      {
> -      sizes->quick_push (8);
> -      sizes->quick_push (4);
> +      modes->quick_push (V4HImode);
> +      modes->quick_push (V2HImode);
>      }
>  }
>
> @@ -726,8 +726,8 @@ #define TARGET_VECTOR_MODE_SUPPORTED_P a
>  #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
>  #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE arc_preferred_simd_mode
>
> -#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
> -#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES arc_autovectorize_vector_sizes
> +#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES
> +#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES arc_autovectorize_vector_modes
>
>  #undef TARGET_CAN_USE_DOLOOP_P
>  #define TARGET_CAN_USE_DOLOOP_P arc_can_use_doloop_p
> Index: gcc/config/arm/arm.c
> ===================================================================
> --- gcc/config/arm/arm.c        2019-10-23 11:29:47.933883742 +0100
> +++ gcc/config/arm/arm.c        2019-10-25 13:27:15.513763059 +0100
> @@ -289,7 +289,7 @@ static bool arm_builtin_support_vector_m
>  static void arm_conditional_register_usage (void);
>  static enum flt_eval_method arm_excess_precision (enum excess_precision_type);
>  static reg_class_t arm_preferred_rename_class (reg_class_t rclass);
> -static void arm_autovectorize_vector_sizes (vector_sizes *, bool);
> +static void arm_autovectorize_vector_modes (vector_modes *, bool);
>  static int arm_default_branch_cost (bool, bool);
>  static int arm_cortex_a5_branch_cost (bool, bool);
>  static int arm_cortex_m_branch_cost (bool, bool);
> @@ -522,9 +522,9 @@ #define TARGET_VECTOR_MODE_SUPPORTED_P a
>  #define TARGET_ARRAY_MODE_SUPPORTED_P arm_array_mode_supported_p
>  #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
>  #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE arm_preferred_simd_mode
> -#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
> -#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
> -  arm_autovectorize_vector_sizes
> +#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES
> +#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES \
> +  arm_autovectorize_vector_modes
>
>  #undef  TARGET_MACHINE_DEPENDENT_REORG
>  #define TARGET_MACHINE_DEPENDENT_REORG arm_reorg
> @@ -29012,12 +29012,12 @@ arm_vector_alignment (const_tree type)
>  }
>
>  static void
> -arm_autovectorize_vector_sizes (vector_sizes *sizes, bool)
> +arm_autovectorize_vector_modes (vector_modes *modes, bool)
>  {
>    if (!TARGET_NEON_VECTORIZE_DOUBLE)
>      {
> -      sizes->safe_push (16);
> -      sizes->safe_push (8);
> +      modes->safe_push (V16QImode);
> +      modes->safe_push (V8QImode);
>      }
>  }
>
> Index: gcc/config/i386/i386.c
> ===================================================================
> --- gcc/config/i386/i386.c      2019-10-25 13:26:59.277877782 +0100
> +++ gcc/config/i386/i386.c      2019-10-25 13:27:15.517763031 +0100
> @@ -21387,35 +21387,35 @@ ix86_preferred_simd_mode (scalar_mode mo
>     256bit and 128bit vectors.  */
>
>  static void
> -ix86_autovectorize_vector_sizes (vector_sizes *sizes, bool all)
> +ix86_autovectorize_vector_modes (vector_modes *modes, bool all)
>  {
>    if (TARGET_AVX512F && !TARGET_PREFER_AVX256)
>      {
> -      sizes->safe_push (64);
> -      sizes->safe_push (32);
> -      sizes->safe_push (16);
> +      modes->safe_push (V64QImode);
> +      modes->safe_push (V32QImode);
> +      modes->safe_push (V16QImode);
>      }
>    else if (TARGET_AVX512F && all)
>      {
> -      sizes->safe_push (32);
> -      sizes->safe_push (16);
> -      sizes->safe_push (64);
> +      modes->safe_push (V32QImode);
> +      modes->safe_push (V16QImode);
> +      modes->safe_push (V64QImode);
>      }
>    else if (TARGET_AVX && !TARGET_PREFER_AVX128)
>      {
> -      sizes->safe_push (32);
> -      sizes->safe_push (16);
> +      modes->safe_push (V32QImode);
> +      modes->safe_push (V16QImode);
>      }
>    else if (TARGET_AVX && all)
>      {
> -      sizes->safe_push (16);
> -      sizes->safe_push (32);
> +      modes->safe_push (V16QImode);
> +      modes->safe_push (V32QImode);
>      }
>    else if (TARGET_MMX_WITH_SSE)
> -    sizes->safe_push (16);
> +    modes->safe_push (V16QImode);
>
>    if (TARGET_MMX_WITH_SSE)
> -    sizes->safe_push (8);
> +    modes->safe_push (V8QImode);
>  }
>
>  /* Implemenation of targetm.vectorize.get_mask_mode.  */
> @@ -22954,9 +22954,9 @@ #define TARGET_VECTORIZE_PREFERRED_SIMD_
>  #undef TARGET_VECTORIZE_SPLIT_REDUCTION
>  #define TARGET_VECTORIZE_SPLIT_REDUCTION \
>    ix86_split_reduction
> -#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
> -#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
> -  ix86_autovectorize_vector_sizes
> +#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES
> +#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES \
> +  ix86_autovectorize_vector_modes
>  #undef TARGET_VECTORIZE_GET_MASK_MODE
>  #define TARGET_VECTORIZE_GET_MASK_MODE ix86_get_mask_mode
>  #undef TARGET_VECTORIZE_INIT_COST
> Index: gcc/config/mips/mips.c
> ===================================================================
> --- gcc/config/mips/mips.c      2019-10-17 14:22:54.903313423 +0100
> +++ gcc/config/mips/mips.c      2019-10-25 13:27:15.517763031 +0100
> @@ -13453,13 +13453,13 @@ mips_preferred_simd_mode (scalar_mode mo
>    return word_mode;
>  }
>
> -/* Implement TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES.  */
> +/* Implement TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES.  */
>
>  static void
> -mips_autovectorize_vector_sizes (vector_sizes *sizes, bool)
> +mips_autovectorize_vector_modes (vector_modes *modes, bool)
>  {
>    if (ISA_HAS_MSA)
> -    sizes->safe_push (16);
> +    modes->safe_push (V16QImode);
>  }
>
>  /* Implement TARGET_INIT_LIBFUNCS.  */
> @@ -22694,9 +22694,9 @@ #define TARGET_SCALAR_MODE_SUPPORTED_P m
>
>  #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
>  #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE mips_preferred_simd_mode
> -#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
> -#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
> -  mips_autovectorize_vector_sizes
> +#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES
> +#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES \
> +  mips_autovectorize_vector_modes
>
>  #undef TARGET_INIT_BUILTINS
>  #define TARGET_INIT_BUILTINS mips_init_builtins

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [8/n] Replace autovectorize_vector_sizes with autovectorize_vector_modes
  2019-10-30 14:48   ` Richard Biener
@ 2019-10-30 16:33     ` Richard Sandiford
  2019-11-11 10:30       ` Richard Sandiford
  2019-11-11 14:33       ` Richard Biener
  0 siblings, 2 replies; 48+ messages in thread
From: Richard Sandiford @ 2019-10-30 16:33 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Fri, Oct 25, 2019 at 2:37 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> This is another patch in the series to remove the assumption that
>> all modes involved in vectorisation have to be the same size.
>> Rather than have the target provide a list of vector sizes,
>> it makes the target provide a list of vector "approaches",
>> with each approach represented by a mode.
>>
>> A later patch will pass this mode to targetm.vectorize.related_mode
>> to get the vector mode for a given element mode.  Until then, the modes
>> simply act as an alternative way of specifying the vector size.
>
> Is there a restriction to use integer vector modes for the hook
> or would FP vector modes be OK as well?

Conceptually, each mode returned by the hook represents a set of vector
modes, with the set containing one member for each supported element
type.  The idea is to represent the set using the member with the
smallest element type, preferring integer modes over floating-point
modes in the event of a tie.  So using a floating-point mode as the
representative mode is fine if floating-point elements are the smallest
(or only) supported element type.
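
As an illustrative sketch of that "set of modes" idea (this is a toy model, not GCC code: `RepresentativeMode`, `family_member` and the lookup table are invented for this example, though the resulting names follow GCC's V<n><elt> mode-naming convention), picking the family member for a given element type just keeps the total size and swaps the element:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Toy model: a representative mode stands for the whole family of
// vector modes with the same total size, one member per element type.
struct RepresentativeMode {
  unsigned total_bytes;   // e.g. 16 for V16QImode
};

// Pick the family member for a given element size/kind, mirroring what
// a related-mode query would do: same total size, new element type.
std::string family_member (const RepresentativeMode &rep,
                           unsigned elt_bytes, bool float_p)
{
  unsigned nelts = rep.total_bytes / elt_bytes;
  static const std::map<std::pair<unsigned, bool>, std::string> names = {
    {{1, false}, "QI"}, {{2, false}, "HI"}, {{4, false}, "SI"},
    {{8, false}, "DI"}, {{4, true}, "SF"}, {{8, true}, "DF"}};
  return "V" + std::to_string (nelts) + names.at ({elt_bytes, float_p});
}
```

Under this model the representative V16QImode stands for the set {V16QI, V8HI, V4SI, V2DI, V4SF, V2DF}, and V16QI is chosen as the representative because QI is the smallest (integer) element type in the set.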

> Note that your x86 change likely disables word_mode vectorization with
> -mno-sse?

No, that still works, because...

> That is, how do we represent GPR vectorization "size" here?
> The preferred SIMD mode hook may return an integer mode,
> are non-vector modes OK for autovectorize_vector_modes?

...at least with all current targets, preferred_simd_mode is only
an integer mode if the target has no "real" vectorisation support
for that element type.  There's no need to handle that case in
autovectorize_vector_sizes/modes, and e.g. the x86 hook does nothing
when SSE is disabled.

So while preferred_simd_mode can continue to return integer modes,
autovectorize_vector_modes always returns vector modes.

This patch just treats the mode as an alternative way of specifying
the vector size.  11/n then tries to use related_vector_mode to choose
the vector mode for each element type instead.  But 11/n only uses
related_vector_mode if vec_info::vector_mode is a vector mode.  If it's
an integer mode (as for -mno-sse), or if related_vector_mode fails to
find a vector mode, then we still fall back to mode_for_vector and so
pick an integer mode in the same cases as before.
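
The fallback logic described above can be sketched as follows (again a toy model under stated assumptions: the `Mode`/`ModeClass` types and helper names here are invented stand-ins for GCC's VECTOR_MODE_P check, related_vector_mode and mode_for_vector, and the "failure" conditions are simplified):

```cpp
#include <cassert>

// Toy classification of modes for the sketch below.
enum ModeClass { MODE_VECTOR, MODE_INT, MODE_NONE };

struct Mode { ModeClass cls; unsigned bytes; };

// Stand-in for related_vector_mode: may fail (returns MODE_NONE),
// e.g. when no vector of this element size exists at that total size.
Mode related_vector_mode (Mode base, unsigned elt_bytes)
{
  if (base.cls != MODE_VECTOR || base.bytes % elt_bytes != 0)
    return {MODE_NONE, 0};
  return {MODE_VECTOR, base.bytes};
}

// Stand-in for the old mode_for_vector path: a word-sized integer mode.
Mode mode_for_vector_fallback () { return {MODE_INT, 8}; }

// Only use the related-mode query when vinfo's mode is a real vector
// mode; otherwise (e.g. the -mno-sse case) keep the integer-mode path.
Mode choose_mode (Mode vinfo_mode, unsigned elt_bytes)
{
  if (vinfo_mode.cls == MODE_VECTOR)
    {
      Mode m = related_vector_mode (vinfo_mode, elt_bytes);
      if (m.cls != MODE_NONE)
        return m;
    }
  return mode_for_vector_fallback ();
}
```

So a vector vec_info::vector_mode goes through the related-mode query, while an integer mode (or a failed lookup) falls back to the same integer-mode choice as before.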

Thanks,
Richard

* Re: [9/n] Replace vec_info::vector_size with vec_info::vector_mode
  2019-10-25 12:41 ` [9/n] Replace vec_info::vector_size with vec_info::vector_mode Richard Sandiford
@ 2019-11-05 12:47   ` Richard Biener
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Biener @ 2019-11-05 12:47 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: GCC Patches

On Fri, Oct 25, 2019 at 2:39 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> This patch replaces vec_info::vector_size with vec_info::vector_mode,
> but for now continues to use it as a way of specifying a single
> vector size.  This makes it easier for later patches to use
> related_vector_mode instead.

OK.

>
> 2019-10-24  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * tree-vectorizer.h (vec_info::vector_size): Replace with...
>         (vec_info::vector_mode): ...this new field.
>         * tree-vect-loop.c (vect_update_vf_for_slp): Update accordingly.
>         (vect_analyze_loop, vect_transform_loop): Likewise.
>         * tree-vect-slp.c (can_duplicate_and_interleave_p): Likewise.
>         (vect_make_slp_decision, vect_slp_bb_region): Likewise.
>         * tree-vect-stmts.c (get_vectype_for_scalar_type): Likewise.
>         * tree-vectorizer.c (try_vectorize_loop_1): Likewise.
>
> gcc/testsuite/
>         * gcc.dg/vect/vect-tail-nomask-1.c: Update expected epilogue
>         vectorization message.
>
> Index: gcc/tree-vectorizer.h
> ===================================================================
> --- gcc/tree-vectorizer.h       2019-10-25 13:26:59.093879082 +0100
> +++ gcc/tree-vectorizer.h       2019-10-25 13:27:19.317736181 +0100
> @@ -329,9 +329,9 @@ typedef std::pair<tree, tree> vec_object
>    /* Cost data used by the target cost model.  */
>    void *target_cost_data;
>
> -  /* The vector size for this loop in bytes, or 0 if we haven't picked
> -     a size yet.  */
> -  poly_uint64 vector_size;
> +  /* If we've chosen a vector size for this vectorization region,
> +     this is one mode that has such a size, otherwise it is VOIDmode.  */
> +  machine_mode vector_mode;
>
>  private:
>    stmt_vec_info new_stmt_vec_info (gimple *stmt);
> Index: gcc/tree-vect-loop.c
> ===================================================================
> --- gcc/tree-vect-loop.c        2019-10-25 13:27:15.525762975 +0100
> +++ gcc/tree-vect-loop.c        2019-10-25 13:27:19.309736237 +0100
> @@ -1414,8 +1414,8 @@ vect_update_vf_for_slp (loop_vec_info lo
>         dump_printf_loc (MSG_NOTE, vect_location,
>                          "Loop contains SLP and non-SLP stmts\n");
>        /* Both the vectorization factor and unroll factor have the form
> -        loop_vinfo->vector_size * X for some rational X, so they must have
> -        a common multiple.  */
> +        GET_MODE_SIZE (loop_vinfo->vector_mode) * X for some rational X,
> +        so they must have a common multiple.  */
>        vectorization_factor
>         = force_common_multiple (vectorization_factor,
>                                  LOOP_VINFO_SLP_UNROLLING_FACTOR (loop_vinfo));
> @@ -2341,7 +2341,7 @@ vect_analyze_loop (class loop *loop, loo
>         " loops cannot be vectorized\n");
>
>    unsigned n_stmts = 0;
> -  poly_uint64 autodetected_vector_size = 0;
> +  machine_mode autodetected_vector_mode = VOIDmode;
>    opt_loop_vec_info first_loop_vinfo = opt_loop_vec_info::success (NULL);
>    machine_mode next_vector_mode = VOIDmode;
>    while (1)
> @@ -2357,7 +2357,7 @@ vect_analyze_loop (class loop *loop, loo
>           gcc_checking_assert (first_loop_vinfo == NULL);
>           return loop_vinfo;
>         }
> -      loop_vinfo->vector_size = GET_MODE_SIZE (next_vector_mode);
> +      loop_vinfo->vector_mode = next_vector_mode;
>
>        bool fatal = false;
>
> @@ -2366,7 +2366,7 @@ vect_analyze_loop (class loop *loop, loo
>
>        opt_result res = vect_analyze_loop_2 (loop_vinfo, fatal, &n_stmts);
>        if (mode_i == 0)
> -       autodetected_vector_size = loop_vinfo->vector_size;
> +       autodetected_vector_mode = loop_vinfo->vector_mode;
>
>        if (res)
>         {
> @@ -2401,21 +2401,21 @@ vect_analyze_loop (class loop *loop, loo
>
>        if (mode_i < vector_modes.length ()
>           && known_eq (GET_MODE_SIZE (vector_modes[mode_i]),
> -                      autodetected_vector_size))
> +                      GET_MODE_SIZE (autodetected_vector_mode)))
>         mode_i += 1;
>
>        if (mode_i == vector_modes.length ()
> -         || known_eq (autodetected_vector_size, 0U))
> +         || autodetected_vector_mode == VOIDmode)
>         {
>           if (first_loop_vinfo)
>             {
>               loop->aux = (loop_vec_info) first_loop_vinfo;
>               if (dump_enabled_p ())
>                 {
> +                 machine_mode mode = first_loop_vinfo->vector_mode;
>                   dump_printf_loc (MSG_NOTE, vect_location,
> -                                  "***** Choosing vector size ");
> -                 dump_dec (MSG_NOTE, first_loop_vinfo->vector_size);
> -                 dump_printf (MSG_NOTE, "\n");
> +                                  "***** Choosing vector mode %s\n",
> +                                  GET_MODE_NAME (mode));
>                 }
>               return first_loop_vinfo;
>             }
> @@ -8238,12 +8238,9 @@ vect_transform_loop (loop_vec_info loop_
>           dump_printf (MSG_NOTE, "\n");
>         }
>        else
> -       {
> -         dump_printf_loc (MSG_NOTE, vect_location,
> -                          "LOOP EPILOGUE VECTORIZED (VS=");
> -         dump_dec (MSG_NOTE, loop_vinfo->vector_size);
> -         dump_printf (MSG_NOTE, ")\n");
> -       }
> +       dump_printf_loc (MSG_NOTE, vect_location,
> +                        "LOOP EPILOGUE VECTORIZED (MODE=%s)\n",
> +                        GET_MODE_NAME (loop_vinfo->vector_mode));
>      }
>
>    /* Loops vectorized with a variable factor won't benefit from
> @@ -8294,14 +8291,14 @@ vect_transform_loop (loop_vec_info loop_
>           unsigned int ratio;
>           while (next_i < vector_modes.length ()
>                  && !(constant_multiple_p
> -                     (loop_vinfo->vector_size,
> +                     (GET_MODE_SIZE (loop_vinfo->vector_mode),
>                        GET_MODE_SIZE (vector_modes[next_i]), &ratio)
>                       && eiters >= lowest_vf / ratio))
>             next_i += 1;
>         }
>        else
>         while (next_i < vector_modes.length ()
> -              && maybe_lt (loop_vinfo->vector_size,
> +              && maybe_lt (GET_MODE_SIZE (loop_vinfo->vector_mode),
>                             GET_MODE_SIZE (vector_modes[next_i])))
>           next_i += 1;
>
> Index: gcc/tree-vect-slp.c
> ===================================================================
> --- gcc/tree-vect-slp.c 2019-10-25 13:27:15.525762975 +0100
> +++ gcc/tree-vect-slp.c 2019-10-25 13:27:19.313736209 +0100
> @@ -274,7 +274,7 @@ can_duplicate_and_interleave_p (vec_info
>      {
>        scalar_int_mode int_mode;
>        poly_int64 elt_bits = elt_bytes * BITS_PER_UNIT;
> -      if (multiple_p (vinfo->vector_size, elt_bytes, &nelts)
> +      if (multiple_p (GET_MODE_SIZE (vinfo->vector_mode), elt_bytes, &nelts)
>           && int_mode_for_size (elt_bits, 0).exists (&int_mode))
>         {
>           tree int_type = build_nonstandard_integer_type
> @@ -474,7 +474,7 @@ vect_get_and_check_slp_defs (vec_info *v
>             }
>           if ((dt == vect_constant_def
>                || dt == vect_external_def)
> -             && !vinfo->vector_size.is_constant ()
> +             && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
>               && (TREE_CODE (type) == BOOLEAN_TYPE
>                   || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
>                                                       TYPE_MODE (type))))
> @@ -2339,8 +2339,11 @@ vect_make_slp_decision (loop_vec_info lo
>    FOR_EACH_VEC_ELT (slp_instances, i, instance)
>      {
>        /* FORNOW: SLP if you can.  */
> -      /* All unroll factors have the form vinfo->vector_size * X for some
> -        rational X, so they must have a common multiple.  */
> +      /* All unroll factors have the form:
> +
> +          GET_MODE_SIZE (vinfo->vector_mode) * X
> +
> +        for some rational X, so they must have a common multiple.  */
>        unrolling_factor
>         = force_common_multiple (unrolling_factor,
>                                  SLP_INSTANCE_UNROLLING_FACTOR (instance));
> @@ -3096,7 +3099,7 @@ vect_slp_bb_region (gimple_stmt_iterator
>
>    vec_info_shared shared;
>
> -  poly_uint64 autodetected_vector_size = 0;
> +  machine_mode autodetected_vector_mode = VOIDmode;
>    while (1)
>      {
>        bool vectorized = false;
> @@ -3109,7 +3112,7 @@ vect_slp_bb_region (gimple_stmt_iterator
>         bb_vinfo->shared->save_datarefs ();
>        else
>         bb_vinfo->shared->check_datarefs ();
> -      bb_vinfo->vector_size = GET_MODE_SIZE (next_vector_mode);
> +      bb_vinfo->vector_mode = next_vector_mode;
>
>        if (vect_slp_analyze_bb_1 (bb_vinfo, n_stmts, fatal)
>           && dbg_cnt (vect_slp))
> @@ -3123,7 +3126,7 @@ vect_slp_bb_region (gimple_stmt_iterator
>           unsigned HOST_WIDE_INT bytes;
>           if (dump_enabled_p ())
>             {
> -             if (bb_vinfo->vector_size.is_constant (&bytes))
> +             if (GET_MODE_SIZE (bb_vinfo->vector_mode).is_constant (&bytes))
>                 dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
>                                  "basic block part vectorized using %wu byte "
>                                  "vectors\n", bytes);
> @@ -3137,18 +3140,18 @@ vect_slp_bb_region (gimple_stmt_iterator
>         }
>
>        if (mode_i == 0)
> -       autodetected_vector_size = bb_vinfo->vector_size;
> +       autodetected_vector_mode = bb_vinfo->vector_mode;
>
>        delete bb_vinfo;
>
>        if (mode_i < vector_modes.length ()
>           && known_eq (GET_MODE_SIZE (vector_modes[mode_i]),
> -                      autodetected_vector_size))
> +                      GET_MODE_SIZE (autodetected_vector_mode)))
>         mode_i += 1;
>
>        if (vectorized
>           || mode_i == vector_modes.length ()
> -         || known_eq (autodetected_vector_size, 0U)
> +         || autodetected_vector_mode == VOIDmode
>           /* If vect_slp_analyze_bb_1 signaled that analysis for all
>              vector sizes will fail do not bother iterating.  */
>           || fatal)
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2019-10-25 13:27:12.121787027 +0100
> +++ gcc/tree-vect-stmts.c       2019-10-25 13:27:19.313736209 +0100
> @@ -11212,11 +11212,10 @@ get_vectype_for_scalar_type_and_size (tr
>  get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type)
>  {
>    tree vectype;
> -  vectype = get_vectype_for_scalar_type_and_size (scalar_type,
> -                                                 vinfo->vector_size);
> -  if (vectype
> -      && known_eq (vinfo->vector_size, 0U))
> -    vinfo->vector_size = GET_MODE_SIZE (TYPE_MODE (vectype));
> +  poly_uint64 vector_size = GET_MODE_SIZE (vinfo->vector_mode);
> +  vectype = get_vectype_for_scalar_type_and_size (scalar_type, vector_size);
> +  if (vectype && vinfo->vector_mode == VOIDmode)
> +    vinfo->vector_mode = TYPE_MODE (vectype);
>    return vectype;
>  }
>
> Index: gcc/tree-vectorizer.c
> ===================================================================
> --- gcc/tree-vectorizer.c       2019-10-21 07:41:32.997886232 +0100
> +++ gcc/tree-vectorizer.c       2019-10-25 13:27:19.317736181 +0100
> @@ -971,7 +971,7 @@ try_vectorize_loop_1 (hash_table<simduid
>    unsigned HOST_WIDE_INT bytes;
>    if (dump_enabled_p ())
>      {
> -      if (loop_vinfo->vector_size.is_constant (&bytes))
> +      if (GET_MODE_SIZE (loop_vinfo->vector_mode).is_constant (&bytes))
>         dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
>                          "loop vectorized using %wu byte vectors\n", bytes);
>        else
> Index: gcc/testsuite/gcc.dg/vect/vect-tail-nomask-1.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-tail-nomask-1.c      2019-03-08 18:15:02.260871260 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-tail-nomask-1.c      2019-10-25 13:27:19.309736237 +0100
> @@ -106,4 +106,4 @@ main (int argc, const char **argv)
>  }
>
>  /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { target avx2_runtime } } } */
> -/* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED \\(VS=16\\)" 2 "vect" { target avx2_runtime } } } */
> +/* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED \\(MODE=V16QI\\)" 2 "vect" { target avx2_runtime } } } */

* Re: [10/n] Make less use of get_same_sized_vectype
  2019-10-25 12:43 ` [10/n] Make less use of get_same_sized_vectype Richard Sandiford
@ 2019-11-05 12:50   ` Richard Biener
  2019-11-05 15:34     ` Richard Sandiford
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Biener @ 2019-11-05 12:50 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: GCC Patches

On Fri, Oct 25, 2019 at 2:41 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Some callers of get_same_sized_vectype were dealing with operands that
> are constant or defined externally, and so have no STMT_VINFO_VECTYPE
> available.  Under the current model, using get_same_sized_vectype for
> that case is equivalent to using get_vectype_for_scalar_type, since
> get_vectype_for_scalar_type always returns vectors of the same size,
> once a size is fixed.
>
> Using get_vectype_for_scalar_type is arguably more obvious though:
> if we're using the same scalar type as we would for internal
> definitions, we should use the same vector type too.  (Constant and
> external definitions sometimes let us change the original scalar type
> to a "nicer" scalar type, but that isn't what's happening here.)
>
> This is a prerequisite to supporting multiple vector sizes in the same
> vec_info.

This might change the actual type we get back.  IIRC we mass-changed
it in the opposite direction from your change in the past, because it's
more obvious to relate the type used to another vector type on the
stmt.  So isn't it better to use the new related_vector_type thing here?

Richard.

>
> 2019-10-24  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * tree-vect-stmts.c (vectorizable_call): If an operand is
>         constant or external, use get_vectype_for_scalar_type
>         rather than get_same_sized_vectype to get its vector type.
>         (vectorizable_conversion, vectorizable_shift): Likewise.
>         (vectorizable_operation): Likewise.
>
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2019-10-25 13:27:19.313736209 +0100
> +++ gcc/tree-vect-stmts.c       2019-10-25 13:27:22.985710263 +0100
> @@ -3308,10 +3308,10 @@ vectorizable_call (stmt_vec_info stmt_in
>           return false;
>         }
>      }
> -  /* If all arguments are external or constant defs use a vector type with
> -     the same size as the output vector type.  */
> +  /* If all arguments are external or constant defs, infer the vector type
> +     from the scalar type.  */
>    if (!vectype_in)
> -    vectype_in = get_same_sized_vectype (rhs_type, vectype_out);
> +    vectype_in = get_vectype_for_scalar_type (vinfo, rhs_type);
>    if (vec_stmt)
>      gcc_assert (vectype_in);
>    if (!vectype_in)
> @@ -4800,10 +4800,10 @@ vectorizable_conversion (stmt_vec_info s
>         }
>      }
>
> -  /* If op0 is an external or constant defs use a vector type of
> -     the same size as the output vector type.  */
> +  /* If op0 is an external or constant def, infer the vector type
> +     from the scalar type.  */
>    if (!vectype_in)
> -    vectype_in = get_same_sized_vectype (rhs_type, vectype_out);
> +    vectype_in = get_vectype_for_scalar_type (vinfo, rhs_type);
>    if (vec_stmt)
>      gcc_assert (vectype_in);
>    if (!vectype_in)
> @@ -5564,10 +5564,10 @@ vectorizable_shift (stmt_vec_info stmt_i
>                           "use not simple.\n");
>        return false;
>      }
> -  /* If op0 is an external or constant def use a vector type with
> -     the same size as the output vector type.  */
> +  /* If op0 is an external or constant def, infer the vector type
> +     from the scalar type.  */
>    if (!vectype)
> -    vectype = get_same_sized_vectype (TREE_TYPE (op0), vectype_out);
> +    vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op0));
>    if (vec_stmt)
>      gcc_assert (vectype);
>    if (!vectype)
> @@ -5666,7 +5666,7 @@ vectorizable_shift (stmt_vec_info stmt_i
>                           "vector/vector shift/rotate found.\n");
>
>        if (!op1_vectype)
> -       op1_vectype = get_same_sized_vectype (TREE_TYPE (op1), vectype_out);
> +       op1_vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op1));
>        incompatible_op1_vectype_p
>         = (op1_vectype == NULL_TREE
>            || maybe_ne (TYPE_VECTOR_SUBPARTS (op1_vectype),
> @@ -5997,8 +5997,8 @@ vectorizable_operation (stmt_vec_info st
>                           "use not simple.\n");
>        return false;
>      }
> -  /* If op0 is an external or constant def use a vector type with
> -     the same size as the output vector type.  */
> +  /* If op0 is an external or constant def, infer the vector type
> +     from the scalar type.  */
>    if (!vectype)
>      {
>        /* For boolean type we cannot determine vectype by
> @@ -6018,7 +6018,7 @@ vectorizable_operation (stmt_vec_info st
>           vectype = vectype_out;
>         }
>        else
> -       vectype = get_same_sized_vectype (TREE_TYPE (op0), vectype_out);
> +       vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op0));
>      }
>    if (vec_stmt)
>      gcc_assert (vectype);


* Re: [11/n] Support vectorisation with mixed vector sizes
  2019-10-25 12:44 ` [11/n] Support vectorisation with mixed vector sizes Richard Sandiford
@ 2019-11-05 12:57   ` Richard Biener
  2019-11-06 12:38     ` Richard Sandiford
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Biener @ 2019-11-05 12:57 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: GCC Patches

On Fri, Oct 25, 2019 at 2:43 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> After previous patches, it's now possible to make the vectoriser
> support multiple vector sizes in the same vector region, using
> related_vector_mode to pick the right vector mode for a given
> element mode.  No port yet takes advantage of this, but I have
> a follow-on patch for AArch64.
>
> This patch also seemed like a good opportunity to add some more dump
> messages: one to make it clear which vector size/mode was being used
> when analysis passed or failed, and another to say when we've decided
> to skip a redundant vector size/mode.

OK.

I wonder if, in places where we previously requested a specific size,
we now have to verify that the constraint is satisfied after the change.
Esp. the epilogue vectorization cases want to get V2DI
from V4DI.

          sz /= 2;
-         vectype1 = get_vectype_for_scalar_type_and_size (scalar_type, sz);
+         vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
+                                                         scalar_type,
+                                                         sz / scalar_bytes);

doesn't look like an improvement in readability to me there.  Maybe
re-formulating
the whole code in terms of lanes instead of size would make it easier to follow?

Thanks,
Richard.

>
> 2019-10-24  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * machmode.h (opt_machine_mode::operator==): New function.
>         (opt_machine_mode::operator!=): Likewise.
>         * tree-vectorizer.h (vec_info::vector_mode): Update comment.
>         (get_related_vectype_for_scalar_type): Delete.
>         (get_vectype_for_scalar_type_and_size): Declare.
>         * tree-vect-slp.c (vect_slp_bb_region): Print dump messages to say
>         whether analysis passed or failed, and with what vector modes.
>         Use related_vector_mode to check whether trying a particular
>         vector mode would be redundant with the autodetected mode,
>         and print a dump message if we decide to skip it.
>         * tree-vect-loop.c (vect_analyze_loop): Likewise.
>         (vect_create_epilog_for_reduction): Use
>         get_related_vectype_for_scalar_type instead of
>         get_vectype_for_scalar_type_and_size.
>         * tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Replace
>         with...
>         (get_related_vectype_for_scalar_type): ...this new function.
>         Take a starting/"prevailing" vector mode rather than a vector size.
>         Take an optional nunits argument, with the same meaning as for
>         related_vector_mode.  Use related_vector_mode when not
>         auto-detecting a mode, falling back to mode_for_vector if no
>         target mode exists.
>         (get_vectype_for_scalar_type): Update accordingly.
>         (get_same_sized_vectype): Likewise.
>         * tree-vectorizer.c (get_vec_alignment_for_array_type): Likewise.
>
> Index: gcc/machmode.h
> ===================================================================
> --- gcc/machmode.h      2019-10-25 13:26:59.053879364 +0100
> +++ gcc/machmode.h      2019-10-25 13:27:26.201687539 +0100
> @@ -258,6 +258,9 @@ #define CLASS_HAS_WIDER_MODES_P(CLASS)
>    bool exists () const;
>    template<typename U> bool exists (U *) const;
>
> +  bool operator== (const T &m) const { return m_mode == m; }
> +  bool operator!= (const T &m) const { return m_mode != m; }
> +
>  private:
>    machine_mode m_mode;
>  };
> Index: gcc/tree-vectorizer.h
> ===================================================================
> --- gcc/tree-vectorizer.h       2019-10-25 13:27:19.317736181 +0100
> +++ gcc/tree-vectorizer.h       2019-10-25 13:27:26.209687483 +0100
> @@ -329,8 +329,9 @@ typedef std::pair<tree, tree> vec_object
>    /* Cost data used by the target cost model.  */
>    void *target_cost_data;
>
> -  /* If we've chosen a vector size for this vectorization region,
> -     this is one mode that has such a size, otherwise it is VOIDmode.  */
> +  /* The argument we should pass to related_vector_mode when looking up
> +     the vector mode for a scalar mode, or VOIDmode if we haven't yet
> +     made any decisions about which vector modes to use.  */
>    machine_mode vector_mode;
>
>  private:
> @@ -1595,8 +1596,9 @@ extern dump_user_location_t find_loop_lo
>  extern bool vect_can_advance_ivs_p (loop_vec_info);
>
>  /* In tree-vect-stmts.c.  */
> +extern tree get_related_vectype_for_scalar_type (machine_mode, tree,
> +                                                poly_uint64 = 0);
>  extern tree get_vectype_for_scalar_type (vec_info *, tree);
> -extern tree get_vectype_for_scalar_type_and_size (tree, poly_uint64);
>  extern tree get_mask_type_for_scalar_type (vec_info *, tree);
>  extern tree get_same_sized_vectype (tree, tree);
>  extern bool vect_get_loop_mask_type (loop_vec_info);
> Index: gcc/tree-vect-slp.c
> ===================================================================
> --- gcc/tree-vect-slp.c 2019-10-25 13:27:19.313736209 +0100
> +++ gcc/tree-vect-slp.c 2019-10-25 13:27:26.205687511 +0100
> @@ -3118,7 +3118,12 @@ vect_slp_bb_region (gimple_stmt_iterator
>           && dbg_cnt (vect_slp))
>         {
>           if (dump_enabled_p ())
> -           dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB part\n");
> +           {
> +             dump_printf_loc (MSG_NOTE, vect_location,
> +                              "***** Analysis succeeded with vector mode"
> +                              " %s\n", GET_MODE_NAME (bb_vinfo->vector_mode));
> +             dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB part\n");
> +           }
>
>           bb_vinfo->shared->check_datarefs ();
>           vect_schedule_slp (bb_vinfo);
> @@ -3138,6 +3143,13 @@ vect_slp_bb_region (gimple_stmt_iterator
>
>           vectorized = true;
>         }
> +      else
> +       {
> +         if (dump_enabled_p ())
> +           dump_printf_loc (MSG_NOTE, vect_location,
> +                            "***** Analysis failed with vector mode %s\n",
> +                            GET_MODE_NAME (bb_vinfo->vector_mode));
> +       }
>
>        if (mode_i == 0)
>         autodetected_vector_mode = bb_vinfo->vector_mode;
> @@ -3145,9 +3157,22 @@ vect_slp_bb_region (gimple_stmt_iterator
>        delete bb_vinfo;
>
>        if (mode_i < vector_modes.length ()
> -         && known_eq (GET_MODE_SIZE (vector_modes[mode_i]),
> -                      GET_MODE_SIZE (autodetected_vector_mode)))
> -       mode_i += 1;
> +         && VECTOR_MODE_P (autodetected_vector_mode)
> +         && (related_vector_mode (vector_modes[mode_i],
> +                                  GET_MODE_INNER (autodetected_vector_mode))
> +             == autodetected_vector_mode)
> +         && (related_vector_mode (autodetected_vector_mode,
> +                                  GET_MODE_INNER (vector_modes[mode_i]))
> +             == vector_modes[mode_i]))
> +       {
> +         if (dump_enabled_p ())
> +           dump_printf_loc (MSG_NOTE, vect_location,
> +                            "***** Skipping vector mode %s, which would"
> +                            " repeat the analysis for %s\n",
> +                            GET_MODE_NAME (vector_modes[mode_i]),
> +                            GET_MODE_NAME (autodetected_vector_mode));
> +         mode_i += 1;
> +       }
>
>        if (vectorized
>           || mode_i == vector_modes.length ()
> Index: gcc/tree-vect-loop.c
> ===================================================================
> --- gcc/tree-vect-loop.c        2019-10-25 13:27:19.309736237 +0100
> +++ gcc/tree-vect-loop.c        2019-10-25 13:27:26.201687539 +0100
> @@ -2367,6 +2367,17 @@ vect_analyze_loop (class loop *loop, loo
>        opt_result res = vect_analyze_loop_2 (loop_vinfo, fatal, &n_stmts);
>        if (mode_i == 0)
>         autodetected_vector_mode = loop_vinfo->vector_mode;
> +      if (dump_enabled_p ())
> +       {
> +         if (res)
> +           dump_printf_loc (MSG_NOTE, vect_location,
> +                            "***** Analysis succeeded with vector mode %s\n",
> +                            GET_MODE_NAME (loop_vinfo->vector_mode));
> +         else
> +           dump_printf_loc (MSG_NOTE, vect_location,
> +                            "***** Analysis failed with vector mode %s\n",
> +                            GET_MODE_NAME (loop_vinfo->vector_mode));
> +       }
>
>        if (res)
>         {
> @@ -2400,9 +2411,22 @@ vect_analyze_loop (class loop *loop, loo
>         }
>
>        if (mode_i < vector_modes.length ()
> -         && known_eq (GET_MODE_SIZE (vector_modes[mode_i]),
> -                      GET_MODE_SIZE (autodetected_vector_mode)))
> -       mode_i += 1;
> +         && VECTOR_MODE_P (autodetected_vector_mode)
> +         && (related_vector_mode (vector_modes[mode_i],
> +                                  GET_MODE_INNER (autodetected_vector_mode))
> +             == autodetected_vector_mode)
> +         && (related_vector_mode (autodetected_vector_mode,
> +                                  GET_MODE_INNER (vector_modes[mode_i]))
> +             == vector_modes[mode_i]))
> +       {
> +         if (dump_enabled_p ())
> +           dump_printf_loc (MSG_NOTE, vect_location,
> +                            "***** Skipping vector mode %s, which would"
> +                            " repeat the analysis for %s\n",
> +                            GET_MODE_NAME (vector_modes[mode_i]),
> +                            GET_MODE_NAME (autodetected_vector_mode));
> +         mode_i += 1;
> +       }
>
>        if (mode_i == vector_modes.length ()
>           || autodetected_vector_mode == VOIDmode)
> @@ -4763,7 +4787,10 @@ vect_create_epilog_for_reduction (stmt_v
>           && (mode1 = targetm.vectorize.split_reduction (mode)) != mode)
>         sz1 = GET_MODE_SIZE (mode1).to_constant ();
>
> -      tree vectype1 = get_vectype_for_scalar_type_and_size (scalar_type, sz1);
> +      unsigned int scalar_bytes = tree_to_uhwi (TYPE_SIZE_UNIT (scalar_type));
> +      tree vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
> +                                                          scalar_type,
> +                                                          sz1 / scalar_bytes);
>        reduce_with_shift = have_whole_vector_shift (mode1);
>        if (!VECTOR_MODE_P (mode1))
>         reduce_with_shift = false;
> @@ -4781,7 +4808,9 @@ vect_create_epilog_for_reduction (stmt_v
>         {
>           gcc_assert (!slp_reduc);
>           sz /= 2;
> -         vectype1 = get_vectype_for_scalar_type_and_size (scalar_type, sz);
> +         vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
> +                                                         scalar_type,
> +                                                         sz / scalar_bytes);
>
>           /* The target has to make sure we support lowpart/highpart
>              extraction, either via direct vector extract or through
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2019-10-25 13:27:22.985710263 +0100
> +++ gcc/tree-vect-stmts.c       2019-10-25 13:27:26.205687511 +0100
> @@ -11111,18 +11111,28 @@ vect_remove_stores (stmt_vec_info first_
>      }
>  }
>
> -/* Function get_vectype_for_scalar_type_and_size.
> -
> -   Returns the vector type corresponding to SCALAR_TYPE  and SIZE as supported
> -   by the target.  */
> +/* If NUNITS is nonzero, return a vector type that contains NUNITS
> +   elements of type SCALAR_TYPE, or null if the target doesn't support
> +   such a type.
> +
> +   If NUNITS is zero, return a vector type that contains elements of
> +   type SCALAR_TYPE, choosing whichever vector size the target prefers.
> +
> +   If PREVAILING_MODE is VOIDmode, we have not yet chosen a vector mode
> +   for this vectorization region and want to "autodetect" the best choice.
> +   Otherwise, PREVAILING_MODE is a previously-chosen vector TYPE_MODE
> +   and we want the new type to be interoperable with it.   PREVAILING_MODE
> +   in this case can be a scalar integer mode or a vector mode; when it
> +   is a vector mode, the function acts like a tree-level version of
> +   related_vector_mode.  */
>
>  tree
> -get_vectype_for_scalar_type_and_size (tree scalar_type, poly_uint64 size)
> +get_related_vectype_for_scalar_type (machine_mode prevailing_mode,
> +                                    tree scalar_type, poly_uint64 nunits)
>  {
>    tree orig_scalar_type = scalar_type;
>    scalar_mode inner_mode;
>    machine_mode simd_mode;
> -  poly_uint64 nunits;
>    tree vectype;
>
>    if (!is_int_mode (TYPE_MODE (scalar_type), &inner_mode)
> @@ -11162,10 +11172,11 @@ get_vectype_for_scalar_type_and_size (tr
>    if (scalar_type == NULL_TREE)
>      return NULL_TREE;
>
> -  /* If no size was supplied use the mode the target prefers.   Otherwise
> -     lookup a vector mode of the specified size.  */
> -  if (known_eq (size, 0U))
> +  /* If no prevailing mode was supplied, use the mode the target prefers.
> +     Otherwise lookup a vector mode based on the prevailing mode.  */
> +  if (prevailing_mode == VOIDmode)
>      {
> +      gcc_assert (known_eq (nunits, 0U));
>        simd_mode = targetm.vectorize.preferred_simd_mode (inner_mode);
>        if (SCALAR_INT_MODE_P (simd_mode))
>         {
> @@ -11181,9 +11192,19 @@ get_vectype_for_scalar_type_and_size (tr
>             return NULL_TREE;
>         }
>      }
> -  else if (!multiple_p (size, nbytes, &nunits)
> -          || !mode_for_vector (inner_mode, nunits).exists (&simd_mode))
> -    return NULL_TREE;
> +  else if (SCALAR_INT_MODE_P (prevailing_mode)
> +          || !related_vector_mode (prevailing_mode,
> +                                   inner_mode, nunits).exists (&simd_mode))
> +    {
> +      /* Fall back to using mode_for_vector, mostly in the hope of being
> +        able to use an integer mode.  */
> +      if (known_eq (nunits, 0U)
> +         && !multiple_p (GET_MODE_SIZE (prevailing_mode), nbytes, &nunits))
> +       return NULL_TREE;
> +
> +      if (!mode_for_vector (inner_mode, nunits).exists (&simd_mode))
> +       return NULL_TREE;
> +    }
>
>    vectype = build_vector_type_for_mode (scalar_type, simd_mode);
>
> @@ -11211,9 +11232,8 @@ get_vectype_for_scalar_type_and_size (tr
>  tree
>  get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type)
>  {
> -  tree vectype;
> -  poly_uint64 vector_size = GET_MODE_SIZE (vinfo->vector_mode);
> -  vectype = get_vectype_for_scalar_type_and_size (scalar_type, vector_size);
> +  tree vectype = get_related_vectype_for_scalar_type (vinfo->vector_mode,
> +                                                     scalar_type);
>    if (vectype && vinfo->vector_mode == VOIDmode)
>      vinfo->vector_mode = TYPE_MODE (vectype);
>    return vectype;
> @@ -11246,8 +11266,13 @@ get_same_sized_vectype (tree scalar_type
>    if (VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type))
>      return truth_type_for (vector_type);
>
> -  return get_vectype_for_scalar_type_and_size
> -          (scalar_type, GET_MODE_SIZE (TYPE_MODE (vector_type)));
> +  poly_uint64 nunits;
> +  if (!multiple_p (GET_MODE_SIZE (TYPE_MODE (vector_type)),
> +                  GET_MODE_SIZE (TYPE_MODE (scalar_type)), &nunits))
> +    return NULL_TREE;
> +
> +  return get_related_vectype_for_scalar_type (TYPE_MODE (vector_type),
> +                                             scalar_type, nunits);
>  }
>
>  /* Function vect_is_simple_use.
> Index: gcc/tree-vectorizer.c
> ===================================================================
> --- gcc/tree-vectorizer.c       2019-10-25 13:27:19.317736181 +0100
> +++ gcc/tree-vectorizer.c       2019-10-25 13:27:26.209687483 +0100
> @@ -1348,7 +1348,7 @@ get_vec_alignment_for_array_type (tree t
>    poly_uint64 array_size, vector_size;
>
>    tree scalar_type = strip_array_types (type);
> -  tree vectype = get_vectype_for_scalar_type_and_size (scalar_type, 0);
> +  tree vectype = get_related_vectype_for_scalar_type (VOIDmode, scalar_type);
>    if (!vectype
>        || !poly_int_tree_p (TYPE_SIZE (type), &array_size)
>        || !poly_int_tree_p (TYPE_SIZE (vectype), &vector_size)


* Re: [13/n] Allow mixed vector sizes within a single vectorised stmt
  2019-10-25 12:51 ` [13/n] Allow mixed vector sizes within a single vectorised stmt Richard Sandiford
@ 2019-11-05 12:58   ` Richard Biener
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Biener @ 2019-11-05 12:58 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: GCC Patches

On Fri, Oct 25, 2019 at 2:49 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Although a previous patch allowed mixed vector sizes within a vector
> region, we generally still required equal vector sizes within a vector
> stmt.  Specifically, vect_get_vector_types_for_stmt computes two vector
> types: the vector type corresponding to STMT_VINFO_VECTYPE and the
> vector type that determines the minimum vectorisation factor for the
> stmt ("nunits_vectype").  It then required these two types to be
> the same size.
>
> There doesn't seem to be any need for that restriction though.  AFAICT,
> all vectorizable_* functions either do their own compatibility checks
> or don't need to do them (because gimple guarantees that the scalar
> types are compatible).
>
> It should always be the case that nunits_vectype has at least as many
> elements as the other vectype, but that's something we can assert for.
>
> I couldn't resist a couple of other tweaks while there:
>
> - there's no need to compute nunits_vectype if its element type is
>   the same as STMT_VINFO_VECTYPE's.
>
> - it's useful to distinguish the nunits_vectype from the main vectype
>   in dump messages
>
> - when reusing the existing STMT_VINFO_VECTYPE, it's useful to say so
>   in the dump, and say what the type is

OK.

Thanks,
Richard.

>
> 2019-10-24  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * tree-vect-stmts.c (vect_get_vector_types_for_stmt): Don't
>         require vectype and nunits_vectype to have the same size;
>         instead assert that nunits_vectype has at least as many
>         elements as vectype.  Don't compute a separate nunits_vectype
>         if the scalar type is obviously the same as vectype's.
>         Tweak dump messages.
>
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2019-10-25 13:27:26.205687511 +0100
> +++ gcc/tree-vect-stmts.c       2019-10-25 13:27:32.877640367 +0100
> @@ -11973,7 +11973,12 @@ vect_get_vector_types_for_stmt (stmt_vec
>    tree vectype;
>    tree scalar_type = NULL_TREE;
>    if (STMT_VINFO_VECTYPE (stmt_info))
> -    *stmt_vectype_out = vectype = STMT_VINFO_VECTYPE (stmt_info);
> +    {
> +      *stmt_vectype_out = vectype = STMT_VINFO_VECTYPE (stmt_info);
> +      if (dump_enabled_p ())
> +       dump_printf_loc (MSG_NOTE, vect_location,
> +                        "precomputed vectype: %T\n", vectype);
> +    }
>    else
>      {
>        gcc_assert (!STMT_VINFO_DATA_REF (stmt_info));
> @@ -12005,7 +12010,7 @@ vect_get_vector_types_for_stmt (stmt_vec
>
>        if (dump_enabled_p ())
>         dump_printf_loc (MSG_NOTE, vect_location,
> -                        "get vectype for scalar type:  %T\n", scalar_type);
> +                        "get vectype for scalar type: %T\n", scalar_type);
>        vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
>        if (!vectype)
>         return opt_result::failure_at (stmt,
> @@ -12022,42 +12027,38 @@ vect_get_vector_types_for_stmt (stmt_vec
>
>    /* Don't try to compute scalar types if the stmt produces a boolean
>       vector; use the existing vector type instead.  */
> -  tree nunits_vectype;
> -  if (VECTOR_BOOLEAN_TYPE_P (vectype))
> -    nunits_vectype = vectype;
> -  else
> +  tree nunits_vectype = vectype;
> +  if (!VECTOR_BOOLEAN_TYPE_P (vectype)
> +      && *stmt_vectype_out != boolean_type_node)
>      {
>        /* The number of units is set according to the smallest scalar
>          type (or the largest vector size, but we only support one
>          vector size per vectorization).  */
> -      if (*stmt_vectype_out != boolean_type_node)
> +      HOST_WIDE_INT dummy;
> +      scalar_type = vect_get_smallest_scalar_type (stmt_info, &dummy, &dummy);
> +      if (scalar_type != TREE_TYPE (vectype))
>         {
> -         HOST_WIDE_INT dummy;
> -         scalar_type = vect_get_smallest_scalar_type (stmt_info,
> -                                                      &dummy, &dummy);
> +         if (dump_enabled_p ())
> +           dump_printf_loc (MSG_NOTE, vect_location,
> +                            "get vectype for smallest scalar type: %T\n",
> +                            scalar_type);
> +         nunits_vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
> +         if (!nunits_vectype)
> +           return opt_result::failure_at
> +             (stmt, "not vectorized: unsupported data-type %T\n",
> +              scalar_type);
> +         if (dump_enabled_p ())
> +           dump_printf_loc (MSG_NOTE, vect_location, "nunits vectype: %T\n",
> +                            nunits_vectype);
>         }
> -      if (dump_enabled_p ())
> -       dump_printf_loc (MSG_NOTE, vect_location,
> -                        "get vectype for scalar type:  %T\n", scalar_type);
> -      nunits_vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
>      }
> -  if (!nunits_vectype)
> -    return opt_result::failure_at (stmt,
> -                                  "not vectorized: unsupported data-type %T\n",
> -                                  scalar_type);
> -
> -  if (maybe_ne (GET_MODE_SIZE (TYPE_MODE (vectype)),
> -               GET_MODE_SIZE (TYPE_MODE (nunits_vectype))))
> -    return opt_result::failure_at (stmt,
> -                                  "not vectorized: different sized vector "
> -                                  "types in statement, %T and %T\n",
> -                                  vectype, nunits_vectype);
> +
> +  gcc_assert (*stmt_vectype_out == boolean_type_node
> +             || multiple_p (TYPE_VECTOR_SUBPARTS (nunits_vectype),
> +                            TYPE_VECTOR_SUBPARTS (*stmt_vectype_out)));
>
>    if (dump_enabled_p ())
>      {
> -      dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n",
> -                      nunits_vectype);
> -
>        dump_printf_loc (MSG_NOTE, vect_location, "nunits = ");
>        dump_dec (MSG_NOTE, TYPE_VECTOR_SUBPARTS (nunits_vectype));
>        dump_printf (MSG_NOTE, "\n");

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [14/n] Vectorise conversions between differently-sized integer vectors
  2019-10-25 13:00 ` [14/n] Vectorise conversions between differently-sized integer vectors Richard Sandiford
@ 2019-11-05 13:02   ` Richard Biener
  2019-11-06 12:45     ` Richard Sandiford
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Biener @ 2019-11-05 13:02 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: GCC Patches

On Fri, Oct 25, 2019 at 2:51 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> This patch adds AArch64 patterns for converting between 64-bit and
> 128-bit integer vectors, and makes the vectoriser and expand pass
> use them.

So on GIMPLE we'll see

v4si _1;
v4di _2;

 _1 = (v4si) _2;

then, correct?  Likewise for float conversions.

I think that's "new", can you add to tree-cfg.c:verify_gimple_assign_unary
verification that the number of lanes of the LHS and the RHS match please?

OK with that change.
Thanks,
Richard.

>
> 2019-10-24  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * tree-vect-stmts.c (vectorizable_conversion): Extend the
>         non-widening and non-narrowing path to handle standard
>         conversion codes, if the target supports them.
>         * expr.c (convert_move): Try using the extend and truncate optabs
>         for vectors.
>         * optabs-tree.c (supportable_convert_operation): Likewise.
>         * config/aarch64/iterators.md (Vnarroqw): New iterator.
>         * config/aarch64/aarch64-simd.md (<optab><Vnarrowq><mode>2)
>         (trunc<mode><Vnarrowq>2): New patterns.
>
> gcc/testsuite/
>         * gcc.dg/vect/no-scevccp-outer-12.c: Expect the test to pass
>         on aarch64 targets.
>         * gcc.dg/vect/vect-double-reduc-5.c: Likewise.
>         * gcc.dg/vect/vect-outer-4e.c: Likewise.
>         * gcc.target/aarch64/vect_mixed_sizes_5.c: New test.
>         * gcc.target/aarch64/vect_mixed_sizes_6.c: Likewise.
>         * gcc.target/aarch64/vect_mixed_sizes_7.c: Likewise.
>         * gcc.target/aarch64/vect_mixed_sizes_8.c: Likewise.
>         * gcc.target/aarch64/vect_mixed_sizes_9.c: Likewise.
>         * gcc.target/aarch64/vect_mixed_sizes_10.c: Likewise.
>         * gcc.target/aarch64/vect_mixed_sizes_11.c: Likewise.
>         * gcc.target/aarch64/vect_mixed_sizes_12.c: Likewise.
>         * gcc.target/aarch64/vect_mixed_sizes_13.c: Likewise.
>
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2019-10-25 13:27:32.877640367 +0100
> +++ gcc/tree-vect-stmts.c       2019-10-25 13:27:36.197616908 +0100
> @@ -4861,7 +4861,9 @@ vectorizable_conversion (stmt_vec_info s
>    switch (modifier)
>      {
>      case NONE:
> -      if (code != FIX_TRUNC_EXPR && code != FLOAT_EXPR)
> +      if (code != FIX_TRUNC_EXPR
> +         && code != FLOAT_EXPR
> +         && !CONVERT_EXPR_CODE_P (code))
>         return false;
>        if (supportable_convert_operation (code, vectype_out, vectype_in,
>                                          &decl1, &code1))
> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c  2019-10-22 08:46:57.359355939 +0100
> +++ gcc/expr.c  2019-10-25 13:27:36.193616936 +0100
> @@ -250,6 +250,31 @@ convert_move (rtx to, rtx from, int unsi
>
>    if (VECTOR_MODE_P (to_mode) || VECTOR_MODE_P (from_mode))
>      {
> +      if (GET_MODE_UNIT_PRECISION (to_mode)
> +         > GET_MODE_UNIT_PRECISION (from_mode))
> +       {
> +         optab op = unsignedp ? zext_optab : sext_optab;
> +         insn_code icode = convert_optab_handler (op, to_mode, from_mode);
> +         if (icode != CODE_FOR_nothing)
> +           {
> +             emit_unop_insn (icode, to, from,
> +                             unsignedp ? ZERO_EXTEND : SIGN_EXTEND);
> +             return;
> +           }
> +       }
> +
> +      if (GET_MODE_UNIT_PRECISION (to_mode)
> +         < GET_MODE_UNIT_PRECISION (from_mode))
> +       {
> +         insn_code icode = convert_optab_handler (trunc_optab,
> +                                                  to_mode, from_mode);
> +         if (icode != CODE_FOR_nothing)
> +           {
> +             emit_unop_insn (icode, to, from, TRUNCATE);
> +             return;
> +           }
> +       }
> +
>        gcc_assert (known_eq (GET_MODE_BITSIZE (from_mode),
>                             GET_MODE_BITSIZE (to_mode)));
>
> Index: gcc/optabs-tree.c
> ===================================================================
> --- gcc/optabs-tree.c   2019-10-08 09:23:31.894529571 +0100
> +++ gcc/optabs-tree.c   2019-10-25 13:27:36.193616936 +0100
> @@ -303,6 +303,20 @@ supportable_convert_operation (enum tree
>        return true;
>      }
>
> +  if (GET_MODE_UNIT_PRECISION (m1) > GET_MODE_UNIT_PRECISION (m2)
> +      && can_extend_p (m1, m2, TYPE_UNSIGNED (vectype_in)))
> +    {
> +      *code1 = code;
> +      return true;
> +    }
> +
> +  if (GET_MODE_UNIT_PRECISION (m1) < GET_MODE_UNIT_PRECISION (m2)
> +      && convert_optab_handler (trunc_optab, m1, m2) != CODE_FOR_nothing)
> +    {
> +      *code1 = code;
> +      return true;
> +    }
> +
>    /* Now check for builtin.  */
>    if (targetm.vectorize.builtin_conversion
>        && targetm.vectorize.builtin_conversion (code, vectype_out, vectype_in))
> Index: gcc/config/aarch64/iterators.md
> ===================================================================
> --- gcc/config/aarch64/iterators.md     2019-10-17 14:23:07.711222242 +0100
> +++ gcc/config/aarch64/iterators.md     2019-10-25 13:27:36.189616964 +0100
> @@ -860,6 +860,8 @@ (define_mode_attr VNARROWQ [(V8HI "V8QI"
>                             (V2DI "V2SI")
>                             (DI   "SI")   (SI   "HI")
>                             (HI   "QI")])
> +(define_mode_attr Vnarrowq [(V8HI "v8qi") (V4SI "v4hi")
> +                           (V2DI "v2si")])
>
>  ;; Narrowed quad-modes for VQN (Used for XTN2).
>  (define_mode_attr VNARROWQ2 [(V8HI "V16QI") (V4SI "V8HI")
> Index: gcc/config/aarch64/aarch64-simd.md
> ===================================================================
> --- gcc/config/aarch64/aarch64-simd.md  2019-08-25 19:10:35.550157075 +0100
> +++ gcc/config/aarch64/aarch64-simd.md  2019-10-25 13:27:36.189616964 +0100
> @@ -7007,3 +7007,21 @@ (define_insn "aarch64_crypto_pmullv2di"
>    "pmull2\\t%0.1q, %1.2d, %2.2d"
>    [(set_attr "type" "crypto_pmull")]
>  )
> +
> +;; Sign- or zero-extend a 64-bit integer vector to a 128-bit vector.
> +(define_insn "<optab><Vnarrowq><mode>2"
> +  [(set (match_operand:VQN 0 "register_operand" "=w")
> +       (ANY_EXTEND:VQN (match_operand:<VNARROWQ> 1 "register_operand" "w")))]
> +  "TARGET_SIMD"
> +  "<su>xtl\t%0.<Vtype>, %1.<Vntype>"
> +  [(set_attr "type" "neon_shift_imm_long")]
> +)
> +
> +;; Truncate a 128-bit integer vector to a 64-bit vector.
> +(define_insn "trunc<mode><Vnarrowq>2"
> +  [(set (match_operand:<VNARROWQ> 0 "register_operand" "=w")
> +       (truncate:<VNARROWQ> (match_operand:VQN 1 "register_operand" "w")))]
> +  "TARGET_SIMD"
> +  "xtn\t%0.<Vntype>, %1.<Vtype>"
> +  [(set_attr "type" "neon_shift_imm_narrow_q")]
> +)
> Index: gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c     2019-03-08 18:15:02.252871290 +0000
> +++ gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c     2019-10-25 13:27:36.193616936 +0100
> @@ -46,4 +46,4 @@ int main (void)
>  }
>
>  /* Until we support multiple types in the inner loop  */
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail *-*-* } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { ! aarch64*-*-* } } } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c     2019-03-08 18:15:02.244871320 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c     2019-10-25 13:27:36.193616936 +0100
> @@ -52,5 +52,5 @@ int main ()
>
>  /* Vectorization of loops with multiple types and double reduction is not
>     supported yet.  */
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail { ! aarch64*-*-* } } } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-outer-4e.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-outer-4e.c   2019-03-08 18:15:02.264871246 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-outer-4e.c   2019-10-25 13:27:36.193616936 +0100
> @@ -23,4 +23,4 @@ foo (){
>    return;
>  }
>
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail { ! aarch64*-*-* } } } } */
> Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_5.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_5.c       2019-10-25 13:27:36.193616936 +0100
> @@ -0,0 +1,18 @@
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#pragma GCC target "+nosve"
> +
> +#include <stdint.h>
> +
> +void
> +f (int64_t *x, int64_t *y, int32_t *z, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      x[i] = z[i];
> +      y[i] += y[i - 2];
> +    }
> +}
> +
> +/* { dg-final { scan-assembler-times {\tsxtl\tv[0-9]+\.2d, v[0-9]+\.2s\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.2d,} 1 } } */
> Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_6.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_6.c       2019-10-25 13:27:36.193616936 +0100
> @@ -0,0 +1,18 @@
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#pragma GCC target "+nosve"
> +
> +#include <stdint.h>
> +
> +void
> +f (int32_t *x, int32_t *y, int16_t *z, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      x[i] = z[i];
> +      y[i] += y[i - 4];
> +    }
> +}
> +
> +/* { dg-final { scan-assembler-times {\tsxtl\tv[0-9]+\.4s, v[0-9]+\.4h\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.4s,} 1 } } */
> Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_7.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_7.c       2019-10-25 13:27:36.193616936 +0100
> @@ -0,0 +1,18 @@
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#pragma GCC target "+nosve"
> +
> +#include <stdint.h>
> +
> +void
> +f (int16_t *x, int16_t *y, int8_t *z, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      x[i] = z[i];
> +      y[i] += y[i - 8];
> +    }
> +}
> +
> +/* { dg-final { scan-assembler-times {\tsxtl\tv[0-9]+\.8h, v[0-9]+\.8b\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.8h,} 1 } } */
> Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_8.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_8.c       2019-10-25 13:27:36.193616936 +0100
> @@ -0,0 +1,18 @@
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#pragma GCC target "+nosve"
> +
> +#include <stdint.h>
> +
> +void
> +f (int64_t *x, int64_t *y, uint32_t *z, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      x[i] = z[i];
> +      y[i] += y[i - 2];
> +    }
> +}
> +
> +/* { dg-final { scan-assembler-times {\tuxtl\tv[0-9]+\.2d, v[0-9]+\.2s\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.2d,} 1 } } */
> Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_9.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_9.c       2019-10-25 13:27:36.193616936 +0100
> @@ -0,0 +1,18 @@
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#pragma GCC target "+nosve"
> +
> +#include <stdint.h>
> +
> +void
> +f (int32_t *x, int32_t *y, uint16_t *z, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      x[i] = z[i];
> +      y[i] += y[i - 4];
> +    }
> +}
> +
> +/* { dg-final { scan-assembler-times {\tuxtl\tv[0-9]+\.4s, v[0-9]+\.4h\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.4s,} 1 } } */
> Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_10.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_10.c      2019-10-25 13:27:36.193616936 +0100
> @@ -0,0 +1,18 @@
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#pragma GCC target "+nosve"
> +
> +#include <stdint.h>
> +
> +void
> +f (int16_t *x, int16_t *y, uint8_t *z, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      x[i] = z[i];
> +      y[i] += y[i - 8];
> +    }
> +}
> +
> +/* { dg-final { scan-assembler-times {\tuxtl\tv[0-9]+\.8h, v[0-9]+\.8b\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.8h,} 1 } } */
> Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_11.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_11.c      2019-10-25 13:27:36.193616936 +0100
> @@ -0,0 +1,18 @@
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#pragma GCC target "+nosve"
> +
> +#include <stdint.h>
> +
> +void
> +f (int32_t *x, int64_t *y, int64_t *z, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      x[i] = z[i];
> +      y[i] += y[i - 2];
> +    }
> +}
> +
> +/* { dg-final { scan-assembler-times {\txtn\tv[0-9]+\.2s, v[0-9]+\.2d\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.2d,} 1 } } */
> Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_12.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_12.c      2019-10-25 13:27:36.193616936 +0100
> @@ -0,0 +1,18 @@
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#pragma GCC target "+nosve"
> +
> +#include <stdint.h>
> +
> +void
> +f (int16_t *x, int32_t *y, int32_t *z, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      x[i] = z[i];
> +      y[i] += y[i - 4];
> +    }
> +}
> +
> +/* { dg-final { scan-assembler-times {\txtn\tv[0-9]+\.4h, v[0-9]+\.4s\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.4s,} 1 } } */
> Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_13.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_13.c      2019-10-25 13:27:36.193616936 +0100
> @@ -0,0 +1,18 @@
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#pragma GCC target "+nosve"
> +
> +#include <stdint.h>
> +
> +void
> +f (int8_t *x, int16_t *y, int16_t *z, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      x[i] = z[i];
> +      y[i] += y[i - 8];
> +    }
> +}
> +
> +/* { dg-final { scan-assembler-times {\txtn\tv[0-9]+\.8b, v[0-9]+\.8h\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.8h,} 1 } } */


* Re: [15/n] Consider building nodes from scalars in vect_slp_analyze_node_operations
  2019-10-29 17:05 ` [15/n] Consider building nodes from scalars in vect_slp_analyze_node_operations Richard Sandiford
@ 2019-11-05 13:07   ` Richard Biener
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Biener @ 2019-11-05 13:07 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: GCC Patches

On Tue, Oct 29, 2019 at 6:04 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> If the statements in an SLP node aren't similar enough to be vectorised,
> or aren't something the vectoriser has code to handle, the BB vectoriser
> tries building the vector from scalars instead.  This patch does the
> same thing if we're able to build a viable-looking tree but fail later
> during the analysis phase, e.g. because the target doesn't support a
> particular vector operation.
>
> This is needed to avoid regressions with a later patch.

OK.

Thanks,
Richard.

>
> 2019-10-29  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * tree-vect-slp.c (vect_contains_pattern_stmt_p): New function.
>         (vect_slp_convert_to_external): Likewise.
>         (vect_slp_analyze_node_operations): If analysis fails, try building
>         the node from scalars instead.
>
> gcc/testsuite/
>         * gcc.dg/vect/bb-slp-div-2.c: New test.
>
> Index: gcc/tree-vect-slp.c
> ===================================================================
> --- gcc/tree-vect-slp.c 2019-10-29 17:01:46.000000000 +0000
> +++ gcc/tree-vect-slp.c 2019-10-29 17:02:06.355512105 +0000
> @@ -225,6 +225,19 @@ vect_free_oprnd_info (vec<slp_oprnd_info
>  }
>
>
> +/* Return true if STMTS contains a pattern statement.  */
> +
> +static bool
> +vect_contains_pattern_stmt_p (vec<stmt_vec_info> stmts)
> +{
> +  stmt_vec_info stmt_info;
> +  unsigned int i;
> +  FOR_EACH_VEC_ELT (stmts, i, stmt_info)
> +    if (is_pattern_stmt_p (stmt_info))
> +      return true;
> +  return false;
> +}
> +
>  /* Find the place of the data-ref in STMT_INFO in the interleaving chain
>     that starts from FIRST_STMT_INFO.  Return -1 if the data-ref is not a part
>     of the chain.  */
> @@ -2630,6 +2643,39 @@ vect_slp_analyze_node_operations_1 (vec_
>    return vect_analyze_stmt (stmt_info, &dummy, node, node_instance, cost_vec);
>  }
>
> +/* Try to build NODE from scalars, returning true on success.
> +   NODE_INSTANCE is the SLP instance that contains NODE.  */
> +
> +static bool
> +vect_slp_convert_to_external (vec_info *vinfo, slp_tree node,
> +                             slp_instance node_instance)
> +{
> +  stmt_vec_info stmt_info;
> +  unsigned int i;
> +
> +  if (!is_a <bb_vec_info> (vinfo)
> +      || node == SLP_INSTANCE_TREE (node_instance)
> +      || vect_contains_pattern_stmt_p (SLP_TREE_SCALAR_STMTS (node)))
> +    return false;
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location,
> +                    "Building vector operands from scalars instead\n");
> +
> +  /* Don't remove and free the child nodes here, since they could be
> +     referenced by other structures.  The analysis and scheduling phases
> +     (need to) ignore child nodes of anything that isn't vect_internal_def.  */
> +  unsigned int group_size = SLP_TREE_SCALAR_STMTS (node).length ();
> +  SLP_TREE_DEF_TYPE (node) = vect_external_def;
> +  SLP_TREE_SCALAR_OPS (node).safe_grow (group_size);
> +  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
> +    {
> +      tree lhs = gimple_get_lhs (vect_orig_stmt (stmt_info)->stmt);
> +      SLP_TREE_SCALAR_OPS (node)[i] = lhs;
> +    }
> +  return true;
> +}
> +
>  /* Analyze statements contained in SLP tree NODE after recursively analyzing
>     the subtree.  NODE_INSTANCE contains NODE and VINFO contains INSTANCE.
>
> @@ -2656,6 +2702,13 @@ vect_slp_analyze_node_operations (vec_in
>      {
>        SLP_TREE_NUMBER_OF_VEC_STMTS (node)
>         = SLP_TREE_NUMBER_OF_VEC_STMTS (*leader);
> +      /* Cope with cases in which we made a late decision to build the
> +        node from scalars.  */
> +      if (SLP_TREE_DEF_TYPE (*leader) == vect_external_def
> +         && vect_slp_convert_to_external (vinfo, node, node_instance))
> +       ;
> +      else
> +       gcc_assert (SLP_TREE_DEF_TYPE (node) == SLP_TREE_DEF_TYPE (*leader));
>        return true;
>      }
>
> @@ -2715,6 +2768,11 @@ vect_slp_analyze_node_operations (vec_in
>      if (SLP_TREE_SCALAR_STMTS (child).length () != 0)
>        STMT_VINFO_DEF_TYPE (SLP_TREE_SCALAR_STMTS (child)[0]) = dt[j];
>
> +  /* If this node can't be vectorized, try pruning the tree here rather
> +     than felling the whole thing.  */
> +  if (!res && vect_slp_convert_to_external (vinfo, node, node_instance))
> +    res = true;
> +
>    return res;
>  }
>
> Index: gcc/testsuite/gcc.dg/vect/bb-slp-div-2.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.dg/vect/bb-slp-div-2.c    2019-10-29 17:02:06.351512133 +0000
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +
> +int x[4], y[4], z[4];
> +
> +void
> +f (void)
> +{
> +  x[0] += y[0] / z[0] * 2;
> +  x[1] += y[1] / z[1] * 2;
> +  x[2] += y[2] / z[2] * 2;
> +  x[3] += y[3] / z[3] * 2;
> +}
> +
> +/* { dg-final { scan-tree-dump "basic block vectorized" "slp2" { target vect_int } } } */
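[Editorial note: the pruning condition implemented by `vect_slp_convert_to_external` in the patch above can be sketched outside GCC's data structures. This is an illustrative model only, not GCC code: `Node` and the `instance_root` parameter are hypothetical stand-ins for `slp_tree` and `SLP_INSTANCE_TREE (node_instance)`.]

```python
# Illustrative model of the fallback in vect_slp_convert_to_external:
# an SLP node may be rebuilt from scalars only for BB vectorization,
# never for the root of an SLP instance, and never when the node
# contains pattern statements (which have no original scalar lhs to
# reuse).  Node is a hypothetical stand-in for slp_tree.

class Node:
    def __init__(self, stmts, is_pattern=None):
        self.stmts = stmts
        # flags marking which statements came from pattern recognition
        self.is_pattern = is_pattern or [False] * len(stmts)
        self.def_type = "internal"      # vect_internal_def
        self.scalar_ops = []

def convert_to_external(node, instance_root, is_bb_vectorization=True):
    """Return True if NODE was converted to an external (scalar-built) def."""
    if (not is_bb_vectorization
            or node is instance_root        # never prune the instance root
            or any(node.is_pattern)):       # pattern stmts block the fallback
        return False
    # Child nodes are left in place; later phases ignore children of
    # anything that isn't an internal def.
    node.def_type = "external"
    node.scalar_ops = [f"lhs_of({s})" for s in node.stmts]
    return True
```

In the real patch the same check runs in two places: when a node shares a leader whose analysis already chose the scalar fallback, and when analysis of the node itself fails, so that a single unvectorizable operation prunes one subtree instead of felling the whole instance.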


* Re: [16/n] Apply maximum nunits for BB SLP
  2019-10-29 17:14 ` [16/n] Apply maximum nunits for BB SLP Richard Sandiford
@ 2019-11-05 13:22   ` Richard Biener
  2019-11-05 14:09     ` Richard Sandiford
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Biener @ 2019-11-05 13:22 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: GCC Patches

On Tue, Oct 29, 2019 at 6:05 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> The BB vectoriser picked vector types in the same way as the loop
> vectoriser: it picked a vector mode/size for the region and then
> based all the vector types off that choice.  This meant we could
> end up trying to use vector types that had too many elements for
> the group size.
>
> The main part of this patch is therefore about passing the SLP
> group size down to routines like get_vectype_for_scalar_type and
> ensuring that each vector type in the SLP tree is chosen wrt the
> group size.  That part in itself is pretty easy and mechanical.
>
> The main warts are:
>
> (1) We normally pick a STMT_VINFO_VECTYPE for data references at an
>     early stage (vect_analyze_data_refs).  However, nothing in the
>     BB vectoriser relied on this, or on the min_vf calculated from it.
>     I couldn't see anything other than vect_recog_bool_pattern that
>     tried to access the vector type before the SLP tree is built.

So can you not set STMT_VINFO_VECTYPE for data refs with BB vectorization
then?

> (2) It's possible for the same statement to be used in the groups of
>     different sizes.  Taking the group size into account meant that
>     we could try to pick different vector types for the same statement.

That only happens when we have multiple SLP instances though
(entries into the shared SLP graph).  It probably makes sense to
keep handling SLP instances sharing stmts together for costing
reasons but one issue is that for disjunct pieces (in the same BB)
disqualifying one cost-wise disqualifies all.  So at some point
during analysis (which should eventually cover more than a single
BB) we want to split the graph.  It probably doesn't help the above
case.

>     This problem should go away with the move to doing everything on
>     SLP trees, where presumably we would attach the vector type to the
>     SLP node rather than the stmt_vec_info.  Until then, the patch just
>     uses a first-come, first-served approach.

Yeah, I ran into not having vectype on SLP trees with invariants/externals
as well.  I suppose you didn't try simply adding that to the SLP tree
and pushing/popping it like we push/pop the def type?

Assigning the vector types should really happen in vectorizable_*
and not during SLP build itself btw.

Your update-all-shared-vectypes thing looks quadratic to me :/

> (3) A similar problem exists for grouped data references, where
>     different statements in the same dataref group could be used
>     in SLP nodes that have different group sizes.  The patch copes
>     with that by making sure that all vector types in a dataref
>     group remain consistent.
>
> The patch means that:
>
>     void
>     f (int *x, short *y)
>     {
>       x[0] += y[0];
>       x[1] += y[1];
>       x[2] += y[2];
>       x[3] += y[3];
>     }
>
> now produces:
>
>         ldr     q0, [x0]
>         ldr     d1, [x1]
>         saddw   v0.4s, v0.4s, v1.4h
>         str     q0, [x0]
>         ret
>
> instead of:
>
>         ldrsh   w2, [x1]
>         ldrsh   w3, [x1, 2]
>         fmov    s0, w2
>         ldrsh   w2, [x1, 4]
>         ldrsh   w1, [x1, 6]
>         ins     v0.s[1], w3
>         ldr     q1, [x0]
>         ins     v0.s[2], w2
>         ins     v0.s[3], w1
>         add     v0.4s, v0.4s, v1.4s
>         str     q0, [x0]
>         ret

Nice.

> Unfortunately it also means we start to vectorise
> gcc.target/i386/pr84101.c for -m32.  That seems like a target
> cost issue though; see PR92265 for details.
>
>
> 2019-10-29  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * tree-vectorizer.h (vect_get_vector_types_for_stmt): Take an
>         optional maximum nunits.
>         (get_vectype_for_scalar_type): Likewise.  Also declare a form that
>         takes an slp_tree.
>         (get_mask_type_for_scalar_type): Take an optional slp_tree.
>         (vect_get_mask_type_for_stmt): Likewise.
>         * tree-vect-data-refs.c (vect_analyze_data_refs): Don't store
>         the vector type in STMT_VINFO_VECTYPE for BB vectorization.
>         * tree-vect-patterns.c (vect_recog_bool_pattern): Use
>         vect_get_vector_types_for_stmt instead of STMT_VINFO_VECTYPE
>         to get an assumed vector type for data references.
>         * tree-vect-slp.c (vect_update_shared_vectype): New function.
>         (vect_update_all_shared_vectypes): Likewise.
>         (vect_build_slp_tree_1): Pass the group size to
>         vect_get_vector_types_for_stmt.  Use vect_update_shared_vectype
>         for BB vectorization.
>         (vect_build_slp_tree_2): Call vect_update_all_shared_vectypes
>         before building the vector from scalars.
>         (vect_analyze_slp_instance): Pass the group size to
>         get_vectype_for_scalar_type.
>         (vect_slp_analyze_node_operations_1): Don't recompute the vector
>         types for BB vectorization here; just handle the case in which
>         we deferred the choice for booleans.
>         (vect_get_constant_vectors): Pass the slp_tree to
>         get_vectype_for_scalar_type.
>         * tree-vect-stmts.c (vect_prologue_cost_for_slp_op): Likewise.
>         (vectorizable_call): Likewise.
>         (vectorizable_simd_clone_call): Likewise.
>         (vectorizable_conversion): Likewise.
>         (vectorizable_shift): Likewise.
>         (vectorizable_operation): Likewise.
>         (vectorizable_comparison): Likewise.
>         (vect_is_simple_cond): Take the slp_tree as argument and
>         pass it to get_vectype_for_scalar_type.
>         (vectorizable_condition): Update call accordingly.
>         (get_vectype_for_scalar_type): Take a group_size argument.
>         For BB vectorization, limit the vector to that number
>         of elements.  Also define an overload that takes an slp_tree.
>         (get_mask_type_for_scalar_type): Add an slp_tree argument and
>         pass it to get_vectype_for_scalar_type.
>         (vect_get_vector_types_for_stmt): Add a group_size argument
>         and pass it to get_vectype_for_scalar_type.  Don't use the
>         cached vector type for BB vectorization if a group size is given.
>         Handle data references in that case.
>         (vect_get_mask_type_for_stmt): Take an slp_tree argument and
>         pass it to get_mask_type_for_scalar_type.
>
> gcc/testsuite/
>         * gcc.dg/vect/bb-slp-4.c: Expect the block to be vectorized
>         with -fno-vect-cost-model.
>         * gcc.dg/vect/bb-slp-bool-1.c: New test.
>         * gcc.target/aarch64/vect_mixed_sizes_14.c: Likewise.
>         * gcc.target/i386/pr84101.c: XFAIL for -m32.
>
> Index: gcc/tree-vectorizer.h
> ===================================================================
> --- gcc/tree-vectorizer.h       2019-10-29 17:01:42.835677274 +0000
> +++ gcc/tree-vectorizer.h       2019-10-29 17:02:09.883487330 +0000
> @@ -1598,8 +1598,9 @@ extern bool vect_can_advance_ivs_p (loop
>  /* In tree-vect-stmts.c.  */
>  extern tree get_related_vectype_for_scalar_type (machine_mode, tree,
>                                                  poly_uint64 = 0);
> -extern tree get_vectype_for_scalar_type (vec_info *, tree);
> -extern tree get_mask_type_for_scalar_type (vec_info *, tree);
> +extern tree get_vectype_for_scalar_type (vec_info *, tree, unsigned int = 0);
> +extern tree get_vectype_for_scalar_type (vec_info *, tree, slp_tree);
> +extern tree get_mask_type_for_scalar_type (vec_info *, tree, slp_tree = 0);
>  extern tree get_same_sized_vectype (tree, tree);
>  extern bool vect_get_loop_mask_type (loop_vec_info);
>  extern bool vect_is_simple_use (tree, vec_info *, enum vect_def_type *,
> @@ -1649,8 +1650,8 @@ extern void optimize_mask_stores (class
>  extern gcall *vect_gen_while (tree, tree, tree);
>  extern tree vect_gen_while_not (gimple_seq *, tree, tree, tree);
>  extern opt_result vect_get_vector_types_for_stmt (stmt_vec_info, tree *,
> -                                                 tree *);
> -extern opt_tree vect_get_mask_type_for_stmt (stmt_vec_info);
> +                                                 tree *, unsigned int = 0);
> +extern opt_tree vect_get_mask_type_for_stmt (stmt_vec_info, slp_tree = 0);
>
>  /* In tree-vect-data-refs.c.  */
>  extern bool vect_can_force_dr_alignment_p (const_tree, poly_uint64);
> Index: gcc/tree-vect-data-refs.c
> ===================================================================
> --- gcc/tree-vect-data-refs.c   2019-10-25 09:21:28.606327675 +0100
> +++ gcc/tree-vect-data-refs.c   2019-10-29 17:02:09.875487386 +0000
> @@ -4343,9 +4343,8 @@ vect_analyze_data_refs (vec_info *vinfo,
>
>        /* Set vectype for STMT.  */
>        scalar_type = TREE_TYPE (DR_REF (dr));
> -      STMT_VINFO_VECTYPE (stmt_info)
> -       = get_vectype_for_scalar_type (vinfo, scalar_type);
> -      if (!STMT_VINFO_VECTYPE (stmt_info))
> +      tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
> +      if (!vectype)
>          {
>            if (dump_enabled_p ())
>              {
> @@ -4378,14 +4377,19 @@ vect_analyze_data_refs (vec_info *vinfo,
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_NOTE, vect_location,
>                              "got vectype for stmt: %G%T\n",
> -                            stmt_info->stmt, STMT_VINFO_VECTYPE (stmt_info));
> +                            stmt_info->stmt, vectype);
>         }
>
>        /* Adjust the minimal vectorization factor according to the
>          vector type.  */
> -      vf = TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
> +      vf = TYPE_VECTOR_SUBPARTS (vectype);
>        *min_vf = upper_bound (*min_vf, vf);
>
> +      /* Leave the BB vectorizer to pick the vector type later, based on
> +        the final dataref group size and SLP node size.  */
> +      if (is_a <loop_vec_info> (vinfo))
> +       STMT_VINFO_VECTYPE (stmt_info) = vectype;
> +
>        if (gatherscatter != SG_NONE)
>         {
>           gather_scatter_info gs_info;
> Index: gcc/tree-vect-patterns.c
> ===================================================================
> --- gcc/tree-vect-patterns.c    2019-10-29 17:01:42.543679326 +0000
> +++ gcc/tree-vect-patterns.c    2019-10-29 17:02:09.879487358 +0000
> @@ -4153,9 +4153,10 @@ vect_recog_bool_pattern (stmt_vec_info s
>            && STMT_VINFO_DATA_REF (stmt_vinfo))
>      {
>        stmt_vec_info pattern_stmt_info;
> -      vectype = STMT_VINFO_VECTYPE (stmt_vinfo);
> -      gcc_assert (vectype != NULL_TREE);
> -      if (!VECTOR_MODE_P (TYPE_MODE (vectype)))
> +      tree nunits_vectype;
> +      if (!vect_get_vector_types_for_stmt (stmt_vinfo, &vectype,
> +                                          &nunits_vectype)
> +         || !VECTOR_MODE_P (TYPE_MODE (vectype)))
>         return NULL;
>
>        if (check_bool_pattern (var, vinfo, bool_stmts))
> Index: gcc/tree-vect-slp.c
> ===================================================================
> --- gcc/tree-vect-slp.c 2019-10-29 17:02:06.355512105 +0000
> +++ gcc/tree-vect-slp.c 2019-10-29 17:02:09.879487358 +0000
> @@ -601,6 +601,77 @@ vect_get_and_check_slp_defs (vec_info *v
>    return 0;
>  }
>
> +/* Try to assign vector type VECTYPE to STMT_INFO for BB vectorization.
> +   Return true if we can, meaning that this choice doesn't conflict with
> +   existing SLP nodes that use STMT_INFO.  */
> +
> +static bool
> +vect_update_shared_vectype (stmt_vec_info stmt_info, tree vectype)
> +{
> +  tree old_vectype = STMT_VINFO_VECTYPE (stmt_info);
> +  if (old_vectype && useless_type_conversion_p (vectype, old_vectype))
> +    return true;
> +
> +  if (STMT_VINFO_GROUPED_ACCESS (stmt_info)
> +      && DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info)))
> +    {
> +      /* We maintain the invariant that if any statement in the group is
> +        used, all other members of the group have the same vector type.  */
> +      stmt_vec_info first_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
> +      stmt_vec_info member_info = first_info;
> +      for (; member_info; member_info = DR_GROUP_NEXT_ELEMENT (member_info))
> +       if (STMT_VINFO_NUM_SLP_USES (member_info) > 0
> +           || is_pattern_stmt_p (member_info))
> +         break;
> +
> +      if (!member_info)
> +       {
> +         for (member_info = first_info; member_info;
> +              member_info = DR_GROUP_NEXT_ELEMENT (member_info))
> +           STMT_VINFO_VECTYPE (member_info) = vectype;
> +         return true;
> +       }
> +    }
> +  else if (STMT_VINFO_NUM_SLP_USES (stmt_info) == 0
> +          && !is_pattern_stmt_p (stmt_info))
> +    {
> +      STMT_VINFO_VECTYPE (stmt_info) = vectype;
> +      return true;
> +    }
> +
> +  if (dump_enabled_p ())
> +    {
> +      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                      "Build SLP failed: incompatible vector"
> +                      " types for: %G", stmt_info->stmt);
> +      dump_printf_loc (MSG_NOTE, vect_location,
> +                      "    old vector type: %T\n", old_vectype);
> +      dump_printf_loc (MSG_NOTE, vect_location,
> +                      "    new vector type: %T\n", vectype);
> +    }
> +  return false;
> +}
> +
> +/* Try to infer and assign a vector type to all the statements in STMTS.
> +   Used only for BB vectorization.  */
> +
> +static bool
> +vect_update_all_shared_vectypes (vec<stmt_vec_info> stmts)
> +{
> +  tree vectype, nunits_vectype;
> +  if (!vect_get_vector_types_for_stmt (stmts[0], &vectype,
> +                                      &nunits_vectype, stmts.length ()))
> +    return false;
> +
> +  stmt_vec_info stmt_info;
> +  unsigned int i;
> +  FOR_EACH_VEC_ELT (stmts, i, stmt_info)
> +    if (!vect_update_shared_vectype (stmt_info, vectype))
> +      return false;
> +
> +  return true;
> +}
> +
>  /* Return true if call statements CALL1 and CALL2 are similar enough
>     to be combined into the same SLP group.  */
>
> @@ -747,6 +818,7 @@ vect_build_slp_tree_1 (unsigned char *sw
>    stmt_vec_info stmt_info;
>    FOR_EACH_VEC_ELT (stmts, i, stmt_info)
>      {
> +      vec_info *vinfo = stmt_info->vinfo;
>        gimple *stmt = stmt_info->stmt;
>        swap[i] = 0;
>        matches[i] = false;
> @@ -780,7 +852,7 @@ vect_build_slp_tree_1 (unsigned char *sw
>
>        tree nunits_vectype;
>        if (!vect_get_vector_types_for_stmt (stmt_info, &vectype,
> -                                          &nunits_vectype)
> +                                          &nunits_vectype, group_size)
>           || (nunits_vectype
>               && !vect_record_max_nunits (stmt_info, group_size,
>                                           nunits_vectype, max_nunits)))
> @@ -792,6 +864,10 @@ vect_build_slp_tree_1 (unsigned char *sw
>
>        gcc_assert (vectype);
>
> +      if (is_a <bb_vec_info> (vinfo)
> +         && !vect_update_shared_vectype (stmt_info, vectype))
> +       continue;
> +
>        if (gcall *call_stmt = dyn_cast <gcall *> (stmt))
>         {
>           rhs_code = CALL_EXPR;
> @@ -1330,7 +1406,8 @@ vect_build_slp_tree_2 (vec_info *vinfo,
>               FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (child), j, grandchild)
>                 if (SLP_TREE_DEF_TYPE (grandchild) != vect_external_def)
>                   break;
> -             if (!grandchild)
> +             if (!grandchild
> +                 && vect_update_all_shared_vectypes (oprnd_info->def_stmts))
>                 {
>                   /* Roll back.  */
>                   this_tree_size = old_tree_size;
> @@ -1371,7 +1448,8 @@ vect_build_slp_tree_2 (vec_info *vinfo,
>              do extra work to cancel the pattern so the uses see the
>              scalar version.  */
>           && !is_pattern_stmt_p (stmt_info)
> -         && !oprnd_info->any_pattern)
> +         && !oprnd_info->any_pattern
> +         && vect_update_all_shared_vectypes (oprnd_info->def_stmts))
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_NOTE, vect_location,
> @@ -1468,7 +1546,9 @@ vect_build_slp_tree_2 (vec_info *vinfo,
>                   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (child), j, grandchild)
>                     if (SLP_TREE_DEF_TYPE (grandchild) != vect_external_def)
>                       break;
> -                 if (!grandchild)
> +                 if (!grandchild
> +                     && (vect_update_all_shared_vectypes
> +                         (oprnd_info->def_stmts)))
>                     {
>                       /* Roll back.  */
>                       this_tree_size = old_tree_size;
> @@ -2003,8 +2083,8 @@ vect_analyze_slp_instance (vec_info *vin
>    if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
>      {
>        scalar_type = TREE_TYPE (DR_REF (dr));
> -      vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
>        group_size = DR_GROUP_SIZE (stmt_info);
> +      vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
>      }
>    else if (!dr && REDUC_GROUP_FIRST_ELEMENT (stmt_info))
>      {
> @@ -2586,22 +2666,13 @@ vect_slp_analyze_node_operations_1 (vec_
>       Memory accesses already got their vector type assigned
>       in vect_analyze_data_refs.  */
>    bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
> -  if (bb_vinfo
> -      && ! STMT_VINFO_DATA_REF (stmt_info))
> +  if (bb_vinfo && STMT_VINFO_VECTYPE (stmt_info) == boolean_type_node)
>      {
> -      tree vectype, nunits_vectype;
> -      if (!vect_get_vector_types_for_stmt (stmt_info, &vectype,
> -                                          &nunits_vectype))
> -       /* We checked this when building the node.  */
> -       gcc_unreachable ();
> -      if (vectype == boolean_type_node)
> -       {
> -         vectype = vect_get_mask_type_for_stmt (stmt_info);
> -         if (!vectype)
> -           /* vect_get_mask_type_for_stmt has already explained the
> -              failure.  */
> -           return false;
> -       }
> +      tree vectype = vect_get_mask_type_for_stmt (stmt_info, node);
> +      if (!vectype)
> +       /* vect_get_mask_type_for_stmt has already explained the
> +          failure.  */
> +       return false;
>
>        stmt_vec_info sstmt_info;
>        unsigned int i;
> @@ -3475,7 +3546,7 @@ vect_get_constant_vectors (slp_tree op_n
>        && vect_mask_constant_operand_p (stmt_vinfo))
>      vector_type = truth_type_for (stmt_vectype);
>    else
> -    vector_type = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op));
> +    vector_type = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op), op_node);
>
>    unsigned int number_of_vectors
>      = vect_get_num_vectors (SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node)
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2019-10-29 17:01:42.951676460 +0000
> +++ gcc/tree-vect-stmts.c       2019-10-29 17:02:09.883487330 +0000
> @@ -783,7 +783,7 @@ vect_prologue_cost_for_slp_op (slp_tree
>    /* Without looking at the actual initializer a vector of
>       constants can be implemented as load from the constant pool.
>       When all elements are the same we can use a splat.  */
> -  tree vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op));
> +  tree vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op), node);
>    unsigned group_size = SLP_TREE_SCALAR_STMTS (node).length ();
>    unsigned num_vects_to_check;
>    unsigned HOST_WIDE_INT const_nunits;
> @@ -3290,7 +3290,7 @@ vectorizable_call (stmt_vec_info stmt_in
>    /* If all arguments are external or constant defs, infer the vector type
>       from the scalar type.  */
>    if (!vectype_in)
> -    vectype_in = get_vectype_for_scalar_type (vinfo, rhs_type);
> +    vectype_in = get_vectype_for_scalar_type (vinfo, rhs_type, slp_node);
>    if (vec_stmt)
>      gcc_assert (vectype_in);
>    if (!vectype_in)
> @@ -4066,7 +4066,8 @@ vectorizable_simd_clone_call (stmt_vec_i
>         && bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR)
>        {
>         tree arg_type = TREE_TYPE (gimple_call_arg (stmt, i));
> -       arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type);
> +       arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type,
> +                                                         slp_node);
>         if (arginfo[i].vectype == NULL
>             || (simd_clone_subparts (arginfo[i].vectype)
>                 > bestn->simdclone->simdlen))
> @@ -4782,7 +4783,7 @@ vectorizable_conversion (stmt_vec_info s
>    /* If op0 is an external or constant def, infer the vector type
>       from the scalar type.  */
>    if (!vectype_in)
> -    vectype_in = get_vectype_for_scalar_type (vinfo, rhs_type);
> +    vectype_in = get_vectype_for_scalar_type (vinfo, rhs_type, slp_node);
>    if (vec_stmt)
>      gcc_assert (vectype_in);
>    if (!vectype_in)
> @@ -5548,7 +5549,7 @@ vectorizable_shift (stmt_vec_info stmt_i
>    /* If op0 is an external or constant def, infer the vector type
>       from the scalar type.  */
>    if (!vectype)
> -    vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op0));
> +    vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op0), slp_node);
>    if (vec_stmt)
>      gcc_assert (vectype);
>    if (!vectype)
> @@ -5647,7 +5648,8 @@ vectorizable_shift (stmt_vec_info stmt_i
>                           "vector/vector shift/rotate found.\n");
>
>        if (!op1_vectype)
> -       op1_vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op1));
> +       op1_vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op1),
> +                                                  slp_node);
>        incompatible_op1_vectype_p
>         = (op1_vectype == NULL_TREE
>            || maybe_ne (TYPE_VECTOR_SUBPARTS (op1_vectype),
> @@ -5999,7 +6001,8 @@ vectorizable_operation (stmt_vec_info st
>           vectype = vectype_out;
>         }
>        else
> -       vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op0));
> +       vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op0),
> +                                              slp_node);
>      }
>    if (vec_stmt)
>      gcc_assert (vectype);
> @@ -9741,7 +9744,7 @@ vectorizable_load (stmt_vec_info stmt_in
>     condition operands are supportable using vec_is_simple_use.  */
>
>  static bool
> -vect_is_simple_cond (tree cond, vec_info *vinfo,
> +vect_is_simple_cond (tree cond, vec_info *vinfo, slp_tree slp_node,
>                      tree *comp_vectype, enum vect_def_type *dts,
>                      tree vectype)
>  {
> @@ -9805,7 +9808,8 @@ vect_is_simple_cond (tree cond, vec_info
>         scalar_type = build_nonstandard_integer_type
>           (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (vectype))),
>            TYPE_UNSIGNED (scalar_type));
> -      *comp_vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
> +      *comp_vectype = get_vectype_for_scalar_type (vinfo, scalar_type,
> +                                                  slp_node);
>      }
>
>    return true;
> @@ -9912,7 +9916,7 @@ vectorizable_condition (stmt_vec_info st
>    then_clause = gimple_assign_rhs2 (stmt);
>    else_clause = gimple_assign_rhs3 (stmt);
>
> -  if (!vect_is_simple_cond (cond_expr, stmt_info->vinfo,
> +  if (!vect_is_simple_cond (cond_expr, stmt_info->vinfo, slp_node,
>                             &comp_vectype, &dts[0], slp_node ? NULL : vectype)
>        || !comp_vectype)
>      return false;
> @@ -10391,7 +10395,8 @@ vectorizable_comparison (stmt_vec_info s
>    /* Invariant comparison.  */
>    if (!vectype)
>      {
> -      vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (rhs1));
> +      vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (rhs1),
> +                                            slp_node);
>        if (maybe_ne (TYPE_VECTOR_SUBPARTS (vectype), nunits))
>         return false;
>      }
> @@ -11199,27 +11204,87 @@ get_related_vectype_for_scalar_type (mac
>  /* Function get_vectype_for_scalar_type.
>
>     Returns the vector type corresponding to SCALAR_TYPE as supported
> -   by the target.  */
> +   by the target.  If GROUP_SIZE is nonzero and we're performing BB
> +   vectorization, make sure that the number of elements in the vector
> +   is no bigger than GROUP_SIZE.  */
>
>  tree
> -get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type)
> +get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type,
> +                            unsigned int group_size)
>  {
> +  /* For BB vectorization, we should always have a group size once we've
> +     constructed the SLP tree; the only valid uses of zero GROUP_SIZEs
> +     are tentative requests during things like early data reference
> +     analysis and pattern recognition.  */
> +  if (is_a <bb_vec_info> (vinfo))
> +    gcc_assert (vinfo->slp_instances.is_empty () || group_size != 0);
> +  else
> +    group_size = 0;
> +
>    tree vectype = get_related_vectype_for_scalar_type (vinfo->vector_mode,
>                                                       scalar_type);
>    if (vectype && vinfo->vector_mode == VOIDmode)
>      vinfo->vector_mode = TYPE_MODE (vectype);
> +
> +  /* If the natural choice of vector type doesn't satisfy GROUP_SIZE,
> +     try again with an explicit number of elements.  */
> +  if (vectype
> +      && group_size
> +      && maybe_ge (TYPE_VECTOR_SUBPARTS (vectype), group_size))
> +    {
> +      /* Start with the biggest number of units that fits within
> +        GROUP_SIZE and halve it until we find a valid vector type.
> +        Usually either the first attempt will succeed or all will
> +        fail (in the latter case because GROUP_SIZE is too small
> +        for the target), but it's possible that a target could have
> +        a hole between supported vector types.
> +
> +        If GROUP_SIZE is not a power of 2, this has the effect of
> +        trying the largest power of 2 that fits within the group,
> +        even though the group is not a multiple of that vector size.
> +        The BB vectorizer will then try to carve up the group into
> +        smaller pieces.  */
> +      unsigned int nunits = 1 << floor_log2 (group_size);
> +      do
> +       {
> +         vectype = get_related_vectype_for_scalar_type (vinfo->vector_mode,
> +                                                        scalar_type, nunits);
> +         nunits /= 2;
> +       }
> +      while (nunits > 1 && !vectype);
> +    }
>    return vectype;
>  }
>
> +/* Return the vector type corresponding to SCALAR_TYPE as supported
> +   by the target.  NODE, if nonnull, is the SLP tree node that will
> +   use the returned vector type.  */
> +
> +tree
> +get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type, slp_tree node)
> +{
> +  unsigned int group_size = 0;
> +  if (node)
> +    {
> +      group_size = SLP_TREE_SCALAR_OPS (node).length ();
> +      if (group_size == 0)
> +       group_size = SLP_TREE_SCALAR_STMTS (node).length ();
> +    }
> +  return get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
> +}
> +
>  /* Function get_mask_type_for_scalar_type.
>
>     Returns the mask type corresponding to a result of comparison
> -   of vectors of specified SCALAR_TYPE as supported by target.  */
> +   of vectors of specified SCALAR_TYPE as supported by target.
> +   NODE, if nonnull, is the SLP tree node that will use the returned
> +   vector type.  */
>
>  tree
> -get_mask_type_for_scalar_type (vec_info *vinfo, tree scalar_type)
> +get_mask_type_for_scalar_type (vec_info *vinfo, tree scalar_type,
> +                              slp_tree node)
>  {
> -  tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
> +  tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type, node);
>
>    if (!vectype)
>      return NULL;
> @@ -11892,6 +11957,9 @@ vect_gen_while_not (gimple_seq *seq, tre
>
>  /* Try to compute the vector types required to vectorize STMT_INFO,
>     returning true on success and false if vectorization isn't possible.
> +   If GROUP_SIZE is nonzero and we're performing BB vectorization,
> +   make sure that the number of elements in the vectors is no bigger
> +   than GROUP_SIZE.
>
>     On success:
>
> @@ -11909,11 +11977,21 @@ vect_gen_while_not (gimple_seq *seq, tre
>  opt_result
>  vect_get_vector_types_for_stmt (stmt_vec_info stmt_info,
>                                 tree *stmt_vectype_out,
> -                               tree *nunits_vectype_out)
> +                               tree *nunits_vectype_out,
> +                               unsigned int group_size)
>  {
>    vec_info *vinfo = stmt_info->vinfo;
>    gimple *stmt = stmt_info->stmt;
>
> +  /* For BB vectorization, we should always have a group size once we've
> +     constructed the SLP tree; the only valid uses of zero GROUP_SIZEs
> +     are tentative requests during things like early data reference
> +     analysis and pattern recognition.  */
> +  if (is_a <bb_vec_info> (vinfo))
> +    gcc_assert (vinfo->slp_instances.is_empty () || group_size != 0);
> +  else
> +    group_size = 0;
> +
>    *stmt_vectype_out = NULL_TREE;
>    *nunits_vectype_out = NULL_TREE;
>
> @@ -11944,7 +12022,7 @@ vect_get_vector_types_for_stmt (stmt_vec
>
>    tree vectype;
>    tree scalar_type = NULL_TREE;
> -  if (STMT_VINFO_VECTYPE (stmt_info))
> +  if (group_size == 0 && STMT_VINFO_VECTYPE (stmt_info))
>      {
>        *stmt_vectype_out = vectype = STMT_VINFO_VECTYPE (stmt_info);
>        if (dump_enabled_p ())
> @@ -11953,15 +12031,17 @@ vect_get_vector_types_for_stmt (stmt_vec
>      }
>    else
>      {
> -      gcc_assert (!STMT_VINFO_DATA_REF (stmt_info));
> -      if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
> +      if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
> +       scalar_type = TREE_TYPE (DR_REF (dr));
> +      else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
>         scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
>        else
>         scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
>
>        /* Pure bool ops don't participate in number-of-units computation.
>          For comparisons use the types being compared.  */
> -      if (VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type)
> +      if (!STMT_VINFO_DATA_REF (stmt_info)
> +         && VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type)
>           && is_gimple_assign (stmt)
>           && gimple_assign_rhs_code (stmt) != COND_EXPR)
>         {
> @@ -11981,9 +12061,16 @@ vect_get_vector_types_for_stmt (stmt_vec
>         }
>
>        if (dump_enabled_p ())
> -       dump_printf_loc (MSG_NOTE, vect_location,
> -                        "get vectype for scalar type: %T\n", scalar_type);
> -      vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
> +       {
> +         if (group_size)
> +           dump_printf_loc (MSG_NOTE, vect_location,
> +                            "get vectype for scalar type (group size %d):"
> +                            " %T\n", group_size, scalar_type);
> +         else
> +           dump_printf_loc (MSG_NOTE, vect_location,
> +                            "get vectype for scalar type: %T\n", scalar_type);
> +       }
> +      vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
>        if (!vectype)
>         return opt_result::failure_at (stmt,
>                                        "not vectorized:"
> @@ -12014,7 +12101,8 @@ vect_get_vector_types_for_stmt (stmt_vec
>             dump_printf_loc (MSG_NOTE, vect_location,
>                              "get vectype for smallest scalar type: %T\n",
>                              scalar_type);
> -         nunits_vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
> +         nunits_vectype = get_vectype_for_scalar_type (vinfo, scalar_type,
> +                                                       group_size);
>           if (!nunits_vectype)
>             return opt_result::failure_at
>               (stmt, "not vectorized: unsupported data-type %T\n",
> @@ -12042,10 +12130,11 @@ vect_get_vector_types_for_stmt (stmt_vec
>
>  /* Try to determine the correct vector type for STMT_INFO, which is a
>     statement that produces a scalar boolean result.  Return the vector
> -   type on success, otherwise return NULL_TREE.  */
> +   type on success, otherwise return NULL_TREE.  NODE, if nonnull,
> +   is the SLP tree node that will use the returned vector type.  */
>
>  opt_tree
> -vect_get_mask_type_for_stmt (stmt_vec_info stmt_info)
> +vect_get_mask_type_for_stmt (stmt_vec_info stmt_info, slp_tree node)
>  {
>    vec_info *vinfo = stmt_info->vinfo;
>    gimple *stmt = stmt_info->stmt;
> @@ -12057,7 +12146,7 @@ vect_get_mask_type_for_stmt (stmt_vec_in
>        && !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (stmt))))
>      {
>        scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
> -      mask_type = get_mask_type_for_scalar_type (vinfo, scalar_type);
> +      mask_type = get_mask_type_for_scalar_type (vinfo, scalar_type, node);
>
>        if (!mask_type)
>         return opt_tree::failure_at (stmt,
> Index: gcc/testsuite/gcc.dg/vect/bb-slp-4.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/bb-slp-4.c        2019-03-08 18:15:02.268871230 +0000
> +++ gcc/testsuite/gcc.dg/vect/bb-slp-4.c        2019-10-29 17:02:09.875487386 +0000
> @@ -38,5 +38,4 @@ int main (void)
>    return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "basic block vectorized" 0 "slp2" } } */
> -
> +/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp2" } } */
> Index: gcc/testsuite/gcc.dg/vect/bb-slp-bool-1.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.dg/vect/bb-slp-bool-1.c   2019-10-29 17:02:09.875487386 +0000
> @@ -0,0 +1,44 @@
> +#include "tree-vect.h"
> +
> +void __attribute__ ((noipa))
> +f1 (_Bool *x, unsigned short *y)
> +{
> +  x[0] = (y[0] == 1);
> +  x[1] = (y[1] == 1);
> +}
> +
> +void __attribute__ ((noipa))
> +f2 (_Bool *x, unsigned short *y)
> +{
> +  x[0] = (y[0] == 1);
> +  x[1] = (y[1] == 1);
> +  x[2] = (y[2] == 1);
> +  x[3] = (y[3] == 1);
> +  x[4] = (y[4] == 1);
> +  x[5] = (y[5] == 1);
> +  x[6] = (y[6] == 1);
> +  x[7] = (y[7] == 1);
> +}
> +
> +_Bool x[8];
> +unsigned short y[8] = { 11, 1, 9, 5, 1, 44, 1, 1 };
> +
> +int
> +main (void)
> +{
> +  check_vect ();
> +
> +  f1 (x, y);
> +
> +  if (x[0] || !x[1])
> +    __builtin_abort ();
> +
> +  x[1] = 0;
> +
> +  f2 (x, y);
> +
> +  if (x[0] || !x[1] || x[2] | x[3] || !x[4] || x[5] || !x[6] || !x[7])
> +    __builtin_abort ();
> +
> +  return 0;
> +}
> Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_14.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_14.c      2019-10-29 17:02:09.875487386 +0000
> @@ -0,0 +1,26 @@
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +/*
> +** foo:
> +** (
> +**     ldr     d([0-9]+), \[x1\]
> +**     ldr     q([0-9]+), \[x0\]
> +**     saddw   v([0-9]+)\.4s, v\2\.4s, v\1\.4h
> +**     str     q\3, \[x0\]
> +** |
> +**     ldr     q([0-9]+), \[x0\]
> +**     ldr     d([0-9]+), \[x1\]
> +**     saddw   v([0-9]+)\.4s, v\4\.4s, v\5\.4h
> +**     str     q\6, \[x0\]
> +** )
> +**     ret
> +*/
> +void
> +foo (int *x, short *y)
> +{
> +  x[0] += y[0];
> +  x[1] += y[1];
> +  x[2] += y[2];
> +  x[3] += y[3];
> +}
> Index: gcc/testsuite/gcc.target/i386/pr84101.c
> ===================================================================
> --- gcc/testsuite/gcc.target/i386/pr84101.c     2019-04-04 08:34:50.849942379 +0100
> +++ gcc/testsuite/gcc.target/i386/pr84101.c     2019-10-29 17:02:09.875487386 +0000
> @@ -18,4 +18,5 @@ uint64_pair_t pair(int num)
>    return p ;
>  }
>
> -/* { dg-final { scan-tree-dump-not "basic block vectorized" "slp2" } } */
> +/* See PR92266 for the XFAIL.  */
> +/* { dg-final { scan-tree-dump-not "basic block vectorized" "slp2" { xfail ilp32 } } } */
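
For readers skimming the hunk above: the do/while loop added to
get_vectype_for_scalar_type starts at the largest power of 2 that fits
within the group and halves until the target accepts an element count.
A standalone sketch of just that search follows; target_supports_nunits
is a made-up stand-in for the real target query, not a GCC hook:

```c
/* Made-up stand-in for "does the target support a vector with this
   many elements of the scalar type?".  This toy target supports
   4, 8 and 16 lanes, so a request that can only use 2 lanes fails,
   like the "GROUP_SIZE is too small for the target" case.  */
static int
target_supports_nunits (unsigned int nunits)
{
  return nunits == 4 || nunits == 8 || nunits == 16;
}

/* Sketch of the selection loop: NUNITS starts as
   1 << floor_log2 (GROUP_SIZE) and is halved until a supported
   count is found.  Returns 0 if nothing down to 2 lanes works,
   mirroring the patch returning a null vectype.  */
static unsigned int
choose_nunits (unsigned int group_size)
{
  unsigned int nunits = 1;
  while (nunits * 2 <= group_size)
    nunits *= 2;
  for (; nunits > 1; nunits /= 2)
    if (target_supports_nunits (nunits))
      return nunits;
  return 0;
}
```

Note that for a non-power-of-2 group (say 6) the first attempt is 4,
so on success the group is not a multiple of the vector size and the
BB vectoriser later carves it into smaller pieces, as the comment in
the patch says.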

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [16/n] Apply maximum nunits for BB SLP
  2019-11-05 13:22   ` Richard Biener
@ 2019-11-05 14:09     ` Richard Sandiford
  2019-11-14 12:22       ` Richard Biener
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Sandiford @ 2019-11-05 14:09 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Tue, Oct 29, 2019 at 6:05 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> The BB vectoriser picked vector types in the same way as the loop
>> vectoriser: it picked a vector mode/size for the region and then
>> based all the vector types off that choice.  This meant we could
>> end up trying to use vector types that had too many elements for
>> the group size.
>>
>> The main part of this patch is therefore about passing the SLP
>> group size down to routines like get_vectype_for_scalar_type and
>> ensuring that each vector type in the SLP tree is chosen wrt the
>> group size.  That part in itself is pretty easy and mechanical.
>>
>> The main warts are:
>>
>> (1) We normally pick a STMT_VINFO_VECTYPE for data references at an
>>     early stage (vect_analyze_data_refs).  However, nothing in the
>>     BB vectoriser relied on this, or on the min_vf calculated from it.
>>     I couldn't see anything other than vect_recog_bool_pattern that
>>     tried to access the vector type before the SLP tree is built.
>
> So can you not set STMT_VINFO_VECTYPE for data refs with BB vectorization
> then?

Yeah, the patch stops us from setting it during vect_analyze_data_refs.
We still need to set it later when building the SLP tree, just like
we do for other statements.

>> (2) It's possible for the same statement to be used in the groups of
>>     different sizes.  Taking the group size into account meant that
>>     we could try to pick different vector types for the same statement.
>
> That only happens when we have multiple SLP instances though
> (entries into the shared SLP graph).

Yeah.

> It probably makes sense to keep handling SLP instances sharing stmts
> together for costing reasons but one issue is that for disjunct pieces
> (in the same BB) disqualifying one cost-wise disqualifies all.  So at
> some point during analysis (which should eventually cover more than a
> single BB) we want to split the graph.  It probably doesn't help the
> above case.

Yeah, sounds like there are two issues: one with sharing stmt_vec_infos
between multiple SLP nodes, and one with sharing SLP child nodes between
multiple parent nodes.  (2) comes from the first, but I guess failing
based on costs is more about the second.

>>     This problem should go away with the move to doing everything on
>>     SLP trees, where presumably we would attach the vector type to the
>>     SLP node rather than the stmt_vec_info.  Until then, the patch just
>>     uses a first-come, first-served approach.
>
> Yeah, I ran into not having vectype on SLP trees with invariants/externals
> as well.  I suppose you didn't try simply adding that to the SLP tree
> and pushing/popping it like we push/pop the def type?

No, didn't try that.  Maybe it would be worth a go, but it seems like it
could be a rabbit hole.

> Assigning the vector types should really happen in vectorizable_*
> and not during SLP build itself btw.

Agree we need to improve the way this is handled, but delaying it
to vectorizable_* sounds quite late.  Maybe it should be a more global
decision, since the vector types for each vectorizable_* have to be
compatible and it's not obvious which routine should get first choice.

> Your update-all-shared-vectypes thing looks quadratic to me :/

Should be amortised linear.  The statements in a DR group always
have the same vectype.  When we want to change the vector type
of one statement, we change it for all statements if possible
or fail if we can't.
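
To make that concrete, here is a toy model of the first-come,
first-served update in plain C (not the actual GCC data structures;
group_id stands in for DR_GROUP membership, and the real function also
allows types that differ only in signedness): each call either finds
the group's vector type already set and merely checks compatibility,
or assigns the whole group in one pass.

```c
#include <string.h>

/* Toy stand-in for stmt_vec_info: statements with the same group_id
   model a DR group that must share one vector type.  */
struct stmt { int group_id; const char *vectype; };

/* Mirrors the shape of vect_update_shared_vectype: if the group
   already has a vector type, succeed only if the new one matches;
   otherwise set the type on every member of the group at once.
   Returns 1 on success, 0 on an incompatibility failure.  */
static int
update_shared_vectype (struct stmt *stmts, int n, int group_id,
		       const char *vectype)
{
  for (int i = 0; i < n; i++)
    if (stmts[i].group_id == group_id && stmts[i].vectype)
      return strcmp (stmts[i].vectype, vectype) == 0;
  for (int i = 0; i < n; i++)
    if (stmts[i].group_id == group_id)
      stmts[i].vectype = vectype;
  return 1;
}
```

Each group is walked in full at most once to set the type; later calls
for the same group stop at the first member, which is why the total
work stays linear in the number of statements.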

Thanks,
Richard

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [10/n] Make less use of get_same_sized_vectype
  2019-11-05 12:50   ` Richard Biener
@ 2019-11-05 15:34     ` Richard Sandiford
  2019-11-05 16:09       ` Richard Biener
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Sandiford @ 2019-11-05 15:34 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Fri, Oct 25, 2019 at 2:41 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Some callers of get_same_sized_vectype were dealing with operands that
>> are constant or defined externally, and so have no STMT_VINFO_VECTYPE
>> available.  Under the current model, using get_same_sized_vectype for
>> that case is equivalent to using get_vectype_for_scalar_type, since
>> get_vectype_for_scalar_type always returns vectors of the same size,
>> once a size is fixed.
>>
>> Using get_vectype_for_scalar_type is arguably more obvious though:
>> if we're using the same scalar type as we would for internal
>> definitions, we should use the same vector type too.  (Constant and
>> external definitions sometimes let us change the original scalar type
>> to a "nicer" scalar type, but that isn't what's happening here.)
>>
>> This is a prerequisite to supporting multiple vector sizes in the same
>> vec_info.
>
> This might change the actual type we get back, IIRC we mass-changed
> it in the opposite direction from your change in the past, because it's
> more obvious to relate the type used to another vector type on the
> stmt.  So isn't it better to use the new related_vector_type thing here?

I guess this is a downside of the patch order.  Hopefully this looks
like a more sensible decision after 16/n, where we also pass the
group size to get_vectype_for_scalar_type.  If not: :-)

At the moment, there can only ever be one vector type for a given scalar
type within a vector loop.  We don't e.g. allow one loop vector stmt to
use V2SI and another to V4SI, because all vectorised SIs have to be
compatible in the same way that the original scalar SIs were.  So once
we have a loop_vec_info and once we have a scalar type, there's only one
valid choice of vector type.  I think trying to circumvent that by using
get_related_vectype_for_scalar_type instead of get_vectype_for_scalar_type
would run the risk of introducing accidental mismatches.  I.e. if we do
it right, calling get_related_vectype_for_scalar_type would give the same
result as calling get_vectype_for_scalar_type (and so we might as well
just call get_vectype_for_scalar_type).  If we do it wrong we can end up
with a different type from the one that other statements were expecting.

BB vectorisation currently works the same way.  But after 16/n, there
is instead one vector type for each (bb_vinfo, scalar_type, group_size)
triple.  So that patch makes it possible for different SLP instances to
have different vector types for the same scalar type, but the choice is
still fixed within an SLP instance, in a similar way to loop
vectorisation.

So I think using anything other than get_vectype_for_scalar_type
would give a false sense of freedom.  Calling it directly is mostly
useful for temporaries, e.g. in epilogue reduction handling.
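
As a rough illustration of the difference (a hypothetical 128-bit
target with made-up helper names, not the real hooks): in a loop the
scalar type alone fixes the element count, while for BB SLP after 16/n
the group size also participates in the choice.

```c
/* Hypothetical 128-bit target: the natural vector for a scalar of
   SCALAR_BITS has 128 / SCALAR_BITS elements.  */
static unsigned int
natural_nunits (unsigned int scalar_bits)
{
  return 128 / scalar_bits;
}

/* GROUP_SIZE == 0 models loop vectorisation: the scalar type alone
   determines the vector type.  A nonzero GROUP_SIZE models BB SLP:
   if the natural choice has at least GROUP_SIZE elements, cap the
   count at the largest power of 2 that fits within the group.  */
static unsigned int
chosen_nunits (unsigned int scalar_bits, unsigned int group_size)
{
  unsigned int nunits = natural_nunits (scalar_bits);
  if (group_size == 0 || nunits < group_size)
    return nunits;
  unsigned int capped = 1;
  while (capped * 2 <= group_size)
    capped *= 2;
  return capped;
}
```

So two BB SLP instances with group sizes 2 and 4 can legitimately get
V2SI and V4SI for the same SImode scalar, but within one instance the
choice stays fixed.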

Thanks,
Richard

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [10/n] Make less use of get_same_sized_vectype
  2019-11-05 15:34     ` Richard Sandiford
@ 2019-11-05 16:09       ` Richard Biener
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Biener @ 2019-11-05 16:09 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: GCC Patches

On Tue, Nov 5, 2019 at 4:34 PM Richard Sandiford
<Richard.Sandiford@arm.com> wrote:
>
> Richard Biener <richard.guenther@gmail.com> writes:
> > On Fri, Oct 25, 2019 at 2:41 PM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> Some callers of get_same_sized_vectype were dealing with operands that
> >> are constant or defined externally, and so have no STMT_VINFO_VECTYPE
> >> available.  Under the current model, using get_same_sized_vectype for
> >> that case is equivalent to using get_vectype_for_scalar_type, since
> >> get_vectype_for_scalar_type always returns vectors of the same size,
> >> once a size is fixed.
> >>
> >> Using get_vectype_for_scalar_type is arguably more obvious though:
> >> if we're using the same scalar type as we would for internal
> >> definitions, we should use the same vector type too.  (Constant and
> >> external definitions sometimes let us change the original scalar type
> >> to a "nicer" scalar type, but that isn't what's happening here.)
> >>
> >> This is a prerequisite to supporting multiple vector sizes in the same
> >> vec_info.
> >
> > This might change the actual type we get back, IIRC we mass-changed
> > it in the opposite direction from your change in the past, because it's
> > more obvious to relate the type used to another vector type on the
> > stmt.  So isn't it better to use the new related_vector_type thing here?
>
> I guess this is a downside of the patch order.  Hopefully this looks
> like a more sensible decision after 16/n, where we also pass the
> group size to get_vectype_for_scalar_type.  If not: :-)
>
> At the moment, there can only ever be one vector type for a given scalar
> type within a vector loop.  We don't e.g. allow one loop vector stmt to
> use V2SI and another to V4SI, because all vectorised SIs have to be
> compatible in the same way that the original scalar SIs were.  So once
> we have a loop_vec_info and once we have a scalar type, there's only one
> valid choice of vector type.  I think trying to circumvent that by using
> get_related_vectype_for_scalar_type instead of get_vectype_for_scalar_type
> would run the risk of introducing accidental mismatches.  I.e. if we do
> it right, calling get_related_vectype_for_scalar_type would give the same
> result as calling get_vectype_for_scalar_type (and so we might as well
> just call get_vectype_for_scalar_type).  If we do it wrong we can end up
> with a different type from the one that other statements were expecting.
>
> BB vectorisation currently works the same way.  But after 16/n, there
> is instead one vector type for each (bb_vinfo, scalar_type, group_size)
> triple.  So that patch makes it possible for different SLP instances to
> have different vector types for the same scalar type, but the choice is
> still fixed within an SLP instance, in a similar way to loop
> vectorisation.
>
> So I think using anything other than get_vectype_for_scalar_type
> would give a false sense of freedom.  Calling it directly is mostly
> useful for temporaries, e.g. in epilogue reduction handling.

OK, I guess we'll see how it plays out at the end of the series(es).

What I think should be there in the end is vectorizable_* asking
for the vector type that matches what they implement in terms
of the operation.  Say, if it is a PLUS and one operand has a
vector type set in the internal_def definition then the constant/external_def
should get the very same vector type.  That is, ideally those
routines would _not_ call "get me a vector type" but simply compute
the required type.  That is, vectorizable_operation should not need to call
these functions at all unless we run into a completely invariant operation
(in which case we might as well fail vectorization).  I see it doesn't
care about the vector type of op1/2, which is probably a bug since
with multiple sizes we may get V2SI on one op and V4SI on another?

Richard.

> Thanks,
> Richard

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [10a/n] Require equal type sizes for vectorised calls
  2019-10-25 12:32 [0/n] Support multiple vector sizes for vectorisation Richard Sandiford
                   ` (10 preceding siblings ...)
  2019-10-29 17:14 ` [16/n] Apply maximum nunits for BB SLP Richard Sandiford
@ 2019-11-05 20:10 ` Richard Sandiford
  2019-11-06  9:44   ` Richard Biener
  2019-11-05 20:25 ` [11a/n] Avoid retrying with the same vector modes Richard Sandiford
  2019-11-05 20:45 ` [17/17] Extend can_duplicate_and_interleave_p to mixed-size vectors Richard Sandiford
  13 siblings, 1 reply; 48+ messages in thread
From: Richard Sandiford @ 2019-11-05 20:10 UTC (permalink / raw)
  To: gcc-patches

As explained in the comment, vectorizable_call needs more work to
support mixtures of sizes.  This avoids testsuite fallout for
later SVE patches.

This was originally going to be later in the series, but applying it
before 11/n seems safer.  As before, each patch was tested individually
on aarch64-linux-gnu and the series as a whole on x86_64-linux-gnu.


2019-11-04  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* tree-vect-stmts.c (vectorizable_call): Require the types
	to have the same size.

Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2019-11-05 10:38:50.718047381 +0000
+++ gcc/tree-vect-stmts.c	2019-11-05 10:38:55.542013228 +0000
@@ -3317,6 +3317,19 @@ vectorizable_call (stmt_vec_info stmt_in
 
       return false;
     }
+  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
+     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
+     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
+     by a pack of the two vectors into an SI vector.  We would need
+     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
+  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "mismatched vector sizes %T and %T\n",
+			 vectype_in, vectype_out);
+      return false;
+    }
 
   /* FORNOW */
   nunits_in = TYPE_VECTOR_SUBPARTS (vectype_in);

* [11a/n] Avoid retrying with the same vector modes
  2019-10-25 12:32 [0/n] Support multiple vector sizes for vectorisation Richard Sandiford
                   ` (11 preceding siblings ...)
  2019-11-05 20:10 ` [10a/n] Require equal type sizes for vectorised calls Richard Sandiford
@ 2019-11-05 20:25 ` Richard Sandiford
  2019-11-06  9:49   ` Richard Biener
  2019-11-05 20:45 ` [17/17] Extend can_duplicate_and_interleave_p to mixed-size vectors Richard Sandiford
  13 siblings, 1 reply; 48+ messages in thread
From: Richard Sandiford @ 2019-11-05 20:25 UTC (permalink / raw)
  To: gcc-patches

Patch 12/n makes the AArch64 port add four entries to
autovectorize_vector_modes.  Each entry describes a different
vector mode assignment for vector code that mixes 8-bit, 16-bit,
32-bit and 64-bit elements.  But if (as usual) the vector code has
fewer element sizes than that, we could end up trying the same
combination of vector modes multiple times.  This patch adds a
check to prevent that.

As before: each patch tested individually on aarch64-linux-gnu and the
series as a whole on x86_64-linux-gnu.


2019-11-04  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* tree-vectorizer.h (vec_info::mode_set): New typedef.
	(vec_info::used_vector_modes): New member variable.
	(vect_chooses_same_modes_p): Declare.
	* tree-vect-stmts.c (get_vectype_for_scalar_type): Record each
	chosen vector mode in vec_info::used_vector_mode.
	(vect_chooses_same_modes_p): New function.
	* tree-vect-loop.c (vect_analyze_loop): Use it to avoid trying
	the same vector statements multiple times.
	* tree-vect-slp.c (vect_slp_bb_region): Likewise.

Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	2019-11-05 10:48:11.246092351 +0000
+++ gcc/tree-vectorizer.h	2019-11-05 10:57:41.662071145 +0000
@@ -298,6 +298,7 @@ typedef std::pair<tree, tree> vec_object
 /* Vectorizer state common between loop and basic-block vectorization.  */
 class vec_info {
 public:
+  typedef hash_set<int_hash<machine_mode, E_VOIDmode, E_BLKmode> > mode_set;
   enum vec_kind { bb, loop };
 
   vec_info (vec_kind, void *, vec_info_shared *);
@@ -335,6 +336,9 @@ typedef std::pair<tree, tree> vec_object
   /* Cost data used by the target cost model.  */
   void *target_cost_data;
 
+  /* The set of vector modes used in the vectorized region.  */
+  mode_set used_vector_modes;
+
   /* The argument we should pass to related_vector_mode when looking up
      the vector mode for a scalar mode, or VOIDmode if we haven't yet
      made any decisions about which vector modes to use.  */
@@ -1615,6 +1619,7 @@ extern tree get_related_vectype_for_scal
 extern tree get_vectype_for_scalar_type (vec_info *, tree);
 extern tree get_mask_type_for_scalar_type (vec_info *, tree);
 extern tree get_same_sized_vectype (tree, tree);
+extern bool vect_chooses_same_modes_p (vec_info *, machine_mode);
 extern bool vect_get_loop_mask_type (loop_vec_info);
 extern bool vect_is_simple_use (tree, vec_info *, enum vect_def_type *,
 				stmt_vec_info * = NULL, gimple ** = NULL);
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2019-11-05 10:48:11.242092379 +0000
+++ gcc/tree-vect-stmts.c	2019-11-05 10:57:41.662071145 +0000
@@ -11235,6 +11235,10 @@ get_vectype_for_scalar_type (vec_info *v
 						      scalar_type);
   if (vectype && vinfo->vector_mode == VOIDmode)
     vinfo->vector_mode = TYPE_MODE (vectype);
+
+  if (vectype)
+    vinfo->used_vector_modes.add (TYPE_MODE (vectype));
+
   return vectype;
 }
 
@@ -11274,6 +11278,20 @@ get_same_sized_vectype (tree scalar_type
 					      scalar_type, nunits);
 }
 
+/* Return true if replacing LOOP_VINFO->vector_mode with VECTOR_MODE
+   would not change the chosen vector modes.  */
+
+bool
+vect_chooses_same_modes_p (vec_info *vinfo, machine_mode vector_mode)
+{
+  for (vec_info::mode_set::iterator i = vinfo->used_vector_modes.begin ();
+       i != vinfo->used_vector_modes.end (); ++i)
+    if (!VECTOR_MODE_P (*i)
+	|| related_vector_mode (vector_mode, GET_MODE_INNER (*i), 0) != *i)
+      return false;
+  return true;
+}
+
 /* Function vect_is_simple_use.
 
    Input:
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2019-11-05 10:48:11.238092407 +0000
+++ gcc/tree-vect-loop.c	2019-11-05 10:57:41.658071173 +0000
@@ -2430,6 +2430,19 @@ vect_analyze_loop (class loop *loop, vec
 	}
 
       loop->aux = NULL;
+
+      if (!fatal)
+	while (mode_i < vector_modes.length ()
+	       && vect_chooses_same_modes_p (loop_vinfo, vector_modes[mode_i]))
+	  {
+	    if (dump_enabled_p ())
+	      dump_printf_loc (MSG_NOTE, vect_location,
+			       "***** The result for vector mode %s would"
+			       " be the same\n",
+			       GET_MODE_NAME (vector_modes[mode_i]));
+	    mode_i += 1;
+	  }
+
       if (res)
 	{
 	  LOOP_VINFO_VECTORIZABLE_P (loop_vinfo) = 1;
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2019-11-05 10:48:11.242092379 +0000
+++ gcc/tree-vect-slp.c	2019-11-05 10:57:41.662071145 +0000
@@ -3238,6 +3238,18 @@ vect_slp_bb_region (gimple_stmt_iterator
       if (mode_i == 0)
 	autodetected_vector_mode = bb_vinfo->vector_mode;
 
+      if (!fatal)
+	while (mode_i < vector_modes.length ()
+	       && vect_chooses_same_modes_p (bb_vinfo, vector_modes[mode_i]))
+	  {
+	    if (dump_enabled_p ())
+	      dump_printf_loc (MSG_NOTE, vect_location,
+			       "***** The result for vector mode %s would"
+			       " be the same\n",
+			       GET_MODE_NAME (vector_modes[mode_i]));
+	    mode_i += 1;
+	  }
+
       delete bb_vinfo;
 
       if (mode_i < vector_modes.length ()

* [17/17] Extend can_duplicate_and_interleave_p to mixed-size vectors
  2019-10-25 12:32 [0/n] Support multiple vector sizes for vectorisation Richard Sandiford
                   ` (12 preceding siblings ...)
  2019-11-05 20:25 ` [11a/n] Avoid retrying with the same vector modes Richard Sandiford
@ 2019-11-05 20:45 ` Richard Sandiford
  2019-11-14 12:23   ` Richard Biener
  13 siblings, 1 reply; 48+ messages in thread
From: Richard Sandiford @ 2019-11-05 20:45 UTC (permalink / raw)
  To: gcc-patches

This patch makes can_duplicate_and_interleave_p cope with mixtures of
vector sizes, by using queries based on get_vectype_for_scalar_type
instead of directly querying GET_MODE_SIZE (vinfo->vector_mode).

int_mode_for_size is now the first check we do for a candidate mode,
so it seemed better to restrict it to MAX_FIXED_MODE_SIZE.  This avoids
unnecessary work and avoids trying to create scalar types that the
target might not support.

This is the final patch in the series.  As before, each patch was tested
individually on aarch64-linux-gnu and the series as a whole on
x86_64-linux-gnu.


2019-11-04  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* tree-vectorizer.h (can_duplicate_and_interleave_p): Take an
	element type rather than an element mode.
	* tree-vect-slp.c (can_duplicate_and_interleave_p): Likewise.
	Use get_vectype_for_scalar_type to query the natural types
	for a given element type rather than basing everything on
	GET_MODE_SIZE (vinfo->vector_mode).  Limit int_mode_for_size
	query to MAX_FIXED_MODE_SIZE.
	(duplicate_and_interleave): Update call accordingly.
	* tree-vect-loop.c (vectorizable_reduction): Likewise.

Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	2019-11-05 11:08:12.521631453 +0000
+++ gcc/tree-vectorizer.h	2019-11-05 11:14:42.786884473 +0000
@@ -1779,8 +1779,7 @@ extern void vect_get_slp_defs (slp_tree,
 extern bool vect_slp_bb (basic_block);
 extern stmt_vec_info vect_find_last_scalar_stmt_in_slp (slp_tree);
 extern bool is_simple_and_all_uses_invariant (stmt_vec_info, loop_vec_info);
-extern bool can_duplicate_and_interleave_p (vec_info *, unsigned int,
-					    machine_mode,
+extern bool can_duplicate_and_interleave_p (vec_info *, unsigned int, tree,
 					    unsigned int * = NULL,
 					    tree * = NULL, tree * = NULL);
 extern void duplicate_and_interleave (vec_info *, gimple_seq *, tree,
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2019-11-05 11:08:12.517631481 +0000
+++ gcc/tree-vect-slp.c	2019-11-05 11:14:42.786884473 +0000
@@ -265,7 +265,7 @@ vect_get_place_in_interleaving_chain (st
   return -1;
 }
 
-/* Check whether it is possible to load COUNT elements of type ELT_MODE
+/* Check whether it is possible to load COUNT elements of type ELT_TYPE
    using the method implemented by duplicate_and_interleave.  Return true
    if so, returning the number of intermediate vectors in *NVECTORS_OUT
    (if nonnull) and the type of each intermediate vector in *VECTOR_TYPE_OUT
@@ -273,26 +273,37 @@ vect_get_place_in_interleaving_chain (st
 
 bool
 can_duplicate_and_interleave_p (vec_info *vinfo, unsigned int count,
-				machine_mode elt_mode,
-				unsigned int *nvectors_out,
+				tree elt_type, unsigned int *nvectors_out,
 				tree *vector_type_out,
 				tree *permutes)
 {
-  poly_int64 elt_bytes = count * GET_MODE_SIZE (elt_mode);
-  poly_int64 nelts;
+  tree base_vector_type = get_vectype_for_scalar_type (vinfo, elt_type, count);
+  if (!base_vector_type || !VECTOR_MODE_P (TYPE_MODE (base_vector_type)))
+    return false;
+
+  machine_mode base_vector_mode = TYPE_MODE (base_vector_type);
+  poly_int64 elt_bytes = count * GET_MODE_UNIT_SIZE (base_vector_mode);
   unsigned int nvectors = 1;
   for (;;)
     {
       scalar_int_mode int_mode;
       poly_int64 elt_bits = elt_bytes * BITS_PER_UNIT;
-      if (multiple_p (GET_MODE_SIZE (vinfo->vector_mode), elt_bytes, &nelts)
-	  && int_mode_for_size (elt_bits, 0).exists (&int_mode))
+      if (int_mode_for_size (elt_bits, 1).exists (&int_mode))
 	{
+	  /* Get the natural vector type for this SLP group size.  */
 	  tree int_type = build_nonstandard_integer_type
 	    (GET_MODE_BITSIZE (int_mode), 1);
-	  tree vector_type = build_vector_type (int_type, nelts);
-	  if (VECTOR_MODE_P (TYPE_MODE (vector_type)))
-	    {
+	  tree vector_type
+	    = get_vectype_for_scalar_type (vinfo, int_type, count);
+	  if (vector_type
+	      && VECTOR_MODE_P (TYPE_MODE (vector_type))
+	      && known_eq (GET_MODE_SIZE (TYPE_MODE (vector_type)),
+			   GET_MODE_SIZE (base_vector_mode)))
+	    {
+	      /* Try fusing consecutive sequences of COUNT / NVECTORS elements
+		 together into elements of type INT_TYPE and using the result
+		 to build NVECTORS vectors.  */
+	      poly_uint64 nelts = GET_MODE_NUNITS (TYPE_MODE (vector_type));
 	      vec_perm_builder sel1 (nelts, 2, 3);
 	      vec_perm_builder sel2 (nelts, 2, 3);
 	      poly_int64 half_nelts = exact_div (nelts, 2);
@@ -492,7 +503,7 @@ vect_get_and_check_slp_defs (vec_info *v
 	      && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
 	      && (TREE_CODE (type) == BOOLEAN_TYPE
 		  || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
-						      TYPE_MODE (type))))
+						      type)))
 	    {
 	      if (dump_enabled_p ())
 		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -3551,7 +3562,7 @@ duplicate_and_interleave (vec_info *vinf
   unsigned int nvectors = 1;
   tree new_vector_type;
   tree permutes[2];
-  if (!can_duplicate_and_interleave_p (vinfo, nelts, TYPE_MODE (element_type),
+  if (!can_duplicate_and_interleave_p (vinfo, nelts, element_type,
 				       &nvectors, &new_vector_type,
 				       permutes))
     gcc_unreachable ();
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2019-11-05 10:57:41.658071173 +0000
+++ gcc/tree-vect-loop.c	2019-11-05 11:14:42.782884501 +0000
@@ -6288,10 +6288,9 @@ vectorizable_reduction (stmt_vec_info st
 	 that value needs to be repeated for every instance of the
 	 statement within the initial vector.  */
       unsigned int group_size = SLP_INSTANCE_GROUP_SIZE (slp_node_instance);
-      scalar_mode elt_mode = SCALAR_TYPE_MODE (TREE_TYPE (vectype_out));
       if (!neutral_op
 	  && !can_duplicate_and_interleave_p (loop_vinfo, group_size,
-					      elt_mode))
+					      TREE_TYPE (vectype_out)))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

* Re: [10a/n] Require equal type sizes for vectorised calls
  2019-11-05 20:10 ` [10a/n] Require equal type sizes for vectorised calls Richard Sandiford
@ 2019-11-06  9:44   ` Richard Biener
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Biener @ 2019-11-06  9:44 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: GCC Patches

On Tue, Nov 5, 2019 at 9:10 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> As explained in the comment, vectorizable_call needs more work to
> support mixtures of sizes.  This avoids testsuite fallout for
> later SVE patches.
>
> This was originally going to be later in the series, but applying it
> before 11/n seems safer.  As before, each patch was tested individually
> on aarch64-linux-gnu and the series as a whole on x86_64-linux-gnu.

OK.

>
> 2019-11-04  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * tree-vect-stmts.c (vectorizable_call): Require the types
>         to have the same size.
>
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2019-11-05 10:38:50.718047381 +0000
> +++ gcc/tree-vect-stmts.c       2019-11-05 10:38:55.542013228 +0000
> @@ -3317,6 +3317,19 @@ vectorizable_call (stmt_vec_info stmt_in
>
>        return false;
>      }
> +  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
> +     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
> +     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
> +     by a pack of the two vectors into an SI vector.  We would need
> +     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
> +  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
> +    {
> +      if (dump_enabled_p ())
> +       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                        "mismatched vector sizes %T and %T\n",
> +                        vectype_in, vectype_out);
> +      return false;
> +    }
>
>    /* FORNOW */
>    nunits_in = TYPE_VECTOR_SUBPARTS (vectype_in);

* Re: [11a/n] Avoid retrying with the same vector modes
  2019-11-05 20:25 ` [11a/n] Avoid retrying with the same vector modes Richard Sandiford
@ 2019-11-06  9:49   ` Richard Biener
  2019-11-06 10:21     ` Richard Sandiford
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Biener @ 2019-11-06  9:49 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: GCC Patches

On Tue, Nov 5, 2019 at 9:25 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Patch 12/n makes the AArch64 port add four entries to
> autovectorize_vector_modes.  Each entry describes a different
> vector mode assignment for vector code that mixes 8-bit, 16-bit,
> 32-bit and 64-bit elements.  But if (as usual) the vector code has
> fewer element sizes than that, we could end up trying the same
> combination of vector modes multiple times.  This patch adds a
> check to prevent that.
>
> As before: each patch tested individually on aarch64-linux-gnu and the
> series as a whole on x86_64-linux-gnu.
>
>
> 2019-11-04  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * tree-vectorizer.h (vec_info::mode_set): New typedef.
>         (vec_info::used_vector_modes): New member variable.
>         (vect_chooses_same_modes_p): Declare.
>         * tree-vect-stmts.c (get_vectype_for_scalar_type): Record each
>         chosen vector mode in vec_info::used_vector_mode.
>         (vect_chooses_same_modes_p): New function.
>         * tree-vect-loop.c (vect_analyze_loop): Use it to avoid trying
>         the same vector statements multiple times.
>         * tree-vect-slp.c (vect_slp_bb_region): Likewise.
>
> Index: gcc/tree-vectorizer.h
> ===================================================================
> --- gcc/tree-vectorizer.h       2019-11-05 10:48:11.246092351 +0000
> +++ gcc/tree-vectorizer.h       2019-11-05 10:57:41.662071145 +0000
> @@ -298,6 +298,7 @@ typedef std::pair<tree, tree> vec_object
>  /* Vectorizer state common between loop and basic-block vectorization.  */
>  class vec_info {
>  public:
> +  typedef hash_set<int_hash<machine_mode, E_VOIDmode, E_BLKmode> > mode_set;
>    enum vec_kind { bb, loop };
>
>    vec_info (vec_kind, void *, vec_info_shared *);
> @@ -335,6 +336,9 @@ typedef std::pair<tree, tree> vec_object
>    /* Cost data used by the target cost model.  */
>    void *target_cost_data;
>
> +  /* The set of vector modes used in the vectorized region.  */
> +  mode_set used_vector_modes;
> +
>    /* The argument we should pass to related_vector_mode when looking up
>       the vector mode for a scalar mode, or VOIDmode if we haven't yet
>       made any decisions about which vector modes to use.  */
> @@ -1615,6 +1619,7 @@ extern tree get_related_vectype_for_scal
>  extern tree get_vectype_for_scalar_type (vec_info *, tree);
>  extern tree get_mask_type_for_scalar_type (vec_info *, tree);
>  extern tree get_same_sized_vectype (tree, tree);
> +extern bool vect_chooses_same_modes_p (vec_info *, machine_mode);
>  extern bool vect_get_loop_mask_type (loop_vec_info);
>  extern bool vect_is_simple_use (tree, vec_info *, enum vect_def_type *,
>                                 stmt_vec_info * = NULL, gimple ** = NULL);
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2019-11-05 10:48:11.242092379 +0000
> +++ gcc/tree-vect-stmts.c       2019-11-05 10:57:41.662071145 +0000
> @@ -11235,6 +11235,10 @@ get_vectype_for_scalar_type (vec_info *v
>                                                       scalar_type);
>    if (vectype && vinfo->vector_mode == VOIDmode)
>      vinfo->vector_mode = TYPE_MODE (vectype);
> +
> +  if (vectype)
> +    vinfo->used_vector_modes.add (TYPE_MODE (vectype));
> +

Do we actually end up _using_ all types returned by this function?

Otherwise OK.

Richard.

>    return vectype;
>  }
>
> @@ -11274,6 +11278,20 @@ get_same_sized_vectype (tree scalar_type
>                                               scalar_type, nunits);
>  }
>
> +/* Return true if replacing LOOP_VINFO->vector_mode with VECTOR_MODE
> +   would not change the chosen vector modes.  */
> +
> +bool
> +vect_chooses_same_modes_p (vec_info *vinfo, machine_mode vector_mode)
> +{
> +  for (vec_info::mode_set::iterator i = vinfo->used_vector_modes.begin ();
> +       i != vinfo->used_vector_modes.end (); ++i)
> +    if (!VECTOR_MODE_P (*i)
> +       || related_vector_mode (vector_mode, GET_MODE_INNER (*i), 0) != *i)
> +      return false;
> +  return true;
> +}
> +
>  /* Function vect_is_simple_use.
>
>     Input:
> Index: gcc/tree-vect-loop.c
> ===================================================================
> --- gcc/tree-vect-loop.c        2019-11-05 10:48:11.238092407 +0000
> +++ gcc/tree-vect-loop.c        2019-11-05 10:57:41.658071173 +0000
> @@ -2430,6 +2430,19 @@ vect_analyze_loop (class loop *loop, vec
>         }
>
>        loop->aux = NULL;
> +
> +      if (!fatal)
> +       while (mode_i < vector_modes.length ()
> +              && vect_chooses_same_modes_p (loop_vinfo, vector_modes[mode_i]))
> +         {
> +           if (dump_enabled_p ())
> +             dump_printf_loc (MSG_NOTE, vect_location,
> +                              "***** The result for vector mode %s would"
> +                              " be the same\n",
> +                              GET_MODE_NAME (vector_modes[mode_i]));
> +           mode_i += 1;
> +         }
> +
>        if (res)
>         {
>           LOOP_VINFO_VECTORIZABLE_P (loop_vinfo) = 1;
> Index: gcc/tree-vect-slp.c
> ===================================================================
> --- gcc/tree-vect-slp.c 2019-11-05 10:48:11.242092379 +0000
> +++ gcc/tree-vect-slp.c 2019-11-05 10:57:41.662071145 +0000
> @@ -3238,6 +3238,18 @@ vect_slp_bb_region (gimple_stmt_iterator
>        if (mode_i == 0)
>         autodetected_vector_mode = bb_vinfo->vector_mode;
>
> +      if (!fatal)
> +       while (mode_i < vector_modes.length ()
> +              && vect_chooses_same_modes_p (bb_vinfo, vector_modes[mode_i]))
> +         {
> +           if (dump_enabled_p ())
> +             dump_printf_loc (MSG_NOTE, vect_location,
> +                              "***** The result for vector mode %s would"
> +                              " be the same\n",
> +                              GET_MODE_NAME (vector_modes[mode_i]));
> +           mode_i += 1;
> +         }
> +
>        delete bb_vinfo;
>
>        if (mode_i < vector_modes.length ()

* Re: [11a/n] Avoid retrying with the same vector modes
  2019-11-06  9:49   ` Richard Biener
@ 2019-11-06 10:21     ` Richard Sandiford
  2019-11-06 10:27       ` Richard Biener
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Sandiford @ 2019-11-06 10:21 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Tue, Nov 5, 2019 at 9:25 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Patch 12/n makes the AArch64 port add four entries to
>> autovectorize_vector_modes.  Each entry describes a different
>> vector mode assignment for vector code that mixes 8-bit, 16-bit,
>> 32-bit and 64-bit elements.  But if (as usual) the vector code has
>> fewer element sizes than that, we could end up trying the same
>> combination of vector modes multiple times.  This patch adds a
>> check to prevent that.
>>
>> As before: each patch tested individually on aarch64-linux-gnu and the
>> series as a whole on x86_64-linux-gnu.
>>
>>
>> 2019-11-04  Richard Sandiford  <richard.sandiford@arm.com>
>>
>> gcc/
>>         * tree-vectorizer.h (vec_info::mode_set): New typedef.
>>         (vec_info::used_vector_modes): New member variable.
>>         (vect_chooses_same_modes_p): Declare.
>>         * tree-vect-stmts.c (get_vectype_for_scalar_type): Record each
>>         chosen vector mode in vec_info::used_vector_mode.
>>         (vect_chooses_same_modes_p): New function.
>>         * tree-vect-loop.c (vect_analyze_loop): Use it to avoid trying
>>         the same vector statements multiple times.
>>         * tree-vect-slp.c (vect_slp_bb_region): Likewise.
>>
>> Index: gcc/tree-vectorizer.h
>> ===================================================================
>> --- gcc/tree-vectorizer.h       2019-11-05 10:48:11.246092351 +0000
>> +++ gcc/tree-vectorizer.h       2019-11-05 10:57:41.662071145 +0000
>> @@ -298,6 +298,7 @@ typedef std::pair<tree, tree> vec_object
>>  /* Vectorizer state common between loop and basic-block vectorization.  */
>>  class vec_info {
>>  public:
>> +  typedef hash_set<int_hash<machine_mode, E_VOIDmode, E_BLKmode> > mode_set;
>>    enum vec_kind { bb, loop };
>>
>>    vec_info (vec_kind, void *, vec_info_shared *);
>> @@ -335,6 +336,9 @@ typedef std::pair<tree, tree> vec_object
>>    /* Cost data used by the target cost model.  */
>>    void *target_cost_data;
>>
>> +  /* The set of vector modes used in the vectorized region.  */
>> +  mode_set used_vector_modes;
>> +
>>    /* The argument we should pass to related_vector_mode when looking up
>>       the vector mode for a scalar mode, or VOIDmode if we haven't yet
>>       made any decisions about which vector modes to use.  */
>> @@ -1615,6 +1619,7 @@ extern tree get_related_vectype_for_scal
>>  extern tree get_vectype_for_scalar_type (vec_info *, tree);
>>  extern tree get_mask_type_for_scalar_type (vec_info *, tree);
>>  extern tree get_same_sized_vectype (tree, tree);
>> +extern bool vect_chooses_same_modes_p (vec_info *, machine_mode);
>>  extern bool vect_get_loop_mask_type (loop_vec_info);
>>  extern bool vect_is_simple_use (tree, vec_info *, enum vect_def_type *,
>>                                 stmt_vec_info * = NULL, gimple ** = NULL);
>> Index: gcc/tree-vect-stmts.c
>> ===================================================================
>> --- gcc/tree-vect-stmts.c       2019-11-05 10:48:11.242092379 +0000
>> +++ gcc/tree-vect-stmts.c       2019-11-05 10:57:41.662071145 +0000
>> @@ -11235,6 +11235,10 @@ get_vectype_for_scalar_type (vec_info *v
>>                                                       scalar_type);
>>    if (vectype && vinfo->vector_mode == VOIDmode)
>>      vinfo->vector_mode = TYPE_MODE (vectype);
>> +
>> +  if (vectype)
>> +    vinfo->used_vector_modes.add (TYPE_MODE (vectype));
>> +
>
> Do we actually end up _using_ all types returned by this function?

No, not all of them, so it's a bit crude.  E.g. some types might end up
not being relevant after pattern recognition, or after we've made a
final decision about which parts of an address calculation to include
in a gather or scatter op.  So we can still end up retrying the same
thing even after the patch.

The problem is that we're trying to avoid pointless retries on failure
as well as success, so we could end up stopping at arbitrary points.
I wasn't sure where else to handle this.

Thanks,
Richard

* Re: [11a/n] Avoid retrying with the same vector modes
  2019-11-06 10:21     ` Richard Sandiford
@ 2019-11-06 10:27       ` Richard Biener
  2019-11-06 11:02         ` Richard Sandiford
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Biener @ 2019-11-06 10:27 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: GCC Patches

On Wed, Nov 6, 2019 at 11:21 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener <richard.guenther@gmail.com> writes:
> > On Tue, Nov 5, 2019 at 9:25 PM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> Patch 12/n makes the AArch64 port add four entries to
> >> autovectorize_vector_modes.  Each entry describes a different
> >> vector mode assignment for vector code that mixes 8-bit, 16-bit,
> >> 32-bit and 64-bit elements.  But if (as usual) the vector code has
> >> fewer element sizes than that, we could end up trying the same
> >> combination of vector modes multiple times.  This patch adds a
> >> check to prevent that.
> >>
> >> As before: each patch tested individually on aarch64-linux-gnu and the
> >> series as a whole on x86_64-linux-gnu.
> >>
> >>
> >> 2019-11-04  Richard Sandiford  <richard.sandiford@arm.com>
> >>
> >> gcc/
> >>         * tree-vectorizer.h (vec_info::mode_set): New typedef.
> >>         (vec_info::used_vector_modes): New member variable.
> >>         (vect_chooses_same_modes_p): Declare.
> >>         * tree-vect-stmts.c (get_vectype_for_scalar_type): Record each
> >>         chosen vector mode in vec_info::used_vector_mode.
> >>         (vect_chooses_same_modes_p): New function.
> >>         * tree-vect-loop.c (vect_analyze_loop): Use it to avoid trying
> >>         the same vector statements multiple times.
> >>         * tree-vect-slp.c (vect_slp_bb_region): Likewise.
> >>
> >> Index: gcc/tree-vectorizer.h
> >> ===================================================================
> >> --- gcc/tree-vectorizer.h       2019-11-05 10:48:11.246092351 +0000
> >> +++ gcc/tree-vectorizer.h       2019-11-05 10:57:41.662071145 +0000
> >> @@ -298,6 +298,7 @@ typedef std::pair<tree, tree> vec_object
> >>  /* Vectorizer state common between loop and basic-block vectorization.  */
> >>  class vec_info {
> >>  public:
> >> +  typedef hash_set<int_hash<machine_mode, E_VOIDmode, E_BLKmode> > mode_set;
> >>    enum vec_kind { bb, loop };
> >>
> >>    vec_info (vec_kind, void *, vec_info_shared *);
> >> @@ -335,6 +336,9 @@ typedef std::pair<tree, tree> vec_object
> >>    /* Cost data used by the target cost model.  */
> >>    void *target_cost_data;
> >>
> >> +  /* The set of vector modes used in the vectorized region.  */
> >> +  mode_set used_vector_modes;
> >> +
> >>    /* The argument we should pass to related_vector_mode when looking up
> >>       the vector mode for a scalar mode, or VOIDmode if we haven't yet
> >>       made any decisions about which vector modes to use.  */
> >> @@ -1615,6 +1619,7 @@ extern tree get_related_vectype_for_scal
> >>  extern tree get_vectype_for_scalar_type (vec_info *, tree);
> >>  extern tree get_mask_type_for_scalar_type (vec_info *, tree);
> >>  extern tree get_same_sized_vectype (tree, tree);
> >> +extern bool vect_chooses_same_modes_p (vec_info *, machine_mode);
> >>  extern bool vect_get_loop_mask_type (loop_vec_info);
> >>  extern bool vect_is_simple_use (tree, vec_info *, enum vect_def_type *,
> >>                                 stmt_vec_info * = NULL, gimple ** = NULL);
> >> Index: gcc/tree-vect-stmts.c
> >> ===================================================================
> >> --- gcc/tree-vect-stmts.c       2019-11-05 10:48:11.242092379 +0000
> >> +++ gcc/tree-vect-stmts.c       2019-11-05 10:57:41.662071145 +0000
> >> @@ -11235,6 +11235,10 @@ get_vectype_for_scalar_type (vec_info *v
> >>                                                       scalar_type);
> >>    if (vectype && vinfo->vector_mode == VOIDmode)
> >>      vinfo->vector_mode = TYPE_MODE (vectype);
> >> +
> >> +  if (vectype)
> >> +    vinfo->used_vector_modes.add (TYPE_MODE (vectype));
> >> +
> >
> > Do we actually end up _using_ all types returned by this function?
>
> No, not all of them, so it's a bit crude.  E.g. some types might end up
> not being relevant after pattern recognition, or after we've made a
> final decision about which parts of an address calculation to include
> in a gather or scatter op.  So we can still end up retrying the same
> thing even after the patch.
>
> The problem is that we're trying to avoid pointless retries on failure
> as well as success, so we could end up stopping at arbitrary points.
> I wasn't sure where else to handle this.

Yeah, I think this "iterating" is somewhat bogus (crude) now.  What we'd
like to collect is, for all defs, the vector types we could use; the
vectorizable_* routines then define constraints between input and output
vector types.  From that we'd arrive at a (possibly quite large) set of
"SLP graphs with vector types" we'd choose from.  I believe we'll never
want to truly explore the whole space, but I guess we want to greedily
compute those "SLP graphs with vector types" starting from what the
(grouped) datarefs tell us is possible (which is kind of what we do now).

Richard.

> Thanks,
> Richard

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [11a/n] Avoid retrying with the same vector modes
  2019-11-06 10:27       ` Richard Biener
@ 2019-11-06 11:02         ` Richard Sandiford
  2019-11-06 11:22           ` Richard Biener
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Sandiford @ 2019-11-06 11:02 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Wed, Nov 6, 2019 at 11:21 AM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Richard Biener <richard.guenther@gmail.com> writes:
>> > On Tue, Nov 5, 2019 at 9:25 PM Richard Sandiford
>> > <richard.sandiford@arm.com> wrote:
>> >>
>> >> Patch 12/n makes the AArch64 port add four entries to
>> >> autovectorize_vector_modes.  Each entry describes a different
>> >> vector mode assignment for vector code that mixes 8-bit, 16-bit,
>> >> 32-bit and 64-bit elements.  But if (as usual) the vector code has
>> >> fewer element sizes than that, we could end up trying the same
>> >> combination of vector modes multiple times.  This patch adds a
>> >> check to prevent that.
>> >>
>> >> As before: each patch tested individually on aarch64-linux-gnu and the
>> >> series as a whole on x86_64-linux-gnu.
>> >>
>> >>
>> >> 2019-11-04  Richard Sandiford  <richard.sandiford@arm.com>
>> >>
>> >> gcc/
>> >>         * tree-vectorizer.h (vec_info::mode_set): New typedef.
>> >>         (vec_info::used_vector_modes): New member variable.
>> >>         (vect_chooses_same_modes_p): Declare.
>> >>         * tree-vect-stmts.c (get_vectype_for_scalar_type): Record each
>> >>         chosen vector mode in vec_info::used_vector_mode.
>> >>         (vect_chooses_same_modes_p): New function.
>> >>         * tree-vect-loop.c (vect_analyze_loop): Use it to avoid trying
>> >>         the same vector statements multiple times.
>> >>         * tree-vect-slp.c (vect_slp_bb_region): Likewise.
>> >>
>> >> Index: gcc/tree-vectorizer.h
>> >> ===================================================================
>> >> --- gcc/tree-vectorizer.h       2019-11-05 10:48:11.246092351 +0000
>> >> +++ gcc/tree-vectorizer.h       2019-11-05 10:57:41.662071145 +0000
>> >> @@ -298,6 +298,7 @@ typedef std::pair<tree, tree> vec_object
>> >>  /* Vectorizer state common between loop and basic-block vectorization.  */
>> >>  class vec_info {
>> >>  public:
>> >> +  typedef hash_set<int_hash<machine_mode, E_VOIDmode, E_BLKmode> > mode_set;
>> >>    enum vec_kind { bb, loop };
>> >>
>> >>    vec_info (vec_kind, void *, vec_info_shared *);
>> >> @@ -335,6 +336,9 @@ typedef std::pair<tree, tree> vec_object
>> >>    /* Cost data used by the target cost model.  */
>> >>    void *target_cost_data;
>> >>
>> >> +  /* The set of vector modes used in the vectorized region.  */
>> >> +  mode_set used_vector_modes;
>> >> +
>> >>    /* The argument we should pass to related_vector_mode when looking up
>> >>       the vector mode for a scalar mode, or VOIDmode if we haven't yet
>> >>       made any decisions about which vector modes to use.  */
>> >> @@ -1615,6 +1619,7 @@ extern tree get_related_vectype_for_scal
>> >>  extern tree get_vectype_for_scalar_type (vec_info *, tree);
>> >>  extern tree get_mask_type_for_scalar_type (vec_info *, tree);
>> >>  extern tree get_same_sized_vectype (tree, tree);
>> >> +extern bool vect_chooses_same_modes_p (vec_info *, machine_mode);
>> >>  extern bool vect_get_loop_mask_type (loop_vec_info);
>> >>  extern bool vect_is_simple_use (tree, vec_info *, enum vect_def_type *,
>> >>                                 stmt_vec_info * = NULL, gimple ** = NULL);
>> >> Index: gcc/tree-vect-stmts.c
>> >> ===================================================================
>> >> --- gcc/tree-vect-stmts.c       2019-11-05 10:48:11.242092379 +0000
>> >> +++ gcc/tree-vect-stmts.c       2019-11-05 10:57:41.662071145 +0000
>> >> @@ -11235,6 +11235,10 @@ get_vectype_for_scalar_type (vec_info *v
>> >>                                                       scalar_type);
>> >>    if (vectype && vinfo->vector_mode == VOIDmode)
>> >>      vinfo->vector_mode = TYPE_MODE (vectype);
>> >> +
>> >> +  if (vectype)
>> >> +    vinfo->used_vector_modes.add (TYPE_MODE (vectype));
>> >> +
>> >
>> > Do we actually end up _using_ all types returned by this function?
>>
>> No, not all of them, so it's a bit crude.  E.g. some types might end up
>> not being relevant after pattern recognition, or after we've made a
>> final decision about which parts of an address calculation to include
>> in a gather or scatter op.  So we can still end up retrying the same
>> thing even after the patch.
>>
>> The problem is that we're trying to avoid pointless retries on failure
>> as well as success, so we could end up stopping at arbitrary points.
>> I wasn't sure where else to handle this.
>
> Yeah, I think this "iterating" is somewhat bogus (crude) now.

I think it was crude even before the series though. :-)  Not sure the
series is making things worse.

The problem is that there's a chicken-and-egg problem between how
we decide to vectorise and which vector subarchitecture and VF we use.
E.g. if we have:

  unsigned char *x, *y;
  ...
  x[i] = (unsigned short) (x[i] + y[i] + 1) >> 1;

do we build the SLP graph on the assumption that we need to use short
elements, or on the assumption that we can use IFN_AVG_CEIL?  This
affects the VF we get out: using IFN_AVG_CEIL gives double the VF
relative to doing unsigned short arithmetic.

And we need to know which vector subarchitecture we're targetting when
making that decision: e.g. Advanced SIMD and SVE2 have IFN_AVG_CEIL,
but SVE doesn't.  On the other hand, SVE supports things that Advanced
SIMD doesn't.  It's a similar story, of course, for the x86 vector subarchs.

For one pattern like this, we could simply try both ways.
But that becomes untenable if there are multiple potential patterns.
Iterating over the vector subarchs gives us a sensible way of reducing
the search space by only applying patterns that the subarch supports.

So...

> What we'd like to collect is for all defs the vector types we could
> use and then vectorizable_ defines constraints between input and
> output vector types.  From that we'd arrive at a (possibly quite
> large) set of "SLP graphs with vector types" we'd choose from.  I
> believe we'll never want to truly explore the whole space but guess we
> want to greedily compute those "SLP graphs with vector types" starting
> from what (grouped) datarefs tells us is possible (which is kind of
> what we do now).

...I don't think we can/should use the same SLP graph to pick vector
types for all subarchs, since the ideal graph depends on the subarch.
I'm also not sure the vectorizable_* routines could say anything that
isn't obvious from the input and output scalar types.  Won't it still be
the case that within an SLP instance, all scalars of type T will become
the same vector(N) T?

Thanks,
Richard


* Re: [11a/n] Avoid retrying with the same vector modes
  2019-11-06 11:02         ` Richard Sandiford
@ 2019-11-06 11:22           ` Richard Biener
  2019-11-06 12:47             ` Richard Sandiford
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Biener @ 2019-11-06 11:22 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: GCC Patches

On Wed, Nov 6, 2019 at 12:02 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener <richard.guenther@gmail.com> writes:
> > On Wed, Nov 6, 2019 at 11:21 AM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> Richard Biener <richard.guenther@gmail.com> writes:
> >> > On Tue, Nov 5, 2019 at 9:25 PM Richard Sandiford
> >> > <richard.sandiford@arm.com> wrote:
> >> >>
> >> >> Patch 12/n makes the AArch64 port add four entries to
> >> >> autovectorize_vector_modes.  Each entry describes a different
> >> >> vector mode assignment for vector code that mixes 8-bit, 16-bit,
> >> >> 32-bit and 64-bit elements.  But if (as usual) the vector code has
> >> >> fewer element sizes than that, we could end up trying the same
> >> >> combination of vector modes multiple times.  This patch adds a
> >> >> check to prevent that.
> >> >>
> >> >> As before: each patch tested individually on aarch64-linux-gnu and the
> >> >> series as a whole on x86_64-linux-gnu.
> >> >>
> >> >>
> >> >> 2019-11-04  Richard Sandiford  <richard.sandiford@arm.com>
> >> >>
> >> >> gcc/
> >> >>         * tree-vectorizer.h (vec_info::mode_set): New typedef.
> >> >>         (vec_info::used_vector_modes): New member variable.
> >> >>         (vect_chooses_same_modes_p): Declare.
> >> >>         * tree-vect-stmts.c (get_vectype_for_scalar_type): Record each
> >> >>         chosen vector mode in vec_info::used_vector_mode.
> >> >>         (vect_chooses_same_modes_p): New function.
> >> >>         * tree-vect-loop.c (vect_analyze_loop): Use it to avoid trying
> >> >>         the same vector statements multiple times.
> >> >>         * tree-vect-slp.c (vect_slp_bb_region): Likewise.
> >> >>
> >> >> Index: gcc/tree-vectorizer.h
> >> >> ===================================================================
> >> >> --- gcc/tree-vectorizer.h       2019-11-05 10:48:11.246092351 +0000
> >> >> +++ gcc/tree-vectorizer.h       2019-11-05 10:57:41.662071145 +0000
> >> >> @@ -298,6 +298,7 @@ typedef std::pair<tree, tree> vec_object
> >> >>  /* Vectorizer state common between loop and basic-block vectorization.  */
> >> >>  class vec_info {
> >> >>  public:
> >> >> +  typedef hash_set<int_hash<machine_mode, E_VOIDmode, E_BLKmode> > mode_set;
> >> >>    enum vec_kind { bb, loop };
> >> >>
> >> >>    vec_info (vec_kind, void *, vec_info_shared *);
> >> >> @@ -335,6 +336,9 @@ typedef std::pair<tree, tree> vec_object
> >> >>    /* Cost data used by the target cost model.  */
> >> >>    void *target_cost_data;
> >> >>
> >> >> +  /* The set of vector modes used in the vectorized region.  */
> >> >> +  mode_set used_vector_modes;
> >> >> +
> >> >>    /* The argument we should pass to related_vector_mode when looking up
> >> >>       the vector mode for a scalar mode, or VOIDmode if we haven't yet
> >> >>       made any decisions about which vector modes to use.  */
> >> >> @@ -1615,6 +1619,7 @@ extern tree get_related_vectype_for_scal
> >> >>  extern tree get_vectype_for_scalar_type (vec_info *, tree);
> >> >>  extern tree get_mask_type_for_scalar_type (vec_info *, tree);
> >> >>  extern tree get_same_sized_vectype (tree, tree);
> >> >> +extern bool vect_chooses_same_modes_p (vec_info *, machine_mode);
> >> >>  extern bool vect_get_loop_mask_type (loop_vec_info);
> >> >>  extern bool vect_is_simple_use (tree, vec_info *, enum vect_def_type *,
> >> >>                                 stmt_vec_info * = NULL, gimple ** = NULL);
> >> >> Index: gcc/tree-vect-stmts.c
> >> >> ===================================================================
> >> >> --- gcc/tree-vect-stmts.c       2019-11-05 10:48:11.242092379 +0000
> >> >> +++ gcc/tree-vect-stmts.c       2019-11-05 10:57:41.662071145 +0000
> >> >> @@ -11235,6 +11235,10 @@ get_vectype_for_scalar_type (vec_info *v
> >> >>                                                       scalar_type);
> >> >>    if (vectype && vinfo->vector_mode == VOIDmode)
> >> >>      vinfo->vector_mode = TYPE_MODE (vectype);
> >> >> +
> >> >> +  if (vectype)
> >> >> +    vinfo->used_vector_modes.add (TYPE_MODE (vectype));
> >> >> +
> >> >
> >> > Do we actually end up _using_ all types returned by this function?
> >>
> >> No, not all of them, so it's a bit crude.  E.g. some types might end up
> >> not being relevant after pattern recognition, or after we've made a
> >> final decision about which parts of an address calculation to include
> >> in a gather or scatter op.  So we can still end up retrying the same
> >> thing even after the patch.
> >>
> >> The problem is that we're trying to avoid pointless retries on failure
> >> as well as success, so we could end up stopping at arbitrary points.
> >> I wasn't sure where else to handle this.
> >
> > Yeah, I think this "iterating" is somewhat bogus (crude) now.
>
> I think it was crude even before the series though. :-)  Not sure the
> series is making things worse.
>
> The problem is that there's a chicken-and-egg problem between how
> we decide to vectorise and which vector subarchitecture and VF we use.
> E.g. if we have:
>
>   unsigned char *x, *y;
>   ...
>   x[i] = (unsigned short) (x[i] + y[i] + 1) >> 1;
>
> do we build the SLP graph on the assumption that we need to use short
> elements, or on the assumption that we can use IFN_AVG_CEIL?  This
> affects the VF we get out: using IFN_AVG_CEIL gives double the VF
> relative to doing unsigned short arithmetic.
>
> And we need to know which vector subarchitecture we're targetting when
> making that decision: e.g. Advanced SIMD and SVE2 have IFN_AVG_CEIL,
> but SVE doesn't.  On the other hand, SVE supports things that Advanced
> SIMD doesn't.  It's a similar story, of course, for the x86 vector subarchs.
>
> For one pattern like this, we could simply try both ways.
> But that becomes untenable if there are multiple potential patterns.
> Iterating over the vector subarchs gives us a sensible way of reducing
> the search space by only applying patterns that the subarch supports.
>
> So...
>
> > What we'd like to collect is for all defs the vector types we could
> > use and then vectorizable_ defines constraints between input and
> > output vector types.  From that we'd arrive at a (possibly quite
> > large) set of "SLP graphs with vector types" we'd choose from.  I
> > believe we'll never want to truly explore the whole space but guess we
> > want to greedily compute those "SLP graphs with vector types" starting
> > from what (grouped) datarefs tells us is possible (which is kind of
> > what we do now).
>
> ...I don't think we can/should use the same SLP graph to pick vector
> types for all subarchs, since the ideal graph depends on the subarch.
> I'm also not sure the vectorizable_* routines could say anything that
> isn't obvious from the input and output scalar types.  Won't it still be
> the case that within an SLP instance, all scalars of type T will become
> the same vector(N) T?

Not necessarily.  I can see the SLP graph containing "reductions"
(like those complex patterns proposed).  But yes, at the moment
there's a single group size per SLP instance.  Now, for the SLP _graph_
we may have, for example, one instance with group size 4 and one with
group size 8, both sharing the same grouped load.  It may make sense to
vectorize the load with v8si and, for the group-size-4 SLP instance,
have a "reduction operation" that selects the appropriate part of the
loaded vector.

Now, vectorizable_* for say a conversion from int to double
may be able to vectorize for a v4si directly to v4df or to two times v2df.
With the size restriction relaxed vectorizable_* could opt to choose
smaller/larger vectors specifically and thus also have different vector types
for the same scalar type in the SLP graph.  I do expect this to be
profitable on x86 for some loops due to some asymmetries in the ISA
(and extra cost of lane-crossing operations for say AVX where using SSE
is cheaper for some ops even though you now have 2 or more instructions).

Richard.

>
> Thanks,
> Richard


* Re: [11/n] Support vectorisation with mixed vector sizes
  2019-11-05 12:57   ` Richard Biener
@ 2019-11-06 12:38     ` Richard Sandiford
  2019-11-12  9:22       ` Richard Biener
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Sandiford @ 2019-11-06 12:38 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Fri, Oct 25, 2019 at 2:43 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> After previous patches, it's now possible to make the vectoriser
>> support multiple vector sizes in the same vector region, using
>> related_vector_mode to pick the right vector mode for a given
>> element mode.  No port yet takes advantage of this, but I have
>> a follow-on patch for AArch64.
>>
>> This patch also seemed like a good opportunity to add some more dump
>> messages: one to make it clear which vector size/mode was being used
>> when analysis passed or failed, and another to say when we've decided
>> to skip a redundant vector size/mode.
>
> OK.
>
> I wonder if, when we requested a specific size previously, we now
> have to verify we got that constraint satisfied after the change.
> Esp. the epilogue vectorization cases want to get V2DI
> from V4DI.
>
>           sz /= 2;
> -         vectype1 = get_vectype_for_scalar_type_and_size (scalar_type, sz);
> +         vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
> +                                                         scalar_type,
> +                                                         sz / scalar_bytes);
>
> doesn't look like an improvement in readability to me there.

Yeah, guess it isn't great.

> Maybe re-formulating the whole code in terms of lanes instead of size
> would make it easier to follow?

OK, how about this version?  It still won't win awards, but it's at
least a bit more readable.

Tested as before.

Richard


2019-11-06  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* machmode.h (opt_machine_mode::operator==): New function.
	(opt_machine_mode::operator!=): Likewise.
	* tree-vectorizer.h (vec_info::vector_mode): Update comment.
	(get_related_vectype_for_scalar_type): Delete.
	(get_vectype_for_scalar_type_and_size): Declare.
	* tree-vect-slp.c (vect_slp_bb_region): Print dump messages to say
	whether analysis passed or failed, and with what vector modes.
	Use related_vector_mode to check whether trying a particular
	vector mode would be redundant with the autodetected mode,
	and print a dump message if we decide to skip it.
	* tree-vect-loop.c (vect_analyze_loop): Likewise.
	(vect_create_epilog_for_reduction): Use
	get_related_vectype_for_scalar_type instead of
	get_vectype_for_scalar_type_and_size.
	* tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Replace
	with...
	(get_related_vectype_for_scalar_type): ...this new function.
	Take a starting/"prevailing" vector mode rather than a vector size.
	Take an optional nunits argument, with the same meaning as for
	related_vector_mode.  Use related_vector_mode when not
	auto-detecting a mode, falling back to mode_for_vector if no
	target mode exists.
	(get_vectype_for_scalar_type): Update accordingly.
	(get_same_sized_vectype): Likewise.
	* tree-vectorizer.c (get_vec_alignment_for_array_type): Likewise.

Index: gcc/machmode.h
===================================================================
--- gcc/machmode.h	2019-11-06 12:35:12.460201615 +0000
+++ gcc/machmode.h	2019-11-06 12:35:27.972093472 +0000
@@ -258,6 +258,9 @@ #define CLASS_HAS_WIDER_MODES_P(CLASS)
   bool exists () const;
   template<typename U> bool exists (U *) const;
 
+  bool operator== (const T &m) const { return m_mode == m; }
+  bool operator!= (const T &m) const { return m_mode != m; }
+
 private:
   machine_mode m_mode;
 };
Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	2019-11-06 12:35:12.764199495 +0000
+++ gcc/tree-vectorizer.h	2019-11-06 12:35:27.976093444 +0000
@@ -335,8 +335,9 @@ typedef std::pair<tree, tree> vec_object
   /* Cost data used by the target cost model.  */
   void *target_cost_data;
 
-  /* If we've chosen a vector size for this vectorization region,
-     this is one mode that has such a size, otherwise it is VOIDmode.  */
+  /* The argument we should pass to related_vector_mode when looking up
+     the vector mode for a scalar mode, or VOIDmode if we haven't yet
+     made any decisions about which vector modes to use.  */
   machine_mode vector_mode;
 
 private:
@@ -1609,8 +1610,9 @@ extern bool vect_can_advance_ivs_p (loop
 extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
 
 /* In tree-vect-stmts.c.  */
+extern tree get_related_vectype_for_scalar_type (machine_mode, tree,
+						 poly_uint64 = 0);
 extern tree get_vectype_for_scalar_type (vec_info *, tree);
-extern tree get_vectype_for_scalar_type_and_size (tree, poly_uint64);
 extern tree get_mask_type_for_scalar_type (vec_info *, tree);
 extern tree get_same_sized_vectype (tree, tree);
 extern bool vect_get_loop_mask_type (loop_vec_info);
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2019-11-06 12:35:12.760199523 +0000
+++ gcc/tree-vect-slp.c	2019-11-06 12:35:27.972093472 +0000
@@ -3202,7 +3202,12 @@ vect_slp_bb_region (gimple_stmt_iterator
 	  && dbg_cnt (vect_slp))
 	{
 	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB part\n");
+	    {
+	      dump_printf_loc (MSG_NOTE, vect_location,
+			       "***** Analysis succeeded with vector mode"
+			       " %s\n", GET_MODE_NAME (bb_vinfo->vector_mode));
+	      dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB part\n");
+	    }
 
 	  bb_vinfo->shared->check_datarefs ();
 	  vect_schedule_slp (bb_vinfo);
@@ -3222,6 +3227,13 @@ vect_slp_bb_region (gimple_stmt_iterator
 
 	  vectorized = true;
 	}
+      else
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "***** Analysis failed with vector mode %s\n",
+			     GET_MODE_NAME (bb_vinfo->vector_mode));
+	}
 
       if (mode_i == 0)
 	autodetected_vector_mode = bb_vinfo->vector_mode;
@@ -3229,9 +3241,22 @@ vect_slp_bb_region (gimple_stmt_iterator
       delete bb_vinfo;
 
       if (mode_i < vector_modes.length ()
-	  && known_eq (GET_MODE_SIZE (vector_modes[mode_i]),
-		       GET_MODE_SIZE (autodetected_vector_mode)))
-	mode_i += 1;
+	  && VECTOR_MODE_P (autodetected_vector_mode)
+	  && (related_vector_mode (vector_modes[mode_i],
+				   GET_MODE_INNER (autodetected_vector_mode))
+	      == autodetected_vector_mode)
+	  && (related_vector_mode (autodetected_vector_mode,
+				   GET_MODE_INNER (vector_modes[mode_i]))
+	      == vector_modes[mode_i]))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "***** Skipping vector mode %s, which would"
+			     " repeat the analysis for %s\n",
+			     GET_MODE_NAME (vector_modes[mode_i]),
+			     GET_MODE_NAME (autodetected_vector_mode));
+	  mode_i += 1;
+	}
 
       if (vectorized
 	  || mode_i == vector_modes.length ()
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2019-11-06 12:35:12.756199552 +0000
+++ gcc/tree-vect-loop.c	2019-11-06 12:35:27.972093472 +0000
@@ -2417,6 +2417,17 @@ vect_analyze_loop (class loop *loop, vec
       res = vect_analyze_loop_2 (loop_vinfo, fatal, &n_stmts);
       if (mode_i == 0)
 	autodetected_vector_mode = loop_vinfo->vector_mode;
+      if (dump_enabled_p ())
+	{
+	  if (res)
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "***** Analysis succeeded with vector mode %s\n",
+			     GET_MODE_NAME (loop_vinfo->vector_mode));
+	  else
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "***** Analysis failed with vector mode %s\n",
+			     GET_MODE_NAME (loop_vinfo->vector_mode));
+	}
 
       loop->aux = NULL;
       if (res)
@@ -2479,9 +2490,22 @@ vect_analyze_loop (class loop *loop, vec
 	}
 
       if (mode_i < vector_modes.length ()
-	  && known_eq (GET_MODE_SIZE (vector_modes[mode_i]),
-		       GET_MODE_SIZE (autodetected_vector_mode)))
-	mode_i += 1;
+	  && VECTOR_MODE_P (autodetected_vector_mode)
+	  && (related_vector_mode (vector_modes[mode_i],
+				   GET_MODE_INNER (autodetected_vector_mode))
+	      == autodetected_vector_mode)
+	  && (related_vector_mode (autodetected_vector_mode,
+				   GET_MODE_INNER (vector_modes[mode_i]))
+	      == vector_modes[mode_i]))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "***** Skipping vector mode %s, which would"
+			     " repeat the analysis for %s\n",
+			     GET_MODE_NAME (vector_modes[mode_i]),
+			     GET_MODE_NAME (autodetected_vector_mode));
+	  mode_i += 1;
+	}
 
       if (mode_i == vector_modes.length ()
 	  || autodetected_vector_mode == VOIDmode)
@@ -4870,13 +4894,15 @@ vect_create_epilog_for_reduction (stmt_v
 	 in a vector mode of smaller size and first reduce upper/lower
 	 halves against each other.  */
       enum machine_mode mode1 = mode;
-      unsigned sz = tree_to_uhwi (TYPE_SIZE_UNIT (vectype));
-      unsigned sz1 = sz;
+      unsigned nunits = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
+      unsigned nunits1 = nunits;
       if (!slp_reduc
 	  && (mode1 = targetm.vectorize.split_reduction (mode)) != mode)
-	sz1 = GET_MODE_SIZE (mode1).to_constant ();
+	nunits1 = GET_MODE_NUNITS (mode1).to_constant ();
 
-      tree vectype1 = get_vectype_for_scalar_type_and_size (scalar_type, sz1);
+      tree vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
+							   scalar_type,
+							   nunits1);
       reduce_with_shift = have_whole_vector_shift (mode1);
       if (!VECTOR_MODE_P (mode1))
 	reduce_with_shift = false;
@@ -4890,11 +4916,13 @@ vect_create_epilog_for_reduction (stmt_v
       /* First reduce the vector to the desired vector size we should
 	 do shift reduction on by combining upper and lower halves.  */
       new_temp = new_phi_result;
-      while (sz > sz1)
+      while (nunits > nunits1)
 	{
 	  gcc_assert (!slp_reduc);
-	  sz /= 2;
-	  vectype1 = get_vectype_for_scalar_type_and_size (scalar_type, sz);
+	  nunits /= 2;
+	  vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
+							  scalar_type, nunits);
+	  unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
 
 	  /* The target has to make sure we support lowpart/highpart
 	     extraction, either via direct vector extract or through
@@ -4919,15 +4947,14 @@ vect_create_epilog_for_reduction (stmt_v
 		  = gimple_build_assign (dst2, BIT_FIELD_REF,
 					 build3 (BIT_FIELD_REF, vectype1,
 						 new_temp, TYPE_SIZE (vectype1),
-						 bitsize_int (sz * BITS_PER_UNIT)));
+						 bitsize_int (bitsize)));
 	      gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
 	    }
 	  else
 	    {
 	      /* Extract via punning to appropriately sized integer mode
 		 vector.  */
-	      tree eltype = build_nonstandard_integer_type (sz * BITS_PER_UNIT,
-							    1);
+	      tree eltype = build_nonstandard_integer_type (bitsize, 1);
 	      tree etype = build_vector_type (eltype, 2);
 	      gcc_assert (convert_optab_handler (vec_extract_optab,
 						 TYPE_MODE (etype),
@@ -4956,7 +4983,7 @@ vect_create_epilog_for_reduction (stmt_v
 		  = gimple_build_assign (tem, BIT_FIELD_REF,
 					 build3 (BIT_FIELD_REF, eltype,
 						 new_temp, TYPE_SIZE (eltype),
-						 bitsize_int (sz * BITS_PER_UNIT)));
+						 bitsize_int (bitsize)));
 	      gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
 	      dst2 =  make_ssa_name (vectype1);
 	      epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2019-11-06 12:35:12.796199272 +0000
+++ gcc/tree-vect-stmts.c	2019-11-06 12:35:27.976093444 +0000
@@ -11097,18 +11097,28 @@ vect_remove_stores (stmt_vec_info first_
     }
 }
 
-/* Function get_vectype_for_scalar_type_and_size.
-
-   Returns the vector type corresponding to SCALAR_TYPE  and SIZE as supported
-   by the target.  */
+/* If NUNITS is nonzero, return a vector type that contains NUNITS
+   elements of type SCALAR_TYPE, or null if the target doesn't support
+   such a type.
+
+   If NUNITS is zero, return a vector type that contains elements of
+   type SCALAR_TYPE, choosing whichever vector size the target prefers.
+
+   If PREVAILING_MODE is VOIDmode, we have not yet chosen a vector mode
+   for this vectorization region and want to "autodetect" the best choice.
+   Otherwise, PREVAILING_MODE is a previously-chosen vector TYPE_MODE
+   and we want the new type to be interoperable with it.   PREVAILING_MODE
+   in this case can be a scalar integer mode or a vector mode; when it
+   is a vector mode, the function acts like a tree-level version of
+   related_vector_mode.  */
 
 tree
-get_vectype_for_scalar_type_and_size (tree scalar_type, poly_uint64 size)
+get_related_vectype_for_scalar_type (machine_mode prevailing_mode,
+				     tree scalar_type, poly_uint64 nunits)
 {
   tree orig_scalar_type = scalar_type;
   scalar_mode inner_mode;
   machine_mode simd_mode;
-  poly_uint64 nunits;
   tree vectype;
 
   if (!is_int_mode (TYPE_MODE (scalar_type), &inner_mode)
@@ -11148,10 +11158,11 @@ get_vectype_for_scalar_type_and_size (tr
   if (scalar_type == NULL_TREE)
     return NULL_TREE;
 
-  /* If no size was supplied use the mode the target prefers.   Otherwise
-     lookup a vector mode of the specified size.  */
-  if (known_eq (size, 0U))
+  /* If no prevailing mode was supplied, use the mode the target prefers.
+     Otherwise lookup a vector mode based on the prevailing mode.  */
+  if (prevailing_mode == VOIDmode)
     {
+      gcc_assert (known_eq (nunits, 0U));
       simd_mode = targetm.vectorize.preferred_simd_mode (inner_mode);
       if (SCALAR_INT_MODE_P (simd_mode))
 	{
@@ -11167,9 +11178,19 @@ get_vectype_for_scalar_type_and_size (tr
 	    return NULL_TREE;
 	}
     }
-  else if (!multiple_p (size, nbytes, &nunits)
-	   || !mode_for_vector (inner_mode, nunits).exists (&simd_mode))
-    return NULL_TREE;
+  else if (SCALAR_INT_MODE_P (prevailing_mode)
+	   || !related_vector_mode (prevailing_mode,
+				    inner_mode, nunits).exists (&simd_mode))
+    {
+      /* Fall back to using mode_for_vector, mostly in the hope of being
+	 able to use an integer mode.  */
+      if (known_eq (nunits, 0U)
+	  && !multiple_p (GET_MODE_SIZE (prevailing_mode), nbytes, &nunits))
+	return NULL_TREE;
+
+      if (!mode_for_vector (inner_mode, nunits).exists (&simd_mode))
+	return NULL_TREE;
+    }
 
   vectype = build_vector_type_for_mode (scalar_type, simd_mode);
 
@@ -11197,9 +11218,8 @@ get_vectype_for_scalar_type_and_size (tr
 tree
 get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type)
 {
-  tree vectype;
-  poly_uint64 vector_size = GET_MODE_SIZE (vinfo->vector_mode);
-  vectype = get_vectype_for_scalar_type_and_size (scalar_type, vector_size);
+  tree vectype = get_related_vectype_for_scalar_type (vinfo->vector_mode,
+						      scalar_type);
   if (vectype && vinfo->vector_mode == VOIDmode)
     vinfo->vector_mode = TYPE_MODE (vectype);
   return vectype;
@@ -11232,8 +11252,13 @@ get_same_sized_vectype (tree scalar_type
   if (VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type))
     return truth_type_for (vector_type);
 
-  return get_vectype_for_scalar_type_and_size
-	   (scalar_type, GET_MODE_SIZE (TYPE_MODE (vector_type)));
+  poly_uint64 nunits;
+  if (!multiple_p (GET_MODE_SIZE (TYPE_MODE (vector_type)),
+		   GET_MODE_SIZE (TYPE_MODE (scalar_type)), &nunits))
+    return NULL_TREE;
+
+  return get_related_vectype_for_scalar_type (TYPE_MODE (vector_type),
+					      scalar_type, nunits);
 }
 
 /* Function vect_is_simple_use.
Index: gcc/tree-vectorizer.c
===================================================================
--- gcc/tree-vectorizer.c	2019-11-06 12:35:12.764199495 +0000
+++ gcc/tree-vectorizer.c	2019-11-06 12:35:27.976093444 +0000
@@ -1359,7 +1359,7 @@ get_vec_alignment_for_array_type (tree t
   poly_uint64 array_size, vector_size;
 
   tree scalar_type = strip_array_types (type);
-  tree vectype = get_vectype_for_scalar_type_and_size (scalar_type, 0);
+  tree vectype = get_related_vectype_for_scalar_type (VOIDmode, scalar_type);
   if (!vectype
       || !poly_int_tree_p (TYPE_SIZE (type), &array_size)
       || !poly_int_tree_p (TYPE_SIZE (vectype), &vector_size)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [14/n] Vectorise conversions between differently-sized integer vectors
  2019-11-05 13:02   ` Richard Biener
@ 2019-11-06 12:45     ` Richard Sandiford
  2019-11-12  9:40       ` Richard Biener
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Sandiford @ 2019-11-06 12:45 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Fri, Oct 25, 2019 at 2:51 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> This patch adds AArch64 patterns for converting between 64-bit and
>> 128-bit integer vectors, and makes the vectoriser and expand pass
>> use them.
>
> So on GIMPLE we'll see
>
> v4si _1;
> v4di _2;
>
>  _1 = (v4si) _2;
>
> then, correct?  Likewise for float conversions.
>
> I think that's "new", can you add to tree-cfg.c:verify_gimple_assign_unary
> verification that the number of lanes of the LHS and the RHS match please?

Ah, yeah.  How's this?  Tested as before.

Richard


2019-11-06  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* tree-cfg.c (verify_gimple_assign_unary): Handle conversions
	between vector types.
	* tree-vect-stmts.c (vectorizable_conversion): Extend the
	non-widening and non-narrowing path to handle standard
	conversion codes, if the target supports them.
	* expr.c (convert_move): Try using the extend and truncate optabs
	for vectors.
	* optabs-tree.c (supportable_convert_operation): Likewise.
	* config/aarch64/iterators.md (Vnarrowq): New mode attribute.
	* config/aarch64/aarch64-simd.md (<optab><Vnarrowq><mode>2)
	(trunc<mode><Vnarrowq>2): New patterns.

gcc/testsuite/
	* gcc.dg/vect/bb-slp-pr69907.c: Do not expect BB vectorization
	to fail for aarch64 targets.
	* gcc.dg/vect/no-scevccp-outer-12.c: Expect the test to pass
	on aarch64 targets.
	* gcc.dg/vect/vect-double-reduc-5.c: Likewise.
	* gcc.dg/vect/vect-outer-4e.c: Likewise.
	* gcc.target/aarch64/vect_mixed_sizes_5.c: New test.
	* gcc.target/aarch64/vect_mixed_sizes_6.c: Likewise.
	* gcc.target/aarch64/vect_mixed_sizes_7.c: Likewise.
	* gcc.target/aarch64/vect_mixed_sizes_8.c: Likewise.
	* gcc.target/aarch64/vect_mixed_sizes_9.c: Likewise.
	* gcc.target/aarch64/vect_mixed_sizes_10.c: Likewise.
	* gcc.target/aarch64/vect_mixed_sizes_11.c: Likewise.
	* gcc.target/aarch64/vect_mixed_sizes_12.c: Likewise.
	* gcc.target/aarch64/vect_mixed_sizes_13.c: Likewise.

Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	2019-09-05 08:49:30.829739618 +0100
+++ gcc/tree-cfg.c	2019-11-06 12:44:22.832365429 +0000
@@ -3553,6 +3553,24 @@ verify_gimple_assign_unary (gassign *stm
     {
     CASE_CONVERT:
       {
+	/* Allow conversions between vectors with the same number of elements,
+	   provided that the conversion is OK for the element types too.  */
+	if (VECTOR_TYPE_P (lhs_type)
+	    && VECTOR_TYPE_P (rhs1_type)
+	    && known_eq (TYPE_VECTOR_SUBPARTS (lhs_type),
+			 TYPE_VECTOR_SUBPARTS (rhs1_type)))
+	  {
+	    lhs_type = TREE_TYPE (lhs_type);
+	    rhs1_type = TREE_TYPE (rhs1_type);
+	  }
+	else if (VECTOR_TYPE_P (lhs_type) || VECTOR_TYPE_P (rhs1_type))
+	  {
+	    error ("invalid vector types in nop conversion");
+	    debug_generic_expr (lhs_type);
+	    debug_generic_expr (rhs1_type);
+	    return true;
+	  }
+
 	/* Allow conversions from pointer type to integral type only if
 	   there is no sign or zero extension involved.
 	   For targets were the precision of ptrofftype doesn't match that
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2019-11-06 12:44:10.896448608 +0000
+++ gcc/tree-vect-stmts.c	2019-11-06 12:44:22.832365429 +0000
@@ -4869,7 +4869,9 @@ vectorizable_conversion (stmt_vec_info s
   switch (modifier)
     {
     case NONE:
-      if (code != FIX_TRUNC_EXPR && code != FLOAT_EXPR)
+      if (code != FIX_TRUNC_EXPR
+	  && code != FLOAT_EXPR
+	  && !CONVERT_EXPR_CODE_P (code))
 	return false;
       if (supportable_convert_operation (code, vectype_out, vectype_in,
 					 &decl1, &code1))
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2019-11-06 12:29:17.394677341 +0000
+++ gcc/expr.c	2019-11-06 12:44:22.828365457 +0000
@@ -250,6 +250,31 @@ convert_move (rtx to, rtx from, int unsi
 
   if (VECTOR_MODE_P (to_mode) || VECTOR_MODE_P (from_mode))
     {
+      if (GET_MODE_UNIT_PRECISION (to_mode)
+	  > GET_MODE_UNIT_PRECISION (from_mode))
+	{
+	  optab op = unsignedp ? zext_optab : sext_optab;
+	  insn_code icode = convert_optab_handler (op, to_mode, from_mode);
+	  if (icode != CODE_FOR_nothing)
+	    {
+	      emit_unop_insn (icode, to, from,
+			      unsignedp ? ZERO_EXTEND : SIGN_EXTEND);
+	      return;
+	    }
+	}
+
+      if (GET_MODE_UNIT_PRECISION (to_mode)
+	  < GET_MODE_UNIT_PRECISION (from_mode))
+	{
+	  insn_code icode = convert_optab_handler (trunc_optab,
+						   to_mode, from_mode);
+	  if (icode != CODE_FOR_nothing)
+	    {
+	      emit_unop_insn (icode, to, from, TRUNCATE);
+	      return;
+	    }
+	}
+
       gcc_assert (known_eq (GET_MODE_BITSIZE (from_mode),
 			    GET_MODE_BITSIZE (to_mode)));
 
Index: gcc/optabs-tree.c
===================================================================
--- gcc/optabs-tree.c	2019-11-06 12:28:23.000000000 +0000
+++ gcc/optabs-tree.c	2019-11-06 12:44:22.828365457 +0000
@@ -303,6 +303,20 @@ supportable_convert_operation (enum tree
       return true;
     }
 
+  if (GET_MODE_UNIT_PRECISION (m1) > GET_MODE_UNIT_PRECISION (m2)
+      && can_extend_p (m1, m2, TYPE_UNSIGNED (vectype_in)))
+    {
+      *code1 = code;
+      return true;
+    }
+
+  if (GET_MODE_UNIT_PRECISION (m1) < GET_MODE_UNIT_PRECISION (m2)
+      && convert_optab_handler (trunc_optab, m1, m2) != CODE_FOR_nothing)
+    {
+      *code1 = code;
+      return true;
+    }
+
   /* Now check for builtin.  */
   if (targetm.vectorize.builtin_conversion
       && targetm.vectorize.builtin_conversion (code, vectype_out, vectype_in))
Index: gcc/config/aarch64/iterators.md
===================================================================
--- gcc/config/aarch64/iterators.md	2019-11-06 12:28:23.000000000 +0000
+++ gcc/config/aarch64/iterators.md	2019-11-06 12:44:22.824365485 +0000
@@ -933,6 +933,8 @@ (define_mode_attr VNARROWQ [(V8HI "V8QI"
 			    (V2DI "V2SI")
 			    (DI	  "SI")	  (SI	"HI")
 			    (HI	  "QI")])
+(define_mode_attr Vnarrowq [(V8HI "v8qi") (V4SI "v4hi")
+			    (V2DI "v2si")])
 
 ;; Narrowed quad-modes for VQN (Used for XTN2).
 (define_mode_attr VNARROWQ2 [(V8HI "V16QI") (V4SI "V8HI")
Index: gcc/config/aarch64/aarch64-simd.md
===================================================================
--- gcc/config/aarch64/aarch64-simd.md	2019-11-06 12:28:23.000000000 +0000
+++ gcc/config/aarch64/aarch64-simd.md	2019-11-06 12:44:22.824365485 +0000
@@ -7007,3 +7007,21 @@ (define_insn "aarch64_crypto_pmullv2di"
   "pmull2\\t%0.1q, %1.2d, %2.2d"
   [(set_attr "type" "crypto_pmull")]
 )
+
+;; Sign- or zero-extend a 64-bit integer vector to a 128-bit vector.
+(define_insn "<optab><Vnarrowq><mode>2"
+  [(set (match_operand:VQN 0 "register_operand" "=w")
+	(ANY_EXTEND:VQN (match_operand:<VNARROWQ> 1 "register_operand" "w")))]
+  "TARGET_SIMD"
+  "<su>xtl\t%0.<Vtype>, %1.<Vntype>"
+  [(set_attr "type" "neon_shift_imm_long")]
+)
+
+;; Truncate a 128-bit integer vector to a 64-bit vector.
+(define_insn "trunc<mode><Vnarrowq>2"
+  [(set (match_operand:<VNARROWQ> 0 "register_operand" "=w")
+	(truncate:<VNARROWQ> (match_operand:VQN 1 "register_operand" "w")))]
+  "TARGET_SIMD"
+  "xtn\t%0.<Vntype>, %1.<Vtype>"
+  [(set_attr "type" "neon_shift_imm_narrow_q")]
+)
Index: gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c	2019-03-08 18:15:02.292871138 +0000
+++ gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c	2019-11-06 12:44:22.828365457 +0000
@@ -18,5 +18,6 @@ void foo(unsigned *p1, unsigned short *p
 }
 
 /* Disable for SVE because for long or variable-length vectors we don't
-   get an unrolled epilogue loop.  */
-/* { dg-final { scan-tree-dump "BB vectorization with gaps at the end of a load is not supported" "slp1" { target { ! aarch64_sve } } } } */
+   get an unrolled epilogue loop.  Also disable for AArch64 Advanced SIMD,
+   because there we can vectorize the epilogue using mixed vector sizes.  */
+/* { dg-final { scan-tree-dump "BB vectorization with gaps at the end of a load is not supported" "slp1" { target { ! aarch64*-*-* } } } } */
Index: gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c	2019-11-06 12:28:23.000000000 +0000
+++ gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c	2019-11-06 12:44:22.828365457 +0000
@@ -46,4 +46,4 @@ int main (void)
 }
 
 /* Until we support multiple types in the inner loop  */
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { ! aarch64*-*-* } } } } */
Index: gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c	2019-11-06 12:28:23.000000000 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c	2019-11-06 12:44:22.828365457 +0000
@@ -52,5 +52,5 @@ int main ()
 
 /* Vectorization of loops with multiple types and double reduction is not 
    supported yet.  */       
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail { ! aarch64*-*-* } } } } */
       
Index: gcc/testsuite/gcc.dg/vect/vect-outer-4e.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-outer-4e.c	2019-11-06 12:28:23.000000000 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-outer-4e.c	2019-11-06 12:44:22.828365457 +0000
@@ -23,4 +23,4 @@ foo (){
   return;
 }
 
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail { ! aarch64*-*-* } } } } */
Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_5.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_5.c	2019-11-06 12:44:22.828365457 +0000
@@ -0,0 +1,18 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#pragma GCC target "+nosve"
+
+#include <stdint.h>
+
+void
+f (int64_t *x, int64_t *y, int32_t *z, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      x[i] = z[i];
+      y[i] += y[i - 2];
+    }
+}
+
+/* { dg-final { scan-assembler-times {\tsxtl\tv[0-9]+\.2d, v[0-9]+\.2s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.2d,} 1 } } */
Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_6.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_6.c	2019-11-06 12:44:22.828365457 +0000
@@ -0,0 +1,18 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#pragma GCC target "+nosve"
+
+#include <stdint.h>
+
+void
+f (int32_t *x, int32_t *y, int16_t *z, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      x[i] = z[i];
+      y[i] += y[i - 4];
+    }
+}
+
+/* { dg-final { scan-assembler-times {\tsxtl\tv[0-9]+\.4s, v[0-9]+\.4h\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.4s,} 1 } } */
Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_7.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_7.c	2019-11-06 12:44:22.828365457 +0000
@@ -0,0 +1,18 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#pragma GCC target "+nosve"
+
+#include <stdint.h>
+
+void
+f (int16_t *x, int16_t *y, int8_t *z, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      x[i] = z[i];
+      y[i] += y[i - 8];
+    }
+}
+
+/* { dg-final { scan-assembler-times {\tsxtl\tv[0-9]+\.8h, v[0-9]+\.8b\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.8h,} 1 } } */
Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_8.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_8.c	2019-11-06 12:44:22.828365457 +0000
@@ -0,0 +1,18 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#pragma GCC target "+nosve"
+
+#include <stdint.h>
+
+void
+f (int64_t *x, int64_t *y, uint32_t *z, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      x[i] = z[i];
+      y[i] += y[i - 2];
+    }
+}
+
+/* { dg-final { scan-assembler-times {\tuxtl\tv[0-9]+\.2d, v[0-9]+\.2s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.2d,} 1 } } */
Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_9.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_9.c	2019-11-06 12:44:22.828365457 +0000
@@ -0,0 +1,18 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#pragma GCC target "+nosve"
+
+#include <stdint.h>
+
+void
+f (int32_t *x, int32_t *y, uint16_t *z, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      x[i] = z[i];
+      y[i] += y[i - 4];
+    }
+}
+
+/* { dg-final { scan-assembler-times {\tuxtl\tv[0-9]+\.4s, v[0-9]+\.4h\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.4s,} 1 } } */
Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_10.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_10.c	2019-11-06 12:44:22.828365457 +0000
@@ -0,0 +1,18 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#pragma GCC target "+nosve"
+
+#include <stdint.h>
+
+void
+f (int16_t *x, int16_t *y, uint8_t *z, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      x[i] = z[i];
+      y[i] += y[i - 8];
+    }
+}
+
+/* { dg-final { scan-assembler-times {\tuxtl\tv[0-9]+\.8h, v[0-9]+\.8b\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.8h,} 1 } } */
Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_11.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_11.c	2019-11-06 12:44:22.828365457 +0000
@@ -0,0 +1,18 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#pragma GCC target "+nosve"
+
+#include <stdint.h>
+
+void
+f (int32_t *x, int64_t *y, int64_t *z, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      x[i] = z[i];
+      y[i] += y[i - 2];
+    }
+}
+
+/* { dg-final { scan-assembler-times {\txtn\tv[0-9]+\.2s, v[0-9]+\.2d\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.2d,} 1 } } */
Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_12.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_12.c	2019-11-06 12:44:22.828365457 +0000
@@ -0,0 +1,18 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#pragma GCC target "+nosve"
+
+#include <stdint.h>
+
+void
+f (int16_t *x, int32_t *y, int32_t *z, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      x[i] = z[i];
+      y[i] += y[i - 4];
+    }
+}
+
+/* { dg-final { scan-assembler-times {\txtn\tv[0-9]+\.4h, v[0-9]+\.4s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.4s,} 1 } } */
Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_13.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_13.c	2019-11-06 12:44:22.828365457 +0000
@@ -0,0 +1,18 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#pragma GCC target "+nosve"
+
+#include <stdint.h>
+
+void
+f (int8_t *x, int16_t *y, int16_t *z, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      x[i] = z[i];
+      y[i] += y[i - 8];
+    }
+}
+
+/* { dg-final { scan-assembler-times {\txtn\tv[0-9]+\.8b, v[0-9]+\.8h\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.8h,} 1 } } */

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [11a/n] Avoid retrying with the same vector modes
  2019-11-06 11:22           ` Richard Biener
@ 2019-11-06 12:47             ` Richard Sandiford
  2019-11-12  9:25               ` Richard Biener
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Sandiford @ 2019-11-06 12:47 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Wed, Nov 6, 2019 at 12:02 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Richard Biener <richard.guenther@gmail.com> writes:
>> > On Wed, Nov 6, 2019 at 11:21 AM Richard Sandiford
>> > <richard.sandiford@arm.com> wrote:
>> >>
>> >> Richard Biener <richard.guenther@gmail.com> writes:
>> >> > On Tue, Nov 5, 2019 at 9:25 PM Richard Sandiford
>> >> > <richard.sandiford@arm.com> wrote:
>> >> >>
>> >> >> Patch 12/n makes the AArch64 port add four entries to
>> >> >> autovectorize_vector_modes.  Each entry describes a different
>> >> >> vector mode assignment for vector code that mixes 8-bit, 16-bit,
>> >> >> 32-bit and 64-bit elements.  But if (as usual) the vector code has
>> >> >> fewer element sizes than that, we could end up trying the same
>> >> >> combination of vector modes multiple times.  This patch adds a
>> >> >> check to prevent that.
>> >> >>
>> >> >> As before: each patch tested individually on aarch64-linux-gnu and the
>> >> >> series as a whole on x86_64-linux-gnu.
>> >> >>
>> >> >>
>> >> >> 2019-11-04  Richard Sandiford  <richard.sandiford@arm.com>
>> >> >>
>> >> >> gcc/
>> >> >>         * tree-vectorizer.h (vec_info::mode_set): New typedef.
>> >> >>         (vec_info::used_vector_mode): New member variable.
>> >> >>         (vect_chooses_same_modes_p): Declare.
>> >> >>         * tree-vect-stmts.c (get_vectype_for_scalar_type): Record each
>> >> >>         chosen vector mode in vec_info::used_vector_mode.
>> >> >>         (vect_chooses_same_modes_p): New function.
>> >> >>         * tree-vect-loop.c (vect_analyze_loop): Use it to avoid trying
>> >> >>         the same vector statements multiple times.
>> >> >>         * tree-vect-slp.c (vect_slp_bb_region): Likewise.
>> >> >>
>> >> >> Index: gcc/tree-vectorizer.h
>> >> >> ===================================================================
>> >> >> --- gcc/tree-vectorizer.h       2019-11-05 10:48:11.246092351 +0000
>> >> >> +++ gcc/tree-vectorizer.h       2019-11-05 10:57:41.662071145 +0000
>> >> >> @@ -298,6 +298,7 @@ typedef std::pair<tree, tree> vec_object
>> >> >>  /* Vectorizer state common between loop and basic-block vectorization.  */
>> >> >>  class vec_info {
>> >> >>  public:
>> >> >> +  typedef hash_set<int_hash<machine_mode, E_VOIDmode, E_BLKmode> > mode_set;
>> >> >>    enum vec_kind { bb, loop };
>> >> >>
>> >> >>    vec_info (vec_kind, void *, vec_info_shared *);
>> >> >> @@ -335,6 +336,9 @@ typedef std::pair<tree, tree> vec_object
>> >> >>    /* Cost data used by the target cost model.  */
>> >> >>    void *target_cost_data;
>> >> >>
>> >> >> +  /* The set of vector modes used in the vectorized region.  */
>> >> >> +  mode_set used_vector_modes;
>> >> >> +
>> >> >>    /* The argument we should pass to related_vector_mode when looking up
>> >> >>       the vector mode for a scalar mode, or VOIDmode if we haven't yet
>> >> >>       made any decisions about which vector modes to use.  */
>> >> >> @@ -1615,6 +1619,7 @@ extern tree get_related_vectype_for_scal
>> >> >>  extern tree get_vectype_for_scalar_type (vec_info *, tree);
>> >> >>  extern tree get_mask_type_for_scalar_type (vec_info *, tree);
>> >> >>  extern tree get_same_sized_vectype (tree, tree);
>> >> >> +extern bool vect_chooses_same_modes_p (vec_info *, machine_mode);
>> >> >>  extern bool vect_get_loop_mask_type (loop_vec_info);
>> >> >>  extern bool vect_is_simple_use (tree, vec_info *, enum vect_def_type *,
>> >> >>                                 stmt_vec_info * = NULL, gimple ** = NULL);
>> >> >> Index: gcc/tree-vect-stmts.c
>> >> >> ===================================================================
>> >> >> --- gcc/tree-vect-stmts.c       2019-11-05 10:48:11.242092379 +0000
>> >> >> +++ gcc/tree-vect-stmts.c       2019-11-05 10:57:41.662071145 +0000
>> >> >> @@ -11235,6 +11235,10 @@ get_vectype_for_scalar_type (vec_info *v
>> >> >>                                                       scalar_type);
>> >> >>    if (vectype && vinfo->vector_mode == VOIDmode)
>> >> >>      vinfo->vector_mode = TYPE_MODE (vectype);
>> >> >> +
>> >> >> +  if (vectype)
>> >> >> +    vinfo->used_vector_modes.add (TYPE_MODE (vectype));
>> >> >> +
>> >> >
>> >> > Do we actually end up _using_ all types returned by this function?
>> >>
>> >> No, not all of them, so it's a bit crude.  E.g. some types might end up
>> >> not being relevant after pattern recognition, or after we've made a
>> >> final decision about which parts of an address calculation to include
>> >> in a gather or scatter op.  So we can still end up retrying the same
>> >> thing even after the patch.
>> >>
>> >> The problem is that we're trying to avoid pointless retries on failure
>> >> as well as success, so we could end up stopping at arbitrary points.
>> >> I wasn't sure where else to handle this.
>> >
>> > Yeah, I think this "iterating" is somewhat bogus (crude) now.
>>
>> I think it was crude even before the series though. :-)  Not sure the
>> series is making things worse.
>>
>> The problem is that there's a chicken-and-egg problem between how
>> we decide to vectorise and which vector subarchitecture and VF we use.
>> E.g. if we have:
>>
>>   unsigned char *x, *y;
>>   ...
>>   x[i] = (unsigned short) (x[i] + y[i] + 1) >> 1;
>>
>> do we build the SLP graph on the assumption that we need to use short
>> elements, or on the assumption that we can use IFN_AVG_CEIL?  This
>> affects the VF we get out: using IFN_AVG_CEIL gives double the VF
>> relative to doing unsigned short arithmetic.
>>
>> And we need to know which vector subarchitecture we're targeting when
>> making that decision: e.g. Advanced SIMD and SVE2 have IFN_AVG_CEIL,
>> but SVE doesn't.  On the other hand, SVE supports things that Advanced
>> SIMD doesn't.  It's a similar story of course for the x86 vector subarchs.
>>
>> For one pattern like this, we could simply try both ways.
>> But that becomes untenable if there are multiple potential patterns.
>> Iterating over the vector subarchs gives us a sensible way of reducing
>> the search space by only applying patterns that the subarch supports.
>>
>> So...
>>
>> > What we'd like to collect is for all defs the vector types we could
>> > use and then vectorizable_ defines constraints between input and
>> > output vector types.  From that we'd arrive at a (possibly quite
>> > large) set of "SLP graphs with vector types" we'd choose from.  I
>> > believe we'll never want to truly explore the whole space but guess we
>> > want to greedily compute those "SLP graphs with vector types" starting
>> > from what (grouped) datarefs tells us is possible (which is kind of
>> > what we do now).
>>
>> ...I don't think we can/should use the same SLP graph to pick vector
>> types for all subarchs, since the ideal graph depends on the subarch.
>> I'm also not sure the vectorizable_* routines could say anything that
>> isn't obvious from the input and output scalar types.  Won't it still be
>> the case that within an SLP instance, all scalars of type T will become
>> the same vector(N) T?
>
> Not necessarily.  I can see the SLP graph containing "reductions"
> (like those complex patterns proposed).  But yes, at the moment
> there's a single group-size per SLP instance.  Now, for the SLP _graph_
> we may have one instance with 4 and one with group size of 8 both
> for example sharing the same grouped load.  It may make sense to
> vectorize the load with v8si and for the group-size 4 SLP instance
> have a "reduction operation" that selects the appropriate part of the
> loaded vector.
>
> Now, vectorizable_* for say a conversion from int to double
> may be able to vectorize for a v4si directly to v4df or to two times v2df.
> With the size restriction relaxed vectorizable_* could opt to choose
> smaller/larger vectors specifically and thus also have different vector types
> for the same scalar type in the SLP graph.  I do expect this to be
> profitable on x86 for some loops due to some asymmetries in the ISA
> (and extra cost of lane-crossing operations for say AVX where using SSE
> is cheaper for some ops even though you now have 2 or more instructions).

Making this dependent on vectorizable_* seems to require a natural
conversion point in the original scalar code.  In your shared data-ref
example, the SLP instance with group size 4 could use 2 v2sis or a
single v4si.  The fact that the data ref is shared with a group size
of 8 shouldn't necessarily affect that choice and e.g. force v4si over
v2si.  So when taking the graph as a whole, it seems like we should be
able to combine 2 v2sis into a single v4si or split a v4si into 2 v2sis
at any point where that's worthwhile, independently of the surrounding
operations.

The same goes for the conversions.  If the target only supports
v4si->v2df conversions, it should still be possible to combine 2 v2dfs
into a single v4df as a separate, independent step if there's a benefit
to operating on v4df.  Same idea in reverse if the target only supports
v4si->v4df and an SLP instance wants to operate on v2df.  It would be
good if this splitting and combining wasn't dependent on having a
conversion.

So it seems like the natural point for inserting group size adjustments
is at sharing boundaries between SLP instances, with each SLP instance
using consistent choices internally (i.e. with the scalar type
determining the vector type).  And we should be able to insert those
group size adjustments based only on the types involved.  Whether
vectorizable_conversion can do a particular group size adjustment on
the fly seems more like a costing/pattern-matching decision rather than
something that should restrict the choice of types.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [8/n] Replace autovectorize_vector_sizes with autovectorize_vector_modes
  2019-10-30 16:33     ` Richard Sandiford
@ 2019-11-11 10:30       ` Richard Sandiford
  2019-11-11 14:33       ` Richard Biener
  1 sibling, 0 replies; 48+ messages in thread
From: Richard Sandiford @ 2019-11-11 10:30 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Ping

Richard Sandiford <richard.sandiford@arm.com> writes:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Fri, Oct 25, 2019 at 2:37 PM Richard Sandiford
>> <richard.sandiford@arm.com> wrote:
>>>
>>> This is another patch in the series to remove the assumption that
>>> all modes involved in vectorisation have to be the same size.
>>> Rather than have the target provide a list of vector sizes,
>>> it makes the target provide a list of vector "approaches",
>>> with each approach represented by a mode.
>>>
>>> A later patch will pass this mode to targetm.vectorize.related_mode
>>> to get the vector mode for a given element mode.  Until then, the modes
>>> simply act as an alternative way of specifying the vector size.
>>
>> Is there a restriction to use integer vector modes for the hook
>> or would FP vector modes be OK as well?
>
> Conceptually, each mode returned by the hook represents a set of vector
> modes, with the set containing one member for each supported element
> type.  The idea is to represent the set using the member with the
> smallest element type, preferring integer modes over floating-point
> modes in the event of a tie.  So using a floating-point mode as the
> representative mode is fine if floating-point elements are the smallest
> (or only) supported element type.
>
>> Note that your x86 change likely disables word_mode vectorization with
>> -mno-sse?
>
> No, that still works, because...
>
>> That is, how do we represent GPR vectorization "size" here?
>> The preferred SIMD mode hook may return an integer mode,
>> are non-vector modes OK for autovectorize_vector_modes?
>
> ...at least with all current targets, preferred_simd_mode is only
> an integer mode if the target has no "real" vectorisation support
> for that element type.  There's no need to handle that case in
> autovectorize_vector_sizes/modes, and e.g. the x86 hook does nothing
> when SSE is disabled.
>
> So while preferred_simd_mode can continue to return integer modes,
> autovectorize_vector_modes always returns vector modes.
>
> This patch just treats the mode as an alternative way of specifying
> the vector size.  11/n then tries to use related_vector_mode to choose
> the vector mode for each element type instead.  But 11/n only uses
> related_vector_mode if vec_info::vector_mode is a vector mode.  If it's
> an integer mode (as for -mno-sse), or if related_vector_mode fails to
> find a vector mode, then we still fall back to mode_for_vector and so
> pick an integer mode in the same cases as before.
>
> Thanks,
> Richard
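
To make the representative-mode convention described above concrete, here is an
illustrative (non-normative) sketch of what a single entry in the returned list
stands for; the mode families shown are the AArch64 Advanced SIMD ones:

```
/* Illustrative only -- not part of the patch.  Each mode pushed by
   targetm.vectorize.autovectorize_vector_modes represents a whole
   family of vector modes, one per supported element type, named by
   the member with the smallest element type:

     V16QImode  ->  { V16QI, V8HI, V4SI, V2DI, V8HF, V4SF, V2DF }
     V8QImode   ->  { V8QI, V4HI, V2SI, V4HF, V2SF }

   A later patch recovers the family member for a given element mode via
   the related_mode hook, e.g.:

     related_vector_mode (V16QImode, SFmode)  ->  V4SFmode  */
```
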

2019-10-24  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* target.h (vector_sizes, auto_vector_sizes): Delete.
	(vector_modes, auto_vector_modes): New typedefs.
	* target.def (autovectorize_vector_sizes): Replace with...
	(autovectorize_vector_modes): ...this new hook.
	* doc/tm.texi.in (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES):
	Replace with...
	(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): ...this new hook.
	* doc/tm.texi: Regenerate.
	* targhooks.h (default_autovectorize_vector_sizes): Delete.
	(default_autovectorize_vector_modes): New function.
	* targhooks.c (default_autovectorize_vector_sizes): Delete.
	(default_autovectorize_vector_modes): New function.
	* omp-general.c (omp_max_vf): Use autovectorize_vector_modes instead
	of autovectorize_vector_sizes.  Use the number of units in the mode
	to calculate the maximum VF.
	* omp-low.c (omp_clause_aligned_alignment): Use
	autovectorize_vector_modes instead of autovectorize_vector_sizes.
	Use a loop based on related_mode to iterate through all supported
	vector modes for a given scalar mode.
	* optabs-query.c (can_vec_mask_load_store_p): Use
	autovectorize_vector_modes instead of autovectorize_vector_sizes.
	* tree-vect-loop.c (vect_analyze_loop, vect_transform_loop): Likewise.
	* tree-vect-slp.c (vect_slp_bb_region): Likewise.
	* config/aarch64/aarch64.c (aarch64_autovectorize_vector_sizes):
	Replace with...
	(aarch64_autovectorize_vector_modes): ...this new function.
	(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
	(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
	* config/arc/arc.c (arc_autovectorize_vector_sizes): Replace with...
	(arc_autovectorize_vector_modes): ...this new function.
	(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
	(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
	* config/arm/arm.c (arm_autovectorize_vector_sizes): Replace with...
	(arm_autovectorize_vector_modes): ...this new function.
	(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
	(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
	* config/i386/i386.c (ix86_autovectorize_vector_sizes): Replace with...
	(ix86_autovectorize_vector_modes): ...this new function.
	(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
	(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
	* config/mips/mips.c (mips_autovectorize_vector_sizes): Replace with...
	(mips_autovectorize_vector_modes): ...this new function.
	(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
	(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.

Index: gcc/target.h
===================================================================
--- gcc/target.h	2019-11-05 10:31:39.017103686 +0000
+++ gcc/target.h	2019-11-05 10:34:34.443861724 +0000
@@ -205,11 +205,11 @@ enum vect_cost_model_location {
 class vec_perm_indices;
 
 /* The type to use for lists of vector sizes.  */
-typedef vec<poly_uint64> vector_sizes;
+typedef vec<machine_mode> vector_modes;
 
 /* Same, but can be used to construct local lists that are
    automatically freed.  */
-typedef auto_vec<poly_uint64, 8> auto_vector_sizes;
+typedef auto_vec<machine_mode, 8> auto_vector_modes;
 
 /* First argument of targetm.omp.device_kind_arch_isa.  */
 enum omp_device_kind_arch_isa {
Index: gcc/target.def
===================================================================
--- gcc/target.def	2019-11-05 10:31:39.017103686 +0000
+++ gcc/target.def	2019-11-05 10:34:34.443861724 +0000
@@ -1909,20 +1909,28 @@ reached.  The default is @var{mode} whic
 /* Returns a mask of vector sizes to iterate over when auto-vectorizing
    after processing the preferred one derived from preferred_simd_mode.  */
 DEFHOOK
-(autovectorize_vector_sizes,
- "If the mode returned by @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE} is not\n\
-the only one that is worth considering, this hook should add all suitable\n\
-vector sizes to @var{sizes}, in order of decreasing preference.  The first\n\
-one should be the size of @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE}.\n\
-If @var{all} is true, add suitable vector sizes even when they are generally\n\
+(autovectorize_vector_modes,
+ "If using the mode returned by @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE}\n\
+is not the only approach worth considering, this hook should add one mode to\n\
+@var{modes} for each useful alternative approach.  These modes are then\n\
+passed to @code{TARGET_VECTORIZE_RELATED_MODE} to obtain the vector mode\n\
+for a given element mode.\n\
+\n\
+The modes returned in @var{modes} should use the smallest element mode\n\
+possible for the vectorization approach that they represent, preferring\n\
+integer modes over floating-point modes in the event of a tie.  The first\n\
+mode should be the @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE} for its\n\
+element mode.\n\
+\n\
+If @var{all} is true, add suitable vector modes even when they are generally\n\
 not expected to be worthwhile.\n\
 \n\
 The hook does not need to do anything if the vector returned by\n\
 @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE} is the only one relevant\n\
 for autovectorization.  The default implementation does nothing.",
  void,
- (vector_sizes *sizes, bool all),
- default_autovectorize_vector_sizes)
+ (vector_modes *modes, bool all),
+ default_autovectorize_vector_modes)
 
 DEFHOOK
 (related_mode,
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	2019-11-05 10:31:39.013103714 +0000
+++ gcc/doc/tm.texi.in	2019-11-05 10:34:34.443861724 +0000
@@ -4179,7 +4179,7 @@ address;  but often a machine-dependent
 
 @hook TARGET_VECTORIZE_SPLIT_REDUCTION
 
-@hook TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
+@hook TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES
 
 @hook TARGET_VECTORIZE_RELATED_MODE
 
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	2019-11-05 10:31:39.013103714 +0000
+++ gcc/doc/tm.texi	2019-11-05 10:34:34.443861724 +0000
@@ -6016,12 +6016,20 @@ against lower halves of vectors recursiv
 reached.  The default is @var{mode} which means no splitting.
 @end deftypefn
 
-@deftypefn {Target Hook} void TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES (vector_sizes *@var{sizes}, bool @var{all})
-If the mode returned by @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE} is not
-the only one that is worth considering, this hook should add all suitable
-vector sizes to @var{sizes}, in order of decreasing preference.  The first
-one should be the size of @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE}.
-If @var{all} is true, add suitable vector sizes even when they are generally
+@deftypefn {Target Hook} void TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES (vector_modes *@var{modes}, bool @var{all})
+If using the mode returned by @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE}
+is not the only approach worth considering, this hook should add one mode to
+@var{modes} for each useful alternative approach.  These modes are then
+passed to @code{TARGET_VECTORIZE_RELATED_MODE} to obtain the vector mode
+for a given element mode.
+
+The modes returned in @var{modes} should use the smallest element mode
+possible for the vectorization approach that they represent, preferring
+integer modes over floating-point modes in the event of a tie.  The first
+mode should be the @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE} for its
+element mode.
+
+If @var{all} is true, add suitable vector modes even when they are generally
 not expected to be worthwhile.
 
 The hook does not need to do anything if the vector returned by
Index: gcc/targhooks.h
===================================================================
--- gcc/targhooks.h	2019-11-05 10:31:39.017103686 +0000
+++ gcc/targhooks.h	2019-11-05 10:34:34.447861696 +0000
@@ -113,7 +113,7 @@ default_builtin_support_vector_misalignm
 					     int, bool);
 extern machine_mode default_preferred_simd_mode (scalar_mode mode);
 extern machine_mode default_split_reduction (machine_mode);
-extern void default_autovectorize_vector_sizes (vector_sizes *, bool);
+extern void default_autovectorize_vector_modes (vector_modes *, bool);
 extern opt_machine_mode default_vectorize_related_mode (machine_mode,
 							scalar_mode,
 							poly_uint64);
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c	2019-11-05 10:31:39.017103686 +0000
+++ gcc/targhooks.c	2019-11-05 10:34:34.447861696 +0000
@@ -1299,11 +1299,10 @@ default_split_reduction (machine_mode mo
   return mode;
 }
 
-/* By default only the size derived from the preferred vector mode
-   is tried.  */
+/* By default only the preferred vector mode is tried.  */
 
 void
-default_autovectorize_vector_sizes (vector_sizes *, bool)
+default_autovectorize_vector_modes (vector_modes *, bool)
 {
 }
 
Index: gcc/omp-general.c
===================================================================
--- gcc/omp-general.c	2019-11-05 10:31:39.013103714 +0000
+++ gcc/omp-general.c	2019-11-05 10:34:34.443861724 +0000
@@ -509,13 +509,16 @@ omp_max_vf (void)
 	  && global_options_set.x_flag_tree_loop_vectorize))
     return 1;
 
-  auto_vector_sizes sizes;
-  targetm.vectorize.autovectorize_vector_sizes (&sizes, true);
-  if (!sizes.is_empty ())
+  auto_vector_modes modes;
+  targetm.vectorize.autovectorize_vector_modes (&modes, true);
+  if (!modes.is_empty ())
     {
       poly_uint64 vf = 0;
-      for (unsigned int i = 0; i < sizes.length (); ++i)
-	vf = ordered_max (vf, sizes[i]);
+      for (unsigned int i = 0; i < modes.length (); ++i)
+	/* The returned modes use the smallest element size (and thus
+	   the largest nunits) for the vectorization approach that they
+	   represent.  */
+	vf = ordered_max (vf, GET_MODE_NUNITS (modes[i]));
       return vf;
     }
 
Index: gcc/omp-low.c
===================================================================
--- gcc/omp-low.c	2019-11-05 10:31:39.013103714 +0000
+++ gcc/omp-low.c	2019-11-05 10:34:34.443861724 +0000
@@ -3959,11 +3959,8 @@ omp_clause_aligned_alignment (tree claus
   /* Otherwise return implementation defined alignment.  */
   unsigned int al = 1;
   opt_scalar_mode mode_iter;
-  auto_vector_sizes sizes;
-  targetm.vectorize.autovectorize_vector_sizes (&sizes, true);
-  poly_uint64 vs = 0;
-  for (unsigned int i = 0; i < sizes.length (); ++i)
-    vs = ordered_max (vs, sizes[i]);
+  auto_vector_modes modes;
+  targetm.vectorize.autovectorize_vector_modes (&modes, true);
   static enum mode_class classes[]
     = { MODE_INT, MODE_VECTOR_INT, MODE_FLOAT, MODE_VECTOR_FLOAT };
   for (int i = 0; i < 4; i += 2)
@@ -3974,19 +3971,18 @@ omp_clause_aligned_alignment (tree claus
 	machine_mode vmode = targetm.vectorize.preferred_simd_mode (mode);
 	if (GET_MODE_CLASS (vmode) != classes[i + 1])
 	  continue;
-	while (maybe_ne (vs, 0U)
-	       && known_lt (GET_MODE_SIZE (vmode), vs)
-	       && GET_MODE_2XWIDER_MODE (vmode).exists ())
-	  vmode = GET_MODE_2XWIDER_MODE (vmode).require ();
+	machine_mode alt_vmode;
+	for (unsigned int j = 0; j < modes.length (); ++j)
+	  if (related_vector_mode (modes[j], mode).exists (&alt_vmode)
+	      && known_ge (GET_MODE_SIZE (alt_vmode), GET_MODE_SIZE (vmode)))
+	    vmode = alt_vmode;
 
 	tree type = lang_hooks.types.type_for_mode (mode, 1);
 	if (type == NULL_TREE || TYPE_MODE (type) != mode)
 	  continue;
-	poly_uint64 nelts = exact_div (GET_MODE_SIZE (vmode),
-				       GET_MODE_SIZE (mode));
-	type = build_vector_type (type, nelts);
-	if (TYPE_MODE (type) != vmode)
-	  continue;
+	type = build_vector_type_for_mode (type, vmode);
+	/* The functions above are not allowed to return invalid modes.  */
+	gcc_assert (TYPE_MODE (type) == vmode);
 	if (TYPE_ALIGN_UNIT (type) > al)
 	  al = TYPE_ALIGN_UNIT (type);
       }
Index: gcc/optabs-query.c
===================================================================
--- gcc/optabs-query.c	2019-11-05 10:31:39.013103714 +0000
+++ gcc/optabs-query.c	2019-11-05 10:34:34.443861724 +0000
@@ -589,11 +589,11 @@ can_vec_mask_load_store_p (machine_mode
       && convert_optab_handler (op, vmode, mask_mode) != CODE_FOR_nothing)
     return true;
 
-  auto_vector_sizes vector_sizes;
-  targetm.vectorize.autovectorize_vector_sizes (&vector_sizes, true);
-  for (unsigned int i = 0; i < vector_sizes.length (); ++i)
+  auto_vector_modes vector_modes;
+  targetm.vectorize.autovectorize_vector_modes (&vector_modes, true);
+  for (unsigned int i = 0; i < vector_modes.length (); ++i)
     {
-      poly_uint64 cur = vector_sizes[i];
+      poly_uint64 cur = GET_MODE_SIZE (vector_modes[i]);
       poly_uint64 nunits;
       if (!multiple_p (cur, GET_MODE_SIZE (smode), &nunits))
 	continue;
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2019-11-05 10:31:39.017103686 +0000
+++ gcc/tree-vect-loop.c	2019-11-05 10:34:34.447861696 +0000
@@ -2364,12 +2364,12 @@ vect_analyze_loop_2 (loop_vec_info loop_
 opt_loop_vec_info
 vect_analyze_loop (class loop *loop, vec_info_shared *shared)
 {
-  auto_vector_sizes vector_sizes;
+  auto_vector_modes vector_modes;
 
   /* Autodetect first vector size we try.  */
-  targetm.vectorize.autovectorize_vector_sizes (&vector_sizes,
+  targetm.vectorize.autovectorize_vector_modes (&vector_modes,
 						loop->simdlen != 0);
-  unsigned int next_size = 0;
+  unsigned int mode_i = 0;
 
   DUMP_VECT_SCOPE ("analyze_loop_nest");
 
@@ -2388,7 +2388,7 @@ vect_analyze_loop (class loop *loop, vec
   unsigned n_stmts = 0;
   poly_uint64 autodetected_vector_size = 0;
   opt_loop_vec_info first_loop_vinfo = opt_loop_vec_info::success (NULL);
-  poly_uint64 next_vector_size = 0;
+  machine_mode next_vector_mode = VOIDmode;
   poly_uint64 lowest_th = 0;
   unsigned vectorized_loops = 0;
 
@@ -2407,7 +2407,7 @@ vect_analyze_loop (class loop *loop, vec
 	  gcc_checking_assert (first_loop_vinfo == NULL);
 	  return loop_vinfo;
 	}
-      loop_vinfo->vector_size = next_vector_size;
+      loop_vinfo->vector_size = GET_MODE_SIZE (next_vector_mode);
 
       bool fatal = false;
 
@@ -2415,7 +2415,7 @@ vect_analyze_loop (class loop *loop, vec
 	LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = first_loop_vinfo;
 
       res = vect_analyze_loop_2 (loop_vinfo, fatal, &n_stmts);
-      if (next_size == 0)
+      if (mode_i == 0)
 	autodetected_vector_size = loop_vinfo->vector_size;
 
       loop->aux = NULL;
@@ -2478,24 +2478,21 @@ vect_analyze_loop (class loop *loop, vec
 	    }
 	}
 
-      if (next_size < vector_sizes.length ()
-	  && known_eq (vector_sizes[next_size], autodetected_vector_size))
-	next_size += 1;
+      if (mode_i < vector_modes.length ()
+	  && known_eq (GET_MODE_SIZE (vector_modes[mode_i]),
+		       autodetected_vector_size))
+	mode_i += 1;
 
-      if (next_size == vector_sizes.length ()
+      if (mode_i == vector_modes.length ()
 	  || known_eq (autodetected_vector_size, 0U))
 	break;
 
       /* Try the next biggest vector size.  */
-      next_vector_size = vector_sizes[next_size++];
+      next_vector_mode = vector_modes[mode_i++];
       if (dump_enabled_p ())
-	{
-	  dump_printf_loc (MSG_NOTE, vect_location,
-			   "***** Re-trying analysis with "
-			   "vector size ");
-	  dump_dec (MSG_NOTE, next_vector_size);
-	  dump_printf (MSG_NOTE, "\n");
-	}
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "***** Re-trying analysis with vector mode %s\n",
+			 GET_MODE_NAME (next_vector_mode));
     }
 
   if (first_loop_vinfo)
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2019-11-05 10:31:39.021103658 +0000
+++ gcc/tree-vect-slp.c	2019-11-05 10:34:34.447861696 +0000
@@ -3171,12 +3171,12 @@ vect_slp_bb_region (gimple_stmt_iterator
 		    unsigned int n_stmts)
 {
   bb_vec_info bb_vinfo;
-  auto_vector_sizes vector_sizes;
+  auto_vector_modes vector_modes;
 
   /* Autodetect first vector size we try.  */
-  poly_uint64 next_vector_size = 0;
-  targetm.vectorize.autovectorize_vector_sizes (&vector_sizes, false);
-  unsigned int next_size = 0;
+  machine_mode next_vector_mode = VOIDmode;
+  targetm.vectorize.autovectorize_vector_modes (&vector_modes, false);
+  unsigned int mode_i = 0;
 
   vec_info_shared shared;
 
@@ -3193,7 +3193,7 @@ vect_slp_bb_region (gimple_stmt_iterator
 	bb_vinfo->shared->save_datarefs ();
       else
 	bb_vinfo->shared->check_datarefs ();
-      bb_vinfo->vector_size = next_vector_size;
+      bb_vinfo->vector_size = GET_MODE_SIZE (next_vector_mode);
 
       if (vect_slp_analyze_bb_1 (bb_vinfo, n_stmts, fatal)
 	  && dbg_cnt (vect_slp))
@@ -3220,17 +3220,18 @@ vect_slp_bb_region (gimple_stmt_iterator
 	  vectorized = true;
 	}
 
-      if (next_size == 0)
+      if (mode_i == 0)
 	autodetected_vector_size = bb_vinfo->vector_size;
 
       delete bb_vinfo;
 
-      if (next_size < vector_sizes.length ()
-	  && known_eq (vector_sizes[next_size], autodetected_vector_size))
-	next_size += 1;
+      if (mode_i < vector_modes.length ()
+	  && known_eq (GET_MODE_SIZE (vector_modes[mode_i]),
+		       autodetected_vector_size))
+	mode_i += 1;
 
       if (vectorized
-	  || next_size == vector_sizes.length ()
+	  || mode_i == vector_modes.length ()
 	  || known_eq (autodetected_vector_size, 0U)
 	  /* If vect_slp_analyze_bb_1 signaled that analysis for all
 	     vector sizes will fail do not bother iterating.  */
@@ -3238,15 +3239,11 @@ vect_slp_bb_region (gimple_stmt_iterator
 	return vectorized;
 
       /* Try the next biggest vector size.  */
-      next_vector_size = vector_sizes[next_size++];
+      next_vector_mode = vector_modes[mode_i++];
       if (dump_enabled_p ())
-	{
-	  dump_printf_loc (MSG_NOTE, vect_location,
-			   "***** Re-trying analysis with "
-			   "vector size ");
-	  dump_dec (MSG_NOTE, next_vector_size);
-	  dump_printf (MSG_NOTE, "\n");
-	}
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "***** Re-trying analysis with vector mode %s\n",
+			 GET_MODE_NAME (next_vector_mode));
     }
 }
 
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	2019-11-05 10:31:38.985103913 +0000
+++ gcc/config/aarch64/aarch64.c	2019-11-05 10:34:34.423861866 +0000
@@ -15926,12 +15926,12 @@ aarch64_preferred_simd_mode (scalar_mode
 /* Return a list of possible vector sizes for the vectorizer
    to iterate over.  */
 static void
-aarch64_autovectorize_vector_sizes (vector_sizes *sizes, bool)
+aarch64_autovectorize_vector_modes (vector_modes *modes, bool)
 {
   if (TARGET_SVE)
-    sizes->safe_push (BYTES_PER_SVE_VECTOR);
-  sizes->safe_push (16);
-  sizes->safe_push (8);
+    modes->safe_push (VNx16QImode);
+  modes->safe_push (V16QImode);
+  modes->safe_push (V8QImode);
 }
 
 /* Implement TARGET_MANGLE_TYPE.  */
@@ -21765,9 +21765,9 @@ #define TARGET_VECTORIZE_BUILTINS
 #define TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION \
   aarch64_builtin_vectorized_function
 
-#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
-#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
-  aarch64_autovectorize_vector_sizes
+#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES
+#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES \
+  aarch64_autovectorize_vector_modes
 
 #undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
 #define TARGET_ATOMIC_ASSIGN_EXPAND_FENV \
Index: gcc/config/arc/arc.c
===================================================================
--- gcc/config/arc/arc.c	2019-11-05 10:31:38.989103884 +0000
+++ gcc/config/arc/arc.c	2019-11-05 10:34:34.427861838 +0000
@@ -607,15 +607,15 @@ arc_preferred_simd_mode (scalar_mode mod
 }
 
 /* Implements target hook
-   TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES.  */
+   TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES.  */
 
 static void
-arc_autovectorize_vector_sizes (vector_sizes *sizes, bool)
+arc_autovectorize_vector_modes (vector_modes *modes, bool)
 {
   if (TARGET_PLUS_QMACW)
     {
-      sizes->quick_push (8);
-      sizes->quick_push (4);
+      modes->quick_push (V4HImode);
+      modes->quick_push (V2HImode);
     }
 }
 
@@ -726,8 +726,8 @@ #define TARGET_VECTOR_MODE_SUPPORTED_P a
 #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
 #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE arc_preferred_simd_mode
 
-#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
-#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES arc_autovectorize_vector_sizes
+#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES
+#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES arc_autovectorize_vector_modes
 
 #undef TARGET_CAN_USE_DOLOOP_P
 #define TARGET_CAN_USE_DOLOOP_P arc_can_use_doloop_p
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	2019-11-05 10:31:39.001103800 +0000
+++ gcc/config/arm/arm.c	2019-11-05 10:34:34.435861782 +0000
@@ -289,7 +289,7 @@ static bool arm_builtin_support_vector_m
 static void arm_conditional_register_usage (void);
 static enum flt_eval_method arm_excess_precision (enum excess_precision_type);
 static reg_class_t arm_preferred_rename_class (reg_class_t rclass);
-static void arm_autovectorize_vector_sizes (vector_sizes *, bool);
+static void arm_autovectorize_vector_modes (vector_modes *, bool);
 static int arm_default_branch_cost (bool, bool);
 static int arm_cortex_a5_branch_cost (bool, bool);
 static int arm_cortex_m_branch_cost (bool, bool);
@@ -522,9 +522,9 @@ #define TARGET_VECTOR_MODE_SUPPORTED_P a
 #define TARGET_ARRAY_MODE_SUPPORTED_P arm_array_mode_supported_p
 #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
 #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE arm_preferred_simd_mode
-#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
-#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
-  arm_autovectorize_vector_sizes
+#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES
+#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES \
+  arm_autovectorize_vector_modes
 
 #undef  TARGET_MACHINE_DEPENDENT_REORG
 #define TARGET_MACHINE_DEPENDENT_REORG arm_reorg
@@ -29016,12 +29016,12 @@ arm_vector_alignment (const_tree type)
 }
 
 static void
-arm_autovectorize_vector_sizes (vector_sizes *sizes, bool)
+arm_autovectorize_vector_modes (vector_modes *modes, bool)
 {
   if (!TARGET_NEON_VECTORIZE_DOUBLE)
     {
-      sizes->safe_push (16);
-      sizes->safe_push (8);
+      modes->safe_push (V16QImode);
+      modes->safe_push (V8QImode);
     }
 }
 
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	2019-11-05 10:31:39.005103774 +0000
+++ gcc/config/i386/i386.c	2019-11-05 10:34:34.435861782 +0000
@@ -21386,35 +21386,35 @@ ix86_preferred_simd_mode (scalar_mode mo
    256bit and 128bit vectors.  */
 
 static void
-ix86_autovectorize_vector_sizes (vector_sizes *sizes, bool all)
+ix86_autovectorize_vector_modes (vector_modes *modes, bool all)
 {
   if (TARGET_AVX512F && !TARGET_PREFER_AVX256)
     {
-      sizes->safe_push (64);
-      sizes->safe_push (32);
-      sizes->safe_push (16);
+      modes->safe_push (V64QImode);
+      modes->safe_push (V32QImode);
+      modes->safe_push (V16QImode);
     }
   else if (TARGET_AVX512F && all)
     {
-      sizes->safe_push (32);
-      sizes->safe_push (16);
-      sizes->safe_push (64);
+      modes->safe_push (V32QImode);
+      modes->safe_push (V16QImode);
+      modes->safe_push (V64QImode);
     }
   else if (TARGET_AVX && !TARGET_PREFER_AVX128)
     {
-      sizes->safe_push (32);
-      sizes->safe_push (16);
+      modes->safe_push (V32QImode);
+      modes->safe_push (V16QImode);
     }
   else if (TARGET_AVX && all)
     {
-      sizes->safe_push (16);
-      sizes->safe_push (32);
+      modes->safe_push (V16QImode);
+      modes->safe_push (V32QImode);
     }
   else if (TARGET_MMX_WITH_SSE)
-    sizes->safe_push (16);
+    modes->safe_push (V16QImode);
 
   if (TARGET_MMX_WITH_SSE)
-    sizes->safe_push (8);
+    modes->safe_push (V8QImode);
 }
 
 /* Implemenation of targetm.vectorize.get_mask_mode.  */
@@ -22953,9 +22953,9 @@ #define TARGET_VECTORIZE_PREFERRED_SIMD_
 #undef TARGET_VECTORIZE_SPLIT_REDUCTION
 #define TARGET_VECTORIZE_SPLIT_REDUCTION \
   ix86_split_reduction
-#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
-#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
-  ix86_autovectorize_vector_sizes
+#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES
+#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES \
+  ix86_autovectorize_vector_modes
 #undef TARGET_VECTORIZE_GET_MASK_MODE
 #define TARGET_VECTORIZE_GET_MASK_MODE ix86_get_mask_mode
 #undef TARGET_VECTORIZE_INIT_COST
Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	2019-11-05 10:31:39.009103742 +0000
+++ gcc/config/mips/mips.c	2019-11-05 10:34:34.439861754 +0000
@@ -13453,13 +13453,13 @@ mips_preferred_simd_mode (scalar_mode mo
   return word_mode;
 }
 
-/* Implement TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES.  */
+/* Implement TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES.  */
 
 static void
-mips_autovectorize_vector_sizes (vector_sizes *sizes, bool)
+mips_autovectorize_vector_modes (vector_modes *modes, bool)
 {
   if (ISA_HAS_MSA)
-    sizes->safe_push (16);
+    modes->safe_push (V16QImode);
 }
 
 /* Implement TARGET_INIT_LIBFUNCS.  */
@@ -22716,9 +22716,9 @@ #define TARGET_SCALAR_MODE_SUPPORTED_P m
 
 #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
 #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE mips_preferred_simd_mode
-#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
-#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
-  mips_autovectorize_vector_sizes
+#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES
+#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES \
+  mips_autovectorize_vector_modes
 
 #undef TARGET_INIT_BUILTINS
 #define TARGET_INIT_BUILTINS mips_init_builtins

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [8/n] Replace autovectorize_vector_sizes with autovectorize_vector_modes
  2019-10-30 16:33     ` Richard Sandiford
  2019-11-11 10:30       ` Richard Sandiford
@ 2019-11-11 14:33       ` Richard Biener
  2019-11-12 17:55         ` Richard Sandiford
  1 sibling, 1 reply; 48+ messages in thread
From: Richard Biener @ 2019-11-11 14:33 UTC (permalink / raw)
  To: Richard Biener, GCC Patches, Richard Sandiford

On Wed, Oct 30, 2019 at 4:58 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener <richard.guenther@gmail.com> writes:
> > On Fri, Oct 25, 2019 at 2:37 PM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> This is another patch in the series to remove the assumption that
> >> all modes involved in vectorisation have to be the same size.
> >> Rather than have the target provide a list of vector sizes,
> >> it makes the target provide a list of vector "approaches",
> >> with each approach represented by a mode.
> >>
> >> A later patch will pass this mode to targetm.vectorize.related_mode
> >> to get the vector mode for a given element mode.  Until then, the modes
> >> simply act as an alternative way of specifying the vector size.
> >
> > Is there a restriction to use integer vector modes for the hook
> > or would FP vector modes be OK as well?
>
> Conceptually, each mode returned by the hook represents a set of vector
> modes, with the set containing one member for each supported element
> type.  The idea is to represent the set using the member with the
> smallest element type, preferring integer modes over floating-point
> modes in the event of a tie.  So using a floating-point mode as the
> representative mode is fine if floating-point elements are the smallest
> (or only) supported element type.
>
> > Note that your x86 change likely disables word_mode vectorization with
> > -mno-sse?
>
> No, that still works, because...
>
> > That is, how do we represent GPR vectorization "size" here?
> > The preferred SIMD mode hook may return an integer mode,
> > are non-vector modes OK for autovectorize_vector_modes?
>
> ...at least with all current targets, preferred_simd_mode is only
> an integer mode if the target has no "real" vectorisation support
> for that element type.  There's no need to handle that case in
> autovectorize_vector_sizes/modes, and e.g. the x86 hook does nothing
> when SSE is disabled.
>
> So while preferred_simd_mode can continue to return integer modes,
> autovectorize_vector_modes always returns vector modes.

Hmm, I see.  IIRC I was playing with a patch for x86 that
enabled word-mode vectorization (64bits) for SSE before (I see
we don't do that at the moment).  The MMX-with-SSE has made
that somewhat moot but with iterating over modes we could
even make MMX-with-SSE (MMX modes) and word-mode vectors
coexist by allowing the hook to return V4SI, V2SI, DImode?
Because MMX-with-SSE might be more costly than word-mode
but can of course handle more cases.
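
Something like the following sketch is what I have in mind (hypothetical,
not part of the patch; it assumes the hook were allowed to mix vector and
integer modes, which as you say below it currently isn't):

```
/* Hypothetical sketch only: an x86 hook that would let SSE vectors,
   MMX-with-SSE vectors and word-mode GPR "vectors" coexist as
   candidate approaches.  DImode stands for word-mode vectorization
   in GPRs, so this relies on autovectorize_vector_modes accepting
   non-vector modes.  */
static void
ix86_autovectorize_vector_modes (vector_modes *modes, bool all)
{
  if (TARGET_SSE2)
    {
      modes->safe_push (V4SImode);	/* 128-bit SSE vectors.  */
      modes->safe_push (V2SImode);	/* 64-bit MMX-with-SSE vectors.  */
    }
  modes->safe_push (DImode);		/* word-mode vectors in GPRs.  */
}
```
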

So you say the above isn't supported and cannot be made supported?

Thanks,
Richard.

> This patch just treats the mode as an alternative way of specifying
> the vector size.  11/n then tries to use related_vector_mode to choose
> the vector mode for each element type instead.  But 11/n only uses
> related_vector_mode if vec_info::vector_mode is a vector mode.  If it's
> an integer mode (as for -mno-sse), or if related_vector_mode fails to
> find a vector mode, then we still fall back to mode_for_vector and so
> pick an integer mode in the same cases as before.
>
> Thanks,
> Richard

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [11/n] Support vectorisation with mixed vector sizes
  2019-11-06 12:38     ` Richard Sandiford
@ 2019-11-12  9:22       ` Richard Biener
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Biener @ 2019-11-12  9:22 UTC (permalink / raw)
  To: Richard Biener, GCC Patches, Richard Sandiford

On Wed, Nov 6, 2019 at 1:38 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener <richard.guenther@gmail.com> writes:
> > On Fri, Oct 25, 2019 at 2:43 PM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> After previous patches, it's now possible to make the vectoriser
> >> support multiple vector sizes in the same vector region, using
> >> related_vector_mode to pick the right vector mode for a given
> >> element mode.  No port yet takes advantage of this, but I have
> >> a follow-on patch for AArch64.
> >>
> >> This patch also seemed like a good opportunity to add some more dump
> >> messages: one to make it clear which vector size/mode was being used
> >> when analysis passed or failed, and another to say when we've decided
> >> to skip a redundant vector size/mode.
> >
> > OK.
> >
> > I wonder if, when we requested a specific size previously, we now
> > have to verify we got that constraint satisfied after the change.
> > Esp. the epilogue vectorization cases want to get V2DI
> > from V4DI.
> >
> >           sz /= 2;
> > -         vectype1 = get_vectype_for_scalar_type_and_size (scalar_type, sz);
> > +         vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
> > +                                                         scalar_type,
> > +                                                         sz / scalar_bytes);
> >
> > doesn't look like an improvement in readability to me there.
>
> Yeah, guess it isn't great.
>
> > Maybe re-formulating the whole code in terms of lanes instead of size
> > would make it easier to follow?
>
> OK, how about this version?  It still won't win awards, but it's at
> least a bit more readable.
>
> Tested as before.

OK (and sorry for the delay, looking for leftovers of the series now).

Thanks,
Richard.

> Richard
>
>
> 2019-11-06  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * machmode.h (opt_machine_mode::operator==): New function.
>         (opt_machine_mode::operator!=): Likewise.
>         * tree-vectorizer.h (vec_info::vector_mode): Update comment.
>         (get_related_vectype_for_scalar_type): Delete.
>         (get_vectype_for_scalar_type_and_size): Declare.
>         * tree-vect-slp.c (vect_slp_bb_region): Print dump messages to say
>         whether analysis passed or failed, and with what vector modes.
>         Use related_vector_mode to check whether trying a particular
>         vector mode would be redundant with the autodetected mode,
>         and print a dump message if we decide to skip it.
>         * tree-vect-loop.c (vect_analyze_loop): Likewise.
>         (vect_create_epilog_for_reduction): Use
>         get_related_vectype_for_scalar_type instead of
>         get_vectype_for_scalar_type_and_size.
>         * tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Replace
>         with...
>         (get_related_vectype_for_scalar_type): ...this new function.
>         Take a starting/"prevailing" vector mode rather than a vector size.
>         Take an optional nunits argument, with the same meaning as for
>         related_vector_mode.  Use related_vector_mode when not
>         auto-detecting a mode, falling back to mode_for_vector if no
>         target mode exists.
>         (get_vectype_for_scalar_type): Update accordingly.
>         (get_same_sized_vectype): Likewise.
>         * tree-vectorizer.c (get_vec_alignment_for_array_type): Likewise.
>
> Index: gcc/machmode.h
> ===================================================================
> --- gcc/machmode.h      2019-11-06 12:35:12.460201615 +0000
> +++ gcc/machmode.h      2019-11-06 12:35:27.972093472 +0000
> @@ -258,6 +258,9 @@ #define CLASS_HAS_WIDER_MODES_P(CLASS)
>    bool exists () const;
>    template<typename U> bool exists (U *) const;
>
> +  bool operator== (const T &m) const { return m_mode == m; }
> +  bool operator!= (const T &m) const { return m_mode != m; }
> +
>  private:
>    machine_mode m_mode;
>  };
> Index: gcc/tree-vectorizer.h
> ===================================================================
> --- gcc/tree-vectorizer.h       2019-11-06 12:35:12.764199495 +0000
> +++ gcc/tree-vectorizer.h       2019-11-06 12:35:27.976093444 +0000
> @@ -335,8 +335,9 @@ typedef std::pair<tree, tree> vec_object
>    /* Cost data used by the target cost model.  */
>    void *target_cost_data;
>
> -  /* If we've chosen a vector size for this vectorization region,
> -     this is one mode that has such a size, otherwise it is VOIDmode.  */
> +  /* The argument we should pass to related_vector_mode when looking up
> +     the vector mode for a scalar mode, or VOIDmode if we haven't yet
> +     made any decisions about which vector modes to use.  */
>    machine_mode vector_mode;
>
>  private:
> @@ -1609,8 +1610,9 @@ extern bool vect_can_advance_ivs_p (loop
>  extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
>
>  /* In tree-vect-stmts.c.  */
> +extern tree get_related_vectype_for_scalar_type (machine_mode, tree,
> +                                                poly_uint64 = 0);
>  extern tree get_vectype_for_scalar_type (vec_info *, tree);
> -extern tree get_vectype_for_scalar_type_and_size (tree, poly_uint64);
>  extern tree get_mask_type_for_scalar_type (vec_info *, tree);
>  extern tree get_same_sized_vectype (tree, tree);
>  extern bool vect_get_loop_mask_type (loop_vec_info);
> Index: gcc/tree-vect-slp.c
> ===================================================================
> --- gcc/tree-vect-slp.c 2019-11-06 12:35:12.760199523 +0000
> +++ gcc/tree-vect-slp.c 2019-11-06 12:35:27.972093472 +0000
> @@ -3202,7 +3202,12 @@ vect_slp_bb_region (gimple_stmt_iterator
>           && dbg_cnt (vect_slp))
>         {
>           if (dump_enabled_p ())
> -           dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB part\n");
> +           {
> +             dump_printf_loc (MSG_NOTE, vect_location,
> +                              "***** Analysis succeeded with vector mode"
> +                              " %s\n", GET_MODE_NAME (bb_vinfo->vector_mode));
> +             dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB part\n");
> +           }
>
>           bb_vinfo->shared->check_datarefs ();
>           vect_schedule_slp (bb_vinfo);
> @@ -3222,6 +3227,13 @@ vect_slp_bb_region (gimple_stmt_iterator
>
>           vectorized = true;
>         }
> +      else
> +       {
> +         if (dump_enabled_p ())
> +           dump_printf_loc (MSG_NOTE, vect_location,
> +                            "***** Analysis failed with vector mode %s\n",
> +                            GET_MODE_NAME (bb_vinfo->vector_mode));
> +       }
>
>        if (mode_i == 0)
>         autodetected_vector_mode = bb_vinfo->vector_mode;
> @@ -3229,9 +3241,22 @@ vect_slp_bb_region (gimple_stmt_iterator
>        delete bb_vinfo;
>
>        if (mode_i < vector_modes.length ()
> -         && known_eq (GET_MODE_SIZE (vector_modes[mode_i]),
> -                      GET_MODE_SIZE (autodetected_vector_mode)))
> -       mode_i += 1;
> +         && VECTOR_MODE_P (autodetected_vector_mode)
> +         && (related_vector_mode (vector_modes[mode_i],
> +                                  GET_MODE_INNER (autodetected_vector_mode))
> +             == autodetected_vector_mode)
> +         && (related_vector_mode (autodetected_vector_mode,
> +                                  GET_MODE_INNER (vector_modes[mode_i]))
> +             == vector_modes[mode_i]))
> +       {
> +         if (dump_enabled_p ())
> +           dump_printf_loc (MSG_NOTE, vect_location,
> +                            "***** Skipping vector mode %s, which would"
> +                            " repeat the analysis for %s\n",
> +                            GET_MODE_NAME (vector_modes[mode_i]),
> +                            GET_MODE_NAME (autodetected_vector_mode));
> +         mode_i += 1;
> +       }
>
>        if (vectorized
>           || mode_i == vector_modes.length ()
> Index: gcc/tree-vect-loop.c
> ===================================================================
> --- gcc/tree-vect-loop.c        2019-11-06 12:35:12.756199552 +0000
> +++ gcc/tree-vect-loop.c        2019-11-06 12:35:27.972093472 +0000
> @@ -2417,6 +2417,17 @@ vect_analyze_loop (class loop *loop, vec
>        res = vect_analyze_loop_2 (loop_vinfo, fatal, &n_stmts);
>        if (mode_i == 0)
>         autodetected_vector_mode = loop_vinfo->vector_mode;
> +      if (dump_enabled_p ())
> +       {
> +         if (res)
> +           dump_printf_loc (MSG_NOTE, vect_location,
> +                            "***** Analysis succeeded with vector mode %s\n",
> +                            GET_MODE_NAME (loop_vinfo->vector_mode));
> +         else
> +           dump_printf_loc (MSG_NOTE, vect_location,
> +                            "***** Analysis failed with vector mode %s\n",
> +                            GET_MODE_NAME (loop_vinfo->vector_mode));
> +       }
>
>        loop->aux = NULL;
>        if (res)
> @@ -2479,9 +2490,22 @@ vect_analyze_loop (class loop *loop, vec
>         }
>
>        if (mode_i < vector_modes.length ()
> -         && known_eq (GET_MODE_SIZE (vector_modes[mode_i]),
> -                      GET_MODE_SIZE (autodetected_vector_mode)))
> -       mode_i += 1;
> +         && VECTOR_MODE_P (autodetected_vector_mode)
> +         && (related_vector_mode (vector_modes[mode_i],
> +                                  GET_MODE_INNER (autodetected_vector_mode))
> +             == autodetected_vector_mode)
> +         && (related_vector_mode (autodetected_vector_mode,
> +                                  GET_MODE_INNER (vector_modes[mode_i]))
> +             == vector_modes[mode_i]))
> +       {
> +         if (dump_enabled_p ())
> +           dump_printf_loc (MSG_NOTE, vect_location,
> +                            "***** Skipping vector mode %s, which would"
> +                            " repeat the analysis for %s\n",
> +                            GET_MODE_NAME (vector_modes[mode_i]),
> +                            GET_MODE_NAME (autodetected_vector_mode));
> +         mode_i += 1;
> +       }
>
>        if (mode_i == vector_modes.length ()
>           || autodetected_vector_mode == VOIDmode)
> @@ -4870,13 +4894,15 @@ vect_create_epilog_for_reduction (stmt_v
>          in a vector mode of smaller size and first reduce upper/lower
>          halves against each other.  */
>        enum machine_mode mode1 = mode;
> -      unsigned sz = tree_to_uhwi (TYPE_SIZE_UNIT (vectype));
> -      unsigned sz1 = sz;
> +      unsigned nunits = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
> +      unsigned nunits1 = nunits;
>        if (!slp_reduc
>           && (mode1 = targetm.vectorize.split_reduction (mode)) != mode)
> -       sz1 = GET_MODE_SIZE (mode1).to_constant ();
> +       nunits1 = GET_MODE_NUNITS (mode1).to_constant ();
>
> -      tree vectype1 = get_vectype_for_scalar_type_and_size (scalar_type, sz1);
> +      tree vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
> +                                                          scalar_type,
> +                                                          nunits1);
>        reduce_with_shift = have_whole_vector_shift (mode1);
>        if (!VECTOR_MODE_P (mode1))
>         reduce_with_shift = false;
> @@ -4890,11 +4916,13 @@ vect_create_epilog_for_reduction (stmt_v
>        /* First reduce the vector to the desired vector size we should
>          do shift reduction on by combining upper and lower halves.  */
>        new_temp = new_phi_result;
> -      while (sz > sz1)
> +      while (nunits > nunits1)
>         {
>           gcc_assert (!slp_reduc);
> -         sz /= 2;
> -         vectype1 = get_vectype_for_scalar_type_and_size (scalar_type, sz);
> +         nunits /= 2;
> +         vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
> +                                                         scalar_type, nunits);
> +         unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
>
>           /* The target has to make sure we support lowpart/highpart
>              extraction, either via direct vector extract or through
> @@ -4919,15 +4947,14 @@ vect_create_epilog_for_reduction (stmt_v
>                   = gimple_build_assign (dst2, BIT_FIELD_REF,
>                                          build3 (BIT_FIELD_REF, vectype1,
>                                                  new_temp, TYPE_SIZE (vectype1),
> -                                                bitsize_int (sz * BITS_PER_UNIT)));
> +                                                bitsize_int (bitsize)));
>               gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
>             }
>           else
>             {
>               /* Extract via punning to appropriately sized integer mode
>                  vector.  */
> -             tree eltype = build_nonstandard_integer_type (sz * BITS_PER_UNIT,
> -                                                           1);
> +             tree eltype = build_nonstandard_integer_type (bitsize, 1);
>               tree etype = build_vector_type (eltype, 2);
>               gcc_assert (convert_optab_handler (vec_extract_optab,
>                                                  TYPE_MODE (etype),
> @@ -4956,7 +4983,7 @@ vect_create_epilog_for_reduction (stmt_v
>                   = gimple_build_assign (tem, BIT_FIELD_REF,
>                                          build3 (BIT_FIELD_REF, eltype,
>                                                  new_temp, TYPE_SIZE (eltype),
> -                                                bitsize_int (sz * BITS_PER_UNIT)));
> +                                                bitsize_int (bitsize)));
>               gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
>               dst2 =  make_ssa_name (vectype1);
>               epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2019-11-06 12:35:12.796199272 +0000
> +++ gcc/tree-vect-stmts.c       2019-11-06 12:35:27.976093444 +0000
> @@ -11097,18 +11097,28 @@ vect_remove_stores (stmt_vec_info first_
>      }
>  }
>
> -/* Function get_vectype_for_scalar_type_and_size.
> -
> -   Returns the vector type corresponding to SCALAR_TYPE  and SIZE as supported
> -   by the target.  */
> +/* If NUNITS is nonzero, return a vector type that contains NUNITS
> +   elements of type SCALAR_TYPE, or null if the target doesn't support
> +   such a type.
> +
> +   If NUNITS is zero, return a vector type that contains elements of
> +   type SCALAR_TYPE, choosing whichever vector size the target prefers.
> +
> +   If PREVAILING_MODE is VOIDmode, we have not yet chosen a vector mode
> +   for this vectorization region and want to "autodetect" the best choice.
> +   Otherwise, PREVAILING_MODE is a previously-chosen vector TYPE_MODE
> +   and we want the new type to be interoperable with it.   PREVAILING_MODE
> +   in this case can be a scalar integer mode or a vector mode; when it
> +   is a vector mode, the function acts like a tree-level version of
> +   related_vector_mode.  */
>
>  tree
> -get_vectype_for_scalar_type_and_size (tree scalar_type, poly_uint64 size)
> +get_related_vectype_for_scalar_type (machine_mode prevailing_mode,
> +                                    tree scalar_type, poly_uint64 nunits)
>  {
>    tree orig_scalar_type = scalar_type;
>    scalar_mode inner_mode;
>    machine_mode simd_mode;
> -  poly_uint64 nunits;
>    tree vectype;
>
>    if (!is_int_mode (TYPE_MODE (scalar_type), &inner_mode)
> @@ -11148,10 +11158,11 @@ get_vectype_for_scalar_type_and_size (tr
>    if (scalar_type == NULL_TREE)
>      return NULL_TREE;
>
> -  /* If no size was supplied use the mode the target prefers.   Otherwise
> -     lookup a vector mode of the specified size.  */
> -  if (known_eq (size, 0U))
> +  /* If no prevailing mode was supplied, use the mode the target prefers.
> +     Otherwise lookup a vector mode based on the prevailing mode.  */
> +  if (prevailing_mode == VOIDmode)
>      {
> +      gcc_assert (known_eq (nunits, 0U));
>        simd_mode = targetm.vectorize.preferred_simd_mode (inner_mode);
>        if (SCALAR_INT_MODE_P (simd_mode))
>         {
> @@ -11167,9 +11178,19 @@ get_vectype_for_scalar_type_and_size (tr
>             return NULL_TREE;
>         }
>      }
> -  else if (!multiple_p (size, nbytes, &nunits)
> -          || !mode_for_vector (inner_mode, nunits).exists (&simd_mode))
> -    return NULL_TREE;
> +  else if (SCALAR_INT_MODE_P (prevailing_mode)
> +          || !related_vector_mode (prevailing_mode,
> +                                   inner_mode, nunits).exists (&simd_mode))
> +    {
> +      /* Fall back to using mode_for_vector, mostly in the hope of being
> +        able to use an integer mode.  */
> +      if (known_eq (nunits, 0U)
> +         && !multiple_p (GET_MODE_SIZE (prevailing_mode), nbytes, &nunits))
> +       return NULL_TREE;
> +
> +      if (!mode_for_vector (inner_mode, nunits).exists (&simd_mode))
> +       return NULL_TREE;
> +    }
>
>    vectype = build_vector_type_for_mode (scalar_type, simd_mode);
>
> @@ -11197,9 +11218,8 @@ get_vectype_for_scalar_type_and_size (tr
>  tree
>  get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type)
>  {
> -  tree vectype;
> -  poly_uint64 vector_size = GET_MODE_SIZE (vinfo->vector_mode);
> -  vectype = get_vectype_for_scalar_type_and_size (scalar_type, vector_size);
> +  tree vectype = get_related_vectype_for_scalar_type (vinfo->vector_mode,
> +                                                     scalar_type);
>    if (vectype && vinfo->vector_mode == VOIDmode)
>      vinfo->vector_mode = TYPE_MODE (vectype);
>    return vectype;
> @@ -11232,8 +11252,13 @@ get_same_sized_vectype (tree scalar_type
>    if (VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type))
>      return truth_type_for (vector_type);
>
> -  return get_vectype_for_scalar_type_and_size
> -          (scalar_type, GET_MODE_SIZE (TYPE_MODE (vector_type)));
> +  poly_uint64 nunits;
> +  if (!multiple_p (GET_MODE_SIZE (TYPE_MODE (vector_type)),
> +                  GET_MODE_SIZE (TYPE_MODE (scalar_type)), &nunits))
> +    return NULL_TREE;
> +
> +  return get_related_vectype_for_scalar_type (TYPE_MODE (vector_type),
> +                                             scalar_type, nunits);
>  }
>
>  /* Function vect_is_simple_use.
> Index: gcc/tree-vectorizer.c
> ===================================================================
> --- gcc/tree-vectorizer.c       2019-11-06 12:35:12.764199495 +0000
> +++ gcc/tree-vectorizer.c       2019-11-06 12:35:27.976093444 +0000
> @@ -1359,7 +1359,7 @@ get_vec_alignment_for_array_type (tree t
>    poly_uint64 array_size, vector_size;
>
>    tree scalar_type = strip_array_types (type);
> -  tree vectype = get_vectype_for_scalar_type_and_size (scalar_type, 0);
> +  tree vectype = get_related_vectype_for_scalar_type (VOIDmode, scalar_type);
>    if (!vectype
>        || !poly_int_tree_p (TYPE_SIZE (type), &array_size)
>        || !poly_int_tree_p (TYPE_SIZE (vectype), &vector_size)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [11a/n] Avoid retrying with the same vector modes
  2019-11-06 12:47             ` Richard Sandiford
@ 2019-11-12  9:25               ` Richard Biener
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Biener @ 2019-11-12  9:25 UTC (permalink / raw)
  To: Richard Biener, GCC Patches, Richard Sandiford

On Wed, Nov 6, 2019 at 1:47 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener <richard.guenther@gmail.com> writes:
> > On Wed, Nov 6, 2019 at 12:02 PM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> Richard Biener <richard.guenther@gmail.com> writes:
> >> > On Wed, Nov 6, 2019 at 11:21 AM Richard Sandiford
> >> > <richard.sandiford@arm.com> wrote:
> >> >>
> >> >> Richard Biener <richard.guenther@gmail.com> writes:
> >> >> > On Tue, Nov 5, 2019 at 9:25 PM Richard Sandiford
> >> >> > <richard.sandiford@arm.com> wrote:
> >> >> >>
> >> >> >> Patch 12/n makes the AArch64 port add four entries to
> >> >> >> autovectorize_vector_modes.  Each entry describes a different
> >> >> >> vector mode assignment for vector code that mixes 8-bit, 16-bit,
> >> >> >> 32-bit and 64-bit elements.  But if (as usual) the vector code has
> >> >> >> fewer element sizes than that, we could end up trying the same
> >> >> >> combination of vector modes multiple times.  This patch adds a
> >> >> >> check to prevent that.
> >> >> >>
> >> >> >> As before: each patch tested individually on aarch64-linux-gnu and the
> >> >> >> series as a whole on x86_64-linux-gnu.
> >> >> >>
> >> >> >>
> >> >> >> 2019-11-04  Richard Sandiford  <richard.sandiford@arm.com>
> >> >> >>
> >> >> >> gcc/
> >> >> >>         * tree-vectorizer.h (vec_info::mode_set): New typedef.
> >> >> >>         (vec_info::used_vector_mode): New member variable.
> >> >> >>         (vect_chooses_same_modes_p): Declare.
> >> >> >>         * tree-vect-stmts.c (get_vectype_for_scalar_type): Record each
> >> >> >>         chosen vector mode in vec_info::used_vector_mode.
> >> >> >>         (vect_chooses_same_modes_p): New function.
> >> >> >>         * tree-vect-loop.c (vect_analyze_loop): Use it to avoid trying
> >> >> >>         the same vector statements multiple times.
> >> >> >>         * tree-vect-slp.c (vect_slp_bb_region): Likewise.
> >> >> >>
> >> >> >> Index: gcc/tree-vectorizer.h
> >> >> >> ===================================================================
> >> >> >> --- gcc/tree-vectorizer.h       2019-11-05 10:48:11.246092351 +0000
> >> >> >> +++ gcc/tree-vectorizer.h       2019-11-05 10:57:41.662071145 +0000
> >> >> >> @@ -298,6 +298,7 @@ typedef std::pair<tree, tree> vec_object
> >> >> >>  /* Vectorizer state common between loop and basic-block vectorization.  */
> >> >> >>  class vec_info {
> >> >> >>  public:
> >> >> >> +  typedef hash_set<int_hash<machine_mode, E_VOIDmode, E_BLKmode> > mode_set;
> >> >> >>    enum vec_kind { bb, loop };
> >> >> >>
> >> >> >>    vec_info (vec_kind, void *, vec_info_shared *);
> >> >> >> @@ -335,6 +336,9 @@ typedef std::pair<tree, tree> vec_object
> >> >> >>    /* Cost data used by the target cost model.  */
> >> >> >>    void *target_cost_data;
> >> >> >>
> >> >> >> +  /* The set of vector modes used in the vectorized region.  */
> >> >> >> +  mode_set used_vector_modes;
> >> >> >> +
> >> >> >>    /* The argument we should pass to related_vector_mode when looking up
> >> >> >>       the vector mode for a scalar mode, or VOIDmode if we haven't yet
> >> >> >>       made any decisions about which vector modes to use.  */
> >> >> >> @@ -1615,6 +1619,7 @@ extern tree get_related_vectype_for_scal
> >> >> >>  extern tree get_vectype_for_scalar_type (vec_info *, tree);
> >> >> >>  extern tree get_mask_type_for_scalar_type (vec_info *, tree);
> >> >> >>  extern tree get_same_sized_vectype (tree, tree);
> >> >> >> +extern bool vect_chooses_same_modes_p (vec_info *, machine_mode);
> >> >> >>  extern bool vect_get_loop_mask_type (loop_vec_info);
> >> >> >>  extern bool vect_is_simple_use (tree, vec_info *, enum vect_def_type *,
> >> >> >>                                 stmt_vec_info * = NULL, gimple ** = NULL);
> >> >> >> Index: gcc/tree-vect-stmts.c
> >> >> >> ===================================================================
> >> >> >> --- gcc/tree-vect-stmts.c       2019-11-05 10:48:11.242092379 +0000
> >> >> >> +++ gcc/tree-vect-stmts.c       2019-11-05 10:57:41.662071145 +0000
> >> >> >> @@ -11235,6 +11235,10 @@ get_vectype_for_scalar_type (vec_info *v
> >> >> >>                                                       scalar_type);
> >> >> >>    if (vectype && vinfo->vector_mode == VOIDmode)
> >> >> >>      vinfo->vector_mode = TYPE_MODE (vectype);
> >> >> >> +
> >> >> >> +  if (vectype)
> >> >> >> +    vinfo->used_vector_modes.add (TYPE_MODE (vectype));
> >> >> >> +
> >> >> >
> >> >> > Do we actually end up _using_ all types returned by this function?
> >> >>
> >> >> No, not all of them, so it's a bit crude.  E.g. some types might end up
> >> >> not being relevant after pattern recognition, or after we've made a
> >> >> final decision about which parts of an address calculation to include
> >> >> in a gather or scatter op.  So we can still end up retrying the same
> >> >> thing even after the patch.
> >> >>
> >> >> The problem is that we're trying to avoid pointless retries on failure
> >> >> as well as success, so we could end up stopping at arbitrary points.
> >> >> I wasn't sure where else to handle this.
> >> >
> >> > Yeah, I think this "iterating" is somewhat bogus (crude) now.
> >>
> >> I think it was crude even before the series though. :-)  Not sure the
> >> series is making things worse.
> >>
> >> The problem is that there's a chicken-and-egg problem between how
> >> we decide to vectorise and which vector subarchitecture and VF we use.
> >> E.g. if we have:
> >>
> >>   unsigned char *x, *y;
> >>   ...
> >>   x[i] = (unsigned short) (x[i] + y[i] + 1) >> 1;
> >>
> >> do we build the SLP graph on the assumption that we need to use short
> >> elements, or on the assumption that we can use IFN_AVG_CEIL?  This
> >> affects the VF we get out: using IFN_AVG_CEIL gives double the VF
> >> relative to doing unsigned short arithmetic.
> >>
> >> And we need to know which vector subarchitecture we're targeting when
> >> making that decision: e.g. Advanced SIMD and SVE2 have IFN_AVG_CEIL,
> >> but SVE doesn't.  On the other hand, SVE supports things that Advanced
> >> SIMD doesn't.  It's a similar story, of course, for the x86 vector subarchs.
> >>
> >> For one pattern like this, we could simply try both ways.
> >> But that becomes untenable if there are multiple potential patterns.
> >> Iterating over the vector subarchs gives us a sensible way of reducing
> >> the search space by only applying patterns that the subarch supports.
> >>
> >> So...
> >>
> >> > What we'd like to collect is for all defs the vector types we could
> >> > use and then vectorizable_ defines constraints between input and
> >> > output vector types.  From that we'd arrive at a (possibly quite
> >> > large) set of "SLP graphs with vector types" we'd choose from.  I
> >> > believe we'll never want to truly explore the whole space but guess we
> >> > want to greedily compute those "SLP graphs with vector types" starting
> >> > from what (grouped) datarefs tells us is possible (which is kind of
> >> > what we do now).
> >>
> >> ...I don't think we can/should use the same SLP graph to pick vector
> >> types for all subarchs, since the ideal graph depends on the subarch.
> >> I'm also not sure the vectorizable_* routines could say anything that
> >> isn't obvious from the input and output scalar types.  Won't it still be
> >> the case that within an SLP instance, all scalars of type T will become
> >> the same vector(N) T?
> >
> > Not necessarily.  I can see the SLP graph containing "reductions"
> > (like those complex patterns proposed).  But yes, at the moment
> > there's a single group size per SLP instance.  Now, for the SLP _graph_
> > we may have one instance with a group size of 4 and one with a group size
> > of 8, both for example sharing the same grouped load.  It may make sense to
> > vectorize the load with v8si and for the group-size 4 SLP instance
> > have a "reduction operation" that selects the appropriate part of the
> > loaded vector.
> >
> > Now, vectorizable_* for say a conversion from int to double
> > may be able to vectorize for a v4si directly to v4df or to two times v2df.
> > With the size restriction relaxed vectorizable_* could opt to choose
> > smaller/larger vectors specifically and thus also have different vector types
> > for the same scalar type in the SLP graph.  I do expect this to be
> > profitable on x86 for some loops due to some asymmetries in the ISA
> > (and extra cost of lane-crossing operations for say AVX where using SSE
> > is cheaper for some ops even though you now have 2 or more instructions).
>
> Making this dependent on vectorizable_* seems to require a natural
> conversion point in the original scalar code.  In your shared data-ref
> example, the SLP instance with group size 4 could use 2 v2sis or a
> single v4si.  The fact that the data ref is shared with a group size
> of 8 shouldn't necessarily affect that choice and e.g. force v4si over
> v2si.  So when taking the graph as a whole, it seems like we should be
> able to combine 2 v2sis into a single v4si or split a v4si into 2 v2sis
> at any point where that's worthwhile, independently of the surrounding
> operations.
>
> The same goes for the conversions.  If the target only supports
> v4si->v2df conversions, it should still be possible to combine 2 v2dfs
> into a single v4df as a separate, independent step if there's a benefit
> to operating on v4df.  Same idea in reverse if the target only supports
> v4si->v4df and an SLP instance wants to operate on v2df.  It would be
> good if this splitting and combining wasn't dependent on having a
> conversion.
>
> So it seems like the natural point for inserting group size adjustments
> is at sharing boundaries between SLP instances, with each SLP instance
> using consistent choices internally (i.e. with the scalar type
> determining the vector type).  And we should be able to insert those
> group size adjustments based only on the types involved.  Whether
> vectorizable_conversion can do a particular group size adjustment on
> the fly seems more like a costing/pattern-matching decision than
> something that should restrict the choice of types.

Yes, this sounds correct.  So let's go with the patch.

Thanks,
Richard.

> Thanks,
> Richard

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [14/n] Vectorise conversions between differently-sized integer vectors
  2019-11-06 12:45     ` Richard Sandiford
@ 2019-11-12  9:40       ` Richard Biener
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Biener @ 2019-11-12  9:40 UTC (permalink / raw)
  To: Richard Biener, GCC Patches, Richard Sandiford

On Wed, Nov 6, 2019 at 1:45 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener <richard.guenther@gmail.com> writes:
> > On Fri, Oct 25, 2019 at 2:51 PM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> This patch adds AArch64 patterns for converting between 64-bit and
> >> 128-bit integer vectors, and makes the vectoriser and expand pass
> >> use them.
> >
> > So on GIMPLE we'll see
> >
> > v4si _1;
> > v4di _2;
> >
> >  _1 = (v4si) _2;
> >
> > then, correct?  Likewise for float conversions.
> >
> > I think that's "new", can you add to tree-cfg.c:verify_gimple_assign_unary
> > verification that the number of lanes of the LHS and the RHS match please?
>
> Ah, yeah.  How's this?  Tested as before.

OK.

Thanks,
Richard.

> Richard
>
>
> 2019-11-06  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * tree-cfg.c (verify_gimple_assign_unary): Handle conversions
>         between vector types.
>         * tree-vect-stmts.c (vectorizable_conversion): Extend the
>         non-widening and non-narrowing path to handle standard
>         conversion codes, if the target supports them.
>         * expr.c (convert_move): Try using the extend and truncate optabs
>         for vectors.
>         * optabs-tree.c (supportable_convert_operation): Likewise.
>         * config/aarch64/iterators.md (Vnarrowq): New iterator.
>         * config/aarch64/aarch64-simd.md (<optab><Vnarrowq><mode>2)
>         (trunc<mode><Vnarrowq>2): New patterns.
>
> gcc/testsuite/
>         * gcc.dg/vect/bb-slp-pr69907.c: Do not expect BB vectorization
>         to fail for aarch64 targets.
>         * gcc.dg/vect/no-scevccp-outer-12.c: Expect the test to pass
>         on aarch64 targets.
>         * gcc.dg/vect/vect-double-reduc-5.c: Likewise.
>         * gcc.dg/vect/vect-outer-4e.c: Likewise.
>         * gcc.target/aarch64/vect_mixed_sizes_5.c: New test.
>         * gcc.target/aarch64/vect_mixed_sizes_6.c: Likewise.
>         * gcc.target/aarch64/vect_mixed_sizes_7.c: Likewise.
>         * gcc.target/aarch64/vect_mixed_sizes_8.c: Likewise.
>         * gcc.target/aarch64/vect_mixed_sizes_9.c: Likewise.
>         * gcc.target/aarch64/vect_mixed_sizes_10.c: Likewise.
>         * gcc.target/aarch64/vect_mixed_sizes_11.c: Likewise.
>         * gcc.target/aarch64/vect_mixed_sizes_12.c: Likewise.
>         * gcc.target/aarch64/vect_mixed_sizes_13.c: Likewise.
>
> Index: gcc/tree-cfg.c
> ===================================================================
> --- gcc/tree-cfg.c      2019-09-05 08:49:30.829739618 +0100
> +++ gcc/tree-cfg.c      2019-11-06 12:44:22.832365429 +0000
> @@ -3553,6 +3553,24 @@ verify_gimple_assign_unary (gassign *stm
>      {
>      CASE_CONVERT:
>        {
> +       /* Allow conversions between vectors with the same number of elements,
> +          provided that the conversion is OK for the element types too.  */
> +       if (VECTOR_TYPE_P (lhs_type)
> +           && VECTOR_TYPE_P (rhs1_type)
> +           && known_eq (TYPE_VECTOR_SUBPARTS (lhs_type),
> +                        TYPE_VECTOR_SUBPARTS (rhs1_type)))
> +         {
> +           lhs_type = TREE_TYPE (lhs_type);
> +           rhs1_type = TREE_TYPE (rhs1_type);
> +         }
> +       else if (VECTOR_TYPE_P (lhs_type) || VECTOR_TYPE_P (rhs1_type))
> +         {
> +           error ("invalid vector types in nop conversion");
> +           debug_generic_expr (lhs_type);
> +           debug_generic_expr (rhs1_type);
> +           return true;
> +         }
> +
>         /* Allow conversions from pointer type to integral type only if
>            there is no sign or zero extension involved.
>            For targets were the precision of ptrofftype doesn't match that
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2019-11-06 12:44:10.896448608 +0000
> +++ gcc/tree-vect-stmts.c       2019-11-06 12:44:22.832365429 +0000
> @@ -4869,7 +4869,9 @@ vectorizable_conversion (stmt_vec_info s
>    switch (modifier)
>      {
>      case NONE:
> -      if (code != FIX_TRUNC_EXPR && code != FLOAT_EXPR)
> +      if (code != FIX_TRUNC_EXPR
> +         && code != FLOAT_EXPR
> +         && !CONVERT_EXPR_CODE_P (code))
>         return false;
>        if (supportable_convert_operation (code, vectype_out, vectype_in,
>                                          &decl1, &code1))
> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c  2019-11-06 12:29:17.394677341 +0000
> +++ gcc/expr.c  2019-11-06 12:44:22.828365457 +0000
> @@ -250,6 +250,31 @@ convert_move (rtx to, rtx from, int unsi
>
>    if (VECTOR_MODE_P (to_mode) || VECTOR_MODE_P (from_mode))
>      {
> +      if (GET_MODE_UNIT_PRECISION (to_mode)
> +         > GET_MODE_UNIT_PRECISION (from_mode))
> +       {
> +         optab op = unsignedp ? zext_optab : sext_optab;
> +         insn_code icode = convert_optab_handler (op, to_mode, from_mode);
> +         if (icode != CODE_FOR_nothing)
> +           {
> +             emit_unop_insn (icode, to, from,
> +                             unsignedp ? ZERO_EXTEND : SIGN_EXTEND);
> +             return;
> +           }
> +       }
> +
> +      if (GET_MODE_UNIT_PRECISION (to_mode)
> +         < GET_MODE_UNIT_PRECISION (from_mode))
> +       {
> +         insn_code icode = convert_optab_handler (trunc_optab,
> +                                                  to_mode, from_mode);
> +         if (icode != CODE_FOR_nothing)
> +           {
> +             emit_unop_insn (icode, to, from, TRUNCATE);
> +             return;
> +           }
> +       }
> +
>        gcc_assert (known_eq (GET_MODE_BITSIZE (from_mode),
>                             GET_MODE_BITSIZE (to_mode)));
>
> Index: gcc/optabs-tree.c
> ===================================================================
> --- gcc/optabs-tree.c   2019-11-06 12:28:23.000000000 +0000
> +++ gcc/optabs-tree.c   2019-11-06 12:44:22.828365457 +0000
> @@ -303,6 +303,20 @@ supportable_convert_operation (enum tree
>        return true;
>      }
>
> +  if (GET_MODE_UNIT_PRECISION (m1) > GET_MODE_UNIT_PRECISION (m2)
> +      && can_extend_p (m1, m2, TYPE_UNSIGNED (vectype_in)))
> +    {
> +      *code1 = code;
> +      return true;
> +    }
> +
> +  if (GET_MODE_UNIT_PRECISION (m1) < GET_MODE_UNIT_PRECISION (m2)
> +      && convert_optab_handler (trunc_optab, m1, m2) != CODE_FOR_nothing)
> +    {
> +      *code1 = code;
> +      return true;
> +    }
> +
>    /* Now check for builtin.  */
>    if (targetm.vectorize.builtin_conversion
>        && targetm.vectorize.builtin_conversion (code, vectype_out, vectype_in))
> Index: gcc/config/aarch64/iterators.md
> ===================================================================
> --- gcc/config/aarch64/iterators.md     2019-11-06 12:28:23.000000000 +0000
> +++ gcc/config/aarch64/iterators.md     2019-11-06 12:44:22.824365485 +0000
> @@ -933,6 +933,8 @@ (define_mode_attr VNARROWQ [(V8HI "V8QI"
>                             (V2DI "V2SI")
>                             (DI   "SI")   (SI   "HI")
>                             (HI   "QI")])
> +(define_mode_attr Vnarrowq [(V8HI "v8qi") (V4SI "v4hi")
> +                           (V2DI "v2si")])
>
>  ;; Narrowed quad-modes for VQN (Used for XTN2).
>  (define_mode_attr VNARROWQ2 [(V8HI "V16QI") (V4SI "V8HI")
> Index: gcc/config/aarch64/aarch64-simd.md
> ===================================================================
> --- gcc/config/aarch64/aarch64-simd.md  2019-11-06 12:28:23.000000000 +0000
> +++ gcc/config/aarch64/aarch64-simd.md  2019-11-06 12:44:22.824365485 +0000
> @@ -7007,3 +7007,21 @@ (define_insn "aarch64_crypto_pmullv2di"
>    "pmull2\\t%0.1q, %1.2d, %2.2d"
>    [(set_attr "type" "crypto_pmull")]
>  )
> +
> +;; Sign- or zero-extend a 64-bit integer vector to a 128-bit vector.
> +(define_insn "<optab><Vnarrowq><mode>2"
> +  [(set (match_operand:VQN 0 "register_operand" "=w")
> +       (ANY_EXTEND:VQN (match_operand:<VNARROWQ> 1 "register_operand" "w")))]
> +  "TARGET_SIMD"
> +  "<su>xtl\t%0.<Vtype>, %1.<Vntype>"
> +  [(set_attr "type" "neon_shift_imm_long")]
> +)
> +
> +;; Truncate a 128-bit integer vector to a 64-bit vector.
> +(define_insn "trunc<mode><Vnarrowq>2"
> +  [(set (match_operand:<VNARROWQ> 0 "register_operand" "=w")
> +       (truncate:<VNARROWQ> (match_operand:VQN 1 "register_operand" "w")))]
> +  "TARGET_SIMD"
> +  "xtn\t%0.<Vntype>, %1.<Vtype>"
> +  [(set_attr "type" "neon_shift_imm_narrow_q")]
> +)
> Index: gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c  2019-03-08 18:15:02.292871138 +0000
> +++ gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c  2019-11-06 12:44:22.828365457 +0000
> @@ -18,5 +18,6 @@ void foo(unsigned *p1, unsigned short *p
>  }
>
>  /* Disable for SVE because for long or variable-length vectors we don't
> -   get an unrolled epilogue loop.  */
> -/* { dg-final { scan-tree-dump "BB vectorization with gaps at the end of a load is not supported" "slp1" { target { ! aarch64_sve } } } } */
> +   get an unrolled epilogue loop.  Also disable for AArch64 Advanced SIMD,
> +   because there we can vectorize the epilogue using mixed vector sizes.  */
> +/* { dg-final { scan-tree-dump "BB vectorization with gaps at the end of a load is not supported" "slp1" { target { ! aarch64*-*-* } } } } */
> Index: gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c     2019-11-06 12:28:23.000000000 +0000
> +++ gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c     2019-11-06 12:44:22.828365457 +0000
> @@ -46,4 +46,4 @@ int main (void)
>  }
>
>  /* Until we support multiple types in the inner loop  */
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail *-*-* } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail { ! aarch64*-*-* } } } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c     2019-11-06 12:28:23.000000000 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c     2019-11-06 12:44:22.828365457 +0000
> @@ -52,5 +52,5 @@ int main ()
>
>  /* Vectorization of loops with multiple types and double reduction is not
>     supported yet.  */
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail { ! aarch64*-*-* } } } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-outer-4e.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-outer-4e.c   2019-11-06 12:28:23.000000000 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-outer-4e.c   2019-11-06 12:44:22.828365457 +0000
> @@ -23,4 +23,4 @@ foo (){
>    return;
>  }
>
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail { ! aarch64*-*-* } } } } */
> Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_5.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_5.c       2019-11-06 12:44:22.828365457 +0000
> @@ -0,0 +1,18 @@
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#pragma GCC target "+nosve"
> +
> +#include <stdint.h>
> +
> +void
> +f (int64_t *x, int64_t *y, int32_t *z, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      x[i] = z[i];
> +      y[i] += y[i - 2];
> +    }
> +}
> +
> +/* { dg-final { scan-assembler-times {\tsxtl\tv[0-9]+\.2d, v[0-9]+\.2s\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.2d,} 1 } } */
> Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_6.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_6.c       2019-11-06 12:44:22.828365457 +0000
> @@ -0,0 +1,18 @@
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#pragma GCC target "+nosve"
> +
> +#include <stdint.h>
> +
> +void
> +f (int32_t *x, int32_t *y, int16_t *z, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      x[i] = z[i];
> +      y[i] += y[i - 4];
> +    }
> +}
> +
> +/* { dg-final { scan-assembler-times {\tsxtl\tv[0-9]+\.4s, v[0-9]+\.4h\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.4s,} 1 } } */
> Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_7.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_7.c       2019-11-06 12:44:22.828365457 +0000
> @@ -0,0 +1,18 @@
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#pragma GCC target "+nosve"
> +
> +#include <stdint.h>
> +
> +void
> +f (int16_t *x, int16_t *y, int8_t *z, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      x[i] = z[i];
> +      y[i] += y[i - 8];
> +    }
> +}
> +
> +/* { dg-final { scan-assembler-times {\tsxtl\tv[0-9]+\.8h, v[0-9]+\.8b\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.8h,} 1 } } */
> Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_8.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_8.c       2019-11-06 12:44:22.828365457 +0000
> @@ -0,0 +1,18 @@
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#pragma GCC target "+nosve"
> +
> +#include <stdint.h>
> +
> +void
> +f (int64_t *x, int64_t *y, uint32_t *z, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      x[i] = z[i];
> +      y[i] += y[i - 2];
> +    }
> +}
> +
> +/* { dg-final { scan-assembler-times {\tuxtl\tv[0-9]+\.2d, v[0-9]+\.2s\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.2d,} 1 } } */
> Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_9.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_9.c       2019-11-06 12:44:22.828365457 +0000
> @@ -0,0 +1,18 @@
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#pragma GCC target "+nosve"
> +
> +#include <stdint.h>
> +
> +void
> +f (int32_t *x, int32_t *y, uint16_t *z, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      x[i] = z[i];
> +      y[i] += y[i - 4];
> +    }
> +}
> +
> +/* { dg-final { scan-assembler-times {\tuxtl\tv[0-9]+\.4s, v[0-9]+\.4h\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.4s,} 1 } } */
> Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_10.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_10.c      2019-11-06 12:44:22.828365457 +0000
> @@ -0,0 +1,18 @@
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#pragma GCC target "+nosve"
> +
> +#include <stdint.h>
> +
> +void
> +f (int16_t *x, int16_t *y, uint8_t *z, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      x[i] = z[i];
> +      y[i] += y[i - 8];
> +    }
> +}
> +
> +/* { dg-final { scan-assembler-times {\tuxtl\tv[0-9]+\.8h, v[0-9]+\.8b\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.8h,} 1 } } */
> Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_11.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_11.c      2019-11-06 12:44:22.828365457 +0000
> @@ -0,0 +1,18 @@
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#pragma GCC target "+nosve"
> +
> +#include <stdint.h>
> +
> +void
> +f (int32_t *x, int64_t *y, int64_t *z, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      x[i] = z[i];
> +      y[i] += y[i - 2];
> +    }
> +}
> +
> +/* { dg-final { scan-assembler-times {\txtn\tv[0-9]+\.2s, v[0-9]+\.2d\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.2d,} 1 } } */
> Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_12.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_12.c      2019-11-06 12:44:22.828365457 +0000
> @@ -0,0 +1,18 @@
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#pragma GCC target "+nosve"
> +
> +#include <stdint.h>
> +
> +void
> +f (int16_t *x, int32_t *y, int32_t *z, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      x[i] = z[i];
> +      y[i] += y[i - 4];
> +    }
> +}
> +
> +/* { dg-final { scan-assembler-times {\txtn\tv[0-9]+\.4h, v[0-9]+\.4s\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.4s,} 1 } } */
> Index: gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_13.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_13.c      2019-11-06 12:44:22.828365457 +0000
> @@ -0,0 +1,18 @@
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#pragma GCC target "+nosve"
> +
> +#include <stdint.h>
> +
> +void
> +f (int8_t *x, int16_t *y, int16_t *z, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    {
> +      x[i] = z[i];
> +      y[i] += y[i - 8];
> +    }
> +}
> +
> +/* { dg-final { scan-assembler-times {\txtn\tv[0-9]+\.8b, v[0-9]+\.8h\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.8h,} 1 } } */

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [8/n] Replace autovectorize_vector_sizes with autovectorize_vector_modes
  2019-11-11 14:33       ` Richard Biener
@ 2019-11-12 17:55         ` Richard Sandiford
  2019-11-13 14:32           ` Richard Biener
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Sandiford @ 2019-11-12 17:55 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Wed, Oct 30, 2019 at 4:58 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Richard Biener <richard.guenther@gmail.com> writes:
>> > On Fri, Oct 25, 2019 at 2:37 PM Richard Sandiford
>> > <richard.sandiford@arm.com> wrote:
>> >>
>> >> This is another patch in the series to remove the assumption that
>> >> all modes involved in vectorisation have to be the same size.
>> >> Rather than have the target provide a list of vector sizes,
>> >> it makes the target provide a list of vector "approaches",
>> >> with each approach represented by a mode.
>> >>
>> >> A later patch will pass this mode to targetm.vectorize.related_mode
>> >> to get the vector mode for a given element mode.  Until then, the modes
>> >> simply act as an alternative way of specifying the vector size.
>> >
>> > Is there a restriction to use integer vector modes for the hook
>> > or would FP vector modes be OK as well?
>>
>> Conceptually, each mode returned by the hook represents a set of vector
>> modes, with the set containing one member for each supported element
>> type.  The idea is to represent the set using the member with the
>> smallest element type, preferring integer modes over floating-point
>> modes in the event of a tie.  So using a floating-point mode as the
>> representative mode is fine if floating-point elements are the smallest
>> (or only) supported element type.
>>
>> > Note that your x86 change likely disables word_mode vectorization with
>> > -mno-sse?
>>
>> No, that still works, because...
>>
>> > That is, how do we represent GPR vectorization "size" here?
>> > The preferred SIMD mode hook may return an integer mode,
>> > are non-vector modes OK for autovectorize_vector_modes?
>>
>> ...at least with all current targets, preferred_simd_mode is only
>> an integer mode if the target has no "real" vectorisation support
>> for that element type.  There's no need to handle that case in
>> autovectorize_vector_sizes/modes, and e.g. the x86 hook does nothing
>> when SSE is disabled.
>>
>> So while preferred_simd_mode can continue to return integer modes,
>> autovectorize_vector_modes always returns vector modes.
>
> Hmm, I see.  IIRC I was playing with a patch for x86 that
> enabled word-mode vectorization (64bits) for SSE before (I see
> we don't do that at the moment).  The MMX-with-SSE has made
> that somewhat moot but with iterating over modes we could
> even make MMX-with-SSE (MMX modes) and word-mode vectors
> coexist by allowing the hook to return V4SI, V2SI, DImode?
> Because MMX-with-SSE might be more costly than word-mode
> but can of course handle more cases.
>
> So you say the above isn't supported and cannot be made supported?

It isn't supported as things stand.  It shouldn't be hard to make
it work, but I'm not sure what the best semantics would be.

AIUI, before the series, returning word_mode from preferred_simd_mode
just means that vectors should have the same size as word_mode.  If the
target defines V2SI, we'll use that as the raw type mode for SI vectors,
regardless of whether V2SI is enabled.  If the mode *is* enabled,
the TYPE_MODE will also be V2SI and so returning word_mode from
preferred_simd_mode is equivalent to returning V2SImode.  If the mode
isn't enabled, the TYPE_MODE will be word_mode if that's suitable and
BLKmode otherwise.

The situation's similar for SF; if the target defines and supports V2SF,
returning word_mode would be equivalent to returning V2SFmode.

But it sounds like returning word_mode for the new hook would behave
differently, in that we'd force the raw type mode to be DImode even
if V2SImode is defined and supported.  So what should happen for float
types?  Should we reject those, or behave as above and apply the usual
mode_for_vector treatment for a word_mode-sized vector?

If code contains a mixture of HImode and SImode elements, should
we use DImode for both of them, or SImode for HImode elements?
Should the modes be passed to the target's related_vector_mode
hook in the same way as for vectors, or handled before then?

I could implement one of these.  I'm just not sure it'd turn out
to be the right one, once someone actually tries to use it. :-)

FWIW, another way of doing the same thing would be to define
emulated vector modes, e.g. EMUL_V2SI, giving them a lower
priority than the real V2SI.  This is already possible with
VECTOR_MODES_WITH_PREFIX.  Because these emulated modes would
be permanently unsupported, the associated TYPE_MODE would always
be the equivalent integer mode (if appropriate).  So we could force
integer modes that way too.  This has the advantage that we never lose
sight of what the element type is, and so can choose between pairing
EMUL_V2SI and EMUL_V4HI vs. pairing EMUL_V2SI and EMUL_V2HI,
just like we can for "real" vector modes.

Of course that's "a bit" of a hack.  But then so IMO is using integer
modes for this kind of choice. :-)

Another option I'd considered was having the hook return a list of
abstract identifiers that are only meaningful to the target, either
with accompanying information like the maximum size and maximum nunits,
or with a separate hook to provide that information.  Or we could even
return a list of virtual objects.  But that seemed like an abstraction
too far when in practice a vector mode should be descriptive enough.

Thanks,
Richard


* Re: [8/n] Replace autovectorize_vector_sizes with autovectorize_vector_modes
  2019-11-12 17:55         ` Richard Sandiford
@ 2019-11-13 14:32           ` Richard Biener
  2019-11-13 16:16             ` Richard Sandiford
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Biener @ 2019-11-13 14:32 UTC (permalink / raw)
  To: Richard Biener, GCC Patches, Richard Sandiford

On Tue, Nov 12, 2019 at 6:54 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener <richard.guenther@gmail.com> writes:
> > On Wed, Oct 30, 2019 at 4:58 PM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> Richard Biener <richard.guenther@gmail.com> writes:
> >> > On Fri, Oct 25, 2019 at 2:37 PM Richard Sandiford
> >> > <richard.sandiford@arm.com> wrote:
> >> >>
> >> >> This is another patch in the series to remove the assumption that
> >> >> all modes involved in vectorisation have to be the same size.
> >> >> Rather than have the target provide a list of vector sizes,
> >> >> it makes the target provide a list of vector "approaches",
> >> >> with each approach represented by a mode.
> >> >>
> >> >> A later patch will pass this mode to targetm.vectorize.related_mode
> >> >> to get the vector mode for a given element mode.  Until then, the modes
> >> >> simply act as an alternative way of specifying the vector size.
> >> >
> >> > Is there a restriction to use integer vector modes for the hook
> >> > or would FP vector modes be OK as well?
> >>
> >> Conceptually, each mode returned by the hook represents a set of vector
> >> modes, with the set containing one member for each supported element
> >> type.  The idea is to represent the set using the member with the
> >> smallest element type, preferring integer modes over floating-point
> >> modes in the event of a tie.  So using a floating-point mode as the
> >> representative mode is fine if floating-point elements are the smallest
> >> (or only) supported element type.
> >>
> >> > Note that your x86 change likely disables word_mode vectorization with
> >> > -mno-sse?
> >>
> >> No, that still works, because...
> >>
> >> > That is, how do we represent GPR vectorization "size" here?
> >> > The preferred SIMD mode hook may return an integer mode,
> >> > are non-vector modes OK for autovectorize_vector_modes?
> >>
> >> ...at least with all current targets, preferred_simd_mode is only
> >> an integer mode if the target has no "real" vectorisation support
> >> for that element type.  There's no need to handle that case in
> >> autovectorize_vector_sizes/modes, and e.g. the x86 hook does nothing
> >> when SSE is disabled.
> >>
> >> So while preferred_simd_mode can continue to return integer modes,
> >> autovectorize_vector_modes always returns vector modes.
> >
> > Hmm, I see.  IIRC I was playing with a patch for x86 that
> > enabled word-mode vectorization (64bits) for SSE before (I see
> > we don't do that at the moment).  The MMX-with-SSE has made
> > that somewhat moot but with iterating over modes we could
> > even make MMX-with-SSE (MMX modes) and word-mode vectors
> > coexist by allowing the hook to return V4SI, V2SI, DImode?
> > Because MMX-with-SSE might be more costly than word-mode
> > but can of course handle more cases.
> >
> > So you say the above isn't supported and cannot be made supported?
>
> It isn't supported as things stand.  It shouldn't be hard to make
> it work, but I'm not sure what the best semantics would be.
>
> AIUI, before the series, returning word_mode from preferred_simd_mode
> just means that vectors should have the same size as word_mode.  If the
> target defines V2SI, we'll use that as the raw type mode for SI vectors,
> regardless of whether V2SI is enabled.  If the mode *is* enabled,
> the TYPE_MODE will also be V2SI and so returning word_mode from
> preferred_simd_mode is equivalent to returning V2SImode.  If the mode
> isn't enabled, the TYPE_MODE will be word_mode if that's suitable and
> BLKmode otherwise.
>
> The situation's similar for SF; if the target defines and supports V2SF,
> returning word_mode would be equivalent to returning V2SFmode.
>
> But it sounds like returning word_mode for the new hook would behave
> differently, in that we'd force the raw type mode to be DImode even
> if V2SImode is defined and supported.

Yes.

> So what should happen for float
> types?  Should we reject those, or behave as above and apply the usual
> mode_for_vector treatment for a word_mode-sized vector?

I wasn't aware we're doing this mode_for_vector dance; certainly we
didn't do that before?
are enabled only.  What's the reason to not do that?

> If code contains a mixture of HImode and SImode elements, should
> we use DImode for both of them, or SImode for HImode elements?
> Should the modes be passed to the target's related_vector_mode
> hook in the same way as for vectors, or handled before then?

I'd say SImode and HImode are naturally "related" "vector" modes
for word_mode - so word_mode ideally is just a placeholder for
"try to vectorize w/o actual SIMD instructions" which we can do
for a small set of operations.

> I could implement one of these.  I'm just not sure it'd turn out
> to be the right one, once someone actually tries to use it. :-)

Heh.  I was merely trying to make sure we're not designing ourselves
into a corner where we can't mix the word_mode vectorization
facility with the SIMD one (because on GIMPLE it's still all
vectors, only vector lowering exposes the ints IIRC).

> FWIW, another way of doing the same thing would be to define
> emulated vector modes, e.g. EMUL_V2SI, giving them a lower
> priority than the real V2SI.  This is already possible with
> VECTOR_MODES_WITH_PREFIX.  Because these emulated modes would
> be permanently unsupported, the associated TYPE_MODE would always
> be the equivalent integer mode (if appropriate).  So we could force
> integer modes that way too.  This has the advantage that we never lose
> sight of what the element type is, and so can choose between pairing
> EMUL_V2SI and EMUL_V4HI vs. pairing EMUL_V2SI and EMUL_V2HI,
> just like we can for "real" vector modes.
>
> Of course that's "a bit" of a hack.  But then so IMO is using integer
> modes for this kind of choice. :-)
>
> Another option I'd considered was having the hook return a list of
> abstract identifiers that are only meaningful to the target, either
> with accompanying information like the maximum size and maximum nunits,
> or with a separate hook to provide that information.  Or we could even
> return a list of virtual objects.  But that seemed like an abstraction
> too far when in practice a vector mode should be descriptive enough.

Anyway, I think we have testsuite coverage for word_mode vectorization
and IIRC i?86 enables that when there's no SSE support via
the preferred_simd_mode hook, which you do not change (and at the moment
that doesn't have to be in the list of sizes we should iterate over).

If I understand you correctly, what is not yet supported with your patch
is adding word_mode (or other integer modes) to autovectorize_vector_modes,
but the behavior could be emulated via non-enabled vector modes that get
mapped to integer modes.

Thus the patch is OK.

Thanks,
Richard.

> Thanks,
> Richard


* Re: [8/n] Replace autovectorize_vector_sizes with autovectorize_vector_modes
  2019-11-13 14:32           ` Richard Biener
@ 2019-11-13 16:16             ` Richard Sandiford
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Sandiford @ 2019-11-13 16:16 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Tue, Nov 12, 2019 at 6:54 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Richard Biener <richard.guenther@gmail.com> writes:
>> > On Wed, Oct 30, 2019 at 4:58 PM Richard Sandiford
>> > <richard.sandiford@arm.com> wrote:
>> >>
>> >> Richard Biener <richard.guenther@gmail.com> writes:
>> >> > On Fri, Oct 25, 2019 at 2:37 PM Richard Sandiford
>> >> > <richard.sandiford@arm.com> wrote:
>> >> >>
>> >> >> This is another patch in the series to remove the assumption that
>> >> >> all modes involved in vectorisation have to be the same size.
>> >> >> Rather than have the target provide a list of vector sizes,
>> >> >> it makes the target provide a list of vector "approaches",
>> >> >> with each approach represented by a mode.
>> >> >>
>> >> >> A later patch will pass this mode to targetm.vectorize.related_mode
>> >> >> to get the vector mode for a given element mode.  Until then, the modes
>> >> >> simply act as an alternative way of specifying the vector size.
>> >> >
>> >> > Is there a restriction to use integer vector modes for the hook
>> >> > or would FP vector modes be OK as well?
>> >>
>> >> Conceptually, each mode returned by the hook represents a set of vector
>> >> modes, with the set containing one member for each supported element
>> >> type.  The idea is to represent the set using the member with the
>> >> smallest element type, preferring integer modes over floating-point
>> >> modes in the event of a tie.  So using a floating-point mode as the
>> >> representative mode is fine if floating-point elements are the smallest
>> >> (or only) supported element type.
>> >>
>> >> > Note that your x86 change likely disables word_mode vectorization with
>> >> > -mno-sse?
>> >>
>> >> No, that still works, because...
>> >>
>> >> > That is, how do we represent GPR vectorization "size" here?
>> >> > The preferred SIMD mode hook may return an integer mode,
>> >> > are non-vector modes OK for autovectorize_vector_modes?
>> >>
>> >> ...at least with all current targets, preferred_simd_mode is only
>> >> an integer mode if the target has no "real" vectorisation support
>> >> for that element type.  There's no need to handle that case in
>> >> autovectorize_vector_sizes/modes, and e.g. the x86 hook does nothing
>> >> when SSE is disabled.
>> >>
>> >> So while preferred_simd_mode can continue to return integer modes,
>> >> autovectorize_vector_modes always returns vector modes.
>> >
>> > Hmm, I see.  IIRC I was playing with a patch for x86 that
>> > enabled word-mode vectorization (64bits) for SSE before (I see
>> > we don't do that at the moment).  The MMX-with-SSE has made
>> > that somewhat moot but with iterating over modes we could
>> > even make MMX-with-SSE (MMX modes) and word-mode vectors
>> > coexist by allowing the hook to return V4SI, V2SI, DImode?
>> > Because MMX-with-SSE might be more costly than word-mode
>> > but can of course handle more cases.
>> >
>> > So you say the above isn't supported and cannot be made supported?
>>
>> It isn't supported as things stand.  It shouldn't be hard to make
>> it work, but I'm not sure what the best semantics would be.
>>
>> AIUI, before the series, returning word_mode from preferred_simd_mode
>> just means that vectors should have the same size as word_mode.  If the
>> target defines V2SI, we'll use that as the raw type mode for SI vectors,
>> regardless of whether V2SI is enabled.  If the mode *is* enabled,
>> the TYPE_MODE will also be V2SI and so returning word_mode from
>> preferred_simd_mode is equivalent to returning V2SImode.  If the mode
>> isn't enabled, the TYPE_MODE will be word_mode if that's suitable and
>> BLKmode otherwise.
>>
>> The situation's similar for SF; if the target defines and supports V2SF,
>> returning word_mode would be equivalent to returning V2SFmode.
>>
>> But it sounds like returning word_mode for the new hook would behave
>> differently, in that we'd force the raw type mode to be DImode even
>> if V2SImode is defined and supported.
>
> Yes.
>
>> So what should happen for float
>> types?  Should we reject those, or behave as above and apply the usual
>> mode_for_vector treatment for a word_mode-sized vector?
>
> I wasn't aware we're doing this mode_for_vector dance, certainly we
> didn't do that before?

We do that without the series too, just a bit more indirectly.
get_vectype_for_scalar_type_and_size has:

  /* If no size was supplied use the mode the target prefers.   Otherwise
     lookup a vector mode of the specified size.  */
  if (known_eq (size, 0U))
    simd_mode = targetm.vectorize.preferred_simd_mode (inner_mode);
  else if (!multiple_p (size, nbytes, &nunits)
	   || !mode_for_vector (inner_mode, nunits).exists (&simd_mode))
    return NULL_TREE;
  /* NOTE: nunits == 1 is allowed to support single element vector types.  */
  if (!multiple_p (GET_MODE_SIZE (simd_mode), nbytes, &nunits))
    return NULL_TREE;

  vectype = build_vector_type (scalar_type, nunits);

So all we use the simd_mode for is to get a size.  Then that size
determines the nunits, and build_vector_type picks whichever mode
works best for that nunits and element type.  That mode might not
have anything to do with simd_mode, except for having the same size.

So although the new mode_for_vector calls make that explicit,
it was the behaviour before the series too.

> I expected targets would return vector modes that
> are enabled only.  What's the reason to not do that?

I think all current targets do only return supported modes or
word_mode from preferred_simd_mode.  But returning word_mode
isn't for example an effective way of preventing V2SI being used,
since if V2SI is enabled for 64-bit targets, word_mode would actually
choose V2SI rather than DI.

>> If code contains a mixture of HImode and SImode elements, should
>> we use DImode for both of them, or SImode for HImode elements?
>> Should the modes be passed to the target's related_vector_mode
>> hook in the same way as for vectors, or handled before then?
>
> I'd say SImode and HImode are naturally "related" "vector" modes
> for word_mode - so word_mode ideally is just a placeholder for
> "try to vectorize w/o actual SIMD instructions" which we can do
> for a small set of operations.
>
>> I could implement one of these.  I'm just not sure it'd turn out
>> to be the right one, once someone actually tries to use it. :-)
>
> Heh.  I was merely trying to make sure we're not designing us
> into a corner where we can't mix the word_mode vectorization
> facility with the SIMD one (because on GIMPLE it's still all
> vectors, only vector lowering exposes the ints IIRC).
>
>> FWIW, another way of doing the same thing would be to define
>> emulated vector modes, e.g. EMUL_V2SI, giving them a lower
>> priority than the real V2SI.  This is already possible with
>> VECTOR_MODES_WITH_PREFIX.  Because these emulated modes would
>> be permanently unsupported, the associated TYPE_MODE would always
>> be the equivalent integer mode (if appropriate).  So we could force
>> integer modes that way too.  This has the advantage that we never lose
>> sight of what the element type is, and so can choose between pairing
>> EMUL_V2SI and EMUL_V4HI vs. pairing EMUL_V2SI and EMUL_V2HI,
>> just like we can for "real" vector modes.
>>
>> Of course that's "a bit" of a hack.  But then so IMO is using integer
>> modes for this kind of choice. :-)
>>
>> Another option I'd considered was having the hook return a list of
>> abstract identifiers that are only meaningful to the target, either
>> with accompanying information like the maximum size and maximum nunits,
>> or with a separate hook to provide that information.  Or we could even
>> return a list of virtual objects.  But that seemed like an abstraction
>> too far when in practice a vector mode should be descriptive enough.
>
> Anyway, I think we have testsuite coverage for word_mode vectorization
> and IIRC i?86 enables that when there's no SSE support via
> the preferred_simd_mode hook which you do not change (and at the moment
> that doesn't have to be in the list of sizes we should iterate over).

Yeah.

> If I understand you correctly, what is not yet supported with your patch
> is adding word_mode (or other integer modes) to autovectorize_vector_modes
> but behavior could be emulated via non-enabled vector modes that get
> mapped to integer modes.

Right.  And we could support word_mode with its current meaning
from autovectorize_vector_modes too if that turns out to be the
best way of handling it.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [16/n] Apply maximum nunits for BB SLP
  2019-11-05 14:09     ` Richard Sandiford
@ 2019-11-14 12:22       ` Richard Biener
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Biener @ 2019-11-14 12:22 UTC (permalink / raw)
  To: Richard Biener, GCC Patches, Richard Sandiford

On Tue, Nov 5, 2019 at 3:09 PM Richard Sandiford
<Richard.Sandiford@arm.com> wrote:
>
> Richard Biener <richard.guenther@gmail.com> writes:
> > On Tue, Oct 29, 2019 at 6:05 PM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> The BB vectoriser picked vector types in the same way as the loop
> >> vectoriser: it picked a vector mode/size for the region and then
> >> based all the vector types off that choice.  This meant we could
> >> end up trying to use vector types that had too many elements for
> >> the group size.
> >>
> >> The main part of this patch is therefore about passing the SLP
> >> group size down to routines like get_vectype_for_scalar_type and
> >> ensuring that each vector type in the SLP tree is chosen wrt the
> >> group size.  That part in itself is pretty easy and mechanical.
> >>
> >> The main warts are:
> >>
> >> (1) We normally pick a STMT_VINFO_VECTYPE for data references at an
> >>     early stage (vect_analyze_data_refs).  However, nothing in the
> >>     BB vectoriser relied on this, or on the min_vf calculated from it.
> >>     I couldn't see anything other than vect_recog_bool_pattern that
> >>     tried to access the vector type before the SLP tree is built.
> >
> > So can you not set STMT_VINFO_VECTYPE for data refs with BB vectorization
> > then?
>
> Yeah, the patch stops us from setting it during vect_analyze_data_refs.
> We still need to set it later when building the SLP tree, just like
> we do for other statements.
>
> >> (2) It's possible for the same statement to be used in the groups of
> >>     different sizes.  Taking the group size into account meant that
> >>     we could try to pick different vector types for the same statement.
> >
> > That only happens when we have multiple SLP instances though
> > (entries into the shared SLP graph).
>
> Yeah.
>
> > It probably makes sense to keep handling SLP instances sharing stmts
> > together for costing reasons but one issue is that for disjunct pieces
> > (in the same BB) disqualifying one cost-wise disqualifies all.  So at
> > some point during analysis (which should eventually cover more than a
> > single BB) we want to split the graph.  It probably doesn't help the
> > above case.
>
> Yeah, sounds like there are two issues: one with sharing stmt_vec_infos
> between multiple SLP nodes, and one with sharing SLP child nodes between
> multiple parent nodes.  (2) comes from the first, but I guess failing
> based on costs is more about the second.
>
> >>     This problem should go away with the move to doing everything on
> >>     SLP trees, where presumably we would attach the vector type to the
> >>     SLP node rather than the stmt_vec_info.  Until then, the patch just
> >>     uses a first-come, first-served approach.
> >
> > Yeah, I ran into not having vectype on SLP trees with invariants/externals
> > as well.  I suppose you didn't try simply adding that to the SLP tree
> > and pushing/popping it like we push/pop the def type?
>
> No, didn't try that.  Maybe it would be worth a go, but it seems like it
> could be a rabbit hole.
>
> > Assigning the vector types should really happen in vectorizable_*
> > and not during SLP build itself btw.
>
> Agree we need to improve the way this is handled, but delaying it
> to vectorizable_* sounds quite late.  Maybe it should be a more global
> decision, since the vector types for each vectorizable_* have to be
> compatible and it's not obvious which routine should get first choice.
>
> > Your update-all-shared-vectypes thing looks quadratic to me :/
>
> Should be amortised linear.  The statements in a DR group always
> have the same vectype.  When we want to change the vector type
> of one statement, we change it for all statements if possible
> or fail if we can't.

OK, let's go for it.

Thanks,
Richard.

> Thanks,
> Richard


* Re: [17/17] Extend can_duplicate_and_interleave_p to mixed-size vectors
  2019-11-05 20:45 ` [17/17] Extend can_duplicate_and_interleave_p to mixed-size vectors Richard Sandiford
@ 2019-11-14 12:23   ` Richard Biener
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Biener @ 2019-11-14 12:23 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Tue, Nov 5, 2019 at 9:45 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> This patch makes can_duplicate_and_interleave_p cope with mixtures of
> vector sizes, by using queries based on get_vectype_for_scalar_type
> instead of directly querying GET_MODE_SIZE (vinfo->vector_mode).
>
> int_mode_for_size is now the first check we do for a candidate mode,
> so it seemed better to restrict it to MAX_FIXED_MODE_SIZE.  This avoids
> unnecessary work and avoids trying to create scalar types that the
> target might not support.
>
> This is the final patch in the series.  As before, each patch was tested
> individually on aarch64-linux-gnu and the series as a whole on x86_64-linux-gnu.

OK.

Thanks,
Richard.

>
> 2019-11-04  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * tree-vectorizer.h (can_duplicate_and_interleave_p): Take an
>         element type rather than an element mode.
>         * tree-vect-slp.c (can_duplicate_and_interleave_p): Likewise.
>         Use get_vectype_for_scalar_type to query the natural types
>         for a given element type rather than basing everything on
>         GET_MODE_SIZE (vinfo->vector_mode).  Limit int_mode_for_size
>         query to MAX_FIXED_MODE_SIZE.
>         (duplicate_and_interleave): Update call accordingly.
>         * tree-vect-loop.c (vectorizable_reduction): Likewise.
>
> Index: gcc/tree-vectorizer.h
> ===================================================================
> --- gcc/tree-vectorizer.h       2019-11-05 11:08:12.521631453 +0000
> +++ gcc/tree-vectorizer.h       2019-11-05 11:14:42.786884473 +0000
> @@ -1779,8 +1779,7 @@ extern void vect_get_slp_defs (slp_tree,
>  extern bool vect_slp_bb (basic_block);
>  extern stmt_vec_info vect_find_last_scalar_stmt_in_slp (slp_tree);
>  extern bool is_simple_and_all_uses_invariant (stmt_vec_info, loop_vec_info);
> -extern bool can_duplicate_and_interleave_p (vec_info *, unsigned int,
> -                                           machine_mode,
> +extern bool can_duplicate_and_interleave_p (vec_info *, unsigned int, tree,
>                                             unsigned int * = NULL,
>                                             tree * = NULL, tree * = NULL);
>  extern void duplicate_and_interleave (vec_info *, gimple_seq *, tree,
> Index: gcc/tree-vect-slp.c
> ===================================================================
> --- gcc/tree-vect-slp.c 2019-11-05 11:08:12.517631481 +0000
> +++ gcc/tree-vect-slp.c 2019-11-05 11:14:42.786884473 +0000
> @@ -265,7 +265,7 @@ vect_get_place_in_interleaving_chain (st
>    return -1;
>  }
>
> -/* Check whether it is possible to load COUNT elements of type ELT_MODE
> +/* Check whether it is possible to load COUNT elements of type ELT_TYPE
>     using the method implemented by duplicate_and_interleave.  Return true
>     if so, returning the number of intermediate vectors in *NVECTORS_OUT
>     (if nonnull) and the type of each intermediate vector in *VECTOR_TYPE_OUT
> @@ -273,26 +273,37 @@ vect_get_place_in_interleaving_chain (st
>
>  bool
>  can_duplicate_and_interleave_p (vec_info *vinfo, unsigned int count,
> -                               machine_mode elt_mode,
> -                               unsigned int *nvectors_out,
> +                               tree elt_type, unsigned int *nvectors_out,
>                                 tree *vector_type_out,
>                                 tree *permutes)
>  {
> -  poly_int64 elt_bytes = count * GET_MODE_SIZE (elt_mode);
> -  poly_int64 nelts;
> +  tree base_vector_type = get_vectype_for_scalar_type (vinfo, elt_type, count);
> +  if (!base_vector_type || !VECTOR_MODE_P (TYPE_MODE (base_vector_type)))
> +    return false;
> +
> +  machine_mode base_vector_mode = TYPE_MODE (base_vector_type);
> +  poly_int64 elt_bytes = count * GET_MODE_UNIT_SIZE (base_vector_mode);
>    unsigned int nvectors = 1;
>    for (;;)
>      {
>        scalar_int_mode int_mode;
>        poly_int64 elt_bits = elt_bytes * BITS_PER_UNIT;
> -      if (multiple_p (GET_MODE_SIZE (vinfo->vector_mode), elt_bytes, &nelts)
> -         && int_mode_for_size (elt_bits, 0).exists (&int_mode))
> +      if (int_mode_for_size (elt_bits, 1).exists (&int_mode))
>         {
> +         /* Get the natural vector type for this SLP group size.  */
>           tree int_type = build_nonstandard_integer_type
>             (GET_MODE_BITSIZE (int_mode), 1);
> -         tree vector_type = build_vector_type (int_type, nelts);
> -         if (VECTOR_MODE_P (TYPE_MODE (vector_type)))
> -           {
> +         tree vector_type
> +           = get_vectype_for_scalar_type (vinfo, int_type, count);
> +         if (vector_type
> +             && VECTOR_MODE_P (TYPE_MODE (vector_type))
> +             && known_eq (GET_MODE_SIZE (TYPE_MODE (vector_type)),
> +                          GET_MODE_SIZE (base_vector_mode)))
> +           {
> +             /* Try fusing consecutive sequences of COUNT / NVECTORS elements
> +                together into elements of type INT_TYPE and using the result
> +                to build NVECTORS vectors.  */
> +             poly_uint64 nelts = GET_MODE_NUNITS (TYPE_MODE (vector_type));
>               vec_perm_builder sel1 (nelts, 2, 3);
>               vec_perm_builder sel2 (nelts, 2, 3);
>               poly_int64 half_nelts = exact_div (nelts, 2);
> @@ -492,7 +503,7 @@ vect_get_and_check_slp_defs (vec_info *v
>               && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
>               && (TREE_CODE (type) == BOOLEAN_TYPE
>                   || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
> -                                                     TYPE_MODE (type))))
> +                                                     type)))
>             {
>               if (dump_enabled_p ())
>                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -3551,7 +3562,7 @@ duplicate_and_interleave (vec_info *vinf
>    unsigned int nvectors = 1;
>    tree new_vector_type;
>    tree permutes[2];
> -  if (!can_duplicate_and_interleave_p (vinfo, nelts, TYPE_MODE (element_type),
> +  if (!can_duplicate_and_interleave_p (vinfo, nelts, element_type,
>                                        &nvectors, &new_vector_type,
>                                        permutes))
>      gcc_unreachable ();
> Index: gcc/tree-vect-loop.c
> ===================================================================
> --- gcc/tree-vect-loop.c        2019-11-05 10:57:41.658071173 +0000
> +++ gcc/tree-vect-loop.c        2019-11-05 11:14:42.782884501 +0000
> @@ -6288,10 +6288,9 @@ vectorizable_reduction (stmt_vec_info st
>          that value needs to be repeated for every instance of the
>          statement within the initial vector.  */
>        unsigned int group_size = SLP_INSTANCE_GROUP_SIZE (slp_node_instance);
> -      scalar_mode elt_mode = SCALAR_TYPE_MODE (TREE_TYPE (vectype_out));
>        if (!neutral_op
>           && !can_duplicate_and_interleave_p (loop_vinfo, group_size,
> -                                             elt_mode))
> +                                             TREE_TYPE (vectype_out)))
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,


end of thread, other threads:[~2019-11-14 12:23 UTC | newest]

Thread overview: 48+ messages
2019-10-25 12:32 [0/n] Support multiple vector sizes for vectorisation Richard Sandiford
2019-10-25 12:34 ` [6/n] Use build_vector_type_for_mode in get_vectype_for_scalar_type_and_size Richard Sandiford
2019-10-30 14:32   ` Richard Biener
2019-10-25 12:34 ` [7/n] Use consistent compatibility checks in vectorizable_shift Richard Sandiford
2019-10-30 14:33   ` Richard Biener
2019-10-25 12:39 ` [8/n] Replace autovectorize_vector_sizes with autovectorize_vector_modes Richard Sandiford
2019-10-30 14:48   ` Richard Biener
2019-10-30 16:33     ` Richard Sandiford
2019-11-11 10:30       ` Richard Sandiford
2019-11-11 14:33       ` Richard Biener
2019-11-12 17:55         ` Richard Sandiford
2019-11-13 14:32           ` Richard Biener
2019-11-13 16:16             ` Richard Sandiford
2019-10-25 12:41 ` [9/n] Replace vec_info::vector_size with vec_info::vector_mode Richard Sandiford
2019-11-05 12:47   ` Richard Biener
2019-10-25 12:43 ` [10/n] Make less use of get_same_sized_vectype Richard Sandiford
2019-11-05 12:50   ` Richard Biener
2019-11-05 15:34     ` Richard Sandiford
2019-11-05 16:09       ` Richard Biener
2019-10-25 12:44 ` [11/n] Support vectorisation with mixed vector sizes Richard Sandiford
2019-11-05 12:57   ` Richard Biener
2019-11-06 12:38     ` Richard Sandiford
2019-11-12  9:22       ` Richard Biener
2019-10-25 12:49 ` [12/n] [AArch64] Support vectorising with multiple " Richard Sandiford
2019-10-25 12:51 ` [13/n] Allow mixed vector sizes within a single vectorised stmt Richard Sandiford
2019-11-05 12:58   ` Richard Biener
2019-10-25 13:00 ` [14/n] Vectorise conversions between differently-sized integer vectors Richard Sandiford
2019-11-05 13:02   ` Richard Biener
2019-11-06 12:45     ` Richard Sandiford
2019-11-12  9:40       ` Richard Biener
2019-10-29 17:05 ` [15/n] Consider building nodes from scalars in vect_slp_analyze_node_operations Richard Sandiford
2019-11-05 13:07   ` Richard Biener
2019-10-29 17:14 ` [16/n] Apply maximum nunits for BB SLP Richard Sandiford
2019-11-05 13:22   ` Richard Biener
2019-11-05 14:09     ` Richard Sandiford
2019-11-14 12:22       ` Richard Biener
2019-11-05 20:10 ` [10a/n] Require equal type sizes for vectorised calls Richard Sandiford
2019-11-06  9:44   ` Richard Biener
2019-11-05 20:25 ` [11a/n] Avoid retrying with the same vector modes Richard Sandiford
2019-11-06  9:49   ` Richard Biener
2019-11-06 10:21     ` Richard Sandiford
2019-11-06 10:27       ` Richard Biener
2019-11-06 11:02         ` Richard Sandiford
2019-11-06 11:22           ` Richard Biener
2019-11-06 12:47             ` Richard Sandiford
2019-11-12  9:25               ` Richard Biener
2019-11-05 20:45 ` [17/17] Extend can_duplicate_and_interleave_p to mixed-size vectors Richard Sandiford
2019-11-14 12:23   ` Richard Biener
