public inbox for gcc-patches@gcc.gnu.org
* [00/13] Make VEC_PERM_EXPR work for variable-length vectors
@ 2017-12-09 23:06 Richard Sandiford
  2017-12-09 23:08 ` [01/13] Add a qimode_for_vec_perm helper function Richard Sandiford
                   ` (11 more replies)
  0 siblings, 12 replies; 46+ messages in thread
From: Richard Sandiford @ 2017-12-09 23:06 UTC (permalink / raw)
  To: gcc-patches

This series is a replacement for:
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00747.html
based on the feedback that using VEC_PERM_EXPR would be better.

The changes are:

(1) Remove the restriction that the selector elements have to have the
    same width as the data elements, but only for constant selectors.
    This lets through the cases we need without also allowing
    potentially-expensive ops.  Adding support for the variable
    case can be done later if it seems useful, but it's not trivial.

(2) Encode the integer form of constant selectors (vec_perm_indices)
    in the same way as the new VECTOR_CST encoding, so that it can
    cope with variable-length vectors (see the illustration after
    this list).

(3) Remove the vec_perm_const optab and reuse the target hook to emit
    code.  This avoids the need to create a CONST_VECTOR for the wide
    selectors, and hence the need to have a corresponding wide vector
    mode (which the target wouldn't otherwise need or support).

(4) When handling the variable vec_perm optab, check that modes can store
    all element indices before using them.

(5) Unconditionally use ssizetype selector elements in permutes created
    by the vectoriser.

(6) Make the AArch64 vec_perm_const handling handle variable-length vectors.
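
As a rough illustration of (2) (an example for exposition, not taken
from the patches themselves): with the VECTOR_CST-style encoding, an
"extract even elements" selector { 0, 2, 4, 6, ... } can be encoded as
a single pattern whose first three elements are { 0, 2, 4 }, with the
rest of the series implicit.  The same encoding therefore describes
the selector for any number of vector elements, including a
runtime-variable number.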

Tested directly on trunk on aarch64-linux-gnu, x86_64-linux-gnu and
powerpc64le-linux-gnu.  Also tested by comparing the before and after
assembly output for:

   arm-linux-gnueabi arm-linux-gnueabihf aarch64-linux-gnu
   aarch64_be-linux-gnu ia64-linux-gnu i686-pc-linux-gnu
   mipsisa64-linux-gnu mipsel-linux-gnu powerpc64-linux-gnu
   powerpc64le-linux-gnu powerpc-eabispe x86_64-linux-gnu
   sparc64-linux-gnu

at -O3, which should cover all the ports that defined vec_perm_const.
The only difference was one instance of different RA for ia64-linux-gnu,
caused by using force_reg on a SUBREG that was previously used directly.

OK to install?

Thanks,
Richard


* [01/13] Add a qimode_for_vec_perm helper function
  2017-12-09 23:06 [00/13] Make VEC_PERM_EXPR work for variable-length vectors Richard Sandiford
@ 2017-12-09 23:08 ` Richard Sandiford
  2017-12-18 13:34   ` Richard Biener
  2017-12-09 23:09 ` [02/13] Pass vec_perm_indices by reference Richard Sandiford
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Richard Sandiford @ 2017-12-09 23:08 UTC (permalink / raw)
  To: gcc-patches

The vec_perm code falls back to doing byte-level permutes if
element-level permutes aren't supported.  There were two copies
of the code to calculate the mode, and later patches add another,
so this patch splits it out into a helper function.
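
For reference, the intended usage pattern is along these lines (a
minimal sketch based on the hunks below, not itself part of the
patch):

  machine_mode qimode;
  if (qimode_for_vec_perm (mode).exists (&qimode))
    {
      /* MODE has multi-byte elements and an equivalent byte-element
	 vector mode exists, so the permute can be retried in QIMODE
	 with an adjusted selector.  */
    }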


2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* optabs-query.h (qimode_for_vec_perm): Declare.
	* optabs-query.c (can_vec_perm_p): Split out qimode search to...
	(qimode_for_vec_perm): ...this new function.
	* optabs.c (expand_vec_perm): Use qimode_for_vec_perm.

Index: gcc/optabs-query.h
===================================================================
--- gcc/optabs-query.h	2017-12-09 22:47:12.476364764 +0000
+++ gcc/optabs-query.h	2017-12-09 22:47:14.730310076 +0000
@@ -174,6 +174,7 @@ enum insn_code can_extend_p (machine_mod
 enum insn_code can_float_p (machine_mode, machine_mode, int);
 enum insn_code can_fix_p (machine_mode, machine_mode, int, bool *);
 bool can_conditionally_move_p (machine_mode mode);
+opt_machine_mode qimode_for_vec_perm (machine_mode);
 bool can_vec_perm_p (machine_mode, bool, vec_perm_indices *);
 /* Find a widening optab even if it doesn't widen as much as we want.  */
 #define find_widening_optab_handler(A, B, C) \
Index: gcc/optabs-query.c
===================================================================
--- gcc/optabs-query.c	2017-12-09 22:47:12.476364764 +0000
+++ gcc/optabs-query.c	2017-12-09 22:47:14.729310075 +0000
@@ -345,6 +345,22 @@ can_conditionally_move_p (machine_mode m
   return direct_optab_handler (movcc_optab, mode) != CODE_FOR_nothing;
 }
 
+/* If a target doesn't implement a permute on a vector with multibyte
+   elements, we can try to do the same permute on byte elements.
+   If this makes sense for vector mode MODE then return the appropriate
+   byte vector mode.  */
+
+opt_machine_mode
+qimode_for_vec_perm (machine_mode mode)
+{
+  machine_mode qimode;
+  if (GET_MODE_INNER (mode) != QImode
+      && mode_for_vector (QImode, GET_MODE_SIZE (mode)).exists (&qimode)
+      && VECTOR_MODE_P (qimode))
+    return qimode;
+  return opt_machine_mode ();
+}
+
 /* Return true if VEC_PERM_EXPR of arbitrary input vectors can be
    expanded using SIMD extensions of the CPU.  SEL may be NULL, which
    stands for an unknown constant.  Note that additional permutations
@@ -375,9 +391,7 @@ can_vec_perm_p (machine_mode mode, bool
     return true;
 
   /* We allow fallback to a QI vector mode, and adjust the mask.  */
-  if (GET_MODE_INNER (mode) == QImode
-      || !mode_for_vector (QImode, GET_MODE_SIZE (mode)).exists (&qimode)
-      || !VECTOR_MODE_P (qimode))
+  if (!qimode_for_vec_perm (mode).exists (&qimode))
     return false;
 
   /* ??? For completeness, we ought to check the QImode version of
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	2017-12-09 22:47:12.476364764 +0000
+++ gcc/optabs.c	2017-12-09 22:47:14.731310077 +0000
@@ -5452,9 +5452,7 @@ expand_vec_perm (machine_mode mode, rtx
 
   /* Set QIMODE to a different vector mode with byte elements.
      If no such mode, or if MODE already has byte elements, use VOIDmode.  */
-  if (GET_MODE_INNER (mode) == QImode
-      || !mode_for_vector (QImode, w).exists (&qimode)
-      || !VECTOR_MODE_P (qimode))
+  if (!qimode_for_vec_perm (mode).exists (&qimode))
     qimode = VOIDmode;
 
   /* If the input is a constant, expand it specially.  */


* [02/13] Pass vec_perm_indices by reference
  2017-12-09 23:06 [00/13] Make VEC_PERM_EXPR work for variable-length vectors Richard Sandiford
  2017-12-09 23:08 ` [01/13] Add a qimode_for_vec_perm helper function Richard Sandiford
@ 2017-12-09 23:09 ` Richard Sandiford
  2017-12-12 14:23   ` Richard Biener
  2017-12-09 23:11 ` [03/13] Split can_vec_perm_p into can_vec_perm_{var,const}_p Richard Sandiford
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Richard Sandiford @ 2017-12-09 23:09 UTC (permalink / raw)
  To: gcc-patches

This patch makes functions take vec_perm_indices by reference rather
than value, since a later patch will turn vec_perm_indices into a class
that would be more expensive to copy.


2017-12-06  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* fold-const.c (fold_vec_perm): Take a const vec_perm_indices &
	instead of vec_perm_indices.
	* tree-vectorizer.h (vect_gen_perm_mask_any): Likewise.
	(vect_gen_perm_mask_checked): Likewise.
	* tree-vect-stmts.c (vect_gen_perm_mask_any): Likewise.
	(vect_gen_perm_mask_checked): Likewise.

Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	2017-12-09 22:47:11.840391388 +0000
+++ gcc/fold-const.c	2017-12-09 22:47:19.119312754 +0000
@@ -8801,7 +8801,7 @@ vec_cst_ctor_to_array (tree arg, unsigne
    NULL_TREE otherwise.  */
 
 static tree
-fold_vec_perm (tree type, tree arg0, tree arg1, vec_perm_indices sel)
+fold_vec_perm (tree type, tree arg0, tree arg1, const vec_perm_indices &sel)
 {
   unsigned int i;
   bool need_ctor = false;
Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	2017-12-09 22:47:11.840391388 +0000
+++ gcc/tree-vectorizer.h	2017-12-09 22:47:19.120312754 +0000
@@ -1204,8 +1204,8 @@ extern void vect_get_load_cost (struct d
 extern void vect_get_store_cost (struct data_reference *, int,
 				 unsigned int *, stmt_vector_for_cost *);
 extern bool vect_supportable_shift (enum tree_code, tree);
-extern tree vect_gen_perm_mask_any (tree, vec_perm_indices);
-extern tree vect_gen_perm_mask_checked (tree, vec_perm_indices);
+extern tree vect_gen_perm_mask_any (tree, const vec_perm_indices &);
+extern tree vect_gen_perm_mask_checked (tree, const vec_perm_indices &);
 extern void optimize_mask_stores (struct loop*);
 
 /* In tree-vect-data-refs.c.  */
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2017-12-09 22:47:11.840391388 +0000
+++ gcc/tree-vect-stmts.c	2017-12-09 22:47:19.119312754 +0000
@@ -6506,7 +6506,7 @@ vectorizable_store (gimple *stmt, gimple
    vect_gen_perm_mask_checked.  */
 
 tree
-vect_gen_perm_mask_any (tree vectype, vec_perm_indices sel)
+vect_gen_perm_mask_any (tree vectype, const vec_perm_indices &sel)
 {
   tree mask_elt_type, mask_type;
 
@@ -6527,7 +6527,7 @@ vect_gen_perm_mask_any (tree vectype, ve
    i.e. that the target supports the pattern _for arbitrary input vectors_.  */
 
 tree
-vect_gen_perm_mask_checked (tree vectype, vec_perm_indices sel)
+vect_gen_perm_mask_checked (tree vectype, const vec_perm_indices &sel)
 {
   gcc_assert (can_vec_perm_p (TYPE_MODE (vectype), false, &sel));
   return vect_gen_perm_mask_any (vectype, sel);


* [03/13] Split can_vec_perm_p into can_vec_perm_{var,const}_p
  2017-12-09 23:06 [00/13] Make VEC_PERM_EXPR work for variable-length vectors Richard Sandiford
  2017-12-09 23:08 ` [01/13] Add a qimode_for_vec_perm helper function Richard Sandiford
  2017-12-09 23:09 ` [02/13] Pass vec_perm_indices by reference Richard Sandiford
@ 2017-12-09 23:11 ` Richard Sandiford
  2017-12-12 14:25   ` Richard Biener
  2017-12-09 23:13 ` [04/13] Refactor expand_vec_perm Richard Sandiford
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Richard Sandiford @ 2017-12-09 23:11 UTC (permalink / raw)
  To: gcc-patches

This patch splits can_vec_perm_p into two functions: can_vec_perm_var_p
for testing permute operations with variable selection vectors, and
can_vec_perm_const_p for testing permute operations with specific
constant selection vectors.  This means that we can pass the constant
selection vector by reference.
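
In terms of the interface, queries that were previously written as
(sketch, for illustration only):

  can_vec_perm_p (mode, false, &sel);   /* specific constant selector */
  can_vec_perm_p (mode, true, NULL);    /* arbitrary variable selector */

become:

  can_vec_perm_const_p (mode, sel);
  can_vec_perm_var_p (mode);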

Constant permutes can still use a variable permute as a fallback.
A later patch adds a check to make sure that we don't truncate the
vector indices when doing this.

However, have_whole_vector_shift checked:

  if (direct_optab_handler (vec_perm_const_optab, mode) == CODE_FOR_nothing)
    return false;

which had the effect of disallowing the fallback to variable permutes.
I'm not sure whether that was the intention or whether it was just
supposed to short-cut the loop on targets that don't support permutes.
(But then why bother?  The first check in the loop would fail and
we'd bail out straightaway.)

The patch adds a parameter for disallowing the fallback.  I think it
makes sense to do this for the following code in the VEC_PERM_EXPR
folder:

	  /* Some targets are deficient and fail to expand a single
	     argument permutation while still allowing an equivalent
	     2-argument version.  */
	  if (need_mask_canon && arg2 == op2
	      && !can_vec_perm_p (TYPE_MODE (type), false, &sel)
	      && can_vec_perm_p (TYPE_MODE (type), false, &sel2))

since it's really testing whether the expand_vec_perm_const code expects
a particular form.
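
With the new interface that becomes (as in the fold-const.c hunk below):

	  if (need_mask_canon && arg2 == op2
	      && !can_vec_perm_const_p (TYPE_MODE (type), sel, false)
	      && can_vec_perm_const_p (TYPE_MODE (type), sel2, false))

where the explicit "false" stops the query from succeeding simply
because a variable permute is available.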


2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* optabs-query.h (can_vec_perm_p): Delete.
	(can_vec_perm_var_p, can_vec_perm_const_p): Declare.
	* optabs-query.c (can_vec_perm_p): Split into...
	(can_vec_perm_var_p, can_vec_perm_const_p): ...these two functions.
	(can_mult_highpart_p): Use can_vec_perm_const_p to test whether a
	particular selector is valid.
	* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
	* tree-vect-data-refs.c (vect_grouped_store_supported): Likewise.
	(vect_grouped_load_supported): Likewise.
	(vect_shift_permute_load_chain): Likewise.
	* tree-vect-slp.c (vect_build_slp_tree_1): Likewise.
	(vect_transform_slp_perm_load): Likewise.
	* tree-vect-stmts.c (perm_mask_for_reverse): Likewise.
	(vectorizable_bswap): Likewise.
	(vect_gen_perm_mask_checked): Likewise.
	* fold-const.c (fold_ternary_loc): Likewise.  Don't take
	implementations of variable permutation vectors into account
	when deciding which selector to use.
	* tree-vect-loop.c (have_whole_vector_shift): Don't check whether
	vec_perm_const_optab is supported; instead use can_vec_perm_const_p
	with a false third argument.
	* tree-vect-generic.c (lower_vec_perm): Use can_vec_perm_const_p
	to test whether the constant selector is valid and can_vec_perm_var_p
	to test whether a variable selector is valid.

Index: gcc/optabs-query.h
===================================================================
--- gcc/optabs-query.h	2017-12-09 22:47:14.730310076 +0000
+++ gcc/optabs-query.h	2017-12-09 22:47:21.534314227 +0000
@@ -175,7 +175,9 @@ enum insn_code can_float_p (machine_mode
 enum insn_code can_fix_p (machine_mode, machine_mode, int, bool *);
 bool can_conditionally_move_p (machine_mode mode);
 opt_machine_mode qimode_for_vec_perm (machine_mode);
-bool can_vec_perm_p (machine_mode, bool, vec_perm_indices *);
+bool can_vec_perm_var_p (machine_mode);
+bool can_vec_perm_const_p (machine_mode, const vec_perm_indices &,
+			   bool = true);
 /* Find a widening optab even if it doesn't widen as much as we want.  */
 #define find_widening_optab_handler(A, B, C) \
   find_widening_optab_handler_and_mode (A, B, C, NULL)
Index: gcc/optabs-query.c
===================================================================
--- gcc/optabs-query.c	2017-12-09 22:47:14.729310075 +0000
+++ gcc/optabs-query.c	2017-12-09 22:47:21.534314227 +0000
@@ -361,58 +361,75 @@ qimode_for_vec_perm (machine_mode mode)
   return opt_machine_mode ();
 }
 
-/* Return true if VEC_PERM_EXPR of arbitrary input vectors can be
-   expanded using SIMD extensions of the CPU.  SEL may be NULL, which
-   stands for an unknown constant.  Note that additional permutations
-   representing whole-vector shifts may also be handled via the vec_shr
-   optab, but only where the second input vector is entirely constant
-   zeroes; this case is not dealt with here.  */
+/* Return true if VEC_PERM_EXPRs with variable selector operands can be
+   expanded using SIMD extensions of the CPU.  MODE is the mode of the
+   vectors being permuted.  */
 
 bool
-can_vec_perm_p (machine_mode mode, bool variable, vec_perm_indices *sel)
+can_vec_perm_var_p (machine_mode mode)
 {
-  machine_mode qimode;
-
   /* If the target doesn't implement a vector mode for the vector type,
      then no operations are supported.  */
   if (!VECTOR_MODE_P (mode))
     return false;
 
-  if (!variable)
-    {
-      if (direct_optab_handler (vec_perm_const_optab, mode) != CODE_FOR_nothing
-	  && (sel == NULL
-	      || targetm.vectorize.vec_perm_const_ok == NULL
-	      || targetm.vectorize.vec_perm_const_ok (mode, *sel)))
-	return true;
-    }
-
   if (direct_optab_handler (vec_perm_optab, mode) != CODE_FOR_nothing)
     return true;
 
   /* We allow fallback to a QI vector mode, and adjust the mask.  */
+  machine_mode qimode;
   if (!qimode_for_vec_perm (mode).exists (&qimode))
     return false;
 
-  /* ??? For completeness, we ought to check the QImode version of
-      vec_perm_const_optab.  But all users of this implicit lowering
-      feature implement the variable vec_perm_optab.  */
   if (direct_optab_handler (vec_perm_optab, qimode) == CODE_FOR_nothing)
     return false;
 
   /* In order to support the lowering of variable permutations,
      we need to support shifts and adds.  */
-  if (variable)
+  if (GET_MODE_UNIT_SIZE (mode) > 2
+      && optab_handler (ashl_optab, mode) == CODE_FOR_nothing
+      && optab_handler (vashl_optab, mode) == CODE_FOR_nothing)
+    return false;
+  if (optab_handler (add_optab, qimode) == CODE_FOR_nothing)
+    return false;
+
+  return true;
+}
+
+/* Return true if the target directly supports VEC_PERM_EXPRs on vectors
+   of mode MODE using the selector SEL.  ALLOW_VARIABLE_P is true if it
+   is acceptable to force the selector into a register and use a variable
+   permute (if the target supports that).
+
+   Note that additional permutations representing whole-vector shifts may
+   also be handled via the vec_shr optab, but only where the second input
+   vector is entirely constant zeroes; this case is not dealt with here.  */
+
+bool
+can_vec_perm_const_p (machine_mode mode, const vec_perm_indices &sel,
+		      bool allow_variable_p)
+{
+  /* If the target doesn't implement a vector mode for the vector type,
+     then no operations are supported.  */
+  if (!VECTOR_MODE_P (mode))
+    return false;
+
+  /* It's probably cheaper to test for the variable case first.  */
+  if (allow_variable_p && can_vec_perm_var_p (mode))
+    return true;
+
+  if (direct_optab_handler (vec_perm_const_optab, mode) != CODE_FOR_nothing)
     {
-      if (GET_MODE_UNIT_SIZE (mode) > 2
-	  && optab_handler (ashl_optab, mode) == CODE_FOR_nothing
-	  && optab_handler (vashl_optab, mode) == CODE_FOR_nothing)
-	return false;
-      if (optab_handler (add_optab, qimode) == CODE_FOR_nothing)
-	return false;
+      if (targetm.vectorize.vec_perm_const_ok == NULL
+	  || targetm.vectorize.vec_perm_const_ok (mode, sel))
+	return true;
+
+      /* ??? For completeness, we ought to check the QImode version of
+	 vec_perm_const_optab.  But all users of this implicit lowering
+	 feature implement the variable vec_perm_optab.  */
     }
 
-  return true;
+  return false;
 }
 
 /* Find a widening optab even if it doesn't widen as much as we want.
@@ -472,7 +489,7 @@ can_mult_highpart_p (machine_mode mode,
 	    sel.quick_push (!BYTES_BIG_ENDIAN
 			    + (i & ~1)
 			    + ((i & 1) ? nunits : 0));
-	  if (can_vec_perm_p (mode, false, &sel))
+	  if (can_vec_perm_const_p (mode, sel))
 	    return 2;
 	}
     }
@@ -486,7 +503,7 @@ can_mult_highpart_p (machine_mode mode,
 	  auto_vec_perm_indices sel (nunits);
 	  for (i = 0; i < nunits; ++i)
 	    sel.quick_push (2 * i + (BYTES_BIG_ENDIAN ? 0 : 1));
-	  if (can_vec_perm_p (mode, false, &sel))
+	  if (can_vec_perm_const_p (mode, sel))
 	    return 3;
 	}
     }
Index: gcc/tree-ssa-forwprop.c
===================================================================
--- gcc/tree-ssa-forwprop.c	2017-12-09 22:47:11.145420483 +0000
+++ gcc/tree-ssa-forwprop.c	2017-12-09 22:47:21.534314227 +0000
@@ -2108,7 +2108,7 @@ simplify_vector_constructor (gimple_stmt
     {
       tree mask_type;
 
-      if (!can_vec_perm_p (TYPE_MODE (type), false, &sel))
+      if (!can_vec_perm_const_p (TYPE_MODE (type), sel))
 	return false;
       mask_type
 	= build_vector_type (build_nonstandard_integer_type (elem_size, 1),
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2017-12-09 22:47:11.145420483 +0000
+++ gcc/tree-vect-data-refs.c	2017-12-09 22:47:21.535314227 +0000
@@ -4587,11 +4587,11 @@ vect_grouped_store_supported (tree vecty
 		  if (3 * i + nelt2 < nelt)
 		    sel[3 * i + nelt2] = 0;
 		}
-	      if (!can_vec_perm_p (mode, false, &sel))
+	      if (!can_vec_perm_const_p (mode, sel))
 		{
 		  if (dump_enabled_p ())
 		    dump_printf (MSG_MISSED_OPTIMIZATION,
-				 "permutaion op not supported by target.\n");
+				 "permutation op not supported by target.\n");
 		  return false;
 		}
 
@@ -4604,11 +4604,11 @@ vect_grouped_store_supported (tree vecty
 		  if (3 * i + nelt2 < nelt)
 		    sel[3 * i + nelt2] = nelt + j2++;
 		}
-	      if (!can_vec_perm_p (mode, false, &sel))
+	      if (!can_vec_perm_const_p (mode, sel))
 		{
 		  if (dump_enabled_p ())
 		    dump_printf (MSG_MISSED_OPTIMIZATION,
-				 "permutaion op not supported by target.\n");
+				 "permutation op not supported by target.\n");
 		  return false;
 		}
 	    }
@@ -4624,11 +4624,11 @@ vect_grouped_store_supported (tree vecty
 	      sel[i * 2] = i;
 	      sel[i * 2 + 1] = i + nelt;
 	    }
-	  if (can_vec_perm_p (mode, false, &sel))
+	  if (can_vec_perm_const_p (mode, sel))
 	    {
 	      for (i = 0; i < nelt; i++)
 		sel[i] += nelt / 2;
-	      if (can_vec_perm_p (mode, false, &sel))
+	      if (can_vec_perm_const_p (mode, sel))
 		return true;
 	    }
 	}
@@ -5166,7 +5166,7 @@ vect_grouped_load_supported (tree vectyp
 		  sel[i] = 3 * i + k;
 		else
 		  sel[i] = 0;
-	      if (!can_vec_perm_p (mode, false, &sel))
+	      if (!can_vec_perm_const_p (mode, sel))
 		{
 		  if (dump_enabled_p ())
 		    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5179,7 +5179,7 @@ vect_grouped_load_supported (tree vectyp
 		  sel[i] = i;
 		else
 		  sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++);
-	      if (!can_vec_perm_p (mode, false, &sel))
+	      if (!can_vec_perm_const_p (mode, sel))
 		{
 		  if (dump_enabled_p ())
 		    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5196,11 +5196,11 @@ vect_grouped_load_supported (tree vectyp
 	  gcc_assert (pow2p_hwi (count));
 	  for (i = 0; i < nelt; i++)
 	    sel[i] = i * 2;
-	  if (can_vec_perm_p (mode, false, &sel))
+	  if (can_vec_perm_const_p (mode, sel))
 	    {
 	      for (i = 0; i < nelt; i++)
 		sel[i] = i * 2 + 1;
-	      if (can_vec_perm_p (mode, false, &sel))
+	      if (can_vec_perm_const_p (mode, sel))
 		return true;
 	    }
         }
@@ -5527,7 +5527,7 @@ vect_shift_permute_load_chain (vec<tree>
 	sel[i] = i * 2;
       for (i = 0; i < nelt / 2; ++i)
 	sel[nelt / 2 + i] = i * 2 + 1;
-      if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5541,7 +5541,7 @@ vect_shift_permute_load_chain (vec<tree>
 	sel[i] = i * 2 + 1;
       for (i = 0; i < nelt / 2; ++i)
 	sel[nelt / 2 + i] = i * 2;
-      if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5555,7 +5555,7 @@ vect_shift_permute_load_chain (vec<tree>
 	 For vector length 8 it is {4 5 6 7 8 9 10 11}.  */
       for (i = 0; i < nelt; i++)
 	sel[i] = nelt / 2 + i;
-      if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5570,7 +5570,7 @@ vect_shift_permute_load_chain (vec<tree>
 	sel[i] = i;
       for (i = nelt / 2; i < nelt; i++)
 	sel[i] = nelt + i;
-      if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5633,7 +5633,7 @@ vect_shift_permute_load_chain (vec<tree>
 	  sel[i] = 3 * k + (l % 3);
 	  k++;
 	}
-      if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5647,7 +5647,7 @@ vect_shift_permute_load_chain (vec<tree>
 	 For vector length 8 it is {6 7 8 9 10 11 12 13}.  */
       for (i = 0; i < nelt; i++)
 	sel[i] = 2 * (nelt / 3) + (nelt % 3) + i;
-      if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5660,7 +5660,7 @@ vect_shift_permute_load_chain (vec<tree>
 	 For vector length 8 it is {5 6 7 8 9 10 11 12}.  */
       for (i = 0; i < nelt; i++)
 	sel[i] = 2 * (nelt / 3) + 1 + i;
-      if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5673,7 +5673,7 @@ vect_shift_permute_load_chain (vec<tree>
 	 For vector length 8 it is {3 4 5 6 7 8 9 10}.  */
       for (i = 0; i < nelt; i++)
 	sel[i] = (nelt / 3) + (nelt % 3) / 2 + i;
-      if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5686,7 +5686,7 @@ vect_shift_permute_load_chain (vec<tree>
 	 For vector length 8 it is {5 6 7 8 9 10 11 12}.  */
       for (i = 0; i < nelt; i++)
 	sel[i] = 2 * (nelt / 3) + (nelt % 3) / 2 + i;
-      if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2017-12-09 22:47:11.145420483 +0000
+++ gcc/tree-vect-slp.c	2017-12-09 22:47:21.536314228 +0000
@@ -901,7 +901,7 @@ vect_build_slp_tree_1 (vec_info *vinfo,
 	    elt += count;
 	  sel.quick_push (elt);
 	}
-      if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
 	{
 	  for (i = 0; i < group_size; ++i)
 	    if (gimple_assign_rhs_code (stmts[i]) == alt_stmt_code)
@@ -3646,7 +3646,7 @@ vect_transform_slp_perm_load (slp_tree n
 	  if (index == nunits)
 	    {
 	      if (! noop_p
-		  && ! can_vec_perm_p (mode, false, &mask))
+		  && ! can_vec_perm_const_p (mode, mask))
 		{
 		  if (dump_enabled_p ())
 		    {
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2017-12-09 22:47:19.119312754 +0000
+++ gcc/tree-vect-stmts.c	2017-12-09 22:47:21.537314229 +0000
@@ -1720,7 +1720,7 @@ perm_mask_for_reverse (tree vectype)
   for (i = 0; i < nunits; ++i)
     sel.quick_push (nunits - 1 - i);
 
-  if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
+  if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
     return NULL_TREE;
   return vect_gen_perm_mask_checked (vectype, sel);
 }
@@ -2502,7 +2502,7 @@ vectorizable_bswap (gimple *stmt, gimple
     for (unsigned j = 0; j < word_bytes; ++j)
       elts.quick_push ((i + 1) * word_bytes - j - 1);
 
-  if (! can_vec_perm_p (TYPE_MODE (char_vectype), false, &elts))
+  if (!can_vec_perm_const_p (TYPE_MODE (char_vectype), elts))
     return false;
 
   if (! vec_stmt)
@@ -6502,7 +6502,7 @@ vectorizable_store (gimple *stmt, gimple
 
 /* Given a vector type VECTYPE, turns permutation SEL into the equivalent
    VECTOR_CST mask.  No checks are made that the target platform supports the
-   mask, so callers may wish to test can_vec_perm_p separately, or use
+   mask, so callers may wish to test can_vec_perm_const_p separately, or use
    vect_gen_perm_mask_checked.  */
 
 tree
@@ -6523,13 +6523,13 @@ vect_gen_perm_mask_any (tree vectype, co
   return mask_elts.build ();
 }
 
-/* Checked version of vect_gen_perm_mask_any.  Asserts can_vec_perm_p,
+/* Checked version of vect_gen_perm_mask_any.  Asserts can_vec_perm_const_p,
    i.e. that the target supports the pattern _for arbitrary input vectors_.  */
 
 tree
 vect_gen_perm_mask_checked (tree vectype, const vec_perm_indices &sel)
 {
-  gcc_assert (can_vec_perm_p (TYPE_MODE (vectype), false, &sel));
+  gcc_assert (can_vec_perm_const_p (TYPE_MODE (vectype), sel));
   return vect_gen_perm_mask_any (vectype, sel);
 }
 
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	2017-12-09 22:47:19.119312754 +0000
+++ gcc/fold-const.c	2017-12-09 22:47:21.534314227 +0000
@@ -11620,8 +11620,8 @@ fold_ternary_loc (location_t loc, enum t
 	     argument permutation while still allowing an equivalent
 	     2-argument version.  */
 	  if (need_mask_canon && arg2 == op2
-	      && !can_vec_perm_p (TYPE_MODE (type), false, &sel)
-	      && can_vec_perm_p (TYPE_MODE (type), false, &sel2))
+	      && !can_vec_perm_const_p (TYPE_MODE (type), sel, false)
+	      && can_vec_perm_const_p (TYPE_MODE (type), sel2, false))
 	    {
 	      need_mask_canon = need_mask_canon2;
 	      sel = sel2;
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2017-12-09 22:47:11.145420483 +0000
+++ gcc/tree-vect-loop.c	2017-12-09 22:47:21.536314228 +0000
@@ -3730,9 +3730,6 @@ have_whole_vector_shift (machine_mode mo
   if (optab_handler (vec_shr_optab, mode) != CODE_FOR_nothing)
     return true;
 
-  if (direct_optab_handler (vec_perm_const_optab, mode) == CODE_FOR_nothing)
-    return false;
-
   unsigned int i, nelt = GET_MODE_NUNITS (mode);
   auto_vec_perm_indices sel (nelt);
 
@@ -3740,7 +3737,7 @@ have_whole_vector_shift (machine_mode mo
     {
       sel.truncate (0);
       calc_vec_perm_mask_for_shift (i, nelt, &sel);
-      if (!can_vec_perm_p (mode, false, &sel))
+      if (!can_vec_perm_const_p (mode, sel, false))
 	return false;
     }
   return true;
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	2017-12-09 22:47:11.145420483 +0000
+++ gcc/tree-vect-generic.c	2017-12-09 22:47:21.535314227 +0000
@@ -1306,7 +1306,7 @@ lower_vec_perm (gimple_stmt_iterator *gs
 	sel_int.quick_push (TREE_INT_CST_LOW (VECTOR_CST_ELT (mask, i))
 			    & (2 * elements - 1));
 
-      if (can_vec_perm_p (TYPE_MODE (vect_type), false, &sel_int))
+      if (can_vec_perm_const_p (TYPE_MODE (vect_type), sel_int))
 	{
 	  gimple_assign_set_rhs3 (stmt, mask);
 	  update_stmt (stmt);
@@ -1337,7 +1337,7 @@ lower_vec_perm (gimple_stmt_iterator *gs
 	    }
 	}
     }
-  else if (can_vec_perm_p (TYPE_MODE (vect_type), true, NULL))
+  else if (can_vec_perm_var_p (TYPE_MODE (vect_type)))
     return;
   
   warning_at (loc, OPT_Wvector_operation_performance,


* [04/13] Refactor expand_vec_perm
  2017-12-09 23:06 [00/13] Make VEC_PERM_EXPR work for variable-length vectors Richard Sandiford
                   ` (2 preceding siblings ...)
  2017-12-09 23:11 ` [03/13] Split can_vec_perm_p into can_vec_perm_{var,const}_p Richard Sandiford
@ 2017-12-09 23:13 ` Richard Sandiford
  2017-12-12 15:17   ` Richard Biener
  2017-12-09 23:17 ` [05/13] Remove vec_perm_const optab Richard Sandiford
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Richard Sandiford @ 2017-12-09 23:13 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 12051 bytes --]

This patch splits the variable handling out of expand_vec_perm into
a subroutine, so that the next patch can use a different interface
for expanding constant permutes.  expand_vec_perm now does all the
CONST_VECTOR handling directly and defers to expand_vec_perm_var
for other rtx codes.  Handling CONST_VECTORs includes handling the
fallback to variable permutes.
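
The resulting structure is roughly (a sketch of the control flow, not
the full code):

  rtx
  expand_vec_perm (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target)
  {
    if (GET_CODE (sel) != CONST_VECTOR)
      return expand_vec_perm_var (mode, v0, v1, sel, target);

    /* ...handle CONST_VECTOR selectors here, including the fallback
       to a variable permute when no constant form is available...  */
  }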

The patch also adds an assert for valid optab modes to expand_vec_perm_1,
so that the check also applies when the optabs are used for CONST_VECTOR
selectors.  The MODE_VECTOR_INT part of the assert was previously in
expand_vec_perm and the mode_for_int_vector part is new.

Most of the patch is just reindentation, so I've attached a -b version.


2017-12-06  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* optabs.c (expand_vec_perm_1): Assert that SEL has an integer
	vector mode and that that mode matches the mode of the data
	being permuted.
	(expand_vec_perm): Split handling of non-CONST_VECTOR selectors
	out into expand_vec_perm_var.  Do all CONST_VECTOR handling here,
	directly using expand_vec_perm_1 when forcing selectors into
	registers.
	(expand_vec_perm_var): New function, split out from expand_vec_perm.

Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	2017-12-09 22:47:14.731310077 +0000
+++ gcc/optabs.c	2017-12-09 22:47:23.878315657 +0000
@@ -5405,6 +5405,8 @@ expand_vec_perm_1 (enum insn_code icode,
   machine_mode smode = GET_MODE (sel);
   struct expand_operand ops[4];
 
+  gcc_assert (GET_MODE_CLASS (smode) == MODE_VECTOR_INT
+	      || mode_for_int_vector (tmode).require () == smode);
   create_output_operand (&ops[0], target, tmode);
   create_input_operand (&ops[3], sel, smode);
 
@@ -5431,8 +5433,13 @@ expand_vec_perm_1 (enum insn_code icode,
   return NULL_RTX;
 }
 
-/* Generate instructions for vec_perm optab given its mode
-   and three operands.  */
+static rtx expand_vec_perm_var (machine_mode, rtx, rtx, rtx, rtx);
+
+/* Implement a permutation of vectors v0 and v1 using the permutation
+   vector in SEL and return the result.  Use TARGET to hold the result
+   if nonnull and convenient.
+
+   MODE is the mode of the vectors being permuted (V0 and V1).  */
 
 rtx
 expand_vec_perm (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target)
@@ -5443,6 +5450,9 @@ expand_vec_perm (machine_mode mode, rtx
   rtx tmp, sel_qi = NULL;
   rtvec vec;
 
+  if (GET_CODE (sel) != CONST_VECTOR)
+    return expand_vec_perm_var (mode, v0, v1, sel, target);
+
   if (!target || GET_MODE (target) != mode)
     target = gen_reg_rtx (mode);
 
@@ -5455,86 +5465,125 @@ expand_vec_perm (machine_mode mode, rtx
   if (!qimode_for_vec_perm (mode).exists (&qimode))
     qimode = VOIDmode;
 
-  /* If the input is a constant, expand it specially.  */
-  gcc_assert (GET_MODE_CLASS (GET_MODE (sel)) == MODE_VECTOR_INT);
-  if (GET_CODE (sel) == CONST_VECTOR)
-    {
-      /* See if this can be handled with a vec_shr.  We only do this if the
-	 second vector is all zeroes.  */
-      enum insn_code shift_code = optab_handler (vec_shr_optab, mode);
-      enum insn_code shift_code_qi = ((qimode != VOIDmode && qimode != mode)
-				      ? optab_handler (vec_shr_optab, qimode)
-				      : CODE_FOR_nothing);
-      rtx shift_amt = NULL_RTX;
-      if (v1 == CONST0_RTX (GET_MODE (v1))
-	  && (shift_code != CODE_FOR_nothing
-	      || shift_code_qi != CODE_FOR_nothing))
+  /* See if this can be handled with a vec_shr.  We only do this if the
+     second vector is all zeroes.  */
+  insn_code shift_code = optab_handler (vec_shr_optab, mode);
+  insn_code shift_code_qi = ((qimode != VOIDmode && qimode != mode)
+			     ? optab_handler (vec_shr_optab, qimode)
+			     : CODE_FOR_nothing);
+
+  if (v1 == CONST0_RTX (GET_MODE (v1))
+      && (shift_code != CODE_FOR_nothing
+	  || shift_code_qi != CODE_FOR_nothing))
+    {
+      rtx shift_amt = shift_amt_for_vec_perm_mask (sel);
+      if (shift_amt)
 	{
-	  shift_amt = shift_amt_for_vec_perm_mask (sel);
-	  if (shift_amt)
+	  struct expand_operand ops[3];
+	  if (shift_code != CODE_FOR_nothing)
 	    {
-	      struct expand_operand ops[3];
-	      if (shift_code != CODE_FOR_nothing)
-		{
-		  create_output_operand (&ops[0], target, mode);
-		  create_input_operand (&ops[1], v0, mode);
-		  create_convert_operand_from_type (&ops[2], shift_amt,
-						    sizetype);
-		  if (maybe_expand_insn (shift_code, 3, ops))
-		    return ops[0].value;
-		}
-	      if (shift_code_qi != CODE_FOR_nothing)
-		{
-		  tmp = gen_reg_rtx (qimode);
-		  create_output_operand (&ops[0], tmp, qimode);
-		  create_input_operand (&ops[1], gen_lowpart (qimode, v0),
-					qimode);
-		  create_convert_operand_from_type (&ops[2], shift_amt,
-						    sizetype);
-		  if (maybe_expand_insn (shift_code_qi, 3, ops))
-		    return gen_lowpart (mode, ops[0].value);
-		}
+	      create_output_operand (&ops[0], target, mode);
+	      create_input_operand (&ops[1], v0, mode);
+	      create_convert_operand_from_type (&ops[2], shift_amt, sizetype);
+	      if (maybe_expand_insn (shift_code, 3, ops))
+		return ops[0].value;
+	    }
+	  if (shift_code_qi != CODE_FOR_nothing)
+	    {
+	      rtx tmp = gen_reg_rtx (qimode);
+	      create_output_operand (&ops[0], tmp, qimode);
+	      create_input_operand (&ops[1], gen_lowpart (qimode, v0), qimode);
+	      create_convert_operand_from_type (&ops[2], shift_amt, sizetype);
+	      if (maybe_expand_insn (shift_code_qi, 3, ops))
+		return gen_lowpart (mode, ops[0].value);
 	    }
 	}
+    }
 
-      icode = direct_optab_handler (vec_perm_const_optab, mode);
-      if (icode != CODE_FOR_nothing)
+  icode = direct_optab_handler (vec_perm_const_optab, mode);
+  if (icode != CODE_FOR_nothing)
+    {
+      tmp = expand_vec_perm_1 (icode, target, v0, v1, sel);
+      if (tmp)
+	return tmp;
+    }
+
+  /* Fall back to a constant byte-based permutation.  */
+  if (qimode != VOIDmode)
+    {
+      vec = rtvec_alloc (w);
+      for (i = 0; i < e; ++i)
 	{
-	  tmp = expand_vec_perm_1 (icode, target, v0, v1, sel);
-	  if (tmp)
-	    return tmp;
+	  unsigned int j, this_e;
+
+	  this_e = INTVAL (CONST_VECTOR_ELT (sel, i));
+	  this_e &= 2 * e - 1;
+	  this_e *= u;
+
+	  for (j = 0; j < u; ++j)
+	    RTVEC_ELT (vec, i * u + j) = GEN_INT (this_e + j);
 	}
+      sel_qi = gen_rtx_CONST_VECTOR (qimode, vec);
 
-      /* Fall back to a constant byte-based permutation.  */
-      if (qimode != VOIDmode)
+      icode = direct_optab_handler (vec_perm_const_optab, qimode);
+      if (icode != CODE_FOR_nothing)
 	{
-	  vec = rtvec_alloc (w);
-	  for (i = 0; i < e; ++i)
-	    {
-	      unsigned int j, this_e;
+	  tmp = gen_reg_rtx (qimode);
+	  tmp = expand_vec_perm_1 (icode, tmp, gen_lowpart (qimode, v0),
+				   gen_lowpart (qimode, v1), sel_qi);
+	  if (tmp)
+	    return gen_lowpart (mode, tmp);
+	}
+    }
 
-	      this_e = INTVAL (CONST_VECTOR_ELT (sel, i));
-	      this_e &= 2 * e - 1;
-	      this_e *= u;
+  /* Otherwise expand as a fully variable permuation.  */
 
-	      for (j = 0; j < u; ++j)
-		RTVEC_ELT (vec, i * u + j) = GEN_INT (this_e + j);
-	    }
-	  sel_qi = gen_rtx_CONST_VECTOR (qimode, vec);
+  icode = direct_optab_handler (vec_perm_optab, mode);
+  if (icode != CODE_FOR_nothing)
+    {
+      rtx tmp = expand_vec_perm_1 (icode, target, v0, v1, sel);
+      if (tmp)
+	return tmp;
+    }
 
-	  icode = direct_optab_handler (vec_perm_const_optab, qimode);
-	  if (icode != CODE_FOR_nothing)
-	    {
-	      tmp = mode != qimode ? gen_reg_rtx (qimode) : target;
-	      tmp = expand_vec_perm_1 (icode, tmp, gen_lowpart (qimode, v0),
-				       gen_lowpart (qimode, v1), sel_qi);
-	      if (tmp)
-		return gen_lowpart (mode, tmp);
-	    }
+  if (qimode != VOIDmode)
+    {
+      icode = direct_optab_handler (vec_perm_optab, qimode);
+      if (icode != CODE_FOR_nothing)
+	{
+	  rtx tmp = gen_reg_rtx (qimode);
+	  tmp = expand_vec_perm_1 (icode, tmp, gen_lowpart (qimode, v0),
+				   gen_lowpart (qimode, v1), sel_qi);
+	  if (tmp)
+	    return gen_lowpart (mode, tmp);
 	}
     }
 
-  /* Otherwise expand as a fully variable permuation.  */
+  return NULL_RTX;
+}
+
+/* Implement a permutation of vectors v0 and v1 using the permutation
+   vector in SEL and return the result.  Use TARGET to hold the result
+   if nonnull and convenient.
+
+   MODE is the mode of the vectors being permuted (V0 and V1).
+   SEL must have the integer equivalent of MODE and is known to be
+   unsuitable for permutes with a constant permutation vector.  */
+
+static rtx
+expand_vec_perm_var (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target)
+{
+  enum insn_code icode;
+  unsigned int i, w, u;
+  rtx tmp, sel_qi;
+  rtvec vec;
+
+  w = GET_MODE_SIZE (mode);
+  u = GET_MODE_UNIT_SIZE (mode);
+
+  if (!target || GET_MODE (target) != mode)
+    target = gen_reg_rtx (mode);
+
   icode = direct_optab_handler (vec_perm_optab, mode);
   if (icode != CODE_FOR_nothing)
     {
@@ -5545,50 +5594,47 @@ expand_vec_perm (machine_mode mode, rtx
 
   /* As a special case to aid several targets, lower the element-based
      permutation to a byte-based permutation and try again.  */
-  if (qimode == VOIDmode)
+  machine_mode qimode;
+  if (!qimode_for_vec_perm (mode).exists (&qimode))
     return NULL_RTX;
   icode = direct_optab_handler (vec_perm_optab, qimode);
   if (icode == CODE_FOR_nothing)
     return NULL_RTX;
 
-  if (sel_qi == NULL)
+  /* Multiply each element by its byte size.  */
+  machine_mode selmode = GET_MODE (sel);
+  if (u == 2)
+    sel = expand_simple_binop (selmode, PLUS, sel, sel,
+			       NULL, 0, OPTAB_DIRECT);
+  else
+    sel = expand_simple_binop (selmode, ASHIFT, sel, GEN_INT (exact_log2 (u)),
+			       NULL, 0, OPTAB_DIRECT);
+  gcc_assert (sel != NULL);
+
+  /* Broadcast the low byte each element into each of its bytes.  */
+  vec = rtvec_alloc (w);
+  for (i = 0; i < w; ++i)
     {
-      /* Multiply each element by its byte size.  */
-      machine_mode selmode = GET_MODE (sel);
-      if (u == 2)
-	sel = expand_simple_binop (selmode, PLUS, sel, sel,
-				   NULL, 0, OPTAB_DIRECT);
-      else
-	sel = expand_simple_binop (selmode, ASHIFT, sel,
-				   GEN_INT (exact_log2 (u)),
-				   NULL, 0, OPTAB_DIRECT);
-      gcc_assert (sel != NULL);
-
-      /* Broadcast the low byte each element into each of its bytes.  */
-      vec = rtvec_alloc (w);
-      for (i = 0; i < w; ++i)
-	{
-	  int this_e = i / u * u;
-	  if (BYTES_BIG_ENDIAN)
-	    this_e += u - 1;
-	  RTVEC_ELT (vec, i) = GEN_INT (this_e);
-	}
-      tmp = gen_rtx_CONST_VECTOR (qimode, vec);
-      sel = gen_lowpart (qimode, sel);
-      sel = expand_vec_perm (qimode, sel, sel, tmp, NULL);
-      gcc_assert (sel != NULL);
-
-      /* Add the byte offset to each byte element.  */
-      /* Note that the definition of the indicies here is memory ordering,
-	 so there should be no difference between big and little endian.  */
-      vec = rtvec_alloc (w);
-      for (i = 0; i < w; ++i)
-	RTVEC_ELT (vec, i) = GEN_INT (i % u);
-      tmp = gen_rtx_CONST_VECTOR (qimode, vec);
-      sel_qi = expand_simple_binop (qimode, PLUS, sel, tmp,
-				    sel, 0, OPTAB_DIRECT);
-      gcc_assert (sel_qi != NULL);
+      int this_e = i / u * u;
+      if (BYTES_BIG_ENDIAN)
+	this_e += u - 1;
+      RTVEC_ELT (vec, i) = GEN_INT (this_e);
     }
+  tmp = gen_rtx_CONST_VECTOR (qimode, vec);
+  sel = gen_lowpart (qimode, sel);
+  sel = expand_vec_perm (qimode, sel, sel, tmp, NULL);
+  gcc_assert (sel != NULL);
+
+  /* Add the byte offset to each byte element.  */
+  /* Note that the definition of the indicies here is memory ordering,
+     so there should be no difference between big and little endian.  */
+  vec = rtvec_alloc (w);
+  for (i = 0; i < w; ++i)
+    RTVEC_ELT (vec, i) = GEN_INT (i % u);
+  tmp = gen_rtx_CONST_VECTOR (qimode, vec);
+  sel_qi = expand_simple_binop (qimode, PLUS, sel, tmp,
+				sel, 0, OPTAB_DIRECT);
+  gcc_assert (sel_qi != NULL);
 
   tmp = mode != qimode ? gen_reg_rtx (qimode) : target;
   tmp = expand_vec_perm_1 (icode, tmp, gen_lowpart (qimode, v0),


[-- Attachment #2: ignoring-whitespace.diff --]
[-- Type: text/x-patch, Size: 6676 bytes --]

Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	2017-12-09 23:06:57.167722990 +0000
+++ gcc/optabs.c	2017-12-09 23:11:09.452859833 +0000
@@ -5405,6 +5405,8 @@ expand_vec_perm_1 (enum insn_code icode,
   machine_mode smode = GET_MODE (sel);
   struct expand_operand ops[4];
 
+  gcc_assert (GET_MODE_CLASS (smode) == MODE_VECTOR_INT
+	      || mode_for_int_vector (tmode).require () == smode);
   create_output_operand (&ops[0], target, tmode);
   create_input_operand (&ops[3], sel, smode);
 
@@ -5431,8 +5433,13 @@ expand_vec_perm_1 (enum insn_code icode,
   return NULL_RTX;
 }
 
-/* Generate instructions for vec_perm optab given its mode
-   and three operands.  */
+static rtx expand_vec_perm_var (machine_mode, rtx, rtx, rtx, rtx);
+
+/* Implement a permutation of vectors v0 and v1 using the permutation
+   vector in SEL and return the result.  Use TARGET to hold the result
+   if nonnull and convenient.
+
+   MODE is the mode of the vectors being permuted (V0 and V1).  */
 
 rtx
 expand_vec_perm (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target)
@@ -5443,6 +5450,9 @@ expand_vec_perm (machine_mode mode, rtx
   rtx tmp, sel_qi = NULL;
   rtvec vec;
 
+  if (GET_CODE (sel) != CONST_VECTOR)
+    return expand_vec_perm_var (mode, v0, v1, sel, target);
+
   if (!target || GET_MODE (target) != mode)
     target = gen_reg_rtx (mode);
 
@@ -5455,22 +5465,18 @@ expand_vec_perm (machine_mode mode, rtx
   if (!qimode_for_vec_perm (mode).exists (&qimode))
     qimode = VOIDmode;
 
-  /* If the input is a constant, expand it specially.  */
-  gcc_assert (GET_MODE_CLASS (GET_MODE (sel)) == MODE_VECTOR_INT);
-  if (GET_CODE (sel) == CONST_VECTOR)
-    {
       /* See if this can be handled with a vec_shr.  We only do this if the
 	 second vector is all zeroes.  */
-      enum insn_code shift_code = optab_handler (vec_shr_optab, mode);
-      enum insn_code shift_code_qi = ((qimode != VOIDmode && qimode != mode)
+  insn_code shift_code = optab_handler (vec_shr_optab, mode);
+  insn_code shift_code_qi = ((qimode != VOIDmode && qimode != mode)
 				      ? optab_handler (vec_shr_optab, qimode)
 				      : CODE_FOR_nothing);
-      rtx shift_amt = NULL_RTX;
+
       if (v1 == CONST0_RTX (GET_MODE (v1))
 	  && (shift_code != CODE_FOR_nothing
 	      || shift_code_qi != CODE_FOR_nothing))
 	{
-	  shift_amt = shift_amt_for_vec_perm_mask (sel);
+      rtx shift_amt = shift_amt_for_vec_perm_mask (sel);
 	  if (shift_amt)
 	    {
 	      struct expand_operand ops[3];
@@ -5478,19 +5484,16 @@ expand_vec_perm (machine_mode mode, rtx
 		{
 		  create_output_operand (&ops[0], target, mode);
 		  create_input_operand (&ops[1], v0, mode);
-		  create_convert_operand_from_type (&ops[2], shift_amt,
-						    sizetype);
+	      create_convert_operand_from_type (&ops[2], shift_amt, sizetype);
 		  if (maybe_expand_insn (shift_code, 3, ops))
 		    return ops[0].value;
 		}
 	      if (shift_code_qi != CODE_FOR_nothing)
 		{
-		  tmp = gen_reg_rtx (qimode);
+	      rtx tmp = gen_reg_rtx (qimode);
 		  create_output_operand (&ops[0], tmp, qimode);
-		  create_input_operand (&ops[1], gen_lowpart (qimode, v0),
-					qimode);
-		  create_convert_operand_from_type (&ops[2], shift_amt,
-						    sizetype);
+	      create_input_operand (&ops[1], gen_lowpart (qimode, v0), qimode);
+	      create_convert_operand_from_type (&ops[2], shift_amt, sizetype);
 		  if (maybe_expand_insn (shift_code_qi, 3, ops))
 		    return gen_lowpart (mode, ops[0].value);
 		}
@@ -5525,16 +5528,62 @@ expand_vec_perm (machine_mode mode, rtx
 	  icode = direct_optab_handler (vec_perm_const_optab, qimode);
 	  if (icode != CODE_FOR_nothing)
 	    {
-	      tmp = mode != qimode ? gen_reg_rtx (qimode) : target;
+	  tmp = gen_reg_rtx (qimode);
 	      tmp = expand_vec_perm_1 (icode, tmp, gen_lowpart (qimode, v0),
 				       gen_lowpart (qimode, v1), sel_qi);
 	      if (tmp)
 		return gen_lowpart (mode, tmp);
 	    }
 	}
-    }
 
   /* Otherwise expand as a fully variable permuation.  */
+
+  icode = direct_optab_handler (vec_perm_optab, mode);
+  if (icode != CODE_FOR_nothing)
+    {
+      rtx tmp = expand_vec_perm_1 (icode, target, v0, v1, sel);
+      if (tmp)
+	return tmp;
+    }
+
+  if (qimode != VOIDmode)
+    {
+      icode = direct_optab_handler (vec_perm_optab, qimode);
+      if (icode != CODE_FOR_nothing)
+	{
+	  rtx tmp = gen_reg_rtx (qimode);
+	  tmp = expand_vec_perm_1 (icode, tmp, gen_lowpart (qimode, v0),
+				   gen_lowpart (qimode, v1), sel_qi);
+	  if (tmp)
+	    return gen_lowpart (mode, tmp);
+	}
+    }
+
+  return NULL_RTX;
+}
+
+/* Implement a permutation of vectors v0 and v1 using the permutation
+   vector in SEL and return the result.  Use TARGET to hold the result
+   if nonnull and convenient.
+
+   MODE is the mode of the vectors being permuted (V0 and V1).
+   SEL must have the integer equivalent of MODE and is known to be
+   unsuitable for permutes with a constant permutation vector.  */
+
+static rtx
+expand_vec_perm_var (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target)
+{
+  enum insn_code icode;
+  unsigned int i, w, u;
+  rtx tmp, sel_qi;
+  rtvec vec;
+
+  w = GET_MODE_SIZE (mode);
+  u = GET_MODE_UNIT_SIZE (mode);
+
+  if (!target || GET_MODE (target) != mode)
+    target = gen_reg_rtx (mode);
+
   icode = direct_optab_handler (vec_perm_optab, mode);
   if (icode != CODE_FOR_nothing)
     {
@@ -5545,22 +5594,20 @@ expand_vec_perm (machine_mode mode, rtx
 
   /* As a special case to aid several targets, lower the element-based
      permutation to a byte-based permutation and try again.  */
-  if (qimode == VOIDmode)
+  machine_mode qimode;
+  if (!qimode_for_vec_perm (mode).exists (&qimode))
     return NULL_RTX;
   icode = direct_optab_handler (vec_perm_optab, qimode);
   if (icode == CODE_FOR_nothing)
     return NULL_RTX;
 
-  if (sel_qi == NULL)
-    {
       /* Multiply each element by its byte size.  */
       machine_mode selmode = GET_MODE (sel);
       if (u == 2)
 	sel = expand_simple_binop (selmode, PLUS, sel, sel,
 				   NULL, 0, OPTAB_DIRECT);
       else
-	sel = expand_simple_binop (selmode, ASHIFT, sel,
-				   GEN_INT (exact_log2 (u)),
+    sel = expand_simple_binop (selmode, ASHIFT, sel, GEN_INT (exact_log2 (u)),
 				   NULL, 0, OPTAB_DIRECT);
       gcc_assert (sel != NULL);
 
@@ -5588,7 +5635,6 @@ expand_vec_perm (machine_mode mode, rtx
       sel_qi = expand_simple_binop (qimode, PLUS, sel, tmp,
 				    sel, 0, OPTAB_DIRECT);
       gcc_assert (sel_qi != NULL);
-    }
 
   tmp = mode != qimode ? gen_reg_rtx (qimode) : target;
   tmp = expand_vec_perm_1 (icode, tmp, gen_lowpart (qimode, v0),


* [05/13] Remove vec_perm_const optab
  2017-12-09 23:06 [00/13] Make VEC_PERM_EXPR work for variable-length vectors Richard Sandiford
                   ` (3 preceding siblings ...)
  2017-12-09 23:13 ` [04/13] Refactor expand_vec_perm Richard Sandiford
@ 2017-12-09 23:17 ` Richard Sandiford
  2017-12-12 15:26   ` Richard Biener
  2017-12-09 23:18 ` [06/13] Check whether a vector of QIs can store all indices Richard Sandiford
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Richard Sandiford @ 2017-12-09 23:17 UTC (permalink / raw)
  To: gcc-patches

One of the changes needed for variable-length VEC_PERM_EXPRs -- and for
long fixed-length VEC_PERM_EXPRs -- is the ability to use constant
selectors that wouldn't fit in the vectors being permuted.  E.g. a
permute on two V256QIs can't be done using a V256QI selector: the
indices can range from 0 to 511, but a QI element can only hold
values up to 255.

At the moment constant permutes use two interfaces:
targetm.vectorize.vec_perm_const_ok for testing whether a permute is
valid and the vec_perm_const optab for actually emitting the permute.
The former gets passed a vec<> selector and the latter an rtx selector.
Most ports share a lot of code between the hook and the optab, with a
wrapper function for each interface.

We could try to keep that interface and require ports to define wider
vector modes that could be attached to the CONST_VECTOR (e.g. V256HI or
V256SI in the example above).  But building a CONST_VECTOR rtx seems a bit
pointless here, since the expand code only creates the CONST_VECTOR in
order to call the optab, and the first thing the target does is take
the CONST_VECTOR apart again.

The easiest approach therefore seemed to be to remove the optab and
reuse the target hook to emit the code.  One potential drawback is that
it's no longer possible to use match_operand predicates to force
operands into the required form, but in practice all targets want
register operands anyway.
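
For reference, the combined hook ends up with a signature along the
following lines (a sketch; the authoritative definition is the
target.def change in the patch):

  bool
  TARGET_VECTORIZE_VEC_PERM_CONST (machine_mode mode, rtx output,
				   rtx in0, rtx in1,
				   const vec_perm_indices &sel);

When the hook is only being asked whether the permutation is possible,
the rtx operands are null and no code is generated; when they are
nonnull, the hook emits the permutation, taking over from the old
vec_perm_const expander.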

The patch also changes vec_perm_indices into a class that provides
some simple routines for handling permutations.  A later patch will
flesh this out and get rid of auto_vec_perm_indices, but I didn't
want to do all that in this patch and make it more complicated than
it already is.


2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* Makefile.in (OBJS): Add vec-perm-indices.o.
	* vec-perm-indices.h: New file.
	* vec-perm-indices.c: Likewise.
	* target.h (vec_perm_indices): Replace with a forward class
	declaration.
	(auto_vec_perm_indices): Move to vec-perm-indices.h.
	* optabs.h: Include vec-perm-indices.h.
	(expand_vec_perm): Delete.
	(selector_fits_mode_p, expand_vec_perm_var): Declare.
	(expand_vec_perm_const): Declare.
	* target.def (vec_perm_const_ok): Replace with...
	(vec_perm_const): ...this new hook.
	* doc/tm.texi.in (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Replace with...
	(TARGET_VECTORIZE_VEC_PERM_CONST): ...this new hook.
	* doc/tm.texi: Regenerate.
	* optabs.def (vec_perm_const): Delete.
	* doc/md.texi (vec_perm_const): Likewise.
	(vec_perm): Refer to TARGET_VECTORIZE_VEC_PERM_CONST.
	* expr.c (expand_expr_real_2): Use expand_vec_perm_const rather than
	expand_vec_perm for constant permutation vectors.  Assert that
	the mode of variable permutation vectors is the integer equivalent
	of the mode that is being permuted.
	* optabs-query.h (selector_fits_mode_p): Declare.
	* optabs-query.c: Include vec-perm-indices.h.
	(can_vec_perm_const_p): Check whether targetm.vectorize.vec_perm_const
	is defined, instead of checking whether the vec_perm_const_optab
	exists.  Use targetm.vectorize.vec_perm_const instead of
	targetm.vectorize.vec_perm_const_ok.  Check whether the indices
	fit in the vector mode before using a variable permute.
	* optabs.c (shift_amt_for_vec_perm_mask): Take a mode and a
	vec_perm_indices instead of an rtx.
	(expand_vec_perm): Replace with...
	(expand_vec_perm_const): ...this new function.  Take the selector
	as a vec_perm_indices rather than an rtx.  Also take the mode of
	the selector.  Update call to shift_amt_for_vec_perm_mask.
	Use targetm.vectorize.vec_perm_const instead of vec_perm_const_optab.
	Use vec_perm_indices::new_expanded_vector to expand the original
	selector into bytes.  Check whether the indices fit in the vector
	mode before using a variable permute.
	(expand_vec_perm_var): Make global.
	(expand_mult_highpart): Use expand_vec_perm_const.
	* fold-const.c: Includes vec-perm-indices.h.
	* tree-ssa-forwprop.c: Likewise.
	* tree-vect-data-refs.c: Likewise.
	* tree-vect-generic.c: Likewise.
	* tree-vect-loop.c: Likewise.
	* tree-vect-slp.c: Likewise.
	* tree-vect-stmts.c: Likewise.
	* config/aarch64/aarch64-protos.h (aarch64_expand_vec_perm_const):
	Delete.
	* config/aarch64/aarch64-simd.md (vec_perm_const<mode>): Delete.
	* config/aarch64/aarch64.c (aarch64_expand_vec_perm_const)
	(aarch64_vectorize_vec_perm_const_ok): Fuse into...
	(aarch64_vectorize_vec_perm_const): ...this new function.
	(TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete.
	(TARGET_VECTORIZE_VEC_PERM_CONST): Redefine.
	* config/arm/arm-protos.h (arm_expand_vec_perm_const): Delete.
	* config/arm/vec-common.md (vec_perm_const<mode>): Delete.
	* config/arm/arm.c (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete.
	(TARGET_VECTORIZE_VEC_PERM_CONST): Redefine.
	(arm_expand_vec_perm_const, arm_vectorize_vec_perm_const_ok): Merge
	into...
	(arm_vectorize_vec_perm_const): ...this new function.  Explicitly
	check for NEON modes.
	* config/i386/i386-protos.h (ix86_expand_vec_perm_const): Delete.
	* config/i386/sse.md (VEC_PERM_CONST, vec_perm_const<mode>): Delete.
	* config/i386/i386.c (ix86_expand_vec_perm_const_1): Update comment.
	(ix86_expand_vec_perm_const, ix86_vectorize_vec_perm_const_ok): Merge
	into...
	(ix86_vectorize_vec_perm_const): ...this new function.  Incorporate
	the old VEC_PERM_CONST conditions.
	* config/ia64/ia64-protos.h (ia64_expand_vec_perm_const): Delete.
	* config/ia64/vect.md (vec_perm_const<mode>): Delete.
	* config/ia64/ia64.c (ia64_expand_vec_perm_const)
	(ia64_vectorize_vec_perm_const_ok): Merge into...
	(ia64_vectorize_vec_perm_const): ...this new function.
	* config/mips/loongson.md (vec_perm_const<mode>): Delete.
	* config/mips/mips-msa.md (vec_perm_const<mode>): Delete.
	* config/mips/mips-ps-3d.md (vec_perm_constv2sf): Delete.
	* config/mips/mips-protos.h (mips_expand_vec_perm_const): Delete.
	* config/mips/mips.c (mips_expand_vec_perm_const)
	(mips_vectorize_vec_perm_const_ok): Merge into...
	(mips_vectorize_vec_perm_const): ...this new function.
	* config/powerpcspe/altivec.md (vec_perm_constv16qi): Delete.
	* config/powerpcspe/paired.md (vec_perm_constv2sf): Delete.
	* config/powerpcspe/spe.md (vec_perm_constv2si): Delete.
	* config/powerpcspe/vsx.md (vec_perm_const<mode>): Delete.
	* config/powerpcspe/powerpcspe-protos.h (altivec_expand_vec_perm_const)
	(rs6000_expand_vec_perm_const): Delete.
	* config/powerpcspe/powerpcspe.c (TARGET_VECTORIZE_VEC_PERM_CONST_OK):
	Delete.
	(TARGET_VECTORIZE_VEC_PERM_CONST): Redefine.
	(altivec_expand_vec_perm_const_le): Take each operand individually.
	Operate on constant selectors rather than rtxes.
	(altivec_expand_vec_perm_const): Likewise.  Update call to
	altivec_expand_vec_perm_const_le.
	(rs6000_expand_vec_perm_const): Delete.
	(rs6000_vectorize_vec_perm_const_ok): Delete.
	(rs6000_vectorize_vec_perm_const): New function.
	(rs6000_do_expand_vec_perm): Take a vec_perm_builder instead of
	an element count and rtx array.
	(rs6000_expand_extract_even): Update call accordingly.
	(rs6000_expand_interleave): Likewise.
	* config/rs6000/altivec.md (vec_perm_constv16qi): Delete.
	* config/rs6000/paired.md (vec_perm_constv2sf): Delete.
	* config/rs6000/vsx.md (vec_perm_const<mode>): Delete.
	* config/rs6000/rs6000-protos.h (altivec_expand_vec_perm_const)
	(rs6000_expand_vec_perm_const): Delete.
	* config/rs6000/rs6000.c (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete.
	(TARGET_VECTORIZE_VEC_PERM_CONST): Redefine.
	(altivec_expand_vec_perm_const_le): Take each operand individually.
	Operate on constant selectors rather than rtxes.
	(altivec_expand_vec_perm_const): Likewise.  Update call to
	altivec_expand_vec_perm_const_le.
	(rs6000_expand_vec_perm_const): Delete.
	(rs6000_vectorize_vec_perm_const_ok): Delete.
	(rs6000_vectorize_vec_perm_const): New function.  Remove stray
	reference to the SPE evmerge instructions.
	(rs6000_do_expand_vec_perm): Take a vec_perm_builder instead of
	an element count and rtx array.
	(rs6000_expand_extract_even): Update call accordingly.
	(rs6000_expand_interleave): Likewise.
	* config/sparc/sparc.md (vec_perm_constv8qi): Delete in favor of...
	* config/sparc/sparc.c (sparc_vectorize_vec_perm_const): ...this
	new function.
	(TARGET_VECTORIZE_VEC_PERM_CONST): Redefine.

Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	2017-12-09 22:47:09.549486911 +0000
+++ gcc/Makefile.in	2017-12-09 22:47:27.854318082 +0000
@@ -1584,6 +1584,7 @@ OBJS = \
 	var-tracking.o \
 	varasm.o \
 	varpool.o \
+	vec-perm-indices.o \
 	vmsdbgout.o \
 	vr-values.o \
 	vtable-verify.o \
Index: gcc/vec-perm-indices.h
===================================================================
--- /dev/null	2017-12-09 13:59:56.352713187 +0000
+++ gcc/vec-perm-indices.h	2017-12-09 22:47:27.885318101 +0000
@@ -0,0 +1,49 @@
+/* A representation of vector permutation indices.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_VEC_PERM_INDICES_H
+#define GCC_VEC_PERM_INDICES_H 1
+
+/* This class represents a constant permutation vector, such as that used
+   as the final operand to a VEC_PERM_EXPR.  */
+class vec_perm_indices : public auto_vec<unsigned short, 32>
+{
+  typedef unsigned short element_type;
+  typedef auto_vec<element_type, 32> parent_type;
+
+public:
+  vec_perm_indices () {}
+  vec_perm_indices (unsigned int nunits) : parent_type (nunits) {}
+
+  void new_expanded_vector (const vec_perm_indices &, unsigned int);
+
+  bool all_in_range_p (element_type, element_type) const;
+
+private:
+  vec_perm_indices (const vec_perm_indices &);
+};
+
+/* Temporary.  */
+typedef vec_perm_indices vec_perm_builder;
+typedef vec_perm_indices auto_vec_perm_indices;
+
+bool tree_to_vec_perm_builder (vec_perm_builder *, tree);
+rtx vec_perm_indices_to_rtx (machine_mode, const vec_perm_indices &);
+
+#endif
Index: gcc/vec-perm-indices.c
===================================================================
--- /dev/null	2017-12-09 13:59:56.352713187 +0000
+++ gcc/vec-perm-indices.c	2017-12-09 22:47:27.885318101 +0000
@@ -0,0 +1,93 @@
+/* A representation of vector permutation indices.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "vec-perm-indices.h"
+#include "tree.h"
+#include "backend.h"
+#include "rtl.h"
+#include "memmodel.h"
+#include "emit-rtl.h"
+
+/* Switch to a new permutation vector that selects the same input elements
+   as ORIG, but with each element split into FACTOR pieces.  For example,
+   if ORIG is { 1, 2, 0, 3 } and FACTOR is 2, the new permutation is
+   { 2, 3, 4, 5, 0, 1, 6, 7 }.  */
+
+void
+vec_perm_indices::new_expanded_vector (const vec_perm_indices &orig,
+				       unsigned int factor)
+{
+  truncate (0);
+  reserve (orig.length () * factor);
+  for (unsigned int i = 0; i < orig.length (); ++i)
+    {
+      element_type base = orig[i] * factor;
+      for (unsigned int j = 0; j < factor; ++j)
+	quick_push (base + j);
+    }
+}
+
+/* Return true if all elements of the permutation vector are in the range
+   [START, START + SIZE).  */
+
+bool
+vec_perm_indices::all_in_range_p (element_type start, element_type size) const
+{
+  for (unsigned int i = 0; i < length (); ++i)
+    if ((*this)[i] < start || ((*this)[i] - start) >= size)
+      return false;
+  return true;
+}
+
+/* Try to read the contents of VECTOR_CST CST as a constant permutation
+   vector.  Return true and add the elements to BUILDER on success,
+   otherwise return false without modifying BUILDER.  */
+
+bool
+tree_to_vec_perm_builder (vec_perm_builder *builder, tree cst)
+{
+  unsigned int nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (cst));
+  for (unsigned int i = 0; i < nelts; ++i)
+    if (!tree_fits_shwi_p (vector_cst_elt (cst, i)))
+      return false;
+
+  builder->reserve (nelts);
+  for (unsigned int i = 0; i < nelts; ++i)
+    builder->quick_push (tree_to_shwi (vector_cst_elt (cst, i))
+			 & (2 * nelts - 1));
+  return true;
+}
+
+/* Return a CONST_VECTOR of mode MODE that contains the elements of
+   INDICES.  */
+
+rtx
+vec_perm_indices_to_rtx (machine_mode mode, const vec_perm_indices &indices)
+{
+  gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
+	      && GET_MODE_NUNITS (mode) == indices.length ());
+  unsigned int nelts = indices.length ();
+  rtvec v = rtvec_alloc (nelts);
+  for (unsigned int i = 0; i < nelts; ++i)
+    RTVEC_ELT (v, i) = gen_int_mode (indices[i], GET_MODE_INNER (mode));
+  return gen_rtx_CONST_VECTOR (mode, v);
+}
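
(For illustration only, not part of the patch: typical middle-end usage of
the new class is to build the indices, check their range, and only then
convert them to an rtx.  The mode and element count below are arbitrary.)

  vec_perm_builder sel (8);
  for (unsigned int i = 0; i < 8; ++i)
    sel.quick_push (7 - i);		/* { 7, 6, 5, 4, 3, 2, 1, 0 } */

  /* Every index refers to the first input vector.  */
  gcc_assert (sel.all_in_range_p (0, 8));

  /* Convert the selector to an rtx for a variable vec_perm.  */
  rtx sel_rtx = vec_perm_indices_to_rtx (V8QImode, sel);
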
Index: gcc/target.h
===================================================================
--- gcc/target.h	2017-12-09 22:47:09.549486911 +0000
+++ gcc/target.h	2017-12-09 22:47:27.882318099 +0000
@@ -193,13 +193,7 @@ enum vect_cost_model_location {
   vect_epilogue = 2
 };
 
-/* The type to use for vector permutes with a constant permute vector.
-   Each entry is an index into the concatenated input vectors.  */
-typedef vec<unsigned short> vec_perm_indices;
-
-/* Same, but can be used to construct local permute vectors that are
-   automatically freed.  */
-typedef auto_vec<unsigned short, 32> auto_vec_perm_indices;
+class vec_perm_indices;
 
 /* The target structure.  This holds all the backend hooks.  */
 #define DEFHOOKPOD(NAME, DOC, TYPE, INIT) TYPE NAME;
Index: gcc/optabs.h
===================================================================
--- gcc/optabs.h	2017-12-09 22:47:09.549486911 +0000
+++ gcc/optabs.h	2017-12-09 22:47:27.882318099 +0000
@@ -22,6 +22,7 @@ #define GCC_OPTABS_H
 
 #include "optabs-query.h"
 #include "optabs-libfuncs.h"
+#include "vec-perm-indices.h"
 
 /* Generate code for a widening multiply.  */
 extern rtx expand_widening_mult (machine_mode, rtx, rtx, rtx, int, optab);
@@ -307,7 +308,9 @@ extern int have_insn_for (enum rtx_code,
 extern rtx_insn *gen_cond_trap (enum rtx_code, rtx, rtx, rtx);
 
 /* Generate code for VEC_PERM_EXPR.  */
-extern rtx expand_vec_perm (machine_mode, rtx, rtx, rtx, rtx);
+extern rtx expand_vec_perm_var (machine_mode, rtx, rtx, rtx, rtx);
+extern rtx expand_vec_perm_const (machine_mode, rtx, rtx,
+				  const vec_perm_builder &, machine_mode, rtx);
 
 /* Generate code for vector comparison.  */
 extern rtx expand_vec_cmp_expr (tree, tree, rtx);
Index: gcc/target.def
===================================================================
--- gcc/target.def	2017-12-09 22:47:09.549486911 +0000
+++ gcc/target.def	2017-12-09 22:47:27.882318099 +0000
@@ -1841,12 +1841,27 @@ DEFHOOK
  bool, (const_tree type, bool is_packed),
  default_builtin_vector_alignment_reachable)
 
-/* Return true if a vector created for vec_perm_const is valid.
-   A NULL indicates that all constants are valid permutations.  */
 DEFHOOK
-(vec_perm_const_ok,
- "Return true if a vector created for @code{vec_perm_const} is valid.",
- bool, (machine_mode, vec_perm_indices),
+(vec_perm_const,
+ "This hook is used to test whether the target can permute up to two\n\
+vectors of mode @var{mode} using the permutation vector @var{sel}, and\n\
+also to emit such a permutation.  In the former case @var{in0}, @var{in1}\n\
+and @var{out} are all null.  In the latter case @var{in0} and @var{in1} are\n\
+the source vectors and @var{out} is the destination vector; all three are\n\
+registers of mode @var{mode}.  @var{in1} is the same as @var{in0} if\n\
+@var{sel} describes a permutation on one vector instead of two.\n\
+\n\
+Return true if the operation is possible, emitting instructions for it\n\
+if rtxes are provided.\n\
+\n\
+@cindex @code{vec_perm@var{m}} instruction pattern\n\
+If the hook returns false for a mode with multibyte elements, GCC will\n\
+try the equivalent byte operation.  If that also fails, it will try forcing\n\
+the selector into a register and using the @code{vec_perm@var{m}}\n\
+instruction pattern.  There is no need for the hook to handle these two\n\
+implementation approaches itself.",
+ bool, (machine_mode mode, rtx output, rtx in0, rtx in1,
+	const vec_perm_indices &sel),
  NULL)
 
 /* Return true if the target supports misaligned store/load of a
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	2017-12-09 22:47:09.549486911 +0000
+++ gcc/doc/tm.texi.in	2017-12-09 22:47:27.879318098 +0000
@@ -4079,7 +4079,7 @@ address;  but often a machine-dependent
 
 @hook TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE
 
-@hook TARGET_VECTORIZE_VEC_PERM_CONST_OK
+@hook TARGET_VECTORIZE_VEC_PERM_CONST
 
 @hook TARGET_VECTORIZE_BUILTIN_CONVERSION
 
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	2017-12-09 22:47:09.549486911 +0000
+++ gcc/doc/tm.texi	2017-12-09 22:47:27.878318097 +0000
@@ -5798,8 +5798,24 @@ correct for most targets.
 Return true if vector alignment is reachable (by peeling N iterations) for the given scalar type @var{type}.  @var{is_packed} is false if the scalar access using @var{type} is known to be naturally aligned.
 @end deftypefn
 
-@deftypefn {Target Hook} bool TARGET_VECTORIZE_VEC_PERM_CONST_OK (machine_mode, @var{vec_perm_indices})
-Return true if a vector created for @code{vec_perm_const} is valid.
+@deftypefn {Target Hook} bool TARGET_VECTORIZE_VEC_PERM_CONST (machine_mode @var{mode}, rtx @var{output}, rtx @var{in0}, rtx @var{in1}, const vec_perm_indices @var{&sel})
+This hook is used to test whether the target can permute up to two
+vectors of mode @var{mode} using the permutation vector @var{sel}, and
+also to emit such a permutation.  In the former case @var{in0}, @var{in1}
+and @var{out} are all null.  In the latter case @var{in0} and @var{in1} are
+the source vectors and @var{out} is the destination vector; all three are
+registers of mode @var{mode}.  @var{in1} is the same as @var{in0} if
+@var{sel} describes a permutation on one vector instead of two.
+
+Return true if the operation is possible, emitting instructions for it
+if rtxes are provided.
+
+@cindex @code{vec_perm@var{m}} instruction pattern
+If the hook returns false for a mode with multibyte elements, GCC will
+try the equivalent byte operation.  If that also fails, it will try forcing
+the selector into a register and using the @code{vec_perm@var{m}}
+instruction pattern.  There is no need for the hook to handle these two
+implementation approaches itself.
 @end deftypefn
 
 @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_CONVERSION (unsigned @var{code}, tree @var{dest_type}, tree @var{src_type})
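
(As an aside for reviewers, the shape of a target implementation under the
new contract is roughly as follows.  Nothing here is added by the series;
"foo", foo_perm_supported_p and gen_foo_perm are placeholders.)

  static bool
  foo_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
				rtx op1, const vec_perm_indices &sel)
  {
    /* First decide whether SEL is implementable at all for VMODE.  */
    if (!foo_perm_supported_p (vmode, sel))
      return false;

    /* All-null rtxes mean the caller only wants a yes/no answer.  */
    if (!target)
      return true;

    /* Otherwise emit the permutation of OP0/OP1 into TARGET, picking
       whatever instructions suit SEL.  */
    emit_insn (gen_foo_perm (target, op0, op1));
    return true;
  }
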
Index: gcc/optabs.def
===================================================================
--- gcc/optabs.def	2017-12-09 22:47:09.549486911 +0000
+++ gcc/optabs.def	2017-12-09 22:47:27.882318099 +0000
@@ -302,7 +302,6 @@ OPTAB_D (vec_pack_ssat_optab, "vec_pack_
 OPTAB_D (vec_pack_trunc_optab, "vec_pack_trunc_$a")
 OPTAB_D (vec_pack_ufix_trunc_optab, "vec_pack_ufix_trunc_$a")
 OPTAB_D (vec_pack_usat_optab, "vec_pack_usat_$a")
-OPTAB_D (vec_perm_const_optab, "vec_perm_const$a")
 OPTAB_D (vec_perm_optab, "vec_perm$a")
 OPTAB_D (vec_realign_load_optab, "vec_realign_load_$a")
 OPTAB_D (vec_set_optab, "vec_set$a")
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	2017-12-09 22:47:09.549486911 +0000
+++ gcc/doc/md.texi	2017-12-09 22:47:27.877318096 +0000
@@ -4972,20 +4972,8 @@ where @var{q} is a vector of @code{QImod
 the middle-end will lower the mode @var{m} @code{VEC_PERM_EXPR} to
 mode @var{q}.
 
-@cindex @code{vec_perm_const@var{m}} instruction pattern
-@item @samp{vec_perm_const@var{m}}
-Like @samp{vec_perm} except that the permutation is a compile-time
-constant.  That is, operand 3, the @dfn{selector}, is a @code{CONST_VECTOR}.
-
-Some targets cannot perform a permutation with a variable selector,
-but can efficiently perform a constant permutation.  Further, the
-target hook @code{vec_perm_ok} is queried to determine if the 
-specific constant permutation is available efficiently; the named
-pattern is never expanded without @code{vec_perm_ok} returning true.
-
-There is no need for a target to supply both @samp{vec_perm@var{m}}
-and @samp{vec_perm_const@var{m}} if the former can trivially implement
-the operation with, say, the vector constant loaded into a register.
+See also @code{TARGET_VECTORIZE_VEC_PERM_CONST}, which performs
+the analogous operation for constant selectors.
 
 @cindex @code{push@var{m}1} instruction pattern
 @item @samp{push@var{m}1}
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-12-09 22:47:09.549486911 +0000
+++ gcc/expr.c	2017-12-09 22:47:27.880318098 +0000
@@ -9439,28 +9439,24 @@ #define REDUCE_BIT_FIELD(expr)	(reduce_b
       goto binop;
 
     case VEC_PERM_EXPR:
-      expand_operands (treeop0, treeop1, target, &op0, &op1, EXPAND_NORMAL);
-      op2 = expand_normal (treeop2);
-
-      /* Careful here: if the target doesn't support integral vector modes,
-	 a constant selection vector could wind up smooshed into a normal
-	 integral constant.  */
-      if (CONSTANT_P (op2) && !VECTOR_MODE_P (GET_MODE (op2)))
-	{
-	  tree sel_type = TREE_TYPE (treeop2);
-	  machine_mode vmode
-	    = mode_for_vector (SCALAR_TYPE_MODE (TREE_TYPE (sel_type)),
-			       TYPE_VECTOR_SUBPARTS (sel_type)).require ();
-	  gcc_assert (GET_MODE_CLASS (vmode) == MODE_VECTOR_INT);
-	  op2 = simplify_subreg (vmode, op2, TYPE_MODE (sel_type), 0);
-	  gcc_assert (op2 && GET_CODE (op2) == CONST_VECTOR);
-	}
-      else
-        gcc_assert (GET_MODE_CLASS (GET_MODE (op2)) == MODE_VECTOR_INT);
-
-      temp = expand_vec_perm (mode, op0, op1, op2, target);
-      gcc_assert (temp);
-      return temp;
+      {
+	expand_operands (treeop0, treeop1, target, &op0, &op1, EXPAND_NORMAL);
+	vec_perm_builder sel;
+	if (TREE_CODE (treeop2) == VECTOR_CST
+	    && tree_to_vec_perm_builder (&sel, treeop2))
+	  {
+	    machine_mode sel_mode = TYPE_MODE (TREE_TYPE (treeop2));
+	    temp = expand_vec_perm_const (mode, op0, op1, sel,
+					  sel_mode, target);
+	  }
+	else
+	  {
+	    op2 = expand_normal (treeop2);
+	    temp = expand_vec_perm_var (mode, op0, op1, op2, target);
+	  }
+	gcc_assert (temp);
+	return temp;
+      }
 
     case DOT_PROD_EXPR:
       {
Index: gcc/optabs-query.h
===================================================================
--- gcc/optabs-query.h	2017-12-09 22:47:21.534314227 +0000
+++ gcc/optabs-query.h	2017-12-09 22:47:27.881318099 +0000
@@ -175,6 +175,7 @@ enum insn_code can_float_p (machine_mode
 enum insn_code can_fix_p (machine_mode, machine_mode, int, bool *);
 bool can_conditionally_move_p (machine_mode mode);
 opt_machine_mode qimode_for_vec_perm (machine_mode);
+bool selector_fits_mode_p (machine_mode, const vec_perm_indices &);
 bool can_vec_perm_var_p (machine_mode);
 bool can_vec_perm_const_p (machine_mode, const vec_perm_indices &,
 			   bool = true);
Index: gcc/optabs-query.c
===================================================================
--- gcc/optabs-query.c	2017-12-09 22:47:25.861316866 +0000
+++ gcc/optabs-query.c	2017-12-09 22:47:27.881318099 +0000
@@ -28,6 +28,7 @@ Software Foundation; either version 3, o
 #include "insn-config.h"
 #include "rtl.h"
 #include "recog.h"
+#include "vec-perm-indices.h"
 
 struct target_optabs default_target_optabs;
 struct target_optabs *this_fn_optabs = &default_target_optabs;
@@ -361,6 +362,17 @@ qimode_for_vec_perm (machine_mode mode)
   return opt_machine_mode ();
 }
 
+/* Return true if selector SEL can be represented in the integer
+   equivalent of vector mode MODE.  */
+
+bool
+selector_fits_mode_p (machine_mode mode, const vec_perm_indices &sel)
+{
+  unsigned HOST_WIDE_INT mask = GET_MODE_MASK (GET_MODE_INNER (mode));
+  return (mask == HOST_WIDE_INT_M1U
+	  || sel.all_in_range_p (0, mask + 1));
+}
+
 /* Return true if VEC_PERM_EXPRs with variable selector operands can be
    expanded using SIMD extensions of the CPU.  MODE is the mode of the
    vectors being permuted.  */
@@ -416,18 +428,22 @@ can_vec_perm_const_p (machine_mode mode,
     return false;
 
   /* It's probably cheaper to test for the variable case first.  */
-  if (allow_variable_p && can_vec_perm_var_p (mode))
+  if (allow_variable_p
+      && selector_fits_mode_p (mode, sel)
+      && can_vec_perm_var_p (mode))
     return true;
 
-  if (direct_optab_handler (vec_perm_const_optab, mode) != CODE_FOR_nothing)
+  if (targetm.vectorize.vec_perm_const != NULL)
     {
-      if (targetm.vectorize.vec_perm_const_ok == NULL
-	  || targetm.vectorize.vec_perm_const_ok (mode, sel))
+      if (targetm.vectorize.vec_perm_const (mode, NULL_RTX, NULL_RTX,
+					    NULL_RTX, sel))
 	return true;
 
       /* ??? For completeness, we ought to check the QImode version of
 	 vec_perm_const_optab.  But all users of this implicit lowering
-	 feature implement the variable vec_perm_optab.  */
+	 feature implement the variable vec_perm_optab, and the ia64
+	 port specifically doesn't want us to lower V2SF operations
+	 into integer operations.  */
     }
 
   return false;
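
(A concrete example of the new selector_fits_mode_p check, for
illustration only: for a byte-element mode the inner mask is
GET_MODE_MASK (QImode) == 0xff, so a two-operand permute of vectors with
more than 128 byte elements would need indices >= 256 and is rejected
for the variable fallback, leaving only the target hook.  A selector
that does fit:)

  vec_perm_builder sel (16);
  for (unsigned int i = 0; i < 16; ++i)
    sel.quick_push (i ^ 1);		/* swap adjacent bytes */
  gcc_assert (selector_fits_mode_p (V16QImode, sel));
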
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	2017-12-09 22:47:25.861316866 +0000
+++ gcc/optabs.c	2017-12-09 22:47:27.881318099 +0000
@@ -5367,25 +5367,23 @@ vector_compare_rtx (machine_mode cmp_mod
   return gen_rtx_fmt_ee (rcode, cmp_mode, ops[0].value, ops[1].value);
 }
 
-/* Checks if vec_perm mask SEL is a constant equivalent to a shift of the first
-   vec_perm operand, assuming the second operand is a constant vector of zeroes.
-   Return the shift distance in bits if so, or NULL_RTX if the vec_perm is not a
-   shift.  */
+/* Check if vec_perm mask SEL is a constant equivalent to a shift of
+   the first vec_perm operand, assuming the second operand is a constant
+   vector of zeros.  Return the shift distance in bits if so, or NULL_RTX
+   if the vec_perm is not a shift.  MODE is the mode of the value being
+   shifted.  */
 static rtx
-shift_amt_for_vec_perm_mask (rtx sel)
+shift_amt_for_vec_perm_mask (machine_mode mode, const vec_perm_indices &sel)
 {
-  unsigned int i, first, nelt = GET_MODE_NUNITS (GET_MODE (sel));
-  unsigned int bitsize = GET_MODE_UNIT_BITSIZE (GET_MODE (sel));
+  unsigned int i, first, nelt = GET_MODE_NUNITS (mode);
+  unsigned int bitsize = GET_MODE_UNIT_BITSIZE (mode);
 
-  if (GET_CODE (sel) != CONST_VECTOR)
-    return NULL_RTX;
-
-  first = INTVAL (CONST_VECTOR_ELT (sel, 0));
+  first = sel[0];
   if (first >= nelt)
     return NULL_RTX;
   for (i = 1; i < nelt; i++)
     {
-      int idx = INTVAL (CONST_VECTOR_ELT (sel, i));
+      int idx = sel[i];
       unsigned int expected = i + first;
       /* Indices into the second vector are all equivalent.  */
       if (idx < 0 || (MIN (nelt, (unsigned) idx) != MIN (nelt, expected)))
@@ -5395,7 +5393,7 @@ shift_amt_for_vec_perm_mask (rtx sel)
   return GEN_INT (first * bitsize);
 }
 
-/* A subroutine of expand_vec_perm for expanding one vec_perm insn.  */
+/* A subroutine of expand_vec_perm_var for expanding one vec_perm insn.  */
 
 static rtx
 expand_vec_perm_1 (enum insn_code icode, rtx target,
@@ -5433,38 +5431,32 @@ expand_vec_perm_1 (enum insn_code icode,
   return NULL_RTX;
 }
 
-static rtx expand_vec_perm_var (machine_mode, rtx, rtx, rtx, rtx);
-
 /* Implement a permutation of vectors v0 and v1 using the permutation
    vector in SEL and return the result.  Use TARGET to hold the result
    if nonnull and convenient.
 
-   MODE is the mode of the vectors being permuted (V0 and V1).  */
+   MODE is the mode of the vectors being permuted (V0 and V1).  SEL_MODE
+   is the TYPE_MODE associated with SEL, or BLKmode if SEL isn't known
+   to have a particular mode.  */
 
 rtx
-expand_vec_perm (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target)
+expand_vec_perm_const (machine_mode mode, rtx v0, rtx v1,
+		       const vec_perm_builder &sel, machine_mode sel_mode,
+		       rtx target)
 {
-  enum insn_code icode;
-  machine_mode qimode;
-  unsigned int i, w, e, u;
-  rtx tmp, sel_qi = NULL;
-  rtvec vec;
-
-  if (GET_CODE (sel) != CONST_VECTOR)
-    return expand_vec_perm_var (mode, v0, v1, sel, target);
-
-  if (!target || GET_MODE (target) != mode)
+  if (!target || !register_operand (target, mode))
     target = gen_reg_rtx (mode);
 
-  w = GET_MODE_SIZE (mode);
-  e = GET_MODE_NUNITS (mode);
-  u = GET_MODE_UNIT_SIZE (mode);
-
   /* Set QIMODE to a different vector mode with byte elements.
      If no such mode, or if MODE already has byte elements, use VOIDmode.  */
+  machine_mode qimode;
   if (!qimode_for_vec_perm (mode).exists (&qimode))
     qimode = VOIDmode;
 
+  rtx_insn *last = get_last_insn ();
+
+  bool single_arg_p = rtx_equal_p (v0, v1);
+
   /* See if this can be handled with a vec_shr.  We only do this if the
      second vector is all zeroes.  */
   insn_code shift_code = optab_handler (vec_shr_optab, mode);
@@ -5476,7 +5468,7 @@ expand_vec_perm (machine_mode mode, rtx
       && (shift_code != CODE_FOR_nothing
 	  || shift_code_qi != CODE_FOR_nothing))
     {
-      rtx shift_amt = shift_amt_for_vec_perm_mask (sel);
+      rtx shift_amt = shift_amt_for_vec_perm_mask (mode, sel);
       if (shift_amt)
 	{
 	  struct expand_operand ops[3];
@@ -5500,65 +5492,81 @@ expand_vec_perm (machine_mode mode, rtx
 	}
     }
 
-  icode = direct_optab_handler (vec_perm_const_optab, mode);
-  if (icode != CODE_FOR_nothing)
+  if (targetm.vectorize.vec_perm_const != NULL)
     {
-      tmp = expand_vec_perm_1 (icode, target, v0, v1, sel);
-      if (tmp)
-	return tmp;
+      v0 = force_reg (mode, v0);
+      if (single_arg_p)
+	v1 = v0;
+      else
+	v1 = force_reg (mode, v1);
+
+      if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, sel))
+	return target;
     }
 
   /* Fall back to a constant byte-based permutation.  */
+  vec_perm_indices qimode_indices;
+  rtx target_qi = NULL_RTX, v0_qi = NULL_RTX, v1_qi = NULL_RTX;
   if (qimode != VOIDmode)
     {
-      vec = rtvec_alloc (w);
-      for (i = 0; i < e; ++i)
-	{
-	  unsigned int j, this_e;
+      qimode_indices.new_expanded_vector (sel, GET_MODE_UNIT_SIZE (mode));
+      target_qi = gen_reg_rtx (qimode);
+      v0_qi = gen_lowpart (qimode, v0);
+      v1_qi = gen_lowpart (qimode, v1);
+      if (targetm.vectorize.vec_perm_const != NULL
+	  && targetm.vectorize.vec_perm_const (qimode, target_qi, v0_qi,
+					       v1_qi, qimode_indices))
+	return gen_lowpart (mode, target_qi);
+    }
 
-	  this_e = INTVAL (CONST_VECTOR_ELT (sel, i));
-	  this_e &= 2 * e - 1;
-	  this_e *= u;
+  /* Otherwise expand as a fully variable permutation.  */
 
-	  for (j = 0; j < u; ++j)
-	    RTVEC_ELT (vec, i * u + j) = GEN_INT (this_e + j);
-	}
-      sel_qi = gen_rtx_CONST_VECTOR (qimode, vec);
+  /* The optabs are only defined for selectors with the same width
+     as the values being permuted.  */
+  machine_mode required_sel_mode;
+  if (!mode_for_int_vector (mode).exists (&required_sel_mode)
+      || !VECTOR_MODE_P (required_sel_mode))
+    {
+      delete_insns_since (last);
+      return NULL_RTX;
+    }
 
-      icode = direct_optab_handler (vec_perm_const_optab, qimode);
-      if (icode != CODE_FOR_nothing)
+  /* We know that it is semantically valid to treat SEL as having SEL_MODE.
+     If that isn't the mode we want then we need to prove that using
+     REQUIRED_SEL_MODE is OK.  */
+  if (sel_mode != required_sel_mode)
+    {
+      if (!selector_fits_mode_p (required_sel_mode, sel))
 	{
-	  tmp = gen_reg_rtx (qimode);
-	  tmp = expand_vec_perm_1 (icode, tmp, gen_lowpart (qimode, v0),
-				   gen_lowpart (qimode, v1), sel_qi);
-	  if (tmp)
-	    return gen_lowpart (mode, tmp);
+	  delete_insns_since (last);
+	  return NULL_RTX;
 	}
+      sel_mode = required_sel_mode;
     }
 
-  /* Otherwise expand as a fully variable permuation.  */
-
-  icode = direct_optab_handler (vec_perm_optab, mode);
+  insn_code icode = direct_optab_handler (vec_perm_optab, mode);
   if (icode != CODE_FOR_nothing)
     {
-      rtx tmp = expand_vec_perm_1 (icode, target, v0, v1, sel);
+      rtx sel_rtx = vec_perm_indices_to_rtx (sel_mode, sel);
+      rtx tmp = expand_vec_perm_1 (icode, target, v0, v1, sel_rtx);
       if (tmp)
 	return tmp;
     }
 
-  if (qimode != VOIDmode)
+  if (qimode != VOIDmode
+      && selector_fits_mode_p (qimode, qimode_indices))
     {
       icode = direct_optab_handler (vec_perm_optab, qimode);
       if (icode != CODE_FOR_nothing)
 	{
-	  rtx tmp = gen_reg_rtx (qimode);
-	  tmp = expand_vec_perm_1 (icode, tmp, gen_lowpart (qimode, v0),
-				   gen_lowpart (qimode, v1), sel_qi);
+	  rtx sel_qi = vec_perm_indices_to_rtx (qimode, qimode_indices);
+	  rtx tmp = expand_vec_perm_1 (icode, target_qi, v0_qi, v1_qi, sel_qi);
 	  if (tmp)
 	    return gen_lowpart (mode, tmp);
 	}
     }
 
+  delete_insns_since (last);
   return NULL_RTX;
 }
 
@@ -5570,7 +5578,7 @@ expand_vec_perm (machine_mode mode, rtx
    SEL must have the integer equivalent of MODE and is known to be
    unsuitable for permutes with a constant permutation vector.  */
 
-static rtx
+rtx
 expand_vec_perm_var (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target)
 {
   enum insn_code icode;
@@ -5613,17 +5621,16 @@ expand_vec_perm_var (machine_mode mode,
   gcc_assert (sel != NULL);
 
   /* Broadcast the low byte each element into each of its bytes.  */
-  vec = rtvec_alloc (w);
+  vec_perm_builder const_sel (w);
   for (i = 0; i < w; ++i)
     {
       int this_e = i / u * u;
       if (BYTES_BIG_ENDIAN)
 	this_e += u - 1;
-      RTVEC_ELT (vec, i) = GEN_INT (this_e);
+      const_sel.quick_push (this_e);
     }
-  tmp = gen_rtx_CONST_VECTOR (qimode, vec);
   sel = gen_lowpart (qimode, sel);
-  sel = expand_vec_perm (qimode, sel, sel, tmp, NULL);
+  sel = expand_vec_perm_const (qimode, sel, sel, const_sel, qimode, NULL);
   gcc_assert (sel != NULL);
 
   /* Add the byte offset to each byte element.  */
@@ -5797,9 +5804,8 @@ expand_mult_highpart (machine_mode mode,
   enum insn_code icode;
   int method, i, nunits;
   machine_mode wmode;
-  rtx m1, m2, perm;
+  rtx m1, m2;
   optab tab1, tab2;
-  rtvec v;
 
   method = can_mult_highpart_p (mode, uns_p);
   switch (method)
@@ -5842,21 +5848,20 @@ expand_mult_highpart (machine_mode mode,
   expand_insn (optab_handler (tab2, mode), 3, eops);
   m2 = gen_lowpart (mode, eops[0].value);
 
-  v = rtvec_alloc (nunits);
+  auto_vec_perm_indices sel (nunits);
   if (method == 2)
     {
       for (i = 0; i < nunits; ++i)
-	RTVEC_ELT (v, i) = GEN_INT (!BYTES_BIG_ENDIAN + (i & ~1)
-				    + ((i & 1) ? nunits : 0));
-      perm = gen_rtx_CONST_VECTOR (mode, v);
+	sel.quick_push (!BYTES_BIG_ENDIAN + (i & ~1)
+			+ ((i & 1) ? nunits : 0));
     }
   else
     {
-      int base = BYTES_BIG_ENDIAN ? 0 : 1;
-      perm = gen_const_vec_series (mode, GEN_INT (base), GEN_INT (2));
+      for (i = 0; i < nunits; ++i)
+	sel.quick_push (2 * i + (BYTES_BIG_ENDIAN ? 0 : 1));
     }
 
-  return expand_vec_perm (mode, m1, m2, perm, target);
+  return expand_vec_perm_const (mode, m1, m2, sel, BLKmode, target);
 }
 \f
 /* Helper function to find the MODE_CC set in a sync_compare_and_swap
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	2017-12-09 22:47:21.534314227 +0000
+++ gcc/fold-const.c	2017-12-09 22:47:27.881318099 +0000
@@ -82,6 +82,7 @@ Software Foundation; either version 3, o
 #include "stringpool.h"
 #include "attribs.h"
 #include "tree-vector-builder.h"
+#include "vec-perm-indices.h"
 
 /* Nonzero if we are folding constants inside an initializer; zero
    otherwise.  */
Index: gcc/tree-ssa-forwprop.c
===================================================================
--- gcc/tree-ssa-forwprop.c	2017-12-09 22:47:21.534314227 +0000
+++ gcc/tree-ssa-forwprop.c	2017-12-09 22:47:27.883318100 +0000
@@ -47,6 +47,7 @@ the Free Software Foundation; either ver
 #include "cfganal.h"
 #include "optabs-tree.h"
 #include "tree-vector-builder.h"
+#include "vec-perm-indices.h"
 
 /* This pass propagates the RHS of assignment statements into use
    sites of the LHS of the assignment.  It's basically a specialized
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2017-12-09 22:47:21.535314227 +0000
+++ gcc/tree-vect-data-refs.c	2017-12-09 22:47:27.883318100 +0000
@@ -52,6 +52,7 @@ Software Foundation; either version 3, o
 #include "params.h"
 #include "tree-cfg.h"
 #include "tree-hash-traits.h"
+#include "vec-perm-indices.h"
 
 /* Return true if load- or store-lanes optab OPTAB is implemented for
    COUNT vectors of type VECTYPE.  NAME is the name of OPTAB.  */
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	2017-12-09 22:47:21.535314227 +0000
+++ gcc/tree-vect-generic.c	2017-12-09 22:47:27.883318100 +0000
@@ -38,6 +38,7 @@ Free Software Foundation; either version
 #include "gimplify.h"
 #include "tree-cfg.h"
 #include "tree-vector-builder.h"
+#include "vec-perm-indices.h"
 
 
 static void expand_vector_operations_1 (gimple_stmt_iterator *);
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2017-12-09 22:47:21.536314228 +0000
+++ gcc/tree-vect-loop.c	2017-12-09 22:47:27.884318101 +0000
@@ -52,6 +52,7 @@ Software Foundation; either version 3, o
 #include "tree-if-conv.h"
 #include "internal-fn.h"
 #include "tree-vector-builder.h"
+#include "vec-perm-indices.h"
 
 /* Loop Vectorization Pass.
 
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2017-12-09 22:47:21.536314228 +0000
+++ gcc/tree-vect-slp.c	2017-12-09 22:47:27.884318101 +0000
@@ -42,6 +42,7 @@ Software Foundation; either version 3, o
 #include "gimple-walk.h"
 #include "dbgcnt.h"
 #include "tree-vector-builder.h"
+#include "vec-perm-indices.h"
 
 
 /* Recursively free the memory allocated for the SLP tree rooted at NODE.  */
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2017-12-09 22:47:21.537314229 +0000
+++ gcc/tree-vect-stmts.c	2017-12-09 22:47:27.885318101 +0000
@@ -49,6 +49,7 @@ Software Foundation; either version 3, o
 #include "builtins.h"
 #include "internal-fn.h"
 #include "tree-vector-builder.h"
+#include "vec-perm-indices.h"
 
 /* For lang_hooks.types.type_for_mode.  */
 #include "langhooks.h"
Index: gcc/config/aarch64/aarch64-protos.h
===================================================================
--- gcc/config/aarch64/aarch64-protos.h	2017-12-09 22:47:09.549486911 +0000
+++ gcc/config/aarch64/aarch64-protos.h	2017-12-09 22:47:27.854318082 +0000
@@ -474,8 +474,6 @@ extern void aarch64_split_combinev16qi (
 extern void aarch64_expand_vec_perm (rtx, rtx, rtx, rtx, unsigned int);
 extern bool aarch64_madd_needs_nop (rtx_insn *);
 extern void aarch64_final_prescan_insn (rtx_insn *);
-extern bool
-aarch64_expand_vec_perm_const (rtx, rtx, rtx, rtx, unsigned int);
 void aarch64_atomic_assign_expand_fenv (tree *, tree *, tree *);
 int aarch64_ccmp_mode_to_code (machine_mode mode);
 
Index: gcc/config/aarch64/aarch64-simd.md
===================================================================
--- gcc/config/aarch64/aarch64-simd.md	2017-12-09 22:47:09.549486911 +0000
+++ gcc/config/aarch64/aarch64-simd.md	2017-12-09 22:47:27.854318082 +0000
@@ -5348,20 +5348,6 @@ (define_expand "aarch64_get_qreg<VSTRUCT
 
 ;; vec_perm support
 
-(define_expand "vec_perm_const<mode>"
-  [(match_operand:VALL_F16 0 "register_operand")
-   (match_operand:VALL_F16 1 "register_operand")
-   (match_operand:VALL_F16 2 "register_operand")
-   (match_operand:<V_INT_EQUIV> 3)]
-  "TARGET_SIMD"
-{
-  if (aarch64_expand_vec_perm_const (operands[0], operands[1],
-				     operands[2], operands[3], <nunits>))
-    DONE;
-  else
-    FAIL;
-})
-
 (define_expand "vec_perm<mode>"
   [(match_operand:VB 0 "register_operand")
    (match_operand:VB 1 "register_operand")
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	2017-12-09 22:47:09.549486911 +0000
+++ gcc/config/aarch64/aarch64.c	2017-12-09 22:47:27.856318084 +0000
@@ -141,8 +141,6 @@ static void aarch64_elf_asm_constructor
 static void aarch64_elf_asm_destructor (rtx, int) ATTRIBUTE_UNUSED;
 static void aarch64_override_options_after_change (void);
 static bool aarch64_vector_mode_supported_p (machine_mode);
-static bool aarch64_vectorize_vec_perm_const_ok (machine_mode,
-						 vec_perm_indices);
 static int aarch64_address_cost (rtx, machine_mode, addr_space_t, bool);
 static bool aarch64_builtin_support_vector_misalignment (machine_mode mode,
 							 const_tree type,
@@ -13626,29 +13624,27 @@ aarch64_expand_vec_perm_const_1 (struct
   return false;
 }
 
-/* Expand a vec_perm_const pattern with the operands given by TARGET,
-   OP0, OP1 and SEL.  NELT is the number of elements in the vector.  */
+/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
 
-bool
-aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel,
-			       unsigned int nelt)
+static bool
+aarch64_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
+				  rtx op1, const vec_perm_indices &sel)
 {
   struct expand_vec_perm_d d;
   unsigned int i, which;
 
+  d.vmode = vmode;
   d.target = target;
   d.op0 = op0;
   d.op1 = op1;
+  d.testing_p = !target;
 
-  d.vmode = GET_MODE (target);
-  gcc_assert (VECTOR_MODE_P (d.vmode));
-  d.testing_p = false;
-
+  /* Calculate whether all elements are in one vector.  */
+  unsigned int nelt = sel.length ();
   d.perm.reserve (nelt);
   for (i = which = 0; i < nelt; ++i)
     {
-      rtx e = XVECEXP (sel, 0, i);
-      unsigned int ei = INTVAL (e) & (2 * nelt - 1);
+      unsigned int ei = sel[i] & (2 * nelt - 1);
       which |= (ei < nelt ? 1 : 2);
       d.perm.quick_push (ei);
     }
@@ -13660,7 +13656,7 @@ aarch64_expand_vec_perm_const (rtx targe
 
     case 3:
       d.one_vector_p = false;
-      if (!rtx_equal_p (op0, op1))
+      if (d.testing_p || !rtx_equal_p (op0, op1))
 	break;
 
       /* The elements of PERM do not suggest that only the first operand
@@ -13681,37 +13677,8 @@ aarch64_expand_vec_perm_const (rtx targe
       break;
     }
 
-  return aarch64_expand_vec_perm_const_1 (&d);
-}
-
-static bool
-aarch64_vectorize_vec_perm_const_ok (machine_mode vmode, vec_perm_indices sel)
-{
-  struct expand_vec_perm_d d;
-  unsigned int i, nelt, which;
-  bool ret;
-
-  d.vmode = vmode;
-  d.testing_p = true;
-  d.perm.safe_splice (sel);
-
-  /* Calculate whether all elements are in one vector.  */
-  nelt = sel.length ();
-  for (i = which = 0; i < nelt; ++i)
-    {
-      unsigned int e = d.perm[i];
-      gcc_assert (e < 2 * nelt);
-      which |= (e < nelt ? 1 : 2);
-    }
-
-  /* If all elements are from the second vector, reindex as if from the
-     first vector.  */
-  if (which == 2)
-    for (i = 0; i < nelt; ++i)
-      d.perm[i] -= nelt;
-
-  /* Check whether the mask can be applied to a single vector.  */
-  d.one_vector_p = (which != 3);
+  if (!d.testing_p)
+    return aarch64_expand_vec_perm_const_1 (&d);
 
   d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
   d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
@@ -13719,7 +13686,7 @@ aarch64_vectorize_vec_perm_const_ok (mac
     d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
 
   start_sequence ();
-  ret = aarch64_expand_vec_perm_const_1 (&d);
+  bool ret = aarch64_expand_vec_perm_const_1 (&d);
   end_sequence ();
 
   return ret;
@@ -15471,9 +15438,9 @@ #define TARGET_VECTORIZE_VECTOR_ALIGNMEN
 
 /* vec_perm support.  */
 
-#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
-#define TARGET_VECTORIZE_VEC_PERM_CONST_OK \
-  aarch64_vectorize_vec_perm_const_ok
+#undef TARGET_VECTORIZE_VEC_PERM_CONST
+#define TARGET_VECTORIZE_VEC_PERM_CONST \
+  aarch64_vectorize_vec_perm_const
 
 #undef TARGET_INIT_LIBFUNCS
 #define TARGET_INIT_LIBFUNCS aarch64_init_libfuncs
Index: gcc/config/arm/arm-protos.h
===================================================================
--- gcc/config/arm/arm-protos.h	2017-12-09 22:47:09.549486911 +0000
+++ gcc/config/arm/arm-protos.h	2017-12-09 22:47:27.856318084 +0000
@@ -357,7 +357,6 @@ extern bool arm_validize_comparison (rtx
 
 extern bool arm_gen_setmem (rtx *);
 extern void arm_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel);
-extern bool arm_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel);
 
 extern bool arm_autoinc_modes_ok_p (machine_mode, enum arm_auto_incmodes);
 
Index: gcc/config/arm/vec-common.md
===================================================================
--- gcc/config/arm/vec-common.md	2017-12-09 22:47:09.549486911 +0000
+++ gcc/config/arm/vec-common.md	2017-12-09 22:47:27.858318085 +0000
@@ -109,35 +109,6 @@ (define_expand "umax<mode>3"
 {
 })
 
-(define_expand "vec_perm_const<mode>"
-  [(match_operand:VALL 0 "s_register_operand" "")
-   (match_operand:VALL 1 "s_register_operand" "")
-   (match_operand:VALL 2 "s_register_operand" "")
-   (match_operand:<V_cmp_result> 3 "" "")]
-  "TARGET_NEON
-   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
-{
-  if (arm_expand_vec_perm_const (operands[0], operands[1],
-				 operands[2], operands[3]))
-    DONE;
-  else
-    FAIL;
-})
-
-(define_expand "vec_perm_const<mode>"
-  [(match_operand:VH 0 "s_register_operand")
-   (match_operand:VH 1 "s_register_operand")
-   (match_operand:VH 2 "s_register_operand")
-   (match_operand:<V_cmp_result> 3)]
-  "TARGET_NEON"
-{
-  if (arm_expand_vec_perm_const (operands[0], operands[1],
-				 operands[2], operands[3]))
-    DONE;
-  else
-    FAIL;
-})
-
 (define_expand "vec_perm<mode>"
   [(match_operand:VE 0 "s_register_operand" "")
    (match_operand:VE 1 "s_register_operand" "")
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	2017-12-09 22:47:09.549486911 +0000
+++ gcc/config/arm/arm.c	2017-12-09 22:47:27.858318085 +0000
@@ -288,7 +288,8 @@ static int arm_cortex_a5_branch_cost (bo
 static int arm_cortex_m_branch_cost (bool, bool);
 static int arm_cortex_m7_branch_cost (bool, bool);
 
-static bool arm_vectorize_vec_perm_const_ok (machine_mode, vec_perm_indices);
+static bool arm_vectorize_vec_perm_const (machine_mode, rtx, rtx, rtx,
+					  const vec_perm_indices &);
 
 static bool aarch_macro_fusion_pair_p (rtx_insn*, rtx_insn*);
 
@@ -734,9 +735,8 @@ #define TARGET_VECTORIZE_SUPPORT_VECTOR_
 #define TARGET_PREFERRED_RENAME_CLASS \
   arm_preferred_rename_class
 
-#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
-#define TARGET_VECTORIZE_VEC_PERM_CONST_OK \
-  arm_vectorize_vec_perm_const_ok
+#undef TARGET_VECTORIZE_VEC_PERM_CONST
+#define TARGET_VECTORIZE_VEC_PERM_CONST arm_vectorize_vec_perm_const
 
 #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
 #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \
@@ -29381,28 +29381,31 @@ arm_expand_vec_perm_const_1 (struct expa
   return false;
 }
 
-/* Expand a vec_perm_const pattern.  */
+/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
 
-bool
-arm_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel)
+static bool
+arm_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0, rtx op1,
+			      const vec_perm_indices &sel)
 {
   struct expand_vec_perm_d d;
   int i, nelt, which;
 
+  if (!VALID_NEON_DREG_MODE (vmode) && !VALID_NEON_QREG_MODE (vmode))
+    return false;
+
   d.target = target;
   d.op0 = op0;
   d.op1 = op1;
 
-  d.vmode = GET_MODE (target);
+  d.vmode = vmode;
   gcc_assert (VECTOR_MODE_P (d.vmode));
-  d.testing_p = false;
+  d.testing_p = !target;
 
   nelt = GET_MODE_NUNITS (d.vmode);
   d.perm.reserve (nelt);
   for (i = which = 0; i < nelt; ++i)
     {
-      rtx e = XVECEXP (sel, 0, i);
-      int ei = INTVAL (e) & (2 * nelt - 1);
+      int ei = sel[i] & (2 * nelt - 1);
       which |= (ei < nelt ? 1 : 2);
       d.perm.quick_push (ei);
     }
@@ -29414,7 +29417,7 @@ arm_expand_vec_perm_const (rtx target, r
 
     case 3:
       d.one_vector_p = false;
-      if (!rtx_equal_p (op0, op1))
+      if (d.testing_p || !rtx_equal_p (op0, op1))
 	break;
 
       /* The elements of PERM do not suggest that only the first operand
@@ -29435,38 +29438,8 @@ arm_expand_vec_perm_const (rtx target, r
       break;
     }
 
-  return arm_expand_vec_perm_const_1 (&d);
-}
-
-/* Implement TARGET_VECTORIZE_VEC_PERM_CONST_OK.  */
-
-static bool
-arm_vectorize_vec_perm_const_ok (machine_mode vmode, vec_perm_indices sel)
-{
-  struct expand_vec_perm_d d;
-  unsigned int i, nelt, which;
-  bool ret;
-
-  d.vmode = vmode;
-  d.testing_p = true;
-  d.perm.safe_splice (sel);
-
-  /* Categorize the set of elements in the selector.  */
-  nelt = GET_MODE_NUNITS (d.vmode);
-  for (i = which = 0; i < nelt; ++i)
-    {
-      unsigned int e = d.perm[i];
-      gcc_assert (e < 2 * nelt);
-      which |= (e < nelt ? 1 : 2);
-    }
-
-  /* For all elements from second vector, fold the elements to first.  */
-  if (which == 2)
-    for (i = 0; i < nelt; ++i)
-      d.perm[i] -= nelt;
-
-  /* Check whether the mask can be applied to the vector type.  */
-  d.one_vector_p = (which != 3);
+  if (!d.testing_p)
+    return arm_expand_vec_perm_const_1 (&d);
 
   d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
   d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
@@ -29474,7 +29447,7 @@ arm_vectorize_vec_perm_const_ok (machine
     d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
 
   start_sequence ();
-  ret = arm_expand_vec_perm_const_1 (&d);
+  bool ret = arm_expand_vec_perm_const_1 (&d);
   end_sequence ();
 
   return ret;
Index: gcc/config/i386/i386-protos.h
===================================================================
--- gcc/config/i386/i386-protos.h	2017-12-09 22:47:09.549486911 +0000
+++ gcc/config/i386/i386-protos.h	2017-12-09 22:47:27.859318085 +0000
@@ -133,7 +133,6 @@ extern bool ix86_expand_fp_movcc (rtx[])
 extern bool ix86_expand_fp_vcond (rtx[]);
 extern bool ix86_expand_int_vcond (rtx[]);
 extern void ix86_expand_vec_perm (rtx[]);
-extern bool ix86_expand_vec_perm_const (rtx[]);
 extern bool ix86_expand_mask_vec_cmp (rtx[]);
 extern bool ix86_expand_int_vec_cmp (rtx[]);
 extern bool ix86_expand_fp_vec_cmp (rtx[]);
Index: gcc/config/i386/sse.md
===================================================================
--- gcc/config/i386/sse.md	2017-12-09 22:47:09.549486911 +0000
+++ gcc/config/i386/sse.md	2017-12-09 22:47:27.863318088 +0000
@@ -11476,30 +11476,6 @@ (define_expand "vec_perm<mode>"
   DONE;
 })
 
-(define_mode_iterator VEC_PERM_CONST
-  [(V4SF "TARGET_SSE") (V4SI "TARGET_SSE")
-   (V2DF "TARGET_SSE") (V2DI "TARGET_SSE")
-   (V16QI "TARGET_SSE2") (V8HI "TARGET_SSE2")
-   (V8SF "TARGET_AVX") (V4DF "TARGET_AVX")
-   (V8SI "TARGET_AVX") (V4DI "TARGET_AVX")
-   (V32QI "TARGET_AVX2") (V16HI "TARGET_AVX2")
-   (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
-   (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
-   (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")])
-
-(define_expand "vec_perm_const<mode>"
-  [(match_operand:VEC_PERM_CONST 0 "register_operand")
-   (match_operand:VEC_PERM_CONST 1 "register_operand")
-   (match_operand:VEC_PERM_CONST 2 "register_operand")
-   (match_operand:<sseintvecmode> 3)]
-  ""
-{
-  if (ix86_expand_vec_perm_const (operands))
-    DONE;
-  else
-    FAIL;
-})
-
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel bitwise logical operations
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	2017-12-09 22:47:09.549486911 +0000
+++ gcc/config/i386/i386.c	2017-12-09 22:47:27.862318087 +0000
@@ -47588,9 +47588,8 @@ expand_vec_perm_vpshufb4_vpermq2 (struct
   return true;
 }
 
-/* The guts of ix86_expand_vec_perm_const, also used by the ok hook.
-   With all of the interface bits taken care of, perform the expansion
-   in D and return true on success.  */
+/* The guts of ix86_vectorize_vec_perm_const.  With all of the interface bits
+   taken care of, perform the expansion in D and return true on success.  */
 
 static bool
 ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
@@ -47725,69 +47724,29 @@ canonicalize_perm (struct expand_vec_per
   return (which == 3);
 }
 
-bool
-ix86_expand_vec_perm_const (rtx operands[4])
+/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
+
+static bool
+ix86_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
+			       rtx op1, const vec_perm_indices &sel)
 {
   struct expand_vec_perm_d d;
   unsigned char perm[MAX_VECT_LEN];
-  int i, nelt;
+  unsigned int i, nelt, which;
   bool two_args;
-  rtx sel;
 
-  d.target = operands[0];
-  d.op0 = operands[1];
-  d.op1 = operands[2];
-  sel = operands[3];
+  d.target = target;
+  d.op0 = op0;
+  d.op1 = op1;
 
-  d.vmode = GET_MODE (d.target);
+  d.vmode = vmode;
   gcc_assert (VECTOR_MODE_P (d.vmode));
   d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
-  d.testing_p = false;
+  d.testing_p = !target;
 
-  gcc_assert (GET_CODE (sel) == CONST_VECTOR);
-  gcc_assert (XVECLEN (sel, 0) == nelt);
+  gcc_assert (sel.length () == nelt);
   gcc_checking_assert (sizeof (d.perm) == sizeof (perm));
 
-  for (i = 0; i < nelt; ++i)
-    {
-      rtx e = XVECEXP (sel, 0, i);
-      int ei = INTVAL (e) & (2 * nelt - 1);
-      d.perm[i] = ei;
-      perm[i] = ei;
-    }
-
-  two_args = canonicalize_perm (&d);
-
-  if (ix86_expand_vec_perm_const_1 (&d))
-    return true;
-
-  /* If the selector says both arguments are needed, but the operands are the
-     same, the above tried to expand with one_operand_p and flattened selector.
-     If that didn't work, retry without one_operand_p; we succeeded with that
-     during testing.  */
-  if (two_args && d.one_operand_p)
-    {
-      d.one_operand_p = false;
-      memcpy (d.perm, perm, sizeof (perm));
-      return ix86_expand_vec_perm_const_1 (&d);
-    }
-
-  return false;
-}
-
-/* Implement targetm.vectorize.vec_perm_const_ok.  */
-
-static bool
-ix86_vectorize_vec_perm_const_ok (machine_mode vmode, vec_perm_indices sel)
-{
-  struct expand_vec_perm_d d;
-  unsigned int i, nelt, which;
-  bool ret;
-
-  d.vmode = vmode;
-  d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
-  d.testing_p = true;
-
   /* Given sufficient ISA support we can just return true here
      for selected vector modes.  */
   switch (d.vmode)
@@ -47796,17 +47755,23 @@ ix86_vectorize_vec_perm_const_ok (machin
     case E_V16SImode:
     case E_V8DImode:
     case E_V8DFmode:
-      if (TARGET_AVX512F)
-	/* All implementable with a single vperm[it]2 insn.  */
+      if (!TARGET_AVX512F)
+	return false;
+      /* All implementable with a single vperm[it]2 insn.  */
+      if (d.testing_p)
 	return true;
       break;
     case E_V32HImode:
-      if (TARGET_AVX512BW)
+      if (!TARGET_AVX512BW)
+	return false;
+      if (d.testing_p)
 	/* All implementable with a single vperm[it]2 insn.  */
 	return true;
       break;
     case E_V64QImode:
-      if (TARGET_AVX512BW)
+      if (!TARGET_AVX512BW)
+	return false;
+      if (d.testing_p)
 	/* Implementable with 2 vperm[it]2, 2 vpshufb and 1 or insn.  */
 	return true;
       break;
@@ -47814,73 +47779,108 @@ ix86_vectorize_vec_perm_const_ok (machin
     case E_V8SFmode:
     case E_V4DFmode:
     case E_V4DImode:
-      if (TARGET_AVX512VL)
+      if (!TARGET_AVX)
+	return false;
+      if (d.testing_p && TARGET_AVX512VL)
 	/* All implementable with a single vperm[it]2 insn.  */
 	return true;
       break;
     case E_V16HImode:
-      if (TARGET_AVX2)
+      if (!TARGET_SSE2)
+	return false;
+      if (d.testing_p && TARGET_AVX2)
 	/* Implementable with 4 vpshufb insns, 2 vpermq and 3 vpor insns.  */
 	return true;
       break;
     case E_V32QImode:
-      if (TARGET_AVX2)
+      if (!TARGET_SSE2)
+	return false;
+      if (d.testing_p && TARGET_AVX2)
 	/* Implementable with 4 vpshufb insns, 2 vpermq and 3 vpor insns.  */
 	return true;
       break;
-    case E_V4SImode:
-    case E_V4SFmode:
     case E_V8HImode:
     case E_V16QImode:
+      if (!TARGET_SSE2)
+	return false;
+      /* Fall through.  */
+    case E_V4SImode:
+    case E_V4SFmode:
+      if (!TARGET_SSE)
+	return false;
       /* All implementable with a single vpperm insn.  */
-      if (TARGET_XOP)
+      if (d.testing_p && TARGET_XOP)
 	return true;
       /* All implementable with 2 pshufb + 1 ior.  */
-      if (TARGET_SSSE3)
+      if (d.testing_p && TARGET_SSSE3)
 	return true;
       break;
     case E_V2DImode:
     case E_V2DFmode:
+      if (!TARGET_SSE)
+	return false;
       /* All implementable with shufpd or unpck[lh]pd.  */
-      return true;
+      if (d.testing_p)
+	return true;
+      break;
     default:
       return false;
     }
 
-  /* Extract the values from the vector CST into the permutation
-     array in D.  */
   for (i = which = 0; i < nelt; ++i)
     {
       unsigned char e = sel[i];
       gcc_assert (e < 2 * nelt);
       d.perm[i] = e;
+      perm[i] = e;
       which |= (e < nelt ? 1 : 2);
     }
 
-  /* For all elements from second vector, fold the elements to first.  */
-  if (which == 2)
-    for (i = 0; i < nelt; ++i)
-      d.perm[i] -= nelt;
+  if (d.testing_p)
+    {
+      /* For all elements from second vector, fold the elements to first.  */
+      if (which == 2)
+	for (i = 0; i < nelt; ++i)
+	  d.perm[i] -= nelt;
+
+      /* Check whether the mask can be applied to the vector type.  */
+      d.one_operand_p = (which != 3);
+
+      /* Implementable with shufps or pshufd.  */
+      if (d.one_operand_p && (d.vmode == V4SFmode || d.vmode == V4SImode))
+	return true;
+
+      /* Otherwise we have to go through the motions and see if we can
+	 figure out how to generate the requested permutation.  */
+      d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
+      d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
+      if (!d.one_operand_p)
+	d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
+
+      start_sequence ();
+      bool ret = ix86_expand_vec_perm_const_1 (&d);
+      end_sequence ();
 
-  /* Check whether the mask can be applied to the vector type.  */
-  d.one_operand_p = (which != 3);
+      return ret;
+    }
 
-  /* Implementable with shufps or pshufd.  */
-  if (d.one_operand_p && (d.vmode == V4SFmode || d.vmode == V4SImode))
+  two_args = canonicalize_perm (&d);
+
+  if (ix86_expand_vec_perm_const_1 (&d))
     return true;
 
-  /* Otherwise we have to go through the motions and see if we can
-     figure out how to generate the requested permutation.  */
-  d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
-  d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
-  if (!d.one_operand_p)
-    d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
-
-  start_sequence ();
-  ret = ix86_expand_vec_perm_const_1 (&d);
-  end_sequence ();
+  /* If the selector says both arguments are needed, but the operands are the
+     same, the above tried to expand with one_operand_p and flattened selector.
+     If that didn't work, retry without one_operand_p; we succeeded with that
+     during testing.  */
+  if (two_args && d.one_operand_p)
+    {
+      d.one_operand_p = false;
+      memcpy (d.perm, perm, sizeof (perm));
+      return ix86_expand_vec_perm_const_1 (&d);
+    }
 
-  return ret;
+  return false;
 }
 
 void
@@ -50532,9 +50532,8 @@ #define TARGET_CLASS_LIKELY_SPILLED_P ix
 #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
 #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \
   ix86_builtin_vectorization_cost
-#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
-#define TARGET_VECTORIZE_VEC_PERM_CONST_OK \
-  ix86_vectorize_vec_perm_const_ok
+#undef TARGET_VECTORIZE_VEC_PERM_CONST
+#define TARGET_VECTORIZE_VEC_PERM_CONST ix86_vectorize_vec_perm_const
 #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
 #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE \
   ix86_preferred_simd_mode
Index: gcc/config/ia64/ia64-protos.h
===================================================================
--- gcc/config/ia64/ia64-protos.h	2017-12-09 22:47:09.549486911 +0000
+++ gcc/config/ia64/ia64-protos.h	2017-12-09 22:47:27.864318089 +0000
@@ -62,7 +62,6 @@ extern const char *get_bundle_name (int)
 extern const char *output_probe_stack_range (rtx, rtx);
 
 extern void ia64_expand_vec_perm_even_odd (rtx, rtx, rtx, int);
-extern bool ia64_expand_vec_perm_const (rtx op[4]);
 extern void ia64_expand_vec_setv2sf (rtx op[3]);
 #endif /* RTX_CODE */
 
Index: gcc/config/ia64/vect.md
===================================================================
--- gcc/config/ia64/vect.md	2017-12-09 22:47:09.549486911 +0000
+++ gcc/config/ia64/vect.md	2017-12-09 22:47:27.865318089 +0000
@@ -1549,19 +1549,6 @@ (define_expand "vec_pack_trunc_v2si"
   DONE;
 })
 
-(define_expand "vec_perm_const<mode>"
-  [(match_operand:VEC 0 "register_operand" "")
-   (match_operand:VEC 1 "register_operand" "")
-   (match_operand:VEC 2 "register_operand" "")
-   (match_operand:<vecint> 3 "" "")]
-  ""
-{
-  if (ia64_expand_vec_perm_const (operands))
-    DONE;
-  else
-    FAIL;
-})
-
 ;; Missing operations
 ;; fprcpa
 ;; fpsqrta
Index: gcc/config/ia64/ia64.c
===================================================================
--- gcc/config/ia64/ia64.c	2017-12-09 22:47:09.549486911 +0000
+++ gcc/config/ia64/ia64.c	2017-12-09 22:47:27.864318089 +0000
@@ -333,7 +333,8 @@ static fixed_size_mode ia64_get_reg_raw_
 static section * ia64_hpux_function_section (tree, enum node_frequency,
 					     bool, bool);
 
-static bool ia64_vectorize_vec_perm_const_ok (machine_mode, vec_perm_indices);
+static bool ia64_vectorize_vec_perm_const (machine_mode, rtx, rtx, rtx,
+					   const vec_perm_indices &);
 
 static unsigned int ia64_hard_regno_nregs (unsigned int, machine_mode);
 static bool ia64_hard_regno_mode_ok (unsigned int, machine_mode);
@@ -652,8 +653,8 @@ #define TARGET_DELAY_SCHED2 true
 #undef TARGET_DELAY_VARTRACK
 #define TARGET_DELAY_VARTRACK true
 
-#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
-#define TARGET_VECTORIZE_VEC_PERM_CONST_OK ia64_vectorize_vec_perm_const_ok
+#undef TARGET_VECTORIZE_VEC_PERM_CONST
+#define TARGET_VECTORIZE_VEC_PERM_CONST ia64_vectorize_vec_perm_const
 
 #undef TARGET_ATTRIBUTE_TAKES_IDENTIFIER_P
 #define TARGET_ATTRIBUTE_TAKES_IDENTIFIER_P ia64_attribute_takes_identifier_p
@@ -11741,32 +11742,31 @@ ia64_expand_vec_perm_const_1 (struct exp
   return false;
 }
 
-bool
-ia64_expand_vec_perm_const (rtx operands[4])
+/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
+
+static bool
+ia64_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
+			       rtx op1, const vec_perm_indices &sel)
 {
   struct expand_vec_perm_d d;
   unsigned char perm[MAX_VECT_LEN];
-  int i, nelt, which;
-  rtx sel;
+  unsigned int i, nelt, which;
 
-  d.target = operands[0];
-  d.op0 = operands[1];
-  d.op1 = operands[2];
-  sel = operands[3];
+  d.target = target;
+  d.op0 = op0;
+  d.op1 = op1;
 
-  d.vmode = GET_MODE (d.target);
+  d.vmode = vmode;
   gcc_assert (VECTOR_MODE_P (d.vmode));
   d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
-  d.testing_p = false;
+  d.testing_p = !target;
 
-  gcc_assert (GET_CODE (sel) == CONST_VECTOR);
-  gcc_assert (XVECLEN (sel, 0) == nelt);
+  gcc_assert (sel.length () == nelt);
   gcc_checking_assert (sizeof (d.perm) == sizeof (perm));
 
   for (i = which = 0; i < nelt; ++i)
     {
-      rtx e = XVECEXP (sel, 0, i);
-      int ei = INTVAL (e) & (2 * nelt - 1);
+      unsigned int ei = sel[i] & (2 * nelt - 1);
 
       which |= (ei < nelt ? 1 : 2);
       d.perm[i] = ei;
@@ -11779,7 +11779,7 @@ ia64_expand_vec_perm_const (rtx operands
       gcc_unreachable();
 
     case 3:
-      if (!rtx_equal_p (d.op0, d.op1))
+      if (d.testing_p || !rtx_equal_p (d.op0, d.op1))
 	{
 	  d.one_operand_p = false;
 	  break;
@@ -11807,6 +11807,22 @@ ia64_expand_vec_perm_const (rtx operands
       break;
     }
 
+  if (d.testing_p)
+    {
+      /* We have to go through the motions and see if we can
+	 figure out how to generate the requested permutation.  */
+      d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
+      d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
+      if (!d.one_operand_p)
+	d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
+
+      start_sequence ();
+      bool ret = ia64_expand_vec_perm_const_1 (&d);
+      end_sequence ();
+
+      return ret;
+    }
+
   if (ia64_expand_vec_perm_const_1 (&d))
     return true;
 
@@ -11823,51 +11839,6 @@ ia64_expand_vec_perm_const (rtx operands
   return false;
 }
 
-/* Implement targetm.vectorize.vec_perm_const_ok.  */
-
-static bool
-ia64_vectorize_vec_perm_const_ok (machine_mode vmode, vec_perm_indices sel)
-{
-  struct expand_vec_perm_d d;
-  unsigned int i, nelt, which;
-  bool ret;
-
-  d.vmode = vmode;
-  d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
-  d.testing_p = true;
-
-  /* Extract the values from the vector CST into the permutation
-     array in D.  */
-  for (i = which = 0; i < nelt; ++i)
-    {
-      unsigned char e = sel[i];
-      d.perm[i] = e;
-      gcc_assert (e < 2 * nelt);
-      which |= (e < nelt ? 1 : 2);
-    }
-
-  /* For all elements from second vector, fold the elements to first.  */
-  if (which == 2)
-    for (i = 0; i < nelt; ++i)
-      d.perm[i] -= nelt;
-
-  /* Check whether the mask can be applied to the vector type.  */
-  d.one_operand_p = (which != 3);
-
-  /* Otherwise we have to go through the motions and see if we can
-     figure out how to generate the requested permutation.  */
-  d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
-  d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
-  if (!d.one_operand_p)
-    d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
-
-  start_sequence ();
-  ret = ia64_expand_vec_perm_const_1 (&d);
-  end_sequence ();
-
-  return ret;
-}
-
 void
 ia64_expand_vec_setv2sf (rtx operands[3])
 {
Index: gcc/config/mips/loongson.md
===================================================================
--- gcc/config/mips/loongson.md	2017-12-09 22:47:09.549486911 +0000
+++ gcc/config/mips/loongson.md	2017-12-09 22:47:27.865318089 +0000
@@ -784,19 +784,6 @@ (define_insn "*loongson_punpcklwd_hi"
   "punpcklwd\t%0,%1,%2"
   [(set_attr "type" "fcvt")])
 
-(define_expand "vec_perm_const<mode>"
-  [(match_operand:VWHB 0 "register_operand" "")
-   (match_operand:VWHB 1 "register_operand" "")
-   (match_operand:VWHB 2 "register_operand" "")
-   (match_operand:VWHB 3 "" "")]
-  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-{
-  if (mips_expand_vec_perm_const (operands))
-    DONE;
-  else
-    FAIL;
-})
-
 (define_expand "vec_unpacks_lo_<mode>"
   [(match_operand:<V_stretch_half> 0 "register_operand" "")
    (match_operand:VHB 1 "register_operand" "")]
Index: gcc/config/mips/mips-msa.md
===================================================================
--- gcc/config/mips/mips-msa.md	2017-12-09 22:47:09.549486911 +0000
+++ gcc/config/mips/mips-msa.md	2017-12-09 22:47:27.865318089 +0000
@@ -558,19 +558,6 @@ (define_insn_and_split "msa_copy_s_<msaf
   [(set_attr "type" "simd_copy")
    (set_attr "mode" "<MODE>")])
 
-(define_expand "vec_perm_const<mode>"
-  [(match_operand:MSA 0 "register_operand")
-   (match_operand:MSA 1 "register_operand")
-   (match_operand:MSA 2 "register_operand")
-   (match_operand:<VIMODE> 3 "")]
-  "ISA_HAS_MSA"
-{
-  if (mips_expand_vec_perm_const (operands))
-    DONE;
-  else
-    FAIL;
-})
-
 (define_expand "abs<mode>2"
   [(match_operand:IMSA 0 "register_operand" "=f")
    (abs:IMSA (match_operand:IMSA 1 "register_operand" "f"))]
Index: gcc/config/mips/mips-ps-3d.md
===================================================================
--- gcc/config/mips/mips-ps-3d.md	2017-12-09 22:47:09.549486911 +0000
+++ gcc/config/mips/mips-ps-3d.md	2017-12-09 22:47:27.865318089 +0000
@@ -164,19 +164,6 @@ (define_insn "vec_perm_const_ps"
   [(set_attr "type" "fmove")
    (set_attr "mode" "SF")])
 
-(define_expand "vec_perm_constv2sf"
-  [(match_operand:V2SF 0 "register_operand" "")
-   (match_operand:V2SF 1 "register_operand" "")
-   (match_operand:V2SF 2 "register_operand" "")
-   (match_operand:V2SI 3 "" "")]
-  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
-{
-  if (mips_expand_vec_perm_const (operands))
-    DONE;
-  else
-    FAIL;
-})
-
 ;; Expanders for builtins.  The instruction:
 ;;
 ;;     P[UL][UL].PS <result>, <a>, <b>
Index: gcc/config/mips/mips-protos.h
===================================================================
--- gcc/config/mips/mips-protos.h	2017-12-09 22:47:09.549486911 +0000
+++ gcc/config/mips/mips-protos.h	2017-12-09 22:47:27.865318089 +0000
@@ -348,7 +348,6 @@ extern void mips_expand_atomic_qihi (uni
 				     rtx, rtx, rtx, rtx);
 
 extern void mips_expand_vector_init (rtx, rtx);
-extern bool mips_expand_vec_perm_const (rtx op[4]);
 extern void mips_expand_vec_unpack (rtx op[2], bool, bool);
 extern void mips_expand_vec_reduc (rtx, rtx, rtx (*)(rtx, rtx, rtx));
 extern void mips_expand_vec_minmax (rtx, rtx, rtx,
Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	2017-12-09 22:47:09.549486911 +0000
+++ gcc/config/mips/mips.c	2017-12-09 22:47:27.867318090 +0000
@@ -21377,34 +21377,32 @@ mips_expand_vec_perm_const_1 (struct exp
   return false;
 }
 
-/* Expand a vec_perm_const pattern.  */
+/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
 
-bool
-mips_expand_vec_perm_const (rtx operands[4])
+static bool
+mips_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
+			       rtx op1, const vec_perm_indices &sel)
 {
   struct expand_vec_perm_d d;
   int i, nelt, which;
   unsigned char orig_perm[MAX_VECT_LEN];
-  rtx sel;
   bool ok;
 
-  d.target = operands[0];
-  d.op0 = operands[1];
-  d.op1 = operands[2];
-  sel = operands[3];
-
-  d.vmode = GET_MODE (d.target);
-  gcc_assert (VECTOR_MODE_P (d.vmode));
-  d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
-  d.testing_p = false;
+  d.target = target;
+  d.op0 = op0;
+  d.op1 = op1;
+
+  d.vmode = vmode;
+  gcc_assert (VECTOR_MODE_P (vmode));
+  d.nelt = nelt = GET_MODE_NUNITS (vmode);
+  d.testing_p = !target;
 
   /* This is overly conservative, but ensures we don't get an
      uninitialized warning on ORIG_PERM.  */
   memset (orig_perm, 0, MAX_VECT_LEN);
   for (i = which = 0; i < nelt; ++i)
     {
-      rtx e = XVECEXP (sel, 0, i);
-      int ei = INTVAL (e) & (2 * nelt - 1);
+      int ei = sel[i] & (2 * nelt - 1);
       which |= (ei < nelt ? 1 : 2);
       orig_perm[i] = ei;
     }
@@ -21417,7 +21415,7 @@ mips_expand_vec_perm_const (rtx operands
 
     case 3:
       d.one_vector_p = false;
-      if (!rtx_equal_p (d.op0, d.op1))
+      if (d.testing_p || !rtx_equal_p (d.op0, d.op1))
 	break;
       /* FALLTHRU */
 
@@ -21434,6 +21432,19 @@ mips_expand_vec_perm_const (rtx operands
       break;
     }
 
+  if (d.testing_p)
+    {
+      d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
+      d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
+      if (!d.one_vector_p)
+	d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
+
+      start_sequence ();
+      ok = mips_expand_vec_perm_const_1 (&d);
+      end_sequence ();
+      return ok;
+    }
+
   ok = mips_expand_vec_perm_const_1 (&d);
 
   /* If we were given a two-vector permutation which just happened to
@@ -21445,8 +21456,8 @@ mips_expand_vec_perm_const (rtx operands
      the original permutation.  */
   if (!ok && which == 3)
     {
-      d.op0 = operands[1];
-      d.op1 = operands[2];
+      d.op0 = op0;
+      d.op1 = op1;
       d.one_vector_p = false;
       memcpy (d.perm, orig_perm, MAX_VECT_LEN);
       ok = mips_expand_vec_perm_const_1 (&d);
@@ -21466,48 +21477,6 @@ mips_sched_reassociation_width (unsigned
   return 1;
 }
 
-/* Implement TARGET_VECTORIZE_VEC_PERM_CONST_OK.  */
-
-static bool
-mips_vectorize_vec_perm_const_ok (machine_mode vmode, vec_perm_indices sel)
-{
-  struct expand_vec_perm_d d;
-  unsigned int i, nelt, which;
-  bool ret;
-
-  d.vmode = vmode;
-  d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
-  d.testing_p = true;
-
-  /* Categorize the set of elements in the selector.  */
-  for (i = which = 0; i < nelt; ++i)
-    {
-      unsigned char e = sel[i];
-      d.perm[i] = e;
-      gcc_assert (e < 2 * nelt);
-      which |= (e < nelt ? 1 : 2);
-    }
-
-  /* For all elements from second vector, fold the elements to first.  */
-  if (which == 2)
-    for (i = 0; i < nelt; ++i)
-      d.perm[i] -= nelt;
-
-  /* Check whether the mask can be applied to the vector type.  */
-  d.one_vector_p = (which != 3);
-
-  d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
-  d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
-  if (!d.one_vector_p)
-    d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
-
-  start_sequence ();
-  ret = mips_expand_vec_perm_const_1 (&d);
-  end_sequence ();
-
-  return ret;
-}
-
 /* Expand an integral vector unpack operation.  */
 
 void
@@ -22589,8 +22558,8 @@ #define TARGET_SHIFT_TRUNCATION_MASK mip
 #undef TARGET_PREPARE_PCH_SAVE
 #define TARGET_PREPARE_PCH_SAVE mips_prepare_pch_save
 
-#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
-#define TARGET_VECTORIZE_VEC_PERM_CONST_OK mips_vectorize_vec_perm_const_ok
+#undef TARGET_VECTORIZE_VEC_PERM_CONST
+#define TARGET_VECTORIZE_VEC_PERM_CONST mips_vectorize_vec_perm_const
 
 #undef TARGET_SCHED_REASSOCIATION_WIDTH
 #define TARGET_SCHED_REASSOCIATION_WIDTH mips_sched_reassociation_width
Index: gcc/config/powerpcspe/altivec.md
===================================================================
--- gcc/config/powerpcspe/altivec.md	2017-12-09 22:47:09.549486911 +0000
+++ gcc/config/powerpcspe/altivec.md	2017-12-09 22:47:27.867318090 +0000
@@ -2080,19 +2080,6 @@ (define_expand "vec_permv16qi"
   }
 })
 
-(define_expand "vec_perm_constv16qi"
-  [(match_operand:V16QI 0 "register_operand" "")
-   (match_operand:V16QI 1 "register_operand" "")
-   (match_operand:V16QI 2 "register_operand" "")
-   (match_operand:V16QI 3 "" "")]
-  "TARGET_ALTIVEC"
-{
-  if (altivec_expand_vec_perm_const (operands))
-    DONE;
-  else
-    FAIL;
-})
-
 (define_insn "*altivec_vpermr_<mode>_internal"
   [(set (match_operand:VM 0 "register_operand" "=v,?wo")
 	(unspec:VM [(match_operand:VM 1 "register_operand" "v,wo")
Index: gcc/config/powerpcspe/paired.md
===================================================================
--- gcc/config/powerpcspe/paired.md	2017-12-09 22:47:09.549486911 +0000
+++ gcc/config/powerpcspe/paired.md	2017-12-09 22:47:27.867318090 +0000
@@ -313,19 +313,6 @@ (define_insn "paired_merge11"
   "ps_merge11 %0, %1, %2"
   [(set_attr "type" "fp")])
 
-(define_expand "vec_perm_constv2sf"
-  [(match_operand:V2SF 0 "gpc_reg_operand" "")
-   (match_operand:V2SF 1 "gpc_reg_operand" "")
-   (match_operand:V2SF 2 "gpc_reg_operand" "")
-   (match_operand:V2SI 3 "" "")]
-  "TARGET_PAIRED_FLOAT"
-{
-  if (rs6000_expand_vec_perm_const (operands))
-    DONE;
-  else
-    FAIL;
-})
-
 (define_insn "paired_sum0"
   [(set (match_operand:V2SF 0 "gpc_reg_operand" "=f")
 	(vec_concat:V2SF (plus:SF (vec_select:SF
Index: gcc/config/powerpcspe/spe.md
===================================================================
--- gcc/config/powerpcspe/spe.md	2017-12-09 22:47:09.549486911 +0000
+++ gcc/config/powerpcspe/spe.md	2017-12-09 22:47:27.871318093 +0000
@@ -511,19 +511,6 @@ (define_insn "vec_perm10_v2si"
   [(set_attr "type" "vecsimple")
    (set_attr  "length" "4")])
 
-(define_expand "vec_perm_constv2si"
-  [(match_operand:V2SI 0 "gpc_reg_operand" "")
-   (match_operand:V2SI 1 "gpc_reg_operand" "")
-   (match_operand:V2SI 2 "gpc_reg_operand" "")
-   (match_operand:V2SI 3 "" "")]
-  "TARGET_SPE"
-{
-  if (rs6000_expand_vec_perm_const (operands))
-    DONE;
-  else
-    FAIL;
-})
-
 (define_expand "spe_evmergehi"
   [(match_operand:V2SI 0 "register_operand" "")
    (match_operand:V2SI 1 "register_operand" "")
Index: gcc/config/powerpcspe/vsx.md
===================================================================
--- gcc/config/powerpcspe/vsx.md	2017-12-09 22:47:09.549486911 +0000
+++ gcc/config/powerpcspe/vsx.md	2017-12-09 22:47:27.871318093 +0000
@@ -2543,19 +2543,6 @@ (define_insn "vsx_xxpermdi2_<mode>_1"
 }
   [(set_attr "type" "vecperm")])
 
-(define_expand "vec_perm_const<mode>"
-  [(match_operand:VSX_D 0 "vsx_register_operand" "")
-   (match_operand:VSX_D 1 "vsx_register_operand" "")
-   (match_operand:VSX_D 2 "vsx_register_operand" "")
-   (match_operand:V2DI  3 "" "")]
-  "VECTOR_MEM_VSX_P (<MODE>mode)"
-{
-  if (rs6000_expand_vec_perm_const (operands))
-    DONE;
-  else
-    FAIL;
-})
-
 ;; Extraction of a single element in a small integer vector.  Until ISA 3.0,
 ;; none of the small types were allowed in a vector register, so we had to
 ;; extract to a DImode and either do a direct move or store.
Index: gcc/config/powerpcspe/powerpcspe-protos.h
===================================================================
--- gcc/config/powerpcspe/powerpcspe-protos.h	2017-12-09 22:47:09.549486911 +0000
+++ gcc/config/powerpcspe/powerpcspe-protos.h	2017-12-09 22:47:27.867318090 +0000
@@ -64,9 +64,7 @@ extern void rs6000_expand_vector_extract
 extern void rs6000_split_vec_extract_var (rtx, rtx, rtx, rtx, rtx);
 extern rtx rs6000_adjust_vec_address (rtx, rtx, rtx, rtx, machine_mode);
 extern void rs6000_split_v4si_init (rtx []);
-extern bool altivec_expand_vec_perm_const (rtx op[4]);
 extern void altivec_expand_vec_perm_le (rtx op[4]);
-extern bool rs6000_expand_vec_perm_const (rtx op[4]);
 extern void altivec_expand_lvx_be (rtx, rtx, machine_mode, unsigned);
 extern void altivec_expand_stvx_be (rtx, rtx, machine_mode, unsigned);
 extern void altivec_expand_stvex_be (rtx, rtx, machine_mode, unsigned);
Index: gcc/config/powerpcspe/powerpcspe.c
===================================================================
--- gcc/config/powerpcspe/powerpcspe.c	2017-12-09 22:47:09.549486911 +0000
+++ gcc/config/powerpcspe/powerpcspe.c	2017-12-09 22:47:27.871318093 +0000
@@ -1936,8 +1936,8 @@ #define TARGET_SET_CURRENT_FUNCTION rs60
 #undef TARGET_LEGITIMATE_CONSTANT_P
 #define TARGET_LEGITIMATE_CONSTANT_P rs6000_legitimate_constant_p
 
-#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
-#define TARGET_VECTORIZE_VEC_PERM_CONST_OK rs6000_vectorize_vec_perm_const_ok
+#undef TARGET_VECTORIZE_VEC_PERM_CONST
+#define TARGET_VECTORIZE_VEC_PERM_CONST rs6000_vectorize_vec_perm_const
 
 #undef TARGET_CAN_USE_DOLOOP_P
 #define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
@@ -38311,6 +38311,9 @@ rs6000_emit_parity (rtx dst, rtx src)
 }
 
 /* Expand an Altivec constant permutation for little endian mode.
+   OP0 and OP1 are the input vectors and TARGET is the output vector.
+   SEL specifies the constant permutation vector.
+
    There are two issues: First, the two input operands must be
    swapped so that together they form a double-wide array in LE
    order.  Second, the vperm instruction has surprising behavior
@@ -38352,22 +38355,18 @@ rs6000_emit_parity (rtx dst, rtx src)
 
    vr9  = 00000006 00000004 00000002 00000000.  */
 
-void
-altivec_expand_vec_perm_const_le (rtx operands[4])
+static void
+altivec_expand_vec_perm_const_le (rtx target, rtx op0, rtx op1,
+				  const vec_perm_indices &sel)
 {
   unsigned int i;
   rtx perm[16];
   rtx constv, unspec;
-  rtx target = operands[0];
-  rtx op0 = operands[1];
-  rtx op1 = operands[2];
-  rtx sel = operands[3];
 
   /* Unpack and adjust the constant selector.  */
   for (i = 0; i < 16; ++i)
     {
-      rtx e = XVECEXP (sel, 0, i);
-      unsigned int elt = 31 - (INTVAL (e) & 31);
+      unsigned int elt = 31 - (sel[i] & 31);
       perm[i] = GEN_INT (elt);
     }
 
@@ -38449,10 +38448,14 @@ altivec_expand_vec_perm_le (rtx operands
 }
 
 /* Expand an Altivec constant permutation.  Return true if we match
-   an efficient implementation; false to fall back to VPERM.  */
+   an efficient implementation; false to fall back to VPERM.
 
-bool
-altivec_expand_vec_perm_const (rtx operands[4])
+   OP0 and OP1 are the input vectors and TARGET is the output vector.
+   SEL specifies the constant permutation vector.  */
+
+static bool
+altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
+			       const vec_perm_indices &sel)
 {
   struct altivec_perm_insn {
     HOST_WIDE_INT mask;
@@ -38496,19 +38499,13 @@ altivec_expand_vec_perm_const (rtx opera
 
   unsigned int i, j, elt, which;
   unsigned char perm[16];
-  rtx target, op0, op1, sel, x;
+  rtx x;
   bool one_vec;
 
-  target = operands[0];
-  op0 = operands[1];
-  op1 = operands[2];
-  sel = operands[3];
-
   /* Unpack the constant selector.  */
   for (i = which = 0; i < 16; ++i)
     {
-      rtx e = XVECEXP (sel, 0, i);
-      elt = INTVAL (e) & 31;
+      elt = sel[i] & 31;
       which |= (elt < 16 ? 1 : 2);
       perm[i] = elt;
     }
@@ -38664,7 +38661,7 @@ altivec_expand_vec_perm_const (rtx opera
 
   if (!BYTES_BIG_ENDIAN)
     {
-      altivec_expand_vec_perm_const_le (operands);
+      altivec_expand_vec_perm_const_le (target, op0, op1, sel);
       return true;
     }
 
@@ -38724,60 +38721,54 @@ rs6000_expand_vec_perm_const_1 (rtx targ
   return true;
 }
 
-bool
-rs6000_expand_vec_perm_const (rtx operands[4])
-{
-  rtx target, op0, op1, sel;
-  unsigned char perm0, perm1;
-
-  target = operands[0];
-  op0 = operands[1];
-  op1 = operands[2];
-  sel = operands[3];
-
-  /* Unpack the constant selector.  */
-  perm0 = INTVAL (XVECEXP (sel, 0, 0)) & 3;
-  perm1 = INTVAL (XVECEXP (sel, 0, 1)) & 3;
-
-  return rs6000_expand_vec_perm_const_1 (target, op0, op1, perm0, perm1);
-}
-
-/* Test whether a constant permutation is supported.  */
+/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
 
 static bool
-rs6000_vectorize_vec_perm_const_ok (machine_mode vmode, vec_perm_indices sel)
+rs6000_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
+				 rtx op1, const vec_perm_indices &sel)
 {
+  bool testing_p = !target;
+
   /* AltiVec (and thus VSX) can handle arbitrary permutations.  */
-  if (TARGET_ALTIVEC)
+  if (TARGET_ALTIVEC && testing_p)
     return true;
 
-  /* Check for ps_merge* or evmerge* insns.  */
-  if ((TARGET_PAIRED_FLOAT && vmode == V2SFmode)
-      || (TARGET_SPE && vmode == V2SImode))
-    {
-      rtx op0 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 1);
-      rtx op1 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 2);
-      return rs6000_expand_vec_perm_const_1 (NULL, op0, op1, sel[0], sel[1]);
+  /* Check for ps_merge*, evmerge* or xxperm* insns.  */
+  if ((vmode == V2SFmode && TARGET_PAIRED_FLOAT)
+      || (vmode == V2SImode && TARGET_SPE)
+      || ((vmode == V2DFmode || vmode == V2DImode)
+	  && VECTOR_MEM_VSX_P (vmode)))
+    {
+      if (testing_p)
+	{
+	  op0 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 1);
+	  op1 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 2);
+	}
+      if (rs6000_expand_vec_perm_const_1 (target, op0, op1, sel[0], sel[1]))
+	return true;
+    }
+
+  if (TARGET_ALTIVEC)
+    {
+      /* Force the target-independent code to lower to V16QImode.  */
+      if (vmode != V16QImode)
+	return false;
+      if (altivec_expand_vec_perm_const (target, op0, op1, sel))
+	return true;
     }
 
   return false;
 }
 
-/* A subroutine for rs6000_expand_extract_even & rs6000_expand_interleave.  */
+/* A subroutine for rs6000_expand_extract_even & rs6000_expand_interleave.
+   OP0 and OP1 are the input vectors and TARGET is the output vector.
+   PERM specifies the constant permutation vector.  */
 
 static void
 rs6000_do_expand_vec_perm (rtx target, rtx op0, rtx op1,
-			   machine_mode vmode, unsigned nelt, rtx perm[])
+			   machine_mode vmode, const vec_perm_builder &perm)
 {
-  machine_mode imode;
-  rtx x;
-
-  imode = vmode;
-  if (GET_MODE_CLASS (vmode) != MODE_VECTOR_INT)
-    imode = mode_for_int_vector (vmode).require ();
-
-  x = gen_rtx_CONST_VECTOR (imode, gen_rtvec_v (nelt, perm));
-  x = expand_vec_perm (vmode, op0, op1, x, target);
+  rtx x = expand_vec_perm_const (vmode, op0, op1, perm, BLKmode, target);
   if (x != target)
     emit_move_insn (target, x);
 }
@@ -38789,12 +38780,12 @@ rs6000_expand_extract_even (rtx target,
 {
   machine_mode vmode = GET_MODE (target);
   unsigned i, nelt = GET_MODE_NUNITS (vmode);
-  rtx perm[16];
+  vec_perm_builder perm (nelt);
 
   for (i = 0; i < nelt; i++)
-    perm[i] = GEN_INT (i * 2);
+    perm.quick_push (i * 2);
 
-  rs6000_do_expand_vec_perm (target, op0, op1, vmode, nelt, perm);
+  rs6000_do_expand_vec_perm (target, op0, op1, vmode, perm);
 }
 
 /* Expand a vector interleave operation.  */
@@ -38804,16 +38795,16 @@ rs6000_expand_interleave (rtx target, rt
 {
   machine_mode vmode = GET_MODE (target);
   unsigned i, high, nelt = GET_MODE_NUNITS (vmode);
-  rtx perm[16];
+  vec_perm_builder perm (nelt);
 
   high = (highp ? 0 : nelt / 2);
   for (i = 0; i < nelt / 2; i++)
     {
-      perm[i * 2] = GEN_INT (i + high);
-      perm[i * 2 + 1] = GEN_INT (i + nelt + high);
+      perm.quick_push (i + high);
+      perm.quick_push (i + nelt + high);
     }
 
-  rs6000_do_expand_vec_perm (target, op0, op1, vmode, nelt, perm);
+  rs6000_do_expand_vec_perm (target, op0, op1, vmode, perm);
 }
 
 /* Scale a V2DF vector SRC by two to the SCALE and place in TGT.  */
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	2017-12-09 22:47:09.549486911 +0000
+++ gcc/config/rs6000/altivec.md	2017-12-09 22:47:27.872318093 +0000
@@ -2198,19 +2198,6 @@ (define_expand "vec_permv16qi"
   }
 })
 
-(define_expand "vec_perm_constv16qi"
-  [(match_operand:V16QI 0 "register_operand" "")
-   (match_operand:V16QI 1 "register_operand" "")
-   (match_operand:V16QI 2 "register_operand" "")
-   (match_operand:V16QI 3 "" "")]
-  "TARGET_ALTIVEC"
-{
-  if (altivec_expand_vec_perm_const (operands))
-    DONE;
-  else
-    FAIL;
-})
-
 (define_insn "*altivec_vpermr_<mode>_internal"
   [(set (match_operand:VM 0 "register_operand" "=v,?wo")
 	(unspec:VM [(match_operand:VM 1 "register_operand" "v,wo")
Index: gcc/config/rs6000/paired.md
===================================================================
--- gcc/config/rs6000/paired.md	2017-12-09 22:47:09.549486911 +0000
+++ gcc/config/rs6000/paired.md	2017-12-09 22:47:27.872318093 +0000
@@ -313,19 +313,6 @@ (define_insn "paired_merge11"
   "ps_merge11 %0, %1, %2"
   [(set_attr "type" "fp")])
 
-(define_expand "vec_perm_constv2sf"
-  [(match_operand:V2SF 0 "gpc_reg_operand" "")
-   (match_operand:V2SF 1 "gpc_reg_operand" "")
-   (match_operand:V2SF 2 "gpc_reg_operand" "")
-   (match_operand:V2SI 3 "" "")]
-  "TARGET_PAIRED_FLOAT"
-{
-  if (rs6000_expand_vec_perm_const (operands))
-    DONE;
-  else
-    FAIL;
-})
-
 (define_insn "paired_sum0"
   [(set (match_operand:V2SF 0 "gpc_reg_operand" "=f")
 	(vec_concat:V2SF (plus:SF (vec_select:SF
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	2017-12-09 22:47:09.549486911 +0000
+++ gcc/config/rs6000/vsx.md	2017-12-09 22:47:27.875318095 +0000
@@ -3189,19 +3189,6 @@ (define_insn "vsx_xxpermdi2_<mode>_1"
 }
   [(set_attr "type" "vecperm")])
 
-(define_expand "vec_perm_const<mode>"
-  [(match_operand:VSX_D 0 "vsx_register_operand" "")
-   (match_operand:VSX_D 1 "vsx_register_operand" "")
-   (match_operand:VSX_D 2 "vsx_register_operand" "")
-   (match_operand:V2DI  3 "" "")]
-  "VECTOR_MEM_VSX_P (<MODE>mode)"
-{
-  if (rs6000_expand_vec_perm_const (operands))
-    DONE;
-  else
-    FAIL;
-})
-
 ;; Extraction of a single element in a small integer vector.  Until ISA 3.0,
 ;; none of the small types were allowed in a vector register, so we had to
 ;; extract to a DImode and either do a direct move or store.
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	2017-12-09 22:47:09.549486911 +0000
+++ gcc/config/rs6000/rs6000-protos.h	2017-12-09 22:47:27.872318093 +0000
@@ -63,9 +63,7 @@ extern void rs6000_expand_vector_extract
 extern void rs6000_split_vec_extract_var (rtx, rtx, rtx, rtx, rtx);
 extern rtx rs6000_adjust_vec_address (rtx, rtx, rtx, rtx, machine_mode);
 extern void rs6000_split_v4si_init (rtx []);
-extern bool altivec_expand_vec_perm_const (rtx op[4]);
 extern void altivec_expand_vec_perm_le (rtx op[4]);
-extern bool rs6000_expand_vec_perm_const (rtx op[4]);
 extern void altivec_expand_lvx_be (rtx, rtx, machine_mode, unsigned);
 extern void altivec_expand_stvx_be (rtx, rtx, machine_mode, unsigned);
 extern void altivec_expand_stvex_be (rtx, rtx, machine_mode, unsigned);
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	2017-12-09 22:47:09.549486911 +0000
+++ gcc/config/rs6000/rs6000.c	2017-12-09 22:47:27.874318095 +0000
@@ -1907,8 +1907,8 @@ #define TARGET_SET_CURRENT_FUNCTION rs60
 #undef TARGET_LEGITIMATE_CONSTANT_P
 #define TARGET_LEGITIMATE_CONSTANT_P rs6000_legitimate_constant_p
 
-#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
-#define TARGET_VECTORIZE_VEC_PERM_CONST_OK rs6000_vectorize_vec_perm_const_ok
+#undef TARGET_VECTORIZE_VEC_PERM_CONST
+#define TARGET_VECTORIZE_VEC_PERM_CONST rs6000_vectorize_vec_perm_const
 
 #undef TARGET_CAN_USE_DOLOOP_P
 #define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
@@ -35545,6 +35545,9 @@ rs6000_emit_parity (rtx dst, rtx src)
 }
 
 /* Expand an Altivec constant permutation for little endian mode.
+   OP0 and OP1 are the input vectors and TARGET is the output vector.
+   SEL specifies the constant permutation vector.
+
    There are two issues: First, the two input operands must be
    swapped so that together they form a double-wide array in LE
    order.  Second, the vperm instruction has surprising behavior
@@ -35586,22 +35589,18 @@ rs6000_emit_parity (rtx dst, rtx src)
 
    vr9  = 00000006 00000004 00000002 00000000.  */
 
-void
-altivec_expand_vec_perm_const_le (rtx operands[4])
+static void
+altivec_expand_vec_perm_const_le (rtx target, rtx op0, rtx op1,
+				  const vec_perm_indices &sel)
 {
   unsigned int i;
   rtx perm[16];
   rtx constv, unspec;
-  rtx target = operands[0];
-  rtx op0 = operands[1];
-  rtx op1 = operands[2];
-  rtx sel = operands[3];
 
   /* Unpack and adjust the constant selector.  */
   for (i = 0; i < 16; ++i)
     {
-      rtx e = XVECEXP (sel, 0, i);
-      unsigned int elt = 31 - (INTVAL (e) & 31);
+      unsigned int elt = 31 - (sel[i] & 31);
       perm[i] = GEN_INT (elt);
     }
 
@@ -35683,10 +35682,14 @@ altivec_expand_vec_perm_le (rtx operands
 }
 
 /* Expand an Altivec constant permutation.  Return true if we match
-   an efficient implementation; false to fall back to VPERM.  */
+   an efficient implementation; false to fall back to VPERM.
 
-bool
-altivec_expand_vec_perm_const (rtx operands[4])
+   OP0 and OP1 are the input vectors and TARGET is the output vector.
+   SEL specifies the constant permutation vector.  */
+
+static bool
+altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
+			       const vec_perm_indices &sel)
 {
   struct altivec_perm_insn {
     HOST_WIDE_INT mask;
@@ -35734,19 +35737,13 @@ altivec_expand_vec_perm_const (rtx opera
 
   unsigned int i, j, elt, which;
   unsigned char perm[16];
-  rtx target, op0, op1, sel, x;
+  rtx x;
   bool one_vec;
 
-  target = operands[0];
-  op0 = operands[1];
-  op1 = operands[2];
-  sel = operands[3];
-
   /* Unpack the constant selector.  */
   for (i = which = 0; i < 16; ++i)
     {
-      rtx e = XVECEXP (sel, 0, i);
-      elt = INTVAL (e) & 31;
+      elt = sel[i] & 31;
       which |= (elt < 16 ? 1 : 2);
       perm[i] = elt;
     }
@@ -35902,7 +35899,7 @@ altivec_expand_vec_perm_const (rtx opera
 
   if (!BYTES_BIG_ENDIAN)
     {
-      altivec_expand_vec_perm_const_le (operands);
+      altivec_expand_vec_perm_const_le (target, op0, op1, sel);
       return true;
     }
 
@@ -35962,59 +35959,53 @@ rs6000_expand_vec_perm_const_1 (rtx targ
   return true;
 }
 
-bool
-rs6000_expand_vec_perm_const (rtx operands[4])
-{
-  rtx target, op0, op1, sel;
-  unsigned char perm0, perm1;
-
-  target = operands[0];
-  op0 = operands[1];
-  op1 = operands[2];
-  sel = operands[3];
-
-  /* Unpack the constant selector.  */
-  perm0 = INTVAL (XVECEXP (sel, 0, 0)) & 3;
-  perm1 = INTVAL (XVECEXP (sel, 0, 1)) & 3;
-
-  return rs6000_expand_vec_perm_const_1 (target, op0, op1, perm0, perm1);
-}
-
-/* Test whether a constant permutation is supported.  */
+/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
 
 static bool
-rs6000_vectorize_vec_perm_const_ok (machine_mode vmode, vec_perm_indices sel)
+rs6000_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
+				 rtx op1, const vec_perm_indices &sel)
 {
+  bool testing_p = !target;
+
   /* AltiVec (and thus VSX) can handle arbitrary permutations.  */
-  if (TARGET_ALTIVEC)
+  if (TARGET_ALTIVEC && testing_p)
     return true;
 
-  /* Check for ps_merge* or evmerge* insns.  */
-  if (TARGET_PAIRED_FLOAT && vmode == V2SFmode)
+  /* Check for ps_merge* or xxpermdi insns.  */
+  if ((vmode == V2SFmode && TARGET_PAIRED_FLOAT)
+      || ((vmode == V2DFmode || vmode == V2DImode)
+	  && VECTOR_MEM_VSX_P (vmode)))
+    {
+      if (testing_p)
+	{
+	  op0 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 1);
+	  op1 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 2);
+	}
+      if (rs6000_expand_vec_perm_const_1 (target, op0, op1, sel[0], sel[1]))
+	return true;
+    }
+
+  if (TARGET_ALTIVEC)
     {
-      rtx op0 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 1);
-      rtx op1 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 2);
-      return rs6000_expand_vec_perm_const_1 (NULL, op0, op1, sel[0], sel[1]);
+      /* Force the target-independent code to lower to V16QImode.  */
+      if (vmode != V16QImode)
+	return false;
+      if (altivec_expand_vec_perm_const (target, op0, op1, sel))
+	return true;
     }
 
   return false;
 }
 
-/* A subroutine for rs6000_expand_extract_even & rs6000_expand_interleave.  */
+/* A subroutine for rs6000_expand_extract_even & rs6000_expand_interleave.
+   OP0 and OP1 are the input vectors and TARGET is the output vector.
+   PERM specifies the constant permutation vector.  */
 
 static void
 rs6000_do_expand_vec_perm (rtx target, rtx op0, rtx op1,
-			   machine_mode vmode, unsigned nelt, rtx perm[])
+			   machine_mode vmode, const vec_perm_builder &perm)
 {
-  machine_mode imode;
-  rtx x;
-
-  imode = vmode;
-  if (GET_MODE_CLASS (vmode) != MODE_VECTOR_INT)
-    imode = mode_for_int_vector (vmode).require ();
-
-  x = gen_rtx_CONST_VECTOR (imode, gen_rtvec_v (nelt, perm));
-  x = expand_vec_perm (vmode, op0, op1, x, target);
+  rtx x = expand_vec_perm_const (vmode, op0, op1, perm, BLKmode, target);
   if (x != target)
     emit_move_insn (target, x);
 }
@@ -36026,12 +36017,12 @@ rs6000_expand_extract_even (rtx target,
 {
   machine_mode vmode = GET_MODE (target);
   unsigned i, nelt = GET_MODE_NUNITS (vmode);
-  rtx perm[16];
+  vec_perm_builder perm (nelt);
 
   for (i = 0; i < nelt; i++)
-    perm[i] = GEN_INT (i * 2);
+    perm.quick_push (i * 2);
 
-  rs6000_do_expand_vec_perm (target, op0, op1, vmode, nelt, perm);
+  rs6000_do_expand_vec_perm (target, op0, op1, vmode, perm);
 }
 
 /* Expand a vector interleave operation.  */
@@ -36041,16 +36032,16 @@ rs6000_expand_interleave (rtx target, rt
 {
   machine_mode vmode = GET_MODE (target);
   unsigned i, high, nelt = GET_MODE_NUNITS (vmode);
-  rtx perm[16];
+  vec_perm_builder perm (nelt);
 
   high = (highp ? 0 : nelt / 2);
   for (i = 0; i < nelt / 2; i++)
     {
-      perm[i * 2] = GEN_INT (i + high);
-      perm[i * 2 + 1] = GEN_INT (i + nelt + high);
+      perm.quick_push (i + high);
+      perm.quick_push (i + nelt + high);
     }
 
-  rs6000_do_expand_vec_perm (target, op0, op1, vmode, nelt, perm);
+  rs6000_do_expand_vec_perm (target, op0, op1, vmode, perm);
 }
 
 /* Scale a V2DF vector SRC by two to the SCALE and place in TGT.  */
Index: gcc/config/sparc/sparc.md
===================================================================
--- gcc/config/sparc/sparc.md	2017-12-09 22:47:09.549486911 +0000
+++ gcc/config/sparc/sparc.md	2017-12-09 22:47:27.876318096 +0000
@@ -9327,28 +9327,6 @@ (define_insn "bshuffle<VM64:mode>_vis"
    (set_attr "subtype" "other")
    (set_attr "fptype" "double")])
 
-;; The rtl expanders will happily convert constant permutations on other
-;; modes down to V8QI.  Rely on this to avoid the complexity of the byte
-;; order of the permutation.
-(define_expand "vec_perm_constv8qi"
-  [(match_operand:V8QI 0 "register_operand" "")
-   (match_operand:V8QI 1 "register_operand" "")
-   (match_operand:V8QI 2 "register_operand" "")
-   (match_operand:V8QI 3 "" "")]
-  "TARGET_VIS2"
-{
-  unsigned int i, mask;
-  rtx sel = operands[3];
-
-  for (i = mask = 0; i < 8; ++i)
-    mask |= (INTVAL (XVECEXP (sel, 0, i)) & 0xf) << (28 - i*4);
-  sel = force_reg (SImode, gen_int_mode (mask, SImode));
-
-  emit_insn (gen_bmasksi_vis (gen_reg_rtx (SImode), sel, const0_rtx));
-  emit_insn (gen_bshufflev8qi_vis (operands[0], operands[1], operands[2]));
-  DONE;
-})
-
 ;; Unlike constant permutation, we can vastly simplify the compression of
 ;; the 64-bit selector input to the 32-bit %gsr value by knowing what the
 ;; width of the input is.
Index: gcc/config/sparc/sparc.c
===================================================================
--- gcc/config/sparc/sparc.c	2017-12-09 22:47:09.549486911 +0000
+++ gcc/config/sparc/sparc.c	2017-12-09 22:47:27.876318096 +0000
@@ -686,6 +686,8 @@ static bool sparc_modes_tieable_p (machi
 static bool sparc_can_change_mode_class (machine_mode, machine_mode,
 					 reg_class_t);
 static HOST_WIDE_INT sparc_constant_alignment (const_tree, HOST_WIDE_INT);
+static bool sparc_vectorize_vec_perm_const (machine_mode, rtx, rtx, rtx,
+					    const vec_perm_indices &);
 \f
 #ifdef SUBTARGET_ATTRIBUTE_TABLE
 /* Table of valid machine attributes.  */
@@ -930,6 +932,9 @@ #define TARGET_CAN_CHANGE_MODE_CLASS spa
 #undef TARGET_CONSTANT_ALIGNMENT
 #define TARGET_CONSTANT_ALIGNMENT sparc_constant_alignment
 
+#undef TARGET_VECTORIZE_VEC_PERM_CONST
+#define TARGET_VECTORIZE_VEC_PERM_CONST sparc_vectorize_vec_perm_const
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 /* Return the memory reference contained in X if any, zero otherwise.  */
@@ -12812,6 +12817,32 @@ sparc_expand_vec_perm_bmask (machine_mod
   emit_insn (gen_bmasksi_vis (gen_reg_rtx (SImode), sel, t_1));
 }
 
+/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
+
+static bool
+sparc_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
+				rtx op1, const vec_perm_indices &sel)
+{
+  /* All permutes are supported.  */
+  if (!target)
+    return true;
+
+  /* Force target-independent code to convert constant permutations on other
+     modes down to V8QI.  Rely on this to avoid the complexity of the byte
+     order of the permutation.  */
+  if (vmode != V8QImode)
+    return false;
+
+  unsigned int i, mask;
+  for (i = mask = 0; i < 8; ++i)
+    mask |= (sel[i] & 0xf) << (28 - i*4);
+  rtx mask_rtx = force_reg (SImode, gen_int_mode (mask, SImode));
+
+  emit_insn (gen_bmasksi_vis (gen_reg_rtx (SImode), mask_rtx, const0_rtx));
+  emit_insn (gen_bshufflev8qi_vis (target, op0, op1));
+  return true;
+}
+
 /* Implement TARGET_FRAME_POINTER_REQUIRED.  */
 
 static bool

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [06/13] Check whether a vector of QIs can store all indices
  2017-12-09 23:06 [00/13] Make VEC_PERM_EXPR work for variable-length vectors Richard Sandiford
                   ` (4 preceding siblings ...)
  2017-12-09 23:17 ` [05/13] Remove vec_perm_const optab Richard Sandiford
@ 2017-12-09 23:18 ` Richard Sandiford
  2017-12-12 15:27   ` Richard Biener
  2017-12-09 23:20 ` [07/13] Make vec_perm_indices use new vector encoding Richard Sandiford
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Richard Sandiford @ 2017-12-09 23:18 UTC (permalink / raw)
  To: gcc-patches

The patch to remove the vec_perm_const optab checked whether replacing
a constant permute with a variable permute is safe, or whether it might
truncate the indices.  This patch adds a corresponding check for whether
variable permutes can be lowered to QImode-based permutes without
truncating the element indices.
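
A worked example of the situation being rejected (the vector size here is
hypothetical and chosen purely for illustration): permuting a 4096-bit
vector of 32-bit elements would fall back to a 512-element byte permute,
giving

    GET_MODE_NUNITS (qimode)    = 512
    GET_MODE_MASK (QImode) + 1  = 256

Since 512 > 256, the byte indices (0..511 even for a single input) cannot
all be represented in a QImode selector element, so the new check makes
can_vec_perm_var_p return false instead of silently truncating them.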


2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* optabs-query.c (can_vec_perm_var_p): Check whether lowering
	to qimode could truncate the indices.
	* optabs.c (expand_vec_perm_var): Likewise.

Index: gcc/optabs-query.c
===================================================================
--- gcc/optabs-query.c	2017-12-09 22:47:21.534314227 +0000
+++ gcc/optabs-query.c	2017-12-09 22:47:25.861316866 +0000
@@ -378,7 +378,8 @@ can_vec_perm_var_p (machine_mode mode)
 
   /* We allow fallback to a QI vector mode, and adjust the mask.  */
   machine_mode qimode;
-  if (!qimode_for_vec_perm (mode).exists (&qimode))
+  if (!qimode_for_vec_perm (mode).exists (&qimode)
+      || GET_MODE_NUNITS (qimode) > GET_MODE_MASK (QImode) + 1)
     return false;
 
   if (direct_optab_handler (vec_perm_optab, qimode) == CODE_FOR_nothing)
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	2017-12-09 22:47:23.878315657 +0000
+++ gcc/optabs.c	2017-12-09 22:47:25.861316866 +0000
@@ -5595,7 +5595,8 @@ expand_vec_perm_var (machine_mode mode,
   /* As a special case to aid several targets, lower the element-based
      permutation to a byte-based permutation and try again.  */
   machine_mode qimode;
-  if (!qimode_for_vec_perm (mode).exists (&qimode))
+  if (!qimode_for_vec_perm (mode).exists (&qimode)
+      || GET_MODE_NUNITS (qimode) > GET_MODE_MASK (QImode) + 1)
     return NULL_RTX;
   icode = direct_optab_handler (vec_perm_optab, qimode);
   if (icode == CODE_FOR_nothing)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [08/13] Add a vec_perm_indices_to_tree helper function
  2017-12-09 23:06 [00/13] Make VEC_PERM_EXPR work for variable-length vectors Richard Sandiford
                   ` (6 preceding siblings ...)
  2017-12-09 23:20 ` [07/13] Make vec_perm_indices use new vector encoding Richard Sandiford
@ 2017-12-09 23:20 ` Richard Sandiford
  2017-12-18 13:34   ` Richard Biener
  2017-12-09 23:21 ` [09/13] Use explicit encodings for simple permutes Richard Sandiford
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Richard Sandiford @ 2017-12-09 23:20 UTC (permalink / raw)
  To: gcc-patches

This patch adds a function for creating a VECTOR_CST from a
vec_perm_indices, operating directly on the encoding.
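
As a usage sketch (the caller here is hypothetical and not part of the
patch; mask_type and nelt are assumed to describe an integer vector type
with nelt elements), building a reversal selector and converting it:

  /* Encode { nelt-1, nelt-2, ..., 0 } explicitly.  */
  vec_perm_builder sel (nelt, nelt, 1);
  for (unsigned int i = 0; i < nelt; ++i)
    sel.quick_push (nelt - 1 - i);

  /* Canonicalise for a single input of nelt elements and convert the
     result to a VECTOR_CST of mask_type.  */
  vec_perm_indices indices (sel, 1, nelt);
  tree mask = vec_perm_indices_to_tree (mask_type, indices);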


2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* vec-perm-indices.h (vec_perm_indices_to_tree): Declare.
	* vec-perm-indices.c (vec_perm_indices_to_tree): New function.
	* tree-ssa-forwprop.c (simplify_vector_constructor): Use it.
	* tree-vect-slp.c (vect_transform_slp_perm_load): Likewise.
	* tree-vect-stmts.c (vectorizable_bswap): Likewise.
	(vect_gen_perm_mask_any): Likewise.

Index: gcc/vec-perm-indices.h
===================================================================
--- gcc/vec-perm-indices.h	2017-12-09 22:48:47.548825399 +0000
+++ gcc/vec-perm-indices.h	2017-12-09 22:48:50.361942571 +0000
@@ -73,6 +73,7 @@ typedef int_vector_builder<HOST_WIDE_INT
 };
 
 bool tree_to_vec_perm_builder (vec_perm_builder *, tree);
+tree vec_perm_indices_to_tree (tree, const vec_perm_indices &);
 rtx vec_perm_indices_to_rtx (machine_mode, const vec_perm_indices &);
 
 inline
Index: gcc/vec-perm-indices.c
===================================================================
--- gcc/vec-perm-indices.c	2017-12-09 22:48:47.548825399 +0000
+++ gcc/vec-perm-indices.c	2017-12-09 22:48:50.360942531 +0000
@@ -152,6 +152,20 @@ tree_to_vec_perm_builder (vec_perm_build
   return true;
 }
 
+/* Return a VECTOR_CST of type TYPE for the permutation vector in INDICES.  */
+
+tree
+vec_perm_indices_to_tree (tree type, const vec_perm_indices &indices)
+{
+  gcc_assert (TYPE_VECTOR_SUBPARTS (type) == indices.length ());
+  tree_vector_builder sel (type, indices.encoding ().npatterns (),
+			   indices.encoding ().nelts_per_pattern ());
+  unsigned int encoded_nelts = sel.encoded_nelts ();
+  for (unsigned int i = 0; i < encoded_nelts; i++)
+    sel.quick_push (build_int_cst (TREE_TYPE (type), indices[i]));
+  return sel.build ();
+}
+
 /* Return a CONST_VECTOR of mode MODE that contains the elements of
    INDICES.  */
 
Index: gcc/tree-ssa-forwprop.c
===================================================================
--- gcc/tree-ssa-forwprop.c	2017-12-09 22:48:47.546825312 +0000
+++ gcc/tree-ssa-forwprop.c	2017-12-09 22:48:50.359942492 +0000
@@ -2119,10 +2119,7 @@ simplify_vector_constructor (gimple_stmt
 	  || GET_MODE_SIZE (TYPE_MODE (mask_type))
 	     != GET_MODE_SIZE (TYPE_MODE (type)))
 	return false;
-      tree_vector_builder mask_elts (mask_type, nelts, 1);
-      for (i = 0; i < nelts; i++)
-	mask_elts.quick_push (build_int_cst (TREE_TYPE (mask_type), sel[i]));
-      op2 = mask_elts.build ();
+      op2 = vec_perm_indices_to_tree (mask_type, indices);
       if (conv_code == ERROR_MARK)
 	gimple_assign_set_rhs_with_ops (gsi, VEC_PERM_EXPR, orig, orig, op2);
       else
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2017-12-09 22:48:47.547825355 +0000
+++ gcc/tree-vect-slp.c	2017-12-09 22:48:50.359942492 +0000
@@ -3675,13 +3675,7 @@ vect_transform_slp_perm_load (slp_tree n
 		  tree mask_vec = NULL_TREE;
 		  
 		  if (! noop_p)
-		    {
-		      tree_vector_builder mask_elts (mask_type, nunits, 1);
-		      for (int l = 0; l < nunits; ++l)
-			mask_elts.quick_push (build_int_cst (mask_element_type,
-							     mask[l]));
-		      mask_vec = mask_elts.build ();
-		    }
+		    mask_vec = vec_perm_indices_to_tree (mask_type, indices);
 
 		  if (second_vec_index == -1)
 		    second_vec_index = first_vec_index;
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2017-12-09 22:48:47.548825399 +0000
+++ gcc/tree-vect-stmts.c	2017-12-09 22:48:50.360942531 +0000
@@ -2529,10 +2529,7 @@ vectorizable_bswap (gimple *stmt, gimple
       return true;
     }
 
-  tree_vector_builder telts (char_vectype, num_bytes, 1);
-  for (unsigned i = 0; i < num_bytes; ++i)
-    telts.quick_push (build_int_cst (char_type_node, elts[i]));
-  tree bswap_vconst = telts.build ();
+  tree bswap_vconst = vec_perm_indices_to_tree (char_vectype, indices);
 
   /* Transform.  */
   vec<tree> vec_oprnds = vNULL;
@@ -6521,17 +6518,10 @@ vect_gen_perm_mask_any (tree vectype, co
 {
   tree mask_elt_type, mask_type;
 
-  unsigned int nunits = sel.length ();
-  gcc_checking_assert (nunits == TYPE_VECTOR_SUBPARTS (vectype));
-
   mask_elt_type = lang_hooks.types.type_for_mode
     (int_mode_for_mode (TYPE_MODE (TREE_TYPE (vectype))).require (), 1);
   mask_type = get_vectype_for_scalar_type (mask_elt_type);
-
-  tree_vector_builder mask_elts (mask_type, nunits, 1);
-  for (unsigned int i = 0; i < nunits; ++i)
-    mask_elts.quick_push (build_int_cst (mask_elt_type, sel[i]));
-  return mask_elts.build ();
+  return vec_perm_indices_to_tree (mask_type, sel);
 }
 
 /* Checked version of vect_gen_perm_mask_any.  Asserts can_vec_perm_const_p,

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [07/13] Make vec_perm_indices use new vector encoding
  2017-12-09 23:06 [00/13] Make VEC_PERM_EXPR work for variable-length vectors Richard Sandiford
                   ` (5 preceding siblings ...)
  2017-12-09 23:18 ` [06/13] Check whether a vector of QIs can store all indices Richard Sandiford
@ 2017-12-09 23:20 ` Richard Sandiford
  2017-12-12 15:32   ` Richard Biener
  2017-12-09 23:20 ` [08/13] Add a vec_perm_indices_to_tree helper function Richard Sandiford
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Richard Sandiford @ 2017-12-09 23:20 UTC (permalink / raw)
  To: gcc-patches

This patch changes vec_perm_indices from a plain vec<> to a class
that stores a canonicalised permutation, using the same encoding
as for VECTOR_CSTs.  This means that vec_perm_indices now carries
information about the number of vectors being permuted (currently
always 1 or 2) and the number of elements in each input vector.

A new vec_perm_builder class is used to actually build up the vector,
like tree_vector_builder does for trees.  vec_perm_indices is the
completed representation, a bit like VECTOR_CST is for trees.

The patch just does a mechanical conversion of the code to
vec_perm_builder: a later patch uses explicit encodings where possible.

The point of all this is that it makes the representation suitable
for variable-length vectors.  It's no longer necessary for the
underlying vec<>s to store every element explicitly.

In int-vector-builder.h, "using the same encoding as tree and rtx constants"
describes the endpoint -- adding the rtx encoding comes later.
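
To make the encoding concrete, here is a sketch (using the interfaces added
by this patch; nelt and the surrounding code are assumed rather than taken
from the patch) of how an "extract even elements" selector can be described
without listing every element:

  /* One pattern with three elements per pattern encodes { 0, 2, 4, 6, ... }
     for any number of output elements: elements beyond the encoded ones
     continue the series defined by the second and third encoded values.  */
  vec_perm_builder sel (nelt, 1, 3);
  for (unsigned int i = 0; i < 3; ++i)
    sel.quick_push (i * 2);

  /* Canonicalise for two inputs of nelt elements each.  */
  vec_perm_indices indices (sel, 2, nelt);

This patch itself still pushes every element explicitly in the converted
callers; compressed encodings like the one above are only introduced by a
later patch in the series.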


2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* int-vector-builder.h: New file.
	* vec-perm-indices.h: Include int-vector-builder.h.
	(vec_perm_indices): Redefine as an int_vector_builder.
	(auto_vec_perm_indices): Delete.
	(vec_perm_builder): Redefine as a stand-alone class.
	(vec_perm_indices::vec_perm_indices): New function.
	(vec_perm_indices::clamp): Likewise.
	* vec-perm-indices.c: Include fold-const.h and tree-vector-builder.h.
	(vec_perm_indices::new_vector): New function.
	(vec_perm_indices::new_expanded_vector): Update for new
	vec_perm_indices class.
	(vec_perm_indices::rotate_inputs): New function.
	(vec_perm_indices::all_in_range_p): Operate directly on the
	encoded form, without computing elided elements.
	(tree_to_vec_perm_builder): Operate directly on the VECTOR_CST
	encoding.  Update for new vec_perm_indices class.
	* optabs.c (expand_vec_perm_const): Create a vec_perm_indices for
	the given vec_perm_builder.
	(expand_vec_perm_var): Update vec_perm_builder constructor.
	(expand_mult_highpart): Use vec_perm_builder instead of
	auto_vec_perm_indices.
	* optabs-query.c (can_mult_highpart_p): Use vec_perm_builder and
	vec_perm_indices instead of auto_vec_perm_indices.  Use a single
	or double series encoding as appropriate.
	* fold-const.c (fold_ternary_loc): Use vec_perm_builder and
	vec_perm_indices instead of auto_vec_perm_indices.
	* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
	* tree-vect-data-refs.c (vect_grouped_store_supported): Likewise.
	(vect_permute_store_chain): Likewise.
	(vect_grouped_load_supported): Likewise.
	(vect_permute_load_chain): Likewise.
	(vect_shift_permute_load_chain): Likewise.
	* tree-vect-slp.c (vect_build_slp_tree_1): Likewise.
	(vect_transform_slp_perm_load): Likewise.
	(vect_schedule_slp_instance): Likewise.
	* tree-vect-stmts.c (perm_mask_for_reverse): Likewise.
	(vectorizable_mask_load_store): Likewise.
	(vectorizable_bswap): Likewise.
	(vectorizable_store): Likewise.
	(vectorizable_load): Likewise.
	* tree-vect-generic.c (lower_vec_perm): Use vec_perm_builder and
	vec_perm_indices instead of auto_vec_perm_indices.  Use
	tree_to_vec_perm_builder to read the vector from a tree.
	* tree-vect-loop.c (calc_vec_perm_mask_for_shift): Take a
	vec_perm_builder instead of a vec_perm_indices.
	(have_whole_vector_shift): Use vec_perm_builder and
	vec_perm_indices instead of auto_vec_perm_indices.  Leave the
	truncation to calc_vec_perm_mask_for_shift.
	(vect_create_epilog_for_reduction): Likewise.
	* config/aarch64/aarch64.c (expand_vec_perm_d::perm): Change
	from auto_vec_perm_indices to vec_perm_indices.
	(aarch64_expand_vec_perm_const_1): Use rotate_inputs on d.perm
	instead of changing individual elements.
	(aarch64_vectorize_vec_perm_const): Use new_vector to install
	the vector in d.perm.
	* config/arm/arm.c (expand_vec_perm_d::perm): Change
	from auto_vec_perm_indices to vec_perm_indices.
	(arm_expand_vec_perm_const_1): Use rotate_inputs on d.perm
	instead of changing individual elements.
	(arm_vectorize_vec_perm_const): Use new_vector to install
	the vector in d.perm.
	* config/powerpcspe/powerpcspe.c (rs6000_expand_extract_even):
	Update vec_perm_builder constructor.
	(rs6000_expand_interleave): Likewise.
	* config/rs6000/rs6000.c (rs6000_expand_extract_even): Likewise.
	(rs6000_expand_interleave): Likewise.

Index: gcc/int-vector-builder.h
===================================================================
--- /dev/null	2017-12-09 13:59:56.352713187 +0000
+++ gcc/int-vector-builder.h	2017-12-09 22:48:47.545825268 +0000
@@ -0,0 +1,90 @@
+/* A class for building vector integer constants.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_INT_VECTOR_BUILDER_H
+#define GCC_INT_VECTOR_BUILDER_H 1
+
+#include "vector-builder.h"
+
+/* This class is used to build vectors of integer type T using the same
+   encoding as tree and rtx constants.  See vector_builder for more
+   details.  */
+template<typename T>
+class int_vector_builder : public vector_builder<T, int_vector_builder<T> >
+{
+  typedef vector_builder<T, int_vector_builder> parent;
+  friend class vector_builder<T, int_vector_builder>;
+
+public:
+  int_vector_builder () {}
+  int_vector_builder (unsigned int, unsigned int, unsigned int);
+
+  using parent::new_vector;
+
+private:
+  bool equal_p (T, T) const;
+  bool allow_steps_p () const { return true; }
+  bool integral_p (T) const { return true; }
+  T step (T, T) const;
+  T apply_step (T, unsigned int, T) const;
+  bool can_elide_p (T) const { return true; }
+  void note_representative (T *, T) {}
+};
+
+/* Create a new builder for a vector with FULL_NELTS elements.
+   Initially encode the value as NPATTERNS interleaved patterns with
+   NELTS_PER_PATTERN elements each.  */
+
+template<typename T>
+inline
+int_vector_builder<T>::int_vector_builder (unsigned int full_nelts,
+					   unsigned int npatterns,
+					   unsigned int nelts_per_pattern)
+{
+  new_vector (full_nelts, npatterns, nelts_per_pattern);
+}
+
+/* Return true if elements ELT1 and ELT2 are equal.  */
+
+template<typename T>
+inline bool
+int_vector_builder<T>::equal_p (T elt1, T elt2) const
+{
+  return elt1 == elt2;
+}
+
+/* Return the value of element ELT2 minus the value of element ELT1.  */
+
+template<typename T>
+inline T
+int_vector_builder<T>::step (T elt1, T elt2) const
+{
+  return elt2 - elt1;
+}
+
+/* Return a vector element with the value BASE + FACTOR * STEP.  */
+
+template<typename T>
+inline T
+int_vector_builder<T>::apply_step (T base, unsigned int factor, T step) const
+{
+  return base + factor * step;
+}
+
+#endif
Index: gcc/vec-perm-indices.h
===================================================================
--- gcc/vec-perm-indices.h	2017-12-09 22:47:27.885318101 +0000
+++ gcc/vec-perm-indices.h	2017-12-09 22:48:47.548825399 +0000
@@ -20,30 +20,102 @@ Software Foundation; either version 3, o
 #ifndef GCC_VEC_PERN_INDICES_H
 #define GCC_VEC_PERN_INDICES_H 1
 
+#include "int-vector-builder.h"
+
+/* A vector_builder for building constant permutation vectors.
+   The elements do not need to be clamped to a particular range
+   of input elements.  */
+typedef int_vector_builder<HOST_WIDE_INT> vec_perm_builder;
+
 /* This class represents a constant permutation vector, such as that used
-   as the final operand to a VEC_PERM_EXPR.  */
-class vec_perm_indices : public auto_vec<unsigned short, 32>
+   as the final operand to a VEC_PERM_EXPR.  The vector is canonicalized
+   for a particular number of input vectors and for a particular number
+   of elements per input.  The class copes with cases in which the
+   input and output vectors have different numbers of elements.  */
+class vec_perm_indices
 {
-  typedef unsigned short element_type;
-  typedef auto_vec<element_type, 32> parent_type;
+  typedef HOST_WIDE_INT element_type;
 
 public:
-  vec_perm_indices () {}
-  vec_perm_indices (unsigned int nunits) : parent_type (nunits) {}
+  vec_perm_indices ();
+  vec_perm_indices (const vec_perm_builder &, unsigned int, unsigned int);
 
+  void new_vector (const vec_perm_builder &, unsigned int, unsigned int);
   void new_expanded_vector (const vec_perm_indices &, unsigned int);
+  void rotate_inputs (int delta);
+
+  /* Return the underlying vector encoding.  */
+  const vec_perm_builder &encoding () const { return m_encoding; }
+
+  /* Return the number of output elements.  This is called length ()
+     so that we present a more vec-like interface.  */
+  unsigned int length () const { return m_encoding.full_nelts (); }
+
+  /* Return the number of input vectors being permuted.  */
+  unsigned int ninputs () const { return m_ninputs; }
 
+  /* Return the number of elements in each input vector.  */
+  unsigned int nelts_per_input () const { return m_nelts_per_input; }
+
+  /* Return the total number of input elements.  */
+  unsigned int input_nelts () const { return m_ninputs * m_nelts_per_input; }
+
+  element_type clamp (element_type) const;
+  element_type operator[] (unsigned int i) const;
   bool all_in_range_p (element_type, element_type) const;
 
 private:
   vec_perm_indices (const vec_perm_indices &);
-};
 
-/* Temporary.  */
-typedef vec_perm_indices vec_perm_builder;
-typedef vec_perm_indices auto_vec_perm_indices;
+  vec_perm_builder m_encoding;
+  unsigned int m_ninputs;
+  unsigned int m_nelts_per_input;
+};
 
 bool tree_to_vec_perm_builder (vec_perm_builder *, tree);
 rtx vec_perm_indices_to_rtx (machine_mode, const vec_perm_indices &);
 
+inline
+vec_perm_indices::vec_perm_indices ()
+  : m_ninputs (0),
+    m_nelts_per_input (0)
+{
+}
+
+/* Construct a permutation vector that selects between NINPUTS vector
+   inputs that have NELTS_PER_INPUT elements each.  Take the elements of
+   the new vector from ELEMENTS, clamping each one to be in range.  */
+
+inline
+vec_perm_indices::vec_perm_indices (const vec_perm_builder &elements,
+				    unsigned int ninputs,
+				    unsigned int nelts_per_input)
+{
+  new_vector (elements, ninputs, nelts_per_input);
+}
+
+/* Return the canonical value for permutation vector element ELT,
+   taking into account the current number of input elements.  */
+
+inline vec_perm_indices::element_type
+vec_perm_indices::clamp (element_type elt) const
+{
+  element_type limit = input_nelts ();
+  elt %= limit;
+  /* Treat negative elements as counting from the end.  This only matters
+     if the vector size is not a power of 2.  */
+  if (elt < 0)
+    elt += limit;
+  return elt;
+}
+
+/* Return the value of vector element I, which might or might not be
+   explicitly encoded.  */
+
+inline vec_perm_indices::element_type
+vec_perm_indices::operator[] (unsigned int i) const
+{
+  return clamp (m_encoding.elt (i));
+}
+
 #endif
Index: gcc/vec-perm-indices.c
===================================================================
--- gcc/vec-perm-indices.c	2017-12-09 22:47:27.885318101 +0000
+++ gcc/vec-perm-indices.c	2017-12-09 22:48:47.548825399 +0000
@@ -22,11 +22,33 @@ Software Foundation; either version 3, o
 #include "coretypes.h"
 #include "vec-perm-indices.h"
 #include "tree.h"
+#include "fold-const.h"
+#include "tree-vector-builder.h"
 #include "backend.h"
 #include "rtl.h"
 #include "memmodel.h"
 #include "emit-rtl.h"
 
+/* Switch to a new permutation vector that selects between NINPUTS vector
+   inputs that have NELTS_PER_INPUT elements each.  Take the elements of the
+   new permutation vector from ELEMENTS, clamping each one to be in range.  */
+
+void
+vec_perm_indices::new_vector (const vec_perm_builder &elements,
+			      unsigned int ninputs,
+			      unsigned int nelts_per_input)
+{
+  m_ninputs = ninputs;
+  m_nelts_per_input = nelts_per_input;
+  /* Expand the encoding and clamp each element.  E.g. { 0, 2, 4, ... }
+     might wrap halfway if there is only one vector input.  */
+  unsigned int full_nelts = elements.full_nelts ();
+  m_encoding.new_vector (full_nelts, full_nelts, 1);
+  for (unsigned int i = 0; i < full_nelts; ++i)
+    m_encoding.quick_push (clamp (elements.elt (i)));
+  m_encoding.finalize ();
+}
+
 /* Switch to a new permutation vector that selects the same input elements
    as ORIG, but with each element split into FACTOR pieces.  For example,
    if ORIG is { 1, 2, 0, 3 } and FACTOR is 2, the new permutation is
@@ -36,14 +58,31 @@ Software Foundation; either version 3, o
 vec_perm_indices::new_expanded_vector (const vec_perm_indices &orig,
 				       unsigned int factor)
 {
-  truncate (0);
-  reserve (orig.length () * factor);
-  for (unsigned int i = 0; i < orig.length (); ++i)
+  m_ninputs = orig.m_ninputs;
+  m_nelts_per_input = orig.m_nelts_per_input * factor;
+  m_encoding.new_vector (orig.m_encoding.full_nelts () * factor,
+			 orig.m_encoding.npatterns () * factor,
+			 orig.m_encoding.nelts_per_pattern ());
+  unsigned int encoded_nelts = orig.m_encoding.encoded_nelts ();
+  for (unsigned int i = 0; i < encoded_nelts; ++i)
     {
-      element_type base = orig[i] * factor;
+      element_type base = orig.m_encoding[i] * factor;
       for (unsigned int j = 0; j < factor; ++j)
-	quick_push (base + j);
+	m_encoding.quick_push (base + j);
     }
+  m_encoding.finalize ();
+}
+
+/* Rotate the inputs of the permutation right by DELTA inputs.  This changes
+   the values of the permutation vector but it doesn't change the way that
+   the elements are encoded.  */
+
+void
+vec_perm_indices::rotate_inputs (int delta)
+{
+  element_type element_delta = delta * m_nelts_per_input;
+  for (unsigned int i = 0; i < m_encoding.length (); ++i)
+    m_encoding[i] = clamp (m_encoding[i] + element_delta);
 }
 
 /* Return true if all elements of the permutation vector are in the range
@@ -52,9 +91,44 @@ vec_perm_indices::new_expanded_vector (c
 bool
 vec_perm_indices::all_in_range_p (element_type start, element_type size) const
 {
-  for (unsigned int i = 0; i < length (); ++i)
-    if ((*this)[i] < start || ((*this)[i] - start) >= size)
+  /* Check the first two elements of each pattern.  */
+  unsigned int npatterns = m_encoding.npatterns ();
+  unsigned int nelts_per_pattern = m_encoding.nelts_per_pattern ();
+  unsigned int base_nelts = npatterns * MIN (nelts_per_pattern, 2);
+  for (unsigned int i = 0; i < base_nelts; ++i)
+    if (m_encoding[i] < start || (m_encoding[i] - start) >= size)
       return false;
+
+  /* For stepped encodings, check the full range of the series.  */
+  if (nelts_per_pattern == 3)
+    {
+      element_type limit = input_nelts ();
+
+      /* The number of elements in each pattern beyond the first two
+	 that we checked above.  */
+      unsigned int step_nelts = (m_encoding.full_nelts () / npatterns) - 2;
+      for (unsigned int i = 0; i < npatterns; ++i)
+	{
+	  /* BASE1 has been checked but BASE2 hasn't.   */
+	  element_type base1 = m_encoding[i + npatterns];
+	  element_type base2 = m_encoding[i + base_nelts];
+
+	  /* The step to add to get from BASE1 to each subsequent value.  */
+	  element_type step = clamp (base2 - base1);
+
+	  /* STEP has no inherent sign, so a value near LIMIT can
+	     act as a negative step.  The series is in range if it
+	     is in range according to one of the two interpretations.
+
+	     Since we're dealing with clamped values, ELEMENT_TYPE is
+	     wide enough for overflow not to be a problem.  */
+	  element_type headroom_down = base1 - start;
+	  element_type headroom_up = size - headroom_down - 1;
+	  if (headroom_up < step * step_nelts
+	      && headroom_down < (limit - step) * step_nelts)
+	    return false;
+	}
+    }
   return true;
 }
 
@@ -65,15 +139,16 @@ vec_perm_indices::all_in_range_p (elemen
 bool
 tree_to_vec_perm_builder (vec_perm_builder *builder, tree cst)
 {
-  unsigned int nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (cst));
-  for (unsigned int i = 0; i < nelts; ++i)
-    if (!tree_fits_shwi_p (vector_cst_elt (cst, i)))
+  unsigned int encoded_nelts = vector_cst_encoded_nelts (cst);
+  for (unsigned int i = 0; i < encoded_nelts; ++i)
+    if (!tree_fits_shwi_p (VECTOR_CST_ENCODED_ELT (cst, i)))
       return false;
 
-  builder->reserve (nelts);
-  for (unsigned int i = 0; i < nelts; ++i)
-    builder->quick_push (tree_to_shwi (vector_cst_elt (cst, i))
-			 & (2 * nelts - 1));
+  builder->new_vector (TYPE_VECTOR_SUBPARTS (TREE_TYPE (cst)),
+		       VECTOR_CST_NPATTERNS (cst),
+		       VECTOR_CST_NELTS_PER_PATTERN (cst));
+  for (unsigned int i = 0; i < encoded_nelts; ++i)
+    builder->quick_push (tree_to_shwi (VECTOR_CST_ENCODED_ELT (cst, i)));
   return true;
 }
 
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	2017-12-09 22:47:27.881318099 +0000
+++ gcc/optabs.c	2017-12-09 22:48:47.546825312 +0000
@@ -5456,6 +5456,11 @@ expand_vec_perm_const (machine_mode mode
   rtx_insn *last = get_last_insn ();
 
   bool single_arg_p = rtx_equal_p (v0, v1);
+  /* Always specify two input vectors here and leave the target to handle
+     cases in which the inputs are equal.  Not all backends can cope with
+     the single-input representation when testing for a double-input
+     target instruction.  */
+  vec_perm_indices indices (sel, 2, GET_MODE_NUNITS (mode));
 
   /* See if this can be handled with a vec_shr.  We only do this if the
      second vector is all zeroes.  */
@@ -5468,7 +5473,7 @@ expand_vec_perm_const (machine_mode mode
       && (shift_code != CODE_FOR_nothing
 	  || shift_code_qi != CODE_FOR_nothing))
     {
-      rtx shift_amt = shift_amt_for_vec_perm_mask (mode, sel);
+      rtx shift_amt = shift_amt_for_vec_perm_mask (mode, indices);
       if (shift_amt)
 	{
 	  struct expand_operand ops[3];
@@ -5500,7 +5505,7 @@ expand_vec_perm_const (machine_mode mode
       else
 	v1 = force_reg (mode, v1);
 
-      if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, sel))
+      if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, indices))
 	return target;
     }
 
@@ -5509,7 +5514,7 @@ expand_vec_perm_const (machine_mode mode
   rtx target_qi = NULL_RTX, v0_qi = NULL_RTX, v1_qi = NULL_RTX;
   if (qimode != VOIDmode)
     {
-      qimode_indices.new_expanded_vector (sel, GET_MODE_UNIT_SIZE (mode));
+      qimode_indices.new_expanded_vector (indices, GET_MODE_UNIT_SIZE (mode));
       target_qi = gen_reg_rtx (qimode);
       v0_qi = gen_lowpart (qimode, v0);
       v1_qi = gen_lowpart (qimode, v1);
@@ -5536,7 +5541,7 @@ expand_vec_perm_const (machine_mode mode
      REQUIRED_SEL_MODE is OK.  */
   if (sel_mode != required_sel_mode)
     {
-      if (!selector_fits_mode_p (required_sel_mode, sel))
+      if (!selector_fits_mode_p (required_sel_mode, indices))
 	{
 	  delete_insns_since (last);
 	  return NULL_RTX;
@@ -5547,7 +5552,7 @@ expand_vec_perm_const (machine_mode mode
   insn_code icode = direct_optab_handler (vec_perm_optab, mode);
   if (icode != CODE_FOR_nothing)
     {
-      rtx sel_rtx = vec_perm_indices_to_rtx (sel_mode, sel);
+      rtx sel_rtx = vec_perm_indices_to_rtx (sel_mode, indices);
       rtx tmp = expand_vec_perm_1 (icode, target, v0, v1, sel_rtx);
       if (tmp)
 	return tmp;
@@ -5621,7 +5626,7 @@ expand_vec_perm_var (machine_mode mode,
   gcc_assert (sel != NULL);
 
   /* Broadcast the low byte each element into each of its bytes.  */
-  vec_perm_builder const_sel (w);
+  vec_perm_builder const_sel (w, w, 1);
   for (i = 0; i < w; ++i)
     {
       int this_e = i / u * u;
@@ -5848,7 +5853,7 @@ expand_mult_highpart (machine_mode mode,
   expand_insn (optab_handler (tab2, mode), 3, eops);
   m2 = gen_lowpart (mode, eops[0].value);
 
-  auto_vec_perm_indices sel (nunits);
+  vec_perm_builder sel (nunits, nunits, 1);
   if (method == 2)
     {
       for (i = 0; i < nunits; ++i)
Index: gcc/optabs-query.c
===================================================================
--- gcc/optabs-query.c	2017-12-09 22:47:27.881318099 +0000
+++ gcc/optabs-query.c	2017-12-09 22:48:47.545825268 +0000
@@ -501,12 +501,13 @@ can_mult_highpart_p (machine_mode mode,
       op = uns_p ? vec_widen_umult_odd_optab : vec_widen_smult_odd_optab;
       if (optab_handler (op, mode) != CODE_FOR_nothing)
 	{
-	  auto_vec_perm_indices sel (nunits);
+	  vec_perm_builder sel (nunits, nunits, 1);
 	  for (i = 0; i < nunits; ++i)
 	    sel.quick_push (!BYTES_BIG_ENDIAN
 			    + (i & ~1)
 			    + ((i & 1) ? nunits : 0));
-	  if (can_vec_perm_const_p (mode, sel))
+	  vec_perm_indices indices (sel, 2, nunits);
+	  if (can_vec_perm_const_p (mode, indices))
 	    return 2;
 	}
     }
@@ -517,10 +518,11 @@ can_mult_highpart_p (machine_mode mode,
       op = uns_p ? vec_widen_umult_lo_optab : vec_widen_smult_lo_optab;
       if (optab_handler (op, mode) != CODE_FOR_nothing)
 	{
-	  auto_vec_perm_indices sel (nunits);
+	  vec_perm_builder sel (nunits, nunits, 1);
 	  for (i = 0; i < nunits; ++i)
 	    sel.quick_push (2 * i + (BYTES_BIG_ENDIAN ? 0 : 1));
-	  if (can_vec_perm_const_p (mode, sel))
+	  vec_perm_indices indices (sel, 2, nunits);
+	  if (can_vec_perm_const_p (mode, indices))
 	    return 3;
 	}
     }
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	2017-12-09 22:47:27.881318099 +0000
+++ gcc/fold-const.c	2017-12-09 22:48:47.545825268 +0000
@@ -11217,7 +11217,7 @@ fold_ternary_loc (location_t loc, enum t
 	    {
 	      unsigned int nelts = VECTOR_CST_NELTS (arg0), i;
 	      gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));
-	      auto_vec_perm_indices sel (nelts);
+	      vec_perm_builder sel (nelts, nelts, 1);
 	      for (i = 0; i < nelts; i++)
 		{
 		  tree val = VECTOR_CST_ELT (arg0, i);
@@ -11228,7 +11228,8 @@ fold_ternary_loc (location_t loc, enum t
 		  else /* Currently unreachable.  */
 		    return NULL_TREE;
 		}
-	      tree t = fold_vec_perm (type, arg1, arg2, sel);
+	      tree t = fold_vec_perm (type, arg1, arg2,
+				      vec_perm_indices (sel, 2, nelts));
 	      if (t != NULL_TREE)
 		return t;
 	    }
@@ -11558,8 +11559,8 @@ fold_ternary_loc (location_t loc, enum t
 	  mask2 = 2 * nelts - 1;
 	  mask = single_arg ? (nelts - 1) : mask2;
 	  gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));
-	  auto_vec_perm_indices sel (nelts);
-	  auto_vec_perm_indices sel2 (nelts);
+	  vec_perm_builder sel (nelts, nelts, 1);
+	  vec_perm_builder sel2 (nelts, nelts, 1);
 	  for (i = 0; i < nelts; i++)
 	    {
 	      tree val = VECTOR_CST_ELT (arg2, i);
@@ -11604,12 +11605,13 @@ fold_ternary_loc (location_t loc, enum t
 	      need_mask_canon = true;
 	    }
 
+	  vec_perm_indices indices (sel, 2, nelts);
 	  if ((TREE_CODE (op0) == VECTOR_CST
 	       || TREE_CODE (op0) == CONSTRUCTOR)
 	      && (TREE_CODE (op1) == VECTOR_CST
 		  || TREE_CODE (op1) == CONSTRUCTOR))
 	    {
-	      tree t = fold_vec_perm (type, op0, op1, sel);
+	      tree t = fold_vec_perm (type, op0, op1, indices);
 	      if (t != NULL_TREE)
 		return t;
 	    }
@@ -11621,11 +11623,14 @@ fold_ternary_loc (location_t loc, enum t
 	     argument permutation while still allowing an equivalent
 	     2-argument version.  */
 	  if (need_mask_canon && arg2 == op2
-	      && !can_vec_perm_const_p (TYPE_MODE (type), sel, false)
-	      && can_vec_perm_const_p (TYPE_MODE (type), sel2, false))
+	      && !can_vec_perm_const_p (TYPE_MODE (type), indices, false)
+	      && can_vec_perm_const_p (TYPE_MODE (type),
+				       vec_perm_indices (sel2, 2, nelts),
+				       false))
 	    {
 	      need_mask_canon = need_mask_canon2;
-	      sel = sel2;
+	      sel.truncate (0);
+	      sel.splice (sel2);
 	    }
 
 	  if (need_mask_canon && arg2 == op2)
Index: gcc/tree-ssa-forwprop.c
===================================================================
--- gcc/tree-ssa-forwprop.c	2017-12-09 22:47:27.883318100 +0000
+++ gcc/tree-ssa-forwprop.c	2017-12-09 22:48:47.546825312 +0000
@@ -2019,7 +2019,7 @@ simplify_vector_constructor (gimple_stmt
   elem_type = TREE_TYPE (type);
   elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
 
-  auto_vec_perm_indices sel (nelts);
+  vec_perm_builder sel (nelts, nelts, 1);
   orig = NULL;
   conv_code = ERROR_MARK;
   maybe_ident = true;
@@ -2109,7 +2109,8 @@ simplify_vector_constructor (gimple_stmt
     {
       tree mask_type;
 
-      if (!can_vec_perm_const_p (TYPE_MODE (type), sel))
+      vec_perm_indices indices (sel, 1, nelts);
+      if (!can_vec_perm_const_p (TYPE_MODE (type), indices))
 	return false;
       mask_type
 	= build_vector_type (build_nonstandard_integer_type (elem_size, 1),
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2017-12-09 22:47:27.883318100 +0000
+++ gcc/tree-vect-data-refs.c	2017-12-09 22:48:47.546825312 +0000
@@ -4566,7 +4566,7 @@ vect_grouped_store_supported (tree vecty
   if (VECTOR_MODE_P (mode))
     {
       unsigned int i, nelt = GET_MODE_NUNITS (mode);
-      auto_vec_perm_indices sel (nelt);
+      vec_perm_builder sel (nelt, nelt, 1);
       sel.quick_grow (nelt);
 
       if (count == 3)
@@ -4574,6 +4574,7 @@ vect_grouped_store_supported (tree vecty
 	  unsigned int j0 = 0, j1 = 0, j2 = 0;
 	  unsigned int i, j;
 
+	  vec_perm_indices indices;
 	  for (j = 0; j < 3; j++)
 	    {
 	      int nelt0 = ((3 - j) * nelt) % 3;
@@ -4588,7 +4589,8 @@ vect_grouped_store_supported (tree vecty
 		  if (3 * i + nelt2 < nelt)
 		    sel[3 * i + nelt2] = 0;
 		}
-	      if (!can_vec_perm_const_p (mode, sel))
+	      indices.new_vector (sel, 2, nelt);
+	      if (!can_vec_perm_const_p (mode, indices))
 		{
 		  if (dump_enabled_p ())
 		    dump_printf (MSG_MISSED_OPTIMIZATION,
@@ -4605,7 +4607,8 @@ vect_grouped_store_supported (tree vecty
 		  if (3 * i + nelt2 < nelt)
 		    sel[3 * i + nelt2] = nelt + j2++;
 		}
-	      if (!can_vec_perm_const_p (mode, sel))
+	      indices.new_vector (sel, 2, nelt);
+	      if (!can_vec_perm_const_p (mode, indices))
 		{
 		  if (dump_enabled_p ())
 		    dump_printf (MSG_MISSED_OPTIMIZATION,
@@ -4625,11 +4628,13 @@ vect_grouped_store_supported (tree vecty
 	      sel[i * 2] = i;
 	      sel[i * 2 + 1] = i + nelt;
 	    }
-	  if (can_vec_perm_const_p (mode, sel))
+	  vec_perm_indices indices (sel, 2, nelt);
+	  if (can_vec_perm_const_p (mode, indices))
 	    {
 	      for (i = 0; i < nelt; i++)
 		sel[i] += nelt / 2;
-	      if (can_vec_perm_const_p (mode, sel))
+	      indices.new_vector (sel, 2, nelt);
+	      if (can_vec_perm_const_p (mode, indices))
 		return true;
 	    }
 	}
@@ -4731,7 +4736,7 @@ vect_permute_store_chain (vec<tree> dr_c
   unsigned int i, n, log_length = exact_log2 (length);
   unsigned int j, nelt = TYPE_VECTOR_SUBPARTS (vectype);
 
-  auto_vec_perm_indices sel (nelt);
+  vec_perm_builder sel (nelt, nelt, 1);
   sel.quick_grow (nelt);
 
   result_chain->quick_grow (length);
@@ -4742,6 +4747,7 @@ vect_permute_store_chain (vec<tree> dr_c
     {
       unsigned int j0 = 0, j1 = 0, j2 = 0;
 
+      vec_perm_indices indices;
       for (j = 0; j < 3; j++)
         {
 	  int nelt0 = ((3 - j) * nelt) % 3;
@@ -4757,7 +4763,8 @@ vect_permute_store_chain (vec<tree> dr_c
 	      if (3 * i + nelt2 < nelt)
 		sel[3 * i + nelt2] = 0;
 	    }
-	  perm3_mask_low = vect_gen_perm_mask_checked (vectype, sel);
+	  indices.new_vector (sel, 2, nelt);
+	  perm3_mask_low = vect_gen_perm_mask_checked (vectype, indices);
 
 	  for (i = 0; i < nelt; i++)
 	    {
@@ -4768,7 +4775,8 @@ vect_permute_store_chain (vec<tree> dr_c
 	      if (3 * i + nelt2 < nelt)
 		sel[3 * i + nelt2] = nelt + j2++;
 	    }
-	  perm3_mask_high = vect_gen_perm_mask_checked (vectype, sel);
+	  indices.new_vector (sel, 2, nelt);
+	  perm3_mask_high = vect_gen_perm_mask_checked (vectype, indices);
 
 	  vect1 = dr_chain[0];
 	  vect2 = dr_chain[1];
@@ -4805,11 +4813,13 @@ vect_permute_store_chain (vec<tree> dr_c
 	  sel[i * 2] = i;
 	  sel[i * 2 + 1] = i + nelt;
 	}
-	perm_mask_high = vect_gen_perm_mask_checked (vectype, sel);
+	vec_perm_indices indices (sel, 2, nelt);
+	perm_mask_high = vect_gen_perm_mask_checked (vectype, indices);
 
 	for (i = 0; i < nelt; i++)
 	  sel[i] += nelt / 2;
-	perm_mask_low = vect_gen_perm_mask_checked (vectype, sel);
+	indices.new_vector (sel, 2, nelt);
+	perm_mask_low = vect_gen_perm_mask_checked (vectype, indices);
 
 	for (i = 0, n = log_length; i < n; i++)
 	  {
@@ -5154,11 +5164,12 @@ vect_grouped_load_supported (tree vectyp
   if (VECTOR_MODE_P (mode))
     {
       unsigned int i, j, nelt = GET_MODE_NUNITS (mode);
-      auto_vec_perm_indices sel (nelt);
+      vec_perm_builder sel (nelt, nelt, 1);
       sel.quick_grow (nelt);
 
       if (count == 3)
 	{
+	  vec_perm_indices indices;
 	  unsigned int k;
 	  for (k = 0; k < 3; k++)
 	    {
@@ -5167,7 +5178,8 @@ vect_grouped_load_supported (tree vectyp
 		  sel[i] = 3 * i + k;
 		else
 		  sel[i] = 0;
-	      if (!can_vec_perm_const_p (mode, sel))
+	      indices.new_vector (sel, 2, nelt);
+	      if (!can_vec_perm_const_p (mode, indices))
 		{
 		  if (dump_enabled_p ())
 		    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5180,7 +5192,8 @@ vect_grouped_load_supported (tree vectyp
 		  sel[i] = i;
 		else
 		  sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++);
-	      if (!can_vec_perm_const_p (mode, sel))
+	      indices.new_vector (sel, 2, nelt);
+	      if (!can_vec_perm_const_p (mode, indices))
 		{
 		  if (dump_enabled_p ())
 		    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5195,13 +5208,16 @@ vect_grouped_load_supported (tree vectyp
 	{
 	  /* If length is not equal to 3 then only power of 2 is supported.  */
 	  gcc_assert (pow2p_hwi (count));
+
 	  for (i = 0; i < nelt; i++)
 	    sel[i] = i * 2;
-	  if (can_vec_perm_const_p (mode, sel))
+	  vec_perm_indices indices (sel, 2, nelt);
+	  if (can_vec_perm_const_p (mode, indices))
 	    {
 	      for (i = 0; i < nelt; i++)
 		sel[i] = i * 2 + 1;
-	      if (can_vec_perm_const_p (mode, sel))
+	      indices.new_vector (sel, 2, nelt);
+	      if (can_vec_perm_const_p (mode, indices))
 		return true;
 	    }
         }
@@ -5316,7 +5332,7 @@ vect_permute_load_chain (vec<tree> dr_ch
   unsigned int i, j, log_length = exact_log2 (length);
   unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype);
 
-  auto_vec_perm_indices sel (nelt);
+  vec_perm_builder sel (nelt, nelt, 1);
   sel.quick_grow (nelt);
 
   result_chain->quick_grow (length);
@@ -5327,6 +5343,7 @@ vect_permute_load_chain (vec<tree> dr_ch
     {
       unsigned int k;
 
+      vec_perm_indices indices;
       for (k = 0; k < 3; k++)
 	{
 	  for (i = 0; i < nelt; i++)
@@ -5334,15 +5351,16 @@ vect_permute_load_chain (vec<tree> dr_ch
 	      sel[i] = 3 * i + k;
 	    else
 	      sel[i] = 0;
-	  perm3_mask_low = vect_gen_perm_mask_checked (vectype, sel);
+	  indices.new_vector (sel, 2, nelt);
+	  perm3_mask_low = vect_gen_perm_mask_checked (vectype, indices);
 
 	  for (i = 0, j = 0; i < nelt; i++)
 	    if (3 * i + k < 2 * nelt)
 	      sel[i] = i;
 	    else
 	      sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++);
-
-	  perm3_mask_high = vect_gen_perm_mask_checked (vectype, sel);
+	  indices.new_vector (sel, 2, nelt);
+	  perm3_mask_high = vect_gen_perm_mask_checked (vectype, indices);
 
 	  first_vect = dr_chain[0];
 	  second_vect = dr_chain[1];
@@ -5374,11 +5392,13 @@ vect_permute_load_chain (vec<tree> dr_ch
 
       for (i = 0; i < nelt; ++i)
 	sel[i] = i * 2;
-      perm_mask_even = vect_gen_perm_mask_checked (vectype, sel);
+      vec_perm_indices indices (sel, 2, nelt);
+      perm_mask_even = vect_gen_perm_mask_checked (vectype, indices);
 
       for (i = 0; i < nelt; ++i)
 	sel[i] = i * 2 + 1;
-      perm_mask_odd = vect_gen_perm_mask_checked (vectype, sel);
+      indices.new_vector (sel, 2, nelt);
+      perm_mask_odd = vect_gen_perm_mask_checked (vectype, indices);
 
       for (i = 0; i < log_length; i++)
 	{
@@ -5514,7 +5534,7 @@ vect_shift_permute_load_chain (vec<tree>
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
 
-  auto_vec_perm_indices sel (nelt);
+  vec_perm_builder sel (nelt, nelt, 1);
   sel.quick_grow (nelt);
 
   result_chain->quick_grow (length);
@@ -5528,7 +5548,8 @@ vect_shift_permute_load_chain (vec<tree>
 	sel[i] = i * 2;
       for (i = 0; i < nelt / 2; ++i)
 	sel[nelt / 2 + i] = i * 2 + 1;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      vec_perm_indices indices (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5536,13 +5557,14 @@ vect_shift_permute_load_chain (vec<tree>
 			      supported by target\n");
 	  return false;
 	}
-      perm2_mask1 = vect_gen_perm_mask_checked (vectype, sel);
+      perm2_mask1 = vect_gen_perm_mask_checked (vectype, indices);
 
       for (i = 0; i < nelt / 2; ++i)
 	sel[i] = i * 2 + 1;
       for (i = 0; i < nelt / 2; ++i)
 	sel[nelt / 2 + i] = i * 2;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5550,20 +5572,21 @@ vect_shift_permute_load_chain (vec<tree>
 			      supported by target\n");
 	  return false;
 	}
-      perm2_mask2 = vect_gen_perm_mask_checked (vectype, sel);
+      perm2_mask2 = vect_gen_perm_mask_checked (vectype, indices);
 
       /* Generating permutation constant to shift all elements.
 	 For vector length 8 it is {4 5 6 7 8 9 10 11}.  */
       for (i = 0; i < nelt; i++)
 	sel[i] = nelt / 2 + i;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "shift permutation is not supported by target\n");
 	  return false;
 	}
-      shift1_mask = vect_gen_perm_mask_checked (vectype, sel);
+      shift1_mask = vect_gen_perm_mask_checked (vectype, indices);
 
       /* Generating permutation constant to select vector from 2.
 	 For vector length 8 it is {0 1 2 3 12 13 14 15}.  */
@@ -5571,14 +5594,15 @@ vect_shift_permute_load_chain (vec<tree>
 	sel[i] = i;
       for (i = nelt / 2; i < nelt; i++)
 	sel[i] = nelt + i;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "select is not supported by target\n");
 	  return false;
 	}
-      select_mask = vect_gen_perm_mask_checked (vectype, sel);
+      select_mask = vect_gen_perm_mask_checked (vectype, indices);
 
       for (i = 0; i < log_length; i++)
 	{
@@ -5634,7 +5658,8 @@ vect_shift_permute_load_chain (vec<tree>
 	  sel[i] = 3 * k + (l % 3);
 	  k++;
 	}
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      vec_perm_indices indices (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5642,59 +5667,63 @@ vect_shift_permute_load_chain (vec<tree>
 			      supported by target\n");
 	  return false;
 	}
-      perm3_mask = vect_gen_perm_mask_checked (vectype, sel);
+      perm3_mask = vect_gen_perm_mask_checked (vectype, indices);
 
       /* Generating permutation constant to shift all elements.
 	 For vector length 8 it is {6 7 8 9 10 11 12 13}.  */
       for (i = 0; i < nelt; i++)
 	sel[i] = 2 * (nelt / 3) + (nelt % 3) + i;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "shift permutation is not supported by target\n");
 	  return false;
 	}
-      shift1_mask = vect_gen_perm_mask_checked (vectype, sel);
+      shift1_mask = vect_gen_perm_mask_checked (vectype, indices);
 
       /* Generating permutation constant to shift all elements.
 	 For vector length 8 it is {5 6 7 8 9 10 11 12}.  */
       for (i = 0; i < nelt; i++)
 	sel[i] = 2 * (nelt / 3) + 1 + i;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "shift permutation is not supported by target\n");
 	  return false;
 	}
-      shift2_mask = vect_gen_perm_mask_checked (vectype, sel);
+      shift2_mask = vect_gen_perm_mask_checked (vectype, indices);
 
       /* Generating permutation constant to shift all elements.
 	 For vector length 8 it is {3 4 5 6 7 8 9 10}.  */
       for (i = 0; i < nelt; i++)
 	sel[i] = (nelt / 3) + (nelt % 3) / 2 + i;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "shift permutation is not supported by target\n");
 	  return false;
 	}
-      shift3_mask = vect_gen_perm_mask_checked (vectype, sel);
+      shift3_mask = vect_gen_perm_mask_checked (vectype, indices);
 
       /* Generating permutation constant to shift all elements.
 	 For vector length 8 it is {5 6 7 8 9 10 11 12}.  */
       for (i = 0; i < nelt; i++)
 	sel[i] = 2 * (nelt / 3) + (nelt % 3) / 2 + i;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "shift permutation is not supported by target\n");
 	  return false;
 	}
-      shift4_mask = vect_gen_perm_mask_checked (vectype, sel);
+      shift4_mask = vect_gen_perm_mask_checked (vectype, indices);
 
       for (k = 0; k < 3; k++)
 	{
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2017-12-09 22:47:27.884318101 +0000
+++ gcc/tree-vect-slp.c	2017-12-09 22:48:47.547825355 +0000
@@ -894,7 +894,7 @@ vect_build_slp_tree_1 (vec_info *vinfo,
       && TREE_CODE_CLASS (alt_stmt_code) != tcc_reference)
     {
       unsigned int count = TYPE_VECTOR_SUBPARTS (vectype);
-      auto_vec_perm_indices sel (count);
+      vec_perm_builder sel (count, count, 1);
       for (i = 0; i < count; ++i)
 	{
 	  unsigned int elt = i;
@@ -902,7 +902,8 @@ vect_build_slp_tree_1 (vec_info *vinfo,
 	    elt += count;
 	  sel.quick_push (elt);
 	}
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      vec_perm_indices indices (sel, 2, count);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  for (i = 0; i < group_size; ++i)
 	    if (gimple_assign_rhs_code (stmts[i]) == alt_stmt_code)
@@ -3570,8 +3571,9 @@ vect_transform_slp_perm_load (slp_tree n
     (int_mode_for_mode (TYPE_MODE (TREE_TYPE (vectype))).require (), 1);
   mask_type = get_vectype_for_scalar_type (mask_element_type);
   nunits = TYPE_VECTOR_SUBPARTS (vectype);
-  auto_vec_perm_indices mask (nunits);
+  vec_perm_builder mask (nunits, nunits, 1);
   mask.quick_grow (nunits);
+  vec_perm_indices indices;
 
   /* Initialize the vect stmts of NODE to properly insert the generated
      stmts later.  */
@@ -3644,10 +3646,10 @@ vect_transform_slp_perm_load (slp_tree n
 	    noop_p = false;
 	  mask[index++] = mask_element;
 
-	  if (index == nunits)
+	  if (index == nunits && !noop_p)
 	    {
-	      if (! noop_p
-		  && ! can_vec_perm_const_p (mode, mask))
+	      indices.new_vector (mask, 2, nunits);
+	      if (!can_vec_perm_const_p (mode, indices))
 		{
 		  if (dump_enabled_p ())
 		    {
@@ -3655,16 +3657,19 @@ vect_transform_slp_perm_load (slp_tree n
 				       vect_location, 
 				       "unsupported vect permute { ");
 		      for (i = 0; i < nunits; ++i)
-			dump_printf (MSG_MISSED_OPTIMIZATION, "%d ", mask[i]);
+			dump_printf (MSG_MISSED_OPTIMIZATION,
+				     HOST_WIDE_INT_PRINT_DEC " ", mask[i]);
 		      dump_printf (MSG_MISSED_OPTIMIZATION, "}\n");
 		    }
 		  gcc_assert (analyze_only);
 		  return false;
 		}
 
-	      if (! noop_p)
-		++*n_perms;
+	      ++*n_perms;
+	    }
 
+	  if (index == nunits)
+	    {
 	      if (!analyze_only)
 		{
 		  tree mask_vec = NULL_TREE;
@@ -3797,7 +3802,7 @@ vect_schedule_slp_instance (slp_tree nod
       enum tree_code code0 = gimple_assign_rhs_code (stmt);
       enum tree_code ocode = ERROR_MARK;
       gimple *ostmt;
-      auto_vec_perm_indices mask (group_size);
+      vec_perm_builder mask (group_size, group_size, 1);
       FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, ostmt)
 	if (gimple_assign_rhs_code (ostmt) != code0)
 	  {
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2017-12-09 22:47:27.885318101 +0000
+++ gcc/tree-vect-stmts.c	2017-12-09 22:48:47.548825399 +0000
@@ -1717,13 +1717,14 @@ perm_mask_for_reverse (tree vectype)
 
   nunits = TYPE_VECTOR_SUBPARTS (vectype);
 
-  auto_vec_perm_indices sel (nunits);
+  vec_perm_builder sel (nunits, nunits, 1);
   for (i = 0; i < nunits; ++i)
     sel.quick_push (nunits - 1 - i);
 
-  if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+  vec_perm_indices indices (sel, 1, nunits);
+  if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
     return NULL_TREE;
-  return vect_gen_perm_mask_checked (vectype, sel);
+  return vect_gen_perm_mask_checked (vectype, indices);
 }
 
 /* A subroutine of get_load_store_type, with a subset of the same
@@ -2185,27 +2186,32 @@ vectorizable_mask_load_store (gimple *st
 	{
 	  modifier = WIDEN;
 
-	  auto_vec_perm_indices sel (gather_off_nunits);
+	  vec_perm_builder sel (gather_off_nunits, gather_off_nunits, 1);
 	  for (i = 0; i < gather_off_nunits; ++i)
 	    sel.quick_push (i | nunits);
 
-	  perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel);
+	  vec_perm_indices indices (sel, 1, gather_off_nunits);
+	  perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype,
+						  indices);
 	}
       else if (nunits == gather_off_nunits * 2)
 	{
 	  modifier = NARROW;
 
-	  auto_vec_perm_indices sel (nunits);
+	  vec_perm_builder sel (nunits, nunits, 1);
 	  sel.quick_grow (nunits);
 	  for (i = 0; i < nunits; ++i)
 	    sel[i] = i < gather_off_nunits
 		     ? i : i + nunits - gather_off_nunits;
+	  vec_perm_indices indices (sel, 2, nunits);
+	  perm_mask = vect_gen_perm_mask_checked (vectype, indices);
 
-	  perm_mask = vect_gen_perm_mask_checked (vectype, sel);
 	  ncopies *= 2;
+
 	  for (i = 0; i < nunits; ++i)
 	    sel[i] = i | gather_off_nunits;
-	  mask_perm_mask = vect_gen_perm_mask_checked (masktype, sel);
+	  indices.new_vector (sel, 2, gather_off_nunits);
+	  mask_perm_mask = vect_gen_perm_mask_checked (masktype, indices);
 	}
       else
 	gcc_unreachable ();
@@ -2498,12 +2504,13 @@ vectorizable_bswap (gimple *stmt, gimple
   unsigned int num_bytes = TYPE_VECTOR_SUBPARTS (char_vectype);
   unsigned word_bytes = num_bytes / nunits;
 
-  auto_vec_perm_indices elts (num_bytes);
+  vec_perm_builder elts (num_bytes, num_bytes, 1);
   for (unsigned i = 0; i < nunits; ++i)
     for (unsigned j = 0; j < word_bytes; ++j)
       elts.quick_push ((i + 1) * word_bytes - j - 1);
 
-  if (!can_vec_perm_const_p (TYPE_MODE (char_vectype), elts))
+  vec_perm_indices indices (elts, 1, num_bytes);
+  if (!can_vec_perm_const_p (TYPE_MODE (char_vectype), indices))
     return false;
 
   if (! vec_stmt)
@@ -5809,22 +5816,25 @@ vectorizable_store (gimple *stmt, gimple
 	{
 	  modifier = WIDEN;
 
-	  auto_vec_perm_indices sel (scatter_off_nunits);
+	  vec_perm_builder sel (scatter_off_nunits, scatter_off_nunits, 1);
 	  for (i = 0; i < (unsigned int) scatter_off_nunits; ++i)
 	    sel.quick_push (i | nunits);
 
-	  perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel);
+	  vec_perm_indices indices (sel, 1, scatter_off_nunits);
+	  perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype,
+						  indices);
 	  gcc_assert (perm_mask != NULL_TREE);
 	}
       else if (nunits == (unsigned int) scatter_off_nunits * 2)
 	{
 	  modifier = NARROW;
 
-	  auto_vec_perm_indices sel (nunits);
+	  vec_perm_builder sel (nunits, nunits, 1);
 	  for (i = 0; i < (unsigned int) nunits; ++i)
 	    sel.quick_push (i | scatter_off_nunits);
 
-	  perm_mask = vect_gen_perm_mask_checked (vectype, sel);
+	  vec_perm_indices indices (sel, 2, nunits);
+	  perm_mask = vect_gen_perm_mask_checked (vectype, indices);
 	  gcc_assert (perm_mask != NULL_TREE);
 	  ncopies *= 2;
 	}
@@ -6845,22 +6855,25 @@ vectorizable_load (gimple *stmt, gimple_
 	{
 	  modifier = WIDEN;
 
-	  auto_vec_perm_indices sel (gather_off_nunits);
+	  vec_perm_builder sel (gather_off_nunits, gather_off_nunits, 1);
 	  for (i = 0; i < gather_off_nunits; ++i)
 	    sel.quick_push (i | nunits);
 
-	  perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel);
+	  vec_perm_indices indices (sel, 1, gather_off_nunits);
+	  perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype,
+						  indices);
 	}
       else if (nunits == gather_off_nunits * 2)
 	{
 	  modifier = NARROW;
 
-	  auto_vec_perm_indices sel (nunits);
+	  vec_perm_builder sel (nunits, nunits, 1);
 	  for (i = 0; i < nunits; ++i)
 	    sel.quick_push (i < gather_off_nunits
 			    ? i : i + nunits - gather_off_nunits);
 
-	  perm_mask = vect_gen_perm_mask_checked (vectype, sel);
+	  vec_perm_indices indices (sel, 2, nunits);
+	  perm_mask = vect_gen_perm_mask_checked (vectype, indices);
 	  ncopies *= 2;
 	}
       else
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	2017-12-09 22:47:27.883318100 +0000
+++ gcc/tree-vect-generic.c	2017-12-09 22:48:47.547825355 +0000
@@ -1299,15 +1299,13 @@ lower_vec_perm (gimple_stmt_iterator *gs
 	mask = gimple_assign_rhs1 (def_stmt);
     }
 
-  if (TREE_CODE (mask) == VECTOR_CST)
-    {
-      auto_vec_perm_indices sel_int (elements);
-
-      for (i = 0; i < elements; ++i)
-	sel_int.quick_push (TREE_INT_CST_LOW (VECTOR_CST_ELT (mask, i))
-			    & (2 * elements - 1));
+  vec_perm_builder sel_int;
 
-      if (can_vec_perm_const_p (TYPE_MODE (vect_type), sel_int))
+  if (TREE_CODE (mask) == VECTOR_CST
+      && tree_to_vec_perm_builder (&sel_int, mask))
+    {
+      vec_perm_indices indices (sel_int, 2, elements);
+      if (can_vec_perm_const_p (TYPE_MODE (vect_type), indices))
 	{
 	  gimple_assign_set_rhs3 (stmt, mask);
 	  update_stmt (stmt);
@@ -1319,14 +1317,14 @@ lower_vec_perm (gimple_stmt_iterator *gs
 	  != CODE_FOR_nothing
 	  && TREE_CODE (vec1) == VECTOR_CST
 	  && initializer_zerop (vec1)
-	  && sel_int[0]
-	  && sel_int[0] < elements)
+	  && indices[0]
+	  && indices[0] < elements)
 	{
 	  for (i = 1; i < elements; ++i)
 	    {
-	      unsigned int expected = i + sel_int[0];
+	      unsigned int expected = i + indices[0];
 	      /* Indices into the second vector are all equivalent.  */
-	      if (MIN (elements, (unsigned) sel_int[i])
+	      if (MIN (elements, (unsigned) indices[i])
 		  != MIN (elements, expected))
  		break;
 	    }
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2017-12-09 22:47:27.884318101 +0000
+++ gcc/tree-vect-loop.c	2017-12-09 22:48:47.547825355 +0000
@@ -3714,12 +3714,11 @@ vect_estimate_min_profitable_iters (loop
    vector elements (not bits) for a vector with NELT elements.  */
 static void
 calc_vec_perm_mask_for_shift (unsigned int offset, unsigned int nelt,
-			      vec_perm_indices *sel)
+			      vec_perm_builder *sel)
 {
-  unsigned int i;
-
-  for (i = 0; i < nelt; i++)
-    sel->quick_push ((i + offset) & (2 * nelt - 1));
+  sel->new_vector (nelt, nelt, 1);
+  for (unsigned int i = 0; i < nelt; i++)
+    sel->quick_push (i + offset);
 }
 
 /* Checks whether the target supports whole-vector shifts for vectors of mode
@@ -3732,13 +3731,13 @@ have_whole_vector_shift (machine_mode mo
     return true;
 
   unsigned int i, nelt = GET_MODE_NUNITS (mode);
-  auto_vec_perm_indices sel (nelt);
-
+  vec_perm_builder sel;
+  vec_perm_indices indices;
   for (i = nelt/2; i >= 1; i/=2)
     {
-      sel.truncate (0);
       calc_vec_perm_mask_for_shift (i, nelt, &sel);
-      if (!can_vec_perm_const_p (mode, sel, false))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (mode, indices, false))
 	return false;
     }
   return true;
@@ -5028,7 +5027,8 @@ vect_create_epilog_for_reduction (vec<tr
       if (reduce_with_shift && !slp_reduc)
         {
           int nelements = vec_size_in_bits / element_bitsize;
-          auto_vec_perm_indices sel (nelements);
+	  vec_perm_builder sel;
+	  vec_perm_indices indices;
 
           int elt_offset;
 
@@ -5052,9 +5052,9 @@ vect_create_epilog_for_reduction (vec<tr
                elt_offset >= 1;
                elt_offset /= 2)
             {
-	      sel.truncate (0);
 	      calc_vec_perm_mask_for_shift (elt_offset, nelements, &sel);
-	      tree mask = vect_gen_perm_mask_any (vectype, sel);
+	      indices.new_vector (sel, 2, nelements);
+	      tree mask = vect_gen_perm_mask_any (vectype, indices);
 	      epilog_stmt = gimple_build_assign (vec_dest, VEC_PERM_EXPR,
 						 new_temp, zero_vec, mask);
               new_name = make_ssa_name (vec_dest, epilog_stmt);
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	2017-12-09 22:47:27.856318084 +0000
+++ gcc/config/aarch64/aarch64.c	2017-12-09 22:48:47.535824832 +0000
@@ -13208,7 +13208,7 @@ #define MAX_VECT_LEN 16
 struct expand_vec_perm_d
 {
   rtx target, op0, op1;
-  auto_vec_perm_indices perm;
+  vec_perm_indices perm;
   machine_mode vmode;
   bool one_vector_p;
   bool testing_p;
@@ -13598,10 +13598,7 @@ aarch64_expand_vec_perm_const_1 (struct
   unsigned int nelt = d->perm.length ();
   if (d->perm[0] >= nelt)
     {
-      gcc_assert (nelt == (nelt & -nelt));
-      for (unsigned int i = 0; i < nelt; ++i)
-	d->perm[i] ^= nelt; /* Keep the same index, but in the other vector.  */
-
+      d->perm.rotate_inputs (1);
       std::swap (d->op0, d->op1);
     }
 
@@ -13641,12 +13638,10 @@ aarch64_vectorize_vec_perm_const (machin
 
   /* Calculate whether all elements are in one vector.  */
   unsigned int nelt = sel.length ();
-  d.perm.reserve (nelt);
   for (i = which = 0; i < nelt; ++i)
     {
       unsigned int ei = sel[i] & (2 * nelt - 1);
       which |= (ei < nelt ? 1 : 2);
-      d.perm.quick_push (ei);
     }
 
   switch (which)
@@ -13665,8 +13660,6 @@ aarch64_vectorize_vec_perm_const (machin
 	 input vector.  */
       /* Fall Through.  */
     case 2:
-      for (i = 0; i < nelt; ++i)
-	d.perm[i] &= nelt - 1;
       d.op0 = op1;
       d.one_vector_p = true;
       break;
@@ -13677,6 +13670,8 @@ aarch64_vectorize_vec_perm_const (machin
       break;
     }
 
+  d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2, nelt);
+
   if (!d.testing_p)
     return aarch64_expand_vec_perm_const_1 (&d);
 
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	2017-12-09 22:47:27.858318085 +0000
+++ gcc/config/arm/arm.c	2017-12-09 22:48:47.538824963 +0000
@@ -28852,7 +28852,7 @@ #define MAX_VECT_LEN 16
 struct expand_vec_perm_d
 {
   rtx target, op0, op1;
-  auto_vec_perm_indices perm;
+  vec_perm_indices perm;
   machine_mode vmode;
   bool one_vector_p;
   bool testing_p;
@@ -29360,9 +29360,7 @@ arm_expand_vec_perm_const_1 (struct expa
   unsigned int nelt = d->perm.length ();
   if (d->perm[0] >= nelt)
     {
-      for (unsigned int i = 0; i < nelt; ++i)
-	d->perm[i] = (d->perm[i] + nelt) & (2 * nelt - 1);
-
+      d->perm.rotate_inputs (1);
       std::swap (d->op0, d->op1);
     }
 
@@ -29402,12 +29400,10 @@ arm_vectorize_vec_perm_const (machine_mo
   d.testing_p = !target;
 
   nelt = GET_MODE_NUNITS (d.vmode);
-  d.perm.reserve (nelt);
   for (i = which = 0; i < nelt; ++i)
     {
       int ei = sel[i] & (2 * nelt - 1);
       which |= (ei < nelt ? 1 : 2);
-      d.perm.quick_push (ei);
     }
 
   switch (which)
@@ -29426,8 +29422,6 @@ arm_vectorize_vec_perm_const (machine_mo
 	 input vector.  */
       /* FALLTHRU */
     case 2:
-      for (i = 0; i < nelt; ++i)
-        d.perm[i] &= nelt - 1;
       d.op0 = op1;
       d.one_vector_p = true;
       break;
@@ -29438,6 +29432,8 @@ arm_vectorize_vec_perm_const (machine_mo
       break;
     }
 
+  d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2, nelt);
+
   if (d.testing_p)
     return arm_expand_vec_perm_const_1 (&d);
 
Index: gcc/config/powerpcspe/powerpcspe.c
===================================================================
--- gcc/config/powerpcspe/powerpcspe.c	2017-12-09 22:47:27.871318093 +0000
+++ gcc/config/powerpcspe/powerpcspe.c	2017-12-09 22:48:47.541825094 +0000
@@ -38780,7 +38780,7 @@ rs6000_expand_extract_even (rtx target,
 {
   machine_mode vmode = GET_MODE (target);
   unsigned i, nelt = GET_MODE_NUNITS (vmode);
-  vec_perm_builder perm (nelt);
+  vec_perm_builder perm (nelt, nelt, 1);
 
   for (i = 0; i < nelt; i++)
     perm.quick_push (i * 2);
@@ -38795,7 +38795,7 @@ rs6000_expand_interleave (rtx target, rt
 {
   machine_mode vmode = GET_MODE (target);
   unsigned i, high, nelt = GET_MODE_NUNITS (vmode);
-  vec_perm_builder perm (nelt);
+  vec_perm_builder perm (nelt, nelt, 1);
 
   high = (highp ? 0 : nelt / 2);
   for (i = 0; i < nelt / 2; i++)
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	2017-12-09 22:47:27.874318095 +0000
+++ gcc/config/rs6000/rs6000.c	2017-12-09 22:48:47.544825224 +0000
@@ -36017,7 +36017,7 @@ rs6000_expand_extract_even (rtx target,
 {
   machine_mode vmode = GET_MODE (target);
   unsigned i, nelt = GET_MODE_NUNITS (vmode);
-  vec_perm_builder perm (nelt);
+  vec_perm_builder perm (nelt, nelt, 1);
 
   for (i = 0; i < nelt; i++)
     perm.quick_push (i * 2);
@@ -36032,7 +36032,7 @@ rs6000_expand_interleave (rtx target, rt
 {
   machine_mode vmode = GET_MODE (target);
   unsigned i, high, nelt = GET_MODE_NUNITS (vmode);
-  vec_perm_builder perm (nelt);
+  vec_perm_builder perm (nelt, nelt, 1);
 
   high = (highp ? 0 : nelt / 2);
   for (i = 0; i < nelt / 2; i++)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [09/13] Use explicit encodings for simple permutes
  2017-12-09 23:06 [00/13] Make VEC_PERM_EXPR work for variable-length vectors Richard Sandiford
                   ` (7 preceding siblings ...)
  2017-12-09 23:20 ` [08/13] Add a vec_perm_indices_to_tree helper function Richard Sandiford
@ 2017-12-09 23:21 ` Richard Sandiford
  2017-12-19 20:37   ` Richard Sandiford
  2018-01-02 13:07   ` Richard Biener
  2017-12-09 23:23 ` [10/13] Rework VEC_PERM_EXPR folding Richard Sandiford
                   ` (2 subsequent siblings)
  11 siblings, 2 replies; 46+ messages in thread
From: Richard Sandiford @ 2017-12-09 23:21 UTC (permalink / raw)
  To: gcc-patches

This patch makes users of vec_perm_builders use the compressed encoding
where possible.  This means that they work with variable-length vectors.
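
To make the encoding concrete, here is a minimal sketch (not part of the
patch) of the odd-element selector that can_mult_highpart_p builds below
for little-endian targets; "nunits" and "mode" stand for the vector's
element count and mode:

  /* Single stepped pattern with 3 encoded elements; for an output of
     NUNITS elements this expands to { 1, 3, 5, ..., 2 * NUNITS - 1 }.  */
  vec_perm_builder sel (nunits, 1, 3);
  for (int i = 0; i < 3; ++i)
    sel.quick_push (2 * i + 1);

  /* The permute selects from two input vectors of NUNITS elements each;
     can_vec_perm_const_p then asks whether the target supports it.  */
  vec_perm_indices indices (sel, 2, nunits);
  bool supported = can_vec_perm_const_p (mode, indices);

Because only the first three elements of the pattern are stored explicitly,
the same selector description carries over to variable-length vectors.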


2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* optabs.c (expand_vec_perm_var): Use an explicit encoding for
	the broadcast of the low byte.
	(expand_mult_highpart): Use an explicit encoding for the permutes.
	* optabs-query.c (can_mult_highpart_p): Likewise.
	* tree-vect-loop.c (calc_vec_perm_mask_for_shift): Likewise.
	* tree-vect-stmts.c (perm_mask_for_reverse): Likewise.
	(vectorizable_bswap): Likewise.
	* tree-vect-data-refs.c (vect_grouped_store_supported): Use an
	explicit encoding for the power-of-2 permutes.
	(vect_permute_store_chain): Likewise.
	(vect_grouped_load_supported): Likewise.
	(vect_permute_load_chain): Likewise.

Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	2017-12-09 22:48:47.546825312 +0000
+++ gcc/optabs.c	2017-12-09 22:48:52.266015836 +0000
@@ -5625,15 +5625,14 @@ expand_vec_perm_var (machine_mode mode,
 			       NULL, 0, OPTAB_DIRECT);
   gcc_assert (sel != NULL);
 
-  /* Broadcast the low byte each element into each of its bytes.  */
-  vec_perm_builder const_sel (w, w, 1);
-  for (i = 0; i < w; ++i)
-    {
-      int this_e = i / u * u;
-      if (BYTES_BIG_ENDIAN)
-	this_e += u - 1;
-      const_sel.quick_push (this_e);
-    }
+  /* Broadcast the low byte of each element into each of its bytes.
+     The encoding has U interleaved stepped patterns, one for each
+     byte of an element.  */
+  vec_perm_builder const_sel (w, u, 3);
+  unsigned int low_byte_in_u = BYTES_BIG_ENDIAN ? u - 1 : 0;
+  for (i = 0; i < 3; ++i)
+    for (unsigned int j = 0; j < u; ++j)
+      const_sel.quick_push (i * u + low_byte_in_u);
   sel = gen_lowpart (qimode, sel);
   sel = expand_vec_perm_const (qimode, sel, sel, const_sel, qimode, NULL);
   gcc_assert (sel != NULL);
@@ -5853,16 +5852,20 @@ expand_mult_highpart (machine_mode mode,
   expand_insn (optab_handler (tab2, mode), 3, eops);
   m2 = gen_lowpart (mode, eops[0].value);
 
-  vec_perm_builder sel (nunits, nunits, 1);
+  vec_perm_builder sel;
   if (method == 2)
     {
-      for (i = 0; i < nunits; ++i)
+      /* The encoding has 2 interleaved stepped patterns.  */
+      sel.new_vector (nunits, 2, 3);
+      for (i = 0; i < 6; ++i)
 	sel.quick_push (!BYTES_BIG_ENDIAN + (i & ~1)
 			+ ((i & 1) ? nunits : 0));
     }
   else
     {
-      for (i = 0; i < nunits; ++i)
+      /* The encoding has a single interleaved stepped pattern.  */
+      sel.new_vector (nunits, 1, 3);
+      for (i = 0; i < 3; ++i)
 	sel.quick_push (2 * i + (BYTES_BIG_ENDIAN ? 0 : 1));
     }
 
Index: gcc/optabs-query.c
===================================================================
--- gcc/optabs-query.c	2017-12-09 22:48:47.545825268 +0000
+++ gcc/optabs-query.c	2017-12-09 22:48:52.265015799 +0000
@@ -501,8 +501,9 @@ can_mult_highpart_p (machine_mode mode,
       op = uns_p ? vec_widen_umult_odd_optab : vec_widen_smult_odd_optab;
       if (optab_handler (op, mode) != CODE_FOR_nothing)
 	{
-	  vec_perm_builder sel (nunits, nunits, 1);
-	  for (i = 0; i < nunits; ++i)
+	  /* The encoding has 2 interleaved stepped patterns.  */
+	  vec_perm_builder sel (nunits, 2, 3);
+	  for (i = 0; i < 6; ++i)
 	    sel.quick_push (!BYTES_BIG_ENDIAN
 			    + (i & ~1)
 			    + ((i & 1) ? nunits : 0));
@@ -518,8 +519,9 @@ can_mult_highpart_p (machine_mode mode,
       op = uns_p ? vec_widen_umult_lo_optab : vec_widen_smult_lo_optab;
       if (optab_handler (op, mode) != CODE_FOR_nothing)
 	{
-	  vec_perm_builder sel (nunits, nunits, 1);
-	  for (i = 0; i < nunits; ++i)
+	  /* The encoding has a single stepped pattern.  */
+	  vec_perm_builder sel (nunits, 1, 3);
+	  for (int i = 0; i < 3; ++i)
 	    sel.quick_push (2 * i + (BYTES_BIG_ENDIAN ? 0 : 1));
 	  vec_perm_indices indices (sel, 2, nunits);
 	  if (can_vec_perm_const_p (mode, indices))
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2017-12-09 22:48:47.547825355 +0000
+++ gcc/tree-vect-loop.c	2017-12-09 22:48:52.267015873 +0000
@@ -3716,8 +3716,10 @@ vect_estimate_min_profitable_iters (loop
 calc_vec_perm_mask_for_shift (unsigned int offset, unsigned int nelt,
 			      vec_perm_builder *sel)
 {
-  sel->new_vector (nelt, nelt, 1);
-  for (unsigned int i = 0; i < nelt; i++)
+  /* The encoding is a single stepped pattern.  Any wrap-around is handled
+     by vec_perm_indices.  */
+  sel->new_vector (nelt, 1, 3);
+  for (unsigned int i = 0; i < 3; i++)
     sel->quick_push (i + offset);
 }
 
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2017-12-09 22:48:50.360942531 +0000
+++ gcc/tree-vect-stmts.c	2017-12-09 22:48:52.268015910 +0000
@@ -1717,8 +1717,9 @@ perm_mask_for_reverse (tree vectype)
 
   nunits = TYPE_VECTOR_SUBPARTS (vectype);
 
-  vec_perm_builder sel (nunits, nunits, 1);
-  for (i = 0; i < nunits; ++i)
+  /* The encoding has a single stepped pattern.  */
+  vec_perm_builder sel (nunits, 1, 3);
+  for (i = 0; i < 3; ++i)
     sel.quick_push (nunits - 1 - i);
 
   vec_perm_indices indices (sel, 1, nunits);
@@ -2504,8 +2505,9 @@ vectorizable_bswap (gimple *stmt, gimple
   unsigned int num_bytes = TYPE_VECTOR_SUBPARTS (char_vectype);
   unsigned word_bytes = num_bytes / nunits;
 
-  vec_perm_builder elts (num_bytes, num_bytes, 1);
-  for (unsigned i = 0; i < nunits; ++i)
+  /* The encoding uses one stepped pattern for each byte in the word.  */
+  vec_perm_builder elts (num_bytes, word_bytes, 3);
+  for (unsigned i = 0; i < 3; ++i)
     for (unsigned j = 0; j < word_bytes; ++j)
       elts.quick_push ((i + 1) * word_bytes - j - 1);
 
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2017-12-09 22:48:47.546825312 +0000
+++ gcc/tree-vect-data-refs.c	2017-12-09 22:48:52.267015873 +0000
@@ -4566,14 +4566,13 @@ vect_grouped_store_supported (tree vecty
   if (VECTOR_MODE_P (mode))
     {
       unsigned int i, nelt = GET_MODE_NUNITS (mode);
-      vec_perm_builder sel (nelt, nelt, 1);
-      sel.quick_grow (nelt);
-
       if (count == 3)
 	{
 	  unsigned int j0 = 0, j1 = 0, j2 = 0;
 	  unsigned int i, j;
 
+	  vec_perm_builder sel (nelt, nelt, 1);
+	  sel.quick_grow (nelt);
 	  vec_perm_indices indices;
 	  for (j = 0; j < 3; j++)
 	    {
@@ -4623,7 +4622,10 @@ vect_grouped_store_supported (tree vecty
 	  /* If length is not equal to 3 then only power of 2 is supported.  */
 	  gcc_assert (pow2p_hwi (count));
 
-	  for (i = 0; i < nelt / 2; i++)
+	  /* The encoding has 2 interleaved stepped patterns.  */
+	  vec_perm_builder sel (nelt, 2, 3);
+	  sel.quick_grow (6);
+	  for (i = 0; i < 3; i++)
 	    {
 	      sel[i * 2] = i;
 	      sel[i * 2 + 1] = i + nelt;
@@ -4631,7 +4633,7 @@ vect_grouped_store_supported (tree vecty
 	  vec_perm_indices indices (sel, 2, nelt);
 	  if (can_vec_perm_const_p (mode, indices))
 	    {
-	      for (i = 0; i < nelt; i++)
+	      for (i = 0; i < 6; i++)
 		sel[i] += nelt / 2;
 	      indices.new_vector (sel, 2, nelt);
 	      if (can_vec_perm_const_p (mode, indices))
@@ -4736,9 +4738,6 @@ vect_permute_store_chain (vec<tree> dr_c
   unsigned int i, n, log_length = exact_log2 (length);
   unsigned int j, nelt = TYPE_VECTOR_SUBPARTS (vectype);
 
-  vec_perm_builder sel (nelt, nelt, 1);
-  sel.quick_grow (nelt);
-
   result_chain->quick_grow (length);
   memcpy (result_chain->address (), dr_chain.address (),
 	  length * sizeof (tree));
@@ -4747,6 +4746,8 @@ vect_permute_store_chain (vec<tree> dr_c
     {
       unsigned int j0 = 0, j1 = 0, j2 = 0;
 
+      vec_perm_builder sel (nelt, nelt, 1);
+      sel.quick_grow (nelt);
       vec_perm_indices indices;
       for (j = 0; j < 3; j++)
         {
@@ -4808,7 +4809,10 @@ vect_permute_store_chain (vec<tree> dr_c
       /* If length is not equal to 3 then only power of 2 is supported.  */
       gcc_assert (pow2p_hwi (length));
 
-      for (i = 0, n = nelt / 2; i < n; i++)
+      /* The encoding has 2 interleaved stepped patterns.  */
+      vec_perm_builder sel (nelt, 2, 3);
+      sel.quick_grow (6);
+      for (i = 0; i < 3; i++)
 	{
 	  sel[i * 2] = i;
 	  sel[i * 2 + 1] = i + nelt;
@@ -4816,7 +4820,7 @@ vect_permute_store_chain (vec<tree> dr_c
 	vec_perm_indices indices (sel, 2, nelt);
 	perm_mask_high = vect_gen_perm_mask_checked (vectype, indices);
 
-	for (i = 0; i < nelt; i++)
+	for (i = 0; i < 6; i++)
 	  sel[i] += nelt / 2;
 	indices.new_vector (sel, 2, nelt);
 	perm_mask_low = vect_gen_perm_mask_checked (vectype, indices);
@@ -5164,11 +5168,11 @@ vect_grouped_load_supported (tree vectyp
   if (VECTOR_MODE_P (mode))
     {
       unsigned int i, j, nelt = GET_MODE_NUNITS (mode);
-      vec_perm_builder sel (nelt, nelt, 1);
-      sel.quick_grow (nelt);
 
       if (count == 3)
 	{
+	  vec_perm_builder sel (nelt, nelt, 1);
+	  sel.quick_grow (nelt);
 	  vec_perm_indices indices;
 	  unsigned int k;
 	  for (k = 0; k < 3; k++)
@@ -5209,12 +5213,15 @@ vect_grouped_load_supported (tree vectyp
 	  /* If length is not equal to 3 then only power of 2 is supported.  */
 	  gcc_assert (pow2p_hwi (count));
 
-	  for (i = 0; i < nelt; i++)
+	  /* The encoding has a single stepped pattern.  */
+	  vec_perm_builder sel (nelt, 1, 3);
+	  sel.quick_grow (3);
+	  for (i = 0; i < 3; i++)
 	    sel[i] = i * 2;
 	  vec_perm_indices indices (sel, 2, nelt);
 	  if (can_vec_perm_const_p (mode, indices))
 	    {
-	      for (i = 0; i < nelt; i++)
+	      for (i = 0; i < 3; i++)
 		sel[i] = i * 2 + 1;
 	      indices.new_vector (sel, 2, nelt);
 	      if (can_vec_perm_const_p (mode, indices))
@@ -5332,9 +5339,6 @@ vect_permute_load_chain (vec<tree> dr_ch
   unsigned int i, j, log_length = exact_log2 (length);
   unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype);
 
-  vec_perm_builder sel (nelt, nelt, 1);
-  sel.quick_grow (nelt);
-
   result_chain->quick_grow (length);
   memcpy (result_chain->address (), dr_chain.address (),
 	  length * sizeof (tree));
@@ -5343,6 +5347,8 @@ vect_permute_load_chain (vec<tree> dr_ch
     {
       unsigned int k;
 
+      vec_perm_builder sel (nelt, nelt, 1);
+      sel.quick_grow (nelt);
       vec_perm_indices indices;
       for (k = 0; k < 3; k++)
 	{
@@ -5390,12 +5396,15 @@ vect_permute_load_chain (vec<tree> dr_ch
       /* If length is not equal to 3 then only power of 2 is supported.  */
       gcc_assert (pow2p_hwi (length));
 
-      for (i = 0; i < nelt; ++i)
+      /* The encoding has a single stepped pattern.  */
+      vec_perm_builder sel (nelt, 1, 3);
+      sel.quick_grow (3);
+      for (i = 0; i < 3; ++i)
 	sel[i] = i * 2;
       vec_perm_indices indices (sel, 2, nelt);
       perm_mask_even = vect_gen_perm_mask_checked (vectype, indices);
 
-      for (i = 0; i < nelt; ++i)
+      for (i = 0; i < 3; ++i)
 	sel[i] = i * 2 + 1;
       indices.new_vector (sel, 2, nelt);
       perm_mask_odd = vect_gen_perm_mask_checked (vectype, indices);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [10/13] Rework VEC_PERM_EXPR folding
  2017-12-09 23:06 [00/13] Make VEC_PERM_EXPR work for variable-length vectors Richard Sandiford
                   ` (8 preceding siblings ...)
  2017-12-09 23:21 ` [09/13] Use explicit encodings for simple permutes Richard Sandiford
@ 2017-12-09 23:23 ` Richard Sandiford
  2017-12-09 23:24   ` [11/13] Use vec_perm_builder::series_p in shift_amt_for_vec_perm_mask Richard Sandiford
                     ` (3 more replies)
  2017-12-09 23:27 ` [13/13] [AArch64] Use vec_perm_indices helper routines Richard Sandiford
  2017-12-12 14:12 ` [00/13] Make VEC_PERM_EXPR work for variable-length vectors Richard Biener
  11 siblings, 4 replies; 46+ messages in thread
From: Richard Sandiford @ 2017-12-09 23:23 UTC (permalink / raw)
  To: gcc-patches

This patch reworks the VEC_PERM_EXPR folding so that more of it works
for variable-length vectors.  E.g. it means that we can now recognise
variable-length permutes that reduce to a single vector, or cases in
which a variable-length permute only needs one input.  There should be
no functional change for fixed-length vectors.
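
For reference, a minimal sketch (plain C, invented names, operating on a
flat index array rather than the real compressed vec_perm_indices
encoding) of the two conditions the reworked folding tests:

  #include <stdbool.h>

  /* SEL has NELT elements; indices < NELT select from op0 and indices in
     [NELT, 2*NELT) select from op1.  */

  /* Corresponds to sel.series_p (0, 1, input * nelt, 1): the permute
     returns input INPUT unchanged.  */
  bool
  is_identity_of_input (const unsigned int *sel, unsigned int nelt,
                        unsigned int input)
  {
    for (unsigned int i = 0; i < nelt; i++)
      if (sel[i] != input * nelt + i)
        return false;
    return true;
  }

  /* Corresponds to sel.all_from_input_p (input): every index selects
     from input INPUT, so the other operand is dead.  */
  bool
  all_from_input (const unsigned int *sel, unsigned int nelt,
                  unsigned int input)
  {
    for (unsigned int i = 0; i < nelt; i++)
      if (sel[i] < input * nelt || sel[i] >= (input + 1) * nelt)
        return false;
    return true;
  }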


2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* selftest.h (selftest::vec_perm_indices_c_tests): Declare.
	* selftest-run-tests.c (selftest::run_tests): Call it.
	* vector-builder.h (vector_builder::operator ==): New function.
	(vector_builder::operator !=): Likewise.
	* vec-perm-indices.h (vec_perm_indices::series_p): Declare.
	(vec_perm_indices::all_from_input_p): New function.
	* vec-perm-indices.c (vec_perm_indices::series_p): Likewise.
	(test_vec_perm_12, selftest::vec_perm_indices_c_tests): Likewise.
	* fold-const.c (fold_ternary_loc): Use tree_to_vec_perm_builder
	instead of reading the VECTOR_CST directly.  Detect whether both
	vector inputs are the same before constructing the vec_perm_indices,
	and update the number of inputs argument accordingly.  Use the
	utility functions added above.  Only construct sel2 if we need to.

Index: gcc/selftest.h
===================================================================
*** gcc/selftest.h	2017-12-09 23:06:55.002855594 +0000
--- gcc/selftest.h	2017-12-09 23:21:51.517599734 +0000
*************** extern void vec_c_tests ();
*** 201,206 ****
--- 201,207 ----
  extern void wide_int_cc_tests ();
  extern void predict_c_tests ();
  extern void simplify_rtx_c_tests ();
+ extern void vec_perm_indices_c_tests ();
  
  extern int num_passes;
  
Index: gcc/selftest-run-tests.c
===================================================================
*** gcc/selftest-run-tests.c	2017-12-09 23:06:55.002855594 +0000
--- gcc/selftest-run-tests.c	2017-12-09 23:21:51.517599734 +0000
*************** selftest::run_tests ()
*** 73,78 ****
--- 73,79 ----
  
    /* Mid-level data structures.  */
    input_c_tests ();
+   vec_perm_indices_c_tests ();
    tree_c_tests ();
    gimple_c_tests ();
    rtl_tests_c_tests ();
Index: gcc/vector-builder.h
===================================================================
*** gcc/vector-builder.h	2017-12-09 23:06:55.002855594 +0000
--- gcc/vector-builder.h	2017-12-09 23:21:51.518600090 +0000
*************** #define GCC_VECTOR_BUILDER_H
*** 97,102 ****
--- 97,105 ----
    bool encoded_full_vector_p () const;
    T elt (unsigned int) const;
  
+   bool operator == (const Derived &) const;
+   bool operator != (const Derived &x) const { return !operator == (x); }
+ 
    void finalize ();
  
  protected:
*************** vector_builder<T, Derived>::new_vector (
*** 168,173 ****
--- 171,196 ----
    this->truncate (0);
  }
  
+ /* Return true if this vector and OTHER have the same elements and
+    are encoded in the same way.  */
+ 
+ template<typename T, typename Derived>
+ bool
+ vector_builder<T, Derived>::operator == (const Derived &other) const
+ {
+   if (m_full_nelts != other.m_full_nelts
+       || m_npatterns != other.m_npatterns
+       || m_nelts_per_pattern != other.m_nelts_per_pattern)
+     return false;
+ 
+   unsigned int nelts = encoded_nelts ();
+   for (unsigned int i = 0; i < nelts; ++i)
+     if (!derived ()->equal_p ((*this)[i], other[i]))
+       return false;
+ 
+   return true;
+ }
+ 
  /* Return the value of vector element I, which might or might not be
     encoded explicitly.  */
  
Index: gcc/vec-perm-indices.h
===================================================================
*** gcc/vec-perm-indices.h	2017-12-09 23:20:13.233112018 +0000
--- gcc/vec-perm-indices.h	2017-12-09 23:21:51.517599734 +0000
*************** typedef int_vector_builder<HOST_WIDE_INT
*** 62,68 ****
--- 62,70 ----
  
    element_type clamp (element_type) const;
    element_type operator[] (unsigned int i) const;
+   bool series_p (unsigned int, unsigned int, element_type, element_type) const;
    bool all_in_range_p (element_type, element_type) const;
+   bool all_from_input_p (unsigned int) const;
  
  private:
    vec_perm_indices (const vec_perm_indices &);
*************** vec_perm_indices::operator[] (unsigned i
*** 119,122 ****
--- 121,133 ----
    return clamp (m_encoding.elt (i));
  }
  
+ /* Return true if the permutation vector only selects elements from
+    input I.  */
+ 
+ inline bool
+ vec_perm_indices::all_from_input_p (unsigned int i) const
+ {
+   return all_in_range_p (i * m_nelts_per_input, m_nelts_per_input);
+ }
+ 
  #endif
Index: gcc/vec-perm-indices.c
===================================================================
*** gcc/vec-perm-indices.c	2017-12-09 23:20:13.233112018 +0000
--- gcc/vec-perm-indices.c	2017-12-09 23:21:51.517599734 +0000
*************** Software Foundation; either version 3, o
*** 28,33 ****
--- 28,34 ----
  #include "rtl.h"
  #include "memmodel.h"
  #include "emit-rtl.h"
+ #include "selftest.h"
  
  /* Switch to a new permutation vector that selects between NINPUTS vector
     inputs that have NELTS_PER_INPUT elements each.  Take the elements of the
*************** vec_perm_indices::rotate_inputs (int del
*** 85,90 ****
--- 86,139 ----
      m_encoding[i] = clamp (m_encoding[i] + element_delta);
  }
  
+ /* Return true if index OUT_BASE + I * OUT_STEP selects input
+    element IN_BASE + I * IN_STEP.  */
+ 
+ bool
+ vec_perm_indices::series_p (unsigned int out_base, unsigned int out_step,
+ 			    element_type in_base, element_type in_step) const
+ {
+   /* Check the base value.  */
+   if (clamp (m_encoding.elt (out_base)) != clamp (in_base))
+     return false;
+ 
+   unsigned int full_nelts = m_encoding.full_nelts ();
+   unsigned int npatterns = m_encoding.npatterns ();
+ 
+   /* Calculate which multiple of OUT_STEP elements we need to get
+      back to the same pattern.  */
+   unsigned int cycle_length = least_common_multiple (out_step, npatterns);
+ 
+   /* Check the steps.  */
+   in_step = clamp (in_step);
+   out_base += out_step;
+   unsigned int limit = 0;
+   for (;;)
+     {
+       /* Succeed if we've checked all the elements in the vector.  */
+       if (out_base >= full_nelts)
+ 	return true;
+ 
+       if (out_base >= npatterns)
+ 	{
+ 	  /* We've got to the end of the "foreground" values.  Check
+ 	     2 elements from each pattern in the "background" values.  */
+ 	  if (limit == 0)
+ 	    limit = out_base + cycle_length * 2;
+ 	  else if (out_base >= limit)
+ 	    return true;
+ 	}
+ 
+       element_type v0 = m_encoding.elt (out_base - out_step);
+       element_type v1 = m_encoding.elt (out_base);
+       if (clamp (v1 - v0) != in_step)
+ 	return false;
+ 
+       out_base += out_step;
+     }
+   return true;
+ }
+ 
  /* Return true if all elements of the permutation vector are in the range
     [START, START + SIZE).  */
  
*************** vec_perm_indices_to_rtx (machine_mode mo
*** 180,182 ****
--- 229,280 ----
      RTVEC_ELT (v, i) = gen_int_mode (indices[i], GET_MODE_INNER (mode));
    return gen_rtx_CONST_VECTOR (mode, v);
  }
+ 
+ #if CHECKING_P
+ 
+ namespace selftest {
+ 
+ /* Test a 12-element vector.  */
+ 
+ static void
+ test_vec_perm_12 (void)
+ {
+   vec_perm_builder builder (12, 12, 1);
+   for (unsigned int i = 0; i < 4; ++i)
+     {
+       builder.quick_push (i * 5);
+       builder.quick_push (3 + i);
+       builder.quick_push (2 + 3 * i);
+     }
+   vec_perm_indices indices (builder, 1, 12);
+   ASSERT_TRUE (indices.series_p (0, 3, 0, 5));
+   ASSERT_FALSE (indices.series_p (0, 3, 3, 5));
+   ASSERT_FALSE (indices.series_p (0, 3, 0, 8));
+   ASSERT_TRUE (indices.series_p (1, 3, 3, 1));
+   ASSERT_TRUE (indices.series_p (2, 3, 2, 3));
+ 
+   ASSERT_TRUE (indices.series_p (0, 4, 0, 4));
+   ASSERT_FALSE (indices.series_p (1, 4, 3, 4));
+ 
+   ASSERT_TRUE (indices.series_p (0, 6, 0, 10));
+   ASSERT_FALSE (indices.series_p (0, 6, 0, 100));
+ 
+   ASSERT_FALSE (indices.series_p (1, 10, 3, 7));
+   ASSERT_TRUE (indices.series_p (1, 10, 3, 8));
+ 
+   ASSERT_TRUE (indices.series_p (0, 12, 0, 10));
+   ASSERT_TRUE (indices.series_p (0, 12, 0, 11));
+   ASSERT_TRUE (indices.series_p (0, 12, 0, 100));
+ }
+ 
+ /* Run selftests for this file.  */
+ 
+ void
+ vec_perm_indices_c_tests ()
+ {
+   test_vec_perm_12 ();
+ }
+ 
+ } // namespace selftest
+ 
+ #endif
Index: gcc/fold-const.c
===================================================================
*** gcc/fold-const.c	2017-12-09 23:18:12.040041251 +0000
--- gcc/fold-const.c	2017-12-09 23:21:51.517599734 +0000
*************** fold_ternary_loc (location_t loc, enum t
*** 11547,11645 ****
      case VEC_PERM_EXPR:
        if (TREE_CODE (arg2) == VECTOR_CST)
  	{
! 	  unsigned int nelts = VECTOR_CST_NELTS (arg2), i, mask, mask2;
! 	  bool need_mask_canon = false;
! 	  bool need_mask_canon2 = false;
! 	  bool all_in_vec0 = true;
! 	  bool all_in_vec1 = true;
! 	  bool maybe_identity = true;
! 	  bool single_arg = (op0 == op1);
! 	  bool changed = false;
! 
! 	  mask2 = 2 * nelts - 1;
! 	  mask = single_arg ? (nelts - 1) : mask2;
! 	  gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));
! 	  vec_perm_builder sel (nelts, nelts, 1);
! 	  vec_perm_builder sel2 (nelts, nelts, 1);
! 	  for (i = 0; i < nelts; i++)
! 	    {
! 	      tree val = VECTOR_CST_ELT (arg2, i);
! 	      if (TREE_CODE (val) != INTEGER_CST)
! 		return NULL_TREE;
! 
! 	      /* Make sure that the perm value is in an acceptable
! 		 range.  */
! 	      wi::tree_to_wide_ref t = wi::to_wide (val);
! 	      need_mask_canon |= wi::gtu_p (t, mask);
! 	      need_mask_canon2 |= wi::gtu_p (t, mask2);
! 	      unsigned int elt = t.to_uhwi () & mask;
! 	      unsigned int elt2 = t.to_uhwi () & mask2;
! 
! 	      if (elt < nelts)
! 		all_in_vec1 = false;
! 	      else
! 		all_in_vec0 = false;
! 
! 	      if ((elt & (nelts - 1)) != i)
! 		maybe_identity = false;
! 
! 	      sel.quick_push (elt);
! 	      sel2.quick_push (elt2);
! 	    }
  
! 	  if (maybe_identity)
! 	    {
! 	      if (all_in_vec0)
! 		return op0;
! 	      if (all_in_vec1)
! 		return op1;
! 	    }
  
! 	  if (all_in_vec0)
! 	    op1 = op0;
! 	  else if (all_in_vec1)
! 	    {
! 	      op0 = op1;
! 	      for (i = 0; i < nelts; i++)
! 		sel[i] -= nelts;
! 	      need_mask_canon = true;
  	    }
  
- 	  vec_perm_indices indices (sel, 2, nelts);
  	  if ((TREE_CODE (op0) == VECTOR_CST
  	       || TREE_CODE (op0) == CONSTRUCTOR)
  	      && (TREE_CODE (op1) == VECTOR_CST
  		  || TREE_CODE (op1) == CONSTRUCTOR))
  	    {
! 	      tree t = fold_vec_perm (type, op0, op1, indices);
  	      if (t != NULL_TREE)
  		return t;
  	    }
  
! 	  if (op0 == op1 && !single_arg)
! 	    changed = true;
  
! 	  /* Some targets are deficient and fail to expand a single
! 	     argument permutation while still allowing an equivalent
! 	     2-argument version.  */
! 	  if (need_mask_canon && arg2 == op2
! 	      && !can_vec_perm_const_p (TYPE_MODE (type), indices, false)
! 	      && can_vec_perm_const_p (TYPE_MODE (type),
! 				       vec_perm_indices (sel2, 2, nelts),
! 				       false))
  	    {
! 	      need_mask_canon = need_mask_canon2;
! 	      sel.truncate (0);
! 	      sel.splice (sel2);
! 	    }
! 
! 	  if (need_mask_canon && arg2 == op2)
! 	    {
! 	      tree eltype = TREE_TYPE (TREE_TYPE (arg2));
! 	      tree_vector_builder tsel (TREE_TYPE (arg2), nelts, 1);
! 	      for (i = 0; i < nelts; i++)
! 		tsel.quick_push (build_int_cst (eltype, sel[i]));
! 	      op2 = tsel.build ();
  	      changed = true;
  	    }
  
--- 11547,11611 ----
      case VEC_PERM_EXPR:
        if (TREE_CODE (arg2) == VECTOR_CST)
  	{
! 	  /* Build a vector of integers from the tree mask.  */
! 	  vec_perm_builder builder;
! 	  if (!tree_to_vec_perm_builder (&builder, arg2))
! 	    return NULL_TREE;
  
! 	  /* Create a vec_perm_indices for the integer vector.  */
! 	  unsigned int nelts = TYPE_VECTOR_SUBPARTS (type);
! 	  bool single_arg = (op0 == op1);
! 	  vec_perm_indices sel (builder, single_arg ? 1 : 2, nelts);
  
! 	  /* Check for cases that fold to OP0 or OP1 in their original
! 	     element order.  */
! 	  if (sel.series_p (0, 1, 0, 1))
! 	    return op0;
! 	  if (sel.series_p (0, 1, nelts, 1))
! 	    return op1;
! 
! 	  if (!single_arg)
! 	    {
! 	      if (sel.all_from_input_p (0))
! 		op1 = op0;
! 	      else if (sel.all_from_input_p (1))
! 		{
! 		  op0 = op1;
! 		  sel.rotate_inputs (1);
! 		}
  	    }
  
  	  if ((TREE_CODE (op0) == VECTOR_CST
  	       || TREE_CODE (op0) == CONSTRUCTOR)
  	      && (TREE_CODE (op1) == VECTOR_CST
  		  || TREE_CODE (op1) == CONSTRUCTOR))
  	    {
! 	      tree t = fold_vec_perm (type, op0, op1, sel);
  	      if (t != NULL_TREE)
  		return t;
  	    }
  
! 	  bool changed = (op0 == op1 && !single_arg);
  
! 	  /* Generate a canonical form of the selector.  */
! 	  if (arg2 == op2 && sel.encoding () != builder)
  	    {
! 	      /* Some targets are deficient and fail to expand a single
! 		 argument permutation while still allowing an equivalent
! 		 2-argument version.  */
! 	      if (sel.ninputs () == 2
! 		  || can_vec_perm_const_p (TYPE_MODE (type), sel, false))
! 		op2 = vec_perm_indices_to_tree (TREE_TYPE (arg2), sel);
! 	      else
! 		{
! 		  vec_perm_indices sel2 (builder, 2, nelts);
! 		  if (can_vec_perm_const_p (TYPE_MODE (type), sel2, false))
! 		    op2 = vec_perm_indices_to_tree (TREE_TYPE (arg2), sel2);
! 		  else
! 		    /* Not directly supported with either encoding,
! 		       so use the preferred form.  */
! 		    op2 = vec_perm_indices_to_tree (TREE_TYPE (arg2), sel);
! 		}
  	      changed = true;
  	    }
  

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [11/13] Use vec_perm_builder::series_p in shift_amt_for_vec_perm_mask
  2017-12-09 23:23 ` [10/13] Rework VEC_PERM_EXPR folding Richard Sandiford
@ 2017-12-09 23:24   ` Richard Sandiford
  2017-12-19 20:37     ` Richard Sandiford
  2018-01-02 13:08     ` Richard Biener
  2017-12-09 23:25   ` [12/13] Use ssizetype selectors for autovectorised VEC_PERM_EXPRs Richard Sandiford
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 46+ messages in thread
From: Richard Sandiford @ 2017-12-09 23:24 UTC (permalink / raw)
  To: gcc-patches

This patch makes shift_amt_for_vec_perm_mask use series_p to check
for the simple case of a natural linear series before falling back
to testing each element individually.  The series_p test works with
variable-length vectors but testing every individual element doesn't.
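
As an assumed V8QI-style example (not taken from the patch): the selector
{3, 4, 5, 6, 7, 8, 9, 10} satisfies series_p (0, 1, 3, 1) and corresponds
to a shift of 3 * 8 == 24 bits.  A minimal stand-alone sketch of the
equivalent flat-array check:

  #include <stdbool.h>
  #include <stdio.h>

  /* Toy model, not GCC code: a permute is a whole-vector shift if the
     selector is the series first, first + 1, ..., treating all indices
     >= nelt as equivalent references to the (zero) second vector.  */
  static bool
  shift_amount (const unsigned int *sel, unsigned int nelt,
                unsigned int bitsize, unsigned int *bits)
  {
    unsigned int first = sel[0];
    if (first >= nelt)
      return false;
    for (unsigned int i = 1; i < nelt; i++)
      {
        unsigned int expected = i + first;
        unsigned int a = sel[i] < nelt ? sel[i] : nelt;
        unsigned int b = expected < nelt ? expected : nelt;
        if (a != b)
          return false;
      }
    *bits = first * bitsize;
    return true;
  }

  int
  main (void)
  {
    unsigned int sel[8] = { 3, 4, 5, 6, 7, 8, 9, 10 };
    unsigned int bits;
    if (shift_amount (sel, 8, 8, &bits))
      printf ("shift by %u bits\n", bits);  /* 24 */
    return 0;
  }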


2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* optabs.c (shift_amt_for_vec_perm_mask): Try using series_p
	before testing each element individually.

Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	2017-12-09 22:48:52.266015836 +0000
+++ gcc/optabs.c	2017-12-09 22:48:56.257154317 +0000
@@ -5375,20 +5375,20 @@ vector_compare_rtx (machine_mode cmp_mod
 static rtx
 shift_amt_for_vec_perm_mask (machine_mode mode, const vec_perm_indices &sel)
 {
-  unsigned int i, first, nelt = GET_MODE_NUNITS (mode);
+  unsigned int nelt = GET_MODE_NUNITS (mode);
   unsigned int bitsize = GET_MODE_UNIT_BITSIZE (mode);
-
-  first = sel[0];
+  unsigned int first = sel[0];
   if (first >= nelt)
     return NULL_RTX;
-  for (i = 1; i < nelt; i++)
-    {
-      int idx = sel[i];
-      unsigned int expected = i + first;
-      /* Indices into the second vector are all equivalent.  */
-      if (idx < 0 || (MIN (nelt, (unsigned) idx) != MIN (nelt, expected)))
-	return NULL_RTX;
-    }
+
+  if (!sel.series_p (0, 1, first, 1))
+    for (unsigned int i = 1; i < nelt; i++)
+      {
+	unsigned int expected = i + first;
+	/* Indices into the second vector are all equivalent.  */
+	if (MIN (nelt, sel[i]) != MIN (nelt, expected))
+	  return NULL_RTX;
+      }
 
   return GEN_INT (first * bitsize);
 }

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [12/13] Use ssizetype selectors for autovectorised VEC_PERM_EXPRs
  2017-12-09 23:23 ` [10/13] Rework VEC_PERM_EXPR folding Richard Sandiford
  2017-12-09 23:24   ` [11/13] Use vec_perm_builder::series_p in shift_amt_for_vec_perm_mask Richard Sandiford
@ 2017-12-09 23:25   ` Richard Sandiford
  2017-12-19 20:37     ` Richard Sandiford
  2018-01-02 13:09     ` Richard Biener
  2017-12-19 20:37   ` [10/13] Rework VEC_PERM_EXPR folding Richard Sandiford
  2018-01-02 13:08   ` Richard Biener
  3 siblings, 2 replies; 46+ messages in thread
From: Richard Sandiford @ 2017-12-09 23:25 UTC (permalink / raw)
  To: gcc-patches

The previous patches mean that the selectors of constant VEC_PERM_EXPRs
no longer need to have the same element type as the data inputs.
This patch makes the autovectoriser use ssizetype selector elements
instead, so that the indices don't get truncated for large or
variable-length vectors.
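
As a concrete (assumed) illustration of the truncation problem with
data-width selector elements:

  #include <stdio.h>

  int
  main (void)
  {
    /* Assumed example, not from the patch: with byte-sized selector
       elements (matching QImode data), a two-input permute needs indices
       up to 2 * nelt - 1, which no longer fits once nelt exceeds 128.
       ssizetype indices always fit.  */
    unsigned int nelt = 200;
    unsigned int wanted = 2 * nelt - 1;              /* 399 */
    unsigned char as_byte = (unsigned char) wanted;  /* truncated */
    printf ("wanted %u, byte selector stores %u\n",
            wanted, (unsigned int) as_byte);
    return 0;
  }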


2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* tree-cfg.c (verify_gimple_assign_ternary): Allow the size of
	the selector elements to be different from the data elements
	if the selector is a VECTOR_CST.
	* tree-vect-stmts.c (vect_gen_perm_mask_any): Use a vector of
	ssizetype for the selector.

Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	2017-12-09 22:47:07.103588314 +0000
+++ gcc/tree-cfg.c	2017-12-09 22:48:58.259216407 +0000
@@ -4300,8 +4300,11 @@ verify_gimple_assign_ternary (gassign *s
 	}
 
       if (TREE_CODE (TREE_TYPE (rhs3_type)) != INTEGER_TYPE
-	  || GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE (TREE_TYPE (rhs3_type)))
-	     != GET_MODE_BITSIZE (SCALAR_TYPE_MODE (TREE_TYPE (rhs1_type))))
+	  || (TREE_CODE (rhs3) != VECTOR_CST
+	      && (GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE
+				    (TREE_TYPE (rhs3_type)))
+		  != GET_MODE_BITSIZE (SCALAR_TYPE_MODE
+				       (TREE_TYPE (rhs1_type))))))
 	{
 	  error ("invalid mask type in vector permute expression");
 	  debug_generic_expr (lhs_type);
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2017-12-09 22:48:52.268015910 +0000
+++ gcc/tree-vect-stmts.c	2017-12-09 22:48:58.259216407 +0000
@@ -6518,11 +6518,12 @@ vectorizable_store (gimple *stmt, gimple
 tree
 vect_gen_perm_mask_any (tree vectype, const vec_perm_indices &sel)
 {
-  tree mask_elt_type, mask_type;
+  tree mask_type;
 
-  mask_elt_type = lang_hooks.types.type_for_mode
-    (int_mode_for_mode (TYPE_MODE (TREE_TYPE (vectype))).require (), 1);
-  mask_type = get_vectype_for_scalar_type (mask_elt_type);
+  unsigned int nunits = sel.length ();
+  gcc_assert (nunits == TYPE_VECTOR_SUBPARTS (vectype));
+
+  mask_type = build_vector_type (ssizetype, nunits);
   return vec_perm_indices_to_tree (mask_type, sel);
 }
 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [13/13] [AArch64] Use vec_perm_indices helper routines
  2017-12-09 23:06 [00/13] Make VEC_PERM_EXPR work for variable-length vectors Richard Sandiford
                   ` (9 preceding siblings ...)
  2017-12-09 23:23 ` [10/13] Rework VEC_PERM_EXPR folding Richard Sandiford
@ 2017-12-09 23:27 ` Richard Sandiford
  2017-12-19 20:37   ` Richard Sandiford
  2017-12-12 14:12 ` [00/13] Make VEC_PERM_EXPR work for variable-length vectors Richard Biener
  11 siblings, 1 reply; 46+ messages in thread
From: Richard Sandiford @ 2017-12-09 23:27 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft

This patch makes the AArch64 vec_perm_const code use the new
vec_perm_indices routines instead of checking each element individually,
which means that the checks extend naturally to variable-length vectors.
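
As an assumed worked example (V4SI values, not taken from the patch) of
how the old per-element loops map onto series_p, using TRN as the
pattern: the TRN1 selector is {0, 4, 2, 6} and TRN2 is {1, 5, 3, 7}.

  #include <stdbool.h>

  /* The old code checked sel[2*i] == 2*i + odd and
     sel[2*i + 1] == 2*i + nelt + odd for each i; the patch states the
     same condition as
       series_p (0, 2, odd, 2) && series_p (1, 2, nelt + odd, 2).  */
  bool
  is_trn (const unsigned int *sel, unsigned int nelt, unsigned int odd)
  {
    for (unsigned int i = 0; i < nelt / 2; i++)
      if (sel[2 * i] != 2 * i + odd
          || sel[2 * i + 1] != 2 * i + nelt + odd)
        return false;
    return true;
  }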

Also, aarch64_evpc_dup was the only function that generated rtl when
testing_p is true, and that looked accidental.  The patch adds the
missing check and then replaces the gen_rtx_REG/start_sequence/
end_sequence stuff with an assert that no rtl is generated.
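
A minimal sketch (plain C, invented name) of what the dup test means on a
flat selector; in the compressed form such a selector collapses to a
single pattern of one element, which is why encoded_nelts () == 1 is
enough for any vector length:

  #include <stdbool.h>

  /* True if the permute broadcasts element sel[0] of the (single) input.  */
  bool
  is_dup_selector (const unsigned int *sel, unsigned int nelt)
  {
    for (unsigned int i = 1; i < nelt; i++)
      if (sel[i] != sel[0])
        return false;
    return true;
  }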

Tested on aarch64-linux-gnu.  Also tested by making sure that there
were no assembly output differences for aarch64-linux-gnu or
aarch64_be-linux-gnu.  OK to install?

Richard


2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* config/aarch64/aarch64.c (aarch64_evpc_trn): Use d.perm.series_p
	instead of checking each element individually.
	(aarch64_evpc_uzp): Likewise.
	(aarch64_evpc_zip): Likewise.
	(aarch64_evpc_ext): Likewise.
	(aarch64_evpc_rev): Likewise.
	(aarch64_evpc_dup): Test the encoding for a single duplicated element,
	instead of checking each element individually.  Return true without
	generating rtl if
	(aarch64_vectorize_vec_perm_const): Use all_from_input_p to test
	whether all selected elements come from the same input, instead of
	checking each element individually.  Remove calls to gen_rtx_REG,
	start_sequence and end_sequence and instead assert that no rtl is
	generated.

Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	2017-12-09 22:48:47.535824832 +0000
+++ gcc/config/aarch64/aarch64.c	2017-12-09 22:49:00.139270410 +0000
@@ -13295,7 +13295,7 @@ aarch64_expand_vec_perm (rtx target, rtx
 static bool
 aarch64_evpc_trn (struct expand_vec_perm_d *d)
 {
-  unsigned int i, odd, mask, nelt = d->perm.length ();
+  unsigned int odd, nelt = d->perm.length ();
   rtx out, in0, in1, x;
   machine_mode vmode = d->vmode;
 
@@ -13304,21 +13304,11 @@ aarch64_evpc_trn (struct expand_vec_perm
 
   /* Note that these are little-endian tests.
      We correct for big-endian later.  */
-  if (d->perm[0] == 0)
-    odd = 0;
-  else if (d->perm[0] == 1)
-    odd = 1;
-  else
+  odd = d->perm[0];
+  if ((odd != 0 && odd != 1)
+      || !d->perm.series_p (0, 2, odd, 2)
+      || !d->perm.series_p (1, 2, nelt + odd, 2))
     return false;
-  mask = (d->one_vector_p ? nelt - 1 : 2 * nelt - 1);
-
-  for (i = 0; i < nelt; i += 2)
-    {
-      if (d->perm[i] != i + odd)
-	return false;
-      if (d->perm[i + 1] != ((i + nelt + odd) & mask))
-	return false;
-    }
 
   /* Success!  */
   if (d->testing_p)
@@ -13342,7 +13332,7 @@ aarch64_evpc_trn (struct expand_vec_perm
 static bool
 aarch64_evpc_uzp (struct expand_vec_perm_d *d)
 {
-  unsigned int i, odd, mask, nelt = d->perm.length ();
+  unsigned int odd;
   rtx out, in0, in1, x;
   machine_mode vmode = d->vmode;
 
@@ -13351,20 +13341,10 @@ aarch64_evpc_uzp (struct expand_vec_perm
 
   /* Note that these are little-endian tests.
      We correct for big-endian later.  */
-  if (d->perm[0] == 0)
-    odd = 0;
-  else if (d->perm[0] == 1)
-    odd = 1;
-  else
+  odd = d->perm[0];
+  if ((odd != 0 && odd != 1)
+      || !d->perm.series_p (0, 1, odd, 2))
     return false;
-  mask = (d->one_vector_p ? nelt - 1 : 2 * nelt - 1);
-
-  for (i = 0; i < nelt; i++)
-    {
-      unsigned elt = (i * 2 + odd) & mask;
-      if (d->perm[i] != elt)
-	return false;
-    }
 
   /* Success!  */
   if (d->testing_p)
@@ -13388,7 +13368,7 @@ aarch64_evpc_uzp (struct expand_vec_perm
 static bool
 aarch64_evpc_zip (struct expand_vec_perm_d *d)
 {
-  unsigned int i, high, mask, nelt = d->perm.length ();
+  unsigned int high, nelt = d->perm.length ();
   rtx out, in0, in1, x;
   machine_mode vmode = d->vmode;
 
@@ -13397,25 +13377,11 @@ aarch64_evpc_zip (struct expand_vec_perm
 
   /* Note that these are little-endian tests.
      We correct for big-endian later.  */
-  high = nelt / 2;
-  if (d->perm[0] == high)
-    /* Do Nothing.  */
-    ;
-  else if (d->perm[0] == 0)
-    high = 0;
-  else
+  high = d->perm[0];
+  if ((high != 0 && high * 2 != nelt)
+      || !d->perm.series_p (0, 2, high, 1)
+      || !d->perm.series_p (1, 2, high + nelt, 1))
     return false;
-  mask = (d->one_vector_p ? nelt - 1 : 2 * nelt - 1);
-
-  for (i = 0; i < nelt / 2; i++)
-    {
-      unsigned elt = (i + high) & mask;
-      if (d->perm[i * 2] != elt)
-	return false;
-      elt = (elt + nelt) & mask;
-      if (d->perm[i * 2 + 1] != elt)
-	return false;
-    }
 
   /* Success!  */
   if (d->testing_p)
@@ -13440,23 +13406,14 @@ aarch64_evpc_zip (struct expand_vec_perm
 static bool
 aarch64_evpc_ext (struct expand_vec_perm_d *d)
 {
-  unsigned int i, nelt = d->perm.length ();
+  unsigned int nelt = d->perm.length ();
   rtx offset;
 
   unsigned int location = d->perm[0]; /* Always < nelt.  */
 
   /* Check if the extracted indices are increasing by one.  */
-  for (i = 1; i < nelt; i++)
-    {
-      unsigned int required = location + i;
-      if (d->one_vector_p)
-        {
-          /* We'll pass the same vector in twice, so allow indices to wrap.  */
-	  required &= (nelt - 1);
-	}
-      if (d->perm[i] != required)
-        return false;
-    }
+  if (!d->perm.series_p (0, 1, location, 1))
+    return false;
 
   /* Success! */
   if (d->testing_p)
@@ -13488,7 +13445,7 @@ aarch64_evpc_ext (struct expand_vec_perm
 static bool
 aarch64_evpc_rev (struct expand_vec_perm_d *d)
 {
-  unsigned int i, j, diff, size, unspec, nelt = d->perm.length ();
+  unsigned int i, diff, size, unspec;
 
   if (!d->one_vector_p)
     return false;
@@ -13504,18 +13461,10 @@ aarch64_evpc_rev (struct expand_vec_perm
   else
     return false;
 
-  for (i = 0; i < nelt ; i += diff + 1)
-    for (j = 0; j <= diff; j += 1)
-      {
-	/* This is guaranteed to be true as the value of diff
-	   is 7, 3, 1 and we should have enough elements in the
-	   queue to generate this.  Getting a vector mask with a
-	   value of diff other than these values implies that
-	   something is wrong by the time we get here.  */
-	gcc_assert (i + j < nelt);
-	if (d->perm[i + j] != i + diff - j)
-	  return false;
-      }
+  unsigned int step = diff + 1;
+  for (i = 0; i < step; ++i)
+    if (!d->perm.series_p (i, step, diff - i, step))
+      return false;
 
   /* Success! */
   if (d->testing_p)
@@ -13532,15 +13481,17 @@ aarch64_evpc_dup (struct expand_vec_perm
   rtx out = d->target;
   rtx in0;
   machine_mode vmode = d->vmode;
-  unsigned int i, elt, nelt = d->perm.length ();
+  unsigned int elt;
   rtx lane;
 
+  if (d->perm.encoding ().encoded_nelts () != 1)
+    return false;
+
+  /* Success! */
+  if (d->testing_p)
+    return true;
+
   elt = d->perm[0];
-  for (i = 1; i < nelt; i++)
-    {
-      if (elt != d->perm[i])
-	return false;
-    }
 
   /* The generic preparation in aarch64_expand_vec_perm_const_1
      swaps the operand order and the permute indices if it finds
@@ -13628,61 +13579,37 @@ aarch64_vectorize_vec_perm_const (machin
 				  rtx op1, const vec_perm_indices &sel)
 {
   struct expand_vec_perm_d d;
-  unsigned int i, which;
 
-  d.vmode = vmode;
-  d.target = target;
-  d.op0 = op0;
-  d.op1 = op1;
-  d.testing_p = !target;
-
-  /* Calculate whether all elements are in one vector.  */
-  unsigned int nelt = sel.length ();
-  for (i = which = 0; i < nelt; ++i)
+  /* Check whether the mask can be applied to a single vector.  */
+  if (op0 && rtx_equal_p (op0, op1))
+    d.one_vector_p = true;
+  else if (sel.all_from_input_p (0))
     {
-      unsigned int ei = sel[i] & (2 * nelt - 1);
-      which |= (ei < nelt ? 1 : 2);
+      d.one_vector_p = true;
+      op1 = op0;
     }
-
-  switch (which)
+  else if (sel.all_from_input_p (1))
     {
-    default:
-      gcc_unreachable ();
-
-    case 3:
-      d.one_vector_p = false;
-      if (d.testing_p || !rtx_equal_p (op0, op1))
-	break;
-
-      /* The elements of PERM do not suggest that only the first operand
-	 is used, but both operands are identical.  Allow easier matching
-	 of the permutation by folding the permutation into the single
-	 input vector.  */
-      /* Fall Through.  */
-    case 2:
-      d.op0 = op1;
-      d.one_vector_p = true;
-      break;
-
-    case 1:
-      d.op1 = op0;
       d.one_vector_p = true;
-      break;
+      op0 = op1;
     }
+  else
+    d.one_vector_p = false;
 
-  d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2, nelt);
+  d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2,
+		     sel.nelts_per_input ());
+  d.vmode = vmode;
+  d.target = target;
+  d.op0 = op0;
+  d.op1 = op1;
+  d.testing_p = !target;
 
   if (!d.testing_p)
     return aarch64_expand_vec_perm_const_1 (&d);
 
-  d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
-  d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
-  if (!d.one_vector_p)
-    d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
-
-  start_sequence ();
+  rtx_insn *last = get_last_insn ();
   bool ret = aarch64_expand_vec_perm_const_1 (&d);
-  end_sequence ();
+  gcc_assert (last == get_last_insn ());
 
   return ret;
 }

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [00/13] Make VEC_PERM_EXPR work for variable-length vectors
  2017-12-09 23:06 [00/13] Make VEC_PERM_EXPR work for variable-length vectors Richard Sandiford
                   ` (10 preceding siblings ...)
  2017-12-09 23:27 ` [13/13] [AArch64] Use vec_perm_indices helper routines Richard Sandiford
@ 2017-12-12 14:12 ` Richard Biener
  2017-12-12 15:32   ` Richard Sandiford
  11 siblings, 1 reply; 46+ messages in thread
From: Richard Biener @ 2017-12-12 14:12 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Sun, Dec 10, 2017 at 12:06 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This series is a replacement for:
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00747.html
> based on the feedback that using VEC_PERM_EXPR would be better.
>
> The changes are:
>
> (1) Remove the restriction that the selector elements have to have the
>     same width as the data elements, but only for constant selectors.
>     This lets through the cases we need without also allowing
>     potentially-expensive ops.  Adding support for the variable
>     case can be done later if it seems useful, but it's not trivial.
>
> (2) Encode the integer form of constant selectors (vec_perm_indices)
>     in the same way as the new VECTOR_CST encoding, so that it can
>     cope with variable-length vectors.
>
> (3) Remove the vec_perm_const optab and reuse the target hook to emit
>     code.  This avoids the need to create a CONST_VECTOR for the wide
>     selectors, and hence the need to have a corresponding wide vector
>     mode (which the target wouldn't otherwise need or support).

Hmm.  Makes sense I suppose.

> (4) When handling the variable vec_perm optab, check that modes can store
>     all element indices before using them.
>
> (5) Unconditionally use ssizetype selector elements in permutes created
>     by the vectoriser.

Why specifically _signed_ sizetype?  That sounds like an odd choice.  But I'll
eventually see when looking at the patch.  Does that mean we have a
VNDImode vector unconditionally for the permute even though a vector
matching the width of the data members would work?  What happens if the
target doesn't have vec_perm_const but vec_perm to handle all constant
permutes?

Going to look over the patches now.

Thanks for (re-)doing the work.

Richard.

> (6) Make the AArch64 vec_perm_const handling handle variable-length vectors.
>
> Tested directly on trunk on aarch64-linux-gnu, x86_64-linux-gnu and
> powerpc64le-linux-gnu.  Also tested by comparing the before and after
> assembly output for:
>
>    arm-linux-gnueabi arm-linux-gnueabihf aarch64-linux-gnu
>    aarch64_be-linux-gnu ia64-linux-gnu i686-pc-linux-gnu
>    mipsisa64-linux-gnu mipsel-linux-gnu powerpc64-linux-gnu
>    powerpc64le-linux-gnu powerpc-eabispe x86_64-linux-gnu
>    sparc64-linux-gnu
>
> at -O3, which should cover all the ports that defined vec_perm_const.
> The only difference was one instance of different RA for ia64-linux-gnu,
> caused by using force_reg on a SUBREG that was previously used directly.
>
> OK to install?
>
> Thanks,
> Richard

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [02/13] Pass vec_perm_indices by reference
  2017-12-09 23:09 ` [02/13] Pass vec_perm_indices by reference Richard Sandiford
@ 2017-12-12 14:23   ` Richard Biener
  0 siblings, 0 replies; 46+ messages in thread
From: Richard Biener @ 2017-12-12 14:23 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Sun, Dec 10, 2017 at 12:09 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch makes functions take vec_perm_indices by reference rather
> than value, since a later patch will turn vec_perm_indices into a class
> that would be more expensive to copy.

Ok.

>
> 2017-12-06  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
>         * fold-const.c (fold_vec_perm): Take a const vec_perm_indices &
>         instead of vec_perm_indices.
>         * tree-vectorizer.h (vect_gen_perm_mask_any): Likewise,
>         (vect_gen_perm_mask_checked): Likewise,
>         * tree-vect-stmts.c (vect_gen_perm_mask_any): Likewise,
>         (vect_gen_perm_mask_checked): Likewise,
>
> Index: gcc/fold-const.c
> ===================================================================
> --- gcc/fold-const.c    2017-12-09 22:47:11.840391388 +0000
> +++ gcc/fold-const.c    2017-12-09 22:47:19.119312754 +0000
> @@ -8801,7 +8801,7 @@ vec_cst_ctor_to_array (tree arg, unsigne
>     NULL_TREE otherwise.  */
>
>  static tree
> -fold_vec_perm (tree type, tree arg0, tree arg1, vec_perm_indices sel)
> +fold_vec_perm (tree type, tree arg0, tree arg1, const vec_perm_indices &sel)
>  {
>    unsigned int i;
>    bool need_ctor = false;
> Index: gcc/tree-vectorizer.h
> ===================================================================
> --- gcc/tree-vectorizer.h       2017-12-09 22:47:11.840391388 +0000
> +++ gcc/tree-vectorizer.h       2017-12-09 22:47:19.120312754 +0000
> @@ -1204,8 +1204,8 @@ extern void vect_get_load_cost (struct d
>  extern void vect_get_store_cost (struct data_reference *, int,
>                                  unsigned int *, stmt_vector_for_cost *);
>  extern bool vect_supportable_shift (enum tree_code, tree);
> -extern tree vect_gen_perm_mask_any (tree, vec_perm_indices);
> -extern tree vect_gen_perm_mask_checked (tree, vec_perm_indices);
> +extern tree vect_gen_perm_mask_any (tree, const vec_perm_indices &);
> +extern tree vect_gen_perm_mask_checked (tree, const vec_perm_indices &);
>  extern void optimize_mask_stores (struct loop*);
>
>  /* In tree-vect-data-refs.c.  */
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2017-12-09 22:47:11.840391388 +0000
> +++ gcc/tree-vect-stmts.c       2017-12-09 22:47:19.119312754 +0000
> @@ -6506,7 +6506,7 @@ vectorizable_store (gimple *stmt, gimple
>     vect_gen_perm_mask_checked.  */
>
>  tree
> -vect_gen_perm_mask_any (tree vectype, vec_perm_indices sel)
> +vect_gen_perm_mask_any (tree vectype, const vec_perm_indices &sel)
>  {
>    tree mask_elt_type, mask_type;
>
> @@ -6527,7 +6527,7 @@ vect_gen_perm_mask_any (tree vectype, ve
>     i.e. that the target supports the pattern _for arbitrary input vectors_.  */
>
>  tree
> -vect_gen_perm_mask_checked (tree vectype, vec_perm_indices sel)
> +vect_gen_perm_mask_checked (tree vectype, const vec_perm_indices &sel)
>  {
>    gcc_assert (can_vec_perm_p (TYPE_MODE (vectype), false, &sel));
>    return vect_gen_perm_mask_any (vectype, sel);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [03/13] Split can_vec_perm_p into can_vec_perm_{var,const}_p
  2017-12-09 23:11 ` [03/13] Split can_vec_perm_p into can_vec_perm_{var,const}_p Richard Sandiford
@ 2017-12-12 14:25   ` Richard Biener
  0 siblings, 0 replies; 46+ messages in thread
From: Richard Biener @ 2017-12-12 14:25 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Sun, Dec 10, 2017 at 12:10 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch splits can_vec_perm_p into two functions: can_vec_perm_var_p
> for testing permute operations with variable selection vectors, and
> can_vec_perm_const_p for testing permute operations with specific
> constant selection vectors.  This means that we can pass the constant
> selection vector by reference.
>
> Constant permutes can still use a variable permute as a fallback.
> A later patch adds a check to make sure that we don't truncate the
> vector indices when doing this.
>
> However, have_whole_vector_shift checked:
>
>   if (direct_optab_handler (vec_perm_const_optab, mode) == CODE_FOR_nothing)
>     return false;
>
> which had the effect of disallowing the fallback to variable permutes.
> I'm not sure whether that was the intention or whether it was just
> supposed to short-cut the loop on targets that don't support permutes.
> (But then why bother?  The first check in the loop would fail and
> we'd bail out straightaway.)
>
> The patch adds a parameter for disallowing the fallback.  I think it
> makes sense to do this for the following code in the VEC_PERM_EXPR
> folder:
>
>           /* Some targets are deficient and fail to expand a single
>              argument permutation while still allowing an equivalent
>              2-argument version.  */
>           if (need_mask_canon && arg2 == op2
>               && !can_vec_perm_p (TYPE_MODE (type), false, &sel)
>               && can_vec_perm_p (TYPE_MODE (type), false, &sel2))
>
> since it's really testing whether the expand_vec_perm_const code expects
> a particular form.

Ok.

>
> 2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
>         * optabs-query.h (can_vec_perm_p): Delete.
>         (can_vec_perm_var_p, can_vec_perm_const_p): Declare.
>         * optabs-query.c (can_vec_perm_p): Split into...
>         (can_vec_perm_var_p, can_vec_perm_const_p): ...these two functions.
>         (can_mult_highpart_p): Use can_vec_perm_const_p to test whether a
>         particular selector is valid.
>         * tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
>         * tree-vect-data-refs.c (vect_grouped_store_supported): Likewise.
>         (vect_grouped_load_supported): Likewise.
>         (vect_shift_permute_load_chain): Likewise.
>         * tree-vect-slp.c (vect_build_slp_tree_1): Likewise.
>         (vect_transform_slp_perm_load): Likewise.
>         * tree-vect-stmts.c (perm_mask_for_reverse): Likewise.
>         (vectorizable_bswap): Likewise.
>         (vect_gen_perm_mask_checked): Likewise.
>         * fold-const.c (fold_ternary_loc): Likewise.  Don't take
>         implementations of variable permutation vectors into account
>         when deciding which selector to use.
>         * tree-vect-loop.c (have_whole_vector_shift): Don't check whether
>         vec_perm_const_optab is supported; instead use can_vec_perm_const_p
>         with a false third argument.
>         * tree-vect-generic.c (lower_vec_perm): Use can_vec_perm_const_p
>         to test whether the constant selector is valid and can_vec_perm_var_p
>         to test whether a variable selector is valid.
>
> Index: gcc/optabs-query.h
> ===================================================================
> --- gcc/optabs-query.h  2017-12-09 22:47:14.730310076 +0000
> +++ gcc/optabs-query.h  2017-12-09 22:47:21.534314227 +0000
> @@ -175,7 +175,9 @@ enum insn_code can_float_p (machine_mode
>  enum insn_code can_fix_p (machine_mode, machine_mode, int, bool *);
>  bool can_conditionally_move_p (machine_mode mode);
>  opt_machine_mode qimode_for_vec_perm (machine_mode);
> -bool can_vec_perm_p (machine_mode, bool, vec_perm_indices *);
> +bool can_vec_perm_var_p (machine_mode);
> +bool can_vec_perm_const_p (machine_mode, const vec_perm_indices &,
> +                          bool = true);
>  /* Find a widening optab even if it doesn't widen as much as we want.  */
>  #define find_widening_optab_handler(A, B, C) \
>    find_widening_optab_handler_and_mode (A, B, C, NULL)
> Index: gcc/optabs-query.c
> ===================================================================
> --- gcc/optabs-query.c  2017-12-09 22:47:14.729310075 +0000
> +++ gcc/optabs-query.c  2017-12-09 22:47:21.534314227 +0000
> @@ -361,58 +361,75 @@ qimode_for_vec_perm (machine_mode mode)
>    return opt_machine_mode ();
>  }
>
> -/* Return true if VEC_PERM_EXPR of arbitrary input vectors can be
> -   expanded using SIMD extensions of the CPU.  SEL may be NULL, which
> -   stands for an unknown constant.  Note that additional permutations
> -   representing whole-vector shifts may also be handled via the vec_shr
> -   optab, but only where the second input vector is entirely constant
> -   zeroes; this case is not dealt with here.  */
> +/* Return true if VEC_PERM_EXPRs with variable selector operands can be
> +   expanded using SIMD extensions of the CPU.  MODE is the mode of the
> +   vectors being permuted.  */
>
>  bool
> -can_vec_perm_p (machine_mode mode, bool variable, vec_perm_indices *sel)
> +can_vec_perm_var_p (machine_mode mode)
>  {
> -  machine_mode qimode;
> -
>    /* If the target doesn't implement a vector mode for the vector type,
>       then no operations are supported.  */
>    if (!VECTOR_MODE_P (mode))
>      return false;
>
> -  if (!variable)
> -    {
> -      if (direct_optab_handler (vec_perm_const_optab, mode) != CODE_FOR_nothing
> -         && (sel == NULL
> -             || targetm.vectorize.vec_perm_const_ok == NULL
> -             || targetm.vectorize.vec_perm_const_ok (mode, *sel)))
> -       return true;
> -    }
> -
>    if (direct_optab_handler (vec_perm_optab, mode) != CODE_FOR_nothing)
>      return true;
>
>    /* We allow fallback to a QI vector mode, and adjust the mask.  */
> +  machine_mode qimode;
>    if (!qimode_for_vec_perm (mode).exists (&qimode))
>      return false;
>
> -  /* ??? For completeness, we ought to check the QImode version of
> -      vec_perm_const_optab.  But all users of this implicit lowering
> -      feature implement the variable vec_perm_optab.  */
>    if (direct_optab_handler (vec_perm_optab, qimode) == CODE_FOR_nothing)
>      return false;
>
>    /* In order to support the lowering of variable permutations,
>       we need to support shifts and adds.  */
> -  if (variable)
> +  if (GET_MODE_UNIT_SIZE (mode) > 2
> +      && optab_handler (ashl_optab, mode) == CODE_FOR_nothing
> +      && optab_handler (vashl_optab, mode) == CODE_FOR_nothing)
> +    return false;
> +  if (optab_handler (add_optab, qimode) == CODE_FOR_nothing)
> +    return false;
> +
> +  return true;
> +}
> +
> +/* Return true if the target directly supports VEC_PERM_EXPRs on vectors
> +   of mode MODE using the selector SEL.  ALLOW_VARIABLE_P is true if it
> +   is acceptable to force the selector into a register and use a variable
> +   permute (if the target supports that).
> +
> +   Note that additional permutations representing whole-vector shifts may
> +   also be handled via the vec_shr optab, but only where the second input
> +   vector is entirely constant zeroes; this case is not dealt with here.  */
> +
> +bool
> +can_vec_perm_const_p (machine_mode mode, const vec_perm_indices &sel,
> +                     bool allow_variable_p)
> +{
> +  /* If the target doesn't implement a vector mode for the vector type,
> +     then no operations are supported.  */
> +  if (!VECTOR_MODE_P (mode))
> +    return false;
> +
> +  /* It's probably cheaper to test for the variable case first.  */
> +  if (allow_variable_p && can_vec_perm_var_p (mode))
> +    return true;
> +
> +  if (direct_optab_handler (vec_perm_const_optab, mode) != CODE_FOR_nothing)
>      {
> -      if (GET_MODE_UNIT_SIZE (mode) > 2
> -         && optab_handler (ashl_optab, mode) == CODE_FOR_nothing
> -         && optab_handler (vashl_optab, mode) == CODE_FOR_nothing)
> -       return false;
> -      if (optab_handler (add_optab, qimode) == CODE_FOR_nothing)
> -       return false;
> +      if (targetm.vectorize.vec_perm_const_ok == NULL
> +         || targetm.vectorize.vec_perm_const_ok (mode, sel))
> +       return true;
> +
> +      /* ??? For completeness, we ought to check the QImode version of
> +        vec_perm_const_optab.  But all users of this implicit lowering
> +        feature implement the variable vec_perm_optab.  */
>      }
>
> -  return true;
> +  return false;
>  }
>
>  /* Find a widening optab even if it doesn't widen as much as we want.
> @@ -472,7 +489,7 @@ can_mult_highpart_p (machine_mode mode,
>             sel.quick_push (!BYTES_BIG_ENDIAN
>                             + (i & ~1)
>                             + ((i & 1) ? nunits : 0));
> -         if (can_vec_perm_p (mode, false, &sel))
> +         if (can_vec_perm_const_p (mode, sel))
>             return 2;
>         }
>      }
> @@ -486,7 +503,7 @@ can_mult_highpart_p (machine_mode mode,
>           auto_vec_perm_indices sel (nunits);
>           for (i = 0; i < nunits; ++i)
>             sel.quick_push (2 * i + (BYTES_BIG_ENDIAN ? 0 : 1));
> -         if (can_vec_perm_p (mode, false, &sel))
> +         if (can_vec_perm_const_p (mode, sel))
>             return 3;
>         }
>      }
> Index: gcc/tree-ssa-forwprop.c
> ===================================================================
> --- gcc/tree-ssa-forwprop.c     2017-12-09 22:47:11.145420483 +0000
> +++ gcc/tree-ssa-forwprop.c     2017-12-09 22:47:21.534314227 +0000
> @@ -2108,7 +2108,7 @@ simplify_vector_constructor (gimple_stmt
>      {
>        tree mask_type;
>
> -      if (!can_vec_perm_p (TYPE_MODE (type), false, &sel))
> +      if (!can_vec_perm_const_p (TYPE_MODE (type), sel))
>         return false;
>        mask_type
>         = build_vector_type (build_nonstandard_integer_type (elem_size, 1),
> Index: gcc/tree-vect-data-refs.c
> ===================================================================
> --- gcc/tree-vect-data-refs.c   2017-12-09 22:47:11.145420483 +0000
> +++ gcc/tree-vect-data-refs.c   2017-12-09 22:47:21.535314227 +0000
> @@ -4587,11 +4587,11 @@ vect_grouped_store_supported (tree vecty
>                   if (3 * i + nelt2 < nelt)
>                     sel[3 * i + nelt2] = 0;
>                 }
> -             if (!can_vec_perm_p (mode, false, &sel))
> +             if (!can_vec_perm_const_p (mode, sel))
>                 {
>                   if (dump_enabled_p ())
>                     dump_printf (MSG_MISSED_OPTIMIZATION,
> -                                "permutaion op not supported by target.\n");
> +                                "permutation op not supported by target.\n");
>                   return false;
>                 }
>
> @@ -4604,11 +4604,11 @@ vect_grouped_store_supported (tree vecty
>                   if (3 * i + nelt2 < nelt)
>                     sel[3 * i + nelt2] = nelt + j2++;
>                 }
> -             if (!can_vec_perm_p (mode, false, &sel))
> +             if (!can_vec_perm_const_p (mode, sel))
>                 {
>                   if (dump_enabled_p ())
>                     dump_printf (MSG_MISSED_OPTIMIZATION,
> -                                "permutaion op not supported by target.\n");
> +                                "permutation op not supported by target.\n");
>                   return false;
>                 }
>             }
> @@ -4624,11 +4624,11 @@ vect_grouped_store_supported (tree vecty
>               sel[i * 2] = i;
>               sel[i * 2 + 1] = i + nelt;
>             }
> -         if (can_vec_perm_p (mode, false, &sel))
> +         if (can_vec_perm_const_p (mode, sel))
>             {
>               for (i = 0; i < nelt; i++)
>                 sel[i] += nelt / 2;
> -             if (can_vec_perm_p (mode, false, &sel))
> +             if (can_vec_perm_const_p (mode, sel))
>                 return true;
>             }
>         }
> @@ -5166,7 +5166,7 @@ vect_grouped_load_supported (tree vectyp
>                   sel[i] = 3 * i + k;
>                 else
>                   sel[i] = 0;
> -             if (!can_vec_perm_p (mode, false, &sel))
> +             if (!can_vec_perm_const_p (mode, sel))
>                 {
>                   if (dump_enabled_p ())
>                     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -5179,7 +5179,7 @@ vect_grouped_load_supported (tree vectyp
>                   sel[i] = i;
>                 else
>                   sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++);
> -             if (!can_vec_perm_p (mode, false, &sel))
> +             if (!can_vec_perm_const_p (mode, sel))
>                 {
>                   if (dump_enabled_p ())
>                     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -5196,11 +5196,11 @@ vect_grouped_load_supported (tree vectyp
>           gcc_assert (pow2p_hwi (count));
>           for (i = 0; i < nelt; i++)
>             sel[i] = i * 2;
> -         if (can_vec_perm_p (mode, false, &sel))
> +         if (can_vec_perm_const_p (mode, sel))
>             {
>               for (i = 0; i < nelt; i++)
>                 sel[i] = i * 2 + 1;
> -             if (can_vec_perm_p (mode, false, &sel))
> +             if (can_vec_perm_const_p (mode, sel))
>                 return true;
>             }
>          }
> @@ -5527,7 +5527,7 @@ vect_shift_permute_load_chain (vec<tree>
>         sel[i] = i * 2;
>        for (i = 0; i < nelt / 2; ++i)
>         sel[nelt / 2 + i] = i * 2 + 1;
> -      if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
> +      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -5541,7 +5541,7 @@ vect_shift_permute_load_chain (vec<tree>
>         sel[i] = i * 2 + 1;
>        for (i = 0; i < nelt / 2; ++i)
>         sel[nelt / 2 + i] = i * 2;
> -      if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
> +      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -5555,7 +5555,7 @@ vect_shift_permute_load_chain (vec<tree>
>          For vector length 8 it is {4 5 6 7 8 9 10 11}.  */
>        for (i = 0; i < nelt; i++)
>         sel[i] = nelt / 2 + i;
> -      if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
> +      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -5570,7 +5570,7 @@ vect_shift_permute_load_chain (vec<tree>
>         sel[i] = i;
>        for (i = nelt / 2; i < nelt; i++)
>         sel[i] = nelt + i;
> -      if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
> +      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -5633,7 +5633,7 @@ vect_shift_permute_load_chain (vec<tree>
>           sel[i] = 3 * k + (l % 3);
>           k++;
>         }
> -      if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
> +      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -5647,7 +5647,7 @@ vect_shift_permute_load_chain (vec<tree>
>          For vector length 8 it is {6 7 8 9 10 11 12 13}.  */
>        for (i = 0; i < nelt; i++)
>         sel[i] = 2 * (nelt / 3) + (nelt % 3) + i;
> -      if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
> +      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -5660,7 +5660,7 @@ vect_shift_permute_load_chain (vec<tree>
>          For vector length 8 it is {5 6 7 8 9 10 11 12}.  */
>        for (i = 0; i < nelt; i++)
>         sel[i] = 2 * (nelt / 3) + 1 + i;
> -      if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
> +      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -5673,7 +5673,7 @@ vect_shift_permute_load_chain (vec<tree>
>          For vector length 8 it is {3 4 5 6 7 8 9 10}.  */
>        for (i = 0; i < nelt; i++)
>         sel[i] = (nelt / 3) + (nelt % 3) / 2 + i;
> -      if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
> +      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -5686,7 +5686,7 @@ vect_shift_permute_load_chain (vec<tree>
>          For vector length 8 it is {5 6 7 8 9 10 11 12}.  */
>        for (i = 0; i < nelt; i++)
>         sel[i] = 2 * (nelt / 3) + (nelt % 3) / 2 + i;
> -      if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
> +      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> Index: gcc/tree-vect-slp.c
> ===================================================================
> --- gcc/tree-vect-slp.c 2017-12-09 22:47:11.145420483 +0000
> +++ gcc/tree-vect-slp.c 2017-12-09 22:47:21.536314228 +0000
> @@ -901,7 +901,7 @@ vect_build_slp_tree_1 (vec_info *vinfo,
>             elt += count;
>           sel.quick_push (elt);
>         }
> -      if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
> +      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
>         {
>           for (i = 0; i < group_size; ++i)
>             if (gimple_assign_rhs_code (stmts[i]) == alt_stmt_code)
> @@ -3646,7 +3646,7 @@ vect_transform_slp_perm_load (slp_tree n
>           if (index == nunits)
>             {
>               if (! noop_p
> -                 && ! can_vec_perm_p (mode, false, &mask))
> +                 && ! can_vec_perm_const_p (mode, mask))
>                 {
>                   if (dump_enabled_p ())
>                     {
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2017-12-09 22:47:19.119312754 +0000
> +++ gcc/tree-vect-stmts.c       2017-12-09 22:47:21.537314229 +0000
> @@ -1720,7 +1720,7 @@ perm_mask_for_reverse (tree vectype)
>    for (i = 0; i < nunits; ++i)
>      sel.quick_push (nunits - 1 - i);
>
> -  if (!can_vec_perm_p (TYPE_MODE (vectype), false, &sel))
> +  if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
>      return NULL_TREE;
>    return vect_gen_perm_mask_checked (vectype, sel);
>  }
> @@ -2502,7 +2502,7 @@ vectorizable_bswap (gimple *stmt, gimple
>      for (unsigned j = 0; j < word_bytes; ++j)
>        elts.quick_push ((i + 1) * word_bytes - j - 1);
>
> -  if (! can_vec_perm_p (TYPE_MODE (char_vectype), false, &elts))
> +  if (!can_vec_perm_const_p (TYPE_MODE (char_vectype), elts))
>      return false;
>
>    if (! vec_stmt)
> @@ -6502,7 +6502,7 @@ vectorizable_store (gimple *stmt, gimple
>
>  /* Given a vector type VECTYPE, turns permutation SEL into the equivalent
>     VECTOR_CST mask.  No checks are made that the target platform supports the
> -   mask, so callers may wish to test can_vec_perm_p separately, or use
> +   mask, so callers may wish to test can_vec_perm_const_p separately, or use
>     vect_gen_perm_mask_checked.  */
>
>  tree
> @@ -6523,13 +6523,13 @@ vect_gen_perm_mask_any (tree vectype, co
>    return mask_elts.build ();
>  }
>
> -/* Checked version of vect_gen_perm_mask_any.  Asserts can_vec_perm_p,
> +/* Checked version of vect_gen_perm_mask_any.  Asserts can_vec_perm_const_p,
>     i.e. that the target supports the pattern _for arbitrary input vectors_.  */
>
>  tree
>  vect_gen_perm_mask_checked (tree vectype, const vec_perm_indices &sel)
>  {
> -  gcc_assert (can_vec_perm_p (TYPE_MODE (vectype), false, &sel));
> +  gcc_assert (can_vec_perm_const_p (TYPE_MODE (vectype), sel));
>    return vect_gen_perm_mask_any (vectype, sel);
>  }
>
> Index: gcc/fold-const.c
> ===================================================================
> --- gcc/fold-const.c    2017-12-09 22:47:19.119312754 +0000
> +++ gcc/fold-const.c    2017-12-09 22:47:21.534314227 +0000
> @@ -11620,8 +11620,8 @@ fold_ternary_loc (location_t loc, enum t
>              argument permutation while still allowing an equivalent
>              2-argument version.  */
>           if (need_mask_canon && arg2 == op2
> -             && !can_vec_perm_p (TYPE_MODE (type), false, &sel)
> -             && can_vec_perm_p (TYPE_MODE (type), false, &sel2))
> +             && !can_vec_perm_const_p (TYPE_MODE (type), sel, false)
> +             && can_vec_perm_const_p (TYPE_MODE (type), sel2, false))
>             {
>               need_mask_canon = need_mask_canon2;
>               sel = sel2;
> Index: gcc/tree-vect-loop.c
> ===================================================================
> --- gcc/tree-vect-loop.c        2017-12-09 22:47:11.145420483 +0000
> +++ gcc/tree-vect-loop.c        2017-12-09 22:47:21.536314228 +0000
> @@ -3730,9 +3730,6 @@ have_whole_vector_shift (machine_mode mo
>    if (optab_handler (vec_shr_optab, mode) != CODE_FOR_nothing)
>      return true;
>
> -  if (direct_optab_handler (vec_perm_const_optab, mode) == CODE_FOR_nothing)
> -    return false;
> -
>    unsigned int i, nelt = GET_MODE_NUNITS (mode);
>    auto_vec_perm_indices sel (nelt);
>
> @@ -3740,7 +3737,7 @@ have_whole_vector_shift (machine_mode mo
>      {
>        sel.truncate (0);
>        calc_vec_perm_mask_for_shift (i, nelt, &sel);
> -      if (!can_vec_perm_p (mode, false, &sel))
> +      if (!can_vec_perm_const_p (mode, sel, false))
>         return false;
>      }
>    return true;
> Index: gcc/tree-vect-generic.c
> ===================================================================
> --- gcc/tree-vect-generic.c     2017-12-09 22:47:11.145420483 +0000
> +++ gcc/tree-vect-generic.c     2017-12-09 22:47:21.535314227 +0000
> @@ -1306,7 +1306,7 @@ lower_vec_perm (gimple_stmt_iterator *gs
>         sel_int.quick_push (TREE_INT_CST_LOW (VECTOR_CST_ELT (mask, i))
>                             & (2 * elements - 1));
>
> -      if (can_vec_perm_p (TYPE_MODE (vect_type), false, &sel_int))
> +      if (can_vec_perm_const_p (TYPE_MODE (vect_type), sel_int))
>         {
>           gimple_assign_set_rhs3 (stmt, mask);
>           update_stmt (stmt);
> @@ -1337,7 +1337,7 @@ lower_vec_perm (gimple_stmt_iterator *gs
>             }
>         }
>      }
> -  else if (can_vec_perm_p (TYPE_MODE (vect_type), true, NULL))
> +  else if (can_vec_perm_var_p (TYPE_MODE (vect_type)))
>      return;
>
>    warning_at (loc, OPT_Wvector_operation_performance,

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [04/13] Refactor expand_vec_perm
  2017-12-09 23:13 ` [04/13] Refactor expand_vec_perm Richard Sandiford
@ 2017-12-12 15:17   ` Richard Biener
  0 siblings, 0 replies; 46+ messages in thread
From: Richard Biener @ 2017-12-12 15:17 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Sun, Dec 10, 2017 at 12:13 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch splits the variable handling out of expand_vec_perm into
> a subroutine, so that the next patch can use a different interface
> for expanding constant permutes.  expand_vec_perm now does all the
> CONST_VECTOR handling directly and defers to expand_vec_perm_var
> for other rtx codes.  Handling CONST_VECTORs includes handling the
> fallback to variable permutes.
>
> The patch also adds an assert for valid optab modes to expand_vec_perm_1,
> so that we get it when using optabs for CONST_VECTORs.  The MODE_VECTOR_INT
> part was previously in expand_vec_perm and the mode_for_int_vector part
> is new.
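> 
> To make the shape of the change easier to see, the two snippets below
> are lifted straight from the patch.  The entry point now starts with
> the dispatch:
> 
>   if (GET_CODE (sel) != CONST_VECTOR)
>     return expand_vec_perm_var (mode, v0, v1, sel, target);
> 
> and expand_vec_perm_1 gains the assert:
> 
>   gcc_assert (GET_MODE_CLASS (smode) == MODE_VECTOR_INT
>               || mode_for_int_vector (tmode).require () == smode);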
>
> Most of the patch is just reindentation, so I've attached a -b version.

Ok.

>
> 2017-12-06  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
>         * optabs.c (expand_vec_perm_1): Assert that SEL has an integer
>         vector mode and that that mode matches the mode of the data
>         being permuted.
>         (expand_vec_perm): Split handling of non-CONST_VECTOR selectors
>         out into expand_vec_perm_var.  Do all CONST_VECTOR handling here,
>         directly using expand_vec_perm_1 when forcing selectors into
>         registers.
>         (expand_vec_perm_var): New function, split out from expand_vec_perm.
>
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c        2017-12-09 22:47:14.731310077 +0000
> +++ gcc/optabs.c        2017-12-09 22:47:23.878315657 +0000
> @@ -5405,6 +5405,8 @@ expand_vec_perm_1 (enum insn_code icode,
>    machine_mode smode = GET_MODE (sel);
>    struct expand_operand ops[4];
>
> +  gcc_assert (GET_MODE_CLASS (smode) == MODE_VECTOR_INT
> +             || mode_for_int_vector (tmode).require () == smode);
>    create_output_operand (&ops[0], target, tmode);
>    create_input_operand (&ops[3], sel, smode);
>
> @@ -5431,8 +5433,13 @@ expand_vec_perm_1 (enum insn_code icode,
>    return NULL_RTX;
>  }
>
> -/* Generate instructions for vec_perm optab given its mode
> -   and three operands.  */
> +static rtx expand_vec_perm_var (machine_mode, rtx, rtx, rtx, rtx);
> +
> +/* Implement a permutation of vectors v0 and v1 using the permutation
> +   vector in SEL and return the result.  Use TARGET to hold the result
> +   if nonnull and convenient.
> +
> +   MODE is the mode of the vectors being permuted (V0 and V1).  */
>
>  rtx
>  expand_vec_perm (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target)
> @@ -5443,6 +5450,9 @@ expand_vec_perm (machine_mode mode, rtx
>    rtx tmp, sel_qi = NULL;
>    rtvec vec;
>
> +  if (GET_CODE (sel) != CONST_VECTOR)
> +    return expand_vec_perm_var (mode, v0, v1, sel, target);
> +
>    if (!target || GET_MODE (target) != mode)
>      target = gen_reg_rtx (mode);
>
> @@ -5455,86 +5465,125 @@ expand_vec_perm (machine_mode mode, rtx
>    if (!qimode_for_vec_perm (mode).exists (&qimode))
>      qimode = VOIDmode;
>
> -  /* If the input is a constant, expand it specially.  */
> -  gcc_assert (GET_MODE_CLASS (GET_MODE (sel)) == MODE_VECTOR_INT);
> -  if (GET_CODE (sel) == CONST_VECTOR)
> -    {
> -      /* See if this can be handled with a vec_shr.  We only do this if the
> -        second vector is all zeroes.  */
> -      enum insn_code shift_code = optab_handler (vec_shr_optab, mode);
> -      enum insn_code shift_code_qi = ((qimode != VOIDmode && qimode != mode)
> -                                     ? optab_handler (vec_shr_optab, qimode)
> -                                     : CODE_FOR_nothing);
> -      rtx shift_amt = NULL_RTX;
> -      if (v1 == CONST0_RTX (GET_MODE (v1))
> -         && (shift_code != CODE_FOR_nothing
> -             || shift_code_qi != CODE_FOR_nothing))
> +  /* See if this can be handled with a vec_shr.  We only do this if the
> +     second vector is all zeroes.  */
> +  insn_code shift_code = optab_handler (vec_shr_optab, mode);
> +  insn_code shift_code_qi = ((qimode != VOIDmode && qimode != mode)
> +                            ? optab_handler (vec_shr_optab, qimode)
> +                            : CODE_FOR_nothing);
> +
> +  if (v1 == CONST0_RTX (GET_MODE (v1))
> +      && (shift_code != CODE_FOR_nothing
> +         || shift_code_qi != CODE_FOR_nothing))
> +    {
> +      rtx shift_amt = shift_amt_for_vec_perm_mask (sel);
> +      if (shift_amt)
>         {
> -         shift_amt = shift_amt_for_vec_perm_mask (sel);
> -         if (shift_amt)
> +         struct expand_operand ops[3];
> +         if (shift_code != CODE_FOR_nothing)
>             {
> -             struct expand_operand ops[3];
> -             if (shift_code != CODE_FOR_nothing)
> -               {
> -                 create_output_operand (&ops[0], target, mode);
> -                 create_input_operand (&ops[1], v0, mode);
> -                 create_convert_operand_from_type (&ops[2], shift_amt,
> -                                                   sizetype);
> -                 if (maybe_expand_insn (shift_code, 3, ops))
> -                   return ops[0].value;
> -               }
> -             if (shift_code_qi != CODE_FOR_nothing)
> -               {
> -                 tmp = gen_reg_rtx (qimode);
> -                 create_output_operand (&ops[0], tmp, qimode);
> -                 create_input_operand (&ops[1], gen_lowpart (qimode, v0),
> -                                       qimode);
> -                 create_convert_operand_from_type (&ops[2], shift_amt,
> -                                                   sizetype);
> -                 if (maybe_expand_insn (shift_code_qi, 3, ops))
> -                   return gen_lowpart (mode, ops[0].value);
> -               }
> +             create_output_operand (&ops[0], target, mode);
> +             create_input_operand (&ops[1], v0, mode);
> +             create_convert_operand_from_type (&ops[2], shift_amt, sizetype);
> +             if (maybe_expand_insn (shift_code, 3, ops))
> +               return ops[0].value;
> +           }
> +         if (shift_code_qi != CODE_FOR_nothing)
> +           {
> +             rtx tmp = gen_reg_rtx (qimode);
> +             create_output_operand (&ops[0], tmp, qimode);
> +             create_input_operand (&ops[1], gen_lowpart (qimode, v0), qimode);
> +             create_convert_operand_from_type (&ops[2], shift_amt, sizetype);
> +             if (maybe_expand_insn (shift_code_qi, 3, ops))
> +               return gen_lowpart (mode, ops[0].value);
>             }
>         }
> +    }
>
> -      icode = direct_optab_handler (vec_perm_const_optab, mode);
> -      if (icode != CODE_FOR_nothing)
> +  icode = direct_optab_handler (vec_perm_const_optab, mode);
> +  if (icode != CODE_FOR_nothing)
> +    {
> +      tmp = expand_vec_perm_1 (icode, target, v0, v1, sel);
> +      if (tmp)
> +       return tmp;
> +    }
> +
> +  /* Fall back to a constant byte-based permutation.  */
> +  if (qimode != VOIDmode)
> +    {
> +      vec = rtvec_alloc (w);
> +      for (i = 0; i < e; ++i)
>         {
> -         tmp = expand_vec_perm_1 (icode, target, v0, v1, sel);
> -         if (tmp)
> -           return tmp;
> +         unsigned int j, this_e;
> +
> +         this_e = INTVAL (CONST_VECTOR_ELT (sel, i));
> +         this_e &= 2 * e - 1;
> +         this_e *= u;
> +
> +         for (j = 0; j < u; ++j)
> +           RTVEC_ELT (vec, i * u + j) = GEN_INT (this_e + j);
>         }
> +      sel_qi = gen_rtx_CONST_VECTOR (qimode, vec);
>
> -      /* Fall back to a constant byte-based permutation.  */
> -      if (qimode != VOIDmode)
> +      icode = direct_optab_handler (vec_perm_const_optab, qimode);
> +      if (icode != CODE_FOR_nothing)
>         {
> -         vec = rtvec_alloc (w);
> -         for (i = 0; i < e; ++i)
> -           {
> -             unsigned int j, this_e;
> +         tmp = gen_reg_rtx (qimode);
> +         tmp = expand_vec_perm_1 (icode, tmp, gen_lowpart (qimode, v0),
> +                                  gen_lowpart (qimode, v1), sel_qi);
> +         if (tmp)
> +           return gen_lowpart (mode, tmp);
> +       }
> +    }
>
> -             this_e = INTVAL (CONST_VECTOR_ELT (sel, i));
> -             this_e &= 2 * e - 1;
> -             this_e *= u;
> +  /* Otherwise expand as a fully variable permutation.  */
>
> -             for (j = 0; j < u; ++j)
> -               RTVEC_ELT (vec, i * u + j) = GEN_INT (this_e + j);
> -           }
> -         sel_qi = gen_rtx_CONST_VECTOR (qimode, vec);
> +  icode = direct_optab_handler (vec_perm_optab, mode);
> +  if (icode != CODE_FOR_nothing)
> +    {
> +      rtx tmp = expand_vec_perm_1 (icode, target, v0, v1, sel);
> +      if (tmp)
> +       return tmp;
> +    }
>
> -         icode = direct_optab_handler (vec_perm_const_optab, qimode);
> -         if (icode != CODE_FOR_nothing)
> -           {
> -             tmp = mode != qimode ? gen_reg_rtx (qimode) : target;
> -             tmp = expand_vec_perm_1 (icode, tmp, gen_lowpart (qimode, v0),
> -                                      gen_lowpart (qimode, v1), sel_qi);
> -             if (tmp)
> -               return gen_lowpart (mode, tmp);
> -           }
> +  if (qimode != VOIDmode)
> +    {
> +      icode = direct_optab_handler (vec_perm_optab, qimode);
> +      if (icode != CODE_FOR_nothing)
> +       {
> +         rtx tmp = gen_reg_rtx (qimode);
> +         tmp = expand_vec_perm_1 (icode, tmp, gen_lowpart (qimode, v0),
> +                                  gen_lowpart (qimode, v1), sel_qi);
> +         if (tmp)
> +           return gen_lowpart (mode, tmp);
>         }
>      }
>
> -  /* Otherwise expand as a fully variable permutation.  */
> +  return NULL_RTX;
> +}
> +
> +/* Implement a permutation of vectors v0 and v1 using the permutation
> +   vector in SEL and return the result.  Use TARGET to hold the result
> +   if nonnull and convenient.
> +
> +   MODE is the mode of the vectors being permuted (V0 and V1).
> +   SEL must have the integer equivalent of MODE and is known to be
> +   unsuitable for permutes with a constant permutation vector.  */
> +
> +static rtx
> +expand_vec_perm_var (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target)
> +{
> +  enum insn_code icode;
> +  unsigned int i, w, u;
> +  rtx tmp, sel_qi;
> +  rtvec vec;
> +
> +  w = GET_MODE_SIZE (mode);
> +  u = GET_MODE_UNIT_SIZE (mode);
> +
> +  if (!target || GET_MODE (target) != mode)
> +    target = gen_reg_rtx (mode);
> +
>    icode = direct_optab_handler (vec_perm_optab, mode);
>    if (icode != CODE_FOR_nothing)
>      {
> @@ -5545,50 +5594,47 @@ expand_vec_perm (machine_mode mode, rtx
>
>    /* As a special case to aid several targets, lower the element-based
>       permutation to a byte-based permutation and try again.  */
> -  if (qimode == VOIDmode)
> +  machine_mode qimode;
> +  if (!qimode_for_vec_perm (mode).exists (&qimode))
>      return NULL_RTX;
>    icode = direct_optab_handler (vec_perm_optab, qimode);
>    if (icode == CODE_FOR_nothing)
>      return NULL_RTX;
>
> -  if (sel_qi == NULL)
> +  /* Multiply each element by its byte size.  */
> +  machine_mode selmode = GET_MODE (sel);
> +  if (u == 2)
> +    sel = expand_simple_binop (selmode, PLUS, sel, sel,
> +                              NULL, 0, OPTAB_DIRECT);
> +  else
> +    sel = expand_simple_binop (selmode, ASHIFT, sel, GEN_INT (exact_log2 (u)),
> +                              NULL, 0, OPTAB_DIRECT);
> +  gcc_assert (sel != NULL);
> +
> +  /* Broadcast the low byte of each element into each of its bytes.  */
> +  vec = rtvec_alloc (w);
> +  for (i = 0; i < w; ++i)
>      {
> -      /* Multiply each element by its byte size.  */
> -      machine_mode selmode = GET_MODE (sel);
> -      if (u == 2)
> -       sel = expand_simple_binop (selmode, PLUS, sel, sel,
> -                                  NULL, 0, OPTAB_DIRECT);
> -      else
> -       sel = expand_simple_binop (selmode, ASHIFT, sel,
> -                                  GEN_INT (exact_log2 (u)),
> -                                  NULL, 0, OPTAB_DIRECT);
> -      gcc_assert (sel != NULL);
> -
> -      /* Broadcast the low byte of each element into each of its bytes.  */
> -      vec = rtvec_alloc (w);
> -      for (i = 0; i < w; ++i)
> -       {
> -         int this_e = i / u * u;
> -         if (BYTES_BIG_ENDIAN)
> -           this_e += u - 1;
> -         RTVEC_ELT (vec, i) = GEN_INT (this_e);
> -       }
> -      tmp = gen_rtx_CONST_VECTOR (qimode, vec);
> -      sel = gen_lowpart (qimode, sel);
> -      sel = expand_vec_perm (qimode, sel, sel, tmp, NULL);
> -      gcc_assert (sel != NULL);
> -
> -      /* Add the byte offset to each byte element.  */
> -      /* Note that the definition of the indices here is memory ordering,
> -        so there should be no difference between big and little endian.  */
> -      vec = rtvec_alloc (w);
> -      for (i = 0; i < w; ++i)
> -       RTVEC_ELT (vec, i) = GEN_INT (i % u);
> -      tmp = gen_rtx_CONST_VECTOR (qimode, vec);
> -      sel_qi = expand_simple_binop (qimode, PLUS, sel, tmp,
> -                                   sel, 0, OPTAB_DIRECT);
> -      gcc_assert (sel_qi != NULL);
> +      int this_e = i / u * u;
> +      if (BYTES_BIG_ENDIAN)
> +       this_e += u - 1;
> +      RTVEC_ELT (vec, i) = GEN_INT (this_e);
>      }
> +  tmp = gen_rtx_CONST_VECTOR (qimode, vec);
> +  sel = gen_lowpart (qimode, sel);
> +  sel = expand_vec_perm (qimode, sel, sel, tmp, NULL);
> +  gcc_assert (sel != NULL);
> +
> +  /* Add the byte offset to each byte element.  */
> +  /* Note that the definition of the indices here is memory ordering,
> +     so there should be no difference between big and little endian.  */
> +  vec = rtvec_alloc (w);
> +  for (i = 0; i < w; ++i)
> +    RTVEC_ELT (vec, i) = GEN_INT (i % u);
> +  tmp = gen_rtx_CONST_VECTOR (qimode, vec);
> +  sel_qi = expand_simple_binop (qimode, PLUS, sel, tmp,
> +                               sel, 0, OPTAB_DIRECT);
> +  gcc_assert (sel_qi != NULL);
>
>    tmp = mode != qimode ? gen_reg_rtx (qimode) : target;
>    tmp = expand_vec_perm_1 (icode, tmp, gen_lowpart (qimode, v0),
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [05/13] Remove vec_perm_const optab
  2017-12-09 23:17 ` [05/13] Remove vec_perm_const optab Richard Sandiford
@ 2017-12-12 15:26   ` Richard Biener
  2017-12-20 13:42     ` Richard Sandiford
  0 siblings, 1 reply; 46+ messages in thread
From: Richard Biener @ 2017-12-12 15:26 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Sun, Dec 10, 2017 at 12:16 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> One of the changes needed for variable-length VEC_PERM_EXPRs -- and for
> long fixed-length VEC_PERM_EXPRs -- is the ability to use constant
> selectors that wouldn't fit in the vectors being permuted.  E.g. a
> permute on two V256QIs can't be done using a V256QI selector.
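> (With two V256QI inputs there are 512 candidate input elements, so the
> selector needs index values up to 511, which do not fit in the QImode
> elements of a V256QI selector.)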
>
> At the moment constant permutes use two interfaces:
> targetm.vectorizer.vec_perm_const_ok for testing whether a permute is
> valid and the vec_perm_const optab for actually emitting the permute.
> The former gets passed a vec<> selector and the latter an rtx selector.
> Most ports share a lot of code between the hook and the optab, with a
> wrapper function for each interface.
>
> We could try to keep that interface and require ports to define wider
> vector modes that could be attached to the CONST_VECTOR (e.g. V256HI or
> V256SI in the example above).  But building a CONST_VECTOR rtx seems a bit
> pointless here, since the expand code only creates the CONST_VECTOR in
> order to call the optab, and the first thing the target does is take
> the CONST_VECTOR apart again.
>
> The easiest approach therefore seemed to be to remove the optab and
> reuse the target hook to emit the code.  One potential drawback is that
> it's no longer possible to use match_operand predicates to force
> operands into the required form, but in practice all targets want
> register operands anyway.
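> 
> For illustration only, a port's combined hook ends up with roughly this
> shape (hypothetical "foo" port, with made-up helper names; the real
> conversions are in the port changes below):
> 
>   static bool
>   foo_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
>                                 rtx op1, const vec_perm_indices &sel)
>   {
>     if (!foo_perm_supported_p (vmode, sel))
>       return false;
>     /* TARGET, OP0 and OP1 are all null when only testing whether the
>        permutation is supported.  */
>     if (target)
>       foo_emit_perm (target, op0, op1, sel);
>     return true;
>   }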
>
> The patch also changes vec_perm_indices into a class that provides
> some simple routines for handling permutations.  A later patch will
> flesh this out and get rid of auto_vec_perm_indices, but I didn't
> want to do all that in this patch and make it more complicated than
> it already is.
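> 
> Callers mostly keep building selectors the same way; e.g. building and
> testing a reversal selector still looks like this (cf. the existing
> perm_mask_for_reverse):
> 
>   auto_vec_perm_indices sel (nunits);
>   for (unsigned int i = 0; i < nunits; ++i)
>     sel.quick_push (nunits - 1 - i);
>   if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
>     return NULL_TREE;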
>
>
> 2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
>         * Makefile.in (OBJS): Add vec-perm-indices.o.
>         * vec-perm-indices.h: New file.
>         * vec-perm-indices.c: Likewise.
>         * target.h (vec_perm_indices): Replace with a forward class
>         declaration.
>         (auto_vec_perm_indices): Move to vec-perm-indices.h.
>         * optabs.h: Include vec-perm-indices.h.
>         (expand_vec_perm): Delete.
>         (selector_fits_mode_p, expand_vec_perm_var): Declare.
>         (expand_vec_perm_const): Declare.
>         * target.def (vec_perm_const_ok): Replace with...
>         (vec_perm_const): ...this new hook.
>         * doc/tm.texi.in (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Replace with...
>         (TARGET_VECTORIZE_VEC_PERM_CONST): ...this new hook.
>         * doc/tm.texi: Regenerate.
>         * optabs.def (vec_perm_const): Delete.
>         * doc/md.texi (vec_perm_const): Likewise.
>         (vec_perm): Refer to TARGET_VECTORIZE_VEC_PERM_CONST.
>         * expr.c (expand_expr_real_2): Use expand_vec_perm_const rather than
>         expand_vec_perm for constant permutation vectors.  Assert that
>         the mode of variable permutation vectors is the integer equivalent
>         of the mode that is being permuted.
>         * optabs-query.h (selector_fits_mode_p): Declare.
>         * optabs-query.c: Include vec-perm-indices.h.
>         (can_vec_perm_const_p): Check whether targetm.vectorize.vec_perm_const
>         is defined, instead of checking whether the vec_perm_const_optab
>         exists.  Use targetm.vectorize.vec_perm_const instead of
>         targetm.vectorize.vec_perm_const_ok.  Check whether the indices
>         fit in the vector mode before using a variable permute.
>         * optabs.c (shift_amt_for_vec_perm_mask): Take a mode and a
>         vec_perm_indices instead of an rtx.
>         (expand_vec_perm): Replace with...
>         (expand_vec_perm_const): ...this new function.  Take the selector
>         as a vec_perm_indices rather than an rtx.  Also take the mode of
>         the selector.  Update call to shift_amt_for_vec_perm_mask.
>         Use targetm.vectorize.vec_perm_const instead of vec_perm_const_optab.
>         Use vec_perm_indices::new_expanded_vector to expand the original
>         selector into bytes.  Check whether the indices fit in the vector
>         mode before using a variable permute.
>         (expand_vec_perm_var): Make global.
>         (expand_mult_highpart): Use expand_vec_perm_const.
>         * fold-const.c: Includes vec-perm-indices.h.
>         * tree-ssa-forwprop.c: Likewise.
>         * tree-vect-data-refs.c: Likewise.
>         * tree-vect-generic.c: Likewise.
>         * tree-vect-loop.c: Likewise.
>         * tree-vect-slp.c: Likewise.
>         * tree-vect-stmts.c: Likewise.
>         * config/aarch64/aarch64-protos.h (aarch64_expand_vec_perm_const):
>         Delete.
>         * config/aarch64/aarch64-simd.md (vec_perm_const<mode>): Delete.
>         * config/aarch64/aarch64.c (aarch64_expand_vec_perm_const)
>         (aarch64_vectorize_vec_perm_const_ok): Fuse into...
>         (aarch64_vectorize_vec_perm_const): ...this new function.
>         (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete.
>         (TARGET_VECTORIZE_VEC_PERM_CONST): Redefine.
>         * config/arm/arm-protos.h (arm_expand_vec_perm_const): Delete.
>         * config/arm/vec-common.md (vec_perm_const<mode>): Delete.
>         * config/arm/arm.c (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete.
>         (TARGET_VECTORIZE_VEC_PERM_CONST): Redefine.
>         (arm_expand_vec_perm_const, arm_vectorize_vec_perm_const_ok): Merge
>         into...
>         (arm_vectorize_vec_perm_const): ...this new function.  Explicitly
>         check for NEON modes.
>         * config/i386/i386-protos.h (ix86_expand_vec_perm_const): Delete.
>         * config/i386/sse.md (VEC_PERM_CONST, vec_perm_const<mode>): Delete.
>         * config/i386/i386.c (ix86_expand_vec_perm_const_1): Update comment.
>         (ix86_expand_vec_perm_const, ix86_vectorize_vec_perm_const_ok): Merge
>         into...
>         (ix86_vectorize_vec_perm_const): ...this new function.  Incorporate
>         the old VEC_PERM_CONST conditions.
>         * config/ia64/ia64-protos.h (ia64_expand_vec_perm_const): Delete.
>         * config/ia64/vect.md (vec_perm_const<mode>): Delete.
>         * config/ia64/ia64.c (ia64_expand_vec_perm_const)
>         (ia64_vectorize_vec_perm_const_ok): Merge into...
>         (ia64_vectorize_vec_perm_const): ...this new function.
>         * config/mips/loongson.md (vec_perm_const<mode>): Delete.
>         * config/mips/mips-msa.md (vec_perm_const<mode>): Delete.
>         * config/mips/mips-ps-3d.md (vec_perm_constv2sf): Delete.
>         * config/mips/mips-protos.h (mips_expand_vec_perm_const): Delete.
>         * config/mips/mips.c (mips_expand_vec_perm_const)
>         (mips_vectorize_vec_perm_const_ok): Merge into...
>         (mips_vectorize_vec_perm_const): ...this new function.
>         * config/powerpcspe/altivec.md (vec_perm_constv16qi): Delete.
>         * config/powerpcspe/paired.md (vec_perm_constv2sf): Delete.
>         * config/powerpcspe/spe.md (vec_perm_constv2si): Delete.
>         * config/powerpcspe/vsx.md (vec_perm_const<mode>): Delete.
>         * config/powerpcspe/powerpcspe-protos.h (altivec_expand_vec_perm_const)
>         (rs6000_expand_vec_perm_const): Delete.
>         * config/powerpcspe/powerpcspe.c (TARGET_VECTORIZE_VEC_PERM_CONST_OK):
>         Delete.
>         (TARGET_VECTORIZE_VEC_PERM_CONST): Redefine.
>         (altivec_expand_vec_perm_const_le): Take each operand individually.
>         Operate on constant selectors rather than rtxes.
>         (altivec_expand_vec_perm_const): Likewise.  Update call to
>         altivec_expand_vec_perm_const_le.
>         (rs6000_expand_vec_perm_const): Delete.
>         (rs6000_vectorize_vec_perm_const_ok): Delete.
>         (rs6000_vectorize_vec_perm_const): New function.
>         (rs6000_do_expand_vec_perm): Take a vec_perm_builder instead of
>         an element count and rtx array.
>         (rs6000_expand_extract_even): Update call accordingly.
>         (rs6000_expand_interleave): Likewise.
>         * config/rs6000/altivec.md (vec_perm_constv16qi): Delete.
>         * config/rs6000/paired.md (vec_perm_constv2sf): Delete.
>         * config/rs6000/vsx.md (vec_perm_const<mode>): Delete.
>         * config/rs6000/rs6000-protos.h (altivec_expand_vec_perm_const)
>         (rs6000_expand_vec_perm_const): Delete.
>         * config/rs6000/rs6000.c (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete.
>         (TARGET_VECTORIZE_VEC_PERM_CONST): Redefine.
>         (altivec_expand_vec_perm_const_le): Take each operand individually.
>         Operate on constant selectors rather than rtxes.
>         (altivec_expand_vec_perm_const): Likewise.  Update call to
>         altivec_expand_vec_perm_const_le.
>         (rs6000_expand_vec_perm_const): Delete.
>         (rs6000_vectorize_vec_perm_const_ok): Delete.
>         (rs6000_vectorize_vec_perm_const): New function.  Remove stray
>         reference to the SPE evmerge instructions.
>         (rs6000_do_expand_vec_perm): Take a vec_perm_builder instead of
>         an element count and rtx array.
>         (rs6000_expand_extract_even): Update call accordingly.
>         (rs6000_expand_interleave): Likewise.
>         * config/sparc/sparc.md (vec_perm_constv8qi): Delete in favor of...
>         * config/sparc/sparc.c (sparc_vectorize_vec_perm_const): ...this
>         new function.
>         (TARGET_VECTORIZE_VEC_PERM_CONST): Redefine.
>
> Index: gcc/Makefile.in
> ===================================================================
> --- gcc/Makefile.in     2017-12-09 22:47:09.549486911 +0000
> +++ gcc/Makefile.in     2017-12-09 22:47:27.854318082 +0000
> @@ -1584,6 +1584,7 @@ OBJS = \
>         var-tracking.o \
>         varasm.o \
>         varpool.o \
> +       vec-perm-indices.o \
>         vmsdbgout.o \
>         vr-values.o \
>         vtable-verify.o \
> Index: gcc/vec-perm-indices.h
> ===================================================================
> --- /dev/null   2017-12-09 13:59:56.352713187 +0000
> +++ gcc/vec-perm-indices.h      2017-12-09 22:47:27.885318101 +0000
> @@ -0,0 +1,49 @@
> +/* A representation of vector permutation indices.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_VEC_PERM_INDICES_H
> +#define GCC_VEC_PERM_INDICES_H 1
> +
> +/* This class represents a constant permutation vector, such as that used
> +   as the final operand to a VEC_PERM_EXPR.  */
> +class vec_perm_indices : public auto_vec<unsigned short, 32>
> +{
> +  typedef unsigned short element_type;
> +  typedef auto_vec<element_type, 32> parent_type;
> +
> +public:
> +  vec_perm_indices () {}
> +  vec_perm_indices (unsigned int nunits) : parent_type (nunits) {}
> +
> +  void new_expanded_vector (const vec_perm_indices &, unsigned int);
> +
> +  bool all_in_range_p (element_type, element_type) const;
> +
> +private:
> +  vec_perm_indices (const vec_perm_indices &);
> +};
> +
> +/* Temporary.  */
> +typedef vec_perm_indices vec_perm_builder;
> +typedef vec_perm_indices auto_vec_perm_indices;
> +
> +bool tree_to_vec_perm_builder (vec_perm_builder *, tree);
> +rtx vec_perm_indices_to_rtx (machine_mode, const vec_perm_indices &);
> +
> +#endif
> Index: gcc/vec-perm-indices.c
> ===================================================================
> --- /dev/null   2017-12-09 13:59:56.352713187 +0000
> +++ gcc/vec-perm-indices.c      2017-12-09 22:47:27.885318101 +0000
> @@ -0,0 +1,93 @@
> +/* A representation of vector permutation indices.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "vec-perm-indices.h"
> +#include "tree.h"
> +#include "backend.h"
> +#include "rtl.h"
> +#include "memmodel.h"
> +#include "emit-rtl.h"
> +
> +/* Switch to a new permutation vector that selects the same input elements
> +   as ORIG, but with each element split into FACTOR pieces.  For example,
> +   if ORIG is { 1, 2, 0, 3 } and FACTOR is 2, the new permutation is
> +   { 2, 3, 4, 5, 0, 1, 6, 7 }.  */
> +
> +void
> +vec_perm_indices::new_expanded_vector (const vec_perm_indices &orig,
> +                                      unsigned int factor)
> +{
> +  truncate (0);
> +  reserve (orig.length () * factor);
> +  for (unsigned int i = 0; i < orig.length (); ++i)
> +    {
> +      element_type base = orig[i] * factor;

No check whether this overflows unsigned short?  (not that this is likely)
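(For the record: element_type is unsigned short, so e.g. with 8-byte
elements an index of 8192 would give 8192 * 8 == 65536, which wraps to
zero; that needs a vector of more than 4096 elements, so indeed unlikely.)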

> +      for (unsigned int j = 0; j < factor; ++j)
> +       quick_push (base + j);
> +    }
> +}
> +
> +/* Return true if all elements of the permutation vector are in the range
> +   [START, START + SIZE).  */
> +
> +bool
> +vec_perm_indices::all_in_range_p (element_type start, element_type size) const
> +{
> +  for (unsigned int i = 0; i < length (); ++i)
> +    if ((*this)[i] < start || ((*this)[i] - start) >= size)
> +      return false;
> +  return true;
> +}
> +
> +/* Try to read the contents of VECTOR_CST CST as a constant permutation
> +   vector.  Return true and add the elements to BUILDER on success,
> +   otherwise return false without modifying BUILDER.  */
> +
> +bool
> +tree_to_vec_perm_builder (vec_perm_builder *builder, tree cst)
> +{
> +  unsigned int nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (cst));
> +  for (unsigned int i = 0; i < nelts; ++i)
> +    if (!tree_fits_shwi_p (vector_cst_elt (cst, i)))

So why specifically shwi and not uhwi?  Shouldn't this also somehow
be checked for IN_RANGE of unsigned short aka vec_perm_indices::element_type?

The rest of the changes look ok, please give target maintainers a
chance to review.

Thanks,
Richard.


> +      return false;
> +
> +  builder->reserve (nelts);
> +  for (unsigned int i = 0; i < nelts; ++i)
> +    builder->quick_push (tree_to_shwi (vector_cst_elt (cst, i))
> +                        & (2 * nelts - 1));
> +  return true;
> +}
> +
> +/* Return a CONST_VECTOR of mode MODE that contains the elements of
> +   INDICES.  */
> +
> +rtx
> +vec_perm_indices_to_rtx (machine_mode mode, const vec_perm_indices &indices)
> +{
> +  gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
> +             && GET_MODE_NUNITS (mode) == indices.length ());
> +  unsigned int nelts = indices.length ();
> +  rtvec v = rtvec_alloc (nelts);
> +  for (unsigned int i = 0; i < nelts; ++i)
> +    RTVEC_ELT (v, i) = gen_int_mode (indices[i], GET_MODE_INNER (mode));
> +  return gen_rtx_CONST_VECTOR (mode, v);
> +}
> Index: gcc/target.h
> ===================================================================
> --- gcc/target.h        2017-12-09 22:47:09.549486911 +0000
> +++ gcc/target.h        2017-12-09 22:47:27.882318099 +0000
> @@ -193,13 +193,7 @@ enum vect_cost_model_location {
>    vect_epilogue = 2
>  };
>
> -/* The type to use for vector permutes with a constant permute vector.
> -   Each entry is an index into the concatenated input vectors.  */
> -typedef vec<unsigned short> vec_perm_indices;
> -
> -/* Same, but can be used to construct local permute vectors that are
> -   automatically freed.  */
> -typedef auto_vec<unsigned short, 32> auto_vec_perm_indices;
> +class vec_perm_indices;
>
>  /* The target structure.  This holds all the backend hooks.  */
>  #define DEFHOOKPOD(NAME, DOC, TYPE, INIT) TYPE NAME;
> Index: gcc/optabs.h
> ===================================================================
> --- gcc/optabs.h        2017-12-09 22:47:09.549486911 +0000
> +++ gcc/optabs.h        2017-12-09 22:47:27.882318099 +0000
> @@ -22,6 +22,7 @@ #define GCC_OPTABS_H
>
>  #include "optabs-query.h"
>  #include "optabs-libfuncs.h"
> +#include "vec-perm-indices.h"
>
>  /* Generate code for a widening multiply.  */
>  extern rtx expand_widening_mult (machine_mode, rtx, rtx, rtx, int, optab);
> @@ -307,7 +308,9 @@ extern int have_insn_for (enum rtx_code,
>  extern rtx_insn *gen_cond_trap (enum rtx_code, rtx, rtx, rtx);
>
>  /* Generate code for VEC_PERM_EXPR.  */
> -extern rtx expand_vec_perm (machine_mode, rtx, rtx, rtx, rtx);
> +extern rtx expand_vec_perm_var (machine_mode, rtx, rtx, rtx, rtx);
> +extern rtx expand_vec_perm_const (machine_mode, rtx, rtx,
> +                                 const vec_perm_builder &, machine_mode, rtx);
>
>  /* Generate code for vector comparison.  */
>  extern rtx expand_vec_cmp_expr (tree, tree, rtx);
> Index: gcc/target.def
> ===================================================================
> --- gcc/target.def      2017-12-09 22:47:09.549486911 +0000
> +++ gcc/target.def      2017-12-09 22:47:27.882318099 +0000
> @@ -1841,12 +1841,27 @@ DEFHOOK
>   bool, (const_tree type, bool is_packed),
>   default_builtin_vector_alignment_reachable)
>
> -/* Return true if a vector created for vec_perm_const is valid.
> -   A NULL indicates that all constants are valid permutations.  */
>  DEFHOOK
> -(vec_perm_const_ok,
> - "Return true if a vector created for @code{vec_perm_const} is valid.",
> - bool, (machine_mode, vec_perm_indices),
> +(vec_perm_const,
> + "This hook is used to test whether the target can permute up to two\n\
> +vectors of mode @var{mode} using the permutation vector @code{sel}, and\n\
> +also to emit such a permutation.  In the former case @var{in0}, @var{in1}\n\
> +and @var{out} are all null.  In the latter case @var{in0} and @var{in1} are\n\
> +the source vectors and @var{out} is the destination vector; all three are\n\
> +registers of mode @var{mode}.  @var{in1} is the same as @var{in0} if\n\
> +@var{sel} describes a permutation on one vector instead of two.\n\
> +\n\
> +Return true if the operation is possible, emitting instructions for it\n\
> +if rtxes are provided.\n\
> +\n\
> +@cindex @code{vec_perm@var{m}} instruction pattern\n\
> +If the hook returns false for a mode with multibyte elements, GCC will\n\
> +try the equivalent byte operation.  If that also fails, it will try forcing\n\
> +the selector into a register and using the @var{vec_perm@var{mode}}\n\
> +instruction pattern.  There is no need for the hook to handle these two\n\
> +implementation approaches itself.",
> + bool, (machine_mode mode, rtx output, rtx in0, rtx in1,
> +       const vec_perm_indices &sel),
>   NULL)
>
>  /* Return true if the target supports misaligned store/load of a
> Index: gcc/doc/tm.texi.in
> ===================================================================
> --- gcc/doc/tm.texi.in  2017-12-09 22:47:09.549486911 +0000
> +++ gcc/doc/tm.texi.in  2017-12-09 22:47:27.879318098 +0000
> @@ -4079,7 +4079,7 @@ address;  but often a machine-dependent
>
>  @hook TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE
>
> -@hook TARGET_VECTORIZE_VEC_PERM_CONST_OK
> +@hook TARGET_VECTORIZE_VEC_PERM_CONST
>
>  @hook TARGET_VECTORIZE_BUILTIN_CONVERSION
>
> Index: gcc/doc/tm.texi
> ===================================================================
> --- gcc/doc/tm.texi     2017-12-09 22:47:09.549486911 +0000
> +++ gcc/doc/tm.texi     2017-12-09 22:47:27.878318097 +0000
> @@ -5798,8 +5798,24 @@ correct for most targets.
>  Return true if vector alignment is reachable (by peeling N iterations) for the given scalar type @var{type}.  @var{is_packed} is false if the scalar access using @var{type} is known to be naturally aligned.
>  @end deftypefn
>
> -@deftypefn {Target Hook} bool TARGET_VECTORIZE_VEC_PERM_CONST_OK (machine_mode, @var{vec_perm_indices})
> -Return true if a vector created for @code{vec_perm_const} is valid.
> +@deftypefn {Target Hook} bool TARGET_VECTORIZE_VEC_PERM_CONST (machine_mode @var{mode}, rtx @var{output}, rtx @var{in0}, rtx @var{in1}, const vec_perm_indices @var{&sel})
> +This hook is used to test whether the target can permute up to two
> +vectors of mode @var{mode} using the permutation vector @code{sel}, and
> +also to emit such a permutation.  In the former case @var{in0}, @var{in1}
> +and @var{out} are all null.  In the latter case @var{in0} and @var{in1} are
> +the source vectors and @var{out} is the destination vector; all three are
> +registers of mode @var{mode}.  @var{in1} is the same as @var{in0} if
> +@var{sel} describes a permutation on one vector instead of two.
> +
> +Return true if the operation is possible, emitting instructions for it
> +if rtxes are provided.
> +
> +@cindex @code{vec_perm@var{m}} instruction pattern
> +If the hook returns false for a mode with multibyte elements, GCC will
> +try the equivalent byte operation.  If that also fails, it will try forcing
> +the selector into a register and using the @var{vec_perm@var{mode}}
> +instruction pattern.  There is no need for the hook to handle these two
> +implementation approaches itself.
>  @end deftypefn
>
>  @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_CONVERSION (unsigned @var{code}, tree @var{dest_type}, tree @var{src_type})
> Index: gcc/optabs.def
> ===================================================================
> --- gcc/optabs.def      2017-12-09 22:47:09.549486911 +0000
> +++ gcc/optabs.def      2017-12-09 22:47:27.882318099 +0000
> @@ -302,7 +302,6 @@ OPTAB_D (vec_pack_ssat_optab, "vec_pack_
>  OPTAB_D (vec_pack_trunc_optab, "vec_pack_trunc_$a")
>  OPTAB_D (vec_pack_ufix_trunc_optab, "vec_pack_ufix_trunc_$a")
>  OPTAB_D (vec_pack_usat_optab, "vec_pack_usat_$a")
> -OPTAB_D (vec_perm_const_optab, "vec_perm_const$a")
>  OPTAB_D (vec_perm_optab, "vec_perm$a")
>  OPTAB_D (vec_realign_load_optab, "vec_realign_load_$a")
>  OPTAB_D (vec_set_optab, "vec_set$a")
> Index: gcc/doc/md.texi
> ===================================================================
> --- gcc/doc/md.texi     2017-12-09 22:47:09.549486911 +0000
> +++ gcc/doc/md.texi     2017-12-09 22:47:27.877318096 +0000
> @@ -4972,20 +4972,8 @@ where @var{q} is a vector of @code{QImod
>  the middle-end will lower the mode @var{m} @code{VEC_PERM_EXPR} to
>  mode @var{q}.
>
> -@cindex @code{vec_perm_const@var{m}} instruction pattern
> -@item @samp{vec_perm_const@var{m}}
> -Like @samp{vec_perm} except that the permutation is a compile-time
> -constant.  That is, operand 3, the @dfn{selector}, is a @code{CONST_VECTOR}.
> -
> -Some targets cannot perform a permutation with a variable selector,
> -but can efficiently perform a constant permutation.  Further, the
> -target hook @code{vec_perm_ok} is queried to determine if the
> -specific constant permutation is available efficiently; the named
> -pattern is never expanded without @code{vec_perm_ok} returning true.
> -
> -There is no need for a target to supply both @samp{vec_perm@var{m}}
> -and @samp{vec_perm_const@var{m}} if the former can trivially implement
> -the operation with, say, the vector constant loaded into a register.
> +See also @code{TARGET_VECTORIZE_VEC_PERM_CONST}, which performs
> +the analogous operation for constant selectors.
>
>  @cindex @code{push@var{m}1} instruction pattern
>  @item @samp{push@var{m}1}
> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c  2017-12-09 22:47:09.549486911 +0000
> +++ gcc/expr.c  2017-12-09 22:47:27.880318098 +0000
> @@ -9439,28 +9439,24 @@ #define REDUCE_BIT_FIELD(expr)  (reduce_b
>        goto binop;
>
>      case VEC_PERM_EXPR:
> -      expand_operands (treeop0, treeop1, target, &op0, &op1, EXPAND_NORMAL);
> -      op2 = expand_normal (treeop2);
> -
> -      /* Careful here: if the target doesn't support integral vector modes,
> -        a constant selection vector could wind up smooshed into a normal
> -        integral constant.  */
> -      if (CONSTANT_P (op2) && !VECTOR_MODE_P (GET_MODE (op2)))
> -       {
> -         tree sel_type = TREE_TYPE (treeop2);
> -         machine_mode vmode
> -           = mode_for_vector (SCALAR_TYPE_MODE (TREE_TYPE (sel_type)),
> -                              TYPE_VECTOR_SUBPARTS (sel_type)).require ();
> -         gcc_assert (GET_MODE_CLASS (vmode) == MODE_VECTOR_INT);
> -         op2 = simplify_subreg (vmode, op2, TYPE_MODE (sel_type), 0);
> -         gcc_assert (op2 && GET_CODE (op2) == CONST_VECTOR);
> -       }
> -      else
> -        gcc_assert (GET_MODE_CLASS (GET_MODE (op2)) == MODE_VECTOR_INT);
> -
> -      temp = expand_vec_perm (mode, op0, op1, op2, target);
> -      gcc_assert (temp);
> -      return temp;
> +      {
> +       expand_operands (treeop0, treeop1, target, &op0, &op1, EXPAND_NORMAL);
> +       vec_perm_builder sel;
> +       if (TREE_CODE (treeop2) == VECTOR_CST
> +           && tree_to_vec_perm_builder (&sel, treeop2))
> +         {
> +           machine_mode sel_mode = TYPE_MODE (TREE_TYPE (treeop2));
> +           temp = expand_vec_perm_const (mode, op0, op1, sel,
> +                                         sel_mode, target);
> +         }
> +       else
> +         {
> +           op2 = expand_normal (treeop2);
> +           temp = expand_vec_perm_var (mode, op0, op1, op2, target);
> +         }
> +       gcc_assert (temp);
> +       return temp;
> +      }
>
>      case DOT_PROD_EXPR:
>        {
> Index: gcc/optabs-query.h
> ===================================================================
> --- gcc/optabs-query.h  2017-12-09 22:47:21.534314227 +0000
> +++ gcc/optabs-query.h  2017-12-09 22:47:27.881318099 +0000
> @@ -175,6 +175,7 @@ enum insn_code can_float_p (machine_mode
>  enum insn_code can_fix_p (machine_mode, machine_mode, int, bool *);
>  bool can_conditionally_move_p (machine_mode mode);
>  opt_machine_mode qimode_for_vec_perm (machine_mode);
> +bool selector_fits_mode_p (machine_mode, const vec_perm_indices &);
>  bool can_vec_perm_var_p (machine_mode);
>  bool can_vec_perm_const_p (machine_mode, const vec_perm_indices &,
>                            bool = true);
> Index: gcc/optabs-query.c
> ===================================================================
> --- gcc/optabs-query.c  2017-12-09 22:47:25.861316866 +0000
> +++ gcc/optabs-query.c  2017-12-09 22:47:27.881318099 +0000
> @@ -28,6 +28,7 @@ Software Foundation; either version 3, o
>  #include "insn-config.h"
>  #include "rtl.h"
>  #include "recog.h"
> +#include "vec-perm-indices.h"
>
>  struct target_optabs default_target_optabs;
>  struct target_optabs *this_fn_optabs = &default_target_optabs;
> @@ -361,6 +362,17 @@ qimode_for_vec_perm (machine_mode mode)
>    return opt_machine_mode ();
>  }
>
> +/* Return true if selector SEL can be represented in the integer
> +   equivalent of vector mode MODE.  */
> +
> +bool
> +selector_fits_mode_p (machine_mode mode, const vec_perm_indices &sel)
> +{
> +  unsigned HOST_WIDE_INT mask = GET_MODE_MASK (GET_MODE_INNER (mode));
> +  return (mask == HOST_WIDE_INT_M1U
> +         || sel.all_in_range_p (0, mask + 1));
> +}
> +
>  /* Return true if VEC_PERM_EXPRs with variable selector operands can be
>     expanded using SIMD extensions of the CPU.  MODE is the mode of the
>     vectors being permuted.  */
> @@ -416,18 +428,22 @@ can_vec_perm_const_p (machine_mode mode,
>      return false;
>
>    /* It's probably cheaper to test for the variable case first.  */
> -  if (allow_variable_p && can_vec_perm_var_p (mode))
> +  if (allow_variable_p
> +      && selector_fits_mode_p (mode, sel)
> +      && can_vec_perm_var_p (mode))
>      return true;
>
> -  if (direct_optab_handler (vec_perm_const_optab, mode) != CODE_FOR_nothing)
> +  if (targetm.vectorize.vec_perm_const != NULL)
>      {
> -      if (targetm.vectorize.vec_perm_const_ok == NULL
> -         || targetm.vectorize.vec_perm_const_ok (mode, sel))
> +      if (targetm.vectorize.vec_perm_const (mode, NULL_RTX, NULL_RTX,
> +                                           NULL_RTX, sel))
>         return true;
>
>        /* ??? For completeness, we ought to check the QImode version of
>          vec_perm_const_optab.  But all users of this implicit lowering
> -        feature implement the variable vec_perm_optab.  */
> +        feature implement the variable vec_perm_optab, and the ia64
> +        port specifically doesn't want us to lower V2SF operations
> +        into integer operations.  */
>      }
>
>    return false;
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c        2017-12-09 22:47:25.861316866 +0000
> +++ gcc/optabs.c        2017-12-09 22:47:27.881318099 +0000
> @@ -5367,25 +5367,23 @@ vector_compare_rtx (machine_mode cmp_mod
>    return gen_rtx_fmt_ee (rcode, cmp_mode, ops[0].value, ops[1].value);
>  }
>
> -/* Checks if vec_perm mask SEL is a constant equivalent to a shift of the first
> -   vec_perm operand, assuming the second operand is a constant vector of zeroes.
> -   Return the shift distance in bits if so, or NULL_RTX if the vec_perm is not a
> -   shift.  */
> +/* Check if vec_perm mask SEL is a constant equivalent to a shift of
> +   the first vec_perm operand, assuming the second operand is a constant
> +   vector of zeros.  Return the shift distance in bits if so, or NULL_RTX
> +   if the vec_perm is not a shift.  MODE is the mode of the value being
> +   shifted.  */
>  static rtx
> -shift_amt_for_vec_perm_mask (rtx sel)
> +shift_amt_for_vec_perm_mask (machine_mode mode, const vec_perm_indices &sel)
>  {
> -  unsigned int i, first, nelt = GET_MODE_NUNITS (GET_MODE (sel));
> -  unsigned int bitsize = GET_MODE_UNIT_BITSIZE (GET_MODE (sel));
> +  unsigned int i, first, nelt = GET_MODE_NUNITS (mode);
> +  unsigned int bitsize = GET_MODE_UNIT_BITSIZE (mode);
>
> -  if (GET_CODE (sel) != CONST_VECTOR)
> -    return NULL_RTX;
> -
> -  first = INTVAL (CONST_VECTOR_ELT (sel, 0));
> +  first = sel[0];
>    if (first >= nelt)
>      return NULL_RTX;
>    for (i = 1; i < nelt; i++)
>      {
> -      int idx = INTVAL (CONST_VECTOR_ELT (sel, i));
> +      int idx = sel[i];
>        unsigned int expected = i + first;
>        /* Indices into the second vector are all equivalent.  */
>        if (idx < 0 || (MIN (nelt, (unsigned) idx) != MIN (nelt, expected)))
> @@ -5395,7 +5393,7 @@ shift_amt_for_vec_perm_mask (rtx sel)
>    return GEN_INT (first * bitsize);
>  }
>
> -/* A subroutine of expand_vec_perm for expanding one vec_perm insn.  */
> +/* A subroutine of expand_vec_perm_var for expanding one vec_perm insn.  */
>
>  static rtx
>  expand_vec_perm_1 (enum insn_code icode, rtx target,
> @@ -5433,38 +5431,32 @@ expand_vec_perm_1 (enum insn_code icode,
>    return NULL_RTX;
>  }
>
> -static rtx expand_vec_perm_var (machine_mode, rtx, rtx, rtx, rtx);
> -
>  /* Implement a permutation of vectors v0 and v1 using the permutation
>     vector in SEL and return the result.  Use TARGET to hold the result
>     if nonnull and convenient.
>
> -   MODE is the mode of the vectors being permuted (V0 and V1).  */
> +   MODE is the mode of the vectors being permuted (V0 and V1).  SEL_MODE
> +   is the TYPE_MODE associated with SEL, or BLKmode if SEL isn't known
> +   to have a particular mode.  */
>
>  rtx
> -expand_vec_perm (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target)
> +expand_vec_perm_const (machine_mode mode, rtx v0, rtx v1,
> +                      const vec_perm_builder &sel, machine_mode sel_mode,
> +                      rtx target)
>  {
> -  enum insn_code icode;
> -  machine_mode qimode;
> -  unsigned int i, w, e, u;
> -  rtx tmp, sel_qi = NULL;
> -  rtvec vec;
> -
> -  if (GET_CODE (sel) != CONST_VECTOR)
> -    return expand_vec_perm_var (mode, v0, v1, sel, target);
> -
> -  if (!target || GET_MODE (target) != mode)
> +  if (!target || !register_operand (target, mode))
>      target = gen_reg_rtx (mode);
>
> -  w = GET_MODE_SIZE (mode);
> -  e = GET_MODE_NUNITS (mode);
> -  u = GET_MODE_UNIT_SIZE (mode);
> -
>    /* Set QIMODE to a different vector mode with byte elements.
>       If no such mode, or if MODE already has byte elements, use VOIDmode.  */
> +  machine_mode qimode;
>    if (!qimode_for_vec_perm (mode).exists (&qimode))
>      qimode = VOIDmode;
>
> +  rtx_insn *last = get_last_insn ();
> +
> +  bool single_arg_p = rtx_equal_p (v0, v1);
> +
>    /* See if this can be handled with a vec_shr.  We only do this if the
>       second vector is all zeroes.  */
>    insn_code shift_code = optab_handler (vec_shr_optab, mode);
> @@ -5476,7 +5468,7 @@ expand_vec_perm (machine_mode mode, rtx
>        && (shift_code != CODE_FOR_nothing
>           || shift_code_qi != CODE_FOR_nothing))
>      {
> -      rtx shift_amt = shift_amt_for_vec_perm_mask (sel);
> +      rtx shift_amt = shift_amt_for_vec_perm_mask (mode, sel);
>        if (shift_amt)
>         {
>           struct expand_operand ops[3];
> @@ -5500,65 +5492,81 @@ expand_vec_perm (machine_mode mode, rtx
>         }
>      }
>
> -  icode = direct_optab_handler (vec_perm_const_optab, mode);
> -  if (icode != CODE_FOR_nothing)
> +  if (targetm.vectorize.vec_perm_const != NULL)
>      {
> -      tmp = expand_vec_perm_1 (icode, target, v0, v1, sel);
> -      if (tmp)
> -       return tmp;
> +      v0 = force_reg (mode, v0);
> +      if (single_arg_p)
> +       v1 = v0;
> +      else
> +       v1 = force_reg (mode, v1);
> +
> +      if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, sel))
> +       return target;
>      }
>
>    /* Fall back to a constant byte-based permutation.  */
> +  vec_perm_indices qimode_indices;
> +  rtx target_qi = NULL_RTX, v0_qi = NULL_RTX, v1_qi = NULL_RTX;
>    if (qimode != VOIDmode)
>      {
> -      vec = rtvec_alloc (w);
> -      for (i = 0; i < e; ++i)
> -       {
> -         unsigned int j, this_e;
> +      qimode_indices.new_expanded_vector (sel, GET_MODE_UNIT_SIZE (mode));
> +      target_qi = gen_reg_rtx (qimode);
> +      v0_qi = gen_lowpart (qimode, v0);
> +      v1_qi = gen_lowpart (qimode, v1);
> +      if (targetm.vectorize.vec_perm_const != NULL
> +         && targetm.vectorize.vec_perm_const (qimode, target_qi, v0_qi,
> +                                              v1_qi, qimode_indices))
> +       return gen_lowpart (mode, target_qi);
> +    }
>
> -         this_e = INTVAL (CONST_VECTOR_ELT (sel, i));
> -         this_e &= 2 * e - 1;
> -         this_e *= u;
> +  /* Otherwise expand as a fully variable permutation.  */
>
> -         for (j = 0; j < u; ++j)
> -           RTVEC_ELT (vec, i * u + j) = GEN_INT (this_e + j);
> -       }
> -      sel_qi = gen_rtx_CONST_VECTOR (qimode, vec);
> +  /* The optabs are only defined for selectors with the same width
> +     as the values being permuted.  */
> +  machine_mode required_sel_mode;
> +  if (!mode_for_int_vector (mode).exists (&required_sel_mode)
> +      || !VECTOR_MODE_P (required_sel_mode))
> +    {
> +      delete_insns_since (last);
> +      return NULL_RTX;
> +    }
>
> -      icode = direct_optab_handler (vec_perm_const_optab, qimode);
> -      if (icode != CODE_FOR_nothing)
> +  /* We know that it is semantically valid to treat SEL as having SEL_MODE.
> +     If that isn't the mode we want then we need to prove that using
> +     REQUIRED_SEL_MODE is OK.  */
> +  if (sel_mode != required_sel_mode)
> +    {
> +      if (!selector_fits_mode_p (required_sel_mode, sel))
>         {
> -         tmp = gen_reg_rtx (qimode);
> -         tmp = expand_vec_perm_1 (icode, tmp, gen_lowpart (qimode, v0),
> -                                  gen_lowpart (qimode, v1), sel_qi);
> -         if (tmp)
> -           return gen_lowpart (mode, tmp);
> +         delete_insns_since (last);
> +         return NULL_RTX;
>         }
> +      sel_mode = required_sel_mode;
>      }
>
> -  /* Otherwise expand as a fully variable permutation.  */
> -
> -  icode = direct_optab_handler (vec_perm_optab, mode);
> +  insn_code icode = direct_optab_handler (vec_perm_optab, mode);
>    if (icode != CODE_FOR_nothing)
>      {
> -      rtx tmp = expand_vec_perm_1 (icode, target, v0, v1, sel);
> +      rtx sel_rtx = vec_perm_indices_to_rtx (sel_mode, sel);
> +      rtx tmp = expand_vec_perm_1 (icode, target, v0, v1, sel_rtx);
>        if (tmp)
>         return tmp;
>      }
>
> -  if (qimode != VOIDmode)
> +  if (qimode != VOIDmode
> +      && selector_fits_mode_p (qimode, qimode_indices))
>      {
>        icode = direct_optab_handler (vec_perm_optab, qimode);
>        if (icode != CODE_FOR_nothing)
>         {
> -         rtx tmp = gen_reg_rtx (qimode);
> -         tmp = expand_vec_perm_1 (icode, tmp, gen_lowpart (qimode, v0),
> -                                  gen_lowpart (qimode, v1), sel_qi);
> +         rtx sel_qi = vec_perm_indices_to_rtx (qimode, qimode_indices);
> +         rtx tmp = expand_vec_perm_1 (icode, target_qi, v0_qi, v1_qi, sel_qi);
>           if (tmp)
>             return gen_lowpart (mode, tmp);
>         }
>      }
>
> +  delete_insns_since (last);
>    return NULL_RTX;
>  }
>
> @@ -5570,7 +5578,7 @@ expand_vec_perm (machine_mode mode, rtx
>     SEL must have the integer equivalent of MODE and is known to be
>     unsuitable for permutes with a constant permutation vector.  */
>
> -static rtx
> +rtx
>  expand_vec_perm_var (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target)
>  {
>    enum insn_code icode;
> @@ -5613,17 +5621,16 @@ expand_vec_perm_var (machine_mode mode,
>    gcc_assert (sel != NULL);
>
>    /* Broadcast the low byte of each element into each of its bytes.  */
> -  vec = rtvec_alloc (w);
> +  vec_perm_builder const_sel (w);
>    for (i = 0; i < w; ++i)
>      {
>        int this_e = i / u * u;
>        if (BYTES_BIG_ENDIAN)
>         this_e += u - 1;
> -      RTVEC_ELT (vec, i) = GEN_INT (this_e);
> +      const_sel.quick_push (this_e);
>      }
> -  tmp = gen_rtx_CONST_VECTOR (qimode, vec);
>    sel = gen_lowpart (qimode, sel);
> -  sel = expand_vec_perm (qimode, sel, sel, tmp, NULL);
> +  sel = expand_vec_perm_const (qimode, sel, sel, const_sel, qimode, NULL);
>    gcc_assert (sel != NULL);
>
>    /* Add the byte offset to each byte element.  */
> @@ -5797,9 +5804,8 @@ expand_mult_highpart (machine_mode mode,
>    enum insn_code icode;
>    int method, i, nunits;
>    machine_mode wmode;
> -  rtx m1, m2, perm;
> +  rtx m1, m2;
>    optab tab1, tab2;
> -  rtvec v;
>
>    method = can_mult_highpart_p (mode, uns_p);
>    switch (method)
> @@ -5842,21 +5848,20 @@ expand_mult_highpart (machine_mode mode,
>    expand_insn (optab_handler (tab2, mode), 3, eops);
>    m2 = gen_lowpart (mode, eops[0].value);
>
> -  v = rtvec_alloc (nunits);
> +  auto_vec_perm_indices sel (nunits);
>    if (method == 2)
>      {
>        for (i = 0; i < nunits; ++i)
> -       RTVEC_ELT (v, i) = GEN_INT (!BYTES_BIG_ENDIAN + (i & ~1)
> -                                   + ((i & 1) ? nunits : 0));
> -      perm = gen_rtx_CONST_VECTOR (mode, v);
> +       sel.quick_push (!BYTES_BIG_ENDIAN + (i & ~1)
> +                       + ((i & 1) ? nunits : 0));
>      }
>    else
>      {
> -      int base = BYTES_BIG_ENDIAN ? 0 : 1;
> -      perm = gen_const_vec_series (mode, GEN_INT (base), GEN_INT (2));
> +      for (i = 0; i < nunits; ++i)
> +       sel.quick_push (2 * i + (BYTES_BIG_ENDIAN ? 0 : 1));
>      }
>
> -  return expand_vec_perm (mode, m1, m2, perm, target);
> +  return expand_vec_perm_const (mode, m1, m2, sel, BLKmode, target);
>  }
>
>  /* Helper function to find the MODE_CC set in a sync_compare_and_swap
> Index: gcc/fold-const.c
> ===================================================================
> --- gcc/fold-const.c    2017-12-09 22:47:21.534314227 +0000
> +++ gcc/fold-const.c    2017-12-09 22:47:27.881318099 +0000
> @@ -82,6 +82,7 @@ Software Foundation; either version 3, o
>  #include "stringpool.h"
>  #include "attribs.h"
>  #include "tree-vector-builder.h"
> +#include "vec-perm-indices.h"
>
>  /* Nonzero if we are folding constants inside an initializer; zero
>     otherwise.  */
> Index: gcc/tree-ssa-forwprop.c
> ===================================================================
> --- gcc/tree-ssa-forwprop.c     2017-12-09 22:47:21.534314227 +0000
> +++ gcc/tree-ssa-forwprop.c     2017-12-09 22:47:27.883318100 +0000
> @@ -47,6 +47,7 @@ the Free Software Foundation; either ver
>  #include "cfganal.h"
>  #include "optabs-tree.h"
>  #include "tree-vector-builder.h"
> +#include "vec-perm-indices.h"
>
>  /* This pass propagates the RHS of assignment statements into use
>     sites of the LHS of the assignment.  It's basically a specialized
> Index: gcc/tree-vect-data-refs.c
> ===================================================================
> --- gcc/tree-vect-data-refs.c   2017-12-09 22:47:21.535314227 +0000
> +++ gcc/tree-vect-data-refs.c   2017-12-09 22:47:27.883318100 +0000
> @@ -52,6 +52,7 @@ Software Foundation; either version 3, o
>  #include "params.h"
>  #include "tree-cfg.h"
>  #include "tree-hash-traits.h"
> +#include "vec-perm-indices.h"
>
>  /* Return true if load- or store-lanes optab OPTAB is implemented for
>     COUNT vectors of type VECTYPE.  NAME is the name of OPTAB.  */
> Index: gcc/tree-vect-generic.c
> ===================================================================
> --- gcc/tree-vect-generic.c     2017-12-09 22:47:21.535314227 +0000
> +++ gcc/tree-vect-generic.c     2017-12-09 22:47:27.883318100 +0000
> @@ -38,6 +38,7 @@ Free Software Foundation; either version
>  #include "gimplify.h"
>  #include "tree-cfg.h"
>  #include "tree-vector-builder.h"
> +#include "vec-perm-indices.h"
>
>
>  static void expand_vector_operations_1 (gimple_stmt_iterator *);
> Index: gcc/tree-vect-loop.c
> ===================================================================
> --- gcc/tree-vect-loop.c        2017-12-09 22:47:21.536314228 +0000
> +++ gcc/tree-vect-loop.c        2017-12-09 22:47:27.884318101 +0000
> @@ -52,6 +52,7 @@ Software Foundation; either version 3, o
>  #include "tree-if-conv.h"
>  #include "internal-fn.h"
>  #include "tree-vector-builder.h"
> +#include "vec-perm-indices.h"
>
>  /* Loop Vectorization Pass.
>
> Index: gcc/tree-vect-slp.c
> ===================================================================
> --- gcc/tree-vect-slp.c 2017-12-09 22:47:21.536314228 +0000
> +++ gcc/tree-vect-slp.c 2017-12-09 22:47:27.884318101 +0000
> @@ -42,6 +42,7 @@ Software Foundation; either version 3, o
>  #include "gimple-walk.h"
>  #include "dbgcnt.h"
>  #include "tree-vector-builder.h"
> +#include "vec-perm-indices.h"
>
>
>  /* Recursively free the memory allocated for the SLP tree rooted at NODE.  */
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2017-12-09 22:47:21.537314229 +0000
> +++ gcc/tree-vect-stmts.c       2017-12-09 22:47:27.885318101 +0000
> @@ -49,6 +49,7 @@ Software Foundation; either version 3, o
>  #include "builtins.h"
>  #include "internal-fn.h"
>  #include "tree-vector-builder.h"
> +#include "vec-perm-indices.h"
>
>  /* For lang_hooks.types.type_for_mode.  */
>  #include "langhooks.h"
> Index: gcc/config/aarch64/aarch64-protos.h
> ===================================================================
> --- gcc/config/aarch64/aarch64-protos.h 2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/aarch64/aarch64-protos.h 2017-12-09 22:47:27.854318082 +0000
> @@ -474,8 +474,6 @@ extern void aarch64_split_combinev16qi (
>  extern void aarch64_expand_vec_perm (rtx, rtx, rtx, rtx, unsigned int);
>  extern bool aarch64_madd_needs_nop (rtx_insn *);
>  extern void aarch64_final_prescan_insn (rtx_insn *);
> -extern bool
> -aarch64_expand_vec_perm_const (rtx, rtx, rtx, rtx, unsigned int);
>  void aarch64_atomic_assign_expand_fenv (tree *, tree *, tree *);
>  int aarch64_ccmp_mode_to_code (machine_mode mode);
>
> Index: gcc/config/aarch64/aarch64-simd.md
> ===================================================================
> --- gcc/config/aarch64/aarch64-simd.md  2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/aarch64/aarch64-simd.md  2017-12-09 22:47:27.854318082 +0000
> @@ -5348,20 +5348,6 @@ (define_expand "aarch64_get_qreg<VSTRUCT
>
>  ;; vec_perm support
>
> -(define_expand "vec_perm_const<mode>"
> -  [(match_operand:VALL_F16 0 "register_operand")
> -   (match_operand:VALL_F16 1 "register_operand")
> -   (match_operand:VALL_F16 2 "register_operand")
> -   (match_operand:<V_INT_EQUIV> 3)]
> -  "TARGET_SIMD"
> -{
> -  if (aarch64_expand_vec_perm_const (operands[0], operands[1],
> -                                    operands[2], operands[3], <nunits>))
> -    DONE;
> -  else
> -    FAIL;
> -})
> -
>  (define_expand "vec_perm<mode>"
>    [(match_operand:VB 0 "register_operand")
>     (match_operand:VB 1 "register_operand")
> Index: gcc/config/aarch64/aarch64.c
> ===================================================================
> --- gcc/config/aarch64/aarch64.c        2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/aarch64/aarch64.c        2017-12-09 22:47:27.856318084 +0000
> @@ -141,8 +141,6 @@ static void aarch64_elf_asm_constructor
>  static void aarch64_elf_asm_destructor (rtx, int) ATTRIBUTE_UNUSED;
>  static void aarch64_override_options_after_change (void);
>  static bool aarch64_vector_mode_supported_p (machine_mode);
> -static bool aarch64_vectorize_vec_perm_const_ok (machine_mode,
> -                                                vec_perm_indices);
>  static int aarch64_address_cost (rtx, machine_mode, addr_space_t, bool);
>  static bool aarch64_builtin_support_vector_misalignment (machine_mode mode,
>                                                          const_tree type,
> @@ -13626,29 +13624,27 @@ aarch64_expand_vec_perm_const_1 (struct
>    return false;
>  }
>
> -/* Expand a vec_perm_const pattern with the operands given by TARGET,
> -   OP0, OP1 and SEL.  NELT is the number of elements in the vector.  */
> +/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
>
> -bool
> -aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel,
> -                              unsigned int nelt)
> +static bool
> +aarch64_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
> +                                 rtx op1, const vec_perm_indices &sel)
>  {
>    struct expand_vec_perm_d d;
>    unsigned int i, which;
>
> +  d.vmode = vmode;
>    d.target = target;
>    d.op0 = op0;
>    d.op1 = op1;
> +  d.testing_p = !target;
>
> -  d.vmode = GET_MODE (target);
> -  gcc_assert (VECTOR_MODE_P (d.vmode));
> -  d.testing_p = false;
> -
> +  /* Calculate whether all elements are in one vector.  */
> +  unsigned int nelt = sel.length ();
>    d.perm.reserve (nelt);
>    for (i = which = 0; i < nelt; ++i)
>      {
> -      rtx e = XVECEXP (sel, 0, i);
> -      unsigned int ei = INTVAL (e) & (2 * nelt - 1);
> +      unsigned int ei = sel[i] & (2 * nelt - 1);
>        which |= (ei < nelt ? 1 : 2);
>        d.perm.quick_push (ei);
>      }
> @@ -13660,7 +13656,7 @@ aarch64_expand_vec_perm_const (rtx targe
>
>      case 3:
>        d.one_vector_p = false;
> -      if (!rtx_equal_p (op0, op1))
> +      if (d.testing_p || !rtx_equal_p (op0, op1))
>         break;
>
>        /* The elements of PERM do not suggest that only the first operand
> @@ -13681,37 +13677,8 @@ aarch64_expand_vec_perm_const (rtx targe
>        break;
>      }
>
> -  return aarch64_expand_vec_perm_const_1 (&d);
> -}
> -
> -static bool
> -aarch64_vectorize_vec_perm_const_ok (machine_mode vmode, vec_perm_indices sel)
> -{
> -  struct expand_vec_perm_d d;
> -  unsigned int i, nelt, which;
> -  bool ret;
> -
> -  d.vmode = vmode;
> -  d.testing_p = true;
> -  d.perm.safe_splice (sel);
> -
> -  /* Calculate whether all elements are in one vector.  */
> -  nelt = sel.length ();
> -  for (i = which = 0; i < nelt; ++i)
> -    {
> -      unsigned int e = d.perm[i];
> -      gcc_assert (e < 2 * nelt);
> -      which |= (e < nelt ? 1 : 2);
> -    }
> -
> -  /* If all elements are from the second vector, reindex as if from the
> -     first vector.  */
> -  if (which == 2)
> -    for (i = 0; i < nelt; ++i)
> -      d.perm[i] -= nelt;
> -
> -  /* Check whether the mask can be applied to a single vector.  */
> -  d.one_vector_p = (which != 3);
> +  if (!d.testing_p)
> +    return aarch64_expand_vec_perm_const_1 (&d);
>
>    d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
>    d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
> @@ -13719,7 +13686,7 @@ aarch64_vectorize_vec_perm_const_ok (mac
>      d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
>
>    start_sequence ();
> -  ret = aarch64_expand_vec_perm_const_1 (&d);
> +  bool ret = aarch64_expand_vec_perm_const_1 (&d);
>    end_sequence ();
>
>    return ret;
> @@ -15471,9 +15438,9 @@ #define TARGET_VECTORIZE_VECTOR_ALIGNMEN
>
>  /* vec_perm support.  */
>
> -#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
> -#define TARGET_VECTORIZE_VEC_PERM_CONST_OK \
> -  aarch64_vectorize_vec_perm_const_ok
> +#undef TARGET_VECTORIZE_VEC_PERM_CONST
> +#define TARGET_VECTORIZE_VEC_PERM_CONST \
> +  aarch64_vectorize_vec_perm_const
>
>  #undef TARGET_INIT_LIBFUNCS
>  #define TARGET_INIT_LIBFUNCS aarch64_init_libfuncs
> Index: gcc/config/arm/arm-protos.h
> ===================================================================
> --- gcc/config/arm/arm-protos.h 2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/arm/arm-protos.h 2017-12-09 22:47:27.856318084 +0000
> @@ -357,7 +357,6 @@ extern bool arm_validize_comparison (rtx
>
>  extern bool arm_gen_setmem (rtx *);
>  extern void arm_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel);
> -extern bool arm_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel);
>
>  extern bool arm_autoinc_modes_ok_p (machine_mode, enum arm_auto_incmodes);
>
> Index: gcc/config/arm/vec-common.md
> ===================================================================
> --- gcc/config/arm/vec-common.md        2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/arm/vec-common.md        2017-12-09 22:47:27.858318085 +0000
> @@ -109,35 +109,6 @@ (define_expand "umax<mode>3"
>  {
>  })
>
> -(define_expand "vec_perm_const<mode>"
> -  [(match_operand:VALL 0 "s_register_operand" "")
> -   (match_operand:VALL 1 "s_register_operand" "")
> -   (match_operand:VALL 2 "s_register_operand" "")
> -   (match_operand:<V_cmp_result> 3 "" "")]
> -  "TARGET_NEON
> -   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
> -{
> -  if (arm_expand_vec_perm_const (operands[0], operands[1],
> -                                operands[2], operands[3]))
> -    DONE;
> -  else
> -    FAIL;
> -})
> -
> -(define_expand "vec_perm_const<mode>"
> -  [(match_operand:VH 0 "s_register_operand")
> -   (match_operand:VH 1 "s_register_operand")
> -   (match_operand:VH 2 "s_register_operand")
> -   (match_operand:<V_cmp_result> 3)]
> -  "TARGET_NEON"
> -{
> -  if (arm_expand_vec_perm_const (operands[0], operands[1],
> -                                operands[2], operands[3]))
> -    DONE;
> -  else
> -    FAIL;
> -})
> -
>  (define_expand "vec_perm<mode>"
>    [(match_operand:VE 0 "s_register_operand" "")
>     (match_operand:VE 1 "s_register_operand" "")
> Index: gcc/config/arm/arm.c
> ===================================================================
> --- gcc/config/arm/arm.c        2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/arm/arm.c        2017-12-09 22:47:27.858318085 +0000
> @@ -288,7 +288,8 @@ static int arm_cortex_a5_branch_cost (bo
>  static int arm_cortex_m_branch_cost (bool, bool);
>  static int arm_cortex_m7_branch_cost (bool, bool);
>
> -static bool arm_vectorize_vec_perm_const_ok (machine_mode, vec_perm_indices);
> +static bool arm_vectorize_vec_perm_const (machine_mode, rtx, rtx, rtx,
> +                                         const vec_perm_indices &);
>
>  static bool aarch_macro_fusion_pair_p (rtx_insn*, rtx_insn*);
>
> @@ -734,9 +735,8 @@ #define TARGET_VECTORIZE_SUPPORT_VECTOR_
>  #define TARGET_PREFERRED_RENAME_CLASS \
>    arm_preferred_rename_class
>
> -#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
> -#define TARGET_VECTORIZE_VEC_PERM_CONST_OK \
> -  arm_vectorize_vec_perm_const_ok
> +#undef TARGET_VECTORIZE_VEC_PERM_CONST
> +#define TARGET_VECTORIZE_VEC_PERM_CONST arm_vectorize_vec_perm_const
>
>  #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
>  #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \
> @@ -29381,28 +29381,31 @@ arm_expand_vec_perm_const_1 (struct expa
>    return false;
>  }
>
> -/* Expand a vec_perm_const pattern.  */
> +/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
>
> -bool
> -arm_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel)
> +static bool
> +arm_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0, rtx op1,
> +                             const vec_perm_indices &sel)
>  {
>    struct expand_vec_perm_d d;
>    int i, nelt, which;
>
> +  if (!VALID_NEON_DREG_MODE (vmode) && !VALID_NEON_QREG_MODE (vmode))
> +    return false;
> +
>    d.target = target;
>    d.op0 = op0;
>    d.op1 = op1;
>
> -  d.vmode = GET_MODE (target);
> +  d.vmode = vmode;
>    gcc_assert (VECTOR_MODE_P (d.vmode));
> -  d.testing_p = false;
> +  d.testing_p = !target;
>
>    nelt = GET_MODE_NUNITS (d.vmode);
>    d.perm.reserve (nelt);
>    for (i = which = 0; i < nelt; ++i)
>      {
> -      rtx e = XVECEXP (sel, 0, i);
> -      int ei = INTVAL (e) & (2 * nelt - 1);
> +      int ei = sel[i] & (2 * nelt - 1);
>        which |= (ei < nelt ? 1 : 2);
>        d.perm.quick_push (ei);
>      }
> @@ -29414,7 +29417,7 @@ arm_expand_vec_perm_const (rtx target, r
>
>      case 3:
>        d.one_vector_p = false;
> -      if (!rtx_equal_p (op0, op1))
> +      if (d.testing_p || !rtx_equal_p (op0, op1))
>         break;
>
>        /* The elements of PERM do not suggest that only the first operand
> @@ -29435,38 +29438,8 @@ arm_expand_vec_perm_const (rtx target, r
>        break;
>      }
>
> -  return arm_expand_vec_perm_const_1 (&d);
> -}
> -
> -/* Implement TARGET_VECTORIZE_VEC_PERM_CONST_OK.  */
> -
> -static bool
> -arm_vectorize_vec_perm_const_ok (machine_mode vmode, vec_perm_indices sel)
> -{
> -  struct expand_vec_perm_d d;
> -  unsigned int i, nelt, which;
> -  bool ret;
> -
> -  d.vmode = vmode;
> -  d.testing_p = true;
> -  d.perm.safe_splice (sel);
> -
> -  /* Categorize the set of elements in the selector.  */
> -  nelt = GET_MODE_NUNITS (d.vmode);
> -  for (i = which = 0; i < nelt; ++i)
> -    {
> -      unsigned int e = d.perm[i];
> -      gcc_assert (e < 2 * nelt);
> -      which |= (e < nelt ? 1 : 2);
> -    }
> -
> -  /* For all elements from second vector, fold the elements to first.  */
> -  if (which == 2)
> -    for (i = 0; i < nelt; ++i)
> -      d.perm[i] -= nelt;
> -
> -  /* Check whether the mask can be applied to the vector type.  */
> -  d.one_vector_p = (which != 3);
> +  if (!d.testing_p)
> +    return arm_expand_vec_perm_const_1 (&d);
>
>    d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
>    d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
> @@ -29474,7 +29447,7 @@ arm_vectorize_vec_perm_const_ok (machine
>      d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
>
>    start_sequence ();
> -  ret = arm_expand_vec_perm_const_1 (&d);
> +  bool ret = arm_expand_vec_perm_const_1 (&d);
>    end_sequence ();
>
>    return ret;
> Index: gcc/config/i386/i386-protos.h
> ===================================================================
> --- gcc/config/i386/i386-protos.h       2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/i386/i386-protos.h       2017-12-09 22:47:27.859318085 +0000
> @@ -133,7 +133,6 @@ extern bool ix86_expand_fp_movcc (rtx[])
>  extern bool ix86_expand_fp_vcond (rtx[]);
>  extern bool ix86_expand_int_vcond (rtx[]);
>  extern void ix86_expand_vec_perm (rtx[]);
> -extern bool ix86_expand_vec_perm_const (rtx[]);
>  extern bool ix86_expand_mask_vec_cmp (rtx[]);
>  extern bool ix86_expand_int_vec_cmp (rtx[]);
>  extern bool ix86_expand_fp_vec_cmp (rtx[]);
> Index: gcc/config/i386/sse.md
> ===================================================================
> --- gcc/config/i386/sse.md      2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/i386/sse.md      2017-12-09 22:47:27.863318088 +0000
> @@ -11476,30 +11476,6 @@ (define_expand "vec_perm<mode>"
>    DONE;
>  })
>
> -(define_mode_iterator VEC_PERM_CONST
> -  [(V4SF "TARGET_SSE") (V4SI "TARGET_SSE")
> -   (V2DF "TARGET_SSE") (V2DI "TARGET_SSE")
> -   (V16QI "TARGET_SSE2") (V8HI "TARGET_SSE2")
> -   (V8SF "TARGET_AVX") (V4DF "TARGET_AVX")
> -   (V8SI "TARGET_AVX") (V4DI "TARGET_AVX")
> -   (V32QI "TARGET_AVX2") (V16HI "TARGET_AVX2")
> -   (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
> -   (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
> -   (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")])
> -
> -(define_expand "vec_perm_const<mode>"
> -  [(match_operand:VEC_PERM_CONST 0 "register_operand")
> -   (match_operand:VEC_PERM_CONST 1 "register_operand")
> -   (match_operand:VEC_PERM_CONST 2 "register_operand")
> -   (match_operand:<sseintvecmode> 3)]
> -  ""
> -{
> -  if (ix86_expand_vec_perm_const (operands))
> -    DONE;
> -  else
> -    FAIL;
> -})
> -
>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>  ;;
>  ;; Parallel bitwise logical operations
> Index: gcc/config/i386/i386.c
> ===================================================================
> --- gcc/config/i386/i386.c      2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/i386/i386.c      2017-12-09 22:47:27.862318087 +0000
> @@ -47588,9 +47588,8 @@ expand_vec_perm_vpshufb4_vpermq2 (struct
>    return true;
>  }
>
> -/* The guts of ix86_expand_vec_perm_const, also used by the ok hook.
> -   With all of the interface bits taken care of, perform the expansion
> -   in D and return true on success.  */
> +/* The guts of ix86_vectorize_vec_perm_const.  With all of the interface bits
> +   taken care of, perform the expansion in D and return true on success.  */
>
>  static bool
>  ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
> @@ -47725,69 +47724,29 @@ canonicalize_perm (struct expand_vec_per
>    return (which == 3);
>  }
>
> -bool
> -ix86_expand_vec_perm_const (rtx operands[4])
> +/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
> +
> +static bool
> +ix86_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
> +                              rtx op1, const vec_perm_indices &sel)
>  {
>    struct expand_vec_perm_d d;
>    unsigned char perm[MAX_VECT_LEN];
> -  int i, nelt;
> +  unsigned int i, nelt, which;
>    bool two_args;
> -  rtx sel;
>
> -  d.target = operands[0];
> -  d.op0 = operands[1];
> -  d.op1 = operands[2];
> -  sel = operands[3];
> +  d.target = target;
> +  d.op0 = op0;
> +  d.op1 = op1;
>
> -  d.vmode = GET_MODE (d.target);
> +  d.vmode = vmode;
>    gcc_assert (VECTOR_MODE_P (d.vmode));
>    d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
> -  d.testing_p = false;
> +  d.testing_p = !target;
>
> -  gcc_assert (GET_CODE (sel) == CONST_VECTOR);
> -  gcc_assert (XVECLEN (sel, 0) == nelt);
> +  gcc_assert (sel.length () == nelt);
>    gcc_checking_assert (sizeof (d.perm) == sizeof (perm));
>
> -  for (i = 0; i < nelt; ++i)
> -    {
> -      rtx e = XVECEXP (sel, 0, i);
> -      int ei = INTVAL (e) & (2 * nelt - 1);
> -      d.perm[i] = ei;
> -      perm[i] = ei;
> -    }
> -
> -  two_args = canonicalize_perm (&d);
> -
> -  if (ix86_expand_vec_perm_const_1 (&d))
> -    return true;
> -
> -  /* If the selector says both arguments are needed, but the operands are the
> -     same, the above tried to expand with one_operand_p and flattened selector.
> -     If that didn't work, retry without one_operand_p; we succeeded with that
> -     during testing.  */
> -  if (two_args && d.one_operand_p)
> -    {
> -      d.one_operand_p = false;
> -      memcpy (d.perm, perm, sizeof (perm));
> -      return ix86_expand_vec_perm_const_1 (&d);
> -    }
> -
> -  return false;
> -}
> -
> -/* Implement targetm.vectorize.vec_perm_const_ok.  */
> -
> -static bool
> -ix86_vectorize_vec_perm_const_ok (machine_mode vmode, vec_perm_indices sel)
> -{
> -  struct expand_vec_perm_d d;
> -  unsigned int i, nelt, which;
> -  bool ret;
> -
> -  d.vmode = vmode;
> -  d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
> -  d.testing_p = true;
> -
>    /* Given sufficient ISA support we can just return true here
>       for selected vector modes.  */
>    switch (d.vmode)
> @@ -47796,17 +47755,23 @@ ix86_vectorize_vec_perm_const_ok (machin
>      case E_V16SImode:
>      case E_V8DImode:
>      case E_V8DFmode:
> -      if (TARGET_AVX512F)
> -       /* All implementable with a single vperm[it]2 insn.  */
> +      if (!TARGET_AVX512F)
> +       return false;
> +      /* All implementable with a single vperm[it]2 insn.  */
> +      if (d.testing_p)
>         return true;
>        break;
>      case E_V32HImode:
> -      if (TARGET_AVX512BW)
> +      if (!TARGET_AVX512BW)
> +       return false;
> +      if (d.testing_p)
>         /* All implementable with a single vperm[it]2 insn.  */
>         return true;
>        break;
>      case E_V64QImode:
> -      if (TARGET_AVX512BW)
> +      if (!TARGET_AVX512BW)
> +       return false;
> +      if (d.testing_p)
>         /* Implementable with 2 vperm[it]2, 2 vpshufb and 1 or insn.  */
>         return true;
>        break;
> @@ -47814,73 +47779,108 @@ ix86_vectorize_vec_perm_const_ok (machin
>      case E_V8SFmode:
>      case E_V4DFmode:
>      case E_V4DImode:
> -      if (TARGET_AVX512VL)
> +      if (!TARGET_AVX)
> +       return false;
> +      if (d.testing_p && TARGET_AVX512VL)
>         /* All implementable with a single vperm[it]2 insn.  */
>         return true;
>        break;
>      case E_V16HImode:
> -      if (TARGET_AVX2)
> +      if (!TARGET_SSE2)
> +       return false;
> +      if (d.testing_p && TARGET_AVX2)
>         /* Implementable with 4 vpshufb insns, 2 vpermq and 3 vpor insns.  */
>         return true;
>        break;
>      case E_V32QImode:
> -      if (TARGET_AVX2)
> +      if (!TARGET_SSE2)
> +       return false;
> +      if (d.testing_p && TARGET_AVX2)
>         /* Implementable with 4 vpshufb insns, 2 vpermq and 3 vpor insns.  */
>         return true;
>        break;
> -    case E_V4SImode:
> -    case E_V4SFmode:
>      case E_V8HImode:
>      case E_V16QImode:
> +      if (!TARGET_SSE2)
> +       return false;
> +      /* Fall through.  */
> +    case E_V4SImode:
> +    case E_V4SFmode:
> +      if (!TARGET_SSE)
> +       return false;
>        /* All implementable with a single vpperm insn.  */
> -      if (TARGET_XOP)
> +      if (d.testing_p && TARGET_XOP)
>         return true;
>        /* All implementable with 2 pshufb + 1 ior.  */
> -      if (TARGET_SSSE3)
> +      if (d.testing_p && TARGET_SSSE3)
>         return true;
>        break;
>      case E_V2DImode:
>      case E_V2DFmode:
> +      if (!TARGET_SSE)
> +       return false;
>        /* All implementable with shufpd or unpck[lh]pd.  */
> -      return true;
> +      if (d.testing_p)
> +       return true;
> +      break;
>      default:
>        return false;
>      }
>
> -  /* Extract the values from the vector CST into the permutation
> -     array in D.  */
>    for (i = which = 0; i < nelt; ++i)
>      {
>        unsigned char e = sel[i];
>        gcc_assert (e < 2 * nelt);
>        d.perm[i] = e;
> +      perm[i] = e;
>        which |= (e < nelt ? 1 : 2);
>      }
>
> -  /* For all elements from second vector, fold the elements to first.  */
> -  if (which == 2)
> -    for (i = 0; i < nelt; ++i)
> -      d.perm[i] -= nelt;
> +  if (d.testing_p)
> +    {
> +      /* For all elements from second vector, fold the elements to first.  */
> +      if (which == 2)
> +       for (i = 0; i < nelt; ++i)
> +         d.perm[i] -= nelt;
> +
> +      /* Check whether the mask can be applied to the vector type.  */
> +      d.one_operand_p = (which != 3);
> +
> +      /* Implementable with shufps or pshufd.  */
> +      if (d.one_operand_p && (d.vmode == V4SFmode || d.vmode == V4SImode))
> +       return true;
> +
> +      /* Otherwise we have to go through the motions and see if we can
> +        figure out how to generate the requested permutation.  */
> +      d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
> +      d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
> +      if (!d.one_operand_p)
> +       d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
> +
> +      start_sequence ();
> +      bool ret = ix86_expand_vec_perm_const_1 (&d);
> +      end_sequence ();
>
> -  /* Check whether the mask can be applied to the vector type.  */
> -  d.one_operand_p = (which != 3);
> +      return ret;
> +    }
>
> -  /* Implementable with shufps or pshufd.  */
> -  if (d.one_operand_p && (d.vmode == V4SFmode || d.vmode == V4SImode))
> +  two_args = canonicalize_perm (&d);
> +
> +  if (ix86_expand_vec_perm_const_1 (&d))
>      return true;
>
> -  /* Otherwise we have to go through the motions and see if we can
> -     figure out how to generate the requested permutation.  */
> -  d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
> -  d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
> -  if (!d.one_operand_p)
> -    d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
> -
> -  start_sequence ();
> -  ret = ix86_expand_vec_perm_const_1 (&d);
> -  end_sequence ();
> +  /* If the selector says both arguments are needed, but the operands are the
> +     same, the above tried to expand with one_operand_p and flattened selector.
> +     If that didn't work, retry without one_operand_p; we succeeded with that
> +     during testing.  */
> +  if (two_args && d.one_operand_p)
> +    {
> +      d.one_operand_p = false;
> +      memcpy (d.perm, perm, sizeof (perm));
> +      return ix86_expand_vec_perm_const_1 (&d);
> +    }
>
> -  return ret;
> +  return false;
>  }
>
>  void
> @@ -50532,9 +50532,8 @@ #define TARGET_CLASS_LIKELY_SPILLED_P ix
>  #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
>  #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \
>    ix86_builtin_vectorization_cost
> -#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
> -#define TARGET_VECTORIZE_VEC_PERM_CONST_OK \
> -  ix86_vectorize_vec_perm_const_ok
> +#undef TARGET_VECTORIZE_VEC_PERM_CONST
> +#define TARGET_VECTORIZE_VEC_PERM_CONST ix86_vectorize_vec_perm_const
>  #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
>  #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE \
>    ix86_preferred_simd_mode
> Index: gcc/config/ia64/ia64-protos.h
> ===================================================================
> --- gcc/config/ia64/ia64-protos.h       2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/ia64/ia64-protos.h       2017-12-09 22:47:27.864318089 +0000
> @@ -62,7 +62,6 @@ extern const char *get_bundle_name (int)
>  extern const char *output_probe_stack_range (rtx, rtx);
>
>  extern void ia64_expand_vec_perm_even_odd (rtx, rtx, rtx, int);
> -extern bool ia64_expand_vec_perm_const (rtx op[4]);
>  extern void ia64_expand_vec_setv2sf (rtx op[3]);
>  #endif /* RTX_CODE */
>
> Index: gcc/config/ia64/vect.md
> ===================================================================
> --- gcc/config/ia64/vect.md     2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/ia64/vect.md     2017-12-09 22:47:27.865318089 +0000
> @@ -1549,19 +1549,6 @@ (define_expand "vec_pack_trunc_v2si"
>    DONE;
>  })
>
> -(define_expand "vec_perm_const<mode>"
> -  [(match_operand:VEC 0 "register_operand" "")
> -   (match_operand:VEC 1 "register_operand" "")
> -   (match_operand:VEC 2 "register_operand" "")
> -   (match_operand:<vecint> 3 "" "")]
> -  ""
> -{
> -  if (ia64_expand_vec_perm_const (operands))
> -    DONE;
> -  else
> -    FAIL;
> -})
> -
>  ;; Missing operations
>  ;; fprcpa
>  ;; fpsqrta
> Index: gcc/config/ia64/ia64.c
> ===================================================================
> --- gcc/config/ia64/ia64.c      2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/ia64/ia64.c      2017-12-09 22:47:27.864318089 +0000
> @@ -333,7 +333,8 @@ static fixed_size_mode ia64_get_reg_raw_
>  static section * ia64_hpux_function_section (tree, enum node_frequency,
>                                              bool, bool);
>
> -static bool ia64_vectorize_vec_perm_const_ok (machine_mode, vec_perm_indices);
> +static bool ia64_vectorize_vec_perm_const (machine_mode, rtx, rtx, rtx,
> +                                          const vec_perm_indices &);
>
>  static unsigned int ia64_hard_regno_nregs (unsigned int, machine_mode);
>  static bool ia64_hard_regno_mode_ok (unsigned int, machine_mode);
> @@ -652,8 +653,8 @@ #define TARGET_DELAY_SCHED2 true
>  #undef TARGET_DELAY_VARTRACK
>  #define TARGET_DELAY_VARTRACK true
>
> -#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
> -#define TARGET_VECTORIZE_VEC_PERM_CONST_OK ia64_vectorize_vec_perm_const_ok
> +#undef TARGET_VECTORIZE_VEC_PERM_CONST
> +#define TARGET_VECTORIZE_VEC_PERM_CONST ia64_vectorize_vec_perm_const
>
>  #undef TARGET_ATTRIBUTE_TAKES_IDENTIFIER_P
>  #define TARGET_ATTRIBUTE_TAKES_IDENTIFIER_P ia64_attribute_takes_identifier_p
> @@ -11741,32 +11742,31 @@ ia64_expand_vec_perm_const_1 (struct exp
>    return false;
>  }
>
> -bool
> -ia64_expand_vec_perm_const (rtx operands[4])
> +/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
> +
> +static bool
> +ia64_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
> +                              rtx op1, const vec_perm_indices &sel)
>  {
>    struct expand_vec_perm_d d;
>    unsigned char perm[MAX_VECT_LEN];
> -  int i, nelt, which;
> -  rtx sel;
> +  unsigned int i, nelt, which;
>
> -  d.target = operands[0];
> -  d.op0 = operands[1];
> -  d.op1 = operands[2];
> -  sel = operands[3];
> +  d.target = target;
> +  d.op0 = op0;
> +  d.op1 = op1;
>
> -  d.vmode = GET_MODE (d.target);
> +  d.vmode = vmode;
>    gcc_assert (VECTOR_MODE_P (d.vmode));
>    d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
> -  d.testing_p = false;
> +  d.testing_p = !target;
>
> -  gcc_assert (GET_CODE (sel) == CONST_VECTOR);
> -  gcc_assert (XVECLEN (sel, 0) == nelt);
> +  gcc_assert (sel.length () == nelt);
>    gcc_checking_assert (sizeof (d.perm) == sizeof (perm));
>
>    for (i = which = 0; i < nelt; ++i)
>      {
> -      rtx e = XVECEXP (sel, 0, i);
> -      int ei = INTVAL (e) & (2 * nelt - 1);
> +      unsigned int ei = sel[i] & (2 * nelt - 1);
>
>        which |= (ei < nelt ? 1 : 2);
>        d.perm[i] = ei;
> @@ -11779,7 +11779,7 @@ ia64_expand_vec_perm_const (rtx operands
>        gcc_unreachable();
>
>      case 3:
> -      if (!rtx_equal_p (d.op0, d.op1))
> +      if (d.testing_p || !rtx_equal_p (d.op0, d.op1))
>         {
>           d.one_operand_p = false;
>           break;
> @@ -11807,6 +11807,22 @@ ia64_expand_vec_perm_const (rtx operands
>        break;
>      }
>
> +  if (d.testing_p)
> +    {
> +      /* We have to go through the motions and see if we can
> +        figure out how to generate the requested permutation.  */
> +      d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
> +      d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
> +      if (!d.one_operand_p)
> +       d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
> +
> +      start_sequence ();
> +      bool ret = ia64_expand_vec_perm_const_1 (&d);
> +      end_sequence ();
> +
> +      return ret;
> +    }
> +
>    if (ia64_expand_vec_perm_const_1 (&d))
>      return true;
>
> @@ -11823,51 +11839,6 @@ ia64_expand_vec_perm_const (rtx operands
>    return false;
>  }
>
> -/* Implement targetm.vectorize.vec_perm_const_ok.  */
> -
> -static bool
> -ia64_vectorize_vec_perm_const_ok (machine_mode vmode, vec_perm_indices sel)
> -{
> -  struct expand_vec_perm_d d;
> -  unsigned int i, nelt, which;
> -  bool ret;
> -
> -  d.vmode = vmode;
> -  d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
> -  d.testing_p = true;
> -
> -  /* Extract the values from the vector CST into the permutation
> -     array in D.  */
> -  for (i = which = 0; i < nelt; ++i)
> -    {
> -      unsigned char e = sel[i];
> -      d.perm[i] = e;
> -      gcc_assert (e < 2 * nelt);
> -      which |= (e < nelt ? 1 : 2);
> -    }
> -
> -  /* For all elements from second vector, fold the elements to first.  */
> -  if (which == 2)
> -    for (i = 0; i < nelt; ++i)
> -      d.perm[i] -= nelt;
> -
> -  /* Check whether the mask can be applied to the vector type.  */
> -  d.one_operand_p = (which != 3);
> -
> -  /* Otherwise we have to go through the motions and see if we can
> -     figure out how to generate the requested permutation.  */
> -  d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
> -  d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
> -  if (!d.one_operand_p)
> -    d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
> -
> -  start_sequence ();
> -  ret = ia64_expand_vec_perm_const_1 (&d);
> -  end_sequence ();
> -
> -  return ret;
> -}
> -
>  void
>  ia64_expand_vec_setv2sf (rtx operands[3])
>  {
> Index: gcc/config/mips/loongson.md
> ===================================================================
> --- gcc/config/mips/loongson.md 2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/mips/loongson.md 2017-12-09 22:47:27.865318089 +0000
> @@ -784,19 +784,6 @@ (define_insn "*loongson_punpcklwd_hi"
>    "punpcklwd\t%0,%1,%2"
>    [(set_attr "type" "fcvt")])
>
> -(define_expand "vec_perm_const<mode>"
> -  [(match_operand:VWHB 0 "register_operand" "")
> -   (match_operand:VWHB 1 "register_operand" "")
> -   (match_operand:VWHB 2 "register_operand" "")
> -   (match_operand:VWHB 3 "" "")]
> -  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
> -{
> -  if (mips_expand_vec_perm_const (operands))
> -    DONE;
> -  else
> -    FAIL;
> -})
> -
>  (define_expand "vec_unpacks_lo_<mode>"
>    [(match_operand:<V_stretch_half> 0 "register_operand" "")
>     (match_operand:VHB 1 "register_operand" "")]
> Index: gcc/config/mips/mips-msa.md
> ===================================================================
> --- gcc/config/mips/mips-msa.md 2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/mips/mips-msa.md 2017-12-09 22:47:27.865318089 +0000
> @@ -558,19 +558,6 @@ (define_insn_and_split "msa_copy_s_<msaf
>    [(set_attr "type" "simd_copy")
>     (set_attr "mode" "<MODE>")])
>
> -(define_expand "vec_perm_const<mode>"
> -  [(match_operand:MSA 0 "register_operand")
> -   (match_operand:MSA 1 "register_operand")
> -   (match_operand:MSA 2 "register_operand")
> -   (match_operand:<VIMODE> 3 "")]
> -  "ISA_HAS_MSA"
> -{
> -  if (mips_expand_vec_perm_const (operands))
> -    DONE;
> -  else
> -    FAIL;
> -})
> -
>  (define_expand "abs<mode>2"
>    [(match_operand:IMSA 0 "register_operand" "=f")
>     (abs:IMSA (match_operand:IMSA 1 "register_operand" "f"))]
> Index: gcc/config/mips/mips-ps-3d.md
> ===================================================================
> --- gcc/config/mips/mips-ps-3d.md       2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/mips/mips-ps-3d.md       2017-12-09 22:47:27.865318089 +0000
> @@ -164,19 +164,6 @@ (define_insn "vec_perm_const_ps"
>    [(set_attr "type" "fmove")
>     (set_attr "mode" "SF")])
>
> -(define_expand "vec_perm_constv2sf"
> -  [(match_operand:V2SF 0 "register_operand" "")
> -   (match_operand:V2SF 1 "register_operand" "")
> -   (match_operand:V2SF 2 "register_operand" "")
> -   (match_operand:V2SI 3 "" "")]
> -  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
> -{
> -  if (mips_expand_vec_perm_const (operands))
> -    DONE;
> -  else
> -    FAIL;
> -})
> -
>  ;; Expanders for builtins.  The instruction:
>  ;;
>  ;;     P[UL][UL].PS <result>, <a>, <b>
> Index: gcc/config/mips/mips-protos.h
> ===================================================================
> --- gcc/config/mips/mips-protos.h       2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/mips/mips-protos.h       2017-12-09 22:47:27.865318089 +0000
> @@ -348,7 +348,6 @@ extern void mips_expand_atomic_qihi (uni
>                                      rtx, rtx, rtx, rtx);
>
>  extern void mips_expand_vector_init (rtx, rtx);
> -extern bool mips_expand_vec_perm_const (rtx op[4]);
>  extern void mips_expand_vec_unpack (rtx op[2], bool, bool);
>  extern void mips_expand_vec_reduc (rtx, rtx, rtx (*)(rtx, rtx, rtx));
>  extern void mips_expand_vec_minmax (rtx, rtx, rtx,
> Index: gcc/config/mips/mips.c
> ===================================================================
> --- gcc/config/mips/mips.c      2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/mips/mips.c      2017-12-09 22:47:27.867318090 +0000
> @@ -21377,34 +21377,32 @@ mips_expand_vec_perm_const_1 (struct exp
>    return false;
>  }
>
> -/* Expand a vec_perm_const pattern.  */
> +/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
>
> -bool
> -mips_expand_vec_perm_const (rtx operands[4])
> +static bool
> +mips_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
> +                              rtx op1, const vec_perm_indices &sel)
>  {
>    struct expand_vec_perm_d d;
>    int i, nelt, which;
>    unsigned char orig_perm[MAX_VECT_LEN];
> -  rtx sel;
>    bool ok;
>
> -  d.target = operands[0];
> -  d.op0 = operands[1];
> -  d.op1 = operands[2];
> -  sel = operands[3];
> -
> -  d.vmode = GET_MODE (d.target);
> -  gcc_assert (VECTOR_MODE_P (d.vmode));
> -  d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
> -  d.testing_p = false;
> +  d.target = target;
> +  d.op0 = op0;
> +  d.op1 = op1;
> +
> +  d.vmode = vmode;
> +  gcc_assert (VECTOR_MODE_P (vmode));
> +  d.nelt = nelt = GET_MODE_NUNITS (vmode);
> +  d.testing_p = !target;
>
>    /* This is overly conservative, but ensures we don't get an
>       uninitialized warning on ORIG_PERM.  */
>    memset (orig_perm, 0, MAX_VECT_LEN);
>    for (i = which = 0; i < nelt; ++i)
>      {
> -      rtx e = XVECEXP (sel, 0, i);
> -      int ei = INTVAL (e) & (2 * nelt - 1);
> +      int ei = sel[i] & (2 * nelt - 1);
>        which |= (ei < nelt ? 1 : 2);
>        orig_perm[i] = ei;
>      }
> @@ -21417,7 +21415,7 @@ mips_expand_vec_perm_const (rtx operands
>
>      case 3:
>        d.one_vector_p = false;
> -      if (!rtx_equal_p (d.op0, d.op1))
> +      if (d.testing_p || !rtx_equal_p (d.op0, d.op1))
>         break;
>        /* FALLTHRU */
>
> @@ -21434,6 +21432,19 @@ mips_expand_vec_perm_const (rtx operands
>        break;
>      }
>
> +  if (d.testing_p)
> +    {
> +      d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
> +      d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
> +      if (!d.one_vector_p)
> +       d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
> +
> +      start_sequence ();
> +      ok = mips_expand_vec_perm_const_1 (&d);
> +      end_sequence ();
> +      return ok;
> +    }
> +
>    ok = mips_expand_vec_perm_const_1 (&d);
>
>    /* If we were given a two-vector permutation which just happened to
> @@ -21445,8 +21456,8 @@ mips_expand_vec_perm_const (rtx operands
>       the original permutation.  */
>    if (!ok && which == 3)
>      {
> -      d.op0 = operands[1];
> -      d.op1 = operands[2];
> +      d.op0 = op0;
> +      d.op1 = op1;
>        d.one_vector_p = false;
>        memcpy (d.perm, orig_perm, MAX_VECT_LEN);
>        ok = mips_expand_vec_perm_const_1 (&d);
> @@ -21466,48 +21477,6 @@ mips_sched_reassociation_width (unsigned
>    return 1;
>  }
>
> -/* Implement TARGET_VECTORIZE_VEC_PERM_CONST_OK.  */
> -
> -static bool
> -mips_vectorize_vec_perm_const_ok (machine_mode vmode, vec_perm_indices sel)
> -{
> -  struct expand_vec_perm_d d;
> -  unsigned int i, nelt, which;
> -  bool ret;
> -
> -  d.vmode = vmode;
> -  d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
> -  d.testing_p = true;
> -
> -  /* Categorize the set of elements in the selector.  */
> -  for (i = which = 0; i < nelt; ++i)
> -    {
> -      unsigned char e = sel[i];
> -      d.perm[i] = e;
> -      gcc_assert (e < 2 * nelt);
> -      which |= (e < nelt ? 1 : 2);
> -    }
> -
> -  /* For all elements from second vector, fold the elements to first.  */
> -  if (which == 2)
> -    for (i = 0; i < nelt; ++i)
> -      d.perm[i] -= nelt;
> -
> -  /* Check whether the mask can be applied to the vector type.  */
> -  d.one_vector_p = (which != 3);
> -
> -  d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
> -  d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
> -  if (!d.one_vector_p)
> -    d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
> -
> -  start_sequence ();
> -  ret = mips_expand_vec_perm_const_1 (&d);
> -  end_sequence ();
> -
> -  return ret;
> -}
> -
>  /* Expand an integral vector unpack operation.  */
>
>  void
> @@ -22589,8 +22558,8 @@ #define TARGET_SHIFT_TRUNCATION_MASK mip
>  #undef TARGET_PREPARE_PCH_SAVE
>  #define TARGET_PREPARE_PCH_SAVE mips_prepare_pch_save
>
> -#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
> -#define TARGET_VECTORIZE_VEC_PERM_CONST_OK mips_vectorize_vec_perm_const_ok
> +#undef TARGET_VECTORIZE_VEC_PERM_CONST
> +#define TARGET_VECTORIZE_VEC_PERM_CONST mips_vectorize_vec_perm_const
>
>  #undef TARGET_SCHED_REASSOCIATION_WIDTH
>  #define TARGET_SCHED_REASSOCIATION_WIDTH mips_sched_reassociation_width
> Index: gcc/config/powerpcspe/altivec.md
> ===================================================================
> --- gcc/config/powerpcspe/altivec.md    2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/powerpcspe/altivec.md    2017-12-09 22:47:27.867318090 +0000
> @@ -2080,19 +2080,6 @@ (define_expand "vec_permv16qi"
>    }
>  })
>
> -(define_expand "vec_perm_constv16qi"
> -  [(match_operand:V16QI 0 "register_operand" "")
> -   (match_operand:V16QI 1 "register_operand" "")
> -   (match_operand:V16QI 2 "register_operand" "")
> -   (match_operand:V16QI 3 "" "")]
> -  "TARGET_ALTIVEC"
> -{
> -  if (altivec_expand_vec_perm_const (operands))
> -    DONE;
> -  else
> -    FAIL;
> -})
> -
>  (define_insn "*altivec_vpermr_<mode>_internal"
>    [(set (match_operand:VM 0 "register_operand" "=v,?wo")
>         (unspec:VM [(match_operand:VM 1 "register_operand" "v,wo")
> Index: gcc/config/powerpcspe/paired.md
> ===================================================================
> --- gcc/config/powerpcspe/paired.md     2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/powerpcspe/paired.md     2017-12-09 22:47:27.867318090 +0000
> @@ -313,19 +313,6 @@ (define_insn "paired_merge11"
>    "ps_merge11 %0, %1, %2"
>    [(set_attr "type" "fp")])
>
> -(define_expand "vec_perm_constv2sf"
> -  [(match_operand:V2SF 0 "gpc_reg_operand" "")
> -   (match_operand:V2SF 1 "gpc_reg_operand" "")
> -   (match_operand:V2SF 2 "gpc_reg_operand" "")
> -   (match_operand:V2SI 3 "" "")]
> -  "TARGET_PAIRED_FLOAT"
> -{
> -  if (rs6000_expand_vec_perm_const (operands))
> -    DONE;
> -  else
> -    FAIL;
> -})
> -
>  (define_insn "paired_sum0"
>    [(set (match_operand:V2SF 0 "gpc_reg_operand" "=f")
>         (vec_concat:V2SF (plus:SF (vec_select:SF
> Index: gcc/config/powerpcspe/spe.md
> ===================================================================
> --- gcc/config/powerpcspe/spe.md        2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/powerpcspe/spe.md        2017-12-09 22:47:27.871318093 +0000
> @@ -511,19 +511,6 @@ (define_insn "vec_perm10_v2si"
>    [(set_attr "type" "vecsimple")
>     (set_attr  "length" "4")])
>
> -(define_expand "vec_perm_constv2si"
> -  [(match_operand:V2SI 0 "gpc_reg_operand" "")
> -   (match_operand:V2SI 1 "gpc_reg_operand" "")
> -   (match_operand:V2SI 2 "gpc_reg_operand" "")
> -   (match_operand:V2SI 3 "" "")]
> -  "TARGET_SPE"
> -{
> -  if (rs6000_expand_vec_perm_const (operands))
> -    DONE;
> -  else
> -    FAIL;
> -})
> -
>  (define_expand "spe_evmergehi"
>    [(match_operand:V2SI 0 "register_operand" "")
>     (match_operand:V2SI 1 "register_operand" "")
> Index: gcc/config/powerpcspe/vsx.md
> ===================================================================
> --- gcc/config/powerpcspe/vsx.md        2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/powerpcspe/vsx.md        2017-12-09 22:47:27.871318093 +0000
> @@ -2543,19 +2543,6 @@ (define_insn "vsx_xxpermdi2_<mode>_1"
>  }
>    [(set_attr "type" "vecperm")])
>
> -(define_expand "vec_perm_const<mode>"
> -  [(match_operand:VSX_D 0 "vsx_register_operand" "")
> -   (match_operand:VSX_D 1 "vsx_register_operand" "")
> -   (match_operand:VSX_D 2 "vsx_register_operand" "")
> -   (match_operand:V2DI  3 "" "")]
> -  "VECTOR_MEM_VSX_P (<MODE>mode)"
> -{
> -  if (rs6000_expand_vec_perm_const (operands))
> -    DONE;
> -  else
> -    FAIL;
> -})
> -
>  ;; Extraction of a single element in a small integer vector.  Until ISA 3.0,
>  ;; none of the small types were allowed in a vector register, so we had to
>  ;; extract to a DImode and either do a direct move or store.
> Index: gcc/config/powerpcspe/powerpcspe-protos.h
> ===================================================================
> --- gcc/config/powerpcspe/powerpcspe-protos.h   2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/powerpcspe/powerpcspe-protos.h   2017-12-09 22:47:27.867318090 +0000
> @@ -64,9 +64,7 @@ extern void rs6000_expand_vector_extract
>  extern void rs6000_split_vec_extract_var (rtx, rtx, rtx, rtx, rtx);
>  extern rtx rs6000_adjust_vec_address (rtx, rtx, rtx, rtx, machine_mode);
>  extern void rs6000_split_v4si_init (rtx []);
> -extern bool altivec_expand_vec_perm_const (rtx op[4]);
>  extern void altivec_expand_vec_perm_le (rtx op[4]);
> -extern bool rs6000_expand_vec_perm_const (rtx op[4]);
>  extern void altivec_expand_lvx_be (rtx, rtx, machine_mode, unsigned);
>  extern void altivec_expand_stvx_be (rtx, rtx, machine_mode, unsigned);
>  extern void altivec_expand_stvex_be (rtx, rtx, machine_mode, unsigned);
> Index: gcc/config/powerpcspe/powerpcspe.c
> ===================================================================
> --- gcc/config/powerpcspe/powerpcspe.c  2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/powerpcspe/powerpcspe.c  2017-12-09 22:47:27.871318093 +0000
> @@ -1936,8 +1936,8 @@ #define TARGET_SET_CURRENT_FUNCTION rs60
>  #undef TARGET_LEGITIMATE_CONSTANT_P
>  #define TARGET_LEGITIMATE_CONSTANT_P rs6000_legitimate_constant_p
>
> -#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
> -#define TARGET_VECTORIZE_VEC_PERM_CONST_OK rs6000_vectorize_vec_perm_const_ok
> +#undef TARGET_VECTORIZE_VEC_PERM_CONST
> +#define TARGET_VECTORIZE_VEC_PERM_CONST rs6000_vectorize_vec_perm_const
>
>  #undef TARGET_CAN_USE_DOLOOP_P
>  #define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
> @@ -38311,6 +38311,9 @@ rs6000_emit_parity (rtx dst, rtx src)
>  }
>
>  /* Expand an Altivec constant permutation for little endian mode.
> +   OP0 and OP1 are the input vectors and TARGET is the output vector.
> +   SEL specifies the constant permutation vector.
> +
>     There are two issues: First, the two input operands must be
>     swapped so that together they form a double-wide array in LE
>     order.  Second, the vperm instruction has surprising behavior
> @@ -38352,22 +38355,18 @@ rs6000_emit_parity (rtx dst, rtx src)
>
>     vr9  = 00000006 00000004 00000002 00000000.  */
>
> -void
> -altivec_expand_vec_perm_const_le (rtx operands[4])
> +static void
> +altivec_expand_vec_perm_const_le (rtx target, rtx op0, rtx op1,
> +                                 const vec_perm_indices &sel)
>  {
>    unsigned int i;
>    rtx perm[16];
>    rtx constv, unspec;
> -  rtx target = operands[0];
> -  rtx op0 = operands[1];
> -  rtx op1 = operands[2];
> -  rtx sel = operands[3];
>
>    /* Unpack and adjust the constant selector.  */
>    for (i = 0; i < 16; ++i)
>      {
> -      rtx e = XVECEXP (sel, 0, i);
> -      unsigned int elt = 31 - (INTVAL (e) & 31);
> +      unsigned int elt = 31 - (sel[i] & 31);
>        perm[i] = GEN_INT (elt);
>      }
>
> @@ -38449,10 +38448,14 @@ altivec_expand_vec_perm_le (rtx operands
>  }
>
>  /* Expand an Altivec constant permutation.  Return true if we match
> -   an efficient implementation; false to fall back to VPERM.  */
> +   an efficient implementation; false to fall back to VPERM.
>
> -bool
> -altivec_expand_vec_perm_const (rtx operands[4])
> +   OP0 and OP1 are the input vectors and TARGET is the output vector.
> +   SEL specifies the constant permutation vector.  */
> +
> +static bool
> +altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
> +                              const vec_perm_indices &sel)
>  {
>    struct altivec_perm_insn {
>      HOST_WIDE_INT mask;
> @@ -38496,19 +38499,13 @@ altivec_expand_vec_perm_const (rtx opera
>
>    unsigned int i, j, elt, which;
>    unsigned char perm[16];
> -  rtx target, op0, op1, sel, x;
> +  rtx x;
>    bool one_vec;
>
> -  target = operands[0];
> -  op0 = operands[1];
> -  op1 = operands[2];
> -  sel = operands[3];
> -
>    /* Unpack the constant selector.  */
>    for (i = which = 0; i < 16; ++i)
>      {
> -      rtx e = XVECEXP (sel, 0, i);
> -      elt = INTVAL (e) & 31;
> +      elt = sel[i] & 31;
>        which |= (elt < 16 ? 1 : 2);
>        perm[i] = elt;
>      }
> @@ -38664,7 +38661,7 @@ altivec_expand_vec_perm_const (rtx opera
>
>    if (!BYTES_BIG_ENDIAN)
>      {
> -      altivec_expand_vec_perm_const_le (operands);
> +      altivec_expand_vec_perm_const_le (target, op0, op1, sel);
>        return true;
>      }
>
> @@ -38724,60 +38721,54 @@ rs6000_expand_vec_perm_const_1 (rtx targ
>    return true;
>  }
>
> -bool
> -rs6000_expand_vec_perm_const (rtx operands[4])
> -{
> -  rtx target, op0, op1, sel;
> -  unsigned char perm0, perm1;
> -
> -  target = operands[0];
> -  op0 = operands[1];
> -  op1 = operands[2];
> -  sel = operands[3];
> -
> -  /* Unpack the constant selector.  */
> -  perm0 = INTVAL (XVECEXP (sel, 0, 0)) & 3;
> -  perm1 = INTVAL (XVECEXP (sel, 0, 1)) & 3;
> -
> -  return rs6000_expand_vec_perm_const_1 (target, op0, op1, perm0, perm1);
> -}
> -
> -/* Test whether a constant permutation is supported.  */
> +/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
>
>  static bool
> -rs6000_vectorize_vec_perm_const_ok (machine_mode vmode, vec_perm_indices sel)
> +rs6000_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
> +                                rtx op1, const vec_perm_indices &sel)
>  {
> +  bool testing_p = !target;
> +
>    /* AltiVec (and thus VSX) can handle arbitrary permutations.  */
> -  if (TARGET_ALTIVEC)
> +  if (TARGET_ALTIVEC && testing_p)
>      return true;
>
> -  /* Check for ps_merge* or evmerge* insns.  */
> -  if ((TARGET_PAIRED_FLOAT && vmode == V2SFmode)
> -      || (TARGET_SPE && vmode == V2SImode))
> -    {
> -      rtx op0 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 1);
> -      rtx op1 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 2);
> -      return rs6000_expand_vec_perm_const_1 (NULL, op0, op1, sel[0], sel[1]);
> +  /* Check for ps_merge*, evmerge* or xxperm* insns.  */
> +  if ((vmode == V2SFmode && TARGET_PAIRED_FLOAT)
> +      || (vmode == V2SImode && TARGET_SPE)
> +      || ((vmode == V2DFmode || vmode == V2DImode)
> +         && VECTOR_MEM_VSX_P (vmode)))
> +    {
> +      if (testing_p)
> +       {
> +         op0 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 1);
> +         op1 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 2);
> +       }
> +      if (rs6000_expand_vec_perm_const_1 (target, op0, op1, sel[0], sel[1]))
> +       return true;
> +    }
> +
> +  if (TARGET_ALTIVEC)
> +    {
> +      /* Force the target-independent code to lower to V16QImode.  */
> +      if (vmode != V16QImode)
> +       return false;
> +      if (altivec_expand_vec_perm_const (target, op0, op1, sel))
> +       return true;
>      }
>
>    return false;
>  }
>
> -/* A subroutine for rs6000_expand_extract_even & rs6000_expand_interleave.  */
> +/* A subroutine for rs6000_expand_extract_even & rs6000_expand_interleave.
> +   OP0 and OP1 are the input vectors and TARGET is the output vector.
> +   PERM specifies the constant permutation vector.  */
>
>  static void
>  rs6000_do_expand_vec_perm (rtx target, rtx op0, rtx op1,
> -                          machine_mode vmode, unsigned nelt, rtx perm[])
> +                          machine_mode vmode, const vec_perm_builder &perm)
>  {
> -  machine_mode imode;
> -  rtx x;
> -
> -  imode = vmode;
> -  if (GET_MODE_CLASS (vmode) != MODE_VECTOR_INT)
> -    imode = mode_for_int_vector (vmode).require ();
> -
> -  x = gen_rtx_CONST_VECTOR (imode, gen_rtvec_v (nelt, perm));
> -  x = expand_vec_perm (vmode, op0, op1, x, target);
> +  rtx x = expand_vec_perm_const (vmode, op0, op1, perm, BLKmode, target);
>    if (x != target)
>      emit_move_insn (target, x);
>  }
> @@ -38789,12 +38780,12 @@ rs6000_expand_extract_even (rtx target,
>  {
>    machine_mode vmode = GET_MODE (target);
>    unsigned i, nelt = GET_MODE_NUNITS (vmode);
> -  rtx perm[16];
> +  vec_perm_builder perm (nelt);
>
>    for (i = 0; i < nelt; i++)
> -    perm[i] = GEN_INT (i * 2);
> +    perm.quick_push (i * 2);
>
> -  rs6000_do_expand_vec_perm (target, op0, op1, vmode, nelt, perm);
> +  rs6000_do_expand_vec_perm (target, op0, op1, vmode, perm);
>  }
>
>  /* Expand a vector interleave operation.  */
> @@ -38804,16 +38795,16 @@ rs6000_expand_interleave (rtx target, rt
>  {
>    machine_mode vmode = GET_MODE (target);
>    unsigned i, high, nelt = GET_MODE_NUNITS (vmode);
> -  rtx perm[16];
> +  vec_perm_builder perm (nelt);
>
>    high = (highp ? 0 : nelt / 2);
>    for (i = 0; i < nelt / 2; i++)
>      {
> -      perm[i * 2] = GEN_INT (i + high);
> -      perm[i * 2 + 1] = GEN_INT (i + nelt + high);
> +      perm.quick_push (i + high);
> +      perm.quick_push (i + nelt + high);
>      }
>
> -  rs6000_do_expand_vec_perm (target, op0, op1, vmode, nelt, perm);
> +  rs6000_do_expand_vec_perm (target, op0, op1, vmode, perm);
>  }
>
>  /* Scale a V2DF vector SRC by two to the SCALE and place in TGT.  */
> Index: gcc/config/rs6000/altivec.md
> ===================================================================
> --- gcc/config/rs6000/altivec.md        2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/rs6000/altivec.md        2017-12-09 22:47:27.872318093 +0000
> @@ -2198,19 +2198,6 @@ (define_expand "vec_permv16qi"
>    }
>  })
>
> -(define_expand "vec_perm_constv16qi"
> -  [(match_operand:V16QI 0 "register_operand" "")
> -   (match_operand:V16QI 1 "register_operand" "")
> -   (match_operand:V16QI 2 "register_operand" "")
> -   (match_operand:V16QI 3 "" "")]
> -  "TARGET_ALTIVEC"
> -{
> -  if (altivec_expand_vec_perm_const (operands))
> -    DONE;
> -  else
> -    FAIL;
> -})
> -
>  (define_insn "*altivec_vpermr_<mode>_internal"
>    [(set (match_operand:VM 0 "register_operand" "=v,?wo")
>         (unspec:VM [(match_operand:VM 1 "register_operand" "v,wo")
> Index: gcc/config/rs6000/paired.md
> ===================================================================
> --- gcc/config/rs6000/paired.md 2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/rs6000/paired.md 2017-12-09 22:47:27.872318093 +0000
> @@ -313,19 +313,6 @@ (define_insn "paired_merge11"
>    "ps_merge11 %0, %1, %2"
>    [(set_attr "type" "fp")])
>
> -(define_expand "vec_perm_constv2sf"
> -  [(match_operand:V2SF 0 "gpc_reg_operand" "")
> -   (match_operand:V2SF 1 "gpc_reg_operand" "")
> -   (match_operand:V2SF 2 "gpc_reg_operand" "")
> -   (match_operand:V2SI 3 "" "")]
> -  "TARGET_PAIRED_FLOAT"
> -{
> -  if (rs6000_expand_vec_perm_const (operands))
> -    DONE;
> -  else
> -    FAIL;
> -})
> -
>  (define_insn "paired_sum0"
>    [(set (match_operand:V2SF 0 "gpc_reg_operand" "=f")
>         (vec_concat:V2SF (plus:SF (vec_select:SF
> Index: gcc/config/rs6000/vsx.md
> ===================================================================
> --- gcc/config/rs6000/vsx.md    2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/rs6000/vsx.md    2017-12-09 22:47:27.875318095 +0000
> @@ -3189,19 +3189,6 @@ (define_insn "vsx_xxpermdi2_<mode>_1"
>  }
>    [(set_attr "type" "vecperm")])
>
> -(define_expand "vec_perm_const<mode>"
> -  [(match_operand:VSX_D 0 "vsx_register_operand" "")
> -   (match_operand:VSX_D 1 "vsx_register_operand" "")
> -   (match_operand:VSX_D 2 "vsx_register_operand" "")
> -   (match_operand:V2DI  3 "" "")]
> -  "VECTOR_MEM_VSX_P (<MODE>mode)"
> -{
> -  if (rs6000_expand_vec_perm_const (operands))
> -    DONE;
> -  else
> -    FAIL;
> -})
> -
>  ;; Extraction of a single element in a small integer vector.  Until ISA 3.0,
>  ;; none of the small types were allowed in a vector register, so we had to
>  ;; extract to a DImode and either do a direct move or store.
> Index: gcc/config/rs6000/rs6000-protos.h
> ===================================================================
> --- gcc/config/rs6000/rs6000-protos.h   2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/rs6000/rs6000-protos.h   2017-12-09 22:47:27.872318093 +0000
> @@ -63,9 +63,7 @@ extern void rs6000_expand_vector_extract
>  extern void rs6000_split_vec_extract_var (rtx, rtx, rtx, rtx, rtx);
>  extern rtx rs6000_adjust_vec_address (rtx, rtx, rtx, rtx, machine_mode);
>  extern void rs6000_split_v4si_init (rtx []);
> -extern bool altivec_expand_vec_perm_const (rtx op[4]);
>  extern void altivec_expand_vec_perm_le (rtx op[4]);
> -extern bool rs6000_expand_vec_perm_const (rtx op[4]);
>  extern void altivec_expand_lvx_be (rtx, rtx, machine_mode, unsigned);
>  extern void altivec_expand_stvx_be (rtx, rtx, machine_mode, unsigned);
>  extern void altivec_expand_stvex_be (rtx, rtx, machine_mode, unsigned);
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c  2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/rs6000/rs6000.c  2017-12-09 22:47:27.874318095 +0000
> @@ -1907,8 +1907,8 @@ #define TARGET_SET_CURRENT_FUNCTION rs60
>  #undef TARGET_LEGITIMATE_CONSTANT_P
>  #define TARGET_LEGITIMATE_CONSTANT_P rs6000_legitimate_constant_p
>
> -#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
> -#define TARGET_VECTORIZE_VEC_PERM_CONST_OK rs6000_vectorize_vec_perm_const_ok
> +#undef TARGET_VECTORIZE_VEC_PERM_CONST
> +#define TARGET_VECTORIZE_VEC_PERM_CONST rs6000_vectorize_vec_perm_const
>
>  #undef TARGET_CAN_USE_DOLOOP_P
>  #define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
> @@ -35545,6 +35545,9 @@ rs6000_emit_parity (rtx dst, rtx src)
>  }
>
>  /* Expand an Altivec constant permutation for little endian mode.
> +   OP0 and OP1 are the input vectors and TARGET is the output vector.
> +   SEL specifies the constant permutation vector.
> +
>     There are two issues: First, the two input operands must be
>     swapped so that together they form a double-wide array in LE
>     order.  Second, the vperm instruction has surprising behavior
> @@ -35586,22 +35589,18 @@ rs6000_emit_parity (rtx dst, rtx src)
>
>     vr9  = 00000006 00000004 00000002 00000000.  */
>
> -void
> -altivec_expand_vec_perm_const_le (rtx operands[4])
> +static void
> +altivec_expand_vec_perm_const_le (rtx target, rtx op0, rtx op1,
> +                                 const vec_perm_indices &sel)
>  {
>    unsigned int i;
>    rtx perm[16];
>    rtx constv, unspec;
> -  rtx target = operands[0];
> -  rtx op0 = operands[1];
> -  rtx op1 = operands[2];
> -  rtx sel = operands[3];
>
>    /* Unpack and adjust the constant selector.  */
>    for (i = 0; i < 16; ++i)
>      {
> -      rtx e = XVECEXP (sel, 0, i);
> -      unsigned int elt = 31 - (INTVAL (e) & 31);
> +      unsigned int elt = 31 - (sel[i] & 31);
>        perm[i] = GEN_INT (elt);
>      }
>
> @@ -35683,10 +35682,14 @@ altivec_expand_vec_perm_le (rtx operands
>  }
>
>  /* Expand an Altivec constant permutation.  Return true if we match
> -   an efficient implementation; false to fall back to VPERM.  */
> +   an efficient implementation; false to fall back to VPERM.
>
> -bool
> -altivec_expand_vec_perm_const (rtx operands[4])
> +   OP0 and OP1 are the input vectors and TARGET is the output vector.
> +   SEL specifies the constant permutation vector.  */
> +
> +static bool
> +altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
> +                              const vec_perm_indices &sel)
>  {
>    struct altivec_perm_insn {
>      HOST_WIDE_INT mask;
> @@ -35734,19 +35737,13 @@ altivec_expand_vec_perm_const (rtx opera
>
>    unsigned int i, j, elt, which;
>    unsigned char perm[16];
> -  rtx target, op0, op1, sel, x;
> +  rtx x;
>    bool one_vec;
>
> -  target = operands[0];
> -  op0 = operands[1];
> -  op1 = operands[2];
> -  sel = operands[3];
> -
>    /* Unpack the constant selector.  */
>    for (i = which = 0; i < 16; ++i)
>      {
> -      rtx e = XVECEXP (sel, 0, i);
> -      elt = INTVAL (e) & 31;
> +      elt = sel[i] & 31;
>        which |= (elt < 16 ? 1 : 2);
>        perm[i] = elt;
>      }
> @@ -35902,7 +35899,7 @@ altivec_expand_vec_perm_const (rtx opera
>
>    if (!BYTES_BIG_ENDIAN)
>      {
> -      altivec_expand_vec_perm_const_le (operands);
> +      altivec_expand_vec_perm_const_le (target, op0, op1, sel);
>        return true;
>      }
>
> @@ -35962,59 +35959,53 @@ rs6000_expand_vec_perm_const_1 (rtx targ
>    return true;
>  }
>
> -bool
> -rs6000_expand_vec_perm_const (rtx operands[4])
> -{
> -  rtx target, op0, op1, sel;
> -  unsigned char perm0, perm1;
> -
> -  target = operands[0];
> -  op0 = operands[1];
> -  op1 = operands[2];
> -  sel = operands[3];
> -
> -  /* Unpack the constant selector.  */
> -  perm0 = INTVAL (XVECEXP (sel, 0, 0)) & 3;
> -  perm1 = INTVAL (XVECEXP (sel, 0, 1)) & 3;
> -
> -  return rs6000_expand_vec_perm_const_1 (target, op0, op1, perm0, perm1);
> -}
> -
> -/* Test whether a constant permutation is supported.  */
> +/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
>
>  static bool
> -rs6000_vectorize_vec_perm_const_ok (machine_mode vmode, vec_perm_indices sel)
> +rs6000_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
> +                                rtx op1, const vec_perm_indices &sel)
>  {
> +  bool testing_p = !target;
> +
>    /* AltiVec (and thus VSX) can handle arbitrary permutations.  */
> -  if (TARGET_ALTIVEC)
> +  if (TARGET_ALTIVEC && testing_p)
>      return true;
>
> -  /* Check for ps_merge* or evmerge* insns.  */
> -  if (TARGET_PAIRED_FLOAT && vmode == V2SFmode)
> +  /* Check for ps_merge* or xxpermdi insns.  */
> +  if ((vmode == V2SFmode && TARGET_PAIRED_FLOAT)
> +      || ((vmode == V2DFmode || vmode == V2DImode)
> +         && VECTOR_MEM_VSX_P (vmode)))
> +    {
> +      if (testing_p)
> +       {
> +         op0 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 1);
> +         op1 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 2);
> +       }
> +      if (rs6000_expand_vec_perm_const_1 (target, op0, op1, sel[0], sel[1]))
> +       return true;
> +    }
> +
> +  if (TARGET_ALTIVEC)
>      {
> -      rtx op0 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 1);
> -      rtx op1 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 2);
> -      return rs6000_expand_vec_perm_const_1 (NULL, op0, op1, sel[0], sel[1]);
> +      /* Force the target-independent code to lower to V16QImode.  */
> +      if (vmode != V16QImode)
> +       return false;
> +      if (altivec_expand_vec_perm_const (target, op0, op1, sel))
> +       return true;
>      }
>
>    return false;
>  }
>
> -/* A subroutine for rs6000_expand_extract_even & rs6000_expand_interleave.  */
> +/* A subroutine for rs6000_expand_extract_even & rs6000_expand_interleave.
> +   OP0 and OP1 are the input vectors and TARGET is the output vector.
> +   PERM specifies the constant permutation vector.  */
>
>  static void
>  rs6000_do_expand_vec_perm (rtx target, rtx op0, rtx op1,
> -                          machine_mode vmode, unsigned nelt, rtx perm[])
> +                          machine_mode vmode, const vec_perm_builder &perm)
>  {
> -  machine_mode imode;
> -  rtx x;
> -
> -  imode = vmode;
> -  if (GET_MODE_CLASS (vmode) != MODE_VECTOR_INT)
> -    imode = mode_for_int_vector (vmode).require ();
> -
> -  x = gen_rtx_CONST_VECTOR (imode, gen_rtvec_v (nelt, perm));
> -  x = expand_vec_perm (vmode, op0, op1, x, target);
> +  rtx x = expand_vec_perm_const (vmode, op0, op1, perm, BLKmode, target);
>    if (x != target)
>      emit_move_insn (target, x);
>  }
> @@ -36026,12 +36017,12 @@ rs6000_expand_extract_even (rtx target,
>  {
>    machine_mode vmode = GET_MODE (target);
>    unsigned i, nelt = GET_MODE_NUNITS (vmode);
> -  rtx perm[16];
> +  vec_perm_builder perm (nelt);
>
>    for (i = 0; i < nelt; i++)
> -    perm[i] = GEN_INT (i * 2);
> +    perm.quick_push (i * 2);
>
> -  rs6000_do_expand_vec_perm (target, op0, op1, vmode, nelt, perm);
> +  rs6000_do_expand_vec_perm (target, op0, op1, vmode, perm);
>  }
>
>  /* Expand a vector interleave operation.  */
> @@ -36041,16 +36032,16 @@ rs6000_expand_interleave (rtx target, rt
>  {
>    machine_mode vmode = GET_MODE (target);
>    unsigned i, high, nelt = GET_MODE_NUNITS (vmode);
> -  rtx perm[16];
> +  vec_perm_builder perm (nelt);
>
>    high = (highp ? 0 : nelt / 2);
>    for (i = 0; i < nelt / 2; i++)
>      {
> -      perm[i * 2] = GEN_INT (i + high);
> -      perm[i * 2 + 1] = GEN_INT (i + nelt + high);
> +      perm.quick_push (i + high);
> +      perm.quick_push (i + nelt + high);
>      }
>
> -  rs6000_do_expand_vec_perm (target, op0, op1, vmode, nelt, perm);
> +  rs6000_do_expand_vec_perm (target, op0, op1, vmode, perm);
>  }
>
>  /* Scale a V2DF vector SRC by two to the SCALE and place in TGT.  */
> Index: gcc/config/sparc/sparc.md
> ===================================================================
> --- gcc/config/sparc/sparc.md   2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/sparc/sparc.md   2017-12-09 22:47:27.876318096 +0000
> @@ -9327,28 +9327,6 @@ (define_insn "bshuffle<VM64:mode>_vis"
>     (set_attr "subtype" "other")
>     (set_attr "fptype" "double")])
>
> -;; The rtl expanders will happily convert constant permutations on other
> -;; modes down to V8QI.  Rely on this to avoid the complexity of the byte
> -;; order of the permutation.
> -(define_expand "vec_perm_constv8qi"
> -  [(match_operand:V8QI 0 "register_operand" "")
> -   (match_operand:V8QI 1 "register_operand" "")
> -   (match_operand:V8QI 2 "register_operand" "")
> -   (match_operand:V8QI 3 "" "")]
> -  "TARGET_VIS2"
> -{
> -  unsigned int i, mask;
> -  rtx sel = operands[3];
> -
> -  for (i = mask = 0; i < 8; ++i)
> -    mask |= (INTVAL (XVECEXP (sel, 0, i)) & 0xf) << (28 - i*4);
> -  sel = force_reg (SImode, gen_int_mode (mask, SImode));
> -
> -  emit_insn (gen_bmasksi_vis (gen_reg_rtx (SImode), sel, const0_rtx));
> -  emit_insn (gen_bshufflev8qi_vis (operands[0], operands[1], operands[2]));
> -  DONE;
> -})
> -
>  ;; Unlike constant permutation, we can vastly simplify the compression of
>  ;; the 64-bit selector input to the 32-bit %gsr value by knowing what the
>  ;; width of the input is.
> Index: gcc/config/sparc/sparc.c
> ===================================================================
> --- gcc/config/sparc/sparc.c    2017-12-09 22:47:09.549486911 +0000
> +++ gcc/config/sparc/sparc.c    2017-12-09 22:47:27.876318096 +0000
> @@ -686,6 +686,8 @@ static bool sparc_modes_tieable_p (machi
>  static bool sparc_can_change_mode_class (machine_mode, machine_mode,
>                                          reg_class_t);
>  static HOST_WIDE_INT sparc_constant_alignment (const_tree, HOST_WIDE_INT);
> +static bool sparc_vectorize_vec_perm_const (machine_mode, rtx, rtx, rtx,
> +                                           const vec_perm_indices &);
>
>  #ifdef SUBTARGET_ATTRIBUTE_TABLE
>  /* Table of valid machine attributes.  */
> @@ -930,6 +932,9 @@ #define TARGET_CAN_CHANGE_MODE_CLASS spa
>  #undef TARGET_CONSTANT_ALIGNMENT
>  #define TARGET_CONSTANT_ALIGNMENT sparc_constant_alignment
>
> +#undef TARGET_VECTORIZE_VEC_PERM_CONST
> +#define TARGET_VECTORIZE_VEC_PERM_CONST sparc_vectorize_vec_perm_const
> +
>  struct gcc_target targetm = TARGET_INITIALIZER;
>
>  /* Return the memory reference contained in X if any, zero otherwise.  */
> @@ -12812,6 +12817,32 @@ sparc_expand_vec_perm_bmask (machine_mod
>    emit_insn (gen_bmasksi_vis (gen_reg_rtx (SImode), sel, t_1));
>  }
>
> +/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
> +
> +static bool
> +sparc_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
> +                               rtx op1, const vec_perm_indices &sel)
> +{
> +  /* All permutes are supported.  */
> +  if (!target)
> +    return true;
> +
> +  /* Force target-independent code to convert constant permutations on other
> +     modes down to V8QI.  Rely on this to avoid the complexity of the byte
> +     order of the permutation.  */
> +  if (vmode != V8QImode)
> +    return false;
> +
> +  unsigned int i, mask;
> +  for (i = mask = 0; i < 8; ++i)
> +    mask |= (sel[i] & 0xf) << (28 - i*4);
> +  rtx mask_rtx = force_reg (SImode, gen_int_mode (mask, SImode));
> +
> +  emit_insn (gen_bmasksi_vis (gen_reg_rtx (SImode), mask_rtx, const0_rtx));
> +  emit_insn (gen_bshufflev8qi_vis (target, op0, op1));
> +  return true;
> +}
> +
>  /* Implement TARGET_FRAME_POINTER_REQUIRED.  */
>
>  static bool
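
Both the rs6000 and sparc conversions above follow the same contract for
the new hook.  As a schematic of that contract (the foo_* names below are
placeholders rather than anything from the patch, and this is a sketch,
not real GCC code):

  static bool
  foo_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
                                rtx op1, const vec_perm_indices &sel)
  {
    /* A null TARGET means "test only": report whether the constant
       permutation could be expanded, without emitting any insns.  */
    if (!target)
      return foo_vec_perm_const_ok_p (vmode, sel);

    /* Otherwise emit code for the permutation and return true, or
       return false so that the generic code falls back (for example
       to a byte-level permute or to a variable permute).  */
    return foo_expand_vec_perm_const_1 (target, op0, op1, sel);
  }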

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [06/13] Check whether a vector of QIs can store all indices
  2017-12-09 23:18 ` [06/13] Check whether a vector of QIs can store all indices Richard Sandiford
@ 2017-12-12 15:27   ` Richard Biener
  0 siblings, 0 replies; 46+ messages in thread
From: Richard Biener @ 2017-12-12 15:27 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Sun, Dec 10, 2017 at 12:18 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> The patch to remove the vec_perm_const optab checked whether replacing
> a constant permute with a variable permute is safe, or whether it might
> truncate the indices.  This patch adds a corresponding check for whether
> variable permutes can be lowered to QImode-based permutes.

Ok.

>
> 2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
>         * optabs-query.c (can_vec_perm_var_p): Check whether lowering
>         to qimode could truncate the indices.
>         * optabs.c (expand_vec_perm_var): Likewise.
>
> Index: gcc/optabs-query.c
> ===================================================================
> --- gcc/optabs-query.c  2017-12-09 22:47:21.534314227 +0000
> +++ gcc/optabs-query.c  2017-12-09 22:47:25.861316866 +0000
> @@ -378,7 +378,8 @@ can_vec_perm_var_p (machine_mode mode)
>
>    /* We allow fallback to a QI vector mode, and adjust the mask.  */
>    machine_mode qimode;
> -  if (!qimode_for_vec_perm (mode).exists (&qimode))
> +  if (!qimode_for_vec_perm (mode).exists (&qimode)
> +      || GET_MODE_NUNITS (qimode) > GET_MODE_MASK (QImode) + 1)
>      return false;
>
>    if (direct_optab_handler (vec_perm_optab, qimode) == CODE_FOR_nothing)
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c        2017-12-09 22:47:23.878315657 +0000
> +++ gcc/optabs.c        2017-12-09 22:47:25.861316866 +0000
> @@ -5595,7 +5595,8 @@ expand_vec_perm_var (machine_mode mode,
>    /* As a special case to aid several targets, lower the element-based
>       permutation to a byte-based permutation and try again.  */
>    machine_mode qimode;
> -  if (!qimode_for_vec_perm (mode).exists (&qimode))
> +  if (!qimode_for_vec_perm (mode).exists (&qimode)
> +      || GET_MODE_NUNITS (qimode) > GET_MODE_MASK (QImode) + 1)
>      return NULL_RTX;
>    icode = direct_optab_handler (vec_perm_optab, qimode);
>    if (icode == CODE_FOR_nothing)
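
The condition being added is just a range guard.  As a stand-alone sketch
of what it amounts to (the helper name is invented for illustration; the
real code tests GET_MODE_NUNITS and GET_MODE_MASK directly, as in the
hunks above):

  /* A QImode selector element holds 256 distinct values (0..255),
     so refuse the byte-level fallback once the QI vector has more
     lanes than a QI element can distinguish.  */
  static bool
  qimode_fallback_ok_p (unsigned int qimode_nunits)
  {
    const unsigned int qi_values = 0xffu + 1;  /* GET_MODE_MASK (QImode) + 1 */
    return qimode_nunits <= qi_values;
  }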

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [07/13] Make vec_perm_indices use new vector encoding
  2017-12-09 23:20 ` [07/13] Make vec_perm_indices use new vector encoding Richard Sandiford
@ 2017-12-12 15:32   ` Richard Biener
  2017-12-12 15:47     ` Richard Sandiford
  0 siblings, 1 reply; 46+ messages in thread
From: Richard Biener @ 2017-12-12 15:32 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Sun, Dec 10, 2017 at 12:20 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch changes vec_perm_indices from a plain vec<> to a class
> that stores a canonicalised permutation, using the same encoding
> as for VECTOR_CSTs.  This means that vec_perm_indices now carries
> information about the number of vectors being permuted (currently
> always 1 or 2) and the number of elements in each input vector.

Before I dive into the C++ details, can you explain why it needs this
info and how it encodes it for variable-length vectors?  To interleave
two vectors you need something like { 0, N, 1, N+1, ... }, and I'm not
sure we can directly encode N here, can we?  Extract even/odd should
just work as { 0, 2, 4, 6, ... } without knowledge of whether we permute
one or two vectors (the one-vector case just uses the same vector twice)
or how many elements each of the vectors (or the result) has.

Richard.
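
To make the encoding side of this concrete, here is a fixed-length sketch
(N == 8, chosen only so the values can be written out; it is not meant to
settle the variable-length question) of how the two selectors above would
look in the NPATTERNS/NELTS_PER_PATTERN scheme that the patch borrows from
the VECTOR_CST encoding:

  /* Interleave low of two 8-element inputs: { 0, 8, 1, 9, 2, 10, 3, 11 }.
     Two interleaved patterns, each a linear series with step 1, so only
     npatterns * nelts_per_pattern == 6 elements are stored explicitly.  */
  vec_perm_builder ilv (8, 2, 3);
  for (int i = 0; i < 3; ++i)
    {
      ilv.quick_push (i);      /* pattern 0: 0, 1, 2, ...  */
      ilv.quick_push (i + 8);  /* pattern 1: 8, 9, 10, ...  */
    }

  /* Extract even: { 0, 2, 4, 6, 8, 10, 12, 14 } is a single stepped
     pattern, encoded by just three leading elements: 0, 2, 4.  */
  vec_perm_builder even (8, 1, 3);
  for (int i = 0; i < 3; ++i)
    even.quick_push (i * 2);

  /* The vec_perm_indices wrapper records the number of inputs and the
     number of elements per input: here, two 8-element inputs producing
     an 8-element result.  */
  vec_perm_indices ilv_indices (ilv, 2, 8);
  vec_perm_indices even_indices (even, 2, 8);

For the interleave, the value 8 is still stored explicitly in the leading
encoded elements; only the trailing elements of each pattern are elided.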

> A new vec_perm_builder class is used to actually build up the vector,
> like tree_vector_builder does for trees.  vec_perm_indices is the
> completed representation, a bit like VECTOR_CST is for trees.
>
> The patch just does a mechanical conversion of the code to
> vec_perm_builder: a later patch uses explicit encodings where possible.
>
> The point of all this is that it makes the representation suitable
> for variable-length vectors.  It's no longer necessary for the
> underlying vec<>s to store every element explicitly.
>
> In int-vector-builder.h, "using the same encoding as tree and rtx constants"
> describes the endpoint -- adding the rtx encoding comes later.
>
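
One detail worth calling out from the description above: the completed
vec_perm_indices canonicalises the selector for its stated number of
inputs, so out-of-range and negative values are reduced on construction.
A small illustration based on the clamp logic in the patch (the values
are picked arbitrarily):

  /* With two 4-element inputs, the canonical range of selector values
     is 0..7, so the wrapper reduces other values modulo 8.  */
  vec_perm_builder sel (4, 4, 1);
  sel.quick_push (0);
  sel.quick_push (11);   /* clamps to 3 */
  sel.quick_push (-1);   /* negative values count from the end: 7 */
  sel.quick_push (5);
  vec_perm_indices indices (sel, 2, 4);
  /* indices[0..3] are now { 0, 3, 7, 5 }.  */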
>
> 2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
>         * int-vector-builder.h: New file.
>         * vec-perm-indices.h: Include int-vector-builder.h.
>         (vec_perm_indices): Redefine as an int_vector_builder.
>         (auto_vec_perm_indices): Delete.
>         (vec_perm_builder): Redefine as a stand-alone class.
>         (vec_perm_indices::vec_perm_indices): New function.
>         (vec_perm_indices::clamp): Likewise.
>         * vec-perm-indices.c: Include fold-const.h and tree-vector-builder.h.
>         (vec_perm_indices::new_vector): New function.
>         (vec_perm_indices::new_expanded_vector): Update for new
>         vec_perm_indices class.
>         (vec_perm_indices::rotate_inputs): New function.
>         (vec_perm_indices::all_in_range_p): Operate directly on the
>         encoded form, without computing elided elements.
>         (tree_to_vec_perm_builder): Operate directly on the VECTOR_CST
>         encoding.  Update for new vec_perm_indices class.
>         * optabs.c (expand_vec_perm_const): Create a vec_perm_indices for
>         the given vec_perm_builder.
>         (expand_vec_perm_var): Update vec_perm_builder constructor.
>         (expand_mult_highpart): Use vec_perm_builder instead of
>         auto_vec_perm_indices.
>         * optabs-query.c (can_mult_highpart_p): Use vec_perm_builder and
>         vec_perm_indices instead of auto_vec_perm_indices.  Use a single
>         or double series encoding as appropriate.
>         * fold-const.c (fold_ternary_loc): Use vec_perm_builder and
>         vec_perm_indices instead of auto_vec_perm_indices.
>         * tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
>         * tree-vect-data-refs.c (vect_grouped_store_supported): Likewise.
>         (vect_permute_store_chain): Likewise.
>         (vect_grouped_load_supported): Likewise.
>         (vect_permute_load_chain): Likewise.
>         (vect_shift_permute_load_chain): Likewise.
>         * tree-vect-slp.c (vect_build_slp_tree_1): Likewise.
>         (vect_transform_slp_perm_load): Likewise.
>         (vect_schedule_slp_instance): Likewise.
>         * tree-vect-stmts.c (perm_mask_for_reverse): Likewise.
>         (vectorizable_mask_load_store): Likewise.
>         (vectorizable_bswap): Likewise.
>         (vectorizable_store): Likewise.
>         (vectorizable_load): Likewise.
>         * tree-vect-generic.c (lower_vec_perm): Use vec_perm_builder and
>         vec_perm_indices instead of auto_vec_perm_indices.  Use
>         tree_to_vec_perm_builder to read the vector from a tree.
>         * tree-vect-loop.c (calc_vec_perm_mask_for_shift): Take a
>         vec_perm_builder instead of a vec_perm_indices.
>         (have_whole_vector_shift): Use vec_perm_builder and
>         vec_perm_indices instead of auto_vec_perm_indices.  Leave the
>         truncation to calc_vec_perm_mask_for_shift.
>         (vect_create_epilog_for_reduction): Likewise.
>         * config/aarch64/aarch64.c (expand_vec_perm_d::perm): Change
>         from auto_vec_perm_indices to vec_perm_indices.
>         (aarch64_expand_vec_perm_const_1): Use rotate_inputs on d.perm
>         instead of changing individual elements.
>         (aarch64_vectorize_vec_perm_const): Use new_vector to install
>         the vector in d.perm.
>         * config/arm/arm.c (expand_vec_perm_d::perm): Change
>         from auto_vec_perm_indices to vec_perm_indices.
>         (arm_expand_vec_perm_const_1): Use rotate_inputs on d.perm
>         instead of changing individual elements.
>         (arm_vectorize_vec_perm_const): Use new_vector to install
>         the vector in d.perm.
>         * config/powerpcspe/powerpcspe.c (rs6000_expand_extract_even):
>         Update vec_perm_builder constructor.
>         (rs6000_expand_interleave): Likewise.
>         * config/rs6000/rs6000.c (rs6000_expand_extract_even): Likewise.
>         (rs6000_expand_interleave): Likewise.
>
> Index: gcc/int-vector-builder.h
> ===================================================================
> --- /dev/null   2017-12-09 13:59:56.352713187 +0000
> +++ gcc/int-vector-builder.h    2017-12-09 22:48:47.545825268 +0000
> @@ -0,0 +1,90 @@
> +/* A class for building vector integer constants.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_INT_VECTOR_BUILDER_H
> +#define GCC_INT_VECTOR_BUILDER_H 1
> +
> +#include "vector-builder.h"
> +
> +/* This class is used to build vectors of integer type T using the same
> +   encoding as tree and rtx constants.  See vector_builder for more
> +   details.  */
> +template<typename T>
> +class int_vector_builder : public vector_builder<T, int_vector_builder<T> >
> +{
> +  typedef vector_builder<T, int_vector_builder> parent;
> +  friend class vector_builder<T, int_vector_builder>;
> +
> +public:
> +  int_vector_builder () {}
> +  int_vector_builder (unsigned int, unsigned int, unsigned int);
> +
> +  using parent::new_vector;
> +
> +private:
> +  bool equal_p (T, T) const;
> +  bool allow_steps_p () const { return true; }
> +  bool integral_p (T) const { return true; }
> +  T step (T, T) const;
> +  T apply_step (T, unsigned int, T) const;
> +  bool can_elide_p (T) const { return true; }
> +  void note_representative (T *, T) {}
> +};
> +
> +/* Create a new builder for a vector with FULL_NELTS elements.
> +   Initially encode the value as NPATTERNS interleaved patterns with
> +   NELTS_PER_PATTERN elements each.  */
> +
> +template<typename T>
> +inline
> +int_vector_builder<T>::int_vector_builder (unsigned int full_nelts,
> +                                          unsigned int npatterns,
> +                                          unsigned int nelts_per_pattern)
> +{
> +  new_vector (full_nelts, npatterns, nelts_per_pattern);
> +}
> +
> +/* Return true if elements ELT1 and ELT2 are equal.  */
> +
> +template<typename T>
> +inline bool
> +int_vector_builder<T>::equal_p (T elt1, T elt2) const
> +{
> +  return elt1 == elt2;
> +}
> +
> +/* Return the value of element ELT2 minus the value of element ELT1.  */
> +
> +template<typename T>
> +inline T
> +int_vector_builder<T>::step (T elt1, T elt2) const
> +{
> +  return elt2 - elt1;
> +}
> +
> +/* Return a vector element with the value BASE + FACTOR * STEP.  */
> +
> +template<typename T>
> +inline T
> +int_vector_builder<T>::apply_step (T base, unsigned int factor, T step) const
> +{
> +  return base + factor * step;
> +}
> +
> +#endif
> Index: gcc/vec-perm-indices.h
> ===================================================================
> --- gcc/vec-perm-indices.h      2017-12-09 22:47:27.885318101 +0000
> +++ gcc/vec-perm-indices.h      2017-12-09 22:48:47.548825399 +0000
> @@ -20,30 +20,102 @@ Software Foundation; either version 3, o
>  #ifndef GCC_VEC_PERN_INDICES_H
>  #define GCC_VEC_PERN_INDICES_H 1
>
> +#include "int-vector-builder.h"
> +
> +/* A vector_builder for building constant permutation vectors.
> +   The elements do not need to be clamped to a particular range
> +   of input elements.  */
> +typedef int_vector_builder<HOST_WIDE_INT> vec_perm_builder;
> +
>  /* This class represents a constant permutation vector, such as that used
> -   as the final operand to a VEC_PERM_EXPR.  */
> -class vec_perm_indices : public auto_vec<unsigned short, 32>
> +   as the final operand to a VEC_PERM_EXPR.  The vector is canonicalized
> +   for a particular number of input vectors and for a particular number
> +   of elements per input.  The class copes with cases in which the
> +   input and output vectors have different numbers of elements.  */
> +class vec_perm_indices
>  {
> -  typedef unsigned short element_type;
> -  typedef auto_vec<element_type, 32> parent_type;
> +  typedef HOST_WIDE_INT element_type;
>
>  public:
> -  vec_perm_indices () {}
> -  vec_perm_indices (unsigned int nunits) : parent_type (nunits) {}
> +  vec_perm_indices ();
> +  vec_perm_indices (const vec_perm_builder &, unsigned int, unsigned int);
>
> +  void new_vector (const vec_perm_builder &, unsigned int, unsigned int);
>    void new_expanded_vector (const vec_perm_indices &, unsigned int);
> +  void rotate_inputs (int delta);
> +
> +  /* Return the underlying vector encoding.  */
> +  const vec_perm_builder &encoding () const { return m_encoding; }
> +
> +  /* Return the number of output elements.  This is called length ()
> +     so that we present a more vec-like interface.  */
> +  unsigned int length () const { return m_encoding.full_nelts (); }
> +
> +  /* Return the number of input vectors being permuted.  */
> +  unsigned int ninputs () const { return m_ninputs; }
>
> +  /* Return the number of elements in each input vector.  */
> +  unsigned int nelts_per_input () const { return m_nelts_per_input; }
> +
> +  /* Return the total number of input elements.  */
> +  unsigned int input_nelts () const { return m_ninputs * m_nelts_per_input; }
> +
> +  element_type clamp (element_type) const;
> +  element_type operator[] (unsigned int i) const;
>    bool all_in_range_p (element_type, element_type) const;
>
>  private:
>    vec_perm_indices (const vec_perm_indices &);
> -};
>
> -/* Temporary.  */
> -typedef vec_perm_indices vec_perm_builder;
> -typedef vec_perm_indices auto_vec_perm_indices;
> +  vec_perm_builder m_encoding;
> +  unsigned int m_ninputs;
> +  unsigned int m_nelts_per_input;
> +};
>
>  bool tree_to_vec_perm_builder (vec_perm_builder *, tree);
>  rtx vec_perm_indices_to_rtx (machine_mode, const vec_perm_indices &);
>
> +inline
> +vec_perm_indices::vec_perm_indices ()
> +  : m_ninputs (0),
> +    m_nelts_per_input (0)
> +{
> +}
> +
> +/* Construct a permutation vector that selects between NINPUTS vector
> +   inputs that have NELTS_PER_INPUT elements each.  Take the elements of
> +   the new vector from ELEMENTS, clamping each one to be in range.  */
> +
> +inline
> +vec_perm_indices::vec_perm_indices (const vec_perm_builder &elements,
> +                                   unsigned int ninputs,
> +                                   unsigned int nelts_per_input)
> +{
> +  new_vector (elements, ninputs, nelts_per_input);
> +}
> +
> +/* Return the canonical value for permutation vector element ELT,
> +   taking into account the current number of input elements.  */
> +
> +inline vec_perm_indices::element_type
> +vec_perm_indices::clamp (element_type elt) const
> +{
> +  element_type limit = input_nelts ();
> +  elt %= limit;
> +  /* Treat negative elements as counting from the end.  This only matters
> +     if the vector size is not a power of 2.  */
> +  if (elt < 0)
> +    elt += limit;
> +  return elt;
> +}
> +
> +/* Return the value of vector element I, which might or might not be
> +   explicitly encoded.  */
> +
> +inline vec_perm_indices::element_type
> +vec_perm_indices::operator[] (unsigned int i) const
> +{
> +  return clamp (m_encoding.elt (i));
> +}
> +
>  #endif
> Index: gcc/vec-perm-indices.c
> ===================================================================
> --- gcc/vec-perm-indices.c      2017-12-09 22:47:27.885318101 +0000
> +++ gcc/vec-perm-indices.c      2017-12-09 22:48:47.548825399 +0000
> @@ -22,11 +22,33 @@ Software Foundation; either version 3, o
>  #include "coretypes.h"
>  #include "vec-perm-indices.h"
>  #include "tree.h"
> +#include "fold-const.h"
> +#include "tree-vector-builder.h"
>  #include "backend.h"
>  #include "rtl.h"
>  #include "memmodel.h"
>  #include "emit-rtl.h"
>
> +/* Switch to a new permutation vector that selects between NINPUTS vector
> +   inputs that have NELTS_PER_INPUT elements each.  Take the elements of the
> +   new permutation vector from ELEMENTS, clamping each one to be in range.  */
> +
> +void
> +vec_perm_indices::new_vector (const vec_perm_builder &elements,
> +                             unsigned int ninputs,
> +                             unsigned int nelts_per_input)
> +{
> +  m_ninputs = ninputs;
> +  m_nelts_per_input = nelts_per_input;
> +  /* Expand the encoding and clamp each element.  E.g. { 0, 2, 4, ... }
> +     might wrap halfway if there is only one vector input.  */
> +  unsigned int full_nelts = elements.full_nelts ();
> +  m_encoding.new_vector (full_nelts, full_nelts, 1);
> +  for (unsigned int i = 0; i < full_nelts; ++i)
> +    m_encoding.quick_push (clamp (elements.elt (i)));
> +  m_encoding.finalize ();
> +}
> +
>  /* Switch to a new permutation vector that selects the same input elements
>     as ORIG, but with each element split into FACTOR pieces.  For example,
>     if ORIG is { 1, 2, 0, 3 } and FACTOR is 2, the new permutation is
> @@ -36,14 +58,31 @@ Software Foundation; either version 3, o
>  vec_perm_indices::new_expanded_vector (const vec_perm_indices &orig,
>                                        unsigned int factor)
>  {
> -  truncate (0);
> -  reserve (orig.length () * factor);
> -  for (unsigned int i = 0; i < orig.length (); ++i)
> +  m_ninputs = orig.m_ninputs;
> +  m_nelts_per_input = orig.m_nelts_per_input * factor;
> +  m_encoding.new_vector (orig.m_encoding.full_nelts () * factor,
> +                        orig.m_encoding.npatterns () * factor,
> +                        orig.m_encoding.nelts_per_pattern ());
> +  unsigned int encoded_nelts = orig.m_encoding.encoded_nelts ();
> +  for (unsigned int i = 0; i < encoded_nelts; ++i)
>      {
> -      element_type base = orig[i] * factor;
> +      element_type base = orig.m_encoding[i] * factor;
>        for (unsigned int j = 0; j < factor; ++j)
> -       quick_push (base + j);
> +       m_encoding.quick_push (base + j);
>      }
> +  m_encoding.finalize ();
> +}
> +
> +/* Rotate the inputs of the permutation right by DELTA inputs.  This changes
> +   the values of the permutation vector but it doesn't change the way that
> +   the elements are encoded.  */
> +
> +void
> +vec_perm_indices::rotate_inputs (int delta)
> +{
> +  element_type element_delta = delta * m_nelts_per_input;
> +  for (unsigned int i = 0; i < m_encoding.length (); ++i)
> +    m_encoding[i] = clamp (m_encoding[i] + element_delta);
>  }
>
>  /* Return true if all elements of the permutation vector are in the range
> @@ -52,9 +91,44 @@ vec_perm_indices::new_expanded_vector (c
>  bool
>  vec_perm_indices::all_in_range_p (element_type start, element_type size) const
>  {
> -  for (unsigned int i = 0; i < length (); ++i)
> -    if ((*this)[i] < start || ((*this)[i] - start) >= size)
> +  /* Check the first two elements of each pattern.  */
> +  unsigned int npatterns = m_encoding.npatterns ();
> +  unsigned int nelts_per_pattern = m_encoding.nelts_per_pattern ();
> +  unsigned int base_nelts = npatterns * MIN (nelts_per_pattern, 2);
> +  for (unsigned int i = 0; i < base_nelts; ++i)
> +    if (m_encoding[i] < start || (m_encoding[i] - start) >= size)
>        return false;
> +
> +  /* For stepped encodings, check the full range of the series.  */
> +  if (nelts_per_pattern == 3)
> +    {
> +      element_type limit = input_nelts ();
> +
> +      /* The number of elements in each pattern beyond the first two
> +        that we checked above.  */
> +      unsigned int step_nelts = (m_encoding.full_nelts () / npatterns) - 2;
> +      for (unsigned int i = 0; i < npatterns; ++i)
> +       {
> +         /* BASE1 has been checked but BASE2 hasn't.   */
> +         element_type base1 = m_encoding[i + npatterns];
> +         element_type base2 = m_encoding[i + base_nelts];
> +
> +         /* The step to add to get from BASE1 to each subsequent value.  */
> +         element_type step = clamp (base2 - base1);
> +
> +         /* STEP has no inherent sign, so a value near LIMIT can
> +            act as a negative step.  The series is in range if it
> +            is in range according to one of the two interpretations.
> +
> +            Since we're dealing with clamped values, ELEMENT_TYPE is
> +            wide enough for overflow not to be a problem.  */
> +         element_type headroom_down = base1 - start;
> +         element_type headroom_up = size - headroom_down - 1;
> +         if (headroom_up < step * step_nelts
> +             && headroom_down < (limit - step) * step_nelts)
> +           return false;
> +       }
> +    }
>    return true;
>  }
>
> @@ -65,15 +139,16 @@ vec_perm_indices::all_in_range_p (elemen
>  bool
>  tree_to_vec_perm_builder (vec_perm_builder *builder, tree cst)
>  {
> -  unsigned int nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (cst));
> -  for (unsigned int i = 0; i < nelts; ++i)
> -    if (!tree_fits_shwi_p (vector_cst_elt (cst, i)))
> +  unsigned int encoded_nelts = vector_cst_encoded_nelts (cst);
> +  for (unsigned int i = 0; i < encoded_nelts; ++i)
> +    if (!tree_fits_shwi_p (VECTOR_CST_ENCODED_ELT (cst, i)))
>        return false;
>
> -  builder->reserve (nelts);
> -  for (unsigned int i = 0; i < nelts; ++i)
> -    builder->quick_push (tree_to_shwi (vector_cst_elt (cst, i))
> -                        & (2 * nelts - 1));
> +  builder->new_vector (TYPE_VECTOR_SUBPARTS (TREE_TYPE (cst)),
> +                      VECTOR_CST_NPATTERNS (cst),
> +                      VECTOR_CST_NELTS_PER_PATTERN (cst));
> +  for (unsigned int i = 0; i < encoded_nelts; ++i)
> +    builder->quick_push (tree_to_shwi (VECTOR_CST_ENCODED_ELT (cst, i)));
>    return true;
>  }
>
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c        2017-12-09 22:47:27.881318099 +0000
> +++ gcc/optabs.c        2017-12-09 22:48:47.546825312 +0000
> @@ -5456,6 +5456,11 @@ expand_vec_perm_const (machine_mode mode
>    rtx_insn *last = get_last_insn ();
>
>    bool single_arg_p = rtx_equal_p (v0, v1);
> +  /* Always specify two input vectors here and leave the target to handle
> +     cases in which the inputs are equal.  Not all backends can cope with
> +     the single-input representation when testing for a double-input
> +     target instruction.  */
> +  vec_perm_indices indices (sel, 2, GET_MODE_NUNITS (mode));
>
>    /* See if this can be handled with a vec_shr.  We only do this if the
>       second vector is all zeroes.  */
> @@ -5468,7 +5473,7 @@ expand_vec_perm_const (machine_mode mode
>        && (shift_code != CODE_FOR_nothing
>           || shift_code_qi != CODE_FOR_nothing))
>      {
> -      rtx shift_amt = shift_amt_for_vec_perm_mask (mode, sel);
> +      rtx shift_amt = shift_amt_for_vec_perm_mask (mode, indices);
>        if (shift_amt)
>         {
>           struct expand_operand ops[3];
> @@ -5500,7 +5505,7 @@ expand_vec_perm_const (machine_mode mode
>        else
>         v1 = force_reg (mode, v1);
>
> -      if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, sel))
> +      if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, indices))
>         return target;
>      }
>
> @@ -5509,7 +5514,7 @@ expand_vec_perm_const (machine_mode mode
>    rtx target_qi = NULL_RTX, v0_qi = NULL_RTX, v1_qi = NULL_RTX;
>    if (qimode != VOIDmode)
>      {
> -      qimode_indices.new_expanded_vector (sel, GET_MODE_UNIT_SIZE (mode));
> +      qimode_indices.new_expanded_vector (indices, GET_MODE_UNIT_SIZE (mode));
>        target_qi = gen_reg_rtx (qimode);
>        v0_qi = gen_lowpart (qimode, v0);
>        v1_qi = gen_lowpart (qimode, v1);
> @@ -5536,7 +5541,7 @@ expand_vec_perm_const (machine_mode mode
>       REQUIRED_SEL_MODE is OK.  */
>    if (sel_mode != required_sel_mode)
>      {
> -      if (!selector_fits_mode_p (required_sel_mode, sel))
> +      if (!selector_fits_mode_p (required_sel_mode, indices))
>         {
>           delete_insns_since (last);
>           return NULL_RTX;
> @@ -5547,7 +5552,7 @@ expand_vec_perm_const (machine_mode mode
>    insn_code icode = direct_optab_handler (vec_perm_optab, mode);
>    if (icode != CODE_FOR_nothing)
>      {
> -      rtx sel_rtx = vec_perm_indices_to_rtx (sel_mode, sel);
> +      rtx sel_rtx = vec_perm_indices_to_rtx (sel_mode, indices);
>        rtx tmp = expand_vec_perm_1 (icode, target, v0, v1, sel_rtx);
>        if (tmp)
>         return tmp;
> @@ -5621,7 +5626,7 @@ expand_vec_perm_var (machine_mode mode,
>    gcc_assert (sel != NULL);
>
>    /* Broadcast the low byte each element into each of its bytes.  */
> -  vec_perm_builder const_sel (w);
> +  vec_perm_builder const_sel (w, w, 1);
>    for (i = 0; i < w; ++i)
>      {
>        int this_e = i / u * u;
> @@ -5848,7 +5853,7 @@ expand_mult_highpart (machine_mode mode,
>    expand_insn (optab_handler (tab2, mode), 3, eops);
>    m2 = gen_lowpart (mode, eops[0].value);
>
> -  auto_vec_perm_indices sel (nunits);
> +  vec_perm_builder sel (nunits, nunits, 1);
>    if (method == 2)
>      {
>        for (i = 0; i < nunits; ++i)
> Index: gcc/optabs-query.c
> ===================================================================
> --- gcc/optabs-query.c  2017-12-09 22:47:27.881318099 +0000
> +++ gcc/optabs-query.c  2017-12-09 22:48:47.545825268 +0000
> @@ -501,12 +501,13 @@ can_mult_highpart_p (machine_mode mode,
>        op = uns_p ? vec_widen_umult_odd_optab : vec_widen_smult_odd_optab;
>        if (optab_handler (op, mode) != CODE_FOR_nothing)
>         {
> -         auto_vec_perm_indices sel (nunits);
> +         vec_perm_builder sel (nunits, nunits, 1);
>           for (i = 0; i < nunits; ++i)
>             sel.quick_push (!BYTES_BIG_ENDIAN
>                             + (i & ~1)
>                             + ((i & 1) ? nunits : 0));
> -         if (can_vec_perm_const_p (mode, sel))
> +         vec_perm_indices indices (sel, 2, nunits);
> +         if (can_vec_perm_const_p (mode, indices))
>             return 2;
>         }
>      }
> @@ -517,10 +518,11 @@ can_mult_highpart_p (machine_mode mode,
>        op = uns_p ? vec_widen_umult_lo_optab : vec_widen_smult_lo_optab;
>        if (optab_handler (op, mode) != CODE_FOR_nothing)
>         {
> -         auto_vec_perm_indices sel (nunits);
> +         vec_perm_builder sel (nunits, nunits, 1);
>           for (i = 0; i < nunits; ++i)
>             sel.quick_push (2 * i + (BYTES_BIG_ENDIAN ? 0 : 1));
> -         if (can_vec_perm_const_p (mode, sel))
> +         vec_perm_indices indices (sel, 2, nunits);
> +         if (can_vec_perm_const_p (mode, indices))
>             return 3;
>         }
>      }
> Index: gcc/fold-const.c
> ===================================================================
> --- gcc/fold-const.c    2017-12-09 22:47:27.881318099 +0000
> +++ gcc/fold-const.c    2017-12-09 22:48:47.545825268 +0000
> @@ -11217,7 +11217,7 @@ fold_ternary_loc (location_t loc, enum t
>             {
>               unsigned int nelts = VECTOR_CST_NELTS (arg0), i;
>               gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));
> -             auto_vec_perm_indices sel (nelts);
> +             vec_perm_builder sel (nelts, nelts, 1);
>               for (i = 0; i < nelts; i++)
>                 {
>                   tree val = VECTOR_CST_ELT (arg0, i);
> @@ -11228,7 +11228,8 @@ fold_ternary_loc (location_t loc, enum t
>                   else /* Currently unreachable.  */
>                     return NULL_TREE;
>                 }
> -             tree t = fold_vec_perm (type, arg1, arg2, sel);
> +             tree t = fold_vec_perm (type, arg1, arg2,
> +                                     vec_perm_indices (sel, 2, nelts));
>               if (t != NULL_TREE)
>                 return t;
>             }
> @@ -11558,8 +11559,8 @@ fold_ternary_loc (location_t loc, enum t
>           mask2 = 2 * nelts - 1;
>           mask = single_arg ? (nelts - 1) : mask2;
>           gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));
> -         auto_vec_perm_indices sel (nelts);
> -         auto_vec_perm_indices sel2 (nelts);
> +         vec_perm_builder sel (nelts, nelts, 1);
> +         vec_perm_builder sel2 (nelts, nelts, 1);
>           for (i = 0; i < nelts; i++)
>             {
>               tree val = VECTOR_CST_ELT (arg2, i);
> @@ -11604,12 +11605,13 @@ fold_ternary_loc (location_t loc, enum t
>               need_mask_canon = true;
>             }
>
> +         vec_perm_indices indices (sel, 2, nelts);
>           if ((TREE_CODE (op0) == VECTOR_CST
>                || TREE_CODE (op0) == CONSTRUCTOR)
>               && (TREE_CODE (op1) == VECTOR_CST
>                   || TREE_CODE (op1) == CONSTRUCTOR))
>             {
> -             tree t = fold_vec_perm (type, op0, op1, sel);
> +             tree t = fold_vec_perm (type, op0, op1, indices);
>               if (t != NULL_TREE)
>                 return t;
>             }
> @@ -11621,11 +11623,14 @@ fold_ternary_loc (location_t loc, enum t
>              argument permutation while still allowing an equivalent
>              2-argument version.  */
>           if (need_mask_canon && arg2 == op2
> -             && !can_vec_perm_const_p (TYPE_MODE (type), sel, false)
> -             && can_vec_perm_const_p (TYPE_MODE (type), sel2, false))
> +             && !can_vec_perm_const_p (TYPE_MODE (type), indices, false)
> +             && can_vec_perm_const_p (TYPE_MODE (type),
> +                                      vec_perm_indices (sel2, 2, nelts),
> +                                      false))
>             {
>               need_mask_canon = need_mask_canon2;
> -             sel = sel2;
> +             sel.truncate (0);
> +             sel.splice (sel2);
>             }
>
>           if (need_mask_canon && arg2 == op2)
> Index: gcc/tree-ssa-forwprop.c
> ===================================================================
> --- gcc/tree-ssa-forwprop.c     2017-12-09 22:47:27.883318100 +0000
> +++ gcc/tree-ssa-forwprop.c     2017-12-09 22:48:47.546825312 +0000
> @@ -2019,7 +2019,7 @@ simplify_vector_constructor (gimple_stmt
>    elem_type = TREE_TYPE (type);
>    elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
>
> -  auto_vec_perm_indices sel (nelts);
> +  vec_perm_builder sel (nelts, nelts, 1);
>    orig = NULL;
>    conv_code = ERROR_MARK;
>    maybe_ident = true;
> @@ -2109,7 +2109,8 @@ simplify_vector_constructor (gimple_stmt
>      {
>        tree mask_type;
>
> -      if (!can_vec_perm_const_p (TYPE_MODE (type), sel))
> +      vec_perm_indices indices (sel, 1, nelts);
> +      if (!can_vec_perm_const_p (TYPE_MODE (type), indices))
>         return false;
>        mask_type
>         = build_vector_type (build_nonstandard_integer_type (elem_size, 1),
> Index: gcc/tree-vect-data-refs.c
> ===================================================================
> --- gcc/tree-vect-data-refs.c   2017-12-09 22:47:27.883318100 +0000
> +++ gcc/tree-vect-data-refs.c   2017-12-09 22:48:47.546825312 +0000
> @@ -4566,7 +4566,7 @@ vect_grouped_store_supported (tree vecty
>    if (VECTOR_MODE_P (mode))
>      {
>        unsigned int i, nelt = GET_MODE_NUNITS (mode);
> -      auto_vec_perm_indices sel (nelt);
> +      vec_perm_builder sel (nelt, nelt, 1);
>        sel.quick_grow (nelt);
>
>        if (count == 3)
> @@ -4574,6 +4574,7 @@ vect_grouped_store_supported (tree vecty
>           unsigned int j0 = 0, j1 = 0, j2 = 0;
>           unsigned int i, j;
>
> +         vec_perm_indices indices;
>           for (j = 0; j < 3; j++)
>             {
>               int nelt0 = ((3 - j) * nelt) % 3;
> @@ -4588,7 +4589,8 @@ vect_grouped_store_supported (tree vecty
>                   if (3 * i + nelt2 < nelt)
>                     sel[3 * i + nelt2] = 0;
>                 }
> -             if (!can_vec_perm_const_p (mode, sel))
> +             indices.new_vector (sel, 2, nelt);
> +             if (!can_vec_perm_const_p (mode, indices))
>                 {
>                   if (dump_enabled_p ())
>                     dump_printf (MSG_MISSED_OPTIMIZATION,
> @@ -4605,7 +4607,8 @@ vect_grouped_store_supported (tree vecty
>                   if (3 * i + nelt2 < nelt)
>                     sel[3 * i + nelt2] = nelt + j2++;
>                 }
> -             if (!can_vec_perm_const_p (mode, sel))
> +             indices.new_vector (sel, 2, nelt);
> +             if (!can_vec_perm_const_p (mode, indices))
>                 {
>                   if (dump_enabled_p ())
>                     dump_printf (MSG_MISSED_OPTIMIZATION,
> @@ -4625,11 +4628,13 @@ vect_grouped_store_supported (tree vecty
>               sel[i * 2] = i;
>               sel[i * 2 + 1] = i + nelt;
>             }
> -         if (can_vec_perm_const_p (mode, sel))
> +         vec_perm_indices indices (sel, 2, nelt);
> +         if (can_vec_perm_const_p (mode, indices))
>             {
>               for (i = 0; i < nelt; i++)
>                 sel[i] += nelt / 2;
> -             if (can_vec_perm_const_p (mode, sel))
> +             indices.new_vector (sel, 2, nelt);
> +             if (can_vec_perm_const_p (mode, indices))
>                 return true;
>             }
>         }
> @@ -4731,7 +4736,7 @@ vect_permute_store_chain (vec<tree> dr_c
>    unsigned int i, n, log_length = exact_log2 (length);
>    unsigned int j, nelt = TYPE_VECTOR_SUBPARTS (vectype);
>
> -  auto_vec_perm_indices sel (nelt);
> +  vec_perm_builder sel (nelt, nelt, 1);
>    sel.quick_grow (nelt);
>
>    result_chain->quick_grow (length);
> @@ -4742,6 +4747,7 @@ vect_permute_store_chain (vec<tree> dr_c
>      {
>        unsigned int j0 = 0, j1 = 0, j2 = 0;
>
> +      vec_perm_indices indices;
>        for (j = 0; j < 3; j++)
>          {
>           int nelt0 = ((3 - j) * nelt) % 3;
> @@ -4757,7 +4763,8 @@ vect_permute_store_chain (vec<tree> dr_c
>               if (3 * i + nelt2 < nelt)
>                 sel[3 * i + nelt2] = 0;
>             }
> -         perm3_mask_low = vect_gen_perm_mask_checked (vectype, sel);
> +         indices.new_vector (sel, 2, nelt);
> +         perm3_mask_low = vect_gen_perm_mask_checked (vectype, indices);
>
>           for (i = 0; i < nelt; i++)
>             {
> @@ -4768,7 +4775,8 @@ vect_permute_store_chain (vec<tree> dr_c
>               if (3 * i + nelt2 < nelt)
>                 sel[3 * i + nelt2] = nelt + j2++;
>             }
> -         perm3_mask_high = vect_gen_perm_mask_checked (vectype, sel);
> +         indices.new_vector (sel, 2, nelt);
> +         perm3_mask_high = vect_gen_perm_mask_checked (vectype, indices);
>
>           vect1 = dr_chain[0];
>           vect2 = dr_chain[1];
> @@ -4805,11 +4813,13 @@ vect_permute_store_chain (vec<tree> dr_c
>           sel[i * 2] = i;
>           sel[i * 2 + 1] = i + nelt;
>         }
> -       perm_mask_high = vect_gen_perm_mask_checked (vectype, sel);
> +       vec_perm_indices indices (sel, 2, nelt);
> +       perm_mask_high = vect_gen_perm_mask_checked (vectype, indices);
>
>         for (i = 0; i < nelt; i++)
>           sel[i] += nelt / 2;
> -       perm_mask_low = vect_gen_perm_mask_checked (vectype, sel);
> +       indices.new_vector (sel, 2, nelt);
> +       perm_mask_low = vect_gen_perm_mask_checked (vectype, indices);
>
>         for (i = 0, n = log_length; i < n; i++)
>           {
> @@ -5154,11 +5164,12 @@ vect_grouped_load_supported (tree vectyp
>    if (VECTOR_MODE_P (mode))
>      {
>        unsigned int i, j, nelt = GET_MODE_NUNITS (mode);
> -      auto_vec_perm_indices sel (nelt);
> +      vec_perm_builder sel (nelt, nelt, 1);
>        sel.quick_grow (nelt);
>
>        if (count == 3)
>         {
> +         vec_perm_indices indices;
>           unsigned int k;
>           for (k = 0; k < 3; k++)
>             {
> @@ -5167,7 +5178,8 @@ vect_grouped_load_supported (tree vectyp
>                   sel[i] = 3 * i + k;
>                 else
>                   sel[i] = 0;
> -             if (!can_vec_perm_const_p (mode, sel))
> +             indices.new_vector (sel, 2, nelt);
> +             if (!can_vec_perm_const_p (mode, indices))
>                 {
>                   if (dump_enabled_p ())
>                     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -5180,7 +5192,8 @@ vect_grouped_load_supported (tree vectyp
>                   sel[i] = i;
>                 else
>                   sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++);
> -             if (!can_vec_perm_const_p (mode, sel))
> +             indices.new_vector (sel, 2, nelt);
> +             if (!can_vec_perm_const_p (mode, indices))
>                 {
>                   if (dump_enabled_p ())
>                     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -5195,13 +5208,16 @@ vect_grouped_load_supported (tree vectyp
>         {
>           /* If length is not equal to 3 then only power of 2 is supported.  */
>           gcc_assert (pow2p_hwi (count));
> +
>           for (i = 0; i < nelt; i++)
>             sel[i] = i * 2;
> -         if (can_vec_perm_const_p (mode, sel))
> +         vec_perm_indices indices (sel, 2, nelt);
> +         if (can_vec_perm_const_p (mode, indices))
>             {
>               for (i = 0; i < nelt; i++)
>                 sel[i] = i * 2 + 1;
> -             if (can_vec_perm_const_p (mode, sel))
> +             indices.new_vector (sel, 2, nelt);
> +             if (can_vec_perm_const_p (mode, indices))
>                 return true;
>             }
>          }
> @@ -5316,7 +5332,7 @@ vect_permute_load_chain (vec<tree> dr_ch
>    unsigned int i, j, log_length = exact_log2 (length);
>    unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype);
>
> -  auto_vec_perm_indices sel (nelt);
> +  vec_perm_builder sel (nelt, nelt, 1);
>    sel.quick_grow (nelt);
>
>    result_chain->quick_grow (length);
> @@ -5327,6 +5343,7 @@ vect_permute_load_chain (vec<tree> dr_ch
>      {
>        unsigned int k;
>
> +      vec_perm_indices indices;
>        for (k = 0; k < 3; k++)
>         {
>           for (i = 0; i < nelt; i++)
> @@ -5334,15 +5351,16 @@ vect_permute_load_chain (vec<tree> dr_ch
>               sel[i] = 3 * i + k;
>             else
>               sel[i] = 0;
> -         perm3_mask_low = vect_gen_perm_mask_checked (vectype, sel);
> +         indices.new_vector (sel, 2, nelt);
> +         perm3_mask_low = vect_gen_perm_mask_checked (vectype, indices);
>
>           for (i = 0, j = 0; i < nelt; i++)
>             if (3 * i + k < 2 * nelt)
>               sel[i] = i;
>             else
>               sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++);
> -
> -         perm3_mask_high = vect_gen_perm_mask_checked (vectype, sel);
> +         indices.new_vector (sel, 2, nelt);
> +         perm3_mask_high = vect_gen_perm_mask_checked (vectype, indices);
>
>           first_vect = dr_chain[0];
>           second_vect = dr_chain[1];
> @@ -5374,11 +5392,13 @@ vect_permute_load_chain (vec<tree> dr_ch
>
>        for (i = 0; i < nelt; ++i)
>         sel[i] = i * 2;
> -      perm_mask_even = vect_gen_perm_mask_checked (vectype, sel);
> +      vec_perm_indices indices (sel, 2, nelt);
> +      perm_mask_even = vect_gen_perm_mask_checked (vectype, indices);
>
>        for (i = 0; i < nelt; ++i)
>         sel[i] = i * 2 + 1;
> -      perm_mask_odd = vect_gen_perm_mask_checked (vectype, sel);
> +      indices.new_vector (sel, 2, nelt);
> +      perm_mask_odd = vect_gen_perm_mask_checked (vectype, indices);
>
>        for (i = 0; i < log_length; i++)
>         {
> @@ -5514,7 +5534,7 @@ vect_shift_permute_load_chain (vec<tree>
>    stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>    loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>
> -  auto_vec_perm_indices sel (nelt);
> +  vec_perm_builder sel (nelt, nelt, 1);
>    sel.quick_grow (nelt);
>
>    result_chain->quick_grow (length);
> @@ -5528,7 +5548,8 @@ vect_shift_permute_load_chain (vec<tree>
>         sel[i] = i * 2;
>        for (i = 0; i < nelt / 2; ++i)
>         sel[nelt / 2 + i] = i * 2 + 1;
> -      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
> +      vec_perm_indices indices (sel, 2, nelt);
> +      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -5536,13 +5557,14 @@ vect_shift_permute_load_chain (vec<tree>
>                               supported by target\n");
>           return false;
>         }
> -      perm2_mask1 = vect_gen_perm_mask_checked (vectype, sel);
> +      perm2_mask1 = vect_gen_perm_mask_checked (vectype, indices);
>
>        for (i = 0; i < nelt / 2; ++i)
>         sel[i] = i * 2 + 1;
>        for (i = 0; i < nelt / 2; ++i)
>         sel[nelt / 2 + i] = i * 2;
> -      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
> +      indices.new_vector (sel, 2, nelt);
> +      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -5550,20 +5572,21 @@ vect_shift_permute_load_chain (vec<tree>
>                               supported by target\n");
>           return false;
>         }
> -      perm2_mask2 = vect_gen_perm_mask_checked (vectype, sel);
> +      perm2_mask2 = vect_gen_perm_mask_checked (vectype, indices);
>
>        /* Generating permutation constant to shift all elements.
>          For vector length 8 it is {4 5 6 7 8 9 10 11}.  */
>        for (i = 0; i < nelt; i++)
>         sel[i] = nelt / 2 + i;
> -      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
> +      indices.new_vector (sel, 2, nelt);
> +      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                              "shift permutation is not supported by target\n");
>           return false;
>         }
> -      shift1_mask = vect_gen_perm_mask_checked (vectype, sel);
> +      shift1_mask = vect_gen_perm_mask_checked (vectype, indices);
>
>        /* Generating permutation constant to select vector from 2.
>          For vector length 8 it is {0 1 2 3 12 13 14 15}.  */
> @@ -5571,14 +5594,15 @@ vect_shift_permute_load_chain (vec<tree>
>         sel[i] = i;
>        for (i = nelt / 2; i < nelt; i++)
>         sel[i] = nelt + i;
> -      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
> +      indices.new_vector (sel, 2, nelt);
> +      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                              "select is not supported by target\n");
>           return false;
>         }
> -      select_mask = vect_gen_perm_mask_checked (vectype, sel);
> +      select_mask = vect_gen_perm_mask_checked (vectype, indices);
>
>        for (i = 0; i < log_length; i++)
>         {
> @@ -5634,7 +5658,8 @@ vect_shift_permute_load_chain (vec<tree>
>           sel[i] = 3 * k + (l % 3);
>           k++;
>         }
> -      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
> +      vec_perm_indices indices (sel, 2, nelt);
> +      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -5642,59 +5667,63 @@ vect_shift_permute_load_chain (vec<tree>
>                               supported by target\n");
>           return false;
>         }
> -      perm3_mask = vect_gen_perm_mask_checked (vectype, sel);
> +      perm3_mask = vect_gen_perm_mask_checked (vectype, indices);
>
>        /* Generating permutation constant to shift all elements.
>          For vector length 8 it is {6 7 8 9 10 11 12 13}.  */
>        for (i = 0; i < nelt; i++)
>         sel[i] = 2 * (nelt / 3) + (nelt % 3) + i;
> -      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
> +      indices.new_vector (sel, 2, nelt);
> +      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                              "shift permutation is not supported by target\n");
>           return false;
>         }
> -      shift1_mask = vect_gen_perm_mask_checked (vectype, sel);
> +      shift1_mask = vect_gen_perm_mask_checked (vectype, indices);
>
>        /* Generating permutation constant to shift all elements.
>          For vector length 8 it is {5 6 7 8 9 10 11 12}.  */
>        for (i = 0; i < nelt; i++)
>         sel[i] = 2 * (nelt / 3) + 1 + i;
> -      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
> +      indices.new_vector (sel, 2, nelt);
> +      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                              "shift permutation is not supported by target\n");
>           return false;
>         }
> -      shift2_mask = vect_gen_perm_mask_checked (vectype, sel);
> +      shift2_mask = vect_gen_perm_mask_checked (vectype, indices);
>
>        /* Generating permutation constant to shift all elements.
>          For vector length 8 it is {3 4 5 6 7 8 9 10}.  */
>        for (i = 0; i < nelt; i++)
>         sel[i] = (nelt / 3) + (nelt % 3) / 2 + i;
> -      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
> +      indices.new_vector (sel, 2, nelt);
> +      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                              "shift permutation is not supported by target\n");
>           return false;
>         }
> -      shift3_mask = vect_gen_perm_mask_checked (vectype, sel);
> +      shift3_mask = vect_gen_perm_mask_checked (vectype, indices);
>
>        /* Generating permutation constant to shift all elements.
>          For vector length 8 it is {5 6 7 8 9 10 11 12}.  */
>        for (i = 0; i < nelt; i++)
>         sel[i] = 2 * (nelt / 3) + (nelt % 3) / 2 + i;
> -      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
> +      indices.new_vector (sel, 2, nelt);
> +      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                              "shift permutation is not supported by target\n");
>           return false;
>         }
> -      shift4_mask = vect_gen_perm_mask_checked (vectype, sel);
> +      shift4_mask = vect_gen_perm_mask_checked (vectype, indices);
>
>        for (k = 0; k < 3; k++)
>         {
> Index: gcc/tree-vect-slp.c
> ===================================================================
> --- gcc/tree-vect-slp.c 2017-12-09 22:47:27.884318101 +0000
> +++ gcc/tree-vect-slp.c 2017-12-09 22:48:47.547825355 +0000
> @@ -894,7 +894,7 @@ vect_build_slp_tree_1 (vec_info *vinfo,
>        && TREE_CODE_CLASS (alt_stmt_code) != tcc_reference)
>      {
>        unsigned int count = TYPE_VECTOR_SUBPARTS (vectype);
> -      auto_vec_perm_indices sel (count);
> +      vec_perm_builder sel (count, count, 1);
>        for (i = 0; i < count; ++i)
>         {
>           unsigned int elt = i;
> @@ -902,7 +902,8 @@ vect_build_slp_tree_1 (vec_info *vinfo,
>             elt += count;
>           sel.quick_push (elt);
>         }
> -      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
> +      vec_perm_indices indices (sel, 2, count);
> +      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
>         {
>           for (i = 0; i < group_size; ++i)
>             if (gimple_assign_rhs_code (stmts[i]) == alt_stmt_code)
> @@ -3570,8 +3571,9 @@ vect_transform_slp_perm_load (slp_tree n
>      (int_mode_for_mode (TYPE_MODE (TREE_TYPE (vectype))).require (), 1);
>    mask_type = get_vectype_for_scalar_type (mask_element_type);
>    nunits = TYPE_VECTOR_SUBPARTS (vectype);
> -  auto_vec_perm_indices mask (nunits);
> +  vec_perm_builder mask (nunits, nunits, 1);
>    mask.quick_grow (nunits);
> +  vec_perm_indices indices;
>
>    /* Initialize the vect stmts of NODE to properly insert the generated
>       stmts later.  */
> @@ -3644,10 +3646,10 @@ vect_transform_slp_perm_load (slp_tree n
>             noop_p = false;
>           mask[index++] = mask_element;
>
> -         if (index == nunits)
> +         if (index == nunits && !noop_p)
>             {
> -             if (! noop_p
> -                 && ! can_vec_perm_const_p (mode, mask))
> +             indices.new_vector (mask, 2, nunits);
> +             if (!can_vec_perm_const_p (mode, indices))
>                 {
>                   if (dump_enabled_p ())
>                     {
> @@ -3655,16 +3657,19 @@ vect_transform_slp_perm_load (slp_tree n
>                                        vect_location,
>                                        "unsupported vect permute { ");
>                       for (i = 0; i < nunits; ++i)
> -                       dump_printf (MSG_MISSED_OPTIMIZATION, "%d ", mask[i]);
> +                       dump_printf (MSG_MISSED_OPTIMIZATION,
> +                                    HOST_WIDE_INT_PRINT_DEC " ", mask[i]);
>                       dump_printf (MSG_MISSED_OPTIMIZATION, "}\n");
>                     }
>                   gcc_assert (analyze_only);
>                   return false;
>                 }
>
> -             if (! noop_p)
> -               ++*n_perms;
> +             ++*n_perms;
> +           }
>
> +         if (index == nunits)
> +           {
>               if (!analyze_only)
>                 {
>                   tree mask_vec = NULL_TREE;
> @@ -3797,7 +3802,7 @@ vect_schedule_slp_instance (slp_tree nod
>        enum tree_code code0 = gimple_assign_rhs_code (stmt);
>        enum tree_code ocode = ERROR_MARK;
>        gimple *ostmt;
> -      auto_vec_perm_indices mask (group_size);
> +      vec_perm_builder mask (group_size, group_size, 1);
>        FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, ostmt)
>         if (gimple_assign_rhs_code (ostmt) != code0)
>           {
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2017-12-09 22:47:27.885318101 +0000
> +++ gcc/tree-vect-stmts.c       2017-12-09 22:48:47.548825399 +0000
> @@ -1717,13 +1717,14 @@ perm_mask_for_reverse (tree vectype)
>
>    nunits = TYPE_VECTOR_SUBPARTS (vectype);
>
> -  auto_vec_perm_indices sel (nunits);
> +  vec_perm_builder sel (nunits, nunits, 1);
>    for (i = 0; i < nunits; ++i)
>      sel.quick_push (nunits - 1 - i);
>
> -  if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
> +  vec_perm_indices indices (sel, 1, nunits);
> +  if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
>      return NULL_TREE;
> -  return vect_gen_perm_mask_checked (vectype, sel);
> +  return vect_gen_perm_mask_checked (vectype, indices);
>  }
>
>  /* A subroutine of get_load_store_type, with a subset of the same
> @@ -2185,27 +2186,32 @@ vectorizable_mask_load_store (gimple *st
>         {
>           modifier = WIDEN;
>
> -         auto_vec_perm_indices sel (gather_off_nunits);
> +         vec_perm_builder sel (gather_off_nunits, gather_off_nunits, 1);
>           for (i = 0; i < gather_off_nunits; ++i)
>             sel.quick_push (i | nunits);
>
> -         perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel);
> +         vec_perm_indices indices (sel, 1, gather_off_nunits);
> +         perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype,
> +                                                 indices);
>         }
>        else if (nunits == gather_off_nunits * 2)
>         {
>           modifier = NARROW;
>
> -         auto_vec_perm_indices sel (nunits);
> +         vec_perm_builder sel (nunits, nunits, 1);
>           sel.quick_grow (nunits);
>           for (i = 0; i < nunits; ++i)
>             sel[i] = i < gather_off_nunits
>                      ? i : i + nunits - gather_off_nunits;
> +         vec_perm_indices indices (sel, 2, nunits);
> +         perm_mask = vect_gen_perm_mask_checked (vectype, indices);
>
> -         perm_mask = vect_gen_perm_mask_checked (vectype, sel);
>           ncopies *= 2;
> +
>           for (i = 0; i < nunits; ++i)
>             sel[i] = i | gather_off_nunits;
> -         mask_perm_mask = vect_gen_perm_mask_checked (masktype, sel);
> +         indices.new_vector (sel, 2, gather_off_nunits);
> +         mask_perm_mask = vect_gen_perm_mask_checked (masktype, indices);
>         }
>        else
>         gcc_unreachable ();
> @@ -2498,12 +2504,13 @@ vectorizable_bswap (gimple *stmt, gimple
>    unsigned int num_bytes = TYPE_VECTOR_SUBPARTS (char_vectype);
>    unsigned word_bytes = num_bytes / nunits;
>
> -  auto_vec_perm_indices elts (num_bytes);
> +  vec_perm_builder elts (num_bytes, num_bytes, 1);
>    for (unsigned i = 0; i < nunits; ++i)
>      for (unsigned j = 0; j < word_bytes; ++j)
>        elts.quick_push ((i + 1) * word_bytes - j - 1);
>
> -  if (!can_vec_perm_const_p (TYPE_MODE (char_vectype), elts))
> +  vec_perm_indices indices (elts, 1, num_bytes);
> +  if (!can_vec_perm_const_p (TYPE_MODE (char_vectype), indices))
>      return false;
>
>    if (! vec_stmt)
> @@ -5809,22 +5816,25 @@ vectorizable_store (gimple *stmt, gimple
>         {
>           modifier = WIDEN;
>
> -         auto_vec_perm_indices sel (scatter_off_nunits);
> +         vec_perm_builder sel (scatter_off_nunits, scatter_off_nunits, 1);
>           for (i = 0; i < (unsigned int) scatter_off_nunits; ++i)
>             sel.quick_push (i | nunits);
>
> -         perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel);
> +         vec_perm_indices indices (sel, 1, scatter_off_nunits);
> +         perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype,
> +                                                 indices);
>           gcc_assert (perm_mask != NULL_TREE);
>         }
>        else if (nunits == (unsigned int) scatter_off_nunits * 2)
>         {
>           modifier = NARROW;
>
> -         auto_vec_perm_indices sel (nunits);
> +         vec_perm_builder sel (nunits, nunits, 1);
>           for (i = 0; i < (unsigned int) nunits; ++i)
>             sel.quick_push (i | scatter_off_nunits);
>
> -         perm_mask = vect_gen_perm_mask_checked (vectype, sel);
> +         vec_perm_indices indices (sel, 2, nunits);
> +         perm_mask = vect_gen_perm_mask_checked (vectype, indices);
>           gcc_assert (perm_mask != NULL_TREE);
>           ncopies *= 2;
>         }
> @@ -6845,22 +6855,25 @@ vectorizable_load (gimple *stmt, gimple_
>         {
>           modifier = WIDEN;
>
> -         auto_vec_perm_indices sel (gather_off_nunits);
> +         vec_perm_builder sel (gather_off_nunits, gather_off_nunits, 1);
>           for (i = 0; i < gather_off_nunits; ++i)
>             sel.quick_push (i | nunits);
>
> -         perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel);
> +         vec_perm_indices indices (sel, 1, gather_off_nunits);
> +         perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype,
> +                                                 indices);
>         }
>        else if (nunits == gather_off_nunits * 2)
>         {
>           modifier = NARROW;
>
> -         auto_vec_perm_indices sel (nunits);
> +         vec_perm_builder sel (nunits, nunits, 1);
>           for (i = 0; i < nunits; ++i)
>             sel.quick_push (i < gather_off_nunits
>                             ? i : i + nunits - gather_off_nunits);
>
> -         perm_mask = vect_gen_perm_mask_checked (vectype, sel);
> +         vec_perm_indices indices (sel, 2, nunits);
> +         perm_mask = vect_gen_perm_mask_checked (vectype, indices);
>           ncopies *= 2;
>         }
>        else
> Index: gcc/tree-vect-generic.c
> ===================================================================
> --- gcc/tree-vect-generic.c     2017-12-09 22:47:27.883318100 +0000
> +++ gcc/tree-vect-generic.c     2017-12-09 22:48:47.547825355 +0000
> @@ -1299,15 +1299,13 @@ lower_vec_perm (gimple_stmt_iterator *gs
>         mask = gimple_assign_rhs1 (def_stmt);
>      }
>
> -  if (TREE_CODE (mask) == VECTOR_CST)
> -    {
> -      auto_vec_perm_indices sel_int (elements);
> -
> -      for (i = 0; i < elements; ++i)
> -       sel_int.quick_push (TREE_INT_CST_LOW (VECTOR_CST_ELT (mask, i))
> -                           & (2 * elements - 1));
> +  vec_perm_builder sel_int;
>
> -      if (can_vec_perm_const_p (TYPE_MODE (vect_type), sel_int))
> +  if (TREE_CODE (mask) == VECTOR_CST
> +      && tree_to_vec_perm_builder (&sel_int, mask))
> +    {
> +      vec_perm_indices indices (sel_int, 2, elements);
> +      if (can_vec_perm_const_p (TYPE_MODE (vect_type), indices))
>         {
>           gimple_assign_set_rhs3 (stmt, mask);
>           update_stmt (stmt);
> @@ -1319,14 +1317,14 @@ lower_vec_perm (gimple_stmt_iterator *gs
>           != CODE_FOR_nothing
>           && TREE_CODE (vec1) == VECTOR_CST
>           && initializer_zerop (vec1)
> -         && sel_int[0]
> -         && sel_int[0] < elements)
> +         && indices[0]
> +         && indices[0] < elements)
>         {
>           for (i = 1; i < elements; ++i)
>             {
> -             unsigned int expected = i + sel_int[0];
> +             unsigned int expected = i + indices[0];
>               /* Indices into the second vector are all equivalent.  */
> -             if (MIN (elements, (unsigned) sel_int[i])
> +             if (MIN (elements, (unsigned) indices[i])
>                   != MIN (elements, expected))
>                 break;
>             }
> Index: gcc/tree-vect-loop.c
> ===================================================================
> --- gcc/tree-vect-loop.c        2017-12-09 22:47:27.884318101 +0000
> +++ gcc/tree-vect-loop.c        2017-12-09 22:48:47.547825355 +0000
> @@ -3714,12 +3714,11 @@ vect_estimate_min_profitable_iters (loop
>     vector elements (not bits) for a vector with NELT elements.  */
>  static void
>  calc_vec_perm_mask_for_shift (unsigned int offset, unsigned int nelt,
> -                             vec_perm_indices *sel)
> +                             vec_perm_builder *sel)
>  {
> -  unsigned int i;
> -
> -  for (i = 0; i < nelt; i++)
> -    sel->quick_push ((i + offset) & (2 * nelt - 1));
> +  sel->new_vector (nelt, nelt, 1);
> +  for (unsigned int i = 0; i < nelt; i++)
> +    sel->quick_push (i + offset);
>  }
>
>  /* Checks whether the target supports whole-vector shifts for vectors of mode
> @@ -3732,13 +3731,13 @@ have_whole_vector_shift (machine_mode mo
>      return true;
>
>    unsigned int i, nelt = GET_MODE_NUNITS (mode);
> -  auto_vec_perm_indices sel (nelt);
> -
> +  vec_perm_builder sel;
> +  vec_perm_indices indices;
>    for (i = nelt/2; i >= 1; i/=2)
>      {
> -      sel.truncate (0);
>        calc_vec_perm_mask_for_shift (i, nelt, &sel);
> -      if (!can_vec_perm_const_p (mode, sel, false))
> +      indices.new_vector (sel, 2, nelt);
> +      if (!can_vec_perm_const_p (mode, indices, false))
>         return false;
>      }
>    return true;
> @@ -5028,7 +5027,8 @@ vect_create_epilog_for_reduction (vec<tr
>        if (reduce_with_shift && !slp_reduc)
>          {
>            int nelements = vec_size_in_bits / element_bitsize;
> -          auto_vec_perm_indices sel (nelements);
> +         vec_perm_builder sel;
> +         vec_perm_indices indices;
>
>            int elt_offset;
>
> @@ -5052,9 +5052,9 @@ vect_create_epilog_for_reduction (vec<tr
>                 elt_offset >= 1;
>                 elt_offset /= 2)
>              {
> -             sel.truncate (0);
>               calc_vec_perm_mask_for_shift (elt_offset, nelements, &sel);
> -             tree mask = vect_gen_perm_mask_any (vectype, sel);
> +             indices.new_vector (sel, 2, nelements);
> +             tree mask = vect_gen_perm_mask_any (vectype, indices);
>               epilog_stmt = gimple_build_assign (vec_dest, VEC_PERM_EXPR,
>                                                  new_temp, zero_vec, mask);
>                new_name = make_ssa_name (vec_dest, epilog_stmt);
> Index: gcc/config/aarch64/aarch64.c
> ===================================================================
> --- gcc/config/aarch64/aarch64.c        2017-12-09 22:47:27.856318084 +0000
> +++ gcc/config/aarch64/aarch64.c        2017-12-09 22:48:47.535824832 +0000
> @@ -13208,7 +13208,7 @@ #define MAX_VECT_LEN 16
>  struct expand_vec_perm_d
>  {
>    rtx target, op0, op1;
> -  auto_vec_perm_indices perm;
> +  vec_perm_indices perm;
>    machine_mode vmode;
>    bool one_vector_p;
>    bool testing_p;
> @@ -13598,10 +13598,7 @@ aarch64_expand_vec_perm_const_1 (struct
>    unsigned int nelt = d->perm.length ();
>    if (d->perm[0] >= nelt)
>      {
> -      gcc_assert (nelt == (nelt & -nelt));
> -      for (unsigned int i = 0; i < nelt; ++i)
> -       d->perm[i] ^= nelt; /* Keep the same index, but in the other vector.  */
> -
> +      d->perm.rotate_inputs (1);
>        std::swap (d->op0, d->op1);
>      }
>
> @@ -13641,12 +13638,10 @@ aarch64_vectorize_vec_perm_const (machin
>
>    /* Calculate whether all elements are in one vector.  */
>    unsigned int nelt = sel.length ();
> -  d.perm.reserve (nelt);
>    for (i = which = 0; i < nelt; ++i)
>      {
>        unsigned int ei = sel[i] & (2 * nelt - 1);
>        which |= (ei < nelt ? 1 : 2);
> -      d.perm.quick_push (ei);
>      }
>
>    switch (which)
> @@ -13665,8 +13660,6 @@ aarch64_vectorize_vec_perm_const (machin
>          input vector.  */
>        /* Fall Through.  */
>      case 2:
> -      for (i = 0; i < nelt; ++i)
> -       d.perm[i] &= nelt - 1;
>        d.op0 = op1;
>        d.one_vector_p = true;
>        break;
> @@ -13677,6 +13670,8 @@ aarch64_vectorize_vec_perm_const (machin
>        break;
>      }
>
> +  d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2, nelt);
> +
>    if (!d.testing_p)
>      return aarch64_expand_vec_perm_const_1 (&d);
>
> Index: gcc/config/arm/arm.c
> ===================================================================
> --- gcc/config/arm/arm.c        2017-12-09 22:47:27.858318085 +0000
> +++ gcc/config/arm/arm.c        2017-12-09 22:48:47.538824963 +0000
> @@ -28852,7 +28852,7 @@ #define MAX_VECT_LEN 16
>  struct expand_vec_perm_d
>  {
>    rtx target, op0, op1;
> -  auto_vec_perm_indices perm;
> +  vec_perm_indices perm;
>    machine_mode vmode;
>    bool one_vector_p;
>    bool testing_p;
> @@ -29360,9 +29360,7 @@ arm_expand_vec_perm_const_1 (struct expa
>    unsigned int nelt = d->perm.length ();
>    if (d->perm[0] >= nelt)
>      {
> -      for (unsigned int i = 0; i < nelt; ++i)
> -       d->perm[i] = (d->perm[i] + nelt) & (2 * nelt - 1);
> -
> +      d->perm.rotate_inputs (1);
>        std::swap (d->op0, d->op1);
>      }
>
> @@ -29402,12 +29400,10 @@ arm_vectorize_vec_perm_const (machine_mo
>    d.testing_p = !target;
>
>    nelt = GET_MODE_NUNITS (d.vmode);
> -  d.perm.reserve (nelt);
>    for (i = which = 0; i < nelt; ++i)
>      {
>        int ei = sel[i] & (2 * nelt - 1);
>        which |= (ei < nelt ? 1 : 2);
> -      d.perm.quick_push (ei);
>      }
>
>    switch (which)
> @@ -29426,8 +29422,6 @@ arm_vectorize_vec_perm_const (machine_mo
>          input vector.  */
>        /* FALLTHRU */
>      case 2:
> -      for (i = 0; i < nelt; ++i)
> -        d.perm[i] &= nelt - 1;
>        d.op0 = op1;
>        d.one_vector_p = true;
>        break;
> @@ -29438,6 +29432,8 @@ arm_vectorize_vec_perm_const (machine_mo
>        break;
>      }
>
> +  d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2, nelt);
> +
>    if (d.testing_p)
>      return arm_expand_vec_perm_const_1 (&d);
>
> Index: gcc/config/powerpcspe/powerpcspe.c
> ===================================================================
> --- gcc/config/powerpcspe/powerpcspe.c  2017-12-09 22:47:27.871318093 +0000
> +++ gcc/config/powerpcspe/powerpcspe.c  2017-12-09 22:48:47.541825094 +0000
> @@ -38780,7 +38780,7 @@ rs6000_expand_extract_even (rtx target,
>  {
>    machine_mode vmode = GET_MODE (target);
>    unsigned i, nelt = GET_MODE_NUNITS (vmode);
> -  vec_perm_builder perm (nelt);
> +  vec_perm_builder perm (nelt, nelt, 1);
>
>    for (i = 0; i < nelt; i++)
>      perm.quick_push (i * 2);
> @@ -38795,7 +38795,7 @@ rs6000_expand_interleave (rtx target, rt
>  {
>    machine_mode vmode = GET_MODE (target);
>    unsigned i, high, nelt = GET_MODE_NUNITS (vmode);
> -  vec_perm_builder perm (nelt);
> +  vec_perm_builder perm (nelt, nelt, 1);
>
>    high = (highp ? 0 : nelt / 2);
>    for (i = 0; i < nelt / 2; i++)
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c  2017-12-09 22:47:27.874318095 +0000
> +++ gcc/config/rs6000/rs6000.c  2017-12-09 22:48:47.544825224 +0000
> @@ -36017,7 +36017,7 @@ rs6000_expand_extract_even (rtx target,
>  {
>    machine_mode vmode = GET_MODE (target);
>    unsigned i, nelt = GET_MODE_NUNITS (vmode);
> -  vec_perm_builder perm (nelt);
> +  vec_perm_builder perm (nelt, nelt, 1);
>
>    for (i = 0; i < nelt; i++)
>      perm.quick_push (i * 2);
> @@ -36032,7 +36032,7 @@ rs6000_expand_interleave (rtx target, rt
>  {
>    machine_mode vmode = GET_MODE (target);
>    unsigned i, high, nelt = GET_MODE_NUNITS (vmode);
> -  vec_perm_builder perm (nelt);
> +  vec_perm_builder perm (nelt, nelt, 1);
>
>    high = (highp ? 0 : nelt / 2);
>    for (i = 0; i < nelt / 2; i++)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [00/13] Make VEC_PERM_EXPR work for variable-length vectors
  2017-12-12 14:12 ` [00/13] Make VEC_PERM_EXPR work for variable-length vectors Richard Biener
@ 2017-12-12 15:32   ` Richard Sandiford
  2017-12-12 15:38     ` Richard Biener
  0 siblings, 1 reply; 46+ messages in thread
From: Richard Sandiford @ 2017-12-12 15:32 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Sun, Dec 10, 2017 at 12:06 AM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> This series is a replacement for:
>> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00747.html
>> based on the feedback that using VEC_PERM_EXPR would be better.
>>
>> The changes are:
>>
>> (1) Remove the restriction that the selector elements have to have the
>>     same width as the data elements, but only for constant selectors.
>>     This lets through the cases we need without also allowing
>>     potentially-expensive ops.  Adding support for the variable
>>     case can be done later if it seems useful, but it's not trivial.
>>
>> (2) Encode the integer form of constant selectors (vec_perm_indices)
>>     in the same way as the new VECTOR_CST encoding, so that it can
>>     cope with variable-length vectors.
>>
>> (3) Remove the vec_perm_const optab and reuse the target hook to emit
>>     code.  This avoids the need to create a CONST_VECTOR for the wide
>>     selectors, and hence the need to have a corresponding wide vector
>>     mode (which the target wouldn't otherwise need or support).
>
> Hmm.  Makes sense I suppose.
>
>> (4) When handling the variable vec_perm optab, check that modes can store
>>     all element indices before using them.
>>
>> (5) Unconditionally use ssizetype selector elements in permutes created
>>     by the vectoriser.
>
> Why specifically _signed_ sizetype?  That sounds like an odd choice.  But I'll
> eventually see when looking at the patch.

Sorry, should have said.  The choice doesn't matter for vector lengths
that are a power of 2, but for others, using a signed selector means that
-1 always selects the last input element, whereas for unsigned selectors,
the element selected by -1 would depend on the selector precision.  (And the
use of sizetype precision is pretty arbitrary.)
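
To make that concrete with a deliberately artificial example (7-element
vectors, which the series itself doesn't add): selectors for a 2-input
permute are reduced modulo 2*7 = 14.  A signed -1 wraps to 13 and so
always picks the last element of the concatenated inputs, whereas -1
held in an unsigned QImode selector is 255, and 255 % 14 == 3, while in
an unsigned HImode selector it is 65535, and 65535 % 14 == 1, so the
element picked would vary with the selector precision.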

> Does that mean we have a VNDImode vector unconditionally for the
> permute even though a vector matching the width of the data members
> would work?

A VECTOR_CST of N DIs, yeah.  It only becomes a VNDI at the rtl level
if we're selecting 64-bit data elements.

> What happens if the target doesn't have vec_perm_const but vec_perm to
> handle all constant permutes?

We'll try to represent the selector as VN?I, where ? matches the width
of the data, but only after checking that that doesn't change the
selector values.  So for current targets we'll use vec_perm for constant
permutes as before, but we wouldn't for 2-input V256QI interleaves.
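
(Spelling out the V256QI case purely as an illustration: a 2-input
interleave there needs selector values up to 2*256 - 1 = 511, which
don't fit in a QImode element whose maximum is 255, so the check fails
and we can't hand the selector to the variable vec_perm path as a
V256QI constant.)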

Thanks,
Richard

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [00/13] Make VEC_PERM_EXPR work for variable-length vectors
  2017-12-12 15:32   ` Richard Sandiford
@ 2017-12-12 15:38     ` Richard Biener
  2017-12-12 15:57       ` Richard Sandiford
  0 siblings, 1 reply; 46+ messages in thread
From: Richard Biener @ 2017-12-12 15:38 UTC (permalink / raw)
  To: Richard Biener, GCC Patches, Richard Sandiford

On Tue, Dec 12, 2017 at 4:32 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Sun, Dec 10, 2017 at 12:06 AM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> This series is a replacement for:
>>> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00747.html
>>> based on the feedback that using VEC_PERM_EXPR would be better.
>>>
>>> The changes are:
>>>
>>> (1) Remove the restriction that the selector elements have to have the
>>>     same width as the data elements, but only for constant selectors.
>>>     This lets through the cases we need without also allowing
>>>     potentially-expensive ops.  Adding support for the variable
>>>     case can be done later if it seems useful, but it's not trivial.
>>>
>>> (2) Encode the integer form of constant selectors (vec_perm_indices)
>>>     in the same way as the new VECTOR_CST encoding, so that it can
>>>     cope with variable-length vectors.
>>>
>>> (3) Remove the vec_perm_const optab and reuse the target hook to emit
>>>     code.  This avoids the need to create a CONST_VECTOR for the wide
>>>     selectors, and hence the need to have a corresponding wide vector
>>>     mode (which the target wouldn't otherwise need or support).
>>
>> Hmm.  Makes sense I suppose.
>>
>>> (4) When handling the variable vec_perm optab, check that modes can store
>>>     all element indices before using them.
>>>
>>> (5) Unconditionally use ssizetype selector elements in permutes created
>>>     by the vectoriser.
>>
>> Why specifically _signed_ sizetype?  That sounds like an odd choice.  But I'll
>> eventually see when looking at the patch.
>
> Sorry, should have said.  The choice doesn't matter for vector lengths
> that are a power of 2,

which are the only ones we support anyway?

> but for others, using a signed selector means that
> -1 always selects the last input element, whereas for unsigned selectors,
> the element selected by -1 would depend on the selector precision.  (And the
> use of sizetype precision is pretty arbitrary.)

hmm, so you are saying that vec_perm <v1, v2, { -1, -2, ... }> is equal
to vec_perm <v1, v2, { 2*n-1, 2*n-2, ... }>?

tree.def defines VEC_PERM_EXPR via

   N = length(mask)
   foreach i in N:
     M = mask[i] % (2*N)
     A = M < N ? v0[M] : v1[M-N]

which doesn't reflect this behavior.  Does this behavior persist for variable
vector permutations?

>
>> Does that mean we have a VNDImode vector unconditionally for the
>> permute even though a vector matching the width of the data members
>> would work?
>
> A VECTOR_CST of N DIs, yeah.  It only becomes a VNDI at the rtl level
> if we're selecting 64-bit data elements.

And on GIMPLE?  Do we have a vector type with ssizetype elements
unconditionally?

>> What happens if the target doesn't have vec_perm_const but vec_perm to
>> handle all constant permutes?
>
> We'll try to represent the selector as VN?I, where ? matches the width
> of the data, but only after checking that that doesn't change the
> selector values.  So for current targets we'll use vec_perm for constant
> permutes as before, but we wouldn't for 2-input V256QI interleaves.

I see.

Richard.

> Thanks,
> Richard

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [07/13] Make vec_perm_indices use new vector encoding
  2017-12-12 15:32   ` Richard Biener
@ 2017-12-12 15:47     ` Richard Sandiford
  2017-12-14 10:37       ` Richard Biener
  0 siblings, 1 reply; 46+ messages in thread
From: Richard Sandiford @ 2017-12-12 15:47 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Sun, Dec 10, 2017 at 12:20 AM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> This patch changes vec_perm_indices from a plain vec<> to a class
>> that stores a canonicalised permutation, using the same encoding
>> as for VECTOR_CSTs.  This means that vec_perm_indices now carries
>> information about the number of vectors being permuted (currently
>> always 1 or 2) and the number of elements in each input vector.
>
> Before I dive into  the C++ details can you explain why it needs this
> info and how it encodes it for variable-length vectors?  To interleave
> two vectors you need sth like { 0, N, 1, N+1, ... }, I'm not sure we
> can directly encode N here, can we?  extract even/odd should just
> work as { 0, 2, 4, 6, ...} without knowledge of whether we permute
> one or two vectors (the one vector case just has two times the same
> vector) or how many elements each of the vectors (or the result) has.

One of the later patches switches the element types to HOST_WIDE_INT,
so that we can represent all ssizetype values.  Then there's a poly_int
patch (not yet posted) to make that poly_int64, so that we can
represent the N even for variable-length vectors.

The class needs to know the number of elements because that affects
the canonical representation.  E.g. extract even on fixed-length
vectors with both inputs the same should be { 0, 2, 4, ..., 0, 2, 4 ... },
which we can't encode as a simple series.  Interleave low with both
inputs the same should be { 0, 0, 1, 1, ... } for both fixed-length and
variable-length vectors.
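
(As a worked fixed-length example, not taken from the patch itself:
with nelt == 4, the 2-input extract-even selector is { 0, 2, 4, 6 },
a single stepped pattern, but once both inputs are known to be the
same vector it canonicalises to { 0, 2, 0, 2 }, which needs two
interleaved patterns; interleave low goes from { 0, 4, 1, 5 } to
{ 0, 0, 1, 1 }.  Which form is canonical therefore depends on the
number of input elements.)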

Also, operator[] is supposed to return an in-range selector element even
if that element is only implicitly encoded.  So we need to know
the number of input elements there.

Separating the number of input elements into the number of inputs
and the number of elements per input isn't really necessary, but made
it easier to provide routines for testing whether all selected
elements come from a particular input, and for rotating the selector
by a whole number of inputs.
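
(For instance, reading off the aarch64/arm hunks in the patch rather
than adding anything new: rotate_inputs (1) on a 2-input selector such
as { 4, 5, 6, 7 } with nelt == 4 gives { 0, 1, 2, 3 }, i.e. the same
elements taken from the other operand.  Pairing that with
std::swap (d->op0, d->op1) leaves the permute unchanged, which is what
replaces the old per-element XOR/add loops.)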

Thanks,
Richard

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [00/13] Make VEC_PERM_EXPR work for variable-length vectors
  2017-12-12 15:38     ` Richard Biener
@ 2017-12-12 15:57       ` Richard Sandiford
  0 siblings, 0 replies; 46+ messages in thread
From: Richard Sandiford @ 2017-12-12 15:57 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Tue, Dec 12, 2017 at 4:32 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Sun, Dec 10, 2017 at 12:06 AM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> This series is a replacement for:
>>>> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00747.html
>>>> based on the feedback that using VEC_PERM_EXPR would be better.
>>>>
>>>> The changes are:
>>>>
>>>> (1) Remove the restriction that the selector elements have to have the
>>>>     same width as the data elements, but only for constant selectors.
>>>>     This lets through the cases we need without also allowing
>>>>     potentially-expensive ops.  Adding support for the variable
>>>>     case can be done later if it seems useful, but it's not trivial.
>>>>
>>>> (2) Encode the integer form of constant selectors (vec_perm_indices)
>>>>     in the same way as the new VECTOR_CST encoding, so that it can
>>>>     cope with variable-length vectors.
>>>>
>>>> (3) Remove the vec_perm_const optab and reuse the target hook to emit
>>>>     code.  This avoids the need to create a CONST_VECTOR for the wide
>>>>     selectors, and hence the need to have a corresponding wide vector
>>>>     mode (which the target wouldn't otherwise need or support).
>>>
>>> Hmm.  Makes sense I suppose.
>>>
>>>> (4) When handling the variable vec_perm optab, check that modes can store
>>>>     all element indices before using them.
>>>>
>>>> (5) Unconditionally use ssizetype selector elements in permutes created
>>>>     by the vectoriser.
>>>
>>> Why specifically _signed_ sizetype?  That sounds like an odd choice.
>>> But I'll
>>> eventually see when looking at the patch.
>>
>> Sorry, should have said.  The choice doesn't matter for vector lengths
>> that are a power of 2,
>
> which are the only ones we support anyway?

Yeah, for fixed-length at the tree level.  The variable-length support
allows (2^N)*X vectors for non-power-of-2 X though, and we support
non-power-of-2 fixed-length vectors in RTL (e.g. V12QI).

>> but for others, using a signed selector means that
>> -1 always selects the last input element, whereas for unsigned selectors,
>> the element selected by -1 would depend on the selector precision.  (And the
>> use of sizetype precision is pretty arbitrary.)
>
> hmm, so you are saying that vec_perm <v1, v2, { -1, -2, ... }> is equal
> to vec_perm <v1, v2, {2*n-1, 2*n-2, ....}?

Yeah.

> tree.def defines VEC_PERM_EXPR via
>
>    N = length(mask)
>    foreach i in N:
>      M = mask[i] % (2*N)
>      A = M < N ? v0[M] : v1[M-N]
>
> which doesn't reflect this behavior.  Does this behavior persist for variable
> vector permutations?

__builtin_shuffle is defined to wrap though:

  The elements of the input vectors are numbered in memory ordering of
  @var{vec0} beginning at 0 and @var{vec1} beginning at @var{N}.  The
  elements of @var{mask} are considered modulo @var{N} in the single-operand
  case and modulo @math{2*@var{N}} in the two-operand case.

I think we need to preserve that for VEC_PERM_EXPR, otherwise we'd need
to introduce the masking operation when lowering __builtin_shuffle to
VEC_PERM_EXPR.
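
E.g. (a made-up snippet, not from the patch; it just restates the
documented behaviour above for a power-of-2 length where signedness
doesn't matter):

  typedef int v4si __attribute__ ((vector_size (16)));

  v4si
  f (v4si a, v4si b)
  {
    v4si mask = { 9, -1, 2, 12 };   /* taken modulo 2*4 = 8 */
    /* 9 -> a[1], -1 wraps to 7 -> b[3], 2 -> a[2], 12 -> b[0].  */
    return __builtin_shuffle (a, b, mask);
  }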

>>> Does that mean we have a VNDImode vector unconditionally for the
>>> permute even though a vector matching the width of the data members
>>> would work?
>>
>> A VECTOR_CST of N DIs, yeah.  It only becomes a VNDI at the rtl level
>> if we're selecting 64-bit data elements.
>
> And on GIMPLE?  Do we have a vector type with ssizetype elements
> unconditionally?

For autovectorised permutes, yes.  For other permutes we keep the existing types.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [07/13] Make vec_perm_indices use new vector encoding
  2017-12-12 15:47     ` Richard Sandiford
@ 2017-12-14 10:37       ` Richard Biener
  2017-12-20 13:48         ` Richard Sandiford
  0 siblings, 1 reply; 46+ messages in thread
From: Richard Biener @ 2017-12-14 10:37 UTC (permalink / raw)
  To: Richard Biener, GCC Patches, Richard Sandiford

On Tue, Dec 12, 2017 at 4:46 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Sun, Dec 10, 2017 at 12:20 AM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> This patch changes vec_perm_indices from a plain vec<> to a class
>>> that stores a canonicalised permutation, using the same encoding
>>> as for VECTOR_CSTs.  This means that vec_perm_indices now carries
>>> information about the number of vectors being permuted (currently
>>> always 1 or 2) and the number of elements in each input vector.
>>
>> Before I dive into  the C++ details can you explain why it needs this
>> info and how it encodes it for variable-length vectors?  To interleave
>> two vectors you need sth like { 0, N, 1, N+1, ... }, I'm not sure we
>> can directly encode N here, can we?  extract even/odd should just
>> work as { 0, 2, 4, 6, ...} without knowledge of whether we permute
>> one or two vectors (the one vector case just has two times the same
>> vector) or how many elements each of the vectors (or the result) has.
>
> One of the later patches switches the element types to HOST_WIDE_INT,
> so that we can represent all ssizetypes.  Then there's a poly_int
> patch (not yet posted) to make that poly_int64, so that we can
> represent the N even for variable-length vectors.
>
> The class needs to know the number of elements because that affects
> the canonical representation.  E.g. extract even on fixed-length
> vectors with both inputs the same should be { 0, 2, 4, ..., 0, 2, 4 ... },
> which we can't encode as a simple series.  Interleave low with both
> inputs the same should be { 0, 0, 1, 1, ... } for both fixed-length and
> variable-length vectors.

Huh?  Extract even is { 0, 2, 4, 6, 8, ... }; the indexes in the selection
vector reference the concat'ed input vectors.  So yes, for two identical
vectors that's effectively { 0, 2, 4, ..., 0, 2, 4, ... }, but I don't see
why that should be the canonical form?

> Also, operator[] is supposed to return an in-range selector even if
> the selector element is only implicitly encoded.  So we need to know
> the number of input elements there.
>
> Separating the number of input elements into the number of inputs
> and the number of elements per input isn't really necessary, but made
> it easier to provide routines for testing whether all selected
> elements come from a particular input, and for rotating the selector
> by a whole number of inputs.
>
> Thanks,
> Richard

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [08/13] Add a vec_perm_indices_to_tree helper function
  2017-12-09 23:20 ` [08/13] Add a vec_perm_indices_to_tree helper function Richard Sandiford
@ 2017-12-18 13:34   ` Richard Biener
  0 siblings, 0 replies; 46+ messages in thread
From: Richard Biener @ 2017-12-18 13:34 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Sun, Dec 10, 2017 at 12:20 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch adds a function for creating a VECTOR_CST from a
> vec_perm_indices, operating directly on the encoding.

Ok.

>
> 2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
>         * vec-perm-indices.h (vec_perm_indices_to_tree): Declare.
>         * vec-perm-indices.c (vec_perm_indices_to_tree): New function.
>         * tree-ssa-forwprop.c (simplify_vector_constructor): Use it.
>         * tree-vect-slp.c (vect_transform_slp_perm_load): Likewise.
>         * tree-vect-stmts.c (vectorizable_bswap): Likewise.
>         (vect_gen_perm_mask_any): Likewise.
>
> Index: gcc/vec-perm-indices.h
> ===================================================================
> --- gcc/vec-perm-indices.h      2017-12-09 22:48:47.548825399 +0000
> +++ gcc/vec-perm-indices.h      2017-12-09 22:48:50.361942571 +0000
> @@ -73,6 +73,7 @@ typedef int_vector_builder<HOST_WIDE_INT
>  };
>
>  bool tree_to_vec_perm_builder (vec_perm_builder *, tree);
> +tree vec_perm_indices_to_tree (tree, const vec_perm_indices &);
>  rtx vec_perm_indices_to_rtx (machine_mode, const vec_perm_indices &);
>
>  inline
> Index: gcc/vec-perm-indices.c
> ===================================================================
> --- gcc/vec-perm-indices.c      2017-12-09 22:48:47.548825399 +0000
> +++ gcc/vec-perm-indices.c      2017-12-09 22:48:50.360942531 +0000
> @@ -152,6 +152,20 @@ tree_to_vec_perm_builder (vec_perm_build
>    return true;
>  }
>
> +/* Return a VECTOR_CST of type TYPE for the permutation vector in INDICES.  */
> +
> +tree
> +vec_perm_indices_to_tree (tree type, const vec_perm_indices &indices)
> +{
> +  gcc_assert (TYPE_VECTOR_SUBPARTS (type) == indices.length ());
> +  tree_vector_builder sel (type, indices.encoding ().npatterns (),
> +                          indices.encoding ().nelts_per_pattern ());
> +  unsigned int encoded_nelts = sel.encoded_nelts ();
> +  for (unsigned int i = 0; i < encoded_nelts; i++)
> +    sel.quick_push (build_int_cst (TREE_TYPE (type), indices[i]));
> +  return sel.build ();
> +}
> +
>  /* Return a CONST_VECTOR of mode MODE that contains the elements of
>     INDICES.  */
>
> Index: gcc/tree-ssa-forwprop.c
> ===================================================================
> --- gcc/tree-ssa-forwprop.c     2017-12-09 22:48:47.546825312 +0000
> +++ gcc/tree-ssa-forwprop.c     2017-12-09 22:48:50.359942492 +0000
> @@ -2119,10 +2119,7 @@ simplify_vector_constructor (gimple_stmt
>           || GET_MODE_SIZE (TYPE_MODE (mask_type))
>              != GET_MODE_SIZE (TYPE_MODE (type)))
>         return false;
> -      tree_vector_builder mask_elts (mask_type, nelts, 1);
> -      for (i = 0; i < nelts; i++)
> -       mask_elts.quick_push (build_int_cst (TREE_TYPE (mask_type), sel[i]));
> -      op2 = mask_elts.build ();
> +      op2 = vec_perm_indices_to_tree (mask_type, indices);
>        if (conv_code == ERROR_MARK)
>         gimple_assign_set_rhs_with_ops (gsi, VEC_PERM_EXPR, orig, orig, op2);
>        else
> Index: gcc/tree-vect-slp.c
> ===================================================================
> --- gcc/tree-vect-slp.c 2017-12-09 22:48:47.547825355 +0000
> +++ gcc/tree-vect-slp.c 2017-12-09 22:48:50.359942492 +0000
> @@ -3675,13 +3675,7 @@ vect_transform_slp_perm_load (slp_tree n
>                   tree mask_vec = NULL_TREE;
>
>                   if (! noop_p)
> -                   {
> -                     tree_vector_builder mask_elts (mask_type, nunits, 1);
> -                     for (int l = 0; l < nunits; ++l)
> -                       mask_elts.quick_push (build_int_cst (mask_element_type,
> -                                                            mask[l]));
> -                     mask_vec = mask_elts.build ();
> -                   }
> +                   mask_vec = vec_perm_indices_to_tree (mask_type, indices);
>
>                   if (second_vec_index == -1)
>                     second_vec_index = first_vec_index;
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2017-12-09 22:48:47.548825399 +0000
> +++ gcc/tree-vect-stmts.c       2017-12-09 22:48:50.360942531 +0000
> @@ -2529,10 +2529,7 @@ vectorizable_bswap (gimple *stmt, gimple
>        return true;
>      }
>
> -  tree_vector_builder telts (char_vectype, num_bytes, 1);
> -  for (unsigned i = 0; i < num_bytes; ++i)
> -    telts.quick_push (build_int_cst (char_type_node, elts[i]));
> -  tree bswap_vconst = telts.build ();
> +  tree bswap_vconst = vec_perm_indices_to_tree (char_vectype, indices);
>
>    /* Transform.  */
>    vec<tree> vec_oprnds = vNULL;
> @@ -6521,17 +6518,10 @@ vect_gen_perm_mask_any (tree vectype, co
>  {
>    tree mask_elt_type, mask_type;
>
> -  unsigned int nunits = sel.length ();
> -  gcc_checking_assert (nunits == TYPE_VECTOR_SUBPARTS (vectype));
> -
>    mask_elt_type = lang_hooks.types.type_for_mode
>      (int_mode_for_mode (TYPE_MODE (TREE_TYPE (vectype))).require (), 1);
>    mask_type = get_vectype_for_scalar_type (mask_elt_type);
> -
> -  tree_vector_builder mask_elts (mask_type, nunits, 1);
> -  for (unsigned int i = 0; i < nunits; ++i)
> -    mask_elts.quick_push (build_int_cst (mask_elt_type, sel[i]));
> -  return mask_elts.build ();
> +  return vec_perm_indices_to_tree (mask_type, sel);
>  }
>
>  /* Checked version of vect_gen_perm_mask_any.  Asserts can_vec_perm_const_p,

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [01/13] Add a qimode_for_vec_perm helper function
  2017-12-09 23:08 ` [01/13] Add a qimode_for_vec_perm helper function Richard Sandiford
@ 2017-12-18 13:34   ` Richard Biener
  0 siblings, 0 replies; 46+ messages in thread
From: Richard Biener @ 2017-12-18 13:34 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Sun, Dec 10, 2017 at 12:08 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> The vec_perm code falls back to doing byte-level permutes if
> element-level permutes aren't supported.  There were two copies
> of the code to calculate the mode, and later patches add another,
> so this patch splits it out into a helper function.
>

Ok.

> 2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
>         * optabs-query.h (qimode_for_vec_perm): Declare.
>         * optabs-query.c (can_vec_perm_p): Split out qimode search to...
>         (qimode_for_vec_perm): ...this new function.
>         * optabs.c (expand_vec_perm): Use qimode_for_vec_perm.
>
> Index: gcc/optabs-query.h
> ===================================================================
> --- gcc/optabs-query.h  2017-12-09 22:47:12.476364764 +0000
> +++ gcc/optabs-query.h  2017-12-09 22:47:14.730310076 +0000
> @@ -174,6 +174,7 @@ enum insn_code can_extend_p (machine_mod
>  enum insn_code can_float_p (machine_mode, machine_mode, int);
>  enum insn_code can_fix_p (machine_mode, machine_mode, int, bool *);
>  bool can_conditionally_move_p (machine_mode mode);
> +opt_machine_mode qimode_for_vec_perm (machine_mode);
>  bool can_vec_perm_p (machine_mode, bool, vec_perm_indices *);
>  /* Find a widening optab even if it doesn't widen as much as we want.  */
>  #define find_widening_optab_handler(A, B, C) \
> Index: gcc/optabs-query.c
> ===================================================================
> --- gcc/optabs-query.c  2017-12-09 22:47:12.476364764 +0000
> +++ gcc/optabs-query.c  2017-12-09 22:47:14.729310075 +0000
> @@ -345,6 +345,22 @@ can_conditionally_move_p (machine_mode m
>    return direct_optab_handler (movcc_optab, mode) != CODE_FOR_nothing;
>  }
>
> +/* If a target doesn't implement a permute on a vector with multibyte
> +   elements, we can try to do the same permute on byte elements.
> +   If this makes sense for vector mode MODE then return the appropriate
> +   byte vector mode.  */
> +
> +opt_machine_mode
> +qimode_for_vec_perm (machine_mode mode)
> +{
> +  machine_mode qimode;
> +  if (GET_MODE_INNER (mode) != QImode
> +      && mode_for_vector (QImode, GET_MODE_SIZE (mode)).exists (&qimode)
> +      && VECTOR_MODE_P (qimode))
> +    return qimode;
> +  return opt_machine_mode ();
> +}
> +
>  /* Return true if VEC_PERM_EXPR of arbitrary input vectors can be
>     expanded using SIMD extensions of the CPU.  SEL may be NULL, which
>     stands for an unknown constant.  Note that additional permutations
> @@ -375,9 +391,7 @@ can_vec_perm_p (machine_mode mode, bool
>      return true;
>
>    /* We allow fallback to a QI vector mode, and adjust the mask.  */
> -  if (GET_MODE_INNER (mode) == QImode
> -      || !mode_for_vector (QImode, GET_MODE_SIZE (mode)).exists (&qimode)
> -      || !VECTOR_MODE_P (qimode))
> +  if (!qimode_for_vec_perm (mode).exists (&qimode))
>      return false;
>
>    /* ??? For completeness, we ought to check the QImode version of
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c        2017-12-09 22:47:12.476364764 +0000
> +++ gcc/optabs.c        2017-12-09 22:47:14.731310077 +0000
> @@ -5452,9 +5452,7 @@ expand_vec_perm (machine_mode mode, rtx
>
>    /* Set QIMODE to a different vector mode with byte elements.
>       If no such mode, or if MODE already has byte elements, use VOIDmode.  */
> -  if (GET_MODE_INNER (mode) == QImode
> -      || !mode_for_vector (QImode, w).exists (&qimode)
> -      || !VECTOR_MODE_P (qimode))
> +  if (!qimode_for_vec_perm (mode).exists (&qimode))
>      qimode = VOIDmode;
>
>    /* If the input is a constant, expand it specially.  */

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [09/13] Use explicit encodings for simple permutes
  2017-12-09 23:21 ` [09/13] Use explicit encodings for simple permutes Richard Sandiford
@ 2017-12-19 20:37   ` Richard Sandiford
  2018-01-02 13:07   ` Richard Biener
  1 sibling, 0 replies; 46+ messages in thread
From: Richard Sandiford @ 2017-12-19 20:37 UTC (permalink / raw)
  To: gcc-patches

Ping

Richard Sandiford <richard.sandiford@linaro.org> writes:
> This patch makes users of vec_perm_builders use the compressed encoding
> where possible.  This means that they work with variable-length vectors.
>
>
> 2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
> 	* optabs.c (expand_vec_perm_var): Use an explicit encoding for
> 	the broadcast of the low byte.
> 	(expand_mult_highpart): Use an explicit encoding for the permutes.
> 	* optabs-query.c (can_mult_highpart_p): Likewise.
> 	* tree-vect-loop.c (calc_vec_perm_mask_for_shift): Likewise.
> 	* tree-vect-stmts.c (perm_mask_for_reverse): Likewise.
> 	(vectorizable_bswap): Likewise.
> 	* tree-vect-data-refs.c (vect_grouped_store_supported): Use an
> 	explicit encoding for the power-of-2 permutes.
> 	(vect_permute_store_chain): Likewise.
> 	(vect_grouped_load_supported): Likewise.
> 	(vect_permute_load_chain): Likewise.
>
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c	2017-12-09 22:48:47.546825312 +0000
> +++ gcc/optabs.c	2017-12-09 22:48:52.266015836 +0000
> @@ -5625,15 +5625,14 @@ expand_vec_perm_var (machine_mode mode,
>  			       NULL, 0, OPTAB_DIRECT);
>    gcc_assert (sel != NULL);
>  
> -  /* Broadcast the low byte each element into each of its bytes.  */
> -  vec_perm_builder const_sel (w, w, 1);
> -  for (i = 0; i < w; ++i)
> -    {
> -      int this_e = i / u * u;
> -      if (BYTES_BIG_ENDIAN)
> -	this_e += u - 1;
> -      const_sel.quick_push (this_e);
> -    }
> +  /* Broadcast the low byte each element into each of its bytes.
> +     The encoding has U interleaved stepped patterns, one for each
> +     byte of an element.  */
> +  vec_perm_builder const_sel (w, u, 3);
> +  unsigned int low_byte_in_u = BYTES_BIG_ENDIAN ? u - 1 : 0;
> +  for (i = 0; i < 3; ++i)
> +    for (unsigned int j = 0; j < u; ++j)
> +      const_sel.quick_push (i * u + low_byte_in_u);
>    sel = gen_lowpart (qimode, sel);
>    sel = expand_vec_perm_const (qimode, sel, sel, const_sel, qimode, NULL);
>    gcc_assert (sel != NULL);
> @@ -5853,16 +5852,20 @@ expand_mult_highpart (machine_mode mode,
>    expand_insn (optab_handler (tab2, mode), 3, eops);
>    m2 = gen_lowpart (mode, eops[0].value);
>  
> -  vec_perm_builder sel (nunits, nunits, 1);
> +  vec_perm_builder sel;
>    if (method == 2)
>      {
> -      for (i = 0; i < nunits; ++i)
> +      /* The encoding has 2 interleaved stepped patterns.  */
> +      sel.new_vector (nunits, 2, 3);
> +      for (i = 0; i < 6; ++i)
>  	sel.quick_push (!BYTES_BIG_ENDIAN + (i & ~1)
>  			+ ((i & 1) ? nunits : 0));
>      }
>    else
>      {
> -      for (i = 0; i < nunits; ++i)
> +      /* The encoding has a single interleaved stepped pattern.  */
> +      sel.new_vector (nunits, 1, 3);
> +      for (i = 0; i < 3; ++i)
>  	sel.quick_push (2 * i + (BYTES_BIG_ENDIAN ? 0 : 1));
>      }
>  
> Index: gcc/optabs-query.c
> ===================================================================
> --- gcc/optabs-query.c	2017-12-09 22:48:47.545825268 +0000
> +++ gcc/optabs-query.c	2017-12-09 22:48:52.265015799 +0000
> @@ -501,8 +501,9 @@ can_mult_highpart_p (machine_mode mode,
>        op = uns_p ? vec_widen_umult_odd_optab : vec_widen_smult_odd_optab;
>        if (optab_handler (op, mode) != CODE_FOR_nothing)
>  	{
> -	  vec_perm_builder sel (nunits, nunits, 1);
> -	  for (i = 0; i < nunits; ++i)
> +	  /* The encoding has 2 interleaved stepped patterns.  */
> +	  vec_perm_builder sel (nunits, 2, 3);
> +	  for (i = 0; i < 6; ++i)
>  	    sel.quick_push (!BYTES_BIG_ENDIAN
>  			    + (i & ~1)
>  			    + ((i & 1) ? nunits : 0));
> @@ -518,8 +519,9 @@ can_mult_highpart_p (machine_mode mode,
>        op = uns_p ? vec_widen_umult_lo_optab : vec_widen_smult_lo_optab;
>        if (optab_handler (op, mode) != CODE_FOR_nothing)
>  	{
> -	  vec_perm_builder sel (nunits, nunits, 1);
> -	  for (i = 0; i < nunits; ++i)
> +	  /* The encoding has a single stepped pattern.  */
> +	  vec_perm_builder sel (nunits, 1, 3);
> +	  for (int i = 0; i < 3; ++i)
>  	    sel.quick_push (2 * i + (BYTES_BIG_ENDIAN ? 0 : 1));
>  	  vec_perm_indices indices (sel, 2, nunits);
>  	  if (can_vec_perm_const_p (mode, indices))
> Index: gcc/tree-vect-loop.c
> ===================================================================
> --- gcc/tree-vect-loop.c	2017-12-09 22:48:47.547825355 +0000
> +++ gcc/tree-vect-loop.c	2017-12-09 22:48:52.267015873 +0000
> @@ -3716,8 +3716,10 @@ vect_estimate_min_profitable_iters (loop
>  calc_vec_perm_mask_for_shift (unsigned int offset, unsigned int nelt,
>  			      vec_perm_builder *sel)
>  {
> -  sel->new_vector (nelt, nelt, 1);
> -  for (unsigned int i = 0; i < nelt; i++)
> +  /* The encoding is a single stepped pattern.  Any wrap-around is handled
> +     by vec_perm_indices.  */
> +  sel->new_vector (nelt, 1, 3);
> +  for (unsigned int i = 0; i < 3; i++)
>      sel->quick_push (i + offset);
>  }
>  
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c	2017-12-09 22:48:50.360942531 +0000
> +++ gcc/tree-vect-stmts.c	2017-12-09 22:48:52.268015910 +0000
> @@ -1717,8 +1717,9 @@ perm_mask_for_reverse (tree vectype)
>  
>    nunits = TYPE_VECTOR_SUBPARTS (vectype);
>  
> -  vec_perm_builder sel (nunits, nunits, 1);
> -  for (i = 0; i < nunits; ++i)
> +  /* The encoding has a single stepped pattern.  */
> +  vec_perm_builder sel (nunits, 1, 3);
> +  for (i = 0; i < 3; ++i)
>      sel.quick_push (nunits - 1 - i);
>  
>    vec_perm_indices indices (sel, 1, nunits);
> @@ -2504,8 +2505,9 @@ vectorizable_bswap (gimple *stmt, gimple
>    unsigned int num_bytes = TYPE_VECTOR_SUBPARTS (char_vectype);
>    unsigned word_bytes = num_bytes / nunits;
>  
> -  vec_perm_builder elts (num_bytes, num_bytes, 1);
> -  for (unsigned i = 0; i < nunits; ++i)
> +  /* The encoding uses one stepped pattern for each byte in the word.  */
> +  vec_perm_builder elts (num_bytes, word_bytes, 3);
> +  for (unsigned i = 0; i < 3; ++i)
>      for (unsigned j = 0; j < word_bytes; ++j)
>        elts.quick_push ((i + 1) * word_bytes - j - 1);
>  
> Index: gcc/tree-vect-data-refs.c
> ===================================================================
> --- gcc/tree-vect-data-refs.c	2017-12-09 22:48:47.546825312 +0000
> +++ gcc/tree-vect-data-refs.c	2017-12-09 22:48:52.267015873 +0000
> @@ -4566,14 +4566,13 @@ vect_grouped_store_supported (tree vecty
>    if (VECTOR_MODE_P (mode))
>      {
>        unsigned int i, nelt = GET_MODE_NUNITS (mode);
> -      vec_perm_builder sel (nelt, nelt, 1);
> -      sel.quick_grow (nelt);
> -
>        if (count == 3)
>  	{
>  	  unsigned int j0 = 0, j1 = 0, j2 = 0;
>  	  unsigned int i, j;
>  
> +	  vec_perm_builder sel (nelt, nelt, 1);
> +	  sel.quick_grow (nelt);
>  	  vec_perm_indices indices;
>  	  for (j = 0; j < 3; j++)
>  	    {
> @@ -4623,7 +4622,10 @@ vect_grouped_store_supported (tree vecty
>  	  /* If length is not equal to 3 then only power of 2 is supported.  */
>  	  gcc_assert (pow2p_hwi (count));
>  
> -	  for (i = 0; i < nelt / 2; i++)
> +	  /* The encoding has 2 interleaved stepped patterns.  */
> +	  vec_perm_builder sel (nelt, 2, 3);
> +	  sel.quick_grow (6);
> +	  for (i = 0; i < 3; i++)
>  	    {
>  	      sel[i * 2] = i;
>  	      sel[i * 2 + 1] = i + nelt;
> @@ -4631,7 +4633,7 @@ vect_grouped_store_supported (tree vecty
>  	  vec_perm_indices indices (sel, 2, nelt);
>  	  if (can_vec_perm_const_p (mode, indices))
>  	    {
> -	      for (i = 0; i < nelt; i++)
> +	      for (i = 0; i < 6; i++)
>  		sel[i] += nelt / 2;
>  	      indices.new_vector (sel, 2, nelt);
>  	      if (can_vec_perm_const_p (mode, indices))
> @@ -4736,9 +4738,6 @@ vect_permute_store_chain (vec<tree> dr_c
>    unsigned int i, n, log_length = exact_log2 (length);
>    unsigned int j, nelt = TYPE_VECTOR_SUBPARTS (vectype);
>  
> -  vec_perm_builder sel (nelt, nelt, 1);
> -  sel.quick_grow (nelt);
> -
>    result_chain->quick_grow (length);
>    memcpy (result_chain->address (), dr_chain.address (),
>  	  length * sizeof (tree));
> @@ -4747,6 +4746,8 @@ vect_permute_store_chain (vec<tree> dr_c
>      {
>        unsigned int j0 = 0, j1 = 0, j2 = 0;
>  
> +      vec_perm_builder sel (nelt, nelt, 1);
> +      sel.quick_grow (nelt);
>        vec_perm_indices indices;
>        for (j = 0; j < 3; j++)
>          {
> @@ -4808,7 +4809,10 @@ vect_permute_store_chain (vec<tree> dr_c
>        /* If length is not equal to 3 then only power of 2 is supported.  */
>        gcc_assert (pow2p_hwi (length));
>  
> -      for (i = 0, n = nelt / 2; i < n; i++)
> +      /* The encoding has 2 interleaved stepped patterns.  */
> +      vec_perm_builder sel (nelt, 2, 3);
> +      sel.quick_grow (6);
> +      for (i = 0; i < 3; i++)
>  	{
>  	  sel[i * 2] = i;
>  	  sel[i * 2 + 1] = i + nelt;
> @@ -4816,7 +4820,7 @@ vect_permute_store_chain (vec<tree> dr_c
>  	vec_perm_indices indices (sel, 2, nelt);
>  	perm_mask_high = vect_gen_perm_mask_checked (vectype, indices);
>  
> -	for (i = 0; i < nelt; i++)
> +	for (i = 0; i < 6; i++)
>  	  sel[i] += nelt / 2;
>  	indices.new_vector (sel, 2, nelt);
>  	perm_mask_low = vect_gen_perm_mask_checked (vectype, indices);
> @@ -5164,11 +5168,11 @@ vect_grouped_load_supported (tree vectyp
>    if (VECTOR_MODE_P (mode))
>      {
>        unsigned int i, j, nelt = GET_MODE_NUNITS (mode);
> -      vec_perm_builder sel (nelt, nelt, 1);
> -      sel.quick_grow (nelt);
>  
>        if (count == 3)
>  	{
> +	  vec_perm_builder sel (nelt, nelt, 1);
> +	  sel.quick_grow (nelt);
>  	  vec_perm_indices indices;
>  	  unsigned int k;
>  	  for (k = 0; k < 3; k++)
> @@ -5209,12 +5213,15 @@ vect_grouped_load_supported (tree vectyp
>  	  /* If length is not equal to 3 then only power of 2 is supported.  */
>  	  gcc_assert (pow2p_hwi (count));
>  
> -	  for (i = 0; i < nelt; i++)
> +	  /* The encoding has a single stepped pattern.  */
> +	  vec_perm_builder sel (nelt, 1, 3);
> +	  sel.quick_grow (3);
> +	  for (i = 0; i < 3; i++)
>  	    sel[i] = i * 2;
>  	  vec_perm_indices indices (sel, 2, nelt);
>  	  if (can_vec_perm_const_p (mode, indices))
>  	    {
> -	      for (i = 0; i < nelt; i++)
> +	      for (i = 0; i < 3; i++)
>  		sel[i] = i * 2 + 1;
>  	      indices.new_vector (sel, 2, nelt);
>  	      if (can_vec_perm_const_p (mode, indices))
> @@ -5332,9 +5339,6 @@ vect_permute_load_chain (vec<tree> dr_ch
>    unsigned int i, j, log_length = exact_log2 (length);
>    unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype);
>  
> -  vec_perm_builder sel (nelt, nelt, 1);
> -  sel.quick_grow (nelt);
> -
>    result_chain->quick_grow (length);
>    memcpy (result_chain->address (), dr_chain.address (),
>  	  length * sizeof (tree));
> @@ -5343,6 +5347,8 @@ vect_permute_load_chain (vec<tree> dr_ch
>      {
>        unsigned int k;
>  
> +      vec_perm_builder sel (nelt, nelt, 1);
> +      sel.quick_grow (nelt);
>        vec_perm_indices indices;
>        for (k = 0; k < 3; k++)
>  	{
> @@ -5390,12 +5396,15 @@ vect_permute_load_chain (vec<tree> dr_ch
>        /* If length is not equal to 3 then only power of 2 is supported.  */
>        gcc_assert (pow2p_hwi (length));
>  
> -      for (i = 0; i < nelt; ++i)
> +      /* The encoding has a single stepped pattern.  */
> +      vec_perm_builder sel (nelt, 1, 3);
> +      sel.quick_grow (3);
> +      for (i = 0; i < 3; ++i)
>  	sel[i] = i * 2;
>        vec_perm_indices indices (sel, 2, nelt);
>        perm_mask_even = vect_gen_perm_mask_checked (vectype, indices);
>  
> -      for (i = 0; i < nelt; ++i)
> +      for (i = 0; i < 3; ++i)
>  	sel[i] = i * 2 + 1;
>        indices.new_vector (sel, 2, nelt);
>        perm_mask_odd = vect_gen_perm_mask_checked (vectype, indices);
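
For reference (this is not part of the patch): the "stepped pattern" comments above refer to the (npatterns, nelts_per_pattern) encoding that vec_perm_builder shares with the new VECTOR_CST encoding.  The standalone sketch below, which uses plain std::vector and an ad-hoc expand_selector helper rather than the GCC builder classes, shows how such an encoding expands into a full selector, assuming each pattern continues as a linear series:

/* Standalone sketch, not GCC code: expand an (npatterns, nelts_per_pattern)
   encoding into an explicit selector of FULL_NELTS elements, assuming each
   pattern continues with the step given by its last two encoded elements.  */
#include <cstdio>
#include <vector>

static std::vector<unsigned int>
expand_selector (const std::vector<unsigned int> &encoded,
                 unsigned int full_nelts, unsigned int npatterns,
                 unsigned int nelts_per_pattern)
{
  std::vector<unsigned int> sel (full_nelts);
  for (unsigned int i = 0; i < full_nelts; ++i)
    {
      unsigned int pattern = i % npatterns;
      unsigned int index = i / npatterns;
      if (index < nelts_per_pattern)
        sel[i] = encoded[index * npatterns + pattern];
      else
        {
          /* Continue the series defined by the last two encoded
             elements of this pattern.  */
          unsigned int a = encoded[(nelts_per_pattern - 2) * npatterns + pattern];
          unsigned int b = encoded[(nelts_per_pattern - 1) * npatterns + pattern];
          sel[i] = b + (index - (nelts_per_pattern - 1)) * (b - a);
        }
    }
  return sel;
}

int
main ()
{
  unsigned int nelt = 8;
  /* The "2 interleaved stepped patterns" case: only 6 elements are pushed.  */
  std::vector<unsigned int> encoded = { 0, nelt, 1, nelt + 1, 2, nelt + 2 };
  for (unsigned int e : expand_selector (encoded, nelt, 2, 3))
    printf ("%u ", e);   /* Prints: 0 8 1 9 2 10 3 11  */
  printf ("\n");
  return 0;
}

So for nelt == 8 the six encoded elements stand for the full interleave selector { 0, 8, 1, 9, 2, 10, 3, 11 }, which is why pushing three elements per pattern is enough regardless of the actual vector length.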

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [13/13] [AArch64] Use vec_perm_indices helper routines
  2017-12-09 23:27 ` [13/13] [AArch64] Use vec_perm_indices helper routines Richard Sandiford
@ 2017-12-19 20:37   ` Richard Sandiford
  2018-01-04 11:28     ` Richard Sandiford
  0 siblings, 1 reply; 46+ messages in thread
From: Richard Sandiford @ 2017-12-19 20:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft

Ping

Richard Sandiford <richard.sandiford@linaro.org> writes:
> This patch makes the AArch64 vec_perm_const code use the new
> vec_perm_indices routines, instead of checking each element individually.
> This means that they extend naturally to variable-length vectors.
>
> Also, aarch64_evpc_dup was the only function that generated rtl when
> testing_p is true, and that looked accidental.  The patch adds the
> missing check and then replaces the gen_rtx_REG/start_sequence/
> end_sequence stuff with an assert that no rtl is generated.
>
> Tested on aarch64-linux-gnu.  Also tested by making sure that there
> were no assembly output differences for aarch64-linux-gnu or
> aarch64_be-linux-gnu.  OK to install?
>
> Richard
>
>
> 2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
> 	* config/aarch64/aarch64.c (aarch64_evpc_trn): Use d.perm.series_p
> 	instead of checking each element individually.
> 	(aarch64_evpc_uzp): Likewise.
> 	(aarch64_evpc_zip): Likewise.
> 	(aarch64_evpc_ext): Likewise.
> 	(aarch64_evpc_rev): Likewise.
> 	(aarch64_evpc_dup): Test the encoding for a single duplicated element,
> 	instead of checking each element individually.  Return true without
> 	generating rtl if d->testing_p.
> 	(aarch64_vectorize_vec_perm_const): Use all_from_input_p to test
> 	whether all selected elements come from the same input, instead of
> 	checking each element individually.  Remove calls to gen_rtx_REG,
> 	start_sequence and end_sequence and instead assert that no rtl is
> 	generated.
>
> Index: gcc/config/aarch64/aarch64.c
> ===================================================================
> --- gcc/config/aarch64/aarch64.c	2017-12-09 22:48:47.535824832 +0000
> +++ gcc/config/aarch64/aarch64.c	2017-12-09 22:49:00.139270410 +0000
> @@ -13295,7 +13295,7 @@ aarch64_expand_vec_perm (rtx target, rtx
>  static bool
>  aarch64_evpc_trn (struct expand_vec_perm_d *d)
>  {
> -  unsigned int i, odd, mask, nelt = d->perm.length ();
> +  unsigned int odd, nelt = d->perm.length ();
>    rtx out, in0, in1, x;
>    machine_mode vmode = d->vmode;
>  
> @@ -13304,21 +13304,11 @@ aarch64_evpc_trn (struct expand_vec_perm
>  
>    /* Note that these are little-endian tests.
>       We correct for big-endian later.  */
> -  if (d->perm[0] == 0)
> -    odd = 0;
> -  else if (d->perm[0] == 1)
> -    odd = 1;
> -  else
> +  odd = d->perm[0];
> +  if ((odd != 0 && odd != 1)
> +      || !d->perm.series_p (0, 2, odd, 2)
> +      || !d->perm.series_p (1, 2, nelt + odd, 2))
>      return false;
> -  mask = (d->one_vector_p ? nelt - 1 : 2 * nelt - 1);
> -
> -  for (i = 0; i < nelt; i += 2)
> -    {
> -      if (d->perm[i] != i + odd)
> -	return false;
> -      if (d->perm[i + 1] != ((i + nelt + odd) & mask))
> -	return false;
> -    }
>  
>    /* Success!  */
>    if (d->testing_p)
> @@ -13342,7 +13332,7 @@ aarch64_evpc_trn (struct expand_vec_perm
>  static bool
>  aarch64_evpc_uzp (struct expand_vec_perm_d *d)
>  {
> -  unsigned int i, odd, mask, nelt = d->perm.length ();
> +  unsigned int odd;
>    rtx out, in0, in1, x;
>    machine_mode vmode = d->vmode;
>  
> @@ -13351,20 +13341,10 @@ aarch64_evpc_uzp (struct expand_vec_perm
>  
>    /* Note that these are little-endian tests.
>       We correct for big-endian later.  */
> -  if (d->perm[0] == 0)
> -    odd = 0;
> -  else if (d->perm[0] == 1)
> -    odd = 1;
> -  else
> +  odd = d->perm[0];
> +  if ((odd != 0 && odd != 1)
> +      || !d->perm.series_p (0, 1, odd, 2))
>      return false;
> -  mask = (d->one_vector_p ? nelt - 1 : 2 * nelt - 1);
> -
> -  for (i = 0; i < nelt; i++)
> -    {
> -      unsigned elt = (i * 2 + odd) & mask;
> -      if (d->perm[i] != elt)
> -	return false;
> -    }
>  
>    /* Success!  */
>    if (d->testing_p)
> @@ -13388,7 +13368,7 @@ aarch64_evpc_uzp (struct expand_vec_perm
>  static bool
>  aarch64_evpc_zip (struct expand_vec_perm_d *d)
>  {
> -  unsigned int i, high, mask, nelt = d->perm.length ();
> +  unsigned int high, nelt = d->perm.length ();
>    rtx out, in0, in1, x;
>    machine_mode vmode = d->vmode;
>  
> @@ -13397,25 +13377,11 @@ aarch64_evpc_zip (struct expand_vec_perm
>  
>    /* Note that these are little-endian tests.
>       We correct for big-endian later.  */
> -  high = nelt / 2;
> -  if (d->perm[0] == high)
> -    /* Do Nothing.  */
> -    ;
> -  else if (d->perm[0] == 0)
> -    high = 0;
> -  else
> +  high = d->perm[0];
> +  if ((high != 0 && high * 2 != nelt)
> +      || !d->perm.series_p (0, 2, high, 1)
> +      || !d->perm.series_p (1, 2, high + nelt, 1))
>      return false;
> -  mask = (d->one_vector_p ? nelt - 1 : 2 * nelt - 1);
> -
> -  for (i = 0; i < nelt / 2; i++)
> -    {
> -      unsigned elt = (i + high) & mask;
> -      if (d->perm[i * 2] != elt)
> -	return false;
> -      elt = (elt + nelt) & mask;
> -      if (d->perm[i * 2 + 1] != elt)
> -	return false;
> -    }
>  
>    /* Success!  */
>    if (d->testing_p)
> @@ -13440,23 +13406,14 @@ aarch64_evpc_zip (struct expand_vec_perm
>  static bool
>  aarch64_evpc_ext (struct expand_vec_perm_d *d)
>  {
> -  unsigned int i, nelt = d->perm.length ();
> +  unsigned int nelt = d->perm.length ();
>    rtx offset;
>  
>    unsigned int location = d->perm[0]; /* Always < nelt.  */
>  
>    /* Check if the extracted indices are increasing by one.  */
> -  for (i = 1; i < nelt; i++)
> -    {
> -      unsigned int required = location + i;
> -      if (d->one_vector_p)
> -        {
> -          /* We'll pass the same vector in twice, so allow indices to wrap.  */
> -	  required &= (nelt - 1);
> -	}
> -      if (d->perm[i] != required)
> -        return false;
> -    }
> +  if (!d->perm.series_p (0, 1, location, 1))
> +    return false;
>  
>    /* Success! */
>    if (d->testing_p)
> @@ -13488,7 +13445,7 @@ aarch64_evpc_ext (struct expand_vec_perm
>  static bool
>  aarch64_evpc_rev (struct expand_vec_perm_d *d)
>  {
> -  unsigned int i, j, diff, size, unspec, nelt = d->perm.length ();
> +  unsigned int i, diff, size, unspec;
>  
>    if (!d->one_vector_p)
>      return false;
> @@ -13504,18 +13461,10 @@ aarch64_evpc_rev (struct expand_vec_perm
>    else
>      return false;
>  
> -  for (i = 0; i < nelt ; i += diff + 1)
> -    for (j = 0; j <= diff; j += 1)
> -      {
> -	/* This is guaranteed to be true as the value of diff
> -	   is 7, 3, 1 and we should have enough elements in the
> -	   queue to generate this.  Getting a vector mask with a
> -	   value of diff other than these values implies that
> -	   something is wrong by the time we get here.  */
> -	gcc_assert (i + j < nelt);
> -	if (d->perm[i + j] != i + diff - j)
> -	  return false;
> -      }
> +  unsigned int step = diff + 1;
> +  for (i = 0; i < step; ++i)
> +    if (!d->perm.series_p (i, step, diff - i, step))
> +      return false;
>  
>    /* Success! */
>    if (d->testing_p)
> @@ -13532,15 +13481,17 @@ aarch64_evpc_dup (struct expand_vec_perm
>    rtx out = d->target;
>    rtx in0;
>    machine_mode vmode = d->vmode;
> -  unsigned int i, elt, nelt = d->perm.length ();
> +  unsigned int elt;
>    rtx lane;
>  
> +  if (d->perm.encoding ().encoded_nelts () != 1)
> +    return false;
> +
> +  /* Success! */
> +  if (d->testing_p)
> +    return true;
> +
>    elt = d->perm[0];
> -  for (i = 1; i < nelt; i++)
> -    {
> -      if (elt != d->perm[i])
> -	return false;
> -    }
>  
>    /* The generic preparation in aarch64_expand_vec_perm_const_1
>       swaps the operand order and the permute indices if it finds
> @@ -13628,61 +13579,37 @@ aarch64_vectorize_vec_perm_const (machin
>  				  rtx op1, const vec_perm_indices &sel)
>  {
>    struct expand_vec_perm_d d;
> -  unsigned int i, which;
>  
> -  d.vmode = vmode;
> -  d.target = target;
> -  d.op0 = op0;
> -  d.op1 = op1;
> -  d.testing_p = !target;
> -
> -  /* Calculate whether all elements are in one vector.  */
> -  unsigned int nelt = sel.length ();
> -  for (i = which = 0; i < nelt; ++i)
> +  /* Check whether the mask can be applied to a single vector.  */
> +  if (op0 && rtx_equal_p (op0, op1))
> +    d.one_vector_p = true;
> +  else if (sel.all_from_input_p (0))
>      {
> -      unsigned int ei = sel[i] & (2 * nelt - 1);
> -      which |= (ei < nelt ? 1 : 2);
> +      d.one_vector_p = true;
> +      op1 = op0;
>      }
> -
> -  switch (which)
> +  else if (sel.all_from_input_p (1))
>      {
> -    default:
> -      gcc_unreachable ();
> -
> -    case 3:
> -      d.one_vector_p = false;
> -      if (d.testing_p || !rtx_equal_p (op0, op1))
> -	break;
> -
> -      /* The elements of PERM do not suggest that only the first operand
> -	 is used, but both operands are identical.  Allow easier matching
> -	 of the permutation by folding the permutation into the single
> -	 input vector.  */
> -      /* Fall Through.  */
> -    case 2:
> -      d.op0 = op1;
> -      d.one_vector_p = true;
> -      break;
> -
> -    case 1:
> -      d.op1 = op0;
>        d.one_vector_p = true;
> -      break;
> +      op0 = op1;
>      }
> +  else
> +    d.one_vector_p = false;
>  
> -  d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2, nelt);
> +  d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2,
> +		     sel.nelts_per_input ());
> +  d.vmode = vmode;
> +  d.target = target;
> +  d.op0 = op0;
> +  d.op1 = op1;
> +  d.testing_p = !target;
>  
>    if (!d.testing_p)
>      return aarch64_expand_vec_perm_const_1 (&d);
>  
> -  d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
> -  d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
> -  if (!d.one_vector_p)
> -    d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
> -
> -  start_sequence ();
> +  rtx_insn *last = get_last_insn ();
>    bool ret = aarch64_expand_vec_perm_const_1 (&d);
> -  end_sequence ();
> +  gcc_assert (last == get_last_insn ());
>  
>    return ret;
>  }
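
For reference (not part of the patch): series_p (out_base, out_step, in_base, in_step) asks whether output element out_base + i * out_step selects input element in_base + i * in_step.  A minimal standalone illustration of the TRN checks, written against an explicit selector and ignoring the wrap-around handling that the real routine provides:

/* Standalone sketch, not GCC code.  For an 8-element TRN1-style permute
   with odd == 0, the two checks in aarch64_evpc_trn correspond to the
   selector { 0, 8, 2, 10, 4, 12, 6, 14 }.  */
#include <cassert>
#include <vector>

static bool
naive_series_p (const std::vector<unsigned int> &sel, unsigned int out_base,
                unsigned int out_step, unsigned int in_base,
                unsigned int in_step)
{
  for (unsigned int i = out_base, elt = in_base; i < sel.size ();
       i += out_step, elt += in_step)
    if (sel[i] != elt)
      return false;
  return true;
}

int
main ()
{
  unsigned int nelt = 8, odd = 0;
  std::vector<unsigned int> sel = { 0, 8, 2, 10, 4, 12, 6, 14 };
  assert (naive_series_p (sel, 0, 2, odd, 2));          /* Even outputs: 0, 2, 4, 6.  */
  assert (naive_series_p (sel, 1, 2, nelt + odd, 2));   /* Odd outputs: 8, 10, 12, 14.  */
  return 0;
}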

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [11/13] Use vec_perm_builder::series_p in shift_amt_for_vec_perm_mask
  2017-12-09 23:24   ` [11/13] Use vec_perm_builder::series_p in shift_amt_for_vec_perm_mask Richard Sandiford
@ 2017-12-19 20:37     ` Richard Sandiford
  2018-01-02 13:08     ` Richard Biener
  1 sibling, 0 replies; 46+ messages in thread
From: Richard Sandiford @ 2017-12-19 20:37 UTC (permalink / raw)
  To: gcc-patches

Ping

Richard Sandiford <richard.sandiford@linaro.org> writes:
> This patch makes shift_amt_for_vec_perm_mask use series_p to check
> for the simple case of a natural linear series before falling back
> to testing each element individually.  The series_p test works with
> variable-length vectors but testing every individual element doesn't.
>
>
> 2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
> 	* optabs.c (shift_amt_for_vec_perm_mask): Try using series_p
> 	before testing each element individually.
>
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c	2017-12-09 22:48:52.266015836 +0000
> +++ gcc/optabs.c	2017-12-09 22:48:56.257154317 +0000
> @@ -5375,20 +5375,20 @@ vector_compare_rtx (machine_mode cmp_mod
>  static rtx
>  shift_amt_for_vec_perm_mask (machine_mode mode, const vec_perm_indices &sel)
>  {
> -  unsigned int i, first, nelt = GET_MODE_NUNITS (mode);
> +  unsigned int nelt = GET_MODE_NUNITS (mode);
>    unsigned int bitsize = GET_MODE_UNIT_BITSIZE (mode);
> -
> -  first = sel[0];
> +  unsigned int first = sel[0];
>    if (first >= nelt)
>      return NULL_RTX;
> -  for (i = 1; i < nelt; i++)
> -    {
> -      int idx = sel[i];
> -      unsigned int expected = i + first;
> -      /* Indices into the second vector are all equivalent.  */
> -      if (idx < 0 || (MIN (nelt, (unsigned) idx) != MIN (nelt, expected)))
> -	return NULL_RTX;
> -    }
> +
> +  if (!sel.series_p (0, 1, first, 1))
> +    for (unsigned int i = 1; i < nelt; i++)
> +      {
> +	unsigned int expected = i + first;
> +	/* Indices into the second vector are all equivalent.  */
> +	if (MIN (nelt, sel[i]) != MIN (nelt, expected))
> +	  return NULL_RTX;
> +      }
>  
>    return GEN_INT (first * bitsize);
>  }
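
For reference (not part of the patch): the condition being tested is that the selector describes a whole-vector shift, i.e. output i reads element first + i, where indices into the second vector are all treated as equivalent, as in the loop above.  A standalone sketch on an explicit selector:

/* Standalone sketch, not GCC code.  */
#include <algorithm>
#include <cassert>
#include <vector>

static bool
shift_selector_p (const std::vector<unsigned int> &sel, unsigned int nelt)
{
  unsigned int first = sel[0];
  if (first >= nelt)
    return false;
  for (unsigned int i = 1; i < sel.size (); ++i)
    /* Clamp both sides to NELT so that any index into the second
       vector compares equal to any other.  */
    if (std::min (nelt, sel[i]) != std::min (nelt, i + first))
      return false;
  return true;
}

int
main ()
{
  /* A 4-element selector shifted by one element: { 1, 2, 3, X } where
     X is any index >= 4.  */
  assert (shift_selector_p ({ 1, 2, 3, 4 }, 4));
  assert (shift_selector_p ({ 1, 2, 3, 7 }, 4));
  assert (!shift_selector_p ({ 2, 1, 3, 4 }, 4));
  return 0;
}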

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [10/13] Rework VEC_PERM_EXPR folding
  2017-12-09 23:23 ` [10/13] Rework VEC_PERM_EXPR folding Richard Sandiford
  2017-12-09 23:24   ` [11/13] Use vec_perm_builder::series_p in shift_amt_for_vec_perm_mask Richard Sandiford
  2017-12-09 23:25   ` [12/13] Use ssizetype selectors for autovectorised VEC_PERM_EXPRs Richard Sandiford
@ 2017-12-19 20:37   ` Richard Sandiford
  2018-01-02 13:08   ` Richard Biener
  3 siblings, 0 replies; 46+ messages in thread
From: Richard Sandiford @ 2017-12-19 20:37 UTC (permalink / raw)
  To: gcc-patches

Ping

Richard Sandiford <richard.sandiford@linaro.org> writes:
> This patch reworks the VEC_PERM_EXPR folding so that more of it works
> for variable-length vectors.  E.g. it means that we can now recognise
> variable-length permutes that reduce to a single vector, or cases in
> which a variable-length permute only needs one input.  There should be
> no functional change for fixed-length vectors.
>
>
> 2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
> 	* selftest.h (selftest::vec_perm_indices_c_tests): Declare.
> 	* selftest-run-tests.c (selftest::run_tests): Call it.
> 	* vector-builder.h (vector_builder::operator ==): New function.
> 	(vector_builder::operator !=): Likewise.
> 	* vec-perm-indices.h (vec_perm_indices::series_p): Declare.
> 	(vec_perm_indices::all_from_input_p): New function.
> 	* vec-perm-indices.c (vec_perm_indices::series_p): Likewise.
> 	(test_vec_perm_12, selftest::vec_perm_indices_c_tests): Likewise.
> 	* fold-const.c (fold_ternary_loc): Use tree_to_vec_perm_builder
> 	instead of reading the VECTOR_CST directly.  Detect whether both
> 	vector inputs are the same before constructing the vec_perm_indices,
> 	and update the number of inputs argument accordingly.  Use the
> 	utility functions added above.  Only construct sel2 if we need to.
>
> Index: gcc/selftest.h
> ===================================================================
> *** gcc/selftest.h	2017-12-09 23:06:55.002855594 +0000
> --- gcc/selftest.h	2017-12-09 23:21:51.517599734 +0000
> *************** extern void vec_c_tests ();
> *** 201,206 ****
> --- 201,207 ----
>   extern void wide_int_cc_tests ();
>   extern void predict_c_tests ();
>   extern void simplify_rtx_c_tests ();
> + extern void vec_perm_indices_c_tests ();
>   
>   extern int num_passes;
>   
> Index: gcc/selftest-run-tests.c
> ===================================================================
> *** gcc/selftest-run-tests.c	2017-12-09 23:06:55.002855594 +0000
> --- gcc/selftest-run-tests.c	2017-12-09 23:21:51.517599734 +0000
> *************** selftest::run_tests ()
> *** 73,78 ****
> --- 73,79 ----
>   
>     /* Mid-level data structures.  */
>     input_c_tests ();
> +   vec_perm_indices_c_tests ();
>     tree_c_tests ();
>     gimple_c_tests ();
>     rtl_tests_c_tests ();
> Index: gcc/vector-builder.h
> ===================================================================
> *** gcc/vector-builder.h	2017-12-09 23:06:55.002855594 +0000
> --- gcc/vector-builder.h	2017-12-09 23:21:51.518600090 +0000
> *************** #define GCC_VECTOR_BUILDER_H
> *** 97,102 ****
> --- 97,105 ----
>     bool encoded_full_vector_p () const;
>     T elt (unsigned int) const;
>   
> +   bool operator == (const Derived &) const;
> +   bool operator != (const Derived &x) const { return !operator == (x); }
> + 
>     void finalize ();
>   
>   protected:
> *************** vector_builder<T, Derived>::new_vector (
> *** 168,173 ****
> --- 171,196 ----
>     this->truncate (0);
>   }
>   
> + /* Return true if this vector and OTHER have the same elements and
> +    are encoded in the same way.  */
> + 
> + template<typename T, typename Derived>
> + bool
> + vector_builder<T, Derived>::operator == (const Derived &other) const
> + {
> +   if (m_full_nelts != other.m_full_nelts
> +       || m_npatterns != other.m_npatterns
> +       || m_nelts_per_pattern != other.m_nelts_per_pattern)
> +     return false;
> + 
> +   unsigned int nelts = encoded_nelts ();
> +   for (unsigned int i = 0; i < nelts; ++i)
> +     if (!derived ()->equal_p ((*this)[i], other[i]))
> +       return false;
> + 
> +   return true;
> + }
> + 
>   /* Return the value of vector element I, which might or might not be
>      encoded explicitly.  */
>   
> Index: gcc/vec-perm-indices.h
> ===================================================================
> *** gcc/vec-perm-indices.h	2017-12-09 23:20:13.233112018 +0000
> --- gcc/vec-perm-indices.h	2017-12-09 23:21:51.517599734 +0000
> *************** typedef int_vector_builder<HOST_WIDE_INT
> *** 62,68 ****
> --- 62,70 ----
>   
>     element_type clamp (element_type) const;
>     element_type operator[] (unsigned int i) const;
> +   bool series_p (unsigned int, unsigned int, element_type, element_type) const;
>     bool all_in_range_p (element_type, element_type) const;
> +   bool all_from_input_p (unsigned int) const;
>   
>   private:
>     vec_perm_indices (const vec_perm_indices &);
> *************** vec_perm_indices::operator[] (unsigned i
> *** 119,122 ****
> --- 121,133 ----
>     return clamp (m_encoding.elt (i));
>   }
>   
> + /* Return true if the permutation vector only selects elements from
> +    input I.  */
> + 
> + inline bool
> + vec_perm_indices::all_from_input_p (unsigned int i) const
> + {
> +   return all_in_range_p (i * m_nelts_per_input, m_nelts_per_input);
> + }
> + 
>   #endif
> Index: gcc/vec-perm-indices.c
> ===================================================================
> *** gcc/vec-perm-indices.c	2017-12-09 23:20:13.233112018 +0000
> --- gcc/vec-perm-indices.c	2017-12-09 23:21:51.517599734 +0000
> *************** Software Foundation; either version 3, o
> *** 28,33 ****
> --- 28,34 ----
>   #include "rtl.h"
>   #include "memmodel.h"
>   #include "emit-rtl.h"
> + #include "selftest.h"
>   
>   /* Switch to a new permutation vector that selects between NINPUTS vector
>      inputs that have NELTS_PER_INPUT elements each.  Take the elements of the
> *************** vec_perm_indices::rotate_inputs (int del
> *** 85,90 ****
> --- 86,139 ----
>       m_encoding[i] = clamp (m_encoding[i] + element_delta);
>   }
>   
> + /* Return true if index OUT_BASE + I * OUT_STEP selects input
> +    element IN_BASE + I * IN_STEP.  */
> + 
> + bool
> + vec_perm_indices::series_p (unsigned int out_base, unsigned int out_step,
> + 			    element_type in_base, element_type in_step) const
> + {
> +   /* Check the base value.  */
> +   if (clamp (m_encoding.elt (out_base)) != clamp (in_base))
> +     return false;
> + 
> +   unsigned int full_nelts = m_encoding.full_nelts ();
> +   unsigned int npatterns = m_encoding.npatterns ();
> + 
> +   /* Calculate which multiple of OUT_STEP elements we need to get
> +      back to the same pattern.  */
> +   unsigned int cycle_length = least_common_multiple (out_step, npatterns);
> + 
> +   /* Check the steps.  */
> +   in_step = clamp (in_step);
> +   out_base += out_step;
> +   unsigned int limit = 0;
> +   for (;;)
> +     {
> +       /* Succeed if we've checked all the elements in the vector.  */
> +       if (out_base >= full_nelts)
> + 	return true;
> + 
> +       if (out_base >= npatterns)
> + 	{
> + 	  /* We've got to the end of the "foreground" values.  Check
> + 	     2 elements from each pattern in the "background" values.  */
> + 	  if (limit == 0)
> + 	    limit = out_base + cycle_length * 2;
> + 	  else if (out_base >= limit)
> + 	    return true;
> + 	}
> + 
> +       element_type v0 = m_encoding.elt (out_base - out_step);
> +       element_type v1 = m_encoding.elt (out_base);
> +       if (clamp (v1 - v0) != in_step)
> + 	return false;
> + 
> +       out_base += out_step;
> +     }
> +   return true;
> + }
> + 
>   /* Return true if all elements of the permutation vector are in the range
>      [START, START + SIZE).  */
>   
> *************** vec_perm_indices_to_rtx (machine_mode mo
> *** 180,182 ****
> --- 229,280 ----
>       RTVEC_ELT (v, i) = gen_int_mode (indices[i], GET_MODE_INNER (mode));
>     return gen_rtx_CONST_VECTOR (mode, v);
>   }
> + 
> + #if CHECKING_P
> + 
> + namespace selftest {
> + 
> + /* Test a 12-element vector.  */
> + 
> + static void
> + test_vec_perm_12 (void)
> + {
> +   vec_perm_builder builder (12, 12, 1);
> +   for (unsigned int i = 0; i < 4; ++i)
> +     {
> +       builder.quick_push (i * 5);
> +       builder.quick_push (3 + i);
> +       builder.quick_push (2 + 3 * i);
> +     }
> +   vec_perm_indices indices (builder, 1, 12);
> +   ASSERT_TRUE (indices.series_p (0, 3, 0, 5));
> +   ASSERT_FALSE (indices.series_p (0, 3, 3, 5));
> +   ASSERT_FALSE (indices.series_p (0, 3, 0, 8));
> +   ASSERT_TRUE (indices.series_p (1, 3, 3, 1));
> +   ASSERT_TRUE (indices.series_p (2, 3, 2, 3));
> + 
> +   ASSERT_TRUE (indices.series_p (0, 4, 0, 4));
> +   ASSERT_FALSE (indices.series_p (1, 4, 3, 4));
> + 
> +   ASSERT_TRUE (indices.series_p (0, 6, 0, 10));
> +   ASSERT_FALSE (indices.series_p (0, 6, 0, 100));
> + 
> +   ASSERT_FALSE (indices.series_p (1, 10, 3, 7));
> +   ASSERT_TRUE (indices.series_p (1, 10, 3, 8));
> + 
> +   ASSERT_TRUE (indices.series_p (0, 12, 0, 10));
> +   ASSERT_TRUE (indices.series_p (0, 12, 0, 11));
> +   ASSERT_TRUE (indices.series_p (0, 12, 0, 100));
> + }
> + 
> + /* Run selftests for this file.  */
> + 
> + void
> + vec_perm_indices_c_tests ()
> + {
> +   test_vec_perm_12 ();
> + }
> + 
> + } // namespace selftest
> + 
> + #endif
> Index: gcc/fold-const.c
> ===================================================================
> *** gcc/fold-const.c	2017-12-09 23:18:12.040041251 +0000
> --- gcc/fold-const.c	2017-12-09 23:21:51.517599734 +0000
> *************** fold_ternary_loc (location_t loc, enum t
> *** 11547,11645 ****
>       case VEC_PERM_EXPR:
>         if (TREE_CODE (arg2) == VECTOR_CST)
>   	{
> ! 	  unsigned int nelts = VECTOR_CST_NELTS (arg2), i, mask, mask2;
> ! 	  bool need_mask_canon = false;
> ! 	  bool need_mask_canon2 = false;
> ! 	  bool all_in_vec0 = true;
> ! 	  bool all_in_vec1 = true;
> ! 	  bool maybe_identity = true;
> ! 	  bool single_arg = (op0 == op1);
> ! 	  bool changed = false;
> ! 
> ! 	  mask2 = 2 * nelts - 1;
> ! 	  mask = single_arg ? (nelts - 1) : mask2;
> ! 	  gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));
> ! 	  vec_perm_builder sel (nelts, nelts, 1);
> ! 	  vec_perm_builder sel2 (nelts, nelts, 1);
> ! 	  for (i = 0; i < nelts; i++)
> ! 	    {
> ! 	      tree val = VECTOR_CST_ELT (arg2, i);
> ! 	      if (TREE_CODE (val) != INTEGER_CST)
> ! 		return NULL_TREE;
> ! 
> ! 	      /* Make sure that the perm value is in an acceptable
> ! 		 range.  */
> ! 	      wi::tree_to_wide_ref t = wi::to_wide (val);
> ! 	      need_mask_canon |= wi::gtu_p (t, mask);
> ! 	      need_mask_canon2 |= wi::gtu_p (t, mask2);
> ! 	      unsigned int elt = t.to_uhwi () & mask;
> ! 	      unsigned int elt2 = t.to_uhwi () & mask2;
> ! 
> ! 	      if (elt < nelts)
> ! 		all_in_vec1 = false;
> ! 	      else
> ! 		all_in_vec0 = false;
> ! 
> ! 	      if ((elt & (nelts - 1)) != i)
> ! 		maybe_identity = false;
> ! 
> ! 	      sel.quick_push (elt);
> ! 	      sel2.quick_push (elt2);
> ! 	    }
>   
> ! 	  if (maybe_identity)
> ! 	    {
> ! 	      if (all_in_vec0)
> ! 		return op0;
> ! 	      if (all_in_vec1)
> ! 		return op1;
> ! 	    }
>   
> ! 	  if (all_in_vec0)
> ! 	    op1 = op0;
> ! 	  else if (all_in_vec1)
> ! 	    {
> ! 	      op0 = op1;
> ! 	      for (i = 0; i < nelts; i++)
> ! 		sel[i] -= nelts;
> ! 	      need_mask_canon = true;
>   	    }
>   
> - 	  vec_perm_indices indices (sel, 2, nelts);
>   	  if ((TREE_CODE (op0) == VECTOR_CST
>   	       || TREE_CODE (op0) == CONSTRUCTOR)
>   	      && (TREE_CODE (op1) == VECTOR_CST
>   		  || TREE_CODE (op1) == CONSTRUCTOR))
>   	    {
> ! 	      tree t = fold_vec_perm (type, op0, op1, indices);
>   	      if (t != NULL_TREE)
>   		return t;
>   	    }
>   
> ! 	  if (op0 == op1 && !single_arg)
> ! 	    changed = true;
>   
> ! 	  /* Some targets are deficient and fail to expand a single
> ! 	     argument permutation while still allowing an equivalent
> ! 	     2-argument version.  */
> ! 	  if (need_mask_canon && arg2 == op2
> ! 	      && !can_vec_perm_const_p (TYPE_MODE (type), indices, false)
> ! 	      && can_vec_perm_const_p (TYPE_MODE (type),
> ! 				       vec_perm_indices (sel2, 2, nelts),
> ! 				       false))
>   	    {
> ! 	      need_mask_canon = need_mask_canon2;
> ! 	      sel.truncate (0);
> ! 	      sel.splice (sel2);
> ! 	    }
> ! 
> ! 	  if (need_mask_canon && arg2 == op2)
> ! 	    {
> ! 	      tree eltype = TREE_TYPE (TREE_TYPE (arg2));
> ! 	      tree_vector_builder tsel (TREE_TYPE (arg2), nelts, 1);
> ! 	      for (i = 0; i < nelts; i++)
> ! 		tsel.quick_push (build_int_cst (eltype, sel[i]));
> ! 	      op2 = tsel.build ();
>   	      changed = true;
>   	    }
>   
> --- 11547,11611 ----
>       case VEC_PERM_EXPR:
>         if (TREE_CODE (arg2) == VECTOR_CST)
>   	{
> ! 	  /* Build a vector of integers from the tree mask.  */
> ! 	  vec_perm_builder builder;
> ! 	  if (!tree_to_vec_perm_builder (&builder, arg2))
> ! 	    return NULL_TREE;
>   
> ! 	  /* Create a vec_perm_indices for the integer vector.  */
> ! 	  unsigned int nelts = TYPE_VECTOR_SUBPARTS (type);
> ! 	  bool single_arg = (op0 == op1);
> ! 	  vec_perm_indices sel (builder, single_arg ? 1 : 2, nelts);
>   
> ! 	  /* Check for cases that fold to OP0 or OP1 in their original
> ! 	     element order.  */
> ! 	  if (sel.series_p (0, 1, 0, 1))
> ! 	    return op0;
> ! 	  if (sel.series_p (0, 1, nelts, 1))
> ! 	    return op1;
> ! 
> ! 	  if (!single_arg)
> ! 	    {
> ! 	      if (sel.all_from_input_p (0))
> ! 		op1 = op0;
> ! 	      else if (sel.all_from_input_p (1))
> ! 		{
> ! 		  op0 = op1;
> ! 		  sel.rotate_inputs (1);
> ! 		}
>   	    }
>   
>   	  if ((TREE_CODE (op0) == VECTOR_CST
>   	       || TREE_CODE (op0) == CONSTRUCTOR)
>   	      && (TREE_CODE (op1) == VECTOR_CST
>   		  || TREE_CODE (op1) == CONSTRUCTOR))
>   	    {
> ! 	      tree t = fold_vec_perm (type, op0, op1, sel);
>   	      if (t != NULL_TREE)
>   		return t;
>   	    }
>   
> ! 	  bool changed = (op0 == op1 && !single_arg);
>   
> ! 	  /* Generate a canonical form of the selector.  */
> ! 	  if (arg2 == op2 && sel.encoding () != builder)
>   	    {
> ! 	      /* Some targets are deficient and fail to expand a single
> ! 		 argument permutation while still allowing an equivalent
> ! 		 2-argument version.  */
> ! 	      if (sel.ninputs () == 2
> ! 		  || can_vec_perm_const_p (TYPE_MODE (type), sel, false))
> ! 		op2 = vec_perm_indices_to_tree (TREE_TYPE (arg2), sel);
> ! 	      else
> ! 		{
> ! 		  vec_perm_indices sel2 (builder, 2, nelts);
> ! 		  if (can_vec_perm_const_p (TYPE_MODE (type), sel2, false))
> ! 		    op2 = vec_perm_indices_to_tree (TREE_TYPE (arg2), sel2);
> ! 		  else
> ! 		    /* Not directly supported with either encoding,
> ! 		       so use the preferred form.  */
> ! 		    op2 = vec_perm_indices_to_tree (TREE_TYPE (arg2), sel);
> ! 		}
>   	      changed = true;
>   	    }
>   
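
For reference (not part of the patch): the all_from_input_p / rotate_inputs canonicalisation above boils down to rewriting a two-input permute whose indices all point at the second input so that it uses the first input slot instead, by subtracting NELTS from each index.  A standalone sketch with explicit indices:

/* Standalone sketch, not GCC code.  */
#include <cassert>
#include <vector>

static bool
fold_to_single_input (std::vector<unsigned int> &sel, unsigned int nelts)
{
  for (unsigned int idx : sel)
    if (idx < nelts)
      return false;               /* Not all from the second input.  */
  for (unsigned int &idx : sel)
    idx -= nelts;                 /* Rough equivalent of rotate_inputs (1).  */
  return true;
}

int
main ()
{
  /* nelts == 4; every index selects from the second input.  */
  std::vector<unsigned int> sel = { 6, 7, 4, 5 };
  assert (fold_to_single_input (sel, 4));
  assert (sel == (std::vector<unsigned int> { 2, 3, 0, 1 }));
  return 0;
}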

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [12/13] Use ssizetype selectors for autovectorised VEC_PERM_EXPRs
  2017-12-09 23:25   ` [12/13] Use ssizetype selectors for autovectorised VEC_PERM_EXPRs Richard Sandiford
@ 2017-12-19 20:37     ` Richard Sandiford
  2018-01-02 13:09     ` Richard Biener
  1 sibling, 0 replies; 46+ messages in thread
From: Richard Sandiford @ 2017-12-19 20:37 UTC (permalink / raw)
  To: gcc-patches

Ping

Richard Sandiford <richard.sandiford@linaro.org> writes:
> The previous patches mean that there's no reason for the constant
> selector of a VEC_PERM_EXPR to have the same shape as the data inputs.
> This patch makes the autovectoriser use ssizetype elements instead,
> so that indices don't get truncated for large or variable-length vectors.
>
>
> 2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
> 	* tree-cfg.c (verify_gimple_assign_ternary): Allow the size of
> 	the selector elements to be different from the data elements
> 	if the selector is a VECTOR_CST.
> 	* tree-vect-stmts.c (vect_gen_perm_mask_any): Use a vector of
> 	ssizetype for the selector.
>
> Index: gcc/tree-cfg.c
> ===================================================================
> --- gcc/tree-cfg.c	2017-12-09 22:47:07.103588314 +0000
> +++ gcc/tree-cfg.c	2017-12-09 22:48:58.259216407 +0000
> @@ -4300,8 +4300,11 @@ verify_gimple_assign_ternary (gassign *s
>  	}
>  
>        if (TREE_CODE (TREE_TYPE (rhs3_type)) != INTEGER_TYPE
> -	  || GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE (TREE_TYPE (rhs3_type)))
> -	     != GET_MODE_BITSIZE (SCALAR_TYPE_MODE (TREE_TYPE (rhs1_type))))
> +	  || (TREE_CODE (rhs3) != VECTOR_CST
> +	      && (GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE
> +				    (TREE_TYPE (rhs3_type)))
> +		  != GET_MODE_BITSIZE (SCALAR_TYPE_MODE
> +				       (TREE_TYPE (rhs1_type))))))
>  	{
>  	  error ("invalid mask type in vector permute expression");
>  	  debug_generic_expr (lhs_type);
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c	2017-12-09 22:48:52.268015910 +0000
> +++ gcc/tree-vect-stmts.c	2017-12-09 22:48:58.259216407 +0000
> @@ -6518,11 +6518,12 @@ vectorizable_store (gimple *stmt, gimple
>  tree
>  vect_gen_perm_mask_any (tree vectype, const vec_perm_indices &sel)
>  {
> -  tree mask_elt_type, mask_type;
> +  tree mask_type;
>  
> -  mask_elt_type = lang_hooks.types.type_for_mode
> -    (int_mode_for_mode (TYPE_MODE (TREE_TYPE (vectype))).require (), 1);
> -  mask_type = get_vectype_for_scalar_type (mask_elt_type);
> +  unsigned int nunits = sel.length ();
> +  gcc_assert (nunits == TYPE_VECTOR_SUBPARTS (vectype));
> +
> +  mask_type = build_vector_type (ssizetype, nunits);
>    return vec_perm_indices_to_tree (mask_type, sel);
>  }
>  

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [05/13] Remove vec_perm_const optab
  2017-12-12 15:26   ` Richard Biener
@ 2017-12-20 13:42     ` Richard Sandiford
  0 siblings, 0 replies; 46+ messages in thread
From: Richard Sandiford @ 2017-12-20 13:42 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Sun, Dec 10, 2017 at 12:16 AM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> One of the changes needed for variable-length VEC_PERM_EXPRs -- and for
>> long fixed-length VEC_PERM_EXPRs -- is the ability to use constant
>> selectors that wouldn't fit in the vectors being permuted.  E.g. a
>> permute on two V256QIs can't be done using a V256QI selector.
>>
>> At the moment constant permutes use two interfaces:
>> targetm.vectorize.vec_perm_const_ok for testing whether a permute is
>> valid and the vec_perm_const optab for actually emitting the permute.
>> The former gets passed a vec<> selector and the latter an rtx selector.
>> Most ports share a lot of code between the hook and the optab, with a
>> wrapper function for each interface.
>>
>> We could try to keep that interface and require ports to define wider
>> vector modes that could be attached to the CONST_VECTOR (e.g. V256HI or
>> V256SI in the example above).  But building a CONST_VECTOR rtx seems a bit
>> pointless here, since the expand code only creates the CONST_VECTOR in
>> order to call the optab, and the first thing the target does is take
>> the CONST_VECTOR apart again.
>>
>> The easiest approach therefore seemed to be to remove the optab and
>> reuse the target hook to emit the code.  One potential drawback is that
>> it's no longer possible to use match_operand predicates to force
>> operands into the required form, but in practice all targets want
>> register operands anyway.
>>
>> The patch also changes vec_perm_indices into a class that provides
>> some simple routines for handling permutations.  A later patch will
>> flesh this out and get rid of auto_vec_perm_indices, but I didn't
>> want to do all that in this patch and make it more complicated than
>> it already is.
>>
>>
>> 2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>
>>
>> gcc/
>>         * Makefile.in (OBJS): Add vec-perm-indices.o.
>>         * vec-perm-indices.h: New file.
>>         * vec-perm-indices.c: Likewise.
>>         * target.h (vec_perm_indices): Replace with a forward class
>>         declaration.
>>         (auto_vec_perm_indices): Move to vec-perm-indices.h.
>>         * optabs.h: Include vec-perm-indices.h.
>>         (expand_vec_perm): Delete.
>>         (selector_fits_mode_p, expand_vec_perm_var): Declare.
>>         (expand_vec_perm_const): Declare.
>>         * target.def (vec_perm_const_ok): Replace with...
>>         (vec_perm_const): ...this new hook.
>>         * doc/tm.texi.in (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Replace with...
>>         (TARGET_VECTORIZE_VEC_PERM_CONST): ...this new hook.
>>         * doc/tm.texi: Regenerate.
>>         * optabs.def (vec_perm_const): Delete.
>>         * doc/md.texi (vec_perm_const): Likewise.
>>         (vec_perm): Refer to TARGET_VECTORIZE_VEC_PERM_CONST.
>>         * expr.c (expand_expr_real_2): Use expand_vec_perm_const rather than
>>         expand_vec_perm for constant permutation vectors.  Assert that
>>         the mode of variable permutation vectors is the integer equivalent
>>         of the mode that is being permuted.
>>         * optabs-query.h (selector_fits_mode_p): Declare.
>>         * optabs-query.c: Include vec-perm-indices.h.
>>         (can_vec_perm_const_p): Check whether targetm.vectorize.vec_perm_const
>>         is defined, instead of checking whether the vec_perm_const_optab
>>         exists.  Use targetm.vectorize.vec_perm_const instead of
>>         targetm.vectorize.vec_perm_const_ok.  Check whether the indices
>>         fit in the vector mode before using a variable permute.
>>         * optabs.c (shift_amt_for_vec_perm_mask): Take a mode and a
>>         vec_perm_indices instead of an rtx.
>>         (expand_vec_perm): Replace with...
>>         (expand_vec_perm_const): ...this new function.  Take the selector
>>         as a vec_perm_indices rather than an rtx.  Also take the mode of
>>         the selector.  Update call to shift_amt_for_vec_perm_mask.
>>         Use targetm.vectorize.vec_perm_const instead of vec_perm_const_optab.
>>         Use vec_perm_indices::new_expanded_vector to expand the original
>>         selector into bytes.  Check whether the indices fit in the vector
>>         mode before using a variable permute.
>>         (expand_vec_perm_var): Make global.
>>         (expand_mult_highpart): Use expand_vec_perm_const.
>>         * fold-const.c: Includes vec-perm-indices.h.
>>         * tree-ssa-forwprop.c: Likewise.
>>         * tree-vect-data-refs.c: Likewise.
>>         * tree-vect-generic.c: Likewise.
>>         * tree-vect-loop.c: Likewise.
>>         * tree-vect-slp.c: Likewise.
>>         * tree-vect-stmts.c: Likewise.
>>         * config/aarch64/aarch64-protos.h (aarch64_expand_vec_perm_const):
>>         Delete.
>>         * config/aarch64/aarch64-simd.md (vec_perm_const<mode>): Delete.
>>         * config/aarch64/aarch64.c (aarch64_expand_vec_perm_const)
>>         (aarch64_vectorize_vec_perm_const_ok): Fuse into...
>>         (aarch64_vectorize_vec_perm_const): ...this new function.
>>         (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete.
>>         (TARGET_VECTORIZE_VEC_PERM_CONST): Redefine.
>>         * config/arm/arm-protos.h (arm_expand_vec_perm_const): Delete.
>>         * config/arm/vec-common.md (vec_perm_const<mode>): Delete.
>>         * config/arm/arm.c (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete.
>>         (TARGET_VECTORIZE_VEC_PERM_CONST): Redefine.
>>         (arm_expand_vec_perm_const, arm_vectorize_vec_perm_const_ok): Merge
>>         into...
>>         (arm_vectorize_vec_perm_const): ...this new function.  Explicitly
>>         check for NEON modes.
>>         * config/i386/i386-protos.h (ix86_expand_vec_perm_const): Delete.
>>         * config/i386/sse.md (VEC_PERM_CONST, vec_perm_const<mode>): Delete.
>>         * config/i386/i386.c (ix86_expand_vec_perm_const_1): Update comment.
>>         (ix86_expand_vec_perm_const, ix86_vectorize_vec_perm_const_ok): Merge
>>         into...
>>         (ix86_vectorize_vec_perm_const): ...this new function.  Incorporate
>>         the old VEC_PERM_CONST conditions.
>>         * config/ia64/ia64-protos.h (ia64_expand_vec_perm_const): Delete.
>>         * config/ia64/vect.md (vec_perm_const<mode>): Delete.
>>         * config/ia64/ia64.c (ia64_expand_vec_perm_const)
>>         (ia64_vectorize_vec_perm_const_ok): Merge into...
>>         (ia64_vectorize_vec_perm_const): ...this new function.
>>         * config/mips/loongson.md (vec_perm_const<mode>): Delete.
>>         * config/mips/mips-msa.md (vec_perm_const<mode>): Delete.
>>         * config/mips/mips-ps-3d.md (vec_perm_constv2sf): Delete.
>>         * config/mips/mips-protos.h (mips_expand_vec_perm_const): Delete.
>>         * config/mips/mips.c (mips_expand_vec_perm_const)
>>         (mips_vectorize_vec_perm_const_ok): Merge into...
>>         (mips_vectorize_vec_perm_const): ...this new function.
>>         * config/powerpcspe/altivec.md (vec_perm_constv16qi): Delete.
>>         * config/powerpcspe/paired.md (vec_perm_constv2sf): Delete.
>>         * config/powerpcspe/spe.md (vec_perm_constv2si): Delete.
>>         * config/powerpcspe/vsx.md (vec_perm_const<mode>): Delete.
>>         * config/powerpcspe/powerpcspe-protos.h (altivec_expand_vec_perm_const)
>>         (rs6000_expand_vec_perm_const): Delete.
>>         * config/powerpcspe/powerpcspe.c (TARGET_VECTORIZE_VEC_PERM_CONST_OK):
>>         Delete.
>>         (TARGET_VECTORIZE_VEC_PERM_CONST): Redefine.
>>         (altivec_expand_vec_perm_const_le): Take each operand individually.
>>         Operate on constant selectors rather than rtxes.
>>         (altivec_expand_vec_perm_const): Likewise.  Update call to
>>         altivec_expand_vec_perm_const_le.
>>         (rs6000_expand_vec_perm_const): Delete.
>>         (rs6000_vectorize_vec_perm_const_ok): Delete.
>>         (rs6000_vectorize_vec_perm_const): New function.
>>         (rs6000_do_expand_vec_perm): Take a vec_perm_builder instead of
>>         an element count and rtx array.
>>         (rs6000_expand_extract_even): Update call accordingly.
>>         (rs6000_expand_interleave): Likewise.
>>         * config/rs6000/altivec.md (vec_perm_constv16qi): Delete.
>>         * config/rs6000/paired.md (vec_perm_constv2sf): Delete.
>>         * config/rs6000/vsx.md (vec_perm_const<mode>): Delete.
>>         * config/rs6000/rs6000-protos.h (altivec_expand_vec_perm_const)
>>         (rs6000_expand_vec_perm_const): Delete.
>>         * config/rs6000/rs6000.c (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Delete.
>>         (TARGET_VECTORIZE_VEC_PERM_CONST): Redefine.
>>         (altivec_expand_vec_perm_const_le): Take each operand individually.
>>         Operate on constant selectors rather than rtxes.
>>         (altivec_expand_vec_perm_const): Likewise.  Update call to
>>         altivec_expand_vec_perm_const_le.
>>         (rs6000_expand_vec_perm_const): Delete.
>>         (rs6000_vectorize_vec_perm_const_ok): Delete.
>>         (rs6000_vectorize_vec_perm_const): New function.  Remove stray
>>         reference to the SPE evmerge intructions.
>>         (rs6000_do_expand_vec_perm): Take a vec_perm_builder instead of
>>         an element count and rtx array.
>>         (rs6000_expand_extract_even): Update call accordingly.
>>         (rs6000_expand_interleave): Likewise.
>>         * config/sparc/sparc.md (vec_perm_constv8qi): Delete in favor of...
>>         * config/sparc/sparc.c (sparc_vectorize_vec_perm_const): ...this
>>         new function.
>>         (TARGET_VECTORIZE_VEC_PERM_CONST): Redefine.
>>
>> Index: gcc/Makefile.in
>> ===================================================================
>> --- gcc/Makefile.in     2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/Makefile.in     2017-12-09 22:47:27.854318082 +0000
>> @@ -1584,6 +1584,7 @@ OBJS = \
>>         var-tracking.o \
>>         varasm.o \
>>         varpool.o \
>> +       vec-perm-indices.o \
>>         vmsdbgout.o \
>>         vr-values.o \
>>         vtable-verify.o \
>> Index: gcc/vec-perm-indices.h
>> ===================================================================
>> --- /dev/null   2017-12-09 13:59:56.352713187 +0000
>> +++ gcc/vec-perm-indices.h      2017-12-09 22:47:27.885318101 +0000
>> @@ -0,0 +1,49 @@
>> +/* A representation of vector permutation indices.
>> +   Copyright (C) 2017 Free Software Foundation, Inc.
>> +
>> +This file is part of GCC.
>> +
>> +GCC is free software; you can redistribute it and/or modify it under
>> +the terms of the GNU General Public License as published by the Free
>> +Software Foundation; either version 3, or (at your option) any later
>> +version.
>> +
>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>> +for more details.
>> +
>> +You should have received a copy of the GNU General Public License
>> +along with GCC; see the file COPYING3.  If not see
>> +<http://www.gnu.org/licenses/>.  */
>> +
>> +#ifndef GCC_VEC_PERN_INDICES_H
>> +#define GCC_VEC_PERN_INDICES_H 1
>> +
>> +/* This class represents a constant permutation vector, such as that used
>> +   as the final operand to a VEC_PERM_EXPR.  */
>> +class vec_perm_indices : public auto_vec<unsigned short, 32>
>> +{
>> +  typedef unsigned short element_type;
>> +  typedef auto_vec<element_type, 32> parent_type;
>> +
>> +public:
>> +  vec_perm_indices () {}
>> +  vec_perm_indices (unsigned int nunits) : parent_type (nunits) {}
>> +
>> +  void new_expanded_vector (const vec_perm_indices &, unsigned int);
>> +
>> +  bool all_in_range_p (element_type, element_type) const;
>> +
>> +private:
>> +  vec_perm_indices (const vec_perm_indices &);
>> +};
>> +
>> +/* Temporary.  */
>> +typedef vec_perm_indices vec_perm_builder;
>> +typedef vec_perm_indices auto_vec_perm_indices;
>> +
>> +bool tree_to_vec_perm_builder (vec_perm_builder *, tree);
>> +rtx vec_perm_indices_to_rtx (machine_mode, const vec_perm_indices &);
>> +
>> +#endif
>> Index: gcc/vec-perm-indices.c
>> ===================================================================
>> --- /dev/null   2017-12-09 13:59:56.352713187 +0000
>> +++ gcc/vec-perm-indices.c      2017-12-09 22:47:27.885318101 +0000
>> @@ -0,0 +1,93 @@
>> +/* A representation of vector permutation indices.
>> +   Copyright (C) 2017 Free Software Foundation, Inc.
>> +
>> +This file is part of GCC.
>> +
>> +GCC is free software; you can redistribute it and/or modify it under
>> +the terms of the GNU General Public License as published by the Free
>> +Software Foundation; either version 3, or (at your option) any later
>> +version.
>> +
>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>> +for more details.
>> +
>> +You should have received a copy of the GNU General Public License
>> +along with GCC; see the file COPYING3.  If not see
>> +<http://www.gnu.org/licenses/>.  */
>> +
>> +#include "config.h"
>> +#include "system.h"
>> +#include "coretypes.h"
>> +#include "vec-perm-indices.h"
>> +#include "tree.h"
>> +#include "backend.h"
>> +#include "rtl.h"
>> +#include "memmodel.h"
>> +#include "emit-rtl.h"
>> +
>> +/* Switch to a new permutation vector that selects the same input elements
>> +   as ORIG, but with each element split into FACTOR pieces.  For example,
>> +   if ORIG is { 1, 2, 0, 3 } and FACTOR is 2, the new permutation is
>> +   { 2, 3, 4, 5, 0, 1, 6, 7 }.  */
>> +
>> +void
>> +vec_perm_indices::new_expanded_vector (const vec_perm_indices &orig,
>> +                                      unsigned int factor)
>> +{
>> +  truncate (0);
>> +  reserve (orig.length () * factor);
>> +  for (unsigned int i = 0; i < orig.length (); ++i)
>> +    {
>> +      element_type base = orig[i] * factor;
>
> No check whether this overflows unsigned short?  (not that this is likely)

A later patch changes element_type to HOST_WIDE_INT.  Wrapping at that size
should be OK, since the number of vector elements can't be greater than the
address space size.

Also, this function is in practice only used for expanding to QImode
mask values.  Later patches check that this is safe before using the
function, i.e. that QImode can hold all the required byte indices.
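
(For reference, not part of the patch: a minimal sketch of the kind of check meant here, on an already-expanded byte selector; the 0xff bound is simply the largest value a QImode element can hold.)

/* Standalone sketch, not GCC code.  */
#include <cassert>
#include <cstdint>
#include <vector>

static bool
byte_indices_fit_qimode_p (const std::vector<int64_t> &byte_sel)
{
  for (int64_t idx : byte_sel)
    if (idx < 0 || idx > 0xff)
      return false;
  return true;
}

int
main ()
{
  /* Byte indices for a permute of two 8-byte vectors all fit.  */
  assert (byte_indices_fit_qimode_p ({ 0, 8, 1, 9, 2, 10, 3, 11 }));
  /* An index of 300 would not be representable in QImode.  */
  assert (!byte_indices_fit_qimode_p ({ 0, 300 }));
  return 0;
}

In other words, a permute of two 128-byte vectors, whose byte indices run from 0 to 255, is the largest that can be lowered to a QImode permute this way.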

>> +      for (unsigned int j = 0; j < factor; ++j)
>> +       quick_push (base + j);
>> +    }
>> +}
>> +
>> +/* Return true if all elements of the permutation vector are in the range
>> +   [START, START + SIZE).  */
>> +
>> +bool
>> +vec_perm_indices::all_in_range_p (element_type start, element_type size) const
>> +{
>> +  for (unsigned int i = 0; i < length (); ++i)
>> +    if ((*this)[i] < start || ((*this)[i] - start) >= size)
>> +      return false;
>> +  return true;
>> +}
>> +
>> +/* Try to read the contents of VECTOR_CST CST as a constant permutation
>> +   vector.  Return true and add the elements to BUILDER on success,
>> +   otherwise return false without modifying BUILDER.  */
>> +
>> +bool
>> +tree_to_vec_perm_builder (vec_perm_builder *builder, tree cst)
>> +{
>> +  unsigned int nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (cst));
>> +  for (unsigned int i = 0; i < nelts; ++i)
>> +    if (!tree_fits_shwi_p (vector_cst_elt (cst, i)))
>
> So why specifically shwi and not uhwi?  Shouldn't this also somehow
> be checked for IN_RANGE of unsigned short aka vec_perm_indices::element_type?

shwi matches element_type once element_type changes to HOST_WIDE_INT
in a later patch.  I guess on its own this patch leaves things in a
bit of an inconsistent state though, sorry.

It's shwi rather than uhwi because of the thing I mentioned in the other
reply (to a later patch) about negative values counting from the end.
E.g. the element selected by ~(T) 0 depends on the width of T if the
vector is not a power of 2 in size (which can be true for variable-length
vectors, or at the rtl level for fixed-length vectors), and in practice
the width of T is arbitrary.  Using signed values and counting from the
end for negative indices avoids that: the index selected by -1 does not
depend on the number of bits used to store the -1.

> The rest of the changes look ok, please give target maintainers a
> chance to review.

Thanks,
Richard

>
> Thanks,
> Richard.
>
>
>> +      return false;
>> +
>> +  builder->reserve (nelts);
>> +  for (unsigned int i = 0; i < nelts; ++i)
>> +    builder->quick_push (tree_to_shwi (vector_cst_elt (cst, i))
>> +                        & (2 * nelts - 1));
>> +  return true;
>> +}
>> +
>> +/* Return a CONST_VECTOR of mode MODE that contains the elements of
>> +   INDICES.  */
>> +
>> +rtx
>> +vec_perm_indices_to_rtx (machine_mode mode, const vec_perm_indices &indices)
>> +{
>> +  gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
>> +             && GET_MODE_NUNITS (mode) == indices.length ());
>> +  unsigned int nelts = indices.length ();
>> +  rtvec v = rtvec_alloc (nelts);
>> +  for (unsigned int i = 0; i < nelts; ++i)
>> +    RTVEC_ELT (v, i) = gen_int_mode (indices[i], GET_MODE_INNER (mode));
>> +  return gen_rtx_CONST_VECTOR (mode, v);
>> +}
>> Index: gcc/target.h
>> ===================================================================
>> --- gcc/target.h        2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/target.h        2017-12-09 22:47:27.882318099 +0000
>> @@ -193,13 +193,7 @@ enum vect_cost_model_location {
>>    vect_epilogue = 2
>>  };
>>
>> -/* The type to use for vector permutes with a constant permute vector.
>> -   Each entry is an index into the concatenated input vectors.  */
>> -typedef vec<unsigned short> vec_perm_indices;
>> -
>> -/* Same, but can be used to construct local permute vectors that are
>> -   automatically freed.  */
>> -typedef auto_vec<unsigned short, 32> auto_vec_perm_indices;
>> +class vec_perm_indices;
>>
>>  /* The target structure.  This holds all the backend hooks.  */
>>  #define DEFHOOKPOD(NAME, DOC, TYPE, INIT) TYPE NAME;
>> Index: gcc/optabs.h
>> ===================================================================
>> --- gcc/optabs.h        2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/optabs.h        2017-12-09 22:47:27.882318099 +0000
>> @@ -22,6 +22,7 @@ #define GCC_OPTABS_H
>>
>>  #include "optabs-query.h"
>>  #include "optabs-libfuncs.h"
>> +#include "vec-perm-indices.h"
>>
>>  /* Generate code for a widening multiply.  */
>>  extern rtx expand_widening_mult (machine_mode, rtx, rtx, rtx, int, optab);
>> @@ -307,7 +308,9 @@ extern int have_insn_for (enum rtx_code,
>>  extern rtx_insn *gen_cond_trap (enum rtx_code, rtx, rtx, rtx);
>>
>>  /* Generate code for VEC_PERM_EXPR.  */
>> -extern rtx expand_vec_perm (machine_mode, rtx, rtx, rtx, rtx);
>> +extern rtx expand_vec_perm_var (machine_mode, rtx, rtx, rtx, rtx);
>> +extern rtx expand_vec_perm_const (machine_mode, rtx, rtx,
>> + const vec_perm_builder &, machine_mode, rtx);
>>
>>  /* Generate code for vector comparison.  */
>>  extern rtx expand_vec_cmp_expr (tree, tree, rtx);
>> Index: gcc/target.def
>> ===================================================================
>> --- gcc/target.def      2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/target.def      2017-12-09 22:47:27.882318099 +0000
>> @@ -1841,12 +1841,27 @@ DEFHOOK
>>   bool, (const_tree type, bool is_packed),
>>   default_builtin_vector_alignment_reachable)
>>
>> -/* Return true if a vector created for vec_perm_const is valid.
>> -   A NULL indicates that all constants are valid permutations.  */
>>  DEFHOOK
>> -(vec_perm_const_ok,
>> - "Return true if a vector created for @code{vec_perm_const} is valid.",
>> - bool, (machine_mode, vec_perm_indices),
>> +(vec_perm_const,
>> + "This hook is used to test whether the target can permute up to two\n\
>> +vectors of mode @var{mode} using the permutation vector @code{sel}, and\n\
>> +also to emit such a permutation.  In the former case @var{in0}, @var{in1}\n\
>> +and @var{out} are all null.  In the latter case @var{in0} and @var{in1} are\n\
>> +the source vectors and @var{out} is the destination vector; all three are\n\
>> +registers of mode @var{mode}.  @var{in1} is the same as @var{in0} if\n\
>> +@var{sel} describes a permutation on one vector instead of two.\n\
>> +\n\
>> +Return true if the operation is possible, emitting instructions for it\n\
>> +if rtxes are provided.\n\
>> +\n\
>> +@cindex @code{vec_perm@var{m}} instruction pattern\n\
>> +If the hook returns false for a mode with multibyte elements, GCC will\n\
>> +try the equivalent byte operation.  If that also fails, it will try forcing\n\
>> +the selector into a register and using the @var{vec_perm@var{mode}}\n\
>> +instruction pattern.  There is no need for the hook to handle these two\n\
>> +implementation approaches itself.",
>> + bool, (machine_mode mode, rtx output, rtx in0, rtx in1,
>> +       const vec_perm_indices &sel),
>>   NULL)
>>
>>  /* Return true if the target supports misaligned store/load of a
>> Index: gcc/doc/tm.texi.in
>> ===================================================================
>> --- gcc/doc/tm.texi.in  2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/doc/tm.texi.in  2017-12-09 22:47:27.879318098 +0000
>> @@ -4079,7 +4079,7 @@ address;  but often a machine-dependent
>>
>>  @hook TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE
>>
>> -@hook TARGET_VECTORIZE_VEC_PERM_CONST_OK
>> +@hook TARGET_VECTORIZE_VEC_PERM_CONST
>>
>>  @hook TARGET_VECTORIZE_BUILTIN_CONVERSION
>>
>> Index: gcc/doc/tm.texi
>> ===================================================================
>> --- gcc/doc/tm.texi     2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/doc/tm.texi     2017-12-09 22:47:27.878318097 +0000
>> @@ -5798,8 +5798,24 @@ correct for most targets.
>> Return true if vector alignment is reachable (by peeling N iterations)
> for the given scalar type @var{type}.  @var{is_packed} is false if the
> scalar access using @var{type} is known to be naturally aligned.
>>  @end deftypefn
>>
>> -@deftypefn {Target Hook} bool TARGET_VECTORIZE_VEC_PERM_CONST_OK (machine_mode, @var{vec_perm_indices})
>> -Return true if a vector created for @code{vec_perm_const} is valid.
>> +@deftypefn {Target Hook} bool TARGET_VECTORIZE_VEC_PERM_CONST (machine_mode @var{mode}, rtx @var{output}, rtx @var{in0}, rtx @var{in1}, const vec_perm_indices @var{&sel})
>> +This hook is used to test whether the target can permute up to two
>> +vectors of mode @var{mode} using the permutation vector @code{sel}, and
>> +also to emit such a permutation.  In the former case @var{in0}, @var{in1}
>> +and @var{out} are all null.  In the latter case @var{in0} and @var{in1} are
>> +the source vectors and @var{out} is the destination vector; all three are
>> +registers of mode @var{mode}.  @var{in1} is the same as @var{in0} if
>> +@var{sel} describes a permutation on one vector instead of two.
>> +
>> +Return true if the operation is possible, emitting instructions for it
>> +if rtxes are provided.
>> +
>> +@cindex @code{vec_perm@var{m}} instruction pattern
>> +If the hook returns false for a mode with multibyte elements, GCC will
>> +try the equivalent byte operation.  If that also fails, it will try forcing
>> +the selector into a register and using the @code{vec_perm@var{m}}
>> +instruction pattern.  There is no need for the hook to handle these two
>> +implementation approaches itself.
>>  @end deftypefn
>>
>>  @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_CONVERSION (unsigned @var{code}, tree @var{dest_type}, tree @var{src_type})
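
(Aside, not part of the patch: given the hook documentation above, a
backend implementation might look roughly like the sketch below.  The
target name and the emit helper are hypothetical; only the signature and
the "all rtxes null means query" convention come from the patch.)

  static bool
  mytarget_vectorize_vec_perm_const (machine_mode mode, rtx output,
                                     rtx in0, rtx in1,
                                     const vec_perm_indices &sel)
  {
    unsigned int nelt = sel.length ();

    /* Handle only the single-input reversal { nelt-1, ..., 1, 0 }.  */
    for (unsigned int i = 0; i < nelt; ++i)
      if (sel[i] != nelt - 1 - i)
        return false;

    /* Query mode: OUTPUT, IN0 and IN1 are all null.  */
    if (!output)
      return true;

    /* Emit mode: IN0 == IN1 here because the selector uses one input.  */
    mytarget_emit_reverse (mode, output, in0);  /* hypothetical emitter */
    return true;
  }

  #undef TARGET_VECTORIZE_VEC_PERM_CONST
  #define TARGET_VECTORIZE_VEC_PERM_CONST mytarget_vectorize_vec_perm_const
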
>> Index: gcc/optabs.def
>> ===================================================================
>> --- gcc/optabs.def      2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/optabs.def      2017-12-09 22:47:27.882318099 +0000
>> @@ -302,7 +302,6 @@ OPTAB_D (vec_pack_ssat_optab, "vec_pack_
>>  OPTAB_D (vec_pack_trunc_optab, "vec_pack_trunc_$a")
>>  OPTAB_D (vec_pack_ufix_trunc_optab, "vec_pack_ufix_trunc_$a")
>>  OPTAB_D (vec_pack_usat_optab, "vec_pack_usat_$a")
>> -OPTAB_D (vec_perm_const_optab, "vec_perm_const$a")
>>  OPTAB_D (vec_perm_optab, "vec_perm$a")
>>  OPTAB_D (vec_realign_load_optab, "vec_realign_load_$a")
>>  OPTAB_D (vec_set_optab, "vec_set$a")
>> Index: gcc/doc/md.texi
>> ===================================================================
>> --- gcc/doc/md.texi     2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/doc/md.texi     2017-12-09 22:47:27.877318096 +0000
>> @@ -4972,20 +4972,8 @@ where @var{q} is a vector of @code{QImod
>>  the middle-end will lower the mode @var{m} @code{VEC_PERM_EXPR} to
>>  mode @var{q}.
>>
>> -@cindex @code{vec_perm_const@var{m}} instruction pattern
>> -@item @samp{vec_perm_const@var{m}}
>> -Like @samp{vec_perm} except that the permutation is a compile-time
>> -constant.  That is, operand 3, the @dfn{selector}, is a @code{CONST_VECTOR}.
>> -
>> -Some targets cannot perform a permutation with a variable selector,
>> -but can efficiently perform a constant permutation.  Further, the
>> -target hook @code{vec_perm_ok} is queried to determine if the
>> -specific constant permutation is available efficiently; the named
>> -pattern is never expanded without @code{vec_perm_ok} returning true.
>> -
>> -There is no need for a target to supply both @samp{vec_perm@var{m}}
>> -and @samp{vec_perm_const@var{m}} if the former can trivially implement
>> -the operation with, say, the vector constant loaded into a register.
>> +See also @code{TARGET_VECTORIZE_VEC_PERM_CONST}, which performs
>> +the analogous operation for constant selectors.
>>
>>  @cindex @code{push@var{m}1} instruction pattern
>>  @item @samp{push@var{m}1}
>> Index: gcc/expr.c
>> ===================================================================
>> --- gcc/expr.c  2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/expr.c  2017-12-09 22:47:27.880318098 +0000
>> @@ -9439,28 +9439,24 @@ #define REDUCE_BIT_FIELD(expr)  (reduce_b
>>        goto binop;
>>
>>      case VEC_PERM_EXPR:
>> -      expand_operands (treeop0, treeop1, target, &op0, &op1, EXPAND_NORMAL);
>> -      op2 = expand_normal (treeop2);
>> -
>> -      /* Careful here: if the target doesn't support integral vector modes,
>> -        a constant selection vector could wind up smooshed into a normal
>> -        integral constant.  */
>> -      if (CONSTANT_P (op2) && !VECTOR_MODE_P (GET_MODE (op2)))
>> -       {
>> -         tree sel_type = TREE_TYPE (treeop2);
>> -         machine_mode vmode
>> -           = mode_for_vector (SCALAR_TYPE_MODE (TREE_TYPE (sel_type)),
>> -                              TYPE_VECTOR_SUBPARTS (sel_type)).require ();
>> -         gcc_assert (GET_MODE_CLASS (vmode) == MODE_VECTOR_INT);
>> -         op2 = simplify_subreg (vmode, op2, TYPE_MODE (sel_type), 0);
>> -         gcc_assert (op2 && GET_CODE (op2) == CONST_VECTOR);
>> -       }
>> -      else
>> -        gcc_assert (GET_MODE_CLASS (GET_MODE (op2)) == MODE_VECTOR_INT);
>> -
>> -      temp = expand_vec_perm (mode, op0, op1, op2, target);
>> -      gcc_assert (temp);
>> -      return temp;
>> +      {
>> +       expand_operands (treeop0, treeop1, target, &op0, &op1, EXPAND_NORMAL);
>> +       vec_perm_builder sel;
>> +       if (TREE_CODE (treeop2) == VECTOR_CST
>> +           && tree_to_vec_perm_builder (&sel, treeop2))
>> +         {
>> +           machine_mode sel_mode = TYPE_MODE (TREE_TYPE (treeop2));
>> +           temp = expand_vec_perm_const (mode, op0, op1, sel,
>> +                                         sel_mode, target);
>> +         }
>> +       else
>> +         {
>> +           op2 = expand_normal (treeop2);
>> +           temp = expand_vec_perm_var (mode, op0, op1, op2, target);
>> +         }
>> +       gcc_assert (temp);
>> +       return temp;
>> +      }
>>
>>      case DOT_PROD_EXPR:
>>        {
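
(Example only, not part of the patch: with GCC's generic vector
extensions, a constant __builtin_shuffle mask becomes a VEC_PERM_EXPR
with a VECTOR_CST selector and so, with this change, is expanded through
expand_vec_perm_const, while a variable mask still goes through
expand_vec_perm_var.)

  typedef int v4si __attribute__ ((vector_size (16)));

  v4si
  interleave_lo (v4si a, v4si b)
  {
    /* Constant selector -> expand_vec_perm_const.  */
    return __builtin_shuffle (a, b, (v4si) { 0, 4, 1, 5 });
  }

  v4si
  shuffle_var (v4si a, v4si b, v4si mask)
  {
    /* Variable selector -> expand_vec_perm_var.  */
    return __builtin_shuffle (a, b, mask);
  }
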
>> Index: gcc/optabs-query.h
>> ===================================================================
>> --- gcc/optabs-query.h  2017-12-09 22:47:21.534314227 +0000
>> +++ gcc/optabs-query.h  2017-12-09 22:47:27.881318099 +0000
>> @@ -175,6 +175,7 @@ enum insn_code can_float_p (machine_mode
>>  enum insn_code can_fix_p (machine_mode, machine_mode, int, bool *);
>>  bool can_conditionally_move_p (machine_mode mode);
>>  opt_machine_mode qimode_for_vec_perm (machine_mode);
>> +bool selector_fits_mode_p (machine_mode, const vec_perm_indices &);
>>  bool can_vec_perm_var_p (machine_mode);
>>  bool can_vec_perm_const_p (machine_mode, const vec_perm_indices &,
>>                            bool = true);
>> Index: gcc/optabs-query.c
>> ===================================================================
>> --- gcc/optabs-query.c  2017-12-09 22:47:25.861316866 +0000
>> +++ gcc/optabs-query.c  2017-12-09 22:47:27.881318099 +0000
>> @@ -28,6 +28,7 @@ Software Foundation; either version 3, o
>>  #include "insn-config.h"
>>  #include "rtl.h"
>>  #include "recog.h"
>> +#include "vec-perm-indices.h"
>>
>>  struct target_optabs default_target_optabs;
>>  struct target_optabs *this_fn_optabs = &default_target_optabs;
>> @@ -361,6 +362,17 @@ qimode_for_vec_perm (machine_mode mode)
>>    return opt_machine_mode ();
>>  }
>>
>> +/* Return true if selector SEL can be represented in the integer
>> +   equivalent of vector mode MODE.  */
>> +
>> +bool
>> +selector_fits_mode_p (machine_mode mode, const vec_perm_indices &sel)
>> +{
>> +  unsigned HOST_WIDE_INT mask = GET_MODE_MASK (GET_MODE_INNER (mode));
>> +  return (mask == HOST_WIDE_INT_M1U
>> +         || sel.all_in_range_p (0, mask + 1));
>> +}
>> +
>>  /* Return true if VEC_PERM_EXPRs with variable selector operands can be
>>     expanded using SIMD extensions of the CPU.  MODE is the mode of the
>>     vectors being permuted.  */
>> @@ -416,18 +428,22 @@ can_vec_perm_const_p (machine_mode mode,
>>      return false;
>>
>>    /* It's probably cheaper to test for the variable case first.  */
>> -  if (allow_variable_p && can_vec_perm_var_p (mode))
>> +  if (allow_variable_p
>> +      && selector_fits_mode_p (mode, sel)
>> +      && can_vec_perm_var_p (mode))
>>      return true;
>>
>> -  if (direct_optab_handler (vec_perm_const_optab, mode) != CODE_FOR_nothing)
>> +  if (targetm.vectorize.vec_perm_const != NULL)
>>      {
>> -      if (targetm.vectorize.vec_perm_const_ok == NULL
>> -         || targetm.vectorize.vec_perm_const_ok (mode, sel))
>> +      if (targetm.vectorize.vec_perm_const (mode, NULL_RTX, NULL_RTX,
>> +                                           NULL_RTX, sel))
>>         return true;
>>
>>        /* ??? For completeness, we ought to check the QImode version of
>>          vec_perm_const_optab.  But all users of this implicit lowering
>> -        feature implement the variable vec_perm_optab.  */
>> +        feature implement the variable vec_perm_optab, and the ia64
>> +        port specifically doesn't want us to lower V2SF operations
>> +        into integer operations.  */
>>      }
>>
>>    return false;
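
(A note in passing: the new selector_fits_mode_p only asks whether every
index is representable in one element of the integer vector mode, since
the variable vec_perm optab takes the selector in such a vector.  A
standalone sketch of the same test, in plain C rather than GCC
internals:)

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  /* ELT_MASK plays the role of GET_MODE_MASK (GET_MODE_INNER (mode)),
     e.g. 0xff for QImode elements.  */
  static bool
  selector_fits_p (uint64_t elt_mask, const uint64_t *sel, size_t n)
  {
    if (elt_mask == UINT64_MAX)
      return true;                      /* every index fits */
    for (size_t i = 0; i < n; ++i)
      if (sel[i] > elt_mask)            /* outside [0, elt_mask] */
        return false;
    return true;
  }

  int
  main (void)
  {
    /* Two-input permute of V16QI: indices 0..31 all fit in a byte.  */
    uint64_t sel[32];
    for (size_t i = 0; i < 32; ++i)
      sel[i] = i;
    return selector_fits_p (0xff, sel, 32) ? 0 : 1;
  }
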
>> Index: gcc/optabs.c
>> ===================================================================
>> --- gcc/optabs.c        2017-12-09 22:47:25.861316866 +0000
>> +++ gcc/optabs.c        2017-12-09 22:47:27.881318099 +0000
>> @@ -5367,25 +5367,23 @@ vector_compare_rtx (machine_mode cmp_mod
>>    return gen_rtx_fmt_ee (rcode, cmp_mode, ops[0].value, ops[1].value);
>>  }
>>
>> -/* Checks if vec_perm mask SEL is a constant equivalent to a shift of the first
>> -   vec_perm operand, assuming the second operand is a constant vector of zeroes.
>> -   Return the shift distance in bits if so, or NULL_RTX if the vec_perm is not a
>> -   shift.  */
>> +/* Check if vec_perm mask SEL is a constant equivalent to a shift of
>> +   the first vec_perm operand, assuming the second operand is a constant
>> +   vector of zeros.  Return the shift distance in bits if so, or NULL_RTX
>> +   if the vec_perm is not a shift.  MODE is the mode of the value being
>> +   shifted.  */
>>  static rtx
>> -shift_amt_for_vec_perm_mask (rtx sel)
>> +shift_amt_for_vec_perm_mask (machine_mode mode, const vec_perm_indices &sel)
>>  {
>> -  unsigned int i, first, nelt = GET_MODE_NUNITS (GET_MODE (sel));
>> -  unsigned int bitsize = GET_MODE_UNIT_BITSIZE (GET_MODE (sel));
>> +  unsigned int i, first, nelt = GET_MODE_NUNITS (mode);
>> +  unsigned int bitsize = GET_MODE_UNIT_BITSIZE (mode);
>>
>> -  if (GET_CODE (sel) != CONST_VECTOR)
>> -    return NULL_RTX;
>> -
>> -  first = INTVAL (CONST_VECTOR_ELT (sel, 0));
>> +  first = sel[0];
>>    if (first >= nelt)
>>      return NULL_RTX;
>>    for (i = 1; i < nelt; i++)
>>      {
>> -      int idx = INTVAL (CONST_VECTOR_ELT (sel, i));
>> +      int idx = sel[i];
>>        unsigned int expected = i + first;
>>        /* Indices into the second vector are all equivalent.  */
>>        if (idx < 0 || (MIN (nelt, (unsigned) idx) != MIN (nelt, expected)))
>> @@ -5395,7 +5393,7 @@ shift_amt_for_vec_perm_mask (rtx sel)
>>    return GEN_INT (first * bitsize);
>>  }
>>
>> -/* A subroutine of expand_vec_perm for expanding one vec_perm insn.  */
>> +/* A subroutine of expand_vec_perm_var for expanding one vec_perm insn.  */
>>
>>  static rtx
>>  expand_vec_perm_1 (enum insn_code icode, rtx target,
>> @@ -5433,38 +5431,32 @@ expand_vec_perm_1 (enum insn_code icode,
>>    return NULL_RTX;
>>  }
>>
>> -static rtx expand_vec_perm_var (machine_mode, rtx, rtx, rtx, rtx);
>> -
>>  /* Implement a permutation of vectors v0 and v1 using the permutation
>>     vector in SEL and return the result.  Use TARGET to hold the result
>>     if nonnull and convenient.
>>
>> -   MODE is the mode of the vectors being permuted (V0 and V1).  */
>> +   MODE is the mode of the vectors being permuted (V0 and V1).  SEL_MODE
>> +   is the TYPE_MODE associated with SEL, or BLKmode if SEL isn't known
>> +   to have a particular mode.  */
>>
>>  rtx
>> -expand_vec_perm (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target)
>> +expand_vec_perm_const (machine_mode mode, rtx v0, rtx v1,
>> +                      const vec_perm_builder &sel, machine_mode sel_mode,
>> +                      rtx target)
>>  {
>> -  enum insn_code icode;
>> -  machine_mode qimode;
>> -  unsigned int i, w, e, u;
>> -  rtx tmp, sel_qi = NULL;
>> -  rtvec vec;
>> -
>> -  if (GET_CODE (sel) != CONST_VECTOR)
>> -    return expand_vec_perm_var (mode, v0, v1, sel, target);
>> -
>> -  if (!target || GET_MODE (target) != mode)
>> +  if (!target || !register_operand (target, mode))
>>      target = gen_reg_rtx (mode);
>>
>> -  w = GET_MODE_SIZE (mode);
>> -  e = GET_MODE_NUNITS (mode);
>> -  u = GET_MODE_UNIT_SIZE (mode);
>> -
>>    /* Set QIMODE to a different vector mode with byte elements.
>>       If no such mode, or if MODE already has byte elements, use VOIDmode.  */
>> +  machine_mode qimode;
>>    if (!qimode_for_vec_perm (mode).exists (&qimode))
>>      qimode = VOIDmode;
>>
>> +  rtx_insn *last = get_last_insn ();
>> +
>> +  bool single_arg_p = rtx_equal_p (v0, v1);
>> +
>>    /* See if this can be handled with a vec_shr.  We only do this if the
>>       second vector is all zeroes.  */
>>    insn_code shift_code = optab_handler (vec_shr_optab, mode);
>> @@ -5476,7 +5468,7 @@ expand_vec_perm (machine_mode mode, rtx
>>        && (shift_code != CODE_FOR_nothing
>>           || shift_code_qi != CODE_FOR_nothing))
>>      {
>> -      rtx shift_amt = shift_amt_for_vec_perm_mask (sel);
>> +      rtx shift_amt = shift_amt_for_vec_perm_mask (mode, sel);
>>        if (shift_amt)
>>         {
>>           struct expand_operand ops[3];
>> @@ -5500,65 +5492,81 @@ expand_vec_perm (machine_mode mode, rtx
>>         }
>>      }
>>
>> -  icode = direct_optab_handler (vec_perm_const_optab, mode);
>> -  if (icode != CODE_FOR_nothing)
>> +  if (targetm.vectorize.vec_perm_const != NULL)
>>      {
>> -      tmp = expand_vec_perm_1 (icode, target, v0, v1, sel);
>> -      if (tmp)
>> -       return tmp;
>> +      v0 = force_reg (mode, v0);
>> +      if (single_arg_p)
>> +       v1 = v0;
>> +      else
>> +       v1 = force_reg (mode, v1);
>> +
>> +      if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, sel))
>> +       return target;
>>      }
>>
>>    /* Fall back to a constant byte-based permutation.  */
>> +  vec_perm_indices qimode_indices;
>> +  rtx target_qi = NULL_RTX, v0_qi = NULL_RTX, v1_qi = NULL_RTX;
>>    if (qimode != VOIDmode)
>>      {
>> -      vec = rtvec_alloc (w);
>> -      for (i = 0; i < e; ++i)
>> -       {
>> -         unsigned int j, this_e;
>> +      qimode_indices.new_expanded_vector (sel, GET_MODE_UNIT_SIZE (mode));
>> +      target_qi = gen_reg_rtx (qimode);
>> +      v0_qi = gen_lowpart (qimode, v0);
>> +      v1_qi = gen_lowpart (qimode, v1);
>> +      if (targetm.vectorize.vec_perm_const != NULL
>> +         && targetm.vectorize.vec_perm_const (qimode, target_qi, v0_qi,
>> +                                              v1_qi, qimode_indices))
>> +       return gen_lowpart (mode, target_qi);
>> +    }
>>
>> -         this_e = INTVAL (CONST_VECTOR_ELT (sel, i));
>> -         this_e &= 2 * e - 1;
>> -         this_e *= u;
>> +  /* Otherwise expand as a fully variable permutation.  */
>>
>> -         for (j = 0; j < u; ++j)
>> -           RTVEC_ELT (vec, i * u + j) = GEN_INT (this_e + j);
>> -       }
>> -      sel_qi = gen_rtx_CONST_VECTOR (qimode, vec);
>> +  /* The optabs are only defined for selectors with the same width
>> +     as the values being permuted.  */
>> +  machine_mode required_sel_mode;
>> +  if (!mode_for_int_vector (mode).exists (&required_sel_mode)
>> +      || !VECTOR_MODE_P (required_sel_mode))
>> +    {
>> +      delete_insns_since (last);
>> +      return NULL_RTX;
>> +    }
>>
>> -      icode = direct_optab_handler (vec_perm_const_optab, qimode);
>> -      if (icode != CODE_FOR_nothing)
>> +  /* We know that it is semantically valid to treat SEL as having SEL_MODE.
>> +     If that isn't the mode we want then we need to prove that using
>> +     REQUIRED_SEL_MODE is OK.  */
>> +  if (sel_mode != required_sel_mode)
>> +    {
>> +      if (!selector_fits_mode_p (required_sel_mode, sel))
>>         {
>> -         tmp = gen_reg_rtx (qimode);
>> -         tmp = expand_vec_perm_1 (icode, tmp, gen_lowpart (qimode, v0),
>> -                                  gen_lowpart (qimode, v1), sel_qi);
>> -         if (tmp)
>> -           return gen_lowpart (mode, tmp);
>> +         delete_insns_since (last);
>> +         return NULL_RTX;
>>         }
>> +      sel_mode = required_sel_mode;
>>      }
>>
>> -  /* Otherwise expand as a fully variable permuation.  */
>> -
>> -  icode = direct_optab_handler (vec_perm_optab, mode);
>> +  insn_code icode = direct_optab_handler (vec_perm_optab, mode);
>>    if (icode != CODE_FOR_nothing)
>>      {
>> -      rtx tmp = expand_vec_perm_1 (icode, target, v0, v1, sel);
>> +      rtx sel_rtx = vec_perm_indices_to_rtx (sel_mode, sel);
>> +      rtx tmp = expand_vec_perm_1 (icode, target, v0, v1, sel_rtx);
>>        if (tmp)
>>         return tmp;
>>      }
>>
>> -  if (qimode != VOIDmode)
>> +  if (qimode != VOIDmode
>> +      && selector_fits_mode_p (qimode, qimode_indices))
>>      {
>>        icode = direct_optab_handler (vec_perm_optab, qimode);
>>        if (icode != CODE_FOR_nothing)
>>         {
>> -         rtx tmp = gen_reg_rtx (qimode);
>> -         tmp = expand_vec_perm_1 (icode, tmp, gen_lowpart (qimode, v0),
>> -                                  gen_lowpart (qimode, v1), sel_qi);
>> +         rtx sel_qi = vec_perm_indices_to_rtx (qimode, qimode_indices);
>> +         rtx tmp = expand_vec_perm_1 (icode, target_qi, v0_qi, v1_qi, sel_qi);
>>           if (tmp)
>>             return gen_lowpart (mode, tmp);
>>         }
>>      }
>>
>> +  delete_insns_since (last);
>>    return NULL_RTX;
>>  }
>>
>> @@ -5570,7 +5578,7 @@ expand_vec_perm (machine_mode mode, rtx
>>     SEL must have the integer equivalent of MODE and is known to be
>>     unsuitable for permutes with a constant permutation vector.  */
>>
>> -static rtx
>> +rtx
>>  expand_vec_perm_var (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target)
>>  {
>>    enum insn_code icode;
>> @@ -5613,17 +5621,16 @@ expand_vec_perm_var (machine_mode mode,
>>    gcc_assert (sel != NULL);
>>
>>    /* Broadcast the low byte each element into each of its bytes.  */
>> -  vec = rtvec_alloc (w);
>> +  vec_perm_builder const_sel (w);
>>    for (i = 0; i < w; ++i)
>>      {
>>        int this_e = i / u * u;
>>        if (BYTES_BIG_ENDIAN)
>>         this_e += u - 1;
>> -      RTVEC_ELT (vec, i) = GEN_INT (this_e);
>> +      const_sel.quick_push (this_e);
>>      }
>> -  tmp = gen_rtx_CONST_VECTOR (qimode, vec);
>>    sel = gen_lowpart (qimode, sel);
>> -  sel = expand_vec_perm (qimode, sel, sel, tmp, NULL);
>> +  sel = expand_vec_perm_const (qimode, sel, sel, const_sel, qimode, NULL);
>>    gcc_assert (sel != NULL);
>>
>>    /* Add the byte offset to each byte element.  */
>> @@ -5797,9 +5804,8 @@ expand_mult_highpart (machine_mode mode,
>>    enum insn_code icode;
>>    int method, i, nunits;
>>    machine_mode wmode;
>> -  rtx m1, m2, perm;
>> +  rtx m1, m2;
>>    optab tab1, tab2;
>> -  rtvec v;
>>
>>    method = can_mult_highpart_p (mode, uns_p);
>>    switch (method)
>> @@ -5842,21 +5848,20 @@ expand_mult_highpart (machine_mode mode,
>>    expand_insn (optab_handler (tab2, mode), 3, eops);
>>    m2 = gen_lowpart (mode, eops[0].value);
>>
>> -  v = rtvec_alloc (nunits);
>> +  auto_vec_perm_indices sel (nunits);
>>    if (method == 2)
>>      {
>>        for (i = 0; i < nunits; ++i)
>> -       RTVEC_ELT (v, i) = GEN_INT (!BYTES_BIG_ENDIAN + (i & ~1)
>> -                                   + ((i & 1) ? nunits : 0));
>> -      perm = gen_rtx_CONST_VECTOR (mode, v);
>> +       sel.quick_push (!BYTES_BIG_ENDIAN + (i & ~1)
>> +                       + ((i & 1) ? nunits : 0));
>>      }
>>    else
>>      {
>> -      int base = BYTES_BIG_ENDIAN ? 0 : 1;
>> -      perm = gen_const_vec_series (mode, GEN_INT (base), GEN_INT (2));
>> +      for (i = 0; i < nunits; ++i)
>> +       sel.quick_push (2 * i + (BYTES_BIG_ENDIAN ? 0 : 1));
>>      }
>>
>> -  return expand_vec_perm (mode, m1, m2, perm, target);
>> +  return expand_vec_perm_const (mode, m1, m2, sel, BLKmode, target);
>>  }
>>
>>  /* Helper function to find the MODE_CC set in a sync_compare_and_swap
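
(For what it's worth, the selector shapes built in expand_mult_highpart
above are easier to see with numbers: for a 4-element vector on a
little-endian target the loops produce { 1, 5, 3, 7 } for method 2
(even/odd widening multiplies) and { 1, 3, 5, 7 } for method 3 (lo/hi
widening multiplies), i.e. the sub-elements holding the high halves of
the widened products.  A throwaway check:)

  #include <stdio.h>

  int
  main (void)
  {
    const int nunits = 4, bytes_big_endian = 0;  /* little-endian example */

    printf ("method 2:");
    for (int i = 0; i < nunits; ++i)
      printf (" %d", !bytes_big_endian + (i & ~1) + ((i & 1) ? nunits : 0));

    printf ("\nmethod 3:");
    for (int i = 0; i < nunits; ++i)
      printf (" %d", 2 * i + (bytes_big_endian ? 0 : 1));

    printf ("\n");              /* prints 1 5 3 7 and 1 3 5 7 */
    return 0;
  }
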
>> Index: gcc/fold-const.c
>> ===================================================================
>> --- gcc/fold-const.c    2017-12-09 22:47:21.534314227 +0000
>> +++ gcc/fold-const.c    2017-12-09 22:47:27.881318099 +0000
>> @@ -82,6 +82,7 @@ Software Foundation; either version 3, o
>>  #include "stringpool.h"
>>  #include "attribs.h"
>>  #include "tree-vector-builder.h"
>> +#include "vec-perm-indices.h"
>>
>>  /* Nonzero if we are folding constants inside an initializer; zero
>>     otherwise.  */
>> Index: gcc/tree-ssa-forwprop.c
>> ===================================================================
>> --- gcc/tree-ssa-forwprop.c     2017-12-09 22:47:21.534314227 +0000
>> +++ gcc/tree-ssa-forwprop.c     2017-12-09 22:47:27.883318100 +0000
>> @@ -47,6 +47,7 @@ the Free Software Foundation; either ver
>>  #include "cfganal.h"
>>  #include "optabs-tree.h"
>>  #include "tree-vector-builder.h"
>> +#include "vec-perm-indices.h"
>>
>>  /* This pass propagates the RHS of assignment statements into use
>>     sites of the LHS of the assignment.  It's basically a specialized
>> Index: gcc/tree-vect-data-refs.c
>> ===================================================================
>> --- gcc/tree-vect-data-refs.c   2017-12-09 22:47:21.535314227 +0000
>> +++ gcc/tree-vect-data-refs.c   2017-12-09 22:47:27.883318100 +0000
>> @@ -52,6 +52,7 @@ Software Foundation; either version 3, o
>>  #include "params.h"
>>  #include "tree-cfg.h"
>>  #include "tree-hash-traits.h"
>> +#include "vec-perm-indices.h"
>>
>>  /* Return true if load- or store-lanes optab OPTAB is implemented for
>>     COUNT vectors of type VECTYPE.  NAME is the name of OPTAB.  */
>> Index: gcc/tree-vect-generic.c
>> ===================================================================
>> --- gcc/tree-vect-generic.c     2017-12-09 22:47:21.535314227 +0000
>> +++ gcc/tree-vect-generic.c     2017-12-09 22:47:27.883318100 +0000
>> @@ -38,6 +38,7 @@ Free Software Foundation; either version
>>  #include "gimplify.h"
>>  #include "tree-cfg.h"
>>  #include "tree-vector-builder.h"
>> +#include "vec-perm-indices.h"
>>
>>
>>  static void expand_vector_operations_1 (gimple_stmt_iterator *);
>> Index: gcc/tree-vect-loop.c
>> ===================================================================
>> --- gcc/tree-vect-loop.c        2017-12-09 22:47:21.536314228 +0000
>> +++ gcc/tree-vect-loop.c        2017-12-09 22:47:27.884318101 +0000
>> @@ -52,6 +52,7 @@ Software Foundation; either version 3, o
>>  #include "tree-if-conv.h"
>>  #include "internal-fn.h"
>>  #include "tree-vector-builder.h"
>> +#include "vec-perm-indices.h"
>>
>>  /* Loop Vectorization Pass.
>>
>> Index: gcc/tree-vect-slp.c
>> ===================================================================
>> --- gcc/tree-vect-slp.c 2017-12-09 22:47:21.536314228 +0000
>> +++ gcc/tree-vect-slp.c 2017-12-09 22:47:27.884318101 +0000
>> @@ -42,6 +42,7 @@ Software Foundation; either version 3, o
>>  #include "gimple-walk.h"
>>  #include "dbgcnt.h"
>>  #include "tree-vector-builder.h"
>> +#include "vec-perm-indices.h"
>>
>>
>>  /* Recursively free the memory allocated for the SLP tree rooted at NODE.  */
>> Index: gcc/tree-vect-stmts.c
>> ===================================================================
>> --- gcc/tree-vect-stmts.c       2017-12-09 22:47:21.537314229 +0000
>> +++ gcc/tree-vect-stmts.c       2017-12-09 22:47:27.885318101 +0000
>> @@ -49,6 +49,7 @@ Software Foundation; either version 3, o
>>  #include "builtins.h"
>>  #include "internal-fn.h"
>>  #include "tree-vector-builder.h"
>> +#include "vec-perm-indices.h"
>>
>>  /* For lang_hooks.types.type_for_mode.  */
>>  #include "langhooks.h"
>> Index: gcc/config/aarch64/aarch64-protos.h
>> ===================================================================
>> --- gcc/config/aarch64/aarch64-protos.h 2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/config/aarch64/aarch64-protos.h 2017-12-09 22:47:27.854318082 +0000
>> @@ -474,8 +474,6 @@ extern void aarch64_split_combinev16qi (
>>  extern void aarch64_expand_vec_perm (rtx, rtx, rtx, rtx, unsigned int);
>>  extern bool aarch64_madd_needs_nop (rtx_insn *);
>>  extern void aarch64_final_prescan_insn (rtx_insn *);
>> -extern bool
>> -aarch64_expand_vec_perm_const (rtx, rtx, rtx, rtx, unsigned int);
>>  void aarch64_atomic_assign_expand_fenv (tree *, tree *, tree *);
>>  int aarch64_ccmp_mode_to_code (machine_mode mode);
>>
>> Index: gcc/config/aarch64/aarch64-simd.md
>> ===================================================================
>> --- gcc/config/aarch64/aarch64-simd.md  2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/config/aarch64/aarch64-simd.md  2017-12-09 22:47:27.854318082 +0000
>> @@ -5348,20 +5348,6 @@ (define_expand "aarch64_get_qreg<VSTRUCT
>>
>>  ;; vec_perm support
>>
>> -(define_expand "vec_perm_const<mode>"
>> -  [(match_operand:VALL_F16 0 "register_operand")
>> -   (match_operand:VALL_F16 1 "register_operand")
>> -   (match_operand:VALL_F16 2 "register_operand")
>> -   (match_operand:<V_INT_EQUIV> 3)]
>> -  "TARGET_SIMD"
>> -{
>> -  if (aarch64_expand_vec_perm_const (operands[0], operands[1],
>> -                                    operands[2], operands[3], <nunits>))
>> -    DONE;
>> -  else
>> -    FAIL;
>> -})
>> -
>>  (define_expand "vec_perm<mode>"
>>    [(match_operand:VB 0 "register_operand")
>>     (match_operand:VB 1 "register_operand")
>> Index: gcc/config/aarch64/aarch64.c
>> ===================================================================
>> --- gcc/config/aarch64/aarch64.c        2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/config/aarch64/aarch64.c        2017-12-09 22:47:27.856318084 +0000
>> @@ -141,8 +141,6 @@ static void aarch64_elf_asm_constructor
>>  static void aarch64_elf_asm_destructor (rtx, int) ATTRIBUTE_UNUSED;
>>  static void aarch64_override_options_after_change (void);
>>  static bool aarch64_vector_mode_supported_p (machine_mode);
>> -static bool aarch64_vectorize_vec_perm_const_ok (machine_mode,
>> -                                                vec_perm_indices);
>>  static int aarch64_address_cost (rtx, machine_mode, addr_space_t, bool);
>>  static bool aarch64_builtin_support_vector_misalignment (machine_mode mode,
>>                                                          const_tree type,
>> @@ -13626,29 +13624,27 @@ aarch64_expand_vec_perm_const_1 (struct
>>    return false;
>>  }
>>
>> -/* Expand a vec_perm_const pattern with the operands given by TARGET,
>> -   OP0, OP1 and SEL.  NELT is the number of elements in the vector.  */
>> +/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
>>
>> -bool
>> -aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel,
>> -                              unsigned int nelt)
>> +static bool
>> +aarch64_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
>> +                                 rtx op1, const vec_perm_indices &sel)
>>  {
>>    struct expand_vec_perm_d d;
>>    unsigned int i, which;
>>
>> +  d.vmode = vmode;
>>    d.target = target;
>>    d.op0 = op0;
>>    d.op1 = op1;
>> +  d.testing_p = !target;
>>
>> -  d.vmode = GET_MODE (target);
>> -  gcc_assert (VECTOR_MODE_P (d.vmode));
>> -  d.testing_p = false;
>> -
>> +  /* Calculate whether all elements are in one vector.  */
>> +  unsigned int nelt = sel.length ();
>>    d.perm.reserve (nelt);
>>    for (i = which = 0; i < nelt; ++i)
>>      {
>> -      rtx e = XVECEXP (sel, 0, i);
>> -      unsigned int ei = INTVAL (e) & (2 * nelt - 1);
>> +      unsigned int ei = sel[i] & (2 * nelt - 1);
>>        which |= (ei < nelt ? 1 : 2);
>>        d.perm.quick_push (ei);
>>      }
>> @@ -13660,7 +13656,7 @@ aarch64_expand_vec_perm_const (rtx targe
>>
>>      case 3:
>>        d.one_vector_p = false;
>> -      if (!rtx_equal_p (op0, op1))
>> +      if (d.testing_p || !rtx_equal_p (op0, op1))
>>         break;
>>
>>        /* The elements of PERM do not suggest that only the first operand
>> @@ -13681,37 +13677,8 @@ aarch64_expand_vec_perm_const (rtx targe
>>        break;
>>      }
>>
>> -  return aarch64_expand_vec_perm_const_1 (&d);
>> -}
>> -
>> -static bool
>> -aarch64_vectorize_vec_perm_const_ok (machine_mode vmode, vec_perm_indices sel)
>> -{
>> -  struct expand_vec_perm_d d;
>> -  unsigned int i, nelt, which;
>> -  bool ret;
>> -
>> -  d.vmode = vmode;
>> -  d.testing_p = true;
>> -  d.perm.safe_splice (sel);
>> -
>> -  /* Calculate whether all elements are in one vector.  */
>> -  nelt = sel.length ();
>> -  for (i = which = 0; i < nelt; ++i)
>> -    {
>> -      unsigned int e = d.perm[i];
>> -      gcc_assert (e < 2 * nelt);
>> -      which |= (e < nelt ? 1 : 2);
>> -    }
>> -
>> -  /* If all elements are from the second vector, reindex as if from the
>> -     first vector.  */
>> -  if (which == 2)
>> -    for (i = 0; i < nelt; ++i)
>> -      d.perm[i] -= nelt;
>> -
>> -  /* Check whether the mask can be applied to a single vector.  */
>> -  d.one_vector_p = (which != 3);
>> +  if (!d.testing_p)
>> +    return aarch64_expand_vec_perm_const_1 (&d);
>>
>>    d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
>>    d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
>> @@ -13719,7 +13686,7 @@ aarch64_vectorize_vec_perm_const_ok (mac
>>      d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
>>
>>    start_sequence ();
>> -  ret = aarch64_expand_vec_perm_const_1 (&d);
>> +  bool ret = aarch64_expand_vec_perm_const_1 (&d);
>>    end_sequence ();
>>
>>    return ret;
>> @@ -15471,9 +15438,9 @@ #define TARGET_VECTORIZE_VECTOR_ALIGNMEN
>>
>>  /* vec_perm support.  */
>>
>> -#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
>> -#define TARGET_VECTORIZE_VEC_PERM_CONST_OK \
>> -  aarch64_vectorize_vec_perm_const_ok
>> +#undef TARGET_VECTORIZE_VEC_PERM_CONST
>> +#define TARGET_VECTORIZE_VEC_PERM_CONST \
>> +  aarch64_vectorize_vec_perm_const
>>
>>  #undef TARGET_INIT_LIBFUNCS
>>  #define TARGET_INIT_LIBFUNCS aarch64_init_libfuncs
>> Index: gcc/config/arm/arm-protos.h
>> ===================================================================
>> --- gcc/config/arm/arm-protos.h 2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/config/arm/arm-protos.h 2017-12-09 22:47:27.856318084 +0000
>> @@ -357,7 +357,6 @@ extern bool arm_validize_comparison (rtx
>>
>>  extern bool arm_gen_setmem (rtx *);
>>  extern void arm_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel);
>> -extern bool arm_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel);
>>
>>  extern bool arm_autoinc_modes_ok_p (machine_mode, enum arm_auto_incmodes);
>>
>> Index: gcc/config/arm/vec-common.md
>> ===================================================================
>> --- gcc/config/arm/vec-common.md        2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/config/arm/vec-common.md        2017-12-09 22:47:27.858318085 +0000
>> @@ -109,35 +109,6 @@ (define_expand "umax<mode>3"
>>  {
>>  })
>>
>> -(define_expand "vec_perm_const<mode>"
>> -  [(match_operand:VALL 0 "s_register_operand" "")
>> -   (match_operand:VALL 1 "s_register_operand" "")
>> -   (match_operand:VALL 2 "s_register_operand" "")
>> -   (match_operand:<V_cmp_result> 3 "" "")]
>> -  "TARGET_NEON
>> -   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
>> -{
>> -  if (arm_expand_vec_perm_const (operands[0], operands[1],
>> -                                operands[2], operands[3]))
>> -    DONE;
>> -  else
>> -    FAIL;
>> -})
>> -
>> -(define_expand "vec_perm_const<mode>"
>> -  [(match_operand:VH 0 "s_register_operand")
>> -   (match_operand:VH 1 "s_register_operand")
>> -   (match_operand:VH 2 "s_register_operand")
>> -   (match_operand:<V_cmp_result> 3)]
>> -  "TARGET_NEON"
>> -{
>> -  if (arm_expand_vec_perm_const (operands[0], operands[1],
>> -                                operands[2], operands[3]))
>> -    DONE;
>> -  else
>> -    FAIL;
>> -})
>> -
>>  (define_expand "vec_perm<mode>"
>>    [(match_operand:VE 0 "s_register_operand" "")
>>     (match_operand:VE 1 "s_register_operand" "")
>> Index: gcc/config/arm/arm.c
>> ===================================================================
>> --- gcc/config/arm/arm.c        2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/config/arm/arm.c        2017-12-09 22:47:27.858318085 +0000
>> @@ -288,7 +288,8 @@ static int arm_cortex_a5_branch_cost (bo
>>  static int arm_cortex_m_branch_cost (bool, bool);
>>  static int arm_cortex_m7_branch_cost (bool, bool);
>>
>> -static bool arm_vectorize_vec_perm_const_ok (machine_mode, vec_perm_indices);
>> +static bool arm_vectorize_vec_perm_const (machine_mode, rtx, rtx, rtx,
>> +                                         const vec_perm_indices &);
>>
>>  static bool aarch_macro_fusion_pair_p (rtx_insn*, rtx_insn*);
>>
>> @@ -734,9 +735,8 @@ #define TARGET_VECTORIZE_SUPPORT_VECTOR_
>>  #define TARGET_PREFERRED_RENAME_CLASS \
>>    arm_preferred_rename_class
>>
>> -#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
>> -#define TARGET_VECTORIZE_VEC_PERM_CONST_OK \
>> -  arm_vectorize_vec_perm_const_ok
>> +#undef TARGET_VECTORIZE_VEC_PERM_CONST
>> +#define TARGET_VECTORIZE_VEC_PERM_CONST arm_vectorize_vec_perm_const
>>
>>  #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
>>  #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \
>> @@ -29381,28 +29381,31 @@ arm_expand_vec_perm_const_1 (struct expa
>>    return false;
>>  }
>>
>> -/* Expand a vec_perm_const pattern.  */
>> +/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
>>
>> -bool
>> -arm_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel)
>> +static bool
>> +arm_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0, rtx op1,
>> +                             const vec_perm_indices &sel)
>>  {
>>    struct expand_vec_perm_d d;
>>    int i, nelt, which;
>>
>> +  if (!VALID_NEON_DREG_MODE (vmode) && !VALID_NEON_QREG_MODE (vmode))
>> +    return false;
>> +
>>    d.target = target;
>>    d.op0 = op0;
>>    d.op1 = op1;
>>
>> -  d.vmode = GET_MODE (target);
>> +  d.vmode = vmode;
>>    gcc_assert (VECTOR_MODE_P (d.vmode));
>> -  d.testing_p = false;
>> +  d.testing_p = !target;
>>
>>    nelt = GET_MODE_NUNITS (d.vmode);
>>    d.perm.reserve (nelt);
>>    for (i = which = 0; i < nelt; ++i)
>>      {
>> -      rtx e = XVECEXP (sel, 0, i);
>> -      int ei = INTVAL (e) & (2 * nelt - 1);
>> +      int ei = sel[i] & (2 * nelt - 1);
>>        which |= (ei < nelt ? 1 : 2);
>>        d.perm.quick_push (ei);
>>      }
>> @@ -29414,7 +29417,7 @@ arm_expand_vec_perm_const (rtx target, r
>>
>>      case 3:
>>        d.one_vector_p = false;
>> -      if (!rtx_equal_p (op0, op1))
>> +      if (d.testing_p || !rtx_equal_p (op0, op1))
>>         break;
>>
>>        /* The elements of PERM do not suggest that only the first operand
>> @@ -29435,38 +29438,8 @@ arm_expand_vec_perm_const (rtx target, r
>>        break;
>>      }
>>
>> -  return arm_expand_vec_perm_const_1 (&d);
>> -}
>> -
>> -/* Implement TARGET_VECTORIZE_VEC_PERM_CONST_OK.  */
>> -
>> -static bool
>> -arm_vectorize_vec_perm_const_ok (machine_mode vmode, vec_perm_indices sel)
>> -{
>> -  struct expand_vec_perm_d d;
>> -  unsigned int i, nelt, which;
>> -  bool ret;
>> -
>> -  d.vmode = vmode;
>> -  d.testing_p = true;
>> -  d.perm.safe_splice (sel);
>> -
>> -  /* Categorize the set of elements in the selector.  */
>> -  nelt = GET_MODE_NUNITS (d.vmode);
>> -  for (i = which = 0; i < nelt; ++i)
>> -    {
>> -      unsigned int e = d.perm[i];
>> -      gcc_assert (e < 2 * nelt);
>> -      which |= (e < nelt ? 1 : 2);
>> -    }
>> -
>> -  /* For all elements from second vector, fold the elements to first.  */
>> -  if (which == 2)
>> -    for (i = 0; i < nelt; ++i)
>> -      d.perm[i] -= nelt;
>> -
>> -  /* Check whether the mask can be applied to the vector type.  */
>> -  d.one_vector_p = (which != 3);
>> +  if (!d.testing_p)
>> +    return arm_expand_vec_perm_const_1 (&d);
>>
>>    d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
>>    d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
>> @@ -29474,7 +29447,7 @@ arm_vectorize_vec_perm_const_ok (machine
>>      d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
>>
>>    start_sequence ();
>> -  ret = arm_expand_vec_perm_const_1 (&d);
>> +  bool ret = arm_expand_vec_perm_const_1 (&d);
>>    end_sequence ();
>>
>>    return ret;
>> Index: gcc/config/i386/i386-protos.h
>> ===================================================================
>> --- gcc/config/i386/i386-protos.h       2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/config/i386/i386-protos.h       2017-12-09 22:47:27.859318085 +0000
>> @@ -133,7 +133,6 @@ extern bool ix86_expand_fp_movcc (rtx[])
>>  extern bool ix86_expand_fp_vcond (rtx[]);
>>  extern bool ix86_expand_int_vcond (rtx[]);
>>  extern void ix86_expand_vec_perm (rtx[]);
>> -extern bool ix86_expand_vec_perm_const (rtx[]);
>>  extern bool ix86_expand_mask_vec_cmp (rtx[]);
>>  extern bool ix86_expand_int_vec_cmp (rtx[]);
>>  extern bool ix86_expand_fp_vec_cmp (rtx[]);
>> Index: gcc/config/i386/sse.md
>> ===================================================================
>> --- gcc/config/i386/sse.md      2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/config/i386/sse.md      2017-12-09 22:47:27.863318088 +0000
>> @@ -11476,30 +11476,6 @@ (define_expand "vec_perm<mode>"
>>    DONE;
>>  })
>>
>> -(define_mode_iterator VEC_PERM_CONST
>> -  [(V4SF "TARGET_SSE") (V4SI "TARGET_SSE")
>> -   (V2DF "TARGET_SSE") (V2DI "TARGET_SSE")
>> -   (V16QI "TARGET_SSE2") (V8HI "TARGET_SSE2")
>> -   (V8SF "TARGET_AVX") (V4DF "TARGET_AVX")
>> -   (V8SI "TARGET_AVX") (V4DI "TARGET_AVX")
>> -   (V32QI "TARGET_AVX2") (V16HI "TARGET_AVX2")
>> -   (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
>> -   (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
>> -   (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")])
>> -
>> -(define_expand "vec_perm_const<mode>"
>> -  [(match_operand:VEC_PERM_CONST 0 "register_operand")
>> -   (match_operand:VEC_PERM_CONST 1 "register_operand")
>> -   (match_operand:VEC_PERM_CONST 2 "register_operand")
>> -   (match_operand:<sseintvecmode> 3)]
>> -  ""
>> -{
>> -  if (ix86_expand_vec_perm_const (operands))
>> -    DONE;
>> -  else
>> -    FAIL;
>> -})
>> -
>>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>>  ;;
>>  ;; Parallel bitwise logical operations
>> Index: gcc/config/i386/i386.c
>> ===================================================================
>> --- gcc/config/i386/i386.c      2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/config/i386/i386.c      2017-12-09 22:47:27.862318087 +0000
>> @@ -47588,9 +47588,8 @@ expand_vec_perm_vpshufb4_vpermq2 (struct
>>    return true;
>>  }
>>
>> -/* The guts of ix86_expand_vec_perm_const, also used by the ok hook.
>> -   With all of the interface bits taken care of, perform the expansion
>> -   in D and return true on success.  */
>> +/* The guts of ix86_vectorize_vec_perm_const.  With all of the interface bits
>> +   taken care of, perform the expansion in D and return true on success.  */
>>
>>  static bool
>>  ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
>> @@ -47725,69 +47724,29 @@ canonicalize_perm (struct expand_vec_per
>>    return (which == 3);
>>  }
>>
>> -bool
>> -ix86_expand_vec_perm_const (rtx operands[4])
>> +/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
>> +
>> +static bool
>> +ix86_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
>> +                              rtx op1, const vec_perm_indices &sel)
>>  {
>>    struct expand_vec_perm_d d;
>>    unsigned char perm[MAX_VECT_LEN];
>> -  int i, nelt;
>> +  unsigned int i, nelt, which;
>>    bool two_args;
>> -  rtx sel;
>>
>> -  d.target = operands[0];
>> -  d.op0 = operands[1];
>> -  d.op1 = operands[2];
>> -  sel = operands[3];
>> +  d.target = target;
>> +  d.op0 = op0;
>> +  d.op1 = op1;
>>
>> -  d.vmode = GET_MODE (d.target);
>> +  d.vmode = vmode;
>>    gcc_assert (VECTOR_MODE_P (d.vmode));
>>    d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
>> -  d.testing_p = false;
>> +  d.testing_p = !target;
>>
>> -  gcc_assert (GET_CODE (sel) == CONST_VECTOR);
>> -  gcc_assert (XVECLEN (sel, 0) == nelt);
>> +  gcc_assert (sel.length () == nelt);
>>    gcc_checking_assert (sizeof (d.perm) == sizeof (perm));
>>
>> -  for (i = 0; i < nelt; ++i)
>> -    {
>> -      rtx e = XVECEXP (sel, 0, i);
>> -      int ei = INTVAL (e) & (2 * nelt - 1);
>> -      d.perm[i] = ei;
>> -      perm[i] = ei;
>> -    }
>> -
>> -  two_args = canonicalize_perm (&d);
>> -
>> -  if (ix86_expand_vec_perm_const_1 (&d))
>> -    return true;
>> -
>> -  /* If the selector says both arguments are needed, but the operands are the
>> -     same, the above tried to expand with one_operand_p and flattened selector.
>> -     If that didn't work, retry without one_operand_p; we succeeded with that
>> -     during testing.  */
>> -  if (two_args && d.one_operand_p)
>> -    {
>> -      d.one_operand_p = false;
>> -      memcpy (d.perm, perm, sizeof (perm));
>> -      return ix86_expand_vec_perm_const_1 (&d);
>> -    }
>> -
>> -  return false;
>> -}
>> -
>> -/* Implement targetm.vectorize.vec_perm_const_ok.  */
>> -
>> -static bool
>> -ix86_vectorize_vec_perm_const_ok (machine_mode vmode, vec_perm_indices sel)
>> -{
>> -  struct expand_vec_perm_d d;
>> -  unsigned int i, nelt, which;
>> -  bool ret;
>> -
>> -  d.vmode = vmode;
>> -  d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
>> -  d.testing_p = true;
>> -
>>    /* Given sufficient ISA support we can just return true here
>>       for selected vector modes.  */
>>    switch (d.vmode)
>> @@ -47796,17 +47755,23 @@ ix86_vectorize_vec_perm_const_ok (machin
>>      case E_V16SImode:
>>      case E_V8DImode:
>>      case E_V8DFmode:
>> -      if (TARGET_AVX512F)
>> -       /* All implementable with a single vperm[it]2 insn.  */
>> +      if (!TARGET_AVX512F)
>> +       return false;
>> +      /* All implementable with a single vperm[it]2 insn.  */
>> +      if (d.testing_p)
>>         return true;
>>        break;
>>      case E_V32HImode:
>> -      if (TARGET_AVX512BW)
>> +      if (!TARGET_AVX512BW)
>> +       return false;
>> +      if (d.testing_p)
>>         /* All implementable with a single vperm[it]2 insn.  */
>>         return true;
>>        break;
>>      case E_V64QImode:
>> -      if (TARGET_AVX512BW)
>> +      if (!TARGET_AVX512BW)
>> +       return false;
>> +      if (d.testing_p)
>>         /* Implementable with 2 vperm[it]2, 2 vpshufb and 1 or insn.  */
>>         return true;
>>        break;
>> @@ -47814,73 +47779,108 @@ ix86_vectorize_vec_perm_const_ok (machin
>>      case E_V8SFmode:
>>      case E_V4DFmode:
>>      case E_V4DImode:
>> -      if (TARGET_AVX512VL)
>> +      if (!TARGET_AVX)
>> +       return false;
>> +      if (d.testing_p && TARGET_AVX512VL)
>>         /* All implementable with a single vperm[it]2 insn.  */
>>         return true;
>>        break;
>>      case E_V16HImode:
>> -      if (TARGET_AVX2)
>> +      if (!TARGET_SSE2)
>> +       return false;
>> +      if (d.testing_p && TARGET_AVX2)
>>         /* Implementable with 4 vpshufb insns, 2 vpermq and 3 vpor insns.  */
>>         return true;
>>        break;
>>      case E_V32QImode:
>> -      if (TARGET_AVX2)
>> +      if (!TARGET_SSE2)
>> +       return false;
>> +      if (d.testing_p && TARGET_AVX2)
>>         /* Implementable with 4 vpshufb insns, 2 vpermq and 3 vpor insns.  */
>>         return true;
>>        break;
>> -    case E_V4SImode:
>> -    case E_V4SFmode:
>>      case E_V8HImode:
>>      case E_V16QImode:
>> +      if (!TARGET_SSE2)
>> +       return false;
>> +      /* Fall through.  */
>> +    case E_V4SImode:
>> +    case E_V4SFmode:
>> +      if (!TARGET_SSE)
>> +       return false;
>>        /* All implementable with a single vpperm insn.  */
>> -      if (TARGET_XOP)
>> +      if (d.testing_p && TARGET_XOP)
>>         return true;
>>        /* All implementable with 2 pshufb + 1 ior.  */
>> -      if (TARGET_SSSE3)
>> +      if (d.testing_p && TARGET_SSSE3)
>>         return true;
>>        break;
>>      case E_V2DImode:
>>      case E_V2DFmode:
>> +      if (!TARGET_SSE)
>> +       return false;
>>        /* All implementable with shufpd or unpck[lh]pd.  */
>> -      return true;
>> +      if (d.testing_p)
>> +       return true;
>> +      break;
>>      default:
>>        return false;
>>      }
>>
>> -  /* Extract the values from the vector CST into the permutation
>> -     array in D.  */
>>    for (i = which = 0; i < nelt; ++i)
>>      {
>>        unsigned char e = sel[i];
>>        gcc_assert (e < 2 * nelt);
>>        d.perm[i] = e;
>> +      perm[i] = e;
>>        which |= (e < nelt ? 1 : 2);
>>      }
>>
>> -  /* For all elements from second vector, fold the elements to first.  */
>> -  if (which == 2)
>> -    for (i = 0; i < nelt; ++i)
>> -      d.perm[i] -= nelt;
>> +  if (d.testing_p)
>> +    {
>> +      /* For all elements from second vector, fold the elements to first.  */
>> +      if (which == 2)
>> +       for (i = 0; i < nelt; ++i)
>> +         d.perm[i] -= nelt;
>> +
>> +      /* Check whether the mask can be applied to the vector type.  */
>> +      d.one_operand_p = (which != 3);
>> +
>> +      /* Implementable with shufps or pshufd.  */
>> +      if (d.one_operand_p && (d.vmode == V4SFmode || d.vmode == V4SImode))
>> +       return true;
>> +
>> +      /* Otherwise we have to go through the motions and see if we can
>> +        figure out how to generate the requested permutation.  */
>> +      d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
>> +      d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
>> +      if (!d.one_operand_p)
>> +       d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
>> +
>> +      start_sequence ();
>> +      bool ret = ix86_expand_vec_perm_const_1 (&d);
>> +      end_sequence ();
>>
>> -  /* Check whether the mask can be applied to the vector type.  */
>> -  d.one_operand_p = (which != 3);
>> +      return ret;
>> +    }
>>
>> -  /* Implementable with shufps or pshufd.  */
>> -  if (d.one_operand_p && (d.vmode == V4SFmode || d.vmode == V4SImode))
>> +  two_args = canonicalize_perm (&d);
>> +
>> +  if (ix86_expand_vec_perm_const_1 (&d))
>>      return true;
>>
>> -  /* Otherwise we have to go through the motions and see if we can
>> -     figure out how to generate the requested permutation.  */
>> -  d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
>> -  d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
>> -  if (!d.one_operand_p)
>> -    d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
>> -
>> -  start_sequence ();
>> -  ret = ix86_expand_vec_perm_const_1 (&d);
>> -  end_sequence ();
>> +  /* If the selector says both arguments are needed, but the operands are the
>> +     same, the above tried to expand with one_operand_p and flattened selector.
>> +     If that didn't work, retry without one_operand_p; we succeeded with that
>> +     during testing.  */
>> +  if (two_args && d.one_operand_p)
>> +    {
>> +      d.one_operand_p = false;
>> +      memcpy (d.perm, perm, sizeof (perm));
>> +      return ix86_expand_vec_perm_const_1 (&d);
>> +    }
>>
>> -  return ret;
>> +  return false;
>>  }
>>
>>  void
>> @@ -50532,9 +50532,8 @@ #define TARGET_CLASS_LIKELY_SPILLED_P ix
>>  #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
>>  #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \
>>    ix86_builtin_vectorization_cost
>> -#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
>> -#define TARGET_VECTORIZE_VEC_PERM_CONST_OK \
>> -  ix86_vectorize_vec_perm_const_ok
>> +#undef TARGET_VECTORIZE_VEC_PERM_CONST
>> +#define TARGET_VECTORIZE_VEC_PERM_CONST ix86_vectorize_vec_perm_const
>>  #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
>>  #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE \
>>    ix86_preferred_simd_mode
>> Index: gcc/config/ia64/ia64-protos.h
>> ===================================================================
>> --- gcc/config/ia64/ia64-protos.h       2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/config/ia64/ia64-protos.h       2017-12-09 22:47:27.864318089 +0000
>> @@ -62,7 +62,6 @@ extern const char *get_bundle_name (int)
>>  extern const char *output_probe_stack_range (rtx, rtx);
>>
>>  extern void ia64_expand_vec_perm_even_odd (rtx, rtx, rtx, int);
>> -extern bool ia64_expand_vec_perm_const (rtx op[4]);
>>  extern void ia64_expand_vec_setv2sf (rtx op[3]);
>>  #endif /* RTX_CODE */
>>
>> Index: gcc/config/ia64/vect.md
>> ===================================================================
>> --- gcc/config/ia64/vect.md     2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/config/ia64/vect.md     2017-12-09 22:47:27.865318089 +0000
>> @@ -1549,19 +1549,6 @@ (define_expand "vec_pack_trunc_v2si"
>>    DONE;
>>  })
>>
>> -(define_expand "vec_perm_const<mode>"
>> -  [(match_operand:VEC 0 "register_operand" "")
>> -   (match_operand:VEC 1 "register_operand" "")
>> -   (match_operand:VEC 2 "register_operand" "")
>> -   (match_operand:<vecint> 3 "" "")]
>> -  ""
>> -{
>> -  if (ia64_expand_vec_perm_const (operands))
>> -    DONE;
>> -  else
>> -    FAIL;
>> -})
>> -
>>  ;; Missing operations
>>  ;; fprcpa
>>  ;; fpsqrta
>> Index: gcc/config/ia64/ia64.c
>> ===================================================================
>> --- gcc/config/ia64/ia64.c      2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/config/ia64/ia64.c      2017-12-09 22:47:27.864318089 +0000
>> @@ -333,7 +333,8 @@ static fixed_size_mode ia64_get_reg_raw_
>>  static section * ia64_hpux_function_section (tree, enum node_frequency,
>>                                              bool, bool);
>>
>> -static bool ia64_vectorize_vec_perm_const_ok (machine_mode, vec_perm_indices);
>> +static bool ia64_vectorize_vec_perm_const (machine_mode, rtx, rtx, rtx,
>> +                                          const vec_perm_indices &);
>>
>>  static unsigned int ia64_hard_regno_nregs (unsigned int, machine_mode);
>>  static bool ia64_hard_regno_mode_ok (unsigned int, machine_mode);
>> @@ -652,8 +653,8 @@ #define TARGET_DELAY_SCHED2 true
>>  #undef TARGET_DELAY_VARTRACK
>>  #define TARGET_DELAY_VARTRACK true
>>
>> -#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
>> -#define TARGET_VECTORIZE_VEC_PERM_CONST_OK ia64_vectorize_vec_perm_const_ok
>> +#undef TARGET_VECTORIZE_VEC_PERM_CONST
>> +#define TARGET_VECTORIZE_VEC_PERM_CONST ia64_vectorize_vec_perm_const
>>
>>  #undef TARGET_ATTRIBUTE_TAKES_IDENTIFIER_P
>>  #define TARGET_ATTRIBUTE_TAKES_IDENTIFIER_P ia64_attribute_takes_identifier_p
>> @@ -11741,32 +11742,31 @@ ia64_expand_vec_perm_const_1 (struct exp
>>    return false;
>>  }
>>
>> -bool
>> -ia64_expand_vec_perm_const (rtx operands[4])
>> +/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
>> +
>> +static bool
>> +ia64_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
>> +                              rtx op1, const vec_perm_indices &sel)
>>  {
>>    struct expand_vec_perm_d d;
>>    unsigned char perm[MAX_VECT_LEN];
>> -  int i, nelt, which;
>> -  rtx sel;
>> +  unsigned int i, nelt, which;
>>
>> -  d.target = operands[0];
>> -  d.op0 = operands[1];
>> -  d.op1 = operands[2];
>> -  sel = operands[3];
>> +  d.target = target;
>> +  d.op0 = op0;
>> +  d.op1 = op1;
>>
>> -  d.vmode = GET_MODE (d.target);
>> +  d.vmode = vmode;
>>    gcc_assert (VECTOR_MODE_P (d.vmode));
>>    d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
>> -  d.testing_p = false;
>> +  d.testing_p = !target;
>>
>> -  gcc_assert (GET_CODE (sel) == CONST_VECTOR);
>> -  gcc_assert (XVECLEN (sel, 0) == nelt);
>> +  gcc_assert (sel.length () == nelt);
>>    gcc_checking_assert (sizeof (d.perm) == sizeof (perm));
>>
>>    for (i = which = 0; i < nelt; ++i)
>>      {
>> -      rtx e = XVECEXP (sel, 0, i);
>> -      int ei = INTVAL (e) & (2 * nelt - 1);
>> +      unsigned int ei = sel[i] & (2 * nelt - 1);
>>
>>        which |= (ei < nelt ? 1 : 2);
>>        d.perm[i] = ei;
>> @@ -11779,7 +11779,7 @@ ia64_expand_vec_perm_const (rtx operands
>>        gcc_unreachable();
>>
>>      case 3:
>> -      if (!rtx_equal_p (d.op0, d.op1))
>> +      if (d.testing_p || !rtx_equal_p (d.op0, d.op1))
>>         {
>>           d.one_operand_p = false;
>>           break;
>> @@ -11807,6 +11807,22 @@ ia64_expand_vec_perm_const (rtx operands
>>        break;
>>      }
>>
>> +  if (d.testing_p)
>> +    {
>> +      /* We have to go through the motions and see if we can
>> +        figure out how to generate the requested permutation.  */
>> +      d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
>> +      d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
>> +      if (!d.one_operand_p)
>> +       d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
>> +
>> +      start_sequence ();
>> +      bool ret = ia64_expand_vec_perm_const_1 (&d);
>> +      end_sequence ();
>> +
>> +      return ret;
>> +    }
>> +
>>    if (ia64_expand_vec_perm_const_1 (&d))
>>      return true;
>>
>> @@ -11823,51 +11839,6 @@ ia64_expand_vec_perm_const (rtx operands
>>    return false;
>>  }
>>
>> -/* Implement targetm.vectorize.vec_perm_const_ok.  */
>> -
>> -static bool
>> -ia64_vectorize_vec_perm_const_ok (machine_mode vmode, vec_perm_indices sel)
>> -{
>> -  struct expand_vec_perm_d d;
>> -  unsigned int i, nelt, which;
>> -  bool ret;
>> -
>> -  d.vmode = vmode;
>> -  d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
>> -  d.testing_p = true;
>> -
>> -  /* Extract the values from the vector CST into the permutation
>> -     array in D.  */
>> -  for (i = which = 0; i < nelt; ++i)
>> -    {
>> -      unsigned char e = sel[i];
>> -      d.perm[i] = e;
>> -      gcc_assert (e < 2 * nelt);
>> -      which |= (e < nelt ? 1 : 2);
>> -    }
>> -
>> -  /* For all elements from second vector, fold the elements to first.  */
>> -  if (which == 2)
>> -    for (i = 0; i < nelt; ++i)
>> -      d.perm[i] -= nelt;
>> -
>> -  /* Check whether the mask can be applied to the vector type.  */
>> -  d.one_operand_p = (which != 3);
>> -
>> -  /* Otherwise we have to go through the motions and see if we can
>> -     figure out how to generate the requested permutation.  */
>> -  d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
>> -  d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
>> -  if (!d.one_operand_p)
>> -    d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
>> -
>> -  start_sequence ();
>> -  ret = ia64_expand_vec_perm_const_1 (&d);
>> -  end_sequence ();
>> -
>> -  return ret;
>> -}
>> -
>>  void
>>  ia64_expand_vec_setv2sf (rtx operands[3])
>>  {
>> Index: gcc/config/mips/loongson.md
>> ===================================================================
>> --- gcc/config/mips/loongson.md 2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/config/mips/loongson.md 2017-12-09 22:47:27.865318089 +0000
>> @@ -784,19 +784,6 @@ (define_insn "*loongson_punpcklwd_hi"
>>    "punpcklwd\t%0,%1,%2"
>>    [(set_attr "type" "fcvt")])
>>
>> -(define_expand "vec_perm_const<mode>"
>> -  [(match_operand:VWHB 0 "register_operand" "")
>> -   (match_operand:VWHB 1 "register_operand" "")
>> -   (match_operand:VWHB 2 "register_operand" "")
>> -   (match_operand:VWHB 3 "" "")]
>> -  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
>> -{
>> -  if (mips_expand_vec_perm_const (operands))
>> -    DONE;
>> -  else
>> -    FAIL;
>> -})
>> -
>>  (define_expand "vec_unpacks_lo_<mode>"
>>    [(match_operand:<V_stretch_half> 0 "register_operand" "")
>>     (match_operand:VHB 1 "register_operand" "")]
>> Index: gcc/config/mips/mips-msa.md
>> ===================================================================
>> --- gcc/config/mips/mips-msa.md 2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/config/mips/mips-msa.md 2017-12-09 22:47:27.865318089 +0000
>> @@ -558,19 +558,6 @@ (define_insn_and_split "msa_copy_s_<msaf
>>    [(set_attr "type" "simd_copy")
>>     (set_attr "mode" "<MODE>")])
>>
>> -(define_expand "vec_perm_const<mode>"
>> -  [(match_operand:MSA 0 "register_operand")
>> -   (match_operand:MSA 1 "register_operand")
>> -   (match_operand:MSA 2 "register_operand")
>> -   (match_operand:<VIMODE> 3 "")]
>> -  "ISA_HAS_MSA"
>> -{
>> -  if (mips_expand_vec_perm_const (operands))
>> -    DONE;
>> -  else
>> -    FAIL;
>> -})
>> -
>>  (define_expand "abs<mode>2"
>>    [(match_operand:IMSA 0 "register_operand" "=f")
>>     (abs:IMSA (match_operand:IMSA 1 "register_operand" "f"))]
>> Index: gcc/config/mips/mips-ps-3d.md
>> ===================================================================
>> --- gcc/config/mips/mips-ps-3d.md       2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/config/mips/mips-ps-3d.md       2017-12-09 22:47:27.865318089 +0000
>> @@ -164,19 +164,6 @@ (define_insn "vec_perm_const_ps"
>>    [(set_attr "type" "fmove")
>>     (set_attr "mode" "SF")])
>>
>> -(define_expand "vec_perm_constv2sf"
>> -  [(match_operand:V2SF 0 "register_operand" "")
>> -   (match_operand:V2SF 1 "register_operand" "")
>> -   (match_operand:V2SF 2 "register_operand" "")
>> -   (match_operand:V2SI 3 "" "")]
>> -  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
>> -{
>> -  if (mips_expand_vec_perm_const (operands))
>> -    DONE;
>> -  else
>> -    FAIL;
>> -})
>> -
>>  ;; Expanders for builtins.  The instruction:
>>  ;;
>>  ;;     P[UL][UL].PS <result>, <a>, <b>
>> Index: gcc/config/mips/mips-protos.h
>> ===================================================================
>> --- gcc/config/mips/mips-protos.h       2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/config/mips/mips-protos.h       2017-12-09 22:47:27.865318089 +0000
>> @@ -348,7 +348,6 @@ extern void mips_expand_atomic_qihi (uni
>>                                      rtx, rtx, rtx, rtx);
>>
>>  extern void mips_expand_vector_init (rtx, rtx);
>> -extern bool mips_expand_vec_perm_const (rtx op[4]);
>>  extern void mips_expand_vec_unpack (rtx op[2], bool, bool);
>>  extern void mips_expand_vec_reduc (rtx, rtx, rtx (*)(rtx, rtx, rtx));
>>  extern void mips_expand_vec_minmax (rtx, rtx, rtx,
>> Index: gcc/config/mips/mips.c
>> ===================================================================
>> --- gcc/config/mips/mips.c      2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/config/mips/mips.c      2017-12-09 22:47:27.867318090 +0000
>> @@ -21377,34 +21377,32 @@ mips_expand_vec_perm_const_1 (struct exp
>>    return false;
>>  }
>>
>> -/* Expand a vec_perm_const pattern.  */
>> +/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
>>
>> -bool
>> -mips_expand_vec_perm_const (rtx operands[4])
>> +static bool
>> +mips_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
>> +                              rtx op1, const vec_perm_indices &sel)
>>  {
>>    struct expand_vec_perm_d d;
>>    int i, nelt, which;
>>    unsigned char orig_perm[MAX_VECT_LEN];
>> -  rtx sel;
>>    bool ok;
>>
>> -  d.target = operands[0];
>> -  d.op0 = operands[1];
>> -  d.op1 = operands[2];
>> -  sel = operands[3];
>> -
>> -  d.vmode = GET_MODE (d.target);
>> -  gcc_assert (VECTOR_MODE_P (d.vmode));
>> -  d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
>> -  d.testing_p = false;
>> +  d.target = target;
>> +  d.op0 = op0;
>> +  d.op1 = op1;
>> +
>> +  d.vmode = vmode;
>> +  gcc_assert (VECTOR_MODE_P (vmode));
>> +  d.nelt = nelt = GET_MODE_NUNITS (vmode);
>> +  d.testing_p = !target;
>>
>>    /* This is overly conservative, but ensures we don't get an
>>       uninitialized warning on ORIG_PERM.  */
>>    memset (orig_perm, 0, MAX_VECT_LEN);
>>    for (i = which = 0; i < nelt; ++i)
>>      {
>> -      rtx e = XVECEXP (sel, 0, i);
>> -      int ei = INTVAL (e) & (2 * nelt - 1);
>> +      int ei = sel[i] & (2 * nelt - 1);
>>        which |= (ei < nelt ? 1 : 2);
>>        orig_perm[i] = ei;
>>      }
>> @@ -21417,7 +21415,7 @@ mips_expand_vec_perm_const (rtx operands
>>
>>      case 3:
>>        d.one_vector_p = false;
>> -      if (!rtx_equal_p (d.op0, d.op1))
>> +      if (d.testing_p || !rtx_equal_p (d.op0, d.op1))
>>         break;
>>        /* FALLTHRU */
>>
>> @@ -21434,6 +21432,19 @@ mips_expand_vec_perm_const (rtx operands
>>        break;
>>      }
>>
>> +  if (d.testing_p)
>> +    {
>> +      d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
>> +      d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
>> +      if (!d.one_vector_p)
>> +       d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
>> +
>> +      start_sequence ();
>> +      ok = mips_expand_vec_perm_const_1 (&d);
>> +      end_sequence ();
>> +      return ok;
>> +    }
>> +
>>    ok = mips_expand_vec_perm_const_1 (&d);
>>
>>    /* If we were given a two-vector permutation which just happened to
>> @@ -21445,8 +21456,8 @@ mips_expand_vec_perm_const (rtx operands
>>       the original permutation.  */
>>    if (!ok && which == 3)
>>      {
>> -      d.op0 = operands[1];
>> -      d.op1 = operands[2];
>> +      d.op0 = op0;
>> +      d.op1 = op1;
>>        d.one_vector_p = false;
>>        memcpy (d.perm, orig_perm, MAX_VECT_LEN);
>>        ok = mips_expand_vec_perm_const_1 (&d);
>> @@ -21466,48 +21477,6 @@ mips_sched_reassociation_width (unsigned
>>    return 1;
>>  }
>>
>> -/* Implement TARGET_VECTORIZE_VEC_PERM_CONST_OK.  */
>> -
>> -static bool
>> -mips_vectorize_vec_perm_const_ok (machine_mode vmode, vec_perm_indices sel)
>> -{
>> -  struct expand_vec_perm_d d;
>> -  unsigned int i, nelt, which;
>> -  bool ret;
>> -
>> -  d.vmode = vmode;
>> -  d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
>> -  d.testing_p = true;
>> -
>> -  /* Categorize the set of elements in the selector.  */
>> -  for (i = which = 0; i < nelt; ++i)
>> -    {
>> -      unsigned char e = sel[i];
>> -      d.perm[i] = e;
>> -      gcc_assert (e < 2 * nelt);
>> -      which |= (e < nelt ? 1 : 2);
>> -    }
>> -
>> -  /* For all elements from second vector, fold the elements to first.  */
>> -  if (which == 2)
>> -    for (i = 0; i < nelt; ++i)
>> -      d.perm[i] -= nelt;
>> -
>> -  /* Check whether the mask can be applied to the vector type.  */
>> -  d.one_vector_p = (which != 3);
>> -
>> -  d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
>> -  d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
>> -  if (!d.one_vector_p)
>> -    d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
>> -
>> -  start_sequence ();
>> -  ret = mips_expand_vec_perm_const_1 (&d);
>> -  end_sequence ();
>> -
>> -  return ret;
>> -}
>> -
>>  /* Expand an integral vector unpack operation.  */
>>
>>  void
>> @@ -22589,8 +22558,8 @@ #define TARGET_SHIFT_TRUNCATION_MASK mip
>>  #undef TARGET_PREPARE_PCH_SAVE
>>  #define TARGET_PREPARE_PCH_SAVE mips_prepare_pch_save
>>
>> -#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
>> -#define TARGET_VECTORIZE_VEC_PERM_CONST_OK mips_vectorize_vec_perm_const_ok
>> +#undef TARGET_VECTORIZE_VEC_PERM_CONST
>> +#define TARGET_VECTORIZE_VEC_PERM_CONST mips_vectorize_vec_perm_const
>>
>>  #undef TARGET_SCHED_REASSOCIATION_WIDTH
>>  #define TARGET_SCHED_REASSOCIATION_WIDTH mips_sched_reassociation_width
>> Index: gcc/config/powerpcspe/altivec.md
>> ===================================================================
>> --- gcc/config/powerpcspe/altivec.md    2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/config/powerpcspe/altivec.md    2017-12-09 22:47:27.867318090 +0000
>> @@ -2080,19 +2080,6 @@ (define_expand "vec_permv16qi"
>>    }
>>  })
>>
>> -(define_expand "vec_perm_constv16qi"
>> -  [(match_operand:V16QI 0 "register_operand" "")
>> -   (match_operand:V16QI 1 "register_operand" "")
>> -   (match_operand:V16QI 2 "register_operand" "")
>> -   (match_operand:V16QI 3 "" "")]
>> -  "TARGET_ALTIVEC"
>> -{
>> -  if (altivec_expand_vec_perm_const (operands))
>> -    DONE;
>> -  else
>> -    FAIL;
>> -})
>> -
>>  (define_insn "*altivec_vpermr_<mode>_internal"
>>    [(set (match_operand:VM 0 "register_operand" "=v,?wo")
>>         (unspec:VM [(match_operand:VM 1 "register_operand" "v,wo")
>> Index: gcc/config/powerpcspe/paired.md
>> ===================================================================
>> --- gcc/config/powerpcspe/paired.md     2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/config/powerpcspe/paired.md     2017-12-09 22:47:27.867318090 +0000
>> @@ -313,19 +313,6 @@ (define_insn "paired_merge11"
>>    "ps_merge11 %0, %1, %2"
>>    [(set_attr "type" "fp")])
>>
>> -(define_expand "vec_perm_constv2sf"
>> -  [(match_operand:V2SF 0 "gpc_reg_operand" "")
>> -   (match_operand:V2SF 1 "gpc_reg_operand" "")
>> -   (match_operand:V2SF 2 "gpc_reg_operand" "")
>> -   (match_operand:V2SI 3 "" "")]
>> -  "TARGET_PAIRED_FLOAT"
>> -{
>> -  if (rs6000_expand_vec_perm_const (operands))
>> -    DONE;
>> -  else
>> -    FAIL;
>> -})
>> -
>>  (define_insn "paired_sum0"
>>    [(set (match_operand:V2SF 0 "gpc_reg_operand" "=f")
>>         (vec_concat:V2SF (plus:SF (vec_select:SF
>> Index: gcc/config/powerpcspe/spe.md
>> ===================================================================
>> --- gcc/config/powerpcspe/spe.md        2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/config/powerpcspe/spe.md        2017-12-09 22:47:27.871318093 +0000
>> @@ -511,19 +511,6 @@ (define_insn "vec_perm10_v2si"
>>    [(set_attr "type" "vecsimple")
>>     (set_attr  "length" "4")])
>>
>> -(define_expand "vec_perm_constv2si"
>> -  [(match_operand:V2SI 0 "gpc_reg_operand" "")
>> -   (match_operand:V2SI 1 "gpc_reg_operand" "")
>> -   (match_operand:V2SI 2 "gpc_reg_operand" "")
>> -   (match_operand:V2SI 3 "" "")]
>> -  "TARGET_SPE"
>> -{
>> -  if (rs6000_expand_vec_perm_const (operands))
>> -    DONE;
>> -  else
>> -    FAIL;
>> -})
>> -
>>  (define_expand "spe_evmergehi"
>>    [(match_operand:V2SI 0 "register_operand" "")
>>     (match_operand:V2SI 1 "register_operand" "")
>> Index: gcc/config/powerpcspe/vsx.md
>> ===================================================================
>> --- gcc/config/powerpcspe/vsx.md        2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/config/powerpcspe/vsx.md        2017-12-09 22:47:27.871318093 +0000
>> @@ -2543,19 +2543,6 @@ (define_insn "vsx_xxpermdi2_<mode>_1"
>>  }
>>    [(set_attr "type" "vecperm")])
>>
>> -(define_expand "vec_perm_const<mode>"
>> -  [(match_operand:VSX_D 0 "vsx_register_operand" "")
>> -   (match_operand:VSX_D 1 "vsx_register_operand" "")
>> -   (match_operand:VSX_D 2 "vsx_register_operand" "")
>> -   (match_operand:V2DI  3 "" "")]
>> -  "VECTOR_MEM_VSX_P (<MODE>mode)"
>> -{
>> -  if (rs6000_expand_vec_perm_const (operands))
>> -    DONE;
>> -  else
>> -    FAIL;
>> -})
>> -
>>  ;; Extraction of a single element in a small integer vector.  Until ISA 3.0,
>>  ;; none of the small types were allowed in a vector register, so we had to
>>  ;; extract to a DImode and either do a direct move or store.
>> Index: gcc/config/powerpcspe/powerpcspe-protos.h
>> ===================================================================
>> --- gcc/config/powerpcspe/powerpcspe-protos.h 2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/config/powerpcspe/powerpcspe-protos.h 2017-12-09 22:47:27.867318090 +0000
>> @@ -64,9 +64,7 @@ extern void rs6000_expand_vector_extract
>>  extern void rs6000_split_vec_extract_var (rtx, rtx, rtx, rtx, rtx);
>>  extern rtx rs6000_adjust_vec_address (rtx, rtx, rtx, rtx, machine_mode);
>>  extern void rs6000_split_v4si_init (rtx []);
>> -extern bool altivec_expand_vec_perm_const (rtx op[4]);
>>  extern void altivec_expand_vec_perm_le (rtx op[4]);
>> -extern bool rs6000_expand_vec_perm_const (rtx op[4]);
>>  extern void altivec_expand_lvx_be (rtx, rtx, machine_mode, unsigned);
>>  extern void altivec_expand_stvx_be (rtx, rtx, machine_mode, unsigned);
>>  extern void altivec_expand_stvex_be (rtx, rtx, machine_mode, unsigned);
>> Index: gcc/config/powerpcspe/powerpcspe.c
>> ===================================================================
>> --- gcc/config/powerpcspe/powerpcspe.c  2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/config/powerpcspe/powerpcspe.c  2017-12-09 22:47:27.871318093 +0000
>> @@ -1936,8 +1936,8 @@ #define TARGET_SET_CURRENT_FUNCTION rs60
>>  #undef TARGET_LEGITIMATE_CONSTANT_P
>>  #define TARGET_LEGITIMATE_CONSTANT_P rs6000_legitimate_constant_p
>>
>> -#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
>> -#define TARGET_VECTORIZE_VEC_PERM_CONST_OK rs6000_vectorize_vec_perm_const_ok
>> +#undef TARGET_VECTORIZE_VEC_PERM_CONST
>> +#define TARGET_VECTORIZE_VEC_PERM_CONST rs6000_vectorize_vec_perm_const
>>
>>  #undef TARGET_CAN_USE_DOLOOP_P
>>  #define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
>> @@ -38311,6 +38311,9 @@ rs6000_emit_parity (rtx dst, rtx src)
>>  }
>>
>>  /* Expand an Altivec constant permutation for little endian mode.
>> +   OP0 and OP1 are the input vectors and TARGET is the output vector.
>> +   SEL specifies the constant permutation vector.
>> +
>>     There are two issues: First, the two input operands must be
>>     swapped so that together they form a double-wide array in LE
>>     order.  Second, the vperm instruction has surprising behavior
>> @@ -38352,22 +38355,18 @@ rs6000_emit_parity (rtx dst, rtx src)
>>
>>     vr9  = 00000006 00000004 00000002 00000000.  */
>>
>> -void
>> -altivec_expand_vec_perm_const_le (rtx operands[4])
>> +static void
>> +altivec_expand_vec_perm_const_le (rtx target, rtx op0, rtx op1,
>> +                                 const vec_perm_indices &sel)
>>  {
>>    unsigned int i;
>>    rtx perm[16];
>>    rtx constv, unspec;
>> -  rtx target = operands[0];
>> -  rtx op0 = operands[1];
>> -  rtx op1 = operands[2];
>> -  rtx sel = operands[3];
>>
>>    /* Unpack and adjust the constant selector.  */
>>    for (i = 0; i < 16; ++i)
>>      {
>> -      rtx e = XVECEXP (sel, 0, i);
>> -      unsigned int elt = 31 - (INTVAL (e) & 31);
>> +      unsigned int elt = 31 - (sel[i] & 31);
>>        perm[i] = GEN_INT (elt);
>>      }
>>
>> @@ -38449,10 +38448,14 @@ altivec_expand_vec_perm_le (rtx operands
>>  }
>>
>>  /* Expand an Altivec constant permutation.  Return true if we match
>> -   an efficient implementation; false to fall back to VPERM.  */
>> +   an efficient implementation; false to fall back to VPERM.
>>
>> -bool
>> -altivec_expand_vec_perm_const (rtx operands[4])
>> +   OP0 and OP1 are the input vectors and TARGET is the output vector.
>> +   SEL specifies the constant permutation vector.  */
>> +
>> +static bool
>> +altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
>> +                              const vec_perm_indices &sel)
>>  {
>>    struct altivec_perm_insn {
>>      HOST_WIDE_INT mask;
>> @@ -38496,19 +38499,13 @@ altivec_expand_vec_perm_const (rtx opera
>>
>>    unsigned int i, j, elt, which;
>>    unsigned char perm[16];
>> -  rtx target, op0, op1, sel, x;
>> +  rtx x;
>>    bool one_vec;
>>
>> -  target = operands[0];
>> -  op0 = operands[1];
>> -  op1 = operands[2];
>> -  sel = operands[3];
>> -
>>    /* Unpack the constant selector.  */
>>    for (i = which = 0; i < 16; ++i)
>>      {
>> -      rtx e = XVECEXP (sel, 0, i);
>> -      elt = INTVAL (e) & 31;
>> +      elt = sel[i] & 31;
>>        which |= (elt < 16 ? 1 : 2);
>>        perm[i] = elt;
>>      }
>> @@ -38664,7 +38661,7 @@ altivec_expand_vec_perm_const (rtx opera
>>
>>    if (!BYTES_BIG_ENDIAN)
>>      {
>> -      altivec_expand_vec_perm_const_le (operands);
>> +      altivec_expand_vec_perm_const_le (target, op0, op1, sel);
>>        return true;
>>      }
>>
>> @@ -38724,60 +38721,54 @@ rs6000_expand_vec_perm_const_1 (rtx targ
>>    return true;
>>  }
>>
>> -bool
>> -rs6000_expand_vec_perm_const (rtx operands[4])
>> -{
>> -  rtx target, op0, op1, sel;
>> -  unsigned char perm0, perm1;
>> -
>> -  target = operands[0];
>> -  op0 = operands[1];
>> -  op1 = operands[2];
>> -  sel = operands[3];
>> -
>> -  /* Unpack the constant selector.  */
>> -  perm0 = INTVAL (XVECEXP (sel, 0, 0)) & 3;
>> -  perm1 = INTVAL (XVECEXP (sel, 0, 1)) & 3;
>> -
>> -  return rs6000_expand_vec_perm_const_1 (target, op0, op1, perm0, perm1);
>> -}
>> -
>> -/* Test whether a constant permutation is supported.  */
>> +/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
>>
>>  static bool
>> -rs6000_vectorize_vec_perm_const_ok (machine_mode vmode, vec_perm_indices sel)
>> +rs6000_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
>> +                                rtx op1, const vec_perm_indices &sel)
>>  {
>> +  bool testing_p = !target;
>> +
>>    /* AltiVec (and thus VSX) can handle arbitrary permutations.  */
>> -  if (TARGET_ALTIVEC)
>> +  if (TARGET_ALTIVEC && testing_p)
>>      return true;
>>
>> -  /* Check for ps_merge* or evmerge* insns.  */
>> -  if ((TARGET_PAIRED_FLOAT && vmode == V2SFmode)
>> -      || (TARGET_SPE && vmode == V2SImode))
>> -    {
>> -      rtx op0 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 1);
>> -      rtx op1 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 2);
>> -      return rs6000_expand_vec_perm_const_1 (NULL, op0, op1, sel[0], sel[1]);
>> +  /* Check for ps_merge*, evmerge* or xxperm* insns.  */
>> +  if ((vmode == V2SFmode && TARGET_PAIRED_FLOAT)
>> +      || (vmode == V2SImode && TARGET_SPE)
>> +      || ((vmode == V2DFmode || vmode == V2DImode)
>> +         && VECTOR_MEM_VSX_P (vmode)))
>> +    {
>> +      if (testing_p)
>> +       {
>> +         op0 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 1);
>> +         op1 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 2);
>> +       }
>> +      if (rs6000_expand_vec_perm_const_1 (target, op0, op1, sel[0], sel[1]))
>> +       return true;
>> +    }
>> +
>> +  if (TARGET_ALTIVEC)
>> +    {
>> +      /* Force the target-independent code to lower to V16QImode.  */
>> +      if (vmode != V16QImode)
>> +       return false;
>> +      if (altivec_expand_vec_perm_const (target, op0, op1, sel))
>> +       return true;
>>      }
>>
>>    return false;
>>  }
>>
>> -/* A subroutine for rs6000_expand_extract_even & rs6000_expand_interleave.  */
>> +/* A subroutine for rs6000_expand_extract_even & rs6000_expand_interleave.
>> +   OP0 and OP1 are the input vectors and TARGET is the output vector.
>> +   PERM specifies the constant permutation vector.  */
>>
>>  static void
>>  rs6000_do_expand_vec_perm (rtx target, rtx op0, rtx op1,
>> -                          machine_mode vmode, unsigned nelt, rtx perm[])
>> +                          machine_mode vmode, const vec_perm_builder &perm)
>>  {
>> -  machine_mode imode;
>> -  rtx x;
>> -
>> -  imode = vmode;
>> -  if (GET_MODE_CLASS (vmode) != MODE_VECTOR_INT)
>> -    imode = mode_for_int_vector (vmode).require ();
>> -
>> -  x = gen_rtx_CONST_VECTOR (imode, gen_rtvec_v (nelt, perm));
>> -  x = expand_vec_perm (vmode, op0, op1, x, target);
>> +  rtx x = expand_vec_perm_const (vmode, op0, op1, perm, BLKmode, target);
>>    if (x != target)
>>      emit_move_insn (target, x);
>>  }
>> @@ -38789,12 +38780,12 @@ rs6000_expand_extract_even (rtx target,
>>  {
>>    machine_mode vmode = GET_MODE (target);
>>    unsigned i, nelt = GET_MODE_NUNITS (vmode);
>> -  rtx perm[16];
>> +  vec_perm_builder perm (nelt);
>>
>>    for (i = 0; i < nelt; i++)
>> -    perm[i] = GEN_INT (i * 2);
>> +    perm.quick_push (i * 2);
>>
>> -  rs6000_do_expand_vec_perm (target, op0, op1, vmode, nelt, perm);
>> +  rs6000_do_expand_vec_perm (target, op0, op1, vmode, perm);
>>  }
>>
>>  /* Expand a vector interleave operation.  */
>> @@ -38804,16 +38795,16 @@ rs6000_expand_interleave (rtx target, rt
>>  {
>>    machine_mode vmode = GET_MODE (target);
>>    unsigned i, high, nelt = GET_MODE_NUNITS (vmode);
>> -  rtx perm[16];
>> +  vec_perm_builder perm (nelt);
>>
>>    high = (highp ? 0 : nelt / 2);
>>    for (i = 0; i < nelt / 2; i++)
>>      {
>> -      perm[i * 2] = GEN_INT (i + high);
>> -      perm[i * 2 + 1] = GEN_INT (i + nelt + high);
>> +      perm.quick_push (i + high);
>> +      perm.quick_push (i + nelt + high);
>>      }
>>
>> -  rs6000_do_expand_vec_perm (target, op0, op1, vmode, nelt, perm);
>> +  rs6000_do_expand_vec_perm (target, op0, op1, vmode, perm);
>>  }
>>
>>  /* Scale a V2DF vector SRC by two to the SCALE and place in TGT.  */
>> Index: gcc/config/rs6000/altivec.md
>> ===================================================================
>> --- gcc/config/rs6000/altivec.md        2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/config/rs6000/altivec.md        2017-12-09 22:47:27.872318093 +0000
>> @@ -2198,19 +2198,6 @@ (define_expand "vec_permv16qi"
>>    }
>>  })
>>
>> -(define_expand "vec_perm_constv16qi"
>> -  [(match_operand:V16QI 0 "register_operand" "")
>> -   (match_operand:V16QI 1 "register_operand" "")
>> -   (match_operand:V16QI 2 "register_operand" "")
>> -   (match_operand:V16QI 3 "" "")]
>> -  "TARGET_ALTIVEC"
>> -{
>> -  if (altivec_expand_vec_perm_const (operands))
>> -    DONE;
>> -  else
>> -    FAIL;
>> -})
>> -
>>  (define_insn "*altivec_vpermr_<mode>_internal"
>>    [(set (match_operand:VM 0 "register_operand" "=v,?wo")
>>         (unspec:VM [(match_operand:VM 1 "register_operand" "v,wo")
>> Index: gcc/config/rs6000/paired.md
>> ===================================================================
>> --- gcc/config/rs6000/paired.md 2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/config/rs6000/paired.md 2017-12-09 22:47:27.872318093 +0000
>> @@ -313,19 +313,6 @@ (define_insn "paired_merge11"
>>    "ps_merge11 %0, %1, %2"
>>    [(set_attr "type" "fp")])
>>
>> -(define_expand "vec_perm_constv2sf"
>> -  [(match_operand:V2SF 0 "gpc_reg_operand" "")
>> -   (match_operand:V2SF 1 "gpc_reg_operand" "")
>> -   (match_operand:V2SF 2 "gpc_reg_operand" "")
>> -   (match_operand:V2SI 3 "" "")]
>> -  "TARGET_PAIRED_FLOAT"
>> -{
>> -  if (rs6000_expand_vec_perm_const (operands))
>> -    DONE;
>> -  else
>> -    FAIL;
>> -})
>> -
>>  (define_insn "paired_sum0"
>>    [(set (match_operand:V2SF 0 "gpc_reg_operand" "=f")
>>         (vec_concat:V2SF (plus:SF (vec_select:SF
>> Index: gcc/config/rs6000/vsx.md
>> ===================================================================
>> --- gcc/config/rs6000/vsx.md    2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/config/rs6000/vsx.md    2017-12-09 22:47:27.875318095 +0000
>> @@ -3189,19 +3189,6 @@ (define_insn "vsx_xxpermdi2_<mode>_1"
>>  }
>>    [(set_attr "type" "vecperm")])
>>
>> -(define_expand "vec_perm_const<mode>"
>> -  [(match_operand:VSX_D 0 "vsx_register_operand" "")
>> -   (match_operand:VSX_D 1 "vsx_register_operand" "")
>> -   (match_operand:VSX_D 2 "vsx_register_operand" "")
>> -   (match_operand:V2DI  3 "" "")]
>> -  "VECTOR_MEM_VSX_P (<MODE>mode)"
>> -{
>> -  if (rs6000_expand_vec_perm_const (operands))
>> -    DONE;
>> -  else
>> -    FAIL;
>> -})
>> -
>>  ;; Extraction of a single element in a small integer vector.  Until ISA 3.0,
>>  ;; none of the small types were allowed in a vector register, so we had to
>>  ;; extract to a DImode and either do a direct move or store.
>> Index: gcc/config/rs6000/rs6000-protos.h
>> ===================================================================
>> --- gcc/config/rs6000/rs6000-protos.h   2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/config/rs6000/rs6000-protos.h   2017-12-09 22:47:27.872318093 +0000
>> @@ -63,9 +63,7 @@ extern void rs6000_expand_vector_extract
>>  extern void rs6000_split_vec_extract_var (rtx, rtx, rtx, rtx, rtx);
>>  extern rtx rs6000_adjust_vec_address (rtx, rtx, rtx, rtx, machine_mode);
>>  extern void rs6000_split_v4si_init (rtx []);
>> -extern bool altivec_expand_vec_perm_const (rtx op[4]);
>>  extern void altivec_expand_vec_perm_le (rtx op[4]);
>> -extern bool rs6000_expand_vec_perm_const (rtx op[4]);
>>  extern void altivec_expand_lvx_be (rtx, rtx, machine_mode, unsigned);
>>  extern void altivec_expand_stvx_be (rtx, rtx, machine_mode, unsigned);
>>  extern void altivec_expand_stvex_be (rtx, rtx, machine_mode, unsigned);
>> Index: gcc/config/rs6000/rs6000.c
>> ===================================================================
>> --- gcc/config/rs6000/rs6000.c  2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/config/rs6000/rs6000.c  2017-12-09 22:47:27.874318095 +0000
>> @@ -1907,8 +1907,8 @@ #define TARGET_SET_CURRENT_FUNCTION rs60
>>  #undef TARGET_LEGITIMATE_CONSTANT_P
>>  #define TARGET_LEGITIMATE_CONSTANT_P rs6000_legitimate_constant_p
>>
>> -#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
>> -#define TARGET_VECTORIZE_VEC_PERM_CONST_OK rs6000_vectorize_vec_perm_const_ok
>> +#undef TARGET_VECTORIZE_VEC_PERM_CONST
>> +#define TARGET_VECTORIZE_VEC_PERM_CONST rs6000_vectorize_vec_perm_const
>>
>>  #undef TARGET_CAN_USE_DOLOOP_P
>>  #define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
>> @@ -35545,6 +35545,9 @@ rs6000_emit_parity (rtx dst, rtx src)
>>  }
>>
>>  /* Expand an Altivec constant permutation for little endian mode.
>> +   OP0 and OP1 are the input vectors and TARGET is the output vector.
>> +   SEL specifies the constant permutation vector.
>> +
>>     There are two issues: First, the two input operands must be
>>     swapped so that together they form a double-wide array in LE
>>     order.  Second, the vperm instruction has surprising behavior
>> @@ -35586,22 +35589,18 @@ rs6000_emit_parity (rtx dst, rtx src)
>>
>>     vr9  = 00000006 00000004 00000002 00000000.  */
>>
>> -void
>> -altivec_expand_vec_perm_const_le (rtx operands[4])
>> +static void
>> +altivec_expand_vec_perm_const_le (rtx target, rtx op0, rtx op1,
>> +                                 const vec_perm_indices &sel)
>>  {
>>    unsigned int i;
>>    rtx perm[16];
>>    rtx constv, unspec;
>> -  rtx target = operands[0];
>> -  rtx op0 = operands[1];
>> -  rtx op1 = operands[2];
>> -  rtx sel = operands[3];
>>
>>    /* Unpack and adjust the constant selector.  */
>>    for (i = 0; i < 16; ++i)
>>      {
>> -      rtx e = XVECEXP (sel, 0, i);
>> -      unsigned int elt = 31 - (INTVAL (e) & 31);
>> +      unsigned int elt = 31 - (sel[i] & 31);
>>        perm[i] = GEN_INT (elt);
>>      }
>>
>> @@ -35683,10 +35682,14 @@ altivec_expand_vec_perm_le (rtx operands
>>  }
>>
>>  /* Expand an Altivec constant permutation.  Return true if we match
>> -   an efficient implementation; false to fall back to VPERM.  */
>> +   an efficient implementation; false to fall back to VPERM.
>>
>> -bool
>> -altivec_expand_vec_perm_const (rtx operands[4])
>> +   OP0 and OP1 are the input vectors and TARGET is the output vector.
>> +   SEL specifies the constant permutation vector.  */
>> +
>> +static bool
>> +altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
>> +                              const vec_perm_indices &sel)
>>  {
>>    struct altivec_perm_insn {
>>      HOST_WIDE_INT mask;
>> @@ -35734,19 +35737,13 @@ altivec_expand_vec_perm_const (rtx opera
>>
>>    unsigned int i, j, elt, which;
>>    unsigned char perm[16];
>> -  rtx target, op0, op1, sel, x;
>> +  rtx x;
>>    bool one_vec;
>>
>> -  target = operands[0];
>> -  op0 = operands[1];
>> -  op1 = operands[2];
>> -  sel = operands[3];
>> -
>>    /* Unpack the constant selector.  */
>>    for (i = which = 0; i < 16; ++i)
>>      {
>> -      rtx e = XVECEXP (sel, 0, i);
>> -      elt = INTVAL (e) & 31;
>> +      elt = sel[i] & 31;
>>        which |= (elt < 16 ? 1 : 2);
>>        perm[i] = elt;
>>      }
>> @@ -35902,7 +35899,7 @@ altivec_expand_vec_perm_const (rtx opera
>>
>>    if (!BYTES_BIG_ENDIAN)
>>      {
>> -      altivec_expand_vec_perm_const_le (operands);
>> +      altivec_expand_vec_perm_const_le (target, op0, op1, sel);
>>        return true;
>>      }
>>
>> @@ -35962,59 +35959,53 @@ rs6000_expand_vec_perm_const_1 (rtx targ
>>    return true;
>>  }
>>
>> -bool
>> -rs6000_expand_vec_perm_const (rtx operands[4])
>> -{
>> -  rtx target, op0, op1, sel;
>> -  unsigned char perm0, perm1;
>> -
>> -  target = operands[0];
>> -  op0 = operands[1];
>> -  op1 = operands[2];
>> -  sel = operands[3];
>> -
>> -  /* Unpack the constant selector.  */
>> -  perm0 = INTVAL (XVECEXP (sel, 0, 0)) & 3;
>> -  perm1 = INTVAL (XVECEXP (sel, 0, 1)) & 3;
>> -
>> -  return rs6000_expand_vec_perm_const_1 (target, op0, op1, perm0, perm1);
>> -}
>> -
>> -/* Test whether a constant permutation is supported.  */
>> +/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
>>
>>  static bool
>> -rs6000_vectorize_vec_perm_const_ok (machine_mode vmode, vec_perm_indices sel)
>> +rs6000_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
>> +                                rtx op1, const vec_perm_indices &sel)
>>  {
>> +  bool testing_p = !target;
>> +
>>    /* AltiVec (and thus VSX) can handle arbitrary permutations.  */
>> -  if (TARGET_ALTIVEC)
>> +  if (TARGET_ALTIVEC && testing_p)
>>      return true;
>>
>> -  /* Check for ps_merge* or evmerge* insns.  */
>> -  if (TARGET_PAIRED_FLOAT && vmode == V2SFmode)
>> +  /* Check for ps_merge* or xxpermdi insns.  */
>> +  if ((vmode == V2SFmode && TARGET_PAIRED_FLOAT)
>> +      || ((vmode == V2DFmode || vmode == V2DImode)
>> +         && VECTOR_MEM_VSX_P (vmode)))
>> +    {
>> +      if (testing_p)
>> +       {
>> +         op0 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 1);
>> +         op1 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 2);
>> +       }
>> +      if (rs6000_expand_vec_perm_const_1 (target, op0, op1, sel[0], sel[1]))
>> +       return true;
>> +    }
>> +
>> +  if (TARGET_ALTIVEC)
>>      {
>> -      rtx op0 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 1);
>> -      rtx op1 = gen_raw_REG (vmode, LAST_VIRTUAL_REGISTER + 2);
>> -      return rs6000_expand_vec_perm_const_1 (NULL, op0, op1, sel[0], sel[1]);
>> +      /* Force the target-independent code to lower to V16QImode.  */
>> +      if (vmode != V16QImode)
>> +       return false;
>> +      if (altivec_expand_vec_perm_const (target, op0, op1, sel))
>> +       return true;
>>      }
>>
>>    return false;
>>  }
>>
>> -/* A subroutine for rs6000_expand_extract_even & rs6000_expand_interleave.  */
>> +/* A subroutine for rs6000_expand_extract_even & rs6000_expand_interleave.
>> +   OP0 and OP1 are the input vectors and TARGET is the output vector.
>> +   PERM specifies the constant permutation vector.  */
>>
>>  static void
>>  rs6000_do_expand_vec_perm (rtx target, rtx op0, rtx op1,
>> -                          machine_mode vmode, unsigned nelt, rtx perm[])
>> +                          machine_mode vmode, const vec_perm_builder &perm)
>>  {
>> -  machine_mode imode;
>> -  rtx x;
>> -
>> -  imode = vmode;
>> -  if (GET_MODE_CLASS (vmode) != MODE_VECTOR_INT)
>> -    imode = mode_for_int_vector (vmode).require ();
>> -
>> -  x = gen_rtx_CONST_VECTOR (imode, gen_rtvec_v (nelt, perm));
>> -  x = expand_vec_perm (vmode, op0, op1, x, target);
>> +  rtx x = expand_vec_perm_const (vmode, op0, op1, perm, BLKmode, target);
>>    if (x != target)
>>      emit_move_insn (target, x);
>>  }
>> @@ -36026,12 +36017,12 @@ rs6000_expand_extract_even (rtx target,
>>  {
>>    machine_mode vmode = GET_MODE (target);
>>    unsigned i, nelt = GET_MODE_NUNITS (vmode);
>> -  rtx perm[16];
>> +  vec_perm_builder perm (nelt);
>>
>>    for (i = 0; i < nelt; i++)
>> -    perm[i] = GEN_INT (i * 2);
>> +    perm.quick_push (i * 2);
>>
>> -  rs6000_do_expand_vec_perm (target, op0, op1, vmode, nelt, perm);
>> +  rs6000_do_expand_vec_perm (target, op0, op1, vmode, perm);
>>  }
>>
>>  /* Expand a vector interleave operation.  */
>> @@ -36041,16 +36032,16 @@ rs6000_expand_interleave (rtx target, rt
>>  {
>>    machine_mode vmode = GET_MODE (target);
>>    unsigned i, high, nelt = GET_MODE_NUNITS (vmode);
>> -  rtx perm[16];
>> +  vec_perm_builder perm (nelt);
>>
>>    high = (highp ? 0 : nelt / 2);
>>    for (i = 0; i < nelt / 2; i++)
>>      {
>> -      perm[i * 2] = GEN_INT (i + high);
>> -      perm[i * 2 + 1] = GEN_INT (i + nelt + high);
>> +      perm.quick_push (i + high);
>> +      perm.quick_push (i + nelt + high);
>>      }
>>
>> -  rs6000_do_expand_vec_perm (target, op0, op1, vmode, nelt, perm);
>> +  rs6000_do_expand_vec_perm (target, op0, op1, vmode, perm);
>>  }
>>
>>  /* Scale a V2DF vector SRC by two to the SCALE and place in TGT.  */
>> Index: gcc/config/sparc/sparc.md
>> ===================================================================
>> --- gcc/config/sparc/sparc.md   2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/config/sparc/sparc.md   2017-12-09 22:47:27.876318096 +0000
>> @@ -9327,28 +9327,6 @@ (define_insn "bshuffle<VM64:mode>_vis"
>>     (set_attr "subtype" "other")
>>     (set_attr "fptype" "double")])
>>
>> -;; The rtl expanders will happily convert constant permutations on other
>> -;; modes down to V8QI.  Rely on this to avoid the complexity of the byte
>> -;; order of the permutation.
>> -(define_expand "vec_perm_constv8qi"
>> -  [(match_operand:V8QI 0 "register_operand" "")
>> -   (match_operand:V8QI 1 "register_operand" "")
>> -   (match_operand:V8QI 2 "register_operand" "")
>> -   (match_operand:V8QI 3 "" "")]
>> -  "TARGET_VIS2"
>> -{
>> -  unsigned int i, mask;
>> -  rtx sel = operands[3];
>> -
>> -  for (i = mask = 0; i < 8; ++i)
>> -    mask |= (INTVAL (XVECEXP (sel, 0, i)) & 0xf) << (28 - i*4);
>> -  sel = force_reg (SImode, gen_int_mode (mask, SImode));
>> -
>> -  emit_insn (gen_bmasksi_vis (gen_reg_rtx (SImode), sel, const0_rtx));
>> -  emit_insn (gen_bshufflev8qi_vis (operands[0], operands[1], operands[2]));
>> -  DONE;
>> -})
>> -
>>  ;; Unlike constant permutation, we can vastly simplify the compression of
>>  ;; the 64-bit selector input to the 32-bit %gsr value by knowing what the
>>  ;; width of the input is.
>> Index: gcc/config/sparc/sparc.c
>> ===================================================================
>> --- gcc/config/sparc/sparc.c    2017-12-09 22:47:09.549486911 +0000
>> +++ gcc/config/sparc/sparc.c    2017-12-09 22:47:27.876318096 +0000
>> @@ -686,6 +686,8 @@ static bool sparc_modes_tieable_p (machi
>>  static bool sparc_can_change_mode_class (machine_mode, machine_mode,
>>                                          reg_class_t);
>>  static HOST_WIDE_INT sparc_constant_alignment (const_tree, HOST_WIDE_INT);
>> +static bool sparc_vectorize_vec_perm_const (machine_mode, rtx, rtx, rtx,
>> +                                           const vec_perm_indices &);
>>
>>  #ifdef SUBTARGET_ATTRIBUTE_TABLE
>>  /* Table of valid machine attributes.  */
>> @@ -930,6 +932,9 @@ #define TARGET_CAN_CHANGE_MODE_CLASS spa
>>  #undef TARGET_CONSTANT_ALIGNMENT
>>  #define TARGET_CONSTANT_ALIGNMENT sparc_constant_alignment
>>
>> +#undef TARGET_VECTORIZE_VEC_PERM_CONST
>> +#define TARGET_VECTORIZE_VEC_PERM_CONST sparc_vectorize_vec_perm_const
>> +
>>  struct gcc_target targetm = TARGET_INITIALIZER;
>>
>>  /* Return the memory reference contained in X if any, zero otherwise.  */
>> @@ -12812,6 +12817,32 @@ sparc_expand_vec_perm_bmask (machine_mod
>>    emit_insn (gen_bmasksi_vis (gen_reg_rtx (SImode), sel, t_1));
>>  }
>>
>> +/* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
>> +
>> +static bool
>> +sparc_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
>> +                               rtx op1, const vec_perm_indices &sel)
>> +{
>> +  /* All permutes are supported.  */
>> +  if (!target)
>> +    return true;
>> +
>> +  /* Force target-independent code to convert constant permutations on other
>> +     modes down to V8QI.  Rely on this to avoid the complexity of the byte
>> +     order of the permutation.  */
>> +  if (vmode != V8QImode)
>> +    return false;
>> +
>> +  unsigned int i, mask;
>> +  for (i = mask = 0; i < 8; ++i)
>> +    mask |= (sel[i] & 0xf) << (28 - i*4);
>> +  rtx mask_rtx = force_reg (SImode, gen_int_mode (mask, SImode));
>> +
>> +  emit_insn (gen_bmasksi_vis (gen_reg_rtx (SImode), mask_rtx, const0_rtx));
>> +  emit_insn (gen_bshufflev8qi_vis (target, op0, op1));
>> +  return true;
>> +}
>> +
>>  /* Implement TARGET_FRAME_POINTER_REQUIRED.  */
>>
>>  static bool

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [07/13] Make vec_perm_indices use new vector encoding
  2017-12-14 10:37       ` Richard Biener
@ 2017-12-20 13:48         ` Richard Sandiford
  2018-01-02 13:15           ` Richard Biener
  0 siblings, 1 reply; 46+ messages in thread
From: Richard Sandiford @ 2017-12-20 13:48 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Tue, Dec 12, 2017 at 4:46 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Sun, Dec 10, 2017 at 12:20 AM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> This patch changes vec_perm_indices from a plain vec<> to a class
>>>> that stores a canonicalised permutation, using the same encoding
>>>> as for VECTOR_CSTs.  This means that vec_perm_indices now carries
>>>> information about the number of vectors being permuted (currently
>>>> always 1 or 2) and the number of elements in each input vector.
>>>
>>> Before I dive into  the C++ details can you explain why it needs this
>>> info and how it encodes it for variable-length vectors?  To interleave
>>> two vectors you need sth like { 0, N, 1, N+1, ... }, I'm not sure we
>>> can directly encode N here, can we?  extract even/odd should just
>>> work as { 0, 2, 4, 6, ...} without knowledge of whether we permute
>>> one or two vectors (the one vector case just has two times the same
>>> vector) or how many elements each of the vectors (or the result) has.
>>
>> One of the later patches switches the element types to HOST_WIDE_INT,
>> so that we can represent all ssizetypes.  Then there's a poly_int
>> patch (not yet posted) to make that poly_int64, so that we can
>> represent the N even for variable-length vectors.
>>
>> The class needs to know the number of elements because that affects
>> the canonical representation.  E.g. extract even on fixed-length
>> vectors with both inputs the same should be { 0, 2, 4, ..., 0, 2, 4 ... },
>> which we can't encode as a simple series.  Interleave low with both
>> inputs the same should be { 0, 0, 1, 1, ... } for both fixed-length and
>> variable-length vectors.
>
> Huh?  Extract even is { 0, 2, 4, 6, 8, ... }; indexes in the selection
> vector reference the concat'ed input vectors.  So yes, for two same
> vectors that's effectively { 0, 2, 4, ..., 0, 2, 4, ... }, but I don't
> see why that should be the canonical form?

Current practice is to use the single-input form where possible,
i.e. when both inputs are the same (see e.g. the VEC_PERM_EXPR handling
in fold-const.c).  This means that things like:

    _1 = VEC_PERM_EXPR <a, a, { 0, 2, 4, 6, 0, 2, 4, 6 }>;
    _2 = VEC_PERM_EXPR <a, a, { 0, 2, 4, 6, 8, 10, 12, 14 }>;
    _3 = VEC_PERM_EXPR <a, b, { 0, 2, 4, 6, 0, 2, 4, 6 }>;

get folded to the same sequence, and so can be CSEd.

We could instead convert the single-input form to use the two-input
selector, but that would be harder.  The advantage of treating the
single-input form as canonical is that it works even for irregular
permutes.
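
As a rough sketch of the mechanism (illustrative only, not the exact
fold-const.c hunk; op0, op1, builder and nelts are placeholder names):

      /* With both operands equal, build a single-input permute.  The
         selector elements are then reduced modulo the input length, so
         the selectors of _1 and _2 above encode identically and the
         statements can be CSEd.  */
      if (operand_equal_p (op0, op1, 0))
        indices.new_vector (builder, 1, nelts);
      else
        indices.new_vector (builder, 2, nelts);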

Thanks,
Richard

>> Also, operator[] is supposed to return an in-range selector even if
>> the selector element is only implicitly encoded.  So we need to know
>> the number of input elements there.
>>
>> Separating the number of input elements into the number of inputs
>> and the number of elements per input isn't really necessary, but made
>> it easier to provide routines for testing whether all selected
>> elements come from a particular input, and for rotating the selector
>> by a whole number of inputs.
>>
>> Thanks,
>> Richard

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [09/13] Use explicit encodings for simple permutes
  2017-12-09 23:21 ` [09/13] Use explicit encodings for simple permutes Richard Sandiford
  2017-12-19 20:37   ` Richard Sandiford
@ 2018-01-02 13:07   ` Richard Biener
  1 sibling, 0 replies; 46+ messages in thread
From: Richard Biener @ 2018-01-02 13:07 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Sun, Dec 10, 2017 at 12:21 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch makes users of vec_perm_builders use the compressed encoding
> where possible.  This means that they work with variable-length vectors.
>
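
To make "compressed encoding" concrete, a minimal sketch (mirroring the
vect_grouped_load_supported hunk below; nelt and mode stand in for the
real variables):

          /* One stepped pattern of three explicit elements; the ramp
             0, 2, 4 implies 6, 8, ... for whatever NELT turns out to
             be, so the same selector describes extract-even for any
             vector length.  */
          vec_perm_builder sel (nelt, 1, 3);
          for (unsigned int i = 0; i < 3; ++i)
            sel.quick_push (i * 2);
          vec_perm_indices indices (sel, 2, nelt);
          if (can_vec_perm_const_p (mode, indices))
            /* extract-even is supported directly.  */;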

Ok.

Richard.

> 2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
>         * optabs.c (expand_vec_perm_var): Use an explicit encoding for
>         the broadcast of the low byte.
>         (expand_mult_highpart): Use an explicit encoding for the permutes.
>         * optabs-query.c (can_mult_highpart_p): Likewise.
>         * tree-vect-loop.c (calc_vec_perm_mask_for_shift): Likewise.
>         * tree-vect-stmts.c (perm_mask_for_reverse): Likewise.
>         (vectorizable_bswap): Likewise.
>         * tree-vect-data-refs.c (vect_grouped_store_supported): Use an
>         explicit encoding for the power-of-2 permutes.
>         (vect_permute_store_chain): Likewise.
>         (vect_grouped_load_supported): Likewise.
>         (vect_permute_load_chain): Likewise.
>
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c        2017-12-09 22:48:47.546825312 +0000
> +++ gcc/optabs.c        2017-12-09 22:48:52.266015836 +0000
> @@ -5625,15 +5625,14 @@ expand_vec_perm_var (machine_mode mode,
>                                NULL, 0, OPTAB_DIRECT);
>    gcc_assert (sel != NULL);
>
> -  /* Broadcast the low byte each element into each of its bytes.  */
> -  vec_perm_builder const_sel (w, w, 1);
> -  for (i = 0; i < w; ++i)
> -    {
> -      int this_e = i / u * u;
> -      if (BYTES_BIG_ENDIAN)
> -       this_e += u - 1;
> -      const_sel.quick_push (this_e);
> -    }
> +  /* Broadcast the low byte each element into each of its bytes.
> +     The encoding has U interleaved stepped patterns, one for each
> +     byte of an element.  */
> +  vec_perm_builder const_sel (w, u, 3);
> +  unsigned int low_byte_in_u = BYTES_BIG_ENDIAN ? u - 1 : 0;
> +  for (i = 0; i < 3; ++i)
> +    for (unsigned int j = 0; j < u; ++j)
> +      const_sel.quick_push (i * u + low_byte_in_u);
>    sel = gen_lowpart (qimode, sel);
>    sel = expand_vec_perm_const (qimode, sel, sel, const_sel, qimode, NULL);
>    gcc_assert (sel != NULL);
> @@ -5853,16 +5852,20 @@ expand_mult_highpart (machine_mode mode,
>    expand_insn (optab_handler (tab2, mode), 3, eops);
>    m2 = gen_lowpart (mode, eops[0].value);
>
> -  vec_perm_builder sel (nunits, nunits, 1);
> +  vec_perm_builder sel;
>    if (method == 2)
>      {
> -      for (i = 0; i < nunits; ++i)
> +      /* The encoding has 2 interleaved stepped patterns.  */
> +      sel.new_vector (nunits, 2, 3);
> +      for (i = 0; i < 6; ++i)
>         sel.quick_push (!BYTES_BIG_ENDIAN + (i & ~1)
>                         + ((i & 1) ? nunits : 0));
>      }
>    else
>      {
> -      for (i = 0; i < nunits; ++i)
> +      /* The encoding has a single interleaved stepped pattern.  */
> +      sel.new_vector (nunits, 1, 3);
> +      for (i = 0; i < 3; ++i)
>         sel.quick_push (2 * i + (BYTES_BIG_ENDIAN ? 0 : 1));
>      }
>
> Index: gcc/optabs-query.c
> ===================================================================
> --- gcc/optabs-query.c  2017-12-09 22:48:47.545825268 +0000
> +++ gcc/optabs-query.c  2017-12-09 22:48:52.265015799 +0000
> @@ -501,8 +501,9 @@ can_mult_highpart_p (machine_mode mode,
>        op = uns_p ? vec_widen_umult_odd_optab : vec_widen_smult_odd_optab;
>        if (optab_handler (op, mode) != CODE_FOR_nothing)
>         {
> -         vec_perm_builder sel (nunits, nunits, 1);
> -         for (i = 0; i < nunits; ++i)
> +         /* The encoding has 2 interleaved stepped patterns.  */
> +         vec_perm_builder sel (nunits, 2, 3);
> +         for (i = 0; i < 6; ++i)
>             sel.quick_push (!BYTES_BIG_ENDIAN
>                             + (i & ~1)
>                             + ((i & 1) ? nunits : 0));
> @@ -518,8 +519,9 @@ can_mult_highpart_p (machine_mode mode,
>        op = uns_p ? vec_widen_umult_lo_optab : vec_widen_smult_lo_optab;
>        if (optab_handler (op, mode) != CODE_FOR_nothing)
>         {
> -         vec_perm_builder sel (nunits, nunits, 1);
> -         for (i = 0; i < nunits; ++i)
> +         /* The encoding has a single stepped pattern.  */
> +         vec_perm_builder sel (nunits, 1, 3);
> +         for (int i = 0; i < 3; ++i)
>             sel.quick_push (2 * i + (BYTES_BIG_ENDIAN ? 0 : 1));
>           vec_perm_indices indices (sel, 2, nunits);
>           if (can_vec_perm_const_p (mode, indices))
> Index: gcc/tree-vect-loop.c
> ===================================================================
> --- gcc/tree-vect-loop.c        2017-12-09 22:48:47.547825355 +0000
> +++ gcc/tree-vect-loop.c        2017-12-09 22:48:52.267015873 +0000
> @@ -3716,8 +3716,10 @@ vect_estimate_min_profitable_iters (loop
>  calc_vec_perm_mask_for_shift (unsigned int offset, unsigned int nelt,
>                               vec_perm_builder *sel)
>  {
> -  sel->new_vector (nelt, nelt, 1);
> -  for (unsigned int i = 0; i < nelt; i++)
> +  /* The encoding is a single stepped pattern.  Any wrap-around is handled
> +     by vec_perm_indices.  */
> +  sel->new_vector (nelt, 1, 3);
> +  for (unsigned int i = 0; i < 3; i++)
>      sel->quick_push (i + offset);
>  }
>
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2017-12-09 22:48:50.360942531 +0000
> +++ gcc/tree-vect-stmts.c       2017-12-09 22:48:52.268015910 +0000
> @@ -1717,8 +1717,9 @@ perm_mask_for_reverse (tree vectype)
>
>    nunits = TYPE_VECTOR_SUBPARTS (vectype);
>
> -  vec_perm_builder sel (nunits, nunits, 1);
> -  for (i = 0; i < nunits; ++i)
> +  /* The encoding has a single stepped pattern.  */
> +  vec_perm_builder sel (nunits, 1, 3);
> +  for (i = 0; i < 3; ++i)
>      sel.quick_push (nunits - 1 - i);
>
>    vec_perm_indices indices (sel, 1, nunits);
> @@ -2504,8 +2505,9 @@ vectorizable_bswap (gimple *stmt, gimple
>    unsigned int num_bytes = TYPE_VECTOR_SUBPARTS (char_vectype);
>    unsigned word_bytes = num_bytes / nunits;
>
> -  vec_perm_builder elts (num_bytes, num_bytes, 1);
> -  for (unsigned i = 0; i < nunits; ++i)
> +  /* The encoding uses one stepped pattern for each byte in the word.  */
> +  vec_perm_builder elts (num_bytes, word_bytes, 3);
> +  for (unsigned i = 0; i < 3; ++i)
>      for (unsigned j = 0; j < word_bytes; ++j)
>        elts.quick_push ((i + 1) * word_bytes - j - 1);
>
> Index: gcc/tree-vect-data-refs.c
> ===================================================================
> --- gcc/tree-vect-data-refs.c   2017-12-09 22:48:47.546825312 +0000
> +++ gcc/tree-vect-data-refs.c   2017-12-09 22:48:52.267015873 +0000
> @@ -4566,14 +4566,13 @@ vect_grouped_store_supported (tree vecty
>    if (VECTOR_MODE_P (mode))
>      {
>        unsigned int i, nelt = GET_MODE_NUNITS (mode);
> -      vec_perm_builder sel (nelt, nelt, 1);
> -      sel.quick_grow (nelt);
> -
>        if (count == 3)
>         {
>           unsigned int j0 = 0, j1 = 0, j2 = 0;
>           unsigned int i, j;
>
> +         vec_perm_builder sel (nelt, nelt, 1);
> +         sel.quick_grow (nelt);
>           vec_perm_indices indices;
>           for (j = 0; j < 3; j++)
>             {
> @@ -4623,7 +4622,10 @@ vect_grouped_store_supported (tree vecty
>           /* If length is not equal to 3 then only power of 2 is supported.  */
>           gcc_assert (pow2p_hwi (count));
>
> -         for (i = 0; i < nelt / 2; i++)
> +         /* The encoding has 2 interleaved stepped patterns.  */
> +         vec_perm_builder sel (nelt, 2, 3);
> +         sel.quick_grow (6);
> +         for (i = 0; i < 3; i++)
>             {
>               sel[i * 2] = i;
>               sel[i * 2 + 1] = i + nelt;
> @@ -4631,7 +4633,7 @@ vect_grouped_store_supported (tree vecty
>           vec_perm_indices indices (sel, 2, nelt);
>           if (can_vec_perm_const_p (mode, indices))
>             {
> -             for (i = 0; i < nelt; i++)
> +             for (i = 0; i < 6; i++)
>                 sel[i] += nelt / 2;
>               indices.new_vector (sel, 2, nelt);
>               if (can_vec_perm_const_p (mode, indices))
> @@ -4736,9 +4738,6 @@ vect_permute_store_chain (vec<tree> dr_c
>    unsigned int i, n, log_length = exact_log2 (length);
>    unsigned int j, nelt = TYPE_VECTOR_SUBPARTS (vectype);
>
> -  vec_perm_builder sel (nelt, nelt, 1);
> -  sel.quick_grow (nelt);
> -
>    result_chain->quick_grow (length);
>    memcpy (result_chain->address (), dr_chain.address (),
>           length * sizeof (tree));
> @@ -4747,6 +4746,8 @@ vect_permute_store_chain (vec<tree> dr_c
>      {
>        unsigned int j0 = 0, j1 = 0, j2 = 0;
>
> +      vec_perm_builder sel (nelt, nelt, 1);
> +      sel.quick_grow (nelt);
>        vec_perm_indices indices;
>        for (j = 0; j < 3; j++)
>          {
> @@ -4808,7 +4809,10 @@ vect_permute_store_chain (vec<tree> dr_c
>        /* If length is not equal to 3 then only power of 2 is supported.  */
>        gcc_assert (pow2p_hwi (length));
>
> -      for (i = 0, n = nelt / 2; i < n; i++)
> +      /* The encoding has 2 interleaved stepped patterns.  */
> +      vec_perm_builder sel (nelt, 2, 3);
> +      sel.quick_grow (6);
> +      for (i = 0; i < 3; i++)
>         {
>           sel[i * 2] = i;
>           sel[i * 2 + 1] = i + nelt;
> @@ -4816,7 +4820,7 @@ vect_permute_store_chain (vec<tree> dr_c
>         vec_perm_indices indices (sel, 2, nelt);
>         perm_mask_high = vect_gen_perm_mask_checked (vectype, indices);
>
> -       for (i = 0; i < nelt; i++)
> +       for (i = 0; i < 6; i++)
>           sel[i] += nelt / 2;
>         indices.new_vector (sel, 2, nelt);
>         perm_mask_low = vect_gen_perm_mask_checked (vectype, indices);
> @@ -5164,11 +5168,11 @@ vect_grouped_load_supported (tree vectyp
>    if (VECTOR_MODE_P (mode))
>      {
>        unsigned int i, j, nelt = GET_MODE_NUNITS (mode);
> -      vec_perm_builder sel (nelt, nelt, 1);
> -      sel.quick_grow (nelt);
>
>        if (count == 3)
>         {
> +         vec_perm_builder sel (nelt, nelt, 1);
> +         sel.quick_grow (nelt);
>           vec_perm_indices indices;
>           unsigned int k;
>           for (k = 0; k < 3; k++)
> @@ -5209,12 +5213,15 @@ vect_grouped_load_supported (tree vectyp
>           /* If length is not equal to 3 then only power of 2 is supported.  */
>           gcc_assert (pow2p_hwi (count));
>
> -         for (i = 0; i < nelt; i++)
> +         /* The encoding has a single stepped pattern.  */
> +         vec_perm_builder sel (nelt, 1, 3);
> +         sel.quick_grow (3);
> +         for (i = 0; i < 3; i++)
>             sel[i] = i * 2;
>           vec_perm_indices indices (sel, 2, nelt);
>           if (can_vec_perm_const_p (mode, indices))
>             {
> -             for (i = 0; i < nelt; i++)
> +             for (i = 0; i < 3; i++)
>                 sel[i] = i * 2 + 1;
>               indices.new_vector (sel, 2, nelt);
>               if (can_vec_perm_const_p (mode, indices))
> @@ -5332,9 +5339,6 @@ vect_permute_load_chain (vec<tree> dr_ch
>    unsigned int i, j, log_length = exact_log2 (length);
>    unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype);
>
> -  vec_perm_builder sel (nelt, nelt, 1);
> -  sel.quick_grow (nelt);
> -
>    result_chain->quick_grow (length);
>    memcpy (result_chain->address (), dr_chain.address (),
>           length * sizeof (tree));
> @@ -5343,6 +5347,8 @@ vect_permute_load_chain (vec<tree> dr_ch
>      {
>        unsigned int k;
>
> +      vec_perm_builder sel (nelt, nelt, 1);
> +      sel.quick_grow (nelt);
>        vec_perm_indices indices;
>        for (k = 0; k < 3; k++)
>         {
> @@ -5390,12 +5396,15 @@ vect_permute_load_chain (vec<tree> dr_ch
>        /* If length is not equal to 3 then only power of 2 is supported.  */
>        gcc_assert (pow2p_hwi (length));
>
> -      for (i = 0; i < nelt; ++i)
> +      /* The encoding has a single stepped pattern.  */
> +      vec_perm_builder sel (nelt, 1, 3);
> +      sel.quick_grow (3);
> +      for (i = 0; i < 3; ++i)
>         sel[i] = i * 2;
>        vec_perm_indices indices (sel, 2, nelt);
>        perm_mask_even = vect_gen_perm_mask_checked (vectype, indices);
>
> -      for (i = 0; i < nelt; ++i)
> +      for (i = 0; i < 3; ++i)
>         sel[i] = i * 2 + 1;
>        indices.new_vector (sel, 2, nelt);
>        perm_mask_odd = vect_gen_perm_mask_checked (vectype, indices);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [11/13] Use vec_perm_builder::series_p in shift_amt_for_vec_perm_mask
  2017-12-09 23:24   ` [11/13] Use vec_perm_builder::series_p in shift_amt_for_vec_perm_mask Richard Sandiford
  2017-12-19 20:37     ` Richard Sandiford
@ 2018-01-02 13:08     ` Richard Biener
  1 sibling, 0 replies; 46+ messages in thread
From: Richard Biener @ 2018-01-02 13:08 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Sun, Dec 10, 2017 at 12:24 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch makes shift_amt_for_vec_perm_mask use series_p to check
> for the simple case of a natural linear series before falling back
> to testing each element individually.  The series_p test works with
> variable-length vectors but testing every individual element doesn't.
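
In effect the fast path is (sketch, same arguments as in the hunk that
follows):

   if (sel.series_p (0, 1, first, 1))
     /* SEL is { first, first + 1, first + 2, ... }, i.e. a whole-vector
        shift by FIRST elements, valid for any vector length.  */
     return GEN_INT (first * bitsize);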

Ok.

Richard.

>
> 2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
>         * optabs.c (shift_amt_for_vec_perm_mask): Try using series_p
>         before testing each element individually.
>
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c        2017-12-09 22:48:52.266015836 +0000
> +++ gcc/optabs.c        2017-12-09 22:48:56.257154317 +0000
> @@ -5375,20 +5375,20 @@ vector_compare_rtx (machine_mode cmp_mod
>  static rtx
>  shift_amt_for_vec_perm_mask (machine_mode mode, const vec_perm_indices &sel)
>  {
> -  unsigned int i, first, nelt = GET_MODE_NUNITS (mode);
> +  unsigned int nelt = GET_MODE_NUNITS (mode);
>    unsigned int bitsize = GET_MODE_UNIT_BITSIZE (mode);
> -
> -  first = sel[0];
> +  unsigned int first = sel[0];
>    if (first >= nelt)
>      return NULL_RTX;
> -  for (i = 1; i < nelt; i++)
> -    {
> -      int idx = sel[i];
> -      unsigned int expected = i + first;
> -      /* Indices into the second vector are all equivalent.  */
> -      if (idx < 0 || (MIN (nelt, (unsigned) idx) != MIN (nelt, expected)))
> -       return NULL_RTX;
> -    }
> +
> +  if (!sel.series_p (0, 1, first, 1))
> +    for (unsigned int i = 1; i < nelt; i++)
> +      {
> +       unsigned int expected = i + first;
> +       /* Indices into the second vector are all equivalent.  */
> +       if (MIN (nelt, sel[i]) != MIN (nelt, expected))
> +         return NULL_RTX;
> +      }
>
>    return GEN_INT (first * bitsize);
>  }

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [10/13] Rework VEC_PERM_EXPR folding
  2017-12-09 23:23 ` [10/13] Rework VEC_PERM_EXPR folding Richard Sandiford
                     ` (2 preceding siblings ...)
  2017-12-19 20:37   ` [10/13] Rework VEC_PERM_EXPR folding Richard Sandiford
@ 2018-01-02 13:08   ` Richard Biener
  3 siblings, 0 replies; 46+ messages in thread
From: Richard Biener @ 2018-01-02 13:08 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Sun, Dec 10, 2017 at 12:23 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch reworks the VEC_PERM_EXPR folding so that more of it works
> for variable-length vectors.  E.g. it means that we can now recognise
> variable-length permutes that reduce to a single vector, or cases in
> which a variable-length permute only needs one input.  There should be
> no functional change for fixed-length vectors.
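
For instance (sketch only, using the helpers added earlier in the
series; op0 and op1 are placeholder names for the two inputs):

  if (sel.series_p (0, 1, 0, 1))
    /* The selector is { 0, 1, 2, ... }, so the permute reduces to its
       first input.  */
    return op0;

  if (sel.all_from_input_p (0))
    /* Every selected element comes from the first input, so the second
       operand is dead and can be replaced by the first.  */
    op1 = op0;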

Ok.

Richard.

>
> 2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
>         * selftest.h (selftest::vec_perm_indices_c_tests): Declare.
>         * selftest-run-tests.c (selftest::run_tests): Call it.
>         * vector-builder.h (vector_builder::operator ==): New function.
>         (vector_builder::operator !=): Likewise.
>         * vec-perm-indices.h (vec_perm_indices::series_p): Declare.
>         (vec_perm_indices::all_from_input_p): New function.
>         * vec-perm-indices.c (vec_perm_indices::series_p): Likewise.
>         (test_vec_perm_12, selftest::vec_perm_indices_c_tests): Likewise.
>         * fold-const.c (fold_ternary_loc): Use tree_to_vec_perm_builder
>         instead of reading the VECTOR_CST directly.  Detect whether both
>         vector inputs are the same before constructing the vec_perm_indices,
>         and update the number of inputs argument accordingly.  Use the
>         utility functions added above.  Only construct sel2 if we need to.
>
> Index: gcc/selftest.h
> ===================================================================
> *** gcc/selftest.h      2017-12-09 23:06:55.002855594 +0000
> --- gcc/selftest.h      2017-12-09 23:21:51.517599734 +0000
> *************** extern void vec_c_tests ();
> *** 201,206 ****
> --- 201,207 ----
>   extern void wide_int_cc_tests ();
>   extern void predict_c_tests ();
>   extern void simplify_rtx_c_tests ();
> + extern void vec_perm_indices_c_tests ();
>
>   extern int num_passes;
>
> Index: gcc/selftest-run-tests.c
> ===================================================================
> *** gcc/selftest-run-tests.c    2017-12-09 23:06:55.002855594 +0000
> --- gcc/selftest-run-tests.c    2017-12-09 23:21:51.517599734 +0000
> *************** selftest::run_tests ()
> *** 73,78 ****
> --- 73,79 ----
>
>     /* Mid-level data structures.  */
>     input_c_tests ();
> +   vec_perm_indices_c_tests ();
>     tree_c_tests ();
>     gimple_c_tests ();
>     rtl_tests_c_tests ();
> Index: gcc/vector-builder.h
> ===================================================================
> *** gcc/vector-builder.h        2017-12-09 23:06:55.002855594 +0000
> --- gcc/vector-builder.h        2017-12-09 23:21:51.518600090 +0000
> *************** #define GCC_VECTOR_BUILDER_H
> *** 97,102 ****
> --- 97,105 ----
>     bool encoded_full_vector_p () const;
>     T elt (unsigned int) const;
>
> +   bool operator == (const Derived &) const;
> +   bool operator != (const Derived &x) const { return !operator == (x); }
> +
>     void finalize ();
>
>   protected:
> *************** vector_builder<T, Derived>::new_vector (
> *** 168,173 ****
> --- 171,196 ----
>     this->truncate (0);
>   }
>
> + /* Return true if this vector and OTHER have the same elements and
> +    are encoded in the same way.  */
> +
> + template<typename T, typename Derived>
> + bool
> + vector_builder<T, Derived>::operator == (const Derived &other) const
> + {
> +   if (m_full_nelts != other.m_full_nelts
> +       || m_npatterns != other.m_npatterns
> +       || m_nelts_per_pattern != other.m_nelts_per_pattern)
> +     return false;
> +
> +   unsigned int nelts = encoded_nelts ();
> +   for (unsigned int i = 0; i < nelts; ++i)
> +     if (!derived ()->equal_p ((*this)[i], other[i]))
> +       return false;
> +
> +   return true;
> + }
> +
>   /* Return the value of vector element I, which might or might not be
>      encoded explicitly.  */
>
> Index: gcc/vec-perm-indices.h
> ===================================================================
> *** gcc/vec-perm-indices.h      2017-12-09 23:20:13.233112018 +0000
> --- gcc/vec-perm-indices.h      2017-12-09 23:21:51.517599734 +0000
> *************** typedef int_vector_builder<HOST_WIDE_INT
> *** 62,68 ****
> --- 62,70 ----
>
>     element_type clamp (element_type) const;
>     element_type operator[] (unsigned int i) const;
> +   bool series_p (unsigned int, unsigned int, element_type, element_type) const;
>     bool all_in_range_p (element_type, element_type) const;
> +   bool all_from_input_p (unsigned int) const;
>
>   private:
>     vec_perm_indices (const vec_perm_indices &);
> *************** vec_perm_indices::operator[] (unsigned i
> *** 119,122 ****
> --- 121,133 ----
>     return clamp (m_encoding.elt (i));
>   }
>
> + /* Return true if the permutation vector only selects elements from
> +    input I.  */
> +
> + inline bool
> + vec_perm_indices::all_from_input_p (unsigned int i) const
> + {
> +   return all_in_range_p (i * m_nelts_per_input, m_nelts_per_input);
> + }
> +
>   #endif
> Index: gcc/vec-perm-indices.c
> ===================================================================
> *** gcc/vec-perm-indices.c      2017-12-09 23:20:13.233112018 +0000
> --- gcc/vec-perm-indices.c      2017-12-09 23:21:51.517599734 +0000
> *************** Software Foundation; either version 3, o
> *** 28,33 ****
> --- 28,34 ----
>   #include "rtl.h"
>   #include "memmodel.h"
>   #include "emit-rtl.h"
> + #include "selftest.h"
>
>   /* Switch to a new permutation vector that selects between NINPUTS vector
>      inputs that have NELTS_PER_INPUT elements each.  Take the elements of the
> *************** vec_perm_indices::rotate_inputs (int del
> *** 85,90 ****
> --- 86,139 ----
>       m_encoding[i] = clamp (m_encoding[i] + element_delta);
>   }
>
> + /* Return true if index OUT_BASE + I * OUT_STEP selects input
> +    element IN_BASE + I * IN_STEP.  */
> +
> + bool
> + vec_perm_indices::series_p (unsigned int out_base, unsigned int out_step,
> +                           element_type in_base, element_type in_step) const
> + {
> +   /* Check the base value.  */
> +   if (clamp (m_encoding.elt (out_base)) != clamp (in_base))
> +     return false;
> +
> +   unsigned int full_nelts = m_encoding.full_nelts ();
> +   unsigned int npatterns = m_encoding.npatterns ();
> +
> +   /* Calculate which multiple of OUT_STEP elements we need to get
> +      back to the same pattern.  */
> +   unsigned int cycle_length = least_common_multiple (out_step, npatterns);
> +
> +   /* Check the steps.  */
> +   in_step = clamp (in_step);
> +   out_base += out_step;
> +   unsigned int limit = 0;
> +   for (;;)
> +     {
> +       /* Succeed if we've checked all the elements in the vector.  */
> +       if (out_base >= full_nelts)
> +       return true;
> +
> +       if (out_base >= npatterns)
> +       {
> +         /* We've got to the end of the "foreground" values.  Check
> +            2 elements from each pattern in the "background" values.  */
> +         if (limit == 0)
> +           limit = out_base + cycle_length * 2;
> +         else if (out_base >= limit)
> +           return true;
> +       }
> +
> +       element_type v0 = m_encoding.elt (out_base - out_step);
> +       element_type v1 = m_encoding.elt (out_base);
> +       if (clamp (v1 - v0) != in_step)
> +       return false;
> +
> +       out_base += out_step;
> +     }
> +   return true;
> + }
> +
>   /* Return true if all elements of the permutation vector are in the range
>      [START, START + SIZE).  */
>
> *************** vec_perm_indices_to_rtx (machine_mode mo
> *** 180,182 ****
> --- 229,280 ----
>       RTVEC_ELT (v, i) = gen_int_mode (indices[i], GET_MODE_INNER (mode));
>     return gen_rtx_CONST_VECTOR (mode, v);
>   }
> +
> + #if CHECKING_P
> +
> + namespace selftest {
> +
> + /* Test a 12-element vector.  */
> +
> + static void
> + test_vec_perm_12 (void)
> + {
> +   vec_perm_builder builder (12, 12, 1);
> +   for (unsigned int i = 0; i < 4; ++i)
> +     {
> +       builder.quick_push (i * 5);
> +       builder.quick_push (3 + i);
> +       builder.quick_push (2 + 3 * i);
> +     }
> +   vec_perm_indices indices (builder, 1, 12);
> +   ASSERT_TRUE (indices.series_p (0, 3, 0, 5));
> +   ASSERT_FALSE (indices.series_p (0, 3, 3, 5));
> +   ASSERT_FALSE (indices.series_p (0, 3, 0, 8));
> +   ASSERT_TRUE (indices.series_p (1, 3, 3, 1));
> +   ASSERT_TRUE (indices.series_p (2, 3, 2, 3));
> +
> +   ASSERT_TRUE (indices.series_p (0, 4, 0, 4));
> +   ASSERT_FALSE (indices.series_p (1, 4, 3, 4));
> +
> +   ASSERT_TRUE (indices.series_p (0, 6, 0, 10));
> +   ASSERT_FALSE (indices.series_p (0, 6, 0, 100));
> +
> +   ASSERT_FALSE (indices.series_p (1, 10, 3, 7));
> +   ASSERT_TRUE (indices.series_p (1, 10, 3, 8));
> +
> +   ASSERT_TRUE (indices.series_p (0, 12, 0, 10));
> +   ASSERT_TRUE (indices.series_p (0, 12, 0, 11));
> +   ASSERT_TRUE (indices.series_p (0, 12, 0, 100));
> + }
> +
> + /* Run selftests for this file.  */
> +
> + void
> + vec_perm_indices_c_tests ()
> + {
> +   test_vec_perm_12 ();
> + }
> +
> + } // namespace selftest
> +
> + #endif
> Index: gcc/fold-const.c
> ===================================================================
> *** gcc/fold-const.c    2017-12-09 23:18:12.040041251 +0000
> --- gcc/fold-const.c    2017-12-09 23:21:51.517599734 +0000
> *************** fold_ternary_loc (location_t loc, enum t
> *** 11547,11645 ****
>       case VEC_PERM_EXPR:
>         if (TREE_CODE (arg2) == VECTOR_CST)
>         {
> !         unsigned int nelts = VECTOR_CST_NELTS (arg2), i, mask, mask2;
> !         bool need_mask_canon = false;
> !         bool need_mask_canon2 = false;
> !         bool all_in_vec0 = true;
> !         bool all_in_vec1 = true;
> !         bool maybe_identity = true;
> !         bool single_arg = (op0 == op1);
> !         bool changed = false;
> !
> !         mask2 = 2 * nelts - 1;
> !         mask = single_arg ? (nelts - 1) : mask2;
> !         gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));
> !         vec_perm_builder sel (nelts, nelts, 1);
> !         vec_perm_builder sel2 (nelts, nelts, 1);
> !         for (i = 0; i < nelts; i++)
> !           {
> !             tree val = VECTOR_CST_ELT (arg2, i);
> !             if (TREE_CODE (val) != INTEGER_CST)
> !               return NULL_TREE;
> !
> !             /* Make sure that the perm value is in an acceptable
> !                range.  */
> !             wi::tree_to_wide_ref t = wi::to_wide (val);
> !             need_mask_canon |= wi::gtu_p (t, mask);
> !             need_mask_canon2 |= wi::gtu_p (t, mask2);
> !             unsigned int elt = t.to_uhwi () & mask;
> !             unsigned int elt2 = t.to_uhwi () & mask2;
> !
> !             if (elt < nelts)
> !               all_in_vec1 = false;
> !             else
> !               all_in_vec0 = false;
> !
> !             if ((elt & (nelts - 1)) != i)
> !               maybe_identity = false;
> !
> !             sel.quick_push (elt);
> !             sel2.quick_push (elt2);
> !           }
>
> !         if (maybe_identity)
> !           {
> !             if (all_in_vec0)
> !               return op0;
> !             if (all_in_vec1)
> !               return op1;
> !           }
>
> !         if (all_in_vec0)
> !           op1 = op0;
> !         else if (all_in_vec1)
> !           {
> !             op0 = op1;
> !             for (i = 0; i < nelts; i++)
> !               sel[i] -= nelts;
> !             need_mask_canon = true;
>             }
>
> -         vec_perm_indices indices (sel, 2, nelts);
>           if ((TREE_CODE (op0) == VECTOR_CST
>                || TREE_CODE (op0) == CONSTRUCTOR)
>               && (TREE_CODE (op1) == VECTOR_CST
>                   || TREE_CODE (op1) == CONSTRUCTOR))
>             {
> !             tree t = fold_vec_perm (type, op0, op1, indices);
>               if (t != NULL_TREE)
>                 return t;
>             }
>
> !         if (op0 == op1 && !single_arg)
> !           changed = true;
>
> !         /* Some targets are deficient and fail to expand a single
> !            argument permutation while still allowing an equivalent
> !            2-argument version.  */
> !         if (need_mask_canon && arg2 == op2
> !             && !can_vec_perm_const_p (TYPE_MODE (type), indices, false)
> !             && can_vec_perm_const_p (TYPE_MODE (type),
> !                                      vec_perm_indices (sel2, 2, nelts),
> !                                      false))
>             {
> !             need_mask_canon = need_mask_canon2;
> !             sel.truncate (0);
> !             sel.splice (sel2);
> !           }
> !
> !         if (need_mask_canon && arg2 == op2)
> !           {
> !             tree eltype = TREE_TYPE (TREE_TYPE (arg2));
> !             tree_vector_builder tsel (TREE_TYPE (arg2), nelts, 1);
> !             for (i = 0; i < nelts; i++)
> !               tsel.quick_push (build_int_cst (eltype, sel[i]));
> !             op2 = tsel.build ();
>               changed = true;
>             }
>
> --- 11547,11611 ----
>       case VEC_PERM_EXPR:
>         if (TREE_CODE (arg2) == VECTOR_CST)
>         {
> !         /* Build a vector of integers from the tree mask.  */
> !         vec_perm_builder builder;
> !         if (!tree_to_vec_perm_builder (&builder, arg2))
> !           return NULL_TREE;
>
> !         /* Create a vec_perm_indices for the integer vector.  */
> !         unsigned int nelts = TYPE_VECTOR_SUBPARTS (type);
> !         bool single_arg = (op0 == op1);
> !         vec_perm_indices sel (builder, single_arg ? 1 : 2, nelts);
>
> !         /* Check for cases that fold to OP0 or OP1 in their original
> !            element order.  */
> !         if (sel.series_p (0, 1, 0, 1))
> !           return op0;
> !         if (sel.series_p (0, 1, nelts, 1))
> !           return op1;
> !
> !         if (!single_arg)
> !           {
> !             if (sel.all_from_input_p (0))
> !               op1 = op0;
> !             else if (sel.all_from_input_p (1))
> !               {
> !                 op0 = op1;
> !                 sel.rotate_inputs (1);
> !               }
>             }
>
>           if ((TREE_CODE (op0) == VECTOR_CST
>                || TREE_CODE (op0) == CONSTRUCTOR)
>               && (TREE_CODE (op1) == VECTOR_CST
>                   || TREE_CODE (op1) == CONSTRUCTOR))
>             {
> !             tree t = fold_vec_perm (type, op0, op1, sel);
>               if (t != NULL_TREE)
>                 return t;
>             }
>
> !         bool changed = (op0 == op1 && !single_arg);
>
> !         /* Generate a canonical form of the selector.  */
> !         if (arg2 == op2 && sel.encoding () != builder)
>             {
> !             /* Some targets are deficient and fail to expand a single
> !                argument permutation while still allowing an equivalent
> !                2-argument version.  */
> !             if (sel.ninputs () == 2
> !                 || can_vec_perm_const_p (TYPE_MODE (type), sel, false))
> !               op2 = vec_perm_indices_to_tree (TREE_TYPE (arg2), sel);
> !             else
> !               {
> !                 vec_perm_indices sel2 (builder, 2, nelts);
> !                 if (can_vec_perm_const_p (TYPE_MODE (type), sel2, false))
> !                   op2 = vec_perm_indices_to_tree (TREE_TYPE (arg2), sel2);
> !                 else
> !                   /* Not directly supported with either encoding,
> !                      so use the preferred form.  */
> !                   op2 = vec_perm_indices_to_tree (TREE_TYPE (arg2), sel);
> !               }
>               changed = true;
>             }
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [12/13] Use ssizetype selectors for autovectorised VEC_PERM_EXPRs
  2017-12-09 23:25   ` [12/13] Use ssizetype selectors for autovectorised VEC_PERM_EXPRs Richard Sandiford
  2017-12-19 20:37     ` Richard Sandiford
@ 2018-01-02 13:09     ` Richard Biener
  1 sibling, 0 replies; 46+ messages in thread
From: Richard Biener @ 2018-01-02 13:09 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Sun, Dec 10, 2017 at 12:25 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> The previous patches mean that there's no reason for a constant
> VEC_PERM_EXPR selector to have the same shape as the data inputs.
> This patch makes the autovectoriser use ssizetype elements instead,
> so that indices don't get truncated for large or variable-length vectors.
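>
> For example, with QImode data elements the selector elements also had
> to be QImode, so once the two inputs together have more than 256 byte
> elements, some selector values can no longer be represented without
> truncation; ssizetype selectors avoid that limit.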

Ok.

Richard.

>
> 2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
>         * tree-cfg.c (verify_gimple_assign_ternary): Allow the size of
>         the selector elements to be different from the data elements
>         if the selector is a VECTOR_CST.
>         * tree-vect-stmts.c (vect_gen_perm_mask_any): Use a vector of
>         ssizetype for the selector.
>
> Index: gcc/tree-cfg.c
> ===================================================================
> --- gcc/tree-cfg.c      2017-12-09 22:47:07.103588314 +0000
> +++ gcc/tree-cfg.c      2017-12-09 22:48:58.259216407 +0000
> @@ -4300,8 +4300,11 @@ verify_gimple_assign_ternary (gassign *s
>         }
>
>        if (TREE_CODE (TREE_TYPE (rhs3_type)) != INTEGER_TYPE
> -         || GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE (TREE_TYPE (rhs3_type)))
> -            != GET_MODE_BITSIZE (SCALAR_TYPE_MODE (TREE_TYPE (rhs1_type))))
> +         || (TREE_CODE (rhs3) != VECTOR_CST
> +             && (GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE
> +                                   (TREE_TYPE (rhs3_type)))
> +                 != GET_MODE_BITSIZE (SCALAR_TYPE_MODE
> +                                      (TREE_TYPE (rhs1_type))))))
>         {
>           error ("invalid mask type in vector permute expression");
>           debug_generic_expr (lhs_type);
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2017-12-09 22:48:52.268015910 +0000
> +++ gcc/tree-vect-stmts.c       2017-12-09 22:48:58.259216407 +0000
> @@ -6518,11 +6518,12 @@ vectorizable_store (gimple *stmt, gimple
>  tree
>  vect_gen_perm_mask_any (tree vectype, const vec_perm_indices &sel)
>  {
> -  tree mask_elt_type, mask_type;
> +  tree mask_type;
>
> -  mask_elt_type = lang_hooks.types.type_for_mode
> -    (int_mode_for_mode (TYPE_MODE (TREE_TYPE (vectype))).require (), 1);
> -  mask_type = get_vectype_for_scalar_type (mask_elt_type);
> +  unsigned int nunits = sel.length ();
> +  gcc_assert (nunits == TYPE_VECTOR_SUBPARTS (vectype));
> +
> +  mask_type = build_vector_type (ssizetype, nunits);
>    return vec_perm_indices_to_tree (mask_type, sel);
>  }
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [07/13] Make vec_perm_indices use new vector encoding
  2017-12-20 13:48         ` Richard Sandiford
@ 2018-01-02 13:15           ` Richard Biener
  2018-01-02 18:30             ` Richard Sandiford
  0 siblings, 1 reply; 46+ messages in thread
From: Richard Biener @ 2018-01-02 13:15 UTC (permalink / raw)
  To: Richard Biener, GCC Patches, Richard Sandiford

On Wed, Dec 20, 2017 at 2:48 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Tue, Dec 12, 2017 at 4:46 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>> On Sun, Dec 10, 2017 at 12:20 AM, Richard Sandiford
>>>> <richard.sandiford@linaro.org> wrote:
>>>>> This patch changes vec_perm_indices from a plain vec<> to a class
>>>>> that stores a canonicalised permutation, using the same encoding
>>>>> as for VECTOR_CSTs.  This means that vec_perm_indices now carries
>>>>> information about the number of vectors being permuted (currently
>>>>> always 1 or 2) and the number of elements in each input vector.
>>>>
>>>> Before I dive into  the C++ details can you explain why it needs this
>>>> info and how it encodes it for variable-length vectors?  To interleave
>>>> two vectors you need sth like { 0, N, 1, N+1, ... }, I'm not sure we
>>>> can directly encode N here, can we?  extract even/odd should just
>>>> work as { 0, 2, 4, 6, ...} without knowledge of whether we permute
>>>> one or two vectors (the one vector case just has two times the same
>>>> vector) or how many elements each of the vectors (or the result) has.
>>>
>>> One of the later patches switches the element types to HOST_WIDE_INT,
>>> so that we can represent all ssizetypes.  Then there's a poly_int
>>> patch (not yet posted) to make that poly_int64, so that we can
>>> represent the N even for variable-length vectors.
>>>
>>> The class needs to know the number of elements because that affects
>>> the canonical representation.  E.g. extract even on fixed-length
>>> vectors with both inputs the same should be { 0, 2, 4, ..., 0, 2, 4 ... },
>>> which we can't encode as a simple series.  Interleave low with both
>>> inputs the same should be { 0, 0, 1, 1, ... } for both fixed-length and
>>> variable-length vectors.
>>
>> Huh?  extract even is { 0, 2, 4, 6, 8 ... }; indexes in the selection vector
>> are referencing concat'ed input vectors.  So yes, for two same vectors
>> that's effectively { 0, 2, 4, ..., 0, 2, 4, ... } but I don't see why
>> that should
>> be the canonical form?
>
> Current practice is to use the single-input form where possible,
> if both inputs are the same (see e.g. the VEC_PERM_EXPR handling
> in fold-const.c).  It means that things like:
>
>     _1 = VEC_PERM_EXPR <a, a, { 0, 2, 4, 6, 0, 2, 4, 6 }>;
>     _2 = VEC_PERM_EXPR <a, a, { 0, 2, 4, 6, 8, 10, 12, 14 }>;
>     _3 = VEC_PERM_EXPR <a, b, { 0, 2, 4, 6, 0, 2, 4, 6 }>;
>
> get folded to the same sequence, and so can be CSEd.
>
> We could instead convert the single-input form to use the two-input
> selector, but that would be harder.  The advantage of treating the
> single-input form as canonical is that it works even for irregular
> permutes.
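>
> In terms of the new class, a sketch of how the first two selectors
> above end up identical (using only the constructors from the patch):
>
>     vec_perm_builder builder (8, 8, 1);
>     for (unsigned int i = 0; i < 8; ++i)
>       builder.quick_push (i * 2);          /* 0, 2, 4, ..., 14  */
>     vec_perm_indices sel (builder, 1, 8);  /* one 8-element input  */
>     /* Each element is clamped modulo 8, so sel reads back as
>        0, 2, 4, 6, 0, 2, 4, 6, matching the first mask above.  */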

Ok, I see.  Maybe adding a comment along these lines would help.

Thanks,
Richard.

> Thanks,
> Richard
>
>>> Also, operator[] is supposed to return an in-range selector even if
>>> the selector element is only implicitly encoded.  So we need to know
>>> the number of input elements there.
>>>
>>> Separating the number of input elements into the number of inputs
>>> and the number of elements per input isn't really necessary, but it
>>> makes it easier to provide routines for testing whether all selected
>>> elements come from a particular input, and for rotating the selector
>>> by a whole number of inputs.
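>>>
>>> Later in the series, the fold-const.c change uses that pair of
>>> routines roughly as:
>>>
>>>       if (sel.all_from_input_p (0))
>>>         op1 = op0;
>>>       else if (sel.all_from_input_p (1))
>>>         {
>>>           op0 = op1;
>>>           sel.rotate_inputs (1);
>>>         }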
>>>
>>> Thanks,
>>> Richard

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [07/13] Make vec_perm_indices use new vector encoding
  2018-01-02 13:15           ` Richard Biener
@ 2018-01-02 18:30             ` Richard Sandiford
  0 siblings, 0 replies; 46+ messages in thread
From: Richard Sandiford @ 2018-01-02 18:30 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Wed, Dec 20, 2017 at 2:48 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Tue, Dec 12, 2017 at 4:46 PM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>>> On Sun, Dec 10, 2017 at 12:20 AM, Richard Sandiford
>>>>> <richard.sandiford@linaro.org> wrote:
>>>>>> This patch changes vec_perm_indices from a plain vec<> to a class
>>>>>> that stores a canonicalised permutation, using the same encoding
>>>>>> as for VECTOR_CSTs.  This means that vec_perm_indices now carries
>>>>>> information about the number of vectors being permuted (currently
>>>>>> always 1 or 2) and the number of elements in each input vector.
>>>>>
>>>>> Before I dive into  the C++ details can you explain why it needs this
>>>>> info and how it encodes it for variable-length vectors?  To interleave
>>>>> two vectors you need sth like { 0, N, 1, N+1, ... }, I'm not sure we
>>>>> can directly encode N here, can we?  extract even/odd should just
>>>>> work as { 0, 2, 4, 6, ...} without knowledge of whether we permute
>>>>> one or two vectors (the one vector case just has two times the same
>>>>> vector) or how many elements each of the vectors (or the result) has.
>>>>
>>>> One of the later patches switches the element types to HOST_WIDE_INT,
>>>> so that we can represent all ssizetypes.  Then there's a poly_int
>>>> patch (not yet posted) to make that poly_int64, so that we can
>>>> represent the N even for variable-length vectors.
>>>>
>>>> The class needs to know the number of elements because that affects
>>>> the canonical representation.  E.g. extract even on fixed-length
>>>> vectors with both inputs the same should be { 0, 2, 4, ..., 0, 2, 4 ... },
>>>> which we can't encode as a simple series.  Interleave low with both
>>>> inputs the same should be { 0, 0, 1, 1, ... } for both fixed-length and
>>>> variable-length vectors.
>>>
>>> Huh?  extract even is { 0, 2, 4, 6, 8 ... }; indexes in the selection vector
>>> are referencing concat'ed input vectors.  So yes, for two same vectors
>>> that's effectively { 0, 2, 4, ..., 0, 2, 4, ... } but I don't see why
>>> that should
>>> be the canonical form?
>>
>> Current practice is to use the single-input form where possible,
>> if both inputs are the same (see e.g. the VEC_PERM_EXPR handling
>> in fold-const.c).  It means that things like:
>>
>>     _1 = VEC_PERM_EXPR <a, a, { 0, 2, 4, 6, 0, 2, 4, 6 }>;
>>     _2 = VEC_PERM_EXPR <a, a, { 0, 2, 4, 6, 8, 10, 12, 14 }>;
>>     _3 = VEC_PERM_EXPR <a, b, { 0, 2, 4, 6, 0, 2, 4, 6 }>;
>>
>> get folded to the same sequence, and so can be CSEd.
>>
>> We could instead convert the single-input form to use the two-input
>> selector, but that would be harder.  The advantage of treating the
>> single-input form as canonical is that it works even for irregular
>> permutes.
>
> Ok, I see.  Maybe adding a comment along these lines would help.

OK, thanks, installed as below with that change.

Richard


2018-01-02  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* int-vector-builder.h: New file.
	* vec-perm-indices.h: Include int-vector-builder.h.
	(vec_perm_indices): Redefine as an int_vector_builder.
	(auto_vec_perm_indices): Delete.
	(vec_perm_builder): Redefine as a stand-alone class.
	(vec_perm_indices::vec_perm_indices): New function.
	(vec_perm_indices::clamp): Likewise.
	* vec-perm-indices.c: Include fold-const.h and tree-vector-builder.h.
	(vec_perm_indices::new_vector): New function.
	(vec_perm_indices::new_expanded_vector): Update for new
	vec_perm_indices class.
	(vec_perm_indices::rotate_inputs): New function.
	(vec_perm_indices::all_in_range_p): Operate directly on the
	encoded form, without computing elided elements.
	(tree_to_vec_perm_builder): Operate directly on the VECTOR_CST
	encoding.  Update for new vec_perm_indices class.
	* optabs.c (expand_vec_perm_const): Create a vec_perm_indices for
	the given vec_perm_builder.
	(expand_vec_perm_var): Update vec_perm_builder constructor.
	(expand_mult_highpart): Use vec_perm_builder instead of
	auto_vec_perm_indices.
	* optabs-query.c (can_mult_highpart_p): Use vec_perm_builder and
	vec_perm_indices instead of auto_vec_perm_indices.  Use a single
	or double series encoding as appropriate.
	* fold-const.c (fold_ternary_loc): Use vec_perm_builder and
	vec_perm_indices instead of auto_vec_perm_indices.
	* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
	* tree-vect-data-refs.c (vect_grouped_store_supported): Likewise.
	(vect_permute_store_chain): Likewise.
	(vect_grouped_load_supported): Likewise.
	(vect_permute_load_chain): Likewise.
	(vect_shift_permute_load_chain): Likewise.
	* tree-vect-slp.c (vect_build_slp_tree_1): Likewise.
	(vect_transform_slp_perm_load): Likewise.
	(vect_schedule_slp_instance): Likewise.
	* tree-vect-stmts.c (perm_mask_for_reverse): Likewise.
	(vectorizable_mask_load_store): Likewise.
	(vectorizable_bswap): Likewise.
	(vectorizable_store): Likewise.
	(vectorizable_load): Likewise.
	* tree-vect-generic.c (lower_vec_perm): Use vec_perm_builder and
	vec_perm_indices instead of auto_vec_perm_indices.  Use
	tree_to_vec_perm_builder to read the vector from a tree.
	* tree-vect-loop.c (calc_vec_perm_mask_for_shift): Take a
	vec_perm_builder instead of a vec_perm_indices.
	(have_whole_vector_shift): Use vec_perm_builder and
	vec_perm_indices instead of auto_vec_perm_indices.  Leave the
	truncation to calc_vec_perm_mask_for_shift.
	(vect_create_epilog_for_reduction): Likewise.
	* config/aarch64/aarch64.c (expand_vec_perm_d::perm): Change
	from auto_vec_perm_indices to vec_perm_indices.
	(aarch64_expand_vec_perm_const_1): Use rotate_inputs on d.perm
	instead of changing individual elements.
	(aarch64_vectorize_vec_perm_const): Use new_vector to install
	the vector in d.perm.
	* config/arm/arm.c (expand_vec_perm_d::perm): Change
	from auto_vec_perm_indices to vec_perm_indices.
	(arm_expand_vec_perm_const_1): Use rotate_inputs on d.perm
	instead of changing individual elements.
	(arm_vectorize_vec_perm_const): Use new_vector to install
	the vector in d.perm.
	* config/powerpcspe/powerpcspe.c (rs6000_expand_extract_even):
	Update vec_perm_builder constructor.
	(rs6000_expand_interleave): Likewise.
	* config/rs6000/rs6000.c (rs6000_expand_extract_even): Likewise.
	(rs6000_expand_interleave): Likewise.
------------------------------------------------------------------------------

Index: gcc/int-vector-builder.h
===================================================================
--- /dev/null	2017-12-30 11:27:13.464311244 +0000
+++ gcc/int-vector-builder.h	2018-01-02 17:01:28.746627393 +0000
@@ -0,0 +1,90 @@
+/* A class for building vector integer constants.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_INT_VECTOR_BUILDER_H
+#define GCC_INT_VECTOR_BUILDER_H 1
+
+#include "vector-builder.h"
+
+/* This class is used to build vectors of integer type T using the same
+   encoding as tree and rtx constants.  See vector_builder for more
+   details.  */
+template<typename T>
+class int_vector_builder : public vector_builder<T, int_vector_builder<T> >
+{
+  typedef vector_builder<T, int_vector_builder> parent;
+  friend class vector_builder<T, int_vector_builder>;
+
+public:
+  int_vector_builder () {}
+  int_vector_builder (unsigned int, unsigned int, unsigned int);
+
+  using parent::new_vector;
+
+private:
+  bool equal_p (T, T) const;
+  bool allow_steps_p () const { return true; }
+  bool integral_p (T) const { return true; }
+  T step (T, T) const;
+  T apply_step (T, unsigned int, T) const;
+  bool can_elide_p (T) const { return true; }
+  void note_representative (T *, T) {}
+};
+
+/* Create a new builder for a vector with FULL_NELTS elements.
+   Initially encode the value as NPATTERNS interleaved patterns with
+   NELTS_PER_PATTERN elements each.  */
+
+template<typename T>
+inline
+int_vector_builder<T>::int_vector_builder (unsigned int full_nelts,
+					   unsigned int npatterns,
+					   unsigned int nelts_per_pattern)
+{
+  new_vector (full_nelts, npatterns, nelts_per_pattern);
+}
+
+/* Return true if elements ELT1 and ELT2 are equal.  */
+
+template<typename T>
+inline bool
+int_vector_builder<T>::equal_p (T elt1, T elt2) const
+{
+  return elt1 == elt2;
+}
+
+/* Return the value of element ELT2 minus the value of element ELT1.  */
+
+template<typename T>
+inline T
+int_vector_builder<T>::step (T elt1, T elt2) const
+{
+  return elt2 - elt1;
+}
+
+/* Return a vector element with the value BASE + FACTOR * STEP.  */
+
+template<typename T>
+inline T
+int_vector_builder<T>::apply_step (T base, unsigned int factor, T step) const
+{
+  return base + factor * step;
+}
+
+#endif
Index: gcc/vec-perm-indices.h
===================================================================
--- gcc/vec-perm-indices.h	2018-01-02 17:01:26.633719678 +0000
+++ gcc/vec-perm-indices.h	2018-01-02 18:25:11.081076126 +0000
@@ -20,30 +20,117 @@ Software Foundation; either version 3, o
 #ifndef GCC_VEC_PERN_INDICES_H
 #define GCC_VEC_PERN_INDICES_H 1
 
+#include "int-vector-builder.h"
+
+/* A vector_builder for building constant permutation vectors.
+   The elements do not need to be clamped to a particular range
+   of input elements.  */
+typedef int_vector_builder<HOST_WIDE_INT> vec_perm_builder;
+
 /* This class represents a constant permutation vector, such as that used
-   as the final operand to a VEC_PERM_EXPR.  */
-class vec_perm_indices : public auto_vec<unsigned short, 32>
+   as the final operand to a VEC_PERM_EXPR.
+
+   Permutation vectors select indices modulo the number of input elements,
+   and the class canonicalizes each permutation vector for a particular
+   number of input vectors and for a particular number of elements per
+   input.  For example, the gimple statements:
+
+    _1 = VEC_PERM_EXPR <a, a, { 0, 2, 4, 6, 0, 2, 4, 6 }>;
+    _2 = VEC_PERM_EXPR <a, a, { 0, 2, 4, 6, 8, 10, 12, 14 }>;
+    _3 = VEC_PERM_EXPR <a, a, { 0, 2, 20, 22, 24, 2, 4, 14 }>;
+
+   effectively have only a single vector input "a".  If "a" has 8
+   elements, the indices select elements modulo 8, which makes all three
+   VEC_PERM_EXPRs equivalent.  The canonical form is for the indices to be
+   in the range [0, number of input elements - 1], so the class treats the
+   second and third permutation vectors as though they had been the first.
+
+   The class copes with cases in which the input and output vectors have
+   different numbers of elements.  */
+class vec_perm_indices
 {
-  typedef unsigned short element_type;
-  typedef auto_vec<element_type, 32> parent_type;
+  typedef HOST_WIDE_INT element_type;
 
 public:
-  vec_perm_indices () {}
-  vec_perm_indices (unsigned int nunits) : parent_type (nunits) {}
+  vec_perm_indices ();
+  vec_perm_indices (const vec_perm_builder &, unsigned int, unsigned int);
 
+  void new_vector (const vec_perm_builder &, unsigned int, unsigned int);
   void new_expanded_vector (const vec_perm_indices &, unsigned int);
+  void rotate_inputs (int delta);
+
+  /* Return the underlying vector encoding.  */
+  const vec_perm_builder &encoding () const { return m_encoding; }
+
+  /* Return the number of output elements.  This is called length ()
+     so that we present a more vec-like interface.  */
+  unsigned int length () const { return m_encoding.full_nelts (); }
 
+  /* Return the number of input vectors being permuted.  */
+  unsigned int ninputs () const { return m_ninputs; }
+
+  /* Return the number of elements in each input vector.  */
+  unsigned int nelts_per_input () const { return m_nelts_per_input; }
+
+  /* Return the total number of input elements.  */
+  unsigned int input_nelts () const { return m_ninputs * m_nelts_per_input; }
+
+  element_type clamp (element_type) const;
+  element_type operator[] (unsigned int i) const;
   bool all_in_range_p (element_type, element_type) const;
 
 private:
   vec_perm_indices (const vec_perm_indices &);
-};
 
-/* Temporary.  */
-typedef vec_perm_indices vec_perm_builder;
-typedef vec_perm_indices auto_vec_perm_indices;
+  vec_perm_builder m_encoding;
+  unsigned int m_ninputs;
+  unsigned int m_nelts_per_input;
+};
 
 bool tree_to_vec_perm_builder (vec_perm_builder *, tree);
 rtx vec_perm_indices_to_rtx (machine_mode, const vec_perm_indices &);
 
+inline
+vec_perm_indices::vec_perm_indices ()
+  : m_ninputs (0),
+    m_nelts_per_input (0)
+{
+}
+
+/* Construct a permutation vector that selects between NINPUTS vector
+   inputs that have NELTS_PER_INPUT elements each.  Take the elements of
+   the new vector from ELEMENTS, clamping each one to be in range.  */
+
+inline
+vec_perm_indices::vec_perm_indices (const vec_perm_builder &elements,
+				    unsigned int ninputs,
+				    unsigned int nelts_per_input)
+{
+  new_vector (elements, ninputs, nelts_per_input);
+}
+
+/* Return the canonical value for permutation vector element ELT,
+   taking into account the current number of input elements.  */
+
+inline vec_perm_indices::element_type
+vec_perm_indices::clamp (element_type elt) const
+{
+  element_type limit = input_nelts ();
+  elt %= limit;
+  /* Treat negative elements as counting from the end.  This only matters
+     if the vector size is not a power of 2.  */
+  if (elt < 0)
+    elt += limit;
+  return elt;
+}
+
+/* Return the value of vector element I, which might or might not be
+   explicitly encoded.  */
+
+inline vec_perm_indices::element_type
+vec_perm_indices::operator[] (unsigned int i) const
+{
+  return clamp (m_encoding.elt (i));
+}
+
 #endif
Index: gcc/vec-perm-indices.c
===================================================================
--- gcc/vec-perm-indices.c	2018-01-02 17:01:26.632719721 +0000
+++ gcc/vec-perm-indices.c	2018-01-02 17:01:28.750627219 +0000
@@ -22,11 +22,33 @@ Software Foundation; either version 3, o
 #include "coretypes.h"
 #include "vec-perm-indices.h"
 #include "tree.h"
+#include "fold-const.h"
+#include "tree-vector-builder.h"
 #include "backend.h"
 #include "rtl.h"
 #include "memmodel.h"
 #include "emit-rtl.h"
 
+/* Switch to a new permutation vector that selects between NINPUTS vector
+   inputs that have NELTS_PER_INPUT elements each.  Take the elements of the
+   new permutation vector from ELEMENTS, clamping each one to be in range.  */
+
+void
+vec_perm_indices::new_vector (const vec_perm_builder &elements,
+			      unsigned int ninputs,
+			      unsigned int nelts_per_input)
+{
+  m_ninputs = ninputs;
+  m_nelts_per_input = nelts_per_input;
+  /* Expand the encoding and clamp each element.  E.g. { 0, 2, 4, ... }
+     might wrap halfway if there is only one vector input.  */
+  unsigned int full_nelts = elements.full_nelts ();
+  m_encoding.new_vector (full_nelts, full_nelts, 1);
+  for (unsigned int i = 0; i < full_nelts; ++i)
+    m_encoding.quick_push (clamp (elements.elt (i)));
+  m_encoding.finalize ();
+}
+
 /* Switch to a new permutation vector that selects the same input elements
    as ORIG, but with each element split into FACTOR pieces.  For example,
    if ORIG is { 1, 2, 0, 3 } and FACTOR is 2, the new permutation is
@@ -36,14 +58,31 @@ Software Foundation; either version 3, o
 vec_perm_indices::new_expanded_vector (const vec_perm_indices &orig,
 				       unsigned int factor)
 {
-  truncate (0);
-  reserve (orig.length () * factor);
-  for (unsigned int i = 0; i < orig.length (); ++i)
+  m_ninputs = orig.m_ninputs;
+  m_nelts_per_input = orig.m_nelts_per_input * factor;
+  m_encoding.new_vector (orig.m_encoding.full_nelts () * factor,
+			 orig.m_encoding.npatterns () * factor,
+			 orig.m_encoding.nelts_per_pattern ());
+  unsigned int encoded_nelts = orig.m_encoding.encoded_nelts ();
+  for (unsigned int i = 0; i < encoded_nelts; ++i)
     {
-      element_type base = orig[i] * factor;
+      element_type base = orig.m_encoding[i] * factor;
       for (unsigned int j = 0; j < factor; ++j)
-	quick_push (base + j);
+	m_encoding.quick_push (base + j);
     }
+  m_encoding.finalize ();
+}
+
+/* Rotate the inputs of the permutation right by DELTA inputs.  This changes
+   the values of the permutation vector but it doesn't change the way that
+   the elements are encoded.  */
+
+void
+vec_perm_indices::rotate_inputs (int delta)
+{
+  element_type element_delta = delta * m_nelts_per_input;
+  for (unsigned int i = 0; i < m_encoding.length (); ++i)
+    m_encoding[i] = clamp (m_encoding[i] + element_delta);
 }
 
 /* Return true if all elements of the permutation vector are in the range
@@ -52,9 +91,44 @@ vec_perm_indices::new_expanded_vector (c
 bool
 vec_perm_indices::all_in_range_p (element_type start, element_type size) const
 {
-  for (unsigned int i = 0; i < length (); ++i)
-    if ((*this)[i] < start || ((*this)[i] - start) >= size)
+  /* Check the first two elements of each pattern.  */
+  unsigned int npatterns = m_encoding.npatterns ();
+  unsigned int nelts_per_pattern = m_encoding.nelts_per_pattern ();
+  unsigned int base_nelts = npatterns * MIN (nelts_per_pattern, 2);
+  for (unsigned int i = 0; i < base_nelts; ++i)
+    if (m_encoding[i] < start || (m_encoding[i] - start) >= size)
       return false;
+
+  /* For stepped encodings, check the full range of the series.  */
+  if (nelts_per_pattern == 3)
+    {
+      element_type limit = input_nelts ();
+
+      /* The number of elements in each pattern beyond the first two
+	 that we checked above.  */
+      unsigned int step_nelts = (m_encoding.full_nelts () / npatterns) - 2;
+      for (unsigned int i = 0; i < npatterns; ++i)
+	{
+	  /* BASE1 has been checked but BASE2 hasn't.   */
+	  element_type base1 = m_encoding[i + npatterns];
+	  element_type base2 = m_encoding[i + base_nelts];
+
+	  /* The step to add to get from BASE1 to each subsequent value.  */
+	  element_type step = clamp (base2 - base1);
+
+	  /* STEP has no inherent sign, so a value near LIMIT can
+	     act as a negative step.  The series is in range if it
+	     is in range according to one of the two interpretations.
+
+	     Since we're dealing with clamped values, ELEMENT_TYPE is
+	     wide enough for overflow not to be a problem.  */
+	  element_type headroom_down = base1 - start;
+	  element_type headroom_up = size - headroom_down - 1;
+	  if (headroom_up < step * step_nelts
+	      && headroom_down < (limit - step) * step_nelts)
+	    return false;
+	}
+    }
   return true;
 }
 
@@ -65,15 +139,16 @@ vec_perm_indices::all_in_range_p (elemen
 bool
 tree_to_vec_perm_builder (vec_perm_builder *builder, tree cst)
 {
-  unsigned int nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (cst));
-  for (unsigned int i = 0; i < nelts; ++i)
-    if (!tree_fits_shwi_p (vector_cst_elt (cst, i)))
+  unsigned int encoded_nelts = vector_cst_encoded_nelts (cst);
+  for (unsigned int i = 0; i < encoded_nelts; ++i)
+    if (!tree_fits_shwi_p (VECTOR_CST_ENCODED_ELT (cst, i)))
       return false;
 
-  builder->reserve (nelts);
-  for (unsigned int i = 0; i < nelts; ++i)
-    builder->quick_push (tree_to_shwi (vector_cst_elt (cst, i))
-			 & (2 * nelts - 1));
+  builder->new_vector (TYPE_VECTOR_SUBPARTS (TREE_TYPE (cst)),
+		       VECTOR_CST_NPATTERNS (cst),
+		       VECTOR_CST_NELTS_PER_PATTERN (cst));
+  for (unsigned int i = 0; i < encoded_nelts; ++i)
+    builder->quick_push (tree_to_shwi (VECTOR_CST_ENCODED_ELT (cst, i)));
   return true;
 }
 
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	2018-01-02 17:01:27.762670303 +0000
+++ gcc/optabs.c	2018-01-02 17:01:28.747627350 +0000
@@ -5476,6 +5476,11 @@ expand_vec_perm_const (machine_mode mode
   rtx_insn *last = get_last_insn ();
 
   bool single_arg_p = rtx_equal_p (v0, v1);
+  /* Always specify two input vectors here and leave the target to handle
+     cases in which the inputs are equal.  Not all backends can cope with
+     the single-input representation when testing for a double-input
+     target instruction.  */
+  vec_perm_indices indices (sel, 2, GET_MODE_NUNITS (mode));
 
   /* See if this can be handled with a vec_shr.  We only do this if the
      second vector is all zeroes.  */
@@ -5488,7 +5493,7 @@ expand_vec_perm_const (machine_mode mode
       && (shift_code != CODE_FOR_nothing
 	  || shift_code_qi != CODE_FOR_nothing))
     {
-      rtx shift_amt = shift_amt_for_vec_perm_mask (mode, sel);
+      rtx shift_amt = shift_amt_for_vec_perm_mask (mode, indices);
       if (shift_amt)
 	{
 	  struct expand_operand ops[3];
@@ -5520,7 +5525,7 @@ expand_vec_perm_const (machine_mode mode
       else
 	v1 = force_reg (mode, v1);
 
-      if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, sel))
+      if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, indices))
 	return target;
     }
 
@@ -5529,7 +5534,7 @@ expand_vec_perm_const (machine_mode mode
   rtx target_qi = NULL_RTX, v0_qi = NULL_RTX, v1_qi = NULL_RTX;
   if (qimode != VOIDmode)
     {
-      qimode_indices.new_expanded_vector (sel, GET_MODE_UNIT_SIZE (mode));
+      qimode_indices.new_expanded_vector (indices, GET_MODE_UNIT_SIZE (mode));
       target_qi = gen_reg_rtx (qimode);
       v0_qi = gen_lowpart (qimode, v0);
       v1_qi = gen_lowpart (qimode, v1);
@@ -5556,7 +5561,7 @@ expand_vec_perm_const (machine_mode mode
      REQUIRED_SEL_MODE is OK.  */
   if (sel_mode != required_sel_mode)
     {
-      if (!selector_fits_mode_p (required_sel_mode, sel))
+      if (!selector_fits_mode_p (required_sel_mode, indices))
 	{
 	  delete_insns_since (last);
 	  return NULL_RTX;
@@ -5567,7 +5572,7 @@ expand_vec_perm_const (machine_mode mode
   insn_code icode = direct_optab_handler (vec_perm_optab, mode);
   if (icode != CODE_FOR_nothing)
     {
-      rtx sel_rtx = vec_perm_indices_to_rtx (sel_mode, sel);
+      rtx sel_rtx = vec_perm_indices_to_rtx (sel_mode, indices);
       rtx tmp = expand_vec_perm_1 (icode, target, v0, v1, sel_rtx);
       if (tmp)
 	return tmp;
@@ -5642,7 +5647,7 @@ expand_vec_perm_var (machine_mode mode,
   gcc_assert (sel != NULL);
 
   /* Broadcast the low byte each element into each of its bytes.  */
-  vec_perm_builder const_sel (w);
+  vec_perm_builder const_sel (w, w, 1);
   for (i = 0; i < w; ++i)
     {
       int this_e = i / u * u;
@@ -5890,7 +5895,7 @@ expand_mult_highpart (machine_mode mode,
   expand_insn (optab_handler (tab2, mode), 3, eops);
   m2 = gen_lowpart (mode, eops[0].value);
 
-  auto_vec_perm_indices sel (nunits);
+  vec_perm_builder sel (nunits, nunits, 1);
   if (method == 2)
     {
       for (i = 0; i < nunits; ++i)
Index: gcc/optabs-query.c
===================================================================
--- gcc/optabs-query.c	2018-01-02 17:01:27.761670346 +0000
+++ gcc/optabs-query.c	2018-01-02 17:01:28.746627393 +0000
@@ -516,12 +516,13 @@ can_mult_highpart_p (machine_mode mode,
       op = uns_p ? vec_widen_umult_odd_optab : vec_widen_smult_odd_optab;
       if (optab_handler (op, mode) != CODE_FOR_nothing)
 	{
-	  auto_vec_perm_indices sel (nunits);
+	  vec_perm_builder sel (nunits, nunits, 1);
 	  for (i = 0; i < nunits; ++i)
 	    sel.quick_push (!BYTES_BIG_ENDIAN
 			    + (i & ~1)
 			    + ((i & 1) ? nunits : 0));
-	  if (can_vec_perm_const_p (mode, sel))
+	  vec_perm_indices indices (sel, 2, nunits);
+	  if (can_vec_perm_const_p (mode, indices))
 	    return 2;
 	}
     }
@@ -532,10 +533,11 @@ can_mult_highpart_p (machine_mode mode,
       op = uns_p ? vec_widen_umult_lo_optab : vec_widen_smult_lo_optab;
       if (optab_handler (op, mode) != CODE_FOR_nothing)
 	{
-	  auto_vec_perm_indices sel (nunits);
+	  vec_perm_builder sel (nunits, nunits, 1);
 	  for (i = 0; i < nunits; ++i)
 	    sel.quick_push (2 * i + (BYTES_BIG_ENDIAN ? 0 : 1));
-	  if (can_vec_perm_const_p (mode, sel))
+	  vec_perm_indices indices (sel, 2, nunits);
+	  if (can_vec_perm_const_p (mode, indices))
 	    return 3;
 	}
     }
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	2018-01-02 17:01:26.628719897 +0000
+++ gcc/fold-const.c	2018-01-02 17:01:28.746627393 +0000
@@ -11373,7 +11373,7 @@ fold_ternary_loc (location_t loc, enum t
 	    {
 	      unsigned int nelts = VECTOR_CST_NELTS (arg0), i;
 	      gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));
-	      auto_vec_perm_indices sel (nelts);
+	      vec_perm_builder sel (nelts, nelts, 1);
 	      for (i = 0; i < nelts; i++)
 		{
 		  tree val = VECTOR_CST_ELT (arg0, i);
@@ -11384,7 +11384,8 @@ fold_ternary_loc (location_t loc, enum t
 		  else /* Currently unreachable.  */
 		    return NULL_TREE;
 		}
-	      tree t = fold_vec_perm (type, arg1, arg2, sel);
+	      tree t = fold_vec_perm (type, arg1, arg2,
+				      vec_perm_indices (sel, 2, nelts));
 	      if (t != NULL_TREE)
 		return t;
 	    }
@@ -11716,8 +11717,8 @@ fold_ternary_loc (location_t loc, enum t
 	  mask2 = 2 * nelts - 1;
 	  mask = single_arg ? (nelts - 1) : mask2;
 	  gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));
-	  auto_vec_perm_indices sel (nelts);
-	  auto_vec_perm_indices sel2 (nelts);
+	  vec_perm_builder sel (nelts, nelts, 1);
+	  vec_perm_builder sel2 (nelts, nelts, 1);
 	  for (i = 0; i < nelts; i++)
 	    {
 	      tree val = VECTOR_CST_ELT (arg2, i);
@@ -11762,12 +11763,13 @@ fold_ternary_loc (location_t loc, enum t
 	      need_mask_canon = true;
 	    }
 
+	  vec_perm_indices indices (sel, 2, nelts);
 	  if ((TREE_CODE (op0) == VECTOR_CST
 	       || TREE_CODE (op0) == CONSTRUCTOR)
 	      && (TREE_CODE (op1) == VECTOR_CST
 		  || TREE_CODE (op1) == CONSTRUCTOR))
 	    {
-	      tree t = fold_vec_perm (type, op0, op1, sel);
+	      tree t = fold_vec_perm (type, op0, op1, indices);
 	      if (t != NULL_TREE)
 		return t;
 	    }
@@ -11779,11 +11781,14 @@ fold_ternary_loc (location_t loc, enum t
 	     argument permutation while still allowing an equivalent
 	     2-argument version.  */
 	  if (need_mask_canon && arg2 == op2
-	      && !can_vec_perm_const_p (TYPE_MODE (type), sel, false)
-	      && can_vec_perm_const_p (TYPE_MODE (type), sel2, false))
+	      && !can_vec_perm_const_p (TYPE_MODE (type), indices, false)
+	      && can_vec_perm_const_p (TYPE_MODE (type),
+				       vec_perm_indices (sel2, 2, nelts),
+				       false))
 	    {
 	      need_mask_canon = need_mask_canon2;
-	      sel = sel2;
+	      sel.truncate (0);
+	      sel.splice (sel2);
 	    }
 
 	  if (need_mask_canon && arg2 == op2)
Index: gcc/tree-ssa-forwprop.c
===================================================================
--- gcc/tree-ssa-forwprop.c	2018-01-02 17:01:26.630719809 +0000
+++ gcc/tree-ssa-forwprop.c	2018-01-02 17:01:28.747627350 +0000
@@ -2018,7 +2018,7 @@ simplify_vector_constructor (gimple_stmt
   elem_type = TREE_TYPE (type);
   elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
 
-  auto_vec_perm_indices sel (nelts);
+  vec_perm_builder sel (nelts, nelts, 1);
   orig = NULL;
   conv_code = ERROR_MARK;
   maybe_ident = true;
@@ -2109,7 +2109,8 @@ simplify_vector_constructor (gimple_stmt
     {
       tree mask_type;
 
-      if (!can_vec_perm_const_p (TYPE_MODE (type), sel))
+      vec_perm_indices indices (sel, 1, nelts);
+      if (!can_vec_perm_const_p (TYPE_MODE (type), indices))
 	return false;
       mask_type
 	= build_vector_type (build_nonstandard_integer_type (elem_size, 1),
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2018-01-02 17:01:26.630719809 +0000
+++ gcc/tree-vect-data-refs.c	2018-01-02 17:01:28.748627306 +0000
@@ -4579,7 +4579,7 @@ vect_grouped_store_supported (tree vecty
   if (VECTOR_MODE_P (mode))
     {
       unsigned int i, nelt = GET_MODE_NUNITS (mode);
-      auto_vec_perm_indices sel (nelt);
+      vec_perm_builder sel (nelt, nelt, 1);
       sel.quick_grow (nelt);
 
       if (count == 3)
@@ -4587,6 +4587,7 @@ vect_grouped_store_supported (tree vecty
 	  unsigned int j0 = 0, j1 = 0, j2 = 0;
 	  unsigned int i, j;
 
+	  vec_perm_indices indices;
 	  for (j = 0; j < 3; j++)
 	    {
 	      int nelt0 = ((3 - j) * nelt) % 3;
@@ -4601,7 +4602,8 @@ vect_grouped_store_supported (tree vecty
 		  if (3 * i + nelt2 < nelt)
 		    sel[3 * i + nelt2] = 0;
 		}
-	      if (!can_vec_perm_const_p (mode, sel))
+	      indices.new_vector (sel, 2, nelt);
+	      if (!can_vec_perm_const_p (mode, indices))
 		{
 		  if (dump_enabled_p ())
 		    dump_printf (MSG_MISSED_OPTIMIZATION,
@@ -4618,7 +4620,8 @@ vect_grouped_store_supported (tree vecty
 		  if (3 * i + nelt2 < nelt)
 		    sel[3 * i + nelt2] = nelt + j2++;
 		}
-	      if (!can_vec_perm_const_p (mode, sel))
+	      indices.new_vector (sel, 2, nelt);
+	      if (!can_vec_perm_const_p (mode, indices))
 		{
 		  if (dump_enabled_p ())
 		    dump_printf (MSG_MISSED_OPTIMIZATION,
@@ -4638,11 +4641,13 @@ vect_grouped_store_supported (tree vecty
 	      sel[i * 2] = i;
 	      sel[i * 2 + 1] = i + nelt;
 	    }
-	  if (can_vec_perm_const_p (mode, sel))
+	  vec_perm_indices indices (sel, 2, nelt);
+	  if (can_vec_perm_const_p (mode, indices))
 	    {
 	      for (i = 0; i < nelt; i++)
 		sel[i] += nelt / 2;
-	      if (can_vec_perm_const_p (mode, sel))
+	      indices.new_vector (sel, 2, nelt);
+	      if (can_vec_perm_const_p (mode, indices))
 		return true;
 	    }
 	}
@@ -4744,7 +4749,7 @@ vect_permute_store_chain (vec<tree> dr_c
   unsigned int i, n, log_length = exact_log2 (length);
   unsigned int j, nelt = TYPE_VECTOR_SUBPARTS (vectype);
 
-  auto_vec_perm_indices sel (nelt);
+  vec_perm_builder sel (nelt, nelt, 1);
   sel.quick_grow (nelt);
 
   result_chain->quick_grow (length);
@@ -4755,6 +4760,7 @@ vect_permute_store_chain (vec<tree> dr_c
     {
       unsigned int j0 = 0, j1 = 0, j2 = 0;
 
+      vec_perm_indices indices;
       for (j = 0; j < 3; j++)
         {
 	  int nelt0 = ((3 - j) * nelt) % 3;
@@ -4770,7 +4776,8 @@ vect_permute_store_chain (vec<tree> dr_c
 	      if (3 * i + nelt2 < nelt)
 		sel[3 * i + nelt2] = 0;
 	    }
-	  perm3_mask_low = vect_gen_perm_mask_checked (vectype, sel);
+	  indices.new_vector (sel, 2, nelt);
+	  perm3_mask_low = vect_gen_perm_mask_checked (vectype, indices);
 
 	  for (i = 0; i < nelt; i++)
 	    {
@@ -4781,7 +4788,8 @@ vect_permute_store_chain (vec<tree> dr_c
 	      if (3 * i + nelt2 < nelt)
 		sel[3 * i + nelt2] = nelt + j2++;
 	    }
-	  perm3_mask_high = vect_gen_perm_mask_checked (vectype, sel);
+	  indices.new_vector (sel, 2, nelt);
+	  perm3_mask_high = vect_gen_perm_mask_checked (vectype, indices);
 
 	  vect1 = dr_chain[0];
 	  vect2 = dr_chain[1];
@@ -4818,11 +4826,13 @@ vect_permute_store_chain (vec<tree> dr_c
 	  sel[i * 2] = i;
 	  sel[i * 2 + 1] = i + nelt;
 	}
-	perm_mask_high = vect_gen_perm_mask_checked (vectype, sel);
+	vec_perm_indices indices (sel, 2, nelt);
+	perm_mask_high = vect_gen_perm_mask_checked (vectype, indices);
 
 	for (i = 0; i < nelt; i++)
 	  sel[i] += nelt / 2;
-	perm_mask_low = vect_gen_perm_mask_checked (vectype, sel);
+	indices.new_vector (sel, 2, nelt);
+	perm_mask_low = vect_gen_perm_mask_checked (vectype, indices);
 
 	for (i = 0, n = log_length; i < n; i++)
 	  {
@@ -5167,11 +5177,12 @@ vect_grouped_load_supported (tree vectyp
   if (VECTOR_MODE_P (mode))
     {
       unsigned int i, j, nelt = GET_MODE_NUNITS (mode);
-      auto_vec_perm_indices sel (nelt);
+      vec_perm_builder sel (nelt, nelt, 1);
       sel.quick_grow (nelt);
 
       if (count == 3)
 	{
+	  vec_perm_indices indices;
 	  unsigned int k;
 	  for (k = 0; k < 3; k++)
 	    {
@@ -5180,7 +5191,8 @@ vect_grouped_load_supported (tree vectyp
 		  sel[i] = 3 * i + k;
 		else
 		  sel[i] = 0;
-	      if (!can_vec_perm_const_p (mode, sel))
+	      indices.new_vector (sel, 2, nelt);
+	      if (!can_vec_perm_const_p (mode, indices))
 		{
 		  if (dump_enabled_p ())
 		    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5193,7 +5205,8 @@ vect_grouped_load_supported (tree vectyp
 		  sel[i] = i;
 		else
 		  sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++);
-	      if (!can_vec_perm_const_p (mode, sel))
+	      indices.new_vector (sel, 2, nelt);
+	      if (!can_vec_perm_const_p (mode, indices))
 		{
 		  if (dump_enabled_p ())
 		    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5208,13 +5221,16 @@ vect_grouped_load_supported (tree vectyp
 	{
 	  /* If length is not equal to 3 then only power of 2 is supported.  */
 	  gcc_assert (pow2p_hwi (count));
+
 	  for (i = 0; i < nelt; i++)
 	    sel[i] = i * 2;
-	  if (can_vec_perm_const_p (mode, sel))
+	  vec_perm_indices indices (sel, 2, nelt);
+	  if (can_vec_perm_const_p (mode, indices))
 	    {
 	      for (i = 0; i < nelt; i++)
 		sel[i] = i * 2 + 1;
-	      if (can_vec_perm_const_p (mode, sel))
+	      indices.new_vector (sel, 2, nelt);
+	      if (can_vec_perm_const_p (mode, indices))
 		return true;
 	    }
         }
@@ -5329,7 +5345,7 @@ vect_permute_load_chain (vec<tree> dr_ch
   unsigned int i, j, log_length = exact_log2 (length);
   unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype);
 
-  auto_vec_perm_indices sel (nelt);
+  vec_perm_builder sel (nelt, nelt, 1);
   sel.quick_grow (nelt);
 
   result_chain->quick_grow (length);
@@ -5340,6 +5356,7 @@ vect_permute_load_chain (vec<tree> dr_ch
     {
       unsigned int k;
 
+      vec_perm_indices indices;
       for (k = 0; k < 3; k++)
 	{
 	  for (i = 0; i < nelt; i++)
@@ -5347,15 +5364,16 @@ vect_permute_load_chain (vec<tree> dr_ch
 	      sel[i] = 3 * i + k;
 	    else
 	      sel[i] = 0;
-	  perm3_mask_low = vect_gen_perm_mask_checked (vectype, sel);
+	  indices.new_vector (sel, 2, nelt);
+	  perm3_mask_low = vect_gen_perm_mask_checked (vectype, indices);
 
 	  for (i = 0, j = 0; i < nelt; i++)
 	    if (3 * i + k < 2 * nelt)
 	      sel[i] = i;
 	    else
 	      sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++);
-
-	  perm3_mask_high = vect_gen_perm_mask_checked (vectype, sel);
+	  indices.new_vector (sel, 2, nelt);
+	  perm3_mask_high = vect_gen_perm_mask_checked (vectype, indices);
 
 	  first_vect = dr_chain[0];
 	  second_vect = dr_chain[1];
@@ -5387,11 +5405,13 @@ vect_permute_load_chain (vec<tree> dr_ch
 
       for (i = 0; i < nelt; ++i)
 	sel[i] = i * 2;
-      perm_mask_even = vect_gen_perm_mask_checked (vectype, sel);
+      vec_perm_indices indices (sel, 2, nelt);
+      perm_mask_even = vect_gen_perm_mask_checked (vectype, indices);
 
       for (i = 0; i < nelt; ++i)
 	sel[i] = i * 2 + 1;
-      perm_mask_odd = vect_gen_perm_mask_checked (vectype, sel);
+      indices.new_vector (sel, 2, nelt);
+      perm_mask_odd = vect_gen_perm_mask_checked (vectype, indices);
 
       for (i = 0; i < log_length; i++)
 	{
@@ -5527,7 +5547,7 @@ vect_shift_permute_load_chain (vec<tree>
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
 
-  auto_vec_perm_indices sel (nelt);
+  vec_perm_builder sel (nelt, nelt, 1);
   sel.quick_grow (nelt);
 
   result_chain->quick_grow (length);
@@ -5541,7 +5561,8 @@ vect_shift_permute_load_chain (vec<tree>
 	sel[i] = i * 2;
       for (i = 0; i < nelt / 2; ++i)
 	sel[nelt / 2 + i] = i * 2 + 1;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      vec_perm_indices indices (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5549,13 +5570,14 @@ vect_shift_permute_load_chain (vec<tree>
 			      supported by target\n");
 	  return false;
 	}
-      perm2_mask1 = vect_gen_perm_mask_checked (vectype, sel);
+      perm2_mask1 = vect_gen_perm_mask_checked (vectype, indices);
 
       for (i = 0; i < nelt / 2; ++i)
 	sel[i] = i * 2 + 1;
       for (i = 0; i < nelt / 2; ++i)
 	sel[nelt / 2 + i] = i * 2;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5563,20 +5585,21 @@ vect_shift_permute_load_chain (vec<tree>
 			      supported by target\n");
 	  return false;
 	}
-      perm2_mask2 = vect_gen_perm_mask_checked (vectype, sel);
+      perm2_mask2 = vect_gen_perm_mask_checked (vectype, indices);
 
       /* Generating permutation constant to shift all elements.
 	 For vector length 8 it is {4 5 6 7 8 9 10 11}.  */
       for (i = 0; i < nelt; i++)
 	sel[i] = nelt / 2 + i;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "shift permutation is not supported by target\n");
 	  return false;
 	}
-      shift1_mask = vect_gen_perm_mask_checked (vectype, sel);
+      shift1_mask = vect_gen_perm_mask_checked (vectype, indices);
 
       /* Generating permutation constant to select vector from 2.
 	 For vector length 8 it is {0 1 2 3 12 13 14 15}.  */
@@ -5584,14 +5607,15 @@ vect_shift_permute_load_chain (vec<tree>
 	sel[i] = i;
       for (i = nelt / 2; i < nelt; i++)
 	sel[i] = nelt + i;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "select is not supported by target\n");
 	  return false;
 	}
-      select_mask = vect_gen_perm_mask_checked (vectype, sel);
+      select_mask = vect_gen_perm_mask_checked (vectype, indices);
 
       for (i = 0; i < log_length; i++)
 	{
@@ -5647,7 +5671,8 @@ vect_shift_permute_load_chain (vec<tree>
 	  sel[i] = 3 * k + (l % 3);
 	  k++;
 	}
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      vec_perm_indices indices (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5655,59 +5680,63 @@ vect_shift_permute_load_chain (vec<tree>
 			      supported by target\n");
 	  return false;
 	}
-      perm3_mask = vect_gen_perm_mask_checked (vectype, sel);
+      perm3_mask = vect_gen_perm_mask_checked (vectype, indices);
 
       /* Generating permutation constant to shift all elements.
 	 For vector length 8 it is {6 7 8 9 10 11 12 13}.  */
       for (i = 0; i < nelt; i++)
 	sel[i] = 2 * (nelt / 3) + (nelt % 3) + i;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "shift permutation is not supported by target\n");
 	  return false;
 	}
-      shift1_mask = vect_gen_perm_mask_checked (vectype, sel);
+      shift1_mask = vect_gen_perm_mask_checked (vectype, indices);
 
       /* Generating permutation constant to shift all elements.
 	 For vector length 8 it is {5 6 7 8 9 10 11 12}.  */
       for (i = 0; i < nelt; i++)
 	sel[i] = 2 * (nelt / 3) + 1 + i;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "shift permutation is not supported by target\n");
 	  return false;
 	}
-      shift2_mask = vect_gen_perm_mask_checked (vectype, sel);
+      shift2_mask = vect_gen_perm_mask_checked (vectype, indices);
 
       /* Generating permutation constant to shift all elements.
 	 For vector length 8 it is {3 4 5 6 7 8 9 10}.  */
       for (i = 0; i < nelt; i++)
 	sel[i] = (nelt / 3) + (nelt % 3) / 2 + i;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "shift permutation is not supported by target\n");
 	  return false;
 	}
-      shift3_mask = vect_gen_perm_mask_checked (vectype, sel);
+      shift3_mask = vect_gen_perm_mask_checked (vectype, indices);
 
       /* Generating permutation constant to shift all elements.
 	 For vector length 8 it is {5 6 7 8 9 10 11 12}.  */
       for (i = 0; i < nelt; i++)
 	sel[i] = 2 * (nelt / 3) + (nelt % 3) / 2 + i;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "shift permutation is not supported by target\n");
 	  return false;
 	}
-      shift4_mask = vect_gen_perm_mask_checked (vectype, sel);
+      shift4_mask = vect_gen_perm_mask_checked (vectype, indices);
 
       for (k = 0; k < 3; k++)
 	{
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2018-01-02 17:01:26.632719721 +0000
+++ gcc/tree-vect-slp.c	2018-01-02 17:01:28.749627263 +0000
@@ -894,7 +894,7 @@ vect_build_slp_tree_1 (vec_info *vinfo,
       && TREE_CODE_CLASS (alt_stmt_code) != tcc_reference)
     {
       unsigned int count = TYPE_VECTOR_SUBPARTS (vectype);
-      auto_vec_perm_indices sel (count);
+      vec_perm_builder sel (count, count, 1);
       for (i = 0; i < count; ++i)
 	{
 	  unsigned int elt = i;
@@ -902,7 +902,8 @@ vect_build_slp_tree_1 (vec_info *vinfo,
 	    elt += count;
 	  sel.quick_push (elt);
 	}
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      vec_perm_indices indices (sel, 2, count);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  for (i = 0; i < group_size; ++i)
 	    if (gimple_assign_rhs_code (stmts[i]) == alt_stmt_code)
@@ -3570,8 +3571,9 @@ vect_transform_slp_perm_load (slp_tree n
     (int_mode_for_mode (TYPE_MODE (TREE_TYPE (vectype))).require (), 1);
   mask_type = get_vectype_for_scalar_type (mask_element_type);
   nunits = TYPE_VECTOR_SUBPARTS (vectype);
-  auto_vec_perm_indices mask (nunits);
+  vec_perm_builder mask (nunits, nunits, 1);
   mask.quick_grow (nunits);
+  vec_perm_indices indices;
 
   /* Initialize the vect stmts of NODE to properly insert the generated
      stmts later.  */
@@ -3644,10 +3646,10 @@ vect_transform_slp_perm_load (slp_tree n
 	    noop_p = false;
 	  mask[index++] = mask_element;
 
-	  if (index == nunits)
+	  if (index == nunits && !noop_p)
 	    {
-	      if (! noop_p
-		  && ! can_vec_perm_const_p (mode, mask))
+	      indices.new_vector (mask, 2, nunits);
+	      if (!can_vec_perm_const_p (mode, indices))
 		{
 		  if (dump_enabled_p ())
 		    {
@@ -3655,16 +3657,19 @@ vect_transform_slp_perm_load (slp_tree n
 				       vect_location, 
 				       "unsupported vect permute { ");
 		      for (i = 0; i < nunits; ++i)
-			dump_printf (MSG_MISSED_OPTIMIZATION, "%d ", mask[i]);
+			dump_printf (MSG_MISSED_OPTIMIZATION,
+				     HOST_WIDE_INT_PRINT_DEC " ", mask[i]);
 		      dump_printf (MSG_MISSED_OPTIMIZATION, "}\n");
 		    }
 		  gcc_assert (analyze_only);
 		  return false;
 		}
 
-	      if (! noop_p)
-		++*n_perms;
+	      ++*n_perms;
+	    }
 
+	  if (index == nunits)
+	    {
 	      if (!analyze_only)
 		{
 		  tree mask_vec = NULL_TREE;
@@ -3797,7 +3802,7 @@ vect_schedule_slp_instance (slp_tree nod
       enum tree_code code0 = gimple_assign_rhs_code (stmt);
       enum tree_code ocode = ERROR_MARK;
       gimple *ostmt;
-      auto_vec_perm_indices mask (group_size);
+      vec_perm_builder mask (group_size, group_size, 1);
       FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, ostmt)
 	if (gimple_assign_rhs_code (ostmt) != code0)
 	  {
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2018-01-02 17:01:26.632719721 +0000
+++ gcc/tree-vect-stmts.c	2018-01-02 17:01:28.750627219 +0000
@@ -1717,13 +1717,14 @@ perm_mask_for_reverse (tree vectype)
 
   nunits = TYPE_VECTOR_SUBPARTS (vectype);
 
-  auto_vec_perm_indices sel (nunits);
+  vec_perm_builder sel (nunits, nunits, 1);
   for (i = 0; i < nunits; ++i)
     sel.quick_push (nunits - 1 - i);
 
-  if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+  vec_perm_indices indices (sel, 1, nunits);
+  if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
     return NULL_TREE;
-  return vect_gen_perm_mask_checked (vectype, sel);
+  return vect_gen_perm_mask_checked (vectype, indices);
 }
 
 /* A subroutine of get_load_store_type, with a subset of the same
@@ -2185,27 +2186,32 @@ vectorizable_mask_load_store (gimple *st
 	{
 	  modifier = WIDEN;
 
-	  auto_vec_perm_indices sel (gather_off_nunits);
+	  vec_perm_builder sel (gather_off_nunits, gather_off_nunits, 1);
 	  for (i = 0; i < gather_off_nunits; ++i)
 	    sel.quick_push (i | nunits);
 
-	  perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel);
+	  vec_perm_indices indices (sel, 1, gather_off_nunits);
+	  perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype,
+						  indices);
 	}
       else if (nunits == gather_off_nunits * 2)
 	{
 	  modifier = NARROW;
 
-	  auto_vec_perm_indices sel (nunits);
+	  vec_perm_builder sel (nunits, nunits, 1);
 	  sel.quick_grow (nunits);
 	  for (i = 0; i < nunits; ++i)
 	    sel[i] = i < gather_off_nunits
 		     ? i : i + nunits - gather_off_nunits;
+	  vec_perm_indices indices (sel, 2, nunits);
+	  perm_mask = vect_gen_perm_mask_checked (vectype, indices);
 
-	  perm_mask = vect_gen_perm_mask_checked (vectype, sel);
 	  ncopies *= 2;
+
 	  for (i = 0; i < nunits; ++i)
 	    sel[i] = i | gather_off_nunits;
-	  mask_perm_mask = vect_gen_perm_mask_checked (masktype, sel);
+	  indices.new_vector (sel, 2, gather_off_nunits);
+	  mask_perm_mask = vect_gen_perm_mask_checked (masktype, indices);
 	}
       else
 	gcc_unreachable ();
@@ -2498,12 +2504,13 @@ vectorizable_bswap (gimple *stmt, gimple
   unsigned int num_bytes = TYPE_VECTOR_SUBPARTS (char_vectype);
   unsigned word_bytes = num_bytes / nunits;
 
-  auto_vec_perm_indices elts (num_bytes);
+  vec_perm_builder elts (num_bytes, num_bytes, 1);
   for (unsigned i = 0; i < nunits; ++i)
     for (unsigned j = 0; j < word_bytes; ++j)
       elts.quick_push ((i + 1) * word_bytes - j - 1);
 
-  if (!can_vec_perm_const_p (TYPE_MODE (char_vectype), elts))
+  vec_perm_indices indices (elts, 1, num_bytes);
+  if (!can_vec_perm_const_p (TYPE_MODE (char_vectype), indices))
     return false;
 
   if (! vec_stmt)
@@ -5826,22 +5833,25 @@ vectorizable_store (gimple *stmt, gimple
 	{
 	  modifier = WIDEN;
 
-	  auto_vec_perm_indices sel (scatter_off_nunits);
+	  vec_perm_builder sel (scatter_off_nunits, scatter_off_nunits, 1);
 	  for (i = 0; i < (unsigned int) scatter_off_nunits; ++i)
 	    sel.quick_push (i | nunits);
 
-	  perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel);
+	  vec_perm_indices indices (sel, 1, scatter_off_nunits);
+	  perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype,
+						  indices);
 	  gcc_assert (perm_mask != NULL_TREE);
 	}
       else if (nunits == (unsigned int) scatter_off_nunits * 2)
 	{
 	  modifier = NARROW;
 
-	  auto_vec_perm_indices sel (nunits);
+	  vec_perm_builder sel (nunits, nunits, 1);
 	  for (i = 0; i < (unsigned int) nunits; ++i)
 	    sel.quick_push (i | scatter_off_nunits);
 
-	  perm_mask = vect_gen_perm_mask_checked (vectype, sel);
+	  vec_perm_indices indices (sel, 2, nunits);
+	  perm_mask = vect_gen_perm_mask_checked (vectype, indices);
 	  gcc_assert (perm_mask != NULL_TREE);
 	  ncopies *= 2;
 	}
@@ -6862,22 +6872,25 @@ vectorizable_load (gimple *stmt, gimple_
 	{
 	  modifier = WIDEN;
 
-	  auto_vec_perm_indices sel (gather_off_nunits);
+	  vec_perm_builder sel (gather_off_nunits, gather_off_nunits, 1);
 	  for (i = 0; i < gather_off_nunits; ++i)
 	    sel.quick_push (i | nunits);
 
-	  perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel);
+	  vec_perm_indices indices (sel, 1, gather_off_nunits);
+	  perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype,
+						  indices);
 	}
       else if (nunits == gather_off_nunits * 2)
 	{
 	  modifier = NARROW;
 
-	  auto_vec_perm_indices sel (nunits);
+	  vec_perm_builder sel (nunits, nunits, 1);
 	  for (i = 0; i < nunits; ++i)
 	    sel.quick_push (i < gather_off_nunits
 			    ? i : i + nunits - gather_off_nunits);
 
-	  perm_mask = vect_gen_perm_mask_checked (vectype, sel);
+	  vec_perm_indices indices (sel, 2, nunits);
+	  perm_mask = vect_gen_perm_mask_checked (vectype, indices);
 	  ncopies *= 2;
 	}
       else
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	2018-01-02 17:01:26.631719765 +0000
+++ gcc/tree-vect-generic.c	2018-01-02 17:01:28.748627306 +0000
@@ -1299,15 +1299,13 @@ lower_vec_perm (gimple_stmt_iterator *gs
 	mask = gimple_assign_rhs1 (def_stmt);
     }
 
-  if (TREE_CODE (mask) == VECTOR_CST)
-    {
-      auto_vec_perm_indices sel_int (elements);
-
-      for (i = 0; i < elements; ++i)
-	sel_int.quick_push (TREE_INT_CST_LOW (VECTOR_CST_ELT (mask, i))
-			    & (2 * elements - 1));
+  vec_perm_builder sel_int;
 
-      if (can_vec_perm_const_p (TYPE_MODE (vect_type), sel_int))
+  if (TREE_CODE (mask) == VECTOR_CST
+      && tree_to_vec_perm_builder (&sel_int, mask))
+    {
+      vec_perm_indices indices (sel_int, 2, elements);
+      if (can_vec_perm_const_p (TYPE_MODE (vect_type), indices))
 	{
 	  gimple_assign_set_rhs3 (stmt, mask);
 	  update_stmt (stmt);
@@ -1319,14 +1317,14 @@ lower_vec_perm (gimple_stmt_iterator *gs
 	  != CODE_FOR_nothing
 	  && TREE_CODE (vec1) == VECTOR_CST
 	  && initializer_zerop (vec1)
-	  && sel_int[0]
-	  && sel_int[0] < elements)
+	  && indices[0]
+	  && indices[0] < elements)
 	{
 	  for (i = 1; i < elements; ++i)
 	    {
-	      unsigned int expected = i + sel_int[0];
+	      unsigned int expected = i + indices[0];
 	      /* Indices into the second vector are all equivalent.  */
-	      if (MIN (elements, (unsigned) sel_int[i])
+	      if (MIN (elements, (unsigned) indices[i])
 		  != MIN (elements, expected))
  		break;
 	    }
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2018-01-02 17:01:26.631719765 +0000
+++ gcc/tree-vect-loop.c	2018-01-02 17:01:28.749627263 +0000
@@ -3713,12 +3713,11 @@ vect_estimate_min_profitable_iters (loop
    vector elements (not bits) for a vector with NELT elements.  */
 static void
 calc_vec_perm_mask_for_shift (unsigned int offset, unsigned int nelt,
-			      vec_perm_indices *sel)
+			      vec_perm_builder *sel)
 {
-  unsigned int i;
-
-  for (i = 0; i < nelt; i++)
-    sel->quick_push ((i + offset) & (2 * nelt - 1));
+  sel->new_vector (nelt, nelt, 1);
+  for (unsigned int i = 0; i < nelt; i++)
+    sel->quick_push (i + offset);
 }
 
 /* Checks whether the target supports whole-vector shifts for vectors of mode
@@ -3731,13 +3730,13 @@ have_whole_vector_shift (machine_mode mo
     return true;
 
   unsigned int i, nelt = GET_MODE_NUNITS (mode);
-  auto_vec_perm_indices sel (nelt);
-
+  vec_perm_builder sel;
+  vec_perm_indices indices;
   for (i = nelt/2; i >= 1; i/=2)
     {
-      sel.truncate (0);
       calc_vec_perm_mask_for_shift (i, nelt, &sel);
-      if (!can_vec_perm_const_p (mode, sel, false))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (mode, indices, false))
 	return false;
     }
   return true;
@@ -5055,7 +5054,8 @@ vect_create_epilog_for_reduction (vec<tr
       if (reduce_with_shift && !slp_reduc)
         {
           int nelements = vec_size_in_bits / element_bitsize;
-          auto_vec_perm_indices sel (nelements);
+	  vec_perm_builder sel;
+	  vec_perm_indices indices;
 
           int elt_offset;
 
@@ -5079,9 +5079,9 @@ vect_create_epilog_for_reduction (vec<tr
                elt_offset >= 1;
                elt_offset /= 2)
             {
-	      sel.truncate (0);
 	      calc_vec_perm_mask_for_shift (elt_offset, nelements, &sel);
-	      tree mask = vect_gen_perm_mask_any (vectype, sel);
+	      indices.new_vector (sel, 2, nelements);
+	      tree mask = vect_gen_perm_mask_any (vectype, indices);
 	      epilog_stmt = gimple_build_assign (vec_dest, VEC_PERM_EXPR,
 						 new_temp, zero_vec, mask);
               new_name = make_ssa_name (vec_dest, epilog_stmt);
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	2018-01-02 17:01:26.602721036 +0000
+++ gcc/config/aarch64/aarch64.c	2018-01-02 17:01:28.736627829 +0000
@@ -13252,7 +13252,7 @@ #define MAX_VECT_LEN 16
 struct expand_vec_perm_d
 {
   rtx target, op0, op1;
-  auto_vec_perm_indices perm;
+  vec_perm_indices perm;
   machine_mode vmode;
   bool one_vector_p;
   bool testing_p;
@@ -13642,10 +13642,7 @@ aarch64_expand_vec_perm_const_1 (struct
   unsigned int nelt = d->perm.length ();
   if (d->perm[0] >= nelt)
     {
-      gcc_assert (nelt == (nelt & -nelt));
-      for (unsigned int i = 0; i < nelt; ++i)
-	d->perm[i] ^= nelt; /* Keep the same index, but in the other vector.  */
-
+      d->perm.rotate_inputs (1);
       std::swap (d->op0, d->op1);
     }
 
@@ -13685,12 +13682,10 @@ aarch64_vectorize_vec_perm_const (machin
 
   /* Calculate whether all elements are in one vector.  */
   unsigned int nelt = sel.length ();
-  d.perm.reserve (nelt);
   for (i = which = 0; i < nelt; ++i)
     {
       unsigned int ei = sel[i] & (2 * nelt - 1);
       which |= (ei < nelt ? 1 : 2);
-      d.perm.quick_push (ei);
     }
 
   switch (which)
@@ -13709,8 +13704,6 @@ aarch64_vectorize_vec_perm_const (machin
 	 input vector.  */
       /* Fall Through.  */
     case 2:
-      for (i = 0; i < nelt; ++i)
-	d.perm[i] &= nelt - 1;
       d.op0 = op1;
       d.one_vector_p = true;
       break;
@@ -13721,6 +13714,8 @@ aarch64_vectorize_vec_perm_const (machin
       break;
     }
 
+  d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2, nelt);
+
   if (!d.testing_p)
     return aarch64_expand_vec_perm_const_1 (&d);
 
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	2018-01-02 17:01:26.604720948 +0000
+++ gcc/config/arm/arm.c	2018-01-02 17:01:28.739627698 +0000
@@ -28854,7 +28854,7 @@ #define MAX_VECT_LEN 16
 struct expand_vec_perm_d
 {
   rtx target, op0, op1;
-  auto_vec_perm_indices perm;
+  vec_perm_indices perm;
   machine_mode vmode;
   bool one_vector_p;
   bool testing_p;
@@ -29362,9 +29362,7 @@ arm_expand_vec_perm_const_1 (struct expa
   unsigned int nelt = d->perm.length ();
   if (d->perm[0] >= nelt)
     {
-      for (unsigned int i = 0; i < nelt; ++i)
-	d->perm[i] = (d->perm[i] + nelt) & (2 * nelt - 1);
-
+      d->perm.rotate_inputs (1);
       std::swap (d->op0, d->op1);
     }
 
@@ -29404,12 +29402,10 @@ arm_vectorize_vec_perm_const (machine_mo
   d.testing_p = !target;
 
   nelt = GET_MODE_NUNITS (d.vmode);
-  d.perm.reserve (nelt);
   for (i = which = 0; i < nelt; ++i)
     {
       int ei = sel[i] & (2 * nelt - 1);
       which |= (ei < nelt ? 1 : 2);
-      d.perm.quick_push (ei);
     }
 
   switch (which)
@@ -29428,8 +29424,6 @@ arm_vectorize_vec_perm_const (machine_mo
 	 input vector.  */
       /* FALLTHRU */
     case 2:
-      for (i = 0; i < nelt; ++i)
-        d.perm[i] &= nelt - 1;
       d.op0 = op1;
       d.one_vector_p = true;
       break;
@@ -29440,6 +29434,8 @@ arm_vectorize_vec_perm_const (machine_mo
       break;
     }
 
+  d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2, nelt);
+
   if (d.testing_p)
     return arm_expand_vec_perm_const_1 (&d);
 
Index: gcc/config/powerpcspe/powerpcspe.c
===================================================================
--- gcc/config/powerpcspe/powerpcspe.c	2018-01-02 17:01:26.617720379 +0000
+++ gcc/config/powerpcspe/powerpcspe.c	2018-01-02 17:01:28.742627568 +0000
@@ -38782,7 +38782,7 @@ rs6000_expand_extract_even (rtx target,
 {
   machine_mode vmode = GET_MODE (target);
   unsigned i, nelt = GET_MODE_NUNITS (vmode);
-  vec_perm_builder perm (nelt);
+  vec_perm_builder perm (nelt, nelt, 1);
 
   for (i = 0; i < nelt; i++)
     perm.quick_push (i * 2);
@@ -38797,7 +38797,7 @@ rs6000_expand_interleave (rtx target, rt
 {
   machine_mode vmode = GET_MODE (target);
   unsigned i, high, nelt = GET_MODE_NUNITS (vmode);
-  vec_perm_builder perm (nelt);
+  vec_perm_builder perm (nelt, nelt, 1);
 
   high = (highp ? 0 : nelt / 2);
   for (i = 0; i < nelt / 2; i++)
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	2018-01-02 17:01:26.621720203 +0000
+++ gcc/config/rs6000/rs6000.c	2018-01-02 17:01:28.745627437 +0000
@@ -36042,7 +36042,7 @@ rs6000_expand_extract_even (rtx target,
 {
   machine_mode vmode = GET_MODE (target);
   unsigned i, nelt = GET_MODE_NUNITS (vmode);
-  vec_perm_builder perm (nelt);
+  vec_perm_builder perm (nelt, nelt, 1);
 
   for (i = 0; i < nelt; i++)
     perm.quick_push (i * 2);
@@ -36057,7 +36057,7 @@ rs6000_expand_interleave (rtx target, rt
 {
   machine_mode vmode = GET_MODE (target);
   unsigned i, high, nelt = GET_MODE_NUNITS (vmode);
-  vec_perm_builder perm (nelt);
+  vec_perm_builder perm (nelt, nelt, 1);
 
   high = (highp ? 0 : nelt / 2);
   for (i = 0; i < nelt / 2; i++)
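
Most of the hunks above follow the same two-step idiom: build the
selector with a vec_perm_builder and then wrap it in a vec_perm_indices
that also records how many input vectors the indices select from, as in
"vec_perm_indices indices (sel, 2, nelt)".  As a rough standalone
illustration of how the constructor arguments are being read here (a
toy stand-in, not the real classes from vec-perm-indices.h):

  /* Toy illustration only; the argument meanings described in the
     comments are assumptions based on the calls in the patch above.  */
  #include <cstdio>
  #include <vector>

  int
  main ()
  {
    unsigned int nelt = 8;

    /* vec_perm_builder sel (nelt, nelt, 1) is read as: a selector for a
       vector of NELT elements, encoded as NELT patterns of one element
       each, i.e. with every index spelled out explicitly.  */
    std::vector<unsigned int> sel (nelt);
    for (unsigned int i = 0; i < nelt; ++i)
      sel[i] = i * 2;  /* the "even elements" permute used above */

    /* vec_perm_indices indices (sel, 2, nelt) is read as: the same
       indices plus the fact that they select from 2 inputs of NELT
       elements each, so valid values run from 0 to 2 * NELT - 1.  */
    unsigned int nelts_per_input = nelt;
    for (unsigned int i = 0; i < nelt; ++i)
      printf ("output %u <- input %u, element %u\n", i,
              sel[i] / nelts_per_input, sel[i] % nelts_per_input);
    return 0;
  }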

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [13/13] [AArch64] Use vec_perm_indices helper routines
  2017-12-19 20:37   ` Richard Sandiford
@ 2018-01-04 11:28     ` Richard Sandiford
  2018-01-09 12:18       ` James Greenhalgh
  0 siblings, 1 reply; 46+ messages in thread
From: Richard Sandiford @ 2018-01-04 11:28 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft

Ping**2

Richard Sandiford <richard.sandiford@linaro.org> writes:
> Ping
>
> Richard Sandiford <richard.sandiford@linaro.org> writes:
>> This patch makes the AArch64 vec_perm_const code use the new
>> vec_perm_indices routines, instead of checking each element individually.
>> This means that they extend naturally to variable-length vectors.
>>
>> Also, aarch64_evpc_dup was the only function that generated rtl when
>> testing_p is true, and that looked accidental.  The patch adds the
>> missing check and then replaces the gen_rtx_REG/start_sequence/
>> end_sequence stuff with an assert that no rtl is generated.
>>
>> Tested on aarch64-linux-gnu.  Also tested by making sure that there
>> were no assembly output differences for aarch64-linux-gnu or
>> aarch64_be-linux-gnu.  OK to install?
>>
>> Richard
>>
>>
>> 2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>
>>
>> gcc/
>> 	* config/aarch64/aarch64.c (aarch64_evpc_trn): Use d.perm.series_p
>> 	instead of checking each element individually.
>> 	(aarch64_evpc_uzp): Likewise.
>> 	(aarch64_evpc_zip): Likewise.
>> 	(aarch64_evpc_ext): Likewise.
>> 	(aarch64_evpc_rev): Likewise.
>> 	(aarch64_evpc_dup): Test the encoding for a single duplicated element,
>> 	instead of checking each element individually.  Return true without
>> 	generating rtl if d->testing_p.
>> 	(aarch64_vectorize_vec_perm_const): Use all_from_input_p to test
>> 	whether all selected elements come from the same input, instead of
>> 	checking each element individually.  Remove calls to gen_rtx_REG,
>> 	start_sequence and end_sequence and instead assert that no rtl is
>> 	generated.
>>
>> Index: gcc/config/aarch64/aarch64.c
>> ===================================================================
>> --- gcc/config/aarch64/aarch64.c	2017-12-09 22:48:47.535824832 +0000
>> +++ gcc/config/aarch64/aarch64.c	2017-12-09 22:49:00.139270410 +0000
>> @@ -13295,7 +13295,7 @@ aarch64_expand_vec_perm (rtx target, rtx
>>  static bool
>>  aarch64_evpc_trn (struct expand_vec_perm_d *d)
>>  {
>> -  unsigned int i, odd, mask, nelt = d->perm.length ();
>> +  unsigned int odd, nelt = d->perm.length ();
>>    rtx out, in0, in1, x;
>>    machine_mode vmode = d->vmode;
>>  
>> @@ -13304,21 +13304,11 @@ aarch64_evpc_trn (struct expand_vec_perm
>>  
>>    /* Note that these are little-endian tests.
>>       We correct for big-endian later.  */
>> -  if (d->perm[0] == 0)
>> -    odd = 0;
>> -  else if (d->perm[0] == 1)
>> -    odd = 1;
>> -  else
>> +  odd = d->perm[0];
>> +  if ((odd != 0 && odd != 1)
>> +      || !d->perm.series_p (0, 2, odd, 2)
>> +      || !d->perm.series_p (1, 2, nelt + odd, 2))
>>      return false;
>> -  mask = (d->one_vector_p ? nelt - 1 : 2 * nelt - 1);
>> -
>> -  for (i = 0; i < nelt; i += 2)
>> -    {
>> -      if (d->perm[i] != i + odd)
>> -	return false;
>> -      if (d->perm[i + 1] != ((i + nelt + odd) & mask))
>> -	return false;
>> -    }
>>  
>>    /* Success!  */
>>    if (d->testing_p)
>> @@ -13342,7 +13332,7 @@ aarch64_evpc_trn (struct expand_vec_perm
>>  static bool
>>  aarch64_evpc_uzp (struct expand_vec_perm_d *d)
>>  {
>> -  unsigned int i, odd, mask, nelt = d->perm.length ();
>> +  unsigned int odd;
>>    rtx out, in0, in1, x;
>>    machine_mode vmode = d->vmode;
>>  
>> @@ -13351,20 +13341,10 @@ aarch64_evpc_uzp (struct expand_vec_perm
>>  
>>    /* Note that these are little-endian tests.
>>       We correct for big-endian later.  */
>> -  if (d->perm[0] == 0)
>> -    odd = 0;
>> -  else if (d->perm[0] == 1)
>> -    odd = 1;
>> -  else
>> +  odd = d->perm[0];
>> +  if ((odd != 0 && odd != 1)
>> +      || !d->perm.series_p (0, 1, odd, 2))
>>      return false;
>> -  mask = (d->one_vector_p ? nelt - 1 : 2 * nelt - 1);
>> -
>> -  for (i = 0; i < nelt; i++)
>> -    {
>> -      unsigned elt = (i * 2 + odd) & mask;
>> -      if (d->perm[i] != elt)
>> -	return false;
>> -    }
>>  
>>    /* Success!  */
>>    if (d->testing_p)
>> @@ -13388,7 +13368,7 @@ aarch64_evpc_uzp (struct expand_vec_perm
>>  static bool
>>  aarch64_evpc_zip (struct expand_vec_perm_d *d)
>>  {
>> -  unsigned int i, high, mask, nelt = d->perm.length ();
>> +  unsigned int high, nelt = d->perm.length ();
>>    rtx out, in0, in1, x;
>>    machine_mode vmode = d->vmode;
>>  
>> @@ -13397,25 +13377,11 @@ aarch64_evpc_zip (struct expand_vec_perm
>>  
>>    /* Note that these are little-endian tests.
>>       We correct for big-endian later.  */
>> -  high = nelt / 2;
>> -  if (d->perm[0] == high)
>> -    /* Do Nothing.  */
>> -    ;
>> -  else if (d->perm[0] == 0)
>> -    high = 0;
>> -  else
>> +  high = d->perm[0];
>> +  if ((high != 0 && high * 2 != nelt)
>> +      || !d->perm.series_p (0, 2, high, 1)
>> +      || !d->perm.series_p (1, 2, high + nelt, 1))
>>      return false;
>> -  mask = (d->one_vector_p ? nelt - 1 : 2 * nelt - 1);
>> -
>> -  for (i = 0; i < nelt / 2; i++)
>> -    {
>> -      unsigned elt = (i + high) & mask;
>> -      if (d->perm[i * 2] != elt)
>> -	return false;
>> -      elt = (elt + nelt) & mask;
>> -      if (d->perm[i * 2 + 1] != elt)
>> -	return false;
>> -    }
>>  
>>    /* Success!  */
>>    if (d->testing_p)
>> @@ -13440,23 +13406,14 @@ aarch64_evpc_zip (struct expand_vec_perm
>>  static bool
>>  aarch64_evpc_ext (struct expand_vec_perm_d *d)
>>  {
>> -  unsigned int i, nelt = d->perm.length ();
>> +  unsigned int nelt = d->perm.length ();
>>    rtx offset;
>>  
>>    unsigned int location = d->perm[0]; /* Always < nelt.  */
>>  
>>    /* Check if the extracted indices are increasing by one.  */
>> -  for (i = 1; i < nelt; i++)
>> -    {
>> -      unsigned int required = location + i;
>> -      if (d->one_vector_p)
>> -        {
>> -          /* We'll pass the same vector in twice, so allow indices to wrap.  */
>> -	  required &= (nelt - 1);
>> -	}
>> -      if (d->perm[i] != required)
>> -        return false;
>> -    }
>> +  if (!d->perm.series_p (0, 1, location, 1))
>> +    return false;
>>  
>>    /* Success! */
>>    if (d->testing_p)
>> @@ -13488,7 +13445,7 @@ aarch64_evpc_ext (struct expand_vec_perm
>>  static bool
>>  aarch64_evpc_rev (struct expand_vec_perm_d *d)
>>  {
>> -  unsigned int i, j, diff, size, unspec, nelt = d->perm.length ();
>> +  unsigned int i, diff, size, unspec;
>>  
>>    if (!d->one_vector_p)
>>      return false;
>> @@ -13504,18 +13461,10 @@ aarch64_evpc_rev (struct expand_vec_perm
>>    else
>>      return false;
>>  
>> -  for (i = 0; i < nelt ; i += diff + 1)
>> -    for (j = 0; j <= diff; j += 1)
>> -      {
>> -	/* This is guaranteed to be true as the value of diff
>> -	   is 7, 3, 1 and we should have enough elements in the
>> -	   queue to generate this.  Getting a vector mask with a
>> -	   value of diff other than these values implies that
>> -	   something is wrong by the time we get here.  */
>> -	gcc_assert (i + j < nelt);
>> -	if (d->perm[i + j] != i + diff - j)
>> -	  return false;
>> -      }
>> +  unsigned int step = diff + 1;
>> +  for (i = 0; i < step; ++i)
>> +    if (!d->perm.series_p (i, step, diff - i, step))
>> +      return false;
>>  
>>    /* Success! */
>>    if (d->testing_p)
>> @@ -13532,15 +13481,17 @@ aarch64_evpc_dup (struct expand_vec_perm
>>    rtx out = d->target;
>>    rtx in0;
>>    machine_mode vmode = d->vmode;
>> -  unsigned int i, elt, nelt = d->perm.length ();
>> +  unsigned int elt;
>>    rtx lane;
>>  
>> +  if (d->perm.encoding ().encoded_nelts () != 1)
>> +    return false;
>> +
>> +  /* Success! */
>> +  if (d->testing_p)
>> +    return true;
>> +
>>    elt = d->perm[0];
>> -  for (i = 1; i < nelt; i++)
>> -    {
>> -      if (elt != d->perm[i])
>> -	return false;
>> -    }
>>  
>>    /* The generic preparation in aarch64_expand_vec_perm_const_1
>>       swaps the operand order and the permute indices if it finds
>> @@ -13628,61 +13579,37 @@ aarch64_vectorize_vec_perm_const (machin
>>  				  rtx op1, const vec_perm_indices &sel)
>>  {
>>    struct expand_vec_perm_d d;
>> -  unsigned int i, which;
>>  
>> -  d.vmode = vmode;
>> -  d.target = target;
>> -  d.op0 = op0;
>> -  d.op1 = op1;
>> -  d.testing_p = !target;
>> -
>> -  /* Calculate whether all elements are in one vector.  */
>> -  unsigned int nelt = sel.length ();
>> -  for (i = which = 0; i < nelt; ++i)
>> +  /* Check whether the mask can be applied to a single vector.  */
>> +  if (op0 && rtx_equal_p (op0, op1))
>> +    d.one_vector_p = true;
>> +  else if (sel.all_from_input_p (0))
>>      {
>> -      unsigned int ei = sel[i] & (2 * nelt - 1);
>> -      which |= (ei < nelt ? 1 : 2);
>> +      d.one_vector_p = true;
>> +      op1 = op0;
>>      }
>> -
>> -  switch (which)
>> +  else if (sel.all_from_input_p (1))
>>      {
>> -    default:
>> -      gcc_unreachable ();
>> -
>> -    case 3:
>> -      d.one_vector_p = false;
>> -      if (d.testing_p || !rtx_equal_p (op0, op1))
>> -	break;
>> -
>> -      /* The elements of PERM do not suggest that only the first operand
>> -	 is used, but both operands are identical.  Allow easier matching
>> -	 of the permutation by folding the permutation into the single
>> -	 input vector.  */
>> -      /* Fall Through.  */
>> -    case 2:
>> -      d.op0 = op1;
>> -      d.one_vector_p = true;
>> -      break;
>> -
>> -    case 1:
>> -      d.op1 = op0;
>>        d.one_vector_p = true;
>> -      break;
>> +      op0 = op1;
>>      }
>> +  else
>> +    d.one_vector_p = false;
>>  
>> -  d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2, nelt);
>> +  d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2,
>> +		     sel.nelts_per_input ());
>> +  d.vmode = vmode;
>> +  d.target = target;
>> +  d.op0 = op0;
>> +  d.op1 = op1;
>> +  d.testing_p = !target;
>>  
>>    if (!d.testing_p)
>>      return aarch64_expand_vec_perm_const_1 (&d);
>>  
>> -  d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
>> -  d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
>> -  if (!d.one_vector_p)
>> -    d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
>> -
>> -  start_sequence ();
>> +  rtx_insn *last = get_last_insn ();
>>    bool ret = aarch64_expand_vec_perm_const_1 (&d);
>> -  end_sequence ();
>> +  gcc_assert (last == get_last_insn ());
>>  
>>    return ret;
>>  }
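
A rough standalone picture of the one_vector_p decision in the quoted
hunk above, using a toy stand-in rather than GCC's real vec_perm_indices
(the behaviour of all_from_input_p below is an assumption based on how
the patch uses it):

  /* Toy model only; not the real vec_perm_indices class.  */
  #include <cstdio>
  #include <vector>

  struct toy_perm_indices
  {
    std::vector<unsigned int> elems;
    unsigned int nelts_per_input;

    /* Assumed meaning: every selected element comes from input I, i.e.
       lies in [I * nelts_per_input, (I + 1) * nelts_per_input).  */
    bool all_from_input_p (unsigned int i) const
    {
      for (unsigned int e : elems)
        if (e / nelts_per_input != i)
          return false;
      return true;
    }
  };

  int
  main ()
  {
    /* An 8-element selector that only ever reads the second input.  */
    toy_perm_indices sel = { { 8, 9, 10, 11, 12, 13, 14, 15 }, 8 };

    /* As in the patch: if every index comes from one input, the permute
       can be folded onto a single vector.  */
    bool one_vector_p = (sel.all_from_input_p (0)
                         || sel.all_from_input_p (1));
    printf ("one_vector_p = %d\n", one_vector_p);  /* prints 1 */
    return 0;
  }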

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [13/13] [AArch64] Use vec_perm_indices helper routines
  2018-01-04 11:28     ` Richard Sandiford
@ 2018-01-09 12:18       ` James Greenhalgh
  2018-01-09 16:24         ` RFA: Expand vec_perm_indices::series_p comment Richard Sandiford
  0 siblings, 1 reply; 46+ messages in thread
From: James Greenhalgh @ 2018-01-09 12:18 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, Richard Earnshaw, Marcus Shawcroft, nd

On Thu, Jan 04, 2018 at 11:27:56AM +0000, Richard Sandiford wrote:
> Ping**2

This is OK.

It took me a while to get the hang of the interface - a worked example
in the comment in vec-perm-indices.c would probably have been helpful.
It took until your code for REV for this to really make sense to me; so
perhaps that would make for a good example.
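
Something like the sketch below is the sort of thing I had in mind: a
standalone toy model of the check rather than the real vec_perm_indices
class (so the helper is only an approximation of the interface), using
the REV test from the patch as the pattern it recognises.

  /* Standalone approximation of vec_perm_indices::series_p, for
     illustration only.  */
  #include <cstdio>
  #include <vector>

  /* Return true if index OUT_BASE + I * OUT_STEP of SEL selects input
     element IN_BASE + I * IN_STEP.  */
  static bool
  series_p (const std::vector<int> &sel, unsigned int out_base,
            unsigned int out_step, int in_base, int in_step)
  {
    int expected = in_base;
    for (unsigned int i = out_base; i < sel.size (); i += out_step)
      {
        if (sel[i] != expected)
          return false;
        expected += in_step;
      }
    return true;
  }

  int
  main ()
  {
    /* REV with diff == 3: reverse each 4-element block of 8 elements.  */
    std::vector<int> perm = { 3, 2, 1, 0, 7, 6, 5, 4 };
    unsigned int diff = 3, step = diff + 1;

    bool is_rev = true;
    for (unsigned int i = 0; i < step; ++i)
      is_rev &= series_p (perm, i, step, diff - i, step);
    printf ("matches the REV pattern: %s\n", is_rev ? "yes" : "no");
    return 0;
  }

Each series_p call checks one strand of the selector (outputs I,
I + STEP, I + 2 * STEP, ...), which is why the REV test needs STEP of
them.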

James

> 
> Richard Sandiford <richard.sandiford@linaro.org> writes:
> > Ping
> >
> > Richard Sandiford <richard.sandiford@linaro.org> writes:
> >> This patch makes the AArch64 vec_perm_const code use the new
> >> vec_perm_indices routines, instead of checking each element individually.
> >> This means that they extend naturally to variable-length vectors.
> >>
> >> Also, aarch64_evpc_dup was the only function that generated rtl when
> >> testing_p is true, and that looked accidental.  The patch adds the
> >> missing check and then replaces the gen_rtx_REG/start_sequence/
> >> end_sequence stuff with an assert that no rtl is generated.
> >>
> >> Tested on aarch64-linux-gnu.  Also tested by making sure that there
> >> were no assembly output differences for aarch64-linux-gnu or
> >> aarch64_be-linux-gnu.  OK to install?
> >>
> >> Richard
> >>
> >>
> >> 2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>
> >>
> >> gcc/
> >> 	* config/aarch64/aarch64.c (aarch64_evpc_trn): Use d.perm.series_p
> >> 	instead of checking each element individually.
> >> 	(aarch64_evpc_uzp): Likewise.
> >> 	(aarch64_evpc_zip): Likewise.
> >> 	(aarch64_evpc_ext): Likewise.
> >> 	(aarch64_evpc_rev): Likewise.
> >> 	(aarch64_evpc_dup): Test the encoding for a single duplicated element,
> >> 	instead of checking each element individually.  Return true without
> >> 	generating rtl if d->testing_p.
> >> 	(aarch64_vectorize_vec_perm_const): Use all_from_input_p to test
> >> 	whether all selected elements come from the same input, instead of
> >> 	checking each element individually.  Remove calls to gen_rtx_REG,
> >> 	start_sequence and end_sequence and instead assert that no rtl is
> >> 	generated.
> >>
> >> Index: gcc/config/aarch64/aarch64.c
> >> ===================================================================
> >> --- gcc/config/aarch64/aarch64.c	2017-12-09 22:48:47.535824832 +0000
> >> +++ gcc/config/aarch64/aarch64.c	2017-12-09 22:49:00.139270410 +0000
> >> @@ -13295,7 +13295,7 @@ aarch64_expand_vec_perm (rtx target, rtx
> >>  static bool
> >>  aarch64_evpc_trn (struct expand_vec_perm_d *d)
> >>  {
> >> -  unsigned int i, odd, mask, nelt = d->perm.length ();
> >> +  unsigned int odd, nelt = d->perm.length ();
> >>    rtx out, in0, in1, x;
> >>    machine_mode vmode = d->vmode;
> >>  
> >> @@ -13304,21 +13304,11 @@ aarch64_evpc_trn (struct expand_vec_perm
> >>  
> >>    /* Note that these are little-endian tests.
> >>       We correct for big-endian later.  */
> >> -  if (d->perm[0] == 0)
> >> -    odd = 0;
> >> -  else if (d->perm[0] == 1)
> >> -    odd = 1;
> >> -  else
> >> +  odd = d->perm[0];
> >> +  if ((odd != 0 && odd != 1)
> >> +      || !d->perm.series_p (0, 2, odd, 2)
> >> +      || !d->perm.series_p (1, 2, nelt + odd, 2))
> >>      return false;
> >> -  mask = (d->one_vector_p ? nelt - 1 : 2 * nelt - 1);
> >> -
> >> -  for (i = 0; i < nelt; i += 2)
> >> -    {
> >> -      if (d->perm[i] != i + odd)
> >> -	return false;
> >> -      if (d->perm[i + 1] != ((i + nelt + odd) & mask))
> >> -	return false;
> >> -    }
> >>  
> >>    /* Success!  */
> >>    if (d->testing_p)
> >> @@ -13342,7 +13332,7 @@ aarch64_evpc_trn (struct expand_vec_perm
> >>  static bool
> >>  aarch64_evpc_uzp (struct expand_vec_perm_d *d)
> >>  {
> >> -  unsigned int i, odd, mask, nelt = d->perm.length ();
> >> +  unsigned int odd;
> >>    rtx out, in0, in1, x;
> >>    machine_mode vmode = d->vmode;
> >>  
> >> @@ -13351,20 +13341,10 @@ aarch64_evpc_uzp (struct expand_vec_perm
> >>  
> >>    /* Note that these are little-endian tests.
> >>       We correct for big-endian later.  */
> >> -  if (d->perm[0] == 0)
> >> -    odd = 0;
> >> -  else if (d->perm[0] == 1)
> >> -    odd = 1;
> >> -  else
> >> +  odd = d->perm[0];
> >> +  if ((odd != 0 && odd != 1)
> >> +      || !d->perm.series_p (0, 1, odd, 2))
> >>      return false;
> >> -  mask = (d->one_vector_p ? nelt - 1 : 2 * nelt - 1);
> >> -
> >> -  for (i = 0; i < nelt; i++)
> >> -    {
> >> -      unsigned elt = (i * 2 + odd) & mask;
> >> -      if (d->perm[i] != elt)
> >> -	return false;
> >> -    }
> >>  
> >>    /* Success!  */
> >>    if (d->testing_p)
> >> @@ -13388,7 +13368,7 @@ aarch64_evpc_uzp (struct expand_vec_perm
> >>  static bool
> >>  aarch64_evpc_zip (struct expand_vec_perm_d *d)
> >>  {
> >> -  unsigned int i, high, mask, nelt = d->perm.length ();
> >> +  unsigned int high, nelt = d->perm.length ();
> >>    rtx out, in0, in1, x;
> >>    machine_mode vmode = d->vmode;
> >>  
> >> @@ -13397,25 +13377,11 @@ aarch64_evpc_zip (struct expand_vec_perm
> >>  
> >>    /* Note that these are little-endian tests.
> >>       We correct for big-endian later.  */
> >> -  high = nelt / 2;
> >> -  if (d->perm[0] == high)
> >> -    /* Do Nothing.  */
> >> -    ;
> >> -  else if (d->perm[0] == 0)
> >> -    high = 0;
> >> -  else
> >> +  high = d->perm[0];
> >> +  if ((high != 0 && high * 2 != nelt)
> >> +      || !d->perm.series_p (0, 2, high, 1)
> >> +      || !d->perm.series_p (1, 2, high + nelt, 1))
> >>      return false;
> >> -  mask = (d->one_vector_p ? nelt - 1 : 2 * nelt - 1);
> >> -
> >> -  for (i = 0; i < nelt / 2; i++)
> >> -    {
> >> -      unsigned elt = (i + high) & mask;
> >> -      if (d->perm[i * 2] != elt)
> >> -	return false;
> >> -      elt = (elt + nelt) & mask;
> >> -      if (d->perm[i * 2 + 1] != elt)
> >> -	return false;
> >> -    }
> >>  
> >>    /* Success!  */
> >>    if (d->testing_p)
> >> @@ -13440,23 +13406,14 @@ aarch64_evpc_zip (struct expand_vec_perm
> >>  static bool
> >>  aarch64_evpc_ext (struct expand_vec_perm_d *d)
> >>  {
> >> -  unsigned int i, nelt = d->perm.length ();
> >> +  unsigned int nelt = d->perm.length ();
> >>    rtx offset;
> >>  
> >>    unsigned int location = d->perm[0]; /* Always < nelt.  */
> >>  
> >>    /* Check if the extracted indices are increasing by one.  */
> >> -  for (i = 1; i < nelt; i++)
> >> -    {
> >> -      unsigned int required = location + i;
> >> -      if (d->one_vector_p)
> >> -        {
> >> -          /* We'll pass the same vector in twice, so allow indices to wrap.  */
> >> -	  required &= (nelt - 1);
> >> -	}
> >> -      if (d->perm[i] != required)
> >> -        return false;
> >> -    }
> >> +  if (!d->perm.series_p (0, 1, location, 1))
> >> +    return false;
> >>  
> >>    /* Success! */
> >>    if (d->testing_p)
> >> @@ -13488,7 +13445,7 @@ aarch64_evpc_ext (struct expand_vec_perm
> >>  static bool
> >>  aarch64_evpc_rev (struct expand_vec_perm_d *d)
> >>  {
> >> -  unsigned int i, j, diff, size, unspec, nelt = d->perm.length ();
> >> +  unsigned int i, diff, size, unspec;
> >>  
> >>    if (!d->one_vector_p)
> >>      return false;
> >> @@ -13504,18 +13461,10 @@ aarch64_evpc_rev (struct expand_vec_perm
> >>    else
> >>      return false;
> >>  
> >> -  for (i = 0; i < nelt ; i += diff + 1)
> >> -    for (j = 0; j <= diff; j += 1)
> >> -      {
> >> -	/* This is guaranteed to be true as the value of diff
> >> -	   is 7, 3, 1 and we should have enough elements in the
> >> -	   queue to generate this.  Getting a vector mask with a
> >> -	   value of diff other than these values implies that
> >> -	   something is wrong by the time we get here.  */
> >> -	gcc_assert (i + j < nelt);
> >> -	if (d->perm[i + j] != i + diff - j)
> >> -	  return false;
> >> -      }
> >> +  unsigned int step = diff + 1;
> >> +  for (i = 0; i < step; ++i)
> >> +    if (!d->perm.series_p (i, step, diff - i, step))
> >> +      return false;
> >>  
> >>    /* Success! */
> >>    if (d->testing_p)
> >> @@ -13532,15 +13481,17 @@ aarch64_evpc_dup (struct expand_vec_perm
> >>    rtx out = d->target;
> >>    rtx in0;
> >>    machine_mode vmode = d->vmode;
> >> -  unsigned int i, elt, nelt = d->perm.length ();
> >> +  unsigned int elt;
> >>    rtx lane;
> >>  
> >> +  if (d->perm.encoding ().encoded_nelts () != 1)
> >> +    return false;
> >> +
> >> +  /* Success! */
> >> +  if (d->testing_p)
> >> +    return true;
> >> +
> >>    elt = d->perm[0];
> >> -  for (i = 1; i < nelt; i++)
> >> -    {
> >> -      if (elt != d->perm[i])
> >> -	return false;
> >> -    }
> >>  
> >>    /* The generic preparation in aarch64_expand_vec_perm_const_1
> >>       swaps the operand order and the permute indices if it finds
> >> @@ -13628,61 +13579,37 @@ aarch64_vectorize_vec_perm_const (machin
> >>  				  rtx op1, const vec_perm_indices &sel)
> >>  {
> >>    struct expand_vec_perm_d d;
> >> -  unsigned int i, which;
> >>  
> >> -  d.vmode = vmode;
> >> -  d.target = target;
> >> -  d.op0 = op0;
> >> -  d.op1 = op1;
> >> -  d.testing_p = !target;
> >> -
> >> -  /* Calculate whether all elements are in one vector.  */
> >> -  unsigned int nelt = sel.length ();
> >> -  for (i = which = 0; i < nelt; ++i)
> >> +  /* Check whether the mask can be applied to a single vector.  */
> >> +  if (op0 && rtx_equal_p (op0, op1))
> >> +    d.one_vector_p = true;
> >> +  else if (sel.all_from_input_p (0))
> >>      {
> >> -      unsigned int ei = sel[i] & (2 * nelt - 1);
> >> -      which |= (ei < nelt ? 1 : 2);
> >> +      d.one_vector_p = true;
> >> +      op1 = op0;
> >>      }
> >> -
> >> -  switch (which)
> >> +  else if (sel.all_from_input_p (1))
> >>      {
> >> -    default:
> >> -      gcc_unreachable ();
> >> -
> >> -    case 3:
> >> -      d.one_vector_p = false;
> >> -      if (d.testing_p || !rtx_equal_p (op0, op1))
> >> -	break;
> >> -
> >> -      /* The elements of PERM do not suggest that only the first operand
> >> -	 is used, but both operands are identical.  Allow easier matching
> >> -	 of the permutation by folding the permutation into the single
> >> -	 input vector.  */
> >> -      /* Fall Through.  */
> >> -    case 2:
> >> -      d.op0 = op1;
> >> -      d.one_vector_p = true;
> >> -      break;
> >> -
> >> -    case 1:
> >> -      d.op1 = op0;
> >>        d.one_vector_p = true;
> >> -      break;
> >> +      op0 = op1;
> >>      }
> >> +  else
> >> +    d.one_vector_p = false;
> >>  
> >> -  d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2, nelt);
> >> +  d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2,
> >> +		     sel.nelts_per_input ());
> >> +  d.vmode = vmode;
> >> +  d.target = target;
> >> +  d.op0 = op0;
> >> +  d.op1 = op1;
> >> +  d.testing_p = !target;
> >>  
> >>    if (!d.testing_p)
> >>      return aarch64_expand_vec_perm_const_1 (&d);
> >>  
> >> -  d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
> >> -  d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
> >> -  if (!d.one_vector_p)
> >> -    d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
> >> -
> >> -  start_sequence ();
> >> +  rtx_insn *last = get_last_insn ();
> >>    bool ret = aarch64_expand_vec_perm_const_1 (&d);
> >> -  end_sequence ();
> >> +  gcc_assert (last == get_last_insn ());
> >>  
> >>    return ret;
> >>  }

^ permalink raw reply	[flat|nested] 46+ messages in thread

* RFA: Expand vec_perm_indices::series_p comment
  2018-01-09 12:18       ` James Greenhalgh
@ 2018-01-09 16:24         ` Richard Sandiford
  2018-01-29 20:56           ` Ping: " Richard Sandiford
  0 siblings, 1 reply; 46+ messages in thread
From: Richard Sandiford @ 2018-01-09 16:24 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: gcc-patches

James Greenhalgh <james.greenhalgh@arm.com> writes:
> On Thu, Jan 04, 2018 at 11:27:56AM +0000, Richard Sandiford wrote:
>> Ping**2
>
> This is OK.

Thanks.

> It took me a while to get the hang of the interface - a worked example
> in the comment in vec-perm-indices.c would probably have been helpful.
> It took until your code for REV for this to really make sense to me; so
> perhaps that would make for a good example.

Yeah, good idea.

Is the following OK?  Tested on aarch64-linux-gnu.

Thanks,
Richard


2018-01-09  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* vec-perm-indices.c (vec_perm_indices::series_p): Give examples
	of usage.

Index: gcc/vec-perm-indices.c
===================================================================
--- gcc/vec-perm-indices.c	2018-01-03 11:12:55.709763763 +0000
+++ gcc/vec-perm-indices.c	2018-01-09 15:46:40.004232873 +0000
@@ -114,7 +114,18 @@ vec_perm_indices::rotate_inputs (int del
 }
 
 /* Return true if index OUT_BASE + I * OUT_STEP selects input
-   element IN_BASE + I * IN_STEP.  */
+   element IN_BASE + I * IN_STEP.  For example, the call to test
+   whether a permute reverses a vector of N elements would be:
+
+     series_p (0, 1, N - 1, -1)
+
+   which would return true for { N - 1, N - 2, N - 3, ... }.
+   The calls to test for an interleaving of elements starting
+   at N1 and N2 would be:
+
+     series_p (0, 2, N1, 1) && series_p (1, 2, N2, 1).
+
+   which would return true for { N1, N2, N1 + 1, N2 + 1, ... }.  */
 
 bool
 vec_perm_indices::series_p (unsigned int out_base, unsigned int out_step,

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Ping: Expand vec_perm_indices::series_p comment
  2018-01-09 16:24         ` RFA: Expand vec_perm_indices::series_p comment Richard Sandiford
@ 2018-01-29 20:56           ` Richard Sandiford
  2018-01-30  7:20             ` Jeff Law
  0 siblings, 1 reply; 46+ messages in thread
From: Richard Sandiford @ 2018-01-29 20:56 UTC (permalink / raw)
  To: gcc-patches

Ping

Richard Sandiford <richard.sandiford@linaro.org> writes:
> James Greenhalgh <james.greenhalgh@arm.com> writes:
>> On Thu, Jan 04, 2018 at 11:27:56AM +0000, Richard Sandiford wrote:
>>> Ping**2
>>
>> This is OK.
>
> Thanks.
>
>> It took me a while to get the hang of the interface - a worked example
>> in the comment in vec-perm-indices.c would probably have been helpful.
>> It took until your code for REV for this to really make sense to me; so
>> perhaps that would make for a good example.
>
> Yeah, good idea.
>
> Is the following OK?  Tested on aarch64-linux-gnu.
>
> Thanks,
> Richard
>
>
> 2018-01-09  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
> 	* vec-perm-indices.c (vec_perm_indices::series_p): Give examples
> 	of usage.
>
> Index: gcc/vec-perm-indices.c
> ===================================================================
> --- gcc/vec-perm-indices.c	2018-01-03 11:12:55.709763763 +0000
> +++ gcc/vec-perm-indices.c	2018-01-09 15:46:40.004232873 +0000
> @@ -114,7 +114,18 @@ vec_perm_indices::rotate_inputs (int del
>  }
>  
>  /* Return true if index OUT_BASE + I * OUT_STEP selects input
> -   element IN_BASE + I * IN_STEP.  */
> +   element IN_BASE + I * IN_STEP.  For example, the call to test
> +   whether a permute reverses a vector of N elements would be:
> +
> +     series_p (0, 1, N - 1, -1)
> +
> +   which would return true for { N - 1, N - 2, N - 3, ... }.
> +   The calls to test for an interleaving of elements starting
> +   at N1 and N2 would be:
> +
> +     series_p (0, 2, N1, 1) && series_p (1, 2, N2, 1).
> +
> +   which would return true for { N1, N2, N1 + 1, N2 + 1, ... }.  */
>  
>  bool
>  vec_perm_indices::series_p (unsigned int out_base, unsigned int out_step,

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Ping: Expand vec_perm_indices::series_p comment
  2018-01-29 20:56           ` Ping: " Richard Sandiford
@ 2018-01-30  7:20             ` Jeff Law
  0 siblings, 0 replies; 46+ messages in thread
From: Jeff Law @ 2018-01-30  7:20 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 01/29/2018 01:38 PM, Richard Sandiford wrote:
> Ping
> 
> Richard Sandiford <richard.sandiford@linaro.org> writes:
>> James Greenhalgh <james.greenhalgh@arm.com> writes:
>>> On Thu, Jan 04, 2018 at 11:27:56AM +0000, Richard Sandiford wrote:
>>>> Ping**2
>>>
>>> This is OK.
>>
>> Thanks.
>>
>>> It took me a while to get the hang of the interface - a worked example
>>> in the comment in vec-perm-indices.c would probably have been helpful.
>>> It took until your code for REV for this to really make sense to me; so
>>> perhaps that would make for a good example.
>>
>> Yeah, good idea.
>>
>> Is the following OK?  Tested on aarch64-linux-gnu.
>>
>> Thanks,
>> Richard
>>
>>
>> 2018-01-09  Richard Sandiford  <richard.sandiford@linaro.org>
>>
>> gcc/
>> 	* vec-perm-indices.c (vec_perm_indices::series_p): Give examples
>> 	of usage.
OK
jeff

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2018-01-30  6:11 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-09 23:06 [00/13] Make VEC_PERM_EXPR work for variable-length vectors Richard Sandiford
2017-12-09 23:08 ` [01/13] Add a qimode_for_vec_perm helper function Richard Sandiford
2017-12-18 13:34   ` Richard Biener
2017-12-09 23:09 ` [02/13] Pass vec_perm_indices by reference Richard Sandiford
2017-12-12 14:23   ` Richard Biener
2017-12-09 23:11 ` [03/13] Split can_vec_perm_p into can_vec_perm_{var,const}_p Richard Sandiford
2017-12-12 14:25   ` Richard Biener
2017-12-09 23:13 ` [04/13] Refactor expand_vec_perm Richard Sandiford
2017-12-12 15:17   ` Richard Biener
2017-12-09 23:17 ` [05/13] Remove vec_perm_const optab Richard Sandiford
2017-12-12 15:26   ` Richard Biener
2017-12-20 13:42     ` Richard Sandiford
2017-12-09 23:18 ` [06/13] Check whether a vector of QIs can store all indices Richard Sandiford
2017-12-12 15:27   ` Richard Biener
2017-12-09 23:20 ` [07/13] Make vec_perm_indices use new vector encoding Richard Sandiford
2017-12-12 15:32   ` Richard Biener
2017-12-12 15:47     ` Richard Sandiford
2017-12-14 10:37       ` Richard Biener
2017-12-20 13:48         ` Richard Sandiford
2018-01-02 13:15           ` Richard Biener
2018-01-02 18:30             ` Richard Sandiford
2017-12-09 23:20 ` [08/13] Add a vec_perm_indices_to_tree helper function Richard Sandiford
2017-12-18 13:34   ` Richard Biener
2017-12-09 23:21 ` [09/13] Use explicit encodings for simple permutes Richard Sandiford
2017-12-19 20:37   ` Richard Sandiford
2018-01-02 13:07   ` Richard Biener
2017-12-09 23:23 ` [10/13] Rework VEC_PERM_EXPR folding Richard Sandiford
2017-12-09 23:24   ` [11/13] Use vec_perm_builder::series_p in shift_amt_for_vec_perm_mask Richard Sandiford
2017-12-19 20:37     ` Richard Sandiford
2018-01-02 13:08     ` Richard Biener
2017-12-09 23:25   ` [12/13] Use ssizetype selectors for autovectorised VEC_PERM_EXPRs Richard Sandiford
2017-12-19 20:37     ` Richard Sandiford
2018-01-02 13:09     ` Richard Biener
2017-12-19 20:37   ` [10/13] Rework VEC_PERM_EXPR folding Richard Sandiford
2018-01-02 13:08   ` Richard Biener
2017-12-09 23:27 ` [13/13] [AArch64] Use vec_perm_indices helper routines Richard Sandiford
2017-12-19 20:37   ` Richard Sandiford
2018-01-04 11:28     ` Richard Sandiford
2018-01-09 12:18       ` James Greenhalgh
2018-01-09 16:24         ` RFA: Expand vec_perm_indices::series_p comment Richard Sandiford
2018-01-29 20:56           ` Ping: " Richard Sandiford
2018-01-30  7:20             ` Jeff Law
2017-12-12 14:12 ` [00/13] Make VEC_PERM_EXPR work for variable-length vectors Richard Biener
2017-12-12 15:32   ` Richard Sandiford
2017-12-12 15:38     ` Richard Biener
2017-12-12 15:57       ` Richard Sandiford
