[PATCH] Switch vec_init and vec_extract optabs to 2 mode optab to allow extraction of vector from vector or initialization of vector from smaller vectors (PR target/80846)

From: Jakub Jelinek @ 2017-07-25  9:14 UTC
  To: Richard Biener, Uros Bizjak, David Edelsohn, Segher Boessenkool,
	Marcus Shawcroft, Richard Earnshaw, Andreas Krebbel,
	Matthew Fortune, Eric Botcazou, Andrew Jenner
  Cc: gcc-patches

Hi!

The following patch adjusts the vec_init and vec_extract optabs so that the
expander names contain not just the vector mode but also a second mode: for
vec_extract the mode of the result, and for vec_init the mode of the elts of
the vector passed as the second operand.

Without this patch, the second mode has been implicit, GET_MODE_INNER of
the vector mode, so one could only extract a single element from a vector
or construct a vector from elements.  While that is the most common case, we
allow in GIMPLE e.g. construction of V8DImode from 4 V2DImode elements etc.
and the vectorizer uses them.  Having the second mode in the name allows
the generic code (vectorizer, expansion) to query whether the backend
supports such vector from vector extractions or inits from vector elts and
use them if available.
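
To illustrate the naming scheme, here is a minimal standalone sketch (not GCC
code) of how a two-mode pattern name is spelled: the prefix followed by both
mode names in lowercase, mirroring the "vec_extract$a$b" and "vec_init$a$b"
entries in optabs.def.  The helper two_mode_pattern_name is hypothetical and
exists only for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of GCC: spell a two-mode pattern name as
   PREFIX followed by MODE_A and MODE_B, both converted to lowercase.  */
static void
two_mode_pattern_name (char *buf, size_t len, const char *prefix,
                       const char *mode_a, const char *mode_b)
{
  int n = snprintf (buf, len, "%s%s%s", prefix, mode_a, mode_b);

  /* Require that the whole name fit into BUF.  */
  assert (n >= 0 && (size_t) n < len);

  /* Lowercase only the mode part; PREFIX is assumed lowercase already.  */
  for (char *p = buf + strlen (prefix); *p; p++)
    *p = (char) tolower ((unsigned char) *p);
}
```

With this, ("vec_extract", "V8DF", "DF") gives the element extraction name
used in the patch, and ("vec_init", "V16SI", "V8SI") gives the name a
vector from half-vector init would get.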

For vec_extract, if we want to extract, say, the high V2SImode from V4SImode,
the fallback is to try to expand it as a DImode extraction from V2DImode.
This works well in many cases, but doesn't really work for very large
vectors: if we want to extract the high V8SImode from V16SImode on x86,
we'd need an OImode extraction from V2OImode, which is something the backend
doesn't have any support for.
For vec_init, the fallback is usually to go through memory, which is slow in
many cases.
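
The arithmetic behind the vec_extract fallback can be sketched as follows.
This is a standalone illustration, not GCC code; both helper names are
hypothetical.  The fallback reinterprets the sub-vector as one wide integer
element, so the integer mode it needs is as wide as the whole sub-vector.

```c
#include <assert.h>

/* Hypothetical helper, not GCC code: width in bits of the integer mode the
   fallback needs in order to treat a sub-vector of SUB_NUNITS elements,
   each ELT_BITS wide, as a single scalar element.  */
static unsigned
fallback_int_mode_bits (unsigned sub_nunits, unsigned elt_bits)
{
  return sub_nunits * elt_bits;
}

/* Hypothetical helper: number of such wide elements in the reinterpreted
   source vector of SRC_NUNITS original elements.  */
static unsigned
fallback_vector_nunits (unsigned src_nunits, unsigned sub_nunits)
{
  assert (sub_nunits != 0 && src_nunits % sub_nunits == 0);
  return src_nunits / sub_nunits;
}
```

For V2SImode out of V4SImode this yields a 64-bit element and two units
(DImode from V2DImode); for V8SImode out of V16SImode it yields a 256-bit
element and two units, i.e. the OImode from V2OImode case mentioned above.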

This patch only adds new vector from vector extract and init patterns to
the i386 backend, but I had to change many other targets too, because
the vec_extract/vec_init expander names now need to include the element
mode.  It seems most of the backends didn't really have a mode attribute
usable for this, or had it only in uppercase, while for the names we need
lowercase.  Some backends had a convention on how to name lower case
vs. upper case modes, others didn't have any.  So I'm CCing maintainers
of affected backends to seek advice on what mode attributes they want to
use.
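
As a rough illustration of what the new lowercase attributes map to, the
sketch below derives the lowercase element-mode suffix from a vector mode
name.  It is not how the backends do it (they spell the mapping out in
define_mode_attr tables); the helper lower_elt_suffix is hypothetical and
assumes mode names of the form V<count><ELT>.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not GCC code: for a vector mode name of the assumed
   form V<count><ELT>, store the lowercase element-mode name in BUF.  */
static void
lower_elt_suffix (char *buf, size_t len, const char *vec_mode)
{
  const char *p = vec_mode;

  /* Skip the leading 'V' and the element count.  */
  assert (*p == 'V');
  p++;
  while (isdigit ((unsigned char) *p))
    p++;

  /* Copy the remaining element-mode name in lowercase.  */
  assert (strlen (p) < len);
  size_t i = 0;
  for (; p[i]; i++)
    buf[i] = (char) tolower ((unsigned char) p[i]);
  buf[i] = '\0';
}
```

For example, V8DF, V16QI and V4TI map to df, qi and ti, matching the
entries of the ssescalarmodelower attribute added to sse.md.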

Bootstrapped/regtested on x86_64-linux and i686-linux, where it improves
e.g. the code generation for the slp-43.c and slp-45.c testcases.
The remaining targets were tested by building cc1 in cross-compilers.

Ok for trunk?

2017-07-25  Jakub Jelinek  <jakub@redhat.com>

	PR target/80846
	* optabs.def (vec_extract_optab, vec_init_optab): Change from
	a direct optab to conversion optab.
	* optabs.c (expand_vector_broadcast): Use convert_optab_handler
	with GET_MODE_INNER as last argument instead of optab_handler.
	* expmed.c (extract_bit_field_1): Likewise.  Use vector from
	vector extraction if possible and optab is available.
	* expr.c (store_constructor): Use convert_optab_handler instead
	of optab_handler.  Use vector initialization from smaller
	vectors if possible and optab is available.
	* tree-vect-stmts.c (vectorizable_load): Likewise.
	* doc/md.texi (vec_extract, vec_init): Document that the optabs
	now have two modes.
	* config/i386/i386.c (ix86_expand_vector_init): Handle expansion
	of vec_init from half-sized vectors with the same element mode.
	* config/i386/sse.md (ssehalfvecmode): Add V4TI case.
	(ssehalfvecmodelower, ssescalarmodelower): New mode attributes.
	(reduc_plus_scal_v8df, reduc_plus_scal_v4df, reduc_plus_scal_v2df,
	reduc_plus_scal_v16sf, reduc_plus_scal_v8sf, reduc_plus_scal_v4sf,
	reduc_<code>_scal_<mode>, reduc_umin_scal_v8hi): Add element mode
	after mode in gen_vec_extract* calls.
	(vec_extract<mode>): Renamed to ...
	(vec_extract<mode><ssescalarmodelower>): ... this.
	(vec_extract<mode><ssehalfvecmodelower>): New expander.
	(rotl<mode>3, rotr<mode>3, <shift_insn><mode>3, ashrv2di3): Add
	element mode after mode in gen_vec_init* calls.
	(VEC_INIT_HALF_MODE): New mode iterator.
	(vec_init<mode>): Renamed to ...
	(vec_init<mode><ssescalarmodelower>): ... this.
	(vec_init<mode><ssehalfvecmodelower>): New expander.
	* config/i386/mmx.md (vec_extractv2sf): Renamed to ...
	(vec_extractv2sfsf): ... this.
	(vec_initv2sf): Renamed to ...
	(vec_initv2sfsf): ... this.
	(vec_extractv2si): Renamed to ...
	(vec_extractv2sisi): ... this.
	(vec_initv2si): Renamed to ...
	(vec_initv2sisi): ... this.
	(vec_extractv4hi): Renamed to ...
	(vec_extractv4hihi): ... this.
	(vec_initv4hi): Renamed to ...
	(vec_initv4hihi): ... this.
	(vec_extractv8qi): Renamed to ...
	(vec_extractv8qiqi): ... this.
	(vec_initv8qi): Renamed to ...
	(vec_initv8qiqi): ... this.
	* config/rs6000/vector.md (VEC_base_l): New mode attribute.
	(vec_init<mode>): Renamed to ...
	(vec_init<mode><VEC_base_l>): ... this.
	(vec_extract<mode>): Renamed to ...
	(vec_extract<mode><VEC_base_l>): ... this.
	* config/rs6000/paired.md (vec_initv2sf): Renamed to ...
	(vec_initv2sfsf): ... this.
	* config/rs6000/altivec.md (splitter, altivec_copysign_v4sf3,
	vec_unpacku_hi_v16qi, vec_unpacku_hi_v8hi, vec_unpacku_lo_v16qi,
	vec_unpacku_lo_v8hi, mulv16qi3, altivec_vreve<mode>2): Add
	element mode after mode in gen_vec_init* calls.
	* config/aarch64/aarch64-simd.md (vec_init<mode>): Renamed to ...
	(vec_init<mode><Vel>): ... this.
	(vec_extract<mode>): Renamed to ...
	(vec_extract<mode><Vel>): ... this.
	* config/aarch64/iterators.md (Vel): New mode attribute.
	* config/s390/s390.c (s390_expand_vec_strlen, s390_expand_vec_movstr):
	Add element mode after mode in gen_vec_extract* calls.
	* config/s390/vector.md (non_vec_l): New mode attribute.
	(vec_extract<mode>): Renamed to ...
	(vec_extract<mode><non_vec_l>): ... this.
	(vec_init<mode>): Renamed to ...
	(vec_init<mode><non_vec_l>): ... this.
	* config/s390/s390-builtins.def (s390_vlgvb, s390_vlgvh, s390_vlgvf,
	s390_vlgvf_flt, s390_vlgvg, s390_vlgvg_dbl): Add element mode after
	vec_extract mode.
	* config/arm/iterators.md (V_elem_l): New mode attribute.
	* config/arm/neon.md (vec_extract<mode>): Renamed to ...
	(vec_extract<mode><V_elem_l>): ... this.
	(vec_extractv2di): Renamed to ...
	(vec_extractv2didi): ... this.
	(vec_init<mode>): Renamed to ...
	(vec_init<mode><V_elem_l>): ... this.
	(reduc_plus_scal_<mode>, reduc_plus_scal_v2di, reduc_smin_scal_<mode>,
	reduc_smax_scal_<mode>, reduc_umin_scal_<mode>,
	reduc_umax_scal_<mode>, neon_vget_lane<mode>, neon_vget_laneu<mode>):
	Add element mode after gen_vec_extract* calls.
	* config/mips/mips-msa.md (vec_init<mode>): Renamed to ...
	(vec_init<mode><unitmode>): ... this.
	(vec_extract<mode>): Renamed to ...
	(vec_extract<mode><unitmode>): ... this.
	* config/mips/loongson.md (vec_init<mode>): Renamed to ...
	(vec_init<mode><unitmode>): ... this.
	* config/mips/mips-ps-3d.md (vec_initv2sf): Renamed to ...
	(vec_initv2sfsf): ... this.
	(vec_extractv2sf): Renamed to ...
	(vec_extractv2sfsf): ... this.
	(reduc_plus_scal_v2sf, reduc_smin_scal_v2sf, reduc_smax_scal_v2sf):
	Add element mode after gen_vec_extract* calls.
	* config/mips/mips.md (unitmode): New mode iterator.
	* config/spu/spu.c (spu_expand_prologue, spu_allocate_stack,
	spu_builtin_extract): Add element mode after gen_vec_extract* calls.
	* config/spu/spu.md (inner_l): New mode attribute.
	(vec_init<mode>): Renamed to ...
	(vec_init<mode><inner_l>): ... this.
	(vec_extract<mode>): Renamed to ...
	(vec_extract<mode><inner_l>): ... this.
	* config/sparc/sparc.md (veltmode): New mode iterator.
	(vec_init<VMALL:mode>): Renamed to ...
	(vec_init<VMALL:mode><VMALL:veltmode>): ... this.
	* config/ia64/vect.md (vec_initv2si): Renamed to ...
	(vec_initv2sisi): ... this.
	(vec_initv2sf): Renamed to ...
	(vec_initv2sfsf): ... this.
	(vec_extractv2sf): Renamed to ...
	(vec_extractv2sfsf): ... this.
	* config/powerpcspe/vector.md (VEC_base_l): New mode attribute.
	(vec_init<mode>): Renamed to ...
	(vec_init<mode><VEC_base_l>): ... this.
	(vec_extract<mode>): Renamed to ...
	(vec_extract<mode><VEC_base_l>): ... this.
	* config/powerpcspe/paired.md (vec_initv2sf): Renamed to ...
	(vec_initv2sfsf): ... this.
	* config/powerpcspe/altivec.md (splitter, altivec_copysign_v4sf3,
	vec_unpacku_hi_v16qi, vec_unpacku_hi_v8hi, vec_unpacku_lo_v16qi,
	vec_unpacku_lo_v8hi, mulv16qi3): Add element mode after mode in
	gen_vec_init* calls.

--- gcc/optabs.def.jj	2017-07-24 10:57:45.944815535 +0200
+++ gcc/optabs.def	2017-07-24 16:11:23.066229910 +0200
@@ -89,6 +89,8 @@ OPTAB_CD(vec_cmpu_optab, "vec_cmpu$a$b")
 OPTAB_CD(vec_cmpeq_optab, "vec_cmpeq$a$b")
 OPTAB_CD(maskload_optab, "maskload$a$b")
 OPTAB_CD(maskstore_optab, "maskstore$a$b")
+OPTAB_CD(vec_extract_optab, "vec_extract$a$b")
+OPTAB_CD(vec_init_optab, "vec_init$a$b")
 
 OPTAB_NL(add_optab, "add$P$a3", PLUS, "add", '3', gen_int_fp_fixed_libfunc)
 OPTAB_NX(add_optab, "add$F$a3")
@@ -294,8 +296,6 @@ OPTAB_D (udot_prod_optab, "udot_prod$I$a
 OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
 OPTAB_D (usad_optab, "usad$I$a")
 OPTAB_D (ssad_optab, "ssad$I$a")
-OPTAB_D (vec_extract_optab, "vec_extract$a")
-OPTAB_D (vec_init_optab, "vec_init$a")
 OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
 OPTAB_D (vec_pack_ssat_optab, "vec_pack_ssat_$a")
 OPTAB_D (vec_pack_trunc_optab, "vec_pack_trunc_$a")
--- gcc/optabs.c.jj	2017-07-24 10:57:46.216812275 +0200
+++ gcc/optabs.c	2017-07-24 16:11:23.067229898 +0200
@@ -386,7 +386,8 @@ expand_vector_broadcast (machine_mode vm
   /* ??? If the target doesn't have a vec_init, then we have no easy way
      of performing this operation.  Most of this sort of generic support
      is hidden away in the vector lowering support in gimple.  */
-  icode = optab_handler (vec_init_optab, vmode);
+  icode = convert_optab_handler (vec_init_optab, vmode,
+				 GET_MODE_INNER (vmode));
   if (icode == CODE_FOR_nothing)
     return NULL;
 
--- gcc/expmed.c.jj	2017-07-24 10:57:45.914815894 +0200
+++ gcc/expmed.c	2017-07-24 16:11:23.071229850 +0200
@@ -1566,6 +1566,55 @@ extract_bit_field_1 (rtx str_rtx, unsign
       return op0;
     }
 
+  /* First try to check for vector from vector extractions.  */
+  if (VECTOR_MODE_P (GET_MODE (op0))
+      && !MEM_P (op0)
+      && VECTOR_MODE_P (tmode)
+      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (tmode))
+    {
+      machine_mode new_mode = GET_MODE (op0);
+      if (GET_MODE_INNER (new_mode) != GET_MODE_INNER (tmode))
+	{
+	  new_mode = mode_for_vector (GET_MODE_INNER (tmode),
+				      GET_MODE_BITSIZE (GET_MODE (op0))
+				      / GET_MODE_UNIT_BITSIZE (tmode));
+	  if (!VECTOR_MODE_P (new_mode)
+	      || GET_MODE_SIZE (new_mode) != GET_MODE_SIZE (GET_MODE (op0))
+	      || GET_MODE_INNER (new_mode) != GET_MODE_INNER (tmode)
+	      || !targetm.vector_mode_supported_p (new_mode))
+	    new_mode = VOIDmode;
+	}
+      if (new_mode != VOIDmode
+	  && (convert_optab_handler (vec_extract_optab, new_mode, tmode)
+	      != CODE_FOR_nothing)
+	  && ((bitnum + bitsize - 1) / GET_MODE_BITSIZE (tmode)
+	      == bitnum / GET_MODE_BITSIZE (tmode)))
+	{
+	  struct expand_operand ops[3];
+	  machine_mode outermode = new_mode;
+	  machine_mode innermode = tmode;
+	  enum insn_code icode
+	    = convert_optab_handler (vec_extract_optab, outermode, innermode);
+	  unsigned HOST_WIDE_INT pos = bitnum / GET_MODE_BITSIZE (innermode);
+
+	  if (new_mode != GET_MODE (op0))
+	    op0 = gen_lowpart (new_mode, op0);
+	  create_output_operand (&ops[0], target, innermode);
+	  ops[0].target = 1;
+	  create_input_operand (&ops[1], op0, outermode);
+	  create_integer_operand (&ops[2], pos);
+	  if (maybe_expand_insn (icode, 3, ops))
+	    {
+	      if (alt_rtl && ops[0].target)
+		*alt_rtl = target;
+	      target = ops[0].value;
+	      if (GET_MODE (target) != mode)
+		return gen_lowpart (tmode, target);
+	      return target;
+	    }
+	}
+    }
+
   /* See if we can get a better vector mode before extracting.  */
   if (VECTOR_MODE_P (GET_MODE (op0))
       && !MEM_P (op0)
@@ -1599,14 +1648,17 @@ extract_bit_field_1 (rtx str_rtx, unsign
      available.  */
   if (VECTOR_MODE_P (GET_MODE (op0))
       && !MEM_P (op0)
-      && optab_handler (vec_extract_optab, GET_MODE (op0)) != CODE_FOR_nothing
+      && (convert_optab_handler (vec_extract_optab, GET_MODE (op0),
+				 GET_MODE_INNER (GET_MODE (op0)))
+	  != CODE_FOR_nothing)
       && ((bitnum + bitsize - 1) / GET_MODE_UNIT_BITSIZE (GET_MODE (op0))
 	  == bitnum / GET_MODE_UNIT_BITSIZE (GET_MODE (op0))))
     {
       struct expand_operand ops[3];
       machine_mode outermode = GET_MODE (op0);
       machine_mode innermode = GET_MODE_INNER (outermode);
-      enum insn_code icode = optab_handler (vec_extract_optab, outermode);
+      enum insn_code icode
+	= convert_optab_handler (vec_extract_optab, outermode, innermode);
       unsigned HOST_WIDE_INT pos = bitnum / GET_MODE_BITSIZE (innermode);
 
       create_output_operand (&ops[0], target, innermode);
--- gcc/expr.c.jj	2017-07-24 10:57:45.963815307 +0200
+++ gcc/expr.c	2017-07-24 16:11:23.073229826 +0200
@@ -6589,6 +6589,7 @@ store_constructor (tree exp, rtx target,
 	rtvec vector = NULL;
 	unsigned n_elts;
 	alias_set_type alias;
+	bool vec_vec_init_p = false;
 
 	gcc_assert (eltmode != BLKmode);
 
@@ -6596,27 +6597,30 @@ store_constructor (tree exp, rtx target,
 	if (REG_P (target) && VECTOR_MODE_P (GET_MODE (target)))
 	  {
 	    machine_mode mode = GET_MODE (target);
+	    machine_mode emode = eltmode;
 
-	    icode = (int) optab_handler (vec_init_optab, mode);
-	    /* Don't use vec_init<mode> if some elements have VECTOR_TYPE.  */
-	    if (icode != CODE_FOR_nothing)
+	    if (CONSTRUCTOR_NELTS (exp)
+		&& (TREE_CODE (TREE_TYPE (CONSTRUCTOR_ELT (exp, 0)->value))
+		    == VECTOR_TYPE))
 	      {
-		tree value;
-
-		FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (exp), idx, value)
-		  if (TREE_CODE (TREE_TYPE (value)) == VECTOR_TYPE)
-		    {
-		      icode = CODE_FOR_nothing;
-		      break;
-		    }
+		tree etype = TREE_TYPE (CONSTRUCTOR_ELT (exp, 0)->value);
+		gcc_assert (CONSTRUCTOR_NELTS (exp) * TYPE_VECTOR_SUBPARTS (etype)
+			    == n_elts);
+		emode = TYPE_MODE (etype);
 	      }
+	    icode = (int) convert_optab_handler (vec_init_optab, mode, emode);
 	    if (icode != CODE_FOR_nothing)
 	      {
-		unsigned int i;
+		unsigned int i, n = n_elts;
 
-		vector = rtvec_alloc (n_elts);
-		for (i = 0; i < n_elts; i++)
-		  RTVEC_ELT (vector, i) = CONST0_RTX (GET_MODE_INNER (mode));
+		if (emode != eltmode)
+		  {
+		    n = CONSTRUCTOR_NELTS (exp);
+		    vec_vec_init_p = true;
+		  }
+		vector = rtvec_alloc (n);
+		for (i = 0; i < n; i++)
+		  RTVEC_ELT (vector, i) = CONST0_RTX (emode);
 	      }
 	  }
 
@@ -6634,10 +6638,10 @@ store_constructor (tree exp, rtx target,
 
 	    FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (exp), idx, value)
 	      {
-		int n_elts_here = tree_to_uhwi
-		  (int_const_binop (TRUNC_DIV_EXPR,
-				    TYPE_SIZE (TREE_TYPE (value)),
-				    TYPE_SIZE (elttype)));
+		tree sz = TYPE_SIZE (TREE_TYPE (value));
+		int n_elts_here
+		  = tree_to_uhwi (int_const_binop (TRUNC_DIV_EXPR, sz,
+						   TYPE_SIZE (elttype)));
 
 		count += n_elts_here;
 		if (mostly_zeros_p (value))
@@ -6687,18 +6691,21 @@ store_constructor (tree exp, rtx target,
 
 	    if (vector)
 	      {
-		/* vec_init<mode> should not be used if there are VECTOR_TYPE
-		   elements.  */
-		gcc_assert (TREE_CODE (TREE_TYPE (value)) != VECTOR_TYPE);
-		RTVEC_ELT (vector, eltpos)
-		  = expand_normal (value);
+		if (vec_vec_init_p)
+		  {
+		    gcc_assert (ce->index == NULL_TREE);
+		    gcc_assert (TREE_CODE (TREE_TYPE (value)) == VECTOR_TYPE);
+		    eltpos = idx;
+		  }
+		else
+		  gcc_assert (TREE_CODE (TREE_TYPE (value)) != VECTOR_TYPE);
+		RTVEC_ELT (vector, eltpos) = expand_normal (value);
 	      }
 	    else
 	      {
-		machine_mode value_mode =
-		  TREE_CODE (TREE_TYPE (value)) == VECTOR_TYPE
-		  ? TYPE_MODE (TREE_TYPE (value))
-		  : eltmode;
+		machine_mode value_mode
+		  = (TREE_CODE (TREE_TYPE (value)) == VECTOR_TYPE
+		     ? TYPE_MODE (TREE_TYPE (value)) : eltmode);
 		bitpos = eltpos * elt_size;
 		store_constructor_field (target, bitsize, bitpos, 0,
 					 bitregion_end, value_mode,
@@ -6707,9 +6714,9 @@ store_constructor (tree exp, rtx target,
 	  }
 
 	if (vector)
-	  emit_insn (GEN_FCN (icode)
-		     (target,
-		      gen_rtx_PARALLEL (GET_MODE (target), vector)));
+	  emit_insn (GEN_FCN (icode) (target,
+				      gen_rtx_PARALLEL (GET_MODE (target),
+							vector)));
 	break;
       }
 
--- gcc/tree-vect-stmts.c.jj	2017-07-24 10:57:46.004814816 +0200
+++ gcc/tree-vect-stmts.c	2017-07-24 16:11:23.049230114 +0200
@@ -6996,29 +6996,43 @@ vectorizable_load (gimple *stmt, gimple_
 	{
 	  if (group_size < nunits)
 	    {
-	      /* Avoid emitting a constructor of vector elements by performing
-		 the loads using an integer type of the same size,
-		 constructing a vector of those and then re-interpreting it
-		 as the original vector type.  This works around the fact
-		 that the vec_init optab was only designed for scalar
-		 element modes and thus expansion goes through memory.
-		 This avoids a huge runtime penalty due to the general
-		 inability to perform store forwarding from smaller stores
-		 to a larger load.  */
-	      unsigned lsize
-		= group_size * TYPE_PRECISION (TREE_TYPE (vectype));
-	      machine_mode elmode = mode_for_size (lsize, MODE_INT, 0);
-	      machine_mode vmode = mode_for_vector (elmode,
-						    nunits / group_size);
-	      /* If we can't construct such a vector fall back to
-		 element loads of the original vector type.  */
+	      /* First check if vec_init optab supports construction from
+		 vector elts directly.  */
+	      machine_mode elmode = TYPE_MODE (TREE_TYPE (vectype));
+	      machine_mode vmode = mode_for_vector (elmode, group_size);
 	      if (VECTOR_MODE_P (vmode)
-		  && optab_handler (vec_init_optab, vmode) != CODE_FOR_nothing)
+		  && (convert_optab_handler (vec_init_optab,
+					     TYPE_MODE (vectype), vmode)
+		      != CODE_FOR_nothing))
 		{
 		  nloads = nunits / group_size;
 		  lnel = group_size;
-		  ltype = build_nonstandard_integer_type (lsize, 1);
-		  lvectype = build_vector_type (ltype, nloads);
+		  ltype = build_vector_type (TREE_TYPE (vectype), group_size);
+		}
+	      else
+		{
+		  /* Otherwise avoid emitting a constructor of vector elements
+		     by performing the loads using an integer type of the same
+		     size, constructing a vector of those and then
+		     re-interpreting it as the original vector type.
+		     This avoids a huge runtime penalty due to the general
+		     inability to perform store forwarding from smaller stores
+		     to a larger load.  */
+		  unsigned lsize
+		    = group_size * TYPE_PRECISION (TREE_TYPE (vectype));
+		  elmode = mode_for_size (lsize, MODE_INT, 0);
+		  vmode = mode_for_vector (elmode, nunits / group_size);
+		  /* If we can't construct such a vector fall back to
+		     element loads of the original vector type.  */
+		  if (VECTOR_MODE_P (vmode)
+		      && (convert_optab_handler (vec_init_optab, vmode, elmode)
+			  != CODE_FOR_nothing))
+		    {
+		      nloads = nunits / group_size;
+		      lnel = group_size;
+		      ltype = build_nonstandard_integer_type (lsize, 1);
+		      lvectype = build_vector_type (ltype, nloads);
+		    }
 		}
 	    }
 	  else
--- gcc/doc/md.texi.jj	2017-07-24 10:57:45.989814996 +0200
+++ gcc/doc/md.texi	2017-07-24 17:09:55.536882382 +0200
@@ -4871,15 +4871,22 @@ This pattern is not allowed to @code{FAI
 Set given field in the vector value.  Operand 0 is the vector to modify,
 operand 1 is new value of field and operand 2 specify the field index.
 
-@cindex @code{vec_extract@var{m}} instruction pattern
-@item @samp{vec_extract@var{m}}
+@cindex @code{vec_extract@var{m}@var{n}} instruction pattern
+@item @samp{vec_extract@var{m}@var{n}}
 Extract given field from the vector value.  Operand 1 is the vector, operand 2
-specify field index and operand 0 place to store value into.
+specify field index and operand 0 place to store value into.  The
+@var{n} mode is the mode of the field or vector of fields that should be
+extracted, should be either element mode of the vector mode @var{m}, or
+a vector mode with the same element mode and smaller number of elements.
+If @var{n} is a vector mode, the index is counted in units of that mode.
 
-@cindex @code{vec_init@var{m}} instruction pattern
-@item @samp{vec_init@var{m}}
+@cindex @code{vec_init@var{m}@var{n}} instruction pattern
+@item @samp{vec_init@var{m}@var{n}}
 Initialize the vector to given values.  Operand 0 is the vector to initialize
-and operand 1 is parallel containing values for individual fields.
+and operand 1 is parallel containing values for individual fields.  The
+@var{n} mode is the mode of the elements, should be either element mode of
+the vector mode @var{m}, or a vector mode with the same element mode and
+smaller number of elements.
 
 @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
 @item @samp{vec_cmp@var{m}@var{n}}
--- gcc/config/i386/i386.c.jj	2017-07-24 10:58:11.831505333 +0200
+++ gcc/config/i386/i386.c	2017-07-24 16:11:23.060229982 +0200
@@ -44297,6 +44297,34 @@ ix86_expand_vector_init (bool mmx_ok, rt
   int i;
   rtx x;
 
+  /* Handle first initialization from vector elts.  */
+  if (n_elts != XVECLEN (vals, 0))
+    {
+      rtx subtarget = target;
+      x = XVECEXP (vals, 0, 0);
+      gcc_assert (GET_MODE_INNER (GET_MODE (x)) == inner_mode);
+      if (GET_MODE_NUNITS (GET_MODE (x)) * 2 == n_elts)
+	{
+	  rtx ops[2] = { XVECEXP (vals, 0, 0), XVECEXP (vals, 0, 1) };
+	  if (inner_mode == QImode || inner_mode == HImode)
+	    {
+	      mode = mode_for_vector (SImode,
+				      n_elts * GET_MODE_SIZE (inner_mode) / 4);
+	      inner_mode
+		= mode_for_vector (SImode,
+				   n_elts * GET_MODE_SIZE (inner_mode) / 8);
+	      ops[0] = gen_lowpart (inner_mode, ops[0]);
+	      ops[1] = gen_lowpart (inner_mode, ops[1]);
+	      subtarget = gen_reg_rtx (mode);
+	    }
+	  ix86_expand_vector_init_concat (mode, subtarget, ops, 2);
+	  if (subtarget != target)
+	    emit_move_insn (target, gen_lowpart (GET_MODE (target), subtarget));
+	  return;
+	}
+      gcc_unreachable ();
+    }
+
   for (i = 0; i < n_elts; ++i)
     {
       x = XVECEXP (vals, 0, i);
--- gcc/config/i386/sse.md.jj	2017-07-24 10:57:45.807817176 +0200
+++ gcc/config/i386/sse.md	2017-07-24 16:54:35.658088768 +0200
@@ -658,13 +658,21 @@ (define_mode_attr ssedoublevecmode
 
 ;; Mapping of vector modes to a vector mode of half size
 (define_mode_attr ssehalfvecmode
-  [(V64QI "V32QI") (V32HI "V16HI") (V16SI "V8SI") (V8DI "V4DI")
+  [(V64QI "V32QI") (V32HI "V16HI") (V16SI "V8SI") (V8DI "V4DI") (V4TI "V2TI")
    (V32QI "V16QI") (V16HI  "V8HI") (V8SI  "V4SI") (V4DI "V2DI")
    (V16QI  "V8QI") (V8HI   "V4HI") (V4SI  "V2SI")
    (V16SF "V8SF") (V8DF "V4DF")
    (V8SF  "V4SF") (V4DF "V2DF")
    (V4SF  "V2SF")])
 
+(define_mode_attr ssehalfvecmodelower
+  [(V64QI "v32qi") (V32HI "v16hi") (V16SI "v8si") (V8DI "v4di") (V4TI "v2ti")
+   (V32QI "v16qi") (V16HI  "v8hi") (V8SI  "v4si") (V4DI "v2di")
+   (V16QI  "v8qi") (V8HI   "v4hi") (V4SI  "v2si")
+   (V16SF "v8sf") (V8DF "v4df")
+   (V8SF  "v4sf") (V4DF "v2df")
+   (V4SF  "v2sf")])
+
 ;; Mapping of vector modes ti packed single mode of the same size
 (define_mode_attr ssePSmode
   [(V16SI "V16SF") (V8DF "V16SF")
@@ -690,6 +698,16 @@ (define_mode_attr ssescalarmode
    (V8DF "DF")  (V4DF "DF")  (V2DF "DF")
    (V4TI "TI")  (V2TI "TI")])
 
+;; Mapping of vector modes back to the scalar modes
+(define_mode_attr ssescalarmodelower
+  [(V64QI "qi") (V32QI "qi") (V16QI "qi")
+   (V32HI "hi") (V16HI "hi") (V8HI "hi")
+   (V16SI "si") (V8SI "si")  (V4SI "si")
+   (V8DI "di")  (V4DI "di")  (V2DI "di")
+   (V16SF "sf") (V8SF "sf")  (V4SF "sf")
+   (V8DF "df")  (V4DF "df")  (V2DF "df")
+   (V4TI "ti")  (V2TI "ti")])
+
 ;; Mapping of vector modes to the 128bit modes
 (define_mode_attr ssexmmmode
   [(V64QI "V16QI") (V32QI "V16QI") (V16QI "V16QI")
@@ -2356,7 +2374,7 @@ (define_expand "reduc_plus_scal_v8df"
 {
   rtx tmp = gen_reg_rtx (V8DFmode);
   ix86_expand_reduc (gen_addv8df3, tmp, operands[1]);
-  emit_insn (gen_vec_extractv8df (operands[0], tmp, const0_rtx));
+  emit_insn (gen_vec_extractv8dfdf (operands[0], tmp, const0_rtx));
   DONE;
 })
 
@@ -2371,7 +2389,7 @@ (define_expand "reduc_plus_scal_v4df"
   emit_insn (gen_avx_haddv4df3 (tmp, operands[1], operands[1]));
   emit_insn (gen_avx_vperm2f128v4df3 (tmp2, tmp, tmp, GEN_INT (1)));
   emit_insn (gen_addv4df3 (vec_res, tmp, tmp2));
-  emit_insn (gen_vec_extractv4df (operands[0], vec_res, const0_rtx));
+  emit_insn (gen_vec_extractv4dfdf (operands[0], vec_res, const0_rtx));
   DONE;
 })
 
@@ -2382,7 +2400,7 @@ (define_expand "reduc_plus_scal_v2df"
 {
   rtx tmp = gen_reg_rtx (V2DFmode);
   emit_insn (gen_sse3_haddv2df3 (tmp, operands[1], operands[1]));
-  emit_insn (gen_vec_extractv2df (operands[0], tmp, const0_rtx));
+  emit_insn (gen_vec_extractv2dfdf (operands[0], tmp, const0_rtx));
   DONE;
 })
 
@@ -2393,7 +2411,7 @@ (define_expand "reduc_plus_scal_v16sf"
 {
   rtx tmp = gen_reg_rtx (V16SFmode);
   ix86_expand_reduc (gen_addv16sf3, tmp, operands[1]);
-  emit_insn (gen_vec_extractv16sf (operands[0], tmp, const0_rtx));
+  emit_insn (gen_vec_extractv16sfsf (operands[0], tmp, const0_rtx));
   DONE;
 })
 
@@ -2409,7 +2427,7 @@ (define_expand "reduc_plus_scal_v8sf"
   emit_insn (gen_avx_haddv8sf3 (tmp2, tmp, tmp));
   emit_insn (gen_avx_vperm2f128v8sf3 (tmp, tmp2, tmp2, GEN_INT (1)));
   emit_insn (gen_addv8sf3 (vec_res, tmp, tmp2));
-  emit_insn (gen_vec_extractv8sf (operands[0], vec_res, const0_rtx));
+  emit_insn (gen_vec_extractv8sfsf (operands[0], vec_res, const0_rtx));
   DONE;
 })
 
@@ -2427,7 +2445,7 @@ (define_expand "reduc_plus_scal_v4sf"
     }
   else
     ix86_expand_reduc (gen_addv4sf3, vec_res, operands[1]);
-  emit_insn (gen_vec_extractv4sf (operands[0], vec_res, const0_rtx));
+  emit_insn (gen_vec_extractv4sfsf (operands[0], vec_res, const0_rtx));
   DONE;
 })
 
@@ -2449,7 +2467,8 @@ (define_expand "reduc_<code>_scal_<mode>
 {
   rtx tmp = gen_reg_rtx (<MODE>mode);
   ix86_expand_reduc (gen_<code><mode>3, tmp, operands[1]);
-  emit_insn (gen_vec_extract<mode> (operands[0], tmp, const0_rtx));
+  emit_insn (gen_vec_extract<mode><ssescalarmodelower> (operands[0], tmp,
+							const0_rtx));
   DONE;
 })
 
@@ -2461,7 +2480,8 @@ (define_expand "reduc_<code>_scal_<mode>
 {
   rtx tmp = gen_reg_rtx (<MODE>mode);
   ix86_expand_reduc (gen_<code><mode>3, tmp, operands[1]);
-  emit_insn (gen_vec_extract<mode> (operands[0], tmp, const0_rtx));
+  emit_insn (gen_vec_extract<mode><ssescalarmodelower> (operands[0], tmp,
+  							const0_rtx));
   DONE;
 })
 
@@ -2473,7 +2493,8 @@ (define_expand "reduc_<code>_scal_<mode>
 {
   rtx tmp = gen_reg_rtx (<MODE>mode);
   ix86_expand_reduc (gen_<code><mode>3, tmp, operands[1]);
-  emit_insn (gen_vec_extract<mode> (operands[0], tmp, const0_rtx));
+  emit_insn (gen_vec_extract<mode><ssescalarmodelower> (operands[0], tmp,
+							const0_rtx));
   DONE;
 })
 
@@ -2485,7 +2506,7 @@ (define_expand "reduc_umin_scal_v8hi"
 {
   rtx tmp = gen_reg_rtx (V8HImode);
   ix86_expand_reduc (gen_uminv8hi3, tmp, operands[1]);
-  emit_insn (gen_vec_extractv8hi (operands[0], tmp, const0_rtx));
+  emit_insn (gen_vec_extractv8hihi (operands[0], tmp, const0_rtx));
   DONE;
 })
 
@@ -7881,7 +7902,7 @@ (define_mode_iterator VEC_EXTRACT_MODE
    (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF
    (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")])
 
-(define_expand "vec_extract<mode>"
+(define_expand "vec_extract<mode><ssescalarmodelower>"
   [(match_operand:<ssescalarmode> 0 "register_operand")
    (match_operand:VEC_EXTRACT_MODE 1 "register_operand")
    (match_operand 2 "const_int_operand")]
@@ -7892,6 +7913,19 @@ (define_expand "vec_extract<mode>"
   DONE;
 })
 
+(define_expand "vec_extract<mode><ssehalfvecmodelower>"
+  [(match_operand:<ssehalfvecmode> 0 "nonimmediate_operand")
+   (match_operand:V_512 1 "register_operand")
+   (match_operand 2 "const_0_to_1_operand")]
+  "TARGET_AVX512F"
+{
+  if (INTVAL (operands[2]))
+    emit_insn (gen_vec_extract_hi_<mode> (operands[0], operands[1]));
+  else
+    emit_insn (gen_vec_extract_lo_<mode> (operands[0], operands[1]));
+  DONE;
+})
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel double-precision floating point element swizzling
@@ -16693,7 +16727,7 @@ (define_expand "rotl<mode>3"
       for (i = 0; i < <ssescalarnum>; i++)
 	RTVEC_ELT (vs, i) = op2;
 
-      emit_insn (gen_vec_init<mode> (reg, par));
+      emit_insn (gen_vec_init<mode><ssescalarmodelower> (reg, par));
       emit_insn (gen_xop_vrotl<mode>3 (operands[0], operands[1], reg));
       DONE;
     }
@@ -16725,7 +16759,7 @@ (define_expand "rotr<mode>3"
       for (i = 0; i < <ssescalarnum>; i++)
 	RTVEC_ELT (vs, i) = op2;
 
-      emit_insn (gen_vec_init<mode> (reg, par));
+      emit_insn (gen_vec_init<mode><ssescalarmodelower> (reg, par));
       emit_insn (gen_neg<mode>2 (neg, reg));
       emit_insn (gen_xop_vrotl<mode>3 (operands[0], operands[1], neg));
       DONE;
@@ -17019,7 +17053,7 @@ (define_expand "<shift_insn><mode>3"
         XVECEXP (par, 0, i) = operands[2];
 
       tmp = gen_reg_rtx (V16QImode);
-      emit_insn (gen_vec_initv16qi (tmp, par));
+      emit_insn (gen_vec_initv16qiqi (tmp, par));
 
       if (negate)
 	emit_insn (gen_negv16qi2 (tmp, tmp));
@@ -17055,7 +17089,7 @@ (define_expand "ashrv2di3"
       for (i = 0; i < 2; i++)
 	XVECEXP (par, 0, i) = operands[2];
 
-      emit_insn (gen_vec_initv2di (reg, par));
+      emit_insn (gen_vec_initv2didi (reg, par));
 
       if (negate)
 	emit_insn (gen_negv2di2 (reg, reg));
@@ -18775,7 +18809,7 @@ (define_insn_and_split "avx_<castmode><a
 				  <ssehalfvecmode>mode);
 })
 
-;; Modes handled by vec_init patterns.
+;; Modes handled by vec_init expanders.
 (define_mode_iterator VEC_INIT_MODE
   [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI
    (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
@@ -18785,11 +18819,31 @@ (define_mode_iterator VEC_INIT_MODE
    (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")
    (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")])
 
-(define_expand "vec_init<mode>"
+;; Likewise, but for initialization from half sized vectors.
+;; Thus, these are all VEC_INIT_MODE modes except V2??.
+(define_mode_iterator VEC_INIT_HALF_MODE
+  [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI
+   (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
+   (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
+   (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX")
+   (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
+   (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX")
+   (V4TI "TARGET_AVX512F")])
+
+(define_expand "vec_init<mode><ssescalarmodelower>"
   [(match_operand:VEC_INIT_MODE 0 "register_operand")
    (match_operand 1)]
   "TARGET_SSE"
 {
+  ix86_expand_vector_init (false, operands[0], operands[1]);
+  DONE;
+})
+
+(define_expand "vec_init<mode><ssehalfvecmodelower>"
+  [(match_operand:VEC_INIT_HALF_MODE 0 "register_operand")
+   (match_operand 1)]
+  "TARGET_SSE"
+{
   ix86_expand_vector_init (false, operands[0], operands[1]);
   DONE;
 })
--- gcc/config/i386/mmx.md.jj	2017-07-24 10:57:45.869816434 +0200
+++ gcc/config/i386/mmx.md	2017-07-24 16:11:23.065229922 +0200
@@ -641,7 +641,7 @@ (define_split
   [(set (match_dup 0) (match_dup 1))]
   "operands[1] = adjust_address (operands[1], SFmode, 4);")
 
-(define_expand "vec_extractv2sf"
+(define_expand "vec_extractv2sfsf"
   [(match_operand:SF 0 "register_operand")
    (match_operand:V2SF 1 "register_operand")
    (match_operand 2 "const_int_operand")]
@@ -652,7 +652,7 @@ (define_expand "vec_extractv2sf"
   DONE;
 })
 
-(define_expand "vec_initv2sf"
+(define_expand "vec_initv2sfsf"
   [(match_operand:V2SF 0 "register_operand")
    (match_operand 1)]
   "TARGET_SSE"
@@ -1344,7 +1344,7 @@ (define_insn_and_split "*vec_extractv2si
   operands[1] = adjust_address (operands[1], SImode, INTVAL (operands[2]) * 4);
 })
 
-(define_expand "vec_extractv2si"
+(define_expand "vec_extractv2sisi"
   [(match_operand:SI 0 "register_operand")
    (match_operand:V2SI 1 "register_operand")
    (match_operand 2 "const_int_operand")]
@@ -1355,7 +1355,7 @@ (define_expand "vec_extractv2si"
   DONE;
 })
 
-(define_expand "vec_initv2si"
+(define_expand "vec_initv2sisi"
   [(match_operand:V2SI 0 "register_operand")
    (match_operand 1)]
   "TARGET_SSE"
@@ -1375,7 +1375,7 @@ (define_expand "vec_setv4hi"
   DONE;
 })
 
-(define_expand "vec_extractv4hi"
+(define_expand "vec_extractv4hihi"
   [(match_operand:HI 0 "register_operand")
    (match_operand:V4HI 1 "register_operand")
    (match_operand 2 "const_int_operand")]
@@ -1386,7 +1386,7 @@ (define_expand "vec_extractv4hi"
   DONE;
 })
 
-(define_expand "vec_initv4hi"
+(define_expand "vec_initv4hihi"
   [(match_operand:V4HI 0 "register_operand")
    (match_operand 1)]
   "TARGET_SSE"
@@ -1406,7 +1406,7 @@ (define_expand "vec_setv8qi"
   DONE;
 })
 
-(define_expand "vec_extractv8qi"
+(define_expand "vec_extractv8qiqi"
   [(match_operand:QI 0 "register_operand")
    (match_operand:V8QI 1 "register_operand")
    (match_operand 2 "const_int_operand")]
@@ -1417,7 +1417,7 @@ (define_expand "vec_extractv8qi"
   DONE;
 })
 
-(define_expand "vec_initv8qi"
+(define_expand "vec_initv8qiqi"
   [(match_operand:V8QI 0 "register_operand")
    (match_operand 1)]
   "TARGET_SSE"
--- gcc/config/rs6000/vector.md.jj	2017-06-08 20:50:49.000000000 +0200
+++ gcc/config/rs6000/vector.md	2017-07-24 17:44:44.699580927 +0200
@@ -74,6 +74,16 @@ (define_mode_attr VEC_base [(V16QI "QI")
 			    (V1TI  "TI")
 			    (TI    "TI")])
 
+;; As above, but in lower case
+(define_mode_attr VEC_base_l [(V16QI "qi")
+			      (V8HI  "hi")
+			      (V4SI  "si")
+			      (V2DI  "di")
+			      (V4SF  "sf")
+			      (V2DF  "df")
+			      (V1TI  "ti")
+			      (TI    "ti")])
+
 ;; Same size integer type for floating point data
 (define_mode_attr VEC_int [(V4SF  "v4si")
 			   (V2DF  "v2di")])
@@ -1016,7 +1026,7 @@ (define_expand "fixuns_trunc<mode><VEC_i
 
 \f
 ;; Vector initialization, set, extract
-(define_expand "vec_init<mode>"
+(define_expand "vec_init<mode><VEC_base_l>"
   [(match_operand:VEC_E 0 "vlogical_operand" "")
    (match_operand:VEC_E 1 "" "")]
   "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)"
@@ -1035,7 +1045,7 @@ (define_expand "vec_set<mode>"
   DONE;
 })
 
-(define_expand "vec_extract<mode>"
+(define_expand "vec_extract<mode><VEC_base_l>"
   [(match_operand:<VEC_base> 0 "register_operand" "")
    (match_operand:VEC_E 1 "vlogical_operand" "")
    (match_operand 2 "const_int_operand" "")]
--- gcc/config/rs6000/paired.md.jj	2017-06-08 20:50:49.000000000 +0200
+++ gcc/config/rs6000/paired.md	2017-07-24 17:48:20.324985029 +0200
@@ -377,7 +377,7 @@ (define_insn "paired_muls1"
   "ps_muls1 %0, %1, %2"
   [(set_attr "type" "fp")])
 
-(define_expand "vec_initv2sf"
+(define_expand "vec_initv2sfsf"
   [(match_operand:V2SF 0 "gpc_reg_operand" "=f")
    (match_operand 1 "" "")]
   "TARGET_PAIRED_FLOAT"
--- gcc/config/rs6000/altivec.md.jj	2017-07-24 10:58:12.000000000 +0200
+++ gcc/config/rs6000/altivec.md	2017-07-24 17:48:49.573633038 +0200
@@ -311,7 +311,7 @@ (define_split
   for (i = 0; i < num_elements; i++)
     RTVEC_ELT (v, i) = constm1_rtx;
 
-  emit_insn (gen_vec_initv4si (dest, gen_rtx_PARALLEL (mode, v)));
+  emit_insn (gen_vec_initv4sisi (dest, gen_rtx_PARALLEL (mode, v)));
   emit_insn (gen_rtx_SET (dest, gen_rtx_ASHIFT (mode, dest, dest)));
   DONE;
 })
@@ -2267,7 +2267,7 @@ (define_expand "altivec_copysign_v4sf3"
   RTVEC_ELT (v, 2) = GEN_INT (mask_val);
   RTVEC_ELT (v, 3) = GEN_INT (mask_val);
 
-  emit_insn (gen_vec_initv4si (mask, gen_rtx_PARALLEL (V4SImode, v)));
+  emit_insn (gen_vec_initv4sisi (mask, gen_rtx_PARALLEL (V4SImode, v)));
   emit_insn (gen_vector_select_v4sf (operands[0], operands[1], operands[2],
 				     gen_lowpart (V4SFmode, mask)));
   DONE;
@@ -3409,7 +3409,7 @@ (define_expand "vec_unpacku_hi_v16qi"
   RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 :  0);
   RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ?  7 : 16);
 
-  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
+  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
   emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
   DONE;
 }")
@@ -3445,7 +3445,7 @@ (define_expand "vec_unpacku_hi_v8hi"
   RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ?  6 : 17);
   RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ?  7 : 16);
 
-  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
+  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
   emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
   DONE;
 }")
@@ -3481,7 +3481,7 @@ (define_expand "vec_unpacku_lo_v16qi"
   RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 :  8);
   RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
 
-  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
+  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
   emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
   DONE;
 }")
@@ -3517,7 +3517,7 @@ (define_expand "vec_unpacku_lo_v8hi"
   RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 14 : 17);
   RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
 
-  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
+  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
   emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
   DONE;
 }")
@@ -3758,7 +3758,7 @@ (define_expand "mulv16qi3"
      = gen_rtx_CONST_INT (QImode, BYTES_BIG_ENDIAN ? 2 * i + 17 : 15 - 2 * i);
   }
 
-  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
+  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
   emit_insn (gen_altivec_vmulesb (even, operands[1], operands[2]));
   emit_insn (gen_altivec_vmulosb (odd, operands[1], operands[2]));
   emit_insn (gen_altivec_vperm_v8hiv16qi (operands[0], even, odd, mask));
@@ -3804,7 +3804,7 @@ (define_expand "altivec_vreve<mode>2"
       RTVEC_ELT (v, i + j * size)
 	= GEN_INT (i + (num_elements - 1 - j) * size);
 
-  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
+  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
   emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[1],
 	     operands[1], mask));
   DONE;
--- gcc/config/aarch64/aarch64-simd.md.jj	2017-07-24 15:01:21.000000000 +0200
+++ gcc/config/aarch64/aarch64-simd.md	2017-07-24 17:19:05.660170375 +0200
@@ -5617,9 +5617,9 @@ (define_expand "aarch64_set_qreg<VSTRUCT
   DONE;
 })
 
-;; Standard pattern name vec_init<mode>.
+;; Standard pattern name vec_init<mode><Vel>.
 
-(define_expand "vec_init<mode>"
+(define_expand "vec_init<mode><Vel>"
   [(match_operand:VALL_F16 0 "register_operand" "")
    (match_operand 1 "" "")]
   "TARGET_SIMD"
@@ -5674,9 +5674,9 @@ (define_insn "aarch64_urecpe<mode>"
  "urecpe\\t%0.<Vtype>, %1.<Vtype>"
   [(set_attr "type" "neon_fp_recpe_<Vetype><q>")])
 
-;; Standard pattern name vec_extract<mode>.
+;; Standard pattern name vec_extract<mode><Vel>.
 
-(define_expand "vec_extract<mode>"
+(define_expand "vec_extract<mode><Vel>"
   [(match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "")
    (match_operand:VALL_F16 1 "register_operand" "")
    (match_operand:SI 2 "immediate_operand" "")]
--- gcc/config/aarch64/iterators.md.jj	2017-03-19 11:57:22.000000000 +0100
+++ gcc/config/aarch64/iterators.md	2017-07-24 17:17:50.318091273 +0200
@@ -520,6 +520,17 @@ (define_mode_attr VEL [(V8QI "QI") (V16Q
 			(SI   "SI") (HI   "HI")
 			(QI   "QI")])
 
+;; Define element mode for each vector mode (lower case).
+(define_mode_attr Vel [(V8QI "qi") (V16QI "qi")
+			(V4HI "hi") (V8HI "hi")
+			(V2SI "si") (V4SI "si")
+			(DI "di")   (V2DI "di")
+			(V4HF "hf") (V8HF "hf")
+			(V2SF "sf") (V4SF "sf")
+			(V2DF "df") (DF "df")
+			(SI   "si") (HI   "hi")
+			(QI   "qi")])
+
 ;; 64-bit container modes the inner or scalar source mode.
 (define_mode_attr VCOND [(HI "V4HI") (SI "V2SI")
 			 (V4HI "V4HI") (V8HI "V4HI")
--- gcc/config/s390/s390.c.jj	2017-07-17 10:08:39.000000000 +0200
+++ gcc/config/s390/s390.c	2017-07-24 17:58:24.416715142 +0200
@@ -5792,7 +5792,7 @@ s390_expand_vec_strlen (rtx target, rtx
   add_int_reg_note (s390_emit_ccraw_jump (8, NE, loop_start_label),
 		    REG_BR_PROB,
 		    profile_probability::very_likely ().to_reg_br_prob_note ());
-  emit_insn (gen_vec_extractv16qi (len, result_reg, GEN_INT (7)));
+  emit_insn (gen_vec_extractv16qiqi (len, result_reg, GEN_INT (7)));
 
   /* If the string pointer wasn't aligned we have loaded less then 16
      bytes and the remaining bytes got filled with zeros (by vll).
@@ -5850,7 +5850,7 @@ s390_expand_vec_movstr (rtx result, rtx
   emit_insn (gen_vlbb (vsrc, src, GEN_INT (6)));
   emit_insn (gen_lcbb (loadlen, src_addr, GEN_INT (6)));
   emit_insn (gen_vfenezv16qi (vpos, vsrc, vsrc));
-  emit_insn (gen_vec_extractv16qi (gpos_qi, vpos, GEN_INT (7)));
+  emit_insn (gen_vec_extractv16qiqi (gpos_qi, vpos, GEN_INT (7)));
   emit_move_insn (gpos, gen_rtx_SUBREG (SImode, gpos_qi, 0));
   /* gpos is the byte index if a zero was found and 16 otherwise.
      So if it is lower than the loaded bytes we have a hit.  */
@@ -5928,7 +5928,7 @@ s390_expand_vec_movstr (rtx result, rtx
   force_expand_binop (Pmode, add_optab, dst_addr_reg, offset, dst_addr_reg,
 		      1, OPTAB_DIRECT);
 
-  emit_insn (gen_vec_extractv16qi (gpos_qi, vpos, GEN_INT (7)));
+  emit_insn (gen_vec_extractv16qiqi (gpos_qi, vpos, GEN_INT (7)));
   emit_move_insn (gpos, gen_rtx_SUBREG (SImode, gpos_qi, 0));
 
   emit_insn (gen_vstlv16qi (vsrc, gpos, gen_rtx_MEM (BLKmode, dst_addr_reg)));
--- gcc/config/s390/vector.md.jj	2017-04-25 15:51:31.000000000 +0200
+++ gcc/config/s390/vector.md	2017-07-24 17:57:37.665277768 +0200
@@ -90,6 +90,17 @@ (define_mode_attr non_vec[(V1QI "QI") (V
 			  (V1DF "DF") (V2DF "DF")
 			  (V1TF "TF") (TF "TF")])
 
+; Like above, but in lower case.
+(define_mode_attr non_vec_l[(V1QI "qi") (V2QI "qi") (V4QI "qi") (V8QI "qi")
+			    (V16QI "qi")
+			    (V1HI "hi") (V2HI "hi") (V4HI "hi") (V8HI "hi")
+			    (V1SI "si") (V2SI "si") (V4SI "si")
+			    (V1DI "di") (V2DI "di")
+			    (V1TI "ti") (TI "ti")
+			    (V1SF "sf") (V2SF "sf") (V4SF "sf")
+			    (V1DF "df") (V2DF "df")
+			    (V1TF "tf") (TF "tf")])
+
 ; The instruction suffix for integer instructions and instructions
 ; which do not care about whether it is floating point or integer.
 (define_mode_attr bhfgq[(V1QI "b") (V2QI "b") (V4QI "b") (V8QI "b") (V16QI "b")
@@ -453,7 +464,7 @@ (define_insn "*vec_set<mode>_plus"
 ; FIXME: Support also vector mode operands for 0
 ; FIXME: This should be (vec_select ..) or something but it does only allow constant selectors :(
 ; This is used via RTL standard name as well as for expanding the builtin
-(define_expand "vec_extract<mode>"
+(define_expand "vec_extract<mode><non_vec_l>"
   [(set (match_operand:<non_vec> 0 "nonimmediate_operand" "")
 	(unspec:<non_vec> [(match_operand:V  1 "register_operand" "")
 			   (match_operand:SI 2 "nonmemory_operand" "")]
@@ -485,7 +496,7 @@ (define_insn "*vec_extract<mode>_plus"
   "vlgv<bhfgq>\t%0,%v1,%Y3(%2)"
   [(set_attr "op_type" "VRS")])
 
-(define_expand "vec_init<mode>"
+(define_expand "vec_init<mode><non_vec_l>"
   [(match_operand:V_128 0 "register_operand" "")
    (match_operand:V_128 1 "nonmemory_operand" "")]
   "TARGET_VX"
--- gcc/config/s390/s390-builtins.def.jj	2017-03-24 15:08:56.000000000 +0100
+++ gcc/config/s390/s390-builtins.def	2017-07-24 18:02:22.571849086 +0200
@@ -450,12 +450,12 @@ OB_DEF_VAR (s390_vec_extract_u64,
 OB_DEF_VAR (s390_vec_extract_b64,       s390_vlgvg,         0,                  O2_ELEM,            BT_OV_ULONGLONG_BV2DI_INT)
 OB_DEF_VAR (s390_vec_extract_dbl,       s390_vlgvg_dbl,     0,                  O2_ELEM,            BT_OV_DBL_V2DF_INT)                      /* vlgvg */
 
-B_DEF      (s390_vlgvb,                 vec_extractv16qi,   0,                  B_VX,               O2_ELEM,            BT_FN_UCHAR_UV16QI_INT)
-B_DEF      (s390_vlgvh,                 vec_extractv8hi,    0,                  B_VX,               O2_ELEM,            BT_FN_USHORT_UV8HI_INT)
-B_DEF      (s390_vlgvf,                 vec_extractv4si,    0,                  B_VX,               O2_ELEM,            BT_FN_UINT_UV4SI_INT)
-B_DEF      (s390_vlgvf_flt,             vec_extractv4sf,    0,                  B_INT | B_VXE,      O2_ELEM,            BT_FN_FLT_V4SF_INT)
-B_DEF      (s390_vlgvg,                 vec_extractv2di,    0,                  B_VX,               O2_ELEM,            BT_FN_ULONGLONG_UV2DI_INT)
-B_DEF      (s390_vlgvg_dbl,             vec_extractv2df,    0,                  B_INT | B_VX,       O2_ELEM,            BT_FN_DBL_V2DF_INT)
+B_DEF      (s390_vlgvb,                 vec_extractv16qiqi, 0,                  B_VX,               O2_ELEM,            BT_FN_UCHAR_UV16QI_INT)
+B_DEF      (s390_vlgvh,                 vec_extractv8hihi,  0,                  B_VX,               O2_ELEM,            BT_FN_USHORT_UV8HI_INT)
+B_DEF      (s390_vlgvf,                 vec_extractv4sisi,  0,                  B_VX,               O2_ELEM,            BT_FN_UINT_UV4SI_INT)
+B_DEF      (s390_vlgvf_flt,             vec_extractv4sfsf,  0,                  B_INT | B_VXE,      O2_ELEM,            BT_FN_FLT_V4SF_INT)
+B_DEF      (s390_vlgvg,                 vec_extractv2didi,  0,                  B_VX,               O2_ELEM,            BT_FN_ULONGLONG_UV2DI_INT)
+B_DEF      (s390_vlgvg_dbl,             vec_extractv2dfdf,  0,                  B_INT | B_VX,       O2_ELEM,            BT_FN_DBL_V2DF_INT)
 
 OB_DEF     (s390_vec_insert_and_zero,   s390_vec_insert_and_zero_s8,s390_vec_insert_and_zero_dbl,B_VX,BT_FN_OV4SI_INTCONSTPTR)
 OB_DEF_VAR (s390_vec_insert_and_zero_s8,s390_vllezb,        0,                  0,                  BT_OV_V16QI_SCHARCONSTPTR)
--- gcc/config/arm/iterators.md.jj	2017-05-05 09:20:02.000000000 +0200
+++ gcc/config/arm/iterators.md	2017-07-24 17:25:15.665681575 +0200
@@ -444,6 +444,14 @@ (define_mode_attr V_elem [(V8QI "QI") (V
                           (V2SF "SF") (V4SF "SF")
                           (DI "DI")   (V2DI "DI")])
 
+;; As above but in lower case.
+(define_mode_attr V_elem_l [(V8QI "qi") (V16QI "qi")
+			    (V4HI "hi") (V8HI "hi")
+			    (V4HF "hf") (V8HF "hf")
+			    (V2SI "si") (V4SI "si")
+			    (V2SF "sf") (V4SF "sf")
+			    (DI "di")   (V2DI "di")])
+
 ;; Element modes for vector extraction, padded up to register size.
 
 (define_mode_attr V_ext [(V8QI "SI") (V16QI "SI")
--- gcc/config/arm/neon.md.jj	2017-07-17 10:08:41.000000000 +0200
+++ gcc/config/arm/neon.md	2017-07-24 17:27:42.173917259 +0200
@@ -412,7 +412,7 @@ (define_expand "vec_set<mode>"
   DONE;
 })
 
-(define_insn "vec_extract<mode>"
+(define_insn "vec_extract<mode><V_elem_l>"
   [(set (match_operand:<V_elem> 0 "nonimmediate_operand" "=Um,r")
         (vec_select:<V_elem>
           (match_operand:VD_LANE 1 "s_register_operand" "w,w")
@@ -434,7 +434,7 @@ (define_insn "vec_extract<mode>"
   [(set_attr "type" "neon_store1_one_lane<q>,neon_to_gp<q>")]
 )
 
-(define_insn "vec_extract<mode>"
+(define_insn "vec_extract<mode><V_elem_l>"
   [(set (match_operand:<V_elem> 0 "nonimmediate_operand" "=Um,r")
 	(vec_select:<V_elem>
           (match_operand:VQ2 1 "s_register_operand" "w,w")
@@ -460,7 +460,7 @@ (define_insn "vec_extract<mode>"
   [(set_attr "type" "neon_store1_one_lane<q>,neon_to_gp<q>")]
 )
 
-(define_insn "vec_extractv2di"
+(define_insn "vec_extractv2didi"
   [(set (match_operand:DI 0 "nonimmediate_operand" "=Um,r")
 	(vec_select:DI
           (match_operand:V2DI 1 "s_register_operand" "w,w")
@@ -479,7 +479,7 @@ (define_insn "vec_extractv2di"
   [(set_attr "type" "neon_store1_one_lane_q,neon_to_gp_q")]
 )
 
-(define_expand "vec_init<mode>"
+(define_expand "vec_init<mode><V_elem_l>"
   [(match_operand:VDQ 0 "s_register_operand" "")
    (match_operand 1 "" "")]
   "TARGET_NEON"
@@ -1581,7 +1581,7 @@ (define_expand "reduc_plus_scal_<mode>"
   neon_pairwise_reduce (vec, operands[1], <MODE>mode,
 			&gen_neon_vpadd_internal<mode>);
   /* The same result is actually computed into every element.  */
-  emit_insn (gen_vec_extract<mode> (operands[0], vec, const0_rtx));
+  emit_insn (gen_vec_extract<mode><V_elem_l> (operands[0], vec, const0_rtx));
   DONE;
 })
 
@@ -1607,7 +1607,7 @@ (define_expand "reduc_plus_scal_v2di"
   rtx vec = gen_reg_rtx (V2DImode);
 
   emit_insn (gen_arm_reduc_plus_internal_v2di (vec, operands[1]));
-  emit_insn (gen_vec_extractv2di (operands[0], vec, const0_rtx));
+  emit_insn (gen_vec_extractv2didi (operands[0], vec, const0_rtx));
 
   DONE;
 })
@@ -1631,7 +1631,7 @@ (define_expand "reduc_smin_scal_<mode>"
   neon_pairwise_reduce (vec, operands[1], <MODE>mode,
 			&gen_neon_vpsmin<mode>);
   /* The result is computed into every element of the vector.  */
-  emit_insn (gen_vec_extract<mode> (operands[0], vec, const0_rtx));
+  emit_insn (gen_vec_extract<mode><V_elem_l> (operands[0], vec, const0_rtx));
   DONE;
 })
 
@@ -1658,7 +1658,7 @@ (define_expand "reduc_smax_scal_<mode>"
   neon_pairwise_reduce (vec, operands[1], <MODE>mode,
 			&gen_neon_vpsmax<mode>);
   /* The result is computed into every element of the vector.  */
-  emit_insn (gen_vec_extract<mode> (operands[0], vec, const0_rtx));
+  emit_insn (gen_vec_extract<mode><V_elem_l> (operands[0], vec, const0_rtx));
   DONE;
 })
 
@@ -1685,7 +1685,7 @@ (define_expand "reduc_umin_scal_<mode>"
   neon_pairwise_reduce (vec, operands[1], <MODE>mode,
 			&gen_neon_vpumin<mode>);
   /* The result is computed into every element of the vector.  */
-  emit_insn (gen_vec_extract<mode> (operands[0], vec, const0_rtx));
+  emit_insn (gen_vec_extract<mode><V_elem_l> (operands[0], vec, const0_rtx));
   DONE;
 })
 
@@ -1711,7 +1711,7 @@ (define_expand "reduc_umax_scal_<mode>"
   neon_pairwise_reduce (vec, operands[1], <MODE>mode,
 			&gen_neon_vpumax<mode>);
   /* The result is computed into every element of the vector.  */
-  emit_insn (gen_vec_extract<mode> (operands[0], vec, const0_rtx));
+  emit_insn (gen_vec_extract<mode><V_elem_l> (operands[0], vec, const0_rtx));
   DONE;
 })
 
@@ -3272,7 +3272,8 @@ (define_expand "neon_vget_lane<mode>"
     }
 
   if (GET_MODE_UNIT_BITSIZE (<MODE>mode) == 32)
-    emit_insn (gen_vec_extract<mode> (operands[0], operands[1], operands[2]));
+    emit_insn (gen_vec_extract<mode><V_elem_l> (operands[0], operands[1],
+						operands[2]));
   else
     emit_insn (gen_neon_vget_lane<mode>_sext_internal (operands[0],
 						       operands[1],
@@ -3301,7 +3302,8 @@ (define_expand "neon_vget_laneu<mode>"
     }
 
   if (GET_MODE_UNIT_BITSIZE (<MODE>mode) == 32)
-    emit_insn (gen_vec_extract<mode> (operands[0], operands[1], operands[2]));
+    emit_insn (gen_vec_extract<mode><V_elem_l> (operands[0], operands[1],
+						operands[2]));
   else
     emit_insn (gen_neon_vget_lane<mode>_zext_internal (operands[0],
 						       operands[1],
--- gcc/config/mips/mips-msa.md.jj	2017-03-31 20:36:09.000000000 +0200
+++ gcc/config/mips/mips-msa.md	2017-07-24 17:33:32.657689124 +0200
@@ -231,7 +231,7 @@ (define_mode_attr bitimm
    (V4SI  "uimm5")
    (V2DI  "uimm6")])
 
-(define_expand "vec_init<mode>"
+(define_expand "vec_init<mode><unitmode>"
   [(match_operand:MSA 0 "register_operand")
    (match_operand:MSA 1 "")]
   "ISA_HAS_MSA"
@@ -311,7 +311,7 @@ (define_expand "vec_unpacku_lo_<mode>"
   DONE;
 })
 
-(define_expand "vec_extract<mode>"
+(define_expand "vec_extract<mode><unitmode>"
   [(match_operand:<UNITMODE> 0 "register_operand")
    (match_operand:IMSA 1 "register_operand")
    (match_operand 2 "const_<indeximm>_operand")]
@@ -329,7 +329,7 @@ (define_expand "vec_extract<mode>"
   DONE;
 })
 
-(define_expand "vec_extract<mode>"
+(define_expand "vec_extract<mode><unitmode>"
   [(match_operand:<UNITMODE> 0 "register_operand")
    (match_operand:FMSA 1 "register_operand")
    (match_operand 2 "const_<indeximm>_operand")]
--- gcc/config/mips/loongson.md.jj	2017-01-01 12:45:40.000000000 +0100
+++ gcc/config/mips/loongson.md	2017-07-24 18:08:29.736433972 +0200
@@ -119,7 +119,7 @@ (define_insn "mov<mode>_internal"
 
 ;; Initialization of a vector.
 
-(define_expand "vec_init<mode>"
+(define_expand "vec_init<mode><unitmode>"
   [(set (match_operand:VWHB 0 "register_operand")
 	(match_operand 1 ""))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
--- gcc/config/mips/mips-ps-3d.md.jj	2017-01-01 12:45:40.000000000 +0100
+++ gcc/config/mips/mips-ps-3d.md	2017-07-24 17:34:13.540195876 +0200
@@ -254,7 +254,7 @@ (define_expand "mips_pll_ps"
 })
 
 ; vec_init
-(define_expand "vec_initv2sf"
+(define_expand "vec_initv2sfsf"
   [(match_operand:V2SF 0 "register_operand")
    (match_operand:V2SF 1 "")]
   "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
@@ -282,7 +282,7 @@ (define_insn "vec_concatv2sf"
 ;; emulated.  There is no other way to get a vector mode bitfield extract
 ;; currently.
 
-(define_insn "vec_extractv2sf"
+(define_insn "vec_extractv2sfsf"
   [(set (match_operand:SF 0 "register_operand" "=f")
 	(vec_select:SF (match_operand:V2SF 1 "register_operand" "f")
 		       (parallel
@@ -379,7 +379,7 @@ (define_expand "reduc_plus_scal_v2sf"
     rtx temp = gen_reg_rtx (V2SFmode);
     emit_insn (gen_mips_addr_ps (temp, operands[1], operands[1]));
     rtx lane = BYTES_BIG_ENDIAN ? const1_rtx : const0_rtx;
-    emit_insn (gen_vec_extractv2sf (operands[0], temp, lane));
+    emit_insn (gen_vec_extractv2sfsf (operands[0], temp, lane));
     DONE;
   })
 
@@ -757,7 +757,7 @@ (define_expand "reduc_smin_scal_v2sf"
   rtx temp = gen_reg_rtx (V2SFmode);
   mips_expand_vec_reduc (temp, operands[1], gen_sminv2sf3);
   rtx lane = BYTES_BIG_ENDIAN ? const1_rtx : const0_rtx;
-  emit_insn (gen_vec_extractv2sf (operands[0], temp, lane));
+  emit_insn (gen_vec_extractv2sfsf (operands[0], temp, lane));
   DONE;
 })
 
@@ -769,6 +769,6 @@ (define_expand "reduc_smax_scal_v2sf"
   rtx temp = gen_reg_rtx (V2SFmode);
   mips_expand_vec_reduc (temp, operands[1], gen_smaxv2sf3);
   rtx lane = BYTES_BIG_ENDIAN ? const1_rtx : const0_rtx;
-  emit_insn (gen_vec_extractv2sf (operands[0], temp, lane));
+  emit_insn (gen_vec_extractv2sfsf (operands[0], temp, lane));
   DONE;
 })
--- gcc/config/mips/mips.md.jj	2017-06-15 11:03:32.000000000 +0200
+++ gcc/config/mips/mips.md	2017-07-24 19:00:15.519582707 +0200
@@ -917,6 +917,11 @@ (define_mode_attr UNITMODE [(SF "SF") (D
 			    (V16QI "QI") (V8HI "HI") (V4SI "SI") (V2DI "DI")
 			    (V2DF "DF")])
 
+;; As above, but in lower case.
+(define_mode_attr unitmode [(SF "sf") (DF "df") (V2SF "sf") (V4SF "sf")
+			    (V16QI "qi") (V8QI "qi") (V8HI "hi") (V4HI "hi")
+			    (V4SI "si") (V2SI "si") (V2DI "di") (V2DF "df")])
+
 ;; This attribute gives the integer mode that has the same size as a
 ;; fixed-point mode.
 (define_mode_attr IMODE [(QQ "QI") (HQ "HI") (SQ "SI") (DQ "DI")
--- gcc/config/spu/spu.c.jj	2017-07-17 10:08:39.000000000 +0200
+++ gcc/config/spu/spu.c	2017-07-24 18:06:01.693214125 +0200
@@ -1773,7 +1773,7 @@ spu_expand_prologue (void)
 	      size_v4si = scratch_v4si;
 	    }
 	  emit_insn (gen_cgt_v4si (scratch_v4si, sp_v4si, size_v4si));
-	  emit_insn (gen_vec_extractv4si
+	  emit_insn (gen_vec_extractv4sisi
 		     (scratch_reg_0, scratch_v4si, GEN_INT (1)));
 	  emit_insn (gen_spu_heq (scratch_reg_0, GEN_INT (0)));
 	}
@@ -5368,7 +5368,7 @@ spu_allocate_stack (rtx op0, rtx op1)
     {
       rtx avail = gen_reg_rtx(SImode);
       rtx result = gen_reg_rtx(SImode);
-      emit_insn (gen_vec_extractv4si (avail, sp, GEN_INT (1)));
+      emit_insn (gen_vec_extractv4sisi (avail, sp, GEN_INT (1)));
       emit_insn (gen_cgt_si(result, avail, GEN_INT (-1)));
       emit_insn (gen_spu_heq (result, GEN_INT(0) ));
     }
@@ -5684,22 +5684,22 @@ spu_builtin_extract (rtx ops[])
       switch (mode)
 	{
 	case V16QImode:
-	  emit_insn (gen_vec_extractv16qi (ops[0], ops[1], ops[2]));
+	  emit_insn (gen_vec_extractv16qiqi (ops[0], ops[1], ops[2]));
 	  break;
 	case V8HImode:
-	  emit_insn (gen_vec_extractv8hi (ops[0], ops[1], ops[2]));
+	  emit_insn (gen_vec_extractv8hihi (ops[0], ops[1], ops[2]));
 	  break;
 	case V4SFmode:
-	  emit_insn (gen_vec_extractv4sf (ops[0], ops[1], ops[2]));
+	  emit_insn (gen_vec_extractv4sfsf (ops[0], ops[1], ops[2]));
 	  break;
 	case V4SImode:
-	  emit_insn (gen_vec_extractv4si (ops[0], ops[1], ops[2]));
+	  emit_insn (gen_vec_extractv4sisi (ops[0], ops[1], ops[2]));
 	  break;
 	case V2DImode:
-	  emit_insn (gen_vec_extractv2di (ops[0], ops[1], ops[2]));
+	  emit_insn (gen_vec_extractv2didi (ops[0], ops[1], ops[2]));
 	  break;
 	case V2DFmode:
-	  emit_insn (gen_vec_extractv2df (ops[0], ops[1], ops[2]));
+	  emit_insn (gen_vec_extractv2dfdf (ops[0], ops[1], ops[2]));
 	  break;
 	default:
 	  abort ();
--- gcc/config/spu/spu.md.jj	2017-01-01 12:45:40.000000000 +0100
+++ gcc/config/spu/spu.md	2017-07-24 18:05:05.591888718 +0200
@@ -256,6 +256,13 @@ (define_mode_attr inner  [(V16QI "QI")
 			  (V2DI  "DI")
 			  (V4SF  "SF")
 			  (V2DF  "DF")])
+;; Like above, but in lower case
+(define_mode_attr inner_l [(V16QI "qi")
+			   (V8HI  "hi")
+			   (V4SI  "si")
+			   (V2DI  "di")
+			   (V4SF  "sf")
+			   (V2DF  "df")])
 (define_mode_attr vmult  [(V16QI "1")
 			  (V8HI  "2")
 			  (V4SI  "4")
@@ -4318,7 +4325,7 @@ (define_expand "restore_stack_nonlocal"
 ;; vector patterns
 
 ;; Vector initialization
-(define_expand "vec_init<mode>"
+(define_expand "vec_init<mode><inner_l>"
   [(match_operand:V 0 "register_operand" "")
    (match_operand 1 "" "")]
   ""
@@ -4347,7 +4354,7 @@ (define_expand "vec_set<mode>"
     operands[6] = GEN_INT (size);
   })
 
-(define_expand "vec_extract<mode>"
+(define_expand "vec_extract<mode><inner_l>"
   [(set (match_operand:<inner> 0 "spu_reg_operand" "=r")
 	(vec_select:<inner> (match_operand:V 1 "spu_reg_operand" "r")
 			    (parallel [(match_operand 2 "const_int_operand" "i")])))]
--- gcc/config/sparc/sparc.md.jj	2017-07-17 10:08:39.000000000 +0200
+++ gcc/config/sparc/sparc.md	2017-07-24 18:11:52.396997069 +0200
@@ -8621,6 +8621,8 @@ (define_mode_attr vconstr [(V1SI "f") (V
 (define_mode_attr vfptype [(V1SI "single") (V2HI "single") (V4QI "single")
 			   (V1DI "double") (V2SI "double") (V4HI "double")
 			   (V8QI "double")])
+(define_mode_attr veltmode [(V1SI "si") (V2HI "hi") (V4QI "qi") (V1DI "di")
+			    (V2SI "si") (V4HI "hi") (V8QI "qi")])
 
 (define_expand "mov<VMALL:mode>"
   [(set (match_operand:VMALL 0 "nonimmediate_operand" "")
@@ -8762,7 +8764,7 @@ (define_split
   DONE;
 })
 
-(define_expand "vec_init<VMALL:mode>"
+(define_expand "vec_init<VMALL:mode><VMALL:veltmode>"
   [(match_operand:VMALL 0 "register_operand" "")
    (match_operand:VMALL 1 "" "")]
   "TARGET_VIS"
--- gcc/config/ia64/vect.md.jj	2017-01-01 12:45:42.000000000 +0100
+++ gcc/config/ia64/vect.md	2017-07-24 17:29:28.996628899 +0200
@@ -1015,7 +1015,7 @@ (define_insn "*vec_interleave_highv2si"
 }
   [(set_attr "itanium_class" "mmshf")])
 
-(define_expand "vec_initv2si"
+(define_expand "vec_initv2sisi"
   [(match_operand:V2SI 0 "gr_register_operand" "")
    (match_operand 1 "" "")]
   ""
@@ -1299,7 +1299,7 @@ (define_insn "*fselect"
   "fselect %0 = %F2, %F3, %1"
   [(set_attr "itanium_class" "fmisc")])
 
-(define_expand "vec_initv2sf"
+(define_expand "vec_initv2sfsf"
   [(match_operand:V2SF 0 "fr_register_operand" "")
    (match_operand 1 "" "")]
   ""
@@ -1483,7 +1483,7 @@ (define_insn_and_split "*vec_extractv2sf
   operands[1] = gen_rtx_REG (SFmode, REGNO (operands[1]));
 })
 
-(define_expand "vec_extractv2sf"
+(define_expand "vec_extractv2sfsf"
   [(set (match_operand:SF 0 "register_operand" "")
 	(unspec:SF [(match_operand:V2SF 1 "register_operand" "")
 		    (match_operand:DI 2 "const_int_operand" "")]
--- gcc/config/powerpcspe/vector.md.jj	2017-05-25 10:37:03.000000000 +0200
+++ gcc/config/powerpcspe/vector.md	2017-07-24 17:41:21.897027743 +0200
@@ -74,6 +74,16 @@ (define_mode_attr VEC_base [(V16QI "QI")
 			    (V1TI  "TI")
 			    (TI    "TI")])
 
+;; As above, but in lower case
+(define_mode_attr VEC_base_l [(V16QI "qi")
+			      (V8HI  "hi")
+			      (V4SI  "si")
+			      (V2DI  "di")
+			      (V4SF  "sf")
+			      (V2DF  "df")
+			      (V1TI  "ti")
+			      (TI    "ti")])
+
 ;; Same size integer type for floating point data
 (define_mode_attr VEC_int [(V4SF  "v4si")
 			   (V2DF  "v2di")])
@@ -1017,7 +1027,7 @@ (define_expand "fixuns_trunc<mode><VEC_i
 
 \f
 ;; Vector initialization, set, extract
-(define_expand "vec_init<mode>"
+(define_expand "vec_init<mode><VEC_base_l>"
   [(match_operand:VEC_E 0 "vlogical_operand" "")
    (match_operand:VEC_E 1 "" "")]
   "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)"
@@ -1036,7 +1046,7 @@ (define_expand "vec_set<mode>"
   DONE;
 })
 
-(define_expand "vec_extract<mode>"
+(define_expand "vec_extract<mode><VEC_base_l>"
   [(match_operand:<VEC_base> 0 "register_operand" "")
    (match_operand:VEC_E 1 "vlogical_operand" "")
    (match_operand 2 "const_int_operand" "")]
--- gcc/config/powerpcspe/paired.md.jj	2017-05-25 10:37:04.000000000 +0200
+++ gcc/config/powerpcspe/paired.md	2017-07-24 17:42:17.980351097 +0200
@@ -377,7 +377,7 @@ (define_insn "paired_muls1"
   "ps_muls1 %0, %1, %2"
   [(set_attr "type" "fp")])
 
-(define_expand "vec_initv2sf"
+(define_expand "vec_initv2sfsf"
   [(match_operand:V2SF 0 "gpc_reg_operand" "=f")
    (match_operand 1 "" "")]
   "TARGET_PAIRED_FLOAT"
--- gcc/config/powerpcspe/altivec.md.jj	2017-05-25 10:37:05.000000000 +0200
+++ gcc/config/powerpcspe/altivec.md	2017-07-24 17:42:49.897966010 +0200
@@ -301,7 +301,7 @@ (define_split
   for (i = 0; i < num_elements; i++)
     RTVEC_ELT (v, i) = constm1_rtx;
 
-  emit_insn (gen_vec_initv4si (dest, gen_rtx_PARALLEL (mode, v)));
+  emit_insn (gen_vec_initv4sisi (dest, gen_rtx_PARALLEL (mode, v)));
   emit_insn (gen_rtx_SET (dest, gen_rtx_ASHIFT (mode, dest, dest)));
   DONE;
 })
@@ -2222,7 +2222,7 @@ (define_expand "altivec_copysign_v4sf3"
   RTVEC_ELT (v, 2) = GEN_INT (mask_val);
   RTVEC_ELT (v, 3) = GEN_INT (mask_val);
 
-  emit_insn (gen_vec_initv4si (mask, gen_rtx_PARALLEL (V4SImode, v)));
+  emit_insn (gen_vec_initv4sisi (mask, gen_rtx_PARALLEL (V4SImode, v)));
   emit_insn (gen_vector_select_v4sf (operands[0], operands[1], operands[2],
 				     gen_lowpart (V4SFmode, mask)));
   DONE;
@@ -3014,7 +3014,7 @@ (define_expand "vec_unpacku_hi_v16qi"
   RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 :  0);
   RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ?  7 : 16);
 
-  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
+  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
   emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
   DONE;
 }")
@@ -3050,7 +3050,7 @@ (define_expand "vec_unpacku_hi_v8hi"
   RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ?  6 : 17);
   RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ?  7 : 16);
 
-  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
+  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
   emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
   DONE;
 }")
@@ -3086,7 +3086,7 @@ (define_expand "vec_unpacku_lo_v16qi"
   RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 :  8);
   RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
 
-  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
+  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
   emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
   DONE;
 }")
@@ -3122,7 +3122,7 @@ (define_expand "vec_unpacku_lo_v8hi"
   RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 14 : 17);
   RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
 
-  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
+  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
   emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
   DONE;
 }")
@@ -3363,7 +3363,7 @@ (define_expand "mulv16qi3"
      = gen_rtx_CONST_INT (QImode, BYTES_BIG_ENDIAN ? 2 * i + 17 : 15 - 2 * i);
   }
 
-  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
+  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
   emit_insn (gen_altivec_vmulesb (even, operands[1], operands[2]));
   emit_insn (gen_altivec_vmulosb (odd, operands[1], operands[2]));
   emit_insn (gen_altivec_vperm_v8hiv16qi (operands[0], even, odd, mask));

	Jakub

* Re: [PATCH] Switch vec_init and vec_extract optabs to 2 mode optab to allow extraction of vector from vector or initialization of vector from smaller vectors (PR target/80846)
  2017-07-25  9:14 [PATCH] Switch vec_init and vec_extract optabs to 2 mode optab to allow extraction of vector from vector or initialization of vector from smaller vectors (PR target/80846) Jakub Jelinek
@ 2017-07-25 21:12 ` Segher Boessenkool
  2017-07-26  7:09   ` Jakub Jelinek
  2017-07-25 21:45 ` Matthew Fortune
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: Segher Boessenkool @ 2017-07-25 21:12 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Richard Biener, Uros Bizjak, David Edelsohn, Marcus Shawcroft,
	Richard Earnshaw, Andreas Krebbel, Matthew Fortune,
	Eric Botcazou, Andrew Jenner, gcc-patches

Hi Jakub,

On Tue, Jul 25, 2017 at 11:14:32AM +0200, Jakub Jelinek wrote:
> The following patch adjusts the vec_init and vec_extract optabs, so that
> they don't have in the expander names just the vector mode, but also another
> mode, for vec_extract the mode of the result and for vec_init the mode of
> the elts of the vector passed as second operand.
> 
> Without this patch, the second mode has been implicit, GET_MODE_INNER of
> the vector mode, so one could just extract a single element from a vector
> or construct vector from elements.  While that is most common, we allow
> in GIMPLE e.g. construction of V8DImode from 4 V2DImode elements etc.
> and the vectorizer uses them.  By having the second mode in the name
> it allows the generic code (vectorizer, expansion) to query whether the
> backend supports such vector from vector expansions or inits from vector
> elts and use them if available.
> 
> For vec_extract, if we say want to extract high V2SImode from V4SImode
> the fallback is try to expand it as DImode extraction from V2DImode.
> This works well in many cases, but doesn't really work for very large
> vectors, say if we want to extract high V8SImode from V16SImode on x86,
> we'd need OImode extraction from V2OImode, which is something the backend
> doesn't have any support for.
> For vec_init, the fallback is usually to go through memory, which is slow in
> many cases.
> 
> This patch only adds new vector from vector extract and init patterns to
> the i386 backend, but I had to change many other targets too, because
> it needs to have the element mode in the vec_extract/vec_init expander
> names.  Seems most of the backends didn't really have a mode attribute
> usable for this or had it only in uppercase, while for the names we need
> lowercase.  Some backends had a convention on how to name lower case
> vs. upper case modes, others didn't have any.  So I'm CCing maintainers
> of affected backends to seek advice on what mode attributes they want to
> use.

Would it be possible (and useful) to _also_ keep the old names?  Or do
you expect all targets will want to support all combinations?

> --- gcc/config/rs6000/vector.md.jj	2017-06-08 20:50:49.000000000 +0200
> +++ gcc/config/rs6000/vector.md	2017-07-24 17:44:44.699580927 +0200
> @@ -74,6 +74,16 @@ (define_mode_attr VEC_base [(V16QI "QI")
>  			    (V1TI  "TI")
>  			    (TI    "TI")])
>  
> +;; As above, but in lower case
> +(define_mode_attr VEC_base_l [(V16QI "qi")
> +			      (V8HI  "hi")
> +			      (V4SI  "si")
> +			      (V2DI  "di")
> +			      (V4SF  "sf")
> +			      (V2DF  "df")
> +			      (V1TI  "ti")
> +			      (TI    "ti")])
> +
>  ;; Same size integer type for floating point data
>  (define_mode_attr VEC_int [(V4SF  "v4si")
>  			   (V2DF  "v2di")])

> @@ -520,6 +520,17 @@ (define_mode_attr VEL [(V8QI "QI") (V16Q
>  			(SI   "SI") (HI   "HI")
>  			(QI   "QI")])
>  
> +;; Define element mode for each vector mode (lower case).
> +(define_mode_attr Vel [(V8QI "qi") (V16QI "qi")
> +			(V4HI "hi") (V8HI "hi")
> +			(V2SI "si") (V4SI "si")
> +			(DI "di")   (V2DI "di")
> +			(V4HF "hf") (V8HF "hf")
> +			(V2SF "sf") (V4SF "sf")
> +			(V2DF "df") (DF "df")
> +			(SI   "si") (HI   "hi")
> +			(QI   "qi")])

(Inconsistent spacing, please fix).

("vel" instead of "Vel" for this name?)

How is this different from VEC_base_l?  They can just be merged it seems.
(And for that matter, VEC_base and VEL as well).  We'll handle that I
suppose, don't let it hold up this patch :-)


Segher

* Re: [PATCH] Switch vec_init and vec_extract optabs to 2 mode optab to allow extraction of vector from vector or initialization of vector from smaller vectors (PR target/80846)
  2017-07-25 21:12 ` Segher Boessenkool
@ 2017-07-26  7:09   ` Jakub Jelinek
  2017-07-26  7:29     ` Richard Biener
  2017-07-26 11:41     ` Segher Boessenkool
  0 siblings, 2 replies; 15+ messages in thread
From: Jakub Jelinek @ 2017-07-26  7:09 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Richard Biener, Uros Bizjak, David Edelsohn, Marcus Shawcroft,
	Richard Earnshaw, Andreas Krebbel, Matthew Fortune,
	Eric Botcazou, Andrew Jenner, gcc-patches

On Tue, Jul 25, 2017 at 03:52:56PM -0500, Segher Boessenkool wrote:
> On Tue, Jul 25, 2017 at 11:14:32AM +0200, Jakub Jelinek wrote:
> > This patch only adds new vector from vector extract and init patterns to
> > the i386 backend, but I had to change many other targets too, because
> > it needs to have the element mode in the vec_extract/vec_init expander
> > names.  Seems most of the backends didn't really have a mode attribute
> > usable for this or had it only in uppercase, while for the names we need
> > lowercase.  Some backends had a convention on how to name lower case
> > vs. upper case modes, others didn't have any.  So I'm CCing maintainers
> > of affected backends to seek advice on what mode attributes they want to
> > use.
> 
> Would it be possible (and useful) to _also_ keep the old names?  Or do
> you expect all targets will want to support all combinations?

When we discussed it on IRC, Richi's preference was to use only a single
conversion optab instead of old + new.  Of course it would be far less
work for me to support, say:
OPTAB_CD(vec_extract2_optab, "vec_extract$a$b")
OPTAB_CD(vec_init2_optab, "vec_init$a$b")
OPTAB_D (vec_extract_optab, "vec_extract$a")
OPTAB_D (vec_init_optab, "vec_init$a")
where the single mode vec_extract/vec_init would be
extraction/initialization from element mode and the two mode one would be
used for 2 vector modes.  If there is agreement on that, most of the
config/*/* changes could go away.

> > --- gcc/config/rs6000/vector.md.jj	2017-06-08 20:50:49.000000000 +0200
> > +++ gcc/config/rs6000/vector.md	2017-07-24 17:44:44.699580927 +0200
> > @@ -74,6 +74,16 @@ (define_mode_attr VEC_base [(V16QI "QI")
> >  			    (V1TI  "TI")
> >  			    (TI    "TI")])
> >  
> > +;; As above, but in lower case
> > +(define_mode_attr VEC_base_l [(V16QI "qi")
> > +			      (V8HI  "hi")
> > +			      (V4SI  "si")
> > +			      (V2DI  "di")
> > +			      (V4SF  "sf")
> > +			      (V2DF  "df")
> > +			      (V1TI  "ti")
> > +			      (TI    "ti")])
> > +
> >  ;; Same size integer type for floating point data
> >  (define_mode_attr VEC_int [(V4SF  "v4si")
> >  			   (V2DF  "v2di")])
> 
> > @@ -520,6 +520,17 @@ (define_mode_attr VEL [(V8QI "QI") (V16Q
> >  			(SI   "SI") (HI   "HI")
> >  			(QI   "QI")])
> >  
> > +;; Define element mode for each vector mode (lower case).
> > +(define_mode_attr Vel [(V8QI "qi") (V16QI "qi")
> > +			(V4HI "hi") (V8HI "hi")
> > +			(V2SI "si") (V4SI "si")
> > +			(DI "di")   (V2DI "di")
> > +			(V4HF "hf") (V8HF "hf")
> > +			(V2SF "sf") (V4SF "sf")
> > +			(V2DF "df") (DF "df")
> > +			(SI   "si") (HI   "hi")
> > +			(QI   "qi")])
> 
> (Inconsistent spacing, please fix).

It is the same spacing as in VEL right above it; I've tried to follow
whatever weirdo formatting each backend had.

> ("vel" instead of "Vel" for this name?)

That is to follow the aarch64 iterator naming convention, where they
already have e.g. VDBL for
;; Double modes of vector modes.
and Vdbl for:
;; Double modes of vector modes (lower case).
or similarly VHALF vs Vhalf.

> How is this different from VEC_base_l?  They can just be merged it seems.

They can't be merged: each backend has its own iterators and iterator naming
conventions, and we don't really have any gcc/iterators.md that would be used
for all backends.  VEC_base_l is just rs6000 (and powerpcspe), using
rs6000 iterator naming conventions, Vel is aarch64 using aarch64 naming
conventions, V_elem_l is arm using arm iterator naming conventions etc.

> (And for that matter, VEC_base and VEL as well).  We'll handle that I
> suppose, don't let it hold up this patch :-)

	Jakub

* Re: [PATCH] Switch vec_init and vec_extract optabs to 2 mode optab to allow extraction of vector from vector or initialization of vector from smaller vectors (PR target/80846)
  2017-07-26  7:09   ` Jakub Jelinek
@ 2017-07-26  7:29     ` Richard Biener
  2017-07-26 11:41     ` Segher Boessenkool
  1 sibling, 0 replies; 15+ messages in thread
From: Richard Biener @ 2017-07-26  7:29 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Segher Boessenkool, Uros Bizjak, David Edelsohn,
	Marcus Shawcroft, Richard Earnshaw, Andreas Krebbel,
	Matthew Fortune, Eric Botcazou, Andrew Jenner, gcc-patches

On Wed, 26 Jul 2017, Jakub Jelinek wrote:

> On Tue, Jul 25, 2017 at 03:52:56PM -0500, Segher Boessenkool wrote:
> > On Tue, Jul 25, 2017 at 11:14:32AM +0200, Jakub Jelinek wrote:
> > > This patch only adds new vector from vector extract and init patterns to
> > > the i386 backend, but I had to change many other targets too, because
> > > it needs to have the element mode in the vec_extract/vec_init expander
> > > names.  Seems most of the backends didn't really have a mode attribute
> > > usable for this or had it only in uppercase, while for the names we need
> > > lowercase.  Some backends had a convention on how to name lower case
> > > vs. upper case modes, others didn't have any.  So I'm CCing maintainers
> > > of affected backends to seek advice on what mode attributes they want to
> > > use.
> > 
> > Would it be possible (and useful) to _also_ keep the old names?  Or do
> > you expect all targets will want to support all combinations?
> 
> Richi's preference was to use only a single conversion optab instead of
> old + new when we've discussed it on IRC.  Of course it would be far less

Yep.  To me there's no advantage to having two variants besides avoiding
the mechanical change to existing backends adding the matching
component mode (plus a convenient iterator for that -- or maybe
gen* support for "component of mode").

> work for me to support say:
> OPTAB_CD(vec_extract2_optab, "vec_extract$a$b")
> OPTAB_CD(vec_init2_optab, "vec_init$a$b")
> OPTAB_D (vec_extract_optab, "vec_extract$a")
> OPTAB_D (vec_init_optab, "vec_init$a")
> where the single mode vec_extract/vec_init would be
> extraction/initialization from element mode and the two mode one would be
> used for 2 vector modes.  If there is agreement on that, most of the
> config/*/* changes could go away.
> 
> > > --- gcc/config/rs6000/vector.md.jj	2017-06-08 20:50:49.000000000 +0200
> > > +++ gcc/config/rs6000/vector.md	2017-07-24 17:44:44.699580927 +0200
> > > @@ -74,6 +74,16 @@ (define_mode_attr VEC_base [(V16QI "QI")
> > >  			    (V1TI  "TI")
> > >  			    (TI    "TI")])
> > >  
> > > +;; As above, but in lower case
> > > +(define_mode_attr VEC_base_l [(V16QI "qi")
> > > +			      (V8HI  "hi")
> > > +			      (V4SI  "si")
> > > +			      (V2DI  "di")
> > > +			      (V4SF  "sf")
> > > +			      (V2DF  "df")
> > > +			      (V1TI  "ti")
> > > +			      (TI    "ti")])
> > > +
> > >  ;; Same size integer type for floating point data
> > >  (define_mode_attr VEC_int [(V4SF  "v4si")
> > >  			   (V2DF  "v2di")])
> > 
> > > @@ -520,6 +520,17 @@ (define_mode_attr VEL [(V8QI "QI") (V16Q
> > >  			(SI   "SI") (HI   "HI")
> > >  			(QI   "QI")])
> > >  
> > > +;; Define element mode for each vector mode (lower case).
> > > +(define_mode_attr Vel [(V8QI "qi") (V16QI "qi")
> > > +			(V4HI "hi") (V8HI "hi")
> > > +			(V2SI "si") (V4SI "si")
> > > +			(DI "di")   (V2DI "di")
> > > +			(V4HF "hf") (V8HF "hf")
> > > +			(V2SF "sf") (V4SF "sf")
> > > +			(V2DF "df") (DF "df")
> > > +			(SI   "si") (HI   "hi")
> > > +			(QI   "qi")])
> > 
> > (Inconsistent spacing, please fix).
> 
> It is the same spacing as in VEL right above it, I've tried to follow
> whatever weirdo formatting each backend had.
> 
> > ("vel" instead of "Vel" for this name?)
> 
> That is to follow aarch64 iterator naming convention, where they have
> already preexisting e.g. VDBL for
> ;; Double modes of vector modes.
> and Vdbl for:
> ;; Double modes of vector modes (lower case).
> or similarly VHALF vs Vhalf.
> 
> > How is this different from VEC_base_l?  They can just be merged it seems.
> 
> They can't be merged, each backend has its own iterators and iterator naming
> conventions, we don't really have any gcc/iterators.md that would be used
> for all backends.  VEC_base_l is just rs6000 (and powerpcspe), using
> rs6000 iterator naming conventions, Vel is aarch64 using aarch64 naming
> conventions, V_elem_l is arm using arm iterator naming conventions etc.
> 
> > (And for that matter, VEC_base and VEL as well).  We'll handle that I
> > suppose, don't let it hold up this patch :-)
> 
> 	Jakub
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

* Re: [PATCH] Switch vec_init and vec_extract optabs to 2 mode optab to allow extraction of vector from vector or initialization of vector from smaller vectors (PR target/80846)
  2017-07-26  7:09   ` Jakub Jelinek
  2017-07-26  7:29     ` Richard Biener
@ 2017-07-26 11:41     ` Segher Boessenkool
  2017-08-01 16:21       ` Jakub Jelinek
  1 sibling, 1 reply; 15+ messages in thread
From: Segher Boessenkool @ 2017-07-26 11:41 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Richard Biener, Uros Bizjak, David Edelsohn, Marcus Shawcroft,
	Richard Earnshaw, Andreas Krebbel, Matthew Fortune,
	Eric Botcazou, Andrew Jenner, gcc-patches

On Wed, Jul 26, 2017 at 09:09:04AM +0200, Jakub Jelinek wrote:
> On Tue, Jul 25, 2017 at 03:52:56PM -0500, Segher Boessenkool wrote:
> > On Tue, Jul 25, 2017 at 11:14:32AM +0200, Jakub Jelinek wrote:
> > > This patch only adds new vector from vector extract and init patterns to
> > > the i386 backend, but I had to change many other targets too, because
> > > it needs to have the element mode in the vec_extract/vec_init expander
> > > names.  Seems most of the backends didn't really have a mode attribute
> > > usable for this or had it only in uppercase, while for the names we need
> > > lowercase.  Some backends had a convention on how to name lower case
> > > vs. upper case modes, others didn't have any.  So I'm CCing maintainers
> > > of affected backends to seek advice on what mode attributes they want to
> > > use.
> > 
> > Would it be possible (and useful) to _also_ keep the old names?  Or do
> > you expect all targets will want to support all combinations?
> 
> Richi's preference was to use only a single conversion optab instead of
> old + new when we've discussed it on IRC.  Of course it would be far less
> work for me to support say:
> OPTAB_CD(vec_extract2_optab, "vec_extract$a$b")
> OPTAB_CD(vec_init2_optab, "vec_init$a$b")
> OPTAB_D (vec_extract_optab, "vec_extract$a")
> OPTAB_D (vec_init_optab, "vec_init$a")
> where the single mode vec_extract/vec_init would be
> extraction/initialization from element mode and the two mode one would be
> used for 2 vector modes.  If there is agreement on that, most of the
> config/*/* changes could go away.

And less work for backends, old as well as new.

> > > @@ -520,6 +520,17 @@ (define_mode_attr VEL [(V8QI "QI") (V16Q
> > >  			(SI   "SI") (HI   "HI")
> > >  			(QI   "QI")])
> > >  
> > > +;; Define element mode for each vector mode (lower case).
> > > +(define_mode_attr Vel [(V8QI "qi") (V16QI "qi")
> > > +			(V4HI "hi") (V8HI "hi")
> > > +			(V2SI "si") (V4SI "si")
> > > +			(DI "di")   (V2DI "di")
> > > +			(V4HF "hf") (V8HF "hf")
> > > +			(V2SF "sf") (V4SF "sf")
> > > +			(V2DF "df") (DF "df")
> > > +			(SI   "si") (HI   "hi")
> > > +			(QI   "qi")])
> > 
> > (Inconsistent spacing, please fix).
> 
> It is the same spacing as in VEL right above it, I've tried to follow
> whatever weirdo formatting each backend had.
> 
> > ("vel" instead of "Vel" for this name?)
> 
> That is to follow aarch64 iterator naming convention, where they have

Ugh, for some reason I thought this was in rs6000/ as well.  I have
fresh coffee now.  Sorry.


Segher

* Re: [PATCH] Switch vec_init and vec_extract optabs to 2 mode optab to allow extraction of vector from vector or initialization of vector from smaller vectors (PR target/80846)
  2017-07-26 11:41     ` Segher Boessenkool
@ 2017-08-01 16:21       ` Jakub Jelinek
  2017-08-01 23:57         ` Segher Boessenkool
  0 siblings, 1 reply; 15+ messages in thread
From: Jakub Jelinek @ 2017-08-01 16:21 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Richard Biener, Uros Bizjak, David Edelsohn, Marcus Shawcroft,
	Richard Earnshaw, Andreas Krebbel, Matthew Fortune,
	Eric Botcazou, Andrew Jenner, gcc-patches

On Wed, Jul 26, 2017 at 06:41:23AM -0500, Segher Boessenkool wrote:
> > That is to follow aarch64 iterator naming convention, where they have
> 
> Ugh, for some reason I thought this was in rs6000/ as well.  I have
> fresh coffee now.  Sorry.

Apparently I broke power bootstrap with this, because two new spots were
introduced after I wrote the patch, and my cross-compiler, which didn't have
HAVE_AS_POWER9 defined, didn't reveal that.  Fixed thusly, committed as
obvious to trunk:

2017-08-01  Jakub Jelinek  <jakub@redhat.com>

	PR target/80846
	* config/rs6000/vsx.md (vextract_fp_from_shorth,
	vextract_fp_from_shortl): Add element mode after mode in gen_vec_init*
	calls.

--- gcc/config/rs6000/vsx.md.jj	2017-07-28 09:10:49.000000000 +0200
+++ gcc/config/rs6000/vsx.md	2017-08-01 18:04:50.000000000 +0200
@@ -4523,7 +4523,7 @@ (define_expand "vextract_fp_from_shorth"
      inputs in half words 1,3,5,7 (IBM numbering).  Use xxperm to move
      src half words 0,1,2,3 for the conversion instruction.  */
   v = gen_rtvec_v (16, rvals);
-  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
+  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
   emit_insn (gen_altivec_vperm_v8hiv16qi (tmp, operands[1],
 					  operands[1], mask));
   emit_insn (gen_vsx_xvcvhpsp (operands[0], tmp));
@@ -4552,7 +4552,7 @@ (define_expand "vextract_fp_from_shortl"
      inputs in half words 1,3,5,7 (IBM numbering).  Use xxperm to move
      src half words 4,5,6,7 for the conversion instruction.  */
   v = gen_rtvec_v (16, rvals);
-  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
+  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
   emit_insn (gen_altivec_vperm_v8hiv16qi (tmp, operands[1],
 					  operands[1], mask));
   emit_insn (gen_vsx_xvcvhpsp (operands[0], tmp));


	Jakub

* Re: [PATCH] Switch vec_init and vec_extract optabs to 2 mode optab to allow extraction of vector from vector or initialization of vector from smaller vectors (PR target/80846)
  2017-08-01 16:21       ` Jakub Jelinek
@ 2017-08-01 23:57         ` Segher Boessenkool
  0 siblings, 0 replies; 15+ messages in thread
From: Segher Boessenkool @ 2017-08-01 23:57 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Richard Biener, Uros Bizjak, David Edelsohn, Marcus Shawcroft,
	Richard Earnshaw, Andreas Krebbel, Matthew Fortune,
	Eric Botcazou, Andrew Jenner, gcc-patches

On Tue, Aug 01, 2017 at 06:21:34PM +0200, Jakub Jelinek wrote:
> Apparently I broke power bootstrap with this, because two new spots were
> introduced after I wrote the patch and my cross-compiler which didn't have
> HAVE_AS_POWER9 defined didn't reveal that.  Fixed thusly, committed as
> obvious to trunk:

Thanks!


Segher

* RE: [PATCH] Switch vec_init and vec_extract optabs to 2 mode optab to allow extraction of vector from vector or initialization of vector from smaller vectors (PR target/80846)
  2017-07-25  9:14 [PATCH] Switch vec_init and vec_extract optabs to 2 mode optab to allow extraction of vector from vector or initialization of vector from smaller vectors (PR target/80846) Jakub Jelinek
  2017-07-25 21:12 ` Segher Boessenkool
@ 2017-07-25 21:45 ` Matthew Fortune
  2017-07-26  7:25   ` Richard Biener
  2017-07-26  7:34 ` Eric Botcazou
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: Matthew Fortune @ 2017-07-25 21:45 UTC (permalink / raw)
  To: Jakub Jelinek, Richard Biener, Uros Bizjak, David Edelsohn,
	Segher Boessenkool, Marcus Shawcroft, Richard Earnshaw,
	Andreas Krebbel, Eric Botcazou, Andrew Jenner
  Cc: gcc-patches, Sameera Deshpande

Jakub Jelinek <jakub@redhat.com> writes:
> Bootstrapped/regtested on x86_64-linux and i686-linux, where it improves
> e.g. the code generation for slp-43.c and slp-45.c testcases.
> make cc1 tested in cross-compilers to the remaining targets.

No objections for the MIPS part. I've pointed out this change to Sameera to
see how/if it will affect her autovectorization branch and whether MIPS MSA
should define more forms of vec_init/vec_extract in general.

Thanks,
Matthew

* RE: [PATCH] Switch vec_init and vec_extract optabs to 2 mode optab to allow extraction of vector from vector or initialization of vector from smaller vectors (PR target/80846)
  2017-07-25 21:45 ` Matthew Fortune
@ 2017-07-26  7:25   ` Richard Biener
  0 siblings, 0 replies; 15+ messages in thread
From: Richard Biener @ 2017-07-26  7:25 UTC (permalink / raw)
  To: Matthew Fortune
  Cc: Jakub Jelinek, Uros Bizjak, David Edelsohn, Segher Boessenkool,
	Marcus Shawcroft, Richard Earnshaw, Andreas Krebbel,
	Eric Botcazou, Andrew Jenner, gcc-patches, Sameera Deshpande

On Tue, 25 Jul 2017, Matthew Fortune wrote:

> Jakub Jelinek <jakub@redhat.com> writes:
> > Bootstrapped/regtested on x86_64-linux and i686-linux, where it improves
> > e.g. the code generation for slp-43.c and slp-45.c testcases.
> > make cc1 tested in cross-compilers to the remaining targets.
> 
> No objections for the MIPS part. I've pointed out this change to Sameera to
> see how/if it will affect her autovectorization branch and whether MIPS MSA
> should define more forms of vec_init/vec_expand in general.

Note the vectorizer will try both variants (punning via integer mode
and vector element) so the change is mostly to get around having very
large integer modes like OImode of AVX512 halves or even larger ones
for ARM SVE.

You should only need to have matching component mode vector part
inits/extracts (and all vector modes are power-of-two size).  Thus
for V8SI have V4SI, V2SI and SI components for init/extracts.
If those are not available the vectorizer will try to pun V8SI
to V2TI and to TImode init/extract or V4DI with DImode, etc.
(that's what the bulk conversion of targets besides x86 should
provide).

Richard.

* Re: [PATCH] Switch vec_init and vec_extract optabs to 2 mode optab to allow extraction of vector from vector or initialization of vector from smaller vectors (PR target/80846)
  2017-07-25  9:14 [PATCH] Switch vec_init and vec_extract optabs to 2 mode optab to allow extraction of vector from vector or initialization of vector from smaller vectors (PR target/80846) Jakub Jelinek
  2017-07-25 21:12 ` Segher Boessenkool
  2017-07-25 21:45 ` Matthew Fortune
@ 2017-07-26  7:34 ` Eric Botcazou
  2017-07-26 10:35 ` Richard Biener
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Eric Botcazou @ 2017-07-26  7:34 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Richard Biener, Uros Bizjak, David Edelsohn, Segher Boessenkool,
	Marcus Shawcroft, Richard Earnshaw, Andreas Krebbel,
	Matthew Fortune, Andrew Jenner, gcc-patches

> Bootstrapped/regtested on x86_64-linux and i686-linux, where it improves
> e.g. the code generation for slp-43.c and slp-45.c testcases.
> make cc1 tested in cross-compilers to the remaining targets.

The SPARC bits are OK by me.

-- 
Eric Botcazou

* Re: [PATCH] Switch vec_init and vec_extract optabs to 2 mode optab to allow extraction of vector from vector or initialization of vector from smaller vectors (PR target/80846)
  2017-07-25  9:14 [PATCH] Switch vec_init and vec_extract optabs to 2 mode optab to allow extraction of vector from vector or initialization of vector from smaller vectors (PR target/80846) Jakub Jelinek
                   ` (2 preceding siblings ...)
  2017-07-26  7:34 ` Eric Botcazou
@ 2017-07-26 10:35 ` Richard Biener
  2017-07-26 10:42 ` Uros Bizjak
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Richard Biener @ 2017-07-26 10:35 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Uros Bizjak, David Edelsohn, Segher Boessenkool,
	Marcus Shawcroft, Richard Earnshaw, Andreas Krebbel,
	Matthew Fortune, Eric Botcazou, Andrew Jenner, gcc-patches

On Tue, 25 Jul 2017, Jakub Jelinek wrote:

> Hi!
> 
> The following patch adjusts the vec_init and vec_extract optabs, so that
> they don't have in the expander names just the vector mode, but also another
> mode, for vec_extract the mode of the result and for vec_init the mode of
> the elts of the vector passed as second operand.
> 
> Without this patch, the second mode has been implicit, GET_MODE_INNER of
> the vector mode, so one could just extract a single element from a vector
> or construct vector from elements.  While that is most common, we allow
> in GIMPLE e.g. construction of V8DImode from 4 V2DImode elements etc.
> and the vectorizer uses them.  By having the second mode in the name
> it allows the generic code (vectorizer, expansion) to query whether the
> backend supports such vector from vector expansions or inits from vector
> elts and use them if available.
> 
> For vec_extract, if we say want to extract high V2SImode from V4SImode
> the fallback is try to expand it as DImode extraction from V2DImode.
> This works well in many cases, but doesn't really work for very large
> vectors, say if we want to extract high V8SImode from V16SImode on x86,
> we'd need OImode extraction from V2OImode, which is something the backend
> doesn't have any support for.
> For vec_init, the fallback is usually to go through memory, which is slow in
> many cases.
> 
> This patch only adds new vector from vector extract and init patterns to
> the i386 backend, but I had to change many other targets too, because
> it needs to have the element mode in the vec_extract/vec_init expander
> names.  Seems most of the backends didn't really have a mode attribute
> usable for this or had it only in uppercase, while for the names we need
> lowercase.  Some backends had a convention on how to name lower case
> vs. upper case modes, others didn't have any.  So I'm CCing maintainers
> of affected backends to seek advice on what mode attributes they want to
> use.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, where it improves
> e.g. the code generation for slp-43.c and slp-45.c testcases.
> make cc1 tested in cross-compilers to the remaining targets.
> 
> Ok for trunk?

The non-target specific bits are ok.

Thanks,
Richard.

> 2017-07-25  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR target/80846
> 	* optabs.def (vec_extract_optab, vec_init_optab): Change from
> 	a direct optab to conversion optab.
> 	* optabs.c (expand_vector_broadcast): Use convert_optab_handler
> 	with GET_MODE_INNER as last argument instead of optab_handler.
> 	* expmed.c (extract_bit_field_1): Likewise.  Use vector from
> 	vector extraction if possible and optab is available.
> 	* expr.c (store_constructor): Use convert_optab_handler instead
> 	of optab_handler.  Use vector initialization from smaller
> 	vectors if possible and optab is available.
> 	* tree-vect-stmts.c (vectorizable_load): Likewise.
> 	* doc/md.texi (vec_extract, vec_init): Document that the optabs
> 	now have two modes.
> 	* config/i386/i386.c (ix86_expand_vector_init): Handle expansion
> 	of vec_init from half-sized vectors with the same element mode.
> 	* config/i386/sse.md (ssehalfvecmode): Add V4TI case.
> 	(ssehalfvecmodelower, ssescalarmodelower): New mode attributes.
> 	(reduc_plus_scal_v8df, reduc_plus_scal_v4df, reduc_plus_scal_v2df,
> 	reduc_plus_scal_v16sf, reduc_plus_scal_v8sf, reduc_plus_scal_v4sf,
> 	reduc_<code>_scal_<mode>, reduc_umin_scal_v8hi): Add element mode
> 	after mode in gen_vec_extract* calls.
> 	(vec_extract<mode>): Renamed to ...
> 	(vec_extract<mode><ssescalarmodelower>): ... this.
> 	(vec_extract<mode><ssehalfvecmodelower>): New expander.
> 	(rotl<mode>3, rotr<mode>3, <shift_insn><mode>3, ashrv2di3): Add
> 	element mode after mode in gen_vec_init* calls.
> 	(VEC_INIT_HALF_MODE): New mode iterator.
> 	(vec_init<mode>): Renamed to ...
> 	(vec_init<mode><ssescalarmodelower>): ... this.
> 	(vec_init<mode><ssehalfvecmodelower>): New expander.
> 	* config/i386/mmx.md (vec_extractv2sf): Renamed to ...
> 	(vec_extractv2sfsf): ... this.
> 	(vec_initv2sf): Renamed to ...
> 	(vec_initv2sfsf): ... this.
> 	(vec_extractv2si): Renamed to ...
> 	(vec_extractv2sisi): ... this.
> 	(vec_initv2si): Renamed to ...
> 	(vec_initv2sisi): ... this.
> 	(vec_extractv4hi): Renamed to ...
> 	(vec_extractv4hihi): ... this.
> 	(vec_initv4hi): Renamed to ...
> 	(vec_initv4hihi): ... this.
> 	(vec_extractv8qi): Renamed to ...
> 	(vec_extractv8qiqi): ... this.
> 	(vec_initv8qi): Renamed to ...
> 	(vec_initv8qiqi): ... this.
> 	* config/rs6000/vector.md (VEC_base_l): New mode attribute.
> 	(vec_init<mode>): Renamed to ...
> 	(vec_init<mode><VEC_base_l>): ... this.
> 	(vec_extract<mode>): Renamed to ...
> 	(vec_extract<mode><VEC_base_l>): ... this.
> 	* config/rs6000/paired.md (vec_initv2sf): Renamed to ...
> 	(vec_initv2sfsf): ... this.
> 	* config/rs6000/altivec.md (splitter, altivec_copysign_v4sf3,
> 	vec_unpacku_hi_v16qi, vec_unpacku_hi_v8hi, vec_unpacku_lo_v16qi,
> 	vec_unpacku_lo_v8hi, mulv16qi3, altivec_vreve<mode>2): Add
> 	element mode after mode in gen_vec_init* calls.
> 	* config/aarch64/aarch64-simd.md (vec_init<mode>): Renamed to ...
> 	(vec_init<mode><Vel>): ... this.
> 	(vec_extract<mode>): Renamed to ...
> 	(vec_extract<mode><Vel>): ... this.
> 	* config/aarch64/iterators.md (Vel): New mode attribute.
> 	* config/s390/s390.c (s390_expand_vec_strlen, s390_expand_vec_movstr):
> 	Add element mode after mode in gen_vec_extract* calls.
> 	* config/s390/vector.md (non_vec_l): New mode attribute.
> 	(vec_extract<mode>): Renamed to ...
> 	(vec_extract<mode><non_vec_l>): ... this.
> 	(vec_init<mode>): Renamed to ...
> 	(vec_init<mode><non_vec_l>): ... this.
> 	* config/s390/s390-builtins.def (s390_vlgvb, s390_vlgvh, s390_vlgvf,
> 	s390_vlgvf_flt, s390_vlgvg, s390_vlgvg_dbl): Add element mode after
> 	vec_extract mode.
> 	* config/arm/iterators.md (V_elem_l): New mode attribute.
> 	* config/arm/neon.md (vec_extract<mode>): Renamed to ...
> 	(vec_extract<mode><V_elem_l>): ... this.
> 	(vec_extractv2di): Renamed to ...
> 	(vec_extractv2didi): ... this.
> 	(vec_init<mode>): Renamed to ...
> 	(vec_init<mode><V_elem_l>): ... this.
> 	(reduc_plus_scal_<mode>, reduc_plus_scal_v2di, reduc_smin_scal_<mode>,
> 	reduc_smax_scal_<mode>, reduc_umin_scal_<mode>,
> 	reduc_umax_scal_<mode>, neon_vget_lane<mode>, neon_vget_laneu<mode>):
> 	Add element mode after mode in gen_vec_extract* calls.
> 	* config/mips/mips-msa.md (vec_init<mode>): Renamed to ...
> 	(vec_init<mode><unitmode>): ... this.
> 	(vec_extract<mode>): Renamed to ...
> 	(vec_extract<mode><unitmode>): ... this.
> 	* config/mips/loongson.md (vec_init<mode>): Renamed to ...
> 	(vec_init<mode><unitmode>): ... this.
> 	* config/mips/mips-ps-3d.md (vec_initv2sf): Renamed to ...
> 	(vec_initv2sfsf): ... this.
> 	(vec_extractv2sf): Renamed to ...
> 	(vec_extractv2sfsf): ... this.
> 	(reduc_plus_scal_v2sf, reduc_smin_scal_v2sf, reduc_smax_scal_v2sf):
> 	Add element mode after mode in gen_vec_extract* calls.
> 	* config/mips/mips.md (unitmode): New mode attribute.
> 	* config/spu/spu.c (spu_expand_prologue, spu_allocate_stack,
> 	spu_builtin_extract): Add element mode after mode in gen_vec_extract*
> 	calls.
> 	* config/spu/spu.md (inner_l): New mode attribute.
> 	(vec_init<mode>): Renamed to ...
> 	(vec_init<mode><inner_l>): ... this.
> 	(vec_extract<mode>): Renamed to ...
> 	(vec_extract<mode><inner_l>): ... this.
> 	* config/sparc/sparc.md (veltmode): New mode attribute.
> 	(vec_init<VMALL:mode>): Renamed to ...
> 	(vec_init<VMALL:mode><VMALL:veltmode>): ... this.
> 	* config/ia64/vect.md (vec_initv2si): Renamed to ...
> 	(vec_initv2sisi): ... this.
> 	(vec_initv2sf): Renamed to ...
> 	(vec_initv2sfsf): ... this.
> 	(vec_extractv2sf): Renamed to ...
> 	(vec_extractv2sfsf): ... this.
> 	* config/powerpcspe/vector.md (VEC_base_l): New mode attribute.
> 	(vec_init<mode>): Renamed to ...
> 	(vec_init<mode><VEC_base_l>): ... this.
> 	(vec_extract<mode>): Renamed to ...
> 	(vec_extract<mode><VEC_base_l>): ... this.
> 	* config/powerpcspe/paired.md (vec_initv2sf): Renamed to ...
> 	(vec_initv2sfsf): ... this.
> 	* config/powerpcspe/altivec.md (splitter, altivec_copysign_v4sf3,
> 	vec_unpacku_hi_v16qi, vec_unpacku_hi_v8hi, vec_unpacku_lo_v16qi,
> 	vec_unpacku_lo_v8hi, mulv16qi3): Add element mode after mode in
> 	gen_vec_init* calls.
> 
> --- gcc/optabs.def.jj	2017-07-24 10:57:45.944815535 +0200
> +++ gcc/optabs.def	2017-07-24 16:11:23.066229910 +0200
> @@ -89,6 +89,8 @@ OPTAB_CD(vec_cmpu_optab, "vec_cmpu$a$b")
>  OPTAB_CD(vec_cmpeq_optab, "vec_cmpeq$a$b")
>  OPTAB_CD(maskload_optab, "maskload$a$b")
>  OPTAB_CD(maskstore_optab, "maskstore$a$b")
> +OPTAB_CD(vec_extract_optab, "vec_extract$a$b")
> +OPTAB_CD(vec_init_optab, "vec_init$a$b")
>  
>  OPTAB_NL(add_optab, "add$P$a3", PLUS, "add", '3', gen_int_fp_fixed_libfunc)
>  OPTAB_NX(add_optab, "add$F$a3")
> @@ -294,8 +296,6 @@ OPTAB_D (udot_prod_optab, "udot_prod$I$a
>  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
>  OPTAB_D (usad_optab, "usad$I$a")
>  OPTAB_D (ssad_optab, "ssad$I$a")
> -OPTAB_D (vec_extract_optab, "vec_extract$a")
> -OPTAB_D (vec_init_optab, "vec_init$a")
>  OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
>  OPTAB_D (vec_pack_ssat_optab, "vec_pack_ssat_$a")
>  OPTAB_D (vec_pack_trunc_optab, "vec_pack_trunc_$a")
> --- gcc/optabs.c.jj	2017-07-24 10:57:46.216812275 +0200
> +++ gcc/optabs.c	2017-07-24 16:11:23.067229898 +0200
> @@ -386,7 +386,8 @@ expand_vector_broadcast (machine_mode vm
>    /* ??? If the target doesn't have a vec_init, then we have no easy way
>       of performing this operation.  Most of this sort of generic support
>       is hidden away in the vector lowering support in gimple.  */
> -  icode = optab_handler (vec_init_optab, vmode);
> +  icode = convert_optab_handler (vec_init_optab, vmode,
> +				 GET_MODE_INNER (vmode));
>    if (icode == CODE_FOR_nothing)
>      return NULL;
>  
> --- gcc/expmed.c.jj	2017-07-24 10:57:45.914815894 +0200
> +++ gcc/expmed.c	2017-07-24 16:11:23.071229850 +0200
> @@ -1566,6 +1566,55 @@ extract_bit_field_1 (rtx str_rtx, unsign
>        return op0;
>      }
>  
> +  /* First try to check for vector from vector extractions.  */
> +  if (VECTOR_MODE_P (GET_MODE (op0))
> +      && !MEM_P (op0)
> +      && VECTOR_MODE_P (tmode)
> +      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (tmode))
> +    {
> +      machine_mode new_mode = GET_MODE (op0);
> +      if (GET_MODE_INNER (new_mode) != GET_MODE_INNER (tmode))
> +	{
> +	  new_mode = mode_for_vector (GET_MODE_INNER (tmode),
> +				      GET_MODE_BITSIZE (GET_MODE (op0))
> +				      / GET_MODE_UNIT_BITSIZE (tmode));
> +	  if (!VECTOR_MODE_P (new_mode)
> +	      || GET_MODE_SIZE (new_mode) != GET_MODE_SIZE (GET_MODE (op0))
> +	      || GET_MODE_INNER (new_mode) != GET_MODE_INNER (tmode)
> +	      || !targetm.vector_mode_supported_p (new_mode))
> +	    new_mode = VOIDmode;
> +	}
> +      if (new_mode != VOIDmode
> +	  && (convert_optab_handler (vec_extract_optab, new_mode, tmode)
> +	      != CODE_FOR_nothing)
> +	  && ((bitnum + bitsize - 1) / GET_MODE_BITSIZE (tmode)
> +	      == bitnum / GET_MODE_BITSIZE (tmode)))
> +	{
> +	  struct expand_operand ops[3];
> +	  machine_mode outermode = new_mode;
> +	  machine_mode innermode = tmode;
> +	  enum insn_code icode
> +	    = convert_optab_handler (vec_extract_optab, outermode, innermode);
> +	  unsigned HOST_WIDE_INT pos = bitnum / GET_MODE_BITSIZE (innermode);
> +
> +	  if (new_mode != GET_MODE (op0))
> +	    op0 = gen_lowpart (new_mode, op0);
> +	  create_output_operand (&ops[0], target, innermode);
> +	  ops[0].target = 1;
> +	  create_input_operand (&ops[1], op0, outermode);
> +	  create_integer_operand (&ops[2], pos);
> +	  if (maybe_expand_insn (icode, 3, ops))
> +	    {
> +	      if (alt_rtl && ops[0].target)
> +		*alt_rtl = target;
> +	      target = ops[0].value;
> +	      if (GET_MODE (target) != mode)
> +		return gen_lowpart (tmode, target);
> +	      return target;
> +	    }
> +	}
> +    }
> +
>    /* See if we can get a better vector mode before extracting.  */
>    if (VECTOR_MODE_P (GET_MODE (op0))
>        && !MEM_P (op0)
> @@ -1599,14 +1648,17 @@ extract_bit_field_1 (rtx str_rtx, unsign
>       available.  */
>    if (VECTOR_MODE_P (GET_MODE (op0))
>        && !MEM_P (op0)
> -      && optab_handler (vec_extract_optab, GET_MODE (op0)) != CODE_FOR_nothing
> +      && (convert_optab_handler (vec_extract_optab, GET_MODE (op0),
> +				 GET_MODE_INNER (GET_MODE (op0)))
> +	  != CODE_FOR_nothing)
>        && ((bitnum + bitsize - 1) / GET_MODE_UNIT_BITSIZE (GET_MODE (op0))
>  	  == bitnum / GET_MODE_UNIT_BITSIZE (GET_MODE (op0))))
>      {
>        struct expand_operand ops[3];
>        machine_mode outermode = GET_MODE (op0);
>        machine_mode innermode = GET_MODE_INNER (outermode);
> -      enum insn_code icode = optab_handler (vec_extract_optab, outermode);
> +      enum insn_code icode
> +	= convert_optab_handler (vec_extract_optab, outermode, innermode);
>        unsigned HOST_WIDE_INT pos = bitnum / GET_MODE_BITSIZE (innermode);
>  
>        create_output_operand (&ops[0], target, innermode);
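
The new vector-from-vector path in extract_bit_field_1 only fires when the requested bit range sits entirely inside one tmode-sized chunk of op0, and it then passes the chunk number as operand 2. A minimal standalone sketch of that arithmetic follows; the helper names are hypothetical and not part of GCC, and plain integers stand in for GET_MODE_BITSIZE (tmode):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not a GCC function): true if the bit range
   [bitnum, bitnum + bitsize) lies within a single chunk of CHUNK_BITS
   bits.  Same test as the hunk above, with CHUNK_BITS standing in for
   GET_MODE_BITSIZE (tmode).  */
static bool
field_in_one_chunk (unsigned long bitnum, unsigned long bitsize,
		    unsigned long chunk_bits)
{
  assert (bitsize > 0 && chunk_bits > 0);
  return (bitnum + bitsize - 1) / chunk_bits == bitnum / chunk_bits;
}

/* Hypothetical helper: index of the chunk that contains bit BITNUM.
   This corresponds to the 'pos' value passed as operand 2.  */
static unsigned long
chunk_index (unsigned long bitnum, unsigned long chunk_bits)
{
  assert (chunk_bits > 0);
  return bitnum / chunk_bits;
}
```

For example, extracting the high 256 bits of a 512-bit vector (bitnum 256, bitsize 256) passes the test and yields index 1, while a 256-bit range starting at bit 128 straddles two chunks and is rejected.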
> --- gcc/expr.c.jj	2017-07-24 10:57:45.963815307 +0200
> +++ gcc/expr.c	2017-07-24 16:11:23.073229826 +0200
> @@ -6589,6 +6589,7 @@ store_constructor (tree exp, rtx target,
>  	rtvec vector = NULL;
>  	unsigned n_elts;
>  	alias_set_type alias;
> +	bool vec_vec_init_p = false;
>  
>  	gcc_assert (eltmode != BLKmode);
>  
> @@ -6596,27 +6597,30 @@ store_constructor (tree exp, rtx target,
>  	if (REG_P (target) && VECTOR_MODE_P (GET_MODE (target)))
>  	  {
>  	    machine_mode mode = GET_MODE (target);
> +	    machine_mode emode = eltmode;
>  
> -	    icode = (int) optab_handler (vec_init_optab, mode);
> -	    /* Don't use vec_init<mode> if some elements have VECTOR_TYPE.  */
> -	    if (icode != CODE_FOR_nothing)
> +	    if (CONSTRUCTOR_NELTS (exp)
> +		&& (TREE_CODE (TREE_TYPE (CONSTRUCTOR_ELT (exp, 0)->value))
> +		    == VECTOR_TYPE))
>  	      {
> -		tree value;
> -
> -		FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (exp), idx, value)
> -		  if (TREE_CODE (TREE_TYPE (value)) == VECTOR_TYPE)
> -		    {
> -		      icode = CODE_FOR_nothing;
> -		      break;
> -		    }
> +		tree etype = TREE_TYPE (CONSTRUCTOR_ELT (exp, 0)->value);
> +		gcc_assert (CONSTRUCTOR_NELTS (exp) * TYPE_VECTOR_SUBPARTS (etype)
> +			    == n_elts);
> +		emode = TYPE_MODE (etype);
>  	      }
> +	    icode = (int) convert_optab_handler (vec_init_optab, mode, emode);
>  	    if (icode != CODE_FOR_nothing)
>  	      {
> -		unsigned int i;
> +		unsigned int i, n = n_elts;
>  
> -		vector = rtvec_alloc (n_elts);
> -		for (i = 0; i < n_elts; i++)
> -		  RTVEC_ELT (vector, i) = CONST0_RTX (GET_MODE_INNER (mode));
> +		if (emode != eltmode)
> +		  {
> +		    n = CONSTRUCTOR_NELTS (exp);
> +		    vec_vec_init_p = true;
> +		  }
> +		vector = rtvec_alloc (n);
> +		for (i = 0; i < n; i++)
> +		  RTVEC_ELT (vector, i) = CONST0_RTX (emode);
>  	      }
>  	  }
>  
> @@ -6634,10 +6638,10 @@ store_constructor (tree exp, rtx target,
>  
>  	    FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (exp), idx, value)
>  	      {
> -		int n_elts_here = tree_to_uhwi
> -		  (int_const_binop (TRUNC_DIV_EXPR,
> -				    TYPE_SIZE (TREE_TYPE (value)),
> -				    TYPE_SIZE (elttype)));
> +		tree sz = TYPE_SIZE (TREE_TYPE (value));
> +		int n_elts_here
> +		  = tree_to_uhwi (int_const_binop (TRUNC_DIV_EXPR, sz,
> +						   TYPE_SIZE (elttype)));
>  
>  		count += n_elts_here;
>  		if (mostly_zeros_p (value))
> @@ -6687,18 +6691,21 @@ store_constructor (tree exp, rtx target,
>  
>  	    if (vector)
>  	      {
> -		/* vec_init<mode> should not be used if there are VECTOR_TYPE
> -		   elements.  */
> -		gcc_assert (TREE_CODE (TREE_TYPE (value)) != VECTOR_TYPE);
> -		RTVEC_ELT (vector, eltpos)
> -		  = expand_normal (value);
> +		if (vec_vec_init_p)
> +		  {
> +		    gcc_assert (ce->index == NULL_TREE);
> +		    gcc_assert (TREE_CODE (TREE_TYPE (value)) == VECTOR_TYPE);
> +		    eltpos = idx;
> +		  }
> +		else
> +		  gcc_assert (TREE_CODE (TREE_TYPE (value)) != VECTOR_TYPE);
> +		RTVEC_ELT (vector, eltpos) = expand_normal (value);
>  	      }
>  	    else
>  	      {
> -		machine_mode value_mode =
> -		  TREE_CODE (TREE_TYPE (value)) == VECTOR_TYPE
> -		  ? TYPE_MODE (TREE_TYPE (value))
> -		  : eltmode;
> +		machine_mode value_mode
> +		  = (TREE_CODE (TREE_TYPE (value)) == VECTOR_TYPE
> +		     ? TYPE_MODE (TREE_TYPE (value)) : eltmode);
>  		bitpos = eltpos * elt_size;
>  		store_constructor_field (target, bitsize, bitpos, 0,
>  					 bitregion_end, value_mode,
> @@ -6707,9 +6714,9 @@ store_constructor (tree exp, rtx target,
>  	  }
>  
>  	if (vector)
> -	  emit_insn (GEN_FCN (icode)
> -		     (target,
> -		      gen_rtx_PARALLEL (GET_MODE (target), vector)));
> +	  emit_insn (GEN_FCN (icode) (target,
> +				      gen_rtx_PARALLEL (GET_MODE (target),
> +							vector)));
>  	break;
>        }
>  
> --- gcc/tree-vect-stmts.c.jj	2017-07-24 10:57:46.004814816 +0200
> +++ gcc/tree-vect-stmts.c	2017-07-24 16:11:23.049230114 +0200
> @@ -6996,29 +6996,43 @@ vectorizable_load (gimple *stmt, gimple_
>  	{
>  	  if (group_size < nunits)
>  	    {
> -	      /* Avoid emitting a constructor of vector elements by performing
> -		 the loads using an integer type of the same size,
> -		 constructing a vector of those and then re-interpreting it
> -		 as the original vector type.  This works around the fact
> -		 that the vec_init optab was only designed for scalar
> -		 element modes and thus expansion goes through memory.
> -		 This avoids a huge runtime penalty due to the general
> -		 inability to perform store forwarding from smaller stores
> -		 to a larger load.  */
> -	      unsigned lsize
> -		= group_size * TYPE_PRECISION (TREE_TYPE (vectype));
> -	      machine_mode elmode = mode_for_size (lsize, MODE_INT, 0);
> -	      machine_mode vmode = mode_for_vector (elmode,
> -						    nunits / group_size);
> -	      /* If we can't construct such a vector fall back to
> -		 element loads of the original vector type.  */
> +	      /* First check if vec_init optab supports construction from
> +		 vector elts directly.  */
> +	      machine_mode elmode = TYPE_MODE (TREE_TYPE (vectype));
> +	      machine_mode vmode = mode_for_vector (elmode, group_size);
>  	      if (VECTOR_MODE_P (vmode)
> -		  && optab_handler (vec_init_optab, vmode) != CODE_FOR_nothing)
> +		  && (convert_optab_handler (vec_init_optab,
> +					     TYPE_MODE (vectype), vmode)
> +		      != CODE_FOR_nothing))
>  		{
>  		  nloads = nunits / group_size;
>  		  lnel = group_size;
> -		  ltype = build_nonstandard_integer_type (lsize, 1);
> -		  lvectype = build_vector_type (ltype, nloads);
> +		  ltype = build_vector_type (TREE_TYPE (vectype), group_size);
> +		}
> +	      else
> +		{
> +		  /* Otherwise avoid emitting a constructor of vector elements
> +		     by performing the loads using an integer type of the same
> +		     size, constructing a vector of those and then
> +		     re-interpreting it as the original vector type.
> +		     This avoids a huge runtime penalty due to the general
> +		     inability to perform store forwarding from smaller stores
> +		     to a larger load.  */
> +		  unsigned lsize
> +		    = group_size * TYPE_PRECISION (TREE_TYPE (vectype));
> +		  elmode = mode_for_size (lsize, MODE_INT, 0);
> +		  vmode = mode_for_vector (elmode, nunits / group_size);
> +		  /* If we can't construct such a vector fall back to
> +		     element loads of the original vector type.  */
> +		  if (VECTOR_MODE_P (vmode)
> +		      && (convert_optab_handler (vec_init_optab, vmode, elmode)
> +			  != CODE_FOR_nothing))
> +		    {
> +		      nloads = nunits / group_size;
> +		      lnel = group_size;
> +		      ltype = build_nonstandard_integer_type (lsize, 1);
> +		      lvectype = build_vector_type (ltype, nloads);
> +		    }
>  		}
>  	    }
>  	  else
> --- gcc/doc/md.texi.jj	2017-07-24 10:57:45.989814996 +0200
> +++ gcc/doc/md.texi	2017-07-24 17:09:55.536882382 +0200
> @@ -4871,15 +4871,22 @@ This pattern is not allowed to @code{FAI
>  Set given field in the vector value.  Operand 0 is the vector to modify,
>  operand 1 is new value of field and operand 2 specify the field index.
>  
> -@cindex @code{vec_extract@var{m}} instruction pattern
> -@item @samp{vec_extract@var{m}}
> +@cindex @code{vec_extract@var{m}@var{n}} instruction pattern
> +@item @samp{vec_extract@var{m}@var{n}}
>  Extract given field from the vector value.  Operand 1 is the vector, operand 2
> -specify field index and operand 0 place to store value into.
> +specifies the field index and operand 0 is the place to store the value.  The
> +@var{n} mode is the mode of the field or vector of fields that should be
> +extracted; it should be either the element mode of the vector mode @var{m},
> +or a vector mode with the same element mode and a smaller number of elements.
> +If @var{n} is a vector mode, the index is counted in units of that mode.
>  
> -@cindex @code{vec_init@var{m}} instruction pattern
> -@item @samp{vec_init@var{m}}
> +@cindex @code{vec_init@var{m}@var{n}} instruction pattern
> +@item @samp{vec_init@var{m}@var{n}}
>  Initialize the vector to given values.  Operand 0 is the vector to initialize
> -and operand 1 is parallel containing values for individual fields.
> +and operand 1 is a parallel containing values for individual fields.  The
> +@var{n} mode is the mode of the elements; it should be either the element
> +mode of the vector mode @var{m}, or a vector mode with the same element mode
> +and a smaller number of elements.
>  
>  @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
>  @item @samp{vec_cmp@var{m}@var{n}}
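
To make the "index is counted in units of that mode" wording concrete, here is a rough model in plain C, assuming arrays of int stand in for vector registers and elements are numbered in array order. The function name is hypothetical and only illustrates the documented semantics; it is not how any backend implements the pattern:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of vec_extract<m><n> where <n> is a vector mode:
   SRC has SRC_NELTS elements, the result has SUB_NELTS elements, and
   IDX selects the IDX-th group of SUB_NELTS elements (not the IDX-th
   scalar element).  */
static void
model_vec_extract (int *dst, const int *src, size_t src_nelts,
		   size_t sub_nelts, size_t idx)
{
  assert (sub_nelts > 0 && src_nelts % sub_nelts == 0);
  assert (idx < src_nelts / sub_nelts);
  memcpy (dst, src + idx * sub_nelts, sub_nelts * sizeof *dst);
}
```

With an 8-element source and a 4-element result, index 1 selects elements 4 through 7, which is the "high half" case the expander in sse.md handles.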
> --- gcc/config/i386/i386.c.jj	2017-07-24 10:58:11.831505333 +0200
> +++ gcc/config/i386/i386.c	2017-07-24 16:11:23.060229982 +0200
> @@ -44297,6 +44297,34 @@ ix86_expand_vector_init (bool mmx_ok, rt
>    int i;
>    rtx x;
>  
> +  /* First handle initialization from vector elts.  */
> +  if (n_elts != XVECLEN (vals, 0))
> +    {
> +      rtx subtarget = target;
> +      x = XVECEXP (vals, 0, 0);
> +      gcc_assert (GET_MODE_INNER (GET_MODE (x)) == inner_mode);
> +      if (GET_MODE_NUNITS (GET_MODE (x)) * 2 == n_elts)
> +	{
> +	  rtx ops[2] = { XVECEXP (vals, 0, 0), XVECEXP (vals, 0, 1) };
> +	  if (inner_mode == QImode || inner_mode == HImode)
> +	    {
> +	      mode = mode_for_vector (SImode,
> +				      n_elts * GET_MODE_SIZE (inner_mode) / 4);
> +	      inner_mode
> +		= mode_for_vector (SImode,
> +				   n_elts * GET_MODE_SIZE (inner_mode) / 8);
> +	      ops[0] = gen_lowpart (inner_mode, ops[0]);
> +	      ops[1] = gen_lowpart (inner_mode, ops[1]);
> +	      subtarget = gen_reg_rtx (mode);
> +	    }
> +	  ix86_expand_vector_init_concat (mode, subtarget, ops, 2);
> +	  if (subtarget != target)
> +	    emit_move_insn (target, gen_lowpart (GET_MODE (target), subtarget));
> +	  return;
> +	}
> +      gcc_unreachable ();
> +    }
> +
>    for (i = 0; i < n_elts; ++i)
>      {
>        x = XVECEXP (vals, 0, i);
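
The QImode/HImode branch above reinterprets both halves and the result as vectors of SImode before concatenating. The element counts it feeds to mode_for_vector can be checked with a small sketch; the helper names are hypothetical, and the constants 4 and 8 are taken directly from the hunk (4 bytes per SImode element, and half the vector per operand):

```c
#include <assert.h>

/* Hypothetical helper: number of SImode (4-byte) units in a whole
   vector of N_ELTS elements, each ELT_BYTES bytes wide.  */
static unsigned
si_units_full (unsigned n_elts, unsigned elt_bytes)
{
  assert (elt_bytes == 1 || elt_bytes == 2);
  return n_elts * elt_bytes / 4;
}

/* Hypothetical helper: number of SImode units in each half-sized
   operand of the same vector.  */
static unsigned
si_units_half (unsigned n_elts, unsigned elt_bytes)
{
  assert (elt_bytes == 1 || elt_bytes == 2);
  return n_elts * elt_bytes / 8;
}
```

So building V32QImode from two V16QImode halves is done as V8SImode from two V4SImode halves, and V16HImode likewise maps to V8SImode and V4SImode.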
> --- gcc/config/i386/sse.md.jj	2017-07-24 10:57:45.807817176 +0200
> +++ gcc/config/i386/sse.md	2017-07-24 16:54:35.658088768 +0200
> @@ -658,13 +658,21 @@ (define_mode_attr ssedoublevecmode
>  
>  ;; Mapping of vector modes to a vector mode of half size
>  (define_mode_attr ssehalfvecmode
> -  [(V64QI "V32QI") (V32HI "V16HI") (V16SI "V8SI") (V8DI "V4DI")
> +  [(V64QI "V32QI") (V32HI "V16HI") (V16SI "V8SI") (V8DI "V4DI") (V4TI "V2TI")
>     (V32QI "V16QI") (V16HI  "V8HI") (V8SI  "V4SI") (V4DI "V2DI")
>     (V16QI  "V8QI") (V8HI   "V4HI") (V4SI  "V2SI")
>     (V16SF "V8SF") (V8DF "V4DF")
>     (V8SF  "V4SF") (V4DF "V2DF")
>     (V4SF  "V2SF")])
>  
> +(define_mode_attr ssehalfvecmodelower
> +  [(V64QI "v32qi") (V32HI "v16hi") (V16SI "v8si") (V8DI "v4di") (V4TI "v2ti")
> +   (V32QI "v16qi") (V16HI  "v8hi") (V8SI  "v4si") (V4DI "v2di")
> +   (V16QI  "v8qi") (V8HI   "v4hi") (V4SI  "v2si")
> +   (V16SF "v8sf") (V8DF "v4df")
> +   (V8SF  "v4sf") (V4DF "v2df")
> +   (V4SF  "v2sf")])
> +
>  ;; Mapping of vector modes ti packed single mode of the same size
>  (define_mode_attr ssePSmode
>    [(V16SI "V16SF") (V8DF "V16SF")
> @@ -690,6 +698,16 @@ (define_mode_attr ssescalarmode
>     (V8DF "DF")  (V4DF "DF")  (V2DF "DF")
>     (V4TI "TI")  (V2TI "TI")])
>  
> +;; Mapping of vector modes back to the scalar modes, in lower case
> +(define_mode_attr ssescalarmodelower
> +  [(V64QI "qi") (V32QI "qi") (V16QI "qi")
> +   (V32HI "hi") (V16HI "hi") (V8HI "hi")
> +   (V16SI "si") (V8SI "si")  (V4SI "si")
> +   (V8DI "di")  (V4DI "di")  (V2DI "di")
> +   (V16SF "sf") (V8SF "sf")  (V4SF "sf")
> +   (V8DF "df")  (V4DF "df")  (V2DF "df")
> +   (V4TI "ti")  (V2TI "ti")])
> +
>  ;; Mapping of vector modes to the 128bit modes
>  (define_mode_attr ssexmmmode
>    [(V64QI "V16QI") (V32QI "V16QI") (V16QI "V16QI")
> @@ -2356,7 +2374,7 @@ (define_expand "reduc_plus_scal_v8df"
>  {
>    rtx tmp = gen_reg_rtx (V8DFmode);
>    ix86_expand_reduc (gen_addv8df3, tmp, operands[1]);
> -  emit_insn (gen_vec_extractv8df (operands[0], tmp, const0_rtx));
> +  emit_insn (gen_vec_extractv8dfdf (operands[0], tmp, const0_rtx));
>    DONE;
>  })
>  
> @@ -2371,7 +2389,7 @@ (define_expand "reduc_plus_scal_v4df"
>    emit_insn (gen_avx_haddv4df3 (tmp, operands[1], operands[1]));
>    emit_insn (gen_avx_vperm2f128v4df3 (tmp2, tmp, tmp, GEN_INT (1)));
>    emit_insn (gen_addv4df3 (vec_res, tmp, tmp2));
> -  emit_insn (gen_vec_extractv4df (operands[0], vec_res, const0_rtx));
> +  emit_insn (gen_vec_extractv4dfdf (operands[0], vec_res, const0_rtx));
>    DONE;
>  })
>  
> @@ -2382,7 +2400,7 @@ (define_expand "reduc_plus_scal_v2df"
>  {
>    rtx tmp = gen_reg_rtx (V2DFmode);
>    emit_insn (gen_sse3_haddv2df3 (tmp, operands[1], operands[1]));
> -  emit_insn (gen_vec_extractv2df (operands[0], tmp, const0_rtx));
> +  emit_insn (gen_vec_extractv2dfdf (operands[0], tmp, const0_rtx));
>    DONE;
>  })
>  
> @@ -2393,7 +2411,7 @@ (define_expand "reduc_plus_scal_v16sf"
>  {
>    rtx tmp = gen_reg_rtx (V16SFmode);
>    ix86_expand_reduc (gen_addv16sf3, tmp, operands[1]);
> -  emit_insn (gen_vec_extractv16sf (operands[0], tmp, const0_rtx));
> +  emit_insn (gen_vec_extractv16sfsf (operands[0], tmp, const0_rtx));
>    DONE;
>  })
>  
> @@ -2409,7 +2427,7 @@ (define_expand "reduc_plus_scal_v8sf"
>    emit_insn (gen_avx_haddv8sf3 (tmp2, tmp, tmp));
>    emit_insn (gen_avx_vperm2f128v8sf3 (tmp, tmp2, tmp2, GEN_INT (1)));
>    emit_insn (gen_addv8sf3 (vec_res, tmp, tmp2));
> -  emit_insn (gen_vec_extractv8sf (operands[0], vec_res, const0_rtx));
> +  emit_insn (gen_vec_extractv8sfsf (operands[0], vec_res, const0_rtx));
>    DONE;
>  })
>  
> @@ -2427,7 +2445,7 @@ (define_expand "reduc_plus_scal_v4sf"
>      }
>    else
>      ix86_expand_reduc (gen_addv4sf3, vec_res, operands[1]);
> -  emit_insn (gen_vec_extractv4sf (operands[0], vec_res, const0_rtx));
> +  emit_insn (gen_vec_extractv4sfsf (operands[0], vec_res, const0_rtx));
>    DONE;
>  })
>  
> @@ -2449,7 +2467,8 @@ (define_expand "reduc_<code>_scal_<mode>
>  {
>    rtx tmp = gen_reg_rtx (<MODE>mode);
>    ix86_expand_reduc (gen_<code><mode>3, tmp, operands[1]);
> -  emit_insn (gen_vec_extract<mode> (operands[0], tmp, const0_rtx));
> +  emit_insn (gen_vec_extract<mode><ssescalarmodelower> (operands[0], tmp,
> +							const0_rtx));
>    DONE;
>  })
>  
> @@ -2461,7 +2480,8 @@ (define_expand "reduc_<code>_scal_<mode>
>  {
>    rtx tmp = gen_reg_rtx (<MODE>mode);
>    ix86_expand_reduc (gen_<code><mode>3, tmp, operands[1]);
> -  emit_insn (gen_vec_extract<mode> (operands[0], tmp, const0_rtx));
> +  emit_insn (gen_vec_extract<mode><ssescalarmodelower> (operands[0], tmp,
> +							const0_rtx));
>    DONE;
>  })
>  
> @@ -2473,7 +2493,8 @@ (define_expand "reduc_<code>_scal_<mode>
>  {
>    rtx tmp = gen_reg_rtx (<MODE>mode);
>    ix86_expand_reduc (gen_<code><mode>3, tmp, operands[1]);
> -  emit_insn (gen_vec_extract<mode> (operands[0], tmp, const0_rtx));
> +  emit_insn (gen_vec_extract<mode><ssescalarmodelower> (operands[0], tmp,
> +							const0_rtx));
>    DONE;
>  })
>  
> @@ -2485,7 +2506,7 @@ (define_expand "reduc_umin_scal_v8hi"
>  {
>    rtx tmp = gen_reg_rtx (V8HImode);
>    ix86_expand_reduc (gen_uminv8hi3, tmp, operands[1]);
> -  emit_insn (gen_vec_extractv8hi (operands[0], tmp, const0_rtx));
> +  emit_insn (gen_vec_extractv8hihi (operands[0], tmp, const0_rtx));
>    DONE;
>  })
>  
> @@ -7881,7 +7902,7 @@ (define_mode_iterator VEC_EXTRACT_MODE
>     (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF
>     (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")])
>  
> -(define_expand "vec_extract<mode>"
> +(define_expand "vec_extract<mode><ssescalarmodelower>"
>    [(match_operand:<ssescalarmode> 0 "register_operand")
>     (match_operand:VEC_EXTRACT_MODE 1 "register_operand")
>     (match_operand 2 "const_int_operand")]
> @@ -7892,6 +7913,19 @@ (define_expand "vec_extract<mode>"
>    DONE;
>  })
>  
> +(define_expand "vec_extract<mode><ssehalfvecmodelower>"
> +  [(match_operand:<ssehalfvecmode> 0 "nonimmediate_operand")
> +   (match_operand:V_512 1 "register_operand")
> +   (match_operand 2 "const_0_to_1_operand")]
> +  "TARGET_AVX512F"
> +{
> +  if (INTVAL (operands[2]))
> +    emit_insn (gen_vec_extract_hi_<mode> (operands[0], operands[1]));
> +  else
> +    emit_insn (gen_vec_extract_lo_<mode> (operands[0], operands[1]));
> +  DONE;
> +})
> +
>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>  ;;
>  ;; Parallel double-precision floating point element swizzling
> @@ -16693,7 +16727,7 @@ (define_expand "rotl<mode>3"
>        for (i = 0; i < <ssescalarnum>; i++)
>  	RTVEC_ELT (vs, i) = op2;
>  
> -      emit_insn (gen_vec_init<mode> (reg, par));
> +      emit_insn (gen_vec_init<mode><ssescalarmodelower> (reg, par));
>        emit_insn (gen_xop_vrotl<mode>3 (operands[0], operands[1], reg));
>        DONE;
>      }
> @@ -16725,7 +16759,7 @@ (define_expand "rotr<mode>3"
>        for (i = 0; i < <ssescalarnum>; i++)
>  	RTVEC_ELT (vs, i) = op2;
>  
> -      emit_insn (gen_vec_init<mode> (reg, par));
> +      emit_insn (gen_vec_init<mode><ssescalarmodelower> (reg, par));
>        emit_insn (gen_neg<mode>2 (neg, reg));
>        emit_insn (gen_xop_vrotl<mode>3 (operands[0], operands[1], neg));
>        DONE;
> @@ -17019,7 +17053,7 @@ (define_expand "<shift_insn><mode>3"
>          XVECEXP (par, 0, i) = operands[2];
>  
>        tmp = gen_reg_rtx (V16QImode);
> -      emit_insn (gen_vec_initv16qi (tmp, par));
> +      emit_insn (gen_vec_initv16qiqi (tmp, par));
>  
>        if (negate)
>  	emit_insn (gen_negv16qi2 (tmp, tmp));
> @@ -17055,7 +17089,7 @@ (define_expand "ashrv2di3"
>        for (i = 0; i < 2; i++)
>  	XVECEXP (par, 0, i) = operands[2];
>  
> -      emit_insn (gen_vec_initv2di (reg, par));
> +      emit_insn (gen_vec_initv2didi (reg, par));
>  
>        if (negate)
>  	emit_insn (gen_negv2di2 (reg, reg));
> @@ -18775,7 +18809,7 @@ (define_insn_and_split "avx_<castmode><a
>  				  <ssehalfvecmode>mode);
>  })
>  
> -;; Modes handled by vec_init patterns.
> +;; Modes handled by vec_init expanders.
>  (define_mode_iterator VEC_INIT_MODE
>    [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI
>     (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
> @@ -18785,11 +18819,31 @@ (define_mode_iterator VEC_INIT_MODE
>     (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")
>     (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")])
>  
> -(define_expand "vec_init<mode>"
> +;; Likewise, but for initialization from half sized vectors.
> +;; Thus, these are all VEC_INIT_MODE modes except V2??.
> +(define_mode_iterator VEC_INIT_HALF_MODE
> +  [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI
> +   (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
> +   (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
> +   (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX")
> +   (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
> +   (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX")
> +   (V4TI "TARGET_AVX512F")])
> +
> +(define_expand "vec_init<mode><ssescalarmodelower>"
>    [(match_operand:VEC_INIT_MODE 0 "register_operand")
>     (match_operand 1)]
>    "TARGET_SSE"
>  {
> +  ix86_expand_vector_init (false, operands[0], operands[1]);
> +  DONE;
> +})
> +
> +(define_expand "vec_init<mode><ssehalfvecmodelower>"
> +  [(match_operand:VEC_INIT_HALF_MODE 0 "register_operand")
> +   (match_operand 1)]
> +  "TARGET_SSE"
> +{
>    ix86_expand_vector_init (false, operands[0], operands[1]);
>    DONE;
>  })
> --- gcc/config/i386/mmx.md.jj	2017-07-24 10:57:45.869816434 +0200
> +++ gcc/config/i386/mmx.md	2017-07-24 16:11:23.065229922 +0200
> @@ -641,7 +641,7 @@ (define_split
>    [(set (match_dup 0) (match_dup 1))]
>    "operands[1] = adjust_address (operands[1], SFmode, 4);")
>  
> -(define_expand "vec_extractv2sf"
> +(define_expand "vec_extractv2sfsf"
>    [(match_operand:SF 0 "register_operand")
>     (match_operand:V2SF 1 "register_operand")
>     (match_operand 2 "const_int_operand")]
> @@ -652,7 +652,7 @@ (define_expand "vec_extractv2sf"
>    DONE;
>  })
>  
> -(define_expand "vec_initv2sf"
> +(define_expand "vec_initv2sfsf"
>    [(match_operand:V2SF 0 "register_operand")
>     (match_operand 1)]
>    "TARGET_SSE"
> @@ -1344,7 +1344,7 @@ (define_insn_and_split "*vec_extractv2si
>    operands[1] = adjust_address (operands[1], SImode, INTVAL (operands[2]) * 4);
>  })
>  
> -(define_expand "vec_extractv2si"
> +(define_expand "vec_extractv2sisi"
>    [(match_operand:SI 0 "register_operand")
>     (match_operand:V2SI 1 "register_operand")
>     (match_operand 2 "const_int_operand")]
> @@ -1355,7 +1355,7 @@ (define_expand "vec_extractv2si"
>    DONE;
>  })
>  
> -(define_expand "vec_initv2si"
> +(define_expand "vec_initv2sisi"
>    [(match_operand:V2SI 0 "register_operand")
>     (match_operand 1)]
>    "TARGET_SSE"
> @@ -1375,7 +1375,7 @@ (define_expand "vec_setv4hi"
>    DONE;
>  })
>  
> -(define_expand "vec_extractv4hi"
> +(define_expand "vec_extractv4hihi"
>    [(match_operand:HI 0 "register_operand")
>     (match_operand:V4HI 1 "register_operand")
>     (match_operand 2 "const_int_operand")]
> @@ -1386,7 +1386,7 @@ (define_expand "vec_extractv4hi"
>    DONE;
>  })
>  
> -(define_expand "vec_initv4hi"
> +(define_expand "vec_initv4hihi"
>    [(match_operand:V4HI 0 "register_operand")
>     (match_operand 1)]
>    "TARGET_SSE"
> @@ -1406,7 +1406,7 @@ (define_expand "vec_setv8qi"
>    DONE;
>  })
>  
> -(define_expand "vec_extractv8qi"
> +(define_expand "vec_extractv8qiqi"
>    [(match_operand:QI 0 "register_operand")
>     (match_operand:V8QI 1 "register_operand")
>     (match_operand 2 "const_int_operand")]
> @@ -1417,7 +1417,7 @@ (define_expand "vec_extractv8qi"
>    DONE;
>  })
>  
> -(define_expand "vec_initv8qi"
> +(define_expand "vec_initv8qiqi"
>    [(match_operand:V8QI 0 "register_operand")
>     (match_operand 1)]
>    "TARGET_SSE"
> --- gcc/config/rs6000/vector.md.jj	2017-06-08 20:50:49.000000000 +0200
> +++ gcc/config/rs6000/vector.md	2017-07-24 17:44:44.699580927 +0200
> @@ -74,6 +74,16 @@ (define_mode_attr VEC_base [(V16QI "QI")
>  			    (V1TI  "TI")
>  			    (TI    "TI")])
>  
> +;; As above, but in lower case
> +(define_mode_attr VEC_base_l [(V16QI "qi")
> +			      (V8HI  "hi")
> +			      (V4SI  "si")
> +			      (V2DI  "di")
> +			      (V4SF  "sf")
> +			      (V2DF  "df")
> +			      (V1TI  "ti")
> +			      (TI    "ti")])
> +
>  ;; Same size integer type for floating point data
>  (define_mode_attr VEC_int [(V4SF  "v4si")
>  			   (V2DF  "v2di")])
> @@ -1016,7 +1026,7 @@ (define_expand "fixuns_trunc<mode><VEC_i
>  
>  \f
>  ;; Vector initialization, set, extract
> -(define_expand "vec_init<mode>"
> +(define_expand "vec_init<mode><VEC_base_l>"
>    [(match_operand:VEC_E 0 "vlogical_operand" "")
>     (match_operand:VEC_E 1 "" "")]
>    "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)"
> @@ -1035,7 +1045,7 @@ (define_expand "vec_set<mode>"
>    DONE;
>  })
>  
> -(define_expand "vec_extract<mode>"
> +(define_expand "vec_extract<mode><VEC_base_l>"
>    [(match_operand:<VEC_base> 0 "register_operand" "")
>     (match_operand:VEC_E 1 "vlogical_operand" "")
>     (match_operand 2 "const_int_operand" "")]
> --- gcc/config/rs6000/paired.md.jj	2017-06-08 20:50:49.000000000 +0200
> +++ gcc/config/rs6000/paired.md	2017-07-24 17:48:20.324985029 +0200
> @@ -377,7 +377,7 @@ (define_insn "paired_muls1"
>    "ps_muls1 %0, %1, %2"
>    [(set_attr "type" "fp")])
>  
> -(define_expand "vec_initv2sf"
> +(define_expand "vec_initv2sfsf"
>    [(match_operand:V2SF 0 "gpc_reg_operand" "=f")
>     (match_operand 1 "" "")]
>    "TARGET_PAIRED_FLOAT"
> --- gcc/config/rs6000/altivec.md.jj	2017-07-24 10:58:12.000000000 +0200
> +++ gcc/config/rs6000/altivec.md	2017-07-24 17:48:49.573633038 +0200
> @@ -311,7 +311,7 @@ (define_split
>    for (i = 0; i < num_elements; i++)
>      RTVEC_ELT (v, i) = constm1_rtx;
>  
> -  emit_insn (gen_vec_initv4si (dest, gen_rtx_PARALLEL (mode, v)));
> +  emit_insn (gen_vec_initv4sisi (dest, gen_rtx_PARALLEL (mode, v)));
>    emit_insn (gen_rtx_SET (dest, gen_rtx_ASHIFT (mode, dest, dest)));
>    DONE;
>  })
> @@ -2267,7 +2267,7 @@ (define_expand "altivec_copysign_v4sf3"
>    RTVEC_ELT (v, 2) = GEN_INT (mask_val);
>    RTVEC_ELT (v, 3) = GEN_INT (mask_val);
>  
> -  emit_insn (gen_vec_initv4si (mask, gen_rtx_PARALLEL (V4SImode, v)));
> +  emit_insn (gen_vec_initv4sisi (mask, gen_rtx_PARALLEL (V4SImode, v)));
>    emit_insn (gen_vector_select_v4sf (operands[0], operands[1], operands[2],
>  				     gen_lowpart (V4SFmode, mask)));
>    DONE;
> @@ -3409,7 +3409,7 @@ (define_expand "vec_unpacku_hi_v16qi"
>    RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 :  0);
>    RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ?  7 : 16);
>  
> -  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> +  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>    emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
>    DONE;
>  }")
> @@ -3445,7 +3445,7 @@ (define_expand "vec_unpacku_hi_v8hi"
>    RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ?  6 : 17);
>    RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ?  7 : 16);
>  
> -  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> +  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>    emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
>    DONE;
>  }")
> @@ -3481,7 +3481,7 @@ (define_expand "vec_unpacku_lo_v16qi"
>    RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 :  8);
>    RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
>  
> -  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> +  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>    emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
>    DONE;
>  }")
> @@ -3517,7 +3517,7 @@ (define_expand "vec_unpacku_lo_v8hi"
>    RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 14 : 17);
>    RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
>  
> -  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> +  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>    emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
>    DONE;
>  }")
> @@ -3758,7 +3758,7 @@ (define_expand "mulv16qi3"
>       = gen_rtx_CONST_INT (QImode, BYTES_BIG_ENDIAN ? 2 * i + 17 : 15 - 2 * i);
>    }
>  
> -  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> +  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>    emit_insn (gen_altivec_vmulesb (even, operands[1], operands[2]));
>    emit_insn (gen_altivec_vmulosb (odd, operands[1], operands[2]));
>    emit_insn (gen_altivec_vperm_v8hiv16qi (operands[0], even, odd, mask));
> @@ -3804,7 +3804,7 @@ (define_expand "altivec_vreve<mode>2"
>        RTVEC_ELT (v, i + j * size)
>  	= GEN_INT (i + (num_elements - 1 - j) * size);
>  
> -  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> +  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>    emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[1],
>  	     operands[1], mask));
>    DONE;
> --- gcc/config/aarch64/aarch64-simd.md.jj	2017-07-24 15:01:21.000000000 +0200
> +++ gcc/config/aarch64/aarch64-simd.md	2017-07-24 17:19:05.660170375 +0200
> @@ -5617,9 +5617,9 @@ (define_expand "aarch64_set_qreg<VSTRUCT
>    DONE;
>  })
>  
> -;; Standard pattern name vec_init<mode>.
> +;; Standard pattern name vec_init<mode><Vel>.
>  
> -(define_expand "vec_init<mode>"
> +(define_expand "vec_init<mode><Vel>"
>    [(match_operand:VALL_F16 0 "register_operand" "")
>     (match_operand 1 "" "")]
>    "TARGET_SIMD"
> @@ -5674,9 +5674,9 @@ (define_insn "aarch64_urecpe<mode>"
>   "urecpe\\t%0.<Vtype>, %1.<Vtype>"
>    [(set_attr "type" "neon_fp_recpe_<Vetype><q>")])
>  
> -;; Standard pattern name vec_extract<mode>.
> +;; Standard pattern name vec_extract<mode><Vel>.
>  
> -(define_expand "vec_extract<mode>"
> +(define_expand "vec_extract<mode><Vel>"
>    [(match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "")
>     (match_operand:VALL_F16 1 "register_operand" "")
>     (match_operand:SI 2 "immediate_operand" "")]
> --- gcc/config/aarch64/iterators.md.jj	2017-03-19 11:57:22.000000000 +0100
> +++ gcc/config/aarch64/iterators.md	2017-07-24 17:17:50.318091273 +0200
> @@ -520,6 +520,17 @@ (define_mode_attr VEL [(V8QI "QI") (V16Q
>  			(SI   "SI") (HI   "HI")
>  			(QI   "QI")])
>  
> +;; Define element mode for each vector mode (lower case).
> +(define_mode_attr Vel [(V8QI "qi") (V16QI "qi")
> +			(V4HI "hi") (V8HI "hi")
> +			(V2SI "si") (V4SI "si")
> +			(DI "di")   (V2DI "di")
> +			(V4HF "hf") (V8HF "hf")
> +			(V2SF "sf") (V4SF "sf")
> +			(V2DF "df") (DF "df")
> +			(SI   "si") (HI   "hi")
> +			(QI   "qi")])
> +
>  ;; 64-bit container modes the inner or scalar source mode.
>  (define_mode_attr VCOND [(HI "V4HI") (SI "V2SI")
>  			 (V4HI "V4HI") (V8HI "V4HI")
> --- gcc/config/s390/s390.c.jj	2017-07-17 10:08:39.000000000 +0200
> +++ gcc/config/s390/s390.c	2017-07-24 17:58:24.416715142 +0200
> @@ -5792,7 +5792,7 @@ s390_expand_vec_strlen (rtx target, rtx
>    add_int_reg_note (s390_emit_ccraw_jump (8, NE, loop_start_label),
>  		    REG_BR_PROB,
>  		    profile_probability::very_likely ().to_reg_br_prob_note ());
> -  emit_insn (gen_vec_extractv16qi (len, result_reg, GEN_INT (7)));
> +  emit_insn (gen_vec_extractv16qiqi (len, result_reg, GEN_INT (7)));
>  
>    /* If the string pointer wasn't aligned we have loaded less then 16
>       bytes and the remaining bytes got filled with zeros (by vll).
> @@ -5850,7 +5850,7 @@ s390_expand_vec_movstr (rtx result, rtx
>    emit_insn (gen_vlbb (vsrc, src, GEN_INT (6)));
>    emit_insn (gen_lcbb (loadlen, src_addr, GEN_INT (6)));
>    emit_insn (gen_vfenezv16qi (vpos, vsrc, vsrc));
> -  emit_insn (gen_vec_extractv16qi (gpos_qi, vpos, GEN_INT (7)));
> +  emit_insn (gen_vec_extractv16qiqi (gpos_qi, vpos, GEN_INT (7)));
>    emit_move_insn (gpos, gen_rtx_SUBREG (SImode, gpos_qi, 0));
>    /* gpos is the byte index if a zero was found and 16 otherwise.
>       So if it is lower than the loaded bytes we have a hit.  */
> @@ -5928,7 +5928,7 @@ s390_expand_vec_movstr (rtx result, rtx
>    force_expand_binop (Pmode, add_optab, dst_addr_reg, offset, dst_addr_reg,
>  		      1, OPTAB_DIRECT);
>  
> -  emit_insn (gen_vec_extractv16qi (gpos_qi, vpos, GEN_INT (7)));
> +  emit_insn (gen_vec_extractv16qiqi (gpos_qi, vpos, GEN_INT (7)));
>    emit_move_insn (gpos, gen_rtx_SUBREG (SImode, gpos_qi, 0));
>  
>    emit_insn (gen_vstlv16qi (vsrc, gpos, gen_rtx_MEM (BLKmode, dst_addr_reg)));
> --- gcc/config/s390/vector.md.jj	2017-04-25 15:51:31.000000000 +0200
> +++ gcc/config/s390/vector.md	2017-07-24 17:57:37.665277768 +0200
> @@ -90,6 +90,17 @@ (define_mode_attr non_vec[(V1QI "QI") (V
>  			  (V1DF "DF") (V2DF "DF")
>  			  (V1TF "TF") (TF "TF")])
>  
> +; Like above, but in lower case.
> +(define_mode_attr non_vec_l[(V1QI "qi") (V2QI "qi") (V4QI "qi") (V8QI "qi")
> +			    (V16QI "qi")
> +			    (V1HI "hi") (V2HI "hi") (V4HI "hi") (V8HI "hi")
> +			    (V1SI "si") (V2SI "si") (V4SI "si")
> +			    (V1DI "di") (V2DI "di")
> +			    (V1TI "ti") (TI "ti")
> +			    (V1SF "sf") (V2SF "sf") (V4SF "sf")
> +			    (V1DF "df") (V2DF "df")
> +			    (V1TF "tf") (TF "tf")])
> +
>  ; The instruction suffix for integer instructions and instructions
>  ; which do not care about whether it is floating point or integer.
>  (define_mode_attr bhfgq[(V1QI "b") (V2QI "b") (V4QI "b") (V8QI "b") (V16QI "b")
> @@ -453,7 +464,7 @@ (define_insn "*vec_set<mode>_plus"
>  ; FIXME: Support also vector mode operands for 0
>  ; FIXME: This should be (vec_select ..) or something but it does only allow constant selectors :(
>  ; This is used via RTL standard name as well as for expanding the builtin
> -(define_expand "vec_extract<mode>"
> +(define_expand "vec_extract<mode><non_vec_l>"
>    [(set (match_operand:<non_vec> 0 "nonimmediate_operand" "")
>  	(unspec:<non_vec> [(match_operand:V  1 "register_operand" "")
>  			   (match_operand:SI 2 "nonmemory_operand" "")]
> @@ -485,7 +496,7 @@ (define_insn "*vec_extract<mode>_plus"
>    "vlgv<bhfgq>\t%0,%v1,%Y3(%2)"
>    [(set_attr "op_type" "VRS")])
>  
> -(define_expand "vec_init<mode>"
> +(define_expand "vec_init<mode><non_vec_l>"
>    [(match_operand:V_128 0 "register_operand" "")
>     (match_operand:V_128 1 "nonmemory_operand" "")]
>    "TARGET_VX"
> --- gcc/config/s390/s390-builtins.def.jj	2017-03-24 15:08:56.000000000 +0100
> +++ gcc/config/s390/s390-builtins.def	2017-07-24 18:02:22.571849086 +0200
> @@ -450,12 +450,12 @@ OB_DEF_VAR (s390_vec_extract_u64,
>  OB_DEF_VAR (s390_vec_extract_b64,       s390_vlgvg,         0,                  O2_ELEM,            BT_OV_ULONGLONG_BV2DI_INT)
>  OB_DEF_VAR (s390_vec_extract_dbl,       s390_vlgvg_dbl,     0,                  O2_ELEM,            BT_OV_DBL_V2DF_INT)                      /* vlgvg */
>  
> -B_DEF      (s390_vlgvb,                 vec_extractv16qi,   0,                  B_VX,               O2_ELEM,            BT_FN_UCHAR_UV16QI_INT)
> -B_DEF      (s390_vlgvh,                 vec_extractv8hi,    0,                  B_VX,               O2_ELEM,            BT_FN_USHORT_UV8HI_INT)
> -B_DEF      (s390_vlgvf,                 vec_extractv4si,    0,                  B_VX,               O2_ELEM,            BT_FN_UINT_UV4SI_INT)
> -B_DEF      (s390_vlgvf_flt,             vec_extractv4sf,    0,                  B_INT | B_VXE,      O2_ELEM,            BT_FN_FLT_V4SF_INT)
> -B_DEF      (s390_vlgvg,                 vec_extractv2di,    0,                  B_VX,               O2_ELEM,            BT_FN_ULONGLONG_UV2DI_INT)
> -B_DEF      (s390_vlgvg_dbl,             vec_extractv2df,    0,                  B_INT | B_VX,       O2_ELEM,            BT_FN_DBL_V2DF_INT)
> +B_DEF      (s390_vlgvb,                 vec_extractv16qiqi, 0,                  B_VX,               O2_ELEM,            BT_FN_UCHAR_UV16QI_INT)
> +B_DEF      (s390_vlgvh,                 vec_extractv8hihi,  0,                  B_VX,               O2_ELEM,            BT_FN_USHORT_UV8HI_INT)
> +B_DEF      (s390_vlgvf,                 vec_extractv4sisi,  0,                  B_VX,               O2_ELEM,            BT_FN_UINT_UV4SI_INT)
> +B_DEF      (s390_vlgvf_flt,             vec_extractv4sfsf,  0,                  B_INT | B_VXE,      O2_ELEM,            BT_FN_FLT_V4SF_INT)
> +B_DEF      (s390_vlgvg,                 vec_extractv2didi,  0,                  B_VX,               O2_ELEM,            BT_FN_ULONGLONG_UV2DI_INT)
> +B_DEF      (s390_vlgvg_dbl,             vec_extractv2dfdf,  0,                  B_INT | B_VX,       O2_ELEM,            BT_FN_DBL_V2DF_INT)
>  
>  OB_DEF     (s390_vec_insert_and_zero,   s390_vec_insert_and_zero_s8,s390_vec_insert_and_zero_dbl,B_VX,BT_FN_OV4SI_INTCONSTPTR)
>  OB_DEF_VAR (s390_vec_insert_and_zero_s8,s390_vllezb,        0,                  0,                  BT_OV_V16QI_SCHARCONSTPTR)
> --- gcc/config/arm/iterators.md.jj	2017-05-05 09:20:02.000000000 +0200
> +++ gcc/config/arm/iterators.md	2017-07-24 17:25:15.665681575 +0200
> @@ -444,6 +444,14 @@ (define_mode_attr V_elem [(V8QI "QI") (V
>                            (V2SF "SF") (V4SF "SF")
>                            (DI "DI")   (V2DI "DI")])
>  
> +;; As above but in lower case.
> +(define_mode_attr V_elem_l [(V8QI "qi") (V16QI "qi")
> +			    (V4HI "hi") (V8HI "hi")
> +			    (V4HF "hf") (V8HF "hf")
> +			    (V2SI "si") (V4SI "si")
> +			    (V2SF "sf") (V4SF "sf")
> +			    (DI "di")   (V2DI "di")])
> +
>  ;; Element modes for vector extraction, padded up to register size.
>  
>  (define_mode_attr V_ext [(V8QI "SI") (V16QI "SI")
> --- gcc/config/arm/neon.md.jj	2017-07-17 10:08:41.000000000 +0200
> +++ gcc/config/arm/neon.md	2017-07-24 17:27:42.173917259 +0200
> @@ -412,7 +412,7 @@ (define_expand "vec_set<mode>"
>    DONE;
>  })
>  
> -(define_insn "vec_extract<mode>"
> +(define_insn "vec_extract<mode><V_elem_l>"
>    [(set (match_operand:<V_elem> 0 "nonimmediate_operand" "=Um,r")
>          (vec_select:<V_elem>
>            (match_operand:VD_LANE 1 "s_register_operand" "w,w")
> @@ -434,7 +434,7 @@ (define_insn "vec_extract<mode>"
>    [(set_attr "type" "neon_store1_one_lane<q>,neon_to_gp<q>")]
>  )
>  
> -(define_insn "vec_extract<mode>"
> +(define_insn "vec_extract<mode><V_elem_l>"
>    [(set (match_operand:<V_elem> 0 "nonimmediate_operand" "=Um,r")
>  	(vec_select:<V_elem>
>            (match_operand:VQ2 1 "s_register_operand" "w,w")
> @@ -460,7 +460,7 @@ (define_insn "vec_extract<mode>"
>    [(set_attr "type" "neon_store1_one_lane<q>,neon_to_gp<q>")]
>  )
>  
> -(define_insn "vec_extractv2di"
> +(define_insn "vec_extractv2didi"
>    [(set (match_operand:DI 0 "nonimmediate_operand" "=Um,r")
>  	(vec_select:DI
>            (match_operand:V2DI 1 "s_register_operand" "w,w")
> @@ -479,7 +479,7 @@ (define_insn "vec_extractv2di"
>    [(set_attr "type" "neon_store1_one_lane_q,neon_to_gp_q")]
>  )
>  
> -(define_expand "vec_init<mode>"
> +(define_expand "vec_init<mode><V_elem_l>"
>    [(match_operand:VDQ 0 "s_register_operand" "")
>     (match_operand 1 "" "")]
>    "TARGET_NEON"
> @@ -1581,7 +1581,7 @@ (define_expand "reduc_plus_scal_<mode>"
>    neon_pairwise_reduce (vec, operands[1], <MODE>mode,
>  			&gen_neon_vpadd_internal<mode>);
>    /* The same result is actually computed into every element.  */
> -  emit_insn (gen_vec_extract<mode> (operands[0], vec, const0_rtx));
> +  emit_insn (gen_vec_extract<mode><V_elem_l> (operands[0], vec, const0_rtx));
>    DONE;
>  })
>  
> @@ -1607,7 +1607,7 @@ (define_expand "reduc_plus_scal_v2di"
>    rtx vec = gen_reg_rtx (V2DImode);
>  
>    emit_insn (gen_arm_reduc_plus_internal_v2di (vec, operands[1]));
> -  emit_insn (gen_vec_extractv2di (operands[0], vec, const0_rtx));
> +  emit_insn (gen_vec_extractv2didi (operands[0], vec, const0_rtx));
>  
>    DONE;
>  })
> @@ -1631,7 +1631,7 @@ (define_expand "reduc_smin_scal_<mode>"
>    neon_pairwise_reduce (vec, operands[1], <MODE>mode,
>  			&gen_neon_vpsmin<mode>);
>    /* The result is computed into every element of the vector.  */
> -  emit_insn (gen_vec_extract<mode> (operands[0], vec, const0_rtx));
> +  emit_insn (gen_vec_extract<mode><V_elem_l> (operands[0], vec, const0_rtx));
>    DONE;
>  })
>  
> @@ -1658,7 +1658,7 @@ (define_expand "reduc_smax_scal_<mode>"
>    neon_pairwise_reduce (vec, operands[1], <MODE>mode,
>  			&gen_neon_vpsmax<mode>);
>    /* The result is computed into every element of the vector.  */
> -  emit_insn (gen_vec_extract<mode> (operands[0], vec, const0_rtx));
> +  emit_insn (gen_vec_extract<mode><V_elem_l> (operands[0], vec, const0_rtx));
>    DONE;
>  })
>  
> @@ -1685,7 +1685,7 @@ (define_expand "reduc_umin_scal_<mode>"
>    neon_pairwise_reduce (vec, operands[1], <MODE>mode,
>  			&gen_neon_vpumin<mode>);
>    /* The result is computed into every element of the vector.  */
> -  emit_insn (gen_vec_extract<mode> (operands[0], vec, const0_rtx));
> +  emit_insn (gen_vec_extract<mode><V_elem_l> (operands[0], vec, const0_rtx));
>    DONE;
>  })
>  
> @@ -1711,7 +1711,7 @@ (define_expand "reduc_umax_scal_<mode>"
>    neon_pairwise_reduce (vec, operands[1], <MODE>mode,
>  			&gen_neon_vpumax<mode>);
>    /* The result is computed into every element of the vector.  */
> -  emit_insn (gen_vec_extract<mode> (operands[0], vec, const0_rtx));
> +  emit_insn (gen_vec_extract<mode><V_elem_l> (operands[0], vec, const0_rtx));
>    DONE;
>  })
>  
> @@ -3272,7 +3272,8 @@ (define_expand "neon_vget_lane<mode>"
>      }
>  
>    if (GET_MODE_UNIT_BITSIZE (<MODE>mode) == 32)
> -    emit_insn (gen_vec_extract<mode> (operands[0], operands[1], operands[2]));
> +    emit_insn (gen_vec_extract<mode><V_elem_l> (operands[0], operands[1],
> +						operands[2]));
>    else
>      emit_insn (gen_neon_vget_lane<mode>_sext_internal (operands[0],
>  						       operands[1],
> @@ -3301,7 +3302,8 @@ (define_expand "neon_vget_laneu<mode>"
>      }
>  
>    if (GET_MODE_UNIT_BITSIZE (<MODE>mode) == 32)
> -    emit_insn (gen_vec_extract<mode> (operands[0], operands[1], operands[2]));
> +    emit_insn (gen_vec_extract<mode><V_elem_l> (operands[0], operands[1],
> +						operands[2]));
>    else
>      emit_insn (gen_neon_vget_lane<mode>_zext_internal (operands[0],
>  						       operands[1],
> --- gcc/config/mips/mips-msa.md.jj	2017-03-31 20:36:09.000000000 +0200
> +++ gcc/config/mips/mips-msa.md	2017-07-24 17:33:32.657689124 +0200
> @@ -231,7 +231,7 @@ (define_mode_attr bitimm
>     (V4SI  "uimm5")
>     (V2DI  "uimm6")])
>  
> -(define_expand "vec_init<mode>"
> +(define_expand "vec_init<mode><unitmode>"
>    [(match_operand:MSA 0 "register_operand")
>     (match_operand:MSA 1 "")]
>    "ISA_HAS_MSA"
> @@ -311,7 +311,7 @@ (define_expand "vec_unpacku_lo_<mode>"
>    DONE;
>  })
>  
> -(define_expand "vec_extract<mode>"
> +(define_expand "vec_extract<mode><unitmode>"
>    [(match_operand:<UNITMODE> 0 "register_operand")
>     (match_operand:IMSA 1 "register_operand")
>     (match_operand 2 "const_<indeximm>_operand")]
> @@ -329,7 +329,7 @@ (define_expand "vec_extract<mode>"
>    DONE;
>  })
>  
> -(define_expand "vec_extract<mode>"
> +(define_expand "vec_extract<mode><unitmode>"
>    [(match_operand:<UNITMODE> 0 "register_operand")
>     (match_operand:FMSA 1 "register_operand")
>     (match_operand 2 "const_<indeximm>_operand")]
> --- gcc/config/mips/loongson.md.jj	2017-01-01 12:45:40.000000000 +0100
> +++ gcc/config/mips/loongson.md	2017-07-24 18:08:29.736433972 +0200
> @@ -119,7 +119,7 @@ (define_insn "mov<mode>_internal"
>  
>  ;; Initialization of a vector.
>  
> -(define_expand "vec_init<mode>"
> +(define_expand "vec_init<mode><unitmode>"
>    [(set (match_operand:VWHB 0 "register_operand")
>  	(match_operand 1 ""))]
>    "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
> --- gcc/config/mips/mips-ps-3d.md.jj	2017-01-01 12:45:40.000000000 +0100
> +++ gcc/config/mips/mips-ps-3d.md	2017-07-24 17:34:13.540195876 +0200
> @@ -254,7 +254,7 @@ (define_expand "mips_pll_ps"
>  })
>  
>  ; vec_init
> -(define_expand "vec_initv2sf"
> +(define_expand "vec_initv2sfsf"
>    [(match_operand:V2SF 0 "register_operand")
>     (match_operand:V2SF 1 "")]
>    "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
> @@ -282,7 +282,7 @@ (define_insn "vec_concatv2sf"
>  ;; emulated.  There is no other way to get a vector mode bitfield extract
>  ;; currently.
>  
> -(define_insn "vec_extractv2sf"
> +(define_insn "vec_extractv2sfsf"
>    [(set (match_operand:SF 0 "register_operand" "=f")
>  	(vec_select:SF (match_operand:V2SF 1 "register_operand" "f")
>  		       (parallel
> @@ -379,7 +379,7 @@ (define_expand "reduc_plus_scal_v2sf"
>      rtx temp = gen_reg_rtx (V2SFmode);
>      emit_insn (gen_mips_addr_ps (temp, operands[1], operands[1]));
>      rtx lane = BYTES_BIG_ENDIAN ? const1_rtx : const0_rtx;
> -    emit_insn (gen_vec_extractv2sf (operands[0], temp, lane));
> +    emit_insn (gen_vec_extractv2sfsf (operands[0], temp, lane));
>      DONE;
>    })
>  
> @@ -757,7 +757,7 @@ (define_expand "reduc_smin_scal_v2sf"
>    rtx temp = gen_reg_rtx (V2SFmode);
>    mips_expand_vec_reduc (temp, operands[1], gen_sminv2sf3);
>    rtx lane = BYTES_BIG_ENDIAN ? const1_rtx : const0_rtx;
> -  emit_insn (gen_vec_extractv2sf (operands[0], temp, lane));
> +  emit_insn (gen_vec_extractv2sfsf (operands[0], temp, lane));
>    DONE;
>  })
>  
> @@ -769,6 +769,6 @@ (define_expand "reduc_smax_scal_v2sf"
>    rtx temp = gen_reg_rtx (V2SFmode);
>    mips_expand_vec_reduc (temp, operands[1], gen_smaxv2sf3);
>    rtx lane = BYTES_BIG_ENDIAN ? const1_rtx : const0_rtx;
> -  emit_insn (gen_vec_extractv2sf (operands[0], temp, lane));
> +  emit_insn (gen_vec_extractv2sfsf (operands[0], temp, lane));
>    DONE;
>  })
> --- gcc/config/mips/mips.md.jj	2017-06-15 11:03:32.000000000 +0200
> +++ gcc/config/mips/mips.md	2017-07-24 19:00:15.519582707 +0200
> @@ -917,6 +917,11 @@ (define_mode_attr UNITMODE [(SF "SF") (D
>  			    (V16QI "QI") (V8HI "HI") (V4SI "SI") (V2DI "DI")
>  			    (V2DF "DF")])
>  
> +;; As above, but in lower case.
> +(define_mode_attr unitmode [(SF "sf") (DF "df") (V2SF "sf") (V4SF "sf")
> +			    (V16QI "qi") (V8QI "qi") (V8HI "hi") (V4HI "hi")
> +			    (V4SI "si") (V2SI "si") (V2DI "di") (V2DF "df")])
> +
>  ;; This attribute gives the integer mode that has the same size as a
>  ;; fixed-point mode.
>  (define_mode_attr IMODE [(QQ "QI") (HQ "HI") (SQ "SI") (DQ "DI")
> --- gcc/config/spu/spu.c.jj	2017-07-17 10:08:39.000000000 +0200
> +++ gcc/config/spu/spu.c	2017-07-24 18:06:01.693214125 +0200
> @@ -1773,7 +1773,7 @@ spu_expand_prologue (void)
>  	      size_v4si = scratch_v4si;
>  	    }
>  	  emit_insn (gen_cgt_v4si (scratch_v4si, sp_v4si, size_v4si));
> -	  emit_insn (gen_vec_extractv4si
> +	  emit_insn (gen_vec_extractv4sisi
>  		     (scratch_reg_0, scratch_v4si, GEN_INT (1)));
>  	  emit_insn (gen_spu_heq (scratch_reg_0, GEN_INT (0)));
>  	}
> @@ -5368,7 +5368,7 @@ spu_allocate_stack (rtx op0, rtx op1)
>      {
>        rtx avail = gen_reg_rtx(SImode);
>        rtx result = gen_reg_rtx(SImode);
> -      emit_insn (gen_vec_extractv4si (avail, sp, GEN_INT (1)));
> +      emit_insn (gen_vec_extractv4sisi (avail, sp, GEN_INT (1)));
>        emit_insn (gen_cgt_si(result, avail, GEN_INT (-1)));
>        emit_insn (gen_spu_heq (result, GEN_INT(0) ));
>      }
> @@ -5684,22 +5684,22 @@ spu_builtin_extract (rtx ops[])
>        switch (mode)
>  	{
>  	case V16QImode:
> -	  emit_insn (gen_vec_extractv16qi (ops[0], ops[1], ops[2]));
> +	  emit_insn (gen_vec_extractv16qiqi (ops[0], ops[1], ops[2]));
>  	  break;
>  	case V8HImode:
> -	  emit_insn (gen_vec_extractv8hi (ops[0], ops[1], ops[2]));
> +	  emit_insn (gen_vec_extractv8hihi (ops[0], ops[1], ops[2]));
>  	  break;
>  	case V4SFmode:
> -	  emit_insn (gen_vec_extractv4sf (ops[0], ops[1], ops[2]));
> +	  emit_insn (gen_vec_extractv4sfsf (ops[0], ops[1], ops[2]));
>  	  break;
>  	case V4SImode:
> -	  emit_insn (gen_vec_extractv4si (ops[0], ops[1], ops[2]));
> +	  emit_insn (gen_vec_extractv4sisi (ops[0], ops[1], ops[2]));
>  	  break;
>  	case V2DImode:
> -	  emit_insn (gen_vec_extractv2di (ops[0], ops[1], ops[2]));
> +	  emit_insn (gen_vec_extractv2didi (ops[0], ops[1], ops[2]));
>  	  break;
>  	case V2DFmode:
> -	  emit_insn (gen_vec_extractv2df (ops[0], ops[1], ops[2]));
> +	  emit_insn (gen_vec_extractv2dfdf (ops[0], ops[1], ops[2]));
>  	  break;
>  	default:
>  	  abort ();
> --- gcc/config/spu/spu.md.jj	2017-01-01 12:45:40.000000000 +0100
> +++ gcc/config/spu/spu.md	2017-07-24 18:05:05.591888718 +0200
> @@ -256,6 +256,13 @@ (define_mode_attr inner  [(V16QI "QI")
>  			  (V2DI  "DI")
>  			  (V4SF  "SF")
>  			  (V2DF  "DF")])
> +;; Like above, but in lower case
> +(define_mode_attr inner_l [(V16QI "qi")
> +			   (V8HI  "hi")
> +			   (V4SI  "si")
> +			   (V2DI  "di")
> +			   (V4SF  "sf")
> +			   (V2DF  "df")])
>  (define_mode_attr vmult  [(V16QI "1")
>  			  (V8HI  "2")
>  			  (V4SI  "4")
> @@ -4318,7 +4325,7 @@ (define_expand "restore_stack_nonlocal"
>  ;; vector patterns
>  
>  ;; Vector initialization
> -(define_expand "vec_init<mode>"
> +(define_expand "vec_init<mode><inner_l>"
>    [(match_operand:V 0 "register_operand" "")
>     (match_operand 1 "" "")]
>    ""
> @@ -4347,7 +4354,7 @@ (define_expand "vec_set<mode>"
>      operands[6] = GEN_INT (size);
>    })
>  
> -(define_expand "vec_extract<mode>"
> +(define_expand "vec_extract<mode><inner_l>"
>    [(set (match_operand:<inner> 0 "spu_reg_operand" "=r")
>  	(vec_select:<inner> (match_operand:V 1 "spu_reg_operand" "r")
>  			    (parallel [(match_operand 2 "const_int_operand" "i")])))]
> --- gcc/config/sparc/sparc.md.jj	2017-07-17 10:08:39.000000000 +0200
> +++ gcc/config/sparc/sparc.md	2017-07-24 18:11:52.396997069 +0200
> @@ -8621,6 +8621,8 @@ (define_mode_attr vconstr [(V1SI "f") (V
>  (define_mode_attr vfptype [(V1SI "single") (V2HI "single") (V4QI "single")
>  			   (V1DI "double") (V2SI "double") (V4HI "double")
>  			   (V8QI "double")])
> +(define_mode_attr veltmode [(V1SI "si") (V2HI "hi") (V4QI "qi") (V1DI "di")
> +			    (V2SI "si") (V4HI "hi") (V8QI "qi")])
>  
>  (define_expand "mov<VMALL:mode>"
>    [(set (match_operand:VMALL 0 "nonimmediate_operand" "")
> @@ -8762,7 +8764,7 @@ (define_split
>    DONE;
>  })
>  
> -(define_expand "vec_init<VMALL:mode>"
> +(define_expand "vec_init<VMALL:mode><VMALL:veltmode>"
>    [(match_operand:VMALL 0 "register_operand" "")
>     (match_operand:VMALL 1 "" "")]
>    "TARGET_VIS"
> --- gcc/config/ia64/vect.md.jj	2017-01-01 12:45:42.000000000 +0100
> +++ gcc/config/ia64/vect.md	2017-07-24 17:29:28.996628899 +0200
> @@ -1015,7 +1015,7 @@ (define_insn "*vec_interleave_highv2si"
>  }
>    [(set_attr "itanium_class" "mmshf")])
>  
> -(define_expand "vec_initv2si"
> +(define_expand "vec_initv2sisi"
>    [(match_operand:V2SI 0 "gr_register_operand" "")
>     (match_operand 1 "" "")]
>    ""
> @@ -1299,7 +1299,7 @@ (define_insn "*fselect"
>    "fselect %0 = %F2, %F3, %1"
>    [(set_attr "itanium_class" "fmisc")])
>  
> -(define_expand "vec_initv2sf"
> +(define_expand "vec_initv2sfsf"
>    [(match_operand:V2SF 0 "fr_register_operand" "")
>     (match_operand 1 "" "")]
>    ""
> @@ -1483,7 +1483,7 @@ (define_insn_and_split "*vec_extractv2sf
>    operands[1] = gen_rtx_REG (SFmode, REGNO (operands[1]));
>  })
>  
> -(define_expand "vec_extractv2sf"
> +(define_expand "vec_extractv2sfsf"
>    [(set (match_operand:SF 0 "register_operand" "")
>  	(unspec:SF [(match_operand:V2SF 1 "register_operand" "")
>  		    (match_operand:DI 2 "const_int_operand" "")]
> --- gcc/config/powerpcspe/vector.md.jj	2017-05-25 10:37:03.000000000 +0200
> +++ gcc/config/powerpcspe/vector.md	2017-07-24 17:41:21.897027743 +0200
> @@ -74,6 +74,16 @@ (define_mode_attr VEC_base [(V16QI "QI")
>  			    (V1TI  "TI")
>  			    (TI    "TI")])
>  
> +;; As above, but in lower case
> +(define_mode_attr VEC_base_l [(V16QI "qi")
> +			      (V8HI  "hi")
> +			      (V4SI  "si")
> +			      (V2DI  "di")
> +			      (V4SF  "sf")
> +			      (V2DF  "df")
> +			      (V1TI  "ti")
> +			      (TI    "ti")])
> +
>  ;; Same size integer type for floating point data
>  (define_mode_attr VEC_int [(V4SF  "v4si")
>  			   (V2DF  "v2di")])
> @@ -1017,7 +1027,7 @@ (define_expand "fixuns_trunc<mode><VEC_i
>  
>  \f
>  ;; Vector initialization, set, extract
> -(define_expand "vec_init<mode>"
> +(define_expand "vec_init<mode><VEC_base_l>"
>    [(match_operand:VEC_E 0 "vlogical_operand" "")
>     (match_operand:VEC_E 1 "" "")]
>    "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)"
> @@ -1036,7 +1046,7 @@ (define_expand "vec_set<mode>"
>    DONE;
>  })
>  
> -(define_expand "vec_extract<mode>"
> +(define_expand "vec_extract<mode><VEC_base_l>"
>    [(match_operand:<VEC_base> 0 "register_operand" "")
>     (match_operand:VEC_E 1 "vlogical_operand" "")
>     (match_operand 2 "const_int_operand" "")]
> --- gcc/config/powerpcspe/paired.md.jj	2017-05-25 10:37:04.000000000 +0200
> +++ gcc/config/powerpcspe/paired.md	2017-07-24 17:42:17.980351097 +0200
> @@ -377,7 +377,7 @@ (define_insn "paired_muls1"
>    "ps_muls1 %0, %1, %2"
>    [(set_attr "type" "fp")])
>  
> -(define_expand "vec_initv2sf"
> +(define_expand "vec_initv2sfsf"
>    [(match_operand:V2SF 0 "gpc_reg_operand" "=f")
>     (match_operand 1 "" "")]
>    "TARGET_PAIRED_FLOAT"
> --- gcc/config/powerpcspe/altivec.md.jj	2017-05-25 10:37:05.000000000 +0200
> +++ gcc/config/powerpcspe/altivec.md	2017-07-24 17:42:49.897966010 +0200
> @@ -301,7 +301,7 @@ (define_split
>    for (i = 0; i < num_elements; i++)
>      RTVEC_ELT (v, i) = constm1_rtx;
>  
> -  emit_insn (gen_vec_initv4si (dest, gen_rtx_PARALLEL (mode, v)));
> +  emit_insn (gen_vec_initv4sisi (dest, gen_rtx_PARALLEL (mode, v)));
>    emit_insn (gen_rtx_SET (dest, gen_rtx_ASHIFT (mode, dest, dest)));
>    DONE;
>  })
> @@ -2222,7 +2222,7 @@ (define_expand "altivec_copysign_v4sf3"
>    RTVEC_ELT (v, 2) = GEN_INT (mask_val);
>    RTVEC_ELT (v, 3) = GEN_INT (mask_val);
>  
> -  emit_insn (gen_vec_initv4si (mask, gen_rtx_PARALLEL (V4SImode, v)));
> +  emit_insn (gen_vec_initv4sisi (mask, gen_rtx_PARALLEL (V4SImode, v)));
>    emit_insn (gen_vector_select_v4sf (operands[0], operands[1], operands[2],
>  				     gen_lowpart (V4SFmode, mask)));
>    DONE;
> @@ -3014,7 +3014,7 @@ (define_expand "vec_unpacku_hi_v16qi"
>    RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 :  0);
>    RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ?  7 : 16);
>  
> -  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> +  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>    emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
>    DONE;
>  }")
> @@ -3050,7 +3050,7 @@ (define_expand "vec_unpacku_hi_v8hi"
>    RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ?  6 : 17);
>    RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ?  7 : 16);
>  
> -  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> +  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>    emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
>    DONE;
>  }")
> @@ -3086,7 +3086,7 @@ (define_expand "vec_unpacku_lo_v16qi"
>    RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 :  8);
>    RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
>  
> -  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> +  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>    emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
>    DONE;
>  }")
> @@ -3122,7 +3122,7 @@ (define_expand "vec_unpacku_lo_v8hi"
>    RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 14 : 17);
>    RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
>  
> -  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> +  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>    emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
>    DONE;
>  }")
> @@ -3363,7 +3363,7 @@ (define_expand "mulv16qi3"
>       = gen_rtx_CONST_INT (QImode, BYTES_BIG_ENDIAN ? 2 * i + 17 : 15 - 2 * i);
>    }
>  
> -  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> +  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>    emit_insn (gen_altivec_vmulesb (even, operands[1], operands[2]));
>    emit_insn (gen_altivec_vmulosb (odd, operands[1], operands[2]));
>    emit_insn (gen_altivec_vperm_v8hiv16qi (operands[0], even, odd, mask));
> 
> 	Jakub
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


* Re: [PATCH] Switch vec_init and vec_extract optabs to 2 mode optab to allow extraction of vector from vector or initialization of vector from smaller vectors (PR target/80846)
  2017-07-25  9:14 [PATCH] Switch vec_init and vec_extract optabs to 2 mode optab to allow extraction of vector from vector or initialization of vector from smaller vectors (PR target/80846) Jakub Jelinek
                   ` (3 preceding siblings ...)
  2017-07-26 10:35 ` Richard Biener
@ 2017-07-26 10:42 ` Uros Bizjak
  2017-07-27 11:43 ` Segher Boessenkool
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Uros Bizjak @ 2017-07-26 10:42 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Richard Biener, David Edelsohn, Segher Boessenkool,
	Marcus Shawcroft, Richard Earnshaw, Andreas Krebbel,
	Matthew Fortune, Eric Botcazou, Andrew Jenner, gcc-patches

On Tue, Jul 25, 2017 at 11:14 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> Hi!
>
> The following patch adjusts the vec_init and vec_extract optabs, so that
> they don't have in the expander names just the vector mode, but also another
> mode, for vec_extract the mode of the result and for vec_init the mode of
> the elts of the vector passed as second operand.
>
> Without this patch, the second mode has been implicit, GET_MODE_INNER of
> the vector mode, so one could just extract a single element from a vector
> or construct vector from elements.  While that is most common, we allow
> in GIMPLE e.g. construction of V8DImode from 4 V2DImode elements etc.
> and the vectorizer uses them.  By having the second mode in the name
> it allows the generic code (vectorizer, expansion) to query whether the
> backend supports such vector from vector expansions or inits from vector
> elts and use them if available.
>
> For vec_extract, if we say want to extract high V2SImode from V4SImode
> the fallback is try to expand it as DImode extraction from V2DImode.
> This works well in many cases, but doesn't really work for very large
> vectors, say if we want to extract high V8SImode from V16SImode on x86,
> we'd need OImode extraction from V2OImode, which is something the backend
> doesn't have any support for.
> For vec_init, the fallback is usually to go through memory, which is slow in
> many cases.
>
> This patch only adds new vector from vector extract and init patterns to
> the i386 backend, but I had to change many other targets too, because
> it needs to have the element mode in the vec_extract/vec_init expander
> names.  Seems most of the backends didn't really have a mode attribute
> usable for this or had it only in uppercase, while for the names we need
> lowercase.  Some backends had a convention on how to name lower case
> vs. upper case modes, others didn't have any.  So I'm CCing maintainers
> of affected backends to seek advice on what mode attributes they want to
> use.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, where it improves
> e.g. the code generation for slp-43.c and slp-45.c testcases.
> make cc1 tested in cross-compilers to the remaining targets.
>
> Ok for trunk?
>
> 2017-07-25  Jakub Jelinek  <jakub@redhat.com>
>
>         PR target/80846
>         * optabs.def (vec_extract_optab, vec_init_optab): Change from
>         a direct optab to conversion optab.
>         * optabs.c (expand_vector_broadcast): Use convert_optab_handler
>         with GET_MODE_INNER as last argument instead of optab_handler.
>         * expmed.c (extract_bit_field_1): Likewise.  Use vector from
>         vector extraction if possible and optab is available.
>         * expr.c (store_constructor): Use convert_optab_handler instead
>         of optab_handler.  Use vector initialization from smaller
>         vectors if possible and optab is available.
>         * tree-vect-stmts.c (vectorizable_load): Likewise.
>         * doc/md.texi (vec_extract, vec_init): Document that the optabs
>         now have two modes.
>         * config/i386/i386.c (ix86_expand_vector_init): Handle expansion
>         of vec_init from half-sized vectors with the same element mode.
>         * config/i386/sse.md (ssehalfvecmode): Add V4TI case.
>         (ssehalfvecmodelower, ssescalarmodelower): New mode attributes.
>         (reduc_plus_scal_v8df, reduc_plus_scal_v4df, reduc_plus_scal_v2df,
>         reduc_plus_scal_v16sf, reduc_plus_scal_v8sf, reduc_plus_scal_v4sf,
>         reduc_<code>_scal_<mode>, reduc_umin_scal_v8hi): Add element mode
>         after mode in gen_vec_extract* calls.
>         (vec_extract<mode>): Renamed to ...
>         (vec_extract<mode><ssescalarmodelower>): ... this.
>         (vec_extract<mode><ssehalfvecmodelower>): New expander.
>         (rotl<mode>3, rotr<mode>3, <shift_insn><mode>3, ashrv2di3): Add
>         element mode after mode in gen_vec_init* calls.
>         (VEC_INIT_HALF_MODE): New mode iterator.
>         (vec_init<mode>): Renamed to ...
>         (vec_init<mode><ssescalarmodelower>): ... this.
>         (vec_init<mode><ssehalfvecmodelower>): New expander.
>         * config/i386/mmx.md (vec_extractv2sf): Renamed to ...
>         (vec_extractv2sfsf): ... this.
>         (vec_initv2sf): Renamed to ...
>         (vec_initv2sfsf): ... this.
>         (vec_extractv2si): Renamed to ...
>         (vec_extractv2sisi): ... this.
>         (vec_initv2si): Renamed to ...
>         (vec_initv2sisi): ... this.
>         (vec_extractv4hi): Renamed to ...
>         (vec_extractv4hihi): ... this.
>         (vec_initv4hi): Renamed to ...
>         (vec_initv4hihi): ... this.
>         (vec_extractv8qi): Renamed to ...
>         (vec_extractv8qiqi): ... this.
>         (vec_initv8qi): Renamed to ...
>         (vec_initv8qiqi): ... this.
>         * config/rs6000/vector.md (VEC_base_l): New mode attribute.
>         (vec_init<mode>): Renamed to ...
>         (vec_init<mode><VEC_base_l>): ... this.
>         (vec_extract<mode>): Renamed to ...
>         (vec_extract<mode><VEC_base_l>): ... this.
>         * config/rs6000/paired.md (vec_initv2sf): Renamed to ...
>         (vec_initv2sfsf): ... this.
>         * config/rs6000/altivec.md (splitter, altivec_copysign_v4sf3,
>         vec_unpacku_hi_v16qi, vec_unpacku_hi_v8hi, vec_unpacku_lo_v16qi,
>         vec_unpacku_lo_v8hi, mulv16qi3, altivec_vreve<mode>2): Add
>         element mode after mode in gen_vec_init* calls.
>         * config/aarch64/aarch64-simd.md (vec_init<mode>): Renamed to ...
>         (vec_init<mode><Vel>): ... this.
>         (vec_extract<mode>): Renamed to ...
>         (vec_extract<mode><Vel>): ... this.
>         * config/aarch64/iterators.md (Vel): New mode attribute.
>         * config/s390/s390.c (s390_expand_vec_strlen, s390_expand_vec_movstr):
>         Add element mode after mode in gen_vec_extract* calls.
>         * config/s390/vector.md (non_vec_l): New mode attribute.
>         (vec_extract<mode>): Renamed to ...
>         (vec_extract<mode><non_vec_l>): ... this.
>         (vec_init<mode>): Renamed to ...
>         (vec_init<mode><non_vec_l>): ... this.
>         * config/s390/s390-builtins.def (s390_vlgvb, s390_vlgvh, s390_vlgvf,
>         s390_vlgvf_flt, s390_vlgvg, s390_vlgvg_dbl): Add element mode after
>         vec_extract mode.
>         * config/arm/iterators.md (V_elem_l): New mode attribute.
>         * config/arm/neon.md (vec_extract<mode>): Renamed to ...
>         (vec_extract<mode><V_elem_l>): ... this.
>         (vec_extractv2di): Renamed to ...
>         (vec_extractv2didi): ... this.
>         (vec_init<mode>): Renamed to ...
>         (vec_init<mode><V_elem_l>): ... this.
>         (reduc_plus_scal_<mode>, reduc_plus_scal_v2di, reduc_smin_scal_<mode>,
>         reduc_smax_scal_<mode>, reduc_umin_scal_<mode>,
>         reduc_umax_scal_<mode>, neon_vget_lane<mode>, neon_vget_laneu<mode>):
>         Add element mode after gen_vec_extract* calls.
>         * config/mips/mips-msa.md (vec_init<mode>): Renamed to ...
>         (vec_init<mode><unitmode>): ... this.
>         (vec_extract<mode>): Renamed to ...
>         (vec_extract<mode><unitmode>): ... this.
>         * config/mips/loongson.md (vec_init<mode>): Renamed to ...
>         (vec_init<mode><unitmode>): ... this.
>         * config/mips/mips-ps-3d.md (vec_initv2sf): Renamed to ...
>         (vec_initv2sfsf): ... this.
>         (vec_extractv2sf): Renamed to ...
>         (vec_extractv2sfsf): ... this.
>         (reduc_plus_scal_v2sf, reduc_smin_scal_v2sf, reduc_smax_scal_v2sf):
>         Add element mode after gen_vec_extract* calls.
>         * config/mips/mips.md (unitmode): New mode iterator.
>         * config/spu/spu.c (spu_expand_prologue, spu_allocate_stack,
>         spu_builtin_extract): Add element mode after gen_vec_extract* calls.
>         * config/spu/spu.md (inner_l): New mode attribute.
>         (vec_init<mode>): Renamed to ...
>         (vec_init<mode><inner_l>): ... this.
>         (vec_extract<mode>): Renamed to ...
>         (vec_extract<mode><inner_l>): ... this.
>         * config/sparc/sparc.md (veltmode): New mode iterator.
>         (vec_init<VMALL:mode>): Renamed to ...
>         (vec_init<VMALL:mode><VMALL:veltmode>): ... this.
>         * config/ia64/vect.md (vec_initv2si): Renamed to ...
>         (vec_initv2sisi): ... this.
>         (vec_initv2sf): Renamed to ...
>         (vec_initv2sfsf): ... this.
>         (vec_extractv2sf): Renamed to ...
>         (vec_extractv2sfsf): ... this.
>         * config/powerpcspe/vector.md (VEC_base_l): New mode attribute.
>         (vec_init<mode>): Renamed to ...
>         (vec_init<mode><VEC_base_l>): ... this.
>         (vec_extract<mode>): Renamed to ...
>         (vec_extract<mode><VEC_base_l>): ... this.
>         * config/powerpcspe/paired.md (vec_initv2sf): Renamed to ...
>         (vec_initv2sfsf): ... this.
>         * config/powerpcspe/altivec.md (splitter, altivec_copysign_v4sf3,
>         vec_unpacku_hi_v16qi, vec_unpacku_hi_v8hi, vec_unpacku_lo_v16qi,
>         vec_unpacku_lo_v8hi, mulv16qi3): Add element mode after mode in
>         gen_vec_init* calls.

OK for the x86 part.

Thanks,
Uros.
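
As a reading aid for the renames listed in the ChangeLog above: with the optabs declared as "vec_init$a$b" and "vec_extract$a$b", the expander name is the base name followed by the vector mode and then the element (or sub-vector) mode, both in lowercase. The following is only a minimal sketch of that naming rule; two_mode_pattern_name is a hypothetical helper written for illustration and is not a function in GCC.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of GCC: compose the expander name that a
   two-mode optab string such as "vec_init$a$b" corresponds to.  MODE_A is
   the vector mode and MODE_B the element or sub-vector mode, both written
   without the "mode" suffix (e.g. "V16QI" and "QI").  Returns the length
   of the name, or -1 if BUF is too small.  */
static int
two_mode_pattern_name (char *buf, size_t len, const char *base,
                       const char *mode_a, const char *mode_b)
{
  assert (buf != NULL && len > 0);

  size_t need = strlen (base) + strlen (mode_a) + strlen (mode_b);
  if (need >= len)
    return -1;

  snprintf (buf, len, "%s%s%s", base, mode_a, mode_b);

  /* Pattern names in the .md files spell mode names in lowercase.  */
  for (char *p = buf; *p; p++)
    *p = (char) tolower ((unsigned char) *p);

  return (int) need;
}
```

Under this sketch, ("vec_init", "V16QI", "QI") gives the vec_initv16qiqi name used in the altivec.md hunk, and ("vec_extract", "V16SI", "V8SI") gives the name a half-vector extract expander for V16SImode would carry.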


* Re: [PATCH] Switch vec_init and vec_extract optabs to 2 mode optab to allow extraction of vector from vector or initialization of vector from smaller vectors (PR target/80846)
  2017-07-25  9:14 [PATCH] Switch vec_init and vec_extract optabs to 2 mode optab to allow extraction of vector from vector or initialization of vector from smaller vectors (PR target/80846) Jakub Jelinek
                   ` (4 preceding siblings ...)
  2017-07-26 10:42 ` Uros Bizjak
@ 2017-07-27 11:43 ` Segher Boessenkool
  2017-07-27 11:56 ` Andreas Krebbel
  2017-08-01  8:09 ` Richard Earnshaw (lists)
  7 siblings, 0 replies; 15+ messages in thread
From: Segher Boessenkool @ 2017-07-27 11:43 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Richard Biener, Uros Bizjak, David Edelsohn, Marcus Shawcroft,
	Richard Earnshaw, Andreas Krebbel, Matthew Fortune,
	Eric Botcazou, Andrew Jenner, gcc-patches

On Tue, Jul 25, 2017 at 11:14:32AM +0200, Jakub Jelinek wrote:
> The following patch adjusts the vec_init and vec_extract optabs, so that
> they don't have in the expander names just the vector mode, but also another
> mode, for vec_extract the mode of the result and for vec_init the mode of
> the elts of the vector passed as second operand.

> Ok for trunk?

I failed to say this explicitly yet: okay for rs6000.


Segher


* Re: [PATCH] Switch vec_init and vec_extract optabs to 2 mode optab to allow extraction of vector from vector or initialization of vector from smaller vectors (PR target/80846)
  2017-07-25  9:14 [PATCH] Switch vec_init and vec_extract optabs to 2 mode optab to allow extraction of vector from vector or initialization of vector from smaller vectors (PR target/80846) Jakub Jelinek
                   ` (5 preceding siblings ...)
  2017-07-27 11:43 ` Segher Boessenkool
@ 2017-07-27 11:56 ` Andreas Krebbel
  2017-08-01  8:09 ` Richard Earnshaw (lists)
  7 siblings, 0 replies; 15+ messages in thread
From: Andreas Krebbel @ 2017-07-27 11:56 UTC (permalink / raw)
  To: Jakub Jelinek, gcc-patches

On 07/25/2017 11:14 AM, Jakub Jelinek wrote:

S/390 parts are ok.

-Andreas-


* Re: [PATCH] Switch vec_init and vec_extract optabs to 2 mode optab to allow extraction of vector from vector or initialization of vector from smaller vectors (PR target/80846)
  2017-07-25  9:14 [PATCH] Switch vec_init and vec_extract optabs to 2 mode optab to allow extraction of vector from vector or initialization of vector from smaller vectors (PR target/80846) Jakub Jelinek
                   ` (6 preceding siblings ...)
  2017-07-27 11:56 ` Andreas Krebbel
@ 2017-08-01  8:09 ` Richard Earnshaw (lists)
  7 siblings, 0 replies; 15+ messages in thread
From: Richard Earnshaw (lists) @ 2017-08-01  8:09 UTC (permalink / raw)
  To: Jakub Jelinek, Richard Biener, Uros Bizjak, David Edelsohn,
	Segher Boessenkool, Marcus Shawcroft, Andreas Krebbel,
	Matthew Fortune, Eric Botcazou, Andrew Jenner
  Cc: gcc-patches

On 25/07/17 10:14, Jakub Jelinek wrote:
> Hi!
> 
> The following patch adjusts the vec_init and vec_extract optabs, so that
> they don't have in the expander names just the vector mode, but also another
> mode, for vec_extract the mode of the result and for vec_init the mode of
> the elts of the vector passed as second operand.
> 
> Without this patch, the second mode has been implicit, GET_MODE_INNER of
> the vector mode, so one could just extract a single element from a vector
> or construct vector from elements.  While that is most common, we allow
> in GIMPLE e.g. construction of V8DImode from 4 V2DImode elements etc.
> and the vectorizer uses them.  By having the second mode in the name
> it allows the generic code (vectorizer, expansion) to query whether the
> backend supports such vector from vector expansions or inits from vector
> elts and use them if available.
> 
> For vec_extract, if we say want to extract high V2SImode from V4SImode
> the fallback is try to expand it as DImode extraction from V2DImode.
> This works well in many cases, but doesn't really work for very large
> vectors, say if we want to extract high V8SImode from V16SImode on x86,
> we'd need OImode extraction from V2OImode, which is something the backend
> doesn't have any support for.
> For vec_init, the fallback is usually to go through memory, which is slow in
> many cases.
> 
> This patch only adds new vector from vector extract and init patterns to
> the i386 backend, but I had to change many other targets too, because
> it needs to have the element mode in the vec_extract/vec_init expander
> names.  Seems most of the backends didn't really have a mode attribute
> usable for this or had it only in uppercase, while for the names we need
> lowercase.  Some backends had a convention on how to name lower case
> vs. upper case modes, others didn't have any.  So I'm CCing maintainers
> of affected backends to seek advice on what mode attributes they want to
> use.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, where it improves
> e.g. the code generation for slp-43.c and slp-45.c testcases.
> make cc1 tested in cross-compilers to the remaining targets.
> 
> Ok for trunk?
> 
> 2017-07-25  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR target/80846
> 	* optabs.def (vec_extract_optab, vec_init_optab): Change from
> 	a direct optab to conversion optab.
> 	* optabs.c (expand_vector_broadcast): Use convert_optab_handler
> 	with GET_MODE_INNER as last argument instead of optab_handler.
> 	* expmed.c (extract_bit_field_1): Likewise.  Use vector from
> 	vector extraction if possible and optab is available.
> 	* expr.c (store_constructor): Use convert_optab_handler instead
> 	of optab_handler.  Use vector initialization from smaller
> 	vectors if possible and optab is available.
> 	* tree-vect-stmts.c (vectorizable_load): Likewise.
> 	* doc/md.texi (vec_extract, vec_init): Document that the optabs
> 	now have two modes.
> 	* config/i386/i386.c (ix86_expand_vector_init): Handle expansion
> 	of vec_init from half-sized vectors with the same element mode.
> 	* config/i386/sse.md (ssehalfvecmode): Add V4TI case.
> 	(ssehalfvecmodelower, ssescalarmodelower): New mode attributes.
> 	(reduc_plus_scal_v8df, reduc_plus_scal_v4df, reduc_plus_scal_v2df,
> 	reduc_plus_scal_v16sf, reduc_plus_scal_v8sf, reduc_plus_scal_v4sf,
> 	reduc_<code>_scal_<mode>, reduc_umin_scal_v8hi): Add element mode
> 	after mode in gen_vec_extract* calls.
> 	(vec_extract<mode>): Renamed to ...
> 	(vec_extract<mode><ssescalarmodelower>): ... this.
> 	(vec_extract<mode><ssehalfvecmodelower>): New expander.
> 	(rotl<mode>3, rotr<mode>3, <shift_insn><mode>3, ashrv2di3): Add
> 	element mode after mode in gen_vec_init* calls.
> 	(VEC_INIT_HALF_MODE): New mode iterator.
> 	(vec_init<mode>): Renamed to ...
> 	(vec_init<mode><ssescalarmodelower>): ... this.
> 	(vec_init<mode><ssehalfvecmodelower>): New expander.
> 	* config/i386/mmx.md (vec_extractv2sf): Renamed to ...
> 	(vec_extractv2sfsf): ... this.
> 	(vec_initv2sf): Renamed to ...
> 	(vec_initv2sfsf): ... this.
> 	(vec_extractv2si): Renamed to ...
> 	(vec_extractv2sisi): ... this.
> 	(vec_initv2si): Renamed to ...
> 	(vec_initv2sisi): ... this.
> 	(vec_extractv4hi): Renamed to ...
> 	(vec_extractv4hihi): ... this.
> 	(vec_initv4hi): Renamed to ...
> 	(vec_initv4hihi): ... this.
> 	(vec_extractv8qi): Renamed to ...
> 	(vec_extractv8qiqi): ... this.
> 	(vec_initv8qi): Renamed to ...
> 	(vec_initv8qiqi): ... this.
> 	* config/rs6000/vector.md (VEC_base_l): New mode attribute.
> 	(vec_init<mode>): Renamed to ...
> 	(vec_init<mode><VEC_base_l>): ... this.
> 	(vec_extract<mode>): Renamed to ...
> 	(vec_extract<mode><VEC_base_l>): ... this.
> 	* config/rs6000/paired.md (vec_initv2sf): Renamed to ...
> 	(vec_initv2sfsf): ... this.
> 	* config/rs6000/altivec.md (splitter, altivec_copysign_v4sf3,
> 	vec_unpacku_hi_v16qi, vec_unpacku_hi_v8hi, vec_unpacku_lo_v16qi,
> 	vec_unpacku_lo_v8hi, mulv16qi3, altivec_vreve<mode>2): Add
> 	element mode after mode in gen_vec_init* calls.
> 	* config/aarch64/aarch64-simd.md (vec_init<mode>): Renamed to ...
> 	(vec_init<mode><Vel>): ... this.
> 	(vec_extract<mode>): Renamed to ...
> 	(vec_extract<mode><Vel>): ... this.
> 	* config/aarch64/iterators.md (Vel): New mode attribute.
> 	* config/s390/s390.c (s390_expand_vec_strlen, s390_expand_vec_movstr):
> 	Add element mode after mode in gen_vec_extract* calls.
> 	* config/s390/vector.md (non_vec_l): New mode attribute.
> 	(vec_extract<mode>): Renamed to ...
> 	(vec_extract<mode><non_vec_l>): ... this.
> 	(vec_init<mode>): Renamed to ...
> 	(vec_init<mode><non_vec_l>): ... this.
> 	* config/s390/s390-builtins.def (s390_vlgvb, s390_vlgvh, s390_vlgvf,
> 	s390_vlgvf_flt, s390_vlgvg, s390_vlgvg_dbl): Add element mode after
> 	vec_extract mode.
> 	* config/arm/iterators.md (V_elem_l): New mode attribute.
> 	* config/arm/neon.md (vec_extract<mode>): Renamed to ...
> 	(vec_extract<mode><V_elem_l>): ... this.
> 	(vec_extractv2di): Renamed to ...
> 	(vec_extractv2didi): ... this.
> 	(vec_init<mode>): Renamed to ...
> 	(vec_init<mode><V_elem_l>): ... this.
> 	(reduc_plus_scal_<mode>, reduc_plus_scal_v2di, reduc_smin_scal_<mode>,
> 	reduc_smax_scal_<mode>, reduc_umin_scal_<mode>,
> 	reduc_umax_scal_<mode>, neon_vget_lane<mode>, neon_vget_laneu<mode>):
> 	Add element mode after gen_vec_extract* calls.
> 	* config/mips/mips-msa.md (vec_init<mode>): Renamed to ...
> 	(vec_init<mode><unitmode>): ... this.
> 	(vec_extract<mode>): Renamed to ...
> 	(vec_extract<mode><unitmode>): ... this.
> 	* config/mips/loongson.md (vec_init<mode>): Renamed to ...
> 	(vec_init<mode><unitmode>): ... this.
> 	* config/mips/mips-ps-3d.md (vec_initv2sf): Renamed to ...
> 	(vec_initv2sfsf): ... this.
> 	(vec_extractv2sf): Renamed to ...
> 	(vec_extractv2sfsf): ... this.
> 	(reduc_plus_scal_v2sf, reduc_smin_scal_v2sf, reduc_smax_scal_v2sf):
> 	Add element mode after gen_vec_extract* calls.
> 	* config/mips/mips.md (unitmode): New mode iterator.
> 	* config/spu/spu.c (spu_expand_prologue, spu_allocate_stack,
> 	spu_builtin_extract): Add element mode after gen_vec_extract* calls.
> 	* config/spu/spu.md (inner_l): New mode attribute.
> 	(vec_init<mode>): Renamed to ...
> 	(vec_init<mode><inner_l>): ... this.
> 	(vec_extract<mode>): Renamed to ...
> 	(vec_extract<mode><inner_l>): ... this.
> 	* config/sparc/sparc.md (veltmode): New mode iterator.
> 	(vec_init<VMALL:mode>): Renamed to ...
> 	(vec_init<VMALL:mode><VMALL:veltmode>): ... this.
> 	* config/ia64/vect.md (vec_initv2si): Renamed to ...
> 	(vec_initv2sisi): ... this.
> 	(vec_initv2sf): Renamed to ...
> 	(vec_initv2sfsf): ... this.
> 	(vec_extractv2sf): Renamed to ...
> 	(vec_extractv2sfsf): ... this.
> 	* config/powerpcspe/vector.md (VEC_base_l): New mode attribute.
> 	(vec_init<mode>): Renamed to ...
> 	(vec_init<mode><VEC_base_l>): ... this.
> 	(vec_extract<mode>): Renamed to ...
> 	(vec_extract<mode><VEC_base_l>): ... this.
> 	* config/powerpcspe/paired.md (vec_initv2sf): Renamed to ...
> 	(vec_initv2sfsf): ... this.
> 	* config/powerpcspe/altivec.md (splitter, altivec_copysign_v4sf3,
> 	vec_unpacku_hi_v16qi, vec_unpacku_hi_v8hi, vec_unpacku_lo_v16qi,
> 	vec_unpacku_lo_v8hi, mulv16qi3): Add element mode after mode in
> 	gen_vec_init* calls.
> 

Arm & AArch64 bits are OK.

R.
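
For reference, the vector-from-vector path that the patch adds to extract_bit_field_1 is guarded by the condition that the requested bit range lies within a single tmode-sized unit of the source vector; the operand index passed to the expander is then bitnum divided by that unit size. A minimal standalone sketch of just that arithmetic follows. The function name and the use of plain unsigned long are assumptions made for illustration; the GCC code uses unsigned HOST_WIDE_INT and GET_MODE_BITSIZE.

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch only, not GCC code: mirror the containment test
     (bitnum + bitsize - 1) / unit == bitnum / unit
   from the patch.  UNIT_BITS stands in for GET_MODE_BITSIZE (tmode).
   On success, store in *POS the unit index that would be passed as
   operand 2 of the vec_extract expander.  */
static bool
field_in_single_unit (unsigned long bitnum, unsigned long bitsize,
                      unsigned long unit_bits, unsigned long *pos)
{
  assert (bitsize > 0 && unit_bits > 0 && pos != 0);

  /* First and last bit of the field must fall in the same unit.  */
  if ((bitnum + bitsize - 1) / unit_bits != bitnum / unit_bits)
    return false;

  *pos = bitnum / unit_bits;
  return true;
}
```

For the high V8SImode half of a V16SImode value (bitnum 256, bitsize 256, unit 256 bits) the sketch reports unit index 1, while a 256-bit field starting at bit 128 straddles two units and is rejected. The sketch checks containment only and says nothing about the other conditions in the patch, such as the optab handler being available.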

> --- gcc/optabs.def.jj	2017-07-24 10:57:45.944815535 +0200
> +++ gcc/optabs.def	2017-07-24 16:11:23.066229910 +0200
> @@ -89,6 +89,8 @@ OPTAB_CD(vec_cmpu_optab, "vec_cmpu$a$b")
>  OPTAB_CD(vec_cmpeq_optab, "vec_cmpeq$a$b")
>  OPTAB_CD(maskload_optab, "maskload$a$b")
>  OPTAB_CD(maskstore_optab, "maskstore$a$b")
> +OPTAB_CD(vec_extract_optab, "vec_extract$a$b")
> +OPTAB_CD(vec_init_optab, "vec_init$a$b")
>  
>  OPTAB_NL(add_optab, "add$P$a3", PLUS, "add", '3', gen_int_fp_fixed_libfunc)
>  OPTAB_NX(add_optab, "add$F$a3")
> @@ -294,8 +296,6 @@ OPTAB_D (udot_prod_optab, "udot_prod$I$a
>  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
>  OPTAB_D (usad_optab, "usad$I$a")
>  OPTAB_D (ssad_optab, "ssad$I$a")
> -OPTAB_D (vec_extract_optab, "vec_extract$a")
> -OPTAB_D (vec_init_optab, "vec_init$a")
>  OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
>  OPTAB_D (vec_pack_ssat_optab, "vec_pack_ssat_$a")
>  OPTAB_D (vec_pack_trunc_optab, "vec_pack_trunc_$a")
> --- gcc/optabs.c.jj	2017-07-24 10:57:46.216812275 +0200
> +++ gcc/optabs.c	2017-07-24 16:11:23.067229898 +0200
> @@ -386,7 +386,8 @@ expand_vector_broadcast (machine_mode vm
>    /* ??? If the target doesn't have a vec_init, then we have no easy way
>       of performing this operation.  Most of this sort of generic support
>       is hidden away in the vector lowering support in gimple.  */
> -  icode = optab_handler (vec_init_optab, vmode);
> +  icode = convert_optab_handler (vec_init_optab, vmode,
> +				 GET_MODE_INNER (vmode));
>    if (icode == CODE_FOR_nothing)
>      return NULL;
>  
> --- gcc/expmed.c.jj	2017-07-24 10:57:45.914815894 +0200
> +++ gcc/expmed.c	2017-07-24 16:11:23.071229850 +0200
> @@ -1566,6 +1566,55 @@ extract_bit_field_1 (rtx str_rtx, unsign
>        return op0;
>      }
>  
> +  /* First try to check for vector from vector extractions.  */
> +  if (VECTOR_MODE_P (GET_MODE (op0))
> +      && !MEM_P (op0)
> +      && VECTOR_MODE_P (tmode)
> +      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (tmode))
> +    {
> +      machine_mode new_mode = GET_MODE (op0);
> +      if (GET_MODE_INNER (new_mode) != GET_MODE_INNER (tmode))
> +	{
> +	  new_mode = mode_for_vector (GET_MODE_INNER (tmode),
> +				      GET_MODE_BITSIZE (GET_MODE (op0))
> +				      / GET_MODE_UNIT_BITSIZE (tmode));
> +	  if (!VECTOR_MODE_P (new_mode)
> +	      || GET_MODE_SIZE (new_mode) != GET_MODE_SIZE (GET_MODE (op0))
> +	      || GET_MODE_INNER (new_mode) != GET_MODE_INNER (tmode)
> +	      || !targetm.vector_mode_supported_p (new_mode))
> +	    new_mode = VOIDmode;
> +	}
> +      if (new_mode != VOIDmode
> +	  && (convert_optab_handler (vec_extract_optab, new_mode, tmode)
> +	      != CODE_FOR_nothing)
> +	  && ((bitnum + bitsize - 1) / GET_MODE_BITSIZE (tmode)
> +	      == bitnum / GET_MODE_BITSIZE (tmode)))
> +	{
> +	  struct expand_operand ops[3];
> +	  machine_mode outermode = new_mode;
> +	  machine_mode innermode = tmode;
> +	  enum insn_code icode
> +	    = convert_optab_handler (vec_extract_optab, outermode, innermode);
> +	  unsigned HOST_WIDE_INT pos = bitnum / GET_MODE_BITSIZE (innermode);
> +
> +	  if (new_mode != GET_MODE (op0))
> +	    op0 = gen_lowpart (new_mode, op0);
> +	  create_output_operand (&ops[0], target, innermode);
> +	  ops[0].target = 1;
> +	  create_input_operand (&ops[1], op0, outermode);
> +	  create_integer_operand (&ops[2], pos);
> +	  if (maybe_expand_insn (icode, 3, ops))
> +	    {
> +	      if (alt_rtl && ops[0].target)
> +		*alt_rtl = target;
> +	      target = ops[0].value;
> +	      if (GET_MODE (target) != mode)
> +		return gen_lowpart (tmode, target);
> +	      return target;
> +	    }
> +	}
> +    }
> +
>    /* See if we can get a better vector mode before extracting.  */
>    if (VECTOR_MODE_P (GET_MODE (op0))
>        && !MEM_P (op0)
> @@ -1599,14 +1648,17 @@ extract_bit_field_1 (rtx str_rtx, unsign
>       available.  */
>    if (VECTOR_MODE_P (GET_MODE (op0))
>        && !MEM_P (op0)
> -      && optab_handler (vec_extract_optab, GET_MODE (op0)) != CODE_FOR_nothing
> +      && (convert_optab_handler (vec_extract_optab, GET_MODE (op0),
> +				 GET_MODE_INNER (GET_MODE (op0)))
> +	  != CODE_FOR_nothing)
>        && ((bitnum + bitsize - 1) / GET_MODE_UNIT_BITSIZE (GET_MODE (op0))
>  	  == bitnum / GET_MODE_UNIT_BITSIZE (GET_MODE (op0))))
>      {
>        struct expand_operand ops[3];
>        machine_mode outermode = GET_MODE (op0);
>        machine_mode innermode = GET_MODE_INNER (outermode);
> -      enum insn_code icode = optab_handler (vec_extract_optab, outermode);
> +      enum insn_code icode
> +	= convert_optab_handler (vec_extract_optab, outermode, innermode);
>        unsigned HOST_WIDE_INT pos = bitnum / GET_MODE_BITSIZE (innermode);
>  
>        create_output_operand (&ops[0], target, innermode);
> --- gcc/expr.c.jj	2017-07-24 10:57:45.963815307 +0200
> +++ gcc/expr.c	2017-07-24 16:11:23.073229826 +0200
> @@ -6589,6 +6589,7 @@ store_constructor (tree exp, rtx target,
>  	rtvec vector = NULL;
>  	unsigned n_elts;
>  	alias_set_type alias;
> +	bool vec_vec_init_p = false;
>  
>  	gcc_assert (eltmode != BLKmode);
>  
> @@ -6596,27 +6597,30 @@ store_constructor (tree exp, rtx target,
>  	if (REG_P (target) && VECTOR_MODE_P (GET_MODE (target)))
>  	  {
>  	    machine_mode mode = GET_MODE (target);
> +	    machine_mode emode = eltmode;
>  
> -	    icode = (int) optab_handler (vec_init_optab, mode);
> -	    /* Don't use vec_init<mode> if some elements have VECTOR_TYPE.  */
> -	    if (icode != CODE_FOR_nothing)
> +	    if (CONSTRUCTOR_NELTS (exp)
> +		&& (TREE_CODE (TREE_TYPE (CONSTRUCTOR_ELT (exp, 0)->value))
> +		    == VECTOR_TYPE))
>  	      {
> -		tree value;
> -
> -		FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (exp), idx, value)
> -		  if (TREE_CODE (TREE_TYPE (value)) == VECTOR_TYPE)
> -		    {
> -		      icode = CODE_FOR_nothing;
> -		      break;
> -		    }
> +		tree etype = TREE_TYPE (CONSTRUCTOR_ELT (exp, 0)->value);
> +		gcc_assert (CONSTRUCTOR_NELTS (exp) * TYPE_VECTOR_SUBPARTS (etype)
> +			    == n_elts);
> +		emode = TYPE_MODE (etype);
>  	      }
> +	    icode = (int) convert_optab_handler (vec_init_optab, mode, emode);
>  	    if (icode != CODE_FOR_nothing)
>  	      {
> -		unsigned int i;
> +		unsigned int i, n = n_elts;
>  
> -		vector = rtvec_alloc (n_elts);
> -		for (i = 0; i < n_elts; i++)
> -		  RTVEC_ELT (vector, i) = CONST0_RTX (GET_MODE_INNER (mode));
> +		if (emode != eltmode)
> +		  {
> +		    n = CONSTRUCTOR_NELTS (exp);
> +		    vec_vec_init_p = true;
> +		  }
> +		vector = rtvec_alloc (n);
> +		for (i = 0; i < n; i++)
> +		  RTVEC_ELT (vector, i) = CONST0_RTX (emode);
>  	      }
>  	  }
>  
> @@ -6634,10 +6638,10 @@ store_constructor (tree exp, rtx target,
>  
>  	    FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (exp), idx, value)
>  	      {
> -		int n_elts_here = tree_to_uhwi
> -		  (int_const_binop (TRUNC_DIV_EXPR,
> -				    TYPE_SIZE (TREE_TYPE (value)),
> -				    TYPE_SIZE (elttype)));
> +		tree sz = TYPE_SIZE (TREE_TYPE (value));
> +		int n_elts_here
> +		  = tree_to_uhwi (int_const_binop (TRUNC_DIV_EXPR, sz,
> +						   TYPE_SIZE (elttype)));
>  
>  		count += n_elts_here;
>  		if (mostly_zeros_p (value))
> @@ -6687,18 +6691,21 @@ store_constructor (tree exp, rtx target,
>  
>  	    if (vector)
>  	      {
> -		/* vec_init<mode> should not be used if there are VECTOR_TYPE
> -		   elements.  */
> -		gcc_assert (TREE_CODE (TREE_TYPE (value)) != VECTOR_TYPE);
> -		RTVEC_ELT (vector, eltpos)
> -		  = expand_normal (value);
> +		if (vec_vec_init_p)
> +		  {
> +		    gcc_assert (ce->index == NULL_TREE);
> +		    gcc_assert (TREE_CODE (TREE_TYPE (value)) == VECTOR_TYPE);
> +		    eltpos = idx;
> +		  }
> +		else
> +		  gcc_assert (TREE_CODE (TREE_TYPE (value)) != VECTOR_TYPE);
> +		RTVEC_ELT (vector, eltpos) = expand_normal (value);
>  	      }
>  	    else
>  	      {
> -		machine_mode value_mode =
> -		  TREE_CODE (TREE_TYPE (value)) == VECTOR_TYPE
> -		  ? TYPE_MODE (TREE_TYPE (value))
> -		  : eltmode;
> +		machine_mode value_mode
> +		  = (TREE_CODE (TREE_TYPE (value)) == VECTOR_TYPE
> +		     ? TYPE_MODE (TREE_TYPE (value)) : eltmode);
>  		bitpos = eltpos * elt_size;
>  		store_constructor_field (target, bitsize, bitpos, 0,
>  					 bitregion_end, value_mode,
> @@ -6707,9 +6714,9 @@ store_constructor (tree exp, rtx target,
>  	  }
>  
>  	if (vector)
> -	  emit_insn (GEN_FCN (icode)
> -		     (target,
> -		      gen_rtx_PARALLEL (GET_MODE (target), vector)));
> +	  emit_insn (GEN_FCN (icode) (target,
> +				      gen_rtx_PARALLEL (GET_MODE (target),
> +							vector)));
>  	break;
>        }
>  
> --- gcc/tree-vect-stmts.c.jj	2017-07-24 10:57:46.004814816 +0200
> +++ gcc/tree-vect-stmts.c	2017-07-24 16:11:23.049230114 +0200
> @@ -6996,29 +6996,43 @@ vectorizable_load (gimple *stmt, gimple_
>  	{
>  	  if (group_size < nunits)
>  	    {
> -	      /* Avoid emitting a constructor of vector elements by performing
> -		 the loads using an integer type of the same size,
> -		 constructing a vector of those and then re-interpreting it
> -		 as the original vector type.  This works around the fact
> -		 that the vec_init optab was only designed for scalar
> -		 element modes and thus expansion goes through memory.
> -		 This avoids a huge runtime penalty due to the general
> -		 inability to perform store forwarding from smaller stores
> -		 to a larger load.  */
> -	      unsigned lsize
> -		= group_size * TYPE_PRECISION (TREE_TYPE (vectype));
> -	      machine_mode elmode = mode_for_size (lsize, MODE_INT, 0);
> -	      machine_mode vmode = mode_for_vector (elmode,
> -						    nunits / group_size);
> -	      /* If we can't construct such a vector fall back to
> -		 element loads of the original vector type.  */
> +	      /* First check if vec_init optab supports construction from
> +		 vector elts directly.  */
> +	      machine_mode elmode = TYPE_MODE (TREE_TYPE (vectype));
> +	      machine_mode vmode = mode_for_vector (elmode, group_size);
>  	      if (VECTOR_MODE_P (vmode)
> -		  && optab_handler (vec_init_optab, vmode) != CODE_FOR_nothing)
> +		  && (convert_optab_handler (vec_init_optab,
> +					     TYPE_MODE (vectype), vmode)
> +		      != CODE_FOR_nothing))
>  		{
>  		  nloads = nunits / group_size;
>  		  lnel = group_size;
> -		  ltype = build_nonstandard_integer_type (lsize, 1);
> -		  lvectype = build_vector_type (ltype, nloads);
> +		  ltype = build_vector_type (TREE_TYPE (vectype), group_size);
> +		}
> +	      else
> +		{
> +		  /* Otherwise avoid emitting a constructor of vector elements
> +		     by performing the loads using an integer type of the same
> +		     size, constructing a vector of those and then
> +		     re-interpreting it as the original vector type.
> +		     This avoids a huge runtime penalty due to the general
> +		     inability to perform store forwarding from smaller stores
> +		     to a larger load.  */
> +		  unsigned lsize
> +		    = group_size * TYPE_PRECISION (TREE_TYPE (vectype));
> +		  elmode = mode_for_size (lsize, MODE_INT, 0);
> +		  vmode = mode_for_vector (elmode, nunits / group_size);
> +		  /* If we can't construct such a vector fall back to
> +		     element loads of the original vector type.  */
> +		  if (VECTOR_MODE_P (vmode)
> +		      && (convert_optab_handler (vec_init_optab, vmode, elmode)
> +			  != CODE_FOR_nothing))
> +		    {
> +		      nloads = nunits / group_size;
> +		      lnel = group_size;
> +		      ltype = build_nonstandard_integer_type (lsize, 1);
> +		      lvectype = build_vector_type (ltype, nloads);
> +		    }
>  		}
>  	    }
>  	  else
> --- gcc/doc/md.texi.jj	2017-07-24 10:57:45.989814996 +0200
> +++ gcc/doc/md.texi	2017-07-24 17:09:55.536882382 +0200
> @@ -4871,15 +4871,22 @@ This pattern is not allowed to @code{FAI
>  Set given field in the vector value.  Operand 0 is the vector to modify,
>  operand 1 is new value of field and operand 2 specify the field index.
>  
> -@cindex @code{vec_extract@var{m}} instruction pattern
> -@item @samp{vec_extract@var{m}}
> +@cindex @code{vec_extract@var{m}@var{n}} instruction pattern
> +@item @samp{vec_extract@var{m}@var{n}}
>  Extract given field from the vector value.  Operand 1 is the vector, operand 2
> -specify field index and operand 0 place to store value into.
> +specify field index and operand 0 place to store value into.  The
> +@var{n} mode is the mode of the field or vector of fields that should be
> +extracted, should be either element mode of the vector mode @var{m}, or
> +a vector mode with the same element mode and smaller number of elements.
> +If @var{n} is a vector mode, the index is counted in units of that mode.
>  
> -@cindex @code{vec_init@var{m}} instruction pattern
> -@item @samp{vec_init@var{m}}
> +@cindex @code{vec_init@var{m}@var{n}} instruction pattern
> +@item @samp{vec_init@var{m}@var{n}}
>  Initialize the vector to given values.  Operand 0 is the vector to initialize
> -and operand 1 is parallel containing values for individual fields.
> +and operand 1 is parallel containing values for individual fields.  The
> +@var{n} mode is the mode of the elements, should be either element mode of
> +the vector mode @var{m}, or a vector mode with the same element mode and
> +smaller number of elements.
>  
>  @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
>  @item @samp{vec_cmp@var{m}@var{n}}
> --- gcc/config/i386/i386.c.jj	2017-07-24 10:58:11.831505333 +0200
> +++ gcc/config/i386/i386.c	2017-07-24 16:11:23.060229982 +0200
> @@ -44297,6 +44297,34 @@ ix86_expand_vector_init (bool mmx_ok, rt
>    int i;
>    rtx x;
>  
> +  /* Handle first initialization from vector elts.  */
> +  if (n_elts != XVECLEN (vals, 0))
> +    {
> +      rtx subtarget = target;
> +      x = XVECEXP (vals, 0, 0);
> +      gcc_assert (GET_MODE_INNER (GET_MODE (x)) == inner_mode);
> +      if (GET_MODE_NUNITS (GET_MODE (x)) * 2 == n_elts)
> +	{
> +	  rtx ops[2] = { XVECEXP (vals, 0, 0), XVECEXP (vals, 0, 1) };
> +	  if (inner_mode == QImode || inner_mode == HImode)
> +	    {
> +	      mode = mode_for_vector (SImode,
> +				      n_elts * GET_MODE_SIZE (inner_mode) / 4);
> +	      inner_mode
> +		= mode_for_vector (SImode,
> +				   n_elts * GET_MODE_SIZE (inner_mode) / 8);
> +	      ops[0] = gen_lowpart (inner_mode, ops[0]);
> +	      ops[1] = gen_lowpart (inner_mode, ops[1]);
> +	      subtarget = gen_reg_rtx (mode);
> +	    }
> +	  ix86_expand_vector_init_concat (mode, subtarget, ops, 2);
> +	  if (subtarget != target)
> +	    emit_move_insn (target, gen_lowpart (GET_MODE (target), subtarget));
> +	  return;
> +	}
> +      gcc_unreachable ();
> +    }
> +
>    for (i = 0; i < n_elts; ++i)
>      {
>        x = XVECEXP (vals, 0, i);
> --- gcc/config/i386/sse.md.jj	2017-07-24 10:57:45.807817176 +0200
> +++ gcc/config/i386/sse.md	2017-07-24 16:54:35.658088768 +0200
> @@ -658,13 +658,21 @@ (define_mode_attr ssedoublevecmode
>  
>  ;; Mapping of vector modes to a vector mode of half size
>  (define_mode_attr ssehalfvecmode
> -  [(V64QI "V32QI") (V32HI "V16HI") (V16SI "V8SI") (V8DI "V4DI")
> +  [(V64QI "V32QI") (V32HI "V16HI") (V16SI "V8SI") (V8DI "V4DI") (V4TI "V2TI")
>     (V32QI "V16QI") (V16HI  "V8HI") (V8SI  "V4SI") (V4DI "V2DI")
>     (V16QI  "V8QI") (V8HI   "V4HI") (V4SI  "V2SI")
>     (V16SF "V8SF") (V8DF "V4DF")
>     (V8SF  "V4SF") (V4DF "V2DF")
>     (V4SF  "V2SF")])
>  
> +(define_mode_attr ssehalfvecmodelower
> +  [(V64QI "v32qi") (V32HI "v16hi") (V16SI "v8si") (V8DI "v4di") (V4TI "v2ti")
> +   (V32QI "v16qi") (V16HI  "v8hi") (V8SI  "v4si") (V4DI "v2di")
> +   (V16QI  "v8qi") (V8HI   "v4hi") (V4SI  "v2si")
> +   (V16SF "v8sf") (V8DF "v4df")
> +   (V8SF  "v4sf") (V4DF "v2df")
> +   (V4SF  "v2sf")])
> +
>  ;; Mapping of vector modes ti packed single mode of the same size
>  (define_mode_attr ssePSmode
>    [(V16SI "V16SF") (V8DF "V16SF")
> @@ -690,6 +698,16 @@ (define_mode_attr ssescalarmode
>     (V8DF "DF")  (V4DF "DF")  (V2DF "DF")
>     (V4TI "TI")  (V2TI "TI")])
>  
> +;; Mapping of vector modes back to the scalar modes
> +(define_mode_attr ssescalarmodelower
> +  [(V64QI "qi") (V32QI "qi") (V16QI "qi")
> +   (V32HI "hi") (V16HI "hi") (V8HI "hi")
> +   (V16SI "si") (V8SI "si")  (V4SI "si")
> +   (V8DI "di")  (V4DI "di")  (V2DI "di")
> +   (V16SF "sf") (V8SF "sf")  (V4SF "sf")
> +   (V8DF "df")  (V4DF "df")  (V2DF "df")
> +   (V4TI "ti")  (V2TI "ti")])
> +
>  ;; Mapping of vector modes to the 128bit modes
>  (define_mode_attr ssexmmmode
>    [(V64QI "V16QI") (V32QI "V16QI") (V16QI "V16QI")
> @@ -2356,7 +2374,7 @@ (define_expand "reduc_plus_scal_v8df"
>  {
>    rtx tmp = gen_reg_rtx (V8DFmode);
>    ix86_expand_reduc (gen_addv8df3, tmp, operands[1]);
> -  emit_insn (gen_vec_extractv8df (operands[0], tmp, const0_rtx));
> +  emit_insn (gen_vec_extractv8dfdf (operands[0], tmp, const0_rtx));
>    DONE;
>  })
>  
> @@ -2371,7 +2389,7 @@ (define_expand "reduc_plus_scal_v4df"
>    emit_insn (gen_avx_haddv4df3 (tmp, operands[1], operands[1]));
>    emit_insn (gen_avx_vperm2f128v4df3 (tmp2, tmp, tmp, GEN_INT (1)));
>    emit_insn (gen_addv4df3 (vec_res, tmp, tmp2));
> -  emit_insn (gen_vec_extractv4df (operands[0], vec_res, const0_rtx));
> +  emit_insn (gen_vec_extractv4dfdf (operands[0], vec_res, const0_rtx));
>    DONE;
>  })
>  
> @@ -2382,7 +2400,7 @@ (define_expand "reduc_plus_scal_v2df"
>  {
>    rtx tmp = gen_reg_rtx (V2DFmode);
>    emit_insn (gen_sse3_haddv2df3 (tmp, operands[1], operands[1]));
> -  emit_insn (gen_vec_extractv2df (operands[0], tmp, const0_rtx));
> +  emit_insn (gen_vec_extractv2dfdf (operands[0], tmp, const0_rtx));
>    DONE;
>  })
>  
> @@ -2393,7 +2411,7 @@ (define_expand "reduc_plus_scal_v16sf"
>  {
>    rtx tmp = gen_reg_rtx (V16SFmode);
>    ix86_expand_reduc (gen_addv16sf3, tmp, operands[1]);
> -  emit_insn (gen_vec_extractv16sf (operands[0], tmp, const0_rtx));
> +  emit_insn (gen_vec_extractv16sfsf (operands[0], tmp, const0_rtx));
>    DONE;
>  })
>  
> @@ -2409,7 +2427,7 @@ (define_expand "reduc_plus_scal_v8sf"
>    emit_insn (gen_avx_haddv8sf3 (tmp2, tmp, tmp));
>    emit_insn (gen_avx_vperm2f128v8sf3 (tmp, tmp2, tmp2, GEN_INT (1)));
>    emit_insn (gen_addv8sf3 (vec_res, tmp, tmp2));
> -  emit_insn (gen_vec_extractv8sf (operands[0], vec_res, const0_rtx));
> +  emit_insn (gen_vec_extractv8sfsf (operands[0], vec_res, const0_rtx));
>    DONE;
>  })
>  
> @@ -2427,7 +2445,7 @@ (define_expand "reduc_plus_scal_v4sf"
>      }
>    else
>      ix86_expand_reduc (gen_addv4sf3, vec_res, operands[1]);
> -  emit_insn (gen_vec_extractv4sf (operands[0], vec_res, const0_rtx));
> +  emit_insn (gen_vec_extractv4sfsf (operands[0], vec_res, const0_rtx));
>    DONE;
>  })
>  
> @@ -2449,7 +2467,8 @@ (define_expand "reduc_<code>_scal_<mode>
>  {
>    rtx tmp = gen_reg_rtx (<MODE>mode);
>    ix86_expand_reduc (gen_<code><mode>3, tmp, operands[1]);
> -  emit_insn (gen_vec_extract<mode> (operands[0], tmp, const0_rtx));
> +  emit_insn (gen_vec_extract<mode><ssescalarmodelower> (operands[0], tmp,
> +							const0_rtx));
>    DONE;
>  })
>  
> @@ -2461,7 +2480,8 @@ (define_expand "reduc_<code>_scal_<mode>
>  {
>    rtx tmp = gen_reg_rtx (<MODE>mode);
>    ix86_expand_reduc (gen_<code><mode>3, tmp, operands[1]);
> -  emit_insn (gen_vec_extract<mode> (operands[0], tmp, const0_rtx));
> +  emit_insn (gen_vec_extract<mode><ssescalarmodelower> (operands[0], tmp,
> +							const0_rtx));
>    DONE;
>  })
>  
> @@ -2473,7 +2493,8 @@ (define_expand "reduc_<code>_scal_<mode>
>  {
>    rtx tmp = gen_reg_rtx (<MODE>mode);
>    ix86_expand_reduc (gen_<code><mode>3, tmp, operands[1]);
> -  emit_insn (gen_vec_extract<mode> (operands[0], tmp, const0_rtx));
> +  emit_insn (gen_vec_extract<mode><ssescalarmodelower> (operands[0], tmp,
> +							const0_rtx));
>    DONE;
>  })
>  
> @@ -2485,7 +2506,7 @@ (define_expand "reduc_umin_scal_v8hi"
>  {
>    rtx tmp = gen_reg_rtx (V8HImode);
>    ix86_expand_reduc (gen_uminv8hi3, tmp, operands[1]);
> -  emit_insn (gen_vec_extractv8hi (operands[0], tmp, const0_rtx));
> +  emit_insn (gen_vec_extractv8hihi (operands[0], tmp, const0_rtx));
>    DONE;
>  })
>  
> @@ -7881,7 +7902,7 @@ (define_mode_iterator VEC_EXTRACT_MODE
>     (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF
>     (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")])
>  
> -(define_expand "vec_extract<mode>"
> +(define_expand "vec_extract<mode><ssescalarmodelower>"
>    [(match_operand:<ssescalarmode> 0 "register_operand")
>     (match_operand:VEC_EXTRACT_MODE 1 "register_operand")
>     (match_operand 2 "const_int_operand")]
> @@ -7892,6 +7913,19 @@ (define_expand "vec_extract<mode>"
>    DONE;
>  })
>  
> +(define_expand "vec_extract<mode><ssehalfvecmodelower>"
> +  [(match_operand:<ssehalfvecmode> 0 "nonimmediate_operand")
> +   (match_operand:V_512 1 "register_operand")
> +   (match_operand 2 "const_0_to_1_operand")]
> +  "TARGET_AVX512F"
> +{
> +  if (INTVAL (operands[2]))
> +    emit_insn (gen_vec_extract_hi_<mode> (operands[0], operands[1]));
> +  else
> +    emit_insn (gen_vec_extract_lo_<mode> (operands[0], operands[1]));
> +  DONE;
> +})
> +
>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>  ;;
>  ;; Parallel double-precision floating point element swizzling
> @@ -16693,7 +16727,7 @@ (define_expand "rotl<mode>3"
>        for (i = 0; i < <ssescalarnum>; i++)
>  	RTVEC_ELT (vs, i) = op2;
>  
> -      emit_insn (gen_vec_init<mode> (reg, par));
> +      emit_insn (gen_vec_init<mode><ssescalarmodelower> (reg, par));
>        emit_insn (gen_xop_vrotl<mode>3 (operands[0], operands[1], reg));
>        DONE;
>      }
> @@ -16725,7 +16759,7 @@ (define_expand "rotr<mode>3"
>        for (i = 0; i < <ssescalarnum>; i++)
>  	RTVEC_ELT (vs, i) = op2;
>  
> -      emit_insn (gen_vec_init<mode> (reg, par));
> +      emit_insn (gen_vec_init<mode><ssescalarmodelower> (reg, par));
>        emit_insn (gen_neg<mode>2 (neg, reg));
>        emit_insn (gen_xop_vrotl<mode>3 (operands[0], operands[1], neg));
>        DONE;
> @@ -17019,7 +17053,7 @@ (define_expand "<shift_insn><mode>3"
>          XVECEXP (par, 0, i) = operands[2];
>  
>        tmp = gen_reg_rtx (V16QImode);
> -      emit_insn (gen_vec_initv16qi (tmp, par));
> +      emit_insn (gen_vec_initv16qiqi (tmp, par));
>  
>        if (negate)
>  	emit_insn (gen_negv16qi2 (tmp, tmp));
> @@ -17055,7 +17089,7 @@ (define_expand "ashrv2di3"
>        for (i = 0; i < 2; i++)
>  	XVECEXP (par, 0, i) = operands[2];
>  
> -      emit_insn (gen_vec_initv2di (reg, par));
> +      emit_insn (gen_vec_initv2didi (reg, par));
>  
>        if (negate)
>  	emit_insn (gen_negv2di2 (reg, reg));
> @@ -18775,7 +18809,7 @@ (define_insn_and_split "avx_<castmode><a
>  				  <ssehalfvecmode>mode);
>  })
>  
> -;; Modes handled by vec_init patterns.
> +;; Modes handled by vec_init expanders.
>  (define_mode_iterator VEC_INIT_MODE
>    [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI
>     (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
> @@ -18785,11 +18819,31 @@ (define_mode_iterator VEC_INIT_MODE
>     (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")
>     (V4TI "TARGET_AVX512F") (V2TI "TARGET_AVX")])
>  
> -(define_expand "vec_init<mode>"
> +;; Likewise, but for initialization from half sized vectors.
> +;; Thus, these are all VEC_INIT_MODE modes except V2??.
> +(define_mode_iterator VEC_INIT_HALF_MODE
> +  [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI
> +   (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
> +   (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
> +   (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX")
> +   (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF
> +   (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX")
> +   (V4TI "TARGET_AVX512F")])
> +
> +(define_expand "vec_init<mode><ssescalarmodelower>"
>    [(match_operand:VEC_INIT_MODE 0 "register_operand")
>     (match_operand 1)]
>    "TARGET_SSE"
>  {
> +  ix86_expand_vector_init (false, operands[0], operands[1]);
> +  DONE;
> +})
> +
> +(define_expand "vec_init<mode><ssehalfvecmodelower>"
> +  [(match_operand:VEC_INIT_HALF_MODE 0 "register_operand")
> +   (match_operand 1)]
> +  "TARGET_SSE"
> +{
>    ix86_expand_vector_init (false, operands[0], operands[1]);
>    DONE;
>  })
> --- gcc/config/i386/mmx.md.jj	2017-07-24 10:57:45.869816434 +0200
> +++ gcc/config/i386/mmx.md	2017-07-24 16:11:23.065229922 +0200
> @@ -641,7 +641,7 @@ (define_split
>    [(set (match_dup 0) (match_dup 1))]
>    "operands[1] = adjust_address (operands[1], SFmode, 4);")
>  
> -(define_expand "vec_extractv2sf"
> +(define_expand "vec_extractv2sfsf"
>    [(match_operand:SF 0 "register_operand")
>     (match_operand:V2SF 1 "register_operand")
>     (match_operand 2 "const_int_operand")]
> @@ -652,7 +652,7 @@ (define_expand "vec_extractv2sf"
>    DONE;
>  })
>  
> -(define_expand "vec_initv2sf"
> +(define_expand "vec_initv2sfsf"
>    [(match_operand:V2SF 0 "register_operand")
>     (match_operand 1)]
>    "TARGET_SSE"
> @@ -1344,7 +1344,7 @@ (define_insn_and_split "*vec_extractv2si
>    operands[1] = adjust_address (operands[1], SImode, INTVAL (operands[2]) * 4);
>  })
>  
> -(define_expand "vec_extractv2si"
> +(define_expand "vec_extractv2sisi"
>    [(match_operand:SI 0 "register_operand")
>     (match_operand:V2SI 1 "register_operand")
>     (match_operand 2 "const_int_operand")]
> @@ -1355,7 +1355,7 @@ (define_expand "vec_extractv2si"
>    DONE;
>  })
>  
> -(define_expand "vec_initv2si"
> +(define_expand "vec_initv2sisi"
>    [(match_operand:V2SI 0 "register_operand")
>     (match_operand 1)]
>    "TARGET_SSE"
> @@ -1375,7 +1375,7 @@ (define_expand "vec_setv4hi"
>    DONE;
>  })
>  
> -(define_expand "vec_extractv4hi"
> +(define_expand "vec_extractv4hihi"
>    [(match_operand:HI 0 "register_operand")
>     (match_operand:V4HI 1 "register_operand")
>     (match_operand 2 "const_int_operand")]
> @@ -1386,7 +1386,7 @@ (define_expand "vec_extractv4hi"
>    DONE;
>  })
>  
> -(define_expand "vec_initv4hi"
> +(define_expand "vec_initv4hihi"
>    [(match_operand:V4HI 0 "register_operand")
>     (match_operand 1)]
>    "TARGET_SSE"
> @@ -1406,7 +1406,7 @@ (define_expand "vec_setv8qi"
>    DONE;
>  })
>  
> -(define_expand "vec_extractv8qi"
> +(define_expand "vec_extractv8qiqi"
>    [(match_operand:QI 0 "register_operand")
>     (match_operand:V8QI 1 "register_operand")
>     (match_operand 2 "const_int_operand")]
> @@ -1417,7 +1417,7 @@ (define_expand "vec_extractv8qi"
>    DONE;
>  })
>  
> -(define_expand "vec_initv8qi"
> +(define_expand "vec_initv8qiqi"
>    [(match_operand:V8QI 0 "register_operand")
>     (match_operand 1)]
>    "TARGET_SSE"
> --- gcc/config/rs6000/vector.md.jj	2017-06-08 20:50:49.000000000 +0200
> +++ gcc/config/rs6000/vector.md	2017-07-24 17:44:44.699580927 +0200
> @@ -74,6 +74,16 @@ (define_mode_attr VEC_base [(V16QI "QI")
>  			    (V1TI  "TI")
>  			    (TI    "TI")])
>  
> +;; As above, but in lower case
> +(define_mode_attr VEC_base_l [(V16QI "qi")
> +			      (V8HI  "hi")
> +			      (V4SI  "si")
> +			      (V2DI  "di")
> +			      (V4SF  "sf")
> +			      (V2DF  "df")
> +			      (V1TI  "ti")
> +			      (TI    "ti")])
> +
>  ;; Same size integer type for floating point data
>  (define_mode_attr VEC_int [(V4SF  "v4si")
>  			   (V2DF  "v2di")])
> @@ -1016,7 +1026,7 @@ (define_expand "fixuns_trunc<mode><VEC_i
>  
>  \f
>  ;; Vector initialization, set, extract
> -(define_expand "vec_init<mode>"
> +(define_expand "vec_init<mode><VEC_base_l>"
>    [(match_operand:VEC_E 0 "vlogical_operand" "")
>     (match_operand:VEC_E 1 "" "")]
>    "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)"
> @@ -1035,7 +1045,7 @@ (define_expand "vec_set<mode>"
>    DONE;
>  })
>  
> -(define_expand "vec_extract<mode>"
> +(define_expand "vec_extract<mode><VEC_base_l>"
>    [(match_operand:<VEC_base> 0 "register_operand" "")
>     (match_operand:VEC_E 1 "vlogical_operand" "")
>     (match_operand 2 "const_int_operand" "")]
> --- gcc/config/rs6000/paired.md.jj	2017-06-08 20:50:49.000000000 +0200
> +++ gcc/config/rs6000/paired.md	2017-07-24 17:48:20.324985029 +0200
> @@ -377,7 +377,7 @@ (define_insn "paired_muls1"
>    "ps_muls1 %0, %1, %2"
>    [(set_attr "type" "fp")])
>  
> -(define_expand "vec_initv2sf"
> +(define_expand "vec_initv2sfsf"
>    [(match_operand:V2SF 0 "gpc_reg_operand" "=f")
>     (match_operand 1 "" "")]
>    "TARGET_PAIRED_FLOAT"
> --- gcc/config/rs6000/altivec.md.jj	2017-07-24 10:58:12.000000000 +0200
> +++ gcc/config/rs6000/altivec.md	2017-07-24 17:48:49.573633038 +0200
> @@ -311,7 +311,7 @@ (define_split
>    for (i = 0; i < num_elements; i++)
>      RTVEC_ELT (v, i) = constm1_rtx;
>  
> -  emit_insn (gen_vec_initv4si (dest, gen_rtx_PARALLEL (mode, v)));
> +  emit_insn (gen_vec_initv4sisi (dest, gen_rtx_PARALLEL (mode, v)));
>    emit_insn (gen_rtx_SET (dest, gen_rtx_ASHIFT (mode, dest, dest)));
>    DONE;
>  })
> @@ -2267,7 +2267,7 @@ (define_expand "altivec_copysign_v4sf3"
>    RTVEC_ELT (v, 2) = GEN_INT (mask_val);
>    RTVEC_ELT (v, 3) = GEN_INT (mask_val);
>  
> -  emit_insn (gen_vec_initv4si (mask, gen_rtx_PARALLEL (V4SImode, v)));
> +  emit_insn (gen_vec_initv4sisi (mask, gen_rtx_PARALLEL (V4SImode, v)));
>    emit_insn (gen_vector_select_v4sf (operands[0], operands[1], operands[2],
>  				     gen_lowpart (V4SFmode, mask)));
>    DONE;
> @@ -3409,7 +3409,7 @@ (define_expand "vec_unpacku_hi_v16qi"
>    RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 :  0);
>    RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ?  7 : 16);
>  
> -  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> +  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>    emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
>    DONE;
>  }")
> @@ -3445,7 +3445,7 @@ (define_expand "vec_unpacku_hi_v8hi"
>    RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ?  6 : 17);
>    RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ?  7 : 16);
>  
> -  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> +  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>    emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
>    DONE;
>  }")
> @@ -3481,7 +3481,7 @@ (define_expand "vec_unpacku_lo_v16qi"
>    RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 :  8);
>    RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
>  
> -  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> +  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>    emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
>    DONE;
>  }")
> @@ -3517,7 +3517,7 @@ (define_expand "vec_unpacku_lo_v8hi"
>    RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 14 : 17);
>    RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
>  
> -  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> +  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>    emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
>    DONE;
>  }")
> @@ -3758,7 +3758,7 @@ (define_expand "mulv16qi3"
>       = gen_rtx_CONST_INT (QImode, BYTES_BIG_ENDIAN ? 2 * i + 17 : 15 - 2 * i);
>    }
>  
> -  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> +  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>    emit_insn (gen_altivec_vmulesb (even, operands[1], operands[2]));
>    emit_insn (gen_altivec_vmulosb (odd, operands[1], operands[2]));
>    emit_insn (gen_altivec_vperm_v8hiv16qi (operands[0], even, odd, mask));
> @@ -3804,7 +3804,7 @@ (define_expand "altivec_vreve<mode>2"
>        RTVEC_ELT (v, i + j * size)
>  	= GEN_INT (i + (num_elements - 1 - j) * size);
>  
> -  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> +  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>    emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[1],
>  	     operands[1], mask));
>    DONE;
> --- gcc/config/aarch64/aarch64-simd.md.jj	2017-07-24 15:01:21.000000000 +0200
> +++ gcc/config/aarch64/aarch64-simd.md	2017-07-24 17:19:05.660170375 +0200
> @@ -5617,9 +5617,9 @@ (define_expand "aarch64_set_qreg<VSTRUCT
>    DONE;
>  })
>  
> -;; Standard pattern name vec_init<mode>.
> +;; Standard pattern name vec_init<mode><Vel>.
>  
> -(define_expand "vec_init<mode>"
> +(define_expand "vec_init<mode><Vel>"
>    [(match_operand:VALL_F16 0 "register_operand" "")
>     (match_operand 1 "" "")]
>    "TARGET_SIMD"
> @@ -5674,9 +5674,9 @@ (define_insn "aarch64_urecpe<mode>"
>   "urecpe\\t%0.<Vtype>, %1.<Vtype>"
>    [(set_attr "type" "neon_fp_recpe_<Vetype><q>")])
>  
> -;; Standard pattern name vec_extract<mode>.
> +;; Standard pattern name vec_extract<mode><Vel>.
>  
> -(define_expand "vec_extract<mode>"
> +(define_expand "vec_extract<mode><Vel>"
>    [(match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "")
>     (match_operand:VALL_F16 1 "register_operand" "")
>     (match_operand:SI 2 "immediate_operand" "")]
> --- gcc/config/aarch64/iterators.md.jj	2017-03-19 11:57:22.000000000 +0100
> +++ gcc/config/aarch64/iterators.md	2017-07-24 17:17:50.318091273 +0200
> @@ -520,6 +520,17 @@ (define_mode_attr VEL [(V8QI "QI") (V16Q
>  			(SI   "SI") (HI   "HI")
>  			(QI   "QI")])
>  
> +;; Define element mode for each vector mode (lower case).
> +(define_mode_attr Vel [(V8QI "qi") (V16QI "qi")
> +			(V4HI "hi") (V8HI "hi")
> +			(V2SI "si") (V4SI "si")
> +			(DI "di")   (V2DI "di")
> +			(V4HF "hf") (V8HF "hf")
> +			(V2SF "sf") (V4SF "sf")
> +			(V2DF "df") (DF "df")
> +			(SI   "si") (HI   "hi")
> +			(QI   "qi")])
> +
>  ;; 64-bit container modes the inner or scalar source mode.
>  (define_mode_attr VCOND [(HI "V4HI") (SI "V2SI")
>  			 (V4HI "V4HI") (V8HI "V4HI")
> --- gcc/config/s390/s390.c.jj	2017-07-17 10:08:39.000000000 +0200
> +++ gcc/config/s390/s390.c	2017-07-24 17:58:24.416715142 +0200
> @@ -5792,7 +5792,7 @@ s390_expand_vec_strlen (rtx target, rtx
>    add_int_reg_note (s390_emit_ccraw_jump (8, NE, loop_start_label),
>  		    REG_BR_PROB,
>  		    profile_probability::very_likely ().to_reg_br_prob_note ());
> -  emit_insn (gen_vec_extractv16qi (len, result_reg, GEN_INT (7)));
> +  emit_insn (gen_vec_extractv16qiqi (len, result_reg, GEN_INT (7)));
>  
>    /* If the string pointer wasn't aligned we have loaded less then 16
>       bytes and the remaining bytes got filled with zeros (by vll).
> @@ -5850,7 +5850,7 @@ s390_expand_vec_movstr (rtx result, rtx
>    emit_insn (gen_vlbb (vsrc, src, GEN_INT (6)));
>    emit_insn (gen_lcbb (loadlen, src_addr, GEN_INT (6)));
>    emit_insn (gen_vfenezv16qi (vpos, vsrc, vsrc));
> -  emit_insn (gen_vec_extractv16qi (gpos_qi, vpos, GEN_INT (7)));
> +  emit_insn (gen_vec_extractv16qiqi (gpos_qi, vpos, GEN_INT (7)));
>    emit_move_insn (gpos, gen_rtx_SUBREG (SImode, gpos_qi, 0));
>    /* gpos is the byte index if a zero was found and 16 otherwise.
>       So if it is lower than the loaded bytes we have a hit.  */
> @@ -5928,7 +5928,7 @@ s390_expand_vec_movstr (rtx result, rtx
>    force_expand_binop (Pmode, add_optab, dst_addr_reg, offset, dst_addr_reg,
>  		      1, OPTAB_DIRECT);
>  
> -  emit_insn (gen_vec_extractv16qi (gpos_qi, vpos, GEN_INT (7)));
> +  emit_insn (gen_vec_extractv16qiqi (gpos_qi, vpos, GEN_INT (7)));
>    emit_move_insn (gpos, gen_rtx_SUBREG (SImode, gpos_qi, 0));
>  
>    emit_insn (gen_vstlv16qi (vsrc, gpos, gen_rtx_MEM (BLKmode, dst_addr_reg)));
> --- gcc/config/s390/vector.md.jj	2017-04-25 15:51:31.000000000 +0200
> +++ gcc/config/s390/vector.md	2017-07-24 17:57:37.665277768 +0200
> @@ -90,6 +90,17 @@ (define_mode_attr non_vec[(V1QI "QI") (V
>  			  (V1DF "DF") (V2DF "DF")
>  			  (V1TF "TF") (TF "TF")])
>  
> +; Like above, but in lower case.
> +(define_mode_attr non_vec_l[(V1QI "qi") (V2QI "qi") (V4QI "qi") (V8QI "qi")
> +			    (V16QI "qi")
> +			    (V1HI "hi") (V2HI "hi") (V4HI "hi") (V8HI "hi")
> +			    (V1SI "si") (V2SI "si") (V4SI "si")
> +			    (V1DI "di") (V2DI "di")
> +			    (V1TI "ti") (TI "ti")
> +			    (V1SF "sf") (V2SF "sf") (V4SF "sf")
> +			    (V1DF "df") (V2DF "df")
> +			    (V1TF "tf") (TF "tf")])
> +
>  ; The instruction suffix for integer instructions and instructions
>  ; which do not care about whether it is floating point or integer.
>  (define_mode_attr bhfgq[(V1QI "b") (V2QI "b") (V4QI "b") (V8QI "b") (V16QI "b")
> @@ -453,7 +464,7 @@ (define_insn "*vec_set<mode>_plus"
>  ; FIXME: Support also vector mode operands for 0
>  ; FIXME: This should be (vec_select ..) or something but it does only allow constant selectors :(
>  ; This is used via RTL standard name as well as for expanding the builtin
> -(define_expand "vec_extract<mode>"
> +(define_expand "vec_extract<mode><non_vec_l>"
>    [(set (match_operand:<non_vec> 0 "nonimmediate_operand" "")
>  	(unspec:<non_vec> [(match_operand:V  1 "register_operand" "")
>  			   (match_operand:SI 2 "nonmemory_operand" "")]
> @@ -485,7 +496,7 @@ (define_insn "*vec_extract<mode>_plus"
>    "vlgv<bhfgq>\t%0,%v1,%Y3(%2)"
>    [(set_attr "op_type" "VRS")])
>  
> -(define_expand "vec_init<mode>"
> +(define_expand "vec_init<mode><non_vec_l>"
>    [(match_operand:V_128 0 "register_operand" "")
>     (match_operand:V_128 1 "nonmemory_operand" "")]
>    "TARGET_VX"
> --- gcc/config/s390/s390-builtins.def.jj	2017-03-24 15:08:56.000000000 +0100
> +++ gcc/config/s390/s390-builtins.def	2017-07-24 18:02:22.571849086 +0200
> @@ -450,12 +450,12 @@ OB_DEF_VAR (s390_vec_extract_u64,
>  OB_DEF_VAR (s390_vec_extract_b64,       s390_vlgvg,         0,                  O2_ELEM,            BT_OV_ULONGLONG_BV2DI_INT)
>  OB_DEF_VAR (s390_vec_extract_dbl,       s390_vlgvg_dbl,     0,                  O2_ELEM,            BT_OV_DBL_V2DF_INT)                      /* vlgvg */
>  
> -B_DEF      (s390_vlgvb,                 vec_extractv16qi,   0,                  B_VX,               O2_ELEM,            BT_FN_UCHAR_UV16QI_INT)
> -B_DEF      (s390_vlgvh,                 vec_extractv8hi,    0,                  B_VX,               O2_ELEM,            BT_FN_USHORT_UV8HI_INT)
> -B_DEF      (s390_vlgvf,                 vec_extractv4si,    0,                  B_VX,               O2_ELEM,            BT_FN_UINT_UV4SI_INT)
> -B_DEF      (s390_vlgvf_flt,             vec_extractv4sf,    0,                  B_INT | B_VXE,      O2_ELEM,            BT_FN_FLT_V4SF_INT)
> -B_DEF      (s390_vlgvg,                 vec_extractv2di,    0,                  B_VX,               O2_ELEM,            BT_FN_ULONGLONG_UV2DI_INT)
> -B_DEF      (s390_vlgvg_dbl,             vec_extractv2df,    0,                  B_INT | B_VX,       O2_ELEM,            BT_FN_DBL_V2DF_INT)
> +B_DEF      (s390_vlgvb,                 vec_extractv16qiqi, 0,                  B_VX,               O2_ELEM,            BT_FN_UCHAR_UV16QI_INT)
> +B_DEF      (s390_vlgvh,                 vec_extractv8hihi,  0,                  B_VX,               O2_ELEM,            BT_FN_USHORT_UV8HI_INT)
> +B_DEF      (s390_vlgvf,                 vec_extractv4sisi,  0,                  B_VX,               O2_ELEM,            BT_FN_UINT_UV4SI_INT)
> +B_DEF      (s390_vlgvf_flt,             vec_extractv4sfsf,  0,                  B_INT | B_VXE,      O2_ELEM,            BT_FN_FLT_V4SF_INT)
> +B_DEF      (s390_vlgvg,                 vec_extractv2didi,  0,                  B_VX,               O2_ELEM,            BT_FN_ULONGLONG_UV2DI_INT)
> +B_DEF      (s390_vlgvg_dbl,             vec_extractv2dfdf,  0,                  B_INT | B_VX,       O2_ELEM,            BT_FN_DBL_V2DF_INT)
>  
>  OB_DEF     (s390_vec_insert_and_zero,   s390_vec_insert_and_zero_s8,s390_vec_insert_and_zero_dbl,B_VX,BT_FN_OV4SI_INTCONSTPTR)
>  OB_DEF_VAR (s390_vec_insert_and_zero_s8,s390_vllezb,        0,                  0,                  BT_OV_V16QI_SCHARCONSTPTR)
> --- gcc/config/arm/iterators.md.jj	2017-05-05 09:20:02.000000000 +0200
> +++ gcc/config/arm/iterators.md	2017-07-24 17:25:15.665681575 +0200
> @@ -444,6 +444,14 @@ (define_mode_attr V_elem [(V8QI "QI") (V
>                            (V2SF "SF") (V4SF "SF")
>                            (DI "DI")   (V2DI "DI")])
>  
> +;; As above but in lower case.
> +(define_mode_attr V_elem_l [(V8QI "qi") (V16QI "qi")
> +			    (V4HI "hi") (V8HI "hi")
> +			    (V4HF "hf") (V8HF "hf")
> +			    (V2SI "si") (V4SI "si")
> +			    (V2SF "sf") (V4SF "sf")
> +			    (DI "di")   (V2DI "di")])
> +
>  ;; Element modes for vector extraction, padded up to register size.
>  
>  (define_mode_attr V_ext [(V8QI "SI") (V16QI "SI")
> --- gcc/config/arm/neon.md.jj	2017-07-17 10:08:41.000000000 +0200
> +++ gcc/config/arm/neon.md	2017-07-24 17:27:42.173917259 +0200
> @@ -412,7 +412,7 @@ (define_expand "vec_set<mode>"
>    DONE;
>  })
>  
> -(define_insn "vec_extract<mode>"
> +(define_insn "vec_extract<mode><V_elem_l>"
>    [(set (match_operand:<V_elem> 0 "nonimmediate_operand" "=Um,r")
>          (vec_select:<V_elem>
>            (match_operand:VD_LANE 1 "s_register_operand" "w,w")
> @@ -434,7 +434,7 @@ (define_insn "vec_extract<mode>"
>    [(set_attr "type" "neon_store1_one_lane<q>,neon_to_gp<q>")]
>  )
>  
> -(define_insn "vec_extract<mode>"
> +(define_insn "vec_extract<mode><V_elem_l>"
>    [(set (match_operand:<V_elem> 0 "nonimmediate_operand" "=Um,r")
>  	(vec_select:<V_elem>
>            (match_operand:VQ2 1 "s_register_operand" "w,w")
> @@ -460,7 +460,7 @@ (define_insn "vec_extract<mode>"
>    [(set_attr "type" "neon_store1_one_lane<q>,neon_to_gp<q>")]
>  )
>  
> -(define_insn "vec_extractv2di"
> +(define_insn "vec_extractv2didi"
>    [(set (match_operand:DI 0 "nonimmediate_operand" "=Um,r")
>  	(vec_select:DI
>            (match_operand:V2DI 1 "s_register_operand" "w,w")
> @@ -479,7 +479,7 @@ (define_insn "vec_extractv2di"
>    [(set_attr "type" "neon_store1_one_lane_q,neon_to_gp_q")]
>  )
>  
> -(define_expand "vec_init<mode>"
> +(define_expand "vec_init<mode><V_elem_l>"
>    [(match_operand:VDQ 0 "s_register_operand" "")
>     (match_operand 1 "" "")]
>    "TARGET_NEON"
> @@ -1581,7 +1581,7 @@ (define_expand "reduc_plus_scal_<mode>"
>    neon_pairwise_reduce (vec, operands[1], <MODE>mode,
>  			&gen_neon_vpadd_internal<mode>);
>    /* The same result is actually computed into every element.  */
> -  emit_insn (gen_vec_extract<mode> (operands[0], vec, const0_rtx));
> +  emit_insn (gen_vec_extract<mode><V_elem_l> (operands[0], vec, const0_rtx));
>    DONE;
>  })
>  
> @@ -1607,7 +1607,7 @@ (define_expand "reduc_plus_scal_v2di"
>    rtx vec = gen_reg_rtx (V2DImode);
>  
>    emit_insn (gen_arm_reduc_plus_internal_v2di (vec, operands[1]));
> -  emit_insn (gen_vec_extractv2di (operands[0], vec, const0_rtx));
> +  emit_insn (gen_vec_extractv2didi (operands[0], vec, const0_rtx));
>  
>    DONE;
>  })
> @@ -1631,7 +1631,7 @@ (define_expand "reduc_smin_scal_<mode>"
>    neon_pairwise_reduce (vec, operands[1], <MODE>mode,
>  			&gen_neon_vpsmin<mode>);
>    /* The result is computed into every element of the vector.  */
> -  emit_insn (gen_vec_extract<mode> (operands[0], vec, const0_rtx));
> +  emit_insn (gen_vec_extract<mode><V_elem_l> (operands[0], vec, const0_rtx));
>    DONE;
>  })
>  
> @@ -1658,7 +1658,7 @@ (define_expand "reduc_smax_scal_<mode>"
>    neon_pairwise_reduce (vec, operands[1], <MODE>mode,
>  			&gen_neon_vpsmax<mode>);
>    /* The result is computed into every element of the vector.  */
> -  emit_insn (gen_vec_extract<mode> (operands[0], vec, const0_rtx));
> +  emit_insn (gen_vec_extract<mode><V_elem_l> (operands[0], vec, const0_rtx));
>    DONE;
>  })
>  
> @@ -1685,7 +1685,7 @@ (define_expand "reduc_umin_scal_<mode>"
>    neon_pairwise_reduce (vec, operands[1], <MODE>mode,
>  			&gen_neon_vpumin<mode>);
>    /* The result is computed into every element of the vector.  */
> -  emit_insn (gen_vec_extract<mode> (operands[0], vec, const0_rtx));
> +  emit_insn (gen_vec_extract<mode><V_elem_l> (operands[0], vec, const0_rtx));
>    DONE;
>  })
>  
> @@ -1711,7 +1711,7 @@ (define_expand "reduc_umax_scal_<mode>"
>    neon_pairwise_reduce (vec, operands[1], <MODE>mode,
>  			&gen_neon_vpumax<mode>);
>    /* The result is computed into every element of the vector.  */
> -  emit_insn (gen_vec_extract<mode> (operands[0], vec, const0_rtx));
> +  emit_insn (gen_vec_extract<mode><V_elem_l> (operands[0], vec, const0_rtx));
>    DONE;
>  })
>  
> @@ -3272,7 +3272,8 @@ (define_expand "neon_vget_lane<mode>"
>      }
>  
>    if (GET_MODE_UNIT_BITSIZE (<MODE>mode) == 32)
> -    emit_insn (gen_vec_extract<mode> (operands[0], operands[1], operands[2]));
> +    emit_insn (gen_vec_extract<mode><V_elem_l> (operands[0], operands[1],
> +						operands[2]));
>    else
>      emit_insn (gen_neon_vget_lane<mode>_sext_internal (operands[0],
>  						       operands[1],
> @@ -3301,7 +3302,8 @@ (define_expand "neon_vget_laneu<mode>"
>      }
>  
>    if (GET_MODE_UNIT_BITSIZE (<MODE>mode) == 32)
> -    emit_insn (gen_vec_extract<mode> (operands[0], operands[1], operands[2]));
> +    emit_insn (gen_vec_extract<mode><V_elem_l> (operands[0], operands[1],
> +						operands[2]));
>    else
>      emit_insn (gen_neon_vget_lane<mode>_zext_internal (operands[0],
>  						       operands[1],
> --- gcc/config/mips/mips-msa.md.jj	2017-03-31 20:36:09.000000000 +0200
> +++ gcc/config/mips/mips-msa.md	2017-07-24 17:33:32.657689124 +0200
> @@ -231,7 +231,7 @@ (define_mode_attr bitimm
>     (V4SI  "uimm5")
>     (V2DI  "uimm6")])
>  
> -(define_expand "vec_init<mode>"
> +(define_expand "vec_init<mode><unitmode>"
>    [(match_operand:MSA 0 "register_operand")
>     (match_operand:MSA 1 "")]
>    "ISA_HAS_MSA"
> @@ -311,7 +311,7 @@ (define_expand "vec_unpacku_lo_<mode>"
>    DONE;
>  })
>  
> -(define_expand "vec_extract<mode>"
> +(define_expand "vec_extract<mode><unitmode>"
>    [(match_operand:<UNITMODE> 0 "register_operand")
>     (match_operand:IMSA 1 "register_operand")
>     (match_operand 2 "const_<indeximm>_operand")]
> @@ -329,7 +329,7 @@ (define_expand "vec_extract<mode>"
>    DONE;
>  })
>  
> -(define_expand "vec_extract<mode>"
> +(define_expand "vec_extract<mode><unitmode>"
>    [(match_operand:<UNITMODE> 0 "register_operand")
>     (match_operand:FMSA 1 "register_operand")
>     (match_operand 2 "const_<indeximm>_operand")]
> --- gcc/config/mips/loongson.md.jj	2017-01-01 12:45:40.000000000 +0100
> +++ gcc/config/mips/loongson.md	2017-07-24 18:08:29.736433972 +0200
> @@ -119,7 +119,7 @@ (define_insn "mov<mode>_internal"
>  
>  ;; Initialization of a vector.
>  
> -(define_expand "vec_init<mode>"
> +(define_expand "vec_init<mode><unitmode>"
>    [(set (match_operand:VWHB 0 "register_operand")
>  	(match_operand 1 ""))]
>    "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
> --- gcc/config/mips/mips-ps-3d.md.jj	2017-01-01 12:45:40.000000000 +0100
> +++ gcc/config/mips/mips-ps-3d.md	2017-07-24 17:34:13.540195876 +0200
> @@ -254,7 +254,7 @@ (define_expand "mips_pll_ps"
>  })
>  
>  ; vec_init
> -(define_expand "vec_initv2sf"
> +(define_expand "vec_initv2sfsf"
>    [(match_operand:V2SF 0 "register_operand")
>     (match_operand:V2SF 1 "")]
>    "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
> @@ -282,7 +282,7 @@ (define_insn "vec_concatv2sf"
>  ;; emulated.  There is no other way to get a vector mode bitfield extract
>  ;; currently.
>  
> -(define_insn "vec_extractv2sf"
> +(define_insn "vec_extractv2sfsf"
>    [(set (match_operand:SF 0 "register_operand" "=f")
>  	(vec_select:SF (match_operand:V2SF 1 "register_operand" "f")
>  		       (parallel
> @@ -379,7 +379,7 @@ (define_expand "reduc_plus_scal_v2sf"
>      rtx temp = gen_reg_rtx (V2SFmode);
>      emit_insn (gen_mips_addr_ps (temp, operands[1], operands[1]));
>      rtx lane = BYTES_BIG_ENDIAN ? const1_rtx : const0_rtx;
> -    emit_insn (gen_vec_extractv2sf (operands[0], temp, lane));
> +    emit_insn (gen_vec_extractv2sfsf (operands[0], temp, lane));
>      DONE;
>    })
>  
> @@ -757,7 +757,7 @@ (define_expand "reduc_smin_scal_v2sf"
>    rtx temp = gen_reg_rtx (V2SFmode);
>    mips_expand_vec_reduc (temp, operands[1], gen_sminv2sf3);
>    rtx lane = BYTES_BIG_ENDIAN ? const1_rtx : const0_rtx;
> -  emit_insn (gen_vec_extractv2sf (operands[0], temp, lane));
> +  emit_insn (gen_vec_extractv2sfsf (operands[0], temp, lane));
>    DONE;
>  })
>  
> @@ -769,6 +769,6 @@ (define_expand "reduc_smax_scal_v2sf"
>    rtx temp = gen_reg_rtx (V2SFmode);
>    mips_expand_vec_reduc (temp, operands[1], gen_smaxv2sf3);
>    rtx lane = BYTES_BIG_ENDIAN ? const1_rtx : const0_rtx;
> -  emit_insn (gen_vec_extractv2sf (operands[0], temp, lane));
> +  emit_insn (gen_vec_extractv2sfsf (operands[0], temp, lane));
>    DONE;
>  })
> --- gcc/config/mips/mips.md.jj	2017-06-15 11:03:32.000000000 +0200
> +++ gcc/config/mips/mips.md	2017-07-24 19:00:15.519582707 +0200
> @@ -917,6 +917,11 @@ (define_mode_attr UNITMODE [(SF "SF") (D
>  			    (V16QI "QI") (V8HI "HI") (V4SI "SI") (V2DI "DI")
>  			    (V2DF "DF")])
>  
> +;; As above, but in lower case.
> +(define_mode_attr unitmode [(SF "sf") (DF "df") (V2SF "sf") (V4SF "sf")
> +			    (V16QI "qi") (V8QI "qi") (V8HI "hi") (V4HI "hi")
> +			    (V4SI "si") (V2SI "si") (V2DI "di") (V2DF "df")])
> +
>  ;; This attribute gives the integer mode that has the same size as a
>  ;; fixed-point mode.
>  (define_mode_attr IMODE [(QQ "QI") (HQ "HI") (SQ "SI") (DQ "DI")
> --- gcc/config/spu/spu.c.jj	2017-07-17 10:08:39.000000000 +0200
> +++ gcc/config/spu/spu.c	2017-07-24 18:06:01.693214125 +0200
> @@ -1773,7 +1773,7 @@ spu_expand_prologue (void)
>  	      size_v4si = scratch_v4si;
>  	    }
>  	  emit_insn (gen_cgt_v4si (scratch_v4si, sp_v4si, size_v4si));
> -	  emit_insn (gen_vec_extractv4si
> +	  emit_insn (gen_vec_extractv4sisi
>  		     (scratch_reg_0, scratch_v4si, GEN_INT (1)));
>  	  emit_insn (gen_spu_heq (scratch_reg_0, GEN_INT (0)));
>  	}
> @@ -5368,7 +5368,7 @@ spu_allocate_stack (rtx op0, rtx op1)
>      {
>        rtx avail = gen_reg_rtx(SImode);
>        rtx result = gen_reg_rtx(SImode);
> -      emit_insn (gen_vec_extractv4si (avail, sp, GEN_INT (1)));
> +      emit_insn (gen_vec_extractv4sisi (avail, sp, GEN_INT (1)));
>        emit_insn (gen_cgt_si(result, avail, GEN_INT (-1)));
>        emit_insn (gen_spu_heq (result, GEN_INT(0) ));
>      }
> @@ -5684,22 +5684,22 @@ spu_builtin_extract (rtx ops[])
>        switch (mode)
>  	{
>  	case V16QImode:
> -	  emit_insn (gen_vec_extractv16qi (ops[0], ops[1], ops[2]));
> +	  emit_insn (gen_vec_extractv16qiqi (ops[0], ops[1], ops[2]));
>  	  break;
>  	case V8HImode:
> -	  emit_insn (gen_vec_extractv8hi (ops[0], ops[1], ops[2]));
> +	  emit_insn (gen_vec_extractv8hihi (ops[0], ops[1], ops[2]));
>  	  break;
>  	case V4SFmode:
> -	  emit_insn (gen_vec_extractv4sf (ops[0], ops[1], ops[2]));
> +	  emit_insn (gen_vec_extractv4sfsf (ops[0], ops[1], ops[2]));
>  	  break;
>  	case V4SImode:
> -	  emit_insn (gen_vec_extractv4si (ops[0], ops[1], ops[2]));
> +	  emit_insn (gen_vec_extractv4sisi (ops[0], ops[1], ops[2]));
>  	  break;
>  	case V2DImode:
> -	  emit_insn (gen_vec_extractv2di (ops[0], ops[1], ops[2]));
> +	  emit_insn (gen_vec_extractv2didi (ops[0], ops[1], ops[2]));
>  	  break;
>  	case V2DFmode:
> -	  emit_insn (gen_vec_extractv2df (ops[0], ops[1], ops[2]));
> +	  emit_insn (gen_vec_extractv2dfdf (ops[0], ops[1], ops[2]));
>  	  break;
>  	default:
>  	  abort ();
> --- gcc/config/spu/spu.md.jj	2017-01-01 12:45:40.000000000 +0100
> +++ gcc/config/spu/spu.md	2017-07-24 18:05:05.591888718 +0200
> @@ -256,6 +256,13 @@ (define_mode_attr inner  [(V16QI "QI")
>  			  (V2DI  "DI")
>  			  (V4SF  "SF")
>  			  (V2DF  "DF")])
> +;; Like above, but in lower case
> +(define_mode_attr inner_l [(V16QI "qi")
> +			   (V8HI  "hi")
> +			   (V4SI  "si")
> +			   (V2DI  "di")
> +			   (V4SF  "sf")
> +			   (V2DF  "df")])
>  (define_mode_attr vmult  [(V16QI "1")
>  			  (V8HI  "2")
>  			  (V4SI  "4")
> @@ -4318,7 +4325,7 @@ (define_expand "restore_stack_nonlocal"
>  ;; vector patterns
>  
>  ;; Vector initialization
> -(define_expand "vec_init<mode>"
> +(define_expand "vec_init<mode><inner_l>"
>    [(match_operand:V 0 "register_operand" "")
>     (match_operand 1 "" "")]
>    ""
> @@ -4347,7 +4354,7 @@ (define_expand "vec_set<mode>"
>      operands[6] = GEN_INT (size);
>    })
>  
> -(define_expand "vec_extract<mode>"
> +(define_expand "vec_extract<mode><inner_l>"
>    [(set (match_operand:<inner> 0 "spu_reg_operand" "=r")
>  	(vec_select:<inner> (match_operand:V 1 "spu_reg_operand" "r")
>  			    (parallel [(match_operand 2 "const_int_operand" "i")])))]
> --- gcc/config/sparc/sparc.md.jj	2017-07-17 10:08:39.000000000 +0200
> +++ gcc/config/sparc/sparc.md	2017-07-24 18:11:52.396997069 +0200
> @@ -8621,6 +8621,8 @@ (define_mode_attr vconstr [(V1SI "f") (V
>  (define_mode_attr vfptype [(V1SI "single") (V2HI "single") (V4QI "single")
>  			   (V1DI "double") (V2SI "double") (V4HI "double")
>  			   (V8QI "double")])
> +(define_mode_attr veltmode [(V1SI "si") (V2HI "hi") (V4QI "qi") (V1DI "di")
> +			    (V2SI "si") (V4HI "hi") (V8QI "qi")])
>  
>  (define_expand "mov<VMALL:mode>"
>    [(set (match_operand:VMALL 0 "nonimmediate_operand" "")
> @@ -8762,7 +8764,7 @@ (define_split
>    DONE;
>  })
>  
> -(define_expand "vec_init<VMALL:mode>"
> +(define_expand "vec_init<VMALL:mode><VMALL:veltmode>"
>    [(match_operand:VMALL 0 "register_operand" "")
>     (match_operand:VMALL 1 "" "")]
>    "TARGET_VIS"
> --- gcc/config/ia64/vect.md.jj	2017-01-01 12:45:42.000000000 +0100
> +++ gcc/config/ia64/vect.md	2017-07-24 17:29:28.996628899 +0200
> @@ -1015,7 +1015,7 @@ (define_insn "*vec_interleave_highv2si"
>  }
>    [(set_attr "itanium_class" "mmshf")])
>  
> -(define_expand "vec_initv2si"
> +(define_expand "vec_initv2sisi"
>    [(match_operand:V2SI 0 "gr_register_operand" "")
>     (match_operand 1 "" "")]
>    ""
> @@ -1299,7 +1299,7 @@ (define_insn "*fselect"
>    "fselect %0 = %F2, %F3, %1"
>    [(set_attr "itanium_class" "fmisc")])
>  
> -(define_expand "vec_initv2sf"
> +(define_expand "vec_initv2sfsf"
>    [(match_operand:V2SF 0 "fr_register_operand" "")
>     (match_operand 1 "" "")]
>    ""
> @@ -1483,7 +1483,7 @@ (define_insn_and_split "*vec_extractv2sf
>    operands[1] = gen_rtx_REG (SFmode, REGNO (operands[1]));
>  })
>  
> -(define_expand "vec_extractv2sf"
> +(define_expand "vec_extractv2sfsf"
>    [(set (match_operand:SF 0 "register_operand" "")
>  	(unspec:SF [(match_operand:V2SF 1 "register_operand" "")
>  		    (match_operand:DI 2 "const_int_operand" "")]
> --- gcc/config/powerpcspe/vector.md.jj	2017-05-25 10:37:03.000000000 +0200
> +++ gcc/config/powerpcspe/vector.md	2017-07-24 17:41:21.897027743 +0200
> @@ -74,6 +74,16 @@ (define_mode_attr VEC_base [(V16QI "QI")
>  			    (V1TI  "TI")
>  			    (TI    "TI")])
>  
> +;; As above, but in lower case
> +(define_mode_attr VEC_base_l [(V16QI "qi")
> +			      (V8HI  "hi")
> +			      (V4SI  "si")
> +			      (V2DI  "di")
> +			      (V4SF  "sf")
> +			      (V2DF  "df")
> +			      (V1TI  "ti")
> +			      (TI    "ti")])
> +
>  ;; Same size integer type for floating point data
>  (define_mode_attr VEC_int [(V4SF  "v4si")
>  			   (V2DF  "v2di")])
> @@ -1017,7 +1027,7 @@ (define_expand "fixuns_trunc<mode><VEC_i
>  
>  \f
>  ;; Vector initialization, set, extract
> -(define_expand "vec_init<mode>"
> +(define_expand "vec_init<mode><VEC_base_l>"
>    [(match_operand:VEC_E 0 "vlogical_operand" "")
>     (match_operand:VEC_E 1 "" "")]
>    "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)"
> @@ -1036,7 +1046,7 @@ (define_expand "vec_set<mode>"
>    DONE;
>  })
>  
> -(define_expand "vec_extract<mode>"
> +(define_expand "vec_extract<mode><VEC_base_l>"
>    [(match_operand:<VEC_base> 0 "register_operand" "")
>     (match_operand:VEC_E 1 "vlogical_operand" "")
>     (match_operand 2 "const_int_operand" "")]
> --- gcc/config/powerpcspe/paired.md.jj	2017-05-25 10:37:04.000000000 +0200
> +++ gcc/config/powerpcspe/paired.md	2017-07-24 17:42:17.980351097 +0200
> @@ -377,7 +377,7 @@ (define_insn "paired_muls1"
>    "ps_muls1 %0, %1, %2"
>    [(set_attr "type" "fp")])
>  
> -(define_expand "vec_initv2sf"
> +(define_expand "vec_initv2sfsf"
>    [(match_operand:V2SF 0 "gpc_reg_operand" "=f")
>     (match_operand 1 "" "")]
>    "TARGET_PAIRED_FLOAT"
> --- gcc/config/powerpcspe/altivec.md.jj	2017-05-25 10:37:05.000000000 +0200
> +++ gcc/config/powerpcspe/altivec.md	2017-07-24 17:42:49.897966010 +0200
> @@ -301,7 +301,7 @@ (define_split
>    for (i = 0; i < num_elements; i++)
>      RTVEC_ELT (v, i) = constm1_rtx;
>  
> -  emit_insn (gen_vec_initv4si (dest, gen_rtx_PARALLEL (mode, v)));
> +  emit_insn (gen_vec_initv4sisi (dest, gen_rtx_PARALLEL (mode, v)));
>    emit_insn (gen_rtx_SET (dest, gen_rtx_ASHIFT (mode, dest, dest)));
>    DONE;
>  })
> @@ -2222,7 +2222,7 @@ (define_expand "altivec_copysign_v4sf3"
>    RTVEC_ELT (v, 2) = GEN_INT (mask_val);
>    RTVEC_ELT (v, 3) = GEN_INT (mask_val);
>  
> -  emit_insn (gen_vec_initv4si (mask, gen_rtx_PARALLEL (V4SImode, v)));
> +  emit_insn (gen_vec_initv4sisi (mask, gen_rtx_PARALLEL (V4SImode, v)));
>    emit_insn (gen_vector_select_v4sf (operands[0], operands[1], operands[2],
>  				     gen_lowpart (V4SFmode, mask)));
>    DONE;
> @@ -3014,7 +3014,7 @@ (define_expand "vec_unpacku_hi_v16qi"
>    RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 :  0);
>    RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ?  7 : 16);
>  
> -  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> +  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>    emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
>    DONE;
>  }")
> @@ -3050,7 +3050,7 @@ (define_expand "vec_unpacku_hi_v8hi"
>    RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ?  6 : 17);
>    RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ?  7 : 16);
>  
> -  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> +  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>    emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
>    DONE;
>  }")
> @@ -3086,7 +3086,7 @@ (define_expand "vec_unpacku_lo_v16qi"
>    RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 :  8);
>    RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
>  
> -  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> +  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>    emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
>    DONE;
>  }")
> @@ -3122,7 +3122,7 @@ (define_expand "vec_unpacku_lo_v8hi"
>    RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 14 : 17);
>    RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
>  
> -  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> +  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>    emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
>    DONE;
>  }")
> @@ -3363,7 +3363,7 @@ (define_expand "mulv16qi3"
>       = gen_rtx_CONST_INT (QImode, BYTES_BIG_ENDIAN ? 2 * i + 17 : 15 - 2 * i);
>    }
>  
> -  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> +  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>    emit_insn (gen_altivec_vmulesb (even, operands[1], operands[2]));
>    emit_insn (gen_altivec_vmulosb (odd, operands[1], operands[2]));
>    emit_insn (gen_altivec_vperm_v8hiv16qi (operands[0], even, odd, mask));
> 
> 	Jakub
> 

Thread overview: 15+ messages
2017-07-25  9:14 [PATCH] Switch vec_init and vec_extract optabs to 2 mode optab to allow extraction of vector from vector or initialization of vector from smaller vectors (PR target/80846) Jakub Jelinek
2017-07-25 21:12 ` Segher Boessenkool
2017-07-26  7:09   ` Jakub Jelinek
2017-07-26  7:29     ` Richard Biener
2017-07-26 11:41     ` Segher Boessenkool
2017-08-01 16:21       ` Jakub Jelinek
2017-08-01 23:57         ` Segher Boessenkool
2017-07-25 21:45 ` Matthew Fortune
2017-07-26  7:25   ` Richard Biener
2017-07-26  7:34 ` Eric Botcazou
2017-07-26 10:35 ` Richard Biener
2017-07-26 10:42 ` Uros Bizjak
2017-07-27 11:43 ` Segher Boessenkool
2017-07-27 11:56 ` Andreas Krebbel
2017-08-01  8:09 ` Richard Earnshaw (lists)
