public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/4][Vectorizer] Reductions: replace VEC_RSHIFT_EXPR with VEC_PERM_EXPR
@ 2014-11-12 17:52 Alan Lawrence
  2014-11-12 17:53 ` [PATCH 1/4][Vectorizer] Split vect_gen_perm_mask into _checked and _any variants Alan Lawrence
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Alan Lawrence @ 2014-11-12 17:52 UTC (permalink / raw)
  To: gcc-patches
  Cc: Richard Biener, David Edelsohn, Catherine Moore, Matthew Fortune

In response to https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01803.html, this 
series removes the VEC_RSHIFT_EXPR, instead using a VEC_PERM_EXPR (with a second 
argument full of constant zeroes) to represent the shift.

I've kept the use of the vec_shr optab for platforms that define it, as even on 
platforms with a whole-vector-shift operation, the shift typically cannot be 
performed as a vec_perm on arbitrary vectors: the shift pulls in zeroes from the 
end, whereas TARGET_VECTORIZE_VEC_PERM_CONST_OK and related machinery can only 
check for a shift-like permutation that works on two arbitrary vectors.

I've also changed from the endianness-dependent shift direction of the old 
VEC_RSHIFT_EXPR to an endian-neutral direction (VEC_PERM_EXPR is inherently 
endian-neutral), changing the meaning of vec_shr_optab to match (as I did in 
https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01475.html). As previously, this 
will break any *big-endian* platform defining vec_shr; I see MIPS and RS6000, 
but candidate fixes for both of these have already been posted:

(for MIPS) https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01481.html, although I 
have not been able to test this as there doesn't seem to be any working 
MIPS/Loongson hardware in the Compile Farm;

(for PowerPC) https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01480.html, testing 
in progress.

ARM defines vec_shr only for little-endian; AArch64 does not yet, although in 
previous patch series I both added a vec_shr and made it endian-neutral (I will 
post revised patches for both of these shortly).

Bootstrapped and check-gcc on x86-none-linux-gnu and arm-none-linux-gnu;
cross-tested on aarch64{,_be}-none-elf (FWIW, both with and without previous 
patches adding a vec_shr pattern)

Ok for trunk if no regressions on PowerPC?

Thanks, Alan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/4][Vectorizer] Split vect_gen_perm_mask into _checked and _any variants
  2014-11-12 17:52 [PATCH 0/4][Vectorizer] Reductions: replace VEC_RSHIFT_EXPR with VEC_PERM_EXPR Alan Lawrence
@ 2014-11-12 17:53 ` Alan Lawrence
  2014-11-13 11:17   ` Richard Biener
  2014-11-12 17:54 ` [PATCH 2/4][Vectorizer] Use a VEC_PERM_EXPR instead of VEC_RSHIFT_EXPR; expand appropriate VEC_PERM_EXPRs using vec_shr_optab Alan Lawrence
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Alan Lawrence @ 2014-11-12 17:53 UTC (permalink / raw)
  To: gcc-patches


This is a preliminary to patch 2, which wants functionality equivalent to 
vect_gen_perm_mask (converting an unsigned char * selector into the equivalent 
VECTOR_CST mask) but without the check of can_vec_perm_p.

All existing calls to vect_gen_perm_mask, barring the one in 
perm_mask_for_reverse, assert that the return value is non-null. Hence, this 
patch splits the existing vect_gen_perm_mask into two: a checked variant, which 
makes the equivalent assertion, and an unchecked variant, which just turns any 
selector into the equivalent VECTOR_CST.

An alternative strategy would be merely to remove the check from 
vect_gen_perm_mask (making it equivalent to this patch's vect_gen_perm_mask_any) 
and to add a preceding assert at all callsites, i.e. changing the many 
"vect_gen_perm_mask (...); gcc_assert (... != NULL);" sequences into 
"gcc_assert (can_vec_perm_p (...)); vect_gen_perm_mask (...);". That would work 
just as well as far as this patch series is concerned.

On its own, bootstrapped on x86-none-linux-gnu (more testing with patches 2+3).

gcc/ChangeLog:

	* tree-vectorizer.h (vect_gen_perm_mask): Remove.
	(vect_gen_perm_mask_checked, vect_gen_perm_mask_any): New.

	* tree-vect-data-refs.c (vect_permute_load_chain,
	vect_permute_store_chain, vect_shift_permute_load_chain): Replace
	vect_gen_perm_mask and assert with vect_gen_perm_mask_checked.

	* tree-vect-stmts.c (vectorizable_mask_load_store, vectorizable_load):
	Likewise.

	(vect_gen_perm_mask_checked): New.
	(vect_gen_perm_mask): Remove can_vec_perm_p check, rename to...
	(vect_gen_perm_mask_any): ...this.

	(perm_mask_for_reverse): Call can_vec_perm_p and
	vect_gen_perm_mask_checked.

[-- Attachment #2: 1_vec_gen_perm_mask.patch --]
[-- Type: text/x-patch; name=1_vec_gen_perm_mask.patch, Size: 11471 bytes --]

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 1d30bdf78cbd66874906c5f7444ce30c4c391feb..4c5b1394517bfcf5c4226143e53cbbe79e0c68b1 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -4623,8 +4623,7 @@ vect_permute_store_chain (vec<tree> dr_chain,
 	      if (3 * i + nelt2 < nelt)
 		sel[3 * i + nelt2] = 0;
 	    }
-	  perm3_mask_low = vect_gen_perm_mask (vectype, sel);
-	  gcc_assert (perm3_mask_low != NULL);
+	  perm3_mask_low = vect_gen_perm_mask_checked (vectype, sel);
 
 	  for (i = 0; i < nelt; i++)
 	    {
@@ -4635,8 +4634,7 @@ vect_permute_store_chain (vec<tree> dr_chain,
 	      if (3 * i + nelt2 < nelt)
 		sel[3 * i + nelt2] = nelt + j2++;
 	    }
-	  perm3_mask_high = vect_gen_perm_mask (vectype, sel);
-	  gcc_assert (perm3_mask_high != NULL);
+	  perm3_mask_high = vect_gen_perm_mask_checked (vectype, sel);
 
 	  vect1 = dr_chain[0];
 	  vect2 = dr_chain[1];
@@ -4675,13 +4673,11 @@ vect_permute_store_chain (vec<tree> dr_chain,
 	  sel[i * 2] = i;
 	  sel[i * 2 + 1] = i + nelt;
 	}
-	perm_mask_high = vect_gen_perm_mask (vectype, sel);
-	gcc_assert (perm_mask_high != NULL);
+	perm_mask_high = vect_gen_perm_mask_checked (vectype, sel);
 
 	for (i = 0; i < nelt; i++)
 	  sel[i] += nelt / 2;
-	perm_mask_low = vect_gen_perm_mask (vectype, sel);
-	gcc_assert (perm_mask_low != NULL);
+	perm_mask_low = vect_gen_perm_mask_checked (vectype, sel);
 
 	for (i = 0, n = log_length; i < n; i++)
 	  {
@@ -5184,8 +5180,7 @@ vect_permute_load_chain (vec<tree> dr_chain,
 	      sel[i] = 3 * i + k;
 	    else
 	      sel[i] = 0;
-	  perm3_mask_low = vect_gen_perm_mask (vectype, sel);
-	  gcc_assert (perm3_mask_low != NULL);
+	  perm3_mask_low = vect_gen_perm_mask_checked (vectype, sel);
 
 	  for (i = 0, j = 0; i < nelt; i++)
 	    if (3 * i + k < 2 * nelt)
@@ -5193,8 +5188,7 @@ vect_permute_load_chain (vec<tree> dr_chain,
 	    else
 	      sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++);
 
-	  perm3_mask_high = vect_gen_perm_mask (vectype, sel);
-	  gcc_assert (perm3_mask_high != NULL);
+	  perm3_mask_high = vect_gen_perm_mask_checked (vectype, sel);
 
 	  first_vect = dr_chain[0];
 	  second_vect = dr_chain[1];
@@ -5228,13 +5222,11 @@ vect_permute_load_chain (vec<tree> dr_chain,
 
       for (i = 0; i < nelt; ++i)
 	sel[i] = i * 2;
-      perm_mask_even = vect_gen_perm_mask (vectype, sel);
-      gcc_assert (perm_mask_even != NULL);
+      perm_mask_even = vect_gen_perm_mask_checked (vectype, sel);
 
       for (i = 0; i < nelt; ++i)
 	sel[i] = i * 2 + 1;
-      perm_mask_odd = vect_gen_perm_mask (vectype, sel);
-      gcc_assert (perm_mask_odd != NULL);
+      perm_mask_odd = vect_gen_perm_mask_checked (vectype, sel);
 
       for (i = 0; i < log_length; i++)
 	{
@@ -5389,8 +5381,7 @@ vect_shift_permute_load_chain (vec<tree> dr_chain,
 			      supported by target\n");
 	  return false;
 	}
-      perm2_mask1 = vect_gen_perm_mask (vectype, sel);
-      gcc_assert (perm2_mask1 != NULL);
+      perm2_mask1 = vect_gen_perm_mask_checked (vectype, sel);
 
       for (i = 0; i < nelt / 2; ++i)
 	sel[i] = i * 2 + 1;
@@ -5404,8 +5395,7 @@ vect_shift_permute_load_chain (vec<tree> dr_chain,
 			      supported by target\n");
 	  return false;
 	}
-      perm2_mask2 = vect_gen_perm_mask (vectype, sel);
-      gcc_assert (perm2_mask2 != NULL);
+      perm2_mask2 = vect_gen_perm_mask_checked (vectype, sel);
 
       /* Generating permutation constant to shift all elements.
 	 For vector length 8 it is {4 5 6 7 8 9 10 11}.  */
@@ -5418,8 +5408,7 @@ vect_shift_permute_load_chain (vec<tree> dr_chain,
 			     "shift permutation is not supported by target\n");
 	  return false;
 	}
-      shift1_mask = vect_gen_perm_mask (vectype, sel);
-      gcc_assert (shift1_mask != NULL);
+      shift1_mask = vect_gen_perm_mask_checked (vectype, sel);
 
       /* Generating permutation constant to select vector from 2.
 	 For vector length 8 it is {0 1 2 3 12 13 14 15}.  */
@@ -5434,8 +5423,7 @@ vect_shift_permute_load_chain (vec<tree> dr_chain,
 			     "select is not supported by target\n");
 	  return false;
 	}
-      select_mask = vect_gen_perm_mask (vectype, sel);
-      gcc_assert (select_mask != NULL);
+      select_mask = vect_gen_perm_mask_checked (vectype, sel);
 
       first_vect = dr_chain[0];
       second_vect = dr_chain[1];
@@ -5494,8 +5482,7 @@ vect_shift_permute_load_chain (vec<tree> dr_chain,
 			      supported by target\n");
 	  return false;
 	}
-      perm3_mask = vect_gen_perm_mask (vectype, sel);
-      gcc_assert (perm3_mask != NULL);
+      perm3_mask = vect_gen_perm_mask_checked (vectype, sel);
 
       /* Generating permutation constant to shift all elements.
 	 For vector length 8 it is {6 7 8 9 10 11 12 13}.  */
@@ -5508,8 +5495,7 @@ vect_shift_permute_load_chain (vec<tree> dr_chain,
 			     "shift permutation is not supported by target\n");
 	  return false;
 	}
-      shift1_mask = vect_gen_perm_mask (vectype, sel);
-      gcc_assert (shift1_mask != NULL);
+      shift1_mask = vect_gen_perm_mask_checked (vectype, sel);
 
       /* Generating permutation constant to shift all elements.
 	 For vector length 8 it is {5 6 7 8 9 10 11 12}.  */
@@ -5522,8 +5508,7 @@ vect_shift_permute_load_chain (vec<tree> dr_chain,
 			     "shift permutation is not supported by target\n");
 	  return false;
 	}
-      shift2_mask = vect_gen_perm_mask (vectype, sel);
-      gcc_assert (shift2_mask != NULL);
+      shift2_mask = vect_gen_perm_mask_checked (vectype, sel);
 
       /* Generating permutation constant to shift all elements.
 	 For vector length 8 it is {3 4 5 6 7 8 9 10}.  */
@@ -5536,8 +5521,7 @@ vect_shift_permute_load_chain (vec<tree> dr_chain,
 			     "shift permutation is not supported by target\n");
 	  return false;
 	}
-      shift3_mask = vect_gen_perm_mask (vectype, sel);
-      gcc_assert (shift3_mask != NULL);
+      shift3_mask = vect_gen_perm_mask_checked (vectype, sel);
 
       /* Generating permutation constant to shift all elements.
 	 For vector length 8 it is {5 6 7 8 9 10 11 12}.  */
@@ -5550,8 +5534,7 @@ vect_shift_permute_load_chain (vec<tree> dr_chain,
 			     "shift permutation is not supported by target\n");
 	  return false;
 	}
-      shift4_mask = vect_gen_perm_mask (vectype, sel);
-      gcc_assert (shift4_mask != NULL);
+      shift4_mask = vect_gen_perm_mask_checked (vectype, sel);
 
       for (k = 0; k < 3; k++)
 	{
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 681479ed911e9ff0fadc8d52576086ed3bbaca5a..3c2d7d7926e7aece6c0aca63210a2d6d591ac1fb 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1913,8 +1913,7 @@ vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi,
 	  for (i = 0; i < gather_off_nunits; ++i)
 	    sel[i] = i | nunits;
 
-	  perm_mask = vect_gen_perm_mask (gather_off_vectype, sel);
-	  gcc_assert (perm_mask != NULL_TREE);
+	  perm_mask = vect_gen_perm_mask_checked (gather_off_vectype, sel);
 	}
       else if (nunits == gather_off_nunits * 2)
 	{
@@ -1925,13 +1924,11 @@ vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi,
 	    sel[i] = i < gather_off_nunits
 		     ? i : i + nunits - gather_off_nunits;
 
-	  perm_mask = vect_gen_perm_mask (vectype, sel);
-	  gcc_assert (perm_mask != NULL_TREE);
+	  perm_mask = vect_gen_perm_mask_checked (vectype, sel);
 	  ncopies *= 2;
 	  for (i = 0; i < nunits; ++i)
 	    sel[i] = i | gather_off_nunits;
-	  mask_perm_mask = vect_gen_perm_mask (masktype, sel);
-	  gcc_assert (mask_perm_mask != NULL_TREE);
+	  mask_perm_mask = vect_gen_perm_mask_checked (masktype, sel);
 	}
       else
 	gcc_unreachable ();
@@ -4936,7 +4933,9 @@ perm_mask_for_reverse (tree vectype)
   for (i = 0; i < nunits; ++i)
     sel[i] = nunits - 1 - i;
 
-  return vect_gen_perm_mask (vectype, sel);
+  if (!can_vec_perm_p (TYPE_MODE (vectype), false, sel))
+    return NULL_TREE;
+  return vect_gen_perm_mask_checked (vectype, sel);
 }
 
 /* Function vectorizable_store.
@@ -5467,21 +5466,19 @@ vectorizable_store (gimple stmt, gimple_stmt_iterator *gsi, gimple *vec_stmt,
   return true;
 }
 
-/* Given a vector type VECTYPE and permutation SEL returns
-   the VECTOR_CST mask that implements the permutation of the
-   vector elements.  If that is impossible to do, returns NULL.  */
+/* Given a vector type VECTYPE, turns permutation SEL into the equivalent
+   VECTOR_CST mask.  No checks are made that the target platform supports the
+   mask, so callers may wish to test can_vec_perm_p separately, or use
+   vect_gen_perm_mask_checked.  */
 
 tree
-vect_gen_perm_mask (tree vectype, unsigned char *sel)
+vect_gen_perm_mask_any (tree vectype, const unsigned char *sel)
 {
   tree mask_elt_type, mask_type, mask_vec, *mask_elts;
   int i, nunits;
 
   nunits = TYPE_VECTOR_SUBPARTS (vectype);
 
-  if (!can_vec_perm_p (TYPE_MODE (vectype), false, sel))
-    return NULL;
-
   mask_elt_type = lang_hooks.types.type_for_mode
 		    (int_mode_for_mode (TYPE_MODE (TREE_TYPE (vectype))), 1);
   mask_type = get_vectype_for_scalar_type (mask_elt_type);
@@ -5494,6 +5491,15 @@ vect_gen_perm_mask (tree vectype, unsigned char *sel)
   return mask_vec;
 }
 
+/* Checked version of vect_gen_perm_mask_any.  Asserts can_vec_perm_p.  */
+
+tree
+vect_gen_perm_mask_checked (tree vectype, const unsigned char *sel)
+{
+  gcc_assert (can_vec_perm_p (TYPE_MODE (vectype), false, sel));
+  return vect_gen_perm_mask_any (vectype, sel);
+}
+
 /* Given a vector variable X and Y, that was generated for the scalar
    STMT, generate instructions to permute the vector elements of X and Y
    using permutation mask MASK_VEC, insert them at *GSI and return the
@@ -5849,8 +5855,7 @@ vectorizable_load (gimple stmt, gimple_stmt_iterator *gsi, gimple *vec_stmt,
 	  for (i = 0; i < gather_off_nunits; ++i)
 	    sel[i] = i | nunits;
 
-	  perm_mask = vect_gen_perm_mask (gather_off_vectype, sel);
-	  gcc_assert (perm_mask != NULL_TREE);
+	  perm_mask = vect_gen_perm_mask_checked (gather_off_vectype, sel);
 	}
       else if (nunits == gather_off_nunits * 2)
 	{
@@ -5861,8 +5866,7 @@ vectorizable_load (gimple stmt, gimple_stmt_iterator *gsi, gimple *vec_stmt,
 	    sel[i] = i < gather_off_nunits
 		     ? i : i + nunits - gather_off_nunits;
 
-	  perm_mask = vect_gen_perm_mask (vectype, sel);
-	  gcc_assert (perm_mask != NULL_TREE);
+	  perm_mask = vect_gen_perm_mask_checked (vectype, sel);
 	  ncopies *= 2;
 	}
       else
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 93aa73e59c6079dc95288979e15c1d47725391b7..d817f9f53a6ed8d88a305207f6f482ef7a30fea3 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1040,7 +1040,8 @@ extern void vect_get_store_cost (struct data_reference *, int,
 extern bool vect_supportable_shift (enum tree_code, tree);
 extern void vect_get_vec_defs (tree, tree, gimple, vec<tree> *,
 			       vec<tree> *, slp_tree, int);
-extern tree vect_gen_perm_mask (tree, unsigned char *);
+extern tree vect_gen_perm_mask_any (tree, const unsigned char *);
+extern tree vect_gen_perm_mask_checked (tree, const unsigned char *);
 
 /* In tree-vect-data-refs.c.  */
 extern bool vect_can_force_dr_alignment_p (const_tree, unsigned int);


* [PATCH 2/4][Vectorizer] Use a VEC_PERM_EXPR instead of VEC_RSHIFT_EXPR; expand appropriate VEC_PERM_EXPRs using vec_shr_optab
  2014-11-12 17:52 [PATCH 0/4][Vectorizer] Reductions: replace VEC_RSHIFT_EXPR with VEC_PERM_EXPR Alan Lawrence
  2014-11-12 17:53 ` [PATCH 1/4][Vectorizer] Split vect_gen_perm_mask into _checked and _any variants Alan Lawrence
@ 2014-11-12 17:54 ` Alan Lawrence
  2014-11-13 11:23   ` Richard Biener
  2014-11-12 17:56 ` [PATCH 3/4] Remove VEC_RSHIFT_EXPR tree code, now unused Alan Lawrence
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Alan Lawrence @ 2014-11-12 17:54 UTC (permalink / raw)
  To: gcc-patches


This makes the vectorizer use VEC_PERM_EXPRs when doing reductions via shifts, 
rather than VEC_RSHIFT_EXPR.

VEC_RSHIFT_EXPR presently has an endianness-dependent meaning (paralleling 
vec_shr_optab). While the overall destination of this patch series is to make 
these endianness-neutral, this patch already feels quite big enough; hence, here 
we just switch to VEC_PERM_EXPRs whose meaning is equivalent to the old 
VEC_RSHIFT_EXPRs. Since VEC_PERM_EXPR is endianness-neutral, the mask we need 
in order to represent the meaning of the old VEC_RSHIFT_EXPR changes according 
to endianness. (Patch 4 completes this journey by removing the 
BYTES_BIG_ENDIAN-conditional parts; an alternative route to the same endpoint 
would have been to first change VEC_RSHIFT_EXPR to be endianness-independent and 
then replace it with VEC_PERM_EXPRs. I posted such a patch to make 
VEC_RSHIFT_EXPR endianness-independent at 
https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01475.html, and that was what led 
Richi to make his suggestion!)

The "trick" here is then to try the target's vec_shr_optab when expanding 
vec_perm_const, *if* the second vector is all constant zeroes and the vec_perm 
mask is appropriate. I felt it was best to keep this case separate from 
can_vec_perm_p, so that the latter only indicates when the target platform can 
apply a given permutation to _arbitrary_input_vectors_: can_vec_perm_p's 
interface is already complicated enough without also making it handle cases 
where some of the vectors-to-be-shuffled are known.

A nice side effect of this patch is that aarch64 targets suddenly perform 
reductions via shifts even *without* a vec_shr_optab, because 
aarch64_vectorize_vec_perm_const_ok looks for shuffle-masks for the EXT 
instruction, which can indeed be used to perform a shift :).

With patch 1, bootstrapped on x86-none-linux-gnu (more testing with patch 3).

gcc/ChangeLog:

	* optabs.c (can_vec_perm_p): Update comment, does not consider vec_shr.
	(shift_amt_for_vec_perm_mask): New.
	(expand_vec_perm_1): Use vec_shr_optab if second vector is const0_rtx
	and mask appropriate.

	* tree-vect-loop.c (calc_vec_perm_mask_for_shift): New.
	(have_whole_vector_shift): New.
	(vect_model_reduction_cost): Call have_whole_vector_shift instead of
	looking for vec_shr_optab.
	(vect_create_epilog_for_reduction): Likewise; also rename local variable
	have_whole_vector_shift to reduce_with_shift; output VEC_PERM_EXPRs
	instead of VEC_RSHIFT_EXPRs.

	* tree-vect-stmts.c (vect_gen_perm_mask_checked): Extend comment.

[-- Attachment #2: 2_use_vec_perm_expr.patch --]
[-- Type: text/x-patch; name=2_use_vec_perm_expr.patch, Size: 10371 bytes --]

diff --git a/gcc/optabs.c b/gcc/optabs.c
index 9452f991a6fb784c6288ad8501a412b83b14c92a..64a6a1345bf88993b9c2a045e67cfcc22c8010a4 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -6575,8 +6575,11 @@ vector_compare_rtx (enum tree_code tcode, tree t_op0, tree t_op1,
   return gen_rtx_fmt_ee (rcode, VOIDmode, ops[0].value, ops[1].value);
 }
 
-/* Return true if VEC_PERM_EXPR can be expanded using SIMD extensions
-   of the CPU.  SEL may be NULL, which stands for an unknown constant.  */
+/* Return true if VEC_PERM_EXPR of arbitrary input vectors can be expanded using
+   SIMD extensions of the CPU.  SEL may be NULL, which stands for an unknown
+   constant.  Note that additional permutations representing whole-vector shifts
+   may also be handled via the vec_shr optab, but only where the second input
+   vector is entirely constant zeroes; this case is not dealt with here.  */
 
 bool
 can_vec_perm_p (enum machine_mode mode, bool variable,
@@ -6629,6 +6632,36 @@ can_vec_perm_p (enum machine_mode mode, bool variable,
   return true;
 }
 
+/* Checks if vec_perm mask SEL is a constant equivalent to a shift of the first
+   vec_perm operand, assuming the second operand is a constant vector of zeroes.
+   Return the shift distance in bits if so, or NULL_RTX if the vec_perm is not a
+   shift.  */
+static rtx
+shift_amt_for_vec_perm_mask (rtx sel)
+{
+  unsigned int i, first, nelt = GET_MODE_NUNITS (GET_MODE (sel));
+  unsigned int bitsize = GET_MODE_BITSIZE (GET_MODE_INNER (GET_MODE (sel)));
+
+  if (GET_CODE (sel) != CONST_VECTOR)
+    return NULL_RTX;
+
+  first = INTVAL (CONST_VECTOR_ELT (sel, 0));
+  if (first >= 2*nelt)
+    return NULL_RTX;
+  for (i = 1; i < nelt; i++)
+    {
+      int idx = INTVAL (CONST_VECTOR_ELT (sel, i));
+      unsigned int expected = (i + first) & (2 * nelt - 1);
+      /* Indices into the second vector are all equivalent.  */
+      if (idx < 0 || (MIN (nelt, (unsigned) idx) != MIN (nelt, expected)))
+	return NULL_RTX;
+    }
+
+  if (BYTES_BIG_ENDIAN)
+    first = (2 * nelt) - first;
+  return GEN_INT (first * bitsize);
+}
+
 /* A subroutine of expand_vec_perm for expanding one vec_perm insn.  */
 
 static rtx
@@ -6657,6 +6690,17 @@ expand_vec_perm_1 (enum insn_code icode, rtx target,
   else
     {
       create_input_operand (&ops[1], v0, tmode);
+      /* See if this can be handled with a vec_shr.  We only do this if the
+         second vector is all zeroes.  */
+      enum insn_code shift_code = optab_handler (vec_shr_optab, GET_MODE (v0));
+      if (v1 == CONST0_RTX (GET_MODE (v1)) && shift_code)
+	if (rtx shift_amt = shift_amt_for_vec_perm_mask (sel))
+	  {
+	    create_convert_operand_from_type (&ops[2], shift_amt,
+					      sizetype_tab[(int) stk_sizetype]);
+	    if (maybe_expand_insn (shift_code, 3, ops))
+	      return ops[0].value;
+	  }
       create_input_operand (&ops[2], v1, tmode);
     }
 
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index a15ce14ef841d4573f7937522bcdaab8d6cb5efe..f1d327f42a5c517a01121135569dd014e49502e0 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -3082,6 +3082,41 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
   *ret_min_profitable_estimate = min_profitable_estimate;
 }
 
+/* Writes into SEL a mask for a vec_perm, equivalent to a vec_shr by OFFSET
+   vector elements (not bits) for a vector of mode MODE.  */
+static void
+calc_vec_perm_mask_for_shift (enum machine_mode mode, unsigned int offset,
+			      unsigned char *sel)
+{
+  unsigned int i, nelt = GET_MODE_NUNITS (mode);
+
+  for (i = 0; i < nelt; i++)
+    sel[i] = (BYTES_BIG_ENDIAN ? i - offset : i + offset) & (2*nelt - 1);
+}
+
+/* Checks whether the target supports whole-vector shifts for vectors of mode
+   MODE.  This is the case if _either_ the platform handles vec_shr_optab, _or_
+   it supports vec_perm_const with masks for all necessary shift amounts.  */
+static bool
+have_whole_vector_shift (enum machine_mode mode)
+{
+  if (optab_handler (vec_shr_optab, mode) != CODE_FOR_nothing)
+    return true;
+
+  if (direct_optab_handler (vec_perm_const_optab, mode) == CODE_FOR_nothing)
+    return false;
+
+  unsigned int i, nelt = GET_MODE_NUNITS (mode);
+  unsigned char *sel = XALLOCAVEC (unsigned char, nelt);
+
+  for (i = nelt/2; i >= 1; i/=2)
+    {
+      calc_vec_perm_mask_for_shift (mode, i, sel);
+      if (!can_vec_perm_p (mode, false, sel))
+	return false;
+    }
+  return true;
+}
 
 /* TODO: Close dependency between vect_model_*_cost and vectorizable_*
    functions. Design better to avoid maintenance issues.  */
@@ -3184,7 +3219,7 @@ vect_model_reduction_cost (stmt_vec_info stmt_info, enum tree_code reduc_code,
 	  /* We have a whole vector shift available.  */
 	  if (VECTOR_MODE_P (mode)
 	      && optab_handler (optab, mode) != CODE_FOR_nothing
-	      && optab_handler (vec_shr_optab, mode) != CODE_FOR_nothing)
+	      && have_whole_vector_shift (mode))
 	    {
 	      /* Final reduction via vector shifts and the reduction operator.
 		 Also requires scalar extract.  */
@@ -3787,7 +3822,6 @@ get_initial_def_for_reduction (gimple stmt, tree init_val,
   return init_def;
 }
 
-
 /* Function vect_create_epilog_for_reduction
 
    Create code at the loop-epilog to finalize the result of a reduction
@@ -4211,18 +4245,11 @@ vect_create_epilog_for_reduction (vec<tree> vect_defs, gimple stmt,
     }
   else
     {
-      enum tree_code shift_code = ERROR_MARK;
-      bool have_whole_vector_shift = true;
-      int bit_offset;
+      bool reduce_with_shift = have_whole_vector_shift (mode);
       int element_bitsize = tree_to_uhwi (bitsize);
       int vec_size_in_bits = tree_to_uhwi (TYPE_SIZE (vectype));
       tree vec_temp;
 
-      if (optab_handler (vec_shr_optab, mode) != CODE_FOR_nothing)
-        shift_code = VEC_RSHIFT_EXPR;
-      else
-        have_whole_vector_shift = false;
-
       /* Regardless of whether we have a whole vector shift, if we're
          emulating the operation via tree-vect-generic, we don't want
          to use it.  Only the first round of the reduction is likely
@@ -4230,18 +4257,24 @@ vect_create_epilog_for_reduction (vec<tree> vect_defs, gimple stmt,
       /* ??? It might be better to emit a reduction tree code here, so that
          tree-vect-generic can expand the first round via bit tricks.  */
       if (!VECTOR_MODE_P (mode))
-        have_whole_vector_shift = false;
+        reduce_with_shift = false;
       else
         {
           optab optab = optab_for_tree_code (code, vectype, optab_default);
           if (optab_handler (optab, mode) == CODE_FOR_nothing)
-            have_whole_vector_shift = false;
+            reduce_with_shift = false;
         }
 
-      if (have_whole_vector_shift && !slp_reduc)
+      if (reduce_with_shift && !slp_reduc)
         {
+          int nelements = vec_size_in_bits / element_bitsize;
+          unsigned char *sel = XALLOCAVEC (unsigned char, nelements);
+
+          int elt_offset;
+
+          tree zero_vec = build_zero_cst (vectype);
           /*** Case 2: Create:
-             for (offset = VS/2; offset >= element_size; offset/=2)
+             for (offset = nelements/2; offset >= 1; offset/=2)
                 {
                   Create:  va' = vec_shift <va, offset>
                   Create:  va = vop <va, va'>
@@ -4253,14 +4286,15 @@ vect_create_epilog_for_reduction (vec<tree> vect_defs, gimple stmt,
 
           vec_dest = vect_create_destination_var (scalar_dest, vectype);
           new_temp = new_phi_result;
-          for (bit_offset = vec_size_in_bits/2;
-               bit_offset >= element_bitsize;
-               bit_offset /= 2)
+          for (elt_offset = nelements / 2;
+               elt_offset >= 1;
+               elt_offset /= 2)
             {
-              tree bitpos = size_int (bit_offset);
-
-              epilog_stmt = gimple_build_assign_with_ops (shift_code,
-                                               vec_dest, new_temp, bitpos);
+              calc_vec_perm_mask_for_shift (mode, elt_offset, sel);
+              tree mask = vect_gen_perm_mask_any (vectype, sel);
+	      epilog_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR,
+							  vec_dest, new_temp,
+							  zero_vec, mask);
               new_name = make_ssa_name (vec_dest, epilog_stmt);
               gimple_assign_set_lhs (epilog_stmt, new_name);
               gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
@@ -4276,8 +4310,6 @@ vect_create_epilog_for_reduction (vec<tree> vect_defs, gimple stmt,
         }
       else
         {
-          tree rhs;
-
           /*** Case 3: Create:
              s = extract_field <v_out2, 0>
              for (offset = element_size;
@@ -4295,11 +4327,12 @@ vect_create_epilog_for_reduction (vec<tree> vect_defs, gimple stmt,
           vec_size_in_bits = tree_to_uhwi (TYPE_SIZE (vectype));
           FOR_EACH_VEC_ELT (new_phis, i, new_phi)
             {
+              int bit_offset;
               if (gimple_code (new_phi) == GIMPLE_PHI)
                 vec_temp = PHI_RESULT (new_phi);
               else
                 vec_temp = gimple_assign_lhs (new_phi);
-              rhs = build3 (BIT_FIELD_REF, scalar_type, vec_temp, bitsize,
+              tree rhs = build3 (BIT_FIELD_REF, scalar_type, vec_temp, bitsize,
                             bitsize_zero_node);
               epilog_stmt = gimple_build_assign (new_scalar_dest, rhs);
               new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 3c2d7d7926e7aece6c0aca63210a2d6d591ac1fb..db464f3fc842b14f25f5a4ae1e18aec91e59e868 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -5491,7 +5491,8 @@ vect_gen_perm_mask_any (tree vectype, const unsigned char *sel)
   return mask_vec;
 }
 
-/* Checked version of vect_gen_perm_mask_any.  Asserts can_vec_perm_p.  */
+/* Checked version of vect_gen_perm_mask_any.  Asserts can_vec_perm_p,
+   i.e. that the target supports the pattern _for arbitrary input vectors_.  */
 
 tree
 vect_gen_perm_mask_checked (tree vectype, const unsigned char *sel)


* [PATCH 3/4] Remove VEC_RSHIFT_EXPR tree code, now unused
  2014-11-12 17:52 [PATCH 0/4][Vectorizer] Reductions: replace VEC_RSHIFT_EXPR with VEC_PERM_EXPR Alan Lawrence
  2014-11-12 17:53 ` [PATCH 1/4][Vectorizer] Split vect_gen_perm_mask into _checked and _any variants Alan Lawrence
  2014-11-12 17:54 ` [PATCH 2/4][Vectorizer] Use a VEC_PERM_EXPR instead of VEC_RSHIFT_EXPR; expand appropriate VEC_PERM_EXPRs using vec_shr_optab Alan Lawrence
@ 2014-11-12 17:56 ` Alan Lawrence
  2014-11-13 11:24   ` Richard Biener
  2014-11-12 18:05 ` [PATCH 4/4][Vectorizer]Make reductions-via-shifts and vec_shr_optab endianness-neutral Alan Lawrence
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Alan Lawrence @ 2014-11-12 17:56 UTC (permalink / raw)
  To: gcc-patches


Tested (with patches 1+2):

Bootstrap + check-gcc on x64-none-linux-gnu

cross-tested check-gcc on aarch64-none-elf and aarch64_be-none-elf as these 
platforms stand (i.e. without vec_shr_optab).

also cross-tested check-gcc on aarch64-none-elf and aarch64_be-none-elf after 
applying https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01473.html (which adds a 
vec_shr_<m> pattern).

bootstrap on powerpc64-none-linux-gnu; check-gcc in progress.

gcc/ChangeLog:

	* fold-const.c (const_binop): Remove code handling VEC_RSHIFT_EXPR.
	* tree-cfg.c (verify_gimple_assign_binary): Likewise.
	* tree-inline.c (estimate_operator_cost): Likewise.
	* tree-pretty-print.c (dump_generic_node, op_code_prio, op_symbol_code):
	Likewise.

	* tree-vect-generic.c (expand_vector_operations_1): Remove assertion
	against VEC_RSHIFT_EXPR.

	* optabs.h (expand_vec_shift_expr): Remove.
	* optabs.c (optab_for_tree_code): Remove case VEC_RSHIFT_EXPR.
	(expand_vec_shift_expr): Remove.
	* tree.def (VEC_RSHIFT_EXPR): Remove.

[-- Attachment #2: 3_rm_vec_rshift.patch --]
[-- Type: text/x-patch; name=3_rm_vec_rshift.patch, Size: 8765 bytes --]

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 2df8ce3..15d7638 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -4659,7 +4659,6 @@ expand_debug_expr (tree exp)
     case VEC_PACK_FIX_TRUNC_EXPR:
     case VEC_PACK_SAT_EXPR:
     case VEC_PACK_TRUNC_EXPR:
-    case VEC_RSHIFT_EXPR:
     case VEC_UNPACK_FLOAT_HI_EXPR:
     case VEC_UNPACK_FLOAT_LO_EXPR:
     case VEC_UNPACK_HI_EXPR:
diff --git a/gcc/expr.c b/gcc/expr.c
index 0ef06ea..fb8455c 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9143,12 +9143,6 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
         return temp;
       }
 
-    case VEC_RSHIFT_EXPR:
-      {
-	target = expand_vec_shift_expr (ops, target);
-	return target;
-      }
-
     case VEC_UNPACK_HI_EXPR:
     case VEC_UNPACK_LO_EXPR:
       {
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 13faf0c..22f5704 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -1419,44 +1419,17 @@ const_binop (enum tree_code code, tree arg1, tree arg2)
       int count = TYPE_VECTOR_SUBPARTS (type), i;
       tree *elts = XALLOCAVEC (tree, count);
 
-      if (code == VEC_RSHIFT_EXPR)
+      for (i = 0; i < count; i++)
 	{
-	  if (!tree_fits_uhwi_p (arg2))
-	    return NULL_TREE;
+	  tree elem1 = VECTOR_CST_ELT (arg1, i);
 
-	  unsigned HOST_WIDE_INT shiftc = tree_to_uhwi (arg2);
-	  unsigned HOST_WIDE_INT outerc = tree_to_uhwi (TYPE_SIZE (type));
-	  unsigned HOST_WIDE_INT innerc
-	    = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (type)));
-	  if (shiftc >= outerc || (shiftc % innerc) != 0)
+	  elts[i] = const_binop (code, elem1, arg2);
+
+	  /* It is possible that const_binop cannot handle the given
+	     code and return NULL_TREE.  */
+	  if (elts[i] == NULL_TREE)
 	    return NULL_TREE;
-	  int offset = shiftc / innerc;
-	  /* The direction of VEC_RSHIFT_EXPR is endian dependent.
-	     For reductions, if !BYTES_BIG_ENDIAN then compiler picks first
-	     vector element, but last element if BYTES_BIG_ENDIAN.  */
-	  if (BYTES_BIG_ENDIAN)
-	    offset = -offset;
-	  tree zero = build_zero_cst (TREE_TYPE (type));
-	  for (i = 0; i < count; i++)
-	    {
-	      if (i + offset < 0 || i + offset >= count)
-		elts[i] = zero;
-	      else
-		elts[i] = VECTOR_CST_ELT (arg1, i + offset);
-	    }
 	}
-      else
-	for (i = 0; i < count; i++)
-	  {
-	    tree elem1 = VECTOR_CST_ELT (arg1, i);
-
-	    elts[i] = const_binop (code, elem1, arg2);
-
-	    /* It is possible that const_binop cannot handle the given
-	       code and return NULL_TREE */
-	    if (elts[i] == NULL_TREE)
-	      return NULL_TREE;
-	  }
 
       return build_vector (type, elts);
     }
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 6194b45..b67828c 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -520,9 +520,6 @@ optab_for_tree_code (enum tree_code code, const_tree type,
     case REDUC_PLUS_EXPR:
       return reduc_plus_scal_optab;
 
-    case VEC_RSHIFT_EXPR:
-      return vec_shr_optab;
-
     case VEC_WIDEN_MULT_HI_EXPR:
       return TYPE_UNSIGNED (type) ?
 	vec_widen_umult_hi_optab : vec_widen_smult_hi_optab;
@@ -771,34 +768,6 @@ force_expand_binop (machine_mode mode, optab binoptab,
   return true;
 }
 
-/* Generate insns for VEC_RSHIFT_EXPR.  */
-
-rtx
-expand_vec_shift_expr (sepops ops, rtx target)
-{
-  struct expand_operand eops[3];
-  enum insn_code icode;
-  rtx rtx_op1, rtx_op2;
-  machine_mode mode = TYPE_MODE (ops->type);
-  tree vec_oprnd = ops->op0;
-  tree shift_oprnd = ops->op1;
-
-  gcc_assert (ops->code == VEC_RSHIFT_EXPR);
-
-  icode = optab_handler (vec_shr_optab, mode);
-  gcc_assert (icode != CODE_FOR_nothing);
-
-  rtx_op1 = expand_normal (vec_oprnd);
-  rtx_op2 = expand_normal (shift_oprnd);
-
-  create_output_operand (&eops[0], target, mode);
-  create_input_operand (&eops[1], rtx_op1, GET_MODE (rtx_op1));
-  create_convert_operand_from_type (&eops[2], rtx_op2, TREE_TYPE (shift_oprnd));
-  expand_insn (icode, 3, eops);
-
-  return eops[0].value;
-}
-
 /* Create a new vector value in VMODE with all elements set to OP.  The
    mode of OP must be the element mode of VMODE.  If OP is a constant,
    then the return value will be a constant.  */
diff --git a/gcc/optabs.h b/gcc/optabs.h
index 91e36d6..982a593 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -287,8 +287,6 @@ extern rtx simplify_expand_binop (machine_mode mode, optab binoptab,
 				  enum optab_methods methods);
 extern bool force_expand_binop (machine_mode, optab, rtx, rtx, rtx, int,
 				enum optab_methods);
-/* Generate code for VEC_RSHIFT_EXPR.  */
-extern rtx expand_vec_shift_expr (struct separate_ops *, rtx);
 
 /* Generate code for a simple binary or unary operation.  "Simple" in
    this case means "can be unambiguously described by a (mode, code)
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index ee10bc6..904f2dd 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3675,38 +3675,6 @@ verify_gimple_assign_binary (gimple stmt)
 	return false;
       }
 
-    case VEC_RSHIFT_EXPR:
-      {
-	if (TREE_CODE (rhs1_type) != VECTOR_TYPE
-	    || !(INTEGRAL_TYPE_P (TREE_TYPE (rhs1_type))
-		 || POINTER_TYPE_P (TREE_TYPE (rhs1_type))
-		 || FIXED_POINT_TYPE_P (TREE_TYPE (rhs1_type))
-		 || SCALAR_FLOAT_TYPE_P (TREE_TYPE (rhs1_type)))
-	    || (!INTEGRAL_TYPE_P (rhs2_type)
-		&& (TREE_CODE (rhs2_type) != VECTOR_TYPE
-		    || !INTEGRAL_TYPE_P (TREE_TYPE (rhs2_type))))
-	    || !useless_type_conversion_p (lhs_type, rhs1_type))
-	  {
-	    error ("type mismatch in vector shift expression");
-	    debug_generic_expr (lhs_type);
-	    debug_generic_expr (rhs1_type);
-	    debug_generic_expr (rhs2_type);
-	    return true;
-	  }
-	/* For shifting a vector of non-integral components we
-	   only allow shifting by a constant multiple of the element size.  */
-	if (!INTEGRAL_TYPE_P (TREE_TYPE (rhs1_type))
-	    && (TREE_CODE (rhs2) != INTEGER_CST
-		|| !div_if_zero_remainder (rhs2,
-					   TYPE_SIZE (TREE_TYPE (rhs1_type)))))
-	  {
-	    error ("non-element sized vector shift of floating point vector");
-	    return true;
-	  }
-
-	return false;
-      }
-
     case WIDEN_LSHIFT_EXPR:
       {
         if (!INTEGRAL_TYPE_P (lhs_type)
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 8cb9510..520546e 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -3807,7 +3807,6 @@ estimate_operator_cost (enum tree_code code, eni_weights *weights,
     case RSHIFT_EXPR:
     case LROTATE_EXPR:
     case RROTATE_EXPR:
-    case VEC_RSHIFT_EXPR:
 
     case BIT_IOR_EXPR:
     case BIT_XOR_EXPR:
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index b8abd14..53720de 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -1858,7 +1858,6 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
     case RSHIFT_EXPR:
     case LROTATE_EXPR:
     case RROTATE_EXPR:
-    case VEC_RSHIFT_EXPR:
     case WIDEN_LSHIFT_EXPR:
     case BIT_IOR_EXPR:
     case BIT_XOR_EXPR:
@@ -3038,7 +3037,6 @@ op_code_prio (enum tree_code code)
     case REDUC_MAX_EXPR:
     case REDUC_MIN_EXPR:
     case REDUC_PLUS_EXPR:
-    case VEC_RSHIFT_EXPR:
     case VEC_UNPACK_HI_EXPR:
     case VEC_UNPACK_LO_EXPR:
     case VEC_UNPACK_FLOAT_HI_EXPR:
@@ -3148,9 +3146,6 @@ op_symbol_code (enum tree_code code)
     case RROTATE_EXPR:
       return "r>>";
 
-    case VEC_RSHIFT_EXPR:
-      return "v>>";
-
     case WIDEN_LSHIFT_EXPR:
       return "w<<";
 
diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index a0c1363..bd9df15 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -1604,7 +1604,6 @@ expand_vector_operations_1 (gimple_stmt_iterator *gsi)
   if (compute_type == type)
     return;
 
-  gcc_assert (code != VEC_RSHIFT_EXPR);
   new_rhs = expand_vector_operation (gsi, type, compute_type, stmt, code);
 
   /* Leave expression untouched for later expansion.  */
diff --git a/gcc/tree.def b/gcc/tree.def
index 91359a2..e4625d0 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1251,11 +1251,6 @@ DEFTREECODE (WIDEN_LSHIFT_EXPR, "widen_lshift_expr", tcc_binary, 2)
    before adding operand three.  */
 DEFTREECODE (FMA_EXPR, "fma_expr", tcc_expression, 3)
 
-/* Whole vector right shift in bits.
-   Operand 0 is a vector to be shifted.
-   Operand 1 is an integer shift amount in bits.  */
-DEFTREECODE (VEC_RSHIFT_EXPR, "vec_rshift_expr", tcc_binary, 2)
-\f
 /* Widening vector multiplication.
    The two operands are vectors with N elements of size S. Multiplying the
    elements of the two vectors will result in N products of size 2*S.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 4/4][Vectorizer]Make reductions-via-shifts and vec_shr_optab endianness-neutral
  2014-11-12 17:52 [PATCH 0/4][Vectorizer] Reductions: replace VEC_RSHIFT_EXPR with VEC_PERM_EXPR Alan Lawrence
                   ` (2 preceding siblings ...)
  2014-11-12 17:56 ` [PATCH 3/4] Remove VEC_RSHIFT_EXPR tree code, now unused Alan Lawrence
@ 2014-11-12 18:05 ` Alan Lawrence
  2014-11-13 11:32   ` Richard Biener
  2014-11-12 19:20 ` [PATCH 0/4][Vectorizer] Reductions: replace VEC_RSHIFT_EXPR with VEC_PERM_EXPR Matthew Fortune
  2014-11-13  9:16 ` Richard Biener
  5 siblings, 1 reply; 12+ messages in thread
From: Alan Lawrence @ 2014-11-12 18:05 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1234 bytes --]

This redefines the vec_shr optab to mean the same thing (in terms of GCC 
vectors) regardless of target endianness. The vectorizer uses vec_shr to do 
reductions via shifts, so it is also changed to always shift in the same 
direction (from the midend's point of view of the vector).
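
A scalar sketch of the semantics being adopted may help: shifting moves every
element towards element 0 and fills the vacated high lanes with zero, so a
reduction epilogue always finds its result in element 0. Names, types, and the
16-lane limit below are illustrative, not GCC internals.

```c
#include <assert.h>
#include <stddef.h>

/* Model of the endian-neutral vec_shr semantics: shift IN right by
   OFFSET elements towards element 0, filling the high lanes with
   zero.  Scalar sketch only, not the optab expansion.  */
static void
vec_shr_model (const int *in, int *out, size_t nelt, size_t offset)
{
  for (size_t i = 0; i < nelt; i++)
    out[i] = (i + offset < nelt) ? in[i + offset] : 0;
}

/* A sum reduction via log2(nelt) shift-and-add steps, as the
   vectorizer's epilogue does.  The scalar result ends up in element
   0 regardless of endianness, which is why the BIT_FIELD_REF in the
   patch can always use a zero bit position.  Assumes nelt <= 16 and
   a power of two.  */
static int
reduce_sum_via_shifts (int *v, size_t nelt)
{
  int tmp[16];
  for (size_t half = nelt / 2; half >= 1; half /= 2)
    {
      vec_shr_model (v, tmp, nelt, half);
      for (size_t i = 0; i < nelt; i++)
        v[i] += tmp[i];
    }
  return v[0];
}
```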

cross-tested check-gcc on (1) aarch64-none-elf and (2) aarch64_be-none-elf, both 
(a) using the endianness-independent vec_shr patterns at 
https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01477.html and (b) in present 
state without any vec_shr patterns. No regressions on any combination.

Bootstrap + check-gcc on x86_64-none-linux-gnu.

This patch will break MIPS and PowerPC (which have bigendian vec_shr patterns). 
Candidate MIPS fix previously posted at 
https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01481.html .
PowerPC should be fixed by 
https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01480.html ; I've bootstrapped 
this on powerpc64-unknown-linux-gnu, check-gcc in progress.

gcc/ChangeLog:

	* optabs.c (shift_amt_for_vec_perm_mask): Remove code conditional on
	BYTES_BIG_ENDIAN.
	* tree-vect-loop.c (calc_vec_perm_mask_for_shift,
	vect_create_epilog_for_reduction): Likewise.
	* doc/md.texi (vec_shr_m): Clarify direction of shifting.

[-- Attachment #2: 4_make_endian_neutral.patch --]
[-- Type: text/x-patch; name=4_make_endian_neutral.patch, Size: 2687 bytes --]

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 3b5511ec39a86fb4278ebd766420eaec5eb05d8b..3742ca3e3f1c428ae8a84885969b31806df02ebd 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4798,7 +4798,7 @@ of a wider mode.)
 
 @cindex @code{vec_shr_@var{m}} instruction pattern
 @item @samp{vec_shr_@var{m}}
-Whole vector right shift in bits.
+Whole vector right shift in bits, i.e. towards element 0.
 Operand 1 is a vector to be shifted.
 Operand 2 is an integer shift amount in bits.
 Operand 0 is where the resulting shifted vector is stored.
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 9650b58a8ed2c6701f653c0d6c15dba71b04b089..6e8f52e1d88844b060857356ac2ec8b7cba059c6 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -6626,8 +6626,6 @@ shift_amt_for_vec_perm_mask (rtx sel)
 	return NULL_RTX;
     }
 
-  if (BYTES_BIG_ENDIAN)
-    first = (2 * nelt) - first;
   return GEN_INT (first * bitsize);
 }
 
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index f1d327f42a5c517a01121135569dd014e49502e0..91d82b72d0f0ef944d0fa5c23da36aff0384a9d0 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -3091,7 +3091,7 @@ calc_vec_perm_mask_for_shift (enum machine_mode mode, unsigned int offset,
   unsigned int i, nelt = GET_MODE_NUNITS (mode);
 
   for (i = 0; i < nelt; i++)
-    sel[i] = (BYTES_BIG_ENDIAN ? i - offset : i + offset) & (2*nelt - 1);
+    sel[i] = (i + offset) & (2*nelt - 1);
 }
 
 /* Checks whether the target supports whole-vector shifts for vectors of mode
@@ -3906,7 +3906,7 @@ vect_create_epilog_for_reduction (vec<tree> vect_defs, gimple stmt,
   gimple epilog_stmt = NULL;
   enum tree_code code = gimple_assign_rhs_code (stmt);
   gimple exit_phi;
-  tree bitsize, bitpos;
+  tree bitsize;
   tree adjustment_def = NULL;
   tree vec_initial_def = NULL;
   tree reduction_op, expr, def;
@@ -4416,14 +4416,8 @@ vect_create_epilog_for_reduction (vec<tree> vect_defs, gimple stmt,
         dump_printf_loc (MSG_NOTE, vect_location,
 			 "extract scalar result\n");
 
-      if (BYTES_BIG_ENDIAN)
-        bitpos = size_binop (MULT_EXPR,
-                             bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) - 1),
-                             TYPE_SIZE (scalar_type));
-      else
-        bitpos = bitsize_zero_node;
-
-      rhs = build3 (BIT_FIELD_REF, scalar_type, new_temp, bitsize, bitpos);
+      rhs = build3 (BIT_FIELD_REF, scalar_type,
+		    new_temp, bitsize, bitsize_zero_node);
       epilog_stmt = gimple_build_assign (new_scalar_dest, rhs);
       new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
       gimple_assign_set_lhs (epilog_stmt, new_temp);


* RE: [PATCH 0/4][Vectorizer] Reductions: replace VEC_RSHIFT_EXPR with VEC_PERM_EXPR
  2014-11-12 17:52 [PATCH 0/4][Vectorizer] Reductions: replace VEC_RSHIFT_EXPR with VEC_PERM_EXPR Alan Lawrence
                   ` (3 preceding siblings ...)
  2014-11-12 18:05 ` [PATCH 4/4][Vectorizer]Make reductions-via-shifts and vec_shr_optab endianness-neutral Alan Lawrence
@ 2014-11-12 19:20 ` Matthew Fortune
  2014-11-14 12:45   ` Alan Lawrence
  2014-11-13  9:16 ` Richard Biener
  5 siblings, 1 reply; 12+ messages in thread
From: Matthew Fortune @ 2014-11-12 19:20 UTC (permalink / raw)
  To: Alan Lawrence, gcc-patches
  Cc: Richard Biener, David Edelsohn, Catherine Moore

> (for MIPS) https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01481.html,
> although I have not been able to test this as there doesn't seem to be
> any working MIPS/Loongson hardware in the Compile Farm;

I will post a patch to remove vec_shl and only support vec_shr for
little endian. This is on the basis that loongson only supports
little endian anyway.

I believe this is a safe thing to do regardless of whether your change
is in place or not.

Matthew


* Re: [PATCH 0/4][Vectorizer] Reductions: replace VEC_RSHIFT_EXPR with VEC_PERM_EXPR
  2014-11-12 17:52 [PATCH 0/4][Vectorizer] Reductions: replace VEC_RSHIFT_EXPR with VEC_PERM_EXPR Alan Lawrence
                   ` (4 preceding siblings ...)
  2014-11-12 19:20 ` [PATCH 0/4][Vectorizer] Reductions: replace VEC_RSHIFT_EXPR with VEC_PERM_EXPR Matthew Fortune
@ 2014-11-13  9:16 ` Richard Biener
  5 siblings, 0 replies; 12+ messages in thread
From: Richard Biener @ 2014-11-13  9:16 UTC (permalink / raw)
  To: Alan Lawrence
  Cc: gcc-patches, David Edelsohn, Catherine Moore, Matthew Fortune

On Wed, 12 Nov 2014, Alan Lawrence wrote:

> In response to https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01803.html, this
> series removes the VEC_RSHIFT_EXPR, instead using a VEC_PERM_EXPR (with a
> second argument full of constant zeroes) to represent the shift.
> 
> I've kept the use of vec_shr optab for platforms that define it, as even on
> platforms with a whole-vector-shift operation, this typically does not work as
> a vec-perm on arbitrary vectors (the shift will pull in zeroes from the end,
> whereas TARGET_VECTORIZE_VEC_PERM_CONST_OK and related machinery allows only
> to check for a shift-like permutation that will work for two arbitrary
> vectors).

That's reasonable for the moment, though I expected to use

 VEC_PERM <v4si, { 0, 0, 0, 0 }, { 4, 5, 0, 1 }>

for the shift - thus the shifted in vector elements should map
1:1 from the 2nd vector.  This means that the target can
answer "yes" to vec_perm_const_ok (v4si, ...) which such
a permute if it can shift in zeros as it then can do

  tem = shift-in-zeros
  tem2 = vec2 & ~<mask to clear not wanted stuff>
  perm_result = tem | tem2;

that is, simply OR in the wanted parts of the 2nd vector.  Of
course the actual expansion can special-case a constant or
zero 2nd vector.

Usually targets provide a way of setting vector elements to
all ones or zero with their permute instructions as well.

Of course the above requires adjustments to all targets vec_perm_const_ok
hooks and vec_perm_const expanders so for now asking for vec_shr
is ok, but long-term it shouldn't be needed, even without changes
to the vec_perm_const_ok interface.
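
That decomposition can be sketched in scalar C; the function below models a
VEC_PERM with a shift-shaped selector (sel[i] = i + offset over the
concatenation of the two vectors), built from a shift-in-zeros plus an OR of
the wanted lanes of the second vector. Illustrative only, not target code.

```c
#include <assert.h>
#include <stddef.h>

/* Implement VEC_PERM <v1, v2, {offset, offset+1, ...}> for two
   arbitrary vectors on a machine that only has a shift-in-zeros
   operation:
     tem  = v1 shifted towards element 0, zeros shifted in
     tem2 = v2 with the unwanted lanes cleared
     res  = tem | tem2
   Assumes offset <= nelt; names are illustrative.  */
static void
perm_shift_pair (const unsigned *v1, const unsigned *v2,
                 unsigned *out, size_t nelt, size_t offset)
{
  for (size_t i = 0; i < nelt; i++)
    {
      unsigned tem  = (i + offset < nelt) ? v1[i + offset] : 0;
      unsigned tem2 = (i + offset < nelt) ? 0 : v2[i + offset - nelt];
      out[i] = tem | tem2;
    }
}
```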

Thanks,
Richard.

> I've also changed from the endianness-dependent shift direction of the old
> VEC_RSHIFT_EXPR, to an endian-neutral direction (VEC_PERM_EXPR is inherently
> endian-neutral), changing the meaning of vec_shr_optab to match (as I did in
> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01475.html). As previously, this
> will break any *bigendian* platform defining vec_shr; I see MIPS and RS6000,
> but candidate fixes for both of these have already been posted:
> 
> (for MIPS) https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01481.html, although
> I have not been able to test this as there doesn't seem to be any working
> MIPS/Loongson hardware in the Compile Farm;
> 
> (for PowerPC) https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01480.html,
> testing in progress.
> 
> ARM defines vec_shr only for little-endian; AArch64 does not yet, although in
> previous patch series I both added a vec_shr and made it endian-neutral (I
> will post revised patches for both of these shortly).
> 
> Bootstrapped and check-gcc on x86-none-linux-gnu and arm-none-linux-gnu;
> cross-tested on aarch64{,_be}-none-elf (FWIW, both with and without previous
> patches adding a vec_shr pattern)
> 
> Ok for trunk if no regressions on PowerPC?
> 
> Thanks, Alan


* Re: [PATCH 1/4][Vectorizer] Split vect_gen_perm_mask into _checked and _any variants
  2014-11-12 17:53 ` [PATCH 1/4][Vectorizer] Split vect_gen_perm_mask into _checked and _any variants Alan Lawrence
@ 2014-11-13 11:17   ` Richard Biener
  0 siblings, 0 replies; 12+ messages in thread
From: Richard Biener @ 2014-11-13 11:17 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: gcc-patches

On Wed, Nov 12, 2014 at 6:52 PM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> This is a preliminary to patch 2, which wants functionality equivalent to
> vect_gen_perm_mask (converting a char* to an RTL const_vector) but without
> the check of can_vec_perm_p.
>
> All existing calls to vect_gen_perm_mask barring that in
> perm_mask_for_reverse, assert the return value is non-null. Hence, this
> patch splits the existing vect_gen_perm_mask into two: a checked variant
> which makes the equivalent assertion; and an unchecked variant, which just
> turns any char* into an equivalent const_vector.
>
> (An) alternative strategy would be merely to remove the check from
> vect_gen_perm_mask (so equivalent to this patch's vect_gen_perm_mask_any),
> and add a preceding assert at all callsites (i.e. changing the many
> "gen_vect_perm_mask (...); assert (... != NULL);" into "assert
> (can_vec_perm_p (...)); gen_vect_perm_mask (...);" - that would work just as
> well as far as this patch series is concerned.
>
> On its own, bootstrapped on x86-none-linux-gnu (more testing with patches
> 2+3).
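
The split described above can be sketched in plain C; the struct and the
always-true target hook here are hypothetical stand-ins, simplified from the
real tree/RTL interfaces.

```c
#include <assert.h>

/* Hypothetical stand-in for the tree constant-vector mask.  */
typedef struct { const unsigned char *sel; unsigned nelt; } perm_mask;

static int
can_vec_perm_p (const unsigned char *sel, unsigned nelt)
{
  (void) sel;
  (void) nelt;
  return 1;  /* Pretend the target supports any permutation.  */
}

/* _any variant: unconditionally turn a selector into a mask.  */
static perm_mask
vect_gen_perm_mask_any (const unsigned char *sel, unsigned nelt)
{
  perm_mask m = { sel, nelt };
  return m;
}

/* _checked variant: same, but assert the target can really perform
   the permutation, replacing the "call, then check for NULL" idiom
   at the call sites.  */
static perm_mask
vect_gen_perm_mask_checked (const unsigned char *sel, unsigned nelt)
{
  assert (can_vec_perm_p (sel, nelt));
  return vect_gen_perm_mask_any (sel, nelt);
}
```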

Ok.

Thanks,
Richard.

> gcc/ChangeLog:
>
>         * tree-vectorizer.h (vect_gen_perm_mask): Remove.
>         (vect_gen_perm_mask_checked, vect_gen_perm_mask_any): New.
>
>         tree_vec_data_refs.c (vect_permute_load_chain,
> vec_permute_store_chain,
>         vec_shift_permute_load_chain): Replace vect_gen_perm_mask & assert
>         with vect_gen_perm_mask_checked.
>
>         * tree-vect-stmts.c (vectorizable_mask_load_store,
> vectorizable_load):
>         Likewise.
>
>         (vect_gen_perm_mask_checked): New.
>         (vect_gen_perm_mask): Remove can_vec_perm_p check, rename to...
>         (vect_gen_perm_mask_any): ...this.
>
>         (perm_mask_for_reverse): Call can_vec_perm_p and
>         vect_gen_perm_mask_checked.


* Re: [PATCH 2/4][Vectorizer] Use a VEC_PERM_EXPR instead of VEC_RSHIFT_EXPR; expand appropriate VEC_PERM_EXPRs using vec_shr_optab
  2014-11-12 17:54 ` [PATCH 2/4][Vectorizer] Use a VEC_PERM_EXPR instead of VEC_RSHIFT_EXPR; expand appropriate VEC_PERM_EXPRs using vec_shr_optab Alan Lawrence
@ 2014-11-13 11:23   ` Richard Biener
  0 siblings, 0 replies; 12+ messages in thread
From: Richard Biener @ 2014-11-13 11:23 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: gcc-patches

On Wed, Nov 12, 2014 at 6:53 PM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> This makes the vectorizer use VEC_PERM_EXPRs when doing reductions via
> shifts, rather than VEC_RSHIFT_EXPR.
>
> VEC_RSHIFT_EXPR presently has an endianness-dependent meaning (paralleling
> vec_shr_optab). While the overall destination of this patch series is to
> make these endianness-neutral, this patch already feels quite big enough,
> hence, here we just switch to using VEC_PERM_EXPRs that have meaning
> equivalent to the old VEC_RSHIFT_EXPRs. Since VEC_PERM_EXPR is
> endianness-neutral, this means the mask we need to represent the meaning of
> the old VEC_RSHIFT_EXPR changes according to endianness. (Patch 4 completes
> this journey by removing the BYTES_BIG_ENDIAN-conditional parts; so an
> alternative route to the same endpoint, would be to first change
> VEC_RSHIFT_EXPR to be endianness-independent, then replace it by
> VEC_PERM_EXPRs. I posted such a patch to make VEC_RSHIFT_EXPR independent
> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01475.html and this was what
> lead Richi to make his suggestion!)
>
> The "trick" here is then to look for the platform handling vec_shr_optab
> when expanding vec_perm_const *if* the second vector is all constant zeroes
> and the vec_perm mask is appropriate. I felt it was best to keep this case
> separate from can_vec_perm_p, so the latter only indicates when the target
> platform can apply a given permutation to _arbitrary_input_vectors_, as
> can_vec_perm_p's interface is already complicated enough without making it
> also able to handle cases where some of the vectors-to-be-shuffled are
> known.
>
> A nice side effect of this patch is that aarch64 targets suddenly perform
> reductions via shifts even *without* a vec_shr_optab, because
> aarch64_vectorize_vec_perm_const_ok looks for shuffle-masks for the EXT
> instruction, which can indeed be used to perform a shift :).
>
> With patch 1, bootstrapped on x86-none-linux-gnu (more testing with patch
> 3).
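
The shift-shaped selector described above can be sketched as follows. With the
second VEC_PERM_EXPR operand all zeroes, lanes indexed >= nelt read as zero,
so this selector reproduces the old VEC_RSHIFT_EXPR in both endian variants
(patch 4 later drops the big-endian arm). Illustrative C, not the GCC source.

```c
#include <assert.h>

/* Build the VEC_PERM selector equivalent to the old VEC_RSHIFT_EXPR
   by OFFSET elements: little-endian shifts towards element 0,
   big-endian towards element nelt-1.  Unsigned wraparound on i -
   offset is masked back into the 0 .. 2*nelt-1 selector range,
   picking zero lanes from the second (all-zero) operand.  */
static void
calc_shift_sel (unsigned char *sel, unsigned nelt, unsigned offset,
                int bytes_big_endian)
{
  for (unsigned i = 0; i < nelt; i++)
    sel[i] = (bytes_big_endian ? i - offset : i + offset) & (2 * nelt - 1);
}
```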

Ok.

Thanks,
Richard.

> gcc/ChangeLog:
>
>         * optabs.c (can_vec_perm_p): Update comment, does not consider
> vec_shr.
>         (shift_amt_for_vec_perm_mask): New.
>         (expand_vec_perm_1): Use vec_shr_optab if second vector is
> const0_rtx
>         and mask appropriate.
>
>         * tree-vect-loop.c (calc_vec_perm_mask_for_shift): New.
>         (have_whole_vector_shift): New.
>         (vect_model_reduction_cost): Call have_whole_vector_shift instead of
>         looking for vec_shr_optab.
>         (vect_create_epilog_for_reduction): Likewise; also rename local
> variable
>         have_whole_vector_shift to reduce_with_shift; output VEC_PERM_EXPRs
>         instead of VEC_RSHIFT_EXPRs.
>
>         * tree-vect-stmts.c (vect_gen_perm_mask_checked): Extend comment.


* Re: [PATCH 3/4] Remove VEC_RSHIFT_EXPR tree code, now unused
  2014-11-12 17:56 ` [PATCH 3/4] Remove VEC_RSHIFT_EXPR tree code, now unused Alan Lawrence
@ 2014-11-13 11:24   ` Richard Biener
  0 siblings, 0 replies; 12+ messages in thread
From: Richard Biener @ 2014-11-13 11:24 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: gcc-patches

On Wed, Nov 12, 2014 at 6:55 PM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> Tested (with patches 1+2):
>
> Bootstrap + check-gcc on x64-none-linux-gnu
>
> cross-tested check-gcc on aarch64-none-elf and aarch64_be-none-elf as these
> platforms stand (i.e. without vec_shr_optab).
>
> also cross-tested check-gcc on aarch64-none-elf and aarch64_be-none-elf
> after applying https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01473.html
> (which adds a vec_shr_<m> pattern).
>
> bootstrap on powerpc64-none-linux-gnu; check-gcc in progress.

Ok.

Thanks,
Richard.

> gcc/ChangeLog:
>
>         * fold-const.c (const_binop): Remove code handling VEC_RSHIFT_EXPR.
>         * tree-cfg.c (verify_gimple_assign_binary): Likewise.
>         * tree-inline.c (estimate_operator_cost): Likewise.
>         * tree-pretty-print.c (dump_generic_node, op_code_prio,
> op_symbol_code):
>         Likewise.
>
>         * tree-vect-generic.c (expand_vector_operations_1): Remove assertion
>         against VEC_RSHIFT_EXPR.
>
>         * optabs.h (expand_vec_shift_expr): Remove.
>         * optabs.c (optab_for_tree_code): Remove case VEC_RSHIFT_EXPR.
>         (expand_vec_shift_expr): Remove.
>         * tree.def (VEC_RSHIFT_EXPR): Remove.


* Re: [PATCH 4/4][Vectorizer]Make reductions-via-shifts and vec_shr_optab endianness-neutral
  2014-11-12 18:05 ` [PATCH 4/4][Vectorizer]Make reductions-via-shifts and vec_shr_optab endianness-neutral Alan Lawrence
@ 2014-11-13 11:32   ` Richard Biener
  0 siblings, 0 replies; 12+ messages in thread
From: Richard Biener @ 2014-11-13 11:32 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: gcc-patches

On Wed, Nov 12, 2014 at 6:56 PM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> This redefines vec_shr optab to be the same (in terms of gcc vectors)
> regardless of target endianness. The vectorizer uses this to do reductions
> via shifts, so also change the vectorizer to shift things always the same
> way (from the midend's POV of vectors).
>
> cross-tested check-gcc on (1) aarch64-none-elf and (2) aarch64_be-none-elf,
> both (a) using the endianness-independent vec_shr patterns at
> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01477.html and (b) in present
> state without any vec_shr patterns. No regressions on any combination.
>
> Bootstrap + check-gcc on x86_64-none-linux-gnu.
>
> This patch will break MIPS and PowerPC (which have bigendian vec_shr
> patterns). Candidate MIPS fix previously posted at
> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01481.html .
> PowerPC should be fixed by
> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01480.html ; I've bootstrapped
> this on powerpc64-unknown-linux-gnu, check-gcc in progress.

Ok.

Thanks.
Richard.

> gcc/ChangeLog:
>
>         * optabs.c (shift_amt_for_vec_perm_mask): Remove code conditional on
>         BYTES_BIG_ENDIAN.
>         * tree-vect-loop.c (calc_vec_perm_mask_for_shift,
>         vect_create_epilog_for_reduction): Likewise.
>         * doc/md.texi (vec_shr_m): Clarify direction of shifting.


* Re: [PATCH 0/4][Vectorizer] Reductions: replace VEC_RSHIFT_EXPR with VEC_PERM_EXPR
  2014-11-12 19:20 ` [PATCH 0/4][Vectorizer] Reductions: replace VEC_RSHIFT_EXPR with VEC_PERM_EXPR Matthew Fortune
@ 2014-11-14 12:45   ` Alan Lawrence
  0 siblings, 0 replies; 12+ messages in thread
From: Alan Lawrence @ 2014-11-14 12:45 UTC (permalink / raw)
  To: Matthew Fortune; +Cc: gcc-patches, Catherine Moore

Ah, I didn't realize Loongson was little-endian only. In that case (with mid-end 
reductions-via-shifts changes pushed) I don't think I have actually broken 
anything, or at least, no MIPS platform that exists :).

However, yes, that would seem a safe bet (and simpler than my linked patch that 
provided a BE version too!).

Cheers, Alan

Matthew Fortune wrote:
>> (for MIPS) https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01481.html,
>> although I have not been able to test this as there doesn't seem to be
>> any working MIPS/Loongson hardware in the Compile Farm;
> 
> I will post a patch to remove vec_shl and only support vec_shr for
> little endian. This is on the basis that loongson only supports
> little endian anyway.
> 
> I believe this is a safe thing to do regardless of whether your change
> is in place or not.
> 
> Matthew
> 



end of thread, other threads:[~2014-11-14 12:25 UTC | newest]

Thread overview: 12+ messages
2014-11-12 17:52 [PATCH 0/4][Vectorizer] Reductions: replace VEC_RSHIFT_EXPR with VEC_PERM_EXPR Alan Lawrence
2014-11-12 17:53 ` [PATCH 1/4][Vectorizer] Split vect_gen_perm_mask into _checked and _any variants Alan Lawrence
2014-11-13 11:17   ` Richard Biener
2014-11-12 17:54 ` [PATCH 2/4][Vectorizer] Use a VEC_PERM_EXPR instead of VEC_RSHIFT_EXPR; expand appropriate VEC_PERM_EXPRs using vec_shr_optab Alan Lawrence
2014-11-13 11:23   ` Richard Biener
2014-11-12 17:56 ` [PATCH 3/4] Remove VEC_RSHIFT_EXPR tree code, now unused Alan Lawrence
2014-11-13 11:24   ` Richard Biener
2014-11-12 18:05 ` [PATCH 4/4][Vectorizer]Make reductions-via-shifts and vec_shr_optab endianness-neutral Alan Lawrence
2014-11-13 11:32   ` Richard Biener
2014-11-12 19:20 ` [PATCH 0/4][Vectorizer] Reductions: replace VEC_RSHIFT_EXPR with VEC_PERM_EXPR Matthew Fortune
2014-11-14 12:45   ` Alan Lawrence
2014-11-13  9:16 ` Richard Biener
