public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Vector shuffling
@ 2011-08-30  7:17 Artem Shinkarov
  2011-08-30 13:50 ` Richard Guenther
  0 siblings, 1 reply; 71+ messages in thread
From: Artem Shinkarov @ 2011-08-30  7:17 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Guenther

[-- Attachment #1: Type: text/plain, Size: 2772 bytes --]

Hi

This is a patch for the explicit vector shuffling we have discussed a
long time ago here:
http://gcc.gnu.org/ml/gcc-patches/2010-08/msg01092.html

The new patch introduces the new tree code, as we agreed, and expands
this code by checking the vshuffle pattern in the backend.

The patch at the moment lacks of some examples, but mainly it works
fine for me. It would be nice if i386 gurus could look into the way I
am doing the expansion.

Middle-end parts seems to be more or less fine, they have not changed
much from the previous time.

ChangeLog:
2011-08-30 Artjoms Sinkarovs <artyom.shinkaroff@gmailc.com>

	gcc/
	* optabs.c (expand_vec_shuffle_expr_p): New function. Checks
	if given expression can be expanded by the target.
	(expand_vec_shuffle_expr): New function. Expand VEC_SHUFFLE_EXPR
	using target vector instructions.
	* optabs.h: New optab vshuffle.
	(expand_vec_shuffle_expr_p): New prototype.
	(expand_vec_shuffle_expr): New prototype.
	* genopinit.c: Adjust to support vecshuffle.
	* builtins.def: New builtin __builtin_shuffle.
	* c-typeck.c (build_function_call_vec): Typecheck
	__builtin_shuffle, allowing only two or three arguments.
	Change the type of builtin depending on the arguments.
	(digest_init): Warn when constructor has less elements than
	vector type.
	* gimplify.c (gimplify_exp): Adjusted to support VEC_SHUFFLE_EXPR.
	* tree.def: New tree code VEC_SHUFFLE_EXPR.
	* tree-vect-generic.c (vector_element): New function. Returns an
	element of the vector at the given position.
	(lower_builtin_shuffle): Change builtin_shuffle with VEC_SHUFLLE_EXPR
	or expand an expression piecewise.
	(expand_vector_operations_1): Adjusted.
	(gate_expand_vector_operations_noop): New gate function.
	* gimple.c (get_gimple_rhs_num_ops): Adjust.
	* passes.c: Move veclower down.
	* tree-pretty-print.c (dump_generic_node): Recognize
	VEC_SHUFFLE_EXPR as valid expression.
	* tree-ssa-operands: Adjust.

	gcc/config/i386
	* sse.md: (sseshuffint) New mode_attr. Correspondence between the
	vector and the type of the mask when shuffling.
	(vecshuffle<mode>): New expansion.
	* i386-protos.h (ix86_expand_vshuffle): New prototype.
	* i386.c (ix86_expand_vshuffle): Expand vshuffle using pshufb.
	(ix86_vectorize_builtin_vec_perm_ok): Adjust.

	gcc/doc
	* extend.texi: Adjust.

	gcc/testsuite
	* gcc.c-torture/execute/vect-shuffle-2.c: New test.
	* gcc.c-torture/execute/vect-shuffle-4.c: New test.
	* gcc.c-torture/execute/vect-shuffle-1.c: New test.
	* gcc.c-torture/execute/vect-shuffle-3.c: New test.

bootstrapped on x86_64-unknown-linux-gnu. The AVX parts are not
tested, because I don't have actual hardware. It works with -mavx, the
assembler code looks fine to me. I'll test it on a real hardware in
couple of days.



Thanks,
Artem Shinkarov.

[-- Attachment #2: vec-shuffle.v11.diff --]
[-- Type: text/plain, Size: 39593 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 177758)
+++ gcc/doc/extend.texi	(working copy)
@@ -6553,6 +6553,32 @@ invoke undefined behavior at runtime.  W
 accesses for vector subscription can be enabled with
 @option{-Warray-bounds}.
 
+Vector shuffling is available using functions 
+@code{__builtin_shuffle (vec, mask)} and 
+@code{__builtin_shuffle (vec0, vec1, mask)}. Both functions construct
+a permutation of elements from one or two vectors and return a vector
+of the same type as input vector(s). The mask is a vector of 
+integer-typed elements. The size of each element of the mask must be
+the same as the size of each input vector element. The number of 
+elements in input vector(s) and mask must be the same.
+
+The elements of the input vectors are numbered from left to right across
+one or both of the vectors. Each element in the mask specifies a number
+of element from the input vector(s). Consider the following example.
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{5,6,7,8@};
+v4si mask1 = @{0,1,1,3@};
+v4si mask2 = @{0,4,2,5@};
+v4si res;
+
+res = __builtin_shuffle (a, mask1);       /* res is @{1,2,2,4@}  */
+res = __builtin_shuffle2 (a, b, mask2);   /* res is @{1,5,3,6@}  */
+@end smallexample
+
 You can declare variables and use them in function calls and returns, as
 well as in assignments and some casts.  You can specify a vector type as
 a return type for a function.  Vector types can also be used as function
Index: gcc/tree-pretty-print.c
===================================================================
--- gcc/tree-pretty-print.c	(revision 177758)
+++ gcc/tree-pretty-print.c	(working copy)
@@ -2063,6 +2063,16 @@ dump_generic_node (pretty_printer *buffe
       dump_generic_node (buffer, TREE_OPERAND (node, 2), spc, flags, false);
       pp_string (buffer, " > ");
       break;
+    
+    case VEC_SHUFFLE_EXPR:
+      pp_string (buffer, " VEC_SHUFFLE_EXPR < ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
+      pp_string (buffer, " , ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
+      pp_string (buffer, " , ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 2), spc, flags, false);
+      pp_string (buffer, " > ");
+      break;
 
     case DOT_PROD_EXPR:
       pp_string (buffer, " DOT_PROD_EXPR < ");
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	(revision 177758)
+++ gcc/optabs.c	(working copy)
@@ -6530,6 +6530,79 @@ vector_compare_rtx (tree cond, bool unsi
   return gen_rtx_fmt_ee (rcode, VOIDmode, ops[0].value, ops[1].value);
 }
 
+bool
+expand_vec_shuffle_expr_p (enum machine_mode mode, tree v0, 
+			   tree v1, tree mask)
+{
+#define inner_type_size(vec) \
+  GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (vec))))
+
+  if (TREE_CODE (mask) == VECTOR_CST
+      && targetm.vectorize.builtin_vec_perm_ok (TREE_TYPE (v0), mask))
+    return true;
+  
+  if (v0 != v1 || inner_type_size (v0) != inner_type_size (mask))
+    return false;
+    
+  return direct_optab_handler (vshuffle_optab, mode) != CODE_FOR_nothing;
+#undef inner_type_size
+}
+
+rtx
+expand_vec_shuffle_expr (tree type, tree v0, tree v1, tree mask, rtx target)
+{
+  struct expand_operand ops[4];
+  enum insn_code icode;
+  enum machine_mode mode = TYPE_MODE (type);
+  rtx rtx_v0, rtx_mask;
+
+  gcc_assert (expand_vec_shuffle_expr_p (mode, v0, v1, mask));
+
+  if (TREE_CODE (mask) == VECTOR_CST)
+    {
+      tree m_type, call;
+      tree fn = targetm.vectorize.builtin_vec_perm (TREE_TYPE (v0), &m_type);
+      rtx t;
+
+      if (!fn)
+	goto vshuffle;
+
+      if (m_type != TREE_TYPE (TREE_TYPE (mask)))
+	{	
+	  int units = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask));
+	  tree cvt = build_vector_type (m_type, units);
+	  mask = fold_convert (cvt, mask);
+	}
+
+      fn = copy_node (fn);
+      call = fold_build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (fn)), fn);
+      call = build_call_nary (type /* ? */, call, 3, v0, v1, mask);
+
+      t = expand_normal (call);  
+      target = gen_reg_rtx (mode);
+      emit_insn (gen_rtx_SET (VOIDmode, target, t));
+      return target;
+    }
+
+vshuffle:
+  gcc_assert (v1 == v0);
+
+  icode = direct_optab_handler (vshuffle_optab, mode);
+
+  if (icode == CODE_FOR_nothing)
+    return 0;
+  
+  rtx_v0 = expand_normal (v0);
+  rtx_mask = expand_normal (mask);
+
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], rtx_v0, mode);
+  create_input_operand (&ops[2], rtx_mask, mode);
+  expand_insn (icode, 3, ops);
+  
+  return ops[0].value;
+}
+
 /* Return insn code for TYPE, the type of a VEC_COND_EXPR.  */
 
 static inline enum insn_code
Index: gcc/optabs.h
===================================================================
--- gcc/optabs.h	(revision 177758)
+++ gcc/optabs.h	(working copy)
@@ -630,6 +630,9 @@ enum direct_optab_index
   DOI_vcond,
   DOI_vcondu,
 
+  /* Vector shuffling.  */
+  DOI_vshuffle,
+
   /* Block move operation.  */
   DOI_movmem,
 
@@ -695,6 +698,7 @@ typedef struct direct_optab_d *direct_op
 #define reload_out_optab (&direct_optab_table[(int) DOI_reload_out])
 #define vcond_optab (&direct_optab_table[(int) DOI_vcond])
 #define vcondu_optab (&direct_optab_table[(int) DOI_vcondu])
+#define vshuffle_optab (&direct_optab_table[(int) DOI_vshuffle])
 #define movmem_optab (&direct_optab_table[(int) DOI_movmem])
 #define setmem_optab (&direct_optab_table[(int) DOI_setmem])
 #define cmpstr_optab (&direct_optab_table[(int) DOI_cmpstr])
@@ -864,8 +868,15 @@ extern rtx expand_widening_mult (enum ma
 /* Return tree if target supports vector operations for COND_EXPR.  */
 bool expand_vec_cond_expr_p (tree, enum machine_mode);
 
+/* Return tree if target supports vector operations for VEC_SHUFFLE_EXPR.  */
+bool expand_vec_shuffle_expr_p (enum machine_mode, tree, tree, tree);
+
 /* Generate code for VEC_COND_EXPR.  */
 extern rtx expand_vec_cond_expr (tree, tree, tree, tree, rtx);
+
+/* Generate code for VEC_SHUFFLE_EXPR.  */
+extern rtx expand_vec_shuffle_expr (tree, tree, tree, tree, rtx);
+
 /* Generate code for VEC_LSHIFT_EXPR and VEC_RSHIFT_EXPR.  */
 extern rtx expand_vec_shift_expr (sepops, rtx);
 
Index: gcc/genopinit.c
===================================================================
--- gcc/genopinit.c	(revision 177758)
+++ gcc/genopinit.c	(working copy)
@@ -253,6 +253,7 @@ static const char * const optabs[] =
   "set_optab_handler (vec_realign_load_optab, $A, CODE_FOR_$(vec_realign_load_$a$))",
   "set_direct_optab_handler (vcond_optab, $A, CODE_FOR_$(vcond$a$))",
   "set_direct_optab_handler (vcondu_optab, $A, CODE_FOR_$(vcondu$a$))",
+  "set_direct_optab_handler (vshuffle_optab, $A, CODE_FOR_$(vshuffle$a$))",
   "set_optab_handler (ssum_widen_optab, $A, CODE_FOR_$(widen_ssum$I$a3$))",
   "set_optab_handler (usum_widen_optab, $A, CODE_FOR_$(widen_usum$I$a3$))",
   "set_optab_handler (udot_prod_optab, $A, CODE_FOR_$(udot_prod$I$a$))",
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c	(revision 0)
@@ -0,0 +1,44 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shuf2compare(type, count, vres, v0, v1, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != ((vidx(type, mask, __i) < count) ? \
+                          vidx(type, v0, vidx(type, mask, __i)) :  \
+                          vidx(type, v1, (vidx(type, mask, __i) - count)))) \
+            __builtin_abort (); \
+        } \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+    vector (8, short) v0 = {5, 5,5,5,5,5,argc,7};
+    vector (8, short) v1 = {argc, 1,8,8,4,9,argc,4};
+    vector (8, short) v2;
+
+    //vector (8, short) mask = {1,2,5,4,3,6,7};
+    
+    vector (8, short) mask0 = {0,2,3,1,4,5,6,7};
+    vector (8, short) mask1 = {0,12,3,4,3,0,10,9};
+    
+    vector (8, short) mask2 = {0,8,1,9,2,10,3,11};
+
+    v2 = __builtin_shuffle (v0, v1,  mask0);
+    shuf2compare (short, 8, v2, v0, v1, mask0);
+ 
+    v2 = __builtin_shuffle (v0, v1,  mask1);
+    shuf2compare (short, 8, v2, v0, v1, mask1);
+    
+    v2 = __builtin_shuffle (v0, v1,  mask2);
+    shuf2compare (short, 8, v2, v0, v1, mask2);
+
+    v2 = __builtin_shuffle (mask0, mask0,  v0);
+    shuf2compare (short, 8, v2, mask0, mask0, v0);
+
+    return 0; 
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c	(revision 0)
@@ -0,0 +1,50 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shuf2compare(type, count, vres, v0, v1, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != ((vidx(type, mask, __i) < count) ? \
+                          vidx(type, v0, vidx(type, mask, __i)) :  \
+                          vidx(type, v1, (vidx(type, mask, __i) - count)))) \
+            __builtin_abort (); \
+        } \
+} while (0)
+
+
+vector (8, short) __attribute__ ((noinline))
+f (vector (8, short) x, vector (8, short) y, vector (8, short) mask) {
+    return __builtin_shuffle (x, y, mask);
+}
+
+
+
+int main (int argc, char *argv[]) {
+    vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+
+    //vector (8, short) mask = {1,2,5,4,3,6,7};
+    
+    vector (8, short) mask0 = {0,2,3,1,4,5,6,7};
+    vector (8, short) mask1 = {0,12,3,4,3,0,10,9};
+    vector (8, short) mask2 = {0,8,1,9,2,10,3,11};
+
+    v2 = f (v0, v1,  mask0);
+    shuf2compare (short, 8, v2, v0, v1, mask0);
+ 
+    v2 = f (v0, v1,  mask1);
+    shuf2compare (short, 8, v2, v0, v1, mask1);
+
+    v2 = f (v0, v1,  mask2);
+    shuf2compare (short, 8, v2, v0, v1, mask2);
+
+    v2 = f (mask0, mask0,  v0);
+    shuf2compare (short, 8, v2, mask0, mask0, v0);
+
+    return 0; 
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c	(revision 0)
@@ -0,0 +1,46 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shufcompare(type, count, vres, v0, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != vidx(type, v0, vidx(type, mask, __i))) \
+            __builtin_abort (); \
+    } \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+    /*vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+   
+    vector (8, short) smask = {0,0,1,2,3,4,5,6};
+    
+    v2 = __builtin_shuffle (v0,  smask);
+    shufcompare (short, 8, v2, v0, smask);
+    v2 = __builtin_shuffle (v0, v1);
+    shufcompare (short, 8, v2, v0, v1);
+    v2 = __builtin_shuffle (smask, v0);
+    shufcompare (short, 8, v2, smask, v0);*/
+
+    vector (4, int) i0 = {argc, 1,2,3};
+    vector (4, int) i1 = {argc, 1, argc, 3};
+    vector (4, int) i2;
+
+    vector (4, int) imask = {0,3,2,1};
+
+    /*i2 = __builtin_shuffle (i0, imask);
+    shufcompare (int, 4, i2, i0, imask);*/
+    i2 = __builtin_shuffle (i0, i1);
+    shufcompare (int, 4, i2, i0, i1);
+    
+    i2 = __builtin_shuffle (imask, i0);
+    shufcompare (int, 4, i2, imask, i0);
+    
+    return 0;
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c	(revision 0)
@@ -0,0 +1,36 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shufcompare(type, count, vres, v0, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != vidx(type, v0, vidx(type, mask, __i))) \
+            __builtin_abort (); \
+    } \
+} while (0)
+
+vector (8, short) __attribute__ ((noinline))
+f (vector (8, short) x, vector (8, short) mask) {
+    return __builtin_shuffle (x, mask);
+}
+
+
+int main (int argc, char *argv[]) {
+    vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+
+    vector (8, short) mask = {0,0,1,2,3,4,5,6};
+    
+    v2 = f (v0,  mask);
+    shufcompare (short, 8, v2, v0, mask);
+
+    v2 = f (v0, v1);
+    shufcompare (short, 8, v2, v0, v1);
+
+    return 0;
+}
+
Index: gcc/builtins.def
===================================================================
--- gcc/builtins.def	(revision 177758)
+++ gcc/builtins.def	(working copy)
@@ -725,6 +725,8 @@ DEF_GCC_BUILTIN        (BUILT_IN_VA_ARG_
 DEF_EXT_LIB_BUILTIN    (BUILT_IN__EXIT, "_exit", BT_FN_VOID_INT, ATTR_NORETURN_NOTHROW_LEAF_LIST)
 DEF_C99_BUILTIN        (BUILT_IN__EXIT2, "_Exit", BT_FN_VOID_INT, ATTR_NORETURN_NOTHROW_LEAF_LIST)
 
+DEF_GCC_BUILTIN        (BUILT_IN_SHUFFLE, "shuffle", BT_FN_INT_VAR, ATTR_CONST_NOTHROW_TYPEGENERIC)
+
 /* Implementing nested functions.  */
 DEF_BUILTIN_STUB (BUILT_IN_INIT_TRAMPOLINE, "__builtin_init_trampoline")
 DEF_BUILTIN_STUB (BUILT_IN_ADJUST_TRAMPOLINE, "__builtin_adjust_trampoline")
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 177758)
+++ gcc/expr.c	(working copy)
@@ -9913,6 +9913,11 @@ expand_expr_real_1 (tree exp, rtx target
     case VEC_COND_EXPR:
       target = expand_vec_cond_expr (type, treeop0, treeop1, treeop2, target);
       return target;
+    
+    case VEC_SHUFFLE_EXPR:
+      target = expand_vec_shuffle_expr (type, treeop0, treeop1, treeop2, target);
+      return target;
+
 
     case MODIFY_EXPR:
       {
Index: gcc/c-typeck.c
===================================================================
--- gcc/c-typeck.c	(revision 177758)
+++ gcc/c-typeck.c	(working copy)
@@ -2815,6 +2815,68 @@ build_function_call_vec (location_t loc,
       && !check_builtin_function_arguments (fundecl, nargs, argarray))
     return error_mark_node;
 
+  /* Typecheck a builtin function which is declared with variable
+     argument list.  */
+  if (fundecl && DECL_BUILT_IN (fundecl)
+      && DECL_BUILT_IN_CLASS (fundecl) == BUILT_IN_NORMAL)
+    {
+      enum built_in_function fcode = DECL_FUNCTION_CODE (fundecl);
+      if (fcode == BUILT_IN_SHUFFLE) 
+        {
+          tree firstarg = VEC_index (tree, params, 0);
+          tree mask = VEC_index (tree, params, nargs - 1);
+
+          if (nargs != 2 && nargs != 3)
+            {
+              error_at (loc, "__builtin_shuffle accepts 2 or 3 argumensts");
+              return error_mark_node;
+            }
+
+          if (TREE_CODE (TREE_TYPE (mask)) != VECTOR_TYPE
+              || TREE_CODE (TREE_TYPE (TREE_TYPE (mask))) != INTEGER_TYPE)
+            {
+              error_at (loc, "__builtin_shuffle last argument must "
+                             "be an integer vector");
+              return error_mark_node;
+            }
+           
+          if (TREE_CODE (TREE_TYPE (firstarg)) != VECTOR_TYPE
+              || (nargs == 3 
+                  && TREE_CODE (TREE_TYPE (VEC_index (tree, params, 1))) 
+                     != VECTOR_TYPE))
+            {
+              error_at (loc, "__builtin_shuffle arguments must be vectors");
+              return error_mark_node;
+            }
+
+          if ((TYPE_VECTOR_SUBPARTS (TREE_TYPE (firstarg)) 
+                 != TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask)))
+              || (nargs == 3 
+                  && TYPE_VECTOR_SUBPARTS (
+                            TREE_TYPE (VEC_index (tree, params, 1)))
+                     != TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask))))
+            {
+              error_at (loc, "__builtin_shuffle number of elements of the "
+                             "argument vector(s) and the mask vector should "
+                             "be the same");
+              return error_mark_node;
+            }
+         
+          /* Here we change the return type of the builtin function 
+             from int f(...) --> t f(...) where t is a type of the 
+             first argument.  */
+          fundecl = copy_node (fundecl);
+          TREE_TYPE (fundecl) = build_function_type (TREE_TYPE (firstarg),
+                                        TYPE_ARG_TYPES (TREE_TYPE (fundecl)));
+          function = build_fold_addr_expr (fundecl);
+          result = build_call_array_loc (loc, TREE_TYPE (firstarg),
+		        function, nargs, argarray);
+          return require_complete_type (result);
+        }
+    }
+
+
+
   /* Check that the arguments to the function are valid.  */
   check_function_arguments (fntype, nargs, argarray);
 
@@ -6120,10 +6182,17 @@ digest_init (location_t init_loc, tree t
 	  tree value;
 	  bool constant_p = true;
 
-	  /* Iterate through elements and check if all constructor
+	  /* If constructor has less elements than the vector type.  */
+          if (CONSTRUCTOR_NELTS (inside_init) 
+              < TYPE_VECTOR_SUBPARTS (TREE_TYPE (inside_init)))
+            warning_at (init_loc, 0, "vector length does not match "
+                                     "initializer length, zero elements "
+                                     "will be inserted");
+          
+          /* Iterate through elements and check if all constructor
 	     elements are *_CSTs.  */
 	  FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (inside_init), ix, value)
-	    if (!CONSTANT_CLASS_P (value))
+	  if (!CONSTANT_CLASS_P (value))
 	      {
 		constant_p = false;
 		break;
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	(revision 177758)
+++ gcc/gimplify.c	(working copy)
@@ -7050,6 +7050,7 @@ gimplify_expr (tree *expr_p, gimple_seq
 	  break;
 
 	case BIT_FIELD_REF:
+	case VEC_SHUFFLE_EXPR:
 	  {
 	    enum gimplify_status r0, r1, r2;
 
Index: gcc/tree.def
===================================================================
--- gcc/tree.def	(revision 177758)
+++ gcc/tree.def	(working copy)
@@ -497,6 +497,14 @@ DEFTREECODE (COND_EXPR, "cond_expr", tcc
 */
 DEFTREECODE (VEC_COND_EXPR, "vec_cond_expr", tcc_expression, 3)
 
+/* Vector shuffle expression. A = VEC_SHUFFLE_EXPR<v0, v1, maks>
+   means
+
+   freach i in length (mask):
+     A = mask[i] < length (v0) ? v0[mask[i]] : v1[mask[i]]
+*/
+DEFTREECODE (VEC_SHUFFLE_EXPR, "vec_shuffle_expr", tcc_expression, 3)
+
 /* Declare local variables, including making RTL and allocating space.
    BIND_EXPR_VARS is a chain of VAR_DECL nodes for the variables.
    BIND_EXPR_BODY is the body, the expression to be computed using
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	(revision 177758)
+++ gcc/tree-vect-generic.c	(working copy)
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "flags.h"
 #include "ggc.h"
+#include "diagnostic.h"
 
 /* Need to include rtl.h, expr.h, etc. for optabs.  */
 #include "expr.h"
@@ -432,6 +433,280 @@ type_for_widest_vector_mode (enum machin
     }
 }
 
+
+/* Build a reference to the element of the vector VECT. Function 
+   returns either the element itself, either BIT_FIELD_REF, or an 
+   ARRAY_REF expression.
+   
+   GSI is requred to insert temporary variables while building a
+   refernece to the element of the vector VECT.
+   
+   PTMPVEC is a pointer to the temporary variable for caching
+   purposes. In case when PTMPVEC is NULL new temporary variable
+   will be created.  */
+static tree
+vector_element (gimple_stmt_iterator *gsi, tree vect, tree idx, tree *ptmpvec)
+{
+  tree type;
+  gimple asgn; 
+  unsigned HOST_WIDE_INT maxval;
+  tree tmpvec; 
+  tree indextype, arraytype;
+  bool need_asgn = true;
+
+  gcc_assert (TREE_CODE (TREE_TYPE (vect)) == VECTOR_TYPE);
+
+  type = TREE_TYPE (vect);
+  if (TREE_CODE (idx) == INTEGER_CST)
+    {
+      unsigned HOST_WIDE_INT index;
+
+      if (!host_integerp (idx, 1)
+           || (index = tree_low_cst (idx, 1)) > TYPE_VECTOR_SUBPARTS (type)-1)
+        return error_mark_node;
+
+      if (TREE_CODE (vect) == VECTOR_CST)
+        {
+            unsigned i;
+            tree vals = TREE_VECTOR_CST_ELTS (vect);
+            for (i = 0; vals; vals = TREE_CHAIN (vals), ++i)
+              if (i == index)
+                 return TREE_VALUE (vals);
+            return error_mark_node;
+        }
+      else if (TREE_CODE (vect) == CONSTRUCTOR)
+        {
+          unsigned i;
+          VEC (constructor_elt, gc) *vals = CONSTRUCTOR_ELTS (vect);
+          constructor_elt *elt;
+
+          for (i = 0; VEC_iterate (constructor_elt, vals, i, elt); i++)
+            if (operand_equal_p (elt->index, idx, 0))
+              return elt->value; 
+          return fold_convert (TREE_TYPE (type), integer_zero_node);
+        }
+      else if (TREE_CODE (vect) == SSA_NAME)
+        {
+          tree el;
+          gimple vectdef = SSA_NAME_DEF_STMT (vect);
+          if (gimple_assign_single_p (vectdef)
+              && (el = vector_element (gsi, gimple_assign_rhs1 (vectdef), 
+                                       idx, ptmpvec)) 
+                 != error_mark_node)
+            return el;
+          else
+            {
+              tree size = TYPE_SIZE (TREE_TYPE (type));
+              tree pos = fold_build2 (MULT_EXPR, TREE_TYPE (idx), 
+                                      idx, size);
+              return fold_build3 (BIT_FIELD_REF, TREE_TYPE (type), 
+                             vect, size, pos);
+            }
+        }
+      else
+        return error_mark_node;
+    }
+  
+  if (!ptmpvec)
+    tmpvec = create_tmp_var (TREE_TYPE (vect), "vectmp");
+  else if (!*ptmpvec)
+    tmpvec = *ptmpvec = create_tmp_var (TREE_TYPE (vect), "vectmp");
+  else
+    {
+      tmpvec = *ptmpvec;
+      need_asgn = false;
+    }
+  
+  if (need_asgn)
+    {
+      TREE_ADDRESSABLE (tmpvec) = 1;
+      asgn = gimple_build_assign (tmpvec, vect);
+      gsi_insert_before (gsi, asgn, GSI_SAME_STMT);
+    }
+
+  maxval = TYPE_VECTOR_SUBPARTS (TREE_TYPE (vect)) -1;
+  indextype = build_index_type (size_int (maxval));
+  arraytype = build_array_type (TREE_TYPE (type), indextype);
+  
+  return build4 (ARRAY_REF, TREE_TYPE (type),
+                 build1 (VIEW_CONVERT_EXPR, arraytype, tmpvec),
+                 idx, NULL_TREE, NULL_TREE);
+
+
+}
+
+/* Lower built-in vector shuffle function. Function can have two or
+   three arguments.
+   When function has two arguments: __builtin_shuffle (v0, mask), 
+   the lowered version would be {v0[mask[0]], v0[mask[1]], ...}
+   MASK and V0 must have the same number of elements.
+        
+   In case of three arguments: __builtin_shuffle (v0, v1, mask)
+   the lowered version would be: 
+         {mask[0] < len(v0) ? v0[mask[0]] : v1[mask[0]], ...}
+   V0 and V1 must have the same type. MASK, V0, V1 must have the
+   same number of arguments.  */
+static void
+lower_builtin_shuffle (gimple_stmt_iterator *gsi, location_t loc)
+{
+#define TRAP_RETURN(new_stmt, stmt, gsi, vec0) \
+do { \
+  new_stmt = gimple_build_call (built_in_decls[BUILT_IN_TRAP], 0); \
+  gsi_insert_before (gsi, new_stmt,  GSI_SAME_STMT); \
+  split_block (gimple_bb (new_stmt), new_stmt); \
+  new_stmt = gimple_build_assign (gimple_call_lhs (stmt), vec0); \
+  gsi_replace (gsi, new_stmt, false); \
+  return; \
+} while (0) 
+ 
+  gimple stmt = gsi_stmt (*gsi);
+  unsigned numargs = gimple_call_num_args (stmt);
+  tree mask = gimple_call_arg (stmt, numargs - 1);
+  tree vec0 = gimple_call_arg (stmt, 0);
+  tree vec1 = gimple_call_arg (stmt, 1);
+  unsigned els = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask));
+  tree type0 = TREE_TYPE (TREE_TYPE (vec0));
+  VEC(constructor_elt,gc) *v = NULL;
+  tree vectype, constr;
+  gimple new_stmt;
+  tree vec0tmp = NULL_TREE, masktmp = NULL_TREE;
+
+  if (expand_vec_shuffle_expr_p (TYPE_MODE (TREE_TYPE (vec0)), vec0, vec1, mask))
+    {
+      tree t;
+
+      t = gimplify_build3 (gsi, VEC_SHUFFLE_EXPR, TREE_TYPE (vec0),
+			   vec0, vec1, mask);
+      new_stmt = gimple_build_assign (gimple_call_lhs (stmt), t);
+      gsi_replace (gsi, new_stmt, false);
+
+      return;
+    }
+
+  
+  if (numargs == 2)
+    {
+      unsigned i;
+      tree vec0tmp = NULL_TREE;
+      
+      v = VEC_alloc (constructor_elt, gc, els);
+      for (i = 0; i < els; i++)
+        {
+          tree idxval, vecel, t;
+	   
+	  idxval = vector_element (gsi, mask, size_int (i), &masktmp);
+          if (idxval == error_mark_node)
+            {
+              warning_at (loc, 0, "Invalid shuffling mask index %i", i);
+              TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+            }
+
+	  vecel = vector_element (gsi, vec0, idxval, &vec0tmp);
+          if (vecel == error_mark_node)
+            {
+              warning_at (loc, 0, "Invalid shuffling arguments");
+              TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+            }
+          
+          t = force_gimple_operand_gsi (gsi, vecel, true, 
+					NULL_TREE, true, GSI_SAME_STMT);
+          CONSTRUCTOR_APPEND_ELT (v, size_int (i), t);
+        }
+    }
+  else if (numargs == 3) 
+    {
+      unsigned i;
+      tree var = create_tmp_var (type0, "vecel");
+      tree vec1tmp = NULL_TREE;
+
+      v = VEC_alloc (constructor_elt, gc, els);
+      for (i = 0; i < els; i++)
+        {
+          tree idxval, idx1val, cond, elval0, elval1, condexpr, t, ssatmp;
+          tree vec0el, vec1el;
+          gimple asgn;
+          
+          idxval = vector_element (gsi, mask, size_int (i), &masktmp);
+	  if (idxval == error_mark_node)
+            {
+              warning_at (loc, 0, "Invalid shuffling mask index %i", i);
+              TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+            }
+          
+          if (TREE_CODE (idxval) == INTEGER_CST)
+            {
+              if (tree_int_cst_lt (idxval, size_int (els)))
+                {
+                  vec0el = vector_element (gsi, vec0, idxval, &vec0tmp);
+                  t = force_gimple_operand_gsi (gsi, vec0el,
+                                    true, NULL_TREE, true, GSI_SAME_STMT);
+                }
+              else if (tree_int_cst_lt (idxval, size_int (2*els)))
+                {
+                  idx1val = fold_build2 (MINUS_EXPR, TREE_TYPE (idxval),
+                        idxval, build_int_cst (TREE_TYPE (idxval), els));
+                  
+                  vec1el = vector_element (gsi, vec1, idx1val, &vec1tmp);
+                  t = force_gimple_operand_gsi (gsi, vec1el, true, 
+						NULL_TREE, true, GSI_SAME_STMT);
+                }
+              else
+                {
+                  warning_at (loc, 0, "Invalid shuffling mask index %i", i);
+                  TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+                }
+            }
+          else
+            {
+
+              idx1val = fold_build2 (MINUS_EXPR, TREE_TYPE (idxval),
+                            idxval, build_int_cst (TREE_TYPE (idxval), els));
+              idx1val = force_gimple_operand_gsi (gsi, idx1val, 
+                                true, NULL_TREE, true, GSI_SAME_STMT);
+              cond = build2 (GT_EXPR, boolean_type_node, \
+                             idxval, convert (type0, size_int (els - 1)));
+              
+	      vec0el = vector_element (gsi, vec0, idxval, &vec0tmp);
+              if (vec0el == error_mark_node)
+                {
+                  warning_at (loc, 0, "Invalid shuffling arguments");
+                  TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+                }
+
+              elval0 = force_gimple_operand_gsi (gsi, vec0el, 
+                                true, NULL_TREE, true, GSI_SAME_STMT);
+	      
+	      vec1el = vector_element (gsi, vec1, idx1val, &vec1tmp);
+              if (vec1el == error_mark_node)
+                {
+                  warning_at (loc, 0, "Invalid shuffling arguments");
+                  TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+                }
+
+              elval1 = force_gimple_operand_gsi (gsi, vec1el,
+                                true, NULL_TREE, true, GSI_SAME_STMT);
+
+              condexpr = fold_build3 (COND_EXPR, type0, cond, \
+                                      elval1, elval0);
+
+              t = force_gimple_operand_gsi (gsi, condexpr, true, \
+                                        NULL_TREE, true, GSI_SAME_STMT);
+            }
+          
+          asgn = gimple_build_assign (var, t);
+          ssatmp = make_ssa_name (var, asgn);
+          gimple_assign_set_lhs (asgn, ssatmp);
+          gsi_insert_before (gsi, asgn, GSI_SAME_STMT);
+          CONSTRUCTOR_APPEND_ELT (v, size_int (i), ssatmp);
+        }
+    }
+  
+  vectype = build_vector_type (type0, els);
+  constr = build_constructor (vectype, v);
+  new_stmt = gimple_build_assign (gimple_call_lhs (stmt), constr);
+  gsi_replace (gsi, new_stmt, false);
+}
+
 /* Process one statement.  If we identify a vector operation, expand it.  */
 
 static void
@@ -445,6 +720,13 @@ expand_vector_operations_1 (gimple_stmt_
   enum gimple_rhs_class rhs_class;
   tree new_rhs;
 
+  if (gimple_call_builtin_p (stmt, BUILT_IN_SHUFFLE))
+    {
+      lower_builtin_shuffle (gsi, gimple_location (stmt));
+      gimple_set_modified (gsi_stmt (*gsi), true);
+      update_stmt (gsi_stmt (*gsi));
+    }
+  
   if (gimple_code (stmt) != GIMPLE_ASSIGN)
     return;
 
@@ -612,10 +894,11 @@ expand_vector_operations_1 (gimple_stmt_
 /* Use this to lower vector operations introduced by the vectorizer,
    if it may need the bit-twiddling tricks implemented in this file.  */
 
+
 static bool
-gate_expand_vector_operations (void)
+gate_expand_vector_operations_noop (void)
 {
-  return flag_tree_vectorize != 0;
+  return optimize == 0;
 }
 
 static unsigned int
@@ -648,7 +931,7 @@ struct gimple_opt_pass pass_lower_vector
  {
   GIMPLE_PASS,
   "veclower",				/* name */
-  0,					/* gate */
+  gate_expand_vector_operations_noop,   /* gate */
   expand_vector_operations,		/* execute */
   NULL,					/* sub */
   NULL,					/* next */
@@ -660,7 +943,8 @@ struct gimple_opt_pass pass_lower_vector
   0,					/* todo_flags_start */
   TODO_update_ssa	                /* todo_flags_finish */
     | TODO_verify_ssa
-    | TODO_verify_stmts | TODO_verify_flow
+    | TODO_verify_stmts | TODO_verify_flow 
+    | TODO_cleanup_cfg
  }
 };
 
@@ -669,7 +953,7 @@ struct gimple_opt_pass pass_lower_vector
  {
   GIMPLE_PASS,
   "veclower2",				/* name */
-  gate_expand_vector_operations,	/* gate */
+  0,	                                /* gate */
   expand_vector_operations,		/* execute */
   NULL,					/* sub */
   NULL,					/* next */
@@ -682,6 +966,7 @@ struct gimple_opt_pass pass_lower_vector
   TODO_update_ssa	                /* todo_flags_finish */
     | TODO_verify_ssa
     | TODO_verify_stmts | TODO_verify_flow
+    | TODO_cleanup_cfg
  }
 };
 
Index: gcc/gimple.c
===================================================================
--- gcc/gimple.c	(revision 177758)
+++ gcc/gimple.c	(working copy)
@@ -2623,6 +2623,7 @@ get_gimple_rhs_num_ops (enum tree_code c
       || (SYM) == ADDR_EXPR						    \
       || (SYM) == WITH_SIZE_EXPR					    \
       || (SYM) == SSA_NAME						    \
+      || (SYM) == VEC_SHUFFLE_EXPR					    \
       || (SYM) == VEC_COND_EXPR) ? GIMPLE_SINGLE_RHS			    \
    : GIMPLE_INVALID_RHS),
 #define END_OF_BASE_TREE_CODES (unsigned char) GIMPLE_INVALID_RHS,
Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 177758)
+++ gcc/passes.c	(working copy)
@@ -1354,7 +1354,6 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_vectorize);
 	    {
 	      struct opt_pass **p = &pass_vectorize.pass.sub;
-	      NEXT_PASS (pass_lower_vector_ssa);
 	      NEXT_PASS (pass_dce_loop);
 	    }
           NEXT_PASS (pass_predcom);
@@ -1366,6 +1365,7 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_lim);
 	  NEXT_PASS (pass_tree_loop_done);
 	}
+      NEXT_PASS (pass_lower_vector_ssa);
       NEXT_PASS (pass_cse_reciprocals);
       NEXT_PASS (pass_reassoc);
       NEXT_PASS (pass_vrp);
Index: gcc/config/i386/sse.md
===================================================================
--- gcc/config/i386/sse.md	(revision 177758)
+++ gcc/config/i386/sse.md	(working copy)
@@ -127,6 +127,12 @@ (define_mode_attr sseinsnmode
    (V8SF "V8SF") (V4DF "V4DF")
    (V4SF "V4SF") (V2DF "V2DF")])
 
+;; All 128bit vector modes
+(define_mode_attr sseshuffint
+  [(V16QI "V16QI") (V8HI "V8HI") 
+   (V4SI "V4SI")  (V2DI "V2DI")
+   (V4SF "V4SI") (V2DF "V2DI")])
+
 ;; Mapping of vector float modes to an integer mode of the same size
 (define_mode_attr sseintvecmode
   [(V8SF "V8SI") (V4DF "V4DI")
@@ -5670,6 +5676,18 @@ (define_expand "vconduv2di"
   DONE;
 })
 
+(define_expand "vshuffle<mode>"
+  [(match_operand:V_128 0 "register_operand" "")
+   (match_operand:V_128 1 "general_operand" "")
+   (match_operand:<sseshuffint> 2 "general_operand" "")]
+  "TARGET_SSE3 || TARGET_AVX"
+{
+  bool ok = ix86_expand_vshuffle (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel bitwise logical operations
Index: gcc/config/i386/i386-protos.h
===================================================================
--- gcc/config/i386/i386-protos.h	(revision 177758)
+++ gcc/config/i386/i386-protos.h	(working copy)
@@ -118,6 +118,7 @@ extern bool ix86_expand_int_movcc (rtx[]
 extern bool ix86_expand_fp_movcc (rtx[]);
 extern bool ix86_expand_fp_vcond (rtx[]);
 extern bool ix86_expand_int_vcond (rtx[]);
+extern bool ix86_expand_vshuffle (rtx[]);
 extern void ix86_expand_sse_unpack (rtx[], bool, bool);
 extern bool ix86_expand_int_addcc (rtx[]);
 extern rtx ix86_expand_call (rtx, rtx, rtx, rtx, rtx, bool);
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 177758)
+++ gcc/config/i386/i386.c	(working copy)
@@ -18703,6 +18703,96 @@ ix86_expand_int_vcond (rtx operands[])
   return true;
 }
 
+bool
+ix86_expand_vshuffle (rtx operands[])
+{
+  rtx target = operands[0];
+  rtx op0 = operands[1];
+  rtx mask = operands[2];
+  rtx mm, vt, cv0, t1;
+  enum machine_mode mode = GET_MODE (op0);
+  enum machine_mode maskmode = GET_MODE (mask);
+  enum machine_mode maskinner = GET_MODE_INNER (mode);
+  rtx vec[16];
+  int w, i, j;
+
+  gcc_assert ((TARGET_SSE3 || TARGET_AVX) && GET_MODE_BITSIZE (mode) == 128);
+
+  op0 = force_reg (mode, op0);
+  mask = force_reg (maskmode, mask);
+
+  /* Number of elements in the vector.  */
+  w = GET_MODE_BITSIZE (maskmode) / GET_MODE_BITSIZE (maskinner);
+ 
+  /* mask = mask & {w-1, w-1, w-1,...} */
+  for (i = 0; i < w; i++)
+    vec[i] = GEN_INT (w - 1);
+
+  mm = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (w, vec));
+  mm = force_reg (maskmode, mm);
+
+  mask = gen_rtx_AND (maskmode, mask, mm);
+  
+  /* Convert mask to vector of chars.  */
+  mask = simplify_gen_subreg (V16QImode, mask, maskmode, 0);
+  mask = force_reg (V16QImode, mask);
+
+
+  /* Build a helper mask wich we will use in pshufb
+     (v4si) --> {0,0,0,0, 4,4,4,4, 8,8,8,8, 12,12,12,12}
+     (v8hi) --> {0,0, 2,2, 4,4, 6,6, ...}
+     ...  */
+  for (i = 0; i < w; i++)
+    for (j = 0; j < 16/w; j++)
+      vec[i*w+j] = GEN_INT (i*16/w);
+
+  vt = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
+  vt = force_reg (V16QImode, vt);
+  
+  t1 = gen_reg_rtx (V16QImode);
+  emit_insn (gen_ssse3_pshufbv16qi3 (t1, mask, vt));
+  mm = t1;
+
+  /* MM contains now something like
+     mm = {m[0], .., m[0], m[k], .., m[k], ... }, where 
+     m[i] is an index of the element in the vector we are
+     selecting from.
+
+     Convert it into the byte positions by doing
+     mm = mm * {16/w, 16/w, ...}
+     mm = mm + {0,1,..,16/w, 0,1,..,16/w, ...}  */
+  for (i = 0; i < 16; i++)
+    vec[i] = GEN_INT (16/w);
+
+  cv0 = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
+  cv0 = force_reg (V16QImode, cv0);
+  mm = gen_rtx_MULT (V16QImode, mm, cv0);
+
+  for (i = 0; i < w; i++)
+    for (j = 0; j < 16/w; j++)
+      vec[i*w+j] = GEN_INT (j);
+
+  cv0 = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
+  cv0 = force_reg (V16QImode, cv0);
+  mm = gen_rtx_PLUS (V16QImode, mm, cv0);
+  mm = force_reg (V16QImode, mm);
+
+  t1 = gen_reg_rtx (V16QImode);
+  
+  /* Convert OP0 to vector of chars.  */
+  op0 = simplify_gen_subreg (V16QImode, op0, mode, 0);
+  op0 = force_reg (V16QImode, op0);
+  emit_insn (gen_ssse3_pshufbv16qi3 (t1, op0, mm));
+  
+  /* Convert it back from vector of chars to the original mode.  */
+  t1 = simplify_gen_subreg (mode, t1, V16QImode, 0);
+  
+  emit_insn (gen_rtx_SET (VOIDmode, target, t1));
+ 
+  fprintf (stderr, "-- %s called\n", __func__);
+  return true;
+}
+
 /* Unpack OP[1] into the next wider integer vector type.  UNSIGNED_P is
    true if we should do zero extension, else sign extension.  HIGH_P is
    true if we want the N/2 high elements, else the low elements.  */
@@ -30297,6 +30387,9 @@ struct expand_vec_perm_d
 
 static bool expand_vec_perm_1 (struct expand_vec_perm_d *d);
 static bool expand_vec_perm_broadcast_1 (struct expand_vec_perm_d *d);
+static int extract_vec_perm_cst (struct expand_vec_perm_d *, tree);
+static bool ix86_vectorize_builtin_vec_perm_ok (tree vec_type, tree mask);
+
 
 /* Get a vector mode of the same size as the original but with elements
    twice as wide.  This is only guaranteed to apply to integral vectors.  */
@@ -33960,10 +34053,10 @@ ix86_vectorize_builtin_vec_perm_ok (tree
 
   vec_mask = extract_vec_perm_cst (&d, mask);
 
-  /* This hook is cannot be called in response to something that the
-     user does (unlike the builtin expander) so we shouldn't ever see
-     an error generated from the extract.  */
-  gcc_assert (vec_mask > 0 && vec_mask <= 3);
+  /* Check whether the mask can be applied to the vector type.  */
+  if (vec_mask < 0 || vec_mask > 3)
+    return false;
+  
   one_vec = (vec_mask != 3);
 
   /* Implementable with shufps or pshufd.  */
Index: gcc/tree-ssa-operands.c
===================================================================
--- gcc/tree-ssa-operands.c	(revision 177758)
+++ gcc/tree-ssa-operands.c	(working copy)
@@ -943,6 +943,7 @@ get_expr_operands (gimple stmt, tree *ex
 
     case COND_EXPR:
     case VEC_COND_EXPR:
+    case VEC_SHUFFLE_EXPR:
       get_expr_operands (stmt, &TREE_OPERAND (expr, 0), uflags);
       get_expr_operands (stmt, &TREE_OPERAND (expr, 1), uflags);
       get_expr_operands (stmt, &TREE_OPERAND (expr, 2), uflags);

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-08-30  7:17 Vector shuffling Artem Shinkarov
@ 2011-08-30 13:50 ` Richard Guenther
  2011-08-30 19:46   ` Joseph S. Myers
  2011-08-30 20:36   ` Artem Shinkarov
  0 siblings, 2 replies; 71+ messages in thread
From: Richard Guenther @ 2011-08-30 13:50 UTC (permalink / raw)
  To: Artem Shinkarov; +Cc: gcc-patches, Richard Henderson, Joseph S. Myers

On Tue, Aug 30, 2011 at 4:31 AM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> Hi
>
> This is a patch for the explicit vector shuffling we have discussed a
> long time ago here:
> http://gcc.gnu.org/ml/gcc-patches/2010-08/msg01092.html
>
> The new patch introduces the new tree code, as we agreed, and expands
> this code by checking the vshuffle pattern in the backend.
>
> The patch at the moment lacks of some examples, but mainly it works
> fine for me. It would be nice if i386 gurus could look into the way I
> am doing the expansion.
>
> Middle-end parts seems to be more or less fine, they have not changed
> much from the previous time.

+@code{__builtin_shuffle (vec, mask)} and
+@code{__builtin_shuffle (vec0, vec1, mask)}. Both functions construct

the latter would be __builtin_shuffle2.


+bool
+expand_vec_shuffle_expr_p (enum machine_mode mode, tree v0,
+			   tree v1, tree mask)
+{
+#define inner_type_size(vec) \
+  GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (vec))))

missing comment.  No #defines like this please, just initialize
two temporary variables.

+
+rtx
+expand_vec_shuffle_expr (tree type, tree v0, tree v1, tree mask, rtx target)
+{

comment.

+vshuffle:
+  gcc_assert (v1 == v0);
+
+  icode = direct_optab_handler (vshuffle_optab, mode);

hmm, so we don't have a vshuffle2 optab but always go via the
builtin function, but only for constant masks there?  I wonder
if we should arrange for targets to only support a vshuffle
optab (thus, transition away from the builtin) and so
unconditionally have a vshuffle2 optab only (with possibly
equivalent v1 and v0?)

I suppose Richard might remember what he had in mind back
when we discussed this.

Index: gcc/c-typeck.c
===================================================================
--- gcc/c-typeck.c	(revision 177758)
+++ gcc/c-typeck.c	(working copy)
@@ -2815,6 +2815,68 @@ build_function_call_vec (location_t loc,
       && !check_builtin_function_arguments (fundecl, nargs, argarray))
     return error_mark_node;

+  /* Typecheck a builtin function which is declared with variable
+     argument list.  */
+  if (fundecl && DECL_BUILT_IN (fundecl)
+      && DECL_BUILT_IN_CLASS (fundecl) == BUILT_IN_NORMAL)

just add to check_builtin_function_arguments which is called right
in front of your added code.

+          /* Here we change the return type of the builtin function
+             from int f(...) --> t f(...) where t is a type of the
+             first argument.  */
+          fundecl = copy_node (fundecl);
+          TREE_TYPE (fundecl) = build_function_type (TREE_TYPE (firstarg),
+                                        TYPE_ARG_TYPES (TREE_TYPE (fundecl)));
+          function = build_fold_addr_expr (fundecl);

oh, hum - now I remember ;)  Eventually the C frontend should handle
this not via the function call mechanism but similar to how Joseph
added __builtin_complex support with

2011-08-19  Joseph Myers  <joseph@codesourcery.com>

        * c-parser.c (c_parser_postfix_expression): Handle RID_BUILTIN_COMPLEX.
        * doc/extend.texi (__builtin_complex): Document.

and then emit VEC_SHUFFLE_EXPRs directly from the frontend.  Joseph?

 	  FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (inside_init), ix, value)
-	    if (!CONSTANT_CLASS_P (value))
+	  if (!CONSTANT_CLASS_P (value))

watch out for spurious whitespace changes.

Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	(revision 177758)
+++ gcc/gimplify.c	(working copy)
@@ -7050,6 +7050,7 @@ gimplify_expr (tree *expr_p, gimple_seq
 	  break;

 	case BIT_FIELD_REF:
+	case VEC_SHUFFLE_EXPR:

I don't think that's quite the right place given the is_gimple_lvalue
predicate on the first operand.  More like

        case VEC_SHUFFLE_EXPR:
           goto expr_3;

+/* Vector shuffle expression. A = VEC_SHUFFLE_EXPR<v0, v1, maks>

typo, mask

+   means
+
+   freach i in length (mask):
+     A = mask[i] < length (v0) ? v0[mask[i]] : v1[mask[i]]
+*/
+DEFTREECODE (VEC_SHUFFLE_EXPR, "vec_shuffle_expr", tcc_expression, 3)

what is the (is there any?) constraint on the operand types, especially
the mask type?

Index: gcc/gimple.c
===================================================================
--- gcc/gimple.c	(revision 177758)
+++ gcc/gimple.c	(working copy)
@@ -2623,6 +2623,7 @@ get_gimple_rhs_num_ops (enum tree_code c
       || (SYM) == ADDR_EXPR						    \
       || (SYM) == WITH_SIZE_EXPR					    \
       || (SYM) == SSA_NAME						    \
+      || (SYM) == VEC_SHUFFLE_EXPR					    \
       || (SYM) == VEC_COND_EXPR) ? GIMPLE_SINGLE_RHS			    \
    : GIMPLE_INVALID_RHS),
 #define END_OF_BASE_TREE_CODES (unsigned char) GIMPLE_INVALID_RHS,

please make it GIMPLE_TERNARY_RHS instead.

which requires adjustment at least here:

Index: gcc/tree-ssa-operands.c
===================================================================
--- gcc/tree-ssa-operands.c	(revision 177758)
+++ gcc/tree-ssa-operands.c	(working copy)
@@ -943,6 +943,7 @@ get_expr_operands (gimple stmt, tree *ex

     case COND_EXPR:
     case VEC_COND_EXPR:
+    case VEC_SHUFFLE_EXPR:


I think it would be nicer if the builtin would be handled by the frontend
not as builtin but like __builtin_complex and we'd just deal with
VEC_SHUFFLE_EXPR throughout the middle-end, eventually
lowering it in tree-vect-generic.c.  So I didn't look at the lowering
code in detail because that would obviously change then.

Defering to Joseph for a decision here and to x86 maintainers for
the target specific bits.

Thanks,
Richard.

> ChangeLog:
> 2011-08-30 Artjoms Sinkarovs <artyom.shinkaroff@gmailc.com>
>
>        gcc/
>        * optabs.c (expand_vec_shuffle_expr_p): New function. Checks
>        if given expression can be expanded by the target.
>        (expand_vec_shuffle_expr): New function. Expand VEC_SHUFFLE_EXPR
>        using target vector instructions.
>        * optabs.h: New optab vshuffle.
>        (expand_vec_shuffle_expr_p): New prototype.
>        (expand_vec_shuffle_expr): New prototype.
>        * genopinit.c: Adjust to support vecshuffle.
>        * builtins.def: New builtin __builtin_shuffle.
>        * c-typeck.c (build_function_call_vec): Typecheck
>        __builtin_shuffle, allowing only two or three arguments.
>        Change the type of builtin depending on the arguments.
>        (digest_init): Warn when constructor has less elements than
>        vector type.
>        * gimplify.c (gimplify_exp): Adjusted to support VEC_SHUFFLE_EXPR.
>        * tree.def: New tree code VEC_SHUFFLE_EXPR.
>        * tree-vect-generic.c (vector_element): New function. Returns an
>        element of the vector at the given position.
>        (lower_builtin_shuffle): Change builtin_shuffle with VEC_SHUFLLE_EXPR
>        or expand an expression piecewise.
>        (expand_vector_operations_1): Adjusted.
>        (gate_expand_vector_operations_noop): New gate function.
>        * gimple.c (get_gimple_rhs_num_ops): Adjust.
>        * passes.c: Move veclower down.
>        * tree-pretty-print.c (dump_generic_node): Recognize
>        VEC_SHUFFLE_EXPR as valid expression.
>        * tree-ssa-operands: Adjust.
>
>        gcc/config/i386
>        * sse.md: (sseshuffint) New mode_attr. Correspondence between the
>        vector and the type of the mask when shuffling.
>        (vecshuffle<mode>): New expansion.
>        * i386-protos.h (ix86_expand_vshuffle): New prototype.
>        * i386.c (ix86_expand_vshuffle): Expand vshuffle using pshufb.
>        (ix86_vectorize_builtin_vec_perm_ok): Adjust.
>
>        gcc/doc
>        * extend.texi: Adjust.
>
>        gcc/testsuite
>        * gcc.c-torture/execute/vect-shuffle-2.c: New test.
>        * gcc.c-torture/execute/vect-shuffle-4.c: New test.
>        * gcc.c-torture/execute/vect-shuffle-1.c: New test.
>        * gcc.c-torture/execute/vect-shuffle-3.c: New test.
>
> bootstrapped on x86_64-unknown-linux-gnu. The AVX parts are not
> tested, because I don't have actual hardware. It works with -mavx, the
> assembler code looks fine to me. I'll test it on a real hardware in
> couple of days.
>
>
>
> Thanks,
> Artem Shinkarov.
>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-08-30 13:50 ` Richard Guenther
@ 2011-08-30 19:46   ` Joseph S. Myers
  2011-08-30 20:36   ` Artem Shinkarov
  1 sibling, 0 replies; 71+ messages in thread
From: Joseph S. Myers @ 2011-08-30 19:46 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Artem Shinkarov, gcc-patches, Richard Henderson

On Tue, 30 Aug 2011, Richard Guenther wrote:

> oh, hum - now I remember ;)  Eventually the C frontend should handle
> this not via the function call mechanism but similar to how Joseph
> added __builtin_complex support with
> 
> 2011-08-19  Joseph Myers  <joseph@codesourcery.com>
> 
>         * c-parser.c (c_parser_postfix_expression): Handle RID_BUILTIN_COMPLEX.
>         * doc/extend.texi (__builtin_complex): Document.
> 
> and then emit VEC_SHUFFLE_EXPRs directly from the frontend.  Joseph?

It's probably time to refactor the parsing code before adding yet another 
pseudo-builtin.  Considering just those all of whose operands are 
expressions (there are more where types are involved), we have 
__builtin_complex (two operands) and __builtin_choose_expr (three 
operands).  How about a helper that parses a parenthesized list of 
expressions (using c_parser_expr_list, disabling all folding and 
conversions), gives an error if the number of expressions is wrong, then 
returns an error status and the list?  Pass the keyword to this function 
and it can give a "wrong number of arguments" error that says which 
pseudo-builtin is involved, rather than less friendly parse errors - so 
these things would act a bit more like built-in functions while still 
being purely front-end syntax for GENERIC and GIMPLE operations.  Then 
c_parser_postfix_expression would only have the code that deals with 
semantics, without duplicating the generic code for parsing lists.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-08-30 13:50 ` Richard Guenther
  2011-08-30 19:46   ` Joseph S. Myers
@ 2011-08-30 20:36   ` Artem Shinkarov
  2011-08-31  7:53     ` Chris Lattner
  2011-08-31  8:59     ` Richard Guenther
  1 sibling, 2 replies; 71+ messages in thread
From: Artem Shinkarov @ 2011-08-30 20:36 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, Richard Henderson, Joseph S. Myers

On Tue, Aug 30, 2011 at 2:03 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Tue, Aug 30, 2011 at 4:31 AM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> Hi
>>
>> This is a patch for the explicit vector shuffling we have discussed a
>> long time ago here:
>> http://gcc.gnu.org/ml/gcc-patches/2010-08/msg01092.html
>>
>> The new patch introduces the new tree code, as we agreed, and expands
>> this code by checking the vshuffle pattern in the backend.
>>
>> The patch at the moment lacks of some examples, but mainly it works
>> fine for me. It would be nice if i386 gurus could look into the way I
>> am doing the expansion.
>>
>> Middle-end parts seems to be more or less fine, they have not changed
>> much from the previous time.
>
> +@code{__builtin_shuffle (vec, mask)} and
> +@code{__builtin_shuffle (vec0, vec1, mask)}. Both functions construct
>
> the latter would be __builtin_shuffle2.

Why??
That was the syntax we agreed on that elegantly handles both cases in one place.

> +bool
> +expand_vec_shuffle_expr_p (enum machine_mode mode, tree v0,
> +                          tree v1, tree mask)
> +{
> +#define inner_type_size(vec) \
> +  GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (vec))))
>
> missing comment.  No #defines like this please, just initialize
> two temporary variables.
>
> +
> +rtx
> +expand_vec_shuffle_expr (tree type, tree v0, tree v1, tree mask, rtx target)
> +{
>
> comment.
>
> +vshuffle:
> +  gcc_assert (v1 == v0);
> +
> +  icode = direct_optab_handler (vshuffle_optab, mode);
>
> hmm, so we don't have a vshuffle2 optab but always go via the
> builtin function, but only for constant masks there?  I wonder
> if we should arrange for targets to only support a vshuffle
> optab (thus, transition away from the builtin) and so
> unconditionally have a vshuffle2 optab only (with possibly
> equivalent v1 and v0?)

I have only implemented the case with non-constant mask that supports
only one argument. I think that it would be enough for the first
version. Later we can introduce vshuffle2 pattern and reuse the code
that expands vshuffle at the moment.

> I suppose Richard might remember what he had in mind back
> when we discussed this.
>
> Index: gcc/c-typeck.c
> ===================================================================
> --- gcc/c-typeck.c      (revision 177758)
> +++ gcc/c-typeck.c      (working copy)
> @@ -2815,6 +2815,68 @@ build_function_call_vec (location_t loc,
>       && !check_builtin_function_arguments (fundecl, nargs, argarray))
>     return error_mark_node;
>
> +  /* Typecheck a builtin function which is declared with variable
> +     argument list.  */
> +  if (fundecl && DECL_BUILT_IN (fundecl)
> +      && DECL_BUILT_IN_CLASS (fundecl) == BUILT_IN_NORMAL)
>
> just add to check_builtin_function_arguments which is called right
> in front of your added code.
>
> +          /* Here we change the return type of the builtin function
> +             from int f(...) --> t f(...) where t is a type of the
> +             first argument.  */
> +          fundecl = copy_node (fundecl);
> +          TREE_TYPE (fundecl) = build_function_type (TREE_TYPE (firstarg),
> +                                        TYPE_ARG_TYPES (TREE_TYPE (fundecl)));
> +          function = build_fold_addr_expr (fundecl);
>
> oh, hum - now I remember ;)  Eventually the C frontend should handle
> this not via the function call mechanism but similar to how Joseph
> added __builtin_complex support with
>
> 2011-08-19  Joseph Myers  <joseph@codesourcery.com>
>
>        * c-parser.c (c_parser_postfix_expression): Handle RID_BUILTIN_COMPLEX.
>        * doc/extend.texi (__builtin_complex): Document.
>
> and then emit VEC_SHUFFLE_EXPRs directly from the frontend.  Joseph?
>
>          FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (inside_init), ix, value)
> -           if (!CONSTANT_CLASS_P (value))
> +         if (!CONSTANT_CLASS_P (value))
>
> watch out for spurious whitespace changes.
>
> Index: gcc/gimplify.c
> ===================================================================
> --- gcc/gimplify.c      (revision 177758)
> +++ gcc/gimplify.c      (working copy)
> @@ -7050,6 +7050,7 @@ gimplify_expr (tree *expr_p, gimple_seq
>          break;
>
>        case BIT_FIELD_REF:
> +       case VEC_SHUFFLE_EXPR:
>
> I don't think that's quite the right place given the is_gimple_lvalue
> predicate on the first operand.  More like
>
>        case VEC_SHUFFLE_EXPR:
>           goto expr_3;
>
> +/* Vector shuffle expression. A = VEC_SHUFFLE_EXPR<v0, v1, maks>
>
> typo, mask
>
> +   means
> +
> +   freach i in length (mask):
> +     A = mask[i] < length (v0) ? v0[mask[i]] : v1[mask[i]]
> +*/
> +DEFTREECODE (VEC_SHUFFLE_EXPR, "vec_shuffle_expr", tcc_expression, 3)
>
> what is the (is there any?) constraint on the operand types, especially
> the mask type?
>
> Index: gcc/gimple.c
> ===================================================================
> --- gcc/gimple.c        (revision 177758)
> +++ gcc/gimple.c        (working copy)
> @@ -2623,6 +2623,7 @@ get_gimple_rhs_num_ops (enum tree_code c
>       || (SYM) == ADDR_EXPR                                                \
>       || (SYM) == WITH_SIZE_EXPR                                           \
>       || (SYM) == SSA_NAME                                                 \
> +      || (SYM) == VEC_SHUFFLE_EXPR                                         \
>       || (SYM) == VEC_COND_EXPR) ? GIMPLE_SINGLE_RHS                       \
>    : GIMPLE_INVALID_RHS),
>  #define END_OF_BASE_TREE_CODES (unsigned char) GIMPLE_INVALID_RHS,
>
> please make it GIMPLE_TERNARY_RHS instead.
>
> which requires adjustment at least here:
>
> Index: gcc/tree-ssa-operands.c
> ===================================================================
> --- gcc/tree-ssa-operands.c     (revision 177758)
> +++ gcc/tree-ssa-operands.c     (working copy)
> @@ -943,6 +943,7 @@ get_expr_operands (gimple stmt, tree *ex
>
>     case COND_EXPR:
>     case VEC_COND_EXPR:
> +    case VEC_SHUFFLE_EXPR:
>
>
> I think it would be nicer if the builtin would be handled by the frontend
> not as builtin but like __builtin_complex and we'd just deal with
> VEC_SHUFFLE_EXPR throughout the middle-end, eventually
> lowering it in tree-vect-generic.c.  So I didn't look at the lowering
> code in detail because that would obviously change then.
>
> Defering to Joseph for a decision here and to x86 maintainers for
> the target specific bits.
>
> Thanks,
> Richard.

I'll go and see how the __builtin_complex are treated, and try to
adjust the patch.


Thanks,
Artem.

>> ChangeLog:
>> 2011-08-30 Artjoms Sinkarovs <artyom.shinkaroff@gmailc.com>
>>
>>        gcc/
>>        * optabs.c (expand_vec_shuffle_expr_p): New function. Checks
>>        if given expression can be expanded by the target.
>>        (expand_vec_shuffle_expr): New function. Expand VEC_SHUFFLE_EXPR
>>        using target vector instructions.
>>        * optabs.h: New optab vshuffle.
>>        (expand_vec_shuffle_expr_p): New prototype.
>>        (expand_vec_shuffle_expr): New prototype.
>>        * genopinit.c: Adjust to support vecshuffle.
>>        * builtins.def: New builtin __builtin_shuffle.
>>        * c-typeck.c (build_function_call_vec): Typecheck
>>        __builtin_shuffle, allowing only two or three arguments.
>>        Change the type of builtin depending on the arguments.
>>        (digest_init): Warn when constructor has less elements than
>>        vector type.
>>        * gimplify.c (gimplify_exp): Adjusted to support VEC_SHUFFLE_EXPR.
>>        * tree.def: New tree code VEC_SHUFFLE_EXPR.
>>        * tree-vect-generic.c (vector_element): New function. Returns an
>>        element of the vector at the given position.
>>        (lower_builtin_shuffle): Change builtin_shuffle with VEC_SHUFLLE_EXPR
>>        or expand an expression piecewise.
>>        (expand_vector_operations_1): Adjusted.
>>        (gate_expand_vector_operations_noop): New gate function.
>>        * gimple.c (get_gimple_rhs_num_ops): Adjust.
>>        * passes.c: Move veclower down.
>>        * tree-pretty-print.c (dump_generic_node): Recognize
>>        VEC_SHUFFLE_EXPR as valid expression.
>>        * tree-ssa-operands: Adjust.
>>
>>        gcc/config/i386
>>        * sse.md: (sseshuffint) New mode_attr. Correspondence between the
>>        vector and the type of the mask when shuffling.
>>        (vecshuffle<mode>): New expansion.
>>        * i386-protos.h (ix86_expand_vshuffle): New prototype.
>>        * i386.c (ix86_expand_vshuffle): Expand vshuffle using pshufb.
>>        (ix86_vectorize_builtin_vec_perm_ok): Adjust.
>>
>>        gcc/doc
>>        * extend.texi: Adjust.
>>
>>        gcc/testsuite
>>        * gcc.c-torture/execute/vect-shuffle-2.c: New test.
>>        * gcc.c-torture/execute/vect-shuffle-4.c: New test.
>>        * gcc.c-torture/execute/vect-shuffle-1.c: New test.
>>        * gcc.c-torture/execute/vect-shuffle-3.c: New test.
>>
>> bootstrapped on x86_64-unknown-linux-gnu. The AVX parts are not
>> tested, because I don't have actual hardware. It works with -mavx, the
>> assembler code looks fine to me. I'll test it on a real hardware in
>> couple of days.
>>
>>
>>
>> Thanks,
>> Artem Shinkarov.
>>
>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-08-30 20:36   ` Artem Shinkarov
@ 2011-08-31  7:53     ` Chris Lattner
  2011-08-31  9:00       ` Richard Guenther
  2011-08-31  9:02       ` Artem Shinkarov
  2011-08-31  8:59     ` Richard Guenther
  1 sibling, 2 replies; 71+ messages in thread
From: Chris Lattner @ 2011-08-31  7:53 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Richard Guenther, gcc-patches, Richard Henderson, Joseph S. Myers

On Aug 30, 2011, at 10:01 AM, Artem Shinkarov wrote:
>>> The patch at the moment lacks of some examples, but mainly it works
>>> fine for me. It would be nice if i386 gurus could look into the way I
>>> am doing the expansion.
>>> 
>>> Middle-end parts seems to be more or less fine, they have not changed
>>> much from the previous time.
>> 
>> +@code{__builtin_shuffle (vec, mask)} and
>> +@code{__builtin_shuffle (vec0, vec1, mask)}. Both functions construct
>> 
>> the latter would be __builtin_shuffle2.
> 
> Why??
> That was the syntax we agreed on that elegantly handles both cases in one place.

If you're going to add vector shuffling builtins, you might consider adding the same builtin that clang has for compatibility:
http://clang.llvm.org/docs/LanguageExtensions.html#__builtin_shufflevector

It should be straight-forward to map it into the same IR.

-Chris

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-08-30 20:36   ` Artem Shinkarov
  2011-08-31  7:53     ` Chris Lattner
@ 2011-08-31  8:59     ` Richard Guenther
  1 sibling, 0 replies; 71+ messages in thread
From: Richard Guenther @ 2011-08-31  8:59 UTC (permalink / raw)
  To: Artem Shinkarov; +Cc: gcc-patches, Richard Henderson, Joseph S. Myers

On Tue, Aug 30, 2011 at 7:01 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> On Tue, Aug 30, 2011 at 2:03 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Tue, Aug 30, 2011 at 4:31 AM, Artem Shinkarov
>> <artyom.shinkaroff@gmail.com> wrote:
>>> Hi
>>>
>>> This is a patch for the explicit vector shuffling we have discussed a
>>> long time ago here:
>>> http://gcc.gnu.org/ml/gcc-patches/2010-08/msg01092.html
>>>
>>> The new patch introduces the new tree code, as we agreed, and expands
>>> this code by checking the vshuffle pattern in the backend.
>>>
>>> The patch at the moment lacks of some examples, but mainly it works
>>> fine for me. It would be nice if i386 gurus could look into the way I
>>> am doing the expansion.
>>>
>>> Middle-end parts seems to be more or less fine, they have not changed
>>> much from the previous time.
>>
>> +@code{__builtin_shuffle (vec, mask)} and
>> +@code{__builtin_shuffle (vec0, vec1, mask)}. Both functions construct
>>
>> the latter would be __builtin_shuffle2.
>
> Why??
> That was the syntax we agreed on that elegantly handles both cases in one place.

Ah, then there was a case below that mentions __builtin_shuffle2 that
needs adjusting then.

>> +bool
>> +expand_vec_shuffle_expr_p (enum machine_mode mode, tree v0,
>> +                          tree v1, tree mask)
>> +{
>> +#define inner_type_size(vec) \
>> +  GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (vec))))
>>
>> missing comment.  No #defines like this please, just initialize
>> two temporary variables.
>>
>> +
>> +rtx
>> +expand_vec_shuffle_expr (tree type, tree v0, tree v1, tree mask, rtx target)
>> +{
>>
>> comment.
>>
>> +vshuffle:
>> +  gcc_assert (v1 == v0);
>> +
>> +  icode = direct_optab_handler (vshuffle_optab, mode);
>>
>> hmm, so we don't have a vshuffle2 optab but always go via the
>> builtin function, but only for constant masks there?  I wonder
>> if we should arrange for targets to only support a vshuffle
>> optab (thus, transition away from the builtin) and so
>> unconditionally have a vshuffle2 optab only (with possibly
>> equivalent v1 and v0?)
>
> I have only implemented the case with non-constant mask that supports
> only one argument. I think that it would be enough for the first
> version. Later we can introduce vshuffle2 pattern and reuse the code
> that expands vshuffle at the moment.

Ok.

>> I suppose Richard might remember what he had in mind back
>> when we discussed this.
>>
>> Index: gcc/c-typeck.c
>> ===================================================================
>> --- gcc/c-typeck.c      (revision 177758)
>> +++ gcc/c-typeck.c      (working copy)
>> @@ -2815,6 +2815,68 @@ build_function_call_vec (location_t loc,
>>       && !check_builtin_function_arguments (fundecl, nargs, argarray))
>>     return error_mark_node;
>>
>> +  /* Typecheck a builtin function which is declared with variable
>> +     argument list.  */
>> +  if (fundecl && DECL_BUILT_IN (fundecl)
>> +      && DECL_BUILT_IN_CLASS (fundecl) == BUILT_IN_NORMAL)
>>
>> just add to check_builtin_function_arguments which is called right
>> in front of your added code.
>>
>> +          /* Here we change the return type of the builtin function
>> +             from int f(...) --> t f(...) where t is a type of the
>> +             first argument.  */
>> +          fundecl = copy_node (fundecl);
>> +          TREE_TYPE (fundecl) = build_function_type (TREE_TYPE (firstarg),
>> +                                        TYPE_ARG_TYPES (TREE_TYPE (fundecl)));
>> +          function = build_fold_addr_expr (fundecl);
>>
>> oh, hum - now I remember ;)  Eventually the C frontend should handle
>> this not via the function call mechanism but similar to how Joseph
>> added __builtin_complex support with
>>
>> 2011-08-19  Joseph Myers  <joseph@codesourcery.com>
>>
>>        * c-parser.c (c_parser_postfix_expression): Handle RID_BUILTIN_COMPLEX.
>>        * doc/extend.texi (__builtin_complex): Document.
>>
>> and then emit VEC_SHUFFLE_EXPRs directly from the frontend.  Joseph?
>>
>>          FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (inside_init), ix, value)
>> -           if (!CONSTANT_CLASS_P (value))
>> +         if (!CONSTANT_CLASS_P (value))
>>
>> watch out for spurious whitespace changes.
>>
>> Index: gcc/gimplify.c
>> ===================================================================
>> --- gcc/gimplify.c      (revision 177758)
>> +++ gcc/gimplify.c      (working copy)
>> @@ -7050,6 +7050,7 @@ gimplify_expr (tree *expr_p, gimple_seq
>>          break;
>>
>>        case BIT_FIELD_REF:
>> +       case VEC_SHUFFLE_EXPR:
>>
>> I don't think that's quite the right place given the is_gimple_lvalue
>> predicate on the first operand.  More like
>>
>>        case VEC_SHUFFLE_EXPR:
>>           goto expr_3;
>>
>> +/* Vector shuffle expression. A = VEC_SHUFFLE_EXPR<v0, v1, maks>
>>
>> typo, mask
>>
>> +   means
>> +
>> +   freach i in length (mask):
>> +     A = mask[i] < length (v0) ? v0[mask[i]] : v1[mask[i]]
>> +*/
>> +DEFTREECODE (VEC_SHUFFLE_EXPR, "vec_shuffle_expr", tcc_expression, 3)
>>
>> what is the (is there any?) constraint on the operand types, especially
>> the mask type?
>>
>> Index: gcc/gimple.c
>> ===================================================================
>> --- gcc/gimple.c        (revision 177758)
>> +++ gcc/gimple.c        (working copy)
>> @@ -2623,6 +2623,7 @@ get_gimple_rhs_num_ops (enum tree_code c
>>       || (SYM) == ADDR_EXPR                                                \
>>       || (SYM) == WITH_SIZE_EXPR                                           \
>>       || (SYM) == SSA_NAME                                                 \
>> +      || (SYM) == VEC_SHUFFLE_EXPR                                         \
>>       || (SYM) == VEC_COND_EXPR) ? GIMPLE_SINGLE_RHS                       \
>>    : GIMPLE_INVALID_RHS),
>>  #define END_OF_BASE_TREE_CODES (unsigned char) GIMPLE_INVALID_RHS,
>>
>> please make it GIMPLE_TERNARY_RHS instead.
>>
>> which requires adjustment at least here:
>>
>> Index: gcc/tree-ssa-operands.c
>> ===================================================================
>> --- gcc/tree-ssa-operands.c     (revision 177758)
>> +++ gcc/tree-ssa-operands.c     (working copy)
>> @@ -943,6 +943,7 @@ get_expr_operands (gimple stmt, tree *ex
>>
>>     case COND_EXPR:
>>     case VEC_COND_EXPR:
>> +    case VEC_SHUFFLE_EXPR:
>>
>>
>> I think it would be nicer if the builtin would be handled by the frontend
>> not as builtin but like __builtin_complex and we'd just deal with
>> VEC_SHUFFLE_EXPR throughout the middle-end, eventually
>> lowering it in tree-vect-generic.c.  So I didn't look at the lowering
>> code in detail because that would obviously change then.
>>
>> Defering to Joseph for a decision here and to x86 maintainers for
>> the target specific bits.
>>
>> Thanks,
>> Richard.
>
> I'll go and see how the __builtin_complex are treated, and try to
> adjust the patch.

Thanks, also see Josephs comments on this.

Richard.

>
> Thanks,
> Artem.
>
>>> ChangeLog:
>>> 2011-08-30 Artjoms Sinkarovs <artyom.shinkaroff@gmailc.com>
>>>
>>>        gcc/
>>>        * optabs.c (expand_vec_shuffle_expr_p): New function. Checks
>>>        if given expression can be expanded by the target.
>>>        (expand_vec_shuffle_expr): New function. Expand VEC_SHUFFLE_EXPR
>>>        using target vector instructions.
>>>        * optabs.h: New optab vshuffle.
>>>        (expand_vec_shuffle_expr_p): New prototype.
>>>        (expand_vec_shuffle_expr): New prototype.
>>>        * genopinit.c: Adjust to support vecshuffle.
>>>        * builtins.def: New builtin __builtin_shuffle.
>>>        * c-typeck.c (build_function_call_vec): Typecheck
>>>        __builtin_shuffle, allowing only two or three arguments.
>>>        Change the type of builtin depending on the arguments.
>>>        (digest_init): Warn when constructor has less elements than
>>>        vector type.
>>>        * gimplify.c (gimplify_exp): Adjusted to support VEC_SHUFFLE_EXPR.
>>>        * tree.def: New tree code VEC_SHUFFLE_EXPR.
>>>        * tree-vect-generic.c (vector_element): New function. Returns an
>>>        element of the vector at the given position.
>>>        (lower_builtin_shuffle): Change builtin_shuffle with VEC_SHUFLLE_EXPR
>>>        or expand an expression piecewise.
>>>        (expand_vector_operations_1): Adjusted.
>>>        (gate_expand_vector_operations_noop): New gate function.
>>>        * gimple.c (get_gimple_rhs_num_ops): Adjust.
>>>        * passes.c: Move veclower down.
>>>        * tree-pretty-print.c (dump_generic_node): Recognize
>>>        VEC_SHUFFLE_EXPR as valid expression.
>>>        * tree-ssa-operands: Adjust.
>>>
>>>        gcc/config/i386
>>>        * sse.md: (sseshuffint) New mode_attr. Correspondence between the
>>>        vector and the type of the mask when shuffling.
>>>        (vecshuffle<mode>): New expansion.
>>>        * i386-protos.h (ix86_expand_vshuffle): New prototype.
>>>        * i386.c (ix86_expand_vshuffle): Expand vshuffle using pshufb.
>>>        (ix86_vectorize_builtin_vec_perm_ok): Adjust.
>>>
>>>        gcc/doc
>>>        * extend.texi: Adjust.
>>>
>>>        gcc/testsuite
>>>        * gcc.c-torture/execute/vect-shuffle-2.c: New test.
>>>        * gcc.c-torture/execute/vect-shuffle-4.c: New test.
>>>        * gcc.c-torture/execute/vect-shuffle-1.c: New test.
>>>        * gcc.c-torture/execute/vect-shuffle-3.c: New test.
>>>
>>> bootstrapped on x86_64-unknown-linux-gnu. The AVX parts are not
>>> tested, because I don't have actual hardware. It works with -mavx, the
>>> assembler code looks fine to me. I'll test it on a real hardware in
>>> couple of days.
>>>
>>>
>>>
>>> Thanks,
>>> Artem Shinkarov.
>>>
>>
>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-08-31  7:53     ` Chris Lattner
@ 2011-08-31  9:00       ` Richard Guenther
  2011-08-31  9:02       ` Artem Shinkarov
  1 sibling, 0 replies; 71+ messages in thread
From: Richard Guenther @ 2011-08-31  9:00 UTC (permalink / raw)
  To: Chris Lattner
  Cc: Artem Shinkarov, gcc-patches, Richard Henderson, Joseph S. Myers

On Wed, Aug 31, 2011 at 1:51 AM, Chris Lattner <clattner@apple.com> wrote:
> On Aug 30, 2011, at 10:01 AM, Artem Shinkarov wrote:
>>>> The patch at the moment lacks of some examples, but mainly it works
>>>> fine for me. It would be nice if i386 gurus could look into the way I
>>>> am doing the expansion.
>>>>
>>>> Middle-end parts seems to be more or less fine, they have not changed
>>>> much from the previous time.
>>>
>>> +@code{__builtin_shuffle (vec, mask)} and
>>> +@code{__builtin_shuffle (vec0, vec1, mask)}. Both functions construct
>>>
>>> the latter would be __builtin_shuffle2.
>>
>> Why??
>> That was the syntax we agreed on that elegantly handles both cases in one place.
>
> If you're going to add vector shuffling builtins, you might consider adding the same builtin that clang has for compatibility:
> http://clang.llvm.org/docs/LanguageExtensions.html#__builtin_shufflevector
>
> It should be straight-forward to map it into the same IR.

Sure.  It doesn't support a vector argument for element selection though,
which I think is required for a mapping to OpenCL shuffle/shuffle2.  That's odd.

Richard.

> -Chris
>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-08-31  7:53     ` Chris Lattner
  2011-08-31  9:00       ` Richard Guenther
@ 2011-08-31  9:02       ` Artem Shinkarov
  2011-08-31  9:04         ` Duncan Sands
  2011-08-31 20:36         ` Chris Lattner
  1 sibling, 2 replies; 71+ messages in thread
From: Artem Shinkarov @ 2011-08-31  9:02 UTC (permalink / raw)
  To: Chris Lattner
  Cc: Richard Guenther, gcc-patches, Richard Henderson, Joseph S. Myers

On Wed, Aug 31, 2011 at 12:51 AM, Chris Lattner <clattner@apple.com> wrote:
> On Aug 30, 2011, at 10:01 AM, Artem Shinkarov wrote:
>>>> The patch at the moment lacks of some examples, but mainly it works
>>>> fine for me. It would be nice if i386 gurus could look into the way I
>>>> am doing the expansion.
>>>>
>>>> Middle-end parts seems to be more or less fine, they have not changed
>>>> much from the previous time.
>>>
>>> +@code{__builtin_shuffle (vec, mask)} and
>>> +@code{__builtin_shuffle (vec0, vec1, mask)}. Both functions construct
>>>
>>> the latter would be __builtin_shuffle2.
>>
>> Why??
>> That was the syntax we agreed on that elegantly handles both cases in one place.
>
> If you're going to add vector shuffling builtins, you might consider adding the same builtin that clang has for compatibility:
> http://clang.llvm.org/docs/LanguageExtensions.html#__builtin_shufflevector
>
> It should be straight-forward to map it into the same IR.
>
> -Chris
>

Chris

I am trying to use OpenCL syntax here which says that the mask for
shuffling is a vector. Also I didn't really get from the clang
description if the indexes could be non-constnants? If not, then I
have a problem here, because I want to support this.



Artem.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-08-31  9:02       ` Artem Shinkarov
@ 2011-08-31  9:04         ` Duncan Sands
  2011-08-31  9:34           ` Richard Guenther
  2011-08-31 20:36         ` Chris Lattner
  1 sibling, 1 reply; 71+ messages in thread
From: Duncan Sands @ 2011-08-31  9:04 UTC (permalink / raw)
  To: gcc-patches

Hi Artem,

On 31/08/11 10:27, Artem Shinkarov wrote:
> On Wed, Aug 31, 2011 at 12:51 AM, Chris Lattner<clattner@apple.com>  wrote:
>> On Aug 30, 2011, at 10:01 AM, Artem Shinkarov wrote:
>>>>> The patch at the moment lacks of some examples, but mainly it works
>>>>> fine for me. It would be nice if i386 gurus could look into the way I
>>>>> am doing the expansion.
>>>>>
>>>>> Middle-end parts seems to be more or less fine, they have not changed
>>>>> much from the previous time.
>>>>
>>>> +@code{__builtin_shuffle (vec, mask)} and
>>>> +@code{__builtin_shuffle (vec0, vec1, mask)}. Both functions construct
>>>>
>>>> the latter would be __builtin_shuffle2.
>>>
>>> Why??
>>> That was the syntax we agreed on that elegantly handles both cases in one place.
>>
>> If you're going to add vector shuffling builtins, you might consider adding the same builtin that clang has for compatibility:
>> http://clang.llvm.org/docs/LanguageExtensions.html#__builtin_shufflevector
>>
>> It should be straight-forward to map it into the same IR.
>>
>> -Chris
>>
>
> Chris
>
> I am trying to use OpenCL syntax here which says that the mask for
> shuffling is a vector. Also I didn't really get from the clang
> description if the indexes could be non-constnants? If not, then I
> have a problem here, because I want to support this.

probably it maps directly to the LLVM shufflevector instruction, see
   http://llvm.org/docs/LangRef.html#i_shufflevector
That requires the shuffle mask to be constant.

Ciao, Duncan.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-08-31  9:04         ` Duncan Sands
@ 2011-08-31  9:34           ` Richard Guenther
  2011-08-31 14:33             ` Artem Shinkarov
  0 siblings, 1 reply; 71+ messages in thread
From: Richard Guenther @ 2011-08-31  9:34 UTC (permalink / raw)
  To: Duncan Sands; +Cc: gcc-patches

On Wed, Aug 31, 2011 at 10:35 AM, Duncan Sands <baldrick@free.fr> wrote:
> Hi Artem,
>
> On 31/08/11 10:27, Artem Shinkarov wrote:
>>
>> On Wed, Aug 31, 2011 at 12:51 AM, Chris Lattner<clattner@apple.com>
>>  wrote:
>>>
>>> On Aug 30, 2011, at 10:01 AM, Artem Shinkarov wrote:
>>>>>>
>>>>>> The patch at the moment lacks of some examples, but mainly it works
>>>>>> fine for me. It would be nice if i386 gurus could look into the way I
>>>>>> am doing the expansion.
>>>>>>
>>>>>> Middle-end parts seems to be more or less fine, they have not changed
>>>>>> much from the previous time.
>>>>>
>>>>> +@code{__builtin_shuffle (vec, mask)} and
>>>>> +@code{__builtin_shuffle (vec0, vec1, mask)}. Both functions construct
>>>>>
>>>>> the latter would be __builtin_shuffle2.
>>>>
>>>> Why??
>>>> That was the syntax we agreed on that elegantly handles both cases in
>>>> one place.
>>>
>>> If you're going to add vector shuffling builtins, you might consider
>>> adding the same builtin that clang has for compatibility:
>>>
>>> http://clang.llvm.org/docs/LanguageExtensions.html#__builtin_shufflevector
>>>
>>> It should be straight-forward to map it into the same IR.
>>>
>>> -Chris
>>>
>>
>> Chris
>>
>> I am trying to use OpenCL syntax here which says that the mask for
>> shuffling is a vector. Also I didn't really get from the clang
>> description if the indexes could be non-constnants? If not, then I
>> have a problem here, because I want to support this.
>
> probably it maps directly to the LLVM shufflevector instruction, see
>  http://llvm.org/docs/LangRef.html#i_shufflevector
> That requires the shuffle mask to be constant.

I see.  I think it's not worth copying LLVM builtins that merely map
its internal IL.

Richard.

> Ciao, Duncan.
>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-08-31  9:34           ` Richard Guenther
@ 2011-08-31 14:33             ` Artem Shinkarov
  2011-08-31 15:17               ` Richard Guenther
  2011-08-31 17:25               ` Joseph S. Myers
  0 siblings, 2 replies; 71+ messages in thread
From: Artem Shinkarov @ 2011-08-31 14:33 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Duncan Sands, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 756 bytes --]

Here is a newer version of the patch, which transforms the builtin to
the VEC_SHUFFLE_EXPR in the front-end.

Several comments:
1) Helper function for the pseudo-builtins.
In my case the builtin can have 2 or 3 arguments, and I think that I
expressed that in a pretty much short way without any helper function.
Am I missing something?

2) Richard, why do you want to treat VEC_SHUFFLE_EXPR as
GIMPLE_TERNARY_RHS when VEC_COND_EXPR (which is about the same as
VEC_SHUFF_EXPR) is single_rhs? From my perspective I don't see much of
a difference whether it is trenary or single, so I converted it to
trenary as you asked. But still it looks suspicious that vec_cond and
vec_shuffle are treated differently.

Can anyone review the x86 parts?


Thanks,
Artem.

[-- Attachment #2: vec-shuffle.v12.diff --]
[-- Type: text/plain, Size: 44995 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 178354)
+++ gcc/doc/extend.texi	(working copy)
@@ -6561,6 +6561,32 @@ invoke undefined behavior at runtime.  W
 accesses for vector subscription can be enabled with
 @option{-Warray-bounds}.
 
+Vector shuffling is available using functions 
+@code{__builtin_shuffle (vec, mask)} and 
+@code{__builtin_shuffle (vec0, vec1, mask)}. Both functions construct
+a permutation of elements from one or two vectors and return a vector
+of the same type as input vector(s). The mask is a vector of 
+integer-typed elements. The size of each element of the mask must be
+the same as the size of each input vector element. The number of 
+elements in input vector(s) and mask must be the same.
+
+The elements of the input vectors are numbered from left to right across
+one or both of the vectors. Each element in the mask specifies a number
+of element from the input vector(s). Consider the following example.
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{5,6,7,8@};
+v4si mask1 = @{0,1,1,3@};
+v4si mask2 = @{0,4,2,5@};
+v4si res;
+
+res = __builtin_shuffle (a, mask1);       /* res is @{1,2,2,4@}  */
+res = __builtin_shuffle2 (a, b, mask2);   /* res is @{1,5,3,6@}  */
+@end smallexample
+
 You can declare variables and use them in function calls and returns, as
 well as in assignments and some casts.  You can specify a vector type as
 a return type for a function.  Vector types can also be used as function
Index: gcc/tree-pretty-print.c
===================================================================
--- gcc/tree-pretty-print.c	(revision 178354)
+++ gcc/tree-pretty-print.c	(working copy)
@@ -2067,6 +2067,16 @@ dump_generic_node (pretty_printer *buffe
       dump_generic_node (buffer, TREE_OPERAND (node, 2), spc, flags, false);
       pp_string (buffer, " > ");
       break;
+    
+    case VEC_SHUFFLE_EXPR:
+      pp_string (buffer, " VEC_SHUFFLE_EXPR < ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
+      pp_string (buffer, " , ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
+      pp_string (buffer, " , ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 2), spc, flags, false);
+      pp_string (buffer, " > ");
+      break;
 
     case DOT_PROD_EXPR:
       pp_string (buffer, " DOT_PROD_EXPR < ");
Index: gcc/c-family/c-common.c
===================================================================
--- gcc/c-family/c-common.c	(revision 178354)
+++ gcc/c-family/c-common.c	(working copy)
@@ -425,6 +425,7 @@ const struct c_common_resword c_common_r
   { "__attribute__",	RID_ATTRIBUTE,	0 },
   { "__builtin_choose_expr", RID_CHOOSE_EXPR, D_CONLY },
   { "__builtin_complex", RID_BUILTIN_COMPLEX, D_CONLY },
+  { "__builtin_shuffle", RID_BUILTIN_SHUFFLE, D_CONLY },
   { "__builtin_offsetof", RID_OFFSETOF, 0 },
   { "__builtin_types_compatible_p", RID_TYPES_COMPATIBLE_P, D_CONLY },
   { "__builtin_va_arg",	RID_VA_ARG,	0 },
Index: gcc/c-family/c-common.h
===================================================================
--- gcc/c-family/c-common.h	(revision 178354)
+++ gcc/c-family/c-common.h	(working copy)
@@ -103,7 +103,7 @@ enum rid
   /* C extensions */
   RID_ASM,       RID_TYPEOF,   RID_ALIGNOF,  RID_ATTRIBUTE,  RID_VA_ARG,
   RID_EXTENSION, RID_IMAGPART, RID_REALPART, RID_LABEL,      RID_CHOOSE_EXPR,
-  RID_TYPES_COMPATIBLE_P,      RID_BUILTIN_COMPLEX,
+  RID_TYPES_COMPATIBLE_P,      RID_BUILTIN_COMPLEX,	     RID_BUILTIN_SHUFFLE,
   RID_DFLOAT32, RID_DFLOAT64, RID_DFLOAT128,
   RID_FRACT, RID_ACCUM,
 
@@ -898,6 +898,7 @@ extern tree build_function_call (locatio
 
 extern tree build_function_call_vec (location_t, tree,
     				     VEC(tree,gc) *, VEC(tree,gc) *);
+extern tree c_build_vec_shuffle_expr (location_t, tree, tree, tree);
 
 extern tree resolve_overloaded_builtin (location_t, tree, VEC(tree,gc) *);
 
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	(revision 178354)
+++ gcc/optabs.c	(working copy)
@@ -6620,6 +6620,82 @@ vector_compare_rtx (tree cond, bool unsi
   return gen_rtx_fmt_ee (rcode, VOIDmode, ops[0].value, ops[1].value);
 }
 
+/* Return true if VEC_SHUFF_EXPR can be expanded using SIMD extensions
+   of the CPU.  */
+bool
+expand_vec_shuffle_expr_p (enum machine_mode mode, tree v0, 
+			   tree v1, tree mask)
+{
+  int v0_mode_s = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (v0))));
+  int mask_mode_s = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (mask))));
+
+  if (TREE_CODE (mask) == VECTOR_CST
+      && targetm.vectorize.builtin_vec_perm_ok (TREE_TYPE (v0), mask))
+    return true;
+  
+  if (v0 != v1 || v0_mode_s != mask_mode_s)
+    return false;
+    
+  return direct_optab_handler (vshuffle_optab, mode) != CODE_FOR_nothing;
+}
+
+/* Generate instructions for VEC_COND_EXPR given its type and three
+   operands.  */
+rtx
+expand_vec_shuffle_expr (tree type, tree v0, tree v1, tree mask, rtx target)
+{
+  struct expand_operand ops[4];
+  enum insn_code icode;
+  enum machine_mode mode = TYPE_MODE (type);
+  rtx rtx_v0, rtx_mask;
+
+  gcc_assert (expand_vec_shuffle_expr_p (mode, v0, v1, mask));
+
+  if (TREE_CODE (mask) == VECTOR_CST)
+    {
+      tree m_type, call;
+      tree fn = targetm.vectorize.builtin_vec_perm (TREE_TYPE (v0), &m_type);
+      rtx t;
+
+      if (!fn)
+	goto vshuffle;
+
+      if (m_type != TREE_TYPE (TREE_TYPE (mask)))
+	{	
+	  int units = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask));
+	  tree cvt = build_vector_type (m_type, units);
+	  mask = fold_convert (cvt, mask);
+	}
+
+      fn = copy_node (fn);
+      call = fold_build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (fn)), fn);
+      call = build_call_nary (type /* ? */, call, 3, v0, v1, mask);
+
+      t = expand_normal (call);  
+      target = gen_reg_rtx (mode);
+      emit_insn (gen_rtx_SET (VOIDmode, target, t));
+      return target;
+    }
+
+vshuffle:
+  gcc_assert (v1 == v0);
+
+  icode = direct_optab_handler (vshuffle_optab, mode);
+
+  if (icode == CODE_FOR_nothing)
+    return 0;
+  
+  rtx_v0 = expand_normal (v0);
+  rtx_mask = expand_normal (mask);
+
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], rtx_v0, mode);
+  create_input_operand (&ops[2], rtx_mask, mode);
+  expand_insn (icode, 3, ops);
+  
+  return ops[0].value;
+}
+
 /* Return insn code for TYPE, the type of a VEC_COND_EXPR.  */
 
 static inline enum insn_code
Index: gcc/optabs.h
===================================================================
--- gcc/optabs.h	(revision 178354)
+++ gcc/optabs.h	(working copy)
@@ -636,6 +636,9 @@ enum direct_optab_index
   DOI_vcond,
   DOI_vcondu,
 
+  /* Vector shuffling.  */
+  DOI_vshuffle,
+
   /* Block move operation.  */
   DOI_movmem,
 
@@ -701,6 +704,7 @@ typedef struct direct_optab_d *direct_op
 #define reload_out_optab (&direct_optab_table[(int) DOI_reload_out])
 #define vcond_optab (&direct_optab_table[(int) DOI_vcond])
 #define vcondu_optab (&direct_optab_table[(int) DOI_vcondu])
+#define vshuffle_optab (&direct_optab_table[(int) DOI_vshuffle])
 #define movmem_optab (&direct_optab_table[(int) DOI_movmem])
 #define setmem_optab (&direct_optab_table[(int) DOI_setmem])
 #define cmpstr_optab (&direct_optab_table[(int) DOI_cmpstr])
@@ -879,8 +883,15 @@ extern rtx expand_widening_mult (enum ma
 /* Return tree if target supports vector operations for COND_EXPR.  */
 bool expand_vec_cond_expr_p (tree, enum machine_mode);
 
+/* Return tree if target supports vector operations for VEC_SHUFFLE_EXPR.  */
+bool expand_vec_shuffle_expr_p (enum machine_mode, tree, tree, tree);
+
 /* Generate code for VEC_COND_EXPR.  */
 extern rtx expand_vec_cond_expr (tree, tree, tree, tree, rtx);
+
+/* Generate code for VEC_SHUFFLE_EXPR.  */
+extern rtx expand_vec_shuffle_expr (tree, tree, tree, tree, rtx);
+
 /* Generate code for VEC_LSHIFT_EXPR and VEC_RSHIFT_EXPR.  */
 extern rtx expand_vec_shift_expr (sepops, rtx);
 
Index: gcc/genopinit.c
===================================================================
--- gcc/genopinit.c	(revision 178354)
+++ gcc/genopinit.c	(working copy)
@@ -255,6 +255,7 @@ static const char * const optabs[] =
   "set_optab_handler (vec_realign_load_optab, $A, CODE_FOR_$(vec_realign_load_$a$))",
   "set_direct_optab_handler (vcond_optab, $A, CODE_FOR_$(vcond$a$))",
   "set_direct_optab_handler (vcondu_optab, $A, CODE_FOR_$(vcondu$a$))",
+  "set_direct_optab_handler (vshuffle_optab, $A, CODE_FOR_$(vshuffle$a$))",
   "set_optab_handler (ssum_widen_optab, $A, CODE_FOR_$(widen_ssum$I$a3$))",
   "set_optab_handler (usum_widen_optab, $A, CODE_FOR_$(widen_usum$I$a3$))",
   "set_optab_handler (udot_prod_optab, $A, CODE_FOR_$(udot_prod$I$a$))",
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c	(revision 0)
@@ -0,0 +1,44 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shuf2compare(type, count, vres, v0, v1, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != ((vidx(type, mask, __i) < count) ? \
+                          vidx(type, v0, vidx(type, mask, __i)) :  \
+                          vidx(type, v1, (vidx(type, mask, __i) - count)))) \
+            __builtin_abort (); \
+        } \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+    vector (8, short) v0 = {5, 5,5,5,5,5,argc,7};
+    vector (8, short) v1 = {argc, 1,8,8,4,9,argc,4};
+    vector (8, short) v2;
+
+    //vector (8, short) mask = {1,2,5,4,3,6,7};
+    
+    vector (8, short) mask0 = {0,2,3,1,4,5,6,7};
+    vector (8, short) mask1 = {0,12,3,4,3,0,10,9};
+    
+    vector (8, short) mask2 = {0,8,1,9,2,10,3,11};
+
+    v2 = __builtin_shuffle (v0, v1,  mask0);
+    shuf2compare (short, 8, v2, v0, v1, mask0);
+ 
+    v2 = __builtin_shuffle (v0, v1,  mask1);
+    shuf2compare (short, 8, v2, v0, v1, mask1);
+    
+    v2 = __builtin_shuffle (v0, v1,  mask2);
+    shuf2compare (short, 8, v2, v0, v1, mask2);
+
+    v2 = __builtin_shuffle (mask0, mask0,  v0);
+    shuf2compare (short, 8, v2, mask0, mask0, v0);
+
+    return 0; 
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c	(revision 0)
@@ -0,0 +1,50 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shuf2compare(type, count, vres, v0, v1, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != ((vidx(type, mask, __i) < count) ? \
+                          vidx(type, v0, vidx(type, mask, __i)) :  \
+                          vidx(type, v1, (vidx(type, mask, __i) - count)))) \
+            __builtin_abort (); \
+        } \
+} while (0)
+
+
+vector (8, short) __attribute__ ((noinline))
+f (vector (8, short) x, vector (8, short) y, vector (8, short) mask) {
+    return __builtin_shuffle (x, y, mask);
+}
+
+
+
+int main (int argc, char *argv[]) {
+    vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+
+    //vector (8, short) mask = {1,2,5,4,3,6,7};
+    
+    vector (8, short) mask0 = {0,2,3,1,4,5,6,7};
+    vector (8, short) mask1 = {0,12,3,4,3,0,10,9};
+    vector (8, short) mask2 = {0,8,1,9,2,10,3,11};
+
+    v2 = f (v0, v1,  mask0);
+    shuf2compare (short, 8, v2, v0, v1, mask0);
+ 
+    v2 = f (v0, v1,  mask1);
+    shuf2compare (short, 8, v2, v0, v1, mask1);
+
+    v2 = f (v0, v1,  mask2);
+    shuf2compare (short, 8, v2, v0, v1, mask2);
+
+    v2 = f (mask0, mask0,  v0);
+    shuf2compare (short, 8, v2, mask0, mask0, v0);
+
+    return 0; 
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c	(revision 0)
@@ -0,0 +1,46 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shufcompare(type, count, vres, v0, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != vidx(type, v0, vidx(type, mask, __i))) \
+            __builtin_abort (); \
+    } \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+    /*vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+   
+    vector (8, short) smask = {0,0,1,2,3,4,5,6};
+    
+    v2 = __builtin_shuffle (v0,  smask);
+    shufcompare (short, 8, v2, v0, smask);
+    v2 = __builtin_shuffle (v0, v1);
+    shufcompare (short, 8, v2, v0, v1);
+    v2 = __builtin_shuffle (smask, v0);
+    shufcompare (short, 8, v2, smask, v0);*/
+
+    vector (4, int) i0 = {argc, 1,2,3};
+    vector (4, int) i1 = {argc, 1, argc, 3};
+    vector (4, int) i2;
+
+    vector (4, int) imask = {0,3,2,1};
+
+    /*i2 = __builtin_shuffle (i0, imask);
+    shufcompare (int, 4, i2, i0, imask);*/
+    i2 = __builtin_shuffle (i0, i1);
+    shufcompare (int, 4, i2, i0, i1);
+    
+    i2 = __builtin_shuffle (imask, i0);
+    shufcompare (int, 4, i2, imask, i0);
+    
+    return 0;
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c	(revision 0)
@@ -0,0 +1,36 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shufcompare(type, count, vres, v0, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != vidx(type, v0, vidx(type, mask, __i))) \
+            __builtin_abort (); \
+    } \
+} while (0)
+
+vector (8, short) __attribute__ ((noinline))
+f (vector (8, short) x, vector (8, short) mask) {
+    return __builtin_shuffle (x, mask);
+}
+
+
+int main (int argc, char *argv[]) {
+    vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+
+    vector (8, short) mask = {0,0,1,2,3,4,5,6};
+    
+    v2 = f (v0,  mask);
+    shufcompare (short, 8, v2, v0, mask);
+
+    v2 = f (v0, v1);
+    shufcompare (short, 8, v2, v0, v1);
+
+    return 0;
+}
+
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 178354)
+++ gcc/expr.c	(working copy)
@@ -8605,6 +8605,10 @@ expand_expr_real_2 (sepops ops, rtx targ
     case VEC_PACK_FIX_TRUNC_EXPR:
       mode = TYPE_MODE (TREE_TYPE (treeop0));
       goto binop;
+    
+    case VEC_SHUFFLE_EXPR:
+      target = expand_vec_shuffle_expr (type, treeop0, treeop1, treeop2, target);
+      return target;
 
     case DOT_PROD_EXPR:
       {
Index: gcc/gimple-pretty-print.c
===================================================================
--- gcc/gimple-pretty-print.c	(revision 178354)
+++ gcc/gimple-pretty-print.c	(working copy)
@@ -417,6 +417,16 @@ dump_ternary_rhs (pretty_printer *buffer
       dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
       pp_string (buffer, ">");
       break;
+    
+    case VEC_SHUFFLE_EXPR:
+      pp_string (buffer, "VEC_SHUFFLE_EXPR <");
+      dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
+      pp_string (buffer, ">");
+      break;
 
     case REALIGN_LOAD_EXPR:
       pp_string (buffer, "REALIGN_LOAD <");
Index: gcc/c-typeck.c
===================================================================
--- gcc/c-typeck.c	(revision 178354)
+++ gcc/c-typeck.c	(working copy)
@@ -2845,6 +2845,89 @@ build_function_call_vec (location_t loc,
     }
   return require_complete_type (result);
 }
+
+/* Build a VEC_SHUFLE_EXPR if V0, V1 and MASK are not error_mark_nodes
+   and have vector types, V0 has the same type as V1, and the number of
+   elements of V0, V1, MASK is the same.  */
+tree
+c_build_vec_shuffle_expr (location_t loc, tree v0, tree v1, tree mask)
+{
+  tree vec_shuffle, tmp;
+  bool wrap = true;
+  bool maybe_const = false;
+  bool two_arguments = v0 == v1;
+
+
+  if (v0 == error_mark_node || v1 == error_mark_node 
+      || mask == error_mark_node)
+    return error_mark_node;
+
+  if (TREE_CODE (TREE_TYPE (mask)) != VECTOR_TYPE
+      || TREE_CODE (TREE_TYPE (TREE_TYPE (mask))) != INTEGER_TYPE)
+    {
+      error_at (loc, "__builtin_shuffle last argument must "
+		     "be an integer vector");
+      return error_mark_node;
+    }
+   
+  if (TREE_CODE (TREE_TYPE (v0)) != VECTOR_TYPE
+      || TREE_CODE (TREE_TYPE (v1)) != VECTOR_TYPE)
+    {
+      error_at (loc, "__builtin_shuffle arguments must be vectors");
+      return error_mark_node;
+    }
+
+  if (TREE_TYPE (v0) != TREE_TYPE (v1))
+    {
+      error_at (loc, "__builtin_shuffle argument vectors must be of "
+		     "the same type");
+      return error_mark_node;
+    }
+
+  if (TYPE_VECTOR_SUBPARTS (TREE_TYPE (v0)) 
+      != TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask))
+      && TYPE_VECTOR_SUBPARTS (TREE_TYPE (v1))
+	 != TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask)))
+    {
+      error_at (loc, "__builtin_shuffle number of elements of the "
+		     "argument vector(s) and the mask vector should "
+		     "be the same");
+      return error_mark_node;
+    }
+  
+  if (GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (v0)))) 
+      != GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (mask)))))
+    {
+      error_at (loc, "__builtin_shuffle argument vector(s) inner type "
+		     "must have the same size as inner type of the mask");
+      return error_mark_node;
+    }
+
+  /* Avoid C_MAYBE_CONST_EXPRs inside VEC_SHUFFLE_EXPR.  */
+  tmp = c_fully_fold (v0, false, &maybe_const);
+  v0 = save_expr (tmp);
+  wrap &= maybe_const;
+
+  if (!two_arguments)
+    {
+      tmp = c_fully_fold (v1, false, &maybe_const);
+      v1 = save_expr (tmp);
+      wrap &= maybe_const;
+    }
+  else
+    v1 = v0;
+  
+  tmp = c_fully_fold (mask, false, &maybe_const);
+  mask = save_expr (tmp);
+  wrap &= maybe_const;
+
+  vec_shuffle = build3 (VEC_SHUFFLE_EXPR, TREE_TYPE (v0), v0, v1, mask);
+
+  if (!wrap)
+    vec_shuffle = c_wrap_maybe_const (vec_shuffle, true);
+
+  return vec_shuffle;
+}
 \f
 /* Convert the argument expressions in the vector VALUES
    to the types in the list TYPELIST.
@@ -6120,7 +6203,14 @@ digest_init (location_t init_loc, tree t
 	  tree value;
 	  bool constant_p = true;
 
-	  /* Iterate through elements and check if all constructor
+	  /* If constructor has less elements than the vector type.  */
+          if (CONSTRUCTOR_NELTS (inside_init) 
+              < TYPE_VECTOR_SUBPARTS (TREE_TYPE (inside_init)))
+            warning_at (init_loc, 0, "vector length does not match "
+                                     "initializer length, zero elements "
+                                     "will be inserted");
+          
+          /* Iterate through elements and check if all constructor
 	     elements are *_CSTs.  */
 	  FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (inside_init), ix, value)
 	    if (!CONSTANT_CLASS_P (value))
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	(revision 178354)
+++ gcc/gimplify.c	(working copy)
@@ -7053,6 +7053,32 @@ gimplify_expr (tree *expr_p, gimple_seq
 	  }
 	  break;
 
+	case VEC_SHUFFLE_EXPR:
+	  {
+	    enum gimplify_status r0, r1, r2;
+
+	    if (TREE_OPERAND (*expr_p, 0) == TREE_OPERAND (*expr_p, 1))
+	      {
+		r0 = r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
+					 post_p, is_gimple_val, fb_rvalue);
+		TREE_OPERAND (*expr_p, 1) = TREE_OPERAND (*expr_p, 0);
+	      }
+	    else
+	      {
+		 r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
+				     post_p, is_gimple_val, fb_rvalue);
+		 r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
+				     post_p, is_gimple_val, fb_rvalue);
+	      }
+
+	    r2 = gimplify_expr (&TREE_OPERAND (*expr_p, 2), pre_p,
+				post_p, is_gimple_val, fb_rvalue);
+	    recalculate_side_effects (*expr_p);
+
+	    ret = MIN (r0, MIN (r1, r2));
+	    break;
+	  }
+
 	case TARGET_MEM_REF:
 	  {
 	    enum gimplify_status r0 = GS_ALL_DONE, r1 = GS_ALL_DONE;
Index: gcc/tree.def
===================================================================
--- gcc/tree.def	(revision 178354)
+++ gcc/tree.def	(working copy)
@@ -497,6 +497,19 @@ DEFTREECODE (COND_EXPR, "cond_expr", tcc
 */
 DEFTREECODE (VEC_COND_EXPR, "vec_cond_expr", tcc_expression, 3)
 
+/* Vector shuffle expression. A = VEC_SHUFFLE_EXPR<v0, v1, maks>
+   means
+
+   freach i in length (mask):
+     A = mask[i] < length (v0) ? v0[mask[i]] : v1[mask[i]]
+
+   V0 and V1 are vectors of the same type. MASK is an integer-typed
+   vector. The number of MASK elements must be the same with the
+   number of elements in V0 and V1. The size of the inner type
+   of the MASK and of the V0 and V1 must be the same.
+*/
+DEFTREECODE (VEC_SHUFFLE_EXPR, "vec_shuffle_expr", tcc_expression, 3)
+
 /* Declare local variables, including making RTL and allocating space.
    BIND_EXPR_VARS is a chain of VAR_DECL nodes for the variables.
    BIND_EXPR_BODY is the body, the expression to be computed using
Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c	(revision 178354)
+++ gcc/tree-inline.c	(working copy)
@@ -3285,6 +3285,7 @@ estimate_operator_cost (enum tree_code c
        ??? We may consider mapping RTL costs to this.  */
     case COND_EXPR:
     case VEC_COND_EXPR:
+    case VEC_SHUFFLE_EXPR:
 
     case PLUS_EXPR:
     case POINTER_PLUS_EXPR:
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	(revision 178354)
+++ gcc/tree-vect-generic.c	(working copy)
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "flags.h"
 #include "ggc.h"
+#include "diagnostic.h"
 
 /* Need to include rtl.h, expr.h, etc. for optabs.  */
 #include "expr.h"
@@ -432,6 +433,279 @@ type_for_widest_vector_mode (enum machin
     }
 }
 
+
+/* Build a reference to the element of the vector VECT. Function 
+   returns either the element itself, either BIT_FIELD_REF, or an 
+   ARRAY_REF expression.
+   
+   GSI is requred to insert temporary variables while building a
+   refernece to the element of the vector VECT.
+   
+   PTMPVEC is a pointer to the temporary variable for caching
+   purposes. In case when PTMPVEC is NULL new temporary variable
+   will be created.  */
+static tree
+vector_element (gimple_stmt_iterator *gsi, tree vect, tree idx, tree *ptmpvec)
+{
+  tree type;
+  gimple asgn; 
+  unsigned HOST_WIDE_INT maxval;
+  tree tmpvec; 
+  tree indextype, arraytype;
+  bool need_asgn = true;
+
+  gcc_assert (TREE_CODE (TREE_TYPE (vect)) == VECTOR_TYPE);
+
+  type = TREE_TYPE (vect);
+  if (TREE_CODE (idx) == INTEGER_CST)
+    {
+      unsigned HOST_WIDE_INT index;
+
+      if (!host_integerp (idx, 1)
+           || (index = tree_low_cst (idx, 1)) > TYPE_VECTOR_SUBPARTS (type)-1)
+        return error_mark_node;
+
+      if (TREE_CODE (vect) == VECTOR_CST)
+        {
+            unsigned i;
+            tree vals = TREE_VECTOR_CST_ELTS (vect);
+            for (i = 0; vals; vals = TREE_CHAIN (vals), ++i)
+              if (i == index)
+                 return TREE_VALUE (vals);
+            return error_mark_node;
+        }
+      else if (TREE_CODE (vect) == CONSTRUCTOR)
+        {
+          unsigned i;
+          VEC (constructor_elt, gc) *vals = CONSTRUCTOR_ELTS (vect);
+          constructor_elt *elt;
+
+          for (i = 0; VEC_iterate (constructor_elt, vals, i, elt); i++)
+            if (operand_equal_p (elt->index, idx, 0))
+              return elt->value; 
+          return fold_convert (TREE_TYPE (type), integer_zero_node);
+        }
+      else if (TREE_CODE (vect) == SSA_NAME)
+        {
+          tree el;
+          gimple vectdef = SSA_NAME_DEF_STMT (vect);
+          if (gimple_assign_single_p (vectdef)
+              && (el = vector_element (gsi, gimple_assign_rhs1 (vectdef), 
+                                       idx, ptmpvec)) 
+                 != error_mark_node)
+            return el;
+          else
+            {
+              tree size = TYPE_SIZE (TREE_TYPE (type));
+              tree pos = fold_build2 (MULT_EXPR, TREE_TYPE (idx), 
+                                      idx, size);
+              return fold_build3 (BIT_FIELD_REF, TREE_TYPE (type), 
+                             vect, size, pos);
+            }
+        }
+      else
+        return error_mark_node;
+    }
+  
+  if (!ptmpvec)
+    tmpvec = create_tmp_var (TREE_TYPE (vect), "vectmp");
+  else if (!*ptmpvec)
+    tmpvec = *ptmpvec = create_tmp_var (TREE_TYPE (vect), "vectmp");
+  else
+    {
+      tmpvec = *ptmpvec;
+      need_asgn = false;
+    }
+  
+  if (need_asgn)
+    {
+      TREE_ADDRESSABLE (tmpvec) = 1;
+      asgn = gimple_build_assign (tmpvec, vect);
+      gsi_insert_before (gsi, asgn, GSI_SAME_STMT);
+    }
+
+  maxval = TYPE_VECTOR_SUBPARTS (TREE_TYPE (vect)) -1;
+  indextype = build_index_type (size_int (maxval));
+  arraytype = build_array_type (TREE_TYPE (type), indextype);
+  
+  return build4 (ARRAY_REF, TREE_TYPE (type),
+                 build1 (VIEW_CONVERT_EXPR, arraytype, tmpvec),
+                 idx, NULL_TREE, NULL_TREE);
+
+
+}
+
+/* Check if VEC_SHUFFLE_EXPR within the given setting is supported
+   by hardware, or lower it piecewie.
+
+   When VEC_SHUFFLE_EXPR has the same first and second operands:
+   VEC_SHUFFLE_EXPR <v0, v0, mask> the lowered version would be 
+   {v0[mask[0]], v0[mask[1]], ...}
+   MASK and V0 must have the same number of elements.
+        
+   Otherwise VEC_SHUFFLE_EXPR <v0, v1, mask> is lowered to 
+   {mask[0] < len(v0) ? v0[mask[0]] : v1[mask[0]], ...}
+   V0 and V1 must have the same type. MASK, V0, V1 must have the
+   same number of arguments.  */
+static void
+lower_vec_shuffle (gimple_stmt_iterator *gsi, location_t loc)
+{
+#define TRAP_RETURN(new_stmt, stmt, gsi, vec0) \
+do { \
+  new_stmt = gimple_build_call (built_in_decls[BUILT_IN_TRAP], 0); \
+  gsi_insert_before (gsi, new_stmt,  GSI_SAME_STMT); \
+  split_block (gimple_bb (new_stmt), new_stmt); \
+  new_stmt = gimple_build_assign (gimple_call_lhs (stmt), vec0); \
+  gsi_replace (gsi, new_stmt, false); \
+  return; \
+} while (0) 
+ 
+  gimple stmt = gsi_stmt (*gsi);
+  tree mask = gimple_assign_rhs3 (stmt);
+  tree vec0 = gimple_assign_rhs1 (stmt);
+  tree vec1 = gimple_assign_rhs2 (stmt);
+  unsigned els = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask));
+  tree type0 = TREE_TYPE (TREE_TYPE (vec0));
+  VEC(constructor_elt,gc) *v = NULL;
+  tree vectype, constr;
+  gimple new_stmt;
+  tree vec0tmp = NULL_TREE, masktmp = NULL_TREE;
+
+  if (expand_vec_shuffle_expr_p (TYPE_MODE (TREE_TYPE (vec0)), vec0, vec1, mask))
+    {
+      tree t;
+
+      t = gimplify_build3 (gsi, VEC_SHUFFLE_EXPR, TREE_TYPE (vec0),
+			   vec0, vec1, mask);
+      gimple_assign_set_rhs_from_tree (gsi, t);
+      /* Statement should be updated by callee.  */
+      return;
+    }
+
+  
+  if (vec0 == vec1)
+    {
+      unsigned i;
+      tree vec0tmp = NULL_TREE;
+      
+      v = VEC_alloc (constructor_elt, gc, els);
+      for (i = 0; i < els; i++)
+        {
+          tree idxval, vecel, t;
+	   
+	  idxval = vector_element (gsi, mask, size_int (i), &masktmp);
+          if (idxval == error_mark_node)
+            {
+              warning_at (loc, 0, "Invalid shuffling mask index %i", i);
+              TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+            }
+
+	  vecel = vector_element (gsi, vec0, idxval, &vec0tmp);
+          if (vecel == error_mark_node)
+            {
+              warning_at (loc, 0, "Invalid shuffling arguments");
+              TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+            }
+          
+          t = force_gimple_operand_gsi (gsi, vecel, true, 
+					NULL_TREE, true, GSI_SAME_STMT);
+          CONSTRUCTOR_APPEND_ELT (v, size_int (i), t);
+        }
+    }
+  else
+    {
+      unsigned i;
+      tree var = create_tmp_var (type0, "vecel");
+      tree vec1tmp = NULL_TREE;
+
+      v = VEC_alloc (constructor_elt, gc, els);
+      for (i = 0; i < els; i++)
+        {
+          tree idxval, idx1val, cond, elval0, elval1, condexpr, t, ssatmp;
+          tree vec0el, vec1el;
+          gimple asgn;
+          
+          idxval = vector_element (gsi, mask, size_int (i), &masktmp);
+	  if (idxval == error_mark_node)
+            {
+              warning_at (loc, 0, "Invalid shuffling mask index %i", i);
+              TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+            }
+          
+          if (TREE_CODE (idxval) == INTEGER_CST)
+            {
+              if (tree_int_cst_lt (idxval, size_int (els)))
+                {
+                  vec0el = vector_element (gsi, vec0, idxval, &vec0tmp);
+                  t = force_gimple_operand_gsi (gsi, vec0el,
+                                    true, NULL_TREE, true, GSI_SAME_STMT);
+                }
+              else if (tree_int_cst_lt (idxval, size_int (2*els)))
+                {
+                  idx1val = fold_build2 (MINUS_EXPR, TREE_TYPE (idxval),
+                        idxval, build_int_cst (TREE_TYPE (idxval), els));
+                  
+                  vec1el = vector_element (gsi, vec1, idx1val, &vec1tmp);
+                  t = force_gimple_operand_gsi (gsi, vec1el, true, 
+						NULL_TREE, true, GSI_SAME_STMT);
+                }
+              else
+                {
+                  warning_at (loc, 0, "Invalid shuffling mask index %i", i);
+                  TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+                }
+            }
+          else
+            {
+
+              idx1val = fold_build2 (MINUS_EXPR, TREE_TYPE (idxval),
+                            idxval, build_int_cst (TREE_TYPE (idxval), els));
+              idx1val = force_gimple_operand_gsi (gsi, idx1val, 
+                                true, NULL_TREE, true, GSI_SAME_STMT);
+              cond = build2 (GT_EXPR, boolean_type_node, \
+                             idxval, convert (type0, size_int (els - 1)));
+              
+	      vec0el = vector_element (gsi, vec0, idxval, &vec0tmp);
+              if (vec0el == error_mark_node)
+                {
+                  warning_at (loc, 0, "Invalid shuffling arguments");
+                  TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+                }
+
+              elval0 = force_gimple_operand_gsi (gsi, vec0el, 
+                                true, NULL_TREE, true, GSI_SAME_STMT);
+	      
+	      vec1el = vector_element (gsi, vec1, idx1val, &vec1tmp);
+              if (vec1el == error_mark_node)
+                {
+                  warning_at (loc, 0, "Invalid shuffling arguments");
+                  TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+                }
+
+              elval1 = force_gimple_operand_gsi (gsi, vec1el,
+                                true, NULL_TREE, true, GSI_SAME_STMT);
+
+              condexpr = fold_build3 (COND_EXPR, type0, cond, \
+                                      elval1, elval0);
+
+              t = force_gimple_operand_gsi (gsi, condexpr, true, \
+                                        NULL_TREE, true, GSI_SAME_STMT);
+            }
+          
+          asgn = gimple_build_assign (var, t);
+          ssatmp = make_ssa_name (var, asgn);
+          gimple_assign_set_lhs (asgn, ssatmp);
+          gsi_insert_before (gsi, asgn, GSI_SAME_STMT);
+          CONSTRUCTOR_APPEND_ELT (v, size_int (i), ssatmp);
+        }
+    }
+  
+  vectype = build_vector_type (type0, els);
+  constr = build_constructor (vectype, v);
+  gimple_assign_set_rhs_from_tree (gsi, constr);
+  /* Statement should be updated by callee.  */
+}
+
 /* Process one statement.  If we identify a vector operation, expand it.  */
 
 static void
@@ -451,6 +725,13 @@ expand_vector_operations_1 (gimple_stmt_
   code = gimple_assign_rhs_code (stmt);
   rhs_class = get_gimple_rhs_class (code);
 
+  if (code == VEC_SHUFFLE_EXPR)
+    {
+      lower_vec_shuffle (gsi, gimple_location (stmt));
+      gimple_set_modified (gsi_stmt (*gsi), true);
+      update_stmt (gsi_stmt (*gsi));
+    }
+
   if (rhs_class != GIMPLE_UNARY_RHS && rhs_class != GIMPLE_BINARY_RHS)
     return;
 
@@ -612,10 +893,11 @@ expand_vector_operations_1 (gimple_stmt_
 /* Use this to lower vector operations introduced by the vectorizer,
    if it may need the bit-twiddling tricks implemented in this file.  */
 
+
 static bool
-gate_expand_vector_operations (void)
+gate_expand_vector_operations_noop (void)
 {
-  return flag_tree_vectorize != 0;
+  return optimize == 0;
 }
 
 static unsigned int
@@ -648,7 +930,7 @@ struct gimple_opt_pass pass_lower_vector
  {
   GIMPLE_PASS,
   "veclower",				/* name */
-  0,					/* gate */
+  gate_expand_vector_operations_noop,   /* gate */
   expand_vector_operations,		/* execute */
   NULL,					/* sub */
   NULL,					/* next */
@@ -660,7 +942,8 @@ struct gimple_opt_pass pass_lower_vector
   0,					/* todo_flags_start */
   TODO_update_ssa	                /* todo_flags_finish */
     | TODO_verify_ssa
-    | TODO_verify_stmts | TODO_verify_flow
+    | TODO_verify_stmts | TODO_verify_flow 
+    | TODO_cleanup_cfg
  }
 };
 
@@ -669,7 +952,7 @@ struct gimple_opt_pass pass_lower_vector
  {
   GIMPLE_PASS,
   "veclower2",				/* name */
-  gate_expand_vector_operations,	/* gate */
+  0,	                                /* gate */
   expand_vector_operations,		/* execute */
   NULL,					/* sub */
   NULL,					/* next */
@@ -682,6 +965,7 @@ struct gimple_opt_pass pass_lower_vector
   TODO_update_ssa	                /* todo_flags_finish */
     | TODO_verify_ssa
     | TODO_verify_stmts | TODO_verify_flow
+    | TODO_cleanup_cfg
  }
 };
 
Index: gcc/gimple.c
===================================================================
--- gcc/gimple.c	(revision 178354)
+++ gcc/gimple.c	(working copy)
@@ -2615,6 +2615,7 @@ get_gimple_rhs_num_ops (enum tree_code c
       || (SYM) == WIDEN_MULT_MINUS_EXPR					    \
       || (SYM) == DOT_PROD_EXPR						    \
       || (SYM) == REALIGN_LOAD_EXPR					    \
+      || (SYM) == VEC_SHUFFLE_EXPR					    \
       || (SYM) == FMA_EXPR) ? GIMPLE_TERNARY_RHS			    \
    : ((SYM) == COND_EXPR						    \
       || (SYM) == CONSTRUCTOR						    \
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	(revision 178354)
+++ gcc/tree-cfg.c	(working copy)
@@ -3713,6 +3713,7 @@ verify_gimple_assign_ternary (gimple stm
 
     case DOT_PROD_EXPR:
     case REALIGN_LOAD_EXPR:
+    case VEC_SHUFFLE_EXPR:
       /* FIXME.  */
       return false;
 
Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 178354)
+++ gcc/passes.c	(working copy)
@@ -1354,7 +1354,6 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_vectorize);
 	    {
 	      struct opt_pass **p = &pass_vectorize.pass.sub;
-	      NEXT_PASS (pass_lower_vector_ssa);
 	      NEXT_PASS (pass_dce_loop);
 	    }
           NEXT_PASS (pass_predcom);
@@ -1366,6 +1365,7 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_lim);
 	  NEXT_PASS (pass_tree_loop_done);
 	}
+      NEXT_PASS (pass_lower_vector_ssa);
       NEXT_PASS (pass_cse_reciprocals);
       NEXT_PASS (pass_reassoc);
       NEXT_PASS (pass_vrp);
Index: gcc/c-parser.c
===================================================================
--- gcc/c-parser.c	(revision 178354)
+++ gcc/c-parser.c	(working copy)
@@ -6027,6 +6027,10 @@ c_parser_alignof_expression (c_parser *p
 			     assignment-expression )
      __builtin_types_compatible_p ( type-name , type-name )
      __builtin_complex ( assignment-expression , assignment-expression )
+     __builtin_shuffle ( assignment-expression , assignment-expression )
+     __builtin_shuffle ( assignment-expression , 
+			 assignment-expression ,
+			 assignment-expression, )
 
    offsetof-member-designator:
      identifier
@@ -6461,6 +6465,43 @@ c_parser_postfix_expression (c_parser *p
 						   (TREE_TYPE (e1.value))),
 			       e1.value, e2.value);
 	  break;
+	case RID_BUILTIN_SHUFFLE:
+	  {
+	    VEC(tree,gc) *expr_list;
+	    
+	    c_parser_consume_token (parser);
+	    if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
+	      {
+		expr.value = error_mark_node;
+		break;
+	      }
+	    loc = c_parser_peek_token (parser)->location;
+
+	    expr_list = c_parser_expr_list (parser, false, false, NULL);
+	  
+	    if (!c_parser_require (parser, CPP_CLOSE_PAREN, "expected %<)%>"))
+	      {
+		expr.value = error_mark_node;
+		break;
+	      }
+
+	    if (VEC_length (tree, expr_list) == 2)
+	      expr.value = c_build_vec_shuffle_expr 
+				(loc, VEC_index (tree, expr_list, 0),
+				 VEC_index (tree, expr_list, 0),
+				 VEC_index (tree, expr_list, 1));
+	    else if (VEC_length (tree, expr_list) == 3)
+	      expr.value = c_build_vec_shuffle_expr 
+				(loc, VEC_index (tree, expr_list, 0),
+				 VEC_index (tree, expr_list, 1),
+				 VEC_index (tree, expr_list, 2));
+	    else
+	      {
+		error_at (loc, "%<__builtin_shuffle%> wrong number of arguments");
+		expr.value = error_mark_node;
+	      }
+	    break;
+	  }
 	case RID_AT_SELECTOR:
 	  gcc_assert (c_dialect_objc ());
 	  c_parser_consume_token (parser);
Index: gcc/config/i386/sse.md
===================================================================
--- gcc/config/i386/sse.md	(revision 178354)
+++ gcc/config/i386/sse.md	(working copy)
@@ -231,6 +231,12 @@ (define_mode_attr sseinsnmode
    (V4SF "V4SF") (V2DF "V2DF")
    (TI "TI") (V32QI "OI") (V16HI "OI") (V8SI "OI") (V4DI "OI")])
 
+;; All 128bit vector modes
+(define_mode_attr sseshuffint
+  [(V16QI "V16QI") (V8HI "V8HI") 
+   (V4SI "V4SI")  (V2DI "V2DI")
+   (V4SF "V4SI") (V2DF "V2DI")])
+
 ;; Mapping of vector float modes to an integer mode of the same size
 (define_mode_attr sseintvecmode
   [(V8SF "V8SI") (V4DF "V4DI")
@@ -6234,6 +6240,18 @@ (define_expand "vconduv2di"
   DONE;
 })
 
+(define_expand "vshuffle<mode>"
+  [(match_operand:V_128 0 "register_operand" "")
+   (match_operand:V_128 1 "general_operand" "")
+   (match_operand:<sseshuffint> 2 "general_operand" "")]
+  "TARGET_SSE3 || TARGET_AVX"
+{
+  bool ok = ix86_expand_vshuffle (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel bitwise logical operations
Index: gcc/config/i386/i386-protos.h
===================================================================
--- gcc/config/i386/i386-protos.h	(revision 178354)
+++ gcc/config/i386/i386-protos.h	(working copy)
@@ -118,6 +118,7 @@ extern bool ix86_expand_int_movcc (rtx[]
 extern bool ix86_expand_fp_movcc (rtx[]);
 extern bool ix86_expand_fp_vcond (rtx[]);
 extern bool ix86_expand_int_vcond (rtx[]);
+extern bool ix86_expand_vshuffle (rtx[]);
 extern void ix86_expand_sse_unpack (rtx[], bool, bool);
 extern bool ix86_expand_int_addcc (rtx[]);
 extern rtx ix86_expand_call (rtx, rtx, rtx, rtx, rtx, bool);
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 178354)
+++ gcc/config/i386/i386.c	(working copy)
@@ -18693,6 +18693,96 @@ ix86_expand_int_vcond (rtx operands[])
   return true;
 }
 
+bool
+ix86_expand_vshuffle (rtx operands[])
+{
+  rtx target = operands[0];
+  rtx op0 = operands[1];
+  rtx mask = operands[2];
+  rtx mm, vt, cv0, t1;
+  enum machine_mode mode = GET_MODE (op0);
+  enum machine_mode maskmode = GET_MODE (mask);
+  enum machine_mode maskinner = GET_MODE_INNER (mode);
+  rtx vec[16];
+  int w, i, j;
+
+  gcc_assert ((TARGET_SSE3 || TARGET_AVX) && GET_MODE_BITSIZE (mode) == 128);
+
+  op0 = force_reg (mode, op0);
+  mask = force_reg (maskmode, mask);
+
+  /* Number of elements in the vector.  */
+  w = GET_MODE_BITSIZE (maskmode) / GET_MODE_BITSIZE (maskinner);
+ 
+  /* mask = mask & {w-1, w-1, w-1,...} */
+  for (i = 0; i < w; i++)
+    vec[i] = GEN_INT (w - 1);
+
+  mm = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (w, vec));
+  mm = force_reg (maskmode, mm);
+
+  mask = gen_rtx_AND (maskmode, mask, mm);
+  
+  /* Convert mask to vector of chars.  */
+  mask = simplify_gen_subreg (V16QImode, mask, maskmode, 0);
+  mask = force_reg (V16QImode, mask);
+
+
+  /* Build a helper mask wich we will use in pshufb
+     (v4si) --> {0,0,0,0, 4,4,4,4, 8,8,8,8, 12,12,12,12}
+     (v8hi) --> {0,0, 2,2, 4,4, 6,6, ...}
+     ...  */
+  for (i = 0; i < w; i++)
+    for (j = 0; j < 16/w; j++)
+      vec[i*w+j] = GEN_INT (i*16/w);
+
+  vt = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
+  vt = force_reg (V16QImode, vt);
+  
+  t1 = gen_reg_rtx (V16QImode);
+  emit_insn (gen_ssse3_pshufbv16qi3 (t1, mask, vt));
+  mm = t1;
+
+  /* MM contains now something like
+     mm = {m[0], .., m[0], m[k], .., m[k], ... }, where 
+     m[i] is an index of the element in the vector we are
+     selecting from.
+
+     Convert it into the byte positions by doing
+     mm = mm * {16/w, 16/w, ...}
+     mm = mm + {0,1,..,16/w, 0,1,..,16/w, ...}  */
+  for (i = 0; i < 16; i++)
+    vec[i] = GEN_INT (16/w);
+
+  cv0 = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
+  cv0 = force_reg (V16QImode, cv0);
+  mm = gen_rtx_MULT (V16QImode, mm, cv0);
+
+  for (i = 0; i < w; i++)
+    for (j = 0; j < 16/w; j++)
+      vec[i*w+j] = GEN_INT (j);
+
+  cv0 = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
+  cv0 = force_reg (V16QImode, cv0);
+  mm = gen_rtx_PLUS (V16QImode, mm, cv0);
+  mm = force_reg (V16QImode, mm);
+
+  t1 = gen_reg_rtx (V16QImode);
+  
+  /* Convert OP0 to vector of chars.  */
+  op0 = simplify_gen_subreg (V16QImode, op0, mode, 0);
+  op0 = force_reg (V16QImode, op0);
+  emit_insn (gen_ssse3_pshufbv16qi3 (t1, op0, mm));
+  
+  /* Convert it back from vector of chars to the original mode.  */
+  t1 = simplify_gen_subreg (mode, t1, V16QImode, 0);
+  
+  emit_insn (gen_rtx_SET (VOIDmode, target, t1));
+ 
+  fprintf (stderr, "-- %s called\n", __func__);
+  return true;
+}
+
 /* Unpack OP[1] into the next wider integer vector type.  UNSIGNED_P is
    true if we should do zero extension, else sign extension.  HIGH_P is
    true if we want the N/2 high elements, else the low elements.  */
@@ -30911,6 +31001,9 @@ struct expand_vec_perm_d
 
 static bool expand_vec_perm_1 (struct expand_vec_perm_d *d);
 static bool expand_vec_perm_broadcast_1 (struct expand_vec_perm_d *d);
+static int extract_vec_perm_cst (struct expand_vec_perm_d *, tree);
+static bool ix86_vectorize_builtin_vec_perm_ok (tree vec_type, tree mask);
+
 
 /* Get a vector mode of the same size as the original but with elements
    twice as wide.  This is only guaranteed to apply to integral vectors.  */
@@ -34576,10 +34669,10 @@ ix86_vectorize_builtin_vec_perm_ok (tree
 
   vec_mask = extract_vec_perm_cst (&d, mask);
 
-  /* This hook is cannot be called in response to something that the
-     user does (unlike the builtin expander) so we shouldn't ever see
-     an error generated from the extract.  */
-  gcc_assert (vec_mask > 0 && vec_mask <= 3);
+  /* Check whether the mask can be applied to the vector type.  */
+  if (vec_mask < 0 || vec_mask > 3)
+    return false;
+  
   one_vec = (vec_mask != 3);
 
   /* Implementable with shufps or pshufd.  */
Index: gcc/tree-ssa-operands.c
===================================================================
--- gcc/tree-ssa-operands.c	(revision 178354)
+++ gcc/tree-ssa-operands.c	(working copy)
@@ -943,6 +943,7 @@ get_expr_operands (gimple stmt, tree *ex
 
     case COND_EXPR:
     case VEC_COND_EXPR:
+    case VEC_SHUFFLE_EXPR:
       get_expr_operands (stmt, &TREE_OPERAND (expr, 0), uflags);
       get_expr_operands (stmt, &TREE_OPERAND (expr, 1), uflags);
       get_expr_operands (stmt, &TREE_OPERAND (expr, 2), uflags);

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-08-31 14:33             ` Artem Shinkarov
@ 2011-08-31 15:17               ` Richard Guenther
  2011-08-31 17:25               ` Joseph S. Myers
  1 sibling, 0 replies; 71+ messages in thread
From: Richard Guenther @ 2011-08-31 15:17 UTC (permalink / raw)
  To: Artem Shinkarov; +Cc: Duncan Sands, gcc-patches

On Wed, Aug 31, 2011 at 1:02 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> Here is a newer version of the patch, which transforms the builtin to
> the VEC_SHUFFLE_EXPR in the front-end.
>
> Several comments:
> 1) Helper function for the pseudo-builtins.
> In my case the builtin can have 2 or 3 arguments, and I think that I
> expressed that in a pretty much short way without any helper function.
> Am I missing something?
>
> 2) Richard, why do you want to treat VEC_SHUFFLE_EXPR as
> GIMPLE_TERNARY_RHS when VEC_COND_EXPR (which is about the same as
> VEC_SHUFF_EXPR) is single_rhs? From my perspective I don't see much of
> a difference whether it is trenary or single, so I converted it to
> trenary as you asked. But still it looks suspicious that vec_cond and
> vec_shuffle are treated differently.

VEC_SHUFFLE_EXPR has three operands, VEC_COND_EXPR is
a little weird, as it has a sub-expression - if you count that as a single
operand then it would have three too.  I'll consider converting it to
ternary, too.

Richard.

> Can anyone review the x86 parts?
>
>
> Thanks,
> Artem.
>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-08-31 14:33             ` Artem Shinkarov
  2011-08-31 15:17               ` Richard Guenther
@ 2011-08-31 17:25               ` Joseph S. Myers
  2011-08-31 19:08                 ` Artem Shinkarov
  1 sibling, 1 reply; 71+ messages in thread
From: Joseph S. Myers @ 2011-08-31 17:25 UTC (permalink / raw)
  To: Artem Shinkarov; +Cc: Richard Guenther, Duncan Sands, gcc-patches

On Wed, 31 Aug 2011, Artem Shinkarov wrote:

> 1) Helper function for the pseudo-builtins.
> In my case the builtin can have 2 or 3 arguments, and I think that I
> expressed that in a pretty much short way without any helper function.
> Am I missing something?

The point is to refactor what's common between this and other 
pseudo-builtins, not to have two pseudo-builtins doing things one way and 
one doing them another way....

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-08-31 17:25               ` Joseph S. Myers
@ 2011-08-31 19:08                 ` Artem Shinkarov
       [not found]                   ` <Pine.LNX.4.64.1108312053060.21299@digraph.polyomino.org.uk>
  0 siblings, 1 reply; 71+ messages in thread
From: Artem Shinkarov @ 2011-08-31 19:08 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: Richard Guenther, Duncan Sands, gcc-patches

On Wed, Aug 31, 2011 at 4:38 PM, Joseph S. Myers
<joseph@codesourcery.com> wrote:
> On Wed, 31 Aug 2011, Artem Shinkarov wrote:
>
>> 1) Helper function for the pseudo-builtins.
>> In my case the builtin can have 2 or 3 arguments, and I think that I
>> expressed that in a pretty much short way without any helper function.
>> Am I missing something?
>
> The point is to refactor what's common between this and other
> pseudo-builtins, not to have two pseudo-builtins doing things one way and
> one doing them another way....

Joseph, I don't mind adjusting, just look into the patch and tell me
if the way it is done at the moment is the right way to do it. I don't
see a good reason to write a helper function the way you describe,
because the number of operations we do there is very small. However,
if you think that this is a right way to go, I can put the statements
I am using right now to handle arguments of RID_BUILTIN_SHUFFLE in a
helper function. So is there anything missing?


Thanks,
Artem.

> --
> Joseph S. Myers
> joseph@codesourcery.com
>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-08-31  9:02       ` Artem Shinkarov
  2011-08-31  9:04         ` Duncan Sands
@ 2011-08-31 20:36         ` Chris Lattner
  1 sibling, 0 replies; 71+ messages in thread
From: Chris Lattner @ 2011-08-31 20:36 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Richard Guenther, gcc-patches, Richard Henderson, Joseph S. Myers


On Aug 31, 2011, at 1:27 AM, Artem Shinkarov wrote:

>> If you're going to add vector shuffling builtins, you might consider adding the same builtin that clang has for compatibility:
>> http://clang.llvm.org/docs/LanguageExtensions.html#__builtin_shufflevector
>> 
>> It should be straight-forward to map it into the same IR.
>> 
>> -Chris
>> 
> 
> Chris
> 
> I am trying to use OpenCL syntax here which says that the mask for
> shuffling is a vector. Also I didn't really get from the clang
> description if the indexes could be non-constnants? If not, then I
> have a problem here, because I want to support this.

Yes, constant elements are required for this builtin.  It is an implementation detail, but Clang doesn't implement the OpenCL shuffle operations with builtins.

-Chris

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
       [not found]                   ` <Pine.LNX.4.64.1108312053060.21299@digraph.polyomino.org.uk>
@ 2011-09-02 15:16                     ` Artem Shinkarov
  2011-09-02 15:41                       ` Joseph S. Myers
  0 siblings, 1 reply; 71+ messages in thread
From: Artem Shinkarov @ 2011-09-02 15:16 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: Richard Guenther, Duncan Sands, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1786 bytes --]

New version of the patch with the helper function.


Artem.

On Wed, Aug 31, 2011 at 9:55 PM, Joseph S. Myers
<joseph@codesourcery.com> wrote:
> On Wed, 31 Aug 2011, Artem Shinkarov wrote:
>
>> On Wed, Aug 31, 2011 at 4:38 PM, Joseph S. Myers
>> <joseph@codesourcery.com> wrote:
>> > On Wed, 31 Aug 2011, Artem Shinkarov wrote:
>> >
>> >> 1) Helper function for the pseudo-builtins.
>> >> In my case the builtin can have 2 or 3 arguments, and I think that I
>> >> expressed that in a pretty much short way without any helper function.
>> >> Am I missing something?
>> >
>> > The point is to refactor what's common between this and other
>> > pseudo-builtins, not to have two pseudo-builtins doing things one way and
>> > one doing them another way....
>>
>> Joseph, I don't mind adjusting, just look into the patch and tell me
>> if the way it is done at the moment is the right way to do it. I don't
>> see a good reason to write a helper function the way you describe,
>> because the number of operations we do there is very small. However,
>> if you think that this is a right way to go, I can put the statements
>> I am using right now to handle arguments of RID_BUILTIN_SHUFFLE in a
>> helper function. So is there anything missing?
>
> The common parts are I think:
>
> * Parse open parenthesis.  If there isn't one, a parse error is OK (no
> worse than at present) but "cannot take address of %<__builtin_whatever%>"
> would be better.
>
> * Parse expression list.
>
> * Parse close parenthesis (if not found, parse error).
>
> * Check number of arguments against the list of permitted numbers of
> arguments for this pseudo-builtin, and give an error if the number is
> wrong.
>
> --
> Joseph S. Myers
> joseph@codesourcery.com
>

[-- Attachment #2: vec-shuffle.v13.diff --]
[-- Type: text/plain, Size: 45960 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 178354)
+++ gcc/doc/extend.texi	(working copy)
@@ -6561,6 +6561,32 @@ invoke undefined behavior at runtime.  W
 accesses for vector subscription can be enabled with
 @option{-Warray-bounds}.
 
+Vector shuffling is available using functions 
+@code{__builtin_shuffle (vec, mask)} and 
+@code{__builtin_shuffle (vec0, vec1, mask)}. Both functions construct
+a permutation of elements from one or two vectors and return a vector
+of the same type as input vector(s). The mask is a vector of 
+integer-typed elements. The size of each element of the mask must be
+the same as the size of each input vector element. The number of 
+elements in input vector(s) and mask must be the same.
+
+The elements of the input vectors are numbered from left to right across
+one or both of the vectors. Each element in the mask specifies a number
+of element from the input vector(s). Consider the following example.
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{5,6,7,8@};
+v4si mask1 = @{0,1,1,3@};
+v4si mask2 = @{0,4,2,5@};
+v4si res;
+
+res = __builtin_shuffle (a, mask1);       /* res is @{1,2,2,4@}  */
+res = __builtin_shuffle2 (a, b, mask2);   /* res is @{1,5,3,6@}  */
+@end smallexample
+
 You can declare variables and use them in function calls and returns, as
 well as in assignments and some casts.  You can specify a vector type as
 a return type for a function.  Vector types can also be used as function
Index: gcc/tree-pretty-print.c
===================================================================
--- gcc/tree-pretty-print.c	(revision 178354)
+++ gcc/tree-pretty-print.c	(working copy)
@@ -2067,6 +2067,16 @@ dump_generic_node (pretty_printer *buffe
       dump_generic_node (buffer, TREE_OPERAND (node, 2), spc, flags, false);
       pp_string (buffer, " > ");
       break;
+    
+    case VEC_SHUFFLE_EXPR:
+      pp_string (buffer, " VEC_SHUFFLE_EXPR < ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
+      pp_string (buffer, " , ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
+      pp_string (buffer, " , ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 2), spc, flags, false);
+      pp_string (buffer, " > ");
+      break;
 
     case DOT_PROD_EXPR:
       pp_string (buffer, " DOT_PROD_EXPR < ");
Index: gcc/c-family/c-common.c
===================================================================
--- gcc/c-family/c-common.c	(revision 178354)
+++ gcc/c-family/c-common.c	(working copy)
@@ -425,6 +425,7 @@ const struct c_common_resword c_common_r
   { "__attribute__",	RID_ATTRIBUTE,	0 },
   { "__builtin_choose_expr", RID_CHOOSE_EXPR, D_CONLY },
   { "__builtin_complex", RID_BUILTIN_COMPLEX, D_CONLY },
+  { "__builtin_shuffle", RID_BUILTIN_SHUFFLE, D_CONLY },
   { "__builtin_offsetof", RID_OFFSETOF, 0 },
   { "__builtin_types_compatible_p", RID_TYPES_COMPATIBLE_P, D_CONLY },
   { "__builtin_va_arg",	RID_VA_ARG,	0 },
Index: gcc/c-family/c-common.h
===================================================================
--- gcc/c-family/c-common.h	(revision 178354)
+++ gcc/c-family/c-common.h	(working copy)
@@ -103,7 +103,7 @@ enum rid
   /* C extensions */
   RID_ASM,       RID_TYPEOF,   RID_ALIGNOF,  RID_ATTRIBUTE,  RID_VA_ARG,
   RID_EXTENSION, RID_IMAGPART, RID_REALPART, RID_LABEL,      RID_CHOOSE_EXPR,
-  RID_TYPES_COMPATIBLE_P,      RID_BUILTIN_COMPLEX,
+  RID_TYPES_COMPATIBLE_P,      RID_BUILTIN_COMPLEX,	     RID_BUILTIN_SHUFFLE,
   RID_DFLOAT32, RID_DFLOAT64, RID_DFLOAT128,
   RID_FRACT, RID_ACCUM,
 
@@ -898,6 +898,7 @@ extern tree build_function_call (locatio
 
 extern tree build_function_call_vec (location_t, tree,
     				     VEC(tree,gc) *, VEC(tree,gc) *);
+extern tree c_build_vec_shuffle_expr (location_t, tree, tree, tree);
 
 extern tree resolve_overloaded_builtin (location_t, tree, VEC(tree,gc) *);
 
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	(revision 178354)
+++ gcc/optabs.c	(working copy)
@@ -6620,6 +6620,82 @@ vector_compare_rtx (tree cond, bool unsi
   return gen_rtx_fmt_ee (rcode, VOIDmode, ops[0].value, ops[1].value);
 }
 
+/* Return true if VEC_SHUFF_EXPR can be expanded using SIMD extensions
+   of the CPU.  */
+bool
+expand_vec_shuffle_expr_p (enum machine_mode mode, tree v0, 
+			   tree v1, tree mask)
+{
+  int v0_mode_s = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (v0))));
+  int mask_mode_s = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (mask))));
+
+  if (TREE_CODE (mask) == VECTOR_CST
+      && targetm.vectorize.builtin_vec_perm_ok (TREE_TYPE (v0), mask))
+    return true;
+  
+  if (v0 != v1 || v0_mode_s != mask_mode_s)
+    return false;
+    
+  return direct_optab_handler (vshuffle_optab, mode) != CODE_FOR_nothing;
+}
+
+/* Generate instructions for VEC_COND_EXPR given its type and three
+   operands.  */
+rtx
+expand_vec_shuffle_expr (tree type, tree v0, tree v1, tree mask, rtx target)
+{
+  struct expand_operand ops[4];
+  enum insn_code icode;
+  enum machine_mode mode = TYPE_MODE (type);
+  rtx rtx_v0, rtx_mask;
+
+  gcc_assert (expand_vec_shuffle_expr_p (mode, v0, v1, mask));
+
+  if (TREE_CODE (mask) == VECTOR_CST)
+    {
+      tree m_type, call;
+      tree fn = targetm.vectorize.builtin_vec_perm (TREE_TYPE (v0), &m_type);
+      rtx t;
+
+      if (!fn)
+	goto vshuffle;
+
+      if (m_type != TREE_TYPE (TREE_TYPE (mask)))
+	{	
+	  int units = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask));
+	  tree cvt = build_vector_type (m_type, units);
+	  mask = fold_convert (cvt, mask);
+	}
+
+      fn = copy_node (fn);
+      call = fold_build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (fn)), fn);
+      call = build_call_nary (type /* ? */, call, 3, v0, v1, mask);
+
+      t = expand_normal (call);  
+      target = gen_reg_rtx (mode);
+      emit_insn (gen_rtx_SET (VOIDmode, target, t));
+      return target;
+    }
+
+vshuffle:
+  gcc_assert (v1 == v0);
+
+  icode = direct_optab_handler (vshuffle_optab, mode);
+
+  if (icode == CODE_FOR_nothing)
+    return 0;
+  
+  rtx_v0 = expand_normal (v0);
+  rtx_mask = expand_normal (mask);
+
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], rtx_v0, mode);
+  create_input_operand (&ops[2], rtx_mask, mode);
+  expand_insn (icode, 3, ops);
+  
+  return ops[0].value;
+}
+
 /* Return insn code for TYPE, the type of a VEC_COND_EXPR.  */
 
 static inline enum insn_code
Index: gcc/optabs.h
===================================================================
--- gcc/optabs.h	(revision 178354)
+++ gcc/optabs.h	(working copy)
@@ -636,6 +636,9 @@ enum direct_optab_index
   DOI_vcond,
   DOI_vcondu,
 
+  /* Vector shuffling.  */
+  DOI_vshuffle,
+
   /* Block move operation.  */
   DOI_movmem,
 
@@ -701,6 +704,7 @@ typedef struct direct_optab_d *direct_op
 #define reload_out_optab (&direct_optab_table[(int) DOI_reload_out])
 #define vcond_optab (&direct_optab_table[(int) DOI_vcond])
 #define vcondu_optab (&direct_optab_table[(int) DOI_vcondu])
+#define vshuffle_optab (&direct_optab_table[(int) DOI_vshuffle])
 #define movmem_optab (&direct_optab_table[(int) DOI_movmem])
 #define setmem_optab (&direct_optab_table[(int) DOI_setmem])
 #define cmpstr_optab (&direct_optab_table[(int) DOI_cmpstr])
@@ -879,8 +883,15 @@ extern rtx expand_widening_mult (enum ma
 /* Return tree if target supports vector operations for COND_EXPR.  */
 bool expand_vec_cond_expr_p (tree, enum machine_mode);
 
+/* Return tree if target supports vector operations for VEC_SHUFFLE_EXPR.  */
+bool expand_vec_shuffle_expr_p (enum machine_mode, tree, tree, tree);
+
 /* Generate code for VEC_COND_EXPR.  */
 extern rtx expand_vec_cond_expr (tree, tree, tree, tree, rtx);
+
+/* Generate code for VEC_SHUFFLE_EXPR.  */
+extern rtx expand_vec_shuffle_expr (tree, tree, tree, tree, rtx);
+
 /* Generate code for VEC_LSHIFT_EXPR and VEC_RSHIFT_EXPR.  */
 extern rtx expand_vec_shift_expr (sepops, rtx);
 
Index: gcc/genopinit.c
===================================================================
--- gcc/genopinit.c	(revision 178354)
+++ gcc/genopinit.c	(working copy)
@@ -255,6 +255,7 @@ static const char * const optabs[] =
   "set_optab_handler (vec_realign_load_optab, $A, CODE_FOR_$(vec_realign_load_$a$))",
   "set_direct_optab_handler (vcond_optab, $A, CODE_FOR_$(vcond$a$))",
   "set_direct_optab_handler (vcondu_optab, $A, CODE_FOR_$(vcondu$a$))",
+  "set_direct_optab_handler (vshuffle_optab, $A, CODE_FOR_$(vshuffle$a$))",
   "set_optab_handler (ssum_widen_optab, $A, CODE_FOR_$(widen_ssum$I$a3$))",
   "set_optab_handler (usum_widen_optab, $A, CODE_FOR_$(widen_usum$I$a3$))",
   "set_optab_handler (udot_prod_optab, $A, CODE_FOR_$(udot_prod$I$a$))",
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c	(revision 0)
@@ -0,0 +1,44 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shuf2compare(type, count, vres, v0, v1, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != ((vidx(type, mask, __i) < count) ? \
+                          vidx(type, v0, vidx(type, mask, __i)) :  \
+                          vidx(type, v1, (vidx(type, mask, __i) - count)))) \
+            __builtin_abort (); \
+        } \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+    vector (8, short) v0 = {5, 5,5,5,5,5,argc,7};
+    vector (8, short) v1 = {argc, 1,8,8,4,9,argc,4};
+    vector (8, short) v2;
+
+    //vector (8, short) mask = {1,2,5,4,3,6,7};
+    
+    vector (8, short) mask0 = {0,2,3,1,4,5,6,7};
+    vector (8, short) mask1 = {0,12,3,4,3,0,10,9};
+    
+    vector (8, short) mask2 = {0,8,1,9,2,10,3,11};
+
+    v2 = __builtin_shuffle (v0, v1,  mask0);
+    shuf2compare (short, 8, v2, v0, v1, mask0);
+ 
+    v2 = __builtin_shuffle (v0, v1,  mask1);
+    shuf2compare (short, 8, v2, v0, v1, mask1);
+    
+    v2 = __builtin_shuffle (v0, v1,  mask2);
+    shuf2compare (short, 8, v2, v0, v1, mask2);
+
+    v2 = __builtin_shuffle (mask0, mask0,  v0);
+    shuf2compare (short, 8, v2, mask0, mask0, v0);
+
+    return 0; 
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c	(revision 0)
@@ -0,0 +1,50 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shuf2compare(type, count, vres, v0, v1, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != ((vidx(type, mask, __i) < count) ? \
+                          vidx(type, v0, vidx(type, mask, __i)) :  \
+                          vidx(type, v1, (vidx(type, mask, __i) - count)))) \
+            __builtin_abort (); \
+        } \
+} while (0)
+
+
+vector (8, short) __attribute__ ((noinline))
+f (vector (8, short) x, vector (8, short) y, vector (8, short) mask) {
+    return __builtin_shuffle (x, y, mask);
+}
+
+
+
+int main (int argc, char *argv[]) {
+    vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+
+    //vector (8, short) mask = {1,2,5,4,3,6,7};
+    
+    vector (8, short) mask0 = {0,2,3,1,4,5,6,7};
+    vector (8, short) mask1 = {0,12,3,4,3,0,10,9};
+    vector (8, short) mask2 = {0,8,1,9,2,10,3,11};
+
+    v2 = f (v0, v1,  mask0);
+    shuf2compare (short, 8, v2, v0, v1, mask0);
+ 
+    v2 = f (v0, v1,  mask1);
+    shuf2compare (short, 8, v2, v0, v1, mask1);
+
+    v2 = f (v0, v1,  mask2);
+    shuf2compare (short, 8, v2, v0, v1, mask2);
+
+    v2 = f (mask0, mask0,  v0);
+    shuf2compare (short, 8, v2, mask0, mask0, v0);
+
+    return 0; 
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c	(revision 0)
@@ -0,0 +1,46 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shufcompare(type, count, vres, v0, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != vidx(type, v0, vidx(type, mask, __i))) \
+            __builtin_abort (); \
+    } \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+    /*vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+   
+    vector (8, short) smask = {0,0,1,2,3,4,5,6};
+    
+    v2 = __builtin_shuffle (v0,  smask);
+    shufcompare (short, 8, v2, v0, smask);
+    v2 = __builtin_shuffle (v0, v1);
+    shufcompare (short, 8, v2, v0, v1);
+    v2 = __builtin_shuffle (smask, v0);
+    shufcompare (short, 8, v2, smask, v0);*/
+
+    vector (4, int) i0 = {argc, 1,2,3};
+    vector (4, int) i1 = {argc, 1, argc, 3};
+    vector (4, int) i2;
+
+    vector (4, int) imask = {0,3,2,1};
+
+    /*i2 = __builtin_shuffle (i0, imask);
+    shufcompare (int, 4, i2, i0, imask);*/
+    i2 = __builtin_shuffle (i0, i1);
+    shufcompare (int, 4, i2, i0, i1);
+    
+    i2 = __builtin_shuffle (imask, i0);
+    shufcompare (int, 4, i2, imask, i0);
+    
+    return 0;
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c	(revision 0)
@@ -0,0 +1,36 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shufcompare(type, count, vres, v0, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != vidx(type, v0, vidx(type, mask, __i))) \
+            __builtin_abort (); \
+    } \
+} while (0)
+
+vector (8, short) __attribute__ ((noinline))
+f (vector (8, short) x, vector (8, short) mask) {
+    return __builtin_shuffle (x, mask);
+}
+
+
+int main (int argc, char *argv[]) {
+    vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+
+    vector (8, short) mask = {0,0,1,2,3,4,5,6};
+    
+    v2 = f (v0,  mask);
+    shufcompare (short, 8, v2, v0, mask);
+
+    v2 = f (v0, v1);
+    shufcompare (short, 8, v2, v0, v1);
+
+    return 0;
+}
+
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 178354)
+++ gcc/expr.c	(working copy)
@@ -8605,6 +8605,10 @@ expand_expr_real_2 (sepops ops, rtx targ
     case VEC_PACK_FIX_TRUNC_EXPR:
       mode = TYPE_MODE (TREE_TYPE (treeop0));
       goto binop;
+    
+    case VEC_SHUFFLE_EXPR:
+      target = expand_vec_shuffle_expr (type, treeop0, treeop1, treeop2, target);
+      return target;
 
     case DOT_PROD_EXPR:
       {
Index: gcc/gimple-pretty-print.c
===================================================================
--- gcc/gimple-pretty-print.c	(revision 178354)
+++ gcc/gimple-pretty-print.c	(working copy)
@@ -417,6 +417,16 @@ dump_ternary_rhs (pretty_printer *buffer
       dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
       pp_string (buffer, ">");
       break;
+    
+    case VEC_SHUFFLE_EXPR:
+      pp_string (buffer, "VEC_SHUFFLE_EXPR <");
+      dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
+      pp_string (buffer, ">");
+      break;
 
     case REALIGN_LOAD_EXPR:
       pp_string (buffer, "REALIGN_LOAD <");
Index: gcc/c-typeck.c
===================================================================
--- gcc/c-typeck.c	(revision 178354)
+++ gcc/c-typeck.c	(working copy)
@@ -2845,6 +2845,89 @@ build_function_call_vec (location_t loc,
     }
   return require_complete_type (result);
 }
+
+/* Build a VEC_SHUFLE_EXPR if V0, V1 and MASK are not error_mark_nodes
+   and have vector types, V0 has the same type as V1, and the number of
+   elements of V0, V1, MASK is the same.  */
+tree
+c_build_vec_shuffle_expr (location_t loc, tree v0, tree v1, tree mask)
+{
+  tree vec_shuffle, tmp;
+  bool wrap = true;
+  bool maybe_const = false;
+  bool two_arguments = v0 == v1;
+
+
+  if (v0 == error_mark_node || v1 == error_mark_node 
+      || mask == error_mark_node)
+    return error_mark_node;
+
+  if (TREE_CODE (TREE_TYPE (mask)) != VECTOR_TYPE
+      || TREE_CODE (TREE_TYPE (TREE_TYPE (mask))) != INTEGER_TYPE)
+    {
+      error_at (loc, "__builtin_shuffle last argument must "
+		     "be an integer vector");
+      return error_mark_node;
+    }
+   
+  if (TREE_CODE (TREE_TYPE (v0)) != VECTOR_TYPE
+      || TREE_CODE (TREE_TYPE (v1)) != VECTOR_TYPE)
+    {
+      error_at (loc, "__builtin_shuffle arguments must be vectors");
+      return error_mark_node;
+    }
+
+  if (TREE_TYPE (v0) != TREE_TYPE (v1))
+    {
+      error_at (loc, "__builtin_shuffle argument vectors must be of "
+		     "the same type");
+      return error_mark_node;
+    }
+
+  if (TYPE_VECTOR_SUBPARTS (TREE_TYPE (v0)) 
+      != TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask))
+      && TYPE_VECTOR_SUBPARTS (TREE_TYPE (v1))
+	 != TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask)))
+    {
+      error_at (loc, "__builtin_shuffle number of elements of the "
+		     "argument vector(s) and the mask vector should "
+		     "be the same");
+      return error_mark_node;
+    }
+  
+  if (GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (v0)))) 
+      != GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (mask)))))
+    {
+      error_at (loc, "__builtin_shuffle argument vector(s) inner type "
+		     "must have the same size as inner type of the mask");
+      return error_mark_node;
+    }
+
+  /* Avoid C_MAYBE_CONST_EXPRs inside VEC_SHUFFLE_EXPR.  */
+  tmp = c_fully_fold (v0, false, &maybe_const);
+  v0 = save_expr (tmp);
+  wrap &= maybe_const;
+
+  if (!two_arguments)
+    {
+      tmp = c_fully_fold (v1, false, &maybe_const);
+      v1 = save_expr (tmp);
+      wrap &= maybe_const;
+    }
+  else
+    v1 = v0;
+  
+  tmp = c_fully_fold (mask, false, &maybe_const);
+  mask = save_expr (tmp);
+  wrap &= maybe_const;
+
+  vec_shuffle = build3 (VEC_SHUFFLE_EXPR, TREE_TYPE (v0), v0, v1, mask);
+
+  if (!wrap)
+    vec_shuffle = c_wrap_maybe_const (vec_shuffle, true);
+
+  return vec_shuffle;
+}
 \f
 /* Convert the argument expressions in the vector VALUES
    to the types in the list TYPELIST.
@@ -6120,7 +6203,14 @@ digest_init (location_t init_loc, tree t
 	  tree value;
 	  bool constant_p = true;
 
-	  /* Iterate through elements and check if all constructor
+	  /* If constructor has less elements than the vector type.  */
+          if (CONSTRUCTOR_NELTS (inside_init) 
+              < TYPE_VECTOR_SUBPARTS (TREE_TYPE (inside_init)))
+            warning_at (init_loc, 0, "vector length does not match "
+                                     "initializer length, zero elements "
+                                     "will be inserted");
+          
+          /* Iterate through elements and check if all constructor
 	     elements are *_CSTs.  */
 	  FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (inside_init), ix, value)
 	    if (!CONSTANT_CLASS_P (value))
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	(revision 178354)
+++ gcc/gimplify.c	(working copy)
@@ -7053,6 +7053,32 @@ gimplify_expr (tree *expr_p, gimple_seq
 	  }
 	  break;
 
+	case VEC_SHUFFLE_EXPR:
+	  {
+	    enum gimplify_status r0, r1, r2;
+
+	    if (TREE_OPERAND (*expr_p, 0) == TREE_OPERAND (*expr_p, 1))
+	      {
+		r0 = r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
+					 post_p, is_gimple_val, fb_rvalue);
+		TREE_OPERAND (*expr_p, 1) = TREE_OPERAND (*expr_p, 0);
+	      }
+	    else
+	      {
+		 r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
+				     post_p, is_gimple_val, fb_rvalue);
+		 r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
+				     post_p, is_gimple_val, fb_rvalue);
+	      }
+
+	    r2 = gimplify_expr (&TREE_OPERAND (*expr_p, 2), pre_p,
+				post_p, is_gimple_val, fb_rvalue);
+	    recalculate_side_effects (*expr_p);
+
+	    ret = MIN (r0, MIN (r1, r2));
+	    break;
+	  }
+
 	case TARGET_MEM_REF:
 	  {
 	    enum gimplify_status r0 = GS_ALL_DONE, r1 = GS_ALL_DONE;
Index: gcc/tree.def
===================================================================
--- gcc/tree.def	(revision 178354)
+++ gcc/tree.def	(working copy)
@@ -497,6 +497,19 @@ DEFTREECODE (COND_EXPR, "cond_expr", tcc
 */
 DEFTREECODE (VEC_COND_EXPR, "vec_cond_expr", tcc_expression, 3)
 
+/* Vector shuffle expression. A = VEC_SHUFFLE_EXPR<v0, v1, maks>
+   means
+
+   freach i in length (mask):
+     A = mask[i] < length (v0) ? v0[mask[i]] : v1[mask[i]]
+
+   V0 and V1 are vectors of the same type. MASK is an integer-typed
+   vector. The number of MASK elements must be the same with the
+   number of elements in V0 and V1. The size of the inner type
+   of the MASK and of the V0 and V1 must be the same.
+*/
+DEFTREECODE (VEC_SHUFFLE_EXPR, "vec_shuffle_expr", tcc_expression, 3)
+
 /* Declare local variables, including making RTL and allocating space.
    BIND_EXPR_VARS is a chain of VAR_DECL nodes for the variables.
    BIND_EXPR_BODY is the body, the expression to be computed using
Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c	(revision 178354)
+++ gcc/tree-inline.c	(working copy)
@@ -3285,6 +3285,7 @@ estimate_operator_cost (enum tree_code c
        ??? We may consider mapping RTL costs to this.  */
     case COND_EXPR:
     case VEC_COND_EXPR:
+    case VEC_SHUFFLE_EXPR:
 
     case PLUS_EXPR:
     case POINTER_PLUS_EXPR:
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	(revision 178354)
+++ gcc/tree-vect-generic.c	(working copy)
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "flags.h"
 #include "ggc.h"
+#include "diagnostic.h"
 
 /* Need to include rtl.h, expr.h, etc. for optabs.  */
 #include "expr.h"
@@ -432,6 +433,279 @@ type_for_widest_vector_mode (enum machin
     }
 }
 
+
+/* Build a reference to the element of the vector VECT. Function 
+   returns either the element itself, either BIT_FIELD_REF, or an 
+   ARRAY_REF expression.
+   
+   GSI is requred to insert temporary variables while building a
+   refernece to the element of the vector VECT.
+   
+   PTMPVEC is a pointer to the temporary variable for caching
+   purposes. In case when PTMPVEC is NULL new temporary variable
+   will be created.  */
+static tree
+vector_element (gimple_stmt_iterator *gsi, tree vect, tree idx, tree *ptmpvec)
+{
+  tree type;
+  gimple asgn; 
+  unsigned HOST_WIDE_INT maxval;
+  tree tmpvec; 
+  tree indextype, arraytype;
+  bool need_asgn = true;
+
+  gcc_assert (TREE_CODE (TREE_TYPE (vect)) == VECTOR_TYPE);
+
+  type = TREE_TYPE (vect);
+  if (TREE_CODE (idx) == INTEGER_CST)
+    {
+      unsigned HOST_WIDE_INT index;
+
+      if (!host_integerp (idx, 1)
+           || (index = tree_low_cst (idx, 1)) > TYPE_VECTOR_SUBPARTS (type)-1)
+        return error_mark_node;
+
+      if (TREE_CODE (vect) == VECTOR_CST)
+        {
+            unsigned i;
+            tree vals = TREE_VECTOR_CST_ELTS (vect);
+            for (i = 0; vals; vals = TREE_CHAIN (vals), ++i)
+              if (i == index)
+                 return TREE_VALUE (vals);
+            return error_mark_node;
+        }
+      else if (TREE_CODE (vect) == CONSTRUCTOR)
+        {
+          unsigned i;
+          VEC (constructor_elt, gc) *vals = CONSTRUCTOR_ELTS (vect);
+          constructor_elt *elt;
+
+          for (i = 0; VEC_iterate (constructor_elt, vals, i, elt); i++)
+            if (operand_equal_p (elt->index, idx, 0))
+              return elt->value; 
+          return fold_convert (TREE_TYPE (type), integer_zero_node);
+        }
+      else if (TREE_CODE (vect) == SSA_NAME)
+        {
+          tree el;
+          gimple vectdef = SSA_NAME_DEF_STMT (vect);
+          if (gimple_assign_single_p (vectdef)
+              && (el = vector_element (gsi, gimple_assign_rhs1 (vectdef), 
+                                       idx, ptmpvec)) 
+                 != error_mark_node)
+            return el;
+          else
+            {
+              tree size = TYPE_SIZE (TREE_TYPE (type));
+              tree pos = fold_build2 (MULT_EXPR, TREE_TYPE (idx), 
+                                      idx, size);
+              return fold_build3 (BIT_FIELD_REF, TREE_TYPE (type), 
+                             vect, size, pos);
+            }
+        }
+      else
+        return error_mark_node;
+    }
+  
+  if (!ptmpvec)
+    tmpvec = create_tmp_var (TREE_TYPE (vect), "vectmp");
+  else if (!*ptmpvec)
+    tmpvec = *ptmpvec = create_tmp_var (TREE_TYPE (vect), "vectmp");
+  else
+    {
+      tmpvec = *ptmpvec;
+      need_asgn = false;
+    }
+  
+  if (need_asgn)
+    {
+      TREE_ADDRESSABLE (tmpvec) = 1;
+      asgn = gimple_build_assign (tmpvec, vect);
+      gsi_insert_before (gsi, asgn, GSI_SAME_STMT);
+    }
+
+  maxval = TYPE_VECTOR_SUBPARTS (TREE_TYPE (vect)) -1;
+  indextype = build_index_type (size_int (maxval));
+  arraytype = build_array_type (TREE_TYPE (type), indextype);
+  
+  return build4 (ARRAY_REF, TREE_TYPE (type),
+                 build1 (VIEW_CONVERT_EXPR, arraytype, tmpvec),
+                 idx, NULL_TREE, NULL_TREE);
+
+
+}
+
+/* Check if VEC_SHUFFLE_EXPR within the given setting is supported
+   by hardware, or lower it piecewie.
+
+   When VEC_SHUFFLE_EXPR has the same first and second operands:
+   VEC_SHUFFLE_EXPR <v0, v0, mask> the lowered version would be 
+   {v0[mask[0]], v0[mask[1]], ...}
+   MASK and V0 must have the same number of elements.
+        
+   Otherwise VEC_SHUFFLE_EXPR <v0, v1, mask> is lowered to 
+   {mask[0] < len(v0) ? v0[mask[0]] : v1[mask[0]], ...}
+   V0 and V1 must have the same type. MASK, V0, V1 must have the
+   same number of arguments.  */
+static void
+lower_vec_shuffle (gimple_stmt_iterator *gsi, location_t loc)
+{
+#define TRAP_RETURN(new_stmt, stmt, gsi, vec0) \
+do { \
+  new_stmt = gimple_build_call (built_in_decls[BUILT_IN_TRAP], 0); \
+  gsi_insert_before (gsi, new_stmt,  GSI_SAME_STMT); \
+  split_block (gimple_bb (new_stmt), new_stmt); \
+  new_stmt = gimple_build_assign (gimple_call_lhs (stmt), vec0); \
+  gsi_replace (gsi, new_stmt, false); \
+  return; \
+} while (0) 
+ 
+  gimple stmt = gsi_stmt (*gsi);
+  tree mask = gimple_assign_rhs3 (stmt);
+  tree vec0 = gimple_assign_rhs1 (stmt);
+  tree vec1 = gimple_assign_rhs2 (stmt);
+  unsigned els = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask));
+  tree type0 = TREE_TYPE (TREE_TYPE (vec0));
+  VEC(constructor_elt,gc) *v = NULL;
+  tree vectype, constr;
+  gimple new_stmt;
+  tree vec0tmp = NULL_TREE, masktmp = NULL_TREE;
+
+  if (expand_vec_shuffle_expr_p (TYPE_MODE (TREE_TYPE (vec0)), vec0, vec1, mask))
+    {
+      tree t;
+
+      t = gimplify_build3 (gsi, VEC_SHUFFLE_EXPR, TREE_TYPE (vec0),
+			   vec0, vec1, mask);
+      gimple_assign_set_rhs_from_tree (gsi, t);
+      /* Statement should be updated by callee.  */
+      return;
+    }
+
+  
+  if (vec0 == vec1)
+    {
+      unsigned i;
+      tree vec0tmp = NULL_TREE;
+      
+      v = VEC_alloc (constructor_elt, gc, els);
+      for (i = 0; i < els; i++)
+        {
+          tree idxval, vecel, t;
+	   
+	  idxval = vector_element (gsi, mask, size_int (i), &masktmp);
+          if (idxval == error_mark_node)
+            {
+              warning_at (loc, 0, "Invalid shuffling mask index %i", i);
+              TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+            }
+
+	  vecel = vector_element (gsi, vec0, idxval, &vec0tmp);
+          if (vecel == error_mark_node)
+            {
+              warning_at (loc, 0, "Invalid shuffling arguments");
+              TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+            }
+          
+          t = force_gimple_operand_gsi (gsi, vecel, true, 
+					NULL_TREE, true, GSI_SAME_STMT);
+          CONSTRUCTOR_APPEND_ELT (v, size_int (i), t);
+        }
+    }
+  else
+    {
+      unsigned i;
+      tree var = create_tmp_var (type0, "vecel");
+      tree vec1tmp = NULL_TREE;
+
+      v = VEC_alloc (constructor_elt, gc, els);
+      for (i = 0; i < els; i++)
+        {
+          tree idxval, idx1val, cond, elval0, elval1, condexpr, t, ssatmp;
+          tree vec0el, vec1el;
+          gimple asgn;
+          
+          idxval = vector_element (gsi, mask, size_int (i), &masktmp);
+	  if (idxval == error_mark_node)
+            {
+              warning_at (loc, 0, "Invalid shuffling mask index %i", i);
+              TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+            }
+          
+          if (TREE_CODE (idxval) == INTEGER_CST)
+            {
+              if (tree_int_cst_lt (idxval, size_int (els)))
+                {
+                  vec0el = vector_element (gsi, vec0, idxval, &vec0tmp);
+                  t = force_gimple_operand_gsi (gsi, vec0el,
+                                    true, NULL_TREE, true, GSI_SAME_STMT);
+                }
+              else if (tree_int_cst_lt (idxval, size_int (2*els)))
+                {
+                  idx1val = fold_build2 (MINUS_EXPR, TREE_TYPE (idxval),
+                        idxval, build_int_cst (TREE_TYPE (idxval), els));
+                  
+                  vec1el = vector_element (gsi, vec1, idx1val, &vec1tmp);
+                  t = force_gimple_operand_gsi (gsi, vec1el, true, 
+						NULL_TREE, true, GSI_SAME_STMT);
+                }
+              else
+                {
+                  warning_at (loc, 0, "Invalid shuffling mask index %i", i);
+                  TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+                }
+            }
+          else
+            {
+
+              idx1val = fold_build2 (MINUS_EXPR, TREE_TYPE (idxval),
+                            idxval, build_int_cst (TREE_TYPE (idxval), els));
+              idx1val = force_gimple_operand_gsi (gsi, idx1val, 
+                                true, NULL_TREE, true, GSI_SAME_STMT);
+              cond = build2 (GT_EXPR, boolean_type_node, \
+                             idxval, convert (type0, size_int (els - 1)));
+              
+	      vec0el = vector_element (gsi, vec0, idxval, &vec0tmp);
+              if (vec0el == error_mark_node)
+                {
+                  warning_at (loc, 0, "Invalid shuffling arguments");
+                  TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+                }
+
+              elval0 = force_gimple_operand_gsi (gsi, vec0el, 
+                                true, NULL_TREE, true, GSI_SAME_STMT);
+	      
+	      vec1el = vector_element (gsi, vec1, idx1val, &vec1tmp);
+              if (vec1el == error_mark_node)
+                {
+                  warning_at (loc, 0, "Invalid shuffling arguments");
+                  TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+                }
+
+              elval1 = force_gimple_operand_gsi (gsi, vec1el,
+                                true, NULL_TREE, true, GSI_SAME_STMT);
+
+              condexpr = fold_build3 (COND_EXPR, type0, cond, \
+                                      elval1, elval0);
+
+              t = force_gimple_operand_gsi (gsi, condexpr, true, \
+                                        NULL_TREE, true, GSI_SAME_STMT);
+            }
+          
+          asgn = gimple_build_assign (var, t);
+          ssatmp = make_ssa_name (var, asgn);
+          gimple_assign_set_lhs (asgn, ssatmp);
+          gsi_insert_before (gsi, asgn, GSI_SAME_STMT);
+          CONSTRUCTOR_APPEND_ELT (v, size_int (i), ssatmp);
+        }
+    }
+  
+  vectype = build_vector_type (type0, els);
+  constr = build_constructor (vectype, v);
+  gimple_assign_set_rhs_from_tree (gsi, constr);
+  /* Statement should be updated by callee.  */
+}
+
 /* Process one statement.  If we identify a vector operation, expand it.  */
 
 static void
@@ -451,6 +725,13 @@ expand_vector_operations_1 (gimple_stmt_
   code = gimple_assign_rhs_code (stmt);
   rhs_class = get_gimple_rhs_class (code);
 
+  if (code == VEC_SHUFFLE_EXPR)
+    {
+      lower_vec_shuffle (gsi, gimple_location (stmt));
+      gimple_set_modified (gsi_stmt (*gsi), true);
+      update_stmt (gsi_stmt (*gsi));
+    }
+
   if (rhs_class != GIMPLE_UNARY_RHS && rhs_class != GIMPLE_BINARY_RHS)
     return;
 
@@ -612,10 +893,11 @@ expand_vector_operations_1 (gimple_stmt_
 /* Use this to lower vector operations introduced by the vectorizer,
    if it may need the bit-twiddling tricks implemented in this file.  */
 
+
 static bool
-gate_expand_vector_operations (void)
+gate_expand_vector_operations_noop (void)
 {
-  return flag_tree_vectorize != 0;
+  return optimize == 0;
 }
 
 static unsigned int
@@ -648,7 +930,7 @@ struct gimple_opt_pass pass_lower_vector
  {
   GIMPLE_PASS,
   "veclower",				/* name */
-  0,					/* gate */
+  gate_expand_vector_operations_noop,   /* gate */
   expand_vector_operations,		/* execute */
   NULL,					/* sub */
   NULL,					/* next */
@@ -660,7 +942,8 @@ struct gimple_opt_pass pass_lower_vector
   0,					/* todo_flags_start */
   TODO_update_ssa	                /* todo_flags_finish */
     | TODO_verify_ssa
-    | TODO_verify_stmts | TODO_verify_flow
+    | TODO_verify_stmts | TODO_verify_flow 
+    | TODO_cleanup_cfg
  }
 };
 
@@ -669,7 +952,7 @@ struct gimple_opt_pass pass_lower_vector
  {
   GIMPLE_PASS,
   "veclower2",				/* name */
-  gate_expand_vector_operations,	/* gate */
+  0,	                                /* gate */
   expand_vector_operations,		/* execute */
   NULL,					/* sub */
   NULL,					/* next */
@@ -682,6 +965,7 @@ struct gimple_opt_pass pass_lower_vector
   TODO_update_ssa	                /* todo_flags_finish */
     | TODO_verify_ssa
     | TODO_verify_stmts | TODO_verify_flow
+    | TODO_cleanup_cfg
  }
 };
 
Index: gcc/gimple.c
===================================================================
--- gcc/gimple.c	(revision 178354)
+++ gcc/gimple.c	(working copy)
@@ -2615,6 +2615,7 @@ get_gimple_rhs_num_ops (enum tree_code c
       || (SYM) == WIDEN_MULT_MINUS_EXPR					    \
       || (SYM) == DOT_PROD_EXPR						    \
       || (SYM) == REALIGN_LOAD_EXPR					    \
+      || (SYM) == VEC_SHUFFLE_EXPR					    \
       || (SYM) == FMA_EXPR) ? GIMPLE_TERNARY_RHS			    \
    : ((SYM) == COND_EXPR						    \
       || (SYM) == CONSTRUCTOR						    \
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	(revision 178354)
+++ gcc/tree-cfg.c	(working copy)
@@ -3713,6 +3713,7 @@ verify_gimple_assign_ternary (gimple stm
 
     case DOT_PROD_EXPR:
     case REALIGN_LOAD_EXPR:
+    case VEC_SHUFFLE_EXPR:
       /* FIXME.  */
       return false;
 
Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 178354)
+++ gcc/passes.c	(working copy)
@@ -1354,7 +1354,6 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_vectorize);
 	    {
 	      struct opt_pass **p = &pass_vectorize.pass.sub;
-	      NEXT_PASS (pass_lower_vector_ssa);
 	      NEXT_PASS (pass_dce_loop);
 	    }
           NEXT_PASS (pass_predcom);
@@ -1366,6 +1365,7 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_lim);
 	  NEXT_PASS (pass_tree_loop_done);
 	}
+      NEXT_PASS (pass_lower_vector_ssa);
       NEXT_PASS (pass_cse_reciprocals);
       NEXT_PASS (pass_reassoc);
       NEXT_PASS (pass_vrp);
Index: gcc/c-parser.c
===================================================================
--- gcc/c-parser.c	(revision 178354)
+++ gcc/c-parser.c	(working copy)
@@ -5989,6 +5989,41 @@ c_parser_alignof_expression (c_parser *p
     }
 }
 
+/* Helper function to read arguments of builtins which are interfaces
+   for the middle-end nodes like COMPLEX_EXPR, VEC_SHUFLE_EXPR and
+   others. The name of the builtin is passed using BNAME parameter.
+   Function returns true if there were no errors while parsing and
+   stores the arguments in EXPR_LIST*/
+static bool
+c_parser_get_builtin_args (c_parser *  parser, const char *  bname, 
+			   VEC(tree,gc) **  expr_list)
+{
+  location_t loc = c_parser_peek_token (parser)->location;
+  *expr_list = NULL;
+
+  if (c_parser_next_token_is_not (parser, CPP_OPEN_PAREN))
+    {
+      error_at (loc, "cannot take address of %<%s%>", bname);
+      return false;
+    }
+
+  c_parser_consume_token (parser);
+
+  if (c_parser_next_token_is (parser, CPP_CLOSE_PAREN))
+    {
+      c_parser_consume_token (parser);
+      return true;
+    }
+    
+  *expr_list = c_parser_expr_list (parser, false, false, NULL);
+
+  if (!c_parser_require (parser, CPP_CLOSE_PAREN, "expected %<)%>"))
+    return false;
+
+  return true;
+}
+
+
 /* Parse a postfix expression (C90 6.3.1-6.3.2, C99 6.5.1-6.5.2).
 
    postfix-expression:
@@ -6027,6 +6062,10 @@ c_parser_alignof_expression (c_parser *p
 			     assignment-expression )
      __builtin_types_compatible_p ( type-name , type-name )
      __builtin_complex ( assignment-expression , assignment-expression )
+     __builtin_shuffle ( assignment-expression , assignment-expression )
+     __builtin_shuffle ( assignment-expression , 
+			 assignment-expression ,
+			 assignment-expression, )
 
    offsetof-member-designator:
      identifier
@@ -6461,6 +6500,35 @@ c_parser_postfix_expression (c_parser *p
 						   (TREE_TYPE (e1.value))),
 			       e1.value, e2.value);
 	  break;
+	case RID_BUILTIN_SHUFFLE:
+	  {
+	    VEC(tree,gc) *expr_list;
+	    
+	    c_parser_consume_token (parser);
+	    if (! c_parser_get_builtin_args (parser, 
+					     "__builtin_shuffle", &expr_list))
+	      {
+		expr.value = error_mark_node;
+		break;
+	      }
+
+	    if (VEC_length (tree, expr_list) == 2)
+	      expr.value = c_build_vec_shuffle_expr 
+				(loc, VEC_index (tree, expr_list, 0),
+				 VEC_index (tree, expr_list, 0),
+				 VEC_index (tree, expr_list, 1));
+	    else if (VEC_length (tree, expr_list) == 3)
+	      expr.value = c_build_vec_shuffle_expr 
+				(loc, VEC_index (tree, expr_list, 0),
+				 VEC_index (tree, expr_list, 1),
+				 VEC_index (tree, expr_list, 2));
+	    else
+	      {
+		error_at (loc, "%<__builtin_shuffle%> wrong number of arguments");
+		expr.value = error_mark_node;
+	      }
+	    break;
+	  }
 	case RID_AT_SELECTOR:
 	  gcc_assert (c_dialect_objc ());
 	  c_parser_consume_token (parser);
Index: gcc/config/i386/sse.md
===================================================================
--- gcc/config/i386/sse.md	(revision 178354)
+++ gcc/config/i386/sse.md	(working copy)
@@ -231,6 +231,12 @@ (define_mode_attr sseinsnmode
    (V4SF "V4SF") (V2DF "V2DF")
    (TI "TI") (V32QI "OI") (V16HI "OI") (V8SI "OI") (V4DI "OI")])
 
+;; All 128bit vector modes
+(define_mode_attr sseshuffint
+  [(V16QI "V16QI") (V8HI "V8HI") 
+   (V4SI "V4SI")  (V2DI "V2DI")
+   (V4SF "V4SI") (V2DF "V2DI")])
+
 ;; Mapping of vector float modes to an integer mode of the same size
 (define_mode_attr sseintvecmode
   [(V8SF "V8SI") (V4DF "V4DI")
@@ -6234,6 +6240,18 @@ (define_expand "vconduv2di"
   DONE;
 })
 
+(define_expand "vshuffle<mode>"
+  [(match_operand:V_128 0 "register_operand" "")
+   (match_operand:V_128 1 "general_operand" "")
+   (match_operand:<sseshuffint> 2 "general_operand" "")]
+  "TARGET_SSE3 || TARGET_AVX"
+{
+  bool ok = ix86_expand_vshuffle (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel bitwise logical operations
Index: gcc/config/i386/i386-protos.h
===================================================================
--- gcc/config/i386/i386-protos.h	(revision 178354)
+++ gcc/config/i386/i386-protos.h	(working copy)
@@ -118,6 +118,7 @@ extern bool ix86_expand_int_movcc (rtx[]
 extern bool ix86_expand_fp_movcc (rtx[]);
 extern bool ix86_expand_fp_vcond (rtx[]);
 extern bool ix86_expand_int_vcond (rtx[]);
+extern bool ix86_expand_vshuffle (rtx[]);
 extern void ix86_expand_sse_unpack (rtx[], bool, bool);
 extern bool ix86_expand_int_addcc (rtx[]);
 extern rtx ix86_expand_call (rtx, rtx, rtx, rtx, rtx, bool);
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 178354)
+++ gcc/config/i386/i386.c	(working copy)
@@ -18693,6 +18693,96 @@ ix86_expand_int_vcond (rtx operands[])
   return true;
 }
 
+bool
+ix86_expand_vshuffle (rtx operands[])
+{
+  rtx target = operands[0];
+  rtx op0 = operands[1];
+  rtx mask = operands[2];
+  rtx mm, vt, cv0, t1;
+  enum machine_mode mode = GET_MODE (op0);
+  enum machine_mode maskmode = GET_MODE (mask);
+  enum machine_mode maskinner = GET_MODE_INNER (mode);
+  rtx vec[16];
+  int w, i, j;
+
+  gcc_assert ((TARGET_SSE3 || TARGET_AVX) && GET_MODE_BITSIZE (mode) == 128);
+
+  op0 = force_reg (mode, op0);
+  mask = force_reg (maskmode, mask);
+
+  /* Number of elements in the vector.  */
+  w = GET_MODE_BITSIZE (maskmode) / GET_MODE_BITSIZE (maskinner);
+ 
+  /* mask = mask & {w-1, w-1, w-1,...} */
+  for (i = 0; i < w; i++)
+    vec[i] = GEN_INT (w - 1);
+
+  mm = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (w, vec));
+  mm = force_reg (maskmode, mm);
+
+  mask = gen_rtx_AND (maskmode, mask, mm);
+  
+  /* Convert mask to vector of chars.  */
+  mask = simplify_gen_subreg (V16QImode, mask, maskmode, 0);
+  mask = force_reg (V16QImode, mask);
+
+
+  /* Build a helper mask wich we will use in pshufb
+     (v4si) --> {0,0,0,0, 4,4,4,4, 8,8,8,8, 12,12,12,12}
+     (v8hi) --> {0,0, 2,2, 4,4, 6,6, ...}
+     ...  */
+  for (i = 0; i < w; i++)
+    for (j = 0; j < 16/w; j++)
+      vec[i*w+j] = GEN_INT (i*16/w);
+
+  vt = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
+  vt = force_reg (V16QImode, vt);
+  
+  t1 = gen_reg_rtx (V16QImode);
+  emit_insn (gen_ssse3_pshufbv16qi3 (t1, mask, vt));
+  mm = t1;
+
+  /* MM contains now something like
+     mm = {m[0], .., m[0], m[k], .., m[k], ... }, where 
+     m[i] is an index of the element in the vector we are
+     selecting from.
+
+     Convert it into the byte positions by doing
+     mm = mm * {16/w, 16/w, ...}
+     mm = mm + {0,1,..,16/w, 0,1,..,16/w, ...}  */
+  for (i = 0; i < 16; i++)
+    vec[i] = GEN_INT (16/w);
+
+  cv0 = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
+  cv0 = force_reg (V16QImode, cv0);
+  mm = gen_rtx_MULT (V16QImode, mm, cv0);
+
+  for (i = 0; i < w; i++)
+    for (j = 0; j < 16/w; j++)
+      vec[i*w+j] = GEN_INT (j);
+
+  cv0 = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
+  cv0 = force_reg (V16QImode, cv0);
+  mm = gen_rtx_PLUS (V16QImode, mm, cv0);
+  mm = force_reg (V16QImode, mm);
+
+  t1 = gen_reg_rtx (V16QImode);
+  
+  /* Convert OP0 to vector of chars.  */
+  op0 = simplify_gen_subreg (V16QImode, op0, mode, 0);
+  op0 = force_reg (V16QImode, op0);
+  emit_insn (gen_ssse3_pshufbv16qi3 (t1, op0, mm));
+  
+  /* Convert it back from vector of chars to the original mode.  */
+  t1 = simplify_gen_subreg (mode, t1, V16QImode, 0);
+  
+  emit_insn (gen_rtx_SET (VOIDmode, target, t1));
+ 
+  fprintf (stderr, "-- %s called\n", __func__);
+  return true;
+}
+
 /* Unpack OP[1] into the next wider integer vector type.  UNSIGNED_P is
    true if we should do zero extension, else sign extension.  HIGH_P is
    true if we want the N/2 high elements, else the low elements.  */
@@ -30911,6 +31001,9 @@ struct expand_vec_perm_d
 
 static bool expand_vec_perm_1 (struct expand_vec_perm_d *d);
 static bool expand_vec_perm_broadcast_1 (struct expand_vec_perm_d *d);
+static int extract_vec_perm_cst (struct expand_vec_perm_d *, tree);
+static bool ix86_vectorize_builtin_vec_perm_ok (tree vec_type, tree mask);
+
 
 /* Get a vector mode of the same size as the original but with elements
    twice as wide.  This is only guaranteed to apply to integral vectors.  */
@@ -34576,10 +34669,10 @@ ix86_vectorize_builtin_vec_perm_ok (tree
 
   vec_mask = extract_vec_perm_cst (&d, mask);
 
-  /* This hook is cannot be called in response to something that the
-     user does (unlike the builtin expander) so we shouldn't ever see
-     an error generated from the extract.  */
-  gcc_assert (vec_mask > 0 && vec_mask <= 3);
+  /* Check whether the mask can be applied to the vector type.  */
+  if (vec_mask < 0 || vec_mask > 3)
+    return false;
+  
   one_vec = (vec_mask != 3);
 
   /* Implementable with shufps or pshufd.  */
Index: gcc/tree-ssa-operands.c
===================================================================
--- gcc/tree-ssa-operands.c	(revision 178354)
+++ gcc/tree-ssa-operands.c	(working copy)
@@ -943,6 +943,7 @@ get_expr_operands (gimple stmt, tree *ex
 
     case COND_EXPR:
     case VEC_COND_EXPR:
+    case VEC_SHUFFLE_EXPR:
       get_expr_operands (stmt, &TREE_OPERAND (expr, 0), uflags);
       get_expr_operands (stmt, &TREE_OPERAND (expr, 1), uflags);
       get_expr_operands (stmt, &TREE_OPERAND (expr, 2), uflags);

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-09-02 15:16                     ` Artem Shinkarov
@ 2011-09-02 15:41                       ` Joseph S. Myers
  2011-09-02 16:09                         ` Artem Shinkarov
  0 siblings, 1 reply; 71+ messages in thread
From: Joseph S. Myers @ 2011-09-02 15:41 UTC (permalink / raw)
  To: Artem Shinkarov; +Cc: Richard Guenther, Duncan Sands, gcc-patches

On Fri, 2 Sep 2011, Artem Shinkarov wrote:

> +  /* Avoid C_MAYBE_CONST_EXPRs inside VEC_SHUFFLE_EXPR.  */
> +  tmp = c_fully_fold (v0, false, &maybe_const);
> +  v0 = save_expr (tmp);
> +  wrap &= maybe_const;

I suppose you need this save_expr because of the two-argument case, but 
shouldn't need it otherwise.

> +  if (!two_arguments)
> +    {
> +      tmp = c_fully_fold (v1, false, &maybe_const);
> +      v1 = save_expr (tmp);

And you shouldn't need this save_expr at all.

> +  tmp = c_fully_fold (mask, false, &maybe_const);
> +  mask = save_expr (tmp);

Or this one.

> +/* Helper function to read arguments of builtins which are interfaces
> +   for the middle-end nodes like COMPLEX_EXPR, VEC_SHUFLE_EXPR and

Spelling of SHUFFLE.

> +   others. The name of the builtin is passed using BNAME parameter.

Two spaces after ".".

> +   Function returns true if there were no errors while parsing and
> +   stores the arguments in EXPR_LIST*/

".  " at end of comment.

> +static bool
> +c_parser_get_builtin_args (c_parser *  parser, const char *  bname, 
> +			   VEC(tree,gc) **  expr_list)

No spaces after "*".

> +  if (c_parser_next_token_is_not (parser, CPP_OPEN_PAREN))
> +    {
> +      error_at (loc, "cannot take address of %<%s%>", bname);

%qs is a simpler form of %<%s%>.

> @@ -6461,6 +6500,35 @@ c_parser_postfix_expression (c_parser *p

Should also convert __builtin_choose_expr and __builtin_complex to use the 
new helper.

> +	    if (! c_parser_get_builtin_args (parser, 

No space after "!".

> +	      {
> +		error_at (loc, "%<__builtin_shuffle%> wrong number of arguments");

"wrong number of arguments to %<__builtin_shuffle%>".

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-09-02 15:41                       ` Joseph S. Myers
@ 2011-09-02 16:09                         ` Artem Shinkarov
  2011-09-02 17:15                           ` Artem Shinkarov
  2011-09-02 19:52                           ` Joseph S. Myers
  0 siblings, 2 replies; 71+ messages in thread
From: Artem Shinkarov @ 2011-09-02 16:09 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: Richard Guenther, Duncan Sands, gcc-patches

On Fri, Sep 2, 2011 at 4:41 PM, Joseph S. Myers <joseph@codesourcery.com> wrote:
> On Fri, 2 Sep 2011, Artem Shinkarov wrote:
>
>> +  /* Avoid C_MAYBE_CONST_EXPRs inside VEC_SHUFFLE_EXPR.  */
>> +  tmp = c_fully_fold (v0, false, &maybe_const);
>> +  v0 = save_expr (tmp);
>> +  wrap &= maybe_const;
>
> I suppose you need this save_expr because of the two-argument case, but
> shouldn't need it otherwise.
>
>> +  if (!two_arguments)
>> +    {
>> +      tmp = c_fully_fold (v1, false, &maybe_const);
>> +      v1 = save_expr (tmp);
>
> And you shouldn't need this save_expr at all.
>
>> +  tmp = c_fully_fold (mask, false, &maybe_const);
>> +  mask = save_expr (tmp);
>
> Or this one.

Joseph, I don't understand this comment. I have 2 or 3 arguments in
the VEC_SHUFFLE_EXPR and any of them can be C_MAYBE_CONST_EXPR, so I
need to wrap mask (the last argument) to avoid the following failure:

#define vector(elcount, type)  \
 __attribute__((vector_size((elcount)*sizeof(type)))) type

extern int p, q, v, r;
int main ()
{
  vector (4, int) i0 = {argc, 1,2,3};
  vector (4, int) i1 = {argc, 1, argc, 3};
  vector (4, int) imask = {0,3,2,1};
  vector (4, int) extmask = {p,q,r,v};
  i2 = __builtin_shuffle (i0, (p,q)? imask:extmask);
  return 0;
}

and the same failure would happen if __builtin_shuffle expression will
be in the following form:
i2 = __builtin_shuffle (i0, (p,q)? imask:extmask, i2);

All the rest -- agreed, and is fixed already.


Thanks,
Artem.

>> +/* Helper function to read arguments of builtins which are interfaces
>> +   for the middle-end nodes like COMPLEX_EXPR, VEC_SHUFLE_EXPR and
>
> Spelling of SHUFFLE.
>
>> +   others. The name of the builtin is passed using BNAME parameter.
>
> Two spaces after ".".
>
>> +   Function returns true if there were no errors while parsing and
>> +   stores the arguments in EXPR_LIST*/
>
> ".  " at end of comment.
>
>> +static bool
>> +c_parser_get_builtin_args (c_parser *  parser, const char *  bname,
>> +                        VEC(tree,gc) **  expr_list)
>
> No spaces after "*".
>
>> +  if (c_parser_next_token_is_not (parser, CPP_OPEN_PAREN))
>> +    {
>> +      error_at (loc, "cannot take address of %<%s%>", bname);
>
> %qs is a simpler form of %<%s%>.
>
>> @@ -6461,6 +6500,35 @@ c_parser_postfix_expression (c_parser *p
>
> Should also convert __builtin_choose_expr and __builtin_complex to use the
> new helper.
>
>> +         if (! c_parser_get_builtin_args (parser,
>
> No space after "!".
>
>> +           {
>> +             error_at (loc, "%<__builtin_shuffle%> wrong number of arguments");
>
> "wrong number of arguments to %<__builtin_shuffle%>".
>
> --
> Joseph S. Myers
> joseph@codesourcery.com
>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-09-02 16:09                         ` Artem Shinkarov
@ 2011-09-02 17:15                           ` Artem Shinkarov
  2011-09-02 19:52                           ` Joseph S. Myers
  1 sibling, 0 replies; 71+ messages in thread
From: Artem Shinkarov @ 2011-09-02 17:15 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: Richard Guenther, Duncan Sands, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3092 bytes --]

New version  of the patch with adjusted __builtin_complex and
__builtin_choose_expr.


Thanks,
Artem.

On Fri, Sep 2, 2011 at 5:08 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> On Fri, Sep 2, 2011 at 4:41 PM, Joseph S. Myers <joseph@codesourcery.com> wrote:
>> On Fri, 2 Sep 2011, Artem Shinkarov wrote:
>>
>>> +  /* Avoid C_MAYBE_CONST_EXPRs inside VEC_SHUFFLE_EXPR.  */
>>> +  tmp = c_fully_fold (v0, false, &maybe_const);
>>> +  v0 = save_expr (tmp);
>>> +  wrap &= maybe_const;
>>
>> I suppose you need this save_expr because of the two-argument case, but
>> shouldn't need it otherwise.
>>
>>> +  if (!two_arguments)
>>> +    {
>>> +      tmp = c_fully_fold (v1, false, &maybe_const);
>>> +      v1 = save_expr (tmp);
>>
>> And you shouldn't need this save_expr at all.
>>
>>> +  tmp = c_fully_fold (mask, false, &maybe_const);
>>> +  mask = save_expr (tmp);
>>
>> Or this one.
>
> Joseph, I don't understand this comment. I have 2 or 3 arguments in
> the VEC_SHUFFLE_EXPR and any of them can be C_MAYBE_CONST_EXPR, so I
> need to wrap mask (the last argument) to avoid the following failure:
>
> #define vector(elcount, type)  \
>  __attribute__((vector_size((elcount)*sizeof(type)))) type
>
> extern int p, q, v, r;
> int main ()
> {
>  vector (4, int) i0 = {argc, 1,2,3};
>  vector (4, int) i1 = {argc, 1, argc, 3};
>  vector (4, int) imask = {0,3,2,1};
>  vector (4, int) extmask = {p,q,r,v};
>  i2 = __builtin_shuffle (i0, (p,q)? imask:extmask);
>  return 0;
> }
>
> and the same failure would happen if __builtin_shuffle expression will
> be in the following form:
> i2 = __builtin_shuffle (i0, (p,q)? imask:extmask, i2);
>
> All the rest -- agreed, and is fixed already.
>
>
> Thanks,
> Artem.
>
>>> +/* Helper function to read arguments of builtins which are interfaces
>>> +   for the middle-end nodes like COMPLEX_EXPR, VEC_SHUFLE_EXPR and
>>
>> Spelling of SHUFFLE.
>>
>>> +   others. The name of the builtin is passed using BNAME parameter.
>>
>> Two spaces after ".".
>>
>>> +   Function returns true if there were no errors while parsing and
>>> +   stores the arguments in EXPR_LIST*/
>>
>> ".  " at end of comment.
>>
>>> +static bool
>>> +c_parser_get_builtin_args (c_parser *  parser, const char *  bname,
>>> +                        VEC(tree,gc) **  expr_list)
>>
>> No spaces after "*".
>>
>>> +  if (c_parser_next_token_is_not (parser, CPP_OPEN_PAREN))
>>> +    {
>>> +      error_at (loc, "cannot take address of %<%s%>", bname);
>>
>> %qs is a simpler form of %<%s%>.
>>
>>> @@ -6461,6 +6500,35 @@ c_parser_postfix_expression (c_parser *p
>>
>> Should also convert __builtin_choose_expr and __builtin_complex to use the
>> new helper.
>>
>>> +         if (! c_parser_get_builtin_args (parser,
>>
>> No space after "!".
>>
>>> +           {
>>> +             error_at (loc, "%<__builtin_shuffle%> wrong number of arguments");
>>
>> "wrong number of arguments to %<__builtin_shuffle%>".
>>
>> --
>> Joseph S. Myers
>> joseph@codesourcery.com
>>
>

[-- Attachment #2: vec-shuffle.v14.diff --]
[-- Type: text/plain, Size: 52941 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 178354)
+++ gcc/doc/extend.texi	(working copy)
@@ -6561,6 +6561,32 @@ invoke undefined behavior at runtime.  W
 accesses for vector subscription can be enabled with
 @option{-Warray-bounds}.
 
+Vector shuffling is available using functions 
+@code{__builtin_shuffle (vec, mask)} and 
+@code{__builtin_shuffle (vec0, vec1, mask)}. Both functions construct
+a permutation of elements from one or two vectors and return a vector
+of the same type as input vector(s). The mask is a vector of 
+integer-typed elements. The size of each element of the mask must be
+the same as the size of each input vector element. The number of 
+elements in input vector(s) and mask must be the same.
+
+The elements of the input vectors are numbered from left to right across
+one or both of the vectors. Each element in the mask specifies a number
+of element from the input vector(s). Consider the following example.
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{5,6,7,8@};
+v4si mask1 = @{0,1,1,3@};
+v4si mask2 = @{0,4,2,5@};
+v4si res;
+
+res = __builtin_shuffle (a, mask1);       /* res is @{1,2,2,4@}  */
+res = __builtin_shuffle2 (a, b, mask2);   /* res is @{1,5,3,6@}  */
+@end smallexample
+
 You can declare variables and use them in function calls and returns, as
 well as in assignments and some casts.  You can specify a vector type as
 a return type for a function.  Vector types can also be used as function
Index: gcc/tree-pretty-print.c
===================================================================
--- gcc/tree-pretty-print.c	(revision 178354)
+++ gcc/tree-pretty-print.c	(working copy)
@@ -2067,6 +2067,16 @@ dump_generic_node (pretty_printer *buffe
       dump_generic_node (buffer, TREE_OPERAND (node, 2), spc, flags, false);
       pp_string (buffer, " > ");
       break;
+    
+    case VEC_SHUFFLE_EXPR:
+      pp_string (buffer, " VEC_SHUFFLE_EXPR < ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
+      pp_string (buffer, " , ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
+      pp_string (buffer, " , ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 2), spc, flags, false);
+      pp_string (buffer, " > ");
+      break;
 
     case DOT_PROD_EXPR:
       pp_string (buffer, " DOT_PROD_EXPR < ");
Index: gcc/c-family/c-common.c
===================================================================
--- gcc/c-family/c-common.c	(revision 178354)
+++ gcc/c-family/c-common.c	(working copy)
@@ -425,6 +425,7 @@ const struct c_common_resword c_common_r
   { "__attribute__",	RID_ATTRIBUTE,	0 },
   { "__builtin_choose_expr", RID_CHOOSE_EXPR, D_CONLY },
   { "__builtin_complex", RID_BUILTIN_COMPLEX, D_CONLY },
+  { "__builtin_shuffle", RID_BUILTIN_SHUFFLE, D_CONLY },
   { "__builtin_offsetof", RID_OFFSETOF, 0 },
   { "__builtin_types_compatible_p", RID_TYPES_COMPATIBLE_P, D_CONLY },
   { "__builtin_va_arg",	RID_VA_ARG,	0 },
Index: gcc/c-family/c-common.h
===================================================================
--- gcc/c-family/c-common.h	(revision 178354)
+++ gcc/c-family/c-common.h	(working copy)
@@ -103,7 +103,7 @@ enum rid
   /* C extensions */
   RID_ASM,       RID_TYPEOF,   RID_ALIGNOF,  RID_ATTRIBUTE,  RID_VA_ARG,
   RID_EXTENSION, RID_IMAGPART, RID_REALPART, RID_LABEL,      RID_CHOOSE_EXPR,
-  RID_TYPES_COMPATIBLE_P,      RID_BUILTIN_COMPLEX,
+  RID_TYPES_COMPATIBLE_P,      RID_BUILTIN_COMPLEX,	     RID_BUILTIN_SHUFFLE,
   RID_DFLOAT32, RID_DFLOAT64, RID_DFLOAT128,
   RID_FRACT, RID_ACCUM,
 
@@ -898,6 +898,7 @@ extern tree build_function_call (locatio
 
 extern tree build_function_call_vec (location_t, tree,
     				     VEC(tree,gc) *, VEC(tree,gc) *);
+extern tree c_build_vec_shuffle_expr (location_t, tree, tree, tree);
 
 extern tree resolve_overloaded_builtin (location_t, tree, VEC(tree,gc) *);
 
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	(revision 178354)
+++ gcc/optabs.c	(working copy)
@@ -6620,6 +6620,82 @@ vector_compare_rtx (tree cond, bool unsi
   return gen_rtx_fmt_ee (rcode, VOIDmode, ops[0].value, ops[1].value);
 }
 
+/* Return true if VEC_SHUFF_EXPR can be expanded using SIMD extensions
+   of the CPU.  */
+bool
+expand_vec_shuffle_expr_p (enum machine_mode mode, tree v0, 
+			   tree v1, tree mask)
+{
+  int v0_mode_s = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (v0))));
+  int mask_mode_s = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (mask))));
+
+  if (TREE_CODE (mask) == VECTOR_CST
+      && targetm.vectorize.builtin_vec_perm_ok (TREE_TYPE (v0), mask))
+    return true;
+  
+  if (v0 != v1 || v0_mode_s != mask_mode_s)
+    return false;
+    
+  return direct_optab_handler (vshuffle_optab, mode) != CODE_FOR_nothing;
+}
+
+/* Generate instructions for VEC_COND_EXPR given its type and three
+   operands.  */
+rtx
+expand_vec_shuffle_expr (tree type, tree v0, tree v1, tree mask, rtx target)
+{
+  struct expand_operand ops[4];
+  enum insn_code icode;
+  enum machine_mode mode = TYPE_MODE (type);
+  rtx rtx_v0, rtx_mask;
+
+  gcc_assert (expand_vec_shuffle_expr_p (mode, v0, v1, mask));
+
+  if (TREE_CODE (mask) == VECTOR_CST)
+    {
+      tree m_type, call;
+      tree fn = targetm.vectorize.builtin_vec_perm (TREE_TYPE (v0), &m_type);
+      rtx t;
+
+      if (!fn)
+	goto vshuffle;
+
+      if (m_type != TREE_TYPE (TREE_TYPE (mask)))
+	{	
+	  int units = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask));
+	  tree cvt = build_vector_type (m_type, units);
+	  mask = fold_convert (cvt, mask);
+	}
+
+      fn = copy_node (fn);
+      call = fold_build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (fn)), fn);
+      call = build_call_nary (type /* ? */, call, 3, v0, v1, mask);
+
+      t = expand_normal (call);  
+      target = gen_reg_rtx (mode);
+      emit_insn (gen_rtx_SET (VOIDmode, target, t));
+      return target;
+    }
+
+vshuffle:
+  gcc_assert (v1 == v0);
+
+  icode = direct_optab_handler (vshuffle_optab, mode);
+
+  if (icode == CODE_FOR_nothing)
+    return 0;
+  
+  rtx_v0 = expand_normal (v0);
+  rtx_mask = expand_normal (mask);
+
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], rtx_v0, mode);
+  create_input_operand (&ops[2], rtx_mask, mode);
+  expand_insn (icode, 3, ops);
+  
+  return ops[0].value;
+}
+
 /* Return insn code for TYPE, the type of a VEC_COND_EXPR.  */
 
 static inline enum insn_code
Index: gcc/optabs.h
===================================================================
--- gcc/optabs.h	(revision 178354)
+++ gcc/optabs.h	(working copy)
@@ -636,6 +636,9 @@ enum direct_optab_index
   DOI_vcond,
   DOI_vcondu,
 
+  /* Vector shuffling.  */
+  DOI_vshuffle,
+
   /* Block move operation.  */
   DOI_movmem,
 
@@ -701,6 +704,7 @@ typedef struct direct_optab_d *direct_op
 #define reload_out_optab (&direct_optab_table[(int) DOI_reload_out])
 #define vcond_optab (&direct_optab_table[(int) DOI_vcond])
 #define vcondu_optab (&direct_optab_table[(int) DOI_vcondu])
+#define vshuffle_optab (&direct_optab_table[(int) DOI_vshuffle])
 #define movmem_optab (&direct_optab_table[(int) DOI_movmem])
 #define setmem_optab (&direct_optab_table[(int) DOI_setmem])
 #define cmpstr_optab (&direct_optab_table[(int) DOI_cmpstr])
@@ -879,8 +883,15 @@ extern rtx expand_widening_mult (enum ma
 /* Return tree if target supports vector operations for COND_EXPR.  */
 bool expand_vec_cond_expr_p (tree, enum machine_mode);
 
+/* Return tree if target supports vector operations for VEC_SHUFFLE_EXPR.  */
+bool expand_vec_shuffle_expr_p (enum machine_mode, tree, tree, tree);
+
 /* Generate code for VEC_COND_EXPR.  */
 extern rtx expand_vec_cond_expr (tree, tree, tree, tree, rtx);
+
+/* Generate code for VEC_SHUFFLE_EXPR.  */
+extern rtx expand_vec_shuffle_expr (tree, tree, tree, tree, rtx);
+
 /* Generate code for VEC_LSHIFT_EXPR and VEC_RSHIFT_EXPR.  */
 extern rtx expand_vec_shift_expr (sepops, rtx);
 
Index: gcc/genopinit.c
===================================================================
--- gcc/genopinit.c	(revision 178354)
+++ gcc/genopinit.c	(working copy)
@@ -255,6 +255,7 @@ static const char * const optabs[] =
   "set_optab_handler (vec_realign_load_optab, $A, CODE_FOR_$(vec_realign_load_$a$))",
   "set_direct_optab_handler (vcond_optab, $A, CODE_FOR_$(vcond$a$))",
   "set_direct_optab_handler (vcondu_optab, $A, CODE_FOR_$(vcondu$a$))",
+  "set_direct_optab_handler (vshuffle_optab, $A, CODE_FOR_$(vshuffle$a$))",
   "set_optab_handler (ssum_widen_optab, $A, CODE_FOR_$(widen_ssum$I$a3$))",
   "set_optab_handler (usum_widen_optab, $A, CODE_FOR_$(widen_usum$I$a3$))",
   "set_optab_handler (udot_prod_optab, $A, CODE_FOR_$(udot_prod$I$a$))",
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c	(revision 0)
@@ -0,0 +1,44 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shuf2compare(type, count, vres, v0, v1, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != ((vidx(type, mask, __i) < count) ? \
+                          vidx(type, v0, vidx(type, mask, __i)) :  \
+                          vidx(type, v1, (vidx(type, mask, __i) - count)))) \
+            __builtin_abort (); \
+        } \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+    vector (8, short) v0 = {5, 5,5,5,5,5,argc,7};
+    vector (8, short) v1 = {argc, 1,8,8,4,9,argc,4};
+    vector (8, short) v2;
+
+    //vector (8, short) mask = {1,2,5,4,3,6,7};
+    
+    vector (8, short) mask0 = {0,2,3,1,4,5,6,7};
+    vector (8, short) mask1 = {0,12,3,4,3,0,10,9};
+    
+    vector (8, short) mask2 = {0,8,1,9,2,10,3,11};
+
+    v2 = __builtin_shuffle (v0, v1,  mask0);
+    shuf2compare (short, 8, v2, v0, v1, mask0);
+ 
+    v2 = __builtin_shuffle (v0, v1,  mask1);
+    shuf2compare (short, 8, v2, v0, v1, mask1);
+    
+    v2 = __builtin_shuffle (v0, v1,  mask2);
+    shuf2compare (short, 8, v2, v0, v1, mask2);
+
+    v2 = __builtin_shuffle (mask0, mask0,  v0);
+    shuf2compare (short, 8, v2, mask0, mask0, v0);
+
+    return 0; 
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c	(revision 0)
@@ -0,0 +1,50 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shuf2compare(type, count, vres, v0, v1, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != ((vidx(type, mask, __i) < count) ? \
+                          vidx(type, v0, vidx(type, mask, __i)) :  \
+                          vidx(type, v1, (vidx(type, mask, __i) - count)))) \
+            __builtin_abort (); \
+        } \
+} while (0)
+
+
+vector (8, short) __attribute__ ((noinline))
+f (vector (8, short) x, vector (8, short) y, vector (8, short) mask) {
+    return __builtin_shuffle (x, y, mask);
+}
+
+
+
+int main (int argc, char *argv[]) {
+    vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+
+    //vector (8, short) mask = {1,2,5,4,3,6,7};
+    
+    vector (8, short) mask0 = {0,2,3,1,4,5,6,7};
+    vector (8, short) mask1 = {0,12,3,4,3,0,10,9};
+    vector (8, short) mask2 = {0,8,1,9,2,10,3,11};
+
+    v2 = f (v0, v1,  mask0);
+    shuf2compare (short, 8, v2, v0, v1, mask0);
+ 
+    v2 = f (v0, v1,  mask1);
+    shuf2compare (short, 8, v2, v0, v1, mask1);
+
+    v2 = f (v0, v1,  mask2);
+    shuf2compare (short, 8, v2, v0, v1, mask2);
+
+    v2 = f (mask0, mask0,  v0);
+    shuf2compare (short, 8, v2, mask0, mask0, v0);
+
+    return 0; 
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c	(revision 0)
@@ -0,0 +1,46 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shufcompare(type, count, vres, v0, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != vidx(type, v0, vidx(type, mask, __i))) \
+            __builtin_abort (); \
+    } \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+    /*vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+   
+    vector (8, short) smask = {0,0,1,2,3,4,5,6};
+    
+    v2 = __builtin_shuffle (v0,  smask);
+    shufcompare (short, 8, v2, v0, smask);
+    v2 = __builtin_shuffle (v0, v1);
+    shufcompare (short, 8, v2, v0, v1);
+    v2 = __builtin_shuffle (smask, v0);
+    shufcompare (short, 8, v2, smask, v0);*/
+
+    vector (4, int) i0 = {argc, 1,2,3};
+    vector (4, int) i1 = {argc, 1, argc, 3};
+    vector (4, int) i2;
+
+    vector (4, int) imask = {0,3,2,1};
+
+    /*i2 = __builtin_shuffle (i0, imask);
+    shufcompare (int, 4, i2, i0, imask);*/
+    i2 = __builtin_shuffle (i0, i1);
+    shufcompare (int, 4, i2, i0, i1);
+    
+    i2 = __builtin_shuffle (imask, i0);
+    shufcompare (int, 4, i2, imask, i0);
+    
+    return 0;
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c	(revision 0)
@@ -0,0 +1,36 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shufcompare(type, count, vres, v0, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != vidx(type, v0, vidx(type, mask, __i))) \
+            __builtin_abort (); \
+    } \
+} while (0)
+
+vector (8, short) __attribute__ ((noinline))
+f (vector (8, short) x, vector (8, short) mask) {
+    return __builtin_shuffle (x, mask);
+}
+
+
+int main (int argc, char *argv[]) {
+    vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+
+    vector (8, short) mask = {0,0,1,2,3,4,5,6};
+    
+    v2 = f (v0,  mask);
+    shufcompare (short, 8, v2, v0, mask);
+
+    v2 = f (v0, v1);
+    shufcompare (short, 8, v2, v0, v1);
+
+    return 0;
+}
+
Index: gcc/testsuite/gcc.dg/builtin-complex-err-1.c
===================================================================
--- gcc/testsuite/gcc.dg/builtin-complex-err-1.c	(revision 178354)
+++ gcc/testsuite/gcc.dg/builtin-complex-err-1.c	(working copy)
@@ -19,8 +19,8 @@ _Complex float fc3 = __builtin_complex (
 void
 f (void)
 {
-  __builtin_complex (0.0); /* { dg-error "expected" } */
-  __builtin_complex (0.0, 0.0, 0.0); /* { dg-error "expected" } */
+  __builtin_complex (0.0); /* { dg-error "wrong number of arguments" } */
+  __builtin_complex (0.0, 0.0, 0.0); /* { dg-error "wrong number of arguments" } */
 }
 
-void (*p) (void) = __builtin_complex; /* { dg-error "expected" } */
+void (*p) (void) = __builtin_complex; /* { dg-error "cannot take address" } */
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 178354)
+++ gcc/expr.c	(working copy)
@@ -8605,6 +8605,10 @@ expand_expr_real_2 (sepops ops, rtx targ
     case VEC_PACK_FIX_TRUNC_EXPR:
       mode = TYPE_MODE (TREE_TYPE (treeop0));
       goto binop;
+    
+    case VEC_SHUFFLE_EXPR:
+      target = expand_vec_shuffle_expr (type, treeop0, treeop1, treeop2, target);
+      return target;
 
     case DOT_PROD_EXPR:
       {
Index: gcc/gimple-pretty-print.c
===================================================================
--- gcc/gimple-pretty-print.c	(revision 178354)
+++ gcc/gimple-pretty-print.c	(working copy)
@@ -417,6 +417,16 @@ dump_ternary_rhs (pretty_printer *buffer
       dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
       pp_string (buffer, ">");
       break;
+    
+    case VEC_SHUFFLE_EXPR:
+      pp_string (buffer, "VEC_SHUFFLE_EXPR <");
+      dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
+      pp_string (buffer, ">");
+      break;
 
     case REALIGN_LOAD_EXPR:
       pp_string (buffer, "REALIGN_LOAD <");
Index: gcc/c-typeck.c
===================================================================
--- gcc/c-typeck.c	(revision 178354)
+++ gcc/c-typeck.c	(working copy)
@@ -2845,6 +2845,89 @@ build_function_call_vec (location_t loc,
     }
   return require_complete_type (result);
 }
+
+/* Build a VEC_SHUFFLE_EXPR if V0, V1 and MASK are not error_mark_nodes
+   and have vector types, V0 has the same type as V1, and the number of
+   elements of V0, V1, MASK is the same.  */
+tree
+c_build_vec_shuffle_expr (location_t loc, tree v0, tree v1, tree mask)
+{
+  tree vec_shuffle, tmp;
+  bool wrap = true;
+  bool maybe_const = false;
+  bool two_arguments = v0 == v1;
+
+
+  if (v0 == error_mark_node || v1 == error_mark_node 
+      || mask == error_mark_node)
+    return error_mark_node;
+
+  if (TREE_CODE (TREE_TYPE (mask)) != VECTOR_TYPE
+      || TREE_CODE (TREE_TYPE (TREE_TYPE (mask))) != INTEGER_TYPE)
+    {
+      error_at (loc, "__builtin_shuffle last argument must "
+		     "be an integer vector");
+      return error_mark_node;
+    }
+   
+  if (TREE_CODE (TREE_TYPE (v0)) != VECTOR_TYPE
+      || TREE_CODE (TREE_TYPE (v1)) != VECTOR_TYPE)
+    {
+      error_at (loc, "__builtin_shuffle arguments must be vectors");
+      return error_mark_node;
+    }
+
+  if (TREE_TYPE (v0) != TREE_TYPE (v1))
+    {
+      error_at (loc, "__builtin_shuffle argument vectors must be of "
+		     "the same type");
+      return error_mark_node;
+    }
+
+  if (TYPE_VECTOR_SUBPARTS (TREE_TYPE (v0)) 
+      != TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask))
+      && TYPE_VECTOR_SUBPARTS (TREE_TYPE (v1))
+	 != TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask)))
+    {
+      error_at (loc, "__builtin_shuffle number of elements of the "
+		     "argument vector(s) and the mask vector should "
+		     "be the same");
+      return error_mark_node;
+    }
+  
+  if (GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (v0)))) 
+      != GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (mask)))))
+    {
+      error_at (loc, "__builtin_shuffle argument vector(s) inner type "
+		     "must have the same size as inner type of the mask");
+      return error_mark_node;
+    }
+
+  /* Avoid C_MAYBE_CONST_EXPRs inside VEC_SHUFFLE_EXPR.  */
+  tmp = c_fully_fold (v0, false, &maybe_const);
+  v0 = save_expr (tmp);
+  wrap &= maybe_const;
+
+  if (!two_arguments)
+    {
+      tmp = c_fully_fold (v1, false, &maybe_const);
+      v1 = save_expr (tmp);
+      wrap &= maybe_const;
+    }
+  else
+    v1 = v0;
+  
+  tmp = c_fully_fold (mask, false, &maybe_const);
+  mask = save_expr (tmp);
+  wrap &= maybe_const;
+
+  vec_shuffle = build3 (VEC_SHUFFLE_EXPR, TREE_TYPE (v0), v0, v1, mask);
+
+  if (!wrap)
+    vec_shuffle = c_wrap_maybe_const (vec_shuffle, true);
+
+  return vec_shuffle;
+}
 \f
 /* Convert the argument expressions in the vector VALUES
    to the types in the list TYPELIST.
@@ -6120,7 +6203,14 @@ digest_init (location_t init_loc, tree t
 	  tree value;
 	  bool constant_p = true;
 
-	  /* Iterate through elements and check if all constructor
+	  /* If constructor has less elements than the vector type.  */
+          if (CONSTRUCTOR_NELTS (inside_init) 
+              < TYPE_VECTOR_SUBPARTS (TREE_TYPE (inside_init)))
+            warning_at (init_loc, 0, "vector length does not match "
+                                     "initializer length, zero elements "
+                                     "will be inserted");
+          
+          /* Iterate through elements and check if all constructor
 	     elements are *_CSTs.  */
 	  FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (inside_init), ix, value)
 	    if (!CONSTANT_CLASS_P (value))
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	(revision 178354)
+++ gcc/gimplify.c	(working copy)
@@ -7053,6 +7053,32 @@ gimplify_expr (tree *expr_p, gimple_seq
 	  }
 	  break;
 
+	case VEC_SHUFFLE_EXPR:
+	  {
+	    enum gimplify_status r0, r1, r2;
+
+	    if (TREE_OPERAND (*expr_p, 0) == TREE_OPERAND (*expr_p, 1))
+	      {
+		r0 = r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
+					 post_p, is_gimple_val, fb_rvalue);
+		TREE_OPERAND (*expr_p, 1) = TREE_OPERAND (*expr_p, 0);
+	      }
+	    else
+	      {
+		 r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
+				     post_p, is_gimple_val, fb_rvalue);
+		 r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
+				     post_p, is_gimple_val, fb_rvalue);
+	      }
+
+	    r2 = gimplify_expr (&TREE_OPERAND (*expr_p, 2), pre_p,
+				post_p, is_gimple_val, fb_rvalue);
+	    recalculate_side_effects (*expr_p);
+
+	    ret = MIN (r0, MIN (r1, r2));
+	    break;
+	  }
+
 	case TARGET_MEM_REF:
 	  {
 	    enum gimplify_status r0 = GS_ALL_DONE, r1 = GS_ALL_DONE;
Index: gcc/tree.def
===================================================================
--- gcc/tree.def	(revision 178354)
+++ gcc/tree.def	(working copy)
@@ -497,6 +497,19 @@ DEFTREECODE (COND_EXPR, "cond_expr", tcc
 */
 DEFTREECODE (VEC_COND_EXPR, "vec_cond_expr", tcc_expression, 3)
 
+/* Vector shuffle expression. A = VEC_SHUFFLE_EXPR<v0, v1, maks>
+   means
+
+   freach i in length (mask):
+     A = mask[i] < length (v0) ? v0[mask[i]] : v1[mask[i]]
+
+   V0 and V1 are vectors of the same type. MASK is an integer-typed
+   vector. The number of MASK elements must be the same with the
+   number of elements in V0 and V1. The size of the inner type
+   of the MASK and of the V0 and V1 must be the same.
+*/
+DEFTREECODE (VEC_SHUFFLE_EXPR, "vec_shuffle_expr", tcc_expression, 3)
+
 /* Declare local variables, including making RTL and allocating space.
    BIND_EXPR_VARS is a chain of VAR_DECL nodes for the variables.
    BIND_EXPR_BODY is the body, the expression to be computed using
Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c	(revision 178354)
+++ gcc/tree-inline.c	(working copy)
@@ -3285,6 +3285,7 @@ estimate_operator_cost (enum tree_code c
        ??? We may consider mapping RTL costs to this.  */
     case COND_EXPR:
     case VEC_COND_EXPR:
+    case VEC_SHUFFLE_EXPR:
 
     case PLUS_EXPR:
     case POINTER_PLUS_EXPR:
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	(revision 178354)
+++ gcc/tree-vect-generic.c	(working copy)
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "flags.h"
 #include "ggc.h"
+#include "diagnostic.h"
 
 /* Need to include rtl.h, expr.h, etc. for optabs.  */
 #include "expr.h"
@@ -432,6 +433,279 @@ type_for_widest_vector_mode (enum machin
     }
 }
 
+
+/* Build a reference to the element of the vector VECT. Function 
+   returns either the element itself, either BIT_FIELD_REF, or an 
+   ARRAY_REF expression.
+   
+   GSI is requred to insert temporary variables while building a
+   refernece to the element of the vector VECT.
+   
+   PTMPVEC is a pointer to the temporary variable for caching
+   purposes. In case when PTMPVEC is NULL new temporary variable
+   will be created.  */
+static tree
+vector_element (gimple_stmt_iterator *gsi, tree vect, tree idx, tree *ptmpvec)
+{
+  tree type;
+  gimple asgn; 
+  unsigned HOST_WIDE_INT maxval;
+  tree tmpvec; 
+  tree indextype, arraytype;
+  bool need_asgn = true;
+
+  gcc_assert (TREE_CODE (TREE_TYPE (vect)) == VECTOR_TYPE);
+
+  type = TREE_TYPE (vect);
+  if (TREE_CODE (idx) == INTEGER_CST)
+    {
+      unsigned HOST_WIDE_INT index;
+
+      if (!host_integerp (idx, 1)
+           || (index = tree_low_cst (idx, 1)) > TYPE_VECTOR_SUBPARTS (type)-1)
+        return error_mark_node;
+
+      if (TREE_CODE (vect) == VECTOR_CST)
+        {
+            unsigned i;
+            tree vals = TREE_VECTOR_CST_ELTS (vect);
+            for (i = 0; vals; vals = TREE_CHAIN (vals), ++i)
+              if (i == index)
+                 return TREE_VALUE (vals);
+            return error_mark_node;
+        }
+      else if (TREE_CODE (vect) == CONSTRUCTOR)
+        {
+          unsigned i;
+          VEC (constructor_elt, gc) *vals = CONSTRUCTOR_ELTS (vect);
+          constructor_elt *elt;
+
+          for (i = 0; VEC_iterate (constructor_elt, vals, i, elt); i++)
+            if (operand_equal_p (elt->index, idx, 0))
+              return elt->value; 
+          return fold_convert (TREE_TYPE (type), integer_zero_node);
+        }
+      else if (TREE_CODE (vect) == SSA_NAME)
+        {
+          tree el;
+          gimple vectdef = SSA_NAME_DEF_STMT (vect);
+          if (gimple_assign_single_p (vectdef)
+              && (el = vector_element (gsi, gimple_assign_rhs1 (vectdef), 
+                                       idx, ptmpvec)) 
+                 != error_mark_node)
+            return el;
+          else
+            {
+              tree size = TYPE_SIZE (TREE_TYPE (type));
+              tree pos = fold_build2 (MULT_EXPR, TREE_TYPE (idx), 
+                                      idx, size);
+              return fold_build3 (BIT_FIELD_REF, TREE_TYPE (type), 
+                             vect, size, pos);
+            }
+        }
+      else
+        return error_mark_node;
+    }
+  
+  if (!ptmpvec)
+    tmpvec = create_tmp_var (TREE_TYPE (vect), "vectmp");
+  else if (!*ptmpvec)
+    tmpvec = *ptmpvec = create_tmp_var (TREE_TYPE (vect), "vectmp");
+  else
+    {
+      tmpvec = *ptmpvec;
+      need_asgn = false;
+    }
+  
+  if (need_asgn)
+    {
+      TREE_ADDRESSABLE (tmpvec) = 1;
+      asgn = gimple_build_assign (tmpvec, vect);
+      gsi_insert_before (gsi, asgn, GSI_SAME_STMT);
+    }
+
+  maxval = TYPE_VECTOR_SUBPARTS (TREE_TYPE (vect)) -1;
+  indextype = build_index_type (size_int (maxval));
+  arraytype = build_array_type (TREE_TYPE (type), indextype);
+  
+  return build4 (ARRAY_REF, TREE_TYPE (type),
+                 build1 (VIEW_CONVERT_EXPR, arraytype, tmpvec),
+                 idx, NULL_TREE, NULL_TREE);
+
+
+}
+
+/* Check if VEC_SHUFFLE_EXPR within the given setting is supported
+   by hardware, or lower it piecewie.
+
+   When VEC_SHUFFLE_EXPR has the same first and second operands:
+   VEC_SHUFFLE_EXPR <v0, v0, mask> the lowered version would be 
+   {v0[mask[0]], v0[mask[1]], ...}
+   MASK and V0 must have the same number of elements.
+        
+   Otherwise VEC_SHUFFLE_EXPR <v0, v1, mask> is lowered to 
+   {mask[0] < len(v0) ? v0[mask[0]] : v1[mask[0]], ...}
+   V0 and V1 must have the same type. MASK, V0, V1 must have the
+   same number of arguments.  */
+static void
+lower_vec_shuffle (gimple_stmt_iterator *gsi, location_t loc)
+{
+#define TRAP_RETURN(new_stmt, stmt, gsi, vec0) \
+do { \
+  new_stmt = gimple_build_call (built_in_decls[BUILT_IN_TRAP], 0); \
+  gsi_insert_before (gsi, new_stmt,  GSI_SAME_STMT); \
+  split_block (gimple_bb (new_stmt), new_stmt); \
+  new_stmt = gimple_build_assign (gimple_call_lhs (stmt), vec0); \
+  gsi_replace (gsi, new_stmt, false); \
+  return; \
+} while (0) 
+ 
+  gimple stmt = gsi_stmt (*gsi);
+  tree mask = gimple_assign_rhs3 (stmt);
+  tree vec0 = gimple_assign_rhs1 (stmt);
+  tree vec1 = gimple_assign_rhs2 (stmt);
+  unsigned els = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask));
+  tree type0 = TREE_TYPE (TREE_TYPE (vec0));
+  VEC(constructor_elt,gc) *v = NULL;
+  tree vectype, constr;
+  gimple new_stmt;
+  tree vec0tmp = NULL_TREE, masktmp = NULL_TREE;
+
+  if (expand_vec_shuffle_expr_p (TYPE_MODE (TREE_TYPE (vec0)), vec0, vec1, mask))
+    {
+      tree t;
+
+      t = gimplify_build3 (gsi, VEC_SHUFFLE_EXPR, TREE_TYPE (vec0),
+			   vec0, vec1, mask);
+      gimple_assign_set_rhs_from_tree (gsi, t);
+      /* Statement should be updated by callee.  */
+      return;
+    }
+
+  
+  if (vec0 == vec1)
+    {
+      unsigned i;
+      tree vec0tmp = NULL_TREE;
+      
+      v = VEC_alloc (constructor_elt, gc, els);
+      for (i = 0; i < els; i++)
+        {
+          tree idxval, vecel, t;
+	   
+	  idxval = vector_element (gsi, mask, size_int (i), &masktmp);
+          if (idxval == error_mark_node)
+            {
+              warning_at (loc, 0, "Invalid shuffling mask index %i", i);
+              TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+            }
+
+	  vecel = vector_element (gsi, vec0, idxval, &vec0tmp);
+          if (vecel == error_mark_node)
+            {
+              warning_at (loc, 0, "Invalid shuffling arguments");
+              TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+            }
+          
+          t = force_gimple_operand_gsi (gsi, vecel, true, 
+					NULL_TREE, true, GSI_SAME_STMT);
+          CONSTRUCTOR_APPEND_ELT (v, size_int (i), t);
+        }
+    }
+  else
+    {
+      unsigned i;
+      tree var = create_tmp_var (type0, "vecel");
+      tree vec1tmp = NULL_TREE;
+
+      v = VEC_alloc (constructor_elt, gc, els);
+      for (i = 0; i < els; i++)
+        {
+          tree idxval, idx1val, cond, elval0, elval1, condexpr, t, ssatmp;
+          tree vec0el, vec1el;
+          gimple asgn;
+          
+          idxval = vector_element (gsi, mask, size_int (i), &masktmp);
+	  if (idxval == error_mark_node)
+            {
+              warning_at (loc, 0, "Invalid shuffling mask index %i", i);
+              TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+            }
+          
+          if (TREE_CODE (idxval) == INTEGER_CST)
+            {
+              if (tree_int_cst_lt (idxval, size_int (els)))
+                {
+                  vec0el = vector_element (gsi, vec0, idxval, &vec0tmp);
+                  t = force_gimple_operand_gsi (gsi, vec0el,
+                                    true, NULL_TREE, true, GSI_SAME_STMT);
+                }
+              else if (tree_int_cst_lt (idxval, size_int (2*els)))
+                {
+                  idx1val = fold_build2 (MINUS_EXPR, TREE_TYPE (idxval),
+                        idxval, build_int_cst (TREE_TYPE (idxval), els));
+                  
+                  vec1el = vector_element (gsi, vec1, idx1val, &vec1tmp);
+                  t = force_gimple_operand_gsi (gsi, vec1el, true, 
+						NULL_TREE, true, GSI_SAME_STMT);
+                }
+              else
+                {
+                  warning_at (loc, 0, "Invalid shuffling mask index %i", i);
+                  TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+                }
+            }
+          else
+            {
+
+              idx1val = fold_build2 (MINUS_EXPR, TREE_TYPE (idxval),
+                            idxval, build_int_cst (TREE_TYPE (idxval), els));
+              idx1val = force_gimple_operand_gsi (gsi, idx1val, 
+                                true, NULL_TREE, true, GSI_SAME_STMT);
+              cond = build2 (GT_EXPR, boolean_type_node, \
+                             idxval, convert (type0, size_int (els - 1)));
+              
+	      vec0el = vector_element (gsi, vec0, idxval, &vec0tmp);
+              if (vec0el == error_mark_node)
+                {
+                  warning_at (loc, 0, "Invalid shuffling arguments");
+                  TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+                }
+
+              elval0 = force_gimple_operand_gsi (gsi, vec0el, 
+                                true, NULL_TREE, true, GSI_SAME_STMT);
+	      
+	      vec1el = vector_element (gsi, vec1, idx1val, &vec1tmp);
+              if (vec1el == error_mark_node)
+                {
+                  warning_at (loc, 0, "Invalid shuffling arguments");
+                  TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+                }
+
+              elval1 = force_gimple_operand_gsi (gsi, vec1el,
+                                true, NULL_TREE, true, GSI_SAME_STMT);
+
+              condexpr = fold_build3 (COND_EXPR, type0, cond, \
+                                      elval1, elval0);
+
+              t = force_gimple_operand_gsi (gsi, condexpr, true, \
+                                        NULL_TREE, true, GSI_SAME_STMT);
+            }
+          
+          asgn = gimple_build_assign (var, t);
+          ssatmp = make_ssa_name (var, asgn);
+          gimple_assign_set_lhs (asgn, ssatmp);
+          gsi_insert_before (gsi, asgn, GSI_SAME_STMT);
+          CONSTRUCTOR_APPEND_ELT (v, size_int (i), ssatmp);
+        }
+    }
+  
+  vectype = build_vector_type (type0, els);
+  constr = build_constructor (vectype, v);
+  gimple_assign_set_rhs_from_tree (gsi, constr);
+  /* Statement should be updated by callee.  */
+}
+
 /* Process one statement.  If we identify a vector operation, expand it.  */
 
 static void
@@ -451,6 +725,13 @@ expand_vector_operations_1 (gimple_stmt_
   code = gimple_assign_rhs_code (stmt);
   rhs_class = get_gimple_rhs_class (code);
 
+  if (code == VEC_SHUFFLE_EXPR)
+    {
+      lower_vec_shuffle (gsi, gimple_location (stmt));
+      gimple_set_modified (gsi_stmt (*gsi), true);
+      update_stmt (gsi_stmt (*gsi));
+    }
+
   if (rhs_class != GIMPLE_UNARY_RHS && rhs_class != GIMPLE_BINARY_RHS)
     return;
 
@@ -612,10 +893,11 @@ expand_vector_operations_1 (gimple_stmt_
 /* Use this to lower vector operations introduced by the vectorizer,
    if it may need the bit-twiddling tricks implemented in this file.  */
 
+
 static bool
-gate_expand_vector_operations (void)
+gate_expand_vector_operations_noop (void)
 {
-  return flag_tree_vectorize != 0;
+  return optimize == 0;
 }
 
 static unsigned int
@@ -648,7 +930,7 @@ struct gimple_opt_pass pass_lower_vector
  {
   GIMPLE_PASS,
   "veclower",				/* name */
-  0,					/* gate */
+  gate_expand_vector_operations_noop,   /* gate */
   expand_vector_operations,		/* execute */
   NULL,					/* sub */
   NULL,					/* next */
@@ -660,7 +942,8 @@ struct gimple_opt_pass pass_lower_vector
   0,					/* todo_flags_start */
   TODO_update_ssa	                /* todo_flags_finish */
     | TODO_verify_ssa
-    | TODO_verify_stmts | TODO_verify_flow
+    | TODO_verify_stmts | TODO_verify_flow 
+    | TODO_cleanup_cfg
  }
 };
 
@@ -669,7 +952,7 @@ struct gimple_opt_pass pass_lower_vector
  {
   GIMPLE_PASS,
   "veclower2",				/* name */
-  gate_expand_vector_operations,	/* gate */
+  0,	                                /* gate */
   expand_vector_operations,		/* execute */
   NULL,					/* sub */
   NULL,					/* next */
@@ -682,6 +965,7 @@ struct gimple_opt_pass pass_lower_vector
   TODO_update_ssa	                /* todo_flags_finish */
     | TODO_verify_ssa
     | TODO_verify_stmts | TODO_verify_flow
+    | TODO_cleanup_cfg
  }
 };
 
Index: gcc/gimple.c
===================================================================
--- gcc/gimple.c	(revision 178354)
+++ gcc/gimple.c	(working copy)
@@ -2615,6 +2615,7 @@ get_gimple_rhs_num_ops (enum tree_code c
       || (SYM) == WIDEN_MULT_MINUS_EXPR					    \
       || (SYM) == DOT_PROD_EXPR						    \
       || (SYM) == REALIGN_LOAD_EXPR					    \
+      || (SYM) == VEC_SHUFFLE_EXPR					    \
       || (SYM) == FMA_EXPR) ? GIMPLE_TERNARY_RHS			    \
    : ((SYM) == COND_EXPR						    \
       || (SYM) == CONSTRUCTOR						    \
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	(revision 178354)
+++ gcc/tree-cfg.c	(working copy)
@@ -3713,6 +3713,7 @@ verify_gimple_assign_ternary (gimple stm
 
     case DOT_PROD_EXPR:
     case REALIGN_LOAD_EXPR:
+    case VEC_SHUFFLE_EXPR:
       /* FIXME.  */
       return false;
 
Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 178354)
+++ gcc/passes.c	(working copy)
@@ -1354,7 +1354,6 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_vectorize);
 	    {
 	      struct opt_pass **p = &pass_vectorize.pass.sub;
-	      NEXT_PASS (pass_lower_vector_ssa);
 	      NEXT_PASS (pass_dce_loop);
 	    }
           NEXT_PASS (pass_predcom);
@@ -1366,6 +1365,7 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_lim);
 	  NEXT_PASS (pass_tree_loop_done);
 	}
+      NEXT_PASS (pass_lower_vector_ssa);
       NEXT_PASS (pass_cse_reciprocals);
       NEXT_PASS (pass_reassoc);
       NEXT_PASS (pass_vrp);
Index: gcc/c-parser.c
===================================================================
--- gcc/c-parser.c	(revision 178354)
+++ gcc/c-parser.c	(working copy)
@@ -5989,6 +5989,41 @@ c_parser_alignof_expression (c_parser *p
     }
 }
 
+/* Helper function to read arguments of builtins which are interfaces
+   for the middle-end nodes like COMPLEX_EXPR, VEC_SHUFFLE_EXPR and
+   others.  The name of the builtin is passed using BNAME parameter.
+   Function returns true if there were no errors while parsing and
+   stores the arguments in EXPR_LIST.  */
+static bool
+c_parser_get_builtin_args (c_parser *parser, const char *bname, 
+			   VEC(tree,gc) **expr_list)
+{
+  location_t loc = c_parser_peek_token (parser)->location;
+  *expr_list = NULL;
+
+  if (c_parser_next_token_is_not (parser, CPP_OPEN_PAREN))
+    {
+      error_at (loc, "cannot take address of %qs", bname);
+      return false;
+    }
+
+  c_parser_consume_token (parser);
+
+  if (c_parser_next_token_is (parser, CPP_CLOSE_PAREN))
+    {
+      c_parser_consume_token (parser);
+      return true;
+    }
+    
+  *expr_list = c_parser_expr_list (parser, false, false, NULL);
+
+  if (!c_parser_require (parser, CPP_CLOSE_PAREN, "expected %<)%>"))
+    return false;
+
+  return true;
+}
+
+
 /* Parse a postfix expression (C90 6.3.1-6.3.2, C99 6.5.1-6.5.2).
 
    postfix-expression:
@@ -6027,6 +6062,10 @@ c_parser_alignof_expression (c_parser *p
 			     assignment-expression )
      __builtin_types_compatible_p ( type-name , type-name )
      __builtin_complex ( assignment-expression , assignment-expression )
+     __builtin_shuffle ( assignment-expression , assignment-expression )
+     __builtin_shuffle ( assignment-expression , 
+			 assignment-expression ,
+			 assignment-expression, )
 
    offsetof-member-designator:
      identifier
@@ -6047,7 +6086,7 @@ c_parser_alignof_expression (c_parser *p
 static struct c_expr
 c_parser_postfix_expression (c_parser *parser)
 {
-  struct c_expr expr, e1, e2, e3;
+  struct c_expr expr, e1;
   struct c_type_name *t1, *t2;
   location_t loc = c_parser_peek_token (parser)->location;;
   expr.original_code = ERROR_MARK;
@@ -6333,45 +6372,42 @@ c_parser_postfix_expression (c_parser *p
 	  }
 	  break;
 	case RID_CHOOSE_EXPR:
-	  c_parser_consume_token (parser);
-	  if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
-	    {
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  loc = c_parser_peek_token (parser)->location;
-	  e1 = c_parser_expr_no_commas (parser, NULL);
-	  if (!c_parser_require (parser, CPP_COMMA, "expected %<,%>"))
-	    {
-	      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  e2 = c_parser_expr_no_commas (parser, NULL);
-	  if (!c_parser_require (parser, CPP_COMMA, "expected %<,%>"))
-	    {
-	      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  e3 = c_parser_expr_no_commas (parser, NULL);
-	  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
-				     "expected %<)%>");
 	  {
-	    tree c;
+	    VEC(tree,gc) *expr_list;
+	    tree e1value, e2value, e3value, c;
 
-	    c = e1.value;
-	    mark_exp_read (e2.value);
-	    mark_exp_read (e3.value);
+	    c_parser_consume_token (parser);
+	    if (!c_parser_get_builtin_args (parser, 
+					    "__builtin_choose_expr", &expr_list))
+	      {
+		expr.value = error_mark_node;
+		break;
+	      }
+
+	    if (VEC_length (tree, expr_list) != 3)
+	      {
+		error_at (loc, "wrong number of arguments to "
+			       "%<__builtin_choose_expr%>");
+		expr.value = error_mark_node;
+		break;
+	      }
+	    
+	    e1value = VEC_index (tree, expr_list, 0);
+	    e2value = VEC_index (tree, expr_list, 1);
+	    e3value = VEC_index (tree, expr_list, 2);
+
+	    c = e1value;
+	    mark_exp_read (e2value);
+	    mark_exp_read (e3value);
 	    if (TREE_CODE (c) != INTEGER_CST
 		|| !INTEGRAL_TYPE_P (TREE_TYPE (c)))
 	      error_at (loc,
 			"first argument to %<__builtin_choose_expr%> not"
 			" a constant");
 	    constant_expression_warning (c);
-	    expr = integer_zerop (c) ? e3 : e2;
+	    expr.value = integer_zerop (c) ? e3value : e2value;
+	    break;
 	  }
-	  break;
 	case RID_TYPES_COMPATIBLE_P:
 	  c_parser_consume_token (parser);
 	  if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
@@ -6410,57 +6446,94 @@ c_parser_postfix_expression (c_parser *p
 	  }
 	  break;
 	case RID_BUILTIN_COMPLEX:
-	  c_parser_consume_token (parser);
-	  if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
-	    {
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  loc = c_parser_peek_token (parser)->location;
-	  e1 = c_parser_expr_no_commas (parser, NULL);
-	  if (!c_parser_require (parser, CPP_COMMA, "expected %<,%>"))
-	    {
-	      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  e2 = c_parser_expr_no_commas (parser, NULL);
-	  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
-				     "expected %<)%>");
-	  mark_exp_read (e1.value);
-	  if (TREE_CODE (e1.value) == EXCESS_PRECISION_EXPR)
-	    e1.value = convert (TREE_TYPE (e1.value),
-				TREE_OPERAND (e1.value, 0));
-	  mark_exp_read (e2.value);
-	  if (TREE_CODE (e2.value) == EXCESS_PRECISION_EXPR)
-	    e2.value = convert (TREE_TYPE (e2.value),
-				TREE_OPERAND (e2.value, 0));
-	  if (!SCALAR_FLOAT_TYPE_P (TREE_TYPE (e1.value))
-	      || DECIMAL_FLOAT_TYPE_P (TREE_TYPE (e1.value))
-	      || !SCALAR_FLOAT_TYPE_P (TREE_TYPE (e2.value))
-	      || DECIMAL_FLOAT_TYPE_P (TREE_TYPE (e2.value)))
-	    {
-	      error_at (loc, "%<__builtin_complex%> operand "
-			"not of real binary floating-point type");
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  if (TYPE_MAIN_VARIANT (TREE_TYPE (e1.value))
-	      != TYPE_MAIN_VARIANT (TREE_TYPE (e2.value)))
-	    {
-	      error_at (loc,
-			"%<__builtin_complex%> operands of different types");
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  if (!flag_isoc99)
-	    pedwarn (loc, OPT_pedantic,
-		     "ISO C90 does not support complex types");
-	  expr.value = build2 (COMPLEX_EXPR,
-			       build_complex_type (TYPE_MAIN_VARIANT
-						   (TREE_TYPE (e1.value))),
-			       e1.value, e2.value);
-	  break;
+	  { 
+	    VEC(tree,gc) *expr_list;
+	    tree e1value, e2value;
+	    
+	    c_parser_consume_token (parser);
+	    if (!c_parser_get_builtin_args (parser, 
+					    "__builtin_complex", &expr_list))
+	      {
+		expr.value = error_mark_node;
+		break;
+	      }
+
+	    if (VEC_length (tree, expr_list) != 2)
+	      {
+		error_at (loc, "wrong number of arguments to "
+			       "%<__builtin_complex%>");
+		expr.value = error_mark_node;
+		break;
+	      }
+	    
+	    e1value = VEC_index (tree, expr_list, 0);
+	    e2value = VEC_index (tree, expr_list, 1);
+
+	    mark_exp_read (e1value);
+	    if (TREE_CODE (e1value) == EXCESS_PRECISION_EXPR)
+	      e1value = convert (TREE_TYPE (e1value),
+				 TREE_OPERAND (e1value, 0));
+	    mark_exp_read (e2value);
+	    if (TREE_CODE (e2value) == EXCESS_PRECISION_EXPR)
+	      e2value = convert (TREE_TYPE (e2value),
+				 TREE_OPERAND (e2value, 0));
+	    if (!SCALAR_FLOAT_TYPE_P (TREE_TYPE (e1value))
+		|| DECIMAL_FLOAT_TYPE_P (TREE_TYPE (e1value))
+		|| !SCALAR_FLOAT_TYPE_P (TREE_TYPE (e2value))
+		|| DECIMAL_FLOAT_TYPE_P (TREE_TYPE (e2value)))
+	      {
+		error_at (loc, "%<__builtin_complex%> operand "
+			  "not of real binary floating-point type");
+		expr.value = error_mark_node;
+		break;
+	      }
+	    if (TYPE_MAIN_VARIANT (TREE_TYPE (e1value))
+		!= TYPE_MAIN_VARIANT (TREE_TYPE (e2value)))
+	      {
+		error_at (loc,
+			  "%<__builtin_complex%> operands of different types");
+		expr.value = error_mark_node;
+		break;
+	      }
+	    if (!flag_isoc99)
+	      pedwarn (loc, OPT_pedantic,
+		       "ISO C90 does not support complex types");
+	    expr.value = build2 (COMPLEX_EXPR,
+				 build_complex_type (TYPE_MAIN_VARIANT
+						     (TREE_TYPE (e1value))),
+				 e1value, e2value);
+	    break;
+	  }
+	case RID_BUILTIN_SHUFFLE:
+	  {
+	    VEC(tree,gc) *expr_list;
+	    
+	    c_parser_consume_token (parser);
+	    if (!c_parser_get_builtin_args (parser, 
+					    "__builtin_shuffle", &expr_list))
+	      {
+		expr.value = error_mark_node;
+		break;
+	      }
+
+	    if (VEC_length (tree, expr_list) == 2)
+	      expr.value = c_build_vec_shuffle_expr 
+				(loc, VEC_index (tree, expr_list, 0),
+				 VEC_index (tree, expr_list, 0),
+				 VEC_index (tree, expr_list, 1));
+	    else if (VEC_length (tree, expr_list) == 3)
+	      expr.value = c_build_vec_shuffle_expr 
+				(loc, VEC_index (tree, expr_list, 0),
+				 VEC_index (tree, expr_list, 1),
+				 VEC_index (tree, expr_list, 2));
+	    else
+	      {
+		error_at (loc, "wrong number of arguments to "
+			       "%<__builtin_shuffle%>");
+		expr.value = error_mark_node;
+	      }
+	    break;
+	  }
 	case RID_AT_SELECTOR:
 	  gcc_assert (c_dialect_objc ());
 	  c_parser_consume_token (parser);
Index: gcc/config/i386/sse.md
===================================================================
--- gcc/config/i386/sse.md	(revision 178354)
+++ gcc/config/i386/sse.md	(working copy)
@@ -231,6 +231,12 @@ (define_mode_attr sseinsnmode
    (V4SF "V4SF") (V2DF "V2DF")
    (TI "TI") (V32QI "OI") (V16HI "OI") (V8SI "OI") (V4DI "OI")])
 
+;; All 128bit vector modes
+(define_mode_attr sseshuffint
+  [(V16QI "V16QI") (V8HI "V8HI") 
+   (V4SI "V4SI")  (V2DI "V2DI")
+   (V4SF "V4SI") (V2DF "V2DI")])
+
 ;; Mapping of vector float modes to an integer mode of the same size
 (define_mode_attr sseintvecmode
   [(V8SF "V8SI") (V4DF "V4DI")
@@ -6234,6 +6240,18 @@ (define_expand "vconduv2di"
   DONE;
 })
 
+(define_expand "vshuffle<mode>"
+  [(match_operand:V_128 0 "register_operand" "")
+   (match_operand:V_128 1 "general_operand" "")
+   (match_operand:<sseshuffint> 2 "general_operand" "")]
+  "TARGET_SSE3 || TARGET_AVX"
+{
+  bool ok = ix86_expand_vshuffle (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel bitwise logical operations
Index: gcc/config/i386/i386-protos.h
===================================================================
--- gcc/config/i386/i386-protos.h	(revision 178354)
+++ gcc/config/i386/i386-protos.h	(working copy)
@@ -118,6 +118,7 @@ extern bool ix86_expand_int_movcc (rtx[]
 extern bool ix86_expand_fp_movcc (rtx[]);
 extern bool ix86_expand_fp_vcond (rtx[]);
 extern bool ix86_expand_int_vcond (rtx[]);
+extern bool ix86_expand_vshuffle (rtx[]);
 extern void ix86_expand_sse_unpack (rtx[], bool, bool);
 extern bool ix86_expand_int_addcc (rtx[]);
 extern rtx ix86_expand_call (rtx, rtx, rtx, rtx, rtx, bool);
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 178354)
+++ gcc/config/i386/i386.c	(working copy)
@@ -18693,6 +18693,96 @@ ix86_expand_int_vcond (rtx operands[])
   return true;
 }
 
+bool
+ix86_expand_vshuffle (rtx operands[])
+{
+  rtx target = operands[0];
+  rtx op0 = operands[1];
+  rtx mask = operands[2];
+  rtx mm, vt, cv0, t1;
+  enum machine_mode mode = GET_MODE (op0);
+  enum machine_mode maskmode = GET_MODE (mask);
+  enum machine_mode maskinner = GET_MODE_INNER (mode);
+  rtx vec[16];
+  int w, i, j;
+
+  gcc_assert ((TARGET_SSE3 || TARGET_AVX) && GET_MODE_BITSIZE (mode) == 128);
+
+  op0 = force_reg (mode, op0);
+  mask = force_reg (maskmode, mask);
+
+  /* Number of elements in the vector.  */
+  w = GET_MODE_BITSIZE (maskmode) / GET_MODE_BITSIZE (maskinner);
+ 
+  /* mask = mask & {w-1, w-1, w-1,...} */
+  for (i = 0; i < w; i++)
+    vec[i] = GEN_INT (w - 1);
+
+  mm = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (w, vec));
+  mm = force_reg (maskmode, mm);
+
+  mask = gen_rtx_AND (maskmode, mask, mm);
+  
+  /* Convert mask to vector of chars.  */
+  mask = simplify_gen_subreg (V16QImode, mask, maskmode, 0);
+  mask = force_reg (V16QImode, mask);
+
+
+  /* Build a helper mask wich we will use in pshufb
+     (v4si) --> {0,0,0,0, 4,4,4,4, 8,8,8,8, 12,12,12,12}
+     (v8hi) --> {0,0, 2,2, 4,4, 6,6, ...}
+     ...  */
+  for (i = 0; i < w; i++)
+    for (j = 0; j < 16/w; j++)
+      vec[i*w+j] = GEN_INT (i*16/w);
+
+  vt = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
+  vt = force_reg (V16QImode, vt);
+  
+  t1 = gen_reg_rtx (V16QImode);
+  emit_insn (gen_ssse3_pshufbv16qi3 (t1, mask, vt));
+  mm = t1;
+
+  /* MM contains now something like
+     mm = {m[0], .., m[0], m[k], .., m[k], ... }, where 
+     m[i] is an index of the element in the vector we are
+     selecting from.
+
+     Convert it into the byte positions by doing
+     mm = mm * {16/w, 16/w, ...}
+     mm = mm + {0,1,..,16/w, 0,1,..,16/w, ...}  */
+  for (i = 0; i < 16; i++)
+    vec[i] = GEN_INT (16/w);
+
+  cv0 = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
+  cv0 = force_reg (V16QImode, cv0);
+  mm = gen_rtx_MULT (V16QImode, mm, cv0);
+
+  for (i = 0; i < w; i++)
+    for (j = 0; j < 16/w; j++)
+      vec[i*w+j] = GEN_INT (j);
+
+  cv0 = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
+  cv0 = force_reg (V16QImode, cv0);
+  mm = gen_rtx_PLUS (V16QImode, mm, cv0);
+  mm = force_reg (V16QImode, mm);
+
+  t1 = gen_reg_rtx (V16QImode);
+  
+  /* Convert OP0 to vector of chars.  */
+  op0 = simplify_gen_subreg (V16QImode, op0, mode, 0);
+  op0 = force_reg (V16QImode, op0);
+  emit_insn (gen_ssse3_pshufbv16qi3 (t1, op0, mm));
+  
+  /* Convert it back from vector of chars to the original mode.  */
+  t1 = simplify_gen_subreg (mode, t1, V16QImode, 0);
+  
+  emit_insn (gen_rtx_SET (VOIDmode, target, t1));
+ 
+  fprintf (stderr, "-- %s called\n", __func__);
+  return true;
+}
+
 /* Unpack OP[1] into the next wider integer vector type.  UNSIGNED_P is
    true if we should do zero extension, else sign extension.  HIGH_P is
    true if we want the N/2 high elements, else the low elements.  */
@@ -30911,6 +31001,9 @@ struct expand_vec_perm_d
 
 static bool expand_vec_perm_1 (struct expand_vec_perm_d *d);
 static bool expand_vec_perm_broadcast_1 (struct expand_vec_perm_d *d);
+static int extract_vec_perm_cst (struct expand_vec_perm_d *, tree);
+static bool ix86_vectorize_builtin_vec_perm_ok (tree vec_type, tree mask);
+
 
 /* Get a vector mode of the same size as the original but with elements
    twice as wide.  This is only guaranteed to apply to integral vectors.  */
@@ -34576,10 +34669,10 @@ ix86_vectorize_builtin_vec_perm_ok (tree
 
   vec_mask = extract_vec_perm_cst (&d, mask);
 
-  /* This hook is cannot be called in response to something that the
-     user does (unlike the builtin expander) so we shouldn't ever see
-     an error generated from the extract.  */
-  gcc_assert (vec_mask > 0 && vec_mask <= 3);
+  /* Check whether the mask can be applied to the vector type.  */
+  if (vec_mask < 0 || vec_mask > 3)
+    return false;
+  
   one_vec = (vec_mask != 3);
 
   /* Implementable with shufps or pshufd.  */
Index: gcc/tree-ssa-operands.c
===================================================================
--- gcc/tree-ssa-operands.c	(revision 178354)
+++ gcc/tree-ssa-operands.c	(working copy)
@@ -943,6 +943,7 @@ get_expr_operands (gimple stmt, tree *ex
 
     case COND_EXPR:
     case VEC_COND_EXPR:
+    case VEC_SHUFFLE_EXPR:
       get_expr_operands (stmt, &TREE_OPERAND (expr, 0), uflags);
       get_expr_operands (stmt, &TREE_OPERAND (expr, 1), uflags);
       get_expr_operands (stmt, &TREE_OPERAND (expr, 2), uflags);

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-09-02 16:09                         ` Artem Shinkarov
  2011-09-02 17:15                           ` Artem Shinkarov
@ 2011-09-02 19:52                           ` Joseph S. Myers
  2011-09-03 15:53                             ` Artem Shinkarov
  1 sibling, 1 reply; 71+ messages in thread
From: Joseph S. Myers @ 2011-09-02 19:52 UTC (permalink / raw)
  To: Artem Shinkarov; +Cc: Richard Guenther, Duncan Sands, gcc-patches

On Fri, 2 Sep 2011, Artem Shinkarov wrote:

> Joseph, I don't understand this comment. I have 2 or 3 arguments in
> the VEC_SHUFFLE_EXPR and any of them can be C_MAYBE_CONST_EXPR,

Yes.

> so I
> need to wrap mask (the last argument) to avoid the following failure:

No.  You need to fold it (c_fully_fold) to eliminate any 
C_MAYBE_CONST_EXPR it contains, but you shouldn't need to wrap the result 
of folding in a SAVE_EXPR.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-09-02 19:52                           ` Joseph S. Myers
@ 2011-09-03 15:53                             ` Artem Shinkarov
  2011-09-06 15:40                               ` Richard Guenther
  2011-09-07 15:07                               ` Joseph S. Myers
  0 siblings, 2 replies; 71+ messages in thread
From: Artem Shinkarov @ 2011-09-03 15:53 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: Richard Guenther, Duncan Sands, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 661 bytes --]

On Fri, Sep 2, 2011 at 8:52 PM, Joseph S. Myers <joseph@codesourcery.com> wrote:
> On Fri, 2 Sep 2011, Artem Shinkarov wrote:
>
>> Joseph, I don't understand this comment. I have 2 or 3 arguments in
>> the VEC_SHUFFLE_EXPR and any of them can be C_MAYBE_CONST_EXPR,
>
> Yes.
>
>> so I
>> need to wrap mask (the last argument) to avoid the following failure:
>
> No.  You need to fold it (c_fully_fold) to eliminate any
> C_MAYBE_CONST_EXPR it contains, but you shouldn't need to wrap the result
> of folding in a SAVE_EXPR.

Ok, Now I get it, thanks.

In the attachment there is a new version of the patch that removes save-exprs.


Artem.

[-- Attachment #2: vec-shuffle.v15.diff --]
[-- Type: text/plain, Size: 52919 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 178354)
+++ gcc/doc/extend.texi	(working copy)
@@ -6561,6 +6561,32 @@ invoke undefined behavior at runtime.  W
 accesses for vector subscription can be enabled with
 @option{-Warray-bounds}.
 
+Vector shuffling is available using functions 
+@code{__builtin_shuffle (vec, mask)} and 
+@code{__builtin_shuffle (vec0, vec1, mask)}. Both functions construct
+a permutation of elements from one or two vectors and return a vector
+of the same type as input vector(s). The mask is a vector of 
+integer-typed elements. The size of each element of the mask must be
+the same as the size of each input vector element. The number of 
+elements in input vector(s) and mask must be the same.
+
+The elements of the input vectors are numbered from left to right across
+one or both of the vectors. Each element in the mask specifies a number
+of element from the input vector(s). Consider the following example.
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{5,6,7,8@};
+v4si mask1 = @{0,1,1,3@};
+v4si mask2 = @{0,4,2,5@};
+v4si res;
+
+res = __builtin_shuffle (a, mask1);       /* res is @{1,2,2,4@}  */
+res = __builtin_shuffle2 (a, b, mask2);   /* res is @{1,5,3,6@}  */
+@end smallexample
+
 You can declare variables and use them in function calls and returns, as
 well as in assignments and some casts.  You can specify a vector type as
 a return type for a function.  Vector types can also be used as function
Index: gcc/tree-pretty-print.c
===================================================================
--- gcc/tree-pretty-print.c	(revision 178354)
+++ gcc/tree-pretty-print.c	(working copy)
@@ -2067,6 +2067,16 @@ dump_generic_node (pretty_printer *buffe
       dump_generic_node (buffer, TREE_OPERAND (node, 2), s0, V1, MASK is the same.  */
tree
pc, flags, false);
       pp_string (buffer, " > ");
       break;
+    
+    case VEC_SHUFFLE_EXPR:
+      pp_string (buffer, " VEC_SHUFFLE_EXPR < ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
+      pp_string (buffer, " , ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
+      pp_string (buffer, " , ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 2), spc, flags, false);
+      pp_string (buffer, " > ");
+      break;
 
     case DOT_PROD_EXPR:
       pp_string (buffer, " DOT_PROD_EXPR < ");
Index: gcc/c-family/c-common.c
===================================================================
--- gcc/c-family/c-common.c	(revision 178354)
+++ gcc/c-family/c-common.c	(working copy)
@@ -425,6 +425,7 @@ const struct c_common_resword c_common_r
   { "__attribute__",	RID_ATTRIBUTE,	0 },
   { "__builtin_choose_expr", RID_CHOOSE_EXPR, D_CONLY },
   { "__builtin_complex", RID_BUILTIN_COMPLEX, D_CONLY },
+  { "__builtin_shuffle", RID_BUILTIN_SHUFFLE, D_CONLY },
   { "__builtin_offsetof", RID_OFFSETOF, 0 },
   { "__builtin_types_compatible_p", RID_TYPES_COMPATIBLE_P, D_CONLY },
   { "__builtin_va_arg",	RID_VA_ARG,	0 },
Index: gcc/c-family/c-common.h
===================================================================
--- gcc/c-family/c-common.h	(revision 178354)
+++ gcc/c-family/c-common.h	(working copy)
@@ -103,7 +103,7 @@ enum rid
   /* C extensions */
   RID_ASM,       RID_TYPEOF,   RID_ALIGNOF,  RID_ATTRIBUTE,  RID_VA_ARG,
   RID_EXTENSION, RID_IMAGPART, RID_REALPART, RID_LABEL,      RID_CHOOSE_EXPR,
-  RID_TYPES_COMPATIBLE_P,      RID_BUILTIN_COMPLEX,
+  RID_TYPES_COMPATIBLE_P,      RID_BUILTIN_COMPLEX,	     RID_BUILTIN_SHUFFLE,
   RID_DFLOAT32, RID_DFLOAT64, RID_DFLOAT128,
   RID_FRACT, RID_ACCUM,
 
@@ -898,6 +898,7 @@ extern tree build_function_call (locatio
 
 extern tree build_function_call_vec (location_t, tree,
     				     VEC(tree,gc) *, VEC(tree,gc) *);
+extern tree c_build_vec_shuffle_expr (location_t, tree, tree, tree);
 
 extern tree resolve_overloaded_builtin (location_t, tree, VEC(tree,gc) *);
 
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	(revision 178354)
+++ gcc/optabs.c	(working copy)
@@ -6620,6 +6620,82 @@ vector_compare_rtx (tree cond, bool unsi
   return gen_rtx_fmt_ee (rcode, VOIDmode, ops[0].value, ops[1].value);
 }
 
+/* Return true if VEC_SHUFF_EXPR can be expanded using SIMD extensions
+   of the CPU.  */
+bool
+expand_vec_shuffle_expr_p (enum machine_mode mode, tree v0, 
+			   tree v1, tree mask)
+{
+  int v0_mode_s = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (v0))));
+  int mask_mode_s = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (mask))));
+
+  if (TREE_CODE (mask) == VECTOR_CST
+      && targetm.vectorize.builtin_vec_perm_ok (TREE_TYPE (v0), mask))
+    return true;
+  
+  if (v0 != v1 || v0_mode_s != mask_mode_s)
+    return false;
+    
+  return direct_optab_handler (vshuffle_optab, mode) != CODE_FOR_nothing;
+}
+
+/* Generate instructions for VEC_COND_EXPR given its type and three
+   operands.  */
+rtx
+expand_vec_shuffle_expr (tree type, tree v0, tree v1, tree mask, rtx target)
+{
+  struct expand_operand ops[4];
+  enum insn_code icode;
+  enum machine_mode mode = TYPE_MODE (type);
+  rtx rtx_v0, rtx_mask;
+
+  gcc_assert (expand_vec_shuffle_expr_p (mode, v0, v1, mask));
+
+  if (TREE_CODE (mask) == VECTOR_CST)
+    {
+      tree m_type, call;
+      tree fn = targetm.vectorize.builtin_vec_perm (TREE_TYPE (v0), &m_type);
+      rtx t;
+
+      if (!fn)
+	goto vshuffle;
+
+      if (m_type != TREE_TYPE (TREE_TYPE (mask)))
+	{	
+	  int units = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask));
+	  tree cvt = build_vector_type (m_type, units);
+	  mask = fold_convert (cvt, mask);
+	}
+
+      fn = copy_node (fn);
+      call = fold_build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (fn)), fn);
+      call = build_call_nary (type /* ? */, call, 3, v0, v1, mask);
+
+      t = expand_normal (call);  
+      target = gen_reg_rtx (mode);
+      emit_insn (gen_rtx_SET (VOIDmode, target, t));
+      return target;
+    }
+
+vshuffle:
+  gcc_assert (v1 == v0);
+
+  icode = direct_optab_handler (vshuffle_optab, mode);
+
+  if (icode == CODE_FOR_nothing)
+    return 0;
+  
+  rtx_v0 = expand_normal (v0);
+  rtx_mask = expand_normal (mask);
+
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], rtx_v0, mode);
+  create_input_operand (&ops[2], rtx_mask, mode);
+  expand_insn (icode, 3, ops);
+  
+  return ops[0].value;
+}
+
 /* Return insn code for TYPE, the type of a VEC_COND_EXPR.  */
 
 static inline enum insn_code
Index: gcc/optabs.h
===================================================================
--- gcc/optabs.h	(revision 178354)
+++ gcc/optabs.h	(working copy)
@@ -636,6 +636,9 @@ enum direct_optab_index
   DOI_vcond,
   DOI_vcondu,
 
+  /* Vector shuffling.  */
+  DOI_vshuffle,
+
   /* Block move operation.  */
   DOI_movmem,
 
@@ -701,6 +704,7 @@ typedef struct direct_optab_d *direct_op
 #define reload_out_optab (&direct_optab_table[(int) DOI_reload_out])
 #define vcond_optab (&direct_optab_table[(int) DOI_vcond])
 #define vcondu_optab (&direct_optab_table[(int) DOI_vcondu])
+#define vshuffle_optab (&direct_optab_table[(int) DOI_vshuffle])
 #define movmem_optab (&direct_optab_table[(int) DOI_movmem])
 #define setmem_optab (&direct_optab_table[(int) DOI_setmem])
 #define cmpstr_optab (&direct_optab_table[(int) DOI_cmpstr])
@@ -879,8 +883,15 @@ extern rtx expand_widening_mult (enum ma
 /* Return tree if target supports vector operations for COND_EXPR.  */
 bool expand_vec_cond_expr_p (tree, enum machine_mode);
 
+/* Return tree if target supports vector operations for VEC_SHUFFLE_EXPR.  */
+bool expand_vec_shuffle_expr_p (enum machine_mode, tree, tree, tree);
+
 /* Generate code for VEC_COND_EXPR.  */
 extern rtx expand_vec_cond_expr (tree, tree, tree, tree, rtx);
+
+/* Generate code for VEC_SHUFFLE_EXPR.  */
+extern rtx expand_vec_shuffle_expr (tree, tree, tree, tree, rtx);
+
 /* Generate code for VEC_LSHIFT_EXPR and VEC_RSHIFT_EXPR.  */
 extern rtx expand_vec_shift_expr (sepops, rtx);
 
Index: gcc/genopinit.c
===================================================================
--- gcc/genopinit.c	(revision 178354)
+++ gcc/genopinit.c	(working copy)
@@ -255,6 +255,7 @@ static const char * const optabs[] =
   "set_optab_handler (vec_realign_load_optab, $A, CODE_FOR_$(vec_realign_load_$a$))",
   "set_direct_optab_handler (vcond_optab, $A, CODE_FOR_$(vcond$a$))",
   "set_direct_optab_handler (vcondu_optab, $A, CODE_FOR_$(vcondu$a$))",
+  "set_direct_optab_handler (vshuffle_optab, $A, CODE_FOR_$(vshuffle$a$))",
   "set_optab_handler (ssum_widen_optab, $A, CODE_FOR_$(widen_ssum$I$a3$))",
   "set_optab_handler (usum_widen_optab, $A, CODE_FOR_$(widen_usum$I$a3$))",
   "set_optab_handler (udot_prod_optab, $A, CODE_FOR_$(udot_prod$I$a$))",
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c	(revision 0)
@@ -0,0 +1,44 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shuf2compare(type, count, vres, v0, v1, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != ((vidx(type, mask, __i) < count) ? \
+                          vidx(type, v0, vidx(type, mask, __i)) :  \
+                          vidx(type, v1, (vidx(type, mask, __i) - count)))) \
+            __builtin_abort (); \
+        } \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+    vector (8, short) v0 = {5, 5,5,5,5,5,argc,7};
+    vector (8, short) v1 = {argc, 1,8,8,4,9,argc,4};
+    vector (8, short) v2;
+
+    //vector (8, short) mask = {1,2,5,4,3,6,7};
+    
+    vector (8, short) mask0 = {0,2,3,1,4,5,6,7};
+    vector (8, short) mask1 = {0,12,3,4,3,0,10,9};
+    
+    vector (8, short) mask2 = {0,8,1,9,2,10,3,11};
+
+    v2 = __builtin_shuffle (v0, v1,  mask0);
+    shuf2compare (short, 8, v2, v0, v1, mask0);
+ 
+    v2 = __builtin_shuffle (v0, v1,  mask1);
+    shuf2compare (short, 8, v2, v0, v1, mask1);
+    
+    v2 = __builtin_shuffle (v0, v1,  mask2);
+    shuf2compare (short, 8, v2, v0, v1, mask2);
+
+    v2 = __builtin_shuffle (mask0, mask0,  v0);
+    shuf2compare (short, 8, v2, mask0, mask0, v0);
+
+    return 0; 
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c	(revision 0)
@@ -0,0 +1,50 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shuf2compare(type, count, vres, v0, v1, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != ((vidx(type, mask, __i) < count) ? \
+                          vidx(type, v0, vidx(type, mask, __i)) :  \
+                          vidx(type, v1, (vidx(type, mask, __i) - count)))) \
+            __builtin_abort (); \
+        } \
+} while (0)
+
+
+vector (8, short) __attribute__ ((noinline))
+f (vector (8, short) x, vector (8, short) y, vector (8, short) mask) {
+    return __builtin_shuffle (x, y, mask);
+}
+
+
+
+int main (int argc, char *argv[]) {
+    vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+
+    //vector (8, short) mask = {1,2,5,4,3,6,7};
+    
+    vector (8, short) mask0 = {0,2,3,1,4,5,6,7};
+    vector (8, short) mask1 = {0,12,3,4,3,0,10,9};
+    vector (8, short) mask2 = {0,8,1,9,2,10,3,11};
+
+    v2 = f (v0, v1,  mask0);
+    shuf2compare (short, 8, v2, v0, v1, mask0);
+ 
+    v2 = f (v0, v1,  mask1);
+    shuf2compare (short, 8, v2, v0, v1, mask1);
+
+    v2 = f (v0, v1,  mask2);
+    shuf2compare (short, 8, v2, v0, v1, mask2);
+
+    v2 = f (mask0, mask0,  v0);
+    shuf2compare (short, 8, v2, mask0, mask0, v0);
+
+    return 0; 
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c	(revision 0)
@@ -0,0 +1,46 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shufcompare(type, count, vres, v0, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != vidx(type, v0, vidx(type, mask, __i))) \
+            __builtin_abort (); \
+    } \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+    /*vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+   
+    vector (8, short) smask = {0,0,1,2,3,4,5,6};
+    
+    v2 = __builtin_shuffle (v0,  smask);
+    shufcompare (short, 8, v2, v0, smask);
+    v2 = __builtin_shuffle (v0, v1);
+    shufcompare (short, 8, v2, v0, v1);
+    v2 = __builtin_shuffle (smask, v0);
+    shufcompare (short, 8, v2, smask, v0);*/
+
+    vector (4, int) i0 = {argc, 1,2,3};
+    vector (4, int) i1 = {argc, 1, argc, 3};
+    vector (4, int) i2;
+
+    vector (4, int) imask = {0,3,2,1};
+
+    /*i2 = __builtin_shuffle (i0, imask);
+    shufcompare (int, 4, i2, i0, imask);*/
+    i2 = __builtin_shuffle (i0, i1);
+    shufcompare (int, 4, i2, i0, i1);
+    
+    i2 = __builtin_shuffle (imask, i0);
+    shufcompare (int, 4, i2, imask, i0);
+    
+    return 0;
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c	(revision 0)
@@ -0,0 +1,36 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shufcompare(type, count, vres, v0, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != vidx(type, v0, vidx(type, mask, __i))) \
+            __builtin_abort (); \
+    } \
+} while (0)
+
+vector (8, short) __attribute__ ((noinline))
+f (vector (8, short) x, vector (8, short) mask) {
+    return __builtin_shuffle (x, mask);
+}
+
+
+int main (int argc, char *argv[]) {
+    vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+
+    vector (8, short) mask = {0,0,1,2,3,4,5,6};
+    
+    v2 = f (v0,  mask);
+    shufcompare (short, 8, v2, v0, mask);
+
+    v2 = f (v0, v1);
+    shufcompare (short, 8, v2, v0, v1);
+
+    return 0;
+}
+
Index: gcc/testsuite/gcc.dg/builtin-complex-err-1.c
===================================================================
--- gcc/testsuite/gcc.dg/builtin-complex-err-1.c	(revision 178354)
+++ gcc/testsuite/gcc.dg/builtin-complex-err-1.c	(working copy)
@@ -19,8 +19,8 @@ _Complex float fc3 = __builtin_complex (
 void
 f (void)
 {
-  __builtin_complex (0.0); /* { dg-error "expected" } */
-  __builtin_complex (0.0, 0.0, 0.0); /* { dg-error "expected" } */
+  __builtin_complex (0.0); /* { dg-error "wrong number of arguments" } */
+  __builtin_complex (0.0, 0.0, 0.0); /* { dg-error "wrong number of arguments" } */
 }
 
-void (*p) (void) = __builtin_complex; /* { dg-error "expected" } */
+void (*p) (void) = __builtin_complex; /* { dg-error "cannot take address" } */
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 178354)
+++ gcc/expr.c	(working copy)
@@ -8605,6 +8605,10 @@ expand_expr_real_2 (sepops ops, rtx targ
     case VEC_PACK_FIX_TRUNC_EXPR:
       mode = TYPE_MODE (TREE_TYPE (treeop0));
       goto binop;
+    
+    case VEC_SHUFFLE_EXPR:
+      target = expand_vec_shuffle_expr (type, treeop0, treeop1, treeop2, target);
+      return target;
 
     case DOT_PROD_EXPR:
       {
Index: gcc/gimple-pretty-print.c
===================================================================
--- gcc/gimple-pretty-print.c	(revision 178354)
+++ gcc/gimple-pretty-print.c	(working copy)
@@ -417,6 +417,16 @@ dump_ternary_rhs (pretty_printer *buffer
       dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
       pp_string (buffer, ">");
       break;
+    
+    case VEC_SHUFFLE_EXPR:
+      pp_string (buffer, "VEC_SHUFFLE_EXPR <");
+      dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
+      pp_string (buffer, ">");
+      break;
 
     case REALIGN_LOAD_EXPR:
       pp_string (buffer, "REALIGN_LOAD <");
Index: gcc/c-typeck.c
===================================================================
--- gcc/c-typeck.c	(revision 178354)
+++ gcc/c-typeck.c	(working copy)
@@ -2845,6 +2845,87 @@ build_function_call_vec (location_t loc,
     }
   return require_complete_type (result);
 }
+
+/* Build a VEC_SHUFFLE_EXPR if V0, V1 and MASK are not error_mark_nodes
+   and have vector types, V0 has the same type as V1, and the number of
+   elements of V0, V1, MASK is the same.  */
+tree
+c_build_vec_shuffle_expr (location_t loc, tree v0, tree v1, tree mask)
+{
+  tree vec_shuffle, tmp;
+  bool wrap = true;
+  bool maybe_const = false;
+  bool two_arguments = v0 == v1;
+
+
+  if (v0 == error_mark_node || v1 == error_mark_node 
+      || mask == error_mark_node)
+    return error_mark_node;
+
+  if (TREE_CODE (TREE_TYPE (mask)) != VECTOR_TYPE
+      || TREE_CODE (TREE_TYPE (TREE_TYPE (mask))) != INTEGER_TYPE)
+    {
+      error_at (loc, "__builtin_shuffle last argument must "
+		     "be an integer vector");
+      return error_mark_node;
+    }
+   
+  if (TREE_CODE (TREE_TYPE (v0)) != VECTOR_TYPE
+      || TREE_CODE (TREE_TYPE (v1)) != VECTOR_TYPE)
+    {
+      error_at (loc, "__builtin_shuffle arguments must be vectors");
+      return error_mark_node;
+    }
+
+  if (TREE_TYPE (v0) != TREE_TYPE (v1))
+    {
+      error_at (loc, "__builtin_shuffle argument vectors must be of "
+		     "the same type");
+      return error_mark_node;
+    }
+
+  if (TYPE_VECTOR_SUBPARTS (TREE_TYPE (v0)) 
+      != TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask))
+      && TYPE_VECTOR_SUBPARTS (TREE_TYPE (v1))
+	 != TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask)))
+    {
+      error_at (loc, "__builtin_shuffle number of elements of the "
+		     "argument vector(s) and the mask vector should "
+		     "be the same");
+      return error_mark_node;
+    }
+  
+  if (GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (v0)))) 
+      != GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (mask)))))
+    {
+      error_at (loc, "__builtin_shuffle argument vector(s) inner type "
+		     "must have the same size as inner type of the mask");
+      return error_mark_node;
+    }
+
+  /* Avoid C_MAYBE_CONST_EXPRs inside VEC_SHUFFLE_EXPR.  */
+  tmp = c_fully_fold (v0, false, &maybe_const);
+  v0 = save_expr (tmp);
+  wrap &= maybe_const;
+
+  if (!two_arguments)
+    {
+      v1 = c_fully_fold (v1, false, &maybe_const);
+      wrap &= maybe_const;
+    }
+  else
+    v1 = v0;
+  
+  mask = c_fully_fold (mask, false, &maybe_const);
+  wrap &= maybe_const;
+
+  vec_shuffle = build3 (VEC_SHUFFLE_EXPR, TREE_TYPE (v0), v0, v1, mask);
+
+  if (!wrap)
+    vec_shuffle = c_wrap_maybe_const (vec_shuffle, true);
+
+  return vec_shuffle;
+}
 \f
 /* Convert the argument expressions in the vector VALUES
    to the types in the list TYPELIST.
@@ -6120,7 +6201,14 @@ digest_init (location_t init_loc, tree t
 	  tree value;
 	  bool constant_p = true;
 
-	  /* Iterate through elements and check if all constructor
+	  /* If constructor has less elements than the vector type.  */
+          if (CONSTRUCTOR_NELTS (inside_init) 
+              < TYPE_VECTOR_SUBPARTS (TREE_TYPE (inside_init)))
+            warning_at (init_loc, 0, "vector length does not match "
+                                     "initializer length, zero elements "
+                                     "will be inserted");
+          
+          /* Iterate through elements and check if all constructor
 	     elements are *_CSTs.  */
 	  FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (inside_init), ix, value)
 	    if (!CONSTANT_CLASS_P (value))
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	(revision 178354)
+++ gcc/gimplify.c	(working copy)
@@ -7053,6 +7053,32 @@ gimplify_expr (tree *expr_p, gimple_seq
 	  }
 	  break;
 
+	case VEC_SHUFFLE_EXPR:
+	  {
+	    enum gimplify_status r0, r1, r2;
+
+	    if (TREE_OPERAND (*expr_p, 0) == TREE_OPERAND (*expr_p, 1))
+	      {
+		r0 = r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
+					 post_p, is_gimple_val, fb_rvalue);
+		TREE_OPERAND (*expr_p, 1) = TREE_OPERAND (*expr_p, 0);
+	      }
+	    else
+	      {
+		 r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
+				     post_p, is_gimple_val, fb_rvalue);
+		 r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
+				     post_p, is_gimple_val, fb_rvalue);
+	      }
+
+	    r2 = gimplify_expr (&TREE_OPERAND (*expr_p, 2), pre_p,
+				post_p, is_gimple_val, fb_rvalue);
+	    recalculate_side_effects (*expr_p);
+
+	    ret = MIN (r0, MIN (r1, r2));
+	    break;
+	  }
+
 	case TARGET_MEM_REF:
 	  {
 	    enum gimplify_status r0 = GS_ALL_DONE, r1 = GS_ALL_DONE;
Index: gcc/tree.def
===================================================================
--- gcc/tree.def	(revision 178354)
+++ gcc/tree.def	(working copy)
@@ -497,6 +497,19 @@ DEFTREECODE (COND_EXPR, "cond_expr", tcc
 */
 DEFTREECODE (VEC_COND_EXPR, "vec_cond_expr", tcc_expression, 3)
 
+/* Vector shuffle expression. A = VEC_SHUFFLE_EXPR<v0, v1, maks>
+   means
+
+   freach i in length (mask):
+     A = mask[i] < length (v0) ? v0[mask[i]] : v1[mask[i]]
+
+   V0 and V1 are vectors of the same type. MASK is an integer-typed
+   vector. The number of MASK elements must be the same with the
+   number of elements in V0 and V1. The size of the inner type
+   of the MASK and of the V0 and V1 must be the same.
+*/
+DEFTREECODE (VEC_SHUFFLE_EXPR, "vec_shuffle_expr", tcc_expression, 3)
+
 /* Declare local variables, including making RTL and allocating space.
    BIND_EXPR_VARS is a chain of VAR_DECL nodes for the variables.
    BIND_EXPR_BODY is the body, the expression to be computed using
Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c	(revision 178354)
+++ gcc/tree-inline.c	(working copy)
@@ -3285,6 +3285,7 @@ estimate_operator_cost (enum tree_code c
        ??? We may consider mapping RTL costs to this.  */
     case COND_EXPR:
     case VEC_COND_EXPR:
+    case VEC_SHUFFLE_EXPR:
 
     case PLUS_EXPR:
     case POINTER_PLUS_EXPR:
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	(revision 178354)
+++ gcc/tree-vect-generic.c	(working copy)
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "flags.h"
 #include "ggc.h"
+#include "diagnostic.h"
 
 /* Need to include rtl.h, expr.h, etc. for optabs.  */
 #include "expr.h"
@@ -432,6 +433,279 @@ type_for_widest_vector_mode (enum machin
     }
 }
 
+
+/* Build a reference to the element of the vector VECT. Function 
+   returns either the element itself, either BIT_FIELD_REF, or an 
+   ARRAY_REF expression.
+   
+   GSI is requred to insert temporary variables while building a
+   refernece to the element of the vector VECT.
+   
+   PTMPVEC is a pointer to the temporary variable for caching
+   purposes. In case when PTMPVEC is NULL new temporary variable
+   will be created.  */
+static tree
+vector_element (gimple_stmt_iterator *gsi, tree vect, tree idx, tree *ptmpvec)
+{
+  tree type;
+  gimple asgn; 
+  unsigned HOST_WIDE_INT maxval;
+  tree tmpvec; 
+  tree indextype, arraytype;
+  bool need_asgn = true;
+
+  gcc_assert (TREE_CODE (TREE_TYPE (vect)) == VECTOR_TYPE);
+
+  type = TREE_TYPE (vect);
+  if (TREE_CODE (idx) == INTEGER_CST)
+    {
+      unsigned HOST_WIDE_INT index;
+
+      if (!host_integerp (idx, 1)
+           || (index = tree_low_cst (idx, 1)) > TYPE_VECTOR_SUBPARTS (type)-1)
+        return error_mark_node;
+
+      if (TREE_CODE (vect) == VECTOR_CST)
+        {
+            unsigned i;
+            tree vals = TREE_VECTOR_CST_ELTS (vect);
+            for (i = 0; vals; vals = TREE_CHAIN (vals), ++i)
+              if (i == index)
+                 return TREE_VALUE (vals);
+            return error_mark_node;
+        }
+      else if (TREE_CODE (vect) == CONSTRUCTOR)
+        {
+          unsigned i;
+          VEC (constructor_elt, gc) *vals = CONSTRUCTOR_ELTS (vect);
+          constructor_elt *elt;
+
+          for (i = 0; VEC_iterate (constructor_elt, vals, i, elt); i++)
+            if (operand_equal_p (elt->index, idx, 0))
+              return elt->value; 
+          return fold_convert (TREE_TYPE (type), integer_zero_node);
+        }
+      else if (TREE_CODE (vect) == SSA_NAME)
+        {
+          tree el;
+          gimple vectdef = SSA_NAME_DEF_STMT (vect);
+          if (gimple_assign_single_p (vectdef)
+              && (el = vector_element (gsi, gimple_assign_rhs1 (vectdef), 
+                                       idx, ptmpvec)) 
+                 != error_mark_node)
+            return el;
+          else
+            {
+              tree size = TYPE_SIZE (TREE_TYPE (type));
+              tree pos = fold_build2 (MULT_EXPR, TREE_TYPE (idx), 
+                                      idx, size);
+              return fold_build3 (BIT_FIELD_REF, TREE_TYPE (type), 
+                             vect, size, pos);
+            }
+        }
+      else
+        return error_mark_node;
+    }
+  
+  if (!ptmpvec)
+    tmpvec = create_tmp_var (TREE_TYPE (vect), "vectmp");
+  else if (!*ptmpvec)
+    tmpvec = *ptmpvec = create_tmp_var (TREE_TYPE (vect), "vectmp");
+  else
+    {
+      tmpvec = *ptmpvec;
+      need_asgn = false;
+    }
+  
+  if (need_asgn)
+    {
+      TREE_ADDRESSABLE (tmpvec) = 1;
+      asgn = gimple_build_assign (tmpvec, vect);
+      gsi_insert_before (gsi, asgn, GSI_SAME_STMT);
+    }
+
+  maxval = TYPE_VECTOR_SUBPARTS (TREE_TYPE (vect)) -1;
+  indextype = build_index_type (size_int (maxval));
+  arraytype = build_array_type (TREE_TYPE (type), indextype);
+  
+  return build4 (ARRAY_REF, TREE_TYPE (type),
+                 build1 (VIEW_CONVERT_EXPR, arraytype, tmpvec),
+                 idx, NULL_TREE, NULL_TREE);
+
+
+}
+
+/* Check if VEC_SHUFFLE_EXPR within the given setting is supported
+   by hardware, or lower it piecewie.
+
+   When VEC_SHUFFLE_EXPR has the same first and second operands:
+   VEC_SHUFFLE_EXPR <v0, v0, mask> the lowered version would be 
+   {v0[mask[0]], v0[mask[1]], ...}
+   MASK and V0 must have the same number of elements.
+        
+   Otherwise VEC_SHUFFLE_EXPR <v0, v1, mask> is lowered to 
+   {mask[0] < len(v0) ? v0[mask[0]] : v1[mask[0]], ...}
+   V0 and V1 must have the same type. MASK, V0, V1 must have the
+   same number of arguments.  */
+static void
+lower_vec_shuffle (gimple_stmt_iterator *gsi, location_t loc)
+{
+#define TRAP_RETURN(new_stmt, stmt, gsi, vec0) \
+do { \
+  new_stmt = gimple_build_call (built_in_decls[BUILT_IN_TRAP], 0); \
+  gsi_insert_before (gsi, new_stmt,  GSI_SAME_STMT); \
+  split_block (gimple_bb (new_stmt), new_stmt); \
+  new_stmt = gimple_build_assign (gimple_call_lhs (stmt), vec0); \
+  gsi_replace (gsi, new_stmt, false); \
+  return; \
+} while (0) 
+ 
+  gimple stmt = gsi_stmt (*gsi);
+  tree mask = gimple_assign_rhs3 (stmt);
+  tree vec0 = gimple_assign_rhs1 (stmt);
+  tree vec1 = gimple_assign_rhs2 (stmt);
+  unsigned els = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask));
+  tree type0 = TREE_TYPE (TREE_TYPE (vec0));
+  VEC(constructor_elt,gc) *v = NULL;
+  tree vectype, constr;
+  gimple new_stmt;
+  tree vec0tmp = NULL_TREE, masktmp = NULL_TREE;
+
+  if (expand_vec_shuffle_expr_p (TYPE_MODE (TREE_TYPE (vec0)), vec0, vec1, mask))
+    {
+      tree t;
+
+      t = gimplify_build3 (gsi, VEC_SHUFFLE_EXPR, TREE_TYPE (vec0),
+			   vec0, vec1, mask);
+      gimple_assign_set_rhs_from_tree (gsi, t);
+      /* Statement should be updated by callee.  */
+      return;
+    }
+
+  
+  if (vec0 == vec1)
+    {
+      unsigned i;
+      tree vec0tmp = NULL_TREE;
+      
+      v = VEC_alloc (constructor_elt, gc, els);
+      for (i = 0; i < els; i++)
+        {
+          tree idxval, vecel, t;
+	   
+	  idxval = vector_element (gsi, mask, size_int (i), &masktmp);
+          if (idxval == error_mark_node)
+            {
+              warning_at (loc, 0, "Invalid shuffling mask index %i", i);
+              TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+            }
+
+	  vecel = vector_element (gsi, vec0, idxval, &vec0tmp);
+          if (vecel == error_mark_node)
+            {
+              warning_at (loc, 0, "Invalid shuffling arguments");
+              TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+            }
+          
+          t = force_gimple_operand_gsi (gsi, vecel, true, 
+					NULL_TREE, true, GSI_SAME_STMT);
+          CONSTRUCTOR_APPEND_ELT (v, size_int (i), t);
+        }
+    }
+  else
+    {
+      unsigned i;
+      tree var = create_tmp_var (type0, "vecel");
+      tree vec1tmp = NULL_TREE;
+
+      v = VEC_alloc (constructor_elt, gc, els);
+      for (i = 0; i < els; i++)
+        {
+          tree idxval, idx1val, cond, elval0, elval1, condexpr, t, ssatmp;
+          tree vec0el, vec1el;
+          gimple asgn;
+          
+          idxval = vector_element (gsi, mask, size_int (i), &masktmp);
+	  if (idxval == error_mark_node)
+            {
+              warning_at (loc, 0, "Invalid shuffling mask index %i", i);
+              TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+            }
+          
+          if (TREE_CODE (idxval) == INTEGER_CST)
+            {
+              if (tree_int_cst_lt (idxval, size_int (els)))
+                {
+                  vec0el = vector_element (gsi, vec0, idxval, &vec0tmp);
+                  t = force_gimple_operand_gsi (gsi, vec0el,
+                                    true, NULL_TREE, true, GSI_SAME_STMT);
+                }
+              else if (tree_int_cst_lt (idxval, size_int (2*els)))
+                {
+                  idx1val = fold_build2 (MINUS_EXPR, TREE_TYPE (idxval),
+                        idxval, build_int_cst (TREE_TYPE (idxval), els));
+                  
+                  vec1el = vector_element (gsi, vec1, idx1val, &vec1tmp);
+                  t = force_gimple_operand_gsi (gsi, vec1el, true, 
+						NULL_TREE, true, GSI_SAME_STMT);
+                }
+              else
+                {
+                  warning_at (loc, 0, "Invalid shuffling mask index %i", i);
+                  TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+                }
+            }
+          else
+            {
+
+              idx1val = fold_build2 (MINUS_EXPR, TREE_TYPE (idxval),
+                            idxval, build_int_cst (TREE_TYPE (idxval), els));
+              idx1val = force_gimple_operand_gsi (gsi, idx1val, 
+                                true, NULL_TREE, true, GSI_SAME_STMT);
+              cond = build2 (GT_EXPR, boolean_type_node, \
+                             idxval, convert (type0, size_int (els - 1)));
+              
+	      vec0el = vector_element (gsi, vec0, idxval, &vec0tmp);
+              if (vec0el == error_mark_node)
+                {
+                  warning_at (loc, 0, "Invalid shuffling arguments");
+                  TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+                }
+
+              elval0 = force_gimple_operand_gsi (gsi, vec0el, 
+                                true, NULL_TREE, true, GSI_SAME_STMT);
+	      
+	      vec1el = vector_element (gsi, vec1, idx1val, &vec1tmp);
+              if (vec1el == error_mark_node)
+                {
+                  warning_at (loc, 0, "Invalid shuffling arguments");
+                  TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+                }
+
+              elval1 = force_gimple_operand_gsi (gsi, vec1el,
+                                true, NULL_TREE, true, GSI_SAME_STMT);
+
+              condexpr = fold_build3 (COND_EXPR, type0, cond, \
+                                      elval1, elval0);
+
+              t = force_gimple_operand_gsi (gsi, condexpr, true, \
+                                        NULL_TREE, true, GSI_SAME_STMT);
+            }
+          
+          asgn = gimple_build_assign (var, t);
+          ssatmp = make_ssa_name (var, asgn);
+          gimple_assign_set_lhs (asgn, ssatmp);
+          gsi_insert_before (gsi, asgn, GSI_SAME_STMT);
+          CONSTRUCTOR_APPEND_ELT (v, size_int (i), ssatmp);
+        }
+    }
+  
+  vectype = build_vector_type (type0, els);
+  constr = build_constructor (vectype, v);
+  gimple_assign_set_rhs_from_tree (gsi, constr);
+  /* Statement should be updated by callee.  */
+}
+
 /* Process one statement.  If we identify a vector operation, expand it.  */
 
 static void
@@ -451,6 +725,13 @@ expand_vector_operations_1 (gimple_stmt_
   code = gimple_assign_rhs_code (stmt);
   rhs_class = get_gimple_rhs_class (code);
 
+  if (code == VEC_SHUFFLE_EXPR)
+    {
+      lower_vec_shuffle (gsi, gimple_location (stmt));
+      gimple_set_modified (gsi_stmt (*gsi), true);
+      update_stmt (gsi_stmt (*gsi));
+    }
+
   if (rhs_class != GIMPLE_UNARY_RHS && rhs_class != GIMPLE_BINARY_RHS)
     return;
 
@@ -612,10 +893,11 @@ expand_vector_operations_1 (gimple_stmt_
 /* Use this to lower vector operations introduced by the vectorizer,
    if it may need the bit-twiddling tricks implemented in this file.  */
 
+
 static bool
-gate_expand_vector_operations (void)
+gate_expand_vector_operations_noop (void)
 {
-  return flag_tree_vectorize != 0;
+  return optimize == 0;
 }
 
 static unsigned int
@@ -648,7 +930,7 @@ struct gimple_opt_pass pass_lower_vector
  {
   GIMPLE_PASS,
   "veclower",				/* name */
-  0,					/* gate */
+  gate_expand_vector_operations_noop,   /* gate */
   expand_vector_operations,		/* execute */
   NULL,					/* sub */
   NULL,					/* next */
@@ -660,7 +942,8 @@ struct gimple_opt_pass pass_lower_vector
   0,					/* todo_flags_start */
   TODO_update_ssa	                /* todo_flags_finish */
     | TODO_verify_ssa
-    | TODO_verify_stmts | TODO_verify_flow
+    | TODO_verify_stmts | TODO_verify_flow 
+    | TODO_cleanup_cfg
  }
 };
 
@@ -669,7 +952,7 @@ struct gimple_opt_pass pass_lower_vector
  {
   GIMPLE_PASS,
   "veclower2",				/* name */
-  gate_expand_vector_operations,	/* gate */
+  0,	                                /* gate */
   expand_vector_operations,		/* execute */
   NULL,					/* sub */
   NULL,					/* next */
@@ -682,6 +965,7 @@ struct gimple_opt_pass pass_lower_vector
   TODO_update_ssa	                /* todo_flags_finish */
     | TODO_verify_ssa
     | TODO_verify_stmts | TODO_verify_flow
+    | TODO_cleanup_cfg
  }
 };
 
Index: gcc/gimple.c
===================================================================
--- gcc/gimple.c	(revision 178354)
+++ gcc/gimple.c	(working copy)
@@ -2615,6 +2615,7 @@ get_gimple_rhs_num_ops (enum tree_code c
       || (SYM) == WIDEN_MULT_MINUS_EXPR					    \
       || (SYM) == DOT_PROD_EXPR						    \
       || (SYM) == REALIGN_LOAD_EXPR					    \
+      || (SYM) == VEC_SHUFFLE_EXPR					    \
       || (SYM) == FMA_EXPR) ? GIMPLE_TERNARY_RHS			    \
    : ((SYM) == COND_EXPR						    \
       || (SYM) == CONSTRUCTOR						    \
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	(revision 178354)
+++ gcc/tree-cfg.c	(working copy)
@@ -3713,6 +3713,7 @@ verify_gimple_assign_ternary (gimple stm
 
     case DOT_PROD_EXPR:
     case REALIGN_LOAD_EXPR:
+    case VEC_SHUFFLE_EXPR:
       /* FIXME.  */
       return false;
 
Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 178354)
+++ gcc/passes.c	(working copy)
@@ -1354,7 +1354,6 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_vectorize);
 	    {
 	      struct opt_pass **p = &pass_vectorize.pass.sub;
-	      NEXT_PASS (pass_lower_vector_ssa);
 	      NEXT_PASS (pass_dce_loop);
 	    }
           NEXT_PASS (pass_predcom);
@@ -1366,6 +1365,7 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_lim);
 	  NEXT_PASS (pass_tree_loop_done);
 	}
+      NEXT_PASS (pass_lower_vector_ssa);
       NEXT_PASS (pass_cse_reciprocals);
       NEXT_PASS (pass_reassoc);
       NEXT_PASS (pass_vrp);
Index: gcc/c-parser.c
===================================================================
--- gcc/c-parser.c	(revision 178354)
+++ gcc/c-parser.c	(working copy)
@@ -5989,6 +5989,41 @@ c_parser_alignof_expression (c_parser *p
     }
 }
 
+/* Helper function to read arguments of builtins which are interfaces
+   for the middle-end nodes like COMPLEX_EXPR, VEC_SHUFFLE_EXPR and
+   others.  The name of the builtin is passed using BNAME parameter.
+   Function returns true if there were no errors while parsing and
+   stores the arguments in EXPR_LIST.  */
+static bool
+c_parser_get_builtin_args (c_parser *parser, const char *bname, 
+			   VEC(tree,gc) **expr_list)
+{
+  location_t loc = c_parser_peek_token (parser)->location;
+  *expr_list = NULL;
+
+  if (c_parser_next_token_is_not (parser, CPP_OPEN_PAREN))
+    {
+      error_at (loc, "cannot take address of %qs", bname);
+      return false;
+    }
+
+  c_parser_consume_token (parser);
+
+  if (c_parser_next_token_is (parser, CPP_CLOSE_PAREN))
+    {
+      c_parser_consume_token (parser);
+      return true;
+    }
+    
+  *expr_list = c_parser_expr_list (parser, false, false, NULL);
+
+  if (!c_parser_require (parser, CPP_CLOSE_PAREN, "expected %<)%>"))
+    return false;
+
+  return true;
+}
+
+
 /* Parse a postfix expression (C90 6.3.1-6.3.2, C99 6.5.1-6.5.2).
 
    postfix-expression:
@@ -6027,6 +6062,10 @@ c_parser_alignof_expression (c_parser *p
 			     assignment-expression )
      __builtin_types_compatible_p ( type-name , type-name )
      __builtin_complex ( assignment-expression , assignment-expression )
+     __builtin_shuffle ( assignment-expression , assignment-expression )
+     __builtin_shuffle ( assignment-expression , 
+			 assignment-expression ,
+			 assignment-expression, )
 
    offsetof-member-designator:
      identifier
@@ -6047,7 +6086,7 @@ c_parser_alignof_expression (c_parser *p
 static struct c_expr
 c_parser_postfix_expression (c_parser *parser)
 {
-  struct c_expr expr, e1, e2, e3;
+  struct c_expr expr, e1;
   struct c_type_name *t1, *t2;
   location_t loc = c_parser_peek_token (parser)->location;;
   expr.original_code = ERROR_MARK;
@@ -6333,45 +6372,42 @@ c_parser_postfix_expression (c_parser *p
 	  }
 	  break;
 	case RID_CHOOSE_EXPR:
-	  c_parser_consume_token (parser);
-	  if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
-	    {
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  loc = c_parser_peek_token (parser)->location;
-	  e1 = c_parser_expr_no_commas (parser, NULL);
-	  if (!c_parser_require (parser, CPP_COMMA, "expected %<,%>"))
-	    {
-	      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  e2 = c_parser_expr_no_commas (parser, NULL);
-	  if (!c_parser_require (parser, CPP_COMMA, "expected %<,%>"))
-	    {
-	      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  e3 = c_parser_expr_no_commas (parser, NULL);
-	  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
-				     "expected %<)%>");
 	  {
-	    tree c;
+	    VEC(tree,gc) *expr_list;
+	    tree e1value, e2value, e3value, c;
 
-	    c = e1.value;
-	    mark_exp_read (e2.value);
-	    mark_exp_read (e3.value);
+	    c_parser_consume_token (parser);
+	    if (!c_parser_get_builtin_args (parser, 
+					    "__builtin_choose_expr", &expr_list))
+	      {
+		expr.value = error_mark_node;
+		break;
+	      }
+
+	    if (VEC_length (tree, expr_list) != 3)
+	      {
+		error_at (loc, "wrong number of arguments to "
+			       "%<__builtin_choose_expr%>");
+		expr.value = error_mark_node;
+		break;
+	      }
+	    
+	    e1value = VEC_index (tree, expr_list, 0);
+	    e2value = VEC_index (tree, expr_list, 1);
+	    e3value = VEC_index (tree, expr_list, 2);
+
+	    c = e1value;
+	    mark_exp_read (e2value);
+	    mark_exp_read (e3value);
 	    if (TREE_CODE (c) != INTEGER_CST
 		|| !INTEGRAL_TYPE_P (TREE_TYPE (c)))
 	      error_at (loc,
 			"first argument to %<__builtin_choose_expr%> not"
 			" a constant");
 	    constant_expression_warning (c);
-	    expr = integer_zerop (c) ? e3 : e2;
+	    expr.value = integer_zerop (c) ? e3value : e2value;
+	    break;
 	  }
-	  break;
 	case RID_TYPES_COMPATIBLE_P:
 	  c_parser_consume_token (parser);
 	  if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
@@ -6410,57 +6446,94 @@ c_parser_postfix_expression (c_parser *p
 	  }
 	  break;
 	case RID_BUILTIN_COMPLEX:
-	  c_parser_consume_token (parser);
-	  if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
-	    {
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  loc = c_parser_peek_token (parser)->location;
-	  e1 = c_parser_expr_no_commas (parser, NULL);
-	  if (!c_parser_require (parser, CPP_COMMA, "expected %<,%>"))
-	    {
-	      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  e2 = c_parser_expr_no_commas (parser, NULL);
-	  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
-				     "expected %<)%>");
-	  mark_exp_read (e1.value);
-	  if (TREE_CODE (e1.value) == EXCESS_PRECISION_EXPR)
-	    e1.value = convert (TREE_TYPE (e1.value),
-				TREE_OPERAND (e1.value, 0));
-	  mark_exp_read (e2.value);
-	  if (TREE_CODE (e2.value) == EXCESS_PRECISION_EXPR)
-	    e2.value = convert (TREE_TYPE (e2.value),
-				TREE_OPERAND (e2.value, 0));
-	  if (!SCALAR_FLOAT_TYPE_P (TREE_TYPE (e1.value))
-	      || DECIMAL_FLOAT_TYPE_P (TREE_TYPE (e1.value))
-	      || !SCALAR_FLOAT_TYPE_P (TREE_TYPE (e2.value))
-	      || DECIMAL_FLOAT_TYPE_P (TREE_TYPE (e2.value)))
-	    {
-	      error_at (loc, "%<__builtin_complex%> operand "
-			"not of real binary floating-point type");
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  if (TYPE_MAIN_VARIANT (TREE_TYPE (e1.value))
-	      != TYPE_MAIN_VARIANT (TREE_TYPE (e2.value)))
-	    {
-	      error_at (loc,
-			"%<__builtin_complex%> operands of different types");
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  if (!flag_isoc99)
-	    pedwarn (loc, OPT_pedantic,
-		     "ISO C90 does not support complex types");
-	  expr.value = build2 (COMPLEX_EXPR,
-			       build_complex_type (TYPE_MAIN_VARIANT
-						   (TREE_TYPE (e1.value))),
-			       e1.value, e2.value);
-	  break;
+	  { 
+	    VEC(tree,gc) *expr_list;
+	    tree e1value, e2value;
+	    
+	    c_parser_consume_token (parser);
+	    if (!c_parser_get_builtin_args (parser, 
+					    "__builtin_complex", &expr_list))
+	      {
+		expr.value = error_mark_node;
+		break;
+	      }
+
+	    if (VEC_length (tree, expr_list) != 2)
+	      {
+		error_at (loc, "wrong number of arguments to "
+			       "%<__builtin_complex%>");
+		expr.value = error_mark_node;
+		break;
+	      }
+	    
+	    e1value = VEC_index (tree, expr_list, 0);
+	    e2value = VEC_index (tree, expr_list, 1);
+
+	    mark_exp_read (e1value);
+	    if (TREE_CODE (e1value) == EXCESS_PRECISION_EXPR)
+	      e1value = convert (TREE_TYPE (e1value),
+				 TREE_OPERAND (e1value, 0));
+	    mark_exp_read (e2value);
+	    if (TREE_CODE (e2value) == EXCESS_PRECISION_EXPR)
+	      e2value = convert (TREE_TYPE (e2value),
+				 TREE_OPERAND (e2value, 0));
+	    if (!SCALAR_FLOAT_TYPE_P (TREE_TYPE (e1value))
+		|| DECIMAL_FLOAT_TYPE_P (TREE_TYPE (e1value))
+		|| !SCALAR_FLOAT_TYPE_P (TREE_TYPE (e2value))
+		|| DECIMAL_FLOAT_TYPE_P (TREE_TYPE (e2value)))
+	      {
+		error_at (loc, "%<__builtin_complex%> operand "
+			  "not of real binary floating-point type");
+		expr.value = error_mark_node;
+		break;
+	      }
+	    if (TYPE_MAIN_VARIANT (TREE_TYPE (e1value))
+		!= TYPE_MAIN_VARIANT (TREE_TYPE (e2value)))
+	      {
+		error_at (loc,
+			  "%<__builtin_complex%> operands of different types");
+		expr.value = error_mark_node;
+		break;
+	      }
+	    if (!flag_isoc99)
+	      pedwarn (loc, OPT_pedantic,
+		       "ISO C90 does not support complex types");
+	    expr.value = build2 (COMPLEX_EXPR,
+				 build_complex_type (TYPE_MAIN_VARIANT
+						     (TREE_TYPE (e1value))),
+				 e1value, e2value);
+	    break;
+	  }
+	case RID_BUILTIN_SHUFFLE:
+	  {
+	    VEC(tree,gc) *expr_list;
+	    
+	    c_parser_consume_token (parser);
+	    if (!c_parser_get_builtin_args (parser, 
+					    "__builtin_shuffle", &expr_list))
+	      {
+		expr.value = error_mark_node;
+		break;
+	      }
+
+	    if (VEC_length (tree, expr_list) == 2)
+	      expr.value = c_build_vec_shuffle_expr 
+				(loc, VEC_index (tree, expr_list, 0),
+				 VEC_index (tree, expr_list, 0),
+				 VEC_index (tree, expr_list, 1));
+	    else if (VEC_length (tree, expr_list) == 3)
+	      expr.value = c_build_vec_shuffle_expr 
+				(loc, VEC_index (tree, expr_list, 0),
+				 VEC_index (tree, expr_list, 1),
+				 VEC_index (tree, expr_list, 2));
+	    else
+	      {
+		error_at (loc, "wrong number of arguments to "
+			       "%<__builtin_shuffle%>");
+		expr.value = error_mark_node;
+	      }
+	    break;
+	  }
 	case RID_AT_SELECTOR:
 	  gcc_assert (c_dialect_objc ());
 	  c_parser_consume_token (parser);
Index: gcc/config/i386/sse.md
===================================================================
--- gcc/config/i386/sse.md	(revision 178354)
+++ gcc/config/i386/sse.md	(working copy)
@@ -231,6 +231,12 @@ (define_mode_attr sseinsnmode
    (V4SF "V4SF") (V2DF "V2DF")
    (TI "TI") (V32QI "OI") (V16HI "OI") (V8SI "OI") (V4DI "OI")])
 
+;; All 128bit vector modes
+(define_mode_attr sseshuffint
+  [(V16QI "V16QI") (V8HI "V8HI") 
+   (V4SI "V4SI")  (V2DI "V2DI")
+   (V4SF "V4SI") (V2DF "V2DI")])
+
 ;; Mapping of vector float modes to an integer mode of the same size
 (define_mode_attr sseintvecmode
   [(V8SF "V8SI") (V4DF "V4DI")
@@ -6234,6 +6240,18 @@ (define_expand "vconduv2di"
   DONE;
 })
 
+(define_expand "vshuffle<mode>"
+  [(match_operand:V_128 0 "register_operand" "")
+   (match_operand:V_128 1 "general_operand" "")
+   (match_operand:<sseshuffint> 2 "general_operand" "")]
+  "TARGET_SSE3 || TARGET_AVX"
+{
+  bool ok = ix86_expand_vshuffle (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel bitwise logical operations
Index: gcc/config/i386/i386-protos.h
===================================================================
--- gcc/config/i386/i386-protos.h	(revision 178354)
+++ gcc/config/i386/i386-protos.h	(working copy)
@@ -118,6 +118,7 @@ extern bool ix86_expand_int_movcc (rtx[]
 extern bool ix86_expand_fp_movcc (rtx[]);
 extern bool ix86_expand_fp_vcond (rtx[]);
 extern bool ix86_expand_int_vcond (rtx[]);
+extern bool ix86_expand_vshuffle (rtx[]);
 extern void ix86_expand_sse_unpack (rtx[], bool, bool);
 extern bool ix86_expand_int_addcc (rtx[]);
 extern rtx ix86_expand_call (rtx, rtx, rtx, rtx, rtx, bool);
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 178354)
+++ gcc/config/i386/i386.c	(working copy)
@@ -18693,6 +18693,96 @@ ix86_expand_int_vcond (rtx operands[])
   return true;
 }
 
+bool
+ix86_expand_vshuffle (rtx operands[])
+{
+  rtx target = operands[0];
+  rtx op0 = operands[1];
+  rtx mask = operands[2];
+  rtx mm, vt, cv0, t1;
+  enum machine_mode mode = GET_MODE (op0);
+  enum machine_mode maskmode = GET_MODE (mask);
+  enum machine_mode maskinner = GET_MODE_INNER (mode);
+  rtx vec[16];
+  int w, i, j;
+
+  gcc_assert ((TARGET_SSE3 || TARGET_AVX) && GET_MODE_BITSIZE (mode) == 128);
+
+  op0 = force_reg (mode, op0);
+  mask = force_reg (maskmode, mask);
+
+  /* Number of elements in the vector.  */
+  w = GET_MODE_BITSIZE (maskmode) / GET_MODE_BITSIZE (maskinner);
+ 
+  /* mask = mask & {w-1, w-1, w-1,...} */
+  for (i = 0; i < w; i++)
+    vec[i] = GEN_INT (w - 1);
+
+  mm = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (w, vec));
+  mm = force_reg (maskmode, mm);
+
+  mask = gen_rtx_AND (maskmode, mask, mm);
+  
+  /* Convert mask to vector of chars.  */
+  mask = simplify_gen_subreg (V16QImode, mask, maskmode, 0);
+  mask = force_reg (V16QImode, mask);
+
+
+  /* Build a helper mask wich we will use in pshufb
+     (v4si) --> {0,0,0,0, 4,4,4,4, 8,8,8,8, 12,12,12,12}
+     (v8hi) --> {0,0, 2,2, 4,4, 6,6, ...}
+     ...  */
+  for (i = 0; i < w; i++)
+    for (j = 0; j < 16/w; j++)
+      vec[i*w+j] = GEN_INT (i*16/w);
+
+  vt = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
+  vt = force_reg (V16QImode, vt);
+  
+  t1 = gen_reg_rtx (V16QImode);
+  emit_insn (gen_ssse3_pshufbv16qi3 (t1, mask, vt));
+  mm = t1;
+
+  /* MM contains now something like
+     mm = {m[0], .., m[0], m[k], .., m[k], ... }, where 
+     m[i] is an index of the element in the vector we are
+     selecting from.
+
+     Convert it into the byte positions by doing
+     mm = mm * {16/w, 16/w, ...}
+     mm = mm + {0,1,..,16/w, 0,1,..,16/w, ...}  */
+  for (i = 0; i < 16; i++)
+    vec[i] = GEN_INT (16/w);
+
+  cv0 = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
+  cv0 = force_reg (V16QImode, cv0);
+  mm = gen_rtx_MULT (V16QImode, mm, cv0);
+
+  for (i = 0; i < w; i++)
+    for (j = 0; j < 16/w; j++)
+      vec[i*w+j] = GEN_INT (j);
+
+  cv0 = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
+  cv0 = force_reg (V16QImode, cv0);
+  mm = gen_rtx_PLUS (V16QImode, mm, cv0);
+  mm = force_reg (V16QImode, mm);
+
+  t1 = gen_reg_rtx (V16QImode);
+  
+  /* Convert OP0 to vector of chars.  */
+  op0 = simplify_gen_subreg (V16QImode, op0, mode, 0);
+  op0 = force_reg (V16QImode, op0);
+  emit_insn (gen_ssse3_pshufbv16qi3 (t1, op0, mm));
+  
+  /* Convert it back from vector of chars to the original mode.  */
+  t1 = simplify_gen_subreg (mode, t1, V16QImode, 0);
+  
+  emit_insn (gen_rtx_SET (VOIDmode, target, t1));
+ 
+  fprintf (stderr, "-- %s called\n", __func__);
+  return true;
+}
+
 /* Unpack OP[1] into the next wider integer vector type.  UNSIGNED_P is
    true if we should do zero extension, else sign extension.  HIGH_P is
    true if we want the N/2 high elements, else the low elements.  */
@@ -30911,6 +31001,9 @@ struct expand_vec_perm_d
 
 static bool expand_vec_perm_1 (struct expand_vec_perm_d *d);
 static bool expand_vec_perm_broadcast_1 (struct expand_vec_perm_d *d);
+static int extract_vec_perm_cst (struct expand_vec_perm_d *, tree);
+static bool ix86_vectorize_builtin_vec_perm_ok (tree vec_type, tree mask);
+
 
 /* Get a vector mode of the same size as the original but with elements
    twice as wide.  This is only guaranteed to apply to integral vectors.  */
@@ -34576,10 +34669,10 @@ ix86_vectorize_builtin_vec_perm_ok (tree
 
   vec_mask = extract_vec_perm_cst (&d, mask);
 
-  /* This hook is cannot be called in response to something that the
-     user does (unlike the builtin expander) so we shouldn't ever see
-     an error generated from the extract.  */
-  gcc_assert (vec_mask > 0 && vec_mask <= 3);
+  /* Check whether the mask can be applied to the vector type.  */
+  if (vec_mask < 0 || vec_mask > 3)
+    return false;
+  
   one_vec = (vec_mask != 3);
 
   /* Implementable with shufps or pshufd.  */
Index: gcc/tree-ssa-operands.c
===================================================================
--- gcc/tree-ssa-operands.c	(revision 178354)
+++ gcc/tree-ssa-operands.c	(working copy)
@@ -943,6 +943,7 @@ get_expr_operands (gimple stmt, tree *ex
 
     case COND_EXPR:
     case VEC_COND_EXPR:
+    case VEC_SHUFFLE_EXPR:
       get_expr_operands (stmt, &TREE_OPERAND (expr, 0), uflags);
       get_expr_operands (stmt, &TREE_OPERAND (expr, 1), uflags);
       get_expr_operands (stmt, &TREE_OPERAND (expr, 2), uflags);

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-09-03 15:53                             ` Artem Shinkarov
@ 2011-09-06 15:40                               ` Richard Guenther
  2011-09-07 15:07                               ` Joseph S. Myers
  1 sibling, 0 replies; 71+ messages in thread
From: Richard Guenther @ 2011-09-06 15:40 UTC (permalink / raw)
  To: Artem Shinkarov; +Cc: Joseph S. Myers, Duncan Sands, gcc-patches, Jan Hubicka

On Sat, Sep 3, 2011 at 5:52 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> On Fri, Sep 2, 2011 at 8:52 PM, Joseph S. Myers <joseph@codesourcery.com> wrote:
>> On Fri, 2 Sep 2011, Artem Shinkarov wrote:
>>
>>> Joseph, I don't understand this comment. I have 2 or 3 arguments in
>>> the VEC_SHUFFLE_EXPR and any of them can be C_MAYBE_CONST_EXPR,
>>
>> Yes.
>>
>>> so I
>>> need to wrap mask (the last argument) to avoid the following failure:
>>
>> No.  You need to fold it (c_fully_fold) to eliminate any
>> C_MAYBE_CONST_EXPR it contains, but you shouldn't need to wrap the result
>> of folding in a SAVE_EXPR.
>
> Ok, Now I get it, thanks.
>
> In the attachment there is a new version of the patch that removes save-exprs.

You are missing a ChangeLog entry

+
+res = __builtin_shuffle (a, mask1);       /* res is @{1,2,2,4@}  */
+res = __builtin_shuffle2 (a, b, mask2);   /* res is @{1,5,3,6@}  */

should be __builtin_shuffle (a, b, mask2);

+bool
+expand_vec_shuffle_expr_p (enum machine_mode mode, tree v0,
+                          tree v1, tree mask)
+{
+  int v0_mode_s = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (v0))));
+  int mask_mode_s = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE
(mask))));
+
+  if (TREE_CODE (mask) == VECTOR_CST
+      && targetm.vectorize.builtin_vec_perm_ok (TREE_TYPE (v0), mask))
+    return true;
+
+  if (v0 != v1 || v0_mode_s != mask_mode_s)
+    return false;

the mask size check constrains the size of the vector elements but not
their count.  It looks like you should instead verify the vector modes
are equal?  At least that would match the fact that you have an expander
that distinguishes one operand mode only.

At some point we definitely want to merge the builtin_vec_perm_*
target hooks with the optab.

+      fn = copy_node (fn);

You shouldn't need to copy fn

+      call = fold_build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (fn)), fn);
+      call = build_call_nary (type /* ? */, call, 3, v0, v1, mask);
+
+      t = expand_normal (call);
+      target = gen_reg_rtx (mode);
+      emit_insn (gen_rtx_SET (VOIDmode, target, t));

why can't you simply use

  return expand_expr_real_1 (call, target, VOIDmode, EXPAND_NORMAL, NULL);

here?

+       case VEC_SHUFFLE_EXPR:
+         {
+           enum gimplify_status r0, r1, r2;
+
+           if (TREE_OPERAND (*expr_p, 0) == TREE_OPERAND (*expr_p, 1))
+             {
+               r0 = r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
+                                        post_p, is_gimple_val, fb_rvalue);
+               TREE_OPERAND (*expr_p, 1) = TREE_OPERAND (*expr_p, 0);
+             }
+           else
+             {
+                r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
+                                    post_p, is_gimple_val, fb_rvalue);
+                r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
+                                    post_p, is_gimple_val, fb_rvalue);
+             }

please avoid the above tree sharing (I realize it's probably for constants,
but we don't share those).  Thus, unconditionally gimplify both
operands.  The equality check in expanding should probably use
operand_equal_p (..., ..., 0) instead of a pointer comparison.

Thus, the above should simply use the expr_3: code, like FMA_EXPR.

+/* Vector shuffle expression. A = VEC_SHUFFLE_EXPR<v0, v1, maks>
+   means
+
+   freach i in length (mask):

foreach

+   number of elements in V0 and V1. The size of the inner type
+   of the MASK and of the V0 and V1 must be the same.

Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c      (revision 178354)
+++ gcc/tree-cfg.c      (working copy)
@@ -3713,6 +3713,7 @@ verify_gimple_assign_ternary (gimple stm

     case DOT_PROD_EXPR:
     case REALIGN_LOAD_EXPR:
+    case VEC_SHUFFLE_EXPR:
       /* FIXME.  */
       return false;

can you do some basic verification here?  At least what you document
in tree.def should be verified here.

Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c     (revision 178354)
+++ gcc/tree-vect-generic.c     (working copy)
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "flags.h"
 #include "ggc.h"
+#include "diagnostic.h"

I don't see where you need this, nor a Makefile.in change.

+/* Build a reference to the element of the vector VECT. Function

two spaces after a '.'

+      if (TREE_CODE (vect) == VECTOR_CST)
+        {
+            unsigned i;
+            tree vals = TREE_VECTOR_CST_ELTS (vect);
+            for (i = 0; vals; vals = TREE_CHAIN (vals), ++i)
+              if (i == index)

operand_equal_p

+      else if (TREE_CODE (vect) == SSA_NAME)
+        {
+          tree el;
+          gimple vectdef = SSA_NAME_DEF_STMT (vect);
+          if (gimple_assign_single_p (vectdef)
+              && (el = vector_element (gsi, gimple_assign_rhs1 (vectdef),
+                                       idx, ptmpvec))
+                 != error_mark_node)
+            return el;

I think it's a premature optimization to look up the defining statement here.

+  if (need_asgn)
+    {
+      TREE_ADDRESSABLE (tmpvec) = 1;

this should be at the point we call create_tmp_var.

+      asgn = gimple_build_assign (tmpvec, vect);

and I think this needs gimplification for the case of a non-constant
non-SSA name vect.  But maybe that never happens
(consider (v4si){a, b, c, d}[i]).

+  indextype = build_index_type (size_int (maxval));
+  arraytype = build_array_type (TREE_TYPE (type), indextype);

there is now build_array_type_nelts conveniently available.

+                 idx, NULL_TREE, NULL_TREE);
+
+
+}

excess vertical space.

+static void
+lower_vec_shuffle (gimple_stmt_iterator *gsi, location_t loc)
+{
+#define TRAP_RETURN(new_stmt, stmt, gsi, vec0) \
+do { \
+  new_stmt = gimple_build_call (built_in_decls[BUILT_IN_TRAP], 0); \
+  gsi_insert_before (gsi, new_stmt,  GSI_SAME_STMT); \
+  split_block (gimple_bb (new_stmt), new_stmt); \
+  new_stmt = gimple_build_assign (gimple_call_lhs (stmt), vec0); \
+  gsi_replace (gsi, new_stmt, false); \
+  return; \
+} while (0)

I don't like such defines - this instead asks for factoring the lowering
to a function that returns a failure state and the caller doing the
TRAP_RETURN in case of failure.

+    }
+
+

excess vertical space (please double-check your patches for these simple
stylistic issues).

+  if (vec0 == vec1)
+    {

operand_equal_p

+          if (idxval == error_mark_node)
+            {
+              warning_at (loc, 0, "Invalid shuffling mask index %i", i);
+              TRAP_RETURN (new_stmt, stmt, gsi, vec0);

ah, here is the diagnostic.  I think you should change that to

  if (warning_at (loc, 0, ...))
    inform (loc, "if this code is reached, the program will abort");

as we say elsewhere when inserting traps.

+  if (code == VEC_SHUFFLE_EXPR)
+    {
+      lower_vec_shuffle (gsi, gimple_location (stmt));
+      gimple_set_modified (gsi_stmt (*gsi), true);
+      update_stmt (gsi_stmt (*gsi));

I don't think you need the gimple_set_modified call.

 /* Use this to lower vector operations introduced by the vectorizer,
    if it may need the bit-twiddling tricks implemented in this file.  */

+

spurious white-space change.

 static bool
-gate_expand_vector_operations (void)
+gate_expand_vector_operations_noop (void)
 {
-  return flag_tree_vectorize != 0;
+  return optimize == 0;
 }

but I think we already have this change in-tree.  Your patch needs updating
(we have _ssa, not _noop).

Index: gcc/passes.c
===================================================================
--- gcc/passes.c        (revision 178354)
+++ gcc/passes.c        (working copy)
@@ -1354,7 +1354,6 @@ init_optimization_passes (void)
          NEXT_PASS (pass_vectorize);
            {
              struct opt_pass **p = &pass_vectorize.pass.sub;
-             NEXT_PASS (pass_lower_vector_ssa);
              NEXT_PASS (pass_dce_loop);
            }
           NEXT_PASS (pass_predcom);
@@ -1366,6 +1365,7 @@ init_optimization_passes (void)
          NEXT_PASS (pass_lim);
          NEXT_PASS (pass_tree_loop_done);
        }
+      NEXT_PASS (pass_lower_vector_ssa);

This change should not be neccesary.

I leave the C frontend and x86 backend changes to the respective
maintainers.  Honza, can you have a look at the x86 changes?

Thanks,
Richard.

>
> Artem.
>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-09-03 15:53                             ` Artem Shinkarov
  2011-09-06 15:40                               ` Richard Guenther
@ 2011-09-07 15:07                               ` Joseph S. Myers
  2011-09-09 17:04                                 ` Artem Shinkarov
  1 sibling, 1 reply; 71+ messages in thread
From: Joseph S. Myers @ 2011-09-07 15:07 UTC (permalink / raw)
  To: Artem Shinkarov; +Cc: Richard Guenther, Duncan Sands, gcc-patches

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: TEXT/PLAIN; charset=X-UNKNOWN, Size: 4115 bytes --]

On Sat, 3 Sep 2011, Artem Shinkarov wrote:

> > No.  You need to fold it (c_fully_fold) to eliminate any
> > C_MAYBE_CONST_EXPR it contains, but you shouldn't need to wrap the result
> > of folding in a SAVE_EXPR.
> 
> Ok, Now I get it, thanks.
> 
> In the attachment there is a new version of the patch that removes save-exprs.

> +res = __builtin_shuffle2 (a, b, mask2);   /* res is @{1,5,3,6@}  */

Elsewhere it's __builtin_shuffle for the variants with both numbers of 
arguments.

> Index: gcc/c-family/c-common.h
> ===================================================================
> --- gcc/c-family/c-common.h	(revision 178354)
> +++ gcc/c-family/c-common.h	(working copy)

> @@ -898,6 +898,7 @@ extern tree build_function_call (locatio
>  
>  extern tree build_function_call_vec (location_t, tree,
>      				     VEC(tree,gc) *, VEC(tree,gc) *);
> +extern tree c_build_vec_shuffle_expr (location_t, tree, tree, tree);

Since this function is actually defined in c-typeck.c, not in c-family 
code, the declaration should go in c-tree.h not c-common.h.

> +/* Return true if VEC_SHUFF_EXPR can be expanded using SIMD extensions

VEC_SHUFFLE_EXPR

> Index: gcc/c-typeck.c
> ===================================================================
> +/* Build a VEC_SHUFFLE_EXPR if V0, V1 and MASK are not error_mark_nodes
> +   and have vector types, V0 has the same type as V1, and the number of
> +   elements of V0, V1, MASK is the same.  */
> +tree
> +c_build_vec_shuffle_expr (location_t loc, tree v0, tree v1, tree mask)
> +{
> +  tree vec_shuffle, tmp;
> +  bool wrap = true;
> +  bool maybe_const = false;
> +  bool two_arguments = v0 == v1;

Relying on pointer comparison to determine the number of arguments seems 
error-prone; a convention of passing NULL_TREE for v1 in the two-argument 
case would be better.  Consider the case where v0 and v1 both refer to the 
same volatile DECL (so at present would have the same tree), which is a 
three-argument case where the variable should be read from twice.

The documentation seems to suggest that in the two-argument case the mask 
values must be within a single vector whereas the implementation treats 
the two-argument case as if the first argument is passed twice (so twice 
the range of mask values) but only evaluated once.  If the twice-the-range 
is intended semantics, it should be documented; otherwise there should be 
a comment noting that it's an implementation accident and not guaranteed 
semantics for users.

> +  if (TREE_TYPE (v0) != TREE_TYPE (v1))
> +    {
> +      error_at (loc, "__builtin_shuffle argument vectors must be of "
> +		     "the same type");
> +      return error_mark_node;
> +    }

What if one is const-qualified or a typedef, and the other isn't?  That 
should still be allowed, and I don't see any testcases for it.  You may 
need to use TYPE_MAIN_VARIANT.

> @@ -6120,7 +6201,14 @@ digest_init (location_t init_loc, tree t
>  	  tree value;
>  	  bool constant_p = true;
>  
> -	  /* Iterate through elements and check if all constructor
> +	  /* If constructor has less elements than the vector type.  */
> +          if (CONSTRUCTOR_NELTS (inside_init) 
> +              < TYPE_VECTOR_SUBPARTS (TREE_TYPE (inside_init)))
> +            warning_at (init_loc, 0, "vector length does not match "
> +                                     "initializer length, zero elements "
> +                                     "will be inserted");
> +          

This looks unrelated to the other changes and should be submitted 
separately with its own testcase and rationale if you wish to propose it 
as a patch.

>  	case RID_CHOOSE_EXPR:

> -	    expr = integer_zerop (c) ? e3 : e2;
> +	    expr.value = integer_zerop (c) ? e3value : e2value;

I think you should preserve the original_type value of the operand as well 
(meaning you need to use the p_orig_types operand to c_parser_expr_list).  
This is only relevant to __builtin_choose_expr, not the other 
pseudo-builtins.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-09-07 15:07                               ` Joseph S. Myers
@ 2011-09-09 17:04                                 ` Artem Shinkarov
  2011-09-12  8:02                                   ` Richard Guenther
                                                     ` (2 more replies)
  0 siblings, 3 replies; 71+ messages in thread
From: Artem Shinkarov @ 2011-09-09 17:04 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: Richard Guenther, Duncan Sands, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1828 bytes --]

Hi, sorry for the delay, I had a lot of other stuff to do.

In the attachment there is a new patch that fixes all the issues
pointed by Joseph and almost all the issues pointed by Richard. The
issues that are not fixed are explained further.

Artem.

>+      if (TREE_CODE (vect) == VECTOR_CST)
>+        {
>+            unsigned i;
>+            tree vals = TREE_VECTOR_CST_ELTS (vect);
>+            for (i = 0; vals; vals = TREE_CHAIN (vals), ++i)
>+              if (i == index)
>
>operand_equal_p

Sorry, I didn't get this comment. It looks fine to me without any changes.

>+  if (need_asgn)
>+    {
>+      TREE_ADDRESSABLE (tmpvec) = 1;
>
>this should be at the point we call create_tmp_var.

Here we are talking about writing this line two times several lines
upper. I would like to leave it just to avoid code duplication.

>Index: gcc/passes.c
>===========================================================>========
>--- gcc/passes.c        (revision 178354)
>+++ gcc/passes.c        (working copy)
>@@ -1354,7 +1354,6 @@ init_optimization_passes (void)
>         NEXT_PASS (pass_vectorize);
>           {
>             struct opt_pass **p = &pass_vectorize.pass.sub;
>-             NEXT_PASS (pass_lower_vector_ssa);
>             NEXT_PASS (pass_dce_loop);
>           }
>          NEXT_PASS (pass_predcom);
>@@ -1366,6 +1365,7 @@ init_optimization_passes (void)
>         NEXT_PASS (pass_lim);
>         NEXT_PASS (pass_tree_loop_done);
>       }
>+      NEXT_PASS (pass_lower_vector_ssa);
>
>This change should not be neccesary.

Without this change the vector lowering with -Ox does not execute the
pass at all. So the overall idea is to make sure that if we have -Ox
we make lowering only once and as late as possible.

But maybe I am just missing something.



Thanks,
Artem.

P.S. X86 PEOPLE, WHERE ARE YOU? :)

[-- Attachment #2: vec-shuffle.v16.diff --]
[-- Type: text/plain, Size: 57292 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 178354)
+++ gcc/doc/extend.texi	(working copy)
@@ -6561,6 +6561,32 @@ invoke undefined behavior at runtime.  W
 accesses for vector subscription can be enabled with
 @option{-Warray-bounds}.
 
+Vector shuffling is available using functions
+@code{__builtin_shuffle (vec, mask)} and
+@code{__builtin_shuffle (vec0, vec1, mask)}. Both functions construct
+a permutation of elements from one or two vectors and return a vector
+of the same type as input vector(s). The mask is a vector of
+integer-typed elements. The size of each element of the mask must be
+the same as the size of each input vector element. The number of
+elements in input vector(s) and mask must be the same.
+
+The elements of the input vectors are numbered from left to right across
+one or both of the vectors. Each element in the mask specifies a number
+of element from the input vector(s). Consider the following example.
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{5,6,7,8@};
+v4si mask1 = @{0,1,1,3@};
+v4si mask2 = @{0,4,2,5@};
+v4si res;
+
+res = __builtin_shuffle (a, mask1);       /* res is @{1,2,2,4@}  */
+res = __builtin_shuffle (a, b, mask2);    /* res is @{1,5,3,6@}  */
+@end smallexample
+
 You can declare variables and use them in function calls and returns, as
 well as in assignments and some casts.  You can specify a vector type as
 a return type for a function.  Vector types can also be used as function
Index: gcc/tree-pretty-print.c
===================================================================
--- gcc/tree-pretty-print.c	(revision 178354)
+++ gcc/tree-pretty-print.c	(working copy)
@@ -2067,6 +2067,16 @@ dump_generic_node (pretty_printer *buffe
       dump_generic_node (buffer, TREE_OPERAND (node, 2), spc, flags, false);
       pp_string (buffer, " > ");
       break;
+    
+    case VEC_SHUFFLE_EXPR:
+      pp_string (buffer, " VEC_SHUFFLE_EXPR < ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
+      pp_string (buffer, " , ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
+      pp_string (buffer, " , ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 2), spc, flags, false);
+      pp_string (buffer, " > ");
+      break;
 
     case DOT_PROD_EXPR:
       pp_string (buffer, " DOT_PROD_EXPR < ");
Index: gcc/c-family/c-common.c
===================================================================
--- gcc/c-family/c-common.c	(revision 178354)
+++ gcc/c-family/c-common.c	(working copy)
@@ -425,6 +425,7 @@ const struct c_common_resword c_common_r
   { "__attribute__",	RID_ATTRIBUTE,	0 },
   { "__builtin_choose_expr", RID_CHOOSE_EXPR, D_CONLY },
   { "__builtin_complex", RID_BUILTIN_COMPLEX, D_CONLY },
+  { "__builtin_shuffle", RID_BUILTIN_SHUFFLE, D_CONLY },
   { "__builtin_offsetof", RID_OFFSETOF, 0 },
   { "__builtin_types_compatible_p", RID_TYPES_COMPATIBLE_P, D_CONLY },
   { "__builtin_va_arg",	RID_VA_ARG,	0 },
Index: gcc/c-family/c-common.h
===================================================================
--- gcc/c-family/c-common.h	(revision 178354)
+++ gcc/c-family/c-common.h	(working copy)
@@ -103,7 +103,7 @@ enum rid
   /* C extensions */
   RID_ASM,       RID_TYPEOF,   RID_ALIGNOF,  RID_ATTRIBUTE,  RID_VA_ARG,
   RID_EXTENSION, RID_IMAGPART, RID_REALPART, RID_LABEL,      RID_CHOOSE_EXPR,
-  RID_TYPES_COMPATIBLE_P,      RID_BUILTIN_COMPLEX,
+  RID_TYPES_COMPATIBLE_P,      RID_BUILTIN_COMPLEX,	     RID_BUILTIN_SHUFFLE,
   RID_DFLOAT32, RID_DFLOAT64, RID_DFLOAT128,
   RID_FRACT, RID_ACCUM,
 
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	(revision 178354)
+++ gcc/optabs.c	(working copy)
@@ -6620,6 +6620,79 @@ vector_compare_rtx (tree cond, bool unsi
   return gen_rtx_fmt_ee (rcode, VOIDmode, ops[0].value, ops[1].value);
 }
 
+/* Return true if VEC_SHUFFLE_EXPR can be expanded using SIMD extensions
+   of the CPU.  */
+bool
+expand_vec_shuffle_expr_p (enum machine_mode mode, tree v0, tree v1, tree mask)
+{
+  int v0_mode_s = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (v0))));
+  int mask_mode_s = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (mask))));
+
+  if (TREE_CODE (mask) == VECTOR_CST
+      && targetm.vectorize.builtin_vec_perm_ok (TREE_TYPE (v0), mask))
+    return true;
+  
+  if (!operand_equal_p (v0, v1, 0) || v0_mode_s != mask_mode_s
+      || TYPE_VECTOR_SUBPARTS (TREE_TYPE (v0))
+	 != TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask)))
+    return false;
+    
+  return direct_optab_handler (vshuffle_optab, mode) != CODE_FOR_nothing;
+}
+
+/* Generate instructions for VEC_COND_EXPR given its type and three
+   operands.  */
+rtx
+expand_vec_shuffle_expr (tree type, tree v0, tree v1, tree mask, rtx target)
+{
+  struct expand_operand ops[4];
+  enum insn_code icode;
+  enum machine_mode mode = TYPE_MODE (type);
+  rtx rtx_v0, rtx_mask;
+
+  gcc_assert (expand_vec_shuffle_expr_p (mode, v0, v1, mask));
+
+  if (TREE_CODE (mask) == VECTOR_CST)
+    {
+      tree m_type, call;
+      tree fn = targetm.vectorize.builtin_vec_perm (TREE_TYPE (v0), &m_type);
+      /*rtx t;*/
+
+      if (!fn)
+	goto vshuffle;
+
+      if (m_type != TREE_TYPE (TREE_TYPE (mask)))
+	{
+	  int units = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask));
+	  tree cvt = build_vector_type (m_type, units);
+	  mask = fold_convert (cvt, mask);
+	}
+
+      call = fold_build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (fn)), fn);
+      call = build_call_nary (type /* ? */, call, 3, v0, v1, mask);
+
+      return expand_expr_real_1 (call, target, VOIDmode, EXPAND_NORMAL, NULL);
+    }
+
+vshuffle:
+  gcc_assert (operand_equal_p (v0, v1, 0));
+
+  icode = direct_optab_handler (vshuffle_optab, mode);
+
+  if (icode == CODE_FOR_nothing)
+    return 0;
+
+  rtx_v0 = expand_normal (v0);
+  rtx_mask = expand_normal (mask);
+
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], rtx_v0, mode);
+  create_input_operand (&ops[2], rtx_mask, mode);
+  expand_insn (icode, 3, ops);
+
+  return ops[0].value;
+}
+
 /* Return insn code for TYPE, the type of a VEC_COND_EXPR.  */
 
 static inline enum insn_code
Index: gcc/optabs.h
===================================================================
--- gcc/optabs.h	(revision 178354)
+++ gcc/optabs.h	(working copy)
@@ -636,6 +636,9 @@ enum direct_optab_index
   DOI_vcond,
   DOI_vcondu,
 
+  /* Vector shuffling.  */
+  DOI_vshuffle,
+
   /* Block move operation.  */
   DOI_movmem,
 
@@ -701,6 +704,7 @@ typedef struct direct_optab_d *direct_op
 #define reload_out_optab (&direct_optab_table[(int) DOI_reload_out])
 #define vcond_optab (&direct_optab_table[(int) DOI_vcond])
 #define vcondu_optab (&direct_optab_table[(int) DOI_vcondu])
+#define vshuffle_optab (&direct_optab_table[(int) DOI_vshuffle])
 #define movmem_optab (&direct_optab_table[(int) DOI_movmem])
 #define setmem_optab (&direct_optab_table[(int) DOI_setmem])
 #define cmpstr_optab (&direct_optab_table[(int) DOI_cmpstr])
@@ -879,8 +883,15 @@ extern rtx expand_widening_mult (enum ma
 /* Return tree if target supports vector operations for COND_EXPR.  */
 bool expand_vec_cond_expr_p (tree, enum machine_mode);
 
+/* Return tree if target supports vector operations for VEC_SHUFFLE_EXPR.  */
+bool expand_vec_shuffle_expr_p (enum machine_mode, tree, tree, tree);
+
 /* Generate code for VEC_COND_EXPR.  */
 extern rtx expand_vec_cond_expr (tree, tree, tree, tree, rtx);
+
+/* Generate code for VEC_SHUFFLE_EXPR.  */
+extern rtx expand_vec_shuffle_expr (tree, tree, tree, tree, rtx);
+
 /* Generate code for VEC_LSHIFT_EXPR and VEC_RSHIFT_EXPR.  */
 extern rtx expand_vec_shift_expr (sepops, rtx);
 
Index: gcc/genopinit.c
===================================================================
--- gcc/genopinit.c	(revision 178354)
+++ gcc/genopinit.c	(working copy)
@@ -255,6 +255,7 @@ static const char * const optabs[] =
   "set_optab_handler (vec_realign_load_optab, $A, CODE_FOR_$(vec_realign_load_$a$))",
   "set_direct_optab_handler (vcond_optab, $A, CODE_FOR_$(vcond$a$))",
   "set_direct_optab_handler (vcondu_optab, $A, CODE_FOR_$(vcondu$a$))",
+  "set_direct_optab_handler (vshuffle_optab, $A, CODE_FOR_$(vshuffle$a$))",
   "set_optab_handler (ssum_widen_optab, $A, CODE_FOR_$(widen_ssum$I$a3$))",
   "set_optab_handler (usum_widen_optab, $A, CODE_FOR_$(widen_usum$I$a3$))",
   "set_optab_handler (udot_prod_optab, $A, CODE_FOR_$(udot_prod$I$a$))",
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c	(revision 0)
@@ -0,0 +1,44 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shuf2compare(type, count, vres, v0, v1, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != ((vidx(type, mask, __i) < count) ? \
+                          vidx(type, v0, vidx(type, mask, __i)) :  \
+                          vidx(type, v1, (vidx(type, mask, __i) - count)))) \
+            __builtin_abort (); \
+        } \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+    vector (8, short) v0 = {5, 5,5,5,5,5,argc,7};
+    vector (8, short) v1 = {argc, 1,8,8,4,9,argc,4};
+    vector (8, short) v2;
+
+    //vector (8, short) mask = {1,2,5,4,3,6,7};
+    
+    vector (8, short) mask0 = {0,2,3,1,4,5,6,7};
+    vector (8, short) mask1 = {0,12,3,4,3,0,10,9};
+    
+    vector (8, short) mask2 = {0,8,1,9,2,10,3,11};
+
+    v2 = __builtin_shuffle (v0, v1,  mask0);
+    shuf2compare (short, 8, v2, v0, v1, mask0);
+ 
+    v2 = __builtin_shuffle (v0, v1,  mask1);
+    shuf2compare (short, 8, v2, v0, v1, mask1);
+    
+    v2 = __builtin_shuffle (v0, v1,  mask2);
+    shuf2compare (short, 8, v2, v0, v1, mask2);
+
+    v2 = __builtin_shuffle (mask0, mask0,  v0);
+    shuf2compare (short, 8, v2, mask0, mask0, v0);
+
+    return 0; 
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c	(revision 0)
@@ -0,0 +1,50 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shuf2compare(type, count, vres, v0, v1, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != ((vidx(type, mask, __i) < count) ? \
+                          vidx(type, v0, vidx(type, mask, __i)) :  \
+                          vidx(type, v1, (vidx(type, mask, __i) - count)))) \
+            __builtin_abort (); \
+        } \
+} while (0)
+
+
+vector (8, short) __attribute__ ((noinline))
+f (vector (8, short) x, vector (8, short) y, vector (8, short) mask) {
+    return __builtin_shuffle (x, y, mask);
+}
+
+
+
+int main (int argc, char *argv[]) {
+    vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+
+    //vector (8, short) mask = {1,2,5,4,3,6,7};
+    
+    vector (8, short) mask0 = {0,2,3,1,4,5,6,7};
+    vector (8, short) mask1 = {0,12,3,4,3,0,10,9};
+    vector (8, short) mask2 = {0,8,1,9,2,10,3,11};
+
+    v2 = f (v0, v1,  mask0);
+    shuf2compare (short, 8, v2, v0, v1, mask0);
+ 
+    v2 = f (v0, v1,  mask1);
+    shuf2compare (short, 8, v2, v0, v1, mask1);
+
+    v2 = f (v0, v1,  mask2);
+    shuf2compare (short, 8, v2, v0, v1, mask2);
+
+    v2 = f (mask0, mask0,  v0);
+    shuf2compare (short, 8, v2, mask0, mask0, v0);
+
+    return 0; 
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c	(revision 0)
@@ -0,0 +1,46 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shufcompare(type, count, vres, v0, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != vidx(type, v0, vidx(type, mask, __i))) \
+            __builtin_abort (); \
+    } \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+    /*vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+   
+    vector (8, short) smask = {0,0,1,2,3,4,5,6};
+    
+    v2 = __builtin_shuffle (v0,  smask);
+    shufcompare (short, 8, v2, v0, smask);
+    v2 = __builtin_shuffle (v0, v1);
+    shufcompare (short, 8, v2, v0, v1);
+    v2 = __builtin_shuffle (smask, v0);
+    shufcompare (short, 8, v2, smask, v0);*/
+
+    vector (4, int) i0 = {argc, 1,2,3};
+    vector (4, int) i1 = {argc, 1, argc, 3};
+    vector (4, int) i2;
+
+    vector (4, int) imask = {0,3,2,1};
+
+    /*i2 = __builtin_shuffle (i0, imask);
+    shufcompare (int, 4, i2, i0, imask);*/
+    i2 = __builtin_shuffle (i0, i1);
+    shufcompare (int, 4, i2, i0, i1);
+    
+    i2 = __builtin_shuffle (imask, i0);
+    shufcompare (int, 4, i2, imask, i0);
+    
+    return 0;
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c	(revision 0)
@@ -0,0 +1,36 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shufcompare(type, count, vres, v0, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != vidx(type, v0, vidx(type, mask, __i))) \
+            __builtin_abort (); \
+    } \
+} while (0)
+
+vector (8, short) __attribute__ ((noinline))
+f (vector (8, short) x, vector (8, short) mask) {
+    return __builtin_shuffle (x, mask);
+}
+
+
+int main (int argc, char *argv[]) {
+    vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+
+    vector (8, short) mask = {0,0,1,2,3,4,5,6};
+    
+    v2 = f (v0,  mask);
+    shufcompare (short, 8, v2, v0, mask);
+
+    v2 = f (v0, v1);
+    shufcompare (short, 8, v2, v0, v1);
+
+    return 0;
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c	(revision 0)
@@ -0,0 +1,64 @@
+/* Test that different type variants are compatible within
+   vector shuffling.  */
+
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define shufcompare(count, vres, v0, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vres[__i] != v0[mask[__i]]) \
+            __builtin_abort (); \
+    } \
+} while (0)
+
+#define test_compat_mask(res, vec, mask) \
+  res = __builtin_shuffle (vec, mask); \
+  shufcompare(4, res, vec, mask); \
+  res = __builtin_shuffle (vec, c ## mask); \
+  shufcompare(4, res, vec, c ##  mask); \
+  res = __builtin_shuffle (vec, r ## mask); \
+  shufcompare(4, res, vec, r ##  mask); \
+  res = __builtin_shuffle (vec, d ## mask); \
+  shufcompare(4, res, vec, d ##  mask); \
+  res = __builtin_shuffle (vec, dc ## mask); \
+  shufcompare(4, res, vec, dc ##  mask); \
+
+#define test_compat_vec(res, vec, mask) \
+  test_compat_mask (res, vec, mask); \
+  test_compat_mask (res, c ## vec, mask); \
+  test_compat_mask (res, r ## vec, mask); \
+  test_compat_mask (res, d ## vec, mask); \
+  test_compat_mask (res, dc ## vec, mask); 
+
+#define test_compat(res, vec, mask) \
+  test_compat_vec (res, vec, mask); \
+  test_compat_vec (d ## res, vec, mask); \
+  test_compat_vec (r ## res, vec, mask);
+
+typedef vector (4, int) v4si;
+typedef const vector (4, int) v4sicst;
+
+int main (int argc, char *argv[]) {
+    vector (4, int) vec = {argc, 1,2,3};
+    const vector (4, int) cvec = {argc, 1,2,3};
+    register vector (4, int) rvec = {argc, 1,2,3};
+    v4si dvec = {argc, 1,2,3};
+    v4sicst dcvec = {argc, 1,2,3};
+    
+    vector (4, int) res; 
+    v4si dres;
+    register vector (4, int) rres;
+
+    vector (4, int) mask = {0,3,2,1};
+    const vector (4, int) cmask = {0,3,2,1};
+    register vector (4, int) rmask = {0,3,2,1};
+    v4si dmask = {0,3,2,1};
+    v4sicst dcmask = {0,3,2,1};
+
+    test_compat (res, vec, mask);
+
+    return 0;
+}
+
Index: gcc/testsuite/gcc.dg/builtin-complex-err-1.c
===================================================================
--- gcc/testsuite/gcc.dg/builtin-complex-err-1.c	(revision 178354)
+++ gcc/testsuite/gcc.dg/builtin-complex-err-1.c	(working copy)
@@ -19,8 +19,8 @@ _Complex float fc3 = __builtin_complex (
 void
 f (void)
 {
-  __builtin_complex (0.0); /* { dg-error "expected" } */
-  __builtin_complex (0.0, 0.0, 0.0); /* { dg-error "expected" } */
+  __builtin_complex (0.0); /* { dg-error "wrong number of arguments" } */
+  __builtin_complex (0.0, 0.0, 0.0); /* { dg-error "wrong number of arguments" } */
 }
 
-void (*p) (void) = __builtin_complex; /* { dg-error "expected" } */
+void (*p) (void) = __builtin_complex; /* { dg-error "cannot take address" } */
Index: gcc/c-tree.h
===================================================================
--- gcc/c-tree.h	(revision 178354)
+++ gcc/c-tree.h	(working copy)
@@ -579,6 +579,7 @@ extern tree c_begin_omp_task (void);
 extern tree c_finish_omp_task (location_t, tree, tree);
 extern tree c_finish_omp_clauses (tree);
 extern tree c_build_va_arg (location_t, tree, tree);
+extern tree c_build_vec_shuffle_expr (location_t, tree, tree, tree);
 
 /* Set to 0 at beginning of a function definition, set to 1 if
    a return statement that specifies a return value is seen.  */
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 178354)
+++ gcc/expr.c	(working copy)
@@ -8605,6 +8605,10 @@ expand_expr_real_2 (sepops ops, rtx targ
     case VEC_PACK_FIX_TRUNC_EXPR:
       mode = TYPE_MODE (TREE_TYPE (treeop0));
       goto binop;
+    
+    case VEC_SHUFFLE_EXPR:
+      target = expand_vec_shuffle_expr (type, treeop0, treeop1, treeop2, target);
+      return target;
 
     case DOT_PROD_EXPR:
       {
Index: gcc/gimple-pretty-print.c
===================================================================
--- gcc/gimple-pretty-print.c	(revision 178354)
+++ gcc/gimple-pretty-print.c	(working copy)
@@ -417,6 +417,16 @@ dump_ternary_rhs (pretty_printer *buffer
       dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
       pp_string (buffer, ">");
       break;
+    
+    case VEC_SHUFFLE_EXPR:
+      pp_string (buffer, "VEC_SHUFFLE_EXPR <");
+      dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
+      pp_string (buffer, ">");
+      break;
 
     case REALIGN_LOAD_EXPR:
       pp_string (buffer, "REALIGN_LOAD <");
Index: gcc/c-typeck.c
===================================================================
--- gcc/c-typeck.c	(revision 178354)
+++ gcc/c-typeck.c	(working copy)
@@ -2845,6 +2845,99 @@ build_function_call_vec (location_t loc,
     }
   return require_complete_type (result);
 }
+
+/* Build a VEC_SHUFFLE_EXPR if V0, V1 and MASK are not error_mark_nodes
+   and have vector types, V0 has the same type as V1, and the number of
+   elements of V0, V1, MASK is the same.
+   
+   In case V1 is a NULL_TREE it is assumed that __builtin_shuffle was
+   called with two arguments.  In this case implementation passes the
+   first argument twice in order to share the same tree code.  This fact
+   could enable the mask-values being twice the vector length.  This is
+   an implementation accident and this semantics is not guaranteed to
+   the user.  */
+tree
+c_build_vec_shuffle_expr (location_t loc, tree v0, tree v1, tree mask)
+{
+  tree vec_shuffle, tmp;
+  bool wrap = true;
+  bool maybe_const = false;
+  bool two_arguments;
+  
+  if (v1 == NULL_TREE)
+    {
+      two_arguments = true;
+      v1 = v0;
+    }
+
+  if (v0 == error_mark_node || v1 == error_mark_node
+      || mask == error_mark_node)
+    return error_mark_node;
+
+  if (TREE_CODE (TREE_TYPE (mask)) != VECTOR_TYPE
+      || TREE_CODE (TREE_TYPE (TREE_TYPE (mask))) != INTEGER_TYPE)
+    {
+      error_at (loc, "__builtin_shuffle last argument must "
+		     "be an integer vector");
+      return error_mark_node;
+    }
+   
+  if (TREE_CODE (TREE_TYPE (v0)) != VECTOR_TYPE
+      || TREE_CODE (TREE_TYPE (v1)) != VECTOR_TYPE)
+    {
+      error_at (loc, "__builtin_shuffle arguments must be vectors");
+      return error_mark_node;
+    }
+
+  if (TYPE_MAIN_VARIANT (TREE_TYPE (v0)) != TYPE_MAIN_VARIANT (TREE_TYPE (v1)))
+    {
+      error_at (loc, "__builtin_shuffle argument vectors must be of "
+		     "the same type");
+      return error_mark_node;
+    }
+
+  if (TYPE_VECTOR_SUBPARTS (TREE_TYPE (v0))
+      != TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask))
+      && TYPE_VECTOR_SUBPARTS (TREE_TYPE (v1))
+	 != TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask)))
+    {
+      error_at (loc, "__builtin_shuffle number of elements of the "
+		     "argument vector(s) and the mask vector should "
+		     "be the same");
+      return error_mark_node;
+    }
+  
+  if (GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (v0))))
+      != GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (mask)))))
+    {
+      error_at (loc, "__builtin_shuffle argument vector(s) inner type "
+		     "must have the same size as inner type of the mask");
+      return error_mark_node;
+    }
+
+  /* Avoid C_MAYBE_CONST_EXPRs inside VEC_SHUFFLE_EXPR.  */
+  tmp = c_fully_fold (v0, false, &maybe_const);
+  v0 = save_expr (tmp);
+  wrap &= maybe_const;
+
+  if (!two_arguments)
+    {
+      v1 = c_fully_fold (v1, false, &maybe_const);
+      wrap &= maybe_const;
+    }
+  else
+    v1 = v0;
+  
+  mask = c_fully_fold (mask, false, &maybe_const);
+  wrap &= maybe_const;
+
+  vec_shuffle = build3 (VEC_SHUFFLE_EXPR, TREE_TYPE (v0), v0, v1, mask);
+
+  if (!wrap)
+    vec_shuffle = c_wrap_maybe_const (vec_shuffle, true);
+
+  return vec_shuffle;
+}
 \f
 /* Convert the argument expressions in the vector VALUES
    to the types in the list TYPELIST.
@@ -6120,7 +6213,7 @@ digest_init (location_t init_loc, tree t
 	  tree value;
 	  bool constant_p = true;
 
-	  /* Iterate through elements and check if all constructor
+          /* Iterate through elements and check if all constructor
 	     elements are *_CSTs.  */
 	  FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (inside_init), ix, value)
 	    if (!CONSTANT_CLASS_P (value))
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	(revision 178354)
+++ gcc/gimplify.c	(working copy)
@@ -7286,6 +7286,7 @@ gimplify_expr (tree *expr_p, gimple_seq
 	  }
 
 	case FMA_EXPR:
+	case VEC_SHUFFLE_EXPR:
 	  /* Classified as tcc_expression.  */
 	  goto expr_3;
 
Index: gcc/tree.def
===================================================================
--- gcc/tree.def	(revision 178354)
+++ gcc/tree.def	(working copy)
@@ -497,6 +497,19 @@ DEFTREECODE (COND_EXPR, "cond_expr", tcc
 */
 DEFTREECODE (VEC_COND_EXPR, "vec_cond_expr", tcc_expression, 3)
 
+/* Vector shuffle expression.  A = VEC_SHUFFLE_EXPR<v0, v1, maks>
+   means
+
+   foreach i in length (mask):
+     A = mask[i] < length (v0) ? v0[mask[i]] : v1[mask[i]]
+
+   V0 and V1 are vectors of the same type.  MASK is an integer-typed
+   vector.  The number of MASK elements must be the same with the
+   number of elements in V0 and V1.  The size of the inner type
+   of the MASK and of the V0 and V1 must be the same.
+*/
+DEFTREECODE (VEC_SHUFFLE_EXPR, "vec_shuffle_expr", tcc_expression, 3)
+
 /* Declare local variables, including making RTL and allocating space.
    BIND_EXPR_VARS is a chain of VAR_DECL nodes for the variables.
    BIND_EXPR_BODY is the body, the expression to be computed using
Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c	(revision 178354)
+++ gcc/tree-inline.c	(working copy)
@@ -3285,6 +3285,7 @@ estimate_operator_cost (enum tree_code c
        ??? We may consider mapping RTL costs to this.  */
     case COND_EXPR:
     case VEC_COND_EXPR:
+    case VEC_SHUFFLE_EXPR:
 
     case PLUS_EXPR:
     case POINTER_PLUS_EXPR:
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	(revision 178354)
+++ gcc/tree-vect-generic.c	(working copy)
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "flags.h"
 #include "ggc.h"
+#include "diagnostic.h"
 
 /* Need to include rtl.h, expr.h, etc. for optabs.  */
 #include "expr.h"
@@ -432,6 +433,263 @@ type_for_widest_vector_mode (enum machin
     }
 }
 
+
+/* Build a reference to the element of the vector VECT.  Function
+   returns either the element itself, either BIT_FIELD_REF, or an
+   ARRAY_REF expression.
+   
+   GSI is requred to insert temporary variables while building a
+   refernece to the element of the vector VECT.
+   
+   PTMPVEC is a pointer to the temporary variable for caching
+   purposes.  In case when PTMPVEC is NULL new temporary variable
+   will be created.  */
+static tree
+vector_element (gimple_stmt_iterator *gsi, tree vect, tree idx, tree *ptmpvec)
+{
+  tree type;
+  gimple asgn;
+  tree tmpvec;
+  tree arraytype;
+  bool need_asgn = true;
+
+  gcc_assert (TREE_CODE (TREE_TYPE (vect)) == VECTOR_TYPE);
+
+  type = TREE_TYPE (vect);
+  if (TREE_CODE (idx) == INTEGER_CST)
+    {
+      unsigned HOST_WIDE_INT index;
+
+      if (!host_integerp (idx, 1)
+           || (index = tree_low_cst (idx, 1)) > TYPE_VECTOR_SUBPARTS (type)-1)
+        return error_mark_node;
+
+      if (TREE_CODE (vect) == VECTOR_CST)
+        {
+            unsigned i;
+            tree vals = TREE_VECTOR_CST_ELTS (vect);
+            for (i = 0; vals; vals = TREE_CHAIN (vals), ++i)
+              if (i == index)
+                 return TREE_VALUE (vals);
+            return error_mark_node;
+        }
+      else if (TREE_CODE (vect) == CONSTRUCTOR)
+        {
+          unsigned i;
+          VEC (constructor_elt, gc) *vals = CONSTRUCTOR_ELTS (vect);
+          constructor_elt *elt;
+
+          for (i = 0; VEC_iterate (constructor_elt, vals, i, elt); i++)
+            if (operand_equal_p (elt->index, idx, 0))
+              return elt->value;
+          return fold_convert (TREE_TYPE (type), integer_zero_node);
+        }
+      else if (TREE_CODE (vect) == SSA_NAME)
+        {
+	  tree size = TYPE_SIZE (TREE_TYPE (type));
+          tree pos = fold_build2 (MULT_EXPR, TREE_TYPE (idx), idx, size);
+          return fold_build3 (BIT_FIELD_REF, TREE_TYPE (type), vect, size, pos);
+        }
+      else
+        return error_mark_node;
+    }
+  
+  if (!ptmpvec)
+    tmpvec = create_tmp_var (TREE_TYPE (vect), "vectmp");
+  else if (!*ptmpvec)
+    tmpvec = *ptmpvec = create_tmp_var (TREE_TYPE (vect), "vectmp");
+  else
+    {
+      tmpvec = *ptmpvec;
+      need_asgn = false;
+    }
+  
+  if (need_asgn)
+    {
+      TREE_ADDRESSABLE (tmpvec) = 1;
+      asgn = gimple_build_assign (tmpvec, vect);
+      gsi_insert_before (gsi, asgn, GSI_SAME_STMT);
+    }
+
+  arraytype = build_array_type_nelts (TREE_TYPE (type),
+				      TYPE_VECTOR_SUBPARTS (TREE_TYPE (vect)));
+  
+  return build4 (ARRAY_REF, TREE_TYPE (type),
+                 build1 (VIEW_CONVERT_EXPR, arraytype, tmpvec),
+                 idx, NULL_TREE, NULL_TREE);
+}
+
+/* Check if VEC_SHUFFLE_EXPR within the given setting is supported
+   by hardware, or lower it piecewise.  Function returns false when
+   the expression must be replaced with TRAP_RETURN, true otherwise.
+
+   When VEC_SHUFFLE_EXPR has the same first and second operands:
+   VEC_SHUFFLE_EXPR <v0, v0, mask> the lowered version would be
+   {v0[mask[0]], v0[mask[1]], ...}
+   MASK and V0 must have the same number of elements.
+        
+   Otherwise VEC_SHUFFLE_EXPR <v0, v1, mask> is lowered to
+   {mask[0] < len(v0) ? v0[mask[0]] : v1[mask[0]], ...}
+   V0 and V1 must have the same type.  MASK, V0, V1 must have the
+   same number of arguments.  */
+static bool
+lower_vec_shuffle (gimple_stmt_iterator *gsi, location_t loc)
+{
+ 
+  gimple stmt = gsi_stmt (*gsi);
+  tree mask = gimple_assign_rhs3 (stmt);
+  tree vec0 = gimple_assign_rhs1 (stmt);
+  tree vec1 = gimple_assign_rhs2 (stmt);
+  unsigned els = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask));
+  tree type0 = TREE_TYPE (TREE_TYPE (vec0));
+  VEC(constructor_elt,gc) *v = NULL;
+  tree vectype, constr;
+  tree vec0tmp = NULL_TREE, masktmp = NULL_TREE;
+
+  if (expand_vec_shuffle_expr_p (TYPE_MODE (TREE_TYPE (vec0)), vec0, vec1, mask))
+    {
+      tree t;
+
+      t = gimplify_build3 (gsi, VEC_SHUFFLE_EXPR, TREE_TYPE (vec0),
+			   vec0, vec1, mask);
+      gimple_assign_set_rhs_from_tree (gsi, t);
+      /* Statement should be updated by callee.  */
+      return true;
+    }
+  
+  if (operand_equal_p (vec0, vec1, 0))
+    {
+      unsigned i;
+      tree vec0tmp = NULL_TREE;
+      
+      v = VEC_alloc (constructor_elt, gc, els);
+      for (i = 0; i < els; i++)
+        {
+          tree idxval, vecel, t;
+	   
+	  idxval = vector_element (gsi, mask, size_int (i), &masktmp);
+          if (idxval == error_mark_node)
+            {
+              if (warning_at (loc, 0, "Invalid shuffling mask index %i", i))
+		inform (loc, "if this code is reached the programm will abort");
+	      return false;
+            }
+
+	  vecel = vector_element (gsi, vec0, idxval, &vec0tmp);
+          if (vecel == error_mark_node)
+            {
+              if (warning_at (loc, 0, "Invalid shuffling arguments"))
+		inform (loc, "if this code is reached the programm will abort");
+	      return false;
+            }
+          
+          t = force_gimple_operand_gsi (gsi, vecel, true,
+					NULL_TREE, true, GSI_SAME_STMT);
+          CONSTRUCTOR_APPEND_ELT (v, size_int (i), t);
+        }
+    }
+  else
+    {
+      unsigned i;
+      tree var = create_tmp_var (type0, "vecel");
+      tree vec1tmp = NULL_TREE;
+
+      v = VEC_alloc (constructor_elt, gc, els);
+      for (i = 0; i < els; i++)
+        {
+          tree idxval, idx1val, cond, elval0, elval1, condexpr, t, ssatmp;
+          tree vec0el, vec1el;
+          gimple asgn;
+          
+          idxval = vector_element (gsi, mask, size_int (i), &masktmp);
+	  if (idxval == error_mark_node)
+            {
+              if (warning_at (loc, 0, "Invalid shuffling mask index %i", i))
+		inform (loc, "if this code is reached the programm will abort");
+	      return false;
+            }
+          
+          if (TREE_CODE (idxval) == INTEGER_CST)
+            {
+              if (tree_int_cst_lt (idxval, size_int (els)))
+                {
+                  vec0el = vector_element (gsi, vec0, idxval, &vec0tmp);
+                  t = force_gimple_operand_gsi (gsi, vec0el,
+                                    true, NULL_TREE, true, GSI_SAME_STMT);
+                }
+              else if (tree_int_cst_lt (idxval, size_int (2*els)))
+                {
+                  idx1val = fold_build2 (MINUS_EXPR, TREE_TYPE (idxval),
+                        idxval, build_int_cst (TREE_TYPE (idxval), els));
+                  
+                  vec1el = vector_element (gsi, vec1, idx1val, &vec1tmp);
+                  t = force_gimple_operand_gsi (gsi, vec1el, true,
+						NULL_TREE, true, GSI_SAME_STMT);
+                }
+              else
+                {
+                  if (warning_at (loc, 0, "Invalid shuffling mask index %i", i))
+		    inform (loc, "if this code is reached the "
+				  "programm will abort");
+		  return false;
+                }
+            }
+          else
+            {
+
+              idx1val = fold_build2 (MINUS_EXPR, TREE_TYPE (idxval),
+                            idxval, build_int_cst (TREE_TYPE (idxval), els));
+              idx1val = force_gimple_operand_gsi (gsi, idx1val,
+                                true, NULL_TREE, true, GSI_SAME_STMT);
+              cond = fold_build2 (GT_EXPR, boolean_type_node, \
+                             idxval, fold_convert (type0, size_int (els - 1)));
+              
+	      vec0el = vector_element (gsi, vec0, idxval, &vec0tmp);
+              if (vec0el == error_mark_node)
+                {
+                  if (warning_at (loc, 0, "Invalid shuffling arguments"))
+		    inform (loc, "if this code is reached the "
+				 "programm will abort");
+		  return false;
+                }
+
+              elval0 = force_gimple_operand_gsi (gsi, vec0el,
+                                true, NULL_TREE, true, GSI_SAME_STMT);
+	      
+	      vec1el = vector_element (gsi, vec1, idx1val, &vec1tmp);
+              if (vec1el == error_mark_node)
+                {
+                  if (warning_at (loc, 0, "Invalid shuffling arguments"))
+		    inform (loc, "if this code is reached the "
+				 "programm will abort");
+		  return false;
+                }
+
+              elval1 = force_gimple_operand_gsi (gsi, vec1el,
+                                true, NULL_TREE, true, GSI_SAME_STMT);
+
+              condexpr = fold_build3 (COND_EXPR, type0, cond, \
+                                      elval1, elval0);
+
+              t = force_gimple_operand_gsi (gsi, condexpr, true, \
+                                        NULL_TREE, true, GSI_SAME_STMT);
+            }
+          
+          asgn = gimple_build_assign (var, t);
+          ssatmp = make_ssa_name (var, asgn);
+          gimple_assign_set_lhs (asgn, ssatmp);
+          gsi_insert_before (gsi, asgn, GSI_SAME_STMT);
+          CONSTRUCTOR_APPEND_ELT (v, size_int (i), ssatmp);
+        }
+    }
+  
+  vectype = build_vector_type (type0, els);
+  constr = build_constructor (vectype, v);
+  gimple_assign_set_rhs_from_tree (gsi, constr);
+  /* Statement should be updated by callee.  */
+  return true;
+}
+
 /* Process one statement.  If we identify a vector operation, expand it.  */
 
 static void
@@ -451,6 +709,25 @@ expand_vector_operations_1 (gimple_stmt_
   code = gimple_assign_rhs_code (stmt);
   rhs_class = get_gimple_rhs_class (code);
 
+  if (code == VEC_SHUFFLE_EXPR)
+    {
+      if (!lower_vec_shuffle (gsi, gimple_location (stmt)))
+	{
+	  gimple new_stmt;
+	  tree vec0;
+	  
+	  vec0 = gimple_assign_rhs1 (stmt);
+	  new_stmt = gimple_build_call (built_in_decls[BUILT_IN_TRAP], 0);
+	  gsi_insert_before (gsi, new_stmt,  GSI_SAME_STMT);
+	  split_block (gimple_bb (new_stmt), new_stmt);
+	  new_stmt = gimple_build_assign (gimple_assign_lhs (stmt), vec0);
+	  gsi_replace (gsi, new_stmt, false);
+	}
+
+      gimple_set_modified (gsi_stmt (*gsi), true);
+      update_stmt (gsi_stmt (*gsi));
+    }
+
   if (rhs_class != GIMPLE_UNARY_RHS && rhs_class != GIMPLE_BINARY_RHS)
     return;
 
@@ -613,9 +890,9 @@ expand_vector_operations_1 (gimple_stmt_
    if it may need the bit-twiddling tricks implemented in this file.  */
 
 static bool
-gate_expand_vector_operations (void)
+gate_expand_vector_operations_ssa (void)
 {
-  return flag_tree_vectorize != 0;
+  return optimize == 0;
 }
 
 static unsigned int
@@ -648,7 +925,7 @@ struct gimple_opt_pass pass_lower_vector
  {
   GIMPLE_PASS,
   "veclower",				/* name */
-  0,					/* gate */
+  gate_expand_vector_operations_ssa,    /* gate */
   expand_vector_operations,		/* execute */
   NULL,					/* sub */
   NULL,					/* next */
@@ -661,6 +938,7 @@ struct gimple_opt_pass pass_lower_vector
   TODO_update_ssa	                /* todo_flags_finish */
     | TODO_verify_ssa
     | TODO_verify_stmts | TODO_verify_flow
+    | TODO_cleanup_cfg
  }
 };
 
@@ -669,7 +947,7 @@ struct gimple_opt_pass pass_lower_vector
  {
   GIMPLE_PASS,
   "veclower2",				/* name */
-  gate_expand_vector_operations,	/* gate */
+  0,	                                /* gate */
   expand_vector_operations,		/* execute */
   NULL,					/* sub */
   NULL,					/* next */
@@ -682,6 +960,7 @@ struct gimple_opt_pass pass_lower_vector
   TODO_update_ssa	                /* todo_flags_finish */
     | TODO_verify_ssa
     | TODO_verify_stmts | TODO_verify_flow
+    | TODO_cleanup_cfg
  }
 };
 
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 178354)
+++ gcc/Makefile.in	(working copy)
@@ -3178,7 +3178,7 @@ tree-vect-generic.o : tree-vect-generic.
     $(TM_H) $(TREE_FLOW_H) $(GIMPLE_H) tree-iterator.h $(TREE_PASS_H) \
     $(FLAGS_H) $(OPTABS_H) $(MACHMODE_H) $(EXPR_H) \
     langhooks.h $(FLAGS_H) $(DIAGNOSTIC_H) gt-tree-vect-generic.h $(GGC_H) \
-    coretypes.h insn-codes.h
+    coretypes.h insn-codes.h $(DIAGNOSTIC_H)
 df-core.o : df-core.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    insn-config.h $(RECOG_H) $(FUNCTION_H) $(REGS_H) alloc-pool.h \
    hard-reg-set.h $(BASIC_BLOCK_H) $(DF_H) $(BITMAP_H) sbitmap.h $(TIMEVAR_H) \
Index: gcc/gimple.c
===================================================================
--- gcc/gimple.c	(revision 178354)
+++ gcc/gimple.c	(working copy)
@@ -2615,6 +2615,7 @@ get_gimple_rhs_num_ops (enum tree_code c
       || (SYM) == WIDEN_MULT_MINUS_EXPR					    \
       || (SYM) == DOT_PROD_EXPR						    \
       || (SYM) == REALIGN_LOAD_EXPR					    \
+      || (SYM) == VEC_SHUFFLE_EXPR					    \
       || (SYM) == FMA_EXPR) ? GIMPLE_TERNARY_RHS			    \
    : ((SYM) == COND_EXPR						    \
       || (SYM) == CONSTRUCTOR						    \
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	(revision 178354)
+++ gcc/tree-cfg.c	(working copy)
@@ -3711,6 +3711,59 @@ verify_gimple_assign_ternary (gimple stm
 	}
       break;
 
+    case VEC_SHUFFLE_EXPR:
+      if (!useless_type_conversion_p (lhs_type, rhs1_type)
+	  || !useless_type_conversion_p (lhs_type, rhs2_type))
+	{
+	  error ("type mismatch in vector shuffle expression");
+	  debug_generic_expr (lhs_type);
+	  debug_generic_expr (rhs1_type);
+	  debug_generic_expr (rhs2_type);
+	  debug_generic_expr (rhs3_type);
+	  return true;
+	}
+
+      if (TREE_CODE (rhs1_type) != VECTOR_TYPE
+	  || TREE_CODE (rhs2_type) != VECTOR_TYPE
+	  || TREE_CODE (rhs3_type) != VECTOR_TYPE)
+	{
+	  error ("vector types expected in vector shuffle expression");
+	  debug_generic_expr (lhs_type);
+	  debug_generic_expr (rhs1_type);
+	  debug_generic_expr (rhs2_type);
+	  debug_generic_expr (rhs3_type);
+	  return true;
+	}
+
+      if (TYPE_VECTOR_SUBPARTS (rhs1_type) != TYPE_VECTOR_SUBPARTS (rhs2_type)
+	  || TYPE_VECTOR_SUBPARTS (rhs2_type)
+	     != TYPE_VECTOR_SUBPARTS (rhs3_type)
+	  || TYPE_VECTOR_SUBPARTS (rhs3_type)
+	     != TYPE_VECTOR_SUBPARTS (lhs_type))
+	{
+	  error ("vectors with different element number found "
+		 "in vector shuffle expression");
+	  debug_generic_expr (lhs_type);
+	  debug_generic_expr (rhs1_type);
+	  debug_generic_expr (rhs2_type);
+	  debug_generic_expr (rhs3_type);
+	  return true;
+	}
+
+      if (TREE_CODE (TREE_TYPE (rhs3_type)) != INTEGER_TYPE
+	  || GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (rhs3_type)))
+	     != GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (rhs1_type))))
+	{
+	  error ("invalid mask type in vector shuffle expression");
+	  debug_generic_expr (lhs_type);
+	  debug_generic_expr (rhs1_type);
+	  debug_generic_expr (rhs2_type);
+	  debug_generic_expr (rhs3_type);
+	  return true;
+	}
+
+      return false;
+
     case DOT_PROD_EXPR:
     case REALIGN_LOAD_EXPR:
       /* FIXME.  */
Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 178354)
+++ gcc/passes.c	(working copy)
@@ -1354,7 +1354,6 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_vectorize);
 	    {
 	      struct opt_pass **p = &pass_vectorize.pass.sub;
-	      NEXT_PASS (pass_lower_vector_ssa);
 	      NEXT_PASS (pass_dce_loop);
 	    }
           NEXT_PASS (pass_predcom);
@@ -1366,6 +1365,7 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_lim);
 	  NEXT_PASS (pass_tree_loop_done);
 	}
+      NEXT_PASS (pass_lower_vector_ssa);
       NEXT_PASS (pass_cse_reciprocals);
       NEXT_PASS (pass_reassoc);
       NEXT_PASS (pass_vrp);
Index: gcc/c-parser.c
===================================================================
--- gcc/c-parser.c	(revision 178354)
+++ gcc/c-parser.c	(working copy)
@@ -5989,6 +5989,46 @@ c_parser_alignof_expression (c_parser *p
     }
 }
 
+/* Helper function to read arguments of builtins which are interfaces
+   for the middle-end nodes like COMPLEX_EXPR, VEC_SHUFFLE_EXPR and
+   others.  The name of the builtin is passed using BNAME parameter.
+   Function returns true if there were no errors while parsing and
+   stores the arguments in EXPR_LIST.  List of original types can be
+   obtained by passing non NULL value to ORIG_TYPES.  */
+static bool
+c_parser_get_builtin_args (c_parser *parser, const char *bname,
+			   VEC(tree,gc) **expr_list,
+			   VEC(tree,gc) **orig_types)
+{
+  location_t loc = c_parser_peek_token (parser)->location;
+  *expr_list = NULL;
+
+  if (c_parser_next_token_is_not (parser, CPP_OPEN_PAREN))
+    {
+      error_at (loc, "cannot take address of %qs", bname);
+      return false;
+    }
+
+  c_parser_consume_token (parser);
+
+  if (c_parser_next_token_is (parser, CPP_CLOSE_PAREN))
+    {
+      c_parser_consume_token (parser);
+      return true;
+    }
+    
+  if (orig_types)
+    *expr_list = c_parser_expr_list (parser, false, false, orig_types);
+  else
+    *expr_list = c_parser_expr_list (parser, false, false, NULL);
+
+  if (!c_parser_require (parser, CPP_CLOSE_PAREN, "expected %<)%>"))
+    return false;
+
+  return true;
+}
+
+
 /* Parse a postfix expression (C90 6.3.1-6.3.2, C99 6.5.1-6.5.2).
 
    postfix-expression:
@@ -6027,6 +6067,10 @@ c_parser_alignof_expression (c_parser *p
 			     assignment-expression )
      __builtin_types_compatible_p ( type-name , type-name )
      __builtin_complex ( assignment-expression , assignment-expression )
+     __builtin_shuffle ( assignment-expression , assignment-expression )
+     __builtin_shuffle ( assignment-expression , 
+			 assignment-expression ,
+			 assignment-expression, )
 
    offsetof-member-designator:
      identifier
@@ -6047,7 +6091,7 @@ c_parser_alignof_expression (c_parser *p
 static struct c_expr
 c_parser_postfix_expression (c_parser *parser)
 {
-  struct c_expr expr, e1, e2, e3;
+  struct c_expr expr, e1;
   struct c_type_name *t1, *t2;
   location_t loc = c_parser_peek_token (parser)->location;;
   expr.original_code = ERROR_MARK;
@@ -6333,45 +6377,55 @@ c_parser_postfix_expression (c_parser *p
 	  }
 	  break;
 	case RID_CHOOSE_EXPR:
-	  c_parser_consume_token (parser);
-	  if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
-	    {
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  loc = c_parser_peek_token (parser)->location;
-	  e1 = c_parser_expr_no_commas (parser, NULL);
-	  if (!c_parser_require (parser, CPP_COMMA, "expected %<,%>"))
-	    {
-	      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  e2 = c_parser_expr_no_commas (parser, NULL);
-	  if (!c_parser_require (parser, CPP_COMMA, "expected %<,%>"))
-	    {
-	      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  e3 = c_parser_expr_no_commas (parser, NULL);
-	  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
-				     "expected %<)%>");
 	  {
-	    tree c;
+	    VEC(tree,gc) *expr_list;
+	    VEC(tree,gc) *orig_types;
+	    tree e1value, e2value, e3value, c;
 
-	    c = e1.value;
-	    mark_exp_read (e2.value);
-	    mark_exp_read (e3.value);
+	    c_parser_consume_token (parser);
+	    if (!c_parser_get_builtin_args (parser,
+					    "__builtin_choose_expr",
+					    &expr_list, &orig_types))
+	      {
+		expr.value = error_mark_node;
+		break;
+	      }
+
+	    if (VEC_length (tree, expr_list) != 3)
+	      {
+		error_at (loc, "wrong number of arguments to "
+			       "%<__builtin_choose_expr%>");
+		expr.value = error_mark_node;
+		break;
+	      }
+	    
+	    e1value = VEC_index (tree, expr_list, 0);
+	    e2value = VEC_index (tree, expr_list, 1);
+	    e3value = VEC_index (tree, expr_list, 2);
+
+	    c = e1value;
+	    mark_exp_read (e2value);
+	    mark_exp_read (e3value);
 	    if (TREE_CODE (c) != INTEGER_CST
 		|| !INTEGRAL_TYPE_P (TREE_TYPE (c)))
 	      error_at (loc,
 			"first argument to %<__builtin_choose_expr%> not"
 			" a constant");
 	    constant_expression_warning (c);
-	    expr = integer_zerop (c) ? e3 : e2;
+	    
+	    if (integer_zerop (c))
+	      {
+		expr.value = e3value;
+		expr.original_type = VEC_index (tree, orig_types, 2);
+	      }
+	    else
+	      {
+		expr.value = e2value;
+		expr.original_type = VEC_index (tree, orig_types, 1);
+	      }
+
+	    break;
 	  }
-	  break;
 	case RID_TYPES_COMPATIBLE_P:
 	  c_parser_consume_token (parser);
 	  if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
@@ -6410,57 +6464,96 @@ c_parser_postfix_expression (c_parser *p
 	  }
 	  break;
 	case RID_BUILTIN_COMPLEX:
-	  c_parser_consume_token (parser);
-	  if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
-	    {
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  loc = c_parser_peek_token (parser)->location;
-	  e1 = c_parser_expr_no_commas (parser, NULL);
-	  if (!c_parser_require (parser, CPP_COMMA, "expected %<,%>"))
-	    {
-	      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  e2 = c_parser_expr_no_commas (parser, NULL);
-	  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
-				     "expected %<)%>");
-	  mark_exp_read (e1.value);
-	  if (TREE_CODE (e1.value) == EXCESS_PRECISION_EXPR)
-	    e1.value = convert (TREE_TYPE (e1.value),
-				TREE_OPERAND (e1.value, 0));
-	  mark_exp_read (e2.value);
-	  if (TREE_CODE (e2.value) == EXCESS_PRECISION_EXPR)
-	    e2.value = convert (TREE_TYPE (e2.value),
-				TREE_OPERAND (e2.value, 0));
-	  if (!SCALAR_FLOAT_TYPE_P (TREE_TYPE (e1.value))
-	      || DECIMAL_FLOAT_TYPE_P (TREE_TYPE (e1.value))
-	      || !SCALAR_FLOAT_TYPE_P (TREE_TYPE (e2.value))
-	      || DECIMAL_FLOAT_TYPE_P (TREE_TYPE (e2.value)))
-	    {
-	      error_at (loc, "%<__builtin_complex%> operand "
-			"not of real binary floating-point type");
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  if (TYPE_MAIN_VARIANT (TREE_TYPE (e1.value))
-	      != TYPE_MAIN_VARIANT (TREE_TYPE (e2.value)))
-	    {
-	      error_at (loc,
-			"%<__builtin_complex%> operands of different types");
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  if (!flag_isoc99)
-	    pedwarn (loc, OPT_pedantic,
-		     "ISO C90 does not support complex types");
-	  expr.value = build2 (COMPLEX_EXPR,
-			       build_complex_type (TYPE_MAIN_VARIANT
-						   (TREE_TYPE (e1.value))),
-			       e1.value, e2.value);
-	  break;
+	  { 
+	    VEC(tree,gc) *expr_list;
+	    tree e1value, e2value;
+	    
+	    c_parser_consume_token (parser);
+	    if (!c_parser_get_builtin_args (parser,
+					    "__builtin_complex",
+					    &expr_list, NULL))
+	      {
+		expr.value = error_mark_node;
+		break;
+	      }
+
+	    if (VEC_length (tree, expr_list) != 2)
+	      {
+		error_at (loc, "wrong number of arguments to "
+			       "%<__builtin_complex%>");
+		expr.value = error_mark_node;
+		break;
+	      }
+	    
+	    e1value = VEC_index (tree, expr_list, 0);
+	    e2value = VEC_index (tree, expr_list, 1);
+
+	    mark_exp_read (e1value);
+	    if (TREE_CODE (e1value) == EXCESS_PRECISION_EXPR)
+	      e1value = convert (TREE_TYPE (e1value),
+				 TREE_OPERAND (e1value, 0));
+	    mark_exp_read (e2value);
+	    if (TREE_CODE (e2value) == EXCESS_PRECISION_EXPR)
+	      e2value = convert (TREE_TYPE (e2value),
+				 TREE_OPERAND (e2value, 0));
+	    if (!SCALAR_FLOAT_TYPE_P (TREE_TYPE (e1value))
+		|| DECIMAL_FLOAT_TYPE_P (TREE_TYPE (e1value))
+		|| !SCALAR_FLOAT_TYPE_P (TREE_TYPE (e2value))
+		|| DECIMAL_FLOAT_TYPE_P (TREE_TYPE (e2value)))
+	      {
+		error_at (loc, "%<__builtin_complex%> operand "
+			  "not of real binary floating-point type");
+		expr.value = error_mark_node;
+		break;
+	      }
+	    if (TYPE_MAIN_VARIANT (TREE_TYPE (e1value))
+		!= TYPE_MAIN_VARIANT (TREE_TYPE (e2value)))
+	      {
+		error_at (loc,
+			  "%<__builtin_complex%> operands of different types");
+		expr.value = error_mark_node;
+		break;
+	      }
+	    if (!flag_isoc99)
+	      pedwarn (loc, OPT_pedantic,
+		       "ISO C90 does not support complex types");
+	    expr.value = build2 (COMPLEX_EXPR,
+				 build_complex_type (TYPE_MAIN_VARIANT
+						     (TREE_TYPE (e1value))),
+				 e1value, e2value);
+	    break;
+	  }
+	case RID_BUILTIN_SHUFFLE:
+	  {
+	    VEC(tree,gc) *expr_list;
+	    
+	    c_parser_consume_token (parser);
+	    if (!c_parser_get_builtin_args (parser,
+					    "__builtin_shuffle",
+					    &expr_list, NULL))
+	      {
+		expr.value = error_mark_node;
+		break;
+	      }
+
+	    if (VEC_length (tree, expr_list) == 2)
+	      expr.value = c_build_vec_shuffle_expr
+				(loc, VEC_index (tree, expr_list, 0),
+				 NULL_TREE,
+				 VEC_index (tree, expr_list, 1));
+	    else if (VEC_length (tree, expr_list) == 3)
+	      expr.value = c_build_vec_shuffle_expr
+				(loc, VEC_index (tree, expr_list, 0),
+				 VEC_index (tree, expr_list, 1),
+				 VEC_index (tree, expr_list, 2));
+	    else
+	      {
+		error_at (loc, "wrong number of arguments to "
+			       "%<__builtin_shuffle%>");
+		expr.value = error_mark_node;
+	      }
+	    break;
+	  }
 	case RID_AT_SELECTOR:
 	  gcc_assert (c_dialect_objc ());
 	  c_parser_consume_token (parser);
Index: gcc/config/i386/sse.md
===================================================================
--- gcc/config/i386/sse.md	(revision 178354)
+++ gcc/config/i386/sse.md	(working copy)
@@ -231,6 +231,12 @@ (define_mode_attr sseinsnmode
    (V4SF "V4SF") (V2DF "V2DF")
    (TI "TI") (V32QI "OI") (V16HI "OI") (V8SI "OI") (V4DI "OI")])
 
+;; All 128bit vector modes
+(define_mode_attr sseshuffint
+  [(V16QI "V16QI") (V8HI "V8HI")
+   (V4SI "V4SI")  (V2DI "V2DI")
+   (V4SF "V4SI") (V2DF "V2DI")])
+
 ;; Mapping of vector float modes to an integer mode of the same size
 (define_mode_attr sseintvecmode
   [(V8SF "V8SI") (V4DF "V4DI")
@@ -6234,6 +6240,18 @@ (define_expand "vconduv2di"
   DONE;
 })
 
+(define_expand "vshuffle<mode>"
+  [(match_operand:V_128 0 "register_operand" "")
+   (match_operand:V_128 1 "general_operand" "")
+   (match_operand:<sseshuffint> 2 "general_operand" "")]
+  "TARGET_SSE3 || TARGET_AVX"
+{
+  bool ok = ix86_expand_vshuffle (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel bitwise logical operations
Index: gcc/config/i386/i386-protos.h
===================================================================
--- gcc/config/i386/i386-protos.h	(revision 178354)
+++ gcc/config/i386/i386-protos.h	(working copy)
@@ -118,6 +118,7 @@ extern bool ix86_expand_int_movcc (rtx[]
 extern bool ix86_expand_fp_movcc (rtx[]);
 extern bool ix86_expand_fp_vcond (rtx[]);
 extern bool ix86_expand_int_vcond (rtx[]);
+extern bool ix86_expand_vshuffle (rtx[]);
 extern void ix86_expand_sse_unpack (rtx[], bool, bool);
 extern bool ix86_expand_int_addcc (rtx[]);
 extern rtx ix86_expand_call (rtx, rtx, rtx, rtx, rtx, bool);
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 178354)
+++ gcc/config/i386/i386.c	(working copy)
@@ -18693,6 +18693,96 @@ ix86_expand_int_vcond (rtx operands[])
   return true;
 }
 
+bool
+ix86_expand_vshuffle (rtx operands[])
+{
+  rtx target = operands[0];
+  rtx op0 = operands[1];
+  rtx mask = operands[2];
+  rtx mm, vt, cv0, t1;
+  enum machine_mode mode = GET_MODE (op0);
+  enum machine_mode maskmode = GET_MODE (mask);
+  enum machine_mode maskinner = GET_MODE_INNER (mode);
+  rtx vec[16];
+  int w, i, j;
+
+  gcc_assert ((TARGET_SSE3 || TARGET_AVX) && GET_MODE_BITSIZE (mode) == 128);
+
+  op0 = force_reg (mode, op0);
+  mask = force_reg (maskmode, mask);
+
+  /* Number of elements in the vector.  */
+  w = GET_MODE_BITSIZE (maskmode) / GET_MODE_BITSIZE (maskinner);
+ 
+  /* mask = mask & {w-1, w-1, w-1,...} */
+  for (i = 0; i < w; i++)
+    vec[i] = GEN_INT (w - 1);
+
+  mm = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (w, vec));
+  mm = force_reg (maskmode, mm);
+
+  mask = gen_rtx_AND (maskmode, mask, mm);
+  
+  /* Convert mask to vector of chars.  */
+  mask = simplify_gen_subreg (V16QImode, mask, maskmode, 0);
+  mask = force_reg (V16QImode, mask);
+
+
+  /* Build a helper mask wich we will use in pshufb
+     (v4si) --> {0,0,0,0, 4,4,4,4, 8,8,8,8, 12,12,12,12}
+     (v8hi) --> {0,0, 2,2, 4,4, 6,6, ...}
+     ...  */
+  for (i = 0; i < w; i++)
+    for (j = 0; j < 16/w; j++)
+      vec[i*w+j] = GEN_INT (i*16/w);
+
+  vt = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
+  vt = force_reg (V16QImode, vt);
+  
+  t1 = gen_reg_rtx (V16QImode);
+  emit_insn (gen_ssse3_pshufbv16qi3 (t1, mask, vt));
+  mm = t1;
+
+  /* MM contains now something like
+     mm = {m[0], .., m[0], m[k], .., m[k], ... }, where 
+     m[i] is an index of the element in the vector we are
+     selecting from.
+
+     Convert it into the byte positions by doing
+     mm = mm * {16/w, 16/w, ...}
+     mm = mm + {0,1,..,16/w, 0,1,..,16/w, ...}  */
+  for (i = 0; i < 16; i++)
+    vec[i] = GEN_INT (16/w);
+
+  cv0 = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
+  cv0 = force_reg (V16QImode, cv0);
+  mm = gen_rtx_MULT (V16QImode, mm, cv0);
+
+  for (i = 0; i < w; i++)
+    for (j = 0; j < 16/w; j++)
+      vec[i*w+j] = GEN_INT (j);
+
+  cv0 = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
+  cv0 = force_reg (V16QImode, cv0);
+  mm = gen_rtx_PLUS (V16QImode, mm, cv0);
+  mm = force_reg (V16QImode, mm);
+
+  t1 = gen_reg_rtx (V16QImode);
+  
+  /* Convert OP0 to vector of chars.  */
+  op0 = simplify_gen_subreg (V16QImode, op0, mode, 0);
+  op0 = force_reg (V16QImode, op0);
+  emit_insn (gen_ssse3_pshufbv16qi3 (t1, op0, mm));
+  
+  /* Convert it back from vector of chars to the original mode.  */
+  t1 = simplify_gen_subreg (mode, t1, V16QImode, 0);
+  
+  emit_insn (gen_rtx_SET (VOIDmode, target, t1));
+ 
+  fprintf (stderr, "-- %s called\n", __func__);
+  return true;
+}
+
 /* Unpack OP[1] into the next wider integer vector type.  UNSIGNED_P is
    true if we should do zero extension, else sign extension.  HIGH_P is
    true if we want the N/2 high elements, else the low elements.  */
@@ -30911,6 +31001,9 @@ struct expand_vec_perm_d
 
 static bool expand_vec_perm_1 (struct expand_vec_perm_d *d);
 static bool expand_vec_perm_broadcast_1 (struct expand_vec_perm_d *d);
+static int extract_vec_perm_cst (struct expand_vec_perm_d *, tree);
+static bool ix86_vectorize_builtin_vec_perm_ok (tree vec_type, tree mask);
+
 
 /* Get a vector mode of the same size as the original but with elements
    twice as wide.  This is only guaranteed to apply to integral vectors.  */
@@ -34576,10 +34669,10 @@ ix86_vectorize_builtin_vec_perm_ok (tree
 
   vec_mask = extract_vec_perm_cst (&d, mask);
 
-  /* This hook is cannot be called in response to something that the
-     user does (unlike the builtin expander) so we shouldn't ever see
-     an error generated from the extract.  */
-  gcc_assert (vec_mask > 0 && vec_mask <= 3);
+  /* Check whether the mask can be applied to the vector type.  */
+  if (vec_mask < 0 || vec_mask > 3)
+    return false;
+  
   one_vec = (vec_mask != 3);
 
   /* Implementable with shufps or pshufd.  */
Index: gcc/tree-ssa-operands.c
===================================================================
--- gcc/tree-ssa-operands.c	(revision 178354)
+++ gcc/tree-ssa-operands.c	(working copy)
@@ -943,6 +943,7 @@ get_expr_operands (gimple stmt, tree *ex
 
     case COND_EXPR:
     case VEC_COND_EXPR:
+    case VEC_SHUFFLE_EXPR:
       get_expr_operands (stmt, &TREE_OPERAND (expr, 0), uflags);
       get_expr_operands (stmt, &TREE_OPERAND (expr, 1), uflags);
       get_expr_operands (stmt, &TREE_OPERAND (expr, 2), uflags);

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-09-09 17:04                                 ` Artem Shinkarov
@ 2011-09-12  8:02                                   ` Richard Guenther
  2011-09-13 17:48                                   ` Joseph S. Myers
  2011-09-15 20:36                                   ` Richard Henderson
  2 siblings, 0 replies; 71+ messages in thread
From: Richard Guenther @ 2011-09-12  8:02 UTC (permalink / raw)
  To: Artem Shinkarov; +Cc: Joseph S. Myers, Duncan Sands, gcc-patches

On Fri, Sep 9, 2011 at 5:51 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> Hi, sorry for the delay, I had a lot of other stuff to do.
>
> In the attachment there is a new patch that fixes all the issues
> pointed by Joseph and almost all the issues pointed by Richard. The
> issues that are not fixed are explained further.
>
> Artem.
>
>>+      if (TREE_CODE (vect) == VECTOR_CST)
>>+        {
>>+            unsigned i;
>>+            tree vals = TREE_VECTOR_CST_ELTS (vect);
>>+            for (i = 0; vals; vals = TREE_CHAIN (vals), ++i)
>>+              if (i == index)
>>
>>operand_equal_p
>
> Sorry, I didn't get this comment. It looks fine to me without any changes.

Error on my side, the code is ok.

>>+  if (need_asgn)
>>+    {
>>+      TREE_ADDRESSABLE (tmpvec) = 1;
>>
>>this should be at the point we call create_tmp_var.
>
> Here we are talking about writing this line two times several lines
> upper. I would like to leave it just to avoid code duplication.

Well, it's just that they are supposed to be occuring in pairs only,
so it would make the code more understandable.  So please do the duplication.

>>Index: gcc/passes.c
>>===========================================================>========
>>--- gcc/passes.c        (revision 178354)
>>+++ gcc/passes.c        (working copy)
>>@@ -1354,7 +1354,6 @@ init_optimization_passes (void)
>>         NEXT_PASS (pass_vectorize);
>>           {
>>             struct opt_pass **p = &pass_vectorize.pass.sub;
>>-             NEXT_PASS (pass_lower_vector_ssa);
>>             NEXT_PASS (pass_dce_loop);
>>           }
>>          NEXT_PASS (pass_predcom);
>>@@ -1366,6 +1365,7 @@ init_optimization_passes (void)
>>         NEXT_PASS (pass_lim);
>>         NEXT_PASS (pass_tree_loop_done);
>>       }
>>+      NEXT_PASS (pass_lower_vector_ssa);
>>
>>This change should not be neccesary.
>
> Without this change the vector lowering with -Ox does not execute the
> pass at all. So the overall idea is to make sure that if we have -Ox
> we make lowering only once and as late as possible.
>
> But maybe I am just missing something.

Ah, I thought we had already converted the -O0 pass to work like the
complex lowering at -O0 (which uses a property).  So hm, I guess the
change is ok as well.

I'm on vacation for the next two weeks, conditional on the x86 approval
I'll take care of committing the patch after I return.

Richard.

>
>
> Thanks,
> Artem.
>
> P.S. X86 PEOPLE, WHERE ARE YOU? :)
>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-09-09 17:04                                 ` Artem Shinkarov
  2011-09-12  8:02                                   ` Richard Guenther
@ 2011-09-13 17:48                                   ` Joseph S. Myers
  2011-09-15 20:36                                   ` Richard Henderson
  2 siblings, 0 replies; 71+ messages in thread
From: Joseph S. Myers @ 2011-09-13 17:48 UTC (permalink / raw)
  To: Artem Shinkarov; +Cc: Richard Guenther, Duncan Sands, gcc-patches

On Fri, 9 Sep 2011, Artem Shinkarov wrote:

> Hi, sorry for the delay, I had a lot of other stuff to do.
> 
> In the attachment there is a new patch that fixes all the issues
> pointed by Joseph and almost all the issues pointed by Richard. The
> issues that are not fixed are explained further.

The C front-end parts of this version of the patch are OK with the 
spurious whitespace change

> @@ -6120,7 +6213,7 @@ digest_init (location_t init_loc, tree t
>  	  tree value;
>  	  bool constant_p = true;
>  
> -	  /* Iterate through elements and check if all constructor
> +          /* Iterate through elements and check if all constructor
>  	     elements are *_CSTs.  */

removed.  The original version of this line was correctly indented with a 
TAB.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-09-09 17:04                                 ` Artem Shinkarov
  2011-09-12  8:02                                   ` Richard Guenther
  2011-09-13 17:48                                   ` Joseph S. Myers
@ 2011-09-15 20:36                                   ` Richard Henderson
  2011-09-28 13:43                                     ` Artem Shinkarov
  2 siblings, 1 reply; 71+ messages in thread
From: Richard Henderson @ 2011-09-15 20:36 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Joseph S. Myers, Richard Guenther, Duncan Sands, gcc-patches

> +The elements of the input vectors are numbered from left to right across
> +one or both of the vectors. Each element in the mask specifies a number
> +of element from the input vector(s). Consider the following example.

It would be more preferable to talk about the memory ordering of the
elements rather than "left" and "right" which are ambiguous at best.

> +  if (TREE_CODE (mask) == VECTOR_CST)
> +    {
> +      tree m_type, call;
> +      tree fn = targetm.vectorize.builtin_vec_perm (TREE_TYPE (v0), &m_type);
> +      /*rtx t;*/

Leftover crap.

> +
> +      if (!fn)
> +	goto vshuffle;
> +
> +      if (m_type != TREE_TYPE (TREE_TYPE (mask)))
> +	{
> +	  int units = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask));
> +	  tree cvt = build_vector_type (m_type, units);
> +	  mask = fold_convert (cvt, mask);
> +	}
> +
> +      call = fold_build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (fn)), fn);
> +      call = build_call_nary (type /* ? */, call, 3, v0, v1, mask);
> +
> +      return expand_expr_real_1 (call, target, VOIDmode, EXPAND_NORMAL, NULL);
> +    }
> +
> +vshuffle:
> +  gcc_assert (operand_equal_p (v0, v1, 0));

Why can't a non-constant shuffle have different V0 and V1?  That seems
like a direct violation of the documentation, and any sort of usefulness.

Also, while I'm ok with the use of builtin_vec_perm here in the short
term, I think that in the long term we should simply force the named
pattern to handle constants.  Then the vectorizer can simply use the
predicate and the tree code and we can drop the large redundancy of
builtins with different argument types.

Indeed, once this patch is applied, I think that ought to be the very
next task in this domain.

> +/* Vector shuffle expression.  A = VEC_SHUFFLE_EXPR<v0, v1, maks>

Typo in "mask".

> +   foreach i in length (mask):
> +     A = mask[i] < length (v0) ? v0[mask[i]] : v1[mask[i]]

Surely it's v1[mask[i] - length].

> +      if (TREE_CODE (vect) == VECTOR_CST)
> +        {
> +            unsigned i;

Indentation is off all through this function.

> +  mask = gen_rtx_AND (maskmode, mask, mm);
> +  
> +  /* Convert mask to vector of chars.  */
> +  mask = simplify_gen_subreg (V16QImode, mask, maskmode, 0);
> +  mask = force_reg (V16QImode, mask);

Why are you using force_reg to do all the dirty work?  Seems to
me this should be using expand_normal.  All throughout this 
function.  That would also avoid the need for all of the extra
force_reg stuff that ought not be there for -O0.

I also see that you're not even attempting to use xop_pperm.

Is ssse3_pshufb why you do the wrong thing in the expander for v0 != v1?
And give the vshuffle named pattern the wrong number of arguments?
It's certainly possible to handle it, though it takes a few more steps,
and might well be more efficient as a libgcc function rather than inline.


r~


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-09-15 20:36                                   ` Richard Henderson
@ 2011-09-28 13:43                                     ` Artem Shinkarov
  2011-09-28 15:20                                       ` Richard Henderson
  0 siblings, 1 reply; 71+ messages in thread
From: Artem Shinkarov @ 2011-09-28 13:43 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Joseph S. Myers, Richard Guenther, Duncan Sands, gcc-patches

On Thu, Sep 15, 2011 at 8:05 PM, Richard Henderson <rth@redhat.com> wrote:
>> +The elements of the input vectors are numbered from left to right across
>> +one or both of the vectors. Each element in the mask specifies a number
>> +of element from the input vector(s). Consider the following example.
>
> It would be more preferable to talk about the memory ordering of the
> elements rather than "left" and "right" which are ambiguous at best.
>
>> +  if (TREE_CODE (mask) == VECTOR_CST)
>> +    {
>> +      tree m_type, call;
>> +      tree fn = targetm.vectorize.builtin_vec_perm (TREE_TYPE (v0), &m_type);
>> +      /*rtx t;*/
>
> Leftover crap.

Fixed.

>> +
>> +      if (!fn)
>> +     goto vshuffle;
>> +
>> +      if (m_type != TREE_TYPE (TREE_TYPE (mask)))
>> +     {
>> +       int units = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask));
>> +       tree cvt = build_vector_type (m_type, units);
>> +       mask = fold_convert (cvt, mask);
>> +     }
>> +
>> +      call = fold_build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (fn)), fn);
>> +      call = build_call_nary (type /* ? */, call, 3, v0, v1, mask);
>> +
>> +      return expand_expr_real_1 (call, target, VOIDmode, EXPAND_NORMAL, NULL);
>> +    }
>> +
>> +vshuffle:
>> +  gcc_assert (operand_equal_p (v0, v1, 0));
>
> Why can't a non-constant shuffle have different V0 and V1?  That seems
> like a direct violation of the documentation, and any sort of usefulness.

Ok, I agree. The reason why this assert is here is that noone in the
middle-end generates the code that does not meet this assert. In
principle we definitely want to support it in the upcoming patches,
but it would be nice to start with a simple thing.

> Also, while I'm ok with the use of builtin_vec_perm here in the short
> term, I think that in the long term we should simply force the named
> pattern to handle constants.  Then the vectorizer can simply use the
> predicate and the tree code and we can drop the large redundancy of
> builtins with different argument types.
>
> Indeed, once this patch is applied, I think that ought to be the very
> next task in this domain.
>
>> +/* Vector shuffle expression.  A = VEC_SHUFFLE_EXPR<v0, v1, maks>
>
> Typo in "mask".
>
>> +   foreach i in length (mask):
>> +     A = mask[i] < length (v0) ? v0[mask[i]] : v1[mask[i]]
>
> Surely it's v1[mask[i] - length].
>
>> +      if (TREE_CODE (vect) == VECTOR_CST)
>> +        {
>> +            unsigned i;
>
> Indentation is off all through this function.

Fixed.

>> +  mask = gen_rtx_AND (maskmode, mask, mm);
>> +
>> +  /* Convert mask to vector of chars.  */
>> +  mask = simplify_gen_subreg (V16QImode, mask, maskmode, 0);
>> +  mask = force_reg (V16QImode, mask);
>
> Why are you using force_reg to do all the dirty work?  Seems to
> me this should be using expand_normal.  All throughout this
> function.  That would also avoid the need for all of the extra
> force_reg stuff that ought not be there for -O0.

I don't really understand this. As far as I know, expand_normal
"converts" tree to rtx. All my computations are happening at the level
of rtx and force_reg is needed just to bring an rtx expression to the
register of the correct mode. If I am missing something, could you
give an example how can I use expand_normal instead of force_reg in
this particular code.

> I also see that you're not even attempting to use xop_pperm.

As I said, I am happy to experiment with the cases v0 != v1 in the
upcoming patches. Let's just start with a simple thing and see what
kind of issues/problems it would bring.

> Is ssse3_pshufb why you do the wrong thing in the expander for v0 != v1?

My personal feeling is that it may be the case with v0 != v1, that it
would be more efficient to perform piecewise shuffling rather than
bitwise dances around the masks.

> And give the vshuffle named pattern the wrong number of arguments?

Ok, If I'll make vshuffle to accept only two arguments -- vector and
mask, would it be ok?

> It's certainly possible to handle it, though it takes a few more steps,
> and might well be more efficient as a libgcc function rather than inline.

I don't really understand why it could be more efficient. I thought
that inline gives more chances to the final RTL optimisation.
>
>
> r~
>
>
>

Thanks,
Artem.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-09-28 13:43                                     ` Artem Shinkarov
@ 2011-09-28 15:20                                       ` Richard Henderson
  2011-09-29 11:16                                         ` Artem Shinkarov
  0 siblings, 1 reply; 71+ messages in thread
From: Richard Henderson @ 2011-09-28 15:20 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Joseph S. Myers, Richard Guenther, Duncan Sands, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1446 bytes --]

On 09/28/2011 05:59 AM, Artem Shinkarov wrote:
> I don't really understand this. As far as I know, expand_normal
> "converts" tree to rtx. All my computations are happening at the level
> of rtx and force_reg is needed just to bring an rtx expression to the
> register of the correct mode. If I am missing something, could you
> give an example how can I use expand_normal instead of force_reg in
> this particular code.

Sorry, I meant expand_(simple_)binop.

>> Is ssse3_pshufb why you do the wrong thing in the expander for v0 != v1?
> 
> My personal feeling is that it may be the case with v0 != v1, that it
> would be more efficient to perform piecewise shuffling rather than
> bitwise dances around the masks.

Maybe for V2DI and V2DFmode, but probably not otherwise.

We can perform the double-word shuffle in 12 insns; 10 for SSE 4.1.
Example assembly attached.

>> It's certainly possible to handle it, though it takes a few more steps,
>> and might well be more efficient as a libgcc function rather than inline.
> 
> I don't really understand why it could be more efficient. I thought
> that inline gives more chances to the final RTL optimisation.

We'll not be able to optimize this at the rtl level.  There are too many
UNSPEC instructions in the way.  In any case, even if that weren't so we'd
only be able to do useful optimization for a constant permutation.  And
we should have been able to prove that at the gimple level.


r~

[-- Attachment #2: z.s --]
[-- Type: text/plain, Size: 587 bytes --]

	.data
	.align 16
vec3:	.long	3,3,3,3
vec4:	.long	4,4,4,4
dup4:	.byte	0,0,0,0, 4,4,4,4, 8,8,8,8, 12,12,12,12
ofs4:	.byte	0,1,2,3, 0,1,2,3, 0,1,2,3, 0,1,2,3

	.text
shuffle2:

	// Convert the low bits of the mask to a shuffle
	movdqa	%xmm2, %xmm3
	pand	vec3, %xmm3
	pmulld	vec4, %xmm3
	pshufb	dup4, %xmm3
	paddb	ofs4, %xmm3

	// Shuffle both inputs
	pshufb	%xmm3, %xmm0
	pshufb	%xmm3, %xmm1

	// Select and merge the inputs
	// Use ix86_expand_int_vcond for use of pblendvb for SSE4_1.
	pand	vec4, %xmm2
	pcmpeqd	vec4, %xmm2
	pand	%xmm2, %xmm1
	pandn	%xmm2, %xmm0
	por	%xmm1, %xmm0

	ret

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-09-28 15:20                                       ` Richard Henderson
@ 2011-09-29 11:16                                         ` Artem Shinkarov
  2011-09-29 17:22                                           ` Richard Henderson
  0 siblings, 1 reply; 71+ messages in thread
From: Artem Shinkarov @ 2011-09-29 11:16 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Joseph S. Myers, Richard Guenther, Duncan Sands, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1770 bytes --]

Here is a new version of the patch which hopefully fixes all the
formatting issues and uses expand_simple_binop instead of force_reg in
binary operations.

Ok?


On Wed, Sep 28, 2011 at 3:46 PM, Richard Henderson <rth@redhat.com> wrote:
> On 09/28/2011 05:59 AM, Artem Shinkarov wrote:
>> I don't really understand this. As far as I know, expand_normal
>> "converts" tree to rtx. All my computations are happening at the level
>> of rtx and force_reg is needed just to bring an rtx expression to the
>> register of the correct mode. If I am missing something, could you
>> give an example how can I use expand_normal instead of force_reg in
>> this particular code.
>
> Sorry, I meant expand_(simple_)binop.
>
>>> Is ssse3_pshufb why you do the wrong thing in the expander for v0 != v1?
>>
>> My personal feeling is that it may be the case with v0 != v1, that it
>> would be more efficient to perform piecewise shuffling rather than
>> bitwise dances around the masks.
>
> Maybe for V2DI and V2DFmode, but probably not otherwise.
>
> We can perform the double-word shuffle in 12 insns; 10 for SSE 4.1.
> Example assembly attached.
>
>>> It's certainly possible to handle it, though it takes a few more steps,
>>> and might well be more efficient as a libgcc function rather than inline.
>>
>> I don't really understand why it could be more efficient. I thought
>> that inline gives more chances to the final RTL optimisation.
>
> We'll not be able to optimize this at the rtl level.  There are too many
> UNSPEC instructions in the way.  In any case, even if that weren't so we'd
> only be able to do useful optimization for a constant permutation.  And
> we should have been able to prove that at the gimple level.
>
>
> r~
>

[-- Attachment #2: vec-shuffle.v17.diff --]
[-- Type: text/plain, Size: 63941 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 178354)
+++ gcc/doc/extend.texi	(working copy)
@@ -6561,6 +6561,32 @@ invoke undefined behavior at runtime.  W
 accesses for vector subscription can be enabled with
 @option{-Warray-bounds}.
 
+Vector shuffling is available using functions
+@code{__builtin_shuffle (vec, mask)} and
+@code{__builtin_shuffle (vec0, vec1, mask)}. Both functions construct
+a permutation of elements from one or two vectors and return a vector
+of the same type as input vector(s). The mask is a vector of
+integer-typed elements. The size of each element of the mask must be
+the same as the size of each input vector element. The number of
+elements in input vector(s) and mask must be the same.
+
+The elements of the input vectors are numbered from left to right across
+one or both of the vectors. Each element in the mask specifies a number
+of element from the input vector(s). Consider the following example.
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{5,6,7,8@};
+v4si mask1 = @{0,1,1,3@};
+v4si mask2 = @{0,4,2,5@};
+v4si res;
+
+res = __builtin_shuffle (a, mask1);       /* res is @{1,2,2,4@}  */
+res = __builtin_shuffle (a, b, mask2);    /* res is @{1,5,3,6@}  */
+@end smallexample
+
 You can declare variables and use them in function calls and returns, as
 well as in assignments and some casts.  You can specify a vector type as
 a return type for a function.  Vector types can also be used as function
Index: gcc/tree-pretty-print.c
===================================================================
--- gcc/tree-pretty-print.c	(revision 178354)
+++ gcc/tree-pretty-print.c	(working copy)
@@ -2067,6 +2067,16 @@ dump_generic_node (pretty_printer *buffe
       dump_generic_node (buffer, TREE_OPERAND (node, 2), spc, flags, false);
       pp_string (buffer, " > ");
       break;
+    
+    case VEC_SHUFFLE_EXPR:
+      pp_string (buffer, " VEC_SHUFFLE_EXPR < ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
+      pp_string (buffer, " , ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
+      pp_string (buffer, " , ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 2), spc, flags, false);
+      pp_string (buffer, " > ");
+      break;
 
     case DOT_PROD_EXPR:
       pp_string (buffer, " DOT_PROD_EXPR < ");
Index: gcc/c-family/c-common.c
===================================================================
--- gcc/c-family/c-common.c	(revision 178354)
+++ gcc/c-family/c-common.c	(working copy)
@@ -425,6 +425,7 @@ const struct c_common_resword c_common_r
   { "__attribute__",	RID_ATTRIBUTE,	0 },
   { "__builtin_choose_expr", RID_CHOOSE_EXPR, D_CONLY },
   { "__builtin_complex", RID_BUILTIN_COMPLEX, D_CONLY },
+  { "__builtin_shuffle", RID_BUILTIN_SHUFFLE, D_CONLY },
   { "__builtin_offsetof", RID_OFFSETOF, 0 },
   { "__builtin_types_compatible_p", RID_TYPES_COMPATIBLE_P, D_CONLY },
   { "__builtin_va_arg",	RID_VA_ARG,	0 },
Index: gcc/c-family/c-common.h
===================================================================
--- gcc/c-family/c-common.h	(revision 178354)
+++ gcc/c-family/c-common.h	(working copy)
@@ -103,7 +103,7 @@ enum rid
   /* C extensions */
   RID_ASM,       RID_TYPEOF,   RID_ALIGNOF,  RID_ATTRIBUTE,  RID_VA_ARG,
   RID_EXTENSION, RID_IMAGPART, RID_REALPART, RID_LABEL,      RID_CHOOSE_EXPR,
-  RID_TYPES_COMPATIBLE_P,      RID_BUILTIN_COMPLEX,
+  RID_TYPES_COMPATIBLE_P,      RID_BUILTIN_COMPLEX,	     RID_BUILTIN_SHUFFLE,
   RID_DFLOAT32, RID_DFLOAT64, RID_DFLOAT128,
   RID_FRACT, RID_ACCUM,
 
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	(revision 178354)
+++ gcc/optabs.c	(working copy)
@@ -6620,6 +6620,78 @@ vector_compare_rtx (tree cond, bool unsi
   return gen_rtx_fmt_ee (rcode, VOIDmode, ops[0].value, ops[1].value);
 }
 
+/* Return true if VEC_SHUFFLE_EXPR can be expanded using SIMD extensions
+   of the CPU.  */
+bool
+expand_vec_shuffle_expr_p (enum machine_mode mode, tree v0, tree v1, tree mask)
+{
+  int v0_mode_s = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (v0))));
+  int mask_mode_s = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (mask))));
+
+  if (TREE_CODE (mask) == VECTOR_CST
+      && targetm.vectorize.builtin_vec_perm_ok (TREE_TYPE (v0), mask))
+    return true;
+
+  if (!operand_equal_p (v0, v1, 0) || v0_mode_s != mask_mode_s
+      || TYPE_VECTOR_SUBPARTS (TREE_TYPE (v0))
+	 != TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask)))
+    return false;
+
+  return direct_optab_handler (vshuffle_optab, mode) != CODE_FOR_nothing;
+}
+
+/* Generate instructions for VEC_COND_EXPR given its type and three
+   operands.  */
+rtx
+expand_vec_shuffle_expr (tree type, tree v0, tree v1, tree mask, rtx target)
+{
+  struct expand_operand ops[4];
+  enum insn_code icode;
+  enum machine_mode mode = TYPE_MODE (type);
+  rtx rtx_v0, rtx_mask;
+
+  gcc_assert (expand_vec_shuffle_expr_p (mode, v0, v1, mask));
+
+  if (TREE_CODE (mask) == VECTOR_CST)
+    {
+      tree m_type, call;
+      tree fn = targetm.vectorize.builtin_vec_perm (TREE_TYPE (v0), &m_type);
+
+      if (!fn)
+	goto vshuffle;
+
+      if (m_type != TREE_TYPE (TREE_TYPE (mask)))
+	{
+	  int units = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask));
+	  tree cvt = build_vector_type (m_type, units);
+	  mask = fold_convert (cvt, mask);
+	}
+
+      call = fold_build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (fn)), fn);
+      call = build_call_nary (type, call, 3, v0, v1, mask);
+
+      return expand_expr_real_1 (call, target, VOIDmode, EXPAND_NORMAL, NULL);
+    }
+
+vshuffle:
+  gcc_assert (operand_equal_p (v0, v1, 0));
+
+  icode = direct_optab_handler (vshuffle_optab, mode);
+
+  if (icode == CODE_FOR_nothing)
+    return 0;
+
+  rtx_v0 = expand_normal (v0);
+  rtx_mask = expand_normal (mask);
+
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], rtx_v0, mode);
+  create_input_operand (&ops[2], rtx_mask, mode);
+  expand_insn (icode, 3, ops);
+
+  return ops[0].value;
+}
+
 /* Return insn code for TYPE, the type of a VEC_COND_EXPR.  */
 
 static inline enum insn_code
Index: gcc/optabs.h
===================================================================
--- gcc/optabs.h	(revision 178354)
+++ gcc/optabs.h	(working copy)
@@ -636,6 +636,9 @@ enum direct_optab_index
   DOI_vcond,
   DOI_vcondu,
 
+  /* Vector shuffling.  */
+  DOI_vshuffle,
+
   /* Block move operation.  */
   DOI_movmem,
 
@@ -701,6 +704,7 @@ typedef struct direct_optab_d *direct_op
 #define reload_out_optab (&direct_optab_table[(int) DOI_reload_out])
 #define vcond_optab (&direct_optab_table[(int) DOI_vcond])
 #define vcondu_optab (&direct_optab_table[(int) DOI_vcondu])
+#define vshuffle_optab (&direct_optab_table[(int) DOI_vshuffle])
 #define movmem_optab (&direct_optab_table[(int) DOI_movmem])
 #define setmem_optab (&direct_optab_table[(int) DOI_setmem])
 #define cmpstr_optab (&direct_optab_table[(int) DOI_cmpstr])
@@ -879,8 +883,15 @@ extern rtx expand_widening_mult (enum ma
 /* Return tree if target supports vector operations for COND_EXPR.  */
 bool expand_vec_cond_expr_p (tree, enum machine_mode);
 
+/* Return tree if target supports vector operations for VEC_SHUFFLE_EXPR.  */
+bool expand_vec_shuffle_expr_p (enum machine_mode, tree, tree, tree);
+
 /* Generate code for VEC_COND_EXPR.  */
 extern rtx expand_vec_cond_expr (tree, tree, tree, tree, rtx);
+
+/* Generate code for VEC_SHUFFLE_EXPR.  */
+extern rtx expand_vec_shuffle_expr (tree, tree, tree, tree, rtx);
+
 /* Generate code for VEC_LSHIFT_EXPR and VEC_RSHIFT_EXPR.  */
 extern rtx expand_vec_shift_expr (sepops, rtx);
 
Index: gcc/genopinit.c
===================================================================
--- gcc/genopinit.c	(revision 178354)
+++ gcc/genopinit.c	(working copy)
@@ -255,6 +255,7 @@ static const char * const optabs[] =
   "set_optab_handler (vec_realign_load_optab, $A, CODE_FOR_$(vec_realign_load_$a$))",
   "set_direct_optab_handler (vcond_optab, $A, CODE_FOR_$(vcond$a$))",
   "set_direct_optab_handler (vcondu_optab, $A, CODE_FOR_$(vcondu$a$))",
+  "set_direct_optab_handler (vshuffle_optab, $A, CODE_FOR_$(vshuffle$a$))",
   "set_optab_handler (ssum_widen_optab, $A, CODE_FOR_$(widen_ssum$I$a3$))",
   "set_optab_handler (usum_widen_optab, $A, CODE_FOR_$(widen_usum$I$a3$))",
   "set_optab_handler (udot_prod_optab, $A, CODE_FOR_$(udot_prod$I$a$))",
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c	(revision 0)
@@ -0,0 +1,44 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shuf2compare(type, count, vres, v0, v1, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != ((vidx(type, mask, __i) < count) ? \
+                          vidx(type, v0, vidx(type, mask, __i)) :  \
+                          vidx(type, v1, (vidx(type, mask, __i) - count)))) \
+            __builtin_abort (); \
+        } \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+    vector (8, short) v0 = {5, 5,5,5,5,5,argc,7};
+    vector (8, short) v1 = {argc, 1,8,8,4,9,argc,4};
+    vector (8, short) v2;
+
+    //vector (8, short) mask = {1,2,5,4,3,6,7};
+    
+    vector (8, short) mask0 = {0,2,3,1,4,5,6,7};
+    vector (8, short) mask1 = {0,12,3,4,3,0,10,9};
+    
+    vector (8, short) mask2 = {0,8,1,9,2,10,3,11};
+
+    v2 = __builtin_shuffle (v0, v1,  mask0);
+    shuf2compare (short, 8, v2, v0, v1, mask0);
+ 
+    v2 = __builtin_shuffle (v0, v1,  mask1);
+    shuf2compare (short, 8, v2, v0, v1, mask1);
+    
+    v2 = __builtin_shuffle (v0, v1,  mask2);
+    shuf2compare (short, 8, v2, v0, v1, mask2);
+
+    v2 = __builtin_shuffle (mask0, mask0,  v0);
+    shuf2compare (short, 8, v2, mask0, mask0, v0);
+
+    return 0; 
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c	(revision 0)
@@ -0,0 +1,50 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shuf2compare(type, count, vres, v0, v1, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != ((vidx(type, mask, __i) < count) ? \
+                          vidx(type, v0, vidx(type, mask, __i)) :  \
+                          vidx(type, v1, (vidx(type, mask, __i) - count)))) \
+            __builtin_abort (); \
+        } \
+} while (0)
+
+
+vector (8, short) __attribute__ ((noinline))
+f (vector (8, short) x, vector (8, short) y, vector (8, short) mask) {
+    return __builtin_shuffle (x, y, mask);
+}
+
+
+
+int main (int argc, char *argv[]) {
+    vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+
+    //vector (8, short) mask = {1,2,5,4,3,6,7};
+    
+    vector (8, short) mask0 = {0,2,3,1,4,5,6,7};
+    vector (8, short) mask1 = {0,12,3,4,3,0,10,9};
+    vector (8, short) mask2 = {0,8,1,9,2,10,3,11};
+
+    v2 = f (v0, v1,  mask0);
+    shuf2compare (short, 8, v2, v0, v1, mask0);
+ 
+    v2 = f (v0, v1,  mask1);
+    shuf2compare (short, 8, v2, v0, v1, mask1);
+
+    v2 = f (v0, v1,  mask2);
+    shuf2compare (short, 8, v2, v0, v1, mask2);
+
+    v2 = f (mask0, mask0,  v0);
+    shuf2compare (short, 8, v2, mask0, mask0, v0);
+
+    return 0; 
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c	(revision 0)
@@ -0,0 +1,46 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shufcompare(type, count, vres, v0, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != vidx(type, v0, vidx(type, mask, __i))) \
+            __builtin_abort (); \
+    } \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+    /*vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+   
+    vector (8, short) smask = {0,0,1,2,3,4,5,6};
+    
+    v2 = __builtin_shuffle (v0,  smask);
+    shufcompare (short, 8, v2, v0, smask);
+    v2 = __builtin_shuffle (v0, v1);
+    shufcompare (short, 8, v2, v0, v1);
+    v2 = __builtin_shuffle (smask, v0);
+    shufcompare (short, 8, v2, smask, v0);*/
+
+    vector (4, int) i0 = {argc, 1,2,3};
+    vector (4, int) i1 = {argc, 1, argc, 3};
+    vector (4, int) i2;
+
+    vector (4, int) imask = {0,3,2,1};
+
+    /*i2 = __builtin_shuffle (i0, imask);
+    shufcompare (int, 4, i2, i0, imask);*/
+    i2 = __builtin_shuffle (i0, i1);
+    shufcompare (int, 4, i2, i0, i1);
+    
+    i2 = __builtin_shuffle (imask, i0);
+    shufcompare (int, 4, i2, imask, i0);
+    
+    return 0;
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c	(revision 0)
@@ -0,0 +1,36 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shufcompare(type, count, vres, v0, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != vidx(type, v0, vidx(type, mask, __i))) \
+            __builtin_abort (); \
+    } \
+} while (0)
+
+vector (8, short) __attribute__ ((noinline))
+f (vector (8, short) x, vector (8, short) mask) {
+    return __builtin_shuffle (x, mask);
+}
+
+
+int main (int argc, char *argv[]) {
+    vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+
+    vector (8, short) mask = {0,0,1,2,3,4,5,6};
+    
+    v2 = f (v0,  mask);
+    shufcompare (short, 8, v2, v0, mask);
+
+    v2 = f (v0, v1);
+    shufcompare (short, 8, v2, v0, v1);
+
+    return 0;
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c	(revision 0)
@@ -0,0 +1,64 @@
+/* Test that different type variants are compatible within
+   vector shuffling.  */
+
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define shufcompare(count, vres, v0, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vres[__i] != v0[mask[__i]]) \
+            __builtin_abort (); \
+    } \
+} while (0)
+
+#define test_compat_mask(res, vec, mask) \
+  res = __builtin_shuffle (vec, mask); \
+  shufcompare(4, res, vec, mask); \
+  res = __builtin_shuffle (vec, c ## mask); \
+  shufcompare(4, res, vec, c ##  mask); \
+  res = __builtin_shuffle (vec, r ## mask); \
+  shufcompare(4, res, vec, r ##  mask); \
+  res = __builtin_shuffle (vec, d ## mask); \
+  shufcompare(4, res, vec, d ##  mask); \
+  res = __builtin_shuffle (vec, dc ## mask); \
+  shufcompare(4, res, vec, dc ##  mask); \
+
+#define test_compat_vec(res, vec, mask) \
+  test_compat_mask (res, vec, mask); \
+  test_compat_mask (res, c ## vec, mask); \
+  test_compat_mask (res, r ## vec, mask); \
+  test_compat_mask (res, d ## vec, mask); \
+  test_compat_mask (res, dc ## vec, mask); 
+
+#define test_compat(res, vec, mask) \
+  test_compat_vec (res, vec, mask); \
+  test_compat_vec (d ## res, vec, mask); \
+  test_compat_vec (r ## res, vec, mask);
+
+typedef vector (4, int) v4si;
+typedef const vector (4, int) v4sicst;
+
+int main (int argc, char *argv[]) {
+    vector (4, int) vec = {argc, 1,2,3};
+    const vector (4, int) cvec = {argc, 1,2,3};
+    register vector (4, int) rvec = {argc, 1,2,3};
+    v4si dvec = {argc, 1,2,3};
+    v4sicst dcvec = {argc, 1,2,3};
+    
+    vector (4, int) res; 
+    v4si dres;
+    register vector (4, int) rres;
+
+    vector (4, int) mask = {0,3,2,1};
+    const vector (4, int) cmask = {0,3,2,1};
+    register vector (4, int) rmask = {0,3,2,1};
+    v4si dmask = {0,3,2,1};
+    v4sicst dcmask = {0,3,2,1};
+
+    test_compat (res, vec, mask);
+
+    return 0;
+}
+
Index: gcc/testsuite/gcc.dg/builtin-complex-err-1.c
===================================================================
--- gcc/testsuite/gcc.dg/builtin-complex-err-1.c	(revision 178354)
+++ gcc/testsuite/gcc.dg/builtin-complex-err-1.c	(working copy)
@@ -19,8 +19,8 @@ _Complex float fc3 = __builtin_complex (
 void
 f (void)
 {
-  __builtin_complex (0.0); /* { dg-error "expected" } */
-  __builtin_complex (0.0, 0.0, 0.0); /* { dg-error "expected" } */
+  __builtin_complex (0.0); /* { dg-error "wrong number of arguments" } */
+  __builtin_complex (0.0, 0.0, 0.0); /* { dg-error "wrong number of arguments" } */
 }
 
-void (*p) (void) = __builtin_complex; /* { dg-error "expected" } */
+void (*p) (void) = __builtin_complex; /* { dg-error "cannot take address" } */
Index: gcc/c-tree.h
===================================================================
--- gcc/c-tree.h	(revision 178354)
+++ gcc/c-tree.h	(working copy)
@@ -579,6 +579,7 @@ extern tree c_begin_omp_task (void);
 extern tree c_finish_omp_task (location_t, tree, tree);
 extern tree c_finish_omp_clauses (tree);
 extern tree c_build_va_arg (location_t, tree, tree);
+extern tree c_build_vec_shuffle_expr (location_t, tree, tree, tree);
 
 /* Set to 0 at beginning of a function definition, set to 1 if
    a return statement that specifies a return value is seen.  */
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 178354)
+++ gcc/expr.c	(working copy)
@@ -8605,6 +8605,10 @@ expand_expr_real_2 (sepops ops, rtx targ
     case VEC_PACK_FIX_TRUNC_EXPR:
       mode = TYPE_MODE (TREE_TYPE (treeop0));
       goto binop;
+    
+    case VEC_SHUFFLE_EXPR:
+      target = expand_vec_shuffle_expr (type, treeop0, treeop1, treeop2, target);
+      return target;
 
     case DOT_PROD_EXPR:
       {
Index: gcc/gimple-pretty-print.c
===================================================================
--- gcc/gimple-pretty-print.c	(revision 178354)
+++ gcc/gimple-pretty-print.c	(working copy)
@@ -417,6 +417,16 @@ dump_ternary_rhs (pretty_printer *buffer
       dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
       pp_string (buffer, ">");
       break;
+    
+    case VEC_SHUFFLE_EXPR:
+      pp_string (buffer, "VEC_SHUFFLE_EXPR <");
+      dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
+      pp_string (buffer, ">");
+      break;
 
     case REALIGN_LOAD_EXPR:
       pp_string (buffer, "REALIGN_LOAD <");
Index: gcc/c-typeck.c
===================================================================
--- gcc/c-typeck.c	(revision 178354)
+++ gcc/c-typeck.c	(working copy)
@@ -2307,7 +2307,7 @@ build_array_ref (location_t loc, tree ar
       if (TREE_CODE (TREE_TYPE (index)) != ARRAY_TYPE
 	  && TREE_CODE (TREE_TYPE (index)) != POINTER_TYPE)
 	{
-          error_at (loc, 
+          error_at (loc,
             "subscripted value is neither array nor pointer nor vector");
 
 	  return error_mark_node;
@@ -2339,8 +2339,8 @@ build_array_ref (location_t loc, tree ar
   index = default_conversion (index);
 
   gcc_assert (TREE_CODE (TREE_TYPE (index)) == INTEGER_TYPE);
-  
-  /* For vector[index], convert the vector to a 
+
+  /* For vector[index], convert the vector to a
      pointer of the underlying type.  */
   if (TREE_CODE (TREE_TYPE (array)) == VECTOR_TYPE)
     {
@@ -2348,11 +2348,11 @@ build_array_ref (location_t loc, tree ar
       tree type1;
 
       if (TREE_CODE (index) == INTEGER_CST)
-        if (!host_integerp (index, 1) 
-            || ((unsigned HOST_WIDE_INT) tree_low_cst (index, 1) 
+        if (!host_integerp (index, 1)
+            || ((unsigned HOST_WIDE_INT) tree_low_cst (index, 1)
                >= TYPE_VECTOR_SUBPARTS (TREE_TYPE (array))))
           warning_at (loc, OPT_Warray_bounds, "index value is out of bound");
-     
+
       c_common_mark_addressable_vec (array);
       type = build_qualified_type (TREE_TYPE (type), TYPE_QUALS (type));
       type = build_pointer_type (type);
@@ -2845,6 +2845,99 @@ build_function_call_vec (location_t loc,
     }
   return require_complete_type (result);
 }
+
+/* Build a VEC_SHUFFLE_EXPR if V0, V1 and MASK are not error_mark_nodes
+   and have vector types, V0 has the same type as V1, and the number of
+   elements of V0, V1, MASK is the same.
+
+   In case V1 is a NULL_TREE it is assumed that __builtin_shuffle was
+   called with two arguments.  In this case implementation passes the
+   first argument twice in order to share the same tree code.  This fact
+   could enable the mask-values being twice the vector length.  This is
+   an implementation accident and this semantics is not guaranteed to
+   the user.  */
+tree
+c_build_vec_shuffle_expr (location_t loc, tree v0, tree v1, tree mask)
+{
+  tree vec_shuffle, tmp;
+  bool wrap = true;
+  bool maybe_const = false;
+  bool two_arguments;
+
+  if (v1 == NULL_TREE)
+    {
+      two_arguments = true;
+      v1 = v0;
+    }
+
+  if (v0 == error_mark_node || v1 == error_mark_node
+      || mask == error_mark_node)
+    return error_mark_node;
+
+  if (TREE_CODE (TREE_TYPE (mask)) != VECTOR_TYPE
+      || TREE_CODE (TREE_TYPE (TREE_TYPE (mask))) != INTEGER_TYPE)
+    {
+      error_at (loc, "__builtin_shuffle last argument must "
+		     "be an integer vector");
+      return error_mark_node;
+    }
+
+  if (TREE_CODE (TREE_TYPE (v0)) != VECTOR_TYPE
+      || TREE_CODE (TREE_TYPE (v1)) != VECTOR_TYPE)
+    {
+      error_at (loc, "__builtin_shuffle arguments must be vectors");
+      return error_mark_node;
+    }
+
+  if (TYPE_MAIN_VARIANT (TREE_TYPE (v0)) != TYPE_MAIN_VARIANT (TREE_TYPE (v1)))
+    {
+      error_at (loc, "__builtin_shuffle argument vectors must be of "
+		     "the same type");
+      return error_mark_node;
+    }
+
+  if (TYPE_VECTOR_SUBPARTS (TREE_TYPE (v0))
+      != TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask))
+      && TYPE_VECTOR_SUBPARTS (TREE_TYPE (v1))
+	 != TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask)))
+    {
+      error_at (loc, "__builtin_shuffle number of elements of the "
+		     "argument vector(s) and the mask vector should "
+		     "be the same");
+      return error_mark_node;
+    }
+
+  if (GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (v0))))
+      != GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (mask)))))
+    {
+      error_at (loc, "__builtin_shuffle argument vector(s) inner type "
+		     "must have the same size as inner type of the mask");
+      return error_mark_node;
+    }
+
+  /* Avoid C_MAYBE_CONST_EXPRs inside VEC_SHUFFLE_EXPR.  */
+  tmp = c_fully_fold (v0, false, &maybe_const);
+  v0 = save_expr (tmp);
+  wrap &= maybe_const;
+
+  if (!two_arguments)
+    {
+      v1 = c_fully_fold (v1, false, &maybe_const);
+      wrap &= maybe_const;
+    }
+  else
+    v1 = v0;
+
+  mask = c_fully_fold (mask, false, &maybe_const);
+  wrap &= maybe_const;
+
+  vec_shuffle = build3 (VEC_SHUFFLE_EXPR, TREE_TYPE (v0), v0, v1, mask);
+
+  if (!wrap)
+    vec_shuffle = c_wrap_maybe_const (vec_shuffle, true);
+
+  return vec_shuffle;
+}
 \f
 /* Convert the argument expressions in the vector VALUES
    to the types in the list TYPELIST.
@@ -3167,7 +3260,7 @@ convert_arguments (tree typelist, VEC(tr
 
   if (typetail != 0 && TREE_VALUE (typetail) != void_type_node)
     {
-      error_at (input_location, 
+      error_at (input_location,
 		"too few arguments to function %qE", function);
       if (fundecl && !DECL_BUILT_IN (fundecl))
 	inform (DECL_SOURCE_LOCATION (fundecl), "declared here");
@@ -3566,7 +3659,7 @@ build_unary_op (location_t location,
 
       /* Complain about anything that is not a true lvalue.  In
 	 Objective-C, skip this check for property_refs.  */
-      if (!objc_is_property_ref (arg) 
+      if (!objc_is_property_ref (arg)
 	  && !lvalue_or_else (location,
 			      arg, ((code == PREINCREMENT_EXPR
 				     || code == POSTINCREMENT_EXPR)
@@ -3683,7 +3776,7 @@ build_unary_op (location_t location,
 	   need to ask Objective-C to build the increment or decrement
 	   expression for it.  */
 	if (objc_is_property_ref (arg))
-	  return objc_build_incr_expr_for_property_ref (location, code, 
+	  return objc_build_incr_expr_for_property_ref (location, code,
 							arg, inc);
 
 	/* Report a read-only lvalue.  */
@@ -5926,7 +6019,7 @@ void
 pedwarn_init (location_t location, int opt, const char *gmsgid)
 {
   char *ofwhat;
-  
+
   /* The gmsgid may be a format string with %< and %>. */
   pedwarn (location, opt, gmsgid);
   ofwhat = print_spelling ((char *) alloca (spelling_length () + 1));
@@ -9344,8 +9437,8 @@ scalar_to_vector (location_t loc, enum t
   tree type1 = TREE_TYPE (op1);
   bool integer_only_op = false;
   enum stv_conv ret = stv_firstarg;
-  
-  gcc_assert (TREE_CODE (type0) == VECTOR_TYPE 
+
+  gcc_assert (TREE_CODE (type0) == VECTOR_TYPE
 	      || TREE_CODE (type1) == VECTOR_TYPE);
   switch (code)
     {
@@ -9370,7 +9463,7 @@ scalar_to_vector (location_t loc, enum t
       case BIT_AND_EXPR:
 	integer_only_op = true;
 	/* ... fall through ...  */
-      
+
       case PLUS_EXPR:
       case MINUS_EXPR:
       case MULT_EXPR:
@@ -9387,7 +9480,7 @@ scalar_to_vector (location_t loc, enum t
 	  }
 
 	if (TREE_CODE (type0) == INTEGER_TYPE
-	    && TREE_CODE (TREE_TYPE (type1)) == INTEGER_TYPE) 
+	    && TREE_CODE (TREE_TYPE (type1)) == INTEGER_TYPE)
 	  {
 	    if (unsafe_conversion_p (TREE_TYPE (type1), op0, false))
 	      {
@@ -9399,7 +9492,7 @@ scalar_to_vector (location_t loc, enum t
 	  }
 	else if (!integer_only_op
 		    /* Allow integer --> real conversion if safe.  */
-		 && (TREE_CODE (type0) == REAL_TYPE 
+		 && (TREE_CODE (type0) == REAL_TYPE
 		     || TREE_CODE (type0) == INTEGER_TYPE)
 		 && SCALAR_FLOAT_TYPE_P (TREE_TYPE (type1)))
 	  {
@@ -9414,7 +9507,7 @@ scalar_to_vector (location_t loc, enum t
       default:
 	break;
     }
- 
+
   return stv_nothing;
 }
 \f
@@ -9529,8 +9622,8 @@ build_binary_op (location_t location, en
     int_const = int_const_or_overflow = false;
 
   /* Do not apply default conversion in mixed vector/scalar expression.  */
-  if (convert_p 
-      && !((TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE) 
+  if (convert_p
+      && !((TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE)
 	   != (TREE_CODE (TREE_TYPE (op1)) == VECTOR_TYPE)))
     {
       op0 = default_conversion (op0);
@@ -9608,7 +9701,7 @@ build_binary_op (location_t location, en
   if ((code0 == VECTOR_TYPE) != (code1 == VECTOR_TYPE))
     {
       enum stv_conv convert_flag = scalar_to_vector (location, code, op0, op1);
-      
+
       switch (convert_flag)
 	{
 	  case stv_error:
@@ -9949,7 +10042,7 @@ build_binary_op (location_t location, en
 	    {
 	      if (code == EQ_EXPR)
 		warning_at (location,
-			    OPT_Waddress, 
+			    OPT_Waddress,
 			    "the comparison will always evaluate as %<false%> "
 			    "for the address of %qD will never be NULL",
 			    TREE_OPERAND (op1, 0));
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	(revision 178354)
+++ gcc/gimplify.c	(working copy)
@@ -7286,6 +7286,7 @@ gimplify_expr (tree *expr_p, gimple_seq
 	  }
 
 	case FMA_EXPR:
+	case VEC_SHUFFLE_EXPR:
 	  /* Classified as tcc_expression.  */
 	  goto expr_3;
 
Index: gcc/tree.def
===================================================================
--- gcc/tree.def	(revision 178354)
+++ gcc/tree.def	(working copy)
@@ -497,6 +497,19 @@ DEFTREECODE (COND_EXPR, "cond_expr", tcc
 */
 DEFTREECODE (VEC_COND_EXPR, "vec_cond_expr", tcc_expression, 3)
 
+/* Vector shuffle expression.  A = VEC_SHUFFLE_EXPR<v0, v1, mask>
+   means
+
+   foreach i in length (mask):
+     A = mask[i] < length (v0) ? v0[mask[i]] : v1[mask[i] - length (mask)]
+
+   V0 and V1 are vectors of the same type.  MASK is an integer-typed
+   vector.  The number of MASK elements must be the same with the
+   number of elements in V0 and V1.  The size of the inner type
+   of the MASK and of the V0 and V1 must be the same.
+*/
+DEFTREECODE (VEC_SHUFFLE_EXPR, "vec_shuffle_expr", tcc_expression, 3)
+
 /* Declare local variables, including making RTL and allocating space.
    BIND_EXPR_VARS is a chain of VAR_DECL nodes for the variables.
    BIND_EXPR_BODY is the body, the expression to be computed using
Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c	(revision 178354)
+++ gcc/tree-inline.c	(working copy)
@@ -3285,6 +3285,7 @@ estimate_operator_cost (enum tree_code c
        ??? We may consider mapping RTL costs to this.  */
     case COND_EXPR:
     case VEC_COND_EXPR:
+    case VEC_SHUFFLE_EXPR:
 
     case PLUS_EXPR:
     case POINTER_PLUS_EXPR:
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	(revision 178354)
+++ gcc/tree-vect-generic.c	(working copy)
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "flags.h"
 #include "ggc.h"
+#include "diagnostic.h"
 
 /* Need to include rtl.h, expr.h, etc. for optabs.  */
 #include "expr.h"
@@ -326,10 +327,10 @@ uniform_vector_p (tree vec)
         }
       if (i != TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec)))
 	return NULL_TREE;
-      
+
       return first;
     }
-  
+
   return NULL_TREE;
 }
 
@@ -432,6 +433,263 @@ type_for_widest_vector_mode (enum machin
     }
 }
 
+
+/* Build a reference to the element of the vector VECT.  Function
+   returns either the element itself, either BIT_FIELD_REF, or an
+   ARRAY_REF expression.
+
+   GSI is requred to insert temporary variables while building a
+   refernece to the element of the vector VECT.
+
+   PTMPVEC is a pointer to the temporary variable for caching
+   purposes.  In case when PTMPVEC is NULL new temporary variable
+   will be created.  */
+static tree
+vector_element (gimple_stmt_iterator *gsi, tree vect, tree idx, tree *ptmpvec)
+{
+  tree type;
+  gimple asgn;
+  tree tmpvec;
+  tree arraytype;
+  bool need_asgn = true;
+
+  gcc_assert (TREE_CODE (TREE_TYPE (vect)) == VECTOR_TYPE);
+
+  type = TREE_TYPE (vect);
+  if (TREE_CODE (idx) == INTEGER_CST)
+    {
+      unsigned HOST_WIDE_INT index;
+
+      if (!host_integerp (idx, 1)
+           || (index = tree_low_cst (idx, 1)) > TYPE_VECTOR_SUBPARTS (type)-1)
+        return error_mark_node;
+
+      if (TREE_CODE (vect) == VECTOR_CST)
+        {
+	  unsigned i;
+	  tree vals = TREE_VECTOR_CST_ELTS (vect);
+	  for (i = 0; vals; vals = TREE_CHAIN (vals), ++i)
+	    if (i == index)
+	       return TREE_VALUE (vals);
+	  return error_mark_node;
+        }
+      else if (TREE_CODE (vect) == CONSTRUCTOR)
+        {
+          unsigned i;
+          VEC (constructor_elt, gc) *vals = CONSTRUCTOR_ELTS (vect);
+          constructor_elt *elt;
+
+          for (i = 0; VEC_iterate (constructor_elt, vals, i, elt); i++)
+            if (operand_equal_p (elt->index, idx, 0))
+              return elt->value;
+          return fold_convert (TREE_TYPE (type), integer_zero_node);
+        }
+      else if (TREE_CODE (vect) == SSA_NAME)
+        {
+	  tree size = TYPE_SIZE (TREE_TYPE (type));
+          tree pos = fold_build2 (MULT_EXPR, TREE_TYPE (idx), idx, size);
+          return fold_build3 (BIT_FIELD_REF, TREE_TYPE (type), vect, size, pos);
+        }
+      else
+        return error_mark_node;
+    }
+
+  if (!ptmpvec)
+    tmpvec = create_tmp_var (TREE_TYPE (vect), "vectmp");
+  else if (!*ptmpvec)
+    tmpvec = *ptmpvec = create_tmp_var (TREE_TYPE (vect), "vectmp");
+  else
+    {
+      tmpvec = *ptmpvec;
+      need_asgn = false;
+    }
+
+  if (need_asgn)
+    {
+      TREE_ADDRESSABLE (tmpvec) = 1;
+      asgn = gimple_build_assign (tmpvec, vect);
+      gsi_insert_before (gsi, asgn, GSI_SAME_STMT);
+    }
+
+  arraytype = build_array_type_nelts (TREE_TYPE (type),
+				      TYPE_VECTOR_SUBPARTS (TREE_TYPE (vect)));
+
+  return build4 (ARRAY_REF, TREE_TYPE (type),
+                 build1 (VIEW_CONVERT_EXPR, arraytype, tmpvec),
+                 idx, NULL_TREE, NULL_TREE);
+}
+
+/* Check if VEC_SHUFFLE_EXPR within the given setting is supported
+   by hardware, or lower it piecewise.  Function returns false when
+   the expression must be replaced with TRAP_RETURN, true otherwise.
+
+   When VEC_SHUFFLE_EXPR has the same first and second operands:
+   VEC_SHUFFLE_EXPR <v0, v0, mask> the lowered version would be
+   {v0[mask[0]], v0[mask[1]], ...}
+   MASK and V0 must have the same number of elements.
+
+   Otherwise VEC_SHUFFLE_EXPR <v0, v1, mask> is lowered to
+   {mask[0] < len(v0) ? v0[mask[0]] : v1[mask[0]], ...}
+   V0 and V1 must have the same type.  MASK, V0, V1 must have the
+   same number of arguments.  */
+static bool
+lower_vec_shuffle (gimple_stmt_iterator *gsi, location_t loc)
+{
+
+  gimple stmt = gsi_stmt (*gsi);
+  tree mask = gimple_assign_rhs3 (stmt);
+  tree vec0 = gimple_assign_rhs1 (stmt);
+  tree vec1 = gimple_assign_rhs2 (stmt);
+  unsigned els = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask));
+  tree type0 = TREE_TYPE (TREE_TYPE (vec0));
+  VEC(constructor_elt,gc) *v = NULL;
+  tree vectype, constr;
+  tree vec0tmp = NULL_TREE, masktmp = NULL_TREE;
+
+  if (expand_vec_shuffle_expr_p (TYPE_MODE (TREE_TYPE (vec0)), vec0, vec1, mask))
+    {
+      tree t;
+
+      t = gimplify_build3 (gsi, VEC_SHUFFLE_EXPR, TREE_TYPE (vec0),
+			   vec0, vec1, mask);
+      gimple_assign_set_rhs_from_tree (gsi, t);
+      /* Statement should be updated by callee.  */
+      return true;
+    }
+
+  if (operand_equal_p (vec0, vec1, 0))
+    {
+      unsigned i;
+      tree vec0tmp = NULL_TREE;
+
+      v = VEC_alloc (constructor_elt, gc, els);
+      for (i = 0; i < els; i++)
+        {
+          tree idxval, vecel, t;
+
+	  idxval = vector_element (gsi, mask, size_int (i), &masktmp);
+          if (idxval == error_mark_node)
+            {
+              if (warning_at (loc, 0, "Invalid shuffling mask index %i", i))
+		inform (loc, "if this code is reached the programm will abort");
+	      return false;
+            }
+
+	  vecel = vector_element (gsi, vec0, idxval, &vec0tmp);
+          if (vecel == error_mark_node)
+            {
+              if (warning_at (loc, 0, "Invalid shuffling arguments"))
+		inform (loc, "if this code is reached the programm will abort");
+	      return false;
+            }
+
+          t = force_gimple_operand_gsi (gsi, vecel, true,
+					NULL_TREE, true, GSI_SAME_STMT);
+          CONSTRUCTOR_APPEND_ELT (v, size_int (i), t);
+        }
+    }
+  else
+    {
+      unsigned i;
+      tree var = create_tmp_var (type0, "vecel");
+      tree vec1tmp = NULL_TREE;
+
+      v = VEC_alloc (constructor_elt, gc, els);
+      for (i = 0; i < els; i++)
+        {
+          tree idxval, idx1val, cond, elval0, elval1, condexpr, t, ssatmp;
+          tree vec0el, vec1el;
+          gimple asgn;
+
+          idxval = vector_element (gsi, mask, size_int (i), &masktmp);
+	  if (idxval == error_mark_node)
+            {
+              if (warning_at (loc, 0, "Invalid shuffling mask index %i", i))
+		inform (loc, "if this code is reached the programm will abort");
+	      return false;
+            }
+
+          if (TREE_CODE (idxval) == INTEGER_CST)
+            {
+              if (tree_int_cst_lt (idxval, size_int (els)))
+                {
+                  vec0el = vector_element (gsi, vec0, idxval, &vec0tmp);
+                  t = force_gimple_operand_gsi (gsi, vec0el,
+                                    true, NULL_TREE, true, GSI_SAME_STMT);
+                }
+              else if (tree_int_cst_lt (idxval, size_int (2*els)))
+                {
+                  idx1val = fold_build2 (MINUS_EXPR, TREE_TYPE (idxval),
+                        idxval, build_int_cst (TREE_TYPE (idxval), els));
+
+                  vec1el = vector_element (gsi, vec1, idx1val, &vec1tmp);
+                  t = force_gimple_operand_gsi (gsi, vec1el, true,
+						NULL_TREE, true, GSI_SAME_STMT);
+                }
+              else
+                {
+                  if (warning_at (loc, 0, "Invalid shuffling mask index %i", i))
+		    inform (loc, "if this code is reached the "
+				  "programm will abort");
+		  return false;
+                }
+            }
+          else
+            {
+
+              idx1val = fold_build2 (MINUS_EXPR, TREE_TYPE (idxval),
+                            idxval, build_int_cst (TREE_TYPE (idxval), els));
+              idx1val = force_gimple_operand_gsi (gsi, idx1val,
+                                true, NULL_TREE, true, GSI_SAME_STMT);
+              cond = fold_build2 (GT_EXPR, boolean_type_node, \
+                             idxval, fold_convert (type0, size_int (els - 1)));
+
+	      vec0el = vector_element (gsi, vec0, idxval, &vec0tmp);
+              if (vec0el == error_mark_node)
+                {
+                  if (warning_at (loc, 0, "Invalid shuffling arguments"))
+		    inform (loc, "if this code is reached the "
+				 "programm will abort");
+		  return false;
+                }
+
+              elval0 = force_gimple_operand_gsi (gsi, vec0el,
+                                true, NULL_TREE, true, GSI_SAME_STMT);
+
+	      vec1el = vector_element (gsi, vec1, idx1val, &vec1tmp);
+              if (vec1el == error_mark_node)
+                {
+                  if (warning_at (loc, 0, "Invalid shuffling arguments"))
+		    inform (loc, "if this code is reached the "
+				 "programm will abort");
+		  return false;
+                }
+
+              elval1 = force_gimple_operand_gsi (gsi, vec1el,
+                                true, NULL_TREE, true, GSI_SAME_STMT);
+
+              condexpr = fold_build3 (COND_EXPR, type0, cond, \
+                                      elval1, elval0);
+
+              t = force_gimple_operand_gsi (gsi, condexpr, true, \
+                                        NULL_TREE, true, GSI_SAME_STMT);
+            }
+
+          asgn = gimple_build_assign (var, t);
+          ssatmp = make_ssa_name (var, asgn);
+          gimple_assign_set_lhs (asgn, ssatmp);
+          gsi_insert_before (gsi, asgn, GSI_SAME_STMT);
+          CONSTRUCTOR_APPEND_ELT (v, size_int (i), ssatmp);
+        }
+    }
+
+  vectype = build_vector_type (type0, els);
+  constr = build_constructor (vectype, v);
+  gimple_assign_set_rhs_from_tree (gsi, constr);
+  /* Statement should be updated by callee.  */
+  return true;
+}
+
 /* Process one statement.  If we identify a vector operation, expand it.  */
 
 static void
@@ -451,6 +709,25 @@ expand_vector_operations_1 (gimple_stmt_
   code = gimple_assign_rhs_code (stmt);
   rhs_class = get_gimple_rhs_class (code);
 
+  if (code == VEC_SHUFFLE_EXPR)
+    {
+      if (!lower_vec_shuffle (gsi, gimple_location (stmt)))
+	{
+	  gimple new_stmt;
+	  tree vec0;
+
+	  vec0 = gimple_assign_rhs1 (stmt);
+	  new_stmt = gimple_build_call (built_in_decls[BUILT_IN_TRAP], 0);
+	  gsi_insert_before (gsi, new_stmt,  GSI_SAME_STMT);
+	  split_block (gimple_bb (new_stmt), new_stmt);
+	  new_stmt = gimple_build_assign (gimple_assign_lhs (stmt), vec0);
+	  gsi_replace (gsi, new_stmt, false);
+	}
+
+      gimple_set_modified (gsi_stmt (*gsi), true);
+      update_stmt (gsi_stmt (*gsi));
+    }
+
   if (rhs_class != GIMPLE_UNARY_RHS && rhs_class != GIMPLE_BINARY_RHS)
     return;
 
@@ -485,9 +762,9 @@ expand_vector_operations_1 (gimple_stmt_
     {
       bool vector_scalar_shift;
       op = optab_for_tree_code (code, type, optab_scalar);
-      
+
       /* Vector/Scalar shift is supported.  */
-      vector_scalar_shift = (op && (optab_handler (op, TYPE_MODE (type)) 
+      vector_scalar_shift = (op && (optab_handler (op, TYPE_MODE (type))
 				    != CODE_FOR_nothing));
 
       /* If the 2nd argument is vector, we need a vector/vector shift.
@@ -500,10 +777,10 @@ expand_vector_operations_1 (gimple_stmt_
           /* Check whether we have vector <op> {x,x,x,x} where x
              could be a scalar variable or a constant. Transform
              vector <op> {x,x,x,x} ==> vector <op> scalar.  */
-          if (vector_scalar_shift 
+          if (vector_scalar_shift
               && ((TREE_CODE (rhs2) == VECTOR_CST
 		   && (first = uniform_vector_p (rhs2)) != NULL_TREE)
-		  || (TREE_CODE (rhs2) == SSA_NAME 
+		  || (TREE_CODE (rhs2) == SSA_NAME
 		      && (def_stmt = SSA_NAME_DEF_STMT (rhs2))
 		      && gimple_assign_single_p (def_stmt)
 		      && (first = uniform_vector_p
@@ -516,14 +793,14 @@ expand_vector_operations_1 (gimple_stmt_
           else
             op = optab_for_tree_code (code, type, optab_vector);
         }
-    
+
       /* Try for a vector/scalar shift, and if we don't have one, see if we
          have a vector/vector shift */
       else if (!vector_scalar_shift)
 	{
 	  op = optab_for_tree_code (code, type, optab_vector);
 
-	  if (op && (optab_handler (op, TYPE_MODE (type)) 
+	  if (op && (optab_handler (op, TYPE_MODE (type))
 		     != CODE_FOR_nothing))
 	    {
 	      /* Transform vector <op> scalar => vector <op> {x,x,x,x}.  */
@@ -613,9 +890,9 @@ expand_vector_operations_1 (gimple_stmt_
    if it may need the bit-twiddling tricks implemented in this file.  */
 
 static bool
-gate_expand_vector_operations (void)
+gate_expand_vector_operations_ssa (void)
 {
-  return flag_tree_vectorize != 0;
+  return optimize == 0;
 }
 
 static unsigned int
@@ -648,7 +925,7 @@ struct gimple_opt_pass pass_lower_vector
  {
   GIMPLE_PASS,
   "veclower",				/* name */
-  0,					/* gate */
+  gate_expand_vector_operations_ssa,    /* gate */
   expand_vector_operations,		/* execute */
   NULL,					/* sub */
   NULL,					/* next */
@@ -661,6 +938,7 @@ struct gimple_opt_pass pass_lower_vector
   TODO_update_ssa	                /* todo_flags_finish */
     | TODO_verify_ssa
     | TODO_verify_stmts | TODO_verify_flow
+    | TODO_cleanup_cfg
  }
 };
 
@@ -669,7 +947,7 @@ struct gimple_opt_pass pass_lower_vector
  {
   GIMPLE_PASS,
   "veclower2",				/* name */
-  gate_expand_vector_operations,	/* gate */
+  0,	                                /* gate */
   expand_vector_operations,		/* execute */
   NULL,					/* sub */
   NULL,					/* next */
@@ -682,6 +960,7 @@ struct gimple_opt_pass pass_lower_vector
   TODO_update_ssa	                /* todo_flags_finish */
     | TODO_verify_ssa
     | TODO_verify_stmts | TODO_verify_flow
+    | TODO_cleanup_cfg
  }
 };
 
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 178354)
+++ gcc/Makefile.in	(working copy)
@@ -3178,7 +3178,7 @@ tree-vect-generic.o : tree-vect-generic.
     $(TM_H) $(TREE_FLOW_H) $(GIMPLE_H) tree-iterator.h $(TREE_PASS_H) \
     $(FLAGS_H) $(OPTABS_H) $(MACHMODE_H) $(EXPR_H) \
     langhooks.h $(FLAGS_H) $(DIAGNOSTIC_H) gt-tree-vect-generic.h $(GGC_H) \
-    coretypes.h insn-codes.h
+    coretypes.h insn-codes.h $(DIAGNOSTIC_H)
 df-core.o : df-core.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    insn-config.h $(RECOG_H) $(FUNCTION_H) $(REGS_H) alloc-pool.h \
    hard-reg-set.h $(BASIC_BLOCK_H) $(DF_H) $(BITMAP_H) sbitmap.h $(TIMEVAR_H) \
Index: gcc/gimple.c
===================================================================
--- gcc/gimple.c	(revision 178354)
+++ gcc/gimple.c	(working copy)
@@ -2615,6 +2615,7 @@ get_gimple_rhs_num_ops (enum tree_code c
       || (SYM) == WIDEN_MULT_MINUS_EXPR					    \
       || (SYM) == DOT_PROD_EXPR						    \
       || (SYM) == REALIGN_LOAD_EXPR					    \
+      || (SYM) == VEC_SHUFFLE_EXPR					    \
       || (SYM) == FMA_EXPR) ? GIMPLE_TERNARY_RHS			    \
    : ((SYM) == COND_EXPR						    \
       || (SYM) == CONSTRUCTOR						    \
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	(revision 178354)
+++ gcc/tree-cfg.c	(working copy)
@@ -3711,6 +3711,59 @@ verify_gimple_assign_ternary (gimple stm
 	}
       break;
 
+    case VEC_SHUFFLE_EXPR:
+      if (!useless_type_conversion_p (lhs_type, rhs1_type)
+	  || !useless_type_conversion_p (lhs_type, rhs2_type))
+	{
+	  error ("type mismatch in vector shuffle expression");
+	  debug_generic_expr (lhs_type);
+	  debug_generic_expr (rhs1_type);
+	  debug_generic_expr (rhs2_type);
+	  debug_generic_expr (rhs3_type);
+	  return true;
+	}
+
+      if (TREE_CODE (rhs1_type) != VECTOR_TYPE
+	  || TREE_CODE (rhs2_type) != VECTOR_TYPE
+	  || TREE_CODE (rhs3_type) != VECTOR_TYPE)
+	{
+	  error ("vector types expected in vector shuffle expression");
+	  debug_generic_expr (lhs_type);
+	  debug_generic_expr (rhs1_type);
+	  debug_generic_expr (rhs2_type);
+	  debug_generic_expr (rhs3_type);
+	  return true;
+	}
+
+      if (TYPE_VECTOR_SUBPARTS (rhs1_type) != TYPE_VECTOR_SUBPARTS (rhs2_type)
+	  || TYPE_VECTOR_SUBPARTS (rhs2_type)
+	     != TYPE_VECTOR_SUBPARTS (rhs3_type)
+	  || TYPE_VECTOR_SUBPARTS (rhs3_type)
+	     != TYPE_VECTOR_SUBPARTS (lhs_type))
+	{
+	  error ("vectors with different element number found "
+		 "in vector shuffle expression");
+	  debug_generic_expr (lhs_type);
+	  debug_generic_expr (rhs1_type);
+	  debug_generic_expr (rhs2_type);
+	  debug_generic_expr (rhs3_type);
+	  return true;
+	}
+
+      if (TREE_CODE (TREE_TYPE (rhs3_type)) != INTEGER_TYPE
+	  || GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (rhs3_type)))
+	     != GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (rhs1_type))))
+	{
+	  error ("invalid mask type in vector shuffle expression");
+	  debug_generic_expr (lhs_type);
+	  debug_generic_expr (rhs1_type);
+	  debug_generic_expr (rhs2_type);
+	  debug_generic_expr (rhs3_type);
+	  return true;
+	}
+
+      return false;
+
     case DOT_PROD_EXPR:
     case REALIGN_LOAD_EXPR:
       /* FIXME.  */
Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 178354)
+++ gcc/passes.c	(working copy)
@@ -1354,7 +1354,6 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_vectorize);
 	    {
 	      struct opt_pass **p = &pass_vectorize.pass.sub;
-	      NEXT_PASS (pass_lower_vector_ssa);
 	      NEXT_PASS (pass_dce_loop);
 	    }
           NEXT_PASS (pass_predcom);
@@ -1366,6 +1365,7 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_lim);
 	  NEXT_PASS (pass_tree_loop_done);
 	}
+      NEXT_PASS (pass_lower_vector_ssa);
       NEXT_PASS (pass_cse_reciprocals);
       NEXT_PASS (pass_reassoc);
       NEXT_PASS (pass_vrp);
Index: gcc/c-parser.c
===================================================================
--- gcc/c-parser.c	(revision 178354)
+++ gcc/c-parser.c	(working copy)
@@ -5989,6 +5989,46 @@ c_parser_alignof_expression (c_parser *p
     }
 }
 
+/* Helper function to read arguments of builtins which are interfaces
+   for the middle-end nodes like COMPLEX_EXPR, VEC_SHUFFLE_EXPR and
+   others.  The name of the builtin is passed using BNAME parameter.
+   Function returns true if there were no errors while parsing and
+   stores the arguments in EXPR_LIST.  List of original types can be
+   obtained by passing non NULL value to ORIG_TYPES.  */
+static bool
+c_parser_get_builtin_args (c_parser *parser, const char *bname,
+			   VEC(tree,gc) **expr_list,
+			   VEC(tree,gc) **orig_types)
+{
+  location_t loc = c_parser_peek_token (parser)->location;
+  *expr_list = NULL;
+
+  if (c_parser_next_token_is_not (parser, CPP_OPEN_PAREN))
+    {
+      error_at (loc, "cannot take address of %qs", bname);
+      return false;
+    }
+
+  c_parser_consume_token (parser);
+
+  if (c_parser_next_token_is (parser, CPP_CLOSE_PAREN))
+    {
+      c_parser_consume_token (parser);
+      return true;
+    }
+    
+  if (orig_types)
+    *expr_list = c_parser_expr_list (parser, false, false, orig_types);
+  else
+    *expr_list = c_parser_expr_list (parser, false, false, NULL);
+
+  if (!c_parser_require (parser, CPP_CLOSE_PAREN, "expected %<)%>"))
+    return false;
+
+  return true;
+}
+
+
 /* Parse a postfix expression (C90 6.3.1-6.3.2, C99 6.5.1-6.5.2).
 
    postfix-expression:
@@ -6027,6 +6067,10 @@ c_parser_alignof_expression (c_parser *p
 			     assignment-expression )
      __builtin_types_compatible_p ( type-name , type-name )
      __builtin_complex ( assignment-expression , assignment-expression )
+     __builtin_shuffle ( assignment-expression , assignment-expression )
+     __builtin_shuffle ( assignment-expression , 
+			 assignment-expression ,
+			 assignment-expression, )
 
    offsetof-member-designator:
      identifier
@@ -6047,7 +6091,7 @@ c_parser_alignof_expression (c_parser *p
 static struct c_expr
 c_parser_postfix_expression (c_parser *parser)
 {
-  struct c_expr expr, e1, e2, e3;
+  struct c_expr expr, e1;
   struct c_type_name *t1, *t2;
   location_t loc = c_parser_peek_token (parser)->location;;
   expr.original_code = ERROR_MARK;
@@ -6333,45 +6377,55 @@ c_parser_postfix_expression (c_parser *p
 	  }
 	  break;
 	case RID_CHOOSE_EXPR:
-	  c_parser_consume_token (parser);
-	  if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
-	    {
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  loc = c_parser_peek_token (parser)->location;
-	  e1 = c_parser_expr_no_commas (parser, NULL);
-	  if (!c_parser_require (parser, CPP_COMMA, "expected %<,%>"))
-	    {
-	      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  e2 = c_parser_expr_no_commas (parser, NULL);
-	  if (!c_parser_require (parser, CPP_COMMA, "expected %<,%>"))
-	    {
-	      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  e3 = c_parser_expr_no_commas (parser, NULL);
-	  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
-				     "expected %<)%>");
 	  {
-	    tree c;
+	    VEC(tree,gc) *expr_list;
+	    VEC(tree,gc) *orig_types;
+	    tree e1value, e2value, e3value, c;
 
-	    c = e1.value;
-	    mark_exp_read (e2.value);
-	    mark_exp_read (e3.value);
+	    c_parser_consume_token (parser);
+	    if (!c_parser_get_builtin_args (parser,
+					    "__builtin_choose_expr",
+					    &expr_list, &orig_types))
+	      {
+		expr.value = error_mark_node;
+		break;
+	      }
+
+	    if (VEC_length (tree, expr_list) != 3)
+	      {
+		error_at (loc, "wrong number of arguments to "
+			       "%<__builtin_choose_expr%>");
+		expr.value = error_mark_node;
+		break;
+	      }
+	    
+	    e1value = VEC_index (tree, expr_list, 0);
+	    e2value = VEC_index (tree, expr_list, 1);
+	    e3value = VEC_index (tree, expr_list, 2);
+
+	    c = e1value;
+	    mark_exp_read (e2value);
+	    mark_exp_read (e3value);
 	    if (TREE_CODE (c) != INTEGER_CST
 		|| !INTEGRAL_TYPE_P (TREE_TYPE (c)))
 	      error_at (loc,
 			"first argument to %<__builtin_choose_expr%> not"
 			" a constant");
 	    constant_expression_warning (c);
-	    expr = integer_zerop (c) ? e3 : e2;
+	    
+	    if (integer_zerop (c))
+	      {
+		expr.value = e3value;
+		expr.original_type = VEC_index (tree, orig_types, 2);
+	      }
+	    else
+	      {
+		expr.value = e2value;
+		expr.original_type = VEC_index (tree, orig_types, 1);
+	      }
+
+	    break;
 	  }
-	  break;
 	case RID_TYPES_COMPATIBLE_P:
 	  c_parser_consume_token (parser);
 	  if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
@@ -6410,57 +6464,96 @@ c_parser_postfix_expression (c_parser *p
 	  }
 	  break;
 	case RID_BUILTIN_COMPLEX:
-	  c_parser_consume_token (parser);
-	  if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
-	    {
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  loc = c_parser_peek_token (parser)->location;
-	  e1 = c_parser_expr_no_commas (parser, NULL);
-	  if (!c_parser_require (parser, CPP_COMMA, "expected %<,%>"))
-	    {
-	      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  e2 = c_parser_expr_no_commas (parser, NULL);
-	  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
-				     "expected %<)%>");
-	  mark_exp_read (e1.value);
-	  if (TREE_CODE (e1.value) == EXCESS_PRECISION_EXPR)
-	    e1.value = convert (TREE_TYPE (e1.value),
-				TREE_OPERAND (e1.value, 0));
-	  mark_exp_read (e2.value);
-	  if (TREE_CODE (e2.value) == EXCESS_PRECISION_EXPR)
-	    e2.value = convert (TREE_TYPE (e2.value),
-				TREE_OPERAND (e2.value, 0));
-	  if (!SCALAR_FLOAT_TYPE_P (TREE_TYPE (e1.value))
-	      || DECIMAL_FLOAT_TYPE_P (TREE_TYPE (e1.value))
-	      || !SCALAR_FLOAT_TYPE_P (TREE_TYPE (e2.value))
-	      || DECIMAL_FLOAT_TYPE_P (TREE_TYPE (e2.value)))
-	    {
-	      error_at (loc, "%<__builtin_complex%> operand "
-			"not of real binary floating-point type");
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  if (TYPE_MAIN_VARIANT (TREE_TYPE (e1.value))
-	      != TYPE_MAIN_VARIANT (TREE_TYPE (e2.value)))
-	    {
-	      error_at (loc,
-			"%<__builtin_complex%> operands of different types");
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  if (!flag_isoc99)
-	    pedwarn (loc, OPT_pedantic,
-		     "ISO C90 does not support complex types");
-	  expr.value = build2 (COMPLEX_EXPR,
-			       build_complex_type (TYPE_MAIN_VARIANT
-						   (TREE_TYPE (e1.value))),
-			       e1.value, e2.value);
-	  break;
+	  { 
+	    VEC(tree,gc) *expr_list;
+	    tree e1value, e2value;
+	    
+	    c_parser_consume_token (parser);
+	    if (!c_parser_get_builtin_args (parser,
+					    "__builtin_complex",
+					    &expr_list, NULL))
+	      {
+		expr.value = error_mark_node;
+		break;
+	      }
+
+	    if (VEC_length (tree, expr_list) != 2)
+	      {
+		error_at (loc, "wrong number of arguments to "
+			       "%<__builtin_complex%>");
+		expr.value = error_mark_node;
+		break;
+	      }
+	    
+	    e1value = VEC_index (tree, expr_list, 0);
+	    e2value = VEC_index (tree, expr_list, 1);
+
+	    mark_exp_read (e1value);
+	    if (TREE_CODE (e1value) == EXCESS_PRECISION_EXPR)
+	      e1value = convert (TREE_TYPE (e1value),
+				 TREE_OPERAND (e1value, 0));
+	    mark_exp_read (e2value);
+	    if (TREE_CODE (e2value) == EXCESS_PRECISION_EXPR)
+	      e2value = convert (TREE_TYPE (e2value),
+				 TREE_OPERAND (e2value, 0));
+	    if (!SCALAR_FLOAT_TYPE_P (TREE_TYPE (e1value))
+		|| DECIMAL_FLOAT_TYPE_P (TREE_TYPE (e1value))
+		|| !SCALAR_FLOAT_TYPE_P (TREE_TYPE (e2value))
+		|| DECIMAL_FLOAT_TYPE_P (TREE_TYPE (e2value)))
+	      {
+		error_at (loc, "%<__builtin_complex%> operand "
+			  "not of real binary floating-point type");
+		expr.value = error_mark_node;
+		break;
+	      }
+	    if (TYPE_MAIN_VARIANT (TREE_TYPE (e1value))
+		!= TYPE_MAIN_VARIANT (TREE_TYPE (e2value)))
+	      {
+		error_at (loc,
+			  "%<__builtin_complex%> operands of different types");
+		expr.value = error_mark_node;
+		break;
+	      }
+	    if (!flag_isoc99)
+	      pedwarn (loc, OPT_pedantic,
+		       "ISO C90 does not support complex types");
+	    expr.value = build2 (COMPLEX_EXPR,
+				 build_complex_type (TYPE_MAIN_VARIANT
+						     (TREE_TYPE (e1value))),
+				 e1value, e2value);
+	    break;
+	  }
+	case RID_BUILTIN_SHUFFLE:
+	  {
+	    VEC(tree,gc) *expr_list;
+	    
+	    c_parser_consume_token (parser);
+	    if (!c_parser_get_builtin_args (parser,
+					    "__builtin_shuffle",
+					    &expr_list, NULL))
+	      {
+		expr.value = error_mark_node;
+		break;
+	      }
+
+	    if (VEC_length (tree, expr_list) == 2)
+	      expr.value = c_build_vec_shuffle_expr
+				(loc, VEC_index (tree, expr_list, 0),
+				 NULL_TREE,
+				 VEC_index (tree, expr_list, 1));
+	    else if (VEC_length (tree, expr_list) == 3)
+	      expr.value = c_build_vec_shuffle_expr
+				(loc, VEC_index (tree, expr_list, 0),
+				 VEC_index (tree, expr_list, 1),
+				 VEC_index (tree, expr_list, 2));
+	    else
+	      {
+		error_at (loc, "wrong number of arguments to "
+			       "%<__builtin_shuffle%>");
+		expr.value = error_mark_node;
+	      }
+	    break;
+	  }
 	case RID_AT_SELECTOR:
 	  gcc_assert (c_dialect_objc ());
 	  c_parser_consume_token (parser);
Index: gcc/config/i386/sse.md
===================================================================
--- gcc/config/i386/sse.md	(revision 178354)
+++ gcc/config/i386/sse.md	(working copy)
@@ -231,6 +231,12 @@ (define_mode_attr sseinsnmode
    (V4SF "V4SF") (V2DF "V2DF")
    (TI "TI") (V32QI "OI") (V16HI "OI") (V8SI "OI") (V4DI "OI")])
 
+;; All 128bit vector modes
+(define_mode_attr sseshuffint
+  [(V16QI "V16QI") (V8HI "V8HI")
+   (V4SI "V4SI")  (V2DI "V2DI")
+   (V4SF "V4SI") (V2DF "V2DI")])
+
 ;; Mapping of vector float modes to an integer mode of the same size
 (define_mode_attr sseintvecmode
   [(V8SF "V8SI") (V4DF "V4DI")
@@ -6234,6 +6240,18 @@ (define_expand "vconduv2di"
   DONE;
 })
 
+(define_expand "vshuffle<mode>"
+  [(match_operand:V_128 0 "register_operand" "")
+   (match_operand:V_128 1 "general_operand" "")
+   (match_operand:<sseshuffint> 2 "general_operand" "")]
+  "TARGET_SSSE3 || TARGET_AVX"
+{
+  bool ok = ix86_expand_vshuffle (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel bitwise logical operations
Index: gcc/config/i386/i386-protos.h
===================================================================
--- gcc/config/i386/i386-protos.h	(revision 178354)
+++ gcc/config/i386/i386-protos.h	(working copy)
@@ -118,6 +118,7 @@ extern bool ix86_expand_int_movcc (rtx[]
 extern bool ix86_expand_fp_movcc (rtx[]);
 extern bool ix86_expand_fp_vcond (rtx[]);
 extern bool ix86_expand_int_vcond (rtx[]);
+extern bool ix86_expand_vshuffle (rtx[]);
 extern void ix86_expand_sse_unpack (rtx[], bool, bool);
 extern bool ix86_expand_int_addcc (rtx[]);
 extern rtx ix86_expand_call (rtx, rtx, rtx, rtx, rtx, bool);
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 178354)
+++ gcc/config/i386/i386.c	(working copy)
@@ -18693,6 +18693,93 @@ ix86_expand_int_vcond (rtx operands[])
   return true;
 }
 
+bool
+ix86_expand_vshuffle (rtx operands[])
+{
+  rtx target = operands[0];
+  rtx op0 = operands[1];
+  rtx mask = operands[2];
+  rtx mm, vt, cv0, t1;
+  enum machine_mode mode = GET_MODE (op0);
+  enum machine_mode maskmode = GET_MODE (mask);
+  enum machine_mode maskinner = GET_MODE_INNER (mode);
+  rtx vec[16];
+  int w, i, j;
+
+  gcc_assert ((TARGET_SSSE3 || TARGET_AVX) && GET_MODE_BITSIZE (mode) == 128);
+
+  op0 = force_reg (mode, op0);
+  mask = force_reg (maskmode, mask);
+
+  /* Number of elements in the vector.  */
+  w = GET_MODE_BITSIZE (maskmode) / GET_MODE_BITSIZE (maskinner);
+
+  /* mask = mask & {w-1, w-1, w-1,...} */
+  for (i = 0; i < w; i++)
+    vec[i] = GEN_INT (w - 1);
+
+  mm = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (w, vec));
+  mask = expand_simple_binop (maskmode, AND, mask, mm,
+			      NULL_RTX, 0, OPTAB_DIRECT);
+
+  /* Convert mask to vector of chars.  */
+  mask = simplify_gen_subreg (V16QImode, mask, maskmode, 0);
+  mask = force_reg (V16QImode, mask);
+
+
+  /* Build a helper mask wich we will use in pshufb
+     (v4si) --> {0,0,0,0, 4,4,4,4, 8,8,8,8, 12,12,12,12}
+     (v8hi) --> {0,0, 2,2, 4,4, 6,6, ...}
+     ...  */
+  for (i = 0; i < w; i++)
+    for (j = 0; j < 16/w; j++)
+      vec[i*w+j] = GEN_INT (i*16/w);
+
+  vt = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
+  vt = force_reg (V16QImode, vt);
+
+  t1 = gen_reg_rtx (V16QImode);
+  emit_insn (gen_ssse3_pshufbv16qi3 (t1, mask, vt));
+  mm = t1;
+
+  /* MM contains now something like
+     mm = {m[0], .., m[0], m[k], .., m[k], ... }, where
+     m[i] is an index of the element in the vector we are
+     selecting from.
+
+     Convert it into the byte positions by doing
+     mm = mm * {16/w, 16/w, ...}
+     mm = mm + {0,1,..,16/w, 0,1,..,16/w, ...}  */
+  for (i = 0; i < 16; i++)
+    vec[i] = GEN_INT (16/w);
+
+  cv0 = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
+  mm = expand_simple_binop (V16QImode, MULT, mm, cv0,
+			    NULL_RTX, 0, OPTAB_DIRECT);
+
+  for (i = 0; i < w; i++)
+    for (j = 0; j < 16/w; j++)
+      vec[i*w+j] = GEN_INT (j);
+
+  cv0 = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
+  mm = expand_simple_binop (V16QImode, PLUS, mm, cv0,
+			    NULL_RTX, 0, OPTAB_DIRECT);
+
+  t1 = gen_reg_rtx (V16QImode);
+
+  /* Convert OP0 to vector of chars.  */
+  op0 = simplify_gen_subreg (V16QImode, op0, mode, 0);
+  op0 = force_reg (V16QImode, op0);
+  emit_insn (gen_ssse3_pshufbv16qi3 (t1, op0, mm));
+
+  /* Convert it back from vector of chars to the original mode.  */
+  t1 = simplify_gen_subreg (mode, t1, V16QImode, 0);
+
+  emit_insn (gen_rtx_SET (VOIDmode, target, t1));
+
+  return true;
+}
+
 /* Unpack OP[1] into the next wider integer vector type.  UNSIGNED_P is
    true if we should do zero extension, else sign extension.  HIGH_P is
    true if we want the N/2 high elements, else the low elements.  */
@@ -30911,6 +30998,9 @@ struct expand_vec_perm_d
 
 static bool expand_vec_perm_1 (struct expand_vec_perm_d *d);
 static bool expand_vec_perm_broadcast_1 (struct expand_vec_perm_d *d);
+static int extract_vec_perm_cst (struct expand_vec_perm_d *, tree);
+static bool ix86_vectorize_builtin_vec_perm_ok (tree vec_type, tree mask);
+
 
 /* Get a vector mode of the same size as the original but with elements
    twice as wide.  This is only guaranteed to apply to integral vectors.  */
@@ -32417,7 +32507,7 @@ void ix86_emit_i387_round (rtx op0, rtx
   res = gen_reg_rtx (outmode);
 
   half = CONST_DOUBLE_FROM_REAL_VALUE (dconsthalf, inmode);
-  
+
   /* round(a) = sgn(a) * floor(fabs(a) + 0.5) */
 
   /* scratch = fxam(op1) */
@@ -34576,10 +34666,10 @@ ix86_vectorize_builtin_vec_perm_ok (tree
 
   vec_mask = extract_vec_perm_cst (&d, mask);
 
-  /* This hook is cannot be called in response to something that the
-     user does (unlike the builtin expander) so we shouldn't ever see
-     an error generated from the extract.  */
-  gcc_assert (vec_mask > 0 && vec_mask <= 3);
+  /* Check whether the mask can be applied to the vector type.  */
+  if (vec_mask < 0 || vec_mask > 3)
+    return false;
+
   one_vec = (vec_mask != 3);
 
   /* Implementable with shufps or pshufd.  */
Index: gcc/tree-ssa-operands.c
===================================================================
--- gcc/tree-ssa-operands.c	(revision 178354)
+++ gcc/tree-ssa-operands.c	(working copy)
@@ -943,6 +943,7 @@ get_expr_operands (gimple stmt, tree *ex
 
     case COND_EXPR:
     case VEC_COND_EXPR:
+    case VEC_SHUFFLE_EXPR:
       get_expr_operands (stmt, &TREE_OPERAND (expr, 0), uflags);
       get_expr_operands (stmt, &TREE_OPERAND (expr, 1), uflags);
       get_expr_operands (stmt, &TREE_OPERAND (expr, 2), uflags);

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-09-29 11:16                                         ` Artem Shinkarov
@ 2011-09-29 17:22                                           ` Richard Henderson
  2011-09-30 20:34                                             ` Artem Shinkarov
  0 siblings, 1 reply; 71+ messages in thread
From: Richard Henderson @ 2011-09-29 17:22 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Joseph S. Myers, Richard Guenther, Duncan Sands, gcc-patches

On 09/29/2011 03:44 AM, Artem Shinkarov wrote:
> Here is a new version of the patch which hopefully fixes all the
> formatting issues and uses expand_simple_binop instead of force_reg in
> binary operations.
> 
> Ok?

Well, it's certainly not perfect by any means.  But I guess I can fix
things up myself once this patch is applied.


r~

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-09-29 17:22                                           ` Richard Henderson
@ 2011-09-30 20:34                                             ` Artem Shinkarov
  2011-09-30 20:44                                               ` Richard Henderson
  0 siblings, 1 reply; 71+ messages in thread
From: Artem Shinkarov @ 2011-09-30 20:34 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Joseph S. Myers, Richard Guenther, Duncan Sands, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 523 bytes --]

On Thu, Sep 29, 2011 at 4:22 PM, Richard Henderson <rth@redhat.com> wrote:
> On 09/29/2011 03:44 AM, Artem Shinkarov wrote:
>> Here is a new version of the patch which hopefully fixes all the
>> formatting issues and uses expand_simple_binop instead of force_reg in
>> binary operations.
>>
>> Ok?
>
> Well, it's certainly not perfect by any means.  But I guess I can fix
> things up myself once this patch is applied.
>
>
> r~
>

I hope that the new version looks a little bit better.


Thanks,
Artem.

[-- Attachment #2: vec-shuffle.v18.diff --]
[-- Type: text/plain, Size: 66288 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 178354)
+++ gcc/doc/extend.texi	(working copy)
@@ -6561,6 +6561,32 @@ invoke undefined behavior at runtime.  W
 accesses for vector subscription can be enabled with
 @option{-Warray-bounds}.
 
+Vector shuffling is available using functions
+@code{__builtin_shuffle (vec, mask)} and
+@code{__builtin_shuffle (vec0, vec1, mask)}. Both functions construct
+a permutation of elements from one or two vectors and return a vector
+of the same type as input vector(s). The mask is a vector of
+integer-typed elements. The size of each element of the mask must be
+the same as the size of each input vector element. The number of
+elements in input vector(s) and mask must be the same.
+
+The elements of the input vectors are numbered from left to right across
+one or both of the vectors. Each element in the mask specifies a number
+of element from the input vector(s). Consider the following example.
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{5,6,7,8@};
+v4si mask1 = @{0,1,1,3@};
+v4si mask2 = @{0,4,2,5@};
+v4si res;
+
+res = __builtin_shuffle (a, mask1);       /* res is @{1,2,2,4@}  */
+res = __builtin_shuffle (a, b, mask2);    /* res is @{1,5,3,6@}  */
+@end smallexample
+
 You can declare variables and use them in function calls and returns, as
 well as in assignments and some casts.  You can specify a vector type as
 a return type for a function.  Vector types can also be used as function
Index: gcc/tree-pretty-print.c
===================================================================
--- gcc/tree-pretty-print.c	(revision 178354)
+++ gcc/tree-pretty-print.c	(working copy)
@@ -2067,6 +2067,16 @@ dump_generic_node (pretty_printer *buffe
       dump_generic_node (buffer, TREE_OPERAND (node, 2), spc, flags, false);
       pp_string (buffer, " > ");
       break;
+    
+    case VEC_SHUFFLE_EXPR:
+      pp_string (buffer, " VEC_SHUFFLE_EXPR < ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
+      pp_string (buffer, " , ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
+      pp_string (buffer, " , ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 2), spc, flags, false);
+      pp_string (buffer, " > ");
+      break;
 
     case DOT_PROD_EXPR:
       pp_string (buffer, " DOT_PROD_EXPR < ");
Index: gcc/c-family/c-common.c
===================================================================
--- gcc/c-family/c-common.c	(revision 178354)
+++ gcc/c-family/c-common.c	(working copy)
@@ -425,6 +425,7 @@ const struct c_common_resword c_common_r
   { "__attribute__",	RID_ATTRIBUTE,	0 },
   { "__builtin_choose_expr", RID_CHOOSE_EXPR, D_CONLY },
   { "__builtin_complex", RID_BUILTIN_COMPLEX, D_CONLY },
+  { "__builtin_shuffle", RID_BUILTIN_SHUFFLE, D_CONLY },
   { "__builtin_offsetof", RID_OFFSETOF, 0 },
   { "__builtin_types_compatible_p", RID_TYPES_COMPATIBLE_P, D_CONLY },
   { "__builtin_va_arg",	RID_VA_ARG,	0 },
Index: gcc/c-family/c-common.h
===================================================================
--- gcc/c-family/c-common.h	(revision 178354)
+++ gcc/c-family/c-common.h	(working copy)
@@ -103,7 +103,7 @@ enum rid
   /* C extensions */
   RID_ASM,       RID_TYPEOF,   RID_ALIGNOF,  RID_ATTRIBUTE,  RID_VA_ARG,
   RID_EXTENSION, RID_IMAGPART, RID_REALPART, RID_LABEL,      RID_CHOOSE_EXPR,
-  RID_TYPES_COMPATIBLE_P,      RID_BUILTIN_COMPLEX,
+  RID_TYPES_COMPATIBLE_P,      RID_BUILTIN_COMPLEX,	     RID_BUILTIN_SHUFFLE,
   RID_DFLOAT32, RID_DFLOAT64, RID_DFLOAT128,
   RID_FRACT, RID_ACCUM,
 
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	(revision 178354)
+++ gcc/optabs.c	(working copy)
@@ -6620,6 +6620,92 @@ vector_compare_rtx (tree cond, bool unsi
   return gen_rtx_fmt_ee (rcode, VOIDmode, ops[0].value, ops[1].value);
 }
 
+/* Return true if VEC_SHUFFLE_EXPR can be expanded using SIMD extensions
+   of the CPU.  */
+bool
+expand_vec_shuffle_expr_p (enum machine_mode mode, tree v0, tree v1, tree mask)
+{
+  int v0_mode_s = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (v0))));
+  int mask_mode_s = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (mask))));
+
+  if (TREE_CODE (mask) == VECTOR_CST
+      && targetm.vectorize.builtin_vec_perm_ok (TREE_TYPE (v0), mask))
+    return true;
+
+  if (v0_mode_s != mask_mode_s
+      || TYPE_VECTOR_SUBPARTS (TREE_TYPE (v0))
+	 != TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask)))
+    return false;
+
+  return direct_optab_handler (vshuffle_optab, mode) != CODE_FOR_nothing;
+}
+
+/* Generate instructions for VEC_COND_EXPR given its type and three
+   operands.  */
+rtx
+expand_vec_shuffle_expr (tree type, tree v0, tree v1, tree mask, rtx target)
+{
+  struct expand_operand ops[4];
+  enum insn_code icode;
+  enum machine_mode mode = TYPE_MODE (type);
+  rtx rtx_v0, rtx_mask;
+
+  gcc_assert (expand_vec_shuffle_expr_p (mode, v0, v1, mask));
+
+  if (TREE_CODE (mask) == VECTOR_CST)
+    {
+      tree m_type, call;
+      tree fn = targetm.vectorize.builtin_vec_perm (TREE_TYPE (v0), &m_type);
+
+      if (!fn)
+	goto vshuffle;
+
+      if (m_type != TREE_TYPE (TREE_TYPE (mask)))
+	{
+	  int units = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask));
+	  tree cvt = build_vector_type (m_type, units);
+	  mask = fold_convert (cvt, mask);
+	}
+
+      call = fold_build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (fn)), fn);
+      call = build_call_nary (type, call, 3, v0, v1, mask);
+
+      return expand_expr_real_1 (call, target, VOIDmode, EXPAND_NORMAL, NULL);
+    }
+
+vshuffle:
+  icode = direct_optab_handler (vshuffle_optab, mode);
+
+  if (icode == CODE_FOR_nothing)
+    return 0;
+
+  rtx_v0 = expand_normal (v0);
+  rtx_mask = expand_normal (mask);
+
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[3], rtx_mask, mode);
+
+  if (operand_equal_p (v0, v1, 0))
+    {
+      rtx_v0 = expand_normal (v0);
+      if (!insn_operand_matches(icode, 1, rtx_v0))
+        rtx_v0 = force_reg (mode, rtx_v0);
+
+      gcc_checking_assert(insn_operand_matches(icode, 2, rtx_v0));
+
+      create_fixed_operand (&ops[1], rtx_v0);
+      create_fixed_operand (&ops[2], rtx_v0);
+    }
+  else
+    {
+      create_input_operand (&ops[1], expand_normal (v0), mode);
+      create_input_operand (&ops[2], expand_normal (v1), mode);
+    }
+
+  expand_insn (icode, 4, ops);
+  return ops[0].value;
+}
+
 /* Return insn code for TYPE, the type of a VEC_COND_EXPR.  */
 
 static inline enum insn_code
Index: gcc/optabs.h
===================================================================
--- gcc/optabs.h	(revision 178354)
+++ gcc/optabs.h	(working copy)
@@ -636,6 +636,9 @@ enum direct_optab_index
   DOI_vcond,
   DOI_vcondu,
 
+  /* Vector shuffling.  */
+  DOI_vshuffle,
+
   /* Block move operation.  */
   DOI_movmem,
 
@@ -701,6 +704,7 @@ typedef struct direct_optab_d *direct_op
 #define reload_out_optab (&direct_optab_table[(int) DOI_reload_out])
 #define vcond_optab (&direct_optab_table[(int) DOI_vcond])
 #define vcondu_optab (&direct_optab_table[(int) DOI_vcondu])
+#define vshuffle_optab (&direct_optab_table[(int) DOI_vshuffle])
 #define movmem_optab (&direct_optab_table[(int) DOI_movmem])
 #define setmem_optab (&direct_optab_table[(int) DOI_setmem])
 #define cmpstr_optab (&direct_optab_table[(int) DOI_cmpstr])
@@ -879,8 +883,15 @@ extern rtx expand_widening_mult (enum ma
 /* Return tree if target supports vector operations for COND_EXPR.  */
 bool expand_vec_cond_expr_p (tree, enum machine_mode);
 
+/* Return tree if target supports vector operations for VEC_SHUFFLE_EXPR.  */
+bool expand_vec_shuffle_expr_p (enum machine_mode, tree, tree, tree);
+
 /* Generate code for VEC_COND_EXPR.  */
 extern rtx expand_vec_cond_expr (tree, tree, tree, tree, rtx);
+
+/* Generate code for VEC_SHUFFLE_EXPR.  */
+extern rtx expand_vec_shuffle_expr (tree, tree, tree, tree, rtx);
+
 /* Generate code for VEC_LSHIFT_EXPR and VEC_RSHIFT_EXPR.  */
 extern rtx expand_vec_shift_expr (sepops, rtx);
 
Index: gcc/genopinit.c
===================================================================
--- gcc/genopinit.c	(revision 178354)
+++ gcc/genopinit.c	(working copy)
@@ -255,6 +255,7 @@ static const char * const optabs[] =
   "set_optab_handler (vec_realign_load_optab, $A, CODE_FOR_$(vec_realign_load_$a$))",
   "set_direct_optab_handler (vcond_optab, $A, CODE_FOR_$(vcond$a$))",
   "set_direct_optab_handler (vcondu_optab, $A, CODE_FOR_$(vcondu$a$))",
+  "set_direct_optab_handler (vshuffle_optab, $A, CODE_FOR_$(vshuffle$a$))",
   "set_optab_handler (ssum_widen_optab, $A, CODE_FOR_$(widen_ssum$I$a3$))",
   "set_optab_handler (usum_widen_optab, $A, CODE_FOR_$(widen_usum$I$a3$))",
   "set_optab_handler (udot_prod_optab, $A, CODE_FOR_$(udot_prod$I$a$))",
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c	(revision 0)
@@ -0,0 +1,44 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shuf2compare(type, count, vres, v0, v1, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != ((vidx(type, mask, __i) < count) ? \
+                          vidx(type, v0, vidx(type, mask, __i)) :  \
+                          vidx(type, v1, (vidx(type, mask, __i) - count)))) \
+            __builtin_abort (); \
+        } \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+    vector (8, short) v0 = {5, 5,5,5,5,5,argc,7};
+    vector (8, short) v1 = {argc, 1,8,8,4,9,argc,4};
+    vector (8, short) v2;
+
+    //vector (8, short) mask = {1,2,5,4,3,6,7};
+    
+    vector (8, short) mask0 = {0,2,3,1,4,5,6,7};
+    vector (8, short) mask1 = {0,12,3,4,3,0,10,9};
+    
+    vector (8, short) mask2 = {0,8,1,9,2,10,3,11};
+
+    v2 = __builtin_shuffle (v0, v1,  mask0);
+    shuf2compare (short, 8, v2, v0, v1, mask0);
+ 
+    v2 = __builtin_shuffle (v0, v1,  mask1);
+    shuf2compare (short, 8, v2, v0, v1, mask1);
+    
+    v2 = __builtin_shuffle (v0, v1,  mask2);
+    shuf2compare (short, 8, v2, v0, v1, mask2);
+
+    v2 = __builtin_shuffle (mask0, mask0,  v0);
+    shuf2compare (short, 8, v2, mask0, mask0, v0);
+
+    return 0; 
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c	(revision 0)
@@ -0,0 +1,50 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shuf2compare(type, count, vres, v0, v1, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != ((vidx(type, mask, __i) < count) ? \
+                          vidx(type, v0, vidx(type, mask, __i)) :  \
+                          vidx(type, v1, (vidx(type, mask, __i) - count)))) \
+            __builtin_abort (); \
+        } \
+} while (0)
+
+
+vector (8, short) __attribute__ ((noinline))
+f (vector (8, short) x, vector (8, short) y, vector (8, short) mask) {
+    return __builtin_shuffle (x, y, mask);
+}
+
+
+
+int main (int argc, char *argv[]) {
+    vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+
+    //vector (8, short) mask = {1,2,5,4,3,6,7};
+    
+    vector (8, short) mask0 = {0,2,3,1,4,5,6,7};
+    vector (8, short) mask1 = {0,12,3,4,3,0,10,9};
+    vector (8, short) mask2 = {0,8,1,9,2,10,3,11};
+
+    v2 = f (v0, v1,  mask0);
+    shuf2compare (short, 8, v2, v0, v1, mask0);
+ 
+    v2 = f (v0, v1,  mask1);
+    shuf2compare (short, 8, v2, v0, v1, mask1);
+
+    v2 = f (v0, v1,  mask2);
+    shuf2compare (short, 8, v2, v0, v1, mask2);
+
+    v2 = f (mask0, mask0,  v0);
+    shuf2compare (short, 8, v2, mask0, mask0, v0);
+
+    return 0; 
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c	(revision 0)
@@ -0,0 +1,46 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shufcompare(type, count, vres, v0, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != vidx(type, v0, vidx(type, mask, __i))) \
+            __builtin_abort (); \
+    } \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+    /*vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+   
+    vector (8, short) smask = {0,0,1,2,3,4,5,6};
+    
+    v2 = __builtin_shuffle (v0,  smask);
+    shufcompare (short, 8, v2, v0, smask);
+    v2 = __builtin_shuffle (v0, v1);
+    shufcompare (short, 8, v2, v0, v1);
+    v2 = __builtin_shuffle (smask, v0);
+    shufcompare (short, 8, v2, smask, v0);*/
+
+    vector (4, int) i0 = {argc, 1,2,3};
+    vector (4, int) i1 = {argc, 1, argc, 3};
+    vector (4, int) i2;
+
+    vector (4, int) imask = {0,3,2,1};
+
+    /*i2 = __builtin_shuffle (i0, imask);
+    shufcompare (int, 4, i2, i0, imask);*/
+    i2 = __builtin_shuffle (i0, i1);
+    shufcompare (int, 4, i2, i0, i1);
+    
+    i2 = __builtin_shuffle (imask, i0);
+    shufcompare (int, 4, i2, imask, i0);
+    
+    return 0;
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c	(revision 0)
@@ -0,0 +1,36 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shufcompare(type, count, vres, v0, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != vidx(type, v0, vidx(type, mask, __i))) \
+            __builtin_abort (); \
+    } \
+} while (0)
+
+vector (8, short) __attribute__ ((noinline))
+f (vector (8, short) x, vector (8, short) mask) {
+    return __builtin_shuffle (x, mask);
+}
+
+
+int main (int argc, char *argv[]) {
+    vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+
+    vector (8, short) mask = {0,0,1,2,3,4,5,6};
+    
+    v2 = f (v0,  mask);
+    shufcompare (short, 8, v2, v0, mask);
+
+    v2 = f (v0, v1);
+    shufcompare (short, 8, v2, v0, v1);
+
+    return 0;
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c	(revision 0)
@@ -0,0 +1,64 @@
+/* Test that different type variants are compatible within
+   vector shuffling.  */
+
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define shufcompare(count, vres, v0, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vres[__i] != v0[mask[__i]]) \
+            __builtin_abort (); \
+    } \
+} while (0)
+
+#define test_compat_mask(res, vec, mask) \
+  res = __builtin_shuffle (vec, mask); \
+  shufcompare(4, res, vec, mask); \
+  res = __builtin_shuffle (vec, c ## mask); \
+  shufcompare(4, res, vec, c ##  mask); \
+  res = __builtin_shuffle (vec, r ## mask); \
+  shufcompare(4, res, vec, r ##  mask); \
+  res = __builtin_shuffle (vec, d ## mask); \
+  shufcompare(4, res, vec, d ##  mask); \
+  res = __builtin_shuffle (vec, dc ## mask); \
+  shufcompare(4, res, vec, dc ##  mask); \
+
+#define test_compat_vec(res, vec, mask) \
+  test_compat_mask (res, vec, mask); \
+  test_compat_mask (res, c ## vec, mask); \
+  test_compat_mask (res, r ## vec, mask); \
+  test_compat_mask (res, d ## vec, mask); \
+  test_compat_mask (res, dc ## vec, mask); 
+
+#define test_compat(res, vec, mask) \
+  test_compat_vec (res, vec, mask); \
+  test_compat_vec (d ## res, vec, mask); \
+  test_compat_vec (r ## res, vec, mask);
+
+typedef vector (4, int) v4si;
+typedef const vector (4, int) v4sicst;
+
+int main (int argc, char *argv[]) {
+    vector (4, int) vec = {argc, 1,2,3};
+    const vector (4, int) cvec = {argc, 1,2,3};
+    register vector (4, int) rvec = {argc, 1,2,3};
+    v4si dvec = {argc, 1,2,3};
+    v4sicst dcvec = {argc, 1,2,3};
+    
+    vector (4, int) res; 
+    v4si dres;
+    register vector (4, int) rres;
+
+    vector (4, int) mask = {0,3,2,1};
+    const vector (4, int) cmask = {0,3,2,1};
+    register vector (4, int) rmask = {0,3,2,1};
+    v4si dmask = {0,3,2,1};
+    v4sicst dcmask = {0,3,2,1};
+
+    test_compat (res, vec, mask);
+
+    return 0;
+}
+
Index: gcc/testsuite/gcc.dg/builtin-complex-err-1.c
===================================================================
--- gcc/testsuite/gcc.dg/builtin-complex-err-1.c	(revision 178354)
+++ gcc/testsuite/gcc.dg/builtin-complex-err-1.c	(working copy)
@@ -19,8 +19,8 @@ _Complex float fc3 = __builtin_complex (
 void
 f (void)
 {
-  __builtin_complex (0.0); /* { dg-error "expected" } */
-  __builtin_complex (0.0, 0.0, 0.0); /* { dg-error "expected" } */
+  __builtin_complex (0.0); /* { dg-error "wrong number of arguments" } */
+  __builtin_complex (0.0, 0.0, 0.0); /* { dg-error "wrong number of arguments" } */
 }
 
-void (*p) (void) = __builtin_complex; /* { dg-error "expected" } */
+void (*p) (void) = __builtin_complex; /* { dg-error "cannot take address" } */
Index: gcc/c-tree.h
===================================================================
--- gcc/c-tree.h	(revision 178354)
+++ gcc/c-tree.h	(working copy)
@@ -579,6 +579,7 @@ extern tree c_begin_omp_task (void);
 extern tree c_finish_omp_task (location_t, tree, tree);
 extern tree c_finish_omp_clauses (tree);
 extern tree c_build_va_arg (location_t, tree, tree);
+extern tree c_build_vec_shuffle_expr (location_t, tree, tree, tree);
 
 /* Set to 0 at beginning of a function definition, set to 1 if
    a return statement that specifies a return value is seen.  */
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 178354)
+++ gcc/expr.c	(working copy)
@@ -8605,6 +8605,10 @@ expand_expr_real_2 (sepops ops, rtx targ
     case VEC_PACK_FIX_TRUNC_EXPR:
       mode = TYPE_MODE (TREE_TYPE (treeop0));
       goto binop;
+    
+    case VEC_SHUFFLE_EXPR:
+      target = expand_vec_shuffle_expr (type, treeop0, treeop1, treeop2, target);
+      return target;
 
     case DOT_PROD_EXPR:
       {
Index: gcc/gimple-pretty-print.c
===================================================================
--- gcc/gimple-pretty-print.c	(revision 178354)
+++ gcc/gimple-pretty-print.c	(working copy)
@@ -417,6 +417,16 @@ dump_ternary_rhs (pretty_printer *buffer
       dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
       pp_string (buffer, ">");
       break;
+    
+    case VEC_SHUFFLE_EXPR:
+      pp_string (buffer, "VEC_SHUFFLE_EXPR <");
+      dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
+      pp_string (buffer, ">");
+      break;
 
     case REALIGN_LOAD_EXPR:
       pp_string (buffer, "REALIGN_LOAD <");
Index: gcc/c-typeck.c
===================================================================
--- gcc/c-typeck.c	(revision 178354)
+++ gcc/c-typeck.c	(working copy)
@@ -2307,7 +2307,7 @@ build_array_ref (location_t loc, tree ar
       if (TREE_CODE (TREE_TYPE (index)) != ARRAY_TYPE
 	  && TREE_CODE (TREE_TYPE (index)) != POINTER_TYPE)
 	{
-          error_at (loc, 
+          error_at (loc,
             "subscripted value is neither array nor pointer nor vector");
 
 	  return error_mark_node;
@@ -2339,8 +2339,8 @@ build_array_ref (location_t loc, tree ar
   index = default_conversion (index);
 
   gcc_assert (TREE_CODE (TREE_TYPE (index)) == INTEGER_TYPE);
-  
-  /* For vector[index], convert the vector to a 
+
+  /* For vector[index], convert the vector to a
      pointer of the underlying type.  */
   if (TREE_CODE (TREE_TYPE (array)) == VECTOR_TYPE)
     {
@@ -2348,11 +2348,11 @@ build_array_ref (location_t loc, tree ar
       tree type1;
 
       if (TREE_CODE (index) == INTEGER_CST)
-        if (!host_integerp (index, 1) 
-            || ((unsigned HOST_WIDE_INT) tree_low_cst (index, 1) 
+        if (!host_integerp (index, 1)
+            || ((unsigned HOST_WIDE_INT) tree_low_cst (index, 1)
                >= TYPE_VECTOR_SUBPARTS (TREE_TYPE (array))))
           warning_at (loc, OPT_Warray_bounds, "index value is out of bound");
-     
+
       c_common_mark_addressable_vec (array);
       type = build_qualified_type (TREE_TYPE (type), TYPE_QUALS (type));
       type = build_pointer_type (type);
@@ -2845,6 +2845,99 @@ build_function_call_vec (location_t loc,
     }
   return require_complete_type (result);
 }
+
+/* Build a VEC_SHUFFLE_EXPR if V0, V1 and MASK are not error_mark_nodes
+   and have vector types, V0 has the same type as V1, and the number of
+   elements of V0, V1, MASK is the same.
+
+   In case V1 is a NULL_TREE it is assumed that __builtin_shuffle was
+   called with two arguments.  In this case implementation passes the
+   first argument twice in order to share the same tree code.  This fact
+   could enable the mask-values being twice the vector length.  This is
+   an implementation accident and this semantics is not guaranteed to
+   the user.  */
+tree
+c_build_vec_shuffle_expr (location_t loc, tree v0, tree v1, tree mask)
+{
+  tree vec_shuffle, tmp;
+  bool wrap = true;
+  bool maybe_const = false;
+  bool two_arguments;
+
+  if (v1 == NULL_TREE)
+    {
+      two_arguments = true;
+      v1 = v0;
+    }
+
+  if (v0 == error_mark_node || v1 == error_mark_node
+      || mask == error_mark_node)
+    return error_mark_node;
+
+  if (TREE_CODE (TREE_TYPE (mask)) != VECTOR_TYPE
+      || TREE_CODE (TREE_TYPE (TREE_TYPE (mask))) != INTEGER_TYPE)
+    {
+      error_at (loc, "__builtin_shuffle last argument must "
+		     "be an integer vector");
+      return error_mark_node;
+    }
+
+  if (TREE_CODE (TREE_TYPE (v0)) != VECTOR_TYPE
+      || TREE_CODE (TREE_TYPE (v1)) != VECTOR_TYPE)
+    {
+      error_at (loc, "__builtin_shuffle arguments must be vectors");
+      return error_mark_node;
+    }
+
+  if (TYPE_MAIN_VARIANT (TREE_TYPE (v0)) != TYPE_MAIN_VARIANT (TREE_TYPE (v1)))
+    {
+      error_at (loc, "__builtin_shuffle argument vectors must be of "
+		     "the same type");
+      return error_mark_node;
+    }
+
+  if (TYPE_VECTOR_SUBPARTS (TREE_TYPE (v0))
+      != TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask))
+      && TYPE_VECTOR_SUBPARTS (TREE_TYPE (v1))
+	 != TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask)))
+    {
+      error_at (loc, "__builtin_shuffle number of elements of the "
+		     "argument vector(s) and the mask vector should "
+		     "be the same");
+      return error_mark_node;
+    }
+
+  if (GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (v0))))
+      != GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (mask)))))
+    {
+      error_at (loc, "__builtin_shuffle argument vector(s) inner type "
+		     "must have the same size as inner type of the mask");
+      return error_mark_node;
+    }
+
+  /* Avoid C_MAYBE_CONST_EXPRs inside VEC_SHUFFLE_EXPR.  */
+  tmp = c_fully_fold (v0, false, &maybe_const);
+  v0 = save_expr (tmp);
+  wrap &= maybe_const;
+
+  if (!two_arguments)
+    {
+      v1 = c_fully_fold (v1, false, &maybe_const);
+      wrap &= maybe_const;
+    }
+  else
+    v1 = v0;
+
+  mask = c_fully_fold (mask, false, &maybe_const);
+  wrap &= maybe_const;
+
+  vec_shuffle = build3 (VEC_SHUFFLE_EXPR, TREE_TYPE (v0), v0, v1, mask);
+
+  if (!wrap)
+    vec_shuffle = c_wrap_maybe_const (vec_shuffle, true);
+
+  return vec_shuffle;
+}
 \f
 /* Convert the argument expressions in the vector VALUES
    to the types in the list TYPELIST.
@@ -3167,7 +3260,7 @@ convert_arguments (tree typelist, VEC(tr
 
   if (typetail != 0 && TREE_VALUE (typetail) != void_type_node)
     {
-      error_at (input_location, 
+      error_at (input_location,
 		"too few arguments to function %qE", function);
       if (fundecl && !DECL_BUILT_IN (fundecl))
 	inform (DECL_SOURCE_LOCATION (fundecl), "declared here");
@@ -3566,7 +3659,7 @@ build_unary_op (location_t location,
 
       /* Complain about anything that is not a true lvalue.  In
 	 Objective-C, skip this check for property_refs.  */
-      if (!objc_is_property_ref (arg) 
+      if (!objc_is_property_ref (arg)
 	  && !lvalue_or_else (location,
 			      arg, ((code == PREINCREMENT_EXPR
 				     || code == POSTINCREMENT_EXPR)
@@ -3683,7 +3776,7 @@ build_unary_op (location_t location,
 	   need to ask Objective-C to build the increment or decrement
 	   expression for it.  */
 	if (objc_is_property_ref (arg))
-	  return objc_build_incr_expr_for_property_ref (location, code, 
+	  return objc_build_incr_expr_for_property_ref (location, code,
 							arg, inc);
 
 	/* Report a read-only lvalue.  */
@@ -5926,7 +6019,7 @@ void
 pedwarn_init (location_t location, int opt, const char *gmsgid)
 {
   char *ofwhat;
-  
+
   /* The gmsgid may be a format string with %< and %>. */
   pedwarn (location, opt, gmsgid);
   ofwhat = print_spelling ((char *) alloca (spelling_length () + 1));
@@ -9344,8 +9437,8 @@ scalar_to_vector (location_t loc, enum t
   tree type1 = TREE_TYPE (op1);
   bool integer_only_op = false;
   enum stv_conv ret = stv_firstarg;
-  
-  gcc_assert (TREE_CODE (type0) == VECTOR_TYPE 
+
+  gcc_assert (TREE_CODE (type0) == VECTOR_TYPE
 	      || TREE_CODE (type1) == VECTOR_TYPE);
   switch (code)
     {
@@ -9370,7 +9463,7 @@ scalar_to_vector (location_t loc, enum t
       case BIT_AND_EXPR:
 	integer_only_op = true;
 	/* ... fall through ...  */
-      
+
       case PLUS_EXPR:
       case MINUS_EXPR:
       case MULT_EXPR:
@@ -9387,7 +9480,7 @@ scalar_to_vector (location_t loc, enum t
 	  }
 
 	if (TREE_CODE (type0) == INTEGER_TYPE
-	    && TREE_CODE (TREE_TYPE (type1)) == INTEGER_TYPE) 
+	    && TREE_CODE (TREE_TYPE (type1)) == INTEGER_TYPE)
 	  {
 	    if (unsafe_conversion_p (TREE_TYPE (type1), op0, false))
 	      {
@@ -9399,7 +9492,7 @@ scalar_to_vector (location_t loc, enum t
 	  }
 	else if (!integer_only_op
 		    /* Allow integer --> real conversion if safe.  */
-		 && (TREE_CODE (type0) == REAL_TYPE 
+		 && (TREE_CODE (type0) == REAL_TYPE
 		     || TREE_CODE (type0) == INTEGER_TYPE)
 		 && SCALAR_FLOAT_TYPE_P (TREE_TYPE (type1)))
 	  {
@@ -9414,7 +9507,7 @@ scalar_to_vector (location_t loc, enum t
       default:
 	break;
     }
- 
+
   return stv_nothing;
 }
 \f
@@ -9529,8 +9622,8 @@ build_binary_op (location_t location, en
     int_const = int_const_or_overflow = false;
 
   /* Do not apply default conversion in mixed vector/scalar expression.  */
-  if (convert_p 
-      && !((TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE) 
+  if (convert_p
+      && !((TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE)
 	   != (TREE_CODE (TREE_TYPE (op1)) == VECTOR_TYPE)))
     {
       op0 = default_conversion (op0);
@@ -9608,7 +9701,7 @@ build_binary_op (location_t location, en
   if ((code0 == VECTOR_TYPE) != (code1 == VECTOR_TYPE))
     {
       enum stv_conv convert_flag = scalar_to_vector (location, code, op0, op1);
-      
+
       switch (convert_flag)
 	{
 	  case stv_error:
@@ -9949,7 +10042,7 @@ build_binary_op (location_t location, en
 	    {
 	      if (code == EQ_EXPR)
 		warning_at (location,
-			    OPT_Waddress, 
+			    OPT_Waddress,
 			    "the comparison will always evaluate as %<false%> "
 			    "for the address of %qD will never be NULL",
 			    TREE_OPERAND (op1, 0));
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	(revision 178354)
+++ gcc/gimplify.c	(working copy)
@@ -7286,6 +7286,7 @@ gimplify_expr (tree *expr_p, gimple_seq
 	  }
 
 	case FMA_EXPR:
+	case VEC_SHUFFLE_EXPR:
 	  /* Classified as tcc_expression.  */
 	  goto expr_3;
 
Index: gcc/tree.def
===================================================================
--- gcc/tree.def	(revision 178354)
+++ gcc/tree.def	(working copy)
@@ -497,6 +497,19 @@ DEFTREECODE (COND_EXPR, "cond_expr", tcc
 */
 DEFTREECODE (VEC_COND_EXPR, "vec_cond_expr", tcc_expression, 3)
 
+/* Vector shuffle expression.  A = VEC_SHUFFLE_EXPR<v0, v1, mask>
+   means
+
+   foreach i in length (mask):
+     A = mask[i] < length (v0) ? v0[mask[i]] : v1[mask[i] - length (mask)]
+
+   V0 and V1 are vectors of the same type.  MASK is an integer-typed
+   vector.  The number of MASK elements must be the same with the
+   number of elements in V0 and V1.  The size of the inner type
+   of the MASK and of the V0 and V1 must be the same.
+*/
+DEFTREECODE (VEC_SHUFFLE_EXPR, "vec_shuffle_expr", tcc_expression, 3)
+
 /* Declare local variables, including making RTL and allocating space.
    BIND_EXPR_VARS is a chain of VAR_DECL nodes for the variables.
    BIND_EXPR_BODY is the body, the expression to be computed using
Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c	(revision 178354)
+++ gcc/tree-inline.c	(working copy)
@@ -3285,6 +3285,7 @@ estimate_operator_cost (enum tree_code c
        ??? We may consider mapping RTL costs to this.  */
     case COND_EXPR:
     case VEC_COND_EXPR:
+    case VEC_SHUFFLE_EXPR:
 
     case PLUS_EXPR:
     case POINTER_PLUS_EXPR:
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	(revision 178354)
+++ gcc/tree-vect-generic.c	(working copy)
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "flags.h"
 #include "ggc.h"
+#include "diagnostic.h"
 
 /* Need to include rtl.h, expr.h, etc. for optabs.  */
 #include "expr.h"
@@ -326,10 +327,10 @@ uniform_vector_p (tree vec)
         }
       if (i != TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec)))
 	return NULL_TREE;
-      
+
       return first;
     }
-  
+
   return NULL_TREE;
 }
 
@@ -432,6 +433,263 @@ type_for_widest_vector_mode (enum machin
     }
 }
 
+
+/* Build a reference to the element of the vector VECT.  Function
+   returns either the element itself, either BIT_FIELD_REF, or an
+   ARRAY_REF expression.
+
+   GSI is requred to insert temporary variables while building a
+   refernece to the element of the vector VECT.
+
+   PTMPVEC is a pointer to the temporary variable for caching
+   purposes.  In case when PTMPVEC is NULL new temporary variable
+   will be created.  */
+static tree
+vector_element (gimple_stmt_iterator *gsi, tree vect, tree idx, tree *ptmpvec)
+{
+  tree type;
+  gimple asgn;
+  tree tmpvec;
+  tree arraytype;
+  bool need_asgn = true;
+
+  gcc_assert (TREE_CODE (TREE_TYPE (vect)) == VECTOR_TYPE);
+
+  type = TREE_TYPE (vect);
+  if (TREE_CODE (idx) == INTEGER_CST)
+    {
+      unsigned HOST_WIDE_INT index;
+
+      if (!host_integerp (idx, 1)
+           || (index = tree_low_cst (idx, 1)) > TYPE_VECTOR_SUBPARTS (type)-1)
+        return error_mark_node;
+
+      if (TREE_CODE (vect) == VECTOR_CST)
+        {
+	  unsigned i;
+	  tree vals = TREE_VECTOR_CST_ELTS (vect);
+	  for (i = 0; vals; vals = TREE_CHAIN (vals), ++i)
+	    if (i == index)
+	       return TREE_VALUE (vals);
+	  return error_mark_node;
+        }
+      else if (TREE_CODE (vect) == CONSTRUCTOR)
+        {
+          unsigned i;
+          VEC (constructor_elt, gc) *vals = CONSTRUCTOR_ELTS (vect);
+          constructor_elt *elt;
+
+          for (i = 0; VEC_iterate (constructor_elt, vals, i, elt); i++)
+            if (operand_equal_p (elt->index, idx, 0))
+              return elt->value;
+          return fold_convert (TREE_TYPE (type), integer_zero_node);
+        }
+      else if (TREE_CODE (vect) == SSA_NAME)
+        {
+	  tree size = TYPE_SIZE (TREE_TYPE (type));
+          tree pos = fold_build2 (MULT_EXPR, TREE_TYPE (idx), idx, size);
+          return fold_build3 (BIT_FIELD_REF, TREE_TYPE (type), vect, size, pos);
+        }
+      else
+        return error_mark_node;
+    }
+
+  if (!ptmpvec)
+    tmpvec = create_tmp_var (TREE_TYPE (vect), "vectmp");
+  else if (!*ptmpvec)
+    tmpvec = *ptmpvec = create_tmp_var (TREE_TYPE (vect), "vectmp");
+  else
+    {
+      tmpvec = *ptmpvec;
+      need_asgn = false;
+    }
+
+  if (need_asgn)
+    {
+      TREE_ADDRESSABLE (tmpvec) = 1;
+      asgn = gimple_build_assign (tmpvec, vect);
+      gsi_insert_before (gsi, asgn, GSI_SAME_STMT);
+    }
+
+  arraytype = build_array_type_nelts (TREE_TYPE (type),
+				      TYPE_VECTOR_SUBPARTS (TREE_TYPE (vect)));
+
+  return build4 (ARRAY_REF, TREE_TYPE (type),
+                 build1 (VIEW_CONVERT_EXPR, arraytype, tmpvec),
+                 idx, NULL_TREE, NULL_TREE);
+}
+
+/* Check if VEC_SHUFFLE_EXPR within the given setting is supported
+   by hardware, or lower it piecewise.  Function returns false when
+   the expression must be replaced with TRAP_RETURN, true otherwise.
+
+   When VEC_SHUFFLE_EXPR has the same first and second operands:
+   VEC_SHUFFLE_EXPR <v0, v0, mask> the lowered version would be
+   {v0[mask[0]], v0[mask[1]], ...}
+   MASK and V0 must have the same number of elements.
+
+   Otherwise VEC_SHUFFLE_EXPR <v0, v1, mask> is lowered to
+   {mask[0] < len(v0) ? v0[mask[0]] : v1[mask[0]], ...}
+   V0 and V1 must have the same type.  MASK, V0, V1 must have the
+   same number of arguments.  */
+static bool
+lower_vec_shuffle (gimple_stmt_iterator *gsi, location_t loc)
+{
+
+  gimple stmt = gsi_stmt (*gsi);
+  tree mask = gimple_assign_rhs3 (stmt);
+  tree vec0 = gimple_assign_rhs1 (stmt);
+  tree vec1 = gimple_assign_rhs2 (stmt);
+  unsigned els = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask));
+  tree type0 = TREE_TYPE (TREE_TYPE (vec0));
+  VEC(constructor_elt,gc) *v = NULL;
+  tree vectype, constr;
+  tree vec0tmp = NULL_TREE, masktmp = NULL_TREE;
+
+  if (expand_vec_shuffle_expr_p (TYPE_MODE (TREE_TYPE (vec0)), vec0, vec1, mask))
+    {
+      tree t;
+
+      t = gimplify_build3 (gsi, VEC_SHUFFLE_EXPR, TREE_TYPE (vec0),
+			   vec0, vec1, mask);
+      gimple_assign_set_rhs_from_tree (gsi, t);
+      /* Statement should be updated by callee.  */
+      return true;
+    }
+
+  if (operand_equal_p (vec0, vec1, 0))
+    {
+      unsigned i;
+      tree vec0tmp = NULL_TREE;
+
+      v = VEC_alloc (constructor_elt, gc, els);
+      for (i = 0; i < els; i++)
+        {
+          tree idxval, vecel, t;
+
+	  idxval = vector_element (gsi, mask, size_int (i), &masktmp);
+          if (idxval == error_mark_node)
+            {
+              if (warning_at (loc, 0, "Invalid shuffling mask index %i", i))
+		inform (loc, "if this code is reached the programm will abort");
+	      return false;
+            }
+
+	  vecel = vector_element (gsi, vec0, idxval, &vec0tmp);
+          if (vecel == error_mark_node)
+            {
+              if (warning_at (loc, 0, "Invalid shuffling arguments"))
+		inform (loc, "if this code is reached the programm will abort");
+	      return false;
+            }
+
+          t = force_gimple_operand_gsi (gsi, vecel, true,
+					NULL_TREE, true, GSI_SAME_STMT);
+          CONSTRUCTOR_APPEND_ELT (v, size_int (i), t);
+        }
+    }
+  else
+    {
+      unsigned i;
+      tree var = create_tmp_var (type0, "vecel");
+      tree vec1tmp = NULL_TREE;
+
+      v = VEC_alloc (constructor_elt, gc, els);
+      for (i = 0; i < els; i++)
+        {
+          tree idxval, idx1val, cond, elval0, elval1, condexpr, t, ssatmp;
+          tree vec0el, vec1el;
+          gimple asgn;
+
+          idxval = vector_element (gsi, mask, size_int (i), &masktmp);
+	  if (idxval == error_mark_node)
+            {
+              if (warning_at (loc, 0, "Invalid shuffling mask index %i", i))
+		inform (loc, "if this code is reached the programm will abort");
+	      return false;
+            }
+
+          if (TREE_CODE (idxval) == INTEGER_CST)
+            {
+              if (tree_int_cst_lt (idxval, size_int (els)))
+                {
+                  vec0el = vector_element (gsi, vec0, idxval, &vec0tmp);
+                  t = force_gimple_operand_gsi (gsi, vec0el,
+                                    true, NULL_TREE, true, GSI_SAME_STMT);
+                }
+              else if (tree_int_cst_lt (idxval, size_int (2*els)))
+                {
+                  idx1val = fold_build2 (MINUS_EXPR, TREE_TYPE (idxval),
+                        idxval, build_int_cst (TREE_TYPE (idxval), els));
+
+                  vec1el = vector_element (gsi, vec1, idx1val, &vec1tmp);
+                  t = force_gimple_operand_gsi (gsi, vec1el, true,
+						NULL_TREE, true, GSI_SAME_STMT);
+                }
+              else
+                {
+                  if (warning_at (loc, 0, "Invalid shuffling mask index %i", i))
+		    inform (loc, "if this code is reached the "
+				  "programm will abort");
+		  return false;
+                }
+            }
+          else
+            {
+
+              idx1val = fold_build2 (MINUS_EXPR, TREE_TYPE (idxval),
+                            idxval, build_int_cst (TREE_TYPE (idxval), els));
+              idx1val = force_gimple_operand_gsi (gsi, idx1val,
+                                true, NULL_TREE, true, GSI_SAME_STMT);
+              cond = fold_build2 (GT_EXPR, boolean_type_node, \
+                             idxval, fold_convert (type0, size_int (els - 1)));
+
+	      vec0el = vector_element (gsi, vec0, idxval, &vec0tmp);
+              if (vec0el == error_mark_node)
+                {
+                  if (warning_at (loc, 0, "Invalid shuffling arguments"))
+		    inform (loc, "if this code is reached the "
+				 "programm will abort");
+		  return false;
+                }
+
+              elval0 = force_gimple_operand_gsi (gsi, vec0el,
+                                true, NULL_TREE, true, GSI_SAME_STMT);
+
+	      vec1el = vector_element (gsi, vec1, idx1val, &vec1tmp);
+              if (vec1el == error_mark_node)
+                {
+                  if (warning_at (loc, 0, "Invalid shuffling arguments"))
+		    inform (loc, "if this code is reached the "
+				 "programm will abort");
+		  return false;
+                }
+
+              elval1 = force_gimple_operand_gsi (gsi, vec1el,
+                                true, NULL_TREE, true, GSI_SAME_STMT);
+
+              condexpr = fold_build3 (COND_EXPR, type0, cond, \
+                                      elval1, elval0);
+
+              t = force_gimple_operand_gsi (gsi, condexpr, true, \
+                                        NULL_TREE, true, GSI_SAME_STMT);
+            }
+
+          asgn = gimple_build_assign (var, t);
+          ssatmp = make_ssa_name (var, asgn);
+          gimple_assign_set_lhs (asgn, ssatmp);
+          gsi_insert_before (gsi, asgn, GSI_SAME_STMT);
+          CONSTRUCTOR_APPEND_ELT (v, size_int (i), ssatmp);
+        }
+    }
+
+  vectype = build_vector_type (type0, els);
+  constr = build_constructor (vectype, v);
+  gimple_assign_set_rhs_from_tree (gsi, constr);
+  /* Statement should be updated by callee.  */
+  return true;
+}
+
 /* Process one statement.  If we identify a vector operation, expand it.  */
 
 static void
@@ -451,6 +709,25 @@ expand_vector_operations_1 (gimple_stmt_
   code = gimple_assign_rhs_code (stmt);
   rhs_class = get_gimple_rhs_class (code);
 
+  if (code == VEC_SHUFFLE_EXPR)
+    {
+      if (!lower_vec_shuffle (gsi, gimple_location (stmt)))
+	{
+	  gimple new_stmt;
+	  tree vec0;
+
+	  vec0 = gimple_assign_rhs1 (stmt);
+	  new_stmt = gimple_build_call (built_in_decls[BUILT_IN_TRAP], 0);
+	  gsi_insert_before (gsi, new_stmt,  GSI_SAME_STMT);
+	  split_block (gimple_bb (new_stmt), new_stmt);
+	  new_stmt = gimple_build_assign (gimple_assign_lhs (stmt), vec0);
+	  gsi_replace (gsi, new_stmt, false);
+	}
+
+      gimple_set_modified (gsi_stmt (*gsi), true);
+      update_stmt (gsi_stmt (*gsi));
+    }
+
   if (rhs_class != GIMPLE_UNARY_RHS && rhs_class != GIMPLE_BINARY_RHS)
     return;
 
@@ -485,9 +762,9 @@ expand_vector_operations_1 (gimple_stmt_
     {
       bool vector_scalar_shift;
       op = optab_for_tree_code (code, type, optab_scalar);
-      
+
       /* Vector/Scalar shift is supported.  */
-      vector_scalar_shift = (op && (optab_handler (op, TYPE_MODE (type)) 
+      vector_scalar_shift = (op && (optab_handler (op, TYPE_MODE (type))
 				    != CODE_FOR_nothing));
 
       /* If the 2nd argument is vector, we need a vector/vector shift.
@@ -500,10 +777,10 @@ expand_vector_operations_1 (gimple_stmt_
           /* Check whether we have vector <op> {x,x,x,x} where x
              could be a scalar variable or a constant. Transform
              vector <op> {x,x,x,x} ==> vector <op> scalar.  */
-          if (vector_scalar_shift 
+          if (vector_scalar_shift
               && ((TREE_CODE (rhs2) == VECTOR_CST
 		   && (first = uniform_vector_p (rhs2)) != NULL_TREE)
-		  || (TREE_CODE (rhs2) == SSA_NAME 
+		  || (TREE_CODE (rhs2) == SSA_NAME
 		      && (def_stmt = SSA_NAME_DEF_STMT (rhs2))
 		      && gimple_assign_single_p (def_stmt)
 		      && (first = uniform_vector_p
@@ -516,14 +793,14 @@ expand_vector_operations_1 (gimple_stmt_
           else
             op = optab_for_tree_code (code, type, optab_vector);
         }
-    
+
       /* Try for a vector/scalar shift, and if we don't have one, see if we
          have a vector/vector shift */
       else if (!vector_scalar_shift)
 	{
 	  op = optab_for_tree_code (code, type, optab_vector);
 
-	  if (op && (optab_handler (op, TYPE_MODE (type)) 
+	  if (op && (optab_handler (op, TYPE_MODE (type))
 		     != CODE_FOR_nothing))
 	    {
 	      /* Transform vector <op> scalar => vector <op> {x,x,x,x}.  */
@@ -613,9 +890,9 @@ expand_vector_operations_1 (gimple_stmt_
    if it may need the bit-twiddling tricks implemented in this file.  */
 
 static bool
-gate_expand_vector_operations (void)
+gate_expand_vector_operations_ssa (void)
 {
-  return flag_tree_vectorize != 0;
+  return optimize == 0;
 }
 
 static unsigned int
@@ -648,7 +925,7 @@ struct gimple_opt_pass pass_lower_vector
  {
   GIMPLE_PASS,
   "veclower",				/* name */
-  0,					/* gate */
+  gate_expand_vector_operations_ssa,    /* gate */
   expand_vector_operations,		/* execute */
   NULL,					/* sub */
   NULL,					/* next */
@@ -661,6 +938,7 @@ struct gimple_opt_pass pass_lower_vector
   TODO_update_ssa	                /* todo_flags_finish */
     | TODO_verify_ssa
     | TODO_verify_stmts | TODO_verify_flow
+    | TODO_cleanup_cfg
  }
 };
 
@@ -669,7 +947,7 @@ struct gimple_opt_pass pass_lower_vector
  {
   GIMPLE_PASS,
   "veclower2",				/* name */
-  gate_expand_vector_operations,	/* gate */
+  0,	                                /* gate */
   expand_vector_operations,		/* execute */
   NULL,					/* sub */
   NULL,					/* next */
@@ -682,6 +960,7 @@ struct gimple_opt_pass pass_lower_vector
   TODO_update_ssa	                /* todo_flags_finish */
     | TODO_verify_ssa
     | TODO_verify_stmts | TODO_verify_flow
+    | TODO_cleanup_cfg
  }
 };
 
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 178354)
+++ gcc/Makefile.in	(working copy)
@@ -3178,7 +3178,7 @@ tree-vect-generic.o : tree-vect-generic.
     $(TM_H) $(TREE_FLOW_H) $(GIMPLE_H) tree-iterator.h $(TREE_PASS_H) \
     $(FLAGS_H) $(OPTABS_H) $(MACHMODE_H) $(EXPR_H) \
     langhooks.h $(FLAGS_H) $(DIAGNOSTIC_H) gt-tree-vect-generic.h $(GGC_H) \
-    coretypes.h insn-codes.h
+    coretypes.h insn-codes.h $(DIAGNOSTIC_H)
 df-core.o : df-core.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    insn-config.h $(RECOG_H) $(FUNCTION_H) $(REGS_H) alloc-pool.h \
    hard-reg-set.h $(BASIC_BLOCK_H) $(DF_H) $(BITMAP_H) sbitmap.h $(TIMEVAR_H) \
Index: gcc/gimple.c
===================================================================
--- gcc/gimple.c	(revision 178354)
+++ gcc/gimple.c	(working copy)
@@ -2615,6 +2615,7 @@ get_gimple_rhs_num_ops (enum tree_code c
       || (SYM) == WIDEN_MULT_MINUS_EXPR					    \
       || (SYM) == DOT_PROD_EXPR						    \
       || (SYM) == REALIGN_LOAD_EXPR					    \
+      || (SYM) == VEC_SHUFFLE_EXPR					    \
       || (SYM) == FMA_EXPR) ? GIMPLE_TERNARY_RHS			    \
    : ((SYM) == COND_EXPR						    \
       || (SYM) == CONSTRUCTOR						    \
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	(revision 178354)
+++ gcc/tree-cfg.c	(working copy)
@@ -3711,6 +3711,59 @@ verify_gimple_assign_ternary (gimple stm
 	}
       break;
 
+    case VEC_SHUFFLE_EXPR:
+      if (!useless_type_conversion_p (lhs_type, rhs1_type)
+	  || !useless_type_conversion_p (lhs_type, rhs2_type))
+	{
+	  error ("type mismatch in vector shuffle expression");
+	  debug_generic_expr (lhs_type);
+	  debug_generic_expr (rhs1_type);
+	  debug_generic_expr (rhs2_type);
+	  debug_generic_expr (rhs3_type);
+	  return true;
+	}
+
+      if (TREE_CODE (rhs1_type) != VECTOR_TYPE
+	  || TREE_CODE (rhs2_type) != VECTOR_TYPE
+	  || TREE_CODE (rhs3_type) != VECTOR_TYPE)
+	{
+	  error ("vector types expected in vector shuffle expression");
+	  debug_generic_expr (lhs_type);
+	  debug_generic_expr (rhs1_type);
+	  debug_generic_expr (rhs2_type);
+	  debug_generic_expr (rhs3_type);
+	  return true;
+	}
+
+      if (TYPE_VECTOR_SUBPARTS (rhs1_type) != TYPE_VECTOR_SUBPARTS (rhs2_type)
+	  || TYPE_VECTOR_SUBPARTS (rhs2_type)
+	     != TYPE_VECTOR_SUBPARTS (rhs3_type)
+	  || TYPE_VECTOR_SUBPARTS (rhs3_type)
+	     != TYPE_VECTOR_SUBPARTS (lhs_type))
+	{
+	  error ("vectors with different element number found "
+		 "in vector shuffle expression");
+	  debug_generic_expr (lhs_type);
+	  debug_generic_expr (rhs1_type);
+	  debug_generic_expr (rhs2_type);
+	  debug_generic_expr (rhs3_type);
+	  return true;
+	}
+
+      if (TREE_CODE (TREE_TYPE (rhs3_type)) != INTEGER_TYPE
+	  || GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (rhs3_type)))
+	     != GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (rhs1_type))))
+	{
+	  error ("invalid mask type in vector shuffle expression");
+	  debug_generic_expr (lhs_type);
+	  debug_generic_expr (rhs1_type);
+	  debug_generic_expr (rhs2_type);
+	  debug_generic_expr (rhs3_type);
+	  return true;
+	}
+
+      return false;
+
     case DOT_PROD_EXPR:
     case REALIGN_LOAD_EXPR:
       /* FIXME.  */
Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 178354)
+++ gcc/passes.c	(working copy)
@@ -1354,7 +1354,6 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_vectorize);
 	    {
 	      struct opt_pass **p = &pass_vectorize.pass.sub;
-	      NEXT_PASS (pass_lower_vector_ssa);
 	      NEXT_PASS (pass_dce_loop);
 	    }
           NEXT_PASS (pass_predcom);
@@ -1366,6 +1365,7 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_lim);
 	  NEXT_PASS (pass_tree_loop_done);
 	}
+      NEXT_PASS (pass_lower_vector_ssa);
       NEXT_PASS (pass_cse_reciprocals);
       NEXT_PASS (pass_reassoc);
       NEXT_PASS (pass_vrp);
Index: gcc/c-parser.c
===================================================================
--- gcc/c-parser.c	(revision 178354)
+++ gcc/c-parser.c	(working copy)
@@ -5989,6 +5989,46 @@ c_parser_alignof_expression (c_parser *p
     }
 }
 
+/* Helper function to read arguments of builtins which are interfaces
+   for the middle-end nodes like COMPLEX_EXPR, VEC_SHUFFLE_EXPR and
+   others.  The name of the builtin is passed using BNAME parameter.
+   Function returns true if there were no errors while parsing and
+   stores the arguments in EXPR_LIST.  List of original types can be
+   obtained by passing non NULL value to ORIG_TYPES.  */
+static bool
+c_parser_get_builtin_args (c_parser *parser, const char *bname,
+			   VEC(tree,gc) **expr_list,
+			   VEC(tree,gc) **orig_types)
+{
+  location_t loc = c_parser_peek_token (parser)->location;
+  *expr_list = NULL;
+
+  if (c_parser_next_token_is_not (parser, CPP_OPEN_PAREN))
+    {
+      error_at (loc, "cannot take address of %qs", bname);
+      return false;
+    }
+
+  c_parser_consume_token (parser);
+
+  if (c_parser_next_token_is (parser, CPP_CLOSE_PAREN))
+    {
+      c_parser_consume_token (parser);
+      return true;
+    }
+    
+  if (orig_types)
+    *expr_list = c_parser_expr_list (parser, false, false, orig_types);
+  else
+    *expr_list = c_parser_expr_list (parser, false, false, NULL);
+
+  if (!c_parser_require (parser, CPP_CLOSE_PAREN, "expected %<)%>"))
+    return false;
+
+  return true;
+}
+
+
 /* Parse a postfix expression (C90 6.3.1-6.3.2, C99 6.5.1-6.5.2).
 
    postfix-expression:
@@ -6027,6 +6067,10 @@ c_parser_alignof_expression (c_parser *p
 			     assignment-expression )
      __builtin_types_compatible_p ( type-name , type-name )
      __builtin_complex ( assignment-expression , assignment-expression )
+     __builtin_shuffle ( assignment-expression , assignment-expression )
+     __builtin_shuffle ( assignment-expression , 
+			 assignment-expression ,
+			 assignment-expression, )
 
    offsetof-member-designator:
      identifier
@@ -6047,7 +6091,7 @@ c_parser_alignof_expression (c_parser *p
 static struct c_expr
 c_parser_postfix_expression (c_parser *parser)
 {
-  struct c_expr expr, e1, e2, e3;
+  struct c_expr expr, e1;
   struct c_type_name *t1, *t2;
   location_t loc = c_parser_peek_token (parser)->location;;
   expr.original_code = ERROR_MARK;
@@ -6333,45 +6377,55 @@ c_parser_postfix_expression (c_parser *p
 	  }
 	  break;
 	case RID_CHOOSE_EXPR:
-	  c_parser_consume_token (parser);
-	  if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
-	    {
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  loc = c_parser_peek_token (parser)->location;
-	  e1 = c_parser_expr_no_commas (parser, NULL);
-	  if (!c_parser_require (parser, CPP_COMMA, "expected %<,%>"))
-	    {
-	      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  e2 = c_parser_expr_no_commas (parser, NULL);
-	  if (!c_parser_require (parser, CPP_COMMA, "expected %<,%>"))
-	    {
-	      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  e3 = c_parser_expr_no_commas (parser, NULL);
-	  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
-				     "expected %<)%>");
 	  {
-	    tree c;
+	    VEC(tree,gc) *expr_list;
+	    VEC(tree,gc) *orig_types;
+	    tree e1value, e2value, e3value, c;
 
-	    c = e1.value;
-	    mark_exp_read (e2.value);
-	    mark_exp_read (e3.value);
+	    c_parser_consume_token (parser);
+	    if (!c_parser_get_builtin_args (parser,
+					    "__builtin_choose_expr",
+					    &expr_list, &orig_types))
+	      {
+		expr.value = error_mark_node;
+		break;
+	      }
+
+	    if (VEC_length (tree, expr_list) != 3)
+	      {
+		error_at (loc, "wrong number of arguments to "
+			       "%<__builtin_choose_expr%>");
+		expr.value = error_mark_node;
+		break;
+	      }
+	    
+	    e1value = VEC_index (tree, expr_list, 0);
+	    e2value = VEC_index (tree, expr_list, 1);
+	    e3value = VEC_index (tree, expr_list, 2);
+
+	    c = e1value;
+	    mark_exp_read (e2value);
+	    mark_exp_read (e3value);
 	    if (TREE_CODE (c) != INTEGER_CST
 		|| !INTEGRAL_TYPE_P (TREE_TYPE (c)))
 	      error_at (loc,
 			"first argument to %<__builtin_choose_expr%> not"
 			" a constant");
 	    constant_expression_warning (c);
-	    expr = integer_zerop (c) ? e3 : e2;
+	    
+	    if (integer_zerop (c))
+	      {
+		expr.value = e3value;
+		expr.original_type = VEC_index (tree, orig_types, 2);
+	      }
+	    else
+	      {
+		expr.value = e2value;
+		expr.original_type = VEC_index (tree, orig_types, 1);
+	      }
+
+	    break;
 	  }
-	  break;
 	case RID_TYPES_COMPATIBLE_P:
 	  c_parser_consume_token (parser);
 	  if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
@@ -6410,57 +6464,96 @@ c_parser_postfix_expression (c_parser *p
 	  }
 	  break;
 	case RID_BUILTIN_COMPLEX:
-	  c_parser_consume_token (parser);
-	  if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
-	    {
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  loc = c_parser_peek_token (parser)->location;
-	  e1 = c_parser_expr_no_commas (parser, NULL);
-	  if (!c_parser_require (parser, CPP_COMMA, "expected %<,%>"))
-	    {
-	      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  e2 = c_parser_expr_no_commas (parser, NULL);
-	  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
-				     "expected %<)%>");
-	  mark_exp_read (e1.value);
-	  if (TREE_CODE (e1.value) == EXCESS_PRECISION_EXPR)
-	    e1.value = convert (TREE_TYPE (e1.value),
-				TREE_OPERAND (e1.value, 0));
-	  mark_exp_read (e2.value);
-	  if (TREE_CODE (e2.value) == EXCESS_PRECISION_EXPR)
-	    e2.value = convert (TREE_TYPE (e2.value),
-				TREE_OPERAND (e2.value, 0));
-	  if (!SCALAR_FLOAT_TYPE_P (TREE_TYPE (e1.value))
-	      || DECIMAL_FLOAT_TYPE_P (TREE_TYPE (e1.value))
-	      || !SCALAR_FLOAT_TYPE_P (TREE_TYPE (e2.value))
-	      || DECIMAL_FLOAT_TYPE_P (TREE_TYPE (e2.value)))
-	    {
-	      error_at (loc, "%<__builtin_complex%> operand "
-			"not of real binary floating-point type");
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  if (TYPE_MAIN_VARIANT (TREE_TYPE (e1.value))
-	      != TYPE_MAIN_VARIANT (TREE_TYPE (e2.value)))
-	    {
-	      error_at (loc,
-			"%<__builtin_complex%> operands of different types");
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  if (!flag_isoc99)
-	    pedwarn (loc, OPT_pedantic,
-		     "ISO C90 does not support complex types");
-	  expr.value = build2 (COMPLEX_EXPR,
-			       build_complex_type (TYPE_MAIN_VARIANT
-						   (TREE_TYPE (e1.value))),
-			       e1.value, e2.value);
-	  break;
+	  { 
+	    VEC(tree,gc) *expr_list;
+	    tree e1value, e2value;
+	    
+	    c_parser_consume_token (parser);
+	    if (!c_parser_get_builtin_args (parser,
+					    "__builtin_complex",
+					    &expr_list, NULL))
+	      {
+		expr.value = error_mark_node;
+		break;
+	      }
+
+	    if (VEC_length (tree, expr_list) != 2)
+	      {
+		error_at (loc, "wrong number of arguments to "
+			       "%<__builtin_complex%>");
+		expr.value = error_mark_node;
+		break;
+	      }
+	    
+	    e1value = VEC_index (tree, expr_list, 0);
+	    e2value = VEC_index (tree, expr_list, 1);
+
+	    mark_exp_read (e1value);
+	    if (TREE_CODE (e1value) == EXCESS_PRECISION_EXPR)
+	      e1value = convert (TREE_TYPE (e1value),
+				 TREE_OPERAND (e1value, 0));
+	    mark_exp_read (e2value);
+	    if (TREE_CODE (e2value) == EXCESS_PRECISION_EXPR)
+	      e2value = convert (TREE_TYPE (e2value),
+				 TREE_OPERAND (e2value, 0));
+	    if (!SCALAR_FLOAT_TYPE_P (TREE_TYPE (e1value))
+		|| DECIMAL_FLOAT_TYPE_P (TREE_TYPE (e1value))
+		|| !SCALAR_FLOAT_TYPE_P (TREE_TYPE (e2value))
+		|| DECIMAL_FLOAT_TYPE_P (TREE_TYPE (e2value)))
+	      {
+		error_at (loc, "%<__builtin_complex%> operand "
+			  "not of real binary floating-point type");
+		expr.value = error_mark_node;
+		break;
+	      }
+	    if (TYPE_MAIN_VARIANT (TREE_TYPE (e1value))
+		!= TYPE_MAIN_VARIANT (TREE_TYPE (e2value)))
+	      {
+		error_at (loc,
+			  "%<__builtin_complex%> operands of different types");
+		expr.value = error_mark_node;
+		break;
+	      }
+	    if (!flag_isoc99)
+	      pedwarn (loc, OPT_pedantic,
+		       "ISO C90 does not support complex types");
+	    expr.value = build2 (COMPLEX_EXPR,
+				 build_complex_type (TYPE_MAIN_VARIANT
+						     (TREE_TYPE (e1value))),
+				 e1value, e2value);
+	    break;
+	  }
+	case RID_BUILTIN_SHUFFLE:
+	  {
+	    VEC(tree,gc) *expr_list;
+	    
+	    c_parser_consume_token (parser);
+	    if (!c_parser_get_builtin_args (parser,
+					    "__builtin_shuffle",
+					    &expr_list, NULL))
+	      {
+		expr.value = error_mark_node;
+		break;
+	      }
+
+	    if (VEC_length (tree, expr_list) == 2)
+	      expr.value = c_build_vec_shuffle_expr
+				(loc, VEC_index (tree, expr_list, 0),
+				 NULL_TREE,
+				 VEC_index (tree, expr_list, 1));
+	    else if (VEC_length (tree, expr_list) == 3)
+	      expr.value = c_build_vec_shuffle_expr
+				(loc, VEC_index (tree, expr_list, 0),
+				 VEC_index (tree, expr_list, 1),
+				 VEC_index (tree, expr_list, 2));
+	    else
+	      {
+		error_at (loc, "wrong number of arguments to "
+			       "%<__builtin_shuffle%>");
+		expr.value = error_mark_node;
+	      }
+	    break;
+	  }
 	case RID_AT_SELECTOR:
 	  gcc_assert (c_dialect_objc ());
 	  c_parser_consume_token (parser);
Index: gcc/config/i386/sse.md
===================================================================
--- gcc/config/i386/sse.md	(revision 178354)
+++ gcc/config/i386/sse.md	(working copy)
@@ -231,6 +231,12 @@ (define_mode_attr sseinsnmode
    (V4SF "V4SF") (V2DF "V2DF")
    (TI "TI") (V32QI "OI") (V16HI "OI") (V8SI "OI") (V4DI "OI")])
 
+;; All 128bit vector modes
+(define_mode_attr sseshuffint
+  [(V16QI "V16QI") (V8HI "V8HI")
+   (V4SI "V4SI")  (V2DI "V2DI")
+   (V4SF "V4SI") (V2DF "V2DI")])
+
 ;; Mapping of vector float modes to an integer mode of the same size
 (define_mode_attr sseintvecmode
   [(V8SF "V8SI") (V4DF "V4DI")
@@ -6234,6 +6240,19 @@ (define_expand "vconduv2di"
   DONE;
 })
 
+(define_expand "vshuffle<mode>"
+  [(match_operand:V_128 0 "register_operand" "")
+   (match_operand:V_128 1 "general_operand" "")
+   (match_operand:V_128 2 "general_operand" "")
+   (match_operand:<sseshuffint> 3 "general_operand" "")]
+  "TARGET_SSSE3 || TARGET_AVX"
+{
+  bool ok = ix86_expand_vshuffle (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel bitwise logical operations
Index: gcc/config/i386/i386-protos.h
===================================================================
--- gcc/config/i386/i386-protos.h	(revision 178354)
+++ gcc/config/i386/i386-protos.h	(working copy)
@@ -118,6 +118,7 @@ extern bool ix86_expand_int_movcc (rtx[]
 extern bool ix86_expand_fp_movcc (rtx[]);
 extern bool ix86_expand_fp_vcond (rtx[]);
 extern bool ix86_expand_int_vcond (rtx[]);
+extern bool ix86_expand_vshuffle (rtx[]);
 extern void ix86_expand_sse_unpack (rtx[], bool, bool);
 extern bool ix86_expand_int_addcc (rtx[]);
 extern rtx ix86_expand_call (rtx, rtx, rtx, rtx, rtx, bool);
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 178354)
+++ gcc/config/i386/i386.c	(working copy)
@@ -18693,6 +18693,152 @@ ix86_expand_int_vcond (rtx operands[])
   return true;
 }
 
+bool
+ix86_expand_vshuffle (rtx operands[])
+{
+  rtx target = operands[0];
+  rtx op0 = operands[1];
+  rtx op1 = operands[2];
+  rtx mask = operands[3];
+  rtx new_mask, vt, t1, t2, w_vector;
+  enum machine_mode mode = GET_MODE (op0);
+  enum machine_mode maskmode = GET_MODE (mask);
+  enum machine_mode maskinner = GET_MODE_INNER (mode);
+  rtx vec[16];
+  int w, i, j;
+  bool one_operand_shuffle = op0 == op1;
+
+  gcc_assert ((TARGET_SSSE3 || TARGET_AVX) && GET_MODE_BITSIZE (mode) == 128);
+
+  op0 = force_reg (mode, op0);
+  op1 = force_reg (mode, op0);
+  mask = force_reg (maskmode, mask);
+
+  /* Number of elements in the vector.  */
+  w = GET_MODE_BITSIZE (maskmode) / GET_MODE_BITSIZE (maskinner);
+
+  /* generate w_vector = {w, w, ...}  */
+  for (i = 0; i < w; i++)
+    vec[i] = GEN_INT (w);
+  w_vector = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (w, vec));
+
+  /* mask = mask & {w-1, w-1, w-1,...} */
+  for (i = 0; i < w; i++)
+    vec[i] = GEN_INT (w - 1);
+
+  vt = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (w, vec));
+  new_mask = expand_simple_binop (maskmode, AND, mask, vt,
+				  NULL_RTX, 0, OPTAB_DIRECT);
+
+  /* If the original vector mode is V16QImode, we can just
+     use pshufb directly.  */
+  if (mode == V16QImode && one_operand_shuffle)
+    {
+      t1 = gen_reg_rtx (V16QImode);
+      emit_insn (gen_ssse3_pshufbv16qi3 (t1, op0, new_mask));
+      emit_insn (gen_rtx_SET (VOIDmode, target, t1));
+      return true;
+    }
+  else if (mode == V16QImode)
+    {
+      rtx xops[6];
+
+      t1 = gen_reg_rtx (V16QImode);
+      t2 = gen_reg_rtx (V16QImode);
+      emit_insn (gen_ssse3_pshufbv16qi3 (t1, op0, new_mask));
+      emit_insn (gen_ssse3_pshufbv16qi3 (t2, op1, new_mask));
+
+      /* mask = mask & {w, w, ...}  */
+      mask = expand_simple_binop (V16QImode, AND, mask, w_vector,
+				  NULL_RTX, 0, OPTAB_DIRECT);
+      xops[0] = target;
+      xops[1] = operands[1];
+      xops[2] = operands[2];
+      xops[3] = gen_rtx_EQ (mode, mask, w_vector);
+      xops[4] = t1;
+      xops[5] = t2;
+
+      return ix86_expand_int_vcond (xops);
+    }
+
+  /* mask = mask * {w, w, ...}  */
+  new_mask = expand_simple_binop (maskmode, MULT, new_mask, w_vector,
+				  NULL_RTX, 0, OPTAB_DIRECT);
+
+  /* Convert mask to vector of chars.  */
+  new_mask = simplify_gen_subreg (V16QImode, new_mask, maskmode, 0);
+  new_mask = force_reg (V16QImode, new_mask);
+
+  /* Build a helper mask wich we will use in pshufb
+     (v4si) --> {0,0,0,0, 4,4,4,4, 8,8,8,8, 12,12,12,12}
+     (v8hi) --> {0,0, 2,2, 4,4, 6,6, ...}
+     ...  */
+  for (i = 0; i < w; i++)
+    for (j = 0; j < 16/w; j++)
+      vec[i*w+j] = GEN_INT (i*16/w);
+  vt = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
+  vt = force_reg (V16QImode, vt);
+
+  t1 = gen_reg_rtx (V16QImode);
+  emit_insn (gen_ssse3_pshufbv16qi3 (t1, new_mask, vt));
+  new_mask = t1;
+
+  /* Convert it into the byte positions by doing
+     new_mask = new_mask + {0,1,..,16/w, 0,1,..,16/w, ...}  */
+  for (i = 0; i < w; i++)
+    for (j = 0; j < 16/w; j++)
+      vec[i*w+j] = GEN_INT (j);
+
+  vt = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
+  new_mask = expand_simple_binop (V16QImode, PLUS, new_mask, vt,
+				  NULL_RTX, 0, OPTAB_DIRECT);
+
+  t1 = gen_reg_rtx (V16QImode);
+
+  /* Convert OP0 to vector of chars.  */
+  op0 = simplify_gen_subreg (V16QImode, op0, mode, 0);
+  op0 = force_reg (V16QImode, op0);
+  emit_insn (gen_ssse3_pshufbv16qi3 (t1, op0, new_mask));
+
+  if (one_operand_shuffle)
+    {
+      /* Convert it back from vector of chars to the original mode.  */
+      t1 = simplify_gen_subreg (mode, t1, V16QImode, 0);
+      emit_insn (gen_rtx_SET (VOIDmode, target, t1));
+      return true;
+    }
+  else
+    {
+      rtx xops[6];
+
+      t2 = gen_reg_rtx (V16QImode);
+
+      /* Convert OP1 to vector of chars.  */
+      op1 = simplify_gen_subreg (V16QImode, op1, mode, 0);
+      op1 = force_reg (V16QImode, op1);
+      emit_insn (gen_ssse3_pshufbv16qi3 (t1, op1, new_mask));
+
+      /* mask = mask & {w, w, ...}  */
+      mask = expand_simple_binop (V16QImode, AND, mask, w_vector,
+				  NULL_RTX, 0, OPTAB_DIRECT);
+
+      t1 = simplify_gen_subreg (mode, t1, V16QImode, 0);
+      t2 = simplify_gen_subreg (mode, t2, V16QImode, 0);
+
+      xops[0] = target;
+      xops[1] = operands[1];
+      xops[2] = operands[2];
+      xops[3] = gen_rtx_EQ (mode, mask, w_vector);
+      xops[4] = t1;
+      xops[5] = t2;
+
+      fprintf (stderr, "-- here in %s \n", __func__);
+      return ix86_expand_int_vcond (xops);
+    }
+
+  return false;
+}
+
 /* Unpack OP[1] into the next wider integer vector type.  UNSIGNED_P is
    true if we should do zero extension, else sign extension.  HIGH_P is
    true if we want the N/2 high elements, else the low elements.  */
@@ -30911,6 +31057,9 @@ struct expand_vec_perm_d
 
 static bool expand_vec_perm_1 (struct expand_vec_perm_d *d);
 static bool expand_vec_perm_broadcast_1 (struct expand_vec_perm_d *d);
+static int extract_vec_perm_cst (struct expand_vec_perm_d *, tree);
+static bool ix86_vectorize_builtin_vec_perm_ok (tree vec_type, tree mask);
+
 
 /* Get a vector mode of the same size as the original but with elements
    twice as wide.  This is only guaranteed to apply to integral vectors.  */
@@ -32417,7 +32566,7 @@ void ix86_emit_i387_round (rtx op0, rtx
   res = gen_reg_rtx (outmode);
 
   half = CONST_DOUBLE_FROM_REAL_VALUE (dconsthalf, inmode);
-  
+
   /* round(a) = sgn(a) * floor(fabs(a) + 0.5) */
 
   /* scratch = fxam(op1) */
@@ -34576,10 +34725,10 @@ ix86_vectorize_builtin_vec_perm_ok (tree
 
   vec_mask = extract_vec_perm_cst (&d, mask);
 
-  /* This hook is cannot be called in response to something that the
-     user does (unlike the builtin expander) so we shouldn't ever see
-     an error generated from the extract.  */
-  gcc_assert (vec_mask > 0 && vec_mask <= 3);
+  /* Check whether the mask can be applied to the vector type.  */
+  if (vec_mask < 0 || vec_mask > 3)
+    return false;
+
   one_vec = (vec_mask != 3);
 
   /* Implementable with shufps or pshufd.  */
Index: gcc/tree-ssa-operands.c
===================================================================
--- gcc/tree-ssa-operands.c	(revision 178354)
+++ gcc/tree-ssa-operands.c	(working copy)
@@ -943,6 +943,7 @@ get_expr_operands (gimple stmt, tree *ex
 
     case COND_EXPR:
     case VEC_COND_EXPR:
+    case VEC_SHUFFLE_EXPR:
       get_expr_operands (stmt, &TREE_OPERAND (expr, 0), uflags);
       get_expr_operands (stmt, &TREE_OPERAND (expr, 1), uflags);
       get_expr_operands (stmt, &TREE_OPERAND (expr, 2), uflags);

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-09-30 20:34                                             ` Artem Shinkarov
@ 2011-09-30 20:44                                               ` Richard Henderson
  2011-09-30 20:51                                                 ` Artem Shinkarov
  0 siblings, 1 reply; 71+ messages in thread
From: Richard Henderson @ 2011-09-30 20:44 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Joseph S. Myers, Richard Guenther, Duncan Sands, gcc-patches

> I hope that the new version looks a little bit better.

Nearly ok.  Some trivial fixes, and then please commit.

> +  rtx_v0 = expand_normal (v0);
> +  rtx_mask = expand_normal (mask);
> +
> +  create_output_operand (&ops[0], target, mode);
> +  create_input_operand (&ops[3], rtx_mask, mode);
> +
> +  if (operand_equal_p (v0, v1, 0))
> +    {
> +      rtx_v0 = expand_normal (v0);
> +      if (!insn_operand_matches(icode, 1, rtx_v0))
> +        rtx_v0 = force_reg (mode, rtx_v0);
> +
> +      gcc_checking_assert(insn_operand_matches(icode, 2, rtx_v0));
> +
> +      create_fixed_operand (&ops[1], rtx_v0);
> +      create_fixed_operand (&ops[2], rtx_v0);
> +    }
> +  else
> +    {
> +      create_input_operand (&ops[1], expand_normal (v0), mode);
> +      create_input_operand (&ops[2], expand_normal (v1), mode);
> +    }

The first line should be removed.  Otherwise you're expanding v0 twice.

> +(define_expand "vshuffle<mode>"
> +  [(match_operand:V_128 0 "register_operand" "")
> +   (match_operand:V_128 1 "general_operand" "")
> +   (match_operand:V_128 2 "general_operand" "")
> +   (match_operand:<sseshuffint> 3 "general_operand" "")]
> +  "TARGET_SSSE3 || TARGET_AVX"
> +{
> +  bool ok = ix86_expand_vshuffle (operands);
> +  gcc_assert (ok);
> +  DONE;
> +})

Operands 1, 2, and 3 should use register_operand.  That will avoid...

> +  op0 = force_reg (mode, op0);
> +  op1 = force_reg (mode, op0);
> +  mask = force_reg (maskmode, mask);

... these lines in ix86_expand_vshuffle and the obvious typo for op1.

> +      fprintf (stderr, "-- here in %s \n", __func__);

Remove the debugging lines.


r~

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-09-30 20:44                                               ` Richard Henderson
@ 2011-09-30 20:51                                                 ` Artem Shinkarov
  2011-09-30 23:22                                                   ` Richard Henderson
  2011-10-04  2:26                                                   ` Hans-Peter Nilsson
  0 siblings, 2 replies; 71+ messages in thread
From: Artem Shinkarov @ 2011-09-30 20:51 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Joseph S. Myers, Richard Guenther, Duncan Sands, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4601 bytes --]

On Fri, Sep 30, 2011 at 7:18 PM, Richard Henderson <rth@redhat.com> wrote:
>> I hope that the new version looks a little bit better.
>
> Nearly ok.  Some trivial fixes, and then please commit.
>
>> +  rtx_v0 = expand_normal (v0);
>> +  rtx_mask = expand_normal (mask);
>> +
>> +  create_output_operand (&ops[0], target, mode);
>> +  create_input_operand (&ops[3], rtx_mask, mode);
>> +
>> +  if (operand_equal_p (v0, v1, 0))
>> +    {
>> +      rtx_v0 = expand_normal (v0);
>> +      if (!insn_operand_matches(icode, 1, rtx_v0))
>> +        rtx_v0 = force_reg (mode, rtx_v0);
>> +
>> +      gcc_checking_assert(insn_operand_matches(icode, 2, rtx_v0));
>> +
>> +      create_fixed_operand (&ops[1], rtx_v0);
>> +      create_fixed_operand (&ops[2], rtx_v0);
>> +    }
>> +  else
>> +    {
>> +      create_input_operand (&ops[1], expand_normal (v0), mode);
>> +      create_input_operand (&ops[2], expand_normal (v1), mode);
>> +    }
>
> The first line should be removed.  Otherwise you're expanding v0 twice.
>
>> +(define_expand "vshuffle<mode>"
>> +  [(match_operand:V_128 0 "register_operand" "")
>> +   (match_operand:V_128 1 "general_operand" "")
>> +   (match_operand:V_128 2 "general_operand" "")
>> +   (match_operand:<sseshuffint> 3 "general_operand" "")]
>> +  "TARGET_SSSE3 || TARGET_AVX"
>> +{
>> +  bool ok = ix86_expand_vshuffle (operands);
>> +  gcc_assert (ok);
>> +  DONE;
>> +})
>
> Operands 1, 2, and 3 should use register_operand.  That will avoid...
>
>> +  op0 = force_reg (mode, op0);
>> +  op1 = force_reg (mode, op0);
>> +  mask = force_reg (maskmode, mask);
>
> ... these lines in ix86_expand_vshuffle and the obvious typo for op1.
>
>> +      fprintf (stderr, "-- here in %s \n", __func__);
>
> Remove the debugging lines.
>
>
> r~
>

Ok, in the attachment there is a patch which fixes mentioned errors.

ChangeLog:

	gcc/
	* optabs.c (expand_vec_shuffle_expr_p): New function. Checks
	if given expression can be expanded by the target.
	(expand_vec_shuffle_expr): New function. Expand VEC_SHUFFLE_EXPR
	using target vector instructions.
	* optabs.h: New optab vshuffle.
	(expand_vec_shuffle_expr_p): New prototype.
	(expand_vec_shuffle_expr): New prototype.
	(vshuffle_optab): New optab.
	* genopinit.c: Adjust to support vecshuffle.
	* c-tree.h (c_build_vec_shuffle_expr): New prototype.
	* expr.c (expand_expr_real_2): Adjust.
	* c-typeck.c: (c_build_vec_shuffle_expr): Build a VEC_SHUFFLE_EXPR
	recognizing the cases of two and three arguments.
	(convert_arguments) (build_binary_op)
	(scalar_to_vector) (build_array_ref): Spurious whitespace.
	* gimplify.c (gimplify_exp): Adjusted to support VEC_SHUFFLE_EXPR.
	* tree.def: New tree code VEC_SHUFFLE_EXPR.
	* tree-inline.c (estimate_operator_cost): Recognize VEC_SHUFFLE_EXPR.
	* tree-vect-generic.c (vector_element): New function. Returns an
	element of the vector at the given position.
	(lower_vec_shuffle): Checks if VEC_SHUFLLE_EXPR is supported
	by the backend or expand an expression piecewise.
	(expand_vector_operations_1): Adjusted.
	(gate_expand_vector_operations_noop): New gate function.
	* Makefile.in (tree-vect-generic.o): New include.
	* gimple.c (get_gimple_rhs_num_ops): Adjust.
	* tree-cfg.c (verify_gimple_assign_trenary): Verify VEC_SHUFFLE_EXPR.
	* passes.c: Move veclower down.
	* tree-pretty-print.c (dump_generic_node): Recognize
	VEC_SHUFFLE_EXPR as valid expression.
	* c-parser.c (c_parser_get_builtin_args): Helper function for the
	builtins with variable number of arguments.
	(c_parser_postfix_expression): Use a new helper function for
	RID_CHOOSE_EXPR, RID_BUILTIN_COMPLEX and RID_BUILTIN_SHUFFLE.
	* tree-ssa-operands: Adjust.
	
	c-family/
	* c-common.c: New __builtin_shuffle keyword.
	* c-common.h: New __builtin_shuffle keyword.

	gcc/config/i386
	* sse.md: (sseshuffint) New mode_attr. Correspondence between the
	vector and the type of the mask when shuffling.
	(vecshuffle<mode>): New expansion.
	* i386-protos.h (ix86_expand_vshuffle): New prototype.
	* i386.c (ix86_expand_vshuffle): Expand vshuffle using pshufb.
	(ix86_vectorize_builtin_vec_perm_ok): Adjust.

	gcc/doc
	* extend.texi: Adjust.

	gcc/testsuite
	* gcc.c-torture/execute/vect-shuffle-2.c: New test.
	* gcc.c-torture/execute/vect-shuffle-4.c: New test.
	* gcc.c-torture/execute/vect-shuffle-1.c: New test.
	* gcc.dg/builtin-complex-err-1.c: Adjust.

bootstrapped and tested on x86_64-unknown-linux-gnu.

Unfortunately I don't have permissions to commit to the trunk.


Thanks,
Artem.

[-- Attachment #2: vec-shuffle.v19.diff --]
[-- Type: text/plain, Size: 66194 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 178354)
+++ gcc/doc/extend.texi	(working copy)
@@ -6561,6 +6561,32 @@ invoke undefined behavior at runtime.  W
 accesses for vector subscription can be enabled with
 @option{-Warray-bounds}.
 
+Vector shuffling is available using functions
+@code{__builtin_shuffle (vec, mask)} and
+@code{__builtin_shuffle (vec0, vec1, mask)}. Both functions construct
+a permutation of elements from one or two vectors and return a vector
+of the same type as input vector(s). The mask is a vector of
+integer-typed elements. The size of each element of the mask must be
+the same as the size of each input vector element. The number of
+elements in input vector(s) and mask must be the same.
+
+The elements of the input vectors are numbered from left to right across
+one or both of the vectors. Each element in the mask specifies a number
+of element from the input vector(s). Consider the following example.
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{5,6,7,8@};
+v4si mask1 = @{0,1,1,3@};
+v4si mask2 = @{0,4,2,5@};
+v4si res;
+
+res = __builtin_shuffle (a, mask1);       /* res is @{1,2,2,4@}  */
+res = __builtin_shuffle (a, b, mask2);    /* res is @{1,5,3,6@}  */
+@end smallexample
+
 You can declare variables and use them in function calls and returns, as
 well as in assignments and some casts.  You can specify a vector type as
 a return type for a function.  Vector types can also be used as function
Index: gcc/tree-pretty-print.c
===================================================================
--- gcc/tree-pretty-print.c	(revision 178354)
+++ gcc/tree-pretty-print.c	(working copy)
@@ -2067,6 +2067,16 @@ dump_generic_node (pretty_printer *buffe
       dump_generic_node (buffer, TREE_OPERAND (node, 2), spc, flags, false);
       pp_string (buffer, " > ");
       break;
+    
+    case VEC_SHUFFLE_EXPR:
+      pp_string (buffer, " VEC_SHUFFLE_EXPR < ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
+      pp_string (buffer, " , ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
+      pp_string (buffer, " , ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 2), spc, flags, false);
+      pp_string (buffer, " > ");
+      break;
 
     case DOT_PROD_EXPR:
       pp_string (buffer, " DOT_PROD_EXPR < ");
Index: gcc/c-family/c-common.c
===================================================================
--- gcc/c-family/c-common.c	(revision 178354)
+++ gcc/c-family/c-common.c	(working copy)
@@ -425,6 +425,7 @@ const struct c_common_resword c_common_r
   { "__attribute__",	RID_ATTRIBUTE,	0 },
   { "__builtin_choose_expr", RID_CHOOSE_EXPR, D_CONLY },
   { "__builtin_complex", RID_BUILTIN_COMPLEX, D_CONLY },
+  { "__builtin_shuffle", RID_BUILTIN_SHUFFLE, D_CONLY },
   { "__builtin_offsetof", RID_OFFSETOF, 0 },
   { "__builtin_types_compatible_p", RID_TYPES_COMPATIBLE_P, D_CONLY },
   { "__builtin_va_arg",	RID_VA_ARG,	0 },
Index: gcc/c-family/c-common.h
===================================================================
--- gcc/c-family/c-common.h	(revision 178354)
+++ gcc/c-family/c-common.h	(working copy)
@@ -103,7 +103,7 @@ enum rid
   /* C extensions */
   RID_ASM,       RID_TYPEOF,   RID_ALIGNOF,  RID_ATTRIBUTE,  RID_VA_ARG,
   RID_EXTENSION, RID_IMAGPART, RID_REALPART, RID_LABEL,      RID_CHOOSE_EXPR,
-  RID_TYPES_COMPATIBLE_P,      RID_BUILTIN_COMPLEX,
+  RID_TYPES_COMPATIBLE_P,      RID_BUILTIN_COMPLEX,	     RID_BUILTIN_SHUFFLE,
   RID_DFLOAT32, RID_DFLOAT64, RID_DFLOAT128,
   RID_FRACT, RID_ACCUM,
 
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	(revision 178354)
+++ gcc/optabs.c	(working copy)
@@ -6620,6 +6620,93 @@ vector_compare_rtx (tree cond, bool unsi
   return gen_rtx_fmt_ee (rcode, VOIDmode, ops[0].value, ops[1].value);
 }
 
+/* Return true if VEC_SHUFFLE_EXPR can be expanded using SIMD extensions
+   of the CPU.  */
+bool
+expand_vec_shuffle_expr_p (enum machine_mode mode, tree v0, tree v1, tree mask)
+{
+  int v0_mode_s = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (v0))));
+  int mask_mode_s = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (mask))));
+
+  if (TREE_CODE (mask) == VECTOR_CST
+      && targetm.vectorize.builtin_vec_perm_ok (TREE_TYPE (v0), mask))
+    return true;
+
+  if (v0_mode_s != mask_mode_s
+      || TYPE_VECTOR_SUBPARTS (TREE_TYPE (v0))
+	 != TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask))
+      || TYPE_VECTOR_SUBPARTS (TREE_TYPE (v1))
+	 != TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask)))
+    return false;
+
+  return direct_optab_handler (vshuffle_optab, mode) != CODE_FOR_nothing;
+}
+
+/* Generate instructions for VEC_COND_EXPR given its type and three
+   operands.  */
+rtx
+expand_vec_shuffle_expr (tree type, tree v0, tree v1, tree mask, rtx target)
+{
+  struct expand_operand ops[4];
+  enum insn_code icode;
+  enum machine_mode mode = TYPE_MODE (type);
+  rtx rtx_v0, rtx_mask;
+
+  gcc_assert (expand_vec_shuffle_expr_p (mode, v0, v1, mask));
+
+  if (TREE_CODE (mask) == VECTOR_CST)
+    {
+      tree m_type, call;
+      tree fn = targetm.vectorize.builtin_vec_perm (TREE_TYPE (v0), &m_type);
+
+      if (!fn)
+	goto vshuffle;
+
+      if (m_type != TREE_TYPE (TREE_TYPE (mask)))
+	{
+	  int units = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask));
+	  tree cvt = build_vector_type (m_type, units);
+	  mask = fold_convert (cvt, mask);
+	}
+
+      call = fold_build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (fn)), fn);
+      call = build_call_nary (type, call, 3, v0, v1, mask);
+
+      return expand_expr_real_1 (call, target, VOIDmode, EXPAND_NORMAL, NULL);
+    }
+
+vshuffle:
+  icode = direct_optab_handler (vshuffle_optab, mode);
+
+  if (icode == CODE_FOR_nothing)
+    return 0;
+
+  rtx_mask = expand_normal (mask);
+
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[3], rtx_mask, mode);
+
+  if (operand_equal_p (v0, v1, 0))
+    {
+      rtx_v0 = expand_normal (v0);
+      if (!insn_operand_matches(icode, 1, rtx_v0))
+        rtx_v0 = force_reg (mode, rtx_v0);
+
+      gcc_checking_assert(insn_operand_matches(icode, 2, rtx_v0));
+
+      create_fixed_operand (&ops[1], rtx_v0);
+      create_fixed_operand (&ops[2], rtx_v0);
+    }
+  else
+    {
+      create_input_operand (&ops[1], expand_normal (v0), mode);
+      create_input_operand (&ops[2], expand_normal (v1), mode);
+    }
+
+  expand_insn (icode, 4, ops);
+  return ops[0].value;
+}
+
 /* Return insn code for TYPE, the type of a VEC_COND_EXPR.  */
 
 static inline enum insn_code
Index: gcc/optabs.h
===================================================================
--- gcc/optabs.h	(revision 178354)
+++ gcc/optabs.h	(working copy)
@@ -636,6 +636,9 @@ enum direct_optab_index
   DOI_vcond,
   DOI_vcondu,
 
+  /* Vector shuffling.  */
+  DOI_vshuffle,
+
   /* Block move operation.  */
   DOI_movmem,
 
@@ -701,6 +704,7 @@ typedef struct direct_optab_d *direct_op
 #define reload_out_optab (&direct_optab_table[(int) DOI_reload_out])
 #define vcond_optab (&direct_optab_table[(int) DOI_vcond])
 #define vcondu_optab (&direct_optab_table[(int) DOI_vcondu])
+#define vshuffle_optab (&direct_optab_table[(int) DOI_vshuffle])
 #define movmem_optab (&direct_optab_table[(int) DOI_movmem])
 #define setmem_optab (&direct_optab_table[(int) DOI_setmem])
 #define cmpstr_optab (&direct_optab_table[(int) DOI_cmpstr])
@@ -879,8 +883,15 @@ extern rtx expand_widening_mult (enum ma
 /* Return tree if target supports vector operations for COND_EXPR.  */
 bool expand_vec_cond_expr_p (tree, enum machine_mode);
 
+/* Return tree if target supports vector operations for VEC_SHUFFLE_EXPR.  */
+bool expand_vec_shuffle_expr_p (enum machine_mode, tree, tree, tree);
+
 /* Generate code for VEC_COND_EXPR.  */
 extern rtx expand_vec_cond_expr (tree, tree, tree, tree, rtx);
+
+/* Generate code for VEC_SHUFFLE_EXPR.  */
+extern rtx expand_vec_shuffle_expr (tree, tree, tree, tree, rtx);
+
 /* Generate code for VEC_LSHIFT_EXPR and VEC_RSHIFT_EXPR.  */
 extern rtx expand_vec_shift_expr (sepops, rtx);
 
Index: gcc/genopinit.c
===================================================================
--- gcc/genopinit.c	(revision 178354)
+++ gcc/genopinit.c	(working copy)
@@ -255,6 +255,7 @@ static const char * const optabs[] =
   "set_optab_handler (vec_realign_load_optab, $A, CODE_FOR_$(vec_realign_load_$a$))",
   "set_direct_optab_handler (vcond_optab, $A, CODE_FOR_$(vcond$a$))",
   "set_direct_optab_handler (vcondu_optab, $A, CODE_FOR_$(vcondu$a$))",
+  "set_direct_optab_handler (vshuffle_optab, $A, CODE_FOR_$(vshuffle$a$))",
   "set_optab_handler (ssum_widen_optab, $A, CODE_FOR_$(widen_ssum$I$a3$))",
   "set_optab_handler (usum_widen_optab, $A, CODE_FOR_$(widen_usum$I$a3$))",
   "set_optab_handler (udot_prod_optab, $A, CODE_FOR_$(udot_prod$I$a$))",
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c	(revision 0)
@@ -0,0 +1,44 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shuf2compare(type, count, vres, v0, v1, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != ((vidx(type, mask, __i) < count) ? \
+                          vidx(type, v0, vidx(type, mask, __i)) :  \
+                          vidx(type, v1, (vidx(type, mask, __i) - count)))) \
+            __builtin_abort (); \
+        } \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+    vector (8, short) v0 = {5, 5,5,5,5,5,argc,7};
+    vector (8, short) v1 = {argc, 1,8,8,4,9,argc,4};
+    vector (8, short) v2;
+
+    //vector (8, short) mask = {1,2,5,4,3,6,7};
+    
+    vector (8, short) mask0 = {0,2,3,1,4,5,6,7};
+    vector (8, short) mask1 = {0,12,3,4,3,0,10,9};
+    
+    vector (8, short) mask2 = {0,8,1,9,2,10,3,11};
+
+    v2 = __builtin_shuffle (v0, v1,  mask0);
+    shuf2compare (short, 8, v2, v0, v1, mask0);
+ 
+    v2 = __builtin_shuffle (v0, v1,  mask1);
+    shuf2compare (short, 8, v2, v0, v1, mask1);
+    
+    v2 = __builtin_shuffle (v0, v1,  mask2);
+    shuf2compare (short, 8, v2, v0, v1, mask2);
+
+    v2 = __builtin_shuffle (mask0, mask0,  v0);
+    shuf2compare (short, 8, v2, mask0, mask0, v0);
+
+    return 0; 
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c	(revision 0)
@@ -0,0 +1,50 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shuf2compare(type, count, vres, v0, v1, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != ((vidx(type, mask, __i) < count) ? \
+                          vidx(type, v0, vidx(type, mask, __i)) :  \
+                          vidx(type, v1, (vidx(type, mask, __i) - count)))) \
+            __builtin_abort (); \
+        } \
+} while (0)
+
+
+vector (8, short) __attribute__ ((noinline))
+f (vector (8, short) x, vector (8, short) y, vector (8, short) mask) {
+    return __builtin_shuffle (x, y, mask);
+}
+
+
+
+int main (int argc, char *argv[]) {
+    vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+
+    //vector (8, short) mask = {1,2,5,4,3,6,7};
+    
+    vector (8, short) mask0 = {0,2,3,1,4,5,6,7};
+    vector (8, short) mask1 = {0,12,3,4,3,0,10,9};
+    vector (8, short) mask2 = {0,8,1,9,2,10,3,11};
+
+    v2 = f (v0, v1,  mask0);
+    shuf2compare (short, 8, v2, v0, v1, mask0);
+ 
+    v2 = f (v0, v1,  mask1);
+    shuf2compare (short, 8, v2, v0, v1, mask1);
+
+    v2 = f (v0, v1,  mask2);
+    shuf2compare (short, 8, v2, v0, v1, mask2);
+
+    v2 = f (mask0, mask0,  v0);
+    shuf2compare (short, 8, v2, mask0, mask0, v0);
+
+    return 0; 
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c	(revision 0)
@@ -0,0 +1,46 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shufcompare(type, count, vres, v0, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != vidx(type, v0, vidx(type, mask, __i))) \
+            __builtin_abort (); \
+    } \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+    /*vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+   
+    vector (8, short) smask = {0,0,1,2,3,4,5,6};
+    
+    v2 = __builtin_shuffle (v0,  smask);
+    shufcompare (short, 8, v2, v0, smask);
+    v2 = __builtin_shuffle (v0, v1);
+    shufcompare (short, 8, v2, v0, v1);
+    v2 = __builtin_shuffle (smask, v0);
+    shufcompare (short, 8, v2, smask, v0);*/
+
+    vector (4, int) i0 = {argc, 1,2,3};
+    vector (4, int) i1 = {argc, 1, argc, 3};
+    vector (4, int) i2;
+
+    vector (4, int) imask = {0,3,2,1};
+
+    /*i2 = __builtin_shuffle (i0, imask);
+    shufcompare (int, 4, i2, i0, imask);*/
+    i2 = __builtin_shuffle (i0, i1);
+    shufcompare (int, 4, i2, i0, i1);
+    
+    i2 = __builtin_shuffle (imask, i0);
+    shufcompare (int, 4, i2, imask, i0);
+    
+    return 0;
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c	(revision 0)
@@ -0,0 +1,36 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shufcompare(type, count, vres, v0, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != vidx(type, v0, vidx(type, mask, __i))) \
+            __builtin_abort (); \
+    } \
+} while (0)
+
+vector (8, short) __attribute__ ((noinline))
+f (vector (8, short) x, vector (8, short) mask) {
+    return __builtin_shuffle (x, mask);
+}
+
+
+int main (int argc, char *argv[]) {
+    vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+
+    vector (8, short) mask = {0,0,1,2,3,4,5,6};
+    
+    v2 = f (v0,  mask);
+    shufcompare (short, 8, v2, v0, mask);
+
+    v2 = f (v0, v1);
+    shufcompare (short, 8, v2, v0, v1);
+
+    return 0;
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c	(revision 0)
@@ -0,0 +1,64 @@
+/* Test that different type variants are compatible within
+   vector shuffling.  */
+
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define shufcompare(count, vres, v0, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vres[__i] != v0[mask[__i]]) \
+            __builtin_abort (); \
+    } \
+} while (0)
+
+#define test_compat_mask(res, vec, mask) \
+  res = __builtin_shuffle (vec, mask); \
+  shufcompare(4, res, vec, mask); \
+  res = __builtin_shuffle (vec, c ## mask); \
+  shufcompare(4, res, vec, c ##  mask); \
+  res = __builtin_shuffle (vec, r ## mask); \
+  shufcompare(4, res, vec, r ##  mask); \
+  res = __builtin_shuffle (vec, d ## mask); \
+  shufcompare(4, res, vec, d ##  mask); \
+  res = __builtin_shuffle (vec, dc ## mask); \
+  shufcompare(4, res, vec, dc ##  mask); \
+
+#define test_compat_vec(res, vec, mask) \
+  test_compat_mask (res, vec, mask); \
+  test_compat_mask (res, c ## vec, mask); \
+  test_compat_mask (res, r ## vec, mask); \
+  test_compat_mask (res, d ## vec, mask); \
+  test_compat_mask (res, dc ## vec, mask); 
+
+#define test_compat(res, vec, mask) \
+  test_compat_vec (res, vec, mask); \
+  test_compat_vec (d ## res, vec, mask); \
+  test_compat_vec (r ## res, vec, mask);
+
+typedef vector (4, int) v4si;
+typedef const vector (4, int) v4sicst;
+
+int main (int argc, char *argv[]) {
+    vector (4, int) vec = {argc, 1,2,3};
+    const vector (4, int) cvec = {argc, 1,2,3};
+    register vector (4, int) rvec = {argc, 1,2,3};
+    v4si dvec = {argc, 1,2,3};
+    v4sicst dcvec = {argc, 1,2,3};
+    
+    vector (4, int) res; 
+    v4si dres;
+    register vector (4, int) rres;
+
+    vector (4, int) mask = {0,3,2,1};
+    const vector (4, int) cmask = {0,3,2,1};
+    register vector (4, int) rmask = {0,3,2,1};
+    v4si dmask = {0,3,2,1};
+    v4sicst dcmask = {0,3,2,1};
+
+    test_compat (res, vec, mask);
+
+    return 0;
+}
+
Index: gcc/testsuite/gcc.dg/builtin-complex-err-1.c
===================================================================
--- gcc/testsuite/gcc.dg/builtin-complex-err-1.c	(revision 178354)
+++ gcc/testsuite/gcc.dg/builtin-complex-err-1.c	(working copy)
@@ -19,8 +19,8 @@ _Complex float fc3 = __builtin_complex (
 void
 f (void)
 {
-  __builtin_complex (0.0); /* { dg-error "expected" } */
-  __builtin_complex (0.0, 0.0, 0.0); /* { dg-error "expected" } */
+  __builtin_complex (0.0); /* { dg-error "wrong number of arguments" } */
+  __builtin_complex (0.0, 0.0, 0.0); /* { dg-error "wrong number of arguments" } */
 }
 
-void (*p) (void) = __builtin_complex; /* { dg-error "expected" } */
+void (*p) (void) = __builtin_complex; /* { dg-error "cannot take address" } */
Index: gcc/c-tree.h
===================================================================
--- gcc/c-tree.h	(revision 178354)
+++ gcc/c-tree.h	(working copy)
@@ -579,6 +579,7 @@ extern tree c_begin_omp_task (void);
 extern tree c_finish_omp_task (location_t, tree, tree);
 extern tree c_finish_omp_clauses (tree);
 extern tree c_build_va_arg (location_t, tree, tree);
+extern tree c_build_vec_shuffle_expr (location_t, tree, tree, tree);
 
 /* Set to 0 at beginning of a function definition, set to 1 if
    a return statement that specifies a return value is seen.  */
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 178354)
+++ gcc/expr.c	(working copy)
@@ -8605,6 +8605,10 @@ expand_expr_real_2 (sepops ops, rtx targ
     case VEC_PACK_FIX_TRUNC_EXPR:
       mode = TYPE_MODE (TREE_TYPE (treeop0));
       goto binop;
+    
+    case VEC_SHUFFLE_EXPR:
+      target = expand_vec_shuffle_expr (type, treeop0, treeop1, treeop2, target);
+      return target;
 
     case DOT_PROD_EXPR:
       {
Index: gcc/gimple-pretty-print.c
===================================================================
--- gcc/gimple-pretty-print.c	(revision 178354)
+++ gcc/gimple-pretty-print.c	(working copy)
@@ -417,6 +417,16 @@ dump_ternary_rhs (pretty_printer *buffer
       dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
       pp_string (buffer, ">");
       break;
+    
+    case VEC_SHUFFLE_EXPR:
+      pp_string (buffer, "VEC_SHUFFLE_EXPR <");
+      dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
+      pp_string (buffer, ">");
+      break;
 
     case REALIGN_LOAD_EXPR:
       pp_string (buffer, "REALIGN_LOAD <");
Index: gcc/c-typeck.c
===================================================================
--- gcc/c-typeck.c	(revision 178354)
+++ gcc/c-typeck.c	(working copy)
@@ -2307,7 +2307,7 @@ build_array_ref (location_t loc, tree ar
       if (TREE_CODE (TREE_TYPE (index)) != ARRAY_TYPE
 	  && TREE_CODE (TREE_TYPE (index)) != POINTER_TYPE)
 	{
-          error_at (loc, 
+          error_at (loc,
             "subscripted value is neither array nor pointer nor vector");
 
 	  return error_mark_node;
@@ -2339,8 +2339,8 @@ build_array_ref (location_t loc, tree ar
   index = default_conversion (index);
 
   gcc_assert (TREE_CODE (TREE_TYPE (index)) == INTEGER_TYPE);
-  
-  /* For vector[index], convert the vector to a 
+
+  /* For vector[index], convert the vector to a
      pointer of the underlying type.  */
   if (TREE_CODE (TREE_TYPE (array)) == VECTOR_TYPE)
     {
@@ -2348,11 +2348,11 @@ build_array_ref (location_t loc, tree ar
       tree type1;
 
       if (TREE_CODE (index) == INTEGER_CST)
-        if (!host_integerp (index, 1) 
-            || ((unsigned HOST_WIDE_INT) tree_low_cst (index, 1) 
+        if (!host_integerp (index, 1)
+            || ((unsigned HOST_WIDE_INT) tree_low_cst (index, 1)
                >= TYPE_VECTOR_SUBPARTS (TREE_TYPE (array))))
           warning_at (loc, OPT_Warray_bounds, "index value is out of bound");
-     
+
       c_common_mark_addressable_vec (array);
       type = build_qualified_type (TREE_TYPE (type), TYPE_QUALS (type));
       type = build_pointer_type (type);
@@ -2845,6 +2845,99 @@ build_function_call_vec (location_t loc,
     }
   return require_complete_type (result);
 }
+
+/* Build a VEC_SHUFFLE_EXPR if V0, V1 and MASK are not error_mark_nodes
+   and have vector types, V0 has the same type as V1, and the number of
+   elements of V0, V1, MASK is the same.
+
+   In case V1 is a NULL_TREE it is assumed that __builtin_shuffle was
+   called with two arguments.  In this case implementation passes the
+   first argument twice in order to share the same tree code.  This fact
+   could enable the mask-values being twice the vector length.  This is
+   an implementation accident and this semantics is not guaranteed to
+   the user.  */
+tree
+c_build_vec_shuffle_expr (location_t loc, tree v0, tree v1, tree mask)
+{
+  tree vec_shuffle, tmp;
+  bool wrap = true;
+  bool maybe_const = false;
+  bool two_arguments;
+
+  if (v1 == NULL_TREE)
+    {
+      two_arguments = true;
+      v1 = v0;
+    }
+
+  if (v0 == error_mark_node || v1 == error_mark_node
+      || mask == error_mark_node)
+    return error_mark_node;
+
+  if (TREE_CODE (TREE_TYPE (mask)) != VECTOR_TYPE
+      || TREE_CODE (TREE_TYPE (TREE_TYPE (mask))) != INTEGER_TYPE)
+    {
+      error_at (loc, "__builtin_shuffle last argument must "
+		     "be an integer vector");
+      return error_mark_node;
+    }
+
+  if (TREE_CODE (TREE_TYPE (v0)) != VECTOR_TYPE
+      || TREE_CODE (TREE_TYPE (v1)) != VECTOR_TYPE)
+    {
+      error_at (loc, "__builtin_shuffle arguments must be vectors");
+      return error_mark_node;
+    }
+
+  if (TYPE_MAIN_VARIANT (TREE_TYPE (v0)) != TYPE_MAIN_VARIANT (TREE_TYPE (v1)))
+    {
+      error_at (loc, "__builtin_shuffle argument vectors must be of "
+		     "the same type");
+      return error_mark_node;
+    }
+
+  if (TYPE_VECTOR_SUBPARTS (TREE_TYPE (v0))
+      != TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask))
+      && TYPE_VECTOR_SUBPARTS (TREE_TYPE (v1))
+	 != TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask)))
+    {
+      error_at (loc, "__builtin_shuffle number of elements of the "
+		     "argument vector(s) and the mask vector should "
+		     "be the same");
+      return error_mark_node;
+    }
+
+  if (GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (v0))))
+      != GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (mask)))))
+    {
+      error_at (loc, "__builtin_shuffle argument vector(s) inner type "
+		     "must have the same size as inner type of the mask");
+      return error_mark_node;
+    }
+
+  /* Avoid C_MAYBE_CONST_EXPRs inside VEC_SHUFFLE_EXPR.  */
+  tmp = c_fully_fold (v0, false, &maybe_const);
+  v0 = save_expr (tmp);
+  wrap &= maybe_const;
+
+  if (!two_arguments)
+    {
+      v1 = c_fully_fold (v1, false, &maybe_const);
+      wrap &= maybe_const;
+    }
+  else
+    v1 = v0;
+
+  mask = c_fully_fold (mask, false, &maybe_const);
+  wrap &= maybe_const;
+
+  vec_shuffle = build3 (VEC_SHUFFLE_EXPR, TREE_TYPE (v0), v0, v1, mask);
+
+  if (!wrap)
+    vec_shuffle = c_wrap_maybe_const (vec_shuffle, true);
+
+  return vec_shuffle;
+}
 \f
 /* Convert the argument expressions in the vector VALUES
    to the types in the list TYPELIST.
@@ -3167,7 +3260,7 @@ convert_arguments (tree typelist, VEC(tr
 
   if (typetail != 0 && TREE_VALUE (typetail) != void_type_node)
     {
-      error_at (input_location, 
+      error_at (input_location,
 		"too few arguments to function %qE", function);
       if (fundecl && !DECL_BUILT_IN (fundecl))
 	inform (DECL_SOURCE_LOCATION (fundecl), "declared here");
@@ -3566,7 +3659,7 @@ build_unary_op (location_t location,
 
       /* Complain about anything that is not a true lvalue.  In
 	 Objective-C, skip this check for property_refs.  */
-      if (!objc_is_property_ref (arg) 
+      if (!objc_is_property_ref (arg)
 	  && !lvalue_or_else (location,
 			      arg, ((code == PREINCREMENT_EXPR
 				     || code == POSTINCREMENT_EXPR)
@@ -3683,7 +3776,7 @@ build_unary_op (location_t location,
 	   need to ask Objective-C to build the increment or decrement
 	   expression for it.  */
 	if (objc_is_property_ref (arg))
-	  return objc_build_incr_expr_for_property_ref (location, code, 
+	  return objc_build_incr_expr_for_property_ref (location, code,
 							arg, inc);
 
 	/* Report a read-only lvalue.  */
@@ -5926,7 +6019,7 @@ void
 pedwarn_init (location_t location, int opt, const char *gmsgid)
 {
   char *ofwhat;
-  
+
   /* The gmsgid may be a format string with %< and %>. */
   pedwarn (location, opt, gmsgid);
   ofwhat = print_spelling ((char *) alloca (spelling_length () + 1));
@@ -9344,8 +9437,8 @@ scalar_to_vector (location_t loc, enum t
   tree type1 = TREE_TYPE (op1);
   bool integer_only_op = false;
   enum stv_conv ret = stv_firstarg;
-  
-  gcc_assert (TREE_CODE (type0) == VECTOR_TYPE 
+
+  gcc_assert (TREE_CODE (type0) == VECTOR_TYPE
 	      || TREE_CODE (type1) == VECTOR_TYPE);
   switch (code)
     {
@@ -9370,7 +9463,7 @@ scalar_to_vector (location_t loc, enum t
       case BIT_AND_EXPR:
 	integer_only_op = true;
 	/* ... fall through ...  */
-      
+
       case PLUS_EXPR:
       case MINUS_EXPR:
       case MULT_EXPR:
@@ -9387,7 +9480,7 @@ scalar_to_vector (location_t loc, enum t
 	  }
 
 	if (TREE_CODE (type0) == INTEGER_TYPE
-	    && TREE_CODE (TREE_TYPE (type1)) == INTEGER_TYPE) 
+	    && TREE_CODE (TREE_TYPE (type1)) == INTEGER_TYPE)
 	  {
 	    if (unsafe_conversion_p (TREE_TYPE (type1), op0, false))
 	      {
@@ -9399,7 +9492,7 @@ scalar_to_vector (location_t loc, enum t
 	  }
 	else if (!integer_only_op
 		    /* Allow integer --> real conversion if safe.  */
-		 && (TREE_CODE (type0) == REAL_TYPE 
+		 && (TREE_CODE (type0) == REAL_TYPE
 		     || TREE_CODE (type0) == INTEGER_TYPE)
 		 && SCALAR_FLOAT_TYPE_P (TREE_TYPE (type1)))
 	  {
@@ -9414,7 +9507,7 @@ scalar_to_vector (location_t loc, enum t
       default:
 	break;
     }
- 
+
   return stv_nothing;
 }
 \f
@@ -9529,8 +9622,8 @@ build_binary_op (location_t location, en
     int_const = int_const_or_overflow = false;
 
   /* Do not apply default conversion in mixed vector/scalar expression.  */
-  if (convert_p 
-      && !((TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE) 
+  if (convert_p
+      && !((TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE)
 	   != (TREE_CODE (TREE_TYPE (op1)) == VECTOR_TYPE)))
     {
       op0 = default_conversion (op0);
@@ -9608,7 +9701,7 @@ build_binary_op (location_t location, en
   if ((code0 == VECTOR_TYPE) != (code1 == VECTOR_TYPE))
     {
       enum stv_conv convert_flag = scalar_to_vector (location, code, op0, op1);
-      
+
       switch (convert_flag)
 	{
 	  case stv_error:
@@ -9949,7 +10042,7 @@ build_binary_op (location_t location, en
 	    {
 	      if (code == EQ_EXPR)
 		warning_at (location,
-			    OPT_Waddress, 
+			    OPT_Waddress,
 			    "the comparison will always evaluate as %<false%> "
 			    "for the address of %qD will never be NULL",
 			    TREE_OPERAND (op1, 0));
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	(revision 178354)
+++ gcc/gimplify.c	(working copy)
@@ -7286,6 +7286,7 @@ gimplify_expr (tree *expr_p, gimple_seq
 	  }
 
 	case FMA_EXPR:
+	case VEC_SHUFFLE_EXPR:
 	  /* Classified as tcc_expression.  */
 	  goto expr_3;
 
Index: gcc/tree.def
===================================================================
--- gcc/tree.def	(revision 178354)
+++ gcc/tree.def	(working copy)
@@ -497,6 +497,19 @@ DEFTREECODE (COND_EXPR, "cond_expr", tcc
 */
 DEFTREECODE (VEC_COND_EXPR, "vec_cond_expr", tcc_expression, 3)
 
+/* Vector shuffle expression.  A = VEC_SHUFFLE_EXPR<v0, v1, mask>
+   means
+
+   foreach i in length (mask):
+     A = mask[i] < length (v0) ? v0[mask[i]] : v1[mask[i] - length (mask)]
+
+   V0 and V1 are vectors of the same type.  MASK is an integer-typed
+   vector.  The number of MASK elements must be the same with the
+   number of elements in V0 and V1.  The size of the inner type
+   of the MASK and of the V0 and V1 must be the same.
+*/
+DEFTREECODE (VEC_SHUFFLE_EXPR, "vec_shuffle_expr", tcc_expression, 3)
+
 /* Declare local variables, including making RTL and allocating space.
    BIND_EXPR_VARS is a chain of VAR_DECL nodes for the variables.
    BIND_EXPR_BODY is the body, the expression to be computed using
Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c	(revision 178354)
+++ gcc/tree-inline.c	(working copy)
@@ -3285,6 +3285,7 @@ estimate_operator_cost (enum tree_code c
        ??? We may consider mapping RTL costs to this.  */
     case COND_EXPR:
     case VEC_COND_EXPR:
+    case VEC_SHUFFLE_EXPR:
 
     case PLUS_EXPR:
     case POINTER_PLUS_EXPR:
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	(revision 178354)
+++ gcc/tree-vect-generic.c	(working copy)
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "flags.h"
 #include "ggc.h"
+#include "diagnostic.h"
 
 /* Need to include rtl.h, expr.h, etc. for optabs.  */
 #include "expr.h"
@@ -326,10 +327,10 @@ uniform_vector_p (tree vec)
         }
       if (i != TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec)))
 	return NULL_TREE;
-      
+
       return first;
     }
-  
+
   return NULL_TREE;
 }
 
@@ -432,6 +433,263 @@ type_for_widest_vector_mode (enum machin
     }
 }
 
+
+/* Build a reference to the element of the vector VECT.  Function
+   returns either the element itself, either BIT_FIELD_REF, or an
+   ARRAY_REF expression.
+
+   GSI is requred to insert temporary variables while building a
+   refernece to the element of the vector VECT.
+
+   PTMPVEC is a pointer to the temporary variable for caching
+   purposes.  In case when PTMPVEC is NULL new temporary variable
+   will be created.  */
+static tree
+vector_element (gimple_stmt_iterator *gsi, tree vect, tree idx, tree *ptmpvec)
+{
+  tree type;
+  gimple asgn;
+  tree tmpvec;
+  tree arraytype;
+  bool need_asgn = true;
+
+  gcc_assert (TREE_CODE (TREE_TYPE (vect)) == VECTOR_TYPE);
+
+  type = TREE_TYPE (vect);
+  if (TREE_CODE (idx) == INTEGER_CST)
+    {
+      unsigned HOST_WIDE_INT index;
+
+      if (!host_integerp (idx, 1)
+           || (index = tree_low_cst (idx, 1)) > TYPE_VECTOR_SUBPARTS (type)-1)
+        return error_mark_node;
+
+      if (TREE_CODE (vect) == VECTOR_CST)
+        {
+	  unsigned i;
+	  tree vals = TREE_VECTOR_CST_ELTS (vect);
+	  for (i = 0; vals; vals = TREE_CHAIN (vals), ++i)
+	    if (i == index)
+	       return TREE_VALUE (vals);
+	  return error_mark_node;
+        }
+      else if (TREE_CODE (vect) == CONSTRUCTOR)
+        {
+          unsigned i;
+          VEC (constructor_elt, gc) *vals = CONSTRUCTOR_ELTS (vect);
+          constructor_elt *elt;
+
+          for (i = 0; VEC_iterate (constructor_elt, vals, i, elt); i++)
+            if (operand_equal_p (elt->index, idx, 0))
+              return elt->value;
+          return fold_convert (TREE_TYPE (type), integer_zero_node);
+        }
+      else if (TREE_CODE (vect) == SSA_NAME)
+        {
+	  tree size = TYPE_SIZE (TREE_TYPE (type));
+          tree pos = fold_build2 (MULT_EXPR, TREE_TYPE (idx), idx, size);
+          return fold_build3 (BIT_FIELD_REF, TREE_TYPE (type), vect, size, pos);
+        }
+      else
+        return error_mark_node;
+    }
+
+  if (!ptmpvec)
+    tmpvec = create_tmp_var (TREE_TYPE (vect), "vectmp");
+  else if (!*ptmpvec)
+    tmpvec = *ptmpvec = create_tmp_var (TREE_TYPE (vect), "vectmp");
+  else
+    {
+      tmpvec = *ptmpvec;
+      need_asgn = false;
+    }
+
+  if (need_asgn)
+    {
+      TREE_ADDRESSABLE (tmpvec) = 1;
+      asgn = gimple_build_assign (tmpvec, vect);
+      gsi_insert_before (gsi, asgn, GSI_SAME_STMT);
+    }
+
+  arraytype = build_array_type_nelts (TREE_TYPE (type),
+				      TYPE_VECTOR_SUBPARTS (TREE_TYPE (vect)));
+
+  return build4 (ARRAY_REF, TREE_TYPE (type),
+                 build1 (VIEW_CONVERT_EXPR, arraytype, tmpvec),
+                 idx, NULL_TREE, NULL_TREE);
+}
+
+/* Check if VEC_SHUFFLE_EXPR within the given setting is supported
+   by hardware, or lower it piecewise.  Function returns false when
+   the expression must be replaced with TRAP_RETURN, true otherwise.
+
+   When VEC_SHUFFLE_EXPR has the same first and second operands:
+   VEC_SHUFFLE_EXPR <v0, v0, mask> the lowered version would be
+   {v0[mask[0]], v0[mask[1]], ...}
+   MASK and V0 must have the same number of elements.
+
+   Otherwise VEC_SHUFFLE_EXPR <v0, v1, mask> is lowered to
+   {mask[0] < len(v0) ? v0[mask[0]] : v1[mask[0]], ...}
+   V0 and V1 must have the same type.  MASK, V0, V1 must have the
+   same number of arguments.  */
+static bool
+lower_vec_shuffle (gimple_stmt_iterator *gsi, location_t loc)
+{
+
+  gimple stmt = gsi_stmt (*gsi);
+  tree mask = gimple_assign_rhs3 (stmt);
+  tree vec0 = gimple_assign_rhs1 (stmt);
+  tree vec1 = gimple_assign_rhs2 (stmt);
+  unsigned els = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask));
+  tree type0 = TREE_TYPE (TREE_TYPE (vec0));
+  VEC(constructor_elt,gc) *v = NULL;
+  tree vectype, constr;
+  tree vec0tmp = NULL_TREE, masktmp = NULL_TREE;
+
+  if (expand_vec_shuffle_expr_p (TYPE_MODE (TREE_TYPE (vec0)), vec0, vec1, mask))
+    {
+      tree t;
+
+      t = gimplify_build3 (gsi, VEC_SHUFFLE_EXPR, TREE_TYPE (vec0),
+			   vec0, vec1, mask);
+      gimple_assign_set_rhs_from_tree (gsi, t);
+      /* Statement should be updated by callee.  */
+      return true;
+    }
+
+  if (operand_equal_p (vec0, vec1, 0))
+    {
+      unsigned i;
+      tree vec0tmp = NULL_TREE;
+
+      v = VEC_alloc (constructor_elt, gc, els);
+      for (i = 0; i < els; i++)
+        {
+          tree idxval, vecel, t;
+
+	  idxval = vector_element (gsi, mask, size_int (i), &masktmp);
+          if (idxval == error_mark_node)
+            {
+              if (warning_at (loc, 0, "Invalid shuffling mask index %i", i))
+		inform (loc, "if this code is reached the programm will abort");
+	      return false;
+            }
+
+	  vecel = vector_element (gsi, vec0, idxval, &vec0tmp);
+          if (vecel == error_mark_node)
+            {
+              if (warning_at (loc, 0, "Invalid shuffling arguments"))
+		inform (loc, "if this code is reached the programm will abort");
+	      return false;
+            }
+
+          t = force_gimple_operand_gsi (gsi, vecel, true,
+					NULL_TREE, true, GSI_SAME_STMT);
+          CONSTRUCTOR_APPEND_ELT (v, size_int (i), t);
+        }
+    }
+  else
+    {
+      unsigned i;
+      tree var = create_tmp_var (type0, "vecel");
+      tree vec1tmp = NULL_TREE;
+
+      v = VEC_alloc (constructor_elt, gc, els);
+      for (i = 0; i < els; i++)
+        {
+          tree idxval, idx1val, cond, elval0, elval1, condexpr, t, ssatmp;
+          tree vec0el, vec1el;
+          gimple asgn;
+
+          idxval = vector_element (gsi, mask, size_int (i), &masktmp);
+	  if (idxval == error_mark_node)
+            {
+              if (warning_at (loc, 0, "Invalid shuffling mask index %i", i))
+		inform (loc, "if this code is reached the programm will abort");
+	      return false;
+            }
+
+          if (TREE_CODE (idxval) == INTEGER_CST)
+            {
+              if (tree_int_cst_lt (idxval, size_int (els)))
+                {
+                  vec0el = vector_element (gsi, vec0, idxval, &vec0tmp);
+                  t = force_gimple_operand_gsi (gsi, vec0el,
+                                    true, NULL_TREE, true, GSI_SAME_STMT);
+                }
+              else if (tree_int_cst_lt (idxval, size_int (2*els)))
+                {
+                  idx1val = fold_build2 (MINUS_EXPR, TREE_TYPE (idxval),
+                        idxval, build_int_cst (TREE_TYPE (idxval), els));
+
+                  vec1el = vector_element (gsi, vec1, idx1val, &vec1tmp);
+                  t = force_gimple_operand_gsi (gsi, vec1el, true,
+						NULL_TREE, true, GSI_SAME_STMT);
+                }
+              else
+                {
+                  if (warning_at (loc, 0, "Invalid shuffling mask index %i", i))
+		    inform (loc, "if this code is reached the "
+				  "programm will abort");
+		  return false;
+                }
+            }
+          else
+            {
+
+              idx1val = fold_build2 (MINUS_EXPR, TREE_TYPE (idxval),
+                            idxval, build_int_cst (TREE_TYPE (idxval), els));
+              idx1val = force_gimple_operand_gsi (gsi, idx1val,
+                                true, NULL_TREE, true, GSI_SAME_STMT);
+              cond = fold_build2 (GT_EXPR, boolean_type_node, \
+                             idxval, fold_convert (type0, size_int (els - 1)));
+
+	      vec0el = vector_element (gsi, vec0, idxval, &vec0tmp);
+              if (vec0el == error_mark_node)
+                {
+                  if (warning_at (loc, 0, "Invalid shuffling arguments"))
+		    inform (loc, "if this code is reached the "
+				 "programm will abort");
+		  return false;
+                }
+
+              elval0 = force_gimple_operand_gsi (gsi, vec0el,
+                                true, NULL_TREE, true, GSI_SAME_STMT);
+
+	      vec1el = vector_element (gsi, vec1, idx1val, &vec1tmp);
+              if (vec1el == error_mark_node)
+                {
+                  if (warning_at (loc, 0, "Invalid shuffling arguments"))
+		    inform (loc, "if this code is reached the "
+				 "programm will abort");
+		  return false;
+                }
+
+              elval1 = force_gimple_operand_gsi (gsi, vec1el,
+                                true, NULL_TREE, true, GSI_SAME_STMT);
+
+              condexpr = fold_build3 (COND_EXPR, type0, cond, \
+                                      elval1, elval0);
+
+              t = force_gimple_operand_gsi (gsi, condexpr, true, \
+                                        NULL_TREE, true, GSI_SAME_STMT);
+            }
+
+          asgn = gimple_build_assign (var, t);
+          ssatmp = make_ssa_name (var, asgn);
+          gimple_assign_set_lhs (asgn, ssatmp);
+          gsi_insert_before (gsi, asgn, GSI_SAME_STMT);
+          CONSTRUCTOR_APPEND_ELT (v, size_int (i), ssatmp);
+        }
+    }
+
+  vectype = build_vector_type (type0, els);
+  constr = build_constructor (vectype, v);
+  gimple_assign_set_rhs_from_tree (gsi, constr);
+  /* Statement should be updated by callee.  */
+  return true;
+}
+
 /* Process one statement.  If we identify a vector operation, expand it.  */
 
 static void
@@ -451,6 +709,25 @@ expand_vector_operations_1 (gimple_stmt_
   code = gimple_assign_rhs_code (stmt);
   rhs_class = get_gimple_rhs_class (code);
 
+  if (code == VEC_SHUFFLE_EXPR)
+    {
+      if (!lower_vec_shuffle (gsi, gimple_location (stmt)))
+	{
+	  gimple new_stmt;
+	  tree vec0;
+
+	  vec0 = gimple_assign_rhs1 (stmt);
+	  new_stmt = gimple_build_call (built_in_decls[BUILT_IN_TRAP], 0);
+	  gsi_insert_before (gsi, new_stmt,  GSI_SAME_STMT);
+	  split_block (gimple_bb (new_stmt), new_stmt);
+	  new_stmt = gimple_build_assign (gimple_assign_lhs (stmt), vec0);
+	  gsi_replace (gsi, new_stmt, false);
+	}
+
+      gimple_set_modified (gsi_stmt (*gsi), true);
+      update_stmt (gsi_stmt (*gsi));
+    }
+
   if (rhs_class != GIMPLE_UNARY_RHS && rhs_class != GIMPLE_BINARY_RHS)
     return;
 
@@ -485,9 +762,9 @@ expand_vector_operations_1 (gimple_stmt_
     {
       bool vector_scalar_shift;
       op = optab_for_tree_code (code, type, optab_scalar);
-      
+
       /* Vector/Scalar shift is supported.  */
-      vector_scalar_shift = (op && (optab_handler (op, TYPE_MODE (type)) 
+      vector_scalar_shift = (op && (optab_handler (op, TYPE_MODE (type))
 				    != CODE_FOR_nothing));
 
       /* If the 2nd argument is vector, we need a vector/vector shift.
@@ -500,10 +777,10 @@ expand_vector_operations_1 (gimple_stmt_
           /* Check whether we have vector <op> {x,x,x,x} where x
              could be a scalar variable or a constant. Transform
              vector <op> {x,x,x,x} ==> vector <op> scalar.  */
-          if (vector_scalar_shift 
+          if (vector_scalar_shift
               && ((TREE_CODE (rhs2) == VECTOR_CST
 		   && (first = uniform_vector_p (rhs2)) != NULL_TREE)
-		  || (TREE_CODE (rhs2) == SSA_NAME 
+		  || (TREE_CODE (rhs2) == SSA_NAME
 		      && (def_stmt = SSA_NAME_DEF_STMT (rhs2))
 		      && gimple_assign_single_p (def_stmt)
 		      && (first = uniform_vector_p
@@ -516,14 +793,14 @@ expand_vector_operations_1 (gimple_stmt_
           else
             op = optab_for_tree_code (code, type, optab_vector);
         }
-    
+
       /* Try for a vector/scalar shift, and if we don't have one, see if we
          have a vector/vector shift */
       else if (!vector_scalar_shift)
 	{
 	  op = optab_for_tree_code (code, type, optab_vector);
 
-	  if (op && (optab_handler (op, TYPE_MODE (type)) 
+	  if (op && (optab_handler (op, TYPE_MODE (type))
 		     != CODE_FOR_nothing))
 	    {
 	      /* Transform vector <op> scalar => vector <op> {x,x,x,x}.  */
@@ -613,9 +890,9 @@ expand_vector_operations_1 (gimple_stmt_
    if it may need the bit-twiddling tricks implemented in this file.  */
 
 static bool
-gate_expand_vector_operations (void)
+gate_expand_vector_operations_ssa (void)
 {
-  return flag_tree_vectorize != 0;
+  return optimize == 0;
 }
 
 static unsigned int
@@ -648,7 +925,7 @@ struct gimple_opt_pass pass_lower_vector
  {
   GIMPLE_PASS,
   "veclower",				/* name */
-  0,					/* gate */
+  gate_expand_vector_operations_ssa,    /* gate */
   expand_vector_operations,		/* execute */
   NULL,					/* sub */
   NULL,					/* next */
@@ -661,6 +938,7 @@ struct gimple_opt_pass pass_lower_vector
   TODO_update_ssa	                /* todo_flags_finish */
     | TODO_verify_ssa
     | TODO_verify_stmts | TODO_verify_flow
+    | TODO_cleanup_cfg
  }
 };
 
@@ -669,7 +947,7 @@ struct gimple_opt_pass pass_lower_vector
  {
   GIMPLE_PASS,
   "veclower2",				/* name */
-  gate_expand_vector_operations,	/* gate */
+  0,	                                /* gate */
   expand_vector_operations,		/* execute */
   NULL,					/* sub */
   NULL,					/* next */
@@ -682,6 +960,7 @@ struct gimple_opt_pass pass_lower_vector
   TODO_update_ssa	                /* todo_flags_finish */
     | TODO_verify_ssa
     | TODO_verify_stmts | TODO_verify_flow
+    | TODO_cleanup_cfg
  }
 };
 
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 178354)
+++ gcc/Makefile.in	(working copy)
@@ -3178,7 +3178,7 @@ tree-vect-generic.o : tree-vect-generic.
     $(TM_H) $(TREE_FLOW_H) $(GIMPLE_H) tree-iterator.h $(TREE_PASS_H) \
     $(FLAGS_H) $(OPTABS_H) $(MACHMODE_H) $(EXPR_H) \
     langhooks.h $(FLAGS_H) $(DIAGNOSTIC_H) gt-tree-vect-generic.h $(GGC_H) \
-    coretypes.h insn-codes.h
+    coretypes.h insn-codes.h $(DIAGNOSTIC_H)
 df-core.o : df-core.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    insn-config.h $(RECOG_H) $(FUNCTION_H) $(REGS_H) alloc-pool.h \
    hard-reg-set.h $(BASIC_BLOCK_H) $(DF_H) $(BITMAP_H) sbitmap.h $(TIMEVAR_H) \
Index: gcc/gimple.c
===================================================================
--- gcc/gimple.c	(revision 178354)
+++ gcc/gimple.c	(working copy)
@@ -2615,6 +2615,7 @@ get_gimple_rhs_num_ops (enum tree_code c
       || (SYM) == WIDEN_MULT_MINUS_EXPR					    \
       || (SYM) == DOT_PROD_EXPR						    \
       || (SYM) == REALIGN_LOAD_EXPR					    \
+      || (SYM) == VEC_SHUFFLE_EXPR					    \
       || (SYM) == FMA_EXPR) ? GIMPLE_TERNARY_RHS			    \
    : ((SYM) == COND_EXPR						    \
       || (SYM) == CONSTRUCTOR						    \
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	(revision 178354)
+++ gcc/tree-cfg.c	(working copy)
@@ -3711,6 +3711,59 @@ verify_gimple_assign_ternary (gimple stm
 	}
       break;
 
+    case VEC_SHUFFLE_EXPR:
+      if (!useless_type_conversion_p (lhs_type, rhs1_type)
+	  || !useless_type_conversion_p (lhs_type, rhs2_type))
+	{
+	  error ("type mismatch in vector shuffle expression");
+	  debug_generic_expr (lhs_type);
+	  debug_generic_expr (rhs1_type);
+	  debug_generic_expr (rhs2_type);
+	  debug_generic_expr (rhs3_type);
+	  return true;
+	}
+
+      if (TREE_CODE (rhs1_type) != VECTOR_TYPE
+	  || TREE_CODE (rhs2_type) != VECTOR_TYPE
+	  || TREE_CODE (rhs3_type) != VECTOR_TYPE)
+	{
+	  error ("vector types expected in vector shuffle expression");
+	  debug_generic_expr (lhs_type);
+	  debug_generic_expr (rhs1_type);
+	  debug_generic_expr (rhs2_type);
+	  debug_generic_expr (rhs3_type);
+	  return true;
+	}
+
+      if (TYPE_VECTOR_SUBPARTS (rhs1_type) != TYPE_VECTOR_SUBPARTS (rhs2_type)
+	  || TYPE_VECTOR_SUBPARTS (rhs2_type)
+	     != TYPE_VECTOR_SUBPARTS (rhs3_type)
+	  || TYPE_VECTOR_SUBPARTS (rhs3_type)
+	     != TYPE_VECTOR_SUBPARTS (lhs_type))
+	{
+	  error ("vectors with different element number found "
+		 "in vector shuffle expression");
+	  debug_generic_expr (lhs_type);
+	  debug_generic_expr (rhs1_type);
+	  debug_generic_expr (rhs2_type);
+	  debug_generic_expr (rhs3_type);
+	  return true;
+	}
+
+      if (TREE_CODE (TREE_TYPE (rhs3_type)) != INTEGER_TYPE
+	  || GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (rhs3_type)))
+	     != GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (rhs1_type))))
+	{
+	  error ("invalid mask type in vector shuffle expression");
+	  debug_generic_expr (lhs_type);
+	  debug_generic_expr (rhs1_type);
+	  debug_generic_expr (rhs2_type);
+	  debug_generic_expr (rhs3_type);
+	  return true;
+	}
+
+      return false;
+
     case DOT_PROD_EXPR:
     case REALIGN_LOAD_EXPR:
       /* FIXME.  */
Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 178354)
+++ gcc/passes.c	(working copy)
@@ -1354,7 +1354,6 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_vectorize);
 	    {
 	      struct opt_pass **p = &pass_vectorize.pass.sub;
-	      NEXT_PASS (pass_lower_vector_ssa);
 	      NEXT_PASS (pass_dce_loop);
 	    }
           NEXT_PASS (pass_predcom);
@@ -1366,6 +1365,7 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_lim);
 	  NEXT_PASS (pass_tree_loop_done);
 	}
+      NEXT_PASS (pass_lower_vector_ssa);
       NEXT_PASS (pass_cse_reciprocals);
       NEXT_PASS (pass_reassoc);
       NEXT_PASS (pass_vrp);
Index: gcc/c-parser.c
===================================================================
--- gcc/c-parser.c	(revision 178354)
+++ gcc/c-parser.c	(working copy)
@@ -5989,6 +5989,46 @@ c_parser_alignof_expression (c_parser *p
     }
 }
 
+/* Helper function to read arguments of builtins which are interfaces
+   for the middle-end nodes like COMPLEX_EXPR, VEC_SHUFFLE_EXPR and
+   others.  The name of the builtin is passed using BNAME parameter.
+   Function returns true if there were no errors while parsing and
+   stores the arguments in EXPR_LIST.  List of original types can be
+   obtained by passing non NULL value to ORIG_TYPES.  */
+static bool
+c_parser_get_builtin_args (c_parser *parser, const char *bname,
+			   VEC(tree,gc) **expr_list,
+			   VEC(tree,gc) **orig_types)
+{
+  location_t loc = c_parser_peek_token (parser)->location;
+  *expr_list = NULL;
+
+  if (c_parser_next_token_is_not (parser, CPP_OPEN_PAREN))
+    {
+      error_at (loc, "cannot take address of %qs", bname);
+      return false;
+    }
+
+  c_parser_consume_token (parser);
+
+  if (c_parser_next_token_is (parser, CPP_CLOSE_PAREN))
+    {
+      c_parser_consume_token (parser);
+      return true;
+    }
+    
+  if (orig_types)
+    *expr_list = c_parser_expr_list (parser, false, false, orig_types);
+  else
+    *expr_list = c_parser_expr_list (parser, false, false, NULL);
+
+  if (!c_parser_require (parser, CPP_CLOSE_PAREN, "expected %<)%>"))
+    return false;
+
+  return true;
+}
+
+
 /* Parse a postfix expression (C90 6.3.1-6.3.2, C99 6.5.1-6.5.2).
 
    postfix-expression:
@@ -6027,6 +6067,10 @@ c_parser_alignof_expression (c_parser *p
 			     assignment-expression )
      __builtin_types_compatible_p ( type-name , type-name )
      __builtin_complex ( assignment-expression , assignment-expression )
+     __builtin_shuffle ( assignment-expression , assignment-expression )
+     __builtin_shuffle ( assignment-expression , 
+			 assignment-expression ,
+			 assignment-expression, )
 
    offsetof-member-designator:
      identifier
@@ -6047,7 +6091,7 @@ c_parser_alignof_expression (c_parser *p
 static struct c_expr
 c_parser_postfix_expression (c_parser *parser)
 {
-  struct c_expr expr, e1, e2, e3;
+  struct c_expr expr, e1;
   struct c_type_name *t1, *t2;
   location_t loc = c_parser_peek_token (parser)->location;;
   expr.original_code = ERROR_MARK;
@@ -6333,45 +6377,55 @@ c_parser_postfix_expression (c_parser *p
 	  }
 	  break;
 	case RID_CHOOSE_EXPR:
-	  c_parser_consume_token (parser);
-	  if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
-	    {
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  loc = c_parser_peek_token (parser)->location;
-	  e1 = c_parser_expr_no_commas (parser, NULL);
-	  if (!c_parser_require (parser, CPP_COMMA, "expected %<,%>"))
-	    {
-	      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  e2 = c_parser_expr_no_commas (parser, NULL);
-	  if (!c_parser_require (parser, CPP_COMMA, "expected %<,%>"))
-	    {
-	      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  e3 = c_parser_expr_no_commas (parser, NULL);
-	  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
-				     "expected %<)%>");
 	  {
-	    tree c;
+	    VEC(tree,gc) *expr_list;
+	    VEC(tree,gc) *orig_types;
+	    tree e1value, e2value, e3value, c;
 
-	    c = e1.value;
-	    mark_exp_read (e2.value);
-	    mark_exp_read (e3.value);
+	    c_parser_consume_token (parser);
+	    if (!c_parser_get_builtin_args (parser,
+					    "__builtin_choose_expr",
+					    &expr_list, &orig_types))
+	      {
+		expr.value = error_mark_node;
+		break;
+	      }
+
+	    if (VEC_length (tree, expr_list) != 3)
+	      {
+		error_at (loc, "wrong number of arguments to "
+			       "%<__builtin_choose_expr%>");
+		expr.value = error_mark_node;
+		break;
+	      }
+	    
+	    e1value = VEC_index (tree, expr_list, 0);
+	    e2value = VEC_index (tree, expr_list, 1);
+	    e3value = VEC_index (tree, expr_list, 2);
+
+	    c = e1value;
+	    mark_exp_read (e2value);
+	    mark_exp_read (e3value);
 	    if (TREE_CODE (c) != INTEGER_CST
 		|| !INTEGRAL_TYPE_P (TREE_TYPE (c)))
 	      error_at (loc,
 			"first argument to %<__builtin_choose_expr%> not"
 			" a constant");
 	    constant_expression_warning (c);
-	    expr = integer_zerop (c) ? e3 : e2;
+	    
+	    if (integer_zerop (c))
+	      {
+		expr.value = e3value;
+		expr.original_type = VEC_index (tree, orig_types, 2);
+	      }
+	    else
+	      {
+		expr.value = e2value;
+		expr.original_type = VEC_index (tree, orig_types, 1);
+	      }
+
+	    break;
 	  }
-	  break;
 	case RID_TYPES_COMPATIBLE_P:
 	  c_parser_consume_token (parser);
 	  if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
@@ -6410,57 +6464,96 @@ c_parser_postfix_expression (c_parser *p
 	  }
 	  break;
 	case RID_BUILTIN_COMPLEX:
-	  c_parser_consume_token (parser);
-	  if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
-	    {
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  loc = c_parser_peek_token (parser)->location;
-	  e1 = c_parser_expr_no_commas (parser, NULL);
-	  if (!c_parser_require (parser, CPP_COMMA, "expected %<,%>"))
-	    {
-	      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  e2 = c_parser_expr_no_commas (parser, NULL);
-	  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
-				     "expected %<)%>");
-	  mark_exp_read (e1.value);
-	  if (TREE_CODE (e1.value) == EXCESS_PRECISION_EXPR)
-	    e1.value = convert (TREE_TYPE (e1.value),
-				TREE_OPERAND (e1.value, 0));
-	  mark_exp_read (e2.value);
-	  if (TREE_CODE (e2.value) == EXCESS_PRECISION_EXPR)
-	    e2.value = convert (TREE_TYPE (e2.value),
-				TREE_OPERAND (e2.value, 0));
-	  if (!SCALAR_FLOAT_TYPE_P (TREE_TYPE (e1.value))
-	      || DECIMAL_FLOAT_TYPE_P (TREE_TYPE (e1.value))
-	      || !SCALAR_FLOAT_TYPE_P (TREE_TYPE (e2.value))
-	      || DECIMAL_FLOAT_TYPE_P (TREE_TYPE (e2.value)))
-	    {
-	      error_at (loc, "%<__builtin_complex%> operand "
-			"not of real binary floating-point type");
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  if (TYPE_MAIN_VARIANT (TREE_TYPE (e1.value))
-	      != TYPE_MAIN_VARIANT (TREE_TYPE (e2.value)))
-	    {
-	      error_at (loc,
-			"%<__builtin_complex%> operands of different types");
-	      expr.value = error_mark_node;
-	      break;
-	    }
-	  if (!flag_isoc99)
-	    pedwarn (loc, OPT_pedantic,
-		     "ISO C90 does not support complex types");
-	  expr.value = build2 (COMPLEX_EXPR,
-			       build_complex_type (TYPE_MAIN_VARIANT
-						   (TREE_TYPE (e1.value))),
-			       e1.value, e2.value);
-	  break;
+	  { 
+	    VEC(tree,gc) *expr_list;
+	    tree e1value, e2value;
+	    
+	    c_parser_consume_token (parser);
+	    if (!c_parser_get_builtin_args (parser,
+					    "__builtin_complex",
+					    &expr_list, NULL))
+	      {
+		expr.value = error_mark_node;
+		break;
+	      }
+
+	    if (VEC_length (tree, expr_list) != 2)
+	      {
+		error_at (loc, "wrong number of arguments to "
+			       "%<__builtin_complex%>");
+		expr.value = error_mark_node;
+		break;
+	      }
+	    
+	    e1value = VEC_index (tree, expr_list, 0);
+	    e2value = VEC_index (tree, expr_list, 1);
+
+	    mark_exp_read (e1value);
+	    if (TREE_CODE (e1value) == EXCESS_PRECISION_EXPR)
+	      e1value = convert (TREE_TYPE (e1value),
+				 TREE_OPERAND (e1value, 0));
+	    mark_exp_read (e2value);
+	    if (TREE_CODE (e2value) == EXCESS_PRECISION_EXPR)
+	      e2value = convert (TREE_TYPE (e2value),
+				 TREE_OPERAND (e2value, 0));
+	    if (!SCALAR_FLOAT_TYPE_P (TREE_TYPE (e1value))
+		|| DECIMAL_FLOAT_TYPE_P (TREE_TYPE (e1value))
+		|| !SCALAR_FLOAT_TYPE_P (TREE_TYPE (e2value))
+		|| DECIMAL_FLOAT_TYPE_P (TREE_TYPE (e2value)))
+	      {
+		error_at (loc, "%<__builtin_complex%> operand "
+			  "not of real binary floating-point type");
+		expr.value = error_mark_node;
+		break;
+	      }
+	    if (TYPE_MAIN_VARIANT (TREE_TYPE (e1value))
+		!= TYPE_MAIN_VARIANT (TREE_TYPE (e2value)))
+	      {
+		error_at (loc,
+			  "%<__builtin_complex%> operands of different types");
+		expr.value = error_mark_node;
+		break;
+	      }
+	    if (!flag_isoc99)
+	      pedwarn (loc, OPT_pedantic,
+		       "ISO C90 does not support complex types");
+	    expr.value = build2 (COMPLEX_EXPR,
+				 build_complex_type (TYPE_MAIN_VARIANT
+						     (TREE_TYPE (e1value))),
+				 e1value, e2value);
+	    break;
+	  }
+	case RID_BUILTIN_SHUFFLE:
+	  {
+	    VEC(tree,gc) *expr_list;
+	    
+	    c_parser_consume_token (parser);
+	    if (!c_parser_get_builtin_args (parser,
+					    "__builtin_shuffle",
+					    &expr_list, NULL))
+	      {
+		expr.value = error_mark_node;
+		break;
+	      }
+
+	    if (VEC_length (tree, expr_list) == 2)
+	      expr.value = c_build_vec_shuffle_expr
+				(loc, VEC_index (tree, expr_list, 0),
+				 NULL_TREE,
+				 VEC_index (tree, expr_list, 1));
+	    else if (VEC_length (tree, expr_list) == 3)
+	      expr.value = c_build_vec_shuffle_expr
+				(loc, VEC_index (tree, expr_list, 0),
+				 VEC_index (tree, expr_list, 1),
+				 VEC_index (tree, expr_list, 2));
+	    else
+	      {
+		error_at (loc, "wrong number of arguments to "
+			       "%<__builtin_shuffle%>");
+		expr.value = error_mark_node;
+	      }
+	    break;
+	  }
 	case RID_AT_SELECTOR:
 	  gcc_assert (c_dialect_objc ());
 	  c_parser_consume_token (parser);
Index: gcc/config/i386/sse.md
===================================================================
--- gcc/config/i386/sse.md	(revision 178354)
+++ gcc/config/i386/sse.md	(working copy)
@@ -231,6 +231,12 @@ (define_mode_attr sseinsnmode
    (V4SF "V4SF") (V2DF "V2DF")
    (TI "TI") (V32QI "OI") (V16HI "OI") (V8SI "OI") (V4DI "OI")])
 
+;; All 128bit vector modes
+(define_mode_attr sseshuffint
+  [(V16QI "V16QI") (V8HI "V8HI")
+   (V4SI "V4SI")  (V2DI "V2DI")
+   (V4SF "V4SI") (V2DF "V2DI")])
+
 ;; Mapping of vector float modes to an integer mode of the same size
 (define_mode_attr sseintvecmode
   [(V8SF "V8SI") (V4DF "V4DI")
@@ -6234,6 +6240,19 @@ (define_expand "vconduv2di"
   DONE;
 })
 
+(define_expand "vshuffle<mode>"
+  [(match_operand:V_128 0 "register_operand" "")
+   (match_operand:V_128 1 "register_operand" "")
+   (match_operand:V_128 2 "register_operand" "")
+   (match_operand:<sseshuffint> 3 "register_operand" "")]
+  "TARGET_SSSE3 || TARGET_AVX"
+{
+  bool ok = ix86_expand_vshuffle (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel bitwise logical operations
Index: gcc/config/i386/i386-protos.h
===================================================================
--- gcc/config/i386/i386-protos.h	(revision 178354)
+++ gcc/config/i386/i386-protos.h	(working copy)
@@ -118,6 +118,7 @@ extern bool ix86_expand_int_movcc (rtx[]
 extern bool ix86_expand_fp_movcc (rtx[]);
 extern bool ix86_expand_fp_vcond (rtx[]);
 extern bool ix86_expand_int_vcond (rtx[]);
+extern bool ix86_expand_vshuffle (rtx[]);
 extern void ix86_expand_sse_unpack (rtx[], bool, bool);
 extern bool ix86_expand_int_addcc (rtx[]);
 extern rtx ix86_expand_call (rtx, rtx, rtx, rtx, rtx, bool);
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 178354)
+++ gcc/config/i386/i386.c	(working copy)
@@ -18693,6 +18693,147 @@ ix86_expand_int_vcond (rtx operands[])
   return true;
 }
 
+bool
+ix86_expand_vshuffle (rtx operands[])
+{
+  rtx target = operands[0];
+  rtx op0 = operands[1];
+  rtx op1 = operands[2];
+  rtx mask = operands[3];
+  rtx new_mask, vt, t1, t2, w_vector;
+  enum machine_mode mode = GET_MODE (op0);
+  enum machine_mode maskmode = GET_MODE (mask);
+  enum machine_mode maskinner = GET_MODE_INNER (mode);
+  rtx vec[16];
+  int w, i, j;
+  bool one_operand_shuffle = op0 == op1;
+
+  gcc_assert ((TARGET_SSSE3 || TARGET_AVX) && GET_MODE_BITSIZE (mode) == 128);
+
+  /* Number of elements in the vector.  */
+  w = GET_MODE_BITSIZE (maskmode) / GET_MODE_BITSIZE (maskinner);
+
+  /* generate w_vector = {w, w, ...}  */
+  for (i = 0; i < w; i++)
+    vec[i] = GEN_INT (w);
+  w_vector = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (w, vec));
+
+  /* mask = mask & {w-1, w-1, w-1,...} */
+  for (i = 0; i < w; i++)
+    vec[i] = GEN_INT (w - 1);
+
+  vt = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (w, vec));
+  new_mask = expand_simple_binop (maskmode, AND, mask, vt,
+				  NULL_RTX, 0, OPTAB_DIRECT);
+
+  /* If the original vector mode is V16QImode, we can just
+     use pshufb directly.  */
+  if (mode == V16QImode && one_operand_shuffle)
+    {
+      t1 = gen_reg_rtx (V16QImode);
+      emit_insn (gen_ssse3_pshufbv16qi3 (t1, op0, new_mask));
+      emit_insn (gen_rtx_SET (VOIDmode, target, t1));
+      return true;
+    }
+  else if (mode == V16QImode)
+    {
+      rtx xops[6];
+
+      t1 = gen_reg_rtx (V16QImode);
+      t2 = gen_reg_rtx (V16QImode);
+      emit_insn (gen_ssse3_pshufbv16qi3 (t1, op0, new_mask));
+      emit_insn (gen_ssse3_pshufbv16qi3 (t2, op1, new_mask));
+
+      /* mask = mask & {w, w, ...}  */
+      mask = expand_simple_binop (V16QImode, AND, mask, w_vector,
+				  NULL_RTX, 0, OPTAB_DIRECT);
+      xops[0] = target;
+      xops[1] = operands[1];
+      xops[2] = operands[2];
+      xops[3] = gen_rtx_EQ (mode, mask, w_vector);
+      xops[4] = t1;
+      xops[5] = t2;
+
+      return ix86_expand_int_vcond (xops);
+    }
+
+  /* mask = mask * {w, w, ...}  */
+  new_mask = expand_simple_binop (maskmode, MULT, new_mask, w_vector,
+				  NULL_RTX, 0, OPTAB_DIRECT);
+
+  /* Convert mask to vector of chars.  */
+  new_mask = simplify_gen_subreg (V16QImode, new_mask, maskmode, 0);
+  new_mask = force_reg (V16QImode, new_mask);
+
+  /* Build a helper mask wich we will use in pshufb
+     (v4si) --> {0,0,0,0, 4,4,4,4, 8,8,8,8, 12,12,12,12}
+     (v8hi) --> {0,0, 2,2, 4,4, 6,6, ...}
+     ...  */
+  for (i = 0; i < w; i++)
+    for (j = 0; j < 16/w; j++)
+      vec[i*w+j] = GEN_INT (i*16/w);
+  vt = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
+  vt = force_reg (V16QImode, vt);
+
+  t1 = gen_reg_rtx (V16QImode);
+  emit_insn (gen_ssse3_pshufbv16qi3 (t1, new_mask, vt));
+  new_mask = t1;
+
+  /* Convert it into the byte positions by doing
+     new_mask = new_mask + {0,1,..,16/w, 0,1,..,16/w, ...}  */
+  for (i = 0; i < w; i++)
+    for (j = 0; j < 16/w; j++)
+      vec[i*w+j] = GEN_INT (j);
+
+  vt = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
+  new_mask = expand_simple_binop (V16QImode, PLUS, new_mask, vt,
+				  NULL_RTX, 0, OPTAB_DIRECT);
+
+  t1 = gen_reg_rtx (V16QImode);
+
+  /* Convert OP0 to vector of chars.  */
+  op0 = simplify_gen_subreg (V16QImode, op0, mode, 0);
+  op0 = force_reg (V16QImode, op0);
+  emit_insn (gen_ssse3_pshufbv16qi3 (t1, op0, new_mask));
+
+  if (one_operand_shuffle)
+    {
+      /* Convert it back from vector of chars to the original mode.  */
+      t1 = simplify_gen_subreg (mode, t1, V16QImode, 0);
+      emit_insn (gen_rtx_SET (VOIDmode, target, t1));
+      return true;
+    }
+  else
+    {
+      rtx xops[6];
+
+      t2 = gen_reg_rtx (V16QImode);
+
+      /* Convert OP1 to vector of chars.  */
+      op1 = simplify_gen_subreg (V16QImode, op1, mode, 0);
+      op1 = force_reg (V16QImode, op1);
+      emit_insn (gen_ssse3_pshufbv16qi3 (t1, op1, new_mask));
+
+      /* mask = mask & {w, w, ...}  */
+      mask = expand_simple_binop (V16QImode, AND, mask, w_vector,
+				  NULL_RTX, 0, OPTAB_DIRECT);
+
+      t1 = simplify_gen_subreg (mode, t1, V16QImode, 0);
+      t2 = simplify_gen_subreg (mode, t2, V16QImode, 0);
+
+      xops[0] = target;
+      xops[1] = operands[1];
+      xops[2] = operands[2];
+      xops[3] = gen_rtx_EQ (mode, mask, w_vector);
+      xops[4] = t1;
+      xops[5] = t2;
+
+      return ix86_expand_int_vcond (xops);
+    }
+
+  return false;
+}
+
 /* Unpack OP[1] into the next wider integer vector type.  UNSIGNED_P is
    true if we should do zero extension, else sign extension.  HIGH_P is
    true if we want the N/2 high elements, else the low elements.  */
@@ -30911,6 +31052,9 @@ struct expand_vec_perm_d
 
 static bool expand_vec_perm_1 (struct expand_vec_perm_d *d);
 static bool expand_vec_perm_broadcast_1 (struct expand_vec_perm_d *d);
+static int extract_vec_perm_cst (struct expand_vec_perm_d *, tree);
+static bool ix86_vectorize_builtin_vec_perm_ok (tree vec_type, tree mask);
+
 
 /* Get a vector mode of the same size as the original but with elements
    twice as wide.  This is only guaranteed to apply to integral vectors.  */
@@ -32417,7 +32561,7 @@ void ix86_emit_i387_round (rtx op0, rtx
   res = gen_reg_rtx (outmode);
 
   half = CONST_DOUBLE_FROM_REAL_VALUE (dconsthalf, inmode);
-  
+
   /* round(a) = sgn(a) * floor(fabs(a) + 0.5) */
 
   /* scratch = fxam(op1) */
@@ -34576,10 +34720,10 @@ ix86_vectorize_builtin_vec_perm_ok (tree
 
   vec_mask = extract_vec_perm_cst (&d, mask);
 
-  /* This hook is cannot be called in response to something that the
-     user does (unlike the builtin expander) so we shouldn't ever see
-     an error generated from the extract.  */
-  gcc_assert (vec_mask > 0 && vec_mask <= 3);
+  /* Check whether the mask can be applied to the vector type.  */
+  if (vec_mask < 0 || vec_mask > 3)
+    return false;
+
   one_vec = (vec_mask != 3);
 
   /* Implementable with shufps or pshufd.  */
Index: gcc/tree-ssa-operands.c
===================================================================
--- gcc/tree-ssa-operands.c	(revision 178354)
+++ gcc/tree-ssa-operands.c	(working copy)
@@ -943,6 +943,7 @@ get_expr_operands (gimple stmt, tree *ex
 
     case COND_EXPR:
     case VEC_COND_EXPR:
+    case VEC_SHUFFLE_EXPR:
       get_expr_operands (stmt, &TREE_OPERAND (expr, 0), uflags);
       get_expr_operands (stmt, &TREE_OPERAND (expr, 1), uflags);
       get_expr_operands (stmt, &TREE_OPERAND (expr, 2), uflags);

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-09-30 20:51                                                 ` Artem Shinkarov
@ 2011-09-30 23:22                                                   ` Richard Henderson
       [not found]                                                     ` <CABYV9SUt+mFr3XQLHnzJevBmovkop92tSRDnR9j4U7bOuDWuew@mail.gmail.com>
  2011-10-04  2:26                                                   ` Hans-Peter Nilsson
  1 sibling, 1 reply; 71+ messages in thread
From: Richard Henderson @ 2011-09-30 23:22 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Joseph S. Myers, Richard Guenther, Duncan Sands, gcc-patches

On 09/30/2011 12:14 PM, Artem Shinkarov wrote:
> Ok, in the attachment there is a patch which fixes mentioned errors.

The changes are ok.  I would have committed it for you, only the patch
isn't against mainline.  There are 4 rejects.


r~

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
       [not found]                                                     ` <CABYV9SUt+mFr3XQLHnzJevBmovkop92tSRDnR9j4U7bOuDWuew@mail.gmail.com>
@ 2011-10-03 12:15                                                       ` Artem Shinkarov
  2011-10-03 15:13                                                         ` Richard Henderson
  2011-10-03 22:48                                                       ` H.J. Lu
  1 sibling, 1 reply; 71+ messages in thread
From: Artem Shinkarov @ 2011-10-03 12:15 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Joseph S. Myers, Richard Guenther, Duncan Sands, gcc-patches

Hi, can anyone commit it please?

Richard?
Or may be Richard?


Thanks,
Artem.



On Sat, Oct 1, 2011 at 12:21 AM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> Sorry for that, the vector comparison was submitted earlier. In the
> attachment there is a new version of the patch against the latest
> checkout.
>
> Richard, can you have a look at the genopinit.c, I am using
> set_direct_optab_handler, is it correct?
>
> All the rest seems to be the same.
>
>
> Thanks,
> Artem.
>
>
> On Fri, Sep 30, 2011 at 10:24 PM, Richard Henderson <rth@redhat.com> wrote:
>> On 09/30/2011 12:14 PM, Artem Shinkarov wrote:
>>> Ok, in the attachment there is a patch which fixes mentioned errors.
>>
>> The changes are ok.  I would have committed it for you, only the patch
>> isn't against mainline.  There are 4 rejects.
>>
>>
>> r~
>>
>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-10-03 12:15                                                       ` Artem Shinkarov
@ 2011-10-03 15:13                                                         ` Richard Henderson
  2011-10-03 16:44                                                           ` Artem Shinkarov
  0 siblings, 1 reply; 71+ messages in thread
From: Richard Henderson @ 2011-10-03 15:13 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Joseph S. Myers, Richard Guenther, Duncan Sands, gcc-patches

On 10/03/2011 05:14 AM, Artem Shinkarov wrote:
> Hi, can anyone commit it please?
> 
> Richard?
> Or may be Richard?

Committed.


r~

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-10-03 15:13                                                         ` Richard Henderson
@ 2011-10-03 16:44                                                           ` Artem Shinkarov
  2011-10-03 17:12                                                             ` Richard Henderson
  2011-10-06 10:55                                                             ` Georg-Johann Lay
  0 siblings, 2 replies; 71+ messages in thread
From: Artem Shinkarov @ 2011-10-03 16:44 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Joseph S. Myers, Richard Guenther, Duncan Sands, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 713 bytes --]

Hi, Richard

There is a problem with the testcases of the patch you have committed
for me. The code in every test-case is doubled. Could you please,
apply the following patch, otherwise it would fail all the tests from
the vector-shuffle-patch would fail.

Also, if it is possible, could you change my name from in the
ChangeLog from "Artem Shinkarov" to "Artjoms Sinkarovs". The last
version is the way I am spelled in the passport, and the name I use in
the ChangeLog.



Thanks,
Artem.


On Mon, Oct 3, 2011 at 4:13 PM, Richard Henderson <rth@redhat.com> wrote:
> On 10/03/2011 05:14 AM, Artem Shinkarov wrote:
>> Hi, can anyone commit it please?
>>
>> Richard?
>> Or may be Richard?
>
> Committed.
>
>
> r~
>

[-- Attachment #2: double-test-cases.diff --]
[-- Type: text/plain, Size: 10766 bytes --]

Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c	(revision 179464)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c	(working copy)
@@ -17,55 +17,9 @@ int main (int argc, char *argv[]) {
     /*vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
     vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
     vector (8, short) v2;
-   
-    vector (8, short) smask = {0,0,1,2,3,4,5,6};
-    
-    v2 = __builtin_shuffle (v0,  smask);
-    shufcompare (short, 8, v2, v0, smask);
-    v2 = __builtin_shuffle (v0, v1);
-    shufcompare (short, 8, v2, v0, v1);
-    v2 = __builtin_shuffle (smask, v0);
-    shufcompare (short, 8, v2, smask, v0);*/
-
-    vector (4, int) i0 = {argc, 1,2,3};
-    vector (4, int) i1 = {argc, 1, argc, 3};
-    vector (4, int) i2;
-
-    vector (4, int) imask = {0,3,2,1};
-
-    /*i2 = __builtin_shuffle (i0, imask);
-    shufcompare (int, 4, i2, i0, imask);*/
-    i2 = __builtin_shuffle (i0, i1);
-    shufcompare (int, 4, i2, i0, i1);
-    
-    i2 = __builtin_shuffle (imask, i0);
-    shufcompare (int, 4, i2, imask, i0);
-    
-    return 0;
-}
-
-#define vector(elcount, type)  \
-__attribute__((vector_size((elcount)*sizeof(type)))) type
-
-#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
 
-#define shufcompare(type, count, vres, v0, mask) \
-do { \
-    int __i; \
-    for (__i = 0; __i < count; __i++) { \
-        if (vidx(type, vres, __i) != vidx(type, v0, vidx(type, mask, __i))) \
-            __builtin_abort (); \
-    } \
-} while (0)
-
-
-int main (int argc, char *argv[]) {
-    /*vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
-    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
-    vector (8, short) v2;
-   
     vector (8, short) smask = {0,0,1,2,3,4,5,6};
-    
+
     v2 = __builtin_shuffle (v0,  smask);
     shufcompare (short, 8, v2, v0, smask);
     v2 = __builtin_shuffle (v0, v1);
@@ -83,10 +37,10 @@ int main (int argc, char *argv[]) {
     shufcompare (int, 4, i2, i0, imask);*/
     i2 = __builtin_shuffle (i0, i1);
     shufcompare (int, 4, i2, i0, i1);
-    
+
     i2 = __builtin_shuffle (imask, i0);
     shufcompare (int, 4, i2, imask, i0);
-    
+
     return 0;
 }
 
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c	(revision 179464)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c	(working copy)
@@ -42,47 +42,3 @@ int main (int argc, char *argv[]) {
     return 0;
 }
 
-#define vector(elcount, type)  \
-__attribute__((vector_size((elcount)*sizeof(type)))) type
-
-#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
-
-#define shuf2compare(type, count, vres, v0, v1, mask) \
-do { \
-    int __i; \
-    for (__i = 0; __i < count; __i++) { \
-        if (vidx(type, vres, __i) != ((vidx(type, mask, __i) < count) ? \
-                          vidx(type, v0, vidx(type, mask, __i)) :  \
-                          vidx(type, v1, (vidx(type, mask, __i) - count)))) \
-            __builtin_abort (); \
-        } \
-} while (0)
-
-
-int main (int argc, char *argv[]) {
-    vector (8, short) v0 = {5, 5,5,5,5,5,argc,7};
-    vector (8, short) v1 = {argc, 1,8,8,4,9,argc,4};
-    vector (8, short) v2;
-
-    //vector (8, short) mask = {1,2,5,4,3,6,7};
-
-    vector (8, short) mask0 = {0,2,3,1,4,5,6,7};
-    vector (8, short) mask1 = {0,12,3,4,3,0,10,9};
-
-    vector (8, short) mask2 = {0,8,1,9,2,10,3,11};
-
-    v2 = __builtin_shuffle (v0, v1,  mask0);
-    shuf2compare (short, 8, v2, v0, v1, mask0);
-
-    v2 = __builtin_shuffle (v0, v1,  mask1);
-    shuf2compare (short, 8, v2, v0, v1, mask1);
-
-    v2 = __builtin_shuffle (v0, v1,  mask2);
-    shuf2compare (short, 8, v2, v0, v1, mask2);
-
-    v2 = __builtin_shuffle (mask0, mask0,  v0);
-    shuf2compare (short, 8, v2, mask0, mask0, v0);
-
-    return 0;
-}
-
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c	(revision 179464)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c	(working copy)
@@ -24,43 +24,7 @@ int main (int argc, char *argv[]) {
     vector (8, short) v2;
 
     vector (8, short) mask = {0,0,1,2,3,4,5,6};
-    
-    v2 = f (v0,  mask);
-    shufcompare (short, 8, v2, v0, mask);
-
-    v2 = f (v0, v1);
-    shufcompare (short, 8, v2, v0, v1);
-
-    return 0;
-}
-
-#define vector(elcount, type)  \
-__attribute__((vector_size((elcount)*sizeof(type)))) type
-
-#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
 
-#define shufcompare(type, count, vres, v0, mask) \
-do { \
-    int __i; \
-    for (__i = 0; __i < count; __i++) { \
-        if (vidx(type, vres, __i) != vidx(type, v0, vidx(type, mask, __i))) \
-            __builtin_abort (); \
-    } \
-} while (0)
-
-vector (8, short) __attribute__ ((noinline))
-f (vector (8, short) x, vector (8, short) mask) {
-    return __builtin_shuffle (x, mask);
-}
-
-
-int main (int argc, char *argv[]) {
-    vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
-    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
-    vector (8, short) v2;
-
-    vector (8, short) mask = {0,0,1,2,3,4,5,6};
-    
     v2 = f (v0,  mask);
     shufcompare (short, 8, v2, v0, mask);
 
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c	(revision 179464)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c	(working copy)
@@ -28,64 +28,14 @@ int main (int argc, char *argv[]) {
     vector (8, short) v2;
 
     //vector (8, short) mask = {1,2,5,4,3,6,7};
-    
-    vector (8, short) mask0 = {0,2,3,1,4,5,6,7};
-    vector (8, short) mask1 = {0,12,3,4,3,0,10,9};
-    vector (8, short) mask2 = {0,8,1,9,2,10,3,11};
-
-    v2 = f (v0, v1,  mask0);
-    shuf2compare (short, 8, v2, v0, v1, mask0);
- 
-    v2 = f (v0, v1,  mask1);
-    shuf2compare (short, 8, v2, v0, v1, mask1);
-
-    v2 = f (v0, v1,  mask2);
-    shuf2compare (short, 8, v2, v0, v1, mask2);
-
-    v2 = f (mask0, mask0,  v0);
-    shuf2compare (short, 8, v2, mask0, mask0, v0);
-
-    return 0; 
-}
-
-#define vector(elcount, type)  \
-__attribute__((vector_size((elcount)*sizeof(type)))) type
-
-#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
-
-#define shuf2compare(type, count, vres, v0, v1, mask) \
-do { \
-    int __i; \
-    for (__i = 0; __i < count; __i++) { \
-        if (vidx(type, vres, __i) != ((vidx(type, mask, __i) < count) ? \
-                          vidx(type, v0, vidx(type, mask, __i)) :  \
-                          vidx(type, v1, (vidx(type, mask, __i) - count)))) \
-            __builtin_abort (); \
-        } \
-} while (0)
-
-
-vector (8, short) __attribute__ ((noinline))
-f (vector (8, short) x, vector (8, short) y, vector (8, short) mask) {
-    return __builtin_shuffle (x, y, mask);
-}
 
-
-
-int main (int argc, char *argv[]) {
-    vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
-    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
-    vector (8, short) v2;
-
-    //vector (8, short) mask = {1,2,5,4,3,6,7};
-    
     vector (8, short) mask0 = {0,2,3,1,4,5,6,7};
     vector (8, short) mask1 = {0,12,3,4,3,0,10,9};
     vector (8, short) mask2 = {0,8,1,9,2,10,3,11};
 
     v2 = f (v0, v1,  mask0);
     shuf2compare (short, 8, v2, v0, v1, mask0);
- 
+
     v2 = f (v0, v1,  mask1);
     shuf2compare (short, 8, v2, v0, v1, mask1);
 
@@ -95,6 +45,6 @@ int main (int argc, char *argv[]) {
     v2 = f (mask0, mask0,  v0);
     shuf2compare (short, 8, v2, mask0, mask0, v0);
 
-    return 0; 
+    return 0;
 }
 
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c	(revision 179464)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c	(working copy)
@@ -30,7 +30,7 @@ do { \
   test_compat_mask (res, c ## vec, mask); \
   test_compat_mask (res, r ## vec, mask); \
   test_compat_mask (res, d ## vec, mask); \
-  test_compat_mask (res, dc ## vec, mask); 
+  test_compat_mask (res, dc ## vec, mask);
 
 #define test_compat(res, vec, mask) \
   test_compat_vec (res, vec, mask); \
@@ -46,72 +46,8 @@ int main (int argc, char *argv[]) {
     register vector (4, int) rvec = {argc, 1,2,3};
     v4si dvec = {argc, 1,2,3};
     v4sicst dcvec = {argc, 1,2,3};
-    
-    vector (4, int) res; 
-    v4si dres;
-    register vector (4, int) rres;
-
-    vector (4, int) mask = {0,3,2,1};
-    const vector (4, int) cmask = {0,3,2,1};
-    register vector (4, int) rmask = {0,3,2,1};
-    v4si dmask = {0,3,2,1};
-    v4sicst dcmask = {0,3,2,1};
-
-    test_compat (res, vec, mask);
-
-    return 0;
-}
-
-/* Test that different type variants are compatible within
-   vector shuffling.  */
 
-#define vector(elcount, type)  \
-__attribute__((vector_size((elcount)*sizeof(type)))) type
-
-#define shufcompare(count, vres, v0, mask) \
-do { \
-    int __i; \
-    for (__i = 0; __i < count; __i++) { \
-        if (vres[__i] != v0[mask[__i]]) \
-            __builtin_abort (); \
-    } \
-} while (0)
-
-#define test_compat_mask(res, vec, mask) \
-  res = __builtin_shuffle (vec, mask); \
-  shufcompare(4, res, vec, mask); \
-  res = __builtin_shuffle (vec, c ## mask); \
-  shufcompare(4, res, vec, c ##  mask); \
-  res = __builtin_shuffle (vec, r ## mask); \
-  shufcompare(4, res, vec, r ##  mask); \
-  res = __builtin_shuffle (vec, d ## mask); \
-  shufcompare(4, res, vec, d ##  mask); \
-  res = __builtin_shuffle (vec, dc ## mask); \
-  shufcompare(4, res, vec, dc ##  mask); \
-
-#define test_compat_vec(res, vec, mask) \
-  test_compat_mask (res, vec, mask); \
-  test_compat_mask (res, c ## vec, mask); \
-  test_compat_mask (res, r ## vec, mask); \
-  test_compat_mask (res, d ## vec, mask); \
-  test_compat_mask (res, dc ## vec, mask); 
-
-#define test_compat(res, vec, mask) \
-  test_compat_vec (res, vec, mask); \
-  test_compat_vec (d ## res, vec, mask); \
-  test_compat_vec (r ## res, vec, mask);
-
-typedef vector (4, int) v4si;
-typedef const vector (4, int) v4sicst;
-
-int main (int argc, char *argv[]) {
-    vector (4, int) vec = {argc, 1,2,3};
-    const vector (4, int) cvec = {argc, 1,2,3};
-    register vector (4, int) rvec = {argc, 1,2,3};
-    v4si dvec = {argc, 1,2,3};
-    v4sicst dcvec = {argc, 1,2,3};
-    
-    vector (4, int) res; 
+    vector (4, int) res;
     v4si dres;
     register vector (4, int) rres;
 
@@ -126,3 +62,4 @@ int main (int argc, char *argv[]) {
     return 0;
 }
 
+

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-10-03 16:44                                                           ` Artem Shinkarov
@ 2011-10-03 17:12                                                             ` Richard Henderson
  2011-10-03 17:21                                                               ` Artem Shinkarov
  2011-10-03 23:05                                                               ` Artem Shinkarov
  2011-10-06 10:55                                                             ` Georg-Johann Lay
  1 sibling, 2 replies; 71+ messages in thread
From: Richard Henderson @ 2011-10-03 17:12 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Joseph S. Myers, Richard Guenther, Duncan Sands, gcc-patches

On 10/03/2011 09:43 AM, Artem Shinkarov wrote:
> Hi, Richard
> 
> There is a problem with the testcases of the patch you have committed
> for me. The code in every test-case is doubled. Could you please,
> apply the following patch, otherwise it would fail all the tests from
> the vector-shuffle-patch would fail.

Huh.  Dunno what happened there.  Fixed.

> Also, if it is possible, could you change my name from in the
> ChangeLog from "Artem Shinkarov" to "Artjoms Sinkarovs". The last
> version is the way I am spelled in the passport, and the name I use in
> the ChangeLog.

Fixed.


r~

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-10-03 17:12                                                             ` Richard Henderson
@ 2011-10-03 17:21                                                               ` Artem Shinkarov
  2011-10-03 23:05                                                               ` Artem Shinkarov
  1 sibling, 0 replies; 71+ messages in thread
From: Artem Shinkarov @ 2011-10-03 17:21 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Joseph S. Myers, Richard Guenther, Duncan Sands, gcc-patches

On Mon, Oct 3, 2011 at 6:12 PM, Richard Henderson <rth@redhat.com> wrote:
> On 10/03/2011 09:43 AM, Artem Shinkarov wrote:
>> Hi, Richard
>>
>> There is a problem with the testcases of the patch you have committed
>> for me. The code in every test-case is doubled. Could you please,
>> apply the following patch, otherwise it would fail all the tests from
>> the vector-shuffle-patch would fail.
>
> Huh.  Dunno what happened there.  Fixed.
>

This is a common pattern, when the patch adds new files and you apply
the same patch to the code-base second time. In that case the content
of the files is doubled. This is an annoying feature of svn. May be
there is a solution to the problem, but I never managed to find one.

>> Also, if it is possible, could you change my name from in the
>> ChangeLog from "Artem Shinkarov" to "Artjoms Sinkarovs". The last
>> version is the way I am spelled in the passport, and the name I use in
>> the ChangeLog.
>
> Fixed.

Thank you very much.


Artem.
>
>
> r~
>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
       [not found]                                                     ` <CABYV9SUt+mFr3XQLHnzJevBmovkop92tSRDnR9j4U7bOuDWuew@mail.gmail.com>
  2011-10-03 12:15                                                       ` Artem Shinkarov
@ 2011-10-03 22:48                                                       ` H.J. Lu
  1 sibling, 0 replies; 71+ messages in thread
From: H.J. Lu @ 2011-10-03 22:48 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Richard Henderson, Joseph S. Myers, Richard Guenther,
	Duncan Sands, gcc-patches

On Fri, Sep 30, 2011 at 4:21 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> Sorry for that, the vector comparison was submitted earlier. In the
> attachment there is a new version of the patch against the latest
> checkout.
>
> Richard, can you have a look at the genopinit.c, I am using
> set_direct_optab_handler, is it correct?
>
> All the rest seems to be the same.

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50607

-- 
H.J.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-10-03 17:12                                                             ` Richard Henderson
  2011-10-03 17:21                                                               ` Artem Shinkarov
@ 2011-10-03 23:05                                                               ` Artem Shinkarov
  2011-10-04 15:21                                                                 ` Artem Shinkarov
  1 sibling, 1 reply; 71+ messages in thread
From: Artem Shinkarov @ 2011-10-03 23:05 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Joseph S. Myers, Richard Guenther, Duncan Sands, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1847 bytes --]

On Mon, Oct 3, 2011 at 6:12 PM, Richard Henderson <rth@redhat.com> wrote:
> On 10/03/2011 09:43 AM, Artem Shinkarov wrote:
>> Hi, Richard
>>
>> There is a problem with the testcases of the patch you have committed
>> for me. The code in every test-case is doubled. Could you please,
>> apply the following patch, otherwise it would fail all the tests from
>> the vector-shuffle-patch would fail.
>
> Huh.  Dunno what happened there.  Fixed.
>
>> Also, if it is possible, could you change my name from in the
>> ChangeLog from "Artem Shinkarov" to "Artjoms Sinkarovs". The last
>> version is the way I am spelled in the passport, and the name I use in
>> the ChangeLog.
>
> Fixed.
>
>
> r~
>

Richard, there was a problem causing segfault in ix86_expand_vshuffle
which I have fixed with the patch attached.

Another thing I cannot figure out is the following case:
#define vector(elcount, type)  \
__attribute__((vector_size((elcount)*sizeof(type)))) type

vector (8, short) __attribute__ ((noinline))
f (vector (8, short) x, vector (8, short) y, vector (8, short) mask) {
    return  __builtin_shuffle (x, y, mask);
}

int main (int argc, char *argv[]) {
    vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
    vector (8, short) mask0 = {0,2,3,1,4,5,6,7};
    vector (8, short) v2;
    int i;

    v2 = f (v0, v1,  mask0);
    /* v2 =  __builtin_shuffle (v0, v1, mask0); */
    for (i = 0; i < 8; i ++)
      __builtin_printf ("%i, ", v2[i]);

    return 0;
}

I am compiling with support of ssse3, in my case it is ./xgcc -B. b.c
-O3 -mtune=core2 -march=core2

And I get 1, 1, 1, 3, 4, 5, 1, 7, on the output, which is wrong.

But if I will call __builtin_shuffle directly, then the answer is correct.

Any ideas?


Thanks,
Artem.

[-- Attachment #2: fix-segfault.diff --]
[-- Type: text/plain, Size: 1932 bytes --]

Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 179464)
+++ gcc/config/i386/i386.c	(working copy)
@@ -19312,14 +19312,17 @@ ix86_expand_vshuffle (rtx operands[])
       xops[1] = operands[1];
       xops[2] = operands[2];
       xops[3] = gen_rtx_EQ (mode, mask, w_vector);
-      xops[4] = t1;
-      xops[5] = t2;
+      xops[4] = t2;
+      xops[5] = t1;
 
       return ix86_expand_int_vcond (xops);
     }
 
-  /* mask = mask * {w, w, ...}  */
-  new_mask = expand_simple_binop (maskmode, MULT, new_mask, w_vector,
+  /* mask = mask * {16/w, 16/w, ...}  */
+  for (i = 0; i < w; i++)
+    vec[i] = GEN_INT (16/w);
+  vt = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (w, vec));
+  new_mask = expand_simple_binop (maskmode, MULT, new_mask, vt,
 				  NULL_RTX, 0, OPTAB_DIRECT);
 
   /* Convert mask to vector of chars.  */
@@ -19332,7 +19335,7 @@ ix86_expand_vshuffle (rtx operands[])
      ...  */
   for (i = 0; i < w; i++)
     for (j = 0; j < 16/w; j++)
-      vec[i*w+j] = GEN_INT (i*16/w);
+      vec[i*(16/w)+j] = GEN_INT (i*16/w);
   vt = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
   vt = force_reg (V16QImode, vt);
 
@@ -19344,7 +19347,7 @@ ix86_expand_vshuffle (rtx operands[])
      new_mask = new_mask + {0,1,..,16/w, 0,1,..,16/w, ...}  */
   for (i = 0; i < w; i++)
     for (j = 0; j < 16/w; j++)
-      vec[i*w+j] = GEN_INT (j);
+      vec[i*(16/w)+j] = GEN_INT (j);
 
   vt = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
   new_mask = expand_simple_binop (V16QImode, PLUS, new_mask, vt,
@@ -19386,8 +19389,8 @@ ix86_expand_vshuffle (rtx operands[])
       xops[1] = operands[1];
       xops[2] = operands[2];
       xops[3] = gen_rtx_EQ (mode, mask, w_vector);
-      xops[4] = t1;
-      xops[5] = t2;
+      xops[4] = t2;
+      xops[5] = t1;
 
       return ix86_expand_int_vcond (xops);
     }

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-09-30 20:51                                                 ` Artem Shinkarov
  2011-09-30 23:22                                                   ` Richard Henderson
@ 2011-10-04  2:26                                                   ` Hans-Peter Nilsson
  1 sibling, 0 replies; 71+ messages in thread
From: Hans-Peter Nilsson @ 2011-10-04  2:26 UTC (permalink / raw)
  To: Artem Shinkarov; +Cc: gcc-patches

On Fri, 30 Sep 2011, Artem Shinkarov wrote:
> 	gcc/doc
> 	* extend.texi: Adjust.

Pretty please document the new pattern names in doc/md.texi as
well.  Thanks in advance.

brgds, H-P

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-10-03 23:05                                                               ` Artem Shinkarov
@ 2011-10-04 15:21                                                                 ` Artem Shinkarov
  2011-10-04 16:43                                                                   ` Richard Henderson
  0 siblings, 1 reply; 71+ messages in thread
From: Artem Shinkarov @ 2011-10-04 15:21 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Joseph S. Myers, Richard Guenther, Duncan Sands, gcc-patches

Ping.

Richard, the patch in the attachment should be submitted asap. The
other problem could wait for a while.

Thanks,
Artem.

On Tue, Oct 4, 2011 at 12:04 AM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> On Mon, Oct 3, 2011 at 6:12 PM, Richard Henderson <rth@redhat.com> wrote:
>> On 10/03/2011 09:43 AM, Artem Shinkarov wrote:
>>> Hi, Richard
>>>
>>> There is a problem with the testcases of the patch you have committed
>>> for me. The code in every test-case is doubled. Could you please,
>>> apply the following patch, otherwise it would fail all the tests from
>>> the vector-shuffle-patch would fail.
>>
>> Huh.  Dunno what happened there.  Fixed.
>>
>>> Also, if it is possible, could you change my name from in the
>>> ChangeLog from "Artem Shinkarov" to "Artjoms Sinkarovs". The last
>>> version is the way I am spelled in the passport, and the name I use in
>>> the ChangeLog.
>>
>> Fixed.
>>
>>
>> r~
>>
>
> Richard, there was a problem causing segfault in ix86_expand_vshuffle
> which I have fixed with the patch attached.
>
> Another thing I cannot figure out is the following case:
> #define vector(elcount, type)  \
> __attribute__((vector_size((elcount)*sizeof(type)))) type
>
> vector (8, short) __attribute__ ((noinline))
> f (vector (8, short) x, vector (8, short) y, vector (8, short) mask) {
>    return  __builtin_shuffle (x, y, mask);
> }
>
> int main (int argc, char *argv[]) {
>    vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
>    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
>    vector (8, short) mask0 = {0,2,3,1,4,5,6,7};
>    vector (8, short) v2;
>    int i;
>
>    v2 = f (v0, v1,  mask0);
>    /* v2 =  __builtin_shuffle (v0, v1, mask0); */
>    for (i = 0; i < 8; i ++)
>      __builtin_printf ("%i, ", v2[i]);
>
>    return 0;
> }
>
> I am compiling with support of ssse3, in my case it is ./xgcc -B. b.c
> -O3 -mtune=core2 -march=core2
>
> And I get 1, 1, 1, 3, 4, 5, 1, 7, on the output, which is wrong.
>
> But if I will call __builtin_shuffle directly, then the answer is correct.
>
> Any ideas?
>
>
> Thanks,
> Artem.
>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-10-04 15:21                                                                 ` Artem Shinkarov
@ 2011-10-04 16:43                                                                   ` Richard Henderson
  0 siblings, 0 replies; 71+ messages in thread
From: Richard Henderson @ 2011-10-04 16:43 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Joseph S. Myers, Richard Guenther, Duncan Sands, gcc-patches

On 10/04/2011 08:18 AM, Artem Shinkarov wrote:
> Ping.
> 
> Richard, the patch in the attachment should be submitted asap. The
> other problem could wait for a while.

The patch in the attachment is wrong too.  I've re-written the x86
backend support, adding TARGET_XOP in the process.  I've also re-written
the test cases so that they actually test what we wanted.

Patch to follow once testing is complete.


r~

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-10-03 16:44                                                           ` Artem Shinkarov
  2011-10-03 17:12                                                             ` Richard Henderson
@ 2011-10-06 10:55                                                             ` Georg-Johann Lay
  2011-10-06 11:28                                                               ` Richard Guenther
  2011-10-06 11:47                                                               ` Jakub Jelinek
  1 sibling, 2 replies; 71+ messages in thread
From: Georg-Johann Lay @ 2011-10-06 10:55 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Richard Henderson, Joseph S. Myers, Richard Guenther,
	Duncan Sands, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1980 bytes --]

Artem Shinkarov schrieb:
> Hi, Richard
> 
> There is a problem with the testcases of the patch you have committed
> for me. The code in every test-case is doubled. Could you please,
> apply the following patch, otherwise it would fail all the tests from
> the vector-shuffle-patch would fail.
> 
> Also, if it is possible, could you change my name from in the
> ChangeLog from "Artem Shinkarov" to "Artjoms Sinkarovs". The last
> version is the way I am spelled in the passport, and the name I use in
> the ChangeLog.
> 
> Thanks,
> Artem.
> 
> On Mon, Oct 3, 2011 at 4:13 PM, Richard Henderson <rth@redhat.com> wrote:
>> On 10/03/2011 05:14 AM, Artem Shinkarov wrote:
>>> Hi, can anyone commit it please?
>>>
>>> Richard?
>>> Or may be Richard?
>> Committed.
>>
>> r~
>>
> Hi, Richard
> 
> There is a problem with the testcases of the patch you have committed
> for me. The code in every test-case is doubled. Could you please,
> apply the following patch, otherwise it would fail all the tests from
> the vector-shuffle-patch would fail.
> 
> Also, if it is possible, could you change my name from in the
> ChangeLog from "Artem Shinkarov" to "Artjoms Sinkarovs". The last
> version is the way I am spelled in the passport, and the name I use in
> the ChangeLog.
> 
> 
> Thanks,
> Artem.
> 

The following test cases cause FAILs because main cannot be found by the linker
 because if __SIZEOF_INT__ != 4 you are trying to compile and run an empty file.

> Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c

> Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c

The following patch avoids __SIZEOF_INT__.

Ok by some maintainer to commit?

Johann

testsuite/
	* lib/target-supports.exp (check_effective_target_int32): New
	function.
	* gcc.c-torture/execute/vect-shuffle-1.c: Don't use
	__SIZEOF_INT__.
	* gcc.c-torture/execute/vect-shuffle-5.c: Ditto.
	* gcc.c-torture/execute/vect-shuffle-1.x: New file.
	* gcc.c-torture/execute/vect-shuffle-5.x: New file.


[-- Attachment #2: vshuffle.diff --]
[-- Type: text/x-patch, Size: 2134 bytes --]

Index: lib/target-supports.exp
===================================================================
--- lib/target-supports.exp	(revision 179599)
+++ lib/target-supports.exp	(working copy)
@@ -1583,6 +1583,15 @@ proc check_effective_target_int16 { } {
     }]
 }
 
+# Returns 1 if we're generating 32-bit integers with the
+# default options, 0 otherwise.
+
+proc check_effective_target_int32 { } {
+    return [check_no_compiler_messages int32 object {
+	int dummy[sizeof (int) == 4 ? 1 : -1];
+    }]
+}
+
 # Return 1 if we're generating 64-bit code using default options, 0
 # otherwise.
 
Index: gcc.c-torture/execute/vect-shuffle-1.c
===================================================================
--- gcc.c-torture/execute/vect-shuffle-1.c	(revision 179599)
+++ gcc.c-torture/execute/vect-shuffle-1.c	(working copy)
@@ -1,4 +1,3 @@
-#if __SIZEOF_INT__ == 4
 typedef unsigned int V __attribute__((vector_size(16), may_alias));
 
 struct S
@@ -64,5 +63,3 @@ int main()
 
   return 0;
 }
-
-#endif /* SIZEOF_INT */
Index: gcc.c-torture/execute/vect-shuffle-1.x
===================================================================
--- gcc.c-torture/execute/vect-shuffle-1.x	(revision 0)
+++ gcc.c-torture/execute/vect-shuffle-1.x	(revision 0)
@@ -0,0 +1,7 @@
+load_lib target-supports.exp
+
+if { [check_effective_target_int32] } {
+	return 0
+}
+
+return 1;
Index: gcc.c-torture/execute/vect-shuffle-5.c
===================================================================
--- gcc.c-torture/execute/vect-shuffle-5.c	(revision 179599)
+++ gcc.c-torture/execute/vect-shuffle-5.c	(working copy)
@@ -1,4 +1,3 @@
-#if __SIZEOF_INT__ == 4
 typedef unsigned int V __attribute__((vector_size(16), may_alias));
 
 struct S
@@ -60,5 +59,3 @@ int main()
 
   return 0;
 }
-
-#endif /* SIZEOF_INT */
Index: gcc.c-torture/execute/vect-shuffle-5.x
===================================================================
--- gcc.c-torture/execute/vect-shuffle-5.x	(revision 0)
+++ gcc.c-torture/execute/vect-shuffle-5.x	(revision 0)
@@ -0,0 +1,7 @@
+load_lib target-supports.exp
+
+if { [check_effective_target_int32] } {
+	return 0
+}
+
+return 1;

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-10-06 10:55                                                             ` Georg-Johann Lay
@ 2011-10-06 11:28                                                               ` Richard Guenther
  2011-10-06 11:38                                                                 ` Georg-Johann Lay
  2011-10-06 11:47                                                               ` Jakub Jelinek
  1 sibling, 1 reply; 71+ messages in thread
From: Richard Guenther @ 2011-10-06 11:28 UTC (permalink / raw)
  To: Georg-Johann Lay
  Cc: Artem Shinkarov, Richard Henderson, Joseph S. Myers,
	Duncan Sands, gcc-patches

On Thu, Oct 6, 2011 at 12:51 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
> Artem Shinkarov schrieb:
>> Hi, Richard
>>
>> There is a problem with the testcases of the patch you have committed
>> for me. The code in every test-case is doubled. Could you please,
>> apply the following patch, otherwise it would fail all the tests from
>> the vector-shuffle-patch would fail.
>>
>> Also, if it is possible, could you change my name from in the
>> ChangeLog from "Artem Shinkarov" to "Artjoms Sinkarovs". The last
>> version is the way I am spelled in the passport, and the name I use in
>> the ChangeLog.
>>
>> Thanks,
>> Artem.
>>
>> On Mon, Oct 3, 2011 at 4:13 PM, Richard Henderson <rth@redhat.com> wrote:
>>> On 10/03/2011 05:14 AM, Artem Shinkarov wrote:
>>>> Hi, can anyone commit it please?
>>>>
>>>> Richard?
>>>> Or may be Richard?
>>> Committed.
>>>
>>> r~
>>>
>> Hi, Richard
>>
>> There is a problem with the testcases of the patch you have committed
>> for me. The code in every test-case is doubled. Could you please,
>> apply the following patch, otherwise it would fail all the tests from
>> the vector-shuffle-patch would fail.
>>
>> Also, if it is possible, could you change my name from in the
>> ChangeLog from "Artem Shinkarov" to "Artjoms Sinkarovs". The last
>> version is the way I am spelled in the passport, and the name I use in
>> the ChangeLog.
>>
>>
>> Thanks,
>> Artem.
>>
>
> The following test cases cause FAILs because main cannot be found by the linker
>  because if __SIZEOF_INT__ != 4 you are trying to compile and run an empty file.
>
>> Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c
>
>> Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c
>
> The following patch avoids __SIZEOF_INT__.
>
> Ok by some maintainer to commit?

On a general note, if you need to add .x files, consider moving the
test to gcc.dg/torture instead.

Richard.

> Johann
>
> testsuite/
>        * lib/target-supports.exp (check_effective_target_int32): New
>        function.
>        * gcc.c-torture/execute/vect-shuffle-1.c: Don't use
>        __SIZEOF_INT__.
>        * gcc.c-torture/execute/vect-shuffle-5.c: Ditto.
>        * gcc.c-torture/execute/vect-shuffle-1.x: New file.
>        * gcc.c-torture/execute/vect-shuffle-5.x: New file.
>
>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-10-06 11:28                                                               ` Richard Guenther
@ 2011-10-06 11:38                                                                 ` Georg-Johann Lay
  2011-10-06 11:46                                                                   ` Richard Guenther
  0 siblings, 1 reply; 71+ messages in thread
From: Georg-Johann Lay @ 2011-10-06 11:38 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Artem Shinkarov, Richard Henderson, Joseph S. Myers,
	Duncan Sands, gcc-patches

Richard Guenther schrieb:
> On Thu, Oct 6, 2011 at 12:51 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
>> Artem Shinkarov schrieb:
>>> Hi, Richard
>>>
>>> There is a problem with the testcases of the patch you have committed
>>> for me. The code in every test-case is doubled. Could you please,
>>> apply the following patch, otherwise it would fail all the tests from
>>> the vector-shuffle-patch would fail.
>>>
>>> Also, if it is possible, could you change my name from in the
>>> ChangeLog from "Artem Shinkarov" to "Artjoms Sinkarovs". The last
>>> version is the way I am spelled in the passport, and the name I use in
>>> the ChangeLog.
>>>
>>> Thanks,
>>> Artem.
>>>
>>> On Mon, Oct 3, 2011 at 4:13 PM, Richard Henderson <rth@redhat.com> wrote:
>>>> On 10/03/2011 05:14 AM, Artem Shinkarov wrote:
>>>>> Hi, can anyone commit it please?
>>>>>
>>>>> Richard?
>>>>> Or may be Richard?
>>>> Committed.
>>>>
>>>> r~
>>>>
>>> Hi, Richard
>>>
>>> There is a problem with the testcases of the patch you have committed
>>> for me. The code in every test-case is doubled. Could you please,
>>> apply the following patch, otherwise it would fail all the tests from
>>> the vector-shuffle-patch would fail.
>>>
>>> Also, if it is possible, could you change my name from in the
>>> ChangeLog from "Artem Shinkarov" to "Artjoms Sinkarovs". The last
>>> version is the way I am spelled in the passport, and the name I use in
>>> the ChangeLog.
>>>
>>>
>>> Thanks,
>>> Artem.
>>>
>> The following test cases cause FAILs because main cannot be found by the linker
>>  because if __SIZEOF_INT__ != 4 you are trying to compile and run an empty file.
>>
>>> Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c
>>> Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c
>> The following patch avoids __SIZEOF_INT__.
>>
>> Ok by some maintainer to commit?
> 
> On a general note, if you need to add .x files, consider moving the
> test to gcc.dg/torture instead.

So should I move all vect-shuffle-*.c files so that they are kept together?

Johann

> Richard.
> 
>> Johann
>>
>> testsuite/
>>        * lib/target-supports.exp (check_effective_target_int32): New
>>        function.
>>        * gcc.c-torture/execute/vect-shuffle-1.c: Don't use
>>        __SIZEOF_INT__.
>>        * gcc.c-torture/execute/vect-shuffle-5.c: Ditto.
>>        * gcc.c-torture/execute/vect-shuffle-1.x: New file.
>>        * gcc.c-torture/execute/vect-shuffle-5.x: New file.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-10-06 11:38                                                                 ` Georg-Johann Lay
@ 2011-10-06 11:46                                                                   ` Richard Guenther
  2011-10-06 12:12                                                                     ` Georg-Johann Lay
  0 siblings, 1 reply; 71+ messages in thread
From: Richard Guenther @ 2011-10-06 11:46 UTC (permalink / raw)
  To: Georg-Johann Lay
  Cc: Artem Shinkarov, Richard Henderson, Joseph S. Myers,
	Duncan Sands, gcc-patches

On Thu, Oct 6, 2011 at 1:03 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
> Richard Guenther schrieb:
>> On Thu, Oct 6, 2011 at 12:51 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
>>> Artem Shinkarov schrieb:
>>>> Hi, Richard
>>>>
>>>> There is a problem with the testcases of the patch you have committed
>>>> for me. The code in every test-case is doubled. Could you please,
>>>> apply the following patch, otherwise it would fail all the tests from
>>>> the vector-shuffle-patch would fail.
>>>>
>>>> Also, if it is possible, could you change my name from in the
>>>> ChangeLog from "Artem Shinkarov" to "Artjoms Sinkarovs". The last
>>>> version is the way I am spelled in the passport, and the name I use in
>>>> the ChangeLog.
>>>>
>>>> Thanks,
>>>> Artem.
>>>>
>>>> On Mon, Oct 3, 2011 at 4:13 PM, Richard Henderson <rth@redhat.com> wrote:
>>>>> On 10/03/2011 05:14 AM, Artem Shinkarov wrote:
>>>>>> Hi, can anyone commit it please?
>>>>>>
>>>>>> Richard?
>>>>>> Or may be Richard?
>>>>> Committed.
>>>>>
>>>>> r~
>>>>>
>>>> Hi, Richard
>>>>
>>>> There is a problem with the testcases of the patch you have committed
>>>> for me. The code in every test-case is doubled. Could you please,
>>>> apply the following patch, otherwise it would fail all the tests from
>>>> the vector-shuffle-patch would fail.
>>>>
>>>> Also, if it is possible, could you change my name from in the
>>>> ChangeLog from "Artem Shinkarov" to "Artjoms Sinkarovs". The last
>>>> version is the way I am spelled in the passport, and the name I use in
>>>> the ChangeLog.
>>>>
>>>>
>>>> Thanks,
>>>> Artem.
>>>>
>>> The following test cases cause FAILs because main cannot be found by the linker
>>>  because if __SIZEOF_INT__ != 4 you are trying to compile and run an empty file.
>>>
>>>> Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c
>>>> Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c
>>> The following patch avoids __SIZEOF_INT__.
>>>
>>> Ok by some maintainer to commit?
>>
>> On a general note, if you need to add .x files, consider moving the
>> test to gcc.dg/torture instead.
>
> So should I move all vect-shuffle-*.c files so that they are kept together?

Yes.

> Johann
>
>> Richard.
>>
>>> Johann
>>>
>>> testsuite/
>>>        * lib/target-supports.exp (check_effective_target_int32): New
>>>        function.
>>>        * gcc.c-torture/execute/vect-shuffle-1.c: Don't use
>>>        __SIZEOF_INT__.
>>>        * gcc.c-torture/execute/vect-shuffle-5.c: Ditto.
>>>        * gcc.c-torture/execute/vect-shuffle-1.x: New file.
>>>        * gcc.c-torture/execute/vect-shuffle-5.x: New file.
>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-10-06 10:55                                                             ` Georg-Johann Lay
  2011-10-06 11:28                                                               ` Richard Guenther
@ 2011-10-06 11:47                                                               ` Jakub Jelinek
  1 sibling, 0 replies; 71+ messages in thread
From: Jakub Jelinek @ 2011-10-06 11:47 UTC (permalink / raw)
  To: Georg-Johann Lay
  Cc: Artem Shinkarov, Richard Henderson, Joseph S. Myers,
	Richard Guenther, Duncan Sands, gcc-patches

On Thu, Oct 06, 2011 at 12:51:54PM +0200, Georg-Johann Lay wrote:
> The following patch avoids __SIZEOF_INT__.
> 
> Ok by some maintainer to commit?

That is unnecessary.  You can just add
#else
int
main ()
{
  return 0;
}
before the final #endif in the files instead.
Or move around the #ifdefs, so that it ifdefs out for weirdo targets
just everything before main and then also main's body except for return 0;
at the end.

	Jakub

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-10-06 11:46                                                                   ` Richard Guenther
@ 2011-10-06 12:12                                                                     ` Georg-Johann Lay
  2011-10-06 15:43                                                                       ` Richard Henderson
  0 siblings, 1 reply; 71+ messages in thread
From: Georg-Johann Lay @ 2011-10-06 12:12 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Artem Shinkarov, Richard Henderson, Joseph S. Myers,
	Duncan Sands, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3550 bytes --]

Richard Guenther schrieb:
> On Thu, Oct 6, 2011 at 1:03 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
>> Richard Guenther schrieb:
>>> On Thu, Oct 6, 2011 at 12:51 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
>>>> Artem Shinkarov schrieb:
>>>>> Hi, Richard
>>>>>
>>>>> There is a problem with the testcases of the patch you have committed
>>>>> for me. The code in every test-case is doubled. Could you please,
>>>>> apply the following patch, otherwise it would fail all the tests from
>>>>> the vector-shuffle-patch would fail.
>>>>>
>>>>> Also, if it is possible, could you change my name from in the
>>>>> ChangeLog from "Artem Shinkarov" to "Artjoms Sinkarovs". The last
>>>>> version is the way I am spelled in the passport, and the name I use in
>>>>> the ChangeLog.
>>>>>
>>>>> Thanks,
>>>>> Artem.
>>>>>
>>>>> On Mon, Oct 3, 2011 at 4:13 PM, Richard Henderson <rth@redhat.com> wrote:
>>>>>> On 10/03/2011 05:14 AM, Artem Shinkarov wrote:
>>>>>>> Hi, can anyone commit it please?
>>>>>>>
>>>>>>> Richard?
>>>>>>> Or may be Richard?
>>>>>> Committed.
>>>>>>
>>>>>> r~
>>>>>>
>>>>> Hi, Richard
>>>>>
>>>>> There is a problem with the testcases of the patch you have committed
>>>>> for me. The code in every test-case is doubled. Could you please,
>>>>> apply the following patch, otherwise it would fail all the tests from
>>>>> the vector-shuffle-patch would fail.
>>>>>
>>>>> Also, if it is possible, could you change my name from in the
>>>>> ChangeLog from "Artem Shinkarov" to "Artjoms Sinkarovs". The last
>>>>> version is the way I am spelled in the passport, and the name I use in
>>>>> the ChangeLog.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Artem.
>>>>>
>>>> The following test cases cause FAILs because main cannot be found by the linker
>>>>  because if __SIZEOF_INT__ != 4 you are trying to compile and run an empty file.
>>>>
>>>>> Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c
>>>>> Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c
>>>> The following patch avoids __SIZEOF_INT__.
>>>>
>>>> Ok by some maintainer to commit?
>>> On a general note, if you need to add .x files, consider moving the
>>> test to gcc.dg/torture instead.
>> So should I move all vect-shuffle-*.c files so that they are kept together?
> 
> Yes.

So here it is.  Lightly tested on my target: All tests either PASS or are
UNSUPPORTED now.

Ok?

Johann

testsuite/
	* lib/target-supports.exp (check_effective_target_int32): New
	function.
	(check_effective_target_short16): New function.
	(check_effective_target_longlong64): New function.
	
	* gcc.c-torture/execute/vect-shuffle-1.c: Move to gcc.dg/torture.
	* gcc.c-torture/execute/vect-shuffle-2.c: Move to gcc.dg/torture.
	* gcc.c-torture/execute/vect-shuffle-3.c: Move to gcc.dg/torture.
	* gcc.c-torture/execute/vect-shuffle-4.c: Move to gcc.dg/torture.
	* gcc.c-torture/execute/vect-shuffle-5.c: Move to gcc.dg/torture.
	* gcc.c-torture/execute/vect-shuffle-6.c: Move to gcc.dg/torture.
	* gcc.c-torture/execute/vect-shuffle-7.c: Move to gcc.dg/torture.
	* gcc.c-torture/execute/vect-shuffle-8.c: Move to gcc.dg/torture.
	* gcc.dg/torture/vect-shuffle-1.c: Use dg-require-effective-target
	int32 instead of __SIZEOF_INT__ == 4.
	* gcc.dg/torture/vect-shuffle-5.c: Ditto.
	* gcc.dg/torture/vect-shuffle-2.c: Use dg-require-effective-target
	short16 instead of __SIZEOF_SHORT__ == 2.
	* gcc.dg/torture/vect-shuffle-6.c: Ditto.
	* gcc.dg/torture/vect-shuffle-3.c: Use dg-require-effective-target
	longlong64 instead of __SIZEOF_LONG_LONG__ == 8.
	* gcc.dg/torture/vect-shuffle-7.c: Ditto.



[-- Attachment #2: vshuffle.diff --]
[-- Type: text/x-patch, Size: 19838 bytes --]

Index: lib/target-supports.exp
===================================================================
--- lib/target-supports.exp	(revision 179599)
+++ lib/target-supports.exp	(working copy)
@@ -1583,6 +1583,33 @@ proc check_effective_target_int16 { } {
     }]
 }
 
+# Returns 1 if we're generating 32-bit integers with the
+# default options, 0 otherwise.
+
+proc check_effective_target_int32 { } {
+    return [check_no_compiler_messages int32 object {
+	int dummy[sizeof (int) == 4 ? 1 : -1];
+    }]
+}
+
+# Returns 1 if we're generating 64-bit long long integers with the
+# default options, 0 otherwise.
+
+proc check_effective_target_longlong64 { } {
+    return [check_no_compiler_messages longlong64 object {
+	int dummy[sizeof (long long ) == 8 ? 1 : -1];
+    }]
+}
+
+# Returns 1 if we're generating 16-bit short integers with the
+# default options, 0 otherwise.
+
+proc check_effective_target_short16 { } {
+    return [check_no_compiler_messages short16 object {
+	int dummy[sizeof (short) == 2 ? 1 : -1];
+    }]
+}
+
 # Return 1 if we're generating 64-bit code using default options, 0
 # otherwise.
 
Index: gcc.c-torture/execute/vect-shuffle-2.c
===================================================================
--- gcc.c-torture/execute/vect-shuffle-2.c	(revision 179599)
+++ gcc.c-torture/execute/vect-shuffle-2.c	(working copy)
@@ -1,68 +0,0 @@
-#if __SIZEOF_SHORT__ == 2
-typedef unsigned short V __attribute__((vector_size(16), may_alias));
-
-struct S
-{
-  V in, mask, out;
-};
-
-struct S tests[] = {
-  {
-    { 0x1111, 0x2222, 0x3333, 0x4444, 0x5555, 0x6666, 0x7777, 0x8888 },
-    { 0, 1, 2, 3, 4, 5, 6, 7 },
-    { 0x1111, 0x2222, 0x3333, 0x4444, 0x5555, 0x6666, 0x7777, 0x8888 },
-  },
-  {
-    { 0x1111, 0x2222, 0x3333, 0x4444, 0x5555, 0x6666, 0x7777, 0x8888 },
-    { 0x10, 0x21, 0x32, 0x43, 0x54, 0x65, 0x76, 0x87 },
-    { 0x1111, 0x2222, 0x3333, 0x4444, 0x5555, 0x6666, 0x7777, 0x8888 },
-  },
-  {
-    { 0x1111, 0x2222, 0x3333, 0x4444, 0x5555, 0x6666, 0x7777, 0x8888 },
-    { 7, 6, 5, 4, 3, 2, 1, 0 },
-    { 0x8888, 0x7777, 0x6666, 0x5555, 0x4444, 0x3333, 0x2222, 0x1111 },
-  },
-  {
-    { 0x1111, 0x2222, 0x3333, 0x4444, 0x5555, 0x6666, 0x7777, 0x8888 },
-    { 7, 0, 5, 3, 2, 4, 1, 6 },
-    { 0x8888, 0x1111, 0x6666, 0x4444, 0x3333, 0x5555, 0x2222, 0x7777 },
-  },
-  {
-    { 0x1111, 0x2222, 0x3333, 0x4444, 0x5555, 0x6666, 0x7777, 0x8888 },
-    { 0, 2, 1, 3, 4, 6, 5, 7 },
-    { 0x1111, 0x3333, 0x2222, 0x4444, 0x5555, 0x7777, 0x6666, 0x8888 },
-  },
-  {
-    { 0x1122, 0x3344, 0x5566, 0x7788, 0x99aa, 0xbbcc, 0xddee, 0xff00 },
-    { 3, 1, 2, 0, 7, 5, 6, 4 },
-    { 0x7788, 0x3344, 0x5566, 0x1122, 0xff00, 0xbbcc, 0xddee, 0x99aa },
-  },
-  {
-    { 0x1122, 0x3344, 0x5566, 0x7788, 0x99aa, 0xbbcc, 0xddee, 0xff00 },
-    { 0, 0, 0, 0 },
-    { 0x1122, 0x1122, 0x1122, 0x1122, 0x1122, 0x1122, 0x1122, 0x1122 },
-  },
-  {
-    { 0x1122, 0x3344, 0x5566, 0x7788, 0x99aa, 0xbbcc, 0xddee, 0xff00 },
-    { 1, 6, 1, 6, 1, 6, 1, 6 }, 
-    { 0x3344, 0xddee, 0x3344, 0xddee, 0x3344, 0xddee, 0x3344, 0xddee },
-  }
-};
-
-extern void abort(void);
-
-int main()
-{
-  int i;
-
-  for (i = 0; i < sizeof(tests)/sizeof(tests[0]); ++i)
-    {
-      V r = __builtin_shuffle(tests[i].in, tests[i].mask);
-      if (memcmp(&r, &tests[i].out, sizeof(V)) != 0)
-	abort();
-    }
-
-  return 0;
-}
-
-#endif /* SIZEOF_SHORT */
Index: gcc.c-torture/execute/vect-shuffle-4.c
===================================================================
--- gcc.c-torture/execute/vect-shuffle-4.c	(revision 179599)
+++ gcc.c-torture/execute/vect-shuffle-4.c	(working copy)
@@ -1,51 +0,0 @@
-typedef unsigned char V __attribute__((vector_size(16), may_alias));
-
-struct S
-{
-  V in, mask, out;
-};
-
-struct S tests[] = {
-  {
-    { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 },
-    { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, },
-    { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 },
-  },
-  {
-    { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 },
-    { 0x10, 0x21, 0x32, 0x43, 0x54, 0x65, 0x76, 0x87,
-      0x98, 0xa9, 0xba, 0xcb, 0xdc, 0xed, 0xfe, 0xff },
-    { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 },
-  },
-  {
-    { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 },
-    { 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 },
-    { 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 },
-  },
-  {
-    { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 },
-    { 0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15 },
-    { 1, 3, 5, 7, 9, 11, 13, 15, 2, 4, 6, 8, 10, 12, 14, 16 },
-  },
-  {
-    { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 },
-    { 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3 }, 
-    { 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4 }, 
-  },
-};
-
-extern void abort(void);
-
-int main()
-{
-  int i;
-
-  for (i = 0; i < sizeof(tests)/sizeof(tests[0]); ++i)
-    {
-      V r = __builtin_shuffle(tests[i].in, tests[i].mask);
-      if (memcmp(&r, &tests[i].out, sizeof(V)) != 0)
-	abort();
-    }
-
-  return 0;
-}
Index: gcc.c-torture/execute/vect-shuffle-6.c
===================================================================
--- gcc.c-torture/execute/vect-shuffle-6.c	(revision 179599)
+++ gcc.c-torture/execute/vect-shuffle-6.c	(working copy)
@@ -1,64 +0,0 @@
-#if __SIZEOF_SHORT__ == 2
-typedef unsigned short V __attribute__((vector_size(16), may_alias));
-
-struct S
-{
-  V in1, in2, mask, out;
-};
-
-struct S tests[] = {
-  {
-    { 0x1010, 0x2121, 0x3232, 0x4343, 0x5454, 0x6565, 0x7676, 0x8787 },
-    { 0x9898, 0xa9a9, 0xbaba, 0xcbcb, 0xdcdc, 0xeded, 0xfefe, 0x0f0f },
-    { 0, 1, 2, 3, 4, 5, 6, 7 },
-    { 0x1010, 0x2121, 0x3232, 0x4343, 0x5454, 0x6565, 0x7676, 0x8787 },
-  },
-  {
-    { 0x1010, 0x2121, 0x3232, 0x4343, 0x5454, 0x6565, 0x7676, 0x8787 },
-    { 0x9898, 0xa9a9, 0xbaba, 0xcbcb, 0xdcdc, 0xeded, 0xfefe, 0x0f0f },
-    { 8, 9, 10, 11, 12, 13, 14, 15 },
-    { 0x9898, 0xa9a9, 0xbaba, 0xcbcb, 0xdcdc, 0xeded, 0xfefe, 0x0f0f },
-  },
-  {
-    { 0x1010, 0x2121, 0x3232, 0x4343, 0x5454, 0x6565, 0x7676, 0x8787 },
-    { 0x9898, 0xa9a9, 0xbaba, 0xcbcb, 0xdcdc, 0xeded, 0xfefe, 0x0f0f },
-    { 0, 8, 1, 9, 2, 10, 3, 11 },
-    { 0x1010, 0x9898, 0x2121, 0xa9a9, 0x3232, 0xbaba, 0x4343, 0xcbcb },
-  },
-  {
-    { 0x1010, 0x2121, 0x3232, 0x4343, 0x5454, 0x6565, 0x7676, 0x8787 },
-    { 0x9898, 0xa9a9, 0xbaba, 0xcbcb, 0xdcdc, 0xeded, 0xfefe, 0x0f0f },
-    { 0, 15, 4, 11, 12, 3, 7, 8 },
-    { 0x1010, 0x0f0f, 0x5454, 0xcbcb, 0xdcdc, 0x4343, 0x8787, 0x9898 },
-  },
-  {
-    { 0x1010, 0x2121, 0x3232, 0x4343, 0x5454, 0x6565, 0x7676, 0x8787 },
-    { 0x9898, 0xa9a9, 0xbaba, 0xcbcb, 0xdcdc, 0xeded, 0xfefe, 0x0f0f },
-    { 0, 0, 0, 0, 0, 0, 0, 0 },
-    { 0x1010, 0x1010, 0x1010, 0x1010, 0x1010, 0x1010, 0x1010, 0x1010 },
-  },
-  {
-    { 0x1010, 0x2121, 0x3232, 0x4343, 0x5454, 0x6565, 0x7676, 0x8787 },
-    { 0x9898, 0xa9a9, 0xbaba, 0xcbcb, 0xdcdc, 0xeded, 0xfefe, 0x0f0f },
-    { 14, 14, 14, 14, 14, 14, 14, 14 },
-    { 0xfefe, 0xfefe, 0xfefe, 0xfefe, 0xfefe, 0xfefe, 0xfefe, 0xfefe },
-  },
-};
-
-extern void abort(void);
-
-int main()
-{
-  int i;
-
-  for (i = 0; i < sizeof(tests)/sizeof(tests[0]); ++i)
-    {
-      V r = __builtin_shuffle(tests[i].in1, tests[i].in2, tests[i].mask);
-      if (__builtin_memcmp(&r, &tests[i].out, sizeof(V)) != 0)
-	abort();
-    }
-
-  return 0;
-}
-
-#endif /* SIZEOF_SHORT */
Index: gcc.c-torture/execute/vect-shuffle-8.c
===================================================================
--- gcc.c-torture/execute/vect-shuffle-8.c	(revision 179599)
+++ gcc.c-torture/execute/vect-shuffle-8.c	(working copy)
@@ -1,55 +0,0 @@
-typedef unsigned char V __attribute__((vector_size(16), may_alias));
-
-struct S
-{
-  V in1, in2, mask, out;
-};
-
-struct S tests[] = {
-  {
-    { 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 },
-    { 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45 },
-    { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 },
-    { 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 },
-  },
-  {
-    { 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 },
-    { 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45 },
-    { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 },
-    { 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45 },
-  },
-  {
-    { 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 },
-    { 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45 },
-    { 7, 6, 5, 4, 16, 17, 18, 19, 31, 30, 29, 28, 3, 2, 1, 0 },
-    { 17, 16, 15, 14, 30, 31, 32, 33, 45, 44, 43, 42, 13, 12, 11, 10 },
-  },
-  {
-    { 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 },
-    { 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45 },
-    { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
-    { 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10 },
-  },
-  {
-    { 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 },
-    { 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45 },
-    { 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63 },
-    { 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45 },
-  },
-};
-
-extern void abort(void);
-
-int main()
-{
-  int i;
-
-  for (i = 0; i < sizeof(tests)/sizeof(tests[0]); ++i)
-    {
-      V r = __builtin_shuffle(tests[i].in1, tests[i].in2, tests[i].mask);
-      if (__builtin_memcmp(&r, &tests[i].out, sizeof(V)) != 0)
-	abort();
-    }
-
-  return 0;
-}
Index: gcc.c-torture/execute/vect-shuffle-1.c
===================================================================
--- gcc.c-torture/execute/vect-shuffle-1.c	(revision 179599)
+++ gcc.c-torture/execute/vect-shuffle-1.c	(working copy)
@@ -1,68 +0,0 @@
-#if __SIZEOF_INT__ == 4
-typedef unsigned int V __attribute__((vector_size(16), may_alias));
-
-struct S
-{
-  V in, mask, out;
-};
-
-struct S tests[] = {
-  {
-    { 0x11111111, 0x22222222, 0x33333333, 0x44444444 },
-    { 0, 1, 2, 3 },
-    { 0x11111111, 0x22222222, 0x33333333, 0x44444444 },
-  },
-  {
-    { 0x11111111, 0x22222222, 0x33333333, 0x44444444 },
-    { 0+1*4, 1+2*4, 2+3*4, 3+4*4 },
-    { 0x11111111, 0x22222222, 0x33333333, 0x44444444 },
-  },
-  {
-    { 0x11111111, 0x22222222, 0x33333333, 0x44444444 },
-    { 3, 2, 1, 0 },
-    { 0x44444444, 0x33333333, 0x22222222, 0x11111111 },
-  },
-  {
-    { 0x11111111, 0x22222222, 0x33333333, 0x44444444 },
-    { 0, 3, 2, 1 },
-    { 0x11111111, 0x44444444, 0x33333333, 0x22222222 },
-  },
-  {
-    { 0x11111111, 0x22222222, 0x33333333, 0x44444444 },
-    { 0, 2, 1, 3 },
-    { 0x11111111, 0x33333333, 0x22222222, 0x44444444 },
-  },
-  {
-    { 0x11223344, 0x55667788, 0x99aabbcc, 0xddeeff00 },
-    { 3, 1, 2, 0 },
-    { 0xddeeff00, 0x55667788, 0x99aabbcc, 0x11223344 },
-  },
-  {
-    { 0x11223344, 0x55667788, 0x99aabbcc, 0xddeeff00 },
-    { 0, 0, 0, 0 },
-    { 0x11223344, 0x11223344, 0x11223344, 0x11223344 },
-  },
-  {
-    { 0x11223344, 0x55667788, 0x99aabbcc, 0xddeeff00 },
-    { 1, 2, 1, 2 },
-    { 0x55667788, 0x99aabbcc, 0x55667788, 0x99aabbcc },
-  }
-};
-
-extern void abort(void);
-
-int main()
-{
-  int i;
-
-  for (i = 0; i < sizeof(tests)/sizeof(tests[0]); ++i)
-    {
-      V r = __builtin_shuffle(tests[i].in, tests[i].mask);
-      if (__builtin_memcmp(&r, &tests[i].out, sizeof(V)) != 0)
-	abort();
-    }
-
-  return 0;
-}
-
-#endif /* SIZEOF_INT */
Index: gcc.c-torture/execute/vect-shuffle-3.c
===================================================================
--- gcc.c-torture/execute/vect-shuffle-3.c	(revision 179599)
+++ gcc.c-torture/execute/vect-shuffle-3.c	(working copy)
@@ -1,58 +0,0 @@
-#if __SIZEOF_LONG_LONG__ == 8
-typedef unsigned long long V __attribute__((vector_size(16), may_alias));
-
-struct S
-{
-  V in, mask, out;
-};
-
-struct S tests[] = {
-  {
-    { 0x1111111111111111, 0x2222222222222222 },
-    { 0, 1 },
-    { 0x1111111111111111, 0x2222222222222222 },
-  },
-  {
-    { 0x1111111111111111, 0x2222222222222222 },
-    { 0x0102030405060700, 0xffeeddccbbaa99f1 },
-    { 0x1111111111111111, 0x2222222222222222 },
-  },
-  {
-    { 0x1111111111111111, 0x2222222222222222 },
-    { 1, 0 },
-    { 0x2222222222222222, 0x1111111111111111 },
-  },
-  {
-    { 0x1111111111111111, 0x2222222222222222 },
-    { 0, 0 },
-    { 0x1111111111111111, 0x1111111111111111 },
-  },
-  {
-    { 0x1122334455667788, 0x99aabbccddeeff00 },
-    { 1, 1 },
-    { 0x99aabbccddeeff00, 0x99aabbccddeeff00 },
-  },
-  {
-    { 0x1122334455667788, 0x99aabbccddeeff00 },
-    { 1, 0 },
-    { 0x99aabbccddeeff00, 0x1122334455667788 },
-  },
-};
-
-extern void abort(void);
-
-int main()
-{
-  int i;
-
-  for (i = 0; i < sizeof(tests)/sizeof(tests[0]); ++i)
-    {
-      V r = __builtin_shuffle(tests[i].in, tests[i].mask);
-      if (__builtin_memcmp(&r, &tests[i].out, sizeof(V)) != 0)
-	abort();
-    }
-
-  return 0;
-}
-
-#endif /* SIZEOF_LONG_LONG */
Index: gcc.c-torture/execute/vect-shuffle-5.c
===================================================================
--- gcc.c-torture/execute/vect-shuffle-5.c	(revision 179599)
+++ gcc.c-torture/execute/vect-shuffle-5.c	(working copy)
@@ -1,64 +0,0 @@
-#if __SIZEOF_INT__ == 4
-typedef unsigned int V __attribute__((vector_size(16), may_alias));
-
-struct S
-{
-  V in1, in2, mask, out;
-};
-
-struct S tests[] = {
-  {
-    { 0x11111111, 0x22222222, 0x33333333, 0x44444444 },
-    { 0x55555555, 0x66666666, 0x77777777, 0x88888888 },
-    { 0, 1, 2, 3 },
-    { 0x11111111, 0x22222222, 0x33333333, 0x44444444 },
-  },
-  {
-    { 0x11111111, 0x22222222, 0x33333333, 0x44444444 },
-    { 0x55555555, 0x66666666, 0x77777777, 0x88888888 },
-    { 4, 5, 6, 7 },
-    { 0x55555555, 0x66666666, 0x77777777, 0x88888888 },
-  },
-  {
-    { 0x11111111, 0x22222222, 0x33333333, 0x44444444 },
-    { 0x55555555, 0x66666666, 0x77777777, 0x88888888 },
-    { 0, 4, 1, 5 },
-    { 0x11111111, 0x55555555, 0x22222222, 0x66666666 },
-  },
-  {
-    { 0x11111111, 0x22222222, 0x33333333, 0x44444444 },
-    { 0x55555555, 0x66666666, 0x77777777, 0x88888888 },
-    { 0, 7, 4, 3 },
-    { 0x11111111, 0x88888888, 0x55555555, 0x44444444 },
-  },
-  {
-    { 0x11111111, 0x22222222, 0x33333333, 0x44444444 },
-    { 0x55555555, 0x66666666, 0x77777777, 0x88888888 },
-    { 0, 0, 0, 0 },
-    { 0x11111111, 0x11111111, 0x11111111, 0x11111111 },
-  },
-  {
-    { 0x11111111, 0x22222222, 0x33333333, 0x44444444 },
-    { 0x55555555, 0x66666666, 0x77777777, 0x88888888 },
-    { 7, 7, 7, 7 },
-    { 0x88888888, 0x88888888, 0x88888888, 0x88888888 },
-  },
-};
-
-extern void abort(void);
-
-int main()
-{
-  int i;
-
-  for (i = 0; i < sizeof(tests)/sizeof(tests[0]); ++i)
-    {
-      V r = __builtin_shuffle(tests[i].in1, tests[i].in2, tests[i].mask);
-      if (__builtin_memcmp(&r, &tests[i].out, sizeof(V)) != 0)
-	abort();
-    }
-
-  return 0;
-}
-
-#endif /* SIZEOF_INT */
Index: gcc.c-torture/execute/vect-shuffle-7.c
===================================================================
--- gcc.c-torture/execute/vect-shuffle-7.c	(revision 179599)
+++ gcc.c-torture/execute/vect-shuffle-7.c	(working copy)
@@ -1,70 +0,0 @@
-#if __SIZEOF_LONG_LONG__ == 8
-typedef unsigned long long V __attribute__((vector_size(16), may_alias));
-
-struct S
-{
-  V in1, in2, mask, out;
-};
-
-struct S tests[] = {
-  {
-    { 0x1112131415161718, 0x2122232425262728 },
-    { 0x3132333435363738, 0x4142434445464748 },
-    { 0, 1 },
-    { 0x1112131415161718, 0x2122232425262728 },
-  },
-  {
-    { 0x1112131415161718, 0x2122232425262728 },
-    { 0x3132333435363738, 0x4142434445464748 },
-    { 2, 3 },
-    { 0x3132333435363738, 0x4142434445464748 },
-  },
-  {
-    { 0x1112131415161718, 0x2122232425262728 },
-    { 0x3132333435363738, 0x4142434445464748 },
-    { 0, 2 },
-    { 0x1112131415161718, 0x3132333435363738 },
-  },
-  {
-    { 0x1112131415161718, 0x2122232425262728 },
-    { 0x3132333435363738, 0x4142434445464748 },
-    { 2, 1 },
-    { 0x3132333435363738, 0x2122232425262728 },
-  },
-  {
-    { 0x1112131415161718, 0x2122232425262728 },
-    { 0x3132333435363738, 0x4142434445464748 },
-    { 3, 0 },
-    { 0x4142434445464748, 0x1112131415161718 },
-  },
-  {
-    { 0x1112131415161718, 0x2122232425262728 },
-    { 0x3132333435363738, 0x4142434445464748 },
-    { 0, 0 },
-    { 0x1112131415161718, 0x1112131415161718 },
-  },
-  {
-    { 0x1112131415161718, 0x2122232425262728 },
-    { 0x3132333435363738, 0x4142434445464748 },
-    { 3, 3 },
-    { 0x4142434445464748, 0x4142434445464748 },
-  },
-};
-
-extern void abort(void);
-
-int main()
-{
-  int i;
-
-  for (i = 0; i < sizeof(tests)/sizeof(tests[0]); ++i)
-    {
-      V r = __builtin_shuffle(tests[i].in1, tests[i].in2, tests[i].mask);
-      if (__builtin_memcmp(&r, &tests[i].out, sizeof(V)) != 0)
-	abort();
-    }
-
-  return 0;
-}
-
-#endif /* SIZEOF_LONG_LONG */
Index: gcc.dg/torture/vect-shuffle-6.c
===================================================================
--- gcc.dg/torture/vect-shuffle-6.c	(revision 179599)
+++ gcc.dg/torture/vect-shuffle-6.c	(working copy)
@@ -1,4 +1,6 @@
-#if __SIZEOF_SHORT__ == 2
+/* { dg-do run } */
+/* { dg-require-effective-target short16 } */
+
 typedef unsigned short V __attribute__((vector_size(16), may_alias));
 
 struct S
@@ -60,5 +62,3 @@ int main()
 
   return 0;
 }
-
-#endif /* SIZEOF_SHORT */
Index: gcc.dg/torture/vect-shuffle-7.c
===================================================================
--- gcc.dg/torture/vect-shuffle-7.c	(revision 179599)
+++ gcc.dg/torture/vect-shuffle-7.c	(working copy)
@@ -1,4 +1,6 @@
-#if __SIZEOF_LONG_LONG__ == 8
+/* { dg-do run } */
+/* { dg-require-effective-target longlong64 } */
+
 typedef unsigned long long V __attribute__((vector_size(16), may_alias));
 
 struct S
@@ -66,5 +68,3 @@ int main()
 
   return 0;
 }
-
-#endif /* SIZEOF_LONG_LONG */
Index: gcc.dg/torture/vect-shuffle-1.c
===================================================================
--- gcc.dg/torture/vect-shuffle-1.c	(revision 179599)
+++ gcc.dg/torture/vect-shuffle-1.c	(working copy)
@@ -1,4 +1,6 @@
-#if __SIZEOF_INT__ == 4
+/* { dg-do run } */
+/* { dg-require-effective-target int32 } */
+
 typedef unsigned int V __attribute__((vector_size(16), may_alias));
 
 struct S
@@ -64,5 +66,3 @@ int main()
 
   return 0;
 }
-
-#endif /* SIZEOF_INT */
Index: gcc.dg/torture/vect-shuffle-2.c
===================================================================
--- gcc.dg/torture/vect-shuffle-2.c	(revision 179599)
+++ gcc.dg/torture/vect-shuffle-2.c	(working copy)
@@ -1,4 +1,6 @@
-#if __SIZEOF_SHORT__ == 2
+/* { dg-do run } */
+/* { dg-require-effective-target short16 } */
+
 typedef unsigned short V __attribute__((vector_size(16), may_alias));
 
 struct S
@@ -64,5 +66,3 @@ int main()
 
   return 0;
 }
-
-#endif /* SIZEOF_SHORT */
Index: gcc.dg/torture/vect-shuffle-3.c
===================================================================
--- gcc.dg/torture/vect-shuffle-3.c	(revision 179599)
+++ gcc.dg/torture/vect-shuffle-3.c	(working copy)
@@ -1,4 +1,6 @@
-#if __SIZEOF_LONG_LONG__ == 8
+/* { dg-do run } */
+/* { dg-require-effective-target longlong64 } */
+
 typedef unsigned long long V __attribute__((vector_size(16), may_alias));
 
 struct S
@@ -54,5 +56,3 @@ int main()
 
   return 0;
 }
-
-#endif /* SIZEOF_LONG_LONG */
Index: gcc.dg/torture/vect-shuffle-5.c
===================================================================
--- gcc.dg/torture/vect-shuffle-5.c	(revision 179599)
+++ gcc.dg/torture/vect-shuffle-5.c	(working copy)
@@ -1,4 +1,6 @@
-#if __SIZEOF_INT__ == 4
+/* { dg-do run } */
+/* { dg-require-effective-target int32 } */
+
 typedef unsigned int V __attribute__((vector_size(16), may_alias));
 
 struct S
@@ -60,5 +62,3 @@ int main()
 
   return 0;
 }
-
-#endif /* SIZEOF_INT */

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-10-06 12:12                                                                     ` Georg-Johann Lay
@ 2011-10-06 15:43                                                                       ` Richard Henderson
  2011-10-06 18:13                                                                         ` Georg-Johann Lay
  0 siblings, 1 reply; 71+ messages in thread
From: Richard Henderson @ 2011-10-06 15:43 UTC (permalink / raw)
  To: Georg-Johann Lay
  Cc: Richard Guenther, Artem Shinkarov, Joseph S. Myers, Duncan Sands,
	gcc-patches

On 10/06/2011 04:46 AM, Georg-Johann Lay wrote:
> So here it is.  Lightly tested on my target: All tests either PASS or are
> UNSUPPORTED now.
> 
> Ok?

Not ok, but only because I've completely restructured the tests again.
Patch coming very shortly...


r~

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2011-10-06 15:43                                                                       ` Richard Henderson
@ 2011-10-06 18:13                                                                         ` Georg-Johann Lay
  0 siblings, 0 replies; 71+ messages in thread
From: Georg-Johann Lay @ 2011-10-06 18:13 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Richard Guenther, Artem Shinkarov, Joseph S. Myers, Duncan Sands,
	gcc-patches

Richard Henderson schrieb:
> On 10/06/2011 04:46 AM, Georg-Johann Lay wrote:
> 
>>So here it is.  Lightly tested on my target: All tests either PASS or are
>>UNSUPPORTED now.
>>
>>Ok?
> 
> Not ok, but only because I've completely restructured the tests again.
> Patch coming very shortly...

Thanks, I hope your patch fixed the issues addressed in my patch :-)

Johann

> 
> r~
> 

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2010-08-16 18:53     ` Richard Henderson
  2010-08-16 19:24       ` Richard Henderson
@ 2010-08-17  9:36       ` Richard Guenther
  1 sibling, 0 replies; 71+ messages in thread
From: Richard Guenther @ 2010-08-17  9:36 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Andrew Pinski, Artem Shinkarov, gcc-patches

On Mon, Aug 16, 2010 at 8:53 PM, Richard Henderson <rth@redhat.com> wrote:
> On 08/15/2010 03:09 PM, Richard Guenther wrote:
>> On the tree level we generally express target dependent features via
>> builtins.  What tree code are you thinking of?  We have vector lowering
>> for target unsupported stuff to allow optimizing - would that new tree code
>> be target specific then (in that it appears only when target support
>> is available)?
>>
>> I think the hurdle to add a new tree code should be large - otherwise we'll
>> just accumulate a mess.
>
> In this case I think that a tree code would be best.
>
> The problem is that the original shuffle is overloaded for all
> vector types.  Which means that a single builtin function cannot
> be type correct.  Since the user can define arbitrary vector
> types, and __builtin_shuffle is supposed to be generic, we cannot
> possibly pre-define all of the decls required.
>
> (Given that Artem doesn't introduce such a tree code and only
> two builtins suggests that his testing is incomplete, because
> this really ought to have failed in verify_types_in_gimple_stmt
> somewhere.)

The C frontend pieces build new type-correct function decls at parsing
time.

> The big question is what type on which to define the permutation
> vector.  While it is logical to use an integral vector of the
> same width as the output vector, the variable permutation case
> for both x86 and powerpc would prefer to permute on bytes and
> not the original element types.  Further, the constant permute
> case for x86 would prefer to permute on the original element
> types, and not have to re-interpret a byte permutation back into
> the original element types.  If we always use either byte or
> always use element permute then we'll have duplicate code in
> the backends to compensate.  Better to handle both forms in the
> middle-end, and ask the backend which is preferred.
>
> Which suggests something like
>
>  VEC_PERM_ELT (V1, V2, EMASK)
>  VEC_PERM_BYTE (V1, V2, BMASK)
>
> where EMASK is element based indicies and BMASK is byte based
> indicies.  A target hook would determine if VEC_PERM_ELT or
> VEC_PERM_BYTE is preferred or possible for a given permutation.
>
> Permutations originating from the user via __builtin_shuffle
> would originally be represented as VEC_PERM_ELT, and would be
> lowered to VEC_PERM_BYTE in tree-vect-generic.c as required by
> the aforementioned target hook.

That sounds like a good idea.

Richard.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2010-08-16 17:44 ` Richard Henderson
  2010-08-16 19:33   ` Artem Shinkarov
@ 2010-08-17  9:33   ` Richard Guenther
  1 sibling, 0 replies; 71+ messages in thread
From: Richard Guenther @ 2010-08-17  9:33 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Artem Shinkarov, gcc-patches

On Mon, Aug 16, 2010 at 7:42 PM, Richard Henderson <rth@redhat.com> wrote:
> Only looking closely at the i386 changes for now.
>
> On 08/15/2010 07:30 AM, Artem Shinkarov wrote:
>> +  /* Recursively grab the definition of the variable.  */
>> +  while (TREE_CODE (mask) == SSA_NAME)
>> +    {
>> +      gimple maskdef = SSA_NAME_DEF_STMT (mask);
>> +      if (gimple_assign_single_p (maskdef))
>> +        mask = gimple_assign_rhs1 (maskdef);
>> +      else
>> +        break;
>> +    }
>> +
>
> Err, surely copy-propagation has happened and this loop isn't needed.
> In particular, I'd hope that MASK is *already* VECTOR_CST if it is
> in fact constant.

I think it can happen when not optimizing.  The question is of course
whether we are fine with producing less optimal code in that case.

Richard.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2010-08-16 19:33   ` Artem Shinkarov
@ 2010-08-16 19:59     ` Richard Henderson
  0 siblings, 0 replies; 71+ messages in thread
From: Richard Henderson @ 2010-08-16 19:59 UTC (permalink / raw)
  To: Artem Shinkarov; +Cc: gcc-patches, Richard Guenther

On 08/16/2010 12:32 PM, Artem Shinkarov wrote:
>> You should need to use m_type here in casting the arguments.
>> In particular your existing mask could be V4SI with unsigned
>> elements, whereas the builtin takes V4SI with signed elements.
> 
> Looking into the code of ix86_vectorize_builtin_vec_perm, I see that
> the type returned via m_type is almost always the same to TREE_TYPE
> (TREE_TYPE (vec0)) and in your case for V4SI mode there are two
> functions: IX86_BUILTIN_VEC_PERM_V4SI_U and IX86_BUILTIN_VEC_PERM_V4SI
> for unsigned and signed mask. The only change is happening in
> V2DFmode. Could there be any problems with that?
> 
> And mask_type really returns mask element type.

The message at 

  http://gcc.gnu.org/ml/gcc-patches/2010-08/msg01178.html

in the end should supersede many of the comments here.

>> Um, when are you EVER going to see a VECTOR_CST of the wrong size,
>> with the wrong number of elements.  This is very most certainly a
>> bug elsewhere.
> 
> I noticed it quite recently, that nobody ever checked VECTOR_CST for
> consistency, it is fixed about a week ago. But anyway, we have a loop
> over dynamic list, where exit depends on a number, looks suspicious to
> me. And still you can construct such a VECTOR_CST. So why should we
> segfault, when we can just handle it...

Because it's a bug at the origin of the VECTOR_CST.

Perhaps we need an addition to one of the verify_* routines in tree-cfg.c
to find the culprit earlier, but one should never EVER find a VECTOR_CST
without the proper number of elements.

Extra checks here merely make the code more complex for no advantage.

> ... what's wrong with just
> returning false instead of false assertion?

Because it hides a bug in the vectorizer.  Note the different paths
taken for permutations originating in the vectorizer, where we assert
that the values must be correct, and for permutations originating from
the user, where we generate an error.

> Well, the shuffle functionality I introduce is more generic that
> vec_perm we have. So I don't see any problem that expanding
> builtin_shuffle I use vec_perm? Why should user check against vec_perm
> and shuffle if he just wants to shuffle?

I'm not meaning to suggest that __builtin_shuffle handle its inputs
any differently as far as the user is concerned.

What I mean is that tree-vect-generic.c should make use of the existing
target hook when possible, in order to share code between as many targets
as possible.  Indeed, as the follow-up message referenced above suggests,
the existing target support can be re-arranged so as to completely share
the support required by i386 and powerpc.  In the end, almost all of the 
i386 code you add would reside in tree-vect-generic.c.


r~

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2010-08-16 17:44 ` Richard Henderson
@ 2010-08-16 19:33   ` Artem Shinkarov
  2010-08-16 19:59     ` Richard Henderson
  2010-08-17  9:33   ` Richard Guenther
  1 sibling, 1 reply; 71+ messages in thread
From: Artem Shinkarov @ 2010-08-16 19:33 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc-patches, Richard Guenther

On Mon, Aug 16, 2010 at 6:42 PM, Richard Henderson <rth@redhat.com> wrote:
> Only looking closely at the i386 changes for now.
>
> On 08/15/2010 07:30 AM, Artem Shinkarov wrote:
>> +  /* Recursively grab the definition of the variable.  */
>> +  while (TREE_CODE (mask) == SSA_NAME)
>> +    {
>> +      gimple maskdef = SSA_NAME_DEF_STMT (mask);
>> +      if (gimple_assign_single_p (maskdef))
>> +        mask = gimple_assign_rhs1 (maskdef);
>> +      else
>> +        break;
>> +    }
>> +
>
> Err, surely copy-propagation has happened and this loop isn't needed.
> In particular, I'd hope that MASK is *already* VECTOR_CST if it is
> in fact constant.

MASK could be anything and I wanted to get to constructor, but looking
over the patch again may be this is a deprecated part.

>
>> +          t = ix86_vectorize_builtin_vec_perm (TREE_TYPE (vec0), &m_type);
>> +
>> +          if (t != NULL_TREE)
>> +            {
>> +              gimple c = gimple_build_call (t, 3, vec0, vec0, mask);
>> +              gimple stmt = gsi_stmt (*gsi);
>> +              gimple_call_set_lhs (c, gimple_call_lhs (stmt));
>> +              gsi_replace (gsi, c, false);
>
> You should need to use m_type here in casting the arguments.
> In particular your existing mask could be V4SI with unsigned
> elements, whereas the builtin takes V4SI with signed elements.

Looking into the code of ix86_vectorize_builtin_vec_perm, I see that
the type returned via m_type is almost always the same to TREE_TYPE
(TREE_TYPE (vec0)) and in your case for V4SI mode there are two
functions: IX86_BUILTIN_VEC_PERM_V4SI_U and IX86_BUILTIN_VEC_PERM_V4SI
for unsigned and signed mask. The only change is happening in
V2DFmode. Could there be any problems with that?

And mask_type really returns mask element type.

>
>> +  /* If we cannot expand it via vec_perm, we will try to expand it
>> +     via PSHUFB instruction.  */
>> +    {
>
> Indentation is off.  Although it wouldn't be if you moved the
> TARGET_SSE3 || TARGET_AVX test up to be an IF protecting this block,
> which would also help readability.
>
>> +          /* m1var = mm1var << 8*i */
>> +          m1 = build2 (LSHIFT_EXPR, mtype, m1var,
>> +                        build_int_cst (TREE_TYPE (mtype), 8*i));
>> +          t = force_gimple_operand_gsi (gsi, m1,
>> +                            true, NULL_TREE, true, GSI_SAME_STMT);
>> +          asgn = gimple_build_assign (m1var, t);
>> +          gsi_insert_before (gsi, asgn , GSI_SAME_STMT);
>> +
>> +          /* mvar = mvar | m1var */
>> +          m1 = build2 (BIT_IOR_EXPR, mtype, mvar, m1var);
>> +          t = force_gimple_operand_gsi (gsi, m1,
>> +                            true, NULL_TREE, true, GSI_SAME_STMT);
>> +          asgn = gimple_build_assign (mvar, t);
>> +          gsi_insert_before (gsi, asgn, GSI_SAME_STMT);
>
> I don't believe this computes the proper values.  Suppose we're
> trying to generate a permuation for a V4SI.  Suppose that [A-D]
> are already masked to [0-3].  As far as I can see you'll produce
>
>  t0                    = (v16qi){ A,0,0,0,B,0,0,0,C,0,0,0,D,0,0,0 }
>  t1 = t0 << 8;         = (v16qi){ A,A,0,0,B,B,0,0,C,C,0,0,D,D,0,0 }
>  t2 = t1 << 16;        = (v16qi){ A,A,A,A,B,B,B,B,C,C,C,C,D,D,D,D }
>
> when what you really want is
>
>  t2 = (v16qi){ A*4, A*4+1, A*4+2, A*4+3,
>                B*4, B*4+1, B*4+2, B*4+3,
>                C*4, C*4+1, C*4+2, C*4+3,
>                D*4, D*4+1, D*4+2, D*4+3 };
>
> So:
>
>  t0 = mask & (v4si){ 3, 3, 3, 3 };
>  t1 = t0 * 4;
>
> You ought to perform the permutation into T2 above in one step, not
> explicit shifts.  You know this will succeed because you're already
> assuming pshufb.
>
>  t2 = __builtin_ia32_vec_perm_v16qi_u
>        ((v16qi)t1,
>         (v16qi){ 0,0,0,0, 4,4,4,4, 8,8,8,8, 12,12,12,12 });
>
> You need to add an offset compensation vector dependent on the source
> vector element size, e.g.
>
>  t3 = t2 + (v16qi){ 0,1,2,3, 0,1,2,3, 0,1,2,3, 0,1,2,3 }
>
>> +      if (fntype != NULL_TREE)
>
> You already checked this above, although given that you tested for
> SSE3, I don't think even that is needed.  You can assume it.
>
>> +ix86_vectorize_builtin_shuffle2 (gimple_stmt_iterator *gsi,
>> +                                tree vec0, tree vec1, tree mask)
>
> You ought to extract all of the pshufb code from above so that
> you can re-use it here for the TARGET_XOP vpperm instruction,
> which does in fact perform the two operand shuffle.

That is *very* useful, thank you. I missed this conversion issues.

>
>> -  for (i = 0; i < nelt; ++i, list = TREE_CHAIN (list))
>> +  for (i = 0; i < nelt; ++i, list =
>> +                        (list == NULL_TREE ? NULL_TREE : TREE_CHAIN (list)))
>>      {
>>        unsigned HOST_WIDE_INT e;
>> +      tree value;
>> +
>> +      if (list != NULL_TREE)
>> +        value = TREE_VALUE (list);
>> +      else
>> +          value = fold_convert (TREE_TYPE (TREE_TYPE (cst)),
>> +                                integer_zero_node);
>
> Um, when are you EVER going to see a VECTOR_CST of the wrong size,
> with the wrong number of elements.  This is very most certainly a
> bug elsewhere.

I noticed it quite recently, that nobody ever checked VECTOR_CST for
consistency, it is fixed about a week ago. But anyway, we have a loop
over dynamic list, where exit depends on a number, looks suspicious to
me. And still you can construct such a VECTOR_CST. So why should we
segfault, when we can just handle it...

>
>> -  /* This hook is cannot be called in response to something that the
>> -     user does (unlike the builtin expander) so we shouldn't ever see
>> -     an error generated from the extract.  */
>> -  gcc_assert (vec_mask > 0 && vec_mask <= 3);
>> +  /* Check whether the mask can be applied to the vector type.  */
>> +  if (vec_mask < 0 || vec_mask > 3)
>> +    return false;
>
> I'd very much prefer this check to be left in place.  Indeed, this
> points to the fact that you've incorrectly interpreted the spec for
> the OpenCL shuffle builtins: if the user writes { 9,15,33,101 } as
> a literal, you should interpret this with the n-1 mask in place,
> i.e.  { 1, 3, 1, 1 }.

The problem was different. vec_perm_ok can't handle every mask, for
instance mask with repeating element. And when I used it for the sake
of checking whether I can use a built-in permutation or not, I
received a false assertion as the answer. I think that this function
was never used in that way before, but what's wrong with just
returning false instead of false assertion?

>
> Which suggests that you shouldn't be handling VECTOR_CST in the new
> hooks at all.  You should handle that in generic code and call into
> the existing TARGET_VECTORIZE_BUILTIN_VEC_PERM hook.

Well, the shuffle functionality I introduce is more generic that
vec_perm we have. So I don't see any problem that expanding
builtin_shuffle I use vec_perm? Why should user check against vec_perm
and shuffle if he just wants to shuffle?

>
> The new hooks should *only* handle the variable permutation case.

Well, again, I thought that it would be nice to bring some
generalisation of these functions. I can't understand why it is so
bad?


Artem.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2010-08-16 18:53     ` Richard Henderson
@ 2010-08-16 19:24       ` Richard Henderson
  2010-08-17  9:36       ` Richard Guenther
  1 sibling, 0 replies; 71+ messages in thread
From: Richard Henderson @ 2010-08-16 19:24 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Andrew Pinski, Artem Shinkarov, gcc-patches

On 08/16/2010 11:53 AM, Richard Henderson wrote:
> Which suggests something like
> 
>   VEC_PERM_ELT (V1, V2, EMASK)
>   VEC_PERM_BYTE (V1, V2, BMASK)
> 
> where EMASK is element based indicies and BMASK is byte based
> indicies.  A target hook would determine if VEC_PERM_ELT or
> VEC_PERM_BYTE is preferred or possible for a given permutation.

... I forgot to add.  It was my intention that we handle single
operand shuffle (as opposed to shuffle2) via

  VEC_PERM_ELT (V1, V1, EMASK)

Recognizing that V1==V2 in the various places that actually
require that we distinguish the availability of native insns
(i.e. sse3 pshufb vs xop vpperm) should be easy.



r~

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2010-08-15 22:10   ` Richard Guenther
@ 2010-08-16 18:53     ` Richard Henderson
  2010-08-16 19:24       ` Richard Henderson
  2010-08-17  9:36       ` Richard Guenther
  0 siblings, 2 replies; 71+ messages in thread
From: Richard Henderson @ 2010-08-16 18:53 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Andrew Pinski, Artem Shinkarov, gcc-patches

On 08/15/2010 03:09 PM, Richard Guenther wrote:
> On the tree level we generally express target dependent features via
> builtins.  What tree code are you thinking of?  We have vector lowering
> for target unsupported stuff to allow optimizing - would that new tree code
> be target specific then (in that it appears only when target support
> is available)?
> 
> I think the hurdle to add a new tree code should be large - otherwise we'll
> just accumulate a mess.

In this case I think that a tree code would be best.

The problem is that the original shuffle is overloaded for all
vector types.  Which means that a single builtin function cannot
be type correct.  Since the user can define arbitrary vector 
types, and __builtin_shuffle is supposed to be generic, we cannot
possibly pre-define all of the decls required.

(Given that Artem doesn't introduce such a tree code and only
two builtins suggests that his testing is incomplete, because
this really ought to have failed in verify_types_in_gimple_stmt
somewhere.)

The big question is what type on which to define the permutation
vector.  While it is logical to use an integral vector of the 
same width as the output vector, the variable permutation case
for both x86 and powerpc would prefer to permute on bytes and
not the original element types.  Further, the constant permute
case for x86 would prefer to permute on the original element
types, and not have to re-interpret a byte permutation back into
the original element types.  If we always use either byte or
always use element permute then we'll have duplicate code in
the backends to compensate.  Better to handle both forms in the
middle-end, and ask the backend which is preferred.

Which suggests something like

  VEC_PERM_ELT (V1, V2, EMASK)
  VEC_PERM_BYTE (V1, V2, BMASK)

where EMASK is element based indicies and BMASK is byte based
indicies.  A target hook would determine if VEC_PERM_ELT or
VEC_PERM_BYTE is preferred or possible for a given permutation.

Permutations originating from the user via __builtin_shuffle
would originally be represented as VEC_PERM_ELT, and would be
lowered to VEC_PERM_BYTE in tree-vect-generic.c as required by
the aforementioned target hook.

Permutations originating from the vectorizer would be in the
desired form to begin with.  Given that it can already handle
relatively arbitrary MASK_TYPE, it should not be difficult to
modify the existing code to use the new tree codes.

This would allow quite a bit of cleanup in both the x86 and
the powerpc backends in my opinion.  At the moment we have an
ugly proliferation of target-specific permutation builtins
which serve no purpose except to satisfy type correctness.



r~

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2010-08-15 15:32 Artem Shinkarov
                   ` (2 preceding siblings ...)
  2010-08-15 18:26 ` Andrew Pinski
@ 2010-08-16 17:44 ` Richard Henderson
  2010-08-16 19:33   ` Artem Shinkarov
  2010-08-17  9:33   ` Richard Guenther
  3 siblings, 2 replies; 71+ messages in thread
From: Richard Henderson @ 2010-08-16 17:44 UTC (permalink / raw)
  To: Artem Shinkarov; +Cc: gcc-patches, Richard Guenther

Only looking closely at the i386 changes for now.

On 08/15/2010 07:30 AM, Artem Shinkarov wrote:
> +  /* Recursively grab the definition of the variable.  */
> +  while (TREE_CODE (mask) == SSA_NAME)
> +    {
> +      gimple maskdef = SSA_NAME_DEF_STMT (mask);
> +      if (gimple_assign_single_p (maskdef))
> +        mask = gimple_assign_rhs1 (maskdef);
> +      else
> +        break;
> +    }
> +

Err, surely copy-propagation has happened and this loop isn't needed.
In particular, I'd hope that MASK is *already* VECTOR_CST if it is
in fact constant.

> +          t = ix86_vectorize_builtin_vec_perm (TREE_TYPE (vec0), &m_type);
> +          
> +          if (t != NULL_TREE)
> +            {
> +              gimple c = gimple_build_call (t, 3, vec0, vec0, mask);
> +              gimple stmt = gsi_stmt (*gsi);
> +              gimple_call_set_lhs (c, gimple_call_lhs (stmt));
> +              gsi_replace (gsi, c, false);

You should need to use m_type here in casting the arguments.
In particular your existing mask could be V4SI with unsigned
elements, whereas the builtin takes V4SI with signed elements.

> +  /* If we cannot expand it via vec_perm, we will try to expand it 
> +     via PSHUFB instruction.  */
> +    {

Indentation is off.  Although it wouldn't be if you moved the
TARGET_SSE3 || TARGET_AVX test up to be an IF protecting this block,
which would also help readability.

> +          /* m1var = mm1var << 8*i */
> +          m1 = build2 (LSHIFT_EXPR, mtype, m1var, 
> +                        build_int_cst (TREE_TYPE (mtype), 8*i));
> +          t = force_gimple_operand_gsi (gsi, m1,
> +                            true, NULL_TREE, true, GSI_SAME_STMT);
> +          asgn = gimple_build_assign (m1var, t);
> +          gsi_insert_before (gsi, asgn , GSI_SAME_STMT);
> +
> +          /* mvar = mvar | m1var */
> +          m1 = build2 (BIT_IOR_EXPR, mtype, mvar, m1var);
> +          t = force_gimple_operand_gsi (gsi, m1,
> +                            true, NULL_TREE, true, GSI_SAME_STMT);
> +          asgn = gimple_build_assign (mvar, t);
> +          gsi_insert_before (gsi, asgn, GSI_SAME_STMT);

I don't believe this computes the proper values.  Suppose we're
trying to generate a permuation for a V4SI.  Suppose that [A-D]
are already masked to [0-3].  As far as I can see you'll produce

  t0 			= (v16qi){ A,0,0,0,B,0,0,0,C,0,0,0,D,0,0,0 }
  t1 = t0 << 8;		= (v16qi){ A,A,0,0,B,B,0,0,C,C,0,0,D,D,0,0 }
  t2 = t1 << 16;	= (v16qi){ A,A,A,A,B,B,B,B,C,C,C,C,D,D,D,D }

when what you really want is

  t2 = (v16qi){ A*4, A*4+1, A*4+2, A*4+3,
	        B*4, B*4+1, B*4+2, B*4+3,
		C*4, C*4+1, C*4+2, C*4+3,
		D*4, D*4+1, D*4+2, D*4+3 };

So:

  t0 = mask & (v4si){ 3, 3, 3, 3 };
  t1 = t0 * 4;

You ought to perform the permutation into T2 above in one step, not
explicit shifts.  You know this will succeed because you're already
assuming pshufb.

  t2 = __builtin_ia32_vec_perm_v16qi_u
	((v16qi)t1,
	 (v16qi){ 0,0,0,0, 4,4,4,4, 8,8,8,8, 12,12,12,12 });

You need to add an offset compensation vector dependent on the source
vector element size, e.g.

  t3 = t2 + (v16qi){ 0,1,2,3, 0,1,2,3, 0,1,2,3, 0,1,2,3 }

> +      if (fntype != NULL_TREE)

You already checked this above, although given that you tested for
SSE3, I don't think even that is needed.  You can assume it.

> +ix86_vectorize_builtin_shuffle2 (gimple_stmt_iterator *gsi, 
> +                                tree vec0, tree vec1, tree mask)

You ought to extract all of the pshufb code from above so that
you can re-use it here for the TARGET_XOP vpperm instruction,
which does in fact perform the two operand shuffle.

> -  for (i = 0; i < nelt; ++i, list = TREE_CHAIN (list))
> +  for (i = 0; i < nelt; ++i, list = 
> +                        (list == NULL_TREE ? NULL_TREE : TREE_CHAIN (list)))
>      {
>        unsigned HOST_WIDE_INT e;
> +      tree value;
> +
> +      if (list != NULL_TREE)
> +        value = TREE_VALUE (list);
> +      else
> +          value = fold_convert (TREE_TYPE (TREE_TYPE (cst)), 
> +                                integer_zero_node);

Um, when are you EVER going to see a VECTOR_CST of the wrong size,
with the wrong number of elements.  This is very most certainly a
bug elsewhere.

> -  /* This hook is cannot be called in response to something that the
> -     user does (unlike the builtin expander) so we shouldn't ever see
> -     an error generated from the extract.  */
> -  gcc_assert (vec_mask > 0 && vec_mask <= 3);
> +  /* Check whether the mask can be applied to the vector type.  */
> +  if (vec_mask < 0 || vec_mask > 3)
> +    return false;

I'd very much prefer this check to be left in place.  Indeed, this
points to the fact that you've incorrectly interpreted the spec for
the OpenCL shuffle builtins: if the user writes { 9,15,33,101 } as
a literal, you should interpret this with the n-1 mask in place,
i.e.  { 1, 3, 1, 1 }.

Which suggests that you shouldn't be handling VECTOR_CST in the new
hooks at all.  You should handle that in generic code and call into
the existing TARGET_VECTORIZE_BUILTIN_VEC_PERM hook.

The new hooks should *only* handle the variable permutation case.


r~

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2010-08-15 22:46   ` Richard Guenther
@ 2010-08-16 15:49     ` Chris Lattner
  0 siblings, 0 replies; 71+ messages in thread
From: Chris Lattner @ 2010-08-16 15:49 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Artem Shinkarov, gcc-patches


On Aug 15, 2010, at 3:10 PM, Richard Guenther wrote:

>> Great, thanks for working on this.  In the effort to make free software compilers agree with each other in this case, could you consider implementing the same extension that Clang provides?
>> http://clang.llvm.org/docs/LanguageExtensions.html#__builtin_shufflevector
>> 
>> The major difference is the naming of the builtin and that the index list is taken as a series of variadic arguments.
> 
> I don't see how you can implement OpenCL shuffle and shuffle2 with that.

This is what the V.xxxy and V.s1230 style shuffles are mapped onto.

-Chris 

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2010-08-15 16:12 ` Chris Lattner
  2010-08-15 18:56   ` Steven Bosscher
@ 2010-08-15 22:46   ` Richard Guenther
  2010-08-16 15:49     ` Chris Lattner
  1 sibling, 1 reply; 71+ messages in thread
From: Richard Guenther @ 2010-08-15 22:46 UTC (permalink / raw)
  To: Chris Lattner; +Cc: Artem Shinkarov, gcc-patches

On Sun, Aug 15, 2010 at 6:07 PM, Chris Lattner <clattner@apple.com> wrote:
>
> On Aug 15, 2010, at 7:30 AM, Artem Shinkarov wrote:
>
>> The patch implements vector shuffling according to the OpenCL
>> standard. The patch introduces builtin function __builtin_shuffle
>> which accepts two or three parameters: __builtin_shuffle (vec, mask)
>> or __builtin_shuffle (vec0, vec1, mask) and returns a shuffled vector.
>>
>> Function is trying to dispatch shuffling to the hardware-specific
>> shuffling instructions via new target hooks. If this attempt fails,
>> function expands shuffling piecewise.
>
> Great, thanks for working on this.  In the effort to make free software compilers agree with each other in this case, could you consider implementing the same extension that Clang provides?
> http://clang.llvm.org/docs/LanguageExtensions.html#__builtin_shufflevector
>
> The major difference is the naming of the builtin and that the index list is taken as a series of variadic arguments.

I don't see how you can implement OpenCL shuffle and shuffle2 with that.

Richard.

> -Chris

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2010-08-15 18:26 ` Andrew Pinski
@ 2010-08-15 22:10   ` Richard Guenther
  2010-08-16 18:53     ` Richard Henderson
  0 siblings, 1 reply; 71+ messages in thread
From: Richard Guenther @ 2010-08-15 22:10 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Artem Shinkarov, gcc-patches

On Sun, Aug 15, 2010 at 7:34 PM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Sun, Aug 15, 2010 at 7:30 AM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> The patch implements vector shuffling according to the OpenCL
>> standard. The patch introduces builtin function __builtin_shuffle
>> which accepts two or three parameters: __builtin_shuffle (vec, mask)
>> or __builtin_shuffle (vec0, vec1, mask) and returns a shuffled vector.
>>
>> Function is trying to dispatch shuffling to the hardware-specific
>> shuffling instructions via new target hooks. If this attempt fails,
>> function expands shuffling piecewise.
>
> I don't like the idea of a target hook here.  Opcodes seems like an
> easier way of adding support to new targets.  Not to mention maybe we
> should have a new tree code and a new rtl code which will allow the
> compiler to optimize these shuffles easier.

On the tree level we generally express target dependent features via
builtins.  What tree code are you thinking of?  We have vector lowering
for target unsupported stuff to allow optimizing - would that new tree code
be target specific then (in that it appears only when target support
is available)?

I think the hurdle to add a new tree code should be large - otherwise we'll
just accumulate a mess.

Richard.

> -- Pinski
>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2010-08-15 18:56   ` Steven Bosscher
@ 2010-08-15 21:23     ` Paolo Bonzini
  0 siblings, 0 replies; 71+ messages in thread
From: Paolo Bonzini @ 2010-08-15 21:23 UTC (permalink / raw)
  To: Steven Bosscher
  Cc: Chris Lattner, Artem Shinkarov, gcc-patches, Richard Guenther

On 08/15/2010 08:54 PM, Steven Bosscher wrote:
> On Sun, Aug 15, 2010 at 6:07 PM, Chris Lattner<clattner@apple.com>
> wrote:
>>
>> On Aug 15, 2010, at 7:30 AM, Artem Shinkarov wrote:
>>
>>> The patch implements vector shuffling according to the OpenCL
>>> standard. The patch introduces builtin function
>>> __builtin_shuffle which accepts two or three parameters:
>>> __builtin_shuffle (vec, mask) or __builtin_shuffle (vec0, vec1,
>>> mask) and returns a shuffled vector.
>>>
>>> Function is trying to dispatch shuffling to the
>>> hardware-specific shuffling instructions via new target hooks. If
>>> this attempt fails, function expands shuffling piecewise.
>>
>> Great, thanks for working on this.  In the effort to make free
>> software compilers agree with each other in this case, could you
>> consider implementing the same extension that Clang provides?
>> http://clang.llvm.org/docs/LanguageExtensions.html#__builtin_shufflevector
>>
>> The major difference is the naming of the builtin and that the
>> index  list is taken as a series of variadic arguments.
>
> It seems to me that the focus should first be on implementing the
> standard, and only look at extensions later. But perhaps you can
> file an enhancement request in Bugzilla when the patch is on the GCC
> trunk.

In fact, it would be even better if the three-argument variable was
named shuffle2, which is coherent with OpenCL C.

Paolo

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2010-08-15 16:12 ` Chris Lattner
@ 2010-08-15 18:56   ` Steven Bosscher
  2010-08-15 21:23     ` Paolo Bonzini
  2010-08-15 22:46   ` Richard Guenther
  1 sibling, 1 reply; 71+ messages in thread
From: Steven Bosscher @ 2010-08-15 18:56 UTC (permalink / raw)
  To: Chris Lattner; +Cc: Artem Shinkarov, gcc-patches, Richard Guenther

On Sun, Aug 15, 2010 at 6:07 PM, Chris Lattner <clattner@apple.com> wrote:
>
> On Aug 15, 2010, at 7:30 AM, Artem Shinkarov wrote:
>
>> The patch implements vector shuffling according to the OpenCL
>> standard. The patch introduces builtin function __builtin_shuffle
>> which accepts two or three parameters: __builtin_shuffle (vec, mask)
>> or __builtin_shuffle (vec0, vec1, mask) and returns a shuffled vector.
>>
>> Function is trying to dispatch shuffling to the hardware-specific
>> shuffling instructions via new target hooks. If this attempt fails,
>> function expands shuffling piecewise.
>
> Great, thanks for working on this.  In the effort to make free software compilers agree with each other in this case, could you consider implementing the same extension that Clang provides?
> http://clang.llvm.org/docs/LanguageExtensions.html#__builtin_shufflevector
>
> The major difference is the naming of the builtin and that the index list is taken as a series of variadic arguments.

It seems to me that the focus should first be on implementing the
standard, and only look at extensions later. But perhaps you can file
an enhancement request in Bugzilla when the patch is on the GCC trunk.

Ciao!
Steven

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2010-08-15 15:32 Artem Shinkarov
  2010-08-15 15:34 ` Joseph S. Myers
  2010-08-15 16:12 ` Chris Lattner
@ 2010-08-15 18:26 ` Andrew Pinski
  2010-08-15 22:10   ` Richard Guenther
  2010-08-16 17:44 ` Richard Henderson
  3 siblings, 1 reply; 71+ messages in thread
From: Andrew Pinski @ 2010-08-15 18:26 UTC (permalink / raw)
  To: Artem Shinkarov; +Cc: gcc-patches, Richard Guenther

On Sun, Aug 15, 2010 at 7:30 AM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> The patch implements vector shuffling according to the OpenCL
> standard. The patch introduces builtin function __builtin_shuffle
> which accepts two or three parameters: __builtin_shuffle (vec, mask)
> or __builtin_shuffle (vec0, vec1, mask) and returns a shuffled vector.
>
> Function is trying to dispatch shuffling to the hardware-specific
> shuffling instructions via new target hooks. If this attempt fails,
> function expands shuffling piecewise.

I don't like the idea of a target hook here.  Opcodes seems like an
easier way of adding support to new targets.  Not to mention maybe we
should have a new tree code and a new rtl code which will allow the
compiler to optimize these shuffles easier.

-- Pinski

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2010-08-15 15:32 Artem Shinkarov
  2010-08-15 15:34 ` Joseph S. Myers
@ 2010-08-15 16:12 ` Chris Lattner
  2010-08-15 18:56   ` Steven Bosscher
  2010-08-15 22:46   ` Richard Guenther
  2010-08-15 18:26 ` Andrew Pinski
  2010-08-16 17:44 ` Richard Henderson
  3 siblings, 2 replies; 71+ messages in thread
From: Chris Lattner @ 2010-08-15 16:12 UTC (permalink / raw)
  To: Artem Shinkarov; +Cc: gcc-patches, Richard Guenther


On Aug 15, 2010, at 7:30 AM, Artem Shinkarov wrote:

> The patch implements vector shuffling according to the OpenCL
> standard. The patch introduces builtin function __builtin_shuffle
> which accepts two or three parameters: __builtin_shuffle (vec, mask)
> or __builtin_shuffle (vec0, vec1, mask) and returns a shuffled vector.
> 
> Function is trying to dispatch shuffling to the hardware-specific
> shuffling instructions via new target hooks. If this attempt fails,
> function expands shuffling piecewise.

Great, thanks for working on this.  In the effort to make free software compilers agree with each other in this case, could you consider implementing the same extension that Clang provides?
http://clang.llvm.org/docs/LanguageExtensions.html#__builtin_shufflevector

The major difference is the naming of the builtin and that the index list is taken as a series of variadic arguments.

-Chris

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2010-08-15 15:56   ` Artem Shinkarov
@ 2010-08-15 16:04     ` Joseph S. Myers
  0 siblings, 0 replies; 71+ messages in thread
From: Joseph S. Myers @ 2010-08-15 16:04 UTC (permalink / raw)
  To: Artem Shinkarov; +Cc: gcc-patches, Richard Guenther

[-- Attachment #1: Type: TEXT/PLAIN, Size: 800 bytes --]

On Sun, 15 Aug 2010, Artem Shinkarov wrote:

> On Sun, Aug 15, 2010 at 4:32 PM, Joseph S. Myers
> <joseph@codesourcery.com> wrote:
> > On Sun, 15 Aug 2010, Artem Shinkarov wrote:
> >
> >>         * target.def: Target hooks for vector shuffle.
> >
> > New hooks should have their documentation in target.def, not in
> > tm.texi.in.
> 
> Ok, I'll move the documentation to target.def. But what about
> tm.texi.in? Should there be some documentation as well, or should it
> be blank?

You just put an @hook line there.  An existing example is 
TARGET_ASM_OUTPUT_SOURCE_FILENAME where there is just:

@hook TARGET_ASM_OUTPUT_SOURCE_FILENAME

in tm.texi.in and the documentation is automatically extracted from 
target.def for tm.texi.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2010-08-15 15:34 ` Joseph S. Myers
@ 2010-08-15 15:56   ` Artem Shinkarov
  2010-08-15 16:04     ` Joseph S. Myers
  0 siblings, 1 reply; 71+ messages in thread
From: Artem Shinkarov @ 2010-08-15 15:56 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: gcc-patches, Richard Guenther

Hi Joseph,

I'm really sorry about the copyright assignment, I've sent a signed
letter 10 days ago, but there is still no answer. Is there any chance
to check whether my letter was received or not?

On Sun, Aug 15, 2010 at 4:32 PM, Joseph S. Myers
<joseph@codesourcery.com> wrote:
> On Sun, 15 Aug 2010, Artem Shinkarov wrote:
>
>>         * target.def: Target hooks for vector shuffle.
>
> New hooks should have their documentation in target.def, not in
> tm.texi.in.

Ok, I'll move the documentation to target.def. But what about
tm.texi.in? Should there be some documentation as well, or should it
be blank?


> I have not otherwise looked at this patch since your
> copyright assignment does not yet seem to be on file.
>
> --
> Joseph S. Myers
> joseph@codesourcery.com
>



Artem Shinkarov

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Vector shuffling
  2010-08-15 15:32 Artem Shinkarov
@ 2010-08-15 15:34 ` Joseph S. Myers
  2010-08-15 15:56   ` Artem Shinkarov
  2010-08-15 16:12 ` Chris Lattner
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 71+ messages in thread
From: Joseph S. Myers @ 2010-08-15 15:34 UTC (permalink / raw)
  To: Artem Shinkarov; +Cc: gcc-patches, Richard Guenther

On Sun, 15 Aug 2010, Artem Shinkarov wrote:

>         * target.def: Target hooks for vector shuffle.

New hooks should have their documentation in target.def, not in 
tm.texi.in.  I have not otherwise looked at this patch since your 
copyright assignment does not yet seem to be on file.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Vector shuffling
@ 2010-08-15 15:32 Artem Shinkarov
  2010-08-15 15:34 ` Joseph S. Myers
                   ` (3 more replies)
  0 siblings, 4 replies; 71+ messages in thread
From: Artem Shinkarov @ 2010-08-15 15:32 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Guenther

[-- Attachment #1: Type: text/plain, Size: 2339 bytes --]

The patch implements vector shuffling according to the OpenCL
standard. The patch introduces builtin function __builtin_shuffle
which accepts two or three parameters: __builtin_shuffle (vec, mask)
or __builtin_shuffle (vec0, vec1, mask) and returns a shuffled vector.

Function is trying to dispatch shuffling to the hardware-specific
shuffling instructions via new target hooks. If this attempt fails,
function expands shuffling piecewise.

Changelog:

2010-08-15 Artem Shinkarov <artyom.shinkaroff@gmail.com>

        gcc/
        * builtins.def (BUILTIN_SHUFFLE): New built-in.
        * c-typeck.c (build_function_call_vec): Typecheck
        shuffle function, change return type.
        * Makefile.in: New include.
        * passes.c: Move lower_vector.
        * target.def: Target hooks for vector shuffle.
        * target.h: New include.
        * targhooks.c: Default expansions.
        * targhooks.h: New declarations.
        * tree.c (build_vector_from_val): Build a vector
        from sacalr. New function.
        * tree.h (build_vector_from_val): New definition.
        * tree-vect-generic.c (vector_element):
        Return vectors element at the position specified.
        New function.
        (lower_builtin_shuffle): Builtin expansion. New function.
        (expand_vector_operations_1): Handle expansion.
        (gate_expand_vector_operations_noop): New gate function.
        Change gate pass rules.
        * config/i386/i386.c (ix86_vectorize_builtin_shuffle):
        Expand builtin shuffle to hardware-specific instructions
        or return flase otherwise.
        (ix86_vectorize_builtin_shuffle2): Expand built-in
        shuffle with two input vectors to hardware-specific
        instructions or return false otherwise.
        (extract_vec_perm_cst): Handle situation when VECTOR_CST
        has less elements in the list than it should.
        (ix86_vectorize_builtin_vec_perm_ok): Return false instead
        of error when mask is invalid.

        gcc/testsuite/gcc.c-torture/execute/
        * vect-shuffle-1.c: New test.
        * vect-shuffle-2.c: New test.
        * vect-shuffle-3.c: New test.
        * vect-shuffle-4.c: New test.

        gcc/doc/
        * extend.texi: Adjust.
        * tm.texi: Adjust.
        * tm.texi.in: Adjust.


bootstrapped and tested on x86_64_unknown-linux.

[-- Attachment #2: vec-shuffle.v9.diff --]
[-- Type: text/x-diff, Size: 43689 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 163244)
+++ gcc/doc/extend.texi	(working copy)
@@ -6141,6 +6141,32 @@ minus or complement operators on a vecto
 elements are the negative or complemented values of the corresponding
 elements in the operand.
 
+Vector shuffling is available using functions 
+@code{__builtin_shuffle (vec, mask)} and 
+@code{__builtin_shuffle (vec0, vec1, mask)}. Both functions construct
+a permutation of elements from one or two vectors and return a vector
+of the same type as input vector(s). The mask is a vector of 
+integer-typed elements. The size of each element of the mask must be
+the same as the size of each input vector element. The number of 
+elements in input vector(s) and mask must be the same.
+
+The elements of the input vectors are numbered from left to right across
+one or both of the vectors. Each element in the mask specifies a number
+of element from the input vector(s). Consider the following example.
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{5,6,7,8@};
+v4si mask1 = @{0,1,1,3@};
+v4si mask2 = @{0,4,2,5@};
+v4si res;
+
+res = __builtin_shuffle (a, mask1);       /* res is @{1,2,2,4@}  */
+res = __builtin_shuffle2 (a, b, mask2);   /* res is @{1,5,3,6@}  */
+@end smallexample
+
 You can declare variables and use them in function calls and returns, as
 well as in assignments and some casts.  You can specify a vector type as
 a return type for a function.  Vector types can also be used as function
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 163244)
+++ gcc/doc/tm.texi	(working copy)
@@ -5710,6 +5710,18 @@ misalignment value (@var{misalign}).
 Return true if vector alignment is reachable (by peeling N iterations) for the given type.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_VECTORIZE_BUILTIN_SHUFFLE (gimple_stmt_iterator *@var{gsi}, tree @var{vec0}, tree @var{mask})
+This hook should find out if vector shuffling is possible within hardware instructions
+and replace the statement pointed by @var{gsi} with the appropriate hardware-specific
+functions. If the hardware expansion is unavailable, function returns false.
+@end deftypefn
+
+@deftypefn {Target Hook} bool TARGET_VECTORIZE_BUILTIN_SHUFFLE2 (gimple_stmt_iterator *@var{gsi}, tree @var{vec0}, tree @var{vec1}, tree @var{mask})
+This hook should find out if vector shuffling is possible within hardware instructions
+and replace the statement pointed by @var{gsi} with the appropriate hardware-specific
+functions. If the hardware expansion is unavailable, function returns false.
+@end deftypefn
+
 @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VEC_PERM (tree @var{type}, tree *@var{mask_element_type})
 Target builtin that implements vector permute.
 @end deftypefn
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 163244)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -5710,6 +5710,18 @@ misalignment value (@var{misalign}).
 Return true if vector alignment is reachable (by peeling N iterations) for the given type.
 @end deftypefn
 
+@hook TARGET_VECTORIZE_BUILTIN_SHUFFLE
+This hook should find out if vector shuffling is possible within hardware instructions
+and replace the statement pointed by @var{gsi} with the appropriate hardware-specific
+functions. If the hardware expansion is unavailable, function returns false.
+@end deftypefn
+
+@hook TARGET_VECTORIZE_BUILTIN_SHUFFLE2
+This hook should find out if vector shuffling is possible within hardware instructions
+and replace the statement pointed by @var{gsi} with the appropriate hardware-specific
+functions. If the hardware expansion is unavailable, function returns false.
+@end deftypefn
+
 @hook TARGET_VECTORIZE_BUILTIN_VEC_PERM
 Target builtin that implements vector permute.
 @end deftypefn
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c	(revision 163244)
+++ gcc/targhooks.c	(working copy)
@@ -472,6 +472,26 @@ default_builtin_vectorized_function (tre
   return NULL_TREE;
 }
 
+
+/* Vector shuffling functions */
+
+bool
+default_builtin_shuffle (gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, 
+                         tree vec0 ATTRIBUTE_UNUSED, 
+                         tree mask ATTRIBUTE_UNUSED)
+{
+  return false;
+}
+
+bool
+default_builtin_shuffle2 (gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, 
+                          tree vec0 ATTRIBUTE_UNUSED, 
+                          tree vec1 ATTRIBUTE_UNUSED, 
+                          tree mask ATTRIBUTE_UNUSED)
+{
+  return false;
+}
+
 /* Vectorized conversion.  */
 
 tree
Index: gcc/targhooks.h
===================================================================
--- gcc/targhooks.h	(revision 163244)
+++ gcc/targhooks.h	(working copy)
@@ -75,6 +75,12 @@ extern const char * default_invalid_with
 
 extern tree default_builtin_vectorized_function (tree, tree, tree);
 
+extern bool default_builtin_shuffle (gimple_stmt_iterator *gsi, tree vec0, 
+                                     tree mask);
+
+extern bool default_builtin_shuffle2 (gimple_stmt_iterator *gsi, tree vec0, 
+                                      tree vec1, tree mask);
+
 extern tree default_builtin_vectorized_conversion (unsigned int, tree, tree);
 
 extern int default_builtin_vectorization_cost (enum vect_cost_for_stmt, tree, int);
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 163244)
+++ gcc/target.def	(working copy)
@@ -829,6 +829,23 @@ DEFHOOK
  "",
  tree, (tree type, tree *mask_element_type), NULL)
 
+/* Target built-in that implements vector shuffling, or returns
+   false if not available.  */
+DEFHOOK
+(builtin_shuffle,
+ "",
+ bool, (gimple_stmt_iterator *gsi, tree vec0, tree mask),
+ default_builtin_shuffle)
+
+/* Target built-in that implements two vector shuffling, or returns
+   false if not available.  */
+DEFHOOK
+(builtin_shuffle2,
+ "",
+ bool, (gimple_stmt_iterator *gsi, tree vec0, tree vec1, tree mask),
+ default_builtin_shuffle2)
+
+
 /* Return true if a vector created for builtin_vec_perm is valid.  */
 DEFHOOK
 (builtin_vec_perm_ok,
Index: gcc/tree.c
===================================================================
--- gcc/tree.c	(revision 163244)
+++ gcc/tree.c	(working copy)
@@ -1358,6 +1358,27 @@ build_vector_from_ctor (tree type, VEC(c
   return build_vector (type, nreverse (list));
 }
 
+/* Build a vector of type VECTYPE where all the elements are SCs.  */
+tree
+build_vector_from_val (const tree sc, const tree vectype) 
+{
+  tree t = NULL_TREE;
+  int i, nunits = TYPE_VECTOR_SUBPARTS (vectype);
+
+  if (sc == error_mark_node)
+    return sc;
+
+  gcc_assert (TREE_TYPE (sc) == TREE_TYPE (vectype));
+
+  for (i = 0; i < nunits; ++i)
+    t = tree_cons (NULL_TREE, sc, t);
+
+  if (CONSTANT_CLASS_P (sc))
+    return build_vector (vectype, t);
+  else 
+    return build_constructor_from_list (vectype, t);
+}
+
 /* Return a new CONSTRUCTOR node whose type is TYPE and whose values
    are in the VEC pointed to by VALS.  */
 tree
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 163244)
+++ gcc/tree.h	(working copy)
@@ -4027,6 +4027,7 @@ extern tree build_int_cst_type (tree, HO
 extern tree build_int_cst_wide (tree, unsigned HOST_WIDE_INT, HOST_WIDE_INT);
 extern tree build_vector (tree, tree);
 extern tree build_vector_from_ctor (tree, VEC(constructor_elt,gc) *);
+extern tree build_vector_from_val (const tree, const tree);
 extern tree build_constructor (tree, VEC(constructor_elt,gc) *);
 extern tree build_constructor_single (tree, tree, tree);
 extern tree build_constructor_from_list (tree, tree);
Index: gcc/target.h
===================================================================
--- gcc/target.h	(revision 163244)
+++ gcc/target.h	(working copy)
@@ -51,6 +51,7 @@
 
 #include "tm.h"
 #include "insn-modes.h"
+#include "gimple.h"
 
 /* Types used by the record_gcc_switches() target function.  */
 typedef enum
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c	(revision 0)
@@ -0,0 +1,43 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shuf2compare(type, count, vres, v0, v1, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != ((vidx(type, mask, __i) < count) ? \
+                          vidx(type, v0, vidx(type, mask, __i)) :  \
+                          vidx(type, v1, (vidx(type, mask, __i) - count)))) \
+            __builtin_abort (); \
+        } \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+    vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+
+    //vector (8, short) mask = {1,2,5,4,3,6,7};
+    
+    vector (8, short) mask0 = {0,2,3,1,4,5,6,7};
+    vector (8, short) mask1 = {0,12,3,4,3,0,10,9};
+    vector (8, short) mask2 = {0,8,1,9,2,10,3,11};
+
+    v2 = __builtin_shuffle (v0, v1,  mask0);
+    shuf2compare (short, 8, v2, v0, v1, mask0);
+ 
+    v2 = __builtin_shuffle (v0, v1,  mask1);
+    shuf2compare (short, 8, v2, v0, v1, mask1);
+
+    v2 = __builtin_shuffle (v0, v1,  mask2);
+    shuf2compare (short, 8, v2, v0, v1, mask2);
+
+    v2 = __builtin_shuffle (mask0, mask0,  v0);
+    shuf2compare (short, 8, v2, mask0, mask0, v0);
+
+    return 0; 
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c	(revision 0)
@@ -0,0 +1,50 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shuf2compare(type, count, vres, v0, v1, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != ((vidx(type, mask, __i) < count) ? \
+                          vidx(type, v0, vidx(type, mask, __i)) :  \
+                          vidx(type, v1, (vidx(type, mask, __i) - count)))) \
+            __builtin_abort (); \
+        } \
+} while (0)
+
+
+vector (8, short) __attribute__ ((noinline))
+f (vector (8, short) x, vector (8, short) y, vector (8, short) mask) {
+    return __builtin_shuffle (x, y, mask);
+}
+
+
+
+int main (int argc, char *argv[]) {
+    vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+
+    //vector (8, short) mask = {1,2,5,4,3,6,7};
+    
+    vector (8, short) mask0 = {0,2,3,1,4,5,6,7};
+    vector (8, short) mask1 = {0,12,3,4,3,0,10,9};
+    vector (8, short) mask2 = {0,8,1,9,2,10,3,11};
+
+    v2 = f (v0, v1,  mask0);
+    shuf2compare (short, 8, v2, v0, v1, mask0);
+ 
+    v2 = f (v0, v1,  mask1);
+    shuf2compare (short, 8, v2, v0, v1, mask1);
+
+    v2 = f (v0, v1,  mask2);
+    shuf2compare (short, 8, v2, v0, v1, mask2);
+
+    v2 = f (mask0, mask0,  v0);
+    shuf2compare (short, 8, v2, mask0, mask0, v0);
+
+    return 0; 
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c	(revision 0)
@@ -0,0 +1,46 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shufcompare(type, count, vres, v0, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != vidx(type, v0, vidx(type, mask, __i))) \
+            __builtin_abort (); \
+    } \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+    /*vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+   
+    vector (8, short) smask = {0,0,1,2,3,4,5,6};
+    
+    v2 = __builtin_shuffle (v0,  smask);
+    shufcompare (short, 8, v2, v0, smask);
+    v2 = __builtin_shuffle (v0, v1);
+    shufcompare (short, 8, v2, v0, v1);
+    v2 = __builtin_shuffle (smask, v0);
+    shufcompare (short, 8, v2, smask, v0);*/
+
+    vector (4, int) i0 = {argc, 1,2,3};
+    vector (4, int) i1 = {argc, 1, argc, 3};
+    vector (4, int) i2;
+
+    vector (4, int) imask = {0,3,2,1};
+
+    /*i2 = __builtin_shuffle (i0, imask);
+    shufcompare (int, 4, i2, i0, imask);*/
+    i2 = __builtin_shuffle (i0, i1);
+    shufcompare (int, 4, i2, i0, i1);
+    
+    i2 = __builtin_shuffle (imask, i0);
+    shufcompare (int, 4, i2, imask, i0);
+    
+    return 0;
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c	(revision 0)
@@ -0,0 +1,36 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define shufcompare(type, count, vres, v0, mask) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i++) { \
+        if (vidx(type, vres, __i) != vidx(type, v0, vidx(type, mask, __i))) \
+            __builtin_abort (); \
+    } \
+} while (0)
+
+vector (8, short) __attribute__ ((noinline))
+f (vector (8, short) x, vector (8, short) mask) {
+    return __builtin_shuffle (x, mask);
+}
+
+
+int main (int argc, char *argv[]) {
+    vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
+    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
+    vector (8, short) v2;
+
+    vector (8, short) mask = {0,0,1,2,3,4,5,6};
+    
+    v2 = f (v0,  mask);
+    shufcompare (short, 8, v2, v0, mask);
+
+    v2 = f (v0, v1);
+    shufcompare (short, 8, v2, v0, v1);
+
+    return 0;
+}
+
Index: gcc/builtins.def
===================================================================
--- gcc/builtins.def	(revision 163244)
+++ gcc/builtins.def	(working copy)
@@ -708,6 +708,8 @@ DEF_GCC_BUILTIN        (BUILT_IN_VA_ARG_
 DEF_EXT_LIB_BUILTIN    (BUILT_IN__EXIT, "_exit", BT_FN_VOID_INT, ATTR_NORETURN_NOTHROW_LIST)
 DEF_C99_BUILTIN        (BUILT_IN__EXIT2, "_Exit", BT_FN_VOID_INT, ATTR_NORETURN_NOTHROW_LIST)
 
+DEF_GCC_BUILTIN        (BUILT_IN_SHUFFLE, "shuffle", BT_FN_INT_VAR, ATTR_CONST_NOTHROW_TYPEGENERIC)
+
 /* Implementing nested functions.  */
 DEF_BUILTIN_STUB (BUILT_IN_INIT_TRAMPOLINE, "__builtin_init_trampoline")
 DEF_BUILTIN_STUB (BUILT_IN_ADJUST_TRAMPOLINE, "__builtin_adjust_trampoline")
Index: gcc/c-typeck.c
===================================================================
--- gcc/c-typeck.c	(revision 163244)
+++ gcc/c-typeck.c	(working copy)
@@ -2794,6 +2794,68 @@ build_function_call_vec (location_t loc,
       && !check_builtin_function_arguments (fundecl, nargs, argarray))
     return error_mark_node;
 
+  /* Typecheck a builtin function which is declared with variable
+     argument list.  */
+  if (fundecl && DECL_BUILT_IN (fundecl)
+      && DECL_BUILT_IN_CLASS (fundecl) == BUILT_IN_NORMAL)
+    {
+      enum built_in_function fcode = DECL_FUNCTION_CODE (fundecl);
+      if (fcode == BUILT_IN_SHUFFLE) 
+        {
+          tree firstarg = VEC_index (tree, params, 0);
+          tree mask = VEC_index (tree, params, nargs - 1);
+
+          if (nargs != 2 && nargs != 3)
+            {
+              error_at (loc, "__builtin_shuffle accepts 2 or 3 argumensts");
+              return error_mark_node;
+            }
+
+          if (TREE_CODE (TREE_TYPE (mask)) != VECTOR_TYPE
+              || TREE_CODE (TREE_TYPE (TREE_TYPE (mask))) != INTEGER_TYPE)
+            {
+              error_at (loc, "__builtin_shuffle last argument must "
+                             "be an integer vector");
+              return error_mark_node;
+            }
+           
+          if (TREE_CODE (TREE_TYPE (firstarg)) != VECTOR_TYPE
+              || (nargs == 3 
+                  && TREE_CODE (TREE_TYPE (VEC_index (tree, params, 1))) 
+                     != VECTOR_TYPE))
+            {
+              error_at (loc, "__builtin_shuffle arguments must be vectors");
+              return error_mark_node;
+            }
+
+          if ((TYPE_VECTOR_SUBPARTS (TREE_TYPE (firstarg)) 
+                 != TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask)))
+              || (nargs == 3 
+                  && TYPE_VECTOR_SUBPARTS (
+                            TREE_TYPE (VEC_index (tree, params, 1)))
+                     != TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask))))
+            {
+              error_at (loc, "__builtin_shuffle number of elements of the "
+                             "argument vector(s) and the mask vector should "
+                             "be the same");
+              return error_mark_node;
+            }
+         
+          /* Here we change the return type of the builtin function 
+             from int f(...) --> t f(...) where t is a type of the 
+             first argument.  */
+          fundecl = copy_node (fundecl);
+          TREE_TYPE (fundecl) = build_function_type (TREE_TYPE (firstarg),
+                                        TYPE_ARG_TYPES (TREE_TYPE (fundecl)));
+          function = build_fold_addr_expr (fundecl);
+          result = build_call_array_loc (loc, TREE_TYPE (firstarg),
+		        function, nargs, argarray);
+          return require_complete_type (result);
+        }
+    }
+
+
+
   /* Check that the arguments to the function are valid.  */
   check_function_arguments (TYPE_ATTRIBUTES (fntype), nargs, argarray,
 			    TYPE_ARG_TYPES (fntype));
@@ -6005,10 +6067,17 @@ digest_init (location_t init_loc, tree t
 	  tree value;
 	  bool constant_p = true;
 
-	  /* Iterate through elements and check if all constructor
+	  /* If constructor has less elements than the vector type.  */
+          if (CONSTRUCTOR_NELTS (inside_init) 
+              < TYPE_VECTOR_SUBPARTS (TREE_TYPE (inside_init)))
+            warning_at (init_loc, 0, "vector length does not match "
+                                     "initializer length, zero elements "
+                                     "will be inserted");
+          
+          /* Iterate through elements and check if all constructor
 	     elements are *_CSTs.  */
 	  FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (inside_init), ix, value)
-	    if (!CONSTANT_CLASS_P (value))
+            if (!CONSTANT_CLASS_P (value))
 	      {
 		constant_p = false;
 		break;
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	(revision 163244)
+++ gcc/tree-vect-generic.c	(working copy)
@@ -30,6 +30,8 @@ along with GCC; see the file COPYING3.  
 #include "tree-pass.h"
 #include "flags.h"
 #include "ggc.h"
+#include "target.h"
+#include "diagnostic.h"
 
 /* Need to include rtl.h, expr.h, etc. for optabs.  */
 #include "expr.h"
@@ -383,6 +385,277 @@ type_for_widest_vector_mode (enum machin
     }
 }
 
+
+/* Build a reference to the element of the vector VECT. Function 
+   returns either the element itself, either BIT_FIELD_REF, or an 
+   ARRAY_REF expression.
+   
+   GSI is requred to insert temporary variables while building a
+   refernece to the element of the vector VECT.
+   
+   PTMPVEC is a pointer to the temporary variable for caching
+   purposes. In case when PTMPVEC is NULL new temporary variable
+   will be created.  */
+static tree
+vector_element (gimple_stmt_iterator *gsi, tree vect, tree idx, tree *ptmpvec)
+{
+  tree type;
+  gimple asgn; 
+  unsigned HOST_WIDE_INT maxval;
+  tree tmpvec; 
+  tree indextype, arraytype;
+  bool need_asgn = true;
+
+  gcc_assert (TREE_CODE (TREE_TYPE (vect)) == VECTOR_TYPE);
+
+  type = TREE_TYPE (vect);
+  if (TREE_CODE (idx) == INTEGER_CST)
+    {
+      unsigned HOST_WIDE_INT index;
+
+      if (!host_integerp (idx, 1)
+           || (index = tree_low_cst (idx, 1)) > TYPE_VECTOR_SUBPARTS (type)-1)
+        return error_mark_node;
+
+      if (TREE_CODE (vect) == VECTOR_CST)
+        {
+            unsigned i;
+            tree vals = TREE_VECTOR_CST_ELTS (vect);
+            for (i = 0; vals; vals = TREE_CHAIN (vals), ++i)
+              if (i == index)
+                 return TREE_VALUE (vals);
+            return error_mark_node;
+        }
+      else if (TREE_CODE (vect) == CONSTRUCTOR)
+        {
+          unsigned i;
+          VEC (constructor_elt, gc) *vals = CONSTRUCTOR_ELTS (vect);
+          constructor_elt *elt;
+
+          for (i = 0; VEC_iterate (constructor_elt, vals, i, elt); i++)
+            if (operand_equal_p (elt->index, idx, 0))
+              return elt->value; 
+          return fold_convert (TREE_TYPE (type), integer_zero_node);
+        }
+      else if (TREE_CODE (vect) == SSA_NAME)
+        {
+          tree el;
+          gimple vectdef = SSA_NAME_DEF_STMT (vect);
+          if (gimple_assign_single_p (vectdef)
+              && (el = vector_element (gsi, gimple_assign_rhs1 (vectdef), 
+                                       idx, ptmpvec)) 
+                 != error_mark_node)
+            return el;
+          else
+            {
+              tree size = TYPE_SIZE (TREE_TYPE (type));
+              tree pos = fold_build2 (MULT_EXPR, TREE_TYPE (idx), 
+                                      idx, size);
+              return fold_build3 (BIT_FIELD_REF, TREE_TYPE (type), 
+                             vect, size, pos);
+            }
+        }
+      else
+        return error_mark_node;
+    }
+  
+  if (!ptmpvec)
+    tmpvec = create_tmp_var (TREE_TYPE (vect), "vectmp");
+  else if (!*ptmpvec)
+    tmpvec = *ptmpvec = create_tmp_var (TREE_TYPE (vect), "vectmp");
+  else
+    {
+      tmpvec = *ptmpvec;
+      need_asgn = false;
+    }
+  
+  if (need_asgn)
+    {
+      TREE_ADDRESSABLE (tmpvec) = 1;
+      asgn = gimple_build_assign (tmpvec, vect);
+      gsi_insert_before (gsi, asgn, GSI_SAME_STMT);
+    }
+
+  maxval = TYPE_VECTOR_SUBPARTS (TREE_TYPE (vect)) -1;
+  indextype = build_index_type (size_int (maxval));
+  arraytype = build_array_type (TREE_TYPE (type), indextype);
+  
+  return build4 (ARRAY_REF, TREE_TYPE (type),
+                 build1 (VIEW_CONVERT_EXPR, arraytype, tmpvec),
+                 idx, NULL_TREE, NULL_TREE);
+
+
+}
+
+/* Lower built-in vector shuffle function. Function can have two or
+   three arguments.
+   When function has two arguments: __builtin_shuffle (v0, mask, 
+   the lowered version would be {v0[mask[0]], v0[mask[1]], ...}
+   MASK and V0 must have the same number of elements.
+        
+   In case of three arguments: __builtin_shuffle (v0, v1, mask)
+   the lowered version would be: 
+         {mask[0] < len(v0) ? v0[mask[0]] : v1[mask[0]], ...}
+   V0 and V1 must have the same type. MASK, V0, V1 must have the
+   same number of arguments.  */
+static void
+lower_builtin_shuffle (gimple_stmt_iterator *gsi, location_t loc)
+{
+#define TRAP_RETURN(new_stmt, stmt, gsi, vec0) \
+do { \
+  new_stmt = gimple_build_call (built_in_decls[BUILT_IN_TRAP], 0); \
+  gsi_insert_before (gsi, new_stmt,  GSI_SAME_STMT); \
+  split_block (gimple_bb (new_stmt), new_stmt); \
+  new_stmt = gimple_build_assign (gimple_call_lhs (stmt), vec0); \
+  gsi_replace (gsi, new_stmt, false); \
+  return; \
+} while (0) 
+ 
+  gimple stmt = gsi_stmt (*gsi);
+  unsigned numargs = gimple_call_num_args (stmt);
+  tree mask = gimple_call_arg (stmt, numargs - 1);
+  tree vec0 = gimple_call_arg (stmt, 0);
+  unsigned els = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask));
+  tree type0 = TREE_TYPE (TREE_TYPE (vec0));
+  VEC(constructor_elt,gc) *v = NULL;
+  tree vectype, constr;
+  gimple new_stmt;
+  tree vec0tmp = NULL_TREE, masktmp = NULL_TREE;
+
+  if (numargs == 2)
+    {
+      unsigned i;
+      tree vec0tmp = NULL_TREE;
+      
+      if (targetm.vectorize.builtin_shuffle (gsi, vec0, mask))
+        {
+          /* Built-in is expanded by target.  */
+          return ;
+        }
+
+      v = VEC_alloc (constructor_elt, gc, els);
+      for (i = 0; i < els; i++)
+        {
+          tree idxval, vecel, t;
+          if ((idxval = vector_element (gsi, mask, size_int (i), &masktmp))
+              == error_mark_node)
+            {
+              warning_at (loc, 0, "Invalid shuffling mask index %i", i);
+              TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+            }
+
+          if ((vecel = vector_element (gsi, vec0, idxval, &vec0tmp))
+              == error_mark_node)
+            {
+              warning_at (loc, 0, "Invalid shuffling arguments");
+              TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+            }
+          
+          t = force_gimple_operand_gsi (gsi, vecel, true, 
+                                    NULL_TREE, true, GSI_SAME_STMT);
+          CONSTRUCTOR_APPEND_ELT (v, size_int (i), t);
+        }
+    }
+  else if (numargs == 3) 
+    {
+      unsigned i;
+      tree vec1 = gimple_call_arg (stmt, 1);
+      tree var = create_tmp_var (type0, "vecel");
+      tree vec1tmp = NULL_TREE;
+
+      if (targetm.vectorize.builtin_shuffle2 (gsi, vec0, vec1, mask))
+        {
+          /* Built-in is expanded by target.  */
+          return ;
+        }
+
+      v = VEC_alloc (constructor_elt, gc, els);
+      for (i = 0; i < els; i++)
+        {
+          tree idxval, idx1val, cond, elval0, elval1, condexpr, t, ssatmp;
+          tree vec0el, vec1el;
+          gimple asgn;
+          
+          if ((idxval = vector_element (gsi, mask, size_int (i), &masktmp))
+              == error_mark_node)
+            {
+              warning_at (loc, 0, "Invalid shuffling mask index %i", i);
+              TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+            }
+          
+          if (TREE_CODE (idxval) == INTEGER_CST)
+            {
+              if (tree_int_cst_lt (idxval, size_int (els)))
+                {
+                  vec0el = vector_element (gsi, vec0, idxval, &vec0tmp);
+                  t = force_gimple_operand_gsi (gsi, vec0el,
+                                    true, NULL_TREE, true, GSI_SAME_STMT);
+                }
+              else if (tree_int_cst_lt (idxval, size_int (2*els)))
+                {
+                  idx1val = fold_build2 (MINUS_EXPR, TREE_TYPE (idxval),
+                        idxval, build_int_cst (TREE_TYPE (idxval), els));
+                  
+                  vec1el = vector_element (gsi, vec1, idx1val, &vec1tmp);
+                  t = force_gimple_operand_gsi (gsi, vec1el, 
+                                    true, NULL_TREE, true, GSI_SAME_STMT); 
+                }
+              else
+                {
+                  warning_at (loc, 0, "Invalid shuffling mask index %i", i);
+                  TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+                }
+            }
+          else
+            {
+
+              idx1val = fold_build2 (MINUS_EXPR, TREE_TYPE (idxval),
+                            idxval, build_int_cst (TREE_TYPE (idxval), els));
+              idx1val = force_gimple_operand_gsi (gsi, idx1val, 
+                                true, NULL_TREE, true, GSI_SAME_STMT);
+              cond = build2 (GT_EXPR, boolean_type_node, \
+                             idxval, convert (type0, size_int (els - 1)));
+              
+              if ((vec0el = vector_element (gsi, vec0, idxval, &vec0tmp))
+                  == error_mark_node)
+                {
+                  warning_at (loc, 0, "Invalid shuffling arguments");
+                  TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+                }
+              elval0 = force_gimple_operand_gsi (gsi, vec0el, 
+                                true, NULL_TREE, true, GSI_SAME_STMT);
+            
+              if ((vec1el = vector_element (gsi, vec1, idx1val, &vec1tmp))
+                  == error_mark_node)
+                {
+                  warning_at (loc, 0, "Invalid shuffling arguments");
+                  TRAP_RETURN (new_stmt, stmt, gsi, vec0);
+                }
+
+              elval1 = force_gimple_operand_gsi (gsi, vec1el,
+                                true, NULL_TREE, true, GSI_SAME_STMT);
+
+              condexpr = fold_build3 (COND_EXPR, type0, cond, \
+                                      elval1, elval0);
+
+              t = force_gimple_operand_gsi (gsi, condexpr, true, \
+                                        NULL_TREE, true, GSI_SAME_STMT);
+            }
+          
+          asgn = gimple_build_assign (var, t);
+          ssatmp = make_ssa_name (var, asgn);
+          gimple_assign_set_lhs (asgn, ssatmp);
+          gsi_insert_before (gsi, asgn, GSI_SAME_STMT);
+          CONSTRUCTOR_APPEND_ELT (v, size_int (i), ssatmp);
+        }
+    }
+  
+  vectype = build_vector_type (type0, els);
+  constr = build_constructor (vectype, v);
+  new_stmt = gimple_build_assign (gimple_call_lhs (stmt), constr);
+  gsi_replace (gsi, new_stmt, false);
+}
+
 /* Process one statement.  If we identify a vector operation, expand it.  */
 
 static void
@@ -396,6 +669,12 @@ expand_vector_operations_1 (gimple_stmt_
   enum gimple_rhs_class rhs_class;
   tree new_rhs;
 
+  if (gimple_call_builtin_p (stmt, BUILT_IN_SHUFFLE))
+    {
+      lower_builtin_shuffle (gsi, gimple_location (stmt));
+      gimple_set_modified (gsi_stmt (*gsi), true);
+    }
+  
   if (gimple_code (stmt) != GIMPLE_ASSIGN)
     return;
 
@@ -521,10 +800,11 @@ expand_vector_operations_1 (gimple_stmt_
 /* Use this to lower vector operations introduced by the vectorizer,
    if it may need the bit-twiddling tricks implemented in this file.  */
 
+
 static bool
-gate_expand_vector_operations (void)
+gate_expand_vector_operations_noop (void)
 {
-  return flag_tree_vectorize != 0;
+  return optimize == 0;
 }
 
 static unsigned int
@@ -549,7 +829,7 @@ struct gimple_opt_pass pass_lower_vector
  {
   GIMPLE_PASS,
   "veclower",				/* name */
-  0,					/* gate */
+  gate_expand_vector_operations_noop,   /* gate */
   expand_vector_operations,		/* execute */
   NULL,					/* sub */
   NULL,					/* next */
@@ -559,8 +839,8 @@ struct gimple_opt_pass pass_lower_vector
   0,					/* properties_provided */
   0,					/* properties_destroyed */
   0,					/* todo_flags_start */
-  TODO_dump_func | TODO_ggc_collect
-    | TODO_verify_stmts			/* todo_flags_finish */
+  TODO_dump_func | TODO_update_ssa | TODO_ggc_collect
+    | TODO_verify_stmts | TODO_cleanup_cfg/* todo_flags_finish */
  }
 };
 
@@ -569,7 +849,7 @@ struct gimple_opt_pass pass_lower_vector
  {
   GIMPLE_PASS,
   "veclower2",				/* name */
-  gate_expand_vector_operations,	/* gate */
+  0,	                                /* gate */
   expand_vector_operations,		/* execute */
   NULL,					/* sub */
   NULL,					/* next */
@@ -582,6 +862,7 @@ struct gimple_opt_pass pass_lower_vector
   TODO_dump_func | TODO_update_ssa	/* todo_flags_finish */
     | TODO_verify_ssa
     | TODO_verify_stmts | TODO_verify_flow
+    | TODO_cleanup_cfg
  }
 };
 
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 163244)
+++ gcc/Makefile.in	(working copy)
@@ -864,7 +864,7 @@ endif
 VEC_H = vec.h statistics.h
 EXCEPT_H = except.h $(HASHTAB_H) vecprim.h vecir.h
 TOPLEV_H = toplev.h $(INPUT_H) bversion.h $(DIAGNOSTIC_CORE_H)
-TARGET_H = $(TM_H) target.h target.def insn-modes.h
+TGT = $(TM_H) target.h target.def insn-modes.h
 MACHMODE_H = machmode.h mode-classes.def insn-modes.h
 HOOKS_H = hooks.h $(MACHMODE_H)
 HOSTHOOKS_DEF_H = hosthooks-def.h $(HOOKS_H)
@@ -886,8 +886,9 @@ TREE_H = tree.h all-tree.def tree.def c-
 REGSET_H = regset.h $(BITMAP_H) hard-reg-set.h
 BASIC_BLOCK_H = basic-block.h $(PREDICT_H) $(VEC_H) $(FUNCTION_H) cfghooks.h
 GIMPLE_H = gimple.h gimple.def gsstruct.def pointer-set.h $(VEC_H) \
-	$(GGC_H) $(BASIC_BLOCK_H) $(TM_H) $(TARGET_H) tree-ssa-operands.h \
+	$(GGC_H) $(BASIC_BLOCK_H) $(TM_H) $(TGT) tree-ssa-operands.h \
 	tree-ssa-alias.h vecir.h
+TARGET_H = $(TGT) gimple.h
 GCOV_IO_H = gcov-io.h gcov-iov.h auto-host.h
 COVERAGE_H = coverage.h $(GCOV_IO_H)
 DEMANGLE_H = $(srcdir)/../include/demangle.h
@@ -3156,7 +3157,7 @@ tree-vect-generic.o : tree-vect-generic.
     $(TM_H) $(TREE_FLOW_H) $(GIMPLE_H) tree-iterator.h $(TREE_PASS_H) \
     $(FLAGS_H) $(OPTABS_H) $(MACHMODE_H) $(EXPR_H) \
     langhooks.h $(FLAGS_H) $(DIAGNOSTIC_H) gt-tree-vect-generic.h $(GGC_H) \
-    coretypes.h insn-codes.h
+    coretypes.h insn-codes.h target.h diagnostic.h
 df-core.o : df-core.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    insn-config.h $(RECOG_H) $(FUNCTION_H) $(REGS_H) alloc-pool.h \
    hard-reg-set.h $(BASIC_BLOCK_H) $(DF_H) $(BITMAP_H) sbitmap.h $(TIMEVAR_H) \
Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 163244)
+++ gcc/passes.c	(working copy)
@@ -735,7 +735,6 @@ init_optimization_passes (void)
   NEXT_PASS (pass_refactor_eh);
   NEXT_PASS (pass_lower_eh);
   NEXT_PASS (pass_build_cfg);
-  NEXT_PASS (pass_lower_vector);
   NEXT_PASS (pass_warn_function_return);
   NEXT_PASS (pass_build_cgraph_edges);
   NEXT_PASS (pass_inline_parameters);
@@ -763,6 +762,7 @@ init_optimization_passes (void)
 
       NEXT_PASS (pass_referenced_vars);
       NEXT_PASS (pass_build_ssa);
+      NEXT_PASS (pass_lower_vector);
       NEXT_PASS (pass_early_warn_uninitialized);
       /* Note that it is not strictly necessary to schedule an early
 	 inline pass here.  However, some test cases (e.g.,
@@ -915,7 +915,6 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_vectorize);
 	    {
 	      struct opt_pass **p = &pass_vectorize.pass.sub;
-	      NEXT_PASS (pass_lower_vector_ssa);
 	      NEXT_PASS (pass_dce_loop);
 	    }
           NEXT_PASS (pass_predcom);
@@ -926,6 +925,7 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_iv_optimize);
 	  NEXT_PASS (pass_tree_loop_done);
 	}
+      NEXT_PASS (pass_lower_vector_ssa);
       NEXT_PASS (pass_cse_reciprocals);
       NEXT_PASS (pass_reassoc);
       NEXT_PASS (pass_vrp);
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 163244)
+++ gcc/config/i386/i386.c	(working copy)
@@ -25,6 +25,7 @@ along with GCC; see the file COPYING3.  
 #include "tm.h"
 #include "rtl.h"
 #include "tree.h"
+#include "tree-flow.h"
 #include "tm_p.h"
 #include "regs.h"
 #include "hard-reg-set.h"
@@ -27855,6 +27856,9 @@ struct expand_vec_perm_d
 
 static bool expand_vec_perm_1 (struct expand_vec_perm_d *d);
 static bool expand_vec_perm_broadcast_1 (struct expand_vec_perm_d *d);
+static int extract_vec_perm_cst (struct expand_vec_perm_d *, tree);
+static bool ix86_vectorize_builtin_vec_perm_ok (tree vec_type, tree mask);
+
 
 /* Get a vector mode of the same size as the original but with elements
    twice as wide.  This is only guaranteed to apply to integral vectors.  */
@@ -30222,6 +30226,182 @@ ix86_vectorize_builtin_vec_perm (tree ve
   return ix86_builtins[(int) fcode];
 }
 
+/*  Lower shuffling the vector VEC0 as specified by MASK. Replaces the
+    statement at *GSI with a target specific sequence implementing the
+    shuffle operation and returns true.  Returns false if no target
+    specific sequence for this shuffle operation exists.  */
+static bool
+ix86_vectorize_builtin_shuffle (gimple_stmt_iterator *gsi, 
+                                tree vec0, tree mask)
+{
+  if (!VECTOR_MODE_P (TYPE_MODE (TREE_TYPE (vec0))))
+    return false;
+ 
+  /* Recursively grab the definition of the variable.  */
+  while (TREE_CODE (mask) == SSA_NAME)
+    {
+      gimple maskdef = SSA_NAME_DEF_STMT (mask);
+      if (gimple_assign_single_p (maskdef))
+        mask = gimple_assign_rhs1 (maskdef);
+      else
+        break;
+    }
+
+  if (TREE_CODE (mask) == VECTOR_CST)
+    {
+      tree t, m_type;
+      if (ix86_vectorize_builtin_vec_perm_ok (TREE_TYPE (vec0), mask))
+        {
+          t = ix86_vectorize_builtin_vec_perm (TREE_TYPE (vec0), &m_type);
+          
+          if (t != NULL_TREE)
+            {
+              gimple c = gimple_build_call (t, 3, vec0, vec0, mask);
+              gimple stmt = gsi_stmt (*gsi);
+              gimple_call_set_lhs (c, gimple_call_lhs (stmt));
+              gsi_replace (gsi, c, false);
+              return true;
+            }
+        }
+    }
+  /* If we cannot expand it via vec_perm, we will try to expand it 
+     via PSHUFB instruction.  */
+    {
+      tree mtype = TREE_TYPE (mask);
+      unsigned HOST_WIDE_INT i = 1, w = TYPE_VECTOR_SUBPARTS (mtype);
+      tree mcst, c, m1;
+      tree mvar, m1var, t;
+      tree fntype;
+      gimple asgn;
+      
+      if (tree_low_cst (TYPE_SIZE (mtype), 1) != 128)
+        return false;
+
+      if (!TARGET_SSE3 && !TARGET_AVX)
+        return false;
+
+      if (NULL_TREE == (fntype = ix86_builtins[(int) IX86_BUILTIN_PSHUFB128]))
+        return false;
+
+      mvar = create_tmp_var (mtype, "mask");
+      m1var = create_tmp_var (mtype, "nmask");
+
+      c = build_int_cst (TREE_TYPE (mtype), w-1);
+      mcst = build_vector_from_val (c, mtype);
+      
+      /* mvar = mask & {w-1, w-1, w-1,...} */
+      m1 = build2 (BIT_AND_EXPR, mtype, mask, mcst);
+      t = force_gimple_operand_gsi (gsi, m1,
+                            true, NULL_TREE, true, GSI_SAME_STMT);
+      asgn = gimple_build_assign (mvar, t);
+      gsi_insert_before (gsi, asgn, GSI_SAME_STMT);
+      
+      /* m1var = mvar */
+      asgn = gimple_build_assign (m1var, t);
+      gsi_insert_before (gsi, asgn, GSI_SAME_STMT);
+
+      while (w != 16)
+        {
+          /* m1var = mm1var << 8*i */
+          m1 = build2 (LSHIFT_EXPR, mtype, m1var, 
+                        build_int_cst (TREE_TYPE (mtype), 8*i));
+          t = force_gimple_operand_gsi (gsi, m1,
+                            true, NULL_TREE, true, GSI_SAME_STMT);
+          asgn = gimple_build_assign (m1var, t);
+          gsi_insert_before (gsi, asgn , GSI_SAME_STMT);
+
+          /* mvar = mvar | m1var */
+          m1 = build2 (BIT_IOR_EXPR, mtype, mvar, m1var);
+          t = force_gimple_operand_gsi (gsi, m1,
+                            true, NULL_TREE, true, GSI_SAME_STMT);
+          asgn = gimple_build_assign (mvar, t);
+          gsi_insert_before (gsi, asgn, GSI_SAME_STMT);
+
+          /* m1var = mvar */
+          t = force_gimple_operand_gsi (gsi, mvar,
+                            true, NULL_TREE, true, GSI_SAME_STMT);
+          asgn = gimple_build_assign (m1var, t);
+          gsi_insert_before (gsi, asgn, GSI_SAME_STMT);
+
+          w *= 2;
+          i *= 2;
+        }
+
+      if (fntype != NULL_TREE)
+        {
+            tree v, m, r, ctype;
+            gimple c, stmt, asgn;
+            
+            ctype = build_vector_type (char_type_node, 16);
+            r = create_tmp_var (ctype, "res");
+
+            v = force_gimple_operand_gsi (gsi, 
+                    fold_build1 (VIEW_CONVERT_EXPR, ctype, vec0),
+                    true, NULL_TREE, true, GSI_SAME_STMT);
+
+            m = force_gimple_operand_gsi (gsi, 
+                    fold_build1 (VIEW_CONVERT_EXPR, ctype, mvar),
+                    true, NULL_TREE, true, GSI_SAME_STMT);
+
+            c = gimple_build_call (fntype, 2, v, m);
+            gimple_call_set_lhs (c, r);
+            gsi_insert_before (gsi, c, GSI_SAME_STMT);
+
+            stmt = gsi_stmt (*gsi);
+            t = force_gimple_operand_gsi (gsi,
+                    build1 (VIEW_CONVERT_EXPR, TREE_TYPE (vec0), r),
+                    true, NULL_TREE, true, GSI_SAME_STMT);
+            asgn = gimple_build_assign (gimple_call_lhs (stmt), t);
+            gsi_replace (gsi, asgn, false);
+            return true;
+        }
+    }
+  return false;
+}
+
+/*  Lower shuffling vectors VEC0 and VEC1 as specified by MASK.
+    Replaces the statement at *GSI with a target specific sequence
+    implementing the shuffle operation and returns true.  Returns
+    false if no target specific sequence for this shuffle operation
+    exists.  */
+static bool
+ix86_vectorize_builtin_shuffle2 (gimple_stmt_iterator *gsi, 
+                                tree vec0, tree vec1, tree mask)
+{
+  if (!VECTOR_MODE_P (TYPE_MODE (TREE_TYPE (vec0))))
+    return false;
+ 
+  /* Check wheteher vector size is 128 or 256 */
+  while (TREE_CODE (mask) == SSA_NAME)
+    {
+      gimple maskdef = SSA_NAME_DEF_STMT (mask);
+      if (gimple_assign_single_p (maskdef))
+        mask = gimple_assign_rhs1 (maskdef);
+      else
+        break;
+    }
+
+  if (TREE_CODE (mask) == VECTOR_CST)
+    {
+      tree t, m_type;
+      if (!ix86_vectorize_builtin_vec_perm_ok (TREE_TYPE (vec0), mask))
+        return false;
+      
+      t = ix86_vectorize_builtin_vec_perm (TREE_TYPE (vec0), &m_type);
+      
+      if (t != NULL_TREE)
+        {
+          gimple c = gimple_build_call (t, 3, vec0, vec1, mask);
+          gimple stmt = gsi_stmt (*gsi);
+          gimple_call_set_lhs (c, gimple_call_lhs (stmt));
+          gsi_replace (gsi, c, false);
+          return true;
+        }
+    }
+
+  return false;
+}
+
 /* Return a vector mode with twice as many elements as VMODE.  */
 /* ??? Consider moving this to a table generated by genmodes.c.  */
 
@@ -31139,13 +31319,21 @@ extract_vec_perm_cst (struct expand_vec_
   unsigned i, nelt = d->nelt;
   int ret = 0;
 
-  for (i = 0; i < nelt; ++i, list = TREE_CHAIN (list))
+  for (i = 0; i < nelt; ++i, list = 
+                        (list == NULL_TREE ? NULL_TREE : TREE_CHAIN (list)))
     {
       unsigned HOST_WIDE_INT e;
+      tree value;
+
+      if (list != NULL_TREE)
+        value = TREE_VALUE (list);
+      else
+          value = fold_convert (TREE_TYPE (TREE_TYPE (cst)), 
+                                integer_zero_node);
 
-      if (!host_integerp (TREE_VALUE (list), 1))
+      if (!host_integerp (value, 1))
 	return 0;
-      e = tree_low_cst (TREE_VALUE (list), 1);
+      e = tree_low_cst (value, 1);
       if (e >= 2 * nelt)
 	return 0;
 
@@ -31294,10 +31482,10 @@ ix86_vectorize_builtin_vec_perm_ok (tree
 
   vec_mask = extract_vec_perm_cst (&d, mask);
 
-  /* This hook is cannot be called in response to something that the
-     user does (unlike the builtin expander) so we shouldn't ever see
-     an error generated from the extract.  */
-  gcc_assert (vec_mask > 0 && vec_mask <= 3);
+  /* Check whether the mask can be applied to the vector type.  */
+  if (vec_mask < 0 || vec_mask > 3)
+    return false;
+  
   one_vec = (vec_mask != 3);
 
   /* Implementable with shufps or pshufd.  */
@@ -31715,6 +31903,14 @@ ix86_enum_va_list (int idx, const char *
 #define TARGET_VECTORIZE_BUILTIN_VEC_PERM_OK \
   ix86_vectorize_builtin_vec_perm_ok
 
+#undef TARGET_VECTORIZE_BUILTIN_SHUFFLE
+#define TARGET_VECTORIZE_BUILTIN_SHUFFLE \
+  ix86_vectorize_builtin_shuffle
+
+#undef TARGET_VECTORIZE_BUILTIN_SHUFFLE2
+#define TARGET_VECTORIZE_BUILTIN_SHUFFLE2 \
+  ix86_vectorize_builtin_shuffle2
+
 #undef TARGET_SET_CURRENT_FUNCTION
 #define TARGET_SET_CURRENT_FUNCTION ix86_set_current_function
 

^ permalink raw reply	[flat|nested] 71+ messages in thread

end of thread, other threads:[~2011-10-06 18:04 UTC | newest]

Thread overview: 71+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-30  7:17 Vector shuffling Artem Shinkarov
2011-08-30 13:50 ` Richard Guenther
2011-08-30 19:46   ` Joseph S. Myers
2011-08-30 20:36   ` Artem Shinkarov
2011-08-31  7:53     ` Chris Lattner
2011-08-31  9:00       ` Richard Guenther
2011-08-31  9:02       ` Artem Shinkarov
2011-08-31  9:04         ` Duncan Sands
2011-08-31  9:34           ` Richard Guenther
2011-08-31 14:33             ` Artem Shinkarov
2011-08-31 15:17               ` Richard Guenther
2011-08-31 17:25               ` Joseph S. Myers
2011-08-31 19:08                 ` Artem Shinkarov
     [not found]                   ` <Pine.LNX.4.64.1108312053060.21299@digraph.polyomino.org.uk>
2011-09-02 15:16                     ` Artem Shinkarov
2011-09-02 15:41                       ` Joseph S. Myers
2011-09-02 16:09                         ` Artem Shinkarov
2011-09-02 17:15                           ` Artem Shinkarov
2011-09-02 19:52                           ` Joseph S. Myers
2011-09-03 15:53                             ` Artem Shinkarov
2011-09-06 15:40                               ` Richard Guenther
2011-09-07 15:07                               ` Joseph S. Myers
2011-09-09 17:04                                 ` Artem Shinkarov
2011-09-12  8:02                                   ` Richard Guenther
2011-09-13 17:48                                   ` Joseph S. Myers
2011-09-15 20:36                                   ` Richard Henderson
2011-09-28 13:43                                     ` Artem Shinkarov
2011-09-28 15:20                                       ` Richard Henderson
2011-09-29 11:16                                         ` Artem Shinkarov
2011-09-29 17:22                                           ` Richard Henderson
2011-09-30 20:34                                             ` Artem Shinkarov
2011-09-30 20:44                                               ` Richard Henderson
2011-09-30 20:51                                                 ` Artem Shinkarov
2011-09-30 23:22                                                   ` Richard Henderson
     [not found]                                                     ` <CABYV9SUt+mFr3XQLHnzJevBmovkop92tSRDnR9j4U7bOuDWuew@mail.gmail.com>
2011-10-03 12:15                                                       ` Artem Shinkarov
2011-10-03 15:13                                                         ` Richard Henderson
2011-10-03 16:44                                                           ` Artem Shinkarov
2011-10-03 17:12                                                             ` Richard Henderson
2011-10-03 17:21                                                               ` Artem Shinkarov
2011-10-03 23:05                                                               ` Artem Shinkarov
2011-10-04 15:21                                                                 ` Artem Shinkarov
2011-10-04 16:43                                                                   ` Richard Henderson
2011-10-06 10:55                                                             ` Georg-Johann Lay
2011-10-06 11:28                                                               ` Richard Guenther
2011-10-06 11:38                                                                 ` Georg-Johann Lay
2011-10-06 11:46                                                                   ` Richard Guenther
2011-10-06 12:12                                                                     ` Georg-Johann Lay
2011-10-06 15:43                                                                       ` Richard Henderson
2011-10-06 18:13                                                                         ` Georg-Johann Lay
2011-10-06 11:47                                                               ` Jakub Jelinek
2011-10-03 22:48                                                       ` H.J. Lu
2011-10-04  2:26                                                   ` Hans-Peter Nilsson
2011-08-31 20:36         ` Chris Lattner
2011-08-31  8:59     ` Richard Guenther
  -- strict thread matches above, loose matches on Subject: below --
2010-08-15 15:32 Artem Shinkarov
2010-08-15 15:34 ` Joseph S. Myers
2010-08-15 15:56   ` Artem Shinkarov
2010-08-15 16:04     ` Joseph S. Myers
2010-08-15 16:12 ` Chris Lattner
2010-08-15 18:56   ` Steven Bosscher
2010-08-15 21:23     ` Paolo Bonzini
2010-08-15 22:46   ` Richard Guenther
2010-08-16 15:49     ` Chris Lattner
2010-08-15 18:26 ` Andrew Pinski
2010-08-15 22:10   ` Richard Guenther
2010-08-16 18:53     ` Richard Henderson
2010-08-16 19:24       ` Richard Henderson
2010-08-17  9:36       ` Richard Guenther
2010-08-16 17:44 ` Richard Henderson
2010-08-16 19:33   ` Artem Shinkarov
2010-08-16 19:59     ` Richard Henderson
2010-08-17  9:33   ` Richard Guenther

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).