public inbox for gcc-patches@gcc.gnu.org
* [patch] Support vectorization of widening shifts
@ 2011-09-19  8:26 Ira Rosen
  2011-09-26 14:40 ` Richard Guenther
  2011-09-29 15:42 ` Ramana Radhakrishnan
  0 siblings, 2 replies; 9+ messages in thread
From: Ira Rosen @ 2011-09-19  8:26 UTC (permalink / raw)
  To: gcc-patches; +Cc: Patch Tracking

[-- Attachment #1: Type: text/plain, Size: 3481 bytes --]

Hi,

This patch adds support for vectorizing widening left shifts.  The
following pattern is detected:

type a_t;
TYPE a_T, res_T;

a_t = ;
a_T = (TYPE) a_t;
res_T = a_T << CONST;

(where 'TYPE' is at least twice the size of 'type', and CONST is at most
the bit size of 'type')

and creates a pattern stmt for it using the new tree code WIDEN_SHIFT_LEFT_EXPR:

a_t = ;
a_T = (TYPE) a_t;
res_T = a_T << CONST;
    -->  res_T = a_t w<< CONST;

which is later transformed into:

va_t = ;
vres_T0 = WIDEN_SHIFT_LEFT_LO_EXPR <va_t, CONST>;
vres_T1 = WIDEN_SHIFT_LEFT_HI_EXPR <va_t, CONST>;

This patch also supports unsigned types, as well as cases in which
'TYPE' is 4 times the size of 'type'.
This feature is similar to vectorization of widening multiplication.

Bootstrapped on powerpc64-suse-linux; tested on powerpc64-suse-linux
and arm-linux-gnueabi.
OK for mainline?

Thanks,
Ira

ChangeLog:

        * doc/md.texi (vec_widen_ushiftl_hi, vec_widen_ushiftl_lo,
        vec_widen_sshiftl_hi, vec_widen_sshiftl_lo): Document.
        * tree-pretty-print.c (dump_generic_node): Handle WIDEN_SHIFT_LEFT_EXPR,
        VEC_WIDEN_SHIFT_LEFT_HI_EXPR and VEC_WIDEN_SHIFT_LEFT_LO_EXPR.
        (op_code_prio): Likewise.
        (op_symbol_code): Handle WIDEN_SHIFT_LEFT_EXPR.
        * optabs.c (optab_for_tree_code): Handle
        VEC_WIDEN_SHIFT_LEFT_HI_EXPR and VEC_WIDEN_SHIFT_LEFT_LO_EXPR.
        (init_optabs): Initialize optab codes for vec_widen_u/sshiftl_hi/lo.
        * optabs.h (enum optab_index): Add OTI_vec_widen_u/sshiftl_hi/lo.
        * genopinit.c (optabs): Initialize the new optabs.
        * expr.c (expand_expr_real_2): Handle
        VEC_WIDEN_SHIFT_LEFT_HI_EXPR and VEC_WIDEN_SHIFT_LEFT_LO_EXPR.
        * gimple-pretty-print.c (dump_binary_rhs): Likewise.
        * tree-vectorizer.h (NUM_PATTERNS): Increase to 6.
        * tree.def (WIDEN_SHIFT_LEFT_EXPR, VEC_WIDEN_SHIFT_LEFT_HI_EXPR,
        VEC_WIDEN_SHIFT_LEFT_LO_EXPR): New.
        * cfgexpand.c (expand_debug_expr):  Handle new tree codes.
        * tree-vect-patterns.c (vect_vect_recog_func_ptrs): Add
        vect_recog_widen_shift_pattern.
        (vect_handle_widen_mult_by_const): Rename...
        (vect_handle_widen_op_by_const): ...to this.  Handle shifts.
        Add a new argument, update documentation.
        (vect_recog_widen_mult_pattern): Assume that only second
        operand can be constant.  Update call to
        vect_handle_widen_op_by_const.
        (vect_operation_fits_smaller_type): Add the already existing
        def stmt to the list of pattern statements.
        (vect_recog_widen_shift_pattern): New.
        * tree-vect-stmts.c (vectorizable_type_promotion): Handle
        widening shifts.
        (supportable_widening_operation): Likewise.
        * tree-inline.c (estimate_operator_cost): Handle new tree codes.
        * tree-vect-generic.c (expand_vector_operations_1): Likewise.
        * tree-cfg.c (verify_gimple_assign_binary): Likewise.
        * config/arm/neon.md (neon_vec_<US>shiftl_lo_<mode>): New.
        (vec_widen_<US>shiftl_lo_<mode>, neon_vec_<US>shiftl_hi_<mode>,
        vec_widen_<US>shiftl_hi_<mode>, neon_vec_<US>shift_left_<mode>):
        Likewise.
        * tree-vect-slp.c (vect_build_slp_tree): Require same shift operand
        for widening shift.

testsuite/ChangeLog:

        * gcc.dg/vect/vect-widen-shift-s16.c: New.
        * gcc.dg/vect/vect-widen-shift-s8.c: New.
        * gcc.dg/vect/vect-widen-shift-u16.c: New.
        * gcc.dg/vect/vect-widen-shift-u8.c: New.

[-- Attachment #2: widen-shift.txt --]
[-- Type: text/plain, Size: 44608 bytes --]

Index: doc/md.texi
===================================================================
--- doc/md.texi	(revision 178942)
+++ doc/md.texi	(working copy)
@@ -4238,6 +4238,17 @@ are vectors with N signed/unsigned elements of siz
 elements of the two vectors, and put the N/2 products of size 2*S in the
 output vector (operand 0).
 
+@cindex @code{vec_widen_ushiftl_hi_@var{m}} instruction pattern
+@cindex @code{vec_widen_ushiftl_lo_@var{m}} instruction pattern
+@cindex @code{vec_widen_sshiftl_hi_@var{m}} instruction pattern
+@cindex @code{vec_widen_sshiftl_lo_@var{m}} instruction pattern
+@item @samp{vec_widen_ushiftl_hi_@var{m}}, @samp{vec_widen_ushiftl_lo_@var{m}}
+@itemx @samp{vec_widen_sshiftl_hi_@var{m}}, @samp{vec_widen_sshiftl_lo_@var{m}}
+Signed/Unsigned widening shift left.  The first input (operand 1) is a vector
+with N signed/unsigned elements of size S@.  Operand 2 is a constant.  Shift
+the high/low elements of operand 1, and put the N/2 results of size 2*S in the
+output vector (operand 0).
+
 @cindex @code{mulhisi3} instruction pattern
 @item @samp{mulhisi3}
 Multiply operands 1 and 2, which have mode @code{HImode}, and store
Index: tree-pretty-print.c
===================================================================
--- tree-pretty-print.c	(revision 178942)
+++ tree-pretty-print.c	(working copy)
@@ -1599,6 +1599,7 @@ dump_generic_node (pretty_printer *buffer, tree no
     case RROTATE_EXPR:
     case VEC_LSHIFT_EXPR:
     case VEC_RSHIFT_EXPR:
+    case WIDEN_SHIFT_LEFT_EXPR:
     case BIT_IOR_EXPR:
     case BIT_XOR_EXPR:
     case BIT_AND_EXPR:
@@ -2287,6 +2288,22 @@ dump_generic_node (pretty_printer *buffer, tree no
       pp_string (buffer, " > ");
       break;
 
+    case VEC_WIDEN_SHIFT_LEFT_HI_EXPR:
+      pp_string (buffer, " VEC_WIDEN_SHIFT_LEFT_HI_EXPR < ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
+      pp_string (buffer, " > ");
+      break;
+
+    case VEC_WIDEN_SHIFT_LEFT_LO_EXPR:
+      pp_string (buffer, " VEC_WIDEN_SHIFT_LEFT_LO_EXPR < ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
+      pp_string (buffer, " > ");
+      break;
+
     case VEC_UNPACK_HI_EXPR:
       pp_string (buffer, " VEC_UNPACK_HI_EXPR < ");
       dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
@@ -2609,6 +2626,9 @@ op_code_prio (enum tree_code code)
     case RSHIFT_EXPR:
     case LROTATE_EXPR:
     case RROTATE_EXPR:
+    case VEC_WIDEN_SHIFT_LEFT_HI_EXPR:
+    case VEC_WIDEN_SHIFT_LEFT_LO_EXPR:
+    case WIDEN_SHIFT_LEFT_EXPR:
       return 11;
 
     case WIDEN_SUM_EXPR:
@@ -2784,6 +2804,9 @@ op_symbol_code (enum tree_code code)
     case VEC_RSHIFT_EXPR:
       return "v>>";
 
+    case WIDEN_SHIFT_LEFT_EXPR:
+      return "w<<";
+
     case POINTER_PLUS_EXPR:
       return "+";
 
Index: optabs.c
===================================================================
--- optabs.c	(revision 178942)
+++ optabs.c	(working copy)
@@ -479,6 +479,14 @@ optab_for_tree_code (enum tree_code code, const_tr
       return TYPE_UNSIGNED (type) ?
 	vec_widen_umult_lo_optab : vec_widen_smult_lo_optab;
 
+    case VEC_WIDEN_SHIFT_LEFT_HI_EXPR:
+      return TYPE_UNSIGNED (type) ?
+        vec_widen_ushiftl_hi_optab : vec_widen_sshiftl_hi_optab;
+
+    case VEC_WIDEN_SHIFT_LEFT_LO_EXPR:
+      return TYPE_UNSIGNED (type) ?
+        vec_widen_ushiftl_lo_optab : vec_widen_sshiftl_lo_optab;
+
     case VEC_UNPACK_HI_EXPR:
       return TYPE_UNSIGNED (type) ?
 	vec_unpacku_hi_optab : vec_unpacks_hi_optab;
@@ -6132,6 +6140,10 @@ init_optabs (void)
   init_optab (vec_widen_umult_lo_optab, UNKNOWN);
   init_optab (vec_widen_smult_hi_optab, UNKNOWN);
   init_optab (vec_widen_smult_lo_optab, UNKNOWN);
+  init_optab (vec_widen_ushiftl_hi_optab, UNKNOWN);
+  init_optab (vec_widen_ushiftl_lo_optab, UNKNOWN);
+  init_optab (vec_widen_sshiftl_hi_optab, UNKNOWN);
+  init_optab (vec_widen_sshiftl_lo_optab, UNKNOWN);
   init_optab (vec_unpacks_hi_optab, UNKNOWN);
   init_optab (vec_unpacks_lo_optab, UNKNOWN);
   init_optab (vec_unpacku_hi_optab, UNKNOWN);
Index: optabs.h
===================================================================
--- optabs.h	(revision 178942)
+++ optabs.h	(working copy)
@@ -351,6 +351,12 @@ enum optab_index
   OTI_vec_widen_umult_lo,
   OTI_vec_widen_smult_hi,
   OTI_vec_widen_smult_lo,
+  /* Widening shift left.
+     The high/low part of the resulting vector is returned.  */
+  OTI_vec_widen_ushiftl_hi,
+  OTI_vec_widen_ushiftl_lo,
+  OTI_vec_widen_sshiftl_hi,
+  OTI_vec_widen_sshiftl_lo,
   /* Extract and widen the high/low part of a vector of signed or
      floating point elements.  */
   OTI_vec_unpacks_hi,
@@ -544,6 +550,10 @@ enum optab_index
 #define vec_widen_umult_lo_optab (&optab_table[OTI_vec_widen_umult_lo])
 #define vec_widen_smult_hi_optab (&optab_table[OTI_vec_widen_smult_hi])
 #define vec_widen_smult_lo_optab (&optab_table[OTI_vec_widen_smult_lo])
+#define vec_widen_ushiftl_hi_optab (&optab_table[OTI_vec_widen_ushiftl_hi])
+#define vec_widen_ushiftl_lo_optab (&optab_table[OTI_vec_widen_ushiftl_lo])
+#define vec_widen_sshiftl_hi_optab (&optab_table[OTI_vec_widen_sshiftl_hi])
+#define vec_widen_sshiftl_lo_optab (&optab_table[OTI_vec_widen_sshiftl_lo])
 #define vec_unpacks_hi_optab (&optab_table[OTI_vec_unpacks_hi])
 #define vec_unpacks_lo_optab (&optab_table[OTI_vec_unpacks_lo])
 #define vec_unpacku_hi_optab (&optab_table[OTI_vec_unpacku_hi])
Index: genopinit.c
===================================================================
--- genopinit.c	(revision 178942)
+++ genopinit.c	(working copy)
@@ -269,6 +269,10 @@ static const char * const optabs[] =
   "set_optab_handler (vec_widen_umult_lo_optab, $A, CODE_FOR_$(vec_widen_umult_lo_$a$))",
   "set_optab_handler (vec_widen_smult_hi_optab, $A, CODE_FOR_$(vec_widen_smult_hi_$a$))",
   "set_optab_handler (vec_widen_smult_lo_optab, $A, CODE_FOR_$(vec_widen_smult_lo_$a$))",
+  "set_optab_handler (vec_widen_ushiftl_hi_optab, $A, CODE_FOR_$(vec_widen_ushiftl_hi_$a$))",
+  "set_optab_handler (vec_widen_ushiftl_lo_optab, $A, CODE_FOR_$(vec_widen_ushiftl_lo_$a$))",
+  "set_optab_handler (vec_widen_sshiftl_hi_optab, $A, CODE_FOR_$(vec_widen_sshiftl_hi_$a$))",
+  "set_optab_handler (vec_widen_sshiftl_lo_optab, $A, CODE_FOR_$(vec_widen_sshiftl_lo_$a$))",
   "set_optab_handler (vec_unpacks_hi_optab, $A, CODE_FOR_$(vec_unpacks_hi_$a$))",
   "set_optab_handler (vec_unpacks_lo_optab, $A, CODE_FOR_$(vec_unpacks_lo_$a$))",
   "set_optab_handler (vec_unpacku_hi_optab, $A, CODE_FOR_$(vec_unpacku_hi_$a$))",
Index: testsuite/lib/target-supports.exp
===================================================================
--- testsuite/lib/target-supports.exp	(revision 178942)
+++ testsuite/lib/target-supports.exp	(working copy)
@@ -2890,6 +2890,26 @@ proc check_effective_target_vect_widen_mult_hi_to_
 }
 
 # Return 1 if the target plus current options supports a vector
+# widening shift, 0 otherwise.
+#
+# This won't change for different subtargets so cache the result.
+
+proc check_effective_target_vect_widen_shift { } {
+    global et_vect_widen_shift_saved
+
+    if [info exists et_vect_widen_shift_saved] {
+        verbose "check_effective_target_vect_widen_shift: using cached result" 2
+    } else {
+        set et_vect_widen_shift_saved 0
+        if { ([istarget arm*-*-*] && [check_effective_target_arm_neon]) } {
+            set et_vect_widen_shift_saved 1
+        }
+    }
+    verbose "check_effective_target_vect_widen_shift: returning $et_vect_widen_shift_saved" 2
+    return $et_vect_widen_shift_saved
+}
+
+# Return 1 if the target plus current options supports a vector
 # dot-product of signed chars, 0 otherwise.
 #
 # This won't change for different subtargets so cache the result.
Index: testsuite/gcc.dg/vect/vect-widen-shift-u16.c
===================================================================
--- testsuite/gcc.dg/vect/vect-widen-shift-u16.c	(revision 0)
+++ testsuite/gcc.dg/vect/vect-widen-shift-u16.c	(revision 0)
@@ -0,0 +1,58 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 64
+#define C 7 
+
+__attribute__ ((noinline)) void
+foo (unsigned short *src, unsigned int *dst)
+{
+  int i;
+  unsigned short b, *s = src;
+  unsigned int *d = dst;
+
+  for (i = 0; i < N; i++)
+    {
+      b = *s++;
+      *d = b<<C;
+      d++;
+    }
+
+  s = src;
+  d = dst;
+  for (i = 0; i < N; i++)
+    {
+      b = *s++;
+      if (*d != b<<C) 
+        abort (); 
+      d++;
+    }
+}
+
+int main (void)
+{
+  int i;
+  unsigned short in[N];
+  unsigned int out[N];
+
+  check_vect ();
+
+  for (i = 0; i < N; i++)
+    {
+      in[i] = i;
+      out[i] = 255;
+      __asm__ volatile ("");
+    }
+
+  foo (in, out);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 1 "vect" { target vect_widen_shift } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: testsuite/gcc.dg/vect/vect-widen-mult-half-u8.c
===================================================================
--- testsuite/gcc.dg/vect/vect-widen-mult-half-u8.c	(revision 178942)
+++ testsuite/gcc.dg/vect/vect-widen-mult-half-u8.c	(working copy)
@@ -4,7 +4,8 @@
 #include <stdlib.h>
 
 #define N 32
-#define COEF 32470
+#define COEF1 32470
+#define COEF2 32
 
 unsigned char in[N];
 int out[N];
@@ -15,7 +16,7 @@ foo ()
   int i;
 
   for (i = 0; i < N; i++)
-    out[i] = in[i] * COEF;
+    out[i] = in[i] * COEF1;
 }
 
 __attribute__ ((noinline)) void
@@ -24,7 +25,7 @@ bar ()
   int i;
 
   for (i = 0; i < N; i++)
-    out[i] = COEF * in[i];
+    out[i] = COEF2 * in[i];
 }
 
 int main (void)
@@ -40,13 +41,13 @@ int main (void)
   foo ();
 
   for (i = 0; i < N; i++)
-    if (out[i] != in[i] * COEF)
+    if (out[i] != in[i] * COEF1)
       abort ();
 
   bar ();
 
   for (i = 0; i < N; i++)
-    if (out[i] != in[i] * COEF)
+    if (out[i] != in[i] * COEF2)
       abort ();
 
   return 0;
Index: testsuite/gcc.dg/vect/vect-widen-shift-s8.c
===================================================================
--- testsuite/gcc.dg/vect/vect-widen-shift-s8.c	(revision 0)
+++ testsuite/gcc.dg/vect/vect-widen-shift-s8.c	(revision 0)
@@ -0,0 +1,58 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 64
+#define C 7 
+
+__attribute__ ((noinline)) void
+foo (char *src, int *dst)
+{
+  int i;
+  char b, *s = src;
+  int *d = dst;
+
+  for (i = 0; i < N; i++)
+    {
+      b = *s++;
+      *d = b << C;
+      d++;
+    }
+
+  s = src;
+  d = dst;
+  for (i = 0; i < N; i++)
+    {
+      b = *s++;
+      if (*d != b << C) 
+        abort (); 
+      d++;
+    }
+}
+
+int main (void)
+{
+  int i;
+  char in[N];
+  int out[N];
+
+  check_vect ();
+
+  for (i = 0; i < N; i++)
+    {
+      in[i] = i;
+      out[i] = 255;
+      __asm__ volatile ("");
+    }
+
+  foo (in, out);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 1 "vect" { target vect_widen_shift } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: testsuite/gcc.dg/vect/vect-widen-shift-u8.c
===================================================================
--- testsuite/gcc.dg/vect/vect-widen-shift-u8.c	(revision 0)
+++ testsuite/gcc.dg/vect/vect-widen-shift-u8.c	(revision 0)
@@ -0,0 +1,65 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 64
+#define C1 10 
+#define C2 5 
+
+__attribute__ ((noinline)) void
+foo (unsigned char *src, unsigned int *dst1, unsigned int *dst2)
+{
+  int i;
+  unsigned char b, *s = src;
+  unsigned int *d1 = dst1, *d2 = dst2;
+
+  for (i = 0; i < N; i++)
+    {
+      b = *s++;
+      *d1 = b << C1;
+      d1++;
+      *d2 = b << C2;
+      d2++;
+    }
+
+  s = src;
+  d1 = dst1;
+  d2 = dst2;
+  for (i = 0; i < N; i++)
+    {
+      b = *s++;
+      if (*d1 != b << C1 || *d2 != b << C2) 
+        abort (); 
+      d1++;
+      d2++;
+    }
+}
+
+int main (void)
+{
+  int i;
+  unsigned char in[N];
+  unsigned int out1[N];
+  unsigned int out2[N];
+
+  check_vect ();
+
+  for (i = 0; i < N; i++)
+    {
+      in[i] = i;
+      out1[i] = 255;
+      out2[i] = 255;
+      __asm__ volatile ("");
+    }
+
+  foo (in, out1, out2);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 1 "vect" { target vect_widen_shift } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: testsuite/gcc.dg/vect/vect-widen-shift-s16.c
===================================================================
--- testsuite/gcc.dg/vect/vect-widen-shift-s16.c	(revision 0)
+++ testsuite/gcc.dg/vect/vect-widen-shift-s16.c	(revision 0)
@@ -0,0 +1,107 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 64
+#define C 16 
+
+__attribute__ ((noinline)) void
+foo (short *src, int *dst)
+{
+  int i;
+  short b, b0, b1, b2, b3, *s = src;
+  int *d = dst;
+
+  for (i = 0; i < N/4; i++)
+    {
+      b0 = *s++;
+      b1 = *s++;
+      b2 = *s++;
+      b3 = *s++;
+      *d = b0 << C;
+      d++;
+      *d = b1 << C;
+      d++;
+      *d = b2 << C;
+      d++;
+      *d = b3 << C;
+      d++;
+    }
+
+  s = src;
+  d = dst;
+  for (i = 0; i < N; i++)
+    {
+      b = *s++;
+      if (*d != b << C) 
+        abort (); 
+      d++;
+    }
+
+  s = src;
+  d = dst;
+  for (i = 0; i < N/4; i++)
+    {
+      b0 = *s++;
+      b1 = *s++;
+      b2 = *s++;
+      b3 = *s++;
+      *d = b0 << C;
+      d++;
+      *d = b1 << C;
+      d++;
+      *d = b2 << C;
+      d++;
+      *d = b3 << 6;
+      d++;
+    }
+
+  s = src;
+  d = dst;
+  for (i = 0; i < N/4; i++)
+    {
+      b = *s++;
+      if (*d != b << C)
+        abort ();
+      d++;
+      b = *s++;
+      if (*d != b << C)
+        abort ();
+      d++;
+      b = *s++;
+      if (*d != b << C)
+        abort ();
+      d++;
+      b = *s++;
+      if (*d != b << 6)
+        abort ();
+      d++;
+    }
+}
+
+int main (void)
+{
+  int i;
+  short in[N];
+  int out[N];
+
+  check_vect ();
+
+  for (i = 0; i < N; i++)
+    {
+      in[i] = i;
+      out[i] = 255;
+      __asm__ volatile ("");
+    }
+
+  foo (in, out);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 8 "vect" { target vect_widen_shift } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: expr.c
===================================================================
--- expr.c	(revision 178942)
+++ expr.c	(working copy)
@@ -8600,6 +8600,19 @@ expand_expr_real_2 (sepops ops, rtx target, enum m
 	return target;
       }
 
+    case VEC_WIDEN_SHIFT_LEFT_HI_EXPR:
+    case VEC_WIDEN_SHIFT_LEFT_LO_EXPR:
+      {
+        tree oprnd0 = treeop0;
+        tree oprnd1 = treeop1;
+
+        expand_operands (oprnd0, oprnd1, NULL_RTX, &op0, &op1, EXPAND_NORMAL);
+        target = expand_widen_pattern_expr (ops, op0, op1, NULL_RTX,
+                                            target, unsignedp);
+        gcc_assert (target);
+        return target;
+      }
+
     case VEC_PACK_TRUNC_EXPR:
     case VEC_PACK_SAT_EXPR:
     case VEC_PACK_FIX_TRUNC_EXPR:
Index: gimple-pretty-print.c
===================================================================
--- gimple-pretty-print.c	(revision 178942)
+++ gimple-pretty-print.c	(working copy)
@@ -343,6 +343,8 @@ dump_binary_rhs (pretty_printer *buffer, gimple gs
     case VEC_EXTRACT_ODD_EXPR:
     case VEC_INTERLEAVE_HIGH_EXPR:
     case VEC_INTERLEAVE_LOW_EXPR:
+    case VEC_WIDEN_SHIFT_LEFT_HI_EXPR:
+    case VEC_WIDEN_SHIFT_LEFT_LO_EXPR:
       for (p = tree_code_name [(int) code]; *p; p++)
 	pp_character (buffer, TOUPPER (*p));
       pp_string (buffer, " <");
Index: tree-vectorizer.h
===================================================================
--- tree-vectorizer.h	(revision 178942)
+++ tree-vectorizer.h	(working copy)
@@ -902,7 +902,7 @@ extern void vect_slp_transform_bb (basic_block);
    Additional pattern recognition functions can (and will) be added
    in the future.  */
 typedef gimple (* vect_recog_func_ptr) (VEC (gimple, heap) **, tree *, tree *);
-#define NUM_PATTERNS 5
+#define NUM_PATTERNS 6 
 void vect_pattern_recog (loop_vec_info);
 
 /* In tree-vectorizer.c.  */
Index: tree.def
===================================================================
--- tree.def	(revision 178942)
+++ tree.def	(working copy)
@@ -1111,6 +1111,19 @@ DEFTREECODE (WIDEN_MULT_PLUS_EXPR, "widen_mult_plu
    is subtracted from t3.  */
 DEFTREECODE (WIDEN_MULT_MINUS_EXPR, "widen_mult_minus_expr", tcc_expression, 3)
 
+/* Widening shift left.
+   The first operand is of type t1.
+   The second operand is the number of bits to shift by; it need not be the
+   same type as the first operand and result.
+   Note that the result is undefined if the second operand is larger
+   than or equal to the first operand's type size.
+   The type of the entire expression is t2, such that t2 is at least twice
+   the size of t1.
+   WIDEN_SHIFT_LEFT_EXPR is equivalent to first widening (promoting)
+   the first argument from type t1 to type t2, and then shifting it
+   by the second argument.  */
+DEFTREECODE (WIDEN_SHIFT_LEFT_EXPR, "widen_shift_left_expr", tcc_binary, 2)
+
 /* Fused multiply-add.
    All operands and the result are of the same type.  No intermediate
    rounding is performed after multiplying operand one with operand two
@@ -1166,6 +1179,16 @@ DEFTREECODE (VEC_EXTRACT_ODD_EXPR, "vec_extractodd
 DEFTREECODE (VEC_INTERLEAVE_HIGH_EXPR, "vec_interleavehigh_expr", tcc_binary, 2)
 DEFTREECODE (VEC_INTERLEAVE_LOW_EXPR, "vec_interleavelow_expr", tcc_binary, 2)
 
+/* Widening vector shift left in bits.
+   Operand 0 is a vector to be shifted with N elements of size S.
+   Operand 1 is an integer shift amount in bits. 
+   The result of the operation is N elements of size 2*S.
+   VEC_WIDEN_SHIFT_LEFT_HI_EXPR computes the N/2 high results.
+   VEC_WIDEN_SHIFT_LEFT_LO_EXPR computes the N/2 low results.
+ */
+DEFTREECODE (VEC_WIDEN_SHIFT_LEFT_HI_EXPR, "widen_shift_left_hi_expr", tcc_binary, 2)
+DEFTREECODE (VEC_WIDEN_SHIFT_LEFT_LO_EXPR, "widen_shift_left_lo_expr", tcc_binary, 2)
+
 /* PREDICT_EXPR.  Specify hint for branch prediction.  The
    PREDICT_EXPR_PREDICTOR specify predictor and PREDICT_EXPR_OUTCOME the
    outcome (0 for not taken and 1 for taken).  Once the profile is guessed
Index: cfgexpand.c
===================================================================
--- cfgexpand.c	(revision 178942)
+++ cfgexpand.c	(working copy)
@@ -3264,6 +3264,8 @@ expand_debug_expr (tree exp)
     case VEC_UNPACK_LO_EXPR:
     case VEC_WIDEN_MULT_HI_EXPR:
     case VEC_WIDEN_MULT_LO_EXPR:
+    case VEC_WIDEN_SHIFT_LEFT_HI_EXPR:
+    case VEC_WIDEN_SHIFT_LEFT_LO_EXPR:
       return NULL;
 
    /* Misc codes.  */
Index: tree-vect-patterns.c
===================================================================
--- tree-vect-patterns.c	(revision 178942)
+++ tree-vect-patterns.c	(working copy)
@@ -49,12 +49,15 @@ static gimple vect_recog_dot_prod_pattern (VEC (gi
 static gimple vect_recog_pow_pattern (VEC (gimple, heap) **, tree *, tree *);
 static gimple vect_recog_over_widening_pattern (VEC (gimple, heap) **, tree *,
                                                  tree *);
+static gimple vect_recog_widen_shift_pattern (VEC (gimple, heap) **,
+	                                tree *, tree *);
 static vect_recog_func_ptr vect_vect_recog_func_ptrs[NUM_PATTERNS] = {
 	vect_recog_widen_mult_pattern,
 	vect_recog_widen_sum_pattern,
 	vect_recog_dot_prod_pattern,
 	vect_recog_pow_pattern,
-        vect_recog_over_widening_pattern};
+        vect_recog_over_widening_pattern,
+	vect_recog_widen_shift_pattern};
 
 
 /* Function widened_name_p
@@ -335,27 +338,36 @@ vect_recog_dot_prod_pattern (VEC (gimple, heap) **
 }
 
 
-/* Handle two cases of multiplication by a constant.  The first one is when
-   the constant, CONST_OPRND, fits the type (HALF_TYPE) of the second
-   operand (OPRND).  In that case, we can peform widen-mult from HALF_TYPE to
-   TYPE.
+/* Handle widening operation by a constant.  At the moment we support MULT_EXPR
+   and LSHIFT_EXPR.
 
+   For MULT_EXPR we check that CONST_OPRND fits HALF_TYPE, and for LSHIFT_EXPR
+   we check that CONST_OPRND is less than or equal to the size of HALF_TYPE.
+   
    Otherwise, if the type of the result (TYPE) is at least 4 times bigger than
-   HALF_TYPE, and CONST_OPRND fits an intermediate type (2 times smaller than
-   TYPE), we can perform widen-mult from the intermediate type to TYPE and
-   replace a_T = (TYPE) a_t; with a_it - (interm_type) a_t;  */
+   HALF_TYPE, and there is an intermediate type (2 times smaller than TYPE)
+   that satisfies the above restrictions, we can perform a widening operation
+   from the intermediate type to TYPE and replace a_T = (TYPE) a_t;
+   with a_it = (interm_type) a_t;  */
 
 static bool
-vect_handle_widen_mult_by_const (gimple stmt, tree const_oprnd, tree *oprnd,
-   			         VEC (gimple, heap) **stmts, tree type,
-			         tree *half_type, gimple def_stmt)
+vect_handle_widen_op_by_const (gimple stmt, enum tree_code code,
+		               tree const_oprnd, tree *oprnd,
+   		               VEC (gimple, heap) **stmts, tree type,
+			       tree *half_type, gimple def_stmt)
 {
   tree new_type, new_oprnd, tmp;
   gimple new_stmt;
   loop_vec_info loop_info = STMT_VINFO_LOOP_VINFO (vinfo_for_stmt (stmt));
   struct loop *loop = LOOP_VINFO_LOOP (loop_info);
 
-  if (int_fits_type_p (const_oprnd, *half_type))
+  if (code != MULT_EXPR && code != LSHIFT_EXPR)
+    return false;
+
+  if (((code == MULT_EXPR && int_fits_type_p (const_oprnd, *half_type))
+        || (code == LSHIFT_EXPR
+            && compare_tree_int (const_oprnd, TYPE_PRECISION (*half_type)) != 1))
+      && TYPE_PRECISION (type) == (TYPE_PRECISION (*half_type) * 2))
     {
       /* CONST_OPRND is a constant of HALF_TYPE.  */
       *oprnd = gimple_assign_rhs1 (def_stmt);
@@ -368,14 +380,16 @@ static bool
       || !vinfo_for_stmt (def_stmt))
     return false;
 
-  /* TYPE is 4 times bigger than HALF_TYPE, try widen-mult for
+  /* TYPE is 4 times bigger than HALF_TYPE, try widening operation for
      a type 2 times bigger than HALF_TYPE.  */
   new_type = build_nonstandard_integer_type (TYPE_PRECISION (type) / 2,
                                              TYPE_UNSIGNED (type));
-  if (!int_fits_type_p (const_oprnd, new_type))
+  if ((code == MULT_EXPR && !int_fits_type_p (const_oprnd, new_type))
+      || (code == LSHIFT_EXPR
+          && compare_tree_int (const_oprnd, TYPE_PRECISION (new_type)) == 1))
     return false;
 
-  /* Use NEW_TYPE for widen_mult.  */
+  /* Use NEW_TYPE for widening operation.  */
   if (STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)))
     {
       new_stmt = STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt));
@@ -385,6 +399,7 @@ static bool
           || TREE_TYPE (gimple_assign_lhs (new_stmt)) != new_type)
         return false;
 
+      VEC_safe_push (gimple, heap, *stmts, def_stmt);
       *oprnd = gimple_assign_lhs (new_stmt);
     }
   else
@@ -495,7 +510,7 @@ vect_recog_widen_mult_pattern (VEC (gimple, heap)
   enum tree_code dummy_code;
   int dummy_int;
   VEC (tree, heap) *dummy_vec;
-  bool op0_ok, op1_ok;
+  bool op1_ok;
 
   if (!is_gimple_assign (last_stmt))
     return NULL;
@@ -515,38 +530,23 @@ vect_recog_widen_mult_pattern (VEC (gimple, heap)
     return NULL;
 
   /* Check argument 0.  */
-  op0_ok = widened_name_p (oprnd0, last_stmt, &half_type0, &def_stmt0, false);
+  if (!widened_name_p (oprnd0, last_stmt, &half_type0, &def_stmt0, false))
+    return NULL;
   /* Check argument 1.  */
   op1_ok = widened_name_p (oprnd1, last_stmt, &half_type1, &def_stmt1, false);
 
-  /* In case of multiplication by a constant one of the operands may not match
-     the pattern, but not both.  */
-  if (!op0_ok && !op1_ok)
-    return NULL;
-
-  if (op0_ok && op1_ok)
+  if (op1_ok)
     {
       oprnd0 = gimple_assign_rhs1 (def_stmt0);
       oprnd1 = gimple_assign_rhs1 (def_stmt1);
     }	       
-  else if (!op0_ok)
+  else
     {
-      if (TREE_CODE (oprnd0) == INTEGER_CST
-	  && TREE_CODE (half_type1) == INTEGER_TYPE
-          && vect_handle_widen_mult_by_const (last_stmt, oprnd0, &oprnd1,
-                                              stmts, type,
-				 	      &half_type1, def_stmt1))
-        half_type0 = half_type1;
-      else
-	return NULL;
-    }
-  else if (!op1_ok)
-    {
       if (TREE_CODE (oprnd1) == INTEGER_CST
           && TREE_CODE (half_type0) == INTEGER_TYPE
-          && vect_handle_widen_mult_by_const (last_stmt, oprnd1, &oprnd0,
-                                              stmts, type,
-					      &half_type0, def_stmt0))
+          && vect_handle_widen_op_by_const (last_stmt, MULT_EXPR, oprnd1,
+		                            &oprnd0, stmts, type,
+					    &half_type0, def_stmt0))
         half_type1 = half_type0;
       else
         return NULL;
@@ -1001,6 +1001,7 @@ vect_operation_fits_smaller_type (gimple stmt, tre
                   || TREE_TYPE (gimple_assign_lhs (new_stmt)) != interm_type)
                 return false;
 
+	      VEC_safe_push (gimple, heap, *stmts, def_stmt);
               oprnd = gimple_assign_lhs (new_stmt);
             }
           else
@@ -1070,10 +1071,8 @@ vect_operation_fits_smaller_type (gimple stmt, tre
    constants.
    Check if S3 and S4 can be done on a smaller type than 'TYPE', it can either
    be 'type' or some intermediate type.  For now, we expect S5 to be a type
-   demotion operation.  We also check that S3 and S4 have only one use.
-.
+   demotion operation.  We also check that S3 and S4 have only one use.  */
 
-*/
 static gimple
 vect_recog_over_widening_pattern (VEC (gimple, heap) **stmts,
                                   tree *type_in, tree *type_out)
@@ -1217,12 +1216,184 @@ vect_recog_over_widening_pattern (VEC (gimple, hea
   return pattern_stmt;
 }
 
+/* Detect widening shift pattern:
+ 
+   type a_t;
+   TYPE a_T, res_T;
+ 
+   S1 a_t = ;
+   S2 a_T = (TYPE) a_t;
+   S3 res_T = a_T << CONST;
 
+  where type 'TYPE' is at least double the size of type 'type'.
+
+  Also detect unsigned cases:
+
+  unsigned type a_t;
+  unsigned TYPE u_res_T;
+  TYPE a_T, res_T;
+
+  S1 a_t = ;
+  S2 a_T = (TYPE) a_t;
+  S3 res_T = a_T << CONST;
+  S4 u_res_T = (unsigned TYPE) res_T;
+
+  And a case when 'TYPE' is 4 times bigger than 'type'.  In that case we
+  create an additional pattern stmt for S2 to create a variable of an
+  intermediate type, and perform widen-shift on the intermediate type:
+ 
+  type a_t;
+  interm_type a_it;
+  TYPE a_T, res_T, res_T';
+
+  S1 a_t = ;
+  S2 a_T = (TYPE) a_t;
+      '--> a_it = (interm_type) a_t;
+  S3 res_T = a_T << CONST;
+      '--> res_T' = a_it <<* CONST;
+
+  Input/Output:
+
+  * STMTS: Contains a stmt from which the pattern search begins. 
+    In case of unsigned widen-shift, the original stmt (S3) is replaced with S4
+    in STMTS.  When an intermediate type is used and a pattern statement is
+    created for S2, we also put S2 here (before S3).
+
+  Output:
+
+  * TYPE_IN: The type of the input arguments to the pattern.
+
+  * TYPE_OUT: The type of the output of this pattern.
+  
+  * Return value: A new stmt that will be used to replace the sequence of
+    stmts that constitute the pattern.  In this case it will be:
+    WIDEN_SHIFT_LEFT_EXPR <a_t, CONST>.  */
+   
+static gimple
+vect_recog_widen_shift_pattern (VEC (gimple, heap) **stmts,
+				tree *type_in, tree *type_out)
+{
+  gimple last_stmt = VEC_pop (gimple, *stmts);
+  gimple def_stmt0;
+  tree oprnd0, oprnd1;
+  tree type, half_type0;
+  gimple pattern_stmt;
+  tree vectype, vectype_out = NULL_TREE;
+  tree dummy;
+  tree var;
+  enum tree_code dummy_code;
+  int dummy_int;
+  VEC (tree, heap) * dummy_vec;
+
+  if (!is_gimple_assign (last_stmt))
+    return NULL;
+
+  if (!vinfo_for_stmt (last_stmt)
+      || STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (last_stmt)))
+    return NULL;
+
+  type = gimple_expr_type (last_stmt);
+
+  /* Starting from LAST_STMT, follow the defs of its uses in search
+     of the above pattern.  */
+  if (gimple_assign_rhs_code (last_stmt) != LSHIFT_EXPR)
+    return NULL;
+
+  oprnd0 = gimple_assign_rhs1 (last_stmt);
+  oprnd1 = gimple_assign_rhs2 (last_stmt);
+  if (TREE_CODE (oprnd0) != SSA_NAME || TREE_CODE (oprnd1) != INTEGER_CST)
+    return NULL;
+
+  /* Check operand 0: it has to be defined by a type promotion.  */
+  if (!widened_name_p (oprnd0, last_stmt, &half_type0, &def_stmt0, false))
+    return NULL;
+
+  /* Check operand 1: has to be positive.  We check that it fits the type
+     in vect_handle_widen_op_by_const().  */
+  if (tree_int_cst_compare (oprnd1, size_zero_node) <= 0)
+    return NULL; 
+
+  oprnd0 = gimple_assign_rhs1 (def_stmt0);
+
+  /* Check if this is a widening operation.  */
+  if (!vect_handle_widen_op_by_const (last_stmt, LSHIFT_EXPR, oprnd1,
+       				      &oprnd0, stmts,
+	                              type, &half_type0, def_stmt0))
+    return NULL;
+
+  /* Handle unsigned case.  Look for
+     S4  u_res_T = (unsigned TYPE) res_T;
+     Use unsigned TYPE as the type for WIDEN_SHIFT_LEFT_EXPR.  */
+  if (TYPE_UNSIGNED (type) != TYPE_UNSIGNED (half_type0))
+    {
+      tree lhs = gimple_assign_lhs (last_stmt), use_lhs;
+      imm_use_iterator imm_iter;
+      use_operand_p use_p;
+      int nuses = 0;
+      gimple use_stmt = NULL;
+      tree use_type;
+
+      FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
+        {
+	  if (is_gimple_debug (USE_STMT (use_p)))
+	    continue;
+  	  use_stmt = USE_STMT (use_p);
+	  nuses++;
+        }
+
+      if (nuses != 1 || !is_gimple_assign (use_stmt)
+	  || gimple_assign_rhs_code (use_stmt) != NOP_EXPR)
+	return NULL;
+
+      use_lhs = gimple_assign_lhs (use_stmt);
+      use_type = TREE_TYPE (use_lhs);
+      if (!INTEGRAL_TYPE_P (use_type)
+	  || (TYPE_UNSIGNED (type) == TYPE_UNSIGNED (use_type))
+	  || (TYPE_PRECISION (type) != TYPE_PRECISION (use_type)))
+	return NULL;
+
+      type = use_type;
+      last_stmt = use_stmt;
+    }
+      
+  /* Pattern detected.  */
+  if (vect_print_dump_info (REPORT_DETAILS))
+    fprintf (vect_dump, "vect_recog_widen_shift_pattern: detected: ");
+
+  /* Check target support.  */
+  vectype = get_vectype_for_scalar_type (half_type0);
+  vectype_out = get_vectype_for_scalar_type (type);
+
+  if (!vectype
+      || !vectype_out
+      || !supportable_widening_operation (WIDEN_SHIFT_LEFT_EXPR, last_stmt,
+					  vectype_out, vectype,
+					  &dummy, &dummy, &dummy_code,
+					  &dummy_code, &dummy_int,
+					  &dummy_vec))
+    return NULL;
+
+  *type_in = vectype;
+  *type_out = vectype_out;
+
+  /* Pattern supported. Create a stmt to be used to replace the pattern.  */
+  var = vect_recog_temp_ssa_var (type, NULL);
+  pattern_stmt =
+    gimple_build_assign_with_ops (WIDEN_SHIFT_LEFT_EXPR, var, oprnd0, oprnd1);
+  SSA_NAME_DEF_STMT (var) = pattern_stmt;
+
+  if (vect_print_dump_info (REPORT_DETAILS))
+    print_gimple_stmt (vect_dump, pattern_stmt, 0, TDF_SLIM);
+
+  VEC_safe_push (gimple, heap, *stmts, last_stmt);
+  return pattern_stmt;
+}
+
 /* Mark statements that are involved in a pattern.  */
 
 static inline void
 vect_mark_pattern_stmts (gimple orig_stmt, gimple pattern_stmt,
-                         tree pattern_vectype)
+			 tree pattern_vectype)
 {
   stmt_vec_info pattern_stmt_info, def_stmt_info;
   stmt_vec_info orig_stmt_info = vinfo_for_stmt (orig_stmt);
@@ -1239,6 +1410,7 @@ vect_mark_pattern_stmts (gimple orig_stmt, gimple
 	= STMT_VINFO_DEF_TYPE (orig_stmt_info);
   STMT_VINFO_VECTYPE (pattern_stmt_info) = pattern_vectype;
   STMT_VINFO_IN_PATTERN_P (orig_stmt_info) = true;
+
   STMT_VINFO_RELATED_STMT (orig_stmt_info) = pattern_stmt;
   STMT_VINFO_PATTERN_DEF_STMT (pattern_stmt_info)
 	= STMT_VINFO_PATTERN_DEF_STMT (orig_stmt_info);
Index: tree-vect-stmts.c
===================================================================
--- tree-vect-stmts.c	(revision 178942)
+++ tree-vect-stmts.c	(working copy)
@@ -3318,6 +3318,7 @@ vectorizable_type_promotion (gimple stmt, gimple_s
   int multi_step_cvt = 0;
   VEC (tree, heap) *vec_oprnds0 = NULL, *vec_oprnds1 = NULL;
   VEC (tree, heap) *vec_dsts = NULL, *interm_types = NULL, *tmp_vec_dsts = NULL;
+  unsigned int k;
 
   /* FORNOW: not supported by basic block SLP vectorization.  */
   gcc_assert (loop_vinfo);
@@ -3337,7 +3338,8 @@ vectorizable_type_promotion (gimple stmt, gimple_s
 
   code = gimple_assign_rhs_code (stmt);
   if (!CONVERT_EXPR_CODE_P (code)
-      && code != WIDEN_MULT_EXPR)
+      && code != WIDEN_MULT_EXPR
+      && code != WIDEN_SHIFT_LEFT_EXPR)
     return false;
 
   scalar_dest = gimple_assign_lhs (stmt);
@@ -3365,7 +3367,7 @@ vectorizable_type_promotion (gimple stmt, gimple_s
       bool ok;
 
       op1 = gimple_assign_rhs2 (stmt);
-      if (code == WIDEN_MULT_EXPR)
+      if (code == WIDEN_MULT_EXPR || code == WIDEN_SHIFT_LEFT_EXPR)
         {
 	  /* For WIDEN_MULT_EXPR, if OP0 is a constant, use the type of
 	     OP1.  */
@@ -3442,7 +3444,7 @@ vectorizable_type_promotion (gimple stmt, gimple_s
     fprintf (vect_dump, "transform type promotion operation. ncopies = %d.",
                         ncopies);
 
-  if (code == WIDEN_MULT_EXPR)
+  if (code == WIDEN_MULT_EXPR || code == WIDEN_SHIFT_LEFT_EXPR)
     {
       if (CONSTANT_CLASS_P (op0))
 	op0 = fold_convert (TREE_TYPE (op1), op0);
@@ -3483,6 +3485,8 @@ vectorizable_type_promotion (gimple stmt, gimple_s
       if (op_type == binary_op)
         vec_oprnds1 = VEC_alloc (tree, heap, 1);
     }
+  else if (code == WIDEN_SHIFT_LEFT_EXPR)
+    vec_oprnds1 = VEC_alloc (tree, heap, slp_node->vec_stmts_size);
 
   /* In case the vectorization factor (VF) is bigger than the number
      of elements that we can fit in a vectype (nunits), we have to generate
@@ -3496,15 +3500,33 @@ vectorizable_type_promotion (gimple stmt, gimple_s
       if (j == 0)
         {
           if (slp_node)
-              vect_get_slp_defs (op0, op1, slp_node, &vec_oprnds0,
-                                 &vec_oprnds1, -1);
-          else
+	    {
+	      if (code == WIDEN_SHIFT_LEFT_EXPR)
+                {
+                  vec_oprnd1 = op1;			
+ 	          /* Store vec_oprnd1 for every vector stmt to be created
+		     for SLP_NODE.  We check during the analysis that all
+		     the shift arguments are the same.  */
+                  for (k = 0; k < slp_node->vec_stmts_size - 1; k++)
+                    VEC_quick_push (tree, vec_oprnds1, vec_oprnd1);
+
+    		  vect_get_slp_defs (op0, NULL_TREE, slp_node, &vec_oprnds0, NULL,
+ 	                             -1);
+                }
+              else
+                vect_get_slp_defs (op0, op1, slp_node, &vec_oprnds0,
+                                   &vec_oprnds1, -1); 
+	    }  
+	  else
             {
               vec_oprnd0 = vect_get_vec_def_for_operand (op0, stmt, NULL);
               VEC_quick_push (tree, vec_oprnds0, vec_oprnd0);
               if (op_type == binary_op)
                 {
-                  vec_oprnd1 = vect_get_vec_def_for_operand (op1, stmt, NULL);
+                  if (code == WIDEN_SHIFT_LEFT_EXPR)
+                    vec_oprnd1 = op1;
+                  else
+                    vec_oprnd1 = vect_get_vec_def_for_operand (op1, stmt, NULL);
                   VEC_quick_push (tree, vec_oprnds1, vec_oprnd1);
                 }
             }
@@ -3515,7 +3537,10 @@ vectorizable_type_promotion (gimple stmt, gimple_s
           VEC_replace (tree, vec_oprnds0, 0, vec_oprnd0);
           if (op_type == binary_op)
             {
-              vec_oprnd1 = vect_get_vec_def_for_stmt_copy (dt[1], vec_oprnd1);
+              if (code == WIDEN_SHIFT_LEFT_EXPR)
+                vec_oprnd1 = op1;
+              else
+                vec_oprnd1 = vect_get_vec_def_for_stmt_copy (dt[1], vec_oprnd1);
               VEC_replace (tree, vec_oprnds1, 0, vec_oprnd1);
             }
         }
@@ -5785,6 +5810,19 @@ supportable_widening_operation (enum tree_code cod
         }
       break;
 
+    case WIDEN_SHIFT_LEFT_EXPR:
+      if (BYTES_BIG_ENDIAN)
+        {
+          c1 = VEC_WIDEN_SHIFT_LEFT_HI_EXPR;
+          c2 = VEC_WIDEN_SHIFT_LEFT_LO_EXPR;
+        }
+      else
+        {
+          c2 = VEC_WIDEN_SHIFT_LEFT_HI_EXPR;
+          c1 = VEC_WIDEN_SHIFT_LEFT_LO_EXPR;
+        }
+      break;
+
     CASE_CONVERT:
       if (BYTES_BIG_ENDIAN)
         {
Index: tree-inline.c
===================================================================
--- tree-inline.c	(revision 178942)
+++ tree-inline.c	(working copy)
@@ -3354,6 +3354,7 @@ estimate_operator_cost (enum tree_code code, eni_w
     case DOT_PROD_EXPR:
     case WIDEN_MULT_PLUS_EXPR:
     case WIDEN_MULT_MINUS_EXPR:
+    case WIDEN_SHIFT_LEFT_EXPR:
 
     case VEC_WIDEN_MULT_HI_EXPR:
     case VEC_WIDEN_MULT_LO_EXPR:
@@ -3368,6 +3369,8 @@ estimate_operator_cost (enum tree_code code, eni_w
     case VEC_EXTRACT_ODD_EXPR:
     case VEC_INTERLEAVE_HIGH_EXPR:
     case VEC_INTERLEAVE_LOW_EXPR:
+    case VEC_WIDEN_SHIFT_LEFT_HI_EXPR:
+    case VEC_WIDEN_SHIFT_LEFT_LO_EXPR:
 
       return 1;
 
Index: tree-vect-generic.c
===================================================================
--- tree-vect-generic.c	(revision 178942)
+++ tree-vect-generic.c	(working copy)
@@ -552,7 +552,9 @@ expand_vector_operations_1 (gimple_stmt_iterator *
       || code == VEC_UNPACK_LO_EXPR
       || code == VEC_PACK_TRUNC_EXPR
       || code == VEC_PACK_SAT_EXPR
-      || code == VEC_PACK_FIX_TRUNC_EXPR)
+      || code == VEC_PACK_FIX_TRUNC_EXPR
+      || code == VEC_WIDEN_SHIFT_LEFT_HI_EXPR
+      || code == VEC_WIDEN_SHIFT_LEFT_LO_EXPR)
     type = TREE_TYPE (rhs1);
 
   /* Optabs will try converting a negation into a subtraction, so
Index: tree-cfg.c
===================================================================
--- tree-cfg.c	(revision 178942)
+++ tree-cfg.c	(working copy)
@@ -3609,6 +3609,9 @@ do_pointer_plus_expr_check:
     case VEC_EXTRACT_ODD_EXPR:
     case VEC_INTERLEAVE_HIGH_EXPR:
     case VEC_INTERLEAVE_LOW_EXPR:
+    case WIDEN_SHIFT_LEFT_EXPR:
+    case VEC_WIDEN_SHIFT_LEFT_HI_EXPR:
+    case VEC_WIDEN_SHIFT_LEFT_LO_EXPR:
       /* FIXME.  */
       return false;
 
Index: config/arm/neon.md
===================================================================
--- config/arm/neon.md	(revision 178942)
+++ config/arm/neon.md	(working copy)
@@ -144,6 +144,8 @@
   UNSPEC_MISALIGNED_ACCESS
   UNSPEC_VCLE
   UNSPEC_VCLT
+  UNSPEC_VSHLL_LO
+  UNSPEC_VSHLL_HI
 ])
 
 
@@ -5550,6 +5552,80 @@
  }
 )
 
+(define_insn "neon_vec_<US>shiftl_lo_<mode>"
+ [(set (match_operand:<V_unpack> 0 "register_operand" "=w")
+       (unspec:<V_unpack> [(SE:<V_unpack> (vec_select:<V_HALF>
+                           (match_operand:VU 1 "register_operand" "w")
+                           (match_operand:VU 2 "vect_par_constant_low" "")))
+                           (match_operand:SI 3 "immediate_operand" "i")]
+                          UNSPEC_VSHLL_LO))]
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
+{
+  neon_const_bounds (operands[3], 0, neon_element_bits (<MODE>mode) + 1);
+  return "vshll.<US><V_sz_elem> %q0, %e1, %3";
+}
+  [(set_attr "neon_type" "neon_shift_1")]
+)
+
+(define_expand "vec_widen_<US>shiftl_lo_<mode>"
+  [(match_operand:<V_unpack> 0 "register_operand" "")
+   (SE:<V_unpack> (match_operand:VU 1 "register_operand" ""))
+   (match_operand:SI 2 "immediate_operand" "i")]
+ "TARGET_NEON && !BYTES_BIG_ENDIAN"
+ {
+   rtvec v = rtvec_alloc (<V_mode_nunits>/2);
+   rtx t1;
+   int i;
+   for (i = 0; i < (<V_mode_nunits>/2); i++)
+     RTVEC_ELT (v, i) = GEN_INT (i);
+   t1 = gen_rtx_PARALLEL (<MODE>mode, v);
+
+   emit_insn (gen_neon_vec_<US>shiftl_lo_<mode> (operands[0],
+                                                 operands[1],
+                                                 t1,
+                                                 operands[2]));
+   DONE;
+ }
+)
+
+(define_insn "neon_vec_<US>shiftl_hi_<mode>"
+ [(set (match_operand:<V_unpack> 0 "register_operand" "=w")
+       (unspec:<V_unpack> [(SE:<V_unpack> (vec_select:<V_HALF>
+                           (match_operand:VU 1 "register_operand" "w")
+                           (match_operand:VU 2 "vect_par_constant_high" "")))
+                           (match_operand:SI 3 "immediate_operand" "i")]
+                          UNSPEC_VSHLL_HI))]
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
+{
+  /* The boundaries are: 0 < imm <= size.  */
+  neon_const_bounds (operands[3], 0, neon_element_bits (<MODE>mode) + 1);
+  return "vshll.<US><V_sz_elem> %q0, %f1, %3";
+}
+  [(set_attr "neon_type" "neon_shift_1")]
+)
+
+(define_expand "vec_widen_<US>shiftl_hi_<mode>"
+  [(match_operand:<V_unpack> 0 "register_operand" "")
+   (SE:<V_unpack> (match_operand:VU 1 "register_operand" ""))
+   (match_operand:SI 2 "immediate_operand" "i")]
+ "TARGET_NEON && !BYTES_BIG_ENDIAN"
+ {
+   rtvec v = rtvec_alloc (<V_mode_nunits>/2);
+   rtx t1;
+   int i;
+   for (i = 0; i < (<V_mode_nunits>/2); i++)
+     RTVEC_ELT (v, i) = GEN_INT (<V_mode_nunits>/2 + i);
+   t1 = gen_rtx_PARALLEL (<MODE>mode, v);
+
+   emit_insn (gen_neon_vec_<US>shiftl_hi_<mode> (operands[0],
+                                                 operands[1],
+                                                 t1,
+                                                 operands[2]));
+   DONE;
+ }
+)
+
 ;; Vectorize for non-neon-quad case
 (define_insn "neon_unpack<US>_<mode>"
  [(set (match_operand:<V_widen> 0 "register_operand" "=w")
@@ -5626,6 +5702,51 @@
  }
 )
 
+(define_insn "neon_vec_<US>shift_left_<mode>"
+ [(set (match_operand:<V_widen> 0 "register_operand" "=w")
+       (unspec:<V_widen> [(SE:<V_widen>
+                           (match_operand:VDI 1 "register_operand" "w"))
+                         (match_operand:SI 2 "immediate_operand" "i")]
+                                                UNSPEC_VSHLL_N))]
+  "TARGET_NEON"
+{
+  /* The boundaries are: 0 < imm <= size.  */
+  neon_const_bounds (operands[2], 0, neon_element_bits (<MODE>mode) + 1);
+  return "vshll.<US><V_sz_elem> %q0, %P1, %2";
+}
+  [(set_attr "neon_type" "neon_shift_1")]
+)
+
+(define_expand "vec_widen_<US>shiftl_hi_<mode>"
+ [(match_operand:<V_double_width> 0 "register_operand" "")
+   (SE:<V_double_width> (match_operand:VDI 1 "register_operand" ""))
+   (match_operand:SI 2 "immediate_operand" "i")]
+ "TARGET_NEON"
+ {
+   rtx tmpreg = gen_reg_rtx (<V_widen>mode);
+   emit_insn (gen_neon_vec_<US>shift_left_<mode> (tmpreg, operands[1], operands[2]));
+   emit_insn (gen_neon_vget_high<V_widen_l> (operands[0], tmpreg));
+
+   DONE;
+
+ }
+)
+
+(define_expand "vec_widen_<US>shiftl_lo_<mode>"
+  [(match_operand:<V_double_width> 0 "register_operand" "")
+   (SE:<V_double_width> (match_operand:VDI 1 "register_operand" ""))
+   (match_operand:SI 2 "immediate_operand" "i")]
+ "TARGET_NEON"
+ {
+   rtx tmpreg = gen_reg_rtx (<V_widen>mode);
+   emit_insn (gen_neon_vec_<US>shift_left_<mode> (tmpreg, operands[1], operands[2]));
+   emit_insn (gen_neon_vget_low<V_widen_l> (operands[0], tmpreg));
+
+   DONE;
+
+ }
+)
+
 ; FIXME: These instruction patterns can't be used safely in big-endian mode
 ; because the ordering of vector elements in Q registers is different from what
 ; the semantics of the instructions require.
Index: tree-vect-slp.c
===================================================================
--- tree-vect-slp.c	(revision 178942)
+++ tree-vect-slp.c	(working copy)
@@ -459,6 +459,11 @@ vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
 		    }
 		}
 	    }
+	  else if (rhs_code == WIDEN_SHIFT_LEFT_EXPR)
+            {
+              need_same_oprnds = true;
+              first_op1 = gimple_assign_rhs2 (stmt);
+            }
 	}
       else
 	{

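For context, the scalar shape this pattern vectorizes can be sketched as a plain C loop (a hedged sketch; the actual tests added by the patch, gcc.dg/vect/vect-widen-shift-*.c, are not reproduced here, and the shift amount 7 is just an illustrative constant that fits the narrow type):

```c
#include <stdint.h>

/* Each iteration widens a 16-bit element to 32 bits and shifts it
   left by a constant -- the  a_T = (TYPE) a_t; res_T = a_T << CONST
   shape that vect_recog_widen_shift_pattern looks for.  */
void
widen_shift (const int16_t *a, int32_t *res, int n)
{
  int i;
  for (i = 0; i < n; i++)
    res[i] = (int32_t) a[i] << 7;
}
```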
^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch] Support vectorization of widening shifts
  2011-09-19  8:26 [patch] Support vectorization of widening shifts Ira Rosen
@ 2011-09-26 14:40 ` Richard Guenther
  2011-09-27  7:40   ` Ira Rosen
  2011-09-29 15:42 ` Ramana Radhakrishnan
  1 sibling, 1 reply; 9+ messages in thread
From: Richard Guenther @ 2011-09-26 14:40 UTC (permalink / raw)
  To: Ira Rosen; +Cc: gcc-patches, Patch Tracking

On Mon, Sep 19, 2011 at 9:54 AM, Ira Rosen <ira.rosen@linaro.org> wrote:
> Hi,
>
> This patch adds a support of widening shift left. The following
> pattern is detected:
>
> type a_t;
> TYPE a_T, res_T;
>
> a_t = ;
> a_T = (TYPE) a_t;
> res_T = a_T << CONST;
>
> ('TYPE' is at least 2 times bigger than 'type', and CONST is at most
> the size of 'type')
>
> and create a pattern stmt using new tree code WIDEN_SHIFT_LEFT_EXPR for it:
>
> a_t = ;
> a_T = (TYPE) a_t;
> res_T = a_T << CONST;
>    -->  res_T = a_t w<< CONST;
>
> which is later transformed into:
>
> va_t = ;
> vres_T0 = WIDEN_SHIFT_LEFT_LO_EXPR <va_t, CONST>;
> vres_T1 = WIDEN_SHIFT_LEFT_HI_EXPR <va_t, CONST>;
>
> This patch also supports unsigned types, and cases when 'TYPE' is 4
> times bigger than 'type'.
> This feature is similar to vectorization of widening multiplication.
>
> Bootstrapped on powerpc64-suse-linux, tested on powerpc64-suse-linux
> and arm-linux-gnueabi
> OK for mainline?

Hmm, it doesn't look like arm has real widening shift instructions.  So
why not split this into the widening and shift parts in the vectorizer?  That
way you wouldn't need new tree codes and all architectures that can
do widening conversions would benefit?

Thanks,
Richard.

> Thanks,
> Ira
>
> ChangeLog:
>
>        * doc/md.texi (vec_widen_ushiftl_hi, vec_widen_ushiftl_lo,
> vec_widen_sshiftl_hi,
>        vec_widen_sshiftl_lo): Document.
>        * tree-pretty-print.c (dump_generic_node): Handle WIDEN_SHIFT_LEFT_EXPR,
>        VEC_WIDEN_SHIFT_LEFT_HI_EXPR and VEC_WIDEN_SHIFT_LEFT_LO_EXPR.
>        (op_code_prio): Likewise.
>        (op_symbol_code): Handle WIDEN_SHIFT_LEFT_EXPR.
>        * optabs.c (optab_for_tree_code): Handle
>        VEC_WIDEN_SHIFT_LEFT_HI_EXPR and VEC_WIDEN_SHIFT_LEFT_LO_EXPR.
>        (init-optabs): Initialize optab codes for vec_widen_u/sshiftl_hi/lo.
>        * optabs.h (enum optab_index): Add OTI_vec_widen_u/sshiftl_hi/lo.
>        * genopinit.c (optabs): Initialize the new optabs.
>        * expr.c (expand_expr_real_2): Handle
>        VEC_WIDEN_SHIFT_LEFT_HI_EXPR and VEC_WIDEN_SHIFT_LEFT_LO_EXPR.
>        * gimple-pretty-print.c (dump_binary_rhs): Likewise.
>        * tree-vectorizer.h (NUM_PATTERNS): Increase to 6.
>        * tree.def (WIDEN_SHIFT_LEFT_EXPR, VEC_WIDEN_SHIFT_LEFT_HI_EXPR,
>        VEC_WIDEN_SHIFT_LEFT_LO_EXPR): New.
>        * cfgexpand.c (expand_debug_expr):  Handle new tree codes.
>        * tree-vect-patterns.c (vect_vect_recog_func_ptrs): Add
>        vect_recog_widen_shift_pattern.
>        (vect_handle_widen_mult_by_const): Rename...
>        (vect_handle_widen_op_by_const): ...to this.  Handle shifts.
>        Add a new argument, update documentation.
>        (vect_recog_widen_mult_pattern): Assume that only second
>        operand can be constant.  Update call to
>        vect_handle_widen_op_by_const.
>        (vect_operation_fits_smaller_type): Add the already existing
>        def stmt to the list of pattern statements.
>        (vect_recog_widen_shift_pattern): New.
>        * tree-vect-stmts.c (vectorizable_type_promotion): Handle
>        widening shifts.
>        (supportable_widening_operation): Likewise.
>        * tree-inline.c (estimate_operator_cost): Handle new tree codes.
>        * tree-vect-generic.c (expand_vector_operations_1): Likewise.
>        * tree-cfg.c (verify_gimple_assign_binary): Likewise.
>        * config/arm/neon.md (neon_vec_<US>shiftl_lo_<mode>): New.
>        (vec_widen_<US>shiftl_lo_<mode>, neon_vec_<US>shiftl_hi_<mode>,
>        vec_widen_<US>shiftl_hi_<mode>, neon_vec_<US>shift_left_<mode>):
>        Likewise.
>        * tree-vect-slp.c (vect_build_slp_tree): Require same shift operand
>        for widening shift.
>
> testsuite/ChangeLog:
>
>       * gcc.dg/vect/vect-widen-shift-s16.c: New.
>       * gcc.dg/vect/vect-widen-shift-s8.c: New.
>       * gcc.dg/vect/vect-widen-shift-u16.c: New.
>       * gcc.dg/vect/vect-widen-shift-u8.c: New.
>


* Re: [patch] Support vectorization of widening shifts
  2011-09-26 14:40 ` Richard Guenther
@ 2011-09-27  7:40   ` Ira Rosen
  2011-09-27 13:17     ` Richard Guenther
  0 siblings, 1 reply; 9+ messages in thread
From: Ira Rosen @ 2011-09-27  7:40 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, Patch Tracking

On 26 September 2011 17:12, Richard Guenther <richard.guenther@gmail.com> wrote:
> On Mon, Sep 19, 2011 at 9:54 AM, Ira Rosen <ira.rosen@linaro.org> wrote:
>> Hi,
>>
>> This patch adds a support of widening shift left. The following
>> pattern is detected:
>>
>> type a_t;
>> TYPE a_T, res_T;
>>
>> a_t = ;
>> a_T = (TYPE) a_t;
>> res_T = a_T << CONST;
>>
>> ('TYPE' is at least 2 times bigger than 'type', and CONST is at most
>> the size of 'type')
>>
>> and create a pattern stmt using new tree code WIDEN_SHIFT_LEFT_EXPR for it:
>>
>> a_t = ;
>> a_T = (TYPE) a_t;
>> res_T = a_T << CONST;
>>    -->  res_T = a_t w<< CONST;
>>
>> which is later transformed into:
>>
>> va_t = ;
>> vres_T0 = WIDEN_SHIFT_LEFT_LO_EXPR <va_t, CONST>;
>> vres_T1 = WIDEN_SHIFT_LEFT_HI_EXPR <va_t, CONST>;
>>
>> This patch also supports unsigned types, and cases when 'TYPE' is 4
>> times bigger than 'type'.
>> This feature is similar to vectorization of widening multiplication.
>>
>> Bootstrapped on powerpc64-suse-linux, tested on powerpc64-suse-linux
>> and arm-linux-gnueabi
>> OK for mainline?
>
> Hmm, it doesn't look like arm has real widening shift instructions.

It does: vshll. The implementation may look awkward because we don't
support multiple vector sizes in the same operation (vshll takes a
64-bit vector and produces a 128-bit vector), but the resulting code
is just the instruction itself.
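(For readers following along: elementwise, vshll widens each narrow lane while shifting it left, and the LO/HI tree codes select which half of the input vector feeds it.  A rough C model of the little-endian case -- my sketch, not code from the patch -- would be:)

```c
#include <stdint.h>

/* Rough model of the vectorized form on a little-endian target:
     vres_T0 = WIDEN_SHIFT_LEFT_LO_EXPR <va_t, CONST>;
     vres_T1 = WIDEN_SHIFT_LEFT_HI_EXPR <va_t, CONST>;
   One 8-lane narrow vector yields two 4-lane widened vectors.  */
void
widen_lshift_lo_hi (const int16_t va[8], int32_t lo[4], int32_t hi[4], int n)
{
  int i;
  for (i = 0; i < 4; i++)
    {
      lo[i] = (int32_t) va[i] << n;      /* low lanes 0..3  */
      hi[i] = (int32_t) va[4 + i] << n;  /* high lanes 4..7 */
    }
}
```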

> So why not split this into the widening, shift parts in the vectorizer?

What do you mean? (We of course already support widening first and
then shifting the widened value).

Thanks,
Ira

> That
> way you wouldn't need new tree codes and all architectures that can
> do widening conversions would benefit?
>
> Thanks,
> Richard.
>


* Re: [patch] Support vectorization of widening shifts
  2011-09-27  7:40   ` Ira Rosen
@ 2011-09-27 13:17     ` Richard Guenther
  0 siblings, 0 replies; 9+ messages in thread
From: Richard Guenther @ 2011-09-27 13:17 UTC (permalink / raw)
  To: Ira Rosen; +Cc: gcc-patches, Patch Tracking

On Tue, Sep 27, 2011 at 8:32 AM, Ira Rosen <ira.rosen@linaro.org> wrote:
> On 26 September 2011 17:12, Richard Guenther <richard.guenther@gmail.com> wrote:
>> On Mon, Sep 19, 2011 at 9:54 AM, Ira Rosen <ira.rosen@linaro.org> wrote:
>>> Hi,
>>>
>>> This patch adds a support of widening shift left. The following
>>> pattern is detected:
>>>
>>> type a_t;
>>> TYPE a_T, res_T;
>>>
>>> a_t = ;
>>> a_T = (TYPE) a_t;
>>> res_T = a_T << CONST;
>>>
>>> ('TYPE' is at least 2 times bigger than 'type', and CONST is at most
>>> the size of 'type')
>>>
>>> and create a pattern stmt using new tree code WIDEN_SHIFT_LEFT_EXPR for it:
>>>
>>> a_t = ;
>>> a_T = (TYPE) a_t;
>>> res_T = a_T << CONST;
>>>    -->  res_T = a_t w<< CONST;
>>>
>>> which is later transformed into:
>>>
>>> va_t = ;
>>> vres_T0 = WIDEN_SHIFT_LEFT_LO_EXPR <va_t, CONST>;
>>> vres_T1 = WIDEN_SHIFT_LEFT_HI_EXPR <va_t, CONST>;
>>>
>>> This patch also supports unsigned types, and cases when 'TYPE' is 4
>>> times bigger than 'type'.
>>> This feature is similar to vectorization of widening multiplication.
>>>
>>> Bootstrapped on powerpc64-suse-linux, tested on powerpc64-suse-linux
>>> and arm-linux-gnueabi
>>> OK for mainline?
>>
>> Hmm, it doesn't look like arm has real widening shift instructions.
>
> It does: vshll. The implementation may look awkward because we don't
> support multiple vector sizes in the same operation (vshll takes a
> 64-bit vector and produces a 128-bit vector), but the resulting code
> is just the instruction itself.

Ah, ok.  Can you please do s/SHIFT_LEFT/LSHIFT/ on the new tree
code names for consistency with LSHIFT_EXPR?  Also please implement
some gimple verification for the new codes instead of adding them
to the /* FIXME */ case in tree-cfg.c.

>> So why not split this into the widening, shift parts in the vectorizer?
>
> What do you mean? (We of course already support widening first and
> then shifting the widened value).

Of course.

Ok with the above changes.

Thanks,
Richard.

> Thanks,
> Ira
>
>> That
>> way you wouldn't need new tree codes and all architectures that can
>> do widening conversions would benefit?
>>
>> Thanks,
>> Richard.
>>
>


* Re: [patch] Support vectorization of widening shifts
  2011-09-19  8:26 [patch] Support vectorization of widening shifts Ira Rosen
  2011-09-26 14:40 ` Richard Guenther
@ 2011-09-29 15:42 ` Ramana Radhakrishnan
  2011-10-02  8:31   ` Ira Rosen
  1 sibling, 1 reply; 9+ messages in thread
From: Ramana Radhakrishnan @ 2011-09-29 15:42 UTC (permalink / raw)
  To: Ira Rosen; +Cc: gcc-patches, Patch Tracking

On 19 September 2011 08:54, Ira Rosen <ira.rosen@linaro.org> wrote:

>
> Bootstrapped on powerpc64-suse-linux, tested on powerpc64-suse-linux
> and arm-linux-gnueabi
> OK for mainline?

Sorry I missed this patch.  Is there any reason why we need UNSPECs in
this case?  Can't this be represented by subregs and zero/sign
extensions in RTL without the UNSPECs?

cheers
Ramana

>
> Thanks,
> Ira
>
> ChangeLog:
>
>        * doc/md.texi (vec_widen_ushiftl_hi, vec_widen_ushiftl_lo,
> vec_widen_sshiftl_hi,
>        vec_widen_sshiftl_lo): Document.
>        * tree-pretty-print.c (dump_generic_node): Handle WIDEN_SHIFT_LEFT_EXPR,
>        VEC_WIDEN_SHIFT_LEFT_HI_EXPR and VEC_WIDEN_SHIFT_LEFT_LO_EXPR.
>        (op_code_prio): Likewise.
>        (op_symbol_code): Handle WIDEN_SHIFT_LEFT_EXPR.
>        * optabs.c (optab_for_tree_code): Handle
>        VEC_WIDEN_SHIFT_LEFT_HI_EXPR and VEC_WIDEN_SHIFT_LEFT_LO_EXPR.
>        (init-optabs): Initialize optab codes for vec_widen_u/sshiftl_hi/lo.
>        * optabs.h (enum optab_index): Add OTI_vec_widen_u/sshiftl_hi/lo.
>        * genopinit.c (optabs): Initialize the new optabs.
>        * expr.c (expand_expr_real_2): Handle
>        VEC_WIDEN_SHIFT_LEFT_HI_EXPR and VEC_WIDEN_SHIFT_LEFT_LO_EXPR.
>        * gimple-pretty-print.c (dump_binary_rhs): Likewise.
>        * tree-vectorizer.h (NUM_PATTERNS): Increase to 6.
>        * tree.def (WIDEN_SHIFT_LEFT_EXPR, VEC_WIDEN_SHIFT_LEFT_HI_EXPR,
>        VEC_WIDEN_SHIFT_LEFT_LO_EXPR): New.
>        * cfgexpand.c (expand_debug_expr):  Handle new tree codes.
>        * tree-vect-patterns.c (vect_vect_recog_func_ptrs): Add
>        vect_recog_widen_shift_pattern.
>        (vect_handle_widen_mult_by_const): Rename...
>        (vect_handle_widen_op_by_const): ...to this.  Handle shifts.
>        Add a new argument, update documentation.
>        (vect_recog_widen_mult_pattern): Assume that only second
>        operand can be constant.  Update call to
>        vect_handle_widen_op_by_const.
>        (vect_operation_fits_smaller_type): Add the already existing
>        def stmt to the list of pattern statements.
>        (vect_recog_widen_shift_pattern): New.
>        * tree-vect-stmts.c (vectorizable_type_promotion): Handle
>        widening shifts.
>        (supportable_widening_operation): Likewise.
>        * tree-inline.c (estimate_operator_cost): Handle new tree codes.
>        * tree-vect-generic.c (expand_vector_operations_1): Likewise.
>        * tree-cfg.c (verify_gimple_assign_binary): Likewise.
>        * config/arm/neon.md (neon_vec_<US>shiftl_lo_<mode>): New.
>        (vec_widen_<US>shiftl_lo_<mode>, neon_vec_<US>shiftl_hi_<mode>,
>        vec_widen_<US>shiftl_hi_<mode>, neon_vec_<US>shift_left_<mode>):
>        Likewise.
>        * tree-vect-slp.c (vect_build_slp_tree): Require same shift operand
>        for widening shift.
>
> testsuite/ChangeLog:
>
>       * gcc.dg/vect/vect-widen-shift-s16.c: New.
>       * gcc.dg/vect/vect-widen-shift-s8.c: New.
>       * gcc.dg/vect/vect-widen-shift-u16.c: New.
>       * gcc.dg/vect/vect-widen-shift-u8.c: New.
>


* Re: [patch] Support vectorization of widening shifts
  2011-09-29 15:42 ` Ramana Radhakrishnan
@ 2011-10-02  8:31   ` Ira Rosen
  2011-10-18 10:48     ` Ira Rosen
  0 siblings, 1 reply; 9+ messages in thread
From: Ira Rosen @ 2011-10-02  8:31 UTC (permalink / raw)
  To: Ramana Radhakrishnan; +Cc: gcc-patches, Patch Tracking

On 29 September 2011 17:30, Ramana Radhakrishnan
<ramana.radhakrishnan@linaro.org> wrote:
> On 19 September 2011 08:54, Ira Rosen <ira.rosen@linaro.org> wrote:
>
>>
>> Bootstrapped on powerpc64-suse-linux, tested on powerpc64-suse-linux
>> and arm-linux-gnueabi
>> OK for mainline?
>
> Sorry I missed this patch. Is there any reason why we need unspecs in
> this case ? Can't this be represented by subregs and zero/ sign
> extensions in RTL without the UNSPECs ?

Like this:

Index: config/arm/neon.md
===================================================================
--- config/arm/neon.md  (revision 178942)
+++ config/arm/neon.md  (working copy)
@@ -5550,6 +5550,46 @@
  }
 )

+(define_insn "neon_vec_<US>shiftl_<mode>"
+ [(set (match_operand:<V_widen> 0 "register_operand" "=w")
+       (SE:<V_widen> (match_operand:VW 1 "register_operand" "w")))
+       (match_operand:SI 2 "immediate_operand" "i")]
+  "TARGET_NEON"
+{
+  /* The boundaries are: 0 < imm <= size.  */
+  neon_const_bounds (operands[2], 0, neon_element_bits (<MODE>mode) + 1);
+  return "vshll.<US><V_sz_elem> %q0, %P1, %2";
+}
+  [(set_attr "neon_type" "neon_shift_1")]
+)
+
+(define_expand "vec_widen_<US>shiftl_lo_<mode>"
+  [(match_operand:<V_unpack> 0 "register_operand" "")
+   (SE:<V_unpack> (match_operand:VU 1 "register_operand" ""))
+   (match_operand:SI 2 "immediate_operand" "i")]
+ "TARGET_NEON && !BYTES_BIG_ENDIAN"
+ {
+  emit_insn (gen_neon_vec_<US>shiftl_<V_half> (operands[0],
+               simplify_gen_subreg (<V_HALF>mode, operands[1], <MODE>mode, 0),
+               operands[2]));
+   DONE;
+ }
+)
+
+(define_expand "vec_widen_<US>shiftl_hi_<mode>"
+  [(match_operand:<V_unpack> 0 "register_operand" "")
+   (SE:<V_unpack> (match_operand:VU 1 "register_operand" ""))
+   (match_operand:SI 2 "immediate_operand" "i")]
+ "TARGET_NEON && !BYTES_BIG_ENDIAN"
+ {
+  emit_insn (gen_neon_vec_<US>shiftl_<V_half> (operands[0],
+                simplify_gen_subreg (<V_HALF>mode, operands[1], <MODE>mode,
+                                    GET_MODE_SIZE (<V_HALF>mode)),
+                operands[2]));
+   DONE;
+ }
+)
+
 ;; Vectorize for non-neon-quad case
 (define_insn "neon_unpack<US>_<mode>"
  [(set (match_operand:<V_widen> 0 "register_operand" "=w")
@@ -5626,6 +5666,34 @@
  }
 )

+(define_expand "vec_widen_<US>shiftl_hi_<mode>"
+ [(match_operand:<V_double_width> 0 "register_operand" "")
+   (SE:<V_double_width> (match_operand:VDI 1 "register_operand" ""))
+   (match_operand:SI 2 "immediate_operand" "i")]
+ "TARGET_NEON"
+ {
+   rtx tmpreg = gen_reg_rtx (<V_widen>mode);
+   emit_insn (gen_neon_vec_<US>shiftl_<mode> (tmpreg, operands[1],
+                                              operands[2]));
+   emit_insn (gen_neon_vget_high<V_widen_l> (operands[0], tmpreg));
+
+   DONE;
+ }
+)
+
+(define_expand "vec_widen_<US>shiftl_lo_<mode>"
+  [(match_operand:<V_double_width> 0 "register_operand" "")
+   (SE:<V_double_width> (match_operand:VDI 1 "register_operand" ""))
+   (match_operand:SI 2 "immediate_operand" "i")]
+ "TARGET_NEON"
+ {
+   rtx tmpreg = gen_reg_rtx (<V_widen>mode);
+   emit_insn (gen_neon_vec_<US>shiftl_<mode> (tmpreg, operands[1],
+                                              operands[2]));
+   emit_insn (gen_neon_vget_low<V_widen_l> (operands[0], tmpreg));
+
+   DONE;
+ }
+)
+
 ; FIXME: These instruction patterns can't be used safely in big-endian mode
 ; because the ordering of vector elements in Q registers is different from what
 ; the semantics of the instructions require.

?

Thanks,
Ira
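
[Editor's note: the lo/hi expanders sketched above can be modelled in plain C. This is a sketch with hypothetical helper names, not GCC internals; the element ordering assumes little-endian lanes, matching the !BYTES_BIG_ENDIAN condition on the expanders.]

```c
#include <stdint.h>

/* Scalar model of the vec_widen_sshiftl_lo/hi optabs on a vector of
   eight 16-bit elements: widen each element to 32 bits, then shift
   left.  "lo" covers elements 0..3, "hi" covers elements 4..7.  */
#define N 8

static void
widen_shiftl_lo (const int16_t *in, int32_t *out, int shift)
{
  for (int i = 0; i < N / 2; i++)
    out[i] = (int32_t) in[i] << shift;
}

static void
widen_shiftl_hi (const int16_t *in, int32_t *out, int shift)
{
  for (int i = 0; i < N / 2; i++)
    out[i] = (int32_t) in[N / 2 + i] << shift;
}
```

Because the shift amount is at most the width of the narrow type, the widened result never overflows the double-width element, which is what makes the single vshll instruction a valid implementation.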


>
> cheers
> Ramana
>
>>
>> Thanks,
>> Ira
>>
>> ChangeLog:
>>
>>        * doc/md.texi (vec_widen_ushiftl_hi, vec_widen_ushiftl_lo,
>> vec_widen_sshiftl_hi,
>>        vec_widen_sshiftl_lo): Document.
>>        * tree-pretty-print.c (dump_generic_node): Handle WIDEN_SHIFT_LEFT_EXPR,
>>        VEC_WIDEN_SHIFT_LEFT_HI_EXPR and VEC_WIDEN_SHIFT_LEFT_LO_EXPR.
>>        (op_code_prio): Likewise.
>>        (op_symbol_code): Handle WIDEN_SHIFT_LEFT_EXPR.
>>        * optabs.c (optab_for_tree_code): Handle
>>        VEC_WIDEN_SHIFT_LEFT_HI_EXPR and VEC_WIDEN_SHIFT_LEFT_LO_EXPR.
>>        (init-optabs): Initialize optab codes for vec_widen_u/sshiftl_hi/lo.
>>        * optabs.h (enum optab_index): Add OTI_vec_widen_u/sshiftl_hi/lo.
>>        * genopinit.c (optabs): Initialize the new optabs.
>>        * expr.c (expand_expr_real_2): Handle
>>        VEC_WIDEN_SHIFT_LEFT_HI_EXPR and VEC_WIDEN_SHIFT_LEFT_LO_EXPR.
>>        * gimple-pretty-print.c (dump_binary_rhs): Likewise.
>>        * tree-vectorizer.h (NUM_PATTERNS): Increase to 6.
>>        * tree.def (WIDEN_SHIFT_LEFT_EXPR, VEC_WIDEN_SHIFT_LEFT_HI_EXPR,
>>        VEC_WIDEN_SHIFT_LEFT_LO_EXPR): New.
>>        * cfgexpand.c (expand_debug_expr):  Handle new tree codes.
>>        * tree-vect-patterns.c (vect_vect_recog_func_ptrs): Add
>>        vect_recog_widen_shift_pattern.
>>        (vect_handle_widen_mult_by_const): Rename...
>>        (vect_handle_widen_op_by_const): ...to this.  Handle shifts.
>>        Add a new argument, update documentation.
>>        (vect_recog_widen_mult_pattern): Assume that only second
>>        operand can be constant.  Update call to
>>        vect_handle_widen_op_by_const.
>>        (vect_operation_fits_smaller_type): Add the already existing
>>        def stmt to the list of pattern statements.
>>        (vect_recog_widen_shift_pattern): New.
>>        * tree-vect-stmts.c (vectorizable_type_promotion): Handle
>>        widening shifts.
>>        (supportable_widening_operation): Likewise.
>>        * tree-inline.c (estimate_operator_cost): Handle new tree codes.
>>        * tree-vect-generic.c (expand_vector_operations_1): Likewise.
>>        * tree-cfg.c (verify_gimple_assign_binary): Likewise.
>>        * config/arm/neon.md (neon_vec_<US>shiftl_lo_<mode>): New.
>>        (vec_widen_<US>shiftl_lo_<mode>, neon_vec_<US>shiftl_hi_<mode>,
>>        vec_widen_<US>shiftl_hi_<mode>, neon_vec_<US>shift_left_<mode>):
>>        Likewise.
>>        * tree-vect-slp.c (vect_build_slp_tree): Require same shift operand
>>        for widening shift.
>>
>> testsuite/ChangeLog:
>>
>>       * gcc.dg/vect/vect-widen-shift-s16.c: New.
>>       * gcc.dg/vect/vect-widen-shift-s8.c: New.
>>       * gcc.dg/vect/vect-widen-shift-u16.c: New.
>>       * gcc.dg/vect/vect-widen-shift-u8.c: New.
>>
>


* Re: [patch] Support vectorization of widening shifts
  2011-10-02  8:31   ` Ira Rosen
@ 2011-10-18 10:48     ` Ira Rosen
  2011-10-18 10:53       ` Jakub Jelinek
  0 siblings, 1 reply; 9+ messages in thread
From: Ira Rosen @ 2011-10-18 10:48 UTC (permalink / raw)
  To: Ramana Radhakrishnan; +Cc: gcc-patches, Patch Tracking

[-- Attachment #1: Type: text/plain, Size: 691 bytes --]

On 2 October 2011 10:30, Ira Rosen <ira.rosen@linaro.org> wrote:
> On 29 September 2011 17:30, Ramana Radhakrishnan
> <ramana.radhakrishnan@linaro.org> wrote:
>> On 19 September 2011 08:54, Ira Rosen <ira.rosen@linaro.org> wrote:
>>
>>>
>>> Bootstrapped on powerpc64-suse-linux, tested on powerpc64-suse-linux
>>> and arm-linux-gnueabi
>>> OK for mainline?
>>
>> Sorry I missed this patch. Is there any reason why we need unspecs in
>> this case ? Can't this be represented by subregs and zero/ sign
>> extensions in RTL without the UNSPECs ?

I committed the attached patch with Ramana's solution for testing the
shift amount in vshll.
The patch also addresses Richard's comments.

Thanks,
Ira
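
[Editor's note: to make the committed pattern concrete, here is a minimal C loop of the kind vect_recog_widen_shift_pattern targets. It is an illustrative example, not one of the new testcases; the function name is invented.]

```c
#include <stdint.h>

#define N 64

/* Each uint8_t element is promoted to uint32_t and shifted left by a
   constant that does not exceed 8, the width of the narrow type, so
   the vectorizer can replace
       a_T = (TYPE) a_t;  res_T = a_T << 5;
   with the widening shift  res_T = a_t w<< 5;  */
void
widen_shift (const uint8_t *src, uint32_t *dst)
{
  for (int i = 0; i < N; i++)
    dst[i] = (uint32_t) src[i] << 5;
}
```

On NEON this allows a single vshll.u8 per half-vector instead of a separate unpack followed by a full-width shift.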

[-- Attachment #2: widen-shifts.txt --]
[-- Type: text/plain, Size: 47542 bytes --]

Index: doc/md.texi
===================================================================
--- doc/md.texi	(revision 180123)
+++ doc/md.texi	(working copy)
@@ -4272,6 +4272,17 @@ are vectors with N signed/unsigned elements of siz
 elements of the two vectors, and put the N/2 products of size 2*S in the
 output vector (operand 0).
 
+@cindex @code{vec_widen_ushiftl_hi_@var{m}} instruction pattern
+@cindex @code{vec_widen_ushiftl_lo_@var{m}} instruction pattern
+@cindex @code{vec_widen_sshiftl_hi_@var{m}} instruction pattern
+@cindex @code{vec_widen_sshiftl_lo_@var{m}} instruction pattern
+@item @samp{vec_widen_ushiftl_hi_@var{m}}, @samp{vec_widen_ushiftl_lo_@var{m}}
+@itemx @samp{vec_widen_sshiftl_hi_@var{m}}, @samp{vec_widen_sshiftl_lo_@var{m}}
+Signed/Unsigned widening shift left.  The first input (operand 1) is a vector
+with N signed/unsigned elements of size S@.  Operand 2 is a constant.  Shift
+the high/low elements of operand 1, and put the N/2 results of size 2*S in the
+output vector (operand 0).
+
 @cindex @code{mulhisi3} instruction pattern
 @item @samp{mulhisi3}
 Multiply operands 1 and 2, which have mode @code{HImode}, and store
Index: tree-pretty-print.c
===================================================================
--- tree-pretty-print.c	(revision 180123)
+++ tree-pretty-print.c	(working copy)
@@ -1599,6 +1599,7 @@ dump_generic_node (pretty_printer *buffer, tree no
     case RROTATE_EXPR:
     case VEC_LSHIFT_EXPR:
     case VEC_RSHIFT_EXPR:
+    case WIDEN_LSHIFT_EXPR:
     case BIT_IOR_EXPR:
     case BIT_XOR_EXPR:
     case BIT_AND_EXPR:
@@ -2297,6 +2298,22 @@ dump_generic_node (pretty_printer *buffer, tree no
       pp_string (buffer, " > ");
       break;
 
+    case VEC_WIDEN_LSHIFT_HI_EXPR:
+      pp_string (buffer, " VEC_WIDEN_LSHIFT_HI_EXPR < ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
+      pp_string (buffer, " > ");
+      break;
+
+    case VEC_WIDEN_LSHIFT_LO_EXPR:
+      pp_string (buffer, " VEC_WIDEN_LSHIFT_LO_EXPR < ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
+      pp_string (buffer, " > ");
+      break;
+
     case VEC_UNPACK_HI_EXPR:
       pp_string (buffer, " VEC_UNPACK_HI_EXPR < ");
       dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
@@ -2619,6 +2636,9 @@ op_code_prio (enum tree_code code)
     case RSHIFT_EXPR:
     case LROTATE_EXPR:
     case RROTATE_EXPR:
+    case VEC_WIDEN_LSHIFT_HI_EXPR:
+    case VEC_WIDEN_LSHIFT_LO_EXPR:
+    case WIDEN_LSHIFT_EXPR:
       return 11;
 
     case WIDEN_SUM_EXPR:
@@ -2794,6 +2814,9 @@ op_symbol_code (enum tree_code code)
     case VEC_RSHIFT_EXPR:
       return "v>>";
 
+    case WIDEN_LSHIFT_EXPR:
+      return "w<<";
+
     case POINTER_PLUS_EXPR:
       return "+";
 
Index: optabs.c
===================================================================
--- optabs.c	(revision 180123)
+++ optabs.c	(working copy)
@@ -479,6 +479,14 @@ optab_for_tree_code (enum tree_code code, const_tr
       return TYPE_UNSIGNED (type) ?
 	vec_widen_umult_lo_optab : vec_widen_smult_lo_optab;
 
+    case VEC_WIDEN_LSHIFT_HI_EXPR:
+      return TYPE_UNSIGNED (type) ?
+        vec_widen_ushiftl_hi_optab : vec_widen_sshiftl_hi_optab;
+
+    case VEC_WIDEN_LSHIFT_LO_EXPR:
+      return TYPE_UNSIGNED (type) ?
+        vec_widen_ushiftl_lo_optab : vec_widen_sshiftl_lo_optab;
+
     case VEC_UNPACK_HI_EXPR:
       return TYPE_UNSIGNED (type) ?
 	vec_unpacku_hi_optab : vec_unpacks_hi_optab;
@@ -6197,6 +6205,10 @@ init_optabs (void)
   init_optab (vec_widen_umult_lo_optab, UNKNOWN);
   init_optab (vec_widen_smult_hi_optab, UNKNOWN);
   init_optab (vec_widen_smult_lo_optab, UNKNOWN);
+  init_optab (vec_widen_ushiftl_hi_optab, UNKNOWN);
+  init_optab (vec_widen_ushiftl_lo_optab, UNKNOWN);
+  init_optab (vec_widen_sshiftl_hi_optab, UNKNOWN);
+  init_optab (vec_widen_sshiftl_lo_optab, UNKNOWN);
   init_optab (vec_unpacks_hi_optab, UNKNOWN);
   init_optab (vec_unpacks_lo_optab, UNKNOWN);
   init_optab (vec_unpacku_hi_optab, UNKNOWN);
Index: optabs.h
===================================================================
--- optabs.h	(revision 180123)
+++ optabs.h	(working copy)
@@ -351,6 +351,12 @@ enum optab_index
   OTI_vec_widen_umult_lo,
   OTI_vec_widen_smult_hi,
   OTI_vec_widen_smult_lo,
+  /* Widening shift left.
+     The high/low part of the resulting vector is returned.  */
+  OTI_vec_widen_ushiftl_hi,
+  OTI_vec_widen_ushiftl_lo,
+  OTI_vec_widen_sshiftl_hi,
+  OTI_vec_widen_sshiftl_lo,
   /* Extract and widen the high/low part of a vector of signed or
      floating point elements.  */
   OTI_vec_unpacks_hi,
@@ -544,6 +550,10 @@ enum optab_index
 #define vec_widen_umult_lo_optab (&optab_table[OTI_vec_widen_umult_lo])
 #define vec_widen_smult_hi_optab (&optab_table[OTI_vec_widen_smult_hi])
 #define vec_widen_smult_lo_optab (&optab_table[OTI_vec_widen_smult_lo])
+#define vec_widen_ushiftl_hi_optab (&optab_table[OTI_vec_widen_ushiftl_hi])
+#define vec_widen_ushiftl_lo_optab (&optab_table[OTI_vec_widen_ushiftl_lo])
+#define vec_widen_sshiftl_hi_optab (&optab_table[OTI_vec_widen_sshiftl_hi])
+#define vec_widen_sshiftl_lo_optab (&optab_table[OTI_vec_widen_sshiftl_lo])
 #define vec_unpacks_hi_optab (&optab_table[OTI_vec_unpacks_hi])
 #define vec_unpacks_lo_optab (&optab_table[OTI_vec_unpacks_lo])
 #define vec_unpacku_hi_optab (&optab_table[OTI_vec_unpacku_hi])
Index: genopinit.c
===================================================================
--- genopinit.c	(revision 180123)
+++ genopinit.c	(working copy)
@@ -271,6 +271,10 @@ static const char * const optabs[] =
   "set_optab_handler (vec_widen_umult_lo_optab, $A, CODE_FOR_$(vec_widen_umult_lo_$a$))",
   "set_optab_handler (vec_widen_smult_hi_optab, $A, CODE_FOR_$(vec_widen_smult_hi_$a$))",
   "set_optab_handler (vec_widen_smult_lo_optab, $A, CODE_FOR_$(vec_widen_smult_lo_$a$))",
+  "set_optab_handler (vec_widen_ushiftl_hi_optab, $A, CODE_FOR_$(vec_widen_ushiftl_hi_$a$))",
+  "set_optab_handler (vec_widen_ushiftl_lo_optab, $A, CODE_FOR_$(vec_widen_ushiftl_lo_$a$))",
+  "set_optab_handler (vec_widen_sshiftl_hi_optab, $A, CODE_FOR_$(vec_widen_sshiftl_hi_$a$))",
+  "set_optab_handler (vec_widen_sshiftl_lo_optab, $A, CODE_FOR_$(vec_widen_sshiftl_lo_$a$))",
   "set_optab_handler (vec_unpacks_hi_optab, $A, CODE_FOR_$(vec_unpacks_hi_$a$))",
   "set_optab_handler (vec_unpacks_lo_optab, $A, CODE_FOR_$(vec_unpacks_lo_$a$))",
   "set_optab_handler (vec_unpacku_hi_optab, $A, CODE_FOR_$(vec_unpacku_hi_$a$))",
Index: ChangeLog
===================================================================
--- ChangeLog	(revision 180123)
+++ ChangeLog	(working copy)
@@ -1,3 +1,49 @@
+2011-10-18  Ira Rosen  <ira.rosen@linaro.org>
+
+	* doc/md.texi (vec_widen_ushiftl_hi, vec_widen_ushiftl_lo,
+	vec_widen_sshiftl_hi, vec_widen_sshiftl_lo): Document.
+	* tree-pretty-print.c (dump_generic_node): Handle WIDEN_LSHIFT_EXPR,
+	VEC_WIDEN_LSHIFT_HI_EXPR and VEC_WIDEN_LSHIFT_LO_EXPR.
+	(op_code_prio): Likewise.
+	(op_symbol_code): Handle WIDEN_LSHIFT_EXPR.
+	* optabs.c (optab_for_tree_code): Handle
+	VEC_WIDEN_LSHIFT_HI_EXPR and VEC_WIDEN_LSHIFT_LO_EXPR.
+	(init-optabs): Initialize optab codes for vec_widen_u/sshiftl_hi/lo.
+	* optabs.h (enum optab_index): Add OTI_vec_widen_u/sshiftl_hi/lo.
+	* genopinit.c (optabs): Initialize the new optabs.
+	* expr.c (expand_expr_real_2): Handle
+	VEC_WIDEN_LSHIFT_HI_EXPR and VEC_WIDEN_LSHIFT_LO_EXPR.
+	* gimple-pretty-print.c (dump_binary_rhs): Likewise.
+	* tree-vectorizer.h (NUM_PATTERNS): Increase to 8.
+	* tree.def (WIDEN_LSHIFT_EXPR, VEC_WIDEN_LSHIFT_HI_EXPR,
+	VEC_WIDEN_LSHIFT_LO_EXPR): New.
+	* cfgexpand.c (expand_debug_expr): Handle new tree codes.
+	* tree-vect-patterns.c (vect_vect_recog_func_ptrs): Add
+	vect_recog_widen_shift_pattern.
+	(vect_handle_widen_mult_by_const): Rename...
+	(vect_handle_widen_op_by_const): ...to this.  Handle shifts.
+	Add a new argument, update documentation.
+	(vect_recog_widen_mult_pattern): Assume that only second
+	operand can be constant.  Update call to
+	vect_handle_widen_op_by_const.
+	(vect_recog_over_widening_pattern): Fix typo.
+	(vect_recog_widen_shift_pattern): New.
+	* tree-vect-stmts.c (vectorizable_type_promotion): Handle
+	widening shifts.
+	(supportable_widening_operation): Likewise.
+	* tree-inline.c (estimate_operator_cost): Handle new tree codes.
+	* tree-vect-generic.c (expand_vector_operations_1): Likewise.
+	* tree-cfg.c (verify_gimple_assign_binary): Likewise.
+	* config/arm/neon.md (neon_vec_<US>shiftl_<mode>): New.
+	(vec_widen_<US>shiftl_lo_<mode>, neon_vec_<US>shiftl_hi_<mode>,
+	vec_widen_<US>shiftl_hi_<mode>, neon_vec_<US>shift_left_<mode>):
+	Likewise.
+	* config/arm/predicates.md (const_neon_scalar_shift_amount_operand):
+	New.
+	* config/arm/iterators.md (V_innermode): New.
+	* tree-vect-slp.c (vect_build_slp_tree): Require same shift operand
+	for widening shift.
+
 2011-10-17  Eric Botcazou  <ebotcazou@adacore.com>
 
 	* config/sparc/sparc.md (in_call_delay): Fix formatting issues.
Index: testsuite/lib/target-supports.exp
===================================================================
--- testsuite/lib/target-supports.exp	(revision 180123)
+++ testsuite/lib/target-supports.exp	(working copy)
@@ -2907,6 +2907,26 @@ proc check_effective_target_vect_widen_mult_hi_to_
 }
 
 # Return 1 if the target plus current options supports a vector
+# widening shift, 0 otherwise.
+#
+# This won't change for different subtargets so cache the result.
+
+proc check_effective_target_vect_widen_shift { } {
+    global et_vect_widen_shift_saved
+
+    if [info exists et_vect_widen_shift_saved] {
+        verbose "check_effective_target_vect_widen_shift: using cached result" 2
+    } else {
+        set et_vect_widen_shift_saved 0
+        if { ([istarget arm*-*-*] && [check_effective_target_arm_neon_ok]) } {
+            set et_vect_widen_shift_saved 1
+        }
+    }
+    verbose "check_effective_target_vect_widen_shift: returning $et_vect_widen_shift_saved" 2
+    return $et_vect_widen_shift_saved
+}
+
+# Return 1 if the target plus current options supports a vector
 # dot-product of signed chars, 0 otherwise.
 #
 # This won't change for different subtargets so cache the result.
Index: testsuite/gcc.dg/vect/vect-widen-shift-u16.c
===================================================================
--- testsuite/gcc.dg/vect/vect-widen-shift-u16.c	(revision 0)
+++ testsuite/gcc.dg/vect/vect-widen-shift-u16.c	(revision 0)
@@ -0,0 +1,58 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 64
+#define C 7
+
+__attribute__ ((noinline)) void
+foo (unsigned short *src, unsigned int *dst)
+{
+  int i;
+  unsigned short b, *s = src;
+  unsigned int *d = dst;
+
+  for (i = 0; i < N; i++)
+    {
+      b = *s++;
+      *d = b << C;
+      d++;
+    }
+
+  s = src;
+  d = dst;
+  for (i = 0; i < N; i++)
+    {
+      b = *s++;
+      if (*d != b << C)
+        abort ();
+      d++;
+    }
+}
+
+int main (void)
+{
+  int i;
+  unsigned short in[N];
+  unsigned int out[N];
+
+  check_vect ();
+
+  for (i = 0; i < N; i++)
+    {
+      in[i] = i;
+      out[i] = 255;
+      __asm__ volatile ("");
+    }
+
+  foo (in, out);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 1 "vect" { target vect_widen_shift } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: testsuite/gcc.dg/vect/vect-widen-shift-s8.c
===================================================================
--- testsuite/gcc.dg/vect/vect-widen-shift-s8.c	(revision 0)
+++ testsuite/gcc.dg/vect/vect-widen-shift-s8.c	(revision 0)
@@ -0,0 +1,58 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 64
+#define C 12
+
+__attribute__ ((noinline)) void
+foo (char *src, int *dst)
+{
+  int i;
+  char b, *s = src;
+  int *d = dst;
+
+  for (i = 0; i < N; i++)
+    {
+      b = *s++;
+      *d = b << C;
+      d++;
+    }
+
+  s = src;
+  d = dst;
+  for (i = 0; i < N; i++)
+    {
+      b = *s++;
+      if (*d != b << C)
+        abort ();
+      d++;
+    }
+}
+
+int main (void)
+{
+  int i;
+  char in[N];
+  int out[N];
+
+  check_vect ();
+
+  for (i = 0; i < N; i++)
+    {
+      in[i] = i;
+      out[i] = 255;
+      __asm__ volatile ("");
+    }
+
+  foo (in, out);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 1 "vect" { target vect_widen_shift } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: testsuite/gcc.dg/vect/vect-widen-shift-u8.c
===================================================================
--- testsuite/gcc.dg/vect/vect-widen-shift-u8.c	(revision 0)
+++ testsuite/gcc.dg/vect/vect-widen-shift-u8.c	(revision 0)
@@ -0,0 +1,65 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 64
+#define C1 10
+#define C2 5
+
+__attribute__ ((noinline)) void
+foo (unsigned char *src, unsigned int *dst1, unsigned int *dst2)
+{
+  int i;
+  unsigned char b, *s = src;
+  unsigned int *d1 = dst1, *d2 = dst2;
+
+  for (i = 0; i < N; i++)
+    {
+      b = *s++;
+      *d1 = b << C1;
+      d1++;
+      *d2 = b << C2;
+      d2++;
+    }
+
+  s = src;
+  d1 = dst1;
+  d2 = dst2;
+  for (i = 0; i < N; i++)
+    {
+      b = *s++;
+      if (*d1 != b << C1 || *d2 != b << C2)
+        abort ();
+      d1++;
+      d2++;
+    }
+}
+
+int main (void)
+{
+  int i;
+  unsigned char in[N];
+  unsigned int out1[N];
+  unsigned int out2[N];
+
+  check_vect ();
+
+  for (i = 0; i < N; i++)
+    {
+      in[i] = i;
+      out1[i] = 255;
+      out2[i] = 255;
+      __asm__ volatile ("");
+    }
+
+  foo (in, out1, out2);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 1 "vect" { target vect_widen_shift } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: testsuite/gcc.dg/vect/vect-widen-shift-s16.c
===================================================================
--- testsuite/gcc.dg/vect/vect-widen-shift-s16.c	(revision 0)
+++ testsuite/gcc.dg/vect/vect-widen-shift-s16.c	(revision 0)
@@ -0,0 +1,107 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 64
+#define C 16
+
+__attribute__ ((noinline)) void
+foo (short *src, int *dst)
+{
+  int i;
+  short b, b0, b1, b2, b3, *s = src;
+  int *d = dst;
+
+  for (i = 0; i < N/4; i++)
+    {
+      b0 = *s++;
+      b1 = *s++;
+      b2 = *s++;
+      b3 = *s++;
+      *d = b0 << C;
+      d++;
+      *d = b1 << C;
+      d++;
+      *d = b2 << C;
+      d++;
+      *d = b3 << C;
+      d++;
+    }
+
+  s = src;
+  d = dst;
+  for (i = 0; i < N; i++)
+    {
+      b = *s++;
+      if (*d != b << C)
+        abort ();
+      d++;
+    }
+
+  s = src;
+  d = dst;
+  for (i = 0; i < N/4; i++)
+    {
+      b0 = *s++;
+      b1 = *s++;
+      b2 = *s++;
+      b3 = *s++;
+      *d = b0 << C;
+      d++;
+      *d = b1 << C;
+      d++;
+      *d = b2 << C;
+      d++;
+      *d = b3 << 6;
+      d++;
+    }
+
+  s = src;
+  d = dst;
+  for (i = 0; i < N/4; i++)
+    {
+      b = *s++;
+      if (*d != b << C)
+        abort ();
+      d++;
+      b = *s++;
+      if (*d != b << C)
+        abort ();
+      d++;
+      b = *s++;
+      if (*d != b << C)
+        abort ();
+      d++;
+      b = *s++;
+      if (*d != b << 6)
+        abort ();
+      d++;
+    }
+}
+
+int main (void)
+{
+  int i;
+  short in[N];
+  int out[N];
+
+  check_vect ();
+
+  for (i = 0; i < N; i++)
+    {
+      in[i] = i;
+      out[i] = 255;
+      __asm__ volatile ("");
+    }
+
+  foo (in, out);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 8 "vect" { target vect_widen_shift } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: testsuite/ChangeLog
===================================================================
--- testsuite/ChangeLog	(revision 180123)
+++ testsuite/ChangeLog	(working copy)
@@ -1,3 +1,12 @@
+2011-10-18  Ira Rosen  <ira.rosen@linaro.org>
+
+	* lib/target-supports.exp
+	(check_effective_target_vect_widen_shift): New.
+	* gcc.dg/vect/vect-widen-shift-s16.c: New.
+	* gcc.dg/vect/vect-widen-shift-s8.c: New.
+	* gcc.dg/vect/vect-widen-shift-u16.c: New.
+	* gcc.dg/vect/vect-widen-shift-u8.c: New.
+
 2011-10-17  Michael Spertus  <mike_spertus@symantec.com>
 
 	* g++.dg/ext/bases.C: New test.
Index: expr.c
===================================================================
--- expr.c	(revision 180123)
+++ expr.c	(working copy)
@@ -8711,6 +8711,19 @@ expand_expr_real_2 (sepops ops, rtx target, enum m
 	return target;
       }
 
+    case VEC_WIDEN_LSHIFT_HI_EXPR:
+    case VEC_WIDEN_LSHIFT_LO_EXPR:
+      {
+        tree oprnd0 = treeop0;
+        tree oprnd1 = treeop1;
+
+        expand_operands (oprnd0, oprnd1, NULL_RTX, &op0, &op1, EXPAND_NORMAL);
+        target = expand_widen_pattern_expr (ops, op0, op1, NULL_RTX,
+                                            target, unsignedp);
+        gcc_assert (target);
+        return target;
+      }
+
     case VEC_PACK_TRUNC_EXPR:
     case VEC_PACK_SAT_EXPR:
     case VEC_PACK_FIX_TRUNC_EXPR:
Index: gimple-pretty-print.c
===================================================================
--- gimple-pretty-print.c	(revision 180123)
+++ gimple-pretty-print.c	(working copy)
@@ -343,6 +343,8 @@ dump_binary_rhs (pretty_printer *buffer, gimple gs
     case VEC_EXTRACT_ODD_EXPR:
     case VEC_INTERLEAVE_HIGH_EXPR:
     case VEC_INTERLEAVE_LOW_EXPR:
+    case VEC_WIDEN_LSHIFT_HI_EXPR:
+    case VEC_WIDEN_LSHIFT_LO_EXPR:
       for (p = tree_code_name [(int) code]; *p; p++)
 	pp_character (buffer, TOUPPER (*p));
       pp_string (buffer, " <");
Index: tree-vectorizer.h
===================================================================
--- tree-vectorizer.h	(revision 180123)
+++ tree-vectorizer.h	(working copy)
@@ -902,7 +902,7 @@ extern void vect_slp_transform_bb (basic_block);
    Additional pattern recognition functions can (and will) be added
    in the future.  */
 typedef gimple (* vect_recog_func_ptr) (VEC (gimple, heap) **, tree *, tree *);
-#define NUM_PATTERNS 7
+#define NUM_PATTERNS 8
 void vect_pattern_recog (loop_vec_info);
 
 /* In tree-vectorizer.c.  */
Index: tree.def
===================================================================
--- tree.def	(revision 180123)
+++ tree.def	(working copy)
@@ -1125,6 +1125,19 @@ DEFTREECODE (WIDEN_MULT_PLUS_EXPR, "widen_mult_plu
    is subtracted from t3.  */
 DEFTREECODE (WIDEN_MULT_MINUS_EXPR, "widen_mult_minus_expr", tcc_expression, 3)
 
+/* Widening shift left.
+   The first operand is of type t1.
+   The second operand is the number of bits to shift by; it need not be the
+   same type as the first operand and result.
+   Note that the result is undefined if the second operand is larger
+   than or equal to the first operand's type size.
+   The type of the entire expression is t2, such that t2 is at least twice
+   the size of t1.
+   WIDEN_LSHIFT_EXPR is equivalent to first widening (promoting)
+   the first argument from type t1 to type t2, and then shifting it
+   by the second argument.  */
+DEFTREECODE (WIDEN_LSHIFT_EXPR, "widen_lshift_expr", tcc_binary, 2)
+
 /* Fused multiply-add.
    All operands and the result are of the same type.  No intermediate
    rounding is performed after multiplying operand one with operand two
@@ -1180,6 +1193,16 @@ DEFTREECODE (VEC_EXTRACT_ODD_EXPR, "vec_extractodd
 DEFTREECODE (VEC_INTERLEAVE_HIGH_EXPR, "vec_interleavehigh_expr", tcc_binary, 2)
 DEFTREECODE (VEC_INTERLEAVE_LOW_EXPR, "vec_interleavelow_expr", tcc_binary, 2)
 
+/* Widening vector shift left in bits.
+   Operand 0 is a vector to be shifted with N elements of size S.
+   Operand 1 is an integer shift amount in bits.
+   The result of the operation is N elements of size 2*S.
+   VEC_WIDEN_LSHIFT_HI_EXPR computes the N/2 high results.
+   VEC_WIDEN_LSHIFT_LO_EXPR computes the N/2 low results.  */
+DEFTREECODE (VEC_WIDEN_LSHIFT_HI_EXPR, "widen_lshift_hi_expr", tcc_binary, 2)
+DEFTREECODE (VEC_WIDEN_LSHIFT_LO_EXPR, "widen_lshift_lo_expr", tcc_binary, 2)
+
 /* PREDICT_EXPR.  Specify hint for branch prediction.  The
    PREDICT_EXPR_PREDICTOR specify predictor and PREDICT_EXPR_OUTCOME the
    outcome (0 for not taken and 1 for taken).  Once the profile is guessed
Index: cfgexpand.c
===================================================================
--- cfgexpand.c	(revision 180123)
+++ cfgexpand.c	(working copy)
@@ -3265,6 +3265,8 @@ expand_debug_expr (tree exp)
     case VEC_UNPACK_LO_EXPR:
     case VEC_WIDEN_MULT_HI_EXPR:
     case VEC_WIDEN_MULT_LO_EXPR:
+    case VEC_WIDEN_LSHIFT_HI_EXPR:
+    case VEC_WIDEN_LSHIFT_LO_EXPR:
       return NULL;
 
    /* Misc codes.  */
Index: tree-vect-patterns.c
===================================================================
--- tree-vect-patterns.c	(revision 180123)
+++ tree-vect-patterns.c	(working copy)
@@ -49,6 +49,8 @@ static gimple vect_recog_dot_prod_pattern (VEC (gi
 static gimple vect_recog_pow_pattern (VEC (gimple, heap) **, tree *, tree *);
 static gimple vect_recog_over_widening_pattern (VEC (gimple, heap) **, tree *,
                                                  tree *);
+static gimple vect_recog_widen_shift_pattern (VEC (gimple, heap) **,
+	                                tree *, tree *);
 static gimple vect_recog_mixed_size_cond_pattern (VEC (gimple, heap) **,
 						  tree *, tree *);
 static gimple vect_recog_bool_pattern (VEC (gimple, heap) **, tree *, tree *);
@@ -58,10 +60,10 @@ static vect_recog_func_ptr vect_vect_recog_func_pt
 	vect_recog_dot_prod_pattern,
 	vect_recog_pow_pattern,
 	vect_recog_over_widening_pattern,
+	vect_recog_widen_shift_pattern,
 	vect_recog_mixed_size_cond_pattern,
 	vect_recog_bool_pattern};
 
-
 /* Function widened_name_p
 
    Check whether NAME, an ssa-name used in USE_STMT,
@@ -340,27 +342,37 @@ vect_recog_dot_prod_pattern (VEC (gimple, heap) **
 }
 
 
-/* Handle two cases of multiplication by a constant.  The first one is when
-   the constant, CONST_OPRND, fits the type (HALF_TYPE) of the second
-   operand (OPRND).  In that case, we can peform widen-mult from HALF_TYPE to
-   TYPE.
+/* Handle widening operation by a constant.  At the moment we support MULT_EXPR
+   and LSHIFT_EXPR.
 
+   For MULT_EXPR we check that CONST_OPRND fits HALF_TYPE, and for LSHIFT_EXPR
+   we check that CONST_OPRND is less than or equal to the size of HALF_TYPE.
+
    Otherwise, if the type of the result (TYPE) is at least 4 times bigger than
-   HALF_TYPE, and CONST_OPRND fits an intermediate type (2 times smaller than
-   TYPE), we can perform widen-mult from the intermediate type to TYPE and
-   replace a_T = (TYPE) a_t; with a_it - (interm_type) a_t;  */
+   HALF_TYPE, and there is an intermediate type (2 times smaller than TYPE)
+   that satisfies the above restrictions, we can perform a widening operation
+   from the intermediate type to TYPE and replace a_T = (TYPE) a_t;
+   with a_it = (interm_type) a_t;  */
 
 static bool
-vect_handle_widen_mult_by_const (gimple stmt, tree const_oprnd, tree *oprnd,
-   			         VEC (gimple, heap) **stmts, tree type,
-			         tree *half_type, gimple def_stmt)
+vect_handle_widen_op_by_const (gimple stmt, enum tree_code code,
+		               tree const_oprnd, tree *oprnd,
+   		               VEC (gimple, heap) **stmts, tree type,
+			       tree *half_type, gimple def_stmt)
 {
   tree new_type, new_oprnd, tmp;
   gimple new_stmt;
   loop_vec_info loop_info = STMT_VINFO_LOOP_VINFO (vinfo_for_stmt (stmt));
   struct loop *loop = LOOP_VINFO_LOOP (loop_info);
 
-  if (int_fits_type_p (const_oprnd, *half_type))
+  if (code != MULT_EXPR && code != LSHIFT_EXPR)
+    return false;
+
+  if (((code == MULT_EXPR && int_fits_type_p (const_oprnd, *half_type))
+        || (code == LSHIFT_EXPR
+            && compare_tree_int (const_oprnd, TYPE_PRECISION (*half_type))
+	    	!= 1))
+      && TYPE_PRECISION (type) == (TYPE_PRECISION (*half_type) * 2))
     {
       /* CONST_OPRND is a constant of HALF_TYPE.  */
       *oprnd = gimple_assign_rhs1 (def_stmt);
@@ -373,14 +385,16 @@ static bool
       || !vinfo_for_stmt (def_stmt))
     return false;
 
-  /* TYPE is 4 times bigger than HALF_TYPE, try widen-mult for
+  /* TYPE is 4 times bigger than HALF_TYPE, try widening operation for
      a type 2 times bigger than HALF_TYPE.  */
   new_type = build_nonstandard_integer_type (TYPE_PRECISION (type) / 2,
                                              TYPE_UNSIGNED (type));
-  if (!int_fits_type_p (const_oprnd, new_type))
+  if ((code == MULT_EXPR && !int_fits_type_p (const_oprnd, new_type))
+      || (code == LSHIFT_EXPR
+          && compare_tree_int (const_oprnd, TYPE_PRECISION (new_type)) == 1))
     return false;
 
-  /* Use NEW_TYPE for widen_mult.  */
+  /* Use NEW_TYPE for widening operation.  */
   if (STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)))
     {
       new_stmt = STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt));
@@ -500,7 +514,7 @@ vect_recog_widen_mult_pattern (VEC (gimple, heap)
   enum tree_code dummy_code;
   int dummy_int;
   VEC (tree, heap) *dummy_vec;
-  bool op0_ok, op1_ok;
+  bool op1_ok;
 
   if (!is_gimple_assign (last_stmt))
     return NULL;
@@ -520,38 +534,23 @@ vect_recog_widen_mult_pattern (VEC (gimple, heap)
     return NULL;
 
   /* Check argument 0.  */
-  op0_ok = widened_name_p (oprnd0, last_stmt, &half_type0, &def_stmt0, false);
+  if (!widened_name_p (oprnd0, last_stmt, &half_type0, &def_stmt0, false))
+    return NULL;
   /* Check argument 1.  */
   op1_ok = widened_name_p (oprnd1, last_stmt, &half_type1, &def_stmt1, false);
 
-  /* In case of multiplication by a constant one of the operands may not match
-     the pattern, but not both.  */
-  if (!op0_ok && !op1_ok)
-    return NULL;
-
-  if (op0_ok && op1_ok)
+  if (op1_ok)
     {
       oprnd0 = gimple_assign_rhs1 (def_stmt0);
       oprnd1 = gimple_assign_rhs1 (def_stmt1);
     }	       
-  else if (!op0_ok)
+  else
     {
-      if (TREE_CODE (oprnd0) == INTEGER_CST
-	  && TREE_CODE (half_type1) == INTEGER_TYPE
-          && vect_handle_widen_mult_by_const (last_stmt, oprnd0, &oprnd1,
-                                              stmts, type,
-				 	      &half_type1, def_stmt1))
-        half_type0 = half_type1;
-      else
-	return NULL;
-    }
-  else if (!op1_ok)
-    {
       if (TREE_CODE (oprnd1) == INTEGER_CST
           && TREE_CODE (half_type0) == INTEGER_TYPE
-          && vect_handle_widen_mult_by_const (last_stmt, oprnd1, &oprnd0,
-                                              stmts, type,
-					      &half_type0, def_stmt0))
+          && vect_handle_widen_op_by_const (last_stmt, MULT_EXPR, oprnd1,
+		                            &oprnd0, stmts, type,
+					    &half_type0, def_stmt0))
         half_type1 = half_type0;
       else
         return NULL;
@@ -1130,7 +1129,7 @@ vect_recog_over_widening_pattern (VEC (gimple, hea
          statetments, except for the case when the last statement in the
          sequence doesn't have a corresponding pattern statement.  In such
          case we associate the last pattern statement with the last statement
-         in the sequence.  Therefore, we only add an original statetement to
+         in the sequence.  Therefore, we only add the original statement to
          the list if we know that it is not the last.  */
       if (prev_stmt)
         VEC_safe_push (gimple, heap, *stmts, prev_stmt);
@@ -1215,7 +1214,231 @@ vect_recog_over_widening_pattern (VEC (gimple, hea
   return pattern_stmt;
 }
 
+/* Detect widening shift pattern:
 
+   type a_t;
+   TYPE a_T, res_T;
+
+   S1 a_t = ;
+   S2 a_T = (TYPE) a_t;
+   S3 res_T = a_T << CONST;
+
+  where type 'TYPE' is at least double the size of type 'type'.
+
+  Also detect unsgigned cases:
+
+  unsigned type a_t;
+  unsigned TYPE u_res_T;
+  TYPE a_T, res_T;
+
+  S1 a_t = ;
+  S2 a_T = (TYPE) a_t;
+  S3 res_T = a_T << CONST;
+  S4 u_res_T = (unsigned TYPE) res_T;
+
+  And a case when 'TYPE' is 4 times bigger than 'type'.  In that case we
+  create an additional pattern stmt for S2 to create a variable of an
+  intermediate type, and perform widen-shift on the intermediate type:
+
+  type a_t;
+  interm_type a_it;
+  TYPE a_T, res_T, res_T';
+
+  S1 a_t = ;
+  S2 a_T = (TYPE) a_t;
+      '--> a_it = (interm_type) a_t;
+  S3 res_T = a_T << CONST;
+      '--> res_T' = a_it <<* CONST;
+
+  Input/Output:
+
+  * STMTS: Contains a stmt from which the pattern search begins.
+    In case of unsigned widen-shift, the original stmt (S3) is replaced with S4
+    in STMTS.  When an intermediate type is used and a pattern statement is
+    created for S2, we also put S2 here (before S3).
+
+  Output:
+
+  * TYPE_IN: The type of the input arguments to the pattern.
+
+  * TYPE_OUT: The type of the output of this pattern.
+
+  * Return value: A new stmt that will be used to replace the sequence of
+    stmts that constitute the pattern.  In this case it will be:
+    WIDEN_LSHIFT_EXPR <a_t, CONST>.  */
+
+static gimple
+vect_recog_widen_shift_pattern (VEC (gimple, heap) **stmts,
+				tree *type_in, tree *type_out)
+{
+  gimple last_stmt = VEC_pop (gimple, *stmts);
+  gimple def_stmt0;
+  tree oprnd0, oprnd1;
+  tree type, half_type0;
+  gimple pattern_stmt, orig_stmt = NULL;
+  tree vectype, vectype_out = NULL_TREE;
+  tree dummy;
+  tree var;
+  enum tree_code dummy_code;
+  int dummy_int;
+  VEC (tree, heap) * dummy_vec;
+  gimple use_stmt = NULL;
+  bool over_widen = false;
+
+  if (!is_gimple_assign (last_stmt) || !vinfo_for_stmt (last_stmt))
+    return NULL;
+
+  orig_stmt = last_stmt;
+  if (STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (last_stmt)))
+    {
+      /* This statement was also detected as over-widening operation (it can't
+         be any other pattern, because only over-widening detects shifts).
+         LAST_STMT is the final type demotion statement, but its related
+         statement is shift.  We analyze the related statement to catch cases:
+
+         orig code:
+          type a_t;
+          itype res;
+          TYPE a_T, res_T;
+
+          S1 a_T = (TYPE) a_t;
+          S2 res_T = a_T << CONST;
+          S3 res = (itype)res_T;
+
+          (size of type * 2 <= size of itype
+           and size of itype * 2 <= size of TYPE)
+
+         code after over-widening pattern detection:
+
+          S1 a_T = (TYPE) a_t;
+               --> a_it = (itype) a_t;
+          S2 res_T = a_T << CONST;
+          S3 res = (itype)res_T;  <--- LAST_STMT
+               --> res = a_it << CONST;
+
+         after widen_shift:
+
+          S1 a_T = (TYPE) a_t;
+               --> a_it = (itype) a_t; - redundant
+          S2 res_T = a_T << CONST;
+          S3 res = (itype)res_T;
+               --> res = a_t w<< CONST;
+
+      i.e., we replace the three statements with res = a_t w<< CONST.  */
+      last_stmt = STMT_VINFO_RELATED_STMT (vinfo_for_stmt (last_stmt));
+      over_widen = true;
+    }
+
+  if (gimple_assign_rhs_code (last_stmt) != LSHIFT_EXPR)
+    return NULL;
+
+  oprnd0 = gimple_assign_rhs1 (last_stmt);
+  oprnd1 = gimple_assign_rhs2 (last_stmt);
+  if (TREE_CODE (oprnd0) != SSA_NAME || TREE_CODE (oprnd1) != INTEGER_CST)
+    return NULL;
+
+  /* Check operand 0: it has to be defined by a type promotion.  */
+  if (!widened_name_p (oprnd0, last_stmt, &half_type0, &def_stmt0, false))
+    return NULL;
+
+  /* Check operand 1: has to be positive.  We check that it fits the type
+     in vect_handle_widen_op_by_const ().  */
+  if (tree_int_cst_compare (oprnd1, size_zero_node) <= 0)
+    return NULL;
+
+  oprnd0 = gimple_assign_rhs1 (def_stmt0);
+  type = gimple_expr_type (last_stmt);
+
+  /* Check if this is a widening operation.  */
+  if (!vect_handle_widen_op_by_const (last_stmt, LSHIFT_EXPR, oprnd1,
+       				      &oprnd0, stmts,
+	                              type, &half_type0, def_stmt0))
+    return NULL;
+
+  /* Handle unsigned case.  Look for
+     S4  u_res_T = (unsigned TYPE) res_T;
+     Use unsigned TYPE as the type for WIDEN_LSHIFT_EXPR.  */
+  if (TYPE_UNSIGNED (type) != TYPE_UNSIGNED (half_type0))
+    {
+      tree lhs = gimple_assign_lhs (last_stmt), use_lhs;
+      imm_use_iterator imm_iter;
+      use_operand_p use_p;
+      int nuses = 0;
+      tree use_type;
+
+      if (over_widen)
+        {
+          /* In case of over-widening pattern, S4 should be ORIG_STMT itself.
+             We check here that TYPE is the correct type for the operation,
+             i.e., it's the type of the original result.  */
+          tree orig_type = gimple_expr_type (orig_stmt);
+          if ((TYPE_UNSIGNED (type) != TYPE_UNSIGNED (orig_type))
+              || (TYPE_PRECISION (type) != TYPE_PRECISION (orig_type)))
+            return NULL;
+        }
+      else
+        {
+          FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
+            {
+	      if (is_gimple_debug (USE_STMT (use_p)))
+	        continue;
+      	      use_stmt = USE_STMT (use_p);
+ 	      nuses++;
+            }
+
+          if (nuses != 1 || !is_gimple_assign (use_stmt)
+	      || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (use_stmt)))
+	    return NULL;
+
+          use_lhs = gimple_assign_lhs (use_stmt);
+          use_type = TREE_TYPE (use_lhs);
+
+          if (!INTEGRAL_TYPE_P (use_type)
+              || (TYPE_UNSIGNED (type) == TYPE_UNSIGNED (use_type))
+              || (TYPE_PRECISION (type) != TYPE_PRECISION (use_type)))
+            return NULL;
+
+          type = use_type;
+        }
+    }
+
+  /* Pattern detected.  */
+  if (vect_print_dump_info (REPORT_DETAILS))
+    fprintf (vect_dump, "vect_recog_widen_shift_pattern: detected: ");
+
+  /* Check target support.  */
+  vectype = get_vectype_for_scalar_type (half_type0);
+  vectype_out = get_vectype_for_scalar_type (type);
+
+  if (!vectype
+      || !vectype_out
+      || !supportable_widening_operation (WIDEN_LSHIFT_EXPR, last_stmt,
+					  vectype_out, vectype,
+					  &dummy, &dummy, &dummy_code,
+					  &dummy_code, &dummy_int,
+					  &dummy_vec))
+    return NULL;
+
+  *type_in = vectype;
+  *type_out = vectype_out;
+
+  /* Pattern supported.  Create a stmt to be used to replace the pattern.  */
+  var = vect_recog_temp_ssa_var (type, NULL);
+  pattern_stmt =
+    gimple_build_assign_with_ops (WIDEN_LSHIFT_EXPR, var, oprnd0, oprnd1);
+
+  if (vect_print_dump_info (REPORT_DETAILS))
+    print_gimple_stmt (vect_dump, pattern_stmt, 0, TDF_SLIM);
+
+  if (use_stmt)
+    last_stmt = use_stmt;
+  else
+    last_stmt = orig_stmt;
+
+  VEC_safe_push (gimple, heap, *stmts, last_stmt);
+  return pattern_stmt;
+}
+
 /* Function vect_recog_mixed_size_cond_pattern
 
    Try to find the following pattern:
Index: tree-vect-stmts.c
===================================================================
--- tree-vect-stmts.c	(revision 180123)
+++ tree-vect-stmts.c	(working copy)
@@ -3333,6 +3333,7 @@ vectorizable_type_promotion (gimple stmt, gimple_s
   VEC (tree, heap) *vec_oprnds0 = NULL, *vec_oprnds1 = NULL;
   VEC (tree, heap) *vec_dsts = NULL, *interm_types = NULL, *tmp_vec_dsts = NULL;
   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
+  unsigned int k;
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
     return false;
@@ -3349,7 +3350,8 @@ vectorizable_type_promotion (gimple stmt, gimple_s
 
   code = gimple_assign_rhs_code (stmt);
   if (!CONVERT_EXPR_CODE_P (code)
-      && code != WIDEN_MULT_EXPR)
+      && code != WIDEN_MULT_EXPR
+      && code != WIDEN_LSHIFT_EXPR)
     return false;
 
   scalar_dest = gimple_assign_lhs (stmt);
@@ -3377,7 +3379,7 @@ vectorizable_type_promotion (gimple stmt, gimple_s
       bool ok;
 
       op1 = gimple_assign_rhs2 (stmt);
-      if (code == WIDEN_MULT_EXPR)
+      if (code == WIDEN_MULT_EXPR || code == WIDEN_LSHIFT_EXPR)
         {
 	  /* For WIDEN_MULT_EXPR, if OP0 is a constant, use the type of
 	     OP1.  */
@@ -3454,7 +3456,7 @@ vectorizable_type_promotion (gimple stmt, gimple_s
     fprintf (vect_dump, "transform type promotion operation. ncopies = %d.",
                         ncopies);
 
-  if (code == WIDEN_MULT_EXPR)
+  if (code == WIDEN_MULT_EXPR || code == WIDEN_LSHIFT_EXPR)
     {
       if (CONSTANT_CLASS_P (op0))
 	op0 = fold_convert (TREE_TYPE (op1), op0);
@@ -3495,6 +3497,8 @@ vectorizable_type_promotion (gimple stmt, gimple_s
       if (op_type == binary_op)
         vec_oprnds1 = VEC_alloc (tree, heap, 1);
     }
+  else if (code == WIDEN_LSHIFT_EXPR)
+    vec_oprnds1 = VEC_alloc (tree, heap, slp_node->vec_stmts_size);
 
   /* In case the vectorization factor (VF) is bigger than the number
      of elements that we can fit in a vectype (nunits), we have to generate
@@ -3508,15 +3512,33 @@ vectorizable_type_promotion (gimple stmt, gimple_s
       if (j == 0)
         {
           if (slp_node)
-              vect_get_slp_defs (op0, op1, slp_node, &vec_oprnds0,
-                                 &vec_oprnds1, -1);
-          else
+	    {
+	      if (code == WIDEN_LSHIFT_EXPR)
+                {
+                  vec_oprnd1 = op1;
+		  /* Store vec_oprnd1 for every vector stmt to be created
+		     for SLP_NODE.  We check during the analysis that all
+		     the shift arguments are the same.  */
+                  for (k = 0; k < slp_node->vec_stmts_size - 1; k++)
+                    VEC_quick_push (tree, vec_oprnds1, vec_oprnd1);
+
+    		  vect_get_slp_defs (op0, NULL_TREE, slp_node, &vec_oprnds0, NULL,
+ 	                             -1);
+                }
+              else
+                vect_get_slp_defs (op0, op1, slp_node, &vec_oprnds0,
+                                   &vec_oprnds1, -1);
+	    }
+	  else
             {
               vec_oprnd0 = vect_get_vec_def_for_operand (op0, stmt, NULL);
               VEC_quick_push (tree, vec_oprnds0, vec_oprnd0);
               if (op_type == binary_op)
                 {
-                  vec_oprnd1 = vect_get_vec_def_for_operand (op1, stmt, NULL);
+                  if (code == WIDEN_LSHIFT_EXPR)
+                    vec_oprnd1 = op1;
+                  else
+                    vec_oprnd1 = vect_get_vec_def_for_operand (op1, stmt, NULL);
                   VEC_quick_push (tree, vec_oprnds1, vec_oprnd1);
                 }
             }
@@ -3527,7 +3549,10 @@ vectorizable_type_promotion (gimple stmt, gimple_s
           VEC_replace (tree, vec_oprnds0, 0, vec_oprnd0);
           if (op_type == binary_op)
             {
-              vec_oprnd1 = vect_get_vec_def_for_stmt_copy (dt[1], vec_oprnd1);
+              if (code == WIDEN_LSHIFT_EXPR)
+                vec_oprnd1 = op1;
+              else
+                vec_oprnd1 = vect_get_vec_def_for_stmt_copy (dt[1], vec_oprnd1);
               VEC_replace (tree, vec_oprnds1, 0, vec_oprnd1);
             }
         }
@@ -5789,6 +5814,19 @@ supportable_widening_operation (enum tree_code cod
         }
       break;
 
+    case WIDEN_LSHIFT_EXPR:
+      if (BYTES_BIG_ENDIAN)
+        {
+          c1 = VEC_WIDEN_LSHIFT_HI_EXPR;
+          c2 = VEC_WIDEN_LSHIFT_LO_EXPR;
+        }
+      else
+        {
+          c2 = VEC_WIDEN_LSHIFT_HI_EXPR;
+          c1 = VEC_WIDEN_LSHIFT_LO_EXPR;
+        }
+      break;
+
     CASE_CONVERT:
       if (BYTES_BIG_ENDIAN)
         {
Index: tree-inline.c
===================================================================
--- tree-inline.c	(revision 180123)
+++ tree-inline.c	(working copy)
@@ -3355,6 +3355,7 @@ estimate_operator_cost (enum tree_code code, eni_w
     case DOT_PROD_EXPR:
     case WIDEN_MULT_PLUS_EXPR:
     case WIDEN_MULT_MINUS_EXPR:
+    case WIDEN_LSHIFT_EXPR:
 
     case VEC_WIDEN_MULT_HI_EXPR:
     case VEC_WIDEN_MULT_LO_EXPR:
@@ -3369,6 +3370,8 @@ estimate_operator_cost (enum tree_code code, eni_w
     case VEC_EXTRACT_ODD_EXPR:
     case VEC_INTERLEAVE_HIGH_EXPR:
     case VEC_INTERLEAVE_LOW_EXPR:
+    case VEC_WIDEN_LSHIFT_HI_EXPR:
+    case VEC_WIDEN_LSHIFT_LO_EXPR:
 
       return 1;
 
Index: tree-vect-generic.c
===================================================================
--- tree-vect-generic.c	(revision 180123)
+++ tree-vect-generic.c	(working copy)
@@ -823,7 +823,9 @@ expand_vector_operations_1 (gimple_stmt_iterator *
       || code == VEC_UNPACK_LO_EXPR
       || code == VEC_PACK_TRUNC_EXPR
       || code == VEC_PACK_SAT_EXPR
-      || code == VEC_PACK_FIX_TRUNC_EXPR)
+      || code == VEC_PACK_FIX_TRUNC_EXPR
+      || code == VEC_WIDEN_LSHIFT_HI_EXPR
+      || code == VEC_WIDEN_LSHIFT_LO_EXPR)
     type = TREE_TYPE (rhs1);
 
   /* Optabs will try converting a negation into a subtraction, so
Index: tree-cfg.c
===================================================================
--- tree-cfg.c	(revision 180123)
+++ tree-cfg.c	(working copy)
@@ -3510,6 +3510,44 @@ verify_gimple_assign_binary (gimple stmt)
 	return false;
       }
 
+    case WIDEN_LSHIFT_EXPR:
+      {
+        if (!INTEGRAL_TYPE_P (lhs_type)
+            || !INTEGRAL_TYPE_P (rhs1_type)
+            || TREE_CODE (rhs2) != INTEGER_CST
+            || (2 * TYPE_PRECISION (rhs1_type) > TYPE_PRECISION (lhs_type)))
+          {
+            error ("type mismatch in widening vector shift expression");
+            debug_generic_expr (lhs_type);
+            debug_generic_expr (rhs1_type);
+            debug_generic_expr (rhs2_type);
+            return true;
+          }
+
+        return false;
+      }
+
+    case VEC_WIDEN_LSHIFT_HI_EXPR:
+    case VEC_WIDEN_LSHIFT_LO_EXPR:
+      {
+        if (TREE_CODE (rhs1_type) != VECTOR_TYPE
+            || TREE_CODE (lhs_type) != VECTOR_TYPE
+            || !INTEGRAL_TYPE_P (TREE_TYPE (rhs1_type))
+            || !INTEGRAL_TYPE_P (TREE_TYPE (lhs_type))
+            || TREE_CODE (rhs2) != INTEGER_CST
+            || (2 * TYPE_PRECISION (TREE_TYPE (rhs1_type))
+                > TYPE_PRECISION (TREE_TYPE (lhs_type))))
+          {
+            error ("type mismatch in widening vector shift expression");
+            debug_generic_expr (lhs_type);
+            debug_generic_expr (rhs1_type);
+            debug_generic_expr (rhs2_type);
+            return true;
+          }
+
+        return false;
+      }
+
     case PLUS_EXPR:
     case MINUS_EXPR:
       {
Index: config/arm/neon.md
===================================================================
--- config/arm/neon.md	(revision 180123)
+++ config/arm/neon.md	(working copy)
@@ -5335,6 +5335,44 @@
  }
 )
 
+(define_insn "neon_vec_<US>shiftl_<mode>"
+ [(set (match_operand:<V_widen> 0 "register_operand" "=w")
+       (SE:<V_widen> (ashift:VW (match_operand:VW 1 "register_operand" "w")
+       (match_operand:<V_innermode> 2 "const_neon_scalar_shift_amount_operand" ""))))]
+  "TARGET_NEON"
+{
+  return "vshll.<US><V_sz_elem> %q0, %P1, %2";
+}
+  [(set_attr "neon_type" "neon_shift_1")]
+)
+
+(define_expand "vec_widen_<US>shiftl_lo_<mode>"
+  [(match_operand:<V_unpack> 0 "register_operand" "")
+   (SE:<V_unpack> (match_operand:VU 1 "register_operand" ""))
+   (match_operand:SI 2 "immediate_operand" "i")]
+ "TARGET_NEON && !BYTES_BIG_ENDIAN"
+ {
+  emit_insn (gen_neon_vec_<US>shiftl_<V_half> (operands[0],
+		simplify_gen_subreg (<V_HALF>mode, operands[1], <MODE>mode, 0),
+		operands[2]));
+   DONE;
+ }
+)
+
+(define_expand "vec_widen_<US>shiftl_hi_<mode>"
+  [(match_operand:<V_unpack> 0 "register_operand" "")
+   (SE:<V_unpack> (match_operand:VU 1 "register_operand" ""))
+   (match_operand:SI 2 "immediate_operand" "i")]
+ "TARGET_NEON && !BYTES_BIG_ENDIAN"
+ {
+  emit_insn (gen_neon_vec_<US>shiftl_<V_half> (operands[0],
+                simplify_gen_subreg (<V_HALF>mode, operands[1], <MODE>mode,
+				     GET_MODE_SIZE (<V_HALF>mode)),
+                operands[2]));
+   DONE;
+ }
+)
+
 ;; Vectorize for non-neon-quad case
 (define_insn "neon_unpack<US>_<mode>"
  [(set (match_operand:<V_widen> 0 "register_operand" "=w")
@@ -5411,6 +5449,34 @@
  }
 )
 
+(define_expand "vec_widen_<US>shiftl_hi_<mode>"
+ [(match_operand:<V_double_width> 0 "register_operand" "")
+   (SE:<V_double_width> (match_operand:VDI 1 "register_operand" ""))
+   (match_operand:SI 2 "immediate_operand" "i")]
+ "TARGET_NEON"
+ {
+   rtx tmpreg = gen_reg_rtx (<V_widen>mode);
+   emit_insn (gen_neon_vec_<US>shiftl_<mode> (tmpreg, operands[1], operands[2]));
+   emit_insn (gen_neon_vget_high<V_widen_l> (operands[0], tmpreg));
+
+   DONE;
+ }
+)
+
+(define_expand "vec_widen_<US>shiftl_lo_<mode>"
+  [(match_operand:<V_double_width> 0 "register_operand" "")
+   (SE:<V_double_width> (match_operand:VDI 1 "register_operand" ""))
+   (match_operand:SI 2 "immediate_operand" "i")]
+ "TARGET_NEON"
+ {
+   rtx tmpreg = gen_reg_rtx (<V_widen>mode);
+   emit_insn (gen_neon_vec_<US>shiftl_<mode> (tmpreg, operands[1], operands[2]));
+   emit_insn (gen_neon_vget_low<V_widen_l> (operands[0], tmpreg));
+
+   DONE;
+ }
+)
+
 ; FIXME: These instruction patterns can't be used safely in big-endian mode
 ; because the ordering of vector elements in Q registers is different from what
 ; the semantics of the instructions require.
Index: config/arm/predicates.md
===================================================================
--- config/arm/predicates.md	(revision 180123)
+++ config/arm/predicates.md	(working copy)
@@ -136,6 +136,11 @@
 	    (match_operand 0 "s_register_operand"))
        (match_operand 0 "const_int_operand")))
 
+(define_predicate "const_neon_scalar_shift_amount_operand"
+  (and (match_code "const_int")
+       (match_test "((unsigned HOST_WIDE_INT) INTVAL (op)) <= GET_MODE_BITSIZE (mode)
+	&& ((unsigned HOST_WIDE_INT) INTVAL (op)) > 0")))
+
 (define_predicate "arm_add_operand"
   (ior (match_operand 0 "arm_rhs_operand")
        (match_operand 0 "arm_neg_immediate_operand")))
Index: config/arm/iterators.md
===================================================================
--- config/arm/iterators.md	(revision 180123)
+++ config/arm/iterators.md	(working copy)
@@ -414,6 +414,9 @@
 			       (V4QQ "8") (V2HQ "16") (QQ "8") (HQ "16")
 			       (V2HA "16") (HA "16") (SQ "") (SA "")])
 
+;; Mode attribute for vshll.
+(define_mode_attr V_innermode [(V8QI "QI") (V4HI "HI") (V2SI "SI")])
+
 ;;----------------------------------------------------------------------------
 ;; Code attributes
 ;;----------------------------------------------------------------------------
Index: tree-vect-slp.c
===================================================================
--- tree-vect-slp.c	(revision 180123)
+++ tree-vect-slp.c	(working copy)
@@ -489,6 +489,11 @@ vect_build_slp_tree (loop_vec_info loop_vinfo, bb_
 		    }
 		}
 	    }
+	  else if (rhs_code == WIDEN_LSHIFT_EXPR)
+            {
+              need_same_oprnds = true;
+              first_op1 = gimple_assign_rhs2 (stmt);
+            }
 	}
       else
 	{

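For reference, a minimal scalar loop of the shape this patch teaches the vectorizer to recognize (the function name and loop bound are ours, not part of the patch): a narrow value is widened and then left-shifted by a constant no larger than the narrow type's width, matching the S1/S2/S3 pattern described in vect_recog_widen_shift_pattern. On ARM with -O3 and NEON enabled, such a loop can now be vectorized with vshll.

```c
#include <stdint.h>

#define N 64

/* S1: a_t = a[i];  S2: a_T = (TYPE) a_t;  S3: res_T = a_T << CONST.
   Here 'type' is uint8_t, 'TYPE' is uint16_t, and CONST (3) is at most
   the precision of 'type' (8), so the widening-shift pattern applies.  */
void
widen_shift_u8 (const uint8_t *a, uint16_t *res)
{
  int i;
  for (i = 0; i < N; i++)
    res[i] = (uint16_t) a[i] << 3;
}
```

With the patch, the three scalar statements collapse into a single WIDEN_LSHIFT_EXPR pattern statement, which supportable_widening_operation then splits into the VEC_WIDEN_LSHIFT_LO_EXPR/HI_EXPR pair expanded via the new vec_widen_ushiftl_lo/hi optabs.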
^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch] Support vectorization of widening shifts
  2011-10-18 10:48     ` Ira Rosen
@ 2011-10-18 10:53       ` Jakub Jelinek
  2011-10-18 11:33         ` Ira Rosen
  0 siblings, 1 reply; 9+ messages in thread
From: Jakub Jelinek @ 2011-10-18 10:53 UTC (permalink / raw)
  To: Ira Rosen; +Cc: Ramana Radhakrishnan, gcc-patches, Patch Tracking

On Tue, Oct 18, 2011 at 11:39:22AM +0200, Ira Rosen wrote:
> On 2 October 2011 10:30, Ira Rosen <ira.rosen@linaro.org> wrote:
> > On 29 September 2011 17:30, Ramana Radhakrishnan
> > <ramana.radhakrishnan@linaro.org> wrote:
> >> On 19 September 2011 08:54, Ira Rosen <ira.rosen@linaro.org> wrote:
> >>
> >>>
> >>> Bootstrapped on powerpc64-suse-linux, tested on powerpc64-suse-linux
> >>> and arm-linux-gnueabi
> >>> OK for mainline?
> >>
> >> Sorry I missed this patch. Is there any reason why we need unspecs in
> >> this case ? Can't this be represented by subregs and zero/ sign
> >> extensions in RTL without the UNSPECs ?
> 
> I committed the attached patch with Ramana's solution for testing

> +/* Detect widening shift pattern:
>  
> +   type a_t;
> +   TYPE a_T, res_T;
> +
> +   S1 a_t = ;
> +   S2 a_T = (TYPE) a_t;
> +   S3 res_T = a_T << CONST;
> +
> +  where type 'TYPE' is at least double the size of type 'type'.
> +
> +  Also detect unsgigned cases:

unsigned

	Jakub


* Re: [patch] Support vectorization of widening shifts
  2011-10-18 10:53       ` Jakub Jelinek
@ 2011-10-18 11:33         ` Ira Rosen
  0 siblings, 0 replies; 9+ messages in thread
From: Ira Rosen @ 2011-10-18 11:33 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Ramana Radhakrishnan, gcc-patches, Patch Tracking

On 18 October 2011 11:43, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Oct 18, 2011 at 11:39:22AM +0200, Ira Rosen wrote:
>> On 2 October 2011 10:30, Ira Rosen <ira.rosen@linaro.org> wrote:
>> > On 29 September 2011 17:30, Ramana Radhakrishnan
>> > <ramana.radhakrishnan@linaro.org> wrote:
>> >> On 19 September 2011 08:54, Ira Rosen <ira.rosen@linaro.org> wrote:
>> >>
>> >>>
>> >>> Bootstrapped on powerpc64-suse-linux, tested on powerpc64-suse-linux
>> >>> and arm-linux-gnueabi
>> >>> OK for mainline?
>> >>
>> >> Sorry I missed this patch. Is there any reason why we need unspecs in
>> >> this case ? Can't this be represented by subregs and zero/ sign
>> >> extensions in RTL without the UNSPECs ?
>>
>> I committed the attached patch with Ramana's solution for testing
>
>> +/* Detect widening shift pattern:
>>
>> +   type a_t;
>> +   TYPE a_T, res_T;
>> +
>> +   S1 a_t = ;
>> +   S2 a_T = (TYPE) a_t;
>> +   S3 res_T = a_T << CONST;
>> +
>> +  where type 'TYPE' is at least double the size of type 'type'.
>> +
>> +  Also detect unsgigned cases:
>
> unsigned

Thanks, I'll fix this.

Ira

>
>        Jakub
>



Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
2011-09-19  8:26 [patch] Support vectorization of widening shifts Ira Rosen
2011-09-26 14:40 ` Richard Guenther
2011-09-27  7:40   ` Ira Rosen
2011-09-27 13:17     ` Richard Guenther
2011-09-29 15:42 ` Ramana Radhakrishnan
2011-10-02  8:31   ` Ira Rosen
2011-10-18 10:48     ` Ira Rosen
2011-10-18 10:53       ` Jakub Jelinek
2011-10-18 11:33         ` Ira Rosen
